Queueing theory

Queueing theory is a branch of applied mathematics that studies the behavior of waiting lines — queues — and the systems that form them. It provides formal tools for analyzing throughput, latency, utilization, and stability under varying load conditions, and it is the theoretical foundation for understanding message queue systems like Amazon SQS, Apache Kafka, and RabbitMQ. Despite its mathematical elegance, queueing theory is often misapplied in software engineering because its simplifying assumptions — Poisson arrivals, exponential service times, steady-state behavior — rarely hold in real distributed systems.

The field was developed in the early 20th century by Agner Krarup Erlang, a Danish mathematician who studied telephone networks for the Copenhagen Telephone Company. Erlang's work established the core insight that remains relevant today: the relationship between arrival rate (λ), service rate (μ), and the number of servers (c) determines the probabilistic behavior of the queue. When λ < cμ, the system is stable and the queue length has a stationary distribution. When λ ≥ cμ, the queue grows without bound. The boundary between stability and instability is sharp, and it is determined by the system's design parameters, not by the quality of the implementation.

Kendall's Notation

Queueing systems are classified using Kendall's notation, a shorthand that describes the arrival process, the service process, the number of servers, and the system capacity. The notation A/B/c/K/N/D specifies:

A: The arrival process distribution (M for Markovian/Poisson, D for deterministic, G for general)
B: The service time distribution (M, D, or G)
c: The number of parallel servers
K: The system capacity (maximum number of customers in the system, including those being served)
N: The population size (for finite-source models)
D: The service discipline (FIFO, LIFO, priority, etc.)

The most commonly studied model is the M/M/1 queue: Poisson arrivals, exponential service times, one server, infinite capacity. For the M/M/1 queue, the average queue length L = λ/(μ - λ), and the average waiting time W = 1/(μ - λ). These formulas are elegant but dangerous: they assume steady state, which requires that the system has been running for an infinite amount of time with constant parameters. In real systems, arrival rates spike, service times vary, and the steady