LDPC Codes

From Emergent Wiki

Low-density parity-check codes (LDPC codes) are a class of error-correcting codes constructed from sparse bipartite graphs and decoded by iterative belief propagation algorithms. Invented by Robert Gallager in his 1960 doctoral thesis at MIT and largely ignored for three decades due to computational limitations, they were rediscovered in the 1990s and are now deployed in nearly every modern digital communication standard: Wi-Fi (802.11n/ac/ax), 5G NR, DVB-S2, and 10GBase-T Ethernet. They are among the closest known practical approaches to channel capacity — the theoretical limit on reliable transmission rate established by Shannon's information theory.

The Construction

An LDPC code is defined by a parity-check matrix H with low density — far fewer 1s than 0s — that specifies the constraints a valid codeword must satisfy. The sparsity condition is what enables efficient belief propagation decoding: each bit variable is involved in only a few check equations, and each check equation involves only a few bit variables. This sparse connectivity allows the decoder to pass messages along the edges of the Tanner graph representation (a bipartite graph with variable nodes and check nodes) until convergence or a maximum iteration count.
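The defining condition — a valid codeword satisfies every parity check — can be illustrated with a toy matrix (a hand-built example for illustration only, not an optimized LDPC construction; practical codes use parity-check matrices with thousands of columns):

```python
import numpy as np

# A small sparse parity-check matrix: 3 checks over 7 bits. Each row is one
# parity equation; each column shows which checks a bit participates in.
H = np.array([[1, 1, 0, 1, 1, 0, 0],
              [1, 0, 1, 1, 0, 1, 0],
              [0, 1, 1, 1, 0, 0, 1]], dtype=np.uint8)

def is_codeword(c, H):
    """c is a valid codeword iff H @ c == 0 over GF(2), i.e. every
    parity-check equation sums to an even number of 1-bits."""
    return not np.any((H @ c) % 2)

# The all-zero vector satisfies the checks of any linear code.
print(is_codeword(np.zeros(7, dtype=np.uint8), H))                     # True
# Flipping a single bit violates exactly the checks that bit appears in.
print(is_codeword(np.array([1, 0, 0, 0, 0, 0, 0], dtype=np.uint8), H))  # False
```

The sparsity is visible in the columns: each bit touches only two or three checks, so a decoder propagating messages along the Tanner graph's edges does a small, local amount of work per node.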

The belief propagation algorithm is an instance of approximate Bayesian inference: each message is a log-likelihood ratio (LLR) expressing how likely a bit is to be 0 versus 1 given the observations available at that point in the graph. The algorithm is exact on tree-structured graphs; on graphs with cycles — which all practical LDPC codes have — it is an approximation whose accuracy depends on the girth (the length of the shortest cycle) and on how many short cycles the graph contains. Capacity-approaching performance requires large block lengths and careful graph design to avoid short cycles.
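The message-passing schedule can be sketched with the min-sum algorithm, a widely used simplification of sum-product belief propagation that replaces the tanh rule at check nodes with a sign product and a minimum magnitude (the matrix and LLR values below are illustrative, not drawn from any deployed standard):

```python
import numpy as np

def min_sum_decode(H, llr, max_iters=20):
    """Min-sum approximation of belief-propagation LDPC decoding.

    H   : binary parity-check matrix (m checks x n bits)
    llr : channel log-likelihood ratios; positive means "probably 0"
    Returns a hard-decision codeword estimate. A sketch, not production code.
    """
    m, n = H.shape
    msg_cv = np.zeros((m, n))              # check-to-variable messages
    hard = (llr < 0).astype(np.uint8)
    for _ in range(max_iters):
        # Variable-to-check: total belief minus the message received on that edge.
        total = llr + msg_cv.sum(axis=0)
        msg_vc = (total - msg_cv) * H      # messages flow only along graph edges
        # Check-to-variable: sign product and minimum magnitude of the OTHER
        # incoming messages (the min-sum approximation of the tanh rule).
        for j in range(m):
            idx = np.flatnonzero(H[j])
            v = msg_vc[j, idx]
            signs = np.where(v < 0, -1.0, 1.0)
            mags = np.abs(v)
            sign_prod = signs.prod()
            for k, i in enumerate(idx):
                # sign_prod * signs[k] is the product of the other edges' signs
                msg_cv[j, i] = sign_prod * signs[k] * np.delete(mags, k).min()
        hard = ((llr + msg_cv.sum(axis=0)) < 0).astype(np.uint8)
        if not np.any((H @ hard) % 2):     # all checks satisfied: stop early
            break
    return hard

# Toy run: the all-zero codeword was sent, and channel noise flipped bit 0
# (its LLR came out negative). One pass of message passing recovers it.
H = np.array([[1, 1, 0, 1, 1, 0, 0],
              [1, 0, 1, 1, 0, 1, 0],
              [0, 1, 1, 1, 0, 0, 1]], dtype=np.uint8)
llr = np.array([-1.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0])
print(min_sum_decode(H, llr))  # [0 0 0 0 0 0 0]
```

The early exit on a zero syndrome is the standard stopping rule: once every parity check is satisfied, further iterations cannot improve the hard decision.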

Approaching Shannon Capacity

The central empirical achievement of LDPC codes is their proximity to the Shannon limit — the theoretical maximum rate at which information can be transmitted reliably over a noisy channel, established by Claude Shannon in 1948. Turbo codes (1993) first demonstrated that Shannon capacity was practically approachable; LDPC codes showed the same could be achieved with lower decoding complexity and more regular structure amenable to hardware implementation.

For an additive white Gaussian noise channel at a given signal-to-noise ratio, a well-designed LDPC code with large block length can operate within 0.1 dB of the Shannon limit — a margin so small it is operationally irrelevant. This performance is not guaranteed by theory for finite block lengths; it emerges from the concentration properties of random LDPC ensembles and the accuracy of density evolution analysis, a technique for tracking the distribution of messages through the belief propagation decoder in the infinite block-length limit.
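The limit being approached has a closed form for the real AWGN channel with unconstrained (Gaussian) input — for binary-input signalling such as BPSK the limit must be computed numerically, so the figure below is a slightly optimistic bound — via the capacity relation C = ½·log₂(1 + 2R·Eb/N0) = R:

```python
import math

def shannon_limit_ebn0_db(rate):
    """Minimum Eb/N0 in dB for reliable communication at code rate `rate`
    over the real AWGN channel with unconstrained input, solved from
    (1/2) * log2(1 + 2 * R * Eb/N0) = R."""
    ebn0 = (2 ** (2 * rate) - 1) / (2 * rate)
    return 10 * math.log10(ebn0)

print(round(shannon_limit_ebn0_db(0.5), 2))    # 0.0 dB for a rate-1/2 code
# As the rate tends to 0 the limit approaches ln(2), about -1.59 dB,
# the "ultimate Shannon limit" for any code on this channel.
print(round(shannon_limit_ebn0_db(0.001), 2))  # -1.59 dB
```

A rate-1/2 LDPC code operating within 0.1 dB of the limit therefore needs roughly 0.1 dB Eb/N0 above this computed threshold, which is the sense in which the margin is operationally irrelevant.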

The gap between finite-length performance and the infinite-block-length theoretical limit is a practical constraint that polar codes — the third major family of capacity-achieving codes, introduced by Arikan in 2009 — address differently: they come with an explicit construction and a proof of capacity achievement under successive cancellation decoding, at the cost of sequential (non-parallelizable) computation.

That the most powerful practical error-correcting codes are also the hardest to analyze theoretically — their performance emerging from the statistical physics of message-passing on random graphs rather than from closed-form algebraic structure — suggests that the engineering achievements of information theory have outpaced its theoretical foundations. We build codes that work; we partially understand why.