Noisy-channel coding theorem

Noisy-channel coding theorem (also called Shannon's channel coding theorem) is the central result of information theory. It states that for any communication channel with a finite capacity C, there exist codes that permit information transmission at any rate R < C with arbitrarily low error probability — provided the block length is sufficiently large. Conversely, for any rate R > C, the error probability is bounded away from zero regardless of the coding scheme. The theorem establishes a hard boundary: capacity is not an engineering benchmark to be approached; it is a mathematical limit that cannot be exceeded.

The theorem was proved by Claude Shannon in 1948, in the same paper that founded information theory. The proof is existential, not constructive. Shannon showed that random codes — codes whose codewords are chosen uniformly at random from the space of all possible words — achieve the capacity with probability approaching one as the block length grows. This is the random coding argument, and it transformed communication engineering by proving that optimal performance was theoretically achievable, even if the specific codes were not yet known.

The Structure of the Proof

The proof has two parts: the achievability bound and the converse.

Achievability. Fix a rate R < C and a block length n. Choose 2^{nR} codewords independently and uniformly at random from the set of all binary strings of length n. For a given transmitted codeword, the channel noise transforms it into a received word drawn from a probability distribution determined by the channel statistics. The decoder employs maximum-likelihood decoding: it declares the transmitted codeword to be the one that maximizes the probability of producing the received word.

The probability of decoding error can be bounded by the probability that any other codeword is closer (in the channel's probability metric) to the received word than the true codeword. Because the codewords are chosen independently, the expected number of competing codewords within the noise cloud around the true codeword is exponentially small in n when R < C. By the union bound, the total error probability vanishes as n → ∞.

This argument is remarkable for what it does not require. It does not require algebraic structure, efficient encoding, or efficient decoding. It requires only randomness and block length. The price of this generality is that the resulting code is not efficiently decodable: the decoder must search through all 2^{nR} codewords, an exponential task. Constructing codes that are both capacity-approaching and efficiently decodable took fifty years and produced turbo codes, LDPC codes, and polar codes.

Converse. The converse proves that rates above capacity are impossible. For any code with rate R > C, Fano's inequality relates the error probability to the conditional entropy of the message given the received word. Information-theoretic identities show that this conditional entropy cannot be made arbitrarily small: the mutual information between channel input and output is bounded by C, and if R > C, the receiver cannot resolve enough uncertainty to recover the message reliably. The error probability is bounded below by a positive constant independent of the code.

Capacity and the Physical World

The capacity C is not merely a mathematical quantity. It is a physical limit. For the binary symmetric channel with crossover probability p, the capacity is C = 1 − H_2(p), where H_2 is the binary entropy function. For the additive white Gaussian noise (AWGN) channel, the capacity is C = ½ log_2(1 + SNR), where SNR is the signal-to-noise ratio. These formulas connect abstract information theory to measurable physical quantities: bit-flip probabilities, noise power spectral densities, signal amplitudes.

The theorem implies that reliable communication is possible even over arbitrarily noisy channels, provided the noise is not so severe that the capacity drops to zero. This is non-intuitive. Before Shannon, engineers believed that reducing noise required reducing rate — that reliability and speed traded off. Shannon showed that the tradeoff disappears once you code correctly: up to capacity, you can have both. The limit is sharp and absolute.

The Gap Between Existence and Construction

The noisy-channel coding theorem is an existence proof. It says that good codes exist. It does not say how to find them. This gap defined coding theory for half a century.

The first constructive codes — Hamming codes, Reed-Solomon codes, convolutional codes — achieved only fractions of capacity. The breakthrough came in 1993 with turbo codes, which used iterative decoding between two simple codes to approach capacity within fractions of a decibel. LDPC codes, invented by Gallager in 1962 but forgotten for decades, were rediscovered and shown to achieve capacity with belief propagation decoding. Polar codes, invented by Arıkan in 2009, were the first codes provably capacity-achieving with efficient decoding, though their finite-length performance lags behind turbo and LDPC codes.

Each of these constructions reveals a different face of the same underlying structure: the geometry of high-dimensional spaces, where random points are exponentially unlikely to be close to each other. The codes that approach capacity are the codes that most closely mimic the random coding argument while admitting structure that enables efficient decoding.

The noisy-channel coding theorem is not a statement about engineering. It is a statement about the relationship between structure and randomness. It says that structure can always defeat randomness, provided the structure is sufficiently complex and the rate of structure-addition does not exceed the channel's capacity for absorbing it. This is emergence in its most precise form: not the vague claim that the whole is greater than the sum of its parts, but the exact statement that the whole can be made reliable even when every part is unreliable.