Source Coding Theorem

The Source Coding Theorem — also known as Shannon's First Theorem or the Noiseless Coding Theorem — establishes the fundamental limit on lossless data compression. Proven by Claude Shannon in 1948, it states that for any information source with entropy H (measured in bits per symbol), there exists a lossless coding scheme whose average code length per symbol can be made arbitrarily close to H, and no scheme can achieve an average length below H.

The theorem transforms entropy from a statistical curiosity into an operational bound. Entropy is not merely a measure of uncertainty; it is the irreducible cost of describing a source. Any compression algorithm that achieves rates near the entropy limit is, in a precise sense, optimal. The theorem's proof is non-constructive — it demonstrates existence without providing the code — which motivated decades of practical algorithm development from Huffman coding to Lempel-Ziv methods.

The theorem applies to memoryless sources and extends to sources with memory through extensions of the entropy concept, including block entropy and entropy rate. For a source with memory, the relevant quantity is the entropy rate — the limit of the entropy per symbol as the block length grows — which captures the long-range statistical dependencies that memoryless analysis misses.

Block Entropy and Sources with Memory

For sources with memory — where the probability of the next symbol depends on previous symbols — the entropy rate is the relevant limit, not the single-symbol entropy. The block entropy Hₙ measures the uncertainty of blocks of length n, and the entropy rate is the limit h = lim Hₙ/n. The Source Coding Theorem extends to this case: there exist codes whose per-symbol length approaches the entropy rate, and no code can do better.

This extension is not a trivial generalisation. It changes the nature of the compression problem. For memoryless sources, a fixed code (like Huffman Coding) is sufficient. For sources with memory, adaptive codes (like Lempel-Ziv-Welch) are necessary, because the relevant statistics are not known in advance and evolve over time. The theorem guarantees that both fixed and adaptive approaches converge to the same fundamental limit — the entropy rate — but the practical difference between knowing the source and learning it is the difference between engineering and inference.

The block-entropy formulation also reveals a subtlety the single-symbol theorem hides: the convergence to the entropy rate can be slow for sources with long-range correlations. A source where Hₙ/n approaches h as 1/n is easy to compress with short blocks. A source where the approach is logarithmic or slower requires codes that capture dependencies across scales far larger than any practical block length. Such sources — including natural language, DNA, and certain physical systems — sit at the boundary where compression theory meets computational mechanics: the problem is not merely to encode efficiently, but to discover the computational structure that generates the correlations.