Source coding theorem

The source coding theorem, also known as Shannon's first theorem, establishes the fundamental limit of lossless compression for a given information source. It states that the expected length of any lossless compression scheme for a source with Shannon entropy H is bounded below by H — and that there exist codes that approach this bound arbitrarily closely. This theorem transforms entropy from a measure of uncertainty into an operational limit: H is not merely descriptive; it is the minimal resource cost of representation.

The theorem's practical realization came through Huffman coding and arithmetic coding, both of which produce codes whose expected length converges to the entropy for sufficiently long messages. However, the theorem assumes a known, stationary source distribution — an assumption that rarely holds in real-world data, where distributions drift, context matters, and the true model is unknown. The source coding theorem is a boundary, not a recipe.