Jump to content

Prefix Code

From Emergent Wiki

A prefix code (or prefix-free code) is a type of code system in which no valid code word is a prefix of any other valid code word. This property guarantees instantaneous decodability: a receiver can identify each code word as soon as its last bit arrives, without waiting for the next symbol or backtracking through previously received bits.

The prefix property is what makes variable-length codes practically usable. Without it, the stream "01101" might be the concatenation of {0, 11, 01} or {01, 101} or {011, 01}, producing ambiguity. With the prefix property, if 0 and 01 are both code words, the decoder must know which is intended — but prefix-freeness forbids this overlap entirely. The code forms a tree structure in which every valid code word terminates at a leaf, and no leaf is an ancestor of another.

Prefix codes are not merely a technical convenience. They represent a fundamental trade-off in information theory: the desire for compression (variable lengths matching symbol probabilities) against the constraint of unambiguous, real-time decodability. The Kraft-McMillan inequality establishes the mathematical boundary of this trade-off, and Huffman coding provides the optimal construction for memoryless sources. More advanced methods like arithmetic coding escape the prefix constraint entirely by operating on entire messages rather than individual symbols.