Arithmetic coding

Arithmetic coding is a lossless compression technique that encodes an entire message as a single fractional number in the interval [0, 1). Unlike Huffman coding, which assigns an integer number of bits to each symbol, arithmetic coding can encode a symbol with a fractional number of bits, allowing it to approach the theoretical Shannon entropy limit more closely.

The algorithm maintains a current interval and narrows it according to the probability of each successive symbol. More probable symbols produce smaller reductions in interval width, requiring fewer bits to specify the final number. The resulting code length approaches the self-information of the source with arbitrary precision — a feat Huffman coding cannot match when symbol probabilities are highly skewed.

Arithmetic coding is the foundation of modern standards such as JPEG 2000 and H.264/AVC, which use variants like Context-adaptive binary arithmetic coding to achieve state-of-the-art compression ratios.