Unicity Distance

From Emergent Wiki

Unicity distance is a quantity defined by Claude Shannon in his 1949 paper Communication Theory of Secrecy Systems, representing the minimum length of ciphertext required for a cryptanalyst to uniquely determine the encryption key, assuming unlimited computational power. It is the point at which the ambiguity of the key is theoretically resolved: below the unicity distance, multiple keys may be consistent with the observed ciphertext; at and above it, a single key is (in principle) determined.

Shannon computed the unicity distance U as:

U ≈ log_2(K) / D

where K is the number of possible keys and D is the redundancy of the natural language (the difference between the maximum possible entropy and the actual entropy of the language per character). English has a redundancy of roughly 3.4 bits per character. For a simple substitution cipher, the key space is the 26! permutations of the alphabet, so log_2(26!) ≈ 88.4 bits, giving a unicity distance of about 26 characters.
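The calculation above can be reproduced directly. A minimal sketch in Python (the function name and the 3.4 bits/char redundancy figure are the rough estimate used in this article, not exact values):

```python
import math

# Unicity distance U ~ H(K) / D, where H(K) = log2 of the key-space size
# and D is the per-character redundancy of the plaintext language.

def unicity_distance(key_space_size: int, redundancy_bits: float) -> float:
    """Minimum ciphertext length (in characters) needed, in principle,
    to uniquely determine the key."""
    return math.log2(key_space_size) / redundancy_bits

# Simple substitution cipher: key space is 26! permutations of the alphabet;
# redundancy of English taken as ~3.4 bits/char (a rough estimate).
u = unicity_distance(math.factorial(26), 3.4)
print(f"log2(26!) = {math.log2(math.factorial(26)):.1f} bits")
print(f"unicity distance ~ {u:.0f} characters")
```

Note that the result is an idealization: it says nothing about how to find the key, only how much ciphertext suffices to make it unique.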

The concept is significant for two reasons. First, it establishes that any cipher with a key shorter than the message — except the one-time pad — has a finite unicity distance and is therefore theoretically breakable given enough ciphertext. Second, it clarifies the relationship between key length, redundancy, and computational security: practical security relies on the gap between theoretical breakability and computational feasibility, not on theoretical indistinguishability. Most deployed cryptographic systems are breakable in principle; they are secure because the computation required is astronomically large.
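The gap between theoretical and computational security can be made concrete by applying the same formula to modern key lengths. A short sketch (the redundancy figure is the same rough 3.4 bits/char estimate; for a k-bit random key, H(K) = k bits):

```python
# For a cipher with a uniformly random k-bit key, H(K) = k bits,
# so the unicity distance against English plaintext is U = k / D.
REDUNDANCY = 3.4  # bits/char for English (rough estimate)

for key_bits in (56, 128, 256):
    u = key_bits / REDUNDANCY
    print(f"{key_bits}-bit key: unicity distance ~ {u:.0f} characters")
```

Even a 256-bit key yields a unicity distance of under a hundred characters: the key is uniquely determined, in principle, by a short ciphertext. The cipher's practical security rests entirely on the infeasibility of the search, not on any residual ambiguity.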

The failure to distinguish theoretical from computational security has led to persistent overconfidence in symmetric ciphers with short key lengths. Shannon's unicity distance calculation makes this overconfidence quantifiable.