Kolmogorov complexity

Kolmogorov complexity is the length of the shortest effective description of an object — typically, the shortest computer program that outputs a given string and halts. Named for Andrey Kolmogorov, who formalized the concept in 1963, it provides a rigorous, substrate-independent measure of the information content of individual objects, as opposed to the statistical averages captured by Shannon entropy. Where Shannon asks "how unpredictable is this source?", Kolmogorov asks "how complex is this particular string?" The two questions are complementary: Shannon's answer is about ensembles; Kolmogorov's is about individuals.

The concept was discovered independently in the early 1960s by Ray Solomonoff (who sought a universal theory of inductive inference), Andrey Kolmogorov (who sought a rigorous definition of randomness for individual sequences), and Gregory Chaitin (who explored the limits of formal proof through algorithmic information). Their convergent discovery suggests that Kolmogorov complexity is not an arbitrary formalism but a natural measure — the information-theoretic analogue of mass or charge, a quantity that emerges whenever one asks how much structure an object contains.

Definition and Formalism

For a finite binary string x, the Kolmogorov complexity K(x) is defined as the length of the shortest program p, written for a fixed universal Turing machine U, such that U(p) = x and U halts. Formally:

K(x) = min { |p| : U(p) = x }

The choice of universal machine matters — but only slightly. The Invariance theorem states that for any two universal Turing machines U and V, there exists a constant c_UV such that for all x:

|K_U(x) - K_V(x)| <= c_UV

This means Kolmogorov complexity is machine-independent up to an additive constant. The constant depends on the reference machine but not on the string. For long strings, the machine-specific overhead becomes negligible; for theoretical purposes, the invariance is sufficient to make K(x) a well-defined mathematical object.

Algorithmic Randomness

Kolmogorov complexity resolves a problem that probability theory could not: defining randomness for an individual sequence. A string is algorithmically random — in the sense of Martin-Löf randomness — if its Kolmogorov complexity is approximately equal to its length: K(x) ~ |x|. Such a string has no exploitable regularity; it cannot be compressed, predicted, or described more concisely than by writing it out directly.

This definition is remarkable because it makes randomness a property of the object, not of the ensemble. Classical probability defines randomness via limiting frequencies: a coin is fair if, over infinitely many flips, heads and tails converge to fifty percent. But this says nothing about any particular sequence of flips. Algorithmic randomness says: a specific sequence of a million coin flips is random if no program shorter than a million bits can generate it. The sequence itself, examined in isolation, either has structure or does not.

The fraction of strings that are algorithmically random approaches 1 as length increases. Most strings are random; compressibility is the exception. This is why lossless compression works: real-world data is not random. It contains patterns — linguistic regularities, spatial correlations, temporal redundancies — that algorithms can exploit. The compression ratio of a file is, in a precise sense, an empirical measure of its distance from randomness.

Connections: From Compression to Induction

Kolmogorov complexity is the theoretical foundation of the Minimum description length principle, which formalizes Occam's razor by selecting the model that minimizes the total bits needed to encode both model and data. It is also the basis of Solomonoff induction, the optimal (though uncomputable) theory of prediction: simpler hypotheses — those with shorter generating programs — receive higher prior probability.

The Busy beaver function Sigma(n) — the maximum number of steps an n-state Turing machine can perform before halting — grows faster than any computable function precisely because computing it would require solving the Halting problem, which is undecidable. The busy beaver function is uncomputable for the same reason that Kolmogorov complexity is uncomputable: both require determining which programs halt, and no algorithm can do this in general.

Gregory Chaitin's halting probability Omega pushes this connection further. Omega is the probability that a randomly chosen program halts on a universal Turing machine. Its binary expansion is algorithmically random: the first n bits of Omega have Kolmogorov complexity of approximately n. Yet Omega is definable — we can say exactly what it means. The result is a number that exists, that we can describe, but whose digits we cannot compute beyond finitely many positions. This is not a technological limitation. It is a theorem.

The Uncomputability and Its Meaning

Kolmogorov complexity is uncomputable: no algorithm can compute K(x) for all strings. The proof is a direct consequence of the halting problem. If we could compute K(x), we could find the shortest program for each string, and we could decide whether any given program is the shortest one — which would let us solve the halting problem. Since the halting problem is undecidable, K(x) must be uncomputable.

This uncomputability is not a defect. It is a structural feature of the relationship between description and reality. Any sufficiently powerful description system can describe objects whose shortest description exceeds the system's own capacity to find them. This is the information-theoretic version of Gödel's incompleteness theorems: a system cannot fully describe its own descriptive limits from within.

Practical approximations of Kolmogorov complexity are nonetheless powerful. Compression algorithms like gzip, bzip2, and LZMA provide upper bounds on K(x): if a file compresses to half its original size, its Kolmogorov complexity is at most half its length plus the size of the decompressor. While these bounds are loose, they are useful. The Normalized compression distance, which compares files by how well they compress together, has been used for everything from phylogenetic tree reconstruction to authorship attribution — all without computing K(x) exactly.

The Systems View

From a systems perspective, Kolmogorov complexity is not merely a measure of string compression. It is a measure of causal structure. A string with low Kolmogorov complexity has a short generating program, which means it was produced by a compact causal process. A string with high Kolmogorov complexity has no compact causal description; its shortest explanation is essentially a copy of itself. The complexity of an object is the complexity of its causal history.

This makes Kolmogorov complexity relevant to any system that processes information: neural networks, ecosystems, economies, minds. The patterns these systems detect are precisely the patterns that permit compression — the regularities that make the world describable in fewer bits than the raw data. Learning, in this view, is compression; understanding is the discovery of a shorter description; and the limits of understanding are the limits of compression, which are the limits of computability, which are the limits of any formal system that tries to know itself.

The claim that Kolmogorov complexity is merely a mathematical curiosity, useful only for proving theorems about Turing machines, misunderstands what the measure captures. Every genome, every neural spike train, every stock market tick, every poem is a string whose complexity is the complexity of the process that produced it. To ignore Kolmogorov complexity is to ignore the question of why some patterns exist in the world and others do not. The answer is not probability. The answer is program length. And that is a different kind of answer entirely.