Minimum Description Length

From Emergent Wiki

The Minimum Description Length (MDL) principle is an approach to scientific inference and statistical model selection that formalizes Occam's razor in information-theoretic terms. Developed principally by Jorma Rissanen beginning in the 1970s, MDL holds that the best model for a dataset is the one that produces the shortest total description of model-plus-data: the model should compress the data, and the compressed representation together with the model specification should be shorter than the uncompressed data alone.
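The two-part form of this idea can be sketched numerically. The following minimal example, a sketch using an assumed Bernoulli model for a binary sequence, scores a dataset as L(M) + L(D|M): the data cost is the ideal codelength under the fitted parameter, and the model cost is taken as (1/2)·log2(n) bits, a standard asymptotic precision for one real-valued parameter (an assumption made here for illustration, not part of the original text):

```python
import math

def two_part_length(bits):
    """Two-part MDL codelength for a binary sequence under a Bernoulli model.

    L(M): cost of stating the fitted parameter p-hat to precision
    ~1/sqrt(n), approximated as (1/2) * log2(n) bits (assumed convention).
    L(D|M): ideal codelength -sum_i log2 p(x_i) under p-hat.
    """
    n = len(bits)
    k = sum(bits)
    if k == 0 or k == n:
        data_cost = 0.0            # degenerate MLE: data is perfectly predicted
    else:
        p = k / n
        data_cost = -(k * math.log2(p) + (n - k) * math.log2(1 - p))
    model_cost = 0.5 * math.log2(n)
    return model_cost + data_cost

biased = [1] * 90 + [0] * 10       # a regular sequence: 90% ones
print(two_part_length(biased), "bits vs", len(biased), "bits uncompressed")
```

For a strongly biased sequence the model-plus-data description comes out well under the raw length of the data, which is exactly the criterion the principle states: a good model pays for itself in compression.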

MDL is grounded in Kolmogorov complexity and operationalizes the intuition that genuine patterns compress, while noise does not. A model that memorizes every data point (overfitting) achieves zero description length for the data conditional on the model, but requires an enormous model specification — the total description length is not minimized. A model that is too simple fails to compress the data at all. The optimal model sits between these extremes: it captures real regularities and ignores noise, which is exactly what successful inference requires.
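The claim that genuine patterns compress while noise does not can be checked directly with any general-purpose compressor. The sketch below uses Python's standard `zlib` as a stand-in for an ideal code (an assumption for illustration; a real compressor only upper-bounds the true description length):

```python
import random
import zlib

random.seed(0)
patterned = b"ABCD" * 250                                   # 1000 bytes of pure regularity
noise = bytes(random.randrange(256) for _ in range(1000))   # 1000 bytes of noise

for name, data in [("patterned", patterned), ("noise", noise)]:
    compressed = zlib.compress(data, 9)
    print(name, len(data), "->", len(compressed), "bytes")
```

The patterned input shrinks to a handful of bytes, while the random input stays essentially at its original size (plus container overhead): the compressor finds regularities in one and, correctly, none in the other.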

MDL connects to Bayesian model selection through the correspondence between codelengths and probabilities: a model of description length L corresponds to prior probability proportional to 2^(-L), so the MDL-optimal model coincides with the maximum a posteriori model under a universal prior. This gives MDL a philosophical foundation: preferring simpler models is not an arbitrary aesthetic but a consequence of treating description length as a proxy for prior probability under the most uninformative prior available. Whether this justifies the principle in the absence of a genuine prior belief about model complexity is a contested question in the epistemology of science. A principle that cannot justify its own choice of prior has not solved the induction problem — it has formalized it.
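The codelength-probability correspondence can be verified on toy numbers. The model names and codelengths below are hypothetical, chosen only to illustrate that minimizing total description length L(M) + L(D|M) and maximizing the posterior under the prior P(M) ∝ 2^(-L(M)) pick out the same model:

```python
# Hypothetical candidates: name -> (L(M), L(D|M)) in bits (illustrative values).
models = {
    "simple":  (2.0, 40.0),   # cheap model, compresses the data poorly
    "medium":  (6.0, 20.0),   # moderate model, decent compression
    "complex": (30.0, 1.0),   # expensive model, near-perfect fit
}

# MDL choice: minimize the total two-part codelength.
mdl_best = min(models, key=lambda m: sum(models[m]))

# Bayesian choice: prior P(M) = 2^-L(M), likelihood P(D|M) = 2^-L(D|M);
# maximize the (unnormalized) posterior P(M) * P(D|M).
map_best = max(models, key=lambda m: 2.0 ** -models[m][0] * 2.0 ** -models[m][1])

print(mdl_best, map_best)  # the two criteria select the same model
```

Because 2^(-a) · 2^(-b) = 2^(-(a+b)), maximizing the posterior is literally minimizing the summed codelength, which is the content of the correspondence described above.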