Data Processing Inequality

The data processing inequality is a foundational theorem of information theory. It states that no processing of data, deterministic or randomized, can increase the mutual information between that data and any other variable. Formally, if X → Y → Z forms a Markov chain, meaning that Z is conditionally independent of X given Y (Z depends on X only through Y), then I(X;Y) ≥ I(X;Z). The inequality is intuitively plausible and mathematically far-reaching. A processing stage can at best preserve the information that Y carries about X, as an invertible map does. It cannot create information that was not already present.
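The inequality follows in a few lines from the chain rule for mutual information. The sketch below is the standard textbook argument, written in the symbols used above. It also yields the condition for equality.

```latex
% Chain rule for mutual information, expanded in two orders:
I(X; Y, Z) = I(X; Z) + I(X; Y \mid Z) = I(X; Y) + I(X; Z \mid Y)

% Markov chain X -> Y -> Z: X and Z are conditionally independent given Y, so
I(X; Z \mid Y) = 0

% Hence, because conditional mutual information is non-negative,
I(X; Y) = I(X; Z) + I(X; Y \mid Z) \;\ge\; I(X; Z)

% Equality holds iff I(X; Y \mid Z) = 0, i.e. iff X -> Z -> Y is also a Markov chain.
```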

The inequality has devastating consequences for naive theories of artificial intelligence and data analytics. It means that no algorithm, however sophisticated, can extract information about a quantity that the dataset does not contain. If the data carry only limited information about some target, every output computed from that data carries at most as much. A machine learning model trained on biased data cannot, by any computational magic, recover information about the unbiased truth that the biased sample does not hold. The data processing inequality is the mathematical warrant for the slogan 'garbage in, garbage out'. It is stronger than the slogan, because it applies even when the input is not garbage, merely incomplete. The inequality is the information-theoretic boundary that separates inference from invention.
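The point about incomplete data can be checked numerically. The following is a minimal sketch under illustrative assumptions only. It models a binary "truth" X, observed as data Y through a binary symmetric channel with 10% noise. Each "model" is then a further stage that sees only Y. The helper names (`markov_joint`, `bsc`, `deterministic`, `info_after`) and the noise levels are hypothetical and are not taken from any library. The code computes mutual information exactly from joint probability tables rather than estimating it from samples.

```python
import math

def mutual_information(joint):
    """I(A;B) in bits, given a dict mapping (a, b) -> p(a, b)."""
    p_a, p_b = {}, {}
    for (a, b), p in joint.items():
        p_a[a] = p_a.get(a, 0.0) + p
        p_b[b] = p_b.get(b, 0.0) + p
    # Sum p(a,b) * log2[p(a,b) / (p(a) p(b))], skipping zero-probability cells.
    return sum(p * math.log2(p / (p_a[a] * p_b[b]))
               for (a, b), p in joint.items() if p > 0)

def markov_joint(p_x, p_y_given_x, p_z_given_y):
    """Joint p(x, y, z) = p(x) p(y|x) p(z|y), i.e. a Markov chain X -> Y -> Z."""
    joint = {}
    for x, px in p_x.items():
        for y, py in p_y_given_x[x].items():
            for z, pz in p_z_given_y[y].items():
                joint[(x, y, z)] = px * py * pz
    return joint

def pair_marginal(joint, i, j):
    """Marginalize a joint over triples down to the coordinates (i, j)."""
    out = {}
    for key, p in joint.items():
        pair = (key[i], key[j])
        out[pair] = out.get(pair, 0.0) + p
    return out

def bsc(eps):
    """Hypothetical noisy stage: a binary symmetric channel flipping the bit w.p. eps."""
    return {0: {0: 1 - eps, 1: eps}, 1: {0: eps, 1: 1 - eps}}

def deterministic(f, domain):
    """Hypothetical deterministic stage: Z = f(Y) with probability 1."""
    return {y: {f(y): 1.0} for y in domain}

p_x = {0: 0.5, 1: 0.5}     # assumed uniform binary "truth" X
p_y_given_x = bsc(0.1)     # assumed data Y: X observed through 10% noise

def info_after(stage):
    """Return (I(X;Y), I(X;Z)) for the chain X -> Y -> Z, with Z produced by `stage`."""
    joint = markov_joint(p_x, p_y_given_x, stage)
    return (mutual_information(pair_marginal(joint, 0, 1)),
            mutual_information(pair_marginal(joint, 0, 2)))

for name, stage in [("extra noise (BSC 0.2)", bsc(0.2)),
                    ("identity map", deterministic(lambda y: y, [0, 1])),
                    ("constant map", deterministic(lambda y: 0, [0, 1]))]:
    i_xy, i_xz = info_after(stage)
    print(f"{name:22s} I(X;Y) = {i_xy:.4f}  I(X;Z) = {i_xz:.4f}")
```

In this setup the data hold about 0.531 bits about X. The extra-noise stage leaves about 0.173 bits. The invertible identity map attains equality, and the constant map destroys everything. Under the Markov assumption, swapping in any other stage, deterministic or randomized, should never push I(X;Z) above I(X;Y).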