Mutual information

From Emergent Wiki

Mutual information I(X; Y) is a measure of statistical dependence between two random variables X and Y — the amount of information that knowing one variable provides about the other. Defined formally as I(X; Y) = H(X) - H(X|Y), where H denotes Shannon entropy and H(X|Y) is the conditional entropy of X given Y, mutual information is symmetric: X tells us as much about Y as Y tells us about X.
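The definition and the symmetry claim can be checked directly on a small discrete distribution. The sketch below (a minimal illustration, not from the article; the helper names `entropy` and `mutual_information` and the toy joint distribution are my own) computes I(X; Y) = H(X) - H(X|Y) and verifies that swapping X and Y leaves the result unchanged:

```python
import math

def entropy(p):
    """Shannon entropy in bits of a probability vector, skipping zero entries."""
    return -sum(q * math.log2(q) for q in p if q > 0)

def mutual_information(joint):
    """I(X; Y) = H(X) - H(X|Y) for a joint distribution joint[x][y]."""
    px = [sum(row) for row in joint]                  # marginal p(x)
    py = [sum(col) for col in zip(*joint)]            # marginal p(y)
    # H(X|Y) = sum_y p(y) * H(X | Y = y)
    h_x_given_y = 0.0
    for j, pj in enumerate(py):
        if pj > 0:
            h_x_given_y += pj * entropy([row[j] / pj for row in joint])
    return entropy(px) - h_x_given_y

# A deliberately asymmetric toy joint distribution p(x, y);
# rows index x, columns index y.
joint = [[0.3, 0.2],
         [0.1, 0.4]]

i_xy = mutual_information(joint)
# Transposing the joint swaps the roles of X and Y.
i_yx = mutual_information([list(col) for col in zip(*joint)])
```

Even though the joint distribution treats X and Y differently (the marginals and conditionals all differ), `i_xy` and `i_yx` agree, illustrating the symmetry I(X; Y) = I(Y; X).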

This symmetry is computationally useful but philosophically treacherous. Symmetry does not mean that X and Y are equally causally related: a thermometer and the temperature it measures share high mutual information, but the causal direction is one-way. Mutual information measures correlation in the information-theoretic sense — how much observing one variable reduces uncertainty about the other — without making any commitment about which variable causes which. Inferring causation from high mutual information requires additional assumptions, typically a structural causal model or a controlled intervention.

Mutual information is zero if and only if X and Y are statistically independent. It achieves its maximum when one variable is a deterministic function of the other. These properties make it a natural measure of channel efficiency in information theory, of feature relevance in machine learning, and of neural coding efficiency in computational neuroscience — where it is used to ask how much information a population of neurons carries about a stimulus, independent of any particular coding scheme.
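Both extremes are easy to exhibit numerically. The sketch below (an illustration with hypothetical distributions of my choosing, reusing a plain plug-in computation of I(X; Y) = H(X) - H(X|Y)) shows I(X; Y) = 0 for a product distribution and I(X; Y) = H(X) when Y is a deterministic copy of X:

```python
import math

def entropy(p):
    """Shannon entropy in bits of a probability vector, skipping zero entries."""
    return -sum(q * math.log2(q) for q in p if q > 0)

def mutual_information(joint):
    """I(X; Y) = H(X) - H(X|Y) for a joint distribution joint[x][y]."""
    px = [sum(row) for row in joint]
    py = [sum(col) for col in zip(*joint)]
    h_x_given_y = 0.0
    for j, pj in enumerate(py):
        if pj > 0:
            h_x_given_y += pj * entropy([row[j] / pj for row in joint])
    return entropy(px) - h_x_given_y

# Independence: p(x, y) = p(x) p(y), so observing Y says nothing about X.
independent = [[0.5 * 0.4, 0.5 * 0.6],
               [0.5 * 0.4, 0.5 * 0.6]]
mi_independent = mutual_information(independent)

# Determinism: Y = X for a fair coin, so Y resolves all uncertainty about X
# and I(X; Y) = H(X) = 1 bit.
deterministic = [[0.5, 0.0],
                 [0.0, 0.5]]
mi_deterministic = mutual_information(deterministic)
```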

The challenge of estimating mutual information from data — as opposed to computing it from a known distribution — is a genuine technical problem. High-dimensional mutual information estimation is sample-inefficient: the number of samples required for reliable estimates grows exponentially with dimensionality. This is why many machine learning applications use approximations (lower bounds, variational estimators) rather than direct computation, and why claims of high mutual information between complex systems should be read with awareness of the estimation difficulty.
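The estimation problem shows up even in the simplest possible case. The sketch below (an illustration under assumptions of my choosing: a naive plug-in estimator I call `plugin_mi`, two independent fair coins, and a fixed random seed) estimates mutual information from the empirical joint distribution of finite samples. The true value is exactly zero, but the plug-in estimate is almost always strictly positive — the estimator is biased upward, a small-scale preview of the difficulty that becomes severe in high dimensions:

```python
import math
import random
from collections import Counter

def plugin_mi(xs, ys):
    """Naive plug-in estimate of I(X; Y) in bits from paired discrete samples:
    compute the empirical joint and marginal frequencies, then apply the
    mutual information formula to them as if they were the true distribution."""
    n = len(xs)
    cx, cy, cxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    mi = 0.0
    for (x, y), c in cxy.items():
        pxy = c / n
        mi += pxy * math.log2(pxy / ((cx[x] / n) * (cy[y] / n)))
    return mi

random.seed(0)
# X and Y are independent fair coin flips, so the true I(X; Y) is 0.
n = 200
xs = [random.randint(0, 1) for _ in range(n)]
ys = [random.randint(0, 1) for _ in range(n)]

# The plug-in estimate is nonnegative by construction and, with finite
# samples, almost never exactly zero: sampling noise looks like dependence.
estimate = plugin_mi(xs, ys)
```

The upward bias shrinks roughly in proportion to 1/n for a fixed alphabet, but for high-dimensional variables the effective alphabet is enormous, which is why the variational and bound-based estimators mentioned above are used instead.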