Entropy Estimation

Entropy estimation is the problem of computing the Shannon entropy H(X) = −Σ p(x) log p(x) of a discrete random variable from finite samples when the probability distribution p(x) is unknown. Like mutual information estimation, entropy estimation is trivial in theory (count frequencies and plug them into the formula) but difficult in practice, because the plug-in estimator is biased and the bias can be large relative to the true entropy. The plug-in estimator underestimates entropy on average because a finite sample tends to look more concentrated than the true distribution: rare outcomes are missed or under-sampled. Formally, entropy is a concave function of the distribution, so by Jensen's inequality the expected entropy of the empirical distribution cannot exceed the true entropy.
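The downward bias of the plug-in estimator is easy to see in a small simulation. The sketch below is illustrative only: the function names, the uniform test distribution over 8 outcomes, the sample size of 20, and the number of trials are all assumptions chosen for the example, not part of any standard library.

```python
import math
import random
from collections import Counter

def plugin_entropy(samples):
    """Plug-in (maximum-likelihood) entropy estimate in nats."""
    n = len(samples)
    counts = Counter(samples)
    # Only observed outcomes appear in `counts`, so log(0) never occurs.
    return -sum((c / n) * math.log(c / n) for c in counts.values())

def mean_plugin_estimate(num_outcomes, sample_size, trials, seed=0):
    """Average the plug-in estimate over repeated uniform samples."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        samples = [rng.randrange(num_outcomes) for _ in range(sample_size)]
        total += plugin_entropy(samples)
    return total / trials

true_entropy = math.log(8)  # uniform distribution over 8 outcomes
average = mean_plugin_estimate(num_outcomes=8, sample_size=20, trials=2000)
print(f"true entropy:          {true_entropy:.3f} nats")
print(f"mean plug-in estimate: {average:.3f} nats")
```

The average estimate falls below the true value even though each individual estimate uses the exact formula; only the probabilities fed into it are approximate.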

The bias of the plug-in estimator is not merely a numerical inconvenience. It is a structural feature of estimation from finite data. To first order the bias is approximately −(K − 1)/(2N), where K is the number of outcomes with nonzero probability and N is the sample size. It is therefore largest when the probability mass is spread over many outcomes and the sample is small; it is smallest when the distribution is concentrated on a few outcomes and the sample is large. In the high-dimensional (undersampled) regime, where the number of possible outcomes exceeds the number of samples, the plug-in estimator is still well defined under the convention 0 log 0 = 0, but it is no longer merely biased; it is unreliable. Most outcomes have zero empirical probability and contribute nothing to the sum, and the estimate can never exceed log N, however large the true entropy is.
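Under the common convention 0 log 0 = 0, outcomes that were never observed drop out of the sum, so a plug-in estimate computed from N samples cannot exceed log N. The following sketch illustrates that ceiling; the alphabet size, sample size, and helper name are assumptions made for the example.

```python
import math
import random
from collections import Counter

def plugin_entropy(samples):
    """Plug-in entropy estimate in nats; unobserved outcomes contribute 0."""
    n = len(samples)
    return -sum((c / n) * math.log(c / n) for c in Counter(samples).values())

rng = random.Random(1)
num_outcomes = 100_000  # possible outcomes, far more than the samples drawn
sample_size = 50

samples = [rng.randrange(num_outcomes) for _ in range(sample_size)]
estimate = plugin_entropy(samples)
true_entropy = math.log(num_outcomes)  # uniform over all outcomes
ceiling = math.log(sample_size)        # largest value the estimate can take

print(f"true entropy:     {true_entropy:.3f} nats")
print(f"plug-in ceiling:  {ceiling:.3f} nats")
print(f"plug-in estimate: {estimate:.3f} nats")
```

No matter how the 50 samples fall, the estimate stays at or below log 50, so the gap to the true entropy cannot be closed without more data or a different estimator.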

Several bias-correction methods exist. The Miller-Madow correction adds a simple analytical adjustment, (m − 1)/(2N), where N is the number of samples and m is the number of outcomes actually observed. The jackknife and bootstrap provide resampling-based corrections. More sophisticated methods include the Kozachenko-Leonenko estimator, which targets the differential entropy of continuous variables and uses k-nearest neighbor distances to adapt to local density, and the minimax approach, which derives estimators with optimal worst-case performance over a class of distributions.
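Two of these corrections are short enough to sketch directly. The code below is a minimal illustration, assuming the data arrive as a list of per-outcome counts; the function names are hypothetical. The Miller-Madow version adds (m − 1)/(2N), with m the number of observed outcomes and N the sample size, and the jackknife version combines the full-sample estimate with the leave-one-out estimates.

```python
import math

def entropy_from_counts(counts):
    """Plug-in entropy in nats from a list of outcome counts."""
    n = sum(counts)
    return -sum((c / n) * math.log(c / n) for c in counts if c > 0)

def miller_madow(counts):
    """Plug-in estimate plus the first-order correction (m - 1) / (2N)."""
    n = sum(counts)
    observed = sum(1 for c in counts if c > 0)  # m: outcomes seen at least once
    return entropy_from_counts(counts) + (observed - 1) / (2 * n)

def jackknife(counts):
    """Jackknife estimate: N * H - (N - 1) / N * (sum of leave-one-out H)."""
    counts = list(counts)
    n = sum(counts)
    loo_sum = 0.0
    for i, c in enumerate(counts):
        if c == 0:
            continue
        # Removing any one of the c observations of outcome i gives the same
        # leave-one-out sample, so compute it once and weight it by c.
        counts[i] -= 1
        loo_sum += c * entropy_from_counts(counts)
        counts[i] += 1
    return n * entropy_from_counts(counts) - (n - 1) / n * loo_sum

counts = [5, 3, 2]  # hypothetical sample of N = 10 over three outcomes
print(f"plug-in:      {entropy_from_counts(counts):.4f} nats")
print(f"Miller-Madow: {miller_madow(counts):.4f} nats")
print(f"jackknife:    {jackknife(counts):.4f} nats")
```

Both corrections push the estimate upward, which is the right direction given the plug-in estimator's downward bias, but neither removes the bias entirely when the sample is small relative to the number of outcomes.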

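Entropy estimation is the foundation of mutual information estimation, since mutual information is a linear combination of entropies: I(X;Y) = H(X) + H(Y) − H(X,Y). An error in any of the three entropy estimates propagates directly into the mutual information estimate. Because the joint entropy ranges over the largest set of outcomes, its downward bias usually dominates, so the plug-in mutual information tends to be overestimated. This means that the problems of mutual information estimation (bias, variance, curse of dimensionality) are not separate from those of entropy estimation. They are the same problem, viewed from a different angle.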
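The propagation of entropy bias into mutual information can be seen by writing I(X;Y) = H(X) + H(Y) − H(X,Y) and applying the plug-in estimator to each term. The sketch below uses hypothetical function names and draws X and Y independently, so the true mutual information is zero; the alphabet sizes and sample size are assumptions for the example.

```python
import math
import random
from collections import Counter

def plugin_entropy(samples):
    """Plug-in entropy estimate in nats."""
    n = len(samples)
    return -sum((c / n) * math.log(c / n) for c in Counter(samples).values())

def plugin_mutual_information(pairs):
    """Plug-in estimate of I(X;Y) = H(X) + H(Y) - H(X,Y) from (x, y) pairs."""
    xs = [x for x, _ in pairs]
    ys = [y for _, y in pairs]
    return plugin_entropy(xs) + plugin_entropy(ys) - plugin_entropy(pairs)

rng = random.Random(2)
# X and Y are drawn independently, so the true mutual information is 0.
independent = [(rng.randrange(5), rng.randrange(5)) for _ in range(30)]
mi_independent = plugin_mutual_information(independent)

# Y is an exact copy of X, so the true mutual information equals H(X).
copied = [(x, x) for x, _ in independent]
mi_copied = plugin_mutual_information(copied)

print(f"estimated MI, independent pair: {mi_independent:.3f} nats (true value 0)")
print(f"estimated MI, copied pair:      {mi_copied:.3f} nats")
```

The estimate for the independent pair comes out above zero because the joint entropy, estimated over 25 possible cells from only 30 samples, is underestimated more severely than either marginal entropy.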

The fact that entropy estimation remains an active research area decades after Shannon's definition reveals something profound: knowing what entropy is and knowing how to measure it are different epistemic achievements. The former is mathematics; the latter is the boundary where mathematics meets the finitude of observation.