Divergence Estimation

Divergence estimation is the problem of quantifying the dissimilarity between two probability distributions from finite samples, without assuming parametric forms. The most common divergence is the Kullback-Leibler divergence (relative entropy). It belongs to the class of f-divergences, as does the Jensen-Shannon divergence; the Wasserstein distance, an optimal-transport metric rather than an f-divergence, is also widely estimated. In machine learning and neuroscience, divergence estimation is used to detect distributional shift, compare neural population codes, and validate generative models.

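The algorithmic challenge is harder than entropy estimation because two distributions must be characterized at once, each from its own finite sample. The Kozachenko-Leonenko framework for entropy estimation has been extended to divergence estimation through nearest-neighbor ratios: for each point in the first sample, the distance to its nearest neighbor within that sample is compared with the distance to its nearest neighbor in the second sample, and the divergence is inferred from the average log ratio of these distances. This avoids constructing an explicit density estimate, the same property that makes the Kozachenko-Leonenko entropy estimator so effective.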
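
The nearest-neighbor ratio idea can be illustrated with a short sketch. It assumes one commonly cited form of the k-nearest-neighbor Kullback-Leibler estimator, D(P||Q) ≈ (d/n) Σ_i log(ν_k(i) / ρ_k(i)) + log(m / (n − 1)), where ρ_k(i) is the distance from the i-th point of the first sample to its k-th nearest neighbor in that same sample and ν_k(i) is its distance to the k-th nearest neighbor in the second sample. The function name `knn_kl_divergence`, its parameters, and the brute-force distance computation are illustrative choices rather than anything prescribed above, and the sketch assumes continuous data with no duplicate points.

```python
import numpy as np


def knn_kl_divergence(x, y, k=1):
    """k-nearest-neighbor estimate of D_KL(P || Q) in nats.

    x : samples from P, shape (n,) or (n, d)
    y : samples from Q, shape (m,) or (m, d)
    k : which nearest neighbor to use (hypothetical default of 1)
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if x.ndim == 1:
        x = x[:, None]
    if y.ndim == 1:
        y = y[:, None]

    n, d = x.shape
    m = y.shape[0]
    if y.shape[1] != d:
        raise ValueError("x and y must have the same dimension")
    if not (1 <= k < n and k <= m):
        raise ValueError("k must satisfy 1 <= k < n and k <= m")

    # Brute-force pairwise Euclidean distances: O(n^2 + n*m) memory,
    # acceptable for a sketch; a spatial index would scale better.
    d_xx = np.linalg.norm(x[:, None, :] - x[None, :, :], axis=-1)
    d_xy = np.linalg.norm(x[:, None, :] - y[None, :, :], axis=-1)

    # A point must not count as its own neighbor within its own sample.
    np.fill_diagonal(d_xx, np.inf)

    # k-th smallest distance in each row.
    rho = np.partition(d_xx, k - 1, axis=1)[:, k - 1]  # within x
    nu = np.partition(d_xy, k - 1, axis=1)[:, k - 1]   # from x to y

    # Average log ratio of distances plus the sample-size correction.
    return d * np.mean(np.log(nu / rho)) + np.log(m / (n - 1))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    p_samples = rng.normal(0.0, 1.0, size=1000)
    q_samples = rng.normal(1.0, 1.0, size=1000)
    print(knn_kl_divergence(p_samples, q_samples, k=5))
```

For two unit-variance Gaussians whose means differ by 1, the true divergence is 0.5 nats, so the printed estimate can be compared against that value. No density is ever fitted; only the two sets of neighbor distances enter the calculation.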

Divergence estimation is the shadow side of information theory: while entropy asks 'how much uncertainty?', divergence asks 'how different are these two uncertainties?', and the second question is often the one that matters in practice.