Kernel Density Estimation

From Emergent Wiki

Kernel density estimation (KDE) is a nonparametric method for estimating the probability density function of a random variable from a finite sample. Rather than assigning each data point to a discrete bin, as histogram estimation does, KDE centers a smooth kernel function (typically Gaussian) on each data point and averages these kernels to produce a continuous density estimate. With a smooth kernel such as the Gaussian, the estimate is differentiable everywhere, and its resolution is set by the kernel width rather than by the placement of bin edges.
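
The construction described above can be sketched in a few lines. This is a minimal illustration assuming a Gaussian kernel and a fixed bandwidth; the function names, the toy sample, and the bandwidth of 0.5 are hypothetical choices made for the example, not part of any particular library.

```python
import math

def gaussian_kernel(u):
    """Standard normal density, used here as the kernel K."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)

def kde(x, sample, bandwidth):
    """Evaluate f_hat(x) = (1 / (n * h)) * sum_i K((x - x_i) / h)."""
    if bandwidth <= 0:
        raise ValueError("bandwidth must be positive")
    n = len(sample)
    total = sum(gaussian_kernel((x - xi) / bandwidth) for xi in sample)
    return total / (n * bandwidth)

# Hypothetical toy sample with two loose clusters.
sample = [-2.1, -1.9, -2.0, 1.8, 2.0, 2.3]

# Evaluate the estimate on a grid and check that it integrates to about 1.
step = 0.01
grid = [-6.0 + step * i for i in range(1201)]
density = [kde(x, sample, bandwidth=0.5) for x in grid]
area = sum(density) * step  # Riemann-sum approximation of the integral
```

Dividing by n and by the bandwidth is what makes the result a proper density: each kernel contributes an area of 1/n, so the total area is 1 regardless of the bandwidth chosen.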

The critical parameter in KDE is the bandwidth, the scale parameter that sets the width of each kernel; for a Gaussian kernel it is the kernel's standard deviation. A small bandwidth produces a spiky, undersmoothed estimate that overfits, tracking noise as if it were signal. A large bandwidth produces an oversmoothed estimate that underfits, obscuring genuine multimodal structure. The bandwidth is not a tuning parameter in the weak sense; it is a theory of what scale the relevant structure lives at. The problem of bandwidth selection, that is, choosing the bandwidth from the data itself, is one of the most studied problems in nonparametric statistics, and no universally satisfactory solution exists.
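
The effect of the bandwidth can be made concrete by counting the local maxima of the estimate at several bandwidths. The sketch below assumes a Gaussian kernel and a hypothetical two-cluster sample. It also includes Silverman's rule of thumb, which is one common data-driven heuristic brought in here for illustration; the article does not single it out, and it is tuned for roughly normal data rather than for multimodal samples like this one.

```python
import math
import statistics

def gaussian_kde(x, sample, h):
    """Fixed-bandwidth Gaussian KDE evaluated at a single point x."""
    norm = len(sample) * h * math.sqrt(2.0 * math.pi)
    return sum(math.exp(-0.5 * ((x - xi) / h) ** 2) for xi in sample) / norm

def count_modes(sample, h, lo=-8.0, hi=8.0, step=0.01):
    """Count local maxima of the estimate on a regular grid."""
    steps = int(round((hi - lo) / step))
    d = [gaussian_kde(lo + step * i, sample, h) for i in range(steps + 1)]
    return sum(
        1 for i in range(1, steps)
        if d[i] > d[i - 1] and d[i] >= d[i + 1]
    )

def silverman_bandwidth(sample):
    """Silverman's rule of thumb: 0.9 * min(sd, IQR / 1.34) * n^(-1/5)."""
    n = len(sample)
    sd = statistics.stdev(sample)
    q1, _, q3 = statistics.quantiles(sample, n=4)
    spread = min(sd, (q3 - q1) / 1.34)
    return 0.9 * spread * n ** (-0.2)

# Hypothetical sample: two clusters of five points, centered near -2 and +2.
sample = [-2.6, -2.3, -2.0, -1.7, -1.4, 1.4, 1.7, 2.0, 2.3, 2.6]

# Number of modes seen at a very small, a moderate, and a very large bandwidth.
modes = {h: count_modes(sample, h) for h in (0.05, 0.5, 4.0)}

# A data-driven starting point, not a definitive answer.
h_rot = silverman_bandwidth(sample)
```

The same ten points yield more modes at the smallest bandwidth than at the moderate one, and a single mode at the largest. Nothing in the data changes; only the assumed scale of structure does.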

KDE is widely used in mutual information estimation as an alternative to binning, particularly when the underlying distributions are believed to be smooth and unimodal. For multimodal or heavy-tailed distributions, KDE can perform worse than binning because a single global bandwidth cannot adapt to regions of very different density: a value narrow enough to resolve a dense peak is too narrow for the sparse tails, and vice versa. Adaptive KDE methods, which vary the bandwidth with local density, address this limitation at the cost of additional parameters and computational complexity.
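
One way to vary the bandwidth with local density is a two-pass scheme: compute a fixed-bandwidth pilot estimate, then widen each point's kernel where the pilot density is low and narrow it where the pilot density is high. The sketch below assumes a Gaussian kernel and the square-root scaling (alpha = 0.5) commonly attributed to Abramson; the sample, the pilot bandwidth, and all function names are hypothetical. It is one adaptive variant among several, not the only formulation.

```python
import math

def gaussian_kernel(u):
    """Standard normal density."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)

def fixed_kde(x, sample, h):
    """Pilot estimate with a single global bandwidth h."""
    return sum(gaussian_kernel((x - xi) / h) for xi in sample) / (len(sample) * h)

def local_bandwidths(sample, h, alpha=0.5):
    """Per-point bandwidths h_i = h * (pilot(x_i) / g) ** (-alpha).

    g is the geometric mean of the pilot densities at the sample points,
    so the local bandwidths stay on the same overall scale as h.
    """
    pilot = [fixed_kde(xi, sample, h) for xi in sample]
    g = math.exp(sum(math.log(p) for p in pilot) / len(pilot))
    return [h * (p / g) ** (-alpha) for p in pilot]

def adaptive_kde(x, sample, bandwidths):
    """Average of kernels, each scaled by its own bandwidth."""
    n = len(sample)
    return sum(
        gaussian_kernel((x - xi) / hi) / hi
        for xi, hi in zip(sample, bandwidths)
    ) / n

# Hypothetical sample: a dense cluster near 0 and a sparse right tail.
sample = [-0.3, -0.2, -0.1, 0.0, 0.1, 0.2, 0.3, 2.0, 4.0, 7.0]
bandwidths = local_bandwidths(sample, h=0.4)

# Check that the adaptive estimate still integrates to about 1.
step = 0.01
grid = [-10.0 + step * i for i in range(3001)]
area = sum(adaptive_kde(x, sample, bandwidths) for x in grid) * step
```

The isolated tail points receive wider kernels than the points in the dense cluster. The cost is visible in the code: a second pass over the data, plus two extra choices (the pilot bandwidth and alpha) that must themselves be justified.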

The deeper systems point: KDE embodies a particular epistemic commitment — that the true density is smooth and that the observed data are a noisy sampling of it. This commitment is appropriate for some systems and inappropriate for others. The choice between KDE and binning is not a technical choice. It is a choice about what kind of world you believe you are observing.