Information Geometry

Information geometry is the study of families of probability distributions as differentiable manifolds, with the Fisher information matrix serving as the Riemannian metric. Originating in the work of C. R. Rao in the 1940s and extended by Shun'ichi Amari in the 1980s, it treats statistical inference less as a procedure than as motion on the curved space of possible distributions, where geodesics play the role of shortest paths between one distribution and another. The framework shows that estimation, model selection, and even neural network training can be read as geometric operations. In particular, the "natural" gradient in parameter space is not the Euclidean gradient but the gradient taken with respect to the Fisher-Rao metric, that is, the Euclidean gradient premultiplied by the inverse Fisher information matrix. This correction often accelerates convergence in neural network optimization, and it makes the update invariant, to first order, under reparameterization of the model, a coordinate-independence that conventional parameter-based treatments tend to obscure.
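
As a small illustration of the natural gradient described above, the following sketch fits a univariate Gaussian by preconditioning the ordinary gradient with the inverse Fisher information. It assumes the (mu, sigma) parameterization, in which the Fisher matrix is diagonal with entries 1/sigma^2 and 2/sigma^2; the function names, step size, and iteration count are illustrative choices, not taken from any particular library.

```python
import math

def avg_nll_grad(mu, sigma, data):
    """Euclidean gradient of the average negative log-likelihood of N(mu, sigma^2)."""
    n = len(data)
    mean = sum(data) / n
    second = sum((x - mu) ** 2 for x in data) / n  # mean squared deviation from mu
    return (mu - mean) / sigma**2, 1.0 / sigma - second / sigma**3

def fisher_diag(sigma):
    """Diagonal of the Fisher information matrix in (mu, sigma) coordinates."""
    return 1.0 / sigma**2, 2.0 / sigma**2

def natural_gradient_fit(data, mu=0.0, sigma=1.0, lr=0.5, steps=100):
    """Natural gradient descent: each gradient component is divided by the
    matching Fisher entry (the Fisher matrix is diagonal in this model)."""
    for _ in range(steps):
        g_mu, g_sigma = avg_nll_grad(mu, sigma, data)
        f_mu, f_sigma = fisher_diag(sigma)
        mu -= lr * g_mu / f_mu
        sigma -= lr * g_sigma / f_sigma
    return mu, sigma

mu_hat, sigma_hat = natural_gradient_fit([1.0, 2.0, 3.0, 4.0, 5.0])
print(mu_hat, sigma_hat)
```

In this model the Fisher preconditioning cancels the factor 1/sigma^2 in the gradient for mu, so the step toward the sample mean does not depend on the current scale estimate. A plain gradient step, by contrast, shrinks or blows up with sigma.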

In information geometry, an exponential family is a dually flat manifold: the exponential (e-) connection and the mixture (m-) connection are both flat, each with respect to its own coordinate system. The natural parameters are affine coordinates for the former, and the expectation parameters are affine coordinates for the latter. This duality between natural and expectation parameters is not merely a mathematical curiosity; it is the geometric expression of the maximum entropy principle and of the Legendre transform that links thermodynamics and inference.
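
The Legendre duality can be checked numerically on the simplest exponential family, the Bernoulli distribution. In the sketch below, theta is the natural parameter (the log-odds), eta is the expectation parameter (the success probability), psi is the log-partition function, and phi is its Legendre dual, the negative entropy. The function names are illustrative assumptions.

```python
import math

def log_partition(theta):
    """psi(theta) = log(1 + e^theta), the Bernoulli log-partition function."""
    return math.log1p(math.exp(theta))

def to_expectation(theta):
    """eta = d psi / d theta, the logistic sigmoid."""
    return 1.0 / (1.0 + math.exp(-theta))

def to_natural(eta):
    """theta = d phi / d eta, the logit (inverse of to_expectation)."""
    return math.log(eta / (1.0 - eta))

def neg_entropy(eta):
    """phi(eta) = eta log eta + (1 - eta) log(1 - eta), the Legendre dual of psi."""
    return eta * math.log(eta) + (1.0 - eta) * math.log(1.0 - eta)

def kl_bernoulli(p, q):
    """KL divergence from Bernoulli(p) to Bernoulli(q)."""
    return p * math.log(p / q) + (1.0 - p) * math.log((1.0 - p) / (1.0 - q))

def bregman_psi(theta_q, theta_p):
    """Bregman divergence of psi: psi(q) - psi(p) - eta_p * (theta_q - theta_p)."""
    eta_p = to_expectation(theta_p)
    return log_partition(theta_q) - log_partition(theta_p) - eta_p * (theta_q - theta_p)

theta = 0.3
eta = to_expectation(theta)
# Legendre identity: psi(theta) + phi(eta) - theta * eta vanishes for dual coordinates.
print(log_partition(theta) + neg_entropy(eta) - theta * eta)
# The KL divergence coincides with the Bregman divergence generated by psi.
print(kl_bernoulli(0.2, 0.7), bregman_psi(to_natural(0.7), to_natural(0.2)))
```

The two coordinate maps are the gradients of the two potentials, and the Kullback-Leibler divergence is the Bregman divergence generated by psi, which is the canonical divergence of a dually flat manifold.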