Fisher information
Fisher information is a measure of the amount of information that an observable random variable carries about an unknown parameter upon which its probability distribution depends. Named after Ronald Fisher, who introduced it in 1925, Fisher information occupies a peculiar position in the mathematical sciences: it is simultaneously a statistical quantity, a geometric object, and a physical observable. This triple identity is not a coincidence. It is evidence that information, geometry, and physics are not separate domains but aspects of a single structure.
Definition and Core Properties
For a probability distribution with density function f(x;θ) parameterized by θ, the Fisher information is defined as:
- I(θ) = E[(∂/∂θ log f(X;θ))²]
where the expectation is taken over the random variable X. The quantity inside the expectation is the score function — the derivative of the log-likelihood with respect to the parameter. Under standard regularity conditions this is equivalent to I(θ) = −E[∂²/∂θ² log f(X;θ)], so Fisher information measures the expected curvature of the log-likelihood surface at the true parameter value. Where the likelihood is sharply curved, small changes in the parameter produce large changes in the probability of the observed data; the parameter is tightly constrained. Where the likelihood is flat, the data are uninformative.
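To make the definition concrete, the variance of the score can be estimated by simulation and compared with a closed form. The Python sketch below uses a Bernoulli(θ) model, for which the information works out to I(θ) = 1/(θ(1−θ)); the parameter value and simulation size are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)
theta, n_samples = 0.3, 200_000   # illustrative parameter value and simulation size

# Score of a single Bernoulli observation: ∂/∂θ log f(x;θ) = x/θ − (1 − x)/(1 − θ)
x = rng.binomial(1, theta, size=n_samples)
score = x / theta - (1 - x) / (1 - theta)

# Fisher information is the variance of the score (the score has mean zero).
print(f"empirical I(θ): {score.var():.3f}")
print(f"analytic  I(θ): {1.0 / (theta * (1 - theta)):.3f}")   # 1/(θ(1−θ)) ≈ 4.762
```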
Fisher information is additive over independent observations: the information in N independent samples is N times the information in one sample. This linear scaling is the mathematical basis for the intuition that more data yields more precise estimates — though, critically, it is the information that grows linearly with N; the standard error of an efficient estimator shrinks only as 1/√N.
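Additivity can be checked the same way: the log-likelihood of an independent sample is a sum, so the joint score is the sum of the individual scores, and its variance is N times I(θ). A minimal sketch, reusing the Bernoulli model from above:

```python
import numpy as np

rng = np.random.default_rng(0)
theta, N, n_samples = 0.3, 50, 200_000

# The number of successes s is sufficient for θ, so the joint score of N draws
# can be written directly in terms of it: s/θ − (N − s)/(1 − θ).
s = rng.binomial(N, theta, size=n_samples)
joint_score = s / theta - (N - s) / (1 - theta)

print(f"empirical information in N samples: {joint_score.var():.1f}")
print(f"N × single-sample information:      {N / (theta * (1 - theta)):.1f}")
```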
The Cramér-Rao Bound and Efficiency
The most consequential theorem involving Fisher information is the Cramér-Rao bound. It states that the variance of any unbiased estimator of θ is bounded below by the reciprocal of the Fisher information:
- Var(θ̂) ≥ 1/I(θ)
No unbiased estimator can do better than this bound. Maximum likelihood estimators achieve it asymptotically under standard regularity conditions — as the sample size grows, they become efficient in the precise sense that their variance approaches the Cramér-Rao limit. This is why maximum likelihood estimation dominates applied statistics: it is the estimation method that extracts all available information from the data, at least in the limit of large samples.
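The asymptotic claim is easy to probe numerically. The sketch below uses an exponential model with rate λ (an arbitrary choice), whose per-observation information is I(λ) = 1/λ²; the maximum likelihood estimate is λ̂ = 1/x̄, and the Cramér-Rao bound for a sample of size n is λ²/n.

```python
import numpy as np

rng = np.random.default_rng(1)
lam = 2.0   # true exponential rate (illustrative)

for n in (10, 100, 1000):
    # 20,000 replications of the MLE λ̂ = 1 / sample mean, each from n observations
    samples = rng.exponential(scale=1 / lam, size=(20_000, n))
    mle = 1.0 / samples.mean(axis=1)
    crb = lam**2 / n                     # Cramér-Rao bound: 1 / (n · I(λ))
    print(f"n = {n:4d}   Var(λ̂) / CRB = {mle.var() / crb:.3f}")
```

For small n the ratio sits above one (the estimator is also slightly biased at finite n, so the unbiased-estimator bound does not strictly apply); as n grows the ratio approaches one, which is the asymptotic efficiency described above.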
The bound reveals a fundamental tradeoff: information constrains uncertainty. The more information the data carry about a parameter, the less room there is for the estimator to vary. This is not a theorem about computation or about human knowledge. It is a theorem about the geometry of probability distributions — a statement that the shape of the likelihood function places hard limits on what can be inferred, regardless of the ingenuity of the inference method.
Fisher Information as Geometry
Fisher's insight, deepened by C. R. Rao in the 1940s and radically extended in the 1980s, was that Fisher information defines a Riemannian metric on the space of probability distributions. This space — the statistical manifold — is a curved geometric object whose local geometry is determined by the Fisher information matrix. The distance between two nearby distributions, measured in this metric, quantifies how well data drawn from one can be discriminated from data drawn from the other: locally, the squared Fisher-Rao distance is proportional to the Kullback-Leibler divergence between the distributions.
This geometric perspective transforms statistics from a collection of techniques into a branch of differential geometry. The Jeffreys prior, a default prior in Bayesian statistics that is invariant under reparameterization, is simply the volume element of the Fisher-Rao metric. The Rao-Blackwell theorem, a cornerstone of estimation theory, is a statement about projections onto submanifolds. Even the Cramér-Rao bound becomes a statement about the curvature of geodesics.
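The Jeffreys-prior statement can be verified directly in one dimension. For the Bernoulli model, the volume element of the Fisher-Rao metric is √I(θ) = 1/√(θ(1−θ)), and normalizing it over (0, 1) gives the Beta(1/2, 1/2) density. A small check, assuming NumPy and SciPy are available:

```python
import numpy as np
from scipy.stats import beta

theta = np.linspace(0.01, 0.99, 99)

# Volume element of the Fisher-Rao metric for a Bernoulli(θ) model: √I(θ)
volume_element = np.sqrt(1.0 / (theta * (1 - theta)))

# Its normalizing constant over (0, 1) is π, giving the Beta(1/2, 1/2) density.
jeffreys_prior = volume_element / np.pi
print(np.allclose(jeffreys_prior, beta.pdf(theta, 0.5, 0.5)))   # True
```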
The field of information geometry, founded by Shun'ichi Amari, has extended this geometric framework to infinite-dimensional spaces, to quantum systems, and to machine learning. In each domain, the same structure appears: a manifold of models, a metric given by Fisher information, and a set of natural connections (dual connections) that encode the relationship between different parameterizations. The persistence of this structure across domains suggests that Fisher information is not merely a statistical tool but a universal measure of discriminability.
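One operational use of this structure in machine learning is Amari's natural gradient: the ordinary gradient is preconditioned by the inverse Fisher information matrix, so that an update step is measured in the metric of the statistical manifold rather than in raw parameter coordinates. Below is a minimal sketch for fitting a Gaussian by maximum likelihood; the model, step size, and function names are illustrative choices rather than a standard API.

```python
import numpy as np

rng = np.random.default_rng(2)
data = rng.normal(loc=3.0, scale=2.0, size=1_000)   # "unknown" distribution to fit

def grad_nll(mu, sigma, x):
    """Gradient of the average negative log-likelihood of N(mu, sigma^2)."""
    d_mu = -(x - mu).mean() / sigma**2
    d_sigma = 1.0 / sigma - ((x - mu) ** 2).mean() / sigma**3
    return np.array([d_mu, d_sigma])

def fisher_matrix(sigma):
    """Fisher information matrix of N(mu, sigma^2) in (mu, sigma) coordinates."""
    return np.diag([1.0 / sigma**2, 2.0 / sigma**2])

mu, sigma = 0.0, 1.0
for _ in range(200):
    # Natural gradient step: solve F(θ) · step = ∇L instead of using ∇L directly.
    step = np.linalg.solve(fisher_matrix(sigma), grad_nll(mu, sigma, data))
    mu, sigma = mu - 0.1 * step[0], sigma - 0.1 * step[1]

print(f"fitted μ ≈ {mu:.2f}, σ ≈ {sigma:.2f}")      # close to the true 3.0 and 2.0
```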
Information and Physical Reality
Fisher information acquires its deepest significance when placed in physical context. In 1998, B. Roy Frieden proposed that the laws of physics themselves can be derived from a principle of extremal Fisher information — that physical laws arise from extremizing the information extractable from a system's observable properties. While the proposal remains controversial, it connects to a broader pattern: the Fisher information of a quantum wave function is related to its kinetic energy, and the uncertainty principle can be derived as a Cramér-Rao inequality for position and momentum.
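For a real wave function ψ(x), the kinetic-energy connection can be made concrete: the Fisher information of the position density |ψ|² with respect to a location parameter equals 4∫(ψ′)² dx, which is 8m/ħ² times the mean kinetic energy. A numerical check for the harmonic-oscillator ground state, in units where ħ = m = 1 (the grid is an arbitrary discretization choice):

```python
import numpy as np

# Harmonic-oscillator ground state: a real, normalized wave function (ħ = m = ω = 1).
x = np.linspace(-10, 10, 20_001)
dx = x[1] - x[0]
psi = np.pi ** -0.25 * np.exp(-x**2 / 2)

density = psi**2
d_density = np.gradient(density, dx)
d_psi = np.gradient(psi, dx)

# Fisher information of the position density about a location parameter: ∫ (p')²/p dx
fisher_info = (d_density**2 / density).sum() * dx
# Mean kinetic energy for a real wave function with ħ = m = 1: (1/2) ∫ (ψ')² dx
kinetic = 0.5 * (d_psi**2).sum() * dx

print(f"Fisher information: {fisher_info:.3f}")    # ≈ 2.000
print(f"8 × kinetic energy: {8 * kinetic:.3f}")    # ≈ 2.000
```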
Landauer's principle, that erasing one bit of information dissipates at least kT ln 2 of energy, can be reformulated in Fisher-information terms: the information destroyed by erasure is precisely the Fisher information that the erased state carried about the parameters of its generating process. This unifies thermodynamic entropy, Shannon information, and Fisher information in a single framework where information is not an abstraction but a physical quantity with energy cost and geometric structure.
The triple identity of Fisher information — statistical, geometric, physical — is not a curiosity. It is a clue. It suggests that the divisions between statistics, geometry, and physics are institutional conveniences, not natural kinds. Where these divisions are respected, each field treats Fisher information as a tool specific to its domain. Where they are ignored, the same structure appears everywhere, as if the universe were written in a single language that only looks like different dialects when read through disciplinary lenses. The systems claim is not that Fisher information explains everything. It is that anything that resists description in Fisher-information terms is either not yet understood or not yet formalized enough to reveal its structure.