Law of large numbers

The law of large numbers is a mathematical theorem stating that the sample mean converges to the expected value as the number of independent trials increases, provided that the expected value exists and is finite. In its strong form, it states that the average of a sequence of independent, identically distributed random variables with finite mean converges almost surely to the population mean. In its weak form, it guarantees convergence in probability. The theorem underpins statistical inference, insurance pricing, quality control, and any domain where aggregate behavior is predicted from repeated observation.
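
The two forms can be written out formally. The following is a standard textbook statement, not taken from this article; the symbols (X_i for the observations, mu for the population mean, epsilon for the tolerance) are notation introduced here for illustration.

```latex
% Assumption: X_1, X_2, \dots are independent and identically distributed
% with finite mean \mu = \mathbb{E}[X_1].
\begin{align*}
  \bar{X}_n &= \frac{1}{n} \sum_{i=1}^{n} X_i
    && \text{(sample mean)} \\
  \lim_{n \to \infty} \Pr\bigl( \lvert \bar{X}_n - \mu \rvert > \varepsilon \bigr) &= 0
    \quad \text{for every } \varepsilon > 0
    && \text{(weak law: convergence in probability)} \\
  \Pr\Bigl( \lim_{n \to \infty} \bar{X}_n = \mu \Bigr) &= 1
    && \text{(strong law: almost sure convergence)}
\end{align*}
```

Both statements depend on the finite mean in the first line; neither requires a finite variance.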

The law is not merely a statistical convenience. It is a structural property of Mediocristan, Nassim Nicholas Taleb's term for the domain of thin-tailed distributions where no single observation can dominate the aggregate. In Mediocristan, the largest observation is a negligible fraction of the total. The average height of a thousand people is not determined by the tallest person. The average manufacturing defect rate is not determined by the worst defect. The law of large numbers is reliable here because no single component is large enough to move the aggregate.
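
The height example can be checked with a small simulation. This is a minimal sketch under stated assumptions: the heights are hypothetical draws from a Gaussian with mean 170 cm and standard deviation 10 cm, and `max_share` is a helper name introduced here, not part of any library.

```python
import random


def max_share(samples):
    """Fraction of the total contributed by the single largest observation."""
    return max(samples) / sum(samples)


rng = random.Random(0)  # fixed seed so the run is repeatable

# Hypothetical heights in cm (assumed Gaussian: mean 170, standard deviation 10).
heights = [rng.gauss(170.0, 10.0) for _ in range(1000)]

share = max_share(heights)
print(f"tallest person contributes {share:.4%} of the total height")
```

With a thousand people the tallest contributes only slightly more than the 0.1% that an exactly average person would, which is what "negligible fraction of the total" means in practice.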

The Boundary of the Law

The law of large numbers has a boundary, and the boundary is Extremistan. In Extremistan, the distribution is thick-tailed, typically a power law: the probability of exceeding a level x decays polynomially, roughly as x raised to the power of minus the tail exponent, rather than exponentially. In these domains, the sample mean does not settle on a stable value at any practical sample size, because the next extreme event can be larger than all previous events combined. The largest earthquake can release more energy than the sum of all previous earthquakes in the record. The best-selling book can outsell the entire rest of the catalog. The law does not fail here for lack of data. It fails because its precondition is not met: when the tail exponent is 1 or less, the distribution has no finite mean for the average to converge to. When the tail exponent lies between 1 and 2, the mean is finite but the variance is infinite; the law still holds formally, but convergence can be so slow that realistic samples are far too small to deliver it.
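
The contrast between the two domains can be sketched by tracking the sample mean as the sample grows. The distributions below are assumptions chosen for illustration: an exponential with mean 1 stands in for a thin-tailed process, and a Pareto with tail exponent 0.8, whose theoretical mean is infinite, stands in for Extremistan. `running_means` is a hypothetical helper.

```python
import random


def running_means(draw, checkpoints):
    """Return the sample mean at each checkpoint, drawing one observation at a time."""
    total, count, means = 0.0, 0, []
    for target in checkpoints:
        while count < target:
            total += draw()
            count += 1
        means.append(total / count)
    return means


checkpoints = [100, 1_000, 10_000, 100_000]
rng = random.Random(42)  # fixed seed so the run is repeatable

# Thin-tailed stand-in: exponential distribution with mean 1.
thin = running_means(lambda: rng.expovariate(1.0), checkpoints)

# Thick-tailed stand-in: Pareto with tail exponent 0.8 (infinite theoretical mean).
thick = running_means(lambda: rng.paretovariate(0.8), checkpoints)

for n, thin_mean, thick_mean in zip(checkpoints, thin, thick):
    print(f"n={n:>7}  exponential mean={thin_mean:8.3f}  Pareto(0.8) mean={thick_mean:12.3f}")
```

The exponential column should close in on 1 as n grows. The Pareto column has no finite value to close in on, so its entries tend to jump whenever a new extreme observation arrives, and collecting more data does not stabilize them.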

This boundary is not a mathematical curiosity. It is the reason that the tools of classical statistics (regression, hypothesis testing, confidence intervals) produce systematic errors when applied to financial markets, pandemics, technological disruptions, and geopolitical shocks. The models assume that more data produces better estimates. In Extremistan, more data often produces a larger estimate: a longer record is more likely to contain a rare extreme that dwarfs everything before it, so the sample mean drifts upward instead of settling. The ludic fallacy, in this context, is the error of assuming that the law of large numbers applies universally, when in fact it applies reliably only to the well-specified domains that resemble casinos.

The Design Implication

The design implication is jurisdictional: the law of large numbers is a local theorem, not a global one. It is reliable in domains where the tail is thin, the variance is finite, and the past is a representative sample of the future. It fails, or converges too slowly to be useful, in domains where the tail is thick, the mean or the variance is infinite, and the past is a poor guide to the future. A system designer must know which domain the system occupies. The error is not in using the law but in using it where it does not apply.
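
One way a designer might probe which domain a data set belongs to is to estimate its tail exponent. The sketch below uses the Hill estimator, a standard technique that this article does not itself prescribe. It assumes positive data with a power-law upper tail, and its result is sensitive to `k`, the number of largest observations used, so it is a diagnostic rather than a proof. The names `hill_tail_exponent` and `moment_regime` are hypothetical.

```python
import math
import random


def hill_tail_exponent(samples, k):
    """Hill estimate of the tail exponent from the k largest observations.

    Assumes all samples are positive and 0 < k < len(samples).
    """
    ordered = sorted(samples, reverse=True)
    threshold = ordered[k]  # the (k+1)-th largest value
    mean_log_excess = sum(math.log(x / threshold) for x in ordered[:k]) / k
    return 1.0 / mean_log_excess


def moment_regime(alpha):
    """Map a power-law tail exponent to the moments that exist."""
    if alpha <= 1.0:
        return "infinite mean"
    if alpha <= 2.0:
        return "finite mean, infinite variance"
    return "finite mean and variance"


rng = random.Random(7)  # fixed seed so the run is repeatable

# Synthetic Pareto data with known tail exponents, used only to exercise the estimator.
data_a = [rng.paretovariate(0.8) for _ in range(20_000)]
data_b = [rng.paretovariate(1.5) for _ in range(20_000)]

alpha_a = hill_tail_exponent(data_a, k=2_000)
alpha_b = hill_tail_exponent(data_b, k=2_000)

print(f"estimated exponent {alpha_a:.2f}: {moment_regime(alpha_a)}")
print(f"estimated exponent {alpha_b:.2f}: {moment_regime(alpha_b)}")
```

On real data the true exponent is unknown, so it is prudent to repeat the estimate over a range of `k` values and to treat an exponent near or below 2 as a warning that averages from the record may not be trustworthy.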

The law of large numbers is the mathematical guarantee of Mediocristan, and the mathematical warning of Extremistan. It does not promise that averages converge. It promises that averages converge only when the system is the kind of system that allows averages to converge. The persistent error of applied statistics is to treat this conditional promise as an unconditional one — to build models that assume the world is a casino, and then to blame the world when it behaves like history.
