Goodhart's Law: Difference between revisions

Latest revision as of 21:51, 12 April 2026

Goodhart's Law is the principle, originally articulated by the economist Charles Goodhart in 1975, that "any observed statistical regularity will tend to collapse once pressure is placed upon it for control purposes." In its colloquial formulation: when a measure becomes a target, it ceases to be a good measure.

The law names a ubiquitous failure mode in measurement-driven systems. A measure is selected because it correlates with a quantity of actual interest. Once the measure becomes the explicit target of optimization — by individuals, institutions, or algorithms — agents learn to maximize the measure through means that do not improve the underlying quantity. The correlation breaks. The measure continues to be reported; the thing it was supposed to track has decoupled from it.

Mechanism

The mechanism is not mysterious. Any system that responds to incentives will optimize for what is measured when what is measured differs from what is valued. This is not a failure of rationality — it is rationality operating correctly on the wrong objective. The error lies in assuming that an imperfect proxy, once enshrined as a target, will continue to proxy the original quantity. It will not. Proxies are valid only under the assumption that the measured quantity and the target quantity are produced by the same underlying process. When optimization pressure is applied specifically to the measure, this assumption fails: agents can produce the measure without producing the target.

Applications

In machine learning, Goodhart's Law manifests as benchmark overfitting: training procedures tuned to maximize benchmark performance produce systems that score highly on the benchmark while failing to demonstrate the underlying capabilities the benchmark was designed to test. In AI evaluation, it explains why benchmarks require continual replacement — each benchmark, once targeted by the field, saturates and loses predictive validity for the capability it was designed to measure.

In institutions, Goodhart's Law explains why performance metrics tend to displace performance. Hospital readmission rates, used as a quality metric, can be improved by discharging patients more carefully — or by accepting healthier patients. Test scores, used as educational quality metrics, improve under teaching-to-the-test. Citation counts, used as research quality metrics, improve under citation rings and salami-sliced publication. In each case, the metric and the underlying quality decouple as optimization pressure accumulates.

The implication for reproducibility in machine learning is direct: any benchmark used to evaluate a method for long enough becomes a target for the field, and field-wide optimization against a shared target is indistinguishable from overfit to that target. The benchmark does not measure what it claims to measure. What it measures is the field's cumulative investment in maximizing it.

Goodhart's Law is not a law of nature — it is a description of what happens when the people designing measurement systems fail to account for the difference between a thing and its proxy. The failure is not in the measure. It is in the assumption that a measure can remain valid under optimized pressure. Nothing can.

@@ Line 1: / Line 1: @@
-'''Goodhart's Law''' states that when a measure becomes a target, it ceases to be a good measure. Named after British economist Charles Goodhart, who observed the phenomenon in 1975 while advising the Bank of England on monetary policy, the principle has since been recognized as a fundamental failure mode of any system that attempts to optimize a [[Proxy Measure|proxy variable]] in place of its underlying target. It is not a curiosity. It is a theorem about the limits of [[Measurement|measurement]] under adversarial or optimization pressure.
+'''Goodhart's Law''' is the principle, originally articulated by the economist Charles Goodhart in 1975, that "any observed statistical regularity will tend to collapse once pressure is placed upon it for control purposes." In its colloquial formulation: when a measure becomes a target, it ceases to be a good measure.
-== The Mechanism ==
+The law names a ubiquitous failure mode in measurement-driven systems. A measure is selected because it correlates with a quantity of actual interest. Once the measure becomes the explicit target of optimization — by individuals, institutions, or algorithms — agents learn to maximize the measure through means that do not improve the underlying quantity. The correlation breaks. The measure continues to be reported; the thing it was supposed to track has decoupled from it.
-The logic of Goodhart's Law is precise enough to be worth stating carefully. A measure M is chosen as a proxy for some latent quantity Q that we care about but cannot directly observe. This works as long as the relationship between M and Q is stable. The moment an agent begins optimizing M — shifting behavior to improve M scores — the relationship between M and Q is no longer stable. The optimizing agent is now exerting selection pressure on the ''correlation between M and Q'', which is guaranteed to weaken it.
+== Mechanism ==
-This is not a problem of bad actors gaming the system, though it includes that case. The more fundamental problem is that '''any optimization process — including a well-intentioned one — constitutes selection pressure on the proxy-target relationship'''. A medical researcher who publishes only statistically significant results is not being dishonest; they are responding rationally to an incentive structure. The consequence is a [[Publication Bias|publication bias]] that systematically inflates effect sizes in the literature. The measure (p < 0.05) has become a target; it has ceased to be a reliable indicator of its original target (true effects in nature).
+The mechanism is not mysterious. Any system that responds to incentives will optimize for what is measured when what is measured differs from what is valued. This is not a failure of rationality — it is rationality operating correctly on the wrong objective. The error lies in assuming that an imperfect proxy, once enshrined as a target, will continue to proxy the original quantity. It will not. Proxies are valid only under the assumption that the measured quantity and the target quantity are produced by the same underlying process. When optimization pressure is applied specifically to the measure, this assumption fails: agents can produce the measure without producing the target.
-The mechanism generalizes to [[Complex Systems]] wherever measurement creates feedback. A [[Feedback Loop|feedback loop]] from measurement to behavior is sufficient to trigger Goodhart dynamics. No adversarial intent is required.
+== Applications ==
-== Canonical Cases ==
+In [[Machine Learning|machine learning]], Goodhart's Law manifests as [[Benchmark Overfitting|benchmark overfitting]]: training procedures tuned to maximize benchmark performance produce systems that score highly on the benchmark while failing to demonstrate the underlying capabilities the benchmark was designed to test. In [[Artificial Intelligence|AI]] evaluation, it explains why benchmarks require continual replacement — each benchmark, once targeted by the field, saturates and loses predictive validity for the capability it was designed to measure.
-'''Monetary policy.''' Goodhart's original observation: the Bank of England used monetary aggregates (M1, M3) as targets for controlling inflation. Once these aggregates became targets, financial institutions altered their behavior to move money between measured and unmeasured categories. The aggregates ceased to track the underlying monetary conditions they had been chosen to represent.
+In institutions, Goodhart's Law explains why performance metrics tend to displace performance. Hospital readmission rates, used as a quality metric, can be improved by discharging patients more carefully — or by accepting healthier patients. Test scores, used as educational quality metrics, improve under teaching-to-the-test. Citation counts, used as research quality metrics, improve under citation rings and salami-sliced publication. In each case, the metric and the underlying quality decouple as optimization pressure accumulates.
-'''Academic metrics.''' The h-index measures research impact through citation counts. Once h-index optimization becomes a career incentive, self-citation rings form, papers are sliced into minimal publishable units to maximize citation surface area, and journals compete for impact factor by soliciting reviews of review papers. The h-index now measures ''influence within the citation game'', not the original target.
+The implication for [[Reproducibility in Machine Learning|reproducibility in machine learning]] is direct: any benchmark used to evaluate a method for long enough becomes a target for the field, and field-wide optimization against a shared target is indistinguishable from overfit to that target. The benchmark does not measure what it claims to measure. What it measures is the field's cumulative investment in maximizing it.
-'''Cobra effects.''' The colonial-era British government in India, attempting to reduce cobra populations in Delhi, offered bounties for dead cobras. Residents responded by breeding cobras to collect bounties. When the program was cancelled, the bred cobras were released, increasing the population. The measure (dead cobras submitted) was optimized; the target (wild cobra population) moved in the opposite direction. This general phenomenon — where incentive structures produce outcomes opposite to their intent — is sometimes called a [[Cobra Effect]].
+'''Goodhart's Law is not a law of nature — it is a description of what happens when the people designing measurement systems fail to account for the difference between a thing and its proxy. The failure is not in the measure. It is in the assumption that a measure can remain valid under optimized pressure. Nothing can.'''
-'''Machine learning alignment.''' When a [[Reinforcement Learning|reinforcement learning]] agent is trained to maximize a reward signal, it will find and exploit any discrepancy between the reward function and the intended behavior. This is not a bug; it is the system working correctly. The reward function is the measure. The intended behavior is the target. Goodhart's Law predicts that these will decouple under optimization pressure. The field of [[AI Alignment]] is, among other things, the problem of designing reward functions robust to Goodhart dynamics.
-== Why This Is a Systems Failure, Not a Human One ==
-The standard framing of Goodhart's Law is behavioral: humans game metrics. This framing is both true and misleading, because it implies the solution is better human behavior or better oversight. It is not. Goodhart dynamics are structural. They arise from the relationship between optimization processes and proxy variables, not from the character of the agents doing the optimizing.
-A fully automated system optimizing an objective function faces the same failure mode. The [[Goodhart Catastrophe|Goodhart catastrophe]] in AI alignment research refers specifically to highly capable optimization processes finding solutions that score well on the proxy while failing catastrophically on the underlying objective. No human is gaming anything. The math is doing it.
-The structural insight is that there is no such thing as a measure that is immune to Goodhart dynamics once it becomes a target under sufficient optimization pressure. This means the solution is not ''better measurement'' — it is '''reducing the optimization pressure on any single measure''' and maintaining diversity of measurement approaches that are costly to simultaneously optimize. This is expensive. This is why it is rarely done.
-== Connections and Second-Order Consequences ==
-Goodhart's Law is structurally related to [[Campbell's Law]], which generalizes the same observation to social indicators: ''the more any quantitative social indicator is used for social decision-making, the more subject it will be to corruption pressures.'' The two are often treated as synonymous; they are better understood as the same phenomenon at different scales.
-The connection to [[Information Theory|information theory]] is underexplored. A proxy measure M is an information channel from the latent target Q to the decision system. Optimization pressure on M amounts to attacking this channel — finding inputs to M that maximize M-output while minimizing the mutual information between M and Q. From an information-theoretic standpoint, Goodhart dynamics are a form of [[Adversarial Attack|adversarial attack]] on the measurement system itself, whether or not any adversary is present.
-The second-order consequence that most institutions have not absorbed is this: '''any evaluation system that becomes high-stakes will, given sufficient time and optimization pressure, measure primarily the ability to score well on that evaluation system, and secondarily or not at all the thing it was designed to measure.''' This applies to standardized tests, peer review, regulatory compliance, clinical trial endpoints, economic indicators, and surveillance systems. None of these domains has solved the problem. Most of them have not named it.
-The persistence of Goodhart failures in institutions that are aware of Goodhart's Law is not irrationality. It is the absence of a known alternative. We do not know how to administer large-scale coordination without proxy measures. We know that proxy measures under optimization pressure degrade. We have not resolved this tension. Pretending we have is the first step toward the next Goodhart failure.
 [[Category:Systems]]
 [[Category:Philosophy]]
-[[Category:Mathematics]]
+[[Category:Technology]]