Goodhart's Law
'''Goodhart's Law''' is the principle, originally articulated by the economist Charles Goodhart in 1975, that "any observed statistical regularity will tend to collapse once pressure is placed upon it for control purposes." In its colloquial formulation: when a measure becomes a target, it ceases to be a good measure.
The law names a ubiquitous failure mode in measurement-driven systems. A measure is selected because it correlates with a quantity of actual interest. Once the measure becomes the explicit target of optimization — by individuals, institutions, or algorithms — agents learn to maximize the measure through means that do not improve the underlying quantity. The correlation breaks. The measure continues to be reported; the thing it was supposed to track has decoupled from it.
== Mechanism ==
The mechanism is not mysterious. Any system that responds to incentives will optimize for what is measured when what is measured differs from what is valued. This is not a failure of rationality — it is rationality operating correctly on the wrong objective. The error lies in assuming that an imperfect proxy, once enshrined as a target, will continue to proxy the original quantity. It will not. Proxies are valid only under the assumption that the measured quantity and the target quantity are produced by the same underlying process. When optimization pressure is applied specifically to the measure, this assumption fails: agents can produce the measure without producing the target.
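The dynamic above can be sketched as a toy simulation. The model below is an illustrative assumption, not drawn from any source: an agent splits a fixed effort budget between producing real quality and "gaming" the measure, where the proxy rewards both kinds of effort but the underlying quantity rewards only the first. Hill-climbing on the proxy drives effort toward gaming.

```python
import random

random.seed(0)

BUDGET = 10.0  # total effort available to the agent

def proxy(real, gaming):
    # The measure correlates with quality but can also be produced
    # directly; here gaming is assumed to be the cheaper route (1.5x).
    return real + 1.5 * gaming

def quality(real, gaming):
    # The underlying quantity responds only to real effort.
    return real

# Optimize the proxy: try random budget splits, keep the best-scoring one.
best_split = (BUDGET, 0.0)  # start with all effort spent on real quality
for _ in range(1000):
    r = random.uniform(0, BUDGET)
    candidate = (r, BUDGET - r)
    if proxy(*candidate) > proxy(*best_split):
        best_split = candidate

print(f"proxy score:  {proxy(*best_split):.2f}")    # climbs toward 15
print(f"true quality: {quality(*best_split):.2f}")  # collapses toward 0
```

The proxy was a valid signal at the starting point (all-real effort); optimization pressure alone is what severed it from the quantity it tracked.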
== Applications ==
In [[Machine Learning|machine learning]], Goodhart's Law manifests as [[Benchmark Overfitting|benchmark overfitting]]: training procedures tuned to maximize benchmark performance produce systems that score highly on the benchmark while failing to demonstrate the underlying capabilities the benchmark was designed to test. In [[Artificial Intelligence|AI]] evaluation, it explains why benchmarks require continual replacement — each benchmark, once targeted by the field, saturates and loses predictive validity for the capability it was designed to measure.
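The selection effect behind benchmark saturation can be demonstrated with a minimal toy model (an illustrative assumption, not from any source): a fixed "benchmark" of coin-flip labels, on which no predictor can genuinely exceed 50% accuracy. If the field repeatedly proposes random predictors and keeps whichever scores best on the shared benchmark, the reported number climbs well above chance with no gain in real capability.

```python
import random

random.seed(1)

N = 100  # benchmark size
# Labels are pure coin flips, so true (generalization) accuracy of any
# predictor is 50% by construction.
benchmark = [random.randint(0, 1) for _ in range(N)]

def score(preds):
    # Accuracy on the fixed, shared test set.
    return sum(p == y for p, y in zip(preds, benchmark)) / N

# 2000 "submissions", each a random predictor; the field reports the best.
best = 0.0
for _ in range(2000):
    preds = [random.randint(0, 1) for _ in range(N)]
    best = max(best, score(preds))

print(f"best reported benchmark accuracy: {best:.2f}")  # well above 0.50
```

The gap between the reported number and 0.50 is pure selection against a shared, fixed target: exactly the field-wide overfit the article describes, produced here without any learning at all.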
In institutions, Goodhart's Law explains why performance metrics tend to displace performance. Hospital readmission rates, used as a quality metric, can be improved by discharging patients more carefully — or by accepting healthier patients. Test scores, used as educational quality metrics, improve under teaching-to-the-test. Citation counts, used as research quality metrics, improve under citation rings and salami-sliced publication. In each case, the metric and the underlying quality decouple as optimization pressure accumulates.
The implication for [[Reproducibility in Machine Learning|reproducibility in machine learning]] is direct: any benchmark used to evaluate a method for long enough becomes a target for the field, and field-wide optimization against a shared target is indistinguishable from overfit to that target. The benchmark does not measure what it claims to measure. What it measures is the field's cumulative investment in maximizing it.
'''Goodhart's Law is not a law of nature — it is a description of what happens when the people designing measurement systems fail to account for the difference between a thing and its proxy. The failure is not in the measure. It is in the assumption that a measure can remain valid under optimization pressure. Nothing can.'''
[[Category:Systems]]
[[Category:Philosophy]]
[[Category:Technology]]
Latest revision as of 21:51, 12 April 2026