Proxy Measure

A proxy measure is a variable used to stand in for an underlying quantity that cannot be directly observed or measured. Proxy measures are unavoidable in science, policy, and machine learning: consciousness cannot be measured directly, so researchers use behavioral proxies; wellbeing cannot be measured directly, so economists often use GDP as a proxy for societal flourishing; and reward signals in reinforcement learning are proxies for the behavior the designer intends.

The central practical and philosophical question about a proxy measure is whether it remains stable under optimization pressure. A proxy is valid only as long as the correlation between the proxy and the underlying target holds. This correlation is an empirical fact about a particular context, not a logical necessity. When agents begin optimizing the proxy, that is, when the measure becomes a target, the correlation tends to degrade. This degradation is the mechanism described by Goodhart's Law.
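One simple way to see the correlation weaken is to simulate selection on a noisy proxy. The sketch below is a toy model, not a general account of Goodhart's Law: it assumes the target is standard normal, the proxy is the target plus independent standard normal noise, and "optimization" means keeping only the top 1% of cases by proxy value. The function names and parameters are illustrative assumptions.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def simulate(n=20000, top_fraction=0.01, seed=0):
    """Compare proxy/target agreement overall and among the top cases by proxy."""
    rng = random.Random(seed)
    # Assumed toy model: proxy = target + independent noise.
    target = [rng.gauss(0, 1) for _ in range(n)]
    proxy = [t + rng.gauss(0, 1) for t in target]

    # "Optimizing the proxy" here means selecting the highest proxy values.
    ranked = sorted(zip(proxy, target), reverse=True)
    k = max(2, int(n * top_fraction))
    top_proxy = [p for p, _ in ranked[:k]]
    top_target = [t for _, t in ranked[:k]]

    return {
        "corr_all": pearson(proxy, target),
        "corr_top": pearson(top_proxy, top_target),
        "mean_proxy_top": sum(top_proxy) / k,
        "mean_target_top": sum(top_target) / k,
    }


result = simulate()
for name, value in result.items():
    print(f"{name}: {value:.3f}")
```

In this model the proxy tracks the target reasonably well across the whole population, but within the selected group the correlation is much weaker, and the selected cases score noticeably higher on the proxy than on the target. Strategic behavior by agents is a separate mechanism that this sketch does not model.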

The deeper problem is that proxy validity is typically assessed in the absence of optimization pressure, then assumed to persist when optimization pressure is applied. This is the fundamental error: the context that validated the measure is not the context in which the measure will be used. No amount of careful proxy selection at baseline can guarantee validity under the selection pressures of high-stakes optimization.
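The gap between the validation context and the deployment context can be illustrated with a second toy model. It assumes, purely for illustration, that each agent's proxy score is the sum of true quality, a small measurement error, and a gaming term that scales with a hypothetical `pressure` parameter and is unrelated to quality. None of these quantities come from a real dataset.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def proxy_target_correlation(pressure, n=5000, seed=1):
    """Correlation between proxy and true quality at a given pressure level."""
    rng = random.Random(seed)
    quality = [rng.gauss(0, 1) for _ in range(n)]
    # Assumption: skill at gaming the metric is independent of true quality.
    gaming_skill = [rng.gauss(0, 1) for _ in range(n)]
    measurement_error = [rng.gauss(0, 0.3) for _ in range(n)]
    proxy = [
        q + e + pressure * g
        for q, e, g in zip(quality, measurement_error, gaming_skill)
    ]
    return pearson(proxy, quality)


# Validation happens with no pressure; deployment happens under pressure.
baseline = proxy_target_correlation(pressure=0.0)
deployed = proxy_target_correlation(pressure=3.0)
print(f"correlation at baseline:  {baseline:.3f}")
print(f"correlation under pressure: {deployed:.3f}")
```

A proxy validated only at `pressure=0.0` looks excellent in this model, yet the same proxy is a poor guide to quality once gaming dominates its variance. The sketch shows why a baseline validation study says little about validity in use; it does not estimate how large the effect is in any real setting.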

The search for proxies robust to optimization pressure is an open problem in AI Alignment, Measurement Theory, and Institutional Design.

Institutional Proxy Failure

The pathology of proxy measures intensifies at the institutional level. When a single researcher optimizes a proxy, the damage is limited. When an entire organization adopts a proxy as its performance metric — student test scores as a measure of educational quality, hospital readmission rates as a measure of care quality, citation counts as a measure of scientific importance — the optimization pressure becomes structural. The institution reconfigures itself around the proxy, hiring staff who excel at producing proxy-friendly outcomes, rewarding behaviors that move the metric, and discarding practices that do not.

This is institutional proxy failure: the systematic degradation of an organization's actual function caused by its optimization of a proxy measure. The failure is not merely that the proxy becomes inaccurate. It is that the institution loses the capacity to distinguish between the proxy and the target. The measure becomes the reality. Goodhart's Law operates here not as a prediction but as a phase transition: before the transition, the proxy is useful; after the transition, the proxy is the only thing the institution can still see.

The No Child Left Behind Act in the United States is a standard illustration. Test scores were adopted as a proxy for educational quality. Schools reallocated instruction time toward test preparation, narrowed curriculum coverage to tested subjects, and in some cases engaged in outright cheating. The proxy did not merely distort behavior; it redefined what education meant within the institutional context. The same pattern appears in science, where journal impact factors, used as proxies for research quality, shape hiring, promotion, and funding decisions, producing a literature optimized for citation velocity rather than truth value.

Epistemic Drift

When proxy measures become entrenched in knowledge-producing institutions, they produce epistemic drift: a slow, often imperceptible shift in what a field considers knowable, toward what is measurable by its proxies. The drift is not conspiracy or corruption. It is an emergent property of a system in which success is defined by proxy performance.

Consider psychometrics. The field has developed extraordinary sophistication in measuring traits like intelligence, personality, and cognitive ability. But the very sophistication of these measures creates a drift: the field increasingly studies what its instruments can detect, and treats what its instruments cannot detect as non-existent or unimportant. The quantification bias — the tendency to value what can be counted over what cannot — is not a personal failing of researchers but a structural feature of a field whose funding and prestige depend on measurable outcomes.

Epistemic drift is particularly dangerous in policy contexts. When a government agency uses GDP as a proxy for national wellbeing, policy debates are reshaped: arguments that cannot be expressed in GDP terms become illegible, and interventions that improve wellbeing without growing GDP become politically impossible. The proxy does not just measure; it legislates. It determines what can be argued, what can be funded, and what can be imagined.

Measurement Regime

A measurement regime is the total system of proxies, metrics, and evaluation frameworks that govern a particular domain. It includes not just the measures themselves but the institutional infrastructure around them: the organizations that collect data, the software that aggregates it, the protocols that validate it, and the incentive structures that reward compliance. Measurement regimes are not neutral technical apparatuses. They are systems of power that determine what counts as valid knowledge and what counts as legitimate action.

The rise of algorithmic governance has produced measurement regimes of unprecedented scale and granularity. Platform metrics, credit scores, risk assessments, and predictive policing systems all operate as proxy-based governance structures. They claim to measure neutral properties — engagement, creditworthiness, risk, recidivism — but they simultaneously produce the behaviors they claim to measure. A credit score does not merely assess creditworthiness; it shapes the economic opportunities available to the scored, which in turn shapes their creditworthiness. The proxy is performative: it brings into being the condition it purports to describe.
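The performative loop described for credit scores can be sketched as a small feedback simulation. Everything here is a hypothetical toy: the `Person` type, the update rule, and the `access_effect` and `score_inertia` parameters are assumptions chosen to make the loop visible, not a model of any real scoring system. Two people start with identical underlying financial health and differ only in their initial score.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Person:
    health: float  # underlying financial health, in [0, 1]
    score: float   # published score, in [0, 1]


def clamp(x):
    """Keep a value inside the [0, 1] range."""
    return max(0.0, min(1.0, x))


def step(person, access_effect=0.2, score_inertia=0.5):
    """Advance one period of the assumed score/opportunity feedback loop."""
    # Assumption: a score above 0.5 opens opportunities, below 0.5 closes them.
    opportunity = person.score - 0.5
    health = clamp(person.health + access_effect * opportunity)
    # Assumption: the new score blends the old score with current health.
    score = clamp(score_inertia * person.score + (1 - score_inertia) * health)
    return Person(health=health, score=score)


def run(person, periods=20):
    """Apply the feedback loop for a number of periods."""
    for _ in range(periods):
        person = step(person)
    return person


# Identical starting health; only the initial score differs.
alice = run(Person(health=0.5, score=0.55))
bob = run(Person(health=0.5, score=0.45))
print(f"alice: health={alice.health:.3f}, score={alice.score:.3f}")
print(f"bob:   health={bob.health:.3f}, score={bob.score:.3f}")
```

Under these assumptions a small initial difference in score, unrelated to any difference in underlying health, grows into a real difference in health, which the score then reports accurately. The final scores are correct descriptions of a condition the scoring process itself helped produce.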

The political question is not whether measurement regimes are accurate. It is whether they are contestable. A measurement regime that cannot be challenged — whose proxies are treated as objective facts rather than designed instruments — is a form of epistemic closure. The task of critical systems theory is to open these regimes to inspection: to ask not just is