External Validity

External validity is the degree to which a causal claim established in one context remains true when transported to another. It is not merely a statistical property of a study sample, nor a question of whether the participants resemble the target population. At its core, external validity is a claim about the stability of causal structure across contexts — about whether the mechanisms that produced an effect in a laboratory, a clinic, or a trial site still operate in the same way when the setting, population, or moment changes.

The term is most commonly associated with the randomized controlled trial (RCT), where it names the gap between the tightly controlled conditions of the experiment and the messy reality of clinical practice or policy implementation. An RCT may demonstrate with high internal validity that a drug reduces blood pressure in its trial population. But does the drug work in populations with comorbidities excluded from the trial? At different doses? In health systems with different adherence patterns? The RCT answers what happened in its own closed world. External validity asks whether that world's causal laws hold in ours.

Dimensions of Transport

External validity is not a single binary property. It fragments along at least three dimensions, each raising distinct structural questions:

Population Validity: Does the effect hold in a different group of people — different ages, genetics, comorbidities, or socioeconomic contexts? This is the dimension most commonly invoked, but it is often misunderstood as a demographic matching problem. The deeper question is whether the underlying causal mechanisms that mediate the effect operate identically across populations. A blood pressure drug may work through the renin-angiotensin system in one population but face countervailing mechanisms in another.

Temporal Validity: Does the effect hold at a different time? Contexts drift. An educational intervention that worked in 2010 may fail in 2025 because the background media environment has shifted. Temporal validity is often the most neglected dimension of external validity, yet it is arguably the most important for long-term policy. Institutions, norms, and technologies co-evolve with interventions, so the causal graph itself may be time-dependent.

Ecological Validity: Does the effect hold when the institutional and environmental context changes? This includes everything from the healthcare system in which a drug is deployed to the incentive structures surrounding a behavioral intervention. Ecological validity asks not about the individuals but about the system in which they are embedded. An intervention that works in a high-trust, well-resourced context may fail in a low-trust, resource-constrained one not because the biological mechanism changed, but because the social mechanism did.
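
The difference between demographic matching and mechanism stability, raised under population validity above, can be illustrated with a small standardization calculation. The sketch below is only an illustration under stated assumptions: the strata, the effect sizes, and the function name transported_effect are all hypothetical and are not drawn from any real trial. Reweighting stratum-specific effects by the target population's composition gives the right answer only if those stratum-specific effects are themselves the same in the target.

```python
import math


def transported_effect(stratum_effects, stratum_shares):
    """Average stratum-specific effects using a population's stratum shares.

    Both arguments are dicts keyed by stratum name. The shares must sum to 1.
    """
    if set(stratum_effects) != set(stratum_shares):
        raise ValueError("effects and shares must cover the same strata")
    if not math.isclose(sum(stratum_shares.values()), 1.0):
        raise ValueError("stratum shares must sum to 1")
    return sum(stratum_effects[s] * stratum_shares[s] for s in stratum_effects)


# Hypothetical effects on systolic blood pressure (mmHg), as estimated in a trial.
trial_effects = {"under_65": -10.0, "65_plus": -4.0}
trial_shares = {"under_65": 0.9, "65_plus": 0.1}    # trial skews young
target_shares = {"under_65": 0.5, "65_plus": 0.5}   # target population is older

# Effect in the trial's own population mix.
trial_average = transported_effect(trial_effects, trial_shares)

# Reweighted to the target mix, ASSUMING the stratum effects carry over unchanged.
reweighted = transported_effect(trial_effects, target_shares)

# Hypothetical truth in the target: a countervailing mechanism cancels the
# effect in the older stratum, so the stratum effect itself has changed.
target_effects = {"under_65": -10.0, "65_plus": 0.0}
target_truth = transported_effect(target_effects, target_shares)

print(f"trial average:   {trial_average:.1f}")
print(f"reweighted:      {reweighted:.1f}")
print(f"target truth:    {target_truth:.1f}")
```

In this made-up example, reweighting corrects for the difference in age composition but still misses the target's true effect, because the error comes from a changed mechanism rather than a changed mix of people.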

External Validity and Structural Assumptions

The philosophical core of external validity is a structural assumption: the presupposition that a system's causal architecture is stable enough to support transport. When we claim that an RCT result generalizes, we are not merely claiming that the effect size is similar elsewhere. We are claiming that the causal structure (the variables, their relationships, the absence of unmeasured confounders, the direction of effects) is preserved across the source and target contexts.

This is why external validity cannot be established by statistical methods alone. Meta-analysis can pool effect sizes across studies, but it cannot test whether the studies share the same causal structure. Heterogeneity in effect sizes may reflect true variation in the underlying mechanism, or it may reflect variation in unmeasured moderators that the meta-analyst does not know exist. The transportability framework developed by Judea Pearl, Elias Bareinboim, and others formalizes the conditions under which causal effects can be transferred across contexts, but its conclusions are only as sound as the structural assumptions it takes as input; the framework makes those assumptions explicit without verifying them.
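
To make the point about meta-analysis concrete, the following minimal sketch computes an inverse-variance pooled effect together with two standard heterogeneity summaries, Cochran's Q and the I² statistic. The function name and the input numbers are illustrative assumptions, not data from any study. The statistics can signal that effects differ across studies more than sampling error alone would suggest, but they say nothing about which structural difference is responsible.

```python
def heterogeneity(effects, std_errors):
    """Return (pooled_effect, cochran_q, i_squared) for a fixed-effect pooling.

    effects:    per-study effect estimates
    std_errors: per-study standard errors (all must be positive)
    i_squared is returned as a fraction between 0 and 1.
    """
    if len(effects) != len(std_errors) or len(effects) < 2:
        raise ValueError("need at least two studies with matching inputs")
    if any(se <= 0 for se in std_errors):
        raise ValueError("standard errors must be positive")

    # Inverse-variance weights: more precise studies count for more.
    weights = [1.0 / se**2 for se in std_errors]
    pooled = sum(w * y for w, y in zip(weights, effects)) / sum(weights)

    # Cochran's Q: weighted squared deviations from the pooled estimate.
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, effects))

    # I^2: share of total variation beyond what chance alone would produce.
    df = len(effects) - 1
    i_squared = max(0.0, (q - df) / q) if q > 0 else 0.0
    return pooled, q, i_squared


# Hypothetical effect estimates from three studies of the same intervention.
pooled, q, i2 = heterogeneity([-12.0, -8.0, -2.0], [1.0, 1.0, 2.0])
print(f"pooled={pooled:.2f}  Q={q:.2f}  I2={i2:.0%}")
```

A high I² in such a calculation is a prompt to look for moderators and differing mechanisms, not an explanation of them.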

The Pragmatics of Generalization

Given that perfect external validity is unattainable, the question becomes: how do we reason under structural uncertainty? One approach is mechanism-based generalization: instead of asking whether the effect replicates, ask whether the mechanism replicates. If we have a plausible account of how an intervention produces its effect, we can reason about whether that mechanism would be active in the target context. This shifts the burden from statistical extrapolation to theoretical modeling — from what happened to why it happened.

Another approach is adaptive implementation: treat the target context as a new experiment. Roll out the intervention with monitoring, measure intermediate outcomes, and be prepared to modify or abandon the intervention if the mechanism fails to engage. This is not a rejection of external validity but an acceptance of its limits. We generalize not by assuming transportability but by testing it in vivo.
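
The monitoring logic of adaptive implementation can be sketched as a simple decision rule. Everything in the sketch below is a hypothetical illustration: the Checkpoint type, the review_rollout function, the choice of an intermediate marker such as an adherence rate, and the thresholds are assumptions made for the example, not an established protocol. The idea is only that an intermediate outcome tied to the presumed mechanism is checked at each stage, and repeated shortfalls trigger modification and then abandonment.

```python
from dataclasses import dataclass


@dataclass
class Checkpoint:
    period: str
    mechanism_marker: float  # observed intermediate outcome, e.g. adherence rate


def review_rollout(checkpoints, expected, tolerance, max_misses=2):
    """Return a list of (period, decision) pairs.

    A checkpoint "misses" when its marker falls below expected - tolerance.
    The first miss in a run yields "modify"; reaching max_misses consecutive
    misses yields "abandon" and stops the review.
    """
    decisions = []
    consecutive_misses = 0
    for cp in checkpoints:
        if cp.mechanism_marker >= expected - tolerance:
            consecutive_misses = 0
            decisions.append((cp.period, "continue"))
            continue
        consecutive_misses += 1
        if consecutive_misses >= max_misses:
            decisions.append((cp.period, "abandon"))
            break
        decisions.append((cp.period, "modify"))
    return decisions


# Hypothetical rollout: the marker seen in the source context was about 0.8.
rollout = [
    Checkpoint("quarter_1", 0.80),
    Checkpoint("quarter_2", 0.60),
    Checkpoint("quarter_3", 0.50),
]
for period, decision in review_rollout(rollout, expected=0.8, tolerance=0.1):
    print(period, decision)
```

Real monitoring plans would need to account for sampling noise and for the cost of stopping too early; the fixed thresholds here stand in for whatever criteria a given program justifies.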

The persistent failure to distinguish internal validity from external validity — to celebrate a well-run RCT as if its conclusions automatically apply beyond the trial — represents one of the most costly epistemic errors in evidence-based policy. An internally valid study that is externally void is a monument to precision over relevance.