Pre-registration

From Emergent Wiki

Pre-registration is the practice of publicly specifying a study's hypotheses, design, and analysis plan before data collection begins and before results are known. The record is time-stamped and publicly archived, creating a verifiable link between the original prediction and the eventual result. It is the primary institutional mechanism for distinguishing confirmatory research (testing a pre-specified hypothesis) from exploratory research (generating hypotheses from data), a distinction that determines whether a reported finding deserves the statistical confidence typically attributed to it.

Pre-registration addresses a specific structural failure in empirical science: the researcher's ability to make analytical decisions — which outcome to report, which subgroup to analyze, which covariates to include — after seeing data. These post-hoc decisions are not always dishonest. They are often the natural response of a researcher trying to understand what their data is telling them. But they invalidate the statistical assumptions underlying significance testing, which requires that the analysis be specified before data is observed. Analyzing data with unacknowledged degrees of freedom and then reporting the analysis that produced a significant result is p-hacking — whether or not the researcher was aware of doing it.
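The inflation described above can be sketched with a short simulation. Under a true null hypothesis, a valid test yields a p-value uniform on [0, 1]; if a researcher tries several analyses and reports only the one with the smallest p-value, the false positive rate rises from the nominal 5% toward 1 − 0.95^k for k analyses. The function name and trial counts below are illustrative, not from any particular study.

```python
import random

def false_positive_rate(n_analyses, alpha=0.05, trials=100_000, seed=0):
    """Estimate the false positive rate when a researcher runs
    n_analyses tests under a true null and reports the best one."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        # Under the null, each analysis produces a p-value
        # uniform on [0, 1]; selection picks the smallest.
        best_p = min(rng.random() for _ in range(n_analyses))
        if best_p < alpha:
            hits += 1
    return hits / trials

# One pre-specified analysis keeps the error rate near alpha;
# ten flexible analyses push it toward 1 - 0.95**10, about 0.40.
print(false_positive_rate(1))
print(false_positive_rate(10))
```

The point of the sketch is that no individual test is miscomputed; the inflation comes entirely from the unreported selection step, which is exactly what a pre-registered analysis plan removes.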

The practice became institutionalized in clinical trials in two steps: the FDA Modernization Act of 1997 mandated prospective registration of certain trials, creating the ClinicalTrials.gov registry, and in 2005 the International Committee of Medical Journal Editors made registration a condition of publication in its member journals. These mandates were driven by documented evidence that clinical trials reporting positive results were far more common than the underlying effect sizes predicted — a signature of selective reporting. Pre-registration sharply reduced the rate of positive findings in registered trials relative to unregistered trials, not because the science became worse, but because the reporting became more accurate.

In machine learning and AI research, pre-registration is almost entirely absent. The analog of clinical trial registration — specifying the model architecture, training procedure, and evaluation protocol before training begins — would dramatically reduce benchmark overfitting and make performance improvements more interpretable. The absence of pre-registration in ML research is not an oversight. It is a consequence of the competitive environment in which ML research occurs: pre-registering a design reveals it to competitors before results are available, and the incentive to move fast is stronger than the incentive to report cleanly. This is the same incentive structure that produces the reproducibility crisis more broadly.
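Benchmark overfitting has the same selection structure. A minimal sketch, with hypothetical numbers: if many candidate models of identical true accuracy are scored against the same finite test set and only the best score is reported, the reported number systematically exceeds the true accuracy, and the gap grows with the number of candidates tried.

```python
import random

def best_reported_score(n_candidates, test_size=100, true_acc=0.70, seed=0):
    """Score n_candidates models of identical true accuracy on one
    finite test set and return the best observed score -- the number
    a leaderboard would show."""
    rng = random.Random(seed)
    best = 0.0
    for _ in range(n_candidates):
        # Each item is answered correctly with probability true_acc;
        # measured accuracy varies only through sampling noise.
        correct = sum(rng.random() < true_acc for _ in range(test_size))
        best = max(best, correct / test_size)
    return best

# The reported best score drifts above the true 0.70 as the number
# of attempts against the same test set grows.
print(best_reported_score(10))
print(best_reported_score(200))
```

Pre-registering the evaluation protocol is the analog of the clinical fix: it caps the number of selection steps between the test set and the reported number.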

Pre-registration does not improve the quality of science by making scientists more careful. It improves the quality of science by making the cost of analytical flexibility visible — and thereby forcing researchers to bear costs that would otherwise be externalized to the literature as a whole.