Statistics
Statistics is the discipline concerned with the collection, analysis, and interpretation of data — but to define it this way is already to concede a philosophical dispute that the field has never resolved. Statistics is not merely a set of techniques. It is a theory of how evidence relates to belief, and the foundational disagreement about what probability means — whether it is a feature of the world, or a state of mind, or a frequency in the long run — is not a technical question. It is a question in philosophy of science and epistemology that the statistical literature has spent a century treating as settled when it is not.
The Foundational Dispute: Frequentism vs Bayesianism
The core division in statistics is between two schools that disagree about the meaning of probability itself.
Frequentist statistics holds that probability is the limiting relative frequency of an event in an infinite sequence of repeated trials. On this view, a probability is a property of the world — specifically, of a repeating process. The statement 'the probability of heads is 0.5' means that in an infinite sequence of fair coin flips, the relative frequency of heads converges to one half. This framework, developed by Ronald Fisher and, in a rival formulation, by Jerzy Neyman and Egon Pearson, produces the apparatus of null hypothesis significance testing, confidence intervals, and p-values that dominates the empirical sciences. Its virtue is that it makes probability a matter of observable fact. Its vice is that it cannot assign probabilities to single events, to hypotheses, or to anything that is not the outcome of a repeatable experiment. The frequentist cannot say what probability to assign to the claim that the universe is spatially flat — it is not the outcome of repeated trials.
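A minimal simulation makes the long-run reading concrete (the flip counts and the random seed below are arbitrary choices for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulate fair coin flips and track the running relative frequency of heads.
# On the frequentist reading, "P(heads) = 0.5" is a claim about where this
# running frequency settles as the number of flips grows without bound.
flips = rng.integers(0, 2, size=100_000)  # 1 = heads, 0 = tails
running_freq = np.cumsum(flips) / np.arange(1, len(flips) + 1)

for n in (10, 100, 1_000, 10_000, 100_000):
    print(f"after {n:>7} flips: relative frequency of heads = {running_freq[n - 1]:.4f}")
```

Nothing in this simulation assigns a probability to a one-off proposition; the number 0.5 describes only the repeating process.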
Bayesian statistics holds that probability is a degree of belief — a measure of epistemic uncertainty in a proposition, updated by evidence via Bayes' theorem. On this view, the statement 'the probability of heads is 0.5' is a report about an agent's state of knowledge, not a fact about the world. The Bayesian can assign probabilities to unique events, scientific hypotheses, and parameters — but at the cost of requiring a prior probability distribution whose specification is subjective and whose choice determines the conclusions. The Bayesian machinery is coherent in a formal sense: if you start with a prior and update rationally on evidence, your beliefs will be internally consistent. Whether they will be correct depends entirely on whether your prior was well-calibrated — a question that Bayesian theory cannot answer from within.
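A sketch of the updating step, using the Beta-Binomial model as an illustrative choice (the uniform prior and the data below are hypothetical): a Beta(a, b) prior over a coin's heads probability, combined with k heads in n flips, yields a Beta(a + k, b + n - k) posterior.

```python
from scipy.stats import beta

# Illustrative Bayesian update for a coin's heads probability theta.
# Prior: Beta(a, b); data: k heads in n flips; posterior: Beta(a + k, b + n - k).
a_prior, b_prior = 1.0, 1.0  # uniform prior, an arbitrary choice
k, n = 7, 10                 # hypothetical data

a_post, b_post = a_prior + k, b_prior + (n - k)
posterior = beta(a_post, b_post)

print(f"posterior mean of theta: {posterior.mean():.3f}")
print(f"95% credible interval:   {posterior.ppf(0.025):.3f} to {posterior.ppf(0.975):.3f}")
```

Every number printed here is conditional on the Beta(1, 1) starting point; a different prior would move both the mean and the interval.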
The Structure of Statistical Inference
Beneath the frequentist-Bayesian dispute lies a structure both schools share, and that structure reveals what statistics is essentially doing: solving the inverse problem of inference.
Data are generated by some process. The process has parameters — unknowns that determine which data are likely. Statistical inference runs backward from observed data to inferences about parameters. This is an underdetermined problem: many parameter values could have generated the same data, and the question is which parameter values the data provide evidence for. Both frequentism and Bayesianism are proposed solutions to this underdetermination, and both make choices that are not forced by logic.
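A small illustration of that underdetermination, using a hypothetical binomial experiment (the counts are invented): several quite different parameter values assign substantial probability to the same observed data.

```python
from scipy.stats import binom

# Hypothetical data: 7 heads in 10 flips. The likelihood function reports how
# probable those data are under each candidate value of the heads probability.
k, n = 7, 10
for theta in (0.4, 0.5, 0.6, 0.7, 0.8):
    print(f"P({k} heads in {n} flips | theta = {theta:.1f}) = {binom.pmf(k, n, theta):.3f}")
```

The data do not single out one value of theta; they merely favor some values over others, and the two schools cash out that notion of favoring in different ways.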
The frequentist solution is to ask: over all possible datasets this experiment could have produced, how often would this estimator give the right answer? This is the source of the frequentist criteria of consistency, efficiency, and unbiasedness, all of which are defined over repeated sampling. It evaluates estimators by their long-run performance, not their performance on the particular dataset at hand.
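A coverage simulation, under assumed normal data with known variance (the sample size, repetition count, and seed are arbitrary), shows what the long-run criterion actually evaluates.

```python
import numpy as np

rng = np.random.default_rng(1)

true_mu, sigma, n, n_experiments = 0.0, 1.0, 30, 10_000
covered = 0

# Repeat the whole experiment many times and count how often the nominal 95%
# interval for the mean contains the true value. The frequentist guarantee is
# about this long-run proportion, not about any single interval.
for _ in range(n_experiments):
    sample = rng.normal(true_mu, sigma, size=n)
    half_width = 1.96 * sigma / np.sqrt(n)  # known-sigma interval, kept simple
    lo, hi = sample.mean() - half_width, sample.mean() + half_width
    covered += (lo <= true_mu <= hi)

print(f"empirical coverage: {covered / n_experiments:.3f}  (nominal: 0.950)")
```

The 95% describes the procedure's batting average across hypothetical repetitions; it is not a probability that any particular computed interval contains the truth.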
The Bayesian solution is to ask: given the data I actually observed and my prior beliefs, what should my posterior beliefs be? This is a coherence criterion: it ensures that an agent's beliefs do not violate the axioms of probability. It says nothing about whether those beliefs are accurate.
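A short contrast makes the gap between coherence and accuracy visible (both priors and the data are hypothetical): two agents who update correctly by Bayes' rule from different priors reach different, equally coherent conclusions from the same evidence.

```python
from scipy.stats import beta

k, n = 7, 10  # the same hypothetical data for both agents

# Two coherent agents, two priors over the coin's heads probability theta.
priors = {
    "agent A, uniform prior Beta(1, 1)":  (1.0, 1.0),
    "agent B, prior concentrated at 0.5": (50.0, 50.0),
}

for name, (a, b) in priors.items():
    post = beta(a + k, b + (n - k))  # exact Bayes update in the Beta-Binomial model
    print(f"{name}: posterior mean = {post.mean():.3f}")
```

Both posteriors satisfy the probability axioms; which one is closer to the truth depends on the coin, and nothing inside the formalism adjudicates that.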
Neither solution answers the question a scientist actually wants answered: given this specific dataset, what should I conclude? The frequentist answer — how your procedure performs on average — is an answer to a different question. The Bayesian answer — what a hypothetical prior implies about posteriors — is also an answer to a different question. The question the scientist wants answered is not addressed by either framework as standardly formulated.
The Replication Crisis as Foundational Failure
The replication crisis — the discovery, beginning in the 2010s, that a substantial fraction of published findings in psychology, medicine, and social science do not replicate — is not primarily a statistical crisis. It is a foundational crisis about what statistics was supposed to do.
The p-value threshold of 0.05 was not a discovery. It was a convention — Fisher's rule of thumb — that was institutionalized as a criterion of publishability and treated as a criterion of truth. The distinction between these two uses collapsed in practice: a result with p < 0.05 came to mean this finding is real, not data this extreme would be surprising if the null hypothesis were true. This conflation is a conceptual error, not a mathematical one. It is the result of using a frequentist tool — which answers questions about long-run procedures — to answer a question about individual experiments.
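A simulation separates the two readings (the base rate of true effects, the effect size, the group sizes, and the number of studies below are all invented for illustration): even when every test is computed correctly, the fraction of p < 0.05 results that correspond to real effects depends on how common real effects are and on statistical power, which is exactly the information a p-value does not carry.

```python
import numpy as np
from scipy.stats import ttest_ind

rng = np.random.default_rng(2)

n_studies, n_per_group = 2_000, 20
base_rate, effect_size = 0.10, 0.5  # hypothetical: 10% of studied effects are real

true_positive = false_positive = 0
for _ in range(n_studies):
    effect_is_real = rng.random() < base_rate
    shift = effect_size if effect_is_real else 0.0
    group_a = rng.normal(0.0, 1.0, n_per_group)
    group_b = rng.normal(shift, 1.0, n_per_group)
    if ttest_ind(group_a, group_b).pvalue < 0.05:
        if effect_is_real:
            true_positive += 1
        else:
            false_positive += 1

significant = true_positive + false_positive
print(f"significant results: {significant}")
print(f"fraction of significant results with no real effect: {false_positive / significant:.2f}")
```

Under these invented conditions, a large share of the 'significant' findings can be false even though the test behaves exactly as advertised under the null.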
The Bayesian remedy — replace p-values with Bayes factors, estimate posterior probabilities, report credible intervals — addresses some of the conceptual confusion but introduces new problems. Bayes factors depend on priors. Credible intervals are only meaningful relative to a prior. In applied settings, the choice of prior is frequently arbitrary, and the appearance of rigor conceals the same subjectivity that the p-value was supposed to remove.
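A sketch of the prior sensitivity, using a point null against a Beta prior on the heads probability (the data and the particular priors are illustrative): the Bayes factor comparing 'the coin is fair' with 'the heads probability is unknown' moves substantially as the prior under the alternative is widened or narrowed, with the data held fixed.

```python
import numpy as np
from scipy.special import betaln, gammaln
from scipy.stats import binom

k, n = 65, 100  # hypothetical data: 65 heads in 100 flips

def log_marginal_h1(a, b):
    # log P(data | H1) with theta ~ Beta(a, b):
    # C(n, k) * B(a + k, b + n - k) / B(a, b), computed in log space.
    log_choose = gammaln(n + 1) - gammaln(k + 1) - gammaln(n - k + 1)
    return log_choose + betaln(a + k, b + n - k) - betaln(a, b)

log_p_h0 = binom.logpmf(k, n, 0.5)  # point null hypothesis: theta = 0.5

# Same data, different priors under the alternative: the Bayes factor moves.
for a in (0.5, 1.0, 5.0, 50.0):
    bf01 = np.exp(log_p_h0 - log_marginal_h1(a, a))
    print(f"prior Beta({a}, {a}) under H1 -> BF(H0 : H1) = {bf01:.3f}")
```

None of these numbers is wrong in isolation; each is the correct Bayes factor for its prior, which is precisely why reporting one of them as 'the' evidence conceals a modeling choice.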
The deeper lesson of the replication crisis is that statistics cannot substitute for scientific realism about effect sizes, mechanisms, and theoretical plausibility. A statistical method that can be made to yield evidence for anything, given appropriate data torture, is not providing evidence. It is providing the appearance of evidence. The foundational problem is not which statistical framework to use, but whether statistical frameworks — in the absence of strong theory — can do the epistemic work that empirical science demands of them.
The persistent failure to resolve the frequentist-Bayesian dispute, combined with the replication crisis's demonstration that standard practice has produced systematic error, suggests that statistics as currently constituted is a discipline that has not yet earned the epistemic authority it routinely claims. The field requires not better methods but a clearer account of what it is doing and what it can honestly promise.