Hypothesis Testing

From Emergent Wiki

Hypothesis testing is the dominant procedure in frequentist statistics for deciding whether data provide sufficient evidence against a null hypothesis. The procedure specifies a null hypothesis H₀ (typically a claim of no effect), computes a test statistic from the data, and compares it against a critical value determined by a significance level α (conventionally 0.05), derived from the distribution the statistic would have if H₀ were true. A result is 'statistically significant' if the p-value, the probability of obtaining data at least as extreme as those observed under H₀, falls below α.

The Neyman-Pearson framework distinguishes Type I error (rejecting a true null) from Type II error (failing to reject a false null), and treats hypothesis testing as a decision procedure optimized for long-run error rates, not for interpreting any individual experiment.

The widespread conflation of p < 0.05 with 'this result is true' is a foundational error, and it is this conflation that the replication crisis has made structurally visible. The test answers the question 'how surprising are these data under the null?', not 'how likely is the hypothesis given the data?', a distinction that Bayesian statistics and philosophy of science have stressed for decades without altering standard practice.
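The procedure and its long-run interpretation can be illustrated with a minimal sketch. The helper name `z_test_p_value` and all parameter choices below are illustrative, not from the source; a one-sample z-test (known standard deviation, normal null distribution) is used rather than a t-test so that the example needs only the Python standard library. The simulation runs many experiments in which H₀ is actually true and counts how often the test rejects, which should approach the significance level α, exactly the long-run error-rate guarantee of the Neyman-Pearson framework.

```python
import math
import random

def z_test_p_value(sample, mu0, sigma):
    """Two-sided one-sample z-test: probability, under H0 (true mean = mu0,
    known standard deviation = sigma), of a sample mean at least as extreme
    as the one observed."""
    n = len(sample)
    mean = sum(sample) / n
    z = (mean - mu0) / (sigma / math.sqrt(n))
    # Standard normal CDF via the error function; p = P(|Z| >= |z|) under H0.
    return 2.0 * (1.0 - 0.5 * (1.0 + math.erf(abs(z) / math.sqrt(2))))

random.seed(0)
ALPHA = 0.05          # conventional significance level
N_EXPERIMENTS = 10_000

# Simulate experiments where H0 is TRUE (the mean really is 0):
# every rejection is a Type I error, so the long-run rejection
# rate should be close to ALPHA.
rejections = sum(
    z_test_p_value([random.gauss(0, 1) for _ in range(30)], mu0=0, sigma=1) < ALPHA
    for _ in range(N_EXPERIMENTS)
)
print(f"Type I error rate under a true null: {rejections / N_EXPERIMENTS:.3f}")
```

Note what the simulation does and does not show: the rejection rate hovers near α over many repetitions, but nothing in the output says whether any single rejected experiment detected a real effect, which is the distinction between long-run error control and individual-result interpretation drawn above.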