Statistical Power

Statistical power is the probability that a statistical test will correctly reject a false null hypothesis, that is, the probability of detecting an effect that is actually present. Formally, power = 1 − β, where β is the Type II error rate (the probability of a false negative). The Neyman-Pearson framework introduced the Type II error alongside the Type I error: it fixes the Type I error rate α in advance and, among tests with that α, prefers the one with the greatest power (the smallest β).
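As a worked example, assume the simplest case: a two-sided one-sample z-test of H_0: μ = μ_0 on n normally distributed observations with known standard deviation σ, and write δ = (μ_1 − μ_0)/σ for the standardized effect under a specific alternative μ_1. Under that alternative the test statistic is normal with mean δ√n and unit variance, so β and power follow directly (Φ is the standard normal CDF and z_{1−α/2} the two-sided critical value):

```latex
Z = \frac{\bar{X} - \mu_0}{\sigma / \sqrt{n}} \sim \mathcal{N}\!\left(\delta\sqrt{n},\, 1\right) \quad \text{under } \mu = \mu_1,

\beta = P\left(|Z| < z_{1-\alpha/2}\right) = \Phi\!\left(z_{1-\alpha/2} - \delta\sqrt{n}\right) - \Phi\!\left(-z_{1-\alpha/2} - \delta\sqrt{n}\right),

1 - \beta = \Phi\!\left(\delta\sqrt{n} - z_{1-\alpha/2}\right) + \Phi\!\left(-\delta\sqrt{n} - z_{1-\alpha/2}\right).
```

For instance, with δ = 0.5, n = 32 and α = 0.05, δ√n ≈ 2.83, so power ≈ Φ(0.87) ≈ 0.81. The second term is negligible here; it is the probability of rejecting with an estimate of the wrong sign.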

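Power depends on four factors: effect size (relative to the variability of the data), sample size, the significance threshold (α), and the test itself, including whether it is one- or two-sided and how sensitive it is to the specific alternative hypothesis. Small effects require large samples to detect; large effects may be detectable with modest samples. The relationship is nonlinear. Because the standard error of an estimate shrinks only with the square root of the sample size, detecting an effect half as large requires roughly four times the sample. And because power cannot exceed 1, doubling the sample size does not double the power, and each further increase buys less than the one before.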
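The nonlinearity can be seen in a short sketch. It assumes a two-sided, two-sample z-test with normal outcomes and a known common standard deviation (a normal approximation to the usual t-test), with d as the standardized effect size. `two_sample_power` is a hypothetical helper written for illustration, not a library function; it uses only the standard library's `statistics.NormalDist`.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal distribution N(0, 1)


def two_sample_power(d: float, n_per_group: int, alpha: float = 0.05) -> float:
    """Hypothetical helper: approximate power of a two-sided, two-sample z-test.

    Assumes normal outcomes with a known common standard deviation, so d is the
    standardized mean difference; a t-test would give slightly lower power.
    """
    z_crit = STD_NORMAL.inv_cdf(1 - alpha / 2)
    # Mean of the z statistic under the alternative: d divided by the standard
    # error of the difference in means, sqrt(2 / n), in standard-deviation units.
    shift = d * sqrt(n_per_group / 2)
    # Probability of landing in either rejection region.
    return STD_NORMAL.cdf(shift - z_crit) + STD_NORMAL.cdf(-shift - z_crit)


for n in (10, 20, 40, 80, 160, 320):
    print(f"d = 0.5, n = {n:>3} per group: power = {two_sample_power(0.5, n):.2f}")
```

Doubling n from 10 to 20 per group raises power from about 0.20 to only about 0.35, and doubling from 80 to 160 adds only about 0.11. Because `shift` depends on d·√n, `two_sample_power(0.25, 160)` equals `two_sample_power(0.5, 40)`: halving the effect requires four times the sample.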

The concept of power is central to research design but is routinely ignored in practice. Studies in psychology and the social sciences often operate with power below 50%, meaning they are more likely to miss a true effect than to detect it. This is not merely wasteful; it is structurally damaging. Low-powered studies produce noisy estimates, and because only estimates that happen to be large clear the significance threshold, the significant results they report are biased toward large effect sizes. The result is a literature dominated by exaggerated findings that systematically shrink upon replication.
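The inflation of significant estimates can be reproduced with a small simulation. This is a sketch under illustrative assumptions: unit-variance normal outcomes with σ = 1 treated as known, a true standardized effect of 0.2, and 20 participants per group, which gives an analytic power of roughly 0.10. `significant_estimates` is a hypothetical helper that keeps only the estimates passing a two-sided z-test at α = 0.05.

```python
import random
from statistics import NormalDist, fmean


def significant_estimates(true_d, n_per_group, n_studies, alpha=0.05, seed=0):
    """Hypothetical helper: simulate two-group studies, return significant estimates.

    Outcomes are normal with sigma = 1 (treated as known), so the estimated
    mean difference is directly in standardized (effect-size) units.
    """
    rng = random.Random(seed)  # seeded for reproducibility
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    se = (2 / n_per_group) ** 0.5  # standard error of the difference in means
    kept = []
    for _ in range(n_studies):
        treated = [rng.gauss(true_d, 1.0) for _ in range(n_per_group)]
        control = [rng.gauss(0.0, 1.0) for _ in range(n_per_group)]
        estimate = fmean(treated) - fmean(control)
        if abs(estimate) / se > z_crit:  # two-sided z-test
            kept.append(estimate)
    return kept


TRUE_D, N_STUDIES = 0.2, 5000
kept = significant_estimates(TRUE_D, n_per_group=20, n_studies=N_STUDIES)
print(f"share of studies significant: {len(kept) / N_STUDIES:.2f}")
print(f"mean significant estimate: {fmean(kept):.2f} (true effect {TRUE_D})")
```

Under these assumptions every significant estimate must exceed about 1.96 × √(2/20) ≈ 0.62 in absolute value, more than three times the true effect of 0.2, so the significant subset cannot be unbiased; a small share are also significant in the wrong direction.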

From a systems-theoretic perspective, power analysis is a form of sensor calibration: it asks whether the measurement apparatus is sensitive enough to detect the signals the theory predicts. A study without power analysis is an experiment whose detector has not been checked against the expected signal strength. Where institutions fail to require power analysis, whether in grant review, journal submission, or regulatory approval, that failure is a systems-level design flaw that predetermines the unreliability of entire literatures.
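In the calibration framing, an a priori power analysis asks how large a sample the detector needs for a given expected signal. A common normal-approximation formula for a two-sided, two-sample comparison is n per group = 2(z_{1−α/2} + z_{1−β})² / d², where d is the standardized effect size the theory predicts. The sketch below implements that formula; the helper name `n_per_group` and the example effect sizes are illustrative assumptions, and an exact t-test calculation gives slightly larger numbers.

```python
from math import ceil
from statistics import NormalDist


def n_per_group(d: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Hypothetical helper: per-group sample size for a two-sided two-sample z-test.

    Normal approximation that ignores the negligible opposite-tail rejection
    probability; d is the standardized effect size expected under the theory.
    """
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha / 2)  # critical value, about 1.96 at alpha = 0.05
    z_beta = std_normal.inv_cdf(power)           # about 0.84 at 80% power
    return ceil(2 * ((z_alpha + z_beta) / d) ** 2)


for d in (0.8, 0.5, 0.2):
    print(f"expected d = {d}: about {n_per_group(d)} per group for 80% power")
```

Under these assumptions the formula gives 25, 63 and 393 per group; the jump for d = 0.2 shows how costly a weak expected signal is to detect reliably.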