Coalescent Theory

Coalescent theory is the mathematical framework, developed by John Kingman in 1982, that models the genealogical history of a sample of gene copies by tracing lineages backward in time to their common ancestors. Rather than following allele frequencies forward (the classical population genetics approach), coalescent theory reconstructs the tree structure connecting a sample of sequences — the genealogy — and uses properties of this tree to make inferences about historical population size, migration, selection, and demographic events.

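The central insight is probabilistic: in a diploid population of effective size N_e, the expected time for two randomly chosen lineages to coalesce (find their common ancestor) is 2N_e generations. Larger populations have deeper genealogies; smaller populations have shallower ones. The expected time to the most recent common ancestor of a sample of n gene copies is 4N_e(1 - 1/n) generations, which is bounded by 4N_e and barely increases with sample size; it is the total branch length of the tree, not its depth, that grows logarithmically with n. The genealogy of even a large sample is dominated by the deep branches connecting the last few lineages, not by the many early coalescences: the final pair of lineages alone takes an expected 2N_e generations to coalesce, more than half of the expected tree depth.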
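The expectations in this paragraph can be checked numerically. The following is a minimal sketch under the standard Kingman coalescent, assuming a diploid population (2N_e gene copies) with constant size and no selection or structure. The function name `expected_coalescent_times` and the values of `n` and `n_e` are illustrative choices, not part of the theory's notation.

```python
def expected_coalescent_times(n, n_e):
    """Expected Kingman-coalescent quantities for n gene copies sampled
    from a diploid population of effective size n_e (in generations)."""
    if n < 2:
        raise ValueError("need at least two sampled lineages")
    # While k lineages remain, the expected wait until the next
    # coalescence is 4*N_e / (k*(k-1)) generations.
    waits = {k: 4 * n_e / (k * (k - 1)) for k in range(n, 1, -1)}
    # Tree depth: the sum of all waits, equal to 4*N_e*(1 - 1/n).
    tmrca = sum(waits.values())
    # Total branch length: each wait is counted once per lineage present,
    # giving 4*N_e * (1 + 1/2 + ... + 1/(n-1)), which grows like log(n).
    total_length = sum(k * t for k, t in waits.items())
    return waits, tmrca, total_length


if __name__ == "__main__":
    N_E = 10_000  # illustrative effective size (assumption)
    for n in (2, 10, 100, 1000):
        waits, tmrca, total_length = expected_coalescent_times(n, N_E)
        share = waits[2] / tmrca  # fraction of depth spent with 2 lineages
        print(f"n={n:5d}  E[TMRCA]={tmrca:10.1f}  "
              f"E[total length]={total_length:12.1f}  last-pair share={share:.3f}")
```

For n = 2 the depth reduces to the pairwise expectation 2N_e. As n increases the depth approaches 4N_e while the total branch length keeps growing, and the share of the depth spent waiting for the last two lineages to coalesce stays above one half.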

The theory connects directly to genetic drift and neutral theory: under neutrality in a diploid population, expected nucleotide diversity (the average number of differences per site between two randomly chosen sequences) is 4N_e × μ, where μ is the mutation rate per site per generation. Human genome-wide diversity implies a long-term effective population size of approximately 10,000. This number has been repeatedly misread as a census count, as if human ancestors were once a small group of about that size. It is instead a summary statistic: human genealogical history is shaped by bottlenecks and population structure that together produce the same level of diversity as an ideal population of 10,000 individuals. See also: Genealogy, Demographic History, PSMC method.
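The diversity relation above can be inverted to estimate effective size from data. The sketch below assumes a diploid, neutrally evolving population. The function name `effective_size_from_diversity` is hypothetical, and the inputs are round illustrative values chosen so that the arithmetic lands near the figure quoted above; they are not measurements taken from this entry.

```python
def effective_size_from_diversity(pi, mu):
    """Invert pi = 4 * N_e * mu (diploid, neutral) to get N_e.

    pi: nucleotide diversity, differences per site between two sequences
    mu: mutation rate per site per generation
    """
    if mu <= 0:
        raise ValueError("mutation rate must be positive")
    return pi / (4 * mu)


if __name__ == "__main__":
    PI_ASSUMED = 1.0e-3   # illustrative diversity per site (assumption)
    MU_ASSUMED = 2.5e-8   # illustrative mutation rate (assumption)
    n_e = effective_size_from_diversity(PI_ASSUMED, MU_ASSUMED)
    print(f"Estimated N_e: {n_e:,.0f}")
```

Because the estimate is inversely proportional to the mutation rate, halving the assumed rate doubles the inferred N_e. The figure is therefore only as reliable as the rate behind it, and it describes an idealized population rather than a head count.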