Tajima's D
Tajima's D is a population genetics statistic introduced by Fumio Tajima in 1989 as a test of the neutral theory of molecular evolution. It compares two measures of genetic diversity in a sample of DNA sequences: one based on the number of segregating (polymorphic) sites, and one based on the average number of pairwise differences between sequences.
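The comparison can be written compactly as follows. The notation is the conventional one rather than anything defined in the text above: n is the number of sequences sampled, S the number of segregating sites, and d_ij the number of differences between sequences i and j. The denominator is left as an estimated variance because its full expression in terms of n and S is lengthy.

```latex
\hat{\theta}_W = \frac{S}{a_1}, \qquad a_1 = \sum_{i=1}^{n-1} \frac{1}{i}, \qquad
\hat{\theta}_\pi = \binom{n}{2}^{-1} \sum_{i<j} d_{ij}

D = \frac{\hat{\theta}_\pi - \hat{\theta}_W}
         {\sqrt{\widehat{\mathrm{Var}}\left(\hat{\theta}_\pi - \hat{\theta}_W\right)}}
```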
Under neutral evolution in a population of constant size, these two measures are expected to be equal, so Tajima's D is approximately zero. Positive values indicate an excess of intermediate-frequency variants: pairwise diversity is higher than expected from the number of segregating sites. This can signal balancing selection or a recent population contraction (bottleneck). Negative values indicate an excess of rare variants, which can signal purifying selection, a recent selective sweep (positive directional selection), or population expansion.
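The calculation can be sketched in a few lines of Python. This is a minimal illustration rather than a reference implementation: the function name `tajimas_d` and the toy sequences are made up for the example, and it assumes aligned sequences of equal length with no gaps or missing data. The constants follow the standard formulation of the variance term.

```python
from itertools import combinations
from math import sqrt, nan


def tajimas_d(seqs):
    """Tajima's D for a list of aligned, equal-length sequences (illustrative sketch)."""
    n = len(seqs)
    if n < 4:
        # The variance term is zero for n < 4, so D is undefined.
        raise ValueError("need at least 4 sequences")
    if any(len(s) != len(seqs[0]) for s in seqs):
        raise ValueError("sequences must be aligned to the same length")

    # S: number of segregating sites (alignment columns with more than one allele).
    S = sum(1 for column in zip(*seqs) if len(set(column)) > 1)
    if S == 0:
        return nan  # no variation, so D is undefined

    # pi: average number of differences over all pairs of sequences.
    pairs = list(combinations(seqs, 2))
    pi = sum(sum(a != b for a, b in zip(x, y)) for x, y in pairs) / len(pairs)

    # Constants that depend only on the sample size n.
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i**2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n**2 + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1 = c1 / a1
    e2 = c2 / (a1**2 + a2)

    # Difference between the two diversity estimates, scaled by its
    # estimated standard deviation.
    return (pi - S / a1) / sqrt(e1 * S + e2 * S * (S - 1))


# Toy samples (hypothetical data): every variant is a singleton in the first,
# every variant is at 50% frequency in the second.
rare = ["TAAA", "ATAA", "AATA", "AAAT"]
balanced = ["AAAA", "AAAA", "TTTT", "TTTT"]
print(tajimas_d(rare), tajimas_d(balanced))
```

The first sample, made up entirely of rare variants, gives a negative value, and the second, made up of intermediate-frequency variants, gives a positive one. With only four sequences neither value would be statistically meaningful; the samples only illustrate the direction of the effect.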
The statistic is widely used in genome-wide scans to identify loci that depart from neutral expectations. Regions of the major histocompatibility complex (MHC) in humans often show positive Tajima's D values, consistent with the balancing selection thought to maintain its extraordinary polymorphism. The statistic has limitations: it is sensitive to demographic history, and it can produce false signals of selection if population structure is not accounted for. Despite this, it remains one of the most widely used tools for detecting selection from sequence data alone.