Hierarchical Models

Hierarchical models (also called multilevel models or mixed-effects models) are statistical models in which group-level parameters are themselves treated as random variables drawn from a higher-level distribution, rather than as fixed unknown quantities to be estimated in isolation. The central insight is that each group's observations carry information about the distribution shared by all groups, and that this information can be pooled across groups to improve the estimate for every individual group, a process called partial pooling or shrinkage.
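To make the definition concrete, the simplest two-level case can be written out as a sketch. The normal likelihood, the known within-group variance, and the symbols used here (y_ij for observation i in group j, theta_j for the group parameter, mu and tau for the higher-level mean and standard deviation, sigma for the within-group standard deviation, n_j for the group size) are assumptions chosen for illustration, not part of the definition above.

```latex
\begin{align*}
y_{ij} \mid \theta_j &\sim \mathcal{N}(\theta_j, \sigma^2), & i &= 1, \dots, n_j, \\
\theta_j \mid \mu, \tau &\sim \mathcal{N}(\mu, \tau^2), & j &= 1, \dots, J, \\
\mathbb{E}[\theta_j \mid y, \mu, \tau, \sigma] &= \lambda_j \bar{y}_j + (1 - \lambda_j)\,\mu, & \lambda_j &= \frac{\tau^2}{\tau^2 + \sigma^2 / n_j}.
\end{align*}
```

Under these assumptions the estimate for group j is a weighted average of its own sample mean and the higher-level mean. The weight on the group's own data grows with n_j, and the limits tau to 0 and tau to infinity recover complete pooling and no pooling respectively.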

A classic example: estimating the effectiveness of a medical treatment across many hospitals. A non-hierarchical approach either treats each hospital separately (no pooling, which ignores shared information) or combines all hospitals into one estimate (complete pooling, which ignores hospital-level variation). Hierarchical models do neither: they let hospitals share information through a common prior on the hospital-level parameters, and that prior is itself estimated from the data.
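The two non-hierarchical extremes can be sketched in a few lines of Python. The hospital names and counts are made-up illustrative numbers, and the function names are hypothetical; the point is only to show what each extreme discards.

```python
# Hypothetical data: hospital -> (patients treated, patients who recovered).
hospitals = {
    "A": (200, 150),
    "B": (50, 30),
    "C": (5, 5),
}


def no_pooling(data):
    """Estimate each hospital's recovery rate from its own patients only."""
    return {name: recovered / treated for name, (treated, recovered) in data.items()}


def complete_pooling(data):
    """Estimate one recovery rate from all patients, ignoring the hospital."""
    total_treated = sum(treated for treated, _ in data.values())
    total_recovered = sum(recovered for _, recovered in data.values())
    return total_recovered / total_treated


separate = no_pooling(hospitals)      # one rate per hospital
pooled = complete_pooling(hospitals)  # a single rate for everyone

for name, rate in separate.items():
    print(f"hospital {name}: separate estimate {rate:.2f}, pooled estimate {pooled:.2f}")
```

With these numbers, no pooling assigns hospital C a perfect recovery rate on the strength of five patients, while complete pooling gives every hospital the same rate regardless of its own record. A hierarchical model would place C somewhere between the two.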

This makes hierarchical models a natural setting for empirical Bayesian inference: the higher-level distribution acts as a data-derived prior on the lower-level parameters. The prior is not assumed from first principles but estimated from the observed variation across groups, then used to regularize the individual estimates. Hospitals with limited data are pulled strongly toward the grand mean; hospitals with extensive data stay close to their own observed values.
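The shrinkage described here can be sketched as follows. This is a minimal illustration, assuming each hospital reports an effect estimate with a known, strictly positive standard error, that both levels are normally distributed, and that the between-hospital variance is estimated by a simple method-of-moments rule. The function name and the input numbers are hypothetical, and a fully Bayesian fit would place priors on the higher-level parameters instead of plugging in point estimates.

```python
from statistics import mean, variance


def empirical_bayes_shrinkage(estimates, std_errors):
    """Shrink per-group estimates toward a grand mean estimated from the data.

    Assumed model: estimates[j] ~ Normal(theta_j, std_errors[j] ** 2) and
    theta_j ~ Normal(mu, tau2). Standard errors must be strictly positive.
    """
    sampling_vars = [se ** 2 for se in std_errors]

    # Method-of-moments estimate of the between-group variance tau2:
    # observed spread of the estimates minus the average sampling noise,
    # floored at zero.
    tau2 = max(0.0, variance(estimates) - mean(sampling_vars))

    # Precision-weighted estimate of the grand mean mu.
    weights = [1.0 / (v + tau2) for v in sampling_vars]
    mu = sum(w * y for w, y in zip(weights, estimates)) / sum(weights)

    shrunk = []
    for y, v in zip(estimates, sampling_vars):
        pull = v / (v + tau2)  # fraction of the way the estimate moves toward mu
        shrunk.append(pull * mu + (1.0 - pull) * y)
    return mu, tau2, shrunk


# Made-up illustrative inputs: four hospitals, the third with very little data.
estimates = [0.75, 0.60, 0.90, 0.70]
std_errors = [0.03, 0.07, 0.20, 0.05]

mu, tau2, shrunk = empirical_bayes_shrinkage(estimates, std_errors)
for raw, se, adjusted in zip(estimates, std_errors, shrunk):
    print(f"raw {raw:.2f} (se {se:.2f}) -> shrunk {adjusted:.3f}")
```

The noisier a hospital's estimate, the larger its pull factor and the closer its shrunk value sits to the grand mean. When the estimated between-hospital variance is zero, every pull factor equals one and the result collapses to complete pooling.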

Hierarchical models are now standard in cognitive science, educational research, ecology, and clinical trial design. Their spread has been limited primarily by the computational cost of fitting them and by the misinterpretation of random effects as nuisance terms to be controlled for, rather than as informative structure describing how the population varies.