Deep Ensembles: Difference between revisions

Latest revision as of 06:21, 8 June 2026

Deep ensembles are a practical approach to uncertainty quantification in machine learning that trains multiple neural networks independently — each from a different random initialization — and treats disagreement among their predictions as a signal of uncertainty. The method was systematically evaluated by Lakshminarayanan, Pritzel, and Blundell (2017), who showed that ensembles of five to ten models substantially improve calibration over single models on both in-distribution and out-of-distribution inputs.
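As a minimal sketch of how member predictions can be combined for classification, the snippet below averages per-member class probabilities and scores disagreement as the gap between the entropy of the averaged prediction and the mean entropy of the members. The function name `ensemble_predict` and the toy probability vectors are illustrative assumptions, not part of the original method's code.

```python
import math

def ensemble_predict(member_probs):
    """Average per-member class probabilities and score disagreement.

    member_probs: list of M probability vectors, one per ensemble member,
    each a list of K class probabilities that sums to 1.
    Returns (mean_probs, total_entropy, disagreement).
    """
    m = len(member_probs)
    k = len(member_probs[0])

    # Ensemble prediction: uniform average of the members' probabilities.
    mean = [sum(p[c] for p in member_probs) / m for c in range(k)]

    def entropy(p):
        # Shannon entropy in nats; zero-probability classes contribute 0.
        return -sum(q * math.log(q) for q in p if q > 0.0)

    total = entropy(mean)                                  # entropy of the averaged prediction
    expected = sum(entropy(p) for p in member_probs) / m   # mean entropy of individual members
    disagreement = total - expected                        # near zero when all members agree
    return mean, total, disagreement

# Hypothetical three-member ensembles on a two-class problem.
agree = [[0.9, 0.1], [0.9, 0.1], [0.9, 0.1]]
split = [[0.9, 0.1], [0.1, 0.9], [0.5, 0.5]]

for name, probs in (("agree", agree), ("split", split)):
    mean, total, disagreement = ensemble_predict(probs)
    print(name, mean, round(total, 4), round(disagreement, 4))
```

The disagreement term is non-negative (up to floating-point rounding) and grows as members diverge, which is one common way to turn "disagreement among predictions" into a number.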

The theoretical status of deep ensembles is ambiguous. They are often described as an approximation to Bayesian inference, with each ensemble member sampling a mode of the weight posterior. This interpretation is contested: ensemble members do not sample from the posterior in any rigorous sense — they converge to local minima under stochastic gradient descent, which is not a sampling procedure. The practical observation — that ensembles are better calibrated — does not require the Bayesian interpretation to be true. Ensembles work because diverse models make diverse errors; averaging over diverse errors reduces systematic miscalibration.
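For squared-error regression, the claim that diverse errors help can be made exact with a standard identity, often called the ambiguity decomposition. The notation here is introduced for illustration: f_m is the prediction of member m, M is the ensemble size, and y is the target. The identity concerns squared error rather than calibration directly, so it should be read as supporting intuition, not as a proof of the calibration claim.

```latex
\bar{f}(x) = \frac{1}{M}\sum_{m=1}^{M} f_m(x), \qquad
\bigl(\bar{f}(x) - y\bigr)^2
  = \underbrace{\frac{1}{M}\sum_{m=1}^{M}\bigl(f_m(x) - y\bigr)^2}_{\text{average member error}}
  \;-\;
  \underbrace{\frac{1}{M}\sum_{m=1}^{M}\bigl(f_m(x) - \bar{f}(x)\bigr)^2}_{\text{disagreement among members}}
```

Because the second term is never negative, the averaged prediction is never worse than the average member, and the gain is exactly the amount by which the members disagree.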

The cost of diversity is compute: an ensemble of N models requires roughly N times the training and inference budget of a single model. This has motivated work on model distillation methods that attempt to produce single models with ensemble-like uncertainty estimates, usually at a cost in calibration quality relative to the full ensemble.
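One simple form of distillation trains a single student to match the ensemble's averaged probabilities. The sketch below computes that loss for one input, assuming a classification setting; `distillation_loss` and the example vectors are hypothetical names and values chosen for illustration, and published distillation methods differ in their details.

```python
import math

def distillation_loss(student_probs, member_probs, eps=1e-12):
    """Cross-entropy of the student against the ensemble's mean prediction.

    student_probs: the student's K class probabilities for one input.
    member_probs:  list of M probability vectors from the ensemble members.
    """
    m = len(member_probs)
    k = len(student_probs)
    # Soft target: uniform average of the members' probabilities.
    target = [sum(p[c] for p in member_probs) / m for c in range(k)]
    # eps guards against log(0) if the student assigns zero probability.
    return -sum(t * math.log(max(s, eps)) for t, s in zip(target, student_probs))

members = [[0.8, 0.2], [0.6, 0.4]]                    # hypothetical two-member ensemble
matched = distillation_loss([0.7, 0.3], members)      # student equals the ensemble mean
mismatch = distillation_loss([0.9, 0.1], members)     # overconfident student
print(matched, mismatch)
```

The loss is smallest when the student reproduces the averaged probabilities. A student trained this way sees only the mean, not how much the members disagreed, which is one plausible reason a distilled model can lose uncertainty information.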

Ensembles as Distributed Cognition

The ensemble is not merely a statistical trick; it is a distributed cognition architecture. Each member processes the same input through a different representational lens (a different local minimum in the loss landscape), and the ensemble aggregates these diverse interpretations into a collective judgment. The disagreement among members is not noise to be eliminated but signal to be preserved: it is the ensemble's uncertainty about its own understanding. A single model tends to repeat the same kinds of error; an ensemble of diverse models makes many kinds of errors that partly cancel in the aggregate. This is the same principle that can make deliberative groups more accurate than individuals: diversity of error, not absence of error, is the source of robustness.

The Diversity-Efficiency Tradeoff

The limitation of deep ensembles is not merely computational cost; it is the diversity-efficiency tradeoff. A larger ensemble is more robust only if its members are genuinely diverse. Training ten identical architectures from the same initialization, on the same data in the same order and with no other source of randomness, produces ten copies of the same model, not an ensemble. Different random initializations already supply a baseline of diversity, as in the original method; going further requires architectural variation, data augmentation, or training objective diversity, each of which introduces its own design challenge. The ensemble designer faces the same problem as the cognitive governance designer: how to distribute cognitive labor across agents without losing coherence.
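The tradeoff can be illustrated with a textbook result: if the members' errors share a common variance and a common pairwise correlation, the variance of the averaged error is sigma2 * (rho + (1 - rho) / M). The sketch below evaluates that formula; the equal-variance, equal-correlation setup is a simplifying assumption, and the function name `ensemble_error_variance` is hypothetical.

```python
def ensemble_error_variance(sigma2, rho, m):
    """Variance of the mean of m member errors.

    Assumes every member error has variance sigma2 and every pair of
    members has the same error correlation rho (a simplifying assumption).
    """
    if m < 1:
        raise ValueError("m must be at least 1")
    return sigma2 * (rho + (1.0 - rho) / m)

# Compare uncorrelated, partly correlated, and identical members.
for rho in (0.0, 0.5, 1.0):
    row = [round(ensemble_error_variance(1.0, rho, m), 3) for m in (1, 5, 10)]
    print(f"rho={rho}: {row}")
```

With rho = 1 (identical members) the variance does not shrink no matter how many members are added, while with rho = 0 it falls as 1/M. For intermediate correlation the benefit flattens at sigma2 * rho, so extra members buy progressively less.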

Cross-Domain Resonances

The ensemble principle is not unique to machine learning. In meteorology, ensemble forecasting combines multiple model runs, with perturbed initial conditions or different physics parametrizations, to produce probabilistic weather predictions. In economics, the wisdom of crowds effect aggregates diverse individual estimates into collective forecasts that are often more accurate than most individual ones. In neuroscience, the brain itself appears to use ensemble-like coding: populations of neurons represent uncertainty through distributed activity patterns. The deep ensemble is a computational instantiation of a universal systems principle: robustness through diversity, and truth through disagreement.
