Bagging

Bagging, short for Bootstrap Aggregating, is an ensemble learning method introduced by Leo Breiman in 1996. It reduces the variance of a predictive model by training multiple instances of the same base learner on different bootstrap samples of the training data and aggregating their predictions. Each bootstrap sample is built by drawing N observations with replacement from the original dataset of N observations. A typical sample therefore contains only about 63.2% of the distinct original observations, and the rest of its N draws are repeats of observations already chosen.
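The procedure described above can be sketched in a few lines of standard-library Python. This is a minimal illustration, not Breiman's code or any library's API. The `fit(X, y)` interface (a function that returns a predictor callable) and all function names are assumptions made for the sketch.

```python
import random
from collections import Counter

def bootstrap_indices(n, rng):
    """One bootstrap sample: n indices drawn uniformly *with* replacement."""
    return [rng.randrange(n) for _ in range(n)]

def fit_bagged(fit, X, y, n_estimators=50, seed=0):
    """Train n_estimators copies of one base learner, each on its own bootstrap sample.

    `fit(X, y)` is a hypothetical interface assumed for this sketch: it must
    return a callable that maps a single input to a prediction.
    """
    rng = random.Random(seed)
    models = []
    for _ in range(n_estimators):
        idx = bootstrap_indices(len(X), rng)
        models.append(fit([X[i] for i in idx], [y[i] for i in idx]))
    return models

def predict_mean(models, x):
    """Regression: average the base learners' predictions."""
    return sum(m(x) for m in models) / len(models)

def predict_vote(models, x):
    """Classification: plurality vote over the base learners' labels."""
    return Counter(m(x) for m in models).most_common(1)[0][0]

# Toy usage with a deliberately trivial, hypothetical base learner that
# ignores x and predicts the mean target of its bootstrap sample.
def fit_constant_mean(Xb, yb):
    c = sum(yb) / len(yb)
    return lambda x: c

models = fit_bagged(fit_constant_mean, [0, 1, 2, 3], [1.0, 2.0, 3.0, 4.0])
print(round(predict_mean(models, 0), 3))
```

Averaging suits real-valued targets, and the plurality vote suits class labels. Seeding a private `random.Random` keeps the bootstrap draws reproducible without touching the global generator.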

The core insight of bagging is that variance is tied to the particular training set: much of what a flexible model learns from one sample is that sample's noise. A decision tree trained on one bootstrap sample may overfit to the specific noise in that sample, but another tree trained on a different sample will overfit to different noise. When the predictions are averaged (for regression) or voted on (for classification), these idiosyncratic errors partially cancel. The true signal, being present in all samples, survives. The result is a model with lower variance and comparable or slightly higher bias, producing more stable and generalizable predictions.
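The claim that idiosyncratic errors cancel under voting can be made concrete under a strong simplifying assumption: that each of B classifiers is independently correct with probability p. Real bagged learners share much of their training data. Their errors are therefore positively correlated, and the gain is smaller than this idealized calculation suggests. The function name is hypothetical.

```python
from math import comb

def majority_vote_accuracy(p, B):
    """P(majority of B voters is correct) when each voter is independently
    correct with probability p. B must be odd so that ties cannot occur."""
    if B % 2 == 0:
        raise ValueError("use an odd number of voters to avoid ties")
    need = B // 2 + 1  # smallest winning majority
    return sum(comb(B, k) * p**k * (1 - p) ** (B - k) for k in range(need, B + 1))

# Idealized independent voters, each right 60% of the time
for B in (1, 5, 25, 101):
    print(B, round(majority_vote_accuracy(0.6, B), 3))
```

The same formula also shows the limit of the argument. If p < 0.5, the vote amplifies the error rather than cancelling it, which is why the signal has to dominate the noise in each learner.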

The Bootstrap Mechanism

The bootstrap, invented by Bradley Efron in 1979, is a resampling technique originally designed for statistical inference. In bagging, it is repurposed as a diversity engine. Each bootstrap sample is a different lens on the same data-generating process, and each base learner sees a different world. The diversity is not arbitrary; it is systematic, because the sampling distribution is known. This distinguishes bagging from methods that inject noise into the model architecture itself.

The probability that any given observation is included in a bootstrap sample of size N is 1 − (1 − 1/N)^N, which converges to 1 − 1/e ≈ 0.632 as N grows. The remaining observations, roughly 36.8% for each base learner, are that learner's out-of-bag (OOB) samples. If each observation is scored using only the learners that never trained on it, these samples become an internal validation set. They give a nearly unbiased estimate of generalization error without requiring a separate cross-validation procedure. This is not merely a convenience. It is a structural property of the bootstrap that turns the sampling process into a self-contained experimental design.
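Both the inclusion probability and the OOB bookkeeping can be sketched directly. The first function evaluates the formula above. `oob_mse` averages, for each observation, only the predictions of learners whose bootstrap sample excluded it. The base learner `fit_constant_mean` is a deliberately trivial, hypothetical stand-in so that the bookkeeping stays visible. Any learner with the same assumed `fit(xs, ys)` interface (returning a predictor callable) could replace it.

```python
import math
import random

def inclusion_probability(N):
    """Exact P(a given observation appears in a bootstrap sample of size N)."""
    return 1 - (1 - 1 / N) ** N

for N in (10, 100, 10_000):
    print(N, round(inclusion_probability(N), 4))
print("limit 1 - 1/e =", round(1 - 1 / math.e, 4))

def oob_mse(xs, ys, fit, n_estimators=100, seed=0):
    """Out-of-bag MSE: each point is scored only by learners that never saw it."""
    rng = random.Random(seed)
    n = len(xs)
    sums, counts = [0.0] * n, [0] * n
    for _ in range(n_estimators):
        idx = [rng.randrange(n) for _ in range(n)]
        model = fit([xs[i] for i in idx], [ys[i] for i in idx])
        in_bag = set(idx)
        for i in range(n):
            if i not in in_bag:  # i is out-of-bag for this learner
                sums[i] += model(xs[i])
                counts[i] += 1
    scored = [i for i in range(n) if counts[i] > 0]
    if not scored:
        raise ValueError("no observation was ever out-of-bag; add estimators")
    return sum((sums[i] / counts[i] - ys[i]) ** 2 for i in scored) / len(scored)

def fit_constant_mean(xs, ys):
    """Hypothetical toy learner: ignore x and predict the sample mean of y."""
    c = sum(ys) / len(ys)
    return lambda x: c

data_rng = random.Random(1)
xs = list(range(50))
ys = [data_rng.gauss(0, 1) for _ in xs]
print("OOB MSE:", round(oob_mse(xs, ys, fit_constant_mean), 3))
```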

When Bagging Works

Bagging is most effective for high-variance, low-bias models — models that are unstable, meaning small changes in the training data produce large changes in the predictions. Decision trees are the canonical example: a single deep tree can fit almost any training set perfectly but generalizes poorly. Averaging many deep trees reduces the variance without increasing the bias significantly, because the bias of the average is the average of the biases, and each tree is already low-bias.
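The variance claim can be checked with a small simulation that uses only the Python standard library. The sketch grows a fully grown one-dimensional regression tree, in which each leaf holds a single distinct x value. It then compares the spread of that tree's prediction at one point, across 200 independently drawn training sets, with the spread of a bagged ensemble of 25 such trees. The target function f(x) = x², the noise level, the sample size and the query point are hypothetical choices made for illustration. `fit_tree` and `bagged_tree` are illustrative names, not a library API.

```python
import random

def fit_tree(xs, ys):
    """Fully grown 1-D regression tree: split until each leaf holds one distinct x."""
    return _grow(sorted(zip(xs, ys)))

def _grow(pairs):
    n = len(pairs)
    ys = [y for _, y in pairs]
    total, total_sq = sum(ys), sum(y * y for y in ys)
    best_sse, best_k = None, None
    left, left_sq = 0.0, 0.0
    for k in range(1, n):
        left += ys[k - 1]
        left_sq += ys[k - 1] ** 2
        if pairs[k][0] == pairs[k - 1][0]:
            continue  # no threshold can separate equal x values
        right, right_sq = total - left, total_sq - left_sq
        # Sum of squared errors around each child's mean
        sse = (left_sq - left * left / k) + (right_sq - right * right / (n - k))
        if best_sse is None or sse < best_sse:
            best_sse, best_k = sse, k
    if best_k is None:  # all x equal: this node is a leaf
        value = total / n
        return lambda x: value
    threshold = (pairs[best_k - 1][0] + pairs[best_k][0]) / 2
    lo, hi = _grow(pairs[:best_k]), _grow(pairs[best_k:])
    return lambda x: lo(x) if x <= threshold else hi(x)

def bagged_tree(xs, ys, rng, B=25):
    """Average B fully grown trees, each fit to its own bootstrap sample."""
    trees = []
    for _ in range(B):
        idx = [rng.randrange(len(xs)) for _ in xs]
        trees.append(fit_tree([xs[i] for i in idx], [ys[i] for i in idx]))
    return lambda x: sum(t(x) for t in trees) / B

def mean(values):
    return sum(values) / len(values)

def variance(values):
    m = mean(values)
    return sum((v - m) ** 2 for v in values) / len(values)

rng = random.Random(0)
x0, single, bagged = 0.5, [], []
for _ in range(200):  # 200 independent training sets from the same process
    xs = [rng.uniform(-1, 1) for _ in range(30)]
    ys = [x * x + rng.gauss(0, 0.3) for x in xs]  # hypothetical target f(x) = x^2
    single.append(fit_tree(xs, ys)(x0))
    bagged.append(bagged_tree(xs, ys, rng)(x0))
print(f"true f(x0) = {x0 * x0}")
print(f"single tree: mean {mean(single):.3f}, variance {variance(single):.4f}")
print(f"bagged trees: mean {mean(bagged):.3f}, variance {variance(bagged):.4f}")
```

A fully grown tree reproduces its training targets, so its prediction at a point is essentially the noisy target of one nearby observation. Each bootstrap tree sees a slightly different set of neighbors, and the bagged average blends several of them. This lowers the variance while leaving the mean prediction roughly where it was.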

Bagging is less effective for low-variance models such as linear regression or nearest-neighbor classifiers. If the base learner is already stable, the bootstrap samples produce nearly identical models, and the ensemble gains little. The condition for bagging to succeed is therefore not merely that the model is powerful, but that it is unstable in Breiman's sense. It must be sensitively dependent on the data, or, to borrow loosely from the language of dynamical systems, chaotic with respect to its inputs.
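The converse can be sketched the same way. The sketch assumes a hypothetical linear data-generating process with Gaussian noise, for which ordinary least squares is a stable learner. Its bagged version should therefore have almost the same prediction variance as a single fit. The printed ratio is expected to land near 1, if anything slightly above it, because averaging a finite number B of bootstrap fits adds a little Monte Carlo noise. All names and constants are illustrative.

```python
import random

def fit_ols(xs, ys):
    """Ordinary least squares line y = a + b*x, returned as a predictor."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = my - b * mx
    return lambda x: a + b * x

def bagged_ols(xs, ys, rng, B=50):
    """Average B least-squares fits, each on its own bootstrap sample."""
    fits = []
    for _ in range(B):
        idx = [rng.randrange(len(xs)) for _ in xs]
        fits.append(fit_ols([xs[i] for i in idx], [ys[i] for i in idx]))
    return lambda x: sum(f(x) for f in fits) / B

def variance(values):
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values) / len(values)

rng = random.Random(1)
x0, single, bagged = 0.5, [], []
for _ in range(200):  # 200 independent training sets
    xs = [rng.uniform(-1, 1) for _ in range(40)]
    ys = [1.0 + 2.0 * x + rng.gauss(0, 0.5) for x in xs]  # hypothetical linear process
    single.append(fit_ols(xs, ys)(x0))
    bagged.append(bagged_ols(xs, ys, rng)(x0))
ratio = variance(bagged) / variance(single)
print("variance ratio bagged/single:", round(ratio, 3))
```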

From Bagging to Random Forests

Bagging was a stepping stone, not a destination. Breiman himself extended it into Random Forests in 2001 by adding a second source of randomness: at each split in each tree, only a random subset of features is considered. This further decorrelates the trees, which typically makes the ensemble more effective than bagging alone. A random forest is, in essence, a bagged ensemble of feature-subspaced trees. The two mechanisms, data randomness and feature randomness, operate orthogonally, creating a combinatorial explosion of perspectives.
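A standard identity states precisely why decorrelation matters. Suppose each of B predictors has variance σ² and every pair has correlation ρ. The variance of their average is then ρσ² + (1 − ρ)σ²/B. Adding trees shrinks only the second term. The first term is a floor set by the correlation, and lowering ρ is exactly what per-split feature subsampling aims to do. The sketch below evaluates the identity and checks it by simulation, building correlated predictors from a shared Gaussian component. The function names are hypothetical.

```python
import random

def ensemble_variance(sigma2, rho, B):
    """Variance of the mean of B identically distributed predictors, each with
    variance sigma2 and common pairwise correlation rho."""
    return rho * sigma2 + (1 - rho) * sigma2 / B

def simulated_variance(rho, B, trials=5000, seed=0):
    """Monte Carlo estimate of the same quantity for sigma2 = 1."""
    rng = random.Random(seed)
    means = []
    for _ in range(trials):
        shared = rng.gauss(0, 1)  # error component common to every member
        # Each member has variance rho + (1 - rho) = 1 and pairwise correlation rho
        preds = [rho ** 0.5 * shared + (1 - rho) ** 0.5 * rng.gauss(0, 1) for _ in range(B)]
        means.append(sum(preds) / B)
    m = sum(means) / trials
    return sum((v - m) ** 2 for v in means) / trials

for rho in (0.0, 0.3, 0.7):
    print(rho, round(ensemble_variance(1.0, rho, 50), 3), round(simulated_variance(rho, 50), 3))
```

With ρ = 0.7, even an unlimited number of trees cannot push the ensemble variance below 0.7σ². This is the sense in which decorrelating the trees, rather than merely adding more of them, drives the improvement.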

The relationship between bagging and random forests illustrates a general principle in systems design: diversity at one level is necessary but not sufficient. Bagging creates diversity in the training data. Random forests create diversity in both the data and the feature space. The systems theorist who stops at bagging has solved half the problem and mistaken it for the whole.

The Systems Interpretation

Bagging is not merely a statistical trick. It is a model of distributed cognition under uncertainty. Each base learner, trained on its own bootstrap sample, is an agent with partial information. The aggregation is the consensus mechanism. The OOB error is the system's internal quality control. The entire architecture mirrors the structure of epistemic communities: multiple observers, each with a noisy but partially independent view, whose collective judgment outperforms any individual.

This is why bagging works in domains where the data-generating process is complex and the model class is misspecified. The ensemble does not need to capture the true process; it needs to capture the ensemble of approximations that are each wrong in different ways. The wisdom is in the disagreement, not in the agreement. Any theory of machine learning that treats disagreement as a failure to be eliminated — rather than a resource to be exploited — has misunderstood the nature of learning in complex systems.

The bootstrap is not only a statistical approximation. It is also a philosophical device: it asks what the world looks like from a slightly different angle, and it answers by building a parliament of slightly different worlds. The parliament is smarter than any of its members.