How bagging lowers model variance

We saw that decision trees are likely to make poor predictions due to high variance, which implies that the tree structure is quite sensitive to the composition of the training sample. We have also seen that a model with low variance, such as linear regression, produces similar estimates despite different training samples as long as there are sufficient samples given the number of features.

For a given set of n independent observations, each with variance σ², the standard error of the sample mean is σ/√n. In other words, averaging over a larger set of observations reduces the variance. A natural way to reduce the variance of a model and its generalization error would thus be to collect many training sets from the population, train a different model on each dataset, and average the resulting predictions.
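
To make this concrete, the following minimal NumPy sketch checks the σ/√n relationship empirically; the sample sizes, the seed, and the variable names are illustrative assumptions, not part of the original text:

```python
import numpy as np

rng = np.random.default_rng(seed=42)  # arbitrary seed for reproducibility

sigma, n_obs, n_trials = 2.0, 100, 10_000

# Draw n_trials independent samples of size n_obs, each observation with std sigma
samples = rng.normal(loc=0.0, scale=sigma, size=(n_trials, n_obs))

# The empirical std of the sample means should approach sigma / sqrt(n_obs)
sample_means = samples.mean(axis=1)
print(f'Empirical std of sample mean: {sample_means.std():.4f}')
print(f'Theoretical sigma / sqrt(n):  {sigma / np.sqrt(n_obs):.4f}')
```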

In practice, we do not typically have the luxury of many different training sets. This is where bagging, short for bootstrap aggregation, comes in. Bagging is a general-purpose method to reduce the variance of a machine learning model, which is particularly useful and popular when applied to decision trees.

Bagging refers to the aggregation of bootstrap samples, which are random samples with replacement. Such a random sample has the same number of observations as the original dataset but may contain duplicates due to replacement. 
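As a quick illustration, the sketch below draws one bootstrap sample from a toy array of ten observations (the array and seed are arbitrary choices made here for demonstration):

```python
import numpy as np

rng = np.random.default_rng(seed=0)
data = np.arange(10)  # a toy 'training set' of ten observations

# A bootstrap sample: same size as the original, drawn with replacement
bootstrap_sample = rng.choice(data, size=len(data), replace=True)
print(bootstrap_sample)                      # typically contains duplicates...
print(np.setdiff1d(data, bootstrap_sample))  # ...and omits some original observations
```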

As an ensemble algorithm, bagging methods train a given number of base estimators on these bootstrapped samples and then aggregate their predictions into a final ensemble prediction. Bagging increases predictive accuracy but decreases model interpretability because it is no longer possible to visualize a single tree to understand the importance of each feature.

Bagging reduces the variance of the base estimators by randomizing, for example, the bootstrap sample on which each tree is grown, and then averaging their predictions to lower the generalization error. It is often a straightforward way to improve on a given model without the need to change the underlying algorithm. It works best with complex models that have low bias and high variance, such as deep decision trees, because its goal is to limit overfitting. Boosting methods, in contrast, work best with weak models, such as shallow decision trees.
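
As a rough sketch of this effect, the snippet below compares a single unpruned tree with a bagged ensemble of such trees on synthetic data. The dataset and all parameter values are illustrative assumptions, and the estimator keyword applies to recent scikit-learn versions (older releases use base_estimator instead):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Hypothetical synthetic data; sizes and seeds are illustrative only
X, y = make_classification(n_samples=1000, n_features=20,
                           n_informative=5, random_state=42)

# A single deep (unpruned) tree: low bias, high variance
tree = DecisionTreeClassifier(random_state=42)

# Bagging 200 such trees on bootstrap samples averages away much of that variance
bagged_trees = BaggingClassifier(estimator=tree, n_estimators=200, random_state=42)

for name, model in [('single tree', tree), ('bagged trees', bagged_trees)]:
    scores = cross_val_score(model, X, y, cv=5)
    print(f'{name:>12}: {scores.mean():.3f} +/- {scores.std():.3f}')
```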

There are several bagging methods that differ by the random sampling process they apply to the training set:

  • Pasting draws random samples from the training data without replacement, whereas bagging samples with replacement
  • Random subspaces randomly sample from the features (that is, the columns) without replacement
  • Random patches train base estimators by randomly sampling both observations and features
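
One way to express these variants is through the sampling parameters of scikit-learn's BaggingClassifier; the sketch below is only a rough mapping under that assumption, and the specific sampling fractions are arbitrary choices:

```python
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier

base = DecisionTreeClassifier()

# Bagging: sample observations with replacement
bagging = BaggingClassifier(estimator=base, max_samples=0.8, bootstrap=True)

# Pasting: sample observations without replacement
pasting = BaggingClassifier(estimator=base, max_samples=0.8, bootstrap=False)

# Random subspaces: sample only the features (columns), here without replacement
subspaces = BaggingClassifier(estimator=base, max_features=0.5,
                              bootstrap=False, bootstrap_features=False)

# Random patches: sample both observations and features
patches = BaggingClassifier(estimator=base, max_samples=0.8, max_features=0.5)
```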