Understanding averaging ensembles

Averaging methods have a long history in machine learning and are commonly applied to fields such as molecular dynamics and audio signal processing. In those fields, the members of an ensemble are typically treated as near-exact replicas of a given system, and their outputs are averaged to smooth out noise.

An averaging ensemble is essentially a collection of models that all train on the same dataset. Their predictions are then aggregated in one of several ways, such as averaging predicted probabilities or taking a majority vote.
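
To make this concrete, here is a minimal sketch of the averaging step. The toy dataset and the three model choices are illustrative assumptions, not part of any particular recipe:

    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.naive_bayes import GaussianNB
    from sklearn.tree import DecisionTreeClassifier

    X, y = make_classification(n_samples=500, random_state=0)

    models = [LogisticRegression(max_iter=1000),
              GaussianNB(),
              DecisionTreeClassifier(max_depth=3)]

    # Train every model on the same dataset ...
    probas = [m.fit(X, y).predict_proba(X) for m in models]

    # ... then aggregate by averaging the predicted class probabilities
    # and picking the class with the highest average probability.
    avg_proba = np.mean(probas, axis=0)
    y_pred = avg_proba.argmax(axis=1)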

One common method involves creating multiple model configurations that each train on a different random subset of the training data. Techniques that take this approach are referred to collectively as bagging methods.

Bagging methods come in many different flavors, but they typically differ only in the way they draw random subsets of the training set (each flavor is mapped to scikit-learn parameters in the sketch after this list):

  • Pasting methods draw random subsets of the samples without replacement.
  • Bagging methods draw random subsets of the samples with replacement.
  • Random subspace methods draw random subsets of the features but train on all data samples.
  • Random patch methods draw random subsets of both samples and features.

Averaging ensembles are used primarily to reduce the variance of a model's predictions.
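
The following sketch shows how each flavor maps onto the subsampling arguments of scikit-learn's BaggingClassifier (introduced more fully below). The dataset, the decision-tree base learner, and the 0.5 subsampling fractions are illustrative assumptions:

    from sklearn.datasets import make_classification
    from sklearn.ensemble import BaggingClassifier
    from sklearn.tree import DecisionTreeClassifier

    X, y = make_classification(n_samples=500, n_features=20, random_state=42)

    flavors = {
        # pasting: sample subsets drawn without replacement
        'pasting': BaggingClassifier(DecisionTreeClassifier(),
                                     max_samples=0.5, bootstrap=False),
        # bagging: sample subsets drawn with replacement
        'bagging': BaggingClassifier(DecisionTreeClassifier(),
                                     max_samples=0.5, bootstrap=True),
        # random subspaces: all samples, random feature subsets
        'random subspaces': BaggingClassifier(DecisionTreeClassifier(),
                                              bootstrap=False,
                                              max_features=0.5),
        # random patches: random subsets of both samples and features
        'random patches': BaggingClassifier(DecisionTreeClassifier(),
                                            max_samples=0.5,
                                            max_features=0.5),
    }

    for name, clf in flavors.items():
        clf.fit(X, y)
        print(name, clf.score(X, y))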

In scikit-learn, bagging methods are provided by the BaggingClassifier and BaggingRegressor meta-estimators. They are called meta-estimators because they build an ensemble on top of any other base estimator.
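
As a brief sketch of the meta-estimator idea, the following compares a single decision tree to a bagged ensemble of 100 such trees on a built-in dataset. The base estimator, dataset, and hyperparameters are illustrative assumptions:

    from sklearn.datasets import load_breast_cancer
    from sklearn.ensemble import BaggingClassifier
    from sklearn.model_selection import cross_val_score
    from sklearn.tree import DecisionTreeClassifier

    X, y = load_breast_cancer(return_X_y=True)

    # Any base estimator works; a decision tree is a common choice.
    tree = DecisionTreeClassifier(random_state=3)
    bag = BaggingClassifier(tree, n_estimators=100, random_state=3)

    print('single tree: ', cross_val_score(tree, X, y, cv=5).mean())
    print('bagged trees:', cross_val_score(bag, X, y, cv=5).mean())

Because the meta-estimator only calls the standard fit and predict methods, the same pattern works unchanged with any other scikit-learn estimator as the base learner.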
