Ensemble models

Ensemble learning involves combining several machine learning models into a single new model that aims to make better predictions than any individual model. More specifically, an ensemble integrates the predictions of several base estimators trained using one or more given learning algorithms to reduce the generalization error that these models may produce on their own.

For ensemble learning to achieve this goal, the individual models must be:

  • Accurate: They outperform a naive baseline (such as the sample mean or class proportions)
  • Independent: Their predictions are generated differently to produce different errors
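
The effect of these two conditions can be illustrated numerically: averaging the outputs of models whose errors are independent shrinks the variance of the combined error, whereas highly correlated errors largely survive averaging. The following is a minimal NumPy sketch (the model count, correlation level, and seed are illustrative choices, not values from the text):

```python
import numpy as np

rng = np.random.default_rng(0)
n_models, n_preds, sigma, rho = 25, 10_000, 1.0, 0.9

# Independent errors: each model's prediction error is drawn separately.
indep = rng.normal(0, sigma, (n_preds, n_models))

# Correlated errors: a shared component plus a smaller individual one.
shared = rng.normal(0, sigma, (n_preds, 1))
corr = rho * shared + np.sqrt(1 - rho ** 2) * rng.normal(0, sigma, (n_preds, n_models))

# Variance of the averaged error across the ensemble.
var_indep = indep.mean(axis=1).var()  # close to sigma**2 / n_models
var_corr = corr.mean(axis=1).var()    # stays close to rho**2 * sigma**2
```

With 25 independent models, the variance of the averaged error falls to roughly 1/25 of a single model's; with strongly correlated errors, it barely improves. This is why diversity among the base estimators matters as much as their individual accuracy.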

Ensemble methods are among the most successful machine learning algorithms, in particular for standard numerical data. Large ensembles are very successful in machine learning competitions and may consist of many distinct individual models that have been combined by hand or using another machine learning algorithm.

There are several disadvantages to combining predictions made by different models. These include reduced interpretability and the higher complexity and cost of training, prediction, and model maintenance. As a result, in practice (outside of competitions), the small gains in accuracy from large-scale ensembling may not be worth the added costs.

Two groups of ensemble methods are typically distinguished by how they train the constituent models and then integrate the results into a single ensemble prediction:

  • Averaging methods train several base estimators independently and then average their predictions. If the base models are unbiased and their prediction errors are not highly correlated, the combined prediction has lower variance and is therefore more reliable. This resembles the construction of a portfolio from assets with uncorrelated returns, which reduces volatility without sacrificing return.

  • Boosting methods, in contrast, train base estimators sequentially with the specific goal to reduce the bias of the combined estimator. The motivation is to combine several weak models into a powerful ensemble.
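
As a concrete instance of an averaging method, bagging trains each base estimator independently on a bootstrap sample and averages the results. The following scikit-learn sketch compares a single decision tree with a bagged ensemble of trees; the synthetic dataset, estimator count, and seeds are illustrative choices:

```python
from sklearn.datasets import make_moons
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Noisy synthetic classification data for illustration.
X, y = make_moons(n_samples=1000, noise=0.3, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# A single, fully grown tree: low bias but high variance.
tree = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)

# Bagging: 100 trees trained on bootstrap samples, predictions averaged.
bagged = BaggingClassifier(DecisionTreeClassifier(random_state=0),
                           n_estimators=100,
                           random_state=0).fit(X_train, y_train)

tree_acc = tree.score(X_test, y_test)
bag_acc = bagged.score(X_test, y_test)
```

On noisy data like this, the bagged ensemble typically generalizes better than the single tree because averaging over bootstrap samples reduces the variance of the individual trees.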

We will focus on automatic averaging methods in the remainder of this chapter, and boosting methods in Chapter 11, Gradient Boosting Machines.
