Adaptive boosting

Like bagging, boosting is an ensemble learning algorithm that combines base learners (typically decision trees) into an ensemble. Boosting was initially developed for classification problems, but can also be used for regression, and has been called one of the most potent learning ideas introduced in the last 20 years (as described in Elements of Statistical Learning by Trevor Hastie, et al.; see GitHub for links to references). Like bagging, it is a general method or metamethod that can be applied to many statistical learning models. 

The motivation for the development of boosting was to find a method to combine the outputs of many weak models (a predictor is called weak when it performs just slightly better than random guessing) into a more powerful, that is, boosted joint prediction. In general, boosting learns an additive hypothesis, $H_M$, of a form similar to linear regression. However, now each of the $m = 1, ..., M$ elements of the summation is a weak base learner, called $h_m$, that itself requires training. The following formula summarizes the approach:

$$H_M(x) = \sum_{m=1}^{M} h_m(x)$$

As discussed in the last chapter, bagging trains base learners on different random samples of the training data. Boosting, in contrast, proceeds sequentially by training the base learners on data that is repeatedly modified to reflect the cumulative learning results. The goal is to ensure that the next base learner compensates for the shortcomings of the current ensemble. We will see in this chapter that boosting algorithms differ in how they define shortcomings. The ensemble makes predictions using a weighted average of the predictions of the weak models.
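To make the sequential logic concrete, the following is a minimal sketch of the discrete AdaBoost variant introduced next: each round fits a decision stump to reweighted training data, upweights the observations the current stump misclassifies, and the ensemble combines the stumps' votes using learner-specific weights. The synthetic dataset, the number of rounds M, and all variable names are illustrative assumptions, not taken from the text.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

# Illustrative sketch of sequential reweighting (discrete AdaBoost,
# binary labels coded as {-1, +1}); settings are arbitrary choices.
X, y = make_classification(n_samples=1000, random_state=42)
y = np.where(y == 0, -1, 1)

M = 50                                          # number of weak learners
sample_weights = np.full(len(y), 1 / len(y))    # start with uniform weights
learners, alphas = [], []

for m in range(M):
    stump = DecisionTreeClassifier(max_depth=1)  # weak base learner
    stump.fit(X, y, sample_weight=sample_weights)
    pred = stump.predict(X)

    # weighted error of the current weak learner
    err = np.sum(sample_weights * (pred != y)) / np.sum(sample_weights)
    alpha = 0.5 * np.log((1 - err) / (err + 1e-10))  # learner weight

    # upweight the observations the stump got wrong, then renormalize
    sample_weights *= np.exp(-alpha * y * pred)
    sample_weights /= sample_weights.sum()

    learners.append(stump)
    alphas.append(alpha)

# ensemble prediction: sign of the weighted sum of weak predictions
H = np.sign(sum(a * h.predict(X) for a, h in zip(alphas, learners)))
print(f'Training accuracy: {(H == y).mean():.3f}')
```

The key design choice this sketch highlights is that, unlike bagging, each base learner sees the same observations but with weights that reflect the mistakes of its predecessors, and the final prediction is a weighted vote rather than a simple average.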

The first boosting algorithm that came with a mathematical proof that it enhances the performance of weak learners was developed by Robert Schapire and Yoav Freund around 1990. In 1997, a practical solution for classification problems emerged in the form of the adaptive boosting (AdaBoost) algorithm, which won the Gödel Prize in 2003. A few years later, the algorithm was extended to arbitrary objective functions when Leo Breiman (who invented random forests) connected the approach to gradient descent, and Jerome Friedman came up with gradient boosting in 1999. Numerous optimized implementations, such as XGBoost, LightGBM, and CatBoost, have emerged in recent years and firmly established gradient boosting as the go-to solution for structured data.

In the following sections, we will briefly introduce AdaBoost and then focus on the gradient boosting model, as well as several state-of-the-art implementations of this very powerful and flexible algorithm.
