Gradient Boosting Machines

In the previous chapter, we learned how random forests improve on the predictions of individual decision trees by combining them into an ensemble that reduces their high variance. Random forests use bagging, short for bootstrap aggregation, to introduce random elements into the process of growing individual trees.

More specifically, bagging draws samples from the data with replacement so that each tree is trained on a different but equal-sized random subset of the data (with some observations repeating). Random forests also randomly select a subset of the features so that both the rows and the columns of the data that are used to train each tree are random versions of the original data. The ensemble then generates predictions by averaging over the outputs of the individual trees.
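The following is a minimal sketch of this idea, not the random forest algorithm itself (which resamples features at every split rather than once per tree): each tree sees a bootstrap sample of the rows and a random subset of the columns, and the ensemble averages the individual predictions. The dataset and parameter values are hypothetical placeholders.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.tree import DecisionTreeRegressor

# Hypothetical toy data; any tabular regression dataset would do.
X, y = make_regression(n_samples=1000, n_features=20, noise=10, random_state=0)

rng = np.random.default_rng(0)
n_trees, max_features = 100, 10
trees, feature_subsets = [], []

for _ in range(n_trees):
    # Bootstrap: sample rows with replacement, so some observations repeat.
    rows = rng.integers(0, len(X), size=len(X))
    # Also pick a random subset of the columns for this tree.
    cols = rng.choice(X.shape[1], size=max_features, replace=False)
    trees.append(DecisionTreeRegressor().fit(X[np.ix_(rows, cols)], y[rows]))
    feature_subsets.append(cols)

# The ensemble prediction averages over the outputs of the individual trees.
ensemble_pred = np.mean(
    [tree.predict(X[:, cols]) for tree, cols in zip(trees, feature_subsets)],
    axis=0,
)
```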

Individual trees are usually grown deep to ensure low bias while relying on the randomized training process to produce different, uncorrelated prediction errors that have a lower variance when aggregated than individual tree predictions. In other words, the randomized training aims to decorrelate or diversify the errors made by the individual trees so that the ensemble is much less susceptible to overfitting, has lower variance, and generalizes better to new data.

In this chapter, we will explore boosting, an alternative machine learning (ML) algorithm for ensembles of decision trees that often produces even better results. The key difference is that boosting modifies the data used to train each tree based on the cumulative errors made by the model before the new tree is added. In contrast to random forests, which train many trees independently of each other on different versions of the training set, boosting proceeds sequentially using reweighted versions of the data. State-of-the-art boosting implementations also adopt the randomization strategies of random forests.
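To make the sequential idea concrete, here is a simplified sketch of gradient boosting with squared-error loss, where each new tree is fit to the residuals, that is, the errors the current ensemble still makes. This is an illustrative toy loop under assumed data and parameter values, not a production implementation; adaptive boosting instead reweights the observations, a distinction covered later in the chapter.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.tree import DecisionTreeRegressor

X, y = make_regression(n_samples=1000, n_features=20, noise=10, random_state=0)

n_trees, learning_rate = 100, 0.1
trees = []
prediction = np.zeros_like(y, dtype=float)  # start from a trivial model

for _ in range(n_trees):
    # Each new tree is trained on the errors the ensemble has made so far.
    residuals = y - prediction
    tree = DecisionTreeRegressor(max_depth=3).fit(X, residuals)
    # Shrink each tree's contribution by the learning rate before adding it.
    prediction += learning_rate * tree.predict(X)
    trees.append(tree)

def predict(X_new):
    # The ensemble prediction is the scaled sum of all sequentially fitted trees.
    return learning_rate * sum(tree.predict(X_new) for tree in trees)
```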

In this chapter, we will see how boosting has evolved into one of the most successful ML algorithms over the last three decades. At the time of writing, it has come to dominate machine learning competitions for structured data (as opposed to, for example, high-dimensional images or speech, where the relationship between input and output is more complex and deep learning excels). More specifically, in this chapter we will cover the following topics:

  • How boosting works, and how it compares to bagging
  • How boosting has evolved from adaptive to gradient boosting
  • How to use and tune AdaBoost and gradient boosting models with sklearn
  • How state-of-the-art GBM implementations dramatically speed up computation
  • How to prevent overfitting of gradient boosting models
  • How to build, tune, and evaluate gradient boosting models on large datasets using xgboost, lightgbm, and catboost
  • How to interpret and gain insights from gradient boosting models