Example of gradient tree boosting with Scikit-Learn

In this example, we want to employ a gradient tree boosting classifier (class GradientBoostingClassifier) and check the impact of the maximum tree depth (parameter max_depth) on performance. As in the previous example, we start by setting n_estimators=50 and learning_rate=0.8:

import numpy as np

from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import cross_val_score

scores_md = []
eta = 0.8

# X, Y are the dataset introduced in the previous example
# Compute the 10-fold CV accuracy for maximum depths between 2 and 12
for md in range(2, 13):
    gbc = GradientBoostingClassifier(n_estimators=50, learning_rate=eta, max_depth=md, random_state=1000)
    scores_md.append(np.mean(cross_val_score(gbc, X, Y, cv=10)))

The result is shown in the following diagram:

10-fold Cross-validation accuracy as a function of the maximum tree depth
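A diagram of this kind can be reproduced with a few lines of matplotlib (a minimal sketch based on the scores_md list collected above; the figure size and styling are assumptions, not the original figure code):

import matplotlib.pyplot as plt

# Depths tested in the loop above
depths = list(range(2, 13))

plt.figure(figsize=(10, 6))
plt.plot(depths, scores_md, 'o-')
plt.xlabel('Maximum tree depth')
plt.ylabel('10-fold Cross-validation accuracy')
plt.grid(True)
plt.show()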

As explained in the first section, the maximum depth of a decision tree is strictly related to the possibility of interaction among features. This can be a positive or a negative aspect when the trees are employed in an ensemble: a very high interaction level can create over-complex separation hypersurfaces and increase the overall variance, while a limited interaction results in a higher bias. With this particular (and simple) dataset, the gradient boosting algorithm achieves better performance when the maximum depth is 2 (consider that the root has a depth equal to zero), and this is partially confirmed by both the feature importance analysis and the dimensionality reductions. In many real-world situations, the result of such an analysis could be completely different, with better performance achieved at larger depths; therefore, I suggest cross-validating the results (it's better to employ a grid search, as shown below), starting from a minimum depth and increasing the value until the maximum accuracy has been achieved. With max_depth=2, we now want to tune the learning rate, which is a fundamental parameter in this algorithm:
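Such a grid search over the maximum depth can be carried out with GridSearchCV (a minimal sketch, assuming the same X, Y dataset; the parameter grid and the accuracy scoring are only illustrative):

import numpy as np

from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import GridSearchCV

# Hypothetical grid limited to the maximum depth; the other hyperparameters are kept fixed
param_grid = {
    'max_depth': np.arange(2, 13)
}

gs = GridSearchCV(GradientBoostingClassifier(n_estimators=50, learning_rate=0.8, random_state=1000),
                  param_grid=param_grid, cv=10, scoring='accuracy')
gs.fit(X, Y)

# Best depth and the corresponding 10-fold CV accuracy
print(gs.best_params_)
print(gs.best_score_)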

import numpy as np

scores_eta = []

# Compute the 10-fold CV accuracy for 100 learning rates in [0.01, 1.0]
for eta in np.linspace(0.01, 1.0, 100):
    gbr = GradientBoostingClassifier(n_estimators=50, learning_rate=eta, max_depth=2, random_state=1000)
    scores_eta.append(np.mean(cross_val_score(gbr, X, Y, cv=10)))

The corresponding plot is shown in the following diagram:

10-fold Cross-validation accuracy as a function of the learning rate (max depth equal to 2)

Unsurprisingly, gradient tree boosting outperforms AdaBoost with η ≈ 0.9, achieving a cross-validation accuracy slightly lower than 0.99. The example is very simple, but it clearly shows the power of this kind of technique. The main drawback is the complexity: contrary to single models, ensembles are more sensitive to changes in the hyperparameters, and a more detailed search must be conducted in order to optimize the models. When the datasets are not excessively large, cross-validation remains the best choice. If, instead, we are pretty sure that the dataset represents the underlying data-generating process almost perfectly, it's possible to shuffle it and split it into two (training/test) or three blocks (training/test/validation) and proceed by optimizing the hyperparameters and trying to overfit the test set (this expression can seem strange, but overfitting the test set means maximizing the generalization ability while perfectly learning the structure of the training set).
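A two-block split can be obtained with train_test_split (a minimal sketch, assuming the same X, Y dataset; the 70/30 ratio and the learning rate η = 0.9 are taken as working assumptions):

from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

# Shuffle and split the dataset into training and test blocks (70/30 is an arbitrary choice)
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.3, shuffle=True, random_state=1000)

gbc = GradientBoostingClassifier(n_estimators=50, learning_rate=0.9, max_depth=2, random_state=1000)
gbc.fit(X_train, Y_train)

# The hyperparameters are then tuned so as to maximize the accuracy on the test block
print(gbc.score(X_test, Y_test))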
