Bagged decision trees

To apply bagging to decision trees, we create bootstrap samples from our training data by repeatedly sampling with replacement, then train one decision tree on each of these samples, and create an ensemble prediction by averaging over the predictions of the different trees.
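The three steps just described can be sketched by hand in a few lines. This is a minimal illustration rather than the book's implementation: the noisy sine data, the number of trees, and all variable names are assumptions chosen for the example.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)

# Synthetic training data (an assumption for illustration): a noisy sine curve
X = rng.uniform(-3, 3, size=(300, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.3, size=300)

n_trees = 25
trees = []
for _ in range(n_trees):
    # Bootstrap sample: draw as many row indices as there are rows, with replacement
    idx = rng.integers(0, len(X), size=len(X))
    # Train one unpruned tree on this bootstrap sample
    trees.append(DecisionTreeRegressor().fit(X[idx], y[idx]))

# Ensemble prediction: average the predictions of the individual trees
X_new = np.linspace(-3, 3, 50).reshape(-1, 1)
tree_preds = np.stack([t.predict(X_new) for t in trees])  # shape (n_trees, 50)
bagged_pred = tree_preds.mean(axis=0)
```

Because squared error is convex, the error of the averaged prediction can never exceed the average error of the individual trees on the same points, which is the intuition behind the variance reduction discussed next.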

Bagged decision trees are usually grown large: they have many levels and leaf nodes and are not pruned, so that each tree has low bias but high variance. Averaging their predictions then reduces this variance. Bagging has been shown to substantially improve predictive performance by constructing ensembles that combine hundreds or even thousands of trees trained on bootstrap samples.

To illustrate the effect of bagging on the variance of a regression tree, we can use the BaggingRegressor meta-estimator provided by sklearn. It trains a user-defined base estimator based on parameters that specify the sampling strategy:

  • max_samples and max_features control the size of the subsets drawn from the rows and the columns, respectively
  • bootstrap and bootstrap_features determine whether each of these samples is drawn with or without replacement
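As a rough illustration of these four parameters, the sketch below fits a small ensemble on synthetic data and inspects which rows and columns each tree received. The data, the specific fractions, and the variable names are assumptions made for the example. The base estimator is passed positionally because the keyword for it differs between scikit-learn releases.

```python
import numpy as np
from sklearn.ensemble import BaggingRegressor
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(42)
X = rng.normal(size=(200, 4))  # 200 rows, 4 feature columns (synthetic)
y = X[:, 0] ** 2 + rng.normal(scale=0.1, size=200)

bagger = BaggingRegressor(
    DecisionTreeRegressor(),   # base estimator, passed positionally
    n_estimators=10,
    max_samples=0.5,           # each tree is given half as many rows as X has ...
    bootstrap=True,            # ... drawn with replacement
    max_features=0.75,         # each tree is given 3 of the 4 columns ...
    bootstrap_features=False,  # ... drawn without replacement
    random_state=42,
)
bagger.fit(X, y)

# The fitted ensemble records the row and column indices drawn for each tree
row_counts = [len(rows) for rows in bagger.estimators_samples_]
col_counts = [len(cols) for cols in bagger.estimators_features_]
```

Setting both `max_samples` and `max_features` below 1.0 makes the individual trees more different from each other, which tends to help the averaging step at the cost of weaker individual trees.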

The following example uses an exponential function to generate training samples for a single DecisionTreeRegressor and a BaggingRegressor ensemble that consists of ten trees, each grown up to ten levels deep. Both models are trained on random samples of the function with added noise and then predict outcomes for a separately drawn test sample.

Since we know the true function, we can decompose the mean-squared error into bias, variance, and noise, and compare the relative size of these components for both models according to the following breakdown:
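For squared-error loss, the standard decomposition at a point x can be written as shown here. The notation is an assumption made for illustration: f is the true function, \hat{f} is the model fit on a random training sample, y = f(x) + \varepsilon is the noisy outcome with noise variance \sigma^2, and expectations are taken over training samples and noise.

```latex
\mathbb{E}\left[\left(y - \hat{f}(x)\right)^2\right]
  = \underbrace{\left(f(x) - \mathbb{E}\left[\hat{f}(x)\right]\right)^2}_{\text{bias}^2}
  + \underbrace{\mathbb{E}\left[\left(\hat{f}(x) - \mathbb{E}\left[\hat{f}(x)\right]\right)^2\right]}_{\text{variance}}
  + \underbrace{\sigma^2}_{\text{noise}}
```

Only the first two terms depend on the model; the noise term is a floor that no learner can remove.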

For 100 repetitions with random training and test samples of 250 and 500 observations, respectively, we find that the variance of the predictions of the individual decision tree is almost twice as high as that of the small ensemble of 10 bagged trees based on bootstrapped samples:

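import pandas as pd
from numpy.random import choice, normal
from sklearn.ensemble import BaggingRegressor
from sklearn.tree import DecisionTreeRegressor

# f (the true function), x (its input grid), and y = f(x) are defined in the notebook
reps, train_size, test_size = 100, 250, 500

noise_to_signal = .5  # noise relative to std(y)
noise = y.std() * noise_to_signal

X_test = choice(x, size=test_size, replace=False)

max_depth = 10
n_estimators = 10

tree = DecisionTreeRegressor(max_depth=max_depth)
# scikit-learn >= 1.2 calls this argument estimator; older releases use base_estimator
bagged_tree = BaggingRegressor(estimator=tree, n_estimators=n_estimators)
learners = {'Decision Tree': tree, 'Bagging Regressor': bagged_tree}

# one DataFrame per model: rows are test points, columns are repetitions
predictions = {label: pd.DataFrame() for label in learners}
for i in range(reps):
    # draw a fresh noisy training sample for each repetition
    X_train = choice(x, train_size)
    y_train = f(X_train) + normal(scale=noise, size=train_size)
    for label, learner in learners.items():
        learner.fit(X=X_train.reshape(-1, 1), y=y_train)
        preds = pd.DataFrame({i: learner.predict(X_test.reshape(-1, 1))}, index=X_test)
        predictions[label] = pd.concat([predictions[label], preds], axis=1)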
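Given a frame of predictions with one row per test point and one column per repetition, the three components could be estimated roughly as follows. This is a sketch under stated assumptions, not the notebook's code: the helper name `decompose`, the stand-in true function, and the artificially biased predictions are all hypothetical and serve only to make the example self-contained.

```python
import numpy as np
import pandas as pd


def decompose(preds: pd.DataFrame, y_true: pd.Series, noise_std: float) -> pd.Series:
    """Average squared bias, variance, and noise over all test points.

    preds: one row per test point, one column per repetition.
    y_true: noise-free value of the true function at each test point.
    """
    mean_pred = preds.mean(axis=1)  # average prediction per test point
    return pd.Series({
        'bias^2': ((y_true - mean_pred) ** 2).mean(),
        'variance': preds.var(axis=1, ddof=0).mean(),
        'noise': noise_std ** 2,
    })


# Hypothetical stand-in for model output: predictions that sit 0.5 above the
# true values and scatter around that level with a standard deviation of 0.2
rng = np.random.default_rng(1)
x_test = np.linspace(-2, 2, 500)
y_true = pd.Series(np.exp(-x_test ** 2), index=x_test)  # assumed true function
preds = pd.DataFrame(
    y_true.to_numpy()[:, None] + 0.5 + rng.normal(scale=0.2, size=(500, 100)),
    index=x_test,
)

parts = decompose(preds, y_true, noise_std=0.1)
```

With this construction the squared bias should come out near 0.25 and the variance near 0.04, which makes it easy to check the helper before applying it to each entry of the `predictions` dictionary with `y_true` set to the true function evaluated at `X_test`.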

The following plot shows, for each model, the mean prediction and a band of two standard deviations around the mean in the upper panel, and the bias-variance-noise breakdown based on the values of the true function in the bottom panel:

See the notebook random_forest for implementation details.
