Overfitting and regularization

Decision trees have a strong tendency to overfit, especially when a dataset has a large number of features relative to the number of samples. As discussed in previous chapters, overfitting increases the prediction error because the model learns not only the signal contained in the training data but also the noise.

There are several ways to address the risk of overfitting:

  • Dimensionality reduction (Chapter 12, Unsupervised Learning) improves the feature-to-sample ratio by representing the existing features with fewer, more informative, and less noisy features.
  • Ensemble models, such as random forests, combine multiple trees while randomizing the tree construction, as we will see in the second part of this chapter. 
  • Decision trees provide several regularization hyperparameters to limit the growth of a tree and the associated complexity. While every split increases the number of nodes, it also reduces the number of samples available per node to support a prediction. For each additional level, twice the number of samples is needed to populate the new nodes with the same sample density. The first sketch after this list illustrates the most common settings.
  • Tree pruning is an additional tool to reduce the complexity of a tree by eliminating nodes or entire parts of a tree that add little value but increase the model's variance. Cost-complexity pruning, for instance, starts with a large tree and recursively reduces its size by replacing nodes with leaves, essentially running the tree construction in reverse. The various steps produce a sequence of trees that can then be compared using cross-validation to select the ideal size; the second sketch after this list shows this workflow.
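
The following is a minimal sketch, assuming scikit-learn's DecisionTreeClassifier and synthetic data from make_classification, of how the main growth-limiting hyperparameters are combined; the specific values shown are illustrative, not recommendations:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Illustrative synthetic data; in practice, use your own feature matrix and labels.
X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Each parameter caps tree growth from a different angle.
regularized_tree = DecisionTreeClassifier(
    max_depth=5,                 # limit the number of levels
    min_samples_split=50,        # require enough samples before splitting a node
    min_samples_leaf=25,         # require enough samples in each resulting leaf
    max_leaf_nodes=20,           # cap the total number of leaves
    min_impurity_decrease=1e-3,  # only split if impurity drops meaningfully
    random_state=0,
).fit(X_train, y_train)

print(f'Depth: {regularized_tree.get_depth()}, '
      f'leaves: {regularized_tree.get_n_leaves()}, '
      f'test accuracy: {regularized_tree.score(X_test, y_test):.3f}')
```

Tightening any one of these constraints shrinks the tree and typically trades a little training accuracy for lower variance on unseen data.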
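
The next sketch, again assuming scikit-learn and the same synthetic data, shows cost-complexity pruning: grow a large tree, extract the sequence of effective alphas along its pruning path, and use cross-validation to pick the alpha (and hence the tree size) that generalizes best. Variable names such as best_alpha are illustrative:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

# The pruning path yields the alphas at which nodes are collapsed into leaves,
# i.e., the sequence of progressively smaller trees.
path = DecisionTreeClassifier(random_state=0).cost_complexity_pruning_path(X, y)
alphas = path.ccp_alphas[:-1]  # drop the last alpha, which prunes down to the root

# Compare the candidate trees with cross-validation and keep the best alpha.
cv_scores = [
    cross_val_score(DecisionTreeClassifier(ccp_alpha=a, random_state=0),
                    X, y, cv=5).mean()
    for a in alphas
]
best_alpha = alphas[np.argmax(cv_scores)]
pruned_tree = DecisionTreeClassifier(ccp_alpha=best_alpha, random_state=0).fit(X, y)
print(f'Best ccp_alpha: {best_alpha:.5f}, leaves: {pruned_tree.get_n_leaves()}')
```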