Shrinkage and learning rate

Shrinkage techniques apply a penalty for increased model complexity to the model's loss function. For boosting ensembles, shrinkage can be applied by scaling the contribution of each new ensemble member down by a factor between 0 and 1. This factor is called the learning rate of the boosting ensemble. Reducing the learning rate increases shrinkage because it lowers the contribution of each new decision tree to the ensemble.
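
As a sketch of the scaling step described above, the update at each boosting iteration can be written as follows. The notation is an assumption introduced here for illustration: F_m is the ensemble after m iterations, h_m is the tree added at iteration m, and ν is the learning rate.

```latex
F_m(x) = F_{m-1}(x) + \nu \, h_m(x), \qquad 0 < \nu \le 1
```

With ν = 1 each tree enters the ensemble at full weight. With ν = 0.1, a tree that proposes a correction of 2.0 for some sample moves the ensemble's prediction for that sample by only 0.2.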

The learning rate and the ensemble size pull in opposite directions: the lower the learning rate, the more trees the ensemble tends to need. Lower learning rates coupled with larger ensembles have been found to reduce the test error, in particular for regression and probability estimation. A large number of iterations is computationally more expensive but often feasible with fast state-of-the-art implementations, as long as the individual trees remain shallow. Depending on the implementation, you can also use adaptive learning rates that adjust to the number of iterations, typically lowering the impact of trees added later in the process. We will see some examples later in this chapter.
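
The interplay between learning rate and ensemble size can be explored with a small experiment. What follows is a minimal sketch in plain Python, not the implementation of any particular library: it boosts one-split regression trees (stumps) on squared loss over synthetic data, and the helper names (`fit_stump`, `boost`, `make_data`), the noisy sine target, and the chosen learning rates are all assumptions made for illustration.

```python
import math
import random


def fit_stump(x, residuals):
    """Least-squares fit of a one-split regression tree to the residuals."""
    order = sorted(range(len(x)), key=lambda i: x[i])
    xs = [x[i] for i in order]
    rs = [residuals[i] for i in order]
    n, total = len(rs), sum(rs)
    best, left_sum = None, 0.0
    for k in range(1, n):
        left_sum += rs[k - 1]
        if xs[k] == xs[k - 1]:
            continue  # cannot split between identical feature values
        right_sum = total - left_sum
        # Maximizing this score is equivalent to minimizing squared error.
        score = left_sum ** 2 / k + right_sum ** 2 / (n - k)
        if best is None or score > best[0]:
            # Samples with x <= xs[k - 1] fall into the left leaf.
            best = (score, xs[k - 1], left_sum / k, right_sum / (n - k))
    return best[1:]  # (threshold, left leaf value, right leaf value)


def mse(y, pred):
    return sum((a - b) ** 2 for a, b in zip(y, pred)) / len(y)


def boost(train, valid, learning_rate, n_trees):
    """Boost stumps on squared loss; return train and validation MSE per tree."""
    (x_tr, y_tr), (x_va, y_va) = train, valid
    base = sum(y_tr) / len(y_tr)  # initial prediction: the training mean
    pred_tr, pred_va = [base] * len(y_tr), [base] * len(y_va)
    train_curve, valid_curve = [], []
    for _ in range(n_trees):
        residuals = [y - p for y, p in zip(y_tr, pred_tr)]
        thr, left, right = fit_stump(x_tr, residuals)
        # Shrinkage: add only a fraction (learning_rate) of the new tree's output.
        pred_tr = [p + learning_rate * (left if x <= thr else right)
                   for x, p in zip(x_tr, pred_tr)]
        pred_va = [p + learning_rate * (left if x <= thr else right)
                   for x, p in zip(x_va, pred_va)]
        train_curve.append(mse(y_tr, pred_tr))
        valid_curve.append(mse(y_va, pred_va))
    return train_curve, valid_curve


def make_data(n, rng):
    """Hypothetical toy data: a noisy sine curve on [0, 6]."""
    x = [rng.uniform(0, 6) for _ in range(n)]
    y = [math.sin(v) + rng.gauss(0, 0.3) for v in x]
    return x, y


rng = random.Random(42)
train, valid = make_data(200, rng), make_data(200, rng)
for lr in (1.0, 0.1):
    train_curve, valid_curve = boost(train, valid, lr, n_trees=300)
    best = min(range(len(valid_curve)), key=valid_curve.__getitem__)
    print(f"learning_rate={lr}: best validation MSE "
          f"{valid_curve[best]:.4f} at {best + 1} trees")
```

The script prints, for each learning rate, the ensemble size at which the validation error is lowest. Comparing the two lines gives a feel for the trade-off on this toy problem; the outcome on real data depends on the data set and on the other hyperparameters.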
