DART – dropout for trees

In 2015, Rashmi and Gilad-Bachrach proposed a new model for training gradient boosting trees that aimed to address a problem they labeled over-specialization: trees added during later iterations tend to affect the prediction of only a few instances while making a minor contribution for the remaining instances. As a result, the model's out-of-sample performance can suffer, and it may become over-sensitive to the contributions of a small number of trees added earlier in the process.

The new algorithm employs dropout, which has been used successfully to learn more accurate deep neural networks, where it mutes a random fraction of the neural connections during the learning process. As a result, nodes in higher layers cannot rely on a few connections to pass the information needed for the prediction. This method has made a significant contribution to the success of deep neural networks for many tasks and has also been used with other learning techniques, such as logistic regression, where it mutes a random share of the features. Random forests and stochastic gradient boosting also drop out a random subset of features.

DART operates at the level of trees and mutes complete trees as opposed to individual features. The goal is for trees in the ensemble generated using DART to contribute more evenly towards the final prediction. In some cases, this has been shown to produce more accurate predictions for ranking, regression, and classification tasks. The approach was first implemented in LightGBM and is also available for XGBoost.
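The following is a minimal sketch of how DART can be selected in both libraries through their scikit-learn interfaces; the data is synthetic and the parameter values are illustrative rather than tuned:

```python
import numpy as np
import lightgbm as lgb
import xgboost as xgb

# Hypothetical regression data for illustration only
rng = np.random.default_rng(42)
X = rng.normal(size=(1000, 10))
y = X[:, 0] + 0.5 * X[:, 1] ** 2 + rng.normal(scale=0.1, size=1000)

# LightGBM: select DART via boosting_type; drop_rate sets the share of
# trees muted per iteration, skip_drop the probability of skipping dropout
lgb_dart = lgb.LGBMRegressor(boosting_type='dart',
                             n_estimators=200,
                             learning_rate=0.1,
                             drop_rate=0.1,
                             skip_drop=0.5)
lgb_dart.fit(X, y)

# XGBoost: select DART via the booster parameter; rate_drop and skip_drop
# play the analogous roles
xgb_dart = xgb.XGBRegressor(booster='dart',
                            n_estimators=200,
                            learning_rate=0.1,
                            rate_drop=0.1,
                            skip_drop=0.5)
xgb_dart.fit(X, y)
```

Note that DART typically trains more slowly than standard gradient boosting because dropped trees force later iterations to re-fit part of the ensemble's contribution.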
