Boosting

Boosting offers an alternative approach to the problem of combining models to achieve greater performance, and it is especially well suited to weak learners. Weak learners are models whose accuracy is better than random guessing, but not by much. One way to create a weak learner is to use a model whose complexity is configurable.

For example, we can train a multilayer perceptron network with a very small number of hidden layer neurons. Similarly, we can train a decision tree but only allow the tree to comprise a single node, resulting in a single split in the input data. This special type of decision tree is known as a stump.
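As a concrete illustration of a configurable-complexity weak learner, the following sketch caps a scikit-learn decision tree at a depth of one to obtain a stump; the synthetic dataset is only a stand-in for real training data.

```python
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for real training data: 500 observations, 5 features
X, y = make_classification(n_samples=500, n_features=5, random_state=1)

# A decision stump: a tree limited to a single split (depth 1)
stump = DecisionTreeClassifier(max_depth=1)
stump.fit(X, y)

# A weak learner should beat random guessing (0.5 here), but usually not by much
print("Stump training accuracy:", stump.score(X, y))
```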

When we looked at bagging, the key idea was to take a set of random bootstrapped samples of the training data and then train multiple versions of the same model using these different samples. In the classical boosting scenario, there is no random component, as all the models use all of the training data.

For classification, boosting works by building a model on the training data and then measuring its classification accuracy on that same data. The individual observations that were misclassified by the model are given a larger weight than those that were correctly classified, and the model is then retrained using these new weights. This process is repeated multiple times, each time adjusting the weights of individual observations based on whether they were correctly classified in the previous iteration.

To combat overfitting, the ensemble classifier is built as a weighted average of all the models trained in this sequence, with the weights usually being proportional to the classification accuracy of each individual model. Because we use the entire training data, there are no out-of-bag observations, and so the accuracy in each case is measured on the training data itself. Regression with boosting is usually done by adjusting the weights of observations based on some measure of the distance between the predicted value and the target value.
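The reweighting mechanic described above can be sketched as follows, assuming a classifier whose fit method accepts per-observation weights (as scikit-learn estimators do via sample_weight). The fixed up-weighting factor and the function name boost_round are illustrative only; the exact update rule depends on the boosting variant, and AdaBoost's is given in the next section.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def boost_round(X, y, w, factor=2.0):
    """One generic boosting round: train with the current observation weights,
    then up-weight the observations the model got wrong.

    The constant up-weighting factor is purely illustrative; AdaBoost derives
    its multiplier from the model's weighted error rate.
    """
    model = DecisionTreeClassifier(max_depth=1)
    model.fit(X, y, sample_weight=w)          # train on ALL the data, weighted
    misclassified = model.predict(X) != y     # accuracy measured on training data
    w = w * np.where(misclassified, factor, 1.0)
    return model, w / w.sum()                 # renormalize the weights
```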

AdaBoost

Continuing our focus on classification problems, we now introduce AdaBoost, which is short for adaptive boosting. In particular, we will focus on Discrete AdaBoost, which predicts one of two classes; we will use -1 and 1 as the class labels. Real AdaBoost is an extension of AdaBoost in which the outputs are class probabilities. In our version of AdaBoost, all of the training data is used; however, there are other versions of AdaBoost in which the training data is sampled. There are also multiclass extensions of AdaBoost, as well as extensions suited to regression-type problems.

AdaBoost for binary classification

Inputs:

  • data: The input data frame containing the input features and a column with the binary output label
  • M: An integer, representing the number of models that we want to train

Output:

  • models: A series of M trained models
  • alphas: A vector of M model weights

Method:

1. Initialize a vector of observation weights, w, of length n (the number of training observations), with entries $w_i = 1/n$. This vector will be updated in every iteration.

2. Using the current value of the observation weights and all the data in the training set, train a classifier model $G_m$.

3. Compute the weighted error rate as the sum of the observation weights over all misclassified observations, divided by the sum of the weight vector. Following our usual convention of using $x_i$ as an observation and $y_i$ as its label, we can express this using the following equation:

$$\mathrm{err}_m = \frac{\sum_{i=1}^{n} w_i \, I\big(y_i \ne G_m(x_i)\big)}{\sum_{i=1}^{n} w_i}$$

Here, $I(\cdot)$ is the indicator function, equal to 1 when its argument is true and 0 otherwise.

4. We then set the model weight for this model, $\alpha_m$, as the logarithm of the ratio between the accuracy and error rates. In a formula, this is:

$$\alpha_m = \log\!\left(\frac{1 - \mathrm{err}_m}{\mathrm{err}_m}\right)$$

5. We then update the observation weights vector, w, for the next iteration. Incorrectly classified observations have their weight multiplied by $e^{\alpha_m}$, thereby increasing their weight for the next iteration. Correctly classified observations have their weight multiplied by $e^{-\alpha_m}$, thereby reducing their weight for the next iteration. With the -1 and 1 class labels, both cases can be written compactly as $w_i \leftarrow w_i \, e^{-\alpha_m y_i G_m(x_i)}$.

6. Renormalize the weights vector so that the sum of the weights is 1.

7. Repeat steps two through six M times in order to produce M models.

8. Define our ensemble classifier as the sign of the weighted sum of the outputs of all the boosted models:

$$G(x) = \operatorname{sign}\!\left(\sum_{m=1}^{M} \alpha_m G_m(x)\right)$$
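Putting the eight steps together, here is a minimal sketch of Discrete AdaBoost with decision stumps as the weak learners, assuming the labels are coded as -1 and 1. The function names (adaboost_fit, adaboost_predict) are illustrative rather than from any particular library, and the clipping of the error rate is an implementation convenience to avoid division by zero, not part of the algorithm above.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def adaboost_fit(X, y, M=50):
    """Discrete AdaBoost for labels in {-1, +1}, using decision stumps."""
    n = len(y)
    w = np.full(n, 1.0 / n)                    # step 1: uniform observation weights
    models, alphas = [], []
    for _ in range(M):
        stump = DecisionTreeClassifier(max_depth=1)
        stump.fit(X, y, sample_weight=w)       # step 2: train on the weighted data
        miss = stump.predict(X) != y
        err = np.sum(w * miss) / np.sum(w)     # step 3: weighted error rate
        err = np.clip(err, 1e-10, 1 - 1e-10)   # avoid log(0) in degenerate cases
        alpha = np.log((1 - err) / err)        # step 4: model weight
        # step 5: up-weight misclassified, down-weight correctly classified
        w = w * np.exp(np.where(miss, alpha, -alpha))
        w = w / np.sum(w)                      # step 6: renormalize
        models.append(stump)
        alphas.append(alpha)
    return models, np.array(alphas)            # step 7: M models and M weights

def adaboost_predict(models, alphas, X):
    """Step 8: sign of the alpha-weighted sum of the individual predictions."""
    votes = sum(a * m.predict(X) for a, m in zip(alphas, models))
    return np.sign(votes)
```

With labels recoded to -1 and 1 where necessary, calling adaboost_fit(X, y) followed by adaboost_predict(models, alphas, X_new) returns the ensemble's -1/1 predictions for new data.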