Predicting complex skill learning with boosting

We will revisit our Skillcraft data set in this section, this time in the context of another boosting technique known as stochastic gradient boosting. The main characteristic of this method is that at every iteration of boosting, we compute the gradient of the loss function with respect to the predictions of the model built so far; the negative gradient points in the direction of the errors that this model is still making. The stochastic part of the name refers to the fact that each new model is fit on a random subsample of the training data.

This gradient is then used to guide the construction of the model that will be added in the next iteration. Stochastic gradient boosting is commonly used with decision trees, and a good implementation in R can be found in the gbm package, which provides us with the gbm() function. For regression problems, we need to set the distribution parameter to gaussian, which corresponds to minimizing squared error. In addition, we can specify the number of trees we want to build (which is equivalent to the number of boosting iterations) via the n.trees parameter, as well as a shrinkage parameter that controls the algorithm's learning rate.

> boostedtree <- gbm(LeagueIndex ~ ., data = skillcraft_train, 
  distribution = "gaussian", n.trees = 10000, shrinkage = 0.1)
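To make the mechanics a little more concrete, the following is a minimal, hand-rolled sketch of the gradient boosting idea for squared-error loss, where the negative gradient is simply the vector of current residuals. This is not how gbm is implemented internally; it assumes the rpart package is available and that skillcraft_train is already loaded, and the tree depth and number of iterations are arbitrary choices for illustration:

library(rpart)

# Hand-rolled gradient boosting for squared-error loss (illustration only)
n_iterations <- 100
learning_rate <- 0.1

# Start from a constant prediction: the mean of the target
current_predictions <- rep(mean(skillcraft_train$LeagueIndex),
                           nrow(skillcraft_train))

boosted_trees <- vector("list", n_iterations)
for (i in 1:n_iterations) {
  # For squared-error loss, the negative gradient is the residual vector
  residuals <- skillcraft_train$LeagueIndex - current_predictions
  training_data <- skillcraft_train
  training_data$LeagueIndex <- residuals

  # Fit a small tree to the residuals
  # (stochastic gradient boosting would fit it on a random subsample of rows)
  boosted_trees[[i]] <- rpart(LeagueIndex ~ ., data = training_data,
                              maxdepth = 3)

  # Take a shrunken step in the direction suggested by the new tree
  current_predictions <- current_predictions +
    learning_rate * predict(boosted_trees[[i]], skillcraft_train)
}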

Note

To learn more about how stochastic gradient boosting works, a good source to consult is the paper titled Stochastic Gradient Boosting by Jerome H. Friedman, which appeared in the February 2002 issue of the journal Computational Statistics & Data Analysis.

In order to make predictions with this setup, we first use the gbm.perf() function, which takes the boosted model we built and estimates the optimal number of boosting iterations; here, we use the out-of-bag (OOB) method to do this. We can then provide this number to our predict() function in order to make predictions on our test data. To measure the SSE on our test set, we will use the compute_SSE() function that we wrote in Chapter 6, Tree-based Methods:

> best.iter <- gbm.perf(boostedtree, method = "OOB")
> boostedtree_predictions <- predict(boostedtree, 
                                     skillcraft_test, best.iter)
> (boostedtree_SSE <- compute_SSE(boostedtree_predictions, 
                                  skillcraft_test$LeagueIndex))
[1] 555.2997
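If you are not following along from Chapter 6, a minimal version of the compute_SSE() helper, on the assumption that it simply returns the sum of squared differences between predicted and actual values, would be:

# Sum of squared errors between a vector of predictions and the observed values
compute_SSE <- function(predictions, actual_values) {
  sum((predictions - actual_values) ^ 2)
}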

A bit of experimentation has revealed that we can't get substantially better results than this by allowing the algorithm to iterate over more trees. Even so, this method is already performing better than both the single tree and bagged tree models.
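One way to run this kind of experiment is to let gbm pick the number of trees by cross-validation instead of the out-of-bag estimate. The following sketch uses the cv.folds parameter of gbm() and the "cv" method of gbm.perf(), with an arbitrary choice of five folds, to refit the model and recompute the test SSE; note that cross-validation makes the fit considerably slower:

# Refit with built-in cross-validation to check whether more trees would help
boostedtree_cv <- gbm(LeagueIndex ~ ., data = skillcraft_train,
                      distribution = "gaussian", n.trees = 10000,
                      shrinkage = 0.1, cv.folds = 5)
best.iter.cv <- gbm.perf(boostedtree_cv, method = "cv")
cv_predictions <- predict(boostedtree_cv, skillcraft_test, best.iter.cv)
compute_SSE(cv_predictions, skillcraft_test$LeagueIndex)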

Limitations of boosting

Boosting is a very powerful technique that continues to receive a lot of attention and research, but it is not without its limitations. Boosting relies on combining weak learners. In particular, we can expect to get the most out of boosting when the models being combined are not already complex models themselves. We already saw an example of this with neural networks: the more complex architecture with three hidden neurons gives a better learner to begin with than the simpler architecture with a single hidden neuron, and consequently has less to gain from boosting.

Combining weak learners may be a way to reduce overfitting, but this is not always effective. By default, boosting uses all of its training data and progressively tries to correct mistakes that it makes without any penalizing or shrinkage criterion (although the individual models trained may themselves be regularized). Consequently, boosting can sometimes overfit.

Finally, a very important limitation is that many boosting algorithms use a symmetric loss function. Specifically, no distinction is made in classification between a false positive error and a false negative error; every type of misclassification is treated the same when the observation weights are updated.
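To see this symmetry concretely, consider the observation-weight update used by the classic AdaBoost.M1 algorithm. In the short sketch below, which uses purely illustrative label vectors, a false negative and a false positive end up with exactly the same updated weight:

# Illustrative AdaBoost.M1 weight update: the up-weighting factor depends only
# on whether an observation was misclassified, not on the type of error made
actual    <- c(1, 1, 0, 0, 1)   # hypothetical true labels
predicted <- c(1, 0, 1, 0, 1)   # hypothetical predictions from a weak learner
weights   <- rep(1 / 5, 5)      # start with uniform observation weights

misclassified <- as.numeric(actual != predicted)
error_rate <- sum(weights * misclassified) / sum(weights)
alpha <- log((1 - error_rate) / error_rate)

# The false negative (observation 2) and the false positive (observation 3)
# receive exactly the same new weight
weights <- weights * exp(alpha * misclassified)
weights <- weights / sum(weights)
weights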

In practice, this might not be desirable, as one of the two errors may be more costly than the other. For example, on the website for our MAGIC Telescope data set, the authors state that a false positive (detecting gamma rays where there are none) is worse than a false negative (misclassifying gamma rays as background radiation). However, cost-sensitive extensions of boosting algorithms have been proposed to address this.
