Summary

In this chapter, we deviated from our usual pattern of learning a new type of model and instead focused on techniques for building ensembles of models that we have seen before. We discovered that there are numerous ways to combine models in a meaningful way, each with its own advantages and limitations. Our first technique for building ensemble models was bagging. The central idea behind bagging is that we build multiple versions of the same model using bootstrap samples of the training data. We then average the predictions made by these models to construct our overall prediction. By building many different versions of the model, we smooth out errors due to overfitting and end up with a model that has reduced variance.
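To make the bootstrap-and-average recipe concrete, here is a minimal sketch in Python (the language, the scikit-learn tree, and the helper name bagged_predict are illustrative assumptions, not the chapter's own code):

import numpy as np
from sklearn.tree import DecisionTreeRegressor

def bagged_predict(X_train, y_train, X_test, n_models=25, seed=0):
    """Average the predictions of n_models trees, each fit on a bootstrap sample."""
    rng = np.random.default_rng(seed)
    n = len(X_train)
    all_preds = []
    for _ in range(n_models):
        idx = rng.integers(0, n, size=n)       # bootstrap sample: draw n rows with replacement
        tree = DecisionTreeRegressor().fit(X_train[idx], y_train[idx])
        all_preds.append(tree.predict(X_test))
    return np.mean(all_preds, axis=0)          # ensemble prediction = average over the models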

A different approach to building model ensembles uses all of the training data and is known as boosting. Here, the defining characteristic is to train a sequence of models but each time we weigh each observation with a different weight depending on whether we classified that observation correctly in the previous model. There are many variants of boosting and we presented two of the most well-known algorithms, AdaBoost and stochastic gradient boosting. The averaging process that operates over the predictions made by individual models to compute the final prediction often weighs each model by its performance.
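The observation-reweighting idea behind AdaBoost can be sketched as follows for class labels coded as -1 and +1 (again an illustrative Python sketch; the function names and the use of one-split scikit-learn trees as weak learners are assumptions):

import numpy as np
from sklearn.tree import DecisionTreeClassifier

def adaboost_fit(X, y, n_rounds=50):
    """y must contain class labels -1 and +1."""
    n = len(X)
    w = np.full(n, 1.0 / n)                          # start with uniform observation weights
    stumps, alphas = [], []
    for _ in range(n_rounds):
        stump = DecisionTreeClassifier(max_depth=1)  # a weak learner: a one-split tree
        stump.fit(X, y, sample_weight=w)
        pred = stump.predict(X)
        err = np.sum(w[pred != y])                   # weighted training error this round
        alpha = 0.5 * np.log((1.0 - err) / max(err, 1e-12))
        w = w * np.exp(-alpha * y * pred)            # up-weight misclassified observations
        w = w / w.sum()                              # renormalize the weights
        stumps.append(stump)
        alphas.append(alpha)
    return stumps, alphas

def adaboost_predict(stumps, alphas, X):
    # Each model's vote is weighted by its performance (alpha) before averaging.
    scores = sum(a * s.predict(X) for s, a in zip(stumps, alphas))
    return np.sign(scores)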

Traditional texts that present bagging and boosting introduce them in the context of decision trees. There is good reason for this, as the decision tree is the prototypical model to which bagging and boosting have been applied. Boosting in particular works best on weak learners, and decision trees can easily be made into weak learners by significantly restricting their size and complexity during construction.

At the same time, however, this often leaves readers with the impression that ensemble methods work only for decision trees, or without any experience of how they can be applied to other methods. In this chapter, we emphasized that these techniques are general and can be used with a number of different types of models. Consequently, we applied them to models that we have seen before, such as neural networks and logistic regression.
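As a small illustration of this generality, the same bagging wrapper can be placed around a logistic regression base model instead of a tree (a sketch assuming scikit-learn; X_train and y_train are placeholder training data, not the chapter's own code):

from sklearn.ensemble import BaggingClassifier
from sklearn.linear_model import LogisticRegression

bagged_logistic = BaggingClassifier(
    LogisticRegression(max_iter=1000),  # the base model fit on each bootstrap sample
    n_estimators=25,                    # number of bootstrap replicates to aggregate
    random_state=0,
)
# bagged_logistic.fit(X_train, y_train) builds 25 logistic regression models and
# combines their predictions, just as we did with decision trees.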

The final type of ensemble model that we studied was the random forest. This is a very popular and powerful algorithm based on bagging decision trees. The key breakthrough behind this model is an input feature sampling procedure, which limits the choice of features available to split on during the construction of each tree. In doing this, the model reduces the correlation between the trees, captures significant localized variations in the output, and improves the degree of variance reduction in the final result. Another key benefit of this model is that it scales well to a large number of input features. For our real-world Skillcraft data set, we discovered that random forests and stochastic gradient boosting produced the best performance.
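A brief sketch of how the feature-sampling idea typically appears in practice, using scikit-learn for illustration (the parameter values are placeholders rather than the settings used in the chapter):

from sklearn.ensemble import RandomForestRegressor

forest = RandomForestRegressor(
    n_estimators=500,      # number of bagged trees in the forest
    max_features="sqrt",   # consider only sqrt(p) randomly chosen features at each split
    oob_score=True,        # out-of-bag estimate of generalization error
    random_state=1,
)
# forest.fit(X_train, y_train); forest.oob_score_ then reports an internal performance
# estimate computed from the observations left out of each bootstrap sample.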

In the next chapter, we will introduce another type of model with a distinct structure, known as the probabilistic graphical model. These models use a graphical structure in order to explicitly represent the conditional independence between input features. Probabilistic graphical models find applications across a wide variety of predictive tasks, from spam e-mail identification to DNA sequence labeling.
