Ensemble learning

When we talked about random forests, that was an example of ensemble learning, where we combine multiple models together to come up with a better result than any single model could on its own. So, let's talk about ensemble learning in a little more depth.

So, remember random forests? We had a bunch of decision trees that were using different subsamples of the input data, and different sets of attributes that they would branch on, and they all voted on the final result when you were trying to classify something at the end. That's an example of ensemble learning. Another example: when we were talking about k-means clustering, we had the idea of maybe using different k-means models with different initial random centroids, and letting them all vote on the final result as well. That is also an example of ensemble learning.

Basically, the idea is that you have more than one model. They might be the same kind of model, or they might be different kinds of models, but you run them all on your set of training data, and they all vote on the final result for whatever it is you're trying to predict. And oftentimes, you'll find that this ensemble of different models produces better results than any single model could on its own.
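
Just to make that voting idea concrete, here is a minimal hand-rolled sketch. I'm using scikit-learn and the small Iris dataset purely as stand-ins here, so treat the specific models and numbers as placeholders rather than anything definitive:

    import numpy as np
    from sklearn.datasets import load_iris
    from sklearn.model_selection import train_test_split
    from sklearn.naive_bayes import GaussianNB
    from sklearn.neighbors import KNeighborsClassifier
    from sklearn.tree import DecisionTreeClassifier

    X, y = load_iris(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    # Train several different models on the same training data
    models = [DecisionTreeClassifier(random_state=0),
              GaussianNB(),
              KNeighborsClassifier()]
    predictions = np.array([m.fit(X_train, y_train).predict(X_test) for m in models])

    # For each test sample, take a simple majority vote across the models
    votes = np.apply_along_axis(lambda col: np.bincount(col).argmax(), 0, predictions)
    print("Ensemble accuracy:", (votes == y_test).mean())

Each individual model will be wrong on different samples, and the vote tends to smooth those individual mistakes out.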

A good example, from a few years ago, was the Netflix prize. Netflix ran a contest where they offered a million dollars to any team that could outperform their existing movie recommendation algorithm. The winning entries were ensemble approaches, where they actually ran multiple recommender algorithms at once and combined their results into a final prediction. So, ensemble learning can be a very powerful, yet simple, tool for increasing the quality of your final results in machine learning. Let's now explore the main types of ensemble learning:

  • Bootstrap aggregating or bagging: Random forests use a technique called bagging, short for bootstrap aggregating. This means that we take random resamples of our training data, feed them into different versions of the same model, and let them all vote on the final result. If you remember, random forests took many different decision trees, each trained on a different random sample of the training data, and then they all came together in the end to vote on a final result. That's bagging. (There's a short code sketch of each of these techniques right after this list.)
  • Boosting: Boosting is an alternative, and the idea here is that you build models one after another, where each subsequent model gives extra weight to the training examples that the previous models misclassified. So, you run train/test on a model, you figure out where it's getting things wrong, and you emphasize those cases in the next model, in hopes that it will pay more attention to them and get them right. That's the general idea behind boosting: you run a model, figure out its weak points, amplify the focus on those weak points as you go, and keep building more and more models that refine the ensemble based on the weaknesses of the previous ones.
  • Bucket of models: Another technique is called a bucket of models, where you try entirely different kinds of models on the same problem. Maybe I'm using k-means, a decision tree, and regression. I can run all three of those models on a set of training data, use train/test to see which one performs best, and then use that winning model, and only that model, to make my predictions.
  • Stacking: Stacking starts the same way: you run multiple different models on the data. The subtle difference between a bucket of models and stacking is that a bucket of models just picks the single model that wins, whereas stacking combines the results of all those models together, often by feeding their predictions into yet another model, to arrive at a final result. That kind of blending of many models is much closer to what that Netflix prize-winner actually did.
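
Here are those sketches. First, bagging: scikit-learn wraps this up as BaggingClassifier. This is just a minimal illustration, so the Iris dataset, the choice of decision trees as the base model, and the parameter values are all arbitrary placeholders:

    from sklearn.datasets import load_iris
    from sklearn.ensemble import BaggingClassifier
    from sklearn.model_selection import cross_val_score
    from sklearn.tree import DecisionTreeClassifier

    X, y = load_iris(return_X_y=True)

    # Bagging: many decision trees, each trained on a random bootstrap
    # resample of the training data, voting on the final answer
    bagging = BaggingClassifier(DecisionTreeClassifier(), n_estimators=50,
                                random_state=0)
    print("Bagging accuracy:", cross_val_score(bagging, X, y, cv=5).mean())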
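
Next, boosting. AdaBoost is the classic example, and scikit-learn provides it as AdaBoostClassifier; again, the dataset and the number of estimators are just placeholders for the sake of a runnable sketch:

    from sklearn.datasets import load_iris
    from sklearn.ensemble import AdaBoostClassifier
    from sklearn.model_selection import cross_val_score

    X, y = load_iris(return_X_y=True)

    # Boosting: each new weak learner puts extra weight on the training
    # examples that the previous learners got wrong
    boosting = AdaBoostClassifier(n_estimators=50, random_state=0)
    print("Boosting accuracy:", cross_val_score(boosting, X, y, cv=5).mean())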
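
A bucket of models doesn't need any special class at all: you just score several different model types and keep the winner. A minimal sketch, with the candidate models chosen arbitrarily:

    from sklearn.datasets import load_iris
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import cross_val_score
    from sklearn.naive_bayes import GaussianNB
    from sklearn.tree import DecisionTreeClassifier

    X, y = load_iris(return_X_y=True)

    # Bucket of models: evaluate several different kinds of model and
    # keep only the one that scores best
    candidates = {"decision_tree": DecisionTreeClassifier(random_state=0),
                  "naive_bayes": GaussianNB(),
                  "logistic": LogisticRegression(max_iter=1000)}
    scores = {name: cross_val_score(model, X, y, cv=5).mean()
              for name, model in candidates.items()}
    best = max(scores, key=scores.get)
    print("Scores:", scores)
    print("Winner:", best)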
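
Finally, stacking. scikit-learn's StackingClassifier feeds the base models' predictions into a final meta-model; as before, the particular base models and the logistic regression meta-model are just assumptions for the sake of an example:

    from sklearn.datasets import load_iris
    from sklearn.ensemble import StackingClassifier
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import cross_val_score
    from sklearn.naive_bayes import GaussianNB
    from sklearn.tree import DecisionTreeClassifier

    X, y = load_iris(return_X_y=True)

    # Stacking: the base models' predictions become the inputs to a
    # final "meta" model that learns how to combine them
    stack = StackingClassifier(
        estimators=[("decision_tree", DecisionTreeClassifier(random_state=0)),
                    ("naive_bayes", GaussianNB())],
        final_estimator=LogisticRegression(max_iter=1000))
    print("Stacking accuracy:", cross_val_score(stack, X, y, cv=5).mean())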

Now, there is a whole field of research that tries to find the optimal ways of doing ensemble learning, and if you want to sound smart, that usually involves using the word Bayes a lot. There are some very advanced methods of doing ensemble learning, but all of them have weak points, and I think this is yet another lesson that we should always use the simplest technique that works well for us.

Now, these are all very complicated techniques that I can't really get into within the scope of this book, but at the end of the day, it's hard to outperform the simple techniques that we've already talked about. A few of the more complex techniques are listed here:

  • Bayes optimal classifier: In theory, there's something called the Bayes Optimal Classifier that will always give the best possible result, but it's impractical, because it's computationally prohibitive.
  • Bayesian parameter averaging: Many people have tried to create variations of the Bayes Optimal Classifier to make it more practical, like the Bayesian Parameter Averaging variation. But it's still susceptible to overfitting, and it's often outperformed by bagging, the same idea behind random forests: you just resample the data multiple times, train a model on each resample, and let them all vote on the final result. It turns out that works just as well, and it's a heck of a lot simpler!
  • Bayesian model combination: Finally, there's something called Bayesian Model Combination that tries to solve the shortcomings of the Bayes Optimal Classifier and Bayesian Parameter Averaging. But, at the end of the day, it doesn't do much better than simply cross-validating to choose among your combinations of models.

Again, these are complex techniques that are difficult to use in practice; we're better off with the simpler ones that we've talked about in more detail. But, if you want to sound smart and use the word Bayes a lot, it's good to at least be familiar with these techniques and know what they are.

So, that's ensemble learning. Again, the takeaway is that the simple techniques, like bootstrap aggregating (bagging), boosting, stacking, or a bucket of models, are usually the right choices. There are some much fancier techniques out there, but they're largely theoretical. At least you know about them now.

It's always a good idea to try ensemble learning out. It has been shown time and time again that it often produces better results than any single model on its own, so definitely consider it!
