12. Combining Learners

In [1]:

# setup
from mlwpy import *

digits = datasets.load_digits()
digits_ftrs, digits_tgt = digits.data, digits.target

diabetes = datasets.load_diabetes()
diabetes_ftrs, diabetes_tgt = diabetes.data, diabetes.target

iris = datasets.load_iris()
tts = skms.train_test_split(iris.data, iris.target,
                            test_size=.75, stratify=iris.target)
(iris_train_ftrs, iris_test_ftrs,
 iris_train_tgt, iris_test_tgt) = tts

12.1 Ensembles

Up to this point, we’ve talked about learning methods as stand-alone, singular entities. For example, when we use linear regression (LR) or a decision tree (DT), that’s the single and entire model we are using. We might connect it with other preprocessing steps, but the LR or DT is the model. However, there’s an interesting variation we can play on this theme. Much like teams draw on different characters of their members to create functional success and choirs draw on different voices to create musical beauty, different learning systems can be combined to improve on their individual components. In the machine learning community, combinations of multiple learners are called ensembles.

To draw on our factory analogy, imagine that we are trying to make something like a car. We probably have many factory machines that make the subcomponents of the car: the engine, the body, the wheels, the tires, and the windows. To assemble the car, we send all of these components to a big factory and out pops a working car. The car itself still has all of the stand-alone components, and each component does its own thing—produces power, turns wheels, slows the vehicle—but they all work together to do car things. On the other hand, if we are making mass quantities of an Italian dressing—yes, I prefer homemade too—the components we combine together lose their individual identity. That’s one of the wonders of baking, cooking, and mixology—the way the ingredients combine to form new flavors. While the dressing factory might be a bit smaller than the car factory, we could still bring together ingredients that were themselves the output of other machines.

These two scenarios mirror the two main divisions of ensemble methods (Figure 12.1). Some ensemble methods divide work into different components that are responsible for different regions. Taken together, the component regions cover everything. In other ensemble methods, every component model predicts everywhere—there are no regional divisions—and then those component predictions are combined into a single prediction.

Figure 12.1 On the left, multiple models make predictions everywhere – they are generalists. On the right, different regions are predicted by distinct, specialist models.

Here’s another intuitive take on ensembles that emphasizes the way we combine their component learners. At the risk of turning this discussion into a Facebook flamewar, I’ll create an analogous scenario with two hypothetical forms of democratic legislature shown in Figure 12.2. In the first form, SpecialistRepresentation, every representative is assigned a specialty, such as foreign or domestic issues. When an issue comes up for a vote, reps only vote on their area of specialty. On domestic issues, the foreign specialists will abstain from voting. The hope with this form of government is that reps can become more educated on their particular topic and therefore able to reach better, more informed decisions (don’t roll your eyes). On the other side, GeneralistRepresentation, every rep votes on everything and we simply take the winning action. Here, the hope is that by averaging many competing and diverse ideas, we end up with some reasonable answer.

Figure 12.2 Top: representatives only make decisions in their area of expertise. Bottom: every representative makes decisions on every topic – they are all generalists.

Very often, ensembles are constructed from relatively primitive component models. When we start training multiple component models, we have to pay the training cost for each of them. If the component training times are high, our total training time may be very high. On the learning performance side, combining many simple learners has the net effect of acting like a single, more powerful learner. While it might be tempting to use ensembles with more powerful components to create an omnipotent learning algorithm, we won’t pursue that path. If you happen to have unlimited computing resources, it might make a fun weekend project. Please be mindful that you don’t accidentally turn on Skynet. We rarely need to create custom ensembles; we typically use off-the-shelf ensemble methods. Ensembles underlie several powerful techniques and, if we have multiple learning systems that we need to combine, an ensemble is the perfect way to combine them.

12.2 Voting Ensembles

A conceptually simple way of forming an ensemble is (1) build several different models on the same dataset and then, on a new example, (2) combine the predictions from the different models to get a single final prediction. For regression problems, our combination of predictions can be any of the summary statistics we’ve seen—for example, the mean or median. When we combined nearest neighbors to make a prediction (Section 3.5), we used these. The analogy with ensembles is that we simply create and train a few models, get a few predictions, and then take the average—whether it’s a literal arithmetic mean or some fancier variation. Presto, that’s our final prediction. For classification, we can take a majority vote, try to get a measure of certainty from the base classifiers and take a weighted vote, or come up with other clever ideas.

In [2]:

base_estimators = [linear_model.LogisticRegression(),
                   tree.DecisionTreeClassifier(max_depth=3),
                   naive_bayes.GaussianNB()]
base_estimators = [(get_model_name(m), m) for m in base_estimators]

ensemble_model = ensemble.VotingClassifier(estimators=base_estimators)
skms.cross_val_score(ensemble_model, digits_ftrs, digits_tgt)

Out[2]:

array([0.8571, 0.8397, 0.8742])
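
On the regression side, we can do the combining by hand. Here is a minimal sketch, assuming the chapter setup cell and the diabetes data loaded above, that averages the predictions of three quick regressors (predicting on the training data is purely to show the mechanics):

regressors = [linear_model.LinearRegression(),
              tree.DecisionTreeRegressor(max_depth=3),
              neighbors.KNeighborsRegressor()]

# fit each component model on the same data and collect its predictions
preds = [m.fit(diabetes_ftrs, diabetes_tgt).predict(diabetes_ftrs)
         for m in regressors]

# the ensemble prediction is simply the mean of the component predictions
mean_pred = np.mean(preds, axis=0)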

Here are two last points. Combining different types of models—for example, a linear regression and a decision tree regressor—is called stacking. Combining models with different biases can result in a less biased aggregate model.
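
If your sklearn version ships a stacker (recent releases provide ensemble.StackingRegressor and ensemble.StackingClassifier), a minimal sketch looks like the following; the base models and the linear final combiner are just placeholder choices:

# a second-stage linear model learns how to combine the base predictions
stacker = ensemble.StackingRegressor(
    estimators=[('lr', linear_model.LinearRegression()),
                ('dt', tree.DecisionTreeRegressor(max_depth=3))],
    final_estimator=linear_model.LinearRegression())
skms.cross_val_score(stacker, diabetes_ftrs, diabetes_tgt)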

12.3 Bagging and Random Forests

Let’s turn our attention to more complicated ways of combining models. The first is called random forests and relies on a technique called bagging. The term bagging has little to do with physical bags, like a duffel bag or a tote; instead it is a synthetic word, a portmanteau, of the phrase bootstrap aggregation. Aggregation simply means combining, but we still need to figure out what bootstrap means. For that, I’m going to describe how we calculate a simple statistic—the mean—using a bootstrap technique.

12.3.1 Bootstrapping

The basic idea of bootstrapping is to find a distribution—a range of possibilities—from a single dataset. Now, if you think about cross-validation, which also takes a single dataset and slices-and-dices it to get multiple results, you aren’t far off the mark. They are both instances of resampling techniques. I’ll have more to say about their relationship in a minute.

Back to bootstrapping. Imagine we have our handy dataset and we want to compute a statistic from it. For convenience, we’ll say we want to compute the mean. Of course there is a simple, direct formula for computing the mean: sum the values and divide by the number of values. Mathematically, we write that as mean(X) = (Σ x_i) / n. Fair enough. Why bother with any more complexity? Because a single-point estimate—one calculated value—of the mean could be misleading. We don’t know how variable our estimate of the mean is. In many cases, we want to know how wrong we might be. Instead of being satisfied with a single value, we can ask about the distribution of the means we could compute from data that is like this dataset. Now, there are some theoretical answers that describe the distribution of the mean—you can find them in any introductory statistics textbook. But, for other statistics—up to and including something as fancy as trained classifiers—we don’t have nice, pre-canned answers calculated by easy formulas. Bootstrapping provides a direct alternative. To figure out how to compute a bootstrapped value, let’s look at some code and some graphics that explain this process.

Sampling with and without Replacement We need one other idea before we describe the bootstrap process. There are two ways to sample, or select, values from a collection of data. The first is sampling without replacement. Imagine we put all of our data in a bag. Suppose we want one sample of five examples from our dataset. We can put our hand in the bag, pull out an example, record it, and set it to the side. In this case—without replacement—it does not go back in the bag. Then we grab a second example from the bag, record it, and put it to the side. Again, it doesn’t go back in the bag. We repeat these selections until we’ve recorded five total examples. It is clear that what we select on the first draw has an effect on what is left for our second, third, and subsequent draws. If an example is out of the bag, it can’t be chosen again. Sampling without replacement makes the choices dependent on what happened earlier—the examples are not independent of one another. In turn, that affects how statisticians let us talk about the sample.

The other way we can sample from a dataset is by sampling with replacement: we pull out an example from the dataset bag, record its value, and put the example back in the bag. Then we draw again and repeat until we have the number of samples we want. Sampling with replacement, as in Figure 12.3, gives us an independent sequence of examples.
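
We can see the difference directly with np.random.choice, which is exactly the tool we will lean on for bootstrapping in a moment (the little array here is just made-up data):

bag = np.array([1, 2, 3, 4, 5, 6, 7])

# without replacement: each value can be drawn at most once
print(np.random.choice(bag, 5, replace=False))

# with replacement: the same value may show up several times
print(np.random.choice(bag, 5, replace=True))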

Figure 12.3 When we sample with replacement, examples are selected, recorded, and then returned to the cauldron.

So, how do we use bootstrapping to compute a bootstrap statistic? We randomly sample our source dataset, with replacement, until we generate a new bootstrap sample with the same number of elements as our original. We compute our statistic of interest on that bootstrap sample. We do that several times and then take the mean of those individual bootstrap sample statistics. Figure 12.4 shows the process visually for calculating the bootstrap mean.

Figure 12.4 We repeatedly calculate the value of the mean on bootstrap samples. We combine those values by taking their mean.

Let’s illustrate that idea with some code. We’ll start with the classic, nonbootstrap formula for the mean.

In [3]:

dataset = np.array([1,5,10,10,17,20,35])
def compute_mean(data):
    return np.sum(data) / data.size
compute_mean(dataset)

Out[3]:

14.0

Nothing too difficult there. One quick note: we could apply that calculation to the entire dataset and get the usual answer to the question, “What is the mean?” We could also apply compute_mean to other, tweaked forms of that dataset. We’ll get different answers to different questions.

Now, let’s pivot to the bootstrap technique. We’ll define a helper function to take a bootstrap sample—remember, it happens with replacement—and return it.

In [4]:

def bootstrap_sample(data):
    N   = len(data)
    idx = np.arange(N)
    bs_idx = np.random.choice(idx, N,
                              replace=True) # default added for clarity
    return data[bs_idx]

Now we can see what several rounds of bootstrap sampling and the means for those samples look like:

In [5]:

bsms = []
for i in range(5):
    bs_sample = bootstrap_sample(dataset)
    bs_mean = compute_mean(bs_sample)
    bsms.append(bs_mean)

    print(bs_sample, "{:5.2f}".format(bs_mean))
[35 10  1  1 10 10 10] 11.00
[20 35 20 20 20 20 20] 22.14
[17 10 20 10 10  5 17] 12.71
[20  1 10 35  1 17 10] 13.43
[17 10 10 10  1  1 10]  8.43

From those bootstrap sample means we can compute a single value—the mean of the means:

In [6]:

print("{:5.2f}".format(sum(bsms) / len(bsms)))
13.54

Here is the bootstrap calculation rolled up into a single function:

In [7]:

def compute_bootstrap_statistic(data, num_boots, statistic):
    ' repeatedly calculate statistic on num_boots bootstrap samples'
    # no comments from the peanut gallery
    bs_stats = [statistic(bootstrap_sample(data)) for i in range(num_boots)]
    # return the average of the calculated statistics
    return np.sum(bs_stats) / num_boots

bs_mean = compute_bootstrap_statistic(dataset, 100, compute_mean)
print("{:5.2f}".format(bs_mean))
13.86

The really interesting part is that we can compute just about any statistic using this same process. We’ll follow up on that idea now.

12.3.2 From Bootstrapping to Bagging

When we create a bagged learner, we are constructing a much more convoluted statistic: a learner in the form of a trained model. In what sense is a learner a statistic? Broadly, a statistic is any function of a group of data. The mean, median, min, and max are all calculations that we apply to a dataset to get a result. They are all statistics. Creating, fitting, and applying a learner to a new example—while a bit more complicated—is just calculating a result from a dataset.

I’ll prove it by doing it.

In [8]:

def make_knn_statistic(new_example):
    def knn_statistic(dataset):
        ftrs, tgt = dataset[:,:-1], dataset[:,-1]
        knn = neighbors.KNeighborsRegressor(n_neighbors=3).fit(ftrs, tgt)
        return knn.predict(new_example)
    return knn_statistic

The one oddity here is that we used a closure—we discussed these in Section 11.1. Here, our statistic is calculated with respect to one specific new example. We need to hold that test example fixed, so we can get down to a single value. Having done that trick, we have a calculation on the dataset—the inner function, knn_statistic—that we can use in the exact same way that we used compute_mean.

In [9]:

# have to slightly massage data for this scenario
# we use last example as our fixed test example
diabetes_dataset = np.c_[diabetes_ftrs, diabetes_tgt]

ks = make_knn_statistic(diabetes_ftrs[-1].reshape(1,-1))
compute_bootstrap_statistic(diabetes_dataset, 100, ks)

Out[9]:

74.00666666666667

Just like computing the mean will give us different answers depending on the exact dataset we use, the returned value from knn_statistic will depend on the data we pass into it.

We can mimic that process and turn it into our basic algorithm for bagging:

  1. Sample the data with replacement.

  2. Create a model and train it on that data.

  3. Repeat.

To predict, we feed an example into each of the trained models and combine their predictions. The process, using decision trees as the component models, is shown in Figure 12.5.

Figure 12.5 As with the bootstrapped mean, we sample the dataset and generate a decision tree from each subdataset. Every tree has a say in the final prediction.

Here’s a bare-bones sketch of a classification bagging system:

In [10]:

def bagged_learner(dataset, base_model, num_models=10):
    # a bare-bones sketch: dataset is an (examples, targets) pair
    examples, targets = dataset
    N = len(examples)
    models = []
    for n in range(num_models):
        # bootstrap sample: N example indices drawn with replacement
        bs_idx = np.random.choice(np.arange(N), N, replace=True)
        models.append(base_model().fit(examples[bs_idx], targets[bs_idx]))
    return models

def bagged_predict_class(models, example):
    # take the most frequent (mode) predicted class as the result
    preds = [m.predict(example)[0] for m in models]
    return pd.Series(preds).mode()[0]

We can talk about some practical concerns of bagging using the bias-variance terminology we introduced in Section 5.6. (And you thought that was all just theoretical hand-waving!) Our process of combining many predictions is good at balancing out high variance: conceptually, we’re OK with overfitting in our base model. We’ll overfit and then let the averaging—or, here, taking the biggest vote getter with the mode—smooth out the rough edges. Bagging doesn’t help with bias, but it can help significantly reduce variance. In practice, we want the base_model to have a low bias.

Conceptually, since our learners are created from samples that are taken with replacement, the learners are independent of one another. If we really wanted to, we could train each of the different base models on a separate computer and then combine them together at the end. We could use this separation to great effect if we want to use parallel computing to speed up the training process.
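
We rarely have to build that machinery ourselves. As a minimal sketch (the parameter values are arbitrary), sklearn's off-the-shelf bagger will happily spread the component training over processor cores via n_jobs:

# bag 50 decision trees; n_jobs=-1 uses all available processor cores
bagged_trees = ensemble.BaggingClassifier(tree.DecisionTreeClassifier(),
                                          n_estimators=50,
                                          n_jobs=-1)
skms.cross_val_score(bagged_trees, digits_ftrs, digits_tgt)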

12.3.3 Through the Random Forest

Random forests (RFs) are a specific type of bagged learner built on top of decision trees. RFs use a variation on the decision trees we’ve seen. Standard decision trees take the best-evaluated feature of all the features to create a decision node. In contrast, RFs subselect a random set of features and then take the best of that subset. The goal is to force the trees to be different from each other. You can imagine that even over a random selection of the examples, a single feature might be tightly related to the target. This alpha feature would then be pretty likely to be chosen as the top split point in each tree. Let’s knock down the monarch of the hill.

Instead of a single feature being the top decision point in many bootstrapped trees, we’ll selectively ignore some features and introduce some randomness to shake things up. The randomness forces different trees to consider a variety of features. The original random forest algorithm relied on the same sort of majority voting that we described for bagging. However, sklearn’s version of RFs extracts class probabilities from each tree in the forest and then averages those to get a final answer: the class with the highest average probability is the winner.

If we start with our bagged_learner code from above, we still need to modify our tree-creation code—the component-model building step—from Section 8.2. It is a pretty simple modification, shown as the new first step below:

  1. Randomly select a subset of features for this tree to consider.

  2. Evaluate the selected features and splits and pick the best feature-and-split.

  3. Add a node to the tree that represents the feature-split.

  4. For each descendant, work with the matching data and either:

    • If the targets are similar enough, return a predicted target.

    • If not, return to step 1 and repeat.

Using that pseudocode as our base estimator, together with the bagged_learner code, gets us a quick prototype of a random forest learning system.
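
In practice, we reach for sklearn's ready-made version. Here is a minimal sketch; max_features controls how many randomly chosen features each split is allowed to consider, and the values below are just example choices:

# each split in each tree sees a random subset of at most 4 features
rf = ensemble.RandomForestClassifier(n_estimators=100, max_features=4)
skms.cross_val_score(rf, digits_ftrs, digits_tgt)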

Extreme Random Forests and Feature Splits One other variant—yet another bagged-tree (YABT)—is called an extreme random forest. No, this is not the X-Games of decision-tree sports. Here, extreme refers to adding yet another degree of randomness to the model creation process. Give a computer scientist an idea—replacing a deterministic method with randomness—and they will apply it everywhere. In extreme random forests, we make the following change to the component-tree building process:

  1. Select split points at random and take the best of those.

So, in addition to randomly choosing what features are involved, we also determine what values are important by coin flipping. It’s amazing that this technique works—and it can work well indeed. Until now, I have not gone into great detail about the split selection process. I don’t really care to, since there are many, many discussions of it around. I particularly like how Foster and Provost approach it in their book that I reference at the end of the chapter.

Let me give you the idea behind making good split points with an example. Suppose we wanted to predict good basketball players from simple biometric data. Unfortunately for folks like myself, height is a really good predictor of basketball success. So, we might start by considering how height is related to a dataset of successful and—like myself—nonsuccessful basketball players. If we order all of the heights, we might find that people from 4’5” to 6’11” have tried to play basketball. If I introduce a split-point at 5’0”, I’ll probably have lots of unsuccessful players on the less-than side. That’s a relatively similar group of people when grouped by basketball success. But, on the greater-than side, I’ll probably have a real mix of both classes. We say that we have low purity above the split point and high purity below the split point. Likewise, if I set a split point at 6’5”, there are probably a lot of successful people on the high side, but the low side is quite mixed. Our goal is to find that just right split point on height that gives me as much information about the success measure—for both the taller and shorter folks—as possible. Segmenting off big heights and low heights gets me a lot of information. In the middle, other factors, such as the amount of time playing basketball, matter a lot.

So, extreme random forests don’t consider all possible split points. They merely choose a random subset of points to consider, evaluate those points, and take the best—of that limited set.
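
sklearn calls these extremely randomized trees. A quick sketch mirrors the random forest interface; again, the parameter values are only examples:

# extreme random forest: random feature subsets and random split candidates
xrf = ensemble.ExtraTreesClassifier(n_estimators=100, max_features=4)
skms.cross_val_score(xrf, digits_ftrs, digits_tgt)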

12.4 Boosting

I remember studying for vocabulary and geography tests using flash cards. I don’t know if the App Generation knows about these things. They were little pieces of paper—two-sided, like most pieces of paper—and on one side you would put a word or a noun or a concept and on the other side you would put a definition or an important characteristic of that thing. For example, one side might be gatto and the other side cat (for an Italian vocab test), or Lima on one side and capital of Peru on the other (for a world geography test).

Studying was a simple act of flipping through the flash cards—seeing one side and recalling the other. You could use both sides to go (1) from concept to definition or (2) from definition to concept. Now, typically, I wouldn’t keep all the cards in the deck. Some of the definitions and concepts became easy for me and I’d put those cards aside (Figure 12.6) so I could focus on the difficult concepts. Eventually, I whittled the deck down to a fairly small pile—the hard stuff—which I kept studying until the exam.

Figure 12.6 When studying with flashcards, we can selectively remove the easy cards and focus our attention on the hard examples. Boosting uses a similar approach.

The process of taking cards out of the deck is a bit like weighting the likelihood of seeing those cards but in an extreme way: it takes the probability down to zero. Now, if I really wanted to be thorough, instead of removing the easy cards, I could add duplicates of the hard cards. The effect would be similar: hard cards would be seen more often and easy cards would be seen less. The difference is that the process would be gradual.

We can apply this same idea—focusing on the hard examples—to a learning system. We start by fitting a Simple-McSimple Model to some hard data. Not surprisingly, we get a lot of examples wrong. But surprisingly, we do get a few right. We refocus our efforts on the harder examples and try again. That repeated process of focusing our efforts on harder examples—and considering the easier examples done(-ish)—is the heart of boosting.

12.4.1 Boosting Details

With boosting, we learn a simple model. Then, instead of resampling equally from the base dataset, we focus on the examples that we got wrong and develop another model. We repeat that process until we are satisfied or run out of time. Thus, boosting is a sequential method and the learners developed at later stages depend on what happened at earlier stages. While boosting can be applied to any learning model as the primitive component model, often we use decision stumps—decision trees with only one split point. Decision stumps have a depth of one.

We haven’t talked about what learning with weighted data means. You can think of it in two different ways. First, if we want to perform weighted training and example A has twice the weight of example B, we’ll simply duplicate example A twice in our training set and call it a day. The alternative is to incorporate the weighting in our error or loss measures. We can weight errors by the example weight and get a weighted error. Then, we find the best knob settings with respect to the weighted errors, not raw errors.
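
Here is a tiny worked example of that equivalence, using two made-up examples where A should count twice as much as B:

errors = np.array([1, 0])     # example A is wrong, example B is right

# view 1: duplicate A in the training data, so the errors become [1, 1, 0]
duplicated_error = np.mean([1, 1, 0])       # 0.667

# view 2: keep one copy of each example, but weight A twice as heavily
weights = np.array([2/3, 1/3])
weighted_error = np.dot(weights, errors)    # also 0.667

print(duplicated_error, weighted_error)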

In a very raw, pseudocode form, these steps look like:

  1. Initialize example weights to 1/N and m to zero.

  2. Repeat until done:

    1. Increment m.

    2. Fit a new classifier (often a stump), C_m, to the weighted data.

    3. Compute weighted error of new classifier.

    4. The classifier weight, wgt_m, is a function of the weighted errors.

    5. Update example weights based on old example weights, classifier errors, and classifier weights.

  3. For a new example, after m repeats, output the prediction from the majority vote of the classifiers C_m, weighted by wgt_m.

We can take some massive liberties (don’t try this at home!) and write some Pythonic pseudocode:

In [11]:

def my_boosted_classifier(base_classifier, bc_args,
                          examples, targets, M):
    N = len(examples)
    data_weights = np.full(N, 1/N)
    models, model_weights = [], []

    for i in range(M):
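        # (pseudocode) reweight stands in for duplicating or weighting
        # examples in proportion to data_weights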
        weighted_dataset = reweight((examples,targets),
                                    data_weights)
        this_model = base_classifier(*bc_args).fit(*weighted_dataset)

        errors = this_model.predict(examples) != targets
        weighted_error = np.dot(data_weights, errors)

        # magic reweighting steps
        this_model_wgt = np.log((1-weighted_error) / weighted_error)
        data_weights   *= np.exp(this_model_wgt * errors)
        data_weights   /= data_weights.sum() # normalize to 1.0

        models.append(this_model)
        model_weights.append(this_model_wgt)

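    # note: sklearn's VotingClassifier expects (name, model) pairs and
    # refits them when you call fit; this return is only illustrative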
    return ensemble.VotingClassifier(models,
                                     voting='soft',
                                     weights=model_weights)

You’ll notice a few things here. We picked a value of M out of a hat. Certainly, that gives us a deterministic stopping point: we’ll only do M iterations. But we could introduce a bit more flexibility here. For example, sklearn stops its discrete AdaBoost—a specific boosting variant, similar to what we used here—when either the accuracy hits 100% or this_model starts doing worse than a coin flip. Also, instead of reweighting examples, we could perform weighted bootstrap sampling—it’s simply a different way to prioritize some data over another.

Boosting very naturally deals with bias in our learning systems. By combining many simple, highly biased learners, boosting reduces the overall bias in the results. Boosting can also reduce variance.

In boosting, unlike bagging, we can’t build the primitive models in parallel. We need to wait on the results of one pass so we can reweight our data and focus on the hard examples before we start our next pass.

12.4.1.1 Improvements with Boosting Iterations

The two major boosting classifiers in sklearn are GradientBoostingClassifier and AdaBoostClassifier. Each can be told the maximum number of component estimators—our M argument to my_boosted_classifier—to use through n_estimators. I’ll leave details of the boosters to the end notes, but here are a few tidbits. AdaBoost is the classic forebear of modern boosting algorithms. Gradient Boosting is a newer twist that allows us to plug in different loss functions and get different models as a result. For example, if we set loss="exponential" with the GradientBoostingClassifier, we get a model that is basically AdaBoost. The result gives us code similar to my_boosted_classifier.
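
To make that concrete, here is a hedged sketch on a two-class slice of the digits data, since sklearn's exponential loss only handles binary problems:

# exponential loss recovers AdaBoost-like behavior (binary targets only)
two_class = digits_tgt < 2
ada_ish = ensemble.GradientBoostingClassifier(loss="exponential")
skms.cross_val_score(ada_ish,
                     digits_ftrs[two_class],
                     digits_tgt[two_class])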

Since boosting is an iterative process, we can reasonably ask how our model improves over the cycles of reweighting. sklearn makes it relatively easy to access that progress through staged_predict. staged_predict operates like predict except it follows the predictions made on examples at each step—or stage—of the learning process. We need to convert those predictions into evaluation metrics if we want to quantify the progress.

In [12]:

model = ensemble.AdaBoostClassifier()
stage_preds = (model.fit(iris_train_ftrs, iris_train_tgt)
                    .staged_predict(iris_test_ftrs))
stage_scores = [metrics.accuracy_score(iris_test_tgt,
                                       pred) for pred in stage_preds]
fig, ax = plt.subplots(1,1,figsize=(4,3))
ax.plot(stage_scores)
ax.set_xlabel('# steps')
ax.set_ylabel('accuracy');

[Output plot: accuracy versus the number of boosting steps.]

12.5 Comparing the Tree-Ensemble Methods

Let’s now compare how these team-based methods work on the digits data. We’ll start with two simple baselines: a single decision stump (equivalent to a max_depth=1 tree) and a single max_depth=3 tree. We’ll also build 100 different forests of stumps, with the number of stumps ranging from 1 to 100.

In [13]:

def fit_predict_score(model, ds):
    return skms.cross_val_score(model, *ds, cv=10).mean()

stump  = tree.DecisionTreeClassifier(max_depth=1)
dtree  = tree.DecisionTreeClassifier(max_depth=3)
forest = ensemble.RandomForestClassifier(max_features=1, max_depth=1)
tree_classifiers = {'stump' : stump, 'dtree' : dtree, 'forest': forest}

max_est = 100
data = (digits_ftrs, digits_tgt)
stump_score   = fit_predict_score(stump, data)
tree_score    = fit_predict_score(dtree, data)
forest_scores = [fit_predict_score(forest.set_params(n_estimators=n),
                                   data)
                 for n in range(1,max_est+1)]

We can view those results graphically:

In [14]:

fig, ax = plt.subplots(figsize=(4,3))

xs = list(range(1,max_est+1))
ax.plot(xs, np.repeat(stump_score, max_est), label='stump')
ax.plot(xs, np.repeat(tree_score, max_est),  label='tree')
ax.plot(xs, forest_scores, label='forest')

ax.set_xlabel('Number of Trees in Forest')
ax.set_ylabel('Accuracy')
ax.legend(loc='lower right');

[Output plot: accuracy versus the number of trees in the forest.]

Comparing the forests to the two baselines shows that a single-stump forest acts a lot like—unsurprisingly—a stump. But then, as the number of stumps grows, we quickly outperform even our moderately sized tree.

Now, to see how the boosting methods progress over time and to make use of cross-validation—to make sure we’re comparing apples to apples—we need to manually manage a bit of the cross-validation process. It’s not complicated, but it is slightly ugly. Essentially, we need to generate the fold indices by hand with StratifiedKFold (we looked at this in Section 5.5.2) and then use those indices to select the proper parts of our dataset. We could simply place this code inline, below, but that really obscures what we’re trying to do.

In [15]:

def my_manual_cv(dataset, k=10):
    ' manually generate cv-folds from dataset '
    # expect ftrs, tgt tuple
    ds_ftrs, ds_tgt = dataset
    manual_cv = skms.StratifiedKFold(k).split(ds_ftrs,
                                              ds_tgt)
    for (train_idx, test_idx) in manual_cv:
        train_ftrs = ds_ftrs[train_idx]
        test_ftrs  = ds_ftrs[test_idx]
        train_tgt  = ds_tgt[train_idx]
        test_tgt   = ds_tgt[test_idx]

        yield (train_ftrs, test_ftrs,
               train_tgt, test_tgt)

For this comparison, we’ll use a deviance loss for our gradient boosting classifier. The result is that our classifier acts like logistic regression. We tinker with one important parameter to give AdaBoostClassifier a legitimate shot at success on the digits data. We adjust the learning_rate, which is an additional factor—literally, a multiplier—on the weights of the estimators. You can think of it as multiplying this_model_wgt in my_boosted_classifier.

In [16]:

AdaBC  = ensemble.AdaBoostClassifier
GradBC = ensemble.GradientBoostingClassifier
boosted_classifiers = {'boost(Ada)' : AdaBC(learning_rate=2.0),
                       'boost(Grad)' : GradBC(loss="deviance")}
mean_accs = {}
for name, model in boosted_classifiers.items():
    model.set_params(n_estimators=max_est)
    accs = []
    for tts in my_manual_cv((digits_ftrs, digits_tgt)):
        train_f, test_f, train_t, test_t = tts
        s_preds = (model.fit(train_f, train_t)
                        .staged_predict(test_f))
        s_scores = [metrics.accuracy_score(test_t, p) for p in s_preds]
        accs.append(s_scores)
    mean_accs[name] = np.array(accs).mean(axis=0)
mean_acc_df = pd.DataFrame.from_dict(mean_accs,orient='columns')

Pulling out the individual, stage-wise accuracies is a bit annoying. But the end result is that we can compare the number of models we are combining between the different ensembles.

In [17]:

xs = list(range(1,max_est+1))
fig, (ax1, ax2) = plt.subplots(1,2,figsize=(8,3),sharey=True)
ax1.plot(xs, np.repeat(stump_score, max_est), label='stump')
ax1.plot(xs, np.repeat(tree_score, max_est),  label='tree')
ax1.plot(xs, forest_scores, label='forest')
ax1.set_ylabel('Accuracy')
ax1.set_xlabel('Number of Trees in Forest')
ax1.legend()

mean_acc_df.plot(ax=ax2)
ax2.set_ylim(0.0, 1.1)
ax2.set_xlabel('# Iterations')
ax2.legend(ncol=2);

[Output plots: accuracy versus the number of trees in the forest (left) and the number of boosting iterations (right).]

After adding more trees to the forest and letting more iterations pass for AdaBoost, the forest and AdaBoost have broadly similar performance (near 80% accuracy). However, the AdaBoost classifier doesn’t seem to perform nearly as well as the GradientBoosting classifier. You can play around with some hyperparameters—GridSearch is seeming mighty useful about now—to see if you can improve any or all of these.

12.6 EOC

12.6.1 Summary

The techniques we’ve explored in this chapter bring us to some of the more powerful models available to the modern machine learning practitioner. They build conceptually and literally on top of the models we’ve seen throughout our tour of learning methods. Ensembles bring together the best aspects of teamwork to improve learning performance compared to individuals.

12.6.2 Notes

In machine learning, the specialist learning scenario typically has terms like additive, mixture, or expert in the name. The key is that the component models only apply to certain areas of the input. We can extend this to a fuzzy specialist scenario where applying to a region need not be a binary yes or no but can be given by a weight. Models can gradually turn on and off in different regions.

Not to get too meta on you, but many simple learners—linear regression and decision trees, I’m looking at you—can be viewed as ensembles of very simple learning components combined together. For example, we combine multiple decision stumps to create a standard decision tree. To make a prediction, a decision tree asks questions of its decision stumps and ends up with an answer. Likewise, a linear regression takes the contribution of each feature and then adds up—combines with a dot product, to be specific—those single-feature predictions into a final answer. Now, we don’t normally talk about decision trees and linear regression as ensembles. Just keep in mind that there is nothing magical about taking simple answers or predictions and combining them in some interesting and useful ways.
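
For example, a linear regression prediction is just the sum of per-feature contributions, which we can check directly with made-up weights and a made-up example:

w = np.array([2.0, -1.0, 0.5])    # made-up weights, one per feature
x = np.array([3.0,  4.0, 10.0])   # one made-up example

per_feature = w * x               # each feature's individual contribution
print(per_feature.sum(), np.dot(w, x))   # same answer both ways: 7.0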

We computed a number of bootstrap estimates of a statistic: first the mean and then classifier predictions. We can use the same technique to compute a bootstrap estimate of variability (variance)—using the usual formula for variance—and we can perform other statistical analyses in a similar manner. These methods are very broadly applicable in scenarios that don’t lend themselves to simple formulaic calculations. However, they do depend on some relatively weak but extremely technical assumptions—compact differentiability is the term to google—to have theoretical support. Often, we charge bravely ahead, even without complete theoretical support.
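
With the code from Section 12.3.1, swapping in a different statistic is a one-liner. For instance, reusing compute_bootstrap_statistic and our toy dataset from above, a bootstrap estimate of the variance is:

bs_var = compute_bootstrap_statistic(dataset, 100, np.var)
print("{:5.2f}".format(bs_var))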

Boosting Boosting is really amazing. It can continue to improve its testing performance even after its training performance is perfect. You can think about this as smoothing out unneeded wiggles as it continues iterating. While it is fairly well agreed that bagging reduces variance and boosting reduces bias, it is less well agreed that boosting reduces variance. My claim to this end is based on a technical report by Breiman, “Bias, Variance, and Arcing Classifiers.” Here, arcing is a close cousin to boosting. An additional useful point from that report: bagging is most useful for unstable—high-variance—methods like decision trees.

The form of boosting we discussed is called AdaBoost—for adaptive boosting. The original form of boosting—pre-AdaBoost—was quite theoretical and had issues that rendered it impractical in many scenarios. AdaBoost resolved those issues, and subsequent research connected AdaBoost to fitting models with an exponential loss. In turn, substituting other loss functions led to boosting methods appropriate for different kinds of data. In particular, gradient boosting lets us swap out a different loss function and get different boosted learners, such as boosted logistic regression. My boosting pseudocode is based on Russell and Norvig’s excellent Artificial Intelligence: A Modern Approach.

Aside from sklearn, there are other boosting implementations out there. xgboost—for extreme gradient boosting—is the newest kid on the block. It uses a technique—an optimization strategy—that produces better weights as it calculates the weights of the component models. It also uses slightly less tiny component models. Coupling better weighting with more complex component models, xgboost has a cult-like following on Kaggle (a social website, with competitions, for machine learning enthusiasts).

We can’t always use xgboost’s technique but we can use it in common learning scenarios. Here’s a super quick code demo:

In [18]:

# conda install py-xgboost
import xgboost
# gives us xgboost.XGBRegressor, xgboost.XGBClassifier
# which interface nicely with sklearn
# see docs at
# http://xgboost.readthedocs.io/en/latest/parameter.html
xgbooster = xgboost.XGBClassifier(objective="multi:softmax")
scores = skms.cross_val_score(xgbooster, iris.data, iris.target, cv=10)
print(scores)
[1.    0.9333 1.     0.9333 0.9333 0.9333 0.9333 0.9333 1.     1.     ]

12.6.3 Exercises

  1. Ensemble methods are very, very powerful—but they come at a computational cost. Evaluate and compare the resource usage (memory and time) of several different tree-based learners: stumps, decision trees, boosted stumps, and random forests.

  2. Create a voting classifier with a few—five—relatively old-school base learners, for example linear regression. You can feed the base learners with datasets produced by cross-validation, if you’re clever. (If not, you can MacGyver something together—but that just means being clever in a different way!) Compare the bias and variance—underfitting and overfitting—of the base method developed on the whole training set and the voting method by assessing them on training and testing sets and overlaying the results. Now, repeat the same process using bagged learners. If you are using something besides decision trees, you might want to look at sklearn.ensemble.BaggingRegressor.

  3. You can implement a bagged classifier by using some of the code we wrote along with sklearn’s VotingClassifier. It gives you a choice in how you use the votes: they can be hard (everyone gets one vote and majority wins) or soft (everyone gives a probability weight to a class and we add those up—the biggest value wins). sklearn’s random forest uses the equivalent of soft. You can compare that with a hard voting mechanism. soft is recommended when the probabilities of the base models are well calibrated, meaning they reflect the actual probability of the event occurring. We aren’t going to get into it, but some models are not well calibrated. They can classify well, but they can’t give reasonable probabilities. You can think of it as finding a good boundary, but not having insight into what goes on around the boundary. Assessing this would require knowing real-world probabilities and comparing them to the probabilities generated by the model—that’s a step beyond simply having target classes.

  4. Since we like to use boosting with very simple models, if we turn our attention to linear regression we might think about using simple constant models as our base models. That is just a horizontal line: y = b. Compare good old-fashioned linear regression with boosted constant regressions on a regression dataset.

  5. What happens if we use deeper trees in ensembles? For example, what happens if you use trees of depth two or three as the base learners in boosting? Do you see any significant improvements or failures?

  6. We saw AdaBoost lose out to GradientBoosting. Use a GridSearch to try several different parameter values for each of these ensemble methods and compare the results. In particular, you might try to find a good value for the learning rate.
