Using classification models to predict class labels

With these tools in hand, we can now take on our first real classification example.

Consider the small town of Randomville, where people are crazy about their two sports teams, the Randomville Reds and the Randomville Blues. The Reds had been around for a long time, and people loved them. But then, some out-of-town millionaire came along and bought the Reds' top scorer and started a new team, the Blues. To the discontent of most Reds fans, the top scorer would go on to win the championship title with the Blues. Years later, he would return to the Reds, despite some backlash from fans who could never forgive him for his earlier career choices. But anyway, you can see why fans of the Reds don't necessarily get along with fans of the Blues. In fact, these two fan bases are so divided that they never even live next to each other. I've even heard stories where the Red fans deliberately moved away once Blues fans moved in next door. True story!

Anyway, we are new in town and are trying to sell some Blues merchandise to people by going from door to door. However, every now and then we come across a bleeding-heart Reds fan who yells at us for selling Blues stuff and chases us off their lawn. Not nice! It would be much less stressful, and a better use of our time, to avoid these houses altogether and just visit the Blues fans instead.

Confident that we can learn to predict where the Reds fans live, we start keeping track of our encounters. If we come by a Reds fan's house, we draw a red triangle on our handy town map; otherwise, we draw a blue square. After a while, we get a pretty good idea of where everyone lives:

However, now, we approach the house that is marked as a green circle in the preceding map. Should we knock on their door? We try to find some clue as to what team they prefer (perhaps a team flag hanging from the back porch), but we can't see any. How can we know if it is safe to knock on their door?

What this silly example illustrates is exactly the kind of problem a supervised learning algorithm can solve. We have a bunch of observations (houses, their locations, and their colors) that make up our training data. We can use this data to learn from experience so that, when we face the task of predicting the color of a new house, we can make a well-informed estimate.

As we mentioned earlier, fans of the Reds are really passionate about their team, so they would never move next to a Blues fan. Couldn't we use this information and look at all of the neighboring houses, to find out what kind of fan lives in the new house?

This is exactly what the k-NN algorithm would do.

Table of Contents for Using classification models to predict class labels

Create new playlist

Sign In

Sign Up

Table of Contents for
Using classification models to predict class labels