Ensemble Methods for Multi-Class Classification

Our modern world is already filled with connected devices that collect data about human behavior. For example, our cell phones are small spies in our pockets, tracking the number of steps we take, our routes, or our eating habits. Even the watches we wear can now track everything from our step count to our heart rate at any given moment.

In all these situations, the gadgets try to guess what the user is doing based on the collected data in order to report on the user's activities throughout the day. From a machine learning perspective, the task can be viewed as a classification problem: detecting patterns in the collected data and assigning the right activity category to them (for example, swimming, running, or sleeping). Importantly, it is still a supervised problem: to train a model, we need to provide observations annotated with their actual categories.
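To make the supervised setup concrete, here is a minimal sketch in plain Python. It assumes hypothetical two-feature observations (steps per minute and heart rate) with made-up values, and it uses a simple nearest-centroid rule as a stand-in classifier; this is not the ensemble approach used later in the chapter, only an illustration of training on annotated observations and then assigning a category to a new one.

```python
import math
from collections import defaultdict

# Hypothetical labeled observations, invented for illustration only:
# (steps per minute, heart rate) -> annotated activity category.
training = [
    ((0.0, 55.0), "sleeping"),
    ((1.0, 58.0), "sleeping"),
    ((100.0, 95.0), "walking"),
    ((110.0, 100.0), "walking"),
    ((160.0, 150.0), "running"),
    ((170.0, 155.0), "running"),
]

def fit_centroids(examples):
    """Compute the mean feature vector (centroid) of each labeled category."""
    sums = {}
    counts = defaultdict(int)
    for features, label in examples:
        if label not in sums:
            sums[label] = [0.0] * len(features)
        for i, value in enumerate(features):
            sums[label][i] += value
        counts[label] += 1
    return {
        label: tuple(total / counts[label] for total in sums[label])
        for label in sums
    }

def predict(centroids, features):
    """Assign the category whose centroid is closest in Euclidean distance."""
    return min(centroids, key=lambda label: math.dist(centroids[label], features))

centroids = fit_centroids(training)
print(predict(centroids, (165.0, 152.0)))
```

Without the annotated categories in `training`, there would be nothing to compute centroids from, which is the sense in which the problem is supervised.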

In this section, we focus on ensemble methods for multi-class classification problems (sometimes referred to as multinomial classification), using a sensor dataset provided by the UCI dataset library.

Note that multi-class classification should not be confused with multi-label classification, in which multiple labels can be predicted for a given example. For example, a blog post can be tagged with multiple labels, since one post can cover any number of topics; in multi-class classification, however, we are forced to choose exactly one out of N possible labels, where N > 2.
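The difference shows up directly in how the target is represented. The following is a small sketch in plain Python; the class lists and helper names (`encode_multiclass`, `encode_multilabel`) are hypothetical and chosen only for illustration. A multi-class target is a single class index, while a multi-label target is an indicator vector in which any number of entries may be set.

```python
# Illustrative class sets (assumptions, not taken from the dataset).
ACTIVITIES = ["swimming", "running", "sleeping"]
TOPICS = ["spark", "scala", "ml"]

def encode_multiclass(label, classes):
    """Multi-class: exactly one label per example, encoded as its index."""
    return classes.index(label)

def encode_multilabel(labels, classes):
    """Multi-label: a 0/1 indicator per class; several may be 1 at once."""
    return [1 if c in labels else 0 for c in classes]

# One activity per observation.
activity_target = encode_multiclass("running", ACTIVITIES)

# One blog post tagged with two topics.
post_target = encode_multilabel({"spark", "ml"}, TOPICS)

print(activity_target)
print(post_target)
```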

In this chapter, the reader will learn about the following topics:

  • Preparing data for multi-class classification, including handling missing values
  • Multi-class classification using the Spark Random Forest (RF) algorithm
  • Evaluating the quality of Spark classification models using different measures
  • Building H2O tree-based classification models and exploring their quality