True or false? Positive or negative? Pass or fail? Does the user click on the ad or not? If you have ever asked or encountered questions like these, then you are already familiar with the concept of binary classification.
At its core, binary classification, also referred to as binomial classification, attempts to categorize a set of elements into two distinct groups using a classification rule, which, in our case, can be a machine learning algorithm. This chapter shows how to approach binary classification in the context of Spark and big data. We are going to explain and demonstrate:
- Spark MLlib models for binary classification, including decision trees, random forests, and gradient-boosted machines
- Binary classification support in H2O
- Searching for the best model in a hyperspace of parameters
- Evaluation metrics for binomial models
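
To make the idea of a classification rule concrete before turning to Spark, the following is a minimal sketch in plain Python. It is only an illustration, not the approach used later in the chapter: the `threshold_rule` and `confusion_counts` helpers, the scores, and the labels are all hypothetical and made up for this example. The rule assigns each element to one of two groups, and the counts of correct and incorrect assignments are the raw ingredients of the evaluation metrics for binomial models.

```python
from typing import Callable, List, Tuple


def threshold_rule(threshold: float) -> Callable[[float], int]:
    """Hypothetical classification rule: label 1 if score >= threshold, else 0."""
    return lambda score: 1 if score >= threshold else 0


def confusion_counts(actual: List[int], predicted: List[int]) -> Tuple[int, int, int, int]:
    """Count true positives, false positives, true negatives, false negatives."""
    tp = fp = tn = fn = 0
    for a, p in zip(actual, predicted):
        if p == 1 and a == 1:
            tp += 1
        elif p == 1 and a == 0:
            fp += 1
        elif p == 0 and a == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


# Made-up scores and true labels, purely for illustration.
scores = [0.1, 0.4, 0.35, 0.8]
actual = [0, 0, 1, 1]

rule = threshold_rule(0.5)
predicted = [rule(s) for s in scores]

tp, fp, tn, fn = confusion_counts(actual, predicted)
accuracy = (tp + tn) / len(actual)
print(predicted, (tp, fp, tn, fn), accuracy)
```

In practice, a machine learning algorithm learns the rule from data instead of relying on a hand-picked threshold.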