Classification problems

Classification problems have categorical outcome variables. Most classifiers output a score that indicates how likely an observation is to belong to a certain class; in a second step, these scores are translated into actual class predictions.

In the binary case, where we will label the classes positive and negative, the score typically varies between zero and one, or is normalized accordingly. Once the scores are converted into 0-1 predictions, there can be four outcomes, because each of the two existing classes can be either correctly or incorrectly predicted. With more than two classes, more outcomes arise if you differentiate among the several potential mistakes.
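The four outcomes can be counted directly by comparing thresholded predictions against the true labels. The following sketch uses hypothetical scores and labels purely for illustration:

```python
import numpy as np

# Hypothetical true labels and classifier scores, for illustration only
y_true = np.array([1, 0, 1, 1, 0, 0, 1, 0])
scores = np.array([0.9, 0.4, 0.7, 0.3, 0.6, 0.2, 0.8, 0.1])

# Convert scores into 0-1 predictions using a threshold of 0.5
threshold = 0.5
y_pred = (scores >= threshold).astype(int)

# The four possible outcomes for binary predictions
tp = int(np.sum((y_pred == 1) & (y_true == 1)))  # true positives
fp = int(np.sum((y_pred == 1) & (y_true == 0)))  # false positives
tn = int(np.sum((y_pred == 0) & (y_true == 0)))  # true negatives
fn = int(np.sum((y_pred == 0) & (y_true == 1)))  # false negatives

print(tp, fp, tn, fn)  # → 3 1 3 1
```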

All error metrics are computed from the breakdown of predictions across the four cells of the 2 x 2 confusion matrix that associates actual and predicted classes. The metrics listed in the table shown in the following diagram, such as accuracy, evaluate a model for a given threshold:
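As a sketch of how such metrics derive from the four confusion-matrix cells, the snippet below computes accuracy along with the closely related precision, recall, and F1 score; the cell counts are hypothetical and chosen only for illustration:

```python
# Hypothetical confusion-matrix cell counts (100 observations in total)
tp, fp, tn, fn = 40, 10, 45, 5

accuracy  = (tp + tn) / (tp + fp + tn + fn)  # share of all predictions that are correct
precision = tp / (tp + fp)                   # share of positive predictions that are correct
recall    = tp / (tp + fn)                   # share of actual positives that were found
f1        = 2 * precision * recall / (precision + recall)  # harmonic mean of the two

print(accuracy, precision, recall, f1)
```

Accuracy alone can be misleading when classes are imbalanced, which is why the complementary metrics in the table are usually reported alongside it.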

A classifier does not necessarily need to output calibrated probabilities; it only needs to produce scores whose relative order separates positive from negative cases. Hence, the threshold is a decision variable that can, and should, be optimized, taking into account the costs and benefits of correct and incorrect predictions. A lower threshold implies more positive predictions and a potentially higher false positive rate; for a higher threshold, the opposite tends to hold.
