Classification in caret and ROC curves

Classification models are harder to evaluate than regression models because, when we are classifying labels, we might face a severe class imbalance. If, for example, we were predicting whether people are going to finish their university degrees or not, and 50% of people finish their degrees, accuracy would be a perfectly reasonable metric. But what happens when 95% of people finish their degrees? In that case, accuracy becomes a very bad metric: the model may explain most of that 95% well while not working at all for the other 5% of the data.
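To make this concrete, here is a minimal sketch (using made-up numbers rather than real data) of how a model that always predicts the majority class reaches 95% accuracy while learning nothing about the minority class:

# Hypothetical labels: 95 people finish their degrees (1), 5 do not (0).
truth <- c(rep(1, 95), rep(0, 5))

# A "model" that always predicts the majority class.
always_finish <- rep(1, 100)

# Accuracy is 0.95, even though the model never identifies a single dropout.
mean(always_finish == truth)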

There are several ways of assessing how well a classification model works that take class imbalance into account. Apart from all these metrics, we can work with either ROC or precision-recall curves, which allow us to choose a model that has the right performance for each label.

Remember that most classification models actually output the probability that a sample belongs to class A or class B. This probability is later transformed into a proper label (usually a 1/0 label) depending on whether it is greater or lower than a threshold. This threshold is usually assumed to be 0.5, but that doesn't need to be the case. By changing this threshold, we alter the number of samples predicted to be 0 or 1.
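As a minimal sketch (with a made-up vector of probabilities), this is all the thresholding step amounts to:

# Hypothetical predicted probabilities of belonging to class 1.
probs <- c(0.8, 0.3, 0.65, 0.1)

# The usual 0.5 threshold.
ifelse(probs > 0.5, 1, 0)   # 1 0 1 0

# Lowering the threshold turns more samples into predicted 1s.
ifelse(probs > 0.2, 1, 0)   # 1 1 1 0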

More formally, we can define two metrics: the true positive rate and the false positive rate. The former is the proportion of 1s correctly predicted as 1s, whereas the latter is the proportion of 0s incorrectly predicted as 1s. These metrics can be reversed, replacing 1 with 0 in the previous statements. Let's suppose that we have the following predictions and true results:

Predictions    Real values
0.8            1
0.3            0

If we set the threshold to 0.1, both samples are classified as 1s. The true positive rate is 100%, but the false positive rate is also 100% (the single actual 0 is predicted as a 1). If we change the threshold to 0.5, we get a true positive rate of 100% and a false positive rate of 0%. If we finally use a threshold of 0.9, we get a 0% true positive rate and a 0% false positive rate. We can obviously tune the threshold to achieve the balance we want for our model. Plotting the true positive rate against the false positive rate as the threshold varies yields the ROC curve. Models can be compared through the total area under the ROC curve: a good model will have a larger area, meaning that it performs well across all possible thresholds. A nice thing about the ROC curve is that the curve for a random model (one that randomly predicts 0s and 1s) can also be plotted, and it is simply a 45-degree line.
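The following sketch reproduces these numbers for the two-row example above and then uses the pROC package (assuming it is installed) to build a ROC curve and compute its area for a slightly larger, made-up set of predictions:

preds <- c(0.8, 0.3)   # predicted probabilities from the table above
truth <- c(1, 0)       # real values

rates <- function(threshold) {
  predicted <- as.numeric(preds > threshold)
  c(threshold = threshold,
    tpr = sum(predicted == 1 & truth == 1) / sum(truth == 1),
    fpr = sum(predicted == 1 & truth == 0) / sum(truth == 0))
}
t(sapply(c(0.1, 0.5, 0.9), rates))

# ROC curve and area under it for a larger, hypothetical prediction vector.
library(pROC)
probs  <- c(0.9, 0.8, 0.7, 0.6, 0.4, 0.35, 0.3, 0.1)
labels <- c(1, 1, 0, 1, 0, 1, 0, 0)
roc_obj <- roc(labels, probs)
auc(roc_obj)    # total area under the ROC curve
plot(roc_obj)   # plot the curve itself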

Another way of posing this is to define the so-called recall as the true positive rate, and the precision as the proportion of predicted 1s that really are 1s (the number of correctly predicted 1s out of the total number of predicted 1s). By changing the threshold, the precision and recall values change as well, yielding the precision-recall curve.
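Here is a minimal sketch (again with made-up predictions) of how precision and recall move as the threshold changes; the resulting pairs are the points of the precision-recall curve:

probs <- c(0.9, 0.8, 0.7, 0.6, 0.4, 0.35, 0.3, 0.1)   # hypothetical probabilities
truth <- c(1, 1, 0, 1, 0, 1, 0, 0)                    # hypothetical real values

precision_recall <- function(threshold) {
  predicted <- as.numeric(probs > threshold)
  tp <- sum(predicted == 1 & truth == 1)
  c(threshold = threshold,
    precision = ifelse(sum(predicted == 1) > 0, tp / sum(predicted == 1), NA),
    recall    = tp / sum(truth == 1))
}
t(sapply(c(0.05, 0.25, 0.5, 0.75), precision_recall))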

Many data scientists prefer the precision-recall curve over the ROC curve for highly imbalanced datasets. The reason is that precision has the number of predicted 1s in its denominator (presumably low), whereas the false positive rate used by ROC has the number of actual 0s in its denominator (presumably high). This causes the precision-recall curve to differ more drastically between models: a good model will have a substantially larger area under the curve, which is why this area is used to compare them.
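Finally, here is a minimal sketch of how caret can be told to select models by the area under the ROC curve during resampling. It assumes the caret and pROC packages are installed; the simulated data set, the model, and the settings are illustrative choices, not taken from the text:

library(caret)

set.seed(42)
training <- twoClassSim(n = 500)   # simulated two-class data set from caret

ctrl <- trainControl(
  method          = "cv",
  number          = 5,
  classProbs      = TRUE,             # keep class probabilities, not just labels
  summaryFunction = twoClassSummary   # reports ROC AUC, sensitivity, specificity
)

fit <- train(
  Class ~ .,
  data      = training,
  method    = "glm",
  metric    = "ROC",    # select by area under the ROC curve
  trControl = ctrl
)

fit$results   # resampled AUC, sensitivity (TPR), and specificity (1 - FPR)

Swapping twoClassSummary for caret's prSummary (and metric = "ROC" for metric = "AUC") makes the selection use the area under the precision-recall curve instead; that summary function depends on the MLmetrics package.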
