There's more...

The ROC curve loses its credibility as a performance measure when we are dealing with class imbalance. That is why, in such cases, we should use another curve: the Precision-Recall curve. The reason is that neither precision nor recall uses the true negatives; both metrics focus only on the correct prediction of the minority class (the positive one).
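To make this concrete, the following minimal sketch uses a hypothetical confusion matrix (the counts are purely illustrative, not taken from the recipe's data) to show that the true negatives appear in neither formula:

# hypothetical confusion matrix counts -- illustrative only
tp, fp, fn, tn = 30, 10, 20, 940

# tn does not appear in either formula
precision_manual = tp / (tp + fp)   # 30 / 40 = 0.75
recall_manual = tp / (tp + fn)      # 30 / 50 = 0.60

print(f'Precision: {precision_manual:.2f}, Recall: {recall_manual:.2f}')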

Calculate precision and recall for different thresholds:

# obtain the predicted probabilities of the positive class
y_pred_prob = tree_classifier.predict_proba(X_test_ohe)[:, 1]
precision, recall, thresholds = metrics.precision_recall_curve(y_test,
                                                               y_pred_prob)

Having calculated the required elements, we can plot the curve:

ax = plt.subplot()
ax.plot(recall, precision,
        label=f'PR-AUC = {metrics.auc(recall, precision):.2f}')
ax.set(title='Precision-Recall Curve',
       xlabel='Recall',
       ylabel='Precision')
ax.legend()

Running the code results in the following plot:

As a summary metric, we can approximate the area under the Precision-Recall curve by calling metrics.auc(recall, precision). Like the ROC-AUC, the PR-AUC ranges from 0 to 1, where 1 indicates a perfect model; unlike the ROC-AUC, however, its no-skill baseline is not 0.5 but the fraction of positive observations in the data. A model with a PR-AUC of 1 can identify all of the positive observations (perfect recall) without wrongly labeling a single negative observation as a positive one (perfect precision). We can consider models whose curves bow towards the (1, 1) point to be skillful.
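As a related check, the short sketch below reuses y_test, y_pred_prob, precision, and recall from the snippets above. It also computes metrics.average_precision_score, an alternative summary of the Precision-Recall curve offered by scikit-learn, and the no-skill baseline as the share of positive observations (this last step assumes y_test holds 0/1 labels):

# imports repeated here so the sketch is self-contained
import numpy as np
from sklearn import metrics

pr_auc = metrics.auc(recall, precision)
avg_precision = metrics.average_precision_score(y_test, y_pred_prob)

# no-skill baseline: the fraction of positive observations
baseline = np.mean(y_test)

print(f'PR-AUC: {pr_auc:.2f}')
print(f'Average precision: {avg_precision:.2f}')
print(f'No-skill baseline: {baseline:.2f}')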
