Investigating advanced classifiers

In Chapter 8, Identifying Credit Default with Machine Learning, we built an entire pipeline aimed at predicting customer default, that is, a customer's inability to repay their debts. For the machine learning part, we used a decision tree classifier, which is one of the most basic algorithms.

There are several ways to potentially improve the model's performance, including:

Gathering more observations
Adding extra features—either by gathering additional data or through feature engineering
Using more advanced models
Tuning the hyperparameters

A common rule of thumb says that data scientists spend 80% of their time on a project gathering and cleaning data and only 20% on the actual modeling. In line with this, adding more data might greatly improve a model's performance, especially when dealing with imbalanced classes in a classification problem. However, finding new data is not always possible, or might simply be too complicated. In that case, the alternatives are to use more advanced models or to tune the hyperparameters to squeeze out some extra performance.

In the default prediction model we worked on in Chapter 8, Identifying Credit Default with Machine Learning, it was not feasible to gather additional data. We can also assume we did our best to manually create new features. In this recipe, we focus on using more advanced classifiers (based on decision trees).
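To make the idea concrete, the following is a minimal sketch that compares a single decision tree with two tree-based ensembles from scikit-learn, a random forest and gradient boosting. It does not use the credit default data from Chapter 8; as an assumption, a synthetic imbalanced dataset produced by `make_classification` (roughly 80/20 class split) stands in for it. The choice of ROC AUC as the metric, the number of trees, and the helper name `evaluate_models` are illustrative assumptions, not the recipe's exact setup.

```python
# Sketch: compare a single decision tree with tree-based ensemble classifiers.
# Assumption: synthetic imbalanced data stands in for the credit default dataset.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.tree import DecisionTreeClassifier

RANDOM_STATE = 42

# Roughly 80% negatives / 20% positives to mimic an imbalanced default problem
# (the proportions are an assumption chosen for illustration).
X, y = make_classification(
    n_samples=1000,
    n_features=20,
    n_informative=8,
    weights=[0.8, 0.2],
    random_state=RANDOM_STATE,
)
print(f"share of positive class: {np.mean(y):.2f}")

# A basic model and two more advanced, decision-tree-based ensembles.
models = {
    "decision_tree": DecisionTreeClassifier(random_state=RANDOM_STATE),
    "random_forest": RandomForestClassifier(n_estimators=200, random_state=RANDOM_STATE),
    "gradient_boosting": GradientBoostingClassifier(random_state=RANDOM_STATE),
}

# Stratified folds keep the class proportions similar in every split.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=RANDOM_STATE)


def evaluate_models(models, X, y, cv):
    """Return the mean cross-validated ROC AUC of each model (hypothetical helper)."""
    results = {}
    for name, model in models.items():
        scores = cross_val_score(model, X, y, cv=cv, scoring="roc_auc")
        results[name] = scores.mean()
    return results


results = evaluate_models(models, X, y, cv)
for name, score in results.items():
    print(f"{name}: mean ROC AUC = {score:.3f}")
```

ROC AUC is used here because it does not depend on a single classification threshold, which matters when classes are imbalanced; with real data, you would evaluate on the same metric and train/test split used for the baseline decision tree so that the comparison is fair.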
