Investigating advanced classifiers

In Chapter 8, Identifying Credit Default with Machine Learning, we learned how to build an entire pipeline aimed at predicting customer default, that is, a customer's inability to repay their debts. For the machine learning part, we used a decision tree classifier, which is one of the basic classification algorithms.

There are several ways we could try to improve the model's performance, including:

  • Gathering more observations
  • Adding extra features—either by gathering additional data or through feature engineering
  • Using more advanced models
  • Tuning the hyperparameters
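As an illustration of the last option, tuning the hyperparameters, the following minimal sketch runs a grid search over a few decision tree settings. It assumes synthetic, imbalanced data generated with scikit-learn's make_classification in place of the Chapter 8 credit default dataset, and the grid values and the choice of recall as the scoring metric are illustrative assumptions, not the book's settings.

```python
# Minimal sketch: hyperparameter tuning of a decision tree with GridSearchCV.
# ASSUMPTION: synthetic data replaces the credit default dataset from Chapter 8.
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.tree import DecisionTreeClassifier

# Roughly 80/20 class split to mimic an imbalanced default problem (assumption)
X, y = make_classification(
    n_samples=1000, n_features=10, weights=[0.8, 0.2], random_state=0
)

# Illustrative grid; the values are assumptions, not recommended defaults
param_grid = {
    "max_depth": [3, 5, 10, None],
    "min_samples_leaf": [1, 5, 20],
    "criterion": ["gini", "entropy"],
}

search = GridSearchCV(
    DecisionTreeClassifier(random_state=0),
    param_grid=param_grid,
    scoring="recall",  # share of actual defaulters the model catches
    cv=StratifiedKFold(n_splits=5, shuffle=True, random_state=0),
)
search.fit(X, y)

print(search.best_params_)
print(f"best CV recall: {search.best_score_:.3f}")
```

Stratified folds keep the class proportions similar in every split, which matters when the positive class (default) is the minority.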

A common rule of thumb says that data scientists spend 80% of their time on a project gathering and cleaning data and only 20% on the actual modeling. In line with this, adding more data can greatly improve a model's performance, especially when dealing with imbalanced classes in a classification problem. However, finding new data is not always possible, or it might simply be too complicated. In that case, we can instead use more advanced models or tune the hyperparameters to squeeze out some extra performance.
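One way to gauge whether more observations are likely to help is a learning curve: if the validation score is still rising as the training set grows, additional data may pay off. The sketch below uses scikit-learn's learning_curve on synthetic, imbalanced data; the dataset, the decision tree settings, and the recall metric are all assumptions for illustration.

```python
# Minimal sketch: a learning curve to check whether more data might help.
# ASSUMPTION: synthetic imbalanced data stands in for the real default dataset.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import StratifiedKFold, learning_curve
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(
    n_samples=1000, n_features=10, weights=[0.8, 0.2], random_state=0
)

train_sizes_abs, train_scores, test_scores = learning_curve(
    DecisionTreeClassifier(max_depth=5, random_state=0),
    X,
    y,
    train_sizes=np.linspace(0.1, 1.0, 5),  # 10% ... 100% of each training fold
    cv=StratifiedKFold(n_splits=5, shuffle=True, random_state=0),
    scoring="recall",
    shuffle=True,  # shuffle before taking subsets so small ones contain both classes
    random_state=0,
)

# Average across the 5 CV folds for each training-set size
for size, score in zip(train_sizes_abs, test_scores.mean(axis=1)):
    print(f"{size:4d} training samples -> mean validation recall {score:.3f}")
```

A curve that has flattened out suggests that gathering more of the same kind of data is unlikely to help much, and that better features or models are the more promising route.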

In the default prediction model we worked on in Chapter 8, Identifying Credit Default with Machine Learning, it was not feasible to gather additional data. We can also assume that we did our best to manually create new features. In this recipe, we focus on using more advanced classifiers based on decision trees.
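To give a rough idea of what "more advanced classifiers based on decision trees" can look like in practice, the following sketch compares a single decision tree with two tree-based ensembles available in scikit-learn, RandomForestClassifier and GradientBoostingClassifier. The specific models covered by this recipe are not listed here, so these two are examples only; the synthetic, imbalanced dataset and the chosen metrics (recall and ROC AUC) are also assumptions.

```python
# Minimal sketch: a single decision tree vs. tree-based ensembles.
# ASSUMPTION: synthetic imbalanced data replaces the Chapter 8 dataset.
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.metrics import recall_score, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

RANDOM_STATE = 42

# Roughly 80/20 class split to mimic an imbalanced default problem (assumption)
X, y = make_classification(
    n_samples=2000, n_features=20, n_informative=8,
    weights=[0.8, 0.2], random_state=RANDOM_STATE,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=RANDOM_STATE,
)

models = {
    "decision_tree": DecisionTreeClassifier(random_state=RANDOM_STATE),
    "random_forest": RandomForestClassifier(
        n_estimators=200, random_state=RANDOM_STATE
    ),
    "gradient_boosting": GradientBoostingClassifier(random_state=RANDOM_STATE),
}

results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    y_proba = model.predict_proba(X_test)[:, 1]  # probability of the positive class
    results[name] = {
        "recall": recall_score(y_test, y_pred),
        "roc_auc": roc_auc_score(y_test, y_proba),
    }

for name, metrics in results.items():
    print(f"{name}: recall={metrics['recall']:.3f}, ROC AUC={metrics['roc_auc']:.3f}")
```

Random forests average many decorrelated trees to reduce variance, while gradient boosting fits trees sequentially, each one correcting the errors of the ensemble so far; both usually outperform a single tree, but the actual gain should be checked on the real data.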
