Advanced Machine Learning Models in Finance

In Chapter 8, Identifying Credit Default with Machine Learning, we introduced the workflow of solving a real-life problem using machine learning. We went over the entire pipeline, from cleaning the data to training a model (a classifier, in that case) and evaluating its performance. However, this is rarely the end of the project. There, we used a simple decision tree classifier, which most of the time serves as a benchmark or minimum viable product (MVP). We now move on to a few more advanced topics.

We start the chapter by presenting how to use more advanced classifiers, also based on decision trees. Some of them (such as XGBoost or LightGBM) are frequently used to win machine learning competitions, such as those hosted on Kaggle. Additionally, we introduce the concept of stacking multiple machine learning models to further improve prediction performance.
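
To give a flavor of what is ahead, here is a minimal sketch combining gradient-boosted classifiers with scikit-learn's StackingClassifier. It assumes the xgboost and lightgbm packages are installed, and the synthetic dataset merely stands in for the credit default data used in the recipes.

```python
# A minimal sketch: gradient-boosted base learners combined via stacking.
# The synthetic dataset is a placeholder for the credit default data.
from sklearn.datasets import make_classification
from sklearn.ensemble import StackingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from xgboost import XGBClassifier
from lightgbm import LGBMClassifier

X, y = make_classification(n_samples=1000, n_features=20, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# base learners: two gradient boosting variants and a random forest
estimators = [
    ("xgb", XGBClassifier(n_estimators=100, eval_metric="logloss")),
    ("lgbm", LGBMClassifier(n_estimators=100)),
    ("rf", RandomForestClassifier(n_estimators=100)),
]

# the meta-learner (logistic regression) combines the base learners' predictions
stack = StackingClassifier(estimators=estimators,
                           final_estimator=LogisticRegression())
stack.fit(X_train, y_train)
print(f"Stacked accuracy: {stack.score(X_test, y_test):.3f}")
```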

In the finance industry (but not exclusively), it is crucial to understand the logic behind a model's predictions. For example, a bank must be able to give actual reasons for declining a credit request, or it may try to limit its losses by predicting which customers are likely to default on a loan. That is why we introduce a few methods for investigating feature importance, some of which are model-agnostic.
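
As a preview of the model-agnostic approach, the following sketch uses scikit-learn's permutation_importance: each feature is shuffled in turn and the resulting drop in test-set performance is measured. The classifier and data below are placeholders for whatever model we have already trained.

```python
# A minimal sketch of model-agnostic feature importance via permutation.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, n_features=10, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

clf = RandomForestClassifier(random_state=42).fit(X_train, y_train)

# shuffle each feature n_repeats times and record the performance drop
result = permutation_importance(clf, X_test, y_test,
                                n_repeats=10, random_state=42)
for i in np.argsort(result.importances_mean)[::-1]:
    print(f"feature_{i}: {result.importances_mean[i]:.4f} "
          f"+/- {result.importances_std[i]:.4f}")
```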

Another common real-life problem is dealing with imbalanced data, that is, data in which one class (such as default or fraud) is rarely observed in practice, making it difficult to train a model that accurately captures the minority class. We introduce a few common approaches to handling class imbalance and compare their performance on a credit card fraud dataset.
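
Two of the approaches we will encounter are sketched below: reweighting the loss function via class weights, and oversampling the minority class with SMOTE. The sketch assumes the imbalanced-learn package is installed; the synthetic data has a 1% positive class to mimic a fraud setting.

```python
# A minimal sketch of two common ways to handle class imbalance.
from collections import Counter
from imblearn.over_sampling import SMOTE
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=10_000, weights=[0.99, 0.01],
                           random_state=42)
print("Original class counts:", Counter(y))

# Approach 1: reweight the loss so errors on the minority class cost more
clf_weighted = LogisticRegression(class_weight="balanced",
                                  max_iter=1000).fit(X, y)

# Approach 2: oversample the minority class with synthetic observations
X_res, y_res = SMOTE(random_state=42).fit_resample(X, y)
print("Resampled class counts:", Counter(y_res))
clf_smote = LogisticRegression(max_iter=1000).fit(X_res, y_res)
```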

Lastly, we expand on the hyperparameter tuning presented in the previous chapter. There, we used either an exhaustive grid search or a randomized search, both of which are uninformed: there is no underlying logic in selecting the next set of hyperparameters to investigate. This time, we introduce Bayesian optimization, in which past attempts are used to select the next set of hyperparameters to evaluate. This approach can significantly speed up the tuning phase of our projects.
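
To illustrate the idea, here is a minimal sketch using scikit-optimize's BayesSearchCV, which follows the same interface as GridSearchCV and RandomizedSearchCV. It assumes the scikit-optimize package is installed, and the search space below is purely illustrative, not the one used in the recipe.

```python
# A minimal sketch of Bayesian hyperparameter search with scikit-optimize.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from skopt import BayesSearchCV
from skopt.space import Integer, Real

X, y = make_classification(n_samples=1000, random_state=42)

# an illustrative search space (assumed, not from the recipe)
search_space = {
    "n_estimators": Integer(50, 500),
    "max_depth": Integer(2, 12),
    "min_samples_split": Real(0.01, 0.3),
}

# each iteration uses the results of previous evaluations to pick
# a promising set of hyperparameters to try next
opt = BayesSearchCV(RandomForestClassifier(random_state=42),
                    search_space, n_iter=25, cv=3, random_state=42)
opt.fit(X, y)
print("Best params:", opt.best_params_)
print(f"Best CV score: {opt.best_score_:.3f}")
```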

In this chapter, we present the following recipes:

  • Investigating advanced classifiers
  • Using stacking for improved performance
  • Investigating the feature importance
  • Investigating different approaches to handling imbalanced data
  • Bayesian hyperparameter optimization