The Machine Learning Process

In this chapter, we will start to illustrate how you can use a broad range of supervised and unsupervised machine learning (ML) models for algorithmic trading. We will explain each model's assumptions and use cases before we demonstrate relevant applications using various Python libraries. The categories of models will include:

  • Linear models for the regression and classification of cross-section, time series, and panel data
  • Generalized additive models, including nonlinear tree-based models, such as decision trees
  • Ensemble models, including random forests and gradient boosting machines
  • Unsupervised linear and nonlinear methods for dimensionality reduction and clustering
  • Neural network models, including recurrent and convolutional architectures
  • Reinforcement learning models

We will apply these models to the market, fundamental, and alternative data sources introduced in the first part of this book. Building on the material covered so far, we will show you how to embed these models in an algorithmic trading strategy, either to generate or combine alpha factors or to optimize the portfolio management process, and how to evaluate the resulting performance.

Many of these models and their applications share several aspects. This chapter covers these common aspects so that the following chapters can focus on model-specific usage. They include the overarching goal of learning a functional relationship from data by optimizing an objective or loss function, as well as the closely related methods of measuring model performance.
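To make the idea of learning a functional relationship by optimizing a loss function concrete, here is a minimal sketch in plain Python. It assumes a linear model, mean squared error as the loss, and gradient descent as the optimizer; the toy data, the function names `mse` and `fit`, and the learning rate are illustrative choices and do not come from this chapter.

```python
# Toy data generated from y = 2x + 1 without noise (hypothetical values).
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [2.0 * x + 1.0 for x in xs]


def mse(slope, intercept):
    """Mean squared error of the line slope * x + intercept on the toy data."""
    return sum((slope * x + intercept - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def fit(learning_rate=0.05, steps=2000):
    """Minimize the MSE with plain gradient descent, starting from zero."""
    slope, intercept = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        # Residuals of the current line on every observation.
        residuals = [slope * x + intercept - y for x, y in zip(xs, ys)]
        # Partial derivatives of the MSE with respect to each parameter.
        grad_slope = 2.0 / n * sum(r * x for r, x in zip(residuals, xs))
        grad_intercept = 2.0 / n * sum(residuals)
        # Step against the gradient to reduce the loss.
        slope -= learning_rate * grad_slope
        intercept -= learning_rate * grad_intercept
    return slope, intercept


slope, intercept = fit()
print(f"slope={slope:.4f}, intercept={intercept:.4f}, loss={mse(slope, intercept):.2e}")
```

The same loss function that drives the fit also serves as a performance measure here, which is why the two topics are so closely related.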

We distinguish between unsupervised and supervised learning and, within supervised learning, between regression and classification problems, and we outline use cases for algorithmic trading. We contrast the use of supervised learning for statistical inference about relationships between input and output data with its use for predicting future outputs from future inputs. We also illustrate how prediction errors result from the model's bias or variance, or from a high noise-to-signal ratio in the data. Most importantly, we present methods to diagnose sources of errors and improve your model's performance.
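For squared-error loss, the three sources of prediction error can be written out explicitly. The following is the standard textbook decomposition, given as a sketch under the assumption that outputs are generated as y = f(x) + ε with zero-mean noise ε of variance σ² that is independent of the training data. The symbols f, \hat{f}, ε, and σ² are notation introduced here for illustration, and the expectation is taken over training sets and noise.

```latex
\mathbb{E}\big[(y - \hat{f}(x))^2\big]
  = \underbrace{\big(\mathbb{E}[\hat{f}(x)] - f(x)\big)^2}_{\text{bias}^2}
  + \underbrace{\mathbb{E}\big[(\hat{f}(x) - \mathbb{E}[\hat{f}(x)])^2\big]}_{\text{variance}}
  + \underbrace{\sigma^2}_{\text{noise}}
```

Only the first two terms depend on the model. The noise term sets a floor on the achievable error, which is why a high noise-to-signal ratio limits performance regardless of the model chosen.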

In this chapter, we will cover the following topics:

  • How supervised and unsupervised learning from data work
  • How to apply the ML workflow
  • How to formulate loss functions for regression and classification
  • How to train and evaluate supervised learning models
  • How the bias-variance trade-off impacts prediction errors
  • How to diagnose and address prediction errors 
  • How to train a model using cross-validation to manage the bias-variance trade-off 
  • How to implement cross-validation using scikit-learn
  • Why the nature of financial data requires different approaches to out-of-sample testing
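As a preview of the last two topics, the sketch below contrasts two splitters from scikit-learn's `model_selection` module: `KFold` and `TimeSeriesSplit`. The twelve-row toy array is a hypothetical stand-in for time-ordered financial data, and the choice of three splits is arbitrary. This illustrates one common way to respect time order and is not necessarily the approach developed later in the chapter.

```python
import numpy as np
from sklearn.model_selection import KFold, TimeSeriesSplit

# Twelve time-ordered observations; the row index stands in for a date.
X = np.arange(12).reshape(-1, 1)

# Standard K-fold without shuffling: each observation is used for testing
# exactly once, so some training sets contain observations that come
# after the test fold.
kfold_splits = list(KFold(n_splits=3).split(X))

# Time-series split: every training set contains only observations that
# precede its test set, and the training window grows with each split.
ts_splits = list(TimeSeriesSplit(n_splits=3).split(X))

for name, splits in [("KFold", kfold_splits), ("TimeSeriesSplit", ts_splits)]:
    print(name)
    for train_idx, test_idx in splits:
        print("  train:", train_idx, "test:", test_idx)
```

With `KFold`, a model can be trained on data that lies in the future relative to the test fold. For time-ordered data this can leak information and overstate out-of-sample performance, which is one reason financial data calls for a different testing scheme.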

If you are already quite familiar with ML, feel free to skip ahead and dive right into learning how to use linear models to produce and combine alpha factors for an algorithmic trading strategy.
