The Machine Learning Process

In this chapter, we will start to illustrate how you can use a broad range of supervised and unsupervised machine learning (ML) models for algorithmic trading. We will explain each model's assumptions and use cases before we demonstrate relevant applications using various Python libraries. The categories of models will include:

Linear models for the regression and classification of cross-section, time series, and panel data
Generalized additive models, including nonlinear tree-based models such as decision trees
Ensemble models, including random forest and gradient-boosting machines
Unsupervised linear and nonlinear methods for dimensionality reduction and clustering
Neural network models, including recurrent and convolutional architectures
Reinforcement learning models

We will apply these models to the market, fundamental, and alternative data sources introduced in the first part of this book. We will build on the material covered so far by showing you how to embed these models in an algorithmic trading strategy, either to generate or combine alpha factors or to optimize the portfolio-management process, and how to evaluate their performance.

There are several aspects that many of these models and their uses have in common. This chapter covers these common aspects so that we can focus on model-specific usage in the following chapters. They include the overarching goal of learning a functional relationship from data by optimizing an objective or loss function. They also include the closely related methods of measuring model performance.
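As a minimal sketch of what "learning a functional relationship by optimizing a loss function" means, the following example fits a straight line to synthetic data by minimizing the mean squared error. The data-generating process (slope 2, intercept 1, Gaussian noise), the helper name `mse_loss`, and the use of NumPy's least-squares solver are assumptions made for illustration and are not taken from the chapter itself.

```python
import numpy as np

rng = np.random.default_rng(seed=42)

# Synthetic data (an assumption for illustration): y = 2x + 1 + noise
x = rng.normal(size=200)
y = 2.0 * x + 1.0 + rng.normal(scale=0.5, size=200)


def mse_loss(params, x, y):
    """Mean squared error of the linear model y_hat = slope * x + intercept."""
    slope, intercept = params
    residuals = y - (slope * x + intercept)
    return np.mean(residuals ** 2)


# For a linear model, the MSE-minimizing parameters have a closed-form
# least-squares solution; np.linalg.lstsq computes it from the design matrix.
X = np.column_stack([x, np.ones_like(x)])
params_hat, *_ = np.linalg.lstsq(X, y, rcond=None)

loss_at_fit = mse_loss(params_hat, x, y)
loss_at_guess = mse_loss(np.array([0.0, 0.0]), x, y)

print("estimated slope and intercept:", params_hat)
print("loss at fitted parameters:", loss_at_fit)
print("loss at naive guess (0, 0):", loss_at_guess)
```

The fitted parameters should land close to the values used to generate the data, and the loss at the fit is lower than at the naive guess. The same loss value, computed on data the model has not seen, is one way to measure model performance.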

We distinguish between unsupervised and supervised learning, and between supervised regression and classification problems, and outline use cases for algorithmic trading. We contrast the use of supervised learning for statistical inference about relationships between input and output data with its use for predicting future outputs from future inputs. We also illustrate how prediction errors arise from the model's bias or variance, or from a high noise-to-signal ratio in the data. Most importantly, we present methods to diagnose the sources of these errors and improve your model's performance.

In this chapter, we will cover the following topics:

How supervised and unsupervised learning from data works
How to apply the ML workflow
How to formulate loss functions for regression and classification
How to train and evaluate supervised learning models
How the bias-variance trade-off impacts prediction errors
How to diagnose and address prediction errors
How to train a model using cross-validation to manage the bias-variance trade-off
How to implement cross-validation using scikit-learn
Why the nature of financial data requires different approaches to out-of-sample testing
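To give a flavor of the last two topics, here is a small sketch that contrasts two scikit-learn splitters, `KFold` and `TimeSeriesSplit`, on a toy sequence of 12 time-ordered observations. The toy data and the variable names `kfold_leaks` and `tscv_leaks` are hypothetical; the point is only to show that a standard k-fold split can place training observations after the test observations, which is one reason time-ordered financial data may call for a different approach.

```python
import numpy as np
from sklearn.model_selection import KFold, TimeSeriesSplit

# Toy data (an assumption): 12 observations indexed in time order.
n_obs = 12
X = np.arange(n_obs).reshape(-1, 1)

kfold = KFold(n_splits=3, shuffle=False)
tscv = TimeSeriesSplit(n_splits=3)

# A split "looks ahead" if any training index is later than the first test index.
kfold_leaks = [bool(train.max() > test.min()) for train, test in kfold.split(X)]
tscv_leaks = [bool(train.max() > test.min()) for train, test in tscv.split(X)]

for name, splitter in [("KFold", kfold), ("TimeSeriesSplit", tscv)]:
    print(name)
    for train, test in splitter.split(X):
        print("  train:", train.tolist(), "test:", test.tolist())

print("KFold uses future data in training:", kfold_leaks)
print("TimeSeriesSplit uses future data in training:", tscv_leaks)
```

In this sketch, `TimeSeriesSplit` always trains on observations that precede the test window, whereas some `KFold` splits train on later observations. This is only one of several possible remedies and does not address issues such as overlapping labels.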

If you are already quite familiar with ML, feel free to skip ahead and dive right into learning how to use linear models to produce and combine alpha factors for an algorithmic trading strategy.
