Purging, embargoing, and combinatorial CV

For financial data, labels are often derived from overlapping data points because returns are computed from prices across multiple periods. In the context of trading strategies, the outcome of a model's prediction, which may imply taking a position in an asset, may only become known later, when the decision is evaluated (for example, when the position is closed out).

The resulting risks include leakage of information from the test set into the training set, which is likely to inflate performance artificially. This needs to be addressed by ensuring that all data is point-in-time, that is, truly available and known at the time it is used as input for a model. Marcos Lopez de Prado proposes several methods in Advances in Financial Machine Learning to address these challenges of financial data for cross-validation, as shown in the following list:

Purging: Eliminate training data points whose evaluation occurs after the prediction time of a data point in the validation set, so that training labels do not overlap in time with validation labels. This avoids look-ahead bias.
Embargoing: Further eliminate training samples that immediately follow a test period, because serial correlation can carry information from the test period into them.
Combinatorial cross-validation: Walk-forward CV severely limits the historical paths that can be tested. Instead, given T observations, partition them into N<T groups that each preserve the time order, and purge and embargo potentially overlapping groups. Then, train the model on every combination of N-k groups while testing it on the remaining k groups. The result is a much larger number of possible historical paths.
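
Purging and embargoing can be illustrated with a minimal sketch. It is not the code from the book or from timeseriescv. The function name `purged_embargoed_train_indices` and its parameters are hypothetical, and it assumes a single contiguous test block, numeric timestamps, and that each sample has a prediction time and a later evaluation time at which its label becomes known.

```python
import numpy as np

def purged_embargoed_train_indices(pred_times, eval_times, test_idx, embargo=0):
    """Return training indices after purging and embargoing around one test block.

    pred_times[i]: time at which the prediction for sample i is made
    eval_times[i]: time at which the label for sample i becomes known
    embargo:       length of the embargo window, in the same time units (assumption)
    """
    pred_times = np.asarray(pred_times)
    eval_times = np.asarray(eval_times)
    test_idx = np.asarray(test_idx)

    is_test = np.zeros(len(pred_times), dtype=bool)
    is_test[test_idx] = True

    # Time span covered by the test labels.
    test_start = pred_times[test_idx].min()
    test_end = eval_times[test_idx].max()

    # Purge: drop samples whose [prediction, evaluation] interval
    # overlaps the span covered by the test labels.
    overlaps = (pred_times <= test_end) & (eval_times >= test_start)

    # Embargo: also drop samples predicted shortly after the test span ends.
    embargoed = (pred_times > test_end) & (pred_times <= test_end + embargo)

    return np.flatnonzero(~is_test & ~overlaps & ~embargoed)

# Ten daily samples whose labels are known two days after the prediction.
pred = np.arange(10)
train_idx = purged_embargoed_train_indices(pred, pred + 2, test_idx=[4, 5], embargo=1)
print(train_idx)  # [0 1 9]
```

In this toy example, samples 2, 3, 6, and 7 are purged because their label intervals overlap the test span from day 4 to day 7, and sample 8 is dropped by the one-day embargo.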

Lopez de Prado's Advances in Financial Machine Learning contains sample code to implement these approaches; the code is also available via the timeseriescv library.
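
Independently of those implementations, the combinatorial scheme described above can be sketched with the standard library alone. The generator `combinatorial_splits` and its parameter names are hypothetical, and the sketch deliberately omits purging and embargoing, which would have to be applied to each training set before fitting. The path count uses the formula given in Lopez de Prado's book, k/N times the number of splits.

```python
from itertools import combinations

def combinatorial_splits(n_obs, n_groups, n_test_groups):
    """Yield (test_groups, train_indices, test_indices) for every combination.

    Observations 0..n_obs-1 are cut into n_groups contiguous, ordered groups;
    each split tests on n_test_groups of them and trains on the rest.
    """
    bounds = [g * n_obs // n_groups for g in range(n_groups + 1)]
    groups = [list(range(bounds[g], bounds[g + 1])) for g in range(n_groups)]
    for test_groups in combinations(range(n_groups), n_test_groups):
        test = [i for g in test_groups for i in groups[g]]
        train = [i for g in range(n_groups) if g not in test_groups
                 for i in groups[g]]
        yield test_groups, train, test

# T = 12 observations, N = 6 groups, k = 2 test groups per split.
splits = list(combinatorial_splits(12, 6, 2))
n_paths = len(splits) * 2 // 6  # k/N * number of splits

print(len(splits))  # 15
print(n_paths)      # 5
```

With N = 6 and k = 2 there are 15 train/test splits, which can be recombined into 5 complete backtest paths, compared with the single path that walk-forward CV produces.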
