Custom time series cross-validation

Our data consists of grouped time series that require a custom cross-validation function. This function must provide train and test indices such that, for each equity, the test data immediately follows the training data, so that we do not inadvertently introduce look-ahead bias or leakage.

We can achieve this using the following function, which returns a generator yielding pairs of train and test dates. The initial set of train dates ensures a minimum length for the training period, controlled by min_train. The number of pairs depends on the parameter nfolds. The distinct test periods do not overlap and are located at the end of the period available in the data. After a test period is used, it becomes part of the training data, which grows in size accordingly:

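def time_series_split(d=model_data, nfolds=5, min_train=21):
    """Generate train/test dates for nfolds
    with at least min_train train obs
    """
    # d is assumed to hold the sorted, unique dates and to support .tolist()
    train_dates = d[:min_train].tolist()
    # length of each test period
    n = int(len(d)/(nfolds + 1)) + 1
    test_folds = [d[i:i + n] for i in range(min_train, len(d), n)]
    for test_dates in test_folds:
        # yield only once the training set exceeds min_train dates
        if len(train_dates) > min_train:
            yield train_dates, test_dates
        # the test period just used joins the training set
        train_dates.extend(test_dates)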
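
To make the fold arithmetic concrete, here is a minimal, dependency-free sketch of the same expanding-window logic. The function name expanding_date_folds and the 120-day calendar are hypothetical and chosen only for illustration. It works on a plain list of dates rather than on the data used in this chapter, and it yields a copy of the training list so that earlier folds are not altered when later test periods are appended.

```python
from datetime import date, timedelta


def expanding_date_folds(dates, nfolds=5, min_train=21):
    """Yield (train_dates, test_dates) lists with an expanding training window.

    Hypothetical stand-alone variant of the splitting logic described above.
    """
    dates = sorted(dates)
    train_dates = dates[:min_train]
    # length of each test block
    n = len(dates) // (nfolds + 1) + 1
    test_folds = [dates[i:i + n] for i in range(min_train, len(dates), n)]
    for test_dates in test_folds:
        # only yield once the training set exceeds the minimum length
        if len(train_dates) > min_train:
            # hand out a copy so later growth does not change this fold
            yield list(train_dates), test_dates
        # the test block just seen becomes training data for the next fold
        train_dates = train_dates + test_dates


# Hypothetical calendar: 120 consecutive days (an assumption, not the book's data)
all_dates = [date(2020, 1, 1) + timedelta(days=i) for i in range(120)]
folds = list(expanding_date_folds(all_dates))

for train_dates, test_dates in folds:
    # no look-ahead: every test date comes strictly after every training date
    assert max(train_dates) < min(test_dates)
    print(len(train_dates), len(test_dates), test_dates[0], test_dates[-1])
```

With these assumed inputs, the sketch produces four pairs rather than five: the training sets hold 42, 63, 84, and 105 dates, and the test sets hold 21, 21, 21, and 15 dates. The first test block is never yielded because the training set must strictly exceed min_train before a pair is emitted, so that block is absorbed into the training data.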