How to code a custom cross-validation class

We also construct a custom cross-validation class tailored to the format of the data just created, which has a pandas MultiIndex with two levels, one for the ticker and one for the date:

import numpy as np


class OneStepTimeSeriesSplit:
    """Generates tuples of train_idx, test_idx pairs.
    Assumes the index contains a level labeled 'date'."""

    def __init__(self, n_splits=3, test_period_length=1, shuffle=False):
        self.n_splits = n_splits
        self.test_period_length = test_period_length
        self.shuffle = shuffle
        # Number of most recent dates reserved for testing across all splits
        self.test_end = n_splits * test_period_length

    @staticmethod
    def chunks(l, chunk_size):
        # Yield successive slices of length chunk_size
        for i in range(0, len(l), chunk_size):
            yield l[i:i + chunk_size]

    def split(self, X, y=None, groups=None):
        # The test_end most recent dates, newest first
        unique_dates = (X.index
                        .get_level_values('date')
                        .unique()
                        .sort_values(ascending=False)[:self.test_end])

        # After reset_index, the row labels are positions 0..n-1
        dates = X.reset_index()[['date']]
        for test_date in self.chunks(unique_dates, self.test_period_length):
            # Train strictly before the earliest test date to avoid lookahead
            train_idx = dates[dates.date < min(test_date)].index
            test_idx = dates[dates.date.isin(test_date)].index
            if self.shuffle:
                # permutation returns a shuffled copy as a NumPy array
                train_idx = np.random.permutation(train_idx)
            yield train_idx, test_idx

OneStepTimeSeriesSplit splits the data into training and validation sets in a way that avoids lookahead bias: when validating on data for month T, models are trained only on data up to month T-1 for each stock. We will only use one-step-ahead forecasts.
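To make the train/test boundaries concrete, the following is a minimal sketch that runs the splitter on toy data. The tickers 'AAA' and 'BBB', the column name 'feature', and the six monthly dates are made up for illustration and are not the dataset used in this chapter. The class is a condensed copy of the one above, without the shuffle option, so that the example runs on its own.

```python
import numpy as np
import pandas as pd


class OneStepTimeSeriesSplit:
    """Condensed copy of the splitter above (shuffle option omitted)."""

    def __init__(self, n_splits=3, test_period_length=1):
        self.n_splits = n_splits
        self.test_period_length = test_period_length
        self.test_end = n_splits * test_period_length

    def split(self, X, y=None, groups=None):
        # The test_end most recent dates, newest first
        unique_dates = (X.index
                        .get_level_values('date')
                        .unique()
                        .sort_values(ascending=False)[:self.test_end])
        dates = X.reset_index()[['date']]
        for i in range(0, len(unique_dates), self.test_period_length):
            test_dates = unique_dates[i:i + self.test_period_length]
            train_idx = dates[dates.date < test_dates.min()].index
            test_idx = dates[dates.date.isin(test_dates)].index
            yield train_idx, test_idx


# Hypothetical toy data: two made-up tickers observed over six months
tickers = ['AAA', 'BBB']
months = pd.date_range('2020-01-01', periods=6, freq='MS')
index = pd.MultiIndex.from_product([tickers, months],
                                   names=['ticker', 'date'])
X = pd.DataFrame({'feature': np.arange(len(index), dtype=float)},
                 index=index)

cv = OneStepTimeSeriesSplit(n_splits=3)
splits = []
for train_idx, test_idx in cv.split(X):
    train_dates = X.iloc[train_idx].index.get_level_values('date')
    test_dates = X.iloc[test_idx].index.get_level_values('date')
    splits.append((train_dates.max(), test_dates.min()))
    print(f'train through {train_dates.max():%Y-%m} | '
          f'test on {test_dates.min():%Y-%m}')

# train through 2020-05 | test on 2020-06
# train through 2020-04 | test on 2020-05
# train through 2020-03 | test on 2020-04
```

Each split tests on a single month and trains only on earlier months, for both tickers at once. Note that some scikit-learn utilities, such as GridSearchCV, also call a get_n_splits method on the splitter, so a class used that way would likely need one that returns n_splits.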
