Cross-validating the model

We will use 250 folds, each predicting roughly 2 days of forward returns from a historical training window that gradually expands. Each iteration obtains the appropriate training and test dates from our custom cross-validation function, selects the corresponding features and targets, and then trains and predicts accordingly. We capture the root mean squared error as well as the Spearman rank correlation between actual and predicted values:
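The custom cross-validation function itself is not shown here. A minimal sketch of how such an expanding-window generator could work, assuming `dates` is a sorted index of unique trading dates (the name `time_series_split` and its parameters are reconstructed for illustration, not the book's exact implementation):

```python
import pandas as pd

def time_series_split(dates, nfolds=250, test_period_length=2):
    """Yield (train_dates, test_dates) pairs with an expanding train window.

    Illustrative sketch: the last nfolds * test_period_length dates are
    carved into consecutive test periods; everything earlier is training.
    """
    n = len(dates)
    first_test_start = n - nfolds * test_period_length
    for i in range(nfolds):
        test_start = first_test_start + i * test_period_length
        train_dates = dates[:test_start]                            # all history up to the fold
        test_dates = dates[test_start:test_start + test_period_length]
        yield train_dates, test_dates
```

Because each fold's training window ends strictly before its test window begins, this avoids look-ahead bias by construction.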

nfolds = 250
lr = LinearRegression()

test_results, result_idx, preds = [], [], pd.DataFrame()
for train_dates, test_dates in time_series_split(dates, nfolds=nfolds):
    # select training features and targets for this fold's dates
    X_train = model_data.loc[idx[train_dates], features]
    y_train = model_data.loc[idx[train_dates], target]
    lr.fit(X=X_train, y=y_train)

    X_test = model_data.loc[idx[test_dates], features]
    y_test = model_data.loc[idx[test_dates], target]
    y_pred = lr.predict(X_test)

    # per-fold metrics: RMSE and the information coefficient (Spearman rank correlation)
    rmse = np.sqrt(mean_squared_error(y_pred=y_pred, y_true=y_test))
    ic, pval = spearmanr(y_pred, y_test)

    test_results.append([rmse, ic, pval])
    # DataFrame.append was removed in pandas 2.0; pd.concat is the current idiom
    preds = pd.concat([preds, y_test.to_frame('actuals').assign(predicted=y_pred)])
    result_idx.append(train_dates[-1])
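After the loop, one convenient way to inspect the fold-level metrics (an illustrative aggregation, not part of the original code) is to combine them into a DataFrame indexed by the last training date of each fold:

```python
import pandas as pd

# stand-in values for the lists populated in the loop above
test_results = [[0.0124, 0.041, 0.03],
                [0.0119, 0.027, 0.15]]
result_idx = pd.to_datetime(['2020-06-01', '2020-06-03'])

# one row per fold: RMSE, information coefficient, and its p-value
results = pd.DataFrame(test_results,
                       columns=['rmse', 'ic', 'p-value'],
                       index=pd.Index(result_idx, name='last_train_date'))
print(results.describe())
```

Summarizing across folds this way makes it easy to see whether the rank correlation is consistently positive rather than driven by a few lucky periods.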