Now we will extract the features and labels needed to train and test our model:
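```python
import pandas as pd

# `closings` and `codes` come from the earlier steps: `closings` holds one
# '<index>_log_return' column per market, and each entry of `codes` has the
# form '<source>/<index>', with SNPS as the last entry.

# Columns: the two SNPS labels, then three lagged log returns per market
feature_columns = ['SNPS_log_return_positive', 'SNPS_log_return_negative']
for code in codes:
    index = code.split("/")[1]
    feature_columns.extend([
        '{}_log_return_1'.format(index),
        '{}_log_return_2'.format(index),
        '{}_log_return_3'.format(index),
    ])

# One-hot labels: positive is 1 when the SNPS log return is >= 0, negative is 1 when < 0
# (.ix was removed in pandas 1.0, so the masks are converted to ints directly)
closings['SNPS_log_return_positive'] = (closings['SNPS_log_return'] >= 0).astype(int)
closings['SNPS_log_return_negative'] = (closings['SNPS_log_return'] < 0).astype(int)

rows = []
# Skipping the first 7 rows keeps the lagged lookups (down to i - 3)
# clear of the leading NaN log return
for i in range(7, len(closings)):
    feed_dict = {
        'SNPS_log_return_positive': closings['SNPS_log_return_positive'].iloc[i],
        'SNPS_log_return_negative': closings['SNPS_log_return_negative'].iloc[i],
    }
    for j, code in enumerate(codes):
        index = code.split("/")[1]
        # The last market in `codes` (SNPS) is shifted back one extra day,
        # because its same-day return is the value being predicted
        k = 1 if j == len(codes) - 1 else 0
        returns = closings['{}_log_return'.format(index)]
        feed_dict.update({
            '{}_log_return_1'.format(index): returns.iloc[i - k],
            '{}_log_return_2'.format(index): returns.iloc[i - 1 - k],
            '{}_log_return_3'.format(index): returns.iloc[i - 2 - k],
        })
    rows.append(feed_dict)

# DataFrame.append was removed in pandas 2.0; build the frame once from the rows
features_and_labels = pd.DataFrame(rows, columns=feature_columns)
```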
We store all our features and labels in the features_and_labels variable. The SNPS_log_return_positive and SNPS_log_return_negative keys flag the days on which the SNPS log return is non-negative (>= 0) or negative, respectively. Each is 1 if its condition holds and 0 otherwise, so exactly one of them is 1 on any given day. These two keys act as the labels for the network.
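To make the label encoding concrete, here is a minimal sketch on a short, made-up series of SNPS log returns (the values are purely illustrative, not market data). It builds the two indicator columns with the same comparisons used above:

```python
import pandas as pd

# Illustrative log returns only; not real SNPS data
snps_log_return = pd.Series([0.012, -0.004, 0.0, -0.021, 0.007])

labels = pd.DataFrame({
    # 1 on days where the log return is zero or positive
    'SNPS_log_return_positive': (snps_log_return >= 0).astype(int),
    # 1 on days where the log return is negative
    'SNPS_log_return_negative': (snps_log_return < 0).astype(int),
})
print(labels)
```

Note that a flat day (a log return of exactly 0) is counted as positive, because the test is `>= 0`.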
The remaining keys store each market's log returns for the last three days (today and the two days before it). For SNPS they instead hold the three days preceding today, because today's SNPS value won't be available to us when we make the prediction.
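The same lag alignment can also be expressed with `Series.shift`, which makes the offsets easier to see: lag n for an ordinary market is its return shifted by n - 1 rows, and SNPS is shifted one extra row. The sketch below is an equivalent reformulation under stated assumptions, not the original code: it uses a hypothetical market `MKTA` and hypothetical codes (`DEMO/MKTA`, `DEMO/SNPS`) with made-up returns, and it starts at row 3 only because the toy frame is short.

```python
import pandas as pd

# Hypothetical inputs: one ordinary market and SNPS (listed last, as the
# row loop expects), with made-up log returns
codes = ['DEMO/MKTA', 'DEMO/SNPS']
closings = pd.DataFrame({
    'MKTA_log_return': [0.1, 0.2, 0.3, 0.4, 0.5, 0.6],
    'SNPS_log_return': [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
})

start = 3  # first row where all three SNPS lags exist (i - 3 >= 0)
columns = {}
for j, code in enumerate(codes):
    index = code.split("/")[1]
    # SNPS (last in `codes`) is pushed back one extra day
    k = 1 if j == len(codes) - 1 else 0
    returns = closings['{}_log_return'.format(index)]
    for lag in (1, 2, 3):
        # Lag n at row i reads row i - (n - 1) - k, i.e. a shift of n - 1 + k
        columns['{}_log_return_{}'.format(index, lag)] = returns.shift(lag - 1 + k)

lagged = pd.DataFrame(columns).iloc[start:].reset_index(drop=True)
print(lagged)
```

Shifting whole columns at once also avoids growing a DataFrame row by row, which gets slow as the number of trading days grows.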