How to prepare the data

We use a simplified version of the data set constructed in Chapter 4, Alpha Factor Research. It consists of daily stock prices provided by Quandl for the 2010-2017 period, along with various engineered features. The details can be found in the data_prep notebook in the GitHub repo for this chapter. The decision tree models in this chapter cannot handle missing values or categorical variables, so we first drop any missing values and then apply dummy encoding to the categorical variables.
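The two preparation steps can be sketched with pandas as follows. This is a minimal illustration rather than the code from the data_prep notebook: the toy DataFrame and its column names (return_1m, momentum_3m, sector) are hypothetical, and it assumes that dropping missing values means removing every row that contains one.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; the column names are illustrative only
# and are not taken from the data_prep notebook.
data = pd.DataFrame({
    "return_1m": [0.02, np.nan, -0.01, 0.03],
    "momentum_3m": [0.10, 0.05, np.nan, -0.02],
    "sector": ["Tech", "Energy", "Tech", "Finance"],
})

# Step 1: drop every row that contains at least one missing value
# (assumption: row-wise removal is what "dropping" means here).
clean = data.dropna()

# Step 2: replace the categorical column with one indicator
# column per category that remains after the drop.
features = pd.get_dummies(clean, columns=["sector"], prefix="sector")

print(features.columns.tolist())
```

The order matters: because the rows are dropped first, only categories that survive the drop receive an indicator column.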

