Now that our dataframe is indexed by a datetime timestamp, we can construct a date-based slicing function. To do so, we will define a Boolean mask and use that mask to select rows from the existing dataframe. While we could certainly do this in one line, I think it's a little easier to read this way, as shown in the following code:
def select_dates(df, start, end):
    """Return the rows of df whose index falls in the interval (start, end]."""
    mask = (df.index > start) & (df.index <= end)
    return df[mask]
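To make the boundary behavior concrete, here is a minimal sketch that applies the function to a small toy dataframe. The toy data and the names idx, toy, and subset are assumptions made up for illustration; they are not part of the dataset used in this chapter. Note that the start date is excluded while the end date is included.

```python
import pandas as pd


def select_dates(df, start, end):
    # Same function as above: keeps rows with start < index <= end.
    mask = (df.index > start) & (df.index <= end)
    return df[mask]


# Hypothetical toy data: one row per day for the first ten days of 2017.
idx = pd.date_range("2017-01-01", periods=10, freq="D")
toy = pd.DataFrame({"value": range(10)}, index=idx)

# January 3 is excluded (strict >), January 6 is included (<=).
subset = select_dates(toy, start="2017-01-03", end="2017-01-06")
print(subset)
```

If you would rather include the start date, changing > to >= in the mask is enough.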
Now that we can grab portions of the dataframe using dates, we can easily create training and test dataframes with two calls to this function, using the following code:
df = read_data()
df_train = select_dates(df, start="2017-01-01", end="2017-05-31")
df_test = select_dates(df, start="2017-06-01", end="2017-06-30")
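One thing worth checking is how these date strings interact with intraday data. A bare date such as "2017-05-31" is compared as midnight at the start of that day, so if the index has a finer-than-daily resolution, rows that fall between the training end and the test start land in neither set. The sketch below illustrates this on hypothetical hourly data; the toy index and the names toy, train, test, and missing are assumptions for illustration, not the chapter's actual dataset.

```python
import pandas as pd


def select_dates(df, start, end):
    # Same function as above: keeps rows with start < index <= end.
    mask = (df.index > start) & (df.index <= end)
    return df[mask]


# Hypothetical hourly data covering May 30 through June 2, 2017.
idx = pd.date_range("2017-05-30 00:00", "2017-06-02 23:00", freq="60min")
toy = pd.DataFrame({"value": range(len(idx))}, index=idx)

train = select_dates(toy, start="2017-01-01", end="2017-05-31")
test = select_dates(toy, start="2017-06-01", end="2017-06-30")

# Rows that belong to neither the training nor the test split.
missing = toy.index.difference(train.index.union(test.index))
print(len(missing), missing.min(), missing.max())
```

With daily data only a single boundary row is affected, and dropping a day between the two sets may even be intentional. If it is not, passing full timestamps (for example, end="2017-05-31 23:59:59") is one way to close the gap.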
Before we can use these datasets, we will need to difference them, as shown next.