Creating a lagged training set

For each training example, we want to train the network to predict a value x_t, given a sequence of lags (the preceding values x_t-1, x_t-2, and so on). The ideal number of lags is a hyperparameter, so some experimentation is in order.

Structuring the input in this way is a requirement of the BPTT algorithm, as we discussed previously. We will use the following code to build the lagged dataset:

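import pandas as pd


def lag_dataframe(data, lags=1):
    """Return `lags` lagged copies of a single series plus the target column."""
    df = pd.DataFrame(data)
    # Oldest lag first: shift(lags), ..., shift(1), then the unshifted series.
    columns = [df.shift(i) for i in range(lags, 0, -1)]
    columns.append(df)
    df = pd.concat(columns, axis=1)
    # The first `lags` rows have no full history, so fill the gaps with 0.
    df = df.fillna(0)

    # Label each lag column with its real offset ("x-3", "x-2", "x-1"),
    # and label the unshifted target column "y".
    df.columns = ["x-" + str(i) for i in range(lags, 0, -1)] + ["y"]
    return df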

As an example, if we were to call lag_dataframe with lags = 3, we would expect the returned dataset to contain x_t-3, x_t-2, and x_t-1 as inputs, alongside the target x_t. Because the first few rows have no complete history, their missing lags are filled with 0. I find lag code like this very difficult to understand, so if you do too, you aren't alone. I recommend running it and building some familiarity with the operation.
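To make the shifting concrete, here is a minimal sketch of the same idea on a toy series. The values and the column labels (x_t-3 and so on) are assumptions chosen for illustration; this is not the function above, only the shift operation it relies on.

```python
import pandas as pd

# Toy series standing in for real data (illustrative values only).
series = pd.Series([10, 20, 30, 40, 50])
lags = 3

# One column per lag, oldest first, followed by the unshifted target.
frame = pd.DataFrame(
    {f"x_t-{i}": series.shift(i) for i in range(lags, 0, -1)}
)
frame["x_t"] = series
# Rows without a complete history get 0 in place of the missing lags.
frame = frame.fillna(0)

print(frame)
# The row at index 3 holds 10, 20, 30 as inputs and 40 as the target.
```

Reading across any row from left to right gives the sequence in time order, ending with the value the network is asked to predict.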

When choosing the number of lags, you might also need to consider how many observations you are willing to wait for before you can make a prediction once the model is deployed to production. A model trained on n lags needs n prior observations before it has a complete input.
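The waiting cost can be sketched with a small buffer that only yields an input window once enough observations have arrived. LagBuffer is a hypothetical helper written for this illustration, not part of any library, and it assumes observations arrive one at a time.

```python
from collections import deque


class LagBuffer:
    """Hypothetical helper: collects observations until a full lag window exists."""

    def __init__(self, lags):
        # A deque with maxlen drops the oldest value automatically when full.
        self.window = deque(maxlen=lags)

    def push(self, value):
        """Add an observation; return the window (oldest first) once full, else None."""
        self.window.append(value)
        if len(self.window) < self.window.maxlen:
            return None
        return list(self.window)


buffer = LagBuffer(lags=3)
results = [buffer.push(v) for v in [10, 20, 30, 40]]
# The first two pushes return None; the third returns [10, 20, 30]
# and the fourth returns [20, 30, 40].
```

With lags = 3, no prediction is possible until the third observation arrives, so a larger lag count means a longer wait after deployment.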
