Data preparation

For this example, we won't be using a separate validation set; rather, we will be using our test set as our validation set. When working on forecasting problems like this one, validation becomes a challenging endeavor because the further the training data gets in time from the testing data, the more likely the model is to perform poorly, so holding back recent data for validation has a real cost. On the other hand, reusing the test set for validation doesn't provide much protection from overfitting.

To keep things simple, here we will use only a test set and hope for the best.

Before we move on, let's take a look at the overall flow for the data prep we will do. In order to use this dataset to train an LSTM, we will need to:

Load the dataset and convert epoch times into pandas date times.
Create a train and test set by slicing on date ranges.
Difference our dataset.
Scale the differences to a range closer to that of our activation functions. We will use -1 to 1, since we're going to be using tanh as the activation.
Create a training set where each target x_t has a sequence of lags x_{t-1}...x_{t-n} associated with it. In this training set, you can think of x_t as our typical dependent variable y. The sequence of lags x_{t-1}...x_{t-n} can be thought of as the typical X training matrix.
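Before we go through the steps one at a time, here is a minimal sketch of the whole flow on synthetic data, so you can see how the pieces fit together. Everything specific in it is an assumption made for illustration: the column names time and price, the hourly spacing, the split point, the lag count of 10, and the helper names scale and make_lagged. The scaling is written by hand with NumPy rather than with a library scaler, and it is fit on the training differences only.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the real dataset: hourly values keyed by epoch seconds.
# The column names "time" and "price" are assumptions for this sketch.
start = 1_500_000_000  # arbitrary epoch timestamp, in seconds
raw = pd.DataFrame({
    "time": start + 3600 * np.arange(200),
    "price": 100.0 + np.cumsum(np.sin(np.arange(200) / 5.0)),
})

# 1. Convert epoch seconds into pandas datetimes and index the series by them.
raw["time"] = pd.to_datetime(raw["time"], unit="s")
series = raw.set_index("time")["price"]

# 2. Create train and test sets by slicing on a date boundary.
split = series.index[150]  # hypothetical cut-off date
train = series[series.index < split]
test = series[series.index >= split]

# 3. Difference each set (x_t - x_{t-1}); the first row has no predecessor.
train_diff = train.diff().dropna().to_numpy()
test_diff = test.diff().dropna().to_numpy()

# 4. Scale the differences into [-1, 1], using the training min and max only
#    so that no information from the test set leaks into training.
lo, hi = train_diff.min(), train_diff.max()

def scale(values):
    """Map values linearly so that lo -> -1 and hi -> 1."""
    return 2.0 * (values - lo) / (hi - lo) - 1.0

train_scaled = scale(train_diff)
test_scaled = scale(test_diff)  # may fall slightly outside [-1, 1]

# 5. Pair each target x_t with its n_lags previous values.
def make_lagged(values, n_lags):
    """Return X with shape (samples, n_lags, 1) and y with shape (samples,)."""
    X = np.array([values[i:i + n_lags] for i in range(len(values) - n_lags)])
    y = values[n_lags:]
    return X.reshape(-1, n_lags, 1), y

n_lags = 10
X_train, y_train = make_lagged(train_scaled, n_lags)
X_test, y_test = make_lagged(test_scaled, n_lags)

print(X_train.shape, y_train.shape)
print(X_test.shape, y_test.shape)
```

Each row of X_train holds the lags ordered from oldest to newest, and the trailing dimension of 1 is the number of features per time step, which is the three-dimensional layout an LSTM layer typically expects. Note that differencing the train and test sets separately drops the one difference that spans the split boundary.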

I'm going to cover each step in the coming topics, showing the relevant code as we go.
