Data preparation

For this example, we won't use a separate validation set; instead, our test set will double as our validation set. Validation is challenging in forecasting problems like this one: holding out a validation period pushes the training data further back in time from the test data, and the further the training data gets from the test data, the more likely the model is to perform poorly. On the other hand, reusing the test set for validation doesn't provide much protection from overfitting.

To keep things simple, here we will use only a test set and hope for the best.

Before we move on, let's take a look at the overall flow for the data prep we will do. In order to use this dataset to train an LSTM, we will need to:

  1. Load the dataset and convert epoch times into pandas datetimes.
  2. Create a train and test set by slicing on date ranges. 
  3. Difference our dataset.
  4. Scale the differences to a range that matches our activation function. We will use -1 to 1, since we're going to be using tanh as the activation.
  5. Create a training set where each target x_t has a sequence of lags x_{t-1}...x_{t-n} associated with it. In this training set, you can think of x_t as our typical dependent variable y. The sequence of lags x_{t-1}...x_{t-n} can be thought of as the typical X training matrix.
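
To make the flow concrete before we get to the real code, here is a rough sketch of the five steps on a small synthetic series. Everything specific in it is an assumption made for illustration: the column names time and price, the random-walk data, the split point, the choice of 10 lags, the helper make_lagged, and the decision to take the scaling minimum and maximum from the training set only. It is not the code we will use for the actual dataset.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the real dataset: 200 hourly values keyed by
# epoch seconds. The column names "time" and "price" are hypothetical.
rng = np.random.default_rng(0)
raw = pd.DataFrame({
    "time": 1_500_000_000 + 3600 * np.arange(200),
    "price": 100 + rng.normal(0, 1, 200).cumsum(),
})

# 1. Convert epoch seconds into pandas datetimes and index by them.
raw["time"] = pd.to_datetime(raw["time"], unit="s")
series = raw.set_index("time")["price"]

# 2. Create train and test sets by slicing on a date (arbitrary here).
split = series.index[150]
train = series[series.index < split]
test = series[series.index >= split]

# 3. Difference each set: d_t = x_t - x_{t-1}. The first value has no
#    predecessor, so it is dropped.
train_diff = train.diff().dropna()
test_diff = test.diff().dropna()

# 4. Scale the differences to [-1, 1]. Assumption: min and max come from
#    the training set only, so test values may fall slightly outside.
lo, hi = train_diff.min(), train_diff.max()

def scale(s):
    return (2 * (s - lo) / (hi - lo) - 1).to_numpy()

train_scaled = scale(train_diff)
test_scaled = scale(test_diff)

# 5. Pair each target x_t with its n_lags preceding values. Each row of X
#    is ordered oldest to newest: x_{t-n}, ..., x_{t-1}.
def make_lagged(values, n_lags):
    X = np.array([values[i:i + n_lags] for i in range(len(values) - n_lags)])
    y = values[n_lags:]
    return X, y

n_lags = 10
X_train, y_train = make_lagged(train_scaled, n_lags)
X_test, y_test = make_lagged(test_scaled, n_lags)

print(X_train.shape, y_train.shape)
print(X_test.shape, y_test.shape)
```

Each row of X_train holds the lags for the matching entry in y_train, which is the X and y arrangement described in step 5.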

I'm going to cover each step in the coming topics, showing the relevant code as we go.
