How it works...

As this is the third recipe on training neural networks using PyTorch, many of the concepts are already familiar. That is why we focus on the new ones and refer you to the previous two recipes for more details.

After loading the libraries, we defined a number of parameters for the data and the training of the RNN. In Step 3, we downloaded Intel's stock prices for the years 2010-2019. We resampled them to a weekly frequency by taking each week's last adjusted close price. For the validation set, we used the second half of 2019. We used index slicing with the selected dates to calculate the size of the validation set (27 weeks).
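
The following is a minimal sketch of the download and resampling, assuming yfinance as the data source and 2019-07-01 as the validation cutoff (the exact calls, dates, and parameters may differ from the recipe):

    import yfinance as yf

    # daily Intel prices for 2010-2019
    df = yf.download("INTC", start="2010-01-01", end="2019-12-31",
                     auto_adjust=False, progress=False)

    # resample to weekly frequency, keeping each week's last adjusted close;
    # .squeeze() guards against a single-column DataFrame being returned
    prices = df["Adj Close"].squeeze().resample("W").last()

    # the second half of 2019 is held out for validation
    valid_size = len(prices.loc["2019-07-01":])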

In Step 4, we used scikit-learn's MinMaxScaler to scale the data to the [0, 1] range. We fitted the scaler using the training data and then transformed both the training and validation sets. Lastly, we concatenated the two sets back into one array, as we will use it to create the input dataset.
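
A sketch of the scaling step (variable names such as prices and valid_size follow the snippet above and are illustrative):

    import numpy as np
    from sklearn.preprocessing import MinMaxScaler

    scaler = MinMaxScaler(feature_range=(0, 1))

    # fit on the training part only, then transform both parts to avoid leakage
    train_scaled = scaler.fit_transform(prices.iloc[:-valid_size].values.reshape(-1, 1))
    valid_scaled = scaler.transform(prices.iloc[-valid_size:].values.reshape(-1, 1))

    # concatenate back into a single array used to build the lagged inputs
    prices_scaled = np.concatenate((train_scaled, valid_scaled)).flatten()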

In Step 5, we used the already familiar custom function, create_input_data, to create the inputs and targets. For this task, we used 12 lags (~3 months of data) to predict the next week's close price.
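
The recipe relies on the previously defined version of this function; a possible implementation consistent with how it is used here looks roughly as follows:

    def create_input_data(series, n_lags=1):
        # each input row holds n_lags consecutive observations,
        # and the target is the observation that immediately follows them
        X, y = [], []
        for step in range(len(series) - n_lags):
            X.append(series[step:step + n_lags])
            y.append(series[step + n_lags])
        return np.array(X), np.array(y)

    X, y = create_input_data(prices_scaled, n_lags=12)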

In Step 6, we calculated the naïve forecast using the already familiar approach. For the calculations, we used the unscaled data, as the naïve approach does not introduce any bias.
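
One way of expressing the naïve benchmark, assuming RMSE as the error metric (in line with the RMSE used for evaluating the network later on):

    # naive forecast: next week's price equals the current week's price
    y_valid = prices.iloc[-valid_size:].values
    naive_pred = prices.iloc[-(valid_size + 1):-1].values

    naive_rmse = np.sqrt(np.mean((y_valid - naive_pred) ** 2))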

In Step 7, we created the DataLoader objects used for convenient batch generation. The inputs (features) are stored as a 3D tensor with dimensions of [number of observations, length of the series, number of features]. As we are using a univariate time series for forecasting future values, the number of features is 1. We allowed the training set to be shuffled, even though we are dealing with a time series task. The reason for this is that the sequence order we want to learn using the RNN is already captured within the lagged features. The training set contains 483 sequences of length 12.
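
A sketch of the tensor reshaping and the DataLoader setup (the batch size of 16 is an assumption, not necessarily the value used in the recipe):

    import torch
    from torch.utils.data import TensorDataset, DataLoader

    # reshape to [number of observations, length of the series, number of features]
    X_tensor = torch.from_numpy(X).float().unsqueeze(2)
    y_tensor = torch.from_numpy(y).float().unsqueeze(1)

    valid_ind = len(X) - valid_size
    train_dataset = TensorDataset(X_tensor[:valid_ind], y_tensor[:valid_ind])
    valid_dataset = TensorDataset(X_tensor[valid_ind:], y_tensor[valid_ind:])

    # shuffling the training sequences is safe: the temporal order lives in the lags
    train_loader = DataLoader(train_dataset, batch_size=16, shuffle=True)
    valid_loader = DataLoader(valid_dataset, batch_size=16, shuffle=False)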

In Step 8, we defined the RNN model. The approach is very similar to what we did in the previous recipes (defining a class inheriting from nn.Module). This time, we specified more arguments for the class:

  • input_size: The expected number of features
  • hidden_size: The number of neurons in the hidden state (and the RNN's output)
  • n_layers: The number of RNN layers stacked on top of one another (default of 1)
  • output_size: The size of the output; for our many-to-one case, it is 1

While defining the RNN part of the network, we indicated batch_first=True. This tells the network that the first dimension of the input will be the batch size (followed by the length of the series and the number of features). We also wanted to use the ReLU activation function instead of the default tanh (as a potential solution to the vanishing gradient problem; however, this should not be an issue with such short sequences). In our architecture, we passed the last time step's values of the RNN's output (we did not use the second output, which is the hidden state) to a fully connected layer, which outputs the predicted value for the next element of the sequence.
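
A sketch of such an architecture, following the arguments listed above (the recipe's exact code may differ in details):

    import torch.nn as nn

    class RNN(nn.Module):
        def __init__(self, input_size, hidden_size, n_layers=1, output_size=1):
            super().__init__()
            # batch_first=True -> inputs of shape [batch, seq_len, n_features]
            self.rnn = nn.RNN(input_size, hidden_size, n_layers,
                              batch_first=True, nonlinearity="relu")
            self.fc = nn.Linear(hidden_size, output_size)

        def forward(self, x):
            # output: [batch, seq_len, hidden_size]; the returned hidden state is ignored
            output, _ = self.rnn(x)
            # only the last time step's output is passed to the fully connected layer
            return self.fc(output[:, -1, :])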

While defining the architecture, we used the nn.RNN module instead of nn.RNNCell. The reason is that the former makes modifications, such as stacking multiple RNN cells, much easier. The same principles apply to LSTMs and GRUs.

We can manually initialize the hidden state in the class using torch.zeros. However, if we do not, PyTorch automatically initializes it with zeros.
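
For illustration, assuming the layer count and hidden size are stored as attributes of the class, the explicit version inside forward would look like this:

    # explicit zero initialization; omitting h0 gives the same result automatically
    h0 = torch.zeros(self.n_layers, x.size(0), self.hidden_size)
    output, hidden = self.rnn(x, h0)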

Another possible solution to the vanishing gradient problem is Truncated Backpropagation Through Time. Without going into too much detail, we can detach the hidden state (using the detach method) while passing it to the RNN layer.
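
A rough sketch of what that looks like when the hidden state is carried between batches:

    # truncated BPTT: keep the hidden state's values across batches,
    # but cut the gradient history so backpropagation stops at the batch boundary
    hidden = hidden.detach()
    output, hidden = self.rnn(x, hidden)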

In Step 9, we instantiated the model (with 6 neurons in the hidden state, 1 RNN layer, and 1 feature), the optimizer, and the loss function (MSE; however, as in the previous recipe, we actually used the RMSE). In Step 10, we trained the network.
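
Continuing the sketches above, the instantiation could look as follows; the choice of Adam and the learning rate are assumptions here, not necessarily the recipe's settings:

    model = RNN(input_size=1, hidden_size=6, n_layers=1, output_size=1)
    optimizer = torch.optim.Adam(model.parameters(), lr=0.001)
    loss_fn = nn.MSELoss()

    # inside the training loop, the RMSE is obtained by taking the square root:
    # loss = torch.sqrt(loss_fn(y_pred, y_batch))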

The rest of the steps are analogous to those in the previous recipes. The only difference worth mentioning is that the predictions obtained from the model are on a different scale than the stock prices. To convert them back to the original scale, we had to use the inverse_transform method of the previously fitted MinMaxScaler.
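
A sketch of that conversion, assuming y_pred holds the predictions gathered during evaluation:

    # predictions are in the scaler's [0, 1] range; map them back to price levels
    y_pred_prices = scaler.inverse_transform(
        np.array(y_pred).reshape(-1, 1)
    ).flatten()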
