Backpropagation through time

We are already aware that RNNs are cyclic graphs, unlike feedforward networks, which are directed acyclic graphs. In a feedforward network, the error derivatives are calculated layer by layer, using the derivatives from the layer above. In an RNN, however, we don't have such layering to perform the error derivative calculations. A simple solution to this problem is to unroll the RNN so that it resembles a feedforward network. To enable this, the hidden units of the RNN are replicated at each time step. Each replication forms a layer similar to a layer in a feedforward network, and the layer at time step t connects to the layer at time step t+1. Importantly, all of these replicated layers share the same set of weights. We therefore randomly initialize the weights, unroll the network, and use backpropagation to optimize the weights of the hidden layer. The lowest layer (the initial hidden state) is initialized with its own parameters, which are also optimized as part of backpropagation. The backpropagation through time algorithm involves the following steps, illustrated in the sketch after the list:

  1. Provide a sequence of time steps of input and output pairs to the network
  2. Unroll the network, then calculate and accumulate errors across each time step
  3. Roll up the network and update weights
  4. Repeat
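The following is a minimal sketch of one such pass for a vanilla RNN, written in plain NumPy. All names in it (W_xh, W_hh, W_hy, the toy sequence, the squared-error loss) are illustrative assumptions rather than details taken from the text: the network is unrolled over T time steps in the forward pass, the error is then propagated backwards from the last time step to the first while gradients for the shared weights are accumulated, and the weights are finally updated in a single step.

```python
import numpy as np

# Minimal BPTT sketch for a vanilla RNN. All dimensions, weights, and data
# below are illustrative assumptions, not values from the original text.

np.random.seed(0)

T, input_dim, hidden_dim, output_dim = 5, 3, 4, 2
learning_rate = 0.1

# Shared (tied) weights, randomly initialized
W_xh = np.random.randn(hidden_dim, input_dim) * 0.1   # input  -> hidden
W_hh = np.random.randn(hidden_dim, hidden_dim) * 0.1  # hidden -> hidden
W_hy = np.random.randn(output_dim, hidden_dim) * 0.1  # hidden -> output

# A toy sequence of input/output pairs (step 1)
xs = [np.random.randn(input_dim, 1) for _ in range(T)]
ys = [np.random.randn(output_dim, 1) for _ in range(T)]

# Forward pass: unroll the network across all T time steps
hs = {-1: np.zeros((hidden_dim, 1))}
outputs, loss = {}, 0.0
for t in range(T):
    hs[t] = np.tanh(W_xh @ xs[t] + W_hh @ hs[t - 1])
    outputs[t] = W_hy @ hs[t]
    loss += 0.5 * np.sum((outputs[t] - ys[t]) ** 2)  # squared error per step

# Backward pass: propagate the error from the last time step to the first,
# accumulating gradients for the shared weights (step 2)
dW_xh, dW_hh, dW_hy = np.zeros_like(W_xh), np.zeros_like(W_hh), np.zeros_like(W_hy)
dh_next = np.zeros((hidden_dim, 1))
for t in reversed(range(T)):
    dy = outputs[t] - ys[t]                 # error at this time step
    dW_hy += dy @ hs[t].T
    dh = W_hy.T @ dy + dh_next              # error flowing into the hidden state
    dh_raw = (1 - hs[t] ** 2) * dh          # backprop through tanh
    dW_xh += dh_raw @ xs[t].T
    dW_hh += dh_raw @ hs[t - 1].T
    dh_next = W_hh.T @ dh_raw               # carry the error back to step t-1

# "Roll up" the network: one update of the shared weights (step 3)
for W, dW in [(W_xh, dW_xh), (W_hh, dW_hh), (W_hy, dW_hy)]:
    W -= learning_rate * dW

print(f"loss after one BPTT pass: {loss:.4f}")
```

The gradients from every time step are summed into a single update because the unrolled layers all share one set of weights; this summation is what rolling up the network and updating the weights refers to.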

In summary, with BPTT the error is backpropagated from the last time step to the first across the unrolled network. The error is calculated at each time step, which allows the weights to be updated. The following diagram visualizes backpropagation through time:

Backpropagation through time in an RNN

It should be noted that as the number of time steps increases, the BPTT algorithm can get computationally very expensive.
