Gated Recurrent Units

An alternative to LSTM units is the GRU. These were first described by a team led by another significant figure in the history of deep learning, Yoshua Bengio. Their initial paper, Learning Phrase Representations using RNN Encoder-Decoder for Statistical Machine Translation (2014), offers an interesting way of thinking about these approaches to augmenting the effectiveness of our RNNs.

Specifically, they draw an equivalence between the Tanh activation in a vanilla RNN and LSTM/GRU units, describing the latter as activations too. The difference in the nature of these activations is whether the information in the units is retained unchanged or updated. In effect, swapping the plain Tanh for these gated activations means that your network becomes even more selective about the information it carries from one step to the next.

GRUs differ from LSTMs in that they do away with the separate cell state, carrying a single state instead and thus reducing the overall number of tensor operations your network performs. They also combine the LSTM's input and forget gates into a single update gate, paired with a reset gate, further simplifying the network's architecture.
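Written out (a minimal sketch with bias terms omitted and one common convention for the state interpolation; the weight symbols are assumptions for illustration), the GRU's update looks roughly like this:

z(t) = \sigma(W_z x(t) + U_z S(t-1))
r(t) = \sigma(W_r x(t) + U_r S(t-1))
\tilde{S}(t) = \tanh(W_h x(t) + U_h (r(t) \odot S(t-1)))
S(t) = (1 - z(t)) \odot S(t-1) + z(t) \odot \tilde{S}(t)

Here, the update gate z(t) decides how much of the previous state survives, and the reset gate r(t) decides how much of it is exposed when computing the candidate state.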

Here is a logical representation of the GRU:

Here, we can see the LSTM's forget/input gates combined into a single update gate, z(t), alongside the reset gate, r(t), with the single state S(t) carried forward to the next timestep.
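To make this concrete, here is a minimal NumPy sketch of a single GRU timestep; the weight names, dimensions, and omission of bias terms are assumptions made for illustration, not code from the paper:

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_step(x_t, s_prev, params):
    # One GRU timestep: two gates, a single state, and no separate cell state.
    Wz, Uz, Wr, Ur, Wh, Uh = params
    z = sigmoid(x_t @ Wz + s_prev @ Uz)              # update gate: how much of the old state to keep
    r = sigmoid(x_t @ Wr + s_prev @ Ur)              # reset gate: how much of the old state to expose
    s_cand = np.tanh(x_t @ Wh + (r * s_prev) @ Uh)   # candidate state
    return (1.0 - z) * s_prev + z * s_cand           # interpolate between old and candidate states

# Hypothetical sizes for illustration: input dimension 4, hidden dimension 3.
rng = np.random.default_rng(0)
params = [rng.standard_normal(shape) for shape in
          [(4, 3), (3, 3), (4, 3), (3, 3), (4, 3), (3, 3)]]
x_t = rng.standard_normal(4)
s_prev = np.zeros(3)
print(gru_step(x_t, s_prev, params))

Compare this with an LSTM step, which maintains both a cell state and a hidden state and computes three gates; the GRU's single state and two gates are where the savings in tensor operations come from.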
