Exploding gradients

Exploding gradients refer to a situation where the BPTT algorithm accumulates very large error gradients and, as a result, makes excessively large, unjustified updates to the network weights. The problem results in an unstable network. In extreme cases, the weight values can grow so large that they overflow and become NaN values.

The exploding gradients problem can be detected by observing the following signs while training the network (a simple way to monitor for them is sketched after this list):

  • The model weights quickly become very large during training
  • The model weights become NaN values during training
  • The error gradient values are consistently above 1.0 for each node and layer during training
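As a rough illustration of how these signs could be watched for, the global gradient norm and the weight values can be inspected after each backward pass. The sketch below assumes a PyTorch-style model whose gradients have already been computed with loss.backward(); the function name and the reporting format are illustrative, not taken from any particular library.

    import torch

    def gradient_health(model):
        """Report the global gradient norm and flag NaN weights or gradients."""
        total_sq = 0.0
        for name, param in model.named_parameters():
            if torch.isnan(param.data).any():
                print(f"NaN weight detected in {name}")
            if param.grad is None:
                continue
            if torch.isnan(param.grad).any():
                print(f"NaN gradient detected in {name}")
            total_sq += param.grad.norm().item() ** 2
        total_norm = total_sq ** 0.5
        print(f"global gradient norm: {total_norm:.4f}")
        return total_norm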

There are several ways in which one could handle the exploding gradients problem. The following are some of the popular techniques:

  • This problem can be mitigated by truncating or squashing (rescaling) the gradients whenever they exceed a threshold. This is known as gradient clipping; a minimal clipping sketch follows this list.
  • Updating weights across fewer prior time steps during training may also reduce the exploding gradient problem. This technique of using fewer-step updates is called truncated backpropagation through time (TBPTT). It is an altered version of the BPTT training algorithm where the sequence is processed one time step at a time, and periodically (every k1 time steps) the BPTT update is performed backwards for a fixed number of time steps (k2 time steps). Here, k1 is the number of forward-pass time steps between updates, and k2 is the number of time steps over which BPTT is applied. A simplified sketch follows this list.
  • Weight regularization can be done by checking the size of the network weights and applying a penalty to the network's loss function for large weight values; this is illustrated, together with the next two points, in the last sketch after this list.
  • Using long short-term memory (LSTM) units or gated recurrent units (GRUs) instead of plain vanilla RNNs.
  • Careful initialization of weights such as Xavier initialization or He initialization.
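A minimal sketch of gradient clipping in a PyTorch-style training step is shown below; the threshold of 1.0 and the function name clip_and_step are illustrative assumptions, not values or names prescribed by the text.

    import torch

    def clip_and_step(model, loss, optimizer, max_norm=1.0):
        """One training step with gradient clipping (illustrative threshold)."""
        optimizer.zero_grad()
        loss.backward()
        # Rescale all gradients so their combined L2 norm is at most max_norm ...
        torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=max_norm)
        # ... or, alternatively, squash each gradient element into [-max_norm, max_norm]:
        # torch.nn.utils.clip_grad_value_(model.parameters(), clip_value=max_norm)
        optimizer.step()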
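The next listing is a simplified sketch of TBPTT in the common special case k1 = k2 = k: the hidden state is detached every k steps, so gradients never propagate past the current chunk. The model, data, and optimizer names are hypothetical placeholders rather than anything defined in the text.

    import torch

    def train_tbptt(model, inputs, targets, optimizer, loss_fn, k=20):
        """Truncated BPTT with k1 = k2 = k over data of shape (time, batch, features)."""
        hidden = None
        for start in range(0, inputs.size(0), k):
            x = inputs[start:start + k]        # next chunk of at most k time steps
            y = targets[start:start + k]
            if hidden is not None:
                hidden = hidden.detach()       # cut the graph: no gradient flows past this chunk
            output, hidden = model(x, hidden)  # e.g. an nn.RNN or nn.GRU returning (output, hidden)
            loss = loss_fn(output, y)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
        return hidden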
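The last three suggestions can be sketched together as follows, again with purely illustrative hyperparameters: an LSTM replaces the vanilla RNN, its weight matrices receive Xavier initialization, and L2 weight regularization is applied through the optimizer's weight_decay term.

    import torch

    # LSTM (or torch.nn.GRU) instead of a plain torch.nn.RNN
    model = torch.nn.LSTM(input_size=32, hidden_size=64, num_layers=1)

    # Xavier (Glorot) initialization for the weight matrices; He initialization
    # would use torch.nn.init.kaiming_uniform_ instead.
    for name, param in model.named_parameters():
        if "weight" in name:
            torch.nn.init.xavier_uniform_(param)
        else:
            torch.nn.init.zeros_(param)

    # weight_decay adds an L2 penalty on the weights to the loss being minimized.
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3, weight_decay=1e-5)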