What makes a neuron recurrent?

Recurrent neural networks have loops, which allow information to persist from one prediction to the next. This means that the output of each neuron depends on both the current input and the previous outputs of the network, as shown in the following image:

If we were to unroll this diagram across time, it would look more like the following graph. This idea of the network feeding information back into itself is where the term recurrent comes from, although as a CS major I always think of it as a recursive neural network.

In the preceding diagram, we can see that neuron A takes in input xt0 and outputs ht0 at time step 0. Then at time step 1, the neuron uses input xt1, together with a signal from its previous time step, to output ht1. At time step 2, it considers its input xt2 and the signal from the previous time step, which may still contain information from time step 0. We continue this way until we reach the final time step in the sequence, the network growing its memory from step to step.
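To make the unrolling concrete, here is a minimal Python sketch of that loop over time. The step function is a hypothetical stand-in for the cell defined by the equations that follow; the point is simply that the same cell is applied at every time step, and the hidden state h is the piece of information carried forward:

```python
def run_rnn(step, inputs, h0):
    """Apply the same cell at every time step.

    step(h_prev, x_t) -> h_t   (hypothetical cell function)
    inputs: sequence of input vectors x_0, x_1, x_2, ...
    h0:     initial hidden state
    """
    h = h0
    outputs = []
    for x_t in inputs:      # t = 0, 1, 2, ...
        h = step(h, x_t)    # h_t depends on x_t AND on h_{t-1}
        outputs.append(h)
    return outputs
```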

Standard RNNs use one weight matrix to mix in the previous time step's signal and another to transform the current time step's input. The two products are summed, along with a bias, before being fed through a non-linear function, most often a hyperbolic tangent. For each time step, this looks like the following:
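Assuming a bias term b on the hidden pre-activation (only the output bias, c, is named explicitly below), the per-time-step update can be written as:

$$a_t = b + W h_{t-1} + U x_t$$
$$h_t = \tanh(a_t)$$
$$o_t = c + V h_t$$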

Here, at is a linear combination of the previous time step's output and the current time step's input, parameterized by the weight matrices W and U respectively. Once at has been calculated, it's passed through a non-linear function, most often the hyperbolic tangent, to produce ht. Finally, the neuron's output ot combines ht with a weight matrix, V, and a bias, c.
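Here is a small NumPy sketch of one such step. The dimensions, the random weights, and the name rnn_step are illustrative choices, not something fixed by the math:

```python
import numpy as np

def rnn_step(x_t, h_prev, W, U, V, b, c):
    """One vanilla-RNN time step, matching the a_t / h_t / o_t description above."""
    a_t = b + W @ h_prev + U @ x_t   # linear combination of old state and new input
    h_t = np.tanh(a_t)               # squashing non-linearity
    o_t = c + V @ h_t                # output projection
    return h_t, o_t

# Toy usage: 4 time steps of 3-dimensional inputs, a 5-unit hidden state, 2 outputs.
rng = np.random.default_rng(0)
W, U = rng.normal(size=(5, 5)), rng.normal(size=(5, 3))
V, b, c = rng.normal(size=(2, 5)), np.zeros(5), np.zeros(2)

h = np.zeros(5)
for x in rng.normal(size=(4, 3)):
    h, o = rnn_step(x, h, W, U, V, b, c)
```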

As you look at this structure, try to imagine a situation where you have some very important information very early in the sequence. The longer the sequence gets, the more likely it is that this important early information will be forgotten, because new signals easily overpower old ones. Mathematically, the gradient of the unit is multiplied by roughly the same weight matrix at every step of backpropagation through time, so it will either vanish or explode.
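A quick numerical sketch of why that happens (illustrative only; the real gradient also includes the tanh derivatives, which are at most 1 and so make vanishing even more likely). Backpropagating through many steps repeatedly multiplies the gradient by the same matrix, so its norm shrinks or grows exponentially with the number of steps:

```python
import numpy as np

rng = np.random.default_rng(1)
W = rng.normal(size=(5, 5))
W /= np.max(np.abs(np.linalg.eigvals(W)))   # rescale so W's spectral radius is 1

grad = np.ones(5)
for scale, label in [(0.9, "slightly small weights"), (1.1, "slightly large weights")]:
    g = grad.copy()
    for _ in range(50):                      # 50 time steps of backpropagation
        g = (scale * W).T @ g
    print(f"{label}: gradient norm after 50 steps = {np.linalg.norm(g):.6f}")
```

With weights just a little too small the gradient all but disappears after 50 steps, and with weights just a little too large it blows up, which is exactly the vanishing/exploding behavior described above.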

This is a major shortcoming of standard RNNs. In practice, traditional RNNs struggle to learn really long-term interactions in a sequence. They're forgetful!

Next, let's take a look at Long Short-Term Memory (LSTM) networks, which can overcome this limitation.
