Long Short-Term Memory (LSTM)

An LSTM network can control when to let the input enter the neuron, when to remember what was learned in the previous time step, and when to let the output pass on to the next time step. All these decisions are self-tuned and based only on the input. At first glance, an LSTM looks difficult to understand, but it is not. Let's use the following figure to explain how it works:

An example of an LSTM cell

First, we need a logistic function σ (see Chapter 2, Regression) to compute a value between 0 and 1 and control which pieces of information flow through the LSTM gates. Remember that the logistic function is differentiable, and therefore it allows backpropagation. Then, we need an operator ⊗ that takes two matrices of the same dimensions and produces another matrix where each element ij is the product of the elements ij of the original two matrices. Similarly, we need an operator ⊕ that takes two matrices of the same dimensions and produces another matrix where each element ij is the sum of the elements ij of the original two matrices. With these basic blocks, we consider the input x_i at time i and simply juxtapose it with the output y_{i-1} from the previous step.
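
To make these building blocks concrete, here is a minimal NumPy sketch; the array shapes and values are illustrative assumptions, not taken from the text. It shows the logistic function, the element-wise product ⊗, the element-wise sum ⊕, and the juxtaposition of x_i with y_{i-1}:

```python
import numpy as np

def sigmoid(z):
    """Logistic function: squashes each element into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

# Illustrative vectors (shapes and values chosen arbitrarily for the example).
x_i = np.array([0.5, -1.2, 3.0])         # input at time i
y_prev = np.array([0.1, 0.7, -0.3])      # output from the previous step

elementwise_product = x_i * y_prev       # the ⊗ operator
elementwise_sum = x_i + y_prev           # the ⊕ operator
concat = np.concatenate([y_prev, x_i])   # juxtaposition [y_{i-1}, x_i]

print(sigmoid(x_i), elementwise_product, elementwise_sum, concat)
```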

The equation f_i = σ(W_f · [y_{i-1}, x_i] + b_f) implements a logistic regression that controls the forget gate ⊗ and is used to decide how much information from the previous candidate value C_{i-1} should be passed on to the next candidate value C_i (here, W_f and b_f are the weight matrix and the bias used for the logistic regression). If the logistic outputs 1, this means don't forget the previous cell state C_{i-1}; if it outputs 0, this means forget the previous cell state C_{i-1}. Any number in (0, 1) represents the amount of information to be passed on.
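
As a sketch of this gate alone, the following code computes f_i; the dimensions, random weights, and zero biases are made-up placeholders standing in for learned parameters:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

hidden_size, input_size = 4, 3           # illustrative dimensions
rng = np.random.default_rng(0)

W_f = rng.normal(size=(hidden_size, hidden_size + input_size))  # weight matrix
b_f = np.zeros(hidden_size)                                     # bias

y_prev = rng.normal(size=hidden_size)    # y_{i-1}
x_i = rng.normal(size=input_size)        # x_i

# f_i = sigma(W_f . [y_{i-1}, x_i] + b_f): each entry in (0, 1) says how much
# of the corresponding entry of C_{i-1} to keep.
f_i = sigmoid(W_f @ np.concatenate([y_prev, x_i]) + b_f)
print(f_i)
```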

Then we have two equations: s_i = σ(W_s · [y_{i-1}, x_i] + b_s) is used to control, via ⊗, how much of the information Ĉ_i = tanh(W_C · [y_{i-1}, x_i] + b_C) produced by the current cell should be added to the next candidate value C_i via the operator ⊕, according to the scheme represented in the preceding figure.
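
The same pattern gives the gate s_i and the candidate Ĉ_i; again, the sizes and random weights below are illustrative assumptions:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

hidden_size, input_size = 4, 3
rng = np.random.default_rng(1)

W_s = rng.normal(size=(hidden_size, hidden_size + input_size))
b_s = np.zeros(hidden_size)
W_C = rng.normal(size=(hidden_size, hidden_size + input_size))
b_C = np.zeros(hidden_size)

y_prev = rng.normal(size=hidden_size)
x_i = rng.normal(size=input_size)
z = np.concatenate([y_prev, x_i])        # [y_{i-1}, x_i]

s_i = sigmoid(W_s @ z + b_s)             # how much new information to let in
C_hat_i = np.tanh(W_C @ z + b_C)         # candidate values proposed by this cell
print(s_i, C_hat_i)
```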

To implement what has been discussed with the operators ⊕ and ⊗, we need another equation where the actual sums + and multiplications * take place: C_i = f_i * C_{i-1} + s_i * Ĉ_i
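
In code, this update is nothing more than element-wise arithmetic; the values below are placeholders for the gate outputs and states computed above:

```python
import numpy as np

# Placeholder gate outputs and states; in practice these come from the
# logistic and tanh expressions shown earlier.
f_i = np.array([0.9, 0.1, 0.5])          # forget gate
s_i = np.array([0.2, 0.8, 0.5])          # input gate
C_prev = np.array([1.0, -2.0, 0.5])      # previous cell state C_{i-1}
C_hat_i = np.array([0.3, 0.7, -0.1])     # candidate values

# C_i = f_i * C_{i-1} + s_i * C_hat_i
C_i = f_i * C_prev + s_i * C_hat_i
print(C_i)
```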

Finally, we need to decide which part of the current cell should be sent to the y_i output. This is simple: we take a logistic regression equation one more time and use it to control, via an ⊗ operation, which part of the candidate value should go to the output. Here, there is one little piece that deserves care, and it is the use of the tanh function to squash the output into [-1, 1]. This latest step is described by the equations: o_i = σ(W_o · [y_{i-1}, x_i] + b_o) and y_i = o_i * tanh(C_i)
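
The following sketch ties all the pieces together into a single LSTM-cell forward step. It follows the equations in this section, with randomly initialized weights and arbitrary sizes standing in for learned parameters:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_cell_step(x_i, y_prev, C_prev, params):
    """One LSTM step: returns (y_i, C_i) following the equations above."""
    z = np.concatenate([y_prev, x_i])                      # [y_{i-1}, x_i]
    f_i = sigmoid(params["W_f"] @ z + params["b_f"])       # forget gate
    s_i = sigmoid(params["W_s"] @ z + params["b_s"])       # input gate
    C_hat = np.tanh(params["W_C"] @ z + params["b_C"])     # candidate values
    C_i = f_i * C_prev + s_i * C_hat                       # new cell state
    o_i = sigmoid(params["W_o"] @ z + params["b_o"])       # output gate
    y_i = o_i * np.tanh(C_i)                               # squashed output
    return y_i, C_i

hidden, inp = 4, 3
rng = np.random.default_rng(42)
params = {name: rng.normal(size=(hidden, hidden + inp))
          for name in ("W_f", "W_s", "W_C", "W_o")}
params.update({name: np.zeros(hidden) for name in ("b_f", "b_s", "b_C", "b_o")})

y, C = lstm_cell_step(rng.normal(size=inp), np.zeros(hidden), np.zeros(hidden), params)
print(y, C)
```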

Now, I understand that this looks like a lot of math, but there are two pieces of good news. First, the math is not so difficult after all once you understand the goal we want to achieve. Second, you can use LSTM cells as a black-box drop-in replacement for standard RNN cells and immediately get the benefit of solving the vanishing gradient problem. For this reason, you really don't need to know all the math. You just take the TensorFlow LSTM implementation from the library and use it.
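
As a sketch of this drop-in usage, the following example assumes TensorFlow 2.x with the Keras API; the layer sizes and input shape are arbitrary choices for illustration:

```python
import tensorflow as tf

# A small sequence model: using the LSTM layer gives the gated cell described
# in this section without writing any of its equations by hand.
model = tf.keras.Sequential([
    tf.keras.layers.LSTM(32, input_shape=(10, 8)),  # 10 time steps, 8 features
    tf.keras.layers.Dense(1),
])
model.compile(optimizer="adam", loss="mse")
model.summary()
```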
