LSTM Architecture

The desire to model sequential data more effectively, free of the vanishing gradient problem, led researchers to create the Long Short-Term Memory (LSTM) variant of the earlier RNN architecture. LSTM achieves better performance because it incorporates gates that control the flow of memory in the cell. The following diagram shows an LSTM cell:

Figure 5.x: An LSTM unit (source: http://colah.github.io/posts/2015-08-Understanding-LSTMs)

An LSTM cell consists of three primary elements, labeled 1, 2, and 3 in the preceding diagram:

  1. The forget gate f(t): This gate gives the LSTM cell the ability to forget information that is no longer needed. A sigmoid activation accepts the inputs X(t) and h(t-1) and decides which pieces of the old information to remove by outputting a 0 for them. The gate's contribution to the cell state is f(t)*c(t-1), the old memory scaled element-wise by the gate.
  2. Information from the new input X(t) that is determined to be worth retaining needs to be stored in the cell state in the next step. A sigmoid activation (the input gate) decides which parts of the new information to update or ignore. Next, a tanh activation creates a vector of candidate values from the new input. The new memory is the product of these two values, and it is added to the old memory c(t-1) to give c(t).
  3. The last step of the LSTM cell is to determine the final output. A sigmoid layer decides which parts of the cell state to output. The cell state is then passed through a tanh activation to generate the possible values and multiplied by the output of the sigmoid gate, producing the desired output through a non-linear function (see the sketch after this list).
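
To make these three steps concrete, here is a minimal NumPy sketch of a single LSTM time step. The function name lstm_cell_step and the params dictionary of per-gate weight matrices (W_f, W_i, W_c, W_o) and biases are illustrative choices, not from the source; real frameworks typically fuse these into one matrix multiply for speed.

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_cell_step(x_t, h_prev, c_prev, params):
    # params: hypothetical dict of weight matrices of shape
    # (hidden, hidden + input) and bias vectors of shape (hidden,)
    # Concatenate the previous hidden state h(t-1) with the new input X(t)
    z = np.concatenate([h_prev, x_t])

    # Step 1 - forget gate f(t): sigmoid decides what to drop from c(t-1)
    f_t = sigmoid(params["W_f"] @ z + params["b_f"])

    # Step 2 - input gate and tanh candidates: gated new memory is added
    # to the gated old memory to form the new cell state c(t)
    i_t = sigmoid(params["W_i"] @ z + params["b_i"])
    c_tilde = np.tanh(params["W_c"] @ z + params["b_c"])
    c_t = f_t * c_prev + i_t * c_tilde

    # Step 3 - output gate: squash c(t) with tanh and emit the gated
    # result as the new hidden state h(t)
    o_t = sigmoid(params["W_o"] @ z + params["b_o"])
    h_t = o_t * np.tanh(c_t)

    return h_t, c_t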

These three steps in the LSTM cell produce a significant result: the model can be trained to learn which information to retain in long-term memory and which to forget. Genius!
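
In practice, you rarely write the cell by hand; deep learning libraries provide it as a ready-made layer. As a minimal sketch, assuming TensorFlow/Keras is available (the layer sizes and input shape here are illustrative, not from the source):

import tensorflow as tf

# A small sequence classifier: 20 time steps of 8 features each
model = tf.keras.Sequential([
    tf.keras.layers.LSTM(32, input_shape=(20, 8)),   # 32 LSTM units
    tf.keras.layers.Dense(1, activation="sigmoid"),  # one label per sequence
])
model.compile(optimizer="adam", loss="binary_crossentropy")
model.summary()

The layer manages all the gate weights internally; you only choose the number of units and feed it batches of sequences.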
