Long Short-Term Memory Networks

Long Short-Term Memory Networks (LSTMs) work well in almost any situation where you might need a recurrent network. As the name suggests, LSTMs excel at learning long-term dependencies; in fact, that's what they were designed to do.

LSTMs are able to both accumulate information from previous time steps and selectively choose when to forget irrelevant information in favor of new, more relevant information.

As an example, consider the sequence "In high school I took Spanish. When I went to France I spoke French." If we were training a network to predict the word French, it would be very important to remember France and selectively forget Spanish, because the context has shifted. LSTMs can selectively forget things when the context of the sequence changes.

To accomplish this selective long-term memory, LSTMs implement a forget gate, which earns the LSTM membership in a family of neural networks known as gated neural networks. This forget gate allows the LSTM to selectively learn when information should be discarded from its long-term memory.

Another key characteristic of the LSTM is an internal self-loop that lets the unit accumulate information over long durations. This loop is used in addition to the loop we've seen in the RNN, which can be thought of as an outer loop between time steps.

Relative to the other neurons we've seen, LSTMs are quite complex, as shown in the following image:

Each LSTM unit, when unrolled, has an input for time step t called x_t, an output called o_t, and a memory bus C that carries memory from the previous time step's state C_{t-1} to the current state C_t.

In addition to these inputs, the unit also contains several gates. The first, which we've already mentioned, is the forget gate, labeled f_t in the diagram:
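In the standard formulation, the forget gate applies a sigmoid to a weighted combination of the previous output o_{t-1} and the current input x_t (the weight matrix W_f and bias b_f are the usual textbook names, not symbols taken from this chapter's diagram):

f_t = \sigma(W_f \cdot [o_{t-1}, x_t] + b_f)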

The output of this gate, which will be between 0 and 1, is pointwise multiplied with C_{t-1}. This allows the gate to regulate the flow of information from C_{t-1} to C_t.

The next gate, the input gate i_t, is used in conjunction with a candidate value, written \tilde{C}_t (candidate C_t). The candidate learns a vector of values that could be added to the memory state, while the input gate learns which values in the bus C get updated. The following formula illustrates i_t and the candidate \tilde{C}_t:
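In the standard formulation, the input gate uses a sigmoid and the candidate uses a tanh, each with its own weights and biases (again, W_i, b_i, W_C, and b_C are the usual textbook names):

i_t = \sigma(W_i \cdot [o_{t-1}, x_t] + b_i)
\tilde{C}_t = \tanh(W_C \cdot [o_{t-1}, x_t] + b_C)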

The pointwise product of i_t and the candidate \tilde{C}_t decides what to add to bus C, after f_t has been used to decide what to forget, as shown in the following formula:
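In the standard formulation, with \odot denoting the pointwise (element-wise) product, the memory bus is updated as:

C_t = f_t \odot C_{t-1} + i_t \odot \tilde{C}_t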

Finally, we decide what gets output. The output, o_t, comes primarily from the memory bus C; however, it is filtered by yet another gate, called the output gate. The following formula illustrates the output:
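In the standard formulation, the output gate is a sigmoid over the previous output and the current input, and it filters a tanh of the new cell state:

o_t = \sigma(W_o \cdot [o_{t-1}, x_t] + b_o) \odot \tanh(C_t)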

Although complex, LSTMs are incredibly effective on a wide variety of problems. While multiple variants of the LSTM exist, this basic implementation is, for the most part, still considered state-of-the-art across a very wide range of tasks.
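To make these equations concrete, the following is a minimal NumPy sketch of a single LSTM step under the standard formulation above. The function name lstm_step and the weight and bias dictionary keys ('f', 'i', 'c', 'o') are illustrative choices for this sketch, not code from this chapter:

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, o_prev, c_prev, W, b):
    # Concatenate the previous output and the current input: [o_{t-1}, x_t]
    z = np.concatenate([o_prev, x_t])
    f_t = sigmoid(W['f'] @ z + b['f'])       # forget gate: what to drop from C_{t-1}
    i_t = sigmoid(W['i'] @ z + b['i'])       # input gate: which values of C to update
    c_cand = np.tanh(W['c'] @ z + b['c'])    # candidate values to add to the memory bus
    c_t = f_t * c_prev + i_t * c_cand        # new cell state C_t
    out_gate = sigmoid(W['o'] @ z + b['o'])  # output gate: filters the memory bus
    o_t = out_gate * np.tanh(c_t)            # the unit's output o_t
    return o_t, c_t

# Toy usage: 3 input features, 4 hidden units, one step from a zero state
rng = np.random.default_rng(0)
n_in, n_hidden = 3, 4
W = {k: 0.1 * rng.standard_normal((n_hidden, n_in + n_hidden)) for k in 'fico'}
b = {k: np.zeros(n_hidden) for k in 'fico'}
o_t, c_t = lstm_step(rng.standard_normal(n_in), np.zeros(n_hidden), np.zeros(n_hidden), W, b)

Iterating lstm_step over each x_t in a sequence, feeding o_t and C_t forward at every step, reproduces the outer loop between time steps described earlier.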

One of those tasks is predicting the next value in a time series, which is what we will be using an LSTM for in this chapter. However, before we start applying LSTMs to a time series, a brief refresher on time series analysis and more traditional methods is warranted.
