Long Short-Term Memory Networks

Long Short-Term Memory Networks (LSTMs) work well in almost any situation where you might need a recurrent network. As the name suggests, LSTMs excel at learning long-term dependencies; in fact, that's what they were designed to do.

LSTMs are able to both accumulate information from previous time steps and selectively choose when to forget irrelevant information in favor of new, more relevant information.

As an example, consider the sequence "In high school I took Spanish. When I went to France I spoke French." If we were training a network to predict the word French, it would be very important to remember France and selectively forget Spanish, because the context has shifted. LSTMs can selectively forget things when the context of the sequence changes.

To accomplish this selective long-term memory, LSTMs implement a forget gate, which earns the LSTM membership in a family of neural networks known as gated neural networks. This forget gate allows the LSTM to selectively learn when information should be discarded from its long-term memory.

Another key characteristic of the LSTM is an internal self-loop that lets the unit accumulate information over long durations. This loop is used in addition to the loop we've seen in the RNN, which can be thought of as an outer loop between time steps.

Relative to the other neurons we've seen, LSTMs are quite complex, as shown in the following image:

Each LSTM unit, when unrolled, has an input for time step t called x_t, an output called o_t, and a memory bus C that carries memory from the previous time step's state C_{t-1} to the current state C_t.

In addition to these inputs, the unit also contains several gates. The first, which we've already mentioned, is the forget gate, labeled f_t in the diagram:
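In the standard formulation, the forget gate applies a sigmoid to a weighted combination of the previous output o_{t-1} and the current input x_t (the weight matrix W_f and bias b_f are the usual textbook names, not symbols taken from this chapter's diagram):

f_t = \sigma(W_f \cdot [o_{t-1}, x_t] + b_f)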

The output of this gate, which will be between 0 and 1, is pointwise multiplied with C_{t-1}. This allows the gate to regulate the flow of information from C_{t-1} to C_t.

The next gate, the input gate i_t, is used in conjunction with a candidate value, written \tilde{C}_t (candidate C_t). The candidate learns a vector of values that could be added to the memory state, while the input gate learns which values in the bus C get updated. The following formula illustrates i_t and the candidate \tilde{C}_t:
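In the standard formulation, the input gate uses a sigmoid and the candidate uses a tanh, each with its own weights and biases (again, W_i, b_i, W_C, and b_C are the usual textbook names):

i_t = \sigma(W_i \cdot [o_{t-1}, x_t] + b_i)
\tilde{C}_t = \tanh(W_C \cdot [o_{t-1}, x_t] + b_C)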

The pointwise product of i_t and the candidate \tilde{C}_t decides what to add to bus C, after f_t has been used to decide what to forget, as shown in the following formula:
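In the standard formulation, with \odot denoting the pointwise (element-wise) product, the memory bus is updated as:

C_t = f_t \odot C_{t-1} + i_t \odot \tilde{C}_t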

Finally, we decide what gets output. The output, o_t, comes primarily from the memory bus C; however, it is filtered by yet another gate, called the output gate. The following formula illustrates the output:
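In the standard formulation, the output gate is a sigmoid over the previous output and the current input, and it filters a tanh of the new cell state:

o_t = \sigma(W_o \cdot [o_{t-1}, x_t] + b_o) \odot \tanh(C_t)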

Although complex, LSTMs are incredibly effective on a wide variety of problems. While multiple variants of the LSTM exist, this basic implementation is, for the most part, still considered state-of-the-art across a very wide range of tasks.
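To make these equations concrete, the following is a minimal NumPy sketch of a single LSTM step under the standard formulation above. The function name lstm_step and the weight and bias dictionary keys ('f', 'i', 'c', 'o') are illustrative choices for this sketch, not code from this chapter:

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, o_prev, c_prev, W, b):
    # Concatenate the previous output and the current input: [o_{t-1}, x_t]
    z = np.concatenate([o_prev, x_t])
    f_t = sigmoid(W['f'] @ z + b['f'])       # forget gate: what to drop from C_{t-1}
    i_t = sigmoid(W['i'] @ z + b['i'])       # input gate: which values of C to update
    c_cand = np.tanh(W['c'] @ z + b['c'])    # candidate values to add to the memory bus
    c_t = f_t * c_prev + i_t * c_cand        # new cell state C_t
    out_gate = sigmoid(W['o'] @ z + b['o'])  # output gate: filters the memory bus
    o_t = out_gate * np.tanh(c_t)            # the unit's output o_t
    return o_t, c_t

# Toy usage: 3 input features, 4 hidden units, one step from a zero state
rng = np.random.default_rng(0)
n_in, n_hidden = 3, 4
W = {k: 0.1 * rng.standard_normal((n_hidden, n_in + n_hidden)) for k in 'fico'}
b = {k: np.zeros(n_hidden) for k in 'fico'}
o_t, c_t = lstm_step(rng.standard_normal(n_in), np.zeros(n_hidden), np.zeros(n_hidden), W, b)

Iterating lstm_step over each x_t in a sequence, feeding o_t and C_t forward at every step, reproduces the outer loop between time steps described earlier.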

One of those tasks is predicting the next value in a time series, which is what we will be using an LSTM for in this chapter. However, before we start applying LSTMs to a time series, a brief refresher on time series analysis and more traditional methods is warranted.
