Long Short-Term Memory RNN

RNNs are pretty cool, right? But we have seen a problem in training RNNs called the vanishing gradient problem. Let's explore that a bit. Consider the sentence: The sky is __. An RNN can easily predict the last word as blue based on the information it has seen. But an RNN cannot capture long-term dependencies. What does that mean? Let's say Archie lived in China for 20 years. He loves listening to good music. He is a very big comic fan. He is fluent in __. Now, you would predict the blank as Chinese. How did you predict that? Because you understood that Archie lived in China for 20 years, you guessed he might be fluent in Chinese. But an RNN cannot retain all of this information in memory to say that Archie is fluent in Chinese. Due to the vanishing gradient problem, it cannot retain information in memory for a long time. How do we solve that?
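To get an intuition for why the gradient vanishes, note that during backpropagation through time the gradient is scaled by the recurrent weight (and the activation's derivative) once per time step. The following toy sketch, with an illustrative weight of magnitude less than 1, shows the gradient shrinking geometrically with sequence length (the numbers are made up for illustration, not taken from a real network):

```python
# Toy illustration of the vanishing gradient problem.
# During backpropagation through time, the gradient at an early time step
# is multiplied by a factor (recurrent weight x activation derivative)
# once for every later time step. If that factor is below 1, the gradient
# shrinks geometrically and early steps stop learning.
w_rec = 0.5   # an illustrative per-step scaling factor with magnitude < 1
grad = 1.0
for step in range(50):
    grad *= w_rec   # one multiplication per time step

print(grad)  # ~8.9e-16: effectively zero after 50 steps
```

With 50 time steps the gradient is on the order of 10^-16, which is why a plain RNN cannot connect "lived in China" to "fluent in" many words later.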

Here comes LSTM to the rescue!!!!

LSTM is a variant of the RNN that resolves the vanishing gradient problem. LSTM retains information in memory for as long as it is required. So basically, the RNN cells are replaced with LSTM cells. How does LSTM achieve this?

A typical LSTM cell is shown in the following diagram:

 

LSTM cells are called memory cells, and they are responsible for storing information. But how long does the information have to stay in the memory? When can we delete the old information and update the cell with the new one? All of these decisions are made by three special gates, as follows:

  • Forget gate
  • Input gate
  • Output gate

If you look at the LSTM cell, the top horizontal line, Ct, is called the cell state. It is where the information flows. The information on the cell state is constantly updated by the LSTM gates. Now, we will see the function of these gates:

  • Forget gate: The forget gate is responsible for deciding what information should not be in the cell state. Look at the following statement:
    Harry is a good singer. He lives in New York. Zayn is also a good singer. 
    As soon as we start talking about Zayn, the network will understand that the subject has been changed from Harry to Zayn and the information about Harry is no longer required. Now, the forget gate will remove/forget information about Harry from the cell state. 
  • Input gate: The input gate is responsible for deciding what information should be stored in the memory. Let's consider the same example:
    Harry is a good singer. He lives in New York. Zayn is also a good singer. 
    So, after the forget gate removes information from the cell state, the input gate decides what information has to be in the memory. Here, since the information about Harry is removed from the cell state by the forget gate, the input gate decides to update the cell state with the information about Zayn. 
  • Output gate: The output gate is responsible for deciding what information should be shown from the cell state at a time, t. Now, consider the following sentence:
    Zayn's debut album was a huge success. Congrats ____
    Here, the blank after congrats should be filled with the name of the person being congratulated, that is, a noun. The output gate will surface Zayn (a noun) from the cell state to fill in the blank.
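The three gates described above can be sketched as a single LSTM forward step in NumPy. This is a minimal illustration, not a full implementation: the weight matrices here are random placeholders (a trained LSTM would learn them), and the dimensions `n_in` and `n_hid` are arbitrary choices for the example:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(0)
n_in, n_hid = 4, 3  # illustrative sizes

# Random placeholder weights; a trained LSTM would learn these.
Wf, Wi, Wo, Wc = [rng.standard_normal((n_hid, n_in + n_hid)) * 0.1 for _ in range(4)]
bf, bi, bo, bc = [np.zeros(n_hid) for _ in range(4)]

def lstm_step(x, h_prev, c_prev):
    z = np.concatenate([x, h_prev])   # current input + previous hidden state
    f = sigmoid(Wf @ z + bf)          # forget gate: what to erase from the cell state
    i = sigmoid(Wi @ z + bi)          # input gate: what new information to store
    c_tilde = np.tanh(Wc @ z + bc)    # candidate values for the cell state
    c = f * c_prev + i * c_tilde      # updated cell state Ct
    o = sigmoid(Wo @ z + bo)          # output gate: what to expose at time t
    h = o * np.tanh(c)                # hidden state / output
    return h, c

h, c = lstm_step(rng.standard_normal(n_in), np.zeros(n_hid), np.zeros(n_hid))
print(h.shape, c.shape)  # (3,) (3,)
```

Notice how the cell state update `c = f * c_prev + i * c_tilde` is additive: the forget gate decides how much of the old state to keep, and the input gate decides how much new information to write, which is what lets information survive across many time steps.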