RNN Architectures

We will mostly use the LSTM cell, since it has proven to work better than the vanilla RNN cell on most NLP tasks. The principal benefit of the Long Short-Term Memory (LSTM) cell in RNN architectures is that it allows the model to be trained over long sequences while retaining memory of earlier inputs. To address the vanishing gradient problem, LSTMs introduce additional gates that control access to the cell state.
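To make the gating concrete, the following is a minimal NumPy sketch of a single LSTM step. The function name lstm_step and the stacked weight layout (W, U, b holding the parameters for all four gates) are illustrative assumptions, not code from any particular library:

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x_t, h_prev, c_prev, W, U, b):
    # W: (4*hidden, input), U: (4*hidden, hidden), b: (4*hidden,)
    z = W @ x_t + U @ h_prev + b                  # stacked pre-activations
    i, f, o, g = np.split(z, 4)                   # input, forget, output gates and candidate
    i, f, o = sigmoid(i), sigmoid(f), sigmoid(o)  # gate activations lie in (0, 1)
    g = np.tanh(g)                                # candidate cell update
    c_t = f * c_prev + i * g                      # gates control what enters the cell state
    h_t = o * np.tanh(c_t)                        # output gate controls what is exposed
    return h_t, c_t

The forget and input gates decide how much of the old cell state is kept and how much of the new candidate is written; this largely additive update of the cell state is what helps gradients survive over long sequences.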

We've found that Colah's blog post (http://colah.github.io/posts/2015-08-Understanding-LSTMs/) is a great place to gain a good understanding of how LSTMs work.

These small LSTM units can be combined in multiple ways to solve different kinds of use cases. RNNs are quite flexible in how their inputs and outputs can be arranged (the sketch after the following list shows how these patterns map onto tensor shapes):

  • Many to One: The model consumes a complete input sequence to make a single prediction. This is the pattern used in sentiment models.
  • One to Many: The model takes a single input, such as a numerical date, and generates a sequence, such as the strings "day", "month", and "year".
  • Many to Many: This is the seq2seq pattern, which maps an entire input sequence to an output sequence, as in question-answering (Q/A) systems.
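These three patterns differ only in how the sequence dimension is handled on the input and output side. A minimal sketch using tf.keras illustrates the resulting tensor shapes; the layer sizes, batch size, and sequence length below are arbitrary illustrative values:

import numpy as np
import tensorflow as tf

batch, timesteps, features, hidden = 2, 5, 8, 16
x = np.random.rand(batch, timesteps, features).astype("float32")

# Many to One: only the final hidden state is returned -> (batch, hidden)
many_to_one = tf.keras.layers.LSTM(hidden)(x)
print(many_to_one.shape)        # (2, 16)

# Many to Many: the hidden state at every step is returned -> (batch, timesteps, hidden)
many_to_many = tf.keras.layers.LSTM(hidden, return_sequences=True)(x)
print(many_to_many.shape)       # (2, 5, 16)

# One to Many: repeat a single input vector and decode it into a sequence
single = np.random.rand(batch, features).astype("float32")
repeated = tf.keras.layers.RepeatVector(timesteps)(single)   # (2, 5, 8)
one_to_many = tf.keras.layers.LSTM(hidden, return_sequences=True)(repeated)
print(one_to_many.shape)        # (2, 5, 16)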

This figure maps out these relationships nicely:

In this chapter, we will focus on the Many to Many relationship, also known as the sequence-to-sequence (seq2seq) architecture, to build a question-answering chatbot. The standard RNN approach to the seq2seq problem involves three primary components, sketched in code after the following list:

  1. Encoder: Transforms the input sentence into an abstract encoded representation.
  2. Hidden layer: Carries the encoded representation (the hidden and cell states) from the encoder to the decoder, where it is manipulated to produce the output.
  3. Decoder: Outputs the decoded target sequence.
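As a rough sketch of how these components fit together, the following tf.keras model wires an LSTM encoder to an LSTM decoder through the encoder's final hidden and cell states; the vocabulary sizes, embedding size, and hidden size are illustrative assumptions, and the chapter develops its own implementation in detail:

import tensorflow as tf

vocab_in, vocab_out, embed_dim, hidden = 5000, 5000, 64, 128

# Encoder: reads the source sequence and keeps only its final hidden/cell states.
enc_inputs = tf.keras.Input(shape=(None,))
enc_embed = tf.keras.layers.Embedding(vocab_in, embed_dim)(enc_inputs)
_, state_h, state_c = tf.keras.layers.LSTM(hidden, return_state=True)(enc_embed)

# Decoder: generates the target sequence, initialised with the encoder states.
dec_inputs = tf.keras.Input(shape=(None,))
dec_embed = tf.keras.layers.Embedding(vocab_out, embed_dim)(dec_inputs)
dec_outputs, _, _ = tf.keras.layers.LSTM(
    hidden, return_sequences=True, return_state=True
)(dec_embed, initial_state=[state_h, state_c])
dec_preds = tf.keras.layers.Dense(vocab_out, activation="softmax")(dec_outputs)

model = tf.keras.Model([enc_inputs, dec_inputs], dec_preds)
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")

The encoder's final states act as the hidden-layer handoff: they summarize the input sentence and seed the decoder, which then unrolls the target sequence one token at a time.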

Let's build our intuition about Recurrent Neural Networks by first implementing basic forms of RNN models.
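One possible starting point is a single step of a vanilla RNN written directly in NumPy; all names and shapes below are illustrative, and the chapter builds up its own models from here:

import numpy as np

hidden, features = 16, 8
Wxh = np.random.randn(hidden, features) * 0.01   # input-to-hidden weights
Whh = np.random.randn(hidden, hidden) * 0.01     # hidden-to-hidden (recurrent) weights
bh = np.zeros(hidden)

def rnn_step(x_t, h_prev):
    # The new hidden state mixes the current input with the previous state.
    return np.tanh(Wxh @ x_t + Whh @ h_prev + bh)

# Unroll the cell over a short random sequence
h = np.zeros(hidden)
for x_t in np.random.randn(5, features):
    h = rnn_step(x_t, h)
print(h.shape)   # (16,)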
