RNN

The birds are flying in the ____. If I ask you to predict the blank, you might predict sky. How did you predict that the word sky would be a good fit to fill this blank? Because you read the whole sentence, understood its context, and predicted that sky would be the right word. If we ask a normal neural network to predict the right word for this blank, it will not predict the correct word. This is because a normal neural network's output is based only on the current input, so the input to the network would be just the previous word, the. That is, in normal neural networks, each input is independent of the others. So, they perform poorly in cases where we have to remember a sequence of inputs to predict the next one.

How do we make our network remember the whole sentence so that it predicts the next word correctly? This is where RNN comes into play. An RNN predicts the output based not only on the current input but also on the previous hidden state. You might be wondering why an RNN can't just use the current input and the previous input instead of the current input and the previous hidden state. The reason is that the previous input stores information only about the previous word, while the previous hidden state captures information about the whole sentence; that is, the previous hidden state stores the context. So, it is more useful to predict the output based on the current input and the previous hidden state than on the current input and the previous input.

An RNN is a special type of neural network that is widely applied to sequential data; in other words, it is applied to data where ordering matters. In a nutshell, an RNN has a memory that holds previous information. It is widely applied to various Natural Language Processing (NLP) tasks, such as machine translation and sentiment analysis, and it is also applied to time series data, such as stock market data. Still not clear what an RNN is exactly? Look at the following diagram, which compares a normal neural network and an RNN:

[Figure: comparison of a normal neural network and an RNN]

Did you notice how the RNN differs from the normal neural networks we saw in the previous topic? Yes. The difference is that there is a loop in the hidden state, which indicates that the previous hidden state is used to calculate the output.

Still confusing? Look at the following unrolled version of an RNN:

[Figure: unrolled version of an RNN]

As you can see, the output y1 is predicted using the hidden state h1, which is computed from the current input x1 and the previous hidden state h0. Similarly, the output y2 is computed from the hidden state h2, which takes the current input x2 and the previous hidden state h1. This is how an RNN works: it takes the current input and the previous hidden state to predict the output. We can call these hidden states a memory, as they hold information that has been seen so far.
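To make the unrolling concrete, here is a minimal NumPy sketch. The dimensions and random weights are arbitrary assumptions made purely for illustration; the exact update equations are formalized in the math that follows:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy sizes, assumed purely for illustration
input_size, hidden_size = 4, 3
U = rng.standard_normal((hidden_size, input_size))   # input-to-hidden weights
W = rng.standard_normal((hidden_size, hidden_size))  # hidden-to-hidden weights

x1 = rng.standard_normal(input_size)  # first word's input vector
x2 = rng.standard_normal(input_size)  # second word's input vector

h0 = np.zeros(hidden_size)       # initial hidden state: no context seen yet
h1 = np.tanh(U @ x1 + W @ h0)    # h1 is built from x1 and h0
h2 = np.tanh(U @ x2 + W @ h1)    # h2 is built from x2 and h1, so it
                                 # indirectly carries information about x1
```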

Now, we will see a little bit of math:

[Figure: an RNN with weight matrices U, W, and V]

In the preceding diagram:

  • U represents the input-to-hidden weight matrix
  • W represents the hidden-to-hidden weight matrix
  • V represents the hidden-to-output weight matrix

So, in the forward pass, we compute the following:

$h_t = \tanh(U x_t + W h_{t-1})$

That is, the hidden state at time t = tanh([input-to-hidden weight matrix * input at time t] + [hidden-to-hidden weight matrix * hidden state at time t-1]).

$\hat{y}_t = \text{sigmoid}(V h_t)$

That is, the output at time t = sigmoid(hidden-to-output weight matrix * hidden state at time t).
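Putting the two equations together, the following is a minimal NumPy sketch of the forward pass over a whole sequence. The function name and the toy dimensions are assumptions made for illustration; a real implementation would typically vectorize over batches:

```python
import numpy as np

def rnn_forward(inputs, U, W, V):
    """Forward pass: h_t = tanh(U x_t + W h_{t-1}), y_hat_t = sigmoid(V h_t)."""
    h = np.zeros(W.shape[0])                     # h0: initial hidden state
    hidden_states, outputs = [], []
    for x in inputs:
        h = np.tanh(U @ x + W @ h)               # mix current input with memory
        y_hat = 1.0 / (1.0 + np.exp(-(V @ h)))   # sigmoid over the output layer
        hidden_states.append(h)
        outputs.append(y_hat)
    return hidden_states, outputs

# Example usage with arbitrary toy dimensions
rng = np.random.default_rng(42)
input_size, hidden_size, output_size = 4, 3, 2
U = rng.standard_normal((hidden_size, input_size))
W = rng.standard_normal((hidden_size, hidden_size))
V = rng.standard_normal((output_size, hidden_size))

sequence = [rng.standard_normal(input_size) for _ in range(5)]
states, predictions = rnn_forward(sequence, U, W, V)
```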

We can also define our loss function as cross-entropy loss, like so:

$L = -\sum_{t} y_t \log(\hat{y}_t)$

In the preceding equation, $y_t$ is the actual word at time t and $\hat{y}_t$ is the predicted word at time t. Since we take the whole sequence as a training sample, the total loss is the sum of the loss at each time step.
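As a quick sketch of that sum, the following computes the total loss over a toy sequence. The one-hot targets and uniform predictions here are hypothetical values chosen only to make the example runnable:

```python
import numpy as np

def sequence_cross_entropy(targets, predictions, eps=1e-12):
    """Total loss L = -sum over t of y_t * log(y_hat_t)."""
    loss = 0.0
    for y, y_hat in zip(targets, predictions):
        loss += -np.sum(y * np.log(y_hat + eps))   # loss at a single time step
    return loss

# Hypothetical example: 3 time steps, vocabulary of size 4
targets = [np.eye(4)[i] for i in (2, 0, 3)]          # one-hot actual words
predictions = [np.full(4, 0.25) for _ in range(3)]   # uniform predicted probabilities
print(sequence_cross_entropy(targets, predictions))  # 3 * log(4) ≈ 4.16
```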
