Comparison of feedforward neural networks and RNNs

One fundamental difference between feedforward neural networks and RNNs is that, in a feedforward network, the inputs are independent of each other, whereas in an RNN the inputs are related to each other. In an application such as predicting the next word in a sentence, the relationship between all the previous words helps to predict the current output. In other words, an RNN remembers these relationships while training itself. This is not the case with other types of neural networks. A representation of a feedforward network is illustrated in the following diagram:

Feedforward neural network architecture

From the preceding diagram, we can see that no loops are involved in the feedforward network architecture. This is in contrast to the RNN architecture depicted earlier in the RNN circuit diagram and the RNN unfolded computational graph. In a feedforward network, the series of mathematical operations is performed at the nodes and the information is processed straight through, with no loops whatsoever.
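To make this concrete, here is a minimal sketch of such a straight-through pass in NumPy; the layer sizes and random weights are illustrative assumptions, not values from the text:

```python
import numpy as np

def feedforward(x, W1, b1, W2, b2):
    # Information flows straight through: input -> hidden -> output.
    h = np.tanh(W1 @ x + b1)   # hidden layer activation
    return W2 @ h + b2         # output layer; no loops, no stored state

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(4, 3)), np.zeros(4)  # 3 inputs -> 4 hidden units
W2, b2 = rng.normal(size=(2, 4)), np.zeros(2)  # 4 hidden -> 2 outputs
y = feedforward(rng.normal(size=3), W1, b1, W2, b2)
```

Each call to feedforward depends only on its argument x; nothing is carried over from one call to the next.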

Using supervised learning, the input that is fed to a feedforward network is transformed into an output. The output could be a label in the case of classification, or a number in the case of regression. In image classification, for instance, the label for a given input image could be cat or dog.

A feedforward neural network is trained on labeled images until the error it makes in predicting the labels is minimized. Once trained, the model is able to classify even images that it has not seen previously. A trained feedforward network can be exposed to any random collection of photographs, and the categorization of the first photograph has no impact or influence on the categorization of the second or any subsequent photograph. To take an example for better clarity: if the feedforward network classifies the first image as a dog, this has no bearing on the label it assigns to the second image. In other words, the predictions the model arrives at have no notion of order in time; the decision regarding the label is based solely on the current input. To summarize, feedforward networks use no information from historical predictions when making the current prediction. This is very different from RNNs, where the previous prediction is considered in order to aid the current prediction.
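The contrast can be sketched in a few lines of NumPy. A feedforward prediction depends only on the current input, while a recurrent step also feeds in the hidden state left over from the previous inputs (the weight shapes below are illustrative assumptions):

```python
import numpy as np

def feedforward_step(x, W):
    # The prediction depends only on the current input x.
    return np.tanh(W @ x)

def rnn_step(x, h_prev, W_xh, W_hh):
    # The new hidden state mixes the current input with the
    # previous state, so earlier inputs influence this output.
    return np.tanh(W_xh @ x + W_hh @ h_prev)

rng = np.random.default_rng(1)
W_xh, W_hh = rng.normal(size=(4, 3)), rng.normal(size=(4, 4))
h = np.zeros(4)
for x in rng.normal(size=(5, 3)):    # a sequence of five inputs
    h = rnn_step(x, h, W_xh, W_hh)   # state is carried across steps
```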

Another important difference is that feedforward networks, by design, map one input to one output, whereas RNNs can take multiple forms: one input to many outputs, many inputs to many outputs, or many inputs to one output. The following diagram depicts the various input-output mappings possible with RNNs:

Input-output mapping possibilities with RNNs

Let's review some of the practical applications of the various input-output mappings possible with an RNN. Each rectangle in the preceding diagram is a vector and the arrows represent functions, for example, a matrix multiplication. The input vectors are the lower rectangles (colored red), the output vectors are the upper rectangles (colored blue), and the middle rectangles (colored green) are vectors that hold the RNN's state.

The following are the various forms of mapping illustrated in the diagram:

  • One input to one output: The leftmost one is the vanilla mode of processing without an RNN, from fixed-sized input to fixed-sized output; for example, image classification.
  • One input to many outputs: Sequence output; for example, image captioning takes an image as input and outputs a sentence of words.
  • Many inputs to one output: Sequence input; for example, sentiment analysis, where a sentence is given as input to the RNN and the output is a classification expressing the positive or negative sentiment of that sentence (a code sketch of this mapping follows the list).
  • Many inputs to many outputs: Sequence input and sequence output; for example, in machine translation, an RNN reads a sentence in English as input and then outputs a sentence in Hindi or some other language.
  • Many inputs to many outputs: Synced sequence input and output; for example, video classification, where we wish to label each frame of the video.
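These mappings differ only in when inputs are consumed and outputs are emitted; the loop around a single recurrent step stays the same. The following sketch, with assumed illustrative dimensions, shows a many-inputs-to-one-output pass of the kind used in sentiment analysis:

```python
import numpy as np

def rnn_step(x, h, W_xh, W_hh, b):
    return np.tanh(W_xh @ x + W_hh @ h + b)

rng = np.random.default_rng(2)
W_xh, W_hh, b = rng.normal(size=(8, 5)), rng.normal(size=(8, 8)), np.zeros(8)
W_hy = rng.normal(size=(2, 8))        # maps the final state to 2 classes

h = np.zeros(8)
for x in rng.normal(size=(10, 5)):    # many inputs: a 10-step sequence
    h = rnn_step(x, h, W_xh, W_hh, b)
logits = W_hy @ h                     # one output, read off the final state
```

For the synced many-to-many case, the same loop would read off `W_hy @ h` at every step rather than only after the final one.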

Let's now review the final difference between a feedforward network and an RNN. The way backpropagation is carried out to set the weights in a feedforward neural network differs from the procedure known as backpropagation through time (BPTT), which is used in an RNN. We are already aware that the objective of the backpropagation algorithm is to adjust the weights of a neural network so as to minimize the error of the network's outputs compared to some expected output for the corresponding inputs. Backpropagation itself is a supervised learning algorithm that allows the network to be corrected with regard to the specific errors it makes. The backpropagation algorithm involves the following steps:

  1. Provide training input to the neural network and propagate it through the network to get the output
  2. Compare the predicted output to the actual output and calculate the error
  3. Calculate the derivatives of the error with respect to the learned network weights
  4. Modify the weights to minimize the error
  5. Repeat
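These steps translate directly into a training loop. Below is a minimal sketch for a single linear layer with a mean squared error loss, where the derivative in step 3 is worked out by hand; the data, sizes, and learning rate are all illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.normal(size=(100, 3))          # training inputs
y = X @ np.array([1.0, -2.0, 0.5])     # targets from a known rule
w = np.zeros(3)

for _ in range(200):                   # 5. repeat
    pred = X @ w                       # 1. forward propagate
    err = pred - y                     # 2. compare to the actual output
    grad = X.T @ err / len(X)          # 3. derivative of MSE w.r.t. w
    w -= 0.1 * grad                    # 4. modify weights to reduce error
```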

In feedforward networks, it makes sense to run backpropagation at the end, as the output is only available at the end. In RNNs, an output is produced at each time step, and this output influences the outputs at subsequent time steps; in other words, the error at one time step depends on the previous time steps. Therefore, the standard backpropagation algorithm is not suitable for RNNs, and a different algorithm, known as BPTT, is used to modify the weights in an RNN.
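A minimal sketch of BPTT makes the difference visible. Assuming a simple tanh RNN and a squared-error loss on the final state only (both illustrative choices), the forward pass is unrolled over time, and the backward pass walks the unrolled graph in reverse, accumulating gradients for the shared weights at every time step:

```python
import numpy as np

rng = np.random.default_rng(4)
T, nx, nh = 5, 3, 4
W_xh = rng.normal(size=(nh, nx)) * 0.1
W_hh = rng.normal(size=(nh, nh)) * 0.1
xs, target = rng.normal(size=(T, nx)), rng.normal(size=nh)

# Forward pass: unroll the RNN and remember every hidden state.
hs = [np.zeros(nh)]
for x in xs:
    hs.append(np.tanh(W_xh @ x + W_hh @ hs[-1]))

# Backward pass (BPTT): from the last time step back to the first.
dW_xh, dW_hh = np.zeros_like(W_xh), np.zeros_like(W_hh)
dh = hs[-1] - target                # gradient of 0.5 * ||h_T - target||^2
for t in reversed(range(T)):
    dz = dh * (1 - hs[t + 1] ** 2)  # backprop through tanh
    dW_xh += np.outer(dz, xs[t])    # the same weights are reused at every
    dW_hh += np.outer(dz, hs[t])    # step, so their gradients accumulate
    dh = W_hh.T @ dz                # pass the gradient back in time
```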
