Limitations of neural networks

In this section, we will discuss in detail the issues faced by neural networks, which become the stepping stones for building deep learning networks.

Vanishing gradients, local optimum, and slow training

One of the major issues with neural networks is the problem of "vanishing gradient" (References [8]). We will try to give a simple explanation of the issue rather than exploring the mathematical derivations in depth. We will choose the sigmoid activation function and a two-layer neural network, as shown in the following figure, to demonstrate the issue:

Figure 5: Vanishing Gradient issue.

As we saw in the activation function description, the sigmoid function squashes its output into the range 0 to 1. The derivative of the sigmoid, g'(a) = g(a)(1 – g(a)), has a range between 0 and 0.25. The goal of learning is to minimize the output loss with respect to the weights, that is, to minimize an error such as E = ½(y – ŷ)² between the target y and the network output ŷ. In practice, the output error does not reach 0, so the maximum number of iterations, a user-specified parameter, determines how far the errors are backpropagated and hence the quality of learning.
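
A quick numerical check makes the bound on the derivative concrete. The following is a minimal sketch in Python with NumPy (the function names sigmoid and sigmoid_derivative are illustrative, not from the book); it confirms that g'(a) never exceeds 0.25:

import numpy as np

def sigmoid(a):
    # g(a) = 1 / (1 + e^(-a)), squashes any input into (0, 1)
    return 1.0 / (1.0 + np.exp(-a))

def sigmoid_derivative(a):
    # g'(a) = g(a) * (1 - g(a)), bounded above by 0.25 (attained at a = 0)
    g = sigmoid(a)
    return g * (1.0 - g)

a = np.linspace(-10, 10, 1001)
print(sigmoid_derivative(a).max())   # prints 0.25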

Simplifying, to illustrate the effect of the output error on the weights of the input layer:

∂E/∂w_input = (∂E/∂output) × (∂output/∂hidden) × (∂hidden/∂w_input)

Each of the transformations, for instance, from output to hidden, involves multiplication of two terms, both less than 1:

∂output/∂hidden = g'(a) × w, where g'(a) ≤ 0.25 and the weight w is typically less than 1

Thus, by the time it reaches the input layer, the value becomes so small that the gradient has almost vanished. This is known as the vanishing gradient problem.
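
To see the effect numerically, the sketch below (an illustration under assumed conditions, not the book's code: weights are drawn uniformly from (-1, 1) and pre-activations from (-2, 2)) multiplies the per-layer factor g'(a) × w across an increasing number of layers and prints how quickly the product decays:

import numpy as np

np.random.seed(42)

def sigmoid_derivative(a):
    g = 1.0 / (1.0 + np.exp(-a))
    return g * (1.0 - g)

for n_layers in (2, 5, 10, 20):
    gradient = 1.0
    for _ in range(n_layers):
        w = np.random.uniform(-1.0, 1.0)       # hypothetical weight, |w| < 1
        a = np.random.uniform(-2.0, 2.0)       # hypothetical pre-activation
        gradient *= sigmoid_derivative(a) * w  # the two factors from the equation above
    print(n_layers, abs(gradient))             # magnitude shrinks rapidly with depth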

A paradoxical situation arises: you need to add more layers so that the hidden layers can learn more interesting features, but adding more layers also compounds the errors. As layers are added, the weights near the input become slow to train, which in turn makes the output layers more inaccurate, since they depend on the input layers; and for the same number of iterations, the errors grow as the number of layers increases.

With a fixed maximum number of iterations, more layers and the slow propagation of errors can leave the network stuck in a "local optimum."

Another issue with basic neural networks is the number of parameters. Finding an effective size and set of weights for each hidden layer, along with the biases, becomes more challenging as the number of layers increases: every added fully connected layer contributes roughly the product of the sizes of the adjacent layers in new weights, so the parameter count grows rapidly. Fitting so many parameters requires a large number of data samples; with insufficient data, this leads to the problem discussed before, that is, overfitting.
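
The growth in parameters is easy to quantify for a fully connected network: each layer contributes (inputs × outputs) weights plus one bias per output unit. The helper below is a hypothetical illustration of that count (the function name count_parameters and the example layer sizes are assumptions for the example):

def count_parameters(layer_sizes):
    # layer_sizes: e.g. [784, 256, 10] -> input size, hidden sizes, output size
    total = 0
    for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]):
        total += n_in * n_out + n_out   # weights + biases for this layer
    return total

print(count_parameters([784, 256, 10]))        # one hidden layer:  203,530 parameters
print(count_parameters([784, 256, 256, 10]))   # two hidden layers: 269,322 parameters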

In the next few sections, we will start learning about the building blocks of deep learning that help overcome these issues.
