The back propagation function

Once forward propagation is complete, we have the network's prediction for each data point. We also know that data point's actual value. Typically, the prediction is defined as while the actual value of the target variable is defined as y.

Once both y and are known, the network's error can be computed using the cost function. Recall that the cost function is the average of the loss function.

In order for learning to occur within the network, the network's error signal must be propagated backwards through the network layers from the last layer to the first. Our goal in back propagation is to propagate this error signal backwards through the network while using it to update the network weights as the signal travels. Mathematically, to do so we need to minimize the cost function by nudging the weights towards values that make the cost function the smallest. This process is called gradient descent.

The gradient is the partial derivative of the error function with respect to each weight within the network. The gradient of each weight can be calculated, layer by layer, using the chain rule and the gradients of the layers above.

Once the gradients of each layer are known, we can use the gradient descent algorithm to minimize the cost function.

The Gradient Descent will repeat this update until the network's error is minimized and the process has converged:

The gradient descent algorithm multiples the gradient by a learning rate called alpha and subtracts that value from the current value of each weight. The learning rate is a hyperparameter.

Table of Contents for The back propagation function

Create new playlist

Sign In

Sign Up

Table of Contents for
The back propagation function