Gradient descent explained

Gradient descent uses the partial derivative of the loss, or error, function in order to propagate the updates back to the neuron weights. Our activation function in this example is the sigmoid function, and the cost is measured against its output. In order to find the gradient for the output neuron, we need to derive the partial derivative of the sigmoid function. The following graph shows how the gradient descent method walks down the slope of the function in order to find the minimum:



Gradient descent algorithm visualized
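
Because the activation here is the sigmoid function, the derivative that gradient descent needs has a convenient closed form: if s = sigmoid(x), then sigmoid'(x) = s * (1 - s). The following is a minimal standalone sketch of both functions (these are illustrative helpers, not part of the chapter's Neuron class):

using System;

public static class SigmoidMath
{
    // Sigmoid squashes any input into the range (0, 1).
    public static double Sigmoid(double x)
    {
        return 1.0 / (1.0 + Math.Exp(-x));
    }

    // The derivative expressed in terms of the sigmoid's own output s:
    // sigmoid'(x) = s * (1 - s). This is the slope that gradient
    // descent walks down.
    public static double Derivative(double s)
    {
        return s * (1.0 - s);
    }
}
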
If you plan to spend any more time studying neural networks, deep learning, or machine learning, you will certainly study the mathematics of gradient descent and backpropagation in more depth. However, you are unlikely to get this kind of step-by-step exposure to the basics of programming a neural network again, so this chapter will make a good future reference.

Let's take a look at the CalculateError function, which simply subtracts the neuron's output value from the target value it should have produced:

public double CalculateError(double target)
{
    // The error is the difference between the desired output
    // and the neuron's actual output.
    return target - Value;
}
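
The Gradient consumed by the weight update in the next method is typically derived from this error. For an output neuron with a sigmoid activation, a common formulation multiplies the error by the sigmoid derivative evaluated at the neuron's output. The following is a sketch of that usual approach (assuming the Gradient property used below), not necessarily the book's exact code:

public double CalculateGradient(double target)
{
    // Error times the slope of the sigmoid at the current output;
    // since Value = sigmoid(x), that slope is Value * (1 - Value).
    return Gradient = CalculateError(target) * Value * (1 - Value);
}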

Then, scroll to the UpdateWeights method, as shown in the following code:

public void UpdateWeights(double learnRate, double momentum)
{
    // Update the bias first, keeping the previous delta so that
    // momentum can carry part of the last adjustment forward.
    var prevDelta = BiasDelta;
    BiasDelta = learnRate * Gradient;
    Bias += BiasDelta + momentum * prevDelta;

    // Then update the weight of each incoming synapse.
    foreach (var synapse in InputSynapses)
    {
        prevDelta = synapse.WeightDelta;
        synapse.WeightDelta = learnRate * Gradient * synapse.InputNeuron.Value;
        synapse.Weight += synapse.WeightDelta + momentum * prevDelta;
    }
}

UpdateWeights then adjusts each of the neuron's weights based on learnRate and momentum. learnRate and momentum set the speed at which the NN will learn. We often want to control the learning rate of the algorithm to keep it from overshooting the minimum or getting trapped in a local minimum. After that, the code is relatively straightforward: it loops through the synapse connections and updates each weight with a new value. For example, with a learnRate of 0.4, a Gradient of 0.5, and an input neuron value of 1.0, the new WeightDelta is 0.4 * 0.5 * 1.0 = 0.2, and the momentum term then adds a fraction of the previous delta on top of that. The Bias is used to control the intercept of the sigmoid activation function, allowing the neuron to shift the point at which its activation function starts to respond. We can see how the Bias can alter the activation function in the following graph:




Effect of Bias on the sigmoid activation function

Adjusting the Bias allows the neuron to start firing, or activating, at a value other than 0, as indicated in the preceding graph. Thus, if the value of Bias is 2, the neuron will start activating at an input of -2.
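
To make that shift concrete, the following is a minimal illustrative sketch (a hypothetical helper, not part of the chapter's Neuron class) that applies a bias before the sigmoid; with a bias of 2, the output crosses 0.5 at an input of -2 instead of 0:

public static double Activate(double input, double bias)
{
    // Shifting the input by bias moves the sigmoid's midpoint:
    // Activate(0.0, 0.0) == 0.5, while Activate(-2.0, 2.0) == 0.5.
    return 1.0 / (1.0 + Math.Exp(-(input + bias)));
}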
