Backward propagation explained

In this example, we are training our model (supervised learning) on a simple function described by a set of inputs (1.0, 0.1, 0) and expected outputs of (0, 1.0, 1.0), which is represented by the graph/chart we saw earlier. In essence, we want our neural net to learn the function defined by those points and be able to reproduce those results. We do this by calling net.Train, passing in the datasets and the minimum acceptable error. This trains the network by backward propagating the error through each neuron of the network until the error drops below that minimum. Then, the training stops and the network declares itself ready.
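To make that call concrete, a minimal sketch of the setup might look like the following. The NeuralNet and DataSet constructor signatures used here are assumptions (only Train, the data points, and the idea of a minimum error come from the text), and the three values are read as three separate input/output pairs:

var net = new NeuralNet(1, 3, 1);                 // assumed: 1 input, 3 hidden, 1 output neuron

var dataSets = new List<DataSet>
{
    new DataSet(new[] { 1.0 }, new[] { 0.0 }),    // input 1.0 -> expected output 0
    new DataSet(new[] { 0.1 }, new[] { 1.0 }),    // input 0.1 -> expected output 1.0
    new DataSet(new[] { 0.0 }, new[] { 1.0 })     // input 0.0 -> expected output 1.0
};

net.Train(dataSets, 0.01);                        // assumed minimum error of 0.01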

Backward propagation works using a simple iterative optimization algorithm called gradient descent, which repeatedly adjusts each neuron's input weights in the direction that reduces the error, so that the overall error can be driven toward its minimum. To fully understand this, we would need to go into some differential calculus and derivatives. Instead, we will take a shortcut and just look at what the code is doing.
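Still, the one idea worth carrying forward is the weight-update rule itself. As a sketch in standard notation (whether this library applies it in exactly this form is an assumption), each weight w is nudged against the gradient of the error E, scaled by a learning rate, with a momentum term carrying over part of the previous step:

$$\Delta w_t = -\eta\,\frac{\partial E}{\partial w} + \alpha\,\Delta w_{t-1}, \qquad w \leftarrow w + \Delta w_t$$

Here, \eta corresponds to the LearnRate and \alpha to the Momentum values that we will see passed to UpdateWeights shortly. With that picture in mind, here is what the Train method of the NeuralNet class is doing: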

public void Train(List<DataSet> dataSets, double minimumError)
{
    var error = 1.0;
    var numEpochs = 0;
    while (error > minimumError && numEpochs < int.MaxValue)
    {
        var errors = new List<double>();
        foreach (var dataSet in dataSets)
        {
            ForwardPropagate(dataSet.Values);
            BackPropagate(dataSet.Targets);
            errors.Add(CalculateError(dataSet.Targets));
        }
        error = errors.Average();
        numEpochs++;
    }
}

The code here is relatively straightforward. We initialize error and numEpochs, and then start a while loop that keeps running as long as the error is still greater than minimumError and numEpochs is less than the maximum int value. Inside the loop, we iterate through each dataSet in dataSets. First, ForwardPropagate is called on the dataset's input values to compute the network's output. Then, BackPropagate is called with the dataset's target values to adjust the weights of each neuron using gradient descent, and the remaining error for that dataset is recorded. After each pass over the data, the recorded errors are averaged into error and the epoch counter is incremented. Let's take a look inside the BackPropagate method:

private void BackPropagate(params double[] targets)
{
    var i = 0;
    OutputLayer.ForEach(a => a.CalculateGradient(targets[i++]));
    HiddenLayer.ForEach(a => a.CalculateGradient());
    HiddenLayer.ForEach(a => a.UpdateWeights(LearnRate, Momentum));
    OutputLayer.ForEach(a => a.UpdateWeights(LearnRate, Momentum));
}

This method just elegantly loops through each layer of neurons using the List<T>.ForEach method. First, it calculates the gradient of each neuron, starting at the output layer and working backward to the hidden layer, and then it updates the weights in reverse order: first the hidden layer and then the output layer. Each update uses the neuron's gradient together with the network's LearnRate and Momentum, as sketched below.
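The UpdateWeights method itself is not shown in this excerpt, so the following is only a minimal sketch of what a momentum-based update of this kind typically looks like inside the Neuron class; the member names (InputSynapses, InputNeuron, Weight, WeightDelta, Bias, BiasDelta) are assumptions rather than the library's confirmed API:

public void UpdateWeights(double learnRate, double momentum)
{
    // Sketch: a standard gradient descent step with momentum for each incoming weight.
    foreach (var synapse in InputSynapses)
    {
        var prevDelta = synapse.WeightDelta;
        // Scale the gradient by the learning rate and the incoming value...
        synapse.WeightDelta = learnRate * Gradient * synapse.InputNeuron.Value;
        // ...and carry over part of the previous step via the momentum term.
        synapse.Weight += synapse.WeightDelta + momentum * prevDelta;
    }

    // The bias is updated the same way, treating its input as a constant 1.
    var prevBiasDelta = BiasDelta;
    BiasDelta = learnRate * Gradient;
    Bias += BiasDelta + momentum * prevBiasDelta;
}

Next, we will dissect the CalculateGradient method: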

public double CalculateGradient(double? target = null)
{
    if (target == null)
        return Gradient = OutputSynapses.Sum(a => a.OutputNeuron.Gradient * a.Weight) * Sigmoid.Derivative(Value);

    return Gradient = CalculateError(target.Value) * Sigmoid.Derivative(Value);
}

We can see that the CalculateGradient method takes a nullable double called target. If target is null (the hidden-layer case), the Gradient is calculated by summing, over the neuron's output synapses, the gradient of the connected downstream neuron multiplied by that synapse's weight, and then multiplying the sum by the derivative of the sigmoid at the neuron's value. Otherwise (the output-layer case), the Gradient is calculated by multiplying the error against the target by the derivative of the sigmoid. Remember that the sigmoid is our activation function; what we are actually trying to minimize is the error, and the sigmoid's derivative tells gradient descent how a change in a neuron's input affects its output. If you recall from calculus, the derivative of a function gives its slope, which is what lets us move downhill toward a minimum. In fact, in order to use the gradient descent method for backward propagation, your activation function has to be differentiable.
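The Sigmoid helper is also not shown here, so the following is a minimal sketch of what it typically looks like in implementations of this kind. In particular, the assumption that Derivative receives the already-activated output (so the derivative reduces to x * (1 - x)) is consistent with the call Sigmoid.Derivative(Value) above, but is not confirmed by the excerpt:

public static class Sigmoid
{
    // Standard logistic activation: squashes any input into the range (0, 1).
    // Requires using System; for Math.Exp.
    public static double Output(double x) => 1.0 / (1.0 + Math.Exp(-x));

    // Derivative expressed in terms of the activated value itself:
    // if x is the sigmoid output, then d(sigmoid)/d(net input) = x * (1 - x).
    public static double Derivative(double x) => x * (1.0 - x);
}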
