Getting ready

The network is first presented with M training pairs (X, Y), where X is the input and Y the desired output. The input is propagated through the hidden layers by means of the activation function g(h), up to the output layer. The network output Yhat then gives the error, error = Y - Yhat.
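To make this concrete, here is a minimal NumPy sketch of one forward pass through a single hidden layer. The sigmoid activation, the toy layer sizes, and the random weight initialization are assumptions made for illustration only; they are not part of the recipe itself.

```python
import numpy as np

def sigmoid(h):
    # the activation function g(h) assumed for this sketch
    return 1.0 / (1.0 + np.exp(-h))

# assumed toy dimensions: 4 inputs, 3 hidden neurons, 2 output neurons
rng = np.random.default_rng(0)
W_hidden = rng.normal(size=(3, 4))   # weights W_jk: input -> hidden layer
W_out = rng.normal(size=(2, 3))      # weights W_ij: hidden -> output layer

X = rng.normal(size=4)               # one training input
Y = np.array([0.0, 1.0])             # desired output for this input

h_hidden = W_hidden @ X              # activity h_j of the hidden neurons
O = sigmoid(h_hidden)                # hidden-layer outputs O_j
h_out = W_out @ O                    # activity h_i of the output neurons
Y_hat = sigmoid(h_out)               # network output Yhat

error = Y - Y_hat                    # error = Y - Yhat
print("error:", error)
```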

The loss function J(W) is as follows:

J(W) = \frac{1}{2} \sum_{i=1}^{N} \left( Y_i - \hat{Y}_i \right)^2
Here, i varies over all the neurons of the output layer (1 to N). The change in the weight W_{ij}, connecting the ith output-layer neuron to the jth hidden-layer neuron, can then be determined from the gradient of J(W) by applying the chain rule for differentiation:

\Delta W_{ij} = -\eta \frac{\partial J}{\partial W_{ij}} = -\eta \, \frac{\partial J}{\partial \hat{Y}_i} \, \frac{\partial \hat{Y}_i}{\partial h_i} \, \frac{\partial h_i}{\partial W_{ij}} = \eta \, (Y_i - \hat{Y}_i) \, g'(h_i) \, O_j
Here, \eta is the learning rate, O_j is the output of hidden-layer neuron j, and h_i is the activity (the weighted sum of inputs) of output neuron i. This was easy, but how do we find the update for the weight W_{jk}, connecting neuron k of the nth hidden layer to neuron j of the (n+1)th hidden layer? The process is the same: we use the gradient of the loss function and the chain rule for differentiation, but this time we calculate it for W_{jk}:

\Delta W_{jk} = -\eta \frac{\partial J}{\partial W_{jk}} = \eta \left[ \sum_{i} (Y_i - \hat{Y}_i) \, g'(h_i) \, W_{ij} \right] g'(h_j) \, O_k

Here, O_k is the output of neuron k in the nth hidden layer, and the bracketed term is the error back-propagated from the layer above.
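To see the two update rules at work, the following sketch continues the toy forward pass above and computes \Delta W_{ij} and \Delta W_{jk} by hand for that two-layer example. The learning rate of 0.1 and the sigmoid derivative are illustrative assumptions, not values prescribed by the recipe.

```python
eta = 0.1                                     # assumed learning rate

def sigmoid_prime(h):
    # derivative g'(h) of the sigmoid used in the forward-pass sketch
    s = sigmoid(h)
    return s * (1.0 - s)

# output layer: delta_i = (Y_i - Yhat_i) * g'(h_i)
delta_out = (Y - Y_hat) * sigmoid_prime(h_out)
# Delta W_ij = eta * delta_i * O_j, taken as an outer product over all i, j
dW_out = eta * np.outer(delta_out, O)

# hidden layer: delta_j = (sum_i delta_i * W_ij) * g'(h_j)
delta_hidden = (W_out.T @ delta_out) * sigmoid_prime(h_hidden)
# Delta W_jk = eta * delta_j * O_k; here O_k is the input X itself
dW_hidden = eta * np.outer(delta_hidden, X)

# gradient-descent updates of both weight matrices
W_out += dW_out
W_hidden += dW_hidden
```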
Now that the equations are in place, let's see how to do this in TensorFlow. In this recipe, we work with the same MNIST dataset as before (http://yann.lecun.com/exdb/mnist/).
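As a quick way to get the data ready, one possible loader is the tf.keras.datasets helper shown below; this is just one option and is not necessarily the loader used in the steps that follow.

```python
import tensorflow as tf

# download MNIST; shapes are (60000, 28, 28) for images and (60000,) for labels
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()

# flatten each 28x28 image into a 784-dimensional vector and scale to [0, 1]
x_train = x_train.reshape(-1, 784).astype("float32") / 255.0
x_test = x_test.reshape(-1, 784).astype("float32") / 255.0
print(x_train.shape, x_test.shape)
```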
