Getting ready

The network is first presented with M training pairs (X, Y), where X is the input and Y the desired output. The input is propagated through the hidden layers by means of the activation function g(h), up to the output layer. The network output Yhat then gives the error, error = Y - Yhat.
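To make this concrete, here is a minimal NumPy sketch of one forward pass through a single hidden layer. The sigmoid activation, the toy layer sizes, and the random weight initialization are assumptions made for illustration only; they are not part of the recipe itself.

```python
import numpy as np

def sigmoid(h):
    # the activation function g(h) assumed for this sketch
    return 1.0 / (1.0 + np.exp(-h))

# assumed toy dimensions: 4 inputs, 3 hidden neurons, 2 output neurons
rng = np.random.default_rng(0)
W_hidden = rng.normal(size=(3, 4))   # weights W_jk: input -> hidden layer
W_out = rng.normal(size=(2, 3))      # weights W_ij: hidden -> output layer

X = rng.normal(size=4)               # one training input
Y = np.array([0.0, 1.0])             # desired output for this input

h_hidden = W_hidden @ X              # activity h_j of the hidden neurons
O = sigmoid(h_hidden)                # hidden-layer outputs O_j
h_out = W_out @ O                    # activity h_i of the output neurons
Y_hat = sigmoid(h_out)               # network output Yhat

error = Y - Y_hat                    # error = Y - Yhat
print("error:", error)
```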

The loss function J(W) is as follows:

J(W) = \frac{1}{2} \sum_{i=1}^{N} \left( Y_i - \hat{Y}_i \right)^2
Here, i varies over all the neurons of the output layer (1 to N). The change in the weight W_{ij}, connecting the ith output-layer neuron to the jth hidden-layer neuron, can then be determined from the gradient of J(W) by applying the chain rule for differentiation:

\Delta W_{ij} = -\eta \frac{\partial J}{\partial W_{ij}} = -\eta \, \frac{\partial J}{\partial \hat{Y}_i} \, \frac{\partial \hat{Y}_i}{\partial h_i} \, \frac{\partial h_i}{\partial W_{ij}} = \eta \, (Y_i - \hat{Y}_i) \, g'(h_i) \, O_j
Here, \eta is the learning rate, O_j is the output of hidden-layer neuron j, and h_i is the activity (the weighted sum of inputs) of output neuron i. This was easy, but how do we find the update for the weight W_{jk}, connecting neuron k of the nth hidden layer to neuron j of the (n+1)th hidden layer? The process is the same: we use the gradient of the loss function and the chain rule for differentiation, but this time we calculate it for W_{jk}:

\Delta W_{jk} = -\eta \frac{\partial J}{\partial W_{jk}} = \eta \left[ \sum_{i} (Y_i - \hat{Y}_i) \, g'(h_i) \, W_{ij} \right] g'(h_j) \, O_k

Here, O_k is the output of neuron k in the nth hidden layer, and the bracketed term is the error back-propagated from the layer above.
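To see the two update rules at work, the following sketch continues the toy forward pass above and computes \Delta W_{ij} and \Delta W_{jk} by hand for that two-layer example. The learning rate of 0.1 and the sigmoid derivative are illustrative assumptions, not values prescribed by the recipe.

```python
eta = 0.1                                     # assumed learning rate

def sigmoid_prime(h):
    # derivative g'(h) of the sigmoid used in the forward-pass sketch
    s = sigmoid(h)
    return s * (1.0 - s)

# output layer: delta_i = (Y_i - Yhat_i) * g'(h_i)
delta_out = (Y - Y_hat) * sigmoid_prime(h_out)
# Delta W_ij = eta * delta_i * O_j, taken as an outer product over all i, j
dW_out = eta * np.outer(delta_out, O)

# hidden layer: delta_j = (sum_i delta_i * W_ij) * g'(h_j)
delta_hidden = (W_out.T @ delta_out) * sigmoid_prime(h_hidden)
# Delta W_jk = eta * delta_j * O_k; here O_k is the input X itself
dW_hidden = eta * np.outer(delta_hidden, X)

# gradient-descent updates of both weight matrices
W_out += dW_out
W_hidden += dW_hidden
```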
Now that the equations are in place, let's see how to do this in TensorFlow. In this recipe, we work with the same MNIST dataset as before (http://yann.lecun.com/exdb/mnist/).
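As a quick way to get the data ready, one possible loader is the tf.keras.datasets helper shown below; this is just one option and is not necessarily the loader used in the steps that follow.

```python
import tensorflow as tf

# download MNIST; shapes are (60000, 28, 28) for images and (60000,) for labels
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()

# flatten each 28x28 image into a 784-dimensional vector and scale to [0, 1]
x_train = x_train.reshape(-1, 784).astype("float32") / 255.0
x_test = x_test.reshape(-1, 784).astype("float32") / 255.0
print(x_train.shape, x_test.shape)
```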
