Single hidden layer model

Here, we'll put the basics of neural networks into practice. We'll adapt the logistic regression TensorFlow code into a model with a single hidden layer of neurons. Then, you'll learn the idea behind backpropagation, which is how the weights are computed, that is, how the net is trained. Finally, you'll train your first true neural network in TensorFlow.

The TensorFlow code for this section should look familiar. It's just a slightly evolved version of the logistic regression code. Let's look at how to add a hidden layer of neurons that will compute nonlinear combinations of our input pixels.

You should start with a fresh Python session and execute the code that reads in and sets up the data, just as in the logistic model. It's the same code, simply copied into the new file:

import tensorflow as tf
import numpy as np
import math
%autoindent
try:
    from tqdm import tqdm
except ImportError:
    # Fall back to a no-op wrapper if tqdm isn't installed
    def tqdm(x, *args, **kwargs):
        return x

You can always go back to the previous sections to remind yourself what that code does; running everything up to the num_hidden variable will get you up to speed.
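For reference, the code later in this section assumes that the earlier setup defined an interactive session and the input and label placeholders, along with the train, test, onehot_train, and onehot_test arrays. The following is only a minimal sketch of those pieces, with the data-loading lines left to whatever you used before; the placeholder names and dtype are simply the ones the code below expects:

# Sketch of the setup carried over from the logistic regression section
# (not the exact code): an interactive session plus placeholders for the
# flattened 1296-pixel images and the 5 one-hot label classes.
# The train, test, onehot_train, and onehot_test arrays are assumed to
# come from the earlier data preparation code.
sess = tf.InteractiveSession()
x = tf.placeholder("float", [None, 1296])
y_ = tf.placeholder("float", [None, 5])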

Exploring the single hidden layer model

Let's now explore the single hidden layer model in a step-by-step process:

  1. First, let's specify how many neurons we want with num_hidden = 128; this is essentially how many nonlinear combinations will be passed on to the logistic regression at the end.
  2. To accommodate this, we also need to update the shape of the W1 and b1 weight tensors. They're now feeding into our hidden neurons, so they need to match the shape:
    W1 = tf.Variable(tf.truncated_normal([1296, num_hidden],
                                       stddev=1./math.sqrt(1296)))
    b1 = tf.Variable(tf.constant(0.1,shape=[num_hidden]))
  3. We compute the activation function of the weighted sum with the single h1 line; this multiplies our input pixels by the respective weights of each neuron:
    h1 = tf.sigmoid(tf.matmul(x,W1) + b1)

    Add the neuron bias term and, finally, put the result through the sigmoid activation function; at this point, we have 128 intermediate values.

  4. Now it's just your friendly logistic regression again; you already know what to do. These newly computed 128 features need their own set of weights and biases to compute a score for each output class; that's W2 and b2, respectively. Note how the shape matches the number of neurons, 128, by the number of output classes, 5:
    W2 = tf.Variable(tf.truncated_normal([num_hidden, 5],
                                          stddev=1./math.sqrt(5)))
    b2 = tf.Variable(tf.constant(0.1,shape=[5]))
    sess.run(tf.global_variables_initializer())

    Note that we initialize all these weights with this somewhat strange truncated normal call. With neural networks, we want a good spread of initial values so that our weights can climb to meaningful values rather than just getting zeroed out.

  5. Truncated normal draws random values from a normal distribution with the given standard deviation, which is scaled to the number of inputs (a common research practice), but throws out values that are too extreme, hence the truncated part of the name. With our weights and neurons all defined, we set up the final softmax model just as we did before, except that we need to take care to use our 128 neurons, h1, as the input, along with their associated weights and biases, W2 and b2:
    y = tf.nn.softmax(tf.matmul(h1,W2) + b2)
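To train this model, we also need the same loss, optimizer, and accuracy ops as in the logistic regression section. If you haven't carried them over into the new file, a minimal sketch might look like the following; the exact loss formulation and the 0.01 learning rate are assumptions here, so match whatever you used earlier:

# A sketch of the training ops assumed by the loop later in this section;
# the loss form and the 0.01 learning rate are assumptions, adjust them
# to match your earlier logistic regression code.
cross_entropy = tf.reduce_mean(
    -tf.reduce_sum(y_ * tf.log(y + 1e-50), reduction_indices=[1]))
train_step = tf.train.GradientDescentOptimizer(0.01).minimize(cross_entropy)
correct_prediction = tf.equal(tf.argmax(y, 1), tf.argmax(y_, 1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, "float"))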

Backpropagation

The key to training the weights of a neural network and many other machine learning models is called backpropagation.


A full derivation is beyond the scope of this book, but let's go through it intuitively. When you train a model such as logistic regression, the error on your training set comes directly from poorly chosen weights, so you can see which weights should be adjusted, and by how much, and change them accordingly.

Formally, TensorFlow does this by computing the derivative of the error with respect to each weight and adjusting that weight by a fraction of the derivative. Backpropagation is really an extension of the same process.

You start at the bottom, at the output or cost function layer, compute derivatives there, and use those to compute the derivatives associated with the neurons one layer up. We can get the partial derivative of the cost with respect to any weight we want to adjust by adding up the products of the derivatives along the paths from the cost up to that weight; written out as a formula, this is just the chain rule applied along those paths. If this seems complicated, don't worry.
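If you want to peek at what TensorFlow computes for you, you can ask it for these derivatives directly. The following is purely an illustrative sketch, assuming the cross_entropy op from the earlier sketch (or from your own code); tf.gradients returns one gradient tensor per variable you pass in:

# Illustrative only: ask TensorFlow for the gradients that
# backpropagation computes, one tensor per variable in the list.
grad_W1, grad_W2 = tf.gradients(cross_entropy, [W1, W2])
print(grad_W1.get_shape())   # (1296, 128), the same shape as W1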

TensorFlow handles it for you behind the scenes through the optimizer. Because we carefully specified our model in TensorFlow, training it is almost exactly the same as before, so we'll use the same code here:

epochs = 5000
train_acc = np.zeros(epochs//10)
test_acc = np.zeros(epochs//10)
for i in tqdm(range(epochs), ascii=True):
    if i % 10 == 0: # Record summary data, and the accuracy
        # Check accuracy on train set
        A = accuracy.eval(feed_dict={x: train.reshape([-1,1296]), y_: onehot_train})
        train_acc[i//10] = A

        # And now the validation set
        A = accuracy.eval(feed_dict={x: test.reshape([-1,1296]), y_: onehot_test})
        test_acc[i//10] = A
    train_step.run(feed_dict={x: train.reshape([-1,1296]), y_: onehot_train})

One thing to note is that, because we have these hidden neurons, there are many more weights in the model. This means that our model will take longer to run and will need more iterations to train, so let's run it for 5,000 epochs this time.


This model will probably take longer to train than the previous one, maybe four times as long, so expect anywhere from a few minutes to ten minutes, depending on your computer. With the model training now, we'll look at verifying its accuracy later.
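When we do come back to verify accuracy, the train_acc and test_acc arrays recorded during training are what we'll examine. Purely as a sketch (matplotlib is an assumption here, not necessarily the plotting setup used elsewhere in the book), you could visualize them like this:

# Sketch of plotting the recorded accuracies; matplotlib is an
# assumption, so use whatever plotting setup you prefer.
import matplotlib.pyplot as plt
plt.plot(train_acc, label='train accuracy')
plt.plot(test_acc, label='test accuracy')
plt.xlabel('epoch // 10')
plt.ylabel('accuracy')
plt.legend()
plt.show()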
