Here, we'll put the basics of neural networks into practice. We'll adapt the logistic regression TensorFlow code into a single hidden layer of neurons. Then, you'll learn the idea behind backpropagation, which is how the weights are computed, that is, how the net is trained. Finally, you'll train your first true neural network in TensorFlow.
The TensorFlow code for this section should look familiar. It's just a slightly evolved version of the logistic regression code. Let's look at how to add a hidden layer of neurons that will compute nonlinear combinations of our input pixels.
You should start with a fresh Python session and execute the code that reads in and sets up the data, just as in the logistic model. It's the same code, simply copied into the new file:
import tensorflow as tf
import numpy as np
import math

try:
    from tqdm import tqdm
except ImportError:
    # Fall back to a no-op wrapper if tqdm isn't installed
    def tqdm(x, *args, **kwargs):
        return x
You can always go back to the previous sections and remind yourself what that code does; everything up to the num_hidden variable will get you up to speed.
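If you don't have that earlier session handy, here is a rough sketch of the kind of setup the rest of this section assumes. This is not the book's data-loading code; it is only a reminder of the names and shapes used below (x, y_, and sess, plus the train/test arrays and their one-hot labels, which are assumed to be loaded as before):

# Rough sketch of the setup assumed from the previous section (data loading omitted).
sess = tf.InteractiveSession()

# 36 x 36 = 1296 flattened input pixels per image, 5 output classes.
x = tf.placeholder("float", [None, 1296])
y_ = tf.placeholder("float", [None, 5])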
Let's now explore the single hidden layer model in a step-by-step process:
1. Start by setting num_hidden = 128; this is essentially how many nonlinear combinations of the input pixels will get passed on to the logistic regression at the end:

   num_hidden = 128

2. Next, adapt the W1 and b1 weight tensors. They're now feeding into our hidden neurons, so they need to match that shape:

   W1 = tf.Variable(tf.truncated_normal([1296, num_hidden], stddev=1./math.sqrt(1296)))
   b1 = tf.Variable(tf.constant(0.1, shape=[num_hidden]))

3. The h1 line multiplies our input pixels by their respective weights for each neuron, adds the neuron bias term, and finally puts the result through the sigmoid activation function; at this point, we have 128 intermediate values:

   h1 = tf.sigmoid(tf.matmul(x, W1) + b1)

4. Now create the second layer's weights and biases, W2 and b2, respectively. Note how the shape matches the number of neurons, 128, and the number of output classes, 5:

   W2 = tf.Variable(tf.truncated_normal([num_hidden, 5], stddev=1./math.sqrt(5)))
   b2 = tf.Variable(tf.constant(0.1, shape=[5]))
   sess.run(tf.global_variables_initializer())

   We initialize all these weights with that strange truncated_normal call because, with neural networks, we want a good spread of initial values so the weights can climb toward meaningful values rather than just getting zeroed out.

5. Finally, compute the softmax model just as we did before, except we need to take care to use our 128 neurons, h1, as the input, along with the associated weights and biases, W2 and b2:

   y = tf.nn.softmax(tf.matmul(h1, W2) + b2)
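If you want to quickly confirm that the pieces fit together, you can inspect the tensor shapes. This check isn't part of the original code; it's just a convenience:

# Optional sanity check on the shapes (the ? dimension is the batch size).
print(h1.get_shape())   # (?, 128) -- 128 hidden activations per image
print(y.get_shape())    # (?, 5)   -- probabilities for the 5 output classes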
The key to training the weights of a neural network and many other machine learning models is called backpropagation.
A full derivation is beyond the scope of this book, but let's go through it intuitively. When you train a model such as logistic regression, the error on your training set comes directly from poorly chosen weights; you can see which weights should be adjusted, and by how much, and change them accordingly.
Formally, TensorFlow does this by computing the derivative of the error with respect to each weight and adjusting the weight by a fraction of that derivative. Backpropagation is really an extension of the same process.
You start at the bottom, the output or cost function layer, computing derivatives, and use those to compute the derivatives associated with the neurons one layer up. To get the partial derivative of the cost with respect to the weight we want to adjust, we multiply the derivatives along the path from the cost up to that weight, and add up the products over all such paths. The formula shown in the preceding diagram just spells out what the red arrows show.
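To make that chain of derivatives concrete, here is a tiny numerical sketch, entirely separate from our model, of one path from the cost back to a first-layer weight in a one-neuron-per-layer network (the variable names and values here are made up for illustration):

import numpy as np

# One input, one sigmoid hidden neuron, one linear output, squared-error cost.
x_in, target = 0.5, 1.0
w1, w2 = 0.3, -0.2

z = w1 * x_in
h = 1.0 / (1.0 + np.exp(-z))        # hidden activation (sigmoid)
y_out = w2 * h                      # output
cost = 0.5 * (y_out - target) ** 2  # squared-error cost

# Chain rule: dcost/dw1 = dcost/dy * dy/dh * dh/dz * dz/dw1
dcost_dy = y_out - target
dy_dh = w2
dh_dz = h * (1.0 - h)               # derivative of the sigmoid
dz_dw1 = x_in
dcost_dw1 = dcost_dy * dy_dh * dh_dz * dz_dw1

# Gradient descent adjusts w1 by a fraction of this derivative.
w1 -= 0.1 * dcost_dw1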
If this seems complicated, don't worry: TensorFlow handles it for you behind the scenes with the optimizer. Because we carefully specified our model in TensorFlow, training it is almost exactly the same as before, so we'll use the same code here:
epochs = 5000
train_acc = np.zeros(epochs//10)
test_acc = np.zeros(epochs//10)
for i in tqdm(range(epochs), ascii=True):
    if i % 10 == 0:
        # Record summary data, and the accuracy
        # Check accuracy on train set
        A = accuracy.eval(feed_dict={x: train.reshape([-1,1296]), y_: onehot_train})
        train_acc[i//10] = A
        # And now the validation set
        A = accuracy.eval(feed_dict={x: test.reshape([-1,1296]), y_: onehot_test})
        test_acc[i//10] = A
    train_step.run(feed_dict={x: train.reshape([-1,1296]), y_: onehot_train})
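This loop assumes that cross_entropy, train_step, and accuracy, as well as the train, test, onehot_train, and onehot_test arrays, are already defined exactly as in the logistic regression section. As a rough reminder, those training ops looked something like the following; treat the exact loss form and the learning rate here as assumptions and use the earlier section's code verbatim:

# Sketch of the training ops carried over from the logistic regression section.
cross_entropy = tf.reduce_mean(
        -tf.reduce_sum(y_ * tf.log(y), reduction_indices=[1]))
train_step = tf.train.GradientDescentOptimizer(0.01).minimize(cross_entropy)

correct_prediction = tf.equal(tf.argmax(y, 1), tf.argmax(y_, 1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, "float"))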
One thing to note is that, because we have these hidden neurons, there are many more weights to fit in the model. This means our model will take longer to run and will need more iterations to train. Let's run it through 5,000 epochs this time.
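To see roughly how much bigger this model is, you can count the parameters yourself:

# Back-of-the-envelope parameter count: hidden-layer model vs. plain logistic regression.
hidden_model = 1296 * 128 + 128 + 128 * 5 + 5   # W1 + b1 + W2 + b2 = 166,661
logistic_model = 1296 * 5 + 5                   # W + b = 6,485
print(hidden_model, logistic_model)             # roughly 25 times as many weights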
This model probably takes longer to train than the previous one, perhaps four times as long, so expect anywhere from a few minutes to ten minutes, depending on your computer. With the model training now, we'll look at verifying its accuracy later.
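We'll verify the accuracy properly later, but if you're impatient, you can already plot the values recorded every 10 epochs once the loop completes (this quick plot is not part of the section's code and assumes matplotlib is installed):

# Optional: peek at the accuracy curves recorded during training.
import matplotlib.pyplot as plt
plt.plot(train_acc, label='train')
plt.plot(test_acc, label='test')
plt.xlabel('Epoch / 10')
plt.ylabel('Accuracy')
plt.legend()
plt.show()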