First, you'll learn about the loss function for our machine learning classifier and implement it in TensorFlow. Then, we'll quickly train the model by evaluating the right TensorFlow node. Finally, we'll verify that our model is reasonably accurate and the weights make sense.
Optimizing our model really means minimizing how wrong we are. With our labels in one-hot style, it's easy to compare them with the class probabilities predicted by the model. The categorical cross-entropy function is a formal way to measure this. While the exact statistics are beyond the scope of this course, you can think of it as punishing the model more for less accurate predictions. To compute it, we multiply our one-hot real labels element-wise with the natural log of the predicted probabilities, then sum these values and negate them. Conveniently, TensorFlow already includes this function as tf.nn.softmax_cross_entropy_with_logits() and we can just call that:
# Climb on cross-entropy
cross_entropy = tf.reduce_mean(
        tf.nn.softmax_cross_entropy_with_logits(
            logits=y + 1e-50, labels=y_))
Note that we are adding a small error value of 1e-50 here to avoid numerical instability problems.
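If you want to see the formula from the preceding paragraph in action, here is a minimal NumPy sketch of the same computation for a single image over our five classes. The probabilities are made up and the variable names are only for this illustration:

import numpy as np

# Hypothetical one-hot label and predicted probabilities for one image
label = np.array([0, 1, 0, 0, 0])
probs = np.array([0.1, 0.6, 0.1, 0.1, 0.1])

# Multiply element-wise with the log probabilities, sum, and negate
loss = -np.sum(label * np.log(probs))
print(loss)  # about 0.51; a less confident prediction gives a larger loss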
TensorFlow is convenient in that it provides built-in optimizers to take advantage of the loss function we just wrote. Gradient descent is a common choice and will slowly nudge our weights toward better results. This is the node that will update our weights:
# How we train
train_step = tf.train.GradientDescentOptimizer(
        0.02).minimize(cross_entropy)
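Under the hood, minimize() computes the gradient of the loss with respect to each variable and nudges the variable against that gradient, scaled by the learning rate (0.02 here). As a rough sketch only, an equivalent manual version using tf.gradients might look like the following, assuming the weight and bias variables from our model definition are named W and b:

# Hypothetical manual version of the same update (sketch only)
grad_W, grad_b = tf.gradients(cross_entropy, [W, b])
manual_step = tf.group(
    W.assign_sub(0.02 * grad_W),   # W <- W - 0.02 * dLoss/dW
    b.assign_sub(0.02 * grad_b))   # b <- b - 0.02 * dLoss/db

In practice, we just let the built-in optimizer handle this for us.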
Before we actually start training, we should specify a few more nodes to assess how well the model does:
# Define accuracy
correct_prediction = tf.equal(tf.argmax(y, 1),
                              tf.argmax(y_, 1))
accuracy = tf.reduce_mean(tf.cast(
        correct_prediction, "float"))
The correct_prediction node is 1 if our model assigns the highest probability to the correct class, and 0 otherwise. The accuracy variable averages these predictions over the available data, giving us an overall sense of how well the model did.
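To make the argmax comparison concrete, here is a tiny NumPy analogue with made-up scores for two images over our five classes (the variable names are hypothetical):

import numpy as np

# Made-up model outputs and one-hot labels for two images
scores = np.array([[0.1, 0.7, 0.1, 0.05, 0.05],   # highest score: class 1
                   [0.6, 0.1, 0.1, 0.1,  0.1]])   # highest score: class 0
labels = np.array([[0, 1, 0, 0, 0],               # true class 1 -> correct
                   [0, 0, 0, 0, 1]])              # true class 4 -> wrong
correct = np.equal(np.argmax(scores, 1), np.argmax(labels, 1))
print(correct.astype("float").mean())  # 0.5: one right out of two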
When training in machine learning, we often want to use the same data point multiple times to squeeze all the information out. Each pass through the entire training data is called an epoch. Here, we're going to save both the training and validation accuracy every 10 epochs:
# Actually train
epochs = 1000
train_acc = np.zeros(epochs//10)
test_acc = np.zeros(epochs//10)
for i in tqdm(range(epochs)):
    # Record summary data, and the accuracy
    if i % 10 == 0:
        # Check accuracy on train set
        A = accuracy.eval(feed_dict={
            x: train.reshape([-1, 1296]),
            y_: onehot_train})
        train_acc[i//10] = A
        # And now the validation set
        A = accuracy.eval(feed_dict={
            x: test.reshape([-1, 1296]),
            y_: onehot_test})
        test_acc[i//10] = A
    train_step.run(feed_dict={
        x: train.reshape([-1, 1296]),
        y_: onehot_train})
Note that we use feed_dict to pass in different data, the training or the validation set, to get different output values. Finally, train_step.run updates the model on every iteration. This should only take a few minutes on a typical computer, much less if you're using a GPU, and a bit more on an underpowered machine.
You just trained a model with TensorFlow; awesome!
After 1,000 epochs, let's take a look at the model. If you have Matplotlib installed, you can view the accuracies in a graphical plot; if not, you can still look at the numbers. For the final results, use the following code:
# Notice that accuracy flattens out
print(train_acc[-1])
print(test_acc[-1])
If you do have Matplotlib installed, you can use the following code to display the plot:
# Plot the accuracy curves
plt.figure(figsize=(6, 6))
plt.plot(train_acc, 'bo')
plt.plot(test_acc, 'rx')
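If the plot doesn't appear automatically, or you'd like labeled axes, a few optional extra lines help; the label text here is just a suggestion, not part of the original code:

# Optional: label the plot and force it to display
plt.xlabel('Measurement number (every 10 epochs)')
plt.ylabel('Accuracy')
plt.legend(['Train', 'Validation'])
plt.show()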
You should see something like the following plot (note that we used some random initialization, so it might not be exactly the same):
It seems like the validation accuracy flattens out after about 400-500 iterations; beyond this, our model may either be overfitting or not learning much more. Also, even though the final accuracy of about 40 percent might seem poor, recall that, with five classes, a totally random guess would only have 20 percent accuracy. With this limited dataset, the simple model is doing all it can.
It's also often helpful to look at computed weights. These can give you a clue as to what the model thinks is important. Let's plot them by pixel position for a given class:
# Look at a subplot of the weights for each font
f, plts = plt.subplots(5, sharex=True)
for i in range(5):
    plts[i].pcolor(W.eval()[:, i].reshape([36, 36]))
This should give you a result similar to the following (again, if the plot comes out very wide, you can squeeze in the window size to square it up):
We can see that the weights near the interior of the image are important for some classes, while the weights near the edges are essentially zero. This makes sense, since none of the font characters reach the corners of the images.
Again, note that your final results might look a little different due to random initialization effects. Always feel free to experiment and change the parameters of the model; that's how you'll learn new things.
In this chapter, we installed TensorFlow on a machine we can use. After some small steps with basic computations, we jumped into a machine learning problem, successfully building a decent model with just logistic regression and a few lines of TensorFlow code.
In the next chapter, we'll see TensorFlow in its prime with deep neural networks.