We're going to wrap up our deep CNN by evaluating the model's accuracy. Last time, we set up the final font recognition model. Now, let's see how it does. In this section, we're going to learn how to handle dropout when evaluating the model. Then, we'll see what accuracy the model achieved. Finally, we'll visualize the weights to understand what the model learned.
Make sure you pick up in your IPython session where you left off after training the model in the previous section. Recall that when we trained our model, we used dropout to remove some outputs. While this helps with overfitting, during testing we want to make sure to use every neuron. This both increases the accuracy and ensures that we don't forget to evaluate part of the model. That's why, in the following code lines, keep_prob is 1.0, to always keep all the neurons.
# Check accuracy on train set
A = accuracy.eval(feed_dict={x: train,
        y_: onehot_train, keep_prob: 1.0})
train_acc[i//10] = A

# And now the validation set
A = accuracy.eval(feed_dict={x: test,
        y_: onehot_test, keep_prob: 1.0})
test_acc[i//10] = A
Let's see how the final model did; just take a look at the training and testing accuracy as usual:
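The plotting code itself isn't shown in this excerpt; here is a minimal sketch following the pattern from earlier sections, assuming train_acc and test_acc are the NumPy arrays filled in during the training loop above, and that plt is the matplotlib.pyplot module already imported in your session:

# Plot the recorded accuracies: blue dots for training,
# red crosses for the held-out test set
plt.figure(figsize=(6, 6))
plt.plot(train_acc, 'bo')
plt.plot(test_acc, 'rx')
plt.show()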
The training accuracy here topped 85 percent, and the testing accuracy isn't too far behind. Not too bad. How well a model does depends on how noisy the input data is. If we only have a small amount of information, both in the number of examples and in the number of parameters or pixels, then we can't expect a model to perform perfectly.
In this case, one useful benchmark is how well a human could classify images of a single letter into each of these fonts. Some of the fonts are very distinctive, while others are similar, especially for certain letters. Because this is a novel dataset, there isn't a direct benchmark to compare against, but you can challenge yourself to beat the model presented in this course. If you do, you might want to reduce the training time. Smaller networks with fewer parameters and simpler computations will, of course, be faster. Alternatively, if you start using a GPU, or at least a multicore CPU, you can get dramatic speedups, often 10X or better, depending on the hardware.
Part of this is parallelism, and part of it is highly efficient low-level libraries fine-tuned for neural networks. But the easiest thing to do is start simple and work your way up to more complex models, just as you've been doing with this problem. Back to this model; let's see the confusion matrix:
# Look at the final testing confusion matrix
pred = np.argmax(y.eval(feed_dict={x: test,
        keep_prob: 1.0, y_: onehot_test}), axis=1)
conf = np.zeros([5, 5])
for p, t in zip(pred, np.argmax(onehot_test, axis=1)):
    conf[t, p] += 1

plt.matshow(conf)
plt.colorbar()
The following is the output:
Here, we can see that the model is generally doing a good job on the various classes. Class 1 still isn't perfect, but it's much better than in the previous models. By building up smaller-scale features into larger pieces, we have finally found some good indicators for these classes. Your images might not look exactly the same; it's possible to get a little unlucky with the results, depending on the random initialization of your weights.
Let's look at the weights for the 16 features of the first convolutional layer:
# Let's look at a subplot of some weights
f, plts = plt.subplots(4, 4)
for i in range(16):
    plts[i//4, i%4].matshow(W1.eval()[:,:,0,i],
            cmap=plt.cm.gray_r)
Because the window size is 3x3, each one is a 3x3 matrix. We can see that the weights are definitely pulling out small-scale features.
You can see certain things being detected, such as edges and rounded corners. If we redid the model with a larger window, this might be even more apparent. But it's impressive how many features you can spot in just these small patches.
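If you want to experiment with that, here is a minimal sketch of what a larger first-layer window could look like, assuming the TensorFlow 1.x setup from the earlier sections (x_im as the reshaped image tensor, and tf and math already imported); the variable names here are hypothetical:

# Hypothetical variant: a 5x5 convolution window for layer one
# instead of 3x3, following the same pattern as before
winx, winy = 5, 5
num_filters = 16
W1_big = tf.Variable(tf.truncated_normal(
        [winx, winy, 1, num_filters],
        stddev=1./math.sqrt(winx*winy)))
b1_big = tf.Variable(tf.constant(0.1, shape=[num_filters]))
h1_big = tf.nn.relu(tf.nn.conv2d(x_im, W1_big,
        strides=[1, 1, 1, 1], padding='SAME') + b1_big)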
Let's also look at the final layer weights, just to see how the different font classes weight the final densely connected neurons.
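The visualization code isn't included in this excerpt; a minimal sketch, assuming the output layer's weight matrix is held in a variable named W3 (a hypothetical name; substitute whatever you called it when building the model) with shape [num_hidden, 5]:

# Hypothetical: show the output-layer weights with one row
# per font class and one column per hidden neuron
plt.matshow(W3.eval().T, cmap=plt.cm.gray_r)
plt.colorbar()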
Each row represents a class, and each column is one of the final hidden layer neurons. Some classes are influenced strongly by certain neurons, while those same neurons barely matter for others. You can see how a given neuron is very important, positively or negatively, for certain classes, while remaining mostly neutral for the others.
Note that because we've flattened our convolutions, we don't expect to see obvious structure in the output; these columns could be in any order and still produce the same results. In this final section of the chapter, we checked out a real, live, and frankly pretty nice deep convolutional neural network model. We built it up using convolutional and pooling layers to extract small- and large-scale features from structured data, such as images.
For many problems, this is among the most powerful types of neural networks.