Classification is a supervised learning method for predicting a class label for a given example of input data. Although we introduced classification with MNIST, we work through the famous Fashion-MNIST dataset to delve deeper into the topic.
Fashion-MNIST is intended as a direct replacement for MNIST to better benchmark machine learning algorithms. It shares the same image size and structure of training and test splits, but is a more challenging classification problem.
MNIST benchmarking has several associated problems. It’s far too easy for standard machine learning algorithms to achieve over 97% accuracy. It’s even easier for deep learning models to achieve over 99% accuracy. The dataset is overused. Finally, MNIST cannot represent modern computer vision tasks.
Notebooks for chapters are located at the following URL: https://github.com/paperd/tensorflow.
1. Click Runtime in the top-left menu.
2. Click Change runtime type from the drop-down menu.
3. Choose GPU from the Hardware accelerator drop-down menu.
4. Click SAVE.
Import the tensorflow library. If ‘/device:GPU:0’ is displayed, the GPU is active. If an empty string is displayed, the regular CPU is active.
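The device check described above might look like the following sketch:

```python
import tensorflow as tf

# Returns the GPU device name, or an empty string when no GPU is active.
device_name = tf.test.gpu_device_name()
print(device_name if device_name else 'CPU only')
```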
Fashion-MNIST Dataset
Fashion-MNIST is a dataset of clothing article images created by Zalando Research consisting of a training set of 60,000 examples and a test set of 10,000 examples. Each example is a 28 × 28 grayscale image associated with a label from ten classes.
Zalando Research is an organization that utilizes an agile design process that combines invaluable human experience with the power of machine learning. Zalando concentrates on exploring novel ways to use generative models in fashion design for rapid visualizations and prototyping.
Load Fashion-MNIST as a TFDS
Since Fashion-MNIST is available as a TensorFlow Dataset (TFDS), we can easily load it with tfds.load. To get a list of all TFDS, just run the tfds.list_builders function as demonstrated in Chapter 3.
Since we already have info from the train data, we don’t need to load it again for the test data.
Each image consists of 28 × 28 pixels. The 1 dimension indicates that images are grayscale. Each label is a scalar.
Explore the Dataset
We see the name, description, and homepage. We also see the shape and datatype of feature images and labels. We see that we have 70,000 examples with train and test splits of 60,000 and 10,000, respectively. A lot of other information is also included.
We have ten classes representing ten clothing articles.
The show_examples method displays sample images and labels. The label name and associated class number are displayed under each image. For example, the class number for Pullover is 2 because it is the third label in the class labels list.
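For reference, the ten Fashion-MNIST class labels in class-number order are:

```python
# Class names in label order; 'Pullover' sits at index 2.
class_labels = ['T-shirt/top', 'Trouser', 'Pullover', 'Dress', 'Coat',
                'Sandal', 'Shirt', 'Sneaker', 'Bag', 'Ankle boot']
```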
Custom function for displaying sample data
Import a couple of libraries. The function accepts a dataset, number of samples to display, and a colormap. A colormap is an array of colors used to map pixel data to the actual color values. The matplotlib library provides a variety of built-in colormaps.
We assign an example image and label to variables. We then display the label name and its associated class number. We end by displaying the image with the imshow() function. We use [:, :, 0] to strip the channel dimension so that imshow receives a 2D array of pixels.
Now, let’s build a custom function to display a grid of samples.
Take samples from the train set
Function that displays a grid of examples
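A sketch of such a grid function follows. The name display_grid and its parameters are our own; it assumes a tf.data.Dataset of (image, label) pairs and a class labels list:

```python
import matplotlib.pyplot as plt

def display_grid(ds, rows, cols, class_labels, cmap='binary'):
    """Display a rows x cols grid of (image, label) samples from a dataset."""
    fig = plt.figure(figsize=(cols * 1.5, rows * 1.5))
    for i, (image, label) in enumerate(ds.take(rows * cols)):
        ax = fig.add_subplot(rows, cols, i + 1)
        ax.imshow(image[:, :, 0], cmap=cmap)  # drop the channel dimension
        ax.set_title(f'{class_labels[int(label)]} ({int(label)})', fontsize=8)
        ax.axis('off')
    plt.tight_layout()
    plt.show()
```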
Build the Input Pipeline
Build the input pipeline
Shuffle and batch train data. Scale train images. Cache and prefetch train images. Batch test data. Scale test images. Cache and prefetch test images. Use batch size of 128 and shuffle buffer size of 5,000 for this experiment.
Do not shuffle test data because it is considered new to the neural network model.
Caching a TFDS can significantly improve performance. The cache method of a tf.data.Dataset can cache a dataset either in memory or on local storage, which saves operations like file opening and data reading from being executed during each epoch.
Adding prefetch is a good idea because it adds efficiency to the batching process. While our training algorithm is working on one batch, TensorFlow is working on the dataset in parallel to get the next batch ready. So prefetch can dramatically improve training performance.
Both train and test images are 28 × 28 × 1. The 1 value means that images are grayscale, that is, single-channel rather than color.
Build the Model
Simple feedforward neural network
Import requisite libraries. Clear any previous model sessions and generate a seed to facilitate reproducibility of results. The first layer flattens images. The second layer uses relu activation on 512 neurons to process the data. The third layer uses dropout to reduce overfitting. The fourth layer uses softmax activation on ten neurons to account for class labels.
Model Summary
Output shape of the first layer is (None, 784). We get 784 by multiplying 28 by 28. We have no parameters at this layer because it is only used to bring data into the model.
Output shape of the second layer is (None, 512) because we have 512 neurons at this layer. We get parameters of 401,920 by multiplying 512 neurons at this layer by 784 neurons from the previous layer and adding 512 at this layer.
Output shape of the third layer is (None, 512), and parameters are 0 because dropout doesn’t impact neurons or parameters. Output shape of the fourth layer is (None, 10) because we have ten neurons at this layer to deal with ten output classes. We get parameters of 5,130 by multiplying 10 neurons at this layer by 512 from the previous layer and adding 10 neurons at this layer.
None is used because TensorFlow models can accept any batch size.
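The architecture and parameter counts above can be reproduced with a sketch like the following. The dropout rate of 0.4 is an assumption, matching the rate discussed later in the chapter:

```python
import tensorflow as tf
from tensorflow.keras.layers import Dense, Dropout, Flatten

tf.keras.backend.clear_session()
tf.random.set_seed(0)  # seed for reproducibility

model = tf.keras.Sequential([
    tf.keras.Input(shape=(28, 28, 1)),
    Flatten(),                       # (None, 784), 0 parameters
    Dense(512, activation='relu'),   # 784 * 512 + 512 = 401,920 parameters
    Dropout(0.4),                    # 0 parameters
    Dense(10, activation='softmax')  # 512 * 10 + 10 = 5,130 parameters
])
model.summary()
```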
Compile the Model
Train the Model
We get pretty good accuracy with not much overfitting.
Generalize on Test Data
Visualize Performance
The fit method automatically records the history of the training process as a dictionary. So we can assign training information to a variable. In this case, we assign it to history. The history attribute of the variable contains the dictionary information.
The dictionary history.history contains loss, accuracy, val_loss, and val_accuracy metrics that the model measures at the end of each epoch on the training set and validation (or test) set.
Plot training performance
We don’t have much overfitting, and our model accuracy is pretty good for such a simple neural network.
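A sketch of such a performance plot follows. The function name plot_history is our own; it accepts the history.history dictionary produced by fit:

```python
import matplotlib.pyplot as plt

def plot_history(hist):
    """Plot train/validation accuracy and loss from a history.history dict."""
    fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))
    ax1.plot(hist['accuracy'], label='train')
    ax1.plot(hist['val_accuracy'], label='test')
    ax1.set_title('accuracy')
    ax1.legend()
    ax2.plot(hist['loss'], label='train')
    ax2.plot(hist['val_loss'], label='test')
    ax2.set_title('loss')
    ax2.legend()
    plt.show()
```

Invoke it with plot_history(history.history) after training.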
Predict Labels for Test Images
Now that we have a trained model, we can make predictions based on test images. We predict from test images because the model sees these images as new data.
We use the predict method on the processed test set test_fs.
Since Fashion-MNIST has ten class labels, each prediction consists of an array of ten numbers that represent the model’s confidence in how well the image corresponds to each of the ten different articles of clothing.
The first prediction is at index 0 because Python indexing starts at 0; for a test set of 10,000 examples, indexes run from 0 to 9999. It’s hard to tell which value is the highest just by looking at the array of floats.
Now, we can clearly see the value with the highest number. The position with the highest confidence corresponds to the position in the class labels array.
The prediction must be between 0 and 9 because our targets are between 0 and 9.
So we have the predicted clothing article for the first image in the test set.
We take the first image from a batch of 128 images because we set batch size to 128. If the prediction matches the test image, it was a correct prediction.
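The argmax logic described above can be illustrated with a hypothetical prediction array (the confidence values below are made up for illustration):

```python
import numpy as np

class_labels = ['T-shirt/top', 'Trouser', 'Pullover', 'Dress', 'Coat',
                'Sandal', 'Shirt', 'Sneaker', 'Bag', 'Ankle boot']

# Hypothetical softmax output for one image: ten confidences summing to 1.
pred = np.array([0.01, 0.01, 0.02, 0.01, 0.01,
                 0.02, 0.01, 0.01, 0.01, 0.89])

winner = np.argmax(pred)  # index of the highest confidence
print(winner, class_labels[winner])  # 9 Ankle boot
```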
Build a Prediction Plot
Now that we have a trained model, we can build a prediction plot.
Take samples from the test set
Squeeze each image to remove the 1 dimension for plotting purposes.
Function to build a prediction plot
Invoke the prediction plot function
Any clothing article in red means that the prediction was incorrect. Above each article of clothing is the actual label, prediction in parentheses, and confidence in the prediction.
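A single-image version of such a plot might look like this sketch. The function name plot_prediction and its parameters are our own:

```python
import numpy as np
import matplotlib.pyplot as plt

def plot_prediction(image, true_label, pred_array, class_labels):
    """Show one image titled with actual label, (prediction), and confidence.
    Correct predictions are titled in blue, incorrect ones in red."""
    pred_label = int(np.argmax(pred_array))
    confidence = 100 * np.max(pred_array)
    color = 'blue' if pred_label == true_label else 'red'
    plt.imshow(np.squeeze(image), cmap='binary')
    plt.title(f'{class_labels[true_label]} ({class_labels[pred_label]}) '
              f'{confidence:.0f}%', color=color)
    plt.axis('off')
    plt.show()
```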
Load Fashion-MNIST as a Keras Dataset
Although TFDS is recommended for TensorFlow 2.x applications, we show you how to work with a Keras dataset because of its popularity in industry. TensorFlow 2.x is still new enough that it doesn’t yet have the industry penetration of Keras.
Explore the Data
The Keras dataset contains the same data as the Fashion-MNIST TFDS. Train data consists of 60,000 28 × 28 feature images and 60,000 labels. Test data consists of 10,000 28 × 28 feature images and 10,000 labels. Train images are contained in the train tuple. So train[0] represents images and train[1] represents labels.
Since we access training labels with train[1], we grab the first label with train[1][0].
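Loading and inspecting the Keras dataset might look like the following sketch:

```python
import tensorflow as tf

# load_data returns (images, labels) tuples for train and test.
train, test = tf.keras.datasets.fashion_mnist.load_data()

print(train[0].shape, train[1].shape)  # images and labels
print(train[1][0])                     # first training label
```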
Visualize the First Image
Visualize Sample Images
Code to visualize examples
Prepare Data for Training
To prepare training data for TensorFlow consumption, we need to grab images and labels from train and test data. We scale images so that each input parameter (a pixel, in our case) falls in the same [0, 1] range and therefore has a similar data distribution. Scaling data makes convergence faster while training the network.
Prepare the input pipeline
Set BATCH_SIZE to 128 so the model will run faster than with a smaller batch size. Experiment with this number and see what happens. We also set SHUFFLE_BUFFER_SIZE to 5000 so the shuffle method works well. Again, experiment with this number to see what happens.
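A sketch of this pipeline follows; the variable names are our own:

```python
import tensorflow as tf

BATCH_SIZE = 128
SHUFFLE_BUFFER_SIZE = 5000

train, test = tf.keras.datasets.fashion_mnist.load_data()

# Scale pixel values to the [0, 1] range.
train_images = train[0] / 255.0
test_images = test[0] / 255.0

train_ds = (tf.data.Dataset.from_tensor_slices((train_images, train[1]))
              .shuffle(SHUFFLE_BUFFER_SIZE)
              .batch(BATCH_SIZE))
test_ds = (tf.data.Dataset.from_tensor_slices((test_images, test[1]))
             .batch(BATCH_SIZE))
```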
Build the Model
Keras model
As we already know, it is always a good idea to clear any previous modeling sessions. Remember that we already created a model and trained it earlier in the chapter. Also, generating a seed ensures that the results are consistent. In machine learning, such consistency is called reproducibility. That is, the seed provides a starting point for the random generator that allows us to reproduce results in a consistent manner. You can use any integer value for the random seed number. We use 0.
The first layer, Flatten(), flattens the 2D matrix of 28 × 28 images to a 1D array of 784 pixels. The second layer is the first true layer. It feeds the flattened input into 128 neurons and performs relu activation. The final layer is the output layer. It accepts the output of the hidden layer into ten neurons that represent the ten classes of clothing articles and performs softmax activation.
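The model described above might be sketched as follows. The dropout rate of 0.4 matches the value discussed later in the section:

```python
import tensorflow as tf

tf.keras.backend.clear_session()
tf.random.set_seed(0)

model = tf.keras.Sequential([
    tf.keras.Input(shape=(28, 28)),
    tf.keras.layers.Flatten(),                      # 28 x 28 -> 784
    tf.keras.layers.Dense(128, activation='relu'),  # 784 * 128 + 128 = 100,480
    tf.keras.layers.Dropout(0.4),
    tf.keras.layers.Dense(10, activation='softmax') # 128 * 10 + 10 = 1,290
])
```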
Model Summary
Compile the Model
Train the Model
Generalize on Test Data
Visualize Training
The fit method automatically records the history of the training process as a dictionary. So we can assign training history to a variable. In this case, we assigned it to history. The history attribute of the variable contains the dictionary information.
Dictionary history.history contains loss and other metrics the model measures at the end of each epoch on the training set and validation set.
Training history plots
Overfitting is minimal because training accuracy is closely aligned with test accuracy. We did employ dropout in the model to reduce overfitting. Dropout is a regularization method that approximates training a large number of neural networks with different architectures in parallel. That definition doesn’t make dropout any easier to understand for anyone new to deep learning. So let’s explain how it works.
During training, some number of layer outputs are randomly ignored or dropped out. By randomly dropping out layer outputs, effectively the layer looks like and is treated like a layer with a different number of nodes and connectivity to the prior layer. So each update to a layer during training is performed with a different view of the configured layer. Dropout is a simple but very effective technique to reduce overfitting.
Setting dropout at 0.4 means that we randomly drop out 40% of the layer outputs. You can easily experiment with dropout by changing this value. However, you should keep dropout rate at or less than 0.5 because otherwise you are removing too much data!
Experiment with dropout levels, but keep it at or less than 0.5 to avoid removing too much data.
If the model is overfitting (train accuracy is greater than test accuracy) with the dropout value you set, you can increase it. If the model is underfitting (train accuracy is less than test accuracy), you can decrease it.
Predict Labels for Test Images
We can make predictions on test images since we have a trained model.
As with the Fashion-MNIST TFDS, a prediction is an array of ten numbers that represent the model’s confidence that the image corresponds to each of the ten articles of clothing.
Predict the First Image
The prediction can only be between 0 and 9 because our targets are between 0 and 9.
If the prediction and actual images match, the prediction is correct.
Predict Four Images
Visualize the first four actual test images
Explore Misclassifications
Let’s explore predictions and actual test image labels to find some misclassifications.
The y_pred variable holds a list of prediction labels.
If you don’t get any misclassifications, increase the size of n and rerun the last two code snippets.
We now have an array of misclassifications by their index.
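The misclassification search can be illustrated with hypothetical label arrays (the values below are made up for illustration):

```python
import numpy as np

# Hypothetical predicted and actual labels for the first n test images.
y_pred = np.array([9, 2, 1, 1, 6, 1, 4, 6, 5, 7])
y_true = np.array([9, 2, 1, 1, 6, 1, 4, 6, 5, 5])

# Indexes where the prediction differs from the actual label.
miscl = np.nonzero(y_pred != y_true)[0]
print(miscl)  # [9]
```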
Confidence in misclassifications
Visualize Misclassifications
Function to visualize misclassifications
The confidence may still be pretty high even though the prediction was incorrect. Although neural networks tend to perform well for prediction, the probabilities estimated by the output of a softmax layer tend to be too high. That is, such networks are too confident in their predictions. Remember that the probability with the highest value in the prediction array determines the predicted class. And this probability is the confidence. However, the prediction is not necessarily correct unless the model has 100% accuracy.
Sophisticated visualization of misclassifications
Misclassifications, if any, are in red. Above each article of clothing is the actual label, prediction in parentheses, and confidence in the prediction.
Predict from a Single Image
We can make a prediction on a single image. We choose a number between 0 and 9,999 because we want an image from the test set. Alternatively, we can generate a random number between 0 and 9,999.
Displayed is the index of the randomly chosen image.
TensorFlow models are optimized to make predictions on a batch or collection of examples at once. So add the single image to a batch where it is the only member.
The expand_dims method inserts a new axis that appears at the axis position in the expanded array shape. So the new shape is (1, 28, 28).
The index position in the prediction array is displayed.
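The batching step can be sketched with NumPy:

```python
import numpy as np

img = np.zeros((28, 28))            # a single test image
img_batch = np.expand_dims(img, 0)  # add a batch axis at position 0
print(img_batch.shape)              # (1, 28, 28)
```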
Visualize Single Image Prediction
Visualization of single image prediction
Get prediction by label name and actual label name for the image. Get prediction confidence from the predictions object that we created earlier. The predictions object holds all predictions based on the test set. The indx value provides the position of the confidence value in the predictions object.
Confusion Matrix
To understand the TensorFlow confusion matrix, imagine that above the first row of the matrix are classes 0–9 from left to right. These represent predictions. Imagine that to the left of the first column of the matrix are classes 0–9 from top to bottom. These represent actual labels. Correct classifications are along the diagonal. Misclassifications are not on the diagonal.
The first set of correct classifications is in position row one, column one and represents class 0. The second set is in position row two, column two and represents class 1. And so on…
An example misclassification set is row one, column two. This represents the number of class 0 examples that were misclassified as class 1. For a technical explanation of the TensorFlow confusion matrix, peruse the TensorFlow documentation.
For a general explanation of a confusion matrix, peruse the following URL:
www.dataschool.io/simple-guide-to-confusion-matrix-terminology/
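A small sketch of the TensorFlow confusion matrix follows, using hypothetical label arrays (the values are made up for illustration). Rows are actual labels and columns are predictions, so correct classifications fall on the diagonal:

```python
import tensorflow as tf

# Hypothetical actual and predicted labels for six examples.
actual = [0, 1, 2, 2, 1, 0]
predicted = [0, 1, 2, 0, 1, 1]

cm = tf.math.confusion_matrix(actual, predicted, num_classes=10)
print(cm.numpy())
```

Here cm[2][0] counts the class 2 example misclassified as class 0, and cm[0][1] counts the class 0 example misclassified as class 1.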
Number of Hidden Layers
For many problems, we can begin with a single hidden layer and get reasonable results as we did with our Fashion-MNIST model in this chapter. For more complex problems, we should add layers until we start overfitting the training set.
Number of Neurons in Hidden Layers
The number of neurons in the input and output layers is based on the type of input and output your task requires. For example, the Fashion-MNIST task requires 28 × 28 = 784 input neurons and ten output neurons. For hidden layers, the appropriate number of neurons is difficult to determine. You can try increasing the number of neurons gradually until the network starts overfitting. In practice, we typically pick a model with more layers and neurons than we need and then use early stopping and other regularization techniques to reduce overfitting. Since this is an introduction, we won’t delve any deeper into tuning networks.
Generally, we get a better model by increasing the number of layers rather than the number of neurons per layer. Of course, the number of layers and neurons we include is limited by our available computational resources.