
7. Convolutional Neural Networks

David Paper, Logan, UT, USA

With feedforward neural networks, we achieved good training performance with MNIST and Fashion-MNIST datasets. But images in these datasets are simple and centered within the input space that contains them. That is, they are centered within the pixel matrix that holds them. Input space is all the possible inputs to a model.

Feedforward neural networks are very good at identifying patterns. So, if images occupy the same position within their input space, feedforward nets can quickly and effectively identify image patterns. And, if images are simple in terms of number of image pixels, patterns emerge more easily. But, if images don’t occupy the same positions in their input spaces, feedforward nets have great difficulty identifying patterns and thereby perform horribly! So we need a different model to work with these types of images.

We can train complex and off-center images with convolutional neural networks and get good results. A convolutional neural network (CNN or ConvNet) is a class of deep neural networks most commonly applied to the analysis of visual imagery. CNNs are inspired by biological processes: the connectivity pattern between their neurons resembles the organization of the visual cortex in humans.

A CNN works differently than a feedforward network because it treats data as spatial. Instead of each neuron being connected to every neuron in the previous layer, neurons are connected only to neurons close to them, and the neurons within a feature map share the same weights. This simplification in connectivity means that the network preserves the spatial structure of the dataset.

Suppose the image is a profile of a child’s face. A CNN doesn’t think the child’s eye is repeated all over the image. It can efficiently locate the child’s eye in the image because of the filtering process it undertakes.

A CNN processes an image dataset by assigning importance to various elements in each image, which enables it to differentiate between images. Importance is calibrated with learnable weights and biases. The preprocessing required is much lower than for other classification algorithms because a CNN learns to adjust its filters during training.

The core building block of a CNN is the convolutional layer. A convolutional layer contains a series of filters that transform an input image by extracting features into feature maps for consumption by the next layer in the network. The transformation convolves the image with a set of learnable filters (or convolutional kernels) that have a small receptive field.

Convolution preserves the relationship between pixels by learning image features using small squares of input data. A convolutional kernel is a filter used on a subset of the pixel values of an input image. So a convolutional kernel is one of the small squares of input data that learns image features. A receptive field is the part of an image where a convolutional kernel operates at a given point in time. Feature maps of a CNN capture the result of applying convolutional kernels to an input image. So individual neurons respond to stimuli only in a restricted region of the receptive field (or visual field).

Whew! Simply, a convolutional kernel is a small matrix with its height and width smaller than the image to be convolved. During training, the kernel slides across the entire height and width of the input image, and the dot product of the kernel and the image is computed at every spatial position of the image. These computations create feature maps as output. So the entire image is convolved by a convolutional kernel! Such convolution is the key to the efficiency of a CNN because the filtering process allows it to adjust filter parameters during training.
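
To make the sliding dot product concrete, here is a minimal NumPy sketch (the pixel and kernel values are made up for illustration) that convolves a tiny 4 × 4 image with a 2 × 2 kernel using a stride of 1 and no padding:
import numpy as np
# a tiny 4 x 4 "image" and a 2 x 2 kernel (illustrative values only)
image = np.array([[1., 2., 0., 1.],
                  [0., 1., 3., 1.],
                  [2., 0., 1., 0.],
                  [1., 1., 0., 2.]])
kernel = np.array([[1., 0.],
                   [0., -1.]])
# slide the kernel across the image (stride 1, no padding) and take
# the dot product at each position to build the feature map
out_h = image.shape[0] - kernel.shape[0] + 1
out_w = image.shape[1] - kernel.shape[1] + 1
feature_map = np.zeros((out_h, out_w))
for i in range(out_h):
  for j in range(out_w):
    patch = image[i:i + 2, j:j + 2]
    feature_map[i, j] = np.sum(patch * kernel)
print (feature_map)

Each entry of the resulting 3 × 3 feature map is the sum of the elementwise products between the kernel and the image patch beneath it.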

We begin by discussing the CNN architecture. We start with some sample images to help you understand the type of data we are working with. We continue by building a complete CNN experiment. We work with the famous cifar10 dataset. This dataset contains 60,000 images created to allow deep learning aficionados to create and test deep learning models. We demonstrate how to load the data, build the input pipeline, and model the data. We also show you how to make predictions.

Notebooks for chapters are located at the following URL: https://github.com/paperd/tensorflow.

Enable the GPU (if not already enabled):
  1. Click Runtime in the top-left menu.

  2. Click Change runtime type from the drop-down menu.

  3. Choose GPU from the Hardware accelerator drop-down menu.

  4. Click SAVE.

Test if GPU is active:
import tensorflow as tf
# display tf version and test if GPU is active
tf.__version__, tf.test.gpu_device_name()

Import the tensorflow library. If '/device:GPU:0' is displayed, the GPU is active. If an empty string '' is displayed, the regular CPU is active.

CNN Architecture

Like a feedforward neural network, a CNN consists of multiple layers. However, the convolutional layer and pooling layer make it unique. Like other neural networks, it also has a ReLU (rectified linear unit) layer and a fully connected layer. The ReLU layer in any neural net acts as an activation function ensuring nonlinearity as the data moves through each layer of the network. Without a nonlinear activation such as ReLU, the stacked layers would collapse into a single linear transformation, and we would lose the expressive power we want the network to maintain. The fully connected layer allows a CNN to perform classification on the data.
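
As a quick illustration (the values are arbitrary), ReLU simply zeroes out negative activations and passes positive ones through unchanged:
import tensorflow as tf
# relu keeps positive values and replaces negative ones with zero
x = tf.constant([-2.0, -0.5, 0.0, 1.5, 3.0])
print (tf.nn.relu(x).numpy())  # [0.  0.  0.  1.5 3. ]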

As noted earlier, the most important building block of a CNN is the convolutional layer. Neurons in the first convolutional layer are not connected to every single pixel in the input image, but only to pixels in their receptive fields, that is, only to pixels close to them. A convolutional layer works by sliding a filter (or convolutional kernel) over an array of image pixels. The filtering process creates a convolved feature map, which is the output of the convolutional layer.

A feature map is created by projecting input features from our data to hidden units to form new features to feed to the next layer. A hidden unit corresponds to the output of a single filter at a single particular x/y offset in the input volume. Simply, a hidden unit is the value at a particular x,y,z coordinate in the output volume.

Once we have a convolved feature map, we move to the pooling layer. The pooling layer subsamples a particular feature map. Subsampling shrinks the size of the input image to reduce computational load, memory usage, and the number of parameters. Reducing the number of parameters that the network needs to process also limits the risk of overfitting. The output of the pooling layer is a pooled feature map.

We can pool feature maps in two ways. Max pooling takes the maximum input of a particular convolved feature map. Average pooling takes the average input of a particular convolved feature map.
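
Here is a small sketch (with made-up values) that contrasts the two pooling styles on a single 4 × 4 feature map using Keras pooling layers:
import tensorflow as tf
# a single 4 x 4 feature map shaped as (batch, height, width, channels)
fm = tf.constant([[1., 3., 2., 0.],
                  [4., 6., 1., 1.],
                  [0., 2., 5., 7.],
                  [1., 1., 3., 2.]])
fm = tf.reshape(fm, [1, 4, 4, 1])
# 2 x 2 pooling with stride 2 halves each spatial dimension
max_pool = tf.keras.layers.MaxPooling2D(2)(fm)
avg_pool = tf.keras.layers.AveragePooling2D(2)(fm)
print (tf.squeeze(max_pool).numpy())  # [[6. 2.] [2. 7.]]
print (tf.squeeze(avg_pool).numpy())  # [[3.5 1.  ] [1.   4.25]]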

The process of creating pooled feature maps results in feature extraction that enables the network to build up a picture of the image data. With a picture of the image data, the network moves into the fully connected layer to perform classification. As we did with feedforward nets, we flatten the data for consumption by the fully connected layer because it can only process linear data.

Conceptually, a CNN is pretty complex as you can tell from our discussion. But implementing a CNN in TensorFlow is pretty straightforward. Each input image is typically represented as a 3D tensor of shape height, width, and channels. When classifying a 3D color image, we feed CNN image data in three channels, namely, red, green, and blue. Color images are typically referred to as RGB images. A batch (e.g., mini-batch) is represented as a 4D tensor of shape batch size, height, width, and channels.
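
For example, a mini-batch of eight 32 × 32 RGB images would be represented like this (a toy illustration; the pixel values are placeholders):
import numpy as np
# an illustrative mini-batch: 8 RGB images of 32 x 32 pixels
batch = np.zeros((8, 32, 32, 3), dtype=np.float32)
print (batch.shape)     # (8, 32, 32, 3) -> (batch size, height, width, channels)
print (batch[0].shape)  # (32, 32, 3)    -> a single 3D image tensor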

Load Sample Images

The scikit-learn load_sample_image method allows us to practice with two color images – china.jpg and flower.jpg. The method loads the numpy array of a single sample image and returns it as a 3D numpy array consisting of height by width by color.

Load the images:
from sklearn.datasets import load_sample_image
china, flower = (load_sample_image('china.jpg'),
                 load_sample_image('flower.jpg'))
china.shape, flower.shape

Both china and flower images are represented as 427 × 640-pixel matrices with three channels to account for RGB color.

Display Images

Listing 7-1 displays the images.
import matplotlib.pyplot as plt
# function to plot RGB images
def plot_color_image(image):
    plt.imshow(image, interpolation="nearest")
    plt.axis("off")
plot_color_image(china)
plt.show()
plot_color_image(flower)
Listing 7-1

Display china and flower images

Scale Images

Scaling images improves training performance. Since each image pixel is represented by a byte from 0 to 255, we divide each image by 255 to scale it.

Listing 7-2 scales images.
import numpy as np
# slice off a few pixels prior to scaling
br = '\n'
print ('pixels as loaded:', br)
print ('china pixels:', end = '  ')
print (np.around(china[0][0]))
print ('flower pixels:', end = ' ')
print (np.around(flower[0][0]), br)
# scale images
china_sc, flower_sc = china / 255., flower / 255.
# slice off some pixels to verify that scaling worked
print ('pixels scaled:', br)
print ('china pixels:', end = '  ')
print (np.around(china_sc[0][0], decimals=3))
print ('flower pixels:', end = ' ')
print (np.around(flower_sc[0][0], decimals=3))
Listing 7-2

Scale images

Scaling worked because pixel intensity is between 0 and 1.

Display Scaled Images

Plot scaled images:
plot_color_image(china_sc)
plt.show()
plot_color_image(flower_sc)

Scaling doesn’t impact the images, which makes sense because scaling modifies pixel intensity proportionally. That is, each pixel number is converted proportionally to a number between 0 and 1.

Get More Images

Let’s get a couple more images. To get images for this book, just follow these simple steps:
  1. Go to the GitHub URL for this book: https://github.com/paperd/tensorflow.

  2. Locate the image you want to download and click it.

  3. Click the Download button.

  4. Right-click anywhere inside the image.

  5. Click Save image as….

  6. Save the image on your computer.

  7. Drag and drop the image to your Google Drive Colab Notebooks folder.

  8. Repeat steps 1–7 as necessary for multiple images.


For this lesson, go to the book URL, click chapter7, click images, click fish.jpg, click the Download button, right-click inside the image, and click Save image as… to save it on your computer. Drag and drop the image to your Google Drive Colab Notebooks folder. Repeat the same process for the happy_moon.jpg image.


Mount Google Drive

Mount Colab to Google Drive:
from google.colab import drive
drive.mount('/content/drive')

Click the URL, choose a Google account, click Allow, copy the authorization code and paste it into Colab in the textbox Enter your authorization code:, and press the Enter key on your keyboard.

Copy Images to Google Drive

Before executing the code in this section, be sure that you have the fish.jpg and happy_moon.jpg images in the Colab Notebooks directory on your Google Drive!

Check your Google Drive account to verify the proper path. We saved the images to the Colab Notebooks directory, which is recommended. If you save them somewhere else, you must change the paths accordingly.

Listing 7-3 loads the images to the Colab environment, scales the images, and displays them.
# be sure to copy images to the directory on Google Drive
from PIL import Image
import numpy as np
# create paths to images
fish_path = 'drive/My Drive/Colab Notebooks/fish.jpg'
moon_path = 'drive/My Drive/Colab Notebooks/happy_moon.jpg'
# create images
fish, moon  = Image.open(fish_path), Image.open(moon_path)
# convert images to numpy arrays and scale
fish_np, moon_np = np.array(fish), np.array(moon)
fish_sc, moon_sc = fish_np / 255., moon_np / 255.
# display images
plot_color_image(fish_sc)
plt.show()
plot_color_image(moon_sc)
Listing 7-3

Load, scale, and display images

Verify that new images were scaled properly:
# slice off some pixels and display
print ('fish pixels:', end = ' ')
print (np.around(fish_sc[0][0], decimals=3), br)
print ('moon pixels:', end = ' ')
print (np.around(moon_sc[0][0], decimals=3))

All is well so far.

Check Image Shapes

For machine learning applications, images must be of the same shape.

Let’s explore image shapes:
print ('original shapes:')
display (china_sc.shape, flower_sc.shape)
print (), print ('new shapes:')
display (fish_sc.shape, moon_sc.shape)

Uh-oh! Shapes are not the same! What do we do?

Resize Images

Let’s resize the fish and moon images to equalize shapes:
fish_rs = np.array(tf.image.resize(
    fish_sc, [427, 640]))
moon_rs = np.array(tf.image.resize(
    moon_sc, [427, 640]))
fish_rs.shape, moon_rs.shape

Now, all four images have size of (427, 640, 3).

Plot the resized images:
plot_color_image(fish_rs)
plt.show()
plot_color_image(moon_rs)

Success! We resized the new images to correspond to the original ones.

Create a Batch of Images

Create a batch with all four images:
new_images = np.array([china_sc, flower_sc,
                       fish_rs, moon_rs])
new_images.shape

Now, we have a batch of four 427 × 640 color images. RGB color is indicated by the 3 dimension.

Create Filters

Let’s create two simple 7 × 7 filters. We want our first filter to have a vertical white line in the middle and our second to have a horizontal white line in the middle. Filters are used to extract features from images during the process of convolution. Typically, filters are referred to as convolutional kernels.

Create the filters:
# assign some variables
batch_size, height, width, channels = new_images.shape
# create 2 filters
ck = np.zeros(shape=(7, 7, channels, 2), dtype=np.float32)
ck.shape

The zeros method returns a given shape and type filled with zeros. Since variable ck is filled with zeros, all of its pixels are black. Remember that pixel image values are integers that range from 0 (black) to 255 (white).

So ck is a 4D tensor that contains two 7 × 7 convolutional kernels with three channels. The filters must have three channels to match the color images in the batch of images we created.

Add a vertical white line and horizontal white line:
ck[:, 3, :, 0] = 1  # add vertical line
ck[3, :, :, 1] = 1  # add horizontal line

The code changes the intensity of select pixels to get a vertical white line and a horizontal white line.

Plot Convolutional Kernels

Listing 7-4 plots the two convolutional kernels we just created.
# function to plot filters
def plot_image(image):
    plt.imshow(image, cmap="gray", interpolation="nearest")
    plt.axis("off")
print ('vertical convolutional kernels:')
plot_image(ck[:, :, 0, 0])
plt.show()
print ('horizontal convolutional kernels:')
plot_image(ck[:, :, 0, 1])
Listing 7-4

Convolutional kernel plots

We see that the vertical and horizontal white lines (or convolutional kernels) are in position. So we have successfully created two simple convolutional kernels.

Apply a 2D Convolutional Layer

Apply a 2D convolutional layer to the batch of images:
# apply a 2D convolutional layer
outputs = tf.nn.conv2d(new_images, ck, strides=1,
 padding='SAME')

The tf.nn.conv2d method computes a 2D convolution given 4D input and convolutional kernel tensors. We set strides equal to 1. A stride is the number of pixels we shift the convolutional kernel across the input matrix at each step. With strides of 1, we move the convolutional kernels one pixel at a time. We set padding to SAME. Padding is a border of extra pixels added around an image before it is convolved, and every added pixel has a value of zero (which is why it is called zero padding). Padding set to SAME means that enough zero padding is added so that, with strides of 1, the output feature maps have the same height and width as the input.
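
To see what padding does to the output shape, we can rerun the convolution with 'VALID' padding (no padding at all) and compare. This sketch reuses new_images and ck from earlier and casts the image batch to float32 so its dtype matches the kernels:
# compare 'SAME' (zero padding) with 'VALID' (no padding)
imgs = tf.cast(new_images, tf.float32)
same = tf.nn.conv2d(imgs, ck, strides=1, padding='SAME')
valid = tf.nn.conv2d(imgs, ck, strides=1, padding='VALID')
print (same.shape)   # (4, 427, 640, 2): height and width preserved
print (valid.shape)  # (4, 421, 634, 2): the 7 x 7 kernel shrinks each dimension by 6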

After applying the convolutional layer, the variable outputs contains the feature maps based on our images. Since each convolutional kernel creates a feature map (and we have two convolutional kernels), each image has two feature maps.

Visualize Feature Maps

Visualize the feature maps we just created as shown in Listing 7-5.
rows = 4  # one row for each image
columns = 2  # two feature maps for each image
cnt = 1
fig = plt.figure(figsize=(8, 8))
for i, img in enumerate(outputs):
  for j in (0, 1):
    fig.add_subplot(rows, columns, cnt)
    plt.imshow(outputs[i, :, :, j], cmap="gray")
    plt.axis('off')
    cnt += 1
plt.show()
Listing 7-5

Feature maps plot

Since we have two convolutional kernels and four images, the convolutional layer produces eight feature maps. Just multiply 2 by 4! So we have two feature maps for each image. Wow! With two simple convolutional kernels, we were able to extract excellent facsimiles of our batch of images by applying a single convolutional layer.

CNN with Trainable Filters

We just manually defined two convolutional kernels. But, in a real CNN, we typically define convolutional kernels as trainable variables so the neural net can learn the convolutional kernels that work best.

Create a simple model that lets the network decide the best convolutional kernels:
conv = tf.keras.layers.Conv2D(filters=32, kernel_size=3,
                              strides=1, padding="SAME",
                              activation='relu')

We create a Conv2D layer with 32 convolutional kernels. Each convolutional kernel is a 3 × 3 tensor indicated by kernel_size. We use strides of 1 both horizontally and vertically. Padding is SAME. Finally, we apply relu activation to the output.

Convolutional layers have quite a few hyperparameters including number of filters (or convolutional kernels), height and width of convolutional kernels, strides, padding type, and activation type. To get the best performance, we can tune the hyperparameters. But tuning is an advanced topic that we believe is not appropriate for an introductory book. Instead, we provide fundamental examples that you can practice to develop practical skills.
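
Even before training, we can call this layer on the batch of images created earlier to confirm that it produces one feature map per filter (the kernels start at random values, so the maps themselves are not meaningful yet). This sketch assumes new_images is still in memory:
# apply the untrained layer; the output has one channel per filter
feature_maps = conv(tf.cast(new_images, tf.float32))
print (feature_maps.shape)  # (4, 427, 640, 32)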

Building a CNN

Although a CNN is a sequential neural net, it does differ from a feedforward sequential neural net in two important ways. First, it has a convolutional base that is not fully connected. Second, it has a pooling layer to reduce the sample size of feature maps created by each convolutional layer. We still use a fully connected layer for classification.

We begin this experiment by loading a dataset of color images. We continue by preparing the data for TensorFlow consumption. We then build and test a CNN model. The dataset we use is cifar10. We previously modeled this dataset with a feedforward model, but our results were horrible. So we want to show you how much better a CNN works with complex color images.

Load Data

The recommended way to load cifar10 is as a TFDS:
# import TFDS library
import tensorflow_datasets as tfds
Load train and test sets:
train, info = tfds.load('cifar10', split="train",
                        with_info=True, shuffle_files=True)
test = tfds.load('cifar10', split="test")

Since we already have info from the train set, we don’t need it again for the test set.

Verify tensors:
train.element_spec, test.element_spec

Train and test shapes are 32 × 32 × 3. So each image is a 32 × 32 three-channel image. The 3 dimension informs the model that images are RGB color.

Display Information About the Dataset

The info object displays information about the data:
info

We see the name, description, homepage, and shapes and datatypes of feature images and labels. We also see that the dataset has 60,000 images with train and test splits of 50,000 and 10,000, respectively.

Extract Class Labels

Extract some useful information from the info object:
br = '\n'
num_classes = info.features['label'].num_classes
class_labels = info.features['label'].names
print ('number of classes:', num_classes, br)
print ('class labels:', class_labels)

Display Samples

The show_examples method shows a few examples:
fig = tfds.show_examples(train, info)

Build a Custom Function to Display Samples

Listing 7-6 is a function that displays samples.
import matplotlib.pyplot as plt, numpy as np
def display_samples(data, num, cmap):
  for example in data.take(num):
    image, label = example['image'], example['label']
    print ('Label:', class_labels[label.numpy()], end=', ')
    print ('Index:', label.numpy())
    plt.imshow(image.numpy()[:, :, 0].astype(np.float32),
               cmap=plt.get_cmap(cmap))
    plt.show()
Listing 7-6

Function to display samples

The function retrieves images and label names. It then displays the image with its label name and index. The index is the class label as a number.

Invoke the function:
# choose colormap by changing 'indx'
cmap = ['coolwarm', 'viridis', 'plasma',
        'seismic', 'twilight', 'Spectral']
indx, samples = 5, 3
display_samples(train, samples, cmap[indx])

Change the color by adjusting indx between 0 and 5. Change the number of samples displayed by adjusting samples. Peruse the following URL to learn more about colormaps: https://matplotlib.org/3.1.0/tutorials/colors/colormaps.html.

Build a Custom Function to Display a Grid of Examples

Begin by taking 30 examples from the train set as shown in Listing 7-7.
num = 30
images, labels = [], []
for example in train.take(num):
  image, label = example['image'], example['label']
  images.append(tf.squeeze(image.numpy()))
  labels.append(label.numpy())
Listing 7-7

Processed examples from the train set

The tf.squeeze call removes any dimensions of size 1 from the image tensors; the 32 × 32 × 3 color images keep their shape, which Matplotlib can plot directly.

Build the function as shown in Listing 7-8.
def display_grid(feature, target, n_rows, n_cols, cl):
  plt.figure(figsize=(n_cols * 1.5, n_rows * 1.5))
  for row in range(n_rows):
    for col in range(n_cols):
      index = n_cols * row + col
      plt.subplot(n_rows, n_cols, index + 1)
      plt.imshow(feature[index], cmap="binary",
                 interpolation='nearest')
      plt.axis('off')
      plt.title(cl[target[index]], fontsize=12)
  plt.subplots_adjust(wspace=0.2, hspace=0.5)
Listing 7-8

Function to display a grid of examples

Invoke the function:
rows, cols = 5, 6
display_grid(images, labels, rows, cols, class_labels)

Voilà!

Pinpoint Metadata

Leverage the info object to pinpoint metadata:
print ('Number of training examples:', end=' ')
print (info.splits['train'].num_examples)
print ('Number of test examples:', end=' ')
print (info.splits['test'].num_examples)

Build the Input Pipeline

Build the input pipeline as shown in Listing 7-9.
BATCH_SIZE = 128
SHUFFLE_SIZE = 5000
train_1 = train.shuffle(SHUFFLE_SIZE).batch(BATCH_SIZE)
train_2 = train_1.map(lambda items: (
    tf.cast(items['image'], tf.float32) / 255., items['label']))
train_cf = train_2.cache().prefetch(1)
test_1 = test.batch(BATCH_SIZE)
test_2 = test_1.map(lambda items: (
    tf.cast(items['image'], tf.float32) / 255., items['label']))
test_cf = test_2.cache().prefetch(1)
Listing 7-9

Build the input pipeline

We build the input pipeline by shuffling train data, batching, scaling, caching, and prefetching. We scale images by mapping with a lambda function. Adding the cache method increases performance on a TFDS because data is read and written only once rather than during each epoch. Adding the prefetch method is a good idea because it adds efficiency to the batching process. That is, while our training algorithm is working on one batch, TensorFlow is working on the dataset in parallel to get the next batch ready. So prefetch can dramatically improve training performance.

Verify that train and test sets are created correctly:
train_cf.element_spec, test_cf.element_spec

Create the Model

Begin with a relatively robust CNN model because it is the only way to get decent performance from complex color images. Don’t be daunted by the number of layers! Remember that a CNN has a convolutional base and fully connected network. So we can think of a CNN in two parts. First, we build the convolutional base that includes one or more convolutional layers and pooling layers. Pooling layers are included to subsample the feature maps outputted from convolutional layers to reduce computational expense. Next, we build a fully connected layer for classification.

Create the model by following these steps:
  1. Import libraries.

  2. Clear previous models.

  3. Create the model.

Import libraries:
# import libraries
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import (Conv2D, MaxPooling2D,
                                     Dense, Flatten, Dropout)
Clear previous models and plant a seed:
# clear previous models and generate a seed
tf.keras.backend.clear_session()
np.random.seed(0)
tf.random.set_seed(0)
Create the model as shown in Listing 7-10.
# build the model
model = Sequential([
  Conv2D(32, (3, 3), activation = 'relu', padding="same",
         input_shape=[32, 32, 3], strides=1),
  MaxPooling2D(2),
  Conv2D(64, (3, 3), activation="relu", padding="same"),
  MaxPooling2D(2),
  Conv2D(64, (3, 3), activation="relu", padding="same"),
  Flatten(),
  Dense(64, activation="relu"),
  Dropout(0.5),
  Dense(10, activation="softmax")
])
Listing 7-10

Build the model

The first layer of the convolutional base is a convolutional layer with 32 convolutional kernels and a kernel size of 3 × 3. We use relu activation, same padding, and strides of 1. We also set the input shape to 32 × 32 × 3 to match the 32 × 32-pixel images. Since images are in color, we include the 3 value at the end. Next, we include a max pooling layer of size 2 (so it divides each spatial dimension by a factor of 2) to subsample the feature maps from the first convolutional layer. We then add two more convolutional layers with 64 convolutional kernels each, the first of them followed by another max pooling layer. It is common practice to double the number of convolutional kernels after each pooling layer.

We continue with the fully connected network, which flattens its inputs because a dense network expects a 1D array of features for each instance. We need to add the fully connected layer to enable classification of our ten labels. We continue with a dense layer of 64 neurons. We add dropout to reduce overfitting. The final dense layer accepts ten inputs to match the number of labels. It uses softmax activation.

Model Summary

Inspect the model:
model.summary()

Parameters are the learnable weights and biases adjusted during training. The convolutional layer is where the CNN begins to learn. But calculating parameters for a CNN is more complex than for a feedforward network.

The first layer is a convolutional layer with 32 filters acting on the data. Each filter is 3 × 3 and spans the three RGB channels of the input, so it holds 3 × 3 × 3 = 27 weights, or 864 weights across the 32 filters. Add one bias per filter (32) to get a total of 896 parameters.

There are no parameters to learn at the pooling layers. So we have 0 parameters.

The second convolutional layer has 64 filters acting on the data. Each filter is 3 × 3 and spans the 32 feature maps from the previous layer, so it holds 3 × 3 × 32 = 288 weights, or 18,432 weights across the 64 filters. Add 64 biases to get a total of 18,496 parameters.

The third convolutional layer has 64 filters acting on the data. Each filter is 3 × 3 and spans the 64 feature maps from the previous layer, so it holds 3 × 3 × 64 = 576 weights, or 36,864 weights across the 64 filters. Add 64 biases to get a total of 36,928 parameters.

The fully connected dense layer is calculated as before. After two rounds of pooling, the 32 × 32 feature maps shrink to 8 × 8, so flattening yields 8 × 8 × 64 = 4,096 inputs. Multiply 4,096 inputs by the 64 neurons at this layer to get 262,144 weights. Add 64 biases to get a total of 262,208 parameters.

The output layer has 650 parameters: the 64 neurons from the previous layer multiplied by the 10 neurons at this layer, plus 10 biases. Whew!
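
As a sanity check, here is the same arithmetic in a few lines (kernel height × kernel width × input channels × filters, plus one bias per filter; the dense layers add one bias per neuron):
# verify the parameter counts from model.summary() by hand
conv1  = 3 * 3 * 3 * 32 + 32     # 896
conv2  = 3 * 3 * 32 * 64 + 64    # 18,496
conv3  = 3 * 3 * 64 * 64 + 64    # 36,928
dense1 = 8 * 8 * 64 * 64 + 64    # 262,208 (8 x 8 x 64 = 4,096 flattened inputs)
out    = 64 * 10 + 10            # 650
print (conv1, conv2, conv3, dense1, out)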

Model Layers

Inspect model layers:
model.layers

Compile the Model

Through experimentation, we found that the Nadam optimizer performed the best:
model.compile(optimizer='nadam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

Train the Model

Train the model for ten epochs:
epochs = 10
history = model.fit(train_cf, epochs=epochs,
                    verbose=1, validation_data=test_cf)

Although our model is not state of the art, we do much better than we did with a feedforward net.

Generalize on Test Data

Generalize:
print('Test accuracy:', end=' ')
test_loss, test_acc = model.evaluate(test_cf, verbose=2)

Visualize Training Performance

Listing 7-11 visualizes training.
plt.plot(history.history['accuracy'], label="accuracy")
plt.plot(history.history['val_accuracy'], label = 'val_accuracy')
plt.xlabel('Epoch')
plt.ylabel('Accuracy')
plt.ylim([0.5, 1])
plt.legend(loc='lower right')
plt.show()
plt.plot(history.history['loss'], label="loss")
plt.plot(history.history['val_loss'], label = 'val_loss')
plt.xlabel('Epoch')
plt.ylabel('Loss')
plt.ylim([0.5, 1.0])
plt.legend(loc='lower right')
plt.show()
Listing 7-11

Visualization of training performance

Predict Labels for Test Images

Predict on test_cf data:
predictions = np.argmax(model.predict(test_cf), axis=-1)

Wrap the predict method with the argmax method to get predicted labels directly rather than generating probability arrays.

Get predictions by class number:
# predictions by class number
predictions
Get predictions by label:
# predictions by class label
np.array(class_labels)[predictions]
Get the first five predictions:
# 5 predictions
pred_5 = predictions[:5]
pred_5
Convert label numbers to label names:
pred_labels = np.array(class_labels)[pred_5]
pred_labels
Get the first five actual labels as shown in Listing 7-12.
# take the first batch of images
ls = []
for _, label in test_cf.take(1):
  ls.append(label.numpy())
# slice first five from batch
actuals = ls[0][0:5]
# convert to labels
actuals = [class_labels[row] for row in actuals]
actuals
Listing 7-12

First five actual labels

Get the labels from the first batch of test data. Since we set batch size at 128, the batch holds 128 labels. Slice the first five labels from the batch and convert them to label names.

Compare pred_labels to actuals to get an idea of prediction performance.
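
One simple way to compare them (reusing the pred_labels and actuals variables we just created) is to print them side by side:
# print predicted and actual labels side by side
for p, a in zip(pred_labels, actuals):
  print ('predicted:', p, '| actual:', a)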

Build a Prediction Plot

Begin by taking 20 samples from the test set as shown in Listing 7-13.
num = 20
images, labels = [], []
for example in test.take(num):
  image, label = example['image'], example['label']
  images.append(tf.squeeze(image.numpy()))
  labels.append(label.numpy())
Listing 7-13

Take samples from the test set

Build a Custom Function

Build a function to display results as shown in Listing 7-14.
def display_test(feature, target, num_images,
                 n_rows, n_cols, cl, p):
  for i in range(num_images):
    plt.subplot(n_rows, 2*n_cols, 2*i+1)
    plt.imshow(feature[i])
    title_obj = plt.title(cl[target[i]] + ' (' +
                          cl[p[i]] + ') ')
    if cl[target[i]] != cl[p[i]]:
      plt.setp(title_obj, color="r")
  plt.tight_layout()
plt.show()
Listing 7-14

Function to display results

Invoke the function:
num_rows, num_cols = 5, 4
num_images = num_rows*num_cols
plt.figure(figsize=(2*2*num_cols, 2*num_rows))
display_test(images, labels, num_images, num_rows,
             num_cols, class_labels, predictions)

Titles in red indicate misclassifications.

Build a CNN with Keras Data

Although loading data as a TFDS is recommended, Keras is very popular in industry. So let’s build a model from keras.datasets.

Load train and test data:
train_k, test_k = tf.keras.datasets.cifar10.load_data()
Verify data shapes:
print ('train data:', br)
print (train_k[0].shape)
print (train_k[1].shape, br)
print ('test data:', br)
print (test_k[0].shape)
print (test_k[1].shape)
Create class label names:
class_labels = ['airplane', 'automobile', 'bird', 'cat', 'deer',
                'dog', 'frog', 'horse', 'ship', 'truck']

Create Variables to Hold Train and Test Samples

Create variables to hold train and test data as shown in Listing 7-15.
# create simple variables from train tuple
train_images = train_k[0]
train_labels = train_k[1]
# display first train label
print ('1st train label:', class_labels[train_labels[0][0]])
# create simple variables from test tuple
test_images = test_k[0]
test_labels = test_k[1]
# display first test label
print ('1st test label: ', class_labels[test_labels[0][0]])
Listing 7-15

Create variables to hold train and test data

Display Sample Images

It’s always a good idea to display some images. In this case, we display 30 images from the training dataset. Visualization allows us to verify that images and labels correspond. That is, a frog image is labeled as a frog and so on.

Listing 7-16 displays sample images from the training set.
n_rows = 5
n_cols = 6
plt.figure(figsize=(n_cols * 1.5, n_rows * 1.5))
for row in range(n_rows):
  for col in range(n_cols):
    index = n_cols * row + col
    plt.subplot(n_rows, n_cols, index + 1)
    plt.imshow(train_images[index], cmap="binary",
               interpolation='nearest')
    plt.axis('off')
    plt.title(class_labels[int(train_labels[index])],
              fontsize=12)
plt.subplots_adjust(wspace=0.2, hspace=0.5)
Listing 7-16

Display sample images

Create the Input Pipeline

Build the input pipeline by scaling images and slicing them into TensorFlow consumable pieces. Continue by shuffling (where appropriate), batching, and prefetching.

Listing 7-17 creates the input pipeline.
# scale images
train_img_sc = train_images / 255.  # divide by 255 to scale
train_lbls = train_labels.astype(np.int32)
test_img_sc = test_images/255.  # divide by 255 to scale
test_lbls = test_labels.astype(np.int32)
# slice data
train_ks = tf.data.Dataset.from_tensor_slices(
    (train_img_sc, train_lbls))
test_ks = tf.data.Dataset.from_tensor_slices(
    (test_img_sc, test_lbls))
# shuffle, batch, and prefetch
BATCH_SIZE = 128
SHUFFLE_BUFFER_SIZE = 5000
train_ds = train_ks.shuffle(
    SHUFFLE_BUFFER_SIZE).batch(BATCH_SIZE).prefetch(1)
test_ds = test_ks.batch(BATCH_SIZE).prefetch(1)
Listing 7-17

Build the input pipeline

Inspect tensors:
train_ds, test_ds

Create the Model

Listing 7-18 creates the model.
# import libraries
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import (Conv2D, MaxPooling2D,
                                     Dense, Flatten, Dropout)
# clear previous models and generate a seed
tf.keras.backend.clear_session()
np.random.seed(0)
tf.random.set_seed(0)
# build the model
model = Sequential([
  Conv2D(32, 3, activation = 'relu', padding="same",
         input_shape=[32, 32, 3]),
  MaxPooling2D(2),
  Conv2D(64, 3, activation="relu", padding="same"),
  MaxPooling2D(2),
  Conv2D(64, 3, activation="relu", padding="same"),
  Flatten(),
  Dense(64, activation="relu"),
  Dropout(0.5),
  Dense(10, activation="softmax")
])
Listing 7-18

Create the model

Compile and Train

Listing 7-19 compiles and trains the model.
# compile
model.compile(optimizer='nadam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])
# train
epochs = 10
history = model.fit(train_ds, epochs=epochs,
                    verbose=1, validation_data=test_ds)
Listing 7-19

Compile and train the model

Predict

Predict from the model using the scaled test images (the model was trained on scaled data, so we feed it test_img_sc rather than the raw test_images):
pred_ks = np.argmax(model.predict(test_img_sc), axis=-1)

Visualize Results

Listing 7-20 visualizes training performance results.
# plot the first X (num_rows * num_cols) test images
# (true and predicted labels)
num_rows = 5
num_cols = 4
num_images = num_rows*num_cols
plt.figure(figsize=(2*2*num_cols, 2*num_rows))
for i in range(num_images):
  ax = plt.subplot(num_rows, 2*num_cols, 2*i+1)
  plt.imshow(test_images[i])
  title = (class_labels[int(test_labels[i])] +
           ' (' + class_labels[pred_ks[i]] + ') ')
  plt.title(title)
  if (class_labels[int(test_labels[i])] !=
      class_labels[pred_ks[i]]):
    ax.set_title(title, style="italic", color="red")
  plt.axis('off')
plt.tight_layout()
Listing 7-20

Visualize training performance results

Epilogue

Many improvements to the fundamental CNN architecture have been developed over the past few years that vastly improve prediction performance. Although we don’t cover these advances in this lesson, we believe that we provided the basic foundation with CNNs to help you comfortably work with these recent advances and even the many advances to come in the future.
