Constructing the convolutional network

We will skip explanations for the two utility functions, reformat and accuracy, as we've already encountered these in Chapter 2, Your First Classifier. Instead, we will jump directly to the neural network configuration. For comparison, the following figure shows our model from Chapter 2, Your First Classifier, and the next figure shows our new model. We'll run the new model on the same notMNIST dataset to see the accuracy boost that we will get (hint: good news!):

The following figure is our new model:

First, we will encounter a helper function, as follows:

    def fc_first_layer_dimen(image_size, layers): 
        output = image_size 
        for x in range(layers): 
            output = math.ceil(output / 2.0) 
        return int(output) 

Then, we will call it later, as follows:

    fc_first_layer_dimen(image_size, conv_layers) 

The fc_first_layer_dimen function calculates the dimensions of the first fully connected layer. Recall how CNNs typically use a series of layers with a smaller and smaller window, layer after layer. Here, we've decided to halve the dimensions with each convolutional layer we use. This also shows why input images with sides highly divisible by powers of two keep things nice and clean.
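As a quick standalone sanity check (plain Python, not part of the original notebook), halving 28 three times with rounding up gives the 4x4 spatial size we will see later in the shape printouts:

```python
import math

def fc_first_layer_dimen(image_size, layers):
    # Halve the spatial size once per convolutional layer, rounding up,
    # mirroring a 2x2 max-pool with stride 2 and 'SAME' padding.
    output = image_size
    for _ in range(layers):
        output = math.ceil(output / 2.0)
    return int(output)

print(fc_first_layer_dimen(28, 3))  # 28 -> 14 -> 7 -> 4
```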

Let's now parse the actual network. It is generated by the nn_model method, which is called when training the model and again when testing against the validation and test sets.

Recall how CNNs are usually composed of the following layers:

  • Convolutional layers
  • Rectified linear unit layers
  • Pooling layers
  • Fully connected layers

The convolutional layers are usually paired with RELU layers and repeated. That is what we've done—we've got three nearly identical CONV-RELU layers stacked on top of each other.

Each of the paired layers appears as follows:

    with tf.name_scope('Layer_1') as scope: 
        conv = tf.nn.conv2d(data, weights['conv1'], strides=[1, 1, 1, 1], 
                            padding='SAME', name='conv1') 
        bias_add = tf.nn.bias_add(conv, biases['conv1'], name='bias_add_1') 
        relu = tf.nn.relu(bias_add, name='relu_1') 
        max_pool = tf.nn.max_pool(relu, ksize=[1, 2, 2, 1], 
                                  strides=[1, 2, 2, 1], padding='SAME', 
                                  name=scope) 

The major difference across the three nearly identical layers (Layer_1, Layer_2, and Layer_3) is how the output of one is fed to the next in a series. So, the first layer begins by taking in data (the image data) but the second layer begins by taking in the pooling layer output from the first layer, as follows:

    conv = tf.nn.conv2d(max_pool, weights['conv2'], strides=[1, 1, 1, 1], 
                        padding='SAME', name='conv2') 

Similarly, the third layer begins by taking in the pooling layer output from the second layer, as follows:

    conv = tf.nn.conv2d(max_pool, weights['conv3'], strides=[1, 1, 1, 1], 
                        padding='SAME', name='conv3') 

There is another major difference across the three CONV-RELU layers: the layers get squeezed. It might help to peek at the conv variable after each layer is declared, using a couple of print statements like this:

    print "Layer 1 CONV", conv.get_shape() 
    print "Layer 2 CONV", conv.get_shape() 
    print "Layer 3 CONV", conv.get_shape() 

This will reveal the following structures:

Layer 1 CONV (32, 28, 28, 4) 
Layer 2 CONV (32, 14, 14, 4) 
Layer 3 CONV (32, 7, 7, 4) 
Layer 1 CONV (10000, 28, 28, 4) 
Layer 2 CONV (10000, 14, 14, 4) 
Layer 3 CONV (10000, 7, 7, 4) 
Layer 1 CONV (10000, 28, 28, 4) 
Layer 2 CONV (10000, 14, 14, 4) 
Layer 3 CONV (10000, 7, 7, 4) 

We ran this with the notMNIST dataset, so an original input size of 28x28 is no surprise. More interesting are the sizes of the successive layers—14x14 and then 7x7. Notice how the feature maps fed to successive convolutional layers are squeezed.

Let's make things more interesting and examine the entire stack. Add the following print statements to peek at the CONV, RELU, and POOL layers:

    print "Layer 1 CONV", conv.get_shape() 
    print "Layer 1 RELU", relu.get_shape() 
    print "Layer 1 POOL", max_pool.get_shape() 

Add similar statements after the other two CONV-RELU-POOL stacks and you'll find the following output:

Layer 1 CONV (32, 28, 28, 4) 
Layer 1 RELU (32, 28, 28, 4) 
Layer 1 POOL (32, 14, 14, 4) 
Layer 2 CONV (32, 14, 14, 4) 
Layer 2 RELU (32, 14, 14, 4) 
Layer 2 POOL (32, 7, 7, 4) 
Layer 3 CONV (32, 7, 7, 4) 
Layer 3 RELU (32, 7, 7, 4) 
Layer 3 POOL (32, 4, 4, 4) 
... 

We will ignore the outputs from the validation and test instances (those are the same, except with a first dimension of 10000 instead of 32, as we're processing the full validation and test sets rather than a minibatch).

We will see from the outputs how the dimension is squeezed at the POOL layer (28 to 14) and how that squeeze then carries to the next CONV layer. At the third and final POOL layer, we will end up with a 4x4 size.
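The shape progression above can be reproduced with a short standalone sketch (plain Python, not from the original notebook): each 'SAME' convolution with stride 1 keeps the spatial size, and each 2x2 max-pool with stride 2 halves it, rounding up.

```python
import math

size = 28  # notMNIST input side length
sizes = []
for layer in range(1, 4):
    pooled = int(math.ceil(size / 2.0))  # 2x2 max-pool, stride 2
    sizes.append((layer, size, pooled))
    print("Layer %d CONV %dx%d -> POOL %dx%d" % (layer, size, size, pooled, pooled))
    size = pooled
```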

There is one more feature in the final CONV stack—a dropout layer that we use only when training, as follows:

    max_pool = tf.nn.dropout(max_pool, dropout_prob, seed=SEED, 
                             name='dropout') 

This layer utilizes the dropout_prob = 0.8 configuration we set earlier. During training, it randomly drops neurons in the layer to prevent overfitting: because any node may be dropped, nodes cannot co-adapt with their neighbors and can never rely on a particular node being present.
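A minimal NumPy sketch of what this does, assuming the inverted-dropout scheme that tf.nn.dropout implements: each unit survives with probability keep_prob and is scaled by 1/keep_prob so the expected activation is unchanged (the seed here is illustrative, not the SEED from the text):

```python
import numpy as np

np.random.seed(0)  # illustrative seed only
keep_prob = 0.8    # matches dropout_prob = 0.8

activations = np.ones((1, 8))
# Zero out units with probability (1 - keep_prob); scale survivors
# by 1/keep_prob so the expected sum of activations is unchanged.
mask = np.random.uniform(size=activations.shape) < keep_prob
dropped = np.where(mask, activations / keep_prob, 0.0)
print(dropped)
```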

Let's proceed through our network. We'll find a fully connected layer followed by a RELU:

    with tf.name_scope('FC_Layer_1') as scope: 
        matmul = tf.matmul(reshape, weights['fc1'], name='fc1_matmul') 
        bias_add = tf.nn.bias_add(matmul, biases['fc1'], name='fc1_bias_add') 
        relu = tf.nn.relu(bias_add, name=scope) 

Finally, we will end with a fully connected layer, as follows:

    with tf.name_scope('FC_Layer_2') as scope: 
        matmul = tf.matmul(relu, weights['fc2'], name='fc2_matmul') 
        layer_fc2 = tf.nn.bias_add(matmul, biases['fc2'], name=scope) 

This ending is typical of convolutional networks: a fully connected layer followed by a RELU, and finally a fully connected layer that holds the scores for each class.
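A shape-only NumPy sketch of this tail may help. The sizes mirror the printed output (a minibatch of 32, a final 4x4x4 POOL output flattened into 64 features, and 10 notMNIST classes); the hidden width of 64 is a placeholder assumption, not a value from the original configuration:

```python
import numpy as np

batch, num_hidden, num_labels = 32, 64, 10
rng = np.random.RandomState(0)

pool_out = rng.randn(batch, 4, 4, 4)          # final POOL output
reshape = pool_out.reshape(batch, -1)         # flatten to (32, 64)
fc1 = np.maximum(reshape.dot(rng.randn(64, num_hidden)), 0)  # FC + RELU
scores = fc1.dot(rng.randn(num_hidden, num_labels))          # class scores
print(scores.shape)
```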

We skipped some details along the way. Most of our layers were initialized with three other values: weights, biases, and strides. The weights and biases are themselves initialized with other variables. I didn't say this would be easy.

The most important variable here is patch_size, which denotes the size of the filter we slide across the image. Recall that we set this to 5 early on, so we will use 5x5 patches. We will also get reintroduced to the stddev and depth_inc configurations that we set up earlier.
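As a hedged sketch of what one filter bank looks like: tf.nn.conv2d expects weights of shape [patch_size, patch_size, in_channels, out_channels]. The grayscale channel count (1) and the depth of 4 are taken from the shapes printed earlier; the exact initialization layout here is an assumption, not the book's code:

```python
import numpy as np

patch_size, num_channels, depth, stddev = 5, 1, 4, 0.1
rng = np.random.RandomState(0)

# 5x5 patches sliding over a 1-channel image, producing 4 feature maps;
# weights drawn from a normal distribution with the configured stddev.
conv1_weights = rng.normal(0.0, stddev,
                           (patch_size, patch_size, num_channels, depth))
conv1_biases = np.zeros(depth)
print(conv1_weights.shape, conv1_biases.shape)
```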
