Loading a pre-trained model to speed up the training

In this section, let's focus on loading the pre-trained model in TensorFlow. We will use the VGG-16 model proposed by K. Simonyan and A. Zisserman from the University of Oxford.

VGG-16 is a very deep neural network with many convolution layers followed by max-pooling and fully connected layers. In the ImageNet challenge, the top-5 classification error of the VGG-16 model on the validation set of 1,000 image classes is 8.1% with a single-scale approach.

First, create a file named nets.py in the project directory. The following code defines the graph for the VGG-16 model:

    import tensorflow as tf
    import numpy as np


    def inference(images):
        with tf.name_scope("preprocess"):
            mean = tf.constant([123.68, 116.779, 103.939],
                               dtype=tf.float32, shape=[1, 1, 1, 3],
                               name='img_mean')
            input_images = images - mean

        conv1_1 = _conv2d(input_images, 3, 3, 64, 1, 1, name="conv1_1")
        conv1_2 = _conv2d(conv1_1, 3, 3, 64, 1, 1, name="conv1_2")
        pool1 = _max_pool(conv1_2, 2, 2, 2, 2, name="pool1")

        conv2_1 = _conv2d(pool1, 3, 3, 128, 1, 1, name="conv2_1")
        conv2_2 = _conv2d(conv2_1, 3, 3, 128, 1, 1, name="conv2_2")
        pool2 = _max_pool(conv2_2, 2, 2, 2, 2, name="pool2")

        conv3_1 = _conv2d(pool2, 3, 3, 256, 1, 1, name="conv3_1")
        conv3_2 = _conv2d(conv3_1, 3, 3, 256, 1, 1, name="conv3_2")
        conv3_3 = _conv2d(conv3_2, 3, 3, 256, 1, 1, name="conv3_3")
        pool3 = _max_pool(conv3_3, 2, 2, 2, 2, name="pool3")

        conv4_1 = _conv2d(pool3, 3, 3, 512, 1, 1, name="conv4_1")
        conv4_2 = _conv2d(conv4_1, 3, 3, 512, 1, 1, name="conv4_2")
        conv4_3 = _conv2d(conv4_2, 3, 3, 512, 1, 1, name="conv4_3")
        pool4 = _max_pool(conv4_3, 2, 2, 2, 2, name="pool4")

        conv5_1 = _conv2d(pool4, 3, 3, 512, 1, 1, name="conv5_1")
        conv5_2 = _conv2d(conv5_1, 3, 3, 512, 1, 1, name="conv5_2")
        conv5_3 = _conv2d(conv5_2, 3, 3, 512, 1, 1, name="conv5_3")
        pool5 = _max_pool(conv5_3, 2, 2, 2, 2, name="pool5")

        fc6 = _fully_connected(pool5, 4096, name="fc6")
        fc7 = _fully_connected(fc6, 4096, name="fc7")
        fc8 = _fully_connected(fc7, 1000, name='fc8', relu=False)
        outputs = _softmax(fc8, name="output")
        return outputs

In the preceding code, there are a few things that you should note:

  • _conv2d, _max_pool, _fully_connected, and _softmax are methods that define the convolution, max pooling, fully connected, and softmax layers, respectively. We will implement these methods shortly.
  • In the preprocess name scope, we define a constant tensor, mean, which is subtracted from the input images. This is the per-channel mean of the images that VGG-16 was trained on; subtracting it zero-centers the input.
  • We then define the convolution, max pooling, and fully connected layers with their parameters.
  • In the fc8 layer, we don't apply ReLU activation to the outputs; instead, we send them to a softmax layer to compute the probabilities over the 1,000 classes.

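At this point, the graph is only a definition. The following minimal sketch shows how it might be instantiated; the placeholder shape and name are illustrative assumptions, not taken from the project code:

    import tensorflow as tf
    import nets

    # VGG-16 expects 224 x 224 RGB images; the placeholder name is arbitrary.
    images = tf.placeholder(tf.float32, shape=[None, 224, 224, 3], name="images")
    outputs = nets.inference(images)  # probabilities over the 1,000 ImageNet classes
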
Now, we will implement _conv2d, _max_pool, _fully_connected, and _softmax in the nets.py file.

The following code implements the _conv2d and _max_pool methods:

    def _conv2d(input_data, k_h, k_w, c_o, s_h, s_w, name, relu=True, padding="SAME"):
        c_i = input_data.get_shape()[-1].value
        convolve = lambda i, k: tf.nn.conv2d(i, k, [1, s_h, s_w, 1], padding=padding)
        with tf.variable_scope(name) as scope:
            weights = tf.get_variable(name="kernel", shape=[k_h, k_w, c_i, c_o],
                                      initializer=tf.truncated_normal_initializer(stddev=1e-1, dtype=tf.float32))
            conv = convolve(input_data, weights)
            biases = tf.get_variable(name="bias", shape=[c_o], dtype=tf.float32,
                                     initializer=tf.constant_initializer(value=0.0))
            output = tf.nn.bias_add(conv, biases)
            if relu:
                output = tf.nn.relu(output, name=scope.name)
            return output


    def _max_pool(input_data, k_h, k_w, s_h, s_w, name, padding="SAME"):
        return tf.nn.max_pool(input_data, ksize=[1, k_h, k_w, 1],
                              strides=[1, s_h, s_w, 1], padding=padding,
                              name=name)

Most of the preceding code is self-explanatory if you have read Chapter 4, Cats and Dogs, but there are some lines that deserve a bit of explanation:

  • k_h and k_w are the height and width of the kernel
  • c_o stands for channel outputs, which is the number of feature maps produced by the convolution layer
  • s_h and s_w are the stride parameters for the tf.nn.conv2d and tf.nn.max_pool layers
  • tf.get_variable is used instead of tf.Variable because we will need to call get_variable again when we load the pre-trained weights, as sketched right after this list
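
Here is a quick, standalone sketch of that last point; the scope name demo_scope is made up for illustration:

    with tf.variable_scope("demo_scope"):
        kernel = tf.get_variable("kernel", shape=[3, 3, 3, 64])

    # Re-entering the same scope with reuse=True returns the existing variable
    # instead of creating a new one, which is what load_caffe_weights relies on
    # later to assign the pre-trained values.
    with tf.variable_scope("demo_scope", reuse=True):
        same_kernel = tf.get_variable("kernel")

    assert kernel is same_kernel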

Implementing the fully connected and softmax layers is quite easy:

    def _fully_connected(input_data, num_output, name, relu=True):
        with tf.variable_scope(name) as scope:
            input_shape = input_data.get_shape()
            if input_shape.ndims == 4:
                dim = 1
                for d in input_shape[1:].as_list():
                    dim *= d
                feed_in = tf.reshape(input_data, [-1, dim])
            else:
                feed_in, dim = (input_data, input_shape[-1].value)
            weights = tf.get_variable(name="kernel", shape=[dim, num_output],
                                      initializer=tf.truncated_normal_initializer(stddev=1e-1, dtype=tf.float32))
            biases = tf.get_variable(name="bias", shape=[num_output],
                                     dtype=tf.float32,
                                     initializer=tf.constant_initializer(value=0.0))
            op = tf.nn.relu_layer if relu else tf.nn.xw_plus_b
            output = op(feed_in, weights, biases, name=scope.name)
            return output


    def _softmax(input_data, name):
        return tf.nn.softmax(input_data, name=name)

In the _fully_connected method, we first check the number of dimensions of the input data in order to reshape it into the correct shape. Then, we create the weights and biases variables with the get_variable method. Finally, we check the relu parameter to decide whether we should apply relu to the output, using either tf.nn.relu_layer or tf.nn.xw_plus_b. tf.nn.relu_layer computes relu(matmul(x, weights) + biases), whereas tf.nn.xw_plus_b only computes matmul(x, weights) + biases.
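
To make the difference concrete, here is a toy snippet (not part of nets.py; the values are made up) comparing the two ops:

    x = tf.constant([[1.0, -2.0]])       # shape [1, 2]
    w = tf.constant([[1.0], [1.0]])      # shape [2, 1]
    b = tf.constant([0.5])

    with_relu = tf.nn.relu_layer(x, w, b)     # relu(1 - 2 + 0.5) = 0.0
    without_relu = tf.nn.xw_plus_b(x, w, b)   # 1 - 2 + 0.5 = -0.5

    with tf.Session() as sess:
        print(sess.run([with_relu, without_relu]))  # [[0.0]], [[-0.5]]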

The final method in this section is used to load the pre-trained Caffe weights into the defined variables:

    def load_caffe_weights(path, sess, ignore_missing=False):
        print("Load caffe weights from ", path)
        data_dict = np.load(path).item()
        for op_name in data_dict:
            with tf.variable_scope(op_name, reuse=True):
                for param_name, data in data_dict[op_name].iteritems():
                    try:
                        var = tf.get_variable(param_name)
                        sess.run(var.assign(data))
                    except ValueError as e:
                        if not ignore_missing:
                            print(e)
                            raise e

In order to understand this method, we must know how the data is stored in the pre-trained model, VGG16.npz. Here is a simple piece of code that prints all the variables in the pre-trained model. You can put it at the end of nets.py and run it with python nets.py:

    if __name__ == "__main__":
        path = "data/VGG16.npz"
        data_dict = np.load(path).item()
        for op_name in data_dict:
            print(op_name)
            for param_name, data in data_dict[op_name].iteritems():
                print("\t" + param_name + "\t" + str(data.shape))

Here are a few lines of the results:

conv1_1
    weights (3, 3, 3, 64)
    biases  (64,)
conv1_2
    weights (3, 3, 64, 64)
    biases  (64,)

As you can see, op_name is the name of each layer, and we can access the weights and biases of each layer through data_dict[op_name].
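
For example, assuming the weights file is at data/VGG16.npz as above, the conv1_1 parameters can be read directly from the dictionary:

    data_dict = np.load("data/VGG16.npz").item()
    conv1_1_kernel = data_dict["conv1_1"]["weights"]   # NumPy array of shape (3, 3, 3, 64)
    conv1_1_bias = data_dict["conv1_1"]["biases"]      # NumPy array of shape (64,)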

Let's take a look at load_caffe_weights:

  • We use tf.variable_scope with reuse=True so that we can get the exact weights and biases variables that were defined in the graph. After that, we run the assign method to set the data for each variable.
  • The get_variable method will raise a ValueError if the variable name is not defined. Therefore, we use the ignore_missing parameter to decide whether we should raise the error or silently skip the missing variable.
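
Putting the pieces together, the following sketch shows one way load_caffe_weights might be called after the graph has been built; the placeholder shape and the weights path are assumptions for illustration:

    import tensorflow as tf
    import nets

    images = tf.placeholder(tf.float32, shape=[None, 224, 224, 3], name="images")
    outputs = nets.inference(images)

    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        # Copy the pre-trained Caffe weights into the variables defined above.
        nets.load_caffe_weights("data/VGG16.npz", sess, ignore_missing=False)
        # outputs can now be evaluated or fine-tuned starting from the
        # pre-trained VGG-16 parameters.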