Performing the training process

Now we are ready to train the model. Let's create a training script in the scripts folder. First, we need to define some parameters for the training routine:

 import tensorflow as tf 
 import os 
 from datetime import datetime 
 from tqdm import tqdm 
 import nets, models, datasets 
 # Dataset 
 dataset_dir = "data/train_data" 
 batch_size = 64 
 image_size = 224 
 # Learning rate 
 initial_learning_rate = 0.001 
 decay_steps = 250 
 decay_rate = 0.9 
 # Validation 
 output_steps = 10  # Number of steps to print output 
 eval_steps = 20  # Number of steps to perform evaluations 
 # Training 
 max_steps = 3000  # Number of steps to perform training 
 save_steps = 200  # Number of steps to perform saving checkpoints 
 num_tests = 5  # Number of times to test for test accuracy 
 max_checkpoints_to_keep = 3 
 save_dir = "data/checkpoints" 
 train_vars = 'models/fc8-pets/weights:0,models/fc8-pets/biases:0' 
 # Export 
 export_dir = "/tmp/export/" 
 export_name = "pet-model" 
 export_version = 2 
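The learning-rate parameters are passed to models.get_learning_rate, which is defined elsewhere in the project. Assuming it implements exponential decay in the style of TensorFlow's tf.train.exponential_decay, the rate after step steps is initial_learning_rate * decay_rate ** (step / decay_steps). In staircase mode the exponent uses integer division, so the rate drops in discrete jumps. The console output later in this section still shows 0.001 at step 40, which is consistent with staircase mode. The following pure-Python sketch evaluates that schedule. exponential_decay is a hypothetical helper, not part of the project.

```python
def exponential_decay(step, initial_learning_rate=0.001, decay_steps=250,
                      decay_rate=0.9, staircase=True):
    """Learning rate after `step` training steps (hypothetical helper)."""
    if staircase:
        # Integer division: the rate only drops every `decay_steps` steps.
        exponent = step // decay_steps
    else:
        # Smooth decay: the rate shrinks a little on every step.
        exponent = step / decay_steps
    return initial_learning_rate * decay_rate ** exponent


for step in (0, 249, 250, 1000, 3000):
    print(step, exponential_decay(step))
```

With these defaults, the rate falls to 0.9 ** 12 of its initial value by step 3000. That is about 28%, or roughly 0.00028.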

These variables are self-explanatory. Next, we need to define some operations for training, as follows:

 images, labels = datasets.input_pipeline(dataset_dir, batch_size,
                                          is_training=True)
 test_images, test_labels = datasets.input_pipeline(dataset_dir, batch_size,
                                                    is_training=False)

 with tf.variable_scope("models") as scope:
     logits = nets.inference(images, is_training=True)
     # Share the training weights and biases with the test network
     scope.reuse_variables()
     test_logits = nets.inference(test_images, is_training=False)

 total_loss = models.compute_loss(logits, labels)
 train_accuracy = models.compute_accuracy(logits, labels)
 test_accuracy = models.compute_accuracy(test_logits, test_labels)

 global_step = tf.Variable(0, trainable=False)
 learning_rate = models.get_learning_rate(global_step, initial_learning_rate,
                                          decay_steps, decay_rate)
 train_op = models.train(total_loss, learning_rate, global_step, train_vars)

 saver = tf.train.Saver(max_to_keep=max_checkpoints_to_keep)
 # One timestamped sub-folder per training run
 checkpoints_dir = os.path.join(save_dir,
                                datetime.now().strftime("%Y-%m-%d_%H-%M-%S"))
 if not os.path.exists(save_dir):
     os.mkdir(save_dir)
 if not os.path.exists(checkpoints_dir):
     os.mkdir(checkpoints_dir)

These operations are created by calling the methods we defined in the datasets, nets, and models modules. In this code, we create one input pipeline for training and another for testing. After that, we open a variable scope named models and build logits and test_logits with the nets.inference method. The call to scope.reuse_variables() is essential: it makes the test network reuse the weights and biases of the training network instead of creating new ones. Finally, we create a saver and the directories that will hold the checkpoints written every save_steps steps.
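The checkpoint-directory logic is plain Python and can be tried on its own. The sketch below uses a hypothetical helper, make_checkpoint_dir, to build the same timestamped path under save_dir. It swaps the two os.path.exists/os.mkdir checks for a single os.makedirs(..., exist_ok=True) call (Python 3.2+), which also creates save_dir if it is missing. The now argument is an addition so the timestamp can be fixed for testing.

```python
import os
import tempfile
from datetime import datetime


def make_checkpoint_dir(save_dir, now=None):
    """Create and return save_dir/<YYYY-mm-dd_HH-MM-SS> (hypothetical helper)."""
    now = now or datetime.now()
    checkpoints_dir = os.path.join(save_dir, now.strftime("%Y-%m-%d_%H-%M-%S"))
    # Creates save_dir and the run folder in one call; no error if they exist.
    os.makedirs(checkpoints_dir, exist_ok=True)
    return checkpoints_dir


# Usage: each run gets its own folder, so runs never overwrite each other.
with tempfile.TemporaryDirectory() as tmp:
    run_dir = make_checkpoint_dir(os.path.join(tmp, "checkpoints"),
                                  now=datetime(2017, 5, 1, 9, 30, 0))
    print(os.path.basename(run_dir))  # 2017-05-01_09-30-00
```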

The last part of the training routine is the training loop:

 with tf.Session() as sess:
     # Initialize all variables, including global_step and the new fc8-pets layer
     sess.run(tf.global_variables_initializer())
     coords = tf.train.Coordinator()
     threads = tf.train.start_queue_runners(sess=sess, coord=coords)
     with tf.variable_scope("models"):
         nets.load_caffe_weights("data/VGG16.npz", sess, ignore_missing=True)

     last_saved_test_accuracy = 0
     for i in tqdm(range(max_steps), desc="training"):
         _, loss_value, lr_value = sess.run([train_op, total_loss, learning_rate])

         if (i + 1) % output_steps == 0:
             print("Steps {}: Loss = {:.5f} Learning Rate = {}".format(
                 i + 1, loss_value, lr_value))

         if (i + 1) % eval_steps == 0:
             test_acc, train_acc, loss_value = sess.run(
                 [test_accuracy, train_accuracy, total_loss])
             print("Test accuracy {} Train accuracy {} : Loss = {:.5f}".format(
                 test_acc, train_acc, loss_value))

         if (i + 1) % save_steps == 0 or i == max_steps - 1:
             # Average the test accuracy over several test batches
             test_acc = 0
             for _ in range(num_tests):
                 test_acc += sess.run(test_accuracy)
             test_acc /= num_tests
             if test_acc > last_saved_test_accuracy:
                 print("Save steps: Test Accuracy {} is higher than {}".format(
                     test_acc, last_saved_test_accuracy))
                 last_saved_test_accuracy = test_acc
                 saved_file = saver.save(sess,
                                         os.path.join(checkpoints_dir, 'model.ckpt'),
                                         global_step=global_step)
                 print("Save steps: Save to file %s " % saved_file)
             else:
                 print("Save steps: Test Accuracy {} is not higher than {}".format(
                     test_acc, last_saved_test_accuracy))

     models.export_model(checkpoints_dir, export_dir, export_name, export_version)

     coords.request_stop()
     coords.join(threads)
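The save step is the subtle part of the loop. The test accuracy is averaged over num_tests batches, and a checkpoint is written only when that average beats the best value saved so far. The framework-free sketch below isolates that rule. maybe_save is a hypothetical helper: evaluate stands in for sess.run(test_accuracy) and save stands in for saver.save.

```python
def maybe_save(evaluate, num_tests, best_so_far, save):
    """Average num_tests evaluations and save only on improvement.

    evaluate stands in for sess.run(test_accuracy) and save for saver.save;
    both are hypothetical callbacks. Returns the updated best accuracy.
    """
    test_acc = sum(evaluate() for _ in range(num_tests)) / num_tests
    if test_acc > best_so_far:
        save(test_acc)
        return test_acc
    return best_so_far


# Simulated per-batch test accuracies for three save steps (num_tests = 2).
batch_accuracies = iter([0.5, 0.75, 0.75, 0.5, 0.75, 0.875])
saved = []
best = 0
for _ in range(3):
    best = maybe_save(lambda: next(batch_accuracies), 2, best, saved.append)
print(best, saved)  # 0.8125 [0.625, 0.8125]
```

The second save step averages to 0.625, which ties the previous best rather than beating it, so nothing is written.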

The training loop is easy to understand. First, we load the pre-trained VGG16 weights with ignore_missing set to True, because we renamed the fc8 layer earlier and its weights therefore have no match in the file. Then, we loop for max_steps steps. The loop prints the loss every output_steps steps and the test accuracy every eval_steps steps. Every save_steps steps, and at the final step, we average the test accuracy over num_tests batches. We save a checkpoint only if that average is higher than the last saved value. We still need to create models.export_model to export the model for serving after training. However, you may want to check that the training routine works before moving on, so let's comment out the following line:

     models.export_model(checkpoints_dir, export_dir, export_name, export_version)

Then, run the training script with this command:

python scripts/

Here is some of the console output. First, the script loads the pre-trained weights. Then, it prints the loss and learning rate every 10 steps and the accuracies every 20 steps:

('Load caffe weights from ', 'data/VGG16.npz')
training:   0%|▏                | 9/3000 [00:05<24:59,  1.99it/s]
Steps 10: Loss = 31.10747 Learning Rate = 0.0010000000475
training:   1%|▎                | 19/3000 [00:09<19:19,  2.57it/s]
Steps 20: Loss = 34.43741 Learning Rate = 0.0010000000475
Test accuracy 0.296875 Train accuracy 0.0 : Loss = 31.28600
training:   1%|▍                | 29/3000 [00:14<20:01,  2.47it/s]
Steps 30: Loss = 15.81103 Learning Rate = 0.0010000000475
training:   1%|▌                | 39/3000 [00:18<19:42,  2.50it/s]
Steps 40: Loss = 14.07709 Learning Rate = 0.0010000000475
Test accuracy 0.53125 Train accuracy 0.03125 : Loss = 20.65380  
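The learning rate appears as 0.0010000000475 rather than 0.001. This is most likely float rounding, not a change in the rate. Assuming the learning-rate tensor is float32 (TensorFlow's default dtype for Python floats), it holds the float32 value closest to 0.001, which is slightly larger. The tuple-style print on the first line suggests Python 2, which shows floats with 12 significant digits. The standard-library sketch below reproduces the printed value. as_float32 is a hypothetical helper that rounds through struct.

```python
import struct


def as_float32(x):
    """Round a Python float (64-bit) to the nearest 32-bit float."""
    return struct.unpack("f", struct.pack("f", x))[0]


lr32 = as_float32(0.001)
print(lr32 == 0.001)     # False: 0.001 has no exact binary representation
print("%.12g" % lr32)    # 0.0010000000475
```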

Now, let's stop the training and uncomment the export_model call. The models.export_model method should export the latest saved model, which is also the one with the highest test accuracy. It writes the model to the export_dir folder with the name export_name and the version export_version.
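models.export_model still has to be written, and it needs two lookups: which checkpoint to load and where to write the export. Since checkpoints are written only on improvement, picking the highest global step is enough. The sketch below makes two assumptions. First, it assumes TensorFlow's default V2 checkpoint naming (model.ckpt-<global_step>.index plus data and meta files). Second, it assumes a TensorFlow Serving-style layout with one numbered subdirectory per model version. latest_checkpoint_prefix and export_path are hypothetical helpers. In real TensorFlow code, tf.train.latest_checkpoint(checkpoints_dir) performs the first lookup by reading the checkpoint state file.

```python
import os
import re
import tempfile


def latest_checkpoint_prefix(checkpoints_dir):
    """Prefix of the checkpoint with the highest global step, or None."""
    best_step, best_prefix = -1, None
    for name in os.listdir(checkpoints_dir):
        # V2 checkpoints are written as model.ckpt-<step>.index, .data-*, .meta
        match = re.fullmatch(r"(model\.ckpt-(\d+))\.index", name)
        # Compare steps as integers: as strings, "200" would beat "1000".
        if match and int(match.group(2)) > best_step:
            best_step = int(match.group(2))
            best_prefix = os.path.join(checkpoints_dir, match.group(1))
    return best_prefix


def export_path(export_dir, export_name, export_version):
    """Serving-style layout (assumed): <export_dir>/<model name>/<version>."""
    return os.path.join(export_dir, export_name, str(export_version))


with tempfile.TemporaryDirectory() as tmp:
    for name in ("checkpoint", "model.ckpt-200.index", "model.ckpt-1000.index"):
        open(os.path.join(tmp, name), "w").close()
    print(os.path.basename(latest_checkpoint_prefix(tmp)))  # model.ckpt-1000
print(export_path("/tmp/export/", "pet-model", 2))  # /tmp/export/pet-model/2 on POSIX
```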

