Performing the training process

Now we are ready to train the model. Let's create a Python file named train.py in the scripts folder. First, we need to define some parameters for the training routines:

 import tensorflow as tf 
 import os 
 from datetime import datetime 
 from tqdm import tqdm 
 
 import nets, models, datasets 
 
 # Dataset 
 dataset_dir = "data/train_data" 
 batch_size = 64 
 image_size = 224 
 
 # Learning rate 
 initial_learning_rate = 0.001 
 decay_steps = 250 
 decay_rate = 0.9 
 
 # Validation 
 output_steps = 10  # Number of steps to print output 
 eval_steps = 20  # Number of steps to perform evaluations 
 
 # Training 
 max_steps = 3000  # Number of steps to perform training 
 save_steps = 200  # Number of steps to perform saving checkpoints 
 num_tests = 5  # Number of times to test for test accuracy 
 max_checkpoints_to_keep = 3 
 save_dir = "data/checkpoints" 
 train_vars = 'models/fc8-pets/weights:0,models/fc8-pets/biases:0' 
 
 # Export 
 export_dir = "/tmp/export/" 
 export_name = "pet-model" 
 export_version = 2

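Most of these variables are self-explanatory. The one worth a closer look is train_vars: a comma-separated list of the variable names that models.train should update, here only the weights and biases of our new fc8-pets layer. Next, we need to define some operations for training, as follows: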

 # One input pipeline for training and another for testing
 images, labels = datasets.input_pipeline(
     dataset_dir, batch_size, is_training=True)
 test_images, test_labels = datasets.input_pipeline(
     dataset_dir, batch_size, is_training=False)

 with tf.variable_scope("models") as scope:
     logits = nets.inference(images, is_training=True)
     # Share the weights and biases of the training network with the test network
     scope.reuse_variables()
     test_logits = nets.inference(test_images, is_training=False)

 total_loss = models.compute_loss(logits, labels)
 train_accuracy = models.compute_accuracy(logits, labels)
 test_accuracy = models.compute_accuracy(test_logits, test_labels)

 global_step = tf.Variable(0, trainable=False)
 learning_rate = models.get_learning_rate(
     global_step, initial_learning_rate, decay_steps, decay_rate)
 train_op = models.train(total_loss, learning_rate, global_step, train_vars)

 # Keep at most max_checkpoints_to_keep checkpoints in a timestamped folder
 saver = tf.train.Saver(max_to_keep=max_checkpoints_to_keep)
 checkpoints_dir = os.path.join(
     save_dir, datetime.now().strftime("%Y-%m-%d_%H-%M-%S"))
 if not os.path.exists(save_dir):
     os.mkdir(save_dir)
 if not os.path.exists(checkpoints_dir):
     os.mkdir(checkpoints_dir)

These operations are created by calling the methods we defined in datasets.py, nets.py, and models.py. We create one input pipeline for training and another for testing. Then, inside a new variable_scope named models, we build logits and test_logits with the nets.inference method. The call to scope.reuse_variables() between the two is essential: it makes the test network reuse the weights and biases learned during training instead of creating new ones. Next, we compute the loss, the training and test accuracies, the decaying learning rate, and the training operation. Finally, we create a saver and a timestamped directory under save_dir in which checkpoints will be stored.
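The body of models.get_learning_rate lives in models.py and is not repeated here. As a minimal sketch, assuming it wraps a staircase exponential decay (the behavior of tf.train.exponential_decay with staircase=True), the rate for a given global_step can be computed in plain Python. The helper name staircase_learning_rate is hypothetical.

```python
def staircase_learning_rate(step, initial_learning_rate=0.001,
                            decay_steps=250, decay_rate=0.9):
    """Learning rate after `step` steps, ASSUMING a staircase exponential
    decay: the rate is multiplied by decay_rate once every decay_steps
    steps. Hypothetical helper, not the code in models.py."""
    num_decays = step // decay_steps  # integer division gives the "staircase"
    return initial_learning_rate * decay_rate ** num_decays


for step in (0, 249, 250, 500, 2999):
    print(step, staircase_learning_rate(step))
```

Under this assumption, the first 250 steps all run at 0.001, and the last 250 steps of a 3,000-step run use 0.001 × 0.9^11, roughly 0.00031.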

The last part of the training routine is the training loop:

 with tf.Session() as sess:
     sess.run(tf.global_variables_initializer())
     coords = tf.train.Coordinator()
     threads = tf.train.start_queue_runners(sess=sess, coord=coords)

     # Load the pre-trained VGG16 weights; ignore_missing=True skips entries
     # in the file (such as the original fc8) that have no matching variable
     with tf.variable_scope("models"):
         nets.load_caffe_weights(
             "data/VGG16.npz", sess, ignore_missing=True)

     last_saved_test_accuracy = 0
     for i in tqdm(range(max_steps), desc="training"):
         _, loss_value, lr_value = sess.run(
             [train_op, total_loss, learning_rate])

         if (i + 1) % output_steps == 0:
             print("Steps {}: Loss = {:.5f} Learning Rate = {}".format(
                 i + 1, loss_value, lr_value))

         if (i + 1) % eval_steps == 0:
             test_acc, train_acc, loss_value = sess.run(
                 [test_accuracy, train_accuracy, total_loss])
             print("Test accuracy {} Train accuracy {} : Loss = {:.5f}".format(
                 test_acc, train_acc, loss_value))

         if (i + 1) % save_steps == 0 or i == max_steps - 1:
             # Average the test accuracy over num_tests test batches
             # (use _ so the outer loop variable i is not shadowed)
             test_acc = 0
             for _ in range(num_tests):
                 test_acc += sess.run(test_accuracy)
             test_acc /= num_tests
             # Save only if this beats the best checkpoint saved so far
             if test_acc > last_saved_test_accuracy:
                 print("Save steps: Test Accuracy {} is higher than {}".format(
                     test_acc, last_saved_test_accuracy))
                 last_saved_test_accuracy = test_acc
                 saved_file = saver.save(
                     sess, os.path.join(checkpoints_dir, 'model.ckpt'),
                     global_step=global_step)
                 print("Save steps: Save to file %s " % saved_file)
             else:
                 print("Save steps: Test Accuracy {} is not higher than {}".format(
                     test_acc, last_saved_test_accuracy))

     models.export_model(
         checkpoints_dir, export_dir, export_name, export_version)

     coords.request_stop()
     coords.join(threads)

The training loop is straightforward. First, we load the pre-trained VGG16 weights with ignore_missing set to True, because we renamed the fc8 layer earlier, so the fc8 weights in the file no longer match any variable in our graph. Then, we loop for max_steps steps, printing the loss and learning rate every output_steps steps and the test and training accuracy every eval_steps steps. Every save_steps steps, and at the final step, we average the test accuracy over num_tests batches and save a checkpoint only if that average is higher than the best accuracy saved so far. We still need to create models.export_model to export the model for serving after training. However, you may want to check that the training routine works before moving on, so let's comment out the following line:
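The save logic of the loop can be checked without TensorFlow. The sketch below replays the same schedule (every save_steps steps plus the final step) and reports where a checkpoint would be written. The function save_decisions and the accuracy values are made up for illustration; in the real loop the accuracy comes from sess.run(test_accuracy).

```python
def save_decisions(max_steps, save_steps, accuracy_at):
    """Replay the save logic of the training loop without TensorFlow.

    accuracy_at(step) is a hypothetical stand-in for the averaged test
    accuracy the real loop measures at that step. Returns the 1-based
    steps at which a checkpoint would be written.
    """
    saved_steps = []
    last_saved_test_accuracy = 0
    for i in range(max_steps):
        # Same condition as the loop: every save_steps steps, plus the last step
        if (i + 1) % save_steps == 0 or i == max_steps - 1:
            test_acc = accuracy_at(i + 1)
            # Save only when the accuracy beats the best one saved so far
            if test_acc > last_saved_test_accuracy:
                last_saved_test_accuracy = test_acc
                saved_steps.append(i + 1)
    return saved_steps


# Made-up accuracies: the model improves, dips at step 600, then recovers
fake_accuracy = {200: 0.70, 400: 0.80, 600: 0.75, 800: 0.85}
print(save_decisions(800, 200, fake_accuracy.get))  # [200, 400, 800]

# max_steps is not a multiple of save_steps: the final step is still checked
print(save_decisions(500, 200, lambda step: step / 1000.0))  # [200, 400, 500]
```

With the settings used here (max_steps = 3000, save_steps = 200), the save check runs 15 times, at steps 200, 400, ..., 3000.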

     models.export_model(
         checkpoints_dir, export_dir, export_name, export_version)

Then, run the training script with this command:

python scripts/train.py

Here is some of the console output. First, the script loads the pre-trained weights; then it prints the loss and learning rate every 10 steps and the test and training accuracy every 20 steps:

('Load caffe weights from ', 'data/VGG16.npz')
training:   0%|▏                | 9/3000 [00:05<24:59,  1.99it/s]
Steps 10: Loss = 31.10747 Learning Rate = 0.0010000000475
training:   1%|▎                | 19/3000 [00:09<19:19,  2.57it/s]
Steps 20: Loss = 34.43741 Learning Rate = 0.0010000000475
Test accuracy 0.296875 Train accuracy 0.0 : Loss = 31.28600
training:   1%|▍                | 29/3000 [00:14<20:01,  2.47it/s]
Steps 30: Loss = 15.81103 Learning Rate = 0.0010000000475
training:   1%|▌                | 39/3000 [00:18<19:42,  2.50it/s]
Steps 40: Loss = 14.07709 Learning Rate = 0.0010000000475
Test accuracy 0.53125 Train accuracy 0.03125 : Loss = 20.65380
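The learning rate appears as 0.0010000000475 rather than 0.001 because the learning-rate tensor holds a 32-bit float (TensorFlow's default type for Python float constants), and 0.001 has no exact float32 representation. The printed value is that float32 number rounded to 12 significant digits, which is how Python 2 converts floats to strings. The sketch below reproduces it with the standard struct module; as_float32 is a hypothetical helper name.

```python
import struct


def as_float32(value):
    """Round a Python (64-bit) float to the nearest 32-bit float."""
    return struct.unpack('f', struct.pack('f', value))[0]


lr32 = as_float32(0.001)
print(repr(lr32))      # full value actually stored as float32
print('%.12g' % lr32)  # 0.0010000000475, as in the console output
```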

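Now, let's stop the training and uncomment the models.export_model call. We still need to implement models.export_model, which exports the latest saved model to the export_dir folder under the name export_name and the version export_version. Because a checkpoint is saved only when the test accuracy improves, the latest checkpoint is also the one with the highest test accuracy.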
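In TensorFlow, tf.train.latest_checkpoint(checkpoints_dir) returns the path prefix of the most recent checkpoint, which it reads from the checkpoint state file. To illustrate the idea without TensorFlow, here is a sketch that instead picks the prefix with the highest global step from file names of the form model.ckpt-<step>.index, which is how saver.save names its files when global_step is passed. The helper latest_checkpoint_prefix is hypothetical and is not how latest_checkpoint works internally.

```python
import re


def latest_checkpoint_prefix(filenames):
    """Return the checkpoint prefix with the highest global step, such as
    'model.ckpt-1800', from file names like 'model.ckpt-200.index'.
    Returns None if no checkpoint index file is present.
    (Hypothetical helper for illustration.)"""
    pattern = re.compile(r'^(model\.ckpt-(\d+))\.index$')
    best_step, best_prefix = -1, None
    for name in filenames:
        match = pattern.match(name)
        # Compare steps as integers: as strings, '600' would sort after '1800'
        if match and int(match.group(2)) > best_step:
            best_step = int(match.group(2))
            best_prefix = match.group(1)
    return best_prefix


files = ['checkpoint', 'model.ckpt-400.index', 'model.ckpt-400.meta',
         'model.ckpt-1800.index', 'model.ckpt-1800.data-00000-of-00001',
         'model.ckpt-600.index']
print(latest_checkpoint_prefix(files))  # model.ckpt-1800
```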
