Build the training loop

The next step is to use the model we defined for training and to record the learned model parameters; we will accomplish this in train.py.

Let's start with importing the dependencies:

import os

import tensorflow as tf
import hy_param

# MLP model which we defined in the previous step
import model

# Load MNIST; the download path here is arbitrary
from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets("/tmp/data/", one_hot=True)

Then we define the placeholders that will be fed into our MLP:

# This will feed the raw images
X = model.X
# This will feed the labels associated with the images
Y = model.Y

Let's create the folder in which to save the checkpoints. Checkpoints capture the intermediate values of the weights (W) and biases (b) as learning progresses. We then use the tf.train.Saver() class (find more details here: https://www.tensorflow.org/api_docs/python/tf/train/Saver) to save and restore these checkpoints:

checkpoint_dir = os.path.abspath(os.path.join(hy_param.checkpoint_dir, "checkpoints"))
checkpoint_prefix = os.path.join(checkpoint_dir, "model")
if not os.path.exists(checkpoint_dir):
    os.makedirs(checkpoint_dir)

# We only keep the last 2 checkpoints to manage storage
saver = tf.train.Saver(tf.global_variables(), max_to_keep=2)
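
Restoring works through the same object. The snippet below is a minimal sketch of how you could reload the most recently saved parameters in a later session; it is not part of train.py itself:

# Minimal restore sketch: load the most recent checkpoint, if one exists
with tf.Session() as sess:
    latest = tf.train.latest_checkpoint(checkpoint_dir)
    if latest is not None:
        saver.restore(sess, latest)
        print("Restored model parameters from {}".format(latest))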

In order to begin training, we need to create a new TensorFlow session. In this session, we initialize the graph variables and feed the model operations with the training data:

# Initialize the variables
init = tf.global_variables_initializer()

# Start training
with tf.Session() as sess:

    # Run the initializer
    sess.run(init)

    for step in range(1, hy_param.num_steps+1):
        # Extract a batch of images and their labels
        batch_x, batch_y = mnist.train.next_batch(hy_param.batch_size)
        # Run optimization op (backprop)
        sess.run(model.train_op, feed_dict={X: batch_x, Y: batch_y})
        if step % hy_param.display_step == 0 or step == 1:
            # Calculate batch loss and accuracy
            loss, acc = sess.run([model.loss_op, model.accuracy],
                                 feed_dict={X: batch_x, Y: batch_y})
            print("Step " + str(step) + ", Minibatch Loss= " +
                  "{:.4f}".format(loss) + ", Training Accuracy= " +
                  "{:.3f}".format(acc))
        if step % hy_param.checkpoint_every == 0:
            path = saver.save(sess, checkpoint_prefix, global_step=step)
            print("Saved model checkpoint to {}".format(path))

    print("Optimization Finished!")

We extract batches of 128 training image-label pairs from the MNIST dataset and feed them into the model. Every checkpoint_every steps, we store a checkpoint using the saver operation.
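
All of the hy_param.* values used above live in the hy_param module from the previous step. As a reference, a plausible hy_param.py could look like the following sketch; apart from batch_size = 128 mentioned above, the values are assumptions and should match your own configuration:

# hy_param.py -- hyperparameter sketch; most values below are illustrative
batch_size = 128            # mini-batch size, as used in the text
num_steps = 5000            # total training steps (assumed value)
display_step = 100          # print loss/accuracy every 100 steps (assumed)
checkpoint_every = 100      # save a checkpoint every 100 steps (assumed)
checkpoint_dir = "./runs/"  # root folder for checkpoints (assumed)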

Once we have executed the train.py file, we will see the training progress on the console, as shown in Figure 2.3. The output shows the loss decreasing and the accuracy increasing as the steps progress:

Figure 2.3: Training output showing the minibatch loss and training accuracy at each displayed step.

You can also see in the plot of the loss (Figure 2.4) that it approaches a minimum with each step.

Figure 2.4: Plot of the loss values computed at each step.
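
A plot like Figure 2.4 can be produced by collecting the loss values during training. The following is a minimal sketch, assuming you append each computed loss to a Python list called loss_history inside the training loop:

import matplotlib.pyplot as plt

# loss_history is assumed to be filled inside the training loop, e.g.:
#     loss_history.append(loss)
plt.plot(loss_history)
plt.xlabel("Step")
plt.ylabel("Minibatch loss")
plt.title("Loss computed at each step")
plt.show()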

It is very important to visualize how your model is performing so that you can analyze it and prevent underfitting or overfitting. Overfitting is a very common scenario when you are dealing with deeper models. Let's spend some time understanding these problems in detail, along with a few tricks to overcome them.
