The main training loop

Once we have retrieved the data and built the graph, we can start our main training loop, which runs for 20,000 iterations. In every iteration, a batch of training data is fetched using the CPU device, and the run method of __train_step, the training operation produced by the AdamOptimizer object, is called to perform one forward and one backward pass. Every 100 iterations, we also run a forward pass over the current training and testing batches to collect the training and validation loss, along with other summary data. The add_summary method of the FileWriter object then wraps each of the provided TensorFlow summaries, summary_1 and summary_2, in an event protocol buffer and adds it to the event file:

       # Train Loop
       for i in range(20000):

           batch_train = self.__session.run([iter_train_op])
           batch_x_train, batch_y_train = batch_train[0]

           # Print loss from time to time
           if i % 100 == 0:

               batch_test = self.__session.run([iter_test_op])
               batch_x_test, batch_y_test = batch_test[0]

               loss_train, summary_1 = self.__session.run(
                   [self.__loss, self.__merged_summary_op],
                   feed_dict={self.__x_: batch_x_train,
                              self.__y_: batch_y_train,
                              self.__is_training: True})

               loss_val, summary_2 = self.__session.run(
                   [self.__loss_val, self.__val_summary],
                   feed_dict={self.__x_: batch_x_test,
                              self.__y_: batch_y_test,
                              self.__is_training: False})

               print("Loss Train: {0} Loss Val: {1}".format(loss_train, loss_val))

               # Write to TensorBoard summary
               self.__writer.add_summary(summary_1, i)
               self.__writer.add_summary(summary_2, i)

           # Execute train op
           self.__train_step.run(session=self.__session, feed_dict={
               self.__x_: batch_x_train, self.__y_: batch_y_train, self.__is_training: True})
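
The control flow of this loop, one optimization step per iteration plus a periodic evaluation, does not depend on TensorFlow. The following is a minimal sketch in plain Python, assuming a hypothetical one-parameter linear model fitted with plain gradient descent rather than Adam. The names mse, train, log_every, and history are illustrative and do not appear in the original class:

```python
def mse(w, batch):
    """Mean squared error of the model y = w * x on a batch of (x, y) pairs."""
    return sum((w * x - y) ** 2 for x, y in batch) / len(batch)


def train(train_batch, val_batch, steps=2000, log_every=100, lr=0.01):
    """Hypothetical stand-in for the loop above: one update per iteration,
    with training and validation loss recorded every `log_every` iterations."""
    w = 0.0
    history = []  # (iteration, train_loss, val_loss) tuples
    for i in range(steps):
        # Periodic evaluation: forward pass only, no parameter update
        if i % log_every == 0:
            history.append((i, mse(w, train_batch), mse(w, val_batch)))

        # Training step: gradient of the MSE with respect to w, then update
        grad = sum(2 * (w * x - y) * x for x, y in train_batch) / len(train_batch)
        w -= lr * grad
    return w, history


# Toy data generated from y = 3x (an assumption made for this sketch)
train_batch = [(x, 3.0 * x) for x in (1.0, 2.0, 3.0)]
val_batch = [(x, 3.0 * x) for x in (4.0, 5.0)]

w, history = train(train_batch, val_batch)
print("Learned w: {0:.3f}".format(w))
print("Logged {0} times; first at iteration {1}, last at iteration {2}".format(
    len(history), history[0][0], history[-1][0]))
```

With the 20,000 iterations and the interval of 100 used in the text, the same pattern logs 200 times, at iterations 0, 100, and so on up to 19,900.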

Once the training loop is over, we store the final model in a checkpoint file by calling __saver.save:

       # Save model 
       if not os.path.exists(save_dir): 
           os.makedirs(save_dir) 

       checkpoint_path = os.path.join(save_dir, "model") 
       filename = self.__saver.save(self.__session, checkpoint_path) 
       print("Model saved in file: %s" % filename)
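
On Python 3.2 and later, the directory check before saving can also be written with os.makedirs(..., exist_ok=True), which removes the need for a separate existence test. The following is a minimal sketch, assuming a hypothetical helper named prepare_checkpoint_path and a temporary directory standing in for save_dir. The call to __saver.save itself is left out because it needs a live session:

```python
import os
import tempfile


def prepare_checkpoint_path(save_dir, name="model"):
    """Create save_dir if needed and return the checkpoint path prefix.

    Hypothetical helper for illustration; not part of the original class.
    """
    os.makedirs(save_dir, exist_ok=True)  # no error if the directory exists
    return os.path.join(save_dir, name)


with tempfile.TemporaryDirectory() as tmp:
    save_dir = os.path.join(tmp, "checkpoints")
    checkpoint_path = prepare_checkpoint_path(save_dir)
    # Calling the helper a second time is safe: the directory already exists
    prepare_checkpoint_path(save_dir)
    print(os.path.isdir(save_dir), os.path.basename(checkpoint_path))
```

The returned path is the prefix that would be passed as the second argument to the saver.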
