Once we have retrieved the data and built the graph, we can start the main training loop, which runs for 20,000 iterations. In every iteration, a batch of training data is fetched using the CPU device, and the run method of the __train_step operation (the training op created with the AdamOptimizer) is called to perform one forward and one backward pass. Every 100 iterations, we also run a forward pass over the current training and testing batches to collect the training and validation loss, along with other summary data. The add_summary method of the FileWriter object then wraps the provided TensorFlow summaries, summary_1 and summary_2, in an event protocol buffer and adds them to the event file:
# Train Loop
for i in range(20000):
    batch_train = self.__session.run([iter_train_op])
    batch_x_train, batch_y_train = batch_train[0]

    # Print loss from time to time
    if i % 100 == 0:
        batch_test = self.__session.run([iter_test_op])
        batch_x_test, batch_y_test = batch_test[0]

        loss_train, summary_1 = self.__session.run(
            [self.__loss, self.__merged_summary_op],
            feed_dict={self.__x_: batch_x_train,
                       self.__y_: batch_y_train,
                       self.__is_training: True})

        loss_val, summary_2 = self.__session.run(
            [self.__loss_val, self.__val_summary],
            feed_dict={self.__x_: batch_x_test,
                       self.__y_: batch_y_test,
                       self.__is_training: False})

        print("Loss Train: {0} Loss Val: {1}".format(loss_train, loss_val))

        # Write to tensorboard summary
        self.__writer.add_summary(summary_1, i)
        self.__writer.add_summary(summary_2, i)

    # Execute train op
    self.__train_step.run(session=self.__session, feed_dict={
        self.__x_: batch_x_train,
        self.__y_: batch_y_train,
        self.__is_training: True})
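To see the control flow of this loop without any TensorFlow machinery, it can help to look at a stripped-down version. The following is only a minimal sketch in plain Python under stated assumptions: the one-parameter model, the next_batch and mse helpers, and the history list (standing in for the event file) are all hypothetical and are not part of the original code. It keeps the same shape as the loop above: fetch a batch, evaluate every log_every steps with forward passes only, then perform one update step.

```python
import random


def train(num_steps=2000, log_every=100, lr=0.05, seed=0):
    """Toy loop: fit y = w * x with SGD, logging every `log_every` steps."""
    rng = random.Random(seed)
    true_w = 3.0   # hypothetical target the model should recover
    w = 0.0        # single trainable parameter
    history = []   # (step, train_loss, val_loss); stands in for the event file

    def next_batch(size=8):
        # Stand-in for running an iterator op: returns (inputs, labels).
        xs = [rng.uniform(-1.0, 1.0) for _ in range(size)]
        ys = [true_w * x for x in xs]
        return xs, ys

    def mse(weight, xs, ys):
        # Forward pass only: mean squared error for the given parameter.
        return sum((weight * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

    for step in range(num_steps):
        xs_train, ys_train = next_batch()

        if step % log_every == 0:
            # Evaluation: forward passes only, the parameter is not updated.
            xs_val, ys_val = next_batch()
            history.append((step,
                            mse(w, xs_train, ys_train),
                            mse(w, xs_val, ys_val)))

        # Training step: forward pass, gradient (backward pass), update.
        grad = sum(2 * (w * x - y) * x
                   for x, y in zip(xs_train, ys_train)) / len(xs_train)
        w -= lr * grad

    return w, history


if __name__ == "__main__":
    final_w, history = train()
    for step, loss_train, loss_val in history[:3]:
        print("Step {0} Loss Train: {1} Loss Val: {2}".format(
            step, loss_train, loss_val))
```

As in the TensorFlow loop, each logged entry is tagged with the step index, which is what lets a tool such as TensorBoard plot the training and validation curves on a common axis.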
Once the training loop is over, we store the final model in a checkpoint file by calling __saver.save:
# Save model
if not os.path.exists(save_dir):
    os.makedirs(save_dir)
checkpoint_path = os.path.join(save_dir, "model")
filename = self.__saver.save(self.__session, checkpoint_path)
print("Model saved in file: %s" % filename)
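The directory handling in this snippet can be written a little more compactly in Python 3. The sketch below is illustrative only: save_model is a hypothetical stand-in that writes a small JSON file instead of calling a TensorFlow saver, and the parameter dictionary is made up. It shows os.makedirs with exist_ok=True, which creates the directory if needed and does nothing if it already exists, so the separate os.path.exists check is not required.

```python
import json
import os
import tempfile


def save_model(params, save_dir, name="model"):
    """Hypothetical stand-in for a saver: writes params as JSON, returns the path."""
    # exist_ok=True turns the call into a no-op when the directory already
    # exists, so no separate os.path.exists check is needed (Python 3.2+).
    os.makedirs(save_dir, exist_ok=True)
    checkpoint_path = os.path.join(save_dir, name)
    with open(checkpoint_path, "w") as f:
        json.dump(params, f)
    return checkpoint_path


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        target = os.path.join(tmp, "checkpoints", "run_1")
        filename = save_model({"w": 3.0}, target)
        print("Model saved in file: %s" % filename)
```

One difference is worth keeping in mind: assuming __saver is a tf.train.Saver, its save method writes several files that share the given path as a prefix and returns that prefix, whereas this stand-in writes a single file.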