Getting ready

Pong is a two-player game where the goal is to bounce the ball past the opponent. The agent can move its paddle up or down (or take the standard NoOp action). In the OpenAI Gym environment, one of the players is a built-in AI opponent that plays the game well. Our goal is to train the second player using policy gradients, so that the agent grows more proficient with each game it plays. Since the code is built to run for only 500 episodes, we should add a provision to save the agent's state at specified checkpoints and, on the next run, resume from the last saved checkpoint. We achieve this by first declaring a saver, then using the TensorFlow saver.save method to write the current network state to a checkpoint, and finally restoring the network from the most recent checkpoint. The following methods of the PolicyNetwork class, defined in the How to do it... section of this recipe, perform this work:

def load(self):
    # Declare a saver covering every variable in the graph
    self.saver = tf.train.Saver(tf.global_variables())
    load_was_success = True  # yes, I'm being optimistic
    try:
        save_dir = '/'.join(self.save_path.split('/')[:-1])
        ckpt = tf.train.get_checkpoint_state(save_dir)
        load_path = ckpt.model_checkpoint_path
        self.saver.restore(self.session, load_path)
    except Exception:
        print("no saved model to load. starting new session")
        load_was_success = False
    else:
        print("loaded model: {}".format(load_path))
        # Recover the episode count from the checkpoint filename
        self.episode_number = int(load_path.split('-')[-1])
    return load_was_success
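
Note that tf.train.Saver appends the global_step value to the checkpoint filename (for example, pong.ckpt-150), which is why the episode number can be recovered by splitting the path on the - character.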

To save the model every 50 episodes, we use the following method, which tags the checkpoint with the current episode number:

def save(self):
    # global_step appends the episode number to the checkpoint filename
    self.saver.save(self.session, self.save_path,
                    global_step=self.episode_number)
    print("SAVED MODEL #{}".format(self.episode_number))