Car racing

So far, we have seen how to build a dueling DQN. Now, we will see how to make use of our dueling DQN when playing the car racing game.

First, let's import our necessary libraries:

import gym
import time
import logging
import os
import sys
import tensorflow as tf

Initialize all of the necessary variables:

ENV_NAME = 'Seaquest-v0'
TOTAL_FRAMES = 20000000
MAX_TRAINING_STEPS = 20*60*60/3
TESTING_GAMES = 30
MAX_TESTING_STEPS = 5*60*60/3
TRAIN_AFTER_FRAMES = 50000
epoch_size = 50000
MAX_NOOP_START = 30
LOG_DIR = 'logs'
outdir = 'results'
logger = tf.train.SummaryWriter(LOG_DIR)
# Intialize tensorflow session
session = tf.InteractiveSession()

Build the agent:

agent = DQN(state_size=env.observation_space.shape,
action_size=env.action_space.n,
session=session,
summary_writer = logger,
exploration_period = 1000000,
minibatch_size = 32,
discount_factor = 0.99,
experience_replay_buffer = 1000000,
target_qnet_update_frequency = 20000,
initial_exploration_epsilon = 1.0,
final_exploration_epsilon = 0.1,
reward_clipping = 1.0,
)
session.run(tf.initialize_all_variables())
logger.add_graph(session.graph)
saver = tf.train.Saver(tf.all_variables())

Store the recording:

env.monitor.start(outdir+'/'+ENV_NAME,force = True, video_callable=multiples_video_schedule)
num_frames = 0
num_games = 0
current_game_frames = 0
init_no_ops = np.random.randint(MAX_NOOP_START+1)
last_time = time.time()
last_frame_count = 0.0
state = env.reset()

Now, let's start the training:

while num_frames <= TOTAL_FRAMES+1:
if test_mode:
env.render()
num_frames += 1
current_game_frames += 1

Select the action, given the current state:

    action = agent.action(state, training = True)

Perform the action on the environment, receive the reward, and move to the next_state:

    next_state,reward,done,_ = env.step(action)

Store this transitional information in the experience_replay_buffer:

    if current_game_frames >= init_no_ops:
agent.store(state,action,reward,next_state,done)
state = next_state

Train the agent:

    if num_frames>=TRAIN_AFTER_FRAMES:
agent.train()

if done or current_game_frames > MAX_TRAINING_STEPS:
state = env.reset()
current_game_frames = 0
num_games += 1
init_no_ops = np.random.randint(MAX_NOOP_START+1)

Save the network's parameters after every epoch:

    if num_frames % epoch_size == 0 and num_frames > TRAIN_AFTER_FRAMES:
saver.save(session, outdir+"/"+ENV_NAME+"/model_"+str(num_frames/1000)+"k.ckpt")
print "epoch: frames=",num_frames," games=",num_games

We test the performance for every two epochs:

   if num_frames % (2*epoch_size) == 0 and num_frames > TRAIN_AFTER_FRAMES:
total_reward = 0
avg_steps = 0
for i in xrange(TESTING_GAMES):
state = env.reset()
init_no_ops = np.random.randint(MAX_NOOP_START+1)
frm = 0

while frm < MAX_TESTING_STEPS:
frm += 1
env.render()
action = agent.action(state, training = False)
if current_game_frames < init_no_ops:
action = 0
state,reward,done,_ = env.step(action)
total_reward += reward

if done:
break

avg_steps += frm
avg_reward = float(total_reward)/TESTING_GAMES

str_ = session.run( tf.scalar_summary('test reward ('+str(epoch_size/1000)+'k)', avg_reward) )
logger.add_summary(str_, num_frames)
state = env.reset()


env.monitor.close()

We can see how the agent is learning to win the car racing game, as shown in the following screenshot:

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
3.142.36.146