Training and testing the deep n-step advantage actor-critic agent

Because our agent's implementation is generic (as discussed using the table in step 1 in the previous section), we can use any learning environment that has Gym-compatible interfaces to train/test the agent. You can experiment and train the agent in a variety of environments that we discussed in the initial chapters of this book, and we will also be discussing some more interesting learning environments in the next chapter. Don't forget about our custom CARLA car driving environment! 
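
Because the implementation is generic, the policy and value networks are sized from the environment's observation and action space definitions rather than being hard-coded for one task. The following is a minimal sketch of that idea (the variable names are illustrative, not the exact ones used in the chapter's code):

import gym

env = gym.make("Pendulum-v0")  # any Gym-compatible environment ID works here
obs_shape = env.observation_space.shape  # determines the input layer (vector vs. image)

# Continuous (Box) action spaces expose a shape; discrete ones expose n
if isinstance(env.action_space, gym.spaces.Box):
    action_dim = env.action_space.shape[0]  # e.g. one torque value for Pendulum-v0
else:
    action_dim = env.action_space.n  # e.g. two discrete actions for CartPole-v0

print("Observation shape:", obs_shape, "| Action dimension:", action_dim)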

We will pick a few environments as examples and walk through how you can launch the training and testing process to get you started experimenting on your own. First, update your fork of the book's code repository and cd to the ch8 folder, where the code for this chapter resides. As always, make sure to activate the conda environment we created for this book. After this, you can launch the training process for the n-step advantage actor-critic agent using the a2c_agent.py script, as illustrated here:

(rl_gym_book) praveen@ubuntu:~/HOIAWOG/ch8$ python a2c_agent.py --env Pendulum-v0

You can replace Pendulum-v0 with any Gym-compatible learning environment name that is set up on your machine.

This should launch the agent's training script, which will use the default parameters specified in the ~/HOIAWOG/ch8/parameters.json file (which you can change to experiment with). It will also load the trained agent's brain/model for the specified environment from the ~/HOIAWOG/ch8/trained_models directory, if available, and continue training from there. For high-dimensional state space environments, such as the Atari games, or other environments where the state/observation is an image of the scene or the screen pixels, the deep convolutional neural network we discussed in one of the previous sections will be used. It will make use of the GPU on your machine, if available, to speed up computations (you can disable this by setting use_cuda = False in the parameters.json file). If you have multiple GPUs on your machine and would like to train different agents on different GPUs, you can specify the GPU device ID as a command line argument to the a2c_agent.py script using the --gpu-id flag.
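
As a rough illustration of how these settings come together, here is a minimal sketch of reading parameters.json and picking the compute device, assuming a PyTorch-based agent as in this chapter; only use_cuda and the --gpu-id flag come from the text, and the rest of the schema shown here is an assumption:

import json
import torch

# Load hyper-parameters; the real parameters.json may use different keys and values
with open("parameters.json") as f:
    params = json.load(f)

# Respect use_cuda from the config, but fall back to the CPU if no GPU is present
use_cuda = params.get("use_cuda", True) and torch.cuda.is_available()

gpu_id = 0  # would normally come from the --gpu-id command line argument
device = torch.device("cuda:{}".format(gpu_id) if use_cuda else "cpu")
print("Using device:", device)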

Once the training process starts, you can monitor the agent's progress by launching tensorboard using the following command from the logs directory:

(rl_gym_book) praveen@ubuntu:~/HOIAWOG/ch8/logs$ tensorboard --logdir .
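
TensorBoard only visualizes event files that the training script writes to the logs directory. The snippet below is a minimal sketch of the kind of scalar logging that produces those curves; the writer class, run name, and tag are assumptions and may differ from what a2c_agent.py actually uses:

from tensorboardX import SummaryWriter

writer = SummaryWriter(log_dir="logs/A2C_Pendulum-v0")  # hypothetical run name

# Inside the training loop, per-episode statistics are logged as scalars
for episode, episode_reward in enumerate([-1500.0, -1200.0, -900.0]):  # dummy rewards
    writer.add_scalar("ep_reward", episode_reward, episode)

writer.close()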

Once tensorboard is running, you can visit the web page at http://localhost:6006 to monitor the progress of the agent. Sample screenshots are provided here for your reference; they are from two training runs of the n-step advantage actor-critic agent with different values of n, set using the learning_step_threshold parameter in the parameters.json file:

Actor-critic (using separate actor and critic networks):

1. Pendulum-v0; n-step (learning_step_threshold = 100)
2. Pendulum-v0; n-step (learning_step_threshold = 5)

• Comparing 1 (100-step AC, in green) and 2 (5-step AC, in grey) on Pendulum-v0 over 10 million steps:
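
To recall what varying n (via learning_step_threshold) changes, here is a minimal sketch of the n-step return that feeds the advantage estimate; the function and variable names are illustrative rather than the exact ones in a2c_agent.py:

def n_step_returns(rewards, bootstrap_value, gamma=0.99):
    """Discounted n-step returns for the last n transitions of a rollout.

    rewards: [r_t, ..., r_{t+n-1}], collected before a learning update
    bootstrap_value: the critic's estimate V(s_{t+n}) of the state after the rollout
    """
    returns = []
    g = bootstrap_value
    for r in reversed(rewards):  # work backwards so each return bootstraps from the next
        g = r + gamma * g
        returns.append(g)
    returns.reverse()
    return returns

# The advantage at each step is then A_t = G_t - V(s_t), which scales the policy-gradient
# term, while the critic is regressed towards G_t. A larger n (for example, 100) uses more
# real reward before bootstrapping; a smaller n (for example, 5) bootstraps sooner.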

The training script will also output a summary of the training process to the console. If you want to visualize the environment to see what the agent is doing or how it is learning, you can add the --render flag to the command while launching the training script, as illustrated in the following line:

(rl_gym_book) praveen@ubuntu:~/HOIAWOG/ch8$ python a2c_agent.py --env CartPole-v0 --render

As you can see, we have reached a point where you are just one command away from training, logging, and visualizing the agent's performance! We have made very good progress so far.

You can run several experiments with different sets of parameters for the agent, on the same environment or on different environments. The previous example was chosen to demonstrate the agent's performance in a simpler environment, so that you can easily run full-length experiments and reproduce and compare the results, irrespective of the hardware resources you may have. As part of the book's code repository, trained agent brains/models are provided for some environments so that you can quickly run the script in test mode and see how a trained agent performs at the tasks. They are available in the ch8/trained_models folder in your fork of the book's repository, or at the upstream origin here: https://github.com/PacktPublishing/Hands-On-Intelligent-Agents-with-OpenAI-Gym/tree/master/ch8/trained_models. You will also find other resources, such as illustrations of learning curves in other environments and video clips of agents performing in a variety of environments, in the book's code repository for your reference.
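
If you are curious how such a pre-trained brain could be restored before a test run, the following is a rough sketch assuming PyTorch checkpoints; the file name, extension, and checkpoint layout are hypothetical and may not match the files in trained_models exactly:

import torch

# Hypothetical file name; check ch8/trained_models for the actual naming scheme
checkpoint_path = "trained_models/A2C_Pendulum-v0.ptm"

# map_location lets a model trained on a GPU be loaded on a CPU-only machine
checkpoint = torch.load(checkpoint_path, map_location="cpu")

# The saved state dict(s) would then be loaded into the actor/critic networks, for example:
# actor.load_state_dict(checkpoint["actor_state_dict"])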

 Once you are ready to test the agent, either using your own trained agent's brain model or using one of the pre-trained agent brains, you can use the --test flag to signify that you would like to disable learning and run the agent in testing mode. For example, to test the agent in the LunarLander-v2 environment with rendering of the learning environment turned on, you can use the following command:

(rl_gym_book) praveen@ubuntu:~/HOIAWOG/ch8$ python a2c_agent.py --env LunarLander-v2 --test --render
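
All of the flags we have used so far (--env, --render, --test, and --gpu-id) are plain command line arguments. The following is a minimal argparse sketch of how they could be wired up; the defaults and help strings are illustrative, and the actual script may define additional options:

import argparse

parser = argparse.ArgumentParser(description="Deep n-step advantage actor-critic agent")
parser.add_argument("--env", default="Pendulum-v0", help="Gym-compatible environment ID")
parser.add_argument("--gpu-id", type=int, default=0, help="GPU device ID to use")
parser.add_argument("--render", action="store_true", help="Render the environment")
parser.add_argument("--test", action="store_true", help="Disable learning; run in test mode")
args = parser.parse_args()

print(args.env, args.gpu_id, args.render, args.test)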

The asynchronous agent that we discussed as an extension to our base agent can be used interchangeably with it. Since both agent implementations follow the same structure and configuration, we can easily switch to the asynchronous agent training script by simply using the async_a2c_agent.py script in place of a2c_agent.py. They even support the same command line arguments, which makes our work simpler. When using the async_a2c_agent.py script, you should make sure to set the num_agents parameter in the parameters.json file, based on the number of processes or parallel instances you would like the agent to use for training. As an example, we can train the asynchronous version of our agent in the BipedalWalker-v2 environment using the following command:

(rl_gym_book) praveen@ubuntu:~/HOIAWOG/ch8$ python async_a2c_agent.py --env BipedalWalker-v2
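
Conceptually, the asynchronous script launches num_agents worker processes that each interact with their own copy of the environment while sharing what they learn. Here is a minimal sketch of that pattern using Python's multiprocessing; the worker body is only a stand-in for the actual training loop in async_a2c_agent.py:

import multiprocessing as mp

def train_worker(rank, env_name):
    # Each worker would build its own environment and agent instance here and train,
    # periodically synchronizing with a shared set of global network parameters
    print("Worker {} training on {}".format(rank, env_name))

if __name__ == "__main__":
    num_agents = 4  # normally taken from the num_agents entry in parameters.json
    workers = [mp.Process(target=train_worker, args=(rank, "BipedalWalker-v2"))
               for rank in range(num_agents)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()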

As you may have realized, our agent implementation is capable of learning to act in a variety of different environments, each with its own set of tasks to be completed, as well as its own state/observation and action spaces. It is this versatility that has made deep reinforcement learning-based agents popular and suitable for solving a variety of problems. Now that we are familiar with the training process, we can finally move on to training the agent to drive a car and follow lanes in the CARLA driving simulator.
