Results

After the first 250 episodes, the total reward per episode approaches 200, and so does the number of steps per episode. This means that the agent has learned to keep the pole balanced on the cart until the environment ends the episode at its maximum of 200 steps.
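If you'd rather check that trend than eyeball the training log, a trailing average over the per-episode rewards makes it easier to see. The following is only a minimal sketch: the helper names, the window of 10 episodes, and the target of 195 are illustrative assumptions rather than anything taken from the training code, and it expects you to supply the list of episode rewards yourself:

```python
from collections import deque


def moving_average(rewards, window=10):
    """Return the trailing mean of the last `window` rewards at each episode."""
    recent = deque(maxlen=window)  # old rewards fall off automatically
    averages = []
    for reward in rewards:
        recent.append(reward)
        averages.append(sum(recent) / len(recent))
    return averages


def first_solved_episode(rewards, target=195.0, window=10):
    """Return the 1-based episode at which the trailing mean over a full
    window first reaches `target`, or None if it never does.

    `target` and `window` are arbitrary illustrative defaults.
    """
    for episode, average in enumerate(moving_average(rewards, window), start=1):
        # Only trust the average once a full window of episodes is available.
        if episode >= window and average >= target:
            return episode
    return None
```

A smoothed curve is less jumpy than raw episode rewards, so a single lucky or unlucky episode won't mislead you about whether the agent has really settled near the 200-step cap.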

It's of course fun to watch our success, so we can use the DQNAgent .test() method to evaluate the agent for some number of episodes. The following code calls this method:

dqn.test(env, nb_episodes=5, visualize=True)
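The same call can also give us numbers to go with the show. The sketch below assumes that .test() returns a Keras-style History object whose .history dictionary holds 'episode_reward' and 'nb_steps' lists; that is how I understand keras-rl to behave, but treat it as an assumption and check your installed version. The summarize_test_run helper is hypothetical and works on a plain dictionary, so it doesn't need the agent to run:

```python
def summarize_test_run(history_dict, max_steps=200):
    """Summarize per-episode test results.

    `history_dict` is assumed to look like the `.history` attribute of the
    object returned by `dqn.test(...)`, with 'episode_reward' and 'nb_steps'
    lists of equal length. `max_steps` is the environment's step cap.
    """
    rewards = history_dict["episode_reward"]
    steps = history_dict["nb_steps"]
    if not rewards:
        raise ValueError("no test episodes were recorded")
    return {
        "episodes": len(rewards),
        "mean_reward": sum(rewards) / len(rewards),
        "min_reward": min(rewards),
        # Episodes that lasted until the environment's step limit.
        "episodes_at_cap": sum(1 for s in steps if s >= max_steps),
    }


# Hypothetical usage, assuming the return value described above:
# history = dqn.test(env, nb_episodes=5, visualize=False)
# print(summarize_test_run(history.history))
```

If every test episode reaches the cap, the agent is balancing the pole for as long as the environment allows.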

In the dqn.test() call, we've set visualize=True so we can watch our agent balance the pole, as shown in the following image:

There we go, that's one balanced pole! Alright, I know, I'll admit that balancing a pole on a cart isn't all that cool, so let's do one more lightweight example. In this example, we will land a lunar lander on the moon, which will hopefully impress you more.
