Results

After the first 250 episodes, we will see that the total reward per episode approaches 200, and the number of steps per episode also approaches 200. This means that the agent has learned to keep the pole balanced on the cart until the environment ends the episode at its maximum of 200 steps.
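If we'd rather check this numerically than by eye, one option is to smooth the per-episode rewards with a trailing moving average and look for the point where it settles near 200. The following is only a minimal sketch: it assumes the rewards are already available as a plain Python list, and the helper names, the sample reward values, and the threshold of 195 are illustrative choices rather than anything defined earlier in this chapter:

```python
from collections import deque


def moving_average(values, window=10):
    """Return the trailing moving average of values, one entry per input."""
    if window < 1:
        raise ValueError("window must be at least 1")
    recent = deque(maxlen=window)
    running_sum = 0.0
    averages = []
    for value in values:
        # Drop the oldest value from the sum before the deque evicts it.
        if len(recent) == window:
            running_sum -= recent[0]
        recent.append(value)
        running_sum += value
        averages.append(running_sum / len(recent))
    return averages


def first_solved_episode(rewards, threshold=195.0, window=10):
    """Return the 1-based episode where the trailing average first reaches
    threshold over a full window, or None if it never does."""
    for index, average in enumerate(moving_average(rewards, window)):
        if index + 1 >= window and average >= threshold:
            return index + 1
    return None


# Hypothetical reward curve: ramps up, then sits at the 200-step cap.
example_rewards = [20, 40, 80, 160, 200, 200, 200, 200]
print(moving_average(example_rewards, window=2))
print(first_solved_episode(example_rewards, threshold=195.0, window=2))
```

With keras-rl, the list of rewards would most likely come from the history object returned by training (something like history.history['episode_reward']), but treat that key name as an assumption and inspect the object in your own installed version before relying on it.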

It's of course fun to watch our success, so we can use the DQNAgent's .test() method to evaluate the agent for some number of episodes. The following code calls this method:

dqn.test(env, nb_episodes=5, visualize=True)

Here we've set visualize=True so we can watch our agent balance the pole, as shown in the following image:

There we go, that's one balanced pole! Alright, I know, I'll admit that balancing a pole on a cart isn't all that cool, so let's do one more lightweight example. In this example, we will land a lunar lander on the moon, which will hopefully impress you more.
