Conclusion to the project

The goal of this project was to build a deep reinforcement learning model that successfully plays the CartPole-v1 game from OpenAI Gym. The use case of this chapter is to build a reinforcement learning model on a simple game environment and then extend the approach to more complex games, such as the Atari titles.

In the first half of the chapter, we built a deep Q-learning (DQN) model to play the Cart-Pole game. During testing, the DQN model scored an average of 277.88 points over 100 games.
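As a rough sketch of the testing procedure (play a number of games greedily and average the total rewards), the loop below may help. The `evaluate` function and `StubEnv` class are illustrative assumptions, not the chapter's exact code; a real run would pass a `gym.make("CartPole-v1")` factory and the trained model's greedy policy instead of the stub.

```python
def evaluate(policy, make_env, episodes=100):
    """Play `episodes` games with the given policy and return the mean score."""
    scores = []
    for _ in range(episodes):
        env = make_env()
        state, done, total = env.reset(), False, 0.0
        while not done:
            action = policy(state)              # greedy action from the trained model
            state, reward, done = env.step(action)
            total += reward
        scores.append(total)
    return sum(scores) / len(scores)


# A stub environment with a Gym-like reset/step interface so the sketch
# is self-contained; each episode lasts 10 steps at +1 reward per step,
# mimicking CartPole's +1-per-timestep reward scheme.
class StubEnv:
    def __init__(self, length=10):
        self.length, self.t = length, 0

    def reset(self):
        self.t = 0
        return 0

    def step(self, action):
        self.t += 1
        done = self.t >= self.length
        return self.t, 1.0, done


avg = evaluate(lambda s: 0, StubEnv, episodes=100)
print(avg)  # 10.0: every stub episode survives 10 steps at +1 per step
```

The reported averages (277.88 for DQN, 365.67 for SARSA) come from exactly this kind of 100-episode average.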

In the second half of the chapter, we built a deep SARSA learning model (using the same epsilon-greedy policy as Q-learning) to play the Cart-Pole game. During testing, the SARSA model scored an average of 365.67 points over 100 games.
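The two algorithms share the epsilon-greedy behavior policy but differ in how they bootstrap the update target. A minimal tabular sketch of that difference follows; the state/action sizes and hyperparameter values are illustrative assumptions, not values from the chapter.

```python
import numpy as np

rng = np.random.default_rng(0)
n_states, n_actions = 5, 2
alpha, gamma = 0.1, 0.99  # learning rate and discount factor (assumed values)


def epsilon_greedy(Q, s, eps):
    # Shared behavior policy: explore with probability eps, else act greedily.
    if rng.random() < eps:
        return int(rng.integers(n_actions))
    return int(np.argmax(Q[s]))


def q_learning_update(Q, s, a, r, s_next):
    # Off-policy: the target bootstraps from the BEST next action,
    # regardless of which action the behavior policy will actually take.
    target = r + gamma * np.max(Q[s_next])
    Q[s, a] += alpha * (target - Q[s, a])


def sarsa_update(Q, s, a, r, s_next, a_next):
    # On-policy: the target bootstraps from the action the epsilon-greedy
    # policy ACTUALLY selected in the next state.
    target = r + gamma * Q[s_next, a_next]
    Q[s, a] += alpha * (target - Q[s, a])
```

In the deep variants, the Q-table is replaced by a neural network, but the same distinction between the `max` target (DQN) and the taken-action target (SARSA) carries over.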

Now, let's follow the same technique we have used in the previous chapters and evaluate the performance of the models from the restaurant chain's point of view.

What are the implications of this score? 

An average score of 277.88 with Q-learning means that we have successfully solved the game of Cart-Pole as defined on the OpenAI site. It also means that our model survives slightly more than half the maximum game length of 500 points.

For SARSA learning, the average score of 365.67 likewise means that we have successfully solved the game of Cart-Pole as defined on the OpenAI site, and that our model survives more than 70% of the maximum game length of 500 points.

Still, this is not a level of performance to be satisfied with: the goal should not just be to solve the problem, but to train a model that consistently scores 500 points in every game. This is why we need to continue fine-tuning the models to extract the maximum performance possible.
