Conclusion to the project

The goal of this project was to build a deep reinforcement learning model that successfully plays the CartPole-v1 game from OpenAI Gym. The use case of this chapter is to build a reinforcement learning model on a simple game environment and then extend the approach to more complex games, such as the Atari titles.

In the first half of the chapter, we built a deep Q-learning (DQN) model to play the Cart-Pole game. During testing, the DQN model scored an average of 277.88 points over 100 games.
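As a rough sketch of the testing procedure (play a number of games greedily and average the total rewards), the loop below may help. The `evaluate` function and `StubEnv` class are illustrative assumptions, not the chapter's exact code; a real run would pass a `gym.make("CartPole-v1")` factory and the trained model's greedy policy instead of the stub.

```python
def evaluate(policy, make_env, episodes=100):
    """Play `episodes` games with the given policy and return the mean score."""
    scores = []
    for _ in range(episodes):
        env = make_env()
        state, done, total = env.reset(), False, 0.0
        while not done:
            action = policy(state)              # greedy action from the trained model
            state, reward, done = env.step(action)
            total += reward
        scores.append(total)
    return sum(scores) / len(scores)


# A stub environment with a Gym-like reset/step interface so the sketch
# is self-contained; each episode lasts 10 steps at +1 reward per step,
# mimicking CartPole's +1-per-timestep reward scheme.
class StubEnv:
    def __init__(self, length=10):
        self.length, self.t = length, 0

    def reset(self):
        self.t = 0
        return 0

    def step(self, action):
        self.t += 1
        done = self.t >= self.length
        return self.t, 1.0, done


avg = evaluate(lambda s: 0, StubEnv, episodes=100)
print(avg)  # 10.0: every stub episode survives 10 steps at +1 per step
```

The reported averages (277.88 for DQN, 365.67 for SARSA) come from exactly this kind of 100-episode average.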

In the second half of the chapter, we built a deep SARSA learning model (using the same epsilon-greedy policy as Q-learning) to play the Cart-Pole game. During testing, the SARSA model scored an average of 365.67 points over 100 games.
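The two algorithms share the epsilon-greedy behavior policy but differ in how they bootstrap the update target. A minimal tabular sketch of that difference follows; the state/action sizes and hyperparameter values are illustrative assumptions, not values from the chapter.

```python
import numpy as np

rng = np.random.default_rng(0)
n_states, n_actions = 5, 2
alpha, gamma = 0.1, 0.99  # learning rate and discount factor (assumed values)


def epsilon_greedy(Q, s, eps):
    # Shared behavior policy: explore with probability eps, else act greedily.
    if rng.random() < eps:
        return int(rng.integers(n_actions))
    return int(np.argmax(Q[s]))


def q_learning_update(Q, s, a, r, s_next):
    # Off-policy: the target bootstraps from the BEST next action,
    # regardless of which action the behavior policy will actually take.
    target = r + gamma * np.max(Q[s_next])
    Q[s, a] += alpha * (target - Q[s, a])


def sarsa_update(Q, s, a, r, s_next, a_next):
    # On-policy: the target bootstraps from the action the epsilon-greedy
    # policy ACTUALLY selected in the next state.
    target = r + gamma * Q[s_next, a_next]
    Q[s, a] += alpha * (target - Q[s, a])
```

In the deep variants, the Q-table is replaced by a neural network, but the same distinction between the `max` target (DQN) and the taken-action target (SARSA) carries over.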

Now, let's follow the same technique we have used in the previous chapters and evaluate the performance of the models from the restaurant chain's point of view.

What are the implications of this score? 

An average score of 277.88 with Q-learning means that we have successfully solved the game of Cart-Pole as defined on the OpenAI site. It also means that our model survives slightly more than half the maximum game length of 500 points.

For SARSA learning, the average score of 365.67 likewise means that we have successfully solved the game of Cart-Pole as defined on the OpenAI site, and that our model survives more than 70% of the maximum game length of 500 points.

Still, this is not a level of performance to be satisfied with: the goal should not just be to solve the problem, but to train a model that consistently scores 500 points in every game. This is why we need to continue fine-tuning the models to extract the maximum performance possible.
