How to do it...

Let's start with the recipe:

  1. The core of OpenAI Gym is its unified environment interface. The agent interacts with an environment through three basic methods: reset, step, and render. The reset method resets the environment and returns the initial observation. The step method advances the environment by one timestep and returns the observation, reward, done flag, and info dictionary. The render method renders one frame of the environment, for example by popping up a window. A minimal sketch of this interface follows.
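Before walking through the Breakout example step by step, here is a minimal, self-contained sketch of that interface; the environment name CartPole-v0 is just an illustrative choice, and any registered environment works the same way:
import gym
env = gym.make('CartPole-v0')                # illustrative environment choice
obs = env.reset()                            # reset: returns the initial observation
action = env.action_space.sample()           # pick a random valid action
obs, reward, done, info = env.step(action)   # step: advance by one timestep
env.render()                                 # render: draw one frame
env.close()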
  2. To use OpenAI Gym, you will need to import it first:
import gym
import numpy as np               # used by the random policy later in the recipe
import matplotlib.pyplot as plt  # used to display the final frame
  3. Next, we create our first environment:
env_name = 'Breakout-v4'
env = gym.make(env_name)
  4. We start the environment using the reset method:
obs = env.reset()
  5. Let's check the shape of the observation:
print(obs.shape)
  6. The available actions can be inspected through env.action_space. From this we can see that, for Breakout-v4, we have four possible actions: NoOp, Fire, Left, and Right. The total number of actions can be obtained by calling env.action_space.n, as the following snippet shows.
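As a quick check, here is a minimal sketch that prints this information; note that get_action_meanings is available on the unwrapped Atari environments:
actions = env.action_space
print(actions)                              # Discrete(4)
print(env.action_space.n)                   # 4
print(env.unwrapped.get_action_meanings())  # e.g. ['NOOP', 'FIRE', 'RIGHT', 'LEFT']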
  7. Let's define an agent with a random policy. The agent chooses any of the four possible actions randomly:
def random_policy(n):
    action = np.random.randint(0, n)
    return action
  8. We next allow our random agent to play for 1,000 steps, using obs, reward, done, info = env.step(action):
for step in range(1000):  # 1,000 steps max
    action = random_policy(env.action_space.n)
    obs, reward, done, info = env.step(action)
    env.render()
    if done:
        img = env.render(mode='rgb_array')
        plt.imshow(img)
        plt.show()
        print("The game is over in {} steps".format(step))
        break

The obs tells the agent what the environment looks like; for our environment, it is an RGB image of size 210 x 160 x 3. At each step, the agent receives a reward of either 0 or 1, though the declared reward range, as per the OpenAI Gym wiki, is [-inf, inf]. Once the game is over, the environment returns done as True. The info dictionary can be useful for debugging but is not used by the agent. The env.render() command pops up a window that shows the present state of the environment; with this command included, you can watch in the pop-up window how the agent tries to play and learn. It is better to comment it out while the agent is training, to save time.
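These properties can also be read directly off the environment; here is a minimal sketch using the standard observation_space and reward_range attributes:
print(env.observation_space)        # Box(210, 160, 3)
print(env.observation_space.shape)  # (210, 160, 3)
print(env.reward_range)             # (-inf, inf) by default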

  9. Lastly, close the environment:
env.close()