OpenAI Gym

OpenAI Gym is an open source Python framework developed by OpenAI, a non-profit AI research organization, as a toolkit for developing and evaluating RL algorithms. It gives us a set of test problems, known as environments, that we can write RL algorithms to solve. This enables us to dedicate more of our time to implementing and improving the learning algorithm instead of spending a lot of time simulating the environment. In addition, it provides a medium for people to compare and review the algorithms of others.

OpenAI environments

OpenAI Gym has a collection of environments. At the time of writing this book, the following environments are available:

  • Classic control and toy text: Small-scale tasks from the RL literature.
  • Algorithmic: Tasks that require the agent to perform computations, such as adding multi-digit numbers and reversing sequences. Most of these tasks require memory, and their difficulty can be changed by varying the sequence length.
  • Atari: Classic Atari games, with screen images or RAM as input, using the Arcade Learning Environment.
  • Board games: Currently includes the game of Go on 9x9 and 19x19 boards, with the Pachi engine [13] serving as an opponent.
  • 2D and 3D robots: Allows you to control a robot in simulation. These tasks use the MuJoCo physics engine, which was designed for fast and accurate robot simulation. A few of the tasks are adapted from RLLab.

The env class

OpenAI Gym provides the env class, which encapsulates the environment and its internal dynamics. This class defines the methods and attributes that you implement to create a new environment. The most important methods are named reset, step, and render:

  • The reset method initializes the environment to its initial state and returns the first observation. The reset method must contain the definitions of the elements that make up the environment (in a robotic grasping task, for example, the mechanical arm, the object to be grasped, and its support).
  • The step method advances the environment by one time step. It takes the agent's action as input and returns the new observation to the agent. Within this method, the movement dynamics, the state and reward computation, and the episode-termination checks must be defined.
  • The last method is render, which is used to visualize the current state of the environment.

By using the env class provided by the framework as the basis for new environments, you adopt the common interface defined by the toolkit.

In this way, the environments you build can be integrated into the toolkit's library, and their dynamics can be learned by algorithms written by members of the OpenAI Gym community.
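To make the three methods concrete, here is a minimal sketch of the interface described above, using a hypothetical one-dimensional "reach the target" task. It is a plain class written for illustration only; a real Gym environment would subclass gym.Env and also declare its action_space and observation_space:

```python
class ToyEnv:
    """Illustrative toy task: the agent starts at position 0
    and must reach position TARGET by moving left or right."""
    TARGET = 5
    MAX_STEPS = 20

    def reset(self):
        # Initialize the environment to its initial state
        # and return the first observation.
        self.position = 0
        self.steps = 0
        return self.position

    def step(self, action):
        # Advance the environment by one time step.
        # action: 0 = move left, 1 = move right.
        self.position += 1 if action == 1 else -1
        self.steps += 1
        done = (self.position == self.TARGET
                or self.steps >= self.MAX_STEPS)
        reward = 1.0 if self.position == self.TARGET else 0.0
        # The classic Gym API returns (observation, reward, done, info).
        return self.position, reward, done, {}

    def render(self):
        # Visualize the current state (text-based here).
        print(f"position={self.position}")

env = ToyEnv()
obs = env.reset()
done = False
while not done:
    obs, reward, done, info = env.step(1)  # always move right
print(obs, reward)  # prints: 5 1.0
```

The reward and termination logic here are arbitrary choices for the sketch; the point is the division of labor between reset (initialization), step (dynamics, reward, and termination), and render (visualization).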

Installing and running OpenAI Gym

For a more detailed explanation of how to use and run OpenAI Gym, please refer to the official documentation page at (https://gym.openai.com/docs/). A minimal installation of OpenAI Gym can be achieved with the following command:

git clone https://github.com/openai/gym
cd gym
pip install -e .

After OpenAI Gym has been installed, you can instantiate and run an environment in your Python code:

import gym

env = gym.make('CartPole-v0')
obs = env.reset()

for step_idx in range(500):
  env.render()
  obs, reward, done, _ = env.step(env.action_space.sample())
  if done:
    obs = env.reset()  # start a new episode when the pole falls

env.close()

This code snippet first imports the gym library. Then it creates an instance of the Cart-Pole (https://gym.openai.com/envs/CartPole-v0/) environment, which is a classical problem in RL. The Cart-Pole environment simulates an inverted pendulum mounted on a cart. The pendulum is initially vertical, and your goal is to maintain its vertical balance. The only way to control the pendulum is to choose a horizontal direction for the cart to move (either left or right).

The preceding code runs the environment for 500 time steps, choosing a random action to perform at each step. As a result, the pole is not kept stable for long. The agent receives a reward of +1 for every time step the pole stays upright; the episode ends when the pole tilts more than 12 degrees away from the vertical (or the cart moves too far from the center). The longer the pole remains within this range, the higher your total reward.
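Because each surviving time step earns +1, the total return of an episode equals its length. The pattern below shows how to accumulate the return and reset between episodes; it uses a hypothetical stand-in environment (StubEnv, with episodes ending after a random number of steps, like a pole falling) so the sketch runs without gym installed. In practice you would replace it with gym.make('CartPole-v0') and env.action_space.sample():

```python
import random

class StubEnv:
    """Stand-in with the classic Gym step API: +1 reward per step,
    with episodes ending after a random number of steps."""
    def reset(self):
        self._remaining = random.randint(10, 30)
        return 0.0

    def step(self, action):
        self._remaining -= 1
        done = self._remaining <= 0
        return 0.0, 1.0, done, {}

env = StubEnv()  # in practice: env = gym.make('CartPole-v0')
obs = env.reset()

total_reward = 0.0
episode_returns = []
for step_idx in range(500):
    obs, reward, done, _ = env.step(random.choice([0, 1]))
    total_reward += reward       # accumulate the episode's return
    if done:                     # episode over: record and reset
        episode_returns.append(total_reward)
        total_reward = 0.0
        obs = env.reset()

print(episode_returns)  # one entry per completed episode
```

With a random policy on the real CartPole environment, the recorded returns stay small; a learning algorithm improves by making these episode returns grow.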
