The Atari Gym environments produce observations with a shape of 210x160x3, which represents an RGB (color) image 210 pixels tall and 160 pixels wide. Although the color image at the original resolution has more pixels and therefore more information, it turns out that better performance is often achievable at a reduced resolution. A lower resolution means less data for the agent to process at every step, which translates into faster training, especially on the consumer-grade computing hardware that you and I own.
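We can confirm the raw observation shape directly (a minimal sketch, assuming the Atari environments are installed and using Pong purely as an example):

import gym

# Assumes gym's Atari environments are available (pip install gym[atari]);
# any Atari game would show the same raw screen dimensions
env = gym.make("PongNoFrameskip-v4")
print(env.observation_space.shape)  # (210, 160, 3)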
Let's create a preprocessing pipeline that takes the original observation image (of the Atari screen) and performs the following operations:
1. First, we crop out the region of the screen that does not contain any useful information about the environment for the agent.
2. Next, we convert the cropped RGB image to grayscale by averaging over the color channels and rescale the pixel values to the [0, 1] range.
3. Finally, we resize the image to a dimension of 84x84. We can choose a different number, other than 84, as long as it retains a reasonable amount of pixels. However, it is efficient to have a square input (like 84x84 or 80x80), as convolution operations (for example, with CUDA) are optimized for such square input:
import cv2
import numpy as np

def process_frame_84(frame, conf):
    # Crop out the game-specific rows that carry no useful information
    frame = frame[conf["crop1"]:conf["crop2"] + 160, :160]
    # Convert RGB to grayscale by averaging over the color channels
    frame = frame.mean(2)
    # Rescale pixel values to the [0, 1] range
    frame = frame.astype(np.float32)
    frame *= (1.0 / 255.0)
    # Downsample in two steps to 84x84; note cv2.resize takes (width, height)
    frame = cv2.resize(frame, (84, conf["dimension2"]))
    frame = cv2.resize(frame, (84, 84))
    # Add a leading channel dimension: (1, 84, 84)
    frame = np.reshape(frame, [1, 84, 84])
    return frame
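To sanity-check the function, we can run a randomly generated frame through it (a minimal sketch; the crop1, crop2, and dimension2 values below are illustrative placeholders, not tuned settings for any particular game):

# Placeholder config values for illustration only
conf = {"crop1": 34, "crop2": 34, "dimension2": 80}
# A fake 210x160 RGB frame standing in for a real Atari observation
dummy_frame = np.random.randint(0, 256, (210, 160, 3), dtype=np.uint8)
processed = process_frame_84(dummy_frame, conf)
print(processed.shape)  # (1, 84, 84)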
import gym
from gym.spaces import Box

class AtariRescale(gym.ObservationWrapper):
    def __init__(self, env, env_conf):
        gym.ObservationWrapper.__init__(self, env)
        # Observations are now single-channel 84x84 floats in [0, 1]
        self.observation_space = Box(0.0, 1.0, [1, 84, 84], dtype=np.float32)
        self.conf = env_conf

    def observation(self, observation):
        return process_frame_84(observation, self.conf)
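The wrapper can then be applied when creating the environment (again a sketch; the environment ID and the env_conf dictionary are placeholders, not settings from any particular agent configuration):

env_conf = {"crop1": 34, "crop2": 34, "dimension2": 80}
env = AtariRescale(gym.make("PongNoFrameskip-v4"), env_conf)
obs = env.reset()
print(obs.shape)  # (1, 84, 84)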