16.12 Reinforcement Learning

Reinforcement learning is a form of machine learning in which algorithms learn from their environment, similar to how humans learn—for example, a video game enthusiast learning a new game, or a baby learning to walk or recognize its parents.

The algorithm implements an agent that learns by trying to perform a task, receiving feedback about success or failure, making adjustments, and then trying again. The agent receives a positive reward for doing the right thing and a negative reward (that is, a punishment) for doing the wrong thing. It uses this feedback to determine the next action to perform, with the goal of maximizing its total reward.
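The try, get feedback, adjust, try again cycle can be sketched in a few lines of Python. This is only an illustrative toy, not a method from the text: the two actions, the `feedback` function, the `run_agent` helper and the exploration rate `epsilon` are all hypothetical names and values chosen for the example.

```python
import random

def run_agent(reward_for, actions, episodes=200, epsilon=0.1, seed=0):
    """Trial-and-error loop: act, receive a reward, adjust, try again."""
    rng = random.Random(seed)
    value = {a: 0.0 for a in actions}   # the agent's running reward estimate per action
    count = {a: 0 for a in actions}     # how many times each action has been tried
    total = 0.0
    for _ in range(episodes):
        if rng.random() < epsilon:
            action = rng.choice(actions)                       # occasionally explore
        else:
            action = max(actions, key=lambda a: value[a])      # otherwise use the best guess
        reward = reward_for(action)                            # feedback from the environment
        count[action] += 1
        value[action] += (reward - value[action]) / count[action]  # adjust (running average)
        total += reward
    return value, total

def feedback(action):
    # Hypothetical environment: "right" is the right thing to do, "left" is punished.
    return 1.0 if action == "right" else -1.0

estimates, total_reward = run_agent(feedback, ["left", "right"])
best = max(estimates, key=estimates.get)
print(best, estimates, total_reward)
```

After a few trials, the agent's estimate for the rewarded action overtakes the punished one, so it chooses that action in most later trials.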

Reinforcement learning was used in some key artificial-intelligence milestones that captured people’s attention and imagination. In 2011, IBM’s Watson beat the world’s two best human Jeopardy! players in a $1 million match. Watson simultaneously executed hundreds of language-analysis algorithms to locate correct answers in 200 million pages of content (including all of Wikipedia) requiring four terabytes of storage.⁹⁷,⁹⁸ Watson was trained with machine learning and used reinforcement learning techniques to learn the game-playing strategies (such as when to answer, which square to pick and how much money to risk on daily doubles).⁹⁹,¹⁰⁰

Mastering the Chinese Board Game Go

Go, a board game created in China thousands of years ago,¹⁰¹ is widely considered to be one of the most complex games ever invented, with 10¹⁷⁰ possible board configurations.¹⁰² To give you a sense of how large a number that is, it’s believed that there are (only) between 10⁷⁸ and 10⁸⁷ atoms in the known universe!¹⁰³,¹⁰⁴ Go is considered to be a far more complex game than chess. In 2015, AlphaGo, created by Google’s DeepMind group, used deep learning with two neural networks to beat the European Go champion Fan Hui.

AlphaZero

More recently, Google generalized its AlphaGo AI to create AlphaZero, a game-playing AI that uses reinforcement learning to teach itself to play other games. In December 2017, AlphaZero, given only the rules of chess, taught itself to play the game in less than four hours. It then beat the world champion chess program, Stockfish 8, in a 100-game match, winning or drawing every game. After training itself in Go for just eight hours, AlphaZero was able to play Go vs. its AlphaGo predecessor, winning 60 of 100 games.¹⁰⁵

16.12.1 Deep Q-Learning

One of the most popular reinforcement learning techniques is Deep Q-Learning, which was originally described in the Google DeepMind team’s paper “Playing Atari with Deep Reinforcement Learning.”¹⁰⁶ Using Deep Q-Learning, they were able to develop an agent that learned to play Atari video games by observing changes in the pixels on the screen.

Deep Q-Learning combines Q-Learning with deep learning. In Q-Learning, a Q function estimates the reward the agent can expect (both now and in the future) for performing a given action in the environment’s current state. In Deep Q-Learning, a neural network approximates that Q function. For example, if the agent is trying to learn how to avoid obstacles, every move the agent makes that does not hit an obstacle would get a positive reward and every move that collides with an obstacle would get a negative reward (that is, a punishment).
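To make the Q function concrete, here is a minimal sketch of plain (tabular) Q-Learning, in which the Q function is just a table of numbers rather than a neural network. The environment is a hypothetical five-cell track invented for this example: cell 0 holds an obstacle (reward -1), cell 4 is a goal (reward +1), and the agent starts in the middle. The learning rate `alpha`, discount `gamma` and exploration rate `epsilon` are assumed values, not ones from the text.

```python
import random

N_STATES = 5           # cells 0..4; cell 0 is an obstacle, cell 4 is the goal
ACTIONS = (-1, +1)     # index 0 moves left, index 1 moves right

def step(state, action):
    """Hypothetical environment: returns (next_state, reward, done)."""
    next_state = state + action
    if next_state == 0:
        return next_state, -1.0, True      # collided with the obstacle
    if next_state == N_STATES - 1:
        return next_state, 1.0, True       # reached the goal
    return next_state, 0.0, False

def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]   # q[state][action_index]
    for _ in range(episodes):
        state, done = 2, False
        while not done:
            if rng.random() < epsilon:
                a = rng.randrange(2)                          # explore
            else:
                a = 0 if q[state][0] > q[state][1] else 1     # exploit current estimates
            next_state, reward, done = step(state, ACTIONS[a])
            # Q-Learning update: move Q(s, a) toward reward + discounted best future value.
            target = reward if done else reward + gamma * max(q[next_state])
            q[state][a] += alpha * (target - q[state][a])
            state = next_state
    return q

def greedy_policy(q):
    """Best-looking action for each non-terminal cell (1, 2 and 3)."""
    return ["left" if q[s][0] > q[s][1] else "right" for s in range(1, N_STATES - 1)]

q_table = train()
print(greedy_policy(q_table))
```

With these settings, the learned table should favor moving right, away from the obstacle, in every non-terminal cell. A Deep Q-Learning agent would follow the same update idea but replace `q` with a neural network, which is what makes large inputs such as screen pixels practical.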

16.12.2 OpenAI Gym

Game playing is a key application of reinforcement learning. A tool called OpenAI Gym (https://gym.openai.com) has become popular for reinforcement learning research. It comes with several game environments that you can use to experiment with reinforcement learning and to develop your own algorithms. There are many additional environments (from Atari and others) that you can download and install into OpenAI Gym. In one of this chapter’s project exercises, you’ll research OpenAI Gym and experiment with its CartPole environment (shown below). This is a simple game with a cart (the black rectangle) that can move left or right on a track in one dimension and a pole (the vertical line) that’s hinged to the cart. The goal of the game is to keep the pole vertical. As the pole falls, the algorithm moves the cart left or right to restore the pole to the vertical position.

3 illustrations depict the cart pole environment.
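As a rough sketch of the "move the cart toward the falling pole" idea, the following code defines a hand-written (not learned) policy and a loop that runs one episode. It assumes that a CartPole observation is laid out as cart position, cart velocity, pole angle and pole angular velocity, and that action 0 pushes the cart left and 1 pushes it right. `choose_action` and `run_episode` are hypothetical helper names. The loop assumes that `env` offers Gym-style `reset` and `step` methods, and it tries to cope with both the older four-value and the newer five-value `step` results.

```python
def choose_action(observation):
    """Push the cart toward the side the pole is leaning to."""
    pole_angle = observation[2]           # assumed layout: [x, x_velocity, angle, angular_velocity]
    return 1 if pole_angle > 0 else 0     # 1 = push right, 0 = push left

def run_episode(env, policy, max_steps=500):
    """Run one episode with a Gym-style environment and return the total reward."""
    result = env.reset()
    # Newer Gym releases return (observation, info); older ones return only the observation.
    observation = result[0] if isinstance(result, tuple) else result
    total_reward = 0.0
    for _ in range(max_steps):
        step_result = env.step(policy(observation))
        observation, reward = step_result[0], step_result[1]
        if len(step_result) == 4:         # older API: (obs, reward, done, info)
            done = step_result[2]
        else:                             # newer API: (obs, reward, terminated, truncated, info)
            done = step_result[2] or step_result[3]
        total_reward += reward
        if done:
            break
    return total_reward
```

With Gym installed, calling `run_episode(gym.make('CartPole-v1'), choose_action)` after `import gym` should exercise this loop. A reinforcement learning agent would learn its policy from the rewards rather than rely on a fixed rule like `choose_action`.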
