Reinforcement learning overview

Reinforcement learning is based on the concept of an intelligent agent. An agent interacts with its environment by observing some state and then taking an action. As the agent takes actions to move between states, it receives feedback about the quality of those actions in the form of a reward signal. This reward signal is the reinforcement in reinforcement learning. It's a feedback loop that the agent can use to learn how good its choices were. Of course, rewards can be both positive and negative (punishments).

Imagine a self-driving car as the agent we are building. As it's driving down the road, it's receiving a constant stream of reward signals for its actions. Staying within the lanes would likely lead to a positive reward, while running over pedestrians would likely result in a very negative reward for the agent. When faced with the choice of staying within the lanes or hitting a pedestrian, the agent will hopefully learn to avoid the pedestrian at the expense of swerving outside the lines, giving up the lane reward in order to avoid the much greater punishment for a pedestrian collision.
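To make that trade-off concrete, here is a minimal sketch in Python. The reward values and the names `REWARDS`, `total_reward`, and `best_action` are made up for illustration; a real driving agent would learn from a far richer reward signal rather than read one from a lookup table.

```python
# Hypothetical reward components for the two choices the car faces.
# The numbers are illustrative assumptions, not values from any real system.
REWARDS = {
    # Keeps the lane reward but collides with the pedestrian.
    "stay_in_lane": {"lane": 1.0, "pedestrian": -100.0},
    # Gives up the lane reward but avoids the collision.
    "swerve": {"lane": -1.0, "pedestrian": 0.0},
}


def total_reward(action):
    """Sum every reward component the agent receives for one action."""
    return sum(REWARDS[action].values())


def best_action():
    """Pick the action whose total reward is highest."""
    return max(REWARDS, key=total_reward)


print(best_action())  # swerve
```

Because the collision punishment dwarfs the lane reward, swerving has the higher total even though it costs the agent something.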

Central to the idea of reinforcement learning are the concepts of state, action, and reward. I've already discussed reward, so let's talk about action and state. An action is what the agent can do when it observes some state. If our agent were playing a simple board game, the action would be the move that the agent makes on its turn. The board position the agent observes on that turn is then its state. For the sake of the problems we will be looking at here, the actions an agent can take are always finite and discrete. This concept is illustrated in the following figure:

One step of this feedback loop can be expressed mathematically as follows:

$$ s \xrightarrow{\;a\;} (s', r) $$

Actions transition the agent between its original state $s$ and its next state $s'$, where it receives some reward $r$. The way the agent chooses actions is called the agent's policy, and it is typically denoted as $\pi$.
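The loop of observing a state, choosing an action with a policy, and receiving a reward can be sketched in a few lines of Python. Everything here is an assumption made for illustration: the toy environment (five positions on a line with a goal at the right end), the reward values, and the names `step`, `random_policy`, and `run_episode`.

```python
import random

# Hypothetical toy environment: positions 0..4 on a line, goal at position 4.
ACTIONS = (-1, +1)  # finite, discrete action set: step left or step right
GOAL = 4


def step(state, action):
    """Apply an action in a state and return (next_state, reward)."""
    next_state = min(max(state + action, 0), GOAL)
    reward = 1.0 if next_state == GOAL else -0.1
    return next_state, reward


def random_policy(state, rng):
    """A policy maps a state to an action; this one ignores the state."""
    return rng.choice(ACTIONS)


def run_episode(policy, rng, max_steps=100):
    """Run the observe-act-reward loop, recording each (s, a, r, s') tuple."""
    state, transitions = 0, []
    for _ in range(max_steps):
        action = policy(state, rng)
        next_state, reward = step(state, action)
        transitions.append((state, action, reward, next_state))
        state = next_state
        if state == GOAL:
            break
    return transitions


transitions = run_episode(random_policy, random.Random(0))
print(transitions[0])  # the first (s, a, r, s') transition
```

A learning agent would replace `random_policy` with a policy that improves as it sees more transitions; the random policy is only a placeholder to show the shape of the loop.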

The goal of reinforcement learning is to find a sequence of actions that gets the agent from state to state while collecting as much total reward as possible.
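As a rough illustration of that goal, the sketch below scores every possible three-action sequence in a tiny made-up environment and keeps the one with the highest total reward. The environment, the reward values, and the names `step` and `total_reward` are assumptions for this example. Exhaustive search like this only works for very small problems; reinforcement learning methods learn good actions from experience instead of enumerating every sequence.

```python
from itertools import product

# Hypothetical toy environment: positions 0..3 on a line, goal at position 3.
ACTIONS = (-1, +1)  # step left or step right
GOAL = 3


def step(state, action):
    """Apply an action in a state and return (next_state, reward)."""
    next_state = min(max(state + action, 0), GOAL)
    reward = 1.0 if next_state == GOAL else -0.1
    return next_state, reward


def total_reward(actions, start=0):
    """Add up the rewards collected by following a sequence of actions."""
    state, total = start, 0.0
    for action in actions:
        state, reward = step(state, action)
        total += reward
        if state == GOAL:  # the episode ends once the goal is reached
            break
    return total


# Score every sequence of three actions and keep the best one.
best = max(product(ACTIONS, repeat=3), key=total_reward)
print(best)  # (1, 1, 1)
```

Only the sequence that steps right three times reaches the goal, so it is the only one whose total reward is positive.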
