Reinforcement learning introduction

Reinforcement learning aims to create algorithms that can learn and adapt to environmental changes. The technique is based on receiving external stimuli that depend on the choices the algorithm makes: a correct choice earns a reward, while an incorrect choice incurs a penalty. The goal of the system is to achieve the best possible result.

In supervised learning, a teacher tells the system the correct output (learning with a teacher). This is not always possible. Often only qualitative information is available, sometimes just binary (right/wrong or success/failure). This information is called the reinforcement signal. The signal, however, gives no information on how to update the agent's behavior (that is, its weights), so you cannot define a cost function or a gradient. The goal is to create smart agents that are able to learn from their experience.

The following is a flowchart that displays reinforcement learning interaction with the environment:

The scientific literature has been uncertain about how to classify reinforcement learning as a paradigm. It was initially considered a special case of supervised learning and was later recognized as the third paradigm of machine learning. It is applied in contexts where supervised learning is inefficient; problems that involve interaction with an environment are clear examples.

The following steps show how to apply a reinforcement learning algorithm correctly:

1. Preparation of the agent
2. Observation of the environment
3. Selection of the optimal strategy
4. Execution of actions
5. Calculation of the corresponding reward (or penalty)
6. Development of updating strategies (if necessary)
7. Repetition of steps 2-5 iteratively until the agent learns the optimal strategies
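
The steps above can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not a reference implementation: the environment is a hypothetical two-choice problem (a multi-armed bandit) with a single state, the function name run_bandit and its parameters are invented for this example, and the epsilon-greedy selection rule and the incremental-average update are one simple choice among many.

```python
import random

def run_bandit(reward_probs, episodes=2000, epsilon=0.1, seed=0):
    """Epsilon-greedy agent on a hypothetical bandit (illustrative only).

    reward_probs[a] is the assumed probability that action a is "correct".
    """
    rng = random.Random(seed)
    n_actions = len(reward_probs)

    # Step 1: prepare the agent (value estimates and counts start at zero).
    values = [0.0] * n_actions
    counts = [0] * n_actions

    for _ in range(episodes):
        # Step 2: observe the environment. This toy environment has a
        # single state, so there is nothing to read here.

        # Step 3: select a strategy: explore with probability epsilon,
        # otherwise pick the action with the best current estimate.
        if rng.random() < epsilon:
            action = rng.randrange(n_actions)
        else:
            action = max(range(n_actions), key=lambda a: values[a])

        # Steps 4 and 5: execute the action and compute the reward (+1)
        # or the penalty (-1).
        reward = 1.0 if rng.random() < reward_probs[action] else -1.0

        # Step 6: update the strategy with a running average of rewards.
        counts[action] += 1
        values[action] += (reward - values[action]) / counts[action]

        # Step 7: the loop repeats until the episode budget is used up.

    return values, counts


if __name__ == "__main__":
    values, counts = run_bandit([0.2, 0.8])
    print("estimated values:", values)
    print("times each action was chosen:", counts)
```

Note that the agent is never told which action is correct. It only receives the reward or penalty, and its estimates drift toward the action that pays off more often.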

Reinforcement learning draws on a theory from psychology that was developed after a series of experiments on animals. In particular, the American psychologist Edward Thorndike noted that if a cat is given a reward immediately after performing a behavior considered correct, the probability that the behavior will be repeated increases. Conversely, punishing an unwanted behavior decreases the probability that the error will be repeated.

Based on this theory, and given a defined goal, reinforcement learning tries to maximize the rewards received for executing the action, or set of actions, that reaches that goal.
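
To make "maximizing the rewards" concrete, the short sketch below scores a sequence of rewards as a single number. It assumes the common convention of a discount factor (here called gamma) that weights later rewards less than immediate ones; the text above does not introduce discounting, so treat it, along with the function name discounted_return, as an assumption made for illustration. Setting gamma to 1.0 gives the plain sum of rewards.

```python
def discounted_return(rewards, gamma=0.9):
    """Sum of rewards, each weighted by gamma ** (steps into the future).

    gamma is an assumed discount factor; gamma=1.0 is the plain sum.
    """
    total = 0.0
    # Walk backwards so each earlier step multiplies the tail by gamma once.
    for reward in reversed(rewards):
        total = reward + gamma * total
    return total


if __name__ == "__main__":
    # Two hypothetical action sequences: early penalties with a large
    # final reward, versus small immediate rewards and nothing after.
    patient = [-1.0, -1.0, 10.0]
    greedy = [1.0, 1.0, 0.0]
    print(discounted_return(patient), discounted_return(greedy))
```

Under this scoring, the sequence that accepts two early penalties to reach the large final reward comes out ahead of the one that collects small rewards immediately, which is the kind of trade-off an agent has to learn when it maximizes reward over a set of actions rather than a single one.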
