Reinforcement learning introduction

Reinforcement learning aims to create algorithms that can learn and adapt to environmental changes. The technique is based on the algorithm receiving external stimuli that depend on the choices it makes: a correct choice earns a reward, while an incorrect choice incurs a penalty. The goal of the system is to achieve the best possible result.

In supervised learning, a teacher tells the system what the correct output is (learning with a teacher). This is not always possible. Often only qualitative information is available, sometimes as coarse as a binary right/wrong or success/failure. This information is called the reinforcement signal. The signal, however, gives no indication of how to update the agent's behavior (that is, its weights): you cannot define a cost function or a gradient from it. The goal is to create smart agents that are able to learn from their experience.

The following is a flowchart that displays reinforcement learning interaction with the environment:

The scientific literature has been uncertain about how to classify reinforcement learning as a paradigm. Initially it was considered a special case of supervised learning; later it was recognized as the third paradigm of machine learning. It is applied in contexts where supervised learning is inefficient; problems that involve interaction with an environment are clear examples.

The following steps outline how to apply a reinforcement learning algorithm correctly:

  1. Preparation of the agent
  2. Observation of the environment
  3. Selection of the optimal strategy
  4. Execution of actions
  5. Calculation of the corresponding reward (or penalty)
  6. Development of updating strategies (if necessary)
  7. Repeating steps 2-6 iteratively until the agent learns the optimal strategies
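
The steps above can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not a method taken from this text: the environment is a hypothetical multi-armed bandit (each action pays a reward of +1 with some fixed probability and a penalty of -1 otherwise), and the agent uses an epsilon-greedy rule, chosen here purely for simplicity. The names `run_bandit`, `reward_probs`, and `epsilon` are invented for the example.

```python
import random


def run_bandit(reward_probs, episodes=2000, epsilon=0.1, seed=0):
    """Epsilon-greedy agent on a hypothetical multi-armed bandit.

    reward_probs[a] is the (assumed) probability that action a is rewarded.
    """
    rng = random.Random(seed)
    n_actions = len(reward_probs)

    # Step 1: prepare the agent (value estimates and counts start at zero).
    values = [0.0] * n_actions
    counts = [0] * n_actions
    total_reward = 0.0

    for _ in range(episodes):
        # Step 2: observe the environment. This toy environment has a single
        # state, so there is nothing to read before choosing an action.

        # Step 3: select an action. With probability epsilon explore at
        # random, otherwise exploit the action with the best estimate so far.
        if rng.random() < epsilon:
            action = rng.randrange(n_actions)
        else:
            action = max(range(n_actions), key=lambda a: values[a])

        # Steps 4 and 5: execute the action and compute the reward (+1)
        # or penalty (-1) returned by the environment.
        reward = 1.0 if rng.random() < reward_probs[action] else -1.0

        # Step 6: update the strategy with an incremental average of the
        # rewards observed for the chosen action.
        counts[action] += 1
        values[action] += (reward - values[action]) / counts[action]
        total_reward += reward

        # Step 7: the loop repeats until the episode budget is exhausted.

    return values, counts, total_reward


values, counts, total = run_bandit([0.2, 0.8])
print("estimated values:", values)
print("times each action was chosen:", counts)
```

Note that the agent never sees the probabilities in `reward_probs`; it only receives the reward or penalty after each action, which is the reinforcement signal described earlier.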

Reinforcement learning is rooted in psychological theory developed from a series of experiments on animals. In particular, the American psychologist Edward Thorndike noted that if a cat is given a reward immediately after performing a behavior considered correct, the probability that the behavior will be repeated increases. Conversely, when unwanted behavior is met with a punishment, the probability that the error will be repeated decreases.

On the basis of this theory, and given a defined goal, reinforcement learning tries to maximize the rewards received for executing the action, or set of actions, that leads to that goal.
