Supervised learning

Supervised learning is a machine learning technique that aims to train a computer system to solve a given class of tasks automatically. To do this, the input data is collected in a set I (typically of vectors) and the output data in a set O. A function f then associates each input with its correct answer. The collection of inputs paired with their correct answers is called a training set.
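
As a minimal sketch of these definitions, the following Python snippet builds a tiny training set. The labeling rule f and the four input vectors are hypothetical, chosen only for illustration; in practice the correct answers usually come from labeled data rather than from a known formula.

```python
from typing import List, Tuple

Vector = Tuple[float, ...]

def f(x: Vector) -> int:
    """Hypothetical 'correct answer' rule: 1 if the coordinates sum to a positive value, else 0."""
    return 1 if sum(x) > 0 else 0

# Set I: the input data, here two-dimensional vectors (made-up values).
inputs: List[Vector] = [(1.0, 2.0), (-1.5, 0.5), (0.2, -0.1), (-3.0, -1.0)]

# The training set pairs every input with its correct answer from set O = {0, 1}.
training_set: List[Tuple[Vector, int]] = [(x, f(x)) for x in inputs]

for x, label in training_set:
    print(x, "->", label)
```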

These algorithms are based on the theory of learning by example: knowledge is gained from a set of positive examples, which are instances of the concept to be learned, and negative examples, which are non-instances of the concept. In other words, a teacher shows what is right and what is wrong; based on these teachings (the training phase), the algorithm learns to recognize new instances of the problem automatically, as shown in the following diagram:

All supervised learning algorithms are based on the following thesis:

If an algorithm is provided with an adequate number of examples, it will be able to create a derived function B that approximates the desired function A.

If the approximation of the desired function is adequate, then when input data is presented to the derived function, its outputs should be similar to those of the desired function and therefore acceptable. These algorithms rest on the assumption that similar inputs correspond to similar outputs.
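
One way to make this concrete is a one-nearest-neighbor rule, which applies the "similar inputs, similar outputs" idea directly: the derived function answers with the output of the closest training example. The sketch below is only an illustration, not the method of any particular library; the desired function A (here x squared) and the eleven evenly spaced examples are assumptions.

```python
from typing import Callable, List, Tuple

def desired_function(x: float) -> float:
    """Stand-in for the desired function A (an assumption made for this sketch)."""
    return x * x

def fit_nearest_neighbor(examples: List[Tuple[float, float]]) -> Callable[[float], float]:
    """Build a derived function B that returns the output of the closest training input."""
    def derived_function(x: float) -> float:
        _, nearest_y = min(examples, key=lambda pair: abs(pair[0] - x))
        return nearest_y
    return derived_function

# Eleven examples of A, sampled at 0.0, 0.1, ..., 1.0.
examples = [(i / 10, desired_function(i / 10)) for i in range(11)]
B = fit_nearest_neighbor(examples)

# The closest training input to 0.52 is 0.5, so B answers with A(0.5).
print(B(0.52), desired_function(0.52))
```

For the query 0.52, B returns A(0.5) = 0.25 while A(0.52) is about 0.27, so the approximation is close but not exact.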

In the real world, this assumption does not hold in general; however, there are situations in which it is acceptable. Clearly, the proper functioning of such algorithms depends significantly on the input data. If there are only a few training inputs, the algorithm might not have enough experience to provide a correct output. Conversely, too many inputs may make the algorithm excessively slow, since the derived function generated from a large number of inputs could be very complicated.
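
The effect of training set size can be sketched with a small experiment. The example below assumes a nearest-neighbor learner and a sine target on the interval from 0 to pi; both are illustrative choices, and the helper names are hypothetical. It measures the worst-case error of the derived function for three training set sizes.

```python
import math
from typing import Callable, List, Tuple

def nearest_neighbor_predictor(examples: List[Tuple[float, float]]) -> Callable[[float], float]:
    """Derived function: return the stored output of the closest training input."""
    def predict(x: float) -> float:
        return min(examples, key=lambda pair: abs(pair[0] - x))[1]
    return predict

def max_error(n_examples: int, n_test: int = 1000) -> float:
    """Worst absolute error on a dense test grid when training on n evenly spaced examples."""
    step = math.pi / (n_examples - 1)
    examples = [(i * step, math.sin(i * step)) for i in range(n_examples)]
    predict = nearest_neighbor_predictor(examples)
    test_points = [j * math.pi / (n_test - 1) for j in range(n_test)]
    return max(abs(predict(x) - math.sin(x)) for x in test_points)

errors = {n: max_error(n) for n in (3, 10, 50)}
for n, err in errors.items():
    print(f"{n:>3} examples -> worst-case error {err:.3f}")
```

With only three examples the worst-case error is large, and it shrinks as examples are added. The cost side of the trade-off is visible too: in this sketch every prediction scans all stored examples, so prediction time grows with the size of the training set.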

Moreover, experience shows that this type of algorithm is very sensitive to noise; even a few pieces of incorrect data can make the entire system unreliable and lead to wrong decisions.

In supervised learning, it's possible to split problems based on the nature of the data. If the output value is categorical, such as membership/non-membership of a certain class, it is a classification problem. If the output is a continuous real value in a certain range, then it is a regression problem.
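
The difference between the two problem types can be sketched with the same k-nearest-neighbor idea applied to two kinds of output. The data below (hours studied mapped to a pass/fail label, and hours studied mapped to a score) is made up for illustration, and the function names are hypothetical.

```python
from collections import Counter
from statistics import mean
from typing import Any, List, Tuple

def k_nearest(examples: List[Tuple[float, Any]], x: float, k: int) -> List[Tuple[float, Any]]:
    """Return the k training examples whose inputs are closest to x."""
    return sorted(examples, key=lambda pair: abs(pair[0] - x))[:k]

def classify(examples: List[Tuple[float, str]], x: float, k: int = 3) -> str:
    """Classification: the output is a category, chosen by majority vote."""
    labels = [label for _, label in k_nearest(examples, x, k)]
    return Counter(labels).most_common(1)[0][0]

def regress(examples: List[Tuple[float, float]], x: float, k: int = 3) -> float:
    """Regression: the output is a continuous value, here the neighbors' average."""
    return mean(value for _, value in k_nearest(examples, x, k))

# Hypothetical toy data: hours studied -> pass/fail, and hours studied -> exam score.
class_examples = [(1.0, "fail"), (2.0, "fail"), (3.0, "fail"),
                  (6.0, "pass"), (7.0, "pass"), (8.0, "pass")]
reg_examples = [(1.0, 35.0), (2.0, 42.0), (3.0, 50.0),
                (6.0, 71.0), (7.0, 80.0), (8.0, 86.0)]

print(classify(class_examples, 6.5))  # a categorical answer
print(regress(reg_examples, 6.5))     # a real-valued answer
```

The input is the same in both calls; only the nature of the output changes, which is what separates classification from regression.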
