Unsupervised learning

Unsupervised learning is a machine learning technique that, starting from a series of inputs (system experience), reclassifies and organizes them on the basis of common characteristics in order to make predictions about subsequent inputs. Unlike supervised learning, only unlabeled examples are provided to the learner during the learning process, because the classes are not known a priori but must be learned automatically.

The following diagram shows three groups identified from raw, unlabeled data:

From this diagram, we can see that the system has identified three groups on the basis of similarity, which in this case is spatial proximity. In general, unsupervised learning tries to identify the internal structure of the data and reproduce it.
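
Grouping by proximity can be sketched with k-means (Lloyd's algorithm), a standard clustering method chosen here for illustration; the text does not say which algorithm produced the diagram. The sketch assumes Euclidean distance, a deterministic farthest-point initialization, and a small hypothetical data set with three well-separated groups. No labels are given: the groups emerge only from which points lie close together.

```python
import math


def farthest_point_init(points, k):
    """Pick k spread-out starting centroids (deterministic, assumed heuristic)."""
    centroids = [points[0]]
    while len(centroids) < k:
        # Choose the point whose nearest existing centroid is farthest away.
        next_point = max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        centroids.append(next_point)
    return centroids


def kmeans(points, k, max_iter=100):
    """Group 2-D points into k clusters by proximity (Lloyd's k-means)."""
    centroids = farthest_point_init(points, k)
    labels = [0] * len(points)
    for _ in range(max_iter):
        # Assignment step: each point joins the group of its nearest centroid.
        new_labels = [min(range(k), key=lambda j: math.dist(p, centroids[j])) for p in points]
        # Update step: move each centroid to the mean of the points assigned to it.
        for j in range(k):
            members = [p for p, lab in zip(points, new_labels) if lab == j]
            if members:
                centroids[j] = (sum(x for x, _ in members) / len(members),
                                sum(y for _, y in members) / len(members))
        if new_labels == labels:
            break  # assignments no longer change: converged
        labels = new_labels
    return labels, centroids


# Hypothetical unlabeled data: three groups that differ only in where they lie.
data = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1),
        (5.0, 5.0), (5.1, 4.9), (4.8, 5.2),
        (9.0, 1.0), (9.2, 1.1), (8.9, 0.8)]
labels, centroids = kmeans(data, k=3)
print(labels, centroids)
```

The numeric group identifiers are arbitrary; only the partition matters, which is why points are compared by shared label rather than by a fixed label value.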

Search engines are typical examples of these algorithms. Given one or more keywords, they create a list of links to pages that the search algorithm considers relevant to the query. The value of these algorithms depends on the usefulness of the information they can extract from the database.
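
The text does not describe how relevance is decided. One classic, label-free way to rank pages against keywords is TF-IDF weighting, used here as a swapped-in illustration rather than the method of any particular search engine. The sketch assumes whitespace tokenization, a plain log IDF, and hypothetical page URLs and contents.

```python
import math
from collections import Counter


def tfidf_rank(query, pages):
    """Rank pages by a simple TF-IDF score for the query keywords (illustrative)."""
    tokenized = {url: text.lower().split() for url, text in pages.items()}
    n_pages = len(pages)
    scores = {}
    for url, words in tokenized.items():
        counts = Counter(words)
        score = 0.0
        for term in query.lower().split():
            # Document frequency: how many pages contain the term at all.
            df = sum(1 for w in tokenized.values() if term in w)
            if df == 0 or counts[term] == 0:
                continue
            tf = counts[term] / len(words)      # how prominent the term is on this page
            idf = math.log(n_pages / df)        # rarer terms carry more weight
            score += tf * idf
        scores[url] = score
    # Return only pages with a positive score, most relevant first.
    return sorted((u for u in scores if scores[u] > 0), key=lambda u: scores[u], reverse=True)


# Hypothetical pages; the URLs use the reserved example.org domain.
pages = {
    "example.org/clustering": "k-means clustering groups points by proximity clustering",
    "example.org/cooking": "a recipe for bread and soup",
    "example.org/learning": "unsupervised learning finds structure clustering is one method",
}
print(tfidf_rank("clustering", pages))
```

No page is ever labeled as relevant in advance; the ranking comes entirely from statistics extracted from the collection itself.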

Unsupervised learning techniques work by comparing data and looking for similarities or differences. Many machine learning algorithms try to imitate the functioning of an animal's nervous system. With this in mind, we can hypothesize that neural processes are guided by mechanisms that optimize the unknown objective they pursue. Each process evolves from an initial state, associated with a stimulus, to a terminal state in which there is a response, which is the result of the process itself. Intuitively, this evolution involves a transfer of information: the stimulus provides the information necessary to obtain the desired response. It is therefore important that this information be transmitted as faithfully as possible until the process is complete. A reasonable criterion for interpreting the processes that take place in the nervous system is, therefore, to regard them as transfers of information with maximum preservation of that information.
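
How much of the stimulus is "preserved" in the response can be quantified with mutual information, a standard information-theoretic measure. The sketch below assumes, purely for illustration, a fair binary stimulus and a process that flips it with some probability (a binary symmetric channel); the helper `binary_channel_joint` is hypothetical. A faithful transfer preserves one full bit, while a channel that flips half the time preserves nothing.

```python
import math


def mutual_information(joint):
    """I(X;Y) in bits from a joint probability table joint[x][y]."""
    px = [sum(row) for row in joint]           # marginal of the stimulus X
    py = [sum(col) for col in zip(*joint)]     # marginal of the response Y
    mi = 0.0
    for i, row in enumerate(joint):
        for j, pxy in enumerate(row):
            if pxy > 0:
                mi += pxy * math.log2(pxy / (px[i] * py[j]))
    return mi


def binary_channel_joint(flip_prob):
    """Joint table: uniform binary stimulus, response flipped with probability flip_prob."""
    return [[0.5 * (1 - flip_prob), 0.5 * flip_prob],
            [0.5 * flip_prob, 0.5 * (1 - flip_prob)]]


# Information preserved as the channel gets noisier.
for p in (0.0, 0.1, 0.5):
    print(p, round(mutual_information(binary_channel_joint(p)), 3))
```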

Unsupervised learning algorithms are based on these concepts. The idea is to use information theory techniques to measure the loss of information that occurs in the transfer. The process under consideration is treated as the transmission of a signal through a noisy channel, using well-known techniques developed in the field of communications. It is possible, however, to follow a different approach based on a geometric representation of the process. Both the stimulus and the response are characterized by an appropriate number of components, so each corresponds to a point in a space. The process can thus be interpreted as a geometric transformation from the input space to the output space. The output space has a lower dimension than the input space, because the stimulus contains the information needed to activate many simultaneous processes; relative to any single process, it is redundant. This means that the transformation always involves a redundancy reduction operation.
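
The geometric view of redundancy reduction can be illustrated with principal component analysis (PCA), a standard linear technique chosen here as an example; the text does not name a specific transformation. The sketch assumes hypothetical 2-D stimuli whose second component is almost exactly twice the first, so the input is redundant. Projecting onto the leading direction (found by power iteration) maps each 2-D stimulus to a single number while keeping nearly all of the variance.

```python
import math


def covariance(points):
    """2x2 sample covariance matrix and mean of 2-D points."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points) / (n - 1)
    syy = sum((y - my) ** 2 for _, y in points) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in points) / (n - 1)
    return [[sxx, sxy], [sxy, syy]], (mx, my)


def top_component(cov, iterations=200):
    """Leading eigenvector and eigenvalue of a symmetric 2x2 matrix by power iteration."""
    v = [1.0, 0.0]  # assumed start vector; must not be orthogonal to the top eigenvector
    for _ in range(iterations):
        w = [cov[0][0] * v[0] + cov[0][1] * v[1],
             cov[1][0] * v[0] + cov[1][1] * v[1]]
        norm = math.hypot(w[0], w[1])
        v = [w[0] / norm, w[1] / norm]
    # Rayleigh quotient v^T C v gives the corresponding eigenvalue (variance kept).
    cv = [cov[0][0] * v[0] + cov[0][1] * v[1], cov[1][0] * v[0] + cov[1][1] * v[1]]
    return v, v[0] * cv[0] + v[1] * cv[1]


# Hypothetical redundant stimuli: y is roughly 2 * x plus a little noise.
stimuli = [(x, 2 * x + noise) for x, noise in
           [(0.0, 0.1), (1.0, -0.1), (2.0, 0.05), (3.0, -0.05), (4.0, 0.0)]]
cov, mean = covariance(stimuli)
direction, var_kept = top_component(cov)
total_var = cov[0][0] + cov[1][1]
print(f"fraction of variance kept in 1-D: {var_kept / total_var:.4f}")

# The reduced "response": one coordinate per stimulus instead of two.
responses = [(x - mean[0]) * direction[0] + (y - mean[1]) * direction[1] for x, y in stimuli]
print(responses)
```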

Typical regions form in both the input and output spaces, and information is associated with these regions. The natural mechanism that controls the transfer of information must therefore somehow identify the regions that matter for the process under consideration and make sure that they correspond under the transformation. Thus, the process involves a data grouping operation, which can be identified with the acquisition of experience. These two operations, grouping and redundancy reduction, are typical of optimal signal processing, and there is biological evidence of their existence in the functioning of the nervous system. It is interesting to note that both operations are achieved automatically in unsupervised learning based on empirical principles, such as competitive learning.
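
Competitive learning can be sketched as a winner-take-all rule: for each input, the unit whose prototype is closest wins, and only that prototype moves toward the input. This minimal version assumes Euclidean distance, a fixed learning rate, and hand-picked starting prototypes; `learning_rate`, `epochs`, and the data are hypothetical choices. After training, each prototype sits near the centre of one group, which is the grouping operation described above emerging without any labels.

```python
import math


def competitive_learning(inputs, prototypes, learning_rate=0.1, epochs=20):
    """Winner-take-all competitive learning: only the closest prototype is updated."""
    prototypes = [list(p) for p in prototypes]  # copy so the caller's data is untouched
    for _ in range(epochs):
        for x in inputs:
            # Competition: the unit whose prototype is nearest to the input wins.
            winner = min(range(len(prototypes)), key=lambda j: math.dist(x, prototypes[j]))
            # Learning: move only the winner a fraction of the way toward the input.
            for d in range(len(x)):
                prototypes[winner][d] += learning_rate * (x[d] - prototypes[winner][d])
    return prototypes


# Hypothetical inputs forming two groups, near (0, 0) and near (10, 10).
inputs = [(0.0, 0.5), (0.5, 0.0), (0.2, 0.3),
          (10.0, 9.5), (9.5, 10.0), (9.8, 9.7)]
learned = competitive_learning(inputs, prototypes=[(2.0, 2.0), (8.0, 8.0)])
print(learned)
```

With a fixed learning rate the prototypes keep jittering slightly among the inputs they win; decaying the rate over time is a common refinement.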
