Unsupervised machine learning

Unsupervised machine learning involves datasets that do not have labeled outcomes. Returning to the example of predicting mpg values for cars, in an unsupervised exercise our dataset would have looked as follows, with the mpg value missing for every record:

If all the outcomes are missing, it is impossible to know what the values might have been. Recall that the primary premise of machine learning is to use historical information to make predictions on datasets whose outcomes are not known. If the historical information itself has no identified outcomes, however, there is nothing from which to build a predictive model. Without any other information, the values of mpg in the table could be all 0 or all 100; we cannot tell, because no data point leads us toward the value.

This is where unsupervised machine learning gets applied. In this type of machine learning, we are not trying to predict outcomes. Rather, we are trying to determine which items are most similar to one another.
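To make "most similar" concrete, here is a minimal sketch in Python that finds, for each unlabeled car record, the closest other record. The car names, feature columns, and values are hypothetical and chosen only for illustration, and Euclidean distance on rescaled features is just one of many possible similarity measures.

```python
import math

# Hypothetical, unlabeled car records: (cylinders, horsepower, weight in lb).
# There is no mpg column; the values are made up for illustration.
cars = {
    "car_a": (4, 90, 2200),
    "car_b": (4, 95, 2300),
    "car_c": (8, 200, 4300),
    "car_d": (8, 190, 4100),
}

def rescale(records):
    """Scale every feature to the 0-1 range so that weight does not dominate."""
    columns = list(zip(*records.values()))
    lows = [min(col) for col in columns]
    highs = [max(col) for col in columns]
    return {
        name: tuple((v - lo) / (hi - lo) if hi > lo else 0.0
                    for v, lo, hi in zip(row, lows, highs))
        for name, row in records.items()
    }

def most_similar(records):
    """For each record, return the name of the closest other record."""
    scaled = rescale(records)
    return {
        name: min((other for other in scaled if other != name),
                  key=lambda other: math.dist(scaled[name], scaled[other]))
        for name in scaled
    }

nearest = most_similar(cars)
print(nearest)
```

Note that no outcome column is used anywhere: the pairing of the two small cars and the two large cars comes entirely from the attributes themselves.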

A common name for such an exercise is clustering: we attempt to find clusters, or groups, of records that are most similar to one another. Where can we use this information, and what are some examples of unsupervised learning?

There are various news aggregators on the web, that is, sites that do not publish information themselves but collect it from other news sources. One such aggregator is Google News (https://news.google.com/news/?gl=US&ned=us&hl=en). If, say, we wanted information on the last images of Saturn taken by the Cassini spacecraft, we could run a simple search for that phrase on Google News. An example is shown here:

Notice the View all link at the bottom of the news articles. Clicking it takes you to a page with all the other related news articles. Surely, Google didn't manually classify the articles as belonging to the specific search term. In fact, Google doesn't know in advance what the user will search for. The search term could just as well have been images of Saturn rings from space.

So, how does Google know which articles belong to a specific search term? The answer lies in the application of clustering or principles of unsupervised learning. Unsupervised learning examines the attributes of a specific dataset in order to determine which articles are most similar to one another. To do this, the algorithm doesn't even need to know the contextual background.

Suppose you were given two sets of books with no covers: a set of books on gardening and a set of books on computer programming. Although you may not know the titles of the books, it would be fairly easy to distinguish the books on computers from the books on gardening. One set would contain an overwhelming number of terms related to computing, while the other would contain an overwhelming number of terms related to plants. Even the images in the books would be enough to show that there are two distinct categories, even for a reader who, let's assume, knows nothing about either computers or gardening.
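The book example can be sketched in a few lines of Python. This is a deliberately simplified illustration, not the method used by Google News or by any particular library: the book snippets, the stop-word list, and the helper names (`term_counts`, `cosine`, `two_groups`) are all hypothetical, and the grouping rule (pick the two least similar texts as seeds, then assign each text to the seed it shares more vocabulary with) is a crude stand-in for a real clustering algorithm.

```python
import math
from collections import Counter
from itertools import combinations

# Hypothetical snippets standing in for coverless books; the text is made up.
books = [
    "compile the program and debug the code with a compiler",
    "water the soil and prune the plants in the garden",
    "the code loops over an array and the program prints output",
    "plants need soil water and sunlight to grow in a garden",
]

# A small, hand-picked list of common words to ignore (an assumption).
STOP_WORDS = {"the", "and", "a", "an", "in", "to", "with", "over", "of"}

def term_counts(text):
    """Count the words in a text, ignoring very common words."""
    return Counter(w for w in text.lower().split() if w not in STOP_WORDS)

def cosine(a, b):
    """Cosine similarity of two term-count vectors (0 means no shared terms)."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def two_groups(texts):
    """Split texts into two groups around the two least similar texts."""
    vectors = [term_counts(t) for t in texts]
    # Use the pair of texts with the least shared vocabulary as seeds.
    seed_a, seed_b = min(
        combinations(range(len(texts)), 2),
        key=lambda pair: cosine(vectors[pair[0]], vectors[pair[1]]),
    )
    groups = {seed_a: [], seed_b: []}
    for i, vec in enumerate(vectors):
        sim_a = cosine(vec, vectors[seed_a])
        sim_b = cosine(vec, vectors[seed_b])
        groups[seed_a if sim_a >= sim_b else seed_b].append(i)
    return sorted(groups.values())

print(two_groups(books))
```

The function is never told which snippets are about programming and which are about gardening; the split follows from shared vocabulary alone, which is the point of the coverless-books analogy.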

Other examples of unsupervised machine learning include detection of malignant and non-malignant tumors, and gene sequencing.
