Unsupervised learning

In contrast to supervised learning, unsupervised learning usually has more leeway in how the outcome is determined. The data is treated such that, to the algorithm, no single feature is more important than any other in the dataset. These algorithms learn from datasets of input data without the expected output data being labeled. k-means clustering (cluster analysis) is an example of an unsupervised model. It is very good at finding patterns in the data that are meaningful relative to the input data. The big difference between what we learned in the supervised section and here is that we now have x features X1, X2, X3, ..., Xx measured on n observations, but we are no longer interested in predicting Y, because we no longer have Y. Our only interest is to discover patterns over the features that we have:

In the previous diagram, you can see that data such as this lends itself much more to a non-linear approach, where the data appears to fall into clusters. It is non-linear because there is no way a straight line could accurately separate and categorize the data. Unsupervised learning allows us to approach a problem with little to no idea of what the results will, or should, look like. Structure is derived from the data itself, rather than from supervised rules applied to output labels. This structure is usually derived by clustering relationships in the data.

For example, let's say we have 10^8 genes from our genomic data science experiment. We would like to group this data into similar segments, such as hair color, lifespan, weight, and so on.

The second example is what is famously known as the cocktail party effect, which refers to the brain's auditory ability to focus attention on one thing and filter out the noise around it.

Both examples can use clustering to accomplish their goals.
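To make the clustering idea concrete, here is a minimal sketch of k-means in Python with NumPy. It is not tied to either example above; it simply shows the core loop on a small, hypothetical set of unlabeled 2-D observations: assign each point to its nearest centroid, then move each centroid to the mean of its assigned points, repeating until the centroids stop moving.

```python
import numpy as np

def kmeans(X, k, n_iter=100, seed=0):
    """Minimal k-means: alternate nearest-centroid assignment and centroid update."""
    rng = np.random.default_rng(seed)
    # Initialize centroids as k randomly chosen observations.
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Distance from every observation to every centroid.
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        # Assign each observation to its nearest centroid.
        labels = dists.argmin(axis=1)
        # Move each centroid to the mean of the points assigned to it.
        new_centroids = np.array([X[labels == j].mean(axis=0) for j in range(k)])
        if np.allclose(new_centroids, centroids):
            break  # converged: centroids stopped moving
        centroids = new_centroids
    return labels, centroids

# Hypothetical unlabeled data: two visibly separate groups, no Y anywhere.
X = np.array([[1.0, 1.0], [1.2, 0.8], [0.9, 1.1],
              [8.0, 8.0], [8.3, 7.9], [7.8, 8.2]])
labels, centroids = kmeans(X, k=2)
```

Notice that the algorithm receives only the features X; the grouping emerges purely from the geometry of the data, which is exactly the sense in which structure is derived from the data itself.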
