Unsupervised learning

In this section, we provide a brief introduction to unsupervised machine learning techniques with appropriate examples. Let's start with a practical example. Suppose you have a large collection of not-pirated-totally-legal MP3s in a crowded, massive folder on your hard drive. What if you could build a model that automatically groups similar songs together and organizes them into your favorite categories, such as country, rap, rock, and so on? Assigning each item to a group, so that every MP3 ends up in the right playlist without anyone labeling it first, is an unsupervised task. In the previous chapters, we assumed you were given a training dataset of correctly labeled data. Unfortunately, we don't always have that luxury when we collect data in the real world. How could we possibly group songs together if we don't have direct access to their metadata? One possible approach could be a mixture of various machine learning techniques, but clustering is often at the heart of the solution.

In short, in an unsupervised machine learning problem, the correct classes of the training dataset are unavailable or unknown. Thus, classes have to be deduced from the structured or unstructured datasets, as shown in Figure 1. This essentially implies that the goal of this type of algorithm is to preprocess the data in some structured way. In other words, the main objective of unsupervised learning algorithms is to explore the unknown or hidden patterns in unlabeled input data. Unsupervised learning also encompasses other techniques that explain the key features of the data in an exploratory way, toward finding hidden patterns. Because no labels are available to guide this search, clustering techniques are widely used to group unlabeled data points based on certain similarity measures.
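
To make the idea of grouping unlabeled points by a similarity measure concrete, here is a minimal sketch of k-means clustering in plain Python. It is an illustration only, not the Spark-based approach the chapter builds toward: the function name kmeans, the two song features (scaled tempo and energy), and the farthest-point initialization are all assumptions made for this example, and Euclidean distance is just one possible similarity measure.

```python
import math

def kmeans(points, k, iterations=20):
    """Plain k-means (Lloyd's algorithm) on a list of equal-length tuples."""
    # Deterministic farthest-point initialization (an illustrative choice):
    # start from the first point, then repeatedly add the point that is
    # farthest from the centroids chosen so far.
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        )

    assignments = []
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        new_assignments = [
            min(range(k), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        if new_assignments == assignments:
            break  # no point changed cluster, so the algorithm has converged
        assignments = new_assignments
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, a in zip(points, assignments) if a == j]
            if members:  # keep the old centroid if a cluster is empty
                centroids[j] = tuple(
                    sum(dim) / len(members) for dim in zip(*members)
                )
    return assignments, centroids

# Made-up features for six songs: (tempo in BPM / 100, energy in 0..1).
songs = [(0.70, 0.20), (0.75, 0.25), (0.72, 0.22),
         (1.60, 0.90), (1.55, 0.85), (1.65, 0.95)]
labels, centers = kmeans(songs, k=2)
print(labels)  # -> [0, 0, 0, 1, 1, 1]
```

Note that no genre labels appear anywhere in the input: the two groups emerge purely from how close the feature vectors are to each other. On real data the choice of features, the number of clusters k, and the initialization all matter far more than in this toy case.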

For in-depth theoretical knowledge of how unsupervised algorithms work, please refer to the following three books: Bousquet, O.; von Luxburg, U.; Raetsch, G., eds. (2004). Advanced Lectures on Machine Learning. Springer-Verlag. ISBN 978-3540231226; Duda, Richard O.; Hart, Peter E.; Stork, David G. (2001). Unsupervised Learning and Clustering. In Pattern Classification (2nd ed.). Wiley. ISBN 0-471-05669-3; and Jordan, Michael I.; Bishop, Christopher M. (2004). Neural Networks. In Allen B. Tucker (ed.), Computer Science Handbook, Second Edition (Section VII: Intelligent Systems). Boca Raton, FL: Chapman and Hall/CRC Press LLC. ISBN 1-58488-360-X.

Figure 1: Unsupervised learning with Spark