Summary

In this chapter, we considered what clustering is and how it differs from classification. We looked at several types of clustering methods: partition-based, spectral, hierarchical, density-based, and model-based. We also observed that partition-based methods can be divided further into categories such as distance-based methods and methods based on graph theory. We used implementations of these algorithms, including the k-means algorithm (a distance-based method), the GMM algorithm (a model-based method), the Newman modularity-based algorithm, and the Chinese Whispers algorithm for graph clustering. We also saw how to use implementations of the hierarchical and spectral clustering algorithms in programs. We saw that the crucial issues for successful clustering are as follows:

  • The choice of the distance measure function
  • The initialization step
  • The splitting or merging strategy
  • Prior knowledge about cluster numbers

The combination of these issues is unique for each specific algorithm. We also saw that a clustering algorithm's results depend heavily on the characteristics of the dataset, and that we should choose the algorithm accordingly.
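To make the issues listed above concrete, the following is a minimal sketch of a Lloyd-style k-means loop in plain C++. It is not the implementation from any of the libraries used in this chapter; the names Point, Distance, euclidean, manhattan, and kmeans are hypothetical and introduced only for illustration. The sketch assumes that the caller supplies the initial centroids (so the initialization step and the number of clusters are explicit inputs) and that the distance measure is passed in as a function. As a simplification, the update step always uses the arithmetic mean, which is the exact minimizer only for the squared Euclidean distance.

```cpp
#include <cmath>
#include <cstddef>
#include <functional>
#include <limits>
#include <vector>

// Illustrative types, not taken from any library used in the chapter.
using Point = std::vector<double>;
using Distance = std::function<double(const Point&, const Point&)>;

// Euclidean (L2) distance between two points of equal dimension.
double euclidean(const Point& a, const Point& b) {
  double sum = 0.0;
  for (std::size_t i = 0; i < a.size(); ++i) {
    sum += (a[i] - b[i]) * (a[i] - b[i]);
  }
  return std::sqrt(sum);
}

// Manhattan (L1) distance between two points of equal dimension.
double manhattan(const Point& a, const Point& b) {
  double sum = 0.0;
  for (std::size_t i = 0; i < a.size(); ++i) {
    sum += std::abs(a[i] - b[i]);
  }
  return sum;
}

// Lloyd-style k-means sketch. The number of clusters equals
// centroids.size(), and the initial centroids are chosen by the caller.
// Returns one cluster label per input point.
std::vector<std::size_t> kmeans(const std::vector<Point>& data,
                                std::vector<Point> centroids,
                                const Distance& dist,
                                std::size_t max_iters = 100) {
  std::vector<std::size_t> labels(data.size(), 0);
  if (data.empty() || centroids.empty()) return labels;

  const std::size_t dim = data[0].size();
  for (std::size_t iter = 0; iter < max_iters; ++iter) {
    bool changed = false;

    // Assignment step: attach each point to its nearest centroid.
    for (std::size_t i = 0; i < data.size(); ++i) {
      std::size_t best = 0;
      double best_d = std::numeric_limits<double>::max();
      for (std::size_t c = 0; c < centroids.size(); ++c) {
        const double d = dist(data[i], centroids[c]);
        if (d < best_d) {
          best_d = d;
          best = c;
        }
      }
      if (labels[i] != best) {
        labels[i] = best;
        changed = true;
      }
    }

    // Update step: move each centroid to the mean of its members.
    // Assumption: the mean is used regardless of the distance function.
    std::vector<Point> sums(centroids.size(), Point(dim, 0.0));
    std::vector<std::size_t> counts(centroids.size(), 0);
    for (std::size_t i = 0; i < data.size(); ++i) {
      for (std::size_t j = 0; j < dim; ++j) sums[labels[i]][j] += data[i][j];
      ++counts[labels[i]];
    }
    for (std::size_t c = 0; c < centroids.size(); ++c) {
      if (counts[c] == 0) continue;  // keep an empty cluster's centroid as-is
      for (std::size_t j = 0; j < dim; ++j) {
        centroids[c][j] = sums[c][j] / static_cast<double>(counts[c]);
      }
    }

    // Stop once an iteration after the first leaves all labels unchanged.
    if (!changed && iter > 0) break;
  }
  return labels;
}
```

Calling kmeans(data, initial_centroids, euclidean) and then kmeans(data, initial_centroids, manhattan) on the same data, or varying initial_centroids, is a simple way to observe how the distance measure and the initialization step can change the resulting partition.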

Clustering is applied in a wide range of areas: image segmentation, marketing, anti-fraud, forecasting, and text analysis, among many others. Today, clustering is often used as the first step in data analysis. The task of clustering has been formulated in scientific areas such as statistics, pattern recognition, optimization, and machine learning. The number of methods for partitioning groups of objects into clusters is now quite large: there are several dozen algorithms, and even more when you take their various modifications into account.

At the end of the chapter, we studied how we can visualize clustering results with the plotcpp library.

In the following chapter, we will learn what a data anomaly is and what machine learning algorithms exist for anomaly detection. Also, we will see how anomaly detection algorithms can be used for solving real-life problems, and which properties of such algorithms play a more significant role in different tasks.
