Clustering

Clustering is an unsupervised machine learning method used to split an original dataset of objects into groups based on their properties. In machine learning, an object is usually treated as a point in a multidimensional metric space. Every dimension of this space corresponds to an object property (feature), and the metric is a function of the values of these properties. The dimensions can be numerical or categorical, and their types determine which type of clustering algorithm and which specific metric function we choose.
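To make the idea of a metric concrete, here is a minimal C++ sketch. It assumes objects are stored as plain std::vector feature vectors rather than any library type. It shows one common metric for purely numerical properties (Euclidean distance) and one for purely categorical properties (Hamming distance, the number of positions where the categories differ). The function names are hypothetical helpers, not part of the libraries used later in this chapter.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <string>
#include <vector>

// Euclidean distance between two objects described only by numerical features.
double euclidean_distance(const std::vector<double>& a,
                          const std::vector<double>& b) {
  assert(a.size() == b.size());  // both objects must live in the same space
  double sum = 0.0;
  for (std::size_t i = 0; i < a.size(); ++i) {
    const double d = a[i] - b[i];
    sum += d * d;
  }
  return std::sqrt(sum);
}

// Hamming distance between two objects described only by categorical features:
// the number of positions where the category labels differ.
std::size_t hamming_distance(const std::vector<std::string>& a,
                             const std::vector<std::string>& b) {
  assert(a.size() == b.size());
  std::size_t mismatches = 0;
  for (std::size_t i = 0; i < a.size(); ++i) {
    if (a[i] != b[i]) ++mismatches;
  }
  return mismatches;
}
```

For example, euclidean_distance({0.0, 0.0}, {3.0, 4.0}) returns 5.0, and comparing {"red", "small"} with {"red", "large"} gives a Hamming distance of 1. Neither function can handle a mix of numerical and categorical properties on its own, which is why the property types drive the choice of metric.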

The main difference between clustering and classification is that the set of target groups is not defined in advance; instead, it is determined by the clustering algorithm. The set of target groups (clusters) is the algorithm's result.
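The following sketch illustrates how the groups can emerge from the data instead of being supplied up front. It uses a simple single-linkage rule with a distance cutoff, chosen here only for illustration: any two objects closer than a hypothetical threshold parameter are linked, and each connected group becomes a cluster. The caller never passes the number of clusters; it falls out of the data and the threshold. This is an assumed toy approach, not one of the algorithms from the libraries discussed in this chapter.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <numeric>
#include <vector>

using Point = std::vector<double>;

double euclidean(const Point& a, const Point& b) {
  assert(a.size() == b.size());
  double sum = 0.0;
  for (std::size_t i = 0; i < a.size(); ++i) sum += (a[i] - b[i]) * (a[i] - b[i]);
  return std::sqrt(sum);
}

// Union-find lookup with path halving.
std::size_t find_root(std::vector<std::size_t>& parent, std::size_t i) {
  while (parent[i] != i) {
    parent[i] = parent[parent[i]];
    i = parent[i];
  }
  return i;
}

// Returns a cluster label per point. The number of distinct labels is decided
// by the data and the threshold, not passed in by the caller.
std::vector<std::size_t> threshold_clusters(const std::vector<Point>& points,
                                            double threshold) {
  const std::size_t n = points.size();
  std::vector<std::size_t> parent(n);
  std::iota(parent.begin(), parent.end(), std::size_t{0});
  // Link every pair of points that are close enough.
  for (std::size_t i = 0; i < n; ++i)
    for (std::size_t j = i + 1; j < n; ++j)
      if (euclidean(points[i], points[j]) <= threshold) {
        const std::size_t ri = find_root(parent, i);
        const std::size_t rj = find_root(parent, j);
        if (ri != rj) parent[ri] = rj;
      }
  // Relabel the roots as consecutive cluster ids 0, 1, 2, ...
  std::vector<std::size_t> label_of_root(n, n);  // n means "not assigned yet"
  std::vector<std::size_t> labels(n);
  std::size_t next = 0;
  for (std::size_t i = 0; i < n; ++i) {
    const std::size_t r = find_root(parent, i);
    if (label_of_root[r] == n) label_of_root[r] = next++;
    labels[i] = label_of_root[r];
  }
  return labels;
}
```

With the points (0, 0), (0, 1), (10, 10), (10, 11), and (50, 50) and a threshold of 1.5, this yields three clusters labeled 0, 0, 1, 1, 2. Changing the threshold changes how many clusters appear.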

We can split cluster analysis into the following phases:

  • Selecting objects for clustering
  • Determining the set of object properties that we will use for the metric
  • Normalizing property values
  • Calculating the metric
  • Identifying distinct groups of objects based on metric values
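The normalization phase matters because a metric such as Euclidean distance adds up raw differences, so a property measured in thousands would otherwise dominate one measured in single units. Below is a minimal sketch of min-max normalization, one common choice assumed here for illustration, which rescales each property (column) into the range [0, 1]. The Dataset alias and the function name are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Rows are objects, columns are properties.
using Dataset = std::vector<std::vector<double>>;

// Min-max normalization: rescales every property (column) into [0, 1] so that
// properties with large numeric ranges do not dominate the distance metric.
Dataset min_max_normalize(const Dataset& data) {
  if (data.empty()) return data;
  const std::size_t cols = data.front().size();
  Dataset out = data;
  for (std::size_t c = 0; c < cols; ++c) {
    double lo = data[0][c];
    double hi = data[0][c];
    for (const auto& row : data) {
      assert(row.size() == cols);  // every object must have the same properties
      lo = std::min(lo, row[c]);
      hi = std::max(hi, row[c]);
    }
    const double range = hi - lo;
    for (auto& row : out) {
      // A constant property cannot separate objects; map it to 0.
      row[c] = range > 0.0 ? (row[c] - lo) / range : 0.0;
    }
  }
  return out;
}
```

For the objects {1, 1000}, {2, 3000}, and {3, 2000}, the first column becomes 0, 0.5, 1 and the second becomes 0, 1, 0.5, so both properties now contribute on the same scale when the metric is calculated.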

After analyzing the clustering results, we may need to adjust the metric used by the chosen algorithm.
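One simple way to adjust a metric after inspecting the results is to weight the properties differently. The sketch below assumes a weighted Euclidean distance with one hypothetical non-negative weight per property. Increasing a weight makes that property more influential when objects are grouped, and a weight of 0 removes the property from the comparison entirely.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Weighted Euclidean distance: each property i contributes w[i] * (a[i] - b[i])^2.
// Raising or lowering a weight changes how strongly that property influences
// which objects end up in the same cluster.
double weighted_euclidean(const std::vector<double>& a,
                          const std::vector<double>& b,
                          const std::vector<double>& w) {
  assert(a.size() == b.size() && a.size() == w.size());
  double sum = 0.0;
  for (std::size_t i = 0; i < a.size(); ++i) {
    assert(w[i] >= 0.0);  // a negative weight could make the sum negative
    const double d = a[i] - b[i];
    sum += w[i] * d * d;
  }
  return std::sqrt(sum);
}
```

With equal weights {1, 1}, the distance between (0, 0) and (3, 4) is 5; with weights {0, 1}, only the second property counts and the distance drops to 4.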

We can use clustering for various real-world tasks, including the following:

  • Splitting news into several categories for advertisers
  • Identifying customer groups by their preferences for market analysis
  • Identifying plant and animal groups for biological studies
  • Identifying and categorizing real estate properties for city planning and management
  • Detecting earthquake epicenter clusters to identify danger zones
  • Categorizing groups of insurance policyholders for risk management
  • Categorizing books in libraries
  • Searching for hidden structural similarities in the data

The following topics will be covered in this chapter:

  • Measuring distance in clustering
  • Types of clustering algorithms
  • Examples of using the Shogun library to solve sample clustering tasks
  • Examples of using the Shark-ML library to solve sample clustering tasks
  • Examples of using the Dlib library to solve sample clustering tasks
  • Plotting data with C++