In this chapter, you will learn how to implement the top clustering algorithms with R. Tools for evaluating, benchmarking, and measuring the results are also provided.
In this chapter, we will cover the following topics:
By defining the density of the data point space and measures of that density, clusters can be modeled as regions of the data space with a certain density.
Density Based Spatial Clustering of Applications with Noise (DBSCAN) is one of the most popular density-based clustering algorithms. The major characteristics of DBSCAN are:
DBSCAN classifies the data points in the dataset as core data points, border data points, and noise data points. The classification relies on density relations between points: directly density-reachable, density-reachable, and density-connected. Before we provide a detailed description of DBSCAN, let's illustrate these ideas.
A point is defined as a core point if the number of data points within a distance given by the predefined parameter Eps, or $\epsilon$, is at least the predefined parameter MinPts. The space within Eps of a point $p$ is called its Eps-neighborhood, or $N_\epsilon(p)$. An object, o, is noise only if there is no cluster that contains o. A border data object is any object that belongs to a cluster but is not a core data object.
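As a minimal sketch of these definitions, the following base R snippet labels each point of a small made-up dataset as core, border, or noise. The data values, `eps`, and `min_pts` are assumptions chosen for illustration. The sketch also assumes the convention of the original DBSCAN formulation: a point counts itself in its own Eps-neighborhood, and a core point needs at least `min_pts` neighbors.

```r
# Toy data (made-up values): one dense group, one point on its fringe,
# and one isolated point.
pts <- rbind(c(1.0, 1.0), c(1.2, 1.1), c(0.9, 1.3), c(1.1, 0.8),
             c(1.9, 1.0),
             c(8.0, 8.0))
eps <- 1.0      # assumed neighborhood radius (Eps)
min_pts <- 5    # assumed density threshold (MinPts)

# Pairwise Euclidean distances as a full matrix.
d <- as.matrix(dist(pts))

# Eps-neighborhood of each point (the point itself is included).
neighbors <- lapply(seq_len(nrow(pts)), function(i) which(d[i, ] <= eps))

# Core points: the neighborhood holds at least min_pts points.
is_core <- sapply(neighbors, length) >= min_pts

# Border points: not core, but inside the neighborhood of some core point.
is_border <- !is_core & sapply(neighbors, function(nb) any(is_core[nb]))

# Everything else is noise.
label <- ifelse(is_core, "core", ifelse(is_border, "border", "noise"))
print(label)
```

With these values, points 1, 2, and 4 come out as core points, points 3 and 5 as border points, and point 6 as noise.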
Given a core object, p, and an object, q, the object q is directly density-reachable from p if q lies in the Eps-neighborhood of p, that is, $q \in N_\epsilon(p)$.
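Given a core object, p, and an object, q, the object q is density-reachable from p if there is a chain of objects $p_1, \dots, p_n$ with $p_1 = p$ and $p_n = q$ such that each $p_{i+1}$ is directly density-reachable from $p_i$.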
Two data objects, $p$ and $q$, are density-connected if there is an object $o$ such that both $p$ and $q$ are density-reachable from $o$.
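The three relations can be sketched in base R as follows. The helper names `density_reachable` and `density_connected` are hypothetical, as are the toy data and parameter values. The sketch assumes a core point has at least `min_pts` points in its Eps-neighborhood (itself included), and it treats a core point as reachable from itself.

```r
# Hypothetical helper: indices of all points density-reachable from point p,
# given a full distance matrix d.
density_reachable <- function(d, p, eps, min_pts) {
  n <- nrow(d)
  is_core <- rowSums(d <= eps) >= min_pts
  if (!is_core[p]) return(integer(0))   # a chain can only start at a core point
  reached <- rep(FALSE, n)
  reached[p] <- TRUE
  queue <- p
  while (length(queue) > 0) {
    cur <- queue[1]
    queue <- queue[-1]
    if (!is_core[cur]) next             # border points do not extend the chain
    # Unvisited points directly density-reachable from the current core point.
    nb <- which(d[cur, ] <= eps & !reached)
    reached[nb] <- TRUE
    queue <- c(queue, nb)
  }
  which(reached)
}

# Hypothetical helper: p and q are density-connected if some object o
# reaches both of them.
density_connected <- function(d, p, q, eps, min_pts) {
  for (o in seq_len(nrow(d))) {
    r <- density_reachable(d, o, eps, min_pts)
    if (p %in% r && q %in% r) return(TRUE)
  }
  FALSE
}

# Toy data (made-up values).
pts <- rbind(c(1.0, 1.0), c(1.2, 1.1), c(0.9, 1.3), c(1.1, 0.8),
             c(1.9, 1.0), c(8.0, 8.0))
d <- as.matrix(dist(pts))

print(density_reachable(d, 1, eps = 1.0, min_pts = 5))
print(density_connected(d, 3, 5, eps = 1.0, min_pts = 5))
print(density_connected(d, 3, 6, eps = 1.0, min_pts = 5))
```

In this toy dataset, points 3 and 5 are not core points, so nothing is density-reachable from either of them. They are still density-connected, because both are density-reachable from the core point 1. This asymmetry is why the symmetric density-connected relation is needed to define clusters.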
The density-based cluster denotes a set of density-connected data objects that are maximal according to density reachability.
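A minimal sketch of how such maximal sets could be collected is shown below. The function `simple_dbscan` is a hypothetical helper written for illustration, not a library function, and the data and parameter values are made up. It assumes that a core point has at least `min_pts` points within `eps` (itself included) and that a border point reachable from two clusters is assigned to whichever cluster finds it first. For real work, a tested implementation such as the one in the dbscan package on CRAN is the better choice.

```r
# Hypothetical helper: returns one cluster id per point, with 0 meaning noise.
simple_dbscan <- function(pts, eps, min_pts) {
  d <- as.matrix(dist(pts))
  n <- nrow(d)
  is_core <- rowSums(d <= eps) >= min_pts
  cluster <- integer(n)          # 0 = unassigned; points left at 0 are noise
  next_id <- 0L
  for (p in seq_len(n)) {
    # Start a new cluster only from a core point that is not yet assigned.
    if (!is_core[p] || cluster[p] != 0L) next
    next_id <- next_id + 1L
    cluster[p] <- next_id
    queue <- p
    while (length(queue) > 0) {
      cur <- queue[1]
      queue <- queue[-1]
      if (!is_core[cur]) next    # border points join a cluster but do not expand it
      nb <- which(d[cur, ] <= eps & cluster == 0L)
      cluster[nb] <- next_id
      queue <- c(queue, nb)
    }
  }
  cluster
}

# Toy data (made-up values): two copies of a small group plus one outlier.
g <- rbind(c(1.0, 1.0), c(1.2, 1.1), c(0.9, 1.3), c(1.1, 0.8), c(1.9, 1.0))
pts <- rbind(g, g + 7, c(20, 20))

res <- simple_dbscan(pts, eps = 1.0, min_pts = 5)
print(res)
```

Each cluster grows from a core point by repeatedly adding everything directly density-reachable from the core points already in it, so the resulting set cannot be extended further. Points that are never reached keep the id 0 and are reported as noise.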
Here is an example illustrated in the following diagram:
The summarized pseudocode for the DBSCAN algorithm is as follows, with the following input/output parameters defined.
The input parameters for the DBSCAN algorithm are as follows: