In density-based methods, clusters are considered as regions where the multiple objects' density is high, which are separated by regions with a low density of objects.
The DBSCAN (Density-Based Spatial Clustering of Applications with Noise) algorithm is one of the first density clustering algorithms. The basis of this algorithm is several statements, detailed as follows:
- The of an object is a neighborhood of the radius of an object.
- The root object is an object whose contains a minimum non-zero number of objects. Assume that this minimum number equals to a predefined value named MinPts.
- The p object is directly densely accessible from the q object if p is in the of q and q is the root object.
- The p object is densely accessible from the q object for the given and MinPts if there is a sequence of objects, where and , such that is directly densely accessible from , .
- The p object is densely connected to the q object for the given and MinPts if there is an o object such that p and q are densely accessible from o.
The DBSCAN algorithm checks the neighborhood of each object to search for clusters. If the of the p object contains more points than MinPts, then a new cluster is created with the p object as a root object. DBSCAN then iteratively collects objects directly densely accessible from root objects, which can lead to the union of several densely accessible clusters. The process ends when no new objects can be added to any cluster.
Unlike the partition-based methods, DBSCAN does not require the number of clusters to be specified in advance; it only requires the values of the and MinPts parameters, which directly affect the result of clustering. The optimal values of these parameters are difficult to determine, especially for multidimensional data spaces. Also, the distribution of data in such spaces is often asymmetrical, which makes it impossible to use global density parameters for their clustering. For clustering multidimensional data spaces, there is the SUBCLU (Subspace Clustering) algorithm, which is based on the DBSCAN algorithm.