Clustering in a high-dimensional data space raises two major issues: efficiency and quality. New algorithms are needed to deal with this type of dataset, and two strategies are popular. One is subspace clustering, which finds clusters in subspaces of the original data space. The other is dimensionality reduction, which creates a lower-dimensional data space in which the clustering is then performed.
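To make the quality issue concrete, here is a minimal Python sketch on synthetic data. Everything in it is an assumption made for illustration: two groups that differ only in their first two attributes, 20 additional attributes of uniform noise, and a hypothetical `contrast` helper that divides the mean within-group distance by the mean between-group distance. A ratio close to 1 means the groups are hard to tell apart by distance alone.

```python
import math
import random

rng = random.Random(0)   # fixed seed so the run is repeatable
N_NOISE_DIMS = 20        # assumed number of irrelevant attributes


def make_point(center):
    # Two informative attributes near `center`, followed by uniform noise attributes.
    informative = [center + rng.uniform(-0.05, 0.05) for _ in range(2)]
    noise = [rng.uniform(0.0, 1.0) for _ in range(N_NOISE_DIMS)]
    return informative + noise


group_a = [make_point(0.0) for _ in range(30)]
group_b = [make_point(1.0) for _ in range(30)]


def distance(p, q, dims):
    # Euclidean distance restricted to the attributes listed in `dims`.
    return math.sqrt(sum((p[d] - q[d]) ** 2 for d in dims))


def contrast(dims):
    """Mean within-group distance divided by mean between-group distance."""
    within = [distance(p, q, dims)
              for group in (group_a, group_b)
              for i, p in enumerate(group)
              for q in group[i + 1:]]
    between = [distance(p, q, dims) for p in group_a for q in group_b]
    return (sum(within) / len(within)) / (sum(between) / len(between))


full_space = contrast(range(2 + N_NOISE_DIMS))  # all 22 attributes
subspace = contrast([0, 1])                     # only the informative attributes
print(f"full space: {full_space:.2f}, subspace: {subspace:.2f}")
```

In this setup the ratio is much smaller in the two-attribute subspace than in the full space. That gap is the motivation for both strategies: either search for the subspace directly, or build a lower-dimensional space before clustering.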
MAFIA is an efficient and scalable subspace-clustering algorithm for high-dimensional and large datasets.
The summarized pseudocode for the MAFIA algorithm is as follows:
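As a rough illustration of MAFIA's adaptive-grid idea only (not the full algorithm), the following Python sketch works on a single dimension. It builds a fine histogram, merges adjacent bins whose counts are similar into variable-width windows, and then marks a window as dense when it holds clearly more points than a uniform distribution would put there. The function names and the parameter values `n_fine`, `beta`, and `alpha` are assumptions chosen for this example. The bottom-up combination of dense windows into higher-dimensional candidate units is omitted.

```python
def adaptive_windows(values, lo, hi, n_fine=20, beta=0.35):
    """Merge similar adjacent histogram bins into variable-width windows."""
    width = (hi - lo) / n_fine
    counts = [0] * n_fine
    for v in values:
        idx = min(int((v - lo) / width), n_fine - 1)  # clamp v == hi into last bin
        counts[idx] += 1

    windows = []  # each entry: [first_bin, end_bin_exclusive, max_bin_count]
    for i, c in enumerate(counts):
        if windows:
            prev = windows[-1]
            ref = max(prev[2], c)
            # Merge when the counts differ by at most beta (relative to the larger one).
            if ref == 0 or abs(prev[2] - c) <= beta * ref:
                prev[1] = i + 1
                prev[2] = ref
                continue
        windows.append([i, i + 1, c])

    # Report each window as (start, end, number of points inside it).
    return [(lo + s * width, lo + e * width, sum(counts[s:e]))
            for s, e, _ in windows]


def dense_windows(windows, n_points, lo, hi, alpha=1.5):
    """Keep windows holding more than alpha times the uniform expectation."""
    dense = []
    for start, end, count in windows:
        expected = n_points * (end - start) / (hi - lo)
        if count > alpha * expected:
            dense.append((start, end))
    return dense


# Synthetic 1-D data: a tight cluster in [0.4, 0.5) on top of a sparse background.
cluster = [0.401 + 0.002 * i for i in range(49)]
background = [(i + 0.5) / 20 for i in range(20)]
values = cluster + background

windows = adaptive_windows(values, 0.0, 1.0)
dense = dense_windows(windows, len(values), 0.0, 1.0)
print(windows)
print(dense)
```

Because the window boundaries follow the data, the cluster ends up in one narrow window while the sparse regions on either side collapse into two wide ones. A fixed uniform grid would instead spend the same number of cells on empty regions as on the cluster.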
The summarized pseudocode for the parallel MAFIA algorithm is as follows:
SURFING selects interesting features from the original attributes of the dataset. The summarized pseudocode for the SURFING algorithm is as follows:
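One way to make "interesting" concrete is to score a subspace by how strongly the k-nearest-neighbor distances of its points vary: clustered data mixes very small and fairly large distances, while uniform data does not. The Python sketch below is a simplified score in that spirit, not a reproduction of the algorithm. The function names, the choice `k=3`, the synthetic data, and the exact normalization are assumptions made for this example.

```python
import math
import random


def knn_distances(points, dims, k):
    """Distance from each point to its k-th nearest neighbor, using only `dims`."""
    result = []
    for i, p in enumerate(points):
        dists = sorted(
            math.sqrt(sum((p[d] - q[d]) ** 2 for d in dims))
            for j, q in enumerate(points) if j != i
        )
        result.append(dists[k - 1])
    return result


def subspace_quality(points, dims, k=3):
    """Higher values mean the k-NN distances deviate more from their mean."""
    nn = knn_distances(points, dims, k)
    mean = sum(nn) / len(nn)
    below = [d for d in nn if d < mean]
    if not below or mean == 0:
        return 0.0
    diff = 0.5 * sum(abs(mean - d) for d in nn)
    return diff / (len(below) * mean)


rng = random.Random(1)  # fixed seed so the run is repeatable
points = []
for _ in range(40):
    # Attribute 0 is tightly clustered around 0.5; attribute 1 is uniform noise.
    points.append([0.5 + rng.uniform(-0.01, 0.01), rng.uniform(0.0, 1.0)])
for _ in range(20):
    # Background points are uniform in both attributes.
    points.append([rng.uniform(0.0, 1.0), rng.uniform(0.0, 1.0)])

quality_clustered = subspace_quality(points, [0])
quality_uniform = subspace_quality(points, [1])
print(f"attribute 0: {quality_clustered:.2f}, attribute 1: {quality_uniform:.2f}")
```

On this data the attribute that contains the cluster scores higher than the purely uniform one, so ranking attributes (or attribute combinations) by such a score puts the clustered subspace first.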