Customer purchase data analysis and clustering high-dimensional data

Clustering in a high-dimensional data space raises two major issues: efficiency and quality. Algorithms designed for this kind of dataset generally follow one of two strategies. The first is subspace clustering, which searches for clusters in subspaces of the original data space. The second is dimensionality reduction, which builds a lower-dimensional data space in which the clustering is then performed.
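To make the dimensionality-reduction strategy concrete, the following is a minimal sketch in base R that projects the data onto a few principal components and then runs k-means in the reduced space. The synthetic data, the choice of two components, and the choice of two clusters are illustrative assumptions and do not come from the text; PCA and k-means stand in for whichever reduction and clustering methods you actually use.

```r
# Minimal sketch: reduce dimensionality with PCA, then cluster with k-means.
# The synthetic data, the number of components (2), and k = 2 are
# illustrative assumptions.
set.seed(42)

n <- 100   # points per group
p <- 20    # number of dimensions

# Two groups that differ only in the first two dimensions; the remaining
# 18 dimensions are pure noise.
group_a <- matrix(rnorm(n * p), nrow = n)
group_b <- matrix(rnorm(n * p), nrow = n)
group_b[, 1:2] <- group_b[, 1:2] + 6
x <- rbind(group_a, group_b)

# Project onto the first two principal components.
pca <- prcomp(x, center = TRUE, scale. = FALSE)
reduced <- pca$x[, 1:2]

# Cluster in the reduced space.
fit <- kmeans(reduced, centers = 2, nstart = 10)

# Cross-tabulate the clusters found against the known groups.
truth <- rep(c("a", "b"), each = n)
tab <- table(cluster = fit$cluster, truth = truth)
print(tab)
```

Because the two groups are separated only along a couple of directions, clustering on the leading components works with 2 columns instead of 20, which is where the efficiency gain comes from.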

MAFIA is an efficient and scalable subspace-clustering algorithm for high-dimensional and large datasets.

The MAFIA algorithm

The summarized pseudocode for the MAFIA algorithm is as follows:

[Figure: pseudocode for the MAFIA algorithm]
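MAFIA is generally described as building adaptive intervals on each attribute: a fine histogram is computed, and neighboring bins with similar density are merged before dense regions are searched for. The following is a simplified sketch of that single step on one attribute, not the full algorithm. The function name adaptive_bins and the parameters n_fine, beta, and alpha are hypothetical and chosen only for illustration.

```r
# Simplified sketch of adaptive-interval construction on one attribute.
# adaptive_bins(), n_fine, and beta are hypothetical names and parameters.
adaptive_bins <- function(v, n_fine = 20, beta = 0.5) {
  # Start from a fine, equal-width histogram.
  breaks <- seq(min(v), max(v), length.out = n_fine + 1)
  counts <- as.vector(table(cut(v, breaks, include.lowest = TRUE)))

  # Walk left to right and keep a boundary whenever the count of a fine bin
  # differs from its left neighbor by more than beta (relative difference).
  keep <- 1  # indices into breaks that remain as interval boundaries
  for (i in 2:n_fine) {
    prev <- counts[i - 1]
    cur <- counts[i]
    rel_diff <- abs(cur - prev) / max(cur, prev, 1)
    if (rel_diff > beta) keep <- c(keep, i)
  }
  keep <- c(keep, n_fine + 1)

  # Recount the points in the merged intervals.
  merged_breaks <- breaks[keep]
  merged_counts <- as.vector(
    table(cut(v, merged_breaks, include.lowest = TRUE))
  )
  data.frame(
    lower = head(merged_breaks, -1),
    upper = tail(merged_breaks, -1),
    count = merged_counts
  )
}

# Example: a sparse uniform background plus one tight cluster around 5.
set.seed(1)
v <- c(runif(200, 0, 10), rnorm(300, mean = 5, sd = 0.2))
bins <- adaptive_bins(v)

# Flag an interval as dense when it holds more than alpha times the count
# expected under a uniform distribution (alpha is an assumed parameter).
alpha <- 1.5
expected <- length(v) * (bins$upper - bins$lower) / (max(v) - min(v))
bins$dense <- bins$count > alpha * expected
print(bins)
```

Merging similar neighbors leaves wide intervals over the sparse background and narrow ones around the cluster, so far fewer candidate units have to be examined than with a uniform grid.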

The summarized pseudocode for the parallel MAFIA algorithm is as follows:

[Figure: pseudocode for the parallel MAFIA algorithm]

The SURFING algorithm

SURFING selects interesting features from the original attributes of the dataset. The summarized pseudocode for the SURFING algorithm is as follows:

[Figure: pseudocode for the SURFING algorithm]
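As an illustration of how features can be rated for interestingness, the sketch below scores each attribute by how strongly its k-nearest-neighbor distances vary: if all points have similar k-NN distances, the attribute looks uniform and uninteresting, while a mix of very small and very large distances suggests cluster structure. This is a simplified score modeled on the k-NN-distance idea behind SURFING and applied to single attributes only. It is not the contents of ch_06_surfing.R. The function names knn_dist and subspace_quality, the default k = 5, and the synthetic data are assumptions made for this example.

```r
# k-th nearest-neighbor distance of every row of x (Euclidean).
knn_dist <- function(x, k) {
  d <- as.matrix(dist(x))
  # Sort each row; position 1 is the zero distance to the point itself,
  # so the k-th neighbor sits at position k + 1.
  apply(d, 1, function(row) sort(row)[k + 1])
}

# Interestingness score of a set of columns: larger when the k-NN distances
# vary strongly around their mean, near zero when they are all alike.
subspace_quality <- function(data, cols, k = 5) {
  nn <- knn_dist(data[, cols, drop = FALSE], k)
  mu <- mean(nn)
  below <- sum(nn < mu)  # number of points in denser-than-average regions
  if (below == 0 || mu == 0) return(0)
  0.5 * sum(abs(mu - nn)) / (below * mu)
}

# Example: one attribute with a tight cluster on a sparse background,
# one attribute that is uniformly distributed.
set.seed(7)
data <- cbind(
  clustered = c(rnorm(150, mean = 0, sd = 0.05), runif(50, -5, 5)),
  uniform   = runif(200, -5, 5)
)

scores <- sapply(colnames(data), function(col) subspace_quality(data, col))
print(sort(scores, decreasing = TRUE))
```

Attributes with higher scores are the candidates worth keeping. A complete subspace search would also score combinations of attributes, which this sketch leaves out.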

The R implementation

The R code for the previously mentioned algorithm is in the file ch_06_surfing.R in the R code bundle. You can test the code with the following command:

> source("ch_06_surfing.R")

Customer purchase data analysis

Customer purchase data analysis has many applications, such as customer-satisfaction analysis.

One application of customer purchase data analysis is to identify unwanted consumption or to characterize a user's purchase behavior.
