Customer purchase data analysis and clustering high-dimensional data

Clustering in a high-dimensional data space raises two major issues: efficiency and quality. New algorithms are needed to deal with this type of dataset. Two strategies are popular. The first is subspace clustering, which looks for clusters in subspaces of the original data space. The second is dimensionality reduction, which builds a lower-dimensional data space in which the clustering is then performed.
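The dimensionality-reduction strategy can be illustrated with a minimal sketch. It assumes principal component analysis (prcomp) as the reduction step and k-means (kmeans) as the clustering step; neither choice is prescribed by the text, and the synthetic dataset is made up for the illustration.

```r
# Synthetic data: two clusters that differ only in the first two
# attributes, plus 48 pure-noise attributes (all values are made up).
set.seed(42)
n <- 100                       # points per cluster
p_noise <- 48                  # number of uninformative attributes
signal <- rbind(
  matrix(rnorm(n * 2, mean = 0, sd = 0.5), ncol = 2),
  matrix(rnorm(n * 2, mean = 5, sd = 0.5), ncol = 2)
)
noise <- matrix(rnorm(2 * n * p_noise), ncol = p_noise)
x <- cbind(signal, noise)
truth <- rep(1:2, each = n)

# Step 1: project the 50-dimensional data onto two principal components.
pca <- prcomp(x, center = TRUE, scale. = FALSE)
reduced <- pca$x[, 1:2]

# Step 2: cluster in the reduced space instead of the original one.
fit <- kmeans(reduced, centers = 2, nstart = 10)

# Cross-tabulate the known groups against the clusters that were found.
agreement <- table(truth, fit$cluster)
print(agreement)
```

Clustering the two-column matrix reduced is cheaper than clustering all 50 columns of x, which is the efficiency argument for this strategy. The quality of the result depends on whether the leading components retain the cluster structure.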

MAFIA is an efficient and scalable subspace-clustering algorithm for high-dimensional and large datasets.

The MAFIA algorithm

The summarized pseudocode for the MAFIA algorithm is as follows:
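One ingredient of MAFIA, the adaptive grid, can be sketched for a single dimension. This is a simplified illustration and not the full algorithm: it builds a fine histogram, merges adjacent bins whose counts are similar into variable-width windows, and flags windows that hold clearly more points than a uniform spread would give. The function name adaptive_windows and the parameter values n_bins, beta, and alpha are assumptions chosen for the example.

```r
# Build variable-width windows for one attribute (hypothetical helper).
adaptive_windows <- function(x, n_bins = 20, beta = 0.3, alpha = 1.5) {
  # Fine, equal-width histogram over the range of the attribute.
  breaks <- seq(min(x), max(x), length.out = n_bins + 1)
  bin <- findInterval(x, breaks, all.inside = TRUE)
  counts <- tabulate(bin, nbins = n_bins)

  # Start a new window wherever neighbouring bin counts differ by
  # more than the fraction beta; otherwise merge the bins.
  rel_diff <- abs(diff(counts)) / pmax(counts[-1], counts[-n_bins], 1)
  group <- cumsum(c(TRUE, rel_diff > beta))

  windows <- data.frame(
    lower = as.vector(tapply(breaks[-(n_bins + 1)], group, min)),
    upper = as.vector(tapply(breaks[-1], group, max)),
    count = as.vector(tapply(counts, group, sum))
  )

  # A window is flagged dense when it holds more than alpha times the
  # count expected if the points were spread uniformly over the range.
  expected <- length(x) * (windows$upper - windows$lower) / (max(x) - min(x))
  windows$dense <- windows$count > alpha * expected
  windows
}

# Made-up data: uniform background plus one concentration around 5.
set.seed(1)
x <- c(runif(2000, 0, 10), rnorm(1000, mean = 5, sd = 0.3))
windows <- adaptive_windows(x)
print(windows)
```

In the full algorithm, dense windows from individual dimensions are combined bottom-up into candidate dense units in higher-dimensional subspaces; that step is not shown here.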

The summarized pseudocode for the parallel MAFIA algorithm is as follows:

The SURFING algorithm

The summarized pseudocode for the SURFING algorithm is as follows. The algorithm selects interesting features from the original attributes of the dataset.
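The idea of scoring attributes by how interesting they are can be sketched with a k-nearest-neighbor distance criterion. The score below is a simplified illustration in the spirit of SURFING and is not taken from ch_06_surfing.R, whose details may differ. The helper names knn_dist and subspace_quality, the choice k = 5, and the synthetic data are assumptions.

```r
# Distance from every point to its k-th nearest neighbour (hypothetical helper).
knn_dist <- function(x, k) {
  d <- as.matrix(dist(x))
  # After sorting a row, position 1 is the zero self-distance,
  # so the k-th neighbour sits at position k + 1.
  apply(d, 1, function(row) sort(row)[k + 1])
}

# Score in [0, 1]: near 0 when all k-NN distances are alike (no visible
# structure), larger when the distances vary strongly around their mean.
subspace_quality <- function(x, k = 5) {
  nn <- knn_dist(x, k)
  mu <- mean(nn)
  below <- sum(nn < mu)          # points with a below-average k-NN distance
  if (below == 0 || mu == 0) return(0)
  diff_sum <- 0.5 * sum(abs(mu - nn))
  diff_sum / (below * mu)
}

# Made-up data: attributes 1 and 2 carry two clusters, 3 to 5 are uniform.
set.seed(7)
n <- 150
x <- cbind(
  c(rnorm(n / 2, 0, 0.3), rnorm(n / 2, 4, 0.3)),
  c(rnorm(n / 2, 1, 0.3), rnorm(n / 2, 6, 0.3)),
  matrix(runif(n * 3, 0, 6), ncol = 3)
)

# Score each attribute on its own; higher scores are treated as more interesting.
scores <- sapply(seq_len(ncol(x)), function(j) {
  subspace_quality(x[, j, drop = FALSE])
})
print(round(scores, 3))
```

Scoring single attributes is only the first step. A bottom-up search would go on to combine the better-scoring attributes into larger subspaces and score those in the same way.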

The R implementation

Please take a look at the R code file ch_06_surfing.R from the bundle of R code for the previously mentioned algorithm. The code can be tested with the following command:

> source("ch_06_surfing.R")