Chapter 6. Advanced Cluster Analysis

In this chapter, you will learn how to implement the leading clustering algorithms with R. Tools for evaluating, benchmarking, and measuring the clustering results are also provided.

In this chapter, we will cover the following topics:

  • Customer categorization analysis of e-commerce and DBSCAN
  • Clustering web pages and OPTICS
  • Visitor analysis in the browser cache and DENCLUE
  • Recommendation system and STING
  • Web sentiment analysis and CLIQUE
  • Opinion mining and WAVE CLUSTER
  • User search intent and the EM algorithm
  • Customer purchase data analysis and clustering high-dimensional data
  • SNS and clustering graph and network data

Customer categorization analysis of e-commerce and DBSCAN

Once a density measure is defined over the space of data points, clusters can be modeled as regions of the data space whose density exceeds a given level.

Density-Based Spatial Clustering of Applications with Noise (DBSCAN) is one of the most popular density-based clustering algorithms. The major characteristics of DBSCAN are:

  • It copes well with large, noisy datasets
  • It can find clusters of various shapes

DBSCAN classifies the data points in the dataset as core data points, border data points, and noise data points. The classification relies on density relations between points: directly density-reachable, density-reachable, and density-connected. Before we provide a detailed description of DBSCAN, let's illustrate these ideas.

A point is defined as a core point if the number of data points within a distance given by the predefined parameter Eps (or $\varepsilon$) is at least the predefined parameter MinPts. The space within distance Eps of a point $p$ is called its Eps-neighborhood, or $N_{Eps}(p)$. An object, o, is noise only if there is no cluster that contains o. A border data object is any object that belongs to a cluster but is not a core data object.
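The three kinds of points can be illustrated with a minimal sketch in plain Python. The function names, the toy coordinates, and the parameter values are hypothetical, and the sketch assumes Euclidean distance and that a point counts as a member of its own Eps-neighborhood.

```python
from math import dist  # Euclidean distance, Python 3.8+

def eps_neighborhood(points, i, eps):
    """Indices of all points within distance eps of points[i] (including i itself)."""
    return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]

def classify_points(points, eps, min_pts):
    """Label every point as 'core', 'border', or 'noise'."""
    neighborhoods = [eps_neighborhood(points, i, eps) for i in range(len(points))]
    # Assumption: a point is core when its neighborhood holds at least min_pts points.
    core = {i for i, nb in enumerate(neighborhoods) if len(nb) >= min_pts}
    labels = []
    for i, nb in enumerate(neighborhoods):
        if i in core:
            labels.append("core")
        elif any(j in core for j in nb):
            # Not core, but inside the neighborhood of a core point.
            labels.append("border")
        else:
            labels.append("noise")
    return labels

# Hypothetical toy data: a small dense group, one point on its edge, one outlier.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0), (2.0, 1.0), (8.0, 8.0)]
labels = classify_points(points, eps=1.1, min_pts=3)
print(labels)
```

With these values, the four points of the unit square are core points, (2.0, 1.0) is a border point because it lies in the neighborhood of the core point (1.0, 1.0), and (8.0, 8.0) is noise.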

Given a core object, p, and an object, q, the object q is directly density-reachable from p if q lies in the Eps-neighborhood of p, that is, $q \in N_{Eps}(p)$.

Given a core object, p, and an object, q, the object q is density-reachable from p if there is a chain of objects $p_1, \ldots, p_n$ with $p_1 = p$ and $p_n = q$ such that each $p_{i+1}$ is directly density-reachable from $p_i$.

Two data objects, $p$ and $q$, are density-connected if there is an object $o$ such that both $p$ and $q$ are density-reachable from $o$.

A density-based cluster is a set of density-connected data objects that is maximal with respect to density-reachability.

Here is an example illustrated in the following diagram:

[Figure: DBSCAN example diagram]
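The density relations defined above can also be checked programmatically. The following is a sketch under stated assumptions rather than a reference implementation: the function names and the toy data are hypothetical, distance is Euclidean, and a point counts toward its own neighborhood. Density-reachability is computed as a breadth-first search that only continues through core points.

```python
from collections import deque
from math import dist  # Euclidean distance, Python 3.8+

def neighbors(points, i, eps):
    """Indices of the points in the Eps-neighborhood of points[i]."""
    return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]

def density_reachable_from(points, p, eps, min_pts):
    """Indices density-reachable from points[p]; empty set if p is not a core point."""
    if len(neighbors(points, p, eps)) < min_pts:
        return set()
    reached = {p}
    queue = deque([p])
    while queue:
        current = queue.popleft()
        nb = neighbors(points, current, eps)
        if len(nb) < min_pts:
            continue  # border point: reachable itself, but it extends no chain
        for j in nb:
            if j not in reached:
                reached.add(j)
                queue.append(j)
    return reached

def density_connected(points, a, b, eps, min_pts):
    """True if some object o has both a and b density-reachable from it."""
    return any(
        {a, b} <= density_reachable_from(points, o, eps, min_pts)
        for o in range(len(points))
    )

# Hypothetical toy data: a dense group, one point on its edge, one outlier.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0), (2.0, 1.0), (8.0, 8.0)]
print(density_reachable_from(points, 0, eps=1.1, min_pts=3))
print(density_connected(points, 0, 4, eps=1.1, min_pts=3))
```

Note that density-reachability is not symmetric: the border point at index 4 is reachable from the core point at index 0, but nothing is reachable from index 4 because it is not a core point. Density-connectedness restores the symmetry by routing through a shared object o.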

The DBSCAN algorithm

The pseudocode for the DBSCAN algorithm is summarized below, preceded by its input and output parameters.

The input parameters for the DBSCAN algorithm are as follows:

  • D, which is the training tuples dataset
  • k, which is the neighbor list size
  • Eps, which is the radius parameter that denotes the neighborhood area of a data point
  • MinPts, which is the minimum number (the neighborhood density threshold) of points that must exist in the Eps-neighborhood

The output of the algorithm is as follows:

  • A set of density-based clusters

[Figure: pseudocode of the DBSCAN algorithm]
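As a rough illustration of how the algorithm proceeds, here is a minimal sketch of the commonly described DBSCAN procedure in plain Python. It is not a transcription of the book's pseudocode: the function names and toy data are hypothetical, neighbors are found by a brute-force scan (so the neighbor list size k from the parameter list is not used), and the clusters are returned as one integer label per point, with -1 marking noise.

```python
from math import dist  # Euclidean distance, Python 3.8+

NOISE = -1

def region_query(data, i, eps):
    """Indices of the points in the Eps-neighborhood of data[i] (including i)."""
    return [j for j, q in enumerate(data) if dist(data[i], q) <= eps]

def dbscan(data, eps, min_pts):
    """Return one label per point: 0, 1, ... for clusters and NOISE for noise."""
    labels = [None] * len(data)  # None means "not visited yet"
    cluster_id = -1
    for i in range(len(data)):
        if labels[i] is not None:
            continue
        seeds = region_query(data, i, eps)
        if len(seeds) < min_pts:
            labels[i] = NOISE  # may be relabeled as a border point later
            continue
        # data[i] is a core point: start a new cluster and grow it.
        cluster_id += 1
        labels[i] = cluster_id
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster_id  # noise reached from a core point is a border point
            if labels[j] is not None:
                continue  # already assigned to a cluster
            labels[j] = cluster_id
            nb = region_query(data, j, eps)
            if len(nb) >= min_pts:
                queue.extend(nb)  # only core points extend the cluster
    return labels

# Hypothetical toy data: two dense groups and one isolated point.
data = [(0, 0), (0, 1), (1, 0), (1, 1), (2, 1),
        (5, 5), (5, 6), (6, 5),
        (9, 0)]
result = dbscan(data, eps=1.1, min_pts=3)
print(result)
```

The brute-force region query makes this sketch quadratic in the number of points; practical implementations usually rely on a spatial index to find neighborhoods faster.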

Customer categorization analysis of e-commerce

E-commerce customers can be categorized by their psychographic and culturally specific purchasing behavior. The resulting categories help the store owner respond to customers efficiently and effectively. The general e-commerce analysis process is illustrated as follows:

[Figure: general e-commerce analysis process]