Chapter 6. Advanced Cluster Analysis

In this chapter, you will learn how to implement the major clustering algorithms with R. Tools for evaluating, benchmarking, and measuring the clustering results are also provided.

In this chapter, we will cover the following topics:

Customer categorization analysis of e-commerce and DBSCAN
Clustering web pages and OPTICS
Visitor analysis in the browser cache and DENCLUE
Recommendation system and STING
Web sentiment analysis and CLIQUE
Opinion mining and WAVE CLUSTER
User search intent and the EM algorithm
Customer purchase data analysis and clustering high-dimensional data
SNS and clustering graph and network data

Customer categorization analysis of e-commerce and DBSCAN

Once density and a density measure are defined over the space of data points, clusters can be modeled as regions of the data space that have a certain density.

Density-Based Spatial Clustering of Applications with Noise (DBSCAN) is one of the most popular density-based clustering algorithms. The major characteristics of DBSCAN are:

It is good at dealing with large, noisy datasets
It can deal with clusters of various shapes

DBSCAN classifies the data points in the dataset as core data points, border data points, or noise data points. This classification relies on density relations between points: directly density-reachable, density-reachable, and density-connected. Before we provide a detailed description of DBSCAN, let's illustrate these ideas.

A point is defined as a core point if the number of data points within the distance given by the predefined parameter, Eps or $\epsilon$, is at least the predefined parameter, MinPts. The space within the distance Eps of a point p is called its Eps-neighborhood, or $N_\epsilon(p)$. An object, o, is noise only if there is no cluster that contains o. A border data object is any object that belongs to a cluster but is not a core data object.

Given a core object, p, and an object, q, the object q is directly density-reachable from p if $q \in N_\epsilon(p)$.

Given a core object, p, and an object, q, the object q is density-reachable from p if there is a chain of objects $p_1, \ldots, p_n$ with $p_1 = p$ and $p_n = q$, and each $p_{i+1}$ is directly density-reachable from $p_i$.

For two data objects, $o_1$ and $o_2$, they are density-connected if there is an object $o$ such that both $o_1$ and $o_2$ are density-reachable from $o$.

A density-based cluster is a set of density-connected data objects that is maximal with respect to density-reachability.
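The core, border, and noise definitions above can be sketched in a few lines. The chapter's own examples are in R; this is a language-neutral illustration written in Python. The function names `eps_neighborhood` and `classify_points` and the toy points are hypothetical. The sketch assumes Euclidean distance, counts a point as a member of its own Eps-neighborhood, and treats "at least MinPts neighbors" as the core condition.

```python
from math import dist  # Euclidean distance, Python 3.8+

def eps_neighborhood(points, i, eps):
    """Indices of all points within distance eps of points[i].

    Assumption: the point itself is counted as part of its neighborhood.
    """
    return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]

def classify_points(points, eps, min_pts):
    """Label every point as 'core', 'border', or 'noise'."""
    neighborhoods = [eps_neighborhood(points, i, eps) for i in range(len(points))]
    # Assumption: a core point has at least min_pts points in its neighborhood.
    core = {i for i, nb in enumerate(neighborhoods) if len(nb) >= min_pts}
    labels = []
    for i, nb in enumerate(neighborhoods):
        if i in core:
            labels.append("core")
        elif any(j in core for j in nb):
            # Not core, but directly density-reachable from a core point.
            labels.append("border")
        else:
            labels.append("noise")
    return labels

# Hypothetical toy data: a small dense group, one point on its edge, one outlier.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0), (2.0, 1.0), (8.0, 8.0)]
labels = classify_points(points, eps=1.0, min_pts=3)
print(labels)
```

With these settings, the four points of the unit square are core points, the point at (2.0, 1.0) is a border point because it lies in the Eps-neighborhood of the core point (1.0, 1.0), and the isolated point at (8.0, 8.0) is noise.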

Here is an example illustrated in the following diagram:

[Figure: Customer categorization analysis of e-commerce and DBSCAN]

The DBSCAN algorithm

The summarized pseudocode for the DBSCAN algorithm uses the following input and output parameters.

The input parameters for the DBSCAN algorithm are as follows:

D, which is the dataset of training tuples
k, which is the neighbor list size
Eps, which is the radius parameter that denotes the neighborhood area of a data point
MinPts, which is the minimum number (the neighborhood density threshold) of points that must exist in the Eps-neighborhood

The output of the algorithm is as follows:

A set of density-based clusters
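As a rough illustration of how these parameters fit together, here is a minimal sketch of the commonly described DBSCAN procedure. It is written in Python rather than the chapter's R and is not the book's pseudocode. The names `dbscan`, `region_query`, and `NOISE` are hypothetical, Euclidean distance is assumed, and the neighbor list size k is not used because this sketch scans all points for each neighborhood query. Instead of returning a set of clusters, it returns one cluster label per point, with -1 marking noise.

```python
from math import dist  # Euclidean distance, Python 3.8+

NOISE = -1  # label for points that belong to no cluster

def region_query(points, i, eps):
    """Indices of all points in the Eps-neighborhood of points[i] (itself included)."""
    return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]

def dbscan(points, eps, min_pts):
    """Return one cluster label per point (0, 1, ...), or NOISE."""
    labels = [None] * len(points)  # None means "not visited yet"
    cluster_id = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # may be relabeled later as a border point
            continue
        # points[i] is a core point: start a new cluster and expand it.
        labels[i] = cluster_id
        seeds = list(neighbors)
        k = 0
        while k < len(seeds):
            j = seeds[k]
            k += 1
            if labels[j] == NOISE:
                labels[j] = cluster_id  # border point reached from a core point
            if labels[j] is not None:
                continue  # already assigned to a cluster
            labels[j] = cluster_id
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:
                seeds.extend(j_neighbors)  # j is also core: keep expanding
        cluster_id += 1
    return labels

# Hypothetical toy data: two dense groups and one isolated point.
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10),
          (5, 5)]
cluster_labels = dbscan(points, eps=1.5, min_pts=3)
print(cluster_labels)
```

Each neighborhood query here scans the whole dataset, so the sketch takes quadratic time. For large datasets, a spatial index is the usual way to speed up the neighborhood queries.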

Customer categorization analysis of e-commerce

E-commerce customers can be categorized by their psychographic, culturally specific purchasing behavior. The result of the customer categorization helps the store owner respond to customers efficiently and effectively. The general e-commerce analysis process is illustrated as follows:
