Web click streams and mining symbolic sequences

Web click stream data is large and arrives continuously, and it contains hidden trends that can be discovered for uses such as recommendation. TECNO-STREAMS (Tracking Evolving Clusters in NOisy Streams) is a one-pass algorithm for tracking clusters in such streams.

The TECNO-STREAMS algorithm

The algorithm is built on the following equations: the robust weight, or activation function (1); the influence zone (2); the pure stimulation (3); the optimal scale update (4); the incremental updates of the pure stimulation and the optimal scale (5) and (6); the stimulation and scale values (7); and, finally, the D-W-B-cell update equations (8).

[Figures: The TECNO-STREAMS algorithm, equations (1) to (8)]
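To make the roles of the robust weight and the incremental updates more concrete, here is a minimal R sketch. It is not the code from ch_08_tecno_stream.R, and it does not reproduce the equations listed above. It assumes a Gaussian-shaped weight with exponential time decay, and the names robust_weight, update_cell, sigma2, and tau are hypothetical, chosen only for illustration.

```r
# Assumed form of a robust weight: close points get weights near 1,
# distant or old points get weights near 0.
robust_weight <- function(d2, sigma2, age, tau) {
  # d2:     squared distance between a data point and a cluster prototype
  # sigma2: scale of the prototype (variance-like, must be > 0)
  # age:    number of points that have arrived since this data point
  # tau:    decay constant; a larger tau forgets old points more slowly
  exp(-(d2 / (2 * sigma2) + age / tau))
}

# Assumed incremental update: when one new point arrives, the running sums
# are decayed once and the new contribution is added, so no past data
# has to be stored or revisited.
update_cell <- function(cell, d2, tau) {
  w <- robust_weight(d2, cell$sigma2, age = 0, tau = tau)
  decay <- exp(-1 / tau)
  cell$W <- decay * cell$W + w              # decayed sum of weights
  cell$WD <- decay * cell$WD + w * d2       # decayed sum of weighted squared distances
  cell$sigma2 <- cell$WD / (2 * cell$W)     # updated scale estimate
  cell$stim <- cell$W / cell$sigma2         # stimulation-like score: weight mass per unit scale
  cell
}

# Hypothetical starting state, chosen so that WD / (2 * W) equals sigma2.
cell <- list(sigma2 = 1, W = 1, WD = 2, stim = 1)
cell <- update_cell(cell, d2 = 0.5, tau = 100)
print(cell$sigma2)
```

Because each update touches only a few running sums, the cost per incoming point is constant, which is what makes a one-pass treatment of the stream possible.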

The similarity measures applied in the learning phase are defined here:

[Figures: The TECNO-STREAMS algorithm, similarity measures for the learning phase]

The similarity measures applied in the validation phase are defined here:

[Figures: The TECNO-STREAMS algorithm, similarity measures for the validation phase]

The summarized pseudocode for the TECNO-STREAMS algorithm is as follows:

[Figure: The TECNO-STREAMS algorithm, summarized pseudocode]

The R implementation

See the R code file ch_08_tecno_stream.R in the bundle of R code for an implementation of the preceding algorithm. The code can be tested with the following command:

> source("ch_08_tecno_stream.R")

Web click streams

A web click stream records a user's behavior while visiting a site, and it is especially important for e-commerce sites and CRM (Customer Relationship Management). Analyzing web click streams helps to improve the user experience, to optimize the structure of the site so that it meets customers' expectations, and, ultimately, to increase the site's revenue.

Web click stream mining can also be used for security: to detect DoS attacks, to track the attackers, and to prevent such attacks in advance.

The dataset for web click stream mining consists of the click records generated when users visit various sites. Its major characteristics are that it is huge and that it keeps growing.
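Because the click records keep growing, it helps to summarize them in a single pass instead of storing them all. The following R sketch illustrates the idea on a tiny made-up dataset. The data frame clicks, its columns session and page, and the function process_stream are hypothetical and are not part of the book's code bundle.

```r
# Hypothetical click records: one row per click, in arrival order.
clicks <- data.frame(
  session = c("s1", "s1", "s2", "s1", "s2", "s2"),
  page    = c("/home", "/cart", "/home", "/pay", "/search", "/cart"),
  stringsAsFactors = FALSE
)

# One-pass summary: each record is read once, and only the last page of
# every session plus the running transition counts are kept in memory.
process_stream <- function(records) {
  last_page <- list()    # session id -> most recent page
  transitions <- list()  # "from -> to" -> number of times seen
  for (i in seq_len(nrow(records))) {
    s <- records$session[i]
    p <- records$page[i]
    prev <- last_page[[s]]
    if (!is.null(prev)) {
      key <- paste(prev, "->", p)
      old <- transitions[[key]]
      transitions[[key]] <- if (is.null(old)) 1 else old + 1
    }
    last_page[[s]] <- p
  }
  unlist(transitions)    # named vector of transition counts
}

counts <- process_stream(clicks)
print(counts)
```

The memory used by this sketch grows with the number of sessions and distinct page transitions, not with the total number of clicks, which is the property a stream algorithm needs when the dataset never stops growing.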
