Web click stream data is large and arrives continuously, with hidden trends waiting to be discovered for uses such as recommendation. TECNO-STREAMS (Tracking Evolving Clusters in NOisy Streams) is a one-pass algorithm for tracking such trends.
The algorithm is built on the following equations: the robust weight or activation function (1), the influence zone (2), the pure stimulation (3), the optimal scale update (4), the incremental updates of pure stimulation and optimal scale (5) and (6), the stimulation and scale values (7), and finally the D-W-B-cell update equations (8).
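As a rough illustration of the robust weight (1), the following sketch assumes the commonly cited exponential form w = exp(-(d^2 / (2 * sigma^2) + (J - j) / tau)), in which the weight decays with both the distance from the cell and the age of the data point. The function name robust_weight and all parameter names are hypothetical and are not taken from ch_08_tecno_stream.R.

```r
# Hypothetical helper: robust weight (activation) of one data point for one cell.
# Assumed form: w = exp(-(d2 / (2 * sigma2) + (J - j) / tau))
#   d2     squared distance between the data point and the cell
#   sigma2 squared scale of the cell's influence zone
#   j      arrival index of the data point
#   J      arrival index of the newest data point
#   tau    forgetting time constant (larger means slower forgetting)
robust_weight <- function(d2, sigma2, j, J, tau) {
  exp(-(d2 / (2 * sigma2) + (J - j) / tau))
}

# A point located exactly at the cell that has just arrived gets full weight.
w_new <- robust_weight(d2 = 0, sigma2 = 0.5, j = 10, J = 10, tau = 100)

# The same point, seen 100 arrivals ago, is discounted by exp(-1).
w_old <- robust_weight(d2 = 0, sigma2 = 0.5, j = 10, J = 110, tau = 100)

# A recent point far from the cell is discounted by distance instead.
w_far <- robust_weight(d2 = 4, sigma2 = 0.5, j = 10, J = 10, tau = 100)
```

Under this assumed form, old or distant points contribute little, which is what lets a one-pass algorithm forget outdated trends and resist noise.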
The similarity measures applied in the learning phase are defined here:
The similarity measures applied in the validation phase are defined here:
The summarized pseudocode for the TECNO-STREAMS algorithm is as follows:
See the R code file ch_08_tecno_stream.R in the R code bundle for the preceding algorithm. The code can be tested with the following command:
> source("ch_08_tecno_stream.R")
Web click streams record users' behavior while they visit a site, which is especially valuable for e-commerce sites and CRM (Customer Relationship Management). Analyzing web click streams can improve the customer's experience, help restructure the site to meet customers' expectations, and ultimately increase the site's revenue.
In addition, web click stream mining can be used to detect DoS attacks, track the attackers, and prevent such attacks on the Web in advance.
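A very simple way to picture the DoS use case is to count requests per client within fixed time windows and flag clients that exceed a limit. The sketch below is only an illustration under assumed values: the log layout, the window length, the limit, and the sample records are all made up, and real detection would need far more than a fixed threshold.

```r
# Hypothetical request log: client IP and arrival time in seconds (made-up data).
requests <- data.frame(
  ip   = c("192.0.2.1", "192.0.2.9", "192.0.2.9",
           "192.0.2.9", "192.0.2.9", "192.0.2.1"),
  time = c(0.5, 1.0, 1.2, 1.4, 1.9, 7.0),
  stringsAsFactors = FALSE
)

window_sec <- 5   # assumed window length in seconds
max_hits   <- 3   # assumed per-window request limit

# Assign each request to a fixed window, then count requests per (ip, window).
requests$window <- floor(requests$time / window_sec)
hits <- aggregate(time ~ ip + window, data = requests, FUN = length)
names(hits)[names(hits) == "time"] <- "n"

# Clients that exceeded the limit in at least one window.
suspects <- unique(as.character(hits$ip[hits$n > max_hits]))
```

Here the client 192.0.2.9 sends four requests in the first five-second window and is the only one flagged.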
The dataset for web click stream mining consists of the click records generated when users visit various sites. Its major characteristics are that it is huge and that it keeps growing.
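Because the data is huge and keeps growing, it is usually read in small pieces while only a compact summary is kept in memory. The following sketch shows that one-pass pattern on a tiny in-memory log; the "session_id,url" record format, the chunk size, and the variable names are assumptions for illustration and are unrelated to ch_08_tecno_stream.R.

```r
# Hypothetical click log: one "session_id,url" record per line (assumed format).
log_lines <- c("s1,/home", "s1,/cart", "s2,/home", "s2,/item", "s3,/home")
con <- textConnection(log_lines)

page_counts <- integer(0)   # running summary: clicks seen so far per URL
chunk_size  <- 2            # number of records read per step

repeat {
  chunk <- readLines(con, n = chunk_size)
  if (length(chunk) == 0) break            # end of the stream
  urls <- sub("^[^,]*,", "", chunk)        # drop the session id field
  for (u in urls) {
    # Add an entry the first time a URL is seen.
    if (!(u %in% names(page_counts))) page_counts[u] <- 0L
    page_counts[u] <- page_counts[u] + 1L
  }
  # The chunk is discarded here; only page_counts is carried forward.
}
close(con)
```

Each record is read exactly once, and memory use depends on the number of distinct URLs rather than on the length of the log.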