Visitor analysis in the browser cache and DENCLUE

DENsity-based CLUstEring (DENCLUE) is a density-based clustering algorithm that relies on density-distribution functions.

Before explaining the DENCLUE algorithm in detail, we need to introduce a few concepts: the influence function, the density function, the gradient, and the density attractor.

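The influence function of a data object $y$ describes how strongly $y$ affects the points around it. In principle it can be any function, but the Gaussian kernel centered at the data point is the usual choice: $f^{y}_{Gauss}(x) = e^{-\frac{d(x, y)^2}{2\sigma^2}}$, where $d(x, y)$ is the distance between $x$ and $y$ and $\sigma$ controls the width of the kernel.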

The density function at a point $x$ is defined as the sum of the influence functions of all data objects, evaluated at $x$.
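As a minimal sketch, assuming Euclidean distance and a Gaussian influence function of width `sigma`, the two definitions can be written in R as follows. The function names `influence_gauss` and `density_gauss` are hypothetical and are not taken from ch_06_denclue.R; the data objects are the rows of a matrix `D`.

```r
# Gaussian influence of data object y on point x (Euclidean distance assumed)
influence_gauss <- function(x, y, sigma) {
  d2 <- sum((x - y)^2)            # squared distance d(x, y)^2
  exp(-d2 / (2 * sigma^2))
}

# Density at x: sum of the influences of all data objects (rows of D) at x
density_gauss <- function(x, D, sigma) {
  sum(apply(D, 1, function(y) influence_gauss(x, y, sigma)))
}

# Usage: two objects at the origin and one far away
D <- rbind(c(0, 0), c(0, 0), c(10, 10))
density_gauss(c(0, 0), D, sigma = 1)   # close to 2: the distant object adds almost nothing
```

The distant object contributes about $e^{-100}$, which shows how quickly the Gaussian influence vanishes with distance relative to `sigma`.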

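A point $x^*$ is called a density attractor if it is a local maximum of the density function. Starting from a point $x$, it is computed by hill climbing along the gradient of the density function:

$$x^{0} = x$$
$$x^{i+1} = x^{i} + \delta \cdot \frac{\nabla f^{D}_{Gauss}(x^{i})}{\lVert \nabla f^{D}_{Gauss}(x^{i}) \rVert}$$

where $\delta$ is the step size. The climb stops at the step $k$ for which $f^{D}_{Gauss}(x^{k+1}) < f^{D}_{Gauss}(x^{k})$, and $x^* = x^{k}$ is taken as the density attractor.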

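Given a dataset $D = \{x_1, \ldots, x_N\}$ and the Gaussian density function

$$f^{D}_{Gauss}(x) = \sum_{i=1}^{N} e^{-\frac{d(x, x_i)^2}{2\sigma^2}}$$

where $d(x, x_i)$ is the distance between $x$ and $x_i$ and $\sigma$ is the kernel width, the gradient of the density function is defined as:

$$\nabla f^{D}_{Gauss}(x) = \sum_{i=1}^{N} (x_i - x) \cdot e^{-\frac{d(x, x_i)^2}{2\sigma^2}}$$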
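The gradient and the hill-climbing rule for density attractors can be sketched in R as below. This is a minimal illustration, assuming Euclidean distance and data objects stored as the rows of a matrix `D`; the names `density_gauss`, `gradient_gauss`, and `climb_to_attractor`, and the defaults `delta = 0.1` and `max_iter = 1000`, are hypothetical choices, not the book's code.

```r
# Gaussian density f(x) = sum_i exp(-d(x, x_i)^2 / (2 sigma^2)) over the rows of D
density_gauss <- function(x, D, sigma) {
  sum(exp(-rowSums(sweep(D, 2, x)^2) / (2 * sigma^2)))
}

# Gradient as defined above: sum_i (x_i - x) * exp(-d(x, x_i)^2 / (2 sigma^2)).
# It differs from the exact derivative only by the constant factor 1 / sigma^2,
# which cancels once the gradient is normalized.
gradient_gauss <- function(x, D, sigma) {
  diff <- sweep(D, 2, x)                      # row i holds x_i - x
  colSums(diff * exp(-rowSums(diff^2) / (2 * sigma^2)))
}

# Hill climbing: x^{i+1} = x^i + delta * grad / ||grad||, stopping as soon as
# a step would no longer increase the density
climb_to_attractor <- function(x, D, sigma, delta = 0.1, max_iter = 1000) {
  f_cur <- density_gauss(x, D, sigma)
  for (it in seq_len(max_iter)) {
    g <- gradient_gauss(x, D, sigma)
    g_norm <- sqrt(sum(g^2))
    if (g_norm < 1e-12) break                 # stationary point reached
    x_new <- x + delta * g / g_norm
    f_new <- density_gauss(x_new, D, sigma)
    if (f_new <= f_cur) break                 # f(x^{k+1}) would not exceed f(x^k)
    x <- x_new
    f_cur <- f_new
  }
  x                                           # approximate density attractor
}

# Usage: three nearby points; climbing from (1, 1) ends near their common peak
D <- rbind(c(0, 0), c(0.2, 0), c(0, 0.2))
climb_to_attractor(c(1, 1), D, sigma = 0.5)
```

Because the step length is fixed at `delta`, the attractor is only located to within roughly one step; a smaller `delta` gives a more precise attractor at the cost of more iterations.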

DENCLUE first defines a density function over the data space and then searches for all the local maxima of that function. Each data point is assigned to the local maximum (its density attractor) that it reaches by hill climbing along the density gradient, and each group of data points bound to the same local maximum is defined as a cluster. As a postprocessing step, a cluster is discarded if the density at its local maximum is lower than a user-defined threshold. Two clusters are merged if there exists a path between them on which every point has a density higher than that threshold.
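The steps just described can be put together in a compact R sketch. It is an illustration under stated assumptions, not the contents of ch_06_denclue.R: the names `denclue_sketch`, `climb`, and `segment_above`, the threshold `xi`, and the attractor tolerance `eps` are hypothetical. As a simplification, the merge step only tests the straight segment between two attractors (sampled at `steps + 1` points) instead of searching for an arbitrary path.

```r
# Gaussian density over the rows of D, and its gradient as defined in the text
# (equal to the exact derivative up to the constant factor 1 / sigma^2)
density_gauss <- function(x, D, sigma) {
  sum(exp(-rowSums(sweep(D, 2, x)^2) / (2 * sigma^2)))
}
gradient_gauss <- function(x, D, sigma) {
  diff <- sweep(D, 2, x)                        # row i holds x_i - x
  colSums(diff * exp(-rowSums(diff^2) / (2 * sigma^2)))
}

# Hill climbing with normalized gradient steps of length delta
climb <- function(x, D, sigma, delta, max_iter = 1000) {
  f_cur <- density_gauss(x, D, sigma)
  for (it in seq_len(max_iter)) {
    g <- gradient_gauss(x, D, sigma)
    g_norm <- sqrt(sum(g^2))
    if (g_norm < 1e-12) break                   # flat point: stop here
    x_new <- x + delta * g / g_norm
    f_new <- density_gauss(x_new, D, sigma)
    if (f_new <= f_cur) break                   # density stopped increasing
    x <- x_new
    f_cur <- f_new
  }
  x
}

# Simplification: test only the straight segment between two attractors
segment_above <- function(a, b, D, sigma, xi, steps = 20) {
  ts <- seq(0, 1, length.out = steps + 1)
  all(vapply(ts, function(t) density_gauss(a + t * (b - a), D, sigma) >= xi,
             logical(1)))
}

denclue_sketch <- function(D, sigma, xi, delta = 0.05, eps = 2 * delta) {
  n <- nrow(D)
  # 1. Climb from every data point to its density attractor
  att <- do.call(rbind, lapply(seq_len(n), function(i) climb(D[i, ], D, sigma, delta)))
  # 2. Points whose attractors lie within eps share one cluster centre
  label <- integer(n)
  centres <- list()
  for (i in seq_len(n)) {
    near <- which(vapply(centres, function(ctr) sqrt(sum((ctr - att[i, ])^2)) < eps,
                         logical(1)))
    if (length(near) > 0) {
      label[i] <- near[1]
    } else {
      centres[[length(centres) + 1]] <- att[i, ]
      label[i] <- length(centres)
    }
  }
  # 3. Discard clusters whose attractor density is below xi
  dens <- vapply(centres, density_gauss, numeric(1), D = D, sigma = sigma)
  keep <- dens >= xi
  # 4. Merge kept clusters whose connecting segment stays above xi
  group <- seq_along(centres)
  for (a in which(keep)) for (b in which(keep)) {
    if (a < b && group[a] != group[b] &&
        segment_above(centres[[a]], centres[[b]], D, sigma, xi)) {
      group[group == group[b]] <- group[a]
    }
  }
  out <- ifelse(keep[label], group[label], 0L)  # 0 marks noise/outliers
  out[out > 0] <- match(out[out > 0], unique(out[out > 0]))  # renumber 1, 2, ...
  out
}

# Usage: two tight blobs and one isolated point
set.seed(1)
D <- rbind(matrix(rnorm(40, 0, 0.3), ncol = 2),
           matrix(rnorm(40, 5, 0.3), ncol = 2),
           c(20, 20))
labels <- denclue_sketch(D, sigma = 0.5, xi = 2)
table(labels)
```

On this toy data, the two blobs are expected to come out as clusters 1 and 2, while the isolated point, whose attractor density stays near 1, falls below `xi` and is labeled 0 as noise. Testing only straight segments can miss merges that a curved high-density path would justify, which is the price paid for keeping the sketch short.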

The DENCLUE algorithm

The pseudocode for the DENCLUE algorithm is summarized as follows:

[Figure: The DENCLUE algorithm]

The R implementation

Please take a look at the R code file ch_06_denclue.R from the bundle of R code for the previously mentioned algorithm. The code can be tested with the following command:

> source("ch_06_denclue.R")

Visitor analysis in the browser cache

Browser-cache analysis lets website owners show visitors the content that best matches their interests; at the same time, it bears directly on the visitors' privacy protection. The data instances in this context are browser caches, sessions, cookies, various logs, and so on.

The factors a data instance may include are the web address, the IP address (which indicates where the visitor comes from), the time the visitor spent on a specific page, the pages the visitor viewed, the order in which those pages were viewed, the date and time of every visit, and so on. The log can be specific to a single website or cover several websites. A more detailed description is given in the following table:

Hit

This refers to each element of a web page downloaded to a viewer's web browser (such as Internet Explorer, Mozilla, or Netscape). Hits do not correspond in any direct fashion to the number of pages viewed or number of visitors to a site. For example, if a viewer downloads a web page with three graphics, the web logfile will show four hits: one for the web page and one for each of the three graphics.

Unique Visitors

The number of distinct visitors to the website, each identified by a unique IP address (see IP address in this table).

New/Return Visitors

The number of first-time visitors to the site compared to returning visitors.

Page views

The number of times a specified web page has been viewed; shows exactly what content people are (or are not) viewing at a website. Every time a visitor hits the page refresh button, another page view is logged.

Page views per visitor

The number of page views divided by the number of visitors; measures how many pages viewers look at each time they visit a website.

IP address

A numeric identifier for a computer. (An IPv4 address is a 32-bit number written as four numbers separated by periods, each ranging from 0 to 255; for example, 1.160.10.240 could be an IP address.) The IP address can be used to determine a viewer's origin (for example, by country); it can also be used to determine the particular computer network a website's visitors are coming from.

Visitor location

The geographic location of the visitor.

Visitor language

The language setting on the visitor's computer.

Referring pages/sites (URLs)

Indicates how visitors get to a website (that is, whether they type the URL, or web address, directly into a web browser or if they click through from a link at another site).

Keywords

If the referring URL is a search engine, the keywords (search string) that the visitor used can be determined.

Browser type

The type of browser software a visitor is using (for example, Netscape, Mozilla, Internet Explorer, and so on).

Operating system version

The specific operating system the site visitor uses.

Screen resolution

The display settings for the visitor's computer.

Java or Flash-enabled

Whether or not the visitor's computer allows Java (a programming language for applications on the Web) and/or Flash (a software tool that allows web pages to be displayed with animation, or motion).

Connection speed

Whether visitors are accessing the website from a slower dial-up connection, high-speed broadband, or a T1 line.

Errors

The number of errors recorded by the server, such as a "404-file not found" error; can be used to identify broken links and other problems at the website.

Visit duration

Average time spent on the site (how long a visitor stays on the site before leaving). Sites that retain visitors longer are referred to as "sticky" sites.

Visitor paths/navigation

How visitors navigate the website: by specific pages, most common entry pages (the first page a visitor accesses on a website), exit points (the page from which a visitor leaves a website), and so on. For example, if a large number of visitors leave the site after looking at a particular page, the analyst might infer that they either found the information they needed or ran into a problem with that page (for instance, it might be the page where shipping and handling fees are posted, and those fees may be large enough to turn visitors away).

Bounce rate

The percentage of visitors who leave the site after the first page; calculated as the number of visits that view only a single page divided by the total number of visits. The bounce rate is sometimes used as another indicator of "stickiness."
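Several of the metrics in the table (hits, page views, unique visitors, new versus returning visitors, page views per visitor, bounce rate, and entry and exit pages) reduce to simple counts over a parsed log. The R sketch below assumes a hypothetical, already parsed access log held in a data frame `web_log` with columns `ip`, `session`, `url`, and `is_page` (TRUE for a page, FALSE for an embedded element such as an image), sorted by time; real log formats and session identification will differ. Following the table, unique visitors are approximated by distinct IP addresses.

```r
# Hypothetical parsed access log: one row per downloaded element (a "hit"),
# sorted by time; is_page marks web pages versus embedded elements
web_log <- data.frame(
  ip      = c("1.160.10.240", "1.160.10.240", "1.160.10.240", "1.160.10.240",
              "10.0.0.7", "10.0.0.7", "1.160.10.240"),
  session = c("s1", "s1", "s1", "s1", "s2", "s2", "s3"),
  url     = c("/index.html", "/logo.png", "/banner.png", "/prices.html",
              "/index.html", "/logo.png", "/about.html"),
  is_page = c(TRUE, FALSE, FALSE, TRUE, TRUE, FALSE, TRUE),
  stringsAsFactors = FALSE
)

hits              <- nrow(web_log)                 # every element counts as a hit
pages             <- web_log[web_log$is_page, ]    # keep page requests only
page_views        <- nrow(pages)
unique_visitors   <- length(unique(web_log$ip))    # distinct IP addresses
views_per_visitor <- page_views / unique_visitors

# Returning visitors: IP addresses seen in more than one session
sessions_per_ip    <- tapply(web_log$session, web_log$ip, function(s) length(unique(s)))
returning_visitors <- sum(sessions_per_ip > 1)

# Bounce rate: visits (sessions) with exactly one page view / all visits
# (a fraction; multiply by 100 for a percentage)
views_per_session <- tapply(pages$url, pages$session, length)
bounce_rate <- sum(views_per_session == 1) / length(views_per_session)

# Entry and exit pages: first and last page viewed in each session
entry_pages <- tapply(pages$url, pages$session, function(u) u[1])
exit_pages  <- tapply(pages$url, pages$session, function(u) u[length(u)])
```

The gap between `hits` (7) and `page_views` (4) in this toy log reflects the table's point that hits include every graphic on a page and so say little about how many pages were actually viewed.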

Visitor analysis is essentially history sniffing, which is used for user-behavior analysis.
