Search in book...
Toggle Font Controls
Create new playlist

Name your new playlist

Playlist description (optional)
Sign In

Email address

Password

Forgot Password?

or

Continue with Facebook

Continue with Google
Sign Up

Full Name

Email address

Confirm Email Address

Password

or

Continue with Facebook

Continue with Google

Intrusion detection and density-based methods

Here is a formal definition of outliers formalized based on concepts such as, LOF, LRD, and so on. Generally speaking, an outlier is a data point biased from others so much that it seems as if it has not been generated from the same distribution functions as others have been.

Given a dataset, D, a DB (x, y)-outlier, p, is defined like this:

The k-distance of the p data point denotes the distance between p and the data point, o, which is member of D:

The k-distance neighborhood of the p object is defined as follows, q being the k-Nearest Neighbor of p:

The following formula gives the reachability distance of an object, p, with respect to an object, o:

The Local Reachability Density (LRD) of a data object, o, is defined like this:

Intrusion detection and density-based methods

The Local Outlier Factor (LOF) is defined as follows, and it measures the degree of the outlierness:

A property of LOF (p) is defined as shown in the following equation:

These equations are illustrated as follows:

The OPTICS-OF algorithm

The input parameters for the bagging algorithm are:

, the dataset
, the parameter
, another parameter

The output of the algorithm is the value of CBLOF, for all records.

The summarized pseudocodes of the OPTICS-OF algorithm are as follows:

The High Contrast Subspace algorithm

The summarized pseudocodes of the High Contrast Subspace (HiCS) algorithm are as follows, where the input parameters are S, M, and . The output is a contrast, |S|.

The R implementation

Look up the file of R codes, ch_07_ density _based.R, from the bundle of R codes for the previously mentioned algorithms. The codes can be tested using the following command:

> source("ch_07_ density _based.R")

Intrusion detection

Any malicious activity against systems, networks, and servers can be treated as intrusion, and finding such activities is called intrusion detection.

The characteristics of situations where you can detect intrusion are high volume of data, missing labeled data in the dataset (which can be training data for some specific solution), time series data, and false alarm rate in the input dataset.

An intrusion detection system is of two types: host-based and network-based intrusion detection systems. A popular architecture for intrusion detection based on data mining is illustrated in the following diagram:

The core algorithms applied in an outlier detection system are usually semi-supervised or unsupervised according to the characteristics of intrusion detection.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.

Table of Contents for Intrusion detection and density-based methods

Create new playlist

Sign In

Sign Up

Intrusion detection and density-based methods

The OPTICS-OF algorithm

The High Contrast Subspace algorithm

The R implementation

Intrusion detection

Table of Contents for
Intrusion detection and density-based methods