Intrusion detection and clustering-based methods

Clustering-based outlier detection focuses on the relationship between data objects and clusters: an object that belongs to no cluster, belongs to a small or sparse cluster, or lies far from its nearest cluster is a candidate outlier.

Hierarchical clustering to detect outliers

Outlier detection that uses the hierarchical clustering algorithm is based on the k-Nearest Neighbor graph. The input parameters are the dataset, DATA, of size n, in which each data point has k variables; the distance measure function (d); a hierarchical clustering algorithm (h); a threshold (t); and the number of clusters (nc).
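
The parameters listed above can be wired together in a few lines of base R. The following is a minimal sketch, not the book's implementation: the function name hclust_outliers is hypothetical, and reading the threshold t as a minimum cluster-size fraction is an assumption, since the text does not say how t is applied.

```r
# Sketch: flag members of small clusters as outlier candidates.
# Assumption: t is a minimum cluster-size fraction (not stated in the text).
hclust_outliers <- function(DATA, d = "euclidean", h = "single",
                            t = 0.1, nc = 3) {
  dm <- dist(DATA, method = d)        # pairwise distances with measure d
  tree <- hclust(dm, method = h)      # hierarchical algorithm h
  labels <- cutree(tree, k = nc)      # cut the tree into nc clusters
  sizes <- table(labels)              # number of points per cluster
  # Clusters holding less than a fraction t of all points
  small <- as.integer(names(sizes)[as.vector(sizes) / nrow(DATA) < t])
  which(labels %in% small)            # row indices of the candidates
}

set.seed(42)
DATA <- rbind(
  matrix(rnorm(40, mean = 0, sd = 0.3), ncol = 2),  # 20 points near (0, 0)
  matrix(rnorm(40, mean = 5, sd = 0.3), ncol = 2),  # 20 points near (5, 5)
  c(20, 20)                                         # one isolated point
)
candidates <- hclust_outliers(DATA, t = 0.1, nc = 3)
print(candidates)
```

With two tight groups and one isolated point, cutting the tree into three clusters leaves the isolated point (row 41) in a cluster of its own, so it is the only row returned.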

The k-means-based algorithm

The process of outlier detection based on the k-means algorithm is illustrated in the following diagram:

[Figure: The k-means-based algorithm]

The procedure for outlier detection using the k-means algorithm can be summarized as follows:

  • Phase 1 (Data preparation):
    1. The target observations and attributes should be aligned to improve the accuracy of the result and performance of the k-means clustering algorithm.
    2. If the original dataset has missing data, it must be handled first. The missing values are filled in with the maximum-likelihood estimates produced by the EM algorithm.
  • Phase 2 (The outlier detection procedure):
    1. The value of k must be determined before the k-means clustering algorithm can run. A suitable k is chosen by referring to the value of the Cubic Clustering Criterion.
    2. The k-means clustering algorithm runs with the chosen k. On completion, the expert checks the clustering results for external and internal outliers. If the groups that remain after eliminating the outliers are meaningful, the expert stops the procedure. If the groups need to be recalculated, the expert runs the k-means clustering algorithm again without the detected outliers.
  • Phase 3 (Review and validation):
    1. The outliers produced by the previous phase are only candidates. The true outliers are identified by reviewing these candidates against domain knowledge.
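
Phase 2 above can be sketched in base R as follows. This is an illustration under stated assumptions rather than the procedure from the code bundle: k is fixed by hand instead of being chosen with the Cubic Clustering Criterion, the expert's inspection is replaced by a simple distance rule, and the names kmeans_candidates and cutoff are hypothetical.

```r
# Sketch of Phase 2: cluster, flag candidates, then recluster without them.
# Assumptions: k is set by hand, and a point is a candidate when its distance
# to its own cluster center exceeds cutoff times the mean such distance.
kmeans_candidates <- function(X, k, cutoff = 3) {
  fit <- kmeans(X, centers = k, nstart = 10)
  # Euclidean distance from each row to the center of its assigned cluster
  centers <- fit$centers[fit$cluster, , drop = FALSE]
  dist_to_center <- sqrt(rowSums((X - centers)^2))
  flagged <- unname(which(dist_to_center > cutoff * mean(dist_to_center)))
  list(fit = fit, candidates = flagged)
}

set.seed(1)
X <- rbind(
  matrix(rnorm(60, mean = 0, sd = 0.5), ncol = 2),  # 30 points near (0, 0)
  matrix(rnorm(60, mean = 6, sd = 0.5), ncol = 2),  # 30 points near (6, 6)
  c(3, 15)                                          # one distant point
)
first <- kmeans_candidates(X, k = 2)
print(first$candidates)

# Recalculate the groups without the detected candidates
X_clean <- if (length(first$candidates) > 0) {
  X[-first$candidates, , drop = FALSE]
} else {
  X
}
second <- kmeans(X_clean, centers = 2, nstart = 10)
print(second$centers)
```

The flagged rows are only candidates; as Phase 3 states, whether they are true outliers is decided with domain knowledge, not by the distance rule.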

The ODIN algorithm

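The ODIN (Outlier Detection using Indegree Number) algorithm is based on the k-Nearest Neighbor graph. The indegree of an object is the number of other objects that count it among their k nearest neighbors, and objects whose indegree is at or below a threshold are reported as outliers.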
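
The indegree idea can be sketched in base R as follows. This is a minimal illustration, not the code from the bundle: the function name odin and its parameter names are hypothetical, Euclidean distance is assumed, and ties between equally distant neighbors are broken by row order.

```r
# Sketch of ODIN: count each point's indegree in the kNN graph and flag
# points whose indegree is at or below a threshold.
# Assumptions: Euclidean distance; ties are broken by row order.
odin <- function(X, k = 3, threshold = 0) {
  D <- as.matrix(dist(X))             # pairwise Euclidean distances
  diag(D) <- Inf                      # a point is not its own neighbor
  n <- nrow(D)
  # Column i of nn holds the indices of the k nearest neighbors of point i
  nn <- apply(D, 1, function(row) order(row)[seq_len(k)])
  # Indegree: the number of neighbor lists in which each point appears
  indegree <- tabulate(as.vector(nn), nbins = n)
  list(indegree = indegree, outliers = which(indegree <= threshold))
}

X <- rbind(c(0, 0), c(0, 1), c(1, 0), c(1, 1), c(10, 10))
result <- odin(X, k = 2, threshold = 0)
print(result$indegree)
print(result$outliers)
```

In this toy dataset the four corner points of the unit square appear in each other's neighbor lists, while no point lists (10, 10) as a neighbor, so row 5 has indegree 0 and is the only outlier reported.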

The R implementation

The R code for the previously mentioned algorithms is in the file ch_07_ clustering _based.R in the code bundle. The code can be tested with the following command:

> source("ch_07_ clustering _based.R")