The two major approaches among proximity-based methods are distance-based and density-based outlier detection.
The pseudocode of the distance-based outlier detection algorithm is summarized as follows, given a dataset D, the size n of the input dataset, a distance threshold r (r > 0), and a fraction threshold π (0 < π ≤ 1):
An outlier is defined as a data point o that satisfies the following formula:
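As a rough illustration, the definition can be sketched in R under the assumption that it is the standard DB(r, π) formulation: an object o is an outlier when the fraction of objects within distance r of o is at most π, that is, |{o' : dist(o, o') ≤ r}| / |D| ≤ π. The function name db_outliers, the argument name pi_frac, and the toy data are hypothetical and are not taken from ch_07_proximity_based.R. The sketch uses Euclidean distance and leaves o itself out of the neighbor count, which is a design choice rather than part of the definition.

```r
# Sketch of a DB(r, pi) outlier test (assumed standard definition).
# `db_outliers` and `pi_frac` are illustrative names only.
db_outliers <- function(D, r, pi_frac) {
  stopifnot(r > 0, pi_frac > 0, pi_frac <= 1)
  D <- as.matrix(D)
  n <- nrow(D)
  # Full pairwise Euclidean distance matrix: O(n^2) time and memory.
  d <- as.matrix(dist(D))
  # For each object o, count the other objects within distance r
  # (subtract 1 to leave out o itself, whose distance is 0).
  neighbors <- rowSums(d <= r) - 1
  # o is flagged when the fraction of objects in its r-neighborhood
  # does not exceed pi_frac.
  unname(which(neighbors / n <= pi_frac))
}

# Toy data: five points near the origin and one far away.
D <- rbind(c(0, 0), c(0, 1), c(1, 0), c(1, 1), c(0.5, 0.5), c(10, 10))
outliers <- db_outliers(D, r = 2, pi_frac = 0.2)
print(outliers)
```

The brute-force distance matrix keeps the sketch short; it is only practical for small datasets, which is the cost that index-based and pruning-based algorithms try to avoid.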
Let's now look at the pseudocode for a variety of distance-based outlier detection algorithms, summarized in the following list. The input parameters are k, n, and D, which represent the number of neighbors, the number of outliers to be identified, and the input dataset, respectively. A few supporting functions are also defined: Nearest(o, S, k) returns the k nearest objects in S to o, Maxdist(o, S) returns the maximum distance between o and the points in S, and TopOutlier(S, n) returns the top n outliers in S according to the distance to their kth nearest neighbor.
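The three supporting functions can be sketched in R as follows. This is a minimal illustration, not the code from ch_07_proximity_based.R: it assumes that objects are rows of a numeric matrix, that the distance is Euclidean, and that TopOutlier receives k as an explicit third argument (the text gives only S and n). The helper dist_to is a hypothetical name introduced here.

```r
# Euclidean distances from point o to every row of matrix S
# (hypothetical helper, not part of the original pseudocode).
dist_to <- function(o, S) {
  S <- matrix(S, ncol = length(o))
  sqrt(rowSums(sweep(S, 2, o)^2))
}

# Nearest(o, S, k): the k rows of S that are closest to o.
Nearest <- function(o, S, k) {
  S <- matrix(S, ncol = length(o))
  idx <- order(dist_to(o, S))[seq_len(min(k, nrow(S)))]
  S[idx, , drop = FALSE]
}

# Maxdist(o, S): the largest distance between o and any row of S.
Maxdist <- function(o, S) max(dist_to(o, S))

# TopOutlier(S, n, k): the n rows of S with the largest distance to
# their kth nearest neighbor (k is passed explicitly in this sketch).
TopOutlier <- function(S, n, k) {
  S <- as.matrix(S)
  score <- vapply(seq_len(nrow(S)), function(i) {
    # The kth-neighbor distance is the maximum distance to the
    # k nearest rows, with row i itself left out.
    Maxdist(S[i, ], Nearest(S[i, ], S[-i, , drop = FALSE], k))
  }, numeric(1))
  idx <- order(score, decreasing = TRUE)[seq_len(min(n, nrow(S)))]
  S[idx, , drop = FALSE]
}

# Toy data: four points on a unit square and one distant point.
S <- rbind(c(0, 0), c(0, 1), c(1, 0), c(1, 1), c(8, 8))
top <- TopOutlier(S, n = 1, k = 2)
print(top)
```

Scoring every object against all others costs O(n^2) distance computations; the sketch favors readability over the pruning that practical algorithms apply.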
The Dolphin algorithm is a distance-based outlier detection algorithm. The pseudocode of this algorithm is summarized as follows:
The R code for the preceding algorithms is in the file ch_07_proximity_based.R in the code bundle. The code can be tested with the following command:
> source("ch_07_proximity_based.R")
The purpose of outlier detection is to find patterns in the source dataset that do not conform to normal behavior. Here, the dataset consists of calling records, and the patterns are sought in those records.
Many specialized algorithms have been developed for specific domains. Misuse of a mobile phone is termed mobile fraud. The subject under study is calling activity, that is, call records. The related attributes include, but are not limited to, call duration, calling city, call day, and the ratios of various services.
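Because such attributes are measured on very different scales, a distance-based method usually needs them standardized first. The following sketch shows one way this might look in R. The data frame is invented for illustration (the column names call_duration and intl_ratio and all values are hypothetical), and the kth-nearest-neighbor distance is used as the outlier score.

```r
# Hypothetical call records; names and values are invented.
calls <- data.frame(
  call_duration = c(60, 75, 90, 80, 70, 65, 85, 3600),  # seconds
  intl_ratio    = c(0.02, 0.01, 0.03, 0.02, 0.01, 0.02, 0.03, 0.90)
)

# Standardize each attribute so that call duration, measured in
# seconds, does not dominate the Euclidean distance.
X <- scale(calls)
d <- as.matrix(dist(X))

k <- 2
# Distance from each record to its kth nearest neighbor: position
# k + 1 after sorting, because position 1 is the record itself.
knn_dist <- apply(d, 1, function(row) sort(row)[k + 1])

# The record with the largest score is the most suspicious one.
suspect <- unname(which.max(knn_dist))
print(calls[suspect, ])
```

Categorical attributes such as calling city or call day would need to be encoded before they can enter a distance computation; that step is left out of this sketch.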