Given a dataset, if a collection of related data instances is anomalous with respect to the entire dataset, it is defined as a collective outlier.
The summarized pseudocode of the ROD algorithm is as follows. The input parameters are the multidimensional attribute space, the attribute dataset (D), the distance measure function (F), the depth of neighbor (ND), the spatial graph (G = (V, E)), and the confidence interval (CI):
The R code for the previously mentioned algorithm is in the file ch_07_ rod.R in the bundle of R code. The code can be tested with the following command:
> source("ch_07_ rod.R")
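To illustrate how inputs of this kind can fit together, here is a minimal sketch in base R. It is not the contents of ch_07_ rod.R. It assumes a simple neighborhood-difference test: the attribute value at each vertex is compared with the mean over its neighbors within ND hops, the differences are standardized, and vertices whose standardized difference falls outside the two-sided normal bound for CI are flagged. The names build_adjacency, neighbors_within, spatial_outliers, and dist_fn are hypothetical, and the attribute is assumed to be one-dimensional for brevity.

```r
# Assumption: the graph G = (V, E) is given as a vertex count and a
# two-column matrix of undirected edges.
build_adjacency <- function(n_vertices, edges) {
  adj <- vector("list", n_vertices)
  for (i in seq_len(nrow(edges))) {
    a <- edges[i, 1]
    b <- edges[i, 2]
    adj[[a]] <- c(adj[[a]], b)
    adj[[b]] <- c(adj[[b]], a)
  }
  adj
}

# All vertices reachable from v in at most nd hops, excluding v itself.
neighbors_within <- function(adj, v, nd) {
  seen <- v
  frontier <- v
  for (step in seq_len(nd)) {
    frontier <- setdiff(unique(unlist(adj[frontier])), seen)
    if (length(frontier) == 0) break
    seen <- c(seen, frontier)
  }
  setdiff(seen, v)
}

# D: numeric attribute value per vertex; dist_fn plays the role of F.
spatial_outliers <- function(D, adj,
                             dist_fn = function(value, nbr_mean) value - nbr_mean,
                             ND = 1, CI = 0.95) {
  s <- vapply(seq_along(D), function(v) {
    nb <- neighbors_within(adj, v, ND)
    if (length(nb) == 0) return(NA_real_)  # isolated vertex: no comparison
    dist_fn(D[v], mean(D[nb]))
  }, numeric(1))
  z <- (s - mean(s, na.rm = TRUE)) / sd(s, na.rm = TRUE)
  theta <- qnorm(1 - (1 - CI) / 2)         # two-sided normal threshold
  which(abs(z) > theta)
}

# Toy data (invented for illustration): ten stations along a single road.
D <- c(10, 11, 10, 12, 50, 11, 10, 12, 11, 10)
edges <- cbind(1:9, 2:10)
adj <- build_adjacency(length(D), edges)
flagged <- spatial_outliers(D, adj, ND = 1, CI = 0.95)
print(flagged)  # vertex 5
```

On this toy road, only vertex 5 is flagged. Its two neighbors also differ strongly from their own neighborhoods, but their standardized differences stay inside the bound at CI = 0.95.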
Collective outliers are a collection of data that is abnormal in contrast to the input dataset. As a major characteristic, only the collection of data appearing together is a collective outlier; an individual data instance from that collection, when it does not appear together with the other data in the collection, is not an outlier. Another characteristic of a collective outlier is that it can also be a contextual outlier.
Collective outliers may occur in sequence data, spatial data, and so on.
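The following is a minimal sketch of this idea for sequence data, in base R. It assumes a simple sliding-window test: the mean of every window is standardized against all window means, and windows beyond a threshold are reported as a group. The function name collective_outliers, the window width, the threshold, and the toy sequence are illustrative assumptions, not part of the book's code bundle.

```r
# Return the positions covered by windows whose mean is unusual
# compared with all other windows of the same width.
collective_outliers <- function(x, width = 4, threshold = 2) {
  n_windows <- length(x) - width + 1
  window_mean <- vapply(seq_len(n_windows),
                        function(i) mean(x[i:(i + width - 1)]),
                        numeric(1))
  z <- (window_mean - mean(window_mean)) / sd(window_mean)
  flagged <- which(abs(z) > threshold)
  if (length(flagged) == 0) return(integer(0))
  # Collect every position that belongs to at least one flagged window.
  sort(unique(unlist(lapply(flagged, function(i) i:(i + width - 1)))))
}

# Toy sequence (invented): values alternate between 1 and 3,
# except for an unbroken run of 3s at positions 20 to 26.
x <- rep(c(1, 3), 20)
x[21:26] <- 3

# A pointwise test finds nothing, because 3 is an ordinary value.
point_z <- (x - mean(x)) / sd(x)
any_point_outlier <- any(abs(point_z) > 2)
print(any_point_outlier)  # FALSE

# The run as a whole stands out.
run_positions <- collective_outliers(x)
print(run_positions)      # 20 21 22 23 24 25 26
```

Each value in the run is normal on its own; only their appearing together is anomalous, which is what distinguishes a collective outlier from a point outlier.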