Density-based k-nearest neighbors

To demonstrate how LOF calculates scores, we'll first split the dataset into training and testing sets by using the testCV(int, int) function. The first parameter specifies the number of folds, while the second parameter specifies which fold to return:

// split data to train and test 
Instances trainData = dataset.testCV(2, 0); 
Instances testData = dataset.testCV(2, 1);
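To make the fold parameters concrete, here is a minimal plain-Java sketch of how a dataset of n instances could be cut into contiguous, unshuffled folds. The class FoldSketch and the method foldRange are hypothetical names introduced for illustration. Whether Weka computes its fold boundaries in exactly this way is an assumption, so treat this as an approximation of the idea rather than the library's implementation:

```java
class FoldSketch {

    // Returns {first, size}: the start index and the length of fold `fold`
    // when `numInstances` rows are cut into `numFolds` contiguous folds.
    // Assumption: the first (numInstances % numFolds) folds get one extra row.
    static int[] foldRange(int numInstances, int numFolds, int fold) {
        int base = numInstances / numFolds;
        int remainder = numInstances % numFolds;
        int size = base + (fold < remainder ? 1 : 0);
        int first = fold * base + Math.min(fold, remainder);
        return new int[]{first, size};
    }

    public static void main(String[] args) {
        // Two folds over ten rows: fold 0 covers rows 0..4, fold 1 covers rows 5..9.
        for (int fold = 0; fold < 2; fold++) {
            int[] r = foldRange(10, 2, fold);
            System.out.println("fold " + fold + ": first=" + r[0] + ", size=" + r[1]);
        }
    }
}
```

With two folds, fold 0 and fold 1 are disjoint halves of the data, which is why calling testCV(2, 0) and testCV(2, 1) yields separate training and testing sets. If the rows are ordered (for example, by time or by class), it may be worth shuffling the dataset before splitting.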

The LOF algorithm is not a part of the default Weka distribution, but it can be downloaded through Weka's package manager at http://weka.sourceforge.net/packageMetaData/localOutlierFactor/index.html.

The LOF algorithm has two implemented interfaces: as an unsupervised filter that calculates LOF values (known unknowns), and as a supervised k-nearest neighbors classifier (known knowns). In our case, we want to calculate the outlierness factor, and therefore, we'll use the unsupervised filter interface:

import weka.filters.Filter;
import weka.filters.unsupervised.attribute.LOF;

The filter is initialized in the same way as any other Weka filter. We specify the number of neighbors k (for example, k=3) with the -min and -max options. LOF accepts two k values, which it uses internally as the lower and upper bounds on the neighborhood size when searching for the maximum LOF value; setting both to 3 fixes k at 3:

LOF lof = new LOF();
// set the options before the input format, as is conventional for Weka filters
lof.setOptions(new String[]{"-min", "3", "-max", "3"});
lof.setInputFormat(trainData);

Next, we load the training instances into the filter, where they serve as a library of positive examples. Once loading is complete, we call the batchFinished() method to trigger the internal calculations:

for(Instance inst : trainData){ 
  lof.input(inst); 
} 
lof.batchFinished();

Finally, we can apply the filter to the test data. The static Filter.useFilter() method processes the instances and appends an additional attribute at the end, containing the LOF score. We can then simply print the score to the console:

Instances testDataLofScore = Filter.useFilter(testData, lof); 
 
for(Instance inst : testDataLofScore){ 
  System.out.println(inst.value(inst.numAttributes()-1)); 
}

The LOF scores of the first few test instances are as follows:

    1.306740014927325
    1.318239332210458
    1.0294812291949587
    1.1715039094530768

To understand the LOF values, we need some background on the LOF algorithm. It compares the local density of an instance to the local densities of its nearest neighbors. The LOF score is the ratio of the two: the average density of the neighbors divided by the density of the instance itself. An LOF score of around 1 indicates that the densities are approximately equal, while higher LOF values indicate that the density of the instance is substantially lower than the density of its neighbors. In such cases, the instance can be marked as anomalous.
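The density ratio can be illustrated with a small, self-contained sketch of the textbook LOF definition (k-distance, reachability distance, local reachability density). This is not the code of the Weka package: the class LofSketch and all of its method names are hypothetical, ties at the k-th distance are cut off at exactly k neighbors for simplicity, and the maxLof helper reflects only the assumption that the -min and -max bounds select the largest score over a range of k:

```java
import java.util.Arrays;

class LofSketch {

    // Euclidean distance between two points of equal dimension.
    static double dist(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    // Indices of the k nearest neighbors of point p, excluding p itself.
    // Simplification: ties at the k-th distance are truncated to exactly k.
    static int[] neighbors(double[][] data, int p, int k) {
        Integer[] idx = new Integer[data.length - 1];
        int n = 0;
        for (int i = 0; i < data.length; i++) {
            if (i != p) idx[n++] = i;
        }
        Arrays.sort(idx, (a, b) ->
                Double.compare(dist(data[p], data[a]), dist(data[p], data[b])));
        int[] result = new int[k];
        for (int j = 0; j < k; j++) result[j] = idx[j];
        return result;
    }

    // Distance from point o to its k-th nearest neighbor.
    static double kDistance(double[][] data, int o, int k) {
        int[] nn = neighbors(data, o, k);
        return dist(data[o], data[nn[k - 1]]);
    }

    // Local reachability density: inverse of the mean reachability distance
    // from p to its neighbors, where reach-dist(p, o) = max(k-distance(o), d(p, o)).
    // Note: duplicate points would make the sum zero; this sketch ignores that case.
    static double lrd(double[][] data, int p, int k) {
        double sum = 0.0;
        for (int o : neighbors(data, p, k)) {
            sum += Math.max(kDistance(data, o, k), dist(data[p], data[o]));
        }
        return k / sum;
    }

    // LOF: average neighbor density divided by the density of p itself.
    static double lof(double[][] data, int p, int k) {
        double lrdP = lrd(data, p, k);
        double sum = 0.0;
        for (int o : neighbors(data, p, k)) {
            sum += lrd(data, o, k) / lrdP;
        }
        return sum / k;
    }

    // Assumption: with a range of k values, keep the largest LOF score.
    static double maxLof(double[][] data, int p, int minK, int maxK) {
        double best = Double.NEGATIVE_INFINITY;
        for (int k = minK; k <= maxK; k++) {
            best = Math.max(best, lof(data, p, k));
        }
        return best;
    }

    public static void main(String[] args) {
        // Four corners of a unit square plus one distant point.
        double[][] data = {{0, 0}, {0, 1}, {1, 0}, {1, 1}, {5, 5}};
        System.out.println("corner:  " + lof(data, 0, 2));
        System.out.println("distant: " + lof(data, 4, 2));
        System.out.println("distant, k in [2, 3]: " + maxLof(data, 4, 2, 3));
    }
}
```

In this toy dataset, a corner of the square has the same density as its neighbors and scores exactly 1, while the distant point scores roughly 6, because its neighbors sit in a region about six times denser than its own.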
