Detecting anomalies with the Local Outlier Factor method

Distance-based methods are widely used across machine learning, including for anomaly detection. They assume that the object space has a metric that lets us separate anomalies from normal points. The general assumption is that an anomaly has only a few close neighbors, while a normal point has many. The distance to the kth nearest neighbor can therefore serve as a good measure of how anomalous a point is, and the Local Outlier Factor (LOF) method builds on this idea. LOF estimates the density around each object being checked for anomalies and compares it with the density around that object's neighbors. Objects that lie in regions of much lower density than their neighbors are considered anomalies, or outliers. The advantage of LOF over other methods is that it works with the local density of objects rather than a single global threshold. As a result, LOF can recognize outliers even when the training dataset contains several classes of objects with different densities, none of which is itself anomalous.

Let k-distance(A) be the distance from object A to its kth nearest neighbor. Note that the set of k nearest neighbors includes all objects at this distance, so in the case of ties it can contain more than k objects. We denote this set as Nk(A). The k-distance is used to define the reachability distance:
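In the usual notation for LOF, this can be written as follows. The subscript k and the name dist(A, B) for the underlying metric are notational assumptions made here:

```latex
\text{reachability-distance}_k(A, B) = \max\bigl\{\, \text{k-distance}(B),\ \operatorname{dist}(A, B) \,\bigr\}
```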

If point A lies among the k nearest neighbors of point B, the reachability distance of A from B equals the k-distance of B. Otherwise, it equals the exact distance between A and B, given by the dist function. In other words, it is the larger of the two values. The local reachability density of an object A is defined as follows:
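Under the same notational assumptions, and abbreviating local reachability density as lrd (an abbreviation introduced here), the standard definition is:

```latex
\operatorname{lrd}_k(A) = \left( \frac{\sum_{B \in N_k(A)} \text{reachability-distance}_k(A, B)}{\lvert N_k(A) \rvert} \right)^{-1}
```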

Local reachability density is the inverse of the average reachability distance of object A from its neighbors. Note that this is not the average reachability distance of the neighbors from A (which, by definition, would be k-distance(A)), but the distance at which A can be reached from its neighbors. The local reachability density of A is then compared with the local reachability densities of its neighbors:
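In the standard formulation, with lrd denoting local reachability density and the subscript k as assumed notation, this comparison is the ratio:

```latex
\operatorname{LOF}_k(A)
  = \frac{\sum_{B \in N_k(A)} \dfrac{\operatorname{lrd}_k(B)}{\operatorname{lrd}_k(A)}}{\lvert N_k(A) \rvert}
  = \frac{\sum_{B \in N_k(A)} \operatorname{lrd}_k(B)}{\lvert N_k(A) \rvert \cdot \operatorname{lrd}_k(A)}
```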

The resulting value, the local outlier factor, is the average local reachability density of the neighbors divided by the local reachability density of the object itself. A value of approximately 1 means that the object's density is comparable to that of its neighbors, so it is not an outlier. A value less than 1 indicates that the object lies in a denser region than its neighbors (an inlier), while values significantly larger than 1 indicate anomalies.
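The definitions above (k-distance, reachability distance, local reachability density, and the final density ratio) can be sketched in a few lines of NumPy. This is a minimal brute-force illustration, not an optimized or reference implementation. The function name lof_scores, the use of Euclidean distance as the dist function, and the toy dataset are all assumptions made for the example. It also assumes that the data contains no duplicate points, since duplicates can make the average reachability distance zero.

```python
import numpy as np


def lof_scores(points, k):
    """Return the LOF value of every row in `points` (brute force, O(n^2))."""
    X = np.asarray(points, dtype=float)
    n = len(X)

    # Pairwise Euclidean distances (assumed metric). The diagonal is set to
    # infinity so that a point is never counted as its own neighbor.
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    np.fill_diagonal(dist, np.inf)

    # k-distance(A): distance from A to its kth nearest neighbor.
    k_dist = np.sort(dist, axis=1)[:, k - 1]

    # Nk(A): every point within k-distance(A); ties may give more than k points.
    neighbors = [np.flatnonzero(dist[i] <= k_dist[i]) for i in range(n)]

    # Local reachability density: inverse of the mean reachability distance
    # of A from its neighbors, where reach(A, B) = max(k-distance(B), dist(A, B)).
    lrd = np.empty(n)
    for i, nb in enumerate(neighbors):
        reach = np.maximum(k_dist[nb], dist[i, nb])
        lrd[i] = 1.0 / reach.mean()

    # LOF: mean density of the neighbors divided by the point's own density.
    return np.array([lrd[nb].mean() / lrd[i] for i, nb in enumerate(neighbors)])


# Toy data: a 3x3 grid of points plus one isolated point far from the grid.
grid = [(x, y) for x in range(3) for y in range(3)]
scores = lof_scores(grid + [(10, 10)], k=3)
print(np.round(scores, 2))
```

On this toy data, the grid points get values close to 1, while the isolated point gets a value several times larger, which matches the interpretation given above.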

The disadvantage of this method is that the resulting values are difficult to interpret. A value of 1 or less indicates that a point is clearly an inlier, but there is no clear threshold above which a point becomes an outlier. In one dataset, a value of 1.1 may already indicate an outlier. In another dataset with different parameters (for example, data with sharp local fluctuations in density), a value of 2 may still correspond to an inlier. Such differences can also occur within a single dataset because the method is local.
