Histogram-based anomaly detection

In histogram-based anomaly detection, we split the signal into windows of a selected length, as shown in the following diagram.

For each window, we calculate the histogram; that is, for a selected number of buckets, we count how many values fall into each bucket. The histogram captures the basic distribution of values in a selected time window, as shown in the center of the diagram.
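The windowing and counting step can be sketched as follows. This is a minimal NumPy illustration, not code from the book; the window size, number of buckets, and the synthetic signal are all illustrative choices:

```python
import numpy as np

def window_histograms(signal, window_size, n_buckets):
    """Split a 1-D signal into consecutive windows and compute a
    normalized histogram (bucket fractions) for each window."""
    # Use one common bucket range so histograms are comparable across windows.
    buckets = np.linspace(signal.min(), signal.max(), n_buckets + 1)
    n_windows = len(signal) // window_size
    histograms = []
    for i in range(n_windows):
        window = signal[i * window_size:(i + 1) * window_size]
        counts, _ = np.histogram(window, bins=buckets)
        histograms.append(counts / window_size)  # counts -> fractions
    return np.array(histograms)

# Example: a noisy signal split into 10 windows of 100 samples,
# with 10 buckets per histogram.
rng = np.random.default_rng(0)
signal = rng.normal(loc=50, scale=10, size=1000)
hists = window_histograms(signal, window_size=100, n_buckets=10)
print(hists.shape)  # one 10-bucket histogram per window
```

Each row of `hists` is one window's distribution; normalizing by the window size makes windows of different traffic volumes directly comparable.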

Histograms can then be directly presented as instances, where each bucket corresponds to an attribute. Furthermore, we can reduce the number of attributes by applying a dimensionality-reduction technique, such as Principal Component Analysis (PCA). This allows us to visualize the reduced-dimension histograms in a plot, as shown at the bottom-right of the diagram, where each dot corresponds to a histogram.
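As a hedged sketch of this projection step, using scikit-learn's `PCA` on a hypothetical matrix of normalized histograms (the Dirichlet draws below simply stand in for real histogram rows):

```python
import numpy as np
from sklearn.decomposition import PCA

# Hypothetical histogram matrix: 50 time windows x 10 buckets,
# where each row sums to 1 like a normalized histogram.
rng = np.random.default_rng(1)
hists = rng.dirichlet(np.ones(10), size=50)

# Project the 10-dimensional histograms down to 2 dimensions;
# each projected point is one dot in the scatter plot.
pca = PCA(n_components=2)
points = pca.fit_transform(hists)
print(points.shape)  # (50, 2): one 2-D point per histogram
```

Plotting `points[:, 0]` against `points[:, 1]` reproduces the kind of scatter shown in the diagram, with similar windows clustering together.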

In our example, the idea is to observe website traffic for a couple of days and then create histograms over, for example, four-hour time windows to build a library of positive behavior. If the histogram of a new time window cannot be matched against the positive library, we can mark it as an anomaly.

To compare a new histogram against a set of existing histograms, we will use a density-based k-nearest neighbor algorithm, Local Outlier Factor (LOF) (Breunig et al., 2000). The algorithm can handle clusters with different densities, as shown in the following diagram: for example, the upper-right cluster is large and widespread, while the bottom-left cluster is smaller and denser.
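The matching step can be sketched with scikit-learn's `LocalOutlierFactor`. With `novelty=True`, the model is fit on the library of positive histograms and then scores previously unseen windows; the library and the two query windows below are synthetic stand-ins, and the neighbor count is an illustrative choice:

```python
import numpy as np
from sklearn.neighbors import LocalOutlierFactor

rng = np.random.default_rng(2)
# Library of "positive" histograms: 40 windows x 10 buckets, drawn from
# a tight distribution around uniform traffic (hypothetical normal days).
library = rng.dirichlet(np.ones(10) * 50, size=40)

# novelty=True: fit on the positive library, then score new windows.
lof = LocalOutlierFactor(n_neighbors=10, novelty=True)
lof.fit(library)

normal_window = rng.dirichlet(np.ones(10) * 50)  # shaped like the library
anomalous_window = np.eye(10)[0]                 # all traffic in one bucket
preds = lof.predict([normal_window, anomalous_window])
print(preds)  # 1 marks an inlier, -1 marks an anomaly
```

A window whose local density is far below that of its nearest library histograms receives a high LOF score and is flagged with `-1`, which is exactly the "cannot be matched against the positive library" condition described above.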

Let's get started!
