Anomaly detection in time series data

Detecting anomalies in raw, streaming time series data requires some data transformation. The most obvious way to do this is to select a time window and cut the time series into samples of fixed length. Next, we compare each new sample to our previously collected set of samples to detect whether something is out of the ordinary.
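The windowing step can be sketched in Java as follows. The class name `SlidingWindows` and its `size` and `step` parameters are hypothetical, chosen for illustration. A step equal to the window size gives non-overlapping windows, and a smaller step gives overlapping ones.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper: cuts a series of readings into fixed-length windows. */
class SlidingWindows {

    /**
     * Returns consecutive windows of length size, starting every step values.
     * A trailing partial window is dropped so that every sample has the same length.
     */
    static List<double[]> windows(double[] series, int size, int step) {
        if (size <= 0 || step <= 0) {
            throw new IllegalArgumentException("size and step must be positive");
        }
        List<double[]> result = new ArrayList<>();
        for (int start = 0; start + size <= series.length; start += step) {
            // copyOfRange copies indices start (inclusive) to start + size (exclusive)
            result.add(Arrays.copyOfRange(series, start, start + size));
        }
        return result;
    }
}
```

For a stream, the same logic would typically run over a bounded buffer of the most recent readings rather than a complete array.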

The comparison can be done with various techniques, as follows:

  • Forecasting the most probable next value together with its confidence interval (for example, with Holt-Winters exponential smoothing). If a new value falls outside the forecasted confidence interval, it is considered anomalous.
  • Cross-correlation compares a new sample to a library of positive (normal) samples and looks for a match, that is, a sufficiently high correlation. If no match is found, the sample is marked as anomalous.
  • Dynamic time warping is similar to cross-correlation, but tolerates distortions of the signal, such as local stretching or compression along the time axis, when comparing.
  • Discretizing signals into bands, where each band corresponds to a letter. For example, A=[min, mean/3], B=[mean/3, mean*2/3], and C=[mean*2/3, max] transforms the signal into a sequence of letters, such as AAABAACAABBA.... This approach reduces storage requirements and allows us to apply the text mining algorithms that we will discuss in Chapter 10, Text Mining with Mallet – Topic Modeling and Spam Detection.
  • A distribution-based approach estimates the distribution of values in a specific time window. When we observe a new sample, we can check whether its distribution matches the previously observed one.
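The forecasting approach from the list can be illustrated with a simplified stand-in for Holt-Winters: single exponential smoothing, which has no trend or seasonal component. The band used here is the forecast plus or minus `k` times the root-mean-square of earlier one-step forecast errors, a rough substitute for a proper confidence interval. The class and parameter names (`SmoothingDetector`, `alpha`, `k`, `warmup`) are hypothetical.

```java
/**
 * Hypothetical sketch: flags values outside a forecast band built with
 * single exponential smoothing (a simplified relative of Holt-Winters).
 */
class SmoothingDetector {

    /**
     * Returns one flag per input value. The forecast for x[t] is the smoothed
     * level after x[t-1]; x[t] is flagged when |x[t] - forecast| exceeds
     * k times the RMS of earlier one-step errors. Nothing is flagged until
     * at least warmup errors (minimum 1) have been collected.
     */
    static boolean[] detect(double[] x, double alpha, double k, int warmup) {
        boolean[] anomalous = new boolean[x.length];
        if (x.length == 0) {
            return anomalous;
        }
        int minErrors = Math.max(1, warmup); // avoids dividing by zero below
        double level = x[0];                 // the first value seeds the level
        double sumSquaredErrors = 0.0;
        int errorCount = 0;

        for (int t = 1; t < x.length; t++) {
            double forecast = level;
            double error = x[t] - forecast;

            if (errorCount >= minErrors) {
                double rmsError = Math.sqrt(sumSquaredErrors / errorCount);
                anomalous[t] = Math.abs(error) > k * rmsError;
            }

            // Only normal values update the model, so a spike neither widens
            // the band nor drags the forecast towards itself.
            if (!anomalous[t]) {
                sumSquaredErrors += error * error;
                errorCount++;
                level = alpha * x[t] + (1 - alpha) * level;
            }
        }
        return anomalous;
    }
}
```

Excluding flagged values from the updates keeps a single spike from masking the values that follow it; the trade-off is that a genuine, lasting level shift will keep being flagged until the model is reset.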
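For the cross-correlation approach, a sketch could compute the Pearson correlation between a new sample and each reference sample over a small range of lags, and treat a correlation at or above a threshold as a match (a practical relaxation of an exact match). `CrossCorrelationMatcher`, `maxLag`, and `threshold` are hypothetical names, and the samples are assumed to be windows of similar length.

```java
import java.util.List;

/** Hypothetical sketch: library matching by normalized cross-correlation. */
class CrossCorrelationMatcher {

    /** Pearson correlation of a[i] and b[i + lag] over the indices where both exist. */
    static double correlationAtLag(double[] a, double[] b, int lag) {
        int start = Math.max(0, -lag);
        int end = Math.min(a.length, b.length - lag); // exclusive bound
        int n = end - start;
        if (n < 2) {
            return 0.0; // too little overlap to correlate
        }
        double meanA = 0.0, meanB = 0.0;
        for (int i = start; i < end; i++) {
            meanA += a[i];
            meanB += b[i + lag];
        }
        meanA /= n;
        meanB /= n;
        double cov = 0.0, varA = 0.0, varB = 0.0;
        for (int i = start; i < end; i++) {
            double da = a[i] - meanA;
            double db = b[i + lag] - meanB;
            cov += da * db;
            varA += da * da;
            varB += db * db;
        }
        if (varA == 0.0 || varB == 0.0) {
            return 0.0; // correlation is undefined for a flat segment
        }
        return cov / Math.sqrt(varA * varB);
    }

    /** Highest correlation over lags -maxLag..maxLag. */
    static double bestCorrelation(double[] sample, double[] reference, int maxLag) {
        double best = -1.0;
        for (int lag = -maxLag; lag <= maxLag; lag++) {
            best = Math.max(best, correlationAtLag(sample, reference, lag));
        }
        return best;
    }

    /** A sample is anomalous when no reference in the library matches it well enough. */
    static boolean isAnomalous(double[] sample, List<double[]> library, int maxLag, double threshold) {
        for (double[] reference : library) {
            if (bestCorrelation(sample, reference, maxLag) >= threshold) {
                return false;
            }
        }
        return true;
    }
}
```

Searching over lags lets a reference pattern match a sample even if the two windows start at slightly different phases.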
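Dynamic time warping can be sketched with the classic dynamic-programming recurrence, using the absolute difference as the local cost. `DynamicTimeWarping.distance` is a hypothetical helper. In a detector, a new sample would presumably be considered anomalous when its smallest DTW distance to the library of normal samples exceeds some threshold, analogous to the matching step of the cross-correlation approach in the list.

```java
import java.util.Arrays;

/** Hypothetical sketch: classic dynamic time warping distance. */
class DynamicTimeWarping {

    /**
     * Returns the minimum total cost of aligning a with b, where each step
     * may advance in a, in b, or in both. O(n * m) time and memory.
     */
    static double distance(double[] a, double[] b) {
        int n = a.length;
        int m = b.length;
        // d[i][j] = cheapest alignment of the first i values of a with the first j values of b
        double[][] d = new double[n + 1][m + 1];
        for (double[] row : d) {
            Arrays.fill(row, Double.POSITIVE_INFINITY);
        }
        d[0][0] = 0.0;
        for (int i = 1; i <= n; i++) {
            for (int j = 1; j <= m; j++) {
                double cost = Math.abs(a[i - 1] - b[j - 1]);
                double bestPrevious = Math.min(d[i - 1][j - 1], Math.min(d[i - 1][j], d[i][j - 1]));
                d[i][j] = cost + bestPrevious;
            }
        }
        return d[n][m];
    }
}
```

Because the alignment may repeat points, `{0, 1, 2, 3}` and `{0, 1, 1, 2, 3}` are at distance 0 even though they have different lengths, which is the kind of distortion tolerance the list describes.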
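The band discretization can be sketched directly from the cut points given in the list (mean/3 and mean*2/3), which implicitly assume a non-negative signal with a positive mean. `BandDiscretizer` is a hypothetical name. Because the intervals in the text share their endpoints, a value equal to a cut point is assigned to the higher band here, which is an arbitrary choice.

```java
/** Hypothetical sketch: maps each value to the letter of its band. */
class BandDiscretizer {

    /** Returns 'A' below mean/3, 'B' below mean*2/3, and 'C' otherwise. */
    static String discretize(double[] x) {
        double mean = 0.0;
        for (double v : x) {
            mean += v;
        }
        mean /= x.length; // NaN for an empty array, but then the loop below never runs
        double low = mean / 3;
        double high = mean * 2 / 3;
        StringBuilder letters = new StringBuilder(x.length);
        for (double v : x) {
            letters.append(v < low ? 'A' : v < high ? 'B' : 'C');
        }
        return letters.toString();
    }
}
```

With the cut points taken literally from the text, most values of a typical signal land in band C; in practice the cut points would likely be tuned, for example from quantiles of the training data.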

This list is by no means exhaustive. Different approaches target different kinds of anomalies (for example, anomalies in value, frequency, or distribution). We will focus on a version of distribution-based approaches in this chapter.
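One simple way to realize a distribution-based comparison is to build normalized histograms over a shared, fixed value range for a reference window and a new window, and to measure their difference with the total variation distance (half the L1 distance between the two histograms, ranging from 0 to 1). This is an illustrative choice, not necessarily the variant this chapter develops; `HistogramComparator`, the bin range, and the threshold are hypothetical.

```java
/** Hypothetical sketch: compares value distributions of two windows via histograms. */
class HistogramComparator {

    /**
     * Normalized histogram of x over [lo, hi) with equal-width bins.
     * Values outside the range are clamped into the first or last bin.
     */
    static double[] histogram(double[] x, double lo, double hi, int bins) {
        if (bins <= 0 || hi <= lo) {
            throw new IllegalArgumentException("need bins > 0 and hi > lo");
        }
        double[] h = new double[bins];
        if (x.length == 0) {
            return h; // empty window: all-zero histogram
        }
        double width = (hi - lo) / bins;
        for (double v : x) {
            int b = (int) Math.floor((v - lo) / width);
            b = Math.max(0, Math.min(bins - 1, b));
            h[b] += 1.0;
        }
        for (int i = 0; i < bins; i++) {
            h[i] /= x.length; // counts become proportions
        }
        return h;
    }

    /** Total variation distance: half the L1 distance between two probability vectors. */
    static double totalVariation(double[] p, double[] q) {
        double sum = 0.0;
        for (int i = 0; i < p.length; i++) {
            sum += Math.abs(p[i] - q[i]);
        }
        return 0.5 * sum;
    }

    /** The new window is anomalous when its distribution differs too much from the reference. */
    static boolean isAnomalous(double[] reference, double[] window,
                               double lo, double hi, int bins, double threshold) {
        double[] p = histogram(reference, lo, hi, bins);
        double[] q = histogram(window, lo, hi, bins);
        return totalVariation(p, q) > threshold;
    }
}
```

Using the same range and bins for both windows is what makes the histograms comparable bin by bin; the number of bins trades sensitivity against noise from small window sizes.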
