Anomaly detection

Anomalies are abnormal patterns in a series: irregular deviations from the expected behavior. Consider a cricket match. One way for a batsman to get out is to be caught, which requires the ball to touch the bat before it carries to the hands of a fielder. In a very noisy stadium, it can be hard for anyone to judge whether the ball has touched the bat. To help make the call, umpires use a device called the snickometer, which plots the sound picked up by the stump mic. If the plot stays flat, the ball did not make contact with the bat; if it did, the plot shows a spike. The spike is therefore the sign of an anomaly. Another example of an anomaly is a malignant tumor detected in a scan.

Anomaly detection is a technique for identifying aberrant behavior. An anomaly is also called an outlier. Anomalies are commonly grouped into the following types:

  • Point anomalies: A point anomaly is a single observation that breaches a threshold that has been set to keep the whole system in check. There is often a system in place to send an alert when this boundary is breached. For example, fraud detection in the financial industry can use point anomaly detection to check whether a transaction has taken place in a different city from the card holder's usual location.
  • Contextual anomalies: A contextual anomaly is an observation that is abnormal only in a specific context. For example, heavy traffic is commonplace on weekdays, so the light traffic on a holiday that falls on a Monday looks like an anomaly.
  • Collective anomalies: A collection of related data instances is anomalous as a group, even if each instance looks normal on its own. Say that someone is unexpectedly trying to copy data from a remote machine to a local host. This sequence of operations would be flagged as a potential cyber attack.
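
To make the difference between a point anomaly and a contextual anomaly concrete, here is a minimal sketch. The traffic counts, the alert limit of 300, and the 60% cut-off are all invented for illustration; they are assumptions, not values from any dataset used in this chapter.

```python
import pandas as pd

# Hypothetical daily traffic counts for two weeks (invented for illustration).
# 2023-01-02 is a Monday; the second Monday (2023-01-09) plays the holiday.
idx = pd.date_range("2023-01-02", periods=14, freq="D")
traffic = pd.Series(
    [980, 1010, 990, 1005, 1000, 400, 380,
     420, 1000, 995, 1010, 990, 410, 390],
    index=idx,
)

# Point anomaly: a value that breaches a fixed, global threshold.
LOWER_LIMIT = 300  # assumed alert threshold
point_anomalies = traffic[traffic < LOWER_LIMIT]

# Contextual anomaly: a value that is unusual for its context.
# Here the context is "weekday", and we compare against the weekday median.
is_weekday = traffic.index.dayofweek < 5
typical_weekday = traffic[is_weekday].median()
contextual_anomalies = traffic[(traffic < 0.6 * typical_weekday) & is_weekday]

print(point_anomalies)        # empty: nothing falls below the global limit
print(contextual_anomalies)   # only the holiday Monday is flagged
```

The holiday Monday is not a point anomaly, because weekend traffic is just as low, but it stands out once it is compared only with other weekdays.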

In this section, we will focus on contextual anomalies and try to detect them with the help of a simple moving average.

First, let's load all the required libraries as follows:

import numpy as np # vectors and matrices
import pandas as pd # tables and data manipulations
import matplotlib.pyplot as plt # plots
import seaborn as sns # more plots
from sklearn.metrics import mean_absolute_error
import warnings # `do not disturb` mode
warnings.filterwarnings('ignore')
%matplotlib inline

Next, we read the dataset using the following code. We keep the same dataset, namely AirPassengers.csv:

data = pd.read_csv('AirPassengers.csv', index_col=['Month'], parse_dates=['Month'])
plt.figure(figsize=(20, 10))
plt.plot(data)
plt.title('Trend')
plt.grid(True)
plt.show()

We get the output as follows:

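Now we will write a function that plots the moving average and builds a threshold band around it for detecting anomalies. The band is the rolling mean plus or minus a margin, where the margin is the mean absolute error plus scale times the standard deviation of the residuals. Points that fall outside the band are marked as anomalies: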

def plotMovingAverage(series, window, plot_intervals=False, scale=1.96, plot_anomalies=False):
    # series: DataFrame with a time index; window: rolling window size
    rolling_mean = series.rolling(window=window).mean()

    plt.figure(figsize=(15,5))
    plt.title("Moving average window size = {}".format(window))
    plt.plot(rolling_mean, "g", label="Rolling mean trend")

    # Plot confidence intervals for smoothed values
    if plot_intervals:
        mae = mean_absolute_error(series[window:], rolling_mean[window:])
        deviation = np.std(series[window:] - rolling_mean[window:])
        lower_bound = rolling_mean - (mae + scale * deviation)
        upper_bound = rolling_mean + (mae + scale * deviation)
        plt.plot(upper_bound, "r--", label="Upper bound / Lower bound")
        plt.plot(lower_bound, "r--")

        # Having the intervals, find abnormal values
        if plot_anomalies:
            anomalies = pd.DataFrame(index=series.index, columns=series.columns)
            anomalies[series<lower_bound] = series[series<lower_bound]
            anomalies[series>upper_bound] = series[series>upper_bound]
            plt.plot(anomalies, "ro", markersize=10)

    plt.plot(series[window:], label="Actual values")
    plt.legend(loc="upper left")
    plt.grid(True)
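
The thresholding logic in the function above can also be separated from the plotting, which makes it easier to inspect the flagged points. The following is a minimal sketch under stated assumptions: the helper name moving_average_anomalies is hypothetical, it takes a pandas Series rather than a DataFrame, and it runs on a synthetic series rather than on the AirPassengers data.

```python
import numpy as np
import pandas as pd

def moving_average_anomalies(series, window, scale=1.96):
    """Return the points of a Series that fall outside the band
    rolling_mean +/- (MAE + scale * std of the residuals).
    Hypothetical helper written for illustration only."""
    rolling_mean = series.rolling(window=window).mean()
    # Residuals after the warm-up period, mirroring the [window:] slices above
    residuals = (series - rolling_mean).iloc[window:]
    mae = residuals.abs().mean()
    deviation = residuals.std(ddof=0)  # population std, NumPy's default
    margin = mae + scale * deviation
    lower, upper = rolling_mean - margin, rolling_mean + margin
    # Comparisons against the NaN warm-up values are False, so they are never flagged
    return series[(series < lower) | (series > upper)]

# Synthetic monthly series with a linear trend (an assumption, not real data)
idx = pd.date_range("2000-01-01", periods=48, freq="MS")
values = pd.Series(100 + np.arange(48, dtype=float), index=idx)
values.iloc[30] *= 0.2  # inject a sharp dip

flagged = moving_average_anomalies(values, window=4)
print(flagged)
```

On this synthetic series, only the injected dip falls outside the band. The months right after the dip have a depressed rolling mean, but their residuals stay within the margin.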

Now, let's introduce anomalies to the series using the following:

data_anomaly = data.copy()
data_anomaly.iloc[-20] = data_anomaly.iloc[-20] * 0.2
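
To see which observation iloc[-20] actually modifies, the following sketch repeats the same step on a stand-in frame. It assumes the index holds 144 monthly observations starting in January 1949; the column name Passengers and the values are placeholders, not the real data.

```python
import numpy as np
import pandas as pd

# Stand-in for the dataset: assumed monthly index, placeholder values
idx = pd.date_range("1949-01-01", periods=144, freq="MS")
demo = pd.DataFrame({"Passengers": 100 + np.arange(144, dtype=float)}, index=idx)

demo_anomaly = demo.copy()
demo_anomaly.iloc[-20] = demo_anomaly.iloc[-20] * 0.2  # same step as above

# Rows that differ between the original and the modified copy
changed = demo.index[(demo != demo_anomaly).any(axis=1)]
print(changed)
```

Under these assumptions, exactly one row changes: the 20th observation from the end, which is May 1959.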

Now, let's plot it to detect the anomalies introduced using the following code:

plotMovingAverage(data_anomaly, 4, plot_intervals=True, plot_anomalies=True)

The following diagram shows the output:

The introduced anomaly can now be seen in 1959 as a sharp dip in the number of travelers that falls below the lower bound. It should be noted, however, that this is one of the simpler methods. ARIMA and Holt-Winters can also be used in this scenario.
