Defining the problem

In data mining, anomaly detection (or outlier detection) is the identification of items, events, or observations that do not conform to an expected pattern, or to the other items, in a dataset. Such observations are sometimes referred to as rare events. They raise suspicion and typically translate to some kind of problem that requires deeper attention and needs to be addressed. Common examples include bank fraud, structural defects, medical conditions, or simply mistakes in a text.

Anomalous items that raise suspicions by differing significantly from the majority of the data may also be referred to as outliers, novelties, noise, deviations, and exceptions.

Anomaly detection is a technique used to identify unusual patterns that do not conform to expected behavior. You will routinely see anomaly detection techniques used in areas such as intrusion detection, system-health monitoring, and fraud detection.

No matter what the application might be, it is critical to first establish and understand certain baselines and boundaries that will define what an anomaly is (for that application).
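
As a minimal sketch of what establishing a baseline and a boundary can look like, the following Python example summarizes a sample of behavior assumed to be normal by its mean and standard deviation, then flags any new value that falls more than k standard deviations away. The function names, the sample amounts, and the choice of k = 3 are illustrative assumptions, not part of any particular library or a recommended setting.

```python
from statistics import mean, stdev


def fit_baseline(values):
    """Summarize a sample assumed to represent normal behavior as (mean, std dev)."""
    return mean(values), stdev(values)


def is_anomaly(x, baseline, k=3.0):
    """Flag x if it lies more than k standard deviations from the baseline mean."""
    mu, sigma = baseline
    return abs(x - mu) > k * sigma


# Hypothetical transaction amounts considered normal for one account.
normal_amounts = [42.0, 38.5, 45.0, 40.0, 41.5, 39.0, 44.0, 43.0]
baseline = fit_baseline(normal_amounts)

# Check three new amounts against the boundary defined by the baseline.
flags = [is_anomaly(x, baseline) for x in [41.0, 47.0, 400.0]]
print(flags)  # [False, False, True]
```

The boundary here is application-specific: a different account, sensor, or system would need its own baseline sample and possibly a different value of k.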

Anomalies are generally categorized as point (a single data point that differs too much from most others), contextual (a data point that is anomalous only in a specific situation or context), or collective (a set of related data points that is anomalous as a group, even if each point looks normal on its own).
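
To illustrate the difference between a point anomaly and a contextual anomaly, here is a small sketch using made-up temperature readings. The same reading of 30 degrees is unremarkable against the pooled data and against the summer baseline, but stands out against the winter baseline. The seasons, the readings, and the three-standard-deviation rule are all assumptions chosen for illustration.

```python
from statistics import mean, stdev

# Hypothetical historical temperatures (degrees Celsius), grouped by context.
history = {
    "summer": [28.0, 30.0, 31.0, 29.0, 32.0, 30.0],
    "winter": [2.0, 4.0, 1.0, 3.0, 5.0, 3.0],
}

# One baseline per context, plus one pooled baseline that ignores context.
baselines = {season: (mean(v), stdev(v)) for season, v in history.items()}
pooled = [t for temps in history.values() for t in temps]
pooled_baseline = (mean(pooled), stdev(pooled))


def deviates(x, baseline, k=3.0):
    """True if x is more than k standard deviations from the baseline mean."""
    mu, sigma = baseline
    return abs(x - mu) > k * sigma


reading = 30.0
is_point_anomaly = deviates(reading, pooled_baseline)
anomalous_in_summer = deviates(reading, baselines["summer"])
anomalous_in_winter = deviates(reading, baselines["winter"])

print(is_point_anomaly, anomalous_in_summer, anomalous_in_winter)
# False False True
```

A collective anomaly would need a different check again, one that scores a group or sequence of readings rather than each reading individually.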

It's important to grasp that anomaly detection is comparable to, but distinct from, noise removal and novelty detection. Novelty detection is the process of identifying a previously unobserved pattern in new observations that were not included in the training data, while noise removal is the process of removing unwanted observations from otherwise meaningful data.
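
The distinction can be sketched in a few lines of Python. In this example, noise removal filters unwanted values out of an existing dataset, while novelty detection scores a new observation against training data that is assumed to be clean. Both functions use the median and the median absolute deviation (MAD) as a robust yardstick. This is one simple choice among many, and the function names, the data, and the threshold k are hypothetical.

```python
from statistics import median

MAD_SCALE = 1.4826  # makes the MAD comparable to a standard deviation for normal data


def robust_limits(values, k=3.0):
    """Return (median, allowed distance from the median) for a sample."""
    med = median(values)
    mad = median(abs(v - med) for v in values)
    return med, k * MAD_SCALE * mad


def remove_noise(values, k=3.0):
    """Noise removal: drop observations that lie too far from the median."""
    med, limit = robust_limits(values, k)
    return [v for v in values if abs(v - med) <= limit]


def is_novel(x, training, k=3.0):
    """Novelty detection: does a new observation fall outside the training range?"""
    med, limit = robust_limits(training, k)
    return abs(x - med) > limit


# Hypothetical sensor readings containing one spurious value.
raw = [10.1, 9.8, 10.0, 55.0, 10.2, 9.9, 10.0]
clean = remove_noise(raw)
print(clean)  # [10.1, 9.8, 10.0, 10.2, 9.9, 10.0]

# New observations that were not part of the training data.
print(is_novel(10.3, clean), is_novel(12.0, clean))  # False True
```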
