Detecting novelty in text, topic detection, and mining contextual outliers

If each data instance in the training dataset is associated with a specific context attribute, then an instance that deviates dramatically from what its context predicts is termed a contextual outlier. This assumption has many applications.
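As a minimal sketch of this idea, the following R snippet flags a value that is unremarkable globally but unusual for its context. The data (monthly temperatures), the variable names, and the cutoff of 4 robust deviations are all illustrative assumptions, not part of the book's code bundle.

```r
# Hypothetical data: temperature (behavior) depends on month (context).
set.seed(42)
month <- rep(1:12, times = 10)
temp  <- 15 - 12 * cos(2 * pi * (month - 1) / 12) +
  rnorm(length(month), sd = 1.5)

# Inject one contextual outlier: 24 degrees is ordinary in July,
# but far from what is expected in January (month 1).
temp[1] <- 24

# Expected behavior given the context: the per-month median.
expected <- ave(temp, month, FUN = median)

# Deviation from the context, scaled by a robust spread estimate.
resid <- temp - expected
score <- abs(resid) / mad(resid)

# Assumed cutoff: more than 4 robust deviations from the context.
contextual_outliers <- which(score > 4)

# For comparison, a context-free z-score of the injected point.
global_z <- abs(temp[1] - mean(temp)) / sd(temp)

print(contextual_outliers)
print(global_z)
```

The injected point receives a small global z-score because 24 degrees lies well inside the overall range of the data; it only stands out once the month is taken into account.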

The conditional anomaly detection (CAD) algorithm

The pseudocode of the CAD algorithm is summarized as follows:

[Figure: pseudocode of the CAD algorithm]
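The general idea of scoring behavior conditionally on context can be sketched in base R as follows. This is a deliberately simplified stand-in, not the CAD algorithm itself: hard k-means clusters replace a mixture model over the context, a single robust Gaussian per cluster models the behavior, and the data and names (`u` for context, `v` for behavior) are hypothetical.

```r
set.seed(1)
n <- 300

# Hypothetical context attribute u with two regimes, and a behavior
# attribute v whose typical value depends on the regime.
u <- c(rnorm(n / 2, mean = 0,  sd = 0.5), rnorm(n / 2, mean = 10, sd = 0.5))
v <- c(rnorm(n / 2, mean = 5,  sd = 1),   rnorm(n / 2, mean = 50, sd = 1))

# Inject an anomaly: the context says regime 1, but the behavior looks
# like regime 2. The value 50 is common overall, just not in this context.
v[1] <- 50

# Step 1: group the context values (assumption: hard k-means clusters
# stand in for a probabilistic model of the context).
km <- kmeans(matrix(u, ncol = 1), centers = 2, nstart = 10)

# Step 2: within each context cluster, fit a Gaussian to the behavior
# using robust estimates (median and MAD).
# Step 3: score each point by the log-density of its behavior given
# its context cluster; low values are conditionally anomalous.
loglik <- numeric(n)
for (k in 1:2) {
  idx <- km$cluster == k
  loglik[idx] <- dnorm(v[idx], mean = median(v[idx]),
                       sd = mad(v[idx]), log = TRUE)
}

most_anomalous <- which.min(loglik)
print(most_anomalous)
print(head(order(loglik), 3))  # the three lowest conditional scores
```

Median and MAD are used instead of mean and standard deviation so that the injected anomaly does not distort the model it is scored against.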

The pseudocode of the GMM-CAD-Full algorithm is summarized as follows:

[Figure: pseudocode of the GMM-CAD-Full algorithm]

The pseudocode of the GMM-CAD-Split algorithm is summarized as follows:

[Figure: pseudocode of the GMM-CAD-Split algorithm]

The R implementation

The R code for the previously mentioned algorithms is in the file ch_07_contextual_based.R in the code bundle. The code can be tested with the following command:

> source("ch_07_contextual_based.R")

Detecting novelty in text and topic detection

One application of outlier detection is finding novel topics in a collection of documents or newspaper articles. A major use is opinion detection, where a novel opinion is essentially an outlier among a large number of opinions.

With the growth of social media, many events are reported every day. Earlier collections covered only special events or ideas that were of interest to particular researchers or companies.

As these collections grow, they are characterized by varied data sources, documents in different formats, high-dimensional attributes, and sparse source data.
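One common way to treat a novel topic as an outlier is to represent each document as a sparse TF-IDF vector and measure how dissimilar it is from every earlier document. The base R sketch below illustrates this under stated assumptions: the four documents are invented, tokenization is a plain whitespace split, and the IDF weights are computed over the whole toy corpus rather than incrementally as a real stream would require.

```r
# Hypothetical mini-corpus, in arrival order; the last document is off-topic.
docs <- c(
  "stock market rises as investors buy shares",
  "investors sell shares as stock market falls",
  "stock market closes higher on strong shares",
  "volcano eruption forces evacuation of island village"
)

# Tokenize on whitespace and build the vocabulary.
tokens <- strsplit(tolower(docs), "\\s+")
vocab  <- sort(unique(unlist(tokens)))

# Term-frequency matrix: one row per document, one column per term.
tf <- t(vapply(
  tokens,
  function(tk) as.numeric(table(factor(tk, levels = vocab))),
  numeric(length(vocab))
))
colnames(tf) <- vocab

# Smoothed inverse document frequency, then TF-IDF weights.
df    <- colSums(tf > 0)
idf   <- log((1 + nrow(tf)) / (1 + df)) + 1
tfidf <- sweep(tf, 2, idf, `*`)

# Cosine similarity between all pairs of documents.
unit <- tfidf / sqrt(rowSums(tfidf^2))
sim  <- tcrossprod(unit)

# Novelty of document i: 1 minus its highest similarity to any earlier
# document. The first document has nothing to be compared with.
novelty <- vapply(seq_along(docs), function(i) {
  if (i == 1) NA_real_ else 1 - max(sim[i, seq_len(i - 1)])
}, numeric(1))

print(round(novelty, 3))
print(which.max(novelty))
```

A document would be reported as a new topic when its novelty exceeds a chosen threshold; the threshold itself depends on the corpus and is left open here.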
