Chapter 5. Real-Time Stream Machine Learning

In Chapter 2, Practical Approach to Real-World Supervised Learning, Chapter 3, Unsupervised Machine Learning Techniques, and Chapter 4, Semi-Supervised and Active Learning, we discussed various techniques for classification, clustering, outlier detection, semi-supervised learning, and active learning. This form of learning, done from existing or historical data, is traditionally known as batch learning.

All of these algorithms or techniques assume three things, namely:

  • Finite training data is available for building the models.
  • The learned model is static; that is, the patterns do not change.
  • The data distribution also remains the same.

In many real-world data scenarios, there is either no training data available a priori or the data is dynamic in nature; that is, it changes continuously over time. Many real-world applications also deal with data that is transient in nature and arrives at high velocity or in high volume, such as IoT sensor readings, network monitoring events, and Twitter feeds. The requirement here is to learn from each instance as it arrives and then update the model immediately.

The dynamic nature of the data and its potentially changing distribution render existing batch-based algorithms and techniques generally unsuitable for such tasks. This gave rise to adaptive, updatable, or incremental learning algorithms in machine learning. These techniques can learn continuously from data streams. In many cases, the chief obstacle to learning from Big Data, namely that the entire dataset cannot fit into memory, can also be overcome by converting the Big Data learning problem into an incremental learning problem and inspecting one example at a time.
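
To make the one-example-at-a-time idea concrete, the following is a minimal sketch of an incremental learner in Java: a simple online perceptron trained in a test-then-train (prequential) loop. The class name, the synthetic stream, and the learning rate are illustrative assumptions for this sketch, not part of MOA or any other library:

import java.util.Random;

public class OnlinePerceptron {

    private final double[] weights;
    private double bias;
    private final double learningRate = 0.1;

    OnlinePerceptron(int numFeatures) {
        this.weights = new double[numFeatures];
    }

    int predict(double[] x) {
        double sum = bias;
        for (int i = 0; i < weights.length; i++) {
            sum += weights[i] * x[i];
        }
        return sum >= 0 ? 1 : -1;
    }

    // Update the model from a single example, then discard the example;
    // memory use is constant in the number of examples seen.
    void trainOnInstance(double[] x, int label) {
        if (predict(x) != label) {
            for (int i = 0; i < weights.length; i++) {
                weights[i] += learningRate * label * x[i];
            }
            bias += learningRate * label;
        }
    }

    public static void main(String[] args) {
        Random rng = new Random(42);
        OnlinePerceptron learner = new OnlinePerceptron(2);
        int correct = 0;
        int seen = 0;

        // Simulated stream: examples arrive one at a time; the true
        // concept is the linear boundary x0 + x1 > 1.
        for (int t = 0; t < 100_000; t++) {
            double[] x = {rng.nextDouble(), rng.nextDouble()};
            int label = (x[0] + x[1] > 1.0) ? 1 : -1;

            // Prequential (test-then-train) evaluation: predict first,
            // then train on the same example.
            if (learner.predict(x) == label) {
                correct++;
            }
            seen++;
            learner.trainOnInstance(x, label);
        }
        System.out.printf("Prequential accuracy: %.3f%n", (double) correct / seen);
    }
}

Note that the learner never stores past examples; each instance is used once for evaluation and once for a model update, and is then discarded.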

In this chapter, we will state the assumptions behind stream learning and cover different techniques in supervised and unsupervised learning that facilitate real-time or stream machine learning. We will use the open source library Massive Online Analysis (MOA) to perform a real-world case study.

The major sections of this chapter are:

  • Assumptions and mathematical notations
  • Basic stream processing and computational techniques: A discussion of stream computations, sliding windows including the ADWIN algorithm, and sampling.
  • Concept drift and drift detection: Introduces learning in evolving systems and covers data management, detection methods, and implicit and explicit adaptation.
  • Incremental supervised learning: A discussion of learning from labeled stream data and of modeling techniques including linear, non-linear, and ensemble algorithms, followed by validation, evaluation, and model comparison methods.
  • Incremental unsupervised learning: Clustering techniques similar to those discussed in Chapter 3, Unsupervised Machine Learning Techniques, including validation and evaluation techniques.
  • Unsupervised learning using outlier detection: Partition-based and distance-based techniques, and the validation and evaluation techniques used.
  • Case study for stream-based learning: Introduces the MOA framework, presents the business problem, feature analysis, and the mapping to the machine learning blueprint; describes the experiments; and concludes with the presentation and analysis of the results.

Assumptions and mathematical notations

There are some key assumptions made by many stream machine learning techniques, and we will state them explicitly here:

  • The number of features in the data is fixed.
  • Data has small to medium dimensionality, that is, number of features, typically in the hundreds.
  • The number of examples, or training data points, can be infinite or very large, typically in the millions or billions.
  • The number of class labels in supervised learning, or of clusters, is small and finite, typically less than 10.
  • Normally, there is an upper bound on memory; that is, we cannot fit all the data into memory, so learning from the data must take this into account, especially for lazy learners such as K-Nearest-Neighbors (see the sketch after this list).
  • Normally, there is an upper bound on the time taken to process an event or data instance, typically a few milliseconds.
  • The patterns or distributions in the data can evolve over time.
  • Learning algorithms must converge to a solution in finite time.
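
The memory bound is particularly important for lazy learners, which would otherwise store every example they see. The following is a minimal sketch of a K-Nearest-Neighbors learner that keeps only a fixed-size sliding window of recent examples; the window size, the value of k, the Euclidean distance, and the assumption of binary labels in {-1, +1} are illustrative choices for this sketch, not from any particular library:

import java.util.ArrayDeque;
import java.util.Comparator;
import java.util.Deque;
import java.util.PriorityQueue;

public class SlidingWindowKnn {

    static final class LabeledPoint {
        final double[] x;
        final int y; // assumed to be -1 or +1
        LabeledPoint(double[] x, int y) { this.x = x; this.y = y; }
    }

    private final Deque<LabeledPoint> window = new ArrayDeque<>();
    private final int windowSize;
    private final int k;

    SlidingWindowKnn(int windowSize, int k) {
        this.windowSize = windowSize;
        this.k = k;
    }

    // Adding an example may evict the oldest one, so memory stays bounded
    // by windowSize regardless of how long the stream runs.
    void trainOnInstance(double[] x, int y) {
        if (window.size() == windowSize) {
            window.removeFirst();
        }
        window.addLast(new LabeledPoint(x, y));
    }

    int predict(double[] query) {
        // Max-heap on distance: the head is the farthest of the k kept so far.
        PriorityQueue<LabeledPoint> nearest = new PriorityQueue<>(
                Comparator.comparingDouble((LabeledPoint p) -> -distance(p.x, query)));
        for (LabeledPoint p : window) {
            nearest.add(p);
            if (nearest.size() > k) {
                nearest.poll(); // drop the farthest neighbor
            }
        }
        int votes = 0;
        for (LabeledPoint p : nearest) {
            votes += p.y; // majority vote over labels in {-1, +1}
        }
        return votes >= 0 ? 1 : -1;
    }

    private static double distance(double[] a, double[] b) {
        double sum = 0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    public static void main(String[] args) {
        SlidingWindowKnn knn = new SlidingWindowKnn(1000, 3);
        knn.trainOnInstance(new double[]{0.1, 0.2}, -1);
        knn.trainOnInstance(new double[]{0.9, 0.8}, +1);
        knn.trainOnInstance(new double[]{0.95, 0.7}, +1);
        System.out.println(knn.predict(new double[]{0.85, 0.9})); // prints 1
    }
}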

Let Dₜ = {(xᵢ, yᵢ) : yᵢ = f(xᵢ)} be the given data available at time t ∈ {1, 2, …, i}.

An incremental learning algorithm produces a sequence of models/hypotheses {…, Gⱼ₋₁, Gⱼ, Gⱼ₊₁, …} for the sequence of data {…, Dⱼ₋₁, Dⱼ, Dⱼ₊₁, …}, where the model/hypothesis Gᵢ depends only on the previous hypothesis Gᵢ₋₁ and the current data Dᵢ.
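
This dependence can be captured in a single functional signature. The following hypothetical Java interface merely restates the definition above in code; the type parameters H (hypothesis) and D (data) are placeholders, not types from any library:

// A hypothetical interface restating the definition above: the new
// hypothesis G_i is a function of only the previous hypothesis G_{i-1}
// and the current data D_i; earlier data is never revisited.
interface IncrementalLearner<H, D> {
    // Produces G_i from G_{i-1} and D_i.
    H update(H previousHypothesis, D currentData);
}

By contrast, a batch learner would need access to all of D₁, …, Dᵢ to produce Gᵢ, which is exactly what the memory and time bounds stated above rule out.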
