Summary

The assumptions in stream-based learning differ from those in batch-based learning, chief among them being upper bounds on operating memory and computation time. To scale to a potentially infinite stream of data, running statistics must be computed using sliding windows or sampling. We distinguish between learning from stationary data, where the generating distribution is assumed to be constant, and learning from dynamic or evolving data, where concept drift must be accounted for. Drift is detected either by monitoring changes in model performance or by monitoring changes in the data distribution, and explicit and implicit adaptation methods are then used to adjust to the concept change.
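
As a reminder of what a bounded-memory running statistic looks like, the following is a minimal sketch of a sliding-window mean. The class name `SlidingWindowMean` and its methods are hypothetical and are not part of MOA; the sketch only assumes that the most recent `capacity` values are the ones worth keeping.

```java
import java.util.ArrayDeque;

/**
 * Illustrative sketch (hypothetical class, not a MOA API):
 * keeps the last `capacity` values and their sum, so the mean
 * is updated in O(1) time and O(capacity) memory per instance.
 */
final class SlidingWindowMean {
    private final int capacity;
    private final ArrayDeque<Double> window = new ArrayDeque<>();
    private double sum = 0.0;

    SlidingWindowMean(int capacity) {
        if (capacity <= 0) {
            throw new IllegalArgumentException("capacity must be positive");
        }
        this.capacity = capacity;
    }

    /** Adds a new value and evicts the oldest one once the window is full. */
    void add(double x) {
        window.addLast(x);
        sum += x;
        if (window.size() > capacity) {
            sum -= window.removeFirst();
        }
    }

    /** Mean of the values currently in the window (0.0 if empty). */
    double mean() {
        return window.isEmpty() ? 0.0 : sum / window.size();
    }

    int size() {
        return window.size();
    }
}
```

Comparing the mean of a recent window against a longer-term reference is one simple way such a statistic could feed a drift monitor, although the detectors discussed in the chapter are more elaborate than this.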

Several supervised and unsupervised learning methods have been adapted for incremental online learning. Supervised methods include linear, non-linear, and ensemble techniques. The HoeffdingTree is particularly interesting, largely because of its guarantees on upper bounds on error rates. Model validation techniques such as prequential evaluation are adaptations unique to incremental learning. For stationary supervised learning, evaluation measures are similar to those used in batch-based learning; other measures are used for evolving data streams.
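
The guarantees of the HoeffdingTree rest on the Hoeffding bound: after n independent observations of a variable with range R, the observed mean is within epsilon = sqrt(R^2 ln(1/delta) / (2n)) of the true mean with probability at least 1 - delta. The sketch below computes that bound and applies the usual split test (split when the gap between the two best attributes exceeds epsilon). The class and method names are hypothetical and do not reproduce MOA's implementation.

```java
/**
 * Illustrative sketch (hypothetical helper, not MOA code):
 * the Hoeffding bound and the split test built on it.
 */
final class HoeffdingBound {
    private HoeffdingBound() {}

    /**
     * epsilon = sqrt(R^2 * ln(1/delta) / (2n)).
     *
     * @param range R, the range of the monitored quantity
     * @param delta allowed probability that the bound fails, in (0, 1)
     * @param n     number of observations seen so far, n > 0
     */
    static double epsilon(double range, double delta, long n) {
        if (n <= 0 || delta <= 0.0 || delta >= 1.0) {
            throw new IllegalArgumentException("need n > 0 and 0 < delta < 1");
        }
        return Math.sqrt(range * range * Math.log(1.0 / delta) / (2.0 * n));
    }

    /** True when the best attribute beats the runner-up by more than epsilon. */
    static boolean canSplit(double bestGain, double secondGain,
                            double range, double delta, long n) {
        return (bestGain - secondGain) > epsilon(range, delta, n);
    }
}
```

Because epsilon shrinks as n grows, a leaf that cannot yet separate its two best attributes simply waits for more instances, which is what lets the tree decide from a bounded sample instead of the whole stream.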

Clustering algorithms operating under fixed memory and time constraints typically use small memory buffers with standard techniques in a single pass. Issues specific to streaming, such as aging, noise, and missed or misplaced points, must be considered when evaluating clustering. Outlier detection in data streams is a relatively new and growing field. Extending ideas in clustering to anomaly detection has proved very effective.
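
One common way to keep a cluster summary in fixed memory during a single pass is a cluster feature: the count, the per-dimension linear sum, and the sum of squared norms of the points absorbed so far. The sketch below is an illustration under that assumption; the class `ClusterFeature` is hypothetical and is not the data structure of any particular MOA clusterer.

```java
/**
 * Illustrative sketch (hypothetical class): an additive cluster summary
 * that needs O(dim) memory no matter how many points it absorbs.
 */
final class ClusterFeature {
    private final int dim;
    private long n = 0;                 // number of points absorbed
    private final double[] linearSum;   // per-dimension sum of the points
    private double squaredSum = 0.0;    // sum of squared norms of the points

    ClusterFeature(int dim) {
        if (dim <= 0) {
            throw new IllegalArgumentException("dim must be positive");
        }
        this.dim = dim;
        this.linearSum = new double[dim];
    }

    /** Absorbs one point; the point itself is not stored. */
    void add(double[] x) {
        if (x.length != dim) {
            throw new IllegalArgumentException("dimension mismatch");
        }
        n++;
        for (int i = 0; i < dim; i++) {
            linearSum[i] += x[i];
            squaredSum += x[i] * x[i];
        }
    }

    /** Centroid = linear sum divided by the count. */
    double[] centroid() {
        if (n == 0) {
            throw new IllegalStateException("empty cluster");
        }
        double[] c = new double[dim];
        for (int i = 0; i < dim; i++) {
            c[i] = linearSum[i] / n;
        }
        return c;
    }

    /** Root-mean-square distance of the absorbed points from the centroid. */
    double radius() {
        double centroidNormSq = 0.0;
        for (double v : centroid()) {
            centroidNormSq += v * v;
        }
        // Guard against tiny negative values caused by floating-point rounding.
        return Math.sqrt(Math.max(0.0, squaredSum / n - centroidNormSq));
    }
}
```

A point that falls far outside the radius of every existing summary is a natural candidate for a new cluster or an outlier, which hints at why clustering ideas carry over to anomaly detection.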

The experiments in the case study in this chapter use the Java framework MOA to illustrate various stream-based learning techniques for supervised learning, clustering, and outlier detection.

In the next chapter, we embark on a tour of the probabilistic graph modelling techniques that are useful for representing and eliciting knowledge and for learning in various domains.
