Part II

Stream Data Analytics

Introduction to Part II

Now that we have provided an overview of the various supporting technologies, including data mining and cloud computing, in Part I, the chapters in Part II describe the stream data mining techniques that we have designed and developed. Note that we use the terms data mining and data analytics interchangeably.

Part II, consisting of six chapters, provides a detailed overview of the novel class detection techniques for data streams. These techniques are part of stream data mining.

Chapter 8 focuses on the challenges associated with data stream classification and describes our approach to meeting them. Data stream classification mainly consists of two steps: building (or learning) a classification model using historical labeled data, and classifying (or predicting the class of) future instances using the model.

Chapter 9 discusses related work in data stream classification, semisupervised clustering, and novelty detection. First, we discuss various data stream classification techniques that address the infinite-length and concept-drift problems, and we describe how our proposed multiple partition and multiple chunk (MPC) ensemble technique differs from existing techniques. Second, we discuss various novelty/anomaly detection techniques and their differences from our ECSMiner approach. Finally, we describe different semisupervised clustering techniques and the advantages of our ReaSC approach over them.

Chapter 10 describes the MPC ensemble classification technique. First, we present an overview of the approach. Then, we establish a theoretical justification for using this approach over other approaches. Finally, we show the experimental results on real and synthetic data.

Chapter 11 explains ECSMiner, our novel class detection technique, in detail. First, we introduce the concept-evolution problem and outline our solution. Then, we discuss the algorithm in detail and show how to efficiently detect a novel class within given time constraints and limited memory. Next, we analyze the algorithm's efficiency in correctly detecting novel classes. Finally, we present experimental results on different benchmark datasets.

Chapter 12 describes the limited labeled data problem and our solution, ReaSC. First, we give an overview of the data stream classification problem and a top-level description of ReaSC. Then, we describe the semisupervised clustering technique used to efficiently learn a classification model from scarcely labeled training data. Next, we discuss various issues related to stream evolution. Last, we provide experimental results on a number of datasets.

Finally, Chapter 13 discusses our findings and provides directions for further work in stream data analytics in general and stream data classification in particular. In addition, we will discuss stream data analytics for handling massive amounts of data.
