Classification using massive online analysis

Massive Online Analysis (MOA), as discussed in Chapter 2, Java Libraries and Platforms for Machine Learning, is another library that can be used for classification. It is designed mainly to work with data streams. A stream can deliver a very large amount of data, so how do we evaluate the model? In traditional batch learning, we usually divide the data into training and test sets, and we prefer cross-validation if the data is limited. In stream processing, where the data is potentially unbounded, cross-validation is too expensive. Two approaches that we can use instead are as follows:

  • Holdout: This is useful when the data is already divided into two predefined parts. It estimates the accuracy of the current classifier, provided that the holdout set is similar to the current data. This similarity is hard to guarantee.
  • Interleaved test-then-train, or prequential: In this method, each example is used to test the model before it is used for training, so the model is always tested on data that it has not yet seen. No holdout set is needed, and all of the available data is used. Over time, the accuracy estimate becomes smoother, because each individual example contributes less and less to the overall average.
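The interleaved test-then-train loop can be illustrated with a minimal, self-contained sketch. It deliberately does not use the MOA API: the PrequentialSketch and NearestMeanLearner classes, the synthetic two-class stream, and the seed are all assumptions made up for illustration. The only point is the order of operations, in which each example is first used for prediction and only afterwards for training.

```java
import java.util.Random;

// Hypothetical sketch: every class and method name here is illustrative, not MOA API.
class PrequentialSketch {

    // Toy incremental learner: tracks the running mean of one feature per class
    // and predicts the class whose mean is nearest to the input.
    static class NearestMeanLearner {
        private final double[] sum = new double[2];
        private final long[] count = new long[2];

        int predict(double x) {
            // Until both classes have been seen, fall back to class 0.
            if (count[0] == 0 || count[1] == 0) {
                return 0;
            }
            double d0 = Math.abs(x - sum[0] / count[0]);
            double d1 = Math.abs(x - sum[1] / count[1]);
            return d1 < d0 ? 1 : 0;
        }

        void train(double x, int label) {
            sum[label] += x;
            count[label]++;
        }
    }

    // Interleaved test-then-train: predict each example first, then learn from it.
    static double prequentialAccuracy(long seed, int numInstances) {
        Random rnd = new Random(seed);
        NearestMeanLearner learner = new NearestMeanLearner();
        int correct = 0;
        for (int i = 0; i < numInstances; i++) {
            // Synthetic stream (an assumption): class 0 is centred at -1, class 1 at +1.
            int label = rnd.nextBoolean() ? 1 : 0;
            double x = rnd.nextGaussian() + (label == 1 ? 1.0 : -1.0);

            if (learner.predict(x) == label) {   // 1. test on the unseen example
                correct++;
            }
            learner.train(x, label);             // 2. only then train on it
        }
        return (double) correct / numInstances;
    }

    public static void main(String[] args) {
        System.out.printf("Prequential accuracy: %.3f%n",
                prequentialAccuracy(42L, 10_000));
    }
}
```

Because every example is scored before the learner sees its label, the running accuracy is an estimate on unseen data, and no separate holdout set has to be reserved.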

MOA provides various ways to generate a stream of data. First, download the MOA library from https://moa.cms.waikato.ac.nz/downloads/. Add the downloaded .jar files to Eclipse, as we did for Weka at the beginning of this chapter. We will use the GUI tool provided by MOA to see how MOA works with streams. To launch the GUI, make sure that moa.jar and sizeofag.jar are in the current directory; then, run the following command in Command Prompt:

$ java -cp moa.jar -javaagent:sizeofag.jar moa.gui.GUI

It will display the following output:

We can see that it has options for classification, regression, clustering, outliers, and more. Clicking on the Configure button displays the screen used to set up your classifier. It provides various learners and streams to work with, as shown in the following screenshot:

The following is an example of running RandomTreeGenerator with NaiveBayes and HoeffdingTree:
