Summary

This chapter has given you an overview of some of the functionality available in the Apache Spark MLlib module. It has also shown functionality that is not yet released: ANN, or artificial neural networks, which is intended for release in Spark 1.3. Given the time and space available for this chapter, it has not been possible to cover every area of MLlib.

You have seen how to develop Scala-based examples for Naïve Bayes classification, K-Means clustering, and ANN. You have also seen how to prepare test data for these Spark MLlib routines, and how that data is presented to them: as feature vectors, which are paired with a label in the LabeledPoint structure when a routine needs labeled input, as Naïve Bayes classification does. Each example follows the same pattern: a model is trained on one data set, and then tested by making predictions against a different one. Using the approach shown in this chapter, you can now investigate the remaining functionality in the MLlib library. Refer to the http://spark.apache.org/ website, and when checking the documentation, make sure that it matches the version you are using, that is, http://spark.apache.org/docs/1.0.0/ for version 1.0.0.
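
As a reminder of the train-then-predict pattern recapped above, here is a minimal sketch using Naïve Bayes. It is not the chapter's own example: the object name, the tiny in-memory data sets, the local master setting, and the smoothing value are all assumptions made for illustration, and it assumes a Spark 1.x MLlib dependency on the classpath.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.mllib.classification.NaiveBayes
import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.regression.LabeledPoint

// Hypothetical object name, used only for this sketch.
object LabeledPointSketch {

  def main(args: Array[String]): Unit = {

    // Assumption: run locally with two threads rather than on a cluster.
    val conf = new SparkConf()
      .setAppName("LabeledPointSketch")
      .setMaster("local[2]")
    val sc = new SparkContext(conf)

    // Made-up training data: each LabeledPoint pairs a label with a
    // feature vector. Naive Bayes expects non-negative feature values.
    val training = sc.parallelize(Seq(
      LabeledPoint(0.0, Vectors.dense(3.0, 0.0, 1.0)),
      LabeledPoint(0.0, Vectors.dense(4.0, 1.0, 0.0)),
      LabeledPoint(1.0, Vectors.dense(0.0, 3.0, 2.0)),
      LabeledPoint(1.0, Vectors.dense(1.0, 4.0, 3.0))
    ))

    // A separate, equally made-up data set that is only used for testing.
    val test = sc.parallelize(Seq(
      LabeledPoint(0.0, Vectors.dense(5.0, 0.0, 0.0)),
      LabeledPoint(1.0, Vectors.dense(0.0, 5.0, 2.0))
    ))

    // Train the model; lambda is the additive smoothing parameter.
    val model = NaiveBayes.train(training, lambda = 1.0)

    // Predict a label for each test point and keep the actual label
    // alongside it for comparison.
    val predictionAndLabel = test.map { point =>
      (model.predict(point.features), point.label)
    }

    // Fraction of test points whose predicted label matches the actual one.
    val correct = predictionAndLabel.filter { case (p, l) => p == l }.count()
    val accuracy = correct.toDouble / test.count()

    println(s"Test accuracy: $accuracy")

    sc.stop()
  }
}
```

The same shape, in which you build an RDD of input data, train a model, and then call predict on unseen data, is a reasonable starting point when you explore the other MLlib algorithms, although the exact input type that each routine expects should be checked against the documentation for your Spark version.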

Having examined the Apache Spark MLlib machine learning library in this chapter, it is now time to consider Apache Spark's stream processing capability. The next chapter will examine stream processing using Spark and Scala-based example code.
