Spark MLlib

MLlib is Spark's scalable machine learning library. It is built on top of the Spark Core API and provides a collection of easy-to-use machine learning algorithms. The algorithms are implemented in Scala and exposed through APIs for Java, Scala, Python, and R. MLlib supports local vector and matrix data types stored on a single machine, as well as distributed matrices backed by one or more RDDs. Spark MLlib has several strengths. For example, its algorithms are highly scalable and leverage Spark's ability to work with massive amounts of data.

  • They are fast, since they are designed for parallel computing with in-memory operation, which can be up to 100 times faster than MapReduce data processing (they also support disk-based operation, which can be up to 10 times faster than normal MapReduce data processing).
  • They are diverse, since they cover common machine learning algorithms for regression analysis, classification, clustering, recommender systems, text analytics, and frequent pattern mining, and they cover the steps required to build a scalable machine learning application.
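
The local and distributed data types mentioned above can be illustrated with a minimal sketch. It assumes the RDD-based `org.apache.spark.mllib.linalg` package and a local Spark session; the object name, application name, and sample values are made up for illustration and do not come from the text.

```scala
import org.apache.spark.mllib.linalg.{Matrices, Matrix, Vector, Vectors}
import org.apache.spark.mllib.linalg.distributed.RowMatrix
import org.apache.spark.sql.SparkSession

// Hypothetical object name, used only for this sketch.
object MLlibDataTypesSketch {
  def main(args: Array[String]): Unit = {
    // Assumption: run locally with all available cores.
    val spark = SparkSession.builder()
      .appName("MLlibDataTypesSketch")
      .master("local[*]")
      .getOrCreate()
    val sc = spark.sparkContext

    // Local vectors live on a single machine.
    // The dense and sparse forms below hold the same values (1.0, 0.0, 3.0).
    val dense: Vector = Vectors.dense(1.0, 0.0, 3.0)
    val sparse: Vector = Vectors.sparse(3, Array(0, 2), Array(1.0, 3.0))

    // Local 3 x 2 matrix; values are given in column-major order.
    val localMatrix: Matrix =
      Matrices.dense(3, 2, Array(1.0, 3.0, 5.0, 2.0, 4.0, 6.0))

    // Distributed matrix backed by an RDD of local vectors (one per row).
    val rows = sc.parallelize(Seq(
      Vectors.dense(1.0, 2.0),
      Vectors.dense(3.0, 4.0),
      Vectors.dense(5.0, 6.0)
    ))
    val rowMatrix = new RowMatrix(rows)

    println(s"Dense and sparse hold equal values: ${dense.toArray.sameElements(sparse.toArray)}")
    println(s"Local matrix: ${localMatrix.numRows} x ${localMatrix.numCols}")
    println(s"Distributed matrix: ${rowMatrix.numRows()} x ${rowMatrix.numCols()}")

    spark.stop()
  }
}
```

A `RowMatrix` is only one of the distributed matrix types; it is used here because it needs nothing beyond an RDD of local vectors. The sparse form is usually preferable when most entries are zero, since it stores only the non-zero indices and values.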