Spark machine learning libraries

As already stated, in the pre-Spark era, big data modelers typically built their ML models using statistical languages such as R, STATA, and SAS. However, this kind of workflow (that is, the execution flow of these ML algorithms) lacked efficiency, scalability, and throughput, as well as accuracy, and of course suffered from extended execution times.

Data engineers then had to reimplement the same model in Java, for example, in order to deploy it on Hadoop. With Spark, the same ML model can be rebuilt, adapted, and deployed, making the whole workflow much more efficient, robust, and faster, and allowing you to gain hands-on insight for improving performance. Moreover, implementing these algorithms on Hadoop and Spark lets them run in parallel, which is not possible with R, STATA, SAS, and so on. The Spark machine learning library is divided into two packages: Spark MLlib (spark.mllib) and Spark ML (spark.ml).
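To make the distinction between the two packages concrete, here is a minimal sketch (not taken from the book's own examples) that imports an algorithm from each: the RDD-based spark.mllib API and the DataFrame-based spark.ml API. The choice of logistic regression, the local master URL, and the parameter values are illustrative assumptions only.

import org.apache.spark.sql.SparkSession
// RDD-based API lives under the spark.mllib package
import org.apache.spark.mllib.classification.LogisticRegressionWithLBFGS
// DataFrame-based API lives under the spark.ml package
import org.apache.spark.ml.classification.LogisticRegression

object MLPackagesSketch {
  def main(args: Array[String]): Unit = {
    // Local session for experimentation; use a cluster URL in production
    val spark = SparkSession.builder()
      .appName("SparkMLPackagesSketch")
      .master("local[*]")
      .getOrCreate()

    // spark.ml: an Estimator configured against DataFrames (hypothetical settings)
    val lrMl = new LogisticRegression()
      .setMaxIter(10)
      .setRegParam(0.01)

    // spark.mllib: the older algorithm class driven by RDDs of LabeledPoint
    val lrMllib = new LogisticRegressionWithLBFGS()
      .setNumClasses(2)

    println(s"spark.ml estimator:    ${lrMl.getClass.getName}")
    println(s"spark.mllib algorithm: ${lrMllib.getClass.getName}")

    spark.stop()
  }
}

Both classes solve the same problem; the difference is the data abstraction they operate on, which is the distinction the following sections elaborate.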
