MLlib

MLlib is one of the flagship components of the Spark ecosystem. It provides a scalable, high-performance interface for running resource-intensive machine learning tasks in Spark. MLlib can also connect natively to HDFS, HBase, and the other underlying storage systems that Spark supports. Because of this versatility, users do not need a pre-existing Hadoop environment to start using the algorithms built into MLlib. Some of the algorithms supported in MLlib include:

  • Classification: logistic regression
  • Regression: generalized linear regression, survival regression and others
  • Decision trees, random forests, and gradient-boosted trees
  • Recommendation: Alternating least squares
  • Clustering: K-means, Gaussian mixtures and others
  • Topic modeling: Latent Dirichlet allocation
  • Frequent pattern mining: Frequent itemsets and association rules (MLlib implements these with FP-growth rather than Apriori)

ML workflow utilities include:

  • Feature transformations: Standardization, normalization and others
  • ML Pipeline construction
  • Model evaluation and hyper-parameter tuning
  • ML persistence: Saving and loading models and Pipelines
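
To make the first utility in the list above more concrete, here is a minimal sketch of the arithmetic behind standardization and normalization. It is written in plain Python and is not MLlib code: the function names standardize_column and normalize_row are hypothetical, chosen only for illustration. The sketch assumes standardization means scaling a feature column to unit sample standard deviation without centering it, and normalization means scaling a single row to unit p-norm. In MLlib itself these operations run as distributed transformers over DataFrame columns.

```python
import math


def standardize_column(column):
    """Scale one feature column to unit sample standard deviation.

    Assumption for this sketch: the mean is not subtracted, and the
    standard deviation uses the corrected (n - 1) estimator.
    """
    n = len(column)
    mean = sum(column) / n
    variance = sum((x - mean) ** 2 for x in column) / (n - 1)
    std = math.sqrt(variance)
    if std == 0.0:
        # A constant column carries no spread; this sketch maps it to zeros.
        return [0.0 for _ in column]
    return [x / std for x in column]


def normalize_row(row, p=2.0):
    """Scale one row (feature vector) so that its p-norm equals 1."""
    norm = sum(abs(x) ** p for x in row) ** (1.0 / p)
    if norm == 0.0:
        # An all-zero row cannot be scaled; return an unchanged copy.
        return list(row)
    return [x / norm for x in row]


if __name__ == "__main__":
    print(standardize_column([2.0, 4.0, 6.0]))
    print(normalize_row([3.0, 4.0]))
```

Note the difference in direction: standardization works down a column (one feature across all rows), while normalization works across a row (all features of one record). Which one is appropriate depends on the algorithm that consumes the features.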