Chapter 4. Machine Learning Algorithms
In this chapter, we continue with the following topics:
- Hands on with Spark/MLlib
- Implement classification and regression algorithms using Spark/MLlib
- Implement clustering algorithms using Spark/MLlib
- Using Python for plotting our results
Hands on with Spark/MLlib
In the first chapter, we set up Apache Spark and discussed an example using Spark/MLlib. Now it is time to explore Spark/MLlib in detail.
Spark/MLlib provides the following features:
- Data types: vector, LabeledPoint, matrix, DistributedMatrix (BlockMatrix, RowMatrix, IndexedRowMatrix, and CoordinateMatrix)
- Statistics: Summary statistics, correlations, stratified sampling, hypothesis testing, random data generation
- Feature extraction: TF-IDF (HashingTF and IDF), scaling, normalization, and Word2Vec
- Classification/regression: Linear regression (LinearRegressionWithSGD, LassoWithSGD, and RidgeRegressionWithSGD), logistic regression (SGD, LBFGS), SVM, Naive Bayes, decision tree, ensembles (RandomForests and GradientBoostedTrees)
- Clustering: K-Means, Gaussian mixture model/expectation-maximization, power iteration clustering, LDA (EMLDAOptimizer/OnlineLDAOptimizer), streaming K-Means
- Association analysis: Frequent pattern mining (FPGrowth)
- Dimensionality reduction: SVD and PCA
- Recommendation: Collaborative filtering
- Pipeline API
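To make the feature extraction entry in the list above more concrete, here is a minimal plain-Python sketch of the idea behind HashingTF and IDF: tokens are hashed into a fixed number of buckets to obtain term frequencies, which are then reweighted by a smoothed inverse document frequency. It deliberately does not call Spark. The function names (hashing_tf, fit_idf, transform), the sample documents, and the use of zlib.crc32 as the hash are assumptions made for illustration, and the smoothing formula log((m + 1) / (df + 1)) is the one we understand MLlib to document, so treat the sketch as an approximation of the behavior rather than the library's implementation.

```python
import math
import zlib
from collections import Counter


def hashing_tf(tokens, num_features=1024):
    """Hash tokens into buckets and return sparse term counts {bucket: count}."""
    counts = Counter()
    for token in tokens:
        # Assumption: crc32 is a deterministic stand-in, not the hash MLlib uses.
        index = zlib.crc32(token.encode("utf-8")) % num_features
        counts[index] += 1
    return dict(counts)


def fit_idf(tf_vectors):
    """Return smoothed IDF weights per bucket: log((m + 1) / (df + 1))."""
    m = len(tf_vectors)
    doc_freq = Counter()
    for tf in tf_vectors:
        # Each document contributes at most 1 to the document frequency of a bucket.
        doc_freq.update(tf.keys())
    return {index: math.log((m + 1) / (df + 1)) for index, df in doc_freq.items()}


def transform(tf, idf):
    """Scale each term count by the IDF weight of its bucket."""
    return {index: count * idf[index] for index, count in tf.items()}


# Hypothetical toy corpus, already tokenized.
docs = [
    "spark makes big data simple".split(),
    "spark mllib makes machine learning scalable".split(),
    "python plots the results".split(),
]

tf_vectors = [hashing_tf(doc) for doc in docs]
idf = fit_idf(tf_vectors)
tfidf_vectors = [transform(tf, idf) for tf in tf_vectors]
```

Terms that occur in many documents (such as "spark" here) receive a smaller weight than terms confined to a single document, which is the effect TF-IDF is meant to produce. The dictionaries play the role of sparse vectors: only non-zero buckets are stored.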