Apache Spark Machine Learning Blueprints
by Alex Liu
Table of Contents
Credits
About the Author
About the Reviewer
www.PacktPub.com
eBooks, discount offers, and more
Why subscribe?
Preface
What this book covers
What you need for this book
Who this book is for
Conventions
Reader feedback
Customer support
Downloading the color images of this book
Errata
Piracy
Questions
1. Spark for Machine Learning
Spark overview and Spark advantages
Spark overview
Spark advantages
Spark computing for machine learning
Machine learning algorithms
MLlib
Other ML libraries
Spark RDD and dataframes
Spark RDD
Spark dataframes
Dataframes API for R
ML frameworks, RM4Es and Spark computing
ML frameworks
RM4Es
The Spark computing framework
ML workflows and Spark pipelines
ML as a step-by-step workflow
ML workflow examples
Spark notebooks
Notebook approach for ML
Step 1: Getting the software ready
Step 2: Installing the Knitr package
Step 3: Creating a simple report
Spark notebooks
Summary
2. Data Preparation for Spark ML
Accessing and loading datasets
Accessing publicly available datasets
Loading datasets into Spark
Exploring and visualizing datasets
Data cleaning
Dealing with data incompleteness
Data cleaning in Spark
Data cleaning made easy
Identity matching
Identity issues
Identity matching on Spark
Entity resolution
Short string comparison
Long string comparison
Record deduplication
Identity matching made better
Crowdsourced deduplication
Configuring the crowd
Using the crowd
Dataset reorganizing
Dataset reorganizing tasks
Dataset reorganizing with Spark SQL
Dataset reorganizing with R on Spark
Dataset joining
Dataset joining and its tool – the Spark SQL
Dataset joining in Spark
Dataset joining with the R data table package
Feature extraction
Feature development challenges
Feature development with Spark MLlib
Feature development with R
Repeatability and automation
Dataset preprocessing workflows
Spark pipelines for dataset preprocessing
Dataset preprocessing automation
Summary
3. A Holistic View on Spark
Spark for a holistic view
The use case
Fast and easy computing
Methods for a holistic view
Regression modeling
The SEM approach
Decision trees
Feature preparation
PCA
Grouping by category to use subject knowledge
Feature selection
Model estimation
MLlib implementation
The R notebooks' implementation
Model evaluation
Quick evaluations
RMSE
ROC curves
Results explanation
Impact assessments
Deployment
Dashboard
Rules
Summary
4. Fraud Detection on Spark
Spark for fraud detection
The use case
Distributed computing
Methods for fraud detection
Random forest
Decision trees
Feature preparation
Feature extraction from LogFile
Data merging
Model estimation
MLlib implementation
R notebooks implementation
Model evaluation
A quick evaluation
Confusion matrix and false positive ratios
Results explanation
Big influencers and their impacts
Deploying fraud detection
Rules
Scoring
Summary
5. Risk Scoring on Spark
Spark for risk scoring
The use case
Apache Spark notebooks
Methods of risk scoring
Logistic regression
Preparing coding in R
Random forest and decision trees
Preparing coding
Data and feature preparation
OpenRefine
Model estimation
The DataScientistWorkbench for R notebooks
R notebooks implementation
Model evaluation
Confusion matrix
ROC
Kolmogorov-Smirnov
Results explanation
Big influencers and their impacts
Deployment
Scoring
Summary
6. Churn Prediction on Spark
Spark for churn prediction
The use case
Spark computing
Methods for churn prediction
Regression models
Decision trees and Random forest
Feature preparation
Feature extraction
Feature selection
Model estimation
Spark implementation with MLlib
Model evaluation
Results explanation
Calculating the impact of interventions
Deployment
Scoring
Intervention recommendations
Summary
7. Recommendations on Spark
Apache Spark for a recommendation engine
The use case
SPSS on Spark
Methods for recommendation
Collaborative filtering
Preparing coding
Data treatment with SPSS
Missing data nodes on SPSS modeler
Model estimation
SPSS on Spark – the SPSS Analytics server
Model evaluation
Recommendation deployment
Summary
8. Learning Analytics on Spark
Spark for attrition prediction
The use case
Spark computing
Methods of attrition prediction
Regression models
About regression
Preparing for coding
Decision trees
Preparing for coding
Feature preparation
Feature development
Feature selection
Principal components analysis
Subject knowledge aid
ML feature selection
Model estimation
Spark implementation with the Zeppelin notebook
Model evaluation
A quick evaluation
The confusion matrix and error ratios
Results explanation
Calculating the impact of interventions
Calculating the impact of main causes
Deployment
Rules
Scoring
Summary
9. City Analytics on Spark
Spark for service forecasting
The use case
Spark computing
Methods of service forecasting
Regression models
About regression
Preparing for coding
Time series modeling
About time series
Preparing for coding
Data and feature preparation
Data merging
Feature selection
Model estimation
Spark implementation with the Zeppelin notebook
Spark implementation with the R notebook
Model evaluation
RMSE calculation with MLlib
RMSE calculation with R
Explanations of the results
Biggest influencers
Visualizing trends
The rules of sending out alerts
Scores to rank city zones
Summary
10. Learning Telco Data on Spark
Spark for using Telco Data
The use case
Spark computing
Methods for learning from Telco Data
Descriptive statistics and visualization
Linear and logistic regression models
Decision tree and random forest
Data and feature development
Data reorganizing
Feature development and selection
Model estimation
SPSS on Spark – SPSS Analytics Server
Model evaluation
RMSE calculations with MLlib
RMSE calculations with R
Confusion matrix and error ratios with MLlib and R
Results explanation
Descriptive statistics and visualizations
Biggest influencers
Special insights
Visualizing trends
Model deployment
Rules to send out alerts
Scores subscribers for churn and for Call Center calls
Scores subscribers for purchase propensity
Summary
11. Modeling Open Data on Spark
Spark for learning from open data
The use case
Spark computing
Methods for scoring and ranking
Cluster analysis
Principal component analysis
Regression models
Score resembling
Data and feature preparation
Data cleaning
Data merging
Feature development
Feature selection
Model estimation
SPSS on Spark – SPSS Analytics Server
Model evaluation
RMSE calculations with MLlib
RMSE calculations with R
Results explanation
Comparing ranks
Biggest influencers
Deployment
Rules for sending out alerts
Scores for ranking school districts
Summary
Index