by Alex Kozlov
Mastering Scala Machine Learning
Table of Contents
Mastering Scala Machine Learning
Credits
About the Author
Acknowledgement
www.PacktPub.com
eBooks, discount offers, and more
Why subscribe?
Preface
What this book covers
What you need for this book
Who this book is for
Conventions
Reader feedback
Customer support
Downloading the example code
Downloading the color images of this book
Errata
Piracy
Questions
1. Exploratory Data Analysis
Getting started with Scala
Distinct values of a categorical field
Summarization of a numeric field
Grepping across multiple fields
Basic, stratified, and consistent sampling
Working with Scala and Spark Notebooks
Basic correlations
Summary
2. Data Pipelines and Modeling
Influence diagrams
Sequential trials and dealing with risk
Exploration and exploitation
Unknown unknowns
Basic components of a data-driven system
Data ingest
Data transformation layer
Data analytics and machine learning
UI component
Actions engine
Correlation engine
Monitoring
Optimization and interactivity
Feedback loops
Summary
3. Working with Spark and MLlib
Setting up Spark
Understanding Spark architecture
Task scheduling
Spark components
MQTT, ZeroMQ, Flume, and Kafka
HDFS, Cassandra, S3, and Tachyon
Mesos, YARN, and Standalone
Applications
Word count
Streaming word count
Spark SQL and DataFrame
ML libraries
SparkR
Graph algorithms – GraphX and GraphFrames
Spark performance tuning
Running Hadoop HDFS
Summary
4. Supervised and Unsupervised Learning
Records and supervised learning
Iris dataset
Labeled point
SVMWithSGD
Logistic regression
Decision tree
Bagging and boosting – ensemble learning methods
Unsupervised learning
Problem dimensionality
Summary
5. Regression and Classification
What regression stands for?
Continuous space and metrics
Linear regression
Logistic regression
Regularization
Multivariate regression
Heteroscedasticity
Regression trees
Classification metrics
Multiclass problems
Perceptron
Generalization error and overfitting
Summary
6. Working with Unstructured Data
Nested data
Other serialization formats
Hive and Impala
Sessionization
Working with traits
Working with pattern matching
Other uses of unstructured data
Probabilistic structures
Projections
Summary
7. Working with Graph Algorithms
A quick introduction to graphs
SBT
Graph for Scala
Adding nodes and edges
Graph constraints
JSON
GraphX
Who is getting e-mails?
Connected components
Triangle counting
Strongly connected components
PageRank
SVD++
Summary
8. Integrating Scala with R and Python
Integrating with R
Setting up R and SparkR
Linux
Mac OS
Windows
Running SparkR via scripts
Running Spark via R's command line
DataFrames
Linear models
Generalized linear model
Reading JSON files in SparkR
Writing Parquet files in SparkR
Invoking Scala from R
Using Rserve
Integrating with Python
Setting up Python
PySpark
Calling Python from Java/Scala
Using sys.process._
Spark pipe
Jython and JSR 223
Summary
9. NLP in Scala
Text analysis pipeline
Simple text analysis
MLlib algorithms in Spark
TF-IDF
LDA
Segmentation, annotation, and chunking
POS tagging
Using word2vec to find word relationships
A Porter Stemmer implementation of the code
Summary
10. Advanced Model Monitoring
System monitoring
Process monitoring
Model monitoring
Performance over time
Criteria for model retiring
A/B testing
Summary
Index