Data Algorithms with Spark
by Mahmoud Parsian
Table of Contents
1. Reductions in Spark
Creating Pair RDDs
Example: Using Collections
Example: Using map() Transformation
Reducer Transformations
Spark’s Reductions
What is a Reduction?
Spark’s Reduction Transformations
Simple Warmup Example
Solution by reduceByKey()
Using Lambda Expressions
Using Functions
Solution by groupByKey()
Solution by aggregateByKey()
Solution by combineByKey()
What is a Monoid?
Monoid Examples
Non-Monoid Examples
Movie Problem
Input Data Set to Analyze
Ratings Data File Structure (ratings.csv)
Solution Using aggregateByKey()
Results
How does aggregateByKey() work?
PySpark Solution using aggregateByKey()
Step 1: Read Data and Create Pairs
Step 2: Use aggregateByKey() to Sum Up Ratings
Step 3: Find Average Rating
Complete PySpark Solution using groupByKey()
PySpark Solution using groupByKey()
Step 1: Read Data and Create Pairs
Step 2: Use groupByKey() to Group Ratings
Step 3: Find Average Rating
Shuffle Step in Reductions
Shuffle Step for groupByKey()
Shuffle Step for reduceByKey()
Complete PySpark Solution using reduceByKey()
Step 1: Read Data and Create Pairs
Step 2: Use reduceByKey() to Sum Up Ratings
Step 3: Find Average Rating
Complete PySpark Solution using combineByKey()
PySpark Solution using combineByKey()
Step 1: Read Data and Create Pairs
Step 2: Use combineByKey() to Sum Up Ratings
Step 3: Find Average Rating
Comparison of Reductions
Summary
2. Data Design Patterns
InMapper Combining
Basic MapReduce Design Pattern
InMapper Combining Per Record
InMapper Combiner Per Partition
Top-10
Top-N Formalized
PySpark Solution
Implementation in PySpark
How to Find Bottom-10
MinMax
Solution-1: Classic MapReduce
Solution-2: Sorting
Solution-3: Spark’s mapPartitions()
Solution-3 Input
PySpark Solution
The Composite Pattern and Monoids
Composite Pattern
Monoids
Definition of Monoid
How to form a Monoid?
Monoidic and Non-Monoidic Examples
Non-Commutative Example
Median over Set of Integers
Concatenation over Lists
Union and Intersection over Integers
Matrix Example
Not a Monoid Example
Monoid Example
PySpark Implementation of Monoidized Mean
Input
PySpark Solution
Conclusion on Using Monoids
Functors and Monoids
Map-Side Join
Efficient Joins using Bloom filters
Bloom filter
A Simple Bloom Filter Example
Bloom Filter in Python
Using Bloom Filter in PySpark
Summary