Links and references
by Holden Karau
Fast Data Processing with Spark
Links and references
Some of the useful links are as follows:
http://archive09.linux.com/feature/151340
http://spark-project.org/docs/latest/spark-standalone.html
https://github.com/mesos/spark/blob/master/core/src/main/scala/spark/deploy/worker/WorkerArguments.scala
http://bickson.blogspot.com/2012/10/deploying-graphlabsparkmesos-cluster-on.html
http://www.ibm.com/developerworks/library/os-spark/
http://mesos.apache.org/
http://aws.amazon.com/articles/Elastic-MapReduce/4926593393724923
http://spark-project.org/docs/latest/ec2-scripts.html