HadoopRDD

HadoopRDD provides the core functionality for reading data stored in HDFS using the older MapReduce API (org.apache.hadoop.mapred) from the Hadoop 1.x libraries. HadoopRDD is the default implementation used, and can be seen whenever data is loaded from a file system into an RDD:

class HadoopRDD[K, V] extends RDD[(K, V)]

When loading the state population records from the CSV, the underlying base RDD is actually HadoopRDD as in the following code snippet:

scala> val statesPopulationRDD = sc.textFile("statesPopulation.csv")
statesPopulationRDD: org.apache.spark.rdd.RDD[String] = statesPopulation.csv MapPartitionsRDD[93] at textFile at <console>:25

scala> statesPopulationRDD.toDebugString
res110: String =
(2) statesPopulation.csv MapPartitionsRDD[93] at textFile at <console>:25 []
| statesPopulation.csv HadoopRDD[92] at textFile at <console>:25 []
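
Under the hood, sc.textFile delegates to sc.hadoopFile with the old-API TextInputFormat, which is what produces the HadoopRDD[92] visible in the debug string above; the key is the byte offset of each line and the value is the line itself. A rough sketch of the equivalent explicit call (assuming a Spark shell where sc is in scope, and the same statesPopulation.csv file used above):

```scala
import org.apache.hadoop.io.{LongWritable, Text}
import org.apache.hadoop.mapred.TextInputFormat

// Roughly what sc.textFile("statesPopulation.csv") does internally:
// build a HadoopRDD of (byte offset, line) pairs using the old
// MapReduce API's TextInputFormat...
val records = sc.hadoopFile[LongWritable, Text, TextInputFormat](
  "statesPopulation.csv")

// ...then map each pair down to just the line text, dropping the offset.
val statesPopulationRDD = records.map(pair => pair._2.toString)
```

The extra map step is why toDebugString shows a MapPartitionsRDD layered on top of the HadoopRDD.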

The following diagram is an illustration of a HadoopRDD created by loading a text file from the file system into an RDD:
