SequenceFileRDD

SequenceFileRDD is created from a SequenceFile which is a format of files in the Hadoop File System. The SequenceFile can be compressed or uncompressed.

Map Reduce processes can use SequenceFiles, which are pairs of Keys and Values. Key and Value are of Hadoop writable datatypes, such as Text, IntWritable, and so on.

The following is an example of a SequenceFileRDD, which shows how we can write and read SequenceFile:

scala> val pairRDD = statesPopulationRDD.map(record => (record.split(",")(0), record.split(",")(2)))
pairRDD: org.apache.spark.rdd.RDD[(String, String)] = MapPartitionsRDD[60] at map at <console>:27

scala> pairRDD.saveAsSequenceFile("seqfile")

scala> val seqRDD = sc.sequenceFile[String, String]("seqfile")
seqRDD: org.apache.spark.rdd.RDD[(String, String)] = MapPartitionsRDD[62] at sequenceFile at <console>:25

scala> seqRDD.take(10)
res76: Array[(String, String)] = Array((State,Population), (Alabama,4785492), (Alaska,714031), (Arizona,6408312), (Arkansas,2921995), (California,37332685), (Colorado,5048644), (Delaware,899816), (District of Columbia,605183), (Florida,18849098))

The following is a diagram of SequenceFileRDD as seen in the preceding example:

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
18.221.187.207