SequenceFileRDD

A SequenceFileRDD is created from a SequenceFile, a binary file format from the Hadoop ecosystem that is commonly stored in the Hadoop Distributed File System (HDFS). A SequenceFile can be compressed or uncompressed.

MapReduce jobs can read and write SequenceFiles, which store data as key-value pairs. Both the key and the value are Hadoop Writable types, such as Text, IntWritable, and so on.
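To make the idea of a Writable type concrete, here is a minimal sketch that serializes a Text key and an IntWritable value to bytes and reads them back. It assumes hadoop-common is on the classpath; the object name WritableSketch and the helpers toBytes and fromBytes are hypothetical names introduced only for this illustration.

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, DataInputStream, DataOutputStream}

import org.apache.hadoop.io.{IntWritable, Text, Writable}

// Hypothetical helper object, not part of Hadoop or Spark.
object WritableSketch {
  // Serializes a Writable to bytes using its own write method.
  def toBytes(w: Writable): Array[Byte] = {
    val buffer = new ByteArrayOutputStream()
    val out = new DataOutputStream(buffer)
    w.write(out)
    out.flush()
    buffer.toByteArray
  }

  // Fills an existing (usually empty) Writable from bytes using readFields.
  def fromBytes[W <: Writable](bytes: Array[Byte], target: W): W = {
    target.readFields(new DataInputStream(new ByteArrayInputStream(bytes)))
    target
  }

  def main(args: Array[String]): Unit = {
    val key = new Text("Alabama")
    val value = new IntWritable(4785492)

    // Round-trip both values through their serialized form.
    val keyCopy = fromBytes(toBytes(key), new Text())
    val valueCopy = fromBytes(toBytes(value), new IntWritable())

    println(s"${keyCopy.toString} -> ${valueCopy.get}")
  }
}
```

When saving an RDD of plain Scala types such as String or Int, Spark converts them to the matching Writable types, so this kind of manual wrapping is normally only needed when working with the Hadoop API directly.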

The following example shows how to write a SequenceFile and read it back as a SequenceFileRDD:

scala> val pairRDD = statesPopulationRDD.map(record => (record.split(",")(0), record.split(",")(2)))
pairRDD: org.apache.spark.rdd.RDD[(String, String)] = MapPartitionsRDD[60] at map at <console>:27

scala> pairRDD.saveAsSequenceFile("seqfile")

scala> val seqRDD = sc.sequenceFile[String, String]("seqfile")
seqRDD: org.apache.spark.rdd.RDD[(String, String)] = MapPartitionsRDD[62] at sequenceFile at <console>:25

scala> seqRDD.take(10)
res76: Array[(String, String)] = Array((State,Population), (Alabama,4785492), (Alaska,714031), (Arizona,6408312), (Arkansas,2921995), (California,37332685), (Colorado,5048644), (Delaware,899816), (District of Columbia,605183), (Florida,18849098))
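The same round trip can also be sketched as a standalone application, this time writing a compressed SequenceFile and reading it back with explicit Writable classes. This is a minimal sketch under stated assumptions rather than a definitive implementation: the object name SequenceFileSketch, the roundTrip helper, the temporary output path, and the three sample records are hypothetical (the populations are taken from the output above), and DefaultCodec is just one possible codec choice. It assumes Spark and Hadoop are on the classpath and runs in local mode.

```scala
import java.nio.file.Files

import org.apache.hadoop.io.{IntWritable, Text}
import org.apache.hadoop.io.compress.DefaultCodec
import org.apache.spark.{SparkConf, SparkContext}

// Hypothetical application object for illustration only.
object SequenceFileSketch {
  // Writes (state, population) pairs as a compressed SequenceFile,
  // then reads them back and returns them sorted by state name.
  def roundTrip(sc: SparkContext, path: String): Array[(String, Int)] = {
    // Hypothetical sample records standing in for statesPopulationRDD.
    val pairs = sc.parallelize(Seq(
      ("Alabama", 4785492),
      ("Alaska", 714031),
      ("Arizona", 6408312)
    ))

    // String and Int are converted to Text and IntWritable by Spark.
    // Passing a codec makes the output compressed.
    pairs.saveAsSequenceFile(path, Some(classOf[DefaultCodec]))

    // Read with explicit Writable classes and convert to plain Scala
    // values immediately, before collecting.
    sc.sequenceFile(path, classOf[Text], classOf[IntWritable])
      .map { case (k, v) => (k.toString, v.get) }
      .collect()
      .sortBy(_._1)
  }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("SequenceFileSketch").setMaster("local[*]")
    val sc = new SparkContext(conf)
    try {
      // The output directory must not exist yet, so use a fresh child path.
      val out = Files.createTempDirectory("seq").resolve("states").toString
      roundTrip(sc, out).foreach(println)
    } finally {
      sc.stop()
    }
  }
}
```

The Writable values are mapped to String and Int right after reading because Hadoop's record reader reuses the same Writable object for every record, so collecting or caching the raw Writables directly can yield repeated values.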

[Figure: diagram of the SequenceFileRDD created in the seqfile example above]
