saveAsObjectFile

When files are written with the saveAsTextFile() method and read back with textFile(), RDD elements are converted to strings, so important metadata about the datatype of the objects being written to or read from a file is lost. To store the data along with its datatype information, saveAsObjectFile() can be used. It uses Java Serialization to store the data on the filesystem, and the data can be read back using the objectFile() method of SparkContext:

Java 7:

// saveAsObjectFile() writes the RDD elements using Java Serialization
JavaRDD<Integer> intRDD = sc.parallelize(Arrays.asList(1, 2, 3, 4, 5), 3);
intRDD.saveAsObjectFile("ObjectFileDir");

// objectFile() reads the serialized elements back as Integers
JavaRDD<Integer> objectRDD = sc.objectFile("ObjectFileDir");
objectRDD.foreach(new VoidFunction<Integer>() {
    @Override
    public void call(Integer x) throws Exception {
        System.out.println("The element read from ObjectFileDir is: " + x);
    }
});

Java 8:

JavaRDD<Integer> intRDD = sc.parallelize(Arrays.asList(1, 2, 3, 4, 5), 3);
intRDD.saveAsObjectFile("ObjectFileDir");

JavaRDD<Integer> objectRDD = sc.objectFile("ObjectFileDir");
objectRDD.foreach(x -> System.out.println("The element read from ObjectFileDir is: " + x));
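
To see the datatype preservation in action, the following is a minimal, self-contained sketch (the class name ObjectFileTypePreservation and the directory PairObjectFileDir are illustrative, not from the text, and the run assumes the output directory does not already exist). It round-trips a pair RDD of (String, Integer) tuples through saveAsObjectFile(), and the values come back as Integers, so numeric operations work without any string parsing:

import java.util.Arrays;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaSparkContext;

import scala.Tuple2;

public class ObjectFileTypePreservation {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf()
                .setAppName("ObjectFileTypePreservation")
                .setMaster("local[*]");
        JavaSparkContext sc = new JavaSparkContext(conf);

        // A pair RDD whose values are Integers, not strings
        JavaPairRDD<String, Integer> pairRDD = sc.parallelizePairs(
                Arrays.asList(new Tuple2<>("a", 1), new Tuple2<>("b", 2)));

        // The Tuple2 objects are serialized with Java Serialization;
        // this fails if PairObjectFileDir already exists
        pairRDD.saveAsObjectFile("PairObjectFileDir");

        // objectFile() deserializes them back as Tuple2<String, Integer>,
        // so no manual parsing is needed, unlike with textFile()
        JavaPairRDD<String, Integer> restored = JavaPairRDD.fromJavaRDD(
                sc.<Tuple2<String, Integer>>objectFile("PairObjectFileDir"));

        // The values are still Integers, so arithmetic works directly
        restored.mapValues(v -> v + 10)
                .foreach(t -> System.out.println(t._1() + " -> " + t._2()));

        sc.stop();
    }
}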

This section covered RDD actions and their significance in Spark job evaluation. We also discussed commonly used actions, along with their usage and implications. The next section covers RDD caching and how it can improve the performance of a Spark job.
