Fold

Just like a reduce() function, the fold() function too aggregates the value of RDD elements by passing each element to the aggregate function and computing a resultant value of the same data type. However, the fold() function also requires a zero value that is utilized while initializing the call to the partitions for the RDD. Depending on the type of aggregate function the zero value should ideally change , such as while adding the values of RDD elements zero value can be 0, similarly while multiplying it can be 1 and while concatenating a string it could be a blank space. Mathematically, the zero value should be the identity element of the aggregate function being computed:

Java 7:

//fold()
JavaRDD<Integer> intRDD = sparkContext.parallelize(Arrays.asList(1,4,3));
Integer foldInt=intRDD.fold(0, newFunction2<Integer, Integer, Integer>() {
@Override
public Integer call(Integer v1, Integer v2) throws Exception {
return v1+v2;
}
});
System.out.println("The sum of all the elements of RDD using fold is "+foldInt);

Java 8:

JavaRDD<Integer> intRDD = sparkContext.parallelize(Arrays.asList(1,4,3));
Integer foldInt=intRDD.fold(0, (a,b)-> a+b);
System.out.println("The sum of all the elements of RDD using fold is "+foldInt);
..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
52.14.134.130