aggregate

Apart from the reduce and fold functions, Spark provides another method for aggregating the elements of an RDD, called the aggregate function. The distinctive feature of the aggregate function is that its return type need not be the same as the type of the elements of the RDD being operated upon.

The aggregate function takes three input parameters. The first is the zero value, which acts as the identity element of the data type in which the aggregated result of the RDD is returned. The second is an accumulator function that defines the aggregation logic within each partition of the RDD. The third is a function that combines the outputs of the accumulators that worked on the individual partitions of the RDD:

Java 7:

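// Needs java.util.Arrays, org.apache.spark.api.java.JavaRDD and
// org.apache.spark.api.java.function.Function2; sc is the existing Spark context.
JavaRDD<Integer> rdd = sc.parallelize(Arrays.asList(1, 2, 3, 4, 5), 3);
System.out.println("The no of partitions are ::" + rdd.getNumPartitions());
// Accumulator: appends each Integer element to the String built so far within a partition.
Function2<String, Integer, String> agg = new Function2<String, Integer, String>() {
    @Override
    public String call(String v1, Integer v2) throws Exception {
        return v1 + v2;
    }
};
// Combiner: concatenates the partial Strings produced for the partitions.
Function2<String, String, String> combineAgg = new Function2<String, String, String>() {
    @Override
    public String call(String v1, String v2) throws Exception {
        return v1 + v2;
    }
};
String result = rdd.aggregate("X", agg, combineAgg);
System.out.println("The aggregate value is ::" + result);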

Java 8:

JavaRDD<Integer> rdd = sc.parallelize(Arrays.asList(1, 2, 3, 4, 5), 3);
System.out.println("The no of partitions are ::" + rdd.getNumPartitions());
// First lambda accumulates within a partition; second lambda combines the partial results.
String result = rdd.aggregate("X", (x, y) -> x + y, (x, z) -> x + z);
System.out.println("The aggregate value is ::" + result);

Note that the zero value is used once for each partition while its elements are accumulated, and once more when the accumulated partial results are combined. The output of the example therefore contains four X characters: three correspond to the three partitions of the RDD, and the fourth comes from the final combination of the partial outputs. Note also that although the RDD elements are of type Integer, the output of the aggregate function is a String.
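
The way the zero value ends up in the result four times can be traced with a small plain-Java simulation. This is only a sketch of the behaviour described above, not Spark's implementation: the class AggregateSketch and its methods are hypothetical names, the split of the five elements into [1], [2, 3] and [4, 5] is an assumption, and the partial results are merged in list order, whereas Spark may merge them in a different order.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.BiFunction;
import java.util.function.BinaryOperator;

// Plain-Java simulation of the aggregate semantics; hypothetical names, no Spark API involved.
final class AggregateSketch {

    // Folds every partition starting from the zero value, then merges the
    // partial results, again starting from the zero value.
    // Assumption: partitions are merged in list order.
    static <T, U> U aggregate(List<List<T>> partitions, U zero,
                              BiFunction<U, T, U> seqOp, BinaryOperator<U> combOp) {
        U combined = zero;                      // zero value used once for the final merge
        for (List<T> partition : partitions) {
            U partial = zero;                   // zero value used once per partition
            for (T element : partition) {
                partial = seqOp.apply(partial, element);
            }
            combined = combOp.apply(combined, partial);
        }
        return combined;
    }

    // Mirrors the example: Integer elements, String result, zero value "X".
    static String demo() {
        List<List<Integer>> partitions = Arrays.asList(
                Arrays.asList(1), Arrays.asList(2, 3), Arrays.asList(4, 5));
        return aggregate(partitions, "X", (acc, n) -> acc + n, (a, b) -> a + b);
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints XX1X23X45
    }
}
```

Under these assumptions the partial results are X1, X23 and X45, and merging them onto the zero value gives XX1X23X45: one X per partition plus one from the final combination.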
