Map

In a map transformation, there is a one-to-one mapping between the elements of the source RDD and the elements of the target RDD. A function is executed on every element of the source RDD, and a target RDD is created that holds the results of that function.

The following is an example of a map transformation that increments every element of an RDD of integers by 1.

We will start by creating a JavaSparkContext as follows:

SparkConf conf = new SparkConf().setMaster("local").setAppName("ApacheSparkForJavaDevelopers");
JavaSparkContext javaSparkContext = new JavaSparkContext(conf);

Note that setMaster should point to a master URL when running on a distributed cluster. Setting it to local specifies that the Spark job will run in a single JVM, which is useful for running jobs inside an IDE (for example, Eclipse) for debugging purposes.
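As a sketch of the alternatives, the snippet below shows two other standard master URL formats; the host name and port are placeholders, not values from this example:

```java
import org.apache.spark.SparkConf;

public class MasterUrlExamples {
    public static void main(String[] args) {
        // Run locally with 4 worker threads; local[*] would use all available cores.
        SparkConf localConf = new SparkConf()
                .setMaster("local[4]")
                .setAppName("ApacheSparkForJavaDevelopers");

        // Connect to a standalone Spark cluster (master-host:7077 is a placeholder;
        // 7077 is the default port of the standalone master).
        SparkConf clusterConf = new SparkConf()
                .setMaster("spark://master-host:7077")
                .setAppName("ApacheSparkForJavaDevelopers");
    }
}
```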

Now, we will create an RDD of integers:

List<Integer> intList = Arrays.asList(1, 2, 3, 4, 5, 6, 7, 8, 9, 10);
JavaRDD<Integer> intRDD = javaSparkContext.parallelize(intList, 2);

Once the RDD is created, a map transformation that increments every element by one can be executed as follows:

Java 7:

JavaRDD<Integer> incremented = intRDD.map(new Function<Integer, Integer>() {
    @Override
    public Integer call(Integer x) throws Exception {
        return x + 1;
    }
});

Java 8:

JavaRDD<Integer> incremented = intRDD.map(x -> x + 1);

Note that the result of map must be captured in a new RDD reference: RDDs are immutable, so the transformation does not modify intRDD.
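Since map is a lazy transformation, nothing actually executes until an action such as collect() is invoked. Putting the pieces together, a minimal end-to-end sketch looks like the following (the class name MapExample is our own; the rest uses the standard Spark Java API):

```java
import java.util.Arrays;
import java.util.List;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;

public class MapExample {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf().setMaster("local").setAppName("ApacheSparkForJavaDevelopers");
        try (JavaSparkContext sc = new JavaSparkContext(conf)) {
            // Distribute the list across 2 partitions.
            JavaRDD<Integer> intRDD = sc.parallelize(Arrays.asList(1, 2, 3, 4, 5, 6, 7, 8, 9, 10), 2);

            // Lazy: records only the transformation; no computation happens yet.
            JavaRDD<Integer> incremented = intRDD.map(x -> x + 1);

            // collect() is an action: it triggers the job and gathers the
            // results back to the driver, preserving element order.
            List<Integer> result = incremented.collect();
            System.out.println(result);
        }
    }
}
```

Running this prints the incremented list, 2 through 11, confirming the one-to-one mapping between source and target elements.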