Repartitioning

After an RDD is loaded, the number of partitions of the RDD can be adjusted using a transformation called repartition:

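// Assumes jsc is an existing JavaSparkContext.
// The second argument asks for at least three partitions.
JavaRDD<String> textFile = jsc.textFile("test.txt", 3);
System.out.println("Before repartition:" + textFile.getNumPartitions());
// repartition returns a new RDD; textFile itself is unchanged.
JavaRDD<String> textFileRepartitioned = textFile.repartition(4);
System.out.println("After repartition:" + textFileRepartitioned.getNumPartitions());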

In the preceding example, the RDD has three partitions before repartitioning (the second argument to textFile is the minimum number of partitions requested) and four afterward. Repartitioning works for compressed files as well, so the number of partitions of a gzip file can be increased after it is read by calling repartition:

// Assumes jsc is an existing JavaSparkContext.
JavaRDD<String> textFile = jsc.textFile("test.gz", 3);
System.out.println("Before repartition:" + textFile.getNumPartitions());
// repartition returns a new RDD with four partitions.
JavaRDD<String> textFileRepartitioned = textFile.repartition(4);
System.out.println("After repartition:" + textFileRepartitioned.getNumPartitions());

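In this case, the number of partitions before repartitioning will be one, because a gzip file is not splittable and is read as a single partition regardless of the minimum requested. After repartitioning, it will be four. Repartitioning always triggers a shuffle, because the records are redistributed across the new partitions, which may reside on different nodes.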
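
To make the redistribution concrete, the following is a minimal sketch in plain Java that models partitions as in-memory lists. The RepartitionSketch class and its repartition method are hypothetical names made up for this illustration and are not part of the Spark API. The round-robin assignment is a simplification of what Spark does internally. It shows only why each record may end up in a different partition than the one it started in, which is the reason a shuffle is required on a cluster.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Simplified, hypothetical model of repartitioning; this is not Spark code.
class RepartitionSketch {

    // Spreads every element of the old partitions over numPartitions new
    // partitions in round-robin order.
    static <T> List<List<T>> repartition(List<List<T>> oldPartitions, int numPartitions) {
        if (numPartitions <= 0) {
            throw new IllegalArgumentException("numPartitions must be positive");
        }
        List<List<T>> newPartitions = new ArrayList<>();
        for (int i = 0; i < numPartitions; i++) {
            newPartitions.add(new ArrayList<T>());
        }
        int next = 0; // index of the new partition that receives the next element
        for (List<T> partition : oldPartitions) {
            for (T element : partition) {
                newPartitions.get(next).add(element);
                next = (next + 1) % numPartitions;
            }
        }
        return newPartitions;
    }

    public static void main(String[] args) {
        // Models the gzip case: all lines start out in a single partition.
        List<List<String>> gzipPartitions = new ArrayList<>();
        gzipPartitions.add(Arrays.asList(
                "line1", "line2", "line3", "line4",
                "line5", "line6", "line7", "line8"));

        List<List<String>> repartitioned = repartition(gzipPartitions, 4);

        System.out.println("Before repartition:" + gzipPartitions.size());
        System.out.println("After repartition:" + repartitioned.size());
        System.out.println(repartitioned);
    }
}
```

In this model, the eight lines that began in one partition end up two per partition across four partitions. On a real cluster, moving those records between partitions means moving data between executors, which is the cost to keep in mind before calling repartition.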
