coalesce

coalesce combines the partitions of the input RDD into fewer partitions in the output RDD. By default it does this without a shuffle, merging existing partitions rather than redistributing individual records.

The following code snippet combines all partitions of an RDD into a single partition:

scala> val rdd_two = sc.textFile("wiki1.txt")
rdd_two: org.apache.spark.rdd.RDD[String] = wiki1.txt MapPartitionsRDD[8] at textFile at <console>:24

scala> rdd_two.partitions.length
res21: Int = 2

scala> val rdd_three = rdd_two.coalesce(1)
rdd_three: org.apache.spark.rdd.RDD[String] = CoalescedRDD[21] at coalesce at <console>:26

scala> rdd_three.partitions.length
res22: Int = 1
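
As a complement to the shell session above, here is a minimal standalone sketch of how the target partition count and the optional shuffle flag interact. The object name CoalesceSketch, the local[2] master, and the sample data (the integers 1 to 100 spread over 8 partitions) are assumptions chosen for illustration and do not come from the original example.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical standalone app; the names and sample data are illustrative only.
object CoalesceSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[2]")
      .appName("coalesce-sketch")
      .getOrCreate()
    val sc = spark.sparkContext

    // Start with 8 partitions holding the numbers 1 to 100.
    val rdd = sc.parallelize(1 to 100, numSlices = 8)
    println(rdd.getNumPartitions)          // 8

    // Merge down to 2 partitions without a shuffle.
    val fewer = rdd.coalesce(2)
    println(fewer.getNumPartitions)        // 2
    // glom() exposes each partition as an array, so we can print its size.
    println(fewer.glom().map(_.length).collect().mkString(", "))

    // Without a shuffle, coalesce cannot increase the partition count.
    val notGrown = rdd.coalesce(16)
    println(notGrown.getNumPartitions)     // 8

    // With shuffle = true, records are redistributed and the count can grow.
    val grown = rdd.coalesce(16, shuffle = true)
    println(grown.getNumPartitions)        // 16

    spark.stop()
  }
}
```

In the spark-shell, the same calls can be run directly against the existing sc without creating a SparkSession. Avoiding the shuffle is what makes coalesce cheap when reducing partitions, at the cost of potentially uneven partition sizes.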

The following diagram explains how coalesce works. A new RDD is created from the original RDD, with fewer partitions obtained by combining the original partitions as needed:
