repartition

repartition redistributes the data of the input RDD across fewer or more partitions in the output RDD. It always shuffles the data across the cluster, so it can be an expensive operation on large datasets.

The following code snippet shows how we can repartition an RDD created from a text file into an RDD with more partitions:

scala> val rdd_two = sc.textFile("wiki1.txt")
rdd_two: org.apache.spark.rdd.RDD[String] = wiki1.txt MapPartitionsRDD[8] at textFile at <console>:24

scala> rdd_two.partitions.length
res21: Int = 2

scala> val rdd_three = rdd_two.repartition(5)
rdd_three: org.apache.spark.rdd.RDD[String] = MapPartitionsRDD[25] at repartition at <console>:26

scala> rdd_three.partitions.length
res23: Int = 5
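
The same call can also reduce the number of partitions. The following is a minimal sketch of that, assuming a local Spark installation; the object name RepartitionSketch, the helper shrink, the generated input data, and the local[2] master setting are illustrative and do not come from the book. It also calls coalesce, which with its default settings merges existing partitions without a full shuffle, for comparison:

```scala
import org.apache.spark.SparkContext
import org.apache.spark.sql.SparkSession

object RepartitionSketch {
  // Hypothetical helper: returns the partition counts produced by
  // repartition(2) and coalesce(2) on a 4-partition RDD.
  def shrink(sc: SparkContext): (Int, Int) = {
    // Illustrative input: 100 integers spread over 4 partitions.
    val numbers = sc.parallelize(1 to 100, numSlices = 4)

    // repartition can shrink the partition count too; it always shuffles.
    val fewer = numbers.repartition(2)

    // coalesce (shuffle = false by default) merges existing partitions
    // instead of redistributing every record.
    val merged = numbers.coalesce(2)

    (fewer.partitions.length, merged.partitions.length)
  }

  def main(args: Array[String]): Unit = {
    // Local session with two worker threads (an assumption for this sketch).
    val spark = SparkSession.builder()
      .master("local[2]")
      .appName("repartition-sketch")
      .getOrCreate()

    val (afterRepartition, afterCoalesce) = shrink(spark.sparkContext)
    println(s"repartition(2): $afterRepartition partitions")
    println(s"coalesce(2): $afterCoalesce partitions")

    spark.stop()
  }
}
```

Both calls end up with two partitions here. When we only need fewer partitions, coalesce is usually the cheaper choice; repartition is the one to reach for when we need more partitions or a more even spread of records.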

The following diagram explains how repartition works. You can see that a new RDD is created from the original RDD, essentially redistributing the partitions by combining/splitting them as needed:
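
We can also observe the redistribution directly by counting the elements in each partition before and after the call. The following is a rough sketch under stated assumptions: the object name PartitionSizesSketch, the helper partitionSizes, the generated input data, and the local[2] master setting are hypothetical and not from the book. It uses glom, which turns each partition into a single array:

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.SparkSession

object PartitionSizesSketch {
  // Hypothetical helper: number of elements held by each partition,
  // listed in partition order.
  def partitionSizes[T](rdd: RDD[T]): Array[Int] =
    rdd.glom().map(_.length).collect()

  def main(args: Array[String]): Unit = {
    // Local session with two worker threads (an assumption for this sketch).
    val spark = SparkSession.builder()
      .master("local[2]")
      .appName("partition-sizes-sketch")
      .getOrCreate()
    val sc = spark.sparkContext

    // Illustrative input: 100 integers in 2 partitions.
    val original = sc.parallelize(1 to 100, numSlices = 2)

    // A new RDD with 5 partitions; the original RDD is left unchanged.
    val redistributed = original.repartition(5)

    println("before: " + partitionSizes(original).mkString(", "))
    println("after:  " + partitionSizes(redistributed).mkString(", "))

    spark.stop()
  }
}
```

The exact sizes after repartitioning depend on how Spark assigns records during the shuffle, but they always add up to the size of the original RDD.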
