Search in book...
Toggle Font Controls
Create new playlist

Name your new playlist

Playlist description (optional)
Sign In

Email address

Password

Forgot Password?

or

Continue with Facebook

Continue with Google
Sign Up

Full Name

Email address

Confirm Email Address

Password

or

Continue with Facebook

Continue with Google

coalesce

A coalesce transformation in Spark is used to reduce the number of partitions. If the partition count is too high, a user may need to reduce the number the partitions before running join operations. Joining two RDDs of n and m sizes respectively requires n*m tasks. So, decreasing the partition count before joining RDDs can be a useful decision if the size of the partition (number of elements per partition) is small.

Repartition transformation can also be used to reduce the number of partitions of an RDD. However, repartition requires a full shuffle of data. coalesce, on the other hand, avoids full shuffle.

For example, an RDD has x number of partitions and the user performs coalesce to decrease the number of partitions to (x-n), then only data of n partitions (x-(x-n)) will be shuffled. This minimizes the data movement amongst nodes.

The coalesce transformation can be performed as follows:

pairRDD.coalesce(2)

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.

3.136.233.153

Table of Contents for coalesce

Create new playlist

Sign In

Sign Up

Table of Contents for
coalesce