As already stated, for any shuffle operation on a pair RDD, the destination partition for each key is decided by the partitioner used during the shuffle. The choice of partitioner therefore plays a major role in how much data is moved across the network.
The basic requirement of a partitioner is that equal keys must land in the same partition. Apache Spark provides a transformation called partitionBy, which lets the user repartition an RDD at any point with a partitioner of their choice. Also, as stated previously, the user can supply their own partitioner to any pair RDD transformation that requires shuffling, such as reduceByKey, aggregateByKey, foldByKey, and so on.
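The contract that equal keys always land in the same partition is what makes shuffle transformations work: once records are routed by key, a reduceByKey-style aggregation can run locally within each partition. The following is a minimal sketch in plain Python of that routing idea (it is not Spark's actual implementation; `partition_by` and `partition_func` are illustrative names):

```python
def partition_by(pairs, num_partitions, partition_func=hash):
    """Route each (key, value) pair to a partition chosen purely from its key.

    Equal keys always produce the same partition index, so they always
    land in the same partition -- the basic contract of a partitioner.
    """
    partitions = [[] for _ in range(num_partitions)]
    for key, value in pairs:
        # Python's % always yields a non-negative index here, even for
        # negative hash values.
        index = partition_func(key) % num_partitions
        partitions[index].append((key, value))
    return partitions

pairs = [("a", 1), ("b", 2), ("a", 3), ("c", 4), ("b", 5)]
partitions = partition_by(pairs, 4)

# Every key lives in exactly one partition, so a reduceByKey-style
# aggregation needs no further cross-partition data movement.
for key in {k for k, _ in pairs}:
    homes = [i for i, part in enumerate(partitions)
             if any(k == key for k, _ in part)]
    assert len(homes) == 1
```

Note that all occurrences of `"a"` and `"b"` end up co-located, which is exactly why supplying the same partitioner to consecutive transformations can avoid repeated shuffles.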
Spark ships with two built-in partitioners and also provides an option to specify a custom partitioner. The following are the built-in partitioner implementations:
- Hash Partitioner
- Range Partitioner
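The two strategies differ in how they map a key to a partition: a hash partitioner uses the key's hash code modulo the partition count, while a range partitioner assigns contiguous key ranges to partitions based on precomputed boundaries (Spark's RangePartitioner derives those boundaries by sampling the RDD). A simplified sketch of both ideas in plain Python (not Spark's actual implementation; the boundary values below are made up for illustration):

```python
import bisect

def hash_partition(key, num_partitions):
    # HashPartitioner-style: the partition follows from the key's hash code.
    return hash(key) % num_partitions

def range_partition(key, boundaries):
    # RangePartitioner-style: keys are split at sorted boundaries, so each
    # partition holds a contiguous range of the key space. This is what
    # makes range partitioning suitable for sorted output (e.g. sortByKey).
    return bisect.bisect_left(boundaries, key)

# Boundaries [10, 20, 30] define four partitions:
# keys <= 10 -> 0, (10, 20] -> 1, (20, 30] -> 2, keys > 30 -> 3.
boundaries = [10, 20, 30]
print(range_partition(5, boundaries))   # 0
print(range_partition(25, boundaries))  # 2
```

Hash partitioning spreads keys evenly when hash codes are well distributed but preserves no ordering; range partitioning keeps partition-level ordering at the cost of needing representative boundaries, which is why Spark samples the data to compute them.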