Wide Dependencies

When deriving an RDD from one or more parent RDDs requires transferring data over the wire to repartition or redistribute it, as with functions such as aggregateByKey and reduceByKey, the child RDD is said to depend on the parent RDDs through a shuffle operation. This is known as a wide dependency: the data cannot be transformed solely on the node that holds the parent RDD partition, so it must be transferred over the wire between executors.
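To make the idea concrete, here is a minimal sketch in plain Python (not Spark code) that imitates a reduceByKey-style shuffle over in-memory lists. The function name reduce_by_key, the sample data, and the use of hash(key) % num_child_partitions as the partitioner are assumptions made for illustration only. The sketch records which parent partitions each child partition had to read from, which is what makes the dependency wide.

```python
from collections import defaultdict

def reduce_by_key(parent_partitions, num_child_partitions, func):
    """Simulate a reduceByKey-style shuffle over plain Python lists.

    Hypothetical helper for illustration; this is not Spark's implementation.
    """
    # One bucket of grouped values per child partition.
    child_groups = [defaultdict(list) for _ in range(num_child_partitions)]
    # For each child partition, remember which parent partitions fed it.
    parents_read = [set() for _ in range(num_child_partitions)]

    # "Shuffle": route every record to the child partition chosen by its key.
    for parent_idx, partition in enumerate(parent_partitions):
        for key, value in partition:
            target = hash(key) % num_child_partitions  # assumed partitioner
            child_groups[target][key].append(value)
            parents_read[target].add(parent_idx)

    # "Reduce": combine the values collected for each key.
    reduced = []
    for groups in child_groups:
        out = {}
        for key, values in groups.items():
            acc = values[0]
            for v in values[1:]:
                acc = func(acc, v)
            out[key] = acc
        reduced.append(out)
    return reduced, parents_read

# Three parent partitions; integer keys keep the hashing deterministic.
parents = [[(1, 10), (2, 20)], [(1, 1), (3, 30)], [(2, 2), (3, 3)]]
children, parents_read = reduce_by_key(parents, 2, lambda a, b: a + b)

print(children)      # per-child-partition results
print(parents_read)  # parent partitions each child had to read
```

In this run every child partition pulls records from more than one parent partition. In a real cluster those parent partitions may live on different executors, which is why the data has to cross the network.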

Each wide dependency introduces a new stage in the job execution: the shuffle marks the boundary where one stage ends and the next begins.
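The effect on stages can be sketched with a simplified model. The snippet below assumes a linear chain of transformations, each flagged as wide or not, and starts a new stage at every wide one. The function split_into_stages and the sample lineage are hypothetical; Spark's scheduler works on a full dependency graph rather than a flat list, so treat this only as an illustration of the rule.

```python
def split_into_stages(lineage):
    """Group a linear chain of transformations into stages.

    lineage: list of (transformation_name, is_wide) in execution order.
    Hypothetical helper; a simplification of how stage boundaries arise.
    """
    stages = [[]]
    for name, is_wide in lineage:
        if is_wide:
            # A wide dependency needs a shuffle, so a new stage starts here.
            stages.append([])
        stages[-1].append(name)
    # Drop the empty leading stage if the chain began with a wide step.
    return [stage for stage in stages if stage]

# Assumed example lineage for a word-count style job.
lineage = [
    ("textFile", False),
    ("flatMap", False),
    ("map", False),
    ("reduceByKey", True),   # wide: requires a shuffle
    ("mapValues", False),
]
stages = split_into_stages(lineage)
print(stages)
```

Here the three transformations before reduceByKey fall into one stage, while reduceByKey and everything after it, up to the next wide dependency, fall into another.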

The following diagram illustrates how a wide dependency transforms one RDD into another by shuffling data between executors:
