Avoiding Shuffle and Reducing Operational Expenses

In this chapter, we will learn how to detect a shuffle in a process, how to avoid it, and how doing so reduces the operational expense of our jobs. We will then test operations that cause a shuffle in Apache Spark to find out which ones call for care and which ones we should avoid. Next, we will learn how to change the design of jobs with wide dependencies. After that, we will use the keyBy() operation to reduce shuffle and, in the last section of this chapter, we'll see how custom partitioning can reduce the shuffle of our data.

In this chapter, we will cover the following topics:

Detecting a shuffle in a process
Testing operations that cause a shuffle in Apache Spark
Changing the design of jobs with wide dependencies
Using keyBy() operations to reduce shuffle
Using the custom partitioner to reduce shuffle
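
To make the idea behind the last topic concrete before the chapter turns to Spark itself, here is a minimal sketch in plain Python. It is not Spark code and does not use any Spark API. It is a toy model that counts how many records would have to move between partitions when grouping by key. The names partition_for, partition_by_key, and records_moved, as well as the sample records, are hypothetical and chosen only for illustration.

```python
def partition_for(key, num_partitions):
    """Toy deterministic partitioner (an assumption for this sketch):
    the same key always maps to the same partition index."""
    return sum(key.encode("utf-8")) % num_partitions


def partition_by_key(records, num_partitions):
    """Place every (key, value) record in the partition chosen by its key."""
    partitions = [[] for _ in range(num_partitions)]
    for key, value in records:
        partitions[partition_for(key, num_partitions)].append((key, value))
    return partitions


def records_moved(partitions, num_partitions):
    """Count records that are not yet in the partition their key maps to.
    In this model, these are the records a group-by-key would have to move."""
    moved = 0
    for index, records in enumerate(partitions):
        for key, _value in records:
            if partition_for(key, num_partitions) != index:
                moved += 1
    return moved


# Hypothetical sample data: (user id, amount) pairs.
records = [
    ("user_a", 1), ("user_b", 2), ("user_a", 3),
    ("user_c", 4), ("user_b", 5), ("user_c", 6),
]
num_partitions = 3

# Layout 1: records spread round-robin, with no regard for their keys.
round_robin = [records[i::num_partitions] for i in range(num_partitions)]

# Layout 2: records already placed by key before the grouping step.
pre_partitioned = partition_by_key(records, num_partitions)

print("moved (round-robin):    ", records_moved(round_robin, num_partitions))
print("moved (pre-partitioned):", records_moved(pre_partitioned, num_partitions))
```

In this model, data that is already laid out by key needs no movement for a later grouping on that key, whereas the key-agnostic layout does. That is the intuition behind using a partitioner to reduce shuffle, although a real cluster involves network transfer and serialization costs that this sketch does not model.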
