Creating an RDD using a Python list

The following is a very simple example:

nums = sc.parallelize([1, 2, 3, 4])

If I just want to make an RDD out of a plain old Python list, I can call the parallelize() method on the SparkContext object (named sc here). That converts a list, in this case just the numbers 1, 2, 3, 4, into an RDD object called nums.

That is the simplest way to create an RDD: straight from a hard-coded list. The list could come from anywhere and doesn't have to be hard-coded, but that kind of defeats the purpose of big data. I mean, if I have to load the entire dataset into memory before I can create an RDD from it, what's the point?
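The parallelize() call described above can be sketched as a small standalone script. This is a minimal sketch, not the book's own listing: the helper name build_rdd, the "local[*]" master, and the application name are assumptions made for illustration, and running main() requires a working PySpark installation.

```python
def build_rdd(sc, values):
    """Distribute an in-memory Python list as an RDD (hypothetical helper)."""
    # parallelize() is a method on the SparkContext, not a free function.
    return sc.parallelize(values)


def main():
    # Imported here so build_rdd() can be reused without loading Spark up front.
    from pyspark import SparkConf, SparkContext

    # "local[*]" and the app name are illustrative choices, not from the text.
    conf = SparkConf().setMaster("local[*]").setAppName("ParallelizeExample")
    sc = SparkContext.getOrCreate(conf)
    try:
        # Hard-coded list, as in the example above.
        nums = build_rdd(sc, [1, 2, 3, 4])
        print(nums.collect())

        # The list need not be hard-coded, but it is still built in full
        # in the driver's memory before Spark ever sees it.
        squares = build_rdd(sc, [n * n for n in range(1, 5)])
        print(squares.collect())
    finally:
        sc.stop()
```

Calling main() runs both cases against a local Spark master. Either way the whole list exists in the driver process first, which is why this approach only suits small or test data.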
