Serialized RDD storage

As discussed earlier, when your objects are still too large to fit efficiently in main memory or on disk despite other memory-tuning efforts, a simpler and more effective way to reduce memory usage is to store them in serialized form.

This can be done using the serialized storage levels of the RDD persistence API, such as MEMORY_ONLY_SER. For more information, refer to the previous section on memory management and explore the available options.

If you specify MEMORY_ONLY_SER, Spark stores each RDD partition as a single large byte array. The downside of this approach is that it slows down data access; this is unavoidable, since each object must be deserialized on the fly every time it is reused.
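As a minimal sketch of serialized persistence (assuming a local SparkSession and a synthetic RDD of integers, both chosen here purely for illustration):

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.storage.StorageLevel

object SerializedStorageExample {
  def main(args: Array[String]): Unit = {
    // Local session for demonstration only
    val spark = SparkSession.builder
      .appName("SerializedStorageExample")
      .master("local[*]")
      .getOrCreate()

    // Hypothetical dataset; substitute your own RDD
    val numbers = spark.sparkContext.parallelize(1 to 1000000)

    // Persist each partition as one serialized byte array.
    // More space-efficient than MEMORY_ONLY, at the cost of
    // deserializing objects each time they are accessed.
    numbers.persist(StorageLevel.MEMORY_ONLY_SER)

    println(numbers.count())
    spark.stop()
  }
}
```

Compared with the default MEMORY_ONLY level, this trades CPU time (deserialization on every access) for a smaller memory footprint.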

As discussed previously, we highly recommend using Kryo serialization instead of Java serialization to make data access faster.
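A sketch of enabling Kryo when building the Spark configuration follows; the Sensor case class is a hypothetical stand-in for your own application types:

```scala
import org.apache.spark.SparkConf

// Hypothetical domain class; register your own types instead
case class Sensor(id: String, value: Double)

val conf = new SparkConf()
  .setAppName("KryoExample")
  // Replace the default Java serializer with Kryo
  .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
  // Registering classes lets Kryo write compact numeric IDs
  // instead of fully qualified class names
  .registerKryoClasses(Array(classOf[Sensor]))
```

Registering the classes you serialize most often is optional but recommended, as it further shrinks the serialized representation.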