Understanding columnar storage

Apache Spark V2.0 introduced columnar storage. Many on-disk formats, such as Parquet, and relational databases, such as IBM DB2 BLU or dashDB, already supported it, so adding it to Apache Spark was an obvious choice. So what is it all about? Consider the following figure:

If we now transpose, we get the following column-based layout:

In contrast to row-based layouts, where the fields of an individual record are stored close together in memory, columnar storage places the values of the same column from different records close together. This changes performance significantly. Not only can columnar data such as Parquet be read faster, by up to an order of magnitude, but columnar storage also helps when indexing individual columns or performing projection operations, since only the columns involved need to be read.
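The transposition from a row-based to a column-based layout can be illustrated with a minimal sketch in plain Python. This is not how Apache Spark implements columnar storage internally; the sample records, the column names, and the helper `to_columnar` are hypothetical and exist only to show why a projection is cheaper on the columnar form.

```python
# Row-based layout: each record keeps all of its fields together.
# The records and column names are made-up sample data.
rows = [
    (1, "alice", 34),
    (2, "bob", 45),
    (3, "carol", 29),
]
column_names = ("id", "name", "age")


def to_columnar(rows, column_names):
    """Transpose row tuples into one list per column (hypothetical helper)."""
    return {name: list(values) for name, values in zip(column_names, zip(*rows))}


columns = to_columnar(rows, column_names)

# Projection on the row layout visits every record and picks out one field.
ages_from_rows = [record[2] for record in rows]

# Projection on the columnar layout is a single lookup of an existing list;
# the "id" and "name" values are never touched.
ages_from_columns = columns["age"]

print(columns)
print(ages_from_rows == ages_from_columns)
```

In the columnar form, all values of `age` already sit next to each other, so a query that needs only that column can skip the rest of the data entirely.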
