Choosing an appropriate file format and compression type for better performance

Impala processes large amounts of data stored in your Hadoop cluster. Hadoop places no restriction on the type of data that can be stored; however, some file formats and compression types give better data access performance than others. Impala can query most of the popular structured and unstructured file formats available in Hadoop, including files that use compression. Here is a list of the file formats and compression types that Impala supports:

File format	Data structure	Compression type
Text	Unstructured	LZO
Avro	Structured	GZIP, BZIP2, deflate, Snappy
RCFile	Structured	GZIP, BZIP2, deflate, Snappy
SequenceFile	Structured	GZIP, BZIP2, deflate, Snappy
Parquet	Structured	GZIP, Snappy (Default)

Now let's take a look at how choosing a proper file format can improve performance in Impala:

Sometimes the file format in which the data was originally stored does not provide the required performance. One possible solution is to create a new table with a different file format or compression type, and then use the INSERT statement to perform a one-time conversion. The new table will perform comparatively better, provided you have chosen the new format or compression type carefully.
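The one-time conversion described above can be sketched as a small Python helper that builds the Impala SQL statements. This is a minimal sketch, not a definitive recipe: the helper name `conversion_statements` and the table names `sales_text` and `sales_parquet` are hypothetical, and the `COMPRESSION_CODEC` query option is an assumption whose exact name depends on your Impala version (older releases used a differently named option), so check it against your installation before running the output in impala-shell.

```python
def conversion_statements(source_table, target_table,
                          file_format="PARQUET", codec=None):
    """Build Impala SQL for a one-time conversion into a new table.

    All table names are supplied by the caller; nothing is executed here.
    """
    statements = []
    if codec is not None:
        # Assumption: COMPRESSION_CODEC is the query option that controls
        # compression of written data files in your Impala version.
        statements.append(f"SET COMPRESSION_CODEC={codec};")
    # Create an empty table with the same columns but a different file format.
    statements.append(
        f"CREATE TABLE {target_table} LIKE {source_table} "
        f"STORED AS {file_format};"
    )
    # Copy the data once; Impala rewrites it in the new table's format.
    statements.append(
        f"INSERT INTO {target_table} SELECT * FROM {source_table};"
    )
    return statements


if __name__ == "__main__":
    # Hypothetical table names, for illustration only.
    for statement in conversion_statements("sales_text", "sales_parquet",
                                           codec="snappy"):
        print(statement)
```

Printing the statements rather than executing them keeps the sketch independent of any particular Impala client library; you can paste the output into impala-shell and then compare query times against the original table.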
Processing compressed data requires both disk I/O to read it and CPU cycles to uncompress it. With uncompressed data, disk I/O is the only primary cost during processing. So, if the application architecture supports it, processing uncompressed data can improve performance. However, uncompressed data takes up much more disk space than compressed data, so you need to weigh the storage cost against the performance gain.
Sometimes changing the file format or compression type yields no performance gain and instead slows processing down. In that case, keeping the original file format and compression type is fine. The lesson here is to understand the file and compression formats properly before choosing one for better performance.
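The trade-off between disk I/O and CPU cycles described above can be illustrated with a rough back-of-the-envelope model. The sketch below is a deliberate simplification under stated assumptions: the function `scan_seconds` is hypothetical, all throughput figures are made-up illustrative numbers, and it assumes reading and uncompressing happen one after the other rather than overlapping. It is not a description of how Impala actually schedules work.

```python
def scan_seconds(stored_gb, raw_gb, disk_gb_per_s, decompress_gb_per_s=None):
    """Estimate the time to scan a table under a simple additive cost model.

    stored_gb            size of the files on disk
    raw_gb               size of the data once uncompressed
    disk_gb_per_s        assumed disk read throughput
    decompress_gb_per_s  assumed decompression throughput (None = uncompressed)
    """
    io_time = stored_gb / disk_gb_per_s
    # Uncompressed data pays no CPU cost for decompression.
    cpu_time = 0.0 if decompress_gb_per_s is None else raw_gb / decompress_gb_per_s
    return io_time + cpu_time


if __name__ == "__main__":
    raw_gb = 100.0        # illustrative data size
    compressed_gb = 50.0  # illustrative 2:1 compression ratio

    # Scenario A: fast disks, slow codec. Uncompressed data scans faster.
    print("A uncompressed:", scan_seconds(raw_gb, raw_gb, 2.0))
    print("A compressed:  ", scan_seconds(compressed_gb, raw_gb, 2.0, 1.0))

    # Scenario B: slow disks, fast codec. Compression no longer hurts.
    print("B uncompressed:", scan_seconds(raw_gb, raw_gb, 0.5))
    print("B compressed:  ", scan_seconds(compressed_gb, raw_gb, 0.5, 4.0))
```

The two scenarios give opposite answers from the same formula, which is why it pays to measure with your own data and hardware before committing to a format, and to remember that the uncompressed table in this example occupies twice the disk space.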
Tip
Chapter 7, Advanced Impala Concepts, has more information about various file formats and compression types and how to use them in Impala.
