Appendix A. Technology Behind Impala and Integration with Third-party Applications

In the last seven chapters, I described the main features of Impala. Now it is time to finish the book with a few more details that will help you understand the true potential of Impala.

Technology behind Impala

The technology behind Impala is inspired by a Google research project named Dremel. Dremel is a scalable, interactive ad hoc query system for analyzing read-only nested data. By combining multilevel execution trees with a columnar data layout, Dremel-based implementations can run aggregation queries over trillions of rows in seconds. Dremel does not use MapReduce as its core; instead, it complements MapReduce. Impala is considered to be a native massively parallel processing (MPP) query engine running on Apache Hadoop. Depending on the type of query and the configuration, Impala can outperform traditional database applications on Hadoop, such as Hive, and processing frameworks, such as MapReduce, for the following key reasons:

Distributed, scalable aggregation algorithms.
Efficient use of hardware, such as a reduced CPU load, which increases aggregate I/O bandwidth.
A columnar binary storage format on Hadoop, which speeds up query processing. Impala achieves this by using Parquet files as an input source.
Support for various other popular file formats in addition to Parquet, which extends Impala's reach beyond Dremel to a much wider range of users.
Use of the available memory on a machine as a table cache, which means that queries can process data already held in the cache. This can make execution up to 90 times faster than conventional processing, in which data is read from a disk.
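
To make the first and third points more concrete, here is a minimal Python sketch of the general idea. It is not Impala's or Dremel's actual implementation. The toy table and the function names (`to_columns`, `partial_sum`, `merge_partials`) are assumptions made purely for illustration. The sketch pivots rows into columns so that an aggregation touches only the columns it needs, computes partial sums on two "leaf" partitions, and then merges them at a parent node, in the spirit of a multilevel execution tree.

```python
from typing import Any, Dict, List

# Hypothetical toy table, stored row by row (not real Impala data).
rows: List[Dict[str, Any]] = [
    {"user": "a", "country": "US", "bytes": 120},
    {"user": "b", "country": "DE", "bytes": 80},
    {"user": "c", "country": "US", "bytes": 200},
    {"user": "d", "country": "DE", "bytes": 40},
]


def to_columns(records: List[Dict[str, Any]]) -> Dict[str, List[Any]]:
    """Pivot row-oriented records into one list per column."""
    return {name: [r[name] for r in records] for name in records[0]}


def partial_sum(keys: List[str], values: List[int]) -> Dict[str, int]:
    """Leaf step: aggregate one partition into per-key partial sums."""
    out: Dict[str, int] = {}
    for key, value in zip(keys, values):
        out[key] = out.get(key, 0) + value
    return out


def merge_partials(partials: List[Dict[str, int]]) -> Dict[str, int]:
    """Parent step: combine the partial sums produced by child nodes."""
    total: Dict[str, int] = {}
    for partial in partials:
        for key, value in partial.items():
            total[key] = total.get(key, 0) + value
    return total


columns = to_columns(rows)

# The query "SUM(bytes) GROUP BY country" reads only two of the three
# columns; the "user" column is never touched.
mid = len(rows) // 2
leaves = [
    partial_sum(columns["country"][:mid], columns["bytes"][:mid]),
    partial_sum(columns["country"][mid:], columns["bytes"][mid:]),
]
result = merge_partials(leaves)
print(result)  # {'US': 320, 'DE': 120}
```

In a real system, each leaf would run on a different node against its local data, and only the small partial results would travel up the tree.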

You can learn more about Google Dremel from the research paper at the following URL:

http://research.google.com/pubs/pub36632.html
