Impala and Hive

Throughout this book, we have emphasized that Impala uses the Hive metastore only as a catalog. Hive relies on MapReduce to process its queries: MapReduce distributes the query work across the cluster and returns the results to Hive. Impala instead runs its own daemons on one, several, or all of the DataNodes, and these daemons perform the query processing. Impala and Hive differ in a few key areas, and I have noted some of them in the following section.

Key differences between Impala and Hive

Impala performs in-memory query processing, while Hive does not
Hive uses MapReduce to process queries, while Impala uses its own processing engine
Hive can be extended with User Defined Functions (UDFs) or with a custom Serializer/Deserializer (SerDe); Impala does not, for now, support extensibility the way Hive does
Impala depends on Hive (specifically, the Hive metastore) to function, while Hive does not depend on any other application and needs only the core Hadoop platform (HDFS and MapReduce)
Impala's query language is a subset of HiveQL, which means that almost every Impala query (with a few limitations) can run in Hive. The reverse is not true, because some HiveQL features supported in Hive are not supported in Impala
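
As a rough illustration of the shared subset described in the last point, the following sketch shows the kind of plain aggregation query that should be accepted by both engines. The table name `customers` and the column `state` are hypothetical and are used here only for illustration; they do not come from this book's examples.

```sql
-- Hypothetical table and column names, for illustration only.
-- A simple SELECT with GROUP BY, ORDER BY, and LIMIT uses only
-- syntax that is common to HiveQL and the Impala query language.
SELECT state, COUNT(*) AS customer_count
FROM customers
GROUP BY state
ORDER BY customer_count DESC
LIMIT 10;
```

The statement text is the same in both cases; what differs is how it is executed, with Hive using MapReduce and Impala using its own daemons.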
