Impala and Hive

Throughout this book, we have emphasized that Impala uses the Hive metastore only as a catalog. Hive relies on MapReduce to process its queries: MapReduce distributes the query work across the cluster and returns the results to Hive. Impala, in contrast, uses its own daemons, running on one, several, or all of the DataNodes, to perform query processing. Impala and Hive differ significantly in a few key areas, and I have noted some of them in the following section.

Key differences between Impala and Hive

  • Impala performs in-memory query processing while Hive does not
  • Hive uses MapReduce to process queries, while Impala uses its own processing engine
  • Hive can be extended with User Defined Functions (UDFs) or a custom Serializer/Deserializer (SerDe); Impala does not yet support the same kind of extensibility
  • Impala depends on Hive (specifically, the Hive metastore) to function, while Hive does not depend on any other application and needs only the core Hadoop platform (HDFS and MapReduce)
  • Impala's query language is a subset of HiveQL, which means that almost every Impala query (with a few limitations) can run in Hive. The reverse is not true, because some HiveQL features supported in Hive are not supported in Impala