Throughout this book, we have emphasized that Impala uses the Hive metastore only as a catalog. Hive relies on MapReduce to process its queries: MapReduce distributes the query work across the cluster and returns the results to Hive. Impala instead uses its own daemons, running on one, several, or all DataNodes, to perform query processing. Impala and Hive differ significantly in a few key areas, and I have noted some of them in the following section.
Key differences between Impala and Hive
Impala performs in-memory query processing, while Hive does not
Hive uses MapReduce to process queries, while Impala uses its own processing engine
Hive can be extended with user-defined functions (UDFs) or a custom Serializer/Deserializer (SerDe); for now, Impala does not support the same extensibility
Impala depends on Hive, specifically the Hive metastore, to function, while Hive depends on no other application and needs only the core Hadoop platform (HDFS and MapReduce)
Impala's query language is a subset of HiveQL, which means that almost every Impala query (with a few limitations) can run in Hive. The reverse is not true, because some HiveQL features supported in Hive are not supported in Impala