EXPLAIN statement

Hive provides an EXPLAIN statement to return a query execution plan without running the query. We can use it to analyze queries if we have concerns about their performance. The EXPLAIN statement helps us to see the difference between two or more queries for the same purpose. The syntax for it is as follows:

EXPLAIN [FORMATTED|EXTENDED|DEPENDENCY|AUTHORIZATION] hql_query

The following keywords can be used:

  • FORMATTED: This provides a formatted JSON version of the query plan.
  • EXTENDED: This provides additional information for the operators in the plan, such as file pathname.
  • DEPENDENCY: This provides a JSON format output that contains a list of tables and partitions that the query depends on. It has been available since Hive v0.10.0
  • AUTHORIZATION: This lists all entities needed to be authorized, including input and output to run the query, and authorization failure, if any. It has been available since Hive v0.14.0.

A typical query plan contains the following three sections. We will also have a look at an example later:

  • Abstract Syntax Tree (AST): Hive uses a parser generator called ANTLR (see http://www.antlr.org/) to automatically generate a tree syntax for HQL
  • Stage Dependencies: This lists all dependencies and the number of stages used to run the query
  • Stage Plans: It contains important information, such as operators and sort orders, for running the job

The following is what a typical query plan looks like. From the following example, we can see that the AST section is shown as a Map/Reduce operator tree. In the STAGE DEPENDENCIES section, both Stage-0 and Stage-1 are independent root stages. In the STAGE PLANS section, Stage-1 has one map and reduce referred to by the Map Operator Tree and Reduce Operator Tree. Inside each Map/Reduce Operator Tree section, all operators corresponding to the query keywords, as well as expressions and aggregations, are listed. The Stage-0 stage does not have map and reduce. It is just a Fetch operation:

> EXPLAIN SELECT gender_age.gender, count(*) 
> FROM employee_partitioned WHERE year=2018
> GROUP BY gender_age.gender LIMIT 2;

+----------------------------------------------------------------------+
| Explain |
+----------------------------------------------------------------------+
| STAGE DEPENDENCIES: |
| Stage-1 is a root stage |
| Stage-0 depends on stages: Stage-1 |
| |
| STAGE PLANS: |
| Stage: Stage-1 |
| Map Reduce |
| Map Operator Tree: |
| TableScan |
| alias: employee_partitioned |
| Pruned Column Paths: gender_age.gender |
| Statistics: |
| Num rows: 4 Data size: 223 Basic stats: COMPLETE Column stats: NONE |
| Select Operator |
| expressions: gender_age.gender (type: string) |
| outputColumnNames: _col0 |
| Statistics: |
| Num rows: 4 Data size: 223 Basic stats: COMPLETE Column stats: NONE |
| Group By Operator |
| aggregations: count() |
| keys: _col0 (type: string) |
| mode: hash |
| outputColumnNames: _col0, _col1 |
| Statistics: |
| Num rows: 4 Data size: 223 Basic stats: COMPLETE Column stats: NONE |
| Reduce Output Operator |
| key expressions: _col0 (type: string) |
| sort order: + |
| Map-reduce partition columns: _col0 (type: string) |
| Statistics: |
| Num rows: 4 Data size: 223 Basic stats: COMPLETE Column stats: NONE |
| TopN Hash Memory Usage: 0.1 |
| value expressions: _col1 (type: bigint) |
| Reduce Operator Tree: |
| Group By Operator |
| aggregations: count(VALUE._col0) |
| keys: KEY._col0 (type: string) |
| mode: mergepartial |
| outputColumnNames: _col0, _col1 |
| Statistics: |
| Num rows: 2 Data size: 111 Basic stats: COMPLETE Column stats: NONE |
| Limit |
| Number of rows: 2 |
| Statistics: |
| Num rows: 2 Data size: 110 Basic stats: COMPLETE Column stats: NONE |
| File Output Operator |
| compressed: false |
| Statistics: |
| Num rows: 2 Data size: 110 Basic stats: COMPLETE Column stats: NONE |
| table: |
| input format: |
| org.apache.hadoop.mapred.SequenceFileInputFormat |
| output format: |
| org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat |
| serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe |
| |
| Stage: Stage-0 |
| Fetch Operator |
| limit: 2 |
| Processor Tree: |
| ListSink |
+----------------------------------------------------------------------+
53 rows selected (0.232 seconds)

Both the Ambari Hive view and the Hue Hive editor have built-in visualized query explain when running a query. The Ambari Hive view visual the preceding query as follows:

Ambari Hive view visual explaination
..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
18.191.105.15