Search in book...
Toggle Font Controls
Create new playlist

Name your new playlist

Playlist description (optional)
Sign In

Email address

Password

Forgot Password?

or

Continue with Facebook

Continue with Google
Sign Up

Full Name

Email address

Confirm Email Address

Password

or

Continue with Facebook

Continue with Google

Query-specific SQL statements in Impala

Now, we will spend some time in understanding the query-specific SQL statements used in Impala. Most of these statements are exactly the same as they are defined in SQL, so to learn more, I would suggest you to look at any SQL reference documentation. Here, I am covering some key information for reference purposes:

EXPLAIN: Using the EXPLAIN clause, we can learn the execution plan of a SQL statement by understanding low-level mechanisms that Impala will use to read and process the data in the whole cluster, and then finally show the results. You can use the EXPLAIN clause ahead of a SELECT statement as shown in the following example:

[Hadoop.testdomain:21000] > EXPLAIN SELECT * FROM  list;
Explain query: select * from list
PLAN FRAGMENT 0
  PARTITION: UNPARTITIONED

  1:EXCHANGE
     tuple ids: 0 

PLAN FRAGMENT 1
  PARTITION: RANDOM

  STREAM DATA SINK
    EXCHANGE ID: 1
    UNPARTITIONED

  0:SCAN HDFS
     table=default.list #partitions=0 size=0B
     tuple ids: 0
REFRESH table_name;

REFRESH: In a multimode environment, the data files reside on multiple DataNodes while the Impala shell is interacting with the Impala daemon (which acts as the data coordinator for all other nodes). Data files can be updated on other nodes without any update event or information to the coordinator. In this situation, using the REFRESH clause with the table name loads the latest metadata and block location of the data files for a particular table.
Please refer to Chapter 2, The Impala Shell Commands and Interface, to understand more on how REFRESH works and why it is so important to use.
JOIN: The JOIN clause is used in SQL statements to select data from two or more tables and then return the result set containing items from some of all of those tables, depending on the conditions applied. The JOIN query result set is filtered by including the corresponding join column names in the ON clause or by comparison operators referencing columns from both tables in the WHERE clause. To improve JOIN performance, here are some suggestions:
- It is advisable to perform the JOIN operation on the biggest table first and then smaller tables
- Join subsequent tables depending on which table has the most selective filter
Note
The JOIN clause itself is very detailed, so I have introduced it here only for reference; however, I would suggest you study some SQL documentation on JOIN to learn more about it.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.

Table of Contents for Query-specific SQL statements in Impala

Create new playlist

Sign In

Sign Up

Query-specific SQL statements in Impala

Note

Table of Contents for
Query-specific SQL statements in Impala