Testing query performance

Most of a user's time in Impala is spent writing and executing queries. To judge whether your Impala cluster is performing optimally, you typically measure query execution time before and after tuning the cluster or the query itself; the difference between the two measurements tells you whether the change actually improved performance. Let's learn how to measure query execution time precisely enough to make that judgment.

Benchmarking queries

A query that processes terabytes of data across multiple nodes runs for a long time. If you print the query output to the console, the time spent rendering that output is counted as part of the query's execution time. To get a measurement closest to the true execution time, it is suggested that you disable pretty-printed console output by passing the -B option to impala-shell. The other option is to save the query results to a file using the -o option.
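The two options above can be combined with the shell's `time` command to benchmark a query. This is a minimal sketch: the host name `impala-host` and the `sales` table are placeholders for your own cluster and data, and the snippet only runs the queries when impala-shell is actually installed.

```shell
# Benchmark a query with console rendering disabled. The host and table
# names below are placeholders; adjust them to your own cluster.
if command -v impala-shell >/dev/null; then
    # -B disables pretty-printed console output, so rendering time does
    # not inflate the measurement; `time` reports the wall-clock duration.
    time impala-shell -i impala-host:21000 -B -q "SELECT COUNT(*) FROM sales"

    # Alternatively, -o writes the results to a file instead of the console.
    impala-shell -i impala-host:21000 -B \
        -q "SELECT COUNT(*) FROM sales" -o /tmp/sales_count.txt
else
    echo "impala-shell not found on this machine"
fi
```

Run the measurement a few times and compare the durations before and after tuning, since a single run can be skewed by caching or cluster load.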

Verifying data locality

We have repeatedly seen that to achieve maximum performance with Impala, the query must be distributed across every node in the cluster. You can design a query to be executed on all the nodes in the cluster; however, how can you check whether the query actually ran on all of them? We are going to find the answer to this question in this section.

To find out whether a query was executed on all nodes, you will have to dig into the Impala logs. Make sure you have Impala logging enabled and, after executing the query, open the logs either in an editor or through Cloudera Manager or Navigator. If you find the following line in the logs, it means the query was not distributed and did not run on the other nodes:

Total remote scan volume = 0

You can search the log files for remote scan entries and, based on their occurrence, troubleshoot this problem on your Impala cluster. Troubleshooting this problem is covered in more detail in Chapter 6, Troubleshooting Impala.
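The log search described above can be scripted with grep. This is a sketch under an assumption: the log directory defaults to /var/log/impalad, which is a common location but may differ on your deployment, so it can be overridden with an IMPALA_LOG_DIR environment variable.

```shell
# Search the Impala daemon logs for remote scan statistics.
# /var/log/impalad is an assumed default; override IMPALA_LOG_DIR if
# your deployment writes logs elsewhere.
LOG_DIR="${IMPALA_LOG_DIR:-/var/log/impalad}"
grep -rh "Total remote scan volume" "$LOG_DIR" 2>/dev/null \
    || echo "no remote scan entries found under $LOG_DIR"
```

Running this after each query execution lets you check the remote scan figures without opening the full logs in an editor.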
