Chapter 5. Impala Administration and Performance Improvements

After working through the examples in the previous chapter, you should be able to process data with Impala queries. The natural next question is how to improve query performance, and answering it is one of the two key objectives of this chapter. The other objective is to show how to manage an Impala cluster effectively so that it stays up and running.

In this chapter, we will cover two important topics: Impala administration and performance improvements. In the Impala administration section, I will show you how to administer Impala using Cloudera Manager. After that, I will show you how to use the debug web server to verify Impala-specific information. We will also look at Impala logs and daemons using the statestore UI. The last part of the administration section is about Impala High Availability, where we will learn how to keep Impala going in the event of a problem.

In the Improving performance section, we will cover various ways to improve and tune query performance. We will learn how to test Impala queries to find out whether they are performing well and, if they are not, what we can do about it: fine-tune the cluster, or modify the query statement or its execution. Let's start with Impala administration.

Impala administration

As discussed in previous chapters, you can install and run Impala with or without Cloudera Manager; however, letting Cloudera Manager manage your Impala cluster is the simpler option. It lets you spend your valuable time on data transformation rather than on cluster administration. In this chapter, I will assume that you are managing your Impala cluster using Cloudera Manager and provide information based on that assumption.

Administration with Cloudera Manager

While describing Cloudera Manager in detail is beyond the scope of this book, I will provide some guidance on using it to administer Impala. Once the Cloudera Manager web-based user interface is in front of you, select Service impala1 from the Services list. From there, you have multiple ways to start, stop, and restart both the Impala daemon(s) and the statestore service. You can also change the Impala configuration, view log files, manage Impala nodes, and troubleshoot some problems by opening the Impala debugging interface.
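The same start, stop, and restart actions can also be scripted. The following is only a minimal sketch, assuming that your Cloudera Manager version exposes its REST API on port 7180 with service commands under /api/<version>/clusters/<cluster>/services/<service>/commands/<command>. The host name cm-host, the cluster name cluster1, the API version v1, and the credentials are all hypothetical placeholders; check them against your own installation before sending anything.

```python
import base64
from urllib.parse import quote
from urllib.request import Request

# Commands this sketch allows; they mirror the actions named in the text.
ALLOWED_COMMANDS = ("start", "stop", "restart")


def service_command_request(cm_host, cluster, service, command,
                            user, password, port=7180, api_version="v1"):
    """Build (but do not send) a POST request for a service command.

    The URL layout and the default port/API version are assumptions about
    the Cloudera Manager REST API; verify them for your installation.
    """
    if command not in ALLOWED_COMMANDS:
        raise ValueError("unsupported command: {}".format(command))

    url = "http://{}:{}/api/{}/clusters/{}/services/{}/commands/{}".format(
        cm_host, port, api_version,
        quote(cluster, safe=""),   # cluster names may contain spaces
        quote(service, safe=""),
        command,
    )

    # HTTP basic authentication header.
    token = base64.b64encode(
        "{}:{}".format(user, password).encode("utf-8")).decode("ascii")

    request = Request(url, data=b"", method="POST")
    request.add_header("Authorization", "Basic " + token)
    return request


if __name__ == "__main__":
    # Hypothetical host, cluster, and credentials; impala1 is the
    # service name used in the text.
    req = service_command_request(
        "cm-host", "cluster1", "impala1", "restart", "admin", "admin")
    print(req.get_method(), req.full_url)
    # To actually send it: urllib.request.urlopen(req, timeout=10)
```

The request is only constructed here, not sent, so you can inspect the URL first and pass it to urllib.request.urlopen once you are satisfied that it matches your cluster.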

In the next few screenshots, let's see how you can use the Cloudera Manager web-based user interface to manage Impala:

[Screenshot: Administration with Cloudera Manager]

The preceding screenshot lists the running Impala daemons and statestore services that you can manage. In the Queries tab, you can search SQL statements directly from the web interface and look at various graphs and charts to understand query performance.

The following screenshot shows how to configure the Impala auditing features, which are available with Impala 1.1.x and above. This configuration lets you set up an auditing scheme based on Username, Role, and Host Ip Address. Based on that scheme, you can analyze the logs directly on the web or download them for further processing.

[Screenshot: Administration with Cloudera Manager]

I cover Impala logging in detail in Chapter 6, Troubleshooting Impala. However, because this chapter is about Impala administration, it is worth noting here that you can use Cloudera Manager to configure logging for the Impala services on each node and then read those logs at your convenience, as shown in the following screenshot:

[Screenshot: Administration with Cloudera Manager]

The Impala statestore UI

When the Impala server is running with Cloudera Manager, you can open the Impala debugging web interface at port 25000, as shown in the following screenshot. From there, you can verify the Impala configuration on the /varz page, read logs on the /logs page, check query metrics on the /metrics page, and do many other things.

[Screenshot: The Impala statestore UI]
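If you want to check these debug pages from a script rather than a browser, the following minimal sketch builds the page URLs from the port and paths mentioned above and, optionally, downloads a page. The host name impala-node-1 is a hypothetical placeholder, and the sketch assumes the debug web server is reachable over plain HTTP without authentication, which may not hold on a secured cluster.

```python
from urllib.request import urlopen

# Debug web server port given in the text.
IMPALAD_DEBUG_PORT = 25000

# Debug pages named in the text.
DEBUG_PAGES = ("varz", "logs", "metrics")


def debug_page_url(host, page, port=IMPALAD_DEBUG_PORT):
    """Build the URL of one debug page on a given Impala node."""
    return "http://{}:{}/{}".format(host, port, page.lstrip("/"))


def fetch_debug_page(host, page, port=IMPALAD_DEBUG_PORT, timeout=5):
    """Download a debug page and return its body as text.

    Requires a running Impala node; assumes plain HTTP with no login.
    """
    with urlopen(debug_page_url(host, page, port), timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")


if __name__ == "__main__":
    # "impala-node-1" is a hypothetical host name; replace it with a real node.
    for page in DEBUG_PAGES:
        print(debug_page_url("impala-node-1", page))
```

Only the URL-building part runs without a cluster; call fetch_debug_page against a real node to retrieve the page contents, for example to search the /varz output for a particular configuration flag.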