Summary

In this chapter, we covered Impala administration and performance improvement using various methods, including Cloudera Manager. We discussed Impala High Availability, which mainly depends on Hadoop NameNode High Availability. We studied methods such as enabling block location tracking, native checksumming, and short-circuit reads, which help read data quickly in the Hadoop cluster and so improve Impala performance. We also discussed how the choice of file and compression formats affects performance: chosen wisely, they improve it; chosen poorly, they can drag down data processing. We also looked at achieving faster query execution by rewriting a query in such a way that its processing is expedited. As most of these topics require a great deal of background information, having them here as a reference should help you understand them and use them to improve the performance of your Impala cluster.

The next chapter is all about troubleshooting Impala when you run into problems. We will extend our knowledge by learning how to find the root cause of various problems in the Impala cluster and resolve them quickly.
