Preface

The changing landscape of Big Data, and the tools created to make sense of it, have become crucial in today's tech industry. Understanding and becoming familiar with such tools allows individuals to make decisions creatively, intelligently, and with precision. If you've always wanted to crunch billions of rows of raw data on Hadoop in a couple of seconds, Cloudera Impala is, hands down, the top choice for you. Cloudera Impala provides a way to ingest data stored on Hadoop in various formats, along with a query engine to process that data and extract important insights.

In this book, Learning Cloudera Impala, you are going to learn everything you need to know about Cloudera Impala so that you can start your project. The book covers Cloudera Impala from installation, administration, and query processing all the way up to connectivity with third-party applications. With this book in hand, you will find yourself empowered to play with your data in Hadoop, and getting insight from that data will feel like an interesting game.

What this book covers

Chapter 1, Getting Started with Impala, covers information on Impala, its core components, and its inner workings in detail. We will cover the Impala execution architecture, including the Impala daemon and the statestore, and how they interact with the other components. Impala metadata and the metastore are also discussed here to explain how Impala maintains its information. Finally, we will study various ways to interface with Impala.

Chapter 2, The Impala Shell Commands and Interface, explains the various command options for interacting with Impala, mainly through command-line references. This chapter covers the Impala command-line interface and explains the various ways the Impala shell can connect to the Impala daemon. Once the connection between the Impala shell and impalad is established, we can use the various commands discussed in the chapter to interact with Impala.

Chapter 3, The Impala Query Language and Built-in Functions, teaches us how to make good use of the Impala shell to interact with data by using the Impala Query Language, which is based on SQL and provides a great degree of compatibility with HiveQL. Because both Hive statements and Impala statements are based on SQL, we will learn about several similarities and differences between them. Along with the Impala Query Language, we will also learn about various Impala built-in functions through examples.

Chapter 4, Impala Walkthrough with an Example, applies most of what was covered in the previous chapter in a detailed example. This way, you can see Impala used in a real-world scenario and understand how and where to use Impala statements in real-world applications. I have created this detailed example by first creating automobile-specific datasets, and then using most of the SQL statements with the built-in functions we discussed in the previous chapter.

Chapter 5, Impala Administration and Performance Improvements, covers two important topics: Impala administration and performance improvements. Within the Impala administration section, I will first show you how to administer Impala using Cloudera Manager. After that, I will teach you how to verify the correctness of Impala-specific information using a debugging web server. We will look at Impala logs and Impala daemons through the statestore UI. The next part of Impala administration is about Impala High Availability, where we will learn the key traits for keeping Impala running in the event of a problem.

Chapter 6, Troubleshooting Impala, teaches you how to troubleshoot various Impala issues in different categories. Besides troubleshooting, in the latter part, I will show you how to utilize Impala logging to learn more about Impala execution, query processing, and possible issues. My objective is to provide you with some critical information on troubleshooting and log analysis, so you can manage the Impala cluster effectively and make it useful for yourself and your team.

Chapter 7, Advanced Impala Concepts, teaches you more about Impala; however, this information is more advanced in nature and is meant to help you excel at data processing in your project through Impala. I have described how Impala works side by side with MapReduce in the same cluster without using it. I have also explained why Impala has an edge over Hive, even though Impala depends on Hive as a key component. Finally, we cover details on using HBase with Impala and processing various Big Data input files on Hadoop with Impala.

Appendix, Technology Behind Impala and Integration with Third-party Applications, covers the detailed technology behind Impala and real-time query concepts with Impala. I have also described a few third-party data visualization applications, such as Tableau, Zoomdata, Microsoft Excel, and MicroStrategy, which connect with Impala to provide effective data visualization.
