Single point of failure in Impala

Impala has no single point of failure for query execution: every Impala daemon is capable of executing incoming queries. Because a single query is distributed across multiple nodes, the failure of one node affects only those query segments that were distributed to that machine. In this situation, re-executing the same query allows the system to recover from the problem.

For Hadoop cluster stability, run the Impala components on DataNodes. Running Impala on the NameNode is not recommended, because a problem in Impala could bring down the NameNode itself, which in turn could affect the stability of the whole Hadoop cluster. Running Impala on DataNodes means that as long as the Hadoop cluster is up and running smoothly, the Impala cluster will function well, even if one or a few DataNodes fail. Also, if the NameNode is highly available, the Impala cluster will be highly available as well.
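The re-execution described above can be sketched as a small client-side retry loop. This is only an illustration under stated assumptions: `run_with_retry`, `QueryFailed`, and the `execute` callable are hypothetical names, not part of Impala or any client library, and `ConnectionError` stands in for whatever error your client raises when a daemon is unreachable.

```python
from typing import Callable, Optional, Sequence


class QueryFailed(Exception):
    """Raised when every attempt to run the query has failed (hypothetical)."""


def run_with_retry(
    sql: str,
    hosts: Sequence[str],
    execute: Callable[[str, str], list],
    max_attempts: int = 3,
) -> list:
    """Re-submit the whole query, moving to the next daemon after each failure.

    `execute(host, sql)` is an assumed wrapper around a real client call;
    it should return the result rows or raise ConnectionError.
    """
    if not hosts:
        raise ValueError("at least one host is required")

    last_error: Optional[Exception] = None
    for attempt in range(max_attempts):
        # Any Impala daemon can accept the query, so rotate through them.
        host = hosts[attempt % len(hosts)]
        try:
            return execute(host, sql)
        except ConnectionError as exc:
            # The query is re-executed from scratch on the next attempt.
            last_error = exc

    raise QueryFailed(f"query failed after {max_attempts} attempts") from last_error
```

In practice, `execute` would wrap whichever client you use to connect to an Impala daemon. The sketch retries the entire query rather than individual segments, which matches the recovery approach described in the text.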

One caveat is that Impala depends on the statestore, which runs on only a single machine. If the statestore becomes unavailable, Impala does not shut down completely; however, its operation and query distribution are affected.
