Troubleshooting the most frequent HBase errors and their explanations

The following are the places that index information about Hadoop/HBase and other project exceptions, and where we can search for information about Hadoop/HBase errors:

Now, let's see the frequent errors and solutions for these:

When troubleshooting, the logs are the first place to look. The default log locations of the various daemon processes are as follows:

  • NameNode: <hadoop home path>/logs/hadoop-<user>-namenode-<hostname>.log
  • DataNode: <hadoop home path>/logs/hadoop-<user>-datanode-<hostname>.log
  • JobTracker: <hadoop home path>/logs/hadoop-<user>-jobtracker-<hostname>.log
  • TaskTracker: <hadoop home path>/logs/hadoop-<user>-tasktracker-<hostname>.log
  • HMaster: <hbase home path>/logs/hbase-<user>-master-<hostname>.log
  • RegionServer: <hbase home path>/logs/hbase-<user>-regionserver-<hostname>.log
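
The naming pattern in the preceding list can be sketched as a small helper. This is only an illustration: the function name default_log_path and the sample values in the usage note are hypothetical, and the home directory is whatever the installation actually uses.

```python
from pathlib import PurePosixPath

# Log file prefix for each daemon, following the naming pattern listed above.
LOG_PREFIXES = {
    "namenode": "hadoop",
    "datanode": "hadoop",
    "jobtracker": "hadoop",
    "tasktracker": "hadoop",
    "master": "hbase",
    "regionserver": "hbase",
}


def default_log_path(home, daemon, user, hostname):
    """Build <home>/logs/<prefix>-<user>-<daemon>-<hostname>.log."""
    prefix = LOG_PREFIXES[daemon]  # raises KeyError for an unknown daemon name
    file_name = f"{prefix}-{user}-{daemon}-{hostname}.log"
    return PurePosixPath(home) / "logs" / file_name
```

For example, default_log_path("/opt/hbase", "regionserver", "hbase", "node1") builds the path of the RegionServer log for a hypothetical user hbase on a hypothetical host node1.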

Note

Also look under /var/log/ (the rest of the path is the same) for the logs of the different HBase components.

The following are the logging levels we can set, depending on how much detail we require in the logs. The chosen level also determines how large the logs grow:

  • ALL
  • TRACE
  • DEBUG
  • INFO
  • WARN
  • ERROR
  • OFF
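
To illustrate how the levels are ordered, the following sketch filters log lines by a threshold level. It assumes that the level appears as a whitespace-separated token near the start of each line, which depends on the configured log layout. The function names are hypothetical, and FATAL is included because it appears in HMaster log output even though it is not in the preceding list.

```python
# Levels in increasing order of severity. ALL and OFF are thresholds only;
# no log message is ever written at those two levels.
ORDER = ["ALL", "TRACE", "DEBUG", "INFO", "WARN", "ERROR", "FATAL", "OFF"]
SEVERITY = {name: rank for rank, name in enumerate(ORDER)}
MESSAGE_LEVELS = set(ORDER) - {"ALL", "OFF"}


def line_level(line):
    """Return the level found in the first few tokens of a log line, or None."""
    for token in line.split()[:4]:
        if token in MESSAGE_LEVELS:
            return token
    return None


def filter_lines(lines, threshold="WARN"):
    """Keep the lines whose level is at or above the threshold.

    Lines without a recognizable level, such as stack trace
    continuation lines, are dropped.
    """
    minimum = SEVERITY[threshold]
    kept = []
    for line in lines:
        level = line_level(line)
        if level is not None and SEVERITY[level] >= minimum:
            kept.append(line)
    return kept
```

With the threshold set to ALL, every line that carries a level is kept; with OFF, nothing is kept.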

What might fail in cluster

Different Java versions on cluster machines can cause problems. Different versions of Hadoop and HBase cause problems too.

The following are the components that might fail in operation, which we can look into while debugging:

  • Disk: Corrupt disk
  • Operating System: Bugs, wrong optimization parameters, and overutilization of hardware
  • Network: Connectivity problems and bandwidth choking
  • Memory: Bad memory and overloaded memory

Monitoring HBase health

In this section, we will see the various methods for administrators to monitor and manage HBase.

HBase web UI

There are two tools under this category:

  • Master web interface
  • RegionServer web interface

Master

The master web interface is available at http://<hbase-master>:<port>, where <hbase-master> is the hostname where HMaster is running and <port> is 60010 for older versions and 16010 for newer versions (1.0 and above).

RegionServer

The RegionServer web interface is available at http://<hbase-regionserver>:<port>, where <hbase-regionserver> is the hostname where a RegionServer is running and <port> is 60030 for older versions and 16030 for newer versions (1.0 and above).
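
As a rough sketch, the two web interface addresses can be built and probed from a script. The function names are hypothetical, the ports are the defaults mentioned above (a cluster may override them), and the caller states whether the cluster runs an older release.

```python
from urllib.error import URLError
from urllib.request import urlopen

# Default web UI ports as given in the text above.
DEFAULT_UI_PORTS = {
    "master": {"older": 60010, "newer": 16010},
    "regionserver": {"older": 60030, "newer": 16030},
}


def web_ui_url(host, role, older_release=False):
    """Return the web UI URL for role 'master' or 'regionserver'."""
    ports = DEFAULT_UI_PORTS[role]
    port = ports["older"] if older_release else ports["newer"]
    return f"http://{host}:{port}"


def ui_responds(url, timeout=5.0):
    """Return True if the URL answers with HTTP 200 within the timeout."""
    try:
        with urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except (URLError, OSError):
        # Connection refused, DNS failure, timeout, or an HTTP error status.
        return False
```

For example, ui_responds(web_ui_url("master1", "master")) checks the master interface on a hypothetical host named master1.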

ZooKeeper command line

The ZooKeeper shell can be started as follows:

hbase zkcli -server host:port <cmd> <args>

The commands, with their arguments, that we can use in the preceding command line are:

  • connect host:port
  • get path [watch]
  • ls path [watch]
  • set path data [version]
  • delquota [-n|-b] path
  • quit
  • printwatches on|off
  • create [-s] [-e] path data acl
  • stat path [watch]
  • close
  • ls2 path [watch]
  • history
  • listquota path
  • setAcl path acl
  • getAcl path
  • sync path
  • redo cmdno
  • addauth scheme auth
  • delete path [version]
  • setquota -n|-b val path

For help, type in just a command name without any parameter.
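
When the ZooKeeper shell has to be called from a script, the command line shown earlier can be assembled as follows. This is a minimal sketch: the function names are hypothetical, and it assumes that the hbase launcher is on the PATH.

```python
import subprocess


def zkcli_argv(server, command, *args):
    """Build the argument list for: hbase zkcli -server host:port <cmd> <args>."""
    return ["hbase", "zkcli", "-server", server, command, *args]


def run_zkcli(server, command, *args, timeout=30):
    """Run a single ZooKeeper shell command and return its standard output.

    Assumes the 'hbase' launcher is on the PATH of the calling user.
    """
    completed = subprocess.run(
        zkcli_argv(server, command, *args),
        capture_output=True,
        text=True,
        timeout=timeout,
        check=False,
    )
    return completed.stdout
```

For example, run_zkcli("zk1:2181", "ls", "/hbase") lists the children of the /hbase znode, assuming a quorum member named zk1 and the default parent znode.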

Linux tools

The following are the Linux tools that we can use:

  • top: This shows the live processes and their resource usage
  • free -m: This shows memory usage in megabytes
  • jps: This lists the running Java processes. The binary is in the bin directory of the JDK
  • tail/head: These show the end or the beginning of log files
  • ps -ef | grep java: This shows the running HBase daemons
  • jstack: This prints Java stack traces of Java threads for a given Java process, core file, or remote debug server
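
The output of jps can also be checked from a script. The following sketch picks the HBase daemons out of jps output that has already been captured as text. It assumes that each line has the form "<pid> <main class>", and the function name is hypothetical.

```python
# Main class names under which the HBase daemons appear in jps output.
HBASE_DAEMONS = {"HMaster", "HRegionServer", "HQuorumPeer"}


def hbase_daemons(jps_output):
    """Return (pid, name) pairs for the HBase daemons found in jps output."""
    found = []
    for line in jps_output.splitlines():
        parts = line.split(maxsplit=1)
        if len(parts) != 2 or not parts[0].isdigit():
            continue  # skip blank or unexpected lines
        pid, name = int(parts[0]), parts[1].strip()
        if name in HBASE_DAEMONS:
            found.append((pid, name))
    return found
```

An empty result on a node that should host a RegionServer is a hint to look at the RegionServer log next.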

Set up OpenTSDB to monitor HBase more closely using the information given at http://opentsdb.net/setup-hbase.html.

For the Cloudera distribution, Cloudera Manager can be used for monitoring and administration.

Now, let's see some exceptions and solutions:

Exceptions

Solution

java.io.IOException: Call to /<host name> failed on local exception: java.io.EOFException
      at org.apache.hadoop.ipc.Client.wrapException(Client.java:1139)
      at org.apache.hadoop.ipc.Client.call(Client.java:1107)

Add the hadoop-core.jar file from the Hadoop installation in use to the HBase lib directory, replacing the copy that is already there.

FATAL org.apache.hadoop.hbase.master.HMaster: Unhandled exception. Starting shutdown.
java.lang.IllegalArgumentException: 13955@<hostname>
      at ***
INFO org.apache.hadoop.hbase.master.HMaster: Aborting
INFO org.apache.zookeeper.ClientCnxn: EventThread shut down

Add the commons-lang-*.jar file from the Hadoop installation in use to the HBase lib directory, replacing the copy that is already there.

ERROR org.apache.hadoop.hbase.master.HMasterCommandLine: Failed to start master
java.lang.RuntimeException: Failed construction of Master: class org.apache.hadoop.hbase.master.HMaster
Caused by: java.lang.ClassNotFoundException: org.apache.commons.configuration.Configuration

Add the commons-configuration-*.jar file from the Hadoop installation in use to the HBase lib directory, replacing the copy that is already there.

ERROR org.apache.hadoop.hbase.master.HMaster: Cannot start master
Caused by: java.net.ConnectException: Call to <hostname>/<ipaddress> failed on connection exception: java.net.ConnectException: Connection refused

Remove the localhost and 127.0.1.1 entries from the /etc/hosts file.

ScannerTimeoutException or UnknownScannerException

Reducing the setCaching value of the scan might be an option.

Master starts, but RegionServers do not

The master believes that the RegionServers have the IP address 127.0.0.1, which is localhost and resolves to the master's own localhost. This happens because the RegionServers erroneously inform the master that their IP addresses are 127.0.0.1.

To fix this, change the 127.0.0.1 entry for <hostname> in the /etc/hosts file on the RegionServers.

java.io.IOException.(Too many open files)

Increase the ulimit and nproc limits for the user that runs HBase.

xceiverCount exceeds

Increase the value of the dfs.datanode.max.transfer.threads property in hdfs-site.xml.

java.lang.OutOfMemoryError: unable to create new native thread in exceptions

Increase the ulimit and nproc limits for the user that runs HBase.

RegionServer lease timeouts

Tune garbage collection, and check whether NTP is installed and configured.

No live nodes contain current block and/or YouAreDeadException

These errors can occur either when the OS runs out of file handles or during periods of severe network problems when the nodes are unreachable. Check the nproc and ulimit settings.

ZooKeeper SessionExpired events

Increase the zookeeper.session.timeout and hbase.zookeeper.property.tickTime parameters.
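
Several of the preceding solutions come down to /etc/hosts entries that map a machine's hostname to a loopback address. The following sketch scans the text of a hosts file for such entries. The function name and the set of names treated as harmless are assumptions made for this example.

```python
import ipaddress

# Names that are expected to resolve to a loopback address.
EXPECTED_LOOPBACK_NAMES = {
    "localhost",
    "localhost.localdomain",
    "ip6-localhost",
    "ip6-loopback",
}


def loopback_hostnames(hosts_text):
    """Return the names mapped to a loopback address, except the expected ones."""
    suspicious = []
    for raw_line in hosts_text.splitlines():
        line = raw_line.split("#", 1)[0].strip()  # drop comments
        if not line:
            continue
        address, *names = line.split()
        try:
            ip = ipaddress.ip_address(address)
        except ValueError:
            continue  # not an IP address; ignore the malformed line
        if ip.is_loopback:
            suspicious.extend(
                name for name in names if name not in EXPECTED_LOOPBACK_NAMES
            )
    return suspicious
```

Any name that this returns on a master or RegionServer machine is worth comparing against the hostname that the daemons report.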

Visit http://hbase.apache.org/book/trouble.html for more exceptions and the latest documentation on errors.
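
Because several of the preceding solutions involve ulimit and nproc, it can help to check the limits that actually apply to a process. The following sketch uses Python's resource module, which is available on Unix-like systems only. It reports the limits of the process that runs it, so it should be run as the user that runs HBase. The function names are hypothetical, and the minimum values to compare against are left to the caller.

```python
import resource


def current_limits():
    """Return the (soft, hard) limits for open files and processes."""
    limits = {"open files": resource.getrlimit(resource.RLIMIT_NOFILE)}
    # RLIMIT_NPROC is not defined on every platform.
    nproc = getattr(resource, "RLIMIT_NPROC", None)
    if nproc is not None:
        limits["processes"] = resource.getrlimit(nproc)
    return limits


def below_minimum(limits, minimums):
    """Return the names of limits whose soft value is below the given minimum."""
    low = []
    for name, minimum in minimums.items():
        if name not in limits:
            continue
        soft, _hard = limits[name]
        if soft != resource.RLIM_INFINITY and soft < minimum:
            low.append(name)
    return low
```

For example, below_minimum(current_limits(), {"open files": 10240}) returns ["open files"] when the soft limit is lower than the illustrative value 10240.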
