The following are places that index Hadoop, HBase, and other project exceptions, where we can search for information about Hadoop/HBase errors:
Now, let's look at frequent errors and their solutions:
For troubleshooting, the logs are an excellent place to start. The following are the default log locations of the various daemon processes:
<hadoop home path>/logs/hadoop-<user>-namenode-<hostname>.log
<hadoop home path>/logs/hadoop-<user>-datanode-<hostname>.log
<hadoop home path>/logs/hadoop-<user>-jobtracker-<hostname>.log
<hadoop home path>/logs/hadoop-<user>-tasktracker-<hostname>.log
<hbase home path>/logs/hbase-<user>-master-<hostname>.log
<hbase home path>/logs/hbase-<user>-regionserver-<hostname>.log
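All of these paths follow the same naming pattern, `<home>/logs/<project>-<user>-<daemon>-<hostname>.log`. The Python sketch below captures that pattern together with a `tail -n` equivalent. It makes several assumptions. `daemon_log_path` and `tail` are hypothetical helper names. The daemon is assumed to use the default log directory, with no `HADOOP_LOG_DIR` or `HBASE_LOG_DIR` override. `home` is the installation directory of the project that owns the daemon. Python's `socket.gethostname()` is assumed to return the same name the start scripts used.

```python
import os
import socket
from collections import deque


def daemon_log_path(home, project, user, daemon, hostname=None):
    """Build <home>/logs/<project>-<user>-<daemon>-<hostname>.log.

    project is "hadoop" for namenode/datanode/jobtracker/tasktracker and
    "hbase" for master/regionserver. Assumes the default log directory.
    """
    if hostname is None:
        # Assumption: matches the `hostname` output used by the start scripts.
        hostname = socket.gethostname()
    return os.path.join(home, "logs", f"{project}-{user}-{daemon}-{hostname}.log")


def tail(lines, n=20):
    """Return the last n lines of an iterable (e.g. an open log file), like `tail -n`."""
    return [line.rstrip("\n") for line in deque(lines, maxlen=n)]
```

For example, opening `daemon_log_path("/opt/hbase", "hbase", "hduser", "master")` and passing the file object to `tail(f, 50)` would print the last 50 lines of the master log. Here `/opt/hbase` and `hduser` are placeholders. Using a bounded `deque` keeps memory constant even for very large log files.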
The following are the logging levels we can set, from the most to the least verbose, to control how much detail (and therefore how much volume) the logs contain:
ALL
TRACE
DEBUG
INFO
WARN
ERROR
FATAL
OFF
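Hadoop and HBase log through log4j. In log4j, a logger set to a given level writes messages at that level and at every more severe level. The full order is ALL < TRACE < DEBUG < INFO < WARN < ERROR < FATAL < OFF. In the log4j 1.x configuration shipped with older HBase releases, a level is typically set with a line such as `log4j.logger.org.apache.hadoop.hbase=DEBUG` in `conf/log4j.properties`. The Python sketch below models the threshold rule and applies it to log lines. The helper names are hypothetical. The sketch assumes the default layout, where the level token follows the timestamp.

```python
import re

# log4j levels from most verbose to least verbose.
LEVELS = ["ALL", "TRACE", "DEBUG", "INFO", "WARN", "ERROR", "FATAL", "OFF"]
# Levels an individual message can carry (ALL and OFF are thresholds only).
MESSAGE_LEVELS = LEVELS[1:-1]
LEVEL_RE = re.compile(r"\b(" + "|".join(MESSAGE_LEVELS) + r")\b")


def is_logged(message_level, threshold):
    """True if a message at message_level is written by a logger set to threshold."""
    if message_level not in MESSAGE_LEVELS:
        raise ValueError(f"not a message level: {message_level}")
    if threshold not in LEVELS:
        raise ValueError(f"unknown threshold: {threshold}")
    return LEVELS.index(message_level) >= LEVELS.index(threshold)


def filter_log(lines, threshold):
    """Keep lines whose first level token passes the threshold.

    Assumes the level appears right after the timestamp, so the first
    match in each line is the level rather than part of the message.
    """
    kept = []
    for line in lines:
        match = LEVEL_RE.search(line)
        if match and is_logged(match.group(1), threshold):
            kept.append(line)
    return kept
```

Raising the threshold from DEBUG to WARN drops the DEBUG and INFO lines. This is why a higher level produces smaller logs.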
Running different Java versions on different cluster machines can cause problems, and so can mismatched versions of Hadoop and HBase.
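One way to catch Java version drift is to collect the `java -version` output from every node and compare the results. That command writes to standard error, so a collection step would need something like `ssh <host> java -version 2>&1`. The Python sketch below only does the comparison. It assumes the output has already been gathered into a dict keyed by hostname, and the helper names are hypothetical.

```python
import re

# Matches the quoted version in lines such as: java version "1.7.0_80"
VERSION_RE = re.compile(r'version "([^"]+)"')


def java_version(version_output):
    """Extract the quoted version string from `java -version` output, or None."""
    match = VERSION_RE.search(version_output)
    return match.group(1) if match else None


def version_mismatches(outputs):
    """outputs maps hostname -> captured `java -version` text.

    Returns {} when every host reports the same version, otherwise a dict
    of version -> sorted list of hosts reporting that version.
    """
    by_version = {}
    for host, text in outputs.items():
        by_version.setdefault(java_version(text), []).append(host)
    if len(by_version) <= 1:
        return {}
    return {version: sorted(hosts) for version, hosts in by_version.items()}
```

A non-empty result shows exactly which hosts need to be upgraded or downgraded. Hosts whose output could not be parsed are grouped under `None`.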
The following are the components that might fail in operation, which we can look into while debugging:
In this section, we will see the various methods for administrators to monitor and manage HBase.
There are two tools under this category:
The ZooKeeper shell can be started as follows:
hbase zkcli -server host:port <cmd> <args>
The commands (with their arguments) that we can pass in the preceding command are:
connect host:port
get path [watch]
ls path [watch]
set path data [version]
delquota [-n|-b] path
quit
printwatches on|off
create [-s] [-e] path data acl
stat path [watch]
close
ls2 path [watch]
history
listquota path
setAcl path acl
getAcl path
sync path
redo cmdno
addauth scheme auth
delete path [version]
setquota -n|-b val path
For help, type in just a command name without any parameter.
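The ZooKeeper shell can also be driven from a script, one command per invocation. The Python sketch below assumes, as with ZooKeeper's own `zkCli.sh`, that a command given on the command line runs once before the shell exits. It also assumes the `hbase` launcher script is on `PATH`. The helper names are hypothetical.

```python
import subprocess


def zkcli_argv(server, command, *args):
    """Build the argument list for: hbase zkcli -server host:port <cmd> <args>."""
    return ["hbase", "zkcli", "-server", server, command, *args]


def run_zkcli(server, command, *args, timeout=60):
    """Run one ZooKeeper shell command through HBase and return its stdout.

    The output may also contain ZooKeeper client log lines, depending on
    the logging configuration.
    """
    result = subprocess.run(
        zkcli_argv(server, command, *args),
        capture_output=True,
        text=True,
        timeout=timeout,
        check=True,  # raise CalledProcessError on a non-zero exit status
    )
    return result.stdout
```

For example, `run_zkcli("zk1.example.com:2181", "ls", "/hbase")` would list the children of `/hbase`, which is HBase's default parent znode. The address is a placeholder. Passing an argument list instead of a shell string avoids quoting problems with znode paths.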
The following are the Linux tools that we can use:
top: This is the Linux command to see live processes and their resource usage
free -m: This is used to see memory usage (in megabytes)
jps: This command is used to see the running Java processes. This binary is in the bin directory of the JDK
tail/head: These are used to see the end/beginning of log files
ps -ef | grep java: This is used to see the running HBase daemons (and other Java processes)
jstack: This prints Java stack traces of Java threads for a given Java process, core file, or remote debug server
Set up OpenTSDB to monitor HBase more closely using the information given at http://opentsdb.net/setup-hbase.html.
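The `jps` check from the tool list above is easy to script. By default, `jps` prints one `<pid> <main-class>` line per Java process, for example `12345 HMaster`. The Python sketch below parses that output and reports which expected daemons are missing. The expected set is an assumption to adjust per node: a node running both roles expects `HMaster` and `HRegionServer`, while a dedicated RegionServer node expects only `HRegionServer`. The helper names are hypothetical.

```python
import subprocess

# Assumed default: a node running both roles (e.g. pseudo-distributed mode).
DEFAULT_EXPECTED = frozenset({"HMaster", "HRegionServer"})


def parse_jps(output):
    """Map main-class name -> list of PIDs from plain `jps` output."""
    processes = {}
    for line in output.splitlines():
        parts = line.split()
        # Lines with only a PID (no class name available) are skipped.
        if len(parts) >= 2 and parts[0].isdigit():
            processes.setdefault(parts[1], []).append(int(parts[0]))
    return processes


def missing_daemons(output, expected=DEFAULT_EXPECTED):
    """Return the sorted names of expected daemons absent from the jps output."""
    return sorted(set(expected) - set(parse_jps(output)))


def local_jps_output():
    """Run `jps` (assumed on PATH via the JDK bin directory) and return its stdout."""
    return subprocess.run(["jps"], capture_output=True, text=True, check=True).stdout
```

Calling `missing_daemons(local_jps_output())` on a node gives a quick list of daemons to investigate further with the log files or `jstack`.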
For the Cloudera distribution, Cloudera Manager can be used for monitoring and administration.
Now, let's see some exceptions and solutions:
| Exceptions | Solution |
|---|---|
| `java.io.IOException: Call to /<host name> failed on local exception: java.io.EOFException at org.apache.hadoop.ipc.Client.wrapException(Client.java:1139) at org.apache.hadoop.ipc.Client.call(Client.java:1107)` | This is used to add/replace the |
| `FATAL org.apache.hadoop.hbase.master.HMaster: Unhandled exception. Starting shutdown. java.lang.IllegalArgumentException: 13955@<hostname> at *** INFO org.apache.hadoop.hbase.master.HMaster: Aborting INFO org.apache.zookeeper.ClientCnxn: EventThread shut down` | This is used to add/replace the |
| `ERROR org.apache.hadoop.hbase.master.HMasterCommandLine: Failed to start master java.lang.RuntimeException: Failed construction of Master: class org.apache.hadoop.hbase.master.HMaster Caused by: java.lang.ClassNotFoundException: org.apache.commons.configuration.Configuration` | This is used to add/replace the |
| `ERROR org.apache.hadoop.hbase.master.HMaster: Cannot start master Caused by: java.net.ConnectException: Call to <hostname>/<ipaddress> failed on connection exception: java.net.ConnectException: Connection refused` | This is used to remove the localhost and |
| `ScannerTimeoutException` or `UnknownScannerException` | This is used to reduce the |
| Master starts, but RegionServers do not | Master believes RegionServers have an IP of It changes the |
| `java.io.IOException` (Too many open files) | This increases |
| `xceiverCount` exceeds | This increases the value in the |
| `java.lang.OutOfMemoryError: unable to create new native thread` in exceptions | This increases |
| RegionServer lease timeouts | Tune GC and check whether NTP is installed and configured. |
| | These errors can occur either when running out of OS file handles or in periods of severe network problems where the nodes are unreachable. Check for |
| ZooKeeper `SessionExpired` events | Increase the |

Visit http://hbase.apache.org/book/trouble.html for more exceptions and the latest troubleshooting documentation.
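When a daemon misbehaves, it helps to know quickly which of the problems in the table above appear in its log. The Python sketch below counts lines containing a set of substrings taken from the table. The exact wording of real messages can differ between Hadoop and HBase versions, so the `SIGNATURES` list is an assumption and only a starting point. For instance, `SessionExpired` is meant to catch ZooKeeper's `SessionExpiredException`. The helper name is hypothetical.

```python
from collections import Counter

# Substrings taken from the exceptions table; adjust for your versions.
SIGNATURES = [
    "java.io.EOFException",
    "ClassNotFoundException: org.apache.commons.configuration.Configuration",
    "Connection refused",
    "ScannerTimeoutException",
    "UnknownScannerException",
    "Too many open files",
    "xceiverCount",
    "unable to create new native thread",
    "SessionExpired",
]


def scan_for_known_errors(lines, signatures=SIGNATURES):
    """Count how many lines contain each signature (a line can match several)."""
    counts = Counter()
    for line in lines:
        for signature in signatures:
            if signature in line:
                counts[signature] += 1
    return counts
```

Passing an open log file, such as a RegionServer log, to `scan_for_known_errors` and calling `most_common()` on the result lists the most frequent known problems first. You can then look up the matching row in the table.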