Optimizing ZooKeeper

As mentioned in Chapter 1, Understanding the HBase Ecosystem, ZooKeeper provides distributed synchronization and group services to HBase. HBase cannot run without it, so it is worth tuning. Start with the session timeout:

<property>
  <name>zookeeper.session.timeout</name>
  <value>3000</value>
</property>

The value is in milliseconds, and the default for this setting is 3 minutes. It determines how long a region server can stay silent before its ZooKeeper session expires and the master treats the server as crashed. We can decrease it so that server crashes are noticed quickly, but with a lower value we need to take care of GC: during a full GC, a server that is otherwise running fine might stop responding long enough to be reported as crashed. This configuration can be overridden in the hbase-site.xml file.

Increase this value if timeouts occur while writing to the HBase cluster. If it is too small, a GC pause during a heavy write load can stall server responses long enough to trigger a timeout; this is a symptom of improper JVM tuning. If the JVM is tuned correctly, we can keep this value lower for more responsiveness.

The number of ZooKeeper instances must always be odd (we already discussed the reason behind this). Prefer more ZooKeeper instances rather than fewer: for a cluster of around 20 nodes, we should have five to seven. You can increase the number later, according to the sizing of the cluster.
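
As a minimal sketch of an odd-sized ensemble, the following hbase-site.xml fragment lists five ZooKeeper hosts in hbase.zookeeper.quorum. The hostnames are hypothetical placeholders, not values from this book; substitute the machines in your own cluster.

```xml
<!-- Five ZooKeeper hosts (an odd number), comma-separated. -->
<!-- zk1..zk5.example.com are hypothetical hostnames used only for illustration. -->
<property>
  <name>hbase.zookeeper.quorum</name>
  <value>zk1.example.com,zk2.example.com,zk3.example.com,zk4.example.com,zk5.example.com</value>
</property>
```

With five instances, the ensemble keeps a majority as long as three of them are still running.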

Point the ZooKeeper data directory at a safe location rather than the default under the HBase temp directory, so that the logs and data are still available for inspection after a failure.
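
When HBase manages the ZooKeeper instances itself, the data directory can be set in hbase-site.xml through hbase.zookeeper.property.dataDir. The fragment below is a sketch under that assumption; the path /var/lib/zookeeper is a hypothetical example, and any durable location outside the temp directory would do.

```xml
<!-- Store ZooKeeper snapshots and transaction logs outside the HBase temp directory. -->
<!-- /var/lib/zookeeper is a hypothetical path chosen for illustration. -->
<property>
  <name>hbase.zookeeper.property.dataDir</name>
  <value>/var/lib/zookeeper</value>
</property>
```

If the ZooKeeper ensemble runs independently of HBase, the equivalent setting is dataDir in ZooKeeper's own zoo.cfg file.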
