Chapter 2. Installation

In this chapter, we will look at how HBase is installed and initially configured. We will see how HBase can be used from the command line for basic operations, such as adding, retrieving, and deleting data.

Note

All of the following assumes you have the Java Runtime Environment (JRE) installed. Hadoop and also HBase require at least version 1.6 (also called Java 6), and the recommended choice is the one provided by Oracle (formerly by Sun), which can be found at http://www.java.com/download/. If you do not have Java already or are running into issues using it, please see Java.

Quick-Start Guide

Let us get started with the “tl;dr” section of this book: you want to know how to run HBase and you want to know it now! Nothing is easier than that because all you have to do is download the most recent release of HBase from the Apache HBase release page and unpack the contents into a suitable directory, such as /usr/local or /opt, like so:

$ cd /usr/local
$ tar -zxvf hbase-x.y.z.tar.gz

With that in place, we can start HBase and try our first interaction with it. We will use the interactive shell to enter the status command at the prompt (complete the command by pressing the Return key):

$ cd /usr/local/hbase-0.91.0-SNAPSHOT
$ bin/start-hbase.sh
starting master, logging to 
/usr/local/hbase-0.91.0-SNAPSHOT/bin/../logs/hbase-<username>-master-localhost.out
$ bin/hbase shell
HBase Shell; enter 'help<RETURN>' for list of supported commands.
Type "exit<RETURN>" to leave the HBase Shell
Version 0.91.0-SNAPSHOT, r1130916, Sat Jul 23 12:44:34 CEST 2011 
    
hbase(main):001:0> status
1 servers, 0 dead, 2.0000 average load

This confirms that HBase is up and running, so we will now issue a few commands to show that we can put data into it and retrieve the same data subsequently.

Note

It may not be clear, but what we are doing right now is similar to sitting in a car with its brakes engaged and in neutral while turning the ignition key. There is much more that you need to configure and understand before you can use HBase in a production-like environment. But it lets you get started with some basic HBase commands and become familiar with top-level concepts.

We are currently running in the so-called Standalone Mode. We will look into the available modes later on (see Run Modes), but for now it’s important to know that in this mode everything is run in a single Java process and all files are stored in /tmp by default—unless you heeded the important advice given earlier and changed it to something different. Many people have lost their test data during a reboot, only to learn that they had kept the default path. Once it is deleted by the OS, there is no going back!
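If you want your test data to survive a reboot even in standalone mode, you can point HBase at a more permanent location before starting it. The following conf/hbase-site.xml snippet is only a sketch: hbase.rootdir and hbase.tmp.dir are the standard property names, but the paths are placeholders you have to replace with directories of your own.

<configuration>
  <property>
    <name>hbase.rootdir</name>
    <!-- example path only; any directory outside /tmp will do -->
    <value>file:///home/<username>/hbase</value>
  </property>
  <property>
    <name>hbase.tmp.dir</name>
    <value>/home/<username>/hbase-tmp</value>
  </property>
</configuration>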

Let us now create a simple table and add a few rows with some data:

hbase(main):002:0> create 'testtable', 'colfam1'
0 row(s) in 0.2930 seconds

hbase(main):003:0> list 'testtable'
TABLE
testtable
1 row(s) in 0.0520 seconds

hbase(main):004:0> put 'testtable', 'myrow-1', 'colfam1:q1', 'value-1'
0 row(s) in 0.1020 seconds

hbase(main):005:0> put 'testtable', 'myrow-2', 'colfam1:q2', 'value-2'
0 row(s) in 0.0410 seconds

hbase(main):006:0> put 'testtable', 'myrow-2', 'colfam1:q3', 'value-3'
0 row(s) in 0.0380 seconds

After we create the table with one column family, we verify that it actually exists by issuing a list command. You can see how it outputs the testtable name as the only table currently known. Subsequently, we are putting data into a number of rows. If you read the example carefully, you can see that we are adding data to two different rows with the keys myrow-1 and myrow-2. As we discussed in Chapter 1, we have one column family named colfam1, and can add an arbitrary qualifier to form actual columns, here colfam1:q1, colfam1:q2, and colfam1:q3.

Next we want to check if the data we added can be retrieved. We are using a scan operation to do so:

hbase(main):007:0> scan 'testtable'
ROW              COLUMN+CELL
 myrow-1          column=colfam1:q1, timestamp=1297345476469, value=value-1
 myrow-2          column=colfam1:q2, timestamp=1297345495663, value=value-2
 myrow-2          column=colfam1:q3, timestamp=1297345508999, value=value-3
 
2 row(s) in 0.1100 seconds

You can observe how HBase is printing the data in a cell-oriented way by outputting each column separately. It prints out myrow-2 twice, as expected, and shows the actual value for each column next to it.

If we want to get exactly one row back, we can also use the get command. It has many more options, which we will look at later, but for now simply try the following:

hbase(main):008:0> get 'testtable', 'myrow-1'
COLUMN           CELL
colfam1:q1        timestamp=1297345476469, value=value-1

1 row(s) in 0.0480 seconds

What is missing in our basic set of operations is to delete a value. Again, delete offers many options, but for now we just delete one specific cell and check that it is gone:

hbase(main):009:0> delete 'testtable', 'myrow-2', 'colfam1:q2'
0 row(s) in 0.0390 seconds

hbase(main):010:0> scan 'testtable'                       
ROW              COLUMN+CELL
 myrow-1          column=colfam1:q1, timestamp=1297345476469, value=value-1
 myrow-2          column=colfam1:q3, timestamp=1297345508999, value=value-3

2 row(s) in 0.0620 seconds

Before we conclude this simple exercise, we have to clean up by first disabling and then dropping the test table:

hbase(main):011:0> disable 'testtable'
0 row(s) in 2.1250 seconds

hbase(main):012:0> drop 'testtable'
0 row(s) in 1.2780 seconds

Finally, we close the shell by means of the exit command and return to our command-line prompt:

hbase(main):013:0> exit
$ _

The last thing to do is stop HBase on our local system. We do this by running the stop-hbase.sh script:

$ bin/stop-hbase.sh 
stopping hbase.....

That is all there is to it. We have successfully created a table, added, retrieved, and deleted data, and eventually dropped the table using the HBase Shell.

Requirements

Not all of the following requirements are needed for specific run modes HBase supports. For purely local testing, you only need Java, as mentioned in Quick-Start Guide.

Hardware

It is difficult to specify a particular server type that is recommended for HBase. In fact, the opposite is more appropriate, as HBase runs on many, very different hardware configurations. The usual description is commodity hardware. But what does that mean?

For starters, we are not talking about desktop PCs, but server-grade machines. Given that HBase is written in Java, you at least need support for a current Java Runtime, and since the majority of the memory needed per region server is for internal structures—for example, the memstores and the block cache—you will have to install a 64-bit operating system to be able to address enough memory, that is, more than 4 GB.

In practice, a lot of HBase setups are collocated with Hadoop, to make use of locality using HDFS as well as MapReduce. This can significantly reduce the required network I/O and boost processing speeds. Running Hadoop and HBase on the same server results in at least three Java processes running (data node, task tracker, and region server) and may spike to much higher numbers when executing MapReduce jobs. All of these processes need a minimum amount of memory, disk, and CPU resources to run sufficiently.

Note

It is assumed that you have a reasonably good understanding of Hadoop, since it is used as the backing store for HBase in all known production systems (as of this writing). If you are completely new to HBase and Hadoop, it is recommended that you get familiar with Hadoop first, even on a very basic level. For example, read the recommended Hadoop: The Definitive Guide (Second Edition) by Tom White (O’Reilly), and set up a working HDFS and MapReduce cluster.

Giving all the available memory to the Java processes is also not a good idea, as most operating systems need some spare resources to work more effectively—for example, the disk I/O buffers maintained by the Linux kernel. HBase indirectly takes advantage of this: the disk I/O, which is already local when you collocate the systems on the same server, performs even better when the OS can keep its own block cache.

We can separate the requirements into two categories: servers and networking. We will look at the server hardware first and then into the requirements for the networking setup subsequently.

Servers

In HBase and Hadoop there are two types of machines: masters (the HDFS NameNode, the MapReduce JobTracker, and the HBase Master) and slaves (the HDFS DataNodes, the MapReduce TaskTrackers, and the HBase RegionServers). They do benefit from slightly different hardware specifications when possible. It is also quite common to use exactly the same hardware for both (out of convenience), but the master does not need that much storage, so it makes sense to not add too many disks. And since the masters are also more important than the slaves, you could beef them up with redundant hardware components. We will address the differences between the two where necessary.

Since Java runs in user land, you can run it on top of every operating system that supports a Java Runtime—though there are recommended ones, and those where it does not run without user intervention (more on this in Operating system). It allows you to select from a wide variety of vendors, or even build your own hardware. It comes down to more generic requirements like the following:

CPU

It makes no sense to run three or more Java processes, plus the services provided by the operating system itself, on single-core CPU machines. For production use, it is typical that you use multicore processors.[27] Quad-core processors are state of the art and affordable, while hexa-core processors are also becoming more popular. Most server hardware supports more than one CPU so that you can use two quad-core CPUs for a total of eight cores. This allows each basic Java process to run on its own core while background tasks like Java garbage collection can be executed in parallel. In addition, there is hyperthreading, which adds to their overall performance.

As far as CPU is concerned, you should spec the master and slave machines the same.

Node type    Recommendation
Master       Dual quad-core CPUs, 2.0-2.5 GHz
Slave        Dual quad-core CPUs, 2.0-2.5 GHz
Memory

The question really is: is there too much memory? In theory, no, but in practice, it has been empirically determined that when using Java you should not set the amount of memory given to a single process too high. Memory (called heap in Java terms) can start to get fragmented, and in a worst-case scenario, the entire heap would need rewriting—this is similar to the well-known disk fragmentation, but it cannot run in the background. The Java Runtime pauses all processing to clean up the mess, which can lead to quite a few problems (more on this later). The larger you have set the heap, the longer this process will take. Processes that do not need a lot of memory should only be given their required amount to avoid this scenario, but with the region servers and their block cache there is, in theory, no upper limit. You need to find a sweet spot depending on your access pattern.

Warning

At the time of this writing, setting the heap of the region servers to larger than 16 GB is considered dangerous. Once a stop-the-world garbage collection is required, it simply takes too long to rewrite the fragmented heap. Your server could be considered dead by the master and be removed from the working set.

This may change sometime as this is ultimately bound to the Java Runtime Environment used, and there is development going on to implement JREs that do not stop the running Java processes when performing garbage collections.

Table 2-1 shows a very basic distribution of memory to specific processes. Please note that this is an example only and highly depends on the size of your cluster and how much data you put in, but also on your access pattern, such as interactive access only or a combination of interactive and batch use (using MapReduce).

Table 2-1. Exemplary memory allocation per Java process for a cluster with 800 TB of raw disk storage space
Process              Heap        Description
NameNode             8 GB        About 1 GB of heap for every 100 TB of raw data stored, or per every million files/inodes
SecondaryNameNode    8 GB        Applies the edits in memory, and therefore needs about the same amount as the NameNode
JobTracker           2 GB        Moderate requirements
HBase Master         4 GB        Usually lightly loaded, moderate requirements only
DataNode             1 GB        Moderate requirements
TaskTracker          1 GB        Moderate requirements
HBase RegionServer   12 GB       Majority of available memory, while leaving enough room for the operating system (for the buffer cache), and for the Task Attempt processes
Task Attempts        1 GB (ea.)  Multiply by the maximum number you allow for each
ZooKeeper            1 GB        Moderate requirements

An exemplary setup could be as such: for the master machine, running the NameNode, SecondaryNameNode, JobTracker, and HBase Master, 24 GB of memory; and for the slaves, running the DataNodes, TaskTrackers, and HBase RegionServers, 24 GB or more.

Node type    Recommendation
Master       24 GB
Slave        24 GB (and up)
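To put such a memory plan into practice on the HBase side, the heap sizes are set in conf/hbase-env.sh. The following is only a sketch, assuming the 12 GB region server heap from Table 2-1; the variable names are those used by the stock scripts, while the values have to match your own plan:

# conf/hbase-env.sh (sketch only)
# Default heap size, in MB, used by the HBase daemons unless overridden below
export HBASE_HEAPSIZE=4096
# Give the region servers a larger heap than the master
export HBASE_REGIONSERVER_OPTS="-Xms12g -Xmx12g"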

Note

It is recommended that you optimize your RAM for the memory channel width of your server. For example, when using dual-channel memory, each machine should be configured with pairs of DIMMs. With triple-channel memory, each server should have triplets of DIMMs. This could mean that a server has 18 GB (9 × 2GB) of RAM instead of 16 GB (4 × 4GB).

Also make sure that not just the server’s motherboard but also your CPU supports this feature: some CPUs only support dual-channel memory, and therefore, even if you put in triple-channel DIMMs, they will only be used in dual-channel mode.

Disks

The data is stored on the slave machines, and therefore it is those servers that need plenty of capacity. Depending on whether you are more read/write- or processing-oriented, you need to balance the number of disks with the number of CPU cores available. Typically, you should have at least one core per disk, so in an eight-core server, adding six disks is good, but adding more might not give you optimal performance.

Some consideration should be given regarding the type of drives—for example, 2.5” versus 3.5” drives or SATA versus SAS. In general, SATA drives are recommended over SAS since they are more cost-effective, and since the nodes are all redundantly storing replicas of the data across multiple servers, you can safely use the more affordable disks. On the other hand, 3.5” disks are more reliable compared to 2.5” disks, but depending on the server chassis you may need to go with the latter.

The disk capacity is usually 1 TB per disk, but you can also use 2 TB drives if necessary. Using from six to 12 high-density servers with 1 TB to 2 TB drives is good, as you get a lot of storage capacity and the JBOD setup with enough cores can saturate the disk bandwidth nicely.

Node type    Recommendation
Master       4 × 1 TB SATA, RAID 0+1 (2 TB usable)
Slave        6 × 1 TB SATA, JBOD
Chassis

The actual server chassis is not that crucial, as most servers in a specific price bracket provide very similar features. It is often better to shy away from special hardware that offers proprietary functionality and opt for generic servers so that they can be easily combined over time as you extend the capacity of the cluster.

As far as networking is concerned, it is recommended that you use a two-port Gigabit Ethernet card—or two channel-bonded cards. If you already have support for 10 Gigabit Ethernet or InfiniBand, you should use it.

For the slave servers, a single power supply unit (PSU) is sufficient, but for the master node you should use redundant PSUs, such as the optional dual PSUs available for many servers.

In terms of density, it is advisable to select server hardware that fits into a low number of rack units (abbreviated as “U”). Typically, 1U or 2U servers are used in 19” racks or cabinets. A consideration while choosing the size is how many disks they can hold and their power consumption. Usually a 1U server is limited to a lower number of disks or forces you to use 2.5” disks to get the capacity you want.

Node type    Recommendation
Master       Gigabit Ethernet, dual PSU, 1U or 2U
Slave        Gigabit Ethernet, single PSU, 1U or 2U

Networking

In a data center, servers are typically mounted into 19” racks or cabinets with 40U or more in height. You could fit up to 40 machines (although with half-depth servers, some companies have up to 80 machines in a single rack, 40 machines on either side) and link them together with a top-of-rack (ToR) switch. Given the Gigabit speed per server, you need to ensure that the ToR switch is fast enough to handle the throughput these servers can create. Often the backplane of a switch cannot handle all ports at line rate or is oversubscribed—in other words, promising you something in theory it cannot do in reality.

Switches often have 24 or 48 ports, and with the aforementioned channel-bonding or two-port cards, you need to size the networking large enough to provide enough bandwidth. Installing 40 1U servers would need 80 network ports; so, in practice, you may need a staggered setup where you use multiple rack switches and then aggregate to a much larger core aggregation switch (CaS). This results in a two-tier architecture, where the distribution is handled by the ToR switch and the aggregation by the CaS.

While we cannot address all the considerations for large-scale setups, we can still note that this is a common design pattern. If the operations team is part of the planning, and it is known how much data is going to be stored and how many clients are expected to read and write concurrently, computing the number of servers needed comes down to basic math—which in turn drives the networking considerations.

When users have reported issues with HBase on the public mailing list or on other channels, especially regarding slower-than-expected I/O performance while bulk inserting huge amounts of data, it often became clear that networking was either the main issue or a contributing one. This ranges from misconfigured or faulty network interface cards (NICs) to completely oversubscribed switches in the I/O path. Please make sure that you verify every component in the cluster to avoid sudden operational problems—the kind that could have been avoided by sizing the hardware appropriately.

Finally, given the current status of built-in security in Hadoop and HBase, it is common for the entire cluster to be located in its own network, possibly protected by a firewall to control access to the few required, client-facing ports.

Software

After considering the hardware and purchasing the server machines, it’s time to consider software. This can range from the operating system itself to filesystem choices and configuration of various auxiliary services.

Note

Most of the requirements listed are independent of HBase and have to be applied on a very low, operational level. You may have to consult with your administrator to get everything applied and verified.

Operating system

Recommending an operating system (OS) is a tough call, especially in the open source realm. Over the past two to three years, there has been a clear preference for using Linux with HBase. In fact, Hadoop and HBase are inherently designed to work with Linux, Unix, or any other Unix-like system. While you are free to run either one on a different OS as long as it supports Java—for example, Windows—they have only been tested with Unix-like systems. The supplied start and stop scripts, for example, expect a command-line shell as provided by Linux or Unix.

Within the Unix and Unix-like group you can also differentiate between those that are free (as in they cost no money) and those you have to pay for. Again, both will work and your choice is often limited by company-wide regulations. Here is a short list of operating systems that are commonly found as a basis for HBase clusters:

CentOS

CentOS is a community-supported, free software operating system based on Red Hat Enterprise Linux (RHEL). It mirrors RHEL in terms of functionality, features, and package release levels, as it uses the source code packages Red Hat provides for its own enterprise product to create CentOS-branded counterparts. Like RHEL, it provides the packages in RPM format.

It is also focused on enterprise usage, and therefore does not adopt new features or newer versions of existing packages too quickly. The goal is to provide an OS that can be rolled out across a large-scale infrastructure while not having to deal with short-term gains of small, incremental package updates.

Fedora

Fedora is also a community-supported, free and open source operating system, and is sponsored by Red Hat. But compared to RHEL and CentOS, it is more a playground for new technologies and strives to advance new ideas and features. Because of that, it has a much shorter life cycle compared to enterprise-oriented products. An average maintenance period for a Fedora release is around 13 months.

The fact that it is aimed at workstations and has been enhanced with many new features has made Fedora a quite popular choice, only beaten by more desktop-oriented operating systems.[31] For production use, you may want to take into account the reduced life cycle that counteracts the freshness of this distribution. You may also want to consider not using the latest Fedora release, but trailing by one version to be able to rely on some feedback from the community as far as stability and other issues are concerned.

Debian

Debian is another Linux-kernel-based OS that has software packages released as free and open source software. It can be used for desktop and server systems and has a conservative approach when it comes to package updates. Releases are only published after all included packages have been sufficiently tested and deemed stable.

As opposed to other distributions, Debian is not backed by a commercial entity, but rather is solely governed by its own project rules. It also uses its own packaging system that supports DEB packages only. Debian is known to run on many hardware platforms as well as having a very large repository of packages.

Ubuntu

Ubuntu is a Linux distribution based on Debian. It is distributed as free and open source software, and backed by Canonical Ltd., which is not charging for the OS but is selling technical support for Ubuntu.

The life cycle is split into a longer- and a shorter-term release. The long-term support (LTS) releases are supported for three years on the desktop and five years on the server. The packages are also DEB format and are based on the unstable branch of Debian: Ubuntu, in a sense, is for Debian what Fedora is for Red Hat Linux. Using Ubuntu as a server operating system is made more difficult as the update cycle for critical components is very frequent.

Solaris

Solaris is offered by Oracle, and is available for a limited number of architecture platforms. It is a descendant of Unix System V Release 4, and therefore, the most different OS in this list. Some of the source code is available as open source while the rest is closed source. Solaris is a commercial product and needs to be purchased. The commercial support for each release is maintained for 10 to 12 years.

Red Hat Enterprise Linux

Abbreviated as RHEL, Red Hat’s Linux distribution is aimed at commercial and enterprise-level customers. The OS is available as a server and a desktop version. The license comes with offerings for official support, training, and a certification program.

The package format for RHEL is called RPM (the Red Hat Package Manager), and it consists of the software packaged in the .rpm file format, and the package manager itself.

Being commercially supported and maintained, RHEL has a very long life cycle of 7 to 10 years.

Note

You have a choice when it comes to the operating system you are going to use on your servers. A sensible approach is to choose one you feel comfortable with and that fits into your existing infrastructure.

As for a recommendation, many production systems running HBase are on top of CentOS, or RHEL.

Filesystem

With the operating system selected, you will have a few choices of filesystems to use with your disks. There is not a lot of publicly available empirical data comparing different filesystems and their effect on HBase, though. The common systems in use are ext3, ext4, and XFS, but you may be able to use others as well. For some, there are HBase users reporting on their findings, while for the more exotic ones you would need to run enough tests before using them on your production cluster.

Note

Note that the selection of a filesystem here concerns the disks of the HDFS data nodes. HBase is only indirectly affected, since it uses HDFS as its backing store.

Here are some notes on the more commonly used filesystems:

ext3

One of the most ubiquitous filesystems on the Linux operating system is ext3 (see http://en.wikipedia.org/wiki/Ext3 for details). It has been proven stable and reliable, meaning it is a safe bet in terms of setting up your cluster with it. Being part of Linux since 2001, it has been steadily improved over time and has been the default filesystem for years.

There are a few optimizations you should keep in mind when using ext3. First, you should set the noatime option when mounting the filesystem to reduce the administrative overhead required for the kernel to keep the access time for each file. It is not needed or even used by HBase, and disabling it speeds up the disk’s read performance.

Note

Disabling the last access time gives you a performance boost and is a recommended optimization. Mount options are typically specified in a configuration file called /etc/fstab. Here is a Linux example line where the noatime option is specified:

/dev/sdd1  /data  ext3  defaults,noatime  0  0

Note that this also implies the nodiratime option.

Another optimization is to make better use of the disk space provided by ext3. By default, it reserves a specific number of bytes in blocks for situations where a disk fills up but crucial system processes need this space to continue to function. This is really useful for critical disks—for example, the one hosting the operating system—but it is less useful for the storage drives, and in a large enough cluster it can have a significant impact on available storage capacities.

Note

You can reduce the number of reserved blocks and gain more usable disk space by using the tune2fs command-line tool that comes with ext3 and Linux. By default, it is set to 5% but can safely be reduced to 1% (or even 0%) for the data drives. This is done with the following command:

tune2fs -m 1 <device-name>

Replace <device-name> with the disk you want to adjust—for example, /dev/sdd1. Do this for all disks on which you want to store data. The -m 1 defines the percentage, so use -m 0, for example, to set the reserved block count to zero.

A final word of caution: only do this for your data disk, NOT for the disk hosting the OS nor for any drive on the master node!

Yahoo! has publicly stated that it is using ext3 as its filesystem of choice on its large Hadoop cluster farm. This shows that, although it is by far not the most current or modern filesystem, it does very well in large clusters. In fact, you are more likely to saturate your I/O on other levels of the stack before reaching the limits of ext3.

The biggest drawback of ext3 is that during the bootstrap process of the servers it requires the largest amount of time. Formatting a disk with ext3 can take minutes to complete and may become a nuisance when spinning up machines dynamically on a regular basis—although that is not a very common practice.

ext4

The successor to ext3 is called ext4 (see http://en.wikipedia.org/wiki/Ext4 for details) and initially was based on the same code but was subsequently moved into its own project. It has been officially part of the Linux kernel since the end of 2008. To that extent, it has had only a few years to prove its stability and reliability. Nevertheless, Google has announced plans[32] to upgrade its storage infrastructure from ext2 to ext4. This can be considered a strong endorsement, but also shows the advantage of the extended filesystem (the ext in ext3, ext4, etc.) lineage to be upgradable in place. Choosing an entirely different filesystem like XFS would have made this impossible.

Performance-wise, ext4 does beat ext3 and allegedly comes close to the high-performance XFS. It also has many advanced features that allow it to store files up to 16 TB in size and support volumes up to 1 exabyte (i.e., 10^18 bytes).

A more critical feature is the so-called delayed allocation, and it is recommended that you turn it off for Hadoop and HBase use. Delayed allocation keeps the data in memory and reserves the required number of blocks until the data is finally flushed to disk. It helps in keeping blocks for files together and can at times write the entire file into a contiguous set of blocks. This reduces fragmentation and improves performance when reading the file subsequently. On the other hand, it increases the possibility of data loss in case of a server crash.
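Delayed allocation can be switched off with the nodelalloc mount option. A hedged example of a matching /etc/fstab line, analogous to the ext3 example shown earlier; the device and mount point are placeholders:

/dev/sdd1  /data  ext4  defaults,noatime,nodelalloc  0  0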

XFS

XFS (see http://en.wikipedia.org/wiki/Xfs for details) became available on Linux at about the same time as ext3. It was originally developed by Silicon Graphics in 1993. Most Linux distributions today have XFS support included.

Its features are similar to those of ext4; for example, both have extents (grouping contiguous blocks together, reducing the number of blocks required to maintain per file) and the aforementioned delayed allocation.

A great advantage of XFS during bootstrapping a server is the fact that it formats the entire drive in virtually no time. This can significantly reduce the time required to provision new servers with many storage disks.

On the other hand, there are some drawbacks to using XFS. There is a known shortcoming in the design that impacts metadata operations, such as deleting a large number of files. The developers have picked up on the issue and applied various fixes to improve the situation. You will have to check how you use HBase to determine if this might affect you. For normal use, you should not have a problem with this limitation of XFS, as HBase operates on fewer but larger files.

ZFS

Introduced in 2005, ZFS (see http://en.wikipedia.org/wiki/ZFS for details) was developed by Sun Microsystems. The name is an abbreviation for zettabyte filesystem, as it has the ability to store 2^58 zettabytes (a zettabyte being 10^21 bytes).

ZFS is primarily supported on Solaris and has advanced features that may be useful in combination with HBase. It has built-in compression support that could be used as a replacement for the pluggable compression codecs in HBase.

It seems that choosing a filesystem is analogous to choosing an operating system: pick one that you feel comfortable with and that fits into your existing infrastructure. Simply picking one over the other based on plain numbers is difficult without proper testing and comparison. If you have a choice, it seems to make sense to opt for a more modern system like ext4 or XFS, as sooner or later they will replace ext3 and are already much more scalable and perform better than their older sibling.

Warning

Installing different filesystems on a single server is not recommended. This can have adverse effects on performance as the kernel may have to split buffer caches to support the different filesystems. It has been reported that, for certain operating systems, this can have a devastating performance impact. Make sure you test this issue carefully if you have to mix filesystems.

Java

It was mentioned in the note that you do need Java for HBase. Not just any version of Java, but version 6, a.k.a. 1.6, or later. The recommended choice is the one provided by Oracle (formerly by Sun), which can be found at http://www.java.com/download/.

You also should make sure the java binary is executable and can be found on your path. Try entering java -version on the command line and verify that it works and that it prints out the version number indicating it is version 1.6 or later—for example, java version "1.6.0_22". You usually want the latest update level, but sometimes you may find unexpected problems (version 1.6.0_18, for example, is known to cause random JVM crashes) and it may be worth trying an older release to verify.

If you do not have Java on the command-line path or if HBase fails to start with a warning that it was not able to find it (see Example 2-1), edit the conf/hbase-env.sh file by uncommenting the JAVA_HOME line and changing its value to where your Java is installed.

Example 2-1. Error message printed by HBase when no Java executable was found
+======================================================================+
|      Error: JAVA_HOME is not set and Java could not be found         |
+----------------------------------------------------------------------+
| Please download the latest Sun JDK from the Sun Java web site        |
|       > http://java.sun.com/javase/downloads/ <                      |
|                                                                      |
| HBase requires Java 1.6 or later.                                    |
| NOTE: This script will find Sun Java whether you install using the   |
|       binary or the RPM based installer.                             |
+======================================================================+
          

Note

The supplied scripts try many default locations for Java, so there is a good chance HBase will find it automatically. If it does not, you most likely have no Java Runtime installed at all. Start with the download link provided at the beginning of this subsection and read the manuals of your operating system to find out how to install it.
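If you do end up setting JAVA_HOME manually, a sketch of the relevant line in conf/hbase-env.sh could look like the following; the path is only an example and has to match your actual Java installation:

# conf/hbase-env.sh
# The java implementation to use (example path, adjust to your system)
export JAVA_HOME=/usr/lib/jvm/java-6-sun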

Hadoop

Currently, HBase is bound to work only with the specific version of Hadoop it was built against. One of the reasons for this behavior concerns the remote procedure call (RPC) API between HBase and Hadoop. The wire protocol is versioned and needs to match up; even small differences can cause a broken communication between them.

The current version of HBase will only run on Hadoop 0.20.x. It will not run on Hadoop 0.21.x (nor 0.22.x) as of this writing. HBase may lose data in a catastrophic event unless it is running on an HDFS that has durable sync support. Hadoop 0.20.2 and Hadoop 0.20.203.0 do not have this support. Currently, only the branch-0.20-append branch has this attribute.[33] No official releases have been made from this branch up to now, so you will have to build your own Hadoop from the tip of this branch. Scroll down in the Hadoop How To Release to the “Build Requirements” section for instructions on how to build Hadoop.[34]

Another option, if you do not want to build your own version of Hadoop, is to use a distribution that has the patches already applied. You could use Cloudera’s CDH3. CDH has the 0.20-append patches needed to add a durable sync. We will discuss this in more detail in Cloudera’s Distribution Including Apache Hadoop.

Because HBase depends on Hadoop, it bundles an instance of the Hadoop JAR under its lib directory. The bundled Hadoop was made from the Apache branch-0.20-append branch at the time of HBase’s release. It is critical that the version of Hadoop that is in use on your cluster matches what is used by HBase. Replace the Hadoop JAR found in the HBase lib directory with the hadoop-xyz.jar you are running on your cluster to avoid version mismatch issues. Make sure you replace the JAR on all servers in your cluster that run HBase. Version mismatch issues have various manifestations, but often the result is the same: HBase does not throw an error, but simply blocks indefinitely.
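As a sketch, the replacement could look like the following; the directories and the JAR name pattern are placeholders and depend on your HBase and Hadoop versions:

$ cd /usr/local/hbase-x.y.z/lib
$ mv hadoop-core-*.jar /tmp                       # set the bundled JAR aside
$ cp /usr/local/hadoop/hadoop-core-*.jar .        # copy in the JAR used on your cluster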

Note

The bundled JAR that ships with HBase is considered only for use in standalone mode.

A different approach is to install a vanilla Hadoop 0.20.2 and then replace the vanilla Hadoop JAR with the one supplied by HBase. This is not tested extensively but seems to work. Your mileage may vary.

Note

HBase will run on any Hadoop 0.20.x that incorporates Hadoop security features—for example, CDH3—as long as you do as suggested in the preceding text and replace the Hadoop JAR that ships with HBase with the secure version.

SSH

Note that ssh must be installed and sshd must be running if you want to use the supplied scripts to manage remote Hadoop and HBase daemons. A commonly used software package providing these commands is OpenSSH, available from http://www.openssh.com/. Check with your operating system manuals first, as many OSes have mechanisms to install an already compiled binary release package as opposed to having to build it yourself. On an Ubuntu workstation, for example, you can use:

$ sudo apt-get install openssh-client

On the servers, you would install the matching server package:

$ sudo apt-get install openssh-server

You must be able to ssh to all nodes, including your local node, using passwordless login. You will need to have a public key pair—you can either use the one you already use (see the .ssh directory located in your home directory) or you will have to generate one—and add your public key on each server so that the scripts can access the remote servers without further intervention.
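One common way to set this up with OpenSSH is sketched below; the host name is a placeholder, and ssh-copy-id may not be available on every system (you can also append the public key to ~/.ssh/authorized_keys on the remote server by hand):

$ ssh-keygen -t rsa                        # create a key pair; choose a passphrase
$ ssh-copy-id <username>@<remote-server>   # install your public key on the remote server
$ ssh <remote-server>                      # verify that the key-based login works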

Note

The supplied shell scripts make use of SSH to send commands to each server in the cluster. It is strongly advised that you not use simple password authentication. Instead, you should use public key authentication—only!

When you create your key pair, also add a passphrase to protect your private key. To avoid the hassle of being asked for the passphrase for every single command sent to a remote server, it is recommended that you use ssh-agent, a helper that comes with SSH. It lets you enter the passphrase only once and then takes care of all subsequent requests to provide it.

Ideally, you would also use the agent forwarding that is built in to log in to other remote servers from your cluster nodes.
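A sketch of how the agent is typically used from a Linux shell; the exact invocation may differ depending on your shell and distribution:

$ eval $(ssh-agent)        # start the agent and export its environment variables
$ ssh-add                  # enter the passphrase once; the agent caches the key
$ ssh -A <remote-server>   # -A forwards the agent for hops to further servers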

Domain Name Service

HBase uses the local hostname to self-report its IP address. Both forward and reverse DNS resolving should work. You can verify if the setup is correct for forward DNS lookups by running the following command:

$ ping -c 1 $(hostname)

You need to make sure that it reports the public IP address of the server and not the loopback address 127.0.0.1. A typical reason for this not to work concerns an incorrect /etc/hosts file, containing a mapping of the machine name to the loopback address.

If your machine has multiple interfaces, HBase will use the interface that the primary hostname resolves to. If this is insufficient, you can set hbase.regionserver.dns.interface (see Configuration for information on how to do this) to indicate the primary interface. This only works if your cluster configuration is consistent and every host has the same network interface configuration.

Another alternative is to set hbase.regionserver.dns.nameserver to choose a different name server than the system-wide default.
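Both settings go into conf/hbase-site.xml. A hedged example, assuming eth0 is the interface you want HBase to use and ns1.internal is your internal name server; replace both values with your own:

<property>
  <name>hbase.regionserver.dns.interface</name>
  <value>eth0</value>
</property>
<property>
  <name>hbase.regionserver.dns.nameserver</name>
  <value>ns1.internal</value>
</property>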

Synchronized time

The clocks on cluster nodes should be in basic alignment. Some skew is tolerable, but wild skew can generate odd behaviors. Even differences of only one minute can cause unexplainable behavior. Run NTP on your cluster, or an equivalent application, to synchronize the time on all servers.

If you are having problems querying data, or you are seeing weird behavior running cluster operations, check the system time!
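On a Debian-based system, for example, installing and checking NTP might look like the following sketch; package and tool names can differ between distributions:

$ sudo apt-get install ntp
$ ntpq -p        # lists the peers the daemon synchronizes against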

File handles and process limits

HBase is a database, so it uses a lot of files at the same time. The default ulimit -n of 1024 on most Unix or Unix-like systems is insufficient. Any significant amount of loading will lead to I/O errors stating the obvious: java.io.IOException: Too many open files. You may also notice errors such as the following:

2010-04-06 03:04:37,542 INFO org.apache.hadoop.hdfs.DFSClient: Exception
    in createBlockOutputStream java.io.EOFException
2010-04-06 03:04:37,542 INFO org.apache.hadoop.hdfs.DFSClient: Abandoning 
    block blk_-6935524980745310745_1391901

Note

These errors are usually found in the logfiles. See Analyzing the Logs for details on their location, and how to analyze their content.

You need to change the upper bound on the number of file descriptors. Set it to a number larger than 10,000. To be clear, upping the file descriptors for the user who is running the HBase process is an operating system configuration, not an HBase configuration. Also, a common mistake is that administrators will increase the file descriptors for a particular user but HBase is running with a different user account.
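On many Linux systems the limit is raised in /etc/security/limits.conf. A minimal sketch, assuming the Hadoop and HBase processes run as a user named hadoop; adjust the user name and the value to your environment, log in again, and verify with ulimit -n:

# /etc/security/limits.conf
hadoop  -  nofile  32768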

Note

You can estimate the number of required file handles roughly as follows. Per column family, there is at least one storage file, and possibly up to five or six if a region is under load; on average, though, there are three storage files per column family. To determine the number of required file handles, you multiply the average number of storage files per column family by the number of column families and by the number of regions per region server. For example, say you have a schema of 3 column families per region and you have 100 regions per region server. The JVM will open 3 × 3 × 100 storage files = 900 file descriptors, not counting open JAR files, configuration files, CRC32 files, and so on. Run lsof -p REGIONSERVER_PID to see the accurate number.

As the first line in its logs, HBase prints the ulimit it is seeing. Ensure that it’s correctly reporting the increased limit.[35] See Analyzing the Logs for details on how to find this information in the logs, as well as other details that can help you find—and solve—problems with an HBase setup.

You may also need to edit /etc/sysctl.conf and adjust the fs.file-max value. See this post on Server Fault for details.

You should also consider increasing the number of processes allowed by adjusting the nproc value in the same /etc/security/limits.conf file referenced earlier. With a low limit and a server under duress, you could see OutOfMemoryError exceptions, which will eventually cause the entire Java process to end. As with the file handles, you need to make sure this value is set for the appropriate user account running the process.
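Again assuming a user named hadoop, the corresponding sketch for /etc/security/limits.conf is:

hadoop  -  nproc  32000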

Datanode handlers

A Hadoop HDFS data node has an upper bound on the number of files that it will serve at any one time. The upper bound parameter is called xcievers (yes, this is misspelled). Again, before doing any loading, make sure you have configured Hadoop’s conf/hdfs-site.xml file, setting the xcievers value to at least the following:

<property>
  <name>dfs.datanode.max.xcievers</name>
  <value>4096</value>
</property>

Note

Be sure to restart your HDFS after making the preceding configuration changes.

Not having this configuration in place makes for strange-looking failures. Eventually, you will see a complaint in the datanode logs about the xcievers limit being exceeded, but in the run-up to this, one manifestation is a complaint about missing blocks. For example:

10/12/08 20:10:31 INFO hdfs.DFSClient: Could not obtain block 
    blk_XXXXXXXXXXXXXXXXXXXXXX_YYYYYYYY from any node: java.io.IOException: 
    No live nodes contain current block. Will get new block locations from 
    namenode and retry...

Swappiness

You need to prevent your servers from running out of memory over time. We already discussed one way to do this: setting the heap sizes small enough that they give the operating system enough room for its own processes. Once you get close to the physically available memory, the OS starts to use the configured swap space. This is typically located on disk in its own partition and is used to page out processes and their allocated memory until it is needed again.

Swapping—while being a good thing on workstations—is something to be avoided at all costs on servers. Once the server starts swapping, performance is reduced significantly, up to a point where you may not even be able to log in to such a system because the remote access process (e.g., SSHD) is coming to a grinding halt.

HBase needs guaranteed CPU cycles and must obey certain freshness guarantees—for example, to renew the ZooKeeper sessions. It has been observed over and over again that swapping servers start to miss renewing their leases and are considered lost subsequently by the ZooKeeper ensemble. The regions on these servers are redeployed on other servers, which now take extra pressure and may fall into the same trap.

Even worse are scenarios where the swapping server wakes up and now needs to realize it is considered dead by the master node. It will report for duty as if nothing has happened and receive a YouAreDeadException in the process, telling it that it has missed its chance to continue, and therefore terminates itself. There are quite a few implicit issues with this scenario—for example, pending updates, which we will address later. Suffice it to say that this is not good.

You can tune down the swappiness of the server by adding this line to the /etc/sysctl.conf configuration file on Linux and Unix-like systems:

vm.swappiness=5

You can try values like 0 or 5 to reduce the system’s likelihood to use swap space.
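You can check the value currently in effect with the sysctl tool; the output shown assumes the setting above has been applied:

$ sysctl vm.swappiness
vm.swappiness = 5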

Some more radical operators have turned off swapping completely (see the swapoff command on Linux), and would rather have their systems run “against the wall” than deal with swapping issues. Choose something you feel comfortable with, but make sure you keep an eye on this problem.

Finally, you may have to reboot the server for the changes to take effect, as a simple

sysctl -p

might not suffice. This obviously is for Unix-like systems and you will have to adjust this for your operating system.

Windows

HBase running on Windows has not been tested to a great extent. Running a production install of HBase on top of Windows is not recommended.

If you are running HBase on Windows, you must install Cygwin to have a Unix-like environment for the shell scripts. The full details are explained in the Windows Installation guide on the HBase website.

Filesystems for HBase

The most common filesystem used with HBase is HDFS. But you are not locked into HDFS because the FileSystem used by HBase has a pluggable architecture and can be used to replace HDFS with any other supported system. In fact, you could go as far as implementing your own filesystem—maybe even on top of another database. The possibilities are endless and waiting for the brave at heart.

Note

In this section, we are not talking about the low-level filesystems used by the operating system (see Filesystem for that), but the storage layer filesystems. These are abstractions that define higher-level features and APIs, which are then used by Hadoop to store the data. The data is eventually stored on a disk, at which point the OS filesystem is used.

HDFS is the most used and tested filesystem in production. Almost all production clusters use it as the underlying storage layer. It is proven stable and reliable, so deviating from it may impose its own risks and subsequent problems.

The primary reason HDFS is so popular is its built-in replication, fault tolerance, and scalability. Choosing a different filesystem should provide the same guarantees, as HBase implicitly assumes that data is stored in a reliable manner by the filesystem. It has no added means to replicate data or even maintain copies of its own storage files. This functionality must be provided by the lower-level system.

You can select a different filesystem implementation by using a URI[36] pattern, where the scheme (the part of the URI before the first colon) identifies the driver to be used. Figure 2-1 shows how the Hadoop filesystem is different from the low-level OS filesystems for the actual disks.

Figure 2-1. The filesystem negotiating transparently where data is stored

You can use a filesystem that is already supplied by Hadoop: it ships with a list of filesystems,[37] which you may want to try out first. As a last resort—or if you’re an experienced developer—you can also write your own filesystem implementation.

Local

The local filesystem actually bypasses Hadoop entirely, that is, you do not need to have an HDFS or any other cluster at all. It is handled all in the FileSystem class used by HBase to connect to the filesystem implementation. The supplied ChecksumFileSystem class is loaded by the client and uses local disk paths to store all the data.

The beauty of this approach is that HBase is unaware that it is not talking to a distributed filesystem on a remote or collocated cluster, but is actually using the local filesystem directly. The standalone mode of HBase uses this feature to run HBase without requiring an HDFS instance. You can select it by using the following scheme:

file:///<path>

Similar to the URIs used in a web browser, the file: scheme addresses local files.

HDFS

The Hadoop Distributed File System (HDFS) is the default filesystem when deploying a fully distributed cluster. For HBase, HDFS is the filesystem of choice, as it has all the required features. As we discussed earlier, HDFS is built to work with MapReduce, taking full advantage of its parallel, streaming access support. The scalability, fail safety, and automatic replication functionality is ideal for storing files reliably. HBase adds the random access layer missing from HDFS and ideally complements Hadoop. Using MapReduce, you can do bulk imports, creating the storage files at disk-transfer speeds.

The URI to access HDFS uses the following scheme:

hdfs://<namenode>:<port>/<path>

S3

Amazon’s Simple Storage Service (S3)[38] is a storage system that is primarily used in combination with dynamic servers running on Amazon’s complementary service named Elastic Compute Cloud (EC2).[39]

S3 can be used directly and without EC2, but the bandwidth used to transfer data in and out of S3 is going to be cost-prohibitive in practice. Transferring between EC2 and S3 is free, and therefore a viable option. One way to start an EC2-based cluster is shown in Apache Whirr.

The S3 FileSystem implementation provided by Hadoop supports two different modes: the raw (or native) mode, and the block-based mode. The raw mode uses the s3n: URI scheme and writes the data directly into S3, similar to the local filesystem. You can see all the files in your bucket the same way as you would on your local disk.

The s3: scheme is the block-based mode and was used to overcome S3’s former maximum file size limit of 5 GB. This has since been changed, and therefore the selection is now more difficult—or easy: opt for s3n: if you are not going to exceed 5 GB per file.

The block mode emulates the HDFS filesystem on top of S3. It makes browsing the bucket content more difficult as only the internal block files are visible, and the HBase storage files are stored arbitrarily inside these blocks and strewn across them. You can select the filesystem using these URIs:

s3://<bucket-name>
s3n://<bucket-name>
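To point HBase at S3 you would set hbase.rootdir to one of these schemes and supply your AWS credentials through the standard Hadoop S3 properties, for example in hbase-site.xml. This is only a sketch; the bucket name and keys are placeholders:

<property>
  <name>hbase.rootdir</name>
  <value>s3n://<bucket-name>/hbase</value>
</property>
<property>
  <name>fs.s3n.awsAccessKeyId</name>
  <value>YOUR-ACCESS-KEY</value>
</property>
<property>
  <name>fs.s3n.awsSecretAccessKey</name>
  <value>YOUR-SECRET-KEY</value>
</property>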

Other Filesystems

There are other filesystems, and one that deserves mention is CloudStore (formerly known as the Kosmos filesystem, abbreviated as KFS and the namesake of the URI scheme shown at the end of the next paragraph). It is an open source, distributed, high-performance filesystem written in C++, with similar features to HDFS. Find more information about it at the CloudStore website.

It is available for Solaris and Linux, originally developed by Kosmix and released as open source in 2007. To select CloudStore as the filesystem for HBase use the following URI format:

kfs:///<path>

Installation Choices

Once you have decided on the basic OS-related options, you must somehow get HBase onto your servers. You have a couple of choices, which we will look into next. Also see Appendix D for even more options.

Apache Binary Release

The canonical installation process of most Apache projects is to download a release, usually provided as an archive containing all the required files. Some projects have separate archives for a binary and a source release—the former intended to have everything needed to run the release and the latter containing all files needed to build the project yourself. HBase comes as a single package, containing binary and source files together. For more information on HBase releases, you may also want to check out the Release Notes[40] page. Another interesting page is titled Change Log,[41] and it lists everything that was added, fixed, or changed in any form for each release version.

You can download the most recent release of HBase from the Apache HBase release page and unpack the contents into a suitable directory, such as /usr/local or /opt, like so:

$ cd /usr/local
$ tar -zxvf hbase-x.y.z.tar.gz

Once you have extracted all the files, you can make yourself familiar with what is in the project’s directory. The content may look like this:

$ ls -lr
-rw-r--r--   1 larsgeorge  staff   192809 Feb 15 01:54 CHANGES.txt
-rw-r--r--   1 larsgeorge  staff    11358 Feb  9 01:23 LICENSE.txt
-rw-r--r--   1 larsgeorge  staff      293 Feb  9 01:23 NOTICE.txt
-rw-r--r--   1 larsgeorge  staff     1358 Feb  9 01:23 README.txt
drwxr-xr-x  23 larsgeorge  staff      782 Feb  9 01:23 bin
drwxr-xr-x   7 larsgeorge  staff      238 Feb  9 01:23 conf
drwxr-xr-x  64 larsgeorge  staff     2176 Feb 15 01:56 docs
-rwxr-xr-x   1 larsgeorge  staff   905762 Feb 15 01:56 hbase-0.90.1-tests.jar
-rwxr-xr-x   1 larsgeorge  staff  2242043 Feb 15 01:56 hbase-0.90.1.jar
drwxr-xr-x   5 larsgeorge  staff      170 Feb 15 01:55 hbase-webapps
drwxr-xr-x  32 larsgeorge  staff     1088 Mar  3 12:07 lib
-rw-r--r--   1 larsgeorge  staff    29669 Feb 15 01:28 pom.xml
drwxr-xr-x   9 larsgeorge  staff      306 Feb  9 01:23 src

The root of it only contains a few text files, stating the license terms (LICENSE.txt and NOTICE.txt) and some general information on how to find your way around (README.txt). The CHANGES.txt file is a static snapshot of the change log page mentioned earlier. It contains all the changes that went into the current release you downloaded.

You will also find the Java archive, or JAR files, that contain the compiled Java code plus all other necessary resources. There are two variations of the JAR file, one with just the name and version number and one with a postfix of tests. This file contains the code required to run the tests provided by HBase. These are functional unit tests that the developers use to verify a release is fully operational and that there are no regressions.

The last file found is named pom.xml and is the Maven project file needed to build HBase from the sources. See Building from Source.

The remainder of the content in the root directory consists of other directories, which are explained in the following list:

bin

The bin—or binaries—directory contains the scripts supplied by HBase to start and stop HBase, run separate daemons,[42] or start additional master nodes. See Running and Confirming Your Installation for information on how to use them.

conf

The configuration directory contains the files that define how HBase is set up. Configuration explains the contained files in great detail.

docs

This directory contains a copy of the HBase project website, including the documentation for all the tools, the API, and the project itself. Open your web browser of choice and open the docs/index.html file by either dragging it into the browser, double-clicking that file, or using the File→Open (or similarly named) menu.

hbase-webapps

HBase has web-based user interfaces which are implemented as Java web applications, using the files located in this directory. Most likely you will never have to touch this directory when working with or deploying HBase into production.

lib

Java-based applications are usually an assembly of many auxiliary libraries plus the JAR file containing the actual program. All of these libraries are located in the lib directory.

logs

Since the HBase processes are started as daemons (i.e., they are running in the background of the operating system performing their duty), they use logfiles to report their state, progress, and optionally, errors that occur during their life cycle. Analyzing the Logs explains how to make sense of their rather cryptic content.

Note

Initially, there may be no logs directory, as it is created when you start HBase for the first time. The logging framework used by HBase is creating the directory and logfiles dynamically.

src

In case you plan to build your own binary package (see Building from Source for information on how to do that), or you decide you would like to join the international team of developers working on HBase, you will need this source directory, containing everything required to roll your own release.

Since you have unpacked a release archive, you can now move on to Run Modes to decide how you want to run HBase.

Building from Source

HBase uses Maven to build the binary packages. You therefore need a working Maven installation, plus a full Java Development Kit (JDK)—not just a Java Runtime as used in Quick-Start Guide.

Note

This section is important only if you want to build HBase from its sources. This might be necessary if you want to apply patches, which can add new functionality you may be requiring.
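Before moving on, you can quickly verify both prerequisites from the command line; this assumes mvn and javac are already on your PATH:

$ mvn -version
$ javac -version

The mvn -version output also shows which Java installation Maven is using. If javac cannot be found, you only have a JRE installed and need to install a full JDK first.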

Once you have confirmed that both are set up properly, you can build the binary packages using the following command:

$ mvn assembly:assembly

Note that the tests for HBase need more than one hour to complete. If you trust the code to be operational, or you are not willing to wait, you can also skip the test phase, adding a command-line switch like so:

$ mvn -DskipTests assembly:assembly

This process will take a few minutes to complete—and if you have not turned off the test phase, this goes into the tens of minutes—while creating a target directory in the HBase project home directory. Once the build completes with a Build Successful message, you can find the compiled and packaged tarball archive in the target directory. With that archive you can go back to Apache Binary Release and follow the steps outlined there to install your own, private release on your servers.

Run Modes

HBase has two run modes: standalone and distributed. Out of the box, HBase runs in standalone mode, as seen in Quick-Start Guide. To set up HBase in distributed mode, you will need to edit files in the HBase conf directory.

Whatever your mode, you may need to edit conf/hbase-env.sh to tell HBase which java to use. In this file, you set HBase environment variables such as the heap size and other options for the JVM, the preferred location for logfiles, and so on. Set JAVA_HOME to point at the root of your java installation.
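For example, the relevant line in conf/hbase-env.sh could look like the following (the path is only an illustration and must be adjusted to your local Java installation):

export JAVA_HOME=/usr/lib/jvm/java-6-sun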

Standalone Mode

This is the default mode, as described and used in Quick-Start Guide. In standalone mode, HBase does not use HDFS—it uses the local filesystem instead—and it runs all HBase daemons and a local ZooKeeper in the same JVM process. ZooKeeper binds to a well-known port so that clients may talk to HBase.

Distributed Mode

The distributed mode can be further subdivided into pseudodistributed—all daemons run on a single node—and fully distributed—where the daemons are spread across multiple, physical servers in the cluster.[43]

Distributed modes require an instance of the Hadoop Distributed File System (HDFS). See the Hadoop requirements and instructions for how to set up an HDFS. Before proceeding, ensure that you have an appropriate, working HDFS.

The following subsections describe the different distributed setups. Starting, verifying, and exploring of your install, whether a pseudodistributed or fully distributed configuration, is described in Running and Confirming Your Installation. The same verification script applies to both deploy types.

Pseudodistributed mode

A pseudodistributed mode is simply a distributed mode that is run on a single host. Use this configuration for testing and prototyping on HBase. Do not use this configuration for production or for evaluating HBase performance.

Once you have confirmed your HDFS setup, edit conf/hbase-site.xml. This is the file into which you add local customizations and overrides for the default HBase configuration values (see Appendix A for the full list, and HDFS-Related Configuration). Point HBase at the running Hadoop HDFS instance by setting the hbase.rootdir property. For example, adding the following properties to your hbase-site.xml file says that HBase should use the /hbase directory in the HDFS whose name node is at port 9000 on your local machine, and that it should run with one replica only (recommended for pseudodistributed mode):

<configuration>
  ...
  <property>
    <name>hbase.rootdir</name>
    <value>hdfs://localhost:9000/hbase</value>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
  ... 
</configuration>

Note

In the example configuration, the server binds to localhost. This means that a remote client cannot connect. Amend accordingly, if you want to connect from a remote location.

If all you want to try for now is the pseudodistributed mode, you can skip to Running and Confirming Your Installation for details on how to start and verify your setup. See Chapter 12 for information on how to start extra master and region servers when running in pseudodistributed mode.

Fully distributed mode

For running a fully distributed operation on more than one host, you need to use the following configurations. In hbase-site.xml, add the hbase.cluster.distributed property and set it to true, and point the HBase hbase.rootdir at the appropriate HDFS name node and location in HDFS where you would like HBase to write data. For example, if your name node is running at a server with the hostname namenode.foo.com on port 9000 and you want to home your HBase in HDFS at /hbase, use the following configuration:

<configuration>
  ...
  <property>
    <name>hbase.rootdir</name>
    <value>hdfs://namenode.foo.com:9000/hbase</value>
  </property>
  <property>
    <name>hbase.cluster.distributed</name>
    <value>true</value>
  </property>
  ...
</configuration>

Specifying region servers

In addition, a fully distributed mode requires that you modify the conf/regionservers file. It lists all the hosts on which you want to run HRegionServer daemons. Specify one host per line (this file in HBase is like the Hadoop slaves file). All servers listed in this file will be started and stopped when the HBase cluster start or stop scripts are run.

ZooKeeper setup

A distributed HBase depends on a running ZooKeeper cluster. All participating nodes and clients need to be able to access the running ZooKeeper ensemble. HBase, by default, manages a ZooKeeper cluster (which can be as low as a single node) for you. It will start and stop the ZooKeeper ensemble as part of the HBase start and stop process. You can also manage the ZooKeeper ensemble independent of HBase and just point HBase at the cluster it should use. To toggle HBase management of ZooKeeper, use the HBASE_MANAGES_ZK variable in conf/hbase-env.sh. This variable, which defaults to true, tells HBase whether to start and stop the ZooKeeper ensemble servers as part of the start and stop commands supplied by HBase.

When HBase manages the ZooKeeper ensemble, you can specify the ZooKeeper configuration using its native zoo.cfg file, or just specify the ZooKeeper options directly in conf/hbase-site.xml. You can set a ZooKeeper configuration option as a property in the HBase hbase-site.xml XML configuration file by prefixing the ZooKeeper option name with hbase.zookeeper.property. For example, you can change the clientPort setting in ZooKeeper by setting the hbase.zookeeper.property.clientPort property. For all default values used by HBase, including ZooKeeper configuration, see Appendix A. Look for the hbase.zookeeper.property prefix.[44]

If you are using the hbase-site.xml approach to specify all ZooKeeper settings, you must at least set the ensemble servers with the hbase.zookeeper.quorum property. It otherwise defaults to a single ensemble member at localhost, which is not suitable for a fully distributed HBase (it binds to the local machine only and remote clients will not be able to connect).

For example, in order to have HBase manage a ZooKeeper quorum on nodes rs{1,2,3,4,5}.foo.com, bound to port 2222 (the default is 2181), you must ensure that HBASE_MANAGES_ZK is commented out or set to true in conf/hbase-env.sh and then edit conf/hbase-site.xml and set hbase.zookeeper.property.clientPort and hbase.zookeeper.quorum. You should also set hbase.zookeeper.property.dataDir to something other than the default, as the default has ZooKeeper persist data under /tmp, which is often cleared on system restart. In the following example, we have ZooKeeper persist to /var/zookeeper:

<configuration>
  ...
  <property>
    <name>hbase.zookeeper.property.clientPort</name>
    <value>2222</value>
  </property>
  <property>
    <name>hbase.zookeeper.quorum</name>
    <value>rs1.foo.com,rs2.foo.com,rs3.foo.com,rs4.foo.com,rs5.foo.com</value>
  </property>
  <property>
    <name>hbase.zookeeper.property.dataDir</name>
    <value>/var/zookeeper</value>
  </property>
  ...
</configuration>

Using an existing ZooKeeper ensemble

To point HBase at an existing ZooKeeper cluster, one that is not managed by HBase, set HBASE_MANAGES_ZK in conf/hbase-env.sh to false:

...
# Tell HBase whether it should manage it's own instance of Zookeeper or not.
export HBASE_MANAGES_ZK=false

Next, set the ensemble locations and client port, if nonstandard, in hbase-site.xml, or add a suitably configured zoo.cfg to HBase’s CLASSPATH. HBase will prefer the configuration found in zoo.cfg over any settings in hbase-site.xml.
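If you keep the ZooKeeper settings in a separate zoo.cfg, one way to make it visible to HBase is to add the directory containing the file to the HBASE_CLASSPATH variable in conf/hbase-env.sh. The directory shown here is only an example:

# Extra Java CLASSPATH elements: point HBase at the directory holding zoo.cfg
export HBASE_CLASSPATH=/etc/zookeeper/conf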

When HBase manages ZooKeeper, it will start/stop the ZooKeeper servers as a part of the regular start/stop scripts. If you would like to run ZooKeeper yourself, independent of HBase start/stop, do the following:

${HBASE_HOME}/bin/hbase-daemons.sh {start,stop} zookeeper

Note that you can use HBase in this manner to spin up a ZooKeeper cluster, unrelated to HBase. Just make sure to set HBASE_MANAGES_ZK to false if you want it to stay up across HBase restarts so that when HBase shuts down, it doesn’t take ZooKeeper down with it.

For more information about running a distinct ZooKeeper cluster, see the ZooKeeper Getting Started Guide. Additionally, see the ZooKeeper wiki, or the ZooKeeper documentation for more information on ZooKeeper sizing.

Configuration

Now that the basics are out of the way (we’ve looked at all the choices when it comes to selecting the filesystem, discussed the run modes, and fine-tuned the operating system parameters), we can look at how to configure HBase itself. Similar to Hadoop, all configuration parameters are stored in files located in the conf directory. These are simple text files either in XML format arranged as a set of properties, or in simple flat files listing one option per line.

Note

For more details on how to modify your configuration files for specific workloads, refer to Configuration.

Configuring an HBase setup entails editing a file with environment variables, named conf/hbase-env.sh, which is used mostly by the shell scripts (see Operating a Cluster) to start or stop a cluster. You also need to add configuration properties to an XML file[45] named conf/hbase-site.xml to, for example, override HBase defaults, tell HBase what filesystem to use, and tell HBase the location of the ZooKeeper ensemble.

When running in distributed mode, after you make an edit to an HBase configuration file, make sure you copy the content of the conf directory to all nodes of the cluster. HBase will not do this for you.

Note

There are many ways to synchronize your configuration files across your cluster. The easiest is to use a tool like rsync. There are many more elaborate ways, and you will see a selection in Deployment.

hbase-site.xml and hbase-default.xml

Just as in Hadoop where you add site-specific HDFS configurations to the hdfs-site.xml file, for HBase, site-specific customizations go into the file conf/hbase-site.xml. For the list of configurable properties, see Appendix A, or view the raw hbase-default.xml source file in the HBase source code at src/main/resources. The doc directory also has a static HTML page that lists the configuration options.

Note

Not all configuration options make it out to hbase-default.xml. Configurations that users would rarely change can exist only in code; the only way to turn up such configurations is to read the source code itself.

The servers always read the hbase-default.xml file first and subsequently merge it with the hbase-site.xml file content—if present. The properties set in hbase-site.xml always take precedence over the default values loaded from hbase-default.xml.

Any modifications in your site file require a cluster restart for HBase to notice the changes.

hbase-env.sh

You set HBase environment variables in this file. Examples include options to pass to the JVM when an HBase daemon starts, such as Java heap size and garbage collector configurations. You also set options for HBase configuration, log directories, niceness, SSH options, where to locate process pid files, and so on. Open the file at conf/hbase-env.sh and peruse its content. Each option is fairly well documented. Add your own environment variables here if you want them read when an HBase daemon is started.

Changes here will require a cluster restart for HBase to notice the change.[46]

regionservers

This file lists all the known region server names. It is a flat text file that has one hostname per line. The list is used by the HBase maintenance scripts to iterate over all the servers and start the region server process on each.

Note

If you used previous versions of HBase, you may miss the masters file, available in the 0.20.x line. It has been removed as it is no longer needed. The list of masters is now dynamically maintained in ZooKeeper and each master registers itself when started.

log4j.properties

Edit this file to change the rate at which HBase files are rolled and to change the level at which HBase logs messages. Changes here will require a cluster restart for HBase to notice the change, though log levels can be changed for particular daemons via the HBase UI. See Changing Logging Levels for information on this topic, and Analyzing the Logs for details on how to use the logfiles to find and solve problems.
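As a simple illustration, raising the log level for all HBase classes to DEBUG could be done with a line like the following in conf/log4j.properties (treat this as a sketch; check the comments in the shipped file for the exact logger names used by your release):

# Example only: emit DEBUG-level messages for all HBase classes
log4j.logger.org.apache.hadoop.hbase=DEBUG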

Example Configuration

Here is an example configuration for a distributed 10-node cluster. The nodes are named master.foo.com, host1.foo.com, and so on, through node host9.foo.com. The HBase Master and the HDFS name node are running on the node master.foo.com. Region servers run on nodes host1.foo.com to host9.foo.com. A three-node ZooKeeper ensemble runs on zk1.foo.com, zk2.foo.com, and zk3.foo.com on the default ports. ZooKeeper data is persisted to the directory /var/zookeeper. The following subsections show what the main configuration files—hbase-site.xml, regionservers, and hbase-env.sh—found in the HBase conf directory might look like.

hbase-site.xml

The hbase-site.xml file contains the essential configuration properties, defining the HBase cluster setup.

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <property>
    <name>hbase.zookeeper.quorum</name>
    <value>zk1.foo.com,zk2.foo.com,zk3.foo.com</value>
  </property>
  <property>
    <name>hbase.zookeeper.property.dataDir</name>
    <value>/var/zookeeper</value>
  </property>
  <property>
    <name>hbase.rootdir</name>
    <value>hdfs://master.foo.com:9000/hbase</value>
  </property>
  <property>
    <name>hbase.cluster.distributed</name>
    <value>true</value>
  </property>
</configuration>

regionservers

In this file, you list the nodes that will run region servers. In our example, we run region servers on all but the head node master.foo.com, which hosts the HBase Master and the HDFS name node.

host1.foo.com
host2.foo.com
host3.foo.com
host4.foo.com
host5.foo.com
host6.foo.com
host7.foo.com
host8.foo.com
host9.foo.com

hbase-env.sh

Here are the lines that were changed from the default in the supplied hbase-env.sh file; in this case, we set the HBase heap to 4 GB instead of the default 1 GB:

...
# export HBASE_HEAPSIZE=1000
export HBASE_HEAPSIZE=4096
...

Once you have edited the configuration files, you need to distribute them across all servers in the cluster. One option to copy the content of the conf directory to all servers in the cluster is to use the rsync command on Unix and Unix-like platforms. This approach and others are explained in Deployment.
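A single, hedged rsync invocation, using one of the example region server hosts and the install location assumed by the deployment scripts shown later, could look like this:

$ rsync -avz --delete /usr/local/hbase/conf/ host1.foo.com:/usr/local/hbase/conf/

Repeat the command, or wrap it in a loop over the regionservers file, for every other node in the cluster.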

Note

Configuration discusses the settings you are most likely to change first when you start scaling your cluster.

Client Configuration

Since the HBase Master may move around between physical machines (see Adding a backup master for details), clients start by requesting the vital information from ZooKeeper—something visualized in Region Lookups. For that reason, clients require the ZooKeeper quorum information in an hbase-site.xml file that is on their Java CLASSPATH.

Note

You can also set the hbase.zookeeper.quorum configuration key in your code. Doing so would lead to clients that need no external configuration files. This is explained in Put Method.

If you are configuring an IDE to run an HBase client, you could include the conf/ directory on your classpath. That would make the configuration files discoverable by the client code.

Minimally, a Java client needs the following JAR files specified in its CLASSPATH when connecting to HBase: hbase, hadoop-core, zookeeper, log4j, commons-logging, and commons-lang. All of these JAR files come with HBase and are usually postfixed with the version number of the required release. Ideally, you use the supplied JARs and do not acquire them elsewhere, because even minor release changes could cause problems when running the client against a remote HBase cluster.
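For example, launching a client class from the command line might look like the following sketch. The class name MyHBaseClient is purely hypothetical, and the JAR names and versions must match the ones shipped with your installation:

$ java -cp "$HBASE_HOME/conf:$HBASE_HOME/hbase-0.90.1.jar:$HBASE_HOME/lib/*" \
    MyHBaseClient

Putting the conf directory on the classpath ensures that the client picks up your hbase-site.xml, as discussed above.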

A basic example hbase-site.xml file for client applications might contain the following properties:

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <property>
    <name>hbase.zookeeper.quorum</name>
    <value>zk1.foo.com,zk2.foo.com,zk3.foo.com</value>
  </property>
</configuration>
      

Deployment

After you have configured HBase, the next thing you need to do is to think about deploying it on your cluster. There are many ways to do that, and since Hadoop and HBase are written in Java, there are only a few necessary requirements to look out for. You can simply copy all the files from server to server, since they usually share the same configuration. Here are some ideas on how to do that. Please note that you would need to make sure that all the suggested selections and adjustments discussed in Requirements have been applied—or are applied at the same time when provisioning new servers.

Script-Based

Using a script-based approach seems archaic compared to the more advanced approaches listed shortly. But scripts serve their purpose and do a good job for small to medium-size clusters. It is not so much the size of the cluster that matters as the number of people maintaining it. In a larger operations group, you want repeatable deployment procedures, rather than relying on someone running scripts by hand to update the cluster.

The scripts make use of the fact that the regionservers configuration file has a list of all servers in the cluster. Example 2-2 shows a very simple script that could be used to copy a new release of HBase from the master node to all slave nodes.

Example 2-2. Example Script to copy the HBase files across a cluster
#!/bin/bash
# Rsyncs HBase files across all slaves. Must run on master. Assumes
# all files are located in /usr/local

if [ "$#" != "2" ]; then
  echo "usage: $(basename $0) <dir-name> <ln-name>"
  echo "	example: $(basename $0) hbase-0.1 hbase"
  exit 1
fi

SRC_PATH="/usr/local/$1/conf/regionservers"
for srv in $(cat $SRC_PATH); do
  echo "Sending command to $srv..."; 
  rsync -vaz --exclude='logs/*' /usr/local/$1 $srv:/usr/local/
  ssh $srv "rm -fR /usr/local/$2 ; ln -s /usr/local/$1 /usr/local/$2"
done

echo "done."

Another simple script is shown in Example 2-3; it can be used to copy the configuration files of HBase from the master node to all slave nodes. It assumes you are editing the configuration files on the master in such a way that they can be copied verbatim to all region servers.

Example 2-3. Example Script to copy configurations across a cluster
#!/bin/bash
# Rsyncs HBase config files across all region servers. Must run on master.

for srv in $(cat /usr/local/hbase/conf/regionservers); do
  echo "Sending command to $srv..."; 
  rsync -vaz --delete --exclude='logs/*' /usr/local/hadoop/ $srv:/usr/local/hadoop/
  rsync -vaz --delete --exclude='logs/*' /usr/local/hbase/ $srv:/usr/local/hbase/
done

echo "done."

The second script uses rsync just like the first script, but adds the --delete option to make sure the region servers do not have any older files remaining but have an exact copy of what is on the originating server.

There are obviously many ways to do this, and the preceding examples are simply for your perusal and to get you started. Ask your administrator to help you set up mechanisms to synchronize the configuration files appropriately. Many beginners in HBase have run into a problem that was ultimately caused by inconsistent configurations among the cluster nodes. Also, do not forget to restart the servers when making changes. If you want to update settings while the cluster is in production, please refer to Rolling Restarts.

Apache Whirr

Recently, we have seen an increase in the number of users who want to run their clusters in dynamic environments, such as the public cloud offerings Amazon EC2 and Rackspace Cloud Servers, as well as in private server farms, using open source tools like Eucalyptus.

The advantage is to be able to quickly provision servers and run analytical workloads and, once the result has been retrieved, to simply shut down the entire cluster, or reuse the servers for other dynamic loads. Since it is not trivial to program against each of the APIs providing dynamic cluster infrastructures, it would be useful to abstract the provisioning part and, once the cluster is operational, simply launch the MapReduce jobs the same way you would on a local, static cluster. This is where Apache Whirr comes in.

Whirr—available at http://incubator.apache.org/whirr/[47]—has support for a variety of public and private cloud APIs and allows you to provision clusters running a range of services. One of those is HBase, giving you the ability to quickly deploy a fully operational HBase cluster on dynamic setups.

You can download the latest Whirr release from the aforementioned site and find preconfigured configuration files in the recipes directory. Use them as a starting point to deploy your own dynamic clusters.

The basic concept of Whirr is to use very simple machine images that already provide the operating system (see Operating system) and SSH access. The rest is handled by Whirr using services that represent, for example, Hadoop or HBase. Each service executes every required step on each remote server to set up the user accounts, download and install the required software packages, write out configuration files for them, and so on. This is all highly customizable and you can add extra steps as needed.

Puppet and Chef

Similar to Whirr, there are other deployment frameworks for dedicated machines. Puppet by Puppet Labs and Chef by Opscode are two such offerings.

Both work similarly to Whirr in that they have a central provisioning server that stores all the configurations, combined with client software running on each server that communicates with the central server to receive updates and apply them locally.

Also similar to Whirr, both have the notion of recipes, which essentially translate to scripts or commands executed on each node.[48] In fact, it is quite possible to replace the scripting employed by Whirr with a Puppet- or Chef-based process.

While Whirr solely handles the bootstrapping, Puppet and Chef have further support for changing running clusters. Their master process monitors the configuration repository and, upon updates, triggers the appropriate remote action. This can be used to reconfigure clusters on-the-fly or push out new releases, do rolling restarts, and so on. It can be summarized as configuration management, rather than just provisioning.

Note

You heard it before: select an approach you like and maybe even are familiar with already. In the end, they achieve the same goal: installing everything you need on your cluster nodes. If you need a full configuration management solution with live updates, a Puppet- or Chef-based approach—maybe in combination with Whirr for the server provisioning—is the right choice.

Operating a Cluster

Now that you have set up the servers, configured the operating system and filesystem, and edited the configuration files, you are ready to start your HBase cluster for the first time.

Running and Confirming Your Installation

Make sure HDFS is running first. Start the Hadoop HDFS daemons by running bin/start-dfs.sh in the HADOOP_HOME directory. You can ensure that HDFS started properly by testing the put and get of files into the Hadoop filesystem. HBase does not normally use the MapReduce daemons; you only need to start them for actual MapReduce jobs, something we will look into in detail in Chapter 7.
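A minimal smoke test, assuming the hadoop command is on your PATH, is to round-trip a small file through HDFS:

$ echo "hello hdfs" > /tmp/hello.txt
$ hadoop fs -put /tmp/hello.txt /hello.txt
$ hadoop fs -cat /hello.txt
hello hdfs
$ hadoop fs -rm /hello.txt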

If you are managing your own ZooKeeper, start it and confirm that it is running; otherwise, HBase will start up ZooKeeper for you as part of its start process.
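If you do run your own ensemble, a quick way to confirm that a ZooKeeper server is responding, assuming nc is available and the server uses the default client port, is the ruok four-letter command; a healthy server answers with imok:

$ echo ruok | nc zk1.foo.com 2181
imok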

Just as you started the standalone mode in Quick-Start Guide, you start a fully distributed HBase with the following command:

$ bin/start-hbase.sh

Run the preceding command from the HBASE_HOME directory. You should now have a running HBase instance. The HBase logfiles can be found in the logs subdirectory. If you find that HBase is not working as expected, please refer to Analyzing the Logs for help finding the problem.

Once HBase has started, see Quick-Start Guide for information on how to create tables, add data, scan your insertions, and finally, disable and drop your tables.

Web-based UI Introduction

HBase also starts a web-based user interface (UI) listing vital attributes. By default, it is deployed on the master host at port 60010 (HBase region servers use 60030 by default). If the master is running on a host named master.foo.com on the default port, to see the master’s home page you can point your browser at http://master.foo.com:60010. Figure 2-2 is an example of how the resultant page should look. You can find a more detailed explanation in Web-based UI.

Figure 2-2. The HBase Master user interface

From this page you can access a variety of status information about your HBase cluster. The page is separated into multiple sections. The top part has the attributes pertaining to the cluster setup. You can see the currently running tasks—if there are any. The catalog and user tables list details about the available tables. For the user table you also see the table schema.

The lower part of the page has the region servers table, giving you access to all the currently registered servers. Finally, the regions in transition list informs you about regions that are currently being maintained by the system.

After you have started the cluster, you should verify that all the region servers have registered themselves with the master and appear in the appropriate table with the expected hostnames (that a client can connect to). Also verify that you are indeed running the correct version of HBase and Hadoop.

Shell Introduction

You already used the command-line shell that comes with HBase when you went through Quick-Start Guide. You saw how to create a table, add and retrieve data, and eventually drop the table.

The HBase Shell is (J)Ruby’s IRB with some HBase-related commands added. Anything you can do in IRB, you should be able to do in the HBase Shell. You can start the shell with the following command:

$ $HBASE_HOME/bin/hbase shell
HBase Shell; enter 'help<RETURN>' for list of supported commands.
Type "exit<RETURN>" to leave the HBase Shell
Version 0.91.0-SNAPSHOT, r1130916, Sat Jul 23 12:44:34 CEST 2011

hbase(main):001:0>

Type help and then press Return to see a listing of shell commands and options. Browse at least the paragraphs at the end of the help text for the gist of how variables and command arguments are entered into the HBase Shell; in particular, note how table names, rows, and columns must be quoted. Find the full description of the shell in Shell.

Since the shell is JRuby-based, you can mix Ruby with HBase commands, which enables you to do things like this:

hbase(main):001:0> create 'testtable', 'colfam1'
hbase(main):002:0> for i in 'a'..'z' do for j in 'a'..'z' do 
put 'testtable', "row-#{i}#{j}", "colfam1:#{j}", "#{j}" end end

The first command creates a new table named testtable, with one column family called colfam1, using default values (see Column Families for what that means). The second command uses a Ruby loop to create rows with columns in the newly created table. It creates row keys starting with row-aa, row-ab, all the way to row-zz.
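Since the two nested loops iterate over 26 letters each, the table should end up with 26 × 26 = 676 rows. You can verify this with the shell's count command, which should report 676 row(s):

hbase(main):003:0> count 'testtable'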

Stopping the Cluster

To stop HBase, enter the following command. Once you have started the script, you will see a message stating that the cluster is being stopped, followed by "." (period) characters printed at regular intervals (just to indicate that the process is still running, not to give you any percentage feedback or other hidden meaning):

$ ./bin/stop-hbase.sh
stopping hbase...............

Shutdown can take several minutes to complete. It can take longer if your cluster is composed of many machines. If you are running a distributed operation, be sure to wait until HBase has shut down completely before stopping the Hadoop daemons.
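For a distributed setup backed by HDFS, the overall shutdown order could therefore look like the following sketch, stopping HBase first and the HDFS daemons afterward:

$ ${HBASE_HOME}/bin/stop-hbase.sh
$ ${HADOOP_HOME}/bin/stop-dfs.sh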

Chapter 12 has more on advanced administration tasks—for example, how to do a rolling restart, add extra master nodes, and more. It also has information on how to analyze and fix problems when the cluster does not start, or shut down.



[27] See “Multi-core processor” on Wikipedia.

[28] See “RAID” on Wikipedia.

[29] See “JBOD” on Wikipedia.

[30] This assumes 100 IOPS per drive, and 100 MB/second per drive.

[31] DistroWatch has a list of popular Linux and Unix-like operating systems and maintains a ranking by popularity.

[32] See this post on the Ars Technica website. Google hired the main developer of ext4, Theodore Ts’o, who announced plans to keep working on ext4 as well as other Linux kernel features.

[33] See CHANGES.txt in branch-0.20-append to see a list of patches involved in adding append on the Hadoop 0.20 branch.

[34] This is very likely to change after this book is printed. Consult with the online configuration guide for the latest details; especially the section on Hadoop.

[35] A useful document on setting configuration values on your Hadoop cluster is Aaron Kimball’s “Configuration Parameters: What can you just ignore?”.

[37] A full list was compiled by Tom White in his post “Get to Know Hadoop Filesystems”.

[38] See “Amazon S3” for more background information.

[39] See “EC2” on Wikipedia.

[42] Processes that are started and then run in the background to perform their task are often referred to as daemons.

[43] The pseudodistributed versus fully distributed nomenclature comes from Hadoop.

[44] For the full list of ZooKeeper configurations, see ZooKeeper’s zoo.cfg. HBase does not ship with that file, so you will need to browse the conf directory in an appropriate ZooKeeper download.

[45] Be careful when editing XML. Make sure you close all elements. Check your file using a tool like xmlint, or something similar, to ensure well-formedness of your document after an edit session.

[46] As of this writing, you have to restart the server. However, work is being done to enable online schema and configuration changes, so this will change over time.

[47] Please note that Whirr is still part of the incubator program of the Apache Software Foundation. Once it is accepted and promoted to a full member, its URL is going to change to a permanent place.

[48] Some of the available recipe packages are an adaptation of early EC2 scripts, used to deploy HBase to dynamic, cloud-based servers. For Chef, you can find HBase-related examples at http://cookbooks.opscode.com/cookbooks/hbase. For Puppet, please refer to http://hstack.org/hstack-automated-deployment-using-puppet/ and the repository with the recipes at http://github.com/hstack/puppet.
