In this chapter, we will look at how HBase is installed and initially configured. We will see how HBase can be used from the command line for basic operations, such as adding, retrieving, and deleting data.
All of the following assumes you have the Java Runtime Environment (JRE) installed. Both Hadoop and HBase require at least version 1.6 (also called Java 6), and the recommended choice is the one provided by Oracle (formerly by Sun), which can be found at http://www.java.com/download/. If you do not have Java already or are running into issues using it, please see the Java section later in this chapter.
Let us get started with the “tl;dr” section of this book: you want to know how to run HBase and you want to know it now! Nothing is easier than that because all you have to do is download the most recent release of HBase from the Apache HBase release page and unpack the contents into a suitable directory, such as /usr/local or /opt, like so:
$ cd /usr/local
$ tar -zxvf hbase-x.y.z.tar.gz
With that in place, we can start HBase and try our first interaction with it. We will use the interactive shell to enter the status command at the prompt (complete the command by pressing the Return key):
$ cd /usr/local/hbase-0.91.0-SNAPSHOT
$ bin/start-hbase.sh
starting master, logging to /usr/local/hbase-0.91.0-SNAPSHOT/bin/../logs/hbase-<username>-master-localhost.out
$ bin/hbase shell
HBase Shell; enter 'help<RETURN>' for list of supported commands.
Type "exit<RETURN>" to leave the HBase Shell
Version 0.91.0-SNAPSHOT, r1130916, Sat Jul 23 12:44:34 CEST 2011

hbase(main):001:0> status
1 servers, 0 dead, 2.0000 average load
This confirms that HBase is up and running, so we will now issue a few commands to show that we can put data into it and retrieve the same data subsequently.
It may not be clear, but what we are doing right now is similar to sitting in a car with its brakes engaged and in neutral while turning the ignition key. There is much more that you need to configure and understand before you can use HBase in a production-like environment. But it lets you get started with some basic HBase commands and become familiar with top-level concepts.
We are currently running in the so-called Standalone Mode. We will look into the available modes later on (see Run Modes), but for now it’s important to know that in this mode everything is run in a single Java process and all files are stored in /tmp by default—unless you did heed the important advice given earlier to change it to something different. Many people have lost their test data during a reboot, only to learn that they kept the default path. Once it is deleted by the OS, there is no going back!
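To heed that advice in standalone mode, you can point HBase at a durable location by setting the hbase.rootdir property in conf/hbase-site.xml. The following is a minimal sketch; the path /data/hbase is an assumed example, not a prescribed location:

```xml
<?xml version="1.0"?>
<configuration>
  <!-- Store HBase data in a durable location instead of the /tmp default -->
  <property>
    <name>hbase.rootdir</name>
    <value>file:///data/hbase</value>
  </property>
</configuration>
```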
Let us now create a simple table and add a few rows with some data:
hbase(main):002:0> create 'testtable', 'colfam1'
0 row(s) in 0.2930 seconds

hbase(main):003:0> list 'testtable'
TABLE
testtable
1 row(s) in 0.0520 seconds

hbase(main):004:0> put 'testtable', 'myrow-1', 'colfam1:q1', 'value-1'
0 row(s) in 0.1020 seconds

hbase(main):005:0> put 'testtable', 'myrow-2', 'colfam1:q2', 'value-2'
0 row(s) in 0.0410 seconds

hbase(main):006:0> put 'testtable', 'myrow-2', 'colfam1:q3', 'value-3'
0 row(s) in 0.0380 seconds
After we create the table with one column family, we verify that it actually exists by issuing a list command. You can see how it outputs the testtable name as the only table currently known. Subsequently, we put data into a number of rows. If you read the example carefully, you can see that we are adding data to two different rows with the keys myrow-1 and myrow-2. As we discussed in Chapter 1, we have one column family named colfam1, and can add an arbitrary qualifier to form actual columns, here colfam1:q1, colfam1:q2, and colfam1:q3.
Next we want to check if the data we added can be retrieved. We use a scan operation to do so:
hbase(main):007:0> scan 'testtable'
ROW                COLUMN+CELL
 myrow-1           column=colfam1:q1, timestamp=1297345476469, value=value-1
 myrow-2           column=colfam1:q2, timestamp=1297345495663, value=value-2
 myrow-2           column=colfam1:q3, timestamp=1297345508999, value=value-3
2 row(s) in 0.1100 seconds
You can observe how HBase prints the data in a cell-oriented way by outputting each column separately. It prints out myrow-2 twice, as expected, and shows the actual value for each column next to it.
If we want to get exactly one row back, we can also use the get command. It has many more options, which we will look at later, but for now simply try the following:
hbase(main):008:0> get 'testtable', 'myrow-1'
COLUMN             CELL
 colfam1:q1        timestamp=1297345476469, value=value-1
1 row(s) in 0.0480 seconds
What is missing in our basic set of operations is to delete a value. Again, delete offers many options, but for now we just delete one specific cell and check that it is gone:
hbase(main):009:0> delete 'testtable', 'myrow-2', 'colfam1:q2'
0 row(s) in 0.0390 seconds

hbase(main):010:0> scan 'testtable'
ROW                COLUMN+CELL
 myrow-1           column=colfam1:q1, timestamp=1297345476469, value=value-1
 myrow-2           column=colfam1:q3, timestamp=1297345508999, value=value-3
2 row(s) in 0.0620 seconds
Before we conclude this simple exercise, we have to clean up by first disabling and then dropping the test table:
hbase(main):011:0> disable 'testtable'
0 row(s) in 2.1250 seconds

hbase(main):012:0> drop 'testtable'
0 row(s) in 1.2780 seconds
Finally, we close the shell by means of the exit command and return to our command-line prompt:
hbase(main):013:0> exit
$ _
The last thing to do is stop HBase on our local system. We do this by running the stop-hbase.sh script:
$ bin/stop-hbase.sh
stopping hbase.....
That is all there is to it. We have successfully created a table, added, retrieved, and deleted data, and eventually dropped the table using the HBase Shell.
Not all of the following requirements apply to every run mode HBase supports. For purely local testing, you only need Java, as mentioned in Quick-Start Guide.
It is difficult to specify a particular server type that is recommended for HBase. In fact, the opposite is more appropriate, as HBase runs on many, very different hardware configurations. The usual description is commodity hardware. But what does that mean?
For starters, we are not talking about desktop PCs, but server-grade machines. Given that HBase is written in Java, you at least need support for a current Java Runtime, and since the majority of the memory needed per region server is for internal structures—for example, the memstores and the block cache—you will have to install a 64-bit operating system to be able to address enough memory, that is, more than 4 GB.
In practice, a lot of HBase setups are collocated with Hadoop, to make use of locality using HDFS as well as MapReduce. This can significantly reduce the required network I/O and boost processing speeds. Running Hadoop and HBase on the same server results in at least three Java processes running (data node, task tracker, and region server) and may spike to much higher numbers when executing MapReduce jobs. All of these processes need a minimum amount of memory, disk, and CPU resources to run sufficiently.
It is assumed that you have a reasonably good understanding of Hadoop, since it is used as the backing store for HBase in all known production systems (as of this writing). If you are completely new to HBase and Hadoop, it is recommended that you get familiar with Hadoop first, even on a very basic level. For example, read the recommended Hadoop: The Definitive Guide (Second Edition) by Tom White (O’Reilly), and set up a working HDFS and MapReduce cluster.
Giving all the available memory to the Java processes is also not a good idea, as most operating systems need some spare resources to work more effectively—for example, disk I/O buffers maintained by Linux kernels. HBase indirectly takes advantage of this because the already local disk I/O, given that you collocate the systems on the same server, will perform even better when the OS can keep its own block cache.
We can separate the requirements into two categories: servers and networking. We will look at the server hardware first and then into the requirements for the networking setup subsequently.
In HBase and Hadoop there are two types of machines: masters (the HDFS NameNode, the MapReduce JobTracker, and the HBase Master) and slaves (the HDFS DataNodes, the MapReduce TaskTrackers, and the HBase RegionServers). They do benefit from slightly different hardware specifications when possible. It is also quite common to use exactly the same hardware for both (out of convenience), but the master does not need that much storage, so it makes sense to not add too many disks. And since the masters are also more important than the slaves, you could beef them up with redundant hardware components. We will address the differences between the two where necessary.
Since Java runs in user land, you can run it on top of every operating system that supports a Java Runtime—though there are recommended ones, and those where it does not run without user intervention (more on this in Operating system). It allows you to select from a wide variety of vendors, or even build your own hardware. It comes down to more generic requirements like the following:
It makes no sense to run three or more Java processes, plus the services provided by the operating system itself, on single-core CPU machines. For production use, it is typical that you use multicore processors.[27] Quad-core processors are state of the art and affordable, while hexa-core processors are also becoming more popular. Most server hardware supports more than one CPU so that you can use two quad-core CPUs for a total of eight cores. This allows each basic Java process to run on its own core while background tasks like Java garbage collection execute in parallel. In addition, there is hyperthreading, which adds to the overall performance.
As far as CPU is concerned, you should spec the master and slave machines the same.
The question really is: is there too much memory? In theory, no, but in practice, it has been empirically determined that when using Java you should not set the amount of memory given to a single process too high. Memory (called heap in Java terms) can start to get fragmented, and in a worst-case scenario, the entire heap would need rewriting—this is similar to the well-known disk fragmentation, but it cannot run in the background. The Java Runtime pauses all processing to clean up the mess, which can lead to quite a few problems (more on this later). The larger you have set the heap, the longer this process will take. Processes that do not need a lot of memory should only be given their required amount to avoid this scenario, but with the region servers and their block cache there is, in theory, no upper limit. You need to find a sweet spot depending on your access pattern.
At the time of this writing, setting the heap of the region servers to larger than 16 GB is considered dangerous. Once a stop-the-world garbage collection is required, it simply takes too long to rewrite the fragmented heap. Your server could be considered dead by the master and be removed from the working set.
This may change sometime as this is ultimately bound to the Java Runtime Environment used, and there is development going on to implement JREs that do not stop the running Java processes when performing garbage collections.
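In practice, the region server heap is set via the HBASE_HEAPSIZE variable in conf/hbase-env.sh (the value is in megabytes). Here is a hedged sketch; 8000 is an illustrative value staying well below the 16 GB danger zone, not a recommendation for your workload:

```shell
# conf/hbase-env.sh
# Give the HBase JVM 8 GB of heap (value is in MB); tune to your access pattern
export HBASE_HEAPSIZE=8000
```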
Table 2-1 shows a very basic distribution of memory to specific processes. Please note that this is an example only and highly depends on the size of your cluster and how much data you put in, but also on your access pattern, such as interactive access only or a combination of interactive and batch use (using MapReduce).
| Process | Heap | Description |
| --- | --- | --- |
| NameNode | 8 GB | About 1 GB of heap for every 100 TB of raw data stored, or per every million files/inodes |
| SecondaryNameNode | 8 GB | Applies the edits in memory, and therefore needs about the same amount as the NameNode |
| JobTracker | 2 GB | Moderate requirements |
| HBase Master | 4 GB | Usually lightly loaded, moderate requirements only |
| DataNode | 1 GB | Moderate requirements |
| TaskTracker | 1 GB | Moderate requirements |
| HBase RegionServer | 12 GB | Majority of available memory, while leaving enough room for the operating system (for the buffer cache), and for the Task Attempt processes |
| Task Attempts | 1 GB (ea.) | Multiply by the maximum number you allow for each |
| ZooKeeper | 1 GB | Moderate requirements |
An exemplary setup could be as such: for the master machine, running the NameNode, SecondaryNameNode, JobTracker, and HBase Master, 24 GB of memory; and for the slaves, running the DataNodes, TaskTrackers, and HBase RegionServers, 24 GB or more.
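As a sanity check, you can add up the heap allocations from Table 2-1 for a slave node running a DataNode, a TaskTracker, a RegionServer, and, say, eight concurrent task attempts. The numbers below simply restate the table; the task-attempt count and OS reserve are assumptions for illustration:

```python
# Heap budget (in GB) for a slave node, using the example values from Table 2-1
slave_heaps = {
    "DataNode": 1,
    "TaskTracker": 1,
    "HBase RegionServer": 12,
}
task_attempts = 8   # assumed maximum concurrent task attempts, 1 GB each
os_reserve = 2      # assumed spare memory left to the OS buffer cache

total = sum(slave_heaps.values()) + task_attempts * 1 + os_reserve
print(total)  # 24 -> matches the 24 GB suggested for slave machines
```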
It is recommended that you optimize your RAM for the memory channel width of your server. For example, when using dual-channel memory, each machine should be configured with pairs of DIMMs. With triple-channel memory, each server should have triplets of DIMMs. This could mean that a server has 18 GB (9 × 2GB) of RAM instead of 16 GB (4 × 4GB).
Also make sure that not just the server’s motherboard supports this feature, but also your CPU: some CPUs only support dual-channel memory, and therefore, even if you put in triple-channel DIMMs, they will only be used in dual-channel mode.
The data is stored on the slave machines, and therefore it is those servers that need plenty of capacity. Depending on whether you are more read/write- or processing-oriented, you need to balance the number of disks with the number of CPU cores available. Typically, you should have at least one core per disk, so in an eight-core server, adding six disks is good, but adding more might not be giving you optimal performance.
Some consideration should be given regarding the type of drives—for example, 2.5” versus 3.5” drives or SATA versus SAS. In general, SATA drives are recommended over SAS since they are more cost-effective, and since the nodes are all redundantly storing replicas of the data across multiple servers, you can safely use the more affordable disks. On the other hand, 3.5” disks are more reliable compared to 2.5” disks, but depending on the server chassis you may need to go with the latter.
The disk capacity is usually 1 TB per disk, but you can also use 2 TB drives if necessary. Using from six to 12 high-density servers with 1 TB to 2 TB drives is good, as you get a lot of storage capacity and the JBOD setup with enough cores can saturate the disk bandwidth nicely.
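Raw disk capacity does not translate directly into usable HBase storage, because HDFS replicates every block (three times by default). A rough back-of-the-envelope calculation, where the server count is an assumed example and the disk figures follow the text:

```python
# Rough usable-capacity estimate for an HDFS-backed cluster
servers = 10          # assumed number of slave nodes
disks_per_server = 6  # six disks in an eight-core server, as suggested above
tb_per_disk = 1       # 1 TB drives
replication = 3       # HDFS default replication factor

raw_tb = servers * disks_per_server * tb_per_disk
usable_tb = raw_tb / replication
print(raw_tb, usable_tb)  # 60 TB raw -> 20.0 TB usable, before log/temp overhead
```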
The actual server chassis is not that crucial, as most servers in a specific price bracket provide very similar features. It is often better to shy away from special hardware that offers proprietary functionality and opt for generic servers so that they can be easily combined over time as you extend the capacity of the cluster.
As far as networking is concerned, it is recommended that you use a two-port Gigabit Ethernet card—or two channel-bonded cards. If you already have support for 10 Gigabit Ethernet or InfiniBand, you should use it.
For the slave servers, a single power supply unit (PSU) is sufficient, but for the master node you should use redundant PSUs, such as the optional dual PSUs available for many servers.
In terms of density, it is advisable to select server hardware that fits into a low number of rack units (abbreviated as “U”). Typically, 1U or 2U servers are used in 19” racks or cabinets. A consideration while choosing the size is how many disks they can hold and their power consumption. Usually a 1U server is limited to a lower number of disks or forces you to use 2.5” disks to get the capacity you want.
In a data center, servers are typically mounted into 19” racks or cabinets with 40U or more in height. You could fit up to 40 machines (although with half-depth servers, some companies have up to 80 machines in a single rack, 40 machines on either side) and link them together with a top-of-rack (ToR) switch. Given the Gigabit speed per server, you need to ensure that the ToR switch is fast enough to handle the throughput these servers can create. Often the backplane of a switch cannot handle all ports at line rate or is oversubscribed—in other words, promising you something in theory it cannot do in reality.
Switches often have 24 or 48 ports, and with the aforementioned channel-bonding or two-port cards, you need to size the networking large enough to provide enough bandwidth. Installing 40 1U servers would need 80 network ports; so, in practice, you may need a staggered setup where you use multiple rack switches and then aggregate to a much larger core aggregation switch (CaS). This results in a two-tier architecture, where the distribution is handled by the ToR switch and the aggregation by the CaS.
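The port math above can be made explicit. The rack size and ports per server come from the text, while the 48-port switch size is an assumption:

```python
import math

servers_per_rack = 40  # 1U servers in a 40U rack
ports_per_server = 2   # two-port NIC or channel-bonded pair
tor_switch_ports = 48  # assumed 48-port top-of-rack switch

ports_needed = servers_per_rack * ports_per_server
tor_switches = math.ceil(ports_needed / tor_switch_ports)
print(ports_needed, tor_switches)  # 80 ports -> 2 ToR switches per rack
```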
While we cannot address all the considerations for large-scale setups, we can still notice that this is a common design pattern. Given that the operations team is part of the planning, and it is known how much data is going to be stored and how many clients are expected to read and write concurrently, this involves basic math to compute the number of servers needed—which also drives the networking considerations.
When users have reported issues with HBase on the public mailing list or on other channels, especially regarding slower-than-expected I/O performance while bulk inserting huge amounts of data, it has often become clear that networking was either the main or a contributing issue. This ranges from misconfigured or faulty network interface cards (NICs) to completely oversubscribed switches in the I/O path. Please make sure that you verify every component in the cluster to avoid sudden operational problems—the kind that could have been avoided by sizing the hardware appropriately.
Finally, given the current status of built-in security in Hadoop and HBase, it is common for the entire cluster to be located in its own network, possibly protected by a firewall to control access to the few required, client-facing ports.
After considering the hardware and purchasing the server machines, it’s time to consider software. This can range from the operating system itself to filesystem choices and configuration of various auxiliary services.
Most of the requirements listed are independent of HBase and have to be applied on a very low, operational level. You may have to work with your administrator to get everything applied and verified.
Recommending an operating system (OS) is a tough call, especially in the open source realm. Over the past two to three years, a clear preference has emerged for running HBase on Linux. In fact, Hadoop and HBase are inherently designed to work with Linux or any other Unix-like system. While you are free to run either one on a different OS as long as it supports Java—for example, Windows—they have only been tested with Unix-like systems. The supplied start and stop scripts, for example, expect a command-line shell as provided by Linux or Unix.
Within the Unix and Unix-like group you can also differentiate between those that are free (as in they cost no money) and those you have to pay for. Again, both will work and your choice is often limited by company-wide regulations. Here is a short list of operating systems that are commonly found as a basis for HBase clusters:
CentOS is a community-supported, free software operating system based on Red Hat Enterprise Linux (RHEL). It mirrors RHEL in terms of functionality, features, and package release levels, as it uses the source code packages Red Hat provides for its own enterprise product to create CentOS-branded counterparts. Like RHEL, it provides the packages in RPM format.
It is also focused on enterprise usage, and therefore does not adopt new features or newer versions of existing packages too quickly. The goal is to provide an OS that can be rolled out across a large-scale infrastructure while not having to deal with short-term gains of small, incremental package updates.
Fedora is also a community-supported, free and open source operating system, and is sponsored by Red Hat. But compared to RHEL and CentOS, it is more a playground for new technologies and strives to advance new ideas and features. Because of that, it has a much shorter life cycle compared to enterprise-oriented products. An average maintenance period for a Fedora release is around 13 months.
The fact that it is aimed at workstations and has been enhanced with many new features has made Fedora a quite popular choice, only beaten by more desktop-oriented operating systems.[31] For production use, you may want to take into account the reduced life cycle that counteracts the freshness of this distribution. You may also want to consider not using the latest Fedora release, but trailing by one version to be able to rely on some feedback from the community as far as stability and other issues are concerned.
Debian is another Linux-kernel-based OS that has software packages released as free and open source software. It can be used for desktop and server systems and has a conservative approach when it comes to package updates. Releases are only published after all included packages have been sufficiently tested and deemed stable.
As opposed to other distributions, Debian is not backed by a commercial entity, but rather is solely governed by its own project rules. It also uses its own packaging system that supports DEB packages only. Debian is known to run on many hardware platforms as well as having a very large repository of packages.
Ubuntu is a Linux distribution based on Debian. It is distributed as free and open source software, and backed by Canonical Ltd., which is not charging for the OS but is selling technical support for Ubuntu.
The life cycle is split into a longer- and a shorter-term release. The long-term support (LTS) releases are supported for three years on the desktop and five years on the server. The packages are also DEB format and are based on the unstable branch of Debian: Ubuntu, in a sense, is for Debian what Fedora is for Red Hat Linux. Using Ubuntu as a server operating system is made more difficult as the update cycle for critical components is very frequent.
Solaris is offered by Oracle, and is available for a limited number of architecture platforms. It is a descendant of Unix System V Release 4, and therefore, the most different OS in this list. Some of the source code is available as open source while the rest is closed source. Solaris is a commercial product and needs to be purchased. The commercial support for each release is maintained for 10 to 12 years.
Abbreviated as RHEL, Red Hat’s Linux distribution is aimed at commercial and enterprise-level customers. The OS is available as a server and a desktop version. The license comes with offerings for official support, training, and a certification program.
The package format for RHEL is called RPM (the Red Hat Package Manager), and it consists of the software packaged in the .rpm file format, and the package manager itself.
Being commercially supported and maintained, RHEL has a very long life cycle of 7 to 10 years.
You have a choice when it comes to the operating system you are going to use on your servers. A sensible approach is to choose one you feel comfortable with and that fits into your existing infrastructure.
As for a recommendation, many production systems running HBase are on top of CentOS, or RHEL.
With the operating system selected, you will have a few choices of filesystems to use with your disks. There is not a lot of publicly available empirical data comparing different filesystems and their effect on HBase, though. The common systems in use are ext3, ext4, and XFS, but you may be able to use others as well. For some, HBase users have reported on their findings, while for more exotic ones you would need to run enough tests before using them on your production cluster.
Note that the selection of filesystems is for the HDFS data nodes. HBase is directly impacted when using HDFS as its backing store.
Here are some notes on the more commonly used filesystems:
One of the most ubiquitous filesystems on the Linux operating system is ext3 (see http://en.wikipedia.org/wiki/Ext3 for details). It has been proven stable and reliable, meaning it is a safe bet in terms of setting up your cluster with it. Being part of Linux since 2001, it has been steadily improved over time and has been the default filesystem for years.
There are a few optimizations you should keep in mind when using ext3. First, you should set the noatime option when mounting the filesystem to reduce the administrative overhead required for the kernel to keep the access time for each file. It is not needed or even used by HBase, and disabling it speeds up the disk’s read performance. Disabling the last access time gives you a performance boost and is a recommended optimization. Mount options are typically specified in a configuration file called /etc/fstab. Here is a Linux example line where the noatime option is specified:

/dev/sdd1 /data ext3 defaults,noatime 0 0
Another optimization is to make better use of the disk space provided by ext3. By default, it reserves a specific number of bytes in blocks for situations where a disk fills up but crucial system processes need this space to continue to function. This is really useful for critical disks—for example, the one hosting the operating system—but it is less useful for the storage drives, and in a large enough cluster it can have a significant impact on available storage capacities.
You can reduce the number of reserved blocks and gain more usable disk space by using the tune2fs command-line tool that comes with ext3 and Linux. By default, the reserve is set to 5%, but it can safely be reduced to 1% (or even 0%) for the data drives. This is done with the following command:

tune2fs -m 1 <device-name>

Replace <device-name> with the disk you want to adjust—for example, /dev/sdd1. Do this for all disks on which you want to store data. The -m 1 defines the percentage, so use -m 0, for example, to set the reserved block count to zero.
A final word of caution: only do this for your data disk, NOT for the disk hosting the OS nor for any drive on the master node!
Yahoo! has publicly stated that it is using ext3 as its filesystem of choice on its large Hadoop cluster farm. This shows that, although it is by far not the most current or modern filesystem, it does very well in large clusters. In fact, you are more likely to saturate your I/O on other levels of the stack before reaching the limits of ext3.
The biggest drawback of ext3 is that during the bootstrap process of the servers it requires the largest amount of time. Formatting a disk with ext3 can take minutes to complete and may become a nuisance when spinning up machines dynamically on a regular basis—although that is not a very common practice.
The successor to ext3 is called ext4 (see http://en.wikipedia.org/wiki/Ext4 for details) and initially was based on the same code but was subsequently moved into its own project. It has been officially part of the Linux kernel since the end of 2008. To that extent, it has had only a few years to prove its stability and reliability. Nevertheless, Google has announced plans[32] to upgrade its storage infrastructure from ext2 to ext4. This can be considered a strong endorsement, but also shows the advantage of the extended filesystem (the ext in ext3, ext4, etc.) lineage to be upgradable in place. Choosing an entirely different filesystem like XFS would have made this impossible.
Performance-wise, ext4 does beat ext3 and allegedly comes close to the high-performance XFS. It also has many advanced features that allow it to store files up to 16 TB in size and support volumes up to 1 exabyte (i.e., 10^18 bytes).
A more critical feature is the so-called delayed allocation, and it is recommended that you turn it off for Hadoop and HBase use. Delayed allocation keeps the data in memory and reserves the required number of blocks until the data is finally flushed to disk. It helps in keeping blocks for files together and can at times write the entire file into a contiguous set of blocks. This reduces fragmentation and improves performance when reading the file subsequently. On the other hand, it increases the possibility of data loss in case of a server crash.
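If you follow that recommendation, ext4's delayed allocation can be disabled with the nodelalloc mount option. The device and mount point below are placeholders, mirroring the ext3 fstab example shown earlier:

```
/dev/sdd1 /data ext4 defaults,noatime,nodelalloc 0 0
```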
XFS (see http://en.wikipedia.org/wiki/Xfs for details) became available on Linux at about the same time as ext3. It was originally developed by Silicon Graphics in 1993. Most Linux distributions today have XFS support included.
Its features are similar to those of ext4; for example, both have extents (grouping contiguous blocks together, which reduces the number of block references that must be maintained per file) and the aforementioned delayed allocation.
A great advantage of XFS during bootstrapping a server is the fact that it formats the entire drive in virtually no time. This can significantly reduce the time required to provision new servers with many storage disks.
On the other hand, there are some drawbacks to using XFS. There is a known shortcoming in the design that impacts metadata operations, such as deleting a large number of files. The developers have picked up on the issue and applied various fixes to improve the situation. You will have to check how you use HBase to determine if this might affect you. For normal use, you should not have a problem with this limitation of XFS, as HBase operates on fewer but larger files.
Introduced in 2005, ZFS (see http://en.wikipedia.org/wiki/ZFS for details) was developed by Sun Microsystems. The name is an abbreviation for zettabyte filesystem, as it has the ability to store 2^58 zettabytes (a zettabyte, in turn, is 10^21 bytes).
ZFS is primarily supported on Solaris and has advanced features that may be useful in combination with HBase. It has built-in compression support that could be used as a replacement for the pluggable compression codecs in HBase.
It seems that choosing a filesystem is analogous to choosing an operating system: pick one that you feel comfortable with and that fits into your existing infrastructure. Simply picking one over the other based on plain numbers is difficult without proper testing and comparison. If you have a choice, it seems to make sense to opt for a more modern system like ext4 or XFS, as sooner or later they will replace ext3 and are already much more scalable and perform better than their older sibling.
Installing different filesystems on a single server is not recommended. This can have adverse effects on performance as the kernel may have to split buffer caches to support the different filesystems. It has been reported that, for certain operating systems, this can have a devastating performance impact. Make sure you test this issue carefully if you have to mix filesystems.
It was mentioned in the note that you do need Java for HBase. Not just any version of Java, but version 6, a.k.a. 1.6, or later. The recommended choice is the one provided by Oracle (formerly by Sun), which can be found at http://www.java.com/download/.
You also should make sure the java binary is executable and can be found on your path. Try entering java -version on the command line and verify that it works and that it prints out the version number indicating it is version 1.6 or later—for example, java version "1.6.0_22". You usually want the latest update level, but sometimes you may find unexpected problems (version 1.6.0_18, for example, is known to cause random JVM crashes) and it may be worth trying an older release to verify.
If you do not have Java on the command-line path or if HBase fails to start with a warning that it was not able to find it (see Example 2-1), edit the conf/hbase-env.sh file by uncommenting the JAVA_HOME line and changing its value to where your Java is installed.
+======================================================================+
|      Error: JAVA_HOME is not set and Java could not be found         |
+----------------------------------------------------------------------+
| Please download the latest Sun JDK from the Sun Java web site        |
|       > http://java.sun.com/javase/downloads/ <                      |
|                                                                      |
| HBase requires Java 1.6 or later.                                    |
| NOTE: This script will find Sun Java whether you install using the   |
|       binary or the RPM based installer.                             |
+======================================================================+
The supplied scripts try many default locations for Java, so there is a good chance HBase will find it automatically. If it does not, you most likely have no Java Runtime installed at all. Start with the download link provided at the beginning of this subsection and read the manuals of your operating system to find out how to install it.
Currently, HBase is bound to work only with the specific version of Hadoop it was built against. One of the reasons for this behavior concerns the remote procedure call (RPC) API between HBase and Hadoop. The wire protocol is versioned and needs to match up; even small differences can cause a broken communication between them.
The current version of HBase will only run on Hadoop 0.20.x. It will not run on Hadoop 0.21.x (nor 0.22.x) as of this writing. HBase may lose data in a catastrophic event unless it is running on an HDFS that has durable sync support. Hadoop 0.20.2 and Hadoop 0.20.203.0 do not have this support. Currently, only the branch-0.20-append branch has this attribute.[33] No official releases have been made from this branch up to now, so you will have to build your own Hadoop from the tip of this branch. Scroll down in the Hadoop How To Release to the “Build Requirements” section for instructions on how to build Hadoop.[34]
Another option, if you do not want to build your own version of Hadoop, is to use a distribution that has the patches already applied. You could use Cloudera’s CDH3. CDH has the 0.20-append patches needed to add a durable sync. We will discuss this in more detail in Cloudera’s Distribution Including Apache Hadoop.
Because HBase depends on Hadoop, it bundles an instance of the Hadoop JAR under its lib directory. The bundled Hadoop was made from the Apache branch-0.20-append branch at the time of HBase’s release. It is critical that the version of Hadoop that is in use on your cluster matches what is used by HBase. Replace the Hadoop JAR found in the HBase lib directory with the hadoop-xyz.jar you are running on your cluster to avoid version mismatch issues. Make sure you replace the JAR on all servers in your cluster that run HBase. Version mismatch issues have various manifestations, but often the result is the same: HBase does not throw an error, but simply blocks indefinitely.
A different approach is to install a vanilla Hadoop 0.20.2 and then replace the vanilla Hadoop JAR with the one supplied by HBase. This is not tested extensively but seems to work. Your mileage may vary.
Note that ssh must be installed and sshd must be running if you want to use the supplied scripts to manage remote Hadoop and HBase daemons. A commonly used software package providing these commands is OpenSSH, available from http://www.openssh.com/. Check with your operating system manuals first, as many OSes have mechanisms to install an already compiled binary release package as opposed to having to build it yourself. On a Ubuntu workstation, for example, you can use:
$ sudo apt-get install openssh-client
On the servers, you would install the matching server package:
$ sudo apt-get install openssh-server
You must be able to ssh to all nodes, including your local node, using passwordless login. You will need to have a public key pair—you can either use the one you already use (see the .ssh directory located in your home directory) or you will have to generate one—and add your public key on each server so that the scripts can access the remote servers without further intervention.
The supplied shell scripts make use of SSH to send commands to each server in the cluster. It is strongly advised that you not use simple password authentication. Instead, you should use public key authentication—only!
When you create your key pair, also add a passphrase to protect your private key. To avoid the hassle of being asked for the passphrase for every single command sent to a remote server, it is recommended that you use ssh-agent, a helper that comes with SSH. It lets you enter the passphrase only once and then takes care of all subsequent requests to provide it.
Ideally, you would also use the agent forwarding that is built in to log in to other remote servers from your cluster nodes.
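As a sketch, the one-time setup could look like the following; the username, hostname, and passphrase are placeholders to adapt to your environment:

```
$ ssh-keygen -t rsa -P 'my passphrase' -f ~/.ssh/id_rsa
$ ssh-copy-id user@node1
$ eval "$(ssh-agent)"
$ ssh-add ~/.ssh/id_rsa
```

The first command generates a passphrase-protected key pair, the second appends the public key to the authorized_keys file on a (hypothetical) remote node, and the last two load the key into ssh-agent so the passphrase needs to be entered only once per session.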
HBase uses the local hostname to self-report its IP address. Both forward and reverse DNS resolving should work. You can verify if the setup is correct for forward DNS lookups by running the following command:
$ ping -c 1 $(hostname)
You need to make sure that it reports the public IP address of the server and not the loopback address 127.0.0.1. A typical reason for this not to work concerns an incorrect /etc/hosts file, containing a mapping of the machine name to the loopback address.
If your machine has multiple interfaces, HBase will use the interface that the primary hostname resolves to. If this is insufficient, you can set hbase.regionserver.dns.interface (see Configuration for information on how to do this) to indicate the primary interface. This only works if your cluster configuration is consistent and every host has the same network interface configuration. Another alternative is to set hbase.regionserver.dns.nameserver to choose a different name server than the system-wide default.
The clocks on cluster nodes should be in basic alignment. Some skew is tolerable, but wild skew can generate odd behaviors. Even differences of only one minute can cause unexplainable behavior. Run NTP on your cluster, or an equivalent application, to synchronize the time on all servers.
If you are having problems querying data, or you are seeing weird behavior running cluster operations, check the system time!
HBase is a database, so it uses a lot of files at the same time. The default ulimit -n of 1024 on most Unix or other Unix-like systems is insufficient. Any significant amount of loading will lead to I/O errors stating the obvious: java.io.IOException: Too many open files.
You may also notice errors such as the following:
2010-04-06 03:04:37,542 INFO org.apache.hadoop.hdfs.DFSClient: Exception in createBlockOutputStream java.io.EOFException
2010-04-06 03:04:37,542 INFO org.apache.hadoop.hdfs.DFSClient: Abandoning block blk_-6935524980745310745_1391901
These errors are usually found in the logfiles. See Analyzing the Logs for details on their location, and how to analyze their content.
You need to change the upper bound on the number of file descriptors. Set it to a number larger than 10,000. To be clear, upping the file descriptors for the user who is running the HBase process is an operating system configuration, not an HBase configuration. Also, a common mistake is that administrators will increase the file descriptors for a particular user but HBase is running with a different user account.
You can estimate the number of required file handles roughly as follows. Per column family, there is at least one storage file, and possibly up to five or six if a region is under load; on average, though, there are three storage files per column family. To determine the number of required file handles, you multiply the number of column families by the average number of storage files per family, and then by the number of regions per region server. For example, say you have a schema of 3 column families per region and you have 100 regions per region server. The JVM will open 3 × 3 × 100 = 900 storage file descriptors, not counting open JAR files, configuration files, CRC32 files, and so on. Run lsof -p REGIONSERVER_PID to see the accurate number.
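The arithmetic in this example can be checked with a quick shell calculation; all numbers are the example's assumptions, not measured values:

```shell
# Example figures: 3 column families, ~3 storage files per family,
# 100 regions per region server
families=3
files_per_family=3
regions=100
handles=$(( families * files_per_family * regions ))
echo "estimated storage file handles: $handles"
```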
As the first line in its logs, HBase prints the ulimit it is seeing. Ensure that it’s correctly reporting the increased limit.[35] See Analyzing the Logs for details on how to find this information in the logs, as well as other details that can help you find—and solve—problems with an HBase setup.
You may also need to edit /etc/sysctl.conf and adjust the fs.file-max value. See this post on Server Fault for details.
You should also consider increasing the number of processes allowed by adjusting the nproc value in the same /etc/security/limits.conf file referenced earlier. With a low limit and a server under duress, you could see OutOfMemoryError exceptions, which will eventually cause the entire Java process to end. As with the file handles, you need to make sure this value is set for the appropriate user account running the process.
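On Linux, both limits are commonly raised in /etc/security/limits.conf; the following is a sketch, assuming a hypothetical user account named hadoop runs the HBase process, and the limit values are example choices:

```
hadoop  -  nofile  32768
hadoop  -  nproc   32000
```

The `-` applies the value as both a soft and a hard limit; the user typically has to log in again for the new limits to take effect.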
A Hadoop HDFS data node has an upper bound on the number of files that it will serve at any one time. The upper bound parameter is called xcievers (yes, this is misspelled). Again, before doing any loading, make sure you have configured Hadoop's conf/hdfs-site.xml file, setting the xcievers value to at least the following:
<property>
  <name>dfs.datanode.max.xcievers</name>
  <value>4096</value>
</property>
Not having this configuration in place makes for strange-looking failures. Eventually, you will see a complaint in the datanode logs about the xcievers limit being exceeded, but in the run-up to this, one manifestation is complaints about missing blocks. For example:
10/12/08 20:10:31 INFO hdfs.DFSClient: Could not obtain block blk_XXXXXXXXXXXXXXXXXXXXXX_YYYYYYYY from any node: java.io.IOException: No live nodes contain current block. Will get new block locations from namenode and retry...
You need to prevent your servers from running out of memory over time. We already discussed one way to do this: setting the heap sizes small enough that they give the operating system enough room for its own processes. Once you get close to the physically available memory, the OS starts to use the configured swap space. This is typically located on disk in its own partition and is used to page out processes and their allocated memory until it is needed again.
Swapping—while being a good thing on workstations—is something to be avoided at all costs on servers. Once the server starts swapping, performance is reduced significantly, up to a point where you may not even be able to log in to such a system because the remote access process (e.g., SSHD) is coming to a grinding halt.
HBase needs guaranteed CPU cycles and must obey certain freshness guarantees—for example, to renew the ZooKeeper sessions. It has been observed over and over again that swapping servers start to miss renewing their leases and are considered lost subsequently by the ZooKeeper ensemble. The regions on these servers are redeployed on other servers, which now take extra pressure and may fall into the same trap.
Even worse are scenarios where the swapping server wakes up and now needs to realize it is considered dead by the master node. It will report for duty as if nothing has happened and receive a YouAreDeadException in the process, telling it that it has missed its chance to continue, and therefore terminates itself. There are quite a few implicit issues with this scenario—for example, pending updates, which we will address later. Suffice it to say that this is not good.
You can tune down the swappiness of the server by adding this line to the /etc/sysctl.conf configuration file on Linux and Unix-like systems:
vm.swappiness=5
You can try values like 0 or 5 to reduce the system's likelihood to use swap space.
Some more radical operators have turned off swapping completely (see swapoff on Linux), and would rather have their systems run “against the wall” than deal with swapping issues. Choose something you feel comfortable with, but make sure you keep an eye on this problem.
Finally, you may have to reboot the server for the changes to take effect, as a simple sysctl -p might not suffice. This obviously is for Unix-like systems and you will have to adjust this for your operating system.
HBase running on Windows has not been tested to a great extent. Running a production install of HBase on top of Windows is not recommended.
If you are running HBase on Windows, you must install Cygwin to have a Unix-like environment for the shell scripts. The full details are explained in the Windows Installation guide on the HBase website.
The most common filesystem used with HBase is HDFS. But you are not locked into HDFS because the FileSystem used by HBase has a pluggable architecture and can be used to replace HDFS with any other supported system. In fact, you could go as far as implementing your own filesystem—maybe even on top of another database. The possibilities are endless and waiting for the brave at heart.
In this section, we are not talking about the low-level filesystems used by the operating system (see Filesystem for that), but the storage layer filesystems. These are abstractions that define higher-level features and APIs, which are then used by Hadoop to store the data. The data is eventually stored on a disk, at which point the OS filesystem is used.
HDFS is the most used and tested filesystem in production. Almost all production clusters use it as the underlying storage layer. It is proven stable and reliable, so deviating from it may impose its own risks and subsequent problems.
The primary reason HDFS is so popular is its built-in replication, fault tolerance, and scalability. Choosing a different filesystem should provide the same guarantees, as HBase implicitly assumes that data is stored in a reliable manner by the filesystem. It has no added means to replicate data or even maintain copies of its own storage files. This functionality must be provided by the lower-level system.
You can select a different filesystem implementation by using a URI[36] pattern, where the scheme (the part before the first colon) of the URI identifies the driver to be used. Figure 2-1 shows how the Hadoop filesystem is different from the low-level OS filesystems for the actual disks.
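As a small illustration, the scheme can be extracted from a filesystem URI with simple shell string handling; the URI below reuses the HDFS example format from this chapter:

```shell
# Everything before the first colon of the URI is the scheme,
# which selects the filesystem driver
uri="hdfs://namenode.foo.com:9000/hbase"
scheme=${uri%%:*}
echo "filesystem scheme: $scheme"
```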
You can use a filesystem that is already supplied by Hadoop: it ships with a list of filesystems,[37] which you may want to try out first. As a last resort—or if you’re an experienced developer—you can also write your own filesystem implementation.
The local filesystem actually bypasses Hadoop entirely, that is, you do not need to have an HDFS or any other cluster at all. It is handled all in the FileSystem class used by HBase to connect to the filesystem implementation. The supplied ChecksumFileSystem class is loaded by the client and uses local disk paths to store all the data.
The beauty of this approach is that HBase is unaware that it is not talking to a distributed filesystem on a remote or collocated cluster, but actually is using the local filesystem directly. The standalone mode of HBase uses this feature to run HBase only. You can select it by using the following scheme:
file:///<path>
Similar to the URIs used in a web browser, the file: scheme addresses local files.
The Hadoop Distributed File System (HDFS) is the default filesystem when deploying a fully distributed cluster. For HBase, HDFS is the filesystem of choice, as it has all the required features. As we discussed earlier, HDFS is built to work with MapReduce, taking full advantage of its parallel, streaming access support. The scalability, fail safety, and automatic replication functionality is ideal for storing files reliably. HBase adds the random access layer missing from HDFS and ideally complements Hadoop. Using MapReduce, you can do bulk imports, creating the storage files at disk-transfer speeds.
The URI to access HDFS uses the following scheme:
hdfs://<namenode>:<port>/<path>
Amazon’s Simple Storage Service (S3)[38] is a storage system that is primarily used in combination with dynamic servers running on Amazon’s complementary service named Elastic Compute Cloud (EC2).[39]
S3 can be used directly and without EC2, but the bandwidth used to transfer data in and out of S3 is going to be cost-prohibitive in practice. Transferring between EC2 and S3 is free, and therefore a viable option. One way to start an EC2-based cluster is shown in Apache Whirr.
The S3 FileSystem implementation provided by Hadoop supports two different modes: the raw (or native) mode, and the block-based mode. The raw mode uses the s3n: URI scheme and writes the data directly into S3, similar to the local filesystem. You can see all the files in your bucket the same way as you would on your local disk.
The s3: scheme is the block-based mode and was used to overcome S3's former maximum file size limit of 5 GB. This has since been changed, and therefore the selection is now more difficult—or easy: opt for s3n: if you are not going to exceed 5 GB per file.
The block mode emulates the HDFS filesystem on top of S3. It makes browsing the bucket content more difficult as only the internal block files are visible, and the HBase storage files are stored arbitrarily inside these blocks and strewn across them. You can select the filesystem using these URIs:
s3://<bucket-name> s3n://<bucket-name>
There are other filesystems, and one that deserves mention is CloudStore (formerly known as the Kosmos filesystem, abbreviated as KFS and the namesake of the URI scheme shown at the end of the next paragraph). It is an open source, distributed, high-performance filesystem written in C++, with similar features to HDFS. Find more information about it at the CloudStore website.
It is available for Solaris and Linux, originally developed by Kosmix and released as open source in 2007. To select CloudStore as the filesystem for HBase use the following URI format:
kfs:///<path>
Once you have decided on the basic OS-related options, you must somehow get HBase onto your servers. You have a couple of choices, which we will look into next. Also see Appendix D for even more options.
The canonical installation process of most Apache projects is to download a release, usually provided as an archive containing all the required files. Some projects have separate archives for a binary and source release—the former intended to have everything needed to run the release and the latter containing all files needed to build the project yourself. HBase comes as a single package, containing binary and source files together. For more information on HBase releases, you may also want to check out the Release Notes[40] page. Another interesting page is titled Change Log,[41] and it lists everything that was added, fixed, or changed in any form for each release version.
You can download the most recent release of HBase from the Apache HBase release page and unpack the contents into a suitable directory, such as /usr/local or /opt, like so:
$ cd /usr/local
$ tar -zxvf hbase-x.y.z.tar.gz
Once you have extracted all the files, you can make yourself familiar with what is in the project’s directory. The content may look like this:
$ ls -lr
-rw-r--r--   1 larsgeorge  staff   192809 Feb 15 01:54 CHANGES.txt
-rw-r--r--   1 larsgeorge  staff    11358 Feb  9 01:23 LICENSE.txt
-rw-r--r--   1 larsgeorge  staff      293 Feb  9 01:23 NOTICE.txt
-rw-r--r--   1 larsgeorge  staff     1358 Feb  9 01:23 README.txt
drwxr-xr-x  23 larsgeorge  staff      782 Feb  9 01:23 bin
drwxr-xr-x   7 larsgeorge  staff      238 Feb  9 01:23 conf
drwxr-xr-x  64 larsgeorge  staff     2176 Feb 15 01:56 docs
-rwxr-xr-x   1 larsgeorge  staff   905762 Feb 15 01:56 hbase-0.90.1-tests.jar
-rwxr-xr-x   1 larsgeorge  staff  2242043 Feb 15 01:56 hbase-0.90.1.jar
drwxr-xr-x   5 larsgeorge  staff      170 Feb 15 01:55 hbase-webapps
drwxr-xr-x  32 larsgeorge  staff     1088 Mar  3 12:07 lib
-rw-r--r--   1 larsgeorge  staff    29669 Feb 15 01:28 pom.xml
drwxr-xr-x   9 larsgeorge  staff      306 Feb  9 01:23 src
The root of it only contains a few text files, stating the license terms (LICENSE.txt and NOTICE.txt) and some general information on how to find your way around (README.txt). The CHANGES.txt file is a static snapshot of the change log page mentioned earlier. It contains all the changes that went into the current release you downloaded.
You will also find the Java archive, or JAR files, that contain the compiled Java code plus all other necessary resources. There are two variations of the JAR file, one with just the name and version number and one with a postfix of tests. This file contains the code required to run the tests provided by HBase. These are functional unit tests that the developers use to verify a release is fully operational and that there are no regressions.
The last file found is named pom.xml and is the Maven project file needed to build HBase from the sources. See Building from Source.
The remainder of the content in the root directory consists of other directories, which are explained in the following list:
bin: The bin—or binaries—directory contains the scripts supplied by HBase to start and stop HBase, run separate daemons,[42] or start additional master nodes. See Running and Confirming Your Installation for information on how to use them.
conf: The configuration directory contains the files that define how HBase is set up. Configuration explains the contained files in great detail.
docs: This directory contains a copy of the HBase project website, including the documentation for all the tools, the API, and the project itself. Open your web browser of choice and open the docs/index.html file by either dragging it into the browser, double-clicking that file, or using the File→Open (or similarly named) menu.
hbase-webapps: HBase has web-based user interfaces which are implemented as Java web applications, using the files located in this directory. Most likely you will never have to touch this directory when working with or deploying HBase into production.
lib: Java-based applications are usually an assembly of many auxiliary libraries plus the JAR file containing the actual program. All of these libraries are located in the lib directory.
logs: Since the HBase processes are started as daemons (i.e., they are running in the background of the operating system performing their duty), they use logfiles to report their state, progress, and optionally, errors that occur during their life cycle. Analyzing the Logs explains how to make sense of their rather cryptic content.
src: In case you plan to build your own binary package (see Building from Source for information on how to do that), or you decide you would like to join the international team of developers working on HBase, you will need this source directory, containing everything required to roll your own release.
Since you have unpacked a release archive, you can now move on to Run Modes to decide how you want to run HBase.
HBase uses Maven to build the binary packages. You therefore need a working Maven installation, plus a full Java Development Kit (JDK)—not just a Java Runtime as used in Quick-Start Guide.
This section is important only if you want to build HBase from its sources. This might be necessary if you want to apply patches, which can add new functionality you may be requiring.
Once you have confirmed that both are set up properly, you can build the binary packages using the following command:
$ mvn assembly:assembly
Note that the tests for HBase need more than one hour to complete. If you trust the code to be operational, or you are not willing to wait, you can also skip the test phase, adding a command-line switch like so:
$ mvn -DskipTests assembly:assembly
This process will take a few minutes to complete—and if you have not turned off the test phase, this goes into the tens of minutes—while creating a target directory in the HBase project home directory. Once the build completes with a Build Successful message, you can find the compiled and packaged tarball archive in the target directory. With that archive you can go back to Apache Binary Release and follow the steps outlined there to install your own, private release on your servers.
HBase has two run modes: standalone and distributed. Out of the box, HBase runs in standalone mode, as seen in Quick-Start Guide. To set up HBase in distributed mode, you will need to edit files in the HBase conf directory.
Whatever your mode, you may need to edit conf/hbase-env.sh to tell HBase which java to use. In this file, you set HBase environment variables such as the heap size and other options for the JVM, the preferred location for logfiles, and so on. Set JAVA_HOME to point at the root of your java installation.
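For example, the relevant line in conf/hbase-env.sh might read as follows; the installation path is an assumed example that you would replace with your actual Java location:

```shell
# The java implementation to use (path is an example; adjust to your system)
export JAVA_HOME=/usr/lib/jvm/java-6-sun
```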
This is the default mode, as described and used in Quick-Start Guide. In standalone mode, HBase does not use HDFS—it uses the local filesystem instead—and it runs all HBase daemons and a local ZooKeeper in the same JVM process. ZooKeeper binds to a well-known port so that clients may talk to HBase.
The distributed mode can be further subdivided into pseudodistributed—all daemons run on a single node—and fully distributed—where the daemons are spread across multiple, physical servers in the cluster.[43]
Distributed modes require an instance of the Hadoop Distributed File System (HDFS). See the Hadoop requirements and instructions for how to set up an HDFS. Before proceeding, ensure that you have an appropriate, working HDFS.
The following subsections describe the different distributed setups. Starting, verifying, and exploring of your install, whether a pseudodistributed or fully distributed configuration, is described in Running and Confirming Your Installation. The same verification script applies to both deploy types.
A pseudodistributed mode is simply a distributed mode that is run on a single host. Use this configuration for testing and prototyping on HBase. Do not use this configuration for production or for evaluating HBase performance.
Once you have confirmed your HDFS setup, edit conf/hbase-site.xml. This is the file into which you add local customizations and overrides for the default HBase configuration values (see Appendix A for the full list, and HDFS-Related Configuration). Point HBase at the running Hadoop HDFS instance by setting the hbase.rootdir property. For example, adding the following properties to your hbase-site.xml file says that HBase should use the /hbase directory in the HDFS whose name node is at port 9000 on your local machine, and that it should run with one replica only (recommended for pseudodistributed mode):
<configuration>
  ...
  <property>
    <name>hbase.rootdir</name>
    <value>hdfs://localhost:9000/hbase</value>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
  ...
</configuration>
In the example configuration, the server binds to localhost. This means that a remote client cannot connect. Amend accordingly, if you want to connect from a remote location.
If all you want to try for now is the pseudodistributed mode, you can skip to Running and Confirming Your Installation for details on how to start and verify your setup. See Chapter 12 for information on how to start extra master and region servers when running in pseudodistributed mode.
For running a fully distributed operation on more than one host, you need to use the following configurations. In hbase-site.xml, add the hbase.cluster.distributed property and set it to true, and point the HBase hbase.rootdir at the appropriate HDFS name node and location in HDFS where you would like HBase to write data. For example, if your name node is running at a server with the hostname namenode.foo.com on port 9000 and you want to home your HBase in HDFS at /hbase, use the following configuration:
<configuration>
  ...
  <property>
    <name>hbase.rootdir</name>
    <value>hdfs://namenode.foo.com:9000/hbase</value>
  </property>
  <property>
    <name>hbase.cluster.distributed</name>
    <value>true</value>
  </property>
  ...
</configuration>
In addition, a fully distributed mode requires that you modify the conf/regionservers file. It lists all the hosts on which you want to run HRegionServer daemons. Specify one host per line (this file in HBase is like the Hadoop slaves file). All servers listed in this file will be started and stopped when the HBase cluster start or stop scripts are run.
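For example, a conf/regionservers file for five hypothetical region server hosts would simply contain their hostnames, one per line:

```
rs1.foo.com
rs2.foo.com
rs3.foo.com
rs4.foo.com
rs5.foo.com
```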
A distributed HBase depends on a running ZooKeeper cluster. All participating nodes and clients need to be able to access the running ZooKeeper ensemble. HBase, by default, manages a ZooKeeper cluster (which can be as low as a single node) for you. It will start and stop the ZooKeeper ensemble as part of the HBase start and stop process. You can also manage the ZooKeeper ensemble independent of HBase and just point HBase at the cluster it should use. To toggle HBase management of ZooKeeper, use the HBASE_MANAGES_ZK variable in conf/hbase-env.sh. This variable, which defaults to true, tells HBase whether to start and stop the ZooKeeper ensemble servers as part of the start and stop commands supplied by HBase.
When HBase manages the ZooKeeper ensemble, you can specify the ZooKeeper configuration using its native zoo.cfg file, or just specify the ZooKeeper options directly in conf/hbase-site.xml. You can set a ZooKeeper configuration option as a property in the HBase hbase-site.xml XML configuration file by prefixing the ZooKeeper option name with hbase.zookeeper.property. For example, you can change the clientPort setting in ZooKeeper by setting the hbase.zookeeper.property.clientPort property. For all default values used by HBase, including ZooKeeper configuration, see Appendix A. Look for the hbase.zookeeper.property prefix.[44]
If you are using the hbase-site.xml approach to specify all ZooKeeper settings, you must at least set the ensemble servers with the hbase.zookeeper.quorum property. It otherwise defaults to a single ensemble member at localhost, which is not suitable for a fully distributed HBase (it binds to the local machine only and remote clients will not be able to connect).
For example, in order to have HBase manage a ZooKeeper quorum on nodes rs{1,2,3,4,5}.foo.com, bound to port 2222 (the default is 2181), you must ensure that HBASE_MANAGES_ZK is commented out or set to true in conf/hbase-env.sh and then edit conf/hbase-site.xml and set hbase.zookeeper.property.clientPort and hbase.zookeeper.quorum. You should also set hbase.zookeeper.property.dataDir to something other than the default, as the default has ZooKeeper persist data under /tmp, which is often cleared on system restart. In the following example, we have ZooKeeper persist to /var/zookeeper:
<configuration>
  ...
  <property>
    <name>hbase.zookeeper.property.clientPort</name>
    <value>2222</value>
  </property>
  <property>
    <name>hbase.zookeeper.quorum</name>
    <value>rs1.foo.com,rs2.foo.com,rs3.foo.com,rs4.foo.com,rs5.foo.com</value>
  </property>
  <property>
    <name>hbase.zookeeper.property.dataDir</name>
    <value>/var/zookeeper</value>
  </property>
  ...
</configuration>
To point HBase at an existing ZooKeeper cluster, one that is not managed by HBase, set HBASE_MANAGES_ZK in conf/hbase-env.sh to false:
...
# Tell HBase whether it should manage it's own instance of Zookeeper or not.
export HBASE_MANAGES_ZK=false
Next, set the ensemble locations and client port, if nonstandard, in hbase-site.xml, or add a suitably configured zoo.cfg to HBase’s CLASSPATH. HBase will prefer the configuration found in zoo.cfg over any settings in hbase-site.xml.
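A minimal zoo.cfg for such an external five-node ensemble might look like the following; all values are assumptions to adapt to your environment:

```
tickTime=2000
initLimit=10
syncLimit=5
dataDir=/var/zookeeper
clientPort=2181
server.1=rs1.foo.com:2888:3888
server.2=rs2.foo.com:2888:3888
server.3=rs3.foo.com:2888:3888
server.4=rs4.foo.com:2888:3888
server.5=rs5.foo.com:2888:3888
```

Each ensemble member additionally needs a myid file in its dataDir containing its own server number.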
When HBase manages ZooKeeper, it will start/stop the ZooKeeper servers as a part of the regular start/stop scripts. If you would like to run ZooKeeper yourself, independent of HBase start/stop, do the following:
${HBASE_HOME}/bin/hbase-daemons.sh {start,stop} zookeeper
Note that you can use HBase in this manner to spin up a ZooKeeper cluster, unrelated to HBase. Just make sure to set HBASE_MANAGES_ZK to false if you want it to stay up across HBase restarts so that when HBase shuts down, it doesn't take ZooKeeper down with it.
For more information about running a distinct ZooKeeper cluster, see the ZooKeeper Getting Started Guide. Additionally, see the ZooKeeper wiki, or the ZooKeeper documentation for more information on ZooKeeper sizing.
Now that the basics are out of the way (we’ve looked at all the choices when it comes to selecting the filesystem, discussed the run modes, and fine-tuned the operating system parameters), we can look at how to configure HBase itself. Similar to Hadoop, all configuration parameters are stored in files located in the conf directory. These are simple text files either in XML format arranged as a set of properties, or in simple flat files listing one option per line.
For more details on how to modify your configuration files for specific workloads refer to Configuration.
Configuring an HBase setup entails editing a file with environment variables, named conf/hbase-env.sh, which is used mostly by the shell scripts (see Operating a Cluster) to start or stop a cluster. You also need to add configuration properties to an XML file[45] named conf/hbase-site.xml to, for example, override HBase defaults, tell HBase what filesystem to use, and tell HBase the location of the ZooKeeper ensemble.
When running in distributed mode, after you make an edit to an HBase configuration file, make sure you copy the content of the conf directory to all nodes of the cluster. HBase will not do this for you.
There are many ways to synchronize your configuration files across your cluster. The easiest is to use a tool like rsync. There are many more elaborate ways, and you will see a selection in Deployment.
Just as in Hadoop where you add site-specific HDFS configurations to the hdfs-site.xml file, for HBase, site-specific customizations go into the file conf/hbase-site.xml. For the list of configurable properties, see Appendix A, or view the raw hbase-default.xml source file in the HBase source code at src/main/resources. The doc directory also has a static HTML page that lists the configuration options.
Not all configuration options make it out to hbase-default.xml. Configurations that users would rarely change can exist only in code; the only way to discover such options is to read the source code itself.
The servers always read the hbase-default.xml file first and subsequently merge it with the hbase-site.xml file content—if present. The properties set in hbase-site.xml always take precedence over the default values loaded from hbase-default.xml.
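As a concrete illustration of this precedence, hbase-default.xml ships with the standard ZooKeeper client port, and an entry in the site file overrides it:

```
<!-- hbase-default.xml (shipped with HBase) -->
<property>
  <name>hbase.zookeeper.property.clientPort</name>
  <value>2181</value>
</property>

<!-- hbase-site.xml (your override; this value wins) -->
<property>
  <name>hbase.zookeeper.property.clientPort</name>
  <value>2222</value>
</property>
```

With both files in place, the servers use port 2222.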
Any modifications in your site file require a cluster restart for HBase to notice the changes.
You set HBase environment variables in this file. Examples include options to pass to the JVM when an HBase daemon starts, such as Java heap size and garbage collector configurations. You also set options for HBase configuration, log directories, niceness, SSH options, where to locate process pid files, and so on. Open the file at conf/hbase-env.sh and peruse its content. Each option is fairly well documented. Add your own environment variables here if you want them read when an HBase daemon is started.
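As a sketch, a few typical overrides in conf/hbase-env.sh might look like the following. The values and the garbage collector flag are purely illustrative, not recommendations; tune them for your own hardware:

```shell
# Illustrative conf/hbase-env.sh overrides
export HBASE_HEAPSIZE=4096                    # heap size in MB for HBase daemons
export HBASE_OPTS="-XX:+UseConcMarkSweepGC"   # example JVM garbage collector choice
export HBASE_LOG_DIR=/var/log/hbase           # where the daemons write their logs
export HBASE_PID_DIR=/var/run/hbase           # where the process pid files are kept
```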
Changes here will require a cluster restart for HBase to notice the change.[46]
This file lists all the known region server names. It is a flat text file with one hostname per line. The list is used by the HBase maintenance scripts to iterate over all the servers when starting the region server processes.
Edit this file to change the rate at which HBase files are rolled and to change the level at which HBase logs messages. Changes here will require a cluster restart for HBase to notice the change, though log levels can be changed for particular daemons via the HBase UI. See Changing Logging Levels for information on this topic, and Analyzing the Logs for details on how to use the logfiles to find and solve problems.
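As a hedged sketch, assuming a rolling file appender named RFA (the actual appender names vary between releases, so check your own conf/log4j.properties), such edits might look like:

```
# Raise the log level for all HBase classes to DEBUG
log4j.logger.org.apache.hadoop.hbase=DEBUG
# Roll logfiles at 256 MB and keep at most 20 old files
log4j.appender.RFA.MaxFileSize=256MB
log4j.appender.RFA.MaxBackupIndex=20
```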
Here is an example configuration for a distributed 10-node cluster. The nodes are named master.foo.com, host1.foo.com, and so on, through host9.foo.com. The HBase Master and the HDFS name node run on master.foo.com. Region servers run on host1.foo.com through host9.foo.com. A three-node ZooKeeper ensemble runs on zk1.foo.com, zk2.foo.com, and zk3.foo.com on the default ports. ZooKeeper data is persisted to the directory /var/zookeeper. The following subsections show what the main configuration files—hbase-site.xml, regionservers, and hbase-env.sh—found in the HBase conf directory might look like.
The hbase-site.xml file contains the essential configuration properties, defining the HBase cluster setup.
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <property>
    <name>hbase.zookeeper.quorum</name>
    <value>zk1.foo.com,zk2.foo.com,zk3.foo.com</value>
  </property>
  <property>
    <name>hbase.zookeeper.property.dataDir</name>
    <value>/var/zookeeper</value>
  </property>
  <property>
    <name>hbase.rootdir</name>
    <value>hdfs://master.foo.com:9000/hbase</value>
  </property>
  <property>
    <name>hbase.cluster.distributed</name>
    <value>true</value>
  </property>
</configuration>
In this file, you list the nodes that will run region servers. In our example, we run region servers on all but the head node master.foo.com, which is carrying the HBase Master and the HDFS name node.
host1.foo.com
host2.foo.com
host3.foo.com
host4.foo.com
host5.foo.com
host6.foo.com
host7.foo.com
host8.foo.com
host9.foo.com
Here are the lines that were changed from the default in the supplied hbase-env.sh file. Here we are setting the HBase heap to be 4 GB instead of the default 1 GB:
...
# export HBASE_HEAPSIZE=1000
export HBASE_HEAPSIZE=4096
...
Once you have edited the configuration files, you need to distribute them across all servers in the cluster. One option to copy the content of the conf directory to all servers in the cluster is to use the rsync command on Unix and Unix-like platforms. This approach and others are explained in Deployment.
Configuration discusses the settings you are most likely to change first when you start scaling your cluster.
Since the HBase Master may move around between physical machines (see Adding a backup master for details), clients start by requesting the vital information from ZooKeeper—something visualized in Region Lookups. For that reason, clients require the ZooKeeper quorum information in an hbase-site.xml file that is on their Java CLASSPATH.
You can also set the hbase.zookeeper.quorum configuration key in your code. Doing so would lead to clients that need no external configuration files. This is explained in Put Method.
If you are configuring an IDE to run an HBase client, you could include the conf/ directory on your classpath. That would make the configuration files discoverable by the client code.
Minimally, a Java client needs the following JAR files specified in its CLASSPATH when connecting to HBase: hbase, hadoop-core, zookeeper, log4j, commons-logging, and commons-lang. All of these JAR files come with HBase and are usually postfixed with the version number of the required release. Ideally, you use the supplied JARs and do not acquire them somewhere else, because even minor release changes could cause problems when running the client against a remote HBase cluster.
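One way to pick up the supplied JARs is to build the CLASSPATH from the installation directory itself. The following is a sketch, assuming a typical tarball layout (conf directory, hbase JAR at the top level, dependencies under lib); the paths are examples:

```shell
#!/bin/sh
# Sketch: assemble a client CLASSPATH from the JARs shipped with an HBase
# install. The directory layout is an assumption based on a typical tarball.
build_classpath() {
  dir="$1"          # HBase installation directory
  cp="$dir/conf"    # configuration first, so hbase-site.xml is found
  for jar in "$dir"/hbase-*.jar "$dir"/lib/*.jar; do
    if [ -e "$jar" ]; then
      cp="$cp:$jar"
    fi
  done
  echo "$cp"
}

# Example usage: CLASSPATH=$(build_classpath /usr/local/hbase) java MyClient
```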
A basic example hbase-site.xml file for client applications might contain the following properties:
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <property>
    <name>hbase.zookeeper.quorum</name>
    <value>zk1.foo.com,zk2.foo.com,zk3.foo.com</value>
  </property>
</configuration>
After you have configured HBase, the next thing you need to do is to think about deploying it on your cluster. There are many ways to do that, and since Hadoop and HBase are written in Java, there are only a few necessary requirements to look out for. You can simply copy all the files from server to server, since they usually share the same configuration. Here are some ideas on how to do that. Please note that you would need to make sure that all the suggested selections and adjustments discussed in Requirements have been applied—or are applied at the same time when provisioning new servers.
Using a script-based approach may seem archaic compared to the more advanced approaches listed shortly, but scripts serve their purpose and do a good job for small to medium-size clusters. It is not so much the size of the cluster that matters as the number of people maintaining it. In a larger operations group, you want repeatable deployment procedures, not someone having to run scripts to update the cluster.
The scripts make use of the fact that the regionservers configuration file has a list of all servers in the cluster. Example 2-2 shows a very simple script that could be used to copy a new release of HBase from the master node to all slave nodes.
#!/bin/bash
# Rsyncs HBase files across all slaves. Must run on master. Assumes
# all files are located in /usr/local
if [ "$#" != "2" ]; then
  echo "usage: $(basename $0) <dir-name> <ln-name>"
  echo " example: $(basename $0) hbase-0.1 hbase"
  exit 1
fi

SRC_PATH="/usr/local/$1/conf/regionservers"

for srv in $(cat $SRC_PATH); do
  echo "Sending command to $srv..."
  rsync -vaz --exclude='logs/*' /usr/local/$1 $srv:/usr/local/
  ssh $srv "rm -fR /usr/local/$2 ; ln -s /usr/local/$1 /usr/local/$2"
done

echo "done."
Another simple script is shown in Example 2-3; it can be used to copy the configuration files of HBase from the master node to all slave nodes. It assumes you are editing the configuration files on the master in such a way that the master can be copied across to all region servers.
#!/bin/bash
# Rsync's HBase config files across all region servers. Must run on master.
for srv in $(cat /usr/local/hbase/conf/regionservers); do
  echo "Sending command to $srv..."
  rsync -vaz --delete --exclude='logs/*' /usr/local/hadoop/ $srv:/usr/local/hadoop/
  rsync -vaz --delete --exclude='logs/*' /usr/local/hbase/ $srv:/usr/local/hbase/
done

echo "done."
The second script uses rsync just like the first script, but adds the --delete option to make sure the region servers do not have any older files remaining but have an exact copy of what is on the originating server.
There are obviously many ways to do this, and the preceding examples are simply for your perusal and to get you started. Ask your administrator to help you set up mechanisms to synchronize the configuration files appropriately. Many beginners in HBase have run into a problem that was ultimately caused by inconsistent configurations among the cluster nodes. Also, do not forget to restart the servers when making changes. If you want to update settings while the cluster is in production, please refer to Rolling Restarts.
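One quick way to catch such inconsistencies is to compare checksums of the configuration files across nodes. Here is a minimal local sketch; the remote loop is only indicated in a comment, and the hostnames and paths are examples:

```shell
#!/bin/sh
# Sketch: detect configuration drift by comparing file checksums.
same_config() {
  a=$(md5sum "$1" | cut -d' ' -f1)
  b=$(md5sum "$2" | cut -d' ' -f1)
  [ "$a" = "$b" ]
}

# On a real cluster you might collect checksums from each node, for example:
# for srv in $(cat /usr/local/hbase/conf/regionservers); do
#   ssh $srv md5sum /usr/local/hbase/conf/hbase-site.xml
# done
```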
Recently, we have seen an increase in the number of users who want to run their clusters in dynamic environments, such as the public cloud offerings Amazon EC2 or Rackspace Cloud Servers, as well as in private server farms, using open source tools like Eucalyptus.
The advantage is to be able to quickly provision servers and run analytical workloads and, once the result has been retrieved, to simply shut down the entire cluster, or reuse the servers for other dynamic loads. Since it is not trivial to program against each of the APIs providing dynamic cluster infrastructures, it would be useful to abstract the provisioning part and, once the cluster is operational, simply launch the MapReduce jobs the same way you would on a local, static cluster. This is where Apache Whirr comes in.
Whirr—available at http://incubator.apache.org/whirr/[47]—has support for a variety of public and private cloud APIs and allows you to provision clusters running a range of services. One of those is HBase, giving you the ability to quickly deploy a fully operational HBase cluster on dynamic setups.
You can download the latest Whirr release from the aforementioned site and find preconfigured configuration files in the recipes directory. Use it as a starting point to deploy your own dynamic clusters.
The basic concept of Whirr is to use very simple machine images that already provide the operating system (see Operating system) and SSH access. The rest is handled by Whirr using services that represent, for example, Hadoop or HBase. Each service executes every required step on each remote server to set up the user accounts, download and install the required software packages, write out configuration files for them, and so on. This is all highly customizable and you can add extra steps as needed.
Similar to Whirr, there are other deployment frameworks for dedicated machines. Puppet by Puppet Labs and Chef by Opscode are two such offerings.
Both work similarly to Whirr in that they have a central provisioning server that stores all the configurations, combined with client software, executed on each server, that communicates with the central server to receive updates and apply them locally.
Also similar to Whirr, both have the notion of recipes, which essentially translate to scripts or commands executed on each node.[48] In fact, it is quite possible to replace the scripting employed by Whirr with a Puppet- or Chef-based process.
While Whirr solely handles the bootstrapping, Puppet and Chef have further support for changing running clusters. Their master process monitors the configuration repository and, upon updates, triggers the appropriate remote action. This can be used to reconfigure clusters on-the-fly or push out new releases, do rolling restarts, and so on. It can be summarized as configuration management, rather than just provisioning.
You heard it before: select an approach you like and maybe even are familiar with already. In the end, they achieve the same goal: installing everything you need on your cluster nodes. If you need a full configuration management solution with live updates, a Puppet- or Chef-based approach—maybe in combination with Whirr for the server provisioning—is the right choice.
Now that you have set up the servers, configured the operating system and filesystem, and edited the configuration files, you are ready to start your HBase cluster for the first time.
Make sure HDFS is running first. Start and stop the Hadoop HDFS daemons by running bin/start-dfs.sh over in the HADOOP_HOME directory. You can ensure that it started properly by testing the put and get of files into the Hadoop filesystem. HBase does not normally use the MapReduce daemons. You only need to start them for actual MapReduce jobs, something we will look into in detail in Chapter 7.
If you are managing your own ZooKeeper, start it and confirm that it is running; otherwise, HBase will start ZooKeeper for you as part of its startup process.
Just as you started the standalone mode in Quick-Start Guide, you start a fully distributed HBase with the following command:
bin/start-hbase.sh
Run the preceding command from the HBASE_HOME directory. You should now have a running HBase instance. The HBase logfiles can be found in the logs subdirectory. If you find that HBase is not working as expected, please refer to Analyzing the Logs for help finding the problem.
Once HBase has started, see Quick-Start Guide for information on how to create tables, add data, scan your insertions, and finally, disable and drop your tables.
HBase also starts a web-based user interface (UI) listing vital attributes. By default, it is deployed on the master host at port 60010 (HBase region servers use 60030 by default). If the master is running on a host named master.foo.com on the default port, you can point your browser at http://master.foo.com:60010 to see the master's home page. Figure 2-2 is an example of how the resultant page should look. You can find a more detailed explanation in Web-based UI.
From this page you can access a variety of status information about your HBase cluster. The page is separated into multiple sections. The top part has the attributes pertaining to the cluster setup. You can see the currently running tasks—if there are any. The catalog and user tables list details about the available tables. For the user table you also see the table schema.
The lower part of the page has the region servers table, giving you access to all the currently registered servers. Finally, the region in transition list informs you about regions that are currently being maintained by the system.
After you have started the cluster, you should verify that all the region servers have registered themselves with the master and appear in the appropriate table with the expected hostnames (that a client can connect to). Also verify that you are indeed running the correct version of HBase and Hadoop.
You already used the command-line shell that comes with HBase when you went through Quick-Start Guide. You saw how to create a table, add and retrieve data, and eventually drop the table.
The HBase Shell is (J)Ruby’s IRB with some HBase-related commands added. Anything you can do in IRB, you should be able to do in the HBase Shell. You can start the shell with the following command:
$ $HBASE_HOME/bin/hbase shell

HBase Shell; enter 'help<RETURN>' for list of supported commands.
Type "exit<RETURN>" to leave the HBase Shell
Version 0.91.0-SNAPSHOT, r1130916, Sat Jul 23 12:44:34 CEST 2011

hbase(main):001:0>
Type help and then press Return to see a listing of shell commands and options. Browse at least the paragraphs at the end of the help text for the gist of how variables and command arguments are entered into the HBase Shell; in particular, note how table names, rows, and columns must be quoted. Find the full description of the shell in Shell.
Since the shell is JRuby-based, you can mix Ruby with HBase commands, which enables you to do things like this:
hbase(main):001:0> create 'testtable', 'colfam1'

hbase(main):002:0> for i in 'a'..'z' do for j in 'a'..'z' do
put 'testtable', "row-#{i}#{j}", "colfam1:#{j}", "#{j}" end end
The first command creates a new table named testtable, with one column family called colfam1, using default values (see Column Families for what that means). The second command uses a Ruby loop to create rows with columns in the newly created table. It creates row keys starting with row-aa, row-ab, all the way to row-zz.
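To see what that loop produces without starting HBase, the same key sequence can be sketched in plain shell (a simulation of the key enumeration only, not an HBase command):

```shell
#!/bin/sh
# Sketch: enumerate the row keys the JRuby loop creates, row-aa through row-zz.
letters="a b c d e f g h i j k l m n o p q r s t u v w x y z"
count=0
first=""
for i in $letters; do
  for j in $letters; do
    key="row-$i$j"
    if [ -z "$first" ]; then first="$key"; fi
    last="$key"
    count=$((count + 1))
  done
done
echo "$first .. $last ($count keys)"   # row-aa .. row-zz (676 keys)
```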
To stop HBase, enter the following command. Once you have started the script, you will see a message stating that the cluster is being stopped, followed by "." (period) characters printed at regular intervals (just to indicate that the process is still running, not to give you any percentage feedback or some other hidden meaning):
$ ./bin/stop-hbase.sh
stopping hbase...............
Shutdown can take several minutes to complete. It can take longer if your cluster is composed of many machines. If you are running a distributed operation, be sure to wait until HBase has shut down completely before stopping the Hadoop daemons.
Chapter 12 has more on advanced administration tasks—for example, how to do a rolling restart, add extra master nodes, and more. It also has information on how to analyze and fix problems when the cluster does not start, or shut down.
[27] See “Multi-core processor” on Wikipedia.
[30] This assumes 100 IOPS per drive, and 100 MB/second per drive.
[31] DistroWatch has a list of popular Linux and Unix-like operating systems and maintains a ranking by popularity.
[32] See this post on the Ars Technica website. Google hired the main developer of ext4, Theodore Ts’o, who announced plans to keep working on ext4 as well as other Linux kernel features.
[33] See CHANGES.txt in branch-0.20-append to see a list of patches involved in adding append on the Hadoop 0.20 branch.
[34] This is very likely to change after this book is printed. Consult with the online configuration guide for the latest details; especially the section on Hadoop.
[35] A useful document on setting configuration values on your Hadoop cluster is Aaron Kimball’s “Configuration Parameters: What can you just ignore?”.
[36] See “Uniform Resource Identifier” on Wikipedia.
[37] A full list was compiled by Tom White in his post “Get to Know Hadoop Filesystems”.
[38] See “Amazon S3” for more background information.
[40] https://issues.apache.org/jira/browse/HBASE?report=com.atlassian.jira.plugin.system.project:changelog-panel.
[41] https://issues.apache.org/jira/browse/HBASE?report=com.atlassian.jira.plugin.system.project:changelog-panel#selectedTab=com.atlassian.jira.plugin.system.project%3Achangelog-panel.
[42] Processes that are started and then run in the background to perform their task are often referred to as daemons.
[43] The pseudodistributed versus fully distributed nomenclature comes from Hadoop.
[44] For the full list of ZooKeeper configurations, see ZooKeeper’s zoo.cfg. HBase does not ship with that file, so you will need to browse the conf directory in an appropriate ZooKeeper download.
[45] Be careful when editing XML. Make sure you close all elements. Check your file using a tool like xmlint, or something similar, to ensure well-formedness of your document after an edit session.
[46] As of this writing, you have to restart the server. However, work is being done to enable online schema and configuration changes, so this will change over time.
[47] Please note that Whirr is still part of the incubator program of the Apache Software Foundation. Once it is accepted and promoted to a full member, its URL is going to change to a permanent place.
[48] Some of the available recipe packages are an adaptation of early EC2 scripts, used to deploy HBase to dynamic, cloud-based servers. For Chef, you can find HBase-related examples at http://cookbooks.opscode.com/cookbooks/hbase. For Puppet, please refer to http://hstack.org/hstack-automated-deployment-using-puppet/ and the repository with the recipes at http://github.com/hstack/puppet.