The Apache releases are not the only way to install HBase. This section lists the available alternatives.
Cloudera’s Distribution including Apache Hadoop (hereafter CDH) is based on the most recent stable version of Apache Hadoop with numerous patches, backports, and updates. Cloudera makes the distribution available in a number of different formats: source and binary tar files, RPMs, Debian packages, VMware images, and scripts for running CDH in the cloud. CDH is free, released under the Apache 2.0 license and available at http://www.cloudera.com/hadoop/.
To simplify deployment, Cloudera hosts packages on public yum and apt repositories. CDH enables you to install and configure Hadoop, and HBase, on each machine using a single command. Kickstart users can commission entire Hadoop clusters without manual intervention.
CDH manages cross-component versions and provides a stable platform with a compatible set of packages that work together. As of CDH3, the following packages are included, many of which are covered elsewhere in this book:
HDFS: Self-healing distributed filesystem
MapReduce: Powerful, parallel data processing framework
Hadoop Common: A set of utilities that support the Hadoop subprojects
HBase: Hadoop database for random read/write access
Hive: SQL-like queries and tables on large data sets
Pig: Dataflow language and compiler
Oozie: Workflow for interdependent Hadoop jobs
Sqoop: Integrates databases and data warehouses with Hadoop
Flume: Highly reliable, configurable streaming data collection
ZooKeeper: Coordination service for distributed applications
Hue: User interface framework and SDK for visual Hadoop applications
Whirr: Library for running Hadoop, and HBase, in the cloud
For HBase, CDH addresses the problem of running a truly reliable cluster: it includes all the HDFS patches required for durability. The Apache Hadoop project itself has no officially supported release in the 0.20.x family with the additions needed to guarantee that no data is lost when a server crashes.
To download CDH, visit http://www.cloudera.com/downloads/.