Appendix D. Distributions

The Apache releases are not the only way to install HBase. This appendix lists the alternative distributions that are available.

Cloudera’s Distribution Including Apache Hadoop

Cloudera’s Distribution including Apache Hadoop (hereafter CDH) is based on the most recent stable version of Apache Hadoop with numerous patches, backports, and updates. Cloudera makes the distribution available in a number of different formats: source and binary tar files, RPMs, Debian packages, VMware images, and scripts for running CDH in the cloud. CDH is free, released under the Apache 2.0 license, and available at http://www.cloudera.com/hadoop/.

To simplify deployment, Cloudera hosts packages on public yum and apt repositories. CDH enables you to install and configure Hadoop and HBase on each machine using a single command. Kickstart users can commission entire Hadoop clusters without manual intervention.

CDH manages cross-component versions and provides a stable platform with a compatible set of packages that work together. As of CDH3, the following packages are included, many of which are covered elsewhere in this book:

HDFS

Self-healing distributed filesystem

MapReduce

Powerful, parallel data processing framework

Hadoop Common

A set of utilities that support the Hadoop subprojects

HBase

Hadoop database for random read/write access

Hive

SQL-like queries and tables on large data sets

Pig

Dataflow language and compiler

Oozie

Workflow for interdependent Hadoop jobs

Sqoop

Integrates databases and data warehouses with Hadoop

Flume

Highly reliable, configurable streaming data collection

ZooKeeper

Coordination service for distributed applications

Hue

User interface framework and SDK for visual Hadoop applications

Whirr

Library for running Hadoop and HBase in the cloud

For HBase, CDH addresses the problem of running a truly reliable cluster, as it includes all of the HDFS patches required for durability. The Hadoop project itself has no officially supported release in the 0.20.x family with the additions needed to guarantee that no data is lost when a server crashes.

To download CDH, visit http://www.cloudera.com/downloads/.
