Foreword

At a high level, HBase is like the atomic bomb. Its basic operation can be explained on the back of a napkin over a drink (or two). Its deployment is another matter.

HBase is composed of multiple moving parts. The distributed HBase application is made up of client and server processes. Then there is the Hadoop Distributed File System (HDFS) to which HBase persists. HBase uses yet another distributed system, Apache ZooKeeper, to manage its cluster state. Most deployments throw in MapReduce to assist with bulk loading or running distributed full-table scans. It can be tough to get all the pieces pulling together in any approximation of harmony.

Setting up the proper environment and configuration for HBase is critical. HBase is a general data store that can be used in a wide variety of applications. It ships with defaults that are conservatively targeted at a common use case and a generic hardware profile. Its ergonomic ability (its facility for self-tuning) is still under development, so you have to match HBase to your hardware and load, and this configuration can take a couple of attempts to get right.

But proper configuration isn’t enough. If your HBase data-schema model is out of alignment with how the data store is being queried, no amount of configuration can compensate. You can achieve huge improvements when the schema agrees with how the data is queried. If you come from the realm of relational databases, you aren’t used to modeling schema. Although there is some overlap, making a columnar data store like HBase hum involves a different bag of tricks from those you use to tweak, say, MySQL.

If you need help with any of these dimensions, or with others such as how to add custom functionality to the HBase core or what a well-designed HBase application should look like, this is the book for you. In this timely, very practical text, Amandeep and Nick explain in plain language how to use HBase. It’s the book for those looking to get a leg up in deploying HBase-based applications.

Nick and Amandeep are the lads to learn from. They’re both long-time HBase practitioners. I recall the time Amandeep came to one of our early over-the-weekend Hackathons in San Francisco—a good many years ago now—where a few of us huddled around his well-worn ThinkPad trying to tame his RDF on an early version of an HBase student project.

He has been paying the HBase community back ever since by helping others on the project mailing lists. Nick showed up not long after and has been around the HBase project in one form or another since that time, mostly building stuff on top of it. These boys have done the HBase community a service by taking the time out to research and codify their experience in a book.

You could probably get by with this text and an HBase download, but then you’d miss out on what’s best about HBase. A functional, welcoming community of developers has grown up around the HBase project and is all about driving the project forward. This community is what we—members such as myself and the likes of Amandeep and Nick—are most proud of. Although some big players contribute to HBase’s forward progress—Facebook, Huawei, Cloudera, and Salesforce, to name a few—it’s not the corporations that make a community. It’s the participating individuals who make HBase what it is. You should consider joining us. We’d love to have you.

MICHAEL STACK

CHAIR OF THE APACHE HBASE PROJECT MANAGEMENT COMMITTEE
