Letter to the HBase Community

Before we examine the current situation, please allow me to flash back a few years and look at the beginnings of HBase.

In 2007, when I was faced with using a large, scalable data store at literally no cost (the project’s budget would not allow for any), only a few choices were available. You could either use one of the free databases, such as MySQL or PostgreSQL, or a pure key/value store like Berkeley DB. Or you could develop something on your own and open up the playing field, which of course only a few of us were bold enough to attempt, at least in those days.

These solutions might have worked, but one of the major concerns was scalability. It wasn’t well developed in the existing systems and was often an afterthought. I had to store billions of documents, maintain a search index on them, and allow random updates to the data, while keeping index updates short. This led me to the third choice available that year: Hadoop and HBase.

Both had a strong pedigree: they were modeled on systems that came out of Google, a Valhalla of the best talent that could be gathered when it comes to scalable systems. My belief was that if those systems could serve an audience as big as the world, their underlying foundations must be solid. Thus, I proposed to build my project with HBase (and Lucene, as a side note).

Choices were easy back in 2007. But as we flash forward through the years, the playing field grew, and we saw the advent of many competing, or complementing, solutions. The term NoSQL was used to group the increasing number of distributed databases under a common umbrella. A long and sometimes less-than-useful discussion arose around that name alone; to me, what mattered was that the available choices increased rapidly.

The next attempt to frame the various nascent systems was based on how their features compared: strongly consistent versus eventually consistent models, each built to fulfill specific needs. People again tried to put HBase and its peers into this perspective, for example, by using Eric Brewer’s CAP theorem. And yet again a heated discussion ensued about what was most important: being strongly consistent or being able to still serve data despite catastrophic, partial system failures.

And as before, to me, it was all about choices—but I learned that you need to fully understand a system before you can use it. It’s not about slighting other solutions as inferior; today we have a plentiful selection, with overlapping qualities. You have to become a specialist to distinguish them and make the best choice for the problem at hand.

This leads us to HBase and the current day. Without a doubt, its adoption by well-known, large web companies has raised its profile, proving that it can handle the given use cases. These companies have an important advantage: they employ very skilled engineers. On the other hand, a lot of smaller or less fortunate companies struggle to come to terms with HBase and its applications. We need someone to explain in plain, no-nonsense terms how to build easily understood, recurring use cases on top of HBase.

How do you design the schema to store complex data patterns, to trade between read and write performance? How do you lay out the data’s access patterns to saturate your HBase cluster to its full potential? Questions like these are a dime a dozen when you follow the public mailing lists. And that is where Amandeep and Nick come in. Their wealth of real-world experience at making HBase work in a variety of use cases will help you understand the intricacies of using the right data schema and access pattern to successfully build your next project.

What does the future of HBase hold? I believe it holds great things! The same technology is still powering large numbers of products and systems at Google, naysayers of the architecture have been proven wrong, and the community at large has grown into one of the healthiest I’ve ever been involved in. Thank you to all who have treated me as a fellow member; to those who daily help with patches and commits to make HBase even better; to companies that willingly sponsor engineers to work on HBase full time; and to the PMC of HBase, which is the most sincere group of people I have ever had the opportunity to know. You rock.

And finally a big thank-you to Nick and Amandeep for writing this book. It contributes to the value of HBase, and it opens doors and minds. We met before you started writing the book, and you had some concerns. I stand by what I said then: this is the best thing you could have done for HBase and the community. I, for one, am humbled and proud to be part of it.

LARS GEORGE

HBASE COMMITTER
