Chapter 10. Persistence Using Apache Accumulo

In this chapter, we will cover:

  • Designing a row key to store geographic events in Accumulo
  • Using MapReduce to bulk import geographic event data into Accumulo
  • Setting a custom field constraint for inputting geographic event data in Accumulo
  • Limiting query results using the regex filtering iterator
  • Counting fatalities for different versions of the same key using SumCombiner
  • Enforcing cell-level security on scans using Accumulo
  • Aggregating sources in Accumulo using MapReduce

Introduction

Storage of big data is a topic of ever-increasing popularity. Software projects facing data scalability concerns frequently find themselves paying top dollar for commercial RDBMS licenses or, worse, relying on solutions in which scalability was an afterthought. In the last couple of years, many viable open source database solutions have been introduced to help manage massive amounts of structured and unstructured data. Apache Accumulo was inspired by Google's BigTable design, and offers scalable, distributed, column-oriented persistence of data backed by Apache Hadoop. The BigTable design is explained in detail at http://research.google.com/archive/bigtable.html. This chapter presents several recipes that tackle common database query and load tasks, and shows how many of Accumulo's unique features help streamline their implementation.
