Some advanced topics in brief

In this section, we will briefly discuss some advanced topics that let developers interact with HBase more closely.

Coprocessors

Coprocessors are similar to Linux kernel modules: they provide a way to run server-level code against locally stored data, which is a very powerful capability. A coprocessor runs in-process on each RegionServer, and every region holds references to the coprocessor implementation classes associated with it. These classes can be loaded either from local JAR files on the RegionServer classpath or through the HDFS class loader. Coprocessors are not designed for end users of HBase but for developers who add functionality to HBase. They can be used for server-side operations such as region splits and major compactions, for client-side operations such as create, read, update, and delete, and to implement custom use cases such as user-defined functionality.

Types of coprocessors

The following are the types of coprocessors:

  • Coprocessor: This provides hooks for region life cycle management, such as region open, close, split, flush, and compact operations.
  • RegionObserver: This provides hooks for monitoring table operations issued from the client side, such as get, put, scan, and delete.
  • Endpoint: This provides on-demand triggers for arbitrary functions to be executed at a region. One example is column aggregation at the RegionServer.
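
To make the observer and endpoint ideas concrete, here is a minimal sketch in plain Java. It does not use the HBase API: ToyObserver, ToyRegion, and sumAcross are hypothetical names invented for this illustration, and a real coprocessor would implement the HBase interfaces instead. The sketch only shows the shape of the idea: a hook that runs before a write, and an aggregation that runs next to the data so that only one number per region travels back to the client.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.TreeMap;

/** Toy model of coprocessor-style hooks. Hypothetical names, not the HBase API. */
class ToyCoprocessorDemo {

    /** Observer-style hook: called before a put; returning false rejects the write. */
    interface ToyObserver {
        boolean prePut(String row, long value);
    }

    /** One region: a sorted slice of rows plus the observers loaded for it. */
    static class ToyRegion {
        private final TreeMap<String, Long> rows = new TreeMap<>();
        private final List<ToyObserver> observers = new ArrayList<>();

        void load(ToyObserver observer) {
            observers.add(observer);
        }

        /** Runs every observer hook first; stores the value only if none rejects it. */
        boolean put(String row, long value) {
            for (ToyObserver observer : observers) {
                if (!observer.prePut(row, value)) {
                    return false;
                }
            }
            rows.put(row, value);
            return true;
        }

        /** Endpoint-style call: aggregate where the data lives, return one number. */
        long sum() {
            long total = 0;
            for (long value : rows.values()) {
                total += value;
            }
            return total;
        }
    }

    /** Client side: merge the partial sums computed by each region. */
    static long sumAcross(List<ToyRegion> regions) {
        long total = 0;
        for (ToyRegion region : regions) {
            total += region.sum();
        }
        return total;
    }

    public static void main(String[] args) {
        ToyObserver rejectNegative = (row, value) -> value >= 0;
        ToyRegion first = new ToyRegion();
        ToyRegion second = new ToyRegion();
        first.load(rejectNegative);
        second.load(rejectNegative);

        first.put("row-1", 10);
        first.put("row-2", -5);  // rejected by the observer hook
        second.put("row-9", 32);

        System.out.println(sumAcross(Arrays.asList(first, second))); // prints 42
    }
}
```

The design point is that the client never sees individual rows during the aggregation; each region does its own work and returns a partial result.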

Bloom filters

Bloom filters are a special kind of filter used when a scan can avoid a lot of data. A bloom filter answers either "definitely not present" or "possibly present" for a given key, so HBase can skip internal data lookups and discard the data that we do not need, which speeds up the scanning process. Bloom filters are stored in the metadata of an HFile when it is written and never need to be updated, because HFiles are immutable. These filters implement folding to keep their size down and combinatorial generation to speed up their creation. When an HFile is opened during deployment of regions to a RegionServer, its bloom filter is loaded into memory.
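
The following is a minimal sketch of a bloom filter in plain Java, to show why it can rule a key out without reading the underlying data. It is not HBase's implementation: ToyBloomFilter, the bit-array size, and the hash function are assumptions chosen for illustration. It derives every probe position from two base hashes, which is one common way to do the kind of combinatorial generation mentioned above; folding is not shown.

```java
import java.nio.charset.StandardCharsets;
import java.util.BitSet;

/** Toy bloom filter for illustration only; not HBase's implementation. */
class ToyBloomFilter {
    private final BitSet bits;
    private final int numBits;
    private final int numHashes;

    ToyBloomFilter(int numBits, int numHashes) {
        this.bits = new BitSet(numBits);
        this.numBits = numBits;
        this.numHashes = numHashes;
    }

    /** Simple seeded FNV-1a style hash (an assumption, chosen for brevity). */
    private static int hash(byte[] data, int seed) {
        int h = 0x811C9DC5 ^ seed;
        for (byte b : data) {
            h ^= (b & 0xFF);
            h *= 16777619;
        }
        return h;
    }

    /** Derives the i-th probe position from two base hashes: h1 + i * h2. */
    private int position(byte[] key, int i) {
        int h1 = hash(key, 0);
        int h2 = hash(key, 0x9E3779B9);
        return Math.floorMod(h1 + i * h2, numBits);
    }

    /** Sets every probe bit for the key. */
    void add(String rowKey) {
        byte[] key = rowKey.getBytes(StandardCharsets.UTF_8);
        for (int i = 0; i < numHashes; i++) {
            bits.set(position(key, i));
        }
    }

    /** False means the key was never added; true means it may have been. */
    boolean mightContain(String rowKey) {
        byte[] key = rowKey.getBytes(StandardCharsets.UTF_8);
        for (int i = 0; i < numHashes; i++) {
            if (!bits.get(position(key, i))) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        ToyBloomFilter filter = new ToyBloomFilter(1024, 3);
        filter.add("row-0001");
        filter.add("row-0002");
        // An added key is never reported as absent.
        System.out.println(filter.mightContain("row-0001"));
        // A key that was not added is usually reported absent, but a
        // false positive is possible.
        System.out.println(filter.mightContain("row-9999"));
    }
}
```

A reader that gets false from mightContain can skip the file entirely; a true answer still requires the real lookup, because it may be a false positive.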

Note

The full internal architecture and implementation can be found at https://issues.apache.org/jira/secure/attachment/12444007/Bloom_Filters_in_HBase.pdf.

The Lily project

You can find the following definition at http://www.lilyproject.org/lily/index.html:

"Lily is a data management platform combining planet-sized data storage, indexing and search with on-line, real-time usage tracking, audience analytics and content recommendations. It's a one-stop-platform for any organization confronted with Big Data challenges that seeks rapid implementation, rock-solid performance at scale, and efficiency at management."

"Lily unifies Apache HBase, Hadoop, and Solr into a comprehensively integrated, interactive data platform with easy-to-use access APIs; a high-level data model and schema language; flexible, real-time indexing; and the expressive search power of Apache Solr. Best of all, Lily is open source, allowing anyone to explore and learn what Lily can do."

Features

Lily provides the following features:

  • Easy to use through a high-level schema supporting rich and mixed, structured and unstructured data sets
  • Developer-friendly, powerful, and expressive REST and Java APIs
  • A flexible, configurable indexing system, supporting real-time indexing into Solr

Note

The documentation to configure, install, and get started with Lily can be found at http://docs.ngdata.com/lily-docs-current/414-lily.html.
