Chapter 5. The Storage, Structure Layout, and Data Model of HBase

As the chapter title implies, this chapter discusses the storage and structure layout of HBase in depth. It also covers the HBase data model and its operations. We will look at important topics such as tables, columns, column families, cells, and metadata. The chapter ends with a section on schema design that covers the types of table design and their benefits.

In this chapter, we will discuss the following topics:

  • A data model of HBase
  • Namespaces
  • Data model commands
  • Versioning of records
  • Row key design tips
  • Schema designing basics

Let's get started with a conceptual and physical view of data stored in HBase tables. Then, we will discuss the various components of HBase storage.

The Storage, Structure Layout, and Data Model of HBase

HBase is not centered on relational design. It allows a flexible, scalable table layout that can be shaped around the user's requirements. It provides a single index, on row keys; the row key plays the role that the primary key plays in the relational world. Dividing rows into column families and columns lets us avoid very large read and write operations, and this supports both horizontal and vertical scaling of tables.

An HBase table consists of the following components:

  • Row
    • Column family
      • Column
        • Cell

So, we can think of a row as consisting of column families, a column family as made up of columns, and a column as made up of cells. The data in a table is accessed using row keys.

We can give any name to the row key (there are some suggested guidelines for row key design, which we will discuss later). A column family name should describe the logical group of columns it holds. A column is addressed by its column family and its column name (the column qualifier), as follows:

<columnFamily>:<columnName>
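
The nesting described above can be pictured as a map of maps. The following is only a toy in-memory sketch in plain Java, not HBase code: the class name TableModelSketch and its methods are made up for illustration, string keys are used instead of byte arrays for readability, and cell versions are left out.

```java
import java.nio.charset.StandardCharsets;
import java.util.NavigableMap;
import java.util.TreeMap;

/** Toy model (an assumption, not the HBase API): row key -> column family -> column name -> cell value. */
public class TableModelSketch {
    // A TreeMap keeps row keys sorted, loosely mirroring the single index on row keys.
    private final NavigableMap<String, NavigableMap<String, NavigableMap<String, byte[]>>> rows =
            new TreeMap<>();

    /** Stores a value in the cell addressed by a row key and a "<columnFamily>:<columnName>" string. */
    public void put(String rowKey, String column, byte[] value) {
        String[] parts = splitColumn(column);
        rows.computeIfAbsent(rowKey, k -> new TreeMap<>())
            .computeIfAbsent(parts[0], k -> new TreeMap<>())
            .put(parts[1], value);
    }

    /** Returns the cell value, or null when the row, family, or column does not exist. */
    public byte[] get(String rowKey, String column) {
        String[] parts = splitColumn(column);
        NavigableMap<String, NavigableMap<String, byte[]>> families = rows.get(rowKey);
        if (families == null) {
            return null;
        }
        NavigableMap<String, byte[]> columns = families.get(parts[0]);
        return columns == null ? null : columns.get(parts[1]);
    }

    /** Splits "<columnFamily>:<columnName>" at the first colon. */
    public static String[] splitColumn(String column) {
        int idx = column.indexOf(':');
        if (idx <= 0) {
            throw new IllegalArgumentException("expected <columnFamily>:<columnName>, got " + column);
        }
        return new String[] { column.substring(0, idx), column.substring(idx + 1) };
    }

    public static void main(String[] args) {
        TableModelSketch table = new TableModelSketch();
        // "user#1001", "info", "name", and "email" are hypothetical names for this example.
        table.put("user#1001", "info:name", "Ada".getBytes(StandardCharsets.UTF_8));
        table.put("user#1001", "info:email", "ada@example.com".getBytes(StandardCharsets.UTF_8));
        System.out.println(new String(table.get("user#1001", "info:name"), StandardCharsets.UTF_8));
    }
}
```

Every lookup starts from the row key and then narrows down by column family and column name, which is the access path the component list describes.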

We are now aware of the data model in HBase. Let's move forward to explore the data types in HBase.

Data types in HBase

There are no fancy data types such as String, INT, or Long in HBase; everything is a byte array. HBase is a byte-in, byte-out database: a value is converted into a byte array before it is written through the Put class, it is stored in the cell as bytes, and it is handed back as a byte array through the Result class. The conversion to and from the equivalent byte representation happens on the client side, through the serialization helpers that HBase provides, both when a value is put and when it is fetched.

So, in short, HBase cells hold only byte arrays. Values are encoded into bytes on the way in through Put and decoded from bytes on the way out through Result.
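
To make the byte-in, byte-out idea concrete, here is a minimal sketch of the kind of encoding and decoding involved, written with the JDK only so that it runs without an HBase installation. The class ByteCodecSketch and its method names are hypothetical; in real client code this job is usually done by the HBase utility class org.apache.hadoop.hbase.util.Bytes (for example, its toBytes, toString, and toLong methods).

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

/** JDK-only illustration (an assumption, not the HBase API) of turning values into bytes and back. */
public class ByteCodecSketch {

    /** Encodes a string as UTF-8 bytes. */
    public static byte[] encodeString(String value) {
        return value.getBytes(StandardCharsets.UTF_8);
    }

    /** Decodes UTF-8 bytes back into a string. */
    public static String decodeString(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    /** Encodes a long as 8 bytes in big-endian order (the ByteBuffer default). */
    public static byte[] encodeLong(long value) {
        return ByteBuffer.allocate(Long.BYTES).putLong(value).array();
    }

    /** Decodes 8 big-endian bytes back into a long. */
    public static long decodeLong(byte[] bytes) {
        if (bytes.length != Long.BYTES) {
            throw new IllegalArgumentException("expected " + Long.BYTES + " bytes, got " + bytes.length);
        }
        return ByteBuffer.wrap(bytes).getLong();
    }

    public static void main(String[] args) {
        byte[] nameBytes = encodeString("Ada");  // what would be stored in the cell
        byte[] ageBytes = encodeLong(36L);       // a number is just 8 bytes to the store
        System.out.println(decodeString(nameBytes) + " " + decodeLong(ageBytes));
    }
}
```

The store never learns whether a cell holds a name or a number; the reader has to know which decoder to apply to which column.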

Anything that can be converted into bytes, from a simple string to an image file, can be stored in HBase, as long as it is serializable. A single HBase cell can hold values of up to about 10 to 15 MB. If a value is bigger, it should not be stored in HBase directly; instead, we can store the file on HDFS and store its file path in HBase. It is not advisable to convert a huge file or value into a byte array and store it in HBase; HDFS can host the files, with its underlying distribution, while the file metadata goes into an HBase table.

HBase provides APIs that serialize and deserialize different kinds of data as they are put into and fetched from an HBase table. We will see this in the Java code for HBase in Chapter 8, Coding HBase in Java, and Chapter 9, Advance Coding in Java for HBase.
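
The choice between storing a value inline and storing only a path to it can be sketched as a small decision function. This is an illustrative plain-Java sketch under stated assumptions: the class LargeValueSketch, the column names file:content and file:path, and the 10 MB default cutoff (the lower end of the range mentioned above) are all made up for this example, and the actual upload of the file to HDFS is left out.

```java
import java.nio.charset.StandardCharsets;

/** Decides what to write to a cell: the payload itself, or the path of a file kept elsewhere. */
public class LargeValueSketch {
    // Assumed default cutoff, taken from the lower end of the 10 to 15 MB guideline.
    public static final int DEFAULT_MAX_INLINE_BYTES = 10 * 1024 * 1024;

    /** A planned write: which column to use and which bytes to store in it. */
    public static final class CellWrite {
        public final String column;
        public final byte[] value;

        CellWrite(String column, byte[] value) {
            this.column = column;
            this.value = value;
        }
    }

    /**
     * Small payloads go inline under the hypothetical column "file:content".
     * Larger ones are assumed to be written to hdfsPath by other code, and only
     * the path is stored, under the hypothetical column "file:path".
     */
    public static CellWrite plan(byte[] payload, String hdfsPath, int maxInlineBytes) {
        if (payload.length <= maxInlineBytes) {
            return new CellWrite("file:content", payload);
        }
        return new CellWrite("file:path", hdfsPath.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        byte[] thumbnail = new byte[64 * 1024]; // a small value, stored inline
        CellWrite write = plan(thumbnail, "/data/images/1001.png", DEFAULT_MAX_INLINE_BYTES);
        System.out.println(write.column + " (" + write.value.length + " bytes)");
    }
}
```

Using two different columns is one way for a reader to tell whether a row holds the content itself or only a pointer to it; a flag column would work as well.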
