As the chapter name implies, this chapter discusses the storage and structural layout of HBase in depth. It also covers the HBase data model and its operations, including important topics such as tables, columns, column families, cells, and metadata. The chapter ends with a section on schema design, which covers the types of table design and their benefits.
In this chapter, we will discuss the following topics:
Let's get started with a conceptual and physical view of data stored in HBase tables. Then, we will discuss the various components of HBase storage.
HBase is not centered on relational design. Instead, it is open to a more flexible design driven by the user's requirements, which gives the user a flexible and scalable table layout. It provides a single index, on the row key, which plays the role of the primary key in the relational world. Dividing rows into column families and columns lets us avoid very large read and write operations in HBase, and this supports both horizontal and vertical scaling of tables.
An HBase table consists of the following components:
So, we can think of a row as consisting of column families; a column family is made up of columns, and the columns are made up of cells. The data in a table is accessed using row keys.
We can give any name to the row key (but there are some suggested parameters for row key design, which we will discuss later). A column family name should describe the logical group of columns it holds. A column is addressed by its column family and column name, as follows:
<columnFamily>:<columnName>
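To make the nesting concrete, the layout described above can be sketched as maps within maps. This is a minimal in-memory illustration, not HBase client code: the class TableModelSketch, its methods, and the sample row and column names are all hypothetical, and the sketch ignores everything HBase does beyond the row, column family, column, and cell hierarchy.

```java
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical in-memory model of the logical layout; not the HBase API.
final class TableModelSketch {
    // row key -> column family -> column name -> cell value (raw bytes)
    private final Map<String, Map<String, Map<String, byte[]>>> rows = new TreeMap<>();

    // Splits "<columnFamily>:<columnName>" at the first colon.
    static String[] splitColumn(String column) {
        int idx = column.indexOf(':');
        if (idx <= 0) {
            throw new IllegalArgumentException(
                "expected <columnFamily>:<columnName>, got: " + column);
        }
        return new String[] { column.substring(0, idx), column.substring(idx + 1) };
    }

    // Stores a cell value under the given row key and column.
    void put(String rowKey, String column, byte[] value) {
        String[] parts = splitColumn(column);
        rows.computeIfAbsent(rowKey, k -> new TreeMap<>())
            .computeIfAbsent(parts[0], k -> new TreeMap<>())
            .put(parts[1], value);
    }

    // Returns the cell value, or null when the row, family, or column is absent.
    byte[] get(String rowKey, String column) {
        String[] parts = splitColumn(column);
        Map<String, Map<String, byte[]>> families = rows.get(rowKey);
        if (families == null) {
            return null;
        }
        Map<String, byte[]> columns = families.get(parts[0]);
        return columns == null ? null : columns.get(parts[1]);
    }

    public static void main(String[] args) {
        TableModelSketch table = new TableModelSketch();
        // "row-001" and "personal:name" are made-up example names.
        table.put("row-001", "personal:name", "Alice".getBytes(StandardCharsets.UTF_8));
        byte[] cell = table.get("row-001", "personal:name");
        System.out.println(new String(cell, StandardCharsets.UTF_8));
    }
}
```

Every lookup starts from the row key, which mirrors the point that data in a table is accessed using row keys.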
We are now aware of the data model in HBase. Let's move forward to explore the data types in HBase.
There are no fancy data types such as String, INT, or Long in HBase; it's all byte arrays. It's a kind of byte-in and byte-out database: when a value is inserted, it is converted into a byte array and handed over through the Put class, and when it is read, it comes back as a byte array through the Result class. The data is serialized to byte arrays on the way in, stored in the cell as bytes, and returned as byte arrays on the way out, so the conversion to and from the equivalent representation happens while putting and getting the value.
So, in short, we can say that HBase cells only hold byte arrays. Put and Result methods handle encoding and decoding of objects.
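The byte-in, byte-out idea can be illustrated with the JDK alone. The following is a minimal sketch under the assumption that values are encoded as UTF-8 strings or big-endian integers; the class ByteCodecSketch and its method names are hypothetical. The HBase client library ships its own Bytes utility class for this kind of conversion; the sketch avoids it only so that it runs without HBase on the classpath.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical helper showing how typed values become byte arrays and back.
final class ByteCodecSketch {
    static byte[] fromString(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    static String toStringValue(byte[] b) {
        return new String(b, StandardCharsets.UTF_8);
    }

    // ByteBuffer writes big-endian by default: 4 bytes for an int.
    static byte[] fromInt(int v) {
        return ByteBuffer.allocate(Integer.BYTES).putInt(v).array();
    }

    static int toInt(byte[] b) {
        return ByteBuffer.wrap(b).getInt();
    }

    // 8 bytes for a long, again big-endian.
    static byte[] fromLong(long v) {
        return ByteBuffer.allocate(Long.BYTES).putLong(v).array();
    }

    static long toLong(byte[] b) {
        return ByteBuffer.wrap(b).getLong();
    }

    public static void main(String[] args) {
        byte[] name = fromString("Alice");
        byte[] age = fromInt(42);
        byte[] visits = fromLong(1234567890123L);

        // The stored form carries no type information; the reader must know
        // which decoder to apply to which column.
        System.out.println(toStringValue(name));
        System.out.println(toInt(age));
        System.out.println(toLong(visits));
    }
}
```

Because a cell holds nothing but bytes, the application has to remember which type each column was written with and decode it the same way.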
Anything that can be converted into bytes, from a simple string to an image file, can be stored in HBase, as long as it is a serializable object. Values of up to 10 to 15 MB can be stored in an HBase cell. If a value is bigger than that, it is better not to store it in HBase; instead, we can store the file on HDFS and store the file path in HBase. Converting a huge file or value into a byte array and storing it in HBase is not advisable; HDFS can host the files with its underlying distribution, while the file metadata goes into an HBase table.
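The choice between storing a value in the cell and storing only a file path can be sketched as a simple size check. This is an illustration under stated assumptions: the cutoff of 10 MB is taken from the low end of the range mentioned above, and the class LargeValueSketch, its methods, and the sample path are hypothetical. Writing the file to HDFS itself is outside the scope of the sketch.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical decision helper; it neither talks to HBase nor writes to HDFS.
final class LargeValueSketch {
    // Assumed cutoff: the low end of the 10 to 15 MB range.
    static final int MAX_INLINE_BYTES = 10 * 1024 * 1024;

    static boolean fitsInCell(long sizeInBytes) {
        return sizeInBytes <= MAX_INLINE_BYTES;
    }

    // Returns the bytes to place in the cell: the value itself when it is
    // small enough, otherwise the UTF-8 bytes of the external file path.
    static byte[] cellBytesFor(byte[] value, String externalPath) {
        return fitsInCell(value.length)
            ? value
            : externalPath.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String path = "hdfs:///data/blobs/scan-0001.png"; // made-up example path
        byte[] small = new byte[1024];
        byte[] large = new byte[MAX_INLINE_BYTES + 1];

        // The small value is stored as is; the large one is replaced by its path.
        System.out.println(cellBytesFor(small, path).length);
        System.out.println(new String(cellBytesFor(large, path), StandardCharsets.UTF_8));
    }
}
```

In a real table, the inline value and the path would typically live in different columns so that a reader can tell them apart; the sketch leaves that out to keep the size check in focus.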
HBase provides APIs that serialize and deserialize different data to be put into an HBase table and fetched from an HBase table. We will see this in Java coding for HBase in Chapter 8, Coding HBase in Java, and Chapter 9, Advance Coding in Java for HBase.