Services of HBase

The HBase data model terminology is listed as follows:

  • Table
  • Row
  • Column family
  • Column
  • Cell

Let's have a look at each of them in detail.

Row key

This is a unique key for each record in an HBase table. Internally, it is represented as a byte array. Whatever type we choose for the row key (string, long, date, or a serialized object), it is converted to a byte array before being stored, both on disk and in memory. For example, Emp_ID can be the row key for an employee table.
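To make the conversion concrete, here is a minimal sketch in Python of how different key types might end up as byte arrays. The helper name encode_row_key and the sample values are illustrative assumptions, not part of HBase; the 8-byte big-endian layout for longs is chosen to resemble what the HBase Java client typically produces.

```python
import struct
from datetime import datetime, timezone


def encode_row_key(value):
    """Turn a str, int, or datetime into bytes (hypothetical helper)."""
    if isinstance(value, bytes):
        return value
    if isinstance(value, str):
        return value.encode("utf-8")
    if isinstance(value, datetime):
        # Dates are reduced to epoch milliseconds, then encoded as a long.
        value = int(value.timestamp() * 1000)
    if isinstance(value, int):
        # 8-byte big-endian signed integer.
        return struct.pack(">q", value)
    raise TypeError(f"unsupported row key type: {type(value).__name__}")


string_key = encode_row_key("E1001")
long_key = encode_row_key(1001)
date_key = encode_row_key(datetime(2015, 1, 1, tzinfo=timezone.utc))
```

Whichever type we start from, the store only ever sees the resulting bytes, so the application has to remember which encoding it used in order to decode the key again.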

Column family

A column family groups related columns of an HBase table. Suppose we have columns such as name, dob, salary, city, phone, pin, and landmark in an employee table. We can group these columns into two column families: Basic_Detail(name, dob, salary) and Address(city, phone, pin, landmark). The benefit is faster retrieval: because each column family is stored separately on disk, a read that needs only one family does not have to touch the others.
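The grouping can be pictured as a nested mapping, sketched below in Python. This is only a mental model under simple assumptions: the field values are made up, and get_family and qualified_columns are hypothetical helpers rather than HBase calls.

```python
# One employee row, with columns grouped under the two column families.
employee_row = {
    "Basic_Detail": {"name": "Asha", "dob": "1985-04-12", "salary": 52000},
    "Address": {
        "city": "Pune",
        "phone": "555-0100",
        "pin": "411001",
        "landmark": "City Mall",
    },
}


def get_family(row, family):
    """Return a copy of the columns of one family only."""
    return dict(row[family])


def qualified_columns(row):
    """List every column in 'family:column' form."""
    return [
        f"{family}:{column}"
        for family, columns in row.items()
        for column in columns
    ]


address = get_family(employee_row, "Address")
all_columns = qualified_columns(employee_row)
```

Reading the Address family here never looks at Basic_Detail, which mirrors the idea that families are kept apart in storage.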

Column

Each field in a row is called a column. We can have columns such as name, dob, salary, city, phone, and pin in an employee table.

Cell

A cell is the smallest unit of storage in HBase; it sits inside a column and holds the actual value of a field. A cell is addressed by the <row, column family:column, version> tuple. If no version is given, a read returns one version: the latest.

Version

HBase can keep more than one value for a cell identified by the tuple (row, column family, column); each stored value is called a version. A version is identified by a long integer, which is based on the timestamp of the write. By default, HBase keeps three versions of a cell in releases before 0.96 and one version from 0.96 onward; this number can be changed per column family. Versioning is useful when data changes frequently and previous values must be retained. Fetching a value from HBase returns the latest version, and a specific version can be fetched by specifying it.
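The following Python sketch models this behavior with an in-memory dictionary. It is an illustration under stated assumptions, not the HBase client API: the class VersionedTable and its put/get methods are hypothetical, and timestamps are plain integers standing in for epoch milliseconds.

```python
import time


class VersionedTable:
    """Toy store: (row, family, column) -> {timestamp: value}."""

    def __init__(self):
        self._cells = {}

    def put(self, row, family, column, value, timestamp=None):
        # Without an explicit timestamp, use the current time in milliseconds.
        if timestamp is None:
            timestamp = int(time.time() * 1000)
        self._cells.setdefault((row, family, column), {})[timestamp] = value
        return timestamp

    def get(self, row, family, column, timestamp=None):
        versions = self._cells.get((row, family, column), {})
        if not versions:
            return None
        # No version requested: return the value with the highest timestamp.
        if timestamp is None:
            timestamp = max(versions)
        return versions.get(timestamp)


table = VersionedTable()
table.put("E1001", "Basic_Detail", "salary", 50000, timestamp=100)
table.put("E1001", "Basic_Detail", "salary", 55000, timestamp=200)

latest = table.get("E1001", "Basic_Detail", "salary")
older = table.get("E1001", "Basic_Detail", "salary", timestamp=100)
```

A plain get returns the most recent salary, while passing the earlier timestamp retrieves the previous one.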

Timestamp

With every insertion of data, the current timestamp becomes associated with the value. This denotes when the specific value was inserted into a table.

We can visualize the version and timestamp in the following diagram:

[Figure: Timestamp]

So each version of a record has a timestamp attached to it, and an HBase table can hold more than one version, or copy, of the same record. To save space, set the maximum number of versions to 1; to retain previous values, set it higher, for example to 3. Once the maximum number of versions is reached, the oldest version is dropped to make room for the newest one.
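As a rough illustration of the version limit, the Python sketch below keeps at most max_versions values for a single cell and drops the oldest one when a new value arrives. The function put_with_limit is hypothetical, and trimming at write time is a simplifying assumption of this sketch; it says nothing about when HBase physically removes old data.

```python
def put_with_limit(versions, timestamp, value, max_versions=3):
    """Add a version to a {timestamp: value} dict, then trim the oldest extras."""
    versions[timestamp] = value
    while len(versions) > max_versions:
        # The smallest timestamp is the oldest version.
        del versions[min(versions)]
    return versions


salary_versions = {}
for ts, salary in [(100, 50000), (200, 52000), (300, 55000), (400, 60000)]:
    put_with_limit(salary_versions, ts, salary, max_versions=3)

single_version = {}
for ts, salary in [(100, 50000), (200, 52000)]:
    put_with_limit(single_version, ts, salary, max_versions=1)
```

After the fourth write, the value written at timestamp 100 is gone and the three most recent values remain; with max_versions=1, only the latest value is ever kept.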
