The HBase data model is described by the following terms: row key, column family, column, cell, version, and timestamp.
Let's have a look at each of them in detail.
The row key is a unique key for each record in an HBase table. It is represented internally as a byte array. Whatever type of data (string, long, date, or serialized object) we choose as the row key, it is converted to a byte array before being stored, both on disk and in memory. For example, Emp_ID can be the row key for an employee table.
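To make the byte-array conversion concrete, here is a minimal Python sketch (a toy illustration, not HBase client code). The helper names string_key and long_key are hypothetical. The 8-byte big-endian layout for long values mirrors what the HBase Java utility Bytes.toBytes(long) produces, and the string key is assumed to be encoded as UTF-8.

```python
import struct


def string_key(value: str) -> bytes:
    """Encode a string row key as UTF-8 bytes."""
    return value.encode("utf-8")


def long_key(value: int) -> bytes:
    """Encode an integer row key as an 8-byte big-endian signed long."""
    return struct.pack(">q", value)


# The same Emp_ID stored as text and as a long yields different byte arrays.
emp_id_as_text = string_key("1001")
emp_id_as_long = long_key(1001)

print(emp_id_as_text)  # b'1001'
print(emp_id_as_long)  # b'\x00\x00\x00\x00\x00\x00\x03\xe9'
```

Because HBase sorts rows by the bytes of the row key, the choice of encoding also decides the order in which rows are stored and scanned.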
A column family groups related columns of an HBase table. Suppose we have columns such as name, dob, salary, city, phone, pin, and landmark in an employee table. We can group these columns into two column families: Basic_Detail(name, dob, salary) and Address(city, phone, pin, landmark). The benefit is that the columns of a single family can be retrieved faster, because HBase stores each column family separately on disk.
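The grouping can be pictured with a small Python model. This is only a sketch under stated assumptions: the nested dictionaries stand in for HBase's per-family storage, the sample values are invented, and read_family is a hypothetical helper, not an HBase API.

```python
# Toy model: one dictionary per column family, mimicking the way HBase
# keeps the data of each family in separate files. Values are invented.
employee_row = {
    "Basic_Detail": {"name": "Alice", "dob": "1990-01-01", "salary": 50000},
    "Address": {
        "city": "Pune",
        "phone": "555-0100",
        "pin": "411001",
        "landmark": "Central Park",
    },
}


def read_family(row: dict, family: str) -> dict:
    """Return a copy of one family's columns without touching the others."""
    return dict(row[family])


basic = read_family(employee_row, "Basic_Detail")
print(sorted(basic))  # ['dob', 'name', 'salary']
```

Reading Basic_Detail never has to look at the Address data, which is the effect that separate on-disk storage gives us in HBase.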
Each field in a row is called a column. We can have columns such as name, dob, salary, city, phone, and pin in an employee table.
Cells are the smallest units of storage in HBase; a cell is where the actual value of a field is stored. A cell is addressed using the <row, column family:column, version> tuple. If no version is specified, a read returns a single version, the most recent one.
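The addressing tuple can be sketched in Python as a dictionary key. This is a simplified model, not HBase code: the store, the sample values, and the get_cell helper are hypothetical, and small integers stand in for version numbers.

```python
# Toy cell store keyed by (row, "family:column", version). Values are invented.
cells = {
    ("emp1", "Basic_Detail:salary", 1): 50000,
    ("emp1", "Basic_Detail:salary", 2): 55000,
    ("emp1", "Address:city", 1): "Pune",
}


def get_cell(store, row, family, column, version=None):
    """Return one cell value, or None when the cell does not exist.

    With version=None the highest (most recent) version is returned.
    """
    qualifier = f"{family}:{column}"
    if version is not None:
        return store.get((row, qualifier, version))
    versions = [v for (r, q, v) in store if r == row and q == qualifier]
    return store[(row, qualifier, max(versions))] if versions else None


print(get_cell(cells, "emp1", "Basic_Detail", "salary"))             # 55000
print(get_cell(cells, "emp1", "Basic_Detail", "salary", version=1))  # 50000
print(get_cell(cells, "emp1", "Address", "pin"))                     # None
```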
HBase is able to maintain more than one value for a cell identified by the tuple (row, column family, column); each such value is called a version of the record. A version is specified as a long integer and is based on a timestamp. HBase releases before 0.96 keep three versions by default, and later releases keep one; in either case, we can change the number of versions for a column family to suit our needs. For example, if the data changes frequently and we need to retain previous values too, we can use versioning. Fetching a value from HBase returns the latest version, and we can get a specific version by specifying it.
The timestamp is associated with a value on every insertion of data; unless the client supplies one, the current time is used. It denotes when the specific value was inserted into the table.
We can visualize the version and timestamp in the following diagram:
So, each version of a record has a timestamp attached to it, and an HBase table can hold more than one version, or copy, of a record. If you want to save space, you can set the maximum number of versions to 1; if you want to retain earlier values as well, you can set it to a higher number, such as 3. Once the maximum number of versions is reached, the oldest version is dropped to make room for the new one.
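The interplay of versions, timestamps, and the version limit can be sketched with a small Python class. This is a toy model under stated assumptions: VersionedCell and its methods are hypothetical names, the salary values are invented, and the model discards the oldest version immediately on write, whereas HBase itself cleans up surplus versions in the background.

```python
import time


class VersionedCell:
    """Toy model of one cell that keeps at most max_versions values."""

    def __init__(self, max_versions=3):
        # max_versions is assumed to be at least 1.
        self.max_versions = max_versions
        self._versions = {}  # timestamp (int) -> value

    def put(self, value, timestamp=None):
        # Without an explicit timestamp, use the current time in milliseconds.
        if timestamp is None:
            timestamp = int(time.time() * 1000)
        self._versions[timestamp] = value
        # Drop the oldest versions once the limit is exceeded.
        for old in sorted(self._versions)[:-self.max_versions]:
            del self._versions[old]

    def get(self, timestamp=None):
        """Return the latest value, or the value stored at a given timestamp."""
        if not self._versions:
            return None
        if timestamp is None:
            timestamp = max(self._versions)
        return self._versions.get(timestamp)

    def timestamps(self):
        """Return the retained timestamps, newest first."""
        return sorted(self._versions, reverse=True)


cell = VersionedCell(max_versions=3)
for ts, salary in [(100, 40000), (200, 45000), (300, 50000), (400, 55000)]:
    cell.put(salary, timestamp=ts)

print(cell.get())         # 55000
print(cell.get(200))      # 45000
print(cell.get(100))      # None
print(cell.timestamps())  # [400, 300, 200]
```

After the fourth write, the value written at timestamp 100 is gone, because only the three most recent versions are retained.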