Deciding the number of versions

Next, let's look at the minimum and maximum number of versions that can be kept for the cells of a table. These settings are held in HColumnDescriptor, which describes a column family: the number of versions, the compression settings, and so on. An HColumnDescriptor is supplied when a table is created or a column family is added.

Lower bound of versions

The default minimum number of versions is 0, which means the feature is disabled. The minimum number of versions works in conjunction with Time To Live (TTL), and it can be set to 0 or more according to the requirements of the use case. With a value of 0, no version is guaranteed to survive once its TTL has expired; with a value of 1 or more, that many of the newest versions are kept even after they expire. The minimum should be lower than the maximum number of versions.

Upper bound of versions

In older HBase releases, the default maximum number of versions is 3, which keeps up to three copies of a cell, distinguished by timestamp; since HBase 0.96, the default is 1. It is advisable not to make the maximum very large, because every extra version costs storage. Around 100 can be treated as a practical upper bound, although it is not a hard limit and bigger numbers are allowed. The right maximum depends solely on the use case and the data to be stored in the HBase table.

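Once the maximum number of versions is reached, inserting new data does not overwrite the latest value. Instead, the oldest version falls outside the limit: reads return the newly inserted value plus the most recent previously stored versions, up to the maximum, and the excess versions are physically removed during a major compaction.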
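The behavior at the version limit can be illustrated with a small, self-contained Java model. This is only a sketch of the retention rule, not the HBase client API: the class VersionedCellSketch and its methods are hypothetical names invented for illustration, and the model evicts eagerly on every put, whereas HBase filters excess versions on read and removes them at compaction.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.TreeMap;

// Toy model of a single cell with a maximum number of versions.
// Hypothetical class for illustration; it does not use or mimic the HBase API.
final class VersionedCellSketch {
    private final int maxVersions;
    // Keys are timestamps sorted newest first, as HBase returns versions.
    private final TreeMap<Long, String> versions =
            new TreeMap<>(Collections.<Long>reverseOrder());

    VersionedCellSketch(int maxVersions) {
        if (maxVersions < 1) {
            throw new IllegalArgumentException("maxVersions must be >= 1");
        }
        this.maxVersions = maxVersions;
    }

    // Stores a value; a put with an existing timestamp replaces that version.
    void put(long timestamp, String value) {
        versions.put(timestamp, value);
        // Simplification: drop the oldest versions immediately once over the limit.
        while (versions.size() > maxVersions) {
            versions.pollLastEntry(); // last entry = smallest (oldest) timestamp
        }
    }

    // Returns the retained values from newest to oldest.
    List<String> values() {
        return new ArrayList<>(versions.values());
    }

    public static void main(String[] args) {
        VersionedCellSketch cell = new VersionedCellSketch(3);
        cell.put(1L, "v1");
        cell.put(2L, "v2");
        cell.put(3L, "v3");
        cell.put(4L, "v4"); // a fourth version pushes out the oldest one
        System.out.println(cell.values()); // prints [v4, v3, v2]
    }
}
```

The newest value is always retained; it is the oldest version, v1, that disappears when the fourth value arrives.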

Setting the maximum very high can drastically increase the size of the store files (if every cell actually accumulates that many versions), which requires more storage and adds overhead when reading the store files.
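A rough back-of-the-envelope estimate makes the storage impact concrete. The sketch below is plain Java arithmetic under stated assumptions: every cell is filled up to the version limit, each stored version costs a fixed average number of bytes, and compression is ignored. The class name, method name, and the sample figures (one million rows, ten columns, 100 bytes per stored version) are all hypothetical.

```java
// Worst-case storage estimate for a column family, assuming every cell
// holds the maximum number of versions. Hypothetical helper, not an HBase API.
final class VersionStorageSketch {

    // rows * columnsPerRow * versions * avgCellBytes, with overflow checking.
    static long worstCaseBytes(long rows, long columnsPerRow,
                               long versions, long avgCellBytes) {
        long cells = Math.multiplyExact(rows, columnsPerRow);
        long bytesPerCell = Math.multiplyExact(versions, avgCellBytes);
        return Math.multiplyExact(cells, bytesPerCell);
    }

    public static void main(String[] args) {
        // Assumed sample figures, chosen only for illustration.
        long withThree = worstCaseBytes(1_000_000L, 10L, 3L, 100L);
        long withHundred = worstCaseBytes(1_000_000L, 10L, 100L, 100L);
        System.out.println(withThree);   // prints 3000000000
        System.out.println(withHundred); // prints 100000000000
    }
}
```

Under these assumptions, going from 3 to 100 versions takes the worst case from about 3 GB to about 100 GB before compression.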

In the HBase shell, the maximum number of versions is set per column family as follows:

hbase>create 'Tablewithversion', {NAME => 'colFamily1', VERSIONS => 50}

The preceding command creates a table that keeps up to 50 versions of each cell in the column family colFamily1. If we need to change a version count that is already defined, we can use the following command:

hbase> alter 'Tablewithversion', {NAME => 'colFamily1', VERSIONS => 100}

The preceding command changes the maximum number of versions from 50 to 100.

Note

The version feature of HBase can be used as a data-retention technique: keeping more versions preserves more history. Defining a TTL is another way to keep data up to a certain point in time; a cell is kept until its age, measured from its timestamp, exceeds the TTL. Using these two features together, we can retain historical data for a table. Both settings are defined per column family, so different tables and column families can keep different numbers of versions according to the requirements of the data they store.

When the TTL expires, all the expired data is deleted and no version remains available, so the TTL must be chosen so that data that is still needed is not marked for deletion too early. Newer versions of HBase provide an option, the minimum number of versions, to define how many versions are kept even after the TTL has passed.

For instance, a retention policy can combine the following:

Keep <specific number of versions of data>. The maximum number of copies kept for a cell is the number of versions we define.

Keep the data till <TTL>. This keeps the data until the TTL (time to live) expires. In newer versions of HBase, the configured minimum number of versions remains even after the TTL has expired.
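How the maximum number of versions, the TTL, and the minimum number of versions interact can be sketched with another small Java model. This is an illustration of the rule "keep data within the TTL, at most N versions, but at least M versions", not HBase code: RetentionSketch and survivors are hypothetical names, and the TTL here uses the same unit as the timestamps, whereas HBase expresses the column family TTL in seconds.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Toy retention rule for the versions of one cell.
// Hypothetical helper for illustration; it does not call the HBase API.
final class RetentionSketch {

    // Returns the timestamps that survive, newest first.
    static List<Long> survivors(List<Long> timestamps, long now, long ttl,
                                int minVersions, int maxVersions) {
        List<Long> sorted = new ArrayList<>(timestamps);
        sorted.sort(Comparator.reverseOrder()); // newest first

        List<Long> kept = new ArrayList<>();
        // Versions ranked at or beyond maxVersions are never kept.
        for (int rank = 0; rank < sorted.size() && rank < maxVersions; rank++) {
            long ts = sorted.get(rank);
            boolean expired = now - ts > ttl;
            // An expired version survives only if it is among the newest minVersions.
            if (!expired || rank < minVersions) {
                kept.add(ts);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<Long> written = Arrays.asList(100L, 200L, 300L, 900L);
        // now = 1000, ttl = 150: only the version written at 900 is still fresh.
        System.out.println(survivors(written, 1000L, 150L, 0, 3)); // prints [900]
        System.out.println(survivors(written, 1000L, 150L, 2, 3)); // prints [900, 300]
    }
}
```

With the minimum set to 0, only unexpired data remains; with the minimum set to 2, the second-newest version is kept although its TTL has already passed.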

If we need to keep deleted values rather than remove them from the table, we can set KEEP_DELETED_CELLS on the column family, for example:

hbase> alter 'Tablewithversion', {NAME => 'colFamily1', KEEP_DELETED_CELLS => true}

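This is done using the HBase shell. To do the same through the Java API, call the setter on an HColumnDescriptor instance (newer releases accept a KeepDeletedCells enum value in place of the boolean):

HColumnDescriptor colFamily = new HColumnDescriptor("colFamily1");
colFamily.setKeepDeletedCells(true);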