Calculating the data size stored in HBase

In the case of any database, whether it is an RDBMS or a NoSQL store, we always need to work out the record size in order to plan the storage needed, or to do capacity planning. Even a few extra bytes per record can change the estimated storage size drastically. For example, suppose each record carries one extra byte and we have around one billion records; that single extra byte alone requires around 1 GB of disk space.

Now, let's consider this data size calculation in the case of HBase. Take a table named employee, with fields such as the row key, the column family, the column qualifier, and the value. In HBase, each value is stored fully qualified, which means every column value of a record is stored together with the row key we assign. So, let's now consider the space requirement.

As HBase stores data in the key-value format, let's make an approximation. We will consider the row key to be employee1.

Field | Size/type
Key size | Int (4 bytes)
Value size | Int (4 bytes)
Row size | Short (2 bytes)
Row data (row key) | Byte array
Column family size | Byte (1 byte)
Column family data | Byte array
Column (qualifier) | Byte array
Timestamp | Long (8 bytes)
Key type | Byte (1 byte)
Actual value | Byte array

Let's calculate the fixed-size part of the requirement: 4 + 4 + 2 + 1 + 8 + 1 = 20 bytes. For the remaining parts, we need to add the byte array sizes of the different values, so the total size is: Total = fixed size + variable size.
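As a quick sketch of this arithmetic, the following Python snippet computes the per-cell estimate; the row key, column family, qualifier, and value used here (employee1, cf, name, John) are illustrative choices only, and real HBase storage adds further block and index overhead on top of this figure.

# Rough per-cell (key-value) size estimate for HBase, based on the fixed
# overhead listed above: key size (4) + value size (4) + row size (2) +
# column family size (1) + timestamp (8) + key type (1) = 20 bytes.
FIXED_OVERHEAD = 20

def cell_size(row_key: bytes, col_family: bytes, qualifier: bytes, value: bytes) -> int:
    """Approximate size of one cell: fixed overhead plus the byte array lengths."""
    variable = len(row_key) + len(col_family) + len(qualifier) + len(value)
    return FIXED_OVERHEAD + variable

# Illustrative example: row key 'employee1', family 'cf', qualifier 'name', value 'John'.
print(cell_size(b"employee1", b"cf", b"name", b"John"))  # 20 + 9 + 2 + 4 + 4 = 39 bytes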

If we assume the variable part (the row key, the column family name, the column qualifier, and the value) comes to roughly another 20 bytes, each cell takes about 40 bytes. With one billion records, the total size will be around 40 bytes * one billion = 40 billion bytes, which is around 40 GB. We can scale this calculation by the number of columns and rows in the HBase table. HBase tables also offer compression, using which we can reduce the storage requirement drastically.
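The same estimate can be scaled to a whole table as a back-of-the-envelope calculation; the 40 bytes per cell below is the approximation from the text, and the single-column assumption is only for illustration.

# Scale the per-cell estimate to a whole table: total = rows * columns * bytes per cell.
BYTES_PER_CELL = 40            # approximate fixed + variable size per cell
ROWS = 1_000_000_000           # one billion records
COLUMNS = 1                    # assume one column per record for this example

total_bytes = ROWS * COLUMNS * BYTES_PER_CELL
print(f"Estimated raw size: {total_bytes / 1e9:.0f} GB")  # around 40 GB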

We can implement compression while creating the table, as follows:

hbase> create 'tableWithCompression', {NAME => 'colFam', COMPRESSION => 'SNAPPY'}

This applies the Snappy compression algorithm to the records inserted into the HBase table. There are other compression algorithms we can use besides Snappy, such as LZF, LZO, and ZLIB.

Some benchmark figures for these algorithms follow, and the choice of algorithm should be made accordingly. Have a look at the following table:

Algorithm | I/O performance | Compression ratio achieved
ZLIB | Degraded | Best compression, around 45 percent to 50 percent
LZO | Around 4 percent to 6 percent | Around 41 percent to 45 percent
LZF | Around 20 percent to 22 percent | Around 38 percent to 40 percent
Snappy | Around 24 percent to 28 percent | Around 38 percent to 41 percent

Also, compression depends on the type of data present in the table, so the compression ratio should be evaluated accordingly. If we need more compression and can accept lower performance, we can always go with ZLIB, and if we need performance with average compression, we can choose Snappy or whichever algorithm suits the data in our table.
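As a rough illustration of this trade-off, the sketch below applies the midpoints of the ratio ranges quoted above to the ~40 GB raw estimate; it assumes the quoted percentage is the fraction of space saved, which may not match how your own data actually compresses.

# Apply the approximate compression figures to the ~40 GB raw estimate,
# treating each percentage as the rough fraction of space saved (an assumption).
RAW_GB = 40
savings = {"ZLIB": 0.48, "LZO": 0.43, "LZF": 0.39, "Snappy": 0.40}  # midpoints of the quoted ranges

for algo, saved in savings.items():
    print(f"{algo:7s}: about {RAW_GB * (1 - saved):.0f} GB on disk")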
