Storing data in HBase – logical view versus actual physical view

At a conceptual level, an HBase table can be seen as a sparse set of rows, but physically it is stored per column family. Column families are part of the table definition: we decide their number and names when we create the table (changing them later means altering the table). Columns, on the other hand, can be added to a column family on the fly, at any point while storing data. This is the beauty of the schema-free design in HBase.
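The difference between column families (declared with the table) and columns (added freely at write time) can be sketched with a small in-memory model. This is only an illustration, not the HBase client API: the ToyTable class, its put method, and the sample values are hypothetical, and the family set is frozen purely to keep the sketch short.

```python
class ToyTable:
    """Hypothetical in-memory stand-in for an HBase table (not the HBase API)."""

    def __init__(self, name, column_families):
        self.name = name
        # Column families are declared when the table is created.
        self.column_families = frozenset(column_families)
        # row key -> {"family:qualifier": value}; absent cells are not stored.
        self.rows = {}

    def put(self, row_key, column, value):
        family, _, qualifier = column.partition(":")
        if family not in self.column_families:
            raise KeyError(f"unknown column family: {family}")
        # Any qualifier is accepted inside a known family: no schema change needed.
        self.rows.setdefault(row_key, {})[f"{family}:{qualifier}"] = value


table = ToyTable("student_table", ["student_detail"])
table.put("row1", "student_detail:name", "Asha")
# A brand-new column is added simply by writing to it.
table.put("row1", "student_detail:email", "asha@example.com")
print(table.rows)
```

Writing to a family that was never declared, for example "grades:math", raises an error in this sketch, which mirrors the rule that families belong to the table definition while columns do not.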

The following table shows the logical view of how data is stored in HBase; physically, the data is stored separately for each column family:

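         |              | Column family 1 (CF1) | Column family 2 (CF2)
Row keys | Time_Stamp   | CF1:Col 1 | CF1:Col 2 | CF2:Col 3 | CF2:Col 4 | CF2:Col 5
---------+--------------+-----------+-----------+-----------+-----------+----------
Row1     | Time stamp 1 |           |           | Value 3   | Value 4   | Value 5
Row2     | Time stamp 2 | Value 6   | Value 7   | Value 8   | Value 9   | Value 10
Row2     | Time stamp 3 | Value 11  | Value 12  | Value 13  |           |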

So, in physical storage, this table is stored in two parts, one for column family 1 and one for column family 2, and the data of each column family can be accessed separately.

A column is always represented and accessed using the column family name as a prefix (columnfamilyname:columnname), so that we know which column family is being accessed. Columns that do not contain values are not stored.

The following two tables show how the logical view in the preceding table is stored, one table per column family:

         |              | Column family 1 (CF1)
Row keys | Time_Stamp   | CF1:Col 1 | CF1:Col 2
---------+--------------+-----------+----------
Row2     | Time stamp 2 | Value 6   | Value 7
Row2     | Time stamp 3 | Value 11  | Value 12

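         |              | Column family 2 (CF2)
Row keys | Time_Stamp   | CF2:Col 3 | CF2:Col 4 | CF2:Col 5
---------+--------------+-----------+-----------+----------
Row1     | Time stamp 1 | Value 3   | Value 4   | Value 5
Row2     | Time stamp 2 | Value 8   | Value 9   | Value 10
Row2     | Time stamp 3 | Value 13  |           |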
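The split from the logical view into per-column-family storage can be sketched as follows. This is a simplified model rather than how HBase actually lays out its files: the dictionary layout and the split_by_column_family helper are assumptions made for illustration, Value 13 is assumed to sit in CF2:Col 3, and only the cell values come from the logical table.

```python
from collections import defaultdict

# Logical view: (row key, timestamp) -> {"family:qualifier": value}.
# Empty cells are simply absent, which is what makes the table sparse.
logical_view = {
    ("Row1", "Time stamp 1"): {
        "CF2:Col 3": "Value 3", "CF2:Col 4": "Value 4", "CF2:Col 5": "Value 5",
    },
    ("Row2", "Time stamp 2"): {
        "CF1:Col 1": "Value 6", "CF1:Col 2": "Value 7",
        "CF2:Col 3": "Value 8", "CF2:Col 4": "Value 9", "CF2:Col 5": "Value 10",
    },
    ("Row2", "Time stamp 3"): {
        "CF1:Col 1": "Value 11", "CF1:Col 2": "Value 12", "CF2:Col 3": "Value 13",
    },
}


def split_by_column_family(view):
    """Group cells by column family, keyed by (row key, timestamp, qualifier)."""
    stores = defaultdict(dict)
    for (row_key, timestamp), cells in view.items():
        for column, value in cells.items():
            # The family name is the prefix before the first colon.
            family, qualifier = column.split(":", 1)
            stores[family][(row_key, timestamp, qualifier)] = value
    return dict(stores)


physical_view = split_by_column_family(logical_view)
for family in sorted(physical_view):
    print(family, len(physical_view[family]), "cells")
```

Row1 never appears in the CF1 store because it has no CF1 values, which matches the statement that columns without values are not stored.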

Earlier releases of HBase had tables but nothing equivalent to a database. HBase 0.96 and later versions introduce namespaces, which group tables logically and give a more structured, organized representation and storage of tables. Let's discuss them now.

Namespace

A namespace is a logical grouping of tables, similar to a database in a relational database system, which groups related tables. The following is the typical representation of namespaces:

[Figure: Namespace]

Now, let's discuss the components of a namespace:

  • Table: Every table is a member of some namespace. If no namespace is specified, the table belongs to the default namespace. A table can be a member of only one namespace.
  • RegionServer group: A namespace might have a default RegionServer group. In that case, a table created in the namespace becomes a member of that RegionServer group.
  • Permission: A namespace enables us to define Access Control Lists (ACLs). For example, the write permission gives permission for table creation and for other operations such as read, delete, and update.
  • Quota: This enforces a limit on the number of tables and regions a namespace can contain.
  • Predefined namespaces: The following are the predefined namespaces:
    • default: This namespace holds all the tables for which no namespace is specified.
    • hbase: This is the system namespace. It holds HBase's internal tables, such as the catalog table hbase:meta (named .META. before HBase 0.96) and the ACL table, which are loaded before any other table.
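The membership and quota rules in the preceding list can be sketched with a toy registry. None of this is HBase code: the NamespaceRegistry class, its methods, and the max_tables parameter are hypothetical names chosen for illustration, and region quotas, permissions, and RegionServer groups are left out.

```python
DEFAULT_NAMESPACE = "default"


class NamespaceRegistry:
    """Toy model of namespaces; the class and its methods are hypothetical."""

    def __init__(self):
        # namespace name -> {"max_tables": int or None, "tables": set of names}
        self.namespaces = {DEFAULT_NAMESPACE: {"max_tables": None, "tables": set()}}

    def create_namespace(self, name, max_tables=None):
        if name in self.namespaces:
            raise ValueError(f"namespace already exists: {name}")
        self.namespaces[name] = {"max_tables": max_tables, "tables": set()}

    def create_table(self, full_name):
        # "namespace:table" selects a namespace; a bare name falls back to default.
        namespace, sep, table = full_name.partition(":")
        if not sep:
            namespace, table = DEFAULT_NAMESPACE, full_name
        if namespace not in self.namespaces:
            raise KeyError(f"unknown namespace: {namespace}")
        entry = self.namespaces[namespace]
        limit = entry["max_tables"]
        if limit is not None and len(entry["tables"]) >= limit:
            raise RuntimeError(f"table quota exceeded for namespace: {namespace}")
        entry["tables"].add(table)
        return namespace, table


registry = NamespaceRegistry()
registry.create_namespace("student_namespace", max_tables=1)
print(registry.create_table("student_namespace:student_table"))
print(registry.create_table("plain_table"))
```

Because each table name is resolved to exactly one namespace before it is recorded, a table can never end up as a member of two namespaces in this model.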

Commands available for namespaces

The following are the commands available for namespaces:

  • alter_namespace
  • create_namespace
  • describe_namespace
  • drop_namespace
  • list_namespace
  • list_namespace_tables

We will see how to use these commands when we discuss shell commands in HBase. Keep in mind that they are available in HBase version 0.96.0 and above. Namespaces can be created, altered, and removed. The namespace a table belongs to is decided at the time of table creation. We can create a namespace as follows:

create_namespace '<namespace_name>'

Have a look at the following example:

create_namespace 'student_namespace'

Then, we can create tables in specific namespaces, as follows:

create '<namespace_name>:<table_name>', '<column_family_name>'

Have a look at the following example:

create 'student_namespace:student_table','student_detail'

Once a namespace is created and a table is added to it, the path on HDFS will look like the following:

<ROOT PATH>/data/<NAMESPACE NAME>/<TABLE NAME>
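As a small illustration of this layout, the directory for a table can be assembled from its parts. The table_dir helper and the "/hbase" root path below are assumptions made for the example; the actual root path depends on how the cluster is configured.

```python
import posixpath


def table_dir(root_path, namespace, table):
    """Build <ROOT PATH>/data/<NAMESPACE NAME>/<TABLE NAME> (hypothetical helper)."""
    # posixpath is used because HDFS paths always use forward slashes.
    return posixpath.join(root_path, "data", namespace, table)


# "/hbase" is only an assumed root path for this example.
print(table_dir("/hbase", "student_namespace", "student_table"))
# prints /hbase/data/student_namespace/student_table
```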