At a conceptual level, an HBase table can be seen as a sparse set of rows, but physically it is stored per column family. The number and names of the column families must be decided when the table is created, whereas columns can be added to a column family on the fly, at any point in time while storing data. This flexibility is what makes HBase schema-free.
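To make the contrast between fixed column families and free-form columns concrete, here is a minimal Python sketch of the logical view. It is a toy in-memory model, not the HBase client API; the `SparseTable` class and its `put` method are hypothetical names chosen for illustration. Families are fixed in the constructor, while any new column inside a declared family is accepted on write.

```python
class SparseTable:
    """Hypothetical toy model of an HBase table's logical view (not the HBase API)."""

    def __init__(self, families):
        # Column families are fixed when the table is created.
        self.families = frozenset(families)
        # row key -> {"family:qualifier": value}; only written cells exist.
        self.rows = {}

    def put(self, row_key, column, value):
        family, _, qualifier = column.partition(":")
        if family not in self.families:
            raise ValueError(f"unknown column family: {family}")
        # A new column needs no schema change: it is simply stored.
        self.rows.setdefault(row_key, {})[f"{family}:{qualifier}"] = value


table = SparseTable(["CF1", "CF2"])
table.put("Row1", "CF2:Col 3", "Value 3")
table.put("Row1", "CF2:Col 6", "a column added on the fly")  # accepted
try:
    table.put("Row1", "CF3:Col 1", "rejected")  # CF3 was never declared
except ValueError as err:
    print(err)  # unknown column family: CF3
```

Row1 never receives a CF1 entry here, which mirrors the sparse layout: cells that are never written take no space.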
The following table shows the logical view of data stored in HBase; physically, however, the data is stored separately for each column family:

Row key | Time_Stamp | CF1:Col 1 | CF1:Col 2 | CF2:Col 3 | CF2:Col 4 | CF2:Col 5
---|---|---|---|---|---|---
Row1 | Time stamp 1 | | | Value 3 | Value 4 | Value 5
Row2 | Time stamp 2 | Value 6 | Value 7 | Value 8 | Value 9 | Value 10
Row2 | Time stamp 3 | Value 11 | Value 12 | Value 13 | |

So, in physical storage, this table is stored in two parts, one for column family 1 and one for column family 2, and the data of each column family can be accessed separately.
A column is always represented and accessed using its column family name as a prefix (columnfamilyname:columnname), so that we know which column family is being accessed. Columns that do not contain values are not stored. The following two tables show how the logical view in the preceding table is stored per column family:
Column family 1 (CF1):

Row key | Time_Stamp | CF1:Col 1 | CF1:Col 2
---|---|---|---
Row2 | Time stamp 2 | Value 6 | Value 7
Row2 | Time stamp 3 | Value 11 | Value 12

Column family 2 (CF2):

Row key | Time_Stamp | CF2:Col 3 | CF2:Col 4 | CF2:Col 5
---|---|---|---|---
Row1 | Time stamp 1 | Value 3 | Value 4 | Value 5
Row2 | Time stamp 2 | Value 8 | Value 9 | Value 10
Row2 | Time stamp 3 | Value 13 | |

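The regrouping from the logical view into per-family storage can be sketched in a few lines of Python. This is an illustrative model only: `split_by_family` is a hypothetical helper, the integers 1 and 2 stand in for "Time stamp 1" and "Time stamp 2", and only the Row1 and Row2 (time stamp 2) rows of the logical table are used. Empty cells are represented as `None` and dropped, and each family's cells are sorted by row key, then column, then newest timestamp first, which is the order HBase keeps cells in within a column family's store.

```python
from collections import defaultdict

# Subset of the logical view: (row key, timestamp, {column: value}); None = empty cell.
logical_rows = [
    ("Row1", 1, {"CF1:Col 1": None, "CF1:Col 2": None,
                 "CF2:Col 3": "Value 3", "CF2:Col 4": "Value 4", "CF2:Col 5": "Value 5"}),
    ("Row2", 2, {"CF1:Col 1": "Value 6", "CF1:Col 2": "Value 7",
                 "CF2:Col 3": "Value 8", "CF2:Col 4": "Value 9", "CF2:Col 5": "Value 10"}),
]


def split_by_family(rows):
    """Group the non-empty cells of a logical view into one list per column family."""
    stores = defaultdict(list)
    for row_key, timestamp, cells in rows:
        for column, value in cells.items():
            if value is None:  # empty cells are not stored at all
                continue
            family, _, qualifier = column.partition(":")
            stores[family].append((row_key, qualifier, timestamp, value))
    for cells in stores.values():
        # Row key ascending, then column ascending, then newest timestamp first.
        cells.sort(key=lambda cell: (cell[0], cell[1], -cell[2]))
    return dict(stores)


physical = split_by_family(logical_rows)
print(physical["CF1"])  # [('Row2', 'Col 1', 2, 'Value 6'), ('Row2', 'Col 2', 2, 'Value 7')]
```

Row1 ends up with no entry in the CF1 store, because all of its CF1 cells were empty.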
Earlier releases of HBase had no concept equivalent to a database, only tables. HBase 0.96 and later versions introduce namespaces, which group tables logically and give a more structured, organized way to represent and store tables. Let's discuss them now.
A namespace is a logical grouping of tables, similar to the way a database groups related tables in a relational database system. The following is the typical representation of namespaces:
Now, let's discuss the components of a namespace:
The following are the commands available for namespaces:
alter_namespace
create_namespace
describe_namespace
drop_namespace
list_namespace
list_namespace_tables
We will see how these commands are used when we discuss HBase shell commands. Keep in mind that they are available only in HBase version 0.96.0 and later. With them, namespaces can be created, altered, described, listed, and dropped. A table's namespace is decided when the table is created, and the table is added to that namespace at that point. We can create a namespace as follows:
create_namespace '<namespace_name>'
Have a look at the following example:
create_namespace 'student_namespace'
Then, we can create tables in specific namespaces, as follows:
create '<namespace_name>:<table_name>', '<column_family_name>'
Have a look at the following example:
create 'student_namespace:student_table','student_detail'
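The namespace rules described above (a table's namespace is fixed by the `namespace:` prefix when the table is created) can be modeled with a small toy registry. `NamespaceRegistry` and its methods are hypothetical; the method names merely echo the shell commands listed earlier. The sketch also assumes two HBase behaviors: a table created without a prefix goes into the predefined `default` namespace, and only an empty namespace can be dropped.

```python
class NamespaceRegistry:
    """Hypothetical toy model of namespace bookkeeping (not the HBase API)."""

    def __init__(self):
        # HBase predefines "default" (unprefixed tables) and "hbase" (system tables,
        # omitted from this toy).
        self.namespaces = {"default": set(), "hbase": set()}

    def create_namespace(self, name):
        if name in self.namespaces:
            raise ValueError(f"namespace already exists: {name}")
        self.namespaces[name] = set()

    def create_table(self, table_name):
        # The namespace is fixed at creation time by the "namespace:" prefix.
        namespace, sep, table = table_name.partition(":")
        if not sep:
            namespace, table = "default", table_name
        if namespace not in self.namespaces:
            raise ValueError(f"unknown namespace: {namespace}")
        self.namespaces[namespace].add(table)

    def list_namespace_tables(self, name):
        return sorted(self.namespaces[name])

    def drop_namespace(self, name):
        if self.namespaces[name]:
            raise ValueError(f"namespace is not empty: {name}")
        del self.namespaces[name]


registry = NamespaceRegistry()
registry.create_namespace("student_namespace")
registry.create_table("student_namespace:student_table")
registry.create_table("legacy_table")  # hypothetical table with no prefix
print(registry.list_namespace_tables("student_namespace"))  # ['student_table']
try:
    registry.drop_namespace("student_namespace")
except ValueError as err:
    print(err)  # namespace is not empty: student_namespace
```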
Once a namespace is created and a table is added to it, the path on HDFS will look like the following:
<ROOT PATH>/data/<NAMESPACE NAME>/<TABLE NAME>
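The path layout above can be expressed as a small helper. This is a sketch, not HBase code: `table_dir` is a hypothetical function, `/hbase` is an assumed root directory (the real root is whatever the cluster's `hbase.rootdir` points to), and a table name without a `namespace:` prefix is mapped to the `default` namespace.

```python
import posixpath

DEFAULT_NAMESPACE = "default"  # namespace used when a table name has no prefix


def table_dir(root_path, table_name):
    """Build <ROOT PATH>/data/<NAMESPACE NAME>/<TABLE NAME> for a table name."""
    namespace, sep, table = table_name.partition(":")
    if not sep:  # unqualified name such as "student_table"
        namespace, table = DEFAULT_NAMESPACE, table_name
    return posixpath.join(root_path, "data", namespace, table)


print(table_dir("/hbase", "student_namespace:student_table"))
# /hbase/data/student_namespace/student_table
print(table_dir("/hbase", "student_table"))
# /hbase/data/default/student_table
```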