Chapter 18. HBCK and Inconsistencies

HBase Filesystem Layout

Like any database or filesystem, HBase can run into inconsistencies between what it believes its metadata looks like and what its filesystem actually looks like, and the inverse can be true as well. Before getting into debugging HBase inconsistencies, it is important to understand the layout of HBase's metadata master table, known as hbase:meta, and how HBase is laid out on HDFS. Looking at the meta table name hbase:meta, the hbase before the : indicates the namespace the table lives in, and the meta after the : is the name of the table. Namespaces are used for logical grouping of similar tables, typically in multitenant environments. Out of the box, two namespaces are used: default and hbase. default is where all tables without a namespace specified are created, and hbase is used for HBase internal tables. For right now, we are going to focus on hbase:meta. HBase's meta table is used to store important pieces of information about the regions in the HBase tables. Here is a sample output of an HBase instance with one user table named odell:

hbase(main):002:0> describe 'hbase:meta'
DESCRIPTION
 'hbase:meta', {TABLE_ATTRIBUTES => {IS_META => 'true', coprocessor$1 =>
 '|org.apache.hadoop.hbase.coprocessor.MultiRowMutationEndpoint|536870911|'},
 {NAME => 'info', DATA_BLOCK_ENCODING => 'NONE', BLOOMFILTER => 'NONE',
 REPLICATION_SCOPE => '0', COMPRESSION => 'NONE', VERSIONS => '10', TTL =>
 'FOREVER', MIN_VERSIONS => '0', KEEP_DELETED_CELLS => 'false', BLOCKSIZE =>
 '8192', IN_MEMORY => 'true', BLOCKCACHE => 'true'}

The important pieces of information to glean from this output are as follows:

IS_META => true

This means that the table you are describing is the meta table. Hopefully this comes as no surprise!

NAME => info

There is only one column family in the meta table called info. We look deeper into what is stored in this column family next.

IN_MEMORY => true, BLOCKCACHE => true

The meta table and HBase indexes are both stored in the block cache, so it is important never to set the block cache below the necessary amount (usually 0.1, or 10% of the heap, is sufficient for this).

Reading META

Looking at the following block of data, it is clear that meta is not very fun to read natively, but it is a necessary evil when troubleshooting HBase inconsistencies:

hbase(main):010:0> scan 'hbase:meta', {STARTROW => 'odell,,'}
ROW                                                                COLUMN+CELL

odell,,1412793323534.aa18c6b576bd8fe3eaf71382475bade8.
column=info:regioninfo, timestamp=1412793324037, value={ENCODED => aa18c6b576b...
column=info:seqnumDuringOpen, timestamp=1412793324138, value=\x00\x00\x00\x00...
column=info:server, timestamp=1412793324138, value=odell-test-5.ent.cloudera.c...
column=info:serverstartcode, timestamp=1412793324138, value=1410381620515


odell,ccc,1412793397646.3eadbb7dcbfeee47e8751b356853b17e.
column=info:regioninfo, timestamp=1412793398180, value={ENCODED => 3eadbb7dcbf...
column=info:seqnumDuringOpen, timestamp=1412793398398, value=\x00\x00\x00\x00...
column=info:server, timestamp=1412793398398, value=odell-test-3.ent.cloudera.c...
column=info:serverstartcode, timestamp=1412793398398, value=1410381620376

...truncated

The first column of the output is the row key:

odell,,1412793323534.aa18c6b576bd8fe3eaf71382475bade8.

and:

odell,ccc,1412793397646.3eadbb7dcbfeee47e8751b356853b17e.

The meta row key is broken down into table name, start key, timestamp, encoded region name, and a trailing . (yes, the . is necessary). When troubleshooting, the most important pieces are the table name, the encoded region name, and the start key, because it is important that these match up as expected. The remaining columns are all of the key/value pairs for this row. In this case, there is one column family named info and four column qualifiers: regioninfo, seqnumDuringOpen, server, and serverstartcode. There are a few values to make particular note of when looking at the meta table for a particular region:

info:regioninfo

Contains the encoded region name, the row key, the start key, and the stop key.

info:seqnumDuringOpen

Used for later HBase features such as shadow regions, but is currently not important for troubleshooting.

info:server

Contains the information about the RegionServer the region is assigned to (this will become quite useful when troubleshooting unassigned regions).

info:serverstartcode

Contains the start time of the RegionServer that the region is assigned to.
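If you only need the entries for a single region, a get against hbase:meta with the full row key from the scan shown earlier returns just that row, which is easier to read than a full scan. For example:

hbase(main):011:0> get 'hbase:meta', 'odell,,1412793323534.aa18c6b576bd8fe3eaf71382475bade8.'

Adding a column such as 'info:server' as a third argument narrows the output further to a single cell.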

Reading HBase on HDFS

Looking at the layout of meta can be a good indicator of what the HBase region structure should look like, but there are times when meta can be misleading or incorrect. In these cases, HDFS is the source of truth for HBase. It is just as important to be able to read the HDFS file layout of HBase as it is to read the meta table. We are now going to explore the HDFS layout on disk:

-bash-4.1$ hadoop fs -ls /hbase
Found 9 items
drwxr-xr-x   - hbase hbase          0 2014-10-08 12:14 /hbase/.hbase-snapshot
drwxr-xr-x   - hbase hbase          0 2014-08-26 10:36 /hbase/.migration
drwxr-xr-x   - hbase hbase          0 2014-09-30 06:48 /hbase/.tmp
drwxr-xr-x   - hbase hbase          0 2014-09-10 13:40 /hbase/WALs
drwxr-xr-x   - hbase hbase          0 2014-10-10 21:33 /hbase/archive
drwxr-xr-x   - hbase hbase          0 2014-08-28 08:49 /hbase/data
-rw-r--r--   3 hbase hbase         42 2014-08-28 08:53 /hbase/hbase.id
-rw-r--r--   3 hbase hbase          7 2014-08-28 08:49 /hbase/hbase.version
drwxr-xr-x   - hbase hbase          0 2014-10-14 06:45 /hbase/oldWALs

The first set of directories, which begin with a ., are all internal HBase directories that do not contain any table data. The .hbase-snapshot directory is fairly self-explanatory: it contains all of the current HBase snapshots. Next is the .migration directory, which is utilized in upgrades from one HBase version to the next. The .tmp directory is where files are temporarily created; FSUtils and HBCK will also take advantage of this space. For example, the hbase.version file is created here and then moved to /hbase once fully written, and HBCK will use the space when merging regions or performing any other operations that involve changing the FS layout. The WALs directory contains all of the currently active WALs that have not yet been rolled or that need to be split during a restart.

The archive directory is related directly to .hbase-snapshot and is reserved for snapshot use only; it holds the HFiles that are being protected by an HBase snapshot rather than being deleted. The hbase.id file contains the unique ID of the cluster. The hbase.version file holds a string representation of the current version of HBase. The oldWALs directory is used in direct correlation with HBase replication: any WALs that still need to be replayed to the destination cluster are written here rather than deleted when rolled. This buildup typically happens whenever there are communication issues between the source and destination clusters.

For the exercise of troubleshooting inconsistencies, we will be focused on the data directory. The data directory is aptly named: it contains the data for HBase. Let's take a deeper look at the directory structure for data:

-bash-4.1$ hadoop fs -ls /hbase/data
Found 2 items
drwxr-xr-x   - hbase hbase          0 2014-10-08 11:31 /hbase/data/default
drwxr-xr-x   - hbase hbase          0 2014-08-28 08:53 /hbase/data/hbase

The next layer contains the namespaces for this HBase instance. In the preceding example, there are only two namespaces: default and hbase:

-bash-4.1$ hadoop fs -ls /hbase/data/default
Found 1 items
drwxr-xr-x   - hbase hbase          0 2014-10-08 11:40 /hbase/data/default/odell
-bash-4.1$ hadoop fs -ls /hbase/data/hbase
Found 2 items
drwxr-xr-x   - hbase hbase          0 2014-08-28 08:53 /hbase/data/hbase/meta
drwxr-xr-x   - hbase hbase          0 2014-08-28 08:53 /hbase/.../namespace

The next level down shows the tables' names. As in the preceding code snippet, we have a table called odell in our default namespace, and we have two tables, meta and namespace, in the hbase namespace. We are going to focus on what odell looks like going forward:

-bash-4.1$ hadoop fs -ls /hbase/data/default/odell
Found 5 items
drwxr-xr-x 2014-10-08 11:31 /hbase/data/default/odell/.tabledesc
drwxr-xr-x 2014-10-08 11:31 /hbase/data/default/odell/.tmp
drwxr-xr-x 2014-10-08 11:36 /hbase/data/default/odell/3eadbb7dcbfeee47e875...
drwxr-xr-x 2014-10-08 11:36 /hbase/data/default/odell/7450bb77ac287b9e77ad...
drwxr-xr-x 2014-10-08 11:35 /hbase/data/default/odell/aa18c6b576bd8fe3eaf71...

The .tabledesc directory contains a file typically named .tableinfo.000000xxxx, where the trailing digits are a sequence number. The tableinfo file contains the same information presented when running a describe from the HBase shell. This includes details such as whether the table is the meta table, the data block encoding type, the Bloom filter used, version count, compression, and so on. It is important to preserve the tableinfo file when attempting to duplicate tables using distcp (we highly recommend using snapshots instead). The .tmp directory is used to write the tableinfo file, which is then moved to .tabledesc once it is complete. Next, we have the encoded region names; glancing back at the meta output, you will notice that these directory names should match the ENCODED values shown in info:regioninfo. Under each encoded region directory is:

-bash-4.1$ hadoop fs -ls -R /hbase/data/default/odell/3ead...
-rwxr-xr-x 2014-10-08 11:36 /hbase/data/default/odell/3ead.../.regioninfo
drwxr-xr-x 2014-10-08 11:36 /hbase/data/default/odell/3ead.../.tmp
drwxr-xr-x 2014-10-08 11:36 /hbase/data/default/odell/3ead.../cf1
-rwxr-xr-x 2014-10-08 11:36 /hbase/data/default/odell/3ead.../cf1/5cadc83fc35d...

The .regioninfo file contains the region's metadata: the same region name, encoded name, start key, and end key found in info:regioninfo in the meta table. The .tmp directory at the individual region level is used for rewriting storefiles during major compactions. Finally, there is a directory for each column family in the table, which will contain the storefiles if any data has been written to disk for that column family.
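If you need to verify what a given storefile actually contains, HBase ships with an HFile pretty-printer that can dump a file's metadata. A minimal sketch of its use follows; the storefile path is truncated in the listing above, so substitute the full HDFS path of the file you want to inspect:

-bash-4.1$ # -m prints the HFile metadata (first/last key, entry count, and so on)
-bash-4.1$ sudo -u hbase hbase org.apache.hadoop.hbase.io.hfile.HFile -m -f \
  /hbase/data/default/odell/<encoded region name>/cf1/<storefile>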

General HBCK Overview

Now that we have an understanding of the HBase internals in meta and on the filesystem, let's look at what HBase looks like logically when everything is intact. In the preceding examples, we have one table named odell with three regions, the first beginning at the empty start key and the second beginning at ccc, as seen in the meta scan. It always helps to be able to visualize that kind of layout:

Table 18-1. HBCK data visualization

            Region 1   Region 2   Region 3   ...   Region 24   Region 25   Region 26
Start key   ''         aaa        bbb        ...   xxx         yyy         zzz
End key     aaa        bbb        ccc        ...   yyy         zzz         ''

Table 18-1 is a logical diagram of a fictitious HBase table covering the alphabet. Every set of keys is assigned to an individual region, and the table starts and ends with the empty start and end keys (shown as ''), which catch any row keys that sort before the first or after the last defined boundary.
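A layout like the one in Table 18-1 is what you get when a table is pre-split on those boundaries. As a minimal sketch in the HBase shell (the table name alphabet and column family cf1 are made up for illustration, and only a few of the split points are shown):

hbase(main):001:0> create 'alphabet', 'cf1', SPLITS => ['aaa', 'bbb', 'ccc', 'xxx', 'yyy', 'zzz']

The first and last regions automatically receive the empty start and end keys, so every possible row key falls into exactly one region.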

Earlier versions of HBase were prone to inconsistencies through bad splits, failed merges, and incomplete region cleanup operations. The later versions of HBase are quite solid, and we rarely run into inconsistencies. But as with life, nothing in this world is guaranteed, and software can have faults, so it is always best to be prepared for anything. The go-to tool for repairing inconsistencies in HBase is known as the HBCK tool. This tool is capable of repairing most issues you will encounter with HBase. The HBCK tool can be executed by running hbase hbck from the CLI:

-bash-4.1$ sudo -u hbase hbase hbck
14/10/15 05:23:24 INFO Client environment:zookeeper.version=3.4.5-cdh5.1.2--1,...
14/10/15 05:23:24 INFO Client environment:host.name=odell-test-1.ent.cloudera.com
14/10/15 05:23:24 INFO Client environment:java.version=1.7.0_55
14/10/15 05:23:24 INFO Client environment:java.vendor=Oracle Corporation
14/10/15 05:23:24 INFO Client environment:java.home=/usr/java/jdk1.7.0_55-clou...
...truncated...
Summary:
hbase:meta is okay.
Number of regions: 1
Deployed on: odell-test-5.ent.cloudera.com,60020,1410381620515
odell is okay.
Number of regions: 3
Deployed on: odell-test-3.ent.cloudera.com,60020,1410381620376
hbase:namespace is okay.
Number of regions: 1
Deployed on: odell-test-4.ent.cloudera.com,60020,1410381620086
0 inconsistencies detected.
Status: OK

The preceding output shows a healthy HBase instance: all of the regions are assigned, META is correct, all of the region info is correct in HDFS, and all of the regions are currently consistent. If everything is running as expected, there should be 0 inconsistencies detected and a status of OK. There are a few ways that HBase can become corrupt. We will take a deeper look at some of the more common scenarios:

  • Bad region assignments

  • Corrupt META

  • HDFS holes

  • HDFS orphaned regions

  • Region overlaps

Using HBCK

When dealing with inconsistencies, it is very common for false positives to be present and make the situation look more dire than it really is. For example, a corrupt META can cause numerous HDFS overlaps or holes to show up when the underlying FS is actually fine. The heaviest-handed way to run HBCK is with only the -repair flag, which executes all of the following repair options in a row:

-fixAssignments
-fixMeta
-fixHdfsHoles
-fixHdfsOrphans
-fixHdfsOverlaps
-fixVersionFile
-sidelineBigOverlaps
-fixReferenceFiles
-fixTableLocks

This is great when you are working with an experimental or development instance, but might not be ideal when dealing with production or preproduction instances. One of the primary reasons to be careful with executing just the -repair flag is the -sidelineBigOverlaps behavior it includes. If there are overly large overlaps, HBase will sideline regions outside of HBase, and they will have to be bulk loaded back into the correct region assignments. Without a full understanding of every flag's implications, it is possible to make the issue worse than it already is. It is recommended to take a pragmatic approach and start with the less impactful flags.
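A good first step in that pragmatic approach is to run HBCK read-only with the -details flag, which prints a full report of every region without changing anything:

-bash-4.1$ sudo -u hbase hbase hbck -details

The extra detail makes it easier to judge which of the fix flags, if any, are actually needed.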

Log Everything

Before you start running any HBCK repair commands, make sure you are either logging to an external file or your terminal is capturing all commands and output.

The first two flags we typically prefer to run are -fixAssignments and -fixMeta. The -fixAssignments flag repairs unassigned regions, incorrectly assigned regions, and regions with multiple assignments. The -fixMeta flag removes meta rows when the corresponding regions are not present in HDFS and adds new meta rows if regions are present in HDFS but missing from META; HBase uses HDFS as the underlying source of truth for the correct layout of META. In HBase, region assignments are controlled through the Assignment Manager, which keeps the current state of HBase in memory. If the region assignments were out of sync with HBase and META, it is safe to assume they are also out of sync in the Assignment Manager. The fastest way to update the Assignment Manager to the correct values produced by HBCK is to perform a rolling restart of your HBase Master nodes. After restarting the HBase Master nodes, it is time to run HBCK again.
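The two flags can be combined in a single pass. For example:

-bash-4.1$ sudo -u hbase hbase hbck -fixAssignments -fixMeta

Follow the run with a plain hbase hbck to confirm whether the inconsistency count has dropped.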

If after rerunning HBCK the end result is not “0 inconsistencies detected,” then it is time to use some heavier-handed commands to correct the outstanding issues. The three other major issues that could still be occurring are HDFS holes, HDFS overlaps, and HDFS orphans.

If running the -fixMeta and -fixAssignments flags did not resolve the inconsistencies, we would recommend contacting your friendly neighborhood Hadoop vendor for more detailed instructions. If, on the other hand, you are handling this yourself, we would recommend using the -repair flag at this point. It is important to note that numerous passes may need to be run. We recommend running the -repair flag in a cycle similar to this:

-bash-4.1$ sudo -u hbase hbase hbck

-bash-4.1$ sudo -u hbase hbase hbck -repair

-bash-4.1$ sudo -u hbase hbase hbck

-bash-4.1$ sudo -u hbase hbase hbck -repair

-bash-4.1$ sudo -u hbase hbase hbck

-bash-4.1$ sudo -u hbase hbase hbck -repair

-bash-4.1$ sudo -u hbase hbase hbck

If you have run through this set of commands and are still seeing inconsistencies, you may need to start running individual commands depending on the output of the last HBCK run. Again, at this point, we cannot stress enough the importance of contacting your Hadoop vendor or the Apache mailing lists; there are experts available who can help with situations like this. Failing that, here is a list of other options that can be found in HBCK:

-fixHdfsHoles

Try to fix region holes in HDFS.

-fixHdfsOrphans

Try to fix region directories with no .regioninfo file in HDFS.

-fixTableOrphans

Try to fix table directories with no .tableinfo file in HDFS (online mode only).

-fixHdfsOverlaps

Try to fix region overlaps in HDFS.

-fixVersionFile

Try to fix missing hbase.version file in HDFS.

-sidelineBigOverlaps

When fixing region overlaps, allow to sideline big overlaps.

-fixReferenceFiles

Try to offline lingering reference store files.

-fixEmptyMetaCells

Try to fix hbase:meta entries not referencing any region (empty REGIONINFO_QUALIFIER rows).

-maxMerge <n>

When fixing region overlaps, allow at most <n> regions to merge (n=5 by default).

-maxOverlapsToSideline <n>

When fixing region overlaps, allow at most <n> regions to sideline per group (n=2 by default).
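Most of these options can also be scoped to one or more tables by appending the table names to the command, which limits the blast radius of a repair. For example, to check and fix HDFS holes only for the odell table:

-bash-4.1$ sudo -u hbase hbase hbck -fixHdfsHoles odell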

Warning

The preceding list is not all-inclusive, nor is it meant to be. There are lots of dragons ahead when messing with META and the underlying HDFS structures. Proceed at your own risk!
