Using the Hadoop tool or JARs for HBase

Using a driver class, we can run the programs packaged in the HBase JAR file as Hadoop jobs with the following command:

hadoop jar <HBase Jar file path>/hbase-*.jar <program name>

The program names we can use here are:

  • completebulkload: This is for a bulk data load
  • copytable: This is to copy table data from the local cluster to a peer cluster
  • export: This is to export data from an HBase table to HDFS as a sequence file
  • import: This is to import data written by export
  • importtsv: This is to import data in TSV format to HBase
  • rowcounter: This is to count rows in an HBase table using MapReduce
  • verifyrep: This is to compare the data from tables of different clusters
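
As an illustration, a row count through the driver might be launched as shown next. This is only a sketch under assumptions that are not in the text: the HBASE_HOME default is a hypothetical install location, hbase-server-VERSION.jar stands for whatever JAR our distribution ships, and usertable and /backup/usertable are made-up names. The hbase classpath command is used to put the HBase dependencies on Hadoop's classpath. By default, the script only prints the command; it executes it when DRY_RUN=0 is set.

```shell
#!/bin/sh
# Assumed locations; adjust them for the actual installation.
HBASE_HOME="${HBASE_HOME:-/usr/lib/hbase}"
HBASE_JAR="${HBASE_JAR:-$HBASE_HOME/hbase-server-VERSION.jar}"

# Run one driver program: $1 is the program name, the rest are its arguments.
# With DRY_RUN=1 (the default) the command is printed instead of executed.
hbase_driver() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "hadoop jar $HBASE_JAR $*"
  else
    HADOOP_CLASSPATH="$(hbase classpath)" hadoop jar "$HBASE_JAR" "$@"
  fi
}

# Hypothetical table name, used only for illustration.
hbase_driver rowcounter usertable
```

The same function can be reused for the other program names, for example hbase_driver export usertable /backup/usertable.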

We will discuss the preceding programs in the next chapter, where we will also discuss the backup/restore process. HBase also ships tool classes that we can run directly with the hbase command. The following are these tools:

  • HFile tool: This tool helps us to read the content of an HFile in text format. We can use it as:
    hbase org.apache.hadoop.hbase.io.hfile.HFile
    

    This is a very useful tool because HFiles are not stored in a human-readable format; when we need to inspect their content, this tool fits well.

  • FSHLog tool: This tool can be used to read WAL files in human-readable format. We can use it as:
    hbase org.apache.hadoop.hbase.regionserver.wal.FSHLog --dump <hbaselocationlogfile>
    

    We can also use it to split log files, as follows:

    hbase org.apache.hadoop.hbase.regionserver.wal.FSHLog --split <hbaselocationlogfile>
    

    HBase also provides HLogPrettyPrinter, which prints the contents of an HBase log file, and WALPlayer, which replays WAL files.

  • Counting rows or cells efficiently: The built-in HBase count is slow because it scans through the whole table, so huge tables take a long time. If we need to count the number of rows or cells in a table, we can do it in less time by running the count as a MapReduce job.

    Use the following command to count rows as a MapReduce task:

    hbase org.apache.hadoop.hbase.mapreduce.RowCounter <tablename>
    

    The preceding command will show the number of rows in the specified HBase table. For more detailed statistics, we can use CellCounter, which we will see next.

    CellCounter produces detailed counts. Once it completes, it reports the following:

    • The number of rows in the table
    • The number of column families across all rows
    • The number of qualifiers across all rows
    • The number of occurrences of each column family
    • The number of occurrences of each qualifier
    • The number of versions of each qualifier

    We can use CellCounter as follows:

    hbase org.apache.hadoop.hbase.mapreduce.CellCounter <tablename> <outputDir> [regex or prefix]
    
  • Offline compaction tool: This can be used to run compactions in the offline mode. It can be run as follows:
    hbase org.apache.hadoop.hbase.regionserver.CompactionTool
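
The HFile and counter invocations from the preceding list can be collected into one small script. The following is a minimal sketch, not a definitive recipe: the table name usertable, the output directory /tmp/cellcounter-out, and the HFILE_PATH placeholder are hypothetical, and the -v, -p, and -f options of the HFile tool (verbose output, print key/values, and file path) are assumed to be available and should be checked against the usage message of the installed HBase version. By default, the script only prints each command; setting DRY_RUN=0 on a node with HBase installed executes them.

```shell
#!/bin/sh
# Hypothetical values for illustration; replace them with real ones.
TABLE="${TABLE:-usertable}"
HFILE_PATH="${HFILE_PATH:-/path/to/hfile}"
OUT_DIR="${OUT_DIR:-/tmp/cellcounter-out}"

# Print the command when DRY_RUN=1 (the default); execute it otherwise.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Dump the key/values of one HFile as text (option names are assumptions,
# see the note above).
run hbase org.apache.hadoop.hbase.io.hfile.HFile -v -p -f "$HFILE_PATH"

# Count the rows of the table with a MapReduce job.
run hbase org.apache.hadoop.hbase.mapreduce.RowCounter "$TABLE"

# Write detailed cell statistics for the table to OUT_DIR.
run hbase org.apache.hadoop.hbase.mapreduce.CellCounter "$TABLE" "$OUT_DIR"
```

Keeping the dry run as the default makes it easy to review the exact command lines before any MapReduce job is submitted.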
    