We have seen how to do scripting in HBase. In this chapter, we will see some more scripting tips and tricks that enable an administrator to automate various tasks in HBase. We can write scripts in Ruby, as shell scripts, or as a file containing a sequence of HBase shell commands.
Now, let's consider a case where we need to create a table with one column family (data) and several columns, and then insert some data. Open a new file with vi hbasescript.script and add the following:
create 'table', 'data'
for i in '0'..'2' do
  for j in '0'..'2' do
    for k in '0'..'2' do
      put 'table', "row-#{i}#{j}#{k}", "data:column#{j}#{k}", "name#{j}#{k}"
    end
  end
end
After saving this script, we can run it as follows:
hbase shell hbasescript.script
We can also run a similar loop interactively in HBase shell:
hbase > for i in '0'..'5' do
hbase >* put "utable", "rowKey_#{i}", "address:address", "address#{i}"
hbase >* end
The preceding commands will insert six rows in the utable table, because the range '0'..'5' includes both endpoints.
The hbasescript.script file shown earlier creates a table and puts 27 rows of data in it (three values each for i, j, and k). Likewise, we can write scripts to load data into the table and perform various operations such as inserting data from a text or CSV file.
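To check how many rows and columns such loops produce before running them against a cluster, the same loops can be run in plain Ruby with the put calls replaced by array appends. This is only a sketch for counting; the array names are hypothetical and nothing here talks to HBase.

```ruby
# Collect the row keys and column names that the nested loops would write.
row_keys = []
columns = []
for i in '0'..'2' do
  for j in '0'..'2' do
    for k in '0'..'2' do
      row_keys << "row-#{i}#{j}#{k}"
      columns << "data:column#{j}#{k}"
    end
  end
end

# Row keys produced by the interactive loop over '0'..'5'.
shell_loop_keys = ('0'..'5').map { |i| "rowKey_#{i}" }

puts row_keys.size          # 27
puts columns.uniq.size      # 9
puts shell_loop_keys.size   # 6
```

Each row receives a single cell, and the nine distinct column names all belong to the data column family.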
We can run an HBase command to create an HBase table without going to the HBase shell, as follows:
echo "create 'tableToCreate', 'colFamily'" | hbase shell
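Now, we will see a script to scan the table between two rows. Create it with vi scanTable.sh:
#!/bin/bash
# Usage: ./scanTable.sh <table> <start row> <stop row>
TableToScan=$1
RowStart=$2
RowEnd=$3
# STARTROW is inclusive; STOPROW is exclusive
exec hbase shell <<EOF
scan "${TableToScan}", {STARTROW => "${RowStart}", STOPROW => "${RowEnd}"}
EOF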
After making the script executable with chmod +x scanTable.sh, it is called as ./scanTable.sh emptable row100 row1000. This will display the rows from row100 up to, but not including, row1000 (which are passed as parameters to the script) from the emptable table.
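HBase keeps row keys sorted lexicographically by their bytes, and a scan returns the keys from the start row (inclusive) to the stop row (exclusive), so a range such as row100 to row1000 may match fewer rows than a numeric reading suggests. The following plain-Ruby sketch imitates that selection on an in-memory list; scan_range and the sample keys are hypothetical and are not part of the HBase API.

```ruby
# Imitate a start-inclusive, stop-exclusive scan over lexicographically sorted keys.
def scan_range(keys, start_row, stop_row)
  keys.sort.select { |k| k >= start_row && k < stop_row }
end

unpadded = %w[row99 row100 row101 row500 row1000]
padded   = %w[row0099 row0100 row0101 row0500 row1000]

# "row500" sorts after "row1000", so it falls outside the range.
p scan_range(unpadded, "row100", "row1000")  # ["row100"]

# Zero-padded keys sort in numeric order, so the range behaves as expected.
p scan_range(padded, "row0100", "row1000")   # ["row0100", "row0101", "row0500"]
```

If numeric ranges matter, fixed-width (zero-padded) row keys are one common way to make lexicographic and numeric order agree.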
As we know, HBase uses the Ruby shell (IRB), which can be customized through the .irbrc file to add commands such as clearing the screen, maintaining a command history in HBase shell, and so on. If this file does not already exist in the user's home directory, we can create it. The following content enables a clear command to clear the screen, a command history for HBase shell, and an ls command to list an HDFS path. In the home directory, issue the following command and add the following lines to the file:
vi .irbrc
# Clear HBase shell command
def clear
  system('clear')
end

# <hadoop home path> is the full path of the hadoop directory.
# A constant is used because a top-level local variable is not visible inside def.
HADOOP_HOME = "<your hadoop home path here>"

# Enable history (commands executed previously will be preserved) in hbase shell
require "irb/ext/save-history"
# No. of commands to be saved. 50 here
IRB.conf[:SAVE_HISTORY] = 50
# The location to save the history file
IRB.conf[:HISTORY_FILE] = "#{ENV['HOME']}/.irb-save-history"

# List given HDFS path from hbase shell
def ls(path)
  directory = "/" + path
  system("#{HADOOP_HOME}/hadoop fs -ls #{directory}")
end

# Run the registered exit hooks so that the history is written on exit
Kernel.at_exit do
  IRB.conf[:AT_EXIT].each do |i|
    i.call
  end
end
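Now, we can use the clear and ls commands from HBase shell. The path given to ls is a quoted string without the leading slash, which the helper adds itself:
hbase > clear
hbase > ls '<directory to list>'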
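One Ruby detail matters when adding helpers such as ls to .irbrc: a method defined with def cannot see local variables assigned outside it, so a shared setting such as the Hadoop path is safer as a constant. The sketch below shows the difference in plain Ruby. The path /opt/hadoop/bin and the names HADOOP_BIN_DIR, hdfs_ls_command, and broken_ls_command are hypothetical, and the commands are only built as strings, not executed.

```ruby
# Hypothetical install location; adjust for your cluster.
HADOOP_BIN_DIR = "/opt/hadoop/bin"

# Builds (but does not run) the command an ls helper would pass to system().
def hdfs_ls_command(path)
  "#{HADOOP_BIN_DIR}/hadoop fs -ls /#{path}"
end

# A top-level local variable, by contrast, is not visible inside def.
local_dir = "/opt/hadoop/bin"

def broken_ls_command(path)
  "#{local_dir}/hadoop fs -ls /#{path}"
end

puts hdfs_ls_command("hbase")   # /opt/hadoop/bin/hadoop fs -ls /hbase

begin
  broken_ls_command("hbase")
rescue NameError => e
  puts "NameError: #{e.message}"
end
```

A global variable would also work, but a constant signals that the value is set once and not meant to change during the session.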
We can also assign the table reference returned by create to a variable:
hbase > var = create 'table','colFam'
Now, we can use var to perform operations on the table, as follows:
hbase > var.scan
This will scan table, and likewise, we can use put, get, and other HBase commands with this variable.
For a table that already exists, we can get a reference with get_table:
hbase > var = get_table 'table'
Now, we can use the var variable on HBase shell to perform various operations on the given table, as follows:
hbase > var.scan
hbase > var.put 'row','colFam:name','shashwat'
hbase > var.disable
Likewise, we can use all the commands related to a table.
We can use HBase shell to convert a date and time to an HBase timestamp, which is useful when specifying the timestamp in some HBase commands. In the general form, the argument to SimpleDateFormat.new is the date format pattern and the first argument to parse is the date string (both are shown empty here):
hbase > import java.text.SimpleDateFormat
hbase > import java.text.ParsePosition
hbase > SimpleDateFormat.new("").parse("", ParsePosition.new(0)).getTime()
The following is an example:
hbase > SimpleDateFormat.new("yy/MM/dd HH:mm:ss").parse("14/07/01 09:00:00", ParsePosition.new(0)).getTime()
These three commands return the specified date and time as an HBase timestamp (milliseconds since the Unix epoch), which we can use in a scan or in other commands.
For example, here we need a timestamp in the get command, as follows:
get 'tableToGetDataFrom', 'row1', {COLUMN => 'colFam:Name', TIMESTAMP => 1317945301466}
We can get the date-time data from an HBase timestamp, as follows:
hbase > import java.util.Date
hbase > Date.new(1317945301466).toString()
This will show the equivalent date-time format of the specified timestamp.
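The same two conversions can be sketched in plain Ruby, which is handy for preparing timestamps outside HBase shell. This is a minimal sketch, assuming the date and time are given in UTC; the SimpleDateFormat commands above parse in the JVM's default time zone instead, so their results differ by the local offset. The helper names to_hbase_timestamp and from_hbase_timestamp are hypothetical.

```ruby
# Convert a UTC date and time to epoch milliseconds.
def to_hbase_timestamp(year, month, day, hour, min, sec)
  Time.utc(year, month, day, hour, min, sec).to_i * 1000
end

# Convert epoch milliseconds back to a UTC Time.
# The second argument of Time.at is in microseconds.
def from_hbase_timestamp(millis)
  Time.at(millis / 1000, (millis % 1000) * 1000).utc
end

puts to_hbase_timestamp(2014, 7, 1, 9, 0, 0)
# 1404205200000

puts from_hbase_timestamp(1317945301466).strftime("%Y-%m-%d %H:%M:%S.%L")
# 2011-10-06 23:55:01.466
```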
We can execute the following command to enable more output on HBase shell about the commands we are executing:
hbase > debug
This will display more detail, including fuller stack traces, for the commands we execute on HBase shell.
Let's look at a separate project that enables us to fetch data from HBase using the SQL commands we already know. Consider the following description, taken from http://phoenix.apache.org:
"Apache Phoenix is a SQL skin over HBase delivered as a client-embedded JDBC driver targeting low latency queries over HBase data. Apache Phoenix takes your SQL query, compiles it into a series of HBase scans, and orchestrates the running of those scans to produce regular JDBC result sets."
We can enable this SQL facility on HBase by following the instructions on the Phoenix site, and then play with SQL queries on HBase.
A good place to get a list of scripts is https://github.com/search?q=hbase+script&ref=cmdform.