There are several operations that can be performed on HBase data; these are known as HBase data model operations. These operations cover tasks such as reading data from, writing data to, and modifying data in HBase. We will now look at the following data model operations:

- Get() and Scan()
- Put()
- Delete()
In this section, we will look at the data model operations that are used to read data from an HBase table.
Get()
reads a row from a table. It can read a single row or a set of rows based on the specified condition, and it returns a Result containing the data as key-value pairs (a map format). This method is provided by the HTable class and is executed as HTable.get(get). It returns the row specified by a row key, or rows that match the given filter.
Using the following get() method overloads, we can call the HBase API to read data:

Result get(byte[] rowKey) throws IOException
Result get(Get get) throws IOException
The following are the methods with which we can read different records from HBase. They cover reading rows of data, locking rows for reading, setting filters for specific records, and more:

- getRow(): This method returns the row key of the Get object instance.
- getRowLock(): This method returns the row lock on the specified row.
- getLockId(): This method returns the lock ID for a locked row.
- getTimeRange(): This method returns the time range of a Get instance.
- setTimeStamp(): This sets both the minimum and maximum time range values to the given timestamp.
- setFilter() and getFilter(): Filters can be used to select specific columns or cells based on a variety of conditions. We will discuss filters in detail in the next section.
- setCacheBlocks() and getCacheBlocks(): These set and query block caching.
- numFamilies(): This retrieves the size of the family map, which contains the families added using the addFamily() or addColumn() calls.
- hasFamilies(): This checks whether a column family has been added to the current instance of the Get class.

The most up-to-date list of Get methods can be found at https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Get.html.
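As a mental model, a Get behaves like a keyed lookup on the row key, optionally narrowed to particular columns. The following plain-Java sketch (no HBase cluster or client jar required; all row, family, and column names here are hypothetical) models a row as a map from "family:qualifier" strings to values:

```java
import java.util.HashMap;
import java.util.Map;

public class GetSketch {
    // rowKey -> ("family:qualifier" -> value); a stand-in for an HBase table
    static Map<String, Map<String, String>> table = new HashMap<>();

    // A lookup narrowed to one column, loosely like Get plus addColumn;
    // returns null when the row or the cell does not exist.
    static String get(String rowKey, String family, String qualifier) {
        Map<String, String> row = table.get(rowKey);
        return row == null ? null : row.get(family + ":" + qualifier);
    }

    public static void main(String[] args) {
        Map<String, String> row = new HashMap<>();
        row.put("details:name", "someName");   // hypothetical cell
        table.put("RowID", row);
        System.out.println(get("RowID", "details", "name"));
        System.out.println(get("missingRow", "details", "name")); // null
    }
}
```

The null result for a missing row corresponds to checking existence before reading, which the exists() method discussed later also covers.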
The following are the options available when using the Get class:

- addFamily: to restrict the result to a given column family
- addColumn: to restrict the result to a given column
- setTimeRange: to restrict the result to a given time range
- setTimeStamp: to restrict the result to a given timestamp
- setMaxVersions: to limit the number of versions returned
- setFilter: to apply a filter

The get method can be called with a row key, a Get object, or in batch mode using a list of Get objects (List<Get>).
Let's have a look at a few examples. Here is the basic code to use Get
:
Configuration config = HBaseConfiguration.create();
HTable tableObject = new HTable(config, "tableToReadFrom");
Get get = new Get(Bytes.toBytes("RowID"));
Result result = tableObject.get(get);
byte[] nameVal = result.getValue(Bytes.toBytes("details"), Bytes.toBytes("name"));
System.out.println("Name : " + Bytes.toString(nameVal));
The preceding code reads the name column from the details column family of the table. To read data, we need to pass row keys, column families, and columns as byte arrays, using the Bytes.toBytes() method to convert string parameters to byte arrays.
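Under the hood, Bytes.toBytes(String) simply encodes the string as UTF-8 bytes, and Bytes.toString() decodes them back. The following plain-JDK sketch shows the equivalent round trip without needing the HBase client jar:

```java
import java.nio.charset.StandardCharsets;

public class BytesSketch {
    // Stand-in for HBase's Bytes.toBytes(String): UTF-8 encoding
    static byte[] toBytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    // Stand-in for Bytes.toString(byte[]): UTF-8 decoding
    static String toStringVal(byte[] b) {
        return new String(b, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] rowKey = toBytes("RowID");
        System.out.println(toStringVal(rowKey)); // prints RowID
    }
}
```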
Now, let's see how we can use multiple gets:
Configuration config = HBaseConfiguration.create();
HTable tableObject = new HTable(config, "tableToReadFrom");
List<Get> listOfGets = new ArrayList<Get>();
listOfGets.add(new Get(Bytes.toBytes("rowKey1")));
listOfGets.add(new Get(Bytes.toBytes("rowKey2")));
listOfGets.add(new Get(Bytes.toBytes("rowKey3")));
Result[] records = tableObject.get(listOfGets);
for (Result r : records) {
  System.out.println("Row Key: " + Bytes.toString(r.getRow()));
}
This code iterates through the array of Result objects and displays all the matching rows. The Result class, which can also expose the data as a map, will be discussed after scan(). Likewise, we can combine the preceding methods to utilize the full power of Get.
The full code for Get is as follows:

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.util.Bytes;
import static org.apache.hadoop.hbase.util.Bytes.toBytes;

public class GetExample {
  public static void main(String[] args) throws IOException {
    Configuration config = HBaseConfiguration.create();
    HTable tableObj = new HTable(config, "logtable");
    try {
      Get getObject = new Get(toBytes("rowKey1"));
      Result getResult = tableObj.get(getObject);
      print(getResult);
      getObject.addColumn(toBytes("colFam"), toBytes("column2"));
      getResult = tableObj.get(getObject);
      print(getResult);
    } catch (Exception e) {
      System.out.println("Error in reading data");
    } finally {
      tableObj.close();
    }
  }

  private static void print(Result getResult) {
    System.out.println("Row Key: " + Bytes.toString(getResult.getRow()));
    byte[] value1 = getResult.getValue(toBytes("colFam"), toBytes("column1"));
    System.out.println("colFam:column1 = " + Bytes.toString(value1));
    byte[] value2 = getResult.getValue(toBytes("colFam"), toBytes("column2"));
    System.out.println("colFam:column2 = " + Bytes.toString(value2));
  }
}
The following are miscellaneous data methods:

- boolean exists(Get getObj) throws IOException: Using this method, we can check whether the specified get operation will return a result or null.
- Result getRowOrBefore(byte[] rowKey, byte[] colFamily) throws IOException: Using this method, we can get the row just before the specified row.

Scan()

Scan() scans through the table for all data, or for sets of data based on filters. It is used like get, but retrieves more than one record based on the specified filter. In Scan(), we can specify the start row, from where scanning will begin, and the stop row, where scanning will end. We can also specify a time range as a filter to retrieve data within that range. To learn more about scan optimization methods, visit http://hbase.apache.org/book/perf.reading.html#perf.hbase.client.caching.
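Filtering by time range keeps only the cell versions whose timestamps fall within the range; in HBase's TimeRange the minimum bound is inclusive and the maximum is exclusive. The following plain-Java sketch (the timestamps are hypothetical, and no HBase client is required) models that filtering on a list of version timestamps:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class TimeRangeSketch {
    // Keeps timestamps t with min <= t < max, mirroring the
    // [minStamp, maxStamp) semantics of HBase's TimeRange.
    static List<Long> filter(List<Long> timestamps, long min, long max) {
        List<Long> kept = new ArrayList<>();
        for (long t : timestamps) {
            if (t >= min && t < max) {   // min inclusive, max exclusive
                kept.add(t);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<Long> versions = Arrays.asList(100L, 200L, 300L);
        // 300 is excluded because the upper bound is exclusive
        System.out.println(filter(versions, 100L, 300L));
    }
}
```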
The following are the constructors that we can use:

- Scan(): This constructor creates a scan operation that scans through all the rows.
- Scan(byte[] startRow): This constructor sets a lower bound on the row from which the scan will start.
- Scan(byte[] startRow, byte[] stopRow): This forces the scanner to scan only between the specified start and stop rows.
- Scan(byte[] startRow, Filter filter): This sets a start row and a Filter, which we will discuss later in this chapter.
- Scan(Get get): The scan is performed on the basis of the conditions in the Get object instance.
- Scan(Scan scan): The scan is performed based on the conditions in another Scan object.

These constructors, together with the filters that can be applied to them, limit the scan to the required scope of data, avoiding needless scanning of the table.
Here, we will learn about the methods used to read data using scan:

- getStartRow(): This retrieves the start row of the scan operation.
- getStopRow(): This retrieves the stop row of the scan operation.
- setStartRow(): This sets the start row of the scan operation.
- setStopRow(): This sets the stop row of the scan operation.
- getTimeRange(): This gets the timestamp or time range associated with the Scan instance.
- setTimeRange(): This sets the timestamp or time range associated with the Scan instance.
- getMaxVersions(): This gets the configured maximum number of versions.
- setMaxVersions(): This sets the maximum number of versions to return.
- getFilter(): This returns the currently assigned filter, or null if no filter is set. We will discuss filters later in this chapter.
- setCacheBlocks(): This sets block caching for the scan.
- getCacheBlocks(): This gets the block-caching setting for the scan.
- numFamilies(): This gets the size of the family map.
- hasFamilies(): This checks whether column families have been added to the scan.
- getFamilies(): This retrieves the column families.
- setFamilyMap(): This sets the family map.
- getFamilyMap(): This retrieves the family map.
- setFilter(): This applies a filter to the scan query.

So, let's see what we can do to customize scans to get the desired result:
- Create the scan object: Scan scanObj = new Scan()
- Set caching with setCaching() or HTable.setScannerCaching(int), or limit the result size using setMaxResultSize(int)
- Use addFamily to restrict the scan to a column family
- Use addColumn to restrict the scan to a column
- Use setTimeRange to restrict the scan to a time range
- Use setTimeStamp to restrict the scan to a timestamp
- Use setMaxVersions to set the number of versions returned
- Use setBatch to limit the values returned by each next() call
- Use setFilter to apply a filter
- Set setCacheBlocks() to true or false
Now, let's see some examples of Scan
:
Configuration conf = HBaseConfiguration.create();
HTable table = new HTable(conf, "logtable");
Scan scan = new Scan();
scan.setMaxVersions(2);
ResultScanner scanner = table.getScanner(scan);
for (Result result : scanner) {
  System.out.println("Rows which were scanned : " + Bytes.toString(result.getRow()));
}
The preceding code will scan through logtable and print all the rows in the table. Now, we will see how to scan between two rows; the following code scans from row100 to row1000:

Configuration conf = HBaseConfiguration.create();
HTable table = new HTable(conf, "logtable");
Scan scan = new Scan(Bytes.toBytes("row100"), Bytes.toBytes("row1000"));
scan.setMaxVersions(2);
ResultScanner scanner = table.getScanner(scan);
for (Result result : scanner) {
  System.out.println("Rows which were scanned : " + Bytes.toString(result.getRow()));
}
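Note that row keys are compared lexicographically as bytes, and the stop row of a scan is exclusive. The following plain-Java sketch (no HBase needed) uses a TreeMap to model the same range semantics, showing why a scan from row100 to row1000 returns row100 but not row1000:

```java
import java.util.NavigableMap;
import java.util.TreeMap;

public class ScanRangeSketch {
    // Rows in an HBase table are kept sorted by row key; a TreeMap keeps
    // string keys in the same lexicographic order.
    static NavigableMap<String, String> table = new TreeMap<>();

    // Models Scan(startRow, stopRow): start inclusive, stop exclusive.
    static NavigableMap<String, String> scan(String startRow, String stopRow) {
        return table.subMap(startRow, true, stopRow, false);
    }

    public static void main(String[] args) {
        table.put("row1", "a");
        table.put("row100", "b");
        table.put("row1000", "c");
        table.put("row2", "d");
        // Lexicographic order: row1 < row100 < row1000 < row2, so this
        // scan returns only row100 (row1000 is the exclusive stop row).
        System.out.println(scan("row100", "row1000").keySet());
    }
}
```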
The following is the full code to display scan results:

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.util.Bytes;
import static org.apache.hadoop.hbase.util.Bytes.toBytes;

public class ScanExampleFull {
  public static void main(String[] args) throws IOException {
    Configuration config = HBaseConfiguration.create();
    HTable tableToScan = new HTable(config, "HBaseSamples");
    scan(tableToScan, "row1000", "row10000");
    scan(tableToScan, "row0", "row200");
    tableToScan.close();
  }

  private static void scan(HTable tableToScan, String startingRowKey, String stoppingRowKey) throws IOException {
    Scan scan = new Scan(toBytes(startingRowKey), toBytes(stoppingRowKey));
    scan.addColumn(toBytes("detailColFam"), toBytes("Namecolumn"));
    ResultScanner scanner = tableToScan.getScanner(scan);
    for (Result result : scanner) {
      byte[] value = result.getValue(toBytes("detailColFam"), toBytes("Namecolumn"));
      System.out.println(" " + Bytes.toString(result.getRow()) + " => " + Bytes.toString(value));
    }
    scanner.close();
  }
}
We can perform the scan in batches. The following code shows how to do this:

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.KeyValue;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.util.Bytes;
import static org.apache.hadoop.hbase.util.Bytes.toBytes;

public class ScanInBatch {
  public static void main(String[] args) throws IOException {
    Configuration config = HBaseConfiguration.create();
    HTable tableToScanObj = new HTable(config, "logTable");
    Scan scanObj = new Scan();
    scanObj.addFamily(toBytes("columns"));
    scanDisplayData(tableToScanObj, scanObj);
    scanObj.setBatch(2);
    scanDisplayData(tableToScanObj, scanObj);
    tableToScanObj.close();
  }

  private static void scanDisplayData(HTable tableToScanObj, Scan scanObj) throws IOException {
    System.out.println("Batch Number : " + scanObj.getBatch());
    ResultScanner resultScannerObj = tableToScanObj.getScanner(scanObj);
    for (Result result : resultScannerObj) {
      System.out.println("Data : ");
      for (KeyValue keyValuePairObj : result.list()) {
        System.out.println(Bytes.toString(keyValuePairObj.getValue()));
      }
    }
    resultScannerObj.close();
  }
}
The most updated list of constructors/methods can be found at https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Scan.html.
Optimization of scanners can be found at http://hbase.apache.org/book/perf.reading.html.
Put()

HBase provides the facility to write data into HBase using the Put class. The Put() method is used to write records into an HBase table. It takes a row key and a Put object as parameters; using it, we can write a row or a set of rows into an HBase table.
The following are the constructors:
Put(byte[] rowKey)
Put(byte[] rowKey, long timeStamp)
Put(byte[] rowKey, RowLock rowLock)
Put(byte[] rowKey, long timeStamp, RowLock rowLock)
To perform a write, we instantiate a Put object with the row key to be inserted. We can use the following methods to perform the put (insert) task:

- add(byte[] columnFamName, byte[] columnName, byte[] cellValue): This adds the specified column and its value to the put operation.
- add(byte[] columnFamName, byte[] columnName, long timeStamp, byte[] cellValue): This adds the specified column and its value, with the given timestamp, to the put operation.
- add(byte[] columnFamName, ByteBuffer columnName, long timeStamp, ByteBuffer cellValue): This adds the specified column and its value, with the given timestamp, to the put operation, using ByteBuffer parameters.
- add(Cell keyValue): This adds an existing key-value pair to the put operation.

The following is an example of a put operation:
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import static org.apache.hadoop.hbase.util.Bytes.toBytes;

public class ExampleOfPutOperation {
  public static void main(String[] arguments) throws IOException {
    Configuration config = HBaseConfiguration.create();
    HTable toWriteDataInTable = new HTable(config, "logTable");
    Put putObj = new Put(toBytes("logdataKey1"));
    putObj.add(toBytes("colFamily"), toBytes("columnName1"), toBytes("internetexplorer"));
    putObj.add(toBytes("colFamily"), toBytes("columnName2"), toBytes("123456"));
    toWriteDataInTable.put(putObj);
    toWriteDataInTable.close();
  }
}
This code will put a row with the key logdataKey1 into the logTable table, with the colFamily column family containing two columns, columnName1 and columnName2, which hold the values internetexplorer and 123456, respectively.
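Conceptually, a Put bundles one row key with a set of (column family, qualifier, value) cells that are written together for that row. The following plain-Java sketch (names are hypothetical and no cluster is needed) mirrors that shape with nested maps:

```java
import java.util.HashMap;
import java.util.Map;

public class PutSketch {
    // rowKey -> ("family:qualifier" -> value); a stand-in for an HBase table
    static Map<String, Map<String, String>> table = new HashMap<>();

    // Models building a Put for a row and writing one cell into it
    static void put(String rowKey, String family, String qualifier, String value) {
        table.computeIfAbsent(rowKey, k -> new HashMap<>())
             .put(family + ":" + qualifier, value);
    }

    public static void main(String[] args) {
        // Mirrors the two add() calls in the preceding example
        put("logdataKey1", "colFamily", "columnName1", "internetexplorer");
        put("logdataKey1", "colFamily", "columnName2", "123456");
        System.out.println(table.get("logdataKey1").get("colFamily:columnName1"));
        System.out.println(table.get("logdataKey1").get("colFamily:columnName2"));
    }
}
```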
More updated information on the put APIs and list of methods can be found at https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Put.html.
Delete()

HBase provides the Delete class and its methods to delete and modify columns in HBase tables. Using the Delete class and the methods it provides, we can delete a row or a set of rows, or a record or a set of records, from an HBase table based on the specified parameters.
The following are the constructors:
Delete(byte[] rowKey)
Delete(byte[] rowKeyArray, int rowKeyOffset, int rowKeyLength)
Delete(byte[] rowKeyArray, int rowKeyOffset, int rowKeyLength, long timestamp)
Delete(byte[] rowKey, long timestamp)
Delete(byte[] rowKey, long timestamp, RowLock rowLock)
Delete(Delete delObj)
The following methods are available in the Delete class to delete columns, column families, or a record in an HBase table:

- deleteColumn(byte[] family, byte[] qualifier): This deletes the latest version of the given column.
- deleteColumn(byte[] family, byte[] qualifier, long timestamp): This deletes the specified version of the given column.
- deleteColumns(byte[] family, byte[] qualifier): This deletes all versions of the given column.
- deleteColumns(byte[] family, byte[] qualifier, long timestamp): This deletes all versions of the given column with a timestamp less than or equal to the given timestamp.
- deleteFamily(byte[] family): This deletes all versions of all columns of the given column family.
- deleteFamily(byte[] family, long timestamp): This deletes all columns of the given family with a timestamp less than or equal to the given timestamp.
- deleteFamilyVersion(byte[] family, long timestamp): This deletes all column versions of the given family with a timestamp equal to the given timestamp.
- setTimestamp(long timestamp): This sets the timestamp of the delete operation.

The following code is an example of Delete():
:
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Delete;
import org.apache.hadoop.hbase.client.HTable;
import static org.apache.hadoop.hbase.util.Bytes.toBytes;

public class DeleteOperationExample {
  public static void main(String[] arguments) throws IOException {
    Configuration config = HBaseConfiguration.create();
    HTable tableToDeleteDataFrom = new HTable(config, "logTable");
    Delete deleteObj1 = new Delete(toBytes("rowIDToDelete"));
    tableToDeleteDataFrom.delete(deleteObj1);
    Delete deleteObj2 = new Delete(toBytes("2ndRowIDToDelete"));
    deleteObj2.deleteColumns(toBytes("columnFamily"), toBytes("columnName"));
    tableToDeleteDataFrom.delete(deleteObj2);
    tableToDeleteDataFrom.close();
  }
}
This code performs two delete operations: the first deletes an entire row, and the second deletes all versions of the given column in the specified column family of another row.
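The difference between deleting only the latest version of a column and deleting all versions up to a timestamp can be sketched in plain Java. The following hypothetical model (no HBase needed; timestamps are invented) stores one column's versions in a NavigableMap keyed by timestamp:

```java
import java.util.NavigableMap;
import java.util.TreeMap;

public class DeleteSketch {
    // timestamp -> value for one column; the highest key is the newest version
    static NavigableMap<Long, String> column = new TreeMap<>();

    // Models deleteColumn(family, qualifier): drop only the latest version
    static void deleteLatest() {
        if (!column.isEmpty()) {
            column.remove(column.lastKey());
        }
    }

    // Models deleteColumns(family, qualifier, ts): drop every version
    // with timestamp <= ts
    static void deleteUpTo(long ts) {
        column.headMap(ts, true).clear();
    }

    public static void main(String[] args) {
        column.put(100L, "v1");
        column.put(200L, "v2");
        column.put(300L, "v3");
        deleteLatest();                      // removes the 300 version
        deleteUpTo(100L);                    // removes the 100 version
        System.out.println(column.keySet()); // only the 200 version remains
    }
}
```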
More updated delete APIs and methods can be found at https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Delete.html.
All the latest HBase API documentation can be found at https://hbase.apache.org/apidocs/.