Data model Java operations

There are several operations that can be performed on HBase data; these are known as HBase data model operations. They cover tasks such as reading from, writing to, and modifying data in HBase. We will now look at the following data model operations:

  • Read operations using Get() and Scan()
  • Write operations using Put()
  • Modify operations using Delete()

Read

In this section, we will see the data models that are useful to read data from an HBase table.

Get()

Get() reads a row from a table. It can read a single row or a set of rows based on the specified condition, and it returns a result containing the data as key-value pairs in a map format. This method is provided by the HTable class and is executed as HTable.get(condition). It returns the row specified by a row key, or rows that match a given filter.

Constructors

The Get class provides the following constructors, with which we can create an object for reading data through the HBase API:

Get (byte [] rowKey)
Get (byte [] rowKey, RowLock rowLock)

The Get object is then passed to HTable's get() method, which returns a Result:

Result get (Get get) throws IOException
Result[] get (List<Get> gets) throws IOException

Supported methods

The following methods can be used to read different records from HBase. They cover reading rows of data, locking rows for reading, setting filters for specific records, and more; they are listed as follows:

  • getRow(): This method returns the row key that was specified when the Get instance was created.
  • getRowLock(): This method returns the RowLock for the specified row.
  • getLockId(): This method returns the lock ID of a locked row.
  • getTimeRange(): This method returns the time range set on the Get instance.
  • setTimeStamp(): This sets both the minimum and maximum boundaries of the time range to the given timestamp, so only cells with exactly that timestamp are returned.
  • setFilter() and getFilter(): Filters can be used to select specific columns or cells based on a variety of conditions. We will discuss filters in detail in the next section.
  • setCacheBlocks() and getCacheBlocks(): These set and check server-side block caching for this Get.
  • numFamilies(): This retrieves the size of the family map, that is, the number of families added through addFamily() or addColumn() calls.
  • hasFamilies(): This checks whether any column family has been added to the current instance of the Get class.

    Note

    The most updated list of Get methods can be found at https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Get.html.

The following are the options that one has while using the Get class:

  • To get everything in a row, specify only the row key
  • To get specific column families from a row, call addFamily() for each column family to retrieve
  • To get a specific column, call addColumn() on the Get object
  • To get columns written within a time range, call setTimeRange() on the Get object
  • For data written at a specific time, call setTimeStamp()
  • Since an HBase record can maintain more than one version of its data, we can limit the number of versions returned by calling setMaxVersions()
  • To add a string-related or any other filter, use setFilter()
  • The get() method can be called with a single Get object, or in batch mode with a list of them, List<Get>

Let's have a look at a few examples. Here is the basic code to use Get:

Configuration config = HBaseConfiguration.create();
HTable tableObject = new HTable(config, "tableNameToReadFrom");
Get get = new Get(Bytes.toBytes("RowID"));
Result result = tableObject.get(get);
byte[] nameVal = result.getValue(Bytes.toBytes("details"), Bytes.toBytes("name"));
System.out.println("Name : " + Bytes.toString(nameVal));

The preceding code reads the name column from the details column family of a table. To read data, we need to pass the row key, column families, and columns as byte arrays using the Bytes.toBytes() method, which converts string parameters into byte arrays.
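Since every HBase key and value is a plain byte array, Bytes.toBytes() and Bytes.toString() are essentially UTF-8 conversions. The following is a minimal plain-Java sketch of that round trip (no HBase dependency; the class name BytesDemo and its methods are ours, not HBase's):

```java
import java.nio.charset.StandardCharsets;

public class BytesDemo {
    // Mimics Bytes.toBytes(String): UTF-8 encode the string
    public static byte[] toBytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    // Mimics Bytes.toString(byte[]): UTF-8 decode the bytes
    public static String toString(byte[] b) {
        return new String(b, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] rowKey = toBytes("RowID");
        // Round-trips back to the original string
        System.out.println(toString(rowKey)); // prints RowID
    }
}
```

This is why the same Bytes.toBytes() call appears for row keys, families, and qualifiers alike: HBase itself is agnostic about what the bytes mean.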

Now, let's see how we can use multiple gets:

Configuration config = HBaseConfiguration.create();
HTable tableObject = new HTable(config, "tableNameToReadFrom");
List<Get> listOfGets = new ArrayList<Get>();
listOfGets.add(new Get(Bytes.toBytes("rowKey1")));
listOfGets.add(new Get(Bytes.toBytes("rowKey2")));
listOfGets.add(new Get(Bytes.toBytes("rowKey3")));
Result[] records = tableObject.get(listOfGets);
for (Result r : records) {
    System.out.println("Row Key: " + Bytes.toString(r.getRow()));
}

This iterates through the array of Result objects and displays all the matched rows. We will discuss the Result class in more detail after scan(). Likewise, the methods listed earlier can be combined to utilize the full power of Get.

The full code for Get is as follows:

import static org.apache.hadoop.hbase.util.Bytes.toBytes;
import java.io.IOException;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;

public class GetExample {
    public static void main(String[] args) throws IOException {
        Configuration config = HBaseConfiguration.create();
        HTable tableObj = new HTable(config, "logtable");
        try {
            Get getObject = new Get(toBytes("rowKey1"));
            Result getResult = tableObj.get(getObject);
            print(getResult);
            // Narrow the same Get down to a single column and re-read
            getObject.addColumn(toBytes("colFam"), toBytes("column2"));
            getResult = tableObj.get(getObject);
            print(getResult);
        } catch (Exception e) {
            System.out.println("Error in reading data: " + e.getMessage());
        } finally {
            tableObj.close();
        }
    }

    private static void print(Result getResult) {
        System.out.println("Row Key: " + Bytes.toString(getResult.getRow()));
        byte[] value1 = getResult.getValue(toBytes("colFam"), toBytes("column1"));
        System.out.println("colFam:column1 = " + Bytes.toString(value1));
        byte[] value2 = getResult.getValue(toBytes("colFam"), toBytes("column2"));
        System.out.println("colFam:column2 = " + Bytes.toString(value2));
    }
}

The following are miscellaneous data methods:

  • boolean exists(Get getobj) throws IOException: Using this method, we can check whether the get operation we specify will return a result or the result will be null.
  • Result getRowOrBefore(byte[] rowkey, byte[] colFamily) throws IOException: Using this method, we can get a row just before the specified row.

Scan()

Scan() scans through a table for all of its data, or for a subset selected by filters. It is used like Get(), but returns more than one record matching the specified filter. With Scan(), we can specify the start row from which scanning will begin and the stop row at which it will stop. We can also specify a time range as a filter to get only data written within that range. To know more about scan optimization methods, visit http://hbase.apache.org/book/perf.reading.html#perf.hbase.client.caching.

Constructors

The following are the constructors that we can use:

  • Scan(): This constructor is used to create scan operations that scan through all the rows
  • Scan(byte [] startRow): This constructor forces a lower bound on the row from where the scan will start
  • Scan(byte [] startRow, byte [] stopRow): This forces scanners to scan between the specified start and end rows only
  • Scan(byte [] startRow, Filter filter): This implements a start row and Filter, which we will discuss later in this chapter
  • Scan(Get get): Scan is done on the basis of conditions in the get object instance
  • Scan(Scan scan): Scan is done based on conditions in another scan object

These constructors, together with filters, can be used to limit the scan to the required scope of data, avoiding needless scanning of the whole table.
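When a start and stop row are given, HBase includes the start row and excludes the stop row, comparing row keys as unsigned lexicographic byte sequences. The membership test can be sketched in plain Java as follows (no HBase dependency; the class ScanRangeDemo is ours, and Arrays.compareUnsigned requires Java 9+):

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class ScanRangeDemo {
    // True if row falls inside [startRow, stopRow) under the
    // unsigned lexicographic byte ordering HBase uses for row keys
    public static boolean inRange(byte[] row, byte[] start, byte[] stop) {
        return Arrays.compareUnsigned(row, start) >= 0
            && Arrays.compareUnsigned(row, stop) < 0;
    }

    static byte[] b(String s) { return s.getBytes(StandardCharsets.UTF_8); }

    public static void main(String[] args) {
        System.out.println(inRange(b("row200"), b("row100"), b("row300"))); // true
        System.out.println(inRange(b("row300"), b("row100"), b("row300"))); // false: stop row is exclusive
    }
}
```

Because the comparison is byte-wise, not numeric, keys such as "row2" sort after "row1000"; this is why row keys are usually zero-padded to a fixed width.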

Methods

Here, we will learn about the methods used to read data using scan:

  • getStartRow(): This is used to retrieve the start row of the scanning operation.
  • getStopRow(): This is used to retrieve the stop row of the scanning operation.
  • setStartRow(): This is used to set the start row of the scanning operation.
  • setStopRow(): This is used to set the stop row of the scanning operation.
  • getTimeRange(): This gets the timestamp or time range associated with the Scan instance.
  • setTimeRange(): This sets the timestamp or time range associated with the Scan instance.
  • getMaxVersions(): This gets the configured maximum number of versions to return.
  • setMaxVersions(): This sets the maximum number of versions to return.
  • getFilter(): We can get the currently assigned filter using this method. It might return null if no filter is set. We will discuss filters later in this chapter.
  • setCacheBlocks(): This enables or disables server-side block caching for this scan.
  • getCacheBlocks(): This checks whether block caching is enabled for this scan.
  • numFamilies(): This is used to get the size of the family map.
  • hasFamilies(): This checks whether column families have been added to the scan.
  • getFamilies(): This retrieves the column families.
  • setFamilyMap(): This sets the family map.
  • getFamilyMap(): This retrieves the family map.
  • setFilter(): This applies a filter to the scan query.

So, let's understand what we can do to customize scans to get the desired result:

  • To scan all the data, use the empty constructor: Scan scanObj = new Scan()
  • To tune caching, use setCaching() or HTable.setScannerCaching(int), or limit the result size using setMaxResultSize(int)
  • To get all the columns from a column family, call addFamily() on the scan object
  • To get a single column, call addColumn() on the scan object
  • To get only columns within a time range, specify setTimeRange() on the scan object
  • To get columns written at a specific timestamp, set setTimeStamp()
  • To get a specific number of versions of a column, set setMaxVersions()
  • To cap the number of values returned by each next() call, set setBatch()
  • To add a filter, use setFilter()
  • To enable/disable server-side block caching, set setCacheBlocks() to true/false
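The effect of setBatch(n) on a wide row can be pictured as partitioning the row's cells into chunks of at most n, with each chunk delivered as a separate Result. A plain-Java sketch of that partitioning (the class BatchDemo and its partition() helper are ours, purely illustrative):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class BatchDemo {
    // Partition one row's cells into chunks of at most batchSize,
    // the way Scan.setBatch(batchSize) splits a wide row across Results
    public static <T> List<List<T>> partition(List<T> cells, int batchSize) {
        List<List<T>> chunks = new ArrayList<>();
        for (int i = 0; i < cells.size(); i += batchSize) {
            chunks.add(cells.subList(i, Math.min(i + batchSize, cells.size())));
        }
        return chunks;
    }

    public static void main(String[] args) {
        List<String> cells = Arrays.asList("c1", "c2", "c3", "c4", "c5");
        // Five cells with a batch size of 2 yield three chunks: 2, 2, and 1 cells
        System.out.println(partition(cells, 2)); // [[c1, c2], [c3, c4], [c5]]
    }
}
```

This also explains why, with batching enabled, one logical row can appear in several consecutive Results from the scanner.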

Now, let's see some examples of Scan:

Configuration conf = HBaseConfiguration.create();
HTable table = new HTable(conf, "logtable");
Scan scan = new Scan();
scan.setMaxVersions(2);
ResultScanner scanner = table.getScanner(scan);
for (Result result : scanner) {
    System.out.println("Row scanned : " + Bytes.toString(result.getRow()));
}

The preceding code scans through logtable and prints all the rows in the table. Now, let's see how to scan the rows between two row keys; the following code scans from row100 (inclusive) up to row1000 (exclusive):

Configuration conf = HBaseConfiguration.create();
HTable table = new HTable(conf, "logtable");
Scan scan = new Scan(Bytes.toBytes("row100"), Bytes.toBytes("row1000"));
scan.setMaxVersions(2);
ResultScanner scanner = table.getScanner(scan);
for (Result result : scanner) {
    System.out.println("Row scanned : " + Bytes.toString(result.getRow()));
}

The following is the full-fledged code for a scan:

import static org.apache.hadoop.hbase.util.Bytes.toBytes;
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.util.Bytes;

public class ScanExampleFull {
  public static void main(String[] args) throws IOException {
    Configuration config = HBaseConfiguration.create();
    HTable tableToScan = new HTable(config, "HBaseSamples");
    scan(tableToScan, "row1000", "row10000");
    scan(tableToScan, "row0", "row200");
    tableToScan.close();
  }

  private static void scan(HTable tableToScan, String startingRowKey, String stoppingRowKey) throws IOException {
    Scan scan = new Scan(toBytes(startingRowKey), toBytes(stoppingRowKey));
    scan.addColumn(toBytes("detailColFam"), toBytes("Namecolumn"));
    ResultScanner scanner = tableToScan.getScanner(scan);
    for (Result result : scanner) {
      byte[] value = result.getValue(toBytes("detailColFam"), toBytes("Namecolumn"));
      System.out.println("  " + Bytes.toString(result.getRow()) + " => " + Bytes.toString(value));
    }
    scanner.close();
  }
}

We can perform the scan in batches. The following code will display how to do this:

import static org.apache.hadoop.hbase.util.Bytes.toBytes;
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.KeyValue;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.util.Bytes;

public class ScanInBatch {
  public static void main(String[] args) throws IOException {
    Configuration config = HBaseConfiguration.create();
    HTable tableToScanObj = new HTable(config, "logTable");
    Scan scanObj = new Scan();
    scanObj.addFamily(toBytes("columns"));
    scanDisplayData(tableToScanObj, scanObj);
    scanObj.setBatch(2);
    scanDisplayData(tableToScanObj, scanObj);
    tableToScanObj.close();
  }

  private static void scanDisplayData(HTable tableToScanObj, Scan scanObj) throws IOException {
    System.out.println("Batch Number : " + scanObj.getBatch());
    ResultScanner resultScannerObj = tableToScanObj.getScanner(scanObj);
    for (Result result : resultScannerObj) {
      System.out.println("Data : ");
      for (KeyValue keyValuePairObj : result.list()) {
        System.out.println(Bytes.toString(keyValuePairObj.getValue()));
      }
    }
    resultScannerObj.close();
  }
}

Note

The most updated list of constructors/methods can be found at https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Scan.html.

Optimization of scanners can be found at http://hbase.apache.org/book/perf.reading.html.

Write

HBase provides the Put class for writing data into a table. To write data into HBase, we use the Put() method.

Put()

The Put() method writes records and data into an HBase table. A Put object is built around a row key and then passed to HTable's put() method. Using this, we can write a row or a set of rows into an HBase table.

Constructors

The following are the constructors:

  • Put(byte[] rowKey)
  • Put(byte[] rowKey, long timeStamp)
  • Put(byte[] rowKey, RowLock rowLock)
  • Put(byte[] rowKey, long timeStamp, RowLock rowLock)

Methods

To perform writing, we instantiate the put object with the row ID that needs to be inserted. We can use the following methods to perform the put(insert) task:

  • add (byte[] columnFamName, byte[] columnName, byte[] cellValue): This adds the specified column and its value to the put operation
  • add (byte[] columnFamName, byte[] columnName, long timeStamp, byte[] cellValue): This adds the specified column and its value to the put operation, with the given timestamp
  • add (byte[] columnFamName, ByteBuffer columnName, long timeStamp, ByteBuffer cellValue): This adds the specified column and its value to the put operation, with the given timestamp, using ByteBuffer parameters
  • add (Cell keyValue): This adds an existing key-value pair to the put operation
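Conceptually, a Put accumulates cells keyed by column family and qualifier, and a single HTable.put() call then sends them all. A simplified plain-Java model of that accumulation (the class PutModel and its methods are ours, not HBase's API):

```java
import java.util.HashMap;
import java.util.Map;

public class PutModel {
    // family -> (qualifier -> value): a simplified view of the cells
    // a Put object collects before HTable.put() flushes them
    private final Map<String, Map<String, String>> cells = new HashMap<>();
    private final String rowKey;

    public PutModel(String rowKey) { this.rowKey = rowKey; }

    // Mirrors put.add(family, qualifier, value); returns this for chaining
    public PutModel add(String family, String qualifier, String value) {
        cells.computeIfAbsent(family, f -> new HashMap<>()).put(qualifier, value);
        return this;
    }

    public String get(String family, String qualifier) {
        Map<String, String> fam = cells.get(family);
        return fam == null ? null : fam.get(qualifier);
    }

    public static void main(String[] args) {
        PutModel put = new PutModel("logdataKey1")
            .add("colFamily", "columnName1", "internetexplorer")
            .add("colFamily", "columnName2", "123456");
        System.out.println(put.get("colFamily", "columnName1")); // prints internetexplorer
    }
}
```

Because all cells for one Put share a single row key, a Put is also atomic per row: either all of its cells are written or none are.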

The following is an example of a put operation:

import java.io.IOException;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import static org.apache.hadoop.hbase.util.Bytes.toBytes;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
public class ExampleofPutOperation {
  public static void main(String[] arguments) throws IOException {
    Configuration config = HBaseConfiguration.create();
    HTable toWriteDataInTable = new HTable(config, "logTable");
    Put putObj = new Put(toBytes("logdataKey1"));
    putObj.add(toBytes("colFamily"), toBytes("columnName1"), toBytes("internetexplorer"));
    putObj.add(toBytes("colFamily"), toBytes("columnName2"), toBytes("123456"));
    toWriteDataInTable.put(putObj);
    toWriteDataInTable.close();
  }
}

This code will put a row named logdataKey1 in the logTable table with the colFamily column family, which will have two columns, columnName1 and columnName2, which contain the internetexplorer and 123456 values, respectively.

Note

More updated information on the put APIs and list of methods can be found at https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Put.html.

Modify

HBase provides the Delete class and methods to delete and modify the columns in HBase tables.

Here, we will discuss methods such as deleting values from a table, using which we can modify table data.

Delete()

Using the Delete class and the methods it provides, we can delete a row or a set of rows, or a record or a set of records, from an HBase table using the specified parameters.

Constructors

The following are the constructors:

  • Delete(byte[] rowKey)
  • Delete(byte[] rowKeyArray, int rowKeyOffset, int rowKeyLength)
  • Delete(byte[] rowKeyArray, int rowKeyOffset, int rowKeyLength, long timestamp)
  • Delete(byte[] rowKey, long timestamp)
  • Delete(byte[] rowKey, long timestamp, RowLock rowLock)
  • Delete(Delete delObj)

Methods

The following methods are available with the Delete class to perform the deletion of columns, column families, or a record in the HBase table:

  • deleteColumn(byte[] family, byte[] qualifier): This is used to delete the latest version of a given column
  • deleteColumn(byte[] family, byte[] qualifier, long timestamp): This is used to delete the specified version of a given column
  • deleteColumns(byte[] family, byte[] qualifier): This is used to delete all versions of a given column
  • deleteColumns(byte[] family, byte[] qualifier, long timestamp): This is used to delete all versions of a given column with a timestamp that's less than or equal to the given timestamp
  • deleteFamily(byte[] family): This is used to delete all versions of all columns of a given column family
  • deleteFamily(byte[] family, long timestamp): This is used to delete all columns of a given family with a timestamp less than or equal to the given timestamp
  • deleteFamilyVersion(byte[] family, long timestamp): This is used to delete all column versions of a given family with a timestamp equal to the given timestamp
  • setTimestamp(long timestamp): This is used to set the timestamp of the delete operation
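The difference between deleteColumn() (latest version only) and deleteColumns() (all versions up to a timestamp) is easiest to see against HBase's versioned-cell model, where each column holds a timestamp-ordered set of values. A plain-Java sketch of those two semantics (the class VersionDemo and its method names are ours, purely illustrative):

```java
import java.util.Comparator;
import java.util.NavigableMap;
import java.util.TreeMap;

public class VersionDemo {
    // timestamp -> value, newest first, like HBase's versioned cells
    private final NavigableMap<Long, String> versions =
        new TreeMap<>(Comparator.reverseOrder());

    public void put(long ts, String value) { versions.put(ts, value); }

    // deleteColumn() semantics: drop only the latest version
    public void deleteLatest() {
        if (!versions.isEmpty()) versions.remove(versions.firstKey());
    }

    // deleteColumns(ts) semantics: drop every version with timestamp <= ts
    public void deleteUpTo(long ts) {
        versions.tailMap(ts, true).clear();
    }

    public int size() { return versions.size(); }

    public static void main(String[] args) {
        VersionDemo col = new VersionDemo();
        col.put(1L, "v1"); col.put(2L, "v2"); col.put(3L, "v3");
        col.deleteLatest();             // removes only ts=3
        System.out.println(col.size()); // prints 2
        col.deleteUpTo(1L);             // removes ts=1, keeps ts=2
        System.out.println(col.size()); // prints 1
    }
}
```

In real HBase, deletes write tombstone markers rather than removing data immediately; the underlying cells disappear later, during compaction.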

The following code is an example of Delete():

import static org.apache.hadoop.hbase.util.Bytes.toBytes;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Delete;
import org.apache.hadoop.hbase.client.HTable;
import java.io.IOException;
public class DeleteOperationExample {
  public static void main(String[] arguments) throws IOException {
    Configuration config = HBaseConfiguration.create();
    HTable tableToDeleteDataFrom = new HTable(config, "logTable");
    Delete deleteobj1 = new Delete(toBytes("rowIDToDelete"));
    tableToDeleteDataFrom.delete(deleteobj1);
    Delete deleteobj2 = new Delete(toBytes("2ndRowIDToDelete"));
    deleteobj2.deleteColumns(toBytes("columnFamily"), toBytes("columnName"));
    tableToDeleteDataFrom.delete(deleteobj2);
    tableToDeleteDataFrom.close();
  }
}

This code performs two delete operations: the first deletes an entire row, and the second deletes all versions of the given column in a column family for the second row.

Note

More updated delete APIs and methods can be found at https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Delete.html.
