HBase filters

As the name suggests, a filter extracts only the required data and discards the rest. HBase provides a good number of filters that we can use in get and scan operations to fetch only the needed data, avoiding the cost of scanning data that is not required.

HBase filters are a powerful feature that can greatly improve the efficiency of working with data stored in tables. The two read operations in HBase, Get and Scan, support direct access to data and the use of a start and end key, respectively. We can further limit the data retrieved by adding limiting selectors to the query: column families, column qualifiers, timestamps, timestamp ranges, and version numbers.

We can represent HBase filter usage as shown in the following diagram: the filters we specify in a get or scan are shipped to the different RegionServers through RPC calls and compared against the data locally, so only matching results travel back to the client:

HBase filters

Types of filters

Now, let's look at the different types of filters and their uses. Before that, we will look at the operators on which filters depend for comparison:

Operator type

Description

BitComparator.BitwiseOp

This performs the bitwise comparison. The following are the enum constants:

  • AND (and)
  • OR (or)
  • XOR (xor)

You can read more on this operator at http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/filter/BitComparator.BitwiseOp.html.
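As a rough illustration, the byte-wise comparison idea behind BitComparator can be sketched in plain Java. This is not HBase's actual implementation; the class, enum, and method names below are made up for illustration, and the sketch assumes the value matches when at least one byte of the bitwise result is non-zero:

```java
public class BitCompareSketch {
  enum BitwiseOp { AND, OR, XOR }

  // Sketch: apply the bitwise op byte by byte against a mask and
  // treat the value as a match if any resulting byte is non-zero.
  static boolean matches(byte[] value, byte[] mask, BitwiseOp op) {
    if (value.length != mask.length) {
      return false; // differing lengths never match
    }
    for (int i = 0; i < value.length; i++) {
      int b;
      switch (op) {
        case AND: b = value[i] & mask[i]; break;
        case OR:  b = value[i] | mask[i]; break;
        default:  b = value[i] ^ mask[i]; break;
      }
      if (b != 0) {
        return true;
      }
    }
    return false;
  }

  public static void main(String[] args) {
    byte[] flags = {0b0101};
    byte[] mask  = {0b0100};
    System.out.println(matches(flags, mask, BitwiseOp.AND)); // true: bit 2 is set
    System.out.println(matches(flags, new byte[]{0b0010}, BitwiseOp.AND)); // false
  }
}
```

This kind of comparator is handy when a column stores bit flags and we only want rows where certain flags are set.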

CompareFilter.CompareOp

This is a generic comparison operator used by comparison filters. It works on byte[] values and supports operators such as equal, greater, and not equal. The following are the enum constants:

  • EQUAL
  • GREATER
  • GREATER_OR_EQUAL
  • LESS
  • LESS_OR_EQUAL
  • NO_OP
  • NOT_EQUAL

You can read more on this operator at http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/filter/CompareFilter.CompareOp.html.
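To make the semantics concrete, the way a comparison filter turns a comparator result into a pass/fail decision can be sketched in plain Java. This is only an illustrative sketch (the class and helper names are invented, and HBase's internal logic differs in detail); the byte comparison mimics the unsigned lexicographic ordering of Bytes.compareTo:

```java
public class CompareOpSketch {
  enum CompareOp { LESS, LESS_OR_EQUAL, EQUAL, NOT_EQUAL, GREATER_OR_EQUAL, GREATER, NO_OP }

  // Unsigned lexicographic comparison, the ordering Bytes.compareTo uses.
  static int compareBytes(byte[] a, byte[] b) {
    int n = Math.min(a.length, b.length);
    for (int i = 0; i < n; i++) {
      int x = a[i] & 0xFF, y = b[i] & 0xFF;
      if (x != y) return x - y;
    }
    return a.length - b.length;
  }

  // Sketch of mapping the comparator result onto the chosen operator.
  static boolean passes(CompareOp op, byte[] value, byte[] reference) {
    int cmp = compareBytes(value, reference);
    switch (op) {
      case LESS:             return cmp < 0;
      case LESS_OR_EQUAL:    return cmp <= 0;
      case EQUAL:            return cmp == 0;
      case NOT_EQUAL:        return cmp != 0;
      case GREATER_OR_EQUAL: return cmp >= 0;
      case GREATER:          return cmp > 0;
      default:               return false; // NO_OP: never passes
    }
  }

  public static void main(String[] args) {
    // "shashwat" sorts after "shash", so GREATER_OR_EQUAL passes.
    System.out.println(passes(CompareOp.GREATER_OR_EQUAL,
        "shashwat".getBytes(), "shash".getBytes())); // true
  }
}
```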

Filter.ReturnCode

These are the return codes a filter can emit for a cell. The following are the enum constants:

  • INCLUDE: This includes the cell
  • INCLUDE_AND_NEXT_COL: This includes the cell and seeks to the next column
  • NEXT_COL: This skips the current column and moves to the next one
  • NEXT_ROW: This skips the rest of the current row and moves to the next row
  • SEEK_NEXT_USING_HINT: This seeks to the next key, which the filter provides as a hint
  • SKIP: This skips the current cell

You can read more on this operator at http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/filter/Filter.ReturnCode.html.
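How these return codes drive a scan can be sketched with a toy loop in plain Java. This models only a subset of the codes and uses invented names; it is not HBase code, just an illustration of how a RegionServer reacts to a filter's decisions per cell:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

public class ReturnCodeSketch {
  enum ReturnCode { INCLUDE, NEXT_COL, NEXT_ROW, SKIP }

  // A toy cell: row key plus column qualifier.
  record Cell(String row, String column) {}

  // Toy scan loop: ask the filter for a return code per cell and
  // include, skip, or abandon the rest of the row accordingly.
  static List<Cell> scan(List<Cell> cells, Function<Cell, ReturnCode> filter) {
    List<Cell> out = new ArrayList<>();
    String skipRow = null;
    for (Cell c : cells) {
      if (c.row().equals(skipRow)) continue; // NEXT_ROW took effect
      switch (filter.apply(c)) {
        case INCLUDE:  out.add(c); break;
        case NEXT_ROW: skipRow = c.row(); break;
        default:       break; // SKIP / NEXT_COL: drop this cell
      }
    }
    return out;
  }

  public static void main(String[] args) {
    List<Cell> cells = List.of(
        new Cell("r1", "a"), new Cell("r1", "b"),
        new Cell("r2", "a"), new Cell("r2", "b"));
    // Include column "a"; abandon a row once any other column is seen.
    List<Cell> kept = scan(cells, c -> c.column().equals("a")
        ? ReturnCode.INCLUDE : ReturnCode.NEXT_ROW);
    System.out.println(kept.size()); // 2: column "a" of each row
  }
}
```

The point of NEXT_ROW and SEEK_NEXT_USING_HINT is that the server can skip whole blocks of data rather than testing every cell, which is what makes server-side filtering cheaper than client-side filtering.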

FilterList.Operator

These are the operators for combining more than one filter in a filter list. The following are the enum constants:

  • MUST_PASS_ALL (AND)
  • MUST_PASS_ONE (OR)

You can read more on this operator at http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/filter/FilterList.Operator.html.
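The two operators behave like logical AND and OR over the member filters, which can be sketched in plain Java using predicates as stand-ins for filters (the class and method names here are illustrative, not HBase's):

```java
import java.util.List;
import java.util.function.Predicate;

public class FilterListSketch {
  // MUST_PASS_ALL behaves like AND over the member filters.
  static <T> boolean mustPassAll(List<Predicate<T>> filters, T value) {
    return filters.stream().allMatch(f -> f.test(value));
  }

  // MUST_PASS_ONE behaves like OR over the member filters.
  static <T> boolean mustPassOne(List<Predicate<T>> filters, T value) {
    return filters.stream().anyMatch(f -> f.test(value));
  }

  public static void main(String[] args) {
    List<Predicate<String>> filters = List.of(
        s -> s.startsWith("row"),   // stand-in for a PrefixFilter
        s -> s.endsWith("42"));     // stand-in for a value check
    System.out.println(mustPassAll(filters, "row42")); // true
    System.out.println(mustPassOne(filters, "row1"));  // true
    System.out.println(mustPassAll(filters, "row1"));  // false
  }
}
```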

We have seen the operators that are used in combination with filters, and we will see them in use in the example code. Now, let's go through the list of available filters:

Filter types

Description

BinaryComparator

This comparator performs a lexicographic binary comparison against the given byte array, using Bytes.compareTo(byte[], byte[]).

Have a look at the following example:

SingleColumnValueFilter colValFilterBinary = new SingleColumnValueFilter(Bytes.toBytes("detail"), Bytes.toBytes("name"), CompareFilter.CompareOp.GREATER_OR_EQUAL, new BinaryComparator(Bytes.toBytes("shash")));

BinaryPrefixComparator

This is a binary comparator that compares lexicographically against the given byte array, but only up to the length of the supplied prefix.

BitComparator

This filter comparator performs the given bitwise operation on each of the bytes with the given byte array.

ByteArrayComparable

This is the base class for byte array comparators.

ColumnCountGetFilter

This filter returns only the first N columns of each row; it is intended for use with Get operations.

ColumnPaginationFilter

This is based on the ColumnCountGetFilter; it takes two arguments, limit and offset, and returns a page of columns from each row, which is useful for pagination.
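The limit/offset behavior can be sketched in plain Java over a row's sorted column qualifiers (an illustrative sketch with invented names, not HBase code):

```java
import java.util.List;

public class ColumnPaginationSketch {
  // Sketch of limit/offset paging: from a row's sorted columns,
  // skip `offset` of them and keep at most `limit` of the remainder.
  static List<String> page(List<String> sortedColumns, int limit, int offset) {
    int from = Math.min(offset, sortedColumns.size());
    int to = Math.min(from + limit, sortedColumns.size());
    return sortedColumns.subList(from, to);
  }

  public static void main(String[] args) {
    List<String> cols = List.of("c1", "c2", "c3", "c4", "c5");
    System.out.println(page(cols, 2, 0)); // [c1, c2]
    System.out.println(page(cols, 2, 2)); // [c3, c4]
    System.out.println(page(cols, 2, 4)); // [c5]
  }
}
```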

ColumnPrefixFilter

This filter is used to get keys with columns that match a specified prefix.

ColumnRangeFilter

This filter selects columns whose qualifiers fall between a minimum and a maximum column, each of which can be marked inclusive or exclusive.

CompareFilter

This is a generic filter used to filter by comparison.

DependentColumnFilter

This filter adds intercolumn timestamp matching: only cells whose timestamps match an entry in a designated reference column are retained.

FamilyFilter

This filter is based on column families.

Filter

This is the interface for row and column filters, which can be directly applied within RegionServer.

FilterList

Using this, we can implement a logical comparison. It is an ordered list of other filters combined with an operator that specifies how many of them must pass for a cell to be included. The following are the comparison operators:

  • FilterList.Operator.MUST_PASS_ALL (AND)
  • FilterList.Operator.MUST_PASS_ONE (OR)

FirstKeyOnlyFilter

This filter returns only the first KeyValue from each row.

FirstKeyValueMatchingQualifiersFilter

This filter scans a row until it finds a KeyValue matching one of the specified column qualifiers, and then skips to the next row.

FuzzyRowFilter

This filter matches rows whose keys fit a fuzzy pattern, in which some byte positions are fixed and others can take any value.

InclusiveStopFilter

This filter stops a scan at the given row, including that row in the results (a scan's normal stop row is exclusive).

KeyOnlyFilter

This filter will only return the key component of each KeyValue.

MultipleColumnPrefixFilter

This is used to select keys with columns that match one of several given prefixes.

NullComparator

This comparator tests whether a given value is null (empty).

PageFilter

This filter limits the number of rows returned, for paging through results. Note that the filter is applied independently on each RegionServer, so the client can receive more rows than the requested page size and may need to trim the results itself.

ParseConstants

This holds a set of constants related to parsing filter strings used by ParseFilter.

ParseFilter

This allows users to specify a filter via a string.

PrefixFilter

This passes results whose row keys start with the specified prefix.

QualifierFilter

This is a filter based on column qualifiers.

RandomRowFilter

This filter includes each row with a given probability.

RegexStringComparator

This comparator matches values against a regular expression.

RowFilter

This is used to filter based on the row key.

SingleColumnValueExcludeFilter

This checks a single column value, but does not return the tested column.

SingleColumnValueFilter

This is used to filter rows based on the value of a single column.

SkipFilter

This is a wrapper filter that skips an entire row if any one of its cells fails the wrapped filter's check.

SubstringComparator

This comparator checks whether a given substring occurs in a value; the comparison is case-insensitive.

TimestampsFilter

This filter returns only cells whose timestamp is in a specified list of timestamps.

ValueFilter

This filter is based on column values.

WhileMatchFilter

This is a wrapper filter that returns results as long as the wrapped filter keeps matching; as soon as it fails once, the entire scan stops.

So, we have seen the list of filters that can be used with the read operations, Get and Scan, to filter out unnecessary data and fetch only what is required. The following is a sample code that uses a filter in a scan:

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.filter.CompareFilter.CompareOp;
import org.apache.hadoop.hbase.filter.SubstringComparator;
import org.apache.hadoop.hbase.filter.ValueFilter;
import org.apache.hadoop.hbase.util.Bytes;
import static org.apache.hadoop.hbase.util.Bytes.toBytes;

public class FilterExample {
  public static void main(String[] arguments) throws IOException {
    Configuration config = HBaseConfiguration.create();
    HTable hbaseTableObj = new HTable(config, "logTable");
    Scan scanObj = new Scan();
    // Keep only cells whose value contains the substring "shash"
    scanObj.setFilter(new ValueFilter(CompareOp.EQUAL, new SubstringComparator("shash")));
    ResultScanner resultScannerObj = hbaseTableObj.getScanner(scanObj);
    for (Result result : resultScannerObj) {
      byte[] value = result.getValue(toBytes("ColFamily"), toBytes("columnName"));
      System.out.println(Bytes.toString(value));
    }
    resultScannerObj.close();
    hbaseTableObj.close();
  }
}

The following example shows how we can use not just a single filter but a combination of many; this is done using a filter list:

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.filter.FilterList;
import org.apache.hadoop.hbase.filter.FilterList.Operator;
import org.apache.hadoop.hbase.filter.FirstKeyOnlyFilter;
import org.apache.hadoop.hbase.filter.KeyOnlyFilter;
import org.apache.hadoop.hbase.util.Bytes;
import static org.apache.hadoop.hbase.util.Bytes.toBytes;

public class ExampleOfFilterList {
  public static void main(String[] arguments) throws IOException {
    Configuration config = HBaseConfiguration.create();
    HTable hbaseTableObj = new HTable(config, "logTable");
    Scan scanObj = new Scan();
    // Both filters must pass: return only keys (no values),
    // and only the first KeyValue of each row
    FilterList filterListObj = new FilterList(Operator.MUST_PASS_ALL);
    filterListObj.addFilter(new KeyOnlyFilter());
    filterListObj.addFilter(new FirstKeyOnlyFilter());
    scanObj.setFilter(filterListObj);
    ResultScanner resultScannerObj = hbaseTableObj.getScanner(scanObj);
    for (Result result : resultScannerObj) {
      byte[] value = result.getValue(toBytes("colFamName"), toBytes("colName"));
      System.out.println("Value found: " + Bytes.toString(value));
    }
    resultScannerObj.close();
    hbaseTableObj.close();
  }
}