As the name suggests, a filter extracts only the required data and discards the rest. HBase provides a good number of filters that we can use in get and scan operations to fetch only the needed data from HBase, avoiding scans over data that is not required.
HBase filters are a powerful feature that can greatly enhance effectiveness when working with data stored in tables. The two read functions of HBase, get() and scan(), support direct access to data and the use of a start and end key, respectively. We can limit the data retrieved by adding limiting selectors to the HBase query. These include column families, column qualifiers, timestamps, timestamp ranges, and version numbers.
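The limiting selectors mentioned above can be sketched in code. The following is a minimal example, assuming a reachable HBase cluster and the classic 0.9x client API used throughout this chapter; the table name, column family, qualifier, and row keys are hypothetical:

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.util.Bytes;

public class SelectorExample {
    public static void main(String[] args) throws IOException {
        Configuration config = HBaseConfiguration.create();
        HTable table = new HTable(config, "logTable");        // hypothetical table
        Scan scan = new Scan();
        scan.addFamily(Bytes.toBytes("ColFamily"));           // restrict to one column family
        scan.addColumn(Bytes.toBytes("ColFamily"),            // ...or to a single qualifier
                       Bytes.toBytes("columnName"));
        scan.setTimeRange(0L, System.currentTimeMillis());    // limit by timestamp range
        scan.setMaxVersions(1);                               // only the latest cell version
        scan.setStartRow(Bytes.toBytes("row-0001"));          // key range: start (inclusive)
        scan.setStopRow(Bytes.toBytes("row-0100"));           // key range: stop (exclusive)
        ResultScanner scanner = table.getScanner(scan);
        for (Result result : scanner) {
            System.out.println(result);
        }
        scanner.close();
        table.close();
    }
}
```

These selectors narrow what the scan returns before any filter is even applied, so they are the cheapest way to reduce the data fetched.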
We can represent the use of HBase filters as shown in the following diagram: we specify filters in get or scan, the filters are shipped via RPC calls to the different RegionServers that hold the data, and they are compared against the local data at each RegionServer:
Now, let's see the different types of filters and their uses. Before discussing them, we will look at the operators on which filters depend for comparison:
Operator type | Description
---|---
`BitComparator.BitwiseOp` | This performs a bitwise comparison. The enum constants are `AND`, `OR`, and `XOR`. You can read more on this operator at http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/filter/BitComparator.BitwiseOp.html.
`CompareFilter.CompareOp` | This is a generic comparison operator used by the comparison filters. It can express conditions such as equal, greater, and not equal. The enum constants are `LESS`, `LESS_OR_EQUAL`, `EQUAL`, `NOT_EQUAL`, `GREATER_OR_EQUAL`, `GREATER`, and `NO_OP`. You can read more on this operator at http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/filter/CompareFilter.CompareOp.html.
`Filter.ReturnCode` | These are the return codes a filter emits for each cell it evaluates. The enum constants are `INCLUDE`, `INCLUDE_AND_NEXT_COL`, `SKIP`, `NEXT_COL`, `NEXT_ROW`, and `SEEK_NEXT_USING_HINT`. You can read more on this operator at http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/filter/Filter.ReturnCode.html.
`FilterList.Operator` | These are the conditions used to combine more than one filter in a filter list. The enum constants are `MUST_PASS_ALL` (logical AND) and `MUST_PASS_ONE` (logical OR). You can read more on this operator at http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/filter/FilterList.Operator.html.
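To make `CompareFilter.CompareOp` and `FilterList.Operator` concrete, here is a minimal sketch that combines two row-key conditions with `MUST_PASS_ONE` (a logical OR); the row keys used are hypothetical, and no cluster connection is needed just to build the filter:

```java
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.filter.BinaryComparator;
import org.apache.hadoop.hbase.filter.CompareFilter.CompareOp;
import org.apache.hadoop.hbase.filter.FilterList;
import org.apache.hadoop.hbase.filter.FilterList.Operator;
import org.apache.hadoop.hbase.filter.RowFilter;
import org.apache.hadoop.hbase.util.Bytes;

public class OperatorExample {
    public static void main(String[] args) {
        // A row passes if EITHER condition holds (MUST_PASS_ONE = OR);
        // MUST_PASS_ALL would require every filter in the list to pass (AND).
        FilterList filters = new FilterList(Operator.MUST_PASS_ONE);
        filters.addFilter(new RowFilter(CompareOp.EQUAL,
                new BinaryComparator(Bytes.toBytes("row-42"))));
        filters.addFilter(new RowFilter(CompareOp.GREATER_OR_EQUAL,
                new BinaryComparator(Bytes.toBytes("row-90"))));
        Scan scan = new Scan();
        scan.setFilter(filters);
        System.out.println(scan.getFilter());
    }
}
```

Swapping `MUST_PASS_ONE` for `MUST_PASS_ALL` here would match no rows at all, since a single row key cannot simultaneously equal `row-42` and be greater than or equal to `row-90`.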
We have seen the operators used in combination with filters, and we will see them in use in the example code. Now, let's look at the list of filters available:
So, we have seen the list of filters that can be used in the read methods, get() and scan(), to filter out unnecessary data and fetch only the required data. The following is a sample code that shows the use of a filter in a read method:
```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.filter.CompareFilter.CompareOp;
import org.apache.hadoop.hbase.filter.SubstringComparator;
import org.apache.hadoop.hbase.filter.ValueFilter;
import org.apache.hadoop.hbase.util.Bytes;
import static org.apache.hadoop.hbase.util.Bytes.toBytes;

public class FilterExample {
    public static void main(String[] arguments) throws IOException {
        Configuration config = HBaseConfiguration.create();
        HTable hbaseTableObj = new HTable(config, "logTable");
        Scan scanObj = new Scan();
        // Keep only the cells whose value contains the substring "shash"
        scanObj.setFilter(new ValueFilter(CompareOp.EQUAL,
                new SubstringComparator("shash")));
        ResultScanner resultScannerObj = hbaseTableObj.getScanner(scanObj);
        for (Result result : resultScannerObj) {
            byte[] value = result.getValue(toBytes("ColFamily"),
                                           toBytes("columnName"));
            System.out.println(Bytes.toString(value));
        }
        resultScannerObj.close();
        hbaseTableObj.close();
    }
}
```
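The same filter works with get() for a single row, not only with a scan. The following sketch applies the identical ValueFilter to a get(); the table name, row key, column family, and qualifier are hypothetical, and a reachable cluster is assumed:

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.filter.CompareFilter.CompareOp;
import org.apache.hadoop.hbase.filter.SubstringComparator;
import org.apache.hadoop.hbase.filter.ValueFilter;
import org.apache.hadoop.hbase.util.Bytes;

public class GetFilterExample {
    public static void main(String[] args) throws IOException {
        Configuration config = HBaseConfiguration.create();
        HTable table = new HTable(config, "logTable");
        Get get = new Get(Bytes.toBytes("row-0001"));   // hypothetical row key
        // Only cells of this row whose value contains "shash" are returned
        get.setFilter(new ValueFilter(CompareOp.EQUAL,
                new SubstringComparator("shash")));
        Result result = table.get(get);
        byte[] value = result.getValue(Bytes.toBytes("ColFamily"),
                                       Bytes.toBytes("columnName"));
        if (value != null) {
            System.out.println(Bytes.toString(value));
        }
        table.close();
    }
}
```

Here the filter is applied server-side to the cells of the single requested row, so only matching cells travel back over the wire.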
The following example shows how we can use not just a single filter but a combination of many, and this is done using a filter list:
```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.filter.FilterList;
import org.apache.hadoop.hbase.filter.FilterList.Operator;
import org.apache.hadoop.hbase.filter.FirstKeyOnlyFilter;
import org.apache.hadoop.hbase.filter.KeyOnlyFilter;
import org.apache.hadoop.hbase.util.Bytes;

public class ExampleOfFilterList {
    public static void main(String[] arguments) throws IOException {
        Configuration config = HBaseConfiguration.create();
        HTable hbaseTableObj = new HTable(config, "logTable");
        Scan scanObj = new Scan();
        // MUST_PASS_ALL: a cell is returned only if it passes every filter in the list
        FilterList filterListObj = new FilterList(Operator.MUST_PASS_ALL);
        filterListObj.addFilter(new KeyOnlyFilter());      // strip cell values, keep keys
        filterListObj.addFilter(new FirstKeyOnlyFilter()); // keep only the first cell of each row
        scanObj.setFilter(filterListObj);
        ResultScanner resultScannerObj = hbaseTableObj.getScanner(scanObj);
        for (Result result : resultScannerObj) {
            // KeyOnlyFilter strips the values, so the row keys are what this
            // scan actually retrieves
            System.out.println("Row found: " + Bytes.toString(result.getRow()));
        }
        resultScannerObj.close();
        hbaseTableObj.close();
    }
}
```