This chapter will discuss the client APIs provided by HBase. As noted earlier, HBase is written in Java and so is its native API. This does not mean, though, that you must use Java to access HBase. In fact, Chapter 6 will show how you can use other programming languages.
The primary client interface to HBase is the HTable class in the org.apache.hadoop.hbase.client package. It provides the user with all the functionality needed to store and retrieve data from HBase as well as delete obsolete values and so on. Before looking at the various methods this class provides, let us address some general aspects of its usage.
All operations that mutate data are guaranteed to be atomic on a per-row basis. This affects all other concurrent readers and writers of that same row. In other words, it does not matter if another client or thread is reading from or writing to the same row: they either read a consistent last mutation, or may have to wait before being able to apply their change.[49] More on this in Chapter 8.
Suffice it to say for now that during normal operations and load, a reading client will not be affected by another updating a particular row since their contention is nearly negligible. There is, however, an issue with many clients trying to update the same row at the same time. Try to batch updates together to reduce the number of separate operations on the same row as much as possible.
It also does not matter how many columns are written for the particular row; all of them are covered by this guarantee of atomicity.
Finally, creating HTable instances is not without cost. Each instantiation involves scanning the .META. table to check if the table actually exists and if it is enabled, as well as a few other operations that make this call quite costly. Therefore, it is recommended that you create HTable instances only once (and one per thread) and reuse each instance for the rest of the lifetime of your client application.
As soon as you need multiple instances of HTable, consider using the HTablePool class (see HTablePool), which provides you with a convenient way to reuse multiple instances.
Here is a summary of the points we just discussed:
Create HTable instances only once, usually when your application starts.
Create a separate HTable instance for every thread you execute (or use HTablePool).
Updates are atomic on a per-row basis.
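The "create once per thread, then reuse" advice can be sketched with a plain ThreadLocal. This is a minimal illustration that uses only the JDK: ExpensiveHandle and PerThreadHandle are hypothetical names, and ExpensiveHandle merely stands in for an object that is costly to create, such as an HTable instance. It does not talk to HBase.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Sketch only: ExpensiveHandle is a hypothetical stand-in for an object
// that is costly to create (such as HTable); it is not part of HBase.
final class PerThreadHandle {
    // Counts how many handles were created, to make the reuse visible.
    static final AtomicInteger CREATED = new AtomicInteger();

    static final class ExpensiveHandle {
        ExpensiveHandle() {
            CREATED.incrementAndGet(); // imagine a costly lookup here
        }
    }

    // Each thread lazily creates its own handle once and then reuses it.
    private static final ThreadLocal<ExpensiveHandle> HANDLE =
        ThreadLocal.withInitial(ExpensiveHandle::new);

    static ExpensiveHandle get() {
        return HANDLE.get();
    }

    public static void main(String[] args) throws InterruptedException {
        Runnable work = () -> {
            for (int i = 0; i < 1000; i++) {
                get(); // same instance on every call within one thread
            }
        };
        Thread t1 = new Thread(work);
        Thread t2 = new Thread(work);
        t1.start();
        t2.start();
        t1.join();
        t2.join();
        System.out.println("Handles created: " + CREATED.get());
    }
}
```

Each worker thread calls get() a thousand times but triggers only one construction, which is the effect you want when the constructor is expensive.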
The initial set of basic operations is often referred to as CRUD, which stands for create, read, update, and delete. HBase has a set of those and we will look into each of them subsequently. They are provided by the HTable class, and the remainder of this chapter will refer directly to the methods without specifically mentioning the containing class again.
Most of the following operations seem self-explanatory, but their subtle details warrant a close look. You will also start to see a pattern of repeating functionality, which means we do not have to explain it again and again.
The examples you will see in partial source code can be found in full detail in the publicly available GitHub repository at https://github.com/larsgeorge/hbase-book. For details on how to compile them, see Building the Examples.
Initially you will see the import statements, but they will be subsequently omitted for the sake of brevity. Also, specific parts of the code are not listed if they do not immediately help with the topic explained. Refer to the full source if in doubt.
This group of operations can be split into separate types: those that work on single rows and those that work on lists of rows. Since the latter involves some more complexity, we will look at each group separately. Along the way, you will also be introduced to accompanying client API features.
The very first method you may want to know about is one that lets you store data in HBase. Here is the call that lets you do that:
void put(Put put) throws IOException
It expects one or a list of Put objects that, in turn, are created with one of these constructors:
Put(byte[] row)
Put(byte[] row, RowLock rowLock)
Put(byte[] row, long ts)
Put(byte[] row, long ts, RowLock rowLock)
You need to supply a row to create a Put instance. A row in HBase is identified by a unique row key and, as is the case with most values in HBase, this is a Java byte[] array. You are free to choose any row key you like, but please also note that Chapter 9 provides a whole section on row key design (see Key Design). For now, we assume this can be anything, and often it represents a fact from the physical world, for example, a username or an order ID. These can be simple numbers but also UUIDs[50] and so on.
HBase is kind enough to provide us with a helper class that has many static methods to convert Java types into byte[] arrays. Example 3-1 provides a short list of what it offers.
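To make the idea of turning Java types into byte[] arrays concrete, here is a small sketch that relies only on the JDK. The examples in this chapter use Bytes.toBytes() for this purpose; the class and method names below (ByteConversions, fromString, fromLong, and so on) are illustrative assumptions and are not part of the HBase API.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Sketch only: JDK-based conversions in the spirit of the helper class.
// The method names are illustrative and are not the HBase API.
final class ByteConversions {
    static byte[] fromString(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    static byte[] fromLong(long value) {
        // ByteBuffer writes big-endian by default: most significant byte first.
        return ByteBuffer.allocate(Long.BYTES).putLong(value).array();
    }

    static long toLong(byte[] bytes) {
        return ByteBuffer.wrap(bytes).getLong();
    }

    static String toText(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] rowKey = fromString("row1");
        byte[] orderId = fromLong(42L);
        System.out.println(Arrays.toString(rowKey));
        System.out.println(Arrays.toString(orderId));
        System.out.println(toText(rowKey) + " / " + toLong(orderId));
    }
}
```

A string row key becomes one byte per ASCII character, while a long always occupies eight bytes, which matters once you start thinking about how keys sort.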
Once you have created the Put instance you can add data to it. This is done using these methods:
Put add(byte[] family, byte[] qualifier, byte[] value)
Put add(byte[] family, byte[] qualifier, long ts, byte[] value)
Put add(KeyValue kv) throws IOException
Each call to add() specifies exactly one column, or, in combination with an optional timestamp, one single cell. Note that if you do not specify the timestamp with the add() call, the Put instance will use the optional timestamp parameter from the constructor (also called ts). If you omit both, you leave it to the region server to set the timestamp.
The variant that takes an existing KeyValue instance is for advanced users who have learned how to retrieve, or create, this internal class. It represents a single, unique cell; like a coordinate system used with maps, it is addressed by the row key, column family, column qualifier, and timestamp, pointing to one value in a three-dimensional, cube-like system, where time is the third dimension.
One way to come across the internal KeyValue type is by using the reverse methods to add(), aptly named get():
List<KeyValue> get(byte[] family, byte[] qualifier)
Map<byte[], List<KeyValue>> getFamilyMap()
These two calls retrieve what you have added earlier, while having converted the unique cells into KeyValue instances. You can retrieve all cells for either an entire column family, a specific column within a family, or everything. The latter is the getFamilyMap() call, which you can then iterate over to check the details contained in each available KeyValue.
Every KeyValue instance contains its full address (the row key, column family, qualifier, timestamp, and so on) as well as the actual data. It is the lowest-level class in HBase with respect to the storage architecture. Storage explains this in great detail. As for the available functionality in regard to the KeyValue class from the client API, see The KeyValue class.
Instead of having to iterate to check for the existence of specific cells, you can use the following set of methods:
boolean has(byte[] family, byte[] qualifier)
boolean has(byte[] family, byte[] qualifier, long ts)
boolean has(byte[] family, byte[] qualifier, byte[] value)
boolean has(byte[] family, byte[] qualifier, long ts, byte[] value)
They increasingly ask for more specific details and return true if a match can be found. The first method simply checks for the presence of a column. The others add the option to check for a timestamp, a given value, or both.
There are more methods provided by the Put class, summarized in Table 3-1.
Note that the getters listed in Table 3-1 for the Put class only retrieve what you have set beforehand. They are rarely used, and make sense only when you, for example, prepare a Put instance in a private method in your code, and inspect the values in another place.
Example 3-2 shows how all this is put together (no pun intended) into a basic application.
The examples in this chapter use a very limited, but exact, set of data. When you look at the full source code you will notice that it uses an internal class named HBaseHelper. It is used to create a test table with a very specific number of rows and columns. This makes it much easier to compare the before and after.
Feel free to run the code as-is against a standalone HBase instance on your local machine for testing—or against a fully deployed cluster. Building the Examples explains how to compile the examples. Also, be adventurous and modify them to get a good feel for the functionality they demonstrate.
The example code usually first removes all data from a previous execution by dropping the table it has created. If you run the examples against a production cluster, please make sure that you have no name collisions. Usually the table is testtable to indicate its purpose.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

import java.io.IOException;

public class PutExample {

  public static void main(String[] args) throws IOException {
    // Create the required configuration.
    Configuration conf = HBaseConfiguration.create();

    // Instantiate a new client for the table "testtable".
    HTable table = new HTable(conf, "testtable");

    // Create a Put for the row with the key "row1".
    Put put = new Put(Bytes.toBytes("row1"));

    // Add the columns colfam1:qual1 and colfam1:qual2 to the Put.
    put.add(Bytes.toBytes("colfam1"), Bytes.toBytes("qual1"),
      Bytes.toBytes("val1"));
    put.add(Bytes.toBytes("colfam1"), Bytes.toBytes("qual2"),
      Bytes.toBytes("val2"));

    // Store the row with its columns in the HBase table.
    table.put(put);
  }
}
This is a (nearly) full representation of the code used and every line is explained. The following examples will omit more and more of the boilerplate code so that you can focus on the important parts.
You can, once again, make use of the command-line shell (see Quick-Start Guide) to verify that our insert has succeeded:
hbase(main):001:0> list
TABLE
testtable
1 row(s) in 0.0400 seconds
hbase(main):002:0> scan 'testtable'
ROW COLUMN+CELL
row1 column=colfam1:qual1, timestamp=1294065304642, value=val1
1 row(s) in 0.2050 seconds
Another optional parameter while creating a Put instance is called ts, or timestamp. It allows you to store a value at a particular version in the HBase table.
When you do not specify that parameter, it is implicitly set to the current time of the RegionServer responsible for the given row at the moment it is added to the underlying storage.
The constructors of the Put class have another optional parameter, called rowLock. It gives you the ability to hand in an external row lock, something discussed in Row Locks. Suffice it to say for now that you can create your own RowLock instance that can be used to prevent other clients from accessing specific rows while you are modifying them repeatedly.
From your code you may have to deal with KeyValue instances directly. As you may recall from our discussion earlier in this book, these instances contain the data as well as the coordinates of one specific cell. The coordinates are the row key, name of the column family, column qualifier, and timestamp. The class provides a plethora of constructors that allow you to combine all of these in many variations. The fully specified constructor looks like this:
KeyValue(byte[] row, int roffset, int rlength,
  byte[] family, int foffset, int flength,
  byte[] qualifier, int qoffset, int qlength,
  long timestamp, Type type,
  byte[] value, int voffset, int vlength)
Be advised that the KeyValue class, and its accompanying comparators, are designed for internal use. They are available in a few places in the client API to give you access to the raw data so that extra copy operations can be avoided. They also allow byte-level comparisons, rather than having to rely on a slower, class-level comparison.
The data as well as the coordinates are stored as a Java byte[], that is, as a byte array. The design behind this type of low-level storage is to allow for arbitrary data, but also to be able to efficiently store only the required bytes, keeping the overhead of internal data structures to a minimum. This is also the reason that there is an offset and length parameter for each byte array parameter. They allow you to pass in existing byte arrays while doing very fast byte-level operations.
For every member of the coordinates, there is a getter that can retrieve the byte arrays and their given offset and length. This also can be accessed at the topmost level, that is, the underlying byte buffer:
byte[] getBuffer()
int getOffset()
int getLength()
They return the full byte array details backing the current KeyValue instance. There will be few occasions where you will ever have to go that far. But it is available and you can make use of it if need be.
Two very interesting methods to know are:
byte[] getRow()
byte[] getKey()
The question you may ask yourself is: what is the difference between a row and a key? While you will learn about the difference in Storage, for now just remember that the row is what we have been referring to alternatively as the row key, that is, the row parameter of the Put constructor, and the key is what was previously introduced as the coordinates of a cell, in their raw, byte array format. In practice, you hardly ever have to use getKey() but will be more likely to use getRow().
The KeyValue class also provides a large list of internal classes implementing the Comparator interface. They can be used in your own code to do the same comparisons as done inside HBase. This is useful when retrieving KeyValue instances using the API and further sorting or processing them in order. They are listed in Table 3-2.
The KeyValue class exports most of these comparators as a static instance for each class. For example, there is a public field named KEY_COMPARATOR, giving access to a KeyComparator instance. The COMPARATOR field is pointing to an instance of the more frequently used KVComparator class. So instead of creating your own instances, you could use a provided one, for example, when creating a set holding KeyValue instances that should be sorted in the same order that HBase is using internally:
TreeSet<KeyValue> set = new TreeSet<KeyValue>(KeyValue.COMPARATOR);
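The general technique behind this line is handing a comparator to a sorted collection, which is required for raw byte arrays because byte[] has no natural ordering in Java. The following sketch shows that technique with the JDK only. The comparator is a simplified, hand-written lexicographic comparison over unsigned bytes; it is an assumption made for illustration and is not the comparator that HBase provides.

```java
import java.nio.charset.StandardCharsets;
import java.util.Comparator;
import java.util.TreeSet;

// Sketch only: a hand-written lexicographic comparator over raw bytes.
// It is a simplified stand-in and not the comparator HBase ships.
final class SortedRowKeys {
    static final Comparator<byte[]> UNSIGNED_LEXICOGRAPHIC = (left, right) -> {
        int common = Math.min(left.length, right.length);
        for (int i = 0; i < common; i++) {
            // Mask to compare bytes as unsigned values (0..255).
            int diff = (left[i] & 0xff) - (right[i] & 0xff);
            if (diff != 0) {
                return diff;
            }
        }
        // On a shared prefix, the shorter array sorts first.
        return left.length - right.length;
    };

    static TreeSet<byte[]> sorted(String... keys) {
        TreeSet<byte[]> set = new TreeSet<>(UNSIGNED_LEXICOGRAPHIC);
        for (String key : keys) {
            set.add(key.getBytes(StandardCharsets.UTF_8));
        }
        return set;
    }

    public static void main(String[] args) {
        // Iterates in byte order: row1, row10, row2.
        for (byte[] key : sorted("row2", "row10", "row1")) {
            System.out.println(new String(key, StandardCharsets.UTF_8));
        }
    }
}
```

Note how row10 sorts before row2: the comparison works byte by byte, not numerically, which is worth keeping in mind when you choose row keys.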
There is one more field per KeyValue instance that is representing an additional dimension for its unique coordinates: the type. Table 3-3 lists the possible values.
You can see the type of an existing KeyValue instance by, for example, using another provided call:
String toString()
This prints out the meta information of the current KeyValue instance, and has the following format:
<row-key>/<family>:<qualifier>/<version>/<type>/<value-length>
This is used by some of the example code for this book to check if data has been set or retrieved, and what the meta information is.
The class has many more convenience methods that allow you to compare parts of the stored data, as well as check what type it is, get its computed heap size, clone or copy it, and more. There are static methods to create special instances of KeyValue that can be used for comparisons, or when manipulating data at that low a level within HBase. You should consult the provided Java documentation to learn more about them.[52] Also see Storage for a detailed explanation of the raw, binary format.
Each put operation is effectively an RPC[53] that is transferring data from the client to the server and back. This is OK for a low number of operations, but not for applications that need to store thousands of values per second into a table.
The importance of reducing the number of separate RPC calls is tied to the round-trip time, which is the time it takes for a client to send a request and the server to send a response over the network. This does not include the time required for the data transfer. It simply is the overhead of sending packets over the wire. On average, a round-trip takes about 1 ms on a LAN, which means you can handle only 1,000 round-trips per second.
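The arithmetic behind that limit, and the effect of packing several operations into one round-trip, can be written down in a few lines. This is a back-of-the-envelope sketch and not a benchmark: it assumes strictly sequential requests and ignores transfer time, and the class and method names are made up for illustration.

```java
// Sketch only: back-of-the-envelope arithmetic, not a benchmark.
// Assumes sequential requests and ignores the data transfer time.
final class RoundTripEstimate {
    // Upper bound on sequential round-trips per second for a given latency.
    static long roundTripsPerSecond(double roundTripMillis) {
        return (long) (1000.0 / roundTripMillis);
    }

    // Operations per second when each round-trip carries a whole batch.
    static long operationsPerSecond(double roundTripMillis, int batchSize) {
        return roundTripsPerSecond(roundTripMillis) * batchSize;
    }

    public static void main(String[] args) {
        // One operation per 1 ms round-trip: 1,000 operations per second.
        System.out.println(operationsPerSecond(1.0, 1));
        // 100 operations per round-trip: 100,000 operations per second.
        System.out.println(operationsPerSecond(1.0, 100));
    }
}
```

The second figure is an upper bound only; once requests grow large, transfer time starts to dominate and the gain from batching shrinks.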
The other important factor is the message size: if you send large requests over the network, you already need a much lower number of round-trips, as most of the time is spent transferring data. But when doing, for example, counter increments, which are small in size, you will see better performance when batching updates into fewer requests.
The HBase API comes with a built-in client-side write buffer that collects put operations so that they are sent in one RPC call to the server(s). The global switch to control if it is used or not is represented by the following methods:
void setAutoFlush(boolean autoFlush)
boolean isAutoFlush()
By default, the client-side buffer is not enabled. You activate the buffer by setting auto-flush to false, by invoking:
table.setAutoFlush(false)
This will enable the client-side buffering mechanism, and you can check the state of the flag with the isAutoFlush() method. It will return true when you initially create the HTable instance. Otherwise, it will return the current state as set by your code.
Once you have activated the buffer, you can store data into HBase as shown in Single Puts. You do not cause any RPCs to occur, though, because the Put instances you stored are kept in memory in your client process. When you want to force the data to be written, you can call another API function:
void flushCommits() throws IOException
The flushCommits() method ships all the modifications to the remote server(s). The buffered Put instances can span many different rows. The client is smart enough to batch these updates accordingly and send them to the appropriate region server(s). Just as with the single put() call, you do not have to worry about where data resides, as this is handled transparently for you by the HBase client. Figure 3-1 shows how the operations are sorted and grouped before they are shipped over the network, with one single RPC per region server.
While you can force a flush of the buffer, this is usually not necessary, as the API tracks how much data you are buffering by counting the required heap size of every instance you have added. This tracks the entire overhead of your data, also including necessary internal data structures. Once you go over a specific limit, the client will call the flush command for you implicitly. You can control the configured maximum allowed client-side write buffer size with these calls:
long getWriteBufferSize()
void setWriteBufferSize(long writeBufferSize) throws IOException
The default size is a moderate 2 MB (or 2,097,152 bytes) and assumes you are inserting reasonably small records into HBase, that is, each a fraction of that buffer size. If you were to store larger data, you may want to consider increasing this value to allow your client to efficiently group together a certain number of records per RPC.
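The size-triggered flush can be modeled with a short, self-contained class. This is a simplified sketch and not the HTable implementation: SizeBoundedBuffer is a hypothetical name, it counts only payload bytes rather than the full heap size of each instance, and "sending" is reduced to incrementing a counter.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch only: a simplified model of a size-bounded client-side buffer.
// It is not the HTable implementation; "sending" just counts flushes.
final class SizeBoundedBuffer {
    private final long maxBytes;
    private final List<byte[]> pending = new ArrayList<>();
    private long pendingBytes;
    private int flushCount;

    SizeBoundedBuffer(long maxBytes) {
        this.maxBytes = maxBytes;
    }

    // Buffers one record and flushes implicitly once the limit is exceeded.
    void add(byte[] record) {
        pending.add(record);
        pendingBytes += record.length;
        if (pendingBytes > maxBytes) {
            flush();
        }
    }

    // Explicit flush: ship everything buffered so far as one batch.
    void flush() {
        if (pending.isEmpty()) {
            return;
        }
        flushCount++; // stands in for one batched request
        pending.clear();
        pendingBytes = 0;
    }

    int flushCount() {
        return flushCount;
    }

    int pendingCount() {
        return pending.size();
    }

    public static void main(String[] args) {
        SizeBoundedBuffer buffer = new SizeBoundedBuffer(2L * 1024 * 1024);
        byte[] record = new byte[1024]; // 1 KB payload per record
        for (int i = 0; i < 5000; i++) {
            buffer.add(record);
        }
        System.out.println("Implicit flushes: " + buffer.flushCount());
        System.out.println("Still buffered: " + buffer.pendingCount());
        buffer.flush(); // send the remainder explicitly
        System.out.println("Total flushes: " + buffer.flushCount());
    }
}
```

With small records, thousands of add() calls collapse into a handful of flushes; with records close to the limit in size, nearly every add() would flush, which is why the buffer size should be chosen relative to the record size.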
Setting this value for every HTable instance you create may seem cumbersome and can be avoided by adding a higher value to your local hbase-site.xml configuration file, for example, by adding:
<property>
  <name>hbase.client.write.buffer</name>
  <value>20971520</value>
</property>
The buffer is only ever flushed on two occasions:
Explicit flush: Use the flushCommits() call to send the data to the servers for permanent storage.
Implicit flush: This is triggered when you call put() or setWriteBufferSize(). Both calls compare the currently used buffer size with the configured limit and optionally invoke the flushCommits() method. If the buffer is disabled entirely, that is, auto-flush is switched on with setAutoFlush(true), the client calls the flush method for every invocation of put().
Another call triggering the flush implicitly and unconditionally is the close() method of HTable.
Example 3-3 shows how the write buffer is controlled from the client API.
HTable table = new HTable(conf, "testtable");
System.out.println("Auto flush: " + table.isAutoFlush());

table.setAutoFlush(false);

Put put1 = new Put(Bytes.toBytes("row1"));
put1.add(Bytes.toBytes("colfam1"), Bytes.toBytes("qual1"),
  Bytes.toBytes("val1"));
table.put(put1);

Put put2 = new Put(Bytes.toBytes("row2"));
put2.add(Bytes.toBytes("colfam1"), Bytes.toBytes("qual1"),
  Bytes.toBytes("val2"));
table.put(put2);

Put put3 = new Put(Bytes.toBytes("row3"));
put3.add(Bytes.toBytes("colfam1"), Bytes.toBytes("qual1"),
  Bytes.toBytes("val3"));
table.put(put3);

Get get = new Get(Bytes.toBytes("row1"));
Result res1 = table.get(get);
System.out.println("Result: " + res1);

table.flushCommits();

Result res2 = table.get(get);
System.out.println("Result: " + res2);
Check what the auto flush flag is set to; should print “Auto flush: true”.
Set the auto flush to false to enable the client-side write buffer.
Store some rows with columns into HBase.
Try to load previously stored row. This will print “Result: keyvalues=NONE”.
Force a flush. This causes an RPC to occur.
Now the row is persisted and can be loaded.
This example also shows a specific behavior of the buffer that you may not anticipate. Let’s see what it prints out when executed:
Auto flush: true
Result: keyvalues=NONE
Result: keyvalues={row1/colfam1:qual1/1300267114099/Put/vlen=4}
While you have not seen the get() operation yet, you should still be able to correctly infer what it does, that is, reading data back from the servers. But for the first get() in the example, the API returns a NONE value. What does that mean? It is caused by the fact that the client write buffer is an in-memory structure that is literally holding back any unflushed records. Nothing was sent to the servers yet, and therefore you cannot access it.
If you were ever required to access the write buffer content, you would find that ArrayList<Put> getWriteBuffer() can be used to get the internal list of buffered Put instances you have added so far by calling table.put(put).
I mentioned earlier that it is exactly that list that makes HTable not safe for multithreaded use. Be very careful with what you do to that list when accessing it directly. You are bypassing the heap size checks, or you might modify it while a flush is in progress!
Since the client buffer is a simple list retained in the local process memory, you need to be careful not to run into a problem that terminates the process mid-flight. If that were to happen, any data that has not yet been flushed will be lost! The servers will have never received that data, and therefore there will be no copy of it that can be used to recover from this situation.
Also note that a bigger buffer takes more memory on both the client and server side, since the server instantiates the passed write buffer to process it. On the other hand, a larger buffer size reduces the number of RPCs made. For an estimate of server-side memory used, evaluate hbase.client.write.buffer * hbase.regionserver.handler.count * number of region servers.
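As a worked instance of this estimate, the product can be evaluated for one set of inputs. The 2 MB buffer is the default mentioned earlier in this section; the handler count of 10 and the five region servers are assumptions chosen purely for illustration, so substitute the values from your own configuration.

```java
// Sketch only: evaluates the estimate with illustrative inputs.
final class WriteBufferMemoryEstimate {
    // write buffer size * handler count * number of region servers
    static long estimateBytes(long writeBufferBytes, int handlerCount,
            int regionServers) {
        return writeBufferBytes * handlerCount * regionServers;
    }

    public static void main(String[] args) {
        long bufferBytes = 2L * 1024 * 1024; // the 2 MB default buffer size
        int handlers = 10;                   // assumed handler count
        int servers = 5;                     // assumed number of region servers
        long total = estimateBytes(bufferBytes, handlers, servers);
        // 2,097,152 * 10 * 5 = 104,857,600 bytes, that is, 100 MB.
        System.out.println(total + " bytes, or "
            + total / (1024 * 1024) + " MB");
    }
}
```

Raising the buffer to 20 MB, as in the configuration example above, multiplies the result by ten under the same assumptions.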
Referring to the round-trip time again, if you only store large cells, the local buffer is less useful, since the transfer is then dominated by the transfer time. In this case, you are better advised to not increase the client buffer size.
The client API has the ability to insert single Put instances as shown earlier, but it also has the advanced feature of batching operations together. This comes in the form of the following call:
void put(List<Put> puts) throws IOException
You will have to create a list of Put instances and hand it to this call. Example 3-4 updates the previous example by creating a list to hold the mutations and eventually calling the list-based put() method.
List<Put> puts = new ArrayList<Put>();

Put put1 = new Put(Bytes.toBytes("row1"));
put1.add(Bytes.toBytes("colfam1"), Bytes.toBytes("qual1"),
  Bytes.toBytes("val1"));
puts.add(put1);

Put put2 = new Put(Bytes.toBytes("row2"));
put2.add(Bytes.toBytes("colfam1"), Bytes.toBytes("qual1"),
  Bytes.toBytes("val2"));
puts.add(put2);

Put put3 = new Put(Bytes.toBytes("row2"));
put3.add(Bytes.toBytes("colfam1"), Bytes.toBytes("qual2"),
  Bytes.toBytes("val3"));
puts.add(put3);

table.put(puts);
A quick check with the HBase Shell reveals that the rows were stored as expected. Note that the example actually modified three columns, but in two rows only. It added two columns into the row with the key row2, using two separate qualifiers, qual1 and qual2, creating two uniquely named columns in the same row.
hbase(main):001:0> scan 'testtable'
ROW COLUMN+CELL
row1 column=colfam1:qual1, timestamp=1300108258094, value=val1
row2 column=colfam1:qual1, timestamp=1300108258094, value=val2
row2 column=colfam1:qual2, timestamp=1300108258098, value=val3
2 row(s) in 0.1590 seconds
Since you are issuing a list of row mutations to possibly many different rows, there is a chance that not all of them will succeed. This could be due to a few reasons, for example, when there is an issue with one of the region servers and the client-side retry mechanism needs to give up because the number of retries has exceeded the configured maximum. If there is a problem with any of the put calls on the remote servers, the error is reported back to you subsequently in the form of an IOException.
Example 3-5 uses a bogus column family name to insert a column. Since the client is not aware of the structure of the remote table—it could have been altered since it was created—this check is done on the server side.
Put put1 = new Put(Bytes.toBytes("row1"));
put1.add(Bytes.toBytes("colfam1"), Bytes.toBytes("qual1"),
  Bytes.toBytes("val1"));
puts.add(put1);

Put put2 = new Put(Bytes.toBytes("row2"));
put2.add(Bytes.toBytes("BOGUS"), Bytes.toBytes("qual1"),
  Bytes.toBytes("val2"));
puts.add(put2);

Put put3 = new Put(Bytes.toBytes("row2"));
put3.add(Bytes.toBytes("colfam1"), Bytes.toBytes("qual2"),
  Bytes.toBytes("val3"));
puts.add(put3);

table.put(puts);
The call to put() fails with the following (or similar) error message:
org.apache.hadoop.hbase.client.RetriesExhaustedWithDetailsException: Failed 1 action: NoSuchColumnFamilyException: 1 time, servers with issues: 10.0.0.57:51640,
You may wonder what happened to the other, nonfaulty puts in the list. Using the shell again you should see that the two correct puts have been applied:
hbase(main):001:0> scan 'testtable'
ROW COLUMN+CELL
row1 column=colfam1:qual1, timestamp=1300108925848, value=val1
row2 column=colfam1:qual2, timestamp=1300108925848, value=val3
2 row(s) in 0.0640 seconds
The servers iterate over all operations and try to apply them. The failed ones are returned and the client reports the remote error using the RetriesExhaustedWithDetailsException, giving you insight into how many operations have failed, with what error, and how many times it has retried to apply the erroneous modification. It is interesting to note that, for the bogus column family, the retry is automatically set to 1 (see the NoSuchColumnFamilyException: 1 time), as this is an error from which HBase cannot recover.
Those Put instances that have failed on the server side are kept in the local write buffer. They will be retried the next time the buffer is flushed. You can also access them using the getWriteBuffer() method of HTable and take, for example, evasive actions.
Some checks are done on the client side, though, for example, to ensure that the put has at least one column specified, that is, that it is not completely empty. In that event, the client throws an exception that leaves the operations preceding the faulty one in the client buffer.
The list-based put() call uses the client-side write buffer to insert all puts into the local buffer and then to call flushCommits() implicitly. While inserting each instance of Put, the client API performs the mentioned check. If it fails, for example, at the third put out of five, the first two are added to the buffer while the last two are not. It also then does not trigger the flush command at all.
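The validate-then-buffer behavior described here can be modeled in isolation. The following sketch uses only the JDK: SimplePut and ListPutSketch are hypothetical stand-ins, the "no columns" check is reduced to a counter, and the flush merely clears the buffer. It illustrates the control flow and makes no claim about the actual HTable code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Sketch only: models the validate-then-buffer loop described above.
// SimplePut is a hypothetical stand-in, not the HBase Put class.
final class ListPutSketch {
    static final class SimplePut {
        final String row;
        final int columnCount;

        SimplePut(String row, int columnCount) {
            this.row = row;
            this.columnCount = columnCount;
        }
    }

    final List<SimplePut> buffer = new ArrayList<>();
    int flushes;

    void put(List<SimplePut> puts) {
        for (SimplePut put : puts) {
            if (put.columnCount == 0) {
                // Earlier puts stay buffered; later ones are never added.
                throw new IllegalArgumentException("No columns to insert");
            }
            buffer.add(put);
        }
        flush(); // only reached when every put passed the check
    }

    void flush() {
        flushes++;
        buffer.clear();
    }

    public static void main(String[] args) {
        ListPutSketch table = new ListPutSketch();
        List<SimplePut> puts = Arrays.asList(
            new SimplePut("row1", 1), new SimplePut("row2", 1),
            new SimplePut("row3", 0), // fails the client-side check
            new SimplePut("row4", 1), new SimplePut("row5", 1));
        try {
            table.put(puts);
        } catch (IllegalArgumentException e) {
            // Prints "Buffered: 2, flushes: 0".
            System.out.println("Buffered: " + table.buffer.size()
                + ", flushes: " + table.flushes);
            table.flush(); // a manual flush applies the buffered puts
        }
    }
}
```

The two puts that follow the faulty one are silently dropped from the batch, so a caller that needs them applied has to resubmit them after handling the exception.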
You could catch the exception and flush the write buffer manually to apply those modifications. Example 3-6 shows one approach to handle this.
Put put1 = new Put(Bytes.toBytes("row1"));
put1.add(Bytes.toBytes("colfam1"), Bytes.toBytes("qual1"),
  Bytes.toBytes("val1"));
puts.add(put1);

Put put2 = new Put(Bytes.toBytes("row2"));
put2.add(Bytes.toBytes("BOGUS"), Bytes.toBytes("qual1"),
  Bytes.toBytes("val2"));
puts.add(put2);

Put put3 = new Put(Bytes.toBytes("row2"));
put3.add(Bytes.toBytes("colfam1"), Bytes.toBytes("qual2"),
  Bytes.toBytes("val3"));
puts.add(put3);

Put put4 = new Put(Bytes.toBytes("row2"));
puts.add(put4);

try {
  table.put(puts);
} catch (Exception e) {
  System.err.println("Error: " + e);
  table.flushCommits();
}
The example code this time should give you two errors, similar to:
Error: java.lang.IllegalArgumentException: No columns to insert
Exception in thread "main" org.apache.hadoop.hbase.client.RetriesExhaustedWithDetailsException: Failed 1 action: NoSuchColumnFamilyException: 1 time, servers with issues: 10.0.0.57:51640,
The first Error is the client-side check, while the second is the remote exception that now is caused by calling table.flushCommits().
Since you possibly have the client-side write buffer enabled—refer to Client-side write buffer—you will find that the exception is not reported right away, but is delayed until the buffer is flushed.
You need to watch out for a peculiarity when using the list-based put call: you cannot control the order in which the puts are applied on the server side, which implies that the order in which the servers are called is also not under your control. Use this call with caution if you have to guarantee a specific order. In the worst case, you need to create smaller batches and explicitly flush the client-side write buffer to enforce that they are sent to the remote servers.
There is a special variation of the put calls that warrants its own section: check and put. The method signature is:
boolean checkAndPut(byte[] row, byte[] family, byte[] qualifier, byte[] value, Put put) throws IOException
This call allows you to issue atomic, server-side mutations that are guarded by an accompanying check. If the check passes successfully, the put operation is executed; otherwise, it aborts the operation completely. It can be used to update data based on current, possibly related, values.
Such guarded operations are often used in systems that handle, for example, account balances, state transitions, or data processing. The basic principle is that you read data at one point in time and process it. Once you are ready to write back the result, you want to make sure that no other client has done the same already. You use the atomic check to verify that the value has not been modified in the meantime and, only if that holds, apply your value.
A special type of check can be performed using the checkAndPut() call: only update if another value is not already present. This is achieved by setting the value parameter to null. In that case, the operation would succeed when the specified column is nonexistent.
The call returns a boolean result value, indicating whether the Put has been applied or not, returning true or false, respectively. Example 3-7 shows the interactions between the client and the server, returning the expected results.
Put put1 = new Put(Bytes.toBytes("row1"));
put1.add(Bytes.toBytes("colfam1"), Bytes.toBytes("qual1"),
  Bytes.toBytes("val1"));

boolean res1 = table.checkAndPut(Bytes.toBytes("row1"),
  Bytes.toBytes("colfam1"), Bytes.toBytes("qual1"), null, put1);
System.out.println("Put applied: " + res1);

boolean res2 = table.checkAndPut(Bytes.toBytes("row1"),
  Bytes.toBytes("colfam1"), Bytes.toBytes("qual1"), null, put1);
System.out.println("Put applied: " + res2);

Put put2 = new Put(Bytes.toBytes("row1"));
put2.add(Bytes.toBytes("colfam1"), Bytes.toBytes("qual2"),
  Bytes.toBytes("val2"));

boolean res3 = table.checkAndPut(Bytes.toBytes("row1"),
  Bytes.toBytes("colfam1"), Bytes.toBytes("qual1"),
  Bytes.toBytes("val1"), put2);
System.out.println("Put applied: " + res3);

Put put3 = new Put(Bytes.toBytes("row2"));
put3.add(Bytes.toBytes("colfam1"), Bytes.toBytes("qual1"),
  Bytes.toBytes("val3"));

boolean res4 = table.checkAndPut(Bytes.toBytes("row1"),
  Bytes.toBytes("colfam1"), Bytes.toBytes("qual1"),
  Bytes.toBytes("val1"), put3);
System.out.println("Put applied: " + res4);
Check if the column does not exist and perform an optional put operation.
Print out the result; it should be “Put applied: false”, as the column now already exists.
Create another Put instance, but using a different column qualifier.
Print out the result; it should be “Put applied: true”, as the checked column already exists.
The last call in the example will throw the following error:
Exception in thread "main" org.apache.hadoop.hbase.DoNotRetryIOException: Action's getRow must match the passed row
The compare-and-set operations provided by HBase rely on checking and modifying the same row! As with other operations, atomicity is guaranteed only for a single row, and this applies to this call as well. Trying to check one row and modify another throws an exception.
Compare-and-set (CAS) operations are very powerful, especially in distributed systems, with even more decoupled client processes. In providing these calls, HBase sets itself apart from other architectures that give no means to reason about concurrent updates performed by multiple, independent clients.
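To make the check-then-apply semantics concrete, here is a minimal sketch that uses only JDK classes. It is not the HBase client API: the CheckAndPutSketch class, its single-row map, and the "family:qualifier" string keys are assumptions made for this illustration, and the synchronized keyword merely stands in for the per-row atomicity that the server provides.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// In-memory model of a single row; NOT the HBase client API.
final class CheckAndPutSketch {
    // Column name ("family:qualifier") mapped to its current value.
    private final Map<String, byte[]> row = new HashMap<>();

    // Compares the checked column with 'expected' (null means the column
    // must not exist) and stores the new value only when the check matches.
    synchronized boolean checkAndPut(String checkColumn, byte[] expected,
                                     String putColumn, byte[] newValue) {
        byte[] current = row.get(checkColumn);
        boolean matches = (expected == null)
            ? current == null
            : Arrays.equals(expected, current);
        if (matches) {
            row.put(putColumn, newValue);
        }
        return matches;
    }

    public static void main(String[] args) {
        CheckAndPutSketch r = new CheckAndPutSketch();
        byte[] val1 = "val1".getBytes(StandardCharsets.UTF_8);
        byte[] val2 = "val2".getBytes(StandardCharsets.UTF_8);
        // Column is absent, so the guarded put is applied.
        System.out.println("Put applied: "
            + r.checkAndPut("colfam1:qual1", null, "colfam1:qual1", val1));
        // Column exists now, so the same guarded put is rejected.
        System.out.println("Put applied: "
            + r.checkAndPut("colfam1:qual1", null, "colfam1:qual1", val1));
        // The checked column holds val1, so a different column may be written.
        System.out.println("Put applied: "
            + r.checkAndPut("colfam1:qual1", val1, "colfam1:qual2", val2));
    }
}
```

The three calls in main() mirror the first three calls of Example 3-7 and yield true, false, and true in that order.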
The next step in a client API is to retrieve what was just saved. For that, HTable provides the get() call and matching classes. The operations are split into those that operate on a single row and those that retrieve multiple rows in one call.
First, the method that is used to retrieve specific values from an HBase table:
Result get(Get get) throws IOException
Similar to the Put class for the put() call, there is a matching Get class used by the aforementioned get() function. As another similarity, you will have to provide a row key when creating an instance of Get, using one of these constructors:
Get(byte[] row)
Get(byte[] row, RowLock rowLock)
A get() operation is bound to one specific row, but can retrieve any number of columns and/or cells contained therein.
Each constructor takes a row parameter specifying the row you want to access, while the second constructor adds an optional rowLock parameter, allowing you to hand in your own locks. And, similar to the put operations, you have methods to specify rather broad criteria to find what you are looking for, or to specify everything down to exact coordinates for a single cell:
Get addFamily(byte[] family)
Get addColumn(byte[] family, byte[] qualifier)
Get setTimeRange(long minStamp, long maxStamp) throws IOException
Get setTimeStamp(long timestamp)
Get setMaxVersions()
Get setMaxVersions(int maxVersions) throws IOException
The addFamily() call narrows the request down to the given column family. It can be called multiple times to add more than one family. The same is true for the addColumn() call. Here you can add an even narrower address space: the specific column. Then there are methods that let you set the exact timestamp you are looking for, or a time range to match those cells that fall inside it.
Lastly, there are methods that allow you to specify how many versions you want to retrieve, given that you have not set an exact timestamp. By default, this is set to 1, meaning that the get() call returns the most current match only. If you are in doubt, use getMaxVersions() to check what it is set to. The setMaxVersions() without a parameter sets the number of versions to return to Integer.MAX_VALUE, which is also the maximum number of versions you can configure in the column family descriptor, and therefore tells the API to return every available version of all matching cells (in other words, up to what is set at the column family level).
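The interplay of the time range and the maximum number of versions can be sketched with a sorted map standing in for the versions of one column. This is an in-memory illustration only: the VersionSelectionSketch class and its select() method are hypothetical, and treating the range as half-open (minimum inclusive, maximum exclusive) is an assumption you should verify against your HBase version.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

// In-memory model of version selection for one column; NOT the HBase API.
final class VersionSelectionSketch {
    // Returns the values, newest first, whose timestamp lies in
    // [minStamp, maxStamp), limited to at most maxVersions entries.
    static List<String> select(NavigableMap<Long, String> versions,
                               long minStamp, long maxStamp, int maxVersions) {
        List<String> out = new ArrayList<>();
        for (Map.Entry<Long, String> e : versions.descendingMap().entrySet()) {
            if (out.size() >= maxVersions) {
                break; // enough versions collected
            }
            long ts = e.getKey();
            if (ts >= minStamp && ts < maxStamp) {
                out.add(e.getValue());
            }
        }
        return out;
    }

    public static void main(String[] args) {
        NavigableMap<Long, String> versions = new TreeMap<>();
        versions.put(1L, "val1");
        versions.put(2L, "val2");
        versions.put(3L, "val3");
        // Default behavior: only the newest version.
        System.out.println(select(versions, 0L, Long.MAX_VALUE, 1));
        // Comparable to setMaxVersions() without a parameter.
        System.out.println(select(versions, 0L, Long.MAX_VALUE, Integer.MAX_VALUE));
        // Comparable to asking for exactly timestamp 2.
        System.out.println(select(versions, 2L, 3L, Integer.MAX_VALUE));
    }
}
```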
The Get class provides additional calls, which are listed in Table 3-4 for your perusal.
Method | Description |
getRow() | Returns the row key as specified when creating the Get instance. |
getRowLock() | Returns the RowLock instance for the current Get instance. |
getLockId() | Returns the optional lock ID handed into the constructor using the rowLock parameter. Will be -1L if not set. |
getTimeRange() | Retrieves the associated timestamp or time range of the Get instance. Note that there is no getTimeStamp() since the API converts a value assigned with setTimeStamp() into a TimeRange instance internally, setting the minimum and maximum values to the given timestamp. |
setFilter()/getFilter() | Special filter instances can be used to select certain columns or cells, based on a wide variety of conditions. You can get and set them with these methods. See Filters for details. |
setCacheBlocks()/getCacheBlocks() | Each HBase region server has a block cache that efficiently retains recently accessed data for subsequent reads of contiguous information. In some cases it is better to not engage the cache to avoid too much churn when doing completely random gets. These methods give you control over this feature. |
numFamilies() | Convenience method to retrieve the size of the family map, containing the families added using the addFamily() or addColumn() calls. |
hasFamilies() | Another helper to check if a family (or column) has been added to the current instance of the Get class. |
familySet()/getFamilyMap() | These methods give you access to the column families and specific columns, as added by the addFamily() and/or addColumn() calls. The family map is a map where the key is the family name and the value a list of added column qualifiers for this particular family. The familySet() returns the Set of all stored families, i.e., a set containing only the family names. |
The getters listed in Table 3-4 for the Get class only retrieve what you have set beforehand. They are rarely used, and make sense only when you, for example, prepare a Get instance in a private method in your code, and inspect the values in another place.
As mentioned earlier, HBase provides us with a helper class named Bytes that has many static methods to convert Java types into byte[] arrays. It also can do the same in reverse: as you are retrieving data from HBase (for example, one of the rows stored previously) you can make use of these helper functions to convert the byte[] data back into Java types. Here is a short list of what it offers, continued from the earlier discussion:
static String toString(byte[] b)
static boolean toBoolean(byte[] b)
static long toLong(byte[] bytes)
static float toFloat(byte[] bytes)
static int toInt(byte[] bytes)
...
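If you are curious what such conversions boil down to, the following sketch performs the same kind of round trip with plain JDK classes. The method names are made up for this illustration, and the encodings chosen here (UTF-8 for strings, eight big-endian bytes for a long) are assumptions about what Bytes does; in client code you would simply call the Bytes methods listed above.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// JDK-only stand-in for a few conversions; NOT the HBase Bytes class.
final class BytesRoundTripSketch {
    static byte[] stringToBytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    static String bytesToString(byte[] b) {
        return new String(b, StandardCharsets.UTF_8);
    }

    // ByteBuffer writes big-endian by default, most significant byte first.
    static byte[] longToBytes(long value) {
        return ByteBuffer.allocate(Long.BYTES).putLong(value).array();
    }

    static long bytesToLong(byte[] b) {
        return ByteBuffer.wrap(b).getLong();
    }

    public static void main(String[] args) {
        byte[] raw = stringToBytes("val1");
        System.out.println("String round trip: " + bytesToString(raw));
        byte[] counter = longToBytes(42L);
        System.out.println("Long round trip: " + bytesToLong(counter)
            + " (" + counter.length + " bytes)");
    }
}
```

One practical consequence of a fixed-width, big-endian encoding is that non-negative numbers sort in numeric order when compared byte by byte, which matters once such values end up in row keys or qualifiers.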
Example 3-8 shows how this is all put together.
// Connect to the table using the default client configuration.
Configuration conf = HBaseConfiguration.create();
HTable table = new HTable(conf, "testtable");
// Ask for a single column of row1.
Get get = new Get(Bytes.toBytes("row1"));
get.addColumn(Bytes.toBytes("colfam1"), Bytes.toBytes("qual1"));
Result result = table.get(get);
// Extract the value and convert it back into a String.
byte[] val = result.getValue(Bytes.toBytes("colfam1"),
  Bytes.toBytes("qual1"));
System.out.println("Value: " + Bytes.toString(val));
If you are running this example after, say, Example 3-2, you should get this as the output:
Value: val1
The output is not very spectacular, but it shows that the basic operation works. The example also only adds the specific column to retrieve, relying on the default of returning at most one version. The call to get() returns an instance of the Result class, which you will learn about next.
When you retrieve data using the get() calls, you receive an instance of the Result class that contains all the matching cells. It provides you with the means to access everything that was returned from the server for the given row and matching the specified query, such as column family, column qualifier, timestamp, and so on.
There are utility methods you can use to ask for specific results, just as Example 3-8 did earlier, using more concrete dimensions. If you have, for example, asked the server to return all columns of one specific column family, you can now ask for specific columns within that family. In other words, you need to call get() with just enough concrete information to be able to process the matching data on the client side. The functions provided are:
byte[] getValue(byte[] family, byte[] qualifier)
byte[] value()
byte[] getRow()
int size()
boolean isEmpty()
KeyValue[] raw()
List<KeyValue> list()
The getValue() call allows you to get the data for a specific cell stored in HBase. As you cannot specify what timestamp (in other words, version) you want, you get the newest one. The value() call makes this even easier by returning the data for the newest cell in the first column found. Since columns are also sorted lexicographically on the server, this would return the value of the column with the column name (including family and qualifier) sorted first.
You saw getRow() before: it returns the row key, as specified when creating the current instance of the Get class. size() returns the number of KeyValue instances the server has returned. You may use this call, or isEmpty(), which returns true when size() is zero, to check in your own client code if the retrieval call returned any matches.
Access to the raw, low-level KeyValue instances is provided by the raw() method, returning the array of KeyValue instances backing the current Result instance. The list() call simply converts the array returned by raw() into a List instance, giving you convenience by providing iterator access, for example. The created list is backed by the original array of KeyValue instances.
The array returned by raw() is already lexicographically sorted, taking the full coordinates of the KeyValue instances into account. So it is sorted first by column family, then within each family by qualifier, then by timestamp (newest first), and finally by type.
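The sort order just described can be expressed as a comparator. The following is a simplified sketch, not the comparator HBase uses: the Cell class is hypothetical, plain String comparison stands in for byte-wise comparison (equivalent only for ASCII names), the type dimension is left out, and timestamps are ordered newest first.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Simplified model of cell ordering within one row; NOT the HBase KeyValue.
final class CellOrderSketch {
    static final class Cell {
        final String family;
        final String qualifier;
        final long timestamp;

        Cell(String family, String qualifier, long timestamp) {
            this.family = family;
            this.qualifier = qualifier;
            this.timestamp = timestamp;
        }

        @Override
        public String toString() {
            return family + ":" + qualifier + "/" + timestamp;
        }
    }

    // Family ascending, then qualifier ascending, then timestamp descending.
    static final Comparator<Cell> ORDER = (a, b) -> {
        int c = a.family.compareTo(b.family);
        if (c != 0) {
            return c;
        }
        c = a.qualifier.compareTo(b.qualifier);
        if (c != 0) {
            return c;
        }
        return Long.compare(b.timestamp, a.timestamp); // newest first
    };

    public static void main(String[] args) {
        List<Cell> cells = new ArrayList<>();
        cells.add(new Cell("colfam2", "qual1", 1L));
        cells.add(new Cell("colfam1", "qual2", 3L));
        cells.add(new Cell("colfam1", "qual1", 1L));
        cells.add(new Cell("colfam1", "qual1", 2L));
        cells.sort(ORDER);
        System.out.println(cells);
    }
}
```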
Another set of accessors is provided which are more column-oriented:
List<KeyValue> getColumn(byte[] family, byte[] qualifier)
KeyValue getColumnLatest(byte[] family, byte[] qualifier)
boolean containsColumn(byte[] family, byte[] qualifier)
Here you ask for multiple values of a specific column, which solves the issue pointed out earlier, that is, how to get multiple versions of a given column. The number returned is bound by the maximum number of versions you specified when configuring the Get instance before the call to get(), with the default being 1. In other words, the returned list contains zero entries (in case the column has no value for the given row) or one entry, which is the newest version of the value. If you have specified a value greater than the default of 1 version to be returned, it could be any number, up to the specified maximum.
The getColumnLatest() method returns the newest cell of the specified column, but in contrast to getValue(), it does not return the raw byte array of the value but the full KeyValue instance instead. This may be useful when you need more than just the data. The containsColumn() call is a convenience method to check if there was any cell returned in the specified column.
These methods all allow the qualifier to be left unspecified, by setting it to null, and therefore match the special column with no name.
Using no qualifier means that there is no label to the column. When looking at the table from, for example, the HBase Shell, you need to know what it contains. A rare case where you might want to consider using the empty qualifier is in column families that only ever contain a single column. Then the family name might indicate its purpose.
There is a third set of methods that provide access to the returned data from the get request. These are map-oriented and look like this:
NavigableMap<byte[], NavigableMap<byte[], NavigableMap<Long, byte[]>>> getMap()
NavigableMap<byte[], NavigableMap<byte[], byte[]>> getNoVersionMap()
NavigableMap<byte[], byte[]> getFamilyMap(byte[] family)
The most generic call, named getMap(), returns the entire result set in a Java Map class instance that you can iterate over to access all the values. The getNoVersionMap() does the same while only including the latest cell for each column. Finally, the getFamilyMap() lets you select the data for a specific column family only; as its return type indicates, it maps each qualifier of that family to a single value.
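The nesting of these maps is easier to picture with a small example. The sketch below builds a comparable structure from JDK classes; it is not the Result class itself. The ResultMapSketch name, the use of String keys instead of byte[], and the newest-first ordering of the innermost map are assumptions made for illustration.

```java
import java.util.Comparator;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

// In-memory illustration of the nested map layout; NOT the HBase Result class.
final class ResultMapSketch {
    // family -> qualifier -> timestamp (newest first) -> value
    final NavigableMap<String, NavigableMap<String, NavigableMap<Long, String>>> full =
        new TreeMap<>();

    void add(String family, String qualifier, long timestamp, String value) {
        full.computeIfAbsent(family,
                f -> new TreeMap<String, NavigableMap<Long, String>>())
            .computeIfAbsent(qualifier,
                q -> new TreeMap<Long, String>(Comparator.<Long>reverseOrder()))
            .put(timestamp, value);
    }

    // Keeps only the newest value per column, similar to a "no version" view.
    NavigableMap<String, NavigableMap<String, String>> latestOnly() {
        NavigableMap<String, NavigableMap<String, String>> out = new TreeMap<>();
        for (Map.Entry<String, NavigableMap<String, NavigableMap<Long, String>>> fam
                : full.entrySet()) {
            NavigableMap<String, String> columns = new TreeMap<>();
            for (Map.Entry<String, NavigableMap<Long, String>> col
                    : fam.getValue().entrySet()) {
                // The first entry is the newest because of the reversed order.
                columns.put(col.getKey(), col.getValue().firstEntry().getValue());
            }
            out.put(fam.getKey(), columns);
        }
        return out;
    }

    public static void main(String[] args) {
        ResultMapSketch r = new ResultMapSketch();
        r.add("colfam1", "qual1", 1L, "val1");
        r.add("colfam1", "qual1", 2L, "val2");
        r.add("colfam1", "qual2", 3L, "val3");
        System.out.println(r.full);
        System.out.println(r.latestOnly());
    }
}
```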
Use whichever access method of Result matches your access pattern; the data has already been moved across the network from the server to your client process, so it is not incurring any other performance or resource penalties.
Another similarity to the put() calls is that you can ask for more than one row using a single request. This allows you to quickly and efficiently retrieve related (but also completely random, if required) data from the remote servers.
As shown in Figure 3-1, the request may actually go to more than one server, but for all intents and purposes, it looks like a single call from the client code.
The method provided by the API has the following signature:
Result[] get(List<Get> gets) throws IOException
Using this call is straightforward, with the same approach as seen earlier: you need to create a list that holds all instances of the Get class you have prepared. This list is handed into the call and you will be returned an array of equal size holding the matching Result instances. Example 3-9 brings this together, showing two different approaches to accessing the data.
byte[] cf1 = Bytes.toBytes("colfam1");
byte[] qf1 = Bytes.toBytes("qual1");
byte[] qf2 = Bytes.toBytes("qual2");
byte[] row1 = Bytes.toBytes("row1");
byte[] row2 = Bytes.toBytes("row2");
// Prepare a list that holds the Get instances.
List<Get> gets = new ArrayList<Get>();
Get get1 = new Get(row1);
get1.addColumn(cf1, qf1);
gets.add(get1);
Get get2 = new Get(row2);
get2.addColumn(cf1, qf1);
gets.add(get2);
Get get3 = new Get(row2);
get3.addColumn(cf1, qf2);
gets.add(get3);
// Retrieve the selected rows and columns with one call.
Result[] results = table.get(gets);
// First approach: check for each column and fetch its value.
System.out.println("First iteration...");
for (Result result : results) {
  String row = Bytes.toString(result.getRow());
  System.out.print("Row: " + row + " ");
  byte[] val = null;
  if (result.containsColumn(cf1, qf1)) {
    val = result.getValue(cf1, qf1);
    System.out.println("Value: " + Bytes.toString(val));
  }
  if (result.containsColumn(cf1, qf2)) {
    val = result.getValue(cf1, qf2);
    System.out.println("Value: " + Bytes.toString(val));
  }
}
// Second approach: iterate over the raw KeyValue instances.
System.out.println("Second iteration...");
for (Result result : results) {
  for (KeyValue kv : result.raw()) {
    System.out.println("Row: " + Bytes.toString(kv.getRow()) +
      " Value: " + Bytes.toString(kv.getValue()));
  }
}
Assuming that you execute Example 3-4 just before you run Example 3-9, you should see something like this on the command line:
First iteration...
Row: row1 Value: val1
Row: row2 Value: val2
Row: row2 Value: val3
Second iteration...
Row: row1 Value: val1
Row: row2 Value: val2
Row: row2 Value: val3
Both iterations return the same values, showing that you have a number of choices on how to access them, once you have received the results. What you have not yet seen is how errors are reported back to you. This differs from what you learned in List of Puts. The get() call either returns the said array, which has the same size as the list handed in as the gets parameter, or throws an exception. Example 3-10 showcases this behavior.
List<Get> gets = new ArrayList<Get>();
Get get1 = new Get(row1);
get1.addColumn(cf1, qf1);
gets.add(get1);
Get get2 = new Get(row2);
get2.addColumn(cf1, qf1);
gets.add(get2);
Get get3 = new Get(row2);
get3.addColumn(cf1, qf2);
gets.add(get3);
// Add a Get that references a nonexistent column family.
Get get4 = new Get(row2);
get4.addColumn(Bytes.toBytes("BOGUS"), qf2);
gets.add(get4);
// This call throws an exception, so the next line is never reached.
Result[] results = table.get(gets);
System.out.println("Result count: " + results.length);
Executing this example will abort the entire get() operation, throwing the following (or similar) error, and not returning a result at all:
org.apache.hadoop.hbase.client.RetriesExhaustedWithDetailsException: Failed 1 action: NoSuchColumnFamilyException: 1 time, servers with issues: 10.0.0.57:51640,
One way to have more control over how the API handles partial faults is to use the batch() operations discussed in Batch Operations.
There are a few more calls that you can use from your code to retrieve or check your stored data. The first is:
boolean exists(Get get) throws IOException
You can set up a Get instance, just like you do when using the get() calls of HTable. Instead of having to retrieve the data from the remote servers to verify that it actually exists, you can employ this call, which only returns a boolean flag indicating that same fact.
Using exists() involves the same lookup semantics on the region servers, including loading file blocks to check if a row or column actually exists. You only avoid shipping the data over the network, but that is very useful if you are checking very large columns, or do so very frequently.
Sometimes it might be necessary to find a specific row, or the one just before the requested row, when retrieving data. The following call can help you find a row using these semantics:
Result getRowOrBefore(byte[] row, byte[] family) throws IOException
You need to specify the row you are looking for, and a column family. The latter is required because, in HBase, which is a column-oriented database, there is no row if there are no columns. Specifying a family name tells the servers to check if the row searched for has any values in a column contained in the given family.
Be careful to specify an existing column family name when using the getRowOrBefore() method, or you will get a Java NullPointerException back from the server. This is caused by the server trying to access a nonexistent storage file.
The returned instance of the Result class can be used to retrieve the found row key. This should be either the exact row you were asking for, or the one preceding it. If there is no match at all, the call returns null. Example 3-11 uses the call to find the rows you created using the put examples earlier.
// Look for a row that exists.
Result result1 = table.getRowOrBefore(Bytes.toBytes("row1"),
  Bytes.toBytes("colfam1"));
System.out.println("Found: " + Bytes.toString(result1.getRow()));
// Look for a row past the last stored one.
Result result2 = table.getRowOrBefore(Bytes.toBytes("row99"),
  Bytes.toBytes("colfam1"));
System.out.println("Found: " + Bytes.toString(result2.getRow()));
// Print the columns returned along with the matching row.
for (KeyValue kv : result2.raw()) {
  System.out.println(" Col: " + Bytes.toString(kv.getFamily()) +
    "/" + Bytes.toString(kv.getQualifier()) +
    ", Value: " + Bytes.toString(kv.getValue()));
}
// Look for a row that sorts before all stored rows.
Result result3 = table.getRowOrBefore(Bytes.toBytes("abc"),
  Bytes.toBytes("colfam1"));
System.out.println("Found: " + result3);
Assuming you ran Example 3-4 just before this code, you should see output similar or equal to the following:
Found: row1
Found: row2
 Col: colfam1/qual1, Value: val2
 Col: colfam1/qual2, Value: val3
Found: null
The first call tries to find a matching row and succeeds. The second call uses a large number postfix to find the last stored row starting with the prefix row. It finds row2 accordingly. Lastly, the example tries to find row abc, which sorts before the rows the put example added using the row prefix, and therefore neither exists nor has any preceding row keys. The returned result is then null and indicates the missed lookup.
What is interesting is the loop to print out the data that was returned along with the matching row. You can see from the preceding code that all columns of the specified column family were returned, including their latest values. You could use this call to quickly retrieve all the latest values from an entire column family (in other words, all columns contained in the given column family) based on a specific sorting pattern. For example, assume our put() example, which is using row as the prefix for all keys. Calling getRowOrBefore() with a row set to row999999999 will always return the row that is, based on the lexicographical sorting, placed at the end of the table.
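The "exact row, or the one just before it" lookup behaves much like a floor operation on a sorted set. Here is a minimal sketch of that idea using a JDK NavigableSet of row keys; the RowOrBeforeSketch class is hypothetical, String keys stand in for byte[] row keys, and the column family check is left out.

```java
import java.util.Arrays;
import java.util.NavigableSet;
import java.util.TreeSet;

// In-memory model of the lookup semantics; NOT the HBase client API.
final class RowOrBeforeSketch {
    // Returns the given row if present, otherwise the closest row that sorts
    // before it, or null if no such row exists.
    static String rowOrBefore(NavigableSet<String> rows, String row) {
        return rows.floor(row);
    }

    public static void main(String[] args) {
        NavigableSet<String> rows = new TreeSet<>(Arrays.asList("row1", "row2"));
        System.out.println("Found: " + rowOrBefore(rows, "row1"));
        System.out.println("Found: " + rowOrBefore(rows, "row99"));
        System.out.println("Found: " + rowOrBefore(rows, "abc"));
        // Lexicographic order: "row10" sorts between "row1" and "row2".
        System.out.println("Found: " + rowOrBefore(rows, "row10"));
    }
}
```

The last lookup is a reminder that keys are compared character by character, not numerically, so row10 resolves to row1 here rather than to row2.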
You are now able to create, read, and update data in HBase tables. What is left is the ability to delete from it. And surely you may have guessed by now that the HTable provides you with a method of exactly that name, along with a matching class aptly named Delete.
The variant of the delete() call that takes a single Delete instance is:
void delete(Delete delete) throws IOException
Just as with the get() and put() calls you saw already, you will have to create a Delete instance and then add details about the data you want to remove. The constructors are:
Delete(byte[] row)
Delete(byte[] row, long timestamp, RowLock rowLock)
You need to provide the row you want to modify, and can optionally provide a timestamp as well as a rowLock, an instance of RowLock to specify your own lock details, in case you want to modify the same row more than once subsequently. Unless you intend to remove the entire row, you would be wise to narrow down what you want to remove from the given row, using one of the following methods:
Delete deleteFamily(byte[] family)
Delete deleteFamily(byte[] family, long timestamp)
Delete deleteColumns(byte[] family, byte[] qualifier)
Delete deleteColumns(byte[] family, byte[] qualifier, long timestamp)
Delete deleteColumn(byte[] family, byte[] qualifier)
Delete deleteColumn(byte[] family, byte[] qualifier, long timestamp)
void setTimestamp(long timestamp)
You have a choice of four types of calls to narrow in on what to remove. First, you can use the deleteFamily() methods to remove an entire column family, including all contained columns. You have the option to specify a timestamp that triggers more specific filtering of cell versions. If specified, the timestamp matches the same and all older versions of all columns.
The next type is deleteColumns(), which operates on exactly one column and deletes either all versions of that column when no timestamp is given, or all matching and older versions when a timestamp is specified.
The third type is similar, using deleteColumn(). It also operates on a specific, given column only, but deletes either the most current or the specified version, that is, the one with the matching timestamp.
Finally, there is setTimestamp(), which is not considered when using any of the other three types of calls. But if you do not specify either a family or a column, this call can make the difference between deleting the entire row or just all contained columns, in all column families, that match or have an older timestamp compared to the given one. Table 3-5 shows the functionality in a matrix to make the semantics more readable.
Method | Deletes without timestamp | Deletes with timestamp |
none | Entire row, i.e., all columns, all versions. | All versions of all columns in all column families, whose timestamp is equal to or older than the given timestamp. |
deleteColumn() | Only the latest version of the given column; older versions are kept. | Only exactly the specified version of the given column, with the matching timestamp. If nonexistent, nothing is deleted. |
deleteColumns() | All versions of the given column. | Versions equal to or older than the given timestamp of the given column. |
deleteFamily() | All columns (including all versions) of the given family. | Versions equal to or older than the given timestamp of all columns of the given family. |
The Delete class provides additional calls, which are listed in Table 3-6 for your reference.
Example 3-12 shows how to use the single delete() call from client code.
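// Create a Delete for a specific row.
Delete delete = new Delete(Bytes.toBytes("row1"));
// Set a timestamp for row-level deletes.
delete.setTimestamp(1);
// Delete one specific version of a column.
delete.deleteColumn(Bytes.toBytes("colfam1"), Bytes.toBytes("qual1"), 1);
// Delete all versions of a column.
delete.deleteColumns(Bytes.toBytes("colfam2"), Bytes.toBytes("qual1"));
// Delete the given and all older versions of a column.
delete.deleteColumns(Bytes.toBytes("colfam2"), Bytes.toBytes("qual3"), 15);
// Delete an entire family, all columns and versions.
delete.deleteFamily(Bytes.toBytes("colfam3"));
// Delete the given and all older versions in an entire family.
delete.deleteFamily(Bytes.toBytes("colfam3"), 3);
// Send the delete to the server and release the table.
table.delete(delete);
table.close();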
The example lists all the different calls you can use to parameterize the delete() operation. It does not make too much sense to call them all one after another like this. Feel free to comment out the various delete calls to see what is printed on the console.
Setting the timestamp for the deletes has the effect of only matching the exact cell, that is, the matching column and value with the exact timestamp. On the other hand, not setting the timestamp forces the server to retrieve the latest timestamp on the server side on your behalf. This is slower than performing a delete with an explicit timestamp.
If you attempt to delete a cell with a timestamp that does not exist, nothing happens. For example, given that you have two versions of a column, one at version 10 and one at version 20, calling deleteColumn() on this column with version 15 will not affect either existing version.
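The difference between the exact-match and the "this and older" semantics is easy to mix up, so here is a small in-memory sketch of the two column-level variants as summarized in Table 3-5. It models the versions of one column as a sorted map and is not how HBase implements deletes internally; the class and method names are hypothetical, and a null timestamp stands in for calling the variant without a timestamp.

```java
import java.util.NavigableMap;
import java.util.TreeMap;

// In-memory model of column-level delete semantics; NOT the HBase API.
final class DeleteSemanticsSketch {
    // Like deleteColumn(): removes exactly the given version, or only the
    // newest version when no timestamp is given.
    static void deleteColumn(NavigableMap<Long, String> versions, Long timestamp) {
        if (versions.isEmpty()) {
            return;
        }
        versions.remove(timestamp == null ? versions.lastKey() : timestamp);
    }

    // Like deleteColumns(): removes the given and all older versions, or all
    // versions when no timestamp is given.
    static void deleteColumns(NavigableMap<Long, String> versions, Long timestamp) {
        if (timestamp == null) {
            versions.clear();
        } else {
            versions.headMap(timestamp, true).clear();
        }
    }

    public static void main(String[] args) {
        NavigableMap<Long, String> versions = new TreeMap<>();
        versions.put(10L, "val10");
        versions.put(20L, "val20");
        deleteColumn(versions, 15L);   // no version 15 exists: nothing happens
        System.out.println(versions.keySet());
        deleteColumns(versions, 15L);  // removes version 10, keeps version 20
        System.out.println(versions.keySet());
    }
}
```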
Another note to be made about the example is that it showcases custom versioning. Instead of relying on timestamps, implicit or explicit ones, it uses sequential numbers, starting with 1. This is perfectly valid, although you are forced to always set the version yourself, since the servers do not know about your schema and would use epoch-based timestamps instead.
As of this writing, using custom versioning is not recommended. It will very likely work, but is not tested very well. Make sure you carefully evaluate your options before using this technique.
Another example of using custom versioning can be found in Search Integration.
The list-based delete() call works very similarly to the list-based put(). You need to create a list of Delete instances, configure them, and call the following method:
void delete(List<Delete> deletes) throws IOException
Example 3-13 shows where three different rows are affected during the operation, deleting various details they contain. When you run this example, you will see a printout of the before and after states of the delete. The output is printing the raw KeyValue instances, using KeyValue.toString().
Just as with the other list-based operation, you cannot make any assumption regarding the order in which the deletes are applied on the remote servers. The API is free to reorder them to make efficient use of the single RPC per affected region server. If you need to enforce specific orders of how operations are applied, you would need to batch those calls into smaller groups and ensure that they contain the operations in the desired order across the batches. In a worst-case scenario, you would need to send separate delete calls altogether.
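One possible way to form such smaller groups is sketched below, under the assumption that the only ordering you care about is between operations on the same row. The helper is hypothetical and works on plain row keys: it cuts the ordered list into consecutive batches so that no batch touches the same row twice. Each batch would then be handed to its own list-based call, one after the other.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical batching helper; it only groups row keys and makes no HBase calls.
final class OrderedBatchSketch {
    // Splits the rows, given in the desired order, into consecutive batches
    // in which every row occurs at most once.
    static List<List<String>> split(List<String> rowsInOrder) {
        List<List<String>> batches = new ArrayList<>();
        List<String> current = new ArrayList<>();
        Set<String> seen = new HashSet<>();
        for (String row : rowsInOrder) {
            if (!seen.add(row)) {
                // The row is already part of this batch: start a new one.
                batches.add(current);
                current = new ArrayList<>();
                seen.clear();
                seen.add(row);
            }
            current.add(row);
        }
        if (!current.isEmpty()) {
            batches.add(current);
        }
        return batches;
    }

    public static void main(String[] args) {
        List<String> rows = Arrays.asList("row1", "row2", "row1", "row3", "row2");
        System.out.println(split(rows));
    }
}
```

If every operation targets the same row, this degenerates into one batch per operation, which corresponds to the worst case of sending separate calls.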
List<Delete> deletes = new ArrayList<Delete>();
// Row-level delete on row1 with a timestamp.
Delete delete1 = new Delete(Bytes.toBytes("row1"));
delete1.setTimestamp(4);
deletes.add(delete1);
// Column-level deletes on row2.
Delete delete2 = new Delete(Bytes.toBytes("row2"));
delete2.deleteColumn(Bytes.toBytes("colfam1"), Bytes.toBytes("qual1"));
delete2.deleteColumns(Bytes.toBytes("colfam2"), Bytes.toBytes("qual3"), 5);
deletes.add(delete2);
// Family-level deletes on row3.
Delete delete3 = new Delete(Bytes.toBytes("row3"));
delete3.deleteFamily(Bytes.toBytes("colfam1"));
delete3.deleteFamily(Bytes.toBytes("colfam2"), 3);
deletes.add(delete3);
// Send all deletes in one call.
table.delete(deletes);
table.close();
The output you should see is:[54]
Before delete call... KV: row1/colfam1:qual1/2/Put/vlen=4, Value: val2 KV: row1/colfam1:qual1/1/Put/vlen=4, Value: val1 KV: row1/colfam1:qual2/4/Put/vlen=4, Value: val4 KV: row1/colfam1:qual2/3/Put/vlen=4, Value: val3 KV: row1/colfam1:qual3/6/Put/vlen=4, Value: val6 KV: row1/colfam1:qual3/5/Put/vlen=4, Value: val5 KV: row1/colfam2:qual1/2/Put/vlen=4, Value: val2 KV: row1/colfam2:qual1/1/Put/vlen=4, Value: val1 KV: row1/colfam2:qual2/4/Put/vlen=4, Value: val4 KV: row1/colfam2:qual2/3/Put/vlen=4, Value: val3 KV: row1/colfam2:qual3/6/Put/vlen=4, Value: val6 KV: row1/colfam2:qual3/5/Put/vlen=4, Value: val5 KV: row2/colfam1:qual1/2/Put/vlen=4, Value: val2 KV: row2/colfam1:qual1/1/Put/vlen=4, Value: val1 KV: row2/colfam1:qual2/4/Put/vlen=4, Value: val4 KV: row2/colfam1:qual2/3/Put/vlen=4, Value: val3 KV: row2/colfam1:qual3/6/Put/vlen=4, Value: val6 KV: row2/colfam1:qual3/5/Put/vlen=4, Value: val5 KV: row2/colfam2:qual1/2/Put/vlen=4, Value: val2 KV: row2/colfam2:qual1/1/Put/vlen=4, Value: val1 KV: row2/colfam2:qual2/4/Put/vlen=4, Value: val4 KV: row2/colfam2:qual2/3/Put/vlen=4, Value: val3 KV: row2/colfam2:qual3/6/Put/vlen=4, Value: val6 KV: row2/colfam2:qual3/5/Put/vlen=4, Value: val5 KV: row3/colfam1:qual1/2/Put/vlen=4, Value: val2 KV: row3/colfam1:qual1/1/Put/vlen=4, Value: val1 KV: row3/colfam1:qual2/4/Put/vlen=4, Value: val4 KV: row3/colfam1:qual2/3/Put/vlen=4, Value: val3 KV: row3/colfam1:qual3/6/Put/vlen=4, Value: val6 KV: row3/colfam1:qual3/5/Put/vlen=4, Value: val5 KV: row3/colfam2:qual1/2/Put/vlen=4, Value: val2 KV: row3/colfam2:qual1/1/Put/vlen=4, Value: val1 KV: row3/colfam2:qual2/4/Put/vlen=4, Value: val4 KV: row3/colfam2:qual2/3/Put/vlen=4, Value: val3 KV: row3/colfam2:qual3/6/Put/vlen=4, Value: val6 KV: row3/colfam2:qual3/5/Put/vlen=4, Value: val5 After delete call... 
KV: row1/colfam1:qual3/6/Put/vlen=4, Value: val6 KV: row1/colfam1:qual3/5/Put/vlen=4, Value: val5 KV: row1/colfam2:qual3/6/Put/vlen=4, Value: val6 KV: row1/colfam2:qual3/5/Put/vlen=4, Value: val5 KV: row2/colfam1:qual1/1/Put/vlen=4, Value: val1 KV: row2/colfam1:qual2/4/Put/vlen=4, Value: val4 KV: row2/colfam1:qual2/3/Put/vlen=4, Value: val3 KV: row2/colfam1:qual3/6/Put/vlen=4, Value: val6 KV: row2/colfam1:qual3/5/Put/vlen=4, Value: val5 KV: row2/colfam2:qual1/2/Put/vlen=4, Value: val2 KV: row2/colfam2:qual1/1/Put/vlen=4, Value: val1 KV: row2/colfam2:qual2/4/Put/vlen=4, Value: val4 KV: row2/colfam2:qual2/3/Put/vlen=4, Value: val3 KV: row2/colfam2:qual3/6/Put/vlen=4, Value: val6 KV: row3/colfam2:qual2/4/Put/vlen=4, Value: val4 KV: row3/colfam2:qual3/6/Put/vlen=4, Value: val6 KV: row3/colfam2:qual3/5/Put/vlen=4, Value: val5
The deleted original data is highlighted in the Before delete call... block. All three rows contain the same data, composed of two column families, three columns in each family, and two versions for each column.
The example code first deletes, from the entire row1, everything up to version 4. This leaves the columns with versions 5 and 6 as the remainder of the row content.
It then uses the two different column-related delete calls on row2 to remove the newest cell in the column named colfam1:qual1, and subsequently every cell with a version of 5 and older (in other words, those with the same or a lower version number) from colfam2:qual3. Here you have only one matching cell, which is removed as expected.
Lastly, operating on row3, the code removes the entire column family colfam1, and then everything with a version of 3 or less from colfam2. During the execution of the example code, you will see the printed KeyValue details, using something like this:
System.out.println("KV: " + kv.toString() + ", Value: " + Bytes.toString(kv.getValue()))
By now you are familiar with the usage of the Bytes class, which is used to print out the value of the KeyValue instance, as returned by the getValue() method. This is necessary because the KeyValue.toString() output (as explained in The KeyValue class) is not printing out the actual value, but rather the key part only. The toString() does not print the value since it could be very large.
Here, the example code inserts the column values, and therefore knows that these are short and human-readable; hence it is safe to print them out on the console as shown. You could use the same mechanism in your own code for debugging purposes.
Please refer to the entire example code in the accompanying source code repository for this book. You will see how the data is inserted and retrieved to generate the discussed output.
What is left to talk about is the error handling of the list-based delete() call. The handed-in deletes parameter, that is, the list of Delete instances, is modified to only contain the failed delete instances when the call returns. In other words, when everything has succeeded, the list will be empty. The call also throws the exception, if there was one, reported from the remote servers. You will have to guard the call using a try/catch, for example, and react accordingly. Example 3-14 may serve as a starting point.
// Add a delete that references a nonexistent column family.
Delete delete4 = new Delete(Bytes.toBytes("row2"));
delete4.deleteColumn(Bytes.toBytes("BOGUS"), Bytes.toBytes("qual1"));
deletes.add(delete4);
// Guard the call, since the failed delete raises an exception.
try {
  table.delete(deletes);
} catch (Exception e) {
  System.err.println("Error: " + e);
}
table.close();
// Only the failed Delete instances remain in the list.
System.out.println("Deletes length: " + deletes.size());
for (Delete delete : deletes) {
  System.out.println(delete);
}
Example 3-14 modifies Example 3-13 but adds an erroneous delete detail: it inserts a BOGUS column family name. The output is the same as that for Example 3-13, but has some additional details printed out in the middle part:
Before delete call...
KV: row1/colfam1:qual1/2/Put/vlen=4, Value: val2
KV: row1/colfam1:qual1/1/Put/vlen=4, Value: val1
...
KV: row3/colfam2:qual3/6/Put/vlen=4, Value: val6
KV: row3/colfam2:qual3/5/Put/vlen=4, Value: val5
Error: org.apache.hadoop.hbase.client.RetriesExhaustedWithDetailsException: Failed 1 action: NoSuchColumnFamilyException: 1 time, servers with issues: 10.0.0.43:59057,
Deletes length: 1
row=row2, ts=9223372036854775807, families={(family=BOGUS, keyvalues= (row2/BOGUS:qual1/9223372036854775807/Delete/vlen=0)}
After delete call...
KV: row1/colfam1:qual3/6/Put/vlen=4, Value: val6
KV: row1/colfam1:qual3/5/Put/vlen=4, Value: val5
...
KV: row3/colfam2:qual3/6/Put/vlen=4, Value: val6
KV: row3/colfam2:qual3/5/Put/vlen=4, Value: val5
As expected, the list contains one remaining Delete instance: the one with the bogus column family. Printing out the instance (Java uses the implicit toString() method when printing an object) reveals the internal details of the failed delete. The important part is the family name, which is the obvious reason for the failure. You can use this technique in your own code to check why an operation has failed. Often the reasons are rather obvious indeed.
Finally, note the exception that was caught and printed out in the catch statement of the example. It is the same RetriesExhaustedWithDetailsException you saw twice already. It reports the number of failed actions, how often the client retried them, and on which server. An advanced task that you will learn about in later chapters is how to verify and monitor servers, so that the given server address can help you find the root cause of the failure.
You saw in Atomic compare-and-set how to use an atomic, conditional operation to insert data into a table. There is an equivalent call for deletes that gives you access to server-side, read-and-modify functionality:
boolean checkAndDelete(byte[] row, byte[] family, byte[] qualifier, byte[] value, Delete delete) throws IOException
You need to specify the row key, column family, qualifier, and value to check before the actual delete operation is performed. Should the test fail, nothing is deleted and the call returns false. If the check is successful, the delete is applied and true is returned. Example 3-15 shows this in context.
Delete delete1 = new Delete(Bytes.toBytes("row1"));
delete1.deleteColumns(Bytes.toBytes("colfam1"), Bytes.toBytes("qual3"));

boolean res1 = table.checkAndDelete(Bytes.toBytes("row1"),
  Bytes.toBytes("colfam2"), Bytes.toBytes("qual3"), null, delete1);
System.out.println("Delete successful: " + res1);

Delete delete2 = new Delete(Bytes.toBytes("row1"));
delete2.deleteColumns(Bytes.toBytes("colfam2"), Bytes.toBytes("qual3"));
table.delete(delete2);

boolean res2 = table.checkAndDelete(Bytes.toBytes("row1"),
  Bytes.toBytes("colfam2"), Bytes.toBytes("qual3"), null, delete1);
System.out.println("Delete successful: " + res2);

Delete delete3 = new Delete(Bytes.toBytes("row2"));
delete3.deleteFamily(Bytes.toBytes("colfam1"));

try {
  boolean res4 = table.checkAndDelete(Bytes.toBytes("row1"),
    Bytes.toBytes("colfam1"), Bytes.toBytes("qual1"),
    Bytes.toBytes("val1"), delete3);
  System.out.println("Delete successful: " + res4);
} catch (Exception e) {
  System.err.println("Error: " + e);
}
The entire output of the example should look like this:
Before delete call...
KV: row1/colfam1:qual1/2/Put/vlen=4, Value: val2
KV: row1/colfam1:qual1/1/Put/vlen=4, Value: val1
KV: row1/colfam1:qual2/4/Put/vlen=4, Value: val4
KV: row1/colfam1:qual2/3/Put/vlen=4, Value: val3
KV: row1/colfam1:qual3/6/Put/vlen=4, Value: val6
KV: row1/colfam1:qual3/5/Put/vlen=4, Value: val5
KV: row1/colfam2:qual1/2/Put/vlen=4, Value: val2
KV: row1/colfam2:qual1/1/Put/vlen=4, Value: val1
KV: row1/colfam2:qual2/4/Put/vlen=4, Value: val4
KV: row1/colfam2:qual2/3/Put/vlen=4, Value: val3
KV: row1/colfam2:qual3/6/Put/vlen=4, Value: val6
KV: row1/colfam2:qual3/5/Put/vlen=4, Value: val5
Delete successful: false
Delete successful: true
After delete call...
KV: row1/colfam1:qual1/2/Put/vlen=4, Value: val2
KV: row1/colfam1:qual1/1/Put/vlen=4, Value: val1
KV: row1/colfam1:qual2/4/Put/vlen=4, Value: val4
KV: row1/colfam1:qual2/3/Put/vlen=4, Value: val3
KV: row1/colfam2:qual1/2/Put/vlen=4, Value: val2
KV: row1/colfam2:qual1/1/Put/vlen=4, Value: val1
KV: row1/colfam2:qual2/4/Put/vlen=4, Value: val4
KV: row1/colfam2:qual2/3/Put/vlen=4, Value: val3
Error: org.apache.hadoop.hbase.DoNotRetryIOException: org.apache.hadoop.hbase.DoNotRetryIOException: Action's getRow must match the passed row
...
Using null as the value parameter triggers the nonexistence test, that is, the check is successful if the column specified does not exist. Since the example code inserts the checked column before the check is performed, the test will initially fail, returning false and aborting the delete operation.
The column is then deleted by hand and the check-and-modify call is run again. This time the check succeeds and the delete is applied, returning true as the overall result.
Just as with the put-related CAS call, you can only perform the check-and-modify on the same row. The example attempts to check on one row key while the supplied instance of Delete points to another. An exception is thrown accordingly, once the check is performed. It is allowed, though, to check across column families, for example, to have one set of columns control how the filtering is done for another set of columns.
This example cannot do justice to the importance of the check-and-delete operation. In distributed systems, it is inherently difficult to perform such operations reliably without incurring the performance penalties caused by external locking approaches, that is, where atomicity is guaranteed by the client taking out exclusive locks on the entire row. When the client goes away during the locked phase, the server has to rely on lease recovery mechanisms to ensure that these rows are eventually unlocked again. External locks also cause additional RPCs, which will be slower than a single, server-side operation.
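To make the semantics concrete, here is a toy, single-process model of a check-and-delete in plain Java. It is not HBase code: the class CheckAndDeleteSketch and all of its methods are hypothetical, a row is reduced to a map from column name to value, and a synchronized method stands in for the server-side guarantee that nothing can change the row between the check and the delete.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy model of one row with a check-and-delete; hypothetical, not HBase code. */
public class CheckAndDeleteSketch {
  // One "row": column name -> value. Versions and timestamps are ignored.
  private final Map<String, String> row = new HashMap<>();

  public synchronized void put(String column, String value) {
    row.put(column, value);
  }

  public synchronized void delete(String column) {
    row.remove(column);
  }

  public synchronized boolean contains(String column) {
    return row.containsKey(column);
  }

  /**
   * Removes columnToDelete only if checkColumn currently holds expected.
   * A null expected value means "checkColumn must not exist".
   * Check and delete run under the same lock, so no other caller can
   * modify the row in between.
   */
  public synchronized boolean checkAndDelete(String checkColumn,
      String expected, String columnToDelete) {
    String current = row.get(checkColumn);
    boolean matches = (expected == null)
        ? current == null
        : expected.equals(current);
    if (matches) {
      row.remove(columnToDelete);
    }
    return matches;
  }

  public static void main(String[] args) {
    CheckAndDeleteSketch row1 = new CheckAndDeleteSketch();
    row1.put("colfam1:qual3", "val6");
    row1.put("colfam2:qual3", "val6");

    // The checked column exists, so the nonexistence test fails.
    System.out.println("Delete successful: "
        + row1.checkAndDelete("colfam2:qual3", null, "colfam1:qual3"));

    // Remove the checked column by hand and try again.
    row1.delete("colfam2:qual3");
    System.out.println("Delete successful: "
        + row1.checkAndDelete("colfam2:qual3", null, "colfam1:qual3"));
  }
}
```

The sketch prints false and then true, mirroring the first two checks in Example 3-15. A client that tried to get the same effect with a separate get() followed by a delete() would leave a window in which another writer could change the checked column.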
You have seen how you can add, retrieve, and remove data from a table using single or list-based operations. In this section, we will look at API calls to batch different operations across multiple rows.
In fact, a lot of the internal functionality of the list-based calls, such as delete(List<Delete> deletes) or get(List<Get> gets), is based on the batch() call. They are more or less legacy calls and kept for convenience. If you start fresh, it is recommended that you use the batch() calls for all your operations.
The following methods of the client API represent the available batch operations. You may note the introduction of a new class type named Row, which is the ancestor, or parent class, for Put, Get, and Delete.
void batch(List<Row> actions, Object[] results) throws IOException, InterruptedException
Object[] batch(List<Row> actions) throws IOException, InterruptedException
Using the same parent class allows for polymorphic list items, representing any of these three operations. It is equally easy to use these calls, just like the list-based methods you saw earlier. Example 3-16 shows how you can mix the operations and then send them off as one server call.
Be aware that you should not mix a Delete and Put operation for the same row in one batch call. The operations may be applied in a different order than the one you specified, chosen to guarantee the best performance, which can cause unpredictable results. In some cases, you may see fluctuating results due to race conditions.
private final static byte[] ROW1 = Bytes.toBytes("row1");
private final static byte[] ROW2 = Bytes.toBytes("row2");
private final static byte[] COLFAM1 = Bytes.toBytes("colfam1");
private final static byte[] COLFAM2 = Bytes.toBytes("colfam2");
private final static byte[] QUAL1 = Bytes.toBytes("qual1");
private final static byte[] QUAL2 = Bytes.toBytes("qual2");

List<Row> batch = new ArrayList<Row>();

Put put = new Put(ROW2);
put.add(COLFAM2, QUAL1, Bytes.toBytes("val5"));
batch.add(put);

Get get1 = new Get(ROW1);
get1.addColumn(COLFAM1, QUAL1);
batch.add(get1);

Delete delete = new Delete(ROW1);
delete.deleteColumns(COLFAM1, QUAL2);
batch.add(delete);

Get get2 = new Get(ROW2);
get2.addFamily(Bytes.toBytes("BOGUS"));
batch.add(get2);

Object[] results = new Object[batch.size()];
try {
  table.batch(batch, results);
} catch (Exception e) {
  System.err.println("Error: " + e);
}

for (int i = 0; i < results.length; i++) {
  System.out.println("Result[" + i + "]: " + results[i]);
}
You should see the following output on the console:
Before batch call...
KV: row1/colfam1:qual1/1/Put/vlen=4, Value: val1
KV: row1/colfam1:qual2/2/Put/vlen=4, Value: val2
KV: row1/colfam1:qual3/3/Put/vlen=4, Value: val3
Result[0]: keyvalues=NONE
Result[1]: keyvalues={row1/colfam1:qual1/1/Put/vlen=4}
Result[2]: keyvalues=NONE
Result[3]: org.apache.hadoop.hbase.regionserver.NoSuchColumnFamilyException: org.apache.hadoop.hbase.regionserver.NoSuchColumnFamilyException: Column family BOGUS does not exist in ...
After batch call...
KV: row1/colfam1:qual1/1/Put/vlen=4, Value: val1
KV: row1/colfam1:qual3/3/Put/vlen=4, Value: val3
KV: row2/colfam2:qual1/1308836506340/Put/vlen=4, Value: val5
Error: org.apache.hadoop.hbase.client.RetriesExhaustedWithDetailsException: Failed 1 action: NoSuchColumnFamilyException: 1 time, servers with issues: 10.0.0.43:60020,
As with the previous examples, there is some wiring behind the printed lines of code that inserts a test row before executing the batch calls. The content is printed first, then you will see the output from the example code, and finally the dump of the rows after everything else. The deleted column was indeed removed, and the new column was added to the row as expected.
Finding the result of the Get operation requires you to investigate the middle part of the output, that is, the lines printed by the example code. The lines starting with Result[n], with n ranging from 0 to 3, are where you see the outcome of the corresponding operation in the actions parameter. The first operation in the example is a Put, and the result is an empty Result instance, containing no KeyValue instances. This is the general contract of the batch calls; they return a best match result per input action, and the possible types are listed in Table 3-7.
Result | Description |
null | The operation has failed to communicate with the remote server. |
Empty Result | Returned for successful Put and Delete operations. |
Result | Returned for successful Get operations, but may also be empty when there was no matching row or column. |
Throwable | In case the servers return an exception for the operation it is returned to the client as-is. You can use it to check what went wrong and maybe handle the problem automatically in your code. |
Looking further through the returned result array in the console output you can see the empty Result instances printing keyvalues=NONE. The Get call succeeded and found a match, returning the KeyValue instances accordingly. Finally, the operation with the BOGUS column family has the exception for your perusal.
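The outcomes listed in Table 3-7 lend themselves to a simple dispatch over the result array. The following is a minimal sketch in plain Java, assuming only what the table states: each slot is null, a Throwable, or a result object. The class BatchResultSketch and its describe() helper are hypothetical names, and a String stands in for the HBase Result type so that the example runs without a cluster.

```java
public class BatchResultSketch {

  /** Maps one slot of a batch result array to a short description. */
  public static String describe(Object slot) {
    if (slot == null) {
      // No answer at all: communication with the server failed.
      return "no response";
    }
    if (slot instanceof Throwable) {
      // The server reported an exception for this particular action.
      return "failed: " + ((Throwable) slot).getMessage();
    }
    // Anything else is a (possibly empty) result.
    return "ok: " + slot;
  }

  public static void main(String[] args) {
    // Stand-in values; a real array would hold Result instances.
    Object[] results = {
      "keyvalues=NONE",
      null,
      new IllegalArgumentException("Column family BOGUS does not exist")
    };
    for (int i = 0; i < results.length; i++) {
      System.out.println("Result[" + i + "]: " + describe(results[i]));
    }
  }
}
```

Checking for null before the instanceof test keeps the three cases explicit. In your own code the last branch is where you would cast the slot to Result and read its KeyValue instances.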
When you use the batch() functionality, the included Put instances will not be buffered using the client-side write buffer. The batch() calls are synchronous and send the operations directly to the servers; no delay or other intermediate processing is used. This is obviously different compared to the put() calls, so choose which one you want to use carefully.
There are two different batch calls that look very similar. The difference is that one needs to have the array handed into the call, while the other creates it for you. So why do you need both, and what semantic differences, if any, do they expose? Both throw the RetriesExhaustedWithDetailsException that you saw already, so the crucial difference is that
void batch(List<Row> actions, Object[] results) throws IOException, InterruptedException
gives you access to the partial results, while
Object[] batch(List<Row> actions) throws IOException, InterruptedException
does not! The latter throws the exception and nothing is returned to you since the control flow of the code is interrupted before the new result array is returned.
The former function fills your given array and then throws the exception. The code in Example 3-16 makes use of that fact and hands in the results array. Summarizing the features, you can say the following about the batch() functions:
Both
Supports gets, puts, and deletes. If there is a problem executing any of them, a client-side exception is thrown, reporting the issues. The client-side write buffer is not used.
void batch(actions, results)
Gives access to the results of all succeeded operations, and the remote exceptions for those that failed.
Object[] batch(actions)
Only returns the client-side exception; no access to partial results is possible.
All batch operations are executed before the results are checked: even if you receive an error for one of the actions, all the other ones have been applied. In a worst-case scenario, all actions might return faults, though.
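The difference between the two variants comes down to ordinary Java control flow, which the following sketch reproduces without any HBase classes. Everything in it is hypothetical: PartialResultsSketch, its two batch() stand-ins, and the convention that an action whose name starts with "bad" fails. It only shows why an array owned by the caller survives an exception while a return value does not.

```java
import java.io.IOException;

public class PartialResultsSketch {

  /** Stand-in for batch(actions, results): fills the caller's array, then throws. */
  public static void batch(String[] actions, Object[] results) throws IOException {
    int failures = 0;
    for (int i = 0; i < actions.length; i++) {
      if (actions[i].startsWith("bad")) {
        results[i] = new IllegalArgumentException("cannot apply " + actions[i]);
        failures++;
      } else {
        results[i] = "done " + actions[i];
      }
    }
    // Every action has been attempted before the error is reported.
    if (failures > 0) {
      throw new IOException("Failed " + failures + " action(s)");
    }
  }

  /** Stand-in for batch(actions): the array is local, so a throw discards it. */
  public static Object[] batch(String[] actions) throws IOException {
    Object[] results = new Object[actions.length];
    batch(actions, results);
    return results; // never reached when the call above throws
  }

  public static void main(String[] args) {
    String[] actions = {"put", "bad-get", "delete"};

    Object[] results = new Object[actions.length];
    try {
      batch(actions, results);
    } catch (IOException e) {
      System.out.println("Error: " + e.getMessage());
    }
    // The partial results are still available to the caller.
    for (Object result : results) {
      System.out.println(result);
    }

    Object[] returned = null;
    try {
      returned = batch(actions);
    } catch (IOException e) {
      System.out.println("Error: " + e.getMessage());
    }
    // Nothing was assigned, because the exception skipped the return.
    System.out.println("Returned array: " + returned);
  }
}
```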
On the other hand, the batch code is aware of transient errors, such as the NotServingRegionException (indicating, for instance, that a region has been moved), and tries to apply the action multiple times. The hbase.client.retries.number configuration property (by default set to 10) can be adjusted to increase, or reduce, the number of retries.
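The retry behavior can be pictured as a bounded loop around the action. The sketch below is a generic illustration in plain Java, not the HBase client's implementation: RetrySketch and callWithRetries() are hypothetical names, every RuntimeException is treated as transient, and there is no pause between attempts, whereas a real client would wait between tries and give up immediately on errors that cannot succeed on a retry.

```java
import java.util.function.Supplier;

public class RetrySketch {

  /** Runs the action until it succeeds or maxAttempts tries have failed. */
  public static <T> T callWithRetries(Supplier<T> action, int maxAttempts) {
    RuntimeException last = null;
    for (int attempt = 1; attempt <= maxAttempts; attempt++) {
      try {
        return action.get();
      } catch (RuntimeException e) {
        last = e; // assumed to be transient in this sketch
      }
    }
    throw new IllegalStateException(
        "Retries exhausted after " + maxAttempts + " attempts", last);
  }

  public static void main(String[] args) {
    int[] calls = {0};
    // Fails twice, as if the region were being moved, then succeeds.
    String outcome = callWithRetries(() -> {
      if (++calls[0] < 3) {
        throw new IllegalStateException("region is moving");
      }
      return "applied";
    }, 10);
    System.out.println(outcome + " after " + calls[0] + " attempts");
  }
}
```

A higher limit makes the client more tolerant of regions that are briefly unavailable, at the price of a longer wait before a permanent failure is reported.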
Mutating operations, like put(), delete(), checkAndPut(), and so on, are executed exclusively, which means in a serial fashion, for each row, to guarantee row-level atomicity. The region servers provide a row lock feature ensuring that only a client holding the matching lock can modify a row. In practice, though, most client applications do not provide an explicit lock, but rather rely on the mechanism in place that guards each operation separately.
You should avoid using row locks whenever possible. Just as with RDBMSes, you can end up in a situation where two clients create a deadlock by waiting on a locked row, with the lock held by the other client.
While the locks wait to time out, these two blocked clients are holding on to a handler, which is a scarce resource. If this happens on a heavily used row, many other clients will lock the remaining few handlers and block access to the complete server for all other clients: the server will not be able to serve any row of any region it hosts.
To reiterate: do not use row locks if you do not have to. And if you do, use them sparingly!
When you send, for example, a put() call to the server with an instance of Put, created with the following constructor:
Put(byte[] row)
which does not take a RowLock instance parameter, the servers will create a lock on your behalf, just for the duration of the call. In fact, from the client API you cannot even retrieve this short-lived, server-side lock instance.
Instead of relying on the implicit, server-side locking to occur, clients can also acquire explicit locks and use them across multiple operations on the same row. This is done using the following calls:
RowLock lockRow(byte[] row) throws IOException
void unlockRow(RowLock rl) throws IOException
The first call, lockRow(), takes a row key and returns an instance of RowLock, which you can hand in to the constructors of Put or Delete subsequently. Once you no longer require the lock, you must release it with the accompanying unlockRow() call.
Each unique lock, provided by the server for you, or handed in by you through the client API, guards the row it pertains to against any other lock that attempts to access the same row. In other words, locks must be taken out against an entire row, specifying its row key, and—once it has been acquired—will protect it against any other concurrent modification.
While a lock on a row is held by someone—whether by the server briefly or a client explicitly—all other clients trying to acquire another lock on that very same row will stall, until either the current lock has been released, or the lease on the lock has expired. The latter case is a safeguard against faulty processes holding a lock for too long—or possibly indefinitely.
The default timeout on locks is one minute, but can be configured system-wide by adding the following property key to the hbase-site.xml file and setting the value to a different, millisecond-based timeout:
<property>
  <name>hbase.regionserver.lease.period</name>
  <value>120000</value>
</property>
Adding the preceding code would double the timeout to 120 seconds, or two minutes, instead. Be careful not to set this value too high, since every client trying to acquire an already locked row will have to block for up to that timeout for the lock in limbo to be recovered.
Example 3-17 shows how a user-generated lock on a row will block a concurrent writer to the same row.
static class UnlockedPut implements Runnable {
  @Override
  public void run() {
    try {
      HTable table = new HTable(conf, "testtable");
      Put put = new Put(ROW1);
      put.add(COLFAM1, QUAL1, VAL3);
      long time = System.currentTimeMillis();
      System.out.println("Thread trying to put same row now...");
      table.put(put);
      System.out.println("Wait time: " +
        (System.currentTimeMillis() - time) + "ms");
    } catch (IOException e) {
      System.err.println("Thread error: " + e);
    }
  }
}

System.out.println("Taking out lock...");
RowLock lock = table.lockRow(ROW1);
System.out.println("Lock ID: " + lock.getLockId());

Thread thread = new Thread(new UnlockedPut());
thread.start();

try {
  System.out.println("Sleeping 5secs in main()...");
  Thread.sleep(5000);
} catch (InterruptedException e) {
  // ignore
}

try {
  Put put1 = new Put(ROW1, lock);
  put1.add(COLFAM1, QUAL1, VAL1);
  table.put(put1);

  Put put2 = new Put(ROW1, lock);
  put2.add(COLFAM1, QUAL1, VAL2);
  table.put(put2);
} catch (Exception e) {
  System.err.println("Error: " + e);
} finally {
  System.out.println("Releasing lock...");
  table.unlockRow(lock);
}
When you run the example code, you should see the following output on the console:
Taking out lock...
Lock ID: 4751274798057238718
Sleeping 5secs in main()...
Thread trying to put same row now...
Releasing lock...
Wait time: 5007ms
After thread ended...
KV: row1/colfam1:qual1/1300775520118/Put/vlen=4, Value: val2
KV: row1/colfam1:qual1/1300775520113/Put/vlen=4, Value: val1
KV: row1/colfam1:qual1/1300775515116/Put/vlen=4, Value: val3
You can see how the explicit lock blocks the thread using a different, implicit lock. The main thread sleeps for five seconds, and once it wakes up, it calls put() twice, setting the same column to two different values, respectively.
Once the main thread releases the lock, the thread's run() method continues to execute and applies the third put call. An interesting observation is how the puts are applied on the server side. Notice that the timestamps of the KeyValue instances show the third put having the lowest timestamp, even though the put was seemingly applied last. This is because the put() call in the thread was sent before the two puts in the main thread, which ran only after the main thread had slept for five seconds. Once a put is sent to the servers, it is assigned a timestamp (assuming you have not provided your own) and then tries to acquire the implicit lock. But the example code has already taken out the lock on that row, and therefore the server-side processing stalls until the lock is released, five seconds and a tad more later. In the preceding output, you can also see that it took seven milliseconds to execute the two put calls in the main thread and to unlock the row.
When you try to use an explicit row lock that you have acquired earlier but failed to use within the lease recovery time range, you will receive an error from the servers, in the form of an UnknownRowLockException. It tells you that the server has already discarded the lock you are trying to use. Drop it in your code and acquire a new one to recover from this state.
Now that we have discussed the basic CRUD-type operations, it is time to take a look at scans, a technique akin to cursors[56] in database systems, which make use of the underlying sequential, sorted storage layout HBase is providing.
Use of the scan operations is very similar to the get() methods. And again, similar to all the other functions, there is also a supporting class, named Scan. But since scans are similar to iterators, you do not have a scan() call, but rather a getScanner(), which returns the actual scanner instance you need to iterate over. The available methods are:
ResultScanner getScanner(Scan scan) throws IOException
ResultScanner getScanner(byte[] family) throws IOException
ResultScanner getScanner(byte[] family, byte[] qualifier) throws IOException
The latter two are for your convenience, implicitly creating an instance of Scan on your behalf, and subsequently calling the getScanner(Scan scan) method.
The Scan class has the following constructors:
Scan()
Scan(byte[] startRow, Filter filter)
Scan(byte[] startRow)
Scan(byte[] startRow, byte[] stopRow)
The difference between this and the Get class is immediately obvious: instead of specifying a single row key, you now can optionally provide a startRow parameter, defining the row key where the scan begins to read from the HBase table. The optional stopRow parameter can be used to limit the scan to a specific row key where it should conclude the reading.
The start row is always inclusive, while the end row is exclusive. This is often expressed as [startRow, stopRow) in the interval notation.
A special feature that scans offer is that you do not need to have an exact match for either of these rows. Instead, the scan will match the first row key that is equal to or larger than the given start row. If no start row was specified, it will start at the beginning of the table.
It will also end its work when the current row key is equal to or greater than the optional stop row. If no stop row was specified, the scan will run to the end of the table.
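The range rules can be tried out on a sorted in-memory map. The following plain Java sketch is an illustration only: ScanRangeSketch and its scan() method are hypothetical, a TreeMap stands in for the sorted table, and String ordering is used in place of HBase's byte-wise comparison, which gives the same order for the ASCII keys used here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

public class ScanRangeSketch {

  /** Returns the row keys in [startRow, stopRow); null leaves that end open. */
  public static List<String> scan(TreeMap<String, String> table,
      String startRow, String stopRow) {
    // Start at the first key equal to or larger than startRow.
    NavigableMap<String, String> fromStart =
        (startRow == null) ? table : table.tailMap(startRow, true);
    List<String> rows = new ArrayList<>();
    for (String key : fromStart.keySet()) {
      // Stop as soon as the current key is equal to or greater than stopRow.
      if (stopRow != null && key.compareTo(stopRow) >= 0) {
        break;
      }
      rows.add(key);
    }
    return rows;
  }

  public static void main(String[] args) {
    TreeMap<String, String> table = new TreeMap<>();
    for (int i = 1; i <= 100; i++) {
      table.put("row-" + i, "value-" + i);
    }
    // "row-0" is not a stored key; the scan starts at the next larger one.
    System.out.println(scan(table, "row-0", "row-10"));
    // The stop row itself is excluded.
    System.out.println(scan(table, "row-98", "row-99"));
  }
}
```

With rows row-1 through row-100 stored, the first call returns only row-1, because row-10 is the exclusive stop row, and the second returns only row-98.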
There is another optional parameter, named filter, referring to a Filter instance. Often, though, the Scan instance is simply created using the empty constructor, as all of the optional parameters also have matching getter and setter methods that can be used instead.
Once you have created the Scan instance, you may want to add more limiting details to it, but you are also allowed to use the empty scan, which would read the entire table, including all column families and their columns. You can narrow down the read data using various methods:
Scan addFamily(byte [] family)
Scan addColumn(byte[] family, byte[] qualifier)
There is a lot of similar functionality compared to the Get class: you may limit the data returned by the scan by setting the column families to specific ones using addFamily(), or, even more constraining, to only include certain columns with the addColumn() call.
If you only need subsets of the data, narrowing the scan’s scope is playing into the strengths of HBase, since data is stored in column families and omitting entire families from the scan results in those storage files not being read at all. This is the power of column-oriented architecture at its best.
Scan setTimeRange(long minStamp, long maxStamp) throws IOException
Scan setTimeStamp(long timestamp)
Scan setMaxVersions()
Scan setMaxVersions(int maxVersions)
A further limiting detail you can add is to set the specific timestamp you want, using setTimeStamp(), or a wider time range with setTimeRange(). The same applies to setMaxVersions(), allowing you to have the scan only return a specific number of versions per column, or return them all.
Scan setStartRow(byte[] startRow)
Scan setStopRow(byte[] stopRow)
Scan setFilter(Filter filter)
boolean hasFilter()
Using setStartRow(), setStopRow(), and setFilter(), you can define the same parameters the constructors exposed, all of them limiting the returned data even further, as explained earlier. The additional hasFilter() can be used to check that a filter has been assigned.
There are a few more related methods, listed in Table 3-8.
Method | Description |
getStartRow()/getStopRow() | Can be used to retrieve the currently assigned values. |
getTimeRange() | Retrieves the associated timestamp or time range of the Scan instance. Note that there is no getTimeStamp() since the API converts a value assigned with setTimeStamp() into a TimeRange instance internally, setting the minimum and maximum values to the given timestamp. |
getMaxVersions() | Returns the currently configured number of versions that should be retrieved from the table for every column. |
getFilter() | Special filter instances can be used to select certain columns or cells, based on a wide variety of conditions. You can get the currently assigned filter using this method. It may return null if none was previously set. See Filters for details. |
setCacheBlocks()/getCacheBlocks() | Each HBase region server has a block cache that efficiently retains recently accessed data for subsequent reads of contiguous information. In some cases it is better to not engage the cache to avoid too much churn when doing full table scans. These methods give you control over this feature. |
numFamilies() | Convenience method to retrieve the size of the family map, containing the families added using the addFamily() or addColumn() calls. |
hasFamilies() | Another helper to check if a family (or column) has been added to the current instance of the Scan class. |
getFamilies()/setFamilyMap()/getFamilyMap() | These methods give you access to the column families and specific columns, as added by the addFamily() and/or addColumn() calls. The family map is a map where the key is the family name and the value is a list of added column qualifiers for this particular family. The getFamilies() method returns an array of all stored families, i.e., containing only the family names (as byte[] arrays). |
Once you have configured the Scan instance, you can call the HTable method, named getScanner(), to retrieve the ResultScanner instance. We will discuss this class in more detail in the next section.
Scans do not ship all the matching rows in one RPC to the client, but instead do this on a row basis. This obviously makes sense as rows could be very large and sending thousands, and most likely more, of them in one call would use up too many resources, and take a long time.
The ResultScanner converts the scan into a get-like operation, wrapping the Result instance for each row into an iterator functionality. It has a few methods of its own:
Result next() throws IOException
Result[] next(int nbRows) throws IOException
void close()
You have two types of next() calls at your disposal. The close() call is required to explicitly release all the resources a scan may hold.
The next() call returns a single instance of Result representing the next available row. Alternatively, you can fetch a larger number of rows using the next(int nbRows) call, which returns an array of up to nbRows items, each an instance of Result, representing a unique row. The resultant array may be shorter if there were not enough rows left. This obviously can happen just before you reach the end of the table, or the stop row. Otherwise, refer to The Result class for details on how to make use of the Result instances. This works exactly like you saw in Get Method.
Example 3-18 brings together the explained functionality to scan a table, while accessing the column data stored in a row.
Scan scan1 = new Scan();
ResultScanner scanner1 = table.getScanner(scan1);
for (Result res : scanner1) {
  System.out.println(res);
}
scanner1.close();

Scan scan2 = new Scan();
scan2.addFamily(Bytes.toBytes("colfam1"));
ResultScanner scanner2 = table.getScanner(scan2);
for (Result res : scanner2) {
  System.out.println(res);
}
scanner2.close();

Scan scan3 = new Scan();
scan3.addColumn(Bytes.toBytes("colfam1"), Bytes.toBytes("col-5")).
  addColumn(Bytes.toBytes("colfam2"), Bytes.toBytes("col-33")).
  setStartRow(Bytes.toBytes("row-10")).
  setStopRow(Bytes.toBytes("row-20"));
ResultScanner scanner3 = table.getScanner(scan3);
for (Result res : scanner3) {
  System.out.println(res);
}
scanner3.close();
The code inserts 100 rows with two column families, each containing 100 columns. The scans performed vary from the full table scan, to one that only scans one column family, and finally to a very restrictive scan, limiting the row range, and only asking for two very specific columns. The output should look like this:
Scanning table #3...
keyvalues={row-10/colfam1:col-5/1300803775078/Put/vlen=8, row-10/colfam2:col-33/1300803775099/Put/vlen=9}
keyvalues={row-100/colfam1:col-5/1300803780079/Put/vlen=9, row-100/colfam2:col-33/1300803780095/Put/vlen=10}
keyvalues={row-11/colfam1:col-5/1300803775152/Put/vlen=8, row-11/colfam2:col-33/1300803775170/Put/vlen=9}
keyvalues={row-12/colfam1:col-5/1300803775212/Put/vlen=8, row-12/colfam2:col-33/1300803775246/Put/vlen=9}
keyvalues={row-13/colfam1:col-5/1300803775345/Put/vlen=8, row-13/colfam2:col-33/1300803775376/Put/vlen=9}
keyvalues={row-14/colfam1:col-5/1300803775479/Put/vlen=8, row-14/colfam2:col-33/1300803775498/Put/vlen=9}
keyvalues={row-15/colfam1:col-5/1300803775554/Put/vlen=8, row-15/colfam2:col-33/1300803775582/Put/vlen=9}
keyvalues={row-16/colfam1:col-5/1300803775665/Put/vlen=8, row-16/colfam2:col-33/1300803775687/Put/vlen=9}
keyvalues={row-17/colfam1:col-5/1300803775734/Put/vlen=8, row-17/colfam2:col-33/1300803775748/Put/vlen=9}
keyvalues={row-18/colfam1:col-5/1300803775791/Put/vlen=8, row-18/colfam2:col-33/1300803775805/Put/vlen=9}
keyvalues={row-19/colfam1:col-5/1300803775843/Put/vlen=8, row-19/colfam2:col-33/1300803775859/Put/vlen=9}
keyvalues={row-2/colfam1:col-5/1300803774463/Put/vlen=7, row-2/colfam2:col-33/1300803774485/Put/vlen=8}
Once again, note the actual rows that have been matched. The lexicographical sorting of the keys makes for interesting results. You could simply pad the numbers with zeros, which would result in a more human-readable sort order. This is completely under your control, so choose carefully what you need.
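The effect of padding is easy to verify with a sorted set in plain Java. This is a minimal sketch under stated assumptions: PaddedRowKeySketch and paddedKey() are hypothetical names, the width of three digits assumes that no row number exceeds 999, and a TreeSet of strings stands in for the sorted row keys of a table.

```java
import java.util.Locale;
import java.util.TreeSet;

public class PaddedRowKeySketch {

  /** Builds a row key whose numeric part is left-padded with zeros to three digits. */
  public static String paddedKey(int n) {
    return String.format(Locale.ROOT, "row-%03d", n);
  }

  public static void main(String[] args) {
    int[] numbers = {1, 2, 10, 20, 100};
    TreeSet<String> plain = new TreeSet<>();
    TreeSet<String> padded = new TreeSet<>();
    for (int n : numbers) {
      plain.add("row-" + n);
      padded.add(paddedKey(n));
    }
    // Lexicographic order puts row-10 and row-100 before row-2.
    System.out.println(plain);
    // With padding, lexicographic and numeric order agree.
    System.out.println(padded);
  }
}
```

The first set prints as [row-1, row-10, row-100, row-2, row-20], the second as [row-001, row-002, row-010, row-020, row-100]. The price is a fixed upper bound on the number range, so pick the width with the expected key space in mind.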
So far, each call to next() will be a separate RPC for each row, even when you use the next(int nbRows) method, because it is nothing else but a client-side loop over next() calls. Obviously, this is not very good for performance when dealing with small cells (see Client-side write buffer for a discussion). Thus it would make sense to fetch more than one row per RPC if possible. This is called scanner caching and is disabled by default.
You can enable it at two different levels: on the table level, to be effective for all scan instances, or at the scan level, only affecting the current scan. You can set the table-wide scanner caching using these HTable calls:
void setScannerCaching(int scannerCaching)
int getScannerCaching()
You can also change the default value of 1 for the entire HBase setup. You do this by adding the following configuration key to the hbase-site.xml configuration file:
<property>
  <name>hbase.client.scanner.caching</name>
  <value>10</value>
</property>
This would set the scanner caching to 10 for all instances of Scan. You can still override the value at the table and scan levels, but you would need to do so explicitly.
The setScannerCaching() call sets the value, while getScannerCaching() retrieves the current value. Every time you call getScanner(scan) thereafter, the API will assign the set value to the scan instance, unless you use the scan-level settings, which take highest precedence. This is done with the following methods of the Scan class:
void setCaching(int caching)
int getCaching()
They work the same way as the table-wide settings, giving you control over how many rows are retrieved with every RPC. Both types of next() calls take these settings into account.
You may need to find a sweet spot between a low number of RPCs and the memory used on the client and server. Setting the scanner caching higher will improve scanning performance most of the time, but setting it too high can have adverse effects as well: each call to next() will take longer as more data is fetched and needs to be transported to the client, and once you exceed the maximum heap the client process has available it may terminate with an OutOfMemoryError.
When the time taken to transfer the rows to the client, or to process the data on the client, exceeds the configured scanner lease threshold, you will end up receiving a lease expired error, in the form of a ScannerTimeoutException being thrown.
Example 3-19 showcases the issue with the scanner leases.
Scan scan = new Scan();
ResultScanner scanner = table.getScanner(scan);

int scannerTimeout = (int) conf.getLong(
  HConstants.HBASE_REGIONSERVER_LEASE_PERIOD_KEY, -1);
try {
  Thread.sleep(scannerTimeout + 5000);
} catch (InterruptedException e) {
  // ignore
}
while (true) {
  try {
    Result result = scanner.next();
    if (result == null) break;
    System.out.println(result);
  } catch (Exception e) {
    e.printStackTrace();
    break;
  }
}
scanner.close();
The code gets the currently configured lease period value and sleeps a little longer to trigger the lease recovery on the server side. The console output (abbreviated for the sake of readability) should look similar to this:
Adding rows to table...
Current (local) lease period: 60000
Sleeping now for 65000ms...
Attempting to iterate over scanner...
Exception in thread "main" java.lang.RuntimeException:
  org.apache.hadoop.hbase.client.ScannerTimeoutException: 65094ms passed
  since the last invocation, timeout is currently set to 60000
  at org.apache.hadoop.hbase.client.HTable$ClientScanner$1.hasNext
  at ScanTimeoutExample.main
Caused by: org.apache.hadoop.hbase.client.ScannerTimeoutException: 65094ms
  passed since the last invocation, timeout is currently set to 60000
  at org.apache.hadoop.hbase.client.HTable$ClientScanner.next
  at org.apache.hadoop.hbase.client.HTable$ClientScanner$1.hasNext
  ... 1 more
Caused by: org.apache.hadoop.hbase.UnknownScannerException:
  org.apache.hadoop.hbase.UnknownScannerException: Name: -315058406354472427
  at org.apache.hadoop.hbase.regionserver.HRegionServer.next
  ...
The example code prints its progress and, after sleeping for the specified time, attempts to iterate over the rows the scanner should provide. This triggers the said timeout exception, while reporting the configured values.
You might be tempted to add the following into your code:
Configuration conf = HBaseConfiguration.create();
conf.setLong(HConstants.HBASE_REGIONSERVER_LEASE_PERIOD_KEY, 120000);
assuming this increases the lease threshold (in this example, to two minutes). But that is not going to work as the value is configured on the remote region servers, not your client application. Your value is not being sent to the servers, and therefore will have no effect.
If you want to change the lease period setting you need to add the appropriate configuration key to the hbase-site.xml file on the region servers—while not forgetting to restart them for the changes to take effect!
The stack trace in the console output also shows how the ScannerTimeoutException is a wrapper around an UnknownScannerException. It means that the next() call is using a scanner ID that has since expired and been removed in due course. In other words, the ID your client has memorized is now unknown to the region servers, hence the name of the exception.
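To make the mechanism more tangible, here is a toy model of a server-side lease table. It is a sketch under stated assumptions, not the region server's implementation: the class, its methods, and the use of IllegalStateException as a stand-in for UnknownScannerException are all invented for this illustration, and time is passed in explicitly so no sleeping is needed.

```java
import java.util.HashMap;
import java.util.Map;

final class ScannerLeaseSketch {
    private final long leasePeriodMs;
    // Maps a scanner ID to the time of its last access (hypothetical bookkeeping).
    private final Map<Long, Long> lastAccess = new HashMap<>();
    private long nextId = 1;

    ScannerLeaseSketch(long leasePeriodMs) {
        this.leasePeriodMs = leasePeriodMs;
    }

    // Registers a new scanner and starts its lease at the given time.
    long open(long nowMs) {
        long id = nextId++;
        lastAccess.put(id, nowMs);
        return id;
    }

    // Drops all scanners whose lease has run out, then renews the lease of the
    // given scanner or fails if its ID is no longer known.
    void next(long id, long nowMs) {
        lastAccess.values().removeIf(last -> nowMs - last > leasePeriodMs);
        if (!lastAccess.containsKey(id)) {
            throw new IllegalStateException("Unknown scanner: " + id);
        }
        lastAccess.put(id, nowMs);
    }

    public static void main(String[] args) {
        ScannerLeaseSketch server = new ScannerLeaseSketch(60000);
        long id = server.open(0);
        server.next(id, 30000);          // within the lease period: renewed
        try {
            server.next(id, 95000);      // 65 seconds after the last call
        } catch (IllegalStateException e) {
            System.out.println("Lease expired: " + e.getMessage());
        }
    }
}
```

The second call fails because the ID was removed once its lease ran out, which is the situation the wrapped exception in the stack trace reports.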
So far you have learned to use client-side scanner caching to make better use of bulk transfers between your client application and the remote region servers. There is an issue, though, that was mentioned in passing earlier: very large rows. Those might not fit into the memory of the client process. HBase and its client API have an answer for that: batching. You can control batching using these calls:
void setBatch(int batch)
int getBatch()
As opposed to caching, which operates on a row level, batching works on the column level instead. It controls how many columns are retrieved for every call to any of the next() functions provided by the ResultScanner instance. For example, setting the scan to use setBatch(5) would return five columns per Result instance.
When a row contains more columns than the value you used for the batch, you will get the entire row piece by piece, with each next Result returned by the scanner.
The last Result may include fewer columns, when the total number of columns in that row is not divisible by whatever batch it is set to. For example, if your row has 17 columns and you set the batch to 5, you get four Result instances, with 5, 5, 5, and the remaining two columns within.
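The arithmetic behind this example can be sketched in a few lines of plain Java. The helper below is hypothetical and only models how many columns each Result would carry for a single row; it does not talk to HBase.

```java
import java.util.ArrayList;
import java.util.List;

final class BatchSplit {
    // Returns the number of columns in each Result when a row with the given
    // number of columns is scanned with the given batch size.
    static List<Integer> resultSizes(int columnsInRow, int batch) {
        if (columnsInRow < 0 || batch <= 0) {
            throw new IllegalArgumentException("columns must be >= 0, batch > 0");
        }
        List<Integer> sizes = new ArrayList<>();
        for (int remaining = columnsInRow; remaining > 0; remaining -= batch) {
            sizes.add(Math.min(batch, remaining));
        }
        return sizes;
    }

    public static void main(String[] args) {
        System.out.println(resultSizes(17, 5)); // prints [5, 5, 5, 2]
    }
}
```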
The combination of scanner caching and batch size can be used to control the number of RPCs required to scan the row key range selected. Example 3-20 uses the two parameters to fine-tune the size of each Result instance in relation to the number of requests needed.
private static void scan(int caching, int batch) throws IOException {
  Logger log = Logger.getLogger("org.apache.hadoop");
  final int[] counters = {0, 0};
  Appender appender = new AppenderSkeleton() {
    @Override
    protected void append(LoggingEvent event) {
      String msg = event.getMessage().toString();
      if (msg != null && msg.contains("Call: next")) {
        counters[0]++;
      }
    }
    @Override
    public void close() {}
    @Override
    public boolean requiresLayout() {
      return false;
    }
  };
  log.removeAllAppenders();
  log.setAdditivity(false);
  log.addAppender(appender);
  log.setLevel(Level.DEBUG);

  Scan scan = new Scan();
  scan.setCaching(caching);
  scan.setBatch(batch);
  ResultScanner scanner = table.getScanner(scan);
  for (Result result : scanner) {
    counters[1]++;
  }
  scanner.close();
  System.out.println("Caching: " + caching + ", Batch: " + batch +
    ", Results: " + counters[1] + ", RPCs: " + counters[0]);
}

public static void main(String[] args) throws IOException {
  scan(1, 1);
  scan(200, 1);
  scan(2000, 100);
  scan(2, 100);
  scan(2, 10);
  scan(5, 100);
  scan(5, 20);
  scan(10, 10);
}
The code prints out the values used for caching and batching, the number of results returned by the servers, and how many RPCs were needed to get them. For example:
Caching: 1, Batch: 1, Results: 200, RPCs: 201
Caching: 200, Batch: 1, Results: 200, RPCs: 2
Caching: 2000, Batch: 100, Results: 10, RPCs: 1
Caching: 2, Batch: 100, Results: 10, RPCs: 6
Caching: 2, Batch: 10, Results: 20, RPCs: 11
Caching: 5, Batch: 100, Results: 10, RPCs: 3
Caching: 5, Batch: 20, Results: 10, RPCs: 3
Caching: 10, Batch: 10, Results: 20, RPCs: 3
You can tweak the two numbers to see how they affect the outcome. Table 3-9 lists a few selected combinations. The numbers relate to Example 3-20, which creates a table with two column families, adds 10 rows, with 10 columns per family in each row. This means there are a total of 200 columns—or cells, as there is only one version for each column—with 20 columns per row.
Caching | Batch | Results | RPCs | Notes |
1 | 1 | 200 | 201 | Each column is returned as a separate Result instance. One more RPC is needed to realize the scan is complete. |
200 | 1 | 200 | 2 | Each column is a separate Result, but they are all transferred in one RPC (plus the extra check). |
2 | 10 | 20 | 11 | The batch is half the row width, so 200 divided by 10 is 20 Results needed. 10 RPCs (plus the check) to transfer them. |
5 | 100 | 10 | 3 | The batch is too large for each row, so all 20 columns are batched. This requires 10 Result instances. Caching brings the number of RPCs down to two (plus the check). |
5 | 20 | 10 | 3 | This is the same as above, but this time the batch matches the columns available. The outcome is the same. |
10 | 10 | 20 | 3 | This divides the table into smaller Result instances, but larger caching also means only two RPCs are needed. |
To compute the number of RPCs required for a scan, you first multiply the number of rows by the number of columns per row (or at least an approximation of it). Then you divide that number by the smaller value of either the batch size or the columns per row. Finally, divide that number by the scanner caching value. In mathematical terms this could be expressed like so:
RPCs = (Rows * Cols per Row) / Min(Cols per Row, Batch Size) / Scanner Caching
In addition, RPCs are also required to open and close the scanner. You would need to add these two calls to get the overall total of remote calls when dealing with scanners.
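The formula can be turned into a small estimator. The sketch below is not part of the HBase API; the class and method names are made up, and rounding up to whole results and whole RPCs is an assumption added here, since a partially filled transfer still costs a full call. It counts only the RPCs that carry data, not the extra calls mentioned above.

```java
final class RpcEstimate {
    // Number of Result instances: each row is split into pieces of at most
    // min(colsPerRow, batch) columns, rounded up per row.
    static int results(int rows, int colsPerRow, int batch) {
        if (rows < 0 || colsPerRow <= 0 || batch <= 0) {
            throw new IllegalArgumentException("rows >= 0, colsPerRow > 0, batch > 0");
        }
        int colsPerResult = Math.min(colsPerRow, batch);
        int resultsPerRow = (colsPerRow + colsPerResult - 1) / colsPerResult;
        return rows * resultsPerRow;
    }

    // Number of data-carrying RPCs: results divided by scanner caching, rounded up.
    static int dataRpcs(int rows, int colsPerRow, int batch, int caching) {
        if (caching <= 0) {
            throw new IllegalArgumentException("caching must be > 0");
        }
        int results = results(rows, colsPerRow, batch);
        return (results + caching - 1) / caching;
    }

    public static void main(String[] args) {
        // 10 rows with 20 columns each, as in the example table.
        System.out.println(dataRpcs(10, 20, 10, 2));   // batch 10, caching 2
        System.out.println(dataRpcs(10, 20, 100, 5));  // batch 100, caching 5
    }
}
```

For the 10-row, 20-column table used in the example, the estimates are one lower than the RPC counts listed for most combinations, because the listed numbers include the final call that detects the end of the scan.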
Figure 3-2 shows how caching and batching work in tandem. It has a table with nine rows, each containing a number of columns. Using a scanner caching of six, and a batch set to three, you can see that three RPCs are necessary to ship the data across the network (the dashed, rounded-corner boxes).
The small batch value causes the servers to group three columns into one Result, while the scanner caching of six causes one RPC to transfer six rows (or, more precisely, results) sent in the batch. When the batch size is not specified but scanner caching is specified, the result of the call will contain complete rows, because each row will be contained in one Result instance. Only when you start to use the batch mode are you getting access to the intra-row scanning functionality.
You may not have to worry about the consequences of using scanner caching and batch mode initially, but once you try to squeeze the optimal performance out of your setup, you should keep all of this in mind and find the sweet spot for both values.
Before looking into more involved features that clients can use, let us first wrap up a handful of miscellaneous features and functionality provided by HBase and its client API.
The client API is represented by an instance of the HTable class and gives you access to an existing HBase table. Apart from the major features we already discussed, there are a few more notable methods of this class that you should be aware of:
void close()
This method was mentioned before, but for the sake of completeness, and its importance, it warrants repeating. Call close() once you have completed your work with a table. It will flush any buffered write operations: the close() call implicitly invokes the flushCommits() method.
byte[] getTableName()
This returns the name of the table the HTable instance operates on.
Configuration getConfiguration()
This allows you to access the configuration in use by the HTable instance. Since this is handed out by reference, you can make changes that are effective immediately.
HTableDescriptor getTableDescriptor()
As explained in Tables, each table is defined using an instance of the HTableDescriptor class. You gain access to the underlying definition using getTableDescriptor().
static boolean isTableEnabled(table)
There are four variants of this static helper method. They all need a table name and, optionally, an explicit configuration. If no configuration is provided, one is created implicitly using the default values and the configuration found on your application’s classpath. The method checks if the table in question is marked as enabled in ZooKeeper.
byte[][] getStartKeys()
byte[][] getEndKeys()
Pair<byte[][],byte[][]> getStartEndKeys()
These calls give you access to the current physical layout of the table, which is likely to change when you are adding more data to it. The calls give you the start and/or end keys of all the regions of the table. They are returned as arrays of byte arrays. You can use Bytes.toStringBinary(), for example, to print out the keys.
void clearRegionCache()
HRegionLocation getRegionLocation(row)
Map<HRegionInfo, HServerAddress> getRegionsInfo()
This set of methods lets you retrieve more details regarding where a row lives, that is, in what region, and the entire map of the region information. You can also clear out the cache if you wish to do so. These calls are only for advanced users who wish to make use of this information to, for example, route traffic or perform work close to where the data resides.
void prewarmRegionCache(Map<HRegionInfo, HServerAddress> regionMap)
static void setRegionCachePrefetch(table, enable)
static boolean getRegionCachePrefetch(table)
Again, this is a group of methods for advanced usage. In Implementation it was mentioned that it would make sense to prefetch region information on the client to avoid more costly lookups for every row, until the local cache is stable. Using these calls, you can either warm up the region cache while providing a list of regions (you could, for example, use getRegionsInfo() to gain access to the list, and then process it) or switch on region prefetching for the entire table.
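As an illustration of what the start keys returned by getStartKeys() describe, the following sketch locates the region responsible for a row key using only the JDK. It assumes the start keys are sorted in ascending order and that the first region has an empty start key; the class, the helper methods, and the sample keys are hypothetical, and Arrays.compareUnsigned() (Java 9 or later) stands in for the unsigned byte-wise comparison that row keys are ordered by.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class RegionLookupSketch {
    // Returns the index of the region whose start key is the greatest one
    // that is still less than or equal to the given row key.
    static int regionIndexFor(byte[][] startKeys, byte[] row) {
        int index = 0;
        for (int i = 0; i < startKeys.length; i++) {
            if (Arrays.compareUnsigned(startKeys[i], row) <= 0) {
                index = i;
            } else {
                break; // keys are sorted, so later regions start after the row
            }
        }
        return index;
    }

    // Convenience helper for building sample keys from strings.
    static byte[] key(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Made-up layout: three regions, the first with an empty start key.
        byte[][] startKeys = { key(""), key("row-300"), key("row-600") };
        System.out.println(regionIndexFor(startKeys, key("row-100")));
        System.out.println(regionIndexFor(startKeys, key("row-450")));
        System.out.println(regionIndexFor(startKeys, key("row-600")));
    }
}
```

With the sample layout, the three lookups resolve to the first, second, and third region, respectively. In a real application you would rely on getRegionLocation(row) instead of reimplementing the lookup.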
You saw how the Bytes class was used to convert native Java types, such as String, or long, into the raw, byte array format HBase supports natively. There are a few more notes that are worth mentioning about the class and its functionality.
Most methods come in three variations, for example:
static long toLong(byte[] bytes)
static long toLong(byte[] bytes, int offset)
static long toLong(byte[] bytes, int offset, int length)
You hand in just a byte array, or an array and an offset, or an array, an offset, and a length value. The usage depends on the originating byte array you have. If it was created by toBytes() beforehand, you can safely use the first variant, and simply hand in the array and nothing else. All the array contains is the converted value.
The API, and HBase internally, store data in larger arrays, though, using, for example, the following call:
static int putLong(byte[] bytes, int offset, long val)
This call allows you to write the long value into a given byte array, at a specific offset. If you want to access the data in that larger byte array you can make use of the latter two toLong() calls instead.
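The idea of packing several values into one larger array and reading them back by offset can be shown with a self-contained sketch. The two methods below are stand-ins written for this example, not the HBase Bytes implementation; they assume big-endian byte order, and returning the offset just past the written value is likewise an assumption chosen so that calls can be chained.

```java
final class LongBytesSketch {
    static final int SIZEOF_LONG = 8;

    // Writes val in big-endian order into bytes, starting at offset, and
    // returns the offset directly after the written value.
    static int putLong(byte[] bytes, int offset, long val) {
        if (offset < 0 || bytes.length - offset < SIZEOF_LONG) {
            throw new IllegalArgumentException("Not enough room at offset " + offset);
        }
        for (int i = offset + SIZEOF_LONG - 1; i >= offset; i--) {
            bytes[i] = (byte) val;   // lowest remaining byte goes last
            val >>>= 8;
        }
        return offset + SIZEOF_LONG;
    }

    // Reads a big-endian long from bytes, starting at offset.
    static long toLong(byte[] bytes, int offset) {
        if (offset < 0 || bytes.length - offset < SIZEOF_LONG) {
            throw new IllegalArgumentException("Not enough data at offset " + offset);
        }
        long value = 0;
        for (int i = offset; i < offset + SIZEOF_LONG; i++) {
            value = (value << 8) | (bytes[i] & 0xFF);
        }
        return value;
    }

    public static void main(String[] args) {
        byte[] buffer = new byte[2 * SIZEOF_LONG];
        int next = putLong(buffer, 0, 42L);   // first value at offset 0
        putLong(buffer, next, -7L);           // second value right behind it
        System.out.println(toLong(buffer, 0));
        System.out.println(toLong(buffer, SIZEOF_LONG));
    }
}
```

Reading at offset 0 yields the first value and reading at offset 8 yields the second, which is the situation the offset-taking toLong() variants are meant for.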
The Bytes class has support to convert from and to the following native Java types: String, boolean, short, int, long, double, and float. Apart from that, there are some noteworthy methods, which are listed in Table 3-10.
There is some overlap between the Bytes class and the Java-provided ByteBuffer. The difference is that the former does all operations without creating new class instances. In a way it is an optimization: the provided methods are called many times within HBase, and not creating new instances avoids possibly costly garbage collection issues.
For the full documentation, please consult the JavaDoc-based API documentation.[57]
[49] The region servers use a multiversion concurrency control mechanism, implemented internally by the ReadWriteConsistencyControl (RWCC) class, to guarantee that readers can read without having to wait for writers. Writers do need to wait for other writers to complete, though, before they can continue.
[50] Universally Unique Identifier; see http://en.wikipedia.org/wiki/Universally_unique_identifier for details.
[51] See “Unix time” on Wikipedia.
[52] See the API documentation for the KeyValue class for a complete description.
[53] See “Remote procedure call” on Wikipedia.
[54] For easier readability, the related details were broken up into groups using blank lines.
[56] Scans are similar to nonscrollable cursors. You need to declare, open, fetch, and eventually close a database cursor. While scans do not need the declaration step, they are otherwise used in the same way. See “Cursors” on Wikipedia.
[57] See the Bytes documentation online.