Chapter 5. Client API: Administrative Features

Apart from the client API used to deal with data manipulation features, HBase also exposes a data definition-like API. This is similar to the separation into DDL and DML found in RDBMSes. First we will look at the classes required to define the data schemas and subsequently see the API that makes use of it to, for example, create a new HBase table.

Schema Definition

Creating a table in HBase implicitly involves the definition of a table schema, as well as the schemas for all contained column families. They define the pertinent characteristics of how—and when—the data inside the table and columns is ultimately stored.

Tables

Everything stored in HBase is ultimately grouped into one or more tables. The primary reason to have tables is to be able to control certain features that all columns in this table share. The typical things you will want to define for a table are column families. The constructor of the table descriptor in Java looks like the following:

HTableDescriptor();
HTableDescriptor(String name);
HTableDescriptor(byte[] name);
HTableDescriptor(HTableDescriptor desc);

You either create a table with a name or from an existing descriptor. The constructor without any parameters is only for deserialization purposes and should not be used directly. You can specify the name of the table as a Java String or as a byte[], a byte array. Many functions in the HBase Java API offer these two choices. The String version is plainly for convenience and internally converts the string into the usual byte array representation, as HBase treats everything as such. You can achieve the same using the supplied Bytes class:

byte[] name = Bytes.toBytes("test");
HTableDescriptor desc = new HTableDescriptor(name);

There are certain restrictions on the characters you can use to create a table name. The name is used as part of the path to the actual storage files, and therefore must comply with filename rules. You can later browse the low-level storage system—for example, HDFS—to see the tables as separate directories—in case you ever need to.

The column-oriented storage format of HBase allows you to store many details into the same table, which, under relational database modeling, would be divided into many separate tables. The usual database normalization[64] rules do not apply directly to HBase, and therefore the number of tables is usually very low. More on this is discussed in Database (De-)Normalization.

Although conceptually a table is a collection of rows with columns in HBase, physically they are stored in separate partitions called regions. Figure 5-1 shows the difference between the logical and physical layout of the stored data. Every region is served by exactly one region server, which in turn serves the stored values directly to clients.

Figure 5-1. Logical and physical layout of rows within regions

Table Properties

The table descriptor offers getters and setters[65] to set other options of the table. In practice, many of them are rarely used, but it is important to know them all, as they can be used to fine-tune the table’s performance.

Name

The constructor already had the parameter to specify the table name. The Java API has additional methods to access the name or change it.

byte[] getName();
String getNameAsString();
void setName(byte[] name);

Note

The name of a table must not start with a “.” (period) or a “-” (hyphen). Furthermore, it can only contain Latin letters or numbers, as well as “_” (underscore), “-” (hyphen), or “.” (period). In regular expression syntax, this could be expressed as [a-zA-Z_0-9-.].

For example, .testtable is wrong, but test.table is allowed.

Refer to Column Families for more details, and Figure 5-2 for an example of how the table name is used to form a filesystem path.

Column families

This is the most important part of defining a table. You need to specify the column families you want to use with the table you are creating.

void addFamily(HColumnDescriptor family);
boolean hasFamily(byte[] c);
HColumnDescriptor[] getColumnFamilies();
HColumnDescriptor getFamily(byte[] column);
HColumnDescriptor removeFamily(byte[] column);

You have the option of adding a family, checking if it exists based on its name, getting a list of all known families, and getting or removing a specific one. More on how to define the required HColumnDescriptor is explained in Column Families.

Maximum file size

This parameter specifies the maximum size a region within the table can grow to. The size is specified in bytes and is read and set using the following methods:

long getMaxFileSize();
void setMaxFileSize(long maxFileSize);

Note

Maximum file size is actually a misnomer, as it really is about the maximum size of each store, that is, all the files belonging to each column family. If one single column family exceeds this maximum size, the region is split. Since, in practice, this involves multiple files, a better name would be maxStoreSize.

The maximum size helps the system split regions when they reach this configured size. As discussed in Building Blocks, the unit of scalability and load balancing in HBase is the region. You need to determine what a good number for the size is, though. By default, it is set to 256 MB, which is good for many use cases, but a larger value may be required when you have a lot of data.

Please note that this is more or less a desired maximum size and that, under certain conditions, this size can be exceeded and even rendered completely ineffective. For example, if you set the maximum file size to 10 MB and insert a 20 MB cell in one row, the row cannot be split across regions, so you end up with a region of at least 20 MB in size, and the system cannot do anything about it.
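
For illustration, here is a minimal sketch of setting this property on a table descriptor before the table is created; the 512 MB value and the table name are arbitrary example choices:

HTableDescriptor desc = new HTableDescriptor(Bytes.toBytes("testtable"));
// ask the servers to split regions once a store grows beyond roughly 512 MB
desc.setMaxFileSize(512L * 1024 * 1024);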

Read-only

By default, all tables are writable, but it may make sense to specify the read-only option for specific tables. If the flag is set to true, you can only read from the table and not modify it at all. The flag is set and read by these methods:

boolean isReadOnly();
void setReadOnly(boolean readOnly);

Memstore flush size

We discussed the storage model earlier and identified how HBase uses an in-memory store to buffer values before writing them to disk as a new storage file in an operation called flush. This parameter of the table controls when this is going to happen and is specified in bytes. It is controlled by the following calls:

long getMemStoreFlushSize();
void setMemStoreFlushSize(long memstoreFlushSize);

As you do with the aforementioned maximum file size, you need to check your requirements before setting this value to something other than the default 64 MB. A larger size means you are generating larger store files, which is good. On the other hand, you might run into the problem of longer blocking periods, if the region server cannot keep up with flushing the added data. Also, it increases the time needed to replay the write-ahead log (the WAL) if the server crashes and all in-memory updates are lost.

Deferred log flush

We will look into log flushing in great detail in Write-Ahead Log, where this option is explained. For now, note that HBase uses one of two different approaches to save write-ahead-log entries to disk. You either use deferred log flushing or not. This is a boolean option and is, by default, set to false. Here is how to access this parameter through the Java API:

synchronized boolean isDeferredLogFlush();
void setDeferredLogFlush(boolean isDeferredLogFlush);

Miscellaneous options

In addition to those already mentioned, there are methods that let you set arbitrary key/value pairs:

byte[] getValue(byte[] key);
String getValue(String key);
Map<ImmutableBytesWritable, ImmutableBytesWritable> getValues();
void setValue(byte[] key, byte[] value);
void setValue(String key, String value);
void remove(byte[] key);

They are stored with the table definition and can be retrieved if necessary. One actual use case within HBase is the loading of coprocessors, as detailed in Coprocessor Loading. You have a few choices in terms of how to specify the key and value, either as a String, or as a byte array. Internally, they are stored as ImmutableBytesWritable, which is needed for serialization purposes (see Writable and the Parameterless Constructor).
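
To put the table properties together, the following sketch sets a handful of them on a single descriptor; the metadata key/value pair and the flush size are made-up example values:

HTableDescriptor desc = new HTableDescriptor(Bytes.toBytes("testtable"));
desc.setReadOnly(false);                        // the default, shown for clarity
desc.setMemStoreFlushSize(128L * 1024 * 1024);  // flush the memstores at 128 MB
desc.setDeferredLogFlush(false);                // keep synchronous WAL flushes
desc.setValue("OWNER", "webapp");               // arbitrary key/value metadata
String owner = desc.getValue("OWNER");          // retrieve the value again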

Column Families

We just saw how the HTableDescriptor exposes methods to add column families to a table. Similar to this is a class called HColumnDescriptor that wraps each column family’s settings into a dedicated Java class. In other programming languages, you may find the same concept or some other means of specifying the column family properties.

Note

The class in Java is somewhat of a misnomer. A more appropriate name would be HColumnFamilyDescriptor, which would indicate its purpose to define column family parameters as opposed to actual columns.

Column families define shared features that apply to all columns that are created within them. The client can create an arbitrary number of columns by simply using new column qualifiers on the fly. Columns are addressed as a combination of the column family name and the column qualifier (sometimes also called the column key), divided by a colon:

        family:qualifier

The column family name must be composed of printable characters: the qualifier can be composed of any arbitrary binary characters. Recall the Bytes class mentioned earlier, which you can use to convert your chosen names to byte arrays. The reason why the family name must be printable is that the name is used as part of the directory name by the lower-level storage layer. Figure 5-2 visualizes how the families are mapped to storage files. The family name is added to the path and must comply with filename standards. The advantage is that you can easily access families on the filesystem level as you have the name in a human-readable format.

Note

You should also be aware of the empty column qualifier. You can simply omit the qualifier and specify just the column family name. HBase then creates a column with the special empty qualifier. You can write and read that column like any other, but obviously there is only one of those, and you will have to name the other columns to distinguish them.

For simple applications, using no qualifier is an option, but it also carries no meaning when looking at the data—for example, using the HBase Shell. You should get used to naming your columns and do this from the start, because you cannot simply rename them later.
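
Here is a short sketch of writing to, and reading from, the special empty qualifier; it assumes the usual conf instance, and an existing table testtable with a family colfam1:

HTable table = new HTable(conf, "testtable");

Put put = new Put(Bytes.toBytes("row1"));
// an empty qualifier addresses the column that carries the family name only
put.add(Bytes.toBytes("colfam1"), HConstants.EMPTY_BYTE_ARRAY,
  Bytes.toBytes("value1"));
table.put(put);

Get get = new Get(Bytes.toBytes("row1"));
Result result = table.get(get);
byte[] value = result.getValue(Bytes.toBytes("colfam1"),
  HConstants.EMPTY_BYTE_ARRAY);
System.out.println("Value: " + Bytes.toString(value));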

Figure 5-2. Column families mapping to separate storage files

When you create a column family, you can specify a variety of parameters that control all of its features. The Java class has many constructors that allow you to specify most parameters while creating an instance. Here are the choices:

HColumnDescriptor();
HColumnDescriptor(String familyName);
HColumnDescriptor(byte[] familyName);
HColumnDescriptor(HColumnDescriptor desc);
HColumnDescriptor(byte[] familyName, int maxVersions, String compression,
  boolean inMemory, boolean blockCacheEnabled, int timeToLive,
  String bloomFilter);
HColumnDescriptor(byte[] familyName, int maxVersions, String compression,
  boolean inMemory, boolean blockCacheEnabled, int blocksize,
  int timeToLive, String bloomFilter, int scope);

The first one is only used internally for deserialization again. The next two simply take a name as a String or byte[], the usual byte array we have seen many times now. There is another one that takes an existing HColumnDescriptor and then two more that list all available parameters.

Instead of using the constructor, you can also use the getters and setters to specify the various details. We will now discuss each of them.

Name

Each column family has a name, and you can use the following methods to retrieve it from an existing HColumnDescriptor instance:

byte[] getName();
String getNameAsString();

Warning

A column family cannot be renamed. The common approach to rename a family is to create a new family with the desired name and copy the data over, using the API.

You cannot set the name directly; you have to hand it in via the constructors. Keep in mind the requirement for the name to consist of printable characters.

Note

The name of a column family must not start with a “.” (period) and must not contain “:” (colon), “/” (slash), or ISO control characters, in other words, characters whose code is in the range \u0000 through \u001F or \u007F through \u009F.

Maximum versions

Per family, you can specify how many versions of each value you want to keep. Recall the predicate deletion mentioned earlier where the housekeeping of HBase removes values that exceed the set maximum. Getting and setting the value is done using the following API calls:

int getMaxVersions();
void setMaxVersions(int maxVersions);

The default value is 3, but you may reduce it to 1, for example, in case you know for sure that you will never want to look at older values.
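
For example, a family that should only ever retain the latest value of each cell could be sketched like this:

HColumnDescriptor coldef = new HColumnDescriptor(Bytes.toBytes("colfam1"));
coldef.setMaxVersions(1);  // keep only the most recent version of each value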

Compression

HBase has pluggable compression algorithm support (you can find more on this topic in Compression) that allows you to choose the best compression—or none—for the data stored in a particular column family. The possible algorithms are listed in Table 5-1.

Table 5-1. Supported compression algorithms

Value    Description
NONE     Disables compression (default)
GZ       Uses the Java-supplied or native GZip compression
LZO      Enables LZO compression; must be installed separately
SNAPPY   Enables Snappy compression; binaries must be installed separately

The default value is NONE—in other words, no compression is enabled when you create a column family. Once you deal with the Java API and a column descriptor, you can use these methods to change the value:

Compression.Algorithm getCompression();
Compression.Algorithm getCompressionType();
void setCompressionType(Compression.Algorithm type);
Compression.Algorithm getCompactionCompression();
Compression.Algorithm getCompactionCompressionType();
void setCompactionCompressionType(Compression.Algorithm type);

Note how the value is not a String, but rather a Compression.Algorithm enumeration that exposes the same values as listed in Table 5-1. The constructor of HColumnDescriptor takes the same values as a string, though.

Another observation is that there are two sets of methods, one for the general compression setting and another for the compaction compression setting. Also, each group has a getCompression() and getCompressionType() (or getCompactionCompression() and getCompactionCompressionType(), respectively) returning the same type of value. They are indeed redundant, and you can use either to retrieve the current compression algorithm type.[66]
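
As a brief sketch, enabling GZip compression for an example family looks like this:

HColumnDescriptor coldef = new HColumnDescriptor(Bytes.toBytes("colfam1"));
// use the GZip codec for the store files of this family
coldef.setCompressionType(Compression.Algorithm.GZ);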

We will look into this topic in much greater detail in Compression.

Block size

All stored files in HBase are divided into smaller blocks that are loaded during a get or scan operation, analogous to pages in RDBMSes. The default is set to 64 KB and can be adjusted with these methods:

synchronized int getBlocksize();
void setBlocksize(int s);

The value is specified in bytes and can be used to control how much data HBase is required to read from the storage files during retrieval as well as what is cached in memory for subsequent accesses. How this can be used to fine-tune your setup can be found in Configuration.

Note

There is an important distinction between the column family block size, or HFile block size, and the block size specified on the HDFS level. Hadoop, and HDFS specifically, uses a block size of—by default—64 MB to split up large files for distributed, parallel processing using the MapReduce framework. For HBase the HFile block size is—again by default—64 KB, or 1/1,024th of the HDFS block size. The storage files used by HBase use this much more fine-grained size to efficiently load and cache data in block operations. It is independent of the HDFS block size and only used internally. See Storage for more details, especially Figure 8-3, which shows the two different block types.

Block cache

As HBase reads entire blocks of data for efficient I/O usage, it retains these blocks in an in-memory cache so that subsequent reads do not need any disk operation. The default of true enables the block cache for every read operation. But if your use case only ever performs sequential reads on a particular column family, it is advisable to stop it from polluting the block cache by setting the block cache-enabled flag to false. Here is how the API can be used to change this flag:

boolean isBlockCacheEnabled();
void setBlockCacheEnabled(boolean blockCacheEnabled);

There are other options you can use to influence how the block cache is used, for example, during a scan operation, where the client API lets you disable block caching per scan. This is useful during full table scans so that you do not cause a major churn on the cache.
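
As a combined sketch of the two settings, the following doubles the block size and disables the cache for a family that is assumed to be read only sequentially; both values are example choices:

HColumnDescriptor coldef = new HColumnDescriptor(Bytes.toBytes("colfam1"));
coldef.setBlocksize(128 * 1024);     // 128 KB blocks instead of the 64 KB default
coldef.setBlockCacheEnabled(false);  // sequential reads only, spare the cache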

Time-to-live

HBase supports predicate deletions on the number of versions kept for each value, but also on specific times. The time-to-live (or TTL) sets a threshold based on the timestamp of a value, and the internal housekeeping automatically checks whether a value exceeds its TTL. If it does, it is dropped during major compactions.

The API provides the following getter and setter to read and write the TTL:

int getTimeToLive();
void setTimeToLive(int timeToLive);

The value is specified in seconds and is, by default, set to Integer.MAX_VALUE, or 2,147,483,647 seconds. The default value is treated as the special case of keeping the values forever; any positive value less than the default enables this feature.

In-memory

We mentioned the block cache and how HBase is using it to keep entire blocks of data in memory for efficient sequential access to values. The in-memory flag defaults to false but can be modified with these methods:

boolean isInMemory();
void setInMemory(boolean inMemory);

Setting it to true is not a guarantee that all blocks of a family are loaded into memory nor that they stay there. Think of it as a promise, or elevated priority, to keep them in memory as soon as they are loaded during a normal retrieval operation, and until the pressure on the heap (the memory available to the Java-based server processes) is too high, at which time they need to be discarded by force.

In general, this setting is good for small column families with few values, such as the passwords of a user table, so that logins can be processed very fast.
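
A short sketch combining the TTL and the in-memory flag; the 30-day retention is an arbitrary example:

HColumnDescriptor coldef = new HColumnDescriptor(Bytes.toBytes("colfam1"));
coldef.setTimeToLive(60 * 60 * 24 * 30);  // drop values older than ~30 days
coldef.setInMemory(true);                 // elevated priority in the block cache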

Bloom filter

An advanced feature available in HBase is Bloom filters,[67] allowing you to improve lookup times given you have a specific access pattern (see Bloom Filters for details). Since they add overhead in terms of storage and memory, they are turned off by default. Table 5-2 shows the possible options.

Table 5-2. Supported Bloom filter types

Type     Description
NONE     Disables the filter (default)
ROW      Use the row key for the filter
ROWCOL   Use the row key and column key (family+qualifier) for the filter

Because there are many more columns than rows (unless you only have a single column in each row), the last option, ROWCOL, requires the largest amount of space. It is more fine-grained, though, since it knows about each row/column combination, as opposed to just rows.

The Bloom filter can be changed and retrieved with these calls:

StoreFile.BloomType getBloomFilterType();
void setBloomFilterType(StoreFile.BloomType bt);

As with the compression value, these methods take a StoreFile.BloomType type, while the constructor for the column descriptor lets you specify the aforementioned types as a string. The letter casing is not important, so you can, for example, use “row”. Bloom Filters has more on the Bloom filters and how to use them best.
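
For example, a row/column filter could be enabled on a family like so:

HColumnDescriptor coldef = new HColumnDescriptor(Bytes.toBytes("colfam1"));
// track row/column combinations in the Bloom filter
coldef.setBloomFilterType(StoreFile.BloomType.ROWCOL);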

Replication scope

Another more advanced feature coming with HBase is replication. It enables you to have multiple clusters that ship local updates across the network so that they are applied to the remote copies.

By default, replication is disabled and the replication scope is set to 0. You can change the scope with these functions:

int getScope();
void setScope(int scope);

The only other supported value (as of this writing) is 1, which enables replication to a remote cluster. There may be more scope values in the future. See Table 5-3 for a list of supported values.

Table 5-3. Supported replication scopes

Scope   Description
0       Local scope, i.e., no replication for this family (default)
1       Global scope, i.e., replicate family to a remote cluster

The full details can be found in Replication.
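
Enabling replication for an example family is then a one-line sketch:

HColumnDescriptor coldef = new HColumnDescriptor(Bytes.toBytes("colfam1"));
coldef.setScope(1);  // global scope: ship edits of this family to the remote cluster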

Finally, the Java class has a helper method to check if a family name is valid:

static byte[] isLegalFamilyName(byte[] b);

Use it in your program to verify that user-provided input conforms to the rules required for the name. It does not return a boolean flag, but throws an IllegalArgumentException when the name is malformed. Otherwise, it returns the given parameter value unchanged. The fully specified constructors shown earlier use this method internally to verify the given name; in that case, you do not need to call the method beforehand.
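
A small sketch of guarding user input with this helper; the familyNameInput string is a hypothetical value from elsewhere in your application:

byte[] familyName = Bytes.toBytes(familyNameInput);
try {
  HColumnDescriptor.isLegalFamilyName(familyName);
  HColumnDescriptor coldef = new HColumnDescriptor(familyName);
  // ... use the descriptor as usual ...
} catch (IllegalArgumentException e) {
  System.err.println("Invalid family name: " + e.getMessage());
}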

HBaseAdmin

Just as with the client API, you also have an API for administrative tasks at your disposal. Compare this to the Data Definition Language (DDL) found in RDBMSes—while the client API is more an analog to the Data Manipulation Language (DML).

It provides operations to create tables with specific column families, check for table existence, alter table and column family definitions, drop tables, and much more. The provided functions can be grouped into related operations; they’re discussed separately on the following pages.

Basic Operations

Before you can use the administrative API, you will have to create an instance of the HBaseAdmin class. The constructor is straightforward:

HBaseAdmin(Configuration conf) 
  throws MasterNotRunningException, ZooKeeperConnectionException

Note

This section omits the fact that most methods may throw either an IOException (or an exception that inherits from it), or an InterruptedException. The former is usually a result of a communication error between your client application and the remote servers. The latter is caused by an event that interrupts a running operation, for example, when the region server executing the command is shut down before being able to complete it.

Handing in an existing configuration instance gives enough details to the API to find the cluster using the ZooKeeper quorum, just like the client API does. Use the administrative API instance for the operation required and discard it afterward. In other words, you should not hold on to the instance for too long.

Note

The HBaseAdmin instances should be short-lived as they do not, for example, handle master failover gracefully right now.

The class implements the Abortable interface, adding the following call to it:

void abort(String why, Throwable e)

This method is called by the framework implicitly—for example, when there is a fatal connectivity issue and the API should be stopped. You should not call it directly, but rely on the system taking care of invoking it in case of dire emergencies that require a complete shutdown—and possible restart—of the API instance.

You can get access to the remote master using:

HMasterInterface getMaster()
  throws MasterNotRunningException, ZooKeeperConnectionException

This will return an RPC proxy instance of HMasterInterface, allowing you to communicate directly with the master server. This is usually not required, because the HBaseAdmin class provides a convenient wrapper for all calls exposed by the master interface.

Note

Do not use the HMasterInterface returned by getMaster() directly, unless you are sure what you are doing. The wrapper functions in HBaseAdmin perform additional work—for example, checking that the input parameters are valid, converting remote exceptions to client exceptions, or adding the ability to run inherently asynchronous operations as if they were synchronous.

In addition, the HBaseAdmin class also exports these basic calls:

boolean isMasterRunning()

Checks if the master server is running. You may use it from your client application to verify that you can communicate with the master, before instantiating the HBaseAdmin class.

HConnection getConnection()

Returns a connection instance. See Connection Handling for details on the returned class type.

Configuration getConfiguration()

Gives you access to the configuration that was used to create the current HBaseAdmin instance. You can use it to modify the configuration for a running administrative API instance.

close()

Closes all resources kept by the current HBaseAdmin instance. This includes the connection to the remote servers.
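
Putting the basic calls together, a typical short-lived use of the class could be sketched like this:

Configuration conf = HBaseConfiguration.create();
HBaseAdmin admin = new HBaseAdmin(conf);
try {
  System.out.println("Master running: " + admin.isMasterRunning());
  // ... perform the required administrative operations ...
} finally {
  admin.close();  // release all resources, keeping the instance short-lived
}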

Table Operations

After the first set of basic operations, there is a group of calls related to HBase tables. These calls help when working with the tables themselves, not the actual schemas inside. The commands addressing the latter are discussed in Schema Operations.

Before you can do anything with HBase, you need to create tables. Here is the set of functions to do so:

void createTable(HTableDescriptor desc)
void createTable(HTableDescriptor desc, byte[] startKey,
  byte[] endKey, int numRegions)
void createTable(HTableDescriptor desc, byte[][] splitKeys)
 
void createTableAsync(HTableDescriptor desc, byte[][] splitKeys)

All of these calls must be given an instance of HTableDescriptor, as described in detail in Tables. It holds the details of the table to be created, including the column families. Example 5-1 uses the simple variant of createTable() that just takes the table descriptor.

Example 5-1. Using the administrative API to create a table
    Configuration conf = HBaseConfiguration.create();

    HBaseAdmin admin = new HBaseAdmin(conf); 1

    HTableDescriptor desc = new HTableDescriptor( 2
      Bytes.toBytes("testtable"));

    HColumnDescriptor coldef = new HColumnDescriptor( 3
      Bytes.toBytes("colfam1"));
    desc.addFamily(coldef);

    admin.createTable(desc); 4

    boolean avail = admin.isTableAvailable(Bytes.toBytes("testtable")); 5
    System.out.println("Table available: " + avail);
1. Create an administrative API instance.
2. Create the table descriptor instance.
3. Create a column family descriptor and add it to the table descriptor.
4. Call the createTable() method to do the actual work.
5. Check if the table is available.

The other createTable() versions have an additional—yet more advanced—feature set: they allow you to create tables that are already populated with specific regions. The code in Example 5-2 uses both possible ways to specify your own set of region boundaries.

Example 5-2. Using the administrative API to create a table with predefined regions
  private static void printTableRegions(String tableName) throws IOException { 1
    System.out.println("Printing regions of table: " + tableName);
    HTable table = new HTable(Bytes.toBytes(tableName));
    Pair<byte[][], byte[][]> pair = table.getStartEndKeys(); 2
    for (int n = 0; n < pair.getFirst().length; n++) {
      byte[] sk = pair.getFirst()[n];
      byte[] ek = pair.getSecond()[n];
      System.out.println("[" + (n + 1) + "]" +
        " start key: " +
        (sk.length == 8 ? Bytes.toLong(sk) : Bytes.toStringBinary(sk)) + 3
        ", end key: " +
        (ek.length == 8 ? Bytes.toLong(ek) : Bytes.toStringBinary(ek)));
    }
  }
  public static void main(String[] args) throws IOException, InterruptedException {
    Configuration conf = HBaseConfiguration.create();
    HBaseAdmin admin = new HBaseAdmin(conf);

    HTableDescriptor desc = new HTableDescriptor(
      Bytes.toBytes("testtable1"));
    HColumnDescriptor coldef = new HColumnDescriptor(
      Bytes.toBytes("colfam1"));
    desc.addFamily(coldef);

    admin.createTable(desc, Bytes.toBytes(1L), Bytes.toBytes(100L), 10); 4
    printTableRegions("testtable1");

    byte[][] regions = new byte[][] { 5
      Bytes.toBytes("A"),
      Bytes.toBytes("D"),
      Bytes.toBytes("G"),
      Bytes.toBytes("K"),
      Bytes.toBytes("O"),
      Bytes.toBytes("T")
    };
    desc.setName(Bytes.toBytes("testtable2"));
    admin.createTable(desc, regions); 6
    printTableRegions("testtable2");
  }
1. Helper method to print the regions of a table.
2. Retrieve the start and end keys from the newly created table.
3. Print the key, but guarding against the empty start (and end) key.
4. Call the createTable() method while also specifying the region boundaries.
5. Manually create region split keys.
6. Call the createTable() method again, with a new table name and the list of region split keys.

Running the example should yield the following output on the console:

Printing regions of table: testtable1
[1] start key: , end key: 1
[2] start key: 1, end key: 13
[3] start key: 13, end key: 25
[4] start key: 25, end key: 37
[5] start key: 37, end key: 49
[6] start key: 49, end key: 61
[7] start key: 61, end key: 73
[8] start key: 73, end key: 85
[9] start key: 85, end key: 100
[10] start key: 100, end key: 
Printing regions of table: testtable2
[1] start key: , end key: A
[2] start key: A, end key: D
[3] start key: D, end key: G
[4] start key: G, end key: K
[5] start key: K, end key: O
[6] start key: O, end key: T
[7] start key: T, end key:

The example uses a method of the HTable class that you saw earlier, getStartEndKeys(), to retrieve the region boundaries. The first start and the last end keys are empty, as is customary with HBase regions. In between are either the computed or the provided split keys. Note how the end key of a region is also the start key of the subsequent one—it is exclusive for the former, and inclusive for the latter.

The createTable(HTableDescriptor desc, byte[] startKey, byte[] endKey, int numRegions) call takes a start and end key, which is interpreted as numbers. You must provide a start value that is less than the end value, and a numRegions of at least 3; otherwise, the call will return with an exception. This is to ensure that you end up with at least a minimum set of regions.

The start and end key values are subtracted and divided by the given number of regions to compute the region boundaries. In the example, you can see how we end up with the correct number of regions, while the computed keys are filling in the range.

The createTable(HTableDescriptor desc, byte[][] splitKeys) method used in the second part of the example, on the other hand, expects an already set array of split keys: they form the start and end keys of the regions created. The output of the example demonstrates this as expected.

Note

The createTable() calls are, in fact, related. The createTable(HTableDescriptor desc, byte[] startKey, byte[] endKey, int numRegions) method calculates the region keys implicitly for you, using the Bytes.split() method to compute the boundaries from the given parameters. It then proceeds to call createTable(HTableDescriptor desc, byte[][] splitKeys), which does the actual table creation.

Finally, there is the createTableAsync(HTableDescriptor desc, byte[][] splitKeys) method that takes the table descriptor and region keys to asynchronously perform the same task as the createTable() call.

Note

Most of the table-related administrative API functions are asynchronous in nature, which is useful, as you can send off a command and not have to deal with waiting for a result. For a client application, though, it is often necessary to know if a command has succeeded before moving on with other operations. For that, the calls are provided in asynchronous—using the Async postfix—and synchronous versions.

In fact, the synchronous commands are simply a wrapper around the asynchronous ones, adding a loop at the end of the call to repeatedly check for the command to have done its task. The createTable() method, for example, wraps the createTableAsync() method, while adding a loop that waits for the table to be created on the remote servers before yielding control back to the caller.
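
The following sketch mimics that pattern in client code, using the asynchronous call directly and polling for availability; the split keys and the sleep interval are arbitrary, and desc is assumed to be the descriptor of the new table:

byte[][] splits = { Bytes.toBytes("A"), Bytes.toBytes("M") };
admin.createTableAsync(desc, splits);
while (!admin.isTableAvailable(desc.getName())) {
  Thread.sleep(100);  // check again shortly, until all regions are online
}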

Once you have created a table, you can use the following helper functions to retrieve the list of tables, retrieve the descriptor for an existing table, or check if a table exists:

boolean tableExists(String tableName)
boolean tableExists(byte[] tableName)
HTableDescriptor[] listTables()
HTableDescriptor getTableDescriptor(byte[] tableName)

Example 5-1 uses the tableExists() method to check if the previous command to create the table has succeeded. The listTables() call returns a list of HTableDescriptor instances, one for every table that HBase knows about, while the getTableDescriptor() method returns the descriptor for a specific table. Example 5-3 uses both to show what is returned by the administrative API.

Example 5-3. Listing the existing tables and their descriptors
    HBaseAdmin admin = new HBaseAdmin(conf);

    HTableDescriptor[] htds = admin.listTables();
    for (HTableDescriptor htd : htds) {
      System.out.println(htd);
    }

    HTableDescriptor htd1 = admin.getTableDescriptor(
      Bytes.toBytes("testtable1"));
    System.out.println(htd1);

    HTableDescriptor htd2 = admin.getTableDescriptor(
      Bytes.toBytes("testtable10"));
    System.out.println(htd2);

The console output is quite long, since every table descriptor is printed, including every possible property. Here is an abbreviated version:

Printing all tables...
{NAME => 'testtable1', FAMILIES => [{NAME => 'colfam1', BLOOMFILTER => 
'NONE', REPLICATION_SCOPE => '0', COMPRESSION => 'NONE', VERSIONS => '3', 
TTL => '2147483647', BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE
=> 'true'}, {NAME => 'colfam2', BLOOMFILTER => 'NONE', REPLICATION_SCOPE
=> '0', COMPRESSION => 'NONE', VERSIONS => '3', TTL => '2147483647', 
BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}, {NAME =>
'colfam3', BLOOMFILTER => 'NONE', REPLICATION_SCOPE => '0', COMPRESSION => 
'NONE', VERSIONS => '3', TTL => '2147483647', BLOCKSIZE => '65536', 
IN_MEMORY => 'false', BLOCKCACHE => 'true'}]}
...
Exception org.apache.hadoop.hbase.TableNotFoundException: testtable10
  ...
    at ListTablesExample.main(ListTablesExample.java)

The interesting part is the exception you should see being printed as well. The example uses a nonexistent table name to showcase the fact that you must use existing table names—or wrap the call into a try/catch guard, handling the exception more gracefully.

Once you have created a table, you will sooner or later want to delete it again. The HBaseAdmin calls to do so are:

void deleteTable(String tableName)
void deleteTable(byte[] tableName)

Hand in a table name as a String, or a byte array, and the rest is taken care of: the table is removed from the servers, and all data deleted.

But before you can delete a table, you need to ensure that it is first disabled, using the following methods:

void disableTable(String tableName)
void disableTable(byte[] tableName)
void disableTableAsync(String tableName)
void disableTableAsync(byte[] tableName)

Disabling the table first tells every region server to flush any uncommitted changes to disk, close all the regions, and update the .META. table to reflect that no region of this table is deployed to any servers.

The choices are again between doing this asynchronously, or synchronously, and supplying the table name in various formats for convenience.

Note

Disabling a table can potentially take a very long time, up to several minutes. This depends on how much data is residing in the servers’ memory and not yet persisted to disk. Undeploying a region requires all the data to be written to disk first, and if you have a large heap value set for the servers, this may result in megabytes, if not gigabytes, of data being saved. In a heavily loaded system this could contend with other processes writing to disk, and therefore require time to complete.

Once a table has been disabled, but not deleted, you can enable it again:

void enableTable(String tableName)
void enableTable(byte[] tableName)
void enableTableAsync(String tableName)
void enableTableAsync(byte[] tableName)

This call—again available in the usual flavors—reverses the disable operation by deploying the regions of the given table to the active region servers. Finally, there is a set of calls to check on the status of a table:

boolean isTableEnabled(String tableName)
boolean isTableEnabled(byte[] tableName)
boolean isTableDisabled(String tableName)
boolean isTableDisabled(byte[] tableName)
boolean isTableAvailable(byte[] tableName)
boolean isTableAvailable(String tableName)

Example 5-4 uses various combinations of the preceding calls to create, delete, disable, and check the state of a table.

Example 5-4. Using the various calls to disable, enable, and check the status of a table
    HBaseAdmin admin = new HBaseAdmin(conf);

    HTableDescriptor desc = new HTableDescriptor(
      Bytes.toBytes("testtable"));
    HColumnDescriptor coldef = new HColumnDescriptor(
      Bytes.toBytes("colfam1"));
    desc.addFamily(coldef);
    admin.createTable(desc);

    try {
      admin.deleteTable(Bytes.toBytes("testtable"));
    } catch (IOException e) {
      System.err.println("Error deleting table: " + e.getMessage());
    }

    admin.disableTable(Bytes.toBytes("testtable"));
    boolean isDisabled = admin.isTableDisabled(Bytes.toBytes("testtable"));
    System.out.println("Table is disabled: " + isDisabled);

    boolean avail1 = admin.isTableAvailable(Bytes.toBytes("testtable"));
    System.out.println("Table available: " + avail1);

    admin.deleteTable(Bytes.toBytes("testtable"));

    boolean avail2 = admin.isTableAvailable(Bytes.toBytes("testtable"));
    System.out.println("Table available: " + avail2);

    admin.createTable(desc);
    boolean isEnabled = admin.isTableEnabled(Bytes.toBytes("testtable"));
    System.out.println("Table is enabled: " + isEnabled);

The output on the console should look like this (the exception printout was abbreviated, for the sake of brevity):

Creating table...
Deleting enabled table...
Error deleting table: 
  org.apache.hadoop.hbase.TableNotDisabledException: testtable
  ...
Disabling table...
Table is disabled: true
Table available: true
Deleting disabled table...
Table available: false
Creating table again...
Table is enabled: true

The error thrown when trying to delete an enabled table shows that you must either disable it first, or handle the exception gracefully in case that is what your client application requires. You could also prompt the user to disable the table explicitly and retry the operation.

Also note how isTableAvailable() returns true even when the table is disabled. In other words, this method checks if the table is physically present, no matter what its state is. Use the other two functions, isTableEnabled() and isTableDisabled(), to check the state of the table.

After creating a table with a specific schema, you must either delete it and re-create it to change the details, or use the following method to alter its structure:

void modifyTable(byte[] tableName, HTableDescriptor htd)

As with the aforementioned deleteTable() commands, you must first disable the table to be able to modify it. Example 5-5 creates a table, and subsequently modifies it.

Example 5-5. Modifying the structure of an existing table
    byte[] name = Bytes.toBytes("testtable");
    HBaseAdmin admin = new HBaseAdmin(conf);
    HTableDescriptor desc = new HTableDescriptor(name);
    HColumnDescriptor coldef1 = new HColumnDescriptor(
      Bytes.toBytes("colfam1"));
    desc.addFamily(coldef1);

    admin.createTable(desc); 1

    HTableDescriptor htd1 = admin.getTableDescriptor(name); 2
    HColumnDescriptor coldef2 = new HColumnDescriptor(
      Bytes.toBytes("colfam2"));
    htd1.addFamily(coldef2);
    htd1.setMaxFileSize(1024 * 1024 * 1024L);

    admin.disableTable(name);
    admin.modifyTable(name, htd1); 3
    admin.enableTable(name);

    HTableDescriptor htd2 = admin.getTableDescriptor(name);
    System.out.println("Equals: " + htd1.equals(htd2)); 4
    System.out.println("New schema: " + htd2);
1. Create the table with the original structure.
2. Get the schema, and update it by adding a new family and changing the maximum file size property.
3. Disable, modify, and enable the table.
4. Check if the table schema matches the new one created locally.

The output shows that both the schema modified in the client code and the final schema retrieved from the server after the modification are consistent:

Equals: true
New schema: {NAME => 'testtable', MAX_FILESIZE => '1073741824', FAMILIES => 
[{NAME => 'colfam1', BLOOMFILTER => 'NONE', REPLICATION_SCOPE => '0', 
COMPRESSION => 'NONE', VERSIONS => '3', TTL => '2147483647', BLOCKSIZE => 
'65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}, {NAME => 'colfam2', 
BLOOMFILTER => 'NONE', REPLICATION_SCOPE => '0', COMPRESSION => 'NONE', 
VERSIONS => '3', TTL => '2147483647', BLOCKSIZE => '65536', IN_MEMORY => 
'false', BLOCKCACHE => 'true'}]}

Calling the equals() method on the HTableDescriptor class compares the current with the specified instance and returns true if they match in all properties, including the contained column families and their respective settings.

Note

The modifyTable() call is asynchronous, and there is no synchronous variant. If you want to make sure that changes have been propagated to all the servers and applied accordingly, you should use the getTableDescriptor() call and loop over it in your client code until the schema you sent matches up with the remote schema.
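
A sketch of such a loop; it assumes desc holds the schema you sent, and tableName the name of the table as a byte array:

admin.modifyTable(tableName, desc);
HTableDescriptor current = admin.getTableDescriptor(tableName);
while (!desc.equals(current)) {
  Thread.sleep(100);  // wait for the servers to apply the new schema
  current = admin.getTableDescriptor(tableName);
}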

Schema Operations

Besides using the modifyTable() call, there are dedicated methods provided by the HBaseAdmin class to modify specific aspects of the current table schema. As usual, you need to make sure the table to be modified is disabled first.

The whole set of column-related methods is as follows:

void addColumn(String tableName, HColumnDescriptor column)
void addColumn(byte[] tableName, HColumnDescriptor column) 
void deleteColumn(String tableName, String columnName) 
void deleteColumn(byte[] tableName, byte[] columnName) 
void modifyColumn(String tableName, HColumnDescriptor descriptor)
void modifyColumn(byte[] tableName, HColumnDescriptor descriptor)

You can add, delete, and modify columns. Adding or modifying a column requires that you first prepare an HColumnDescriptor instance, as described in detail in Column Families. Alternatively, you could use the getTableDescriptor() call to retrieve the current table schema, and subsequently invoke getColumnFamilies() on the returned HTableDescriptor instance to retrieve the existing columns.

Otherwise, you supply the table name—and optionally the column name for the delete calls—in one of the common format variations to eventually invoke the method of choice. All of these calls are asynchronous, so as mentioned before, caveat emptor.
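
For illustration, here is a minimal sketch of adding a new family to an assumed table testtable; remember that the table has to be disabled first:

admin.disableTable("testtable");
admin.addColumn("testtable", new HColumnDescriptor(Bytes.toBytes("colfam2")));
// addColumn() is asynchronous: verify the change before relying on it
admin.enableTable("testtable");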

Cluster Operations

The last group of operations the HBaseAdmin class exposes is related to cluster operations. They allow you to check the status of the cluster, and perform tasks on tables and/or regions. The Region Life Cycle has the details on regions and their life cycle.

Warning

Many of the following operations are for advanced users, so please handle with care.

static void checkHBaseAvailable(Configuration conf)
ClusterStatus getClusterStatus()

You can use checkHBaseAvailable() to verify that your client application can communicate with the remote HBase cluster, as specified in the given configuration file. If it fails to do so, an exception is thrown—in other words, this method does not return a boolean flag, but either silently succeeds, or throws said error.

The getClusterStatus() call allows you to retrieve an instance of the ClusterStatus class, containing detailed information about the cluster status. See Cluster Status Information for what you are provided with.

void closeRegion(String regionname, String hostAndPort)
void closeRegion(byte[] regionname, String hostAndPort)

Use these calls to close regions that have previously been deployed to region servers. Any enabled table has all of its regions deployed, so you could actively close and undeploy one of them.

You need to supply the exact regionname as stored in the .META. table. Further, you may optionally supply the hostAndPort parameter, which overrides the server assignment found in .META.

Using this close call bypasses any master notification; that is, the region is closed directly by the region server, unseen by the master node.

void flush(String tableNameOrRegionName)
void flush(byte[] tableNameOrRegionName)

As updates to a region (and the table in general) accumulate, the MemStore instances of the region servers fill with unflushed modifications. A client application can use these synchronous methods to flush such pending records to disk, before they are implicitly written by hitting the memstore flush size (see Table Properties) at a later time.

The method takes either a region name or a table name. The value provided by your code is tested to see if it matches an existing table; if it does, it is assumed to be a table name, otherwise it is treated as a region name. If you specify neither a proper table nor a region name, an UnknownRegionException is thrown.

void compact(String tableNameOrRegionName)
void compact(byte[] tableNameOrRegionName)

Similar to the preceding operations, you must give either a table or a region name. The call itself is asynchronous, as compactions can potentially take a long time to complete. Invoking this method queues the table, or region, for compaction, which is executed in the background by the server hosting the named region, or by all servers hosting any region of the given table (see Auto-Sharding for details on compactions).

void majorCompact(String tableNameOrRegionName)
void majorCompact(byte[] tableNameOrRegionName)

These are the same as the compact() calls, but they queue the region, or table, for a major compaction instead. In case a table name is given, the administrative API iterates over all regions of the table and invokes the compaction call implicitly for each of them.
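
As a usage sketch, the following flushes the pending updates of an example table and then queues a major compaction for it:

admin.flush("testtable");         // synchronously persist the memstore data
admin.majorCompact("testtable");  // asynchronously queue a major compaction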

void split(String tableNameOrRegionName)
void split(byte[] tableNameOrRegionName)
void split(String tableNameOrRegionName, String splitPoint)
void split(byte[] tableNameOrRegionName, byte[] splitPoint)

Using these calls allows you to split a specific region, or table. In case a table name is given, it iterates over all regions of that table and implicitly invokes the split command on each of them.

A noted exception to this rule is when the splitPoint parameter is given. In that case, the split() command will try to split the given region at the provided row key. If a table name is specified instead, all regions are checked and the one containing the splitPoint is split at the given key.

The splitPoint must be a valid row key, and—in case you specify a region name—be part of the region to be split. It also must be greater than the region’s start key, since splitting a region at its start key would make no sense. If you fail to give the correct row key, the split request is ignored without reporting back to the client. The region server currently hosting the region will log this locally with the following message:

Split row is not inside region key range or is equal to startkey:
<split row>
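
For example, a split at a specific row key could be requested as in this sketch; the table name and key are hypothetical:

// split the region of testtable that contains "row-0500" at exactly that key
admin.split("testtable", "row-0500");
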
void assign(byte[] regionName, boolean force)
void unassign(byte[] regionName, boolean force)

When a client requires a region to be deployed or undeployed from the region servers, it can invoke these calls. The first would assign a region, based on the overall assignment plan, while the second would unassign the given region.

The force parameter set to true has different meanings for each of the calls: first, for assign(), it forces the region to be marked as unassigned in ZooKeeper before continuing in its attempt to assign the region to a new region server. Be careful when using this on already-assigned regions.

Second, for unassign(), it means that a region already marked to be unassigned—for example, from a previous call to unassign()—is forced to be unassigned again. If force were set to false, this would have no effect.

void move(byte[] encodedRegionName, byte[] destServerName)

Using the move() call enables a client to actively control which server is hosting what regions. You can move a region from its current region server to a new one. The destServerName parameter can be set to null to pick a new server at random; otherwise, it must be a valid server name, running a region server process. If the server name is wrong, or currently not responding, the region is deployed to a different server instead. In a worst-case scenario, the move could fail and leave the region unassigned.

boolean balanceSwitch(boolean b)
boolean balancer()

The first method allows you to switch the region balancer on or off. When the balancer is enabled, a call to balancer() will start the process of moving regions from the servers with more deployed regions to those with fewer deployed regions. Load Balancing explains how this works in detail.
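
A minimal sketch of triggering one balancing run:

admin.balanceSwitch(true);  // make sure the balancer is switched on
admin.balancer();           // start moving regions between the servers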

void shutdown()
void stopMaster()
void stopRegionServer(String hostnamePort)

These calls either shut down the entire cluster, stop the master server, or stop a particular region server only. Once invoked, the affected servers will be stopped, that is, there is no delay nor a way to revert the process.

Chapters 8 and 11 have more information on these advanced—yet very powerful—features. Use with utmost care!

Cluster Status Information

When you query the cluster status using the HBaseAdmin.getClusterStatus() call, you will be given a ClusterStatus instance, containing all the information the master server has about the current state of the cluster. Note that this class also has setters—methods starting with set, allowing you to modify the information they contain—but since you will be given a copy of the current state, it is impractical to call the setters, unless you want to modify your local-only copy.

Table 5-4 lists the methods of the ClusterStatus class.

Table 5-4. Quick overview of the information provided by the ClusterStatus class

int getServersSize()
    The number of region servers currently live as known to the master server. The number does not include the number of dead servers.

Collection<ServerName> getServers()
    The list of live servers. The names in the collection are ServerName instances, which contain the hostname, RPC port, and start code.

int getDeadServers()
    The number of servers listed as dead. This does not contain the live servers.

Collection<ServerName> getDeadServerNames()
    A list of all server names currently considered dead. The names in the collection are ServerName instances, which contain the hostname, RPC port, and start code.

double getAverageLoad()
    The total average number of regions per region server. This is currently the same as getRegionsCount()/getServers().

int getRegionsCount()
    The total number of regions in the cluster.

int getRequestsCount()
    The current number of requests across all region servers in the cluster.

String getHBaseVersion()
    Returns the HBase version identification string.

byte getVersion()
    Returns the version of the ClusterStatus instance. This is used during the serialization process of sending an instance over RPC.

String getClusterId()
    Returns the unique identifier for the cluster. This is a UUID generated when HBase starts with an empty storage directory. It is stored in hbase.id under the root directory.

Map<String, RegionState> getRegionsInTransition()
    Gives you access to a map of all regions currently in transition, e.g., being moved, assigned, or unassigned. The key of the map is the encoded region name (as returned by HRegionInfo.getEncodedName(), for example), while the value is an instance of RegionState.[a]

HServerLoad getLoad(ServerName sn)
    Retrieves the status information available for the given server name.

[a] See The Region Life Cycle for the details.

Accessing the overall cluster status gives you a high-level view of what is going on with your servers—as a whole. Using the getServers() collection, and the returned ServerName instances, lets you drill further into each actual live server, and see what it is doing currently. Table 5-5 lists the available methods.

Table 5-5. Quick overview of the information provided by the ServerName class

String getHostname()
    Returns the hostname of the server. This might resolve to the IP address, when the hostname cannot be looked up.

String getHostAndPort()
    Concatenates the hostname and RPC port, divided by a colon: <hostname>:<rpc-port>.

long getStartcode()
    The start code is the epoch time in milliseconds of when the server was started, as returned by System.currentTimeMillis().

String getServerName()
    The server name, consisting of <hostname>,<rpc-port>,<start-code>.

int getPort()
    Specifies the port used by the server for the RPCs.

Each server also exposes details about its load, by offering an HServerLoad instance, returned by the getLoad() method of the ClusterStatus instance. Using the aforementioned ServerName, as returned by the getServers() call, you can iterate over all live servers and retrieve their current details. The HServerLoad class gives you access to not just the load of the server itself, but also for each hosted region. Table 5-6 lists the provided methods.

Table 5-6. Quick overview of the information provided by the HServerLoad class

byte getVersion()
    Returns the version of the HServerLoad instance. This is used during the serialization process of sending an instance over RPC.

int getLoad()
    Currently returns the same value as getNumberOfRegions().

int getNumberOfRegions()
    The number of regions on the current server.

int getNumberOfRequests()
    Returns the number of requests accumulated within the last hbase.regionserver.msginterval time frame. It is reset at the end of this time frame, and counts all API requests, such as gets, puts, increments, deletes, and so on.

int getUsedHeapMB()
    The currently used Java Runtime heap size in megabytes.

int getMaxHeapMB()
    The configured maximum Java Runtime heap size in megabytes.

int getStorefiles()
    The number of store files in use by the server. This is across all regions it hosts.

int getStorefileSizeInMB()
    The total size in megabytes of the used store files.

int getStorefileIndexSizeInMB()
    The total size in megabytes of the indexes—the block and meta index, to be precise—across all store files in use by this server.

int getMemStoreSizeInMB()
    The total size of the in-memory stores, across all regions hosted by this server.

Map<byte[], RegionLoad> getRegionsLoad()
    Returns a map containing the load details for each hosted region of the current server. The key is the region name and the value an instance of the RegionLoad class, discussed next.

Finally, there is a dedicated class for the region load, aptly named RegionLoad. See Table 5-7 for the list of provided information.

Table 5-7. Quick overview of the information provided by the RegionLoad class

byte[] getName()
    The region name in its raw byte[] form.

String getNameAsString()
    Converts the raw region name into a String for convenience.

int getStores()
    The number of stores in this region.

int getStorefiles()
    The number of store files, across all stores of this region.

int getStorefileSizeMB()
    The size in megabytes of the store files for this region.

int getStorefileIndexSizeMB()
    The size of the indexes for all store files, in megabytes, for this region.

int getMemStoreSizeMB()
    The heap size in megabytes as used by the MemStore of the current region.

long getRequestsCount()
    The number of requests for the current region.

long getReadRequestsCount()
    The number of read requests for this region, since it was deployed to the region server. This counter is not reset.

long getWriteRequestsCount()
    The number of write requests for this region, since it was deployed to the region server. This counter is not reset.

Example 5-6 shows all of the getters in action.

Example 5-6. Reporting the status of a cluster
    HBaseAdmin admin = new HBaseAdmin(conf);

    ClusterStatus status = admin.getClusterStatus(); 1

    System.out.println("Cluster Status:
--------------");
    System.out.println("HBase Version: " + status.getHBaseVersion());
    System.out.println("Version: " + status.getVersion());
    System.out.println("No. Live Servers: " + status.getServersSize());
    System.out.println("Cluster ID: " + status.getClusterId());
    System.out.println("Servers: " + status.getServers());
    System.out.println("No. Dead Servers: " + status.getDeadServers());
    System.out.println("Dead Servers: " + status.getDeadServerNames());
    System.out.println("No. Regions: " + status.getRegionsCount());
    System.out.println("Regions in Transition: " +
      status.getRegionsInTransition());
    System.out.println("No. Requests: " + status.getRequestsCount());
    System.out.println("Avg Load: " + status.getAverageLoad());

    System.out.println("\nServer Info:\n--------------");
    for (ServerName server : status.getServers()) { // 2
      System.out.println("Hostname: " + server.getHostname());
      System.out.println("Host and Port: " + server.getHostAndPort());
      System.out.println("Server Name: " + server.getServerName());
      System.out.println("RPC Port: " + server.getPort());
      System.out.println("Start Code: " + server.getStartcode());

      HServerLoad load = status.getLoad(server); // 3

      System.out.println("\nServer Load:\n--------------");
      System.out.println("Load: " + load.getLoad());
      System.out.println("Max Heap (MB): " + load.getMaxHeapMB());
      System.out.println("Memstore Size (MB): " + load.getMemStoreSizeInMB());
      System.out.println("No. Regions: " + load.getNumberOfRegions());
      System.out.println("No. Requests: " + load.getNumberOfRequests());
      System.out.println("Storefile Index Size (MB): " +
        load.getStorefileIndexSizeInMB());
      System.out.println("No. Storefiles: " + load.getStorefiles());
      System.out.println("Storefile Size (MB): " + load.getStorefileSizeInMB());
      System.out.println("Used Heap (MB): " + load.getUsedHeapMB());

      System.out.println("\nRegion Load:\n--------------");
      for (Map.Entry<byte[], HServerLoad.RegionLoad> entry : // 4
        load.getRegionsLoad().entrySet()) {
        System.out.println("Region: " + Bytes.toStringBinary(entry.getKey()));

        HServerLoad.RegionLoad regionLoad = entry.getValue(); // 5

        System.out.println("Name: " + Bytes.toStringBinary(
          regionLoad.getName()));
        System.out.println("No. Stores: " + regionLoad.getStores());
        System.out.println("No. Storefiles: " + regionLoad.getStorefiles());
        System.out.println("Storefile Size (MB): " +
          regionLoad.getStorefileSizeMB());
        System.out.println("Storefile Index Size (MB): " +
          regionLoad.getStorefileIndexSizeMB());
        System.out.println("Memstore Size (MB): " +
          regionLoad.getMemStoreSizeMB());
        System.out.println("No. Requests: " + regionLoad.getRequestsCount());
        System.out.println("No. Read Requests: " +
          regionLoad.getReadRequestsCount());
        System.out.println("No. Write Requests: " +
          regionLoad.getWriteRequestsCount());
        System.out.println();
      }
    }
1. Get the cluster status.
2. Iterate over the included server instances.
3. Retrieve the load details for the current server.
4. Iterate over the region details of the current server.
5. Get the load details for the current region.

On a standalone setup, after running the earlier examples in the book, you should see something like this:

Cluster Status:
--------------
Avg Load: 12.0
HBase Version: 0.91.0-SNAPSHOT
Version: 2
Servers: [10.0.0.64,60020,1304929650573]
No. Dead Servers: 0
Dead Servers: []
No. Regions: 12
No. Requests: 0

Server Info:
--------------
Hostname: 10.0.0.64
Host and Port: 10.0.0.64:60020
Server Name: 10.0.0.64,60020,1304929650573
RPC Port: 60020
Start Code: 1304929650573

Server Load:
--------------
Load: 12
Max Heap (MB): 987
Memstore Size (MB): 0
No. Regions: 12
No. Requests: 0
Storefile Index Size (MB): 0
No. Storefiles: 3
Storefile Size (MB): 0
Used Heap (MB): 62

Region Load:
--------------
Region: -ROOT-,,0
Name: -ROOT-,,0
No. Stores: 1
No. Storefiles: 1
Storefile Size (MB): 0
Storefile Index Size (MB): 0
Memstore Size (MB): 0
No. Requests: 52
No. Read Requests: 51
No. Write Requests: 1

Region: .META.,,1
Name: .META.,,1
No. Stores: 1
No. Storefiles: 0
Storefile Size (MB): 0
Storefile Index Size (MB): 0
Memstore Size (MB): 0
No. Requests: 4764
No. Read Requests: 4734
No. Write Requests: 30

Region: hush,,1304930393059.1ae3ea168c42fa9c855051c888ed36d4.
Name: hush,,1304930393059.1ae3ea168c42fa9c855051c888ed36d4.
No. Stores: 1
No. Storefiles: 0
Storefile Size (MB): 0
Storefile Index Size (MB): 0
Memstore Size (MB): 0
No. Requests: 20
No. Read Requests: 14
No. Write Requests: 6

Region: ldom,,1304930390882.520fc727a3ce79749bcbbad51e138fff.
Name: ldom,,1304930390882.520fc727a3ce79749bcbbad51e138fff.
No. Stores: 1
No. Storefiles: 0
Storefile Size (MB): 0
Storefile Index Size (MB): 0
Memstore Size (MB): 0
No. Requests: 14
No. Read Requests: 6
No. Write Requests: 8

Region: sdom,,1304930389795.4a49f5ba47e4466d284cea27629c26cc.
Name: sdom,,1304930389795.4a49f5ba47e4466d284cea27629c26cc.
No. Stores: 1
No. Storefiles: 0
Storefile Size (MB): 0
Storefile Index Size (MB): 0
Memstore Size (MB): 0
No. Requests: 8
No. Read Requests: 0
No. Write Requests: 8

Region: surl,,1304930386482.c965c89368951cf97d2339a05bc4bad5.
Name: surl,,1304930386482.c965c89368951cf97d2339a05bc4bad5.
No. Stores: 4
No. Storefiles: 0
Storefile Size (MB): 0
Storefile Index Size (MB): 0
Memstore Size (MB): 0
No. Requests: 1329
No. Read Requests: 1226
No. Write Requests: 103

Region: testtable,,1304930621191.962abda0515c910ed91f7520e71ba101.
Name: testtable,,1304930621191.962abda0515c910ed91f7520e71ba101.
No. Stores: 2
No. Storefiles: 0
Storefile Size (MB): 0
Storefile Index Size (MB): 0
Memstore Size (MB): 0
No. Requests: 29
No. Read Requests: 0
No. Write Requests: 29

Region: testtable,row-030,1304930621191.0535bb40b407321d499d65bab9d3b2d7.
Name: testtable,row-030,1304930621191.0535bb40b407321d499d65bab9d3b2d7.
No. Stores: 2
No. Storefiles: 2
Storefile Size (MB): 0
Storefile Index Size (MB): 0
Memstore Size (MB): 0
No. Requests: 6
No. Read Requests: 6
No. Write Requests: 0

Region: testtable,row-060,1304930621191.81b04004d72bd28cc877cb1514dbab35.
Name: testtable,row-060,1304930621191.81b04004d72bd28cc877cb1514dbab35.
No. Stores: 2
No. Storefiles: 0
Storefile Size (MB): 0
Storefile Index Size (MB): 0
Memstore Size (MB): 0
No. Requests: 41
No. Read Requests: 0
No. Write Requests: 41

Region: url,,1304930387617.a39d16967d51b020bb4dad13a80a1a02.
Name: url,,1304930387617.a39d16967d51b020bb4dad13a80a1a02.
No. Stores: 1
No. Storefiles: 0
Storefile Size (MB): 0
Storefile Index Size (MB): 0
Memstore Size (MB): 0
No. Requests: 11
No. Read Requests: 8
No. Write Requests: 3

Region: user,,1304930388702.60bae27e577a620ae4b59bc830486233.
Name: user,,1304930388702.60bae27e577a620ae4b59bc830486233.
No. Stores: 1
No. Storefiles: 0
Storefile Size (MB): 0
Storefile Index Size (MB): 0
Memstore Size (MB): 0
No. Requests: 11
No. Read Requests: 9
No. Write Requests: 2

Region: user-surl,,1304930391974.71b9cecc9c111a5217bd1a81bde60418.
Name: user-surl,,1304930391974.71b9cecc9c111a5217bd1a81bde60418.
No. Stores: 1
No. Storefiles: 0
Storefile Size (MB): 0
Storefile Index Size (MB): 0
Memstore Size (MB): 0
No. Requests: 24
No. Read Requests: 21
No. Write Requests: 3


[64] See “Database normalization” on Wikipedia.

[65] Getters and setters in Java are methods of a class that expose internal fields in a controlled manner. They are usually named like the field, prefixed with get and set, respectively—for example, getName() and setName().

[66] After all, this is open source and a redundancy like this is often caused by legacy code being carried forward. Please feel free to help clean this up and to contribute back to the HBase project.

[67] See “Bloom filter” on Wikipedia.
