The metadata table contains a row for every tablet in Accumulo. Tablets are uniquely described by the ID of their table and the last row in the range assigned to the tablet, or end row. Table B-1 describes the columns that can appear in a tablet’s row in the metadata table, and Table B-2 shows some sample entries from a real metadata table.
In addition to tablet entries, a section of the metadata table records file deletion entries. Another section tracks files that are in the process of being bulk-imported into Accumulo, so that the garbage collector does not delete these files prematurely. More about file deletion can be found in “Garbage Collector”.
Row | Column family | Column qualifier | Value |
---|---|---|---|
table id ; tablet end row | `file` | regular data file name | size in bytes , number of keys |
table id ; tablet end row | `future` | tablet server session id | tserver IP : port |
table id ; tablet end row | `last` | tablet server session id | tserver IP : port |
table id ; tablet end row | `loc` | tablet server session id | tserver IP : port |
table id ; tablet end row | `log` | server / logfile name | log set \| table id |
table id ; tablet end row | `scan` | file currently being scanned | |
table id ; tablet end row | `srv` | `compact` | compaction id |
table id ; tablet end row | `srv` | `dir` | tablet directory |
table id ; tablet end row | `srv` | `flush` | flush id |
table id ; tablet end row | `srv` | `lock` | zookeeper lock location |
table id ; tablet end row | `srv` | `time` | `M` or `L` followed by timestamp |
table id ; tablet end row | `~tab` | `~pr` | `0x00` , or `0x01` followed by previous tablet’s end row |
The row ID for a tablet contains the table ID and the tablet end row separated by a semicolon.
For the last tablet in a table, there is no end row.
The row for that tablet is the table ID followed by `<`.
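The row ID layout described above can be sketched as a small parser. This is an illustrative helper, not part of Accumulo's API; the function name `parse_tablet_row` is hypothetical, and the sketch assumes that table IDs never contain a semicolon or `<`.

```python
from typing import Optional, Tuple


def parse_tablet_row(row: str) -> Tuple[str, Optional[str]]:
    """Split a tablet row ID into (table_id, end_row).

    end_row is None for the last tablet in a table, whose row ID is
    the table ID followed by '<'.
    Assumption: table IDs contain neither ';' nor '<'.
    """
    # Last tablet: no semicolon, trailing '<'.
    if ";" not in row and row.endswith("<"):
        return row[:-1], None
    # Split on the first semicolon only, because an end row may
    # itself contain semicolons.
    table_id, sep, end_row = row.partition(";")
    if not sep:
        raise ValueError(f"not a tablet row ID: {row!r}")
    return table_id, end_row
```

For example, `parse_tablet_row("2;m")` yields the table ID `"2"` and end row `"m"`, while `parse_tablet_row("2<")` yields `"2"` with no end row.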
Rows starting with `~del` are for deletion entries, and rows starting with `~blip` are for files that are in the process of being bulk loaded.
These entries also contain the name of the file marked for deletion or bulk loading.
There are also entries for problems with loading resources.
If the problem involves the metadata table, the information about the problem is written directly to ZooKeeper, but problems with other tablets are written to the metadata table.
These entries have a row ID beginning with `~err` and also containing the table name.
The column family is either `FILE_READ`, `FILE_WRITE`, or `TABLET_LOAD`, indicating the type of problem, and the column qualifier is the resource name, which is either a filename or a tablet key extent (prev row and end row).
The value contains additional information such as the time the problem occurred, the server, and the exception if available.
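As a rough illustration of the row prefixes described above, the following sketch sorts a metadata row ID into one of the four kinds of entries. The function name `classify_row` and the returned labels are hypothetical; only the prefixes `~del`, `~blip`, and `~err` come from the text.

```python
def classify_row(row: str) -> str:
    """Return the kind of metadata entry a row ID belongs to."""
    prefixes = (
        ("~del", "deletion"),
        ("~blip", "bulk-load-in-progress"),
        ("~err", "problem"),
    )
    for prefix, kind in prefixes:
        if row.startswith(prefix):
            return kind
    # Anything else is treated as a tablet row (table ID plus end row).
    return "tablet"
```

A client scanning the metadata table could use such a check to skip the non-tablet sections when it only cares about tablet entries.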
The `file` column family contains information about a tablet’s files. The column qualifier is the name of the file, and the value holds the file’s size in bytes and its number of keys. Under some conditions these values are estimates. For example, when a tablet is split, each of the two resulting tablets’ file entries is assumed to contain about half the bytes and half the keys of the original tablet’s files.
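The value layout and the split estimate can be sketched as follows. This is a minimal illustration, assuming the value is the two integers separated by a comma, as Table B-1 suggests; the helper names are hypothetical, and the integer halving only approximates whatever rounding Accumulo applies.

```python
from typing import Tuple


def parse_file_value(value: str) -> Tuple[int, int]:
    """Parse 'size_in_bytes,number_of_keys' into two integers."""
    size, keys = value.split(",")
    return int(size), int(keys)


def estimate_after_split(value: str) -> str:
    """Estimate a child tablet's file entry after a split.

    Each child is assumed to hold about half of the parent's bytes
    and keys (assumption: simple integer halving).
    """
    size, keys = parse_file_value(value)
    return f"{size // 2},{keys // 2}"
```

With a parent entry of `"1000,50"`, each child would be recorded with the estimate `"500,25"` until the file is next rewritten.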
The first letter of the filename (the actual file name, not including its path) indicates what type of operation created the file:

First letter | Operation that created the file |
---|---|
`F` | Minor compaction |
`C` | Major compaction |
`A` | Full major compaction |
`M` | Merging minor compaction |
`I` | Bulk import |
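A lookup of this kind could be sketched as below. The prefix letters `F`, `C`, `A`, `M`, and `I` reflect the naming convention used by Accumulo 1.x as best understood here and should be treated as an assumption; the function name and the sample paths are hypothetical.

```python
import posixpath

# Assumed mapping from the first letter of a file name to the
# operation that created the file.
CREATED_BY = {
    "F": "minor compaction",
    "C": "major compaction",
    "A": "full major compaction",
    "M": "merging minor compaction",
    "I": "bulk import",
}


def created_by(path: str) -> str:
    """Return the operation that created a file, based on its name."""
    # Only the file name matters, not the directories in its path.
    name = posixpath.basename(path)
    return CREATED_BY.get(name[:1], "unknown")
```

For instance, a hypothetical entry such as `/t-0001/F0000abc.rf` would be reported as the product of a minor compaction.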
The `scan` column family is used to ensure that files are not deleted while they are being scanned. The column qualifier is the name of a file currently being scanned. The garbage collector takes this information into account when determining which files are still in use and which can be safely deleted.
The `future`, `loc`, and `last` column families contain information about where a tablet has been assigned. The `future` column records the tablet server to which the tablet is being assigned, before the tablet has been loaded there. The `loc` column records the current assignment once the tablet has been successfully loaded by the assigned tablet server. The `last` column is the last assignment, used to try to reassign a tablet to the same server to improve data locality.
The column qualifier is the tablet server session ID, and the value is the tablet server location: its IP address and port. Each tablet server process has a unique session ID, so if the tablet server process is restarted on a machine, Accumulo can distinguish between tablets assigned to it before and after the restart.
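The role of the session ID can be illustrated with a short sketch. The helper names and the `live_sessions` mapping (from server location to the session ID of the process currently running there) are hypothetical and stand in for whatever bookkeeping Accumulo does internally.

```python
from typing import Dict, Tuple


def parse_location(value: str) -> Tuple[str, int]:
    """Split 'IP:port' into (ip, port)."""
    host, _, port = value.rpartition(":")
    return host, int(port)


def assignment_is_current(session_id: str, location: str,
                          live_sessions: Dict[str, str]) -> bool:
    """True if the assignment refers to the process now running at
    the given location, rather than one from before a restart."""
    return live_sessions.get(location) == session_id
```

If a tablet server at `10.0.0.5:9997` restarts, its session ID changes, so an entry written under the old session ID no longer matches the live process even though the address is the same.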
The `log` column family contains information about a tablet’s write-ahead logfiles. The column qualifier is the server name and the logfile name separated by a slash. The value is the log set and table ID separated by a pipe. In 1.5.0 and later, the log set is the same as the logfile name.
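The qualifier and value layout of a write-ahead log entry can be sketched as a parser. The function name and the sample strings are hypothetical; the sketch assumes the server name contains no slash and the table ID contains no pipe.

```python
from typing import Dict


def parse_log_entry(qualifier: str, value: str) -> Dict[str, str]:
    """Parse a write-ahead log entry from the metadata table.

    qualifier: '<server>/<logfile name>'
    value:     '<log set>|<table id>'
    """
    # Assumption: the server name has no slash, so split on the first.
    server, _, logfile = qualifier.partition("/")
    # Assumption: the table ID has no pipe, so split on the last.
    log_set, _, table_id = value.rpartition("|")
    return {
        "server": server,
        "logfile": logfile,
        "log_set": log_set,
        "table_id": table_id,
    }
```

In 1.5.0 and later, the `logfile` and `log_set` fields of the result would be expected to hold the same name.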
In the `srv` column family, the `dir` column qualifier has the tablet’s main directory as its value.
The tablet can use files outside of this directory, but new files will be created in the directory.
The `compact` column qualifier has the most recent compaction ID as its value.
The `flush` column qualifier has the most recent flush ID as its value.
These IDs are used to determine whether requested flushes or compactions have successfully completed for all relevant tablets.
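One way such a completion check might work is sketched below. It assumes, without confirmation from the text, that IDs increase over time and that a request is complete once every relevant tablet records an ID at least as large as the requested one; the function name is hypothetical.

```python
from typing import Iterable


def operation_complete(requested_id: int,
                       tablet_ids: Iterable[int]) -> bool:
    """True if every tablet has reached the requested flush or
    compaction ID (assumption: IDs only ever increase)."""
    return all(tablet_id >= requested_id for tablet_id in tablet_ids)
```

Given a requested flush ID of 5, tablets reporting 5, 6, and 5 would count as complete, while a single tablet still at 4 would mean the flush is still pending.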
The `lock` column qualifier contains the ZooKeeper lock location for a tablet server that is attempting to write to the metadata table.
There is a constraint on the metadata table that only accepts writes from tablet servers with currently held ZooKeeper locks.
The `time` column qualifier stores the timestamp of the most recently written data to a tablet.
It is preceded by an `M`, indicating that the timestamp is in milliseconds since the epoch, or an `L`, indicating that the timestamp is in logical time (essentially a one-up counter).
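Decoding such a value is straightforward, as the following sketch shows. The function name and the returned labels are hypothetical; only the `M` and `L` prefixes come from the text.

```python
from typing import Tuple


def parse_time_value(value: str) -> Tuple[str, int]:
    """Decode a time value such as 'M1400000000000' or 'L42'."""
    kind, number = value[:1], int(value[1:])
    if kind == "M":
        return "millis", number   # milliseconds since the epoch
    if kind == "L":
        return "logical", number  # one-up counter
    raise ValueError(f"unknown time type in {value!r}")
```

A value of `L42` therefore means that the most recent write to the tablet carried logical timestamp 42.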
The `~tab:~pr` column contains the end row of the previous tablet, which helps Accumulo keep track of its metadata.
The value is `0x01` followed by the previous tablet’s end row.
For the first tablet in a table, there is no previous tablet, so the value is set to `0x00`.
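The previous-row encoding can be sketched as a pair of helpers working on raw bytes. The function names are hypothetical, and the sketch covers only the two cases described above.

```python
from typing import Optional


def encode_prev_row(prev_end_row: Optional[bytes]) -> bytes:
    """Encode the previous tablet's end row; None means there is
    no previous tablet (first tablet in the table)."""
    if prev_end_row is None:
        return b"\x00"
    return b"\x01" + prev_end_row


def decode_prev_row(value: bytes) -> Optional[bytes]:
    """Inverse of encode_prev_row."""
    if value == b"\x00":
        return None
    if value[:1] == b"\x01":
        return value[1:]
    raise ValueError(f"unexpected prev row value: {value!r}")
```

The leading byte lets a reader distinguish "no previous tablet" from a previous tablet whose end row happens to be empty.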
There are a few additional metadata entry types that are ephemeral, such as those written in the process of a tablet split operation. These include `~tab:oldprevrow` and `~tab:splitRatio` for split operations; `chopped:chopped` for merge operations; `loaded` for bulk import operations; and `!cloned` for table clone operations.