Reading and writing cycle

The following figure shows how read and write operations take place in HBase:

[Figure: Reading and writing cycle]

Let's look at how read and write operations take place on HBase tables. In HBase, the client does not write data to an HFile directly. The data is first written to the WAL and then to the MemStore, the in-memory write buffer that belongs to an HStore; it is flushed to an HFile later. Refer to the following figure:

[Figure: Reading and writing cycle]

Write-Ahead Logs

Write-Ahead Logs (WALs) provide data reliability and reside on HDFS; each RegionServer hosts a single WAL. If a RegionServer crashes before its MemStore is flushed, the WAL is used to restore the unflushed data on a new RegionServer. A write operation is therefore considered successful only after the data has been written to both the WAL and the MemStore.
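The write ordering and crash recovery described above can be sketched with a toy model. This is not HBase code: `ToyRegionServer`, its `put` and `recover` methods, and the plain list that stands in for a WAL file on HDFS are all hypothetical names chosen for illustration.

```python
class ToyRegionServer:
    """Toy model of the write path: append to the WAL first, then update the MemStore."""

    def __init__(self, wal):
        self.wal = wal        # durable log (a plain list standing in for a file on HDFS)
        self.memstore = {}    # volatile in-memory buffer, lost if the server crashes

    def put(self, row, value):
        self.wal.append((row, value))  # step 1: durable append to the log
        self.memstore[row] = value     # step 2: in-memory update
        return True                    # acknowledged only after both steps

    @classmethod
    def recover(cls, wal):
        """Rebuild the MemStore on a new server by replaying the surviving WAL in order."""
        server = cls(wal)
        for row, value in wal:
            server.memstore[row] = value
        return server


wal = []
server = ToyRegionServer(wal)
server.put("row1", "a")
server.put("row2", "b")
server.put("row1", "c")   # a later write to the same row wins

del server                # simulate a crash: the MemStore is gone, the WAL survives
recovered = ToyRegionServer.recover(wal)
print(recovered.memstore)  # {'row1': 'c', 'row2': 'b'}
```

Replaying the log in its original order is what lets the later value for `row1` win after recovery.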

MemStore

MemStore acts as an in-memory write buffer with a default flush size of 64 MB. Once the data in a MemStore reaches this threshold, or the combined size of all MemStores on the RegionServer reaches its limit (40 percent of the heap by default), the data is flushed to a new HFile on HDFS for persistence. The 64 MB flush size is not related to the HDFS block size; Hadoop internally manages block allocation and storage, and HBase plays no role in replicating blocks or dividing HFiles into blocks. Each column family might have many HFiles, but each HFile belongs to exactly one column family.
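A minimal sketch of the flush behavior, assuming a single column family and a tiny threshold so the effect is visible. `ToyStore` and `flush_size_bytes` are hypothetical names, and the byte accounting (length of row key plus value, counted on every put) is a deliberate simplification of what HBase tracks.

```python
class ToyStore:
    """Toy model of one column family's store: a MemStore plus its flushed files."""

    def __init__(self, flush_size_bytes):
        self.flush_size_bytes = flush_size_bytes
        self.memstore = {}
        self.memstore_bytes = 0
        self.hfiles = []   # each entry is an immutable, row-sorted tuple of (row, value)

    def put(self, row, value):
        self.memstore[row] = value
        self.memstore_bytes += len(row) + len(value)   # crude size estimate
        if self.memstore_bytes >= self.flush_size_bytes:
            self.flush()

    def flush(self):
        """Write the buffered rows out as a new sorted file and start an empty buffer."""
        self.hfiles.append(tuple(sorted(self.memstore.items())))
        self.memstore = {}
        self.memstore_bytes = 0


store = ToyStore(flush_size_bytes=16)
for i in range(5):
    store.put(f"row{i}", "vvvv")   # 4 + 4 = 8 bytes per put

print(len(store.hfiles))   # 2
print(store.memstore)      # {'row4': 'vvvv'}
```

Every second put crosses the 16-byte threshold, so the five puts produce two flushed files and leave one row in the buffer. Each flush creates a new file rather than appending to an old one, which is why a column family can accumulate many HFiles.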

Now, let's take a look at the process flow of reading from HBase. The process starts when the client initiates a read request: the client finds out which region holds the row and which RegionServer hosts that region, and then sends the request to that RegionServer. The RegionServer first tries to read from the MemStore. On a hit, the read completes; on a miss, it checks the block cache. Finally, it goes to the HFile to read the required row of data. If the row is not already in memory, the HFile block that contains it is loaded into memory. So, the MemStore and block cache provide fast, in-memory access to data, and the HFile provides persistent, on-demand data.
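The lookup order in this read flow can be sketched as a single function. This is a simplified illustration, not the HBase read implementation: `read_row` is a hypothetical helper, plain dictionaries stand in for the MemStore, block cache, and HFiles, and the cache here holds individual rows where HBase caches whole blocks.

```python
def read_row(row, memstore, block_cache, hfiles):
    """Return (value, source), checking MemStore, then block cache, then HFiles."""
    if row in memstore:
        return memstore[row], "memstore"
    if row in block_cache:
        return block_cache[row], "block cache"
    for hfile in reversed(hfiles):          # newest file first
        if row in hfile:
            block_cache[row] = hfile[row]   # keep it in memory for later reads
            return hfile[row], "hfile"
    return None, "not found"


memstore = {"row3": "c"}                  # recent, unflushed write
block_cache = {}                          # nothing cached yet
hfiles = [{"row1": "a", "row2": "b"}]     # one flushed file

print(read_row("row3", memstore, block_cache, hfiles))  # ('c', 'memstore')
print(read_row("row1", memstore, block_cache, hfiles))  # ('a', 'hfile')
print(read_row("row1", memstore, block_cache, hfiles))  # ('a', 'block cache')
print(read_row("row9", memstore, block_cache, hfiles))  # (None, 'not found')
```

The second read of `row1` is served from the cache because the first read loaded it there.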

Block cache follows the least recently used (LRU) algorithm: when the cache is full, the block that has gone unused the longest is evicted first. Every RegionServer has a single block cache that keeps recently accessed data from HFiles in the main memory, which reduces disk seeks and data access time.
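A minimal sketch of LRU eviction, assuming a cache that holds a fixed number of blocks. `ToyLruCache` is a hypothetical class built on Python's `collections.OrderedDict`; it illustrates the eviction policy only and does not reflect how HBase sizes or prioritizes its block cache.

```python
from collections import OrderedDict


class ToyLruCache:
    """Fixed-capacity cache that evicts the least recently used entry."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.blocks = OrderedDict()   # oldest entry first, most recently used last

    def get(self, key):
        if key not in self.blocks:
            return None
        self.blocks.move_to_end(key)  # a read marks the block as recently used
        return self.blocks[key]

    def put(self, key, block):
        self.blocks[key] = block
        self.blocks.move_to_end(key)
        if len(self.blocks) > self.capacity:
            self.blocks.popitem(last=False)   # drop the least recently used block


cache = ToyLruCache(capacity=2)
cache.put("block-a", b"data-a")
cache.put("block-b", b"data-b")
cache.get("block-a")              # block-a is now more recent than block-b
cache.put("block-c", b"data-c")   # cache is full, so block-b is evicted

print(list(cache.blocks))         # ['block-a', 'block-c']
```

Reading `block-a` before inserting `block-c` is what saves it from eviction; without that read, `block-a` would have been the oldest entry and would have been dropped instead.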
