List of Figures

Chapter 1. Introducing HBase

Figure 1.1. Providing web-search results using Bigtable, simplified. The crawlers—applications collecting web pages—store their data in Bigtable. A MapReduce process scans the table to produce the search index. Search results are queried from Bigtable to display to the user.

Figure 1.2. The HBase Master status page. From this interface, you can get a general sense of the health of your installation. It also allows you to explore the distribution of data and perform basic administrative tasks, but most administration isn’t done through this interface. You’ll learn more about HBase operations in chapter 10.

Chapter 2. Getting started

Figure 2.1. HBase write path. Every write to HBase requires confirmation from both the WAL and the MemStore. The two steps ensure that every write to HBase happens as fast as possible while maintaining durability. The MemStore is flushed to a new HFile when it fills up.

Figure 2.2. HBase read path. Data is reconciled from the BlockCache, the MemStore, and the HFiles to give the client an up-to-date view of the row(s) it asked for.

Figure 2.3. Minor compaction. Records are read from the existing HFiles and combined into a single, merged HFile. That new HFile is then marked as the new canonical data on disk, and the old HFiles are deleted. When a compaction is run simultaneously over all HFiles in a column family, it’s called a major compaction.

Figure 2.4. The coordinates used to identify data in an HBase table are rowkey, column family, column qualifier, and version.

Figure 2.5. HBase can be considered a key-value store, where the four coordinates to a cell act as a key. In the API, the complete coordinates to a value, plus the value itself, are packaged together by the KeyValue class.
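In the Java client, those four coordinates and the value travel together in a KeyValue. Here is a minimal sketch, assuming the book-era client API; the example values are hypothetical and merely echo the users table used throughout the book:

```java
import org.apache.hadoop.hbase.KeyValue;
import org.apache.hadoop.hbase.util.Bytes;

public class KeyValueCoordinates {
    public static void main(String[] args) {
        // The full set of coordinates: rowkey, column family, column qualifier, version.
        byte[] row       = Bytes.toBytes("TheRealMT");       // rowkey
        byte[] family    = Bytes.toBytes("info");            // column family
        byte[] qualifier = Bytes.toBytes("email");           // column qualifier
        long   version   = 1329088321289L;                   // version (timestamp)
        byte[] value     = Bytes.toBytes("samuel@clemens.org");

        // A KeyValue packages the complete coordinates together with the value itself.
        KeyValue kv = new KeyValue(row, family, qualifier, version, value);
        System.out.println(kv);                              // key portion of the entry
        System.out.println(Bytes.toString(kv.getValue()));   // the value
    }
}
```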

Figure 2.6. Alternate views of HBase as a key-value data store. Decreasing the precision of your cell coordinates results in larger groups of KeyValues as the resulting values.

Figure 2.7. Sorted map of maps. HBase logically organizes data as a nested map of maps. Within each map, data is physically sorted by that map’s key. In this example, "email" comes before "name" and more recent versions come before older ones.

Figure 2.8. HFile data for the info column family in the users table. Each record is a complete entry in the HFile.

Figure 2.9. A region in the users table. All data for a given row in the table is managed together in a region.

Chapter 3. Distributed HBase, HDFS, and MapReduce

Figure 3.1. Splitting and assigning work. Each record in the log file can be processed independently, so you split the input file according to the number of workers available.

Figure 3.2. Do work. The data assigned to Host1, [k1,v1], is passed to the Map Step as pairs of line number to line. The Map Step processes each line and produces pairs of UserID to TimeSpent, [k2,v2].

Figure 3.3. Hadoop performs the Shuffle and Sort Step automatically. It serves to prepare the output from the Map Step for aggregation in the Reduce Step. No values are changed by the process; it serves only to reorganize data.

Figure 3.4. Aggregate work. Available servers process the groups of UserID to Times, [k2,<v2>], in this case, summing the values. Final results are emitted back to Hadoop.

Figure 3.5. The JobTracker and its TaskTrackers are responsible for execution of the MapReduce applications submitted to the cluster.

Figure 3.6. A table consists of multiple smaller chunks called regions.

Figure 3.7. HBase RegionServer and HDFS DataNode processes are typically collocated on the same host.

Figure 3.8. Any RegionServer can host any region. RegionServers 1 and 2 are hosting regions, whereas RegionServer 3 isn’t.

Figure 3.9. -ROOT-, .META., and user tables viewed as a B+Tree

Figure 3.10. User table T1 in HBase, along with -ROOT- and .META., distributed across the various RegionServers

Figure 3.11. Steps that take place when a client interacts with an HBase system. The interaction starts with ZooKeeper and goes to the RegionServer serving the region with which the client needs to interact. The interaction with the RegionServer could be for reads or writes. The information about -ROOT- and .META. is cached by the client for future interactions and is refreshed if the regions it’s expecting to interact with based on that information don’t exist on the node it thinks they should be on.

Figure 3.12. MapReduce job with mappers taking regions from HBase as their input source. By default, one mapper is created per region.

Figure 3.13. HBase as a sink for a MapReduce job. In this case, the reduce tasks are writing to HBase.

Figure 3.14. Using HBase as a lookup store for the map tasks to do a map-side join

Figure 3.15. If a RegionServer fails for some reason (such as a Java process dying or the entire physical node catching fire), a different RegionServer picks up the regions the first one was serving and begins serving them. This is enabled by the fact that HDFS provides a single namespace to all the RegionServers, and any of them can access the persisted files from any other.

Chapter 4. HBase table design

Figure 4.1. The follows table, which persists a list of users a particular user follows

Figure 4.2. The follows table with sample data. 1:TheRealMT represents a cell in the column family follows with the column qualifier 1 and the value TheRealMT. The fake Mark Twain wants to know everything he can about the real one. He’s following not only the real Mark Twain but also his fans, his wife, and his friends. Cheeky, huh? The real Mark Twain, on the other hand, keeps it simple and only wants to get twits from his dear friend and his wife.
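A minimal sketch of how one such cell might be written with the Java client, assuming the wide-table schema in this figure (row TheFakeMT, column family follows, qualifier 1, value TheRealMT); the table and user names follow the figure, and the calls are the book-era HTable/Put interface:

```java
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

public class AddFollow {
    public static void main(String[] args) throws Exception {
        HTable followsTable = new HTable(HBaseConfiguration.create(), "follows");

        // One row per follower; each followed user gets its own cell:
        // column family "follows", qualifier "1", value "TheRealMT".
        Put p = new Put(Bytes.toBytes("TheFakeMT"));
        p.add(Bytes.toBytes("follows"), Bytes.toBytes("1"), Bytes.toBytes("TheRealMT"));
        followsTable.put(p);
        followsTable.close();
    }
}
```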

Figure 4.3. The follows table with a counter in each row to keep track of the number of users any given user is following at the moment

Figure 4.4. Steps required to add a new user to the list of followed users, based on the current table design

Figure 4.5. Cells now have the followed user’s username as the column qualifier and an arbitrary string as the cell value.

Figure 4.6. New schema for the follows table with the follower as well as followed user IDs in the rowkey. This translates into a single follower-followed relationship per row in the HBase table. This is a tall table instead of a wide table like the previous ones.

Figure 4.7. The follows table designed as a tall table instead of a wide table. (And Amandeep is the fanboy we’ve been referring to so far.) Putting the user name in the column qualifier saves you from looking up the users table for the name of the user given an ID. You can simply list names or IDs while looking at relationships just from this table. The downside is that you need to update the name in all the cells if the user updates their name in their profile. This is classic de-normalization.

Figure 4.8. Using MD5s as part of rowkeys to achieve fixed lengths and better distribution

Figure 4.9. Using MD5 in the rowkey lets you get rid of the + delimiter that you needed so far. The rowkeys now consist of fixed-length portions, with each user ID being 16 bytes.
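A minimal sketch of producing such a fixed-length rowkey, assuming the plain JDK MessageDigest for the MD5 step; each hashed user ID is a 16-byte segment, as the figure describes, so no delimiter is needed:

```java
import java.security.MessageDigest;
import org.apache.hadoop.hbase.util.Bytes;

public class Md5Rowkey {
    // Concatenate the MD5 of the follower ID and the MD5 of the followed ID:
    // two fixed-length 16-byte segments, 32 bytes total, no "+" delimiter.
    public static byte[] rowkey(String follower, String followed) throws Exception {
        MessageDigest md5 = MessageDigest.getInstance("MD5");
        byte[] followerHash = md5.digest(Bytes.toBytes(follower)); // 16 bytes
        byte[] followedHash = md5.digest(Bytes.toBytes(followed)); // 16 bytes
        return Bytes.add(followerHash, followedHash);
    }

    public static void main(String[] args) throws Exception {
        System.out.println(Bytes.toStringBinary(rowkey("TheFakeMT", "TheRealMT")));
    }
}
```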

Figure 4.10. Logical to physical translation of an HBase table. The KeyValue object represents a single entry in the HFile. The figure shows the result of executing get(r5) on the table to retrieve row r5.

Figure 4.11. Depending on what part of the key you specify, you can limit the amount of data you read off the disk or transfer over the network. Specifying the rowkey lets you read just the exact row you need. But the server returns the entire row to the client. Specifying the column family lets you further specify what part of the row to read, thereby allowing for reading only a subset of the HFiles if the row spans multiple families. Further specifying the column qualifier and timestamp lets you save on the number of columns returned to the client, thereby saving on network I/O.

Figure 4.12. A table to store twit streams for every user. Reverse timestamps let you sort the twits with the latest twit first. That allows for efficient scanning and retrieval of the n latest twits. Retrieving the latest twits in a user’s stream involves scanning the table.
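A minimal sketch of the reverse-timestamp idea, assuming a rowkey of the user's ID followed by Long.MAX_VALUE minus the twit's timestamp, so that newer twits sort first under HBase's ascending byte-order sort:

```java
import org.apache.hadoop.hbase.util.Bytes;

public class ReverseTimestampKey {
    // Newer timestamps produce smaller reversed values, so the latest twit
    // sorts first within a user's block of rows.
    public static byte[] twitRowkey(String userId, long twitTimestampMillis) {
        long reverseTs = Long.MAX_VALUE - twitTimestampMillis;
        return Bytes.add(Bytes.toBytes(userId), Bytes.toBytes(reverseTs));
    }

    public static void main(String[] args) {
        byte[] older = twitRowkey("TheRealMT", 1329088818321L);
        byte[] newer = twitRowkey("TheRealMT", 1329088821321L);
        // Negative result: the newer twit's rowkey sorts before the older one's.
        System.out.println(Bytes.compareTo(newer, older));
    }
}
```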

Figure 4.13. The conceptual structure of an HFile

Figure 4.14. Nesting entities in an HBase table

Figure 4.15. HBase tables can contain regular columns along with nested entities.

Figure 4.16. Filtering data can be done at the client side by reading data into the client application from the RegionServers and applying the filter logic there; or it can be done at the server side by pushing the filtering logic down to the RegionServers, thereby reducing the amount of data transferred over the network to the clients. Filters can essentially save on network I/O costs, and sometimes even on disk I/O.
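A minimal sketch of pushing a filter down to the RegionServers instead of filtering client-side, using the standard Scan and SingleColumnValueFilter classes; the table, family, and qualifier names here are hypothetical:

```java
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.filter.CompareFilter.CompareOp;
import org.apache.hadoop.hbase.filter.SingleColumnValueFilter;
import org.apache.hadoop.hbase.util.Bytes;

public class ServerSideFilter {
    public static void main(String[] args) throws Exception {
        HTable users = new HTable(HBaseConfiguration.create(), "users");

        // Only rows whose info:name equals "Mark Twain" come back to the client;
        // everything else is discarded on the RegionServers, saving network I/O.
        Scan scan = new Scan();
        scan.setFilter(new SingleColumnValueFilter(
                Bytes.toBytes("info"), Bytes.toBytes("name"),
                CompareOp.EQUAL, Bytes.toBytes("Mark Twain")));

        ResultScanner results = users.getScanner(scan);
        for (Result r : results) {
            System.out.println(Bytes.toString(r.getRow()));
        }
        results.close();
        users.close();
    }
}
```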

Figure 4.17. The steps during the filtering process. This happens for each row in the range that is being scanned by the scanner object.

Chapter 5. Extending HBase with coprocessors

Figure 5.1. The lifecycle of a request. A Put request dispatched from the client results directly in a put() call on the region.

Figure 5.2. A RegionObserver in the wild. Instead of calling put() directly, the region calls prePut() and postPut() on all registered RegionObservers, one after the next. Each has a chance to modify or interrupt the operation before a response is returned to the client.

Figure 5.3. An endpoint coprocessor at work. The regions deploy an implementation of the interface consumed by the client. An instance of Batch.Call encapsulates method invocation, and the coprocessorExec() method handles distributed invocation. After each request completes, results are returned to the client and aggregated.

Figure 5.4. Schema for follows and followedBy tables as optimized for space and I/O efficiency. The follows table stores half of a relation entity indexed according to the follower participant. The followedBy table stores the other half of the same relation entity indexed according to the followed participant.

Figure 5.5. Schema for the updated follows and followedBy tables. Now both tables store a full relation entity in each row.

Chapter 6. Alternative HBase clients

Figure 6.1. A REST gateway deployment. All client activity is funneled through the gateway, greatly reducing client throughput. Clustering the REST gateway machines can mitigate some of this limitation. Clustering introduces a new limitation, however, forcing the client to only use the stateless portions of the API.

Figure 6.2. A Thrift gateway deployment. All clients are funneled through the gateway, greatly reducing client throughput. Clustering is easier because the Thrift protocol is session-based.

Figure 6.3. Building a data-processing pipeline with Callbacks. Each step takes output from the previous one, processes it, and sends it to the next, until a final result is reached.

Figure 6.4. Step 1 is to scan over all rows in the users table. A KeyValue is produced for each user.

Figure 6.5. Step 3 is to interpret the Put response as either a success or a failure.

Figure 6.6. Steps 4a and 4b format a message to send based on response result.

Figure 6.7. Step 5 sends the notification message.

Figure 6.8. Step 2 calculates a new password based on the old and sends a Put to HBase.

Chapter 7. HBase by example: OpenTSDB

Figure 7.1. OpenTSDB graph output. Graph reproduced directly from the OpenTSDB website. OpenTSDB is a tool for visualizing data. Ultimately it’s about providing insight into the data it stores in graphs like this one.

Figure 7.2. A time series is a sequence of time-ordered points. Here, two time series on the same scale are rendered on the same graph. They don’t share a common interval. The timestamp is commonly used as an X-axis value when representing a time series visually.

Figure 7.3. Balanced and unbalanced trees. Persisting data into structures that arrange themselves based on data values can result in worst-case data distribution.

Figure 7.4. The layout of an OpenTSDB rowkey consists of 3 bytes for the metric ID, 4 bytes for the high-order timestamp bits, and 3 bytes each for the tag name ID and tag value ID, repeated for each tag.
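A minimal sketch of assembling a rowkey with this layout; the byte widths follow the caption, and the UID values are assumed to come from elsewhere (in OpenTSDB they are looked up in the tsdb-uid table, which is not shown here):

```java
import java.nio.ByteBuffer;

public class TsdbRowkeySketch {
    // Layout per the figure: 3-byte metric UID, 4-byte base timestamp,
    // then one (3-byte tag name UID, 3-byte tag value UID) pair per tag.
    public static byte[] rowkey(byte[] metricUid, long timestampSeconds,
                                byte[][] tagNameUids, byte[][] tagValueUids) {
        // OpenTSDB rounds the timestamp down to its hourly bucket; the
        // remaining precision lives in the column qualifier (see figure 7.5).
        int baseTime = (int) (timestampSeconds - (timestampSeconds % 3600));

        ByteBuffer buf = ByteBuffer.allocate(3 + 4 + tagNameUids.length * 6);
        buf.put(metricUid);            // 3 bytes
        buf.putInt(baseTime);          // 4 bytes
        for (int i = 0; i < tagNameUids.length; i++) {
            buf.put(tagNameUids[i]);   // 3 bytes
            buf.put(tagValueUids[i]);  // 3 bytes
        }
        return buf.array();
    }
}
```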

Figure 7.5. Column qualifiers store the final precision of the timestamp as well as a bitmask. The first bit in that mask indicates whether the value in the cell is an integer or a float value.

Figure 7.6. An example rowkey, column qualifier, and cell value storing the value 476 for mysql.bytes_sent at timestamp 1292148123 seconds in the tsdb table.

Figure 7.7. OpenTSDB architecture: separation of concerns. The three areas of concern are data collection, data storage, and serving queries.

Figure 7.8. OpenTSDB read path. Requests are routed to an available tsd process that queries HBase and serves the results in the appropriate format.

Figure 7.9. OpenTSDB write path. Collection scripts on monitored hosts report measurements to the local tcollector process. Measurements are then transmitted to a tsd process that handles writing observations to HBase.

Figure 7.10. OpenTSDB metric auto-completion is supported by the name-to-UID mapping stored in the tsdb-uid table.

Chapter 8. Scaling GIS on HBase

Figure 8.1. In GIS, all dimensions matter. Building an index of the world’s cities over only longitude, the X-axis, would order data inaccurately for a certain set of queries.

Figure 8.2. Find the wifi. Geographic data wants to be seen, so draw it on a map. Here’s a sampling of the full dataset—a handful of places to find a wifi connection in Midtown Manhattan.

Figure 8.3. A naïve approach to spatial schema design: concatenated axes values. This schema fails the first objective of mapping spatial locality to record locality.

Figure 8.4. Truncating a geohash. By dropping characters from the end of a geohash, you drop precision from the space that hash represents. A single character goes a long way.

Figure 8.5. Relative distances. When viewed on a map, it’s easy to see that the distance between Central Park and JFK is much greater than the distance between Central Park and LaGuardia. This is precisely the relationship you want to reproduce with your hashing algorithm.

Figure 8.6. Constructing a geohash. The first 3 bits from longitude and latitude are calculated and woven to produce a geohash of 6-bit precision. The example data we discussed previously executed this algorithm out to 7 Base32 characters, or 35-bit precision.
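A minimal sketch of the interleaving step the figure illustrates: bits are taken alternately from the longitude and latitude bisections and woven together (6 bits here, matching the figure; the Base32 encoding used for longer hashes is omitted):

```java
public class GeohashBits {
    // Compute the first `bits` bits of a coordinate's geohash by alternately
    // bisecting the longitude range [-180, 180] and latitude range [-90, 90].
    public static long geohashBits(double lon, double lat, int bits) {
        double lonLo = -180, lonHi = 180, latLo = -90, latHi = 90;
        long hash = 0;
        for (int i = 0; i < bits; i++) {
            hash <<= 1;
            if (i % 2 == 0) {                       // even bits come from longitude
                double mid = (lonLo + lonHi) / 2;
                if (lon >= mid) { hash |= 1; lonLo = mid; } else { lonHi = mid; }
            } else {                                // odd bits come from latitude
                double mid = (latLo + latHi) / 2;
                if (lat >= mid) { hash |= 1; latLo = mid; } else { latHi = mid; }
            }
        }
        return hash;
    }

    public static void main(String[] args) {
        // Roughly Midtown Manhattan: lon -73.98, lat 40.76 -> 6-bit prefix 011001.
        System.out.println(Long.toBinaryString(geohashBits(-73.98, 40.76, 6)));
    }
}
```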

Figure 8.7. Seeing prefix matches in action. If the target search is in this area, a simple rowkey scan will get the data you need. Not only that, but the order of results makes a lot more sense than the order in figure 8.3.

Figure 8.8. Prefix matches with geohash overlay. Lots of additional, unnecessary area is introduced into the query result by using the 6-character prefix. An ideal implementation would use only 7-character prefixes to minimize the amount of extra data transmitted over the wire.

Figure 8.9. Visualizing the geohash edge case. The encoding isn’t perfect; this is one such case. Imagine a nearest-neighbor search falling on the point under the arrow in this illustration. It’s possible you’ll find a neighbor in a tile with only two characters of common prefix.

Figure 8.10. Visualizing query results. This simple spiraling technique searches out around the query coordinate looking for matches. A smarter implementation would take into account the query coordinate’s position within the central spatial extent. Once the minimum number of matches had been found, it would skip any neighbors that are too far away to contribute.

Figure 8.11. Querying within a block of Times Square. We used Google Earth to eyeball the four corners of the query space. It looks like all those flashy sign boards sucked up the wifi; it’s not very dense compared to other parts of the city. You can expect about 25 points to match your query.

Figure 8.12. Query polygon with centroid. The centroid point is where you’ll begin with the calculation for a minimum bounding set of geohashes.

Figure 8.13. The convex hull is the shape made by fully containing a collection of geometries. In this case, the geometries are simple points. You’ll use this to test query containment of the full set of neighbors of a geohash.

Figure 8.14. Checking for containment at seven and six characters of precision. At seven characters, neither the central geohash nor the combined set of all its neighbors is sufficiently large to cover the entire query extent. Moving up to six characters gets the job done.

Figure 8.15. Within query results. The containment filter appears to work as expected. It’s also good to know that the geometry library appears to agree with the cartography library.

Figure 8.16. Results of the filtered scan. This should look an awful lot like figure 8.15.

Chapter 9. Deploying HBase

Figure 9.1. HBase Master UI of a working HBase instance

Chapter 10. Operations

Figure 10.1. Ganglia, set up to take metrics from HBase. Notice the list of HBase and JVM metrics in the drop-down Metrics list.

Figure 10.2. The HBase Master web UI shows the number of requests per second being served by each of the RegionServers, the number of regions that are online on the RegionServers, and the used and max heap. This is a useful place to start when you’re trying to find out the state of the system. Often, you can find issues here when RegionServers have fallen over, aren’t balanced in terms of the regions and requests they’re serving, or are misconfigured to use less heap than you had planned to give them.

Figure 10.3. Ganglia graphs showing a summary of the entire cluster for load, CPU, memory, and network metrics

Figure 10.4. CPU I/O wait percentage is a useful metric for understanding whether your system is I/O bound. These Ganglia graphs show significant I/O load on five out of the six boxes. This was during a heavy write workload. More disks on the boxes would speed up the writes by distributing the load.

Figure 10.5. MemStore size metrics from Ganglia. This isn’t an ideal graph: it indicates that tuning garbage collection and other HBase configs might help improve performance.

Figure 10.6. The compaction queues going up during heavy writes. Notice that the queue is higher on some boxes than on others. This likely indicates that the write load on those RegionServers is higher than on the others.

Figure 10.7. Block-cache-size metrics captured during a read-heavy workload. It turns out that the load was too heavy and brought down one of the RegionServers—that’s the box in the upper-left corner. If this happens, you should configure your ops systems to alert you. It’s not critical enough that you should be paged in the middle of the night if only one box goes down. If many go down, you should be worried.

Figure 10.8. HBase and its dependencies. Every dependency affects the performance of HBase.

Figure 10.9. The key ranges F–I are being served by two regions. There is an overlap in the ranges for which the regions are responsible. You can use hbck to fix this inconsistency.

Figure 10.10. The HBase Master UI showing the presplit table that was created by providing the split keys at creation time. Notice the start and end keys of the regions.

Figure 10.11. Master-slave replication configuration, where replication happens only in a single direction

Figure 10.12. Master-master replication scheme, where replication happens both ways. Writes to either cluster are replicated to the other cluster.

Figure 10.13. Cyclic replication scheme, where more than two clusters participate in the replication process and the relationship between any two clusters can be no replication, master-slave replication, or master-master replication

Appendix B. More about the workings of HDFS

Figure B.1. Write operation: client’s communication with the NameNode

Figure B.2. Write operation: NameNode acknowledges the write operation and sends back a DataNode list

Figure B.3. Write operation: client sends the file contents to the DataNodes

Figure B.4. Write operation: DataNode acknowledges completion of the write operation

Figure B.5. Read operation: client’s communication with the NameNode

Figure B.6. Read operation: NameNode acknowledges the read and sends back block information to the client

Figure B.7. Read operation: client contacts the relevant DataNodes and asks for the contents of the blocks

Figure B.8. Read operation: DataNodes serve the block contents to the client. This completes the read step.
