Table of Contents for HBase: The Definitive Guide
by Lars George
Cover
Dedication
Foreword
Preface
General Information
HBase Version
Building the Examples
Hush: The HBase URL Shortener
Running Hush
Conventions Used in This Book
Using Code Examples
Safari® Books Online
How to Contact Us
Acknowledgments
1. Introduction
The Dawn of Big Data
The Problem with Relational Database Systems
Nonrelational Database Systems, Not-Only SQL or NoSQL?
Dimensions
Scalability
Database (De-)Normalization
Building Blocks
Backdrop
Tables, Rows, Columns, and Cells
Auto-Sharding
Storage API
Implementation
Summary
HBase: The Hadoop Database
History
Nomenclature
Summary
2. Installation
Quick-Start Guide
Requirements
Hardware
Servers
Networking
Software
Operating system
Filesystem
Java
Hadoop
SSH
Domain Name Service
Synchronized time
File handles and process limits
Datanode handlers
Swappiness
Windows
Filesystems for HBase
Local
HDFS
S3
Other Filesystems
Installation Choices
Apache Binary Release
Building from Source
Run Modes
Standalone Mode
Distributed Mode
Pseudodistributed mode
Fully distributed mode
Specifying region servers
ZooKeeper setup
Using the existing ZooKeeper ensemble
Configuration
hbase-site.xml and hbase-default.xml
hbase-env.sh
regionservers
log4j.properties
Example Configuration
hbase-site.xml
regionservers
hbase-env.sh
Client Configuration
Deployment
Script-Based
Apache Whirr
Puppet and Chef
Operating a Cluster
Running and Confirming Your Installation
Web-based UI Introduction
Shell Introduction
Stopping the Cluster
3. Client API: The Basics
General Notes
CRUD Operations
Put Method
Single Puts
The KeyValue class
Client-side write buffer
List of Puts
Atomic compare-and-set
Get Method
Single Gets
The Result class
List of Gets
Related retrieval methods
Delete Method
Single Deletes
List of Deletes
Atomic compare-and-delete
Batch Operations
Row Locks
Scans
Introduction
The ResultScanner Class
Caching Versus Batching
Miscellaneous Features
The HTable Utility Methods
The Bytes Class
4. Client API: Advanced Features
Filters
Introduction to Filters
The filter hierarchy
Comparison operators
Comparators
Comparison Filters
RowFilter
FamilyFilter
QualifierFilter
ValueFilter
DependentColumnFilter
Dedicated Filters
SingleColumnValueFilter
SingleColumnValueExcludeFilter
PrefixFilter
PageFilter
KeyOnlyFilter
FirstKeyOnlyFilter
InclusiveStopFilter
TimestampsFilter
ColumnCountGetFilter
ColumnPaginationFilter
ColumnPrefixFilter
RandomRowFilter
Decorating Filters
SkipFilter
WhileMatchFilter
FilterList
Custom Filters
Filters Summary
Counters
Introduction to Counters
Single Counters
Multiple Counters
Coprocessors
Introduction to Coprocessors
The Coprocessor Class
Coprocessor Loading
Loading from the configuration
Loading from the table descriptor
The RegionObserver Class
Handling region life-cycle events
State: pending open
State: open
State: pending close
Handling client API events
The RegionCoprocessorEnvironment class
The ObserverContext class
The BaseRegionObserver class
The MasterObserver Class
The MasterCoprocessorEnvironment class
The BaseMasterObserver class
Endpoints
The CoprocessorProtocol interface
The BaseEndpointCoprocessor class
HTablePool
Connection Handling
5. Client API: Administrative Features
Schema Definition
Tables
Table Properties
Column Families
HBaseAdmin
Basic Operations
Table Operations
Schema Operations
Cluster Operations
Cluster Status Information
6. Available Clients
Introduction to REST, Thrift, and Avro
Interactive Clients
Native Java
REST
Operation
Supported formats
Plain (text/plain)
XML (text/xml)
JSON (application/json)
Protocol Buffer (application/x-protobuf)
Raw binary (application/octet-stream)
REST Java client
Thrift
Installation
Operation
Example: PHP
Avro
Installation
Operation
Other Clients
Batch Clients
MapReduce
Native Java
Clojure
Hive
Pig
Cascading
Shell
Basics
Commands
General
Data definition
Data manipulation
Tools
Replication
Scripting
Web-based UI
Master UI
Main page
User Table page
ZooKeeper page
Region Server UI
Main page
Shared Pages
7. MapReduce Integration
Framework
MapReduce Introduction
Classes
InputFormat
Mapper
Reducer
OutputFormat
Supporting Classes
MapReduce Locality
Table Splits
MapReduce over HBase
Preparation
Static Provisioning
Dynamic Provisioning
Data Sink
Data Source
Data Source and Sink
Custom Processing
8. Architecture
Seek Versus Transfer
B+ Trees
Log-Structured Merge-Trees
Storage
Overview
Write Path
Files
Root-level files
Table-level files
Region-level files
Region splits
Compactions
HFile Format
KeyValue Format
Write-Ahead Log
Overview
HLog Class
HLogKey Class
WALEdit Class
LogSyncer Class
LogRoller Class
Replay
Single log
Log splitting
Edits recovery
Durability
Read Path
Region Lookups
The Region Life Cycle
ZooKeeper
Replication
Life of a Log Edit
Normal processing
Non-responding slave clusters
Internals
Choosing region servers to replicate to
Keeping track of logs
Reading, filtering, and sending edits
Cleaning logs
Region server failover
9. Advanced Usage
Key Design
Concepts
Tall-Narrow Versus Flat-Wide Tables
Partial Key Scans
Pagination
Time Series Data
Time-Ordered Relations
Advanced Schemas
Secondary Indexes
Search Integration
Transactions
Bloom Filters
Versioning
Implicit Versioning
Custom Versioning
10. Cluster Monitoring
Introduction
The Metrics Framework
Contexts, Records, and Metrics
Master Metrics
Region Server Metrics
RPC Metrics
JVM Metrics
Info Metrics
Ganglia
Installation
Ganglia-related steps
Ganglia monitoring daemon
Ganglia meta daemon
Ganglia web frontend
HBase-related steps
Usage
JMX
JConsole
JMX Remote API
Nagios
11. Performance Tuning
Garbage Collection Tuning
Memstore-Local Allocation Buffer
Compression
Available Codecs
Snappy
LZO
GZIP
Verifying Installation
Compression test tool
Startup check
Enabling Compression
Optimizing Splits and Compactions
Managed Splitting
Region Hotspotting
Presplitting Regions
Load Balancing
Merging Regions
Client API: Best Practices
Configuration
Load Tests
Performance Evaluation
YCSB
12. Cluster Administration
Operational Tasks
Node Decommissioning
Rolling Restarts
Adding Servers
Pseudodistributed mode
Adding a local backup master
Adding a local region server
Fully distributed cluster
Adding a backup master
Adding a region server
Data Tasks
Import and Export Tools
CopyTable Tool
Bulk Import
Bulk load procedure
Using the importtsv tool
Using the completebulkload tool
Advanced usage
Replication
Additional Tasks
Coexisting Clusters
Required Ports
Changing Logging Levels
Troubleshooting
HBase Fsck
Analyzing the Logs
Common Issues
Basic setup checklist
File handles
DataNode connections
Compression
Garbage collection/memory tuning
Stability issues
ZooKeeper problems
“Could not obtain block” errors
A. HBase Configuration Properties
B. Road Map
HBase 0.92.0
HBase 0.94.0
C. Upgrade from Previous Releases
Upgrading to HBase 0.90.x
From 0.20.x or 0.89.x
Within 0.90.x
Upgrading to HBase 0.92.0
D. Distributions
Cloudera’s Distribution Including Apache Hadoop
E. Hush SQL Schema
F. HBase Versus Bigtable
Index
About the Author
Colophon
Copyright