Table of Contents

Copyright

Brief Table of Contents

Table of Contents

Foreword

Letter to the HBase Community

Preface

Acknowledgments

About this Book

About the Authors

About the Cover Illustration

1. HBase fundamentals

Chapter 1. Introducing HBase

1.1. Data-management systems: a crash course

1.1.1. Hello, Big Data

1.1.2. Data innovation

1.1.3. The rise of HBase

1.2. HBase use cases and success stories

1.2.1. The canonical web-search problem: the reason for Bigtable’s invention

1.2.2. Capturing incremental data

1.2.3. Content serving

1.2.4. Information exchange

1.3. Hello HBase

1.3.1. Quick install

1.3.2. Interacting with the HBase shell

1.3.3. Storing data

1.4. Summary

Chapter 2. Getting started

2.1. Starting from scratch

2.1.1. Create a table

2.1.2. Examine table schema

2.1.3. Establish a connection

2.1.4. Connection management

2.2. Data manipulation

2.2.1. Storing data

2.2.2. Modifying data

2.2.3. Under the hood: the HBase write path

2.2.4. Reading data

2.2.5. Under the hood: the HBase read path

2.2.6. Deleting data

2.2.7. Compactions: HBase housekeeping

2.2.8. Versioned data

2.2.9. Data model recap

2.3. Data coordinates

2.4. Putting it all together

2.5. Data models

2.5.1. Logical model: sorted map of maps

2.5.2. Physical model: column family oriented

2.6. Table scans

2.6.1. Designing tables for scans

2.6.2. Executing a scan

2.6.3. Scanner caching

2.6.4. Applying filters

2.7. Atomic operations

2.8. ACID semantics

2.9. Summary

Chapter 3. Distributed HBase, HDFS, and MapReduce

3.1. A case for MapReduce

3.1.1. Latency vs. throughput

3.1.2. Serial execution has limited throughput

3.1.3. Improved throughput with parallel execution

3.1.4. MapReduce: maximum throughput with distributed parallelism

3.2. An overview of Hadoop MapReduce

3.2.1. MapReduce data flow explained

3.2.2. MapReduce under the hood

3.3. HBase in distributed mode

3.3.1. Splitting and distributing big tables

3.3.2. How do I find my region?

3.3.3. How do I find the -ROOT- table?

3.4. HBase and MapReduce

3.4.1. HBase as a source

3.4.2. HBase as a sink

3.4.3. HBase as a shared resource

3.5. Putting it all together

3.5.1. Writing a MapReduce application

3.5.2. Running a MapReduce application

3.6. Availability and reliability at scale

Availability

Reliability and durability

3.6.1. HDFS as the underlying storage

3.7. Summary

2. Advanced concepts

Chapter 4. HBase table design

4.1. How to approach schema design

4.1.1. Modeling for the questions

4.1.2. Defining requirements: more work up front always pays

4.1.3. Modeling for even distribution of data and load

4.1.4. Targeted data access

4.2. De-normalization is the word in HBase land

4.3. Heterogeneous data in the same table

4.4. Rowkey design strategies

4.5. I/O considerations

4.5.1. Optimized for writes

4.5.2. Optimized for reads

4.5.3. Cardinality and rowkey structure

4.6. From relational to non-relational

4.6.1. Some basic concepts

4.6.2. Nested entities

4.6.3. Some things don’t map

4.7. Advanced column family configurations

4.7.1. Configurable block size

4.7.2. Block cache

4.7.3. Aggressive caching

4.7.4. Bloom filters

4.7.5. TTL

4.7.6. Compression

4.7.7. Cell versioning

4.8. Filtering data

4.8.1. Implementing a filter

4.8.2. Prebundled filters

4.9. Summary

Chapter 5. Extending HBase with coprocessors

5.1. The two kinds of coprocessors

5.1.1. Observer coprocessors

5.1.2. Endpoint coprocessors

5.2. Implementing an observer

5.2.1. Modifying the schema

5.2.2. Starting with the Base

5.2.3. Installing your observer

5.2.4. Other installation options

5.3. Implementing an endpoint

5.3.1. Defining an interface for the endpoint

5.3.2. Implementing the endpoint server

5.3.3. Implementing the endpoint client

5.3.4. Deploying the endpoint server

5.3.5. Try it!

5.4. Summary

Chapter 6. Alternative HBase clients

6.1. Scripting the HBase shell from UNIX

6.1.1. Preparing the HBase shell

6.1.2. Script table schema from the UNIX shell

6.2. Programming the HBase shell using JRuby

6.2.1. Preparing the HBase shell

6.2.2. Interacting with the TwitBase users table

6.3. HBase over REST

6.3.1. Launching the HBase REST service

6.3.2. Interacting with the TwitBase users table

6.4. Using the HBase Thrift gateway from Python

6.4.1. Generating the HBase Thrift client library for Python

6.4.2. Launching the HBase Thrift service

6.4.3. Scanning the TwitBase users table

6.5. Asynchbase: an alternative Java HBase client

6.5.1. Creating an asynchbase project

6.5.2. Changing TwitBase passwords

6.5.3. Try it out

6.6. Summary

3. Example applications

Chapter 7. HBase by example: OpenTSDB

7.1. An overview of OpenTSDB

7.1.1. Challenge: infrastructure monitoring

7.1.2. Data: time series

7.1.3. Storage: HBase

7.2. Designing an HBase application

7.2.1. Schema design

7.2.2. Application architecture

7.3. Implementing an HBase application

7.3.1. Storing data

7.3.2. Querying data

7.4. Summary

Chapter 8. Scaling GIS on HBase

8.1. Working with geographic data

8.2. Designing a spatial index

8.2.1. Starting with a compound rowkey

8.2.2. Introducing the geohash

8.2.3. Understanding the geohash

8.2.4. Using the geohash as a spatially aware rowkey

8.3. Implementing the nearest-neighbors query

8.4. Pushing work server-side

8.4.1. Creating a geohash scan from a query polygon

8.4.2. Within query take 1: client side

8.4.3. Within query take 2: WithinFilter

8.5. Summary

4. Operationalizing HBase

Chapter 9. Deploying HBase

9.1. Planning your cluster

9.1.1. Prototype cluster

9.1.2. Small production cluster (10–20 servers)

9.1.3. Medium production cluster (up to ~50 servers)

9.1.4. Large production cluster (>~50 servers)

9.1.5. Hadoop Master nodes

9.1.6. HBase Master

9.1.7. Hadoop DataNodes and HBase RegionServers

9.1.8. ZooKeeper(s)

9.1.9. What about the cloud?

9.2. Deploying software

9.2.1. Whirr: deploying in the cloud

9.3. Distributions

9.3.1. Using the stock Apache distribution

9.3.2. Using Cloudera’s CDH distribution

9.4. Configuration

9.4.1. HBase configurations

9.4.2. Hadoop configuration parameters relevant to HBase

9.4.3. Operating system configurations

9.5. Managing the daemons

9.6. Summary

Chapter 10. Operations

10.1. Monitoring your cluster

10.1.1. How HBase exposes metrics

10.1.2. Collecting and graphing the metrics

10.1.3. The metrics HBase exposes

10.1.4. Application-side monitoring

10.2. Performance of your HBase cluster

10.2.1. Performance testing

10.2.2. What impacts HBase’s performance?

10.2.3. Tuning dependency systems

10.2.4. Tuning HBase

10.3. Cluster management

10.3.1. Starting and stopping HBase

10.3.2. Graceful stop and decommissioning nodes

10.3.3. Adding nodes

10.3.4. Rolling restarts and upgrading

10.3.5. bin/hbase and the HBase shell

10.3.6. Maintaining consistency—hbck

10.3.7. Viewing HFiles and HLogs

10.3.8. Presplitting tables

10.4. Backup and replication

10.4.1. Inter-cluster replication

10.4.2. Backup using MapReduce jobs

10.4.3. Backing up the root directory

10.5. Summary

Appendix A. Exploring the HBase system

A.1. Exploring ZooKeeper

A.2. Exploring -ROOT-

A.3. Exploring .META.

Appendix B. More about the workings of HDFS

B.1. Distributed file systems

B.2. Separating metadata and data: NameNode and DataNode

B.3. HDFS write path

B.4. HDFS read path

B.5. Resilience to hardware failures via replication

B.6. Splitting files across multiple DataNodes

Index

List of Figures

List of Tables

List of Listings
