Book Description

Many organizations have begun to rethink the strategy of allowing regional teams to maintain independent databases that are periodically consolidated with the head office. As businesses extend their reach globally, these hierarchical approaches no longer work. Instead, an enterprise’s entire data infrastructure—including multiple types of data persistence—needs to be shared and updated everywhere at the same time with fine-grained control over who has access.

This practical report examines the requirements and challenges of constructing a geo-distributed data platform, including examples of specific technologies designed to meet them. Authors Ted Dunning and Ellen Friedman also provide real-world use cases that show how low-latency geo-distribution of very large-scale data and computation provides a competitive edge.

With this report, you’ll explore:

  • How replication and mirroring methods for data movement provide the large scale, low latency, and low cost that systems demand
  • The importance of multimaster replication of data streams and databases
  • Advantages (and disadvantages) of cloud neutrality, cloud bursting, and hybrid cloud architecture for transferring data
  • Why effective data governance is a complex process that requires the right tools for controlling and monitoring geo-distributed data
  • How to make containers work for geo-distributed data at scale, even where stateful applications are involved
  • Use cases that demonstrate how telecoms and online advertisers distribute large quantities of data

Table of Contents

  1. Why Geo-Distribution Matters
    1. Goals of Modern Geo-Distributed Systems
      1. Global View: Data Storage and Computation
    2. Moving Data: Replication and Mirroring
      1. Remote Mirroring
      2. Remote Replication
      3. Why Multi-Master Geo-Replication Matters
      4. Conflict Resolution: The Question of Consistency
      5. Beyond Database Replication: Streaming Data
    3. Clouds and Geo-distribution
      1. The Core Trend to Cloud
      2. Cloud Neutrality
      3. Cloud Bursting and Load Leveling
      4. Hybrid Cloud Architectures
    4. Global Data Governance
      1. Global Namespace
      2. Data Sovereignty and Geo-Distribution
    5. Containers for Big Data
      1. Key Problem: Stateful Containers
      2. How to Make Stateful Containers Work for Big Data
      3. Example: Handling State and Containers on MapR
      4. Summary
    6. Use Case: Geo-Replication in Telecom
    7. It’s Actually Broken Even If It Works Most of the Time
    8. Use Case: Shared Inventory in Ad Tech
  2. A. Additional Resources
    1. Selected O’Reilly Publications by Ted Dunning and Ellen Friedman
    2. O’Reilly Publication by Ellen Friedman and Kostas Tzoumas