Book Description

Why have stream-oriented data systems become so popular, when batch-oriented systems have served big data needs for many years? In the updated edition of this report, Dean Wampler examines the rise of streaming systems for handling time-sensitive problems—such as detecting fraudulent financial activity as it happens. You’ll explore the characteristics of fast data architectures, along with several open source tools for implementing them.

Batch processing isn’t going away, but exclusive use of these systems is now a competitive disadvantage. You’ll learn that, while fast data architectures using tools such as Kafka, Akka, Spark, and Flink are much harder to build, they represent the state of the art for dealing with mountains of data that require immediate attention.

  • Learn how a basic fast data architecture works, step by step
  • Examine how Kafka’s data backplane combines the best abstractions of log-oriented and message queue systems for integrating components
  • Evaluate four streaming engines: Kafka Streams, Akka Streams, Spark, and Flink
  • Learn which streaming engines work best for different use cases
  • Get recommendations for making real-world streaming systems responsive, resilient, elastic, and message driven
  • Explore an example IoT streaming application that includes telemetry ingestion and anomaly detection

Table of Contents

  1. Introduction
    1. A Brief History of Big Data
    2. Batch-Mode Architecture
  2. The Emergence of Streaming
    1. Streaming Architecture
    2. What About the Lambda Architecture?
  3. Logs and Message Queues
    1. The Log Is the Core Abstraction
    2. Message Queues and Integration
    3. Combining Logs and Queues
    4. The Case for Apache Kafka
    5. Alternatives to Kafka
    6. When Should You Not Use a Log System?
  4. How Do You Analyze Infinite Data Sets?
    1. Streaming Semantics
    2. Which Streaming Engines Should You Use?
      1. Criteria for Evaluating Streaming Engines
      2. Spark and Flink: Scalable Data Processing Systems
      3. Akka Streams and Kafka Streams: Data-Centric Microservices
      4. Okay, So What Should I Use?
  5. Real-World Systems
    1. Some Specific Recommendations
  6. Example Application
    1. Other Machine Learning Considerations
  7. Recap and Where to Go from Here
    1. Additional References