Book Description

Every enterprise application creates data, whether it consists of log messages, metrics, user activity, outgoing messages, or something else. Moving all of this data is just as important as the data itself. This book’s updated second edition shows application architects, developers, and production engineers new to the Kafka open source streaming platform how to handle real-time data feeds. Additional chapters cover Kafka’s AdminClient API, new security features, and tooling changes.

Engineers from Confluent and LinkedIn responsible for developing Kafka explain how to deploy production Kafka clusters, write reliable event-driven microservices, and build scalable stream processing applications with this platform. Through detailed examples, you’ll learn Kafka’s design principles, reliability guarantees, key APIs, and architecture details, including the replication protocol, the controller, and the storage layer.

You’ll examine:

  • How publish-subscribe messaging fits in the big data ecosystem
  • Kafka producers and consumers for writing and reading messages
  • Patterns and use-case requirements to ensure reliable data delivery
  • Best practices for building data pipelines and applications with Kafka
  • How to perform monitoring, tuning, and maintenance tasks with Kafka in production
  • The most critical metrics among Kafka’s operational measurements
  • Kafka’s delivery capabilities for stream processing systems

Table of Contents

  1. Meet Kafka
    1. Publish/Subscribe Messaging
      1. How It Starts
      2. Individual Queue Systems
    2. Enter Kafka
      1. Messages and Batches
      2. Schemas
      3. Topics and Partitions
      4. Producers and Consumers
      5. Brokers and Clusters
      6. Multiple Clusters
    3. Why Kafka?
      1. Multiple Producers
      2. Multiple Consumers
      3. Disk-Based Retention
      4. Scalable
      5. High Performance
    4. The Data Ecosystem
      1. Use Cases
    5. Kafka’s Origin
      1. LinkedIn’s Problem
      2. The Birth of Kafka
      3. Open Source
      4. Commercial Engagement
      5. The Name
    6. Getting Started with Kafka
  2. Kafka Producers: Writing Messages to Kafka
    1. Producer Overview
    2. Constructing a Kafka Producer
    3. Sending a Message to Kafka
      1. Sending a Message Synchronously
      2. Sending a Message Asynchronously
    4. Configuring Producers
      1. client.id
      2. acks
      3. Message Delivery Time
      4. linger.ms
      5. compression.type
      6. batch.size
      7. max.in.flight.requests.per.connection
      8. max.request.size
      9. receive.buffer.bytes and send.buffer.bytes
      10. enable.idempotence
    5. Serializers
      1. Custom Serializers
      2. Serializing Using Apache Avro
      3. Using Avro Records with Kafka
    6. Partitions
    7. Interceptors
    8. Quotas and Throttling
    9. Summary
  3. Managing Apache Kafka Programmatically
    1. AdminClient Overview
      1. Asynchronous and Eventually Consistent API
      2. Options
      3. Flat Hierarchy
      4. Additional Notes
    2. AdminClient Lifecycle: Creating, Configuring and Closing
      1. client.dns.lookup
      2. request.timeout.ms
    3. Essential Topic Management
    4. Configuration Management
    5. Consumer Group Management
      1. Exploring Consumer Groups
      2. Modifying Consumer Groups
    6. Cluster Metadata
    7. Advanced Admin Operations
      1. Adding Partitions to a Topic
      2. Deleting Records from a Topic
      3. Leader Election
      4. Reassigning Replicas
    8. Testing
    9. Summary
  4. Monitoring Kafka
    1. Metric Basics
      1. Where Are the Metrics?
      2. What Metrics Do I Need?
      3. Application Health Checks
    2. Service Level Objectives
      1. Service Level Definitions
      2. What Metrics Make Good SLIs
      3. Using SLOs in Alerting
    3. Kafka Broker Metrics
      1. Diagnosing Cluster Problems
      2. The Art of Under-Replicated Partitions
      3. Broker Metrics
      4. Topic and Partition Metrics
      5. JVM Monitoring
      6. OS Monitoring
      7. Logging
    4. Client Monitoring
      1. Producer Metrics
      2. Consumer Metrics
      3. Quotas
    5. Lag Monitoring
    6. End-to-End Monitoring
    7. Summary