Book Description

With so many interacting components, the number of things that can go wrong in a distributed system is enormous. You’ll never be able to prevent all possible failure modes, but you can identify many of the weaknesses in your system before real-world events trigger them. This report introduces you to Chaos Engineering, a method of experimenting on infrastructure that lets you expose weaknesses before they become a real problem.

Members of the Netflix team that developed Chaos Engineering explain how to apply these principles to your own system. By introducing controlled experiments, you’ll learn how emergent behavior from component interactions can cause your system to drift into an unsafe, chaotic state.

  • Hypothesize about steady state by collecting data on the health of the system
  • Vary real-world events by turning off a server to simulate regional failures
  • Run your experiments as close to the production environment as possible
  • Ramp up your experiment by automating it to run continuously
  • Minimize the effects of your experiments to keep from blowing everything up
  • Learn the process for designing Chaos Engineering experiments
  • Use the Chaos Maturity Model to map the state of your chaos program, including realistic goals

Table of Contents

  1. I. Introduction
  2. 1. Why Do Chaos Engineering?
    1. How Does Chaos Engineering Differ from Testing?
    2. It’s Not Just for Netflix
    3. Prerequisites for Chaos Engineering
  3. 2. Managing Complexity
    1. Understanding Complex Systems
    2. Example of Systemic Complexity
    3. Takeaway from the Example
  4. II. The Principles of Chaos
    1. Experimentation
    2. Advanced Principles
  5. 3. Hypothesize about Steady State
    1. Characterizing Steady State
    2. Forming Hypotheses
  6. 4. Vary Real-World Events
  7. 5. Run Experiments in Production
    1. State and Services
    2. Input in Production
    3. Other People’s Systems
    4. Agents Making Changes
    5. External Validity
    6. Poor Excuses for Not Practicing Chaos
      1. I’m pretty sure it will break!
      2. If it does break, we’re in big trouble!
    7. Get as Close as You Can
  8. 6. Automate Experiments to Run Continuously
    1. Automatically Executing Experiments
    2. Automatically Creating Experiments
  9. 7. Minimize Blast Radius
  10. III. Chaos In Practice
  11. 8. Designing Experiments
    1. 1. Pick a Hypothesis
    2. 2. Choose the Scope of the Experiment
    3. 3. Identify the Metrics You’re Going to Watch
    4. 4. Notify the Organization
    5. 5. Run the Experiment
    6. 6. Analyze the Results
    7. 7. Increase the Scope
    8. 8. Automate
  12. 9. Chaos Maturity Model
    1. Sophistication
    2. Adoption
    3. Draw the Map
  13. 10. Conclusion
    1. Resources