Book Description

A single dramatic software failure can cost a company millions of dollars - but can be avoided with simple changes to design and architecture. This new edition of the best-selling industry standard shows you how to create systems that run longer, with fewer failures, and recover better when bad things happen. New coverage includes DevOps, microservices, and cloud-native architecture. Stability antipatterns have grown to include systemic problems in large-scale systems. This is a must-have pragmatic guide to engineering for production systems.

If you're a software developer, and you don't want to get alerts every night for the rest of your life, help is here. With a combination of case studies about huge losses - lost revenue, lost reputation, lost time, lost opportunity - and practical, down-to-earth advice that was all gained through painful experience, this book helps you avoid the pitfalls that cost companies millions of dollars in downtime and reputation. Eighty percent of project life-cycle cost is in production, yet few books address this topic.

This updated edition deals with the production of today's systems - larger, more complex, and heavily virtualized - and includes information on chaos engineering, the discipline of applying randomness and deliberate stress to reveal systemic problems. Build systems that survive the real world, avoid downtime, implement zero-downtime upgrades and continuous delivery, and make cloud-native applications resilient. Examine ways to architect, design, and build software - particularly distributed systems - that stands up to the typhoon winds of a flash mob, a Slashdotting, or a link on Reddit. Take a hard look at software that failed the test and find ways to make sure your software survives.

To skip the pain and get the experience...get this book.

Table of Contents

  1.  Acknowledgments
  2.  Preface
    1. Who Should Read This Book
    2. How This Book Is Organized
    3. About the Case Studies
    4. Online Resources
  3. 1. Living in Production
    1. Aiming for the Right Target
    2. The Scope of the Challenge
    3. A Million Dollars Here, a Million Dollars There
    4. Use the Force
    5. Pragmatic Architecture
    6. Wrapping Up
  4. Part I. Create Stability
    1. 2. Case Study: The Exception That Grounded an Airline
      1. The Change Window
      2. The Outage
      3. Consequences
      4. Postmortem
      5. Hunting for Clues
      6. The Smoking Gun
      7. An Ounce of Prevention?
    2. 3. Stabilize Your System
      1. Defining Stability
      2. Extending Your Life Span
      3. Failure Modes
      4. Stopping Crack Propagation
      5. Chain of Failure
      6. Wrapping Up
    3. 4. Stability Antipatterns
      1. Integration Points
      2. Chain Reactions
      3. Cascading Failures
      4. Users
      5. Blocked Threads
      6. Self-Denial Attacks
      7. Scaling Effects
      8. Unbalanced Capacities
      9. Dogpile
      10. Force Multiplier
      11. Slow Responses
      12. Unbounded Result Sets
      13. Wrapping Up
    4. 5. Stability Patterns
      1. Timeouts
      2. Circuit Breaker
      3. Bulkheads
      4. Steady State
      5. Fail Fast
      6. Let It Crash
      7. Handshaking
      8. Test Harnesses
      9. Decoupling Middleware
      10. Shed Load
      11. Create Back Pressure
      12. Governor
      13. Wrapping Up
  5. Part II. Design for Production
    1. 6. Case Study: Phenomenal Cosmic Powers, Itty-Bitty Living Space
      1. Baby’s First Christmas
      2. Taking the Pulse
      3. Thanksgiving Day
      4. Black Friday
      5. Vital Signs
      6. Diagnostic Tests
      7. Call In a Specialist
      8. Compare Treatment Options
      9. Does the Condition Respond to Treatment?
      10. Winding Down
    2. 7. Foundations
      1. Networking in the Data Center and the Cloud
      2. Physical Hosts, Virtual Machines, and Containers
      3. Wrapping Up
    3. 8. Processes on Machines
      1. Code
      2. Configuration
      3. Transparency
      4. Wrapping Up
    4. 9. Interconnect
      1. Solutions at Different Scales
      2. DNS
      3. Load Balancing
      4. Demand Control
      5. Network Routing
      6. Discovering Services
      7. Migratory Virtual IP Addresses
      8. Wrapping Up
    5. 10. Control Plane
      1. How Much Is Right for You?
      2. Mechanical Advantage
      3. Platform and Ecosystem
      4. Development Is Production
      5. System-Wide Transparency
      6. Configuration Services
      7. Provisioning and Deployment Services
      8. Command and Control
      9. The Platform Players
      10. The Shopping List
      11. Wrapping Up
    6. 11. Security
      1. The OWASP Top 10
      2. The Principle of Least Privilege
      3. Configured Passwords
      4. Security as an Ongoing Process
      5. Wrapping Up
  6. Part III. Deliver Your System
    1. 12. Case Study: Waiting for Godot
    2. 13. Design for Deployment
      1. So Many Machines
      2. The Fallacy of Planned Downtime
      3. Automated Deployments
      4. Continuous Deployment
      5. Phases of Deployment
      6. Deploy Like the Pros
      7. Wrapping Up
    3. 14. Handling Versions
      1. Help Others Handle Your Versions
      2. Handle Others’ Versions
      3. Wrapping Up
  7. Part IV. Solve Systemic Problems
    1. 15. Case Study: Trampled by Your Own Customers
      1. Countdown and Launch
      2. Aiming for Quality Assurance
      3. Load Testing
      4. Murder by the Masses
      5. The Testing Gap
      6. Aftermath
    2. 16. Adaptation
      1. Convex Returns
      2. Process and Organization
      3. System Architecture
      4. Information Architecture
      5. Wrapping Up
    3. 17. Chaos Engineering
      1. Breaking Things to Make Them Better
      2. Antecedents of Chaos Engineering
      3. The Simian Army
      4. Adopting Your Own Monkey
      5. Disaster Simulations
      6. Wrapping Up
  8.  Bibliography