Book Description

Learn how to train site reliability engineers at your organization in both general and domain-specific subject matter. With this detailed guide from Google’s SRE team, you’ll not only learn a set of training best practices Google uses for ramping up new SREs; you’ll also explore use cases from smaller organizations that have successfully trained people for SRE (or SRE-like) functions.

To be effective, SRE training should be purposefully designed to fit both your company’s needs and your audience. While much of this report focuses on the specific experience of Google SRE, it also explains the theory behind training design and presents best practices and lessons learned throughout the industry over the past several years.

  • Explore several SRE training use cases, complete with optimal approaches and trade-offs
  • Build training materials, experiences, and activities to onboard SREs and provide continuous education
  • Dive into case studies that demonstrate SRE training in practice at both large and small organizations
  • Use instructional design principles to develop the most effective learning solutions, whether you’re training brand-new SREs or experienced engineers
  • Learn how to apply SRE principles to run an SRE training program consistently and reliably

Table of Contents

  1. Preface
    1. Acknowledgments
  2. 1. Identifying Your SRE Training Needs
    1. Organizational Maturity
    2. Organizational Familiarity
    3. SRE Experience
    4. Types of Skills to Develop
      1. Skills That Support a Career Shift Toward SRE
      2. Troubleshooting Skills
      3. Training That Supports a Culture Shift and Promotes Trust
      4. Incident Management Training and the Corresponding Soft Skills
    5. An Introduction to SRE Training Techniques
      1. Sink or Swim
      2. Self-Study
      3. Buddy System
      4. Ad Hoc Classes
      5. Systematic Training Program
      6. Teaching to Learn
    6. Conclusion
  3. 2. Use Cases
    1. Organizations Adopting the SRE Model
      1. Building a Training Program to Drive Adoption of the SRE Model
      2. Encountering Resistance
      3. Receptive, Resistant, and Catalytic Individuals
      4. Benefits of Implementing SRE
      5. Convince Teams of the Benefits of the SRE Model
      6. Organization Size and Rate of Growth
    2. Organizations with an Established SRE Team or Teams
      1. Newbies
      2. Old-Timers
      3. Internal Transfers
      4. Industry Veterans
      5. Choosing Your Training Solution
    3. New Team Members on an Existing SRE Team
      1. Instill Key Elements of SRE Culture
    4. Experienced SREs Transferring to a New Team
    5. Experienced SREs at a New Company with an Existing SRE Culture and Practice
    6. Conclusion
  4. 3. Case Studies
    1. Training in a Large Organization
      1. Stages of Training
      2. Summary
    2. SRE Training in Smaller Organizations
      1. Applying What They’ve Learned
      2. Company X
      3. Readiness
      4. Continuous Development
    3. Conclusion
  5. 4. Instructional Design Principles
    1. Identifying Training Needs
    2. Build Your Learner Profile
    3. Consider Your Culture
      1. Storytelling
      2. Build the Vocabulary
    4. Consider Your Learners
      1. Adult Learners
      2. Learning Modalities
      3. Instructor-Led Training
      4. Self-Paced Training
      5. Mentoring and Shadowing
    5. Create Learning Objectives
    6. Designing Training Content
      1. ADDIE Model
      2. Modular Design
      3. Train the Trainers
      4. Pilot, Pilot, Pilot
    7. Making Training Hands-On
      1. The Breakage Service
      2. Scaffolding: Build, Stretch, and Reach
    8. Evaluating Training Outcomes
      1. When We Evaluate
      2. How We Evaluate
    9. Instructional Design Principles at Scale
    10. Conclusion
  6. 5. How to “SRE” an SRE Training Program
    1. Applying SRE Principles to Your Training
      1. “SRE’ing” Your SRE Training Program
      2. Monitoring and Measuring
      3. Incident Response
      4. Postmortem and Analysis of Root Causes
      5. Testing and Release Procedures
      6. Capacity Planning
      7. Development
      8. Product
      9. Other Considerations
    2. Managing SRE Training Materials
      1. Strategies for Discoverability
      2. Content Curation
      3. Content Freshness and Reliability
    3. Conclusion
  7. 6. Summary and Conclusions
  8. A. Example Training Design Document
    1. World of SRE Design Doc
      1. Learner Profile