97 Things Every SRE Should Know
Preface
How We Structured the Book
O’Reilly Online Learning
How to Contact Us
Acknowledgments
I. New to SRE
1. Site Reliability Engineering in Six Words
Alex Hidalgo
2. Do We Know Why We Really Want Reliability?
Niall Murphy
3. Building Self-Regulating Processes
Denise Yu
4. Four Engineers of an SRE Seder
Jacob Scott
5. The Reliability Stack
Alex Hidalgo
6. Infrastructure: It’s Where the Power Is
Charity Majors
7. Thinking About Resilience
Justin Li
8. Observability in the Development Cycle
Charity Majors and Liz Fong-Jones
9. There Is No Magic
Bouke van der Bijl
10. How Wikipedia Is Served to You
Effie Mouzeli
11. Why You Should Understand (a Little) About TCP
Julia Evans
12. The Importance of a Management Interface
Salim Virji
13. When It Comes to Storage, Think Distributed
Salim Virji
14. The Role of Cardinality
Charity Majors and Liz Fong-Jones
15. Security Is like an Onion
Lucas Fontes
16. Use Your Words
Tanya Reilly
17. Where to SRE
Fatema Boxwala
18. Dear Future Team
Frances Rees
19. Sustainability and Burnout
Denise Yu
20. Don’t Take Advice from Graybeards
John Looney
21. Facing That First Page
Andrew Louis
II. Zero to One
22. SRE, at Any Size, Is Cultural
Matthew Huxtable
23. Everyone Is an SRE in a Small Organization
Matthew Huxtable
24. Auditing Your Environment for Improvements
Joan O’Callaghan
25. With Incident Response, Start Small
Thai Wood
26. Solo SRE: Effecting Large-Scale Change as a Single Individual
Ashley Poole
27. Design Goals for SLO Measurement
Ben Sigelman
28. I Have an Error Budget—Now What?
Alex Hidalgo
29. How to Change Things
Joan O’Callaghan
30. Methodological Debugging
Avishai Ish-Shalom and Nati Cohen
31. How Startups Can Build an SRE Mindset
Tamara Miner
32. Bootstrapping SRE in Enterprises
Vanessa Yiu
33. It’s Okay Not to Know, and It’s Okay to Be Wrong
Todd Palino
34. Storytelling Is a Superpower
Anita Clarke
35. Get Your Work Recognized: Write a Brag Document
Julia Evans and Karla Burnett
III. One to Ten
36. Making Work Visible
Lorin Hochstein
37. An Overlooked Engineering Skill
Murali Suriar
38. Unpacking the On-Call Divide
Jason Hand
39. The Maestros of Incident Response
Andrew Louis
Stop the Bleeding
What’s Everyone Doing?
40. Effortless Incident Management
Suhail Patel, Miles Bryant, and Chris Evans
41. If You’re Doing Runbooks, Do Them Well
Spike Lindsey
42. Why I Hate Our Playbooks
Frances Rees
43. What Machines Do Well
Michelle Brush
44. Integrating Empathy into SRE Tools
Daniella Niyonkuru
45. Using ChatOps to Implement Empathy
Daniella Niyonkuru
46. Move Fast to Unbreak Things
Michelle Brush
47. You Don’t Know for Sure Until It Runs in Production
Ingrid Epure
48. Sometimes the Fix Is the Problem
Jake Pittis
49. Legendary
Elise Gale
50. Metrics Are Not SLIs (The Measure Everything Trap)
Brian Murphy
51. When SLOs Attack: Pathological SLOs and How to Fix Them
Narayan Desai
52. Holistic Approach to Product Reliability
Kristine Chen and Bart Ponurkiewicz
53. In Search of the Lost Time
Ingrid Epure
54. Unexpected Lessons from Office Hours
Tamara Miner
55. Building Tools for Internal Customers that They Actually Want to Use
Vinessa Wan
56. It’s About the Individuals and Interactions
Vinessa Wan
57. The Human Baseline in SRE
Effie Mouzeli
58. Remotely Productive or Productively Remote
Avleen Vig
59. Of Margins and Individuals
Kurt Andersen
60. The Importance of Margins in Systems
Kurt Andersen
61. Fewer Spreadsheets, More Napkins
Jacob Bednarz
62. Sneaking in Your DevOps Deliciously
Vinessa Wan
63. Effecting SRE Cultural Changes in Enterprises
Vanessa Yiu
64. To All the SREs I’ve Loved
Felix Glaser
65. Complex: The Most Overloaded Word in Technology
Laura Nolan
IV. Ten to Hundred
66. The Best Advice I Can Give to Teams
Nicole Forsgren
67. Create Your Supporting Artifacts
Daria Barteneva and Eva Parish
68. The Order of Operations for Getting SLO Buy-In
David K. Rensin
69. Heroes Are Necessary, but Hero Culture Is Not
Lei Lopez
70. On-Call Rotations that People Want to Join
Miles Bryant, Chris Evans, and Suhail Patel
71. Study of Human Factors and Team Culture to Improve Pager Fatigue
Daria Barteneva
72. Optimize for MTTBTB (Mean Time to Back to Bed)
Spike Lindsey
73. Mitigating and Preventing Cascading Failures
Rita Lu
74. On-Call Health: The Metric You Could Be Measuring
Caitie McCaffrey
75. Helping Leaders Prioritize On-Call Health
Caitie McCaffrey
Bring Quantitative Data
Link SLAs to On-Call Health
Treat On-Call Health like a Feature
Measure Attrition
76. The SRE as a Diplomat
Johnny Boursiquot
77. The Forward-Deployed SRE
Johnny Boursiquot
78. Test Your Disaster Plan
Tanya Reilly
79. Why Training Matters to an SRE Practice and SRE Matters to Your Training Program
Jennifer Petoff
80. The Power of Uniformity
Chris Evans, Suhail Patel, and Miles Bryant
81. Bytes per User Value
Arshia Mufti
82. Make Your Engineering Blog a Priority
Anita Clarke
83. Don’t Let Anyone Run Code in Your Context
John Looney
84. Trading Places: SRE and Product
Shubheksha Jalan
85. You See Teams, I See Product
Avleen Vig
86. The Performance Emergency Fund
Dawn Parzych
87. Important but Not Urgent: Roadmaps for SREs
Laura Nolan
V. The Future of SRE
88. That 50% Thing
Tanya Reilly
89. Following the Path of Safety-Critical Systems
Heidy Khlaaf
90. Applicable and Achievable Static Analysis
Heidy Khlaaf
91. The Importance of Formal Specification
Hillel Wayne
92. Risk and Rot in Sociotechnical Systems
Laura Nolan
93. SRE in Crisis
Niall Murphy
94. Expected Risk Limitations
Blake Bisset
95. Beyond Local Risk: Accounting for Angry Birds
Blake Bisset
96. A Word from Software Safety Nerds
J. Paul Reed
97. Incidents: A Window into Gaps
Lorin Hochstein
98. The Third Age of SRE
Björn “Beorn” Rabenstein
Contributors
Index
About the Editors
Part I. New to SRE