Chapter 5. Patterns and Antipatterns of SRE

This IS NOT SRE

There are many ways that an attempt to implement SRE practices and teams can go wrong. You can find more on Twitter and in Chapter 23 of Seeking SRE, but here are some key problems to avoid:

Changing the name of any existing team (usually “ops”) to “SRE” without making the organizational adjustments required to enable them to do meaningful development work
Using the SRE team to shield devs from the pain of how their services really function in production
Failing to contain interrupts
Attempting to do SRE project work without the support (such as project managers and technical writers) that any other dev team would have. Because SREs spend only 50% of their time on project work, we contend that support structures are even more important for SRE teams to make effective use of their development time.
Valuing (perhaps simply through call-out recognition) incident response heroics over prudent design and preventative planning
Implementing processes or systems that slow down the delivery of value to customers without incontrovertible benefit
Building a “gatekeeper” team that functions as a chokepoint
Setting static or ill-considered SLOs
Thinking that SRE is a point solution to a particular problem rather than a fundamental cultural shift

This IS SRE

Hearkening back to the beginning:

SRE is an organizational model for running reliable online services by teams that are chartered to do reliability-focused engineering work.

As a discipline, SRE is devoted to helping an organization sustainably achieve the appropriate level of reliability for its services by implementing and continually improving data-informed production feedback loops to balance availability, performance, and agility.

Does it make sense for your company to commit heavily to reliability and pursue the implementation of SRE in your organization? Only you and the other leaders in your company can answer that question. Some companies will be at a size where having a distinct organizational component or team just does not fit, but the principles can be put in place to provide a foundation for the future.

As with any new methodology or cultural shift, implementing SRE takes time, grit, and humility to adjust to changing circumstances, but the payoff is an institutionalized commitment to the importance of the user’s interaction with your site, service, system, or other “online stuff.” Over time, as the SRE team(s) consistently represent reliability and operability concerns and actively contribute to the product codebase to improve reliability, feature developers will learn to factor these concerns into their plans as they develop new features. At that point, SREs will be able to shift their impact to a deeper and wider level, making next month’s problems different from today’s.

Our hope is that this brief introduction to Site Reliability Engineering has given you an effective understanding of the what and how of SRE. Many resources are available that go into greater detail. We’ve listed some of the best starting points for further reading in Appendix A.
