They always say time changes things, but you actually have to change them yourself.
Andy Warhol
Previous chapters have explored how to get buy-in from your organization when adopting SLOs, and the importance of building an SLO culture. By now, you understand why your organization needs SLOs and how much they will impact your engineering processes and your users. You can’t wait to start advocating for SLOs across your organization!
But wait—there are three questions you need to answer before beginning:
Do you have leadership buy-in on implementing SLOs in your organization? It can be helpful to have an executive sponsor with a vested interest in SLO implementation who will be able to support you and unblock you in the case of conflicts of interest.
Does your management chain agree on expectations and the time investment required to drive SLO adoption? Being an SLO Advocate will probably be a full-time job for you for the first few months (or even longer if you’re in a large organization).
Are you ready to have a horizontal role that impacts an entire organization? Such a role will require skills in multiple domains, from communicating with senior leadership and stakeholders, to writing documentation, to analyzing data and building reporting, to reviewing monitoring implementation, to delivering training, and more.
If you answered yes to all three questions, congratulations: you’re ready to be an SLO Advocate. But what does that mean?
Your role is to help your organization successfully implement SLOs by doing all of the following:
Cultivating a deep understanding of SLO implementation for different types of services (see Chapter 4)
Understanding your monitoring platform, statistics in general, and your organization’s data visualization platform (to do instrumentation and produce dashboards and other outputs for your metric signals)
Most importantly, motivating and inspiring people to reach beyond their current role and scope
Your people and leadership skills will be critical during this journey: you will need to convince others of your vision, teach them what they need to know, and generate positive energy to inspire them and drive successful SLO adoption.
It will help a lot if you have experience designing training materials and delivering technical training. If you don’t, you might want to team up with someone who does. If teaming up isn’t an option, however, don’t worry: this chapter gives you some tips on how to do all of this successfully. It focuses on activities, artifacts, and processes for SLO adoption that have worked in other organizations.
In sum, becoming an SLO Advocate is an opportunity to improve your leadership, engineering, and project management skills, while creating positive change in your organization.
As with many things in life, and especially engineering projects, it’s best to start small and iterate as you go. We can break this journey into three phases: Crawl, Walk, and Run.
Throughout each phase, make sure you are soliciting feedback to understand whether what you’re doing is working or you need to change your approach. You’ll face some challenges (we’ll talk about those in more detail in “Learn How to Handle Challenges”), and at times you may even think you have failed. But remember: you learn more from failure than you do from success.
The Crawl phase of your SLO advocacy journey is where you’ll build the foundation of your program. You’ll educate yourself, create artifacts to help you spread your message, start to connect with leaders and teams in your organization, and run your first few training sessions.
First, you need to become an SLO expert yourself. The fact that you’re reading this book shows that you’re on the right path! You should also read the chapters on SLOs in Site Reliability Engineering and The Site Reliability Workbook.
There are many online resources you can use to deepen your knowledge, from technical conferences to written works to online courses. Choose what works best for you. Define your learning process in advance and track it as you would any other activity. Learning without a timeline and without clear outcomes may become demotivating.
We also recommend creating a working example of a service with SLOs while you’re learning. Having a concrete, small example to apply each new piece of knowledge to will keep you focused and help you retain what you learn.
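The core arithmetic of such a working example can be very small. The sketch below assumes a toy request/response service whose requests are already logged as status codes and latencies. The `Request` type, the 300 ms threshold, and the function names are hypothetical and do not come from any particular monitoring platform. It computes an availability SLI, a latency SLI, and how much of the error budget remains for a given SLO target.

```python
from dataclasses import dataclass


@dataclass
class Request:
    """One logged request from a hypothetical toy service."""
    status_code: int
    latency_ms: float


def availability_sli(requests: list[Request]) -> float:
    """Fraction of requests that did not fail with a server error (5xx)."""
    if not requests:
        return 1.0  # no traffic: treat as fully available
    good = sum(1 for r in requests if r.status_code < 500)
    return good / len(requests)


def latency_sli(requests: list[Request], threshold_ms: float = 300.0) -> float:
    """Fraction of requests served within the (illustrative) latency threshold."""
    if not requests:
        return 1.0
    fast = sum(1 for r in requests if r.latency_ms <= threshold_ms)
    return fast / len(requests)


def error_budget_remaining(sli: float, slo_target: float) -> float:
    """Share of the error budget still unspent; negative means it is overspent."""
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be strictly between 0 and 1")
    allowed_bad = 1.0 - slo_target  # the error budget, as a fraction of requests
    actual_bad = 1.0 - sli
    return 1.0 - actual_bad / allowed_bad
```

Suppose there are 1,000 requests and 2 of them returned a 5xx. The availability SLI is then 0.998. A 99.5% SLO allows 5 bad requests out of 1,000, so 2 have been used and roughly 60% of the error budget remains.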
It usually takes me more than three weeks to prepare a good impromptu speech.
Mark Twain
Imagine you meet the CEO of your company in the elevator, and they ask you what you do. With only a few seconds of their attention, what would you say?
Spend some time preparing your SLO “elevator pitch,” and remember to adapt it to different audiences. You should be able to articulate the value of SLOs and why others should care about them, and you should be able to do this when speaking to people with all different perspectives.
Engineers usually want to know whether their service is working well enough, so shape your conversations with this goal in mind. You could talk about real-time measures of service reliability and the ability to get insights into the health of service dependencies; about correlating different signals and using that data to detect service degradations; and about embedding SLI metrics into your CI/CD pipeline to detect regressions and perform automatic rollbacks. Don’t forget to mention how SLOs and error budgets help teams prioritize service incidents correctly and improve alerting efficiency, both of which are especially important to on-call engineers.
With leadership and business stakeholders, you can talk about real-time measures of user experience and satisfaction, or about a data-driven approach to service reliability that makes it possible to prioritize efforts that improve user satisfaction where it really matters. Mention that SLOs will help them identify the right investment opportunities, because “reliability is the most important feature of any system” and the only way to gain your users’ trust is to provide reliable and secure systems.
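When you talk to engineers, the CI/CD idea becomes more concrete with a small decision function. This is a sketch under stated assumptions, not a reference implementation. It assumes your pipeline can fetch good and total event counts for a canary and for the current production baseline. The names `SliWindow` and `canary_verdict`, the 0.1 percentage-point regression tolerance, and the 500-request minimum are all hypothetical values you would tune.

```python
from dataclasses import dataclass


@dataclass
class SliWindow:
    """Good and total event counts for one deployment over the same time window."""
    good: int
    total: int

    @property
    def ratio(self) -> float:
        return self.good / self.total if self.total else 1.0


def canary_verdict(
    canary: SliWindow,
    baseline: SliWindow,
    slo_target: float,
    max_regression: float = 0.001,  # hypothetical tolerance: 0.1 percentage points
    min_requests: int = 500,        # hypothetical minimum sample before judging
) -> str:
    """Return 'wait', 'rollback', or 'promote' for a canary deployment."""
    if canary.total < min_requests:
        return "wait"  # too little traffic to draw a conclusion
    if canary.ratio < slo_target:
        return "rollback"  # the canary on its own would miss the SLO
    if baseline.ratio - canary.ratio > max_regression:
        return "rollback"  # noticeably worse than what is already in production
    return "promote"
```

A pipeline step could call `canary_verdict` after the canary has served traffic for a fixed window, then trigger its existing rollback mechanism if the result is `"rollback"`. The function checks the canary against the baseline as well as the SLO target. That second comparison catches a regression even when the service still has plenty of error budget left.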
Feel free to pull from this book in your sales pitch. Identify the language that will resonate with your organization, and borrow it. A strong message that works for both audiences is this quote by Peter Drucker: “If you can’t measure it, you can’t improve it.” Above all, SLOs are new and better measurements of your service from your users’ point of view.
Having done your research, you have a good understanding of SLOs and are confident about defining them for a simple service. You’re also prepared to talk about the value of SLOs with anyone you encounter. Next, you can focus on creating artifacts to support your engineering organization as it adopts SLOs. These artifacts fall into two main categories: documentation and training materials.
Don’t forget to define where all your artifacts will live—for example, a wiki paired with a code repository—and make sure they’re discoverable and easy to navigate to. The biggest mistakes we see across engineering organizations are not taking the time to create well-structured and discoverable technical documentation, and not demanding that documentation undergo the same quality review process as code. Don’t underestimate the power of documentation to support you and your organization during SLO implementation.
You might be tempted to use this book as your documentation, and just ask everyone to read it. Some people in your organization may do that, but many will not. Even among those who do read it, people will take away different things from the book. It’s more effective to create your own documentation, tailored to your organization’s needs, than to expect everyone to read an entire book. See the next section to learn what types of documentation you need to increase your chances of successful SLO adoption.
Your goal is to break down SLO creation into three phases: define the SLO, collect SLIs, and, later, use the SLO. Here is a list of the documentation we recommend at this stage:
Collect a list of the questions you expect people to ask as they begin their own SLO journeys, and compile them into an FAQ document. To start with, you might include questions like:
What if my “user” is another service? Do I still need to care about SLOs?
What if my service’s dependencies don’t have SLOs?
How many SLOs should a service have? How many SLIs?
To supplement your documentation, we recommend developing a few training programs that you can run at this stage. Here are some ideas:
As a follow-on, walk your audience through your sample service and the instrumentation you implemented to collect SLIs. Building a hands-on lab for your audience may be time-consuming, but depending on the complexity of your platform, building it now may save you time in the future by establishing consistency in how to instrument SLIs.
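For the hands-on part, a very small instrumentation pattern is often enough to show the idea of counting good events and total events. The sketch below assumes an in-process Python service. It keeps counters in a plain dictionary as a stand-in for your monitoring platform. `track_sli`, `SLI_COUNTERS`, and `get_profile` are hypothetical names. In a real lab, you would export these counts through whatever metrics library your platform uses.

```python
import time
from collections import defaultdict
from functools import wraps

# In-memory counters standing in for a real metrics backend (hypothetical).
SLI_COUNTERS = defaultdict(lambda: {"good": 0, "total": 0})


def track_sli(name: str, latency_threshold_s: float):
    """Count a call as 'good' only if it returns without raising and within the threshold."""
    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            ok = False
            try:
                result = func(*args, **kwargs)
                ok = True
                return result
            finally:
                # Runs on success and on failure, so every call is counted in 'total'.
                elapsed = time.perf_counter() - start
                counters = SLI_COUNTERS[name]
                counters["total"] += 1
                if ok and elapsed <= latency_threshold_s:
                    counters["good"] += 1
        return wrapper
    return decorator


@track_sli("get_profile", latency_threshold_s=0.3)
def get_profile(user_id: int, backend_up: bool = True) -> dict:
    if not backend_up:
        raise RuntimeError("backend unavailable")  # simulated server-side failure
    return {"id": user_id}
```

This sketch treats every exception as a bad event. In practice, you would usually decide with the team which failures count against the SLO. For example, many teams exclude errors caused by invalid client input.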
Make sure to give your attendees a break between hands-on workshops.
People enjoy real-time, collaboration-based training more than watching recordings, and they get more out of it. In a large enough organization, you may be able to train a few other teachers to help with this. Adding new teachers has other advantages too, such as covering more time zones if your company is distributed and helping your colleagues with their career growth. But even with multiple teachers, if you need to reach thousands of people it won’t be feasible to train everyone in person, so you’ll need to scale your training by leveraging recordings and online collaboration.
Usually, if the training is lecture-based (that is, for sharing definitions and concepts), a recorded session works great. But if you need to teach people how to think in a specific way and to solve a problem, a collaborative environment will get you there much faster. The 3 hours of collaborative SLO workshops that we attended at a technical conference gave us as much experience defining SLOs as 40 hours of watching recordings and reading on our own, if not more. In person or online, you can build a collaborative environment by organizing your students in small groups of five or six people, assigning a mentor to each group, and defining the rules of engagement and the desired discussion outcomes. If you are doing this online, consider using a digital whiteboard and video chat to encourage more effective collaboration between group members.
Tell me and I forget. Teach me and I remember. Involve me and I learn.
Benjamin Franklin
Most likely, your first training and workshop will not be perfect. This is to be expected: remember that one of the foundations of an SLO-based approach is acknowledging that nothing is ever perfect, and this extends to your SLO advocacy efforts as well.
Make sure you create a survey to collect feedback from attendees and learn how you can improve the training. It’s also a good idea to deliver your first couple of training sessions to a friendly group of people who already have an idea of what SLOs are, and who can give you candid feedback on what’s missing from your training. As with everything in life, you may get it wrong at first. Listen to your attendees, iterate, and improve. Either online or in person, ensure that you create a collaborative environment among your students so they can practice discussing SLO-based approaches with each other while going through the exercises.
In the first SLO workshop we ran, the hands-on exercise was to define meaningful SLOs for a simple request and response service. After a few workshop sessions, we got clear feedback: the service type we were looking at was too easy. Attendees wanted exercises on working with more complex types of services—storage services, pipelines, continuous compute services, network services, serverless services, and so on. They also asked for a hands-on lab on instrumentation for SLIs, with examples of each service type for them to pick from. We took this feedback on board and added more and more service examples to the workshop over time.
To bring SLO adoption to the next level, you need an example of a real service that has implemented SLOs and can show how much those SLOs have impacted service reliability.
Choose one of the smaller services in your organization and develop SLOs for it. Request and response services without many dependencies are a good option. Define an SLO for the service, instrument it to collect SLIs, and build visualizations for the SLIs that demonstrate the value of SLIs and SLOs in improving service reliability. Work with the engineering team that owns the service to gather their feedback, make any necessary adjustments, and help them start to use SLIs and SLOs in their engineering practices. Then, crucially, document the pilot as a case study and add it to your documentation, so that other teams can read about the experience.
The single biggest problem in communication is the illusion that it has taken place.
George Bernard Shaw
Your next goal is to ensure that your organization is aware of the work you’re doing, the push toward SLO implementation, and any new content you’re building.
Talk at internal meetups or conferences. Organize engineering review sessions to deep-dive into SLO implementation examples. Use every opportunity to talk about SLOs at your internal community events. You need to make sure people know what your role is as SLO Advocate, and how they can find you.
Publish the schedule for your next training sessions, list the experts who are available to help, and share how people across your organization can get in touch. Understand what channels your organization uses to share information. Make sure you have an SLO landing page with an easy-to-read shortlink, and share it over and over until everyone knows where to find information on SLOs.
Here are some ideas for what information to publish on the landing page:
Your training schedule
An email distribution list that can be used to ask SLO experts questions
A list of Slack or Teams channels dedicated to SLOs
An SLO newsletter, and a distribution list people can subscribe to
Your office hours schedule
As an SLO Advocate, you are the agent of change. You may encounter some challenges in this journey, and facing them with an open mind and positivity is critical to your program’s success. Remember, we learn more from failures than from successes! Here are a few of the issues you may run into, and our suggestions for dealing with them.
First of all, your role may be misunderstood. Some teams will expect you to do SLO implementation for them. You can overcome this obstacle by setting the right expectations, up front and clearly, when you engage with teams. You might also encounter people who simply don’t know what you’re doing at all. You can add clarity to your role by making sure you have a clear backlog and you are tracking SLO advocacy work in the same way you would track your engineering work, breaking it down by artifacts, activities, and other deliverables. Tracking your work will also help you do retrospectives for this program and communicate your deliverables and timeline clearly.
Second, you may encounter some resistance while trying to implement changes to the processes and practices used by your partner teams. People are naturally resistant to change, and you should be prepared for pushback as a result of this. Breaking the changes into smaller iterations and making them easy to implement will help you to overcome change aversion. Try to use your earliest successes to build confidence in what you are doing and turn those teams into SLO evangelists—it may help their career growth and will help your organization to adopt an SLO-based approach faster.
Third, as we mentioned before, you may encounter teams that are overloaded with work or that have well-defined priorities that are not SLOs. In these situations, work with leadership to prioritize SLO work against the team’s other responsibilities.
In the Crawl phase of your SLO advocacy journey, you laid the foundation for SLO implementation in your engineering organization and had some initial success. Now, in the Walk phase, you’ll expand your work to other teams and continue building a library of examples. You’ll also make sure your feedback loops and internal communication methods are working well, expand your training program, and revisit how much time you spend working with each team.
By now, you already have one or two services piloting SLO implementation and working on incorporating SLOs into their engineering practices, aiming to improve service reliability. You need to move further, but you can’t tackle all the services in your organization at once. (I’m assuming you have more than three services in your organization; if you don’t, congratulations, you may be very close to completion!)
Next, choose a handful of services that will implement SLOs with white-glove assistance from you. It’s a good idea to pick services of different types (request/response, pipeline, continuous compute, etc.). It may be tempting to keep focusing on request and response services, but you need to build a body of real-life SLO implementation examples for different service types.
Variety is not the only criterion to use when choosing which services receive white-glove assistance. Other things to consider are:
Level of complexity (look for variation here too)
Amenability of the teams
Criticality of the service to the system
Closeness to human users
Schedule weekly meetings with the owners of each service to assist them on their SLO implementation journey. Define a timeline for completing the different SLO implementation phases, and keep each team accountable to ensure that this work doesn’t get deprioritized.
We can’t stress this enough: celebrating achievements will bring positive energy to your mission and accelerate SLO adoption. If teams are not moving forward fast enough (or not moving at all), try to understand what challenges they are facing. It’s not always aversion to change or conflicting priorities; teams may find that in order to implement SLOs, they first need to make fundamental changes to the way their service is built.
To help build confidence about SLO implementation for those services, you could try some of the following:
Start with something as simple as possible, even if that means the SLO isn’t as useful as intended—for example, a single, easy endpoint or a subset of the user flow.
Try measuring through synthetic monitoring or a dedicated client instead of instrumenting the service itself.
If the issue is with some other piece of knowledge that is missing from the team, try to get someone knowledgeable in that area to pair with the team. Gaps in knowledge could be in middleware patterns, particular frameworks or tooling, and so forth.
By now, you hopefully have multiple teams interested in implementing SLOs that are trying to get your assistance with that work. While continuing to work with your handful of early adopters, you need to make sure all these other teams have a good level of support, without overloading yourself. Your best friends in this phase will be well-structured documentation and a set of internal case studies teams can use to learn more. Being able to follow the example of your early adopters will help other engineering teams with their own SLO implementation, making it easier for you to scale this program.
If you find yourself receiving a lot of requests from teams to meet with you to ask questions about SLOs, define your boundaries. You can’t meet with every team individually; ask them to read the documentation and attend your office hours. Often, these teams will have one or two questions that can be answered in a few minutes in office hours, and there’s no need to schedule a separate meeting with them. The most frequent question will likely be, “My service looks like this; do you have any examples of SLO implementation for this service type?” Having case studies for different service types from your early adopters will help many other teams.
By now, if your organization is small, you might have trained everyone. But if you’re dealing with a medium or large enterprise, with teams distributed around the country or globe, it’s time to scale your training program. Train other people to deliver the training sessions and workshops you’ve created. If your organization has an internal training team, they might be able to support you in scaling your training. You might even consider handing off training to that team completely, and focusing on other aspects of SLO advocacy. Otherwise, seek out passionate individuals in the organizations you’ve been working with who understand SLOs well, and engage them to scale your training program. Handing off your training work to others will allow you to focus on the other important tasks in this phase.
Earlier, we mentioned that as your advocacy work ramps up you will no longer be able to spend as much time working with every single team. You need to scale how you engage and communicate with teams. Some activities and artifacts you might consider at this stage include:
Not everything can be scaled, unfortunately, so there will be exceptions when you may need to provide 1:1 consultancy to a team with in-person engagement. We recommend limiting this to large or core services that carry significant complexity, with multiple upstream dependencies.
To deal with time zone challenges and maintain your work/life balance, make sure you have at least one SLO expert per region who can provide support in local time. Leverage your early adopter teams to support other teams working on SLO implementation, too.
Communicate, communicate, communicate. Keep yourself accountable, and keep teams implementing SLOs accountable as well. These teams should capture their SLO work in the same work-tracking platform they use for engineering work (for example, Jira or Bugzilla). Build dashboards reporting on SLO adoption progress. Communicate as much as you can!
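An adoption dashboard can be fed by something as simple as each service’s current stage in the define-the-SLO, collect-SLIs, use-the-SLO progression described earlier in this chapter. This sketch assumes you record one stage per service. The stage names and the `adoption_report` function are hypothetical. The output is the percentage of services that have reached at least each stage, which is easy to chart over time.

```python
# Hypothetical stages, ordered from least to most mature.
STAGES = ["not_started", "slo_defined", "slis_collected", "slo_in_use"]


def adoption_report(services: dict[str, str]) -> dict[str, float]:
    """Percentage of services that have reached at least each stage."""
    if not services:
        return {stage: 0.0 for stage in STAGES}
    rank = {stage: i for i, stage in enumerate(STAGES)}
    unknown = set(services.values()) - set(rank)
    if unknown:
        raise ValueError(f"unknown stages: {sorted(unknown)}")
    total = len(services)
    return {
        # A service at a later stage also counts toward every earlier stage.
        stage: 100.0 * sum(1 for s in services.values() if rank[s] >= rank[stage]) / total
        for stage in STAGES
    }
```

Consider four services at the stages `slo_in_use`, `slis_collected`, `slo_defined`, and `not_started`. For them, the report shows 100%, 75%, 50%, and 25% for the four stages in order.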
When you get to the Run phase, SLO implementation is going viral and everyone is at some stage of SLO maturity. (If this isn’t the case, you might want to consider going back to the Crawl or Walk phases and continue iterating on them until you achieve enough momentum to move to the Run phase.)
In the Run phase, your role is to use what you’ve learned so far and keep improving, by sharing your library of case studies, creating a community of SLO experts, driving platform improvements, and improving your advocacy process. Remember that defining and implementing SLOs is just a first step toward improving reliability. The game changer is actually using SLOs as part of your engineering practices to drive service quality and operational excellence.
Build an internal community of SLO experts, pulling from your early adopters, your other trainers, and anyone else who is passionate about SLOs. SLO experts can support engineering teams across your organization by answering questions and helping with hands-on SLO implementation. Create an email distribution list for these SLO experts, or use other internal communication channels (for example, chat) to give teams an easy way to reach them.
As you go on, continue to improve your platform, review your SLOs, and update your documentation.
Based on what you’ve learned so far, you may have discovered that you need to make some changes at the platform level, or that you need to rethink your observability strategy or reporting toolset. Work with your internal partners on defining those platform improvements.
Even if you’ve had some initial success implementing SLOs for a service, you should review them again a month or two later. Remember, SLOs are a process, not a project. Services evolve and platforms change. What worked well before may no longer be relevant. Use your SLO maturity framework to review SLOs for specific services periodically.
Other ways you can keep improving include:
Lastly, review your existing documentation periodically to make sure it’s up to date. You might even try defining a “freshness” SLO for your documentation and making sure you maintain it above a certain level!
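A documentation freshness SLO can use the same good-events-over-total-events shape as any other SLI. This minimal sketch assumes you can export a last-reviewed date for each page. The 90-day window and the function names are illustrative assumptions, not recommendations.

```python
from datetime import date, timedelta


def freshness_sli(last_reviewed: dict[str, date], as_of: date, max_age_days: int = 90) -> float:
    """Fraction of pages reviewed within the last `max_age_days` days."""
    if not last_reviewed:
        return 1.0
    cutoff = as_of - timedelta(days=max_age_days)
    fresh = sum(1 for reviewed in last_reviewed.values() if reviewed >= cutoff)
    return fresh / len(last_reviewed)


def stale_pages(last_reviewed: dict[str, date], as_of: date, max_age_days: int = 90) -> list[str]:
    """Pages whose last review is older than the window, sorted by name."""
    cutoff = as_of - timedelta(days=max_age_days)
    return sorted(page for page, reviewed in last_reviewed.items() if reviewed < cutoff)
```

Compare the result of `freshness_sli` against a target you choose (say, 0.9) to see whether the documentation is within its SLO. `stale_pages` gives you the review backlog to work through.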
Progress is impossible without change, and those who cannot change their minds cannot change anything.
George Bernard Shaw
This chapter looked at the different phases of the SLO advocacy journey, and the recommended goals and tasks for each one. Your role as an agent of change, seeking not just to implement SLOs in your organization but also to build an SLO culture, is one of the most challenging roles. To help ensure your success, make sure you have executive support and surround yourself with people who believe in your mission and who will keep you accountable. Iterate on everything, overcommunicate, and don’t forget to celebrate successes, no matter how small.