Appendix A. SLO Definition Template

As mentioned earlier, consistency is essential to making your SLOs understandable to everyone. This is where templatized SLO definition documents come into play.

SLO Definition: Service Name

SLO Dashboard: Link

Primary Author(s): Who owns this document and should be contacted about it?

Team: Who owns this service and should be contacted about it?

Collaborators: Anyone who contributed but isn’t a primary author?

Original Proposal Date: Date

Last Updated Date: Date

Approval Date: Date

Next Revisit Date: Date

Approver(s):

Approver | Status         | Date
Name     | Yes/No/Pending | YYYY-MM-DD
Insert as many rows as you need.

Service Overview

Briefly describe the service in question here. Keep things to about a paragraph. You can provide links to additional documentation about the service if you would like. Focus on the service from the viewpoint of its users (whether those are humans or other services).

SLIs and SLOs

Dashboard: Link to where people can get a visual representation of your performance.

Category     | SLI             | SLO
Part [a]     |                 |
Category [b] | Description [c] | SLO1 [d]
             | Query [e]       | SLO2 [f]
Part         |                 |
Category     | Description     | SLO1
             | Query           | SLO2
Insert or remove as many rows as you need.

[a] The part or component of your service that is being addressed by a certain SLI. For example, this could be an API, a public-facing HTTP server, a data processing pipeline, or something else. Your service might have only one or many components that warrant an SLI.

[b] The type of SLI being measured. For example, this could be availability, latency, data correctness, data freshness, and so on.

[c] A human-readable description of what is being measured. For example, “The proportion of successful HTTP requests from external sources.”

[d] The SLO that is being informed by the SLI in question. For example, “95% of requests < 200 ms.”

[e] The actual query against your systems that delivers the SLI.

[f] A single SLI might drive multiple SLOs. For example, while “95% of requests < 200 ms” might be your first SLO, you might also want to ensure that “98% of requests < 400 ms.”
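
To make the last two notes concrete, here is a minimal sketch of one latency SLI driving two SLOs. The class name, function names, and sample latencies are assumptions made up for illustration; in practice the measurement would come from the query you record in the template, not from an in-memory list.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LatencySLO:
    """One SLO driven by a latency SLI: `target` share of requests under `threshold_ms`."""
    name: str
    target: float        # e.g. 0.95 for "95% of requests"
    threshold_ms: float  # e.g. 200 for "< 200 ms"


def fraction_under(latencies_ms, threshold_ms):
    """SLI: proportion of requests that completed faster than threshold_ms."""
    if not latencies_ms:
        raise ValueError("no requests observed")
    good = sum(1 for latency in latencies_ms if latency < threshold_ms)
    return good / len(latencies_ms)


def evaluate(latencies_ms, slos):
    """Return {SLO name: (measured fraction, whether the target was met)}."""
    results = {}
    for slo in slos:
        measured = fraction_under(latencies_ms, slo.threshold_ms)
        results[slo.name] = (measured, measured >= slo.target)
    return results


# Hypothetical sample: 100 request latencies in milliseconds.
SAMPLE_LATENCIES_MS = [120] * 96 + [250] + [450] * 3

SLOS = [
    LatencySLO("SLO1", target=0.95, threshold_ms=200),  # 95% of requests < 200 ms
    LatencySLO("SLO2", target=0.98, threshold_ms=400),  # 98% of requests < 400 ms
]

RESULTS = evaluate(SAMPLE_LATENCIES_MS, SLOS)
for name, (measured, met) in RESULTS.items():
    print(f"{name}: {measured:.2%} measured, target met: {met}")
```

With this made-up sample, the first SLO is met while the second is not, which is the kind of slow tail that a second SLO on the same SLI is meant to catch.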

Rationale

Provide a short rationale for why these SLIs and SLOs were chosen. Try to keep this to a paragraph or so. You can link to additional documentation here if you would like.

Revision   | Date       | Details
Revision # | YYYY-MM-DD | Summary of the changes
Insert as many more rows as you need.

Revisit Schedule

Describe here how often you plan to revisit the defined values in this document and send it back out for approval. When first establishing your SLIs and SLOs, this should be frequent (once a month is a good starting point), but as your values become more in line with reality, you can scale this back to every quarter or even every year.

Error Budget Policy

Error budget | Threshold | Actions
SLO          | X         | Action to be taken
Insert as many more rows as you need.
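
As one possible way to read such a table, the sketch below treats each threshold as a fraction of the error budget consumed and returns the actions that have been triggered. The thresholds, action wording, and function names are hypothetical examples, not values the template prescribes.

```python
def error_budget_consumed(slo_target, good_events, total_events):
    """Fraction of the error budget used so far (1.0 means fully spent)."""
    if total_events <= 0:
        raise ValueError("total_events must be positive")
    if not 0 < slo_target < 1:
        raise ValueError("slo_target must be between 0 and 1, exclusive")
    allowed_bad = (1 - slo_target) * total_events  # the budget, in events
    actual_bad = total_events - good_events
    return actual_bad / allowed_bad


# Hypothetical policy rows: (fraction of budget consumed, action to be taken).
POLICY = [
    (0.50, "Review recent changes at the next team meeting"),
    (0.75, "Prioritize reliability work over new features"),
    (1.00, "Freeze feature releases until the budget recovers"),
]


def actions_triggered(consumed, policy=POLICY):
    """Return every action whose threshold has been reached."""
    return [action for threshold, action in policy if consumed >= threshold]


# Example: a 99.9% SLO over 1,000,000 requests, 800 of which failed.
CONSUMED = error_budget_consumed(0.999, good_events=999_200, total_events=1_000_000)
TRIGGERED = actions_triggered(CONSUMED)
print(f"Budget consumed: {CONSUMED:.0%}")
for action in TRIGGERED:
    print("-", action)
```

Here a 99.9% target allows roughly 1,000 failed requests, so 800 failures consume about 80% of the budget and trigger the first two hypothetical actions.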