Index

A

access, Other Qualities
accuracy, Accuracy-Accuracy, Validity
AIOps (Artificial intelligence operations)-style approach, Picking an SLO number is something a human should do
alert
- definition of, The Shortcomings of Simple Threshold Alerting
  - (see also threshold alert)
- dual, Rolling Windows
- error budgets and, Error Budgets and Response Time-Error Budget Burn Rate
- false positive as, Alert fatigue and fog of war
  - (see also alert fatigue, fog of war)
- recommendations, Parting Recommendations-Parting Recommendations
- troubleshooting, Troubleshooting with SLO Alerting-Run the old and new in parallel
- user experience and, A Better Way
alert attachment, SLO Alerting in a Brownfield Setup
alert fatigue, Alert fatigue and fog of war
Allspaw, John, Incidents are unique
Amazon S3, Durability
Amazon Web Services, Architectural Considerations: Hardware
Apache Flink, Low-Lag, High-Throughput Batch Processing
Apache Kafka, Low-Lag, High-Throughput Batch Processing
Apache Spark, Low-Lag, High-Throughput Batch Processing
approvers, Approvers-Approvers
architecture
- microservice-oriented, Reliability Engineering, Architectural Considerations: Hardware
- service-oriented, Architectural Considerations: Hardware, Architectural Considerations: Monolith or Microservices
- SLO-driven, Architectural Considerations: Hardware, Architectural Considerations: Hardware
Artificial intelligence operations (AIOps)-style approach, Picking an SLO number is something a human should do
Artificial Intelligence, A Modern Approach, Further Reading
asynchronous request, Asynchronous requests
auto-remediation, Error Budgets and Response Time
availability
- data application properties and, Availability-Availability
- definition of, Caring About Many Things
average, Expected value (see mean)
- (see also expected value)
Azure, Durability

B

Backblaze, Measuring Hardware
baseline, Rolling Windows
batch job, Batch jobs, Batch Latency
batch process (see batch request)
batch request, Batch requests-Batch requests
bathtub curve, SLI Example: Durability
Bayes’ theorem, Bayes’ theorem-Bayes’ theorem, Bayesian Inference-Bayesian Inference
Bell, Gordon, Architecting for Reliability
Bentley, Jon, Architecting for Reliability
Bernoulli trial, Coin interlude-Coin interlude, SLI Example: Low QPS, Modeling events with the Poisson distribution, Proof
Beyer, Betsy, Purposely Burning Budget, A Better Way
bias, definition of, Coin interlude
binomial distribution, SLI Example: Low QPS, SLI Example: Low QPS
- (see also geometric distribution, negative binomial distribution)
binomial theorem, Proof
binomial trial, SLI Example: Low QPS, SLI Example: Low QPS
birth-death process (see M/M/1 queue)
black swan event, How to Think About Reliability, Error budget burn policies
blackhole exercise, Blackhole Exercises
brownfield, How to Do SLO Alerting, SLO Alerting in a Brownfield Setup, Parting Recommendations
Bruce, Andrew, Further Reading
Bruce, Peter, Further Reading
Bugzilla, Scale Your Communications
burn rate, Rolling Windows

C

cache, Architectural Considerations: Hardware, Revisited, The Importance of Identifying and Understanding Dependencies, The Design of a Service
- (see also capacity cache, latency cache)
caching layer, Turning hard dependencies into soft dependencies, Project Focus
calendar-bound window, Rolling versus calendar-bound windows-Rolling versus calendar-bound windows
CAP theorem, Consistency
capacity cache, Architectural Considerations: Hardware, Revisited
cardinality, The five Ms, TSDBs and our design goals
- (see also distinct combinations of values)
Cauchy distribution, Expected value
chaos engineering, Experimentation and Chaos Engineering
- (see also blackhole exercise, load test)
Chubby, Purposely Burning Budget
collaboration, Collaboration-based training-Collaboration-based training
common vulnerabilities and exposures (CVEs), Counting Incidents
communication, Scale Your Communications-Scale Your Communications
completeness, Completeness-Completeness
complex distributed system, Complexity and failure in distributed systems
complex system
- definition of, Reliability Engineering
- failures, The Problem of Being Too Reliable
- interactions, Unclear correlation between threshold and behavior and nonrange alerting-Unclear correlation between threshold and behavior and nonrange alerting
- organization and, Queueing systems
comprehensiveness (see completeness)
compute platform, Compute platforms
confidence interval, The highest density interval
consistency, Consistency-Consistency
container platforms, Platforms as Services-SLO: Container platform
content delivery network (CDN), How a Service Grows, A worked reporting example
continuous probability distribution, The exponential distribution
correctness, Validity
CPU usage
- latency and, Poor proxies for user experience-Poor proxies for user experience, Picking an SLO number is something a human should do
- server-side, Poor proxies for user experience
credible interval, The highest density interval
- (see also highest density interval (HDI))
cumulative distribution function (CDF), Variance, percentiles, and the cumulative distribution function-Variance, percentiles, and the cumulative distribution function, Proof
customer research, Listening to Users
customer service, A Happier Business, Listening to Users (Redux)

D

dashboards, Lessons Learned the Hard Way, Dashboards-Dashboards, SLO Status-SLO Status
Data Analysis with Open Source Tools, Further Reading
data application, Designing Data Applications
data application properties, Data Application Properties-Robustness
data conformance (see validity)
data processing pipeline, Data processing pipelines
data properties, Data and Data Application Reliability-Durability
data quality, Data Properties
data reliability
- properties, Data and Data Application Reliability
  - (see also data application properties, data properties)
- service and, Data Services
database and storage system, Databases and storage systems
Davidovič, Štěpán, A Better Way
DDoS attack, Incidents are unique, A worked reporting example
- (see also volumetric attack)
dependency, Dependency Changes-Dependency Introduction or Retirement
dependency math, Dependency math-Dependency math
Design Patterns, Architecting for Reliability
Designing Data-Intensive Applications, Scalability
deviations, Ranges-Ranges
DevOps, How to Think About Reliability
discoverability, Discoverability-Dashboards
distinct combinations of values, TSDBs and our design goals
- (see also cardinality)
distributed denial of service attack, Incidents are unique
- (see also DDoS attack)
distribution, Statistical distribution support-Statistical distribution support, Other Qualities
distribution tail, Expected value
documentation, Create Your Supporting Artifacts-Training
- (see also SLO document)
Doing Bayesian Data Analysis, Further Reading
downtime, How to Think About Reliability, The Problem of Being Too Reliable, Availability, Resilience
Drucker, Peter, What do your company executives and business partners care about?
Dunning and Ertl’s t-digest algorithm, Statistical distribution support
durability
- data properties and, Durability, Durability
- SLIs and, SLI Example: Durability-SLI Example: Durability

E

Engineering and the Design and Operation of Manufacturing Systems, Architecting for Reliability
engineering team, Engineering-Engineering, Order of Operation, Engineering
error budget deficit, Events-based error budget math, Decision Making
error budget recovery, Events-based error budget math
error budget surplus, Events-based error budget math, Decision Making
error budgets
- alert and, Error Budgets and Response Time-Error Budget Burn Rate
- approaches and, Error Budgets, How to Use Error Budgets
- benefits of, Operations
- burning, Purposely Burning Budget
- decision making and, Decision Making, Exhausting your error budget-Using surplus error budget
- definition of, The Reliability Stack
- establishing, Establishing Error Budgets-Establishing Error Budgets
- events-based, Establishing Error Budgets-Events-based error budget math
- experimentation and, Experimentation and Chaos Engineering-Experimentation and Chaos Engineering
- policies and, Error Budget Policies-Error budget exceeded policies, Your First Error Budget Policy (and Your First Critical Test), Error budget policy
- projects and, To Release New Features or Not?-Project Focus
- reporting, Error Budget Status-Error Budget Status
- risk factors of, Examining Risk Factors, Examining Risk Factors
- time-based, Establishing Error Budgets, Time-based error budget math-Time-based error budget math, Error Budget Status
error injection, Experimentation and Chaos Engineering
error rate
- importance of, SLO: Front page loads and latency, SLO: Search results
- measuring, Service Level Indicators, Databases and storage systems, Measuring Complex Service User Reliability, Measuring Complex Service User Reliability
error ratio rate, Latency-Sensitive Request Processing, Latency-Sensitive Request Processing
errors, Data Application Failures-Data Application Failures
events, definition of, Sample spaces
Ewaschuk, Rob, A Better Way
executive leadership, Executive Leadership-Executive Leadership, Leadership-Leadership
expectation, Expected value
- (see also expected value)
expected value, Expected value-Expected value
exponential distribution, The exponential distribution-The exponential distribution, SLI Example: Durability, Proof

F

failure domain, Lessons Learned the Hard Way, Architectural Considerations: Hardware
failure mode, Architectural Considerations: Anticipating Failure Modes
failures, Failure-Induced Changes, Paying Attention to Failures
fault tolerance, Resilience
faults, Data Application Failures-Data Application Failures
feature freeze, No new features (feature freeze)-No new features (feature freeze)
flexible targets, Flexible Targets, Statistical distribution support, TSDBs and our design goals, Structured event databases and our design goals
flood attack, Incidents are unique
fog of war, Alert fatigue and fog of war
Fowler, Susan, Architectural Considerations: Monolith or Microservices
freshness, Freshness-Freshness, TSDBs and our design goals, Structured event databases and our design goals, Freshness-Freshness

G

Gamma, Erich, Architecting for Reliability
Gaussian distribution, Expected value
- (see also normal distribution)
geometric distribution, SLI Example: Low QPS
Gershwin, Stanley B., Architecting for Reliability
Google, Making Agreements, Summary, Statistical distribution support
Google Cloud Platform, Durability
Google Docs, Document Repositories
granularity, Accuracy
greenfield, How to Do SLO Alerting-How to Do SLO Alerting

H

hard dependency, Service Dependencies-Turning hard dependencies into soft dependencies, Purposely Burning Budget
hardware
- changes, Thresholds don’t stay relevant, A Better Way
- failures, Modeling events with the Poisson distribution
- measuring, Measuring Hardware-Beyond just hardware
- network and, Hardware and the network
- patterns, Architecting for Reliability, Architecting for Reliability
high dynamic range (HDR) histograms, Statistical distribution support
highest density interval (HDI), The highest density interval-The highest density interval
histograms, Statistical distribution support, Coin interlude, SLI Example: Low QPS-SLI Example: Low QPS
- (see also high dynamic range (HDR) histograms, latency histogram)
Hopper, Grace, Data Services
hosted services, Open Source or Hosted Services
Hyrum’s law, Making Agreements

I

IEEE Standard Glossary of Software Engineering Terminology, The, Robustness
impact, Quantitative Analysis of Systems
incidents
- about, Counting Incidents-Counting Incidents
- severity levels and, Counting Incidents-Severity Levels, SLOs for Basic Reporting
- types of, Incidents are unique
  - (see also DDoS attack, flood attack, SYN flood, volumetric attack)
independence, Independence
infrastructure monitoring system, Centralized Time Series Statistics (Metrics)
- (see also time series database (TSDB))
instrumentation, Instrumentation! The System Also Needs Instrumentation!-Instrumentation! The System Also Needs Instrumentation!
integrity, Integrity-Integrity

J

Janert, Philipp, Further Reading
Jira, Scale Your Communications
Jobs, Steve, Listening to Users
joint distribution, Proof

K

Kenobi, Obi-Wan, Designing Data Applications
key performance indicator (KPI), A Happier Business, Batch requests
Kleppmann, Martin, Scalability
Kruschke, John, Further Reading

L

Large-Scale Cluster Management at Google with Borg, Architectural Considerations: Hardware
last-in first-out (LIFO) queue, Decreasing latency
latency
- client-side, Poor proxies for user experience-Poor proxies for user experience
- CPU usage and, Poor proxies for user experience-Poor proxies for user experience
- distribution, Statistical distribution support
- histogram, Statistical distribution support
- measuring, Measuring Complex Service User Reliability, Single-team component services, Percentile Thresholds, Establishing Error Budgets
- performance and, Performance
- prediction, Decreasing latency
- queueing, SLI Example: Queueing Latency-Variance, percentiles, and the cumulative distribution function
- rate, Latency-Sensitive Request Processing
- response and, Latency-Sensitive Request Processing, Other Services as Users: Buying Products-Other Services as Users: Buying Products
latency cache, Architectural Considerations: Hardware, Revisited
latency-sensitive request processing, Latency-Sensitive Request Processing-Latency-Sensitive Request Processing
law of conditional probability, Proof, Proof
law of large numbers, Expected value
law of total probability, Proof
legal team, Legal-Legal, Order of Operation, Legal
library of case studies, Create a Library of Case Studies, Share Your Library of SLO Case Studies
load balancer, Architectural Considerations: Anticipating Failure Modes
load test, Load and Stress Tests, Error budget burn policies
log lines, SLO: Business data analysis
logging, Structured Event Databases (Logging)
long tail, Percentiles, Percentile Thresholds, Percentile Thresholds
lookahead, Rolling Windows
low-lag, high-throughput batch processing, Low-Lag, High-Throughput Batch Processing

M

M/M/1 queue, Decreasing latency-Decreasing latency
M/M/c queue, Adding capacity
Majors, Charity, Percentile Thresholds
MAP estimator, Maximum a Posteriori, The relationship between MLE and MAP, Bayesian Inference
MapReduce, Low-Lag, High-Throughput Batch Processing, Completeness
Markdown, Document Repositories
Markovian, Decreasing latency
max value, The five Ms
maximum a posteriori, Maximum a Posteriori, The relationship between MLE and MAP
- (see also MAP estimator)
maximum likelihood estimation (MLE), Maximum Likelihood Estimation, The relationship between MLE and MAP
mean, The five Ms, Expected value
- (see also expected value)
mean time between failures (MTBF), Quantitative Analysis of Systems
mean time to <something> (MTTX), The Problem with Mean Time to X, The Problem with Mean Time to X-Incidents are unique, Incidents are unique, SLOs for Basic Reporting, A worked reporting example
mean time to detect (MTTD), Quantitative Analysis of Systems
mean time to mitigate (MTTM), Quantitative Analysis of Systems
mean time to repair (MTTR), Architectural Considerations: Hardware
mean time to resolution (MTTR), Means aren’t always meaningful
median, The five Ms-The five Ms, Median
message queue, Low-Lag, High-Throughput Batch Processing
metric attributes, Metric Attributes
metrics system, A Written Example-A Written Example, Centralized Time Series Statistics (Metrics), Measurement Changes-Calculation Changes
- (see also time series database (TSDB))
microservice
- approach, The Reliability Stack
- dependencies, Service Dependencies and Components, Dependency math
- organization and, Something More Complex, Owners and stakeholders, Architectural Considerations: Monolith or Microservices, Architectural Considerations: Monolith or Microservices
min value, The five Ms
mobile and web clients, Mobile and Web Clients-Mobile and Web Clients
mode, The five Ms
monolith, Architectural Considerations: Monolith or Microservices
MTBF (mean time between failures), Quantitative Analysis of Systems
MTTD (mean time to detect), Quantitative Analysis of Systems
MTTM (mean time to mitigate), Quantitative Analysis of Systems
MTTR (mean time to repair), Architectural Considerations: Hardware
MTTR (mean time to resolution), Means aren’t always meaningful
multidimensional probability distribution, Proof
multimodal dataset, The five Ms
multiple comparison problem, The Problem with Too Many SLOs
multiple-team component services, Multiple-team component services
Murphy, Niall, Show the human impact of the current situation
mutability, Other Qualities

N

negative binomial distribution, SLI Example: Low QPS
nested request processing (see latency-sensitive request processing)
nines, The Problem with the Number Nine-The Problem with the Number Nine, Percentile Thresholds, Putting It Together, Corner Cases, What can you do?, Increased Utilization Changes
Non-Abstract Large System Design (NALSD), Architecting for Reliability, Architectural Considerations: Hardware
nonhomogeneous Poisson process, SLI Example: Durability
normal distribution, Expected value
- (see also Gaussian distribution)
Norvig, Peter, Further Reading

O

Objective and Key Result (OKR), SLOs Are a Process, Not a Project
observability
- approaches to, A Better Way
- definition of, Complexity and failure in distributed systems
- monitoring, Common Machinery, Low-Lag, High-Throughput Batch Processing, Mobile and Web Clients, The General Case, Run the old and new in parallel
- system, Troubleshooting with SLO Alerting, Parting Recommendations
Office 365, Document Repositories
OKR (see Objective and Key Result (OKR))
open source software, Open Source or Hosted Services-Open Source or Hosted Services, SLO: Internal wiki-SLO: Internal wiki
open source software (OSS), Centralized Time Series Statistics (Metrics)
OpenTelemetry, Latency-Sensitive Request Processing
operational underload, The Problem of Being Too Reliable
operations team, Operations, Order of Operation, Operations
opportunity cost, Cost
order of operations, Order of Operation-Order of Operation
OSS (see open source software (OSS))
outliers, The five Ms, Percentiles, Mobile and Web Clients
ownership, Ownership-Ownership

P

parameter, Bayes’ theorem
PDF (see probability density function (PDF))
percentile thresholds, Percentile Thresholds-Percentile Thresholds
percentiles, Percentiles
performance, Reliability Engineering, Mobile and Web Clients, Show the human impact of the current situation, Architecting for Reliability, Example System: Image-Serving Service, Architectural Considerations: Anticipating Failure Modes, Performance
Philosophy of Alerting, A Better Way
phraseology, Phraseology
platform, Compute platforms, Platform Changes-Platform Changes
- (see also container platforms, compute platform)
PMF, SLI Example: Low QPS
- (see also probability mass function)
pod, Platforms as Services-SLO: Container platform
point estimator, Bayesian Inference
Poisson distribution, Modeling events with the Poisson distribution-Modeling events with the Poisson distribution
Poisson process, Modeling events with the Poisson distribution, The exponential distribution, SLI Example: Durability, SLI Example: Durability, Proof
- (see also nonhomogeneous Poisson process)
polyglot persistence, Designing Data Applications
posterior, Maximum a Posteriori, Bayes’ theorem
posterior distribution, Bayesian Inference
PR (pull request), Error Budgets for Humans
Practical Statistics for Data Scientists, Further Reading
PRD (product requirement document), Product
precision, Systems and Building Blocks, Accuracy
prior, Using MAP, Using MAP
- (see also prior probability)
prior probability, Bayes’ theorem
privacy (see security)
probability, Probability and Statistics for SLIs and SLOs-On Probability
probability density function (PDF), The exponential distribution
probability distribution, SLI Example: Low QPS, Expected value
- (see also expected value)
probability mass function (PMF), SLI Example: Low QPS
prober, What can you do?
product management team, Product-Product, Order of Operation, Product-Product
product requirement document (PRD), Product
Production-Ready Microservices, Architectural Considerations: Monolith or Microservices
Programming Pearls, Architecting for Reliability
Prometheus, Measurement Changes
proofs, Theorem 1-Proof
pull request (PR), Error Budgets for Humans
Push on Green model, Architecting for Reliability

Q

QA team, QA-QA, Order of Operation, QA
quality, Quality
quantile function, Variance, percentiles, and the cumulative distribution function
quantity, Quantity-Quantity
queueing theory, Queueing systems

R

random variables, Coin interlude
range, Ranges-Ranges
recoverability (see resilience)
Reduce Toil Through Better Alerting, A Better Way
redundancy, Other Qualities
reliability
- concepts of, How Reliable Should You Be?-How to Think About Reliability
- costs of, Reliability Is Expensive-Reliability Is Expensive
- definition of, Caring About Many Things
- hardware and, Architectural Considerations: Hardware
- problems, The Problem of Being Too Reliable
- reporting, Reliability Reporting-Basic Reporting, SLOs for Basic Reporting-Advanced Reporting
- service and, Service Truths, Reliability Engineering-Implied Agreements, Business Alignment and SLIs, User Happiness
- utilization changes and, Increased Utilization Changes-Functional Utilization Changes
- worked example of, A Worked Example of Reliability-A Worked Example of Reliability
reliability burndown, SLO Status
reliability engineering, How to Think About Reliability-Reliability Engineering
Reliability Stack, The Reliability Stack, Service Level Indicators, Error Budgets, Developing Meaningful Service Level Indicators, How to Use Error Budgets
reliability targets, Reliability Targets-The Problem with the Number Nine, Beyond just hardware
remote procedure call (RPC), Request and response APIs
reporting
- advanced, Advanced Reporting
- basic, SLOs for Basic Reporting-A worked reporting example
- incidents and, Counting Incidents-Means aren’t always meaningful
- security and, Counting Incidents
- stakeholders and, Basic Reporting-Basic Reporting
request and response API, Request and response APIs-Data processing pipelines, A Request and Response Service-A Request and Response Service, Quantity, SLO: Business data analysis
requests (see asynchronous request, batch request, synchronous request)
resilience, Resilience-Resilience
resolution, Resolution-Quality
retention horizon, Structured event databases and our design goals
retrospective meetings, Error Budgets for Humans
revisits, Periodic Revisits, Functional Utilization Changes, Dependency Introduction or Retirement, Tooling Changes, Revisit Schedules, Definition status, Revisit schedule
RFC 2119, Error budget burn policies
robustness, Robustness-Robustness
rolling window, Rolling versus calendar-bound windows, Rolling Windows-Rolling Windows
RPC (see remote procedure call (RPC))
Russell, Stuart, Further Reading

S

SaaS, Centralized Time Series Statistics (Metrics), Mobile and Web Clients, Architectural Considerations: Hardware
sample, The five Ms
sample space, Sample spaces
sampling, Other Qualities
scalability, Scalability-Scalability
scalars, Aggregate analysis
scale, The exponential distribution
Search as a Service (SaaS), The Reliability Stack
security, Security-Security
Security as a Service (SaaS), The Reliability Stack
Seeking SRE, Show the human impact of the current situation
Serra, James, Designing Data Applications
service
- sorts of, Open Source or Hosted Services-Open Source or Hosted Services
  - (see also hosted services, open source software)
- truths, Service Truths
service components, Service Components-Single-team component services
- (see also multiple-team component services, single-team component services)
service dependency, Service Dependencies-Dependency math
- (see also hard dependency, soft dependency)
service failure, Choosing Good Service Level Objectives
service level agreements (SLAs)
- business changes and, User Requirement Changes
- definition of, The Reliability Stack-The Reliability Stack, Legal
- legal team and, Legal, Order of Operation, Legal, SLO: Checkout success
service level indicators (SLIs)
- approaches, What Meaningful SLIs Provide, The General Case
- benefits of, Developing Meaningful Service Level Indicators-A Happier Business, Legal
- complications with, Data processing pipelines, Databases and storage systems, Iterate Over Everything
- definition of, The Reliability Stack-Service Level Indicators
- determiners, Low-Lag, High-Throughput Batch Processing
- durability, SLI Example: Durability-SLI Example: Durability
- meaningful, What Meaningful SLIs Provide-A Written Example
- measuring, Measuring Many Things by Measuring Only a Few, Past Performance, What Will Your SLIs Be?, How to Change SLOs
service level objectives (SLOs)
- adoption lessons, Lessons Learned the Hard Way
- alerting, Alerting (see alert, threshold alert)
- approaches, Things to Keep in Mind-It’s All About Humans, Reliability Engineering-Reliability Engineering, Making Agreements, Caring About Many Things, Business Alignment and SLIs, Owners and stakeholders, Strategies for Shifting Culture, SLOs for Basic Reporting, Advanced Reporting
- benefits of, Engineering-Legal, Product, Data Application Failures
- buy-in for, Engineering Is More than Code-Order of Operation, Getting Buy-in-Assign it, Prepare Your Sales Pitch
- changes to, How to Change SLOs-Revisit Schedules
- culture of, Path to a Culture of SLOs-Advocating for Others to Use SLOs
- definition of, The Reliability Stack-Service Level Objectives, Quantitative Analysis of Systems
- definition templates, SLO Definition: Service Name-External Links
- document, Start with a document, SLO Definition Documents-Phraseology, Document Repositories
- example services and, Web services-Hardware and the network
- goals, Design Goals-Organizational Constraints
- implementation strategies, Latency-Sensitive Request Processing-The General Case, Assign it-Exhausting your error budget, The First Pass-Periodic Revisits
  - (see also latency-sensitive request processing; low-lag, high-throughput batch processing; mobile and web clients)
- measuring, What is important to measure?-What is important to measure?
- objections to, Common Objections and How to Overcome Them-QA, Summary
- problems with, The Problem with Too Many SLOs
- reports, SLO Reports
- targets, But I am big enough!, Choosing Targets, Percentiles, Flexible Targets
  - (see also flexible targets, testable targets)
silver bullets, No new features (feature freeze)
single-team component services, Single-team component services
Site Reliability Engineering (book), Purposely Burning Budget, Do Your Research
Site Reliability Engineering (SRE), Things to Keep in Mind, How to Think About Reliability, Reliability for Things You Don’t Own, Architecting for Reliability
Site Reliability Workbook, The, Rolling Windows, Architecting for Reliability, Do Your Research
SLO Advocate
- about role, SLO Advocacy-SLO Advocacy
- Crawl phase, Crawl-Learn How to Handle Challenges
- Run phase, Run-Continuously Improve
- Walk phase, Walk-Scale Your Communications
slow burn problem, Error Budgets and Response Time-Error Budgets and Response Time
soft dependency, Service Dependencies-Turning hard dependencies into soft dependencies
Software as a Service (SaaS), The Reliability Stack
span, Latency-Sensitive Request Processing
specification, Architecting for Reliability
standard deviation, Variance, percentiles, and the cumulative distribution function
statistical approaches, Basic Statistics-Percentiles
statistics, The five Ms, Probability and Statistics for SLIs and SLOs
Stockholm syndrome, SLO Alerting in a Brownfield Setup
Storage as a Service (SaaS), The Reliability Stack
stress test, Load and Stress Tests, Error budget burn policies
structured event database, Structured Event Databases (Logging)-Structured event databases and our design goals
structured logging data, Cost
- (see also structured event database)
SYN flood attack, Incidents are unique
- (see also flood attack)
synchronous request, Synchronous requests-Synchronous requests
system failures, Choosing Good Service Level Objectives
systems architect, Architecting for Reliability, Systems and Building Blocks
systems engineering, Architecting for Reliability

T

telemetry, The Reliability Stack, Single-team component services, Mobile and Web Clients, Complexity and failure in distributed systems, Completeness
testable targets, Testable Targets, Statistical distribution support, TSDBs and our design goals, Structured event databases and our design goals
thaw tax, No new features (feature freeze)
threshold alert
- about, Thresholds don’t stay relevant-Thresholds don’t stay relevant
- definition of, The Shortcomings of Simple Threshold Alerting
- problems and, Picking an SLO number is something a human should do, Complexity and failure in distributed systems
- problems with, Counting Incidents
  - (see also reporting incidents)
- slow burn problem and, Error Budgets and Response Time
throughput
- data application properties and, Performance
- high, Cost, Structured event databases and our design goals, Low-Lag, High-Throughput Batch Processing
- low, Structured event databases and our design goals, Latency-Sensitive Request Processing
- use of, Summary
Tilbrook, D., Run the old and new in parallel
time, Other Qualities
time series data, Cost, Centralized Time Series Statistics (Metrics)-TSDBs and our design goals
time series database (TSDB), Centralized Time Series Statistics (Metrics)-TSDBs and our design goals
time windows, Rolling versus calendar-bound windows-Choosing a time window
- (see also calendar-bound window, rolling window)
timeliness (see freshness)
tooling, Tooling Changes-Calculation Changes, Discoverability Tooling
training, Training-Learn How to Handle Challenges, Scale Your Training Program by Adding More Trainers
transactional API, Service Dependency Changes
trials, definition of, Sample spaces, Independence
Trusted Platform Modules (TPM), Integrity

U

understandability, Understandability-Phraseology
uninformative prior, The relationship between MLE and MAP
uptime, Service Truths, How to Think About Reliability, Caring About Many Things, Availability, Other Qualities
user
- definition of, Service Truths, What Is a Service?
- expectations, User Expectation and Requirement Changes-User Requirement Changes
- happiness, User Happiness
- internal, Internal Users
- service expectations and, Developing Meaningful Service Level Indicators-Developing Meaningful Service Level Indicators

V

validity, Validity-Validity
variance, Ranges
virtual private cloud (VPC), Organizational Constraints, Structured event databases and our design goals
volumetric attack, Incidents are unique
- (see also SYN flood attack)
VPC (see virtual private cloud (VPC))

W

web services, Web services
Wiener Shirt-zel Clothing Company, The, Dogs Deserve Clothes-Customers: Finding and Browsing Products
Wright, Hyrum, Making Agreements

Z

Zawinski, Jamie, Run the old and new in parallel
