Index

Numerics

24-hour rule 195-196

A

action items 205-207

follow up on 207

ownership of 205-206

actionable, alerts as 98

after-action reports 192

Agile method 241

alert criteria 97-103

noisy alerts 101-103

thresholds 99-100

alert fatigue. See on-call rotation

@all alias 236

ALTER TABLE statement 175, 210

approvals

automating 22-25

capturing purpose of 18-19

process for 20-22

acceptable risk of change 21-22

necessary people are informed 21

no conflicting actions 21

work is in appropriate state 20-21

areas graph 56

artifact deployment 157

artifacts, defined 82

attributes 173

auth_users table 209-210

automation 6, 11, 17-153

approach for 141-152

automating complex tasks 152

automating complicated tasks 150-152

automating simple tasks 148-150

complexity in tasks 145-147

designing for safety 143-145

ranking tasks 147

safety in tasks 142-143

approval process 20-25

acceptable risk of change 21-22

capturing purpose of approvals 18-19

necessary people are informed 21

no conflicting actions 21

work is in appropriate state 20-21

business impact to 119-120

defining goals 132-137

automation as requirement in all tools 132-133

building automation into time estimates 136

prioritizing automation 133-135

reflecting automation as priority with staff 135

scheduled time for automation 137

time for training and learning 135-136

deployment pipeline 188-190

error handling 27-28

fixing cultural problems 126-131

cost of manual work 128-131

stop accepting manual solutions 126

supporting 126-128

improvements made by 117-119

frequency of performance 118

queue time 117-118

time to perform 118

variance in performance 118-119

logging process 25-26

notification process 26-27

prioritizing 131

setting as cultural priority 121-123

priority 122-123

time 122

urgency 123

skill-set gap 137-141

building new skill set 140-141

reducing friction around support 139-140

staffing for 123-125

teams with monolithic skill sets 124

victims of environment 124-125

B

backlog, defined 282

bar graphs 56-57

base class 23

BetaAlgorithm 169

blogging 233-234

blue/green deployments 171-174

broadcastmessage.sh 150

build artifact 82, 86

business stakeholders, inviting to postmortems 198-199

buy-in from everyone 139

C

CAMS (culture, automation, metrics, and sharing) 5-6

change management policy 12

@channel alias 236

chaotic tasks 146

chat 234-237

chatbots 236-237

benefits of 236-237

shared responsibility with 237

company etiquette 234-236

limiting use of all channel notifications 236

short, live-topic focused channels 235

threading functionality 235-236

updating status regularly 236

checkout processing time 100

CLASSPATH variable 179

color points 222

commitments 274

complexity in tasks 145-147

automating

complex tasks 152

complicated tasks 150-152

simple tasks 148-150

complex tasks 146-147

complicated tasks 146

ranking tasks 147

simple tasks 146

composite alerting 100

concurrency 163

confidence interval 128

configuration files 184-187

configuration management modification 186

dynamic configuration through key/value stores 185-186

linking 186-187

configuration management 173

consumer_daemon 197-198, 202-204, 206

context 46

continuous integration (CI) 87, 181

continuous integration/continuous deployment (CI/CD) pipeline 135

core group of points 222

counters 34

crushing change control 156

cultural norms 240

culture 268

changing 244-255

creating rituals 251-253

culture chiefs 247-249

examining company values 249-251

sharing culture 244-247

using rituals and language to change cultural norms 253-255

defining 240-243

cultural rituals 241-242

cultural values 240

underlying assumptions 242-243

fixing cultural problems 126-131

cost of manual work 128-131

stop accepting manual solutions 126

supporting 126-128

influence on behavior 243-244

setting automation as cultural priority 121-123

priority 122-123

time 122

urgency 123

talent recruitment and retention 255-267

evaluating candidates 265-266

interviewing candidates 260-264

mindset 255-257

number of candidates to interview 266-267

obsession with senior engineers 257-260

culture chiefs 247, 274

culture pillar 5

Cynefin framework 145

D

dashboards 54, 64

naming 63-64

organizing 61-63

leading the reader 62-63

rows 61-62

starting with user 53-54

widgets 54-58

bar graphs 56-57

gauges 58

giving context to 58-61

line graphs 56

database deployment 157

database-level rollbacks 175-178

rules for database changes 175-178

versioning databases 178

DEBUG level 46

departmental goals 272

-depends flag 183

deployment artifacts 179

configuration files in packages 187

configuration file linking 186-187

configuration management modification 186

dynamic configuration through key/value stores 185-186

deployment artifact rollbacks 174-175

package management 179-184

deployment pipelines 82

detection score 40, 42

DevOps 7

history of 2-4

motivation for book 7

pillars of 5-6

what DevOps isn’t 4

DevOps Days conference 3

DevSecOps 3, 89-90

direct_delivery_items query 210

documenting postmortems 207-210

action items 210

cognitive and process issues 210

incident details 208

incident summary 208

incident walk-through 208-210

Donut plugin 257

DROP COLUMN statement 177

E

Eisenhower decision matrix 276

end-to-end tests

limiting number of 79-80

overview 73-76

error counts 34

error handling 19, 27-28

ERROR level 46

error rates 34

F

facts 173

false value 83-84

FATAL level 46

feature deployment 157

feature flagging 83-84

rollbacks 168-169

when to toggle off 169-170

feedback loop 165

fleet deployment 157

fleet rollbacks 171-174

FMEA (failure mode and effects analysis) 40-43

example process 41-43

metrics from incidents and failures 43

scope 40-41

team 40-41

FPM (Effing Package Management) 180

frequency of execution 120

frequency of performance 118

G

gatekeeping and gatekeepers

evaluating gatekeeper behavior 220

example of 13

problems created by 13-16

gauges 34, 58

goals

overview 270

tiers of 270-274

departmental goals 272

getting goals 273-274

organizational goals 271-272

team goals 272-273

grains 173

H

hash functions 143

hashed database 143

@here alias 236

human resources, inviting to postmortems 199

I

important box 276

incident reports 192

INFO level 46

information hoarding

chat 234-237

chatbots 236-237

company etiquette 234-236

how it happens 213-214

making knowledge discoverable 223-234

learning rituals 229-234

structuring knowledge stores 223-228

structuring communication effectively 221-222

defining audience 222

defining topic 221

outlining key points 222

presenting calls to action 222

unintentional hoarders 214-220

abstraction vs. obfuscation 217-219

access restrictions 219-220

documentation not valued by 215-217

evaluating gatekeeper behavior 220

install.sh script 179

integration tests 67, 72-73

intended audience section 63

intentional hoarding 214

is_approved method 23

iterations, defined 280

J

JSON (JavaScript Object Notation) 44

K

key performance indicators (KPIs) 113

key/value stores 185-186

knowledge discovery 223

knowledge retrieval 223

knowledge stores 223-234

learning rituals 229-234

blogging 233-234

hosting external events 232-233

lightning talks 231-232

lunch-and-learns 229-231

structuring 223-228

common lexicon 223-224

document hierarchy 224-228

structuring around topics 228

knowledge transfer 229

KPIs (key performance indicators) 113

L

language, sharing culture through 244

latency 34

leading the reader 62

learning rituals 229-234

blogging 233-234

hosting external events 232-233

lightning talks 231-232

lunch-and-learns 229-231

lightning talks 231-232

line graphs 54-56

linting 253

log aggregation 44-51

arguments for spending money 48-49

building vs. buying 49-51

hurdles of 48-51

identifying what to log 46-48

log message context 46-48

overview 44-46

logging process 19, 25-26

logrotate command 102

long-term objectives 205

lunch-and-learns 229-231

M

MappableEntityUpdateConsumer 202

mental models 194

messages.processed.count 38

messages.published.count 36

messages.queue.new_orders.size 37

messages.queue.size 37

metrics 6, 11, 33, 40

custom metrics 34-35

defining healthy metrics 39-40

failure mode and effects analysis 40-43

example process 41-43

metrics from incidents and failures 43

scope 41

team 40-41

vanity metrics 80-81

mortgage_calc function 71

N

noisy alerts 101-103

not important box 276

not urgent box 276

note widgets 62

notification process 19, 26-27

NULL column 176

O

occurrence factor 40, 42

off-hour deployments 190

automating deployment pipeline 188-190

deployment artifacts 179-187

configuration files in packages 184-187

package management 179-184

example of 154-156

layers of deployment 156-158

making deployments routine 159-164

accurate preproduction environments 159-160

staging environment 162-164

reducing fear by reducing risk 167

reducing fear through frequency 164-166

rollbacks 168-178

database-level rollbacks 175-178

deployment artifact rollbacks 174-175

feature flagging 168-169

fleet rollbacks 171-174

on-call compensation talks 108

on-call PTO 108

on-call rotation

alert criteria 97-103

noisy alerts 101-103

thresholds 99

compensating for 109

increased work-from-home flexibility 109

monetary compensation 107

time off 108

defining 95-97

time to acknowledge 96

time to begin 97

time to resolve 97

on-call support projects 112-113

performance reporting 113-114

purpose of 94-95

staffing 104-106

tracking on-call happiness 109-112

delivery method 111

individual being alerted 110

level of urgency 110-111

timing 111-112

operational automation 117

operational blindness 51

changing scope of development and operations 31-32

example of 30-31

log aggregation 44-51

arguments for spending money 48-49

building vs. buying 49-51

hurdles of 48-51

identifying what to log 46-48

log message context 46-48

overview 44-46

operational visibility 33-43

custom metrics 34-35

deciding what to measure 35-39

defining healthy metrics 39-40

failure mode and effects analysis 40-43

understanding product 32-33

opportunity cost 155, 216

optimal maximum size 106

organizational goals 271-272

P

package management systems 174

packages 187

configuration files in 184-187

configuration file linking 186

configuration management modification 186

dynamic configuration through key/value stores 185-186

package management 179-184

pair programming 247

Pareto principle 228

paternalist syndrome 28

automation 17-28

approval process 20-22

automating approvals 22-25

capturing purpose of approvals 18-19

error handling 27-28

logging process 25-26

notification process 26-27

creating barriers instead of safeguards 9-12

ensuring continuous improvement 28

gatekeeping and gatekeepers 12

example of 13

problems created by 13-16

PeerApproval class 23

perceived usefulness 77

performance reporting 113-114

pipeline executors 85

postmortems 43-211

action items 205, 207

follow up on 207

ownership of 206

choosing whom to invite to 198-199

business stakeholders 198-199

human resources 199

project managers 198

24-hour rule 195-196

mental models 194

defined 192

documenting 207-210

action items 210

cognitive and process issues 210

incident details 208

incident summary 208

incident walk-through 208-210

example incident 197-198

sharing 210-211

detailing each event in 199-200

primary on-call person 96

prioritization

consciousness 274-279

Eisenhower decision matrix 276

priority vs. urgency and importance 274-276

saying no to commitment 277-279

structuring team’s work 280-283

populating iteration 281-283

time-slicing 280-281

unplanned work 283-289

controlling 283-286

dealing with 286-289

process rituals 251

project managers, inviting to postmortems 198

properly prioritized alerts 98

publishers of systems 36

pulling data 57

pushing data 57

Q

QA (quality assurance) 65

quality 65-91

continuous deployment vs. continuous delivery 81-83

DevSecOps 89-90

feature flagging 83-84

pipeline execution 84-87

test suite 76-81

avoiding vanity metrics 80-81

failing immediately after encountering failure 77

isolating test suite 78-79

limiting number of end-to-end tests 79-80

not tolerating flaky tests 78

testing infrastructure management 88-89

testing pyramid 69-76

end-to-end tests 73-76

integration tests 72-73

overview 66-69

unit tests 69-71

quality as a condiment antipattern 66, 89

queue latency 37

queue time 117-118, 133

queueing systems 35

R

READ LOCK 209-210

read replica 53

release cramming 156

releases 82

retrospectives 192

risk priority number (RPN) 40, 42

rituals

cultural rituals

creating 251-253

defining 241-242

embracing failure with 252-253

sharing culture through 246-247

using rituals and language to change cultural norms 253-255

learning rituals 229-234

blogging 233-234

hosting external events 232-233

lightning talks 231-232

lunch-and-learns 229-231

rm -rf * command 142

rollbacks 168-178

database-level rollbacks 175-178

rules for database changes 175-178

versioning databases 178

deployment artifact rollbacks 174-175

feature flagging 168-170

fleet rollbacks 171-174

runbooks 98

rushed features 156

S

S3 (Simple Storage Service) 102

SaaS (software-as-a-service) 44

safety in tasks

designing for 143-145

acquiring operator’s perspective 144

avoiding unexpected side effects 145

confirming risky actions 145

never assuming user’s knowledge 144

overview 142-143

scorecards 127

SELECT query 209

SELECT/INSERT statement 177

senior engineers, obsession with 257-260

hiring junior engineers 259-260

removing years of experience 258-259

service-level objectives (SLOs) 96

severity 40, 42

sharing 6

culture 244-247

through language 244-245

through ritual 246-247

through story 245-246

postmortems 210-211

problems through conversation 256-257

short-term objectives 205

Sidekiq node 209

simple tasks

automating 148-150

overview 146

skill-set gap 137-141

building new skill set 140-141

reducing friction around support 139-140

skin-in-the-game concept 139

social rituals 251

sprints 280

stacked line graph 56

stand-up meetings 241

standard change 21

stated values 249

structured logs 44

subject-matter experts (SMEs) 41, 229

synthetic transactions 162

T

talent recruitment and retention 255-267

evaluating candidates 265-266

interviewing candidates 260-264

identifying passion 263

interview panel 261

structuring interview questions 261-262

technical interview questions 263-264

mindset 255-257

number of candidates to interview 266-267

obsession with senior engineers 257-260

hiring junior engineers 259-260

removing years of experience 258-259

team goals 272-273

test coverage 80

testing

test suite 76-81

avoiding vanity metrics 80-81

failing immediately after encountering failure 77

isolating test suite 78-79

limiting number of end-to-end tests 79-80

not tolerating flaky tests 78

testing infrastructure management 88-89

testing pyramid 69-76

end-to-end tests 73-76

integration tests 72-73

overview 66-69

unit tests 69-71

TEXT field 175

thresholds

alert criteria 99-100

giving context to widgets through threshold lines 59-60

throughput 33

time to acknowledge 96

time to begin 97

time to perform 118

time to resolve 97

time-slicing 280-281

timely alerts 98

TLS (Transport Layer Security) 159

tool chain 132

true value 83-84

U

UI tests 73

unintentional hoarding 214

unit tests 69-71

determining what to test 71

determining what to unit test 71

structure of 70-71

unplanned work 283-289

controlling 283-286

coworker unplanned work 284-285

doing it vs. deferring it 286

system unplanned work 285-286

dealing with 286-289

update-orders.sh 148-149

urgent box 276

V

values

defining 240

examining company values 249-251

vanity metrics, avoiding 80-81

VARCHAR(10) 175

variability 164

variance in performance 118-120

W

WARN level 46

waterfall model 281

widgets 54-58

bar graphs 56-57

gauges 58

giving context to 58-61

through color 58-59

through threshold lines 59-60

through time comparisons 60-61

line graphs 54-56

work queue systems 280

Y

yum downgrade command 189

yum install httpd command 142

yum update funco-webserver 188
