for large Internet service companies, 281–283
accountability, governance and, 435–436
action, 26
Agile, 4, 10, 41, 77, 102, 447, 451
agility, 10
Alexander the Great, 57
alignment
service operations synchronization and improvement, 400
strategic cycle, 421–423, 424–432
through iterative approaches, 397
kanban, 399
Scrum Sprint model, 398
top-down approach, 397
ALM (Application Lifecycle Management), 277
analysis paralysis, 134
analytics, data collection and, 292
Anderson, D., 351
APIs, 121
assessment. See also maturity model
automation, 257–258, 261, 318. See also Tools & Automation Engineering
build and CI system, 311
governance and, 458
implementing 5S principles
for large Internet service companies, 281–283
tools, 268
awareness, 95–96, 99, 115, 214. See also situational awareness
identifying what you can or cannot know, 214
AWS, 124
Battle of Kasserine Pass, 176
bias, 46
anchoring, 46
automation, 47
availability, 46
confirmation, 290
representativeness, 46
Blockbuster Entertainment, 7
Boyd, J., 4, 18, 23, 24–25, 28, 33, 36, 40–41, 74, 77, 83. See also OODA loop
Aerial Attack Study, 24
“Destruction and Creation,” 77
business continuity planning, 204, 236
Business Intelligence, 43, 69, 214, 289
CABs (Change Advisory Boards), 14–15
CEO, 68
change management, 317–319, 447–449
Chaos Engineering, 47, 79, 216–217
firefighter arsonists, 141–142
cloud environments, 96–97, 124, 202
Clausewitz, C., 62
code(ing), 10, 13, 41, 48–51, 92, 95–96, 112–113, 136, 233, 241, 251, 252, 268, 271, 319. See also software
defects, 89
metrics, 311
Power Peg, 266
queryable, 321
cognitive bias, 46–47, 161–162. See also bias
collaboration, incentivizing, 251–252
color coding, 90
Commander’s Intent, 63–66, 211
communication, 95. See also information
implicit, 73
Queue Master role and, 385
competitive objectives, 96
unpredictability and, 61
configuration management, 274, 277, 317–319
confusion, 143
context, 43, 141–142, 261, 306
firefighter arsonists, 141–142
of confusion, 143
target outcomes and, 144
continual improvement, 75–77, 84
kata, 191
improvement and problem solving, 192–193
outcome-directed learning, 188–190
continuous delivery, 4, 10, 41, 218–219
continuous integration, 233, 259, 280, 283, 312
coupling management, measuring, 239–241
CRM, 39
C-suite, 116
culture, information flow and, 177–178
customer engagement, 242
cycles. See strategic cycle; tactical cycle
Cynefin, 129–130, 210. See also context
Dark Matter, 340–342. See also friction
Queue Master role and, 379–380
dark pools, 266
data
determining the purpose and value of, 293–294
buying furniture online, 294–295
getting successfully treated at the hospital, 295–297
capture and presentation distortions, 301
trustworthiness and consistency, 303–305
DBAs (database administrators), specialization, 100–101
DBE (Database Engineering), 101
decision-making, 21, 22–23, 144, 221–222, 396. See also Mission Command; situational awareness
on the battlefield, 25
contexts
of confusion, 143
information
accuracy, 43
context, 43
timeliness of, 43
knowledge and, 44
making the best choice the easiest choice, 145–147
OODA loop, 25
action, 26
decide, 26
nonlinear looping, 28
observe, 26
orient, 26
PDCA cycle and, 28
risk, 129
target outcome and, 16
trust and, 44
unpredictability and, 58
defects, 47, 88–91. See also failure(s)
causes of, 89
in code, 89
“Defend against the Madman” approach, 121–122
“as code,” 284
continuous, 4, 10, 41, 218–219
ecosystem, 197
identifying what you can or cannot know, 214
dependencies and, 41
feedback and, 42
multitasking and, 42
“mistake proof,” 90
outcome-directed approach, 190
pipeline, 238
speed, 201
teams, 15
demand
unexpected variability in, 114–116
Deming, E., 28
failures and, 237
serial loopback, 102
team, 313
deployment, automation and, 260
development environment, instrumenting, 310–314
DevOps, 4, 9, 10, 15, 18, 33, 40, 53, 76, 151, 168, 197, 199, 251
governance, 453
intent, 454
requirements compliance, 458
production services and, 253
uptime, 65
waste. See also waste
WIP (“work in progress”), 103–104
directive briefing, 66
anti-goals and constraints, 68–69
situational overview, 67
statement of desired outcome, 67
disaster recovery, 236
Docker, 97
Donaldson, L., 296
downtime, 123
complexity, 61
delivery, 197
development environment, instrumenting, 310–314
observability, 147–151, 307–309
rebuilding, 323
situational awareness and, 154–156
unknowns, 310
education, formal, shortcomings of, 51–52
EM (Energy-Maneuverability) theory, 24, 83
engagement, measuring, 241–243
entry management, Queue Master, 378–379
Equifax, 9
exploits, 8
dependencies and, 237
friction, 42
timeliness of, 306
Fingerspitzengefühl, 74
firefighter arsonists, 141–142
flexibility, governance and, 458–460
“fog of war,” 58. See also awareness
“follow the sun” model, 384–385
other work when on duty, 386–389
sync point management, 385
formal education, shortcomings of, 51–52
Fowler, M., 119
frameworks. See also Mission Command
configuration management and delivery hygiene, measuring, 232–234
engagement, measuring, 241–243
incentivizing collaboration and improvement, 251–252
information flow and instrumentation, measuring, 229–231
measuring code quality, 224–225
single point of failure mitigation and coupling management, measuring, 239–241
supportability, measuring, 235–239
OKR (Objectives and Key Results), 37
finding and fixing problems in organizations
locally focused metrics, 168–169
Fredendall, L., 176
friction, 9, 14, 37, 57, 83, 84, 459–460. See also waste
configuration, 204
dependencies and, 41
feedback and, 42
multitasking and, 42
speed and, 40
“gemba walk,” 307–309, 430–431
Gilb, T., 208
GitLab.com, 12
goals
competitive objectives, 96
common mistakes, 440
out-of-the-box process tooling and workflows, 450–453
poor requirement drafting and understanding, 440–445
using off-the-shelf frameworks, 445–447
DevOps, 453
factors for successful, 434–435
maintaining situational awareness and learning, 438–440
no target outcome interference, 437–438
intent, 454
requirements compliance, 458
hindsight bias, 47, 51–52, 162
Hölzle, U., 123
hooks, queryable, 321
human brain, 154–156. See also mental models, situational awareness and
IDE (integrated development environment), 90
IKEA effect, 47
ilities, 207–209, 217–218, 221–222, 223, 224
implicit communication, 73
improvement and problem solving kata, 192–193
incident management, 179, 238, 338
indirect interference, 439–440
industry best practices, 13–14
information, 396. See also data
accuracy, 43
capturing, 290
context, 43
ecosystem dynamics and, 169–172
transmission mismatches, 173–176
timeliness of, 43
instrumentation, 291, 293–294, 295, 305, 306
configuration management, 317–319
interpreting, 300
observability and, 310
production, 320
tool, 316
wastewater management, 328–331
intent, governance and, 434, 435–437, 454
interference
internal cloud, 97
Invisible Gorilla, The, 155
irregularity, 83
IT, 197
cloud environments, friction and, 96–97
ecosystem, instrumenting, 331–333
friction, 86
governance, 433
information flow, 175
mura (“irregularity”), 113–114
overproduction, 92
risk, 127
service management, 337
service providers, data resilience and recovery, 204
JM (Job Methods), 84
JMX (Java Management Extensions), 321
flow and, 368
kaizen, 356
limits of a workflow board, 367
timestamping tasks, 364
WIP (work in progress), 365–367
improvement and problem solving, 192–193
Klein, G., 158
Knight Capital Group, 9, 263–266, 301
knowledge. See also learning
decision-making and, 44
-execution mismatches, 108
“single points of failure,” 120
skills and, 185
known
knowns, 131
unknowns, 133
large Internet service companies, implementing 5S principles, 281–283
leadership, 55
service delivery, 3
top-down approach, 58
Lean, 77, 80, 83, 125–126, 145, 147–148, 350
flow and, 368
kaizen, 356
limits of a workflow board, 367
timestamping tasks, 364
WIP (work in progress), 365–367
kata, 272
improvement and problem solving, 192–193
mura (“irregularity”), 113–114. See also unpredictability
unexpected demand variability, 114–116
unmanaged variability, 118
outcome-directed learning, 188–190
“stop the line,” 90
continual improvement and, 75–77
kata, 191
improvement and problem solving, 192–193
standardized tests and, 185–187
Legal and Compliance teams, 300–301, 442–445, 454
locally focused metrics, 168–169
MacArthur, D., 85
managers, 210–214, 277–278, 371–372. See also Queue Master; Service Engineering Lead; teams
managing
micro-, 278
manufacturing
waste. See also waste
WIP (“work in progress”), 103–104
incentivizing collaboration and improvement, 251–252
measuring code quality, 224–225
metrics
configuration management and delivery hygiene, 232–234
information flow and instrumentation, 229–231
single point of failure mitigation and coupling management, 239–241
Maven, 270
Meltdown, 8
data interpretation issues, 160
method-driven management, 340, 446–447
metrics
configuration management and delivery hygiene, 232–234
information flow and instrumentation, 229–231
interpreting, 300
single point of failure mitigation and coupling management, 239–241
Microsoft Azure, 13
misalignment, 60–61, 396. See also alignment
Commander’s Intent, 63–66, 211
continual improvement and, 75–77
directive briefing, 66
anti-goals and constraints, 68–69
situational overview, 67
statement of desired outcome, 67
organizational impacts of, 80–81
ecosystem complexity and, 61
through knowledge and awareness weaknesses, 59–60
Mobius outcome delivery approach, 3
Moltke, H., 62, 66, 72, 77, 78, 80, 83
MTBF (Mean Time Between Failure), 123
Muda, 86–88, 115. See also waste
multitasking, 42
mura (“irregularity”), 113–114. See also unpredictability
unexpected demand variability, 114–116
unmanaged variability, 118
Muri (“overburden”), 109–110, 115
negative events, 75
Netflix, Simian Army, 122–124, 216–217
nonlinear looping, 28
observability, 295
instrumenting for, 310
observation, 26
office hours, Queue Master, 382–383
off-the-shelf tools, 303–304, 305
OKR (Objectives and Key Results) framework, 37
on-demand service delivery, 205
act, 26
decide, 26
nonlinear looping, 28
observe, 26
PDCA cycle and, 28
open source software, 10
Operation Millennium Challenge, 25, 40
optimism bias, 162
ordered systems
organizational impacts of Mission Command, 80–81
organizations
finding and fixing framing problems
locally focused metrics, 168–169
specialization and, 98–99, 100–101
orientation, 26
origins of Mission Command, 56–57
outcome-directed learning, 188–190
ownership, building a sense of, 199
PaaS (platform-as-a-service), 10
Parvin, A., 35
PDCA cycle, 28
performance, 207, 208. See also ilities
PII (personally identifiable information), 438–439
“playing it safe,” 76
POMs (project object models), 270
Power Peg, 266
pride, building a sense of, 199
process(es), 201. See also governance
CRM, 39
OODA loop and, 28
standardization, 275
production, instrumenting, 320
productivity, 84
QA (quality assurance), 319
queryable hooks, 321
Queue Master, 75, 91, 104, 106, 147–148, 175, 181, 220, 372–373, 404–405
“follow the sun” model, 384–385
other work when on duty, 386–389
sync point management, 385
role mechanics, 374
sorting and dependency discovery, 378–379
rollout challenges, 389
junior team members as Queue Masters, 391–393
pushy Queue Masters, 391
sync points, 394
team members not seeing the value, 389–390
traditional managers thwarting rollout, 390–391
rebuilding the ecosystem, 323
reflection, 396
releases, overloading, 111–113
reliability, 207
representativeness bias, 46
retrospective, 192–193, 411–413
general meeting structure, 413–415
learning and improvement discussion, 415–421
kaizen, 356
decision-making, 129
confusion and, 143
making the best choice the easiest choice, 145–147
ecosystem observability, improving, 147–151
ilities, 208
service delivery, 9
Service Engineering Lead, 244–246
SaaS (Software-as-a-Service), 200
satisficing, 47
Scharnhorst, D., 72
On War, 58
Scrum, 10, 77, 214, 314, 398, 404–408
SDK (software development kit), 445
security
breaches
Equifax, 9
SolarWinds, 128
tracking and analysis, 325–326
“Selective Attention Test,” 155
separation of duties, 435–436, 439
serial loopback, 102
serverless architecture, 136
service delivery, 3, 4, 15, 17, 204–205, 221
agility, 10
on-demand, 205
identifying what you can or cannot know, 214
dependencies and, 41
feedback and, 42
leadership, 3
configuration management and delivery hygiene, measuring, 232–234
engagement, measuring, 241–243
incentivizing collaboration and improvement, 251–252
information flow and instrumentation, measuring, 229–231
measuring code quality, 224–225
single point of failure mitigation and coupling management, measuring, 239–241
supportability, measuring, 235–239
measuring, 17
Mission Command, 33
outcome-directed approach, 190
risk
situational awareness, 205–207
systems thinking, 80
target outcomes, 16, 18, 293–294
uptime and, 65
Service Engineering Lead, 243–244, 380
on the delivery team, 250, 253
organizational configurations and, 248–250
overcoming the operational experience gap, 254–255
Shingo, S., 87
single point of failure, 120, 239–241
situational awareness, 2–3, 4, 16, 42–44, 111, 114, 115–116, 120, 206. See also kanban
Fingerspitzengefühl, 74
ecosystem dynamics and, 169–172
transmission mismatches, 173–176
data interpretation issues, 160
method-driven management and, 340
operational knowledge, 206–207
strategies for improving, 163, 181–182
Skillman, P., 183
SLAs (service-level agreements), 202, 203, 445
software
buggy, 271
customization, 279
development, overloading releases, 111–113
open source, 10
packaging, 234
SolarWinds, Sunburst exploit, 128
specialists, 98
specialization, 98–99, 100–101, 102–104
Spectre, 8
speed
friction and, 40
Spotify, 99
start-ups
development awareness, 312–314
implementing 5S principles, 278–281
Sun Tzu, 25
sunk cost fallacy, 162
supply chain friction, bullwhip effect, 116–118
supportability, measuring, 235–239
sync points, 400
synthetic transaction monitoring, 304
general meeting structure, 413–415
learning and improvement discussion, 415–421
target outcomes, 16, 17, 18, 30–33, 62–63, 67, 214, 293–294
context and, 144
task(s), 371. See also kanban; workflow(s)
black holes, 346
identifiers, 311
structured approach, 183
team-based approach, 189
Taylor, F. W., The Principles of Scientific Management, 37
TCO (total cost of ownership), 146–147
TDD (test-driven development), 122
teams, 198, 199. See also Queue Master; Service Engineering Lead
dependencies, 313
operational experience gap, 254–255
seiketsu and, 274
separation of duties, 435–436, 439
Service Engineering Lead, 243–244, 253
organizational configurations and, 248–250
telecommunications, legacy voice products, 92–93
Terraform, 97
testing, instrumenting, 319–320
tiger teams, 194
tools, 24, 43, 44, 90, 95–96, 203, 207, 217, 221–222, 250, 251, 274. See also Cynefin
Dark Matter and, 342
instrumenting, 316
problem-solving, 193
seiketsu and, 274
Simian Army, 124
target outcome and, 16
Tools & Automation Engineering, 283–284
organizational details, 285
retrospective and, 435
workflow and sync points, 285–287
top-down
alignment, 397
leadership, 58
Toyota, 74, 83, 85, 87, 125–126
transparency, 121, 136, 207, 214, 243–244
decision-making and, 44
TWI (“Training Within Industry”) program, 4, 84
unknown unknowns, 135
firefighter arsonists, 141–142
confusion, 143
observability and, 148
continual improvement and, 75–77
ecosystem complexity and, 61
“fog of war,” 58
through knowledge and awareness weaknesses, 59–60
workload, 335
variability, 123
visualizing, workflow, 349–351. See also kanban
“walking the gemba,” 307–309, 430–431
causes of, 89
in code, 89
mura (“irregularity”), 113–114
unexpected demand variability, 114–116
unmanaged variability, 118
wastewater management ecosystem, instrumenting, 328–331
WIP (work in progress), 103–104, 365–367
Woolworths Australia, 17
work environment, 95
workflow(s). See also kanban
flow and improvement, managing, 368
limits of, 367
timestamping tasks, 364
WIP (work in progress), 365–367
Queue Master role and, 380–381