Index

A

Accelerated receive flow steering, 523

Accelerators in USE method, 49

accept system calls, 95

Access timestamps, 371

ACK detection in TCP, 512

Actions in bpftrace, 769

Active benchmarking, 657–660

Active listening in three-way handshakes, 511

Active pages in page caches, 318

Activities overview, 3–4

Ad hoc checklist method, 43–44

Adaptive mutex locks, 198

Adaptive Replacement Cache (ARC), 381

Address space, 304

guests, 603

kernel, 90

memory, 304, 310

processes, 95, 99–102, 319–322

threads, 227–228

virtual memory, 104, 305

Address space layout randomization (ASLR), 723

Advanced Format for magnetic rotational disks, 437

AF_NETLINK address family, 145–146

Agents

monitoring software, 137–138

product monitoring, 79

AKS (Azure Kubernetes Service), 586

Alerts, 8

Algorithms

caching, 36

congestion control, 115, 118, 513–514

big O notation, 175–176

Allocation groups in XFS, 380

Allocators

memory, 309

multithreaded applications, 353

process virtual address space, 320–321

Amazon EKS (Elastic Kubernetes Service), 586

Amdahl’s Law of Scalability, 64–65

Analysis

benchmarking, 644–646, 665–666

capacity planning, 38, 71–72

drill-down, 55–56

I/O traces, 478–479

latency, 56–57, 384–386, 454–455

off-CPU, 188–192

resource, 38–39

thread state, 193–197

workload, 4–5, 39–40

Analysis step in scientific method, 44–45

Analysis strategy in case study, 784

annotate subcommand for perf, 673

Anonymous memory, 304

Anonymous paging, 305–307

Anti-methods

blame-someone-else, 43

random change, 42–43

streetlight, 42

Apdex (application performance index), 174

Application calls, tuning, 415–416

Application I/O, 369, 435

Application instrumentation in off-CPU analysis, 189

Application internals, 213

Application layer, file system latency in, 384

Application performance index (Apdex), 174

Applications, 171

basics, 172–173

big O notation, 175–176

bpftrace for, 765

common case optimization, 174

common problems, 213–215

exercises, 216–217

internals, 213

latency documentation, 385

methodology. See Applications methodology

missing stacks, 215–216

missing symbols, 214

objectives, 173–174

observability, 174

observability tools. See Applications observability tools

performance techniques. See Applications performance techniques

programming languages. See Applications programming languages

references, 217–218

Applications methodology

CPU profiling, 187–189

distributed tracing, 199

lock analysis, 198

off-CPU analysis, 189–192

overview, 186–187

static performance tuning, 198–199

syscall analysis, 192

thread state analysis, 193–197

USE method, 193

Applications observability tools

bpftrace, 209–213

execsnoop, 207–208

offcputime, 204–205

overview, 199–200

perf, 200–203

profile, 203–204

strace, 205–207

syscount, 208–209

Applications performance techniques

buffers, 177

caching, 176

concurrency and parallelism, 177–181

I/O size selection, 176

non-blocking I/O, 181

Performance Mantras, 182

polling, 177

processor binding, 181–182

Applications programming languages, 182–183

compiled, 183–184

garbage collection, 184–185

interpreted, 184–185

virtual machines, 185

Appropriateness level in methodologies, 28–29

ARC (Adaptive Replacement Cache), 381

Architecture

CPUs. See CPUs architecture

disks. See Disks architecture

file systems. See File systems architecture

vs. loads, 581–582

memory. See Memory architecture

networks. See Networks architecture

scalable, 581–582

archive subcommand for perf, 673

arcstat.pl tool, 410

arg variables for bpftrace, 778

argdist tool, 757–759

Arguments

kprobes, 152

networks, 507

tracepoints, 148–149

uprobes, 154

Arithmetic mean, 74

Arrival process in queueing systems, 67

ASG (auto scaling group)

capacity planning, 72

cloud computing, 583–584

ASLR (address space layout randomization), 723

Associativity in caches, 234

Asynchronous disk I/O, 434–435

Asynchronous interrupts, 96–97

Asynchronous writes, 366

atop tool, 285

Auto scaling group (ASG)

capacity planning, 72

cloud computing, 583–584

available_filter_functions file, 710

Available swap, 309

available_tracers file, 710

Averages, 74–75

avg function, 780

await metric, 461

Axes

flame graphs, 10, 187, 290

heat maps, 289, 410, 488–489

line charts, 59, 80

scalability tests, 62

scatter plots, 81–82, 488

Azure Kubernetes Service (AKS), 586

B

Back-ends in instruction pipeline, 224

Background color in flame graphs, 291

Backlogs in network connections, 507, 519–520, 556–557, 569

Bad paging, 305

Balloon drivers, 597

Bandwidth

disks, 424

interconnects, 237

networks, 500, 508, 532–533

OS virtualization, 614–615

Bare-metal hypervisors, 587

Baseline statistics, 59

BATCH scheduling policy, 243

BBR (Bottleneck Bandwidth and RTT) algorithm, 118, 513

bcache technology, 117

BCC (BPF Compiler Collection), 12

vs. bpftrace, 760

disks, 450

documentation, 760–761

installing, 754

multi-purpose tools, 757

multi-tool example, 759

networks, 526

one-liners, 757–759

overview, 753–754

vs. perf-tools, 747–748

single-purpose tools, 755–757

slow disks case study, 17

system-wide tracing, 136

tool overview, 754–755

bcc-tools tool package, 132

BEGIN probes in bpftrace, 774

bench subcommand for perf, 673

Benchmark paradox, 648–649

Benchmarketing, 642

Benchmarking, 641–642

analysis, 644–646

capacity planning, 70

CPUs, 254

effective, 643–644

exercises, 668

failures, 645–651

industry standards, 654–656

memory, 328

micro-benchmarking. See Micro-benchmarking

questions, 667–668

reasons, 642–643

references, 669–670

replay, 654

simulation, 653–654

specials, 650

SysBench system, 294

types, 13, 651–656

Benchmarking methodology

active, 657–660

checklist, 666–667

CPU profiling, 660–661

custom benchmarks, 662

overview, 656

passive, 656–657

ramping load, 662–664

sanity checks, 664–665

statistical analysis, 665–666

USE method, 661

workload characterization, 662

Berkeley Packet Filter (BPF), 751–752

BCC compiler. See BCC (BPF Compiler Collection)

bpftrace. See bpftrace tool

description, 12–13

extended. See Extended BPF

iterator, 562

JIT compiler, 117

kernels, 92

OS virtualization tracing, 620, 624–625, 629

vs. perf-tools, 747–748

program, 90

Berkeley Software Distribution (BSD), 113

BFQ (Budget Fair Queueing) I/O schedulers, 119, 449

Big kernel lock (BKL) performance bottleneck, 116

Big O notation, 175–176

Billing in cloud computing, 584

Bimodal performance, 76

Binary executable files, 183

Binary translations in hardware virtualization, 588, 590

Binding

CPU, 253, 297–298

NUMA, 353

processor, 181–182

bioerr tool, 487

biolatency tool

BCC, 753–755

disks, 450, 468–470

example, 753–754

biopattern tool, 487

BIOS, tuning, 299

biosnoop tool

BCC, 755

disks, 470–472

event tracing, 58

hardware virtualization, 604–605

outliers, 471–472

queued time, 472

system-wide tracing, 136

biostacks tool, 474–475

biotop tool

BCC, 755

disks, 450, 473–474

Bit width in CPUs, 229

bitesize tool

BCC, 755

perf-tools, 743

blame command, 120

Blame-someone-else anti-method, 43

Blanco, Brenden, 753

Blind faith benchmarking, 645

blk tracer, 708

blkio control group, 610, 617

blkreplay tool, 493

blktrace tool

action filtering, 478

action identifiers, 477

analysis, 478–479

default output, 476–477

description, 116

disks, 475–479

RWBS description, 477

visualizations, 479

Block-based file systems, 375–376

Block caches in disk I/O, 430

Block device interface, 109–110, 447

Block I/O state in delay accounting, 145

Block I/O times for disks, 427–428, 472

Block interleaving, 378

Block size

defined, 360

FFS, 378

Block stores in cloud computing, 584

Blue-green cloud computing deployments, 3–4

Bonnie and Bonnie++ benchmarking tools

active benchmarking, 657–660

file systems, 412–414

Boolean expressions in bpftrace, 775–776

Boot options, security, 298–299

Boot-time tracing, 119

Borkmann, Daniel, 121

Borrowed virtual time (BVT) schedulers, 595

Bottleneck Bandwidth and RTT (BBR) algorithm, 118, 513

Bottlenecks

capacity planning, 70–71

complexity, 6

defined, 22

USE method, 47–50, 245, 324, 450–451

BPF. See Berkeley Packet Filter (BPF)

bpftrace tool, 12–13

application internals, 213

vs. BCC, 752–753, 760

block I/O events, 625, 658–659

description, 282

disk I/O errors, 483

disk I/O latency, 482–483

disk I/O size, 480–481

event sources, 558

examples, 284, 761–762

file system internals, 408

hardware virtualization, 602

I/O profiling, 210–212

installing, 762

lock tracing, 212–213

malloc() bytes flame graph, 346

memory internals, 346–347

one-liners for CPUs, 283, 803–804

one-liners for disks, 479–480, 806–807

one-liners for file systems, 402–403, 805–806

one-liners for memory, 343–344, 804–805

one-liners for networks, 550–552, 807–808

one-liners overview, 763–765

package contents, 132

packet inspection, 526

page fault flame graphs, 346

programming. See bpftrace tool programming

references, 782

scheduling internals, 284–285

signal tracing, 209–210

socket tracing, 552–555

stacks viewing, 450

syscall tracing, 403–405

system-wide tracing, 136

TCP tracing, 555–557

tracepoints, 149

user allocation stacks, 345

VFS tracing, 405–408

bpftrace tool programming

actions, 769

comments, 767

documentation, 781

example, 766

filters, 769

flow control, 775–777

functions, 770–772, 778–781

Hello, World! program, 770

operators, 776–777

probe arguments, 775

probe format, 768

probe types, 774–775

probe wildcards, 768–769

program structure, 767

timing, 772–773

usage, 766–767

variables, 770–771, 777–778

BQL (Byte Queue Limits)

driver queues, 524

tuning, 571

Branch prediction in instruction pipeline, 224

Breakpoints in perf, 680

brk system calls, 95

brkstack tool, 348

Broadcast network messages, 503

BSD (Berkeley Software Distribution), 113

btrace tool, 476, 478

btrfs file system, 381–382, 399

btrfsdist tool, 755

btrfsslower tool, 755

btt tool, 478

Buckets

hash tables, 180

heat maps, 82–83

Buddy allocators, 317

Budget Fair Queueing (BFQ) I/O schedulers, 119, 449

buf function, 778

Buffer caches, 110, 374

Bufferbloat, 507

Buffers

applications, 177

block devices, 110, 374

networks, 507

ring, 522

TCP, 520, 569

bufgrow tool, 409

Bug database systems

applications, 172

case studies, 792–793

buildid-cache subcommand for perf, 673

Built-in bpftrace variables, 770, 777–778

Bursting in cloud computing, 584, 614–615

Buses, memory, 312–313

BVT (borrowed virtual time) schedulers, 595

Bypass, kernel, 94

Byte Queue Limits (BQL)

driver queues, 524

tuning, 571

Bytecode, 185

C

C, C++

compiled languages, 183

symbols, 214

stacks, 215

C-states in CPUs, 231

c2c subcommand for perf, 673, 702

Cache Allocation Technology (CAT), 118, 596

Cache miss rate, 36

Cache warmth, 222

cachegrind tool, 135

Caches and caching

applications, 176

associativity, 234

block devices, 110, 374

cache line size, 234

coherency, 234–235

CPUs, hardware virtualization, 596

CPUs, memory, 221–222, 314

CPUs, OS virtualization, 615–616

CPUs, processors, 230–235

CPUs, vs. GPUs, 240

defined, 23

dentry, 375

disks, I/O, 430

disks, on-disk, 437

disks, tuning, 456

file systems, flushing, 414

file systems, OS virtualization, 613

file systems, overview, 361–363

file systems, tuning, 389, 414–416

file systems, types, 373–375

file systems, usage, 309

inode, 375

methodologies, 35–37

micro-benchmarking test, 390

operating systems, 108–109

page, 315, 374

perf events, 680

RAID, 445

tuning, 60

write-back, 365

cachestat tool

file systems, 399, 658–659

memory, 348

perf-tools, 743

slow disks case study, 17

Caching disk model, 425–426

Canary testing, 3

Capacity-based utilization, 34

Capacity of file systems, 371

Capacity planning

benchmarking for, 642

cloud computing, 582–584

defined, 4

factor analysis, 71–72

micro-benchmarking, 70

overview, 69

resource analysis, 38

resource limits, 70–71

scaling solutions, 72–73

CAPI (Coherent Accelerator Processor Interface), 236

Carrier sense multiple access with collision detection (CSMA/CD) algorithm, 516

CAS (column address strobe) latency, 311

Cascading failures, 5

Case studies

analysis strategy, 784

bug database systems, 792–793

conclusion, 792

configuration, 786–788

PMCs, 788–789

problem statement, 783–784

references, 793

slow disks, 16–18

software change, 18–19

software events, 789–790

statistics, 784–786

tracing, 790–792

Casual benchmarking, 645

CAT (Cache Allocation Technology), 118, 596

cat function, 779

CAT (Intel Cache Allocation Technology), 118, 596

CFQ (completely fair queueing), 115, 449

CFS (completely fair scheduler), 116–117

CPU scheduling, 241

CPU shares, 614–615

description, 243

cgroup file, 141

cgroup variable, 778

cgroupid function, 779

cgroups

block I/O, 494

description, 116, 118

Linux kernel, 116

memory, 317, 353

OS virtualization, 606, 608–611, 613–620, 630

resource management, 111, 298

statistics, 139, 141, 620–622, 627–628

cgtop tool, 621

Character devices, 109–110

Characterizing memory usage, 325–326

Cheating in benchmarking, 650–651

Checklists

ad hoc checklist method, 43–44

benchmarking, 666

CPUs, 247, 527

disks, 453

file systems, 387

Linux 60-second analysis, 15

memory, 325

Chip-level multiprocessing (CMP), 220

chrt command, 295

Cilium, 509, 586, 617

Circular buffers for applications, 177

CISCs (complex instruction set computers), 224

clang compiler, 122

Classes, scheduling

CPUs, 242–243

I/O, 493

kernel, 106, 115

priority, 295

Clean memory, 306

clear function in bpftrace, 780

clear subcommand in trace-cmd, 735

clock routine, 99

Clocks

CPUs, 223, 230

CPUs vs. GPUs, 240

operating systems, 99

clone system calls, 94, 100

Cloud APIs, 580

Cloud computing, 579–580

background, 580–581

capacity planning, 582–584

comparisons, 634–636

vs. enterprise, 62

exercises, 636–637

hardware virtualization. See Hardware virtualization

instance types, 581

lightweight virtualization, 630–633

multitenancy, 585–586

orchestration, 586

OS virtualization. See OS virtualization

overview, 14

PMCs, 158

proof-of-concept testing, 3

references, 637–639

scalable architecture, 581–582

storage, 584–585

types, 634

Cloud-native databases, 582

Clue-based approach in thread state analysis, 196

Clusters in cloud computing, 586

CMP (chip-level multiprocessing), 220

CNI (container network interface) software, 586

Co-routines in applications, 178

Coarse view in profiling, 35

Code changes in cloud computing, 583

Coefficient of variation (CoV), 76

Coherence

caches, 234–235

models, 63

Coherent Accelerator Processor Interface (CAPI), 236

Cold caches, 36

collectd agent, 138

Collisions

hash, 180

networks, 516

Colors in flame graphs, 291

Column address strobe (CAS) latency, 311

Column quantizations, 82–83

comm variable in bpftrace, 778

Comma-separated values (CSV) format for sar, 165

Comments in bpftrace, 767

Common case optimization in applications, 174

Communication in multiprocess vs. multithreading, 228

Community applications, 172–173

Comparing benchmarks, 648

Competition, benchmarking, 649

Compiled programming languages

optimizations, 183–184

overview, 183

Compilers

CPU optimization, 229

options, 295

Completely fair queueing (CFQ), 115, 449

Completely fair scheduler (CFS), 116–117

CPU scheduling, 241

CPU shares, 614–615

description, 243

Completion target in workload analysis, 39

Complex benchmark tools, 646

Complex instruction set computers (CISCs), 224

Complexity, 5

Comprehension in flame graphs, 249

Compression

btrfs, 382

disks, 369

ZFS, 381

Compute kernel, 240

Compute Unified Device Architecture (CUDA), 240

Concurrency

applications, 177–181

micro-benchmarking, 390, 456

CONFIG options, 295–296

CONFIG_TASK_DELAY_ACCT option, 145

Configuration

applications, 172

case study, 786–788

network options, 574

Congestion avoidance and control

Linux kernel, 115

networks, 508

TCP, 510, 513

tuning, 570

connect system calls, 95

Connections for networks, 509

backlogs, 507, 519–520, 556–557, 569

characteristics, 527–528

firewalls, 517

latency, 7, 24–25, 505–506, 528

life span, 507

local, 509

monitoring, 529

NICs, 109

QUIC, 515

TCP queues, 519–520

three-way handshakes, 511–512

UDP, 514

Container network interface (CNI) software, 586

Containers

lightweight virtualization, 631–632

orchestration, 586

observability, 617–630

OS virtualization, 605–630

resource controls, 52, 70, 613–617, 626

Contention

locks, 198

models, 63

Context switches

defined, 90

kernels, 93

Contributors to system performance technologies, 811–814

Control groups (cgroups). See cgroups

Control paths in hardware virtualization, 594

Control units in CPUs, 230

Controllers

caches, 430

disk, 426

mechanical disks, 439

micro-benchmarking, 457

network, 501–502, 516

solid-state drives, 440–441

tunable, 494–495

USE method, 49, 451

Controls, resource. See Resource controls

Cookies, TCP, 511, 520

Copy-on-write (COW) file systems, 376

btrfs, 382

ZFS, 380

Copy-on-write (COW) process strategy, 100

CoreLink Interconnects, 236

Cores

CPUs vs. GPUs, 240

defined, 220

Corrupted file system data, 365

count function in bpftrace, 780

Counters, 8–9

fixed, 133–135

hardware, 156–158

CoV (coefficient of variation), 76

COW (copy-on-write) file systems, 376

btrfs, 382

ZFS, 380

COW (copy-on-write) process strategy, 100

CPCs (CPU performance counters), 156

CPI (cycles per instruction), 225

CPU affinity, 222

CPU-bound applications, 106

cpu control group, 610

CPU mode for applications, 172

CPU performance counters (CPCs), 156

CPU registers, perf-tools for, 746–747

cpu variable in bpftrace, 777

cpuacct control group, 610

cpudist tool

BCC, 755

case study, 790–791

threads, 278–279

cpufreq tool, 285

cpuinfo tool, 142

cpupower tool, 286–287

CPUs, 219–220

architecture. See CPUs architecture

benchmark questions, 667–668

binding, 181–182

bpftrace for, 763, 803–804

clock rate, 223

compiler optimization, 229

cross calls, 110

exercises, 299–300

experiments, 293–294

feedback-directed optimization, 122

flame graphs. See Flame graphs

FlameScope tool, 292–293

garbage collection, 185

hardware virtualization, 589–592, 596–597

I/O wait, 434

instructions, defined, 220

instructions, IPC, 225

instructions, pipeline, 224

instructions, size, 224

instructions, steps, 223

instructions, width, 224

memory caches, 221–222

memory tradeoffs with, 27

methodology. See CPUs methodology

models, 221–222

multiprocess and multithreading, 227–229

observability tools. See CPUs observability tools

OS virtualization, 611, 614, 627, 630

preemption, 227

priority inversion, 227

profiling. See CPUs profiling

references, 300–302

run queues, 222

saturation, 226–227

scaling in networks, 522–523

schedulers, 105–106

scheduling classes, 115

simultaneous multithreading, 225

statistic accuracy, 142–143

subsecond-offset heat maps, 289

terminology, 220

thread pools, 178

tuning. See CPUs tuning

USE method, 49–51, 795–797

user time, 226

utilization, 226

utilization heat maps, 288–289

virtualization support, 588

visualizations, 288–293

volumes and pools, 383

word size, 229

CPUs architecture, 221, 229

accelerators, 240–242

associativity, 234

caches, 230–235

GPUs, 240–241

hardware, 230–241

idle threads, 244

interconnects, 235–237

latency, 233–234

memory management units, 235

NUMA grouping, 244

P-states and C-states, 231

PMCs, 237–239

processors, 230

schedulers, 241–242

scheduling classes, 242–243

software, 241–244

CPUs methodology

CPU binding, 253

cycle analysis, 251

micro-benchmarking, 253–254

overview, 244–245

performance monitoring, 251

priority tuning, 252–253

profiling, 247–250

resource controls, 253

sample processing, 247–248

static performance tuning, 252

tools method, 245

USE, 245–246

workload characterization, 246–247

CPUs observability tools, 254–255

bpftrace, 282–285

cpudist, 278–279

GPUs, 287

hardirqs, 282

miscellaneous, 285–286

mpstat, 259

perf, 267–276

pidstat, 262

pmcarch, 265–266

profile, 277–278

ps, 260–261

ptime, 263–264

runqlat, 279–280

runqlen, 280–281

sar, 260

showboost, 265

softirqs, 281–282

time, 263–264

tlbstat, 266–267

top, 261–262

turbostat, 264–265

uptime, 255–258

vmstat, 258

CPUs profiling

applications, 187–189

benchmarking, 660–661

perf, 200–201

record, 695–696

steps, 247–250

system-wide, 268–270

CPUs tuning

compiler options, 295

CPU binding, 297–298

exclusive CPU sets, 298

overview, 294–295

power states, 297

processor options, 299

resource controls, 298

scaling governors, 297

scheduler options, 295–296

scheduling priority and class, 295

security boot options, 298–299

Cpusets, 116

CPU binding, 253

exclusive, 298

cpusets control group, 610, 614, 627

cpuunclaimed tool, 755

Crash resilience, multiprocess vs. multithreading, 228

Credit-based schedulers, 595

Crisis tools, 131–133

critical-chain command, 120

Critical paths in systemd service manager, 120

criticalstat tool, 756

CSMA/CD (carrier sense multiple access with collision detection) algorithm, 516

CSV (comma-separated values) format for sar, 165

CUBIC algorithm for TCP congestion control, 513

CUDA (Compute Unified Device Architecture), 240

CUMASK values in MSRs, 238–239

current_tracer file, 710

curtask variable for bpftrace, 778

Custom benchmarks, 662

Custom load generators, 491

Cycle analysis

CPUs, 251

memory, 326

Cycles per instruction (CPI), 225

Cylinder groups in FFS, 378

D

Daily patterns, monitoring, 78

Data Center TCP (DCTCP) congestion control, 118, 513

Data deduplication in ZFS, 381

Data integrity in magnetic rotational disks, 438

Data paths in hardware virtualization, 594

Data Plane Development Kit (DPDK), 523

Data rate in throughput, 22

Databases

applications, 172

case studies, 792793

cloud computing, 582

Datagrams

OSI model, 502

UDP, 514

DAX (Direct Access), 118

dbslower tool, 756

dbstat tool, 756

Dcache (dentry cache), 375

dcsnoop tool, 409

dcstat tool, 409

DCTCP (Data Center TCP) congestion control, 118, 513

dd command

disks, 490491

file systems, 411412

DDR SDRAM (double data rate synchronous dynamic random-access memory), 313

Deadline I/O schedulers, 243, 448

DEADLINE scheduling policy, 243

DebugFS interface, 116

Decayed average, 75

Deflated disk I/O, 369

Defragmentation in XFS, 380

Degradation in scalability, 31–32

Delay accounting

kernel, 116

off-CPU analysis, 197

overview, 145

Delayed ACKs algorithm, 513

Delayed allocation

ext4, 379

XFS, 380

delete function in bpftrace, 780

Demand paging

BSD kernel, 113

memory, 307–308

Dentry caches (dcaches), 375

Dependencies in perf-tools, 748

Development, benchmarking for, 642

Development attribute, multiprocess vs. multithreading, 228

Devices

backlog tuning, 569

disk I/O caches, 430

drivers, 109–110, 522

hardware virtualization, 588, 594, 597

devices control group, 610

df tool, 409

Dhrystone benchmark

CPUs, 254

simulations, 653

Diagnosis cycle, 46

diff subcommand for perf, 673

Differentiated Services Code Points (DSCPs), 509–510

Direct Access (DAX), 118

Direct buses, 313

Direct I/O, 366

Direct mapped caches, 234

Direct measurement approach in thread state analysis, 197

Direct-reclaim memory method, 318–319

Directories in file systems, 107

Directory indexes in ext3, 379

Directory name lookup cache (DNLC), 375

Dirty memory, 306

Disk commands, 424

Disk controllers

caches, 430

magnetic rotational disks, 439

tunable, 494495

USE method, 451

Disk I/O state in thread state analysis, 194–197

Disk request time, 428

Disk response time, 428

Disk service time, 428–429

Disk wait time, 428

Disks, 423–424

architecture. See Disks architecture

exercises, 495–496

experiments, 490–493

I/O. See Disks I/O

IOPS, 432

latency analysis, 384–386

methodology. See Disks methodology

models. See Disks models

non-data-transfer disk commands, 432

observability tools. See Disks observability tools

read/write ratio, 431

references, 496–498

resource controls, 494

saturation, 434

terminology, 424

tunable, 494

tuning, 493–495

USE method, 451

utilization, 433

visualizations, 487–490

Disks architecture

interfaces, 442–443

magnetic rotational disks, 435–439

operating system disk I/O stack, 446–449

persistent memory, 441

solid-state drives, 439–441

storage types, 443–446

Disks I/O

vs. application I/O, 435

bpftrace for, 764, 806–807

caching, 430

errors, 483

heat maps, 488–490

latency, 428–430, 454–455, 467–472, 482–483

operating system stacks, 446–449

OS virtualization, 613, 616

OS virtualization strategy, 630

random vs. sequential, 430–431

scatter plots, 488

simple disk, 425

size, 432, 480–481

synchronous vs. asynchronous, 434–435

time measurements, 427–429

time scales, 429–430

wait, 434

Disks methodology

cache tuning, 456

latency analysis, 454–455

micro-benchmarking, 456–457

overview, 449–450

performance monitoring, 452

resource controls, 456

scaling, 457–458

static performance tuning, 455–456

tools method, 450

USE method, 450–451

workload characterization, 452–454

Disks models

caching disk, 425–426

controllers, 426

simple disk, 425

Disks observability tools, 484–486

biolatency, 468–470

biosnoop, 470–472

biostacks, 474–475

biotop, 473–474

blktrace, 475–479

bpftrace, 479–483

iostat, 459–463

iotop, 472–473

MegaCli, 484

miscellaneous, 487

overview, 458–459

perf, 465–468

pidstat, 464–465

PSI, 464

sar, 463–464

SCSI event logging, 486

diskstats tool, 142, 487

Dispatcher-queue latency, 222

Distributed operating systems, 123–124

Distributed tracing, 199

Distributions

multimodal, 76–77

normal, 75

dmesg tool

CPUs, 245

description, 15

memory, 348

OS virtualization, 619

dmidecode tool, 348–349

DNLC (directory name lookup cache), 375

DNS latency, 24–25

Docker, 607, 620–622

Documentation

application latency, 385

BCC, 760–761

bpftrace, 781

Ftrace, 748–749

kprobes, 153

perf, 276, 703

perf-tools, 748

PMCs, 158

sar, 165–166

trace-cmd, 740

tracepoints, 150–151

uprobes, 155

USDT, 156

Domains

scheduling, 244

Xen, 589

Double data rate synchronous dynamic random-access memory (DDR SDRAM), 313

Double-pumped data transfer for CPUs, 237

DPDK (Data Plane Development Kit), 523

DRAM (dynamic random-access memory), 311

Drill-down analysis

overview, 55–56

slow disks case study, 17

Drivers

balloon, 597

device, 109–110, 522

parameterized, 593–595

drsnoop tool

BCC, 756

memory, 342

DSCPs (Differentiated Services Code Points), 509–510

DTrace tool

description, 12

Solaris kernel, 114

Duplex for networks, 508

Duplicate ACK detection, 512

Duration in RED method, 53

DWARF (debugging with attributed record formats) stack walking, 216, 267, 676, 696

Dynamic instrumentation

kprobes, 151

latency analysis, 385

overview, 12

Dynamic priority in scheduling classes, 242–243

Dynamic random-access memory (DRAM), 311

Dynamic sizing in cloud computing, 583–584

Dynamic tracers, 12

Dynamic tracing

DTrace, 114

perf, 677–678

tools, 12

Dynamic USDT, 156

DynTicks, 116

E

e2fsck tool, 418

Early Departure Time (EDT), 119, 524

eBPF. See Extended BPF

EBS (Elastic Block Store), 585

ECC (error-correcting code) for magnetic rotational disks, 438

ECN (Explicit Congestion Notification) field

IP, 508–510

TCP, 513

tuning, 570

EDT (Early Departure Time), 119, 524

EFS (Elastic File System), 585

EKS (Elastic Kubernetes Service), 586

elapsed variable in bpftrace, 777

Elastic Block Store (EBS), 585

Elastic File System (EFS), 585

Elastic Kubernetes Service (EKS), 586

Elevator seeking in magnetic rotational disks, 437–438

ELF (Executable and Linking Format) binaries

description, 183

missing symbols in, 214

Embedded caches, 232

eMLC (enterprise multi-level cell) flash memory, 440

Encapsulation for networks, 504

END probes in bpftrace, 774

End-to-end network arguments, 507

Enterprise models, 62

Enterprise multi-level cell (eMLC) flash memory, 440

Environment

benchmarking, 647

processes, 101–102

Ephemeral drives, 584

Ephemeral ports, 531

epoll system call, 115, 118

EPTs (extended page tables), 593

Erlang virtual machines, 185

Error-correcting code (ECC) for magnetic rotational disks, 438

Errors

applications, 193

benchmarking, 647

CPUs, 245–246, 796, 798

disk controllers, 451

disk devices, 451

I/O, 483, 798

kernels, 798

memory, 324–325, 796, 798

networks, 526–527, 529, 796–797

RED method, 53

storage, 797

task capacity, 799

USE method overview, 47–48, 51–53

user mutex, 799

Ethernet congestion avoidance, 508

ethtool tool, 132, 546–547

Event-based concurrency, 178

Event-based tools, 133

Event-select MSRs, 238

Event sources for Wireshark, 559

Event tracing

disks, 454

file systems, 388

Ftrace, 707–708

kprobes, 719–720

methodologies, 57–58

perf-tools for, 745–746

trace-cmd for, 737

uprobes, 722–723

Event worker threads, 178

Events

case study, 789–790

CPUs, 273–274

frequency sampling, 682–683

observability source, 159

perf. See perf tool events

SCSI logging, 486

selecting, 274–275

stat filters, 693–694

synthetic, 731–733

trace, 148

events directory in tracefs, 710

Eviction policies for caching, 36

evlist subcommand for perf, 673

Exceptions

synchronous interrupts, 97

user mode, 93

Exclusive CPU sets, 298

exec system calls

kernel, 94

processes, 100

execsnoop tool

BCC, 756

CPUs, 285

perf-tools, 743

process tracing, 207–208

static instrumentation, 11–12

tracing, 136

Executable and Linking Format (ELF) binaries

description, 183

missing symbols in, 214

Executable data in process virtual address space, 319

Executable text in process virtual address space, 319

Execution in kernels, 92–93

execve system call, 11

exit function in bpftrace, 770, 779

Experimentation-based performance gains, 73–74

Experiments

CPUs, 293–294

disks, 490–493

file systems, 411–414

networks, 562–567

observability, 7

overview, 13–14

scientific method, 45–46

Experts for applications, 173

Explicit Congestion Notification (ECN) field

IP, 508–510

TCP, 513

tuning, 570

Explicit logical metadata in file systems, 368

Exporters for monitoring, 55, 79, 137

Express Data Path (XDP) technology

description, 118

event sources, 558

kernel bypass, 523

ext3 file system, 378–379

ext4 file system

features, 379

tuning, 416–418

ext4dist tool, 399–401, 756

ext4slower tool, 401–402, 756

Extended BPF, 12

BCC, 751–761

bpftrace, 752–753, 761–781, 803–808

description, 118

firewalls, 517

histograms, 744

kernel-mode applications, 92

overview, 121–122

tracing tools, 166

Extended page tables (EPTs), 593

Extent-based file systems, 375–376

Extents, 375–376

btrfs, 382

ext4, 380

External caches, 232

F

FaaS (functions as a service), 634

FACK (forward acknowledgments) in TCP, 514

Factor analysis in capacity planning, 71–72

Failures, benchmarking, 645–651

Fair-share schedulers, 595

False sharing for hash tables, 181

Families of instance types, 581

Fast File System (FFS)

description, 113

overview, 377–378

Fast open in TCP, 510

Fast recovery in TCP, 510

Fast retransmits in TCP, 510, 512

Fast user-space mutex (Futex), 115

Fastpath state in Mutex locks, 179

fatrace tool, 395–396

Faults

in synchronous interrupts, 97

page faults. See page faults

faults tool, 348

FC (Fibre Channel) interface, 442–443

fd tool, 141

Feedback-directed optimization (FDO), 122

ffaults tool, 348

FFS (Fast File System)

description, 113

overview, 377–378

Fiber threads, 178

Fibre Channel (FC) interface, 442–443

Field-programmable gate arrays (FPGAs), 240–241

FIFO scheduling policy, 243

File descriptor capacity in USE method, 52

File offset pattern, micro-benchmarking for, 390

File stores in cloud computing, 584

File system internals, bpftrace for, 408

File systems

access timestamps, 371

ad hoc tools, 411–412

architecture. See File systems architecture

bpftrace for, 764, 805–806

caches. See File systems caches

capacity, OS virtualization, 616

capacity, performance issues, 371

exercises, 419–420

experiments, 411–414

hardware virtualization, 597

I/O, logical vs. physical, 368–370

I/O, non-blocking, 366–367

I/O, random vs. sequential, 363–364

I/O, raw and direct, 366

I/O, stack, 107–108

interfaces, 361

latency, 362–363

memory-mapped files, 367

metadata, 367–368

methodology. See File systems methodology

micro-benchmark tools, 412–414

models, 361–362

observability tools. See File systems observability tools

operations, 370–371

OS virtualization, 611–612

overview, 106–107, 359–360

paging, 306

prefetch, 364–365

read-ahead, 365

reads, micro-benchmarking for, 61

record size tradeoffs, 27

references, 420–421

special, 371

synchronous writes, 366

terminology, 360

tuning, 414–419

types. See File systems types

visualizations, 410–411

volumes and pools, 382–383

File systems architecture

caches, 373–375

features, 375–377

I/O stacks, 107–108, 372

VFS, 107, 373

File systems caches, 361–363

defined, 360

flushing, 414

hit ratio, 17

OS virtualization, 616

OS virtualization strategy, 630

tuning, 389

usage, 309

write-back, 365

File systems methodology

cache tuning, 389

disk analysis, 384

latency analysis, 384–386

micro-benchmarking, 390–391

overview, 383–384

performance monitoring, 388

static performance tuning, 389

workload characterization, 386–388

workload separation, 389

File systems observability tools

bpftrace, 402–408

cachestat, 399

ext4dist, 399–401

ext4slower, 401–402

fatrace, 395–396

filetop, 398–399

free, 392–393

LatencyTOP, 396

miscellaneous, 409–410

mount, 392

opensnoop, 397

overview, 391–392

sar, 393–394

slabtop, 394–395

strace, 395

top, 393

vmstat, 393

File systems types

btrfs, 381–382

ext3, 378–379

ext4, 379

FFS, 377–378

XFS, 379–380

ZFS, 380–381

FileBench tool, 414

filelife tool, 409, 756

fileslower tool, 409

filetop tool, 398–399

filetype tool, 409

Filters

bpftrace, 769, 776

event, 693–694

kprobes, 721–722

PID, 729–730

tracepoints, 717–718

uprobes, 723

fio (Flexible IO Tester) tool

disks, 493

file systems, 413–414

Firecracker project, 631

Firewalls, 503

misconfigured, 505

overview, 517

tuning, 574

First-byte latency, 506, 528

Five Whys in drill-down analysis, 56

Fixed counters, 133135

Flame graphs

automated, 201

characteristics, 290–291

colors, 291

CPU profiling, 10–11, 187–188, 278, 660–661

generating, 249, 270–272

interactivity, 291

interpretation, 291–292

malloc() bytes, 346

missing stacks, 215

off-CPU time, 190–191, 205

overview, 289–290

page faults, 340–342, 346

perf, 119

performance wins, 250

profiles, 278

sample processing, 249–250

scripts, 700

FlameScope tool, 292–293, 700

Flash-memory-based SSDs, 439–440

Flash translation layer (FTL) in solid-state drives, 440–441

Flent (FLExible Network Tester) tool, 567

Flexible IO Tester (fio) tool

disks, 493

file systems, 413–414

FLExible Network Tester (Flent) tool, 567

Floating point events in perf, 680

Floating-point operations per second (FLOPS) in benchmarking, 655

Flow control in bpftrace, 775–777

Flusher threads, 374

Flushing caches, 365, 414

fmapfault tool, 409

Footprints, off-CPU, 188–189

fork system calls, 94, 100

forks.bt tool, 624–625

Format string for tracepoints, 148–149

Forward acknowledgments (FACK) in TCP, 514

4-wide processors, 224

FPGAs (field-programmable gate arrays), 240–241

Fragmentation

FFS, 377

file systems, 364

memory, 321

packets, 505

reducing, 380

Frames

defined, 500

networks, 515

OSI model, 502

Free memory lists, 315–318

free tool

description, 15

file systems, 392–393

memory, 348

OS virtualization, 619

FreeBSD

jails, 606

jemalloc, 322

kernel, 113

TSA analysis, 217

network stack, 514

performance vs. Linux, 124

TCP LRO, 523

Freeing memory, 315–318

Frequency sampling for hardware events, 682–683

Front-ends in instruction pipeline, 224

Front-side buses, 235–237

fsck time in ext4, 379

fsrwstat tool, 409

FTL (flash translation layer) in solid-state drives, 440–441

ftrace subcommand for perf, 673

Ftrace, 13, 705–706

capabilities overview, 706–708

description, 166

documentation, 748–749

function_graph, 724–725

function profiler, 711–712

function tracer, 713–716

hist triggers, 727–733

hwlat, 726

kprobes, 719–722

options, 716

OS virtualization, 629

perf, 741

perf-tools, 741–748

references, 749

trace-cmd, 734–740

trace file, 713–715

trace_pipe file, 715

tracefs, 708–711

tracepoints, 717–718

tracing, 136

uprobes, 722–723

Full I/O distributions disk latency, 454

Full stack in systems performance, 1

Fully associative caches, 234

Fully-preemptible kernels, 110, 114

func variable in bpftrace, 778

funccount tool

BCC, 756–758

example, 747

perf-tools, 744, 748

funcgraph tool

Ftrace, 706–707

perf-tools, 744, 748

funclatency tool, 757

funcslower tool

BCC, 757

perf-tools, 744

function_graph tracer

description, 708

graph tracing, 724–725

options, 725

trace-cmd for, 737, 739

function_profile_enabled file, 710

Function profiling

Ftrace, 707, 711–712

observability source, 159

Function tracer. See Ftrace tool

Function tracing

profiling, 248

trace-cmd for, 736–737

Functional block diagrams in USE method, 49–50

Functional units in CPUs, 223

Functions as a service (FaaS), 634

Functions in bpftrace, 770, 778–781

functrace tool, 744

Futex (fast user-space mutex), 115

futex system calls, 95

G

Garbage collection, 185–186

gcc compiler

optimizations, 183–184

PGO kernels, 122

gdb tool, 136

Generic segmentation offload (GSO) in networks, 520–521

Generic system performance methodologies, 40–41

Geometric mean, 74

getdelays.c tool, 286

gethostlatency tool, 561, 756

github.com tool package, 132

GKE (Google Kubernetes Engine), 586

glibc allocator, 322

Glossary of terms, 815–823

Golang

goroutines, 178

syscalls, 92

Good/fast/cheap trade-offs, 26–27

Google Kubernetes Engine (GKE), 586

Goroutines for applications, 178

gprof tool, 135

Grafana, 89, 138

Graph tracing, 724–725

Graphics processing units (GPUs)

vs. CPUs, 240

tools, 287

GRO (Generic Receive Offload), 119

Growth

big O notation, 175

heap, 320

memory, 185, 316, 327

GSO (generic segmentation offload) in networks, 520–521

Guests

hardware virtualization, 590–593, 596–605

lightweight virtualization, 632–633

OS virtualization, 617, 627–629

gVisor project, 631

H

Hard disk drives (HDDs), 435–439

Hard interrupts, 282

hardirqs tool, 282, 756

Hardware

memory, 311–315

networks, 515–517

threads, 220

tracing, 276

Hardware-assisted virtualization, 590

Hardware counters. See Performance monitoring counters (PMCs)

Hardware events

CPUs, 273–274

frequency sampling, 682–683

perf, 680–683

selecting, 274–275

Hardware instances in cloud computing, 580

Hardware interrupts, 91

Hardware latency detector (hwlat), 708, 726

Hardware latency tracer, 118

Hardware probes, 774

Hardware RAID, 444

Hardware resources in capacity planning, 70

Hardware virtualization

comparisons, 634–636

CPU support, 589–592

I/O, 593–595

implementation, 588–589

memory mapping, 592–593

multi-tenant contention, 595

observability, 597–605

overhead, 589–595

overview, 587–588

resource controls, 595–597

Harmonic mean, 74

Hash fields in hist triggers, 728

Hash tables in applications, 180–181

HBAs (host bus adapters), 426

HDDs (hard disk drives), 435–439

hdparm tool, 491–492

Head-based sampling in distributed tracing, 199

Heads in magnetic rotational disks, 436

Heap

anonymous paging, 306

description, 304

growth, 320

process virtual address space, 319

Heat maps

CPU utilization, 288–289

disk offset, 489–490

disk utilization, 490

file systems, 410–411

FlameScope, 292–293

I/O latency, 488–489

overview, 82–83

subsecond-offset, 289

Hello, World! program, 770

hfaults tool, 348

hist function in bpftrace, 780

Hist triggers

fields, 728–729

modifiers, 729

multiple keys, 730

perf-tools, 748

PID filters, 729–730

single keys, 727–728

stack trace keys, 730–731

synthetic events, 731–733

usage, 727

hist triggers profiler, 707

Histogram, 76–77

Hits, cache, 35–36, 361

Hold times for locks, 198

Holistic approach, 6

Horizontal pod autoscalers (HPAs), 73

Horizontal scaling and scalability

capacity planning, 72

cloud computing, 581–582

Host bus adapters (HBAs), 426

Hosts

applications, 172

cloud computing, 580

hardware virtualization, 597–603

lightweight virtualization, 632

OS virtualization, 617, 619–627

Hot caches, 37

Hot/cold flame graphs, 191

Hourly patterns, monitoring, 78

HPAs (horizontal pod autoscalers), 73

HT (HyperTransport) for CPUs, 236

htop tool, 621

HTTP/3 protocol, 515

Hubs in networks, 516

Hue in flame graphs, 291

Huge pages, 115–116, 314, 352–353

hugetlb control group, 610

hwlat (hardware latency detector), 708, 726

Hybrid clouds, 580

Hybrid kernels, 92, 123

Hyper-Threading Technology, 225

Hyper-V, 589

Hypercalls in paravirtualization, 588

Hyperthreading-aware scheduling classes, 243

HyperTransport (HT) for CPUs, 236

Hypervisors

cloud computing, 580

hardware virtualization, 587–588

kernels, 93

Hypothesis step in scientific method, 44–45

I

I/O. See Input/output (I/O)

IaaS (infrastructure as a service), 580

Icicle graphs, 250

icstat tool, 409

IDDs (isolated driver domains), 596

Identification in drill-down analysis, 55

Idle memory, 315

Idle scheduling class, 243

IDLE scheduling policy, 243

Idle state in thread state analysis, 194, 196–197

Idle threads, 99, 244

ieee80211scan tool, 561

If statements, 776

ifconfig tool, 537–538

ifpps tool, 561

iftop tool, 562

Implicit disk I/O, 369

Implicit logical metadata, 368

Inactive pages in page caches, 318

Incast problem in networks, 524

Index nodes (inodes)

caches, 375

defined, 360

VFS, 373

Indirect disk I/O, 369

Individual synchronous writes, 366

Industry benchmarking, 60–61

Industry standards for benchmarking, 654–655

Inflated disk I/O, 369

Infrastructure as a service (IaaS), 580

init process, 100

Initial window in TCP, 514

inject subcommand for perf, 673

Inodes (index nodes)

caches, 375

defined, 360

VFS, 373

inotify framework, 116

inotify tool, 409

Input

event tracing, 58

solid-state drive controllers, 440

Input/output (I/O)

disks. See Disks I/O

file systems, 360

hardware virtualization, 593–595, 597

I/O-bound applications, 106

latency, 424

logical vs. physical, 368–370

merging, 448

multiqueue schedulers, 119

non-blocking, 181, 366–367

OS virtualization, 611–612, 616–617

random vs. sequential, 363–364

raw and direct, 366

request time, 427

schedulers, 448

scheduling, 115–116

service time, 427

size, applications, 176

size, micro-benchmarking, 390

stacks, 107–108, 372

USE method, 798

wait time, 427

Input/output operations per second. See IOPS (input/output operations per second)

Input/output profiling

bpftrace, 210–212

perf, 202–203

syscall analysis, 192

Installing

BCC, 754

bpftrace, 762

instances directory in tracefs, 710

Instances in cloud computing

description, 14

types, 580

Instruction pointer for threads, 100

Instructions, CPU

defined, 220

IPC, 225

pipeline, 224

size, 224

steps, 223

text, 304

width, 224

Instructions per cycle (IPC), 225, 251, 326

Integrated caches, 232

Intel Cache Allocation Technology (CAT), 118, 596

Intel Clear Containers, 631

Intel processor cache sizes, 230–231

Intel VTune Amplifier XE tool, 135

Intelligent Platform Management Interface (IPMI), 98–99

Intelligent prefetch in ZFS, 381

Inter-processor interrupts (IPIs), 110

Inter-stack latency in networks, 529

Interactivity in flame graphs, 291

Interconnects

buses, 313

CPUs, 235–237

USE method, 49–51

Interfaces

defined, 500

device drivers, 109–110

disks, 442–443

file systems, 361

kprobes, 153

network, 109, 501

network hardware, 515–516

network IOPS, 527–529

network negotiation, 508

PMCs, 157–158

scheduling in NAPI, 522

tracepoints, 149–150

uprobes, 154–155

Interleaving in FFS, 378

Internet Protocol (IP)

congestion avoidance, 508

overview, 509–510

sockets, 509

Interpretation of flame graphs, 291–292

Interpreted programming languages, 184–185

Interrupt coalescing mode for networks, 522

Interrupt-disabled mode, 98

Interrupt service requests (IRQs), 96–97

Interrupt service routines (ISRs), 96

Interrupts

asynchronous, 96–97

defined, 91

hardware, 282

masking, 98–99

network latency, 529

overview, 96

soft, 281–282

synchronous, 97

threads, 97–98

interrupts tool, 142

interval probes in bpftrace, 774

Interval statistics, stat for, 693

IO accounting, 116

io_submit command, 181

io_uring_enter command, 181

io_uring interface, 119

ioctl system calls, 95

iolatency tool, 743

ionice tool, 493–494

ioping tool, 492

ioprofile tool, 409

IOPS (input/output operations per second)

defined, 22

description, 7

disks, 429, 431–432

networks, 527–529

performance metric, 32

resource analysis, 38

iosched tool, 487

iosnoop tool, 743

iostat tool

bonnie++ tool, 658

default output, 459–460

description, 15

disks, 450, 459–463

extended output, 460–463

fixed counters, 134

memory, 348

options, 460

OS virtualization, 619, 627

percent busy metric, 33

slow disks case study, 17

iotop tool, 450, 472–473

IP (Internet Protocol)

congestion avoidance, 508

overview, 509–510

sockets, 509

ip tool, 525, 536–537

ipc control group, 608

IPC (instructions per cycle), 225, 251, 326

ipecn tool, 561

iperf tool

example, 13–14

network micro-benchmarking, 10

network throughput, 564–565

IPIs (inter-processor interrupts), 110

IPMI (Intelligent Platform Management Interface), 98–99

iproute2 tool package, 132

IRQs (interrupt service requests), 96–97

irqsoff tracer, 708

lscpu tool, 285

Isolated driver domains (IDDs), 596

Isolation in OS virtualization, 629

ISRs (interrupt service routines), 96

lstopo tool, 286

J

Jails in BSD kernel, 113, 606

Java

analysis, 29

case study, 783–792

flame graphs, 201, 271

dynamic USDT, 156, 213

garbage collection, 185–186

Java Flight Recorder, 135

stack traces, 215

symbols, 214

uprobes, 213

USDT probes, 155, 213

virtual machines, 185

Java Flight Recorder (JFR), 135

JavaScript Object Notation (JSON) format, 163–164

JBOD (just a bunch of disks), 443

jemalloc allocator, 322

JFR (Java Flight Recorder), 135

JIT (just-in-time) compilation

Linux kernel, 117

PGO kernels, 122

runtime missing symbols, 214

Jitter in operating systems, 99

jmaps tool, 214

join function, 778

Journaling

btrfs, 382

ext3, 378–379

file systems, 376

XFS, 380

JSON (JavaScript Object Notation) format, 163–164

Jumbo frames

packets, 505

tuning, 574

Just a bunch of disks (JBOD), 443

Just-in-time (JIT) compilation

Linux kernel, 117

PGO kernels, 122

runtime missing symbols, 214

K

kaddr function, 779

Kata Containers, 631

KCM (Kernel Connection Multiplexor), 118

Keep-alive strategy in networks, 507

Kendall’s notation for queueing systems, 67–68

Kernel-based Virtual Machine (KVM) technology

CPU quotas, 595

description, 589

I/O path, 594

Linux kernel, 116

observability, 600–603

Kernel bypass for networks, 523

Kernel Connection Multiplexor (KCM), 118

Kernel mode, 93

Kernel page table isolation (KPTI) patches, 121

Kernel space, 90

Kernel state in thread state analysis, 194–197

Kernel statistics (Kstat) framework, 159–160

Kernel time

CPUs, 226

syscall analysis, 192

Kernels

bpftrace for, 765

BSD, 113

comparisons, 124

defined, 90

developments, 115–120

execution, 92–93

file systems, 107

filtering in OS virtualization, 629

Linux, 114–122, 124

microkernels, 123

monolithic, 123

overview, 91–92

PGO, 122

PMU events, 680

preemption, 110

schedulers, 105–106

Solaris, 114

stacks, 103

system calls, 94–95

time analysis, 202

unikernels, 123

Unix, 112

USE method, 798

user modes, 93–94

versions, 111–112

KernelShark software, 83–84, 739–740

kfunc probes, 774

killsnoop tool

BCC, 756

perf-tools, 743

klockstat tool, 756

kmem subcommand for perf, 673, 702

Knee points

models, 62–64

scalability, 31

Known-knowns, 37

Known-unknowns, 37

kprobe_events file, 710

kprobe probes, 774

kprobe profiler, 707

kprobe tool, 744

kprobes, 685–686

arguments, 686–687, 720–721

event tracing, 719–720

filters, 721–722

overview, 151–153

profiling, 722

return values, 721

triggers, 721–722

kprobes tracer, 708

KPTI (kernel page table isolation) patches, 121

kretfunc probes, 774

kretprobes, 152–153, 774

kstack function in bpftrace, 779

kstack variable in bpftrace, 778

Kstat (kernel statistics) framework, 159–160

kswapd tool, 318–319, 374

ksym function, 779

kubectl command, 621

Kubernetes

node, 608

orchestration, 586

OS virtualization, 620–621

KVM. See Kernel-based Virtual Machine (KVM) technology

kvm_entry tool, 602

kvm_exit tool, 602

kvm subcommand for perf, 673, 702

kvm_vcpu_halt command, 592

kvmexits.bt tool, 602–603

Kyber multi-queue schedulers, 449

L

L2ARC cache in ZFS, 381

Label selectors in cloud computing, 586

Language virtual machines, 185

Large Receive Offload (LRO), 116

Large segment offload for packet size, 505

Last-level caches (LLCs), 232

Latency

analysis methodologies, 56–57

applications, 173

biolatency, 468–470

CPUs, 233–234

defined, 22

disk I/O, 428–430, 454–455, 467–472, 482–483

distributions, 76–77

file systems, 362–363, 384–386, 388

graph tracing, 724–725

hardware, 118

hardware virtualization, 604

heat maps, 82–83, 488–489

I/O profiling, 210–211

interrupts, 98

line charts, 80–81

memory, 311, 441

methodologies, 24–25

networks, analysis, 528–529

networks, connections, 7, 24–25, 505–506, 528

networks, defined, 500

networks, types, 505–507

outliers, 58, 186, 424, 471–472

overview, 67

packets, 532–533

percentiles, 413–414

perf, 467–468

performance metric, 32

run-queue, 222

scatter plots, 81–82, 488

scheduler, 226, 272–273

solid-state drives, 441

ticks, 99

transaction costs analysis, 385–386

VFS, 406–408

workload analysis, 39–40

LatencyTOP tool for file systems, 396

latencytop tool for operating systems, 116

Lazy shootdowns, 367

LBR (last branch record), 216, 676, 696

Leak detection for memory, 326–327

Least frequently used (LFU) caching algorithm, 36

Least recently used (LRU) caching algorithm, 36

Level 1 caches

data, 232

instructions, 232

memory, 314

Level 2 ARC, 381

Level 2 caches

embedded, 232

memory, 314

Level 3 caches

LLC, 232

memory, 314

Level of appropriateness in methodologies, 28–29

LFU (least frequently used) caching algorithm, 36

lhist function, 780

libpcap library as observability source, 159

Life cycle for processes, 100–101

Life span

network connections, 507

solid-state drives, 441

Lightweight threads, 178

Lightweight virtualization

comparisons, 634–636

implementation, 631–632

observability, 632–633

overhead, 632

overview, 630

resource controls, 632

Limit investigations, benchmarking for, 642

Limitations of averages, 75

Limits for OS virtualization resources, 613

limits tool, 141

Line charts

baseline statistics, 59

disks, 487–488

working with, 80–81

Linear scalability

methodologies, 32

models, 63

Link aggregation tuning, 574

Link-time optimization (LTO), 122

Linux 60-second analysis, 15–16

Linux operating system

crisis tools, 131–133

extended BPF, 121–122

kernel developments, 115–120

KPTI patches, 121

network stacks, 518–519

observability sources, 138–146

observability tools, 130

operating system disk I/O stack, 447–448

overview, 114–115

static performance tools, 130–131

systemd service manager, 120

thread state analysis, 195–197

linux-tools-common linux-tools tool package, 132

list subcommand

perf, 673

trace-cmd, 735

Listen backlogs in networks, 519

listen subcommand in trace-cmd, 735

Listing events

perf, 674–675

trace-cmd for, 736

Little’s Law, 66

Live reporting in sar, 165

LLCs (last-level caches), 232

llcstat tool

BCC, 756

CPUs, 285

Load averages for uptime, 255–257

Load balancers

capacity planning, 72

schedulers, 241

Load generation

capacity planning, 70

custom load generators, 491

micro-benchmarking, 61

Load vs. architecture in methodologies, 30–31

loadavg tool, 142

Local memory, 312

Local network connections, 509

Localhost network connections, 509

Lock state in thread state analysis, 194–197

lock subcommand for perf, 673, 702

Locks

analysis, 198

applications, 179–181

tracing, 212–213

Logging

applications, 172

SCSI events, 486

ZFS, 381

Logical CPUs

defined, 220

hardware threads, 221

Logical I/O

defined, 360

vs. physical, 368–370

Logical metadata in file systems, 368

Logical operations in file systems, 361

Longest-latency caches, 232

Loopbacks in networks, 509

Loops in bpftrace, 776–777

LRO (Large Receive Offload), 116

LRU (least recently used) caching algorithm, 36

lsof tool, 561

LTO (link-time optimization), 122

LTTng tool, 166

M

M/D/1 queueing systems, 68–69

M/G/1 queueing systems, 68

M/M/1 queueing systems, 68

M/M/c queueing systems, 68

Macro-benchmarks, 13, 653–654

MADV_COLD option, 119

MADV_PAGEOUT option, 119

madvise system call, 367, 415–416

Magnetic rotational disks, 435–439

Main memory

caching, 37–39

defined, 90, 304

latency, 26

managing, 104–105

overview, 311–312

malloc() bytes flame graphs, 346

Map functions in bpftrace, 771–772, 780–781

Map variables in bpftrace, 771

Mapping memory. See Memory mappings

maps tool, 141

Marketing, benchmarking for, 642

Markov model, 654

Markovian arrivals in queueing systems, 68–69

Masking interrupts, 98–99

max function in bpftrace, 780

Maximum controller operation rate, 457

Maximum controller throughput, 457

Maximum disk operation rate, 457

Maximum disk random reads, 457

Maximum disk throughput

magnetic rotational disks, 436–437

micro-benchmarking, 457

Maximum transmission unit (MTU) size for packets, 504–505

MCS locks, 117

mdflush tool, 487

Mean, 74

"A Measure of Transaction Processing Power," 655

Measuring disk time, 427–429

Medians, 75

MegaCli tool, 484

Melo, Arnaldo Carvalho de, 671

Meltdown vulnerability, 121

mem subcommand for perf, 673

meminfo tool, 142

memleak tool

BCC, 756

memory, 348

Memory, 303–304

allocators, 309, 353

architecture. See Memory architecture

benchmark questions, 667–668

bpftrace for, 763–764, 804–805

BSD kernel, 113

CPU caches, 221–222

CPU tradeoffs with, 27

demand paging, 307–308

exercises, 354–355

file system cache usage, 309

garbage collection, 185

hardware virtualization, 596–597

internals, 346–347

mappings. See Memory mappings

methodology. See Memory methodology

multiple page sizes, 352–353

multiprocess vs. multithreading, 228

NUMA binding, 353

observability tools. See Memory observability tools

OS virtualization, 611, 613, 615–616

OS virtualization strategy, 630

overcommit, 308

overprovisioning in solid-state drives, 441

paging, 306–307

persistent, 441

process swapping, 308–309

references, 355–357

resource controls, 353–354

shared, 310

shrinking method, 328

terminology, 304

tuning, 350–354

USE method, 49–51, 796–798

utilization and saturation, 309

virtual, 90, 104–105, 304–305

word size, 310

working set size, 310

Memory architecture, 311

buses, 312–313

CPU caches, 314

freeing memory, 315–318

hardware, 311–315

latency, 311

main memory, 311–312

MMU, 314

process virtual address space, 319–322

software, 315–322

TLB, 314

memory control group, 610, 616

Memory locality, 222

Memory management units (MMUs), 235, 314

Memory mappings

displaying, 337–338

files, 367

hardware virtualization, 592–593

heap growth, 320

kernel, 94

micro-benchmarking, 390

OS virtualization, 611

Memory methodology

cycle analysis, 326

leak detection, 326–327

memory shrinking, 328

micro-benchmarking, 328

overview, 323

performance monitoring, 326

resource controls, 328

static performance tuning, 327–328

tools method, 323–324

usage characterization, 325–326

USE method, 324–325

Memory observability tools

bpftrace, 343–347

drsnoop, 342

miscellaneous, 347–350

numastat, 334–335

overview, 328–329

perf, 338–342

pmap, 337–338

ps, 335–336

PSI, 330–331

sar, 331–333

slabtop, 333–334

swapon, 331

top, 336–337

vmstat, 329–330

wss, 342–343

Memory reclaim state in delay accounting, 145

Metadata

ext3, 378

file systems, 367–368

Method R, 57

Methodologies, 21–22

ad hoc checklist method, 43–44

anti-methods, 42–43

applications. See Applications methodology

baseline statistics, 59

benchmarking. See Benchmarking methodology

cache tuning, 60

caching, 35–37

capacity planning, 69–73

CPUs. See CPUs methodology

diagnosis cycle, 46

disks. See Disks methodology

drill-down analysis, 55–56

event tracing, 57–58

exercises, 85–86

file systems. See File systems methodology

general, 40–41

known-unknowns, 37

latency analysis, 56–57

latency overview, 24–25

level of appropriateness, 28–29

Linux 60-second analysis checklist, 15–16

load vs. architecture, 30–31

memory. See Memory methodology

Method R, 57

metrics, 32–33

micro-benchmarking, 60–61

modeling. See Methodologies modeling

models, 23–24

monitoring, 77–79

networks. See Networks methodology

performance, 41–42

performance mantras, 61

perspectives, 37–40

point-in-time recommendations, 29–30

problem statement, 44

profiling, 35

RED method, 53

references, 86–87

resource analysis, 38–39

saturation, 34–35

scalability, 31–32

scientific method, 44–46

static performance tuning, 59–60

statistics, 73–77

stop indicators, 29

terminology, 22–23

time scales, 25–26

tools method, 46

trade-offs, 26–27

tuning efforts, 27–28

USE method, 47–53

utilization, 33–34

visualizations. See Methodologies visualizations

workload analysis, 39–40

workload characterization, 54

Methodologies modeling, 62

Amdahl’s Law of Scalability, 64–65

enterprise vs. cloud, 62

queueing theory, 66–69

Universal Scalability Law, 65–66

visual identification, 62–64

Methodologies visualizations, 79

heat maps, 82–83

line charts, 80–81

scatter plots, 81–82

surface plots, 84–85

timeline charts, 83–84

tools, 85

Metrics, 89

applications, 172

fixed counters, 133–135

methodologies, 32–33

observability tools, 167–168

resource analysis, 38

USE method, 48–51

workload analysis, 40

MFU (most frequently used) caching algorithm, 36

Micro-benchmarking

capacity planning, 70

CPUs, 253–254

description, 13

design example, 652–653

disks, 456–457, 491–492

file systems, 390–391, 412–414

memory, 328

methodologies, 60–61

networks, 533

overview, 651–652

Micro-operations (uOps), 224

Microcode ROM in CPUs, 230

Microkernels, 92, 123

Microservices

cloud computing, 583–584

USE method, 53

Midpath state for Mutex locks, 179

Migration types for free lists, 317

min function in bpftrace, 780

MINIX operating system, 114

Minor faults, 307

MIPS (millions of instructions per second) in benchmarking, 655

Misleading benchmarks, 650

Missing stacks, 215–216

Missing symbols, 214

Mixed-mode CPU profiles, 187

Mixed-mode flame graphs, 187

MLC (multi-level cell) flash memory, 440

mmap system call

description, 95

memory mapping, 320, 367

mmapfiles tool, 409

mmapsnoop tool, 348

mmiotrace tracer, 708

MMUs (memory management units), 235, 314

mnt control group, 609

Mode switches

defined, 90

kernels, 93

Model-specific registers (MSRs)

CPUs, 238

observability source, 159

Models

Amdahl’s Law of Scalability, 64–65

CPUs, 221–222

disks, 425–426

enterprise vs. cloud, 62

file systems, 361–362

methodologies, 23–24

networks, 501–502

overview, 62

queueing theory, 66–69

Universal Scalability Law, 65–66

visual identification, 62–64

wireframe, 84–85

Modular I/O scheduling, 116

Monitoring, 77–79

CPUs, 251

disks, 452

drill-down analysis, 55

file systems, 388

memory, 326

networks, 529, 537

observability tools, 137–138

products, 79

sar, 161–162

summary-since-boot values, 79

time-based patterns, 77–78

Monolithic kernels, 91, 123

Most frequently used (MFU) caching algorithm, 36

Most recently used (MRU) caching algorithm, 36

Mount points in file systems, 106

mount tool

file systems, 392

options, 416–417

Mounting file systems, 106, 392

mountsnoop tool, 409

mpstat tool

case study, 785–786

CPUs, 245, 259

description, 15

fixed counters, 134

lightweight virtualization, 633

OS virtualization, 619

mq-deadline multi-queue schedulers, 449

MR-IOV (multiroot I/O virtualization), 593–594

MRU (most recently used) caching algorithm, 36

MSG_ZEROCOPY flag, 119

msr-tools tool package, 132

MSRs (model-specific registers)

CPUs, 238

observability source, 159

mtr tool, 567

Multi-level cell (MLC) flash memory, 440

Multi-queue schedulers

description, 119

operating system disk I/O stack, 449

Multiblock allocators in ext4, 379

Multicalls in paravirtualization, 588

Multicast network transmissions, 503

Multichannel memory buses, 313

Multics (Multiplexed Information and Computer Services) operating system, 112

Multimodal distributions, 76–77

MultiPath TCP, 119

Multiple causes as performance challenge, 6

Multiple page sizes, 352–353

Multiple performance issues, 6

Multiple prefetch streams in ZFS, 381

Multiple-zone disk recording, 437

Multiplexed Information and Computer Services (Multics) operating system, 112

Multiprocess CPUs, 227–229

Multiprocessors

applications, 177–181

overview, 110

Solaris kernel support, 114

Multiqueue block I/O, 117

Multiqueue I/O schedulers, 119

Multiroot I/O virtualization (MR-IOV), 593–594

Multitenancy in cloud computing, 580

contention in hardware virtualization, 595

contention in OS virtualization, 612–613

overview, 585–586

Multithreading

applications, 177–181

CPUs, 227–229

SMT, 225

Mutex (MUTually EXclusive) locks

applications, 179–180

contention, 198

tracing, 212–213

USE method, 52

MySQL database

bpftrace tracing, 212–213

CPU flame graph, 187–188

CPU profiling, 200, 203, 269–270, 277, 283–284, 697–700

disk I/O tracing, 466–467, 470–471, 488

file tracing, 397–398, 401–402

memory allocation, 345

memory mappings, 337–338

network tracing, 552–554

Off-CPU analysis, 204–205, 275–276

Off-CPU Time flame graphs, 190–192

page fault sampling, 339–341

query latency analysis, 56

scheduler latency, 272, 279–280

shards, 582

slow query log, 172

stack traces, 215

syscall tracing, 201–202

working set size, 342

mysqld_qslower tool, 756

N

Nagle algorithm for TCP congestion control, 513

Name resolution latency, 505, 528

Namespaces in OS virtualization, 606–609, 620, 623–624

NAPI (New API) framework, 522

NAS (network-attached storage), 446

Native Command Queueing (NCQ), 437

Native hypervisors, 587

Negative caching in Dcache, 375

Nested page tables (NPTs), 593

net control group, 609

net_cls control group, 610

Net I/O state in thread state analysis, 194–197

net_prio control group, 610

net tool

description, 562

socket information, 142

Netfilter conntrack as observability source, 159

Netflix cloud performance team, 23

netlink observability tools, 145–146, 536

netperf tool, 565–566

netsize tool, 561

netstat tool, 525, 539–542

nettxlat tool, 561

Network-attached storage (NAS), 446

Network interface cards (NICs)

description, 501–502

network connections, 109

sent and received packets, 522

Networks, 499–500

architecture. See Networks architecture

benchmark questions, 668

bpftrace for, 764–765, 807–808

buffers, 27, 507

congestion avoidance, 508

connection backlogs, 507

controllers, 501–502

encapsulation, 504

exercises, 574–575

experiments, 562–567

hardware virtualization, 597

interface negotiation, 508

interfaces, 501

latency, 505–507

local connections, 509

methodology. See Networks methodology

micro-benchmarking for, 61

models, 501–502

observability tools. See Networks observability tools

on-chip interfaces, 230

operating systems, 109

OS virtualization, 611–613, 617, 630

packet size, 504–505

protocol stacks, 502

protocols, 504

references, 575–578

round-trip time, 507, 528

routing, 503

sniffing, 159

stacks, 518–519

terminology, 500

throughput, 527–529

tuning. See Networks tuning

USE method, 49–51, 796–797

utilization, 508–509

Networks architecture

hardware, 515–517

protocols, 509–515

software, 517–524

Networks methodology

latency analysis, 528–529

micro-benchmarking, 533

overview, 524–525

packet sniffing, 530–531

performance monitoring, 529

resource controls, 532–533

static performance tuning, 531–532

TCP analysis, 531

tools method, 525

USE method, 526–527

workload characterization, 527–528

Networks observability tools

bpftrace, 550–558

ethtool, 546–547

ifconfig, 537–538

ip, 536–537

miscellaneous, 560–562

netstat, 539–542

nicstat, 545–546

nstat, 538–539

overview, 533–534

sar, 543–545

ss, 534–536

tcpdump, 558–559

tcplife, 548

tcpretrans, 549–550

tcptop, 549

Wireshark, 560

Networks tuning, 567

configuration, 574

socket options, 573

system-wide, 567–572

New API (NAPI) framework, 522

New Vegas (NV) congestion control algorithm, 118

nfsdist tool

BCC, 756

file systems, 399

nfsslower tool, 756

nfsstat tool, 561

NFU (not frequently used) caching algorithm, 36

nice command

CPU priorities, 252

resource management, 111

scheduling priorities, 295

NICs (network interface cards)

description, 501-502

network connections, 109

sent and received packets, 522

nicstat tool, 132, 525, 545-546

"A Nine Year Study of File System and Storage Benchmarking," 643

Nitro hardware virtualization

description, 589

I/O path, 594-595

NMIs (non-maskable interrupts), 98

NO_HZ_FULL option, 117

Node taints in cloud computing, 586

Node.js

dynamic USDT, 156

event-based concurrency, 178

non-blocking I/O, 181

symbols, 214

USDT tracing, 677, 690-691

Nodes

cloud computing, 586

free lists, 317

main memory, 312

Noisy neighbors

multitenancy, 585

OS virtualization, 617

Non-blocking I/O

applications, 181

file systems, 366-367

Non-data-transfer disk commands, 432

Non-idle time, 34

Non-maskable interrupts (NMIs), 98

Non-regression testing

benchmarking for, 642

software change case study, 18

Non-uniform memory access (NUMA)

CPUs, 244

main memory, 312

memory balancing, 117

memory binding, 353

multiprocessors, 110

Non-uniform random distributions, 413

Non-Volatile Memory express (NVMe) interface, 443

Noop I/O schedulers, 448

nop tracer, 708

Normal distribution, 75

NORMAL scheduling policy, 243

Not frequently used (NFU) caching algorithm, 36

NPTs (nested page tables), 593

nsecs variable in bpftrace, 777

nsenter command, 624

nstat tool, 134, 525, 538-539

ntop function, 779

NUMA. See Non-uniform memory access (NUMA)

numactl command, 298, 353

numactl tool package, 132

numastat tool, 334-335

Number of service centers in queueing systems, 67

NV (New Vegas) congestion control algorithm, 118

nvmelatency tool, 487

O

O in Big O notation, 175-176

O(1) scheduling class, 243

Object stores in cloud computing, 584

Observability

allocators, 321

applications, 174

benchmarks, 643

counters, statistics, and metrics, 8-9

hardware virtualization, 597-605

operating systems, 111

OS virtualization. See OS virtualization observability

overview, 7-8

profiling, 10-11

RAID, 445

tracing, 11-12

volumes and pools, 383

Observability tools, 129

applications. See Applications observability tools

coverage, 130

CPUs. See CPUs observability tools

crisis, 131-133

disks. See Disks observability tools

evaluating results, 167-168

exercises, 168

file system. See File systems observability tools

fixed counters, 133-135

memory. See Memory observability tools

monitoring, 137-138

network. See Networks observability tools

profiling, 135

references, 168-169

sar, 160-166

static performance, 130-131

tracing, 136, 166

types, 133

Observability tools sources, 138-140

delay accounting, 145

hardware counters, 156-158

kprobes, 151-153

miscellaneous, 159-160

netlink, 145-146

/proc file system, 140-143

/sys file system, 143-144

tracepoints, 146-151

uprobes, 153-155

USDT, 155-156

Observation-based performance gains, 73

Observational tests in scientific method, 44-45

Observer effect in metrics, 33

off-CPU

analysis process, 189-192

footprints, 188-189

thread state analysis, 197

time flame graphs, 205

offcputime tool

BCC, 756

description, 285

networks, 561

scheduler tracing, 190

slow disks case study, 17

stack traces, 204-205

time flame graphs, 205

Offset heat maps, 289, 489-490

offwaketime tool, 756

On-chip caches, 231

On-die caches, 231

On-disk caches, 425-426, 430, 437

Online balancing, 382

Online defragmentation, 380

OOM killer (out-of-memory killer), 316-317, 324

OOM (out of memory), defined, 304

oomkill tool

BCC, 756

description, 348

open command

description, 94

non-blocking I/O, 181

Open Container Interface, 586

openat syscalls, 404

opensnoop tool

BCC, 756

file systems, 397

perf-tools, 743

Operating systems, 89

additional reading, 127-128

caching, 108-109

clocks and idle, 99

defined, 90

device drivers, 109-110

disk I/O stack, 446-449

distributed, 123-124

exercises, 124-125

file systems, 106-108

hybrid kernels, 123

interrupts, 96-99

jitter, 99

kernels, 91-95, 111-114, 124

Linux. See Linux operating system

microkernels, 123

multiprocessors, 110

networking, 109

observability, 111

PGO kernels, 122

preemption, 110

processes, 99-102

references, 125-127

resource management, 110-111

schedulers, 105-106

stacks, 102-103

system calls, 94-95

terminology, 90-91

tunables for disks, 493-494

unikernels, 123

virtual memory, 104-105

virtualization. See OS virtualization

Operation rate

defined, 22

file systems, 387-388

Operations

applications, 172

defined, 360

file systems, 370-371

micro-benchmarking, 390

Operators for bpftrace, 776-777

OProfile system profiler, 115

oprofile tool, 285

Optimistic spinning in Mutex locks, 179

Optimizations

applications, 174

compiler, 183-184, 229

feedback-directed, 122

networks, 524

Orchestration in cloud computing, 586

Ordered mode in ext3, 378

Orlov block allocator, 379

OS instances in cloud computing, 580

OS virtualization

comparisons, 634-636

control groups, 609-610

implementation, 607-610

namespaces, 606-609

overhead, 610-613

overview, 605-607

resource controls, 613-617

OS virtualization observability

BPF tracing, 624-625

containers, 620-621

guests, 627-629

hosts, 619-627

namespaces, 623-624

overview, 617-618

resource controls, 626-627

strategy, 629-630

tracing tools, 629

traditional tools, 618-619

OS X syscall tracing, 205

OS wait time for disks, 472

OSI model, 502

Out-of-memory killer (OOM killer), 316-317, 324

Out of memory (OOM), defined, 304

Out-of-order packets, 529

Outliers

heat maps, 82

latency, 186, 424, 471-472

normal distributions, 77

Output formats in sar, 163-165

Output with solid-state drive controllers, 440

Overcommit strategy, 115

Overcommitted main memory, 305, 308

Overflow sampling

hardware events, 683

PMCs, 157-158

Overhead

hardware virtualization, 589-595

kprobes, 153

lightweight virtualization, 632

metrics, 33

multiprocess vs. multithreading, 228

OS virtualization, 610-613

strace, 207

ticks, 99

tracepoints, 150

uprobes, 154-155

volumes and pools, 383

Overlayfs file system, 118

Overprovisioning cloud computing, 583

override function, 779

Oversize arenas, 322

P

P-caches in CPUs, 230

P-states in CPUs, 231

Pacing in networks, 524

Packages, CPUs vs. GPUs, 240

Packets

defined, 500

latency, 532-533

networks, 504

OSI model, 502

out-of-order, 529

size, 504-505

sniffing, 530-531

throttling, 522

Padding locks for hash tables, 181

Page caches

file systems, 374

memory, 315

Page faults

defined, 304

flame graphs, 340-342, 346

sampling, 339-340

Page-outs

daemons, 317

working with, 306

Page scanning, 318-319, 323, 374

Page tables, 235

Paged virtual memory, 113

Pages

defined, 304

kernel, 115

sizes, 352-353

Paging

anonymous, 305-307

demand, 307-308

file system, 306

memory, 104-105

overview, 306

PAPI (performance application programming interface), 158

Parallelism in applications, 177-181

Paravirtualization (PV), 588, 590

Paravirtualized I/O drivers, 593-595

Parity in RAID, 445

Partitions in Hyper-V, 589

Passive benchmarking, 656-657

Passive listening in three-way handshakes, 511

pathchar tool, 564

Pathologies in solid-state drives, 441

Patrol reads in RAID, 445

Pause frames in congestion avoidance, 508

pchar tool, 564

PCI pass-through in hardware virtualization, 593

PCP (Performance Co-Pilot), 138

PE (Portable Executable) format, 183

PEBS (precise event-based sampling), 158

Per-I/O latency values, 454

Per-interval I/O averages latency values, 454

Per-interval statistics with stat, 693

Per-process observability tools, 133

fixed counters, 134-135

/proc file system, 140-141

profiling, 135

tracing, 136

Percent busy metric, 33

Percentiles

description, 75

latency, 413-414

perf c2c command, 118

perf_event control group, 610

perf-stat-hist tool, 744

perf tool, 13

case study, 789-790

CPU flame graphs, 201

CPU one-liners, 267-268

CPU profiling, 200-201, 245, 268-270

description, 116

disk block devices, 465-467

disk I/O, 450, 467-468

documentation, 276

events. See perf tool events

flame graphs, 119, 270-272

hardware tracing, 276

hardware virtualization, 601-602, 604

I/O profiling, 202-203

kernel time analysis, 202

memory, 324

networks, 526, 562

one-liners for counting events, 675

one-liners for CPUs, 267-268

one-liners for disks, 467

one-liners for dynamic tracing, 677-678

one-liners for listing events, 674-675

one-liners for memory, 338-339

one-liners for profiling, 675-676

one-liners for reporting, 678-679

one-liners for static tracing, 676-677

OS virtualization, 619, 629

overview, 671-672

page fault flame graphs, 340-342

page fault sampling, 339-340

PMCs, 157, 273-274

process profiling, 271-272

profiling overview, 135

references, 703-704

scheduler latency, 272-273

software tracing, 275-276

subcommands. See perf tool subcommands

syscall tracing, 201-202

thread state analysis, 196

tools collection. See perf-tools collection

vs. trace-cmd, 738-739

tracepoint events, 684-685

tracepoints, 147, 149

tracing, 136, 166

perf tool events

hardware, 274-275, 680-683

kprobes, 685-687

overview, 679-681

software, 683-684

uprobes, 687-689

USDT probes, 690-691

perf tool subcommands

documentation, 703

ftrace, 741

miscellaneous, 702-703

overview, 672-674

record, 694-696

report, 696-698

script, 698-701

stat, 691-694

trace, 701-702

perf-tools collection

vs. BCC/BPF, 747-748

coverage, 742

documentation, 748

example, 747

multi-purpose tools, 744-745

one-liners, 745-747

overview, 741-742

single-purpose tools, 743-744

perf-tools-unstable tool package, 132

Performance and performance monitoring

applications, 172

challenges, 5-6

cloud computing, 14, 586

CPUs, 251

disks, 452

file systems, 388

memory, 326

networks, 529

OS virtualization, 620

resource analysis investments, 38

Performance application programming interface (PAPI), 158

Performance Co-Pilot (PCP), 138

Performance engineers, 2-3

Performance instrumentation counters (PICs), 156

Performance Mantras

applications, 182

list of, 61

Performance monitoring counters (PMCs), 156

case study, 788-789

challenges, 158

CPUs, 237-239, 273-274

cycle analysis, 251

documentation, 158

example, 156-157

interface, 157-158

memory, 326

Performance monitoring unit (PMU) events, 156, 680

perftrace tool, 136

Periods in OS virtualization, 615

Persistent memory, 441

Personalities in FileBench, 414

Perspectives

overview, 4-5

performance analysis, 37-38

resource analysis, 38-39

workload analysis, 39-40

Perturbations

benchmarks, 648

FlameScope, 292-293

system tests, 23

pfm-events, 681

PGO (profile-guided optimization) kernels, 122

Physical I/O

defined, 360

vs. logical, 368-370

Physical metadata in file systems, 368

Physical operations in file systems, 361

Physical resources in USE method, 795-798

PICs (performance instrumentation counters), 156

pid control group, 609

pid variable in bpftrace, 777

pids control group, 610

PIDs (process IDs)

filters, 729-730

process environment, 101

pidstat tool

CPUs, 245, 262

description, 15

disks, 464-465

OS virtualization, 619

thread state analysis, 196

Ping latency, 505-506, 528

ping tool, 562-563

Pipelines in ZFS, 381

pktgen tool, 567

Platters in magnetic rotational disks, 435-436

Plugins for monitoring software, 137

pmap tool, 135, 337-338

pmcarch tool

CPUs, 265-266

memory, 348

PMCs. See Performance monitoring counters (PMCs)

pmheld tool, 212-213

pmlock tool, 212

PMU (performance monitoring unit) events, 156, 680

Pods in cloud computing, 586

Point-in-time recommendations in methodologies, 29-30

Policies for scheduling classes, 106, 242-243

poll system call, 177

Polling applications, 177

Pooled storage

btrfs, 382

overview, 382-383

ZFS, 380

Portability of benchmarks, 643

Portable Executable (PE) format, 183

Ports

ephemeral, 531

network, 501

posix_fadvise call, 415

Power states in processors, 297

Preallocation in ext4, 379

Precise event-based sampling (PEBS), 158

Prediction step in scientific method, 44-45

Preemption

CPUs, 227

Linux kernel, 116

operating systems, 110

schedulers, 241

Solaris kernel, 114

preemptirqsoff tracer, 708

preemptoff tracer, 708

Prefetch caches, 230

Prefetch for file systems

overview, 364-365

ZFS, 381

Presentability of benchmarks, 643

Pressure stall information (PSI)

CPUs, 257-258

description, 119

disks, 464

memory, 323, 330-331

pressure tool, 142

Price/performance ratio

applications, 173

benchmarking for, 643

print function, 780

printf function, 770, 778

Priority

CPUs, 227, 252-253

OS virtualization resources, 613

schedulers, 105-106

scheduling classes, 242-243, 295

Priority inheritance scheme, 227

Priority inversion, 227

Priority pause frames in congestion avoidance, 508

Private clouds, 580

Privilege rings in kernels, 93

probe subcommand for perf, 673

probe variable in bpftrace, 778

Probes and probe events

bpftrace, 767-768, 774-775

kprobes, 685-687

perf, 685

uprobes, 687-689

USDT, 690-691

wildcards, 768-769

Problem statement

case study, 16, 783-784

determining, 44

/proc file system observability tools, 140-143

Process-context IDs (PCIDs), 119

Process IDs (PIDs)

filters, 729-730

process environment, 101

Processes

accounting, 159

creating, 100

defined, 90

environment, 101-102

life cycle, 100-101

overview, 99-100

profiling, 271-272

schedulers, 105-106

swapping, 104-105, 308-309

syscall analysis, 192

tracing, 207-208

USE method, 52

virtual address space, 319-322

Processors

binding, 181-182

defined, 90, 220

power states, 297

tuning, 299

procps tool package, 131

Products, monitoring, 79

Profile-guided optimization (PGO) kernels, 122

profile probes, 774

profile tool

applications, 203-204

BCC, 756

CPUs, 245, 277-278

profiling, 135

trace-cmd, 735

Profilers

Ftrace, 707

perf-tools for, 745

Profiling

CPUs. See CPUs profiling

I/O, 203-204, 210-212

interpretation, 249-250

kprobes, 722

methodologies, 35

observability tools, 135

overview, 10-11

perf, 675-676

uprobes, 723

Program counter threads, 100

Programming languages

bpftrace. See bpftrace tool programming

compiled, 183-184

garbage collection, 185-186

interpreted, 184-185

overview, 182-183

virtual machines, 185

Prometheus monitoring software, 138

Proofs of concept

benchmarking for, 642

testing, 3

Proportional set size (PSS) in shared memory, 310

Protection rings in kernels, 93

Protocols

HTTP/3, 515

IP, 509-510

networks, 502, 504, 509-515

QUIC, 515

TCP, 510-514

UDP, 514

ps tool

CPUs, 260-261

fixed counters, 134

memory, 335-336

OS virtualization, 619

PSI. See Pressure stall information (PSI)

PSS (proportional set size) in shared memory, 310

Pterodactyl latency heat maps, 488-489

ptime tool, 263-264

ptrace tool, 159

Public clouds, 580

PV (paravirtualization), 588, 590

Q

qdisc-fq tool, 561

QEMU (Quick Emulator)

hardware virtualization, 589

lightweight virtualization, 631

qemu-system-x86 process, 600

QLC (quad-level cell) flash memory, 440

QoS (quality of service) for networks, 532-533

QPI (Quick Path Interconnect), 236-237

Qspinlocks, 117-118

Quad-level cell (QLC) flash memory, 440

Quality of service (QoS) for networks, 532-533

Quantifying issues, 6

Quantifying performance gains, 73-74

Quarterly patterns, monitoring, 79

Question step in scientific method, 44-45

Queued spinlocks, 117-118

Queued time for disks, 472

Queueing disciplines

networks, 521

OS virtualization, 617

tuning, 571

Queues

I/O schedulers, 448-449

interrupts, 98

overview, 23-24

queueing theory, 66-69

run. See Run queues

TCP connections, 519-520

QUIC protocol, 515

Quick Emulator (QEMU)

hardware virtualization, 589

lightweight virtualization, 631

Quick Path Interconnect (QPI), 236-237

Quotas in OS virtualization, 615

R

RACK (recent acknowledgments) in TCP, 514

RAID (redundant array of independent disks) architecture, 444-445

Ramping load benchmarking, 662-664

Random-access pattern in micro-benchmarking, 390

Random change anti-method, 42-43

Random I/O

disk read example, 491-492

disks, 430-431, 436

latency profile, micro-benchmarking, 457

vs. sequential, 363-364

Rate transitions in networks, 517

Raw hardware event descriptors, 680

Raw I/O, 366, 447

Raw tracepoints, 150

RCU (read-copy update), 115

RCU-walk (read-copy-update-walk) algorithm, 375

rdma control group, 610

Re-exec method in heap growth, 320

Read-ahead in file systems, 365

Read-copy update (RCU), 115

Read-copy-update-walk (RCU-walk) algorithm, 375

Read latency profile in micro-benchmarking, 457

Read-modify-write operation in RAID, 445

read syscalls

description, 94

tracing, 404-405

Read/write ratio in disks, 431

readahead tool, 409

Reader/writer (RW) locks, 179

Real-time scheduling classes, 106, 253

Real-time systems, interrupt masking in, 98

Realism in benchmarks, 643

Reaping memory, 316, 318

Rebuilding volumes and pools, 383

Receive Flow Steering (RFS) in networks, 523

Receive Packet Steering (RPS) in networks, 523

Receive packets in NICs, 522

Receive Side Scaling (RSS) in networks, 522-523

Recent acknowledgments (RACK) in TCP, 514

Reclaimed pages, 317

Record size, defined, 360

record subcommand for perf

CPU profiling, 695-696

example, 672

options, 695

overview, 694-695

software events, 683-684

stack walking, 696

record subcommand for trace-cmd, 735

RED method, 53

Reduced instruction set computers (RISCs), 224

Redundant array of independent disks (RAID) architecture, 444-445

reg function, 779

Regression testing, 18

Remote memory, 312

Reno algorithm for TCP congestion control, 513

Repeatability of benchmarks, 643

Replay benchmarking, 654

report subcommand for perf

example, 672

overview, 696-697

STDIO, 697-698

TUI interface, 697

report subcommand for trace-cmd, 735

Reporting

perf, 678-679

sar, 163, 165

trace-cmd, 737

Request latency, 7

Request rate in RED method, 53

Request time in I/O, 427

Requests in workload analysis, 39

Resident memory, defined, 304

Resident set size (RSS), 308

Resilvering volumes and pools, 383

Resource analysis perspectives, 4-5, 38-39

Resource controls

cloud computing, 586

CPUs, 253, 298

disks, 456, 494

hardware virtualization, 595-597

lightweight virtualization, 632

memory, 328, 353-354

networks, 532-533

operating systems, 110-111

OS virtualization, 613-617, 626-627

tuning, 571

USE method, 52

Resource isolation in cloud computing, 586

Resource limits in capacity planning, 70-71

Resource lists in USE method, 49

Resource utilization in applications, 173

Resources in USE method, 47

Response time

defined, 22

disks, 452

latency, 24

restart subcommand in trace-cmd, 735

Results in event tracing, 58

Retention policy for caching, 36

Retransmits

latency, 528

TCP, 510, 512, 529

UDP, 514

Retrospectives, 4

Return values

kprobes, 721

kretprobes, 152

uretprobes, 154

uprobes, 723

retval variable in bpftrace, 778

RFS (Receive Flow Steering) in networks, 523

Ring buffers

applications, 177

networks, 522

RISCs (reduced instruction set computers), 224

Robertson, Alastair, 761

Roles, 2-3

Root level in file systems, 106

Rostedt, Steven, 705, 711, 734, 739-740

Rotation time in magnetic rotational disks, 436

Round-trip time (RTT) in networks, 507, 528

Route tables, 537

Routers, 516-517

Routing networks, 503

RPS (Receive Packet Steering) in networks, 523

RR scheduling policy, 243

RSS (Receive Side Scaling) in networks, 522-523

RSS (resident set size), 308

RT scheduling class, 242-243

RTT (round-trip time) in networks, 507, 528

Run queues

CPUs, 222

defined, 220

latency, 222

schedulers, 105, 241

Runnability of benchmarks, 643

Runnable state in thread state analysis, 194-197

runqlat tool

CPUs, 279-280

description, 756

runqlen tool

CPUs, 280-281

description, 756

runqslower tool

CPUs, 285

description, 756

RW (reader/writer) locks, 179

S

S3 (Simple Storage Service), 585

SaaS (software as a service), 634

SACK (selective acknowledgment) algorithm, 514

SACKs (selective acknowledgments), 510

Sampling

CPU profiling, 35, 135, 187, 200-201, 247-248

distributed tracing, 199

off-CPU analysis, 189-190

page faults, 339-340

PMCs, 157-158

run queues, 242-243

Sanity checks in benchmarking, 664-665

sar (system activity reporter)

configuration, 162

coverage, 161

CPUs, 260

description, 15

disks, 463-464

documentation, 165-166

file systems, 393-394

fixed counters, 134

live reporting, 165

memory, 331-333

monitoring, 137, 161-165

networks, 543-545

options, 801-802

OS virtualization, 619

output formats, 163-165

overview, 160

reporting, 163

thread state analysis, 196

SAS (Serial Attached SCSI) disk interface, 442

SATA (Serial ATA) disk interface, 442

Saturation

applications, 193

CPUs, 226-227, 245-246, 251, 795, 797

defined, 22

disk controllers, 451

disk devices, 434, 451

flame graphs, 291

I/O, 798

kernels, 798

memory, 309, 324-326, 796-797

methodologies, 34-35

networks, 526-527, 796-797

resource analysis, 38

storage, 797

task capacity, 799

USE method, 47-48, 51-53

user mutex, 799

Saturation points in scalability, 31

Scalability and scaling

Amdahl’s Law of Scalability, 64-65

capacity planning, 72-73

cloud computing, 581-584

CPU, 522-523

CPUs vs. GPUs, 240

disks, 457-458

methodologies, 31-32

models, 63-64

multithreading, 227

Universal Scalability Law, 65-66

Scalability ceiling, 64

Scalable Vector Graphics (SVG) files, 164

Scaling governors, 297

Scanning pages, 318-319, 323, 374

Scatter plots

disk I/O, 81-82

I/O latency, 488

sched command, 141

SCHED_DEADLINE policy, 117

sched subcommand for perf, 272-273, 673, 702

schedstat tool, 141-142

Scheduler latency

CPUs, 226, 272-273

delay accounting, 145

run queues, 222

Scheduler tracing off-CPU analysis, 189-190

Schedulers

CPUs, 241-242

defined, 220

hardware virtualization, 596-597

kernel, 105-106

multiqueue I/O, 119

options, 295-296

OS disk I/O stack, 448-449

scheduling internals, 284-285

Scheduling classes

CPUs, 115, 242-243

I/O, 115, 493

kernel, 106

priority, 295

Scheduling in Kubernetes, 586

Scientific method, 44-46

Scratch variables in bpftrace, 770-771

scread tool, 409

script subcommand

flame graphs, 700

overview, 698-700

trace scripts, 700-701

script subcommand for perf, 673

Scrubbing file systems, 376

SCSI (Small Computer System Interface)

disks, 442

event logging, 486

scsilatency tool, 487

scsiresult tool, 487

SDT events, 681

Second-level caches in file systems, 362

Sectors in disks

defined, 424

size, 437

zoning, 437

Security boot options, 298-299

SEDA (staged event-driven architecture), 178

SEDF (simple earliest deadline first) schedulers, 595

Seek time in magnetic rotational disks, 436

seeksize tool, 487

seekwatcher tool, 487

Segments

defined, 304

OSI model, 502

process virtual address space, 319

segmentation offload, 520-521

Selective acknowledgment (SACK) algorithm, 514

Selective acknowledgments (SACKs), 510

Self-Monitoring, Analysis and Reporting Technology (SMART) data, 485

self tool, 142

Semaphores for applications, 179

Send packets in NICs, 522

sendfile command, 181

Sequential I/O

disks, 430-431, 436

vs. random, 363-364

Serial ATA (SATA) disk interface, 442

Serial Attached SCSI (SAS) disk interface, 442

Server instances in cloud computing, 580

Service consoles in hardware virtualization, 589

Service thread pools for applications, 178

Service time

defined, 22

I/O, 427-429

queueing systems, 67-69

Set associative caches, 234

set_ftrace_filter file, 710

Shadow page tables, 593

Shadow statistics, 694

Shards

capacity planning, 73

cloud computing, 582

Shared memory, 310

Shared system buses, 312

Shares in OS virtualization, 614-615, 626

Shell scripting, 184

Shingled Magnetic Recording (SMR) drives, 439

shmsnoop tool, 348

Short-lived processes, 12, 207-208

Short-stroking in magnetic rotational disks, 437

showboost tool, 245, 265

signal function, 779

Signal tracing, 209-210

Simple disk model, 425

Simple earliest deadline first (SEDF) schedulers, 595

Simple Network Management Protocol (SNMP), 55, 137

Simple Storage Service (S3), 585

Simulation benchmarking, 653-654

Simultaneous multithreading (SMT), 220, 225

Single-level cell (SLC) flash memory, 440

Single root I/O virtualization (SR-IOV), 593

Site reliability engineers (SREs), 4

Size

blocks, 27, 360, 375, 378

cloud computing, 583-584

disk I/O, 432, 480-481

disk sectors, 437

free lists, 317

I/O, 176, 390

instruction, 224

multiple page, 352-353

packets, 504-505

virtual memory, 308

word, 229, 310

working set. See Working set size (WSS)

sizeof function, 779

skbdrop tool, 561

skblife tool, 561

Slab

allocator, 114

process virtual address space, 321-322

slabinfo tool, 142

slabtop tool, 333-334, 394-395

SLC (single-level cell) flash memory, 440

Sleeping state in thread state analysis, 194-197

Sliding windows in TCP, 510

SLOG log in ZFS, 381

Sloth disks, 438

Slow disks case study, 16-18

Slow-start in TCP, 510

Slowpath state in Mutex locks, 179

SLUB allocator, 116, 322

Small Computer System Interface (SCSI)

disks, 442

event logging, 486

smaps tool, 141

SMART (Self-Monitoring, Analysis and Reporting Technology) data, 485

smartctl tool, 484-486

SMP (symmetric multiprocessing), 110

smpcalls tool, 285

SMR (Shingled Magnetic Recording) drives, 439

SMs (streaming multiprocessors), 240

SMT (simultaneous multithreading), 220, 225

Snapshots

btrfs, 382

ZFS, 381

Sniffing packets, 530-531

SNMP (Simple Network Management Protocol), 55, 137

SO_BUSY_POLL socket option, 522

SO_REUSEPORT socket option, 117

SO_TIMESTAMP socket option, 529

SO_TIMESTAMPING socket option, 529

so1stbyte tool, 561

soaccept tool, 561

socketio tool, 561

socketio.bt tool, 553-554

Sockets

BSD, 113

defined, 500

description, 109

local connections, 509

options, 573

statistics, 534-536

tracing, 552-555

tuning, 569

socksize tool, 561

sockstat tool, 561

soconnect tool, 561

soconnlat tool, 561

sofamily tool, 561

Soft interrupts, 281-282

softirqs tool, 281-282

Software

memory, 315-322

networks, 517-524

Software as a service (SaaS), 634

Software change case study, 18-19

Software events

case study, 789-790

observability source, 159

perf, 680, 683-684

recording and tracing, 275-276

software probes, 774

Software resources

capacity planning, 70

USE method, 52, 798-799

Solaris

kernel, 114

Kstat, 160

Slab allocator, 322, 652

syscall tracing, 205

top tool Solaris mode, 262

zones, 606, 620

Solid-state disks (SSDs)

cache devices, 117

overview, 439-441

soprotocol tool, 561

sormem tool, 561

Source code for applications, 172

SPEC (Standard Performance Evaluation Corporation) benchmarks, 655-656

Special file systems, 371

Speedup with latency, 7

Spin locks

applications, 179

contention, 198

queued, 118

splice call, 116

SPs (streaming processors), 240

SR-IOV (single root I/O virtualization), 593

SREs (site reliability engineers), 4

ss tool, 145-146, 525, 534-536

SSDs (solid-state disks)

cache devices, 117

overview, 439-441

Stack helpers, 214

Stack traces

description, 102

displaying, 204-205

keys, 730-731

Stack walking, 102, 696

stackcount tool, 757-758

Stacks

I/O, 107-108, 372

JIT symbols, 214

missing, 215-216

network, 109, 518-519

operating system disk I/O, 446-449

overview, 102

process virtual address space, 319

protocol, 502

reading, 102-103

user and kernel, 103

Staged event-driven architecture (SEDA), 178

Stall cycles in CPUs, 223

Standard deviation, 75

Standard Performance Evaluation Corporation (SPEC) benchmarks, 655-656

Starovoitov, Alexei, 121

start subcommand in trace-cmd, 735

Starvation in deadline I/O schedulers, 448

stat subcommand in perf

description, 635

event filters, 693-694

interval statistics, 693

options, 692-693

overview, 691-692

per-CPU balance, 693

shadow statistics, 694

stat subcommand in trace-cmd, 735

stat tool, 95, 141-142

Stateful workload simulation, 654

Stateless workload simulation, 653

Statelessness of UDP, 514

States

TCP, 511-512

thread state analysis, 193-197

Static instrumentation

overview, 11-12

perf events, 681

tracepoints, 146, 717

Static performance tuning

applications methodology, 198-199

CPUs, 252

disks, 455-456

file systems, 389

memory, 327-328

methodologies, 59-60

networks, 531-532

tools, 130-131

Static priority of threads, 242-243

Static probes, 116

Static tracing in perf, 676-677

Statistical analysis in benchmarking, 665-666

Statistics, 8-9

averages, 74-75

baseline, 59

case study, 784-786

coefficient of variation, 76

line charts, 80-81

multimodal distributions, 76-77

outliers, 77

quantifying performance gains, 73-74

standard deviation, percentiles, and median, 75

statm tool, 141

stats function, 780

statsnoop tool, 409

status tool, 141

STDIO report option, 697-698

stop subcommand in trace-cmd, 735

Storage

benchmark questions, 668

cloud computing, 584-585

disks. See Disks

sample processing, 248-249

USE method, 49-51, 796-797

Storage array caches, 430

Storage arrays, 446

str function, 770, 778

strace tool

bonnie++ tool, 660

file system latency, 395

format strings, 149-150

limitations, 202

networks, 561

overhead, 207

system call tracing, 205-207

tracing, 136

stream subcommand in trace-cmd, 735

Streaming multiprocessors (SMs), 240

Streaming processors (SPs), 240

Streaming workloads in disks, 430-431

Streetlight effect, 42

Stress testing in software change case study, 18

Stripe width of volumes and pools, 383

Striped allocation in XFS, 380

Stripes in RAID, 444-445

strncmp function, 778

Stub domains in hardware virtualization, 596

Subjectivity, 5

Subsecond-offset heat maps, 289

sum function in bpftrace, 780

Summary-since-boot values monitoring, 79

Super-serial model, 65-66

Superblocks in VFS, 373

superping tool, 561

Superscalar architectures for CPUs, 224

Surface plots, 84-85

SUT (system under test) models, 23

SVG (Scalable Vector Graphics) files, 164

Swap areas, defined, 304

Swap capacity in OS virtualization, 613, 616

swapin tool, 348

swapon tool

disks, 487

memory, 331

Swapping

defined, 304

memory, 316, 323

overview, 305-307

processes, 104-105, 308-309

Swapping state

delay accounting, 145

thread state analysis, 194-197

Switches in networks, 516-517

Symbol churn, 214

Symbols, missing, 214

Symmetric multiprocessing (SMP), 110

SYN backlogs, 519

SYN cookies, 511, 520

Synchronization primitives for applications, 179

Synchronous disk I/O, 434-435

Synchronous interrupts, 97

Synchronous writes, 366

syncsnoop tool

BCC, 756

file systems, 409

Synthetic events in hist triggers, 731-733

/sys file system, 143-144

/sys/fs options, 417-418

SysBench system benchmark, 294

syscount tool

BCC, 756

CPUs, 285

file systems, 409

perf-tools, 744

system calls count, 208-209

sysctl tool

congestion control, 570

network tuning, 567-568

schedulers, 296

SCSI logging, 486

sysstat tool package, 131

System activity reporter. See sar (system activity reporter)

System calls

analysis, 192

connect latency, 528

counting, 208-209

defined, 90

file system latency, 385

kernel, 92, 94-95

micro-benchmarking for, 61

observability source, 159

send/receive latency, 528

tracing in bpftrace, 403-405

tracing in perf, 201-202

tracing in strace, 205-207

System design, benchmarking for, 642

system function in bpftrace, 770, 779

System statistics, monitoring, 138

System under test (SUT) models, 23

System-wide CPU profiling, 268-270

System-wide observability tools, 133

fixed counters, 134

/proc file system, 141-142

profiling, 135

tracing, 136

System-wide tunable parameters

byte queue limits, 571

device backlog, 569

ECN, 570

networks, 567-572

production example, 568

queueing disciplines, 571

resource controls, 571

sockets and TCP buffers, 569

TCP backlog, 569

TCP congestion control, 570

Tuned Project, 572

systemd-analyze command, 120

systemd service manager, 120

Systems performance overview, 1-2

activities, 3-4

cascading failures, 5

case studies, 16-19

cloud computing, 14

complexity, 5

counters, statistics, and metrics, 8-9

experiments, 13-14

latency, 6-7

methodologies, 15-16

multiple performance issues, 6

observability, 7-13

performance challenges, 5-6

perspectives, 4-5

references, 19-20

roles, 2-3

SystemTap tool, 166

T

Tagged Command Queueing (TCQ), 437

Tahoe algorithm for TCP congestion control, 513

Tail-based sampling in distributed tracing, 199

Tail Loss Probe (TLP), 117, 512

Task capacity in USE method, 799

task tool, 141

Tasklets with interrupts, 98

Tasks

defined, 90

idle, 99

taskset command, 297

tc tool, 566

tcpdump tool, 136

TCMalloc allocator, 322

TCP. See Transmission Control Protocol (TCP)

TCP Fast Open (TFO), 117

TCP/IP stack

BSD, 113

kernels, 109

protocol, 502

stack bypassing, 509

TCP segmentation offload (TSO), 521

TCP Small Queues (TSQ), 524

TCP Tail Loss Probe (TLP), 117

TCP TIME_WAIT latency, 528

tcpaccept tool, 561

tcpconnect tool, 561

tcpdump tool

BPF for, 12

description, 526

event tracing, 57–58

overview, 558–559

packet sniffing, 530–531

tcplife tool

BCC, 756

description, 525

overview, 548

tcpnagle tool, 561

tcpreplay tool, 567

tcpretrans tool

BCC, 756

overview, 549–550

perf-tools, 743

tcpsynbl.bt tool, 556–557

tcptop tool

BCC, 756

description, 526

top processes, 549

tcpwin tool, 561

TCQ (Tagged Command Queueing), 437

Temperature-aware scheduling classes, 243

Temperature sensors for CPUs, 230

Tenancy in cloud computing, 580

contention in hardware virtualization, 595

contention in OS virtualization, 612–613

overview, 585–586

Tensor processing units (TPUs), 241

Test errors in benchmarking, 646–647

Test step in scientific method, 44–45

Text user interface (TUI), 697

TFO (TCP Fast Open), 117

Theoretical maximum disk throughput, 436–437

Thermal pressure in Linux kernel, 119

THP (transparent huge pages)

Linux kernel, 116

memory, 353

Thread blocks in GPUs, 240

Thread pools in USE method, 52

Thread state analysis, 193–194

Linux, 195–197

software change case study, 19

states, 194–195

Threads

applications, 177–181

CPU time, 278–279

CPUs, 227–229

CPUs vs. GPUs, 240

defined, 90

flusher, 374

hardware, 221

idle, 99, 244

interrupts, 97–98

lightweight, 178

micro-benchmarking, 653

processes, 100

schedulers, 105–106

SMT, 225

static priority, 242–243

USE method, 52

3-wide processors, 224

3D NAND flash memory, 440

3D XPoint persistent memory, 441

Three-way handshakes in TCP, 511

Throttling

benchmarks, 661

hardware virtualization, 597

OS virtualization, 626

packets, 522

Throughput

applications, 173

defined, 22

disks, 424

file systems, 360

magnetic rotational disks, 436–437

networks, defined, 500

networks, measuring, 527–529

networks, monitoring, 529

performance metric, 32

resource analysis, 38

solid-state drives, 441

workload analysis, 40

Tickless kernels, 99, 117

Ticks, clock, 99

tid variable in bpftrace, 777

Time

averages over, 74

disk measurements, 427–429

event tracing, 58

kernel analysis, 202

Time-based patterns in monitoring, 77–78

Time-based utilization, 33–34

time control group, 609

time function in bpftrace, 778

Time scales

disks, 429–430

methodologies, 25–26

Time-series metrics, 8

Time sharing for schedulers, 241

Time slices for schedulers, 242

Time to first byte (TTFB) in networks, 506

time tool for CPUs, 263–264

TIME_WAIT latency, 528

TIME_WAIT state, 512

timechart subcommand for perf, 673

Timeline charts, 83–84

Timer-based profile sampling, 247–248

Timer-based retransmits, 512

Timerless multitasking, 117

Timers in TCP, 511–512

Timestamps

CPU counters, 230

file systems, 371

TCP, 511

tiptop tool, 348

tiptop tool package, 132

TLBs. See Translation lookaside buffers (TLBs)

tlbstat tool

CPUs, 266–267

memory, 348

TLC (tri-level cell) flash memory, 440

TLP (Tail Loss Probe), 117, 512

TLS (transport layer security), 113

Tools method

CPUs, 245

disks, 450

memory, 323–324

networks, 525

overview, 46

Top-level directories, 107

Top of file system layer, file system latency in, 385

top subcommand for perf, 673

top tool

CPUs, 245, 261–262

description, 15

file systems, 393

fixed counters, 135

hardware virtualization, 600

lightweight virtualization, 632–633

memory, 324, 336–337

OS virtualization, 619, 624

TPC (Transaction Processing Performance Council) benchmarks, 655

TPC-A benchmark, 650–651

tpoint tool, 744

TPUs (tensor processing units), 241

trace-cmd front end, 132

documentation, 740

function_graph, 739

KernelShark, 739–740

one-liners, 736–737

overview, 734

vs. perf, 738–739

subcommands overview, 734–736

trace file, 710, 713–715

trace_options file, 710

trace_pipe file, 710, 715

Trace scripts, 698, 700–701

trace_stat directory, 710

trace subcommand for perf, 673, 701–702

trace tool, 757–758

tracefs file system, 149–150

contents, 709–711

overview, 708–709

tracepoint probes, 774

Tracepoints

arguments and format string, 148–149

description, 11

documentation, 150–151

events in perf, 681, 684–685

example, 147–148

filters, 717–718

interface, 149–150

Linux kernel, 116

overhead, 150

overview, 146

triggers, 718

tracepoints tracer, 707

traceroute tool, 563–564

Tracing

BPF, 12–13

bpftrace. See bpftrace tool

case study, 790–792

distributed, 199

dynamic instrumentation, 12

events. See Event tracing

Ftrace. See Ftrace tool

locks, 212–213

observability tools, 136

OS virtualization, 620, 624–625, 629

perf, 676678

perf-tools for, 745

schedulers, 189–190

sockets, 552–555

software, 275–276

static instrumentation, 11–12

strace, 136, 205–207

tools, 166

trace-cmd. See trace-cmd front end

virtual file system, 405–406

tracing_on file, 710

Trade-offs in methodologies, 26–27

Traffic control utility in networks, 566

Transaction costs of latency, 385–386

Transaction groups (TXGs) in ZFS, 381

Transaction Processing Performance Council (TPC) benchmarks, 655

Translation lookaside buffers (TLBs)

cache statistics, 266–267

CPUs, 232

flushing, 121

memory, 314–315

MMU, 235

shootdowns, 367

Translation storage buffers (TSBs), 235

Transmission Control Protocol (TCP)

analysis, 531

anti-bufferbloat, 117

autocorking, 117

backlog, tuning, 569

buffers, 520, 569

congestion algorithms, 115

congestion avoidance, 508

congestion control, 118, 513, 570

connection latency, 24, 506, 528

connection queues, 519–520

connection rate, 527–529

duplicate ACK detection, 512

features, 510–511

first-byte latency, 528

friends, 509

initial window, 514

Large Receive Offload, 116

lockless listener, 118

New Vegas, 118

offload in packet size, 505

out-of-order packets, 529

retransmits, 117, 512, 528–529

SACK, FACK, and RACK, 514

states and timers, 511–512

three-way handshakes, 511

tracing in bpftrace, 555–557

transfer time, 24–25

Transmit Packet Steering (XPS) in networks, 523

Transparent huge pages (THP)

Linux kernel, 116

memory, 353

Transport, defined, 424

Transport layer security (TLS), 113

Traps

defined, 90

synchronous interrupts, 97

Tri-level cell (TLC) flash memory, 440

Triggers

hist. See Hist triggers

kprobes, 721–722

tracepoints, 718

uprobes, 723

Troubleshooting, benchmarking for, 642

TSBs (translation storage buffers), 235

tshark tool, 559

TSO (TCP segmentation offload), 521

TSQ (TCP Small Queues), 524

TTFB (time to first byte) in networks, 506

TUI (text user interface), 697

Tunable parameters

disks, 494

memory, 350–351

micro-benchmarking, 390

networks, 567

operating systems, 493–495

point-in-time recommendations, 29–30

tradeoffs with, 27

tune2fs tool, 416–417

Tuned Project, 572

Tuning

benchmarking for, 642

caches, 60

CPUs. See CPUs tuning

disk caches, 456

disks, 493–495

file system caches, 389

file systems, 414–419

memory, 350–354

methodologies, 27–28

networks, 567–574

static performance. See Static performance tuning

targets, 27–28

turboboost tool, 245

turbostat tool, 264–265

TXGs (transaction groups) in ZFS, 381

Type 1 hypervisors, 587

Type 2 hypervisors, 587

U

uaddr function, 779

Ubuntu Linux distribution

crisis tools, 131–132

memory tunables, 350–351

sar configuration, 162

scheduler options, 295–296

UDP Generic Receive Offload (GRO), 119

UDP (User Datagram Protocol), 514

udpconnect tool, 561

UDS (Unix domain sockets), 509

uid variable in bpftrace, 777

UIDs (user IDs) for processes, 101

UIO (user space I/O) in kernel bypass, 523

ulimit command, 111

Ultra Path Interconnect (UPI), 236–237

UMA (uniform memory access) memory system, 311–312

UMA (universal memory allocator), 322

UMASK values in MSRs, 238–239

Unicast network transmissions, 503

UNICS (UNiplexed Information and Computing Service), 112

Unified buffer caches, 374

Uniform memory access (UMA) memory system, 311–312

Unikernels, 92, 123, 634

UNiplexed Information and Computing Service (UNICS), 112

Units of time for latency, 25

Universal memory allocator (UMA), 322

Universal Scalability Law (USL), 65–66

Unix domain sockets (UDS), 509

Unix kernels, 112

UnixBench benchmarks, 254

Unknown-unknowns, 37

Unrelated disk I/O, 368

unroll function, 776

UPI (Ultra Path Interconnect), 236–237

uprobe_events file, 710

uprobe profiler, 707

uprobe tool, 744

uprobes, 687–688

arguments, 154, 688–689, 723

bpftrace, 774

documentation, 155

event tracing, 722–723

example, 154

filters, 723

Ftrace, 708

interface and overload, 154–155

Linux kernel, 117

overview, 153

profiling, 723

return values, 723

triggers, 723

uptime tool

case study, 784–785

CPUs, 245

description, 15

load averages, 255–257

OS virtualization, 619

PSI, 257–258

uretprobes, 154

usdt probes, 774

USDT (user-level static instrumentation events)

perf, 681

probes, 690–691

USDT (user-level statically defined tracing), 11, 155–156

USE method. See Utilization, saturation, and errors (USE) method

User address space in processes, 102

User allocation stacks, 345

user control group, 609

User Datagram Protocol (UDP), 514

User IDs (UIDs) for processes, 101

User land, 90

User-level static instrumentation events (USDT)

perf, 681

probes, 690–691

User-level statically defined tracing (USDT), 11, 155–156

User modes in kernels, 93–94

User mutex in USE method, 799

User space, defined, 90

User space I/O (UIO) in kernel bypass, 523

User stacks, 103

User state in thread state analysis, 194–197

User time in CPUs, 226

username variable in bpftrace, 777

USL (Universal Scalability Law), 65–66

ustack function in bpftrace, 779

ustack variable in bpftrace, 778

usym function, 779

util-linux tool package, 131

Utilization

applications, 173, 193

CPUs, 226, 245–246, 251, 795, 797

defined, 22

disk controllers, 451

disk devices, 451

disks, 433, 452

heat maps, 288–289, 490

I/O, 798

kernels, 798

memory, 309, 324–326, 796–797

methodologies, 33–34

networks, 508–509, 526–527, 796–797

performance metric, 32

resource analysis, 38

storage, 796–797

task capacity, 799

USE method, 47–48, 51–53

user mutex, 799

Utilization, saturation, and errors (USE) method

applications, 193

benchmarking, 661

CPUs, 245–246

disks, 450–451

functional block diagrams, 49–50

memory, 324–325

metrics, 48–51

microservices, 53

networks, 526–527

overview, 47

physical resources, 795–798

procedure, 47–48

references, 799

resource controls, 52

resource lists, 49

slow disks case study, 17

software resources, 52, 798–799

uts control group, 609

V

V-NAND (vertical NAND) flash memory, 440

valgrind tool

CPUs, 286

memory, 348

Variable block sizes in file systems, 375

Variables in bpftrace, 770–771, 777–778

Variance

benchmarks, 647

description, 75

FlameScope, 292–293

Variation, coefficient of, 76

vCPUs (virtual CPUs), 595

Verification of observability tool results, 167–168

Versions

applications, 172

kernel, 111–112

Vertical NAND (V-NAND) flash memory, 440

Vertical scaling

capacity planning, 72

cloud computing, 581

VFIO (virtual function I/O) drivers, 523

VFS. See Virtual file system (VFS)

VFS layer, file system latency analysis in, 385

vfs_read function in bpftrace, 772–773

vfs_read tool in Ftrace, 706–707

vfscount tool, 409

vfssize tool, 409

vfsstat tool, 409

Vibration in magnetic rotational disks, 438

Virtual CPUs (vCPUs), 595

Virtual disks

defined, 424

utilization, 433

Virtual file system (VFS)

defined, 360

description, 107

interface, 373

latency, 406–408

Solaris kernel, 114

tracing, 405–406

Virtual function I/O (VFIO) drivers, 523

Virtual machine managers (VMMs)

cloud computing, 580

hardware virtualization, 587–605

Virtual machines (VMs)

cloud computing, 580

hardware virtualization, 587–605

programming languages, 185

Virtual memory

BSD kernel, 113

defined, 90, 304

managing, 104–105

overview, 305

size, 308

Virtual processors, 220

Virtual-to-guest physical translation, 593

Virtualization

hardware. See Hardware virtualization

OS. See OS virtualization

Visual identification of models, 62–64

Visualizations, 79

blktrace, 479

CPUs, 288–293

disks, 487–490

file systems, 410–411

flame graphs. See Flame graphs

heat maps. See Heat maps

line charts, 80–81

scatter plots, 81–82

surface plots, 84–85

timeline charts, 83–84

tools, 85

VMMs (virtual machine managers)

cloud computing, 580

hardware virtualization, 587–588

VMs (virtual machines)

cloud computing, 580

hardware virtualization, 587–588

programming languages, 185

vmscan tool, 348

vmstat tool, 8

CPUs, 245, 258

description, 15

disks, 487

file systems, 393

fixed counters, 134

hardware virtualization, 604

memory, 323, 329–330

OS virtualization, 619

thread state analysis, 196

VMware ESX, 589

Volume managers, 360

Volumes

defined, 360

file systems, 382–383

Voluntary kernel preemption, 110, 116

W

W-caches in CPUs, 230

Wait time

disks, 434

I/O, 427

off-CPU analysis, 191–192

wakeup tracer, 708

wakeup_rt tracer, 708

wakeuptime tool, 756

Warm caches, 37

Warmth of caches, 37

watchpoint probes, 774

Waterfall charts, 83–84

Wear leveling in solid-state drives, 441

Weekly patterns, monitoring, 79

Whetstone benchmark, 254, 653

Whys in drill-down analysis, 56

Width

flame graphs, 290–291

instruction, 224

Wildcards for probes, 768–769

Windows

DiskMon, 493

fibers, 178

hybrid kernel, 92

Hyper-V, 589

LTO and PGO, 122

microkernel, 123

portable executable format, 183

ProcMon, 207

syscall tracing, 205

TIME_WAIT, 512

word size, 310

Wireframe models, 84–85

Wireshark tool, 560

Word size

CPUs, 229

memory, 310

Work queues with interrupts, 98

Working set size (WSS)

benchmarking, 664

memory, 310, 328, 342–343

micro-benchmarking, 390–391, 653

Workload analysis perspectives, 4–5, 39–40

Workload characterization

benchmarking, 662

CPUs, 246–247

disks, 452–454

file systems, 386–388

methodologies, 54

networks, 527–528

workload analysis, 39

Workload separation in file systems, 389

Workloads, defined, 22

Write amplification in solid-state drives, 440

Write-back caches

file systems, 365

on-disk, 425

virtual disks, 433

write system calls, 94

Write-through caches, 425

Write type, micro-benchmarking for, 390

writeback tool, 409

Writes starving reads, 448

writesync tool, 409

wss tool, 342–343

WSS (working set size)

benchmarking, 664

memory, 310, 328, 342–343

micro-benchmarking, 390–391, 653

X

XDP (Express Data Path) technology

description, 118

event sources, 558

kernel bypass, 523

Xen hardware virtualization

CPU usage, 595

description, 589

I/O path, 594

network performance, 597

observability, 599

xentop tool, 599

XFS file system, 379–380

xfsdist tool

BCC, 756

file systems, 399

xfsslower tool, 757

XPS (Transmit Packet Steering) in networks, 523

Y

Yearly patterns, monitoring, 79

Z

zero function, 780

ZFS file system

features, 380–381

options, 418–419

pool statistics, 410

Solaris kernel, 114

zfsdist tool

BCC, 757

file systems, 399

zfsslower tool, 757

ZIO pipeline in ZFS, 381

zoneinfo tool, 142

Zones

free lists, 317

magnetic rotational disks, 437

OS virtualization, 606, 620

Solaris kernel, 114

zpool tool, 410
