Appendix A

USE Method: Linux

This appendix contains a checklist for Linux derived from the USE method [Gregg 13d]. This is a method for checking system health, and identifying common resource bottlenecks and errors, introduced in Chapter 2, Methodologies, Section 2.5.9, The USE Method. Later chapters (5, 6, 7, 9, 10) described it in specific contexts and introduced tools to support its use.

Performance tools are often enhanced, and new ones are developed, so you should treat this as a starting point that will need updates. New observability frameworks and tools can also be developed to specifically make following the USE method easier.

Physical Resources

CPU

Utilization

Per CPU: mpstat -P ALL 1, sum of CPU-consuming columns (%usr, %nice, %sys, %irq, %soft, %guest, %gnice) or inverse of idle columns (%iowait, %steal, %idle); sar -P ALL, sum of CPU-consuming columns (%user, %nice, %system) or inverse of idle columns (%iowait, %steal, %idle)

System-wide: vmstat 1, us + sy; sar -u, %user + %nice + %system

Per process: top, %CPU; htop, CPU%; ps -o pcpu; pidstat 1, %CPU

Per kernel thread: top/htop (K to toggle), where VIRT == 0 (heuristic)
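The system-wide figure that vmstat(8) and sar(1) report can also be derived by hand from two samples of the first line of /proc/stat. The following is a minimal sketch, not how those tools are implemented; the function name is hypothetical, and it treats idle, iowait, and steal as the idle columns, matching the grouping used above.

```python
def cpu_utilization(prev_line: str, cur_line: str) -> float:
    """Return percent busy between two 'cpu' lines sampled from /proc/stat."""

    def busy_and_total(line: str):
        # Fields after the label: user nice system idle iowait irq softirq steal.
        # guest and guest_nice follow, but are already counted in user and nice.
        values = [int(v) for v in line.split()[1:9]]
        user, nice, system, idle, iowait, irq, softirq, steal = values
        total = sum(values)
        # Assumption: idle, iowait, and steal are the "idle columns".
        busy = total - (idle + iowait + steal)
        return busy, total

    busy0, total0 = busy_and_total(prev_line)
    busy1, total1 = busy_and_total(cur_line)
    elapsed = total1 - total0
    return 100.0 * (busy1 - busy0) / elapsed if elapsed > 0 else 0.0
```

To use it, read the first line of /proc/stat, sleep for the desired interval, read it again, and pass both lines to the function. The same arithmetic applies to the per-CPU lines (cpu0, cpu1, and so on).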



CPU

Saturation

System-wide: vmstat 1, r > CPU count (footnote 1); sar -q, runq-sz > CPU count; runqlat; runqlen

Per process: /proc/PID/schedstat 2nd field (sched_info.run_delay); getdelays.c, CPU (footnote 2); perf sched latency (shows average and maximum delay per schedule) (footnote 3)
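As an illustration of the /proc/PID/schedstat approach, the sketch below parses the three fields of that file (time on-CPU in nanoseconds, run queue delay in nanoseconds, and number of timeslices) and turns two samples into a delay rate. The function names are hypothetical, and it assumes the kernel exposes scheduler statistics for the process.

```python
def parse_schedstat(text: str) -> dict:
    """Parse /proc/PID/schedstat: on-CPU ns, run queue delay ns, timeslices."""
    on_cpu_ns, run_delay_ns, timeslices = (int(v) for v in text.split()[:3])
    return {
        "on_cpu_ns": on_cpu_ns,
        "run_delay_ns": run_delay_ns,  # the 2nd field (sched_info.run_delay)
        "timeslices": timeslices,
    }


def run_delay_ms_per_sec(prev: dict, cur: dict, interval_s: float) -> float:
    """Milliseconds spent waiting on a run queue per second of wall-clock time."""
    return (cur["run_delay_ns"] - prev["run_delay_ns"]) / 1e6 / interval_s
```

A sustained nonzero rate suggests the process is waiting for its turn on-CPU, which is the per-process view of CPU saturation.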



CPU

Errors

Machine Check Exceptions (MCEs) seen in dmesg or rasdaemon and ras-mc-ctl --summary; perf(1) if processor-specific error events (PMCs) are available; e.g., AMD64’s “04Ah Single-bit ECC Errors Recorded by Scrubber” (footnote 4), which can also be classified as a memory device error; ipmitool sel list; ipmitool sdr list

Memory capacity

Utilization

System-wide: free -m, Mem: (main memory), Swap: (virtual memory); vmstat 1, free (main memory), swap (virtual memory); sar -r, %memused; slabtop -s c for kmem slab usage

Per process: top/htop, RES (resident main memory), VIRT (virtual memory), Mem for system-wide summary
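The system-wide percentages can also be computed directly from /proc/meminfo. This is a rough sketch with hypothetical function names; it assumes that "used" main memory means MemTotal minus MemAvailable, which is only one of several possible definitions and may not match what free(1) or sar(1) report.

```python
def parse_meminfo(text: str) -> dict:
    """Map /proc/meminfo keys to integer values (kB for most keys)."""
    info = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        if rest.strip():
            info[key.strip()] = int(rest.split()[0])
    return info


def memory_used_percent(info: dict) -> float:
    # Assumption: used = MemTotal - MemAvailable.
    return 100.0 * (info["MemTotal"] - info["MemAvailable"]) / info["MemTotal"]


def swap_used_percent(info: dict) -> float:
    total = info["SwapTotal"]
    # Systems without swap report SwapTotal as 0.
    return 100.0 * (total - info["SwapFree"]) / total if total else 0.0
```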

Memory capacity

Saturation

System-wide: vmstat 1, si/so (swapping); sar -B, pgscank + pgscand (scanning); sar -W

Per process: getdelays.c, SWAP (footnote 2); 10th field (min_flt) from /proc/PID/stat for minor fault rate, or dynamic instrumentation (footnote 5); dmesg | grep -i killed (OOM killer)
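Reading the 10th field of /proc/PID/stat needs some care, because the 2nd field (the command name) is parenthesized and can itself contain spaces or parentheses. A minimal sketch, with hypothetical function names, that splits after the last closing parenthesis:

```python
def minor_faults(stat_text: str) -> int:
    """Return the 10th field (minor fault count) of a /proc/PID/stat line."""
    # Split only the text after the last ')' so that an unusual command
    # name cannot shift the field positions.
    after_comm = stat_text.rsplit(")", 1)[1].split()
    # after_comm[0] is field 3 (state), so field 10 is at index 7.
    return int(after_comm[7])


def minor_fault_rate(prev_text: str, cur_text: str, interval_s: float) -> float:
    """Minor faults per second between two samples of /proc/PID/stat."""
    return (minor_faults(cur_text) - minor_faults(prev_text)) / interval_s
```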

Memory capacity

Errors

dmesg for physical failures or rasdaemon and ras-mc-ctl --summary or edac-util; dmidecode may also show physical failures; ipmitool sel list; ipmitool sdr list; dynamic instrumentation, e.g., uretprobes for failed malloc()s (bpftrace)

Network interfaces

Utilization

ip -s link, RX/TX tput / max bandwidth; sar -n DEV, rx/tx kB/s / max bandwidth; /proc/net/dev, bytes RX/TX tput/max
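The "throughput / max bandwidth" calculation can be sketched from two samples of /proc/net/dev. The function names are hypothetical, and the link speed is an input that the caller must supply (for example, the negotiated speed in Mbit/s); it is not present in /proc/net/dev itself.

```python
def parse_net_dev(text: str) -> dict:
    """Return {interface: (rx_bytes, tx_bytes)} from /proc/net/dev contents."""
    counters = {}
    for line in text.splitlines()[2:]:  # skip the two header lines
        name, _, data = line.partition(":")
        fields = data.split()
        if len(fields) >= 16:
            # Receive bytes is the 1st column; transmit bytes is the 9th.
            counters[name.strip()] = (int(fields[0]), int(fields[8]))
    return counters


def link_utilization(prev, cur, interval_s: float, speed_mbit: float):
    """Percent of link speed used in each direction, as (rx_pct, tx_pct)."""
    max_bytes = speed_mbit * 1_000_000 / 8 * interval_s
    rx_pct = 100.0 * (cur[0] - prev[0]) / max_bytes
    tx_pct = 100.0 * (cur[1] - prev[1]) / max_bytes
    return rx_pct, tx_pct
```

For a full-duplex link, each direction has its own maximum, so the two percentages are best examined separately rather than summed.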

Network interfaces

Saturation

nstat, TcpRetransSegs; sar -n EDEV, *drop/s, *fifo/s (footnote 6); /proc/net/dev, RX/TX drop; dynamic instrumentation of other TCP/IP stack queueing (bpftrace)

Network interfaces

Errors

ip -s link, errors; sar -n EDEV all; /proc/net/dev, errs, drop (footnote 6); extra counters may be under /sys/class/net/*/statistics/*error*; dynamic instrumentation of driver function returns

Storage device I/O

Utilization

System-wide: iostat -xz 1, %util; sar -d, %util

Per process: iotop, biotop; /proc/PID/sched se.statistics.iowait_sum

Storage device I/O

Saturation

iostat -xz 1, avgqu-sz > 1 (named aqu-sz in newer sysstat versions), or high await; sar -d same; perf(1) block tracepoints for queue length/latency; biolatency
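Both %util and the average queue size can be approximated from two samples of /proc/diskstats, which is an assumption about where to read the counters rather than something this checklist specifies. The sketch below uses hypothetical function names and relies on the documented field order: the 10th statistic is milliseconds spent doing I/O, and the 11th is the weighted milliseconds.

```python
def parse_diskstats(text: str, device: str):
    """Return (io_ticks_ms, weighted_ms) for one device from /proc/diskstats."""
    for line in text.splitlines():
        fields = line.split()
        if len(fields) >= 14 and fields[2] == device:
            # fields[0:3] are major, minor, and name; statistics start at [3].
            return int(fields[12]), int(fields[13])
    raise KeyError(device)


def disk_util_and_queue(prev, cur, interval_ms: float):
    """Return (%util, average queue size) over the sampling interval."""
    util_pct = 100.0 * (cur[0] - prev[0]) / interval_ms
    avg_queue = (cur[1] - prev[1]) / interval_ms
    return util_pct, avg_queue
```

As with iostat(1), a high %util on a device that services requests in parallel (such as a flash device or a virtual disk backed by many disks) does not by itself mean the device is saturated.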

Storage device I/O

Errors

/sys/devices/ . . . /ioerr_cnt; smartctl; bioerr; dynamic/static instrumentation of I/O subsystem response codes (footnote 7)

Storage capacity

Utilization

Swap: swapon -s; free; /proc/meminfo SwapFree/SwapTotal; file systems: df -h
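For file systems, the percentage shown by df(1) can be approximated with statvfs(3). This is a sketch with hypothetical function names; it assumes the df(1) convention of dividing used blocks by used plus available-to-unprivileged blocks, and it does not round the result the way df(1) does.

```python
import os


def used_percent(blocks: int, bfree: int, bavail: int) -> float:
    """Percent used, given total, free, and unprivileged-available blocks."""
    used = blocks - bfree
    usable = used + bavail
    # Pseudo file systems can report zero blocks.
    return 100.0 * used / usable if usable else 0.0


def fs_used_percent(path: str = "/") -> float:
    """Percent used for the file system that contains path."""
    st = os.statvfs(path)
    return used_percent(st.f_blocks, st.f_bfree, st.f_bavail)
```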

Storage capacity

Saturation

Not sure this one makes sense—once it’s full, ENOSPC (although when close to full, performance may be degraded depending on the file system free block algorithm)

Storage capacity

File systems: errors

strace for ENOSPC; dynamic instrumentation for ENOSPC; /var/log/messages errs, depending on FS; application log errors

Storage controller

Utilization

iostat -sxz 1, sum devices and compare to known IOPS/tput limits per card

Storage controller

Saturation

See storage device saturation, . . .

Storage controller

Errors

See storage device errors, . . .

Network controller

Utilization

Infer from ip -s link (or sar, or /proc/net/dev) and known controller max tput for its interfaces

Network controller

Saturation

See network interfaces, saturation, . . .

Network controller

Errors

See network interfaces, errors, . . .

CPU interconnect

Utilization

perf stat with PMCs for CPU interconnect ports, tput/max

CPU interconnect

Saturation

perf stat with PMCs for stall cycles

CPU interconnect

Errors

perf stat with PMCs for whatever is available

Memory interconnect

Utilization

perf stat with PMCs for memory buses, tput/max; e.g. Intel uncore_imc/data_reads/,uncore_imc/data_writes/; or IPC less than, say, 0.2; PMCs may also have local versus remote counters
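The IPC heuristic can be checked by post-processing perf stat CSV output (perf stat -x, with the cycles and instructions events). The following sketch uses a hypothetical function name and assumes the plain CSV layout, where the counter value is the first field and the event name is the third; interval mode and per-CPU modes prepend extra fields and would need adjusting, as would systems that report differently named events.

```python
def ipc_from_perf_csv(csv_text: str):
    """Return instructions per cycle from 'perf stat -x,' output, or None."""
    counts = {}
    for line in csv_text.splitlines():
        parts = line.split(",")
        if len(parts) < 3:
            continue
        event = parts[2].split(":")[0]  # drop modifiers such as ":u"
        try:
            counts[event] = float(parts[0])
        except ValueError:
            continue  # e.g., "<not counted>" or "<not supported>"
    cycles = counts.get("cycles")
    instructions = counts.get("instructions")
    if not cycles or instructions is None:
        return None
    return instructions / cycles
```

The 0.2 threshold is only the rough guide given above; a low IPC suggests stalled cycles, which may or may not be caused by the memory interconnect.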

Memory interconnect

Saturation

perf stat with PMCs for stall cycles

Memory interconnect

Errors

perf stat with PMCs for whatever is available; dmidecode might have something

I/O interconnect

Utilization

perf stat with PMCs for tput/max if available; inference via known tput from iostat/ip/ . . .

I/O interconnect

Saturation

perf stat with PMCs for stall cycles

I/O interconnect

Errors

perf stat with PMCs for whatever is available

General notes: uptime “load average” (or /proc/loadavg) wasn’t included for CPU metrics since Linux load averages include tasks in the uninterruptible I/O state.

perf(1): A powerful observability toolkit that reads PMCs and can also use dynamic and static instrumentation. See Chapter 13, perf.

PMCs: Performance monitoring counters. See Chapter 6, CPUs, and their usage with perf(1).

I/O interconnect: This includes the CPU-to-I/O controller buses, the I/O controller(s), and device buses (e.g., PCIe).

Dynamic instrumentation: Allows custom metrics to be developed. See Chapter 4, Observability Tools, and the examples in later chapters. Dynamic tracing tools for Linux include perf(1) (Chapter 13), Ftrace (Chapter 14), and BCC and bpftrace (Chapter 15).

For any environment that imposes resource controls (e.g., cloud computing), check USE for each resource control. These may be encountered—and limit usage—before the hardware resource is fully utilized.

1. The r column reports those threads that are waiting and threads that are running on-CPU. See the vmstat(1) description in Chapter 6, CPUs.

2. Uses delay accounting; see Chapter 4, Observability Tools.

3. There is also the sched:sched_process_wait tracepoint for perf(1); be careful about overheads when tracing, as scheduler events are frequent.

4. There aren’t many error-related events in the recent Intel and AMD processor manuals.

5. This can be used to show what is consuming memory and leading to saturation, by seeing what is causing minor faults. This should be available in htop(1) as MINFLT.

6. Dropped packets are included as both saturation and error indicators, since they can occur due to both types of events.

7. This includes tracing functions from different layers of the I/O subsystem: block device, SCSI, SATA, IDE... Some static probes are available (perf(1) scsi and block tracepoint events); otherwise, use dynamic tracing.

Software Resources




Kernel mutex

Utilization

With CONFIG_LOCK_STAT=y, /proc/lock_stat holdtime-total / acquisitions (also see holdtime-min, holdtime-max) (footnote 8); dynamic instrumentation of lock functions or instructions (maybe)

Kernel mutex

Saturation

With CONFIG_LOCK_STAT=y, /proc/lock_stat waittime-total / contentions (also see waittime-min, waittime-max); dynamic instrumentation of lock functions, e.g., [Gregg 19]; spinning shows up with profiling perf record -a -g -F 99 ...

Kernel mutex

Errors

Dynamic instrumentation (e.g., recursive mutex enter); other errors can cause kernel lockup/panic, debug with kdump/crash

User mutex

Utilization

valgrind --tool=drd --exclusive-threshold= ... (held time); dynamic instrumentation of lock-to-unlock function time (footnote 9)

User mutex

Saturation

valgrind --tool=drd to infer contention from held time; dynamic instrumentation of synchronization functions for wait time; profiling (perf(1)) user stacks for spins

User mutex

Errors

valgrind --tool=drd various errors; dynamic instrumentation of pthread_mutex_lock() for EAGAIN, EINVAL, EPERM, EDEADLK, ENOMEM, EOWNERDEAD, . . .

Task capacity

Utilization

top/htop, Tasks (current); sysctl kernel.threads-max, /proc/sys/kernel/threads-max (max)
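A scripted version of the current-versus-maximum comparison could read the task count from the fourth field of /proc/loadavg, which reports runnable and total scheduling entities. This is a sketch under that assumption, with a hypothetical function name: the total counts threads, so it is comparable with threads-max but will usually be higher than the process count that top(1) shows as Tasks by default.

```python
def task_utilization(loadavg_text: str, threads_max_text: str) -> float:
    """Percent of kernel.threads-max currently in use."""
    # The 4th field of /proc/loadavg has the form "runnable/total".
    total = int(loadavg_text.split()[3].split("/")[1])
    return 100.0 * total / int(threads_max_text)
```

The two arguments are the contents of /proc/loadavg and /proc/sys/kernel/threads-max. Other limits, such as per-user or resource-control limits, may be reached first.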

Task capacity

Saturation

Threads blocking on memory allocation; at this point the page scanner should be running (sar -B, pgscan*), else examine using dynamic tracing

Task capacity

Errors

“can’t fork()” errors; user-level threads: pthread_create() failures with EAGAIN, EINVAL, . . . ; kernel: dynamic tracing of kernel_thread() ENOMEM

File descriptors

Utilization

System-wide: sar -v, file-nr versus /proc/sys/fs/file-max; or just /proc/sys/fs/file-nr

Per process: echo /proc/PID/fd/* | wc -w versus ulimit -n
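Both comparisons are simple to script. In this sketch the function names are hypothetical; the system-wide function parses the three fields of /proc/sys/fs/file-nr (allocated, allocated but unused, and maximum), and the per-process function only inspects the calling process, against its soft limit.

```python
import os
import resource


def file_nr_utilization(file_nr_text: str) -> float:
    """Percent of the system-wide file handle limit that is allocated."""
    allocated, _unused, maximum = (int(v) for v in file_nr_text.split()[:3])
    return 100.0 * allocated / maximum


def own_fd_utilization() -> float:
    """Percent of this process's soft RLIMIT_NOFILE currently in use."""
    # Listing the directory briefly opens one extra descriptor, so the
    # count can be one higher than the steady state.
    open_fds = len(os.listdir("/proc/self/fd"))
    soft_limit, _hard_limit = resource.getrlimit(resource.RLIMIT_NOFILE)
    return 100.0 * open_fds / soft_limit
```

For another process, the limit would need to come from that process (for example, its /proc/PID/limits file) rather than from the caller's own limit.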

File descriptors

Saturation

This one may not make sense

File descriptors

Errors

strace errno == EMFILE on syscalls returning file descriptors (e.g., open(2), accept(2), ...); opensnoop -x

8. Kernel lock analysis used to be via lockmeter, which had an interface called lockstat.

9. Since these functions can be very frequent, beware of the performance overhead of tracing every call: an application could slow by 2x or more.

A.1 References

[Gregg 13d] Gregg, B., “USE Method: Linux Performance Checklist,” first published 2013.
