Appendix D. Sample Monitoring Templates: Infrastructure Key Performance Indicator Metrics

The following examples demonstrate groups of common server KPI metrics that I use in performance testing projects. I have provided examples of generic and application-specific templates to demonstrate the top-down approach I use when troubleshooting performance issues revealed by performance testing.

Generic KPI Templates

Monitoring with this first set of metric templates provides a good indication of when a server is under stress. The metrics focus on fundamental capacity indicators such as CPU load and memory consumption. This is the basic level of server monitoring I use when performance testing.

Windows OS: Generic KPI Template

You will probably recognize these metrics, since they are taken from the Windows Performance Monitor (Perfmon) application. This is a nice mix of counters that monitor disk, memory, and CPU performance, together with some high-level network information that measures the number of errors encountered and data throughput in bytes per second. For many of these counters you will, of course, need to select the appropriate instances and sampling interval based on your system under test (SUT) requirements. Make sure that the sampling interval you select is not too short, as frequent sampling places additional load on the servers being monitored. The default of 15 seconds is usually sufficient to create enough data points without causing excessive load. The number of counters you monitor can also affect server performance, so make sure that any templates you create contain only the counters you really need.

I recommend that, in addition to using this template, you monitor the top 10 processes in terms of CPU utilization and memory consumption. Identifying the CPU and memory hogs is often your best pointer to the next level of KPI monitoring, where you need to drill down into specific software applications and components that are part of the application deployment.
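
As a rough illustration of the "top 10 processes" recommendation, the sketch below ranks per-process samples by CPU utilization and by memory consumption. The `ProcessSample` type, the `top_processes` helper, and the sample values are hypothetical and exist only for this example; how you collect the samples (Perfmon's Process object, a monitoring agent, and so on) is left open.

```python
import heapq
from typing import NamedTuple


class ProcessSample(NamedTuple):
    """One per-process reading taken over a sampling interval (hypothetical type)."""
    name: str
    cpu_percent: float   # CPU utilization during the interval
    memory_bytes: int    # memory in use by the process


def top_processes(samples, field, n=10):
    """Return the n samples with the largest value of the named field."""
    return heapq.nlargest(n, samples, key=lambda s: getattr(s, field))


# Hypothetical readings, for illustration only.
samples = [
    ProcessSample("sqlservr", 62.5, 6_200_000_000),
    ProcessSample("w3wp", 21.0, 1_100_000_000),
    ProcessSample("java", 9.5, 2_400_000_000),
    ProcessSample("svchost", 1.2, 90_000_000),
]

cpu_hogs = top_processes(samples, "cpu_percent", n=2)
memory_hogs = top_processes(samples, "memory_bytes", n=2)

print([p.name for p in cpu_hogs])
print([p.name for p in memory_hogs])
```

The processes that appear in both rankings are usually the first candidates for an application-specific template.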

KPI metric                        Notes
Total processor utilization %
Processor queue length
Context switches per second
Memory available bytes
Memory page faults per second     *Use to assess soft page faults
Memory cache faults per second    *Use to assess soft page faults
Memory page reads per second      *Use to assess hard page faults
Free disk space %
Page file usage %
Average disk queue length
% Disk time
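
One possible way to capture this template from a script is the Windows `typeperf` command-line tool, which reads the same counters as Perfmon. The sketch below only builds the command line and does not run it. The counter paths are my mapping of the template rows onto standard Perfmon counter names, and the `_Total` instances and the output filename are assumptions that you should replace with whatever suits your SUT.

```python
# Assumed mapping of the template rows to Perfmon counter paths.
# The _Total instances are placeholders; pick the instances you need.
GENERIC_WINDOWS_COUNTERS = [
    r"\Processor(_Total)\% Processor Time",
    r"\System\Processor Queue Length",
    r"\System\Context Switches/sec",
    r"\Memory\Available Bytes",
    r"\Memory\Page Faults/sec",
    r"\Memory\Cache Faults/sec",
    r"\Memory\Page Reads/sec",
    r"\LogicalDisk(_Total)\% Free Space",
    r"\Paging File(_Total)\% Usage",
    r"\PhysicalDisk(_Total)\Avg. Disk Queue Length",
    r"\PhysicalDisk(_Total)\% Disk Time",
]


def build_typeperf_command(counters, interval_seconds=15,
                           sample_count=None, output_csv=None):
    """Build (but do not run) a typeperf argument list for the counters."""
    if interval_seconds < 1:
        raise ValueError("sampling interval must be at least 1 second")
    command = ["typeperf", *counters, "-si", str(interval_seconds)]
    if sample_count is not None:
        command += ["-sc", str(sample_count)]          # stop after N samples
    if output_csv is not None:
        command += ["-f", "CSV", "-o", output_csv]     # write samples to a CSV file
    return command


# A 30-minute test sampled every 15 seconds needs 120 samples.
test_minutes = 30
samples_needed = test_minutes * 60 // 15
command = build_typeperf_command(
    GENERIC_WINDOWS_COUNTERS,
    interval_seconds=15,
    sample_count=samples_needed,
    output_csv="generic_kpi.csv",   # hypothetical filename
)
print(command)
```

Keeping the counter list in one place makes it easy to trim the template to the counters you really need before passing the list to `subprocess.run` on the monitored server.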

Linux/Unix: Generic KPI Template

This next example is taken from the non-Windows OS world and demonstrates the same basic metrics about server performance. You generally have to use a number of different tools to get similar information to that provided by Windows Performance Monitor.

KPI metric                       Source utility   Indicative parameter(s)
% Processor time                 vmstat           cs, us, sys, id, wa
Processes on runq                vmstat           r
Blocked queue                    vmstat           b
Memory available bytes           svmon            free
Memory page faults per second    sar              faults per second
Memory pages out per second      vmstat           po
Memory pages in per second       vmstat           pi
Paging space                     svmon            pg space
Device interrupts per second     vmstat           in
% Disk time                      iostat           %tm_act
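
Because the data comes from several command-line tools, it often has to be parsed before it can be charted alongside the test results. The sketch below shows one way to pull the `vmstat` columns named in the table out of a header row and a data row. The two sample lines are hypothetical rather than captured output, the helper names are my own, and the column labels are assumptions: `vmstat` implementations commonly print system time as `sy`, and some platforms use different names for the paging columns, so adjust `TEMPLATE_COLUMNS` to match your system.

```python
# Columns of interest from the template (names are assumptions; see above).
TEMPLATE_COLUMNS = ("r", "b", "pi", "po", "in", "cs", "us", "sy", "id", "wa")


def parse_vmstat_row(header_line, data_line):
    """Pair each column name in the header with the value beneath it."""
    names = header_line.split()
    values = data_line.split()
    if len(names) != len(values):
        raise ValueError("header and data row have different column counts")
    # If a column name repeats, the right-most occurrence wins.
    return {name: int(value) for name, value in zip(names, values)}


def extract_kpis(row, columns=TEMPLATE_COLUMNS):
    """Keep only the template columns that are present in the parsed row."""
    return {name: row[name] for name in columns if name in row}


# Hypothetical header and data row, for illustration only.
header = " r  b  pi  po   in   cs us sy id wa"
data = " 3  0   0  12  410  950 71 14 10  5"

kpis = extract_kpis(parse_vmstat_row(header, data))
cpu_busy_percent = kpis["us"] + kpis["sy"]   # user + system time

print(kpis)
print(cpu_busy_percent)
```

Parsing by column name rather than by position keeps the sketch tolerant of platforms that print extra columns.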

Application-Specific KPI Templates

Drilling down from the generic KPI templates, the next level of analysis commonly focuses on metrics that are application specific. Use these templates to monitor software applications and discrete components that are part of your application deployment. These might include Microsoft's SQL Server database or one of the many Java-based application servers, such as JBoss. Each application type has its own recommended set of counters, so please refer to the corresponding vendor documentation to ensure that your KPI template contains the appropriate entries.

Windows OS: MS SQL Server KPI Template

The following example is also taken from Windows Performance Monitor and demonstrates suggested counters for MS SQL Server.

KPI metric                                    Notes
Access methods: Forwarded records/sec
Access methods: Full table scans              *Missing indexes
Access methods: Index searches/sec
Access methods: Page splits/sec
Access methods: Table lock escalations/sec
Buffer manager: Buffer cache hit ratio
Buffer manager: Checkpoint pages/sec
Buffer manager: Free list stalls/sec
Buffer manager: Page life expectancy
Buffer manager: Page lookups/sec
Buffer manager: Page reads/sec
Buffer manager: Page writes/sec
General statistics: Logins/sec
General statistics: Logouts/sec
General statistics: User connections
Latches: Latch waits/sec
Latches: Total latch wait time (ms)
Locks: Lock wait time (ms)                    *Lock contention
Locks: Lock waits/sec                         *Lock contention
Locks: Number of deadlocks/sec                *Lock contention
Memory manager: Target server memory (KB)     *MS SQL memory
Memory manager: Total server memory (KB)      *MS SQL memory
SQL statistics: Batch requests/sec
SQL statistics: Compilations/sec
SQL statistics: Recompilations/sec
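
When scripting the collection of these counters, keep in mind that SQL Server exposes its Perfmon objects under the prefix `SQLServer:` for a default instance and `MSSQL$<instance name>:` for a named instance. The sketch below builds counter paths for both cases. The `sql_counter_path` helper and the instance name `PERFTEST` are hypothetical, and the `_Total` instance on the Locks object is an assumption; check the exact counter names against the list Perfmon shows on your server.

```python
def sql_counter_path(obj, counter, sql_instance=None, counter_instance=None):
    """Build a Perfmon counter path for a SQL Server performance object.

    sql_instance:     name of a named SQL Server instance, or None for the default
    counter_instance: Perfmon instance within the object (for example "_Total")
    """
    prefix = f"MSSQL${sql_instance}" if sql_instance else "SQLServer"
    instance = f"({counter_instance})" if counter_instance else ""
    return f"\\{prefix}:{obj}{instance}\\{counter}"


# Default instance, object without Perfmon instances.
ple = sql_counter_path("Buffer Manager", "Page life expectancy")

# Locks counters are reported per resource type; _Total is assumed here.
lock_waits = sql_counter_path("Locks", "Lock Waits/sec",
                              counter_instance="_Total")

# Named instance "PERFTEST" (hypothetical name).
total_memory = sql_counter_path("Memory Manager", "Total Server Memory (KB)",
                                sql_instance="PERFTEST")

print(ple)
print(lock_waits)
print(total_memory)
```

Generating the paths from the object and counter names keeps one template usable across default and named instances.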
