The following examples demonstrate groups of common server KPI metrics that I use in performance testing projects. I have provided examples of generic and application-specific templates to demonstrate the top-down approach I use when troubleshooting performance issues revealed by performance testing.
Monitoring this first set of metrics gives a good indication of when a server is under stress. The metrics focus on fundamental capacity indicators such as CPU load and memory consumption. This is the basic level of server monitoring I use when performance testing.
You will probably recognize these metrics, since they are taken from the Windows Performance Monitor (Perfmon) application. This is a nice mix of counters that monitor disk, memory, and CPU performance data, together with some high-level network information that measures the number of errors encountered and data throughput in bytes per second. For many of these counters you will, of course, need to select the appropriate instances and sampling period based on your system under test (SUT) requirements.
Make sure that the sampling period you select is not too short, as sampling too frequently will place additional load on the servers being monitored. The default of 15 seconds is usually sufficient to create enough data points without causing excessive load. The number of counters you are monitoring can also have an impact on server performance, so make sure that any templates you create have only the counters you really need.
I recommend that, in addition to using this template, you monitor the top 10 processes in terms of CPU utilization and memory consumption. Identifying the CPU and memory hogs is often your best pointer to the next level of KPI monitoring, where you need to drill down into specific software applications and components that are part of the application deployment.
KPI metric | Notes |
---|---|
Total processor utilization % | |
Processor queue length | |
Context switches per second | |
Memory available bytes | |
Memory page faults per second | Use to assess soft page faults |
Memory cache faults per second | Use to assess soft page faults |
Memory page reads per second | Use to assess hard page faults |
Free disk space % | |
Page file usage % | |
Average disk queue length | |
% Disk time | |
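
As a rough illustration of how this template and the 15-second sampling period might be scripted, the sketch below builds a command line for the Windows `typeperf` utility. The counter paths are the usual English Perfmon names for the metrics in the table; the `_Total` instances, the sample count, and the output filename are assumptions chosen for the example, and the function name is hypothetical.

```python
# Perfmon counter paths for the generic template. The _Total instances are
# an assumption; select the instances that match your own SUT.
GENERIC_COUNTERS = [
    r"\Processor(_Total)\% Processor Time",
    r"\System\Processor Queue Length",
    r"\System\Context Switches/sec",
    r"\Memory\Available Bytes",
    r"\Memory\Page Faults/sec",
    r"\Memory\Cache Faults/sec",
    r"\Memory\Page Reads/sec",
    r"\LogicalDisk(_Total)\% Free Space",
    r"\Paging File(_Total)\% Usage",
    r"\PhysicalDisk(_Total)\Avg. Disk Queue Length",
    r"\PhysicalDisk(_Total)\% Disk Time",
]


def build_typeperf_command(counters, interval_s=15, samples=240,
                           out_file="generic_kpi.csv"):
    """Return the argument list for a typeperf run that logs counters to CSV.

    samples and out_file are illustrative defaults, not recommendations.
    """
    if interval_s < 1:
        raise ValueError("sampling interval must be at least 1 second")
    return [
        "typeperf", *counters,
        "-si", str(interval_s),   # sampling interval in seconds
        "-sc", str(samples),      # number of samples to collect
        "-f", "CSV",              # output file format
        "-o", out_file,           # output file
        "-y",                     # overwrite an existing file without asking
    ]


command = build_typeperf_command(GENERIC_COUNTERS)
print(" ".join(f'"{arg}"' if " " in arg else arg for arg in command))
```

On a Windows host the returned list could be passed to `subprocess.run(command, check=True)`. Keeping the counter list short and the interval at 15 seconds or more follows the advice above about limiting monitoring overhead.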
This next example is taken from the non-Windows OS world and demonstrates the same basic metrics about server performance. You generally have to use a number of different tools to get similar information to that provided by Windows Performance Monitor.
KPI metric | Source utility | Indicative parameter(s) |
---|---|---|
% Processor time | vmstat | cs, us, sys, id, wa |
Processes on runq | vmstat | r |
Blocked queue | vmstat | b |
Memory available bytes | svmon | free |
Memory page faults per second | sar | faults per second |
Memory pages out per second | vmstat | po |
Memory pages in per second | vmstat | pi |
Paging space | svmon | pg space |
Device interrupts per second | vmstat | in |
% Disk time | iostat | %tm_act |
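
Because these values come from command-line utilities rather than a single monitoring application, they usually need to be parsed before they can be charted alongside test results. The following is a minimal sketch of pulling the `vmstat` parameters listed above out of one sample line. The header and data lines are illustrative, modeled on an AIX-style column layout rather than captured from a real system, and `parse_vmstat` is a hypothetical helper. Column layouts differ between Unix variants, so check the output on your own servers.

```python
def parse_vmstat(header_line, data_line):
    """Pair vmstat column names with the values from one sample line.

    Repeated column names get a numeric suffix. In the AIX-style layout
    assumed here, "sy" appears twice: first for system calls, then for
    system CPU time, so the second one is stored as "sy_2".
    """
    names = header_line.split()
    values = data_line.split()
    if len(names) != len(values):
        raise ValueError("header and data have different column counts")
    sample = {}
    seen = {}
    for name, value in zip(names, values):
        seen[name] = seen.get(name, 0) + 1
        key = name if seen[name] == 1 else f"{name}_{seen[name]}"
        sample[key] = float(value)
    return sample


# Illustrative lines only, not real measurements.
HEADER = " r  b   avm   fre  re  pi  po  fr   sr  cy  in   sy  cs us sy id wa"
SAMPLE = " 2  1 22478  1677   0   3   5   0    0   0 188 1380 157 57 32  1 10"

kpis = parse_vmstat(HEADER, SAMPLE)
for column in ("r", "b", "pi", "po", "in", "cs", "us", "sy_2", "id", "wa"):
    print(column, kpis[column])
```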
Drilling down from the generic KPI templates, the next level of analysis commonly focuses on metrics that are application specific. Use these templates to monitor software applications and discrete components that are part of your application deployment. These might include Microsoft’s SQL Server database or one of the many Java-based application servers, such as JBoss. Each application type will have its own recommended set of counters, so please refer to the corresponding vendor documentation to ensure that your KPI template contains the appropriate entries.
The following example is also taken from Windows Performance Monitor and demonstrates suggested counters for MS SQL Server.
KPI metric | Notes |
---|---|
Access methods: Forward records/sec | |
Access methods: Full table scans | Missing indexes |
Access methods: Index searches/sec | |
Access methods: Page splits/sec | |
Access methods: Table lock escalations/sec | |
Buffer manager: Buffer cache hit ratio | |
Buffer manager: Checkpoint pages/sec | |
Buffer manager: Free list stalls/sec | |
Buffer manager: Page life expectancy | |
Buffer manager: Page lookups/sec | |
Buffer manager: Page reads/sec | |
Buffer manager: Page writes/sec | |
General statistics: Logins/sec | |
General statistics: Logouts/sec | |
General statistics: User connections | |
Latches: Latch waits/sec | |
Latches: Total latch wait time (ms) | |
Locks: Lock wait time (ms) | Lock contention |
Locks: Lock waits/sec | Lock contention |
Locks: Number of deadlocks/sec | Lock contention |
Memory manager: Target server memory (KB) | MS SQL memory |
Memory manager: Total server memory (KB) | MS SQL memory |
SQL statistics: Batch requests/sec | |
SQL statistics: Compilations/sec | |
SQL statistics: Recompilations/sec | |
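
When scripting collection of these counters, keep in mind that the Perfmon object prefix depends on how SQL Server is installed: a default instance exposes its objects as `SQLServer:<object>`, while a named instance uses `MSSQL$<instance name>:<object>`. The sketch below turns a subset of the template into full counter paths under that convention. The helper name and the instance name `PERF01` are hypothetical, and the exact counter names should be checked against what Perfmon lists on your own server.

```python
# (object, counter, instance) triples for a subset of the template.
# instance is None for objects that have no per-instance breakdown.
SQL_COUNTERS = [
    ("Buffer Manager", "Buffer cache hit ratio", None),
    ("Buffer Manager", "Page life expectancy", None),
    ("General Statistics", "User Connections", None),
    ("Locks", "Lock Waits/sec", "_Total"),
    ("Locks", "Number of Deadlocks/sec", "_Total"),
    ("Memory Manager", "Target Server Memory (KB)", None),
    ("Memory Manager", "Total Server Memory (KB)", None),
    ("SQL Statistics", "Batch Requests/sec", None),
]


def sql_counter_paths(counters, instance_name=None):
    """Build Perfmon counter paths for a default or named SQL Server instance."""
    prefix = "SQLServer" if instance_name is None else f"MSSQL${instance_name}"
    paths = []
    for obj, counter, inst in counters:
        inst_part = f"({inst})" if inst else ""
        paths.append(f"\\{prefix}:{obj}{inst_part}\\{counter}")
    return paths


# Default instance, then a hypothetical named instance called PERF01.
for path in sql_counter_paths(SQL_COUNTERS):
    print(path)
for path in sql_counter_paths(SQL_COUNTERS, instance_name="PERF01"):
    print(path)
```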