Chapter 9. Monitoring

Search in book...
Toggle Font Controls
Create new playlist

Name your new playlist

Playlist description (optional)
Sign In

Email address

Password

Forgot Password?

or

Continue with Facebook

Continue with Google
Sign Up

Full Name

Email address

Confirm Email Address

Password

or

Continue with Facebook

Continue with Google

Monitoring

In this chapter, we mention some new and reintroduce some commonly used command line operating system (OS) tools that can monitor various virtualization features or aspects. The first part covers performance monitoring on AIX, the second covers IBM i. Linux running on Power Systems use SAP-based monitoring in CCMS with ST06 or open source tools such as Ganglia.

9.1 Performance Monitoring in AIX

We now describe how to integrate the virtual landscape into the traditional OS monitoring tools. We also introduce new tools that you can use to monitor different aspects of the virtual landscape. In order to improve readability, some columns of the output of commands may have been removed.

9.1.1 The vmstat monitoring tool

vmstat is a familiar monitoring tool that is found on most UNIX variants. It reports statistics about kernel threads, virtual memory, disks, traps, and processor activity. For system processor and memory utilization investigations, vmstat is a good starting point. With the introduction of shared processor partitions and micropartitions, vmstat was enhanced on AIX to include some new virtualization-related statistics.

For micropartitions, vmstat now reports the physical processor consumption by the LPAR and the percentage of entitlement that is consumed, which is identified by the pc and ec columns, respectively. The presence of the ec column also indirectly identifies the LPAR as a Shared Processor LPAR, as shown in Example 9-1.

Example 9-1 vmstat report

$ vmstat -w 1 5

System configuration: lprocessor=2 mem=1024MB ent=0.20

kthr memory page faults processor

------- --------------------- ------- ------------------ -----------------------

r b avm fre … in sy cs us sy id wa pc ec

2 0 112808 126518 … 20 60 143 99 0 0 0 0.96 480.6

2 0 112808 126518 … 22 28 141 99 1 1 0 0.47 236.6

1 0 112808 126518 … 21 20 147 0 2 98 0 0.01 3.7

1 0 112808 126518 … 16 41 144 91 2 7 0 0.19 94.1

1 0 112808 126518 … 14 25 152 99 1 1 0 0.43 216.8

Tip: The –w flag of vmstat provides a formatted wide view, which makes reading vmstat output easier, given the additional columns.

From the WPAR global environment, vmstat can also report information about the WPAR level with the -@ option, -@ ALL for example. Example 9-2 shows a summary line for the overall system and individual statistics for the global environment and each active WPAR.

Example 9-2 vmstat report

#vmstat -w -@ ALL 5 1

System configuration: lprocessor=2 mem=2048MB drives=0 ent=1.00 wpar=2

wpar kthr memory page faults processor

------------ ------- --------------------- ------ ------------------ -----------------------

r b avm fre … in sy cs us sy id wa pc rc

System 1 0 223635 250681 … 16 31 187 77 0 23 0 0.87 87.1

Global 1 0 - - … - - 172 99 1 - - 0.54 53.6

is3042w 1 0 - - … - - 15 99 0 - - 0.24 23.7

is3043w 0 0 - - … - - 0 38 62 - - 0.00 0.0

With the introduction of Active Memory Expansion the vmstat command has been enhanced with additional metrics that can be monitored. For more details about how AME works, refer to 8.5, “Active Memory Expansion for SAP systems” on page 84. The AME-related metrics can be monitored by adding the option -c, as shown in Example 9-3.

Example 9-3 vmstat report including AME information

#vmstat -w -c 1

System Configuration: lprocessor=8 mem=6144MB tmem=4096MB ent=0.20 mmode=dedicated-E

kthr memory page

------- ------------------------------------------------------ ----------------------- ...

r b avm fre csz cfr dxm ci co pi po ...

1 0 422566 598766 4096 4096 0 0 0 0 0

0 0 422621 598711 4096 4096 0 0 0 0 0

1 0 422624 598708 4096 4096 0 0 0 0 0

1 0 422566 598766 4096 4096 0 0 0 0 0

0 0 422588 598744 4096 4096 0 0 0 0 0

The size of the compressed pool is shown in column csz and the amount of free memory in the compressed pool as cfr. Similarly to the paging rates (pi and po), the rates for compressing and decompressing are shown as ci and co. Those values can be higher than the values acceptable for the paging rates, since compression and decompression are much faster than paging. The column dxm describes the memory deficit. If this value does not equal zero, the operating system cannot achieve the requested expansion rate.

9.1.2 The sar monitoring tool

Like vmstat, sar is a generic UNIX tool for collecting statistics about processor, I/O, and other system activities. You can use the sar tool to show both history and real-time data.

Beginning with AIX 5.3, the sar command reports virtualization-related processor utilization statistics. For micropartitions, the physical processors consumed and the entitlement consumed are now shown, as in Example 9-4.

Example 9-4 sar command report

#sar 1 5

AIX is3013 3 5 00C46F8D4C00 09/09/08

System configuration: lprocessor=2 ent=0.20 mode=Uncapped

15:17:04 %usr %sys %wio %idle physc %entc

15:17:05 100 0 0 0 1.00 499.7

15:17:06 100 0 0 0 1.00 499.7

15:17:07 99 1 0 0 0.57 284.9

15:17:08 99 1 0 0 0.61 306.2

15:17:09 4 54 0 42 0.01 2.8

Average 99 0 0 0 0.64 318.0

You can also monitor the metrics per processor, as shown in Example 9-5.

Example 9-5 Metrics per-processor report

#sar -P ALL 1 5

AIX is3013 3 5 00C46F8D4C00 09/09/08

System configuration: lprocessor=2 ent=0.20 mode=Uncapped

15:36:26 processor %usr %sys %wio %idle physc %entc

15:36:27 0 100 0 0 0 0.86 430.0

1 1 8 0 92 0.00 0.5

- 99 0 0 0 0.86 430.4

15:36:28 0 5 61 0 35 0.01 2.7

1 2 21 0 77 0.00 0.4

- 4 56 0 40 0.01 3.1

15:36:29 0 98 1 0 0 0.29 145.1

1 0 16 0 84 0.00 0.4

- 98 1 0 1 0.29 145.5

15:36:30 0 100 0 0 0 1.00 499.2

1 1 7 0 92 0.00 0.5

- 100 0 0 0 1.00 499.7

15:36:31 0 100 0 0 0 1.00 499.1

1 1 7 0 92 0.00 0.5

- 100 0 0 0 1.00 499.6

Average 0 99 0 0 0 0.63 315.2

1 1 11 0 88 0.00 0.4

- 99 0 0 0 0.63 315.7

9.1.3 The mpstat tool

The mpstat command collects and displays detailed output about performance statistics for all logical processors in the system. It presents the same metrics that sar does, but it also provides information about the run queue, page faults, interrupts, and context switches.

The default output from the mpstat command displays two sections of statistics, as shown in Example 9-6. The first section, which displays the system configuration, is shown when the command starts and whenever the system configuration is changed:

•lprocessor: The number of logical processors.

•ent: Entitled processing capacity in processor units. This information is displayed only if the partition type is shared.

The second section displays the utilization statistics for all logical processors. The mpstat command also displays a special processor row with the processor ID ALL, which shows the partition-wide utilization. The mpstat command gives the various statistics. It depends on the flag. The utilization statistics are:

•processor: Logical processor ID

•min, maj: Minor and major page faults

•mpc: Total number of inter-processor calls

•int: Total number of interrupts

•cs, ics: Total number of voluntary and involuntary context

•rq: Run queue size

•mig: Total number of thread migrations to another logical processor

•lpa: Number of re-dispatches within affinity domain 3

•sysc: Total number of system calls

•us, sy, wa, id: Processor usage statistics

•pc: The percentage of entitlement consumed

•%ec: Fraction of processor consumed

•lcs: Total number of logical context switches

Example 9-6 mpstat command report

#mpstat 1 2

System configuration: lprocessor=8 ent=1.0 mode=Uncapped

processor min maj mpc int cs ics rq mig lpa sysc us sy wa id pc %ec lcs

0 19 0 0 117 46 20 0 2 100 128 98 2 0 0 0.18 8.8 120

1 0 0 0 101 27 13 0 2 100 4 100 0 0 0 0.55 27.3 103

2 0 0 0 265 124 66 0 0 100 1 100 0 0 0 0.80 39.4 114

3 0 0 0 102 0 0 0 0 - 0 100 0 0 0 0.20 10.0 113

4 0 0 0 93 1 1 0 1 100 0 100 0 0 0 0.29 14.4 98

5 0 0 0 88 0 0 0 0 - 0 0 36 0 64 0.00 0.0 90

6 0 0 0 88 0 0 0 0 - 0 0 32 0 68 0.00 0.1 88

7 0 0 0 88 0 0 0 0 - 0 0 36 0 64 0.00 0.1 88

ALL 19 0 0 942 198 100 0 5 100 133 99 0 0 0 2.02 202.3 814

--------------------------------------------------------------------------------

0 3 0 0 132 41 18 1 0 100 59 12 52 0 36 0.00 0.1 120

1 0 0 0 107 27 13 0 0 100 5 100 0 0 0 0.91 31.2 107

2 0 0 0 268 125 66 1 0 100 0 100 0 0 0 1.00 34.3 110

3 0 0 0 104 0 0 0 0 - 0 0 34 0 66 0.00 0.0 105

4 0 0 0 112 3 3 1 0 100 0 100 0 0 0 1.00 34.3 107

5 0 0 0 91 0 0 0 0 - 0 0 22 0 78 0.00 0.0 91

6 0 0 0 92 0 0 0 0 - 0 0 36 0 64 0.00 0.0 90

7 0 0 0 90 0 0 0 0 - 0 0 38 0 62 0.00 0.0 90

ALL 3 0 0 996 196 100 3 0 100 64 100 0 0 0 2.91 291.0 820

If the partition type is shared, a special processor row with the processor ID U is displayed when the entitled processing capacity is not entirely consumed.

With the option -s the mpstat command provides information about the utilization of each logical processor as well as the utilization of a complete core, as you see in Example 9-7. It can also be determined which logical processor belongs to a certain core.

Example 9-7 mpstat command information

#mpstat -w -s 1 2

System configuration: lprocessor=8 ent=0.2 mode=Uncapped

Proc2 Proc0

52.83% 28.13%

processor0 processor1 processor4 processor5 processor2 processor3 processor6 processor7

33.57% 6.02% 6.79% 6.45% 19.62% 3.50% 2.51% 2.51%

------------------------------------------------------------------------

Proc2 Proc0

20.18% 0.28%

processor0 processor1 processor4 processor5 processor2 processor3 processor6 processor7

13.92% 2.54% 1.68% 2.03% 0.10% 0.06% 0.07% 0.06%

In Example 9-7 the logical processors 0, 1, 4 and 5 (which map to the threads in one core) belong to processor 2 which is equivalent to a core. This view of mpstat helps to understand how workload is dispatched to the threads and cores and how processor folding may influence the system at a given time.

The mpstat command can also provide a lot more information when you use additional parameter flags.

9.1.4 The lparstat tool

The lparstat command, introduced in AIX 5.3, is a useful tool for displaying the LPAR configuration. You can get information about the LPAR configuration with the -i option, as shown in Example 9-8.

Example 9-8 lparstat command information

#lparstat -i

Node Name : is3046

Partition Name : is3046

Partition Number : 46

Type : Shared-SMT-4

Mode : Uncapped

Entitled Capacity : 0.20

Partition Group-ID : 32814

Shared Pool ID : 0

Online Virtual processors : 2

Maximum Virtual processors : 8

Minimum Virtual processors : 1

Online Memory : 4096 MB

Maximum Memory : 16384 MB

Minimum Memory : 512 MB

Variable Capacity Weight : 128

Minimum Capacity : 0.10

Maximum Capacity : 0.80

Capacity Increment : 0.01

Maximum Physical processors in system : 32

Active Physical processors in system : 32

Active processors in Pool : 26

Shared Physical processors in system : 26

Maximum Capacity of Pool : 2600

Entitled Capacity of Pool : 1820

Unallocated Capacity : 0.00

Physical processor Percentage : 10.00%

Unallocated Weight : 0

Memory Mode : Dedicated-Expanded

Total I/O Memory Entitlement : -

Variable Memory Capacity Weight : -

Memory Pool ID : -

Physical Memory in the Pool : -

Hypervisor Page Size : -

Unallocated Variable Memory Capacity Weight: -

Unallocated I/O Memory entitlement : -

Memory Group ID of LPAR : -

Desired Virtual processors : 2

Desired Memory : 4096 MB

Desired Variable Capacity Weight : 128

Desired Capacity : 0.20

Target Memory Expansion Factor : 1.50

Target Memory Expansion Size : 6144 MB

Power Saving Mode : Disabled

Additionally, detailed utilization metrics can also be displayed, as shown in Example 9-9. The default output contains two sections: a configuration section and a utilization section.

Example 9-9 mpstat command output

#lparstat 1 10

System configuration: type=Shared mode=Uncapped smt=4 lprocessor=8 mem=6144MB psize=26 ent=0.20

%user %sys %wait %idle physc %entc lbusy app vcsw phint

----- ----- ------ ------ ----- ----- ------ --- ----- -----

17.9 45.7 0.0 36.4 0.48 237.6 6.0 25.01 293 0

18.7 47.9 0.0 33.4 0.55 275.1 7.8 24.94 426 3

18.9 47.1 0.0 34.1 1.00 498.4 13.4 23.82 360 0

1.9 10.5 0.0 87.6 0.07 35.6 0.3 24.54 777 0

18.8 47.5 0.0 33.7 0.98 491.2 16.7 23.78 392 24

17.8 45.4 0.0 36.7 0.61 303.3 7.1 23.90 334 13

18.5 48.5 0.0 33.0 0.44 219.2 6.2 23.72 655 12

Here are definitions of the metrics:

•%user, %sys: The percentage of the physical processor used while executing at the user and system levels

•%wait, %idle: I/O wait and idle levels

•physc: The number of physical processors that are consumed

•%entc: The percentage of the entitled capacity consumed

•lbusy: The percentage of logical processor(s) utilization that occurred while executing at the user and system level

•App: The available physical processors in the shared pool

•Vcsw: The number of virtual context switches that are virtual processor hardware preemptions

•Phint: The number of phantom (targeted to another shared partition in this pool) interruptions that are received

Equivalent to the vmstat command, lparstat can provide AME metrics with the option -c, as in Example 9-10.

Example 9-10 lparstat command output

#lparstat -c 1 10

System configuration: type=Shared mode=Uncapped mmode=Ded-E smt=4 lprocessor=8 mem=6144MB tmem=4096MB psize=26 ent=0.20

%user %sys %wait %idle physc %entc lbusy app vcsw phint %xprocessor xphysc dxm

----- ----- ------ ------ ----- ----- ------ --- ----- ----- ------ ------ ------

18.2 45.7 0.0 36.1 0.84 417.8 10.9 24.15 380 3 0.0 0.0000 0

18.8 50.8 0.0 30.4 0.21 103.3 3.4 25.33 512 0 0.0 0.0000 0

18.7 46.7 0.0 34.6 1.00 500.5 16.7 24.55 386 6 0.0 0.0000 0

17.1 44.6 0.0 38.3 0.41 202.8 5.2 25.16 302 0 0.0 0.0000 0

18.7 48.2 0.0 33.1 0.62 312.3 9.1 24.06 421 2 0.0 0.0000 0

18.3 45.9 0.0 35.8 0.96 479.8 12.5 24.54 388 2 0.0 0.0000 0

7.1 22.3 0.0 70.6 0.08 41.1 1.1 25.43 544 0 0.0 0.0000 0

18.5 46.1 0.0 35.3 1.01 506.5 13.0 24.54 258 1 0.0 0.0000 0

18.1 46.1 0.0 35.9 0.51 255.9 8.7 25.05 429 0 0.0 0.0000 0

18.9 48.5 0.0 32.7 0.50 252.0 7.4 24.07 436 1 0.0 0.0000 0

As in vmstat, the deficit memory is shown in column dxm. Additionally, the amount of processor capacity, which is required for compression and decompression, is documented in columns %xprocessor and xphysc.

As described in 8.6, “Processor utilization metrics” on page 85, the reporting of processor utilization changed with POWER7 technology-based systems in some cases. There it was also outlined, that in shared pool environments the metrics physc and physb can show a larger difference than in the past. If you are interested in the computing power an LPAR used (physb), the lparstat command can provide this information as well as part of the metrics reported with the option -m, as in Example 9-11.

Example 9-11 lparstat -m command output

#lparstat -m 1 10

System configuration: lprocessor=8 mem=6144MB mpsz=0.00GB iome=4096.00MB iomp=10 ent=0.20

physb hpi hpit pmem iomin iomu iomf iohwm iomaf %entc vcsw

----- ----- ----- ----- ------ ------ ------ ------ ----- ----- -----

30.42 0 0 4.00 48.2 12.1 - 26.7 0 240.3 1013

84.17 0 0 4.00 48.2 12.1 - 26.7 0 620.2 877

44.64 0 0 4.00 48.2 12.1 - 26.7 0 349.7 607

22.82 0 0 4.00 48.2 12.1 - 26.7 0 174.7 709

65.71 0 0 4.00 48.2 12.1 - 26.7 0 499.6 464

16.51 0 0 4.00 48.2 12.1 - 26.7 0 135.4 658

50.84 0 0 4.00 48.2 12.1 - 26.7 0 386.4 499

52.74 0 0 4.00 48.2 12.1 - 26.7 0 407.7 274

14.27 0 0 4.00 48.2 12.1 - 26.7 0 109.0 514

65.62 0 0 4.00 48.2 12.1 - 26.7 0 494.4 375

9.1.5 The topas tool

Many AIX system administrators and analysts use the topas command to view general statistics about the local system activity. The data is displayed in a simple and convenient character-based format using the curses library.

The topas tool was expanded to include LPAR information. Start topas with the –L option or press L while topas is running and panel output, similar to Example 9-12, is displayed.

Example 9-12 topas -L

Interval: 2 Logical Partition: is3051 Mon Oct 6 14:36:32 2008

Psize: 15 Shared SMT ON Online Memory: 2048.0

Ent: 0.40 Mode: UnCapped Online Logical processors: 8

Partition processor Utilization Online Virtual processors: 4

%usr %sys %wait %idle physc %entc %lbusy app vcsw phint %hypv hcalls

100 0 0 0 2.6 658.46 34.06 10.70 681 15 26.2 509

==================================================================================

Lprocessor minpf majpf intr csw icsw runq lpa scalls usr sys _wt idl pc lcsw

processor0 0 0 283 134 65 0 100 36 99 0 0 0 0.63 146

processor1 0 0 103 2 2 0 100 0 100 0 0 0 0.18 102

processor2 0 0 107 4 4 0 100 3 100 0 0 0 0.38 101

processor3 0 0 101 0 0 0 100 0 91 3 0 6 0.01 101

processor4 0 0 111 9 9 0 100 1 100 0 0 0 1.00 105

processor5 0 0 10 0 0 0 0 0 0 8 0 92 0.00 10

processor6 0 0 106 5 4 0 100 5 100 0 0 0 0.44 102

processor7 0 0 11 0 0 0 0 0 0 15 0 85 0.00 11

The upper section of Example 9-12 on page 103 shows a subset of the lparstat command statistics, while the lower section displays a subset of mpstat data.

The tool also provides a useful cross partition view that displays metrics from all AIX (5.3 TL3 or later) and VIOS (1.3 or later) partitions on the same host server, as shown in Example 9-13. On the command line you can get directly to this panel with the option -C.

Example 9-13 Cross partition view, topas -C

Topas CEC Monitor Interval: 10 Mon Oct 6 14:54:01 2008

Partitions Memory (GB) Processors

Shr: 10 Mon:89.0 InUse:68.1 Shr:2.6 PSz: - Don: 0.0 Shr_PhysB 0.15

Ded: 3 Avl: - Ded: 5 APP: - Stl: 0.0 Ded_PhysB 0.98

Host OS M Mem InU Lp Us Sy Wa Id PhysB Vcsw Ent %EntC PhI

-------------------------------------shared-------------------------------------

is3013 A53 U 1.0 0.9 4 0 22 0 77 0.05 169 0.20 23.3 2

is3053 A61 U 8.0 5.0 4 1 4 0 94 0.02 481 0.20 7.9 2

is17d4 A53 U 8.0 3.9 4 4 2 0 93 0.02 263 0.20 7.9 0

is32d1 A53 U 8.0 8.0 4 2 3 0 94 0.01 253 0.20 6.9 2

is308v2 A53 U 1.0 1.0 4 0 2 0 97 0.01 408 0.40 3.0 0

is20d1 A53 U 8.0 8.0 4 1 2 0 95 0.01 289 0.20 5.8 0

is33d1 A53 U 8.0 6.0 4 0 2 0 96 0.01 205 0.20 4.5 0

is308v1 A53 U 1.0 1.0 4 0 1 0 98 0.01 225 0.40 2.0 1

is25d1 A53 U 12 12 4 0 1 0 97 0.01 192 0.20 3.7 0

is3036 A53 U 8.0 5.2 8 0 0 0 99 0.01 201 0.40 1.5 0

Host OS M Mem InU Lp Us Sy Wa Id PhysB Vcsw %istl %bstl %bdon %idon

------------------------------------dedicated-----------------------------------

is3024 A61 D 8.0 4.7 4 49 0 0 49 0.98 476 0.00 0.00 0.00 0.07

is3025 A61 S 10 9.0 2 0 0 0 99 0.00 344 0.00 0.00 - -

is3060 A53 S 8.0 3.6 4 0 0 0 99 0.00 303 0.00 0.00 - -

This CEC view has also been enhanced with metrics for the throughput of the Virtual I/O Servers, as in Example 9-14. When the CEC view is on the window, just press v and you are getting to the VIO Server/Client section.

Example 9-14 topas -L

Topas VIO Server/Client Throughput Details host:is3025 Interval: 10

Vio Servers: 2 Vio clients: 26 Mon Jul 18 13:32:49 2011

===============================================================================

Server KBPS TPS KB-R ART MRT KB-W AWT MWT AQW AQD

is314v1 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0

is314v2 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0

===============================================================================

Client KBPS TPS KB-R ART MRT KB-W AWT MWT AQW AQD

is20d5 0.0 0.0 0.0 0.0 105.7K 0.0 0.0 47.6K 0.0 0.0

is27d2 0.0 0.0 0.0 0.0 104.0 0.0 0.0 91.0 0.0 0.0

is20d4 0.0 0.0 0.0 0.0 567.0 0.0 0.0 104.6K 0.0 0.0

is19d3 88.0 9.0 0.0 0.0 140.0 9.0 0.0 17.4K 0.0 0.0

is3025 0.0 0.0 0.0 0.0 3.3K 0.0 0.0 7.8K 0.0 0.0

is28d1 0.0 0.0 0.0 0.0 757.0 0.0 0.0 53.7K 0.0 0.0

is34d1 0.0 0.0 0.0 0.0 34.9K 0.0 0.0 5.2K 0.0 0.0

is30d1 0.0 0.0 0.0 0.0 2.0K 0.0 0.0 27.2K 0.0 0.0

is24d2 0.0 0.0 0.0 0.0 10.8K 0.0 0.0 5.3K 0.0 0.0

is31d1 0.0 0.0 0.0 0.0 701.0 0.0 0.0 1.8K 0.0 0.0

is29d2 48.0 3.0 0.0 0.0 342.0 3.0 0.0 31.3K 0.0 0.0

is3106 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0

is3065 0.0 0.0 0.0 0.0 12.8K 0.0 0.0 655.0 0.0 0.0

is3093 0.0 0.0 0.0 0.0 1.1K 0.0 0.0 238.0 0.0 0.0

is3068 0.0 0.0 0.0 0.0 47.0 0.0 0.0 1.3K 0.0 0.0

is3092 8.0 2.0 0.0 0.0 1.0K 2.0 0.0 18.4K 0.0 0.0

is3063 0.0 0.0 0.0 0.0 37.8K 0.0 0.0 596.0 0.0 0.0

is3096 0.0 0.0 0.0 0.0 46.8K 0.0 0.0 5.6K 0.0 0.0

is3011 152.0 37.0 0.0 0.0 39.5K 37.0 0.0 3.4K 0.0 0.0

is3098 1.9Ko 113.3 0.5 0.0 11.7K 112.9 0.0 4.3K 0.0 0.0

is3100 0.0 0.0 0.0 0.0 1.4K 0.0 0.0 1.2K 0.0 0.0

is3102 0.0 0.0 0.0 0.0 11.6K 0.0 0.0 274.0 0.0 0.0

is3109 0.0 0.0 0.0 0.0 240.0 0.0 0.0 463.0 0.0 0.0

is3112 1.5Ko 49.0 0.0 0.0 15.9K 49.0 0.0 351.0 0.0 0.0

is3062 0.0 0.0 0.0 0.0 12.9K 0.0 0.0 1.6K 0.0 0.0

With larger systems and growing memory the memory topology is getting more important. Topas is providing the additional option -M to show the actual topology and the memory usage, as shown in Example 9-15.

Example 9-15 topas -M output

Topas Monitor for host: is3046 Interval: 2 Fri Jul 15 08:28:17 2011

================================================================================

REF1 SRAD TOTALMEM INUSE FREE FILECACHE HOMETHRDS processorS

--------------------------------------------------------------------------------

0 0 3948.2 3674.4 273.9 2116.9 403 0-7

================================================================================

processor SRAD TOTALDISP LOCALDISP% NEARDISP% FARDISP%

------------------------------------------------------------

0 0 394 100.0 0.0 0.0

1 0 77 100.0 0.0 0.0

2 0 7 100.0 0.0 0.0

3 0 0 0.0 0.0 0.0

4 0 0 0.0 0.0 0.0

7 0 0 0.0 0.0 0.0

5 0 0 0.0 0.0 0.0

6 0 0 0.0 0.0 0.0

Also AME metrics can now be seen at the main panel of topas in the section AME, as in Example 9-16.

Example 9-16 topas output

Topas Monitor for host: is3046 EVENTS/QUEUES FILE/TTY

Fri Jul 15 08:29:29 2011 Interval: 2 Cswitch 1677 Readch 3738.9K

Syscall 771.5K Writech 395.9K

Kernel 49.4 |############## | Reads 2427 Rawin 0

User 25.1 |######## | Writes 1125 Ttyout 349

Wait 0.0 |# | Forks 206 Igets 0

Idle 25.6 |######## | Execs 154 Namei 4380

Physc = 0.88 %Entc= 438.1 Runqueue 1.5 Dirblk 0

Waitqueue 0.0

Network KBPS I-Pack O-Pack KB-In KB-Out MEMORY

Total 57.9 185.3 98.7 36.2 21.7 PAGING Real,MB 6144

Faults 26288 % Comp 27

Disk Busy% KBPS TPS KB-Read KB-Writ Steals 0 % Noncomp 34

Total 0.0 82.2 20.0 0.0 82.2 PgspIn 0 % Client 34

PgspOut 0

FileSystem KBPS TPS KB-Read KB-Writ PageIn 0 PAGING SPACE

Total 3.3K 1.3K 3.3K 0.1 PageOut 0 Size,MB 5184

Sios 0 % Used 0

Name PID processor% PgSp Owner % Free 100

burn_hal 54264070 41.6 0.4 root AME

vmmd 458766 0.9 1.2 root TMEM,MB 4096 WPAR Activ 0

Xvnc 9306302 0.4 3.2 root CMEM,MB 16 WPAR Total 0

kcawd 6619320 0.2 12.0 root EF[T/A] 1.5/1.5 Press: "h"-help

xmtopasa 8126488 0.1 2.2 root CI:0.0 CO:0.0 "q"-quit

xterm 55640472 0.1 8.6 root

topas 14745716 0.1 1.8 root

random 5046450 0.1 0.4 root

xmgc 917532 0.0 0.4 root

xmtopas 5439546 0.0 4.7 root

gil 1245222 0.0 0.9 root

mysqld 6357154 0.0 42.4 mysql

icewm-de 8650828 0.0 1.1 root

nfsd 3014782 0.0 1.8 root

init 1 0.0 0.7 root

java 8388836 0.0 42.4 tomcat

kuxagent 6881304 0.0 37.9 root

rmcd 9502788 0.0 6.5 root

kulagent 4325578 0.0 7.7 root

sendmail 4063258 0.0 1.1 root

Many more examples of screenshots and explanations can be found on:

https://www.ibm.com/developerworks/wikis/display/WikiPtype/topas

Similarly, as with nmon, the topasrec command provides the capability to record certain metrics over time for an LPAR or even a whole system with the option -C. The topasout command can be used to generate reports from the data gathered from topasrec. With the topas CEC analyzer you have a nice tool to visualize the data gathered by topasrec and topasout using graphs in spreadsheets. More details can be found at:

https://www.ibm.com/developerworks/wikis/display/WikiPtype/topas+CEC+Analyser

9.1.6 The nmon tool

The nmon tool is a free tool available from IBM that presents output in a curses-based monitor that is similar to topas. nmon displays a wealth of information, as shown in Example 9-17 on page 107, about the configuration and utilization of the system, which includes many of the virtualization features.

You can download the nmon tool from the nmon Wiki website at:

http://www.ibm.com/developerworks/wikis/display/WikiPtype/nmon

A complete manual, FAQ, and forum are also at the Wiki website. As of AIX 6.1, nmon is also packaged and delivered with AIX.

Example 9-17 Sample nmon panel

-nmon12e-----2=Top-Child-processor----Host=is3051---------Refresh=2 secs---09:02.11-

Resources -------------------------------------------------------------------

OS has 8 PowerPC_POWER6 (64 bit) processors with 8 processors active SMT=On

processor Speed 3504.0 MHz SerialNumber=104A1B0 MachineType=IBM,9117-MMA

Logical partition=Dynamic HMC-LPAR-Number&Name=51,is3051

AIX Version=5.3.8.3 TL08 Kernel=64 bit Multi-Processor

Hardware-Type(NIM)=CHRP=Common H/W Reference Platform Bus-Type=PCI

processor Architecture =PowerPC Implementation=POWER6_in_P6_mode

processor Level 1 Cache is Combined Instruction=65536 bytes & Data=65536 bytes

Level 2 Cache size=not available Node=is3051

Event= 0 --- --- SerialNo Old=--- Current=C4A1B0 When=---

processor-Utilisation-Small-View -----------Entitledprocessor= 0.40 Usedprocessor= 1.815-----

PURR Stats ["-"=otherLPAR] 0----------25-----------50----------75----------100

0 56.0 0.3 0.0 0.1|UUUUUUUUUUUUUUUUUUUUUUUUUUU---------------------->

1 33.3 0.0 0.0 0.1|UUUUUUUUUUUUUUUU--------------------------------->

2 68.2 23.6 0.0 0.0|UUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUsssssssssss---->

3 0.0 0.0 0.0 0.1|>------------------------------------------------|

4 0.0 0.0 0.0 0.0|>------------------------------------------------|

5 0.0 0.0 0.0 0.0|>------------------------------------------------|

6 0.0 0.0 0.0 0.0|>------------------------------------------------|

7 0.0 0.0 0.0 0.0|>------------------------------------------------|

EntitleCapacity/Virtualprocessor +-----------|------------|-----------|------------+

EC+ 86.6 13.2 0.0 0.1|UUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUssssss|

VP 39.3 6.0 0.0 0.1|UUUUUUUUUUUUUUUUUUUss----------------------------|

EC= 453.7% VP= 45.4% +--No Cap---|--Folded=2--|-----------100% VP=4 processor+

Memory ----------------------------------------------------------------------

Physical PageSpace | pages/sec In Out | FileSystemCache

% Used 42.8% 0.2% | to Paging Space 0.0 0.0 | (numperm) 4.0%

% Free 57.2% 99.8% | to File System 0.0 0.0 | Process 23.2%

MB Used 877.4MB 3.5MB | Page Scans 0.0 | System 15.7%

MB Free 1170.6MB 2044.5MB | Page Cycles 0.0 | Free 57.2%

Total(MB) 2048.0MB 2048.0MB | Page Steals 0.0 | -----

| Page Faults 656.5 | Total 100.0%

------------------------------------------------------------ | numclient 4.0%

Min/Maxperm 84MB( 4%) 168MB( 8%) <--% of RAM | maxclient 8.2%

Min/Maxfree 960 1088 Total Virtual 4.0GB | User 11.8%

Min/Maxpgahead 2 8 Accessed Virtual 0.7GB 18.5%| Pinned 26.5%

Top-Processes-(88) ------Mode=3 [1=Basic 2=processor 3=Perf 4=Size 5=I/O 6=Cmds]--

PID %processor Size Res Res Res Char RAM Paging Command

Used KB Set Text Data I/O Use io other repage

385030 88.8 112 116 4 112 0 0% 0 0 0 spin

442520 60.0 112 116 4 112 0 0% 0 0 0 spin

295062 31.6 1248 1316 340 976 10 0% 0 655 0 xterm

348376 0.1 7212 7536 532 7004 0 1% 0 0 0 nmon12e_aix537

You can toggle on or off various sections that contain different metrics. Example 9-18 shows section possibilities that are offered when you depress the h key.

Example 9-18 nmon section options

h = Help information q = Quit nmon 0 = reset peak counts

+ = double refresh time - = half refresh r = Resourcesprocessor/HW/MHz/AIX

c = processor by processor C=upto 128 processors p = LPAR Stats (if LPAR)

l = processor avg longer term k = Kernel Internal # = Physicalprocessor if SPLPAR

m = Memory & Paging M = Multiple Page Sizes P = Paging Space

d = DiskI/O Graphs D = DiskIO +Service times o = Disks %Busy Map

a = Disk Adapter e = ESS vpath stats V = Volume Group stats

^ = FC Adapter (fcstat) O = VIOS SEA (entstat) v = Verbose=OK/Warn/Danger

n = Network stats N=NFS stats (NN for v4) j = JFS Usage stats

A = Async I/O Servers w = see AIX wait procs "="= Net/Disk KB<-->MB

b = black&white mode g = User-Defined-Disk-Groups (see cmdline -g)

t = Top-Process ---> 1=basic 2=processor-Use 3=processor(default) 4=Size 5=Disk-I/O

u = Top+cmd arguments U = Top+WLM Classes . = only busy disks & procs

W = WLM Section S = WLM SubClasses)

You can optionally save the nmon data in a file, which you can then give as input to the nmon analyzer, which uses MS Excel to produce dozens of graphs. The nmon analyzer is also available at the nmon Wiki website.

9.2 Performance monitoring on IBM i

Performance tools for the IBM i operating system can be divided into four major groups:

• Ad-hoc performance tools that show a snapshot of hardware resource utilization

• Data collection tools to collect data about hardware resource utilization continuously

•Performance analysis tools to format, view, and summarize the collected data

• Enhanced tools for root cause analysis of performance problems.

This chapter can only provide a quick overview of the most commonly used performance tools on IBM i. For more detailed information about the available tools, see IBM PowerVM Virtualization Managing and Monitoring, SG24-7590 and the “IBM i and System i Information Center” at:

http://publib.boulder.ibm.com/eserver/ibmi.html.

9.2.1 Ad-hoc Performance Tools on IBM i

The WRKSYSSTS command is shipped with the IBM i operating system and can be used to display processor and main storage utilization. When you execute the command with assistance level (ASTLVL) *INTERMED, you will see a panel as in Figure 9-1 on page 109.

Work with System Status AS0030

07/12/11 16:13:48

% CPU used . . . . . . . : 361.0 Auxiliary storage:

% DB capability . . . . : 142.7 System ASP . . . . . . : 4233 G

Elapsed time . . . . . . : 00:00:01 % system ASP used . . : 64.0045

Jobs in system . . . . . : 5278 Total . . . . . . . . : 4233 G

% perm addresses . . . . : .224 Current unprotect used : 54733 M

% temp addresses . . . . : 2.714 Maximum unprotect . . : 66256 M

Type changes (if allowed), press Enter.

System Pool Reserved Max -----DB----- ---Non-DB---

Pool Size (M) Size (M) Active Fault Pages Fault Pages

1 5200.00 470.22 +++++ .0 .0 .0 .0

2 10869.12 37.63 1400 44.9 323.7 425.1 1210

3 291.00 .50 71 .0 .0 .0 .0

4 1459.00 1.28 352 .0 .0 .0 .0

5 1459.00 2.00 356 .0 .0 .0 .0

More...

Command

===> .

F3=Exit F4=Prompt F5=Refresh F9=Retrieve F10=Restart F12=Cancel

F19=Extended system status F24=More keys

Figure 9-1 Output of WRKSYSSTS

When you use IBM i in a logical partition that is utilizing processor resources from a shared processor pool, you must consider several metrics: The number of physical processors in the shared processor pool, the number of virtual processors assigned to the partition, and the processing units assigned to the partition (also called entitlement). The entitlement indicates the guaranteed processing capacity of the partition.

In Figure 9-1, the shared processor pool has four physical processors, and the partition is configured with three virtual processors and an entitlement of 0.3 processing units. In the WRKSYSSTS panel, the value for “% processor used” indicates the processor utilization in relation to the entitlement of 0.3 processing units. A value of more than 100% is possible because additional processor resources from the shared processor pool can be used. Use function key F19 to get a more detailed view of processor utilization, as in Figure 9-2.

Extended System Status AS0030

07/12/11 16:13:48

Elapsed time . . . . . . . . . . . . . . . : 00:00:01

% CPU used . . . . . . . . . . . . . . . . : 361.0

% DB capability . . . . . . . . . . . . . . : 142.7

% uncapped CPU capacity used . . . . . . . : 36.1

% shared processor pool used . . . . . . . : 39.9

F3=Exit F5=Refresh F10=Restart F12=Cancel

Figure 9-2 Output using F19

The value for “% uncapped processor capacity used” calculates the current processor utilization in relation to the number of virtual processors (the maximum capacity that this partition can get). The value for “% shared processor pool used” considers how much of the shared processor pool is currently in use by all partitions sharing it, including this one. If this value approaches 100%, the system is becoming processor bound. You can either increase the number of processors in the shared pool or reduce the workload to avoid this situation.

The “Work with System Status” panel also has information about the utilization of main storage pools. The system pools 1 (*MACHINE pool) and 2 (*BASE pool) are always configured, while other pools are only shown if you have configured them and assigned them to an active subsystem (see 8.4.1, “Separation of main storage pools” on page 82). You can directly change pool sizes for all main storage pools except the *BASE pool (system pool 2). To change the pool size, overwrite the current value in column “Pool Size (M)” with the desired value and press Enter. The *BASE pool gets the storage that is left over from the available main storage after all the other pools have their desired amount of storage assigned. The page fault rates (DB faults per second and non-DB faults per second) are an important indicator for system performance.

The WRKSYSACT command is only available when the licensed program IBM Performance Tools for i (57xx-PT1 - Manager Feature) is installed. A sample panel looks like Figure 9-3.

Work with System Activity AS0030

07/12/11 17:18:38

Automatic refresh in seconds . . . . . . . . . . . . . . . . . . . 5

Job/Task CPU filter . . . . . . . . . . . . . . . . . . . . . . . . .10

Elapsed time . . . . . . : 00:00:02 Average CPU util . . . . : 255.3

Virtual Processors . . . . : 3 Maximum CPU util . . . . . : 363.2

Overall DB CPU util . . . : 74.7 Minimum CPU util . . . . . : 150.0

Average CPU rate . . . . . : 101.2 Current processing capacity: .30

Type options, press Enter.

1=Monitor job 5=Work with job

Total Total DB

Job or CPU Sync Async CPU

Opt Task User Number Thread Pty Util I/O I/O Util

WP04 SSM10 383827 0000020B 20 115.0 2 212 56.6

WP01 SSM10 383823 00000029 20 21.9 0 1 2.2

SERVER0 SSM10 031599 00000059 26 16.4 0 1 .0

WP02 SSM10 383824 00000029 20 16.2 1 24 10.4

WP03 SSM10 383825 00000012 20 14.4 1 2 5.5

RMTMSAFETA 0 2.8 0 0 .0

More...

F3=Exit F10=Update list F11=View 2 F12=Cancel F19=Automatic refresh

F24=More keys

Figure 9-3 Output of WRKSYSACT

The Work with System Activity tool is showing the number of virtual processors and the processing capacity (entitlement), which you could otherwise only see in the Hardware Management Console (HMC). In addition, the tool lists processes and Licensed Internal Code (LIC) tasks that use most of the processor resources.

9.2.2 Performance data collection tools on IBM i

Performance data collection on IBM i is performed by the Collection Services, which are shipped as part of the operating system. You can control the Collection Services through the command line interface, application programming interfaces (APIs), the System i Navigator interface and the IBM Systems Director Navigator Performance interface (available with IBM i 6.1 or later).

In the command line interface, use the STRPFRCOL command with parameter COLPRF to start a performance collection based on a collection profile. The collection profile defines what type of performance data to collect. Multiple collection profiles are available. When running SAP applications in a virtualized environment, the suggested collection profile is *STANDARDP. The values *MINIMUM and *STANDARD are not recommended because they do not include LPAR and communication data. You can change other attributes of the performance data collection, such as the default collection interval, the cycle time and interval, and the collection retention time, by using the CFGPFRCOL command. If you want to end a performance data collection, use the ENDPFRCOL command.

The collected data is initially stored in so-called management collection objects of type *MGTCOL. They are stored in a library that you can define with the CFGPFRCOL command, typically library QPFRDATA or library QMPGDATA. After some time, typically every day at midnight, the collection is cycled, that is, the Collection Services create a new management collection object and write further performance data into the new object. You can now convert the old management collection object into database files for further analysis with the CRTPFRDTA command. After that, you can use the CRTPFRSUM command to create additional indexes on the performance database files and to allow faster access to the summary data in the performance analysis tools. In order to automate the creation of these database files, use the CFGPFRCOL command and set both the parameters “Create database files” (CRTDBF) and “Create performance summary” (CRTPFRSUM) to *YES.

When the licensed program “IBM Performance Tools for i” is installed, you can use option 2 of menu PERFORM (“Collect performance data”) to start, end, or configure the performance data collection.

With the System i Navigator interface, you can start, end, configure, and manage the Collection Services through the path Configuration and Service → Collection Services as shown in Figure 9-4 on page 112.

Figure 9-4 System i Navigator for Collection Services

The Collection Services can be started with one of the shipped collection profiles, but it is also possible to individually select what data to collect in which intervals, as shown in Figure 9-5 on page 113.

Figure 9-5 Selection panel for Collection Services options

9.2.3 Performance Data Analysis Tools on IBM i

The IBM i operating system offers three ways to analyze the data that was collected by the Collection Services: Querying the output tables of the CRTPFRDTA command, using the Graph History function of the System i Navigator, or using the “Performance Data Investigator” of the IBM Systems Director Navigator for i Performance (available with IBM i 6.1 or later). You can perform a much more detailed analysis of the collected performance data with the licensed program IBM Performance Tools for i (57xx-PT1 - Manager Feature).

The formats and columns of the Collection Services output tables are documented in the IBM i and System i Information Center at:

http://publib.boulder.ibm.com/eserver/ibmi.html

Follow the path under Systems management → Performance → Reference information for Performance → Collection Services data files. For SAP applications on IBM i in a virtualized environment, tables QAPMLPARH and QAPMSHRMP are of special interest. Table QAPMLPARH contains logical partition configuration and utilization data and is available in IBM i 6.1 and higher. Table QAPMSHRMP reports shared memory pool data and is available in IBM i 7.1 and higher when a partition is defined to use a shared memory pool. This is supported on POWER6 or later with firmware level xx340_075 or later. Table QAPMLPAR contains cross-partition data, which is collected by the IBM Systems Director. While the IBM Systems Director Server is running on another server or partition, the platform agent support for IBM i is provided with licensed program IBM Universal Manageability Enablement for i (57xx-UME).

You can display performance data in the System i Navigator through the path Configuration and Service → Collection Services. Select a collection and then choose the option Graph History. A variety of metrics can be selected for display, for example the average processor utilization, as shown in Figure 9-6. Note that the processor utilization is given in percent of the entitlement, so values larger than 100% are possible when running in a virtualized environment with a shared processor pool.

Figure 9-6 Performance Graph History for processor utilization

You can execute the Performance Data Investigator by following these steps:

1. Access this URL from a web browser:

http://<server name>:2001

2. Log in with a user ID and password.

3. Select Performance from the left panel.

4. Select Investigate Data.

5. Select Health Indicators.

6. Select a performance collection and click Display.

You will get output similar to that shown in Figure 9-7 on page 115. You can change thresholds and select a variety of views with the Select Action button.

Figure 9-7 Performance Data Investigator in the Systems Director Navigator

The licensed program IBM Performance Tools for i offers a variety of reports and analysis tools for a more convenient analysis of the performance data collected by the Collection Services. You can access these tools through the menu PERFORM or the STRPFRT command.

When using the STRPFRT command, you can select the library that contains the performance data, if it is different from QPFRDATA. In the resulting menu, you can use option 3 (“Print performance report”) to print system reports for a general overview or component reports for a closer look at individual hardware resources. In a virtualized environment, these reports show the number of virtual processors and the processor units (entitlement). Processor utilization is reported in percent of the entitlement, so values above 100% are possible. The system report and the component report for disk activity include the disk service time, which can be an important indicator of a potential bottleneck when using external storage or a Virtual I/O Server.

Option 7 (“Display performance data”) in the Performance Tools menu enables you to navigate through a complete data collection or a certain interval within the data collection. After selecting a data collection and an interval, you will see the overview panel shown in Figure 9-8 on page 116.

Display Performance Data

Member . . . . . . . . Q195000007 F4 for list

Library . . . . . . QPFRDATA

Elapsed time . . . : 01:00:00 Version/Release . : 6/ 1.0

System . . . . . . : AS0030 Model . . . . . . : M50

Start date . . . . : 07/14/11 Serial number . . : 65-98D42

Start time . . . . : 00:00:08 Feature Code . . . : 4966-4966

Partition ID . . . : 002 Int Threshold . . : 100.00 %

QPFRADJ . . . . . : 0 Virtual Processors : 3

QDYNPTYSCD . . . . : 1 Processor Units . : .30

QDYNPTYADJ . . . . : 1

CPU utilization (interactive) . . . . . . . . . : .00

CPU utilization (other) . . . . . . . . . . . . : 63.66

Interactive Feature Utilization . . . . . . . . : .00

Time exceeding Int CPU Threshold (in seconds) . : 0

Job count . . . . . . . . . . . . . . . . . . . : 1112

Transaction count . . . . . . . . . . . . . . . : 0

More...

F3=Exit F4=Prompt F5=Refresh F6=Display all jobs F10=Command entry

F12=Cancel F24=More keys

Figure 9-8 Display Performance Data

From this panel, you can group the data by subsystem, job type, interval, main storage pool, disk unit, or communication line, or you can display a list of all processes that were active during the collection period. From that list you can see hardware resource consumption and wait times for each process individually.

9.2.4 Enhanced tools for root cause analysis

The previously mentioned tools are designed to collect data continuously, but they can only display summary information. This may be sufficient to identify, for example, lack of processor resources as the main cause of poor performance, but it does not help you to identify the software components that are using most of this resource. For a more in-depth analysis, you can use the following tools:

•Performance Explorer (PEX)

•IBM i Job Watcher

•IBM i Disk Watcher

Performance Explorer (PEX)

The Performance Explorer allows very specific and comprehensive data collections in the shape of statistics or traces, or based on profile information. The data collection can be system wide or process specific. Usually it produces a considerable amount of output, so it should only be active for a short amount of time. You define the type and amount of data collected with the ADDPEXDFN command. You start and stop the actual data collection with the commands STRPEX and ENDPEX. You can evaluate the data with the PRTPEXRPT command or with the tool “IBM iDoctor for IBM i”. Beginning with IBM i 6.1, all components except for “IBM iDoctor for IBM i” are included in the base operating system.

IBM i Job Watcher

The IBM i Job Watcher can be used to collect call stacks, SQL statements, wait statistics and objects being waited on for all processes on the system, based on periodic snapshots. The data collection can be system wide or process specific and should only be active for a short amount of time. You define the type of data collected with the ADDJWDFN command. You start and stop the actual data collection with the commands STRJW and ENDJW. You can evaluate the data with the tool “IBM iDoctor for IBM i”. Beginning with IBM i 6.1, the commands ADDJWDFN, STRJW, and ENDJW are shipped with the base operating system.

IBM i Disk Watcher

The IBM i Disk Watcher can be used to collect data for I/O operations to disk units along with data that makes it possible to identify which processes have caused them and which objects were related. The data collection is system wide, but you can limit it to a specific auxiliary storage pool (ASP). You define the type of data collected with the ADDDWDFN command. You start and stop the actual data collection with the commands STRDW and ENDDW. You can evaluate the data with the tool “IBM iDoctor for IBM i”. Beginning with IBM i 6.1, the commands ADDDWDFN, STRDW, and ENDDW are shipped with the base operating system.

More information about these tools can be found in the IBM i and System i Information Center at:

http://publib.boulder.ibm.com/eserver/ibmi.html

under Systems management → Performance → Applications for performance management → Performance data collectors. Information about the IBM iDoctor for IBM i can be found at:

http://www-912.ibm.com/i_dir/idoctor.nsf/iDoctor.html.

This page also contains links to Application and Program Performance Analysis Using PEX Statistics on IBM i5/OS, SG24-7457 and to IBM iDoctor for iSeries® Job Watcher: Advanced Performance Tool, SG24-6474.

9.3 Other monitoring tools

In this section, we cover other monitoring tools, such as LPAR2rrd and Ganglia.

9.3.1 LPAR2rrd

To monitor a virtualized environment, LPAR2rrd provides a good overview of used processor and memory resources across LPARs with minimal overhead. The advantage of this tool is that it does not require that you install agents into the LPARs on the system. Instead it evaluates, accumulates, and presents the utilization data from the HMC. It is independent from the operating systems running inside the monitored LPARs. The tool itself has some restrictions on where it can run and these are documented here. Because the HMC does not gather utilization data from inside the LPARs (intentionally), this tool is perfectly suited for environments that use mostly shared pool LPARs. After you set up the tool, the only activity to add new physical machines to the monitoring environment is to enable gathering of utilization data for the server on the HMC.

The features of this tool are:

•It creates charts based on utilization data collected on HMCs.

•It is intended mainly for micropartitioned systems.

•It does not require to install clients into LPARs.

•It is independent from the OS installed in the LPARs.

•It is easy to install and configure.

•It can also deal with LPARs moved with Live Partition Mobility (for example, Ganglia has some issues with this case).

In addition to these advantages, here are more facts:

•It will not provide processor utilization data for dedicated LPARs.

•For Dedicated Shared LPARs, it shows only how many cycles are donated to the shared pool. Because these are typically the idle cycles of an LPAR, you get a pretty good impression of the processor usage in a Dedicated Shared LPAR (in contrast to the dedicated LPAR).

The representation of the gathered data occurs through a small web application that provides a simple navigation through all machines and LPARs in your environment and presents the resource usage data as graphs. Figure 9-9 shows one example of a machine using mainly LPARs in the shared pool. It shows the accumulated processor usage of all LPARs in the shared pool.

Figure 9-9 Web front end display of LPAR2rrd

The tool provides similar graphs also for each single LPAR and for different time windows (day, week, month, and year). Current information about the latest available version, more images, an explanation of the output, and detailed installation and set-up descriptions are provided at:

http://www.ibm.com/developerworks/wikis/display/virtualization/LPAR2rrd+tool

LPAR2rrd requires a UNIX system with the following prerequisites:

•Web server (for example, Apache)

•Perl

•ssh

•RRDTool

LPAR2rrd was tested with AIX 5.3/5.2, SuSE Linux, RedHat and Ubuntu on Intel. It is not necessary to dedicate the UNIX system exclusively for the tool.

To retrieve the LPAR utilization data from the HMC, the HMC version that you are using must be greater or equal to V5R2.1, and the firmware on the managed system must be at least on SF240_201.

You can also use the web front end of the HMC to enable the collection of utilization data. Our original procedure describes how to do it on the command line in the HMC.

Figure 9-10 through Figure 9-14 on page 120 show the procedure of collection of utilization data on an HMC with the version V7R7.3:

1. In System Management, select the tasks of a specific server, as shown in Figure 9-10.

Figure 9-10 HMC procedure

2. Select Operations → Utilization Data, and choose the option Change Sampling Rate, as shown in Figure 9-11. When you choose this option, you can select the sampling frequency for your data.

Figure 9-11 HMC - select sampling rate

3. Alternately, instead of choosing the option Change Sampling Rate, you can choose the option View as shown in Figure 9-12 on page 120. Using this option, you can verify that the HMC is gathering the data as expected. After choosing the View option you can filter the events that you want to see.

Figure 9-12 HMC - filter events

A list of events is displayed, as shown in Figure 9-13, that matches your criteria. You can also show the detail information that is available for each event.

Figure 9-13 HMC - select events

As shown in Figure 9-14, we look at the system data, which is relatively static.

Figure 9-14 HMC - select system

You can also look at other panels such as the LPAR and pool information. Those panels provide very detailed information. Figure 9-15 shows an example of an overview of the utilization of all partitions.

Figure 9-15 HMC: Partition Utilization

In this book, we do not describe the installation procedure for LPAR2rrd itself because the instructions are provided in a simple and straightforward manner on the website at:

http://www.ibm.com/developerworks/wikis/display/virtualization/LPAR2rrd+tool

9.3.2 Ganglia

Ganglia is an open source distributed monitoring system that was originally targeted at High Performance Computing (HPC) clusters. Ganglia relies on a hierarchical design to scale well, especially in large environments. The freely available code runs on many different operating systems and hardware platforms. Ganglia is easy to set up and can be used through a web front end with any web browser. Unlike most open source monitoring tools, Ganglia is also aware of the virtualized Power Systems environment. More and more clients use Ganglia to monitor their IT landscape with both SAP and non-SAP systems.

Architecture overview

The Ganglia monitoring system collects and aggregates performance data from different computing systems. Ganglia defines three terms that describe the level of detail of collected data in a hierarchical structure:

•Node

•Cluster

•Grid

A node is the smallest entity where the actual performance metrics get collected. Usually a node is a small physical system that does a certain task or specific calculation. One or more nodes doing the same task participate in a cluster, and several different clusters make up a grid. Concerning time information, Ganglia aggregates per default performance data in the following resolutions: last hour, last day, last week, last month, and last year.

In different environments than HPC, you can easily translate Ganglia terms to corresponding terms used therein. In a virtualized environment, you can see a node as an equivalent to a single operating system instance, regardless of the physical occurrence of the node itself. Therefore, a node can be a virtual machine, or in Power Systems terms, a logical partition (LPAR). A cluster is then a group of nodes, or in our case, LPARs that are running on the same physical system. A grid is just a group of physical systems.

Table 9-1 shows the translation of Ganglia terms to other environments.

Table 9-1 Ganglia terms¹

Ganglia terms	HPC environment	Data Center	Virtualized environment
Node	Physical system	Physical system	Virtual machine (VM) or LPAR
Cluster	Systems with the same task	Grouping of platforms or system type (for example, Web services or all UNIX systems)	All VMs or LPARs on the same physical system (CEC)
Grid	Group of clusters	All clusters in data center	Multiple CECs

The Ganglia monitoring system consists of four main components:

•Ganglia monitoring daemon

•Ganglia meta daemon

•Data store

•Web front end

Figure 9-16 on page 123 shows the relationship between the four components.

Figure 9-16 Ganglia architecture overview²

The Ganglia monitoring daemon (gmond) runs on every node to be monitored. It is a single binary with a single configuration file, /etc/gmond.conf.

According to the configuration, the monitoring daemon collects all supported performance metrics on the node repeatedly in certain intervals of time. All monitoring daemons within a cluster send the collected data on to one designated monitoring daemon within the same cluster.

The Ganglia meta daemon (gmetad) collects all of the data from clusters and stores it into data files. The meta daemon has just one single configuration file, /etc/gmetad.conf.

This configuration file lists the available data sources. These data sources are usually monitoring daemons that represent one cluster each. In more complex environments, you can have cascaded meta daemons forming a hierarchy according to the three mentioned entity levels.

The data files are in the round robin database (RRD) file format. RRD is a common format for storing time series data and is used by many open source projects. The RRDtool Web site provides you with a very good description of using the RRD file format for generating customized graphs at:

http://oss.oetiker.ch/rrdtool/

The web front end runs on a PHP-enabled web server and needs access to the RRD data files of the meta daemon. In most cases, you run the meta daemon and the web server on the same machine. The web front end generates performance graphs on the grid, cluster, and node level in the specified time frame. Hence, the web front end is based on the PHP scripting language; therefore, you can easily customize the interface and the shown graphs.

Through a separate executable called gmetric, you can insert additional performance metrics that are unknown to the Ganglia monitoring daemon.

Ganglia and Power Systems virtualization

Although Ganglia already runs on a variety of platforms, the current AIX and Linux implementation of Ganglia lacks the support for IBM Power Systems processor-specific metrics. To fully support the Power Systems platform, the relevant portion of Ganglia source code was adapted by Nigel Griffith and Michael Perzl. They also provided a proposal to include the Power metrics into future Ganglia releases.

Table 9-2 shows the metrics that the AIX and Linux version of Ganglia supports³.

Table 9-2 Metrics of AIX and Linux supported by Ganglia

Ganglia metric terms	Value type	Description
Capped	boolean	Specifies if the LPAR is capped (0 = false, 1=true)
processor_entitlement	decimal number	Specifies the number of processor units (for example 1.5)
processor_in_LPAR	integer	Number of virtual processors for a micropartition, number physical processors for a dedicated LPAR, or standalone system
processor_in_machine	integer	Number of physical processors in the system
processor_in_pool	integer	Number of physical processors in the shared processor pool for micropartitions
processor_in_idle	decimal	Number of processor units that are idle in the shared processor pool
processor_used	decimal	Number of processor units utilized by the LPAR
disk_read	integer	Number of KBs read from disk
disk_write	integer	Number of KBs written to disk
kernal 64-bit	boolean	Specifies if the LPAR is capped (0 = false, 1=true)
Lpar	boolean	Specifies if the LPAR is capped (0 = false, 1=true
LPAR_name	string	Specifies the name of the LPAR given in the HMC
LPAR_num	integer	Specifies the LPAR number
Oslevel	string	Specifies the level of operating system
serial_num	string	Specifies the serial number of the physical system
Smt	boolean	Specifies if simultaneous multithreading is turned on (0 = false, 1=true)
SpLPAR	boolean	Specifies if the LPAR is capped (0 = false, 1=true)
Weight	integer	Specifies the weight of the LPAR

Note: For monitoring the processor_pool_idle metric in an LPAR, grant the pool utilization authority in the Hardware Management Console.

Downloading and installing Ganglia

The Ganglia source code and precompiled binaries for different platforms are available at the Ganglia project home page:

http://ganglia.info/

You can download precompiled binary packages for AIX and Linux under Power Systems with the POWER Ganglia extensions from:

http://www.perzl.org/ganglia

To install Ganglia, follow the instructions on the Ganglia project home page.

Sample usage

Figure 9-17 shows a usage example of Ganglia. Three LPARs (ls3772, ls3773, and ls3752) running on the same managed system are grouped as a Ganglia cluster (LoP 7.4).

Figure 9-17 Ganglia usage example⁴

You can access a live demo of Ganglia from:

http://monitor.millennium.berkeley.edu/

9.3.3 Tivoli Monitoring

Besides the introduced monitoring tools, IBM provides a mighty monitoring facility within its Tivoli® product family. You can get further details from the deployment guide IBM Tivoli Monitoring V6.2, SG24-7444 at:

http://www.redbooks.ibm.com/abstracts/SG247444.html

Figure 9-18 shows an example, using the CEC agent, within Tivoli Monitoring, which displays the entire processor and memory allocations per logical partition (LPAR) on a p595.

Figure 9-18 Tivoli Enterprise Portal

It is possible to view which LPARs are using the most memory and processor capacity. The workspace also shows how much processor capacity and memory is allocated to each LPAR on the p595 in the same view. Tivoli Monitoring manages your IT infrastructure, which includes operating systems, databases, and servers across distributed and host environments through a single customizable workspace portal.

9.4 Monitoring of virtualized SAP systems

In this section, we discuss:

•Motivation and challenges

•CCMS transaction st06

•SAP monitoring infrastructure

9.4.1 Motivation and challenges

With the availability of micropartitions and the shared processor pool, SAP systems can no longer rely on the traditional utilization measurements to depict the capacity consumption of the system. The SAP monitoring component (CCMS) depended on these traditional metrics, which were common to most derivatives of UNIX, to provide a common monitoring view of general system resources. In the virtualized landscape, these metrics are still available, but one can no longer rely on them in isolation to come to processor performance conclusions.

On AIX and Linux systems, the SAP operating system data collector, saposcol, presents the logical processors as processors. In a dedicated LPAR, for example, if simultaneous multithreading is active, the number of processors increases the physical processors by a factor of the number of threads. In an SPLPAR, the number of processors is a factor of the SMT threads and the online number of virtual processors. In CCMS, it was not possible to differentiate between a non-SMT system with four processors for example, an LPAR with two processors and SMT, or an SPLPAR with two virtual processors and SMT.

In SPLPARs, the processor utilization metrics change their point of reference as opposed to that of a dedicated server. The processor utilization for an uncapped micropartition is relative to its entitlement until it reaches 100% of entitlement and then it remains at 100%. If an SPLPAR is using many times its entitlement, this is not visible within CCMS. Therefore, response times can vary considerably on an SAP system, although the processor utilization remained the same: 100% utilization. This can happen if the SPLPAR got many times its entitlement sometimes, and only its basic entitlement at other times, due to oscillation in contention with other active SPLPAR workloads. This situation makes problem determination an extreme challenge.

The recording of history data for processor utilization, which also only targeted the traditional metrics, was of little value in determining actual use and trends. It was not even possible to determine from this data whether the server was dedicated, and the metrics might therefore still have some reliability, or running in virtualization, with an entirely different point-of-reference.

Virtualization brings many challenges to monitoring and these are further complicated by landscape trends within the SAP infrastructure itself. SAP CCMS collects history data based on aggregation of the hourly and daily system resource utilization. This data is used for trend analysis, regression testing, and performance monitoring. This data is recorded per application server and based on the SAP system, not the server. With the SAP “Adaptive Computing” initiative, you can move SAP components around the infrastructure landscape according to requirement, even landing on servers with different processor types: POWER4 to POWER6 for example.

Using dynamic LPAR functions, you can change the number of processors on the fly, so the number of configured processors can vary on the same dedicated server. For shared processor pools, the number of processors in the pool can change dynamically: dedicated LPARs might release processors that then join the shared pool, on-demand resources can be brought online, static LPARs might be started and thereby take processors out of the pool, and processors can be dynamically allocated to static LPARs and leave the pool.

The traditional SAP CCMS did not gather any history data on the number of processors, the processor type, or processor speed, related to the utilization data being gathered. Basically we can possibly say that it used 75% of capacity, but we cannot say what the capacity was neither in number nor in speed.

History data coming from CCMS is used by capacity planning tools, accounting methods, and for problem determination support. When this data is no longer relevant, or is even misleading, there are far reaching consequences.

The new integration of virtualization views into CCMS addresses these issues and re-establishes order in what threatened to become chaos. The following section describes the approach to take to integrate virtualization view into CCMS and related tools.

9.4.2 SAP CCMS operating system monitor

The new SAP operating system monitor provides additional detailed configuration and metric information for virtualization and other technologies that could not be viewed previously. In addition, the layout has been redesigned for improved accessibility, as seen in Figure 9-19 on page 129.

Figure 9-19 Example of st06, standard view⁵

Depending on the SAP Basis release and the support packages installed, the new SAP operating system monitor might be called ST06, OS07, or OS07n. For the remainder of this section, we refer to the new operating system monitor as ST06.

Tip: Refer to the following SAP Notes to determine the appropriate transaction call.
1084019: st06: New operating system monitor
1483047: “Old” version of operating system monitor not available

In Figure 9-20 on page 130, it can be seen that in addition to the new tabular layout, there are various subsections such as “processor”, processor Virtualization Virtual System”, “Memory”, and so on, each containing a set of configuration values and metrics. In the standard view, the sections and number of values and metrics are limited to the minimum, which should help to identify whether problems currently exist in the running system. The expert view, which provides the complete metric view, can then be used to resolve the issue. Figure 9-20 shows the expert view.

Figure 9-20 Example of ST06, expert view⁶

Most of the new values and metrics that can be seen in the new st06 are defined and delivered by the SAP operating system collector, saposcol. This makes it possible to provide new metrics to st06 without the need to apply new Basis Support Packages, but instead simply by updating saposcol. So, as new PowerVM and operating system technologies become available, it is recommended to keep up-to-date with the latest saposcol version, which usually can be found in the most recent kernel releases. Note that saposcol is downward compatible with respect to the SAP kernel release.

Tip: Refer to the following SAP Note on where to download the latest appropriate version of saposcol for AIX.
710975: FAQ: which saposcol should be used on AIX

The saposcol version information consists of two parts: a platform-independent version string and a platform-dependent version string.

For example,

$ saposcol -v | grep version

SAPOSCOL version COLL 21.03 720 – AIX v12.51 5L-64 bit 110419, 64 bit, single threaded, Non-Unicode

In this case, the platform-independent version is 21.03, and the platform-dependent version is v12.51.

As of version v12.51, the following new subsections, configuration values, and metrics are available:

The subsection processor Virtualization Host presents processor information relevant to and/or defined on the host.

•Model: Host server model type.

•Processor: Host server processor type.

•Maximum Processor Frequency: Host server nominal processor speed.

•Pool Id: The shared processor pool number for the LPAR

•Pool Utilization Authority: This field indicates whether the LPAR has the authority to retrieve information about the shared pool, for example, the idle processor capacity in the shared pool. Possible values are:

– Granted

– Not granted

•Pool processors: This field shows the total number of physical processors in the shared pool to which this LPAR belongs.

•Pool processors Idle: This field indicates the idle capacity of the shared pool in units of physical processors. This value is only available if pool utilization authority is granted.

•Pool Maximum Size: A whole number representing the maximum number of physical processors this pool can contain.

•Pool Entitled Size: The guaranteed size of the processor pool.

Subsection processor Virtualization Virtual System presents processor information relevant to and/or defined on the LPAR.

•Current Processor Frequency: The current operating processor frequency of the LPAR.

•Partition Name: The HMC-defined LPAR name.

•Partition Id: The HMC-defined LPAR number.

•Partition Type: This field describes the type of the logical partition.

Possible values are:

– Dedicated LPAR

– Shared Processor LPAR

•SMT Mode: This field indicates if Simultaneous Multi-Threading (SMT) is active.

Possible values are:

– On

– Off

•Capped: This field indicates whether a Shared Processor LPAR can exceed its entitlement.

Possible values are:

– On

– Off

•Threads: The number of SMT threads active:

– For Dedicated LPARs, the number of logical processors equals the number of dedicated physical processors that are assigned to the LPAR, multiplied by the number of SMT threads.

– For Shared Processor LPARs, the number of logical processors equals the number of virtual processors that are defined for the LPAR, multiplied by the number of SMT threads.

•Virtual processors: This field shows the number of administrator-defined virtual processors. Virtual processors are an abstraction of processors, which are mapped to physical processors in a scheduled manner by the hypervisor. The number of virtual processors defined places an implicit limit on the number of physical processors that can be used by the LPAR.

•Capacity Consumed: This field indicates the actual computing power that the LPAR consumes in units of physical processors.

•Guaranteed Capacity: This field shows the guaranteed physical processor capacity of a Shared Processor LPAR in units of fractional physical processors.

•Guaranteed Capacity Consumed: This field indicates the ratio of the actual consumed physical processor to the entitlement as a percentage. In the case of an uncapped Shared Processor LPAR, the value can exceed 100%.

•Available Capacity: This field indicates the possible available computing power for the LPAR based on the entitlement, which is guaranteed to the LPAR and the current idle capacity in the pool. This value is only available if pool utilization authority is granted.

•Available Capacity Busy: This field indicates the ratio of physical processor user and system ticks that the LPAR consumes to the available capacity for the LPAR as a percentage. This value is only available if pool utilization authority is granted.

•Available Capacity Consumed: This field indicates the ratio of physical processors that the LPAR consumes to the available capacity for the LPAR as a percentage. This value is only available if pool utilization authority is granted.

•Additional Capacity Available: The amount of physical processor capacity that can still be attained by the LPAR.

•Weight: This value is used to determine the allocation of spare pool resources to uncapped SPLPARs.

•Capacity Maximum: The maximum amount of physical processor capacity that can be acquired by the LPAR.

Subsection processor Virtualization Virtual Container presents processor information relevant to and/or defined on the WPAR.

•Container Type: The type of WPAR.

Possible values are:

– Application WPAR

– System WPAR

•Container Name: The WPAR name.

•Physical Processors Consumed: This field indicates the actual computing power that the WPAR consumes in units of physical processors.

Subsection Memory Virtualization Virtual System presents memory information relevant to and/or defined on the LPAR.

•Memory Mode: Indicates whether AME or AMS are active.

Possible values are:

– Dedicated – Neither AME nor AMS is active

– Dedicated Expanded – AME is active, AMS is inactive

– Shared– AME is inactive, AMS is active

– Shared Expanded– Both AME and AMS are active

•AME Target Factor: The ratio between desired expanded memory and true memory.

•AME Actual Factor: The ratio between actual expanded memory and true memory.

•AME Expanded Memory: The effective memory realized by AME.

•AME True Memory: The actual physical memory available to the LPAR.

•AME Deficit Memory: If the desired expanded memory is greater than the actual expanded memory, then the difference is shown.

Tip: Refer to the following SAP Note for further information regarding PowerVM and operating system metrics.
1131691: Processor utilization metrics of IBM System p®
1379340: Enhanced processor utilization metrics on System p

Depending on the partition type and the setting of pool utilization authority (PUA), some of the new processor metrics either are not relevant or cannot be calculated. Table 9-3 summarizes which data is displayed.

Table 9-3 st06 displayed monitoring data

	Dedicated LPAR	SPLPAR with PUA not granted	SPLPAR with PUA granted
Model			
Processor			
Maximum Processor Frequency			
Pool ID			
Pool Utilization Authority			
Pool processors			
Pool processors Idle			
Pool Maximum Size			
Pool Entitled Size			
Current Processor Frequency			
Partition Name			
Partition ID			
Partition Type			
SMT Mode			
Capped			
Threads			
Virtual processors			
Capacity Consumed			
Guaranteed Capacity Consumed			
Available Capacity			
Available Capacity Busy			
Available Capacity Consumed			
Additional Capacity Available Physical processors idle			
Weight			
Capacity maximum			

The majority of these metrics are concerned with processor utilization monitoring. Although the traditional global processor utilization metrics found in st06 are useful for Dedicated LPARs, they can provide misleading information in the context of Shared Processor LPARs. Although the number of logical processors remains constant, the underlying number of physical processors to which these utilization percentages refer is not constant in the case of a Shared Processor LPAR. For example, in Figure 9-21 on page 135, components for user, system, and I/O wait and idle are identical at 10 am and 12 am. However, the actual physical processor consumption by the LPAR is very different. We cannot simply compare the utilization percentages, as is traditionally done.

Figure 9-21 Physical processor Consumption versus Global Logical processor Utilization

Likewise, traditional rules of thumb might say that a system is nearing a processor bottleneck if user% + system% > 90% cannot be applied to an SPLPAR.

For an SPLPAR, we need to consider whether more physical processor resources are available for use, as shown in Example 9-19. Additionally, we need to consider whether the SPLPAR can acquire these resources based on configuration and restrictions.

Example 9-19 Available resources

available resources = MAX(entitlement, physical processors consumed + pool idle processors)

acquirable resources = MIN( entitlement, online virtual processors ) if LPAR is capped or weight=0

= MIN( shared pool processors, online virtual processors ) otherwise

The metrics available capacity and available capacity consumed encapsulate these considerations and display that the total physical processor resources are available to and can be acquired by a Shared Processor LPAR.

Example 9-20 Available capacity and available capacity consumed

available capacity = MIN(available resources, acquirable resources)

available capacity consumed = (available capacity /physical processors consumed) * 100

available capacity busy = (physical processors(user + system) consumed / available capacity) * 100

Available capacity consumed is based on physical processors consumed, physc, while available capacity busy is based on only the user and systems ticks physical processors consumed, physb. The difference between physc and physb is discussed in 8.6.6, “Side effects of PURR based metrics in Shared Pool environments” on page 94.

If the available capacity consumed is less than 100%, the LPAR is not bottlenecked in attaining additional physical processor resources. If available capacity consumed is 100% or near, but available capacity busy is significantly less, the LPAR is still not bottlenecked.

Special considerations on IBM i

If you are using the new SAP operating system monitor on IBM i, you must consider the following deviations from other operating systems:

•In the “processor” section, the processor clock rate (referred to as “Current Processor Frequency”) is not displayed for IBM i. You can look up the processor clock rate in Performance Capabilities Reference, SC41-0607, which can be found at:

http://www.ibm.com/systems/i/advantages/perfmgmt/resource.html

•The document lists the processor clock rates based on the processor type, model and feature code. You can find this information in the OS07 output in category “processor Virtualization Host”. Section “Model” is showing the processor type and model, section “Processor” is showing the feature code and serial number. In the processor section under category processor Single, you will not find a breakdown of the processor utilization over the configured processors. The entry for processor 0 will show averages for the processor utilization in all configured processors, while the other entries are showing -1 to indicate that the value is not available.

• At the time of writing, the Memory section in the new operating system monitor did not show data for the configured main storage pools on IBM i. That data is still available in the old transactions ST06, OS06, or OS07. If you are using an SAP release and support package level that routes the transaction ST06 per default to the new operating system monitor, you can still look at the old data by executing the ABAP program RSHOST10 through transaction SE38.

9.4.3 SAP Monitoring Infrastructure

The SAP Monitoring Infrastructure (CCMS) consists of several pieces and parts, which include the previously described transaction for the operating system monitor. In this section, we describe all of the pieces of this infrastructure that are relevant for virtualized environments: how to install and properly configure certain parts and how to use integrated features to create reporting (also across systems).

For virtualized environments, the following parts of the SAP Monitoring Infrastructure are of special interest and are described later in detail:

•A data collector for operating system metrics. This collector is the program saposcol, which is part of every SAP installation.

•An agent, which can be used to transport the data that saposcol gathers to another system in case the data collector and the monitor for the data are not running on the same system. This agent is the program sapccmsr and is called the CCMS agent.

•A data repository to store the gathered metrics in the SAP system. The CPH (Central Performance History) is the repository that is used.

•A transaction to view the gathered metrics. This is the Transaction st06, which we described in 9.4.2, “SAP CCMS operating system monitor” on page 128.

•An infrastructure to create reporting on the data beyond the available views in transaction st06. Together with the Cst06entral Performance History, the data collectors and the agents that are running on various systems, this infrastructure allows you to create views about the system level and not only for a single LPAR.

The easiest scenario of monitoring virtualized environments with the SAP Monitoring Infrastructure is a single LPAR, which certainly can be also a shared pool LPAR, as shown in Figure 9-22. There are no special configurations necessary to monitor this LPAR, other than a few adjustments that you must make in the CPH if historical data is stored. In this case, you need no CCMS agent because the data that is transferred between saposcol and the monitoring system can occur through a shared memory segment.

Figure 9-22 SAP monitoring a single LPAR

You can also monitor a set of LPARs with a single monitoring system using the SAP Monitoring Infrastructure. In this case, you have a data collector in every LPAR and a CCMS agent to transfer the gathered metrics to the central monitoring system. Figure 9-23 on page 138 is one example of such an infrastructure.

Figure 9-23 SAP monitoring overview

The transaction st06 shows, on the left side of the window, a list of all systems where monitoring data is available. Although you can access this data from multiple systems through one transaction window, you do not get an aggregated view of the available data. Nevertheless the SAP Monitoring Infrastructure provides mechanisms to achieve this goal by using reporting on the CPH. In “Creating reporting on the Central Performance History” on page 153, we describe a simple example of how to utilize this framework. With this functionality, you can create reports that aggregate data across multiple systems.

saposcol and the CCMS agent

The operating system data collector and the CCMS agent are independent executables. They are not bound to an SAP instance and can also be used to monitor an AIX, IBM i, or Linux partition without an SAP installation. A special case is the Virtual I/O Server because it usually does not allow you to install external software.

In SAP releases prior to 7.10, saposcol and the CCMS agent are usually installed automatically as part of an SAP installation. Beginning with Release 7.10, saposcol is shipped with a separate package called SAPHOSTAGENT. This package is installed together with any SAP installation in Release 7.10 or higher, but it can also be installed separately on a server or partition with older SAP releases or without any SAP instance.

Because of the restrictions in the Virtual I/O Server, we discuss the installation of saposcol and the CCMS agents on non-VIOS partitions and VIOS partitions separately.

Installing in a non-VIOS LPAR

Step 1: Installing saposcol and the CCMS agent

1. Download and install the SAPCAR tool from the SAP Software Distribution Center. For AIX and Linux, follow the instructions in SAP Note 212876. For IBM i, follow the instructions in SAP Note 863821.

2. Follow the instructions in the attachment of SAP Note 1031096 to download and install the SAPHOSTAGENT package. This creates the directory /usr/sap/hostctrl with subdirectories exe (for the executables) and work (for traces, log files and other data). It will also create the user sapadm (IBM i: *USRPRF SAPADM) that can be used to administer saposcol and the CCMS agent.

3. Download the CCMAGENT package from the SAP Software Distribution Center and rename it to ccmagent.sar. On AIX and Linux you can decompress the archive with the command:

sapcar -xvf ccmagent.sar

On IBM i, follow the instructions in SAP Note 1306787. Copy the sapccmsr executable to directory /usr/sap/hostctrl/exe.

Step 2: Starting saposcol

If you follow the installation instructions in the attachment of SAP Note 1031096, SAPHOSTAGENT and saposcol along with it will be started automatically at each IPL or reboot of the partition. If you need to start saposcol manually, follow these steps:

On AIX and Linux:

1. Switch to the sapadm user that you created during the previous installation steps:

# su - sapadm

2. Change to the path where saposcol resides:

# cd /usr/sap/hostctrl/exe

3. Launch saposcol:

# saposcol -l

On IBM i:

1. Sign on with a user profile that has *JOBCTL and *ALLOBJ authorities.

2. Launch saposcol by calling the executable:

CALL PGM(R3SAP400/SAPOSCOL) PARM('-l')

Step 3: Configuring and starting the CCMS agent

An SAP user with sufficient RFC permissions is required to run the CCMS agent. The easiest way to create such a user is to use the transaction RZ21 and create the CSMREG user on the SAP instance that you use for monitoring, as shown in Figure 9-24 on page 140.

Figure 9-24 Create CSMREG user from RZ21⁷

In the next panel enter the desired password, and press Enter. The CSMREG user is created, as shown in Figure 9-25 on page 141.

Figure 9-25 CSMREG user created⁸

Besides the CSMREG user, another SAP user with admin privileges on the central monitoring system is required to complete the CCMS configuration step. This user can be the user DDIC or any other user with administration privileges.

To configure the CCMS agent the following information is needed:

• Host name, SAP system ID, and client number for each central system to which the agent reports

• An SAP user with Admin privileges and password

• The CSMREG user and password

• Language used

• Host name of application server

• System number (00 - 98)

• Route string [optional]

• Trace level [0]

• Gateway information if not the default

To start the CCMS agent registration and configuration on AIX or Linux, enter the following commands:

# su - sapadm

# cd /usr/sap/hostctrl/exe

# sapccmsr -R

To start the CCMS agent registration and configuration on IBM i, sign on with a user profile that has *JOBCTL and *ALLOBJ special authorities and enter the following commands:

CD DIR('/usr/sap/hostctrl/exe')

CALL PGM(R3SAP400/SAPCCMSR) PARM('-R')

The configuration is straightforward when you use predefined options and information. In the rare case of a wrong configuration, the easiest thing to do is to remove the configuration file /usr/sap/tmp/sapccmsr/csmconf and run sapccmsr in configuration mode again.

To start the agent automatically after each restart of an AIX or Linux partition, insert the following command into /etc/inittab:

/usr/sap/hostctrl/exe/sapccmsr -DCCMS

To start the agent automatically after each restart of an IBM i partition, enter the following commands to the system startup program that is defined in system value QSTRUPPGM:

CD DIR('/usr/sap/hostctrl/exe')

CALL PGM(R3SAP400/SAPCCMSR) PARM('-DCCMS')

Installing in a VIOS LPAR

Being an appliance partition, the VIOS requires careful installation treatment. For this reason, a specially created “saposcol for vios” package has been put together to assist installation. This package has been explicitly tested and approved by IBM to be installed on the VIOS.

Tip: Refer to the following further details and support information

•SAP OSS note 1379855: Installation of saposcol and sapccmsr on IBM VIOS

•IBM white paper: “Include VIOS Partitions into SAP Performance Monitoring”:

http://www.ibm.com/support/techdocs/atsmastr.nsf/WebIndex/WP101541

Step 1: Installing saposcol and the CCMS agent

1. Download the appropriate install package, sapviosagent.sar, from SAP Service Marketplace. For example, the 7.20 package can be on SAP Service Marketplace at:

Support Packages and Patches → ? A-Z Index → N → SAP NETWEAVER → SAP EHP2 FOR SAP NETWEAVER 7.0 → Entry by Component → Application Server ABAP

→ SAP KERNEL 7.20 64-BIT → AIX 64Bit → #Database independent

2. Unpack the SAP install package to get the AIX bff-installp file

# SAPCAR -xvf sapviosagent.sar

SAPCAR: processing archive sapviosagent.sar (version 2.00)

x ./tmp/sapviosagent.com.sap.7.20.0.97.bff

SAPCAR: 1 file(s) extracted

3. Transfer the bff file to the VIO Server usinf ftp.

# ftp is314v1

Connected to is314v1.wdf.sap.corp.

220 is314v1 FTP server (Version 4.2 Tue Sep 14 20:17:37 CDT 2010) ready.

Name (is314v1:root): padmin

331 Password required for padmin.

Password:

230-Last unsuccessful login: Tue Jul 19 13:14:18 DFT 2011 on ssh from is3089.wdf.sap.corp

230-Last login: Tue Jul 19 16:41:59 DFT 2011 on ftp from is3110.wdf.sap.corp

230 User padmin logged in.

ftp> bin

200 Type set to I.

ftp> put sapviosagent.com.sap.7.20.0.97.bff

200 PORT command successful.

150 Opening data connection for sapviosagent.com.sap.7.20.0.97.bff.

226 Transfer complete.

12082688 bytes sent in 0.298 seconds (3.96e+04 Kbytes/s)

local: sapviosagent.com.sap.7.20.0.97.bff remote: sapviosagent.com.sap.7.20.0.97.bff

ftp> quit

221 Goodbye.

4. Log in to the VIOS as user padmin and obtain a root shell by executing the following command:

$ ssh padmin@is314v1

padmin@is314v1's password:

Last unsuccessful login: Tue Jul 19 13:14:18 DFT 2011 on ssh from is3089.wdf.sap.corp

Last login: Tue Jul 19 16:42:58 DFT 2011 on ftp from is3110.wdf.sap.corp

$ oem_setup_env

5. Install the bff file

# inutoc .

# installp -acXY -d . sapviosagent.com.sap

+-----------------------------------------------------------------------------+

Pre-installation Verification...

+-----------------------------------------------------------------------------+

Verifying selections...done

Verifying requisites...done

Results...

SUCCESSES

---------

Filesets listed in this section passed pre-installation verification

and will be installed.

Selected Filesets

-----------------

sapviosagent.com.sap 7.20.0.97 # Installation saphostcontrol

<< End of Success Section >>

+-----------------------------------------------------------------------------+

BUILDDATE Verification ...

+-----------------------------------------------------------------------------+

Verifying build dates...done

FILESET STATISTICS

------------------

1 Selected to be installed, of which:

1 Passed pre-installation verification

----

1 Total to be installed

+-----------------------------------------------------------------------------+

Installing Software...

+-----------------------------------------------------------------------------+

installp: APPLYING software for:

sapviosagent.com.sap 7.20.0.97

group sapsys does not exist

user sapadm does not exist

user sapadm not member of group sapsys

total 0

-rwxr-xr-x 1 1459 15 3059109 Jul 18 22:11 /usr/sap/hostctrl/exe/saposcol

13:42:23 19.07.2011 LOG: Using PerfDir (DIR_PERF) = /usr/sap/tmp

******************************************************************************

* This is Saposcol Version COLL 21.03 720 - AIX v12.51 5L-64 bit 110419

* Usage: saposcol -l: Start OS Collector

* saposcol -k: Stop OS Collector

* saposcol -d: OS Collector Dialog Mode

* saposcol -s: OS Collector Status

* Starting collector (create new process)

******************************************************************************

13:42:30 19.07.2011 LOG: Using PerfDir (DIR_PERF) = /usr/sap/tmp

INFO: New Saposcol collection interval is 60 sec.

Finished processing all filesets. (Total time: 8 secs).

+-----------------------------------------------------------------------------+

Summaries:

+--------------------------------------------------------------

The installation of the package automatically does the following:

•Adds the necessary user and group.

•Creates the appropriate directory paths and places the saposcol and sapccmsr binary in those paths.

•Starts the saposcol process.

Step 2: Configuring and starting the CCMS agent

This step is identical to Step 3 of “Installing in a non-VIOS LPAR” on page 139.

Configuring the RFC connection for the CCMS agent

To monitor operating system metrics in a distributed virtualized environment, you must transfer the data that saposcol gathers and the CCMS agent to the central monitoring system. The underlying technology is based on RFC connections. Therefore, configure those RFC connections in advance, as shown in Figure 9-26 on page 145.

To configure the RFC connection for the CCMS agent:

1. Start the transaction AL15 on the central monitoring system.

2. Insert the saposcol-Destination on the remote LPAR: SAPCCMSR.<Hostname>.99.

3. To modify a destination, press Modify.

Figure 9-26 Modify the saposcol destination⁹

Now the remote LPAR monitoring information is visible, for example in the transaction st06, as shown in Figure 9-27 on page 146. There we added the RFC connection for the system is3015.

Figure 9-27 Report of a remote LPAR in st06¹⁰

Activating and customizing the Central Performance History

The Central Performance History (CPH) stores historical data. Before you can use this functionality, you must activate and configure the CPH. In this example, we use, in most cases, the transaction RZ21, but also transaction RZ23n is a valid entry point into the CPH functions.

To activate the CPH call transaction RZ21:

1. Select Technical Infrastructure → Central Performance History → Background Jobs, as shown in Figure 9-28 on page 147.

Figure 9-28 Activate CPH from RZ21¹¹

2. If jobs are already scheduled, a list with those jobs is displayed; otherwise, confirm the activation of the CPH.

After this basic activation, you can define the granularity of data collection. SAP provides on help.sap.com, which has detailed instructions about what you can do beyond the example that we give in this book:

http://help.sap.com/saphelp_nw2004s/helpdata/en/c2/49773dd1210968e10000000a114084/content.htm

We are following the proposed steps from SAP to give a simple example:

1. Define how long and with which granularity performance data must be retained in the CPH.

2. Select the performance metrics to be stored in the CPH.

Defining the length and granularity of data that is stored in the CPH

This setting is defined in the Collection and Reorganization Schemata. SAP delivers predefined schemata that meets most requirements. The schemata also contains information about the calculation rules that are to be used to aggregate information. When you do this, you can weight the different hours of the day and also different days differently, to hide public holidays or hours during the night during the calculation of the aggregates.

In this example, as shown in Figure 9-29, we show how to define a day and a collection schema:

1. We start to call the transaction RZ21 and select Technical infrastructure → Central Performance History → Assign Collect/Reorganization Schemata.

Figure 9-29 SAP GUI menu path¹²

2. In the next transaction window, select Goto → Collection/Reorganization Schema, and switch to the Advanced Configuration window, as shown in Figure 9-30. An alternative path calls RZ23n, and select Maintain Schemas.

Figure 9-30 SAP GUI menu path¹³

3. We define our own SAPOSCOL_DAY schema to prioritize certain hours in the day, as shown in Figure 9-31 and Figure 9-32 on page 150.

Figure 9-31 Switch to the day schema configuration window¹⁴

Figure 9-32 Define which hours of the day the schema is active¹⁵

4. In the next step, we can use that day schema to define our collection schema, as shown in Figure 9-33 on page 151.

Figure 9-33 Advanced configuration window for _ZSAPOSCOL collection schema¹⁶

Selecting the performance metrics that are to be stored in the CPH

You can assign collection and reorganization schemata to any MTE classes of any systems. For further information, refer to Assigning Collection and Reorganization Schemata to Performance Values at:

http://help.sap.com/saphelp_nw2004s/helpdata/en/04/4d773dd1210968e10000000a114084/content.htm

MTE classes are the mechanism to access metrics in CCMS. To select the metrics:

1. Go either back to the transaction RZ21, and select Technical infrastructure → Central Performance History → Assign Collect/Reorganization, or call RZ23n, and choose Assign Procedure.

In the given example, as shown in Figure 9-34 on page 152, we collect for the non-SAP LPAR running our agent and collector (System ID column) the processor- and AIX-specific virtualization metrics (MTE Classes column).

2. Select the intervals in which the data is aggregated, in the Collection/Reorganization Schema.

Figure 9-34 Define the metrics to collect in the CPH from the remote LPAR¹⁷

Now the system starts to gather historical data. Depending on the configuration of the schemas in the first step, the CPH is filled faster or slower. Figure 9-35 on page 153 shows how the historical data is presented in st06.

Figure 9-35 CPH data in st06¹⁸

Creating reporting on the Central Performance History

After historical data is available in the CPH, you can use the integrated reporting framework from the SAP Monitoring Infrastructure. In the following two procedures, we describe how to use this functionality. On help.sap.com, SAP provides more detailed instructions:

http://help.sap.com/saphelp_nw2004s/helpdata/en/c2/49773dd1210968e10000000a114084/content.htm

Creating the report definitions

A report definition has two steps:

•Selecting the MTE classes

•Selecting the aggregation method, which implicitly defines the layout of the presentation view.

To select the MTE classes:

1. Switch to transaction RZ23n, and choose Define Reports.

2. Generate a report giving a cross partition view of the physical processor consummation. In our system, we have one shared pool with 15 processors and only two partitions (one LPAR with and another without an SAP instance).

3. In the Edit Report Definition window, define a new report called CROSS_PAR_PHYSprocessor_CONSUMED (name). We defined two groups. The group Phys, processors consumed summarizes the consumed processors of the two LPARs, whereas the group Shared Pool only shows the static value of how many processors are assigned to that pool, as shown in Figure 9-36. Here, we assume that both LPARs are running on the same pool.

Figure 9-36 Report editor in RZ23n¹⁹

4. Save the created report.

Executing the report

You can execute the defined report directly or schedule it as a regular job. Depending on its settings, you can display the report directly on the window or redirect the output to a file:

Return to transaction RZ23n, and choose Schedule and Execute Report. As shown in Figure 9-37, we target the output to what is only possible by our chosen method to execute the report directly by pressing Direct Execution.

Figure 9-37 Transaction RZ23n; execute and schedule report window²⁰

The output window for the report provides additional functionality to draw graphs or further data aggregation, such as summarizing columns, as shown in Figure 9-38 on page 156.

Figure 9-38 Example output of the generated report²¹

9.5 SAP EarlyWatch and SAP GoingLive check services

The SAP GoingLive and EarlyWatch services are proactive services during the implementation phase and production phase, respectively, that provide SAP system analysis and recommendations to ensure performance and availability. These services are available through SAP.

More information is available at the SAP Service Marketplace at:

http://service.sap.com/earlywatch

http://service.sap.com/goinglive

The analysis provided by these services was updated to take into account the different LPAR configuration possibilities on IBM Power Systems. You can see the virtualization analysis and details in the session details using the Solution Manager.

9.6 IBM Insight for SAP

IBM Insight for SAP is an IBM offering that provides customers with an in-depth report that details the SAP workload/performance statistics and host resource utilization. It consists of a downloadable data collector utility, the Insight Collector, and a report generation service by IBM. Both the utility and the subsequent report generation service are provided free of charge by IBM America's Advanced Technical Support (ATS) and Techline organizations. The report generation is limited to production systems and also can only be requested once per quarter per SAP SID.

The latest version, IBM Insight for SAP Version 5, was expanded to include metrics that are relevant to PowerVM virtualization, for example, for a Shared Processor LPAR, the report includes graphs over a measured period of time for the following data:

•Physical processor cores consumed

•Physical processor cores idle

•Percent of processor entitlement consumed

•Physical processor entitlement

•Physical processor available

Features of Version 5:

•Support for SAP’s Central Monitoring System (CEN) in SAP ERP 6.0 and later or SAP NetWeaver environments (base component ECC 6.0 or later).

•Support for concurrent multi-SAP systems data collection (that is, the ability to collect multiple systems within one or many landscapes concurrently in an SAP NetWeaver environment).

•Capture and report distributed statistical records (DSR) enabling the monitoring and reporting of SAP Java stack components and statistics (for example, SAP Process Integration, SAP Enterprise Portal, SAP Application Server JAVA, and so on)

•Capture and report a customization measure of the SAP system (reported as a percentage of custom reports and transactions registered in the production SAP system catalog).

•Capture and report IBM virtualized system statistics as seen and reported by the SAP monitoring system.

•Enhanced and reorganized report format.

•Continued reporting of all previously published ABAP environment performance statistics.

Figure 9-39 on page 158 displays a sample page from an Insight Report.

Figure 9-39 Example page from an Insight Report

9.6.1 Installing and configuring IBM Insight for SAP

The IBM Insight for SAP utility program is packaged as an all-in-one Microsoft Windows executable. You can install it on any PC that has TCP/IP connectivity to the monitored production SAP system.

It does not need any additional third-party software on the collector PC or the monitored servers nor does it require any additional transports to be applied to the SAP system.

The Insight download packages and detailed information about the installation and configuration (including a must-read Readme file) are at:

http://www.ibm.com/erp/sap/insight

9.6.2 Data collection

After installed and configured with the details of the SAP system to be monitored, the Insight Collector is manually triggered to start and stop data collection, which is accomplished using RFC communication and is designed to make minimal impact on the monitored system. It is recommended that you collect at least one-to-three days of statistics during a peak period or a period of the month with a reasonably high usage to improve the quality and value of the report.

9.6.3 Report Generation Service

At the end of a data collection session, the Insight Collector can package the session data into a single compressed file that is ready to be emailed or FTP’ed to IBM. Analysis takes a minimum of three business days.

If you have questions about this utility, contact: mailto:[email protected].

¹ Ganglia, reprinted by permission.

² Ganglia, reprinted by permission.

³ Ganglia, reprinted by permission

⁴ Ganglia, reprinted by permission.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.

Table of Contents for Chapter 9. Monitoring

Create new playlist

Sign In

Sign Up

Table of Contents for
Chapter 9. Monitoring