SAP system setup for virtualization
In this chapter, we discuss:
SAP Adaptive Computing
SAP instance profile changes
Virtual memory tuning in AIX for SAP systems
Main storage pools, work process priorities and workload capping on IBM i
Active Memory Expansion (AME)
Processor utilization metrics
8.1 SAP Adaptive Computing
The central idea of SAP Adaptive Computing is the virtualization of the following layers: services (SAP application components), computing resources, and storage, as shown in Figure 8-1. This complements the virtualization technologies of IBM PowerVM and leads to increased flexibility and easier management of planned downtimes.
SAP covers the subject on the SAP Community Network (SDN) at:
Figure 8-1 SAP Adaptive Computing concept1
8.1.1 Overview
One challenge in today's IT environments is to contain a growing Total Cost of Ownership. To address it, SAP developed a concept with the central objective to “start any service on any server at any time”. The Adaptive Computing infrastructure is an integral component of SAP NetWeaver.
A service, in this respect, is an SAP product component, for example, the central system of SAP ERP, an application server in a BW landscape, or a DB server.
The concept consists of the following four building blocks:
Computing (physical and/or virtual): Standard servers or LPARs are called compute nodes here. A central location for operating system maintenance and image distribution is also recommended; in the case of AIX, this is done with the Network Installation Manager (NIM).
Storage: Central storage is set up following a SAN or NAS concept. The compute nodes should not contain local disks; only the operating system and paging space can reside on local disks, if desired. At least all of the application-relevant data must reside on a storage device that is external to the server. Recommended storage solutions that substantially simplify storage administration are distributed file systems (such as IBM General Parallel File System) or Network Attached Storage solutions.
Network: High-speed network connection between the servers and the storage (for example, Gigabit Ethernet)
Control: The Adaptive Computing Controller (ACC) is an infrastructure-management solution that communicates with different components of the infrastructure.
The ACC is a Java-based application that manages the resources (servers and services) that are known in a system landscape. From the ACC an application service (for example, a central instance of an SAP application component) can be started, stopped, or moved from one compute node to another. With the classic ACC functionality, called relocation, this move involves a disruption: the service is stopped on one node and restarted on another.
It is also possible to trigger a Live Partition Mobility operation from the SAP ACC. An interface to the IBM HMC or IBM Systems Director is available, and with it the move happens without disruption or loss of service. Details about the interfaces are available in the following SAP Notes:
1411300 - Configuration of IBM HMC for Power Systems Adapter for ACC
1539332 - Configuration of IBM Systems Director VMControl for ACC
They are located on the SAP Support Portal at:
The supported (managed) systems in the system landscape can be 4.6C systems (using the 4.6D kernel) and higher. The management of older SAP releases is not supported.
SAP Adaptive Computing is an infrastructure implementation concept, which is realized by the partners of SAP with their technology. SAP provides the SAP ACC as an administration console to the solution.
8.1.2 Adaptive Computing compliance test
The Adaptive Computing compliance test is a quality assurance procedure that ensures that clients get solutions that are properly tested and work. It is not a certification; SAP provides support as long as generally supported hardware and software components are used in an adaptive system setup. The building blocks of the compliance test are the version of the ACC, the storage technology, and the technology of the compute nodes. A list of compliance tests is available on the SAP Community Network (SDN) at:
8.1.3 Technical implementation
A cookbook about how to implement SAP Adaptive Computing with IBM Power Systems is available on IBM TechDocs at:
8.2 SAP instance profile changes
In this section, we cover dynamic work processes and extended memory tuning recommendations.
8.2.1 Dynamic work processes
In the past, adding or removing SAP work processes to or from an application server instance required a restart of the instance. In a running instance you could only change the type of existing work processes. For example, you could define operation modes to switch a certain number of batch processes to dialog processes and vice versa. One of the more recent features that SAP ships is dynamic work processes. To use dynamic work processes, you must initially create them by changing the instance profile and restarting the application server instance.
After they are created, the system has dormant work processes that can be activated and deactivated while the system is running by:
The system itself to resolve deadlocks
The administrator to adapt the system to changing load or to address changing resource availability
This feature is especially valuable for virtualized environments, where the number of processors per LPAR can be changed dynamically. With dynamic work processes, dormant work processes can be predefined, and they do not use processor resources while they remain in that state. The administrator of the SAP system can now easily react to changing hardware situations while the SAP system stays up, which previously was not possible without a restart. This feature requires at least SAP kernel 7.10.
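As a sketch, an instance profile that creates such a reserve of dynamic work processes might contain entries like the following. The values are illustrative, and we assume rdisp/wp_no_restricted as the name of the parameter for the dormant reserve; verify the parameter names against the documentation of your kernel release:
rdisp/wp_no_dia = 10
rdisp/wp_no_btc = 4
rdisp/wp_no_restricted = 6
With these settings, the instance starts with ten dialog and four background work processes, plus six dormant work processes that can be activated later without a restart.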
To learn how to configure dynamic work processes, go to:
8.2.2 Extended memory tuning recommendations for AIX
PowerVM technology allows the available resources of an LPAR to vary dynamically over time without restarting the partition. As already described, this can occur for processor resources, for example, through properly configured shared processor pool LPARs, and for memory through DLPAR operations. Therefore, a partition can adjust its capacity to the actual workload on the system.
In this section, we describe the recommended SAP memory settings to accommodate this ability on AIX. The SAP Application Server ABAP stores all user-specific data in a so-called user context. To allow a user to be scheduled to any work process in the system, this context resides in shared memory. SAP memory management allows you to choose between different shared memory implementations by setting instance profile parameters; on AIX, the recommended implementation is the one commonly referred to as the SHM_SEGS implementation. Because this is not the default setting of the SAP kernel, ensure that the following values are set in the instance profile:
ES/TABLE = SHM_SEGS
ES/SHM_SEGS_VERSION = 2 (see also SAP Note 856848)
With this setting, each user context is stored in its own shared memory segment. This segment is attached to a work process only while the user is dispatched to it, so the user data is isolated. When the user logs off from the SAP system, the context is no longer needed and the memory can be released to the operating system. The current implementation of the SAP memory management algorithm contains several optimizations, such as caching of released shared memory segments, to improve performance. Over time, the number of shared memory segments in the system adjusts to the number of users that are active in the SAP system, and thus the memory usage adjusts to the current workload of the SAP system. You can observe this behavior directly on the system, as sketched after the following summary.
A summary of the advantages is as follows:
Caching of released segments within SAP memory management, instead of returning them directly to the operating system, so that they can be reused.
Improved handling of large user contexts and the possibility to limit the number of segments per user.
Faster context switches.
Deferred page space allocation, which allows you to allocate just enough paging space to serve the amount of currently used extended memory segments instead of allocating paging space for the maximum that you can use.
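As mentioned above, a quick way to observe this behavior is to list the shared memory segments owned by the instance owner with the standard AIX ipcs tool (a sketch; prdadm stands for the hypothetical <sid>adm user of an SAP system with the ID PRD):
ipcs -m | grep prdadm
The number of listed segments grows and shrinks with the number of users that are active in the instance.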
It is highly recommended that you use the optimized SAP memory management implementation on AIX. Table 8-1 lists the related SAP Notes:
Table 8-1 SAP Notes covering memory management on AIX
SAP Note   Title
856848     AIX Extended Memory Disclaiming
1088458    AIX: Performance improvement for ES/SHM_SEGS_VERSION
191801     AIX 64-bit with very large amount of Extended Memory (refers to SAP Note 445533 for 4.6D and older releases and SAP Note 789477 for 6.10 and newer releases)
1551143    SORT (AIX): Memory release
973227     AIX Virtual Memory Management: Tuning Recommendations
1121904    SAP on AIX: Recommendations for paging space
8.2.3 Extended memory tuning recommendations for IBM i
Beginning with SAP kernel release 6.40, there are two possible ways to allocate memory in an SAP Application Server ABAP: Memory Mapped to File and Shared Memory. Early performance tests had indicated that the method Memory Mapped to File was significantly faster than Shared Memory, so Memory Mapped to File was chosen as the default setting for all SAP systems running with a kernel release 6.40 and higher.
In IBM i 6.1, significant enhancements were made to storage management, so that the method Shared Memory is now faster than the method Memory Mapped to File. Nevertheless, the method Memory Mapped to File remains the default for two reasons:
The performance of Shared Memory only affects Roll-In and Roll-Out of user contexts in work processes. Because other transaction components are responsible for a much higher share of the response time, the overall performance improvement of Shared Memory is very small.
Prerequisites are: IBM i 6.1, SAP kernel release 7.10 or higher with certain patch levels, and the UNIX style operating system user concept as described in SAP Note 1123501.
To switch to method Shared Memory, ensure that all prerequisites are met and set the following profile parameter (all uppercase):
ES/TABLE = SHM_SEGS
You can find the complete list of prerequisites and additional profile parameters in relation to this setting in SAP Note 808607. This SAP Note references SAP Note 789477 for additional tuning information on systems with very many concurrent users. Even though the title of SAP Note 789477 points to AIX as operating system, you can use the recommendations for IBM i as well.
8.3 Virtual memory tuning in AIX for SAP systems
The memory management of AIX is highly optimized to use the installed physical memory of the server very efficiently. Looking at the free memory of a running AIX system typically shows small values for this metric, even without significant load from an application program.
Example 8-1 shows the output of the vmstat command. The fre column indicates about 4930 free 4 KB pages, that is, roughly 19 MB of free memory. Small values like this often cause concern that the system is running out of free memory for the running application, such as an SAP system. In the remainder of this section, we describe in detail how the memory management of AIX works and why a small value for free memory does not directly indicate a critical system situation. We also give recommendations for the parameter setup of the memory management.
Example 8-1 Output of the vmstat command
#vmstat -w 1 5
System Configuration: lcpu=2 mem=22528MB
kthr memory page faults cpu
------ ----------------- ------------------------------------ ---------------- -----------
r b avm fre re pi po fr sr cy in sy cs us sy id wa
1 0 4490108 4930 0 0 0 0 0 0 16 189 250 0 0 99 0
1 0 4490108 4930 0 0 0 0 0 0 16 104 247 0 0 99 0
1 0 4490107 4931 0 0 0 0 0 0 16 160 271 0 0 99 0
1 0 4490107 4931 0 0 0 0 0 0 15 109 244 0 0 99 0
1 0 4490107 4931 0 0 0 0 0 0 9 189 277 0 0 99 0
This section is based on SAP Note 973227, which is available on the SAP Support Portal at:
AIX Virtual Memory Management (VMM) is designed to exploit all available physical memory in a system. VMM distinguishes between computational memory segments (working storage segments, such as process data and stack or program text segments) and file memory segments (usually pages from permanent data files). If the virtual memory demand can be contained in real memory pages, VMM fills up the available pages with computational memory pages or file pages (as a result of I/O operations). When the number of free pages drops below a certain threshold, a page replacement process starts and attempts to free up memory pages. The page-replacement algorithm uses a number of thresholds and repaging rates to decide which pages get replaced. With the default threshold settings on AIX 5, there is a slight bias in favor of computational pages. Especially for database servers, the default settings are not optimal.
Databases buffer access to persistent data in their own cache, and as the data is typically accessed through file system I/O operations, it is also buffered by the VMM in file pages. This redundant buffering of the same data can lead to unnecessary paging. The objective for SAP systems is to keep all working storage segments in real memory, while using the unused memory segments as file system cache. Page replacement should only steal memory pages from the file system cache.
With older AIX releases, the only way to achieve this objective was to set certain thresholds, such as minperm, maxperm, and maxclient, to rather low numbers (see SAP Note 921196).
These parameters do not set the amount of memory that is used for file system cache; instead, they control the boundaries for the page replacement algorithm. If the page stealer must free up memory and the amount of memory used by file pages is between minperm and maxperm (or maxclient for JFS2 file systems), the page stealer considers the repaging rates to determine which type of pages to steal. If the repaging rate for file pages is higher than the rate for computational memory, the page stealer steals working segments, which leads to the undesired paging for SAP systems.
Later releases introduced a new parameter called lru_file_repage that allows you to turn off the check for repaging rates. With lru_file_repage set to 0, the page replacement algorithm always steals file pages if the amount of memory used for file pages is larger than the minperm setting. In AIX 6.1 this parameter is set to 0 by default; in AIX 7.1 it has been removed, but the system behaves as if the parameter were set to 0. With this new parameter, the recommendations for the VMM page replacement tunables are:
minperm% = 3 (default 20)
maxperm% = 90 (default 80)
maxclient% = 90 (default 80)
lru_file_repage = 0 (default 1)
strict_maxclient = 1 (default 1)
strict_maxperm = 0 (default 0)
You can set the parameters using the smitty fastpath “TunVmo” or directly using the following command:
/usr/sbin/vmo -p -o minperm%=3 -o maxperm%=90 -o maxclient%=90 \
  -o lru_file_repage=0 -o strict_maxclient=1 -o strict_maxperm=0 \
  -o minfree=960 -o maxfree=1088
The new parameter recommendations are not restricted to DB servers and can be implemented on all servers in the landscape (including application servers in a 3-tier implementation). In AIX 6.1, the recommended settings are the defaults for each new AIX installation.
With this configuration, the system can use up to 90% of its memory for file caching but favors computational pages over file pages. Page replacement does not steal any computational pages unless the amount of computational memory exceeds 97%.
There are two more tunables that control the behavior of the page replacement algorithm: minfree and maxfree. The VMM attempts to keep the size of the free list greater than or equal to minfree. When page faults or system demands cause the free list size to fall below minfree, the page-replacement algorithm runs and frees pages until the number of pages on the free list reaches maxfree. The default values for minfree and maxfree were increased in AIX 5.3 ML01 and can be implemented for older releases too. The new default values are:
minfree = 960
maxfree = 1088
These settings are per memory pool. Larger systems can have more than one memory pool (you can check the number of memory pools with the vmstat -v command). In most cases, the defaults are fine and do not require specific tuning. The exceptions are typically small systems or LPARs with a single memory pool and heavy file system activity. Larger systems have multiple memory pools and the system wide minfree and maxfree values are large enough to prevent the depletion of the list of free memory pages. If you see the free list dropping close to 0 (check the fre column in a vmstat output), increase the minfree and maxfree values by 1024 and monitor again.
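A minimal check with standard AIX commands looks as follows (a sketch; the exact output format varies by AIX level):
vmstat -v | grep "memory pools"
vmstat -w 2
The first command reports the number of memory pools; with the second, watch the fre column over a few intervals to see whether the free list drops close to 0.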
The recommendations require the following minimum AIX levels for this parameter set to work correctly. If you are on an older level, continue to use the recommendations from SAP Note 921196. The required maintenance levels are:
AIX 5.1 ML09
AIX 5.2 ML05
AIX 5.3 ML01
Check the latest requirements for the maintenance level of the AIX version used.
8.4 Main storage pools, work process priorities, and workload capping on IBM i
As we point out in Chapter 1, “From a non-virtualized to a virtualized infrastructure” on page 1 of this book, consolidation is an important practice to reduce complexity in the IT landscape. On IBM i, the easiest way to consolidate is to run multiple SAP systems in a single logical partition or on a single server. The operating system and Licensed Internal Code provide system management functions to assign hardware resources such as processor time and main storage pages to requesting processes automatically.
In general, little configuration is needed; the operating system algorithms are designed to automate workload dispatching as much as possible. Processor dispatching is mainly controlled through the run priorities of the processes or threads, and through the activity level in a main storage pool. The Storage Management component in the Licensed Internal Code ensures that memory pages are allocated in main storage and on disk, or read from disk as they are needed. To make room for new pages, pages that have not been used for a while are purged from main storage. In many cases this concept works very well, but certain SAP landscapes call for a special setup.
8.4.1 Separation of main storage pools
The IBM i operating system runs with a minimum of two main storage pools: the *MACHINE pool for Licensed Internal Code tasks and system processing, and the *BASE pool for all application processes. In addition, you can configure additional main storage pools for special purposes: either private pools, to hold certain objects permanently in main storage, or shared pools, to separate the main storage used by a certain group of processes from the remaining processes. Private pools can be assigned to only one subsystem, while shared pools can be used by multiple subsystems at the same time. When using SAP software on IBM i, it can be helpful to set up shared main storage pools for some SAP instances or processes in the following cases:
When combining SAP Application Servers ABAP with SAP Application Servers Java in one IBM i partition, you should separate the ABAP and Java servers into different shared main storage pools. The SAP Application Server Java starts so-called garbage collections on a regular basis. During these garbage collections, many pages are purged from the main storage pool, which can affect the performance of other application servers running in the same pool.
When running multiple SAP systems in one IBM i partition and one of them is used rarely, it can be advisable to run that one in a separate main storage pool. If all SAP systems share the same main storage pool, the more active SAP systems purge many pages of the rarely used system from main storage. When the rarely used system is used again, all of these pages must be paged in from disk, so the response times of the first transactions can be very long.
SAP Note 1023092 describes how to set up shared main storage pools on IBM i. Although the short text of the SAP Note says “Using Separate Memory Pools for Java Processes”, the section “Assign a memory pool to an entire SAP instance (Java standalone)” can be used for ABAP instances as well.
You can use SAP and IBM sizing guidelines to decide what sizes are needed for the separate main storage pools. You can also find guidance in SAP Note 49201 for the SAP Application Server Java and in SAP Note 808607 for the SAP Application Server ABAP (SAP release 6.40 and higher). A sketch of the required commands follows.
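As a sketch of the mechanics only (pool, size, and subsystem names are hypothetical; follow SAP Note 1023092 for the authoritative procedure), a shared pool can be sized and assigned to the subsystem of an ABAP instance 00 of a system PRD as follows:
CHGSHRPOOL POOL(*SHRPOOL1) SIZE(8388608) ACTLVL(500)
CHGSBSD SBSD(R3PRD400/R3_00) POOLS((1 *SHRPOOL1))
The SIZE value is specified in kilobytes (8 GB in this example). The pool assignment takes effect the next time the subsystem is started.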
8.4.2 Work process priorities and workload capping
Processes that are currently not in a wait state are queued in the task dispatching queue to get access to the available processor resources. They are processed in the order of their run priority (a lower run priority value means a higher priority) and their arrival times in the task dispatching queue. Processes are removed from the processor when they reach a wait state (such as a lock wait or a wait on a disk operation), or when they reach the end of their time slice, which is defined in the process class. Traditionally, interactive processes run with a higher priority and a shorter time slice, while batch processes have a lower priority and a longer time slice. In contrast, on a typical SAP installation all work processes of all instances run at the same priority and time slice, so all processes compete against each other. This can cause problems in the following situations:
When dialog work processes are used 24 hours per day across all time zones instead of having distinct “day” and “night” operation modes, you cannot separate background work from interactive work by time of day. In this case, it can be helpful to assign different run priorities to dialog work processes and background work processes.
When running multiple SAP systems in a single IBM i partition and one of the SAP systems is using a significant amount of processor capacity, this can have a negative impact on the remaining SAP systems. For example, when an SAP instance is started, each work process requires a significant amount of processor capacity to set up its runtime environment. When an instance with many work processes is started on a partition with few processors, it can keep all available processors occupied for several seconds or even minutes.
SAP Note 45335 describes how to configure different run priorities for different types of work processes. By default, all SAP processes have a run priority of 20, which corresponds to the value 'M' for the relevant profile parameter rdisp/prio/<type>, where <type> can be upd, btc, spo, gwrd, or java. For <type> = java you can select a higher or lower priority than the rest of the instance, while for the other types you can only choose a lower priority than the other jobs. In addition, the profile parameter rdisp/prio/wp/start changes the run priority just during the startup of the instance. To change the run priority of all work processes of an instance, you can also change the run priority in class R3<sid>400/R3_<nn>, where <sid> is the SAP system ID and <nn> is the instance number, as shown in the example that follows. This can be useful if an unimportant test system and an important production system share the same partition. Note that a higher number for the run priority value means a lower run priority and vice versa.
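For example, to lower the run priority of all work processes of a test instance 50 of a hypothetical system TST from the default of 20 to 35, you could change its class object as follows (a sketch; verify the class name in your installation first):
CHGCLS CLS(R3TST400/R3_50) RUNPTY(35)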
Workload capping is supported on IBM i 6.1 with PTF SI41479 and on IBM i 7.1 with PTF SI39795 and higher. It allows you to restrict the total processor utilization of jobs or subsystems. Each SAP instance runs in its own subsystem, so you can use workload capping to limit the number of virtual processors that this instance can use. The disadvantage of workload capping is that it cannot be monitored with the SAP tools for operating system performance. The SAP operating system performance tools (the saposcol executable and transactions ST06, OS06, and OS07) monitor the overall processor utilization of a logical partition, but they cannot monitor processor utilization per subsystem. If a logical partition has four physical and virtual processors available and an instance subsystem is limited to two virtual processors by workload capping, the SAP performance tools show a processor utilization of only 50%, even though the SAP instance cannot use more processor resources.
To set up workload capping, first create a workload capping group. This workload capping group defines a maximum number of processors that can be used by the associated subsystems. For example, to create a workload capping group named SAPTEST with a maximum of two processors, use the command:
ADDWLCGRP WLCGRP(SAPTEST) PRCLMT(2)
The next step is to create a character type data area named QWTWLCGRP in library QSYS. To create the data area, enter the following command:
CRTDTAARA DTAARA(QSYS/QWTWLCGRP) TYPE(*CHAR) LEN(2000)
The data area contains a list of subsystem description names with their associated workload capping group names in pairs of 10 character names each. As an example, assume that you have a partition with four virtual processors. In the partition, a productive system is running in instance 00 (subsystem: R3_00), and a test system is running with instance 50 (subsystem: R3_50). You want the productive system to use no more than three processors, and the test system to use no more than two processors, so you have created a workload capping group SAPPROD with PRCLMT(3) and a workload capping group SAPTEST with PRCLMT(2). To set up workload capping for this set, modify the contents of the data area with the following command:
CHGDTAARA DTAARA(QSYS/QWTWLCGRP) VALUE('R3_00     SAPPROD   R3_50     SAPTEST   ')
 
Note: The names must be padded with blanks to 10 characters each. The changes take effect only after the affected subsystems are ended and started again.
When you start the SAP instances with these settings, the job logs of the subsystem control jobs will contain the message CPI146C “Subsystem <subsystem name> is using workload capping group <workload capping group name>”. Use the DSPWLCGRP command to display the processor limit that was defined for all or a specific workload capping group. More information about workload capping groups can be found in the IBM i and System i® Information Center at:
Select i 7.1 and follow the path: IBM i 7.1 Information Center → Systems management → Work management → Managing work → Managing workload capping.
8.5 Active Memory Expansion for SAP systems
The IBM POWER7 technology-based systems with AIX provide the new feature Active Memory Expansion (AME), a technology for expanding a system's effective memory capacity. Active Memory Expansion employs memory compression technology to transparently compress in-memory data, allowing more data to be placed into memory and thus expanding the memory capacity of POWER7 technology-based systems. Utilizing Active Memory Expansion can improve system utilization and increase a system's throughput.
There are multiple papers explaining in detail the functionality and performance of AME, for example:
Active Memory Expansion: Overview and Usage Guide
Active Memory Expansion Performance
A proof of concept for SAP Retail and DB2, which also used AME
AME is supported for several SAP applications. Detailed information can be found in SAP Note 1464605 - Active Memory Expansion (AME). This note also documents prerequisites and database support statements.
Various SAP applications have been successfully tested with AME, and the test results are documented in the previously mentioned papers. It turned out that ABAP application servers can benefit quite well from AME, and in some cases activating AME on the database also improves memory utilization. Due to their memory access patterns, especially during garbage collection, SAP Application Server Java-based applications cannot take full advantage of AME and are not good candidates for it.
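Before enabling AME, you can also estimate the expected benefit for a live workload with the AIX Active Memory Expansion Planning and Advisory Tool, amepat. As a sketch (check the amepat documentation of your AIX level for the exact syntax), the following invocation monitors the running workload for 60 minutes and reports projected expansion factors together with the estimated processor overhead of compression:
amepat 60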
 
Tip: If you want to get experience with AME but do not want to start right away, enable AME in the profile of the LPAR and set the expansion factor to 1.0. Then you can start testing AME at any time by changing the expansion factor with DLPAR operations without the need to reboot the LPAR.
For further details, see the white paper Active Memory Expansion Performance at:
8.6 Processor utilization metrics
Here we discuss technical details about measuring the utilization of processors in general, the effect of SMT on processor utilization metrics, and the impact on SAP applications in particular. The reason is that innovations in hardware are changing the way processor utilization and other metrics need to be interpreted.
8.6.1 Introduction
Processor time is a metric that quantifies the amount of processor resources a process used. Correspondingly, processor utilization is the ratio between used and available resources. For example, the documentation of the times() system call in the UNIX standard at
states that the field tms_utime returns the “processor time charged for the execution of user instructions of the calling process”. Intuitively, one would assume this time to be equivalent to the time a process is active on the processor. Unfortunately, this is not exactly right, and in the following sections we describe some background and the effects on processor monitoring in general, and for SAP specifically.
8.6.2 Test case
To illustrate some effects, we used a small and simple test program written in C. The following is the source code if you would like to run some experiments on your own. We assume that the source file is named test_cpu.c:
#include <sys/times.h>
#include <unistd.h>
#include <stdio.h>

static clock_t start_time;
static clock_t end_time;
static struct tms start_cpu;
static struct tms end_cpu;
static unsigned long clock_ticks;
static double run_time = 0;
static double user_time = 0;
static double sys_time = 0;
static double cpu_time = 0;

int main( int argc, char ** argv )
{
    clock_ticks = sysconf( _SC_CLK_TCK );
    start_time = times( &start_cpu );

    /* Busy loop: keep the processor occupied for 10 seconds of elapsed time */
    while ( run_time < 10 )
    {
        end_time = times( &end_cpu );
        run_time = (double)( end_time - start_time ) / clock_ticks;
    }

    /* Charged processor time = user time + system time */
    user_time = (double)( end_cpu.tms_utime - start_cpu.tms_utime ) / clock_ticks;
    sys_time = (double)( end_cpu.tms_stime - start_cpu.tms_stime ) / clock_ticks;
    cpu_time = user_time + sys_time;

    printf("Process : %d\n", (int)getpid());
    printf(" Ticks / sec : %lu\n", clock_ticks);
    printf(" Elapsed time : %2.3f sec\n", run_time);
    printf(" Charged CPU time : %2.3f sec\n", cpu_time);

    return 0;
}
You can compile the program on AIX with
cc -o test_cpu test_cpu.c
or on Linux with
gcc -o test_cpu test_cpu.c
You can create a similar program on IBM i based on the API QUSRJOBI with format JOBI1000.
This program runs continuously in a loop and calls the times() API, which returns the elapsed time since an arbitrary point in the past. By comparing the value taken at startup of the program with the values taken continuously in the loop, the program can easily check how long it has run; after 10 seconds, it stops. The times() API also returns the charged processor time. At the end, the program prints the elapsed time and the charged processor time. As such, this program is an ideal test case to report the charged processor time for a given interval of time.
8.6.3 Processor time measurement and SMT
As described in the introduction, the UNIX standard does not define the exact semantics of processor time, and today different operating systems implement different semantics. In one case, the processor time is reported as the time a process was active on the processor. In the other case, it reports the charged processor time related to the resources used within a processor core.
Running the test program on a single core without SMT
In the simple case, that is, only one thread is running in a core (which is equivalent to SMT-1 on POWER), the situation is easy. The previously introduced test program shows the effect nicely.
If the program runs in an LPAR with one core and SMT-1, we are getting the following result (Example 1 in Figure 8-2 on page 88):
Process : 360680
Ticks / sec : 100
Elapsed time : 10.000 sec
Charged CPU time : 9.970 sec
If we run two instances of the test program in parallel in the same LPAR, the output from the two processes looks like the following (Example 2 in Figure 8-2):
Process : 327778
Ticks / sec : 100
Elapsed time : 10.010 sec
Charged CPU time : 4.980 sec
 
Process : 360624
Ticks / sec : 100
Elapsed time : 10.000 sec
Charged CPU time : 4.990 sec
Figure 8-2 Running the test program on a single core without SMT
This result is easy to understand. There is only one core and one thread, so at any given time only one of the two processes (process IDs 327778 and 360624) can be active on the processor. Therefore, each process was charged for the time it was active on the processor. Whether the operating system charges processor time based on used resources (AIX) or on time active on the processor (Linux on x86), the results are the same.
For operating systems charging for used resources, the process had all resources available when it was on the processor, but it was scheduled on the processor only half of the time. Consequently, it is charged for only about 5 seconds of processor time.
For operating systems charging for time active on the processor, the operating system scheduled each process only half of the time, so here as well, 5 seconds of processor time are charged.
For the case with only one thread in a core, the reporting of processor time is simple, intuitive, and consistent even between operating systems using different accounting methods for processor time.
Fundamentals of SMT
For several years now, it has been possible to run multiple hardware threads on a single core. On POWER5, this was introduced as SMT and allowed two threads per core. Starting with POWER7, up to four threads per core are possible. Other processors allow multiple threads too, for example, the Hyper-Threading technology in several Intel processors. SMT is explained very well in multiple papers, for example:
Simply said, SMT provides mechanisms in a core to share the available resources between multiple threads and thus achieve better utilization of the existing resources within the core. Even though a single thread in a core looks to the operating system like a complete and independent processor, remember that, depending on the SMT mode, multiple threads share the same resources of the core and therefore can conflict with each other. As a result, SMT increases the throughput of a core but may increase the response times of individual threads.
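If you want to reproduce the following examples on AIX, you can switch the SMT mode of a running partition with the smtctl command, for example (the available thread counts depend on the processor generation):
smtctl -m off -w now
smtctl -t 2 -w now
smtctl -t 4 -w now
The first command switches to single-threaded mode (SMT-1) immediately, the second to SMT-2, and the third to SMT-4 on processors that support four threads per core.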
For operating systems such as AIX that charge processor time based on used resources, SMT introduced additional challenges. Because all threads use the same resources in the core, a mechanism had to be found that allows quantifying the usage of resources in a core on a per-thread basis. For this purpose, POWER5 introduced the Processor Utilization Resource Register (PURR) in the processor. Each thread has its own register, which is incremented with a portion of the increment of the time base register that is proportional to the amount of resources used. The time base register is incremented monotonically with time and can be used to represent the elapsed time. Therefore, the content of the PURR represents the portion of the elapsed time that is proportional to the used resources of a core.
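As a simplified illustration of this accounting (a sketch, not the exact hardware algorithm): over any interval, the PURR increments of all hardware threads of a core add up to the time base increment,
ΔPURR(thread 1) + ΔPURR(thread 2) + ... + ΔPURR(thread n) = ΔTB
so the processor time charged to a thread corresponds to the fraction ΔPURR(thread i) / ΔTB of the elapsed time. Two equally busy threads on one core therefore each accumulate about half of the elapsed time, which matches the roughly 5 seconds per process in the AIX example that follows.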
More information about the PURR-based processor time accounting can be found at:
Running the test program on a single core with SMT
Looking at the test program again we can directly see the effects of the different implementations for charging processor time.
The following examples run two instances of the program on a single core with two hardware threads enabled.
On AIX we see the following result (Example 3 in Figure 8-3):
Process : 376978
Ticks / sec : 100
Elapsed time : 10.000 sec
Charged CPU time : 4.780 sec
 
Process : 196642
Ticks / sec : 100
Elapsed time : 10.000 sec
Charged CPU time : 5.170 sec
Figure 8-3 Running the test program on a single core with SMT
Both instances of the program are simultaneously active on the core, but each thread can use only part of the resources during this time, because it has to share them with the other thread. As a result, each thread is charged only about 5 seconds of processor time. In total, roughly 10 seconds of processor time were charged for 10 seconds of elapsed time, which means the core was used 100% of the time; in other words, the core was 100% utilized. We also see that the processor time of a thread does not relate in any form to the time the thread was active on the core.
Looking at the data produced by an operating system charging for time active on the processor, we get very different results. The following output was produced on a Linux x86 system with one core and Hyper-Threading enabled:
Process : 4673
Ticks / sec : 100
Elapsed time : 10.000 sec
Charged CPU time : 9.960 sec
 
Process : 4672
Ticks / sec : 100
Elapsed time : 10.000 sec
Charged CPU time : 9.970 sec
In this case too, both threads were active on the processor the whole time. In contrast to AIX, each thread is charged for almost the whole elapsed time. So, from the processor time it can be directly derived how long a thread was active on a core. On the other hand, no conclusion can be drawn about the amount of resources used: it cannot be determined that a single thread in the previous case had only half of the resources of the core available and therefore potentially got less work done during this time.
 
Summary: The metric processor time is not exactly defined in the UNIX standard, and different operating systems provide different semantics. In one case it represents the active time on the processor, whereas in the other case it represents an equivalent of the used resources. Both semantics provide very useful information. Processor time as active time on the processor can help in the analysis of application performance issues, for example, to determine where a process spent its time. Processor time as a measurement of used resources is better suited to determine the overall utilization and to answer how much headroom, in the sense of computing power, is left on the system.
8.6.4 PURR-based metrics in POWER7
As described earlier, the increments of the time base register are distributed proportionally to the PURR registers in relation to the resources used by the threads. In POWER5 and POWER6, only threads doing active work got increments in their PURR registers. This led to artifacts in cases with only one active thread, where the active thread was charged for all resources and a system appeared to be 100% utilized, even though the second thread would have been able to provide additional computing power.
With POWER7, this special case has been adjusted. If one or more threads are inactive, the active threads are no longer charged for all resources in the core; the inactive threads are also charged for the resources they could potentially use. The test program nicely illustrates the effects.
Result for one active instance of the program on one core with SMT-2 enabled (Example 4 in Figure 8-4 on page 92):
Process : 286730
Ticks / sec : 100
Elapsed time : 10.000 sec
Charged CPU time : 8.190 sec
Result for one active instance of the program on one core with SMT-4 enabled (Example 5 in Figure 8-4):
Process : 286734
Ticks / sec : 100
Elapsed time : 10.000 sec
Charged CPU time : 6.250 sec
Figure 8-4 Running the test program on cores with SMT enabled
The charged processor time shows that there is still headroom available in the core to execute additional workload. This remaining headroom can be used only by additional processes scheduled on the free threads of the core, because the first process already uses one thread of the core all of the time.
8.6.5 Processor time in SAP applications
In SAP applications the processor time is also used in several places, most importantly in the single statistic records, which are written for every dialog step and report a number of metrics gathered during the dialog step. For example, the records contain the time spent in the database, in the GUI and in other areas. The processor time is also returned as one of the metrics. The single statistic records can be shown with the transaction STAD. In the SAP system this very granular information is also aggregated and used for reporting in transaction ST03.
Using single statistic records for application performance analysis
The single statistic records, or the results derived from them in ST03, are used for multiple purposes. One of them is the analysis of application performance issues. A rule of thumb was established several years ago and has not changed since: “If the processing time is two times higher (or more) than the processor time, then we likely have a processor bottleneck.” While this rule was valid in the early days, when systems did not have hardware threads (SMT), it is no longer applicable on systems that report the processor time as resource usage. Figure 8-5 shows a single statistic record for an ABAP program that only ran on the processor and had no activities on the database, and so on. The LPAR was running in SMT-4 mode.
Figure 8-5 Detailed view of a single statistics record
The processor time reported in this screen shot is the processor time reported from the operating system.
We see the same behavior for this ABAP program as we showed earlier with our test program. Even though the program was active for the whole 10 seconds, it was charged only 6.43 seconds, which indicates that there is free headroom in the core for additional work. The previously mentioned rule of thumb would now indicate that we are moving towards a processor bottleneck, which is definitely not the case.
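The numbers make this concrete: at SMT-4, a fully busy single thread in our test program was charged only 6.25 of 10 elapsed seconds, a processing time to processor time ratio of 1.6, and with two busy threads per core (the SMT-2 example) the charged time drops to about 5 seconds, a ratio of 2, which is exactly the threshold of the rule, although the core still has headroom.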
 
Important: Some rules of thumb for the analysis of application performance issues in SAP systems based on single statistic records do not apply on POWER systems when SMT is enabled.
Using single statistic records for workload sizing
The other use of single statistic records is to determine the amount of computing power that certain applications require, in order to size systems. The processor times of the transactions related to the application are added up and set in relation to the measurement time interval. Here, the single statistic records on Power Systems, with their reported processor time, provide a good indicator for proper workload sizing. On systems that report processor time as time active on the processor, this method returns inaccurate information about the resources required to run the measured workload.
 
Important: The single statistic records in SAP CCMS and their derived metrics in ST03 are providing a good set of data for application sizing purposes on POWER systems.
8.6.6 Side effects of PURR based metrics in Shared Pool environments
AIX partitions in a shared processor pool environment provide two metrics describing the usage of processors: physb and physc. The metric physc refers to the amount of computing power, in units of processors, provided by the hypervisor to a shared processor LPAR. This metric does not take into account how many of the threads were actually used by the LPAR, because the hypervisor can assign only full cores to an LPAR. In contrast, the metric physb describes the amount of computing power actually used by the LPAR.
With the changes in the PURR handling in POWER7 technology-based systems, in some cases more idle time is reported to correctly represent the available free headroom in the system. As a result the difference between physb and physc may get larger, and also some existing recommendations have to be adjusted.
Because POWER7 technology-based systems provide four threads per core, it is not a rare case that physb shows much lower values than physc, indicating free headroom within the resources provided by the hypervisor. This situation is also promoted by the fact that the operating system schedules workload primarily to the primary threads of the available cores before the secondary and tertiary threads are used. Monitoring only physc to determine the required computing power for an LPAR can lead to wrong conclusions, because the potentially free resources within the LPAR are not taken into account.
In the SAP transaction ST06, the metric physc is shown as Capacity Consumed. The value for physb is shown as a percentage of the Available Capacity; it is named Available Capacity Busy. The original value for physb can be calculated by multiplying Available Capacity by Available Capacity Busy.
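For example, if ST06 reports an Available Capacity of 2.0 processors and an Available Capacity Busy of 40%, then physb = 2.0 × 0.40 = 0.8 processors of actually consumed computing power.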
The described changes result in slightly changed recommendations. If the value of physc approaches the number of virtual processors in the system, the number of virtual processors should not be increased immediately, unless the value for physb also indicates that all resources are used. If physb is significantly lower than physc, the number of virtual processors can stay unchanged or may even be reduced. In all cases, the specific requirements of the workload have to be taken into account.
 

1 Copyright SAP AG 2010. All rights reserved.