Test methodology
This chapter provides an insight into the approach that was taken by the CICS performance team when producing performance benchmark results. The concept of a CICS workload is defined, along with a description of how workloads are designed and coded.
Performance testing requires the combination of several techniques to provide accurate, repeatable measurements. These techniques are presented here, while demonstrating some of the tools that were used when collecting performance data.
2.1 Workloads
This publication uses the term workload extensively. The term refers to the combination of the following key components of the environment that are used when producing performance figures for a specific CICS configuration:
Application code
Application code can be written in any language that is supported by the CICS environment. The number and ordering of EXEC CICS, EXEC SQL, or EXEC DLI commands dictate the flow of control between the application and the IBM CICS Transaction Server (TS) for IBM z/OS (CICS TS) environment under test. This flow of control is known as the workload logic.
Data that is required by the application
The data that is required by the application can be stored in VSAM files or in an IBM DB2® database, provided by the simulated clients, or supplied by some other external system. The data that is used corresponds to the data that is exchanged between components of CICS as part of a customer’s application.
Topology of connected address spaces
The number of CICS regions, the methods that are used to connect these CICS regions, and the logical partition (LPAR) in which the CICS region is executed all form part of the workload.
Configuration of the CICS region
There are many configuration parameters for CICS and the value for each can be modified to achieve a specific effect.
Simulated clients
The number of simulated clients, their method of communication with the CICS regions under test, and the rate at which requests are sent to the CICS regions can be varied to affect the behavior of a workload.
2.2 Workload design
Performance test workloads that are developed by the CICS TS performance team are deliberately lightweight; that is, workloads have little business logic. The phrase business logic refers to language constructs that serve only to manipulate data according to business rules, rather than the workload logic that is used to control program flow between the application and the CICS TS environment.
The CICS TS performance team specifically targets the discovery of performance problems in the CICS TS runtime code, and having lightweight applications maximizes the visibility of any potential problems at the time of development.
A transaction from the Data Systems Workload (DSW), which is described in 3.2, “Data Systems Workload”, illustrates why minimizing business logic is important. Consider the following coding scenarios for the application:
A minimal business logic case with a total transaction CPU cost of 0.337 ms and consisting of the following values:
 – 0.322 ms of CPU for calls into CICS
 – 0.015 ms of CPU for business logic
A more heavyweight business logic case with a total transaction CPU cost of 1.500 ms and consisting of the following values:
 – 0.322 ms of CPU for calls into CICS
 – 1.178 ms of CPU for business logic
In both cases, the amount of CPU consumed by the CICS TS code to complete the CICS operations that are required for the workload is equal to 0.322 ms.
Now consider the example where a change in the CICS TS product during the development phase inadvertently introduces a CPU overhead of 5 µs for each transaction. With the workload in the first scenario (which contains a minimal amount of business logic), the total transaction cost increased from 0.337 ms to 0.342 ms of CPU, an increase of 1.5%. With the workload in the second scenario (which contains significant business logic), the total transaction cost increased from 1.500 ms to 1.505 ms of CPU, an increase of 0.3%.
Although techniques that are used to minimize variability in performance test results are described in 2.3, “Repeatable measurements” and 2.6, “Collecting performance data”, only a finite level of accuracy is achievable in performance test results. By following leading practices in the CICS TS performance test environment, experience indicates that an accuracy of approximately ±1% can be achieved. The coding in the first scenario produces a relative performance change (1.5%) that is greater than the measurement accuracy, so the small performance degradation can be detected and the defect corrected. In the second scenario, the relative change (0.3%) is smaller than the measurement accuracy and would likely go unnoticed.
Minimizing the amount of business logic in the test application maximizes the relative change in performance for the whole workload for any specific modification to the CICS TS runtime code. By using this worst-case test scenario approach, the performance test team can be confident that a change that passes these tests does not produce an observable change in performance behavior for real-world applications.
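The arithmetic behind the two scenarios can be reproduced with a short calculation. The following Python sketch uses only the figures quoted above (0.322 ms of CICS CPU, 5 µs of added overhead, and an accuracy of approximately ±1%); the function names are illustrative and are not part of any CICS tooling.

```python
# Figures from the two scenarios in the text (CPU milliseconds per transaction).
CICS_CPU_MS = 0.322
REGRESSION_MS = 0.005          # 5 microseconds of inadvertent overhead
MEASUREMENT_ACCURACY = 0.01    # approximately +/-1%, as stated in the text


def relative_increase(business_logic_ms, regression_ms=REGRESSION_MS):
    """Return the fractional increase in total transaction CPU cost."""
    baseline_ms = CICS_CPU_MS + business_logic_ms
    return regression_ms / baseline_ms


def is_detectable(business_logic_ms):
    """True when the relative change exceeds the measurement accuracy."""
    return relative_increase(business_logic_ms) > MEASUREMENT_ACCURACY


for label, logic_ms in (("minimal", 0.015), ("heavyweight", 1.178)):
    change = relative_increase(logic_ms)
    print(f"{label}: {change:.1%} detectable={is_detectable(logic_ms)}")
# prints:
# minimal: 1.5% detectable=True
# heavyweight: 0.3% detectable=False
```

The same regression is visible only when the business logic is small enough for the relative change to rise above the measurement accuracy.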
 
Observation: For the DSW, an IBM zEnterprise® EC12 model HA1 executes at a rate of approximately 1,270 million instructions per second, per central processor (CP). An inadvertent change that added 5 µs to the total transaction cost represents approximately 6,350 instructions.
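The instruction count in this observation follows directly from the processor rate. As a minimal sketch in Python (the function name is illustrative): one million instructions per second is one instruction per microsecond, so the conversion needs only integer arithmetic.

```python
def instructions_for(cpu_microseconds, mips_per_cp):
    """Convert CPU time to an approximate instruction count.

    1 MIPS equals 1 instruction per microsecond, so the product of
    microseconds and MIPS is the number of instructions executed.
    """
    return cpu_microseconds * mips_per_cp


# Figures from the observation: 5 microseconds at about 1,270 MIPS per CP.
print(instructions_for(5, 1270))  # prints 6350
```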
2.3 Repeatable measurements
Before describing how performance data is collected, it is important to understand that unless hardware is totally dedicated to a benchmark, the CPU time that is consumed can vary each time that the benchmark is run. Achieving repeatable results can be difficult. This statement is true for benchmark comparisons and also for CPU usage comparisons after a CICS upgrade.
For more information about how CPU time can be affected by other address spaces in the LPAR and other LPARs on the central processor complex (CPC), see IBM CICS Performance Series: Effective Monitoring for CICS Performance Benchmarks, REDP-5170.
The LPARs that support the CICS regions in all performance benchmarks that are described in this publication include dedicated CPs. Although the CPs are dedicated, the L3 and L4 caches remain shared with other CPs that are used by other LPARs. This arrangement is not perfect: CPU usage can still vary because data in those shared caches can be invalidated by the CPs that are used by the other LPARs. Minimizing the magnitude of these external influences is a high priority when producing reliable performance benchmark results.
An automated measurement system is used to execute the benchmarks and collect the performance data. This automated system executes overnight during a period when no human users are permitted to access the LPAR. The use of an automation system reduces variation in results by ending unnecessary address spaces that can potentially disrupt the measurements. The use of overnight automation also minimizes disruption because that is the time frame during which other LPARs on the CPC are least busy.
2.4 Driving the workload
The IBM Workload Simulator for z/OS (Workload Simulator) tool is used to send work into the CICS regions from multiple simulated clients concurrently. For more information, see the Workload Simulator product web page.
The process of sending work into the CICS regions is commonly referred to as driving the workload. The system under test is on a separate LPAR in the same sysplex. All network traffic is routed by way of a coupling facility from one LPAR to the other.
2.5 Summary of performance monitoring tools
During the benchmark measurement periods, the following tools are used:
2.5.1 RMF Monitor I
IBM RMF™ Monitor I records system resource usage, including CPU, DASD, and storage. It is also used with the workload manager (WLM) configuration to record the CPU, transaction rates, and response times for CICS service classes and report classes.
SMF records 70 - 79 are written on an interval basis. They can be post-processed by using the ERBRMFPP RMF utility program.
2.5.2 RMF Monitor III
RMF Monitor III records the coupling facility activity for the logger and temporary storage structures.
SMF records 70 - 79 are written on an interval basis. Also, the records can be post-processed by using the ERBRMFPP RMF utility program. RMF Monitor III can be used on an interactive basis and the data can be written to VSAM data sets for later review.
2.5.3 CICS TS statistics
CICS statistics are used to monitor and report CICS resource usage, including CPU, storage, file accesses, and the number of requests that were transaction-routed.
With CICS interval statistics, most of the counters are reset at the start of the interval so that any resource consumption that is reported relates only to the observed measurement period. Interval statistics can be activated by using the CEMT SET STATISTICS command. However, when you set this interval, the first interval can be adjusted to a shorter time so that all the intervals are synchronized to the STATEOD parameter. For example, if you use CEMT to set the interval to 15 minutes at 10 past the hour, the first interval expires in 5 minutes so that all future intervals line up on 15-minute wall clock boundaries. The values in this first report also can be associated with a much longer period, depending on the time of the last reset.
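The shortened first interval can be worked out ahead of time. The following Python sketch assumes that the end-of-day time (STATEOD) is midnight, so that interval boundaries fall on multiples of the interval length counted from midnight; the function name is illustrative, and the calculation is an approximation of the behavior described above rather than a statement of how CICS implements it.

```python
from datetime import datetime, timedelta


def first_interval_expiry(set_time, interval_minutes):
    """Return the wall clock time at which the first interval ends.

    Assumption: boundaries are aligned to midnight (STATEOD of 000000).
    """
    midnight = set_time.replace(hour=0, minute=0, second=0, microsecond=0)
    interval = timedelta(minutes=interval_minutes)
    # Number of whole intervals that completed since midnight.
    completed = (set_time - midnight) // interval
    return midnight + (completed + 1) * interval


# Example from the text: a 15-minute interval set at 10 past the hour.
set_at = datetime(2024, 1, 1, 10, 10)   # the date is arbitrary
expiry = first_interval_expiry(set_at, 15)
print(expiry - set_at)  # prints 0:05:00
```

A measurement that relies on interval statistics can use this kind of calculation to avoid starting inside a shortened first interval.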
An alternative to the use of interval statistics is to use CEMT to reset the counters at the start of the measurement period and then use CEMT to record all the statistics at the end of it. Resetting the statistics requires a change of state from ON to OFF or from OFF to ON. Issuing both of the following commands ensures that a change of state happens regardless of the starting state. The commands reset the statistics in one CICS region:
F CICSA001,CEMT SET STAT OFF RESET
 
F CICSA001,CEMT SET STAT ON RESET
The measurement period is between the RESET and the RECORD, as shown in the following example:
F CICSA001,CEMT PERFORM STAT ALL RECORD
Regardless of whether the statistics are ON or OFF, when a PERFORM STAT ALL RECORD command is issued, a statistics record is written.
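When several CICS regions take part in a workload, the same commands must be issued for each region. The following Python sketch generates the console command text for a list of regions by using only the command forms shown above. The region name CICSA002 and the function names are hypothetical, and how the commands are submitted to the console is outside the scope of the sketch.

```python
def reset_commands(region):
    """Commands that force a statistics reset in one region.

    Both commands are issued so that a change of state occurs
    whether statistics were initially ON or OFF.
    """
    return [
        f"F {region},CEMT SET STAT OFF RESET",
        f"F {region},CEMT SET STAT ON RESET",
    ]


def record_command(region):
    """Command that writes a statistics record for one region."""
    return f"F {region},CEMT PERFORM STAT ALL RECORD"


regions = ["CICSA001", "CICSA002"]  # CICSA002 is a hypothetical second region

# Start of the measurement period: reset every region.
for region in regions:
    for command in reset_commands(region):
        print(command)

# End of the measurement period: record statistics in every region.
for region in regions:
    print(record_command(region))
```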
CICS statistics are written as SMF 110 subtype 2 records. They can be post-processed by using the CICS statistics utility program, DFHSTUP, or CICS Performance Analyzer (CICS PA).
2.5.4 CICS TS performance class monitoring
CICS Performance Class Monitoring can be turned on by specifying MNPER=ON in the CICS startup parameters, or dynamically by using the CEMT or CEMN transactions. When it is active, a Performance Class Monitoring record is generated for every executed transaction when the transaction ends.
The following command is an example of turning on CICS Performance Class Monitoring and Resource Class Monitoring in one CICS region:
F CICSA001,CEMT SET MON ON PER RESRCE
Performance Class Monitoring and Resource Class Monitoring can then be turned off by using the following command, which leaves CICS monitoring itself on:
F CICSA001,CEMT SET MON ON NOPER NORESRCE
The performance class record of each transaction contains information about the resources that were used by that transaction, how much CPU was used on all the various task control blocks (TCBs), and information about how long it waited for different resources. Resource Class Monitoring records contain information about the individual files, temporary storage queues, and distributed program links (DPLs) that were used by transactions.
Monitoring records are written as SMF 110 subtype 1 records that can be analyzed by using CICS PA.
2.5.5 Hardware instrumentation counters and samples
The CPU Measurement Facility (CPU MF) is described in 1.1, “CPU Measurement Facility”. The CPU MF capability is built into the hardware, and a z/OS component called hardware instrumentation services (HIS) sets up buffers that the hardware then uses to store the sampling data. When a number of buffers are filled, the hardware generates an interrupt. This interrupt enables HIS to asynchronously collect the sampling information and save it to a file in the z/OS UNIX file system. Because the hardware gathers the samples, the software that collects the data does not have to run at the highest Workload Manager priority level. HIS can be used to collect the following types of data:
Counters
Instruction samples
HIS counters are written as System Management Facilities (SMF) 113 records and to the z/OS UNIX file system. These counters contain information about key hardware events, such as the number of instructions that are executed, the number of cycles that were used, and the number of instruction cache and data cache misses. Counters are used to provide a high-level understanding of how the address spaces interact with the hardware.
HIS instruction samples are written only to the z/OS UNIX file system. The samples are used to provide a view of CPU activity for individual instructions or groups of instructions. Tooling enables the inspection of this data to help the CICS performance team understand where hot spots exist in the CICS runtime code. Hot spots are short sequences of one or two machine instructions that consume a disproportionately large fraction of the total CPU cost. These hot spots are frequently caused by data access patterns that do not make optimal use of the hardware cache subsystem. Tooling that is written to consume HIS instruction samples also permits the comparison of two benchmark runs, where differences in performance can be analyzed at the instruction level.
For more information about configuring and using HIS, see Setting Up and Using the IBM System z CPU Measurement Facility with z/OS, REDP-4727.
2.6 Collecting performance data
Performance data is typically collected for five measurement intervals. The rate at which work is driven into CICS is varied by adjusting the Workload Simulator user think time interval (UTI). The UTI value represents the delay between a simulated client receiving a response and sending the next request into CICS. A large think time results in a low rate of transactions in CICS. Reducing the UTI increases the rate at which work is driven into the CICS environment.
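The relationship between think time and transaction rate can be approximated with the interactive response time law from queueing theory: the throughput of a closed system is the number of clients divided by the sum of think time and response time. This law is a general result and is not taken from the CICS performance team's tooling. The following Python sketch applies it with hypothetical client counts and response times.

```python
def transaction_rate(clients, think_time_s, response_time_s):
    """Approximate transactions per second for a closed system.

    Each simulated client completes one request every
    (think time + response time) seconds.
    """
    return clients / (think_time_s + response_time_s)


def think_time_for_rate(clients, target_rate, response_time_s):
    """Think time (seconds) needed to reach a target transaction rate."""
    think_time_s = clients / target_rate - response_time_s
    if think_time_s < 0:
        raise ValueError("target rate is not reachable with this many clients")
    return think_time_s


# Hypothetical values: 1,000 clients and a 0.5-second response time.
print(transaction_rate(1000, 1.5, 0.5))      # prints 500.0
print(think_time_for_rate(1000, 500, 0.5))   # prints 1.5
```

In practice the response time itself changes with load, so a calculated think time is only a starting point that is then adjusted until the required transaction rate is observed.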
The initial measurement period begins by adjusting the UTI to achieve the required transaction rate in the CICS regions. The workload is allowed to run for a period to ensure that all programs are loaded and the local shared resource (LSR) pools are populated. After the stabilization period is complete, the performance data collection is started.
No specific changes to any default CICS parameters are needed to support the data that is collected during performance benchmarks. Data is collected for a 5-minute period, which is relatively short but adequate in our environment when running in a steady state.
RMF, CICS Performance Class Monitoring, CICS statistics, and HIS are all synchronized and started and ended together. An automation tool is used that enters commands on the IBM MVS™ console on a time-based interval.
To generate the RMF interval, start and stop RMF at the appropriate times, which creates an interval report for that period rather than trying to synchronize on a time basis.
When the workload is running at its stabilized state, the CICS statistics are reset by using the commands that are described in 2.5.3, “CICS TS statistics”. CICS Performance Class Monitoring is turned on by using the commands that are shown in 2.5.4, “CICS TS performance class monitoring”. RMF Monitor I is started by using the following MVS command:
S RMF.R
Monitor III is then started by using the following command:
F R,START III
HIS is also started to collect counter data only.
After 5 minutes elapse, RMF and HIS are stopped, and the command that is shown in 2.5.3, “CICS TS statistics” is issued to request that CICS statistics are recorded.
After the performance data collection period ends, the UTI is reduced, which increases the transaction rate in CICS. Again, the workload is allowed to run for a period to ensure that the system reaches a steady state. After this stabilization period is complete, the performance data collection is restarted.
After five cycles of UTI adjustment and data collection, a set of data is produced which represents the performance of the CICS regions at several transaction rates. The SMF data set that contains the collected RMF, CICS, and HIS performance data is copied for later post-processing and analysis to examine the performance characteristics of the workload.
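The overall collection procedure can be summarized as a timeline. The following Python sketch builds an ordered plan of the steps described in this section for a series of UTI values. The UTI values and the length of the stabilization period are hypothetical, because the text does not state them, and the steps are descriptive labels rather than console commands.

```python
from dataclasses import dataclass


@dataclass
class Step:
    offset_s: int   # seconds from the start of the run
    action: str     # description of what the automation does


def plan_run(uti_values, stabilize_s, collect_s=300):
    """Build the sequence of steps for one benchmark run.

    collect_s defaults to the 5-minute collection period from the text;
    stabilize_s is an assumed value supplied by the caller.
    """
    steps = []
    clock = 0
    for uti in uti_values:
        steps.append(Step(clock, f"set UTI to {uti} and let the workload stabilize"))
        clock += stabilize_s
        steps.append(Step(clock, "reset CICS statistics, turn on monitoring, start RMF and HIS"))
        clock += collect_s
        steps.append(Step(clock, "stop RMF and HIS, record CICS statistics"))
    steps.append(Step(clock, "copy the SMF data set for post-processing"))
    return steps


# Five cycles with progressively smaller (hypothetical) UTI values.
for step in plan_run([5, 4, 3, 2, 1], stabilize_s=600):
    print(f"{step.offset_s:>5}s  {step.action}")
```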
 