CICS TS for z/OS V5.1
The IBM CICS Transaction Server (TS) for IBM z/OS (CICS TS) V5.1 release introduces various new technical and operational capabilities. Included in these updates are many improvements that provide performance benefits over previous CICS releases.
Included in the CICS V5.1 performance report are the following subject areas:
Key performance benchmarks that are presented as a comparison with the CICS TS V4.2 release.
An outline of improvements made regarding the threadsafe characteristics of the CICS run time.
Details of the changes made to performance-critical CICS initialization parameters, and the effect of these updates.
Description of all the updated monitoring fields, including examples where necessary.
The extent and effect of the reduction in 24-bit and 31-bit virtual storage usage.
High-level views of new functions that were introduced in the CICS V5.1 release, including performance benchmark results where appropriate.
A description of transaction isolation and how changes that were introduced in this release might affect workloads with this feature enabled.
This chapter includes the following topics:
5.1, “Introduction”
5.2, “Release-to-release comparisons”
5.3, “Improvements in threadsafety”
5.4, “Changes to system initialization parameters”
5.5, “Enhanced instrumentation”
5.6, “Virtual storage constraint relief”
5.7, “64-bit application support”
5.8, “Java 7 and zEnterprise EC12”
5.9, “CICSPlex System Manager dynamic routing”
5.10, “Workload consolidation”
5.11, “Effect of threadsafe transient data”
5.1 Introduction
When compiling the results for this chapter, the workloads were run on an IBM zEnterprise 196 model M80 (machine type 2817). A maximum of 16 dedicated central processors (CPs) were available on the measured logical partition (LPAR), with a maximum of four dedicated CPs available to the LPAR used to simulate users. These LPARs are configured as part of a Parallel Sysplex. An internal coupling facility (CF) was co-located on the same central processor complex (CPC) as the measurement and driving LPARs, connected through internal coupling peer (ICP) links. An IBM System Storage® DS8800 unit was used to provide external storage.
This chapter presents the results of several performance benchmarks when run in a CICS TS for z/OS V5.1 environment. Unless otherwise stated in the results, the CICS V5.1 environment was the code that was available on the general availability date of 14 December 2012. Several of the performance benchmarks are presented in the context of a comparison with CICS TS V4.2. All LPARs used z/OS V1.13.
For more information about performance terms that are used in this chapter, see Chapter 1, “Performance terminology” on page 3. For more information about the test methodology that was used, see Chapter 2, “Test methodology” on page 11. For more information about the workloads that were used, see Chapter 3, “Workload descriptions” on page 19.
Where reference is made to an LSPR processor equivalent, the indicated machine type and model can be found in the large systems performance reference (LSPR) document. For more information about obtaining and using LSPR data, see 1.3, “Large Systems Performance Reference” on page 6.
5.2 Release-to-release comparisons
This section describes some of the results from a selection of regression workloads that are used to benchmark development releases of CICS TS. For more information about the use of regression workloads, see Chapter 3, “Workload descriptions” on page 19.
5.2.1 Data Systems Workload static routing
The static routing variant of the Data Systems Workload (DSW) is described in 3.2.1, “DSW static routing”. This section presents the performance figures that were obtained by running this workload. The LPAR used for measurement was configured with 16 CPs online, which resulted in an LSPR processor equivalent of 2817-716.
Table 5-1 lists the results of the DSW static routing workload that used the CICS TS V4.2 release.
Table 5-1 CICS TS V4.2 results for DSW static routing workload
ETR        CICS CPU    CPU per transaction (ms)    LPAR busy
2498.52     75.86%     0.304                         6.78%
2928.69     88.35%     0.302                         7.79%
3543.47    104.08%     0.294                         9.09%
4428.34    129.16%     0.292                        11.13%
5944.91    168.58%     0.284                        14.34%
Table 5-2 lists the same figures for the CICS TS V5.1 release.
Table 5-2 CICS TS V5.1 results for DSW static routing workload
ETR        CICS CPU    CPU per transaction (ms)    LPAR busy
2496.35     77.55%     0.311                         6.89%
2939.62     87.18%     0.297                         7.65%
3532.10    102.29%     0.290                         8.86%
4425.48    126.17%     0.285                        10.80%
5948.50    166.52%     0.280                        14.07%
The average CPU per transaction figure for CICS TS V4.2 is calculated to be 0.295 ms, and the CICS TS V5.1 figure is calculated to be 0.292 ms. The performance of this workload is considered to be equivalent across the two releases.
These performance results are also shown in Figure 5-1.
Figure 5-1 Plot of CICS TS V4.2 and V5.1 performance figures for DSW static routing workload
The measured CPU cost for each transaction rate is similar for CICS TS V4.2 and V5.1, with the CICS TS V5.1 release showing a marginal improvement. CPU cost scales linearly in accordance with the transaction rate.
5.2.2 DSW dynamic routing
The dynamic routing variant of the DSW workload is described in 3.2.2, “DSW dynamic routing”. This section presents the performance figures that were obtained by running this workload. The workload was configured with four terminal-owning regions (TORs) dynamically routing transactions to 30 application-owning regions (AORs). The LPAR that was used for measurement was configured with eight CPs online, which resulted in an LSPR processor equivalent of 2817-708.
Table 5-3 lists the results of the DSW dynamic routing workload that used the CICS TS V4.2 release.
Table 5-3 CICS TS V4.2 results for DSW dynamic routing workload
ETR        CICS CPU    CPU per transaction (ms)    LPAR busy
2071.61    141.20%     0.682                        21.05%
2842.02    189.11%     0.665                        27.85%
4128.25    270.70%     0.656                        39.41%
5047.36    326.08%     0.646                        47.24%
6493.98    417.16%     0.642                        60.21%
Table 5-4 lists the same figures for the CICS TS V5.1 release.
Table 5-4 CICS TS V5.1 results for DSW dynamic routing workload
ETR        CICS CPU    CPU per transaction (ms)    LPAR busy
2074.87    139.91%     0.674                        20.87%
2846.00    188.55%     0.663                        27.78%
4133.39    269.54%     0.652                        39.32%
5053.15    326.22%     0.646                        47.33%
6501.18    416.92%     0.641                        60.25%
The average CPU per transaction figure for CICS TS V4.2 is calculated to be 0.658 ms, and the CICS TS V5.1 figure is calculated to be 0.655 ms. The performance of this workload is considered to be equivalent across the two releases. Figure 5-2 shows the results from Table 5-3 on page 40 and Table 5-4.
Figure 5-2 Plot of CICS TS V4.2 and V5.1 performance figures for DSW dynamic routing workload
You can see the V4.2 and V5.1 lines are overlaid, which indicates near-identical CPU cost per transaction. The plot lines are also straight, which indicates linear scaling as transaction throughput increases.
5.2.3 Relational Transactional Workload threadsafe
This section presents the performance figures for the threadsafe variant of the Relational Transactional Workload (RTW), as described in 3.3, “Relational Transactional Workload” on page 23.
Table 5-5 on page 42 lists the results of the RTW threadsafe workload that used the CICS TS V4.2 release.
Table 5-5 CICS TS V4.2 results for the RTW threadsafe workload
ETR       CICS CPU    CPU per transaction (ms)    LPAR busy
249.69     53.59%     2.146                       21.33%
361.55     77.65%     2.148                       30.93%
474.66    101.46%     2.138                       39.85%
592.37    125.40%     2.117                       48.89%
730.20    153.82%     2.107                       59.51%
Table 5-6 lists the same figures for the CICS TS V5.1 release.
Table 5-6 CICS TS V5.1 results for the RTW threadsafe workload
ETR       CICS CPU    CPU per transaction (ms)    LPAR busy
249.98     54.19%     2.168                       21.63%
361.88     78.35%     2.165                       31.26%
474.86    101.42%     2.136                       39.74%
592.74    126.14%     2.128                       49.20%
729.98    155.06%     2.124                       59.98%
The average CPU per transaction figure for CICS TS V4.2 is calculated to be 2.131 ms, and the CICS TS V5.1 figure is calculated to be 2.144 ms. The difference between these figures is less than 1%, so the performance of this workload is considered to be equivalent across the two releases. Figure 5-3 shows these performance results.
Figure 5-3 Plot of CICS TS V4.2 and V5.1 performance figures for RTW threadsafe workload
As previously observed, the straight line indicates linear scaling as throughput increases, and the overlaid lines demonstrate equivalent performance between the two CICS releases.
5.2.4 Java throughput
CICS TS V4.2 supports Java V6.0.1 only, whereas CICS TS V5.1 supports Java 7.0 only. This section compares the throughput of a Java workload that is running in CICS TS V4.2 that uses Java 6.0.1 with the same workload that is running in CICS TS V5.1 that uses Java 7.0.
 
Note: APAR PI30532 enables support for Java 7.1 in CICS TS V5.1. APAR PI52819 enables support for Java 8 in CICS TS V5.1.
The workload was an intensive Java application that performed some JCICS calls. The workload ran in a single CICS region that contained one JVMSERVER resource. A high transaction injection rate was maintained to drive a zEnterprise 196 model M80 with eight GCPs and one zIIP to maximum utilization. Data was collected in 1-minute intervals, and the THREADLIMIT attribute of the CICS JVM server was increased every 5 minutes. Each benchmark started with two JVM server threads.
The benchmarks used Java V6.0.1 SR3 and Java V7.0 SR3. The following configuration parameters were used in both cases:
-Xms600M
-Xmx600M
-Xmns500M
-Xmos100M
-Xgcpolicy:gencon
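These options are set in the JVM profile that is named on the JVMSERVER resource definition. A minimal sketch of the relevant part of such a profile is shown below; the JAVA_HOME and WORK_DIR paths are placeholders and are not the values that were used in the benchmark environment.
# Paths below are placeholders, not the benchmark values
JAVA_HOME=/usr/lpp/java/J7.0_64
WORK_DIR=/u/cicsperf/jvmwork
-Xms600M
-Xmx600M
-Xmns500M
-Xmos100M
-Xgcpolicy:gencon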
The chart that is shown in Figure 5-4 plots the throughput in transactions per second as the THREADLIMIT attribute of the JVM server was increased.
Figure 5-4 Throughput comparison for a Java workload in CICS TS V4.2 and CICS TS V5.1
For both configurations, the CPU utilization reached 99.9% when the JVM server reached nine concurrent threads. It can be seen from Figure 5-4 on page 43 that similar performance is observed in both configurations, and both configurations scale well.
For more information about the benefits of Java 7.0 in CICS TS V5.1, see 5.8, “Java 7 and zEnterprise EC12” on page 59.
5.3 Improvements in threadsafety
Most of the new CICS API and SPI commands in CICS TS V5.1 are threadsafe, and several existing commands were made threadsafe in this release. Specific functional areas were also improved to reduce task control block (TCB) switches.
5.3.1 Threadsafe API and SPI commands
The following new CICS API commands are threadsafe:
GETMAIN64
FREEMAIN64
GET64 CONTAINER
PUT64 CONTAINER
The following CICS API transient data commands were made threadsafe:
READQ TD
WRITEQ TD
DELETEQ TD
These transient data API commands are threadsafe when they are used with a queue in a local CICS region, or when the request is function shipped to a remote CICS region over an Internet Protocol interconnectivity (IPIC) connection. For other types of connections to remote CICS regions, the commands are not threadsafe.
For more information about CICS API commands, see the topic “CICS command summary” in the IBM Knowledge Center at this website:
The following new CICS system programming interface (SPI) commands are threadsafe:
INQUIRE EPADAPTINSET
EPADAPTERSET commands:
 – INQUIRE EPADAPTERSET
 – SET EPADAPTERSET
The following CICS SPI commands were made threadsafe:
SET TASK
TRACEDEST commands:
 – INQUIRE TRACEDEST
 – SET TRACEDEST
TRACEFLAG commands:
 – INQUIRE TRACEFLAG
 – SET TRACEFLAG
TRACETYPE commands:
 – INQUIRE TRACETYPE
 – SET TRACETYPE
For more information about CICS SPI commands, see the topic “System commands” in the IBM Knowledge Center at this website:
5.3.2 Threadsafe program loading
When running on an open TCB and a CICS program load is requested, there is no longer a TCB change mode to the resource-owning (RO) TCB. The RO TCB is still used when an application is not running on an open TCB, or for CICS DFHRPL and LIBRARY data set management operations.
The ability to load programs on an open TCB can reduce path length for applications that frequently issue LOAD requests because the TCB switch is removed. The use of an open TCB also reduces contention for the single RO TCB, which potentially offers significantly increased CICS program LOAD capability. Reducing the effect of program loads on the RO TCB also reduces contention with other work that shares the RO TCB, such as security calls.
A simple threadsafe application was created that issued an EXEC CICS LOAD command while running on an open TCB. The chart that is shown in Figure 5-5 plots the response time against the program LOAD request rate.
Figure 5-5 Response time against program LOAD rate
With CICS TS V4.2, CICS must switch to the RO TCB to complete the physical load when the application issues an EXEC CICS LOAD command even though the application is running on an open TCB.
In this environment, a load operation takes an average of 5 ms of elapsed time to complete. Although the command does not consume 5 ms of CPU time during this period, the RO TCB is occupied for the full 5 ms, so the LOAD operations are effectively serialized. A single TCB can therefore sustain at most about 1000 ms ÷ 5 ms = 200 such loads per second, and at a rate of around 200 requests per second the RO TCB reached the limit of its available dispatch time.
When the same application is run in CICS TS V5.1, the LOADs are run concurrently on open TCBs. The chart in Figure 5-5 on page 45 shows that the throughput capability increased approximately tenfold. The limiting factor for this application became the I/O subsystem in CICS TS V5.1 instead of the RO TCB.
Updates to the CICS monitoring and statistics data are also associated with the threadsafe program load enhancements. For more information about updates to CICS monitoring, see 5.5.8, “DFHTASK performance group” on page 53. For more information about updates to CICS statistics, see 5.5.11, “Loader domain global statistics” on page 56.
5.3.3 Use of T8 TCB for JDBC calls
A JDBC call from a Java application that uses the type 2 JDBC driver in CICS no longer requires a TCB change mode operation. As described in Chapter 4, “Open transaction environment” on page 27, TCB change mode operations can add CPU overhead to an application. In a CICS JVM server, Java applications run on T8 TCBs. In CICS TS V4.2, DB2 calls from a Java environment required a switch to an L8 TCB. In CICS TS V5.1, this switch is removed.
To demonstrate the improvement, a modified version of the CICS DB2 Dynamic SQL example was used. The application reads 43 rows from a DB2 table and writes the results to a CICS terminal. This combination of DB2 accesses and terminal writes ensures that the application has a mix of JDBC and JCICS calls.
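The benchmark program itself is not reproduced in this report. The following sketch is an illustration only of the JDBC portion of such an application, assuming the commonly used jdbc:default:connection URL for a type 2 connection to the local DB2 subsystem; the table and column names are hypothetical, and the terminal output is handled elsewhere through JCICS calls.
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

public class PolicyQuery {

    // Reads rows through the type 2 JDBC driver. In CICS, the DB2 attachment
    // supplies the connection context, so no user ID or password is coded here.
    public static String readRows() throws Exception {
        StringBuilder out = new StringBuilder();
        // jdbc:default:connection requests a type 2 connection to the
        // local DB2 subsystem to which the CICS region is attached.
        try (Connection con = DriverManager.getConnection("jdbc:default:connection");
             PreparedStatement ps = con.prepareStatement(
                     "SELECT POLICY_ID, HOLDER_NAME FROM PERF.POLICY"); // hypothetical table
             ResultSet rs = ps.executeQuery()) {
            while (rs.next()) {
                out.append(rs.getInt(1)).append(' ')
                   .append(rs.getString(2)).append('\n');
            }
        }
        // The benchmark application then writes this text to the CICS terminal
        // through JCICS calls (not shown), giving the mix of JDBC and JCICS work.
        return out.toString();
    }
}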
The application was tested by using a single CICS region with one JVM server defined. Both configurations used DB2 V10. The JVM server was defined with a limit of 20 concurrent threads and the following JVM parameters:
-Xgcpolicy:gencon
-Xmx600M
-Xms600M
-Xmnx500M
-Xmns500M
-Xmox100M
-Xmos100M
CICS TS V4.2 used Java V6.0.1 SR3, and CICS TS V5.1 used Java V7.0 SR3.
By using the methodology that is described in 2.6, “Collecting performance data” on page 16, CPU usage was recorded by using RMF for five different transaction rates. Figure 5-6 shows the measured CPU utilization as the transaction rate is increased.
Figure 5-6 Plot of CPU utilization against transaction rate for a Java workload in CICS V4.2 and V5.1
The CICS TS V5.1 configuration uses slightly less CPU overall, benefiting from the reduction in TCB change mode operations. Both configurations scale well, as indicated by the linear increase in CPU utilization as the transaction rate increases.
By using the same JDBC application, a second benchmark was run and CICS monitoring facility (CMF) data was collected. Several key performance metrics were extracted from the CMF data and are listed in Table 5-7.
Table 5-7 Average CPU time and TCB breakdown for Java DB2 workload

CICS      Average CPU time (ms)                         Average TCB change
release   Total user    QR TCB    T8 TCB    L8 TCB      mode count
V4.2      4.374         0.310     2.907     1.157       300
V5.1      4.230         0.322     3.844     0.064       202
Table 5-7 shows that the number of TCB change mode operations for each transaction decreased because the workload now completes JDBC calls into DB2 on a T8 TCB, rather than an L8 TCB. This reduction in TCB change modes has the following effects:
The overall CPU per transaction is reduced from 4.374 ms to 4.230 ms.
Where CPU time for DB2 calls was previously accumulated on the L8 TCB, this time is now accumulated on the T8 TCB. A small amount of CPU time is still accumulated on an L8 TCB for sync point processing at transaction completion.
5.4 Changes to system initialization parameters
Several performance-related CICS system initialization table (SIT) parameters were changed in the CICS TS V5.1 release. This section describes the changes to the SIT parameters that have the greatest effect on CICS performance. All comparisons to previous limits or default values refer to CICS TS V4.2.
5.4.1 Active keypoint frequency (AKPFREQ)
The minimum value for the AKPFREQ parameter was decreased from 200 to 50. This value means that completed log task records can be deleted more frequently, which reduces the DASD data space usage. The value that is specified for the AKPFREQ parameter can be zero, or 50 - 65535.
If you specify AKPFREQ=0, no activity keypoints are written. Therefore, replication support is affected because without activity keypointing, tie-up records are not written to replication logs.
For more information, see the topic “AKPFREQ” in the IBM Knowledge Center at this website:
5.4.2 Extended dynamic storage area limit (EDSALIM)
The default value for the EDSALIM parameter was increased from 48 MB to 800 MB. This new default value enables a CICS region that was started with the default value to process a reasonable workload. The value that is specified for the EDSALIM parameter can be 48 MB - 2047 MB in multiples of 1 MB.
For more information, see the topic “EDSALIM” in the IBM Knowledge Center at this website:
5.4.3 Terminal scan delay (ICVTSD)
The default value for the ICVTSD parameter was decreased from 500 to 0. The value that is specified for the ICVTSD parameter can be 0 - 5000 milliseconds.
The terminal scan delay facility was used in earlier releases to limit how quickly CICS dealt with some types of terminal output requests that were made by applications to spread the overhead of dealing with the requests. Specifying a nonzero value was sometimes appropriate where the CICS system used non-SNA networks. However, with SNA and IPIC networks, setting ICVTSD to 0 is appropriate to provide a better response time and the best virtual storage usage.
For more information, see the topic “ICVTSD” in the IBM Knowledge Center at this website:
5.4.4 Maximum open TCBs (MAXOPENTCBS)
MAXOPENTCBS was used to specify the maximum number of open TCBs in the pool of L8 and L9 mode TCBs. These TCBs are used for OPENAPI application programs and task-related user exits that are enabled with the OPENAPI option.
CICS now manages the number of TCBs in this pool automatically by using the following formula based on the current value of the maximum tasks (MXT) system parameter:
MAXOPENTCBS = (2 x MXT) + 32
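For example, with the CICS TS V5.1 default MXT value of 500 (see 5.4.6, “Maximum tasks (MXT)” on page 49), CICS sets the limit of the combined L8 and L9 TCB pool to (2 x 500) + 32 = 1,032 TCBs.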
For more information about open TCBs, see Chapter 4, “Open transaction environment” on page 27.
 
Note: The MAXOPENTCBS parameter was reintroduced in CICS TS V5.2.
5.4.5 Maximum XP TCBs (MAXXPTCBS)
MAXXPTCBS was used to specify the maximum number of open TCBs in the pool of X8 and X9 mode TCBs. These TCBs are used for C and C++ programs compiled with the XPLINK option.
CICS now manages the number of TCBs in this pool automatically by using the following formula based on the current value of the maximum tasks (MXT) system parameter:
MAXXPTCBS = MXT
For more information about open TCBs, see Chapter 4, “Open transaction environment” on page 27.
 
Note: The MAXXPTCBS parameter was reintroduced in CICS TS V5.2.
5.4.6 Maximum tasks (MXT)
The maximum number of user tasks that can exist in a CICS region concurrently was increased in the CICS TS V5.1 release. The maximum value that can be specified for the MXT parameter was increased from 999 to 2000. The minimum value was increased from 1 to 10, and the default value was increased from 5 to 500.
The changes mean that a CICS region operates more efficiently with the default setting and can process more workload, so the need to increase the number of CICS regions is reduced.
 
Note: The default value for MXT was changed to 250 in CICS V5.2.
These changes apply to the MXT system initialization parameter, the MAXTASKS option of the SET SYSTEM and CEMT SET SYSTEM commands, and the MAXTASKS value in CICSPlex SM.
You must ensure that enough storage is available to support the maximum number of tasks value. For more information about setting the maximum task specification, see the “Setting the maximum task specification (MXT)” topic in the IBM Knowledge Center at this website:
When you increase the maximum number of tasks for a CICS region, measure performance to ensure that the response time and other time components (such as dispatch time and suspend time) for your transactions remain acceptable. In some systems, an increase in concurrent tasks might increase resource contention to a level that causes more delays for transactions.
In the performance class data for a transaction, the new MAXTASKS field records the current setting for the maximum number of tasks for the CICS region. The CURTASKS field records the current number of active user transactions in the system at the time the user task was attached. This data helps you to assess the relationship between the task load during the life of a transaction, and the performance of the transaction. For more information about these new performance class data fields, see 5.5, “Enhanced instrumentation” on page 50.
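As an illustration only, the following sketch shows one way to combine the two fields after the SMF 110 performance records are decoded. The field values are assumed to be already extracted by a monitoring reduction tool, and the 80% threshold is an arbitrary example value, not a CICS-supplied recommendation.
public class TaskLoadCheck {

    // Returns the task load at attach time as a percentage of the MXT limit,
    // using the new MAXTASKS and CURTASKS performance class fields.
    public static double mxtUtilization(int curTasks, int maxTasks) {
        return 100.0 * curTasks / maxTasks;
    }

    public static void main(String[] args) {
        int maxTasks = 500;   // hypothetical MAXTASKS value from a CMF record
        int curTasks = 410;   // hypothetical CURTASKS value from the same record
        double pct = mxtUtilization(curTasks, maxTasks);
        System.out.printf("Task attached at %.1f%% of the MXT limit%n", pct);
        if (pct > 80.0) {     // arbitrary threshold for this illustration
            System.out.println("High task load - review dispatch and suspend times");
        }
    }
}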
5.4.7 Priority aging interval (PRTYAGE)
The default value for the PRTYAGE parameter was decreased from 32768 (32.768 seconds) to 1000 (1 second). This lower value means that the priority of long-running tasks that are on the ready queue increases more rapidly.
For more information, see the topic “PRTYAGE” in the IBM Knowledge Center at this website:
5.4.8 Location of terminal user areas (TCTUALOC)
The default value of the TCTUALOC parameter was changed from BELOW to ANY. The specification of TCTUALOC=ANY means that terminal user areas can be stored in 24-bit or 31-bit storage, and CICS uses 31-bit storage to store them if possible.
If you require the terminal user area to be in 24-bit storage because you have application programs that are not capable of 31-bit addressing, specify the system initialization parameter TCTUALOC=BELOW for the CICS region.
For more information, see the topic “TCTUALOC” in the IBM Knowledge Center at this website:
5.5 Enhanced instrumentation
Significant enhancements to monitoring and statistics were made in the CICS TS V5.1 release. This section details the extra and changed fields that are now available in the CMF and statistics SMF records.
5.5.1 DFHCHNL performance group
The following fields were updated to also include request counts from the new EXEC CICS GET64 CONTAINER and EXEC CICS PUT64 CONTAINER API commands:
PGGETCCT
PGPUTCCT
PGMOVCCT
PGGETCDL
PGPUTCDL
PGCRECCT
For more information about counters that are available in the DFHCHNL performance group, see the topic “Performance data in group DFHCHNL” in the IBM Knowledge Center at this website:
5.5.2 DFHCICS performance group
One new field was added to the DFHCICS performance group.
Managed Platform - Policy rule thresholds exceeded (field MPPRTXCD) is the number of policy rule thresholds that this task exceeded.
For more information about counters that are available in the DFHCICS performance group, see the topic “Performance data in group DFHCICS” in the IBM Knowledge Center at this website:
5.5.3 DFHDEST performance group
The following new fields were added to the DFHDEST performance group:
Transient Data intrapartition lock wait time (field TDILWTT)
The elapsed time for which the user task waited for an intrapartition transient data lock.
Transient Data extrapartition lock wait time (field TDELWTT)
The elapsed time for which the user task waited for an extrapartition transient data lock.
For more information about counters that are available in the DFHDEST performance group, see the topic “Performance data in group DFHDEST” in the IBM Knowledge Center at this website:
5.5.4 DFHFILE performance group
The following new fields were added to the DFHFILE performance group:
File control wait time for exclusive control of a VSAM control interval (field FCXCWTT)
The elapsed time in which the user task waited for exclusive control of a VSAM control interval.
File control wait time for a VSAM string (field FCVSWTT)
The elapsed time in which the user task waited for a VSAM string.
For more information about counters that are available in the DFHFILE performance group, see the topic “Performance data in group DFHFILE” in the IBM Knowledge Center at this website:
5.5.5 DFHRMI performance group
Group DFHRMI is present in the performance class record only if RMI=YES is specified on the DFHMCT TYPE=INITIAL macro.
In CICS TS V5.1, the default value for the RMI parameter changed from NO to YES.
5.5.6 DFHSOCK performance group
The following new fields were added to the DFHSOCK performance group:
IPIC session allocation wait time (field ISALWTT)
The elapsed time for which a user task waited for an allocate request for an IPIC session.
Cipher selected (field SOCIPHER)
Identifies the code for the cipher suite that was selected during the SSL handshake for use on the inbound connection.
For more information about counters that are available in the DFHSOCK performance group, see the topic “Performance data in group DFHSOCK” in the IBM Knowledge Center at this website:
5.5.7 DFHSTOR performance group
The following new fields were added to the DFHSTOR performance group:
Number of GCDSA storage getmains (field SC64CGCT)
Number of user-storage GETMAIN requests that are issued by the user task for storage above the bar in the CICS dynamic storage area (GCDSA).
GCDSA storage high water mark above 2 GB (field SC64CHWM)
Maximum amount (high-water mark) of user storage, rounded up to the next 4 KB, allocated to the user task above the bar in the CICS dynamic storage area (GCDSA).
Number of GUDSA storage getmains (field SC64UGCT)
Number of user-storage GETMAIN requests that are issued by the user task for storage above the bar in the user dynamic storage area (GUDSA).
GUDSA storage high water mark above 2 GB (field SC64UHWM)
Maximum amount (high-water mark) of user storage, rounded up to the next 4 KB, allocated to the user task above the bar in the user dynamic storage area (GUDSA).
Number of shared storage getmains above 2 GB (field SC64SGCT)
Number of storage GETMAIN requests that are issued by the user task for shared storage above the bar in the GCDSA or GSDSA.
Shared storage bytes obtained (field SC64GSHR)
Amount of shared storage obtained by the user task by using a GETMAIN request above the bar in the GCDSA or GSDSA. The total number of bytes that are obtained is rounded up to the next 4 KB and the resulting number of 4 KB pages is reported.
Shared storage bytes released (field SC64FSHR)
Amount of shared storage that is released by the user task by using a FREEMAIN request above the bar in the GCDSA or GSDSA. The total number of bytes that are obtained is rounded up to the next 4 KB and the resulting number of 4 KB pages is displayed.
For more information about counters that are available in the DFHSTOR performance group, see the topic “Performance data in group DFHSTOR” in the IBM Knowledge Center at this website:
5.5.8 DFHTASK performance group
The following new fields were added to the DFHTASK performance group that are related to the CICS RO, SO, and T8 mode TCBs:
User task RO TCB dispatch time (field RODISPT)
The elapsed time during which the user task was dispatched by the CICS dispatcher on the CICS RO mode TCB.
User task RO TCB CPU time (field ROCPUT)
The processor time during which the user task was dispatched by the CICS dispatcher on the CICS RO mode TCB.
User task RO TCB wait-for-dispatch time (field ROMODDLY)
The elapsed time for which the user task waited for redispatch on the CICS RO TCB. This time is the aggregate of the wait times between each event completion and user-task redispatch.
User task SO TCB wait-for-dispatch time (field SOMODDLY)
The elapsed time for which the user task waited for redispatch on the CICS SO TCB. This time is the aggregate of the wait times between each event completion and user-task redispatch.
JVM server thread TCB delay time (field MAXTTDLY)
The elapsed time for which the user task waited to obtain a T8 TCB because the CICS system reached the limit of available threads.
The RO mode TCB is used for loading programs, unless the API command to load the program (EXEC CICS LOAD, EXEC CICS XCTL, or EXEC CICS LINK) is issued by an application that is running on an open TCB. In that situation, the open TCB is used to load the program instead of the RO TCB. The CICS RO mode TCB is also used for opening and closing CICS data sets, issuing IBM RACF® calls, and similar tasks.
The SO mode TCB is used to make calls to the socket interface of TCP/IP.
The T8 mode open TCBs are used by a JVM server to perform multi-threaded processing. Each T8 TCB runs under one thread. The thread limit is 2,000 for each CICS region, and each JVM server in a CICS region can have up to 256 threads.
The following new fields were added to the DFHTASK performance group that are related to the hardware environment on which the CICS region is running:
CEC machine type (field CECMCHTP)
The central electronics complex (CEC) machine type, in EBCDIC, for the physical hardware environment where the CICS region is running. CEC is a commonly used synonym for central processing complex (CPC).
CEC model number (field CECMDLID)
The CEC model number, in EBCDIC, for the physical hardware environment where the CICS region is running.
The following new fields were added to the DFHTASK performance group that are related to region load status:
Maximum tasks value (field MAXTASKS)
The maximum task limit (MXT), expressed as a number of tasks, for the CICS region at the time the user task was attached.
Current tasks in CICS region (field CURTASKS)
The current number of active user transactions in the system at the time the user task was attached.
The following new fields were added to the DFHTASK performance group that are related to specialty processor offload rates:
Processor time on a standard processor (field CPUTONCP)
The total task processor time on a standard processor for which the user task was dispatched on each CICS TCB under which the task ran.
Processor time eligible for offload to a specialty processor (field OFFLCPUT)
The total task processor time that was spent on a standard processor, but was eligible for offload to a specialty processor (zIIP or zAAP).
The following derived metrics can be obtained by combining the USRCPUT field with the CPUTONCP and OFFLCPUT fields:
Total CPU time on specialty processor:
USRCPUT - CPUTONCP
Total CPU time on standard processor that was not offload-eligible:
CPUTONCP - OFFLCPUT
Total CPU time that was offload eligible:
OFFLCPUT + USRCPUT - CPUTONCP
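The following sketch applies these formulas to values taken from a single performance class record. The sample values and the microsecond unit are assumptions for illustration; only the arithmetic reflects the relationships listed above.
public class OffloadMetrics {

    public static void main(String[] args) {
        // Hypothetical CPU times, in microseconds, from one performance class record.
        double usrcput  = 4200.0;  // USRCPUT:  total CPU time on all TCBs
        double cputoncp = 3100.0;  // CPUTONCP: CPU time on standard processors
        double offlcput = 1500.0;  // OFFLCPUT: standard-processor time that was offload eligible

        double onSpecialty     = usrcput - cputoncp;             // ran on a zIIP or zAAP
        double notOffloadable  = cputoncp - offlcput;            // standard CP, not eligible
        double offloadEligible = offlcput + usrcput - cputoncp;  // total offload-eligible time

        System.out.printf("CPU time on specialty processors:       %.1f microseconds%n", onSpecialty);
        System.out.printf("Standard CP time, not offload eligible: %.1f microseconds%n", notOffloadable);
        System.out.printf("Total offload-eligible CPU time:        %.1f microseconds%n", offloadEligible);
    }
}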
 
Note: The times that are shown in the CPUTONCP and OFFLCPUT fields are available only when running on a system that supports the Extract CPU Time instruction service that is available on IBM System z9® or later hardware. For z/OS V1R13, the PTF for APAR OA38409 must also be applied.
The following new fields were added to the DFHTASK performance group that are related to CICS application context support:
Application name (field ACAPPLNM)
The 64-character name of the application in the application context data.
Platform name (field ACPLATNM)
The 64-character name of the platform in the application context data.
Application major version (field ACMAJVER)
The major version of the application in the application context data, expressed as a 4-byte binary value.
Application minor version (field ACMINVER)
The minor version of the application in the application context data, expressed as a 4-byte binary value.
Application micro version (field ACMICVER)
The micro version of the application in the application context data, expressed as a 4-byte binary value.
Operation name (field ACOPERNM)
The 64-character name of the operation in the application context data.
For more information about counters that are available in the DFHTASK performance group, see the topic “Performance data in group DFHTASK” in the IBM Knowledge Center at this website:
5.5.9 DFHTERM performance group
The MRO, LU6.1, and LU6.2 session allocation wait time (field TCALWTT) field was added to the DFHTERM performance group. This field is the elapsed time for which a user task waited for an allocate request for a multiregion operation (MRO), LU6.1, or LU6.2 session.
For more information about counters that are available in the DFHTERM performance group, see the topic “Performance data in group DFHTERM” in the IBM Knowledge Center at this website:
5.5.10 Monitoring domain global statistics
The following new fields were added to the collected monitoring domain global statistics:
CEC Machine Type and Model Number (fields MNGMCHTP and MNGMDLID)
The CEC machine type and model number for the physical hardware environment where the CICS region is running.
WLM Address Space Goal Management (field MNGWLMGM)
Whether z/OS Workload Manager manages the CICS address space by using region goals, transaction goals, or both.
A fragment of a sample DFHSTUP report containing the new fields is shown in Example 5-1.
Example 5-1 Fragment of a sample monitoring statistics report produced by CICS TS V5.1 DFHSTUP
CEC Machine Type and Model Number . . : 2817-779
Exception records . . . . . . . . . . : 0
Exception records suppressed. . . . . : 0
Performance records . . . . . . . . . : 0
Performance records suppressed. . . . : 0
...
MVS WLM Mode. . . . . . . . . . . . . : Goal
MVS WLM Server. . . . . . . . . . . . : Yes
MVS WLM Workload Name . . . . . . . . : CICSCPU
MVS WLM Service Class . . . . . . . . : CICSBTCH
MVS WLM Report Class. . . . . . . . . : CICS2A31
MVS WLM Resource Group. . . . . . . . :
WLM Manage Regions Using Goals of . . : Transaction
MVS WLM Goal Type . . . . . . . . . . : Velocity
MVS WLM Goal Value. . . . . . . . . . : 80
MVS WLM Goal Importance . . . . . . . : 1
MVS WLM CPU Critical. . . . . . . . . : No
MVS WLM Storage Critical. . . . . . . : No
For more information about monitoring domain statistics, see the topic “Monitoring domain: global statistics” in the IBM Knowledge Center at this website:
5.5.11 Loader domain global statistics
The following new fields were added to the collected loader domain global statistics:
Library load requests on the RO TCB (field LDGLLRRO)
The number of times that the loader issued a program load request that used the RO TCB. This value is a subset of the number of library loads shown by LDGLLR. To calculate the number of program load requests that ran on open TCBs, subtract this value from the value shown by LDGLLR.
Total loading time on the RO TCB (field LDGLLTRO)
The time taken for the number of library loads shown by LDGLLRRO. This value is a subset of the time shown by LDGLLT. To calculate the time taken for program load requests that ran on open TCBs, subtract this value from the value shown by LDGLLT.
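A minimal sketch of this subtraction is shown below, assuming the statistics values were already extracted from the SMF type 110 statistics records; the variable names mirror the statistics field names and the sample values are illustrative only.
public class LoaderOpenTcbLoads {

    public static void main(String[] args) {
        // Hypothetical loader domain global statistics values.
        long   ldgllr   = 125000;  // LDGLLR:   total library load requests
        long   ldgllrro =  15000;  // LDGLLRRO: library load requests on the RO TCB
        double ldgllt   = 625.0;   // LDGLLT:   total loading time, in seconds
        double ldglltro =  75.0;   // LDGLLTRO: loading time on the RO TCB, in seconds

        long   openTcbLoads = ldgllr - ldgllrro;  // loads that ran on open TCBs
        double openTcbTime  = ldgllt - ldglltro;  // time spent in those loads

        System.out.printf("Loads on open TCBs: %d in %.1f seconds (%.4f seconds each on average)%n",
                openTcbLoads, openTcbTime, openTcbTime / openTcbLoads);
    }
}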
A fragment of a sample DFHSTUP report that contains the new fields is shown in Example 5-2. The Average loading time on the RO TCB field is calculated by the DFHSTUP program and is not included directly in the SMF data.
Example 5-2 Sample CICS TS V5.1 DFHSTUP loader domain global statistics report fragment
LIBRARY load requests . . . . . . . . . . . . . . : 0
LIBRARY load requests on the RO TCB . . . . . . . : 0
Total loading time. . . . . . . . . . . . . . . . : 00:00:00.0000
Total loading time on the RO TCB. . . . . . . . . : 00:00:00.0000
Average loading time. . . . . . . . . . . . . . . : 00:00.000000
Average loading time on the RO TCB. . . . . . . . : 00:00.000000
For more information about loader domain global statistics, see the topic “Loader domain: Global statistics” in the IBM Knowledge Center at this website:
5.6 Virtual storage constraint relief
Virtual storage constraint relief (VSCR) is the reduction of virtual storage usage, which helps avoid short-on-storage conditions and can reduce the need for more CICS regions. CICS TS V5.1 provides VSCR in a number of areas, which reduces pressure on 24-bit and 31-bit virtual storage.
This section describes virtual storage improvements for the following areas:
24-bit virtual (below the line)
31-bit virtual (above the line but below the bar)
64-bit virtual (above the bar)
5.6.1 24-bit virtual storage
The following CICS infrastructure items now use 31-bit storage in place of all, or some, of the 24-bit storage that was used in previous releases:
Sync point and back out processing.
Processing for transient data EXEC CICS application programming commands, wherever possible.
CICS execution diagnostic facility (CEDF).
CICS command-language tables for command interpreter (CECI) and other functions.
Processing for journaling EXEC CICS application programming commands.
Processing for function-shipped DL/I calls.
Mirror transactions. For more information, see 5.6.4, “Mirror transactions” on page 57.
The COMMAREA on an EXEC CICS XCTL call. For more information, see 5.6.5, “XCTL with a communication area” on page 58.
For more information about items that are improved in the CICS TS V5.1, see the topic “Reduced use of 24-bit storage by CICS” in the IBM Knowledge Center at this website:
5.6.2 31-bit storage
The following CICS infrastructure items now use 64-bit storage in place of all, or some, of the 31-bit storage that was used in previous releases:
Console queue processing
Storage allocation control blocks
Loader control blocks
For more information about changes in the use of 31-bit storage, see the topic “Changes in CICS storage use from 31-bit to 64-bit storage” in the IBM Knowledge Center at this website:
5.6.3 64-bit storage
In CICS TS V5.1, the new managed platform and application context CICS facilities use 64-bit virtual storage.
User applications can now directly access 64-bit storage. For more information, see 5.7, “64-bit application support” on page 58.
5.6.4 Mirror transactions
The mirror transactions (CEHP, CEHS, CPMI, CSHR, CSMI, CSM1, CSM2, CSM3, CSM5, and CVMI) are now defined as TASKDATALOC(ANY).
5.6.5 XCTL with a communication area
When a communication area (COMMAREA) is used with an EXEC CICS XCTL command, CICS can now create the COMMAREA in 31-bit storage, rather than 24-bit storage, where appropriate.
When EXEC CICS XCTL is used, CICS ensures that any COMMAREA is addressable by the program that receives it by creating the COMMAREA in an area that conforms to the addressing mode of the receiving program. If the receiver is AMODE(24), the COMMAREA is created in 24-bit storage. If the receiver is AMODE(31), the COMMAREA is created in 31-bit storage.
In earlier releases of CICS, the COMMAREA was always copied into 24-bit storage.
5.6.6 User exit global work area
The EXEC CICS ENABLE PROGRAM system programming command has the new attribute GALOCATION that specifies the location of the storage that CICS provides as a global work area for this exit program. The GALOCATION attribute can have one of the following values:
LOC24
The global work area is in 24-bit storage. This location is the default location.
LOC31
The global work area is in 31-bit storage.
For more information about the EXEC CICS ENABLE PROGRAM system programming command, see the “ENABLE PROGRAM” topic in the IBM Knowledge Center at this website:
5.6.7 Related APARs
The IBM Language Environment® APAR PM57053 reduces the amount of virtual storage that is consumed by applications in 24-bit storage. For more information, see “PM57053: CEECPINI GETS TOO MUCH BELOW-THE-LINE STORAGE IN CICS” in IBM Support Portal at this website:
5.7 64-bit application support
CICS TS V5.1 supports non-Language Environment assembly language programs that run in 64-bit addressing mode, which provides 64-bit application support to access large data objects.
New API commands, a new CICS-supplied procedure, and new CICS executable modules are supplied to provide 64-bit application support. CICS storage manager, program manager, loader domain, CICS-supplied macros, and the CECI and CEDF transactions are changed to provide 64-bit application support. New dynamic storage areas (DSAs) are available in 64-bit storage.
For more information about developing 64-bit assembly language programs, see the “Developing AMODE(64) assembly language programs” topic in the IBM Knowledge Center at this website:
5.8 Java 7 and zEnterprise EC12
As described in 5.2.4, “Java throughput” on page 43, CICS TS V4.2 supports Java V6.0.1 only, and CICS TS V5.1 supports Java 7. The IBM Java 7 SDK for z/OS provides greater exploitation of the IBM zEnterprise EC12 hardware than previous releases of the SDK. The improved hardware exploitation by the Java virtual machine (JVM) can provide significant performance improvements for CICS workloads that use Java.
 
Note: APAR PI30532 enables support for Java 7.1 in CICS TS V5.1. APAR PI52819 enables support for Java 8 in CICS TS V5.1.
This section compares the internal throughput rate (ITR) of a Java workload when running in various combinations of hardware and software configurations. For more information about ITR, see 1.3.2, “Internal throughput rate” on page 7.
The workload was an intensive Java application that processes an inbound web service by using the Java pipeline implementation.
The IBM zEnterprise 196 (z196) was a model M80 that was running on an LPAR configured with four dedicated CPs, which resulted in an LSPR processor equivalent of 2817-704.
The IBM zEnterprise EC12 (zEC12) was a model HA1, running on an LPAR configured with four dedicated CPs, which resulted in an LSPR processor equivalent of 2827-704.
Both configurations used z/OS V1R13. The following hardware and software configuration combinations were studied:
CICS TS V4.2 using Java V6.0.1 SR3 on a z196
CICS TS V5.1 using Java V7.0 SR3 on a z196
CICS TS V5.1 using Java V7.0 SR3 on a zEC12
CICS TS V5.1 using Java V7.0 SR3 with aggressive hardware exploitation on a zEC12
Aggressive hardware exploitation was enabled in Java V7.0 by using the -Xaggressive and the -Xjit:noResumableTrapHandler runtime options.
Figure 5-7 plots the relative ITR values that were recorded for these configurations.
Figure 5-7 Comparison of hardware and software configurations for a Java workload
As shown in Figure 5-7, the use of Java V7.0 in CICS TS V5.1 provided equivalent throughput when compared with Java V6.0.1 in CICS TS V4.2.
Upgrading the hardware to a zEC12 improved the ITR by 30%, and enabling the -Xaggressive option further increased this improvement to 39%.
Another Java workload was also tested that had only a small amount of Java logic. This simple workload demonstrated a 24% improvement when moving from a z196 to a zEC12. The use of the -Xaggressive JVM option increased this improvement slightly to 25%. This improvement of 25% when moving from a z196 to a zEC12 is in line with the LSPR expectations.
For more information about interpreting LSPR tables, see 1.3, “Large Systems Performance Reference” on page 6. The LSPR tables used can be found in IBM Resource Link at this website:
From the LSPR tables, the following relative ITR values apply when running a workload with an average relative nest intensity:
2817-704 = 7.72
2827-704 = 9.66
Therefore, the expected ITR improvement equals 9.66 ÷ 7.72 = 1.25, or an increase of 25%.
5.9 CICSPlex System Manager dynamic routing
By using a variant of the Data Systems Workload (DSW) as described in 3.2, “Data Systems Workload” on page 20, the cost per transaction was measured to understand the overhead that is involved in dynamic transaction routing with CICSPlex System Manager (CICSPlex SM). The following scenarios were used to provide the performance results:
Connections from 2500 simulated LU2 clients were processed directly by the AORs, with no TORs involved.
A single TOR routed transactions to the AORs by using a simple round robin routing algorithm.
Transactions were routed to the AORs with CICSPlex SM dynamic routing by using the CICSPlex SM sysplex optimized routing algorithm.
In all cases, four AORs were used and accessed the data files by using VSAM RLS. TORs were connected to AORs through MRO/XM connections. Eight CPs were online during the measurements, which provided an LSPR processor equivalent of 2817-708.
The results of the performance study are listed in Table 5-8, Table 5-9 on page 61, and Table 5-10 on page 62. Table 5-8 presents the scenario in which no TOR was used; therefore, the TOR ETR and TOR CPU columns are not applicable.
Table 5-8 Performance results with no transaction routing
TOR ETR    TOR CPU    AOR ETR    AOR CPU    CPU per transaction (ms)
n/a        n/a        2072.61    178.94%    0.863
n/a        n/a        2842.46    230.46%    0.810
n/a        n/a        4120.62    324.47%    0.787
n/a        n/a        5035.52    387.15%    0.768
n/a        n/a        5617.90    427.88%    0.761
For this scenario, the average CPU cost per transaction was 0.797 ms.
Table 5-9 presents the scenario in which a simple round-robin routing algorithm is configured by specifying the DTRPGM SIT parameter. This table indicates the migration cost of moving from local access in a single region to MRO transaction routing.
Table 5-9 Performance results with TOR using a round robin algorithm
TOR ETR    TOR CPU    AOR ETR    AOR CPU    CPU per transaction (ms)
1982.71    21.71%     2072.91    179.85%    0.976
2716.34    28.32%     2841.31    229.76%    0.912
3947.17    40.11%     4127.12    320.02%    0.876
4782.32    49.27%     5002.40    380.17%    0.862
5357.32    56.98%     5602.40    414.15%    0.843
The TOR and AOR transaction rates are shown separately because some of the transactions that are routed from the TOR to the AOR issue a local EXEC CICS START command for more transactions when running in the AORs.
For this second scenario, the average CPU cost per transaction was 0.893 ms.
Table 5-10 presents the scenario in which dynamic transaction routing uses the CICSPlex SM sysplex optimized workload routing algorithm.
Table 5-10 Performance results with TOR using CICSPlex SM dynamic transaction routing
TOR ETR    TOR CPU    AOR ETR    AOR CPU    CPU per transaction (ms)
1982.55    26.31%     2071.68    178.87%    0.999
2716.04    34.55%     2840.08    229.68%    0.935
3946.17    48.99%     4125.28    321.11%    0.902
4813.04    59.46%     5033.92    380.43%    0.878
5394.49    67.92%     5640.63    418.77%    0.867
As in Table 5-9 on page 61, the TOR and AOR transaction rates are shown separately to differentiate where local EXEC CICS START commands were issued.
For this final scenario, the average CPU cost per transaction was 0.916 ms.
The difference between the first and second scenarios represents the CPU cost that is incurred by introducing transaction routing into the environment. The average CPU per transaction increased by approximately 0.1 ms.
By using CICSPlex SM Sysplex Optimized Routing, the average cost per transaction increases only slightly (0.026 ms). For this workload, the transaction rate and loading on individual regions is stable and does not frequently cross predefined thresholds. When a workload is running and is not crossing thresholds, updates to the Coupling Facility are made infrequently.
For more information about CICSPlex SM Sysplex Optimized Routing, see the “Sysplex optimized workload management learning path” topic in the IBM Knowledge Center at this website:
5.10 Workload consolidation
As described in 1.2.2, “Factors that can influence RNI” on page 5, tuning to reduce the number of simultaneously active address spaces to the proper number that are needed to support a workload can reduce relative nest intensity (RNI) and improve performance.
Combined with improvements in earlier releases, the following areas in CICS TS V5.1 enable you to run more work through fewer CICS regions than ever before:
Increase in the concurrent task limit
Changes to the MXT system initialization parameter, which now permits a maximum of 2,000 tasks in a CICS region concurrently, are described in 5.4.6, “Maximum tasks (MXT)” on page 49.
Virtual storage constraint relief
The reduction in CICS 24-bit and 31-bit storage usage, as described in 5.6, “Virtual storage constraint relief” on page 56, for CICS TS V5.1 enables more concurrent tasks in a single CICS region.
Threadsafe support
Changes in CICS, as described in 5.3, “Improvements in threadsafety” on page 44, can reduce contention for the QR TCB. Workloads that were formerly constrained by the single QR TCB might now be able to run concurrently on open TCBs.
The process of workload consolidation involves reducing the number of CICS regions while maintaining the same level of availability, reliability, and throughput. The remainder of this section describes two performance studies that demonstrate how workload consolidation can reduce CPU usage, real storage usage, and operational costs (because fewer CICS address spaces must be managed) while maintaining the same transaction throughput.
5.10.1 Consolidating a COBOL VSAM workload
The first consolidation scenario used a variant of the DSW workload, as described in 3.2, “Data Systems Workload” on page 20. Four TORs used CICSPlex SM dynamic routing to distribute work to 30 AORs.
The benchmark followed the standard CICS performance test approach of collecting five measurement intervals at increasing transaction rates. For more information about the methodology used, see 2.6, “Collecting performance data” on page 16. Also, CPU measurement facility (CPU MF) data was collected for the final measurement interval to aid analysis of the workload.
A measurement was created to determine the transaction cost and resource usage before the workload consolidation exercise. The results of this measurement are listed in Table 5-11.
Table 5-11 CPU cost and storage used for DSW workload with 30 AORs
ETR         CICS CPU (% of single CP)    CPU per transaction (ms)    Frames of real storage used
4983.60     253.74%                      0.640                       736,961
6385.12     325.48%                      0.635                       737,319
10135.28    510.46%                      0.619                       738,387
13969.74    704.09%                      0.616                       739,682
15898.14    821.69%                      0.629                       740,917
The average CPU cost per transaction is calculated as 0.627 ms. During the highest transaction rate measurement interval, the real storage usage peaked at 740,917 frames, or 2.87 GB.
The workload was then reconfigured to use only 10 AORs; that is, only 10 CICS AOR address spaces were started, rather than the 30 used in the original setup. No other configuration changes were made, and CICSPlex SM dynamically recognized that only 10 AORs were available.
By using the same data collection methodology, the benchmark was run again and the results are listed in Table 5-12.
Table 5-12 CPU cost and storage used for DSW workload with 10 AORs
ETR         CICS CPU (% of single CP)    CPU per transaction (ms)    Frames of real storage used
4969.95     232.11%                      0.582                       342,299
6390.11     293.22%                      0.568                       342,460
10137.49    456.27%                      0.551                       342,893
13969.68    620.51%                      0.540                       343,470
15867.72    725.80%                      0.557                       343,775
In this configuration, the average CPU cost per transaction is calculated as 0.560 ms. The peak real storage usage was recorded as 343,775 frames, or 1.31 GB.
Although both configurations sustained a throughput of over 15,800 transactions per second, the 10 AOR setup reduced the CPU cost per transaction by 11%. Real storage usage also decreased by over 50%.
It is clear that reducing the number of CICS regions reduces the overall real storage usage. The CPU MF data also shows how the 10 AOR configuration achieves the observed performance improvement.
Table 5-13 presents some of the most significant metrics that were obtained from the CPU MF data that was recorded as part of the measurement process. This data was obtained through a sampling process; therefore, the counters are not absolute values, but present a statistical view of the behavior of the workload when it runs for a sufficiently long time.
Table 5-13 Comparison of 30 AOR and 10 AOR CPU MF data for DSW workload
Performance metric                            30 AORs      10 AORs      Delta
Execution samples                             2,487,298    2,201,099    -11%
Instruction first cycle (IFC)                 379,000      371,470      -2%
Microseconds per transaction                  628.34       556.43       -11%
Cycles per instruction (CPI)                  6.53         5.90         -10%
MIPS per CP                                   797          882          +10%
Data cache misses                             744,894      608,550      -18%
Instruction cache miss (includes TLB miss)    90,483       66,626       -26%
% cycles used by TLB misses                   6.82%        5.94%        -13%
Relative nest intensity                       0.48         0.34
The following data points are extracted from the CPU MF samples:
Execution samples
Represents the number of CPU MF samples that were taken while code was run in a CICS address space used in the workload.
Instruction first cycle (IFC)
Provides a relative indication as to the number of instructions run in the workload.
Microseconds per transaction
Post-processing the CPU MF data requires the counter data and an input of the sustained transaction rate. The post-processing tools can calculate the total CPU that was used from the sample data. Therefore, the tools also can calculate a CPU cost per transaction value.
Cycles per instruction (CPI)
This value represents the average number of CPU clock cycles an instruction took to run. Some instructions can be run in a single clock cycle (such as a register to register operation). Other instructions can take hundreds or thousands of cycles if the operation is moving data within real storage and must wait because data is not available in the cache.
MIPS per CP
As described in 1.2.1, “Memory hierarchy and nest” on page 5, the instruction execution rate of a processor can vary. This metric uses the collected data to determine the millions of instructions per second (MIPS) that each CP managed to achieve during the sample period. A sketch of how the CPI and MIPS per CP values are derived is shown after this list.
Data cache misses
This counter represents the number of cycles for which a processor was waiting for data to become available from the cache.
Instruction cache miss
This counter provides an indication of the number of cycles for which a processor was waiting for instructions to become available from the cache. It also includes time spent while the storage subsystem performs the translation from a virtual to a real storage address when the dynamic-address-translation (DAT) mechanism does not have an entry in the translation lookaside buffer (TLB). For more information about the DAT process and the hardware TLB, see Chapter 3, “Storage”, in z/Architecture Principles of Operation, SA22-7832.
Percent of cycles used by TLB misses
Represents the fraction of cycles that were spent waiting for a TLB miss, where a TLB miss is defined in the Instruction cache miss metric.
Relative nest intensity
RNI is defined in 1.2, “Relative nest intensity” on page 4. The RNI metric is calculated by the post-processing tools by using a formula that is specific to the hardware on which the workload is running. This formula uses a weighted count of cache misses from each of the cache levels in the memory hierarchy. For more information about the formulas that are used, see LSPR workload categories, which is available at this website:
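The following sketch shows the arithmetic behind the CPI and MIPS per CP values. It assumes that total cycle and instruction counts for the interval were already derived from the CPU MF basic counters; the sample values are hypothetical, and the calculation is ordinary cycles-per-instruction and instruction-rate arithmetic, not a replacement for the post-processing tools.
public class CpuMfDerived {

    public static void main(String[] args) {
        // Hypothetical totals derived from CPU MF basic counters for one interval.
        double cycles       = 1.25e13;  // total CPU cycles consumed by the workload
        double instructions = 2.12e12;  // total instructions completed
        double intervalSecs = 300.0;    // length of the measurement interval
        int    cpsUsed      = 8;        // CPs that ran the workload

        double cpi       = cycles / instructions;                        // cycles per instruction
        double mipsPerCp = instructions / intervalSecs / cpsUsed / 1.0e6;

        System.out.printf("CPI = %.2f, MIPS per CP = %.0f%n", cpi, mipsPerCp);
    }
}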
The microseconds per transaction value is reduced by approximately 11% in the second configuration. This delta matches the relative change as measured by RMF. The CPU cost per transaction from the sampling data does not exactly match the value that is reported by RMF because of the manner in which CPU time is accounted for instructions that operate across address spaces.
The number of instructions run (represented by IFC) changes by only approximately 2% between the two environments, which is expected because they run the same number of CICS transactions. The reduction in CPU cost per transaction is because of the smaller number of data cache, instruction cache, and TLB misses. As predicted by 1.2.1, “Memory hierarchy and nest” on page 5, the MIPS rate increases because fewer cycles are wasted while the processor waits for data to be retrieved from deep in the memory hierarchy.
As described in 1.2, “Relative nest intensity” on page 4, the RNI of the workload is also reduced because the processor does not need to go as deep into the memory hierarchy to find the required data.
5.10.2 Consolidating the GENAPP workload
The second consolidation scenario used a variant of the General Insurance Application (GENAPP). GENAPP is available for download as SupportPac CB12, which is available at this website:
The workload was driven by using the web services extensions that are included in the SupportPac. For performance purposes, the supplied VSAM files and DB2 database definitions were extended to include a larger working set of data.
The baseline performance measurement, which used 30 AORs, was run and the results are listed in Table 5-14.
Table 5-14 CPU cost and storage used for GENAPP workload with 30 AORs
ETR        CICS CPU (% of single CP)    CPU per transaction (ms)    Frames of real storage used
828.31     94.85%                       1.145                       862,739
992.14     114.24%                      1.151                       873,593
1237.67    139.43%                      1.126                       880,690
1633.98    185.24%                      1.133                       897,041
1883.25    233.38%                      1.239                       959,291
The average CPU cost per transaction is calculated as 1.159 ms. During the highest transaction rate measurement interval, the real storage usage peaked at 959,291 frames, or 3.66 GB.
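The per-transaction and storage figures can be reproduced directly from the measured values in Table 5-14. The short Python sketch below shows the arithmetic assumed here: CPU per transaction is the CICS CPU (expressed as a fraction of one CP) divided by the external throughput rate (ETR), and real storage is the frame count multiplied by the standard 4 KB z/OS frame size, expressed in binary gigabytes.

```python
# Worked arithmetic for Table 5-14 (GENAPP workload with 30 AORs).
rows = [  # (ETR tx/sec, CICS CPU as % of one CP, frames of real storage)
    (828.31, 94.85, 862_739),
    (992.14, 114.24, 873_593),
    (1237.67, 139.43, 880_690),
    (1633.98, 185.24, 897_041),
    (1883.25, 233.38, 959_291),
]

cpu_per_tx_ms = []
for etr, cpu_pct, frames in rows:
    # CPU seconds consumed per elapsed second = cpu_pct / 100;
    # dividing by ETR gives seconds per transaction, x1000 gives ms.
    ms = (cpu_pct / 100.0) / etr * 1000.0
    cpu_per_tx_ms.append(ms)
    print(f"ETR {etr:8.2f}: {ms:.3f} ms/tx, "
          f"{frames * 4096 / 2**30:.2f} GB real storage")

print(f"Average CPU per transaction: {sum(cpu_per_tx_ms) / len(cpu_per_tx_ms):.3f} ms")
```

Running this sketch gives 1.145 ms through 1.239 ms for the individual rows, an average of 1.159 ms, and 3.66 GB for the peak frame count, matching the values quoted above.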
As with the DSW scenario, the workload was then reconfigured to use only 10 AORs with no other changes to the workload. By using the same data collection methodology, the benchmark was run again and the results are listed in Table 5-15.
Table 5-15 CPU cost and storage used for GENAPP workload with 10 AORs
ETR        CICS CPU (% of single CP)   CPU per transaction (ms)   Frames of real storage used
827.72      86.42%                     1.044                      381,422
986.51     104.35%                     1.057                      389,384
1231.89    129.67%                     1.052                      394,495
1629.05    166.94%                     1.024                      399,247
1916.36    209.88%                     1.095                      464,827
In this configuration, the average CPU cost per transaction is calculated as 1.054 ms. The peak real storage usage was recorded as 464,827 frames, or 1.77 GB.
The sustained transaction rate for each configuration was similar (approximately 1,900 transactions per second), but the 10 AOR setup reduced the CPU cost per transaction by around 9%. Again, peak real storage usage decreased by over 50%.
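Continuing the same arithmetic, the relative savings quoted above can be checked from the averages and the peak frame counts in Table 5-14 and Table 5-15:

```python
# Comparison of the 30 AOR and 10 AOR GENAPP configurations
# (average CPU per transaction and peak real storage frames).
avg_cpu_30, avg_cpu_10 = 1.159, 1.054                 # ms per transaction
peak_frames_30, peak_frames_10 = 959_291, 464_827     # frames of real storage

cpu_saving = (1 - avg_cpu_10 / avg_cpu_30) * 100
storage_saving = (1 - peak_frames_10 / peak_frames_30) * 100

print(f"CPU per transaction reduced by {cpu_saving:.1f}%")      # ~9%
print(f"Peak real storage reduced by {storage_saving:.1f}%")    # just over 50%
```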
By using the metrics as defined for Table 5-13 on page 64, the CPU MF data is listed for the GENAPP workload in Table 5-16.
Table 5-16 Comparison of 30 AOR and 10 AOR CPU MF data for GENAPP workload
Performance metric                           30 AORs      10 AORs      Delta
Execution samples                            3,517,830    3,188,565    -9%
Instruction first cycle (IFC)                589,236      590,667      +2%
Microseconds per transaction                 1240         1095         -11%
Cycles per instruction (CPI)                 5.97         5.39         -10%
MIPS per CP                                  898          1003         +12%
Data cache misses                            1,145,876    932,896      -19%
Instruction cache miss (includes TLB miss)   149,468      115,015      -23%
% cycles used by TLB misses                  9.95         9.23         -7%
Relative nest intensity                      0.75         0.51
The microseconds per transaction value is again reduced by approximately 11% in the second configuration. This reduction in CPU cost per transaction is the result of a drop in data cache, instruction cache, and TLB misses. The MIPS rate for each processor is increased for the same reasons described in the DSW scenario. The RNI metric also decreases because the formula used accounts for the number of cache misses for the workload.
5.11 Effect of threadsafe transient data
CICS TS V5.1 makes the transient data commands threadsafe when they are used with a queue in a local CICS region, or when requests are function shipped to a remote CICS region over an IPIC connection.
This section describes the benefits of threadsafe transient data commands when combined with other improvements in CICS threadsafe support.
5.11.1 Maximizing time on an open TCB
An example threadsafe application is defined to run in user key, makes calls to DB2, and is required to maximize the amount of time spent running on an open TCB. Before CICS TS V4.2, the following options could be specified on the PROGRAM definition to influence this behavior:
Specify API(CICSAPI)
As described in 4.3.2, “Programs specifying JVM(NO) and API(CICSAPI)” on page 30, the program begins execution on the QR TCB, and then switches to an L8 TCB when making a DB2 call. Execution continues on the L8 TCB until a non-threadsafe CICS command is run, at which time it switches back to the QR TCB and stays there until the next DB2 call.
Specify API(OPENAPI)
As described in 4.3.3, “Programs specifying JVM(NO) and API(OPENAPI)” on page 31, the program begins execution on an L9 open TCB. The drawback of specifying OPENAPI is that execution switches to an L8 TCB for each DB2 call, and switches back to an L9 TCB on completion.
CICS TS V4.2 introduced the value REQUIRED for the CONCURRENCY attribute of a CICS PROGRAM definition as a method of maximizing time on an open TCB. As shown in Table 4-1 on page 30, execution begins on an L8 TCB and remains on the L8 TCB when running a DB2 call. When a non-threadsafe CICS command is run, the application switches execution to the QR TCB, but then switches back to the L8 TCB on completion of the command.
5.11.2 Sample transient data and DB2 application
A test application was written that reads a row from a DB2 table and then writes a record to a transient data queue. This DB2 read and then TD write process was repeated 150 times. Three configurations were tested, with the following CICS releases and program attributes:
CICS TS V4.1 - CONCURRENCY(THREADSAFE)
CICS TS V4.2 - CONCURRENCY(REQUIRED)
CICS TS V5.1 - CONCURRENCY(REQUIRED)
In all cases, the program was defined as API(CICSAPI).
In the CICS TS V4.1 configuration, the program starts on the QR TCB until it makes a DB2 call, and then switches to an L8 TCB. The program stays on the L8 until it makes a non-threadsafe call, such as a transient data read. The non-threadsafe call causes execution to switch back to the QR. Program execution now stays on the QR until the next DB2 call when it switches back to the L8. This process continues until the end of the program. Although all of this application code is coded to threadsafe standards and can run on open TCBs, a large portion of it is running on the QR.
In CICS TS V4.2, the program can be defined as CONCURRENCY(REQUIRED). Now the application starts on an L8 TCB and stays there until it makes a non-threadsafe call, such as the transient data write. When the transient data command completes, execution switches back to the L8 TCB immediately and continues running on the open TCB. Now only a small portion of the application runs on the QR TCB and most of it runs on the L8 TCB, which reduces contention on the QR TCB.
When the application is moved to CICS TS V5.1, the transient data write, which caused TCB change mode operations in earlier releases, is now threadsafe. The entire application can now run on open TCBs, with a reduction in CPU per transaction because of the removal of the change mode operations.
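The change mode counts that are reported by CICS monitoring in the next section can be rationalized with a simple model of the TCB switching rules described above. The following Python sketch is a simplification: it counts only the switches driven by the DB2 call and the transient data write in the 150-iteration loop, and the small fixed overheads for task attach, syncpoint, and task detach (visible in Table 5-17 as the difference from a round 300) are not modeled.

```python
# Simplified model of TCB change mode operations for the sample application:
# 150 iterations of (EXEC SQL read, transient data write), three configurations.

ITERATIONS = 150

def change_modes(concurrency, td_is_threadsafe):
    """Count the switches driven by the DB2 call and TD write in one task.

    CONCURRENCY(REQUIRED) starts on an open TCB and returns to it after any
    non-threadsafe command; CONCURRENCY(THREADSAFE) starts on the QR TCB and
    stays on the QR after a non-threadsafe command until the next DB2 call.
    """
    switches = 0
    on_open_tcb = (concurrency == "REQUIRED")
    for _ in range(ITERATIONS):
        if not on_open_tcb:              # the DB2 call must run on an L8 open TCB
            switches += 1
            on_open_tcb = True
        if not td_is_threadsafe:         # non-threadsafe TD write runs on the QR
            switches += 1                # switch to the QR for the command
            if concurrency == "REQUIRED":
                switches += 1            # and straight back to the L8 TCB
            else:
                on_open_tcb = False      # THREADSAFE: execution stays on the QR
    return switches

print(change_modes("THREADSAFE", td_is_threadsafe=False))  # 300 (302 measured, V4.1)
print(change_modes("REQUIRED",  td_is_threadsafe=False))   # 300 (306 measured, V4.2)
print(change_modes("REQUIRED",  td_is_threadsafe=True))    # 0   (8 measured, V5.1)
```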
5.11.3 CICS monitoring data results
By using the application as described in 5.11.2, “Sample transient data and DB2 application” on page 68, a workload was run and CMF data was collected.
The CMF data was post-processed by using the CICS Performance Analyzer tool. Extracts from key metrics are listed in Table 5-17.
Table 5-17 CICS monitoring data for transient data and DB2 application
CICS   Avg response   Avg user CPU   Avg QR CPU   Avg L8 CPU   Avg change   Avg TD      Avg RMI DB2
       time (ms)      time (ms)      time (ms)    time (ms)    mode count   cmd count   time (ms)
V4.1   11.942         6.967          4.597        2.370        302          150         1.626
V4.2   11.393         6.875          0.212        6.663        306          150         1.420
V5.1   6.805          6.195          0.026        6.169        8            150         1.147
The CICS V4.1 configuration shows an average QR CPU usage of 4.597 ms per transaction and an L8 TCB average of 2.370 ms per transaction. As described in 5.11.2, “Sample transient data and DB2 application” on page 68, the application is coded to threadsafe standards, but most of the application runs on the QR TCB. Average CPU per transaction is reported as 6.967 ms.
When the application is moved to CICS V4.2 and CONCURRENCY(REQUIRED), most of the code now runs on an open TCB. The average QR CPU usage is reduced to 0.212 ms per transaction, while the L8 TCB average is increased to 6.663 ms per transaction. There is still a high number of TCB change mode operations because the transient data write command is non-threadsafe in CICS V4.2. Average CPU per transaction remains effectively unchanged at 6.875 ms.
Moving the application to CICS V5.1 introduces threadsafe transient data, which removes the requirement for most of the TCB change mode operations. The number of change modes is reduced from 306 to 8. Only start-of-task and end-of-task processing are now run on the QR TCB, which reduces the average QR CPU usage down to 0.026 ms per transaction. The reduction in change mode operations reduces the CPU usage overall, with the total CPU per transaction reduced from 6.967 ms to 6.195 ms, which is a reduction of over 10% for this application.
Moving to CONCURRENCY(REQUIRED) and CICS V5.1 also reduces the average response time significantly, from 11.942 ms to 6.805 ms.
5.11.4 Throughput results
Figure 5-8 shows the CPU usage of the CICS region as the transaction rate increases for the workload, in all of the CICS TS V4.1, V4.2, and V5.1 configurations.
Figure 5-8 CPU comparison for TD and DB2 workload on CICS TS V4.1, V4.2, and V5.1
Figure 5-8 also shows that the CICS V4.1 configuration becomes constrained by the QR TCB at around 200 transactions per second. This result is in line with expectations based on the figures in Table 5-17 on page 69. In the CICS V4.1 results, each transaction required 4.597 ms of CPU time on the QR TCB. Therefore, the QR TCB can sustain only a maximum of 1000 / 4.597 = 217 transactions per second.
By specifying a value of REQUIRED for the CONCURRENCY attribute in CICS V4.2, the constraint on the QR TCB is eliminated. In this configuration, the CPU cost per transaction is approximately the same as for CICS V4.1 (as shown by the overlapping lines), but the maximum throughput increases to around 570 transactions per second. The constraint is now the total CPU capacity available on the LPAR.
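The QR TCB ceiling for each configuration can be estimated from the average QR CPU time per transaction in Table 5-17, because a single TCB cannot consume more than 1000 ms of CPU per elapsed second. The sketch below shows that estimate; it ignores QR TCB work performed by CICS system tasks, so the achievable rate in practice is somewhat lower than these figures.

```python
# Theoretical upper bound on throughput imposed by the single QR TCB:
# one TCB can consume at most 1000 ms of CPU per elapsed second.
qr_cpu_ms_per_tx = {"V4.1": 4.597, "V4.2": 0.212, "V5.1": 0.026}

for release, qr_ms in qr_cpu_ms_per_tx.items():
    print(f"{release}: QR TCB limit ~ {1000 / qr_ms:,.0f} transactions per second")

# V4.1: ~217 per second (consistent with the observed constraint near 200/sec).
# V4.2 and V5.1: thousands per second, so the LPAR CPU capacity becomes the limit.
```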
The elimination of the TCB switching in CICS V5.1 reduces total CPU per transaction and enables even higher throughput to be achieved before being limited by the available CPU on the LPAR.
5.12 Transaction isolation
CICS transaction isolation builds on CICS storage protection, which enables user transactions to be protected from one another. Transaction isolation uses the MVS subspace group facility to offer protection between transactions. This configuration ensures that an application program that is associated with one transaction cannot accidentally overwrite the data of another transaction.
Transaction isolation is enabled globally for a CICS region by specifying the TRANISO=YES system initialization (SIT) parameter at startup. Transaction isolation requires storage protection to be enabled, which is achieved by specifying the STGPROT=YES SIT parameter.
In addition to specifying the storage and execution key individually for each user transaction, you can specify that CICS is to isolate the user-key task-lifetime storage of a transaction to provide transaction-to-transaction protection. You complete this task by using the ISOLATE option of the TRANSACTION resource definition.
Transaction isolation does not apply to 64-bit storage.
For more information about the use of transaction isolation in a CICS environment, see the topic “CICS storage protection and transaction isolation” in the IBM Knowledge Center at this website:
Transaction isolation is not a new feature in CICS TS V5.1, although changes to default values in this release can affect how storage is allocated when compared to previous CICS TS releases. The remainder of this section has the following goals:
Describes the concepts of transaction isolation
Provides a study that demonstrates the overhead of enabling transaction isolation
Highlights how changes to default values in CICS TS V5.1 can affect your environment when running with transaction isolation enabled
5.12.1 Unique subspaces
In general, transaction isolation ensures that user-key programs are allocated to separate (unique) subspaces. Transactions that run in a unique subspace have the following characteristics:
They specify ISOLATE(YES) and KEY(USER) on the transaction definition.
They have read and write access to the user-key task-lifetime storage of their own tasks, which is allocated from one of the user dynamic storage areas (UDSA or EUDSA).
They have read and write access to shared storage, which is storage that is obtained by GETMAIN commands with the SHARED option (SDSA or ESDSA).
They have read access to the CICS-key task-lifetime storage of other tasks (CDSA or ECDSA).
They have read access to CICS code.
They have read access to CICS control blocks that are accessible by the CICS API.
User-key programs do not have any access to user-key task-lifetime storage of other tasks.
You might have some transactions where the application programs access one another's storage in a valid way. One such case is when a task waits on one or more event control blocks (ECBs) that are later posted, by an MVS POST or by hand posting, by another task.
For example, a task can pass the address of a piece of its own storage to another task (by a temporary storage queue or some other method) and then WAIT for the other task to post an ECB to indicate that it has updated the storage. If the original task is running in a unique subspace, the posting task fails when it attempts to update the storage and post the ECB, unless the posting task is running in CICS-key.
CICS supports the following methods to ensure that transactions that must share storage can continue to work in the subspace group environment:
Use of the common subspace
Use of the base space
Use of common storage by obtaining the storage with the SHARED option
5.12.2 Common subspace
You can specify that all the related transactions are to run in the common subspace. The common subspace allows tasks that must share storage to coexist, while isolating them from other transactions in the system. Transactions that are assigned to the common subspace have the following characteristics:
They specify ISOLATE(NO) on the transaction definition.
They have read and write access to each other's task-lifetime storage.
They have no access of any kind to storage of transactions that run in unique subspaces.
They have read-only access to CICS storage.
5.12.3 Base space
Programs that are defined with EXECKEY(CICS) run in the base space.
You can ensure that the application programs of the transactions that share storage are all defined with EXECKEY(CICS). This setting ensures that their programs run in the base space, where they have read and write access to all storage. However, this method is not recommended because it does not provide any storage protection.
5.12.4 Transaction isolation performance effect
The performance figures in this section were obtained when running the DSW static routing workload as described in 3.2.1, “DSW static routing” on page 20 by using CICS TS for z/OS V5.1. Table 5-18 lists the CPU cost per transaction and number of real frames of storage that is used by the workload when running with transaction isolation disabled.
Table 5-18 CPU cost per transaction and real storage frames used with transaction isolation disabled
ETR        Real storage frames   CPU per transaction (ms)
2072.30    163,292               0.618
2842.24    163,292               0.609
4130.87    163,335               0.594
5047.97    163,335               0.594
5681.45    163,487               0.586
The average CPU cost per transaction is calculated to be 0.600 ms.
Table 5-19 lists the CPU cost per transaction and number of real frames of storage that are used by the same workload when running with transaction isolation enabled.
Table 5-19 CPU cost per transaction and real storage frames used with transaction isolation enabled
ETR        Real storage frames   CPU per transaction (ms)
2073.25    188,103               0.677
2842.38    188,103               0.670
4129.20    188,138               0.659
5044.09    185,032               0.658
5676.44    185,111               0.655
The average CPU cost per transaction is calculated to be 0.664 ms.
The data in Table 5-18 on page 72 and Table 5-19 on page 73 shows that the use of transaction isolation does have a cost. In this workload, the increase in CPU was about 10% where all of the transactions ran in unique subspaces. There was also an increase in real storage usage.
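The size of that overhead can be quantified from the averages and the peak frame counts in Table 5-18 and Table 5-19. The short sketch below shows the arithmetic; the peak is taken as the largest frame count observed in each table, and the conversion assumes the standard 4 KB z/OS frame size.

```python
# CPU and real storage overhead of enabling transaction isolation
# for the DSW static routing workload (Tables 5-18 and 5-19).
avg_cpu_off, avg_cpu_on = 0.600, 0.664                 # ms per transaction
peak_frames_off, peak_frames_on = 163_487, 188_138     # real storage frames

cpu_overhead_pct = (avg_cpu_on / avg_cpu_off - 1) * 100
extra_frames = peak_frames_on - peak_frames_off

print(f"CPU per transaction increase: {cpu_overhead_pct:.1f}%")   # ~10%
print(f"Extra real storage: {extra_frames:,} frames "
      f"(~{extra_frames * 4096 / 2**20:.0f} MB)")
```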
5.12.5 Page and extent sizes
The transaction isolation facility increases the allocation of some 31-bit virtual storage for CICS regions in which it is active.
If you are running with transaction isolation active, CICS allocates storage for task-lifetime storage in multiples of 1 MB for user-key tasks that require 31-bit storage. The minimum unit of storage allocation in the EUDSA when transaction isolation is active is 1 MB.
Table 5-20 lists the page and extent sizes for the UDSA and EUDSA storage areas, where transaction isolation is disabled and transaction isolation is enabled.
Table 5-20 Page and extent sizes with transaction isolation disabled and enabled
Storage area   Page size (tran iso disabled)   Page size (tran iso enabled)   Extent size (tran iso disabled)   Extent size (tran iso enabled)
UDSA           4 KB                            4 KB                           256 KB                            1 MB
EUDSA          64 KB                           1 MB                           1 MB                              1 MB
As described in 5.6.4, “Mirror transactions” on page 57, CICS TS V5.1 changes the default value for the TASKDATALOC attribute on a TRANSACTION resource from BELOW to ANY. This change also affects the mirror transaction CSMI.
If running with transaction isolation enabled, the change to TASKDATALOC for the mirror transaction might cause higher peak usage in EUDSA because any GETMAIN request for user storage by this transaction is allocated in 31-bit storage, not 24-bit storage. As shown in Table 5-20, the minimum storage that is allocated is 1 MB, not 4 KB.
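As a rough illustration of why this matters, the sketch below estimates the EUDSA task-storage footprint for a given number of concurrent user-key tasks, comparing the 1 MB minimum allocation that applies when transaction isolation is active with the 64 KB page size that applies when it is not. The task count and GETMAIN size are hypothetical values chosen for illustration; actual usage depends on how many tasks are in flight and how much user storage each task obtains.

```python
# Rough estimate of EUDSA task-lifetime storage with and without
# transaction isolation (allocation units from Table 5-20). The figures
# below are hypothetical and for illustration only.
MB = 1024 * 1024
KB = 1024

concurrent_user_key_tasks = 200     # hypothetical peak of in-flight mirror tasks
small_getmain = 2 * KB              # each task obtains a small piece of 31-bit user storage

# Each task consumes at least one allocation unit, however small the GETMAIN.
without_tran_iso = concurrent_user_key_tasks * max(small_getmain, 64 * KB)
with_tran_iso = concurrent_user_key_tasks * max(small_getmain, 1 * MB)

print(f"Without transaction isolation: {without_tran_iso / MB:.1f} MB of EUDSA")   # 12.5 MB
print(f"With transaction isolation:    {with_tran_iso / MB:.1f} MB of EUDSA")      # 200.0 MB
```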
 