The z/OS systems programmer
The IBM z/OS systems programmer is often the heart of the shop. Historically, z/OS systems programmers held responsibility for managing the z Systems hardware configuration and for installing, customizing, and maintaining the z/OS operating system. It is their responsibility to ensure that z Systems services are available and operating according to service level agreements. In some organizations, the z/OS systems programmer might also be responsible for capacity planning.
This system expertise and view of the greater picture make the z/OS systems programmer a foundational resource in the multi-person role of service provider. As part of that role, the modern z/OS systems programmer must branch out from their typical responsibilities and team with other supporting roles in the organization to understand the needs of the service consumers. They must be willing to look at the needs of the service consumer objectively and creatively, find a way to provide the services that consumers require, and support those services based on their traditional expertise. This challenge can be exciting for the z/OS systems programmer and can yield resource savings and increased operational efficiency.
This chapter provides a look into Walmart’s environment and how they used key pieces of it to create a z/OS cloud service. It also describes the role that each part of the system plays in providing a key resource and how the components can be used for the cloud service.
This chapter includes the following topics:
5.1, “Sysplex architecture”
5.2, “Workload manager”
5.3, “TCP/IP”
5.4, “High availability”
5.5, “VSAM RLS”
5.6, “z/OS configuration”
5.7, “Security”
5.8, “Summary”
5.1 Sysplex architecture
A key enabler of the cloud delivery model for z/OS is Parallel Sysplex. Although it is not required, Parallel Sysplex greatly facilitates the characteristics that are needed for cloud services.
5.1.1 Resource pooling, rapid elasticity, and measured service
Two of the essential characteristics of cloud, resource pooling and rapid elasticity, can be achieved primarily by configuring the z/OS Parallel Sysplex environment, with little need for extra components to be developed or installed.
Resource pooling has been an inherent characteristic of the platform since its inception. Because of the virtualized nature of the system, nearly all processing on z/OS is performed atop resource pools. Examples of these pooled resources include logical compute cycles, virtual storage (memory), and Data Facility Storage Management Subsystem (DFSMS) storage groups. Beyond the resource pooling that is provided as part of the platform, other forms of resource pooling can be configured and used for service delivery; for example, a set of high availability (HA) CICS regions can be considered a pool.
Elasticity can be achieved relatively easily in z/OS by over-provisioning the virtual runtime environments that host the services and then relying on platform capabilities, such as Workload Manager (WLM), to ensure that those address spaces receive enough resources to satisfy demand and to reassign those resources elsewhere as needed. For more information about the most relevant WLM concepts, see 5.2, “Workload manager” on page 47.
The measured service characteristic is also largely enabled by configuring the environment, but likely needs some investment to fully realize. Configuring System Management Facilities (SMF) properly enables capturing most of the necessary data. However, this data is not immediately consumable. More tools or products are needed to make the information available to the consumers, providers, and other operational roles. For more information about this concept, see Chapter 6, “Operational considerations” on page 59 and the IBM Redbooks publication, Creating IBM z/OS Cloud Services, SG24-8324.
5.1.2 Unique characteristics
The existence of shared resources across disparate systems is a unique characteristic of the z/OS Parallel Sysplex architecture. Concepts such as distributed dynamic virtual IP addressing (DVIPA), which simplifies endpoint management, and data sharing, which reduces data replication requirements, provide particular value in a service-oriented delivery model. These attributes must be considered when determining the types of services that are a good fit for the platform.
Walmart used these and other characteristics of the platform to provide a differentiated experience to consumers with their caching service.
5.1.3 Important pieces
Numerous pieces are required to set up a z/OS Parallel Sysplex, and volumes of documentation are available on the subject. The complete setup of a Parallel Sysplex is beyond the scope of this publication and is not described here. This material assumes that a base Parallel Sysplex is defined; the remainder of this chapter focuses on the pieces of the z/OS and Parallel Sysplex configuration that are most relevant to cloud service delivery. Many of these settings are directly related, and complementary, to the items that are described in Chapter 4, “The CICS systems programmer” on page 27.
5.2 Workload manager
The WLM component of the z Systems platform plays an important role in supporting the pooled resources and rapid elasticity characteristics of the cloud delivery model. These characteristics rely on automated assignment and reassignment of resources within predefined rules. WLM provides this function through policy-based resource allocations across different workload categorizations and groups.
There are various WLM-related features that can be used to ensure a dynamic hosting environment. In Walmart’s case, the following particular features were used:
LPAR weighting
Service class
WLM ASID weights
5.2.1 LPAR weighting
Walmart uses logical partition (LPAR) weighting at the Processor Resource/System Manager (PR/SM) level. This feature is used to ensure that a particular amount of capacity is available to each system that is hosting services, while also allowing these systems to make use of unused capacity on the central processor complex (CPC), if needed.
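As an illustration (the weights in this example are hypothetical, not Walmart’s actual values), if three LPARs that share processors on a CPC are assigned weights of 500, 300, and 200, the first LPAR is guaranteed approximately 500 / (500 + 300 + 200) = 50% of the shared processor capacity when the CPC is fully used, and any of the LPARs can consume more than its guaranteed share whenever the other LPARs leave capacity unused.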
5.2.2 Service class
The new cloud services (and web services in general) are regarded much like traditional online workloads, so WLM service classes were used as models for management of newer service-oriented workloads. In fact, Walmart adopted SOA-style web services many years earlier, and the same WLM service classes were used to manage the regions that host cloud services.
The address space identifiers (ASIDs) that are associated with the regions are included in service classes that carry a very high importance (1) and a higher-than-average velocity (60), with NORMAL I/O priority, as shown in Figure 5-1.
Base goal:
CPU Critical: NO        I/O Priority Group: NORMAL

#  Duration  Imp  Goal description
-  --------  ---  -------------------------------------------------
1            1    Execution velocity of 60
Figure 5-1 WLM service class for cloud service workloads
The ASID service class is used by the z/OS Workload Manager to ensure that performance goals (which are specified in business terms) are met. The activity that is associated with meeting that performance goal is the responsibility of the operating system. The actions that the system takes are related to the allocation of CPU and storage.
5.2.3 WLM ASID weights
WLM also maintains an awareness of how well each ASID within a service class is meeting its goals. This information is used through WLM managed sysplex distribution to direct new work to the regions that are meeting or exceeding their associated goals, and are most suited to accommodate new requests. For more information about the use of this capability, see 5.3, “TCP/IP” on page 48.
5.3 TCP/IP
Chapter 4, “The CICS systems programmer” on page 27 describes many aspects of this cloud service delivery implementation from the CICS perspective that rely upon the configuration of TCP/IP that is provided by the z/OS systems programmer. In particular, configuring sysplex distributed DVIPA to be managed by WLM is needed for this design.
Three general tasks make up this setup: defining the TCP/IP configuration to enable sysplex distribution, establishing shared ports, and assigning the DVIPA with ASID-level WLM management. Each of these tasks, as described in the following sections, is configured in an installation’s TCP/IP profile data set.
5.3.1 Enabling sysplex distribution
Two statements are required in the TCP/IP profile IPCONFIG (or IPCONFIG6) to initially enable this functionality: DYNAMICXCF and SYSPLEXROUTING, as shown in Figure 5-2.
IPCONFIG
DYNAMICXCF 123.45.67.89 255.255.255.224 1
SYSPLEXROUTING
Figure 5-2 TCP/IP profile configuration for sysplex distribution
The DYNAMICXCF statement establishes a single address by which each stack in a sysplex can communicate with the other stacks. It includes the IP address, a subnet mask, and a cost metric identifier (the cost metric is overridden when OMPROUTE is used).
The SYSPLEXROUTING statement establishes the TCP/IP stack as part of a sysplex and enables the use of WLM consultation for distributing requests to the stacks in the sysplex.
5.3.2 Establish shared ports
To set up shared ports, assign the same port number to each region in a cluster or group of regions and identify the port as shared. The port assignment is configured in the TCP/IP profile, as shown in Figure 5-3 on page 49.
 
PORT 12345   TCP CICSRG1    NOAUTOL DELAYA SHAREP
PORT 12345   TCP CICSRG2    NOAUTOL DELAYA SHAREP
PORT 12345   TCP CICSRG3    NOAUTOL DELAYA SHAREP
PORT 12345   TCP CICSRG4    NOAUTOL DELAYA SHAREP
PORT 12345   TCP CICSRG5    NOAUTOL DELAYA SHAREP
PORT 12345   TCP CICSRG6    NOAUTOL DELAYA SHAREP
Figure 5-3 PORT statements in TCP/IP profile
Figure 5-3 shows the explicit assignment of a single port number to multiple regions. To reduce ongoing maintenance, the region name can be wild-carded so that any other regions that are created under the same name pattern acquire the port assignment without further updates to the TCP/IP profile data set, as shown in the following example:
PORT 12345 TCP CICSRG* NOAUTOL DELAYA SHAREP
The NOAUTOL, DELAYA, and SHAREP parameters of the PORT statement serve the following purposes:
NOAUTOL(OG): Prevents the region (ASID) from being automatically restarted after it is stopped.
DELAYA(CKS): Explicitly assigns the default behavior of delaying transmission of acknowledgments when a packet is received with the PUSH bit on in the TCP header.
SHAREP(ORT): Required to share the port across multiple listeners.
5.3.3 Assign DVIPA
The DVIPA can then be set up for an address by using the shared port that the groups of regions listen on, as shown in Figure 5-4.
VIPADYNAMIC
VIPADIST DISTM SERVERWLM 123.45.67.89
PORT  12345 XXXXX XXXXX XXXXX XXXXX
DESTIP ALL
ENDVIPADYNAMIC
Figure 5-4 VIPADYNAMIC-VIPADISTRIBUTE statements in TCP/IP profile
This configuration statement includes the following components:
VIPADYNAMIC VIPADIST(RIBUTE) (DEFINE implicit): Enables the sysplex distributor function for DVIPA.
DISTM(ETHOD) SERVERWLM: Specifies that server-specific values should be collected for this group of DVIPA ports. This specification allows WLM to assign weights to individual ASIDs for request distribution decisions.
DESTIP ALL: Specifies that all TCP/IP stacks in the sysplex are targets for the DVIPA/Ports in this statement.
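For reference, the following block shows the same configuration with the keywords spelled out in full and with an explicit VIPADEFINE added for the distributing stack. This block is a sketch: the subnet mask, address, and port are the placeholder values that are used in the previous figures, and MOVEABLE IMMEDIATE is the default behavior:
VIPADYNAMIC
  VIPADEFINE MOVEABLE IMMEDIATE 255.255.255.224 123.45.67.89
  VIPADISTRIBUTE DEFINE DISTMETHOD SERVERWLM 123.45.67.89
    PORT 12345
    DESTIP ALL
ENDVIPADYNAMIC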
The combination of shared ports and WLM-managed sysplex distribution ensures that requests are routed to the most appropriate system and region in the sysplex to handle that request at any specific time. The design of this type of environment is shown in Figure 5-5 on page 50.
Figure 5-5 WLM managed sysplex distribution
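After the configuration is activated, one common way to confirm that distribution is in effect and to see the weights that WLM assigns to each target (this is a general z/OS technique rather than a Walmart-specific procedure) is to display the dynamic VIPA destination port table from the console:
D TCPIP,<tcpipproc>,NETSTAT,VDPT,DETAIL
The detailed report lists each target stack and listening region for the distributed DVIPA and port, along with the weight that is currently assigned to it.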
5.3.4 Domain Name System
The responsibility for Domain Name System (DNS) management often is outside of the z/OS-related roles. However, the z/OS systems programmer and the CICS systems programmer often must interact with and use the services of the DNS administrators in their organization.
A DNS name makes IP addresses more logical and consumable. To establish an accessible entry point for the defined VIPA, the z/OS systems programmer should request a DNS name to be associated with it. This association is accomplished with a DNS A record.
Walmart employs a model in which a DNS CNAME alias is also defined and associated with the DNS A record that was assigned to the VIPA. Network load balancers translate a request for the alias and forward it to the DNS name that is assigned to the VIPA and the port number of the service-hosting region cluster. This design provides several benefits in abstraction and manageability. The inclusion of the DNS constructs in the TCP/IP configuration is shown in Figure 5-6 on page 51.
Figure 5-6 DNS assignments provide abstraction
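A minimal sketch of the corresponding DNS zone entries is shown in the following example. The host and alias names are hypothetical; only the general shape of the A record and the CNAME alias is illustrated, using the placeholder DVIPA from the earlier figures:
; A record that maps a host name to the DVIPA
cacheplex.example.com.     IN  A      123.45.67.89
; CNAME alias that consumers and load balancers reference
cacheservice.example.com.  IN  CNAME  cacheplex.example.com.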
For more information about the various ways that value is derived from this model, see Chapter 4, “The CICS systems programmer” on page 27. A particular aspect of this design that remains relevant to the z/OS systems programmer is its role in ensuring availability of service. For more information, see 5.4, “High availability”.
5.4 High availability
Along with enabling the resource pooling and rapid elasticity characteristics, the design that is described in section 5.3, “TCP/IP” on page 48 and shown in Figure 5-5 on page 50 and Figure 5-6 on page 51 contributes substantially to high availability (HA).
Distributing requests across multiple systems and multiple regions per system provides resiliency. A single region, multiple regions, or even an entire LPAR or system can be removed (in a planned or unplanned manner), and service requests can still be satisfied. The workload can continue to be distributed to the remaining systems and regions, as shown in Figure 5-7 on page 52.
 
Figure 5-7 Service availability if there are outages
In a situation where multiple regions or an entire system is removed from service, performance degradation can occur because of limited resources. These types of scenarios should be considered when planning the initial design of the hosting environment. Sufficient resources should be defined to handle expected workloads if a portion of the environment is unavailable.
Walmart elected to provide tiers of availability assurances that were related to their services. The basic availability configuration is sufficient for many consumers, but some consumers wanted greater availability assurances. As a result of this consumer feedback, the platform service engineers developed features that gave consumers availability across geographic fault zones.
The Walmart engineers developed mechanisms to replicate a service instance’s data commits, at the request level, to other data centers. The consumer can choose a tier with synchronous replication and active multi-site processing, or asynchronous replication and active/standby site processing, depending on their needs and their willingness to fund the level of service they want.
This capability is made possible by the design and configuration of the environments. By defining similar resources and TCP/IP environments at each site, the engineers can apply a little creativity and greatly increase the scope of the processing environment, and subsequently the availability of a service instance. With this design (as shown in Figure 5-8), service requests can continue to be satisfied even with the loss of an entire data center.
Figure 5-8 Multi-site hosting
5.5 VSAM RLS
The value that is associated with data sharing that is afforded by z/OS Parallel Sysplex was briefly described in 5.1.2, “Unique characteristics” on page 46. Although the relatively simple concept of serialized access to files or data sets from disparate systems is impressive, serialization at a more granular scope (such as at an individual record level within a data structure) is remarkable. VSAM record-level sharing (RLS) provides this unique capability.
The Walmart platform services engineers recognized the value of this feature and used the capability to provide a differentiated experience to their service consumers. However, configuration of the components that were related to RLS needed some focus to ensure a stable experience for consumers.
Great care must be taken to ensure that this portion of the environment is configured appropriately for the particular usage characteristics. Tuning this configuration for a specific use case can be complicated, but there are some areas to focus on and some general guidance, as described in the following sections.
5.5.1 Feature enablement
Consider the following key features:
RLS_MaxCFFeatureLevel(Z|A)
This parameter is defined in the IGDSMSxx PARMLIB member and controls the size of data that is placed in the coupling facility (CF) cache structures. A value of Z specifies that only control intervals (CIs) of 4 KB or less are placed in the cache structures; this setting generally works best for data that is mostly read-only. A value of A allows CIs of up to 32 KB to be placed in cache and is often appropriate for data that is heavily updated. A sample IGDSMSxx fragment is shown after this list.
RLS CF CACHE in data class
Values determine which components are cached during different types of activity. Walmart sets this value to ALL for all data classes that are associated with RLS data sets. This setting indicates that all data for the sphere is eligible for caching. Other options include NONE, UPDATESONLY (only updated control intervals are cached), and DIRONLY (only the directory is cached).
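The following fragment is a minimal sketch of how the PARMLIB side of these settings might appear in an IGDSMSxx member. The control data set names are placeholders, the values reflect the guidance in this chapter rather than Walmart’s exact configuration, and the data class RLS CF CACHE value is set through ISMF rather than in PARMLIB:
SMS ACDS(SYS1.ACDS) COMMDS(SYS1.COMMDS)   /* placeholder CDS names    */
RLSINIT(YES)                              /* start SMSVSAM at IPL     */
RLS_MAXCFFEATURELEVEL(A)                  /* allow CIs up to 32 KB    */
RLS_MAX_POOL_SIZE(850)                    /* local buffer pool, in MB */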
5.5.2 Sizing
Consider the following components for sizing:
Buffer pools
The RLS_MAX_POOL_SIZE parameter in IGDSMSxx limits the size of the RLS local buffer pool on each system. The value should not exceed available real storage, and it is recommended that it be set to 850 MB or less.
Lock structures
It is recommended that the following calculation be used for the RLS lock structure size:
(10 MB * Number of Systems * Lock Entry Size)
No less than 13 MB should be used. The Lock_Entry_Size is determined by the MAXSYSTEM value that is defined in the coupling facility resource management (CFRM) policy, as listed in Table 5-1. A worked example follows this list.
Table 5-1 MAXSYSTEM values
MAXSYSTEM          Lock_Entry_Size
<= 7               2
>= 8 and < 24      4
>= 24 and <= 32    8
Cache structures
The total RLS cache size for the environment is commonly recommended to be calculated by using the following equation:
((RLS_Max_Pool_Size) * number of systems)
If multiple cache structures are defined, this aggregate size should be split among the cache structures.
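As a worked example (the system count and settings are illustrative assumptions, not Walmart’s configuration), consider a four-system sysplex with MAXSYSTEM set to 8 in the CFRM policy, which gives a Lock_Entry_Size of 4. The recommended lock structure size is then 10 MB * 4 systems * 4 = 160 MB, comfortably above the 13 MB minimum. With RLS_MAX_POOL_SIZE set to 850 MB on each system, the recommended aggregate cache size is 850 MB * 4 = 3400 MB; if four cache structures are defined, each one can be sized at approximately 850 MB.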
5.5.3 System managed assignments
DFSMS can be configured to control cache structure assignments in several ways. Walmart elected to allow system managed assignments by associating each cache set with all available cache structures in the DFSMS configuration, as shown in Figure 5-9.
Cache Set    CF Cache Structure Names
VSAMRLS1     VSAMRLS1 VSAMRLS2 VSAMRLS3 VSAMRLS4
VSAMRLS2     VSAMRLS1 VSAMRLS2 VSAMRLS3 VSAMRLS4
VSAMRLS3     VSAMRLS1 VSAMRLS2 VSAMRLS3 VSAMRLS4
VSAMRLS4     VSAMRLS1 VSAMRLS2 VSAMRLS3 VSAMRLS4
 
Figure 5-9 DFSMS cache set to cache structure associations
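The lock structure and the cache structures that are referenced by these cache sets must also be defined in the CFRM policy. The following job is a minimal sketch of such definitions that uses the IXCMIAPU administrative data utility. The policy name, coupling facility names, and structure sizes are placeholder assumptions that follow the worked example in 5.5.2, “Sizing” (sizes are specified in units of 1 KB), and only one of the four cache structures is shown:
//DEFCFRM  JOB ...
//POLICY   EXEC PGM=IXCMIAPU
//SYSPRINT DD SYSOUT=*
//SYSIN    DD *
  DATA TYPE(CFRM) REPORT(YES)
  DEFINE POLICY NAME(CFRMPOL1) REPLACE(YES)
    STRUCTURE NAME(IGWLOCK00)
              SIZE(163840)
              PREFLIST(CF01,CF02)
    STRUCTURE NAME(VSAMRLS1)
              INITSIZE(870400)
              SIZE(1048576)
              PREFLIST(CF01,CF02)
/*
After the policy is defined, it can be activated with the SETXCF START,POLICY,TYPE=CFRM,POLNAME=CFRMPOL1 command.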
5.6 z/OS configuration
There are a couple of CICS parameter specifications that are described in Chapter 4, “The CICS systems programmer” on page 27 that require corresponding z/OS configuration settings to avoid issues. In particular, the MAXOPENTCBS and MAXSSLTCBS settings in CICS must be complemented by z/OS parameters in BPXPRMxx.
The MAXPROCSYS and MAXPROCUSER settings in BPXPRMxx establish higher-level limits on the numbers of processes or TCBs that are allowed in the environment. Some projections and planning are necessary to determine values for these settings that accommodate the expected CICS workloads. These parameters are described and example definitions are provided in the following sections.
MAXPROCSYS controls the maximum number of processes per system or LPAR. An example of this setting is shown in Figure 5-10.
/*******************************************************************/
/* Specify the maximum number of processes that z/OS UNIX           */
/* will allow to be active concurrently.                            */
/*                                                                  */
/* Notes:                                                           */
/*   1. Minimum allowable value is 5.                               */
/*   2. Maximum allowable value is 32767.                           */
/*   3. If this parameter is not provided, the system default       */
/*      value for this parameter is 900.                            */
/*******************************************************************/
MAXPROCSYS(3000)                   /* System will allow at most 3000
                                      processes to be active
                                      concurrently              @P9C*/
Figure 5-10 MAXPROCSYS sets maximum number of processes per system
MAXPROCUSER controls the maximum number of processes per user ID. Walmart uses a unique user ID per region, so this setting equates to the maximum number of processes per region in this case. An example of this setting is shown in Figure 5-11.
/*******************************************************************/
/* Specify the maximum number of processes that a single user       */
/* (that is, with the same UID) is allowed to have concurrently     */
/* active regardless of origin.                                     */
/*                                                                  */
/* Notes:                                                           */
/*   1. This parameter is the same as the Child_Max variable        */
/*      defined in POSIX 1003.1.                                    */
/*   2. Minimum allowable value is 3.                                */
/*   3. Maximum allowable value is 32767.                            */
/*   4. If this parameter is not provided, the system default        */
/*      value for this parameter is 25.                              */
/*******************************************************************/
MAXPROCUSER(2500)                  /* Allow each user (same UID) to
                                      have at most 2500 concurrent
                                      processes active             */
Figure 5-11 MAXPROCUSER sets maximum number of processes per user ID
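These limits can also be displayed and adjusted dynamically from the operator console, which is useful when projections for the CICS workloads change. The following commands are a general z/OS technique rather than a Walmart-specific procedure; the value shown is only an example:
D OMVS,LIMITS
SETOMVS MAXPROCSYS=3000
The D OMVS,LIMITS command shows the current z/OS UNIX limits together with current and high-water usage, and SETOMVS changes a BPXPRMxx value dynamically. A change that is made this way does not persist across an IPL unless the BPXPRMxx member is also updated.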
5.7 Security
Security for the z/OS cloud services in the Walmart environment is provided by Resource Access Control Facility (RACF). A dedicated team manages RACF at Walmart; therefore, this area is not technically the purview of the z/OS systems programmer. However, these roles work closely together and have some overlapping concerns, and in many shops they are within the same group or team.
The cloud services, particularly the caching service, might not require security controls in all cases (cached data is sometimes mundane and inconsequential). However, in most cases, some basic authentication is needed.
Because many users of the caching service were not z/OS developers, they did not have RACF credentials to use for authentication. Walmart system and security engineers established “service IDs” for this purpose. These IDs can be used to authenticate a consumer’s access to the service, but do not allow logging on to Time Sharing Option (TSO). This approach works well because the cloud services are accessible only through RESTful APIs, so access to the systems themselves is not necessary.
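The following commands are a minimal sketch of how such a service ID might be defined and given access to a service-related resource. The user ID, group, profile, and password values are hypothetical, and the exact attributes that Walmart assigns are not documented here; defining the ID without a TSO segment is one common way to prevent TSO logon:
ADDUSER CACHSVC1 NAME('CACHE SERVICE CONSUMER') OWNER(CACHEGRP) +
    DFLTGRP(CACHEGRP) PASSWORD(TEMP1234)
PERMIT CACHE.SERVICE.ACCESS CLASS(FACILITY) ID(CACHSVC1) ACCESS(READ)
SETROPTS RACLIST(FACILITY) REFRESH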
The on-demand self-service cloud characteristic still needed to be provided for these consumers; therefore, immediate access to these IDs was a requirement. Through collaboration between the platform and system engineers and the security engineers, a process was initially established in which the security engineers pre-defined groups of these service IDs for use during service provisioning. The platform engineers pulled from this pool of IDs as needed for service deployments. Then, as the pool diminished, more IDs were created and added to the pool.
Over time, as the delivery model and credibility were solidified, an agreement was established between the platform and security engineers to allow for dynamic provisioning of the IDs along with the service instances. These IDs are now provisioned and managed through the same self-service channels and with the same suite of provisioning automation as the other cloud services that are provided by the z/OS platform service engineers.
5.8 Summary
A well-designed operating environment is necessary for effective service hosting. The environment should be configured for efficiency, resiliency, scalability, and flexibility to provide a solid foundation for the runtime environment to deliver quality services.
In this chapter, numerous aspects of how these qualities can be achieved on the z/OS platform were described, which established the z/OS systems programmer as a key resource in the service provider role.
 