Architecture
Running Temenos applications on IBM LinuxONE provides a robust enterprise platform for mission-critical banking services. In designing the correct solution, there are a number of architectural options. The correct path varies based on your own or your clients' architectural foundations, which are often influenced by budgetary constraints. Architectural workshops should be run to reach agreement on the correct ingredients and should encompass both the functional (application, database, system) and non-functional (availability, security, integrity, reliability) characteristics of your requirements.
In this chapter, a sample reference architecture is proposed. It is based on a two-server HA/DR configuration, with additional components and decision points as appropriate.
The types of architecture are:
Traditional on-premises, non-containerized solution
IBM Cloud Hyper Protect/SSC
Cloud native OpenShift/ICP on-premises
 
Note: This chapter focuses on the Traditional on-premises, non-containerized solution. Later updates to this book will cover the other architecture types, including cloud native and on-premises cloud, as they become available.
The following sections are covered in this chapter:
3.1, Traditional on-premises (non-containerized) architecture
3.2, Machine configuration on IBM LinuxONE
3.3, IBM LinuxONE LPAR Architecture
3.4, Virtualization with z/VM
3.5, Pervasive Encryption for data-at-rest
3.1 Traditional on-premises (non-containerized) architecture
Temenos Transact can be deployed in a variety of infrastructure environments. This chapter focuses on the Traditional on-premises, non-containerized solutions.
Unlike most other system architectures on which the Temenos applications can be installed, IBM LinuxONE provides a choice of hypervisor and alternatives for other aspects of deployment. The following paragraphs describe some of the architectural choices available on IBM LinuxONE and their considerations.
3.1.1 Key benefits of architecting a new solution instead of lift-and-shift
When migrating a deployed Temenos stack from another system architecture to IBM LinuxONE, it might be tempting to preserve the system layout currently implemented on the other system. This is certainly possible with IBM LinuxONE, by defining the same number of virtual instances and installing them with the same application structure as was previously deployed.
However, architecting a new solution specifically for IBM LinuxONE allows you to take advantage of the following important capabilities:
Scalability, both horizontal and vertical
Hypervisor clustering
Reliability, Availability, and Serviceability
IBM LinuxONE scalability
The IBM LinuxONE server can scale vertically to large processing capacities. This scalability can be used to consolidate a number of physical machines of other hardware architectures to a smaller number of IBM LinuxONE servers. This simplifies the hardware topology of the installation by allowing more virtual instances to be deployed per IBM LinuxONE server.
On other architectures, the total number of instances deployed might be greater than the number required on IBM LinuxONE. A single virtual instance on IBM LinuxONE can scale vertically to support a greater transaction volume than is possible in a single instance on other platforms. Alternatively, you can decide to employ horizontal scaling at the virtual level and use the greater capacity per IBM LinuxONE footprint to deploy more virtual instances. This can provide more flexibility in workload management by lessening the impact of removing a single virtual instance from the pool of working instances.
Hypervisor clustering
The z/VM hypervisor on IBM LinuxONE provides a clustering technology known as Single System Image (SSI). SSI allows a group of z/VM systems to be managed as a single virtual compute environment. Most system definitions are shared between the members of the cluster, providing these benefits:
Consistency in the system definition process: no need to replicate changes between systems as the systems all read the same configuration
Single source for user directory: all definitions of the virtual instances are maintained in a single source location, again eliminating the need to replicate changes between systems
Flexibility for deployment of virtual instances: allowing functions such as starting, stopping, and live-relocating virtual instances between member z/VM systems
Other system architectures are more complex to manage in a clustered fashion, or approach hypervisor clustering in different ways that can adversely affect the workloads deployed or not provide the expected benefits.
Reliability, Availability, and Serviceability (RAS)
Other hardware platforms often require more physical systems than are actually needed to ensure that a failure does not affect operation. This means that, in normal operation, other hardware platforms are underutilized or oversized to withstand spikes in demand or system failures. It is also necessary to install additional equipment so that removing a system for maintenance (installation of new components, firmware, or OS patching) does not interrupt service.
An IBM LinuxONE server is designed to provide the highest levels of availability in the industry. First, the internal design of the system provides a high degree of redundancy to greatly reduce the likelihood that a component failure will affect the machine availability. Secondly, the IBM LinuxONE server provides functions that allow it to remain fully operational during service actions such as firmware updates. This means that in the majority of cases an IBM LinuxONE server does not have to be removed from service for hardware upgrades or firmware updates.
3.2 Machine configuration on IBM LinuxONE
On IBM LinuxONE, the process of configuring the physical adapters, logical partitions, and resources (such as processor and memory allocation) is known as the I/O Definition process. There are two ways that this process can occur:
The traditional method involving system configuration files known as the I/O Definition File (IODF) and the I/O configuration data set (IOCDS). This method also uses the Image Profile definitions on the Hardware Management Console (HMC).
Dynamic Partition Manager (DPM) is a newer configuration system on the HMC that provides a graphical interface. This graphical method allows for configuring partitions, assigning network and storage resources to partitions, and assigning processor and memory resources.
 
Note: Though DPM is simpler to use for newcomers to the IBM LinuxONE platforms, there are some limitations in supported configurations. Using the traditional IODF method ensures that partitions can utilize all hardware and software capabilities of the IBM LinuxONE platform. The recommended architecture, which uses z/VM SSI, requires IODF mode. This is because DPM is not able to configure the FICON CTC devices needed for SSI.
3.2.1 System configuration using IODF
An IODF is generated using an environment on z/VM known as Hardware Configuration Definition (HCD). A graphical Microsoft Windows-based tool known as Hardware Configuration Manager (HCM) is used to generate the IODF. Then, HCD commands on z/VM are used to load the hardware portion of the IODF into the IOCDS in the Support Element. This IOCDS is then used when a reset of the IBM LinuxONE system is performed.
Some degree of knowledge of I/O configuration on IBM LinuxONE is needed to perform this process. Understanding how to use the tools to create an I/O configuration and channel subsystem concepts is required to achieve a functional configuration.
Hardware Configuration Definition (HCD)
HCD is the set of utilities used to create and manage I/O Definition Files (IODFs).
On the z/OS operating system, HCD includes a rich Interactive System Productivity Facility (ISPF) interface for hardware administrators to manage IODFs. The ISPF interface for HCD is not provided on z/VM. So instead, a graphical Microsoft Windows-based tool called Hardware Configuration Manager (HCM) is used to interact with the HCD code in z/VM and perform IODF management tasks.
HCM is a client/server program that needs access to a z/VM host (over TCP/IP, to a server process called the HCD Dispatcher). HCM also has a stand-alone mode that works separately from the Dispatcher. However, in stand-alone mode, no changes can be made to IODFs.
The IODF process
The first step in updating a server’s I/O configuration is to take the production IODF (the IODF that represents the machine’s current operating configuration) and produce a work IODF from it. A production IODF cannot be edited, so it needs to be copied to make a new work IODF, which can be edited. Using HCM, the work IODF is customized with the changes that need to be made to the hardware configuration: adding or removing LPARs; adding, changing, or removing IBM LinuxONE server hardware; adding, changing, or removing disk subsystems; and so on.
After the changes are complete, the work IODF is converted to a new production IODF. This new production IODF can then be dynamically applied to the IBM LinuxONE server.
Stand-Alone I/O Configuration Program
When a new machine is installed, the first IODF has to be written to the machine using a limited-function version of some of the tools in HCD. This utility is called the Stand-Alone I/O Configuration Program (Stand-Alone IOCP) and is installed on every IBM LinuxONE system.
Stand-Alone IOCP is described in the IBM manual “Stand-Alone I/O Configuration Program User’s Guide”, SB10-7173-01.
Server’s First IODF
Creation of the first IODF for an IBM LinuxONE server can be more complicated. As there is no operating system running on the server, how do we run HCD/HCM to create one?
If there is already an existing IBM LinuxONE server on which the IODF for the new machine can be created, the IODF for the new machine can be created there and then exported from the existing machine. Using Stand-Alone IOCP on the new machine, the IODF is written to the IOCDS of the new machine and can then be activated.
However, what if this machine is the first IBM LinuxONE server at your installation? In this case, Stand-Alone IOCP must be used to create a valid initial I/O configuration. To make the process easier, rather than attempting to define the entire machine using this method, a minimal IOCP deck defining a single LPAR and basic DASD can be used. This simple IOCP can be activated to make available a single system into which a z/VM system can be installed. This z/VM system is then used to download the HCM code to a workstation and start the HCD Dispatcher. HCM is then installed and used to create an IODF with more complete definitions of the system.
 
Note: An example of a minimal IOCP deck to perform this operation is provided in Appendix B, “Creating and working with the first IODF for the server” on page 107. The example covers the important aspects and parts of the operation, enablement of the IOCP, and a success-verification example.
Single IODF per installation
The data format of the IODF allows multiple IBM LinuxONE machines to be managed in a single IODF. This feature has distinct advantages over having separate IODFs, such as those noted in the following list:
The visualization capabilities of HCM can be used to view the entire IBM LinuxONE installation at the same time
Devices such as disk (DASD) subsystems, which usually attach to more than one server, can be managed more effectively
A wizard in HCM can be used to configure CTC connections. The wizard can do this, though, only if both sides of the CTC link are present in the same IODF
When an IODF is written to the IOCDS of an IBM LinuxONE machine, HCD knows to write only the portions of the IODF that apply to the current machine.
I/O Configuration system roles
When multiple physical servers are in use, each physical server must at some point be able to access the IODF. Without using shared DASD, there is a possibility that each server might have a separate copy. With different copies of the IODF on different systems, it is important to always know which IODF is the real one.
We have created some definitions to describe the roles that various systems have in the I/O Definition process:
I/O Definition system: This system is the one from which you do all of the HCM work of defining your I/O configurations. This is the system you use to run the HCD Dispatcher when needed, and all of the work IODF files are kept there. As noted previously, there should be one I/O Definition system across your IBM LinuxONE environment.
I/O Managing system: This system runs the HCD programs to dynamically activate a new IODF and to save the IOCDS. Each CPC requires at least one z/VM system to be the I/O Managing system. The I/O Definition system is also the I/O Managing system for the CPC on which it runs.
I/O Client system: These are all the other z/VM systems in your IBM LinuxONE environment. These systems do not need a copy of the IODF, and they are not directly involved in the I/O definition process. When a dynamic I/O operation takes place (driven by the I/O Managing system), the channel subsystem signals the operating system about the status changes to devices in the configuration.
For backup and availability reasons, it is a good idea to back up or copy the IODF files (by default the files are kept on the A disk of the CBDIODSP user on the I/O Definition system). This allows another system to be used to create an IODF in an emergency.
Configuration system roles and SSI
z/VM SSI does not change the need for the roles described previously. However, it does simplify and reduce the number of systems that need to have their own copy of the IODF.
In an SSI cluster, the PMAINT CF0 disk is common between the members of the cluster. This means that, if the I/O Managing systems for two CPCs are members of the same SSI cluster, those I/O Managing systems can share the same copy of the IODF. This reduces the number of IODF copies that exist across the IBM LinuxONE environment.
BAU IODF process
We recommend the following process for performing an update to the I/O definition in an IBM LinuxONE environment:
1. Plan the changes to be made, and gather required information (such as PCHID/CHID numbers, switch port IDs, and so on).
2. Log on to the CBDIODSP user on the I/O Definition system.
3. Start the HCD Dispatcher, using the CBDSDISP command.
4. On your workstation, start and log on to HCM.
5. Use the existing production IODF to create a new work IODF.
6. Open the work IODF in HCM.
7. Make whatever changes are required to the I/O configuration.
8. When changes are complete, build a new production IODF from the work IODF.
9. Transmit the new production IODF to any remote I/O Managing systems in the IBM LinuxONE environment.
A variety of methods can be used to transmit the file:
a. If Unsolicited File Transfer (UFT) has been set up on your z/VM systems, use the SENDFILE command with the UFTSYNC and NETDATA options to send the file to the spool of the I/O Managing system(s)
SENDFILE IODFxx PRODIODF A to CBDIODSP at iomanager. (UFTSYNC NETDATA
Where iomanager is the hostname or IP address of an I/O Managing system. The trailing ‘.’ forces the command to treat the name as a TCP/IP hostname, which can be looked up using DNS or the ETC HOSTS file.
b. Copy the file through a shared DASD volume;
c. Use FTP, IND$FILE, or other file transfer method. Ensure that the record format of the IODF is preserved (it must be transferred as a binary file, with fixed record length of 4096 bytes).
10. If the I/O configuration is to be changed dynamically:
a. Use HCD to test the activation of the new IODF on each IBM LinuxONE server that has I/O changes.
b. If the test is successful, use HCD to activate the new IODF on each IBM LinuxONE server that has I/O changes.
11. Using HCD on each I/O Managing system, save the new IODF to the I/O configuration data set (IOCDS) on each IBM LinuxONE server Support Element.
12. Update either the Active IOCDS marker or the system Reset profile to indicate the new IOCDS slot for the next Power-on Reset (POR).
 
3.3 IBM LinuxONE LPAR Architecture
A system architecture implemented on IBM LinuxONE makes use of the Logical Partitioning (LPAR) capability of the server to create system images that operate separately from one another. These images can be used for different components of the architectures (for example, application and database tiers) or for different operational enclaves (for example Production, Test and Development, Stress testing, and so on). They can also be used to provide high availability.
The following paragraphs describe the layout of LPARs in the recommended architecture.
3.3.1 LPAR Layout on IBM LinuxONE CPCs
Based on the architecture diagram (shown in Figure 4-2 on page 87) the preferred design is to start with two IBM LinuxONE CPCs (which provide hardware redundancy). This allows hardware maintenance to occur on one CPC without impact to the production workload running on the other CPC.
The division and setup for logical partitions (LPARs) include the following aspects:
Two LPARs for Core Banking Database
There are a number of database solutions available for the Temenos banking platform. When implementing any core banking database, high availability is key. Best practices suggest that each IBM LinuxONE CPC have a z/VM LPAR with a minimum of two Linux guests hosting the core banking database. Isolating core banking databases in their own LPAR reduces core licensing costs by dedicating the fewest IFLs to the core banking database.
Oracle is the prevalent database used in Temenos deployments.
Two LPARs for Non-Core Database Farm (this can include any databases needed for banking operations)
In these LPARs, we can run the databases that support banking operations. These include credit card, mobile banking, and others. Each database will be running in a virtual Linux Guest running on the z/VM hypervisor.
Four LPARs for Application servers
The four LPARs run the z/VM hypervisor and are managed as a single system image (SSI) cluster. SSI allows sharing of virtual Linux guests across all four LPARs. Linux guests can be moved between any of the four LPARs in the cluster. This movement can be done by either of these methods:
 – Bringing a server down and then bringing that same server up on another LPAR
 – Issuing the Live Guest Relocation (LGR) command to move the guest to another LPAR without an outage of the Linux guest
SSI also lets you install maintenance once and push it to the other LPARs in the SSI cluster.
 
Note: LGR is not supported for use with Oracle.
In this cluster, the Temenos Transact application server will run in each LPAR. This allows the banking workload to be balanced across all four LPARs. Each Temenos Transact application server can handle any of the banking requests.
3.4 Virtualization with z/VM
The z/VM hypervisor provides deep integration with the IBM LinuxONE platform hardware and provides rich capabilities for system monitoring and accounting.
z/VM was selected for this architecture to take advantage of several unique features of IBM LinuxONE. These features reduce downtime and system administration costs and are noted in the following list:
GDPS for reduced and automatic failover in the event of an outage
SSI clustering to manage the resources and maintenance of systems
Live guest migration between LPARs or CPCs
z/VM provides a clustering capability known as Single System Image (SSI). This capability provides many alternatives for managing the virtual machines of a compute environment, including Live Guest Relocation (LGR). LGR provides a way for a running virtual machine to be relocated from one z/VM system to another, without downtime, to allow for planned maintenance.
How SSI helps virtual machine management
One of the important reasons SSI and LGR were developed was to improve availability of Linux systems and mitigate the impact of a planned outage of a z/VM system.
It is recommended that z/VM systems have Recommended Service Updates (RSUs) applied approximately every six months. When an RSU is applied to z/VM, it is usually necessary to restart the z/VM system. In addition, z/VM development uses a model known as Continuous Delivery to provide new z/VM features and functions in the service stream. If one of these new function System Programming Enhancements (SPEs) updates the z/VM Control Program, a restart of z/VM is required for those changes to take effect. Whenever z/VM is restarted, all of the virtual machines supported by that z/VM system must be shut down, causing an outage to service.
With SSI and LGR, instead of taking down the Linux virtual machines they can be relocated to another member of the SSI cluster. The z/VM system to be maintained can be restarted without any impact to service.
LGR is a highly reliable method of moving running guests between z/VM systems. Before you perform a relocation, test the operation to see whether any conditions might prevent the relocation from completing. Example 3-1 shows examples of testing two targets for relocations.
Example 3-1 VMRELOCATE TEST examples
vmrelo test zgenrt1 to asg1vm1
User ZGENRT1 is eligible for relocation to ASG1VM1
Ready;
vmrelo test zdb2w01 to asg1vm1
HCPRLH1940E ZDB2W01 is not relocatable for the following reason(s):
HCPRLI1997I ZDB2W01: Virtual machine device 2000 is associated with real device 2001 which has no EQID assigned to it
HCPRLI1997I ZDB2W01: Virtual machine device 2100 is associated with real device 2101 which has no EQID assigned to it
HCPRLL1813I ZDB2W01: Maximum pageable storage use (8256M) exceeds available auxiliary paging space on destination (7211520K) by 1242624K
Ready(01940);
In the first example, all devices required by the guest to operate are available at the proposed destination system. In the second example, there are devices present on the virtual machine that VMRELOCATE cannot guarantee are equivalent on the destination system. Also, checks determined that there is not enough memory available on the destination system to support the guest to be moved.
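When a test is successful, the relocation itself is performed with the MOVE operation of the same command. A minimal sketch, using the guest and destination from the first test in Example 3-1:
vmrelocate move zgenrt1 to asg1vm1
The MAXTOTAL and MAXQUIESCE options can be added to the command to limit the total relocation time and the time the guest is quiesced; if a limit would be exceeded, the relocation is cancelled rather than affecting the guest.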
When a guest is being relocated, its memory pages are transferred from the source to the destination over FICON CTC devices. FICON provides a large transfer bandwidth, and the CTC connections are not used for anything else in the system other than SSI. This means guest memory can be transferred quickly and safely.
User and security management
z/VM has built-in security and user management functions. The user directory contains the definitions for users (virtual machines) in the z/VM system and the resources defined to them: disks, CPUs, memory, and so on. The z/VM Control Program (CP) manages security functions such as isolation of user resources (enforcing the definitions in the user directory) and authorization of operator commands.
z/VM also provides interfaces to allow third-party programs to enhance these built-in functions. IBM provides two of these on the z/VM installation media as additional products that can be licensed for use:
Directory Maintenance Facility for z/VM
RACF Security Server for z/VM
Directory Maintenance Facility (DirMaint) simplifies management of the user directory of a z/VM system. RACF enhances the built-in security provided by CP to include mandatory access control, security labels and strong auditing capabilities.
Broadcom also provides products in this area, such as their CA VM:Manager suite, which includes both user and security management products.
When installed, configured, and activated, the directory manager takes responsibility for management of the system directory. A directory manager also helps to mitigate (but does not eliminate) the issue of clear text passwords in the directory.
 
Note: The directory manager might not remove the USER DIRECT file from MAINT 2CC for you. Usually the original USER DIRECT file is kept as a record of the original supplied directory source file, but this can lead to confusion.
We recommend that you perform the following actions if you use a directory manager:
Rename the USER DIRECT file (to perhaps USERORIG DIRECT) to reinforce that the original file is not used for directory management
Regularly export the managed directory source from your directory manager and store it on MAINT 2CC (perhaps as USERDIRM DIRECT if you use DirMaint). This file can be used as an emergency method of managing the directory in case the directory manager is unavailable. The DirMaint USER command can export the directory.
If you are not using an External Security Manager (such as IBM RACF), you can export this file with the user passwords in place. This helps its use as a directory backup, but it potentially exposes user passwords.
When a directory manager is used, it can manage user passwords. In the case of IBM DirMaint, z/VM users can have enough access to DirMaint to change their own passwords. Also, when a directory entry is queried using the DirMaint REVIEW command, a randomly-generated string is substituted for the password. However, it is still possible for privileged DirMaint users to set or query the password of any user. For this reason, the only completely effective way to protect against clear text passwords in the directory is to use an External Security Manager (such as IBM RACF).
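As an illustration of exporting the directory, and assuming IBM DirMaint is the directory manager in use, a command sequence of the following general form retrieves the current directory source to your reader, from where it can be received and then copied to MAINT 2CC (nnnn is the spool file number reported for the returned file):
dirm user withpass
receive nnnn userdirm direct a
Be aware that the file returned by USER WITHPASS contains the directory passwords, as discussed above, so handle and store it accordingly.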
3.4.1 z/VM installation
When installing z/VM on ECKD volumes, the preferred approach is to use 3390-9 volumes. z/VM 7.1 requires five 3390-9 volumes for the base installation (non-SSI or single-member SSI installation) and an additional three 3390-9 volumes for each further SSI member.
The recommendation is to install z/VM as a two- or four-member SSI cluster with one or two z/VM members on each IBM LinuxONE server. You will be prompted to select an SSI or non-SSI installation during the installation.
If z/VM 7.1 is installed into an SSI, at least one extended count key data (ECKD) volume is necessary for the Persistent Data Record (PDR). If you plan to implement RACF, the database must be configured as being shared and at least two ECKD DASD volumes are necessary. Concurrent virtual and real reserve/release must always be used for the RACF database DASD when RACF is installed in an SSI.
3.4.2 z/VM SSI and relocation domains
The following paragraphs describe aspects of running the z/VM SSI feature in support of Linux guests.
FICON CTC
An SSI cluster requires CTC connections; always configure them in pairs. If possible, use different physical paths for the cables. During normal operation, there is not much traffic on the CTC connections. However, LGR is dependent on the capacity of these channels, especially for large Linux guests. The more channels you have between the members, the faster a relocation of a guest completes. This is a valid reason to plan four to eight CTC connections between the IBM LinuxONE servers. Keep in mind that if you run only two machines, this cabling is not an obstacle. But if you plan to run three or four servers, the amount of physical cabling can become significant because the connections must be point-to-point and you need any-to-any connectivity.
 
Note: FICON CTCs can be defined on switched CHPIDs, which can relieve the physical cable requirement. For example, by connecting CTC paths using a switched FICON fabric the same CHPIDs can be used to connect to multiple CPCs.
Also, FICON CTC control units can be defined on FICON CHPIDs that are also used for DASD. Sharing CHPIDs between DASD and CTCs can be workable for a development or test SSI cluster; this can further reduce the physical connectivity requirements.
Relocation domains
A relocation domain defines a set of members of an SSI cluster among which virtual machines can relocate freely. A domain can be used to define the subset of members of an SSI cluster to which a specific guest can be relocated. Relocation domains can be defined for business or technical reasons. For example, a domain can be defined having all of the architectural facilities necessary for a particular application, or a domain can be defined to allow access only to systems with a particular software tool. Whatever the reason for the definition of a domain, CP allows relocation among the members of the domain without any change to architectural characteristics or CP functionality as seen by the guest.
Architecture parity in a relocation domain
In a mixed environment (mixed IBM LinuxONE generations or z/VM levels) be cognizant of the architecture level. z/VM SSI supports cluster members running on any supported hardware. Also, during a z/VM upgrade using the Upgrade In Place feature, different z/VM versions or releases can be operating in the same cluster. For example, you can have z/VM 6.4 systems running on an IBM LinuxONE Emperor processor in the same SSI cluster as z/VM 7.1 systems running on an IBM LinuxONE III processor.
When a guest system logs on, z/VM assigns the maximum common subset of the available hardware and z/VM features for all members belonging to this relocation domain. This means that by default, in the configuration described previously, guests started on the IBM LinuxONE III server have access to only the architectural features of the IBM LinuxONE Emperor. There also can be z/VM functions that might not be presented to the guests under z/VM 7.1 because the cluster contains members running z/VM 6.4.
To avoid this, a relocation domain spanning only the z/VM systems running on the IBM LinuxONE III server is defined. Guests requiring the architectural capabilities of the IBM LinuxONE III or of z/VM 7.1 are assigned to that domain and are permitted to execute only on the IBM LinuxONE III servers.
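As a hedged sketch of how this might look (the domain and member names are hypothetical, and the exact operands, including the MEMBERS keyword, are an assumption to be verified against the CP Planning and Administration manual), the domain can be declared in the SYSTEM CONFIG file or dynamically, and guests are assigned to it in their directory entries:
RELOCATION_DOMAIN LNX3 MEMBERS VMSYS3A VMSYS3B
VMRELOCATE ON DOMAIN LNX3
The first statement (or the equivalent DEFINE RELODOMAIN command) defines a domain containing only the members that run on the IBM LinuxONE III server; the second, placed in a guest's directory entry, restricts that guest's relocations to members of that domain.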
SSI topology in the recommended architecture
Our recommended architecture uses z/VM SSI to offer simpler manageability of the z/VM layer. The database LPARs are part of one SSI cluster and application-serving LPARs are part of another SSI cluster. Additional SSI clusters can also be employed for other workload types such as test, development, quality assurance and others.
3.4.3 z/VM memory management
z/VM effectively handles memory overcommitment. The degree of overcommitment that is workable depends on how your z/VM paging performs. The overcommitment factor is calculated as the ratio of virtual memory (the sum of all defined guests plus the shared segments) to the real memory available to the LPAR. For production systems, a value of 1.5:1 should be considered a threshold not to exceed. As a recommendation for production systems, plan the initial memory assignment for no overcommitment (an overcommitment factor lower than 1:1) and with space to grow. For test and development systems, the value can reach up to 3:1.
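As a simple worked example (the numbers are illustrative only): an LPAR with 256 GB of real memory hosting twenty Linux guests defined with 16 GB each represents 320 GB of virtual memory (ignoring shared segments), giving an overcommitment factor of 320/256 = 1.25:1. That is under the 1.5:1 production threshold, but it leaves limited headroom for growth.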
3.4.4 z/VM paging
Paging is used to move memory pages out to disk in case of memory constraints. Sometimes z/VM also uses paging for reordering pages in memory. Normally, the system is sized for no paging. However, paging can still occur if memory is overcommitted, short-term memory is constrained, old 31-bit code is running and the required memory pages out of 31-bit addressability and so on.
Memory overcommitment
Virtual machines do not always use every single page of memory allocated to them. Some programs read in data during initialization but only rarely reference that memory during run time. A virtual machine with 64 GB of memory that runs a variety of programs can actually be actively using significantly less than the memory allocated to it.
z/VM employs many memory management techniques on behalf of virtual machines. One technique is to allocate a real memory page only to a virtual machine when the virtual machine references that page. For example, after our 64 GB virtual machine has booted it might have referenced only a few hundred MB of its assigned memory, so z/VM actually allocates only those few hundred MB to the virtual machine. As programs start and workload builds the guest uses more memory. In response, z/VM allocates it, but only at the time that the guest actually requires it. This technique allows z/VM to manage only the memory pages used by virtual machines, reducing its memory management overhead.
Another technique z/VM uses is a sophisticated least-recently-used (LRU) algorithm for paging. When the utilization of z/VM’s real memory becomes high, the system starts to look for pages that can be paged-out to auxiliary storage (page volumes). To avoid thrashing, z/VM finds the least-recently-used guest pages and selects those for paging. Using a feature known as co-operative memory management (CMM), Linux can actually nominate pages it has itself paged out to its own swap devices. z/VM can then prioritize the paging of those pages that Linux can itself re-create, in a way that avoids the problem of double-paging.
These capabilities are why memory can be overcommitted on IBM LinuxONE to a higher degree with lower performance impact than on other platforms.
Paging subsystem tuning
Plan z/VM paging carefully to obtain the maximum performance using the following criteria:
Monitor paging I/O rates. Excessive paging I/O means that virtual machines are thrashing, which can be due to insufficient real storage in z/VM
Use fast disks for paging or enable tiering in your disk subsystem
Leverage HyperPAV for paging devices and use fewer, larger devices
Command: SET PAGING ALIAS ON
Configuration file: FEATURES ENABLE PAGING_ALIAS
If you do not use HyperPAV for paging, use these considerations:
More, smaller disk volumes are better than one large volume. As an example, use three volumes of type 3390-9 rather than one volume of type 3390-27. z/VM can then run three paging I/Os in parallel on the smaller volumes as opposed to only one I/O with the larger volume. The sum of all defined paging volumes is called the paging space
Continuously monitor your paging space usage (with the QUERY ALLOC PAGE command or panel FCX109 in the Performance Toolkit). z/VM abends with ABEND PGT004 if it runs out of paging space
Monitor the Virtual-to-Real ratio, which reflects the amount of memory overcommitment in a z/VM system. A ratio of less than 1:1 means that the z/VM system has more memory than it needs. Above 1:1 means that some overcommitment is occurring
Reserve or predefine slots for additional paging volumes in the z/VM system configuration file
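A minimal SYSTEM CONFIG sketch of reserving slots (the volume labels and slot numbers are illustrative): active paging volumes occupy the first slots, and RESERVED entries keep slots free so that additional paging volumes can be brought into use later:
CP_Owned Slot 010 VMPG01
CP_Owned Slot 011 VMPG02
CP_Owned Slot 012 RESERVED
CP_Owned Slot 013 RESERVED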
Note these considerations when working with the AGELIST, EARLYWRITE, and KEEPSLOT settings. It is important to conserve paging I/O and paging space, especially for systems with a large amount of memory. EARLYWRITE specifies how the frame replenishment algorithm backs up page content to auxiliary storage (paging space). When Yes is specified, pages are backed up in advance of frame reclaim to maintain a pool of readily reclaimable frames. When No is specified, pages are backed up only when the system is in need of frames. KEEPSLOT indicates whether the auxiliary storage address (ASA) to which a page is written during frame replenishment should remain allocated when the page is later read and made resident. Specifying Yes preserves a copy of the page on the paging device and eliminates the need to rewrite the contents if the page is unchanged before the next steal operation. Keeping the slot might reduce the amount of paging I/O, but can result in more fragmentation on the device. See the CP Planning and Administration manual in the z/VM documentation for details about EARLYWRITE. For environments where the overcommit level is low and large amounts of real memory are being used, consider disabling EARLYWRITE and KEEPSLOT.
See also the page space calculation guidelines in the CP Planning and Administration manual in the z/VM 7.1 library.
3.4.5 z/VM dump space and spool
z/VM uses spool to hold several kinds of temporary data (print output, transferred files, trace data, and so on) and shared data (such as Named Saved Systems and Discontiguous Saved Segments). Spool is a separate area in the system and needs disk space. For performance reasons, do not mix spool data with other data on the same disk.
One important item is dump space. At IPL time, z/VM reserves a space in spool for a system dump. The size depends on the amount of memory in the LPAR. It is important to ensure that there is sufficient dump space in the spool.
The SFPURGER tool can be used to maintain the spool. If you use an automation capability (such as the Programmable Operator facility or IBM Operations Manager for z/VM) you can schedule regular runs of SFPURGER to keep spool usage well managed.
3.4.6 z/VM minidisk caching
z/VM offers read-only caching of minidisk data. By default, caching is permitted to use all of the available memory. In some rare cases, for workloads with a high read I/O rate, minidisk caching can use all the available memory and guests are dropped from dispatching. To avoid this scenario, restrict minidisk caching to a maximum value. A viable starting point is about 10% to 25% of the available memory in the LPAR. The following commands, in the autostart file, set this restriction:
CP SET MDCACHE SYSTEM ON
CP SET MDCACHE STORAGE 0M <max value>M
With the CP QUERY MDCACHE command, you can check the setting and the usage.
Deactivate minidisk caching for Linux swap disks. To do so, code a MINIOPT NOMDC statement directly after the MDISK directory statement of the appropriate disk, as shown in the following sketch.
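The device number, extent, and volume label in this directory sketch are hypothetical:
MDISK 0201 3390 0001 0200 LXSWP1 MR
MINIOPT NOMDC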
3.4.7 z/VM share
Share is the term for the amount of CPU processing a virtual machine receives. There are two variations of share settings: absolute and relative share.
Relative share works similarly to the LPAR weight factor: the share setting of an individual virtual machine is evaluated in relation to the sum of the relative shares of all active virtual machines. Relative share values range from 1 to 9999.
Absolute share is expressed as a percentage and defines a real portion of the available CPU capacity of the LPAR dedicated to a specific virtual machine. This portion of the CPU capacity is reserved for that virtual machine, as long as it can be consumed. The remaining piece, which cannot be consumed, is returned to the system for further distribution. Absolute share ranges from 0.1% to 100%. If the sum of absolute shares is greater than 99%, it is normalized to 99%. Absolute share users are given resource first.
The default share is RELATIVE 100 to each virtual machine. The value can be changed dynamically by the command CP SET SHARE or permanently at the user entry in the z/VM directory.
SHARE RELATIVE and multi-CPU guests
It is important to remember that the SHARE value is distributed across all of the virtual CPUs of a guest. This means that no matter how many virtual CPUs a guest has, if the SHARE value is not changed the guest gets the same amount of CPU.
To make sure that adding virtual CPUs actually results in extra CPU capacity to your virtual machines, make sure the SHARE value is increased when virtual CPUs are added.
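For example (the guest name and values are illustrative), if a guest is changed from two to four virtual CPUs and should receive twice its previous capacity, its relative share can be doubled dynamically and made permanent in its directory entry:
CP SET SHARE LINUX01 RELATIVE 400
SHARE RELATIVE 400
The first line is the dynamic CP command; the second is the corresponding statement in the guest's user directory entry.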
3.4.8 z/VM External Security Manager (ESM)
The security and isolation mechanisms built into the z/VM Control Program (CP) to protect virtual machines from each other are extremely strong. These mechanisms are facilitated by various hardware mechanisms provided by the IBM LinuxONE server architecture. There are some areas where improvements can be made, such as those in the following list:
Auditing of resource access successes and failures
Queryable passwords for users and minidisks
Complexity of managing command authority and delegation
z/VM allows the built-in security structure to be enhanced through the use of an External Security Manager (ESM). When an ESM is enabled on z/VM, various security decisions can be handled by the ESM rather than by CP. This allows for greater granularity of configuration, better auditing capability and the elimination of queryable passwords for resources.
The IBM Resource Access Control Facility for z/VM (RACF) is one ESM available for z/VM. It is a priced optional feature preinstalled on a z/VM system. Broadcom also offers ESM products for z/VM, such as CA ACF2 and CA VM:Secure.
 
Note: IBM strongly recommends the use of an ESM on all z/VM systems.
Common Criteria and the Secure Configuration
IBM undergoes evaluation of the IBM LinuxONE server hardware and z/VM against the Common Criteria. z/VM is evaluated against the Operating System Protection Profile (OSPP). This evaluation allows clients to be more confident that the IBM LinuxONE server with z/VM as the hypervisor is a highly secure platform for running critical workloads. z/VM has achieved an Evaluation Assurance Level (EAL) of 4+ (the plus indicates additional targets from the Labelled Security Protection Profile (LSPP) were included in the evaluation).
The evaluation process is performed against a specific configuration of z/VM which includes RACF. The configuration that IBM applies to the systems evaluated for Common Criteria certification is described in the z/VM manual “z/VM: Secure Configuration Guide,” document number SC24-6323-00.
By following the steps in this manual you can configure your z/VM system in a way that meets the standard evaluated for Common Criteria certification.
3.4.9 Memory of a Linux virtual machine
In a physical x86 system, the memory installed in a machine is often sized larger than required by the application. Linux uses this extra memory for buffer cache (disk blocks held in memory to avoid future I/O operations). This is considered a positive outcome as the memory cannot be used by any other system; using it to avoid I/O is better than letting it remain unused.
Virtualized x86 systems often retain the same memory usage patterns. Because memory is considered to be inexpensive, virtual machines are often configured with more memory than actually needed. This leads to accumulation of Linux buffer cache in virtual machines; on a typical x86 virtualized environment a large amount of memory is used up in such caching.
In z/VM, the virtual machine is sized as small as possible, generally providing enough memory for the application to function well without allowing the same buffer cache accumulation that occurs on other platforms. Assigning a Linux virtual machine too much memory can allow too much cache to accumulate, which both Linux and z/VM must then manage. z/VM sees the working set of the user as being much larger than it actually needs to be to support the application, which can put unnecessary stress on z/VM paging.
Real memory is a shared resource. Caching disk pages in a Linux guest reduces memory available to other Linux guests. The IBM LinuxONE I/O subsystem provides extremely high I/O performance across a large number of virtual machines, so individual virtual machines do not need to keep disk buffer cache in an attempt to avoid I/O.
Linux: to swap or not to swap?
In general it is better to make sure that Linux does not swap, even if it means that z/VM has to page. This is because the algorithms and memory management techniques used by z/VM provide better performance than Linux swap.
This creates a tension in the best configuration approach to take. Linux needs enough memory for programs to work efficiently without incurring swapping, yet not so much memory that needless buffer cache accumulates.
One technology that can help is the z/VM Virtual Disk (VDISK). VDISK is a disk-in-memory technology that can be used by Linux as a swap device. The Linux guest is given one or two VDISK-based swap devices, and a memory size sufficient to cover the expected memory consumption of the workload. The guest is then monitored for any swap I/O. If swapping occurs, the performance penalty is small because it is a virtual disk I/O to memory instead of a real disk I/O. Like virtual machine memory, z/VM does not allocate memory to a VDISK until the virtual machine writes to it. So memory usage of a VDISK swap device is only slightly more than if the guest had the memory allocated directly. If the guest swaps, the nature of the activity can be measured to see whether the guest memory allocation needs to be increased (or if it was just a short-term usage bubble).
Using VDISK swap in Linux has an additional benefit. The disk space that normally is allocated to Linux as swap space can be allocated to z/VM instead to give greater capacity and performance to the z/VM paging subsystem.
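A hedged sketch of a VDISK swap device (the device number, size, and Linux device name are illustrative): the directory entry defines a virtual FBA disk in memory, which Linux then initializes and activates as swap:
MDISK 0300 FB-512 V-DISK 1024000 MR
mkswap /dev/dasdc1
swapon /dev/dasdc1
The MDISK statement belongs in the guest's directory entry; the mkswap and swapon commands are run in Linux after the device is brought online, and the actual device node depends on the configuration.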
Hotplug memory
Another memory management technique is the ability to dynamically add and remove memory from a Linux guest under z/VM, known as hotplug memory. Hotplug memory added to a Linux guest allows it to handle a workload spike or other situation that could result in a memory shortage.
We recommend that you use this feature carefully and sparingly. Importantly, do not configure large amounts of hotplug memory on small virtual machines. This is because the Linux kernel needs 64 bytes of memory to manage every 4 kB page of hotplug memory, so a large amount of memory gets used up simply to manage the ability to plug more memory. For example, configuring a guest with 1 TB of hotplug memory consumes 16 GB of the guest’s memory. If the guest only had 32 GB of memory, half of its memory is used just to manage the hotplug memory.
When configuring hotplug memory, be aware of this management requirement. You might need to increase the base memory allocation of your Linux guests to make sure that applications can still operate effectively.
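A hedged sketch of the moving parts (the sizes and command forms are illustrative assumptions): standby memory is defined to the guest by CP before Linux is booted, for example through a COMMAND statement in the directory entry, and Linux brings it online on demand with the lsmem and chmem utilities:
COMMAND DEFINE STORAGE 16G STANDBY 8G
lsmem
chmem -e 2g
Here lsmem shows the online and standby memory ranges, and chmem -e brings 2 GB of the standby memory online; chmem -d can return it to standby when the peak has passed.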
Monitoring memory on Linux
There are a number of places to monitor memory usage on Linux:
Monitor memory usage using the free or vmstat commands, along with /proc/meminfo. This can provide summary through to detailed information about memory usage. /proc/slabinfo can provide further detail about kernel memory.
Generally Linux does not suffer memory fragmentation issues, but longer server uptimes might lead to memory becoming fragmented over time. /proc/buddyinfo contains information about normal and kernel memory pools. Large numbers of pages in the small pools (order-3 and below) indicate memory fragmentation and possible performance issues for some programs (particularly kernel operations such as allocation of device driver buffers).
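A few illustrative commands for these checks:
free -m
vmstat 5 5
cat /proc/meminfo
cat /proc/slabinfo
cat /proc/buddyinfo
The free and vmstat commands give a summary view (memory, swap, and paging activity), while the /proc files provide the detailed breakdowns described above.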
3.4.10 Simultaneous Multi-threading (SMT-2)
By using SMT, z/VM can optimize core resources for increased capacity and throughput. With SMT enabled, z/VM can dispatch a guest (virtual) CPU or a z/VM Control Program task on an individual thread (CPU) of an Integrated Facility for Linux (IFL) processor core.
SMT can be activated only from the operating system and requires a restart of z/VM. We recommend activating multithreading in z/VM by defining MULTITHREADING ENABLE in the system configuration file. The remaining defaults of this parameter set the maximum number of possible threads (currently two) for all processor types. This parameter also enables the command CP SET MULTITHREAD to switch multithreading back and forth dynamically without a restart.
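A minimal sketch of this setting and of verifying it after the restart (shown with the defaults described above):
MULTITHREADING ENABLE
CP QUERY MULTITHREAD
The MULTITHREADING statement belongs in the SYSTEM CONFIG file; the QUERY MULTITHREAD command displays whether multithreading is enabled and the active threads per core.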
3.4.11 z/VM CPU allocation
This section describes CPU allocation in z/VM.
Linux Guests
It is important that Linux is given enough opportunity to access the amount of CPU capacity needed for the work being done. However, allocating too much virtual CPU capacity can, in some cases, reduce performance.
IFL/CPU and memory resources
This section introduces how to manage allocating IFLs to your LPARs, and to your Linux guests within LPARs.
Simultaneous Multi-Threading
Modern CPUs have sophisticated designs such as pipelining, instruction prefetch, out-of-order execution, and more. These technologies are designed to keep the execution units of the CPU as busy as possible. Yet another way to keep the CPU busy is to provide more than one queue for instructions to enter the CPU. Simultaneous Multi-Threading (SMT) provides this capability on IBM LinuxONE.
z/VM does not virtualize SMT for guests. Guest virtual processors in z/VM are single-thread processors. z/VM uses the threads provided by SMT-enabled CPUs to run more virtual CPUs against them.
On the IBM LinuxONE IFL, up to two instruction queues can be used (referred to as SMT-2). These multiple instruction queues are referred to as threads.
Two steps are required to enable SMT for a z/VM system. First, the LPAR needs to be defined to permit SMT mode. Second, z/VM must be configured to enable it. This is done using the MULTITHREAD keyword in the SYSTEM CONFIG file.
When z/VM is not enabled for SMT, logical processors are still referred to as processors. When SMT is enabled, z/VM creates a distinction between cores and threads, and treats threads in the same way as processors in non-SMT.
Logical, Physical, or Virtual CPUs/IFLs
It is important to make sure that CPU resources are assigned efficiently. Because z/VM on IBM LinuxONE implements two levels of virtualization, it is vital to configure Linux and z/VM to work properly with the CPU resources of the system.
The following section introduces details about CPU configuration in IBM LinuxONE.
LPAR weight
IBM LinuxONE is capable of effectively controlling the CPU allocated to LPARs. In their respective Activation Profile, all LPARs are assigned a value called a weight. The weight is used by the LPAR management firmware to decide the relative importance of different LPARs.
 
Note: When HiperDispatch is enabled, the weight is also used to determine the polarization of the logical IFLs assigned to an LPAR. More about HiperDispatch and its importance for Linux workloads is in the following section.
LPAR weight is usually used to favor CPU capacity toward your important workloads. For example, on a production IBM LinuxONE system, it is common to assign higher weight to production LPARs and lower weight to workloads that might be considered discretionary for that system (such as testing or development).
z/VM HiperDispatch
The z/VM HiperDispatch feature uses the System Resource Manager (SRM) to control the dispatching of virtual CPUs on physical CPUs (scheduling virtual CPUs). The prime objective of z/VM HiperDispatch is to help virtual servers achieve enhanced performance from the IBM LinuxONE memory subsystem.
z/VM HiperDispatch works toward this objective by managing the partition and dispatching virtual CPUs in a way that takes into account the physical machine's organization (especially its memory caches). Therefore, depending upon the type of workload, this z/VM dispatching method can help to achieve enhanced performance on IBM LinuxONE hardware.
The processors of an IBM LinuxONE are physically placed in hardware in a hierarchical, layered fashion:
CPU cores are fabricated together on chips, perhaps 10 or 12 cores to a chip, depending upon the model
Chips are assembled onto nodes, perhaps three to six chips per node, again, depending upon model
The nodes are then fitted into the machine's frame
To help improve data access times, IBM LinuxONE uses high-speed memory caches at important points in the CPU placement hierarchy:
Each core has its own L1 and L2
Each chip has its own L3
Each node has its own L4
Beyond L4 lies memory
One way z/VM HiperDispatch tries to achieve its objective is by requesting that the PR/SM hypervisor provision the LPAR in vertical mode. In a vertical mode partition, the PR/SM hypervisor repeatedly attempts to run the partition's logical CPUs on the same physical cores (and to run other partitions' logical CPUs elsewhere). As a result, the partition's workload benefits from having its memory references build up context in the caches, and the overall system behavior is more efficient.
z/VM works to assist HiperDispatch to achieve its objectives by repeatedly running the guests' virtual CPUs on the same logical CPUs. This strategy ensures guests experience the benefit of having their memory references build up context in the caches. This also enables the individual workloads to run more efficiently.
3.4.12 z/VM configuration files
This section describes the following major configuration files:
z/VM system configuration file SYSTEM CONFIG
z/VM directory file USER DIRECT
Autostart file
z/VM system configuration file SYSTEM CONFIG
This file is placed on the configuration disk CF0 of the user PMAINT and carries all global z/VM system parameters. This file is read only once, at IPL time.
z/VM directory file USER DIRECT
After the z/VM installation, USER DIRECT is present on the 2CC disk of the user MAINT. This file contains the resource definitions for all virtual machines. After changing this file, it must be compiled with the DIRECTXA command to update the system directory area, as shown in the following example. Take security precautions with this file because it contains clear text passwords.
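For example, with the 2CC disk accessed as file mode C, the directory is compiled and brought online with:
access 2cc c
directxa user direct c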
If you run a directory management software such as IBM Directory Maintenance Facility for z/VM (DirMaint), this file is no longer used. See “User and security management” on page 61 for more information about using a directory manager.
Autostart file
Some commands need to be invoked every time the z/VM system starts. The user AUTOLOG1 is started automatically after IPL, and within AUTOLOG1 the file PROFILE EXEC on minidisk 191 is executed automatically. Every command in that file is therefore invoked automatically after the system starts.
One of the key things this process is used for is starting your Linux guests automatically after z/VM is started.
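A minimal sketch of such a PROFILE EXEC (the guest names are hypothetical):
/* PROFILE EXEC for AUTOLOG1: runs automatically at every z/VM IPL */
'CP XAUTOLOG TCPIP'     /* start the TCP/IP service machine   */
'CP XAUTOLOG PERFSVM'   /* start the Performance Toolkit      */
'CP XAUTOLOG LNXDB01'   /* start a Linux database guest       */
'CP XAUTOLOG LNXAPP1'   /* start a Linux application guest    */
'CP LOGOFF'             /* AUTOLOG1 logs itself off when done */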
 
Note: IBM Wave for z/VM also uses the AUTOLOG1 user for configuration of entities (such as z/VM VSwitches) managed by IBM Wave.
3.4.13 Product configuration files
System management products installed on z/VM will have their own configuration files. A few examples include:
Files on various DIRMAINT disks, such as 155, 11F, and 1DF, for IBM DirMaint:
 – CONFIGxx DATADVH
 – EXTENT CONTROL
 – AUTHFOR CONTROL
 – Any customized PROTODIR files
Files on VMSYS:PERFSVM (or minidisks if not installed to file pool) for IBM Performance Toolkit for z/VM:
 – $PROFILE FCONX
 – FCONRMT SYSTEMS
 – FCONRMT AUTHORIZ
 – UCOMDIR NAMES
Various files on OPMGRM1 198 for IBM Operations Manager for z/VM
Backup and disk pool definition files for IBM Backup and Restore Manager for z/VM
Consult the product documentation for each of the products being customized for the role and correct contents for these files.
3.4.14 IBM Infrastructure Suite for z/VM and Linux
Section 2.4.4, “IBM Infrastructure Suite for z/VM and Linux” on page 36 previously described the key elements for monitoring and managing the IBM LinuxONE environments. Additional tools are proposed for advanced sites where dashboards and automation triggers present operations personnel with additional information about the status of the services and proposed actions.
First, for the base Infrastructure Suite, you need to install and configure DirMaint and Performance Toolkit. Then for the advanced tools, a suggested setup is to create five LPARs to host the following parts:
IBM Wave UI Server (Wave)
Tivoli Storage Manager Server (TSM)
Tivoli Data Warehouse (TDW) with Warehouse Proxy and Summarization and Pruning Agents
IBM Tivoli Monitoring (ITM) Servers: Tivoli Enterprise Portal Server (TEPS) and Tivoli Enterprise Management Server (TEMS)
JazzSM server for Dashboard Application Services Hub (DASH) and Tivoli Common Reporting (TCR)
These five LPARs need to be set up only once for your enterprise. You can use any existing servers that meet the capacity requirements.
Before installing IBM Wave, check for the latest fixpack for IBM Wave and install it. The initial setup for IBM Wave is simple. All required setup in Linux and z/VM is done automatically by the installation scripts. IBM Wave has a granular role-based user model. Plan the roles in IBM Wave carefully according to your business needs.
3.5 Pervasive Encryption for data-at-rest
Protecting data-at-rest is an important aspect of security on IBM LinuxONE. Linux and z/VM support different aspects of Pervasive Encryption with two separate but important security capabilities. This section covers those capabilities in greater detail.
3.5.1 Data-at-rest protection on Linux: encrypted block devices
One of the key security capabilities of IBM LinuxONE is a highly secure way to encrypt disk devices. Protected key encryption uses both of the IBM LinuxONE hardware security features:
The Crypto Express card, as a secure, tamper-evident master key storage repository
The Central Processor Assist for Cryptographic Functions (CPACF), accelerated cryptographic instructions available to every CPU in the system
Protected key encryption uses an encryption key that is derived from a master key and kept within the Crypto Express card to generate a wrapped key that is stored in the Hardware System Area (HSA) of the IBM LinuxONE system. The key is used by the CPACF instructions to perform high-speed encryption and decryption of data, but it is not visible to the operating system in any way.
How IBM LinuxONE data-at-rest encryption works
When the paes cipher is used with IBM LinuxONE data at-rest encryption, the following protected volume options are available:
The LUKS2 format includes a header on the volume and a one-time formatting is required
The LUKS2 header is made up of multiple key slots. Each key slot contains key and cipher information
The volume's secure key is wrapped by a key-encrypting key (which is derived from a passphrase or a keyfile) and stored in a keyslot. The user must supply the correct passphrase to unlock the keyslot. A keyfile allows for the automatic unlocking of the keyslot
 
Note: LUKS2 format is the preferred option for IBM LinuxONE data at-rest encryption.
The plain format does not include a header on the volume and no formatting of the volume is required. However, the key must be stored in a file in the file system. The key and cipher information must be supplied with every volume open.
Creating a secure key
The process that is used to create a secure key for a LUKS2-format volume is shown in Figure 3-1.
Figure 3-1 Create a secure key.
This process includes the following steps:
1. A secure key is created by using the zkey command. The zkey utility generates the secure key with the help of the pkey utility and an assigned Crypto Express adapter (with master key). The secure key is also stored in the zkey key repository
2. The zkey cryptsetup command generates command strings that can be copied and pasted to run cryptsetup and create the encrypted volume with the appropriate secure key
3. The cryptsetup utility formats the physical volume and writes the encrypted (wrapped) secure key and cipher information to the LUKS2 header of the volume
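As a minimal sketch of these steps, assuming a hypothetical device /dev/dasdc1, a key named xts_key1, and the default zkey repository location (options and paths should be checked against the s390-tools and cryptsetup documentation for your distribution):
zkey generate --name xts_key1 --xts --volumes /dev/dasdc1:enc_vol --volume-type luks2
zkey cryptsetup --volumes /dev/dasdc1
cryptsetup luksFormat --type luks2 --cipher paes-xts-plain64 --key-size 1024 --master-key-file /etc/zkey/repository/xts_key1.skey /dev/dasdc1
The second command only prints cryptsetup command strings similar to the third line; the third command performs the one-time LUKS2 formatting and stores the wrapped secure key and cipher information in the volume header.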
Opening a LUKS2 formatted volume
The process that is used to open a LUKS2-formatted volume is shown in Figure 3-2 on page 73.
Figure 3-2 Opening a LUKS2 formatted volume.
This process includes the following steps:
1. The cryptsetup utility fetches the secure key from the LUKS2 header
2. The cryptsetup utility passes the secure key to dm-crypt
3. The dm-crypt passes the secure key to paes for conversion into a protected key by using pkey
4. The pkey module starts the process for converting the secure key to a protected key
5. The secure key is unwrapped by the CCA coprocessor in the Crypto Express adapter by using the master key
6. The unwrapped secure key (effective key) is rewrapped by using a transport key that is specific to the assigned domain ID
7. By using firmware, CPACF creates a protected key
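As a minimal sketch of opening and using the volume created in the earlier example (device and mapping names are hypothetical; the passphrase or keyfile configured for the keyslot is requested at open time):
cryptsetup luksOpen /dev/dasdc1 enc_vol
mkfs.xfs /dev/mapper/enc_vol     # first use only
mount /dev/mapper/enc_vol /data
For automatic unlocking at boot with a keyfile, an /etc/crypttab entry similar to the following could be used (keyfile path is illustrative):
enc_vol  /dev/dasdc1  /etc/keys/enc_vol.key  luks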
3.5.2 Data-at-rest protection on z/VM: encrypted paging
For the most part, Linux running as a virtual machine is responsible for its own resources. The hypervisor protects virtual machines from each other (for example, by preventing memory allocated to one virtual machine from being accessed by another), but Linux manages the resources allocated to it. Currently, paging is the only hypervisor operation that might expose a guest's resources (in this case, part of its memory).
Paging occurs when the z/VM system does not have enough physical memory available to satisfy a guest’s request for memory. To obtain memory to meet the request, z/VM finds some currently allocated but not recently used memory and stores the contents onto persistent storage (a disk device). z/VM then reuses the memory to satisfy the guest’s request.
When a paging operation occurs, the content of the memory pages is written to disk (to paging volumes). It is during this process that the possible exposure occurs. If the memory being paged-out happened to contain a password, the private key of a digital certificate, or other secret data, z/VM has stored that sensitive data onto a paging volume outside the control of Linux. Whatever protections were available to that memory while it was resident are no longer in effect.
To protect against this situation occurring, z/VM Encrypted Paging uses the advanced encryption capability of the IBM LinuxONE system to encrypt memory being paged out and decrypt it after the page-in operation. Encrypted Paging uses a temporary key (also known as an ephemeral key) which is generated each time a z/VM system is IPLed. If Encrypted Paging is enabled, pages are encrypted using the ephemeral key before they are written to the paging device.
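A minimal sketch of how Encrypted Paging can be enabled, assuming AES-256 and the statement and command names documented for current z/VM releases (verify the exact syntax in the z/VM CP Planning and Administration documentation for your release):
ENCRYPT PAGING ON ALGORITHM AES256     (SYSTEM CONFIG statement, takes effect at IPL)
SET ENCRYPT PAGING ON                  (CP command to enable dynamically)
QUERY ENCRYPT                          (CP command to display the current setting)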
3.6 Networking on IBM LinuxONE
IBM LinuxONE servers support a variety of hardware, software, and firmware networking technologies to support workloads.
Adding dedicated OSA ports to Linux guests can be ideal for use as interconnect interfaces for databases or clustered file systems. Using a dedicated OSA can reduce the path length to the interface, but you will need to decide your own method for providing failover.
Also, if you dedicate an OSA interface it can be used for only one IP network by default. You can use the Linux 8021q module to provide VLANs, managed within Linux.
3.6.1 Ethernet technologies
Standard network connectivity is supported on IBM LinuxONE using two types of network technology:
OSA Express features
HiperSockets
OSA Express features
For optimal data transfer, use OSA ports with a speed of at least 10 Gb and, for redundancy, plan for them in pairs.
OSA Express cards can be used in conjunction with software networking facilities in z/VM and Linux (such as the z/VM Virtual Switch and Open vSwitch in Linux) to provide connectivity for virtual machines in virtualized environments under z/VM.
HiperSockets
HiperSockets (HS) interconnects LPARs that are active on the same machine by doing a memory-to-memory transfer at processor speed. HS can be used for both TCP and UDP connections.
VSwitch
A z/VM Virtual Switch (VSwitch) provides the network communication path for the Linux guests in the environment. Refer to sections 2.6.3, “z/VM networking” on page 49 and 3.6.3, “Connecting virtual machines to the network” for more information about VSwitch.
We recommend the use of a Port Group for maximum load sharing and redundancy. The Link Aggregation Control Protocol (LACP) can enhance the use of all the ports installed in the Port Group.
z/VM Virtual Switch also provides a capability called Multi-VSwitch Link Aggregation, also known as Global Switch. This allows the ports in a Port Group to be shared between several LPARs.
3.6.2 Shared Memory Communications (SMC)
SMC can be used only for TCP connections. It also requires connectivity through an OSA for the initial handshake. This OSA connectivity acts also as a fallback if SMC has problems initiating. SMC offers the same speed as HiperSockets but is more flexible because it also interconnects additional IBM LinuxONE servers.
SMC-R (RoCE) and SMC-D (ISM)
Using SMC is recommended in any environment where there is extensive TCP traffic between systems in the IBM LinuxONE environment, whether SMC runs over RoCE hardware (SMC-R) or over an ISM internal channel (SMC-D).
We anticipate significant throughput and latency improvements when SMC is enabled for the communication between the database and the Transact application servers.
The RoCE Express card can be used by Linux for both SMC communication and standard Ethernet traffic. At this time, however, the RoCE Express card does not have the same level of availability as the OSA Express card (for example, firmware updates are disruptive). For this reason, at the time of this publication we recommend that if RoCE Express is used for Linux, it is used in addition to a standard OSA Express-based communications path (either direct OSA or VSwitch). For more information about SMC, see 2.1.8, “Shared Memory Communication (SMC)” on page 24.
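As an illustrative sketch, the smc-tools package lets an unmodified TCP application use SMC by preloading the SMC library, and the smcss command shows whether its connections are actually using SMC (the application command shown is hypothetical):
smc_run java -jar TransactServer.jar    # hypothetical application started with the SMC preload wrapper
smcss                                   # lists SMC sockets and any fallback to plain TCP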
3.6.3 Connecting virtual machines to the network
Virtual machines (VMs) attach to the network by using the physical and virtual networking technologies described previously. The following sections describe some of the ways that VMs can be attached.
Dedicating devices to Linux under z/VM
z/VM can pass channel subsystem devices through to the guest virtual machine, which enables the guest to manage networking devices directly with its own kernel drivers.
 
Note: This is the only way that HiperSockets can be used by a Linux guest under z/VM. For OSA Express, the z/VM Virtual Switch is an alternative. See “z/VM Virtual Switch” on page 76.
To allow a guest to access an OSA Express card or HiperSockets network directly, the z/VM ATTACH command is used to connect devices accessible by z/VM directly to the Linux virtual machine. If the Linux guest needs access to the network device at startup, use the DEDICATE directory control statement. This statement attaches the required devices to the Linux guest when it is logged on.
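As a minimal sketch, assuming a hypothetical Linux guest LINUX01 and a device triplet 0600-0602 for the OSA Express or HiperSockets interface (real device numbers depend on your I/O configuration):
ATTACH 0600-0602 TO LINUX01
or, in the LINUX01 user directory entry, so that the devices are attached at logon:
DEDICATE 0600 0600
DEDICATE 0601 0601
DEDICATE 0602 0602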
The adapter can still be shared with other LPARs on the IBM LinuxONE server. It can also be sharable with other Linux guests in the same LPAR. However, adapter sharing has a dependency. There must be enough subchannel devices defined in the channel subsystem to allow more than one Linux guest in the LPAR to use the adapter at the same time.
 
Note: The way adapter sharing is done is different between the IODF mode and the DPM mode of the IBM LinuxONE server.
When you attach a Linux guest to an OSA Express adapter in this way, you need to consider how you will handle possible adapter or switch failures. Usually you attach at least two OSA Express adapters to the guest and use Linux channel bonding to provide interface redundancy. You can use either the Linux bonding driver or the newer team driver for this purpose. You have to repeat this configuration on every Linux guest, and managing it across a large number of guests is challenging; this is one reason why this is not the preferred connection method for Linux guests.
z/VM Virtual Switch
A z/VM Virtual Switch can be used to attach Linux guests under z/VM to an Ethernet network. The guests are configured with one (or more) virtual OSA Express cards, which are then connected to a VSwitch. The VSwitch is in turn connected to one or more real OSA Express adapters. A z/VM Virtual Switch simplifies the configuration of a virtualized environment by handling much of the networking complexity on behalf of Linux guests.
A VSwitch can support IEEE 802.1Q Virtual LANs (VLANs). It can either manage VLAN tagging on behalf of a virtual machine or can let the virtual machine do its own VLAN support.
VSwitches also provide fault tolerance on behalf of virtual machines. This is provided either using a warm standby mode, or link aggregation mode using a Port Group. In the warm standby mode, up to three OSA Express ports are attached to a VSwitch with one carrying network traffic and the other two ready to take over in case of a failure. In the Port Group mode, up to eight OSA Express ports can be joined for link aggregation. This mode can use the IEEE 802.1AX (formerly 802.3ad) Link Aggregation Control Protocol (LACP). The two modes can actually be combined: a Port Group can be used as the main uplink for the VSwitch, with a further OSA port in the standard mode used as a further backup link.
A z/VM VSwitch can also provide isolation capability, using the Virtual Edge Port Aggregator (VEPA) mode. In this mode, the VSwitch no longer performs any switching between guests that are attached to it. Instead, all packets generated by a guest are transmitted out to the adjacent network switch by the OSA uplink. The adjacent switch must support Reflective Relay (also known as hairpinning) for guests attached to the VSwitch to communicate.
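Pulling the Port Group discussion together, the following is a minimal sketch of a link-aggregated VSwitch, assuming hypothetical OSA devices 1003 and 2003, a Port Group named ETHGRP, a switch named VSWITCH1, and a guest LINUX01 (verify the SET PORT GROUP, DEFINE VSWITCH, SET VSWITCH, and NICDEF syntax in the z/VM documentation for your release):
SET PORT GROUP ETHGRP LACP ACTIVE
SET PORT GROUP ETHGRP JOIN 1003.P00 2003.P00
DEFINE VSWITCH VSWITCH1 ETHERNET GROUP ETHGRP
SET VSWITCH VSWITCH1 GRANT LINUX01
In the LINUX01 user directory entry, a virtual NIC coupled to the switch could then be defined with:
NICDEF 0600 TYPE QDIO LAN SYSTEM VSWITCH1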
3.6.4 Connecting virtual machines to each other
There are times where virtual machines need specific communication paths to each other. The most common instance of this is clustered services requiring an interconnect or heartbeat connection (such as Oracle RAC, or IBM Spectrum Scale). It is possible to use the standard network interface used for providing service from the Linux guest. However, most cluster services stipulate that the interconnect should be a separate network dedicated to the purpose.
Any of the network technologies described in Section 3.6.3, “Connecting virtual machines to the network”, can be used for a cluster interconnect. Our architecture recommends the use of OSA Express adapters for cluster interconnect for the following reasons:
Provides cluster connectivity between CPCs without changes
Provides support for all protocols supported over Ethernet
Cross-CPC connectivity
HiperSockets is a natural first choice for use as a cluster interconnect: it is fast and highly secure. It can be configured with a large MTU size, making it ideal as a database or file storage interconnect.
However, because HiperSockets exists only within a single CPC, it cannot be used when the systems being clustered span CPCs. If a HiperSockets-based cluster interconnect is implemented for nodes on a single CPC, the cluster would have to be changed to a different interconnect technology if the nodes were later split across CPCs.
Because an OSA Express-based interconnect can also be configured with a large MTU size (Ethernet jumbo frames), OSA Express is a good choice, given its flexibility in allowing cluster nodes to be deployed across CPCs.
Protocol flexibility
The SMC networking technologies, SMC-D and SMC-R, can also be considered as cluster interconnect technologies. They offer high throughput with low CPU utilization. Unlike HiperSockets, the technology can be used between CPCs (SMC-R).
SMC can increase only the performance of TCP connections; therefore, it might not be usable for all cluster applications (Oracle RAC, for example, uses both TCP and UDP on the interconnect network). SMC operates as an adjunct to the standard network interface and not as a separate physical network. Because of this, it doesn’t meet the usual cluster interconnect requirement of being a logically and physically separate communication path.
3.7 DS8K Enterprise disk subsystem
Determining what type of storage your organization needs can depend on many factors at your site. IBM LinuxONE can use FCP/SCSI, FICON ECKD, or a mix of storage types. The storage decision should be made early and with the future in mind, because changing the decision later in the process can mean a lengthy migration to another storage type. The architecture described in this book is based on FICON and ECKD storage, which is required for SSI and the high availability features that it brings.
There are two options available for which disk storage type to choose:
512-byte fixed-block open systems storage based on the FCP protocol, which is the same type of storage used for the x86 platform. On this storage, you define LUNs with the appropriate sizes, which can be found in the product documentation.
ECKD storage, which requires an enterprise class storage subsystem (IBM DS8000) based on the FICON protocol. ECKD volumes must be defined in the storage subsystem. If the product documentation gives disk sizes in GB or TB, you need to convert those sizes into a number of cylinders or 3390 models.
The following section helps you to do the calculations for ECKD volume size.
3.7.1 ECKD volume size
ECKD volumes are also known as IBM 3390 volumes. The size of an ECKD volume is categorized into models and is counted in cylinders. One cylinder is 849,960 bytes. The base model is the 3390 model 1 (3390 M1 or 3390-1), which has a size of about 946 MB. The 3390-1, 3390-2, and 3390-3 are no longer used (or only in rare cases). Table 3-1 shows the commonly used sizes for an ECKD volume.
Table 3-1 Commonly used 3390 sizes
Disk type     Cylinders         Volume size
3390-9        10,017            8.1 GB
3390-27       32,760            27.8 GB
3390-54       65,520            55.6 GB
ECKD EAV      up to 262,668     up to 223 GB
The 3390-9 can be used for the operating system (especially for z/VM) and the other types for data. In an enterprise class storage subsystem, you find these volume models as predefined selections in the configuration dialog. However, you are not restricted to these specific sizes; you can define any number of cylinders up to the maximum of 262,668. ECKD EAV is the Extended Address Volume, which means that no further specific model type is defined beyond the 3390-54.
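As a worked example of the conversion, assume a (purely illustrative) requirement of 200 GB. With one 3390 cylinder holding 849,960 bytes:
echo $(( 200 * 10**9 / 849960 ))     # prints 235305
That is, about 235,305 cylinders, which could be provided by four 3390-54 volumes or by a single EAV.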
3.7.2 Disk mirroring
Independent of which storage type is chosen, we recommend installing at least two identical storage subsystems, setting up the built-in mirroring technology, and mirroring all defined volumes. For this purpose, IBM DS8000 offers the Metro Mirror function (formerly named Peer-to-Peer Remote Copy). Metro Mirror is a synchronous disk replication method that guarantees an identical copy of your data at any time. If one disk subsystem suffers an outage, this allows you to restart from, or switch over to, the other disk subsystem immediately. This immediate response depends on the high availability functions that are implemented in addition (such as GDPS).
3.7.3 Which storage to use
When deciding between FBA storage and ECKD storage, there are several considerations to review in understanding what is best for your business needs.
FBA storage is more common because it is also usable as storage for the x86 platform. FBA storage has the following attributes:
Any kind of SCSI disk storage can be used
It fits into your already implemented monitoring environment
It does not need any special hardware (SAN switches) or dedicated cabling
It is less expensive than enterprise class storage
If you run IBM LinuxONE in DPM mode, FBA storage is the preferred storage
Some functions are not available when compared to an enterprise class storage
Multipathing must be done at the operating system level
It has limits in scalability
It does not support GDPS
ECKD or enterprise class storage is unique to the IBM LinuxONE architecture. ECKD storage has the following attributes:
It supports all the functions available for disk storage systems
It offers the most performance and scalability
No additional driver is necessary for multi-pathing; it is implemented in the FICON protocol
It is supported by GDPS
It is more expensive when compared to FBA storage
It requires enterprise class SAN switches
FICON needs dedicated cabling
If you are considering running GDPS, you are required to use FICON and enterprise class storage. Otherwise, FBA storage is also a good option.
3.8 Temenos Transact
Temenos Infinity and Transact form a multi-module application suite supporting core banking, payments, Islamic fund and various other Retail and Commercial Banking services. Over recent years, this application framework has evolved to support more agile and flexible technologies such as Java and RESTful API services. The latest R19 and R20 releases referred to throughout this book are based on TAFJ (Temenos Application Framework for Java) and can be deployed on a range of Java application servers (such as IBM WebSphere, JBoss, or Oracle WebLogic). This reduces the proprietary runtime components typically associated with the TAFC application versions. It also allows the application server to control the Enterprise Server processing, messaging, operations, and management features independently of the application instance(s).
This new application suite approach allows clients to integrate new modules and modify or update existing services without impacting the runtime services. It also reduces the development and testing effort required and appeals to the larger community of Java developers. This has also become the de-facto standard for Cloud-based adoption using containers (such as Docker and Podman) and orchestration technologies (for example, Kubernetes) based on Java frameworks.
In summary, the use of the latest Temenos TAFJ-based suite brings many functional advantages. This is based on the ability to exploit the latest Java, Cloud and associated runtime technologies, while allowing the non-functional architectural requirements such as Availability, Scalability, (Transaction) Reliability and Security to be fully exploited on the IBM LinuxONE platform.
The software stack used to run Temenos Transact is specific and exact; only one specific version of the Linux operating system is certified for use. If an organization deviates from the recommended list of software, Temenos can deny support.
The following components and minimum release levels are certified to run Temenos Transact:
Red Hat Enterprise Linux 7
Java 1.8
IBM WebSphere MQ 9
Application Server (noted in the next list)
Oracle DB 12c
The Temenos Transact software is Java based and requires an application server to run. There are several application server options:
IBM WebSphere 9
Red Hat JBoss EAP
Oracle WebLogic Server 12c (JDBC driver)
The Temenos Stack Runbooks provide more information about using Temenos stacks with different application servers. Temenos customers and partners can access the Runbooks through either of the following links:
The Temenos Customer Support Portal: https://tcsp.temenos.com/
The Temenos Partner Portal: https://tpsp.temenos.com/
3.9 Red Hat Linux
Temenos supports only Red Hat Enterprise Linux (RHEL). IBM LinuxONE LPARs, and guests under z/VM, should be provisioned with the RHEL release for s390x. Depending on the LPAR and its workload, IBM LinuxONE resources should be tuned specifically for each LPAR for the best overall performance.
3.10 IBM WebSphere
In the traditional architecture described in this section, IBM WebSphere Application Server is deployed across four LPARs under z/VM on two CECs. IBM WebSphere Application Server is configured as a stand-alone server, and all Temenos Transact components are installed on one instance of the IBM WebSphere Application Server per node. Deploying IBM WebSphere Network Deployment enables a central administration point for a cell that consists of multiple nodes and node groups in a distributed server configuration. The database connection uses the JDBC driver, which is the only driver supported by Temenos Transact with Oracle on IBM LinuxONE.
3.11 Queuing with IBM MQ
No specific tuning or installation considerations are needed for IBM MQ on top of the Transact installation process. Queues should be defined as input shared and (where applicable) defined as persistent. Although failures of IBM MQ are rare, single points of failure should be avoided in any architecture and multiple MQ servers should be deployed.
MQ servers should be configured in an active/passive manner on two Linux systems (possibly guests under z/VM). IBM MQ requires shared storage (such as Spectrum Scale) so that both servers can share vital MQ information (such as logs) and provide the active/passive behavior.
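One hedged way to implement this pattern is an IBM MQ multi-instance queue manager, whose data and logs are placed on the shared file system. The following is a minimal sketch, assuming a hypothetical queue manager QM1 and a shared mount point /MQHA; verify the exact procedure in the IBM MQ documentation:
# on server A: create the queue manager with data and logs on shared storage
crtmqm -md /MQHA/qmgrs -ld /MQHA/logs QM1
strmqm -x QM1                 # starts as the active instance
# on server A: display the definition needed by the second server
dspmqinf -o command QM1
# on server B: run the addmqinf command printed above, then start the standby
strmqm -x QM1                 # becomes the standby instance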
For the installation and configuration process of IBM MQ see “Installing IBM MQ server on Linux,” located at the following link:
3.12 Oracle DB on IBM LinuxONE
Oracle Database (DB) is the preferred database for Temenos Transact on IBM LinuxONE with the traditional deployment architecture. Oracle Real Application Clusters (RAC) is used to ensure High Availability (HA) of the database in the event of an outage on one of the database nodes.
3.12.1 Native Linux or z/VM guest deployment
Oracle DB is supported on IBM LinuxONE either as a native installation on an LPAR or as a guest (virtual machine) under z/VM.
3.12.2 Oracle Grid Infrastructure
Oracle Grid Infrastructure is a prerequisite for Oracle Real Application Clusters (RAC) and is a suite of software containing Oracle Clusterware and Oracle Automatic Storage Management (ASM).
3.12.3 Oracle Clusterware
Oracle Clusterware is part of the Oracle Grid Infrastructure suite and is required for Oracle Real Application Clusters (RAC). Oracle Clusterware is what allows the independent Linux Guests on the Production HA pair, shown in Figure 4-6 on page 91, to operate as a single database instance to the application and balance the database workloads.
Each DB node is a stand-alone Linux server. However, Oracle Clusterware allows all Oracle RAC nodes to communicate with each other, and installation of, or updates to, Oracle DB can be applied across all DB nodes automatically.
Oracle Clusterware has additional shared storage requirements: a voting disk to record node membership and the Oracle Cluster Registry (OCR) for cluster configuration information.
3.12.4 Oracle Automatic Storage Management (ASM)
ASM is a volume manager and file system that groups storage devices into disk groups. ASM simplifies the management of storage devices by balancing the workload across the disks in a disk group and exposes a file system interface for the Oracle database files. ASM is used as an alternative to conventional volume managers, file systems, and raw devices.
Some advantages of ASM are noted in the following list:
Live add and remove of disk devices
Ability to use external disk mirroring technology such as IBM Metro Mirror
Automatically balancing database files across disk devices to eliminate hotspots
The use of ASM is optional; Oracle now supports IBM Spectrum Scale (GPFS) on IBM LinuxONE as an alternative with Oracle RAC.
3.12.5 Oracle Real Application Clusters (RAC)
Oracle RAC is an optional feature from Oracle that provides a highly available, scalable database for Temenos Transact on IBM LinuxONE. Oracle RAC is a clustered database that overcomes the limitations of the traditional shared-nothing and shared-disk approaches with a shared cache architecture that does not impact performance.
High availability with Oracle RAC is achieved by removing the single points of failure of single-node or single-server architectures through multi-node deployments, while maintaining the operational efficiency of a single-node database. Node failures do not affect the availability of the database, because Oracle Clusterware migrates and balances DB traffic to the remaining nodes whether the outage was planned or unplanned. High availability Oracle RAC clusters can be achieved on a single IBM LinuxONE server by using multiple LPARs as individual DB nodes, or across multiple IBM LinuxONE systems in a data center. See the architectural diagram shown in Figure 4-6 on page 91.
With IBM LinuxONE, scalability can be achieved in multiple ways. One way is to add compute capacity to existing Oracle DB LPARs by adding IFLs. A second way is to add more Oracle DB LPARs to the environment. The ability to scale the Oracle DB by adding IFLs one at a time is another unique feature of IBM LinuxONE, and it can yield distinct CPU core savings compared to adding an entire Linux server, which can bring dozens of cores into your Oracle architecture.
Oracle RAC, in an active/active configuration, offers the lowest Recovery Time Objective (RTO). However, this mode is the most resource intensive. Better DB performance has been observed using Oracle RAC One Node. In the event of a failure, Oracle RAC One Node will relocate database services to the standby node automatically. Oracle RAC One Node is a great fit with the scale-up capability of IBM LinuxONE.
See the Oracle documentation for system prerequisites and detailed information for installation and operation at the following link:
3.12.6 GoldenGate for database replication
The described architecture outlines that storage replication is used for the production LPARs within a metro distance. Because of this configuration, Oracle's GoldenGate real-time database replication software is not required.
3.12.7 Use encrypted volumes for the database
Oracle offers Transparent Data Encryption (TDE) for Oracle DB. TDE can be configured to selectively and transparently encrypt and decrypt sensitive data. However, IBM LinuxONE has a built-in feature that transparently encrypts and decrypts ALL data on a volume. This means your entire DB can be transparently encrypted and decrypted with little impact on DB performance or IFL consumption. The data shown in Figure 1-6 on page 13 shows a nearly 2% impact on transaction rates with Temenos Transact on fully encrypted volumes as compared to non-encrypted volumes. Being able to encrypt everything is a profound advantage over other platforms.
3.12.8 Oracle tuning on IBM LinuxONE
It is recommended to use the following guidance to get the most benefit from Oracle DB on IBM LinuxONE:
Enabling large pages
It is recommended for performance and availability reasons to implement Linux large pages for Oracle databases that are running on IBM LinuxONE systems. Linux large pages are beneficial for systems where the database's Oracle SGA is greater than 8 GB.
Defining large frames
Enabling large frames allows the operating system to work with memory frames of 1 MB (on IBM LinuxONE) rather than the default 4 KB. This allows smaller page tables and more efficient Dynamic Address Translation, and enabling fixed large frames can save CPU cycles when looking up data in memory. In our testing, transparent huge pages were disabled to ensure that the 1 MB pool was assigned when specified. In our lab environment testing, two components of the Transact architecture benefited from large frames: Java and Oracle.
Disabling transparent HugePages with kernel parameter
It is recommended for performance and stability reasons to disable transparent HugePages. Transparent HugePages are different from Linux large pages, which are still highly recommended. Add the following kernel parameter to disable transparent HugePages:
transparent_hugepage=never
Increasing the Memory pool size
Define the memory pool size for 1 MB huge pages by adding the following kernel parameters:
default_hugepagesz=1M hugepagesz=1M hugepages=<number of pages>
Increase fcp queue depth
To maximize the I/O capabilities of the Linux system hosting the Oracle database, set the zfcp.queue_depth kernel parameter to 256 to increase the default FCP queue depth.
You can check whether your system has transparent HugePages enabled by using the following command:
cat /sys/kernel/mm/transparent_hugepage/enabled
[always] madvise never
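A minimal sketch of applying these kernel parameters, assuming RHEL 7 on s390x, where boot parameters are maintained in /etc/zipl.conf (the hugepages count of 12384 is purely illustrative and must be sized to your SGA; zfcp.queue_depth applies only when FCP/SCSI storage is used):
# append to the parameters= line of the active boot section in /etc/zipl.conf
parameters="... transparent_hugepage=never default_hugepagesz=1M hugepagesz=1M hugepages=12384 zfcp.queue_depth=256"
# rewrite the boot loader record, then reboot
zipl
# after the reboot, verify the settings
cat /sys/kernel/mm/transparent_hugepage/enabled
grep -i hugepages /proc/meminfo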