Shared Memory Communications
 
Naming: The IBM z14 server generation is available as the following machine types and models:
Machine Type 3906 (M/T 3906), Models M01, M02, M03, M04, and M05, further identified as IBM z14 Model M0x, or z14 M0x.
Machine Type 3907 (M/T 3907), Model ZR1, further identified as IBM z14 Model ZR1, or z14 ZR1.
In the remainder of this document, IBM z14 (z14) refers to both machine types, unless otherwise specified.
This appendix briefly describes the optional Shared Memory Communications (SMC) function that is implemented on IBM Z servers as Shared Memory Communications over Remote Direct Memory Access (SMC-R) and Shared Memory Communications - Direct Memory Access (SMC-D) on IBM z14, z13, and z13s servers.
The following types of adapters are available for SMC-R physical connectivity:
25GbE RoCE Express2 (FC 0430)
10GbE RoCE Express2 (FC 0412)
10GbE RoCE Express (FC 0411): This adapter can be carried forward to z14 ZR1.
Throughout this appendix, we use the terms 10GbE RoCE Express and RoCE Express for these adapters, except when the adapter specifications differ.
This appendix includes the following topics:
D.1, “Overview”
D.2, “Shared Memory Communication over RDMA”
D.3, “Shared Memory Communications - Direct Memory Access”
 
D.1 Overview
As the volume of data that is generated and transmitted by technologies that are driven by cloud, mobile, analytics, and social computing applications grows, pressure increases on business IT organizations to provide fast access to that data across the web, application, and database tiers that comprise most enterprise workloads. SMC helps applications access data faster and with lower latency. It also reduces CPU resource consumption compared with traditional TCP/IP communications.
D.2 Shared Memory Communication over RDMA
SMC-R (RoCE) for IBM z14 ZR1 includes the following features:
Remote Direct Memory Access (RDMA) technology provides low latency, high bandwidth, high throughput, and low processor utilization attachment between hosts.
SMC-R is a protocol that allows TCP applications to benefit transparently from RDMA for transferring data. Consider the following points:
 – SMC-R uses RoCE Express adapter as the physical transport layer.
 – Initial deployment is limited to z/OS to z/OS communications with a goal to expand usage to more operating systems, and possibly appliances and accelerators.
Single Root I/O Virtualization (SR-IOV) technology provides the capability to share the RoCE Express adapter between logical partitions (LPARs), with the following specifications for z14 ZR1:
 – 25GbE & 10GbE RoCE Express2 features support 31 Virtual Functions (VFs) per physical port for a total of 62 VFs per PCHID (z14 M0x support a different number of VFs for the FC 0412 and FC 0430).
 – 10GbE RoCE Express supports 31 VFs per PCHID.
The maximum number of RoCE Express and RoCE Express2 adapters that are supported per z14 ZR1 is four (combined).
D.2.1 RDMA technology overview
RDMA over Converged Ethernet (RoCE) is part of the InfiniBand Architecture Specification that provides InfiniBand transport over Ethernet fabrics. It encapsulates InfiniBand transport headers into Ethernet frames by using an IEEE-assigned Ethertype. One of the key InfiniBand transport mechanisms is RDMA, which is designed to allow the transfer of data to or from memory on a remote system with low latency, high throughput, and low CPU usage.
Traditional Ethernet transports, such as TCP/IP, typically use software-based mechanisms for error detection and recovery. They also are based on the underlying Ethernet fabric that uses a “best-effort” policy. With the traditional policy, the switches typically discard packets that are in congestion and rely on the upper-level transport for packet retransmission.
However, RoCE uses hardware-based error detection and recovery mechanisms that are defined by the InfiniBand specification. A RoCE transport performs best when the underlying Ethernet fabric provides a lossless capability, where packets are not routinely dropped.
This process can be accomplished by using Ethernet flow control where Global Pause frames are enabled for both transmission and reception on each of the Ethernet switches in the path between the RoCE Express/Express2 adapters. This capability is enabled in the RoCE Express/Express2 adapter by default.
The following key requirements for RDMA are shown in Figure D-1:
A reliable “lossless” Ethernet network fabric (LAN for layer 2 data center network distance)
An RDMA network interface card (RNIC)
Figure D-1 RDMA technology overview
RDMA technology is now available on Ethernet. RoCE uses an Ethernet fabric (switches with Global Pause enabled) and requires advanced Ethernet hardware (RNICs on the host).
D.2.2 Shared Memory Communications over RDMA
SMC-R is a protocol that allows TCP socket applications to transparently use RDMA. It also is a “hybrid” solution (see Figure D-2 on page 428) that includes the following features:
Uses a TCP connection to establish the SMC-R connection.
A TCP option (SMCR) controls switching from TCP to “out-of-band” SMC-R.
SMC-R information is exchanged within the TCP data stream.
Socket application data is exchanged through RDMA (write operations).
Retains TCP connection to control the SMC-R connection.
Preserves many critical operational and network management features of TCP/IP.
Figure D-2 Dynamic transition from TCP to SMC-R
The hybrid model of SMC-R uses the following key attributes:
Follows the standard TCP/IP connection setup.
Switches to RDMA (SMC-R) dynamically.
TCP connection remains active (idle) and is used to control the SMC-R connection.
Preserves the following critical operational and network management TCP/IP features:
 – Minimal (or zero) IP topology changes
 – Compatibility with TCP connection-level load balancers
 – Preservation of the IP security model, such as IP filters, policies, virtual LANs (VLANs), and Secure Sockets Layer (SSL)
 – Minimal network administration and management changes
Host application software is not required to change; therefore, all host application workloads can benefit immediately.
D.2.3 Single Root I/O virtualization
Single Root I/O virtualization (SR-IOV) is a technology that provides the capability to share the adapter between LPARs. SR-IOV is also designed to provide isolation of virtual functions within the PCIe RoCE Express/Express2 adapter. For example, one LPAR cannot cause errors that are visible to other virtual functions or other LPARs. Each operating system that is running in an LPAR includes its own application queue in its own memory space.
The concept of the Shared RoCE Mode is shown in Figure D-3.
Figure D-3 Shared RoCE mode concepts
The Physical Function Driver communicates with the physical function in the PCIe adapter and is responsible for the following functions:
Manage resource allocation
Perform hardware error handling
Perform code updates
Run diagnostics
The device-specific IBM Z Licensed Internal Code (LIC) connects the Physical Function Driver to the Support Elements (SEs) and to a limited set of required system-level firmware services.
D.2.4 Hardware
The 10GbE RoCE Express adapter (FC 0411), 25GbE RoCE Express2 (FC 0430), and 10GbE RoCE Express2 (FC 0412) are RDMA-capable NICs. The integrated firmware processor (IFP) runs four resource groups (RGs) that contain firmware for the RoCE Express adapter. For more information, see C.1.3, “Resource groups” on page 421.
D.2.5 RoCE Express/Express2 adapter
The RoCE Express adapters are designed to help reduce the consumption of CPU resources for applications that use the TCP/IP stack, such as WebSphere accessing a Db2 database. The use of RoCE Express also helps to reduce network latency with memory-to-memory transfers that use SMC-R in z/OS V2.1 or later. Its use is transparent to applications and can be used for LPAR-to-LPAR communications on a single z/OS system or server-to-server communications in a multiple CPC environment.
The 10GbE RoCE Express2 adapter that is shown in Figure D-4 is installed in the PCIe+ I/O drawer.
Figure D-4 10GbE RoCE Express2
Each PCIe adapter has two ports. A maximum of four adapters can be installed in a z14 ZR1 server. The adapters use a short reach (SR) laser as the optical transceiver and support the use of a multimode fiber optic cable that ends with an LC Duplex connector. Point-to-point connection (with another RoCE Express/Express2 adapter of the same speed) and switched connection with an enterprise-class switch are supported.
 
RoCE Physical Connectivity: The 25GbE RoCE Express2 feature does not support negotiation (to a lower speed). Therefore, it must be connected to a 25 Gbps port of an Ethernet Switch or to another 25GbE RoCE Express2 feature.
The 10GbE RoCE Express and 10GbE RoCE Express2 features can be connected to each other in a point-to-point connection or to a 10 Gbps port of an Ethernet switch.
SMC-R can use direct RoCE Express to RoCE Express connectivity (without any switch). However, this type of direct physical connectivity forms a single physical point-to-point connection, which disallows any other connectivity with other LPARs, such as other SMC-R peers. Although this option is viable for test scenarios, it is not practical (nor recommended) for production deployment.
If the IBM RoCE Express/Express2 adapters are connected to Ethernet switches, the switches must support the following requirements:
10 Gbps or 25 Gbps ports (depending on the RoCE feature specifications)
Global Pause function frame (as described in the IEEE 802.3x standard) must be enabled
Priority flow control (PFC) disabled
No firewalls, routing, or intraensemble data network (IEDN)
The maximum supported unrepeated point-to-point distance at initial introduction was 300 meters for the 10 Gbps features. The 25GbE RoCE Express2 feature supports a maximum unrepeated distance of 100 meters (328 feet). These distances can be extended across multiple cascaded switches or qualified DWDMs up to 100 km (62 miles). For more information, see the SMC-R over distance presentation.
RoCE Express/Express2 configuration example
 
Mixing of RoCE Generations: Mixing generations of RoCE adapters on the same stack is supported with the following considerations:
25GbE RoCE Express2 should not be mixed with 10GbE RoCE Express2 or 10GbE RoCE Express in the same SMC-R Link Group.
10GbE RoCE Express2 can be mixed with 10GbE RoCE Express (that is, provisioned to the same TCP/IP stack or same SMC-R Link Group).
A sample configuration that allows redundant SMC-R connectivity among LPAR A and C, and LPAR 1, 2 and 3 is shown in Figure D-5. Each adapter can be shared or dedicated to an LPAR. As shown in Figure D-5, two adapters per LPAR are advised for redundancy.
Figure D-5 10GbE RoCE Express sample configuration
The configuration that is shown in Figure D-5 allows redundant SMC-R connectivity among LPAR A, LPAR C, LPAR 1, LPAR 2, and LPAR 3. LPAR-to-LPAR OSD connections are required to establish SMC-R communications; 1 GbE OSD connections can be used, and the OSD connections can flow through the same switches or through different switches.
 
Note: The OSA-Express Adapter and the RoCE Express adapter must be associated to each other by having equal PNET IDs (defined in the hardware configuration definition [HCD]).
An OSA-Express adapter, which is defined as channel-path identifier (CHPID) type OSD, is required to establish SMC-R. The interaction of OSD and the RNIC is shown in Figure D-6.
Figure D-6 RNIC and OSD interaction
The OSA adapter might be a single or pair of 10 GbE, 1 GbE, or 1000Base-T OSAs. The OSA must be connected to another OSA on the system with which the RoCE adapter is communicating. As shown in Figure D-5 on page 431, 1 GbE OSD connections can still be used instead of 10 GbE and OSD connections can flow through the same 10 GbE switches.
Consider the following points regarding Figure D-6:
The z/OS system administrator must configure and manage the OSD interface only.
The Communications Server transparently splits and converges network traffic to and from the converged interface.
Only OSD connectivity must be configured.
With SMC-R, the RNIC interface is dynamically and transparently added and configured.
D.2.6 Hardware configuration definitions
The following HCDs are important:
Function ID
The RoCE adapter is identified by a hexadecimal Function Identifier (FID). The FID is defined in the range 00 - FF for a dedicated adapter and 000 - 0FFF for a shared adapter in the HCD or Hardware Management Console (HMC) to create the I/O configuration program (IOCP) input.
An FID can be configured to only one LPAR at a time, but it is reconfigurable. The RoCE adapter, as installed in a specific PCIe+ I/O drawer and slot, is used for the defined function. The physical installation (drawer and slot) determines the physical channel identifier (PCHID). Only one FID can be defined for dedicated mode.
Virtual Function ID
Virtual Function ID is defined when PCIe hardware is shared between LPARs. Virtual Function ID has a decimal Virtual Function Identifier (VF=) in the range of 1 - nn, where nn is the maximum number of partitions that the PCIe adapter supports. For example, on z14 ZR1 the RoCE Express2 adapter supports up to 62 partitions, and a zEDC Express adapter supports up to 15.
Physical network (PNet) ID
As one parameter for the FUNCTION statement, the PNet ID is a client-defined value for logically grouping OSD interfaces and RNIC adapters based on physical connectivity. The PNet ID values are defined for OSA and RNIC interfaces in the HCD.
A PNet ID is defined for each physical port. z/OS Communications Server receives the information during the activation of the interfaces and associates the OSD interfaces with the RNIC interfaces that include matching PNet ID values.
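The following fragment is a minimal IOCP sketch of this grouping. The CHPID and PCHID numbers, FID, VF, partition names, and PNet ID value are hypothetical, and the FUNCTION TYPE value that is shown is an assumption for the 10GbE RoCE Express feature; verify the exact keywords against the IOCP User's Guide for your system level.
* OSD CHPID and RoCE function grouped by the same client-defined physical network (PNETA)
CHPID PATH=(CSS(0),D0),SHARED,PARTITION=((LP1,LP2)),PCHID=11C,TYPE=OSD,PNETID=PNETA
FUNCTION FID=0018,VF=1,PCHID=15C,PART=((LP1),(LP1,LP2)),PNETID=(PNETA),TYPE=ROCE
Because both definitions carry the same PNet ID, z/OS Communications Server associates the OSD interface with the RNIC interface when the interfaces are activated.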
 
Attention: Activation fails if you do not configure a PNet ID for the RNIC adapter. Activation succeeds if you do not configure a PNet ID for the OSA adapter; however, the interface is not eligible to use SMC-R.
D.2.7 Software use of SMC-R
SMC-R is implemented over RoCE and communicates memory-to-memory, which avoids TCP/IP processing in the data path, reduces network latency, and improves wall clock time. It focuses on “time to value” and widespread performance benefits for all TCP socket-based middleware.
The following advantages are gained, as shown in Figure D-7 on page 434:
No middleware or application changes (transparent)
Ease of deployment (no IP topology changes)
LPAR-to-LPAR communications on a single central processing complex (CPC)
Server-to-server communications in a multi-CPC environment
Retained key qualities of service that TCP/IP offers for enterprise class server deployments (high availability, load balancing, and an IP security-based framework)
Figure D-7 Reduced latency and improved wall clock time with SMC-R
D.2.8 SMC-R support overview
SMC-R needs hardware and software support, as described in this section.
Hardware requirements
SMC-R requires the following hardware:
PCIe-based RoCE Express2:
 – z14 systems
 – Dual port 25GbE or 10GbE adapter
PCIe-based RoCE Express:
 – z14, z13, z13s, zEC12, and zBC12
 – Dual port 10 GbE adapter:
 • z14 ZR1 maximum 4 RoCE Express2/Express adapters per CPC
 • z14 M0x maximum 8 RoCE Express2/Express adapters per CPC
HCD and input/output configuration data set (IOCDS): PCIe FID, VF (sharing), and RoCE configuration with PNet ID.
Optional: Standard switch (CEE-enabled switch is not required).
Required queued direct input/output (QDIO) Mode OSA connectivity between z/OS LPARs, as shown in Figure D-5 on page 431.
The adapter must be dedicated to an LPAR on a zEC12 or zBC12. It must be defined in shared mode (shareable among one or more LPARs) on z14, z13, or z13s systems.
SMC-R cannot be used in IEDN.
Software requirements
SMC-R requires the following software:
z/OS V2R1 (with PTFs) or later is the only supported operating system for the SMC-R protocol. You cannot roll back to previous z/OS releases.
z/OS guests under z/VM 6.4 or later are supported to use RoCE adapters.
IBM is working with its Linux distribution partners to include support in future Linux on Z distribution releases.
Other RoCE considerations
RoCE includes the following considerations:
RoCE system limits:
 – 62 unique VLANs per PCHID physical port for z14 ZR1
 – Each VF is ensured a minimum of 2 VLANs and a maximum of 16
z/OS CS consumption of RoCE virtual resources:
 – One VF used per TCP stack (per PFID / port)
 – One virtual Media Access Control (VMAC) per VF (z/OS uses PF generated VMAC)
 – One VLAN ID (up to 16) per OSA VLAN (“inherited” as TCP connections occur)
z/OS Communications Server Migration considerations:
 – RoCE HCD (IOCDS) configuration changes are required
 – z/OS RoCE users might be required to make a TCP/IP configuration change so that the TCP/IP profiles (PFIDs) are compatible with shared RoCE
Changes are required for RoCE users in the following cases (see the sample profile sketch after this list):
 – z/OS users who use multiple TCP/IP stacks where both stacks currently use the same RoCE adapter (a single z/OS image sharing a physical adapter among multiple stacks).
 – z/OS users who need to use both physical RoCE ports from the same z/OS instance (not a best practice, but allowed).
 – z/OS users who do not continue to use (coordinate) the same PFID values when multiple PFIDs and VFs are added to the same adapter for more shared users; that is, who do not keep the PFID value that was used in the dedicated environment for a specific z/OS instance.
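As a hedged illustration of the shared-adapter case, the following TCP/IP profile fragments show two stacks that each code their own PFID. The stack names, PFID values, and port number are hypothetical; each PFID maps to a separate FID/VF that is defined in HCD (IOCDS) against the same PCHID.
; Profile for stack TCPIPA: PFID 0018 (one VF on the shared RoCE PCHID)
GLOBALCONFIG SMCR PFID 0018 PORTNUM 1
; Profile for stack TCPIPB: PFID 0019 (a second VF on the same PCHID)
GLOBALCONFIG SMCR PFID 0019 PORTNUM 1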
D.2.9 SMC-R use cases for z/OS to z/OS
SMC-R with RoCE provides high-speed communications and “HiperSockets-like” performance across physical processors. It can help all TCP-based communications across z/OS LPARs that are in different CPCs.
The following typical communications patterns are used:
Optimized Sysplex Distributor intra-sysplex load balancing
WebSphere Application Server type 4 connections to remote Db2, IMS, and CICS instances
IBM Cognos® to Db2 connectivity
CICS to CICS connectivity through Internet Protocol interconnectivity (IPIC)
Optimized Sysplex Distributor intra-sysplex load balancing
Dynamic virtual IP address (VIPA) and Sysplex Distributor support are often deployed for high availability (HA), scalability, and so on, in the sysplex environment.
When the clients and servers are all in the same sysplex, SMC-R offers a significant performance advantage. Traffic between client and server can flow directly between the two servers without traversing the Sysplex Distributor node for every inbound packet, which is the current model with TCP/IP. In the new model, only connection establishment flows must go through the Sysplex Distributor node.
Sysplex Distributor before RoCE
A traditional Sysplex Distributor is shown in Figure D-8.
Figure D-8 Sysplex Distributor before RoCE
The traditional Sysplex Distributor features the following characteristics:
All traffic from the client to the target application goes through the Sysplex Distributor TCP/IP stack.
All traffic from the target application goes directly back to the client by using the TCP/IP routing table on the target TCP/IP stack.
Sysplex Distributor after RoCE
A RoCE Sysplex Distributor is shown in Figure D-9. Consider the following points:
The initial connection request goes through the Sysplex Distributor stack.
The session then flows directly between the client and the target over the RoCE adapters.
Figure D-9 Sysplex Distributor after RoCE
 
Note: As with all RoCE Communications, the session end also flows over OSAs.
D.2.10 Enabling SMC-R support in z/OS Communications Server
The following checklist provides a task summary for enabling SMC-R support in z/OS Communications Server. This list assumes that you start with an IP configuration for LAN access that uses OSD:
o HCD definitions (install and configure RNICs in the HCD):
o Add the PNet ID for the current OSD.
o Define PFIDs for RoCE (with the same PNet ID).
o Specify the GLOBALCONFIG SMCR parameter (TCP/IP Profile), as shown in the sample profile fragment after this checklist:
o Must specify at least one PCIe Function ID (PFID):
o A PFID represents a specific RNIC adapter.
o A maximum of 16 PFID values can be coded.
o Up to eight TCP/IP stacks can share a RoCE PCHID (RoCE adapter) in a specific LPAR (each stack must define a unique FID value).
o Start the IPAQENET or IPAQENET6 INTERFACE with CHPIDTYPE OSD:
o SMC-R is enabled, by default, for these interface types.
o SMC-R is not supported on any other interface types.
 
Note: The IPv4 INTERFACE statement (IPAQENET) must also specify an IP subnet mask.
o Repeat in each host (at least two hosts).
Start the TCP/IP traffic and monitor it with Netstat and IBM VTAM displays.
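The following fragment is a minimal sketch of the TCP/IP profile statements that implement the GLOBALCONFIG and INTERFACE steps of this checklist. The PFID values, interface name, port name, and IP address are hypothetical.
; Enable SMC-R with two hypothetical RNIC adapters (PFIDs 0018 and 0019)
GLOBALCONFIG SMCR PFID 0018 PORTNUM 1 PFID 0019 PORTNUM 1
; IPv4 OSD interface: SMC-R is enabled by default for IPAQENET with CHPIDTYPE OSD.
; The subnet mask (/24 here) must be specified for the interface to be SMC-R eligible.
INTERFACE OSDINT1 DEFINE IPAQENET
  CHPIDTYPE OSD
  PORTNAME OSAPORT1
  IPADDR 192.168.10.1/24
START OSDINT1
The same PNet ID must be configured in the HCD for the OSD CHPID and for the RoCE PFIDs, as described in D.2.6, “Hardware configuration definitions”.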
 
Note: For RoCE Express2, the PCI Function IDs (PFIDs) are now associated with a specific (single) physical port (that is, port 0 or port 1). The port number is now configured with the FID number in HCD (or IOCDS) and the port number must be configured (no default is available). z/OS CommServer does not learn the RoCE generation until activation. During activation, CommServer learns the port number for RoCE Express2.
Consider the following points:
The port number for RoCE Express is configured in z/OS TCP/IP profile and does not change.
When defining a FID in the TCP/IP profile for RoCE Express2, the port number is no longer applicable (it is ignored for RoCE Express2).
A warning message is issued if the port number in the TCP/IP profile does not match the HCD-configured value (the profile value is incorrect and is ignored).
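The following single-line sketch illustrates how the port number is bound to the FID for RoCE Express2 in the I/O definition. The FID, VF, PCHID, partition names, and PNet ID are hypothetical, and the TYPE value (shown as ROC2) and PORT keyword are assumptions that must be verified against the IOCP User's Guide for your IOCP level.
* RoCE Express2: the physical port (PORT=1) is configured together with the FID in HCD/IOCDS
FUNCTION FID=0030,VF=1,PCHID=15C,PORT=1,PART=((LP1),(LP1,LP2)),PNETID=(PNETA),TYPE=ROC2
With such a definition, any port number that is coded for the PFID in the TCP/IP profile is ignored for RoCE Express2, as described above.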
D.3 Shared Memory Communications - Direct Memory Access
This section describes the new SMC-D functions that are implemented in IBM z14, z13, and z13s (Driver Level 27) systems.
D.3.1 Concepts
The collocation of multiple tiers of a workload onto a single IBM Z physical system allows for the use of HiperSockets, which is an internal LAN technology that provides low-latency communication between virtual machines within a physical IBM Z CPC.
HiperSockets is implemented fully within IBM Z firmware; therefore, it requires no physical cabling or external network connection to purchase, maintain, or replace. The lack of external components also provides for a secure and low-latency network connection because data transfer occurs much like a cross-address-space memory move.
SMC-D maintains the socket-API transparency aspect of SMC-R so that applications that use TCP/IP communications can benefit immediately without requiring any application software or IP topology changes. SMC-D completes the overall Shared Memory Communications solution, which provides synergy with SMC-R. Both protocols use shared memory architectural concepts, which eliminate TCP/IP processing in the data path yet preserve TCP/IP qualities of service for connection management purposes.
From an operations standpoint, SMC-D is similar to SMC-R. The objective is to provide consistent operations and management tasks for SMC-D and SMC-R. SMC-D uses a new virtual PCI adapter that is called Internal Shared Memory (ISM). The ISM Interfaces are associated with IP interfaces; for example, HiperSockets or OSA (ISM interfaces do not exist without an IP interface).
ISM interfaces are not defined in software. Instead, ISM interfaces are dynamically defined and created, and automatically started and stopped. You do not need to operate (Start or Stop) ISM interfaces. Unlike RoCE, ISM FIDs (PFIDs) are not defined in software. Instead, they are auto-discovered based on their PNet ID.
SMC-R uses RDMA (RoCE), which is based on Queue Pair (QP) technology. Consider the following points:
RC-QPs represent SMC Links (logical point-to-point connection).
RC-QPs over unique RNICs are logically bound together to form Link Groups (used for HA and load balancing).
Link Groups (LGs) and Links are provided in many Netstat displays (for operational and various network management tasks).
SMC-D over ISM does not use QPs. Consider the following points:
Links and LGs based on QPs (or other hardware constructs) are not applicable to ISM. Therefore, the SMC-D information in the Netstat command displays is related to ISM link information rather than LGs.
The SMC-D protocol (like SMC-R) features the design concept of a “logical point-to-point connection” and preserves the concept of an SMC-D Link (for various reasons that include network administrative purposes).
 
Note: The SMC-D information in the Netstat command displays is related to ISM link information (not LGs).
D.3.2 Internal Shared Memory technology overview
ISM is a function that is supported by the z14, z13, and z13s systems. It is the firmware that provides the connectivity for shared memory access between multiple operating systems within the same CPC. It provides the same functionality as SMC-R, but uses virtual ISM devices instead of physical adapters (such as the RoCE adapter). It is a HiperSockets-like function that provides guest-to-guest communications within the same machine. A possible solution that uses only SMC-D is shown in Figure D-10.
Figure D-10 Connecting two LPARs on the same CPC by using SMC-D
SMC-D and SMC-R technologies can be used at the same time on the same CPCs. A fully configured three-tier solution that uses SMC-D and SMC-R is shown in Figure D-11.
Figure D-11 Clustered systems: Multitier application solution with RDMA and DMA
D.3.3 SMC-D over Internal Shared Memory
ISM is a virtual channel that is similar to IQD for HiperSockets. A virtual adapter is created in each OS. The memory is logically shared by using the SMC protocol. The network is firmware-provided and a new device is required to manage that virtual function. SMC is based on a TCP/IP connection and preserves the entire network infrastructure.
SMC-D is a protocol that allows TCP socket applications to transparently use ISM. It is a “hybrid” solution, as shown in Figure D-12 on page 440.
Figure D-12 Dynamic transition from TCP to SMC-D by using two OSA-Express adapters
Consider the following points:
It uses a TCP connection to establish the SMC-D connection.
The TCP connection can be through the OSA adapter or IQD HiperSockets.
A TCP option (SMCD) controls switching from TCP to “out-of-band” SMC-D.
The SMC-D information is exchanged within the TCP data stream.
Socket application data is exchanged through ISM (write operations).
The TCP connection remains to control the SMC-D connection.
This model preserves many critical operational and network management features of TCP/IP.
The hybrid model of SMC-D uses the following key attributes:
It follows the standard TCP/IP connection setup.
The hybrid model switches to ISM (SMC-D) dynamically.
The TCP connection remains active (idle) and is used to control the SMC-D connection.
The hybrid model preserves the following critical operational and network management TCP/IP features:
 – Minimal (or zero) IP topology changes
 – Compatibility with TCP connection-level load balancers
 – Preservation of the IP security model, such as IP filters, policies, VLANs, and SSL
 – Minimal network administration and management changes
Host application software is not required to change; therefore, all host application workloads can benefit immediately.
D.3.4 Internal Shared Memory introduction
The IBM z14, IBM z13 (Driver 27), and z13s systems support the ISM virtual PCI function. ISM is a virtual PCI network adapter that enables direct access to shared virtual memory that provides a highly optimized network interconnect for IBM Z intra-CPC communications.
ISM introduces a new static virtual channel identifier (VCHID) Type. The VCHID is referenced in IOCDS/HCD. The ISM VCHID concepts are similar to the IQD (HiperSockets) type of virtual adapters. ISM is based on IBM Z PCIe architecture (that is, virtual PCI function or adapter). It introduces a new PCI Function Group and type (ISM virtual PCI). A new virtual adapter is scheduled for release.
The system administrator, configuration, and operations tasks follow the same process (HCD/IOCDS) as PCI functions, such as RoCE Express and zEDC Express. ISM supports dynamic I/O.
ISM Provides adapter virtualization (Virtual Functions) with high scalability. Consider the following points:
It supports up to 32 ISM VCHIDs per CPC (z14, z13, or z13s servers); each VCHID represents a unique internal shared memory network, each with a unique Physical Network ID.
Each VCHID supports up to 255 VFs (the maximum is 8 K VFs per CPC), which provides significant scalability.
 
Note: No concept of a PCI Physical Function is available to provide virtualization. No concept of MACs, MTU, or Frame size is available.
Each ISM VCHID represents a unique and isolated internal network, each having a unique Physical Network ID (PNet IDs are configured in HCD/IOCDS).
ISM VCHIDs support VLANs; therefore, subdividing a VCHID by using virtual LANs is supported.
ISM provides a Global Identifier (GID) that is internally generated to correspond with each ISM FID.
ISM is supported by z/VM in pass-through mode (PTF required).
D.3.5 Virtual PCI Function (vPCI Adapter)
Virtual Function ID is defined when PCIe hardware is shared between LPARs. Virtual Function ID includes a decimal Virtual Function Identifier (VF=) in the range 1 - nn, where nn is the maximum number of partitions that the PCIe adapter supports. For example, the SMC-D ISM supports up to 32 partitions, and a zEDC Express adapter supports up to 15.
The following basic infrastructure is available:
zPCI architecture
RoCE, zEDC, ISM
zPCI layer in z/OS and Linux for z systems
vPCI for SD queues
The basic concept of the vPCI adapter is shown in Figure D-13.
Figure D-13 Concept of vPCI adapter implementation
 
Note: The following basic z/VM support is available:
Generic zPCI pass-through support starting from z/VM 6.3
The use of the zPCI architecture remains unchanged
An SMC-D configuration in which Ethernet provides the connectivity is shown in Figure D-14.
Figure D-14 SMC-D configuration that uses Ethernet to provide connectivity
An SMC-D configuration in which HiperSockets provide the connectivity is shown in Figure D-15.
Figure D-15 SMC-D configuration that uses HiperSockets to provide connectivity
D.3.6 Planning considerations
In the z/OS SMC-D implementation, z/OS uses a single VF per ISM PNet. This configuration is true for a single VLAN or for multiple VLANs per PNet. The number of VLANs that are defined for a specific PNet does not affect the number of VFs required.
z/OS Communications Server requires one ISM FID per ISM PNet ID per TCP/IP stack. This requirement is not affected by the version of the IP (that is, it is true even if both IPv4 and IPv6 are used).
z/OS might use more ISM FIDs for the following reasons:
IBM supports up to eight TCP/IP stacks per z/OS LPAR. SMC-D can use up to eight FIDs or VFs (one per TCP/IP stack).
IBM supports up to 32 ISM PNet IDs per CPC. Each TCP/IP stack can have access to every PNet ID, using up to 32 FIDs (one VF per PNet ID).
D.3.7 Hardware configuration definitions
Complete the following steps to use HCDs:
1. Configure ISM vPCI Functions (HCD/HCM).
2. Define PNet IDs (OSA, HiperSockets [IQD], and ISM) in HCD/HCM.
3. Activate the definition by using HCD.
4. Enable SMC-D in at least two z/OS instances, which requires a single parameter in the TCP/IP global configuration. Both z/OS instances must run on the same CPC.
5. Review and adjust as needed the available real memory and fixed memory usage limits (z/OS and CS). SMC requires fixed memory. You might need to review the limits and provision more real memory for z/OS.
6. Review the IP topology, VLAN usage considerations, and IP security. For more information, see the IBM z/OS Shared Memory Communications: Security Considerations white paper.
7. Run Shared Memory Communications Applicability Tool (SMC-AT) to evaluate applicability and potential value.
8. Review changes to messages, monitoring information, and diagnostic tools. Similar to SMC-R, many updates are made to the following items:
 – Messages (VTAM and TCP stack)
 – Netstat (status, monitoring, and display information)
 – CS diagnostic tools (VIT, Packet trace, CTRACE, and IPCS formatted dumps)
 
Note: No application changes are required (the function is transparent to socket applications). Also, no operational changes are required (for example, starting or stopping devices).
ISM Functions must be associated with another channel (CHID) of one of the following types:
IQD (a single IQD HiperSockets) channel
OSD channels
 
Note: A single ISM VCHID cannot be associated with both IQD and OSD.
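A minimal IOCP sketch of this association follows. The CHPID number, partition names, FID, VCHID, VF, and PNet ID are hypothetical, and keyword details can vary by IOCP level; the IQD channel and the ISM function are linked only by carrying the same PNet ID.
* HiperSockets (IQD) channel and ISM function that share the physical network ID PNETA
CHPID PATH=(CSS(0),FB),SHARED,PARTITION=((LP1,LP2)),TYPE=IQD,PNETID=PNETA
FUNCTION FID=1017,VCHID=7E1,VF=1,PART=((LP1),(LP1,LP2)),PNETID=(PNETA),TYPE=ISM
An OSD CHPID with the same PNet ID can be used instead of the IQD channel, but not both, as the Note above states.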
D.3.8 Sample IOCP FUNCTION statements
The IOCP FUNCTION statements (see Example D-1) describe the configuration that defines ISM adapters that are shared between LPARs on the same CPC, as shown in Figure D-16.
Example D-1 IOCP FUNCTION statements
FUNCTION FID=1017,VCHID=7E1,VF=1,PART=((LP1),(LP1,LP2)),PNETID=(PNET1),TYPE=ISM
FUNCTION FID=1018,VCHID=7E1,VF=2,PART=((LP2),(LP1,LP2)),PNETID=(PNET1),TYPE=ISM
Figure D-16 ISM adapters that are shared between LPARs
 
Note: On the IOCDS statement, the VCHID is defined as 7E1. As shown in Figure D-16, the ISM network “PNET 1” is referenced by the IOCDS VCHID statement. ISM (as with IQD) does not use physical adapters or adapter slots (PCHID). Instead, only logical (firmware) instances that are defined as VCHIDs in IOCDS are used.
A sample IOCP FUNCTION configuration (see Example D-2) that defines ISM adapters that are shared between LPARs and multiple VLANs on the same CPC is shown in Figure D-17 on page 446.
Example D-2 Sample IOCP Function
FUNCTION FID=1017,VCHID=7E1,VF=1,PART=((LPAR3),(LPAR3,LPAR4)),PNETID=(PNETA),TYPE=ISM
FUNCTION FID=1018,VCHID=7E1,VF=2,PART=((LPAR4),(LPAR3,LPAR4)),PNETID=(PNETA),TYPE=ISM
FUNCTION FID=1019,VCHID=7E1,VF=3,PART=((LPAR5),(LPAR4,LPAR5)),PNETID=(PNETA),TYPE=ISM
FUNCTION FID=1020,VCHID=7E1,VF=4,PART=((LPAR6),(LPAR5,LPAR6)),PNETID=(PNETA),TYPE=ISM
FUNCTION FID=1021,VCHID=7E1,VF=5,PART=((LPARn),(LPAR6,LPARn)),PNETID=(PNETA),TYPE=ISM
FUNCTION FID=1022,VCHID=7E2,VF=1,PART=((LPAR1),(LPAR1,LPAR2)),PNETID=(PNETB),TYPE=ISM
FUNCTION FID=1023,VCHID=7E2,VF=2,PART=((LPAR2),(LPAR1,LPAR2)),PNETID=(PNETB),TYPE=ISM
FUNCTION FID=1024,VCHID=7E2,VF=3,PART=((LPARn),(LPAR1,LPARn)),PNETID=(PNETB),TYPE=ISM
 
Figure D-17 Multiple LPARs connected through multiple VLANs
Workloads can be logically isolated on separate ISM VCHIDs. Alternatively, workloads can be isolated by using VLANs. The ISM VLAN definitions are inherited from the associated IP network (OSA or HiperSockets).
Configuration considerations
The IOCDS (HCD) definitions for ISM PCI VFs are not directly related to the software (SMC-D) use of ISM (that is, the z/OS TCP/IP and SMC-D implementation and usage are not directly related to the I/O definition).
The user defines a list of ISM FIDs (VFs) in IOCDS (HCD), and z/OS dynamically selects an eligible FID based on the required PNet ID. FIDs or VFs are not defined in Communications Server for z/OS TCP/IP. Instead, z/OS selects an available FID for a specific PNET. Access to more VLANs does not require configuring extra VFs.
 
Note: Consider over-provisioning the I/O definitions; for example, consider defining eight FIDs instead of five.
For native PCI devices, FIDs must be defined. Each FID in turn also defines a corresponding VF. In terms of operating system administration tasks, the administrator typically references FIDs. VFs (and VF numbers) often are transparent.
D.3.9 Software use of ISM
ISM enables SMC-D, which provides SMC capability within the CPC (SMC without requiring RoCE hardware or network equipment). Host virtual memory is managed by each OS (similar to SMC-R, logically shared memory) following IBM Z PCI I/O translation architecture.
Only minor changes are required for z/VM guests. An OS can be enabled for SMC-R and SMC-D. SMC-D is used when both peers are within the same CPC (and ISM PNet and VLAN). After the ISM HCD configuration is complete, SMC-D can be enabled in z/OS with a single TCP/IP parameter (GLOBALCONFIG SMCD). ISM FIDs must be associated with an IP network. The association is accomplished by matching PNet IDs (for example, HiperSockets and ISM).
 
Note: ISM FIDs must be associated with HiperSockets or with an OSA adapter by using a PNet ID. It cannot be associated to both.
D.3.10 SMC-D over ISM prerequisites
SMC-D over ISM features the following prerequisites:
IBM z14, z13s, or z13 (Driver 27): HMC/SE for ISM vPCI Functions.
At least two z/OS V2.2 systems (or later) in two LPARs on the same CPC with required service installed:
 – SMC-D can communicate only with another z/OS V2.2 (or later) instance, and peer hosts must be on the same CPC and ISM PNet.
 – SMC-D requires an IP Network with access through OSA or HiperSockets that includes a defined PNet ID that matches the ISM PNet ID.
If running as a z/OS guest under z/VM, z/VM 6.3 with APAR VM65716 (including related APARs) is required for guest access to RoCE (Guest Exploitation only).
Linux support is planned for a future deliverable.
The required APARs per z/OS subsystem are listed in Table D-1.
Table D-1 Prerequisite APARs for SMC-D enablement

Subsystem                          FMID                                          APAR
IOS                                HBB77A0                                       OA47913
Communications Server SNA VTAM     HVT6220                                       OA48411
Communications Server IP           HIP6220                                       PI45028
HCD                                HCS77A0, HCS7790, HCS7780, HCS7770,           OA46010
                                   HCS7760, HCS7750
IOCP                               HIO1104                                       OA47938
HCM                                HCM1F10, HCM1E10, HCM1D10, HCM1C10,           IO23612
                                   HCM1B10, HCM1A10
 
Restrictions: SMC (existing architecture) cannot be used in the following circumstances:
Peer hosts are not within the same IP subnet and VLAN
TCP traffic requires IPSec or the server uses FRCA
D.3.11 Enabling SMC-D support in z/OS Communications Server
The new parameter SMCD (see Figure D-18) is available on the GLOBALCONFIG statement in the TCP/IP profile of the z/OS Communications Server (similar to the SMCR parameter). The SMCD parameter is the only parameter that is required to enable SMC-D.
Figure D-18 SMCD parameter in GLOBALCONFIG
The key difference from the SMCR parameter is that ISM PFIDs are not defined in TCP/IP. Rather, ISM FIDs are discovered automatically based on the matching PNETID that is associated with the OSD or HiperSockets interface. An extract from z/OS Communications Server: IP Configuration Reference is shown in Figure D-18.
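A minimal sketch of the resulting profile change follows; unlike the SMCR examples earlier in this appendix, no PFIDs are coded.
; Enable SMC-D: no PFIDs are coded because ISM FIDs are auto-discovered by PNet ID
GLOBALCONFIG SMCD
The associated OSD or HiperSockets interface needs no SMC-specific definition; an eligible ISM FID is selected dynamically when the PNet IDs match.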
D.3.12 SMC-D support overview
SMC-D requires IBM z14, or IBM z13 and IBM z13s servers at driver level 27 or later for ISM support.
 
IOCP required level: The required level of IOCP for z14 is V5 R4 L1 or later with PTFs. Defining ISM devices on systems other than z14, z13, or z13s is not possible. For more information, see the following publications:
IBM Z Stand-Alone Input/Output Configuration Program User's Guide, SB10-7166
IBM Z Input/Output Configuration Program User's Guide for ICP IOCP, SB10-7163
SMC-D requires the following software:
z/OS V2R2 with PTFs (see Table D-1 on page 447) or later is the only supported operating system for the SMC-D protocol. Consider the following points:
 – HCD APAR (OA46010) is required.
 – You cannot roll back to previous z/OS releases.
z/OS guests under z/VM 6.3 and later are supported to use SMC-D.
At the time of this writing, IBM is working with its Linux distribution partners to include support in future Linux on Z distribution releases.
Other ISM considerations
ISM systems include the following limits:
A total of 32 ISM VCHIDs (in IOCDS/HCD) per CPC. Each IOCDS/HCD VCHID represents a unique internal shared memory network each with a unique Physical Network ID.
A total of 255 VFs per VCHID (8k VFs per CPC). For example, the maximum number of virtual servers that can communicate over the same ISM VCHID is 255.
Each ISM VCHID in IOCDS/HCD represents a unique (isolated) internal network, each having a unique Physical Network ID (PNet IDs are configured in HCD/IOCDS).
ISM VCHIDs support VLANs (can be subdivided into VLANs).
ISM provides a GID (internally generated) to correspond with each ISM FID.
MACs (VMACs), MTU, physical ports, and frame sizes are not applicable.
ISM is supported by z/VM (for pass-through guest access to support the new PCI function).
More information
For more information about a configuration example for SMC-D, see IBM z/OS V2R2 Communications Server TCP/IP Implementation - Volume 1, SG24-8360.
 