Infrastructure planning for availability and GDPS
In this chapter, we discuss several technologies that are available to help you achieve your goals related to IT resilience, recovery time, and recovery point objectives. To understand how the IBM GDPS offerings described in this book can help you, it is important to have at least conceptual knowledge of the functions, capabilities, and limitations of these underlying technologies.
2.1 Parallel Sysplex overview
As discussed in Chapter 1, “Introduction to business resilience and the role of GDPS” on page 1, IT resilience covers more than just recovery from a disaster. It also encompasses ensuring high availability on a day-to-day basis and protecting your applications from both planned and unplanned outages. You cannot expect to be able to provide continuous or near-continuous application availability across a disaster if you are unable to provide it in normal operations.
Parallel Sysplex is the primary mechanism that is used by IBM to provide the highest levels of application availability on the IBM Z platform. The logical first step in a business resiliency project is to do all you can to deliver the highest levels of service from your existing configuration. Implementing Parallel Sysplex with data sharing and dynamic workload routing provides higher levels of availability now. It also provides a foundation to achieve greater resiliency if you implement GDPS.
In the following sections we briefly discuss Parallel Sysplex, the benefits you can derive by using the technology, and the points to consider if you decide to implement GDPS Metro or GDPS Continuous Availability. Because GDPS XRC and GDPS GM do not have a continuous availability (CA) aspect, there are no Parallel Sysplex considerations specifically relating to GDPS XRC and GDPS GM. There are also no Parallel Sysplex considerations for the IBM GDPS Virtual Appliance because the GDPS Virtual Appliance protects only IBM z/VM and Linux on IBM Z platforms.
2.1.1 Maximizing application availability
There is only one way to protect applications from the loss of a single component (such as an IBM CICS region or a z/OS system), and that is to run multiple, failure-isolated copies. This implies an ability to share data at the record level, with integrity, and to dynamically route incoming work requests across the available servers. Parallel Sysplex uses hardware and software components to link individual systems together in a cluster. Because all systems in the sysplex are able to share the same resources and data, they appear as a single image to applications and users, while providing the ability to eliminate single points of failure.
Having more than one instance of an application within the sysplex can shield your users from both planned and unplanned outages. With Parallel Sysplex, parts of the cluster can be brought down for maintenance, upgrades, or any other type of outage, while the applications continue to be available on other members of the sysplex.
GDPS Continuous Availability further extends this concept with the ability to switch the workload between two sysplexes separated by virtually unlimited distance for both planned and unplanned outage situations.
Although it is not necessary to have a Parallel Sysplex before implementing most GDPS solutions, it is important to understand the role that Parallel Sysplex plays in supporting the continuous availability aspect of IT resilience. Technical information about implementing and using Parallel Sysplex is available in other IBM documentation, so it is not covered in this book.
2.1.2 Multisite sysplex considerations
The considerations for a multisite sysplex depend on whether you plan to run production systems in both sites at the same time or if all the production systems will be in a single site at any one time. Configurations where production systems can run in both sites at the same time are referred to as multisite workload configurations. Configurations where the production systems run together in one site or the other (but not split across multiple sites) are referred to as single-site workload configurations or sometimes as Active/Standby configurations. Other variations on this, where production systems are predominantly running at one site but where partially active systems or systems enabled only for queries are running at the other site, are still considered multisite workloads.
 
Terminology: This section is focused on a multisite sysplex, which is a single sysplex spread across multiple (typically two) sites, and how the workload is configured to run in those sites to provide near-continuous availability and metro distance DR.
Do not confuse it with the GDPS Continuous Availability solution that uses some of the same terminology, but is related to multiple sysplexes (limited to two, currently) and how the workload is configured between the two sysplexes, not within any single sysplex.
In a GDPS Continuous Availability environment, it is anticipated that each of the participating sysplexes is in an Active/Active configuration. This configuration provides local and continuous availability with GDPS Metro and GDPS Continuous Availability, which provides a solution for unlimited distance CA/DR. For more information about the GDPS Continuous Availability solution, see Chapter 7, “GDPS Continuous Availability solution” on page 203.
Several phrases are often used to describe variations of multisite workload. Brief definitions are included here for the more commonly implemented variations.
Active/Active This refers to a multisite workload configuration where z/OS systems are actively running in the same sysplex with active subsystems in more than one site at the same time. Typically this term also implies that applications take advantage of data sharing and dynamic workload routing in such a way that applications can freely move from one site to another. Finally, critical Parallel Sysplex resources are duplexed or replicated in such a way that if one site fails, the remaining site can recover workload within minutes after contending locks and communications timeouts clear. When combined with HyperSwap, an Active/Active configuration has the potential to provide near-continuous availability for applications even in the case of a site outage.
Active/Warm This refers to a multisite workload configuration that is similar to the Active/Active configuration, with production systems running at more than one site. The difference is that workload generally runs in one site at a time, with the systems in the other site simply IPLed without subsystems or other resources active.
This configuration is intended to save IPL time when moving workload between sites. It can be most effective for supporting the planned movement of workload because in many unplanned scenarios, the “warm” systems might also not survive.
Active/Query This refers to a multisite workload configuration that is quite close to the Active/Active configuration, but where workload at the second site is partitioned or restricted (possibly to queries only) in such a way as to limit impacts because of serialization, thereby protecting shared resources when delays caused by the distance between the sites are a concern. Again, depending on the configuration of the coupling facility structures (that is, whether they are duplexed across sites or basically in one site at a time), this configuration might provide value only for planned scenarios because in many unplanned scenarios the “query” or “hot standby” subsystems might not survive.
You can devise potentially many more configuration variations, but from a Parallel Sysplex and GDPS perspective, all of these fall into either the single-site or the multisite workload category.
Single-site or multisite workload configuration
When first introduced, Parallel Sysplexes were typically contained within a single site. Extending the distance between the operating system images and the coupling facility has an impact on the response time of requests using that coupling facility (CF). Also, even if the systems sharing the data are spread across more than one site, all of the primary disk subsystems are normally contained in the same site, so a failure affecting the primary disks affects the systems in both sites. As a result, a multisite workload configuration does not, in itself, provide significantly greater availability than a single-site workload configuration during unplanned outages. To achieve the optimal benefit from a multisite workload configuration for planned outages, HyperSwap should be used; this enables you to move applications and their data from one site to the other nondisruptively.
More specifically, be careful when planning a multisite workload configuration if the underlying Parallel Sysplex cannot be configured to spread the important coupling facility structures across the sites and still achieve the required performance. As discussed later in this chapter and illustrated in Table 2-1 on page 46, the Coupling Link technology can support links upwards of 100 km with qualified dense wavelength division multiplexing (DWDM). However, this does not mean that your workload will tolerate even 1 km of distance between the z/OS images and the CF. Individual coupling operations will be delayed by 10 microseconds per kilometer. Although this time can be calculated, there is no safe way to predict the increased queuing effects caused by the increased response times and the degree of sharing that is unique to each environment. In other words, you will need to run your workload with connections at distance to evaluate the tolerance and impacts of distance.
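The per-kilometer delay quoted above can be turned into a rough estimate. The following sketch applies the 10 microseconds/km figure from the text to a hypothetical baseline coupling facility request time; the 20-microsecond baseline is an illustrative assumption, not a measured value, and real queuing effects (as the text notes) cannot be predicted this way.

```python
# Back-of-the-envelope estimate of CF request response-time inflation
# with distance. 10 microseconds/km comes from the text; the baseline
# service time is a hypothetical value for illustration only.

LATENCY_PER_KM_US = 10  # added delay per kilometer of fiber

def cf_response_time_us(baseline_us: float, distance_km: float) -> float:
    """Estimated CF request response time at a given site separation."""
    return baseline_us + LATENCY_PER_KM_US * distance_km

# A hypothetical 20-microsecond request grows quickly with distance:
for km in (0, 1, 10, 50, 100):
    print(f"{km:>3} km -> {cf_response_time_us(20, km):>6.0f} microseconds")
```

Even at 10 km, the hypothetical request takes six times its local duration, which is why workload testing at distance is essential.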
The benefits of a multisite workload come with more complexity. This must be taken into account when weighing the benefits of such configurations.
CF structure duplexing
Two mechanisms exist for duplexing CF structures.
User-Managed Structure Duplexing is supported for use only with DB2 group buffer pool (GBP) structures. Duplexing the GBP structures can significantly reduce the time to recover the structures following a CF or CF connectivity failure. The performance impact of duplexing the GBP structures is small. Therefore, it is best to duplex the GBP structures used by a production DB2 data sharing group.
System-Managed Coupling Facility Structure Duplexing (referred to as SM duplexing) provides a general purpose, hardware-assisted and easy-to-use mechanism for duplexing CF structures. This feature is primarily intended to allow installations to do data sharing without having to have a failure-isolated CF. However, the design of SM duplexing means that having the CFs a significant distance (kilometers) apart can have a dramatic impact on CF response times for the duplexed structures, and thus your applications, and needs careful planning and testing.
In addition to the response time question, there is another consideration relating to the use of cross-site SM Duplexing. Because communication between the CFs is independent of the communication between mirrored disk subsystems, a failure that results in remote copy being suspended would not necessarily result in duplexing being suspended at the same instant. In case of a potential disaster, you want the data in the “remote” CF to be frozen in time at the same instant the “remote” disks are frozen, so you can restart your applications from the moment of failure.
If you are using duplexed structures, it might seem that you are guaranteed to be able to use the duplexed instance of your structures if you must recover and restart your workload with the frozen secondary copy of your disks. However, this is not always the case. There can be rolling disaster scenarios where before, after, or during the freeze event, an interruption occurs (perhaps failure of CF duplexing links) that forces CFRM to drop out of duplexing. There is no guarantee that the structure instance in the surviving site is the one that will be kept. It is possible that CFRM keeps the instance in the site that is about to totally fail. In this case, there will not be an instance of the structure in the site that survives the failure.
Furthermore, during a rolling disaster event, if you freeze secondary disks at a certain point but continue to update the primary disks and the CF structures, then the CF structures, whether duplexed or not, will not be usable if it is necessary to recover on the frozen secondary disks. This depends on some of your installation’s policies.
To summarize, if there is a surviving, accessible instance of application-related structures, it might or might not be consistent with the frozen secondary disks and therefore might or might not be usable. Furthermore, depending on the circumstances of the failure, even with structures duplexed across two sites, you are not 100% guaranteed to have a surviving, accessible instance of the application structures. Therefore, you must have procedures in place to restart your workloads without the structure contents.
For more information, see the white paper titled System-Managed CF Structure Duplexing, GM13-0103.
2.2 Data consistency
In an unplanned outage or disaster situation, the ability to perform a database restart, rather than a database recovery, is essential to meet the recovery time objective (RTO) of many businesses, which is typically less than an hour. Database restart allows starting a database application (as you would following a database manager abend or system abend) without having to restore it from backups. Database recovery is normally a process measured in many hours (especially if you have hundreds or thousands of databases to recover), and it involves restoring the last set of image copies and applying log changes to bring the databases up to the point of failure.
But there is more to consider than simply the data for one data manager. What if you have an application that updates data in IMS, DB2, and VSAM? If you need to perform a recovery for these, will your recovery tools allow you to recover them to the same point in time and to the level of granularity that ensures that either all or none of the updates made by one transaction are recovered? Being able to do a restart rather than a recovery avoids these issues.
Data consistency across all copies of replicated data, spread across any number of storage subsystems, and in some cases across multiple sites, is essential to providing data integrity and the ability to perform a normal database restart if there is a disaster.
2.2.1 Dependent write logic
Database applications commonly ensure the consistency of their data by using dependent write logic, regardless of whether data replication techniques are being used. Dependent write logic states that if I/O B must logically follow I/O A, then B is not started until A completes successfully. This logic is normally included in all software that manages data consistency. There are numerous instances within software subsystems, such as databases, catalog/VTOC, and VSAM file updates, where dependent writes are issued.
As an example, in Figure 2-1, LOG-P is the disk subsystem containing the database management system (DBMS) logs, and DB-P is the disk subsystem containing the DBMS data segments. When the DBMS updates a database, it also performs the following process:
1. Write an entry to the log about the intent of the update.
2. Update the database.
3. Write another entry to the log indicating that the database was updated.
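The three steps above can be sketched as follows. This is a minimal illustration of dependent write logic, not how any particular DBMS implements it; the in-memory list and dictionary simply stand in for the log and data volumes.

```python
# Illustrative sketch of dependent writes: each write is issued only
# after the previous one completes, so the log can never claim an
# update that the database does not hold.

log = []       # stands in for the DBMS log volume
database = {}  # stands in for the DBMS data segments

def update_with_dependent_writes(key, value):
    # 1. Write an intent record to the log; the next write is not
    #    started until this one completes.
    log.append(("INTENT", key, value))
    # 2. Update the database only after the intent record is hardened.
    database[key] = value
    # 3. Record completion, again only after step 2 completes.
    log.append(("DONE", key, value))

update_with_dependent_writes("acct-42", 100)
```

Because the steps are strictly ordered, a failure at any point leaves a state the DBMS can reason about during restart: intent with no update, or intent, update, and completion together.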
If you will be doing a remote copy of these volumes, be sure that all of the updates are mirrored to the secondary disks.
Figure 2-1 Need for data consistency
It is unlikely that all the components in a data center will fail at the same instant, even in the rare case of a full data center outage. The networks might fail first, or possibly one disk subsystem, or any other component in unpredictable combinations. No matter what happens, the remote image of the data must be managed so that cross-volume and subsystem data consistency is preserved during intermittent and staged failures that might occur over many seconds, even minutes. Such a staged failure is generally referred to as a rolling disaster.
Data consistency during a rolling disaster is difficult to achieve for synchronous forms of remote copy because synchronous remote copy is entirely implemented within disk subsystem pairs.
For example, in Figure 2-1 on page 18 the synchronously mirrored data sets are spread across multiple disk subsystems for optimal performance. The volume containing the DBMS log on the LOG-P disk subsystem in Site1 is mirrored to the secondary volume in the LOG-S disk subsystem in Site2, and the volume containing the data segments in the DB-P disk subsystem in Site1 is mirrored to the secondary volume in the DB-S disk subsystem in Site2.
Assume that a disaster is in progress in Site1, causing the link between DB-P and DB-S to be lost before the link between LOG-P and LOG-S is lost. With the link between DB-P and DB-S lost, a write sequence of (1), (2), and (3) might be completed on the primary devices (depending on how the remote copy pair was defined), and the LOG writes (1) and (3) would be mirrored to the LOG-S device, but the DB write (2) would not have been mirrored to DB-S. A subsequent DBMS restart using the secondary copy of data in Site2 would clean up in-flight transactions and resolve in-doubt transactions, but the missing DB write (2) would not be detected. In this example, the missing DB write compromised the integrity of the DBMS.
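The rolling-disaster scenario just described can be modeled in a few lines. This sketch is purely illustrative: the link-failure predicate and write tuples are invented for the example, but the outcome mirrors the text — the secondary holds both LOG writes and no DB write.

```python
# Sketch of a rolling disaster: the DB-P -> DB-S link fails before the
# LOG-P -> LOG-S link, so the secondary site receives log records (1)
# and (3) but not the database update (2).

def mirror(primary_writes, link_up):
    """Return the writes that reach the secondary over a given link."""
    return [w for w in primary_writes if link_up(w)]

# Primary-site dependent write sequence, in order:
writes = [("LOG", "intent"), ("DB", "update"), ("LOG", "done")]

# The DB mirroring link is already down; LOG writes still flow:
db_link_down = lambda w: w[0] != "DB"
secondary = mirror(writes, db_link_down)

# The secondary image is inconsistent: the log says the update
# completed, but the data segment was never written.
print(secondary)  # [('LOG', 'intent'), ('LOG', 'done')]
```

This is exactly the inconsistency that freeze automation exists to prevent: all secondary volumes must be stopped at the same point in time, not one link at a time.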
We discuss data consistency for synchronous remote copy in more detail in “Metro Mirror data consistency” on page 24.
For the two IBM asynchronous remote copy offerings, the consistency of the volumes in the recovery site is ensured because of the way these offerings work. This is described further in 2.4.3, “Global Mirror” on page 32 and “XRC data consistency” on page 28.
For GDPS Continuous Availability, which relies on asynchronous software replication as opposed to the use of Metro Mirror, XRC, or Global Mirror, consistency is managed within the replication software products. For more information, see 2.4.5, “IBM software replication products” on page 36.
2.3 Synchronous versus asynchronous data transfer
Synchronous data transfer and asynchronous data transfer are two methods used to replicate data. Before selecting a data replication technology, you must understand the differences between the methods used and the business impact.
 
Terminology: In this book, we continue to use the term Extended Remote Copy (XRC) when referring to the asynchronous disk copy technology that is managed by the z/OS System Data Mover (SDM). The rebranded name of the IBM disk storage implementation is z/OS Global Mirror, which is used specifically when referring to the IBM implementation on the IBM Enterprise Storage Server and the IBM DS8000 family of products.
When using synchronous data transfer, as shown in Figure 2-2 by using IBM Metro Mirror, the application writes are first written to the primary disk subsystem (1) and then forwarded on to the secondary disk subsystem (2). When the data has been committed to both the primary and secondary disks (3), an acknowledgment that the write is complete (4) is sent to the application. Because the application must wait until it receives the acknowledgment before executing its next task, there will be a slight performance impact. Furthermore, as the distance between the primary and secondary disk subsystems increases, the write I/O response time increases because of signal latency.
The goals of synchronous replication are zero or near-zero loss of data, and quick recovery times from failures that occur at the primary site. Synchronous replication can be costly because it requires high-bandwidth connectivity.
One other characteristic of synchronous replication is that it is an enabler for nondisruptive switching between the two copies of the data, which are known to be identical.
Figure 2-2 Synchronous versus asynchronous storage replication
With software-based asynchronous replication, as used in a GDPS Continuous Availability environment, the process is similar to that described for XRC. Data is captured from the database subsystem logs at the source copy when a transaction commits data to the database. That captured data is then sent asynchronously to a second location where it is applied to the target copy of the database in near real time.
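The capture-and-apply flow described above can be sketched as follows. All structures and names here are illustrative; they are not the actual interfaces of the IBM InfoSphere Data Replication products, only a minimal model of reading committed changes from a source log and applying them to a target copy.

```python
# Illustrative capture/apply loop for software-based asynchronous
# replication: committed units of work are captured from the source
# log and applied to the target copy some time later.
import queue

source_log = [("commit", "T1", {"acct": 100}),
              ("commit", "T2", {"acct": 250})]
in_flight = queue.Queue()
target_db = {}

def capture():
    """Read committed units of work from the source log."""
    for kind, txn, changes in source_log:
        if kind == "commit":          # only committed data is sent
            in_flight.put((txn, changes))

def apply_changes():
    """Apply captured units of work to the target copy, in order."""
    while not in_flight.empty():
        txn, changes = in_flight.get()
        target_db.update(changes)

capture()
apply_changes()
print(target_db)  # target converges on the source: {'acct': 250}
```

Because the apply step runs after the commit at the source, the target lags by some interval; that lag is the data exposure (RPO) of asynchronous software replication.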
Figure 2-3 Business impact analysis
Many enterprises have both business and regulatory requirements to provide near-continuous data availability, without loss of transactional data, while protecting critical business data if there is a wide-scale disruption. This can be achieved by implementing three-copy (sometimes referred to as 3-site) mirroring solutions that use both synchronous and asynchronous replication technologies. Synchronous solutions are used to protect against the day-to-day disruptions with no loss of transactional data. Asynchronous replication is used to provide out-of-region data protection, with some loss of committed data, for wide-spread disruptions. The key is to ensure cross-disk subsystem data integrity and data consistency is maintained through any type of disruption.
2.4 Data replication technologies
The two primary ways to make your data available following a disaster are as follows:
By using a form of tape-based backup
By using data replication to a recovery site (also known as remote copy)
This can be hardware-based or software-based replication.
For companies with an RTO of a small number of hours or less, a tape-based solution is unlikely to be acceptable, because it is simply not possible to restore all your volumes and apply all database logs in the time available. Therefore, we are assuming that if you are reading this book you already have, or are planning to implement, some form of data replication technology.
Remotely copying your data eliminates the time that would be required to restore the data from tape and addresses the problem of having to recover data that is generated between the last backup of an application system and the time when the application system fails. Depending on the technology used, remote copy implementations provide a real-time (or near real-time) continuing copy of data between a source and a target.
IBM offers three basic technologies to provide this type of mirroring for disk storage:
Metro Mirror: Updates to the primary volumes are synchronously mirrored to the remote volumes and all interactions related to this activity are done between the disk subsystems. Multi-target Metro Mirror (MTMM) is based on Metro Mirror and allows multiple secondary copies from the same primary.
XRC: The task of retrieving the updates from the primary disk subsystem and applying those changes to the secondary volumes is done by a z/OS component named the System Data Mover (SDM).
Global Mirror: This offering mirrors the data asynchronously. However, unlike XRC, all interactions are done between the disk subsystems rather than by an SDM.
These technologies are described more fully in the following sections. For an even more detailed explanation of the remote copy technologies described in the following sections, see IBM System Storage DS8000: Copy Services for IBM System z, SG24-6787.
IBM also offers several software-based replication products. Unlike the technologies listed for mirroring disk storage (which are application independent), most software replication products are specific to the database source and target in use. The following products are currently supported in a GDPS Continuous Availability environment:
IBM InfoSphere Data Replication for IMS for z/OS
IBM InfoSphere Data Replication for VSAM for z/OS
IBM InfoSphere Data Replication for DB2 for z/OS
These products are introduced in the following sections. For more information, see IBM Documentation.
2.4.1 Metro Mirror
Metro Mirror ensures that, after the volume pair has been established and for as long as it remains synchronized, the secondary volume always contains exactly the same data as the primary. The IBM implementation of Metro Mirror provides synchronous data mirroring at distances up to 300 km (and potentially even greater distances, after technical review and approval).
 
Important: Always use caution when considering long distances. When we say that something is “supported up to xx km,” it means that the technology will work at that distance if you have qualified cross-site connectivity technology that supports that protocol. See 2.9, “Cross-site connectivity considerations” on page 44 for more details.
You must also consider the impact the increased response time will have on your applications. Some applications can tolerate the response time increase associated with cross-site distances of 100 km, but the same distance in another installation might make it impossible for the applications to deliver acceptable levels of performance.
So, carefully evaluate the projected response time impact, and apply that increase to your environment to see if the result is acceptable. Your vendor storage specialist can help you determine the disk response time impact of the proposed configuration.
Recovery point objective with Metro Mirror
If you have a recovery point objective of zero, meaning zero data loss, Metro Mirror is the only IBM remote copy option that can achieve that objective.
That is not to say that you will always have zero data loss if using Metro Mirror. Zero data loss means that there will never be updates made to the primary disks that are not mirrored to the secondaries. The only way to ensure zero data loss is to immediately stop all update activity to the primary disks if the remote copy relationship ceases to exist (if you lose connectivity between the primary and secondary devices, for example).
Thus, choosing to have zero data loss really means that you must have automation in place that will stop all update activity in the appropriate circumstances. It also means that you accept the possibility that the systems can be stopped for a reason other than a real disaster; for example, if the failure was caused by a broken remote copy link rather than a fire in the computer room. Completely avoiding single points of failure in your remote copy configuration, however, can reduce the likelihood of such events to an acceptably low level.
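The policy trade-off described above can be expressed as a small decision sketch. The policy names follow common GDPS usage (freeze-and-stop versus freeze-and-go), but the logic here is an illustration of the concept, not GDPS automation itself.

```python
# Sketch of the zero-data-loss trade-off: when mirroring is suspended,
# automation must either stop all updates (zero data loss, at the cost
# of stopping systems for non-disaster events such as a broken link)
# or let production continue (availability first, accepting exposure).

def on_mirroring_suspended(policy: str) -> list:
    actions = ["freeze secondary disks at a consistent point"]
    if policy == "STOP":
        # Zero data loss: no further updates reach the primary disks,
        # even if the cause was only a broken remote copy link.
        actions.append("stop all production systems")
    elif policy == "GO":
        # Availability first: updates made after the freeze are lost
        # if the event turns out to be a real disaster.
        actions.append("allow production to continue on primary disks")
    return actions

print(on_mirroring_suspended("STOP"))
```

Which branch is right depends on whether your business values zero data loss or continuous operation more highly for ambiguous failures.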
Supported platforms with Metro Mirror
Metro Mirror replication is supported for any IBM or non-IBM disk subsystem that supports the Metro Mirror architecture, specifically the Freeze/Run capability. Metro Mirror can mirror fixed-block (FB) devices, which are used by both IBM Z and non-IBM Z platforms, and count key data (CKD) devices, which are used by mainframe operating systems such as IBM z/OS, IBM z/VM, and IBM z/VSE.
Not all operating systems necessarily support an interface to control the remote copy function. However, the Metro Mirror function for FB devices can be controlled from a connected z/OS system if the disk storage subsystem supports the zFBA feature (as described in “FB disk management prerequisites” on page 78 for GDPS Metro, and in 4.3.1, “FB disk management prerequisites” on page 132 for GDPS Metro HyperSwap Manager).
With current implementations of Metro Mirror, the primary and secondary disk subsystems must be from the same vendor, although vendors (including IBM) often support Metro Mirror between different disk subsystem models of their own product lines. This can help with migrations and technology upgrades.
Distance with Metro Mirror
The maximum distance supported for IBM Metro Mirror is 300 km (without an RPQ). Typical GDPS Metro and GDPS Metro HyperSwap Manager configurations are limited to distances less than this because of Coupling Link configurations. For more information about the supported distances for these Parallel Sysplex connections, see 2.9.3, “Coupling links” on page 46. You will also need to contact other storage vendors to understand the maximum distances supported by their Metro Mirror compatible mirroring implementations.
Performance with Metro Mirror
As the distance between your primary and secondary disk subsystems increases, the time it takes for your data to travel between the subsystems also increases. This might have a performance impact on your applications because they cannot proceed until the write to the secondary device completes.
Be aware that as response times increase, link use also increases. Depending on the type and number of Metro Mirror links you configured, more links and the use of Parallel Access Volumes (PAVs) might help to provide improved response times at longer distances.
Disk Magic, a tool available to your IBM storage specialist, can be used to predict the impact of various distances, link types, and link numbers for IBM disk implementation. We consider access to the information provided by such a tool essential to a GDPS project using Metro Mirror.
Metro Mirror connectivity
Connectivity between the primary and secondary disk subsystems can be provided by direct connections between the primary and secondary disk subsystems, by IBM FICON switches, by DWDMs, and by channel extenders.
The type of intersite connection (dark fiber or telecommunications link) available determines the type of connectivity you use: telecommunication links can be used by channel extenders, and the other types of connectivity require dark fiber.
For more information about connectivity options and considerations for IBM Z, see the most recent version of IBM System z Connectivity Handbook, SG24-5444.
Metro Mirror data consistency
When using Metro Mirror, the following sequence of actions occurs when an update I/O is issued to a primary volume:
1. Write to the primary volume (disk subsystem cache and non-volatile store (NVS)).
Your production system writes data to a primary volume and a cache hit occurs.
2. Write to the secondary (disk subsystems cache and NVS).
The primary disk subsystem’s microcode then sends the update to the secondary disk subsystem’s cache and NVS.
3. Signal write is complete on the secondary.
The secondary disk subsystem signals write complete to the primary disk subsystem when the updated data is in its cache and NVS.
4. Post I/O is complete.
When the primary disk subsystem receives the write complete from the secondary disk subsystem, it returns Device End (DE) status to your application program. Now, the application program can continue its processing and move on to any dependent writes that might have been waiting for this one to complete.
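The four-step sequence above can be sketched as follows. The classes are illustrative stand-ins for disk subsystem behavior; the point is that each step returns only when the previous one completes, which is what makes the mirror synchronous.

```python
# Sketch of the Metro Mirror synchronous write path: the application
# does not see I/O complete until the secondary has hardened the data.

class DiskSubsystem:
    """Illustrative stand-in for a disk subsystem's cache and NVS."""
    def __init__(self, name):
        self.name = name
        self.cache = {}

    def write(self, volume, data):
        self.cache[volume] = data  # cache and NVS hardened together here

primary = DiskSubsystem("LOG-P")
secondary = DiskSubsystem("LOG-S")

def metro_mirror_write(volume, data):
    primary.write(volume, data)       # 1. write to primary cache/NVS
    secondary.write(volume, data)     # 2. primary forwards to secondary
    ack = secondary.cache[volume] == data
    assert ack                        # 3. secondary signals write complete
    return "Device End"               # 4. application sees I/O complete

status = metro_mirror_write("LOG001", b"intent record")
print(status)
```

Because step 4 waits on step 3, every kilometer between the two subsystems adds directly to the application's write response time.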
However, Metro Mirror on its own provides this consistency only for a single write. Guaranteeing consistency across multiple logical subsystems and even across multiple disk subsystems requires automation on top of the Metro Mirror function. This is where GDPS comes in with freeze automation, which is described later in this book.
Metro Mirror transparent disk swap
Because under normal conditions the primary and secondary disks are known to be identical, with Metro Mirror it is possible to swap to using the secondary copy of the disks in a manner that is transparent to applications that are using those disks. This task is not simple. It requires tight control and coordination across many devices that are shared by multiple systems in a timely manner. GDPS Metro and GDPS Metro HyperSwap Manager automation, with support provided in z/OS, z/VM, and specific distributions of Linux on IBM Z, provide such a transparent swap capability and it is known as HyperSwap.
HyperSwap is a key availability-enabling technology. For more information about GDPS HyperSwap, see the following sections:
“GDPS HyperSwap function” on page 119 for GDPS Metro HyperSwap Manager
“GDPS HyperSwap function” on page 252 for the GDPS Virtual Appliance.
Addressing z/OS device limits in a GDPS Metro environment
As clients implement IT resiliency solutions that rely on multiple copies of data, more are finding that the z/OS limit of 64K (65,536) devices is limiting their ability to grow or even to take advantage of technologies like HyperSwap. Clients can consolidate data sets to fewer larger volumes, but even with that, there are times when this might not make operational sense for all types of data.
As a result, z/OS introduced the concept of an “alternate subchannel set,” which can include the definition for certain types of disk devices. An alternate subchannel set provides another set of 64K devices for the following device types:
Parallel Access Volume (PAV) alias devices
Metro Mirror secondary devices (defined as 3390D)
FlashCopy target devices
Including PAV alias devices in an alternate subchannel set is transparent to GDPS and is common practice for current GDPS Metro HyperSwap Manager and GDPS Metro environments.
Support is included in GDPS Metro HyperSwap Manager and GDPS Metro to allow definition of Metro Mirror secondary devices in an alternate subchannel set. With this feature, GDPS can support Metro Mirror configurations with nearly 64K device pairs. GDPS Metro HyperSwap Manager allows the secondary devices for z/OS systems in the GDPS sysplex, as well as for managed z/VM systems (and guests) to be defined in an alternate subchannel set. GDPS Metro only supports alternate subchannel sets for z/OS systems in the sysplex.
There are limitations to keep in mind when considering the use of this feature. Specifically, enhanced support is provided in IBM zEnterprise 196 or 114 servers that allows the Metro Mirror secondary copy of the IPL, IODF, and stand-alone dump devices for z/OS systems in the GDPS sysplex to also be defined in the alternate subchannel set (MSS1).
With this support, a client can define all z/OS Metro Mirrored devices belonging to the GDPS sysplex uniformly with their secondary in the alternate subchannel set. This removes the necessity to define IPL, IODF, and stand-alone dump devices differently in MSS0.
The use of alternate subchannel sets for the FlashCopy target devices that are managed by GDPS is not necessary because there is no requirement to define UCBs for these devices (they can be in any subchannel set or not defined at all). This also contributes to the ability of GDPS to support Metro Mirror configurations with nearly 64K device pairs, because no device numbers or UCBs are consumed by the FlashCopy target devices.
Note: There is no requirement to define UCBs for the FlashCopy target devices that are managed by GDPS.
Multi-Target Metro Mirror
Multi-target PPRC, also known as MT-PPRC, is based on the PPRC (Metro Mirror) technology. The MT-PPRC architecture allows multiple secondary devices, synchronous or asynchronous, for a single primary device.
Multi-Target Metro Mirror (MTMM) is a specific topology that is based on the MT-PPRC technology, which allows maintaining two synchronous Metro Mirror secondary targets (two Metro Mirror legs) from a single primary device. Each leg is tracked and managed independently. Consider the following points:
Data is transferred to both targets in parallel.
Pairs operate independently of each other.
Pairs can be established, suspended, or removed separately.
A replication problem on one leg does not affect the other leg.
HyperSwap is possible on either leg.
MTMM provides all the benefits of Metro Mirror, with the added protection of a second synchronous leg.
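The independence of the two legs listed above can be sketched as follows. This is an illustrative model, not the MT-PPRC microcode: data is transferred to both targets, and suspending one leg leaves the other unaffected, so HyperSwap remains possible on the surviving leg.

```python
# Hypothetical sketch of the MTMM topology: one primary, two independently
# managed synchronous legs. A problem on one leg does not affect the other.

class Leg:
    def __init__(self, name):
        self.name, self.data, self.suspended = name, [], False

class MTMMPrimary:
    def __init__(self):
        self.data = []
        self.legs = {"RS1": Leg("RS1"), "RS2": Leg("RS2")}

    def write(self, record):
        self.data.append(record)
        for leg in self.legs.values():   # transferred to both legs in parallel
            if not leg.suspended:
                leg.data.append(record)

p = MTMMPrimary()
p.write("update 1")
p.legs["RS1"].suspended = True           # replication problem on one leg only
p.write("update 2")

print(p.legs["RS1"].data)   # ['update 1']             (suspended leg lags)
print(p.legs["RS2"].data)   # ['update 1', 'update 2'] (other leg unaffected)
```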
Summary
Metro Mirror synchronous replication gives you the ability to remote copy your data in real time, with the potential for no data loss at the recovery site. Metro Mirror is your only choice if your RPO is zero. Metro Mirror is the underlying remote copy capability that the GDPS Metro, GDPS Metro HyperSwap Manager, and GDPS Virtual Appliance offerings are built on.
2.4.2 XRC (z/OS Global Mirror)
 
The Extended Remote Copy (XRC) solution consists of a combination of software and hardware functions. XRC maintains a copy of the data asynchronously at a remote location. It involves a System Data Mover (SDM) that is a component of the z/OS operating system working with supporting microcode in the primary disk subsystems. One or more SDMs running in the remote location are channel-attached to the primary disk subsystems. They periodically pull the updates from the primary disks, sort them in time stamp order, and apply the updates to the secondary disks. This provides point-in-time consistency for the secondary disks. The IBM implementation of XRC is branded as z/OS Global Mirror. This name is used interchangeably with XRC in many places, including in this book.
Recovery point objective
Because XRC collects the updates from the primary disk subsystem some time after the I/O has completed, there will always be an amount of data that has not been collected when a disaster hits. As a result, XRC can be used only when your recovery point objective is greater than zero (0). The amount of time that the secondary volumes lag behind the primary depends mainly on the following items:
The performance of the SDM
The SDM is responsible for collecting, sorting, and applying all updates. If insufficient capacity (MIPS, storage, and I/O resources) is available to the SDM, delays in collecting the updates from the primary disk subsystems will occur, causing the secondaries to drift further behind during peak times.
The amount of bandwidth
If there is insufficient bandwidth to transmit the updates in a timely manner, contention on the remote copy links can cause the secondary volumes to drift further behind at peak times.
The use of device blocking
Enabling blocking for devices causes I/O write activity to be paused for devices with high update rates. This allows the SDM to offload the write I/Os from cache, resulting in a smaller RPO.
The use of write pacing
Enabling write pacing for devices with high write rates results in delays being inserted into the application’s I/O response to prevent the secondary disk from falling behind. This option slows the I/O activity, resulting in a smaller RPO, and it is less disruptive than device blocking. Write pacing, if desired, can be used in conjunction with the z/OS Workload Manager (WLM).
Because XRC is able to pace the production writes, it is possible to provide an average RPO of 1 to 5 seconds and maintain a guaranteed maximum RPO, if sufficient bandwidth and resources are available. However, it is possible that the mirror will suspend, or that production workloads will be impacted, if the capability of the replication environment is exceeded because of either of the following reasons:
Unexpected peaks in the workload
An underconfigured environment
To minimize the lag between the primary and secondary devices, you must have sufficient connectivity and a well-configured SDM environment. For more information about planning for the performance aspects of your XRC configuration, see the chapter about capacity planning in DFSMS Extended Remote Copy Installation Planning Guide, GC35-0481.
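The relationship between write rate, drain capacity, and RPO can be illustrated with a back-of-the-envelope model. The numbers below are assumed for illustration only (they are not from this book): whenever the production write rate exceeds the rate at which the SDM and its links can drain updates, a backlog builds, and the RPO grows with it.

```python
# Illustrative model (assumed numbers): the secondary lag, and hence the RPO,
# grows whenever the production write rate exceeds the sustainable drain rate.

drain_rate_mbps = 200                       # what the SDM + links can sustain
backlog_mb = 0.0
for hour, write_rate_mbps in enumerate([100, 150, 350, 300, 120]):
    net_mb = (write_rate_mbps - drain_rate_mbps) * 3600 / 8   # Mb -> MB per hour
    backlog_mb = max(0.0, backlog_mb + net_mb)                # backlog never < 0
    rpo_s = backlog_mb * 8 / drain_rate_mbps                  # time to drain it
    print(f"hour {hour}: backlog {backlog_mb:,.0f} MB -> RPO ~{rpo_s:,.0f} s")
```

In this example the peak hours (350 and 300 Mbps against a 200 Mbps drain capability) push the RPO into the thousands of seconds, and the quiet final hour only partially recovers it, which is why both bandwidth and SDM capacity must be sized for peaks, not averages.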
Supported platforms
There are two aspects to “support” for XRC. The first aspect is the ability to append a time stamp to all write I/Os so the update can subsequently be remotely copied by an SDM. This capability is provided in the following operating systems:
Any supported release of z/OS
Linux on IBM Z when using CKD format disks
z/VM with STP and appropriate updates (contact IBM support for more information)
 
Note: XRC does not support FB devices.
It is also possible to use XRC to remote copy volumes that are being used by IBM Z operating systems that do not time stamp their I/Os. However, in this case, it is not possible to provide consistency across multiple LSSs. The devices must all be in the same LSS to provide consistency. For more information, see the section about understanding the importance of timestamped writes in the most recent revision of z/OS DFSMS Advanced Copy Services manual.
The other aspect is which systems can run the System Data Mover function. The SDM is supported only on z/OS; any supported release can run it.
Distance and performance
Because XRC is an asynchronous remote copy capability, the amount of time it takes to mirror the update to the remote disks does not affect the response times to the primary volumes. As a result, virtually unlimited distances between the primary and secondary disk subsystems are supported, with minimal impact to the response time of the primary devices.
Connectivity
If the recovery site is within the distance supported by a direct FICON connection, switches/directors, or DWDM, then you can use one of these methods to connect the SDM system to the primary disk subsystem. Otherwise, you must use channel extenders and telecommunication lines.
XRC data consistency
XRC uses time stamps and consistency groups to ensure that your data is consistent across the copy operation. When an XRC pair is established, the primary disk subsystem notifies all systems with a logical path group for that device, and the host system DFSMSdfp software starts to time stamp all write I/Os to the primary volumes. This is necessary to provide data consistency.
XRC is implemented in a cooperative way between the disk subsystems in the primary site and the SDMs, which typically are in the recovery site. The data flow includes the following process (see Figure 2-4):
1. The primary system writes to the primary volumes.
2. Primary disk subsystem posts I/O complete.
Your application I/O is signaled complete when the data is written to the primary disk subsystem’s cache and NVS. Channel End (CE) and Device End (DE) are returned to the writing application; these signal that the update has completed successfully. A time-stamped copy of the update is kept in the primary disk subsystem’s cache. Dependent writes can now proceed.
Figure 2-4 Data flow when using z/OS Global Mirror
3. Offload data from primary disk subsystem to SDM.
Every so often (several times a second), the SDM requests each of the primary disk subsystems to send any updates that have been received. The updates are grouped into record sets, which are asynchronously offloaded from the cache to the SDM system.
Within the SDM, the record sets, perhaps from multiple primary disk subsystems, are processed into consistency groups (CGs) by the SDM. The CG contains records that have their order of update preserved across multiple disk subsystems participating in the same XRC session. This preservation of order is vital for dependent write I/Os such as databases and logs. The creation of CGs guarantees that XRC applies the updates to the secondary volumes with update sequence integrity for any type of data.
4. Write to secondary.
When a CG is formed, it is written from the SDM’s buffers to the SDM’s journal data sets. Immediately after the CG has been hardened on the journal data sets, the records are written to their corresponding secondary volumes. Those records are also written from the SDM’s buffers.
5. The XRC control data set is updated to reflect that the records in the CG have been written to the secondary volumes.
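The five steps above can be sketched in miniature. This is an illustrative model, not the SDM implementation: timestamped updates accumulate in each primary subsystem's cache, the SDM offloads them and merges them into a consistency group in timestamp order, hardens the group in its journal, and only then applies it to the secondary volumes.

```python
# Hypothetical sketch of the XRC data flow (steps 1-5 above). Names and data
# are illustrative; the real SDM works with record sets and journal data sets.

import heapq

# Steps 1-2: writes complete locally; a timestamped copy stays in cache.
primary_caches = {
    "DSS1": [(1, "log: begin tx"), (4, "log: commit tx")],
    "DSS2": [(2, "db: update row"), (3, "db: update index")],
}

journal, secondary = [], []

# Step 3: the SDM offloads the record sets and merges them by timestamp into
# a consistency group, preserving dependent-write order across subsystems.
record_sets = list(primary_caches.values())
consistency_group = list(heapq.merge(*record_sets))

# Step 4: harden the CG in the journal, then write it to the secondaries.
journal.append(consistency_group)
secondary.extend(update for _, update in consistency_group)

# Step 5: the control data set would now record the CG as applied.
print(secondary)
# ['log: begin tx', 'db: update row', 'db: update index', 'log: commit tx']
```

Note how the merge interleaves updates from the two subsystems into timestamp order, so the commit record is applied only after the data updates it depends on.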
Coupled Extended Remote Copy
XRC is an effective solution for mirroring many thousands of volumes. However, a single SDM instance can manage replication only for a finite number of devices. You can use the Coupled XRC (CXRC) support to extend the number of devices for added scalability.
CXRC provides the ability to couple multiple SDMs running in the same or different LPARs together into a master session. CXRC coordinates the consistency of data across coupled sessions in a master session, allowing recovery of data for all the volumes in the coupled sessions to a consistent time.
If the sessions are not coupled, recoverable consistency is provided only within each individual SDM, not across SDMs. All logically related data (for example, all the data used by a single sysplex) should be copied by one SDM, or a single group of coupled SDMs.
Multiple Extended Remote Copy
In addition to the additional capacity enabled by Coupled XRC, there is also an option called Multiple XRC (MXRC). MXRC allows you to have up to 20 SDMs in a single LPAR, of which 13 can be coupled together into a cluster. These can then be coupled to SDMs or clusters running in other LPARs through CXRC. Up to 14 SDM clusters can then be coupled together, allowing for an architectural limit of coupled consistency across 182 SDMs.
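The architectural limit quoted above follows directly from the coupling limits; a quick check of the arithmetic:

```python
# The 182-SDM architectural limit is the product of the two coupling limits
# stated above: up to 13 coupled SDMs per cluster, up to 14 coupled clusters.

sdm_per_cluster, clusters = 13, 14
print(sdm_per_cluster * clusters)   # 182 coupled SDMs
```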
Multiple Reader
XRC Multiple Reader (also known as Extended Reader) allows automatic load balancing over multiple readers in an XRC environment. A reader is a task that is responsible for reading updates from a primary LSS. Depending on the update rate for the disks in an LSS, a reader task might not be able to keep up with pulling these updates and XRC could fall behind. The function can provide increased parallelism through multiple SDM readers and improved throughput for XRC remote mirroring configurations.
It can allow XRC to do these tasks:
Better sustain peak workloads for a given bandwidth
Increase data currency over long distances
Replicate more capacity while maintaining the same recovery point objective
Help avoid potential slowdowns or suspends caused by I/Os that are not being processed fast enough
Before the introduction of Multiple Readers, you needed to plan carefully to balance the primary volume update rate versus the rate at which the SDM could “drain” the data. If the drain rate was unable to keep up with the update rate, there was a potential to affect application I/O performance.
GDPS XRC can use this multireader function, and thus provide these benefits.
Extended Distance FICON
Extended Distance FICON allows XRC clients to select less complex channel extenders that are built on frame-forwarding technology, rather than channel extenders that must emulate XRC read commands to optimize the transfer through the extender and achieve the best performance.
Extended distance FICON enables mirroring over longer distances without substantial reduction of effective data rate. It can significantly reduce the cost of remote mirroring over FICON for XRC.
Extended Distance FICON is supported only on IBM System z10 and later servers, and on the IBM System Storage DS8000 disk subsystems.
SDM offload to zIIP
The System Data Mover (SDM) can run on the IBM Z Integrated Information Processor (zIIP), a specialty engine offered on IBM System z9 and later processors. By offloading some of the SDM workload to a zIIP, better price performance and improved use of resources at the mirrored site can be achieved.
One benefit is that DFSMS SDM processing is redirected to a zIIP processor, which can lower general-purpose server use at the mirrored site. Another benefit is that the investment in a zIIP specialty processor at the mirrored site might help you cost-justify the implementation of a disaster recovery solution, while at the same time reducing software and hardware fees.
Scalability in a GDPS XRC environment
As clients implement IT resiliency solutions that rely on multiple copies of data, more are finding that the z/OS limit of 64K (65,536) devices is limiting their ability to grow. Clients can consolidate data sets to fewer larger volumes, but even with that, there are times when this might not make operational sense for all types of data.
In an XRC replication environment, the SDM system or systems are responsible for performing replication. An SDM system will need to address a small number of XRC infrastructure volumes plus the primary and secondary XRC devices that it is responsible for and possibly the FlashCopy target devices. This means that each SDM system can manage XRC replication for up to roughly 21K primary devices, assuming target FlashCopy devices are also defined to the SDM system. However, as described in “Multiple Extended Remote Copy” on page 30 and “Coupled Extended Remote Copy” on page 30, it is possible to run multiple clustered and coupled SDMs across multiple z/OS images. As you can see, you have more than ample scalability.
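The "roughly 21K" figure above can be checked with simple arithmetic: with a single 64K subchannel set, each replicated volume consumes three device numbers in the SDM system (primary, secondary, and FlashCopy target), before allowing for the small number of XRC infrastructure volumes.

```python
# Back-of-the-envelope check of the "roughly 21K primary devices" figure for
# an SDM system that must also address the FlashCopy target devices.

devices_per_subchannel_set = 65_536
copies_per_volume = 3            # primary + secondary + FlashCopy target
print(devices_per_subchannel_set // copies_per_volume)   # 21845 -> ~21K
```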
Also, it is possible in a GDPS XRC environment to use “no UCB” FlashCopy, in which case you do not need to define the FlashCopy target devices to the SDM systems. This configuration further increases the number of devices each SDM system can handle.
However, UCBs for the FlashCopy target devices must be defined in a separate LPAR somewhere in the environment to bring these devices online and perform an XRC recover operation.
Another option is to define the FlashCopy target devices in MSS2 in the SDM systems. GDPS provides the capability to swap the FlashCopy target devices that are defined in MSS2 into the active subchannel set on the controlling system and then perform an XRC recover operation there. This process eliminates the need for a special LPAR with addressability to the FlashCopy target devices to perform the XRC recover while still maximizing the number of devices that each SDM system can manage.
Hardware prerequisites
On IBM disk subsystems, XRC requires that the primary disk subsystems have the IBM z/OS Global Mirror feature code installed. It is not necessary for the primary and secondary disks to be the same device type, although they must have the same geometry and the secondary device must be at least as large as the primary device.
XRC is also supported on disk subsystems from other vendors that have licensed and implemented the interfaces from IBM, and it is possible to run with a heterogeneous environment with multiple vendors’ disks. Target XRC volumes can also be from any vendor, even if the target subsystem does not support XRC, thus enabling investment protection.
 
Note: Keep in mind that at some point, you might have to remote copy from the recovery site back to the production site. GDPS XRC defines procedures and provides specific facilities for switching your production workload between the two regions.
To reverse the XRC direction, the IBM z/OS Global Mirror feature code must also be installed in the secondary disk subsystems that will become primary when you reverse the replication direction. To reverse the replication direction, the primary and secondary devices must be the same size.
In summary, it makes sense to maintain a symmetrical configuration across both primary and secondary devices.
An extra requirement is that all the systems writing to the primary volumes must be connected to the same STP network. It is not necessary for them all to be in the same sysplex, simply that they all share the same time source.
Summary
XRC offers a proven disk mirroring foundation for an enterprise disaster recovery solution that provides large scalability and good performance.
XRC is a preferred solution if your site has these requirements:
Extended distances between primary and recovery site
Consistent data, at all times, in the recovery site
Ability to maintain the highest levels of performance on the primary system
Can accept a small time gap between writes on the primary system and the subsequent mirroring of those updates on the recovery system
Scale with performance to replicate a large number of devices with consistency
Run with a heterogeneous environment with multiple vendors’ disks
2.4.3 Global Mirror
Global Mirror is an asynchronous remote copy technology that enables a 2-site disaster recovery and backup solution for the IBM Z and distributed systems environments. Using asynchronous technology, Global Mirror operates over Fibre Channel Protocol (FCP) communication links and maintains a consistent and restartable copy of data at a remote site that can be located at virtually unlimited distances from the local site.
Global Mirror works by using three sets of disks, as shown in Figure 2-5 on page 33. Global Copy (PPRC Extended Distance, or PPRC-XD), which is an asynchronous form of PPRC (Metro Mirror), is used to continually transmit data from the primary (A) to secondary (B) volumes, using the out-of-sync bitmap to determine what needs to be transmitted. Global Copy does not guarantee that the arriving writes at the local site are applied to the remote site in the same sequence. Therefore, Global Copy by itself does not provide data consistency.
If there are multiple physical primary disk subsystems, one of them is designated as the Master and is responsible for coordinating the creation of consistency groups. The other disk subsystems are subordinates to this Master.
Each primary device maintains two bitmaps. One bitmap tracks incoming changes. The other bitmap tracks which data tracks must be sent to the secondary before a consistency group can be formed in the secondary.
Periodically, depending on how frequently you want to create consistency groups, the Master disk subsystem will signal the subordinates to pause application writes and swap the change recording bitmaps. This identifies the bitmap for the next consistency group. While the I/Os are paused in all LSSs in the Global Mirror session, any dependent writes will not be issued because the CE/DE has not been returned. This maintains consistency across disk subsystems. The design point to form consistency groups is 2 - 3 ms.
After the change recording bitmaps are swapped, write I/Os are resumed and the updates that remain on the Global Mirror primary for the current consistency group will be drained to the secondaries. After all of the primary devices have been drained, a FlashCopy command is sent to the Global Mirror secondaries (B), which are also the FlashCopy source volumes, to perform a FlashCopy to the associated FlashCopy target volumes (C). The tertiary or C copy is a consistent copy of the data.
Remember, the B volumes are secondaries to Global Copy and are not guaranteed to be consistent. The C copy provides a “golden copy” which can be used to make the B volumes consistent in case recovery is required. Immediately after the FlashCopy process is logically complete, the primary disk subsystems are notified to continue with the Global Copy process. For more information about FlashCopy, see 2.6, “FlashCopy” on page 38.
After Global Copy is resumed, the secondary or B volumes are inconsistent. However, if recovery is needed, the FlashCopy target volumes provide the consistent data for recovery.
All this processing is done under the control of microcode in the disk subsystems. You can have up to 16 mirrored pairs in a pool, one of which is the Master primary and secondary pair (see Figure 2-5).
Figure 2-5 Global Mirror: How it works
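The consistency-group cycle described above can be sketched as follows. This is a hypothetical model, not the disk subsystem microcode: writes are briefly paused while the change-recording bitmaps are swapped, the frozen bitmap is drained to the B volumes, and a FlashCopy of B to C then preserves the consistent "golden copy".

```python
# Hypothetical sketch of the Global Mirror consistency-group cycle:
# pause writes -> swap bitmaps -> resume -> drain to B -> FlashCopy B to C.

class GlobalMirrorPrimary:
    def __init__(self):
        self.tracks = {}           # track -> current data on the A volume
        self.recording = set()     # bitmap: tracks changed since last swap
        self.draining = set()      # bitmap: tracks to drain for current CG

    def write(self, track, data):
        self.tracks[track] = data
        self.recording.add(track)

def form_consistency_group(primary, b_volume, c_volume):
    # The Master pauses writes (2-3 ms design point) and swaps the bitmaps.
    primary.draining, primary.recording = primary.recording, set()
    # Writes resume here; the drain copies the frozen set of tracks to B.
    for track in primary.draining:
        b_volume[track] = primary.tracks[track]
    primary.draining = set()
    # FlashCopy B -> C: C becomes the consistent recovery ("golden") copy.
    c_volume.clear()
    c_volume.update(b_volume)

a, b, c = GlobalMirrorPrimary(), {}, {}
a.write(10, "v1")
a.write(20, "v1")
form_consistency_group(a, b, c)
a.write(10, "v2")                  # arrives after the CG; C stays consistent
print(sorted(c.items()))           # [(10, 'v1'), (20, 'v1')]
```

The sketch shows why the B volumes alone are not guaranteed to be consistent between cycles, while the C copy always reflects the last completed consistency group.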
Recovery point objective
Because Global Mirror is an asynchronous remote copy solution, there will always be an amount of data that must be re-created following a disaster. As a result, Global Mirror can be used only when your recovery point objective (RPO) requirement is greater than zero (0). The amount of time that the FlashCopy target volumes lag behind the primary depends mainly on the following items:
How often consistency groups are built
This is controlled by the installation and can be specified in terms of seconds.
The amount of bandwidth
If there is insufficient bandwidth to transmit the updates in a timely manner, contention on the remote copy links can cause the secondary volumes to drift further behind at peak times. The more frequently you create consistency groups, the more bandwidth you will require.
Although it is not unusual to have an average RPO of 5 - 10 seconds with Global Mirror, it is possible that the RPO will increase significantly if production write rates exceed the available resources. However, unlike z/OS Global Mirror, the mirroring session will not be suspended and the production workload will not be impacted if the capacity of the replication environment is exceeded because of unexpected peaks in the workload or an underconfigured environment.
To maintain a consistent lag between the primary and secondary disk subsystems, you must have sufficient connectivity. For more information about planning for the performance aspects of your Global Mirror configuration, see IBM DS8870 Copy Services for IBM z Systems, SG24-6787.
Supported platforms
The IBM Enterprise Storage Server and DS8000 families of disk subsystems support Global Mirror. For other enterprise disk vendors, contact your vendor to determine whether they support Global Mirror and if so, on which models.
Distance and connectivity
Because Global Mirror is an asynchronous remote copy capability, the amount of time it takes to mirror the update to the remote disks does not affect the response times to the primary volumes. As a result, virtually unlimited distances between the primary and secondary disk subsystems are supported.
Global Mirror requires FCP links on the disk subsystems. If the recovery site is within the distance supported by FCP direct connect, switches, or DWDM, you can use one of those methods to connect the primary and secondary disk subsystems. Otherwise, you must use network extension technology that supports FCP links.
Addressing z/OS device limits in a GDPS GM environment
As clients implement IT resiliency solutions that rely on multiple copies of data, more are finding that the z/OS limit of 64K (65,536) devices is limiting their ability to grow. Clients can consolidate data sets to fewer larger volumes, but even with that, there are times when this might not make operational sense for all types of data.
To this end, z/OS introduced the concept of an alternate subchannel set, which can include the definition for certain types of disk devices. An alternate subchannel set provides another set of 64K devices for the following device types:
Parallel Access Volume (PAV) alias devices
Metro Mirror secondary devices (defined as 3390D)
FlashCopy target devices
Including PAV alias devices in an alternate subchannel set is transparent to GDPS and is common practice for many client configurations.
The application site controlling system performs actions against the GM primary devices and can address up to nearly 64K devices. The recovery site controlling system performs actions against the GM secondary and the GM FlashCopy devices. GDPS supports defining the GM FlashCopy devices in an alternate subchannel set (MSS1) or not defining them at all (which is known as “no UCB” FlashCopy). This ability allows up to nearly 64K devices to be replicated in a GDPS GM environment.
Summary
Global Mirror provides an asynchronous remote copy offering that supports virtually unlimited distance, without the requirement of an SDM system to move the data from primary to secondary volumes. Global Mirror also supports a wider variety of platforms because it supports FB devices and removes the requirement for timestamped updates that is imposed by XRC.
Conversely, Global Mirror is currently not as scalable as XRC because it supports only a maximum of 17 storage subsystems. In addition, Global Mirror does not have the multiple vendor flexibility provided by XRC.
2.4.4 Combining disk remote copy technologies for CA and DR
In this section we briefly describe Metro/Global Mirror and Metro/z/OS Global Mirror. For more detailed information, see Chapter 9, “Combining local and metro continuous availability with out-of-region disaster recovery” on page 267. Combining the technologies of Metro Mirror and HyperSwap with either Global Mirror or XRC (also referred to as z/OS Global Mirror in this section) allows clients to meet requirements for continuous availability (CA) with zero data loss locally within metropolitan distances for most failures, along with providing a disaster recovery (DR) solution in the case of a region-wide disaster. This combination might also allow clients to meet increasing regulatory requirements.
Metro Global Mirror
Metro Global Mirror (MGM) is a cascading data replication solution that combines the capabilities of Metro Mirror and Global Mirror.
Synchronous replication between a primary and secondary disk subsystem located either within a single data center, or between two data centers located within metropolitan distances, is implemented using Metro Mirror. Global Mirror is used to asynchronously replicate data from the secondary disks to a third disk subsystem in a recovery site typically out of the local metropolitan region. As described in 2.4.3, “Global Mirror” on page 32, a fourth set of disks, also in the recovery site, are the FlashCopy targets used to provide the consistent data for disaster recovery.
 
MGM provides a comprehensive three-copy data replication strategy to protect against day-to-day disruptions, while protecting critical business data and functions if there is a wide-scale disruption.
Metro z/OS Global Mirror
GDPS Metro/z/OS Global Mirror (MzGM) is a multi-target data replication solution that combines the capabilities of Metro Mirror and XRC (z/OS Global Mirror).
Synchronous replication between a primary and secondary disk subsystem located either within a single data center, or between two data centers located within metropolitan distances, is implemented using Metro Mirror. XRC is used to asynchronously replicate data from the primary disks to a third disk system in a recovery site, typically out of the local metropolitan region.
Because XRC supports only CKD devices, only IBM Z data can be mirrored to the recovery site. However, because Metro Mirror and XRC are supported by multiple storage vendors, this solution provides flexibility that MGM cannot.
For enterprises looking to protect IBM Z data, MzGM delivers a three-copy replication strategy to provide continuous availability for day-to-day disruptions, while protecting critical business data and functions if there is a wide-scale disruption.
2.4.5 IBM software replication products
This section does not aim to provide a comprehensive list of all IBM software-based replication products. Instead, it provides an introduction to the following supported products within the GDPS Continuous Availability solution:
InfoSphere Data Replication for IMS for z/OS
InfoSphere Data Replication for VSAM for z/OS
InfoSphere Data Replication for DB2 for z/OS
These products provide the capability to asynchronously copy changes to data held in IMS or DB2 databases or VSAM files from a source to a target copy. Fine-grained controls allow you to precisely define what data is critical to your workload and needs to be copied in real time between the source and target. Unlike disk replication solutions, which are application- and data-agnostic and work at the z/OS volume level, software replication does not provide a mechanism for copying all possible data types in your environment. As such, it is suited only to providing a CA/DR solution for specific workloads that can tolerate having only the IMS, DB2, or VSAM database-resident information copied between locations. This is also discussed in Chapter 7, “GDPS Continuous Availability solution” on page 203.
InfoSphere Data Replication for IMS for z/OS
IMS Replication provides the mechanisms for producing copies of your IMS databases and maintaining the currency of the data in near real time, typically between two systems separated by geographic distances. There is essentially no limit to the distance between source and target systems because the copy technique is asynchronous and uses TCP/IP as the protocol to transport the data over your wide area network (WAN).
IMS replication employs Classic data servers in the source and target systems to provide the replication services.
Classic source server
The Classic source server reads the IMS log data and packages changes to the specified databases into messages that are then sent through TCP/IP to the target location.
Classic target server
The Classic target server, running in the target location, receives messages from the source server and applies the changes to a replica of the source IMS database in near real time. IMS replication provides mechanisms to ensure that updates to a given record in the source database are applied in the same sequence in the target replica. Furthermore, IMS replication maintains a bookmark to know where it has reached in processing the IMS log data so that if any planned or unplanned outage occurs, it can later catch up knowing where it was at the time of the outage.
For more information, see IBM Documentation.
InfoSphere Data Replication for VSAM for z/OS
VSAM replication is similar in structure to IMS replication. For CICS/VSAM workloads, the transaction data for selected VSAM data sets is captured using the CICS log streams as the source. For non-CICS workloads, CICS VSAM Recovery (CICS VR) logs are used as the source for capturing VSAM update information. The updates are transmitted to the target using TCP/IP, where they are applied to the target data sets upon receipt.
InfoSphere Data Replication for DB2 for z/OS
InfoSphere Replication Server for z/OS, as used in the GDPS Continuous Availability solutions, is also known as Q replication. It provides a high capacity and low latency replication solution that uses IBM WebSphere® MQ message queues to transmit data updates between source and target tables of a DB2 database.
Q replication is split into two distinct pieces:
Q capture program or engine
Q apply program or engine
Q capture
The Q capture program reads the DB2 logs for changes to the source table or tables that you want to replicate. These changes are then put into WebSphere MQ messages and sent across the WebSphere MQ infrastructure to the system where the target table resides. There, they are read and applied to the target table by the Q apply program.
The Q capture program is flexible in terms of what can be included or excluded from the data sent to the target and even the rate at which data is sent can be modified if required.
By the nature of the method of Q replication, replication of data is an asynchronous process. Even so, an RPO of a few seconds is possible even in high-update environments.
Q apply
The Q apply program takes WebSphere MQ messages from a receive queue or queues, and then applies the changes held within the messages to the target tables. The Q apply program is designed to use parallelism to keep up with updates to multiple targets while maintaining any referential integrity constraints between related target tables.
Both the Q capture and Q apply programs have mechanisms to track what has been read from the logs and sent to the target site, and what has been read from the receive queues and applied to the target tables, including any dependencies between updates.
This in turn provides data consistency and allows for restart of both the capture and apply programs, whether for planned reasons or after a failure.
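The capture-side and apply-side progress tracking can be sketched with a toy model. This is an illustrative sketch only, assuming an in-memory queue standing in for WebSphere MQ and sequence numbers standing in for log positions; the function names are hypothetical, not the real Q replication API.

```python
# Toy model of queue-based capture/apply. A capture routine turns DB2-log-style
# changes into messages on a transmit queue; an apply routine drains the
# receive queue and applies changes to target tables. Both sides track how far
# they have progressed so a restart neither loses nor duplicates updates.
from collections import deque

def q_capture(db_log, send_queue, captured_upto):
    for seq, table, row in db_log:
        if seq > captured_upto:                  # only changes not yet captured
            send_queue.append((seq, table, row))
            captured_upto = seq                  # tracked for restart
    return captured_upto

def q_apply(recv_queue, targets, applied_upto):
    while recv_queue:
        seq, table, row = recv_queue.popleft()
        if seq <= applied_upto:
            continue                             # duplicate delivery after restart
        targets.setdefault(table, []).append(row)
        applied_upto = seq                       # tracked for restart/consistency
    return applied_upto

queue, targets = deque(), {}
upto = q_capture([(1, "T1", "r1"), (2, "T2", "r2")], queue, 0)
applied = q_apply(queue, targets, 0)
# A restart re-reads the log, but only the genuinely new change moves:
upto = q_capture([(1, "T1", "r1"), (2, "T2", "r2"), (3, "T1", "r3")], queue, upto)
applied = q_apply(queue, targets, applied)
assert targets == {"T1": ["r1", "r3"], "T2": ["r2"]}
```

The real Q apply engine additionally applies independent changes in parallel; the sketch keeps a single sequential stream for clarity.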
For more information about Q replication, see IBM Documentation.
2.5 Tape resident data
Operational data, that is, data that is used directly by applications supporting users, is normally found on disk. However, there is another category of data (called support data) that supports the operational data; this often resides in tape subsystems. Support data typically covers migrated data, point-in-time backups, archive data, and so on. For sustained operation in the failover site, the support data is indispensable. Furthermore, some enterprises have mission-critical data that resides only on tape. You need a solution to ensure that tape data is readily accessible at your recovery site.
Just as you mirror your disk-resident data to protect it, similarly you can mirror your tape-resident data. GDPS provides support for management of the IBM TS7700. See section 3.1.2, “Protecting tape data” on page 67 for details about GDPS TS7700 support. The IBM TS7700 provides comprehensive support for replication of tape data. For more information about the TS7700 technology that complements GDPS for tape data, see IBM TS7700 R 5.1 Guide, SG24-8464 (https://www.redbooks.ibm.com/abstracts/sg248464.html).
2.6 FlashCopy
FlashCopy provides a point-in-time (PiT) copy of a volume, with almost instant availability for the user of both the source and target volumes. There is also a data set-level FlashCopy supported for z/OS volumes. Only a minimal interruption is required for the FlashCopy relationship to be established. The copy is then created by the disk subsystem, with minimal impact on other disk subsystem activities. The volumes created when you use FlashCopy to copy your secondary volumes are called tertiary volumes.
FlashCopy and disaster recovery
FlashCopy has specific benefits in relation to disaster recovery. For example, consider what happens if you temporarily lose connectivity between primary and secondary Metro Mirror volumes. At the point of failure, the secondary volumes will be consistent. However, during the period when you are resynchronizing the primary and secondary volumes, the secondary volumes are inconsistent (because the updates are not applied in the same time sequence that they were written to the primaries). So, what happens if you have a disaster during this period? If it is a real disaster, your primary disk subsystem will be a smoldering lump of metal on the computer room floor. And your secondary volumes are inconsistent, so those volumes are of no use to you either.
So, how do you protect yourself from such a scenario? One way (our suggested way) is to take a FlashCopy of the secondary volumes just before you start the resynchronization process. This at least ensures that you have a consistent set of volumes in the recovery site. The data might be several hours behind the primary volumes, but even data a few hours old that is consistent is better than current, but unusable, data.
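The exposure and its remedy can be shown with a toy model. This is an illustrative sketch under the assumption that volume state is represented by a last-consistent timestamp; it is not how FlashCopy is actually implemented, which is inside the disk subsystem.

```python
# Toy illustration of the scenario above: during resynchronization the
# secondaries receive updates out of time sequence, so a disaster mid-resync
# leaves them inconsistent, while a FlashCopy taken just before the resync
# started is still consistent. Timestamps are illustrative only.

secondary = {"v1": "t100", "v2": "t100"}   # consistent when Metro Mirror suspended

tertiary = dict(secondary)                  # FlashCopy taken before resync starts

# Resync copies changed tracks in any order, not in time sequence:
secondary["v2"] = "t105"                    # ...and a disaster strikes here

assert secondary["v1"] != secondary["v2"]   # v2 is ahead of v1: inconsistent
assert set(tertiary.values()) == {"t100"}   # tertiary is older, but consistent
```

The tertiary copy is hours behind, but every volume in it reflects the same point in time, which is exactly the property recovery needs.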
An additional benefit of FlashCopy is that it provides the ability to perform disaster recovery tests while still retaining disaster recovery readiness. The FlashCopy volumes you created when doing the resynchronization (or subsequently) can be used to enable frequent testing (thereby ensuring that your recovery procedures continue to be effective) without having to use the secondary volumes for that testing.
FlashCopy can operate in several modes. GDPS uses one of the following modes of FlashCopy, depending on the GDPS offering:
COPY When the volumes are logically copied, the FlashCopy session continues as a background operation, physically copying all the data from the source volume to the target. When the volumes have been physically copied, the FlashCopy session ends. In this mode, the FlashCopy target physical volume will be a mirror image of the source volume at the time of the FlashCopy.
NOCOPY When the volumes are logically copied, a FlashCopy session continues as a background operation, physically copying only those tracks subsequently updated by write operations to the source volume. In this mode, the FlashCopy target physical volume contains only data that was changed on the source volume after the FlashCopy.
NOCOPY2COPY Change existing FlashCopy relationship from NOCOPY to COPY. This can be done dynamically. When one or more NOCOPY relationships exist for a source volume, NOCOPY2COPY will initiate a background copy for all target relationships with intersecting source extents from the point in time the NOCOPY was issued. Upon completion of the background copy, the converted relationship or relationships will be terminated.
INCREMENTAL This allows repetitive FlashCopies to be taken, but only the tracks that have changed since the last FlashCopy will be copied to the target volume. This provides the ability to refresh a FlashCopy relationship and bring the target up to the source’s newly established point-in-time. Incremental FlashCopy helps reduce the background copy completion time when only a subset of data on either the source or target has changed, thus giving you the option to perform a FlashCopy on a more frequent basis.
CONSISTENT This option is applicable to GDPS Metro and GDPS Metro HyperSwap Manager environments. It creates a consistent set of tertiary disks without suspending Metro Mirror. It uses the FlashCopy Freeze capability which, similar to Metro Mirror Freeze, puts all source disks in Extended Long Busy to ensure that the FlashCopy source disks are consistent before the point-in-time copy is made. After the source disks are consistent, the FlashCopy is taken (quite fast) and the Freeze is thawed.
Without this support, you would need to suspend Metro Mirror (planned freeze) and then resynchronize Metro Mirror to produce a consistent point-in-time copy of the secondary disks. HyperSwap would remain disabled from the time you suspended Metro Mirror until the mirror is full-duplex again, which can take a long time depending on how much data was updated while Metro Mirror remained suspended. In comparison, with Consistent FlashCopy, HyperSwap is disabled only during the FlashCopy Freeze, which should be just a few seconds.
GDPS gives you the capability to restrict the FlashCopy Freeze duration and to abort the FlashCopy operation if the FlashCopy Freeze time exceeds your threshold.
To create a consistent point-in-time copy of the primary disks without Consistent FlashCopy, you would need to somehow make sure that there is no I/O on the primary disks (effectively, you would need to stop the production systems). With Consistent FlashCopy, production systems continue to run and I/O is prevented during the few seconds until the FlashCopy Freeze completes. After the FlashCopy Freeze completes, the primary disks are in a consistent state, the FlashCopy operation itself is quite fast, and then the freeze is thawed and production systems resume I/O. Consistent FlashCopy can be used in conjunction with COPY, NOCOPY, or INCREMENTAL FlashCopy.
Zero Suspend This option is applicable to GDPS XRC environments. It creates a recoverable set of tertiary disks for recovery testing with no suspension of the XRC operation. This allows DR testing to be performed without ever losing the DR capability. Before this support, to produce a consistent tertiary copy you needed to suspend XRC for all volumes, FlashCopy secondary volumes, and then resynchronize XRC sessions.
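The INCREMENTAL mode described above can be sketched with a toy model of a change-recording bitmap. This is a conceptual illustration only, assuming tracks are dictionary keys; the real bookkeeping is done inside the disk subsystem, not in host code.

```python
# Toy model of Incremental FlashCopy: a change-recording set tracks which
# tracks were updated since the last FlashCopy, and a refresh copies only
# those tracks to bring the target to the new point in time.

def incremental_refresh(source, target, changed_tracks):
    copied = 0
    for track in sorted(changed_tracks):
        target[track] = source[track]   # only changed tracks are recopied
        copied += 1
    changed_tracks.clear()              # new point in time established
    return copied

source = {i: f"data{i}" for i in range(1000)}
target = dict(source)                   # initial full FlashCopy has completed
changed = {7, 42}                       # writes recorded since that FlashCopy
source[7], source[42] = "new7", "new42"
assert incremental_refresh(source, target, changed) == 2   # not 1000 tracks
assert target == source
```

Copying 2 tracks instead of 1000 is why the background copy completes much faster, which in turn makes frequent refreshes practical.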
If you plan to use FlashCopy, remember that the source and target volumes must be within the same physical disk subsystem. This is a capacity planning consideration when configuring and planning for the growth of your disk subsystems.
Also remember that if you performed a site switch to run in the recovery site, at some point you will want to return to the production site. To provide equivalent protection and testing capability no matter which site you are running in, consider providing FlashCopy capacity in both sites.
Furthermore, GDPS does not perform FlashCopy for just a selected subset of volumes. The GDPS use of FlashCopy is for the purposes of protection during resynchronization and for testing. Both of these tasks require that a point-in-time copy of the entire configuration is made. GDPS FlashCopy support assumes that you will provide FlashCopy target devices for the entire configuration and that every time GDPS performs a FlashCopy, it will be for all secondary devices (GDPS Metro also supports FlashCopy for primary devices).
An exception to this rule is that GDPS can perform FlashCopy for a subset of the production volumes when FlashCopy is used for the purposes of Logical Corruption Protection (LCP). For more information about how GDPS uses FlashCopy technology to provide flexible testing and protection against various types of logical data corruption, including cyber attacks and internal threats, see section 10.2, “Introduction to LCP and Testcopy Manager” on page 297.
User-initiated FlashCopy
User-initiated FlashCopy supports FlashCopy of all defined FlashCopy volumes using panel commands, GDPS scripts, or IBM Z NetView for z/OS commands, depending on which GDPS product is used.
Space-efficient FlashCopy (FlashCopy SE)
FlashCopy SE is functionally not much different from the standard FlashCopy. The concept of space-efficient with FlashCopy SE relates to the attributes or properties of a DS8000 volume. As such, a space-efficient volume can be used like any other DS8000 volume.
When a normal volume is created, it occupies the defined capacity on the physical drives. A space-efficient volume does not occupy physical capacity when it is initially created. Space is allocated when data is actually written to the volume. This allows the FlashCopy target volume capacity to be thinly provisioned (that is, smaller than the full capacity of the source volume). In essence, this means that when planning for FlashCopy, you can provision less disk capacity when using FlashCopy SE than when using standard FlashCopy, which can help lower the amount of physical storage needed by many installations.
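The allocate-on-write behavior can be sketched as follows. This is a conceptual model under the assumption of track-granular allocation; the class name and numbers are illustrative, not the DS8000 implementation.

```python
# Toy model of a space-efficient FlashCopy target: physical extents in the
# repository are allocated only when a track is first written, so far less
# physical capacity than the defined (logical) capacity may be provisioned.

class SpaceEfficientVolume:
    def __init__(self, defined_tracks):
        self.defined_tracks = defined_tracks   # logical size seen by the host
        self.repository = {}                   # physical space, used on demand

    def write(self, track, data):
        self.repository[track] = data          # extent allocated at first write

    def physical_tracks_used(self):
        return len(self.repository)

vol = SpaceEfficientVolume(defined_tracks=10_000)
for track in range(25):                        # only 25 tracks ever change
    vol.write(track, b"x")
assert vol.physical_tracks_used() == 25        # far below the 10,000 defined
```

Sizing the physical repository therefore depends on the expected write rate between FlashCopies, not on the full source capacity.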
All GDPS products support FlashCopy SE. Details of how FlashCopy SE is used by each offering are described in the chapter related to that offering.
2.7 Automation
If you have challenging recovery time and recovery point objectives, implementing disk remote copy, software-based replication, tape remote copy, FlashCopy, and so on are necessary prerequisites for you to be able to recover from a disaster and meet your objectives. However, be sure you realize that they are only enabling technologies. To achieve the stringent objectives placed on many IT departments today, it is necessary to tie those technologies together with automation and sound systems management practices. In this section we discuss your need for automation to recover from an outage.
2.7.1 Recovery time objective
If you have reached this far in the document, we presume that your recovery time objective (RTO) is a “challenge” to you. If you have performed tape-based disaster recovery tests, you know that ensuring that all your data is backed up is only the start of your concerns. In fact, even getting all those tapes restored does not result in a mirror image of your production environment. You also need to get all your databases up to date, get all systems up and running, and then finally start all your applications.
Trying to drive all this manually will, without question, prolong the whole process. Operators must react to events as they happen, while consulting recovery documentation. However, automation responds at machine speeds, meaning your recovery procedures will be executed without delay, resulting in a shorter recovery time.
2.7.2 Operational consistency
Imagine an average computer room scene immediately following a system failure. All the phones are ringing. Every manager within reach moves in to determine when everything will be recovered. The operators are frantically scrambling for procedures that are more than likely outdated. And the systems programmers are all vying with the operators for control of the consoles; in short, chaos.
Imagine, instead, a scenario where the only manual intervention is to confirm how to proceed. From that point on, the system will recover itself using well-tested procedures. How many people watch it does not matter because it will not make mistakes. And you can yell at it all you like, but it will still behave in exactly the manner in which it was programmed to behave. You do not need to worry about outdated procedures being used. The operators can concentrate on handling calls and queries from the assembled managers. And the systems programmers can concentrate on pinpointing the cause of the outage, rather than trying to get everything up and running again.
And all of this is just for a system outage. Can you imagine the difference that well-designed, coded, and tested automation can make in recovering from a real disaster? Apart from speed, perhaps the biggest benefit that automation brings is consistency. If your automation is thoroughly tested, you can be assured that it will behave in the same way, time after time. When recovering from as rare an event as a real disaster, this consistency can be a lifesaver.
2.7.3 Skills impact
Recovering a computing center involves many complex activities. Training staff takes time. People come and go. You cannot be assured that the staff that took part in the last disaster recovery test will be on hand to drive recovery from this real disaster. In fact, depending on the nature of the disaster, your skilled staff might not even be available to drive the recovery.
The use of automation removes these concerns as potential pitfalls to your successful recovery.
2.7.4 Summary
The technologies you will use to recover your systems all have various control interfaces. Automation is required to tie them all together so they can be controlled from a single point and your recovery processes can be executed quickly and consistently.
Automation is one of the central tenets of the GDPS offerings. By using the automation provided by GDPS, you save the effort of designing and developing this code yourself, and also benefit from IBM’s experience with hundreds of clients across your industry and other industries.
2.8 Flexible server capacity
In this section we discuss options for increasing your server capacity concurrently, for either planned upgrades or unplanned upgrades, to quickly provide the additional capacity you will require on a temporary basis. These capabilities can be used for server or site failures, or they can be used to help meet the temporary peak workload requirements of clients.
The only capabilities described in this section are the ones used by GDPS. Other capabilities exist to upgrade server capacity, either on a temporary or permanent basis, but they are not covered in this section.
2.8.1 Capacity Backup upgrade
Capacity Backup (CBU) upgrade for IBM Z processors provides reserved emergency backup server capacity that can be activated in lieu of capacity that is lost as a result of an unplanned event elsewhere. CBU helps you to recover by adding reserved capacity on a designated IBM Z system. A CBU system normally operates with a base server configuration and with a preconfigured number of additional processors reserved for activation in case of an emergency.
CBU can be used to install (and pay for) less capacity in the recovery site than you have in your production site, while retaining the ability to quickly provision the additional capacity that would be required in a real disaster.
CBU can be activated manually, using the HMC. It can also be activated automatically by GDPS, either as part of a disaster recovery test, or in reaction to a real disaster. Activating the additional processors is nondisruptive. That is, you do not need to power-on reset (POR) the server or even IPL the LPARs that can benefit from the additional capacity (assuming that an appropriate number of reserved CPs were defined in the LPAR Image profiles).
CBU is available for all processor types on IBM Z.
The CBU contract allows for an agreed-upon number of tests over the period of the contract. GDPS supports activating CBU for test purposes.
For more information about CBU, see System z Capacity on Demand User’s Guide, SC28-6846.
2.8.2 On/Off Capacity on Demand
On/Off Capacity on Demand (On/Off CoD) is a function that enables concurrent and temporary capacity growth of the server. The difference between CBU and On/Off CoD is that On/Off CoD is for planned capacity increases, and CBU is intended to replace capacity lost as a result of an unplanned event elsewhere. On/Off CoD can be used for client peak workload requirements, for any length of time, and it has a daily hardware and software charge.
On/Off CoD helps clients whose business conditions do not justify a permanent capacity upgrade to contain workload spikes that might otherwise exceed permanent capacity and cause Service Level Agreements to be missed. On/Off CoD can concurrently add processors (CPs, IFLs, ICFs, zAAPs, and zIIPs) up to the limit of the installed books of an existing server. It is restricted to double the currently installed capacity.
2.8.3 Capacity for Planned Events
Capacity for Planned Events (CPE) can be used to replace capacity because of relocation of workloads, such as during system migrations, data center or server relocation, re-cabling, or general work on the physical infrastructure of the data processing environment.
CPE provides the ability to concurrently and temporarily (for 72 hours) activate more CPs, ICFs, IFLs, zAAPs, zIIPs, and SAPs, increase the CP capacity level, or both.
2.8.4 System Recovery Boost
System Recovery Boost (SRB) delivers substantially faster system shutdown and restart, short duration recovery process boosts for sysplex events (such as HyperSwap events), and enables faster catch up of the accumulated backlog of work after specific events, such as system restart.
SRB is available starting with the IBM z15™ IBM Z processor.
2.8.5 GDPS CBU, On/Off CoD, CPE, and SRB
The GDPS temporary capacity management capabilities are related to the capabilities provided by the particular server system being provisioned. Processors before the IBM System z10 required that the full capacity for a Capacity Backup (CBU) upgrade or On/Off Capacity on Demand (OOCoD) be activated, even though the full capacity might not be required for the particular situation at hand.
GDPS, with IBM System z10 and later generation systems, provides support for activating temporary capacity, such as CBU and OOCoD, based on a preinstalled capacity-on-demand record. In addition to the capability to activate the full record, GDPS also provides the ability to define profiles that determine what will be activated. The profiles are used in conjunction with a GDPS script statement and provide the flexibility to activate the full record or a partial record.
When temporary capacity upgrades are performed by using GDPS facilities, GDPS tracks activated CBU and OOCoD resources at a Central Electronics Complex (CEC) level.
GDPS provides keywords in GDPS scripts to support activation and deactivation of the CBU, On/Off CoD, CPE, and SRB functions.
GDPS allows definition of capacity profiles to add capacity to already running systems. Applicable types of reserved engines (CPs, zIIPs, zAAPs, IFLs, and ICFs) can be configured online to GDPS z/OS systems, to xDR-managed z/VM systems, and to coupling facilities that are managed by GDPS.
When a GDPS z/OS system is IPLed, GDPS automatically configures online any applicable reserved engines (CPs, zIIPs, and zAAPs) based on the LPAR profile. The online configuring of reserved engines is done only if temporary capacity was added to the CEC where the system is IPLed using GDPS facilities.
2.9 Cross-site connectivity considerations
When setting up a recovery site, there might be a sizeable capital investment to get started, but you might find that one of the largest components of your ongoing costs is related to providing connectivity between the sites. Also, the type of connectivity available to you can affect the recovery capability you can provide. Conversely, the type of recovery capability you want to provide will affect the types of connectivity you can use.
In this section, we list the connections that must be provided, from a simple disk remote copy configuration through to an Active/Active workload configuration. We briefly review the types of cross-site connections that you must provide for the different GDPS solutions and the technology that must be used to provide that connectivity. All of these descriptions relate solely to cross-site connectivity. We assume that you already have whatever intrasite connectivity is required.
2.9.1 Server-to-disk links
If you want to be able to use disks installed remotely from a system in the production site, you must provide channel connections to those disk control units.
Metro Mirror and MTMM-based solutions
For Metro Mirror and MTMM with GDPS, all of the secondary disks (both sets for MTMM) must be defined to and channel-accessible to the production systems for GDPS to be able to manage those devices.
If you foresee a situation where systems in the production site will be running off the secondary disks (for example, if you will use HyperSwap), you need to provide connectivity equivalent to that provided to the corresponding primary volumes in the production site. The HyperSwap function provides the ability to nondisruptively swap from the primary volume of a mirrored pair to what had been the secondary volume.
If you do not have any cross-site disk access, minimal channel bandwidth (two FICON channel paths from each system to each disk subsystem) is sufficient.
Depending on your director and switch configuration, you might be able to share the director-to-director links between channel and Metro Mirror connections. For more information, see IBM System z Connectivity Handbook, SG24-5444.
HyperSwap across sites with less than full channel bandwidth
You might consider enabling unplanned HyperSwap to the secondary disks in the remote site even if you do not have sufficient cross-site channel bandwidth to sustain your production workload for normal operations. Assuming that a disk failure is likely to cause an outage and you will need to switch to using a disk in the other site, the unplanned HyperSwap might at least give you the opportunity to perform an orderly shutdown of your systems first. Shutting down your systems cleanly avoids the complications and longer restart time that is associated with crash-restart of application subsystems.
For GDPS Metro environments, the same consideration applies to enabling HyperSwap to the remote secondary copy: Channel bandwidth to the local secondary copy should not be an issue.
XRC-based and Global Mirror-based solutions
For any of the asynchronous remote copy implementations (XRC or Global Mirror), the production systems would normally not have channel access to the secondary volumes.
Software replication-based solutions
As with other asynchronous replication technologies, given that effectively unlimited distances are supported, there is no requirement for the source systems to have host channel connectivity to the data in the target site.
2.9.2 Data replication links
You need connectivity for your data replication activity for the following circumstances:
Between storage subsystems (for Metro Mirror or Global Mirror)
From the SDM system to the primary disks (for XRC)
Across the wide area network for software-based replication
Metro Mirror-based and Global Mirror-based solutions
The IBM Metro Mirror (including MTMM) and Global Mirror implementations use Fibre Channel Protocol (FCP) links between the primary and secondary disk subsystems. The FCP connection can be direct, through a switch, or through other supported distance solutions (for example, Dense Wave Division Multiplexer, DWDM, or channel extenders).
XRC-based solutions
If you are using XRC, the System Data Movers (SDMs) are typically in the recovery site. The SDMs must have connectivity to both the primary volumes and the secondary volumes. The cross-site connectivity to the primary volumes is a FICON connection, and depending on the distance between sites, either a supported DWDM can be used (distances less than 300 km) or a channel extender can be used for longer distances. As discussed in “Extended Distance FICON” on page 30, an enhancement to the industry standard FICON architecture (FC-SB-3) helps avoid degradation of performance at extended distances, and this might also benefit XRC applications within 300 km where channel extension technology had previously been required to obtain adequate performance.
Software-based solutions
Both IMS replication and DB2 replication use your wide area network (WAN) connectivity between the data source and the data target. Typically for both, either natively or through IBM MQ for z/OS, TCP/IP is the transport protocol used, although other protocols, such as LU6.2, are supported. It is beyond the scope of this book to go into detail about WAN design, but ensure that any such connectivity between the source and target has redundant routes through the network to provide resilience against failures. There are effectively no distance limitations on the separation between source and target. However, greater distance increases latency and, therefore, affects the RPO that can be achieved.
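The distance/latency relationship noted above can be estimated with a rule of thumb: signals in optical fiber propagate at roughly 5 microseconds per kilometer, so each protocol round trip costs about 10 microseconds per kilometer of separation. The constant below is that rule of thumb, not a product specification, and real networks add switching and queuing delay on top.

```python
# Back-of-the-envelope estimate of propagation delay added by distance.
# us_per_km is the common ~5 microseconds/km rule of thumb for fiber.

def round_trip_ms(distance_km, us_per_km=5.0):
    return 2 * distance_km * us_per_km / 1000.0   # one round trip, in ms

assert round_trip_ms(100) == 1.0     # 100 km apart: ~1 ms per round trip
assert round_trip_ms(2000) == 20.0   # continental distance: ~20 ms per trip
```

Because the replication is asynchronous, this latency does not slow the source applications; it lengthens the lag between capture and apply, and hence the achievable RPO.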
2.9.3 Coupling links
Coupling links are required in a Parallel Sysplex configuration to provide connectivity from the z/OS images to the coupling facility. Coupling links are also used to transmit timekeeping messages when Server Time Protocol (STP) is enabled. If you have a multisite Parallel Sysplex, you will need to provide coupling link connectivity between sites.
For distances greater than 10 km, either ISC-3 or Parallel Sysplex InfiniBand (PSIFB) Long Reach links must be used to provide this connectivity. The maximum supported distance depends on several things, including the particular DWDMs that are being used and the quality of the links.
Table 2-1 lists the distances that are supported by the various link types.
Table 2-1 Supported CF link distances
Link type                   Link data rate      Maximum unrepeated  Maximum repeated
                                                distance            distance
--------------------------  ------------------  ------------------  ----------------
ISC-3                       2 Gbps (1)          10 km               200 km
                            1 Gbps (2)          20 km (3)
PSIFB Long Reach 1X         5.0 Gbps            10 km               175 km
                            2.5 Gbps (4)
PSIFB 12X (for use within   6 GBytes/sec        150 meters          Not applicable
a data center)              3 GBytes/sec (5)

(1) Gbps (gigabits per second).
(2) RPQ 8P2197 provides an ISC-3 Daughter Card that clocks at 1 Gbps.
(3) Requires RPQ 8P2197 and RPQ 8P2263 (IBM Z Extended Distance).
(4) The PSIFB Long Reach feature negotiates to the 1x IB-SDR link data rate of 2.5 Gbps if connected to qualified DWDM infrastructure that cannot support the 5 Gbps (1x IB-DDR) rate.
(5) The PSIFB links negotiate to the 12x IB-SDR link data rate of 3 GBytes/sec when connected to IBM System z9 servers.
2.9.4 Server Time Protocol
Server Time Protocol (STP) is a server-wide facility that is implemented in the Licensed Internal Code (LIC) of the IBM Z servers. It provides the capability for multiple servers to maintain time synchronization with each other. STP is the successor to the 9037 Sysplex Timer.
STP is designed for servers that are configured in a Parallel Sysplex or a basic sysplex (without a coupling facility), and for servers that are not in a sysplex but need to be time-synchronized. STP is a message-based protocol in which timekeeping information is passed between servers over externally defined coupling links.
If you are configuring a sysplex across two or more sites, you need to synchronize servers in multiple sites. For more information about Server Time Protocol, see Server Time Protocol Planning Guide, SG24-7280, and Server Time Protocol Implementation Guide, SG24-7281.
2.9.5 XCF signaling
One of the requirements for being a member of a sysplex is the ability to maintain XCF communications with the other members of the sysplex. XCF uses two mechanisms to communicate between systems: XCF signaling structures in a CF and channel-to-channel adapters. Therefore, if you are going to have systems in both sites that are members of the same sysplex, you must provide CF connectivity, CTC connectivity, or preferably both, between the sites.
If you provide both CF structures and CTCs for XCF use, XCF will dynamically determine which of the available paths provides the best performance and use that path. For this reason, and for backup in case of a failure, we suggest providing both XCF signaling structures and CTCs for XCF cross-site communication.
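The dynamic path selection described above can be sketched with a toy model. This is an illustrative sketch only, assuming a simple latency metric; the function and field names are hypothetical, not the XCF implementation.

```python
# Toy sketch of XCF-style signaling path selection: with both signaling
# structures and CTCs defined, the best-performing available path is chosen
# dynamically, and traffic falls back to the other path if one fails.

def pick_path(paths):
    available = [p for p in paths if p["up"]]
    return min(available, key=lambda p: p["latency_us"])["name"]

paths = [
    {"name": "CF signaling structure", "up": True, "latency_us": 8},
    {"name": "CTC", "up": True, "latency_us": 12},
]
assert pick_path(paths) == "CF signaling structure"
paths[0]["up"] = False                 # structure path lost
assert pick_path(paths) == "CTC"       # signaling continues over the CTCs
```

This is the essence of the recommendation: defining both path types buys both performance (the faster path is used) and availability (the survivor carries the traffic).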
2.9.6 HMC and consoles
To be able to control the processors in the remote center, you need to have access to the LAN containing the SEs and HMCs for the processors in that location. Such connectivity is typically achieved using bridges or routers.
If you are running systems at the remote site, you will also want to be able to have consoles for those systems. Two options are 2074 control units and OSA-ICC cards. Alternatively, you can use SNA consoles, but be aware that they cannot be used until IBM VTAM® is started, so they cannot be used for initial system loading.
2.9.7 Connectivity options
 
Note: WAN connectivity options are not covered in this book. Table 2-2, with the exception of HMC connectivity, is predominantly related to disk replication solutions.
Now that we have explained what you need to connect across the two sites, we briefly review the most common options for providing that connectivity, ranging from direct channel connection through to DWDMs. Table 2-2 on page 48 lists the different options. The distance supported varies by device type and connectivity method.
Table 2-2 Cross-site connectivity options
Connection type        Direct        Switch/director or    DWDM    Channel
                       (unrepeated)  cascaded directors            extender
-------------------------------------------------------------------------------
Server to disk         Yes           Yes                   Yes     Yes
Disk remote copy       Yes           Yes                   Yes     Yes
Coupling links         Yes           No                    Yes     No
STP (coupling links)   Yes           No                    Yes     No
XCF signaling          Yes           Yes (CTC)             Yes     Yes (CTC only)
                                     No (coupling links)           No (coupling links)
HMC/consoles           Yes           Yes                   Yes     Yes
For more information about options and distances that are possible, see IBM System z Connectivity Handbook, SG24-5444.
FICON switches/directors
For more information about IBM Z qualified FICON and Fibre Channel Protocol (FCP) products and products that support mixing FICON and FCP within the same physical FC switch or FICON director, see the I/O Connectivity web page.
The maximum unrepeated distance for FICON is typically 10 km. However, FICON switches can be used to extend the distance from the server to the control unit further with the use of a cascaded configuration. The maximum supported distance for the interswitch links (ISL) in this configuration is technology- and vendor-specific.
In any case, if your organization does not own the property between the two sites, you need a vendor to provide dark fiber between them, because FICON switches/directors cannot be directly connected to telecommunication lines.
For more information, see IBM System z Connectivity Handbook, SG24-5444.
Wavelength Division Multiplexing
A Wavelength Division Multiplexor (WDM) is a high-speed, high-capacity, scalable fiber optic data transport system that uses Dense Wavelength Division Multiplexing (DWDM) or Coarse Wavelength Division Multiplexing (CWDM) technology to multiplex several independent bit streams over a single fiber link, thereby making optimal use of the available bandwidth.
WDM solutions that support the protocols described in this book generally support metropolitan distances in the range of tens to a few hundred kilometers. The infrastructure requirements and the supported distances vary by vendor, model, and even by features on a given model.
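One reason for these metropolitan limits is signal latency over fiber, which this chapter's footnotes put at roughly 10 microseconds per kilometer, round trip. A quick back-of-the-envelope sketch of what that means for cross-site I/O:

```python
# Rule of thumb from this chapter: ~10 microseconds of latency per
# kilometer of fiber, round trip.
US_PER_KM_ROUND_TRIP = 10

def added_latency_us(distance_km, round_trips=1):
    """Added latency in microseconds for an operation that needs the
    given number of cross-site round trips."""
    return distance_km * US_PER_KM_ROUND_TRIP * round_trips

# A 50 km metropolitan link adds 0.5 ms per round trip:
assert added_latency_us(50) == 500

# At 200 km the same operation adds 2 ms, which illustrates why
# synchronous disk mirroring is generally confined to metropolitan
# distances.
assert added_latency_us(200) == 2000
```

The number of round trips per write depends on the replication protocol and is not assumed here; the sketch shows only the per-round-trip cost of distance.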
More specifically, several qualified WDM solutions support the following key protocols used in a GDPS solution:
 
Fiber Connection (FICON)
InterSystem Channel (ISC-3)
Parallel Sysplex InfiniBand (PSIFB) Long Reach links
Server Time Protocol (STP) over ISC-3 Peer Mode or PSIFB Long Reach
Potentially, protocols that are not IBM Z protocols
Given the criticality of these links for transport of data and timing information, it is important to use only qualified WDM vendor solutions when extending Parallel Sysplexes to more than one site (as is often done as part of a GDPS configuration).
The latest list of qualified WDM vendor products, along with links to corresponding IBM Redpaper publications for each product, is available at the IBM Resource Link web page (sign-in required).
Also see “Hardware products for servers” on the Library page.
Channel extenders
Channel extenders are special devices that are connected in the path between a server and a control unit, or between two control units. Channel extenders provide the ability to extend connections over much greater distances than those provided by DWDM. Distances supported with channel extenders are virtually unlimited6.
Unlike DWDMs, channel extenders support connection to telecom lines, removing the need for dark fiber. This can make channel extenders more flexible because access to high-speed telecoms is often easier to obtain than access to dark fiber.
However, channel extenders typically do not support the same range of protocols as DWDMs. In an IBM Z context, channel extenders support IP connections (for example, connections to OSA adapters), FCP, and FICON channels, but not coupling links or time synchronization-related links.
For much more detailed information about the options and distances that are possible, see IBM System z Connectivity Handbook, SG24-5444.
More information about channel extenders that have been qualified to work with IBM storage is available to download from the DS8000 Series Copy Services Fibre Channel Extension Support Matrix web page.
2.9.8 Single points of failure
When planning to connect systems across sites, it is vital to do as much as you possibly can to avoid all single points of failure. Eliminating all single points of failure makes it significantly easier to distinguish between a connectivity failure and a failure of the remote site. The recovery actions you take are quite different, depending on whether the failure you just detected is a connectivity failure or a real site failure.
If you have only a single path, you do not know if it was the path or the remote site that went down. If you have no single points of failure and everything disappears, there is an extremely good chance that it was the site that went down. Any other mechanism to distinguish between a connectivity failure and a site failure (most likely human intervention) cannot react with the speed required to drive effective recovery actions.
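The reasoning above can be sketched as a simple classification rule. This is only an illustration of the argument, not GDPS decision logic, and the path names are hypothetical.

```python
def classify_failure(path_status):
    """Classify an event from the status of redundant cross-site paths.

    `path_status` maps path name -> True (up) / False (down).
    """
    down = [p for p, up in path_status.items() if not up]
    if not down:
        return "no failure"
    if len(down) == len(path_status):
        # All independent paths lost at once: with no single points of
        # failure, the most likely explanation is that the remote site
        # itself is down.
        return "probable site failure"
    # Some paths survive, so the remote site is still reachable: the
    # lost paths represent a connectivity failure.
    return "connectivity failure"

status = {"FICON ISL 1": True, "FICON ISL 2": False, "coupling link": True}
assert classify_failure(status) == "connectivity failure"

assert classify_failure(
    {"FICON ISL 1": False, "coupling link": False}
) == "probable site failure"
```

Note that the rule only works because the paths are independent; with a single shared path (or a shared point of failure), "everything down" and "one link down" are indistinguishable.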
2.10 Testing considerations
Testing your DR solution is a required and essential step in maintaining DR readiness. Many enterprises have business or regulatory requirements to conduct periodic tests to ensure the business is able to recover from a wide-scale disruption and recovery processes meet RTO and RPO requirements. The only way to determine the effectiveness of the solution and your enterprise's ability to recover from a disaster is through comprehensive testing.
One of the most important test considerations in developing a DR test plan is to make sure that the testing you conduct truly represents the way you would recover your data and enterprise. This way, when you actually need to recover following a disaster, you can recover the way you have been testing, thus improving the probability that you will be able to meet the RTO and RPO objectives established by your business.
Testing disk mirroring-based solutions
When conducting DR drills to test your recovery procedures without additional disk capacity to support FlashCopy, the mirroring environment must be suspended so that the secondary disks can be used to test your recovery and restart processes. When testing is completed, the mirror must be brought back to a duplex state. During this window, until the mirror is back in a duplex state, the enterprise's ability to recover from a disastrous event is compromised.
If this is not acceptable or your enterprise has a requirement to perform periodic disaster recovery tests while maintaining a disaster readiness posture, you will need to provide additional disk capacity to support FlashCopy. The additional FlashCopy device can be used for testing your recovery and restart procedures while the replication environment is running. This ensures that a current and consistent copy of the data is available, and that disaster readiness is maintained throughout the testing process.
The additional FlashCopy disk can also be used to create a copy of the secondary devices to ensure a consistent copy of the data is available if a disaster-type event occurs during primary and secondary volume resynchronization.
From a business perspective, installing the additional disk capacity to support FlashCopy will mean incurring additional expense. Not having it, however, can result in compromising the enterprise’s ability to recover from a disastrous event, or in extended recovery times and exposure to additional data loss.
Testing software replication solutions
Similar in some instances to the situation described for testing disk-based mirroring solutions, if you test on your target copy of your database or databases, you will have to pause the replication process. Potentially, you might have to also re-create the target copy from scratch by using the source copy as input when the test is complete.
It would be normal to test the recovery procedures and operational characteristics of a software replication solution in a pre-production environment that reflects the production environment as closely as possible.
However, because of the nature of software replication solutions, limited recovery is required in the target site. Updates either have been sent (and applied) from the source site, or they have not; because the apply process is based on completed units of work, there should be no issue with incomplete updates arriving from the source site. The testing is more likely to relate to the process for handling potential data loss, and to handling collisions caused by the later capture/apply of stranded transactions against other completed units of work that occurred after an outage or disaster.
Testing methodology
How you approach your DR testing is also an important consideration. Most enterprises aim to do the majority of disruptive testing in a test or "sandbox" environment. Ideally, this environment closely resembles the production environment, so that the scenarios tested in the sandbox are representative of your production environment as well.
Other enterprises might decide to simulate a disaster in the production environment to really prove the processes and technology deliver what is required. Remember, however, that a disaster can surface to the technology in different ways (for example, different components failing in different sequences), so the scenarios you devise and test should consider these possible variations.
A typical approach to DR testing in production is to perform some form of a planned site switch. In such a test, the production service is closed down in a controlled manner where it normally runs, and then restarted in the DR site. This type of test will demonstrate that the infrastructure in the DR site is capable of running the services within the scope of the test, but given the brief duration of such tests (often over a weekend only) not all possible workload scenarios can be tested.
For this reason, consider the ability to move the production services to the DR site for an extended period (weeks or months), to give an even higher degree of confidence. This ability to "toggle" production and DR locations can provide other operational benefits, such as enabling a preemptive switch before an impending event, along with increased confidence in being able to run following a DR invocation.
With this approach it is important to continue to test the actual DR process in your test environment, because a real disaster is unlikely to happen in a way where a controlled shutdown is possible. Those processes must then be carefully mapped across to the production environment to ensure success in a DR invocation.
In some industries, regulation might dictate or at least suggest guidelines about what constitutes a valid DR test, and this also needs to be considered.
2.11 Summary
In this chapter we covered the major building blocks of an IT resilience solution. We discussed providing continuous availability for normal operations, the options for keeping a consistent offsite copy of your disk and tape-based data, the need for automation to manage the recovery process, and the areas you need to consider when connecting across sites.
In the next few chapters, we discuss the functions provided by the various offerings in the GDPS family.

1 In this book, we use the term IBM Z to refer to the z Systems, IBM z Systems, System z, and zSeries ranges of processors. If something applies only to System z or zSeries processors, we point that out at the time.
2 Not including the GDPS Continuous Availability solution, which relates to a multiple-sysplex configuration that can have either single-site or multisite workloads.
3 The way the disk subsystem reacts to a synchronous IBM Metro Mirror remote copy failure depends on the options you specify when setting up the remote copy session. The behavior that is described here is the default if no overrides are specified.
4 Signal latency is related to the speed of light over fiber and is 10 microseconds per km, round trip.
5 The TS7700 management support is available only in GDPS Metro at this time.
6 For more information about the impact of distance on response times when using channel extenders, contact your IBM representative to obtain the white paper titled, The effect of IU pacing on XRC FICON performance at distance.