Global Mirror overview
This chapter provides an overview of Global Mirror. It describes the need for data consistency over unlimited distances when synchronous data replication, such as Metro Mirror, is not practical or possible. It also explains how Global Mirror works in a manner similar to a distributed application, that is, in a client/server relationship.
19.1 Global Mirror basic concepts
When you replicate data over long distances, that is, beyond the 300 km Metro Mirror non-RPQ limit, asynchronous data replication is the preferred approach. It is preferred because, with asynchronous techniques, application I/O processing on the primary storage system remains independent of the transmission of data to the secondary storage system.
With asynchronous data replication techniques, you must provide additional means to ensure data consistency at the secondary location. Such a scenario requires a solution that ensures data consistency within a single primary-secondary pair of storage systems and across multiple primary and secondary storage systems.
Global Mirror is based on an efficient combination of the Global Copy and FlashCopy functions. The storage system microcode provides, from the user perspective, a transparent and autonomic mechanism that intelligently combines Global Copy with certain FlashCopy operations to attain consistent data at the secondary site.
In normal operation with asynchronous data replication, how data consistency for dependent writes is preserved depends on the technique that is used to replicate the data. Dependent writes and data consistency are explained in detail in 16.8, “Consistency group function” on page 127. Global Mirror uses a different technique than z/OS Global Mirror, which is explained in Chapter 24, “How z/OS Global Mirror maintains consistency” on page 299.
The trade-off of providing consistency in asynchronous replication is that not all of the most recent data can be saved in a consistency group, because data consistency can be provided only at distinct points in time. When an incident occurs, only the data from the previous point of consistency creation can be restored. The amount of data that is lost in such a case is measured by the recovery point objective (RPO), which is given in units of time, usually seconds or minutes. The RPO is not a fixed number; it is highly dependent on the available bandwidth, the quality of the physical data link, and the current workload at the local site.
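To make the relationship between consistency group formation and RPO concrete, the following Python sketch (a purely illustrative helper, not part of any DS8000 interface) computes the recovery point at a given moment as the time that elapsed since the last successfully created consistency group.

from datetime import datetime, timedelta

def recovery_point(last_consistency_group: datetime, now: datetime) -> timedelta:
    # The newest recoverable data is the last consistency group, so the RPO at
    # this moment is the time that elapsed since that group was created.
    return now - last_consistency_group

# Example: the last consistency group was formed 45 seconds ago, so a disaster
# at this instant would lose roughly the last 45 seconds of writes.
print(recovery_point(datetime(2024, 1, 1, 12, 0, 0),
                     datetime(2024, 1, 1, 12, 0, 45)))   # 0:00:45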
19.1.1 Terminology in Global Mirror environments
The following terms and elements are commonly used when working in a Global Mirror context:
Bitmaps
 – Out of sync (OOS)
This bitmap, in the Global Mirror environment, is created by the DS8000 at Global Copy pair establish time and records all tracks of data that are modified on the primary storage system but not yet copied to the secondary storage system.
 – Change recording (CR)
This bitmap, in the Global Mirror environment, is used during consistency group processing to track any changes that come in to the primary storage system while the OOS bitmap is being drained to the secondary storage system in preparation for a consistency group.
Consistency
The consistency of data is ensured if the order of dependent writes to disks or disk groups is maintained. With Copy Services solutions, the data consistency at the secondary site is important for the ability to restart a database. You can use consistent data to perform a database restart rather than a database recovery that might take hours or days.
Data consistency across all secondary volumes that are spread across multiple storage disk systems is essential for logical data integrity.
The consistency that is provided is equivalent to a power off of a production storage system: everything that was written right up to the point of losing power is consistent. Without a quiesce of all production activity, there is always some data in flight, so you might need to rerun some tasks to completion at the secondary site after recovery.
Consistency group
A group of volumes in one or more secondary storage systems whose data represents a consistent point in time for disaster recovery.
Data currency
This term describes the time difference between when data was last written at the primary site and when the same data was written to the secondary site. It determines the amount of data that you must recover at the remote site after a disaster. This concept is also called the recovery point objective (RPO). Only synchronous copy solutions, such as Metro Mirror, have an RPO equal to zero. All asynchronous copy solutions have a data currency greater than zero.
With Global Mirror, a data currency of a few seconds can be achieved, and data consistency is always maintained by the consistency group process within Global Mirror.
Here are some examples of different asynchronous replication methods:
 – Global Copy is an asynchronous method that does not ensure consistent data at the secondary site.
 – z/OS Global Mirror is an asynchronous replication method that ensures consistent data at the secondary site.
 • It is a hardware and software solution, that is, it uses a system data mover (SDM) in DFSMSdfp.
 • It supports only z Systems count key data (CKD) volumes.
 – Global Mirror is also an asynchronous replication method that provides consistent data at the secondary site. For more information, see 19.1.4, “Global Mirror Master Subordinate Relationship” on page 204.
 • It is a hardware solution.
 • It supports all platforms (z Systems and Open Systems).
Dependent writes
If the start of one write operation depends upon the completion of a previous write, the writes are dependent. An example of an application with dependent writes is a database with its associated logging files.
Maintaining the order of dependent writes is the basis for having data consistency at the secondary (remote) site.
Dependent writes are described in detail with examples in 16.8.1, “Data consistency and dependent writes” on page 127.
Journal
Asynchronous mirroring requires a mechanism to contain consistent recoverable data. The Global Mirror function uses FlashCopy to create the Global Mirror journals. These relationships are created during the Global Mirror setup process and are refreshed with each consistency group that is created.
Master
The master is a function inside a primary storage system that communicates with the other primary storage systems (the subordinates) and coordinates the creation of consistency groups for the Global Mirror session. The master is defined when the start command for a session is issued to any LSS in a primary storage system. This command determines which DS8000 becomes the master storage system and which LSS becomes the master LSS.
The master requires communication paths over Fibre Channel links to any one of the LSSs in each subordinate storage system.
Session
A Global Mirror session is a collection of Global Copy pairs that are managed together when you create consistent copies of data volumes (a simple model of this structure is sketched after this terminology list). This set of volumes can be in one or more LSSs and one or more storage disk systems at the primary site. Open Systems volumes and z/OS volumes can both be members of the same session.
When you start or resume a session, consistency group processing occurs, with the master storage system coordinating the session by communicating with the subordinate storage systems.
For a Global Copy pair to be part of a Global Mirror session, the session must be defined to the LSS where the volume resides. All LSSs with a specific session defined are combined and grouped within that Global Mirror session at the time the session is started.
Subordinate
The subordinate is a function inside a primary storage system that communicates with the master and is coordinated by it. At least one of the LSSs of each subordinate primary storage system requires Fibre Channel communication paths to the master. These paths enable the communication between the master and the subordinate, and are required to create consistency groups of volumes that spread across more than one storage system.
If all the volumes of a Global Mirror session are in one primary storage disk system, no subordinate is required because the master can communicate to all LSSs inside the primary storage system.
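To tie several of these terms together, the following Python sketch models a session as described above: a session ID, primary storage systems that each contribute LSSs with Global Copy primary volumes, one master, and zero or more subordinates. The class and attribute names are invented for illustration and do not correspond to any DS8000 interface.

from dataclasses import dataclass, field
from typing import List

@dataclass
class LSS:
    lss_id: str                      # logical subsystem that holds primary volumes
    volumes: List[str]

@dataclass
class PrimaryStorageSystem:
    name: str                        # one primary DS8000 participating in the session
    lsses: List[LSS] = field(default_factory=list)

@dataclass
class GlobalMirrorSession:
    session_id: int                  # for example, session 20
    master: PrimaryStorageSystem     # coordinates consistency group creation
    subordinates: List[PrimaryStorageSystem] = field(default_factory=list)

    def spans_multiple_systems(self) -> bool:
        # An external subordinate is needed only when the session's volumes
        # are spread across more than one primary storage system.
        return len(self.subordinates) > 0

# Example: session 20 with volumes in two primary storage systems.
master = PrimaryStorageSystem("DS8000-A", [LSS("10", ["1000", "1001"])])
subordinate = PrimaryStorageSystem("DS8000-B", [LSS("20", ["2000", "2001"])])
session = GlobalMirrorSession(20, master, [subordinate])
print(session.spans_multiple_systems())   # True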
19.1.2 Application I/O and Global Mirror
In an asynchronous data replication environment, an application write I/O has the following steps (see Figure 14-1 on page 108):
1. Write application data to the primary storage system cache.
2. Acknowledge a successful I/O to the application. The application can then immediately schedule the next I/O.
3. Replicate the data from the primary storage system cache to the auxiliary storage system cache and NVS.
4. Acknowledge to the primary storage system that data successfully arrived at the auxiliary storage system.
Note how, with an asynchronous technique, the data transmission and the I/O completion acknowledgment are independent processes, which results in virtually no application I/O impact, or at most a minimal one. This technique is convenient when you must replicate over long distances.
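The following Python sketch illustrates, in a purely conceptual way, why the application sees only the local write: the acknowledgment in step 2 does not wait for the replication in steps 3 and 4, which run in the background. All names are invented for illustration.

import queue
import threading
import time

replication_queue: "queue.Queue[bytes]" = queue.Queue()

def application_write(data: bytes) -> None:
    # Steps 1 and 2: the write lands in primary cache (not modeled) and the I/O
    # is acknowledged immediately; replication is only scheduled, not awaited.
    replication_queue.put(data)

def replication_worker() -> None:
    # Steps 3 and 4: drain the queue to the secondary system in the background;
    # the completion acknowledgment flows back to the primary, not to the host.
    while True:
        replication_queue.get()
        time.sleep(0.05)             # simulated long-distance link latency
        replication_queue.task_done()

threading.Thread(target=replication_worker, daemon=True).start()

start = time.time()
application_write(b"record-1")       # returns immediately
print(f"host I/O acknowledged after {time.time() - start:.4f} s")
replication_queue.join()             # replication completes later, independently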
19.1.3 Asynchronous replication technique
An asynchronous data replication technique provides the following features:
Data replication to the secondary site is independent from application write I/O processing at the primary site, which results in no impact, or at least only minimal impact, to the application write I/O response time.
The order of dependent writes is maintained at the primary site, so data consistency can be maintained at the secondary site.
Data currency at the secondary site lags behind the primary site. How much it lags depends upon network bandwidth and storage system configuration. In periods of peak write workloads, this difference increases.
The bandwidth requirement between the primary and secondary sites does not have to be configured for peak write workload; link bandwidth utilization is improved over synchronous solutions.
Journal copies are required at the secondary site to preserve data consistency.
Data loss in disaster recovery situations is limited to the data in transit plus the data that might still be in the queue at the primary site that is waiting to be replicated to the secondary site.
To accomplish the necessary activities with a minimal impact on the application write I/O, the Global Mirror architecture uses Global Copy, FlashCopy, and a bitmap approach. Global Mirror uses two different types of bitmaps: the OOS bitmaps that are used by the Global Copy function, and the CR bitmap that is allocated during the process of consistency formation. Figure 19-1 on page 204 identifies the following essential components of the DS8000 Global Mirror architecture:
Global Copy is used to transmit the data asynchronously from the primary volumes to the secondary volumes.
A FlashCopy relationship from the Global Copy secondary volumes to the journal volumes.
A CR bitmap, which is maintained by Global Mirror on the primary storage system while a consistency group is created at the primary site.
When the Global Mirror process at the primary site creates a consistency group, write I/O is briefly held at the primary volumes for the time it takes to coordinate with all subordinates that a consistency group is to be formed and to create the CR bitmaps in primary storage memory (the maximum coordination interval defaults to 50 ms). All new data that is sent from the hosts is then marked for each corresponding track in the CR bitmap. A conceptual sketch of these structures follows Figure 19-1.
In Global Mirror, the latest consistency group is represented by the journal volumes (FlashCopy targets) at the secondary site.
Figure 19-1 General architecture of Global Mirror
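The following Python sketch is a minimal model of the per-volume structures that Figure 19-1 identifies: the OOS bitmap that Global Copy drains, the CR bitmap that records new writes while a consistency group is formed, and the journal volume that receives the FlashCopy. It is a conceptual illustration only; the real structures live in the DS8000 microcode.

from dataclasses import dataclass, field
from typing import Set

@dataclass
class GlobalMirrorPair:
    primary: str                                  # H1 volume
    secondary: str                                # H2 volume (also the FlashCopy source)
    journal: str                                  # J volume (the FlashCopy target)
    oos: Set[int] = field(default_factory=set)    # tracks modified but not yet sent
    cr: Set[int] = field(default_factory=set)     # tracks written during CG formation
    forming_consistency_group: bool = False

    def host_write(self, track: int) -> None:
        # New writes go to the CR bitmap while a consistency group is being
        # formed; otherwise they go to the OOS bitmap that Global Copy drains.
        target = self.cr if self.forming_consistency_group else self.oos
        target.add(track)

    def replicate_one_track(self) -> None:
        # Global Copy: send one out-of-sync track to the secondary volume.
        if self.oos:
            self.oos.pop()

pair = GlobalMirrorPair("H1-1000", "H2-1000", "J-1000")
pair.host_write(7)
pair.replicate_one_track()
print(pair.oos, pair.cr)   # set() set()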
19.1.4 Global Mirror Master Subordinate Relationship
Global Mirror works like a distributed application. A distributed application is usually built on a server to client relationship. The server functions as a supervisor and instructs the client. The client is able to do some work in an autonomic fashion but relies on the coordination efforts from the server, as shown in Figure 19-2.
Figure 19-2 Distributed application
The server distributes the work to its clients. The server also coordinates all individual feedback from the clients and decides on further actions. Looking at this diagram, the communication paths between the server and all its clients are key. Without communication paths between these four components, the functions eventually come to a complete stop. Matters get more complicated when the communication fails unexpectedly in the middle of an information exchange between the server and all of its clients, or only some of them.
Usually, a two-phase commit process provides a consistent state for certain functions and determines whether they complete successfully at the client side. After a function completes successfully and is acknowledged to the server, the server progresses to the next task. Concurrently, the server tries to parallelize operations (for example, I/O requests and coordination communication) to minimize the impact on throughput because of serialization and checkpoints.
When certain activities depend on each other, the server must coordinate these activities to ensure a correct sequence. The server and client can also be referred to as master and subordinate.
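The coordination pattern just described can be sketched as follows: the master sends each step of a task to every subordinate, collects the individual acknowledgments, and proceeds only when all of them have succeeded. The classes and method names are illustrative; the real exchange happens in microcode over PPRC links.

from typing import List

class Subordinate:
    # A subordinate primary storage system that executes the master's commands
    # against its own LSSs and reports back.
    def __init__(self, name: str) -> None:
        self.name = name

    def execute(self, command: str) -> bool:
        print(f"{self.name}: completed '{command}'")
        return True

class Master:
    # The master distributes each step, waits for every acknowledgment, and only
    # then moves on, which keeps the whole session coordinated and serialized.
    def __init__(self, subordinates: List[Subordinate]) -> None:
        self.subordinates = subordinates

    def run_step(self, command: str) -> bool:
        acknowledgments = [sub.execute(command) for sub in self.subordinates]
        return all(acknowledgments)

master = Master([Subordinate("DS8000-A (internal subordinate)"),
                 Subordinate("DS8000-B")])
if master.run_step("coordinate consistency group"):
    master.run_step("commit consistency group")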
Figure 19-3 shows the basic Global Mirror structure in relation to a distributed application. A master coordinates all efforts within a Global Mirror environment. After the master is started and manages a Global Mirror environment, the master issues all related commands over Peer-to-Peer Remote Copy (PPRC) links to its attached subordinates at the primary site. These subordinates can include a subordinate within the master itself. This communication between the master and an internal subordinate is transparent and does not require any extra attention from the user. The subordinates use inband communication to communicate with their related auxiliary storage systems at the remote site. The master also receives all acknowledgements from the subordinates and coordinates and serializes all the activities in the session.
Figure 19-3 Global Mirror as a distributed application
Looking at this structure in more detail, there are primary volumes in the master and subordinate storage systems, secondary volumes in the secondary storage systems, and those secondary volumes are also FlashCopy source volumes in the secondary storage systems. Pull them all together into a coordinated entity and you have a Global Mirror session, which is managed by the Global Mirror master, as shown in Figure 19-4 on page 206. A Global Mirror session is identified by a Global Mirror session ID, which is the number 20 in the figure. This session ID is defined in all the involved LSSs at the primary site that contain Global Copy primary volumes belonging to session 20. The Global Mirror master manages the subordinate through PPRC logical paths between both DS8000 storage systems. Consistency is provided across all primary storage systems.
Figure 19-4 Global Mirror session
When the master and subordinate are in a single storage system, the subordinate is internally managed by the master. With two or more storage systems at the primary site participating in a Global Mirror session, the subordinates are external and require separate attention when you create and manage a Global Mirror session or environment (Figure 19-5).
Figure 19-5 Global Mirror master and subordinate configuration
19.2 Global Mirror consistency group processing
In a Global Mirror environment, as mentioned previously, the consistent set of data is represented by the journal volumes at the secondary site. This is accomplished using specific steps and coordination within the storage systems and between storage systems as described in this section.
19.2.1 Properties of the Global Mirror journal
To create a set of volumes at a secondary site that contains consistent data, asynchronous data replication alone is not enough. It must be complemented with either a journal or a tertiary copy of the secondary volume. With Global Mirror, this third copy is created by using FlashCopy.
Figure 19-6 shows a FlashCopy relationship with a Global Copy secondary volume as the FlashCopy source volume. Volume H2 is, at the same time, a Global Copy secondary volume and a FlashCopy source volume. The corresponding FlashCopy target volume is in the same storage system.
Figure 19-6 Volume relationships in a Global Mirror configuration
This FlashCopy relationship has certain attributes that are typical and required when you create a Global Mirror session:
Inhibit target write
Protect the FlashCopy target volume from being modified by anything other than Global Mirror related actions.
Start change recording
Apply to the target volume only the changes that occur to the source volume between FlashCopy establish operations, except for the first time, when the FlashCopy relationship is initially established.
Persist
Keep the FlashCopy relationship until explicitly or implicitly terminated. This parameter is automatic because of the change recording property.
Nocopy
Do not start a background copy from source to target, but keep the set of FlashCopy bitmaps that is required for tracking the source and target volumes. These bitmaps are established when a FlashCopy relationship is created. Before a track in the source volume (H2) is modified between consistency group creations, the track is copied to the target volume (J) to preserve the previous point-in-time copy. This copy includes updates to the corresponding bitmaps to reflect the new location of the track that belongs to the point-in-time copy. The first Global Copy write to a secondary volume track within the window between two adjacent consistency groups causes FlashCopy to perform a copy-on-write operation (a sketch of this copy-on-write behavior follows).
Some interfaces to trigger this particular FlashCopy relationship combine these attributes into a single parameter such as MODE(ASYNC) with the TSO command for z/OS.
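Taken together, the nocopy and change recording attributes mean that a track is physically copied to the journal only when the Global Copy secondary volume (the FlashCopy source) is about to be overwritten within a consistency group window. The following Python sketch models that copy-on-write behavior with plain dictionaries; it is an illustration, not the microcode algorithm.

from typing import Dict, Set

class NocopyFlashCopy:
    # A point-in-time copy with no background copy: target tracks are filled
    # only when the corresponding source track is about to be overwritten.
    def __init__(self, source: Dict[int, bytes]) -> None:
        self.source = source                  # H2 volume contents by track
        self.target: Dict[int, bytes] = {}    # J (journal) volume contents by track
        self.preserved: Set[int] = set()      # tracks already copied on write

    def write_source(self, track: int, data: bytes) -> None:
        # Copy on write: preserve the point-in-time image before overwriting.
        if track not in self.preserved:
            if track in self.source:
                self.target[track] = self.source[track]
            self.preserved.add(track)
        self.source[track] = data

    def read_target(self, track: int) -> bytes:
        # Unchanged tracks are still read from the source (no physical copy).
        return self.target.get(track, self.source.get(track, b""))

h2 = {1: b"old-1", 2: b"old-2"}
flash = NocopyFlashCopy(h2)
flash.write_source(1, b"new-1")        # first write to track 1 triggers copy on write
print(flash.read_target(1), flash.read_target(2))   # b'old-1' b'old-2'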
With standard FlashCopy, you need twice the capacity at the secondary site as compared to the primary site. With extent space-efficient (ESE), or thin-provisioned, volumes as the FlashCopy targets, you still need twice the logical capacity as on the primary site. However, the physical storage in the extent pool needs only the same physical capacity as on the primary site, plus additional physical capacity for the ESE target volumes to capture the current consistency group’s changes. The physical capacity of the journal volumes is therefore less than with fully provisioned journal volumes. The physical capacity that you need is determined by how long you can tolerate an inactive Global Mirror session. If Global Mirror does not initiate FlashCopy operations by forming consistency groups, which releases the space consumed by the target (journal) volumes at the start of establish processing, the extent pool can fill up as tracks continue to receive updates from the primary site.
For detailed information about ESE volumes (thin provisioning), see IBM DS8880 Thin Provisioning, REDP-5343.
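As a rough, purely illustrative capacity comparison (all numbers are invented), the following snippet contrasts fully provisioned journal volumes with ESE journal volumes that only need physical space for the changes captured between consistency groups.

primary_tb = 100.0        # provisioned capacity at the primary site (example value)
cg_changes_tb = 2.0       # data changed between two consistency groups (example value)

# Fully provisioned journal volumes: H2 volumes plus full-size J volumes.
fully_provisioned_physical = primary_tb + primary_tb

# ESE journal volumes: H2 volumes plus physical space for the CG deltas only.
ese_physical = primary_tb + cg_changes_tb

print(f"fully provisioned secondary: {fully_provisioned_physical:.0f} TB physical")
print(f"ESE journal secondary:       {ese_physical:.0f} TB physical")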
19.2.2 Consistency group formation
The microcode automatically triggers a sequence of autonomic events to create a set of consistent data volumes at the secondary site. This set of consistent data volumes is a consistency group. The following sections describe the sequence of events that create a consistency group.
The creation of a consistency group requires three steps that are internally processed and controlled by the microcode. These steps are fully transparent and do not require any other external code invocation or user action.
The numbers in Figure 19-7 illustrate the sequence of the events that are involved in the creation of a consistency group. This illustration provides only a high-level view to aid in understanding how this process works.
Figure 19-7 Formation of a consistent set of volumes at the secondary site
Note that before step 1 and after step 3, Global Copy constantly scans through the OOS bitmaps and replicates data from H1 volumes to H2 volumes.
When the creation of a consistency group is triggered by the Global Mirror master, the following steps occur:
1. Coordination includes serializing all of the Global Copy primary volumes in the Global Mirror session. This serialization imposes a brief hold on all incoming write I/Os to all involved Global Copy primary volumes. After all primary volumes are serialized across all involved primary DS8000s, the pause on the incoming write I/O is released and all further write I/Os are now noted in the CR bitmap. They are not replicated until step 3 on page 209 is done, but application write I/Os can immediately continue. This serialization phase takes only a few milliseconds and the default coordination time is set to 50 ms.
2. Drain is the process of replicating all remaining data that is indicated in the OOS bitmaps and not yet replicated. After all out-of-sync bitmaps are empty, step 3 is triggered by the microcode from the primary site.
3. Now the H2 volumes contain all data up to the coordinated point in time and are consistent because of the serialization process in step 1 and the completed replication, or drain, process in step 2. Step 3 is a FlashCopy that is triggered by the primary system’s microcode as an inband FlashCopy command to volume H2, as the FlashCopy source, and the journal volume J, as the FlashCopy target volume. This FlashCopy is a two-phase process. First, the FlashCopy command is issued to all involved FlashCopy pairs in the Global Mirror session. Then, the master collects the feedback and all incoming FlashCopy completion messages. When all FlashCopy operations complete successfully, the master concludes that a new consistency group was created successfully and performs commit operations against each of the FlashCopy relationships.
FlashCopy applies here only to changed data since the last FlashCopy operation because the start change recording property was set at the time when the FlashCopy relationship was established. The FlashCopy relationship does not end because the relationship is persistent. Because of the nocopy attribute, only copy on write operations cause physical tracks to be copied from the source to the target.
When step 3 is complete, a consistent set of volumes is created at the secondary site. This set of volumes, the H2 and J volumes, represents the consistency group.
For this brief moment only, the H2 volumes and the J volumes are equal in their content. Immediately after the FlashCopy process is logically complete, the primary storage systems are notified to continue with the Global Copy process from H1 to H2. To replicate the changes to the H1 volumes that occurred during the step 1 to step 3 window, the change recording bitmap is merged into the empty out-of-sync bitmap, and from now on, all arriving write I/Os end up again in the out-of-sync bitmap. Then, the conventional Global Copy process, as outlined in 14.1, “Global Copy overview” on page 108, continues until the next consistency group creation process is started.
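Pulling the three steps together, the following Python sketch walks a single volume pair through coordination, drain, and FlashCopy, including the merge of the change recording bitmap back into the out-of-sync bitmap. It is a conceptual restatement of the sequence above under simplified assumptions (one pair, a full snapshot instead of a nocopy FlashCopy), not the microcode implementation.

from typing import Dict, Set

def form_consistency_group(oos: Set[int], cr: Set[int],
                           h1: Dict[int, bytes], h2: Dict[int, bytes],
                           journal: Dict[int, bytes]) -> None:
    # Step 1 - coordinate: write I/O is briefly held (not modeled) and all new
    # host writes are recorded in the CR bitmap instead of the OOS bitmap.

    # Step 2 - drain: replicate every track that is still flagged out of sync.
    while oos:
        track = oos.pop()
        h2[track] = h1[track]

    # Step 3 - FlashCopy: H2 is now consistent; capture it as the new journal
    # point-in-time copy (modeled here as a simple snapshot of H2).
    journal.clear()
    journal.update(h2)

    # Afterward: merge the CR bitmap into the now-empty OOS bitmap so that the
    # writes that arrived during formation are replicated by Global Copy.
    oos |= cr
    cr.clear()

h1 = {1: b"a", 2: b"b"}
h2: Dict[int, bytes] = {}
journal: Dict[int, bytes] = {}
oos, cr = {1, 2}, {2}            # track 2 was written again during formation
form_consistency_group(oos, cr, h1, h2, journal)
print(sorted(journal), sorted(oos))   # [1, 2] [2]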
 
Note: If any error occurs preventing the successful completion of the FlashCopy commands, the relationships that were established are reverted to the previous consistency group. So, there is always a consistent set of data at the remote site.
19.2.3 Consistency group parameters
There are three externalized parameters that can be specified by a user (see Figure 19-8 on page 210). Their default values can be overridden.
Figure 19-8 Consistency group tuning parameters
The following parameters are available:
Maximum coordination time
In the first step, the serialization step, Global Mirror serializes all related Global Mirror primary volumes across all participating primary storage systems. This parameter dictates, for all of the Global Copy primary volumes that belong to this session and consistency group, the maximum time that is allowed for forming the change recording bitmaps for each volume. This time is measured in milliseconds (ms). The default is 50 ms.
Maximum drain time
This time is the maximum time that is allowed for draining the out-of-sync bitmap after the process to form a consistency group is started and step 1 of Figure 19-8 completes successfully. The maximum drain time is specified in seconds. The default is 30 seconds. You might want to increase this time window when you replicate over a longer distance and with limited bandwidth.
If the maximum drain time is exceeded, Global Mirror fails to form the consistency group and evaluates the current throughput of the environment. If the evaluation indicates that another drain failure is likely, Global Mirror stays in Global Copy mode while regularly re-evaluating the situation to determine when to form the next consistency group. If this situation persists for a significant period, Global Mirror eventually forces the formation of a new consistency group. In this way, Global Mirror ensures that during periods when the bandwidth is insufficient, production performance is protected, and data is transmitted to the secondary site in the most efficient manner possible. When the peak activity has passed, consistency group formation resumes in a timely fashion. (A sketch of how the three parameters interact follows this list.)
Consistency group interval time
After a consistency group (CG) is created, the consistency group interval time (CGI) determines how long to wait before starting the formation of the next consistency group. This interval is specified in seconds, and the default is zero seconds. Zero seconds means that consistency group formation happens continuously: when a consistency group is created successfully, the process to create the next consistency group starts again immediately.
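The following Python sketch is a hedged illustration of how the three parameters interact in one formation cycle: coordination is bounded by the maximum coordination time, the drain phase by the maximum drain time, and successful formations are spaced by the consistency group interval time. The default values are taken from the text; everything else is invented for illustration.

import time

MAX_COORDINATION_TIME_MS = 50    # default maximum coordination time
MAX_DRAIN_TIME_S = 30            # default maximum drain time
CG_INTERVAL_S = 0                # default interval: form consistency groups continuously

def drain(deadline_s: float) -> bool:
    # Stand-in for draining the OOS bitmaps; returns False on a drain timeout.
    start = time.monotonic()
    time.sleep(0.01)             # simulated replication work
    return (time.monotonic() - start) <= deadline_s

def consistency_group_cycle() -> bool:
    # Step 1: coordination takes a few milliseconds and is never allowed to
    # exceed the maximum coordination time.
    time.sleep(min(0.005, MAX_COORDINATION_TIME_MS / 1000.0))

    # Step 2: drain, bounded by the maximum drain time. On a timeout, Global
    # Mirror stays in Global Copy mode and re-evaluates before the next attempt.
    if not drain(MAX_DRAIN_TIME_S):
        return False

    # Step 3: the FlashCopy commit would happen here; then wait the CG interval
    # before the next cycle begins.
    time.sleep(CG_INTERVAL_S)
    return True

print("consistency group formed:", consistency_group_cycle())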