Disaster recovery principles
The discussion of TS7700C disaster recovery (DR) principles in this chapter supplements the information that is available in Chapter 5, "Disaster Recovery", of the IBM Redbooks publication TS7700 Release 4.2 Guide. It also adds disaster recovery considerations that apply when a Cloud Storage Tier is attached to a TS7700 Grid.
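This chapter includes the following topics:
11.1 Tier to Cloud Considerations
11.2 Cloud object store availability
11.3 Required data for restoring the host environment
11.4 Volume sizing
11.5 Recovery time objectives
11.6 Production activity and bandwidth
11.7 Redundancy in the cloud
11.8 Cost of object store retrieval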
11.1 Tier to Cloud Considerations
With the introduction of a new cloud storage tier, your production cluster and DR cluster can both be TS7760Cs that are connected to the same cloud object store. However, even if both clusters are attached to the same cloud object store and the logical data for a production volume is in that store, the DR host (by way of the DR cluster) can access that data only if the copy policy on the production cluster sends a copy to the DR cluster and that copy completed replication to the DR cluster.
The DR cluster being attached to the same cloud object store as the production cluster does not allow it to access the data that is stored in the cloud object store by the production cluster. The best way to think about a cloud object store (as it relates to functionality from a DR perspective) is that it is synonymous with a tape-attached TS7700.
11.2 Cloud object store availability
An important aspect of data management with the TS7700C is that data in a cloud object store is accessible only to a cluster where the logical volume (LVOL) was resident in that cluster’s TVC or was replicated to that cluster across the grid.
In the case of replication, both the choice of copy policies and the time at which replication completes are important. For example, copies on the deferred queue might not be complete when a disaster occurs. Even if a copy of the data is present in the object store in the cloud, that copy is inaccessible from any cluster to which the LVOL did not finish replicating.
If the data is critical, the copy policies must be set up with Run or (better yet) Sync on the DR cluster so that a copy of the LVOL is placed in the DR cluster TVC. After that copy is replicated, the volume can be migrated to the object store in the cloud from either site and still be accessible from the DR cluster.
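The accessibility rule in this section can be expressed as a small model. The following Python sketch is illustrative only: the class, field, and cluster names are hypothetical and do not correspond to any TS7700 interface. It assumes that a cluster can use a volume's cloud copy only if the volume was written on that cluster or finished replicating to it.

```python
from dataclasses import dataclass, field


@dataclass
class LogicalVolume:
    """Hypothetical model of an LVOL; not a TS7700 data structure."""
    volser: str
    # Clusters where the volume was written or where replication completed.
    completed_copies: set = field(default_factory=set)
    # True once the volume data has been migrated to the cloud object store.
    in_cloud: bool = False


def accessible_from(volume: LogicalVolume, cluster: str) -> bool:
    """A cluster can use the volume only if it holds a completed copy.

    Attachment to the same cloud object store does not matter on its own.
    """
    return cluster in volume.completed_copies


# Deferred copy that never finished: the DR cluster cannot read the volume,
# even though the data is in the shared cloud object store.
deferred = LogicalVolume("A00001", {"prod"}, in_cloud=True)

# Run or Sync copy that completed before the disaster.
synced = LogicalVolume("A00002", {"prod", "dr"}, in_cloud=True)

print(accessible_from(deferred, "dr"))  # False
print(accessible_from(synced, "dr"))    # True
```

In this model, the `in_cloud` flag never influences the result, which mirrors the point that sharing a cloud object store does not by itself give the DR cluster access.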
11.3 Required data for restoring the host environment
Volumes that contain data that is required to restore the host environment, such as DASD full-volume backups of data that is required for IPL, must be kept in resident partitions. Although backups of these volumes can be kept in the object store in the cloud, the most current full-volume backups of the DASD pool normally are not good candidates for migration to the object store in the cloud because of the recall time that is required to restore them.
Likewise, primary data (for example, HSM ML2 files) is not a good candidate to keep only in object stores in the cloud because the time that is required to recall such data can delay restoring the host to operational status.
11.4 Volume sizing
An important aspect of object store management is the consideration of multi-file volumes. When a volume is recalled from an object store, it must be recalled in its entirety. Even if only one file must be accessed, the entire logical tape volume must be recalled to the TVC. Therefore, volumes that hold multiple files must be sized carefully. Unless all of the files on a volume are typically accessed together, smaller logical volume sizes might be preferred.
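The effect of whole-volume recall can be illustrated with simple arithmetic. The following Python sketch uses a hypothetical helper and illustrative sizes (not measured or recommended values) to show how much data is transferred relative to the data that is actually needed.

```python
def recall_overhead(volume_size_gib: float, needed_gib: float) -> float:
    """Return how many times more data is recalled than is needed.

    Assumes the whole logical volume is always recalled, as described above.
    """
    if not 0 < needed_gib <= volume_size_gib:
        raise ValueError("needed_gib must be in (0, volume_size_gib]")
    return volume_size_gib / needed_gib


# Illustrative sizes only: a 1 GiB file on a large versus a small volume.
print(recall_overhead(25.0, 1.0))  # 25.0
print(recall_overhead(4.0, 1.0))   # 4.0
```

In this example, reading a single 1 GiB file from the larger volume moves 25 times the needed data, which is the reason that smaller volumes can be preferable for selectively accessed files.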
11.5 Recovery time objectives
When a TS7700C is part of the disaster recovery plan, consider the amount of data in the object store in the cloud that is required to be recalled if a disaster occurs. The amount of data that is required and how quickly that data can be moved from the object store in the cloud back to the DR cluster factors heavily into the recovery time objectives. Any volumes that need to be immediately available to minimize the time it takes to return to operational status must be kept in resident partitions on the DR cluster.
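As a rough planning aid, the recall portion of the recovery time can be estimated from the amount of data and the sustained throughput. The following Python sketch is a simplification: the function name and the sample figures are assumptions, and it ignores latency, retries, and concurrent workloads.

```python
def estimated_recall_hours(data_tib: float, throughput_mib_per_s: float) -> float:
    """Estimate hours needed to recall data_tib at a sustained throughput."""
    if data_tib < 0 or throughput_mib_per_s <= 0:
        raise ValueError("data must be >= 0 and throughput must be > 0")
    data_mib = data_tib * 1024 * 1024  # TiB -> MiB
    return data_mib / throughput_mib_per_s / 3600  # seconds -> hours


# Illustrative figures only: 10 TiB recalled at a sustained 400 MiB/s.
print(round(estimated_recall_hours(10, 400), 2))  # 7.28
```

If an estimate such as this one exceeds the recovery time objective, the affected volumes are candidates for resident partitions on the DR cluster.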
11.6 Production activity and bandwidth
After the host system is recovered and normal operations resume, some data that must be recalled might still exist in the object store in the cloud. The grid links are shared between the object store access points and the other clusters in the grid. If the copy policies in the grid are replicating volumes between the remaining clusters while the object store recalls are ongoing, grid link performance might degrade.
Consider the temporary use of copy policies that limit the number of copies being written across the grid links until all object store recalls are complete. Alternatively, size the links so that enough bandwidth is available at the DR site to accommodate both the object store retrieval from the cloud and the normal grid workload.
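The trade-off between replication and recall traffic can be approximated with a simple model. The following Python sketch assumes that replication traffic subtracts directly from the link capacity that is available to recalls. That assumption, the function name, and the sample figures are all hypothetical; real link behavior depends on the network and on the workload.

```python
def recall_slowdown(link_mib_per_s: float, replication_mib_per_s: float) -> float:
    """Return the factor by which recalls slow down when sharing the link.

    Simplified model: recalls receive whatever capacity replication leaves.
    """
    remaining = link_mib_per_s - replication_mib_per_s
    if link_mib_per_s <= 0 or replication_mib_per_s < 0 or remaining <= 0:
        raise ValueError("replication must leave some link capacity for recalls")
    return link_mib_per_s / remaining


# Illustrative figures only: 1000 MiB/s link, 600 MiB/s of replication traffic.
print(recall_slowdown(1000, 600))  # 2.5
print(recall_slowdown(1000, 0))    # 1.0
```

Under this model, temporarily reducing replication traffic shortens the recall window, while adding link capacity keeps both workloads running at the cost of more bandwidth.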
11.7 Redundancy in the cloud
Consider the possibility of a failure that affects the availability of the cloud. If the data that is stored in the cloud is critical to operations, you might want to replicate the cloud data to multiple locations to reduce the chance that such an outage limits access to the data. For more information, see Chapter 2, “Container resiliency” on page 9.
11.8 Cost of object store retrieval
The policies of the cloud service provider dictate the costs that are associated with retrieving data from the object store in the cloud. The speed at which that data must be recalled and the amount of data are factors in those costs. Other factors can also be important, such as the need for redundancy in the cloud and the destination to which you are transferring the data.
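A first-order cost estimate can be sketched as follows. The per-GB rates in this Python example are placeholders, not the pricing of any provider, and the function is hypothetical. Actual charges depend on the provider, the storage class, the retrieval speed, and the transfer destination.

```python
def retrieval_cost(data_gb: float, retrieval_rate_per_gb: float,
                   egress_rate_per_gb: float) -> float:
    """Estimate cost as data volume times (retrieval + egress) rate per GB."""
    if data_gb < 0 or retrieval_rate_per_gb < 0 or egress_rate_per_gb < 0:
        raise ValueError("inputs must be non-negative")
    return round(data_gb * (retrieval_rate_per_gb + egress_rate_per_gb), 2)


# Placeholder rates only: 10,000 GB at 0.01 per GB retrieval, 0.05 per GB egress.
print(retrieval_cost(10_000, 0.01, 0.05))  # 600.0
```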
 