SnapMirror
This chapter introduces IBM System Storage N series SnapMirror. SnapMirror allows a volume (flexible or traditional) or qtree to be replicated between IBM System Storage N series storage systems over a network, typically for backup or disaster recovery purposes, but it can also be used for application testing, load balancing, and remote access to data. SnapMirror is enhanced by the introduction of FlexVol and FlexClone technology, and by the introduction of synchronous and semi-synchronous modes.
13.1 SnapMirror at a glance
SnapMirror, as shown in Figure 13-1, is a feature of Data ONTAP that enables you to replicate data from specified source volumes or qtrees to specified destination volumes or qtrees, respectively.
Figure 13-1 SnapMirror overview
You need a separate license to use SnapMirror. After the data is replicated to the destination storage system, you can access the data on the destination to perform the following actions:
Provide users with immediate access to mirrored data in case the source goes down.
Restore the data to the source to recover from disaster, data corruption (qtrees only), or user error.
Archive the data to tape.
Balance resource loads.
Back up or distribute the data to remote sites.
You can configure SnapMirror to operate in one of the following modes:
Asynchronous mode: SnapMirror replicates Snapshot copies to the destination at specified, regular intervals.
Synchronous mode: SnapMirror replicates data to the destination as soon as the data is written to the source volume.
Semi-synchronous mode: SnapMirror replication at the destination volume lags behind the source volume by 10 seconds. This mode is useful for balancing the need for synchronous mirroring with the performance benefit of asynchronous mirroring.
SnapMirror can be used with traditional volumes and FlexVol volumes.
13.2 Introduction to SnapMirror
SnapMirror is a chargeable feature of IBM System Storage N series storage systems (it requires a license code). It allows a volume or qtree to be replicated between IBM System Storage N series storage systems over a network for backup or disaster recovery purposes, but it can also be used for application testing, load balancing, and remote access to data.
After an initial baseline transfer of the entire volume or qtree, as shown in Figure 13-2, subsequent updates transfer only new and changed data from the source to the destination. This makes SnapMirror highly efficient in terms of network bandwidth utilization. The result is an online, read-only volume (mirror) that contains the same data as the source volume at the time of the most recent update.
Figure 13-2 Baseline creation
To replicate data for the first time, the storage system transfers the active file system and all Snapshots from the source volume to the mirror. After the storage system finishes transferring the data, it brings the mirror online. This version of the mirror is the baseline for future incremental changes. As with any other volume, you can then export the mirror for Network File System (NFS) mounting or add a share corresponding to this volume for Common Internet File System (CIFS) sharing.
To make incremental changes on the mirror, the storage system takes regular Snapshots on the source volume according to the schedule specified in the configuration file. By comparing the current Snapshot with the previous Snapshot, the storage system determines what changes it must make to synchronize the data in the source volume and the data in the mirror.
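As an illustration, the following 7-mode command sequence sketches how such a mirror is typically created; the system names (itsosrc, itsodst) and volume names are hypothetical, and syntax can vary by Data ONTAP release:

itsodst> vol create vol1_mirror aggr1 100g
itsodst> vol restrict vol1_mirror
itsodst> snapmirror initialize -S itsosrc:vol1 itsodst:vol1_mirror

The vol restrict step takes the destination offline so that the baseline transfer can populate it; after initialization completes, the mirror comes online read-only.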
The destination volume is available for read-only access, or the mirror can be broken to enable writes to occur on the destination. After the mirror is broken, it can be re-established by synchronizing the changes made on the destination back to the source file system.
A variation on the basic SnapMirror deployment involves a writable source volume replicated to multiple read-only destinations. This deployment makes a uniform set of data available on a read-only basis to users at various locations throughout a network, and allows that data to be updated uniformly at regular intervals.
13.2.1 The need for SnapMirror
SnapMirror software provides a fast, flexible enterprise solution for mirroring or replicating data over local or wide area networks. SnapMirror is used for these purposes:
Disaster recovery
Remote enterprise-wide online backup
Data replication for local read-only access at a remote site
Application testing on a dedicated read-only mirror
Data migration between IBM System Storage N series storage systems
SnapMirror technology is a key component of enterprise data protection strategies. If a disaster occurs at a source site, businesses can access mission-critical data from a mirror on another IBM System Storage N series storage system, ensuring uninterrupted operation (Figure 13-3). Enterprise tape backups are made from the SnapMirror destination rather than the production system, reducing CPU load on the production system.
The IBM System Storage N series storage system can be located virtually any distance from the source. It can be in the same building, or on the other side of the world, as long as the interconnecting network has the necessary bandwidth to carry the replication traffic that is generated.
Figure 13-3 SnapMirror
The advantage of SnapMirror over a volume copy (created with the vol copy command) is that SnapMirror supports these functions:
Automated and scheduled Snapshot updates
Incremental Snapshot updates
Qtree level replication between the source and the mirror
There are three modes of operation to replicate the data between the source and the mirror volume:
Asynchronous mode: In the traditional asynchronous mode of operation, updates of new and changed data from the source to the mirror volume occur on a schedule defined by the storage administrator. These updates can be as frequent as once per minute or as infrequent as once per week, depending on user needs.
Synchronous mode: This mode sends updates from the source to the destination as they occur, rather than on a schedule. If configured correctly, it can guarantee that data written on the source system is protected on the mirror volume, even if the entire source system fails due to natural or human-caused disaster. In addition to a standard SnapMirror license, the synchronous feature requires a special license key.
Semi-synchronous mode: This mode can minimize loss of data in a disaster while also minimizing the performance impact of replication on the source volume. In order to maintain consistency and ease of use, the asynchronous and synchronous interfaces are identical with the exception of a few additional parameters in the configuration file.
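For illustration, the mode is selected per relationship in the /etc/snapmirror.conf file on the destination system. A minimal sketch, assuming 7-mode syntax and hypothetical system and volume names (the semi-sync keyword is available only in releases that support it):

itsosrc:vol1 itsodst:vol1 - 0 * * *
itsosrc:vol2 itsodst:vol2 - sync
itsosrc:vol3 itsodst:vol3 - semi-sync

The first entry is asynchronous, updating at minute 0 of every hour (the four schedule fields are minute, hour, day of month, and day of week); the other two replace the schedule with the sync or semi-sync keyword.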
 
Important: Starting with Data ONTAP 8.1, N series systems support volume SnapMirror replication between 32-bit and 64-bit volumes.
13.2.2 Rules for using SnapMirror
The following rules apply when using SnapMirror:
The source and the mirror volume must be of the same volume type, that is, both must be a traditional or a flexible volume.
The mirror volume must be manually created because SnapMirror does not automatically create the mirror volume.
SnapMirror can be used through a firewall. SnapMirror listens for connections on TCP port 10566; you must allow the TCP port range 10565 through 10569.
The source volume must be online.
The mirror cannot be the root volume.
The capacity of the mirror must be greater than or equal to the capacity of the source volume. The configuration of the volumes, however, can be different.
The mirror volume must run under a version of Data ONTAP equal to or later than that of the SnapMirror source volume. If the IBM System Storage N series storage systems must be upgraded, the system that hosts the mirror volume must be upgraded before the system that hosts the source volume. This rule applies only to volume replication, not to qtree replication.
Quotas cannot be enabled on a mirror.
Qtrees cannot be created on a mirror. However, if qtrees exist in the source volume, the storage system mirrors them to the destination.
SnapMirror replicates a file system on one volume to a read-only copy on another volume.
SnapMirror is based on Snapshot technology. Only changed blocks are copied after the initial mirror is established.
It runs over IP or FC.
Data is accessible read-only at remote sites.
Replication is either volume based or qtree based.
IP name resolution must be configured properly before using SnapMirror. If no DNS or hosts file is configured, the IP address must be used and IP address checking must be enabled for SnapMirror.
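Two of these rules are commonly handled with a pair of settings. As a sketch (hypothetical host names, 7-mode syntax), the destination is granted access on the source, and name resolution can be satisfied with matching /etc/hosts entries on both systems when no DNS is available:

itsosrc> options snapmirror.access host=itsodst

Older configurations can instead list the allowed destination host names in the /etc/snapmirror.allow file on the source.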
 
Restriction: The maximum number of entries in /etc/snapmirror.conf is 1024 lines.
The maximum number of concurrent replication operations with the Nearstore feature enabled varies per N series model:
For Volume SnapMirror:
 – N3000 models can handle up to 100 concurrent replication operations.
 – N6000 and N7000 models can handle up to 300 concurrent replication operations.
For Qtree SnapMirror:
 – N3000 models can handle up to 160 concurrent replication operations.
 – N6000 and N7000 models can handle up to 512 concurrent replication operations.
13.3 The three modes of SnapMirror
SnapMirror can be used in three different modes:
Asynchronous
Synchronous
Semi-synchronous
We explain these modes in more detail in the following sections.
13.3.1 Asynchronous mode
In asynchronous mode, shown in Figure 13-4, SnapMirror performs incremental, block-based replication as frequently as once per minute. Consult your technical team for the best plan for your environment or to determine whether synchronous SnapMirror is a better match. The performance impact on the source IBM System Storage N series storage system is minimal as long as the system is configured with sufficient CPU and disk I/O resources.
Figure 13-4 Asynchronous SnapMirror options
Asynchronous mode initialization
The first and most important step in asynchronous mode is the creation of a one-time, baseline transfer of the entire data set. This baseline is required before incremental updates can be performed.
This operation proceeds as follows:
1. The primary storage system takes a Snapshot (a read-only, point-in-time image of the file system). Referring to Figure 13-2 on page 177, this Snapshot is called the baseline copy.
2. All data blocks referenced by this Snapshot and any previous Snapshot copies are transferred and written to the secondary file system.
3. After initialization is complete, the primary and secondary file systems have at least one Snapshot in common.
Asynchronous mode updates
After initialization, scheduled or manually triggered updates can occur. Each update transfers only the new and changed blocks from the primary to the secondary file system. This operation proceeds as follows:
1. The primary storage system takes a Snapshot.
2. The new Snapshot is compared with the baseline Snapshot to determine which blocks have changed.
3. The changed blocks are sent to the secondary and written to the file system.
4. After the update is complete, both file systems have the new Snapshot, which becomes the baseline Snapshot for the next update.
Because asynchronous replication is periodic, SnapMirror is able to consolidate writes on the source volume and conserve network bandwidth.
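Between scheduled transfers, an update can also be triggered manually from the destination system. A sketch with the same hypothetical names as before:

itsodst> snapmirror update vol1_mirror
itsodst> snapmirror status vol1_mirror

The update transfers only the blocks that changed since the Snapshot that the last transfer left in common between the two systems.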
13.3.2 Synchronous mode
Synchronous SnapMirror is a SnapMirror feature that replicates data from a source volume to a partner destination volume at or near the same time that it is written to the source volume, rather than according to a predetermined schedule. This ensures that data written on the source system is protected on the destination even if the entire source system fails. It guarantees zero data loss in the event of a failure, but can have a significant impact on performance, and it is not necessary or appropriate for all applications.
Synchronous SnapMirror (Figure 13-5) replicates data between single storage systems or clustered storage systems located at remote sites using IP or FCP infrastructure, with no special converters required. Synchronous operation is simply a mode of the SnapMirror software, and it requires a special license key to function.
Figure 13-5 Single Path SnapMirror
 
Synchronous SnapMirror is supported only for configurations in which the source and destination systems are the same type of system and have the same disk geometry. The type of system and disk geometry of the destination affect the perceived performance of the source system. Therefore, the destination system must have the bandwidth for the increased traffic and for message logging. Log files are kept on the root volume, so you must ensure that the root volume spans enough disks to handle the increased traffic; it must span four to six disks.
For the best performance, you need to have a dedicated high-bandwidth, low-latency network between the source and destination storage systems. Synchronous SnapMirror can support traffic over Fibre Channel and IP transports.
Disk configurations supported
The following configurations are supported for synchronous SnapMirror relationships:
A source storage system with only ATA disks attached to a destination storage system with only ATA disks attached
A source storage system with only Fibre Channel disks attached to a destination storage system with only Fibre Channel disks attached
Any other configuration of attached disks, such as a combination of ATA and Fibre Channel disks, is not supported.
Terminology
To avoid any potential confusion, it is appropriate to review exactly what is meant by the term synchronous in this context. The best way to do this task is to examine a scenario where the primary data storage device fails completely and then examine the disaster’s impact on an application.
In a typical application environment, the following steps occur:
1. A user saves information in the application.
2. The client software communicates with a server and transmits the information.
3. The server software processes the information and transmits it to the operating system on the server.
4. The operating system software sends the information to the storage.
5. The storage acknowledges receipt of the data.
6. The operating system tells the application server that the write is complete.
7. The application server tells the client that the write is complete.
8. The client software tells the user that the write is complete.
In most cases, these steps take only tiny fractions of a second to complete. If the storage system fails in such a way that all data on it is lost (for example, as a result of a fire or flood that destroys all of the storage media), the impact to an individual transaction varies based on when the failure occurs, as explained here:
If the failure occurs before step 5, the storage never acknowledges receipt of the data. This results in the user receiving an error message from the application, indicating that it failed to save the transaction.
If the failure occurs after step 5, the user sees client behavior that indicates correct operation (at least until the following transaction is attempted). Despite the indication by the client software (in step 8) that the write was successful, the data is lost.
The first case is obviously preferable to the second, because it provides the user or application with knowledge of the failure and the opportunity to preserve the data until the transaction can be attempted again. In the second case, the data can be discarded based on the belief that it is already safely stored.
With traditional asynchronous SnapMirror, data is replicated from the primary storage to a secondary or destination storage device on a schedule. If this schedule were configured to cause updates once per hour, for example, it is possible for a full hour of transactions to be written to the primary storage, and acknowledged by the application, only to be lost when a failure occurs before the next update. For this reason, many customers attempt to minimize the time between transfers. Some customers replicate as frequently as once per minute, which significantly reduces the amount of data that can be lost in a disaster.
This level of flexibility is good enough for the vast majority of applications and users. In most real-world environments, loss of one minute or five minutes of data is of trivial concern compared with the downtime incurred during such an event. Any disaster that completely destroys the data on the IBM System Storage N series storage system will most likely also destroy the relevant application servers, critical network infrastructure, and so on.
However, there are some customers and applications that have a zero data loss requirement even in the event of a complete failure at the primary site, as shown in Figure 13-6.
Figure 13-6 Availability
For these situations, synchronous mode is appropriate because it modifies the application environment described earlier so that replication of data to the secondary storage occurs with each transaction, as explained here:
1. A user saves information in the application.
2. The client software communicates with a server and transmits the information.
3. The server software processes the information and transmits it to the operating system on the server.
4. The operating system software sends the information to the primary storage.
5. The primary storage sends the information to the secondary storage.
6. The secondary storage acknowledges receipt of the data.
7. The primary storage acknowledges receipt of the data.
8. The operating system tells the application server that the write is complete.
9. The application server tells the client that the write is complete.
10. The client software tells the user that the write is complete.
The key difference, from the application’s point of view, is that the storage does not acknowledge the write until the data has been written to both the primary and the secondary storage. This has some performance impact, as described later, but modifies the failure scenario in beneficial ways:
If the failure occurs before step 7, the storage never acknowledges receipt of the data. This results in the user receiving an error message from the application, indicating that it failed to save the transaction. This causes inconvenience, but no data loss.
If the failure occurs during or after step 7, the data is safely preserved on the secondary storage system despite the failure of the primary.
 
Attention: Regardless of what technology is used, it is always possible to lose data. The key point here is that with synchronous mode, loss of data that has been acknowledged is prevented.
Operation
The first step involved in synchronous replication is a one-time, baseline transfer of the entire data set, just as in asynchronous mode, as described in 13.3.1, “Asynchronous mode” on page 180.
 
Tip: The base SnapMirror license must be installed before the synchronous SnapMirror license.
After the baseline transfer has completed, SnapMirror can change to synchronous mode, as follows:
1. Asynchronous updates occur, as described earlier, until the primary and secondary file systems are close to being synchronized.
2. NVLOG forwarding begins. It is a method for transferring updates as they occur.
3. Consistency point (CP) synchronization begins. It is a method for ensuring that writes of data from memory to disk storage are synchronized on the primary and secondary systems.
4. New writes from clients or hosts on the primary file system are blocked until acknowledgment of those writes has been received from the secondary system.
5. A final update occurs using the same method as asynchronous updates.
After SnapMirror has determined that all data acknowledged by the primary has been safely stored on the secondary, the system is in synchronous mode. At this point, the output of a SnapMirror status query shows that the relationship is in sync.
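The status can be checked from the destination; output along these lines indicates a relationship in sync (hypothetical names, and the exact columns vary by release):

itsodst> snapmirror status
Snapmirror is on.
Source            Destination         State          Lag        Status
itsosrc:vol2      itsodst:vol2        Snapmirrored   -          In-sync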
 
Attention: If the environment is unable to maintain synchronous mode (because of networking or destination issues), SnapMirror drops to asynchronous mode. When the connection is re-established, the source IBM System Storage N series system asynchronously replicates data to the destination once each minute until synchronous replication is re-established. Each change of status (into or out of synchronous mode) is logged. This safety net is known as fail-safe synchronous.
Synchronous mode paths
More than one physical path might be required for a synchronous mirror. Synchronous SnapMirror supports up to two paths for a particular relationship. These paths can be Ethernet, Fibre Channel, or a combination of the two.
Multipath support allows synchronous and semi-synchronous traffic to be load-balanced between these paths and provides for failover in the event of a network outage. There are two modes of multipath operation (see the configuration sketch after this list):
Multiplexing mode, as shown in Figure 13-7, in which both paths are used simultaneously and load balancing transfers across the two. When a failure occurs, the load from both transfers moves to the remaining path.
Figure 13-7 SnapMirror multipath
Failover mode, in which one path is specified as the primary path in the configuration file. This path is the desired path and is used until a failure occurs. The second path is then used.
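As a sketch, both modes are expressed in /etc/snapmirror.conf by defining a named connection and then referencing it in place of the source system name. The connection name and IP addresses here are hypothetical; each parenthesized pair is a source address followed by a destination address:

itsoconn = multi(10.1.1.10,10.2.1.10)(10.1.2.10,10.2.2.10)
itsoconn:vol2 itsodst:vol2 - sync

Replacing multi with failover in the first line selects failover mode, in which the first address pair is the preferred path.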
The role of NVLOG in synchronous SnapMirror
NVLOG forwarding is a critical component of synchronous mode operation. It is the method by which write operations submitted by clients against the primary file system are replicated to the destination.
When NVLOG forwarding is active in synchronous mode, write handling is modified as described here:
The request is journaled in non-volatile RAM (NVRAM). It is also recorded in cache memory and forwarded over the network to the SnapMirror destination system, where it is journaled in NVRAM and cache memory.
After the request is safely stored in NVRAM and cache memory on both the primary and secondary systems, Data ONTAP acknowledges the write to the client system, and the application that requested the write is free to continue processing.
As can be seen, NVLOG forwarding is the primary mechanism by which data is synchronously protected.
The synchronous SnapMirror replication mode synchronously replicates writes from the source NVRAM to the destination NVRAM. After the data transfer is completed, an acknowledgment is sent from the destination. This is known as NVLOG forwarding.
At this point, the data is not yet written to disk. When the NVRAM is half full, or 10 seconds have passed since the previous Consistency Point (CP), Data ONTAP creates a tetris that computes the RAID parity information and the data blocks to be written to disk. It also forwards the same tetris data to the destination system until the CP data is written on the destination. This is known as CP forwarding.
If NVLOG or CP forwarding is delayed because of a network or storage error, synchronous mode falls back to asynchronous mode. Because there are two writes, one during NVLOG forwarding and another during CP forwarding, you must plan for twice the amount of data being written in synchronous replication mode.
13.3.3 Semi-synchronous mode
SnapMirror also provides a semi-synchronous mode, sometimes called semi-sync. In this mode, SnapMirror can be configured to lag behind the source volume by a user-defined number of write operations or milliseconds.
Semi-synchronous mode overview
This mode is like asynchronous mode in that the application does not need to wait for the secondary storage to acknowledge the write before continuing with the transaction.
(Of course, for this reason, it is possible to lose acknowledged data.)
This mode is also like synchronous mode in that updates from the primary storage to the secondary storage occur right away, rather than waiting for scheduled transfers. This makes the potential amount of data lost in a disaster very small. Semi-synchronous mode minimizes data loss in a disaster, while also minimizing the extent to which replication impacts the performance of the source system.
Semi-synchronous mode provides a middle ground that keeps the primary and secondary file systems more closely synchronized than asynchronous mode. Configuration of semi-synchronous mode is identical to configuration of synchronous mode, with the addition of an option that specifies how many writes can be outstanding (unacknowledged by the secondary system) before the primary system delays acknowledging writes from the clients.
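A sketch of such an entry in /etc/snapmirror.conf, assuming a release that accepts the outstanding= option (later releases replace this tunable with the fixed semi-sync keyword shown earlier); the names are hypothetical:

itsosrc:vol3 itsodst:vol3 outstanding=5s sync

Here the primary acknowledges client writes as long as the secondary is no more than 5 seconds behind; the window can also be given as a number of operations (for example, outstanding=5ops) or in milliseconds.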
Internally, semi-synchronous mode works identically to synchronous mode in most cases. The only difference lies in how quickly client writes are acknowledged. The replication methods used are the same. However, it is possible to configure semi-synchronous mode in a way that changes the replication strategy. A CP is triggered when NVRAM is one-half full, or every 10 seconds, whichever occurs sooner. If semi-synchronous mode is configured to allow unacknowledged transactions greater than 10 seconds old, SnapMirror falls back to performing CP synchronization only. NVLOG forwarding is halted, because a CP synchronization is sufficiently frequent to meet the service level requested.
When a CP synchronization occurs under such circumstances, the tetris sent to the secondary IBM System Storage N series storage system includes not just the list of data blocks to be written, but also the content of those data blocks. This is because, with NVLOG forwarding disabled, the secondary system does not have a copy of the data until the CP synchronization occurs.
For the vast majority of customer configurations, NVLOG forwarding is desirable. Thus, configuring SnapMirror to allow more than 10 seconds of outstanding data is not desirable for customers who want higher synchronicity levels.
However, if NVLOG forwarding is not required, specifying a large time value for outstanding data might reduce the overall CPU usage on the primary storage system. This configuration can allow for significant increases in overall throughput if CPU usage is a limiting factor.
 
Tip: Unlike asynchronous mode, which can replicate either volumes or quota trees, synchronous and semi-synchronous modes work only with volumes.
Semi-synchronous mode scenario
The semi-synchronous mode scenario consists of the following steps:
1. A user saves information in the application.
2. The client software communicates with a server and transmits the information.
3. The server software processes the information and transmits it to the operating system on the server.
4. The operating system software sends the information to the primary storage.
5. The primary storage sends the information to the secondary storage. The primary storage simultaneously acknowledges receipt of the data.
6. The operating system tells the application server that the write is complete.
7. The application server tells the client that the write is complete.
8. The client software tells the user that the write is complete.
9. At some point after step 5, the secondary acknowledges receipt of the data. (Note that step 9 can potentially occur before, or simultaneously with, step 6.)
If the secondary storage system is slow or unavailable, it is possible that a large number of transactions can be acknowledged by the primary storage system and yet not protected on the secondary. These transactions represent a window of vulnerability for the loss of acknowledged data.
For a window of zero size, customers can use fully synchronous mode rather than semi-sync. With semi-sync, the size of this window is customizable based on user and application needs. It can be specified as a number of operations, milliseconds, or seconds.
If the number of outstanding operations equals or exceeds the number of operations specified by the user, further write operations will not be acknowledged by the primary storage system until some have been acknowledged by the secondary.
Likewise, if the oldest outstanding transaction has not been acknowledged by the secondary within the amount of time specified by the user, further write operations will not be acknowledged by the primary storage system until all responses from the secondary are being received within that time frame.
13.4 SnapMirror applications
You can use SnapMirror for the following applications:
For data replication for local read access at remote sites:
 – Slow access to corporate data is eliminated.
 – You can off-load tape backup CPU cycles to a mirror (Figure 13-8).
Figure 13-8 Data replication for warm backup/off-load
To isolate testing from production volume:
 – ERP testing
 – Offline reporting
For cascading mirrors:
 – Replicated mirrors on a larger scale
For disaster recovery:
 – Replication to a hot site for mirror failover and eventual recovery
The Data ONTAP SnapMirror feature can be used in combination with FlexClone volumes to perform migration faster and more efficiently:
 – For enterprises with a warm backup site, or those that must off-load backups from production servers
 – For generating queries and reports on near-production data
13.5 Synchronous and asynchronous implications
With synchronous SnapMirror, a Snapshot is made on the destination volume every time that a write is done on the source. The Snapshot can be deleted from the clone, but not from the source volume, while the SnapMirror relationship is in sync. Synchronous SnapMirror has a hard lock. In contrast, asynchronous SnapMirror has a soft lock. If the process falls out of synchronous mode, it reverts to asynchronous mode and becomes a soft lock.
Synchronous SnapMirror keeps the source and destination in sync as much as possible, but it falls out of synchronous mode in these situations:
If the NVLOG channel requests (per op) time out
If the CP on the source takes more than one minute
If network errors persist even after three retransmissions
If the source or destination fails to restart
If the network connection fails
In such situations, synchronous SnapMirror falls back to completing an asynchronous update within one minute. When the condition clears, it turns consistency point forwarding and NVLOG forwarding back on.
13.6 Volume capacity and SnapMirror
The source capacity must be less than or equal to the destination capacity when using flexible volumes. When the administrator performs a SnapMirror break and the destination capacity is greater than the source capacity, the destination volume shrinks to match the capacity of the smaller source volume. This is a much more efficient use of disk space because it avoids consuming unused space.
13.7 Guarantees in a SnapMirror deployment
Guarantees determine how the aggregate preallocates space to the flexible volume. SnapMirror never enforces guarantees, regardless of how the source volume is set: while the destination volume is a SnapMirror destination (replica), its guarantee is volume-disabled. When the volume is broken off with a SnapMirror break, the guarantee mode takes effect on the volume again.
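You can observe this on a destination volume with vol options; in 7-mode releases the guarantee is typically reported with a disabled marker, along these lines (hypothetical volume name, and the exact wording varies by release):

itsodst> vol options vol1_mirror
... guarantee=volume(disabled) ...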
13.8 SnapMirror architecture
A full Snapshot is created (named Snap A) and then a baseline transfer to the target volume is performed, as shown in Figure 13-9.
Figure 13-9 SnapMirror detail
Figure 13-10 displays the SnapMirror internal operation.
Figure 13-10 SnapMirror internals
As you might expect in a 24x7 operation, updates to the source volume continue to occur while the baseline image is transferred, leading to the creation of Snap B. Because Snap A is a point-in-time Snapshot, its integrity is maintained, and the baseline image of Snap A is transferred consistently, as shown in Figure 13-11.
Figure 13-11 Consistent SnapMirror
For reference purposes, we refer to Snap A as T0 time and to Snap B as T1 time. At T1 time, a Snapshot is taken again, capturing an image of the volume at that point in time. After completion of Snap B, an incremental transfer is initiated (it is incremental because only portions of the volume have changed since T0 time).
Updates continue to occur, but the Snapshot maintains the integrity of Snap B. After completion of the incremental transfer, there is now a consistent full image copy of the source volume as it looked at T1 time (Figure 13-12).
Figure 13-12 Snap C consistency
Operations continue and now another Snapshot is taken (Snap C or T2 time), capturing an image of the volume at that point in time. After completion of Snap C, an incremental transfer is initiated (it is incremental because only portions of the volume have changed since T1 time, that is, Snap B).
Updates continue to occur but the Snapshot maintains the integrity of Snap C. After completion of the incremental transfer, there is now a consistent full image copy of the source volume as it looked at T2 time.
13.9 Isolating testing from production
After a consistent image (that is, a baseline image and subsequent incremental transfers) is captured, the SnapMirror relationship is broken and the target is enabled for write operations and is ready for application testing, and so on (Figure 13-13).
Figure 13-13 Isolate testing from production
During this time, the source volumes continue to be available online. Note that at any time, you can resync forward by re-establishing the mirror relationship.
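A sketch of this cycle with the same hypothetical names used earlier: quiesce and break the mirror for testing, then resync to re-establish it (resync discards any changes on the destination that are newer than the last common Snapshot):

itsodst> snapmirror quiesce vol1_mirror
itsodst> snapmirror break vol1_mirror
(run tests, writing to vol1_mirror)
itsodst> snapmirror resync -S itsosrc:vol1 vol1_mirror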
13.10 Cascading mirrors
Cascading is a method of replicating from one destination system to another in a series, as shown in Figure 13-14. For example, you might want to perform synchronous replication from the primary site to a nearby secondary site, and asynchronous replication from the secondary site to a far-off tertiary site. Currently, only one synchronous SnapMirror relationship can exist in a cascade.
Figure 13-14 Cascading mirrors
13.10.1 Cascading replication
Figure 13-15 shows an example of cascading replication.
Figure 13-15 Cascading replication example
You can replicate to multiple locations (as many as 30) across the continent:
You send data only once across the WAN.
You reduce resource utilization on the source IBM System Storage N series storage system.
13.10.2 Disaster recovery
SnapMirror can become one of the methods to recover or continue operations in a disaster, as shown in Figure 13-16. Here are various requirements for business continuity that might require SnapMirror:
Enterprises that cannot afford the downtime of a full restore from tape (days)
Data-centric environments
Environments where the mean time to recovery after a disaster must be reduced
Figure 13-16 Disaster recovery example
13.11 Performance impact of synchronous and semi-synchronous modes
Performance is a complex and difficult area to quantify, and it is beyond the scope of this book to describe IBM System Storage N series storage system performance in full. But we do examine what effects synchronous SnapMirror has on individual system performance and how it affects overall performance.
It is important to note that the guidelines and preferred practices that follow are not exact measurements. Any synchronous replication method, regardless of the technology used, will have an impact on the performance of applications using the storage.
Understanding business requirements for application performance and data protection allows an organization to make informed choices between various data protection strategies. When examining the application performance impact of synchronous or semi-synchronous replication, there are two primary factors to consider:
Overall system throughput might be reduced due to these factors:
 – CPU impact imposed by the replication process
 – Network bandwidth constraints between the primary and secondary storage
 – Slower system performance on the secondary storage than on the primary
 – Impact of workload on the secondary storage, thus reducing its ability to service replication traffic
 – Root volume performance on the secondary system
Individual write operations take longer to complete due to the need for additional processing of each operation and network latency between the primary and secondary storage.
Our description of these factors focuses on the primary storage system and its client applications. The preferred practice is to provide a dedicated secondary storage system for synchronous or semi-synchronous replication, and we assume that this preferred practice is being followed. Thus, performance impact on the secondary storage system is not considered an important issue except insofar as it creates an impact on the primary storage system.
13.12 CPU impact of synchronous and semi-synchronous modes
When a system running SnapMirror in synchronous or semi-synchronous mode receives a write request from a client, it must do all of the standard processing that would normally be required. It must also do additional processing, related to SnapMirror, to transfer the information to the secondary storage system. This adds significant CPU impact to every write operation.
Although it is beyond the scope of this book to describe the individual components of this CPU impact in detail, it is helpful to illustrate the concept using an example. Reading or writing information over network connections is one task performed by an IBM System Storage N series storage system. Higher volumes of data being passed across the network result in more CPU usage on the storage system. So if the network-related CPU impact is considered independently of other factors, a client writing data to the IBM System Storage N series storage system at 30 MBps will use about half of the CPU used by a client writing data at 60 MBps.
When replicating data in synchronous or semi-synchronous mode, all of the data written to the primary by clients must also be passed across a network to the secondary system. So in addition to processing the data coming in from clients, the IBM System Storage N series CPU must do additional work to send the same data back out to the secondary system.
The same basic mechanism is at work in other CPU-intensive parts of the software in addition to networking. So in general, you can expect about double the CPU usage on a system with synchronous or semi-synchronous SnapMirror as compared with the same workload on a system without SnapMirror. You can use the stats command to display statistics on your CPU (Example 13-1).
Example 13-1 The stats command
itsotuc2*> stats show processor
processor:processor0:processor_busy:1%
processor:processor1:processor_busy:1%
 
itsotuc2*> stats show system
system:system:nfs_ops:0/s
system:system:cifs_ops:0/s
system:system:http_ops:0/s
system:system:dafs_ops:0/s
system:system:fcp_ops:0/s
system:system:iscsi_ops:0/s
system:system:net_data_recv:1KB/s
system:system:net_data_sent:0KB/s
system:system:disk_data_read:0KB/s
system:system:disk_data_written:8KB/s
system:system:cpu_busy:1%
system:system:avg_processor_busy:0%
system:system:total_processor_busy:1%
system:system:num_processors:2
13.13 Network bandwidth considerations
Because all of the data written to the primary storage must be replicated to the secondary storage as it is written, write throughput to the primary storage cannot generally exceed the bandwidth available between the primary and secondary storage devices. Because SnapMirror transfers can be performed over standard Ethernet networks and over Fibre Channel networks, there is a choice for transport. This choice will most likely be determined by preference or existing infrastructure rather than by performance needs.
In general, the configuration guideline is to configure the network between the primary and secondary storage with at least as much bandwidth as the network between the clients and the primary storage.
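If the replication network is shared with other traffic, transfers can be throttled per relationship with the kbs= argument in /etc/snapmirror.conf. A sketch with the same hypothetical names as earlier:

itsosrc:vol1 itsodst:vol1 kbs=5000 0 * * *

This caps the transfer at 5,000 KBps so that an hourly update cannot saturate the link.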
13.14 Replication considerations
Table 13-1 outlines the maximum concurrent replication operations for the different flavors of SnapMirror operations. Values are provided for Data ONTAP 8.1 with the Nearstore feature enabled; without it, the values would be only 50% of those stated. Starting with Data ONTAP 8.1, the Nearstore feature is enabled by default.
Table 13-1 Concurrent replication operations with Nearstore feature enabled

Storage system model    Async Volume   Sync or Semi-Sync   Async Qtree   SnapVault   Open Systems
                        SnapMirror     Volume SnapMirror   SnapMirror                SnapVault
N3150                   50/100         16                  120/60        120         64
N3220, N3240, N6220     50/100         16                  320/160       320         128
N6250, N7550T, N7950T   150/300 (1)    32                  512/256 (2)   512         128

(1) Source/Target
(2) Single-path/Multi-path
Note that the system resources for replication all come from a shared pool. For example, an N7950T can support either 150/300 asynchronous Volume SnapMirror streams or 32 synchronous Volume SnapMirror streams, but not both at the same time.
Next we show the approximate number of replication streams available when the modes are used in combination on an N7950T. When the maximum number of replication streams is in use, there is no capacity for any more. A similar sharing of resources occurs on the other N series models. See Figure 13-17.
Figure 13-17 Replication streams available
 
 
13.14.1 Maximum concurrent transfers for clustered configurations
In a cluster configuration, each node can have its own set of replication operations that are limited by the maximum number of concurrent transfers for that node.
In the event of a cluster takeover, replication operations from the node taken over are managed by the node that performed the takeover.
If the combined number of concurrent transfers is less than the maximum allowed for the single node, all of the replication operations can run concurrently.
If the combined number of concurrent transfers is greater than the maximum allowed for the single node, the replication operations for the node performing the takeover are run concurrently. Then, as a replication operation finishes, a replication operation from the taken over node replaces the finished operation.
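For example (using purely illustrative limits), if each node supports a maximum of 150 concurrent transfers and each node is running 100 replication operations when a takeover occurs, the surviving node immediately runs 150 of the combined 200 operations and starts each of the remaining 50 as a running operation finishes.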
Obviously, this can present a problem if the total number of synchronous replication streams in a cluster is greater than a single node can support in the event of a takeover. We advise using a MetroCluster solution rather than a large number of synchronous SnapMirror relationships.