Performance considerations
This chapter describes performance considerations regarding TS7700 cloud attach and includes the following topics:
15.1 Generalizing a grid configuration using units of work
The performance behavior of the TS7700, including its handling of cloud Object Store data, depends on the configuration. First, it is important to understand the data flow within a TS7700 grid.
Figure 15-1 shows a sample data flow in a two-cluster grid that consists of a TS7700C (CL0) and a TS7700T (CL1). It is an example of a near worst-case scenario in which all data is replicated and premigrated to both tape and cloud.
Figure 15-1 TS7700 data flow sample in a two-cluster grid
The goal of this example is to show how the cumulative throughput of the TVC disk cache can be a factor in a configuration's performance, and how the total bandwidth of the grid links can be a factor in its throughput. The example that is shown in Figure 15-1 breaks down each activity to and from the TVC disk cache, and each activity on the grid network, into units of work.
This list describes the assumptions of the example in Figure 15-1:
- Each cluster receives 300 MBps of uncompressed data from its connected hosts.
- All logical volumes use RUN or Deferred copy mode with a zero DCT.
- All logical volumes are premigrated to cloud on CL0 and to 3592 tape on CL1.
- The data compression ratio is 3:1.
- Few to no logical volumes are read by the host in this example.
With a 300 MBps channel speed and 3:1 compression, the A1 and B1 units of work are each 100 MBps after compression. If all activities are at equilibrium in a sustained state of operation, each arrow, or unit of work, must match that 100 MBps throughput.
CL0 has five total arrows (TVC reads and writes) coming into or out of its TVC disk cache. Therefore, its disk cache must sustain a total of 500 MBps of compressed 1:1 mixed read/write throughput. CL1 must sustain the same rate because it also has five units of work, or arrows, into and out of its disk cache.
If deferred copy throttling (DCT) is enabled, replication can be deferred, which allows fewer units of work into and out of the TVC disk cache. Premigration to the cloud or tape might also be delayed or skipped, which further reduces the total demand on disk cache throughput.
Even the most complex grid configurations can be generalized by using this basic unit-of-work concept, which helps determine whether the disk cache is a potential performance limiter.
The TS7700 performance white paper lists the maximum mixed 1:1 throughput of your disk cache configuration based on how many physical drawers are installed. That figure can then be used to determine the expected maximum sustained state of operation of the solution. If remote copies to a third location are also occurring, those copies add more units of work to the TVC disk cache and the grid network links.
For this same example, the total units of work on the CL0 grid network are four: one outbound for replication, one inbound for replication, and two for outbound cloud premigration. The cumulative read/write rate of the grid network at CL0 must be 400 MBps to sustain the worst case scenario in this example. Again, limiting replication or deferring or skipping premigration can reduce the workload on the links.
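The arithmetic of this example can be sketched as follows. This is an illustrative calculation only (not TS7700 code); the rates and unit counts come from the Figure 15-1 assumptions.

```python
# Unit-of-work arithmetic for the Figure 15-1 example. Each arrow into or
# out of the TVC disk cache, and each transfer on the grid links, is one
# unit of work at the compressed host rate.
host_rate_mbps = 300        # uncompressed host write rate per cluster
compression_ratio = 3       # 3:1 compression, per the example assumptions
unit_mbps = host_rate_mbps / compression_ratio   # 100 MBps per unit of work

cache_units_cl0 = 5   # five arrows into/out of the CL0 TVC disk cache
grid_units_cl0 = 4    # replication out + in, plus two cloud premigration units

print(unit_mbps * cache_units_cl0)  # required disk cache throughput: 500.0 MBps
print(unit_mbps * grid_units_cl0)   # required grid-link throughput: 400.0 MBps
```

The same tally works for any grid: count the arrows per cluster, multiply by the compressed host rate, and compare against the published disk cache and network limits.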
15.2 Cloud attach-specific performance considerations
This section describes the cloud attach-specific performance considerations.
15.2.1 Network bandwidth and premigration queue size
Network bandwidth to public cloud Object Stores is often limited when compared to private on-premises Object Stores. Maximum throughput to an Object Store is also likely slower than the speed at which the TS7700 can write to 3592 physical tape drives.
Therefore, the premigration queue can build up faster on a TS7700C cluster because it might have slower premigration speeds than a comparable TS7700T configuration. If the premigration backlog causes the sustained speed of operation of the TS7700C to be slower than expected (excessive throttling), consider adding premigration queue increments.
FC5274 (1 TB Active Premigration Queue) and FC5279 (5 TB Active Premigration Queue) are features that allow for an increase of premigration queue size.
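A back-of-envelope estimate can show whether more queue capacity is needed. The sketch below uses illustrative rates (not measured TS7700 figures) to estimate how fast the backlog grows when compressed host ingest outpaces the sustained rate to the Object Store.

```python
# Estimate premigration backlog growth when compressed host ingest
# exceeds the sustained premigration rate to the Object Store.
# Both rates below are assumptions for illustration.
def backlog_growth_tb_per_hour(ingest_mbps: float, premig_mbps: float) -> float:
    surplus_mbps = max(0.0, ingest_mbps - premig_mbps)  # MBps accumulating in cache
    return surplus_mbps * 3600 / 1e6                    # MB/hour -> TB/hour

# 100 MBps compressed ingest vs. 40 MBps to the cloud: ~0.216 TB/hour of
# backlog, which would consume a 1 TB FC5274 increment in under five hours.
print(backlog_growth_tb_per_hour(100, 40))
```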
If the grid remote write, replication, and cloud premigration activity exceeds the available bandwidth of the grid links, throttling or delays can occur. Therefore, it is ideal that the available grid network bandwidth can accommodate the expected throughputs of the configuration. Lower than needed bandwidth speeds can result in delayed RPO times, delayed premigration rates to the cloud, or lower than expected host rates when synchronous or RUN replication types are used.
15.2.2 Logical volume size
If network bandwidth is limited, premigration to the cloud and recall from the cloud take longer for a logical volume than the equivalent operations on 3592 tape drives. For example, transferring a 25 GB logical volume requires almost 40 minutes if only 100 Mbps of bandwidth is available. Use smaller volume sizes for workloads that require frequent recalls so that mount completion and access to data occur sooner.
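The transfer time above can be estimated with simple arithmetic. This sketch ignores protocol and Object Store overhead, which in practice pushes the 25 GB case from roughly 33 minutes of raw transfer toward the almost 40 minutes cited.

```python
# Rough raw transfer time for a logical volume over a constrained link.
# Overhead (TLS, HTTP, retries) makes real transfers somewhat slower.
def transfer_minutes(volume_gb: float, link_mbps: float) -> float:
    bits = volume_gb * 1000**3 * 8          # decimal GB to bits
    return bits / (link_mbps * 1e6) / 60    # seconds -> minutes

print(round(transfer_minutes(25, 100), 1))  # 33.3 minutes raw at 100 Mbps
print(round(transfer_minutes(5, 100), 1))   # 6.7 minutes for a 5 GB volume
```

The second call illustrates the recommendation: a 5 GB volume becomes accessible roughly five times sooner than a 25 GB volume on the same link.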
15.2.3 Premigrate and recall time outs
You can customize the premigrate and recall timeout values by using the LIBRARY REQUEST CLDSET command. The timeout values are based on a 1 GB scaling factor, which allows you to choose a rate that accommodates all volume sizes.
You can also set the maximum number of concurrent premigrate and recall tasks by using the LIBRARY REQUEST CLDSET command. If network bandwidth is limited, you might need to set longer timeout values. If too many tasks share the bandwidth, choose a smaller number of concurrent tasks so that each task receives a larger portion of the available bandwidth.
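The effect of the 1 GB scaling factor can be sketched as follows. The minutes-per-GB rate here is an arbitrary assumed value for illustration, not a CLDSET default, and the actual command keywords are not shown.

```python
# Illustration of a per-GB timeout scaling factor: with one rate, the
# effective timeout grows with volume size, so a single setting covers
# both small and large volumes. The 4 minutes/GB value is assumed.
def effective_timeout_minutes(volume_gb: float, minutes_per_gb: float) -> float:
    return volume_gb * minutes_per_gb

print(effective_timeout_minutes(25, 4))   # 100.0 minutes for a 25 GB volume
print(effective_timeout_minutes(2, 4))    # 8.0 minutes for a 2 GB volume
```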
 