IBM TS7700 usage considerations
This chapter provides a general overview of the necessary information and considerations to choose the best configuration for your needs.
The IBM TS7700 offers a great variety of configuration options, features, functions, and implementation parameters. Many of the options serve different purposes, and their interactions with other options can affect how they contribute to achieving your business goals.
For some environments, these features and functions are mandatory, whereas for other environments, they only raise the level of complexity. There is no “one configuration and implementation fits all needs” solution. Therefore, you need a plan to build an environment to meet your requirements.
This chapter summarizes what to consider during the planning phase, especially for introducing new features. It also provides some general suggestions about considerations for the day-to-day operation of a TS7700 environment.
This chapter includes the following topics:
3.1 Introduction
Since the first days of tape usage, the world has changed dramatically. The amount of data that is stored has increased, as have the sources of data and the legal requirements that govern it. The technical possibilities for data management have also grown dramatically.
In each new release of the IBM Virtual Tape Server (VTS), IBM has delivered new features to support the most demanding client needs. Consider that although some of these functions are needed in your environment, others are not:
Some features are independent of all others; others are not.
Certain features have a strong effect on the behavior of your environment, for example, performance or data availability.
Specific features influence your setup of the environment.
Some features can be overruled by Override settings.
Therefore, although these functions might be necessary to support different client requirements, they might not be required in all use cases. In fact, they might only add complexity to the solution, or they might result in unexpected behaviors. Therefore, understanding the available features, and how they complement, interact, and affect each other, helps you plan an optimal and successful implementation.
3.1.1 A short look at history
At first, data was measured in megabytes (MB). Terabytes (TB) of data were hosted only by a few mainframe clients. You always knew the location of your data. When you mounted a tape (by operator or by robot), you could be sure that your data was written directly to that specific tape. The ownership of physical tapes was clear. If you wanted to have two tapes, you needed duplex writing from the host. If you wanted to relocate specific data to a different storage location, you moved that data to a specific tape.
Your batch planners ensured that, if multifile volumes were used, the files belonged to the same application, and that the same rules (duplexing and moving) applied. Sharing resources between multiple different logical partitions (LPARs), multiple IBM Z operating systems, or even multiple clients was mostly not wanted or needed. You were always sure where your data on tape was located. Another important aspect was that users did not expect fast read response times for data on tape.
3.1.2 Challenges of today’s businesses
The amount of data is increasing tremendously. Legal retention requirements, compliance needs, and increased redundancy for business continuance have driven much of this growth. In addition, new sources of data are further drivers:
Email
Social networks and their global implications
Web shops, which record not only your actual shopping, but also your interests and buying patterns, and send you personalized information by email or other communication paths
Electronic retention of paper documents
Digital media
This large amount of data must all be stored and protected, and it must be quickly and readily accessible. With more primary workloads ending up on tape, response time requirements have become more demanding.
Due to cost pressures, businesses are adopting tiered storage environments. Older or infrequently used data must be on less expensive storage, whereas highly accessed data must stay on primary storage, which enables fast access. Applications such as Content Manager, Hierarchical Storage Manager (HSM), or output archivers are rule-based and can shift data from one storage tier to another, which can include tape. If you are considering using such applications, plan the tier concept carefully.
3.1.3 Challenges of technology progress
With advanced technology, there are challenges, such as having many new options to meet many client needs. For example, the TS7700 has many options regarding where data can be located, and where and how it must be replicated. Investing some time in choosing the correct set of rules helps you meet your requirements.
Also, the TS7700 itself decides which workloads must be prioritized. Depending on the cluster availability in the grid, actual workload, or other storage conditions, the copy queues might be delayed. In addition, the TS7700 automates many decisions to provide the most value. This dynamic behavior can sometimes result in unexpected behaviors or delays. Understanding how your environment behaves, and where your data is stored at any point in time, is key to having a successful implementation, including the following considerations:
During a mount, a remote Tape Volume Cache (TVC) might be chosen over a local TVC.
Copies are intentionally delayed due to configuration parameters, yet they were expected to complete sooner.
Copy Export sets do not include all of the expected content because the export was initiated from a cluster that was not configured to receive a replica of all the content.
A reaction might be to configure your environment to define synchronous and immediate copies to all locations, or to set all overrides. This likely increases the configuration capacity and bandwidth needs, which can introduce negative results. Planning and working with your IBM account team so that the optimal settings can be configured helps eliminate any unexpected behaviors.
Other features, such as scratch allocation assistance (SAA) and device allocation assistance (DAA), might affect your methodology of drive allocation, whereas some customizing parameters must always be used if you are a Geographically Dispersed Parallel Sysplex (GDPS) user.
So, it is essential for you to understand these mechanisms to choose the best configuration and customize your environment. You need to understand the interactions and dependencies to plan for a successful implementation.
 
Important: There is no “one solution that fits all requirements.” Do not introduce complexity when it is not required. Allow IBM to help you look at your data profile and requirements so that the best solution can be implemented for you.
3.2 Gather your business requirements
There are several different types of business requirements that you need to consider.
3.2.1 Requirement types
Consider as a starting point the lists that are described in this section.
Requirements from the data owners, application administrators, and the applications
The following items should be considered when you gather data and application requirements:
How important is the data? Consider multiple copies, Copy Consistency Points, retention requirements, and business recovery time expectations.
How often will the data be accessed, and what retrieval times are expected? Consider sizing and performance.
How will the application react if the tape environment is not available? Consider high availability (HA) and disaster recovery (DR) planning and copy consistency.
How will the application react if specific data is not available? Consider HA and DR planning and copy consistency.
How much storage for the data is needed? Factor in any future growth.
What are the performance expectations during an outage or disaster event?
What protection against cybercrime is needed? Consider golden copies.
It can be difficult to get all of the required information from the owners of the data and the owners of the applications to best manage the data. Using service level agreement (SLA) requirements and an analysis of your existing tape environment can help with the process.
Requirements from the IT department
The following items should be considered when you gather information technology (IT) requirements:
Support of the general IT strategy (data center strategy and DR site support)
Sharing of a TS7700 environment between multiple LPARs or sysplexes (definition of logical pools, physical pools, and performance)
Sharing of a TS7700 in a multi-tenancy environment (logical pools, physical pools, Selective Device Access Control (SDAC), export and migration capabilities, and performance)
Support of zAutomation concepts (monitoring and validation)
Environmental requirements (power, cooling, and space)
Financial requirements
Multiple platforms required (IBM Z operating systems)
Monitoring and automation capabilities to identify issues and degradations
Maintenance (microcode) and defect repair strategy
Capacity forecast
Network infrastructure
Depending on your overall IT strategy, application requirements and data owner requirements can be used to select an appropriate TS7700 configuration. If you have multiple data centers, spread your clusters across the data centers, and ensure that copies of the data are in each data center.
If your approach is that each data center can host the total workload, plan your environment accordingly. Consider the possible outage scenarios, and verify whether any potential degradations for certain configurations can be tolerated by the business until the full equipment is available again.
In a two-cluster environment, there is always a trade-off between availability and a nonzero recovery point. Assume that data protection is the highest priority for your workload. During a planned outage, no redundant copy is generated for new or modified workloads, which might be unacceptable. Putting new allocations on hold during this period might be optimal. If availability is rated higher, you might accept the risk of a single copy during an outage so that operations can continue.
However, more advanced TS7700 configurations can be implemented that enable both availability and data protection to be equally important, for example, a four cluster grid. Consider what type of data you store in your TS7700 environment. Depending on your type of data, you have multiple configuration choices. This section starts with a general view before looking closer at the specific types of data.
3.2.2 Environment: Source of data
Depending on the method of creating data, you might have different requirements. Assume that you have all four types of systems to create data:
Sandbox system: Used to verify new operating and subsystem versions
Development system: Used to develop new applications
User Acceptance Test (UAT) system: Used for integration and performance testing
Production system
Consider the following guidelines:
Data from a sandbox system (regardless of whether it is backup or active data) might not need multiple copies because you can re-create the information from other sources (new installation, and so on).
Application data from a development system might not need multiple copies in different storage pools or data centers because the data can be re-created from production systems.
Application code from a development system likely needs multiple copies because that data might not be re-created from elsewhere.
If physical tape is present, have UAT-created content migrate to physical tape so that precious disk cache space is not used.
Not all production or backup workloads that target the TS7700 need to be replicated. Perhaps those workloads are managed differently for DR needs, or you do not need that workload in a DR event. These non-replicated workloads can optionally be Copy Exported as a DR alternative if replication is not feasible.
Data from your sandbox, test, UAT, or production system might share the tape environment, but it can be treated differently. That is important for sizing, upgrades, and performance considerations as well.
 
Note: Plan your environments and the general rules for different types of environments. Understand the amount of data that these environments host today.
3.2.3 Backup data, active data, and archive data
In general, data from different applications has different requirements for your tape environment. Your tape processing environment can be all virtual, all physical, or a combination of the two.
Backup data
The data on tape is only a backup. Under normal conditions, it will not be accessed again. It might be accessed again only if there are problems, such as direct access storage device (DASD) hardware problems, logical database failures, and site outages.
Expiration
The expiration period is mostly short or medium.
Availability requirements
If the tape environment is not available for a short time, the application workload can still run without any effect. While the solution is unavailable, the backup to tape cannot be processed.
Retrieval requirements
Physical tape recall can normally be tolerated, at least for previous generations of the backup.
Multiple copies
Depending on your overall environment, a single copy (not in the same place as the primary data) might be acceptable, perhaps on physical tape. However, physical media might fail or a storage solution or its site might experience an outage. Therefore, one or more copies are likely needed. These copies might exist on more media within the same location or ideally at a distance from the initial copy.
If you use multiple copies, a Copy Consistency Point of Deferred might suffice, depending on your requirements.
Active data on tape
The data is stored only on tape; it does not also exist on DASD. If the data needs to be accessed, it is read from the tape environment.
Expiration
The expiration depends on your application.
Availability requirements
When the tape environment is not available, your original workload might be severely affected.
Retrieval requirements
Physical tape recalls might not be tolerated, depending on your data source (sandbox, test, or production) or the type of application. Older, less-accessed active data might tolerate physical tape recalls.
Multiple copies
Although tape is the primary source, a single copy is not suggested. Even a media failure can result in data loss. Multiple copies should be stored in different locations to be prepared for a data center loss or outage. In a stand-alone environment, dual copies on physical tape are suggested.
Depending on the recovery point objective (RPO) of the data, choose an appropriate Copy Consistency Point policy. For example, synchronous mode replication is a good choice for these workloads because it can achieve a zero RPO at sync point granularity.
Especially for DFSMShsm ML2, HSM backups, and OAM objects, use synchronous mode copy.
Archive data on tape
Archive data on tape is also active data. However, archive data is stored for a long time. Expiration periods of 10 - 30 years to satisfy regulatory requirements are common. Sometimes, logical Write Once Read Many (LWORM) data is required.
Expiration
The expiration depends on your application, but it is usually many years.
Availability requirements
Archive data is seldom accessed for read. If the tape environment is not available, your original workload might still be affected because you cannot write new archive data.
Retrieval requirements
Physical tape recalls might be tolerated.
Multiple copies
Although tape is the primary source, a single copy is not suggested because even a media failure can result in data loss. Store multiple copies in different locations to be prepared for a data center loss. In a stand-alone environment, dual copies on physical tape are suggested.
Depending on the criticality of the data, choose an appropriate Copy Consistency Point Policy.
Archive data sometimes must be kept for 10 - 30 years. During such long time periods, technology progresses, and data migration to newer technology might need to take place. If your archive data is on physical tapes in a TS7740/TS7700T, you must also consider the life span of physical tape cartridges. Some vendors suggest that you replace their cartridges every five years; other vendors, such as IBM, offer tape cartridges that have longer lifetimes.
If you are using a TS7740/TS7700T and you store archive data in the same storage pools as normal data, there is a slight chance that, due to the reclaim process, the number of stacked volumes that contain only archive data will increase. In this case, these cartridges might not be used (for either cartridge reads or reclaim processing) for a long time, and media failures might not be detected. If you have more than one copy of the data, the data can still be accessed. However, you have no direct control over where this data is stored on the stacked volume, and the same condition might occur in other clusters, too.
Therefore, consider storing data with such long expiration dates in a specific stacked volume pool. Then, you can plan regular migrations (for example, every 5 - 10 years) to another stacked volume pool. Alternatively, you might decide to store this data in the common data pool.
3.2.4 IBM DB2 archive log handling
With IBM DB2, you have many choices for handling your DB2 archive logs. You can put both archive log copies on DASD and rely on a later migration to tape through DFSMShsm or an equivalent application. You can write one archive log to DASD and the other to tape. Alternatively, you can write both directly to tape.
Depending on your choice, the tape environment is more or less critical to your DB2 application. Its criticality also depends on the number of active DB2 logs that you define in your DB2 environment. In some environments, due to peak workload, logs are switched every two minutes. If all DB2 active logs are used and they cannot be archived to tape, DB2 stops processing.
Scenario
You have a four-cluster grid, spread over two sites. A TS7760D and a TS7760T are at each site. You store one DB2 archive log directly on tape and the other archive log on disk. Your requirement is to have two copies on tape:
Using the TS7760 can improve your recovery (no recalls from physical tape needed).
Having a consistency point of R, N, R, N provides two copies, which are stored in both TS7760Ds. As long as one TS7760D is available, DB2 archive logs can be stored to tape. However, if one TS7760D is not available, you have only one copy of the data. In a DR situation where one of the sites is not usable for a long time, you might want to change your policies to replicate this workload to the local TS7760T as well.
If the TS7760D enters the Out of cache resources state, new data and replications to that cluster are put on hold. To avoid this situation, consider having this workload also target the TS7760T and enable the Automatic Removal policy to free space in the TS7760D. Until the Out of cache resources state is resolved, you might have fewer copies than expected within the grid.
If one TS7760D is not available, all mounts must be run on the other TS7760D cluster.
In the unlikely event that both TS7760Ds are not reachable, DB2 stops working when all DB2 logs on the disk are used.
Having a consistency point of R, N, R, D provides you with three copies, which are stored in both TS7760Ds and in the TS7760T of the second location. That exceeds your original requirement, but in an outage of any component, you still have two copies. In a loss of the primary site, you do not need to change your DB2 settings because two copies are still written. In an Out of Cache resources condition, the TS7760D can remove the data from cache because there is still an available copy in the TS7760T.
 
Note: Any application with the same behavior can be treated similarly.
3.2.5 DFSMShsm Migration Level 2
Several products are available on the IBM Z platform for hierarchical storage management (HSM). IBM Data Facility Storage Management Subsystem Hierarchical Storage Manager (DFSMShsm) provides different functions. DFSMShsm migrates active data from disk pools to ML2 tape in which the only copies of these data sets are on tape.
To achieve fast recall times to DASD, consider storing the data in a TS7760D or a TS7760T CP0, at least for a certain period. With time-delayed copies to additional tape-attached clusters (TS7760T, TS7720T, or TS7740) in a grid, you can ensure that the data is kept first in the TS7760D and later copied to a cluster with tape attachment. Auto-removal processing (if enabled) can then remove the content from the TS7760D as it ages.
Ideally, DFSMShsm ML2 workloads should be created with synchronous mode copy to ensure that a data set is copied to a second cluster before the DFSMShsm migration processes the next data set. The DFSMShsm application marks the data set candidates for deletion in DASD. With z/OS 2.1, MIGRATION SUBTASKING enables DFSMShsm to offload more than one data set at a time, so it can do batches of data sets per sync point.
Using TS7700 replication mechanisms rather than DFSMShsm local duplexing can save input/output (I/O) bandwidth, improve performance, reduce the number of logical volumes that are used, and reduce the complexity of bringing up operations at a secondary location.
Other vendor applications might support similar processing. Contact your vendor for more information.
 
Tip: To gain an extra level of data protection, run ML2 migration only after a DFSMShsm backup runs.
3.2.6 Object access method: Object processing
You can use the object access method (OAM) to store and transition object data in a storage hierarchy. Objects can reside on disk (in DB2 tables, the IBM z Systems® file system [zFS], or Network File System [NFS] mountable file systems), on optical devices, and on tape storage devices. You can choose how long an object is stored on disk before it is migrated to tape. If the object is moved to tape, it is active data.
Users accessing the data on tape (in particular on the TS7700T or TS7740) might have to wait for their document until it is read from physical media. The TS7700D or the TS7700T CP0 is traditionally a better option for such a workload because disk cache residency can be much longer, or even indefinite.
For OAM primary objects, use Synchronous mode copy on two clusters and depending on your requirements, additional immediate or deferred copies elsewhere if needed.
With OAM, you can also have up to two backup copies of your object data. Backup copies of your data (managed by OAM) are in addition to any replicated copies of your primary data that are managed by the TS7700. Determine the copy policies for your primary data and any additional OAM backup copies that might be needed. The backups that are maintained by OAM are only used if the primary object is not accessible. The backup copies can be on physical or virtual tape.
3.2.7 Batch processing: Active data
If you create data in the batch process that is not also stored on disk, it is considered active data. The access requirements of these data types can determine whether the data should be placed on a TS7700D or TS7700T CP0, a TS7700T CPx, a TS7740, or a mix of them. For example, active data that needs quick access is ideal for a TS7700D or the resident partition of a TS7700T.
Depending on your environment, you also can place this data on a tape partition of a TS7700T and ensure that the data is kept in cache. This can be done either by delaying premigration or defining the size of the tape partition to keep all of this data in cache.
Rarely accessed data that does not demand quick access times can be put on a TS7700T tape partition with PG0 and no premigration delay, or on the TS7700T in a second cluster.
Data that becomes less important with time can also use the TS7700D or TS7700T CP0 auto-removal policies to benefit from both technologies.
Assume that you have the same configuration as the DB2 archive log example:
With a Copy Consistency Point policy of [N,R,N,R], your data is stored only on the TS7700T CPx or TS7740s (fast access is not critical).
With a Copy Consistency Point policy of [R,N,R,N], your data is stored only on the TS7700Ds (fast access is critical).
With a Copy Consistency Point policy of [R,D,R,D], your copy is on the TS7700Ds first and then also on the TS7700Ts, enabling the older data to age off the TS7700Ds by using the auto-removal policy.
3.2.8 Data type and cache control
You can differentiate the types of data held on a TS7700, as shown in Table 3-1.
Table 3-1 Type of data

| Type of data | Application examples | Fits best on | Suitable cache control |
|---|---|---|---|
| Data that needs a 100% cache hit | OAM objects (primary data), HSM ML2 | TS7700 Disk-Only; TS7700T CP0 | Pinned / PG1; CP0 / Pinned / PG1 |
| Data that benefits from a longer period in disk cache | Depending on the user requirements: OAM objects (primary data), HSM ML2 | TS7700D with auto-removal; TS7700T CPx; TS7740 if duration in disk cache can be minimal | PG1; PG1; PG1 |
| Data that is needed for a specific time in cache, but then should be kept on tape | DB2 log files (depending on your requirements), active batch data | TS7700T; TS7740 | CPx / PG1 with delayed premigration; PG1 |
| Data with limited likelihood to be read (cache pass-through only) | Backups, dumps | TS7700T; TS7740 | CPx / PG0; PG0 |
3.3 Features and functions for all TS7700 models
Based on the gathered requirements, you can now decide which features and functions you want to use in your TS7700 environment.
3.3.1 Stand-alone versus grid environments
Consider a stand-alone cluster under the following conditions:
You do not need high availability or an electronic DR solution.
You can handle the effect on your application of a cluster outage.
In a data center loss, data loss is tolerable or a recovery from Copy Export tapes is feasible (considering time and the DR site).
You can plan outages for Licensed Internal Code loads or upgrade reasons.
If you cannot tolerate any of these items, consider implementing a grid environment.
3.3.2 Sharing a TS7700
Sharing TS7700 resources is supported in most use cases. Whether the environment includes different applications within a common sysplex, independent sysplexes, or IBM Z operating systems, the TS7700 can be configured to provide shared access. The TS7700 can also be shared between multiple tenants.
Because the TS7700 is policy-managed, each independent workload can be treated differently depending on how the data is managed within the TS7700. For example, different workloads can be on different tape partitions in a TS7700T and use independent physical volume pools within a TS7740 or TS7700T. Alternatively, different workloads can use different replication requirements.
All applications within a Parallel Sysplex can use the same logical device ranges and logical volume pools, simplifying sharing resources. When independent sysplexes are involved, device ranges and volume ranges are normally independent, but are still allowed to share the disk cache and physical tape resources.
Of all the sharing use cases, most share the FICON channels into the TS7700. Although the channels can also be physically partitioned, it is not necessary because each FICON channel has access to all device and volume ranges within the TS7700.
However, there are still considerations:
The TVC is shared in a TS7700D, a TS7700T CP0, or a TS7740. You cannot define a physical limit on the amount of TVC space that a client uses. However, through policy management, you can use preference groups differently in these models, or the removal policies can be configured differently, giving more TVC priority to some workloads over others. In a TS7720T, you can define multiple tape partitions so that each tenant gets its own dedicated disk cache residency.
Define the scratch categories that the different systems use. The scratch categories are specified in the DEVSUPxx parmlib member.
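As an illustrative sketch, each sysplex's DEVSUPxx member might assign its own scratch category values so that the systems do not take each other's scratch volumes. The category values 0011/0012 below are assumptions for this example only; choose values that do not collide with other partitions or with library defaults. Sysplex A might specify:

```
MEDIA1=0011,MEDIA2=0012
```

Sysplex B would then specify different values, for example MEDIA1=0021,MEDIA2=0022, in its own DEVSUPxx member.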
Decide which VOLSER ranges the different systems use. This is typically handled through the tape management system (TMS). For DFSMSrmm, it is handled through the PARTITION and OPENRULE parameters.
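A hedged sketch of the corresponding DFSMSrmm EDGRMMxx statements follows, assuming for illustration that this sysplex owns the volumes whose VOLSERs begin with A and that everything else should be rejected by default (both the prefix and the reject-by-default approach are assumptions of this example):

```
OPENRULE  VOLUME(*)  TYPE(ALL) ANYUSE(REJECT)
OPENRULE  VOLUME(A*) TYPE(ALL) ANYUSE(ACCEPT)
PARTITION VOLUME(*)  TYPE(ALL) SMT(IGNORE) NOSMT(IGNORE)
PARTITION VOLUME(A*) TYPE(ALL) SMT(ACCEPT) NOSMT(ACCEPT)
```

With such rules, this system manages only its own VOLSER range and ignores or rejects volumes that belong to other sysplexes. Verify the exact operand syntax against the DFSMSrmm documentation for your release.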
Another main item to consider is how the drives are managed across the different systems, and which systems share which drives. This is typically handled through a tape device sharing product.
Storage management subsystem (SMS) constructs and constructs on the TS7700 must match. If not, new constructs in SMS lead to new constructs in the TS7700 that are created with default parameters. To avoid the uncontrolled buildup of constructs in the TS7700, SMS should be controlled by a single department.
SMS constructs used by different workloads need to use unique names when the TS7700 behavior is expected to be different. This enables each unique application’s behavior to be tuned within the TS7700. If the behavior is common across all shared workloads, the same construct names can be used.
Ensure that the single defined set of constructs within the TS7700 is configured with a behavior that is acceptable to all users. If not, different constructs must be used for those customers.
Control of the TS7700 Management Interfaces (MIs), the TS3500 GUI, and the TS4500 GUI must be granted only to a single department that controls the entire environment. Control must not be given to an individual customer.
Review the IBM RACF statements for the DEVSERV and LIBRARY commands on all LPARs. These commands must be protected. In a multiple-client environment, the use of the LIBRARY command must be restricted.
When independent sysplexes are involved, the device ranges and corresponding volume ranges can be further protected from cross-sysplex access through the SDAC feature.
When device partitioning is used, consider assigning the same number of devices per cluster per sysplex in a grid configuration so that the availability for a given sysplex is equal across all connected clusters.
Override policies set in the TS7700 apply to the whole environment and cannot be enabled or disabled by an LPAR or client.
For more considerations, see the Guide to Sharing and Partitioning IBM Tape Library Data, SG24-4409.
 
Note: Some parameters can be updated by the Library Request command, which changes the cluster behavior. The change applies not only to the LPAR where the command was run, but to all LPARs that use this cluster.
Ensure that only authorized personnel can use the Library Request command.
If you share a library for multiple customers, establish regular performance and resource usage monitoring. For more information, see 3.4, “Features and functions available only for the TS7700T” on page 128.
 
Note: With APAR OA49373 (z/OS V2R1 and later), the individual IBM MVS LIBRARY command functions (EJECT, REQUEST, DISPDRV, and so on) can be protected using a security product such as RACF. This APAR adds security product resource-names for each of the LIBRARY functions.
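A hedged RACF sketch follows. The OPERCMDS profile name and the TAPEADM group are illustrative assumptions for this example; take the exact resource names for each LIBRARY function from the APAR documentation:

```
RDEFINE OPERCMDS MVS.LIBRARY.** UACC(NONE)
PERMIT MVS.LIBRARY.** CLASS(OPERCMDS) ID(TAPEADM) ACCESS(UPDATE)
SETROPTS RACLIST(OPERCMDS) REFRESH
```

After the refresh, only users in the (hypothetical) TAPEADM group can issue LIBRARY command functions on that LPAR.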
3.3.3 Tape Volume Cache selection
Depending on your Copy Consistency Policies, the cluster where the virtual tape mount occurred is not necessarily the cluster that is selected as the TVC. When a TVC other than the local TVC is chosen, this is referred to as a remote mount. Plan the Copy Consistency Policy so that you are aware where your data is at any point in time.
TVC selection is also influenced by some LI REQ parameters. For more information about the LOWRANK and SETTINGS,PHYSLIB parameters, see Library Request Command, WP101091.
TVC selection might also influence Copy Export. For more information, see 12.1, “Copy Export overview and considerations” on page 748.
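For illustration, the host form of the command is LIBRARY REQUEST (LI REQ), followed by a library name and keywords. GRIDLIB below is a placeholder library name, not a real default; the full keyword set and output formats are described in WP101091:

```
LI REQ,GRIDLIB,SETTING
```

Issued without further operands, SETTING displays the current cluster settings, which is a safe way to review values before changing anything. Remember that any change made this way affects all LPARs that use the cluster.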
3.3.4 Copy Consistency policy
Define the consistency policy for each Management Class (MC). For more information, see 2.3.5, “Copy consistency points” on page 68.
The following list describes several general considerations:
When a cluster is assigned a policy of N, this cluster is not the target of a replication activity:
 – This cluster cannot be chosen as the TVC (it can be chosen as the mount point).
 – If only N clusters are available, any mount that uses that MC fails.
 – If the Force Local Copy override is set, a cluster that was assigned N is treated as R when it is the mount point.
A consistency point of [D, D, D, D] means that the selected TVC is treated as RUN, and the additional copies are created asynchronously. For a scratch selection, the mount point cluster is normally chosen as the TVC, although it is not required. Copy Override settings can be used to prefer that it also acts as the TVC.
A consistency point of [D, R, D, D] means that Cluster 1 is preferred as the TVC, even if the other cluster is chosen as the mount point. Therefore, the ‘R’ location is preferred, which can result in a remote mount when the mount point is not the same as the ‘R’ location. This can be done intentionally to create a remote version as the initial instance of a volume.
If you do not care which TVC is chosen and you prefer a balanced grid, use [D, D, D, D].
With the new Time Delayed Replication policy, you can now decide that certain data is only copied to other clusters after a specified time. This policy is designed for data that usually has a short lifecycle, and is replaced shortly with more current data, such as backups and generation data groups (GDGs). In addition, this policy can also be used for data with an unknown retention time, and where the data should be copied only to another cluster when this data is still valid after the given time. Time Delayed Replication policy is targeted especially for multi-cluster grids (3 or more).
With the copy consistency policies available before R3.1, this data was either never replicated or always replicated to the specified clusters. If the target was a TS7740, the data was copied to back-end tape, which can result in excessive reclamation when the data expires early. Now you can specify that this type of data is replicated only if it is still valid after the specified time (after creation or last access) has expired. This approach can reduce replication traffic, and the back-end activities in TS7740s.
However, plan to have at least two copies for redundancy purposes, such as on a local TS7700D/TS7700T and a remote TS7700D/TS7700T.
3.3.5 Synchronous mode copy
Synchronous mode copy creates a copy of the data whenever an explicit or implicit sync point is written from an application. This enables a much more granular copy than all other consistency points, such as Run or Deferred.
This consistency point is ideal for applications that move primary data to tape, such as DFSMShsm or OAM Object Support, which can remove the primary instance in DASD after issuing an explicit sync point.
Therefore, you should use Synchronous mode copy for these types of applications.
The synchronous mode copy offers three options for how to handle private mounts:
Always open both instances on private mount.
Open only one instance on private mount.
Open both instances on z/OS implied update.
Plan the usage of this option. Dual open is necessary for workloads that can append to existing tapes. When only reads take place, dual open can introduce unnecessary resource use, especially when one of the instances requires a recall from a physical tape. Using dual open on z/OS implied update helps limit the resource use to only those mounts where an update is likely to take place.
In addition, synchronous mode copy provides an option to determine its behavior when both instances cannot be kept in sync. One option is to move to the synch-deferred state. Another option is to fail future write operations. Depending on your requirements, determine whether continued processing is more important than creating synchronous redundancy of the workload. For more information, see 2.3.5, “Copy consistency points” on page 68.
3.3.6 Override policies
Override policies overrule the explicit definitions of Copy policies.
 
Note: Synchronous mode is not subject to override policies.
The Override policies are cluster-based. They cannot be influenced by the attached hosts or policies. With Override policies, you can help influence the behavior on how the TS7700 cluster chooses a TVC selection during the mount operation, and whether a copy must be present in that cluster (for example, favoring the local mount point cluster).
Copy Count Override enables the client to define, for this cluster, that a minimum number of copies (two or more) exists at RUN time, without specifying which clusters hold them. If you use Copy Count Override, the grid configuration and the available bandwidth between locations likely determine which RUN copies meet the count criteria. Therefore, the limited number of copies might reside in the closest locations rather than at longer distances. Keep this in mind if you use this override.
3.3.7 Cluster family
Cluster families can be introduced to help with TVC selection or replication activity. You might want to use them for the following conditions:
You have an independent group or groups of clusters that serve a common purpose within a larger grid.
You have one or more groups of clusters with limited bandwidth between the groups and other clusters in the grid.
Cluster families provide two essential features:
During mounts, clusters within the same family as the mount point cluster are preferred for TVC selection.
During replication, groups of clusters in a family cooperate and distribute the replication workload inbound to its family, which provides the best use of the limited network outside of the family.
Therefore, grid configurations with three or more clusters can benefit from cluster families.
3.3.8 Logical Volume Delete Expire Processing versus previous implementations
When a system TMS returns a logical volume to a SCRATCH category during housekeeping processing, the TS7700 is made aware that the volume is now in a SCRATCH category. The default behavior of the TS7700 is to retain the content of the virtual volume, and its used capacity within the TS7700, until the logical volume is reused or ejected. Delete expire provides a means for the TS7700 to automatically delete the contents after a period of time has passed.
A scratch category can have a defined expiration time, enabling the volume contents for those volumes that are returned to scratch to be automatically deleted after a grace period passes. The grace period can be configured from 1 hour to many years. Volumes in the scratch category are then either expired with time or reused, whichever comes first.
If physical tape is present, the space on the physical tape that is used by the deleted or reused logical volume is marked inactive. Only after the physical volume is later reclaimed, or is marked full and holds only inactive data, can the tape and all of its inactive space be reused. After the volume is deleted or reused, content that was previously present is no longer accessible.
An inadvertent return to scratch might result in loss of data, so a longer expiration grace period is suggested so that any return-to-scratch mistakes can be corrected within your host environment. To prevent reuse during this grace period, enable the additional hold option. This provides a window of time in which a host-initiated mistake can be corrected, enabling the volume to be moved back to a private category while retaining the previously written content.
3.3.9 Software compression (LZ4 and ZSTD)
With R4.1.2, two new software compression methods (LZ4 and ZSTD) can be selected through the DATACLAS.
Plan ahead if you are also using Copy Export or plan to use the Grid to Grid migration tool in the future. The receiving clusters or grids need to be able to read the compressed logical volumes, so they need the same capability (R4.1.2 microcode or later).
Using software compression, at least during the migration period, can affect capacity planning. The reduction applies not only to gigabytes in cache or on physical tape. It also reduces the amount of data in the premigration queue (FC 5274), the amount of data that needs to be copied (which might result in a better RPO), and the necessary bandwidth for the physical tapes.
3.3.10 Encryption
Depending on your legal requirements and your type of business, data encryption might be mandatory.
Consider the following information:
If you use the Copy Export feature and encrypt the export pool, you must ensure that you can decrypt the tapes in the restore location:
 – You need to have access to an external key manager that has the appropriate keys available.
 – The same or compatible drives that can read the exported media format must be available.
TVC encryption for data at rest in disk cache can be enabled only against the entire cache repository.
Both physical tape and TVC encryption can be enabled at any time. After TVC encryption is enabled, it cannot be disabled without a rebuild of the disk cache repository. If you use an external key manager for physical tape and TVC encryption, the same external key manager instance must be used.
Disk-based encryption can be enabled in the field retroactively on all Encryption Capable hardware. Therefore, enabling encryption can occur after the hardware has been configured and used.
3.3.11 z/OS Allocation with multiple grids connected to a single host
You can connect multiple grids to the same sysplex environment, and define them all in the same storage group. In this case, mounts are distributed to the grids depending on the selected JES2 method (EQUAL or BYDEVICES).
This allocation routine is aware when a grid has crossed the virtual tape scratch threshold, and takes this into account for the mount distribution. All other information (such as TVC usage, premigration queue length, and TVC LOWRANK) is not available at this point in time and is not used for grid selection.
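The distribution method is selected in the ALLOCxx parmlib member. A minimal sketch, assuming defaults elsewhere in the member:

```
SYSTEM TAPELIB_PREF(BYDEVICES)
```

EQUAL (the default) spreads allocations across eligible libraries with equal probability, whereas BYDEVICES weights the selection by the number of online devices in each library. Verify the exact parameter placement in the MVS Initialization and Tuning Reference for your z/OS level.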
3.3.12 z/OS Allocation assistance inside a grid
Allocation assistance is a function that is built into z/OS and the TS7700 that enables both private and scratch mounts to be more efficient when they choose a device within a grid configuration where the same sysplex is connected to two or more clusters in an active-active configuration.
 
Remember: Support for the allocation assistance functions (DAA and SAA) was initially only supported for the job entry subsystem 2 (JES2) environment. With z/OS V2R1, JES3 is also supported.
If you use the allocation assistance, the device allocation routine in z/OS is influenced by information from the grid environment. Several aspects are used to find the best mount point in a grid for this mount. For more information, see 2.3.15, “Allocation assistance” on page 76.
Depending on your configuration, your job execution scheduler, and any automatic allocation managers you might use, the allocation assist function might provide value to your environment.
If you use any dynamic tape manager, such as the IBM Automatic Tape Allocation Manager, plan the introduction of SAA and DAA carefully. Some dynamic tape managers manage devices in an offline state. Because allocation assist functions assume online devices, issues can surface.
Therefore, consider keeping some drives always online to a specific host, and leave only a subset of drives to the dynamic allocation manager. Alternatively, discontinue working with a dynamic tape allocation manager.
Automatic tape switching (ATSSTAR), which is included with z/OS, works with online devices, and is compatible with DAA and SAA.
3.3.13 25 GB logical volumes
The TS7700 has traditionally supported 400 megabyte (MB), 800 MB, 1 gigabyte (GB), 2 GB, 4 GB, and 6 GB logical volumes. As of R3.2, 25 GB logical volumes are also supported. Using 25 GB logical volumes can have several advantages:
Fewer virtual volumes to insert and manage in your TMS.
Migration from other tape libraries can be easier.
Large multi-volume workloads, such as a large database backup, can be stored with fewer logical tapes.
Consider the following points if you choose to use 25 GB logical volumes:
25 GB logical volumes that use RUN copy consistency points are viewed as Deferred consistency points.
If present only on physical tape, the entire volume must be recalled into disk cache before completing the logical mount.
Appending data to larger volumes requires a full replication to peers, and can result in larger inactive spaces on physical tape.
Many jobs running to 25 GB logical volumes can create a large increase in disk cache content, which can result in non-optimal performance.
Depending on the grid network performance, and the number of concurrently running copy tasks involving 25 GB volumes, consider increasing the Volume Copy Timeout value from the default of 180 minutes to 240 minutes if copies are timing out. You can make this change by using the library request command LI REQ SETTING COPY TIMEOUT <value>, either from the MI or the host console.
To avoid any performance effect, review your installation before you use the 25 GB volumes.
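The timeout adjustment mentioned above might look like the following host console command, where GRIDLIB is a placeholder for your composite library name. The comma-separated keyword form is an assumption based on the general LI REQ syntax; verify the exact keywords in the Library Request white paper (WP101091) for your microcode level.

```
LIBRARY REQUEST,GRIDLIB,SETTING,COPY,TIMEOUT,240
```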
3.3.14 Grid resiliency function: “Remote” fence
Grid resiliency gives you the ability to automatically identify a “sick but not dead” condition and take a predefined action.
Unlike other IBM Z availability functions such as System failure management for z/OS LPARs or Hyperswap for disks, this feature does not react in seconds. The grid technology is designed not only for local implementations, but also for remote data placement, often thousands of miles away. Therefore, timeouts and retries must be much longer to cover temporary network issues.
Although the function supports defining very small threshold parameters, change the defaults only after you analyze the TS7700 grid environment, to prevent false fencing.
The secondary action (isolate the cluster from the network) should be considered only if a clear automated action plan is defined and the effect on your production is fully understood.
3.3.15 Control Unit Initiated Reconfiguration
In R4.1.2, CUIR supports only planned actions for service preparation, and can be used only in a grid environment. Using this function can reduce the manual effort needed for service maintenance. This feature is beneficial in these circumstances:
The TS7700 clusters are shared among many different z/OS systems
 
Note: The z/OS host must include APAR OA52376 with code level V2R2 and later.
No checkout tests need to be run after microcode loads
If software products such as the Automatic Tape Allocation Manager (ATAM) or other third-party vendor products are used, review whether CUIR is beneficial for your environment. After CUIR is used to vary the drives offline, the usual z/OS online command cannot be used to bring the devices back online after the service has finished.
Therefore, your automation, operating procedures, or both need to be reviewed.
3.4 Features and functions available only for the TS7700T
With the introduction of tape support behind the TS7700T, additional features that are unique to the TS7700T are now available:
Multiple tape-managed partitions
Delay premigration to physical tape
Having multiple tape partitions enables you to define how much disk cache is used by a workload or group of workloads. Through partitioning, a workload’s TVC residency footprint can be fixed when compared to other independent workloads. Therefore, independent of how much content other partition workloads create, their activity does not alter the residency footprint of the partition of interest.
In addition, delay premigration was introduced to help manage the movement of data to physical tape. By using policies that can delay premigration of specific workloads from one to many hours, only content that has not yet expired when the delay period passes ends up on tape. This creates a solution where the aged or archive component of a workload is the only content that moves to tape. Until then, the data is only resident in disk cache.
When the data expires from a host perspective while it is still in cache, it is not premigrated or migrated to a tape. That reduces your back-end activities (migrate and reclaim).
3.5 Operation aspects: Monitoring and alerting
To ensure that your TS7700 environment works as expected, and to be notified of any potential issues or trends, two different topics should be reviewed:
Message handling:
 – Check for the introduction of new messages into your automation and alerting tool.
 – Use automation to trap on alerts of interest that are surfaced to the hosts.
Regularly scheduled performance monitoring:
 – Gather long-term statistics through tools, such as VEHSTATS and BVIR, to retain data for trending.
 – Analyze any changes in the workload profile or behavior of the grid environment to ensure that the overall configuration operates as expected, and to determine whether changes should be made.
In addition, optional checks might be useful, especially after complex migrations or changes in your environment.
3.5.1 Message handling
With each new feature or Licensed Internal Code release, new messages might be introduced. Usually, they are described in the PTF description or mentioned in the messages and codes books. Identify all new messages for the TS7700 (usually CBRxxxx) and review them. The main message is the CBR3750 message, which contains many submessages. Evaluate the meanings to understand how they relate to your business.
For a complete list of all possible CBR3750 submessages, see IBM Virtualization Engine TS7700 Series Operator Informational Messages, WP101689:
Identify the appropriate action that an operator or your automation tool must run. Introduce the new messages in your automation tool (with the appropriate action) or alert the message for human intervention.
With R4.1.2, you can modify the priority or severity of messages that the TS7700 presents to the attached LPARs. These messages result from conditions or events that require some level of operator interaction (known as intervention messages). The z/OS systems identify the intervention messages with the CBR3750I ID. Be aware that this modification applies only to your environment. The Call Home capability still uses the original priority or severity provided by IBM.
In addition, you can enhance the messages and extend the message text with user-defined content.
If you modify your environment, back up this modification to make sure you can upload your changes if a microcode issue occurs.
3.5.2 Regularly scheduled performance monitoring
Regularly scheduled performance monitoring enables you to complete the following tasks:
See trends in your workload profile so that you can tune your environment before any issues arise.
Store historical information of your environment for trends and performance analysis.
The TS7700 keeps performance data for the last 90 days. If more than 90 days of data is required, set up regular Bulk Volume Information Retrieval (BVIR) runs to collect the information, and keep the data. Check this data periodically to see usage trends, especially for shortage conditions.
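A BVIR run is an ordinary tape job: the host writes a request record to a logical volume and then reads the response back from the same volume. The following JCL fragment is a sketch only; the data set name and unit are placeholders, and the exact request record text and volume attributes must be taken from the BVIR documentation for your release.

```
//BVIRREQ  EXEC PGM=IEBGENER
//SYSPRINT DD SYSOUT=*
//SYSIN    DD DUMMY
//SYSUT1   DD *
VTS BULK VOLUME DATA REQUEST
HISTORICAL STATISTICS FOR 001-090
/*
//* SYSUT2 writes the request to a scratch logical volume in the grid
//SYSUT2   DD DSN=HLQ.BVIR.REQUEST,DISP=(NEW,CATLG),
//            UNIT=VTAPE,LABEL=(1,SL)
```

A second job (or step) then reads the same volume back; the TS7700 replaces the request content with the response records, which tools such as VEHSTATS can process.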
3.5.3 Optional checks
Especially after major changes to the environments, you should consider running extra checks.
Verifying your data redundancy
The TS7700 in a grid configuration provides both high availability and disaster recovery. Both require one or more replicas of content within the grid. The BVIR Copy Audit provides a method to verify that replicas of all volumes exist at specific clusters or specific groups of clusters. The audit can run in a way that assumes that all volumes should replicate, and also has methods to verify replicas only based on assigned copy policies.
Consider running copy audits after major changes in the environment, such as joins and merges, and before the removal of one or more clusters. You can also run the Copy Audit periodically as a method to audit your expected business continuance requirements.
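The Copy Audit is requested through the same BVIR mechanism: the request records name the audit and the distributed libraries to check. The records below are illustrative only, with LIBA and LIBB as placeholder distributed library names; check the BVIR documentation for the exact keyword syntax on your microcode level.

```
VTS BULK VOLUME DATA REQUEST
COPY AUDIT INCLUDE LIBA,LIBB
```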
Checking the SMS environment
Make sure that all distributed library names within a grid are configured within z/OS, even if they are not connected to the specific z/OS host.
Checking the settings environment
To check the settings environment and ensure that all parameters are correct, run the LIBRARY REQUEST command.
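For example, from the host console you can display the current settings of a distributed library with a command like the following sketch, where DISTLIB is a placeholder for a distributed library name:

```
LIBRARY REQUEST,DISTLIB,SETTING
```

Issued without further keywords, SETTING reports the current alert and behavior values, which makes it a convenient periodic check that no parameter was changed unintentionally. Verify the exact output format in WP101091.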
3.6 Choosing a migration method
To introduce new technology, sometimes data migration is needed because a hardware upgrade itself is not sufficient. In addition, you might need a data migration method to support a data center move. In general, there are two different methodologies:
Host-based migration
TS7700 internal data migration
TS7700 Release 3.3 introduced a new data migration method that is called Grid to Grid Migration (GGM), which is offered as a service from IBM.
The following section provides an overview of the different migration techniques.
3.6.1 Host-based migration
Host-based migration means that the data is read by the host through the FICON channels from the tape and written into the new tape environment, which has some consequences:
1. The logical volume number changes because the data is transferred by the host from one logical volume to another one.
2. Without manipulation of the Tape Management Catalog (TMC), you lose the original creation date, job, owner, expiration date, and so on. Therefore, copy tools are often used to keep the original information.
3. The data on the “old” logical volumes must be deleted manually.
The biggest advantage of this migration is that it is technology- and vendor-independent. However, it is resource-intensive (human effort and processor resources) and manual actions are error-prone.
3.6.2 TS7700 internal data migration
With the introduction of Release 3.3, there are two different data-migration methods provided by the TS7700 technology:
Join and Copy Refresh Processing
The GGM tool
Other possibilities, for example host tape copy, also exist.
Join and Copy Refresh processing
If you want to move to a new data center, or do a technical refresh, use this method to migrate the data to a new cluster without using host-based migration. To do so, complete the following steps:
1. Join a new cluster.
2. Change the MC contents to allow copies to the new cluster.
3. Run the LI REQ command with the copy refresh parameter from the host for each logical volume, to produce a new copy of the data in the new cluster.
Although the command is submitted from a host, the data is copied internally through the grid links. There is no host I/O through the FICON adapters, and all data in the TCDB and the tape management system remains unchanged.
This method can be used only if the data migration is inside a grid. Inside a grid, it is a fast and proven copy method. In addition, the BVIR AUDIT parameter provides an easy method to ensure that all data is copied.
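Step 3 above can be sketched as a host console command that is run once per logical volume. The COPYRFSH keyword, the distributed library name, and the volume serial shown here are assumptions for illustration; verify the exact keyword and operands in the Library Request white paper (WP101091) for your microcode level.

```
LIBRARY REQUEST,DISTLIB,COPYRFSH,VT0001
```

In practice, the host command is typically generated for the full list of volumes from a TMS extract, and the BVIR Copy Audit is run afterward to confirm that every volume was copied.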
Grid to Grid migration tool
The GGM tool is a service offering from IBM. You can use it to copy logical volumes from one grid to another grid while both grids have a separated grid network. After the GGM is set up by an IBM Service Support Representative (IBM SSR), the data from the logical volumes is transferred from one grid to the other grid through the existing IP addresses for the gridlinks. Much like Join and Copy Refresh processing, there is no host I/O with the FICON adapters.
The GGM tool should be considered if the following situations are true:
There are already six clusters installed in the grid.
The Join and Copy Refresh processing cannot be used (there are floor space requirements, microcode restrictions, or other considerations).
The source and target grids are maintained by different providers.
The GGM tool also provides several different options, such as how the new data (new device categories) and the old data (keep or delete the data in the source grid) is treated.
To access the data in the new grid, TCDB and the TMC must be changed. These changes are the responsibility of the customer, and must be processed manually.
The GGM is controlled by the LI REQ command, and reporting is provided by additional BVIR reports. A summary of this command can be found in the following white paper:
In addition, several supporting tools to create the necessary input control statements, and the necessary TCDB entry changes and TMC entry changes, are provided at the IBM Tape Tool website:
For more information, see Chapter 8, “Migration” on page 303, or ask your local IBM SSR.
3.6.3 Tape drive technology behind a TS7700
Before Release 3.3, all tape drives that were attached to a TS7700 had to be homogeneous. There was no intermixing allowed.
With Release 3.3, you now can mix the TS1150 with only one older drive technology. This intermix is for migration purposes because a TS1150 cannot read content from JA and JB cartridges.
The following considerations apply:
The “old” technology is used only for reads. You cannot write data on the legacy cartridge tape media by using the older drive technology.
The maximum of 16 back-end drives must be divided between the two tape technologies. Plan ahead to have enough tape drives of the older technology for recalls, and possibly for reclaim, but also enough TS1150 tape drives to allow premigration, recalls, and reclaim for newly written data.
Use the LI REQ command to define the alert thresholds for missing physical drives for both the older technology and the TS1150.
Run VEHSTATS to understand the physical drive behavior.
With reclamation, the data from the discontinued media is moved to the new media. If you do not want that to occur, set the “Sunset Media Reclaim Threshold Percentage (%)” for the specific physical pool on the MI to 0, so that no reclaim runs occur for the discontinued media inside that pool.