IBM Spectrum Archive Enterprise Edition
This chapter introduces the IBM Spectrum Archive Enterprise Edition (formerly IBM Linear Tape File System™ Enterprise Edition (LTFS EE)) and describes its business benefits, general use cases, technology, components, and functions.
This chapter includes the following topics for IBM Spectrum Archive Enterprise Edition (EE):
1.1 Introduction
IBM Spectrum Archive, a member of the IBM Spectrum Storage™ family, enables direct, intuitive, and graphical access to data stored in IBM tape drives and libraries by incorporating the LTFS format standard for reading, writing, and exchanging descriptive metadata on formatted tape cartridges. IBM Spectrum Archive eliminates the need for additional tape management and software to access data. IBM Spectrum Archive offers three software solutions for managing your digital files with the LTFS format: Single Drive Edition (SDE), Library Edition (LE), and Enterprise Edition (EE). This book focuses on the IBM Spectrum Archive EE.
IBM Spectrum Archive EE provides seamless integration of LTFS with IBM Spectrum Scale, which is another member of the IBM Spectrum Storage family, by creating a tape-based storage tier. You can run any application that is designed for disk files on tape by using IBM Spectrum Archive EE because it is fully transparent and integrates in the IBM Spectrum Scale file system. IBM Spectrum Archive EE can play a major role in reducing the cost of storage for data that does not need the access performance of primary disk.
With IBM Spectrum Archive EE, you can enable the use of LTFS for the policy management of tape as a storage tier in an IBM Spectrum Scale environment and use tape as a critical tier in the storage environment.
Using IBM Spectrum Archive EE to replace online disk storage with tape in tier 2 and tier 3 storage can improve data access over other storage solutions because it improves efficiency and streamlines the management of files on tape. IBM Spectrum Archive EE simplifies the use of physical tape by making it transparent to the user and manageable by the administrator under a single infrastructure.
Figure 1-1 shows the integration of an IBM Spectrum Archive EE archive solution.
Figure 1-1 High-level overview of an IBM Spectrum Archive EE archive solution
IBM Spectrum Archive EE uses an enhanced version of the IBM Spectrum Archive LE, which is referred to as the IBM Spectrum Archive LE+ component, for the movement of files to and from tape devices. The scale-out architecture of IBM Spectrum Archive EE can add nodes and tape devices as needed to satisfy bandwidth requirements between IBM Spectrum Scale and the IBM Spectrum Archive EE tape tier.
Low-cost storage tier, data migration, and archive needs that are described in the following use cases can benefit from IBM Spectrum Archive EE:
Operational storage
Provides a low-cost, scalable tape storage tier.
Active archive
A local or remote IBM Spectrum Archive EE node serves as a migration target for IBM Spectrum Scale and transparently archives data to tape based on policies that are set by the user.
The following IBM Spectrum Archive EE characteristics cover a broad base of integrated storage management software with leading tape technology and the highly scalable IBM tape libraries:
Integrates with IBM Spectrum Scale by supporting file-level migration and recall with an innovative database-less storage of metadata.
Provides a scale-out architecture that supports multiple IBM Spectrum Archive EE nodes that share tape inventory with load balancing over multiple tape drives and nodes.
Enables tape cartridge pooling and data exchange for IBM Spectrum Archive EE tape tier management:
 – Tape cartridge pooling allows the user to group data on sets of tape cartridges.
 – Multiple copies of files can be written on different tape cartridge pools, including different tape libraries in different locations.
 – Supports tape cartridge export with and without the removal of file metadata from IBM Spectrum Scale.
 – Supports tape cartridge import with pre-population of file metadata in IBM Spectrum Scale.
Furthermore, IBM Spectrum Archive EE provides the following key benefits:
A low-cost storage tier in an IBM Spectrum Scale environment.
An active archive or big data repository for long-term storage of data that requires file system access to that content.
File-based storage in the LTFS tape format that is open, self-describing, portable, and interchangeable across platforms.
Lowers capital expenditure and operational expenditure costs by using cost-effective and energy-efficient tape media without dependencies on external server hardware or software.
Allows the retention of data on tape media for long-term preservation (10+ years).
Provides the portability of large amounts of data by bulk transfer of tape cartridges between sites for disaster recovery and the initial synchronization of two IBM Spectrum Scale sites by using open-format, portable, self-describing tapes.
Supports migration of data to newer tape or newer technology that is managed by IBM Spectrum Scale.
Provides ease of management for operational and active archive storage.
Expands archive capacity simply by adding and provisioning media without affecting the availability of data already in the pool.
 
Tip: For a no-cost trial version of the IBM Spectrum Archive EE, contact your local IBM sales representative.
1.1.1 Operational storage
This section describes how IBM Spectrum Archive EE is used as a storage tier in an IBM Spectrum Scale environment.
Using an IBM Spectrum Archive tape tier as operational storage is useful when a significant portion of files on a disk storage system infrastructure is static, meaning the data is not changing.
In this case, as shown in Figure 1-2, it is optimal to move the content to a lower-cost storage tier, here physical tape. The files that are migrated to the IBM Spectrum Archive EE tape tier remain online, meaning they are accessible from the IBM Spectrum Scale file system under the IBM Spectrum Scale namespace at any time. Tape cartridge pools within IBM Spectrum Archive EE can also be used for backup.
Figure 1-2 Tiered operational storage with IBM Spectrum Archive EE managing the tape tier
With IBM Spectrum Archive EE, the user specifies files to be migrated to the IBM Spectrum Archive tape tier by using standard IBM Spectrum Scale scan policies. IBM Spectrum Archive EE then manages the movement of IBM Spectrum Scale file data to the IBM Spectrum Archive tape cartridges. It also edits the metadata of the IBM Spectrum Scale files to point to the content on the IBM Spectrum Archive tape tier.
Access to the migrated files through the IBM Spectrum Scale file system remains unchanged, with the file data provided at the data rate and access times of the underlying tape technology. The IBM Spectrum Scale namespace is unchanged after migration, which makes the placement of files in the IBM Spectrum Archive tape tier transparent to users and applications. See 8.10.1, “Creating a traditional archive system policy” on page 238.
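As a rough illustration of the selection step only, the following Python sketch mimics what a scan policy does: it looks at a list of file records and picks the ones that have not changed for a given number of days. It is not IBM Spectrum Scale policy syntax. The names FileRecord and select_migration_candidates, the /gpfs/fs1 paths, and the 30-day threshold are hypothetical and were chosen for this example.

```python
from dataclasses import dataclass

SECONDS_PER_DAY = 86_400


@dataclass(frozen=True)
class FileRecord:
    """Hypothetical stand-in for one file seen by a policy scan."""
    path: str
    size_bytes: int
    mtime: float  # last modification time, in seconds since the epoch


def select_migration_candidates(files, now, min_age_days=30):
    """Return the files whose last modification is at least min_age_days old."""
    cutoff = now - min_age_days * SECONDS_PER_DAY
    return [f for f in files if f.mtime <= cutoff]


# Example: one static file (90 days old) and one recently changed file.
now = 100 * SECONDS_PER_DAY
files = [
    FileRecord("/gpfs/fs1/a.dat", 4096, now - 90 * SECONDS_PER_DAY),
    FileRecord("/gpfs/fs1/b.dat", 4096, now - 1 * SECONDS_PER_DAY),
]
candidates = select_migration_candidates(files, now)
print([f.path for f in candidates])
```

In a real deployment the selection criteria are expressed in the policy itself; the sketch only shows why static files are natural candidates for the tape tier.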
1.1.2 Active archive
This section describes how IBM Spectrum Archive EE is used as an active archive in an IBM Spectrum Scale environment.
The use of an LTFS tape tier as an active archive is useful when you need a low-cost, long-term archive for data that is maintained and accessed for reference. IBM Spectrum Archive satisfies the needs of this type of archiving by using open-format, portable, and self-describing tapes based on the LTFS standard.
In an active archive, the IBM Spectrum Archive file system is the main store for the data while the IBM Spectrum Scale file system, with its limited disk capacity, is used as a staging area, or cache, in front of IBM Spectrum Archive EE. IBM Spectrum Scale policies are used to stage and de-stage data from the IBM Spectrum Scale disks to the IBM Spectrum Archive EE tape cartridge.
Figure 1-3 shows the archive storage management with the IBM Spectrum Archive tape tier in the IBM Spectrum Scale file system, the disk that is used for caching, and the namespace that is mapped to the tape cartridge pool.
Figure 1-3 Archive storage management with IBM Spectrum Archive EE
The tapes from the archive can be exported for vaulting or for moving data to another location. Because the exported data is in the LTFS format, it can be read on any LTFS-compatible system.
1.2 IBM Spectrum Archive EE functions
This section describes the main functions that are found within IBM Spectrum Archive EE. Figure 1-4 shows where IBM Spectrum Archive EE fits within the solution architecture that integrates with IBM Spectrum Archive LE and IBM Spectrum Scale. This integration enables IBM Spectrum Archive to represent the external tape cartridge pool to IBM Spectrum Scale and to migrate files based on IBM Spectrum Scale policies. IBM Spectrum Archive EE can be configured on multiple nodes, with those instances of IBM Spectrum Archive EE sharing a physical tape library.
Figure 1-4 IBM Spectrum Archive EE integration with IBM Spectrum Scale and IBM Spectrum Archive LE
With IBM Spectrum Archive EE, you can perform the following management tasks on your system:
Create and define tape cartridge pools for file migrations.
Migrate files in the IBM Spectrum Scale namespace to the IBM Spectrum Archive tape tier.
Recall files that were migrated to the IBM Spectrum Archive tape tier back into IBM Spectrum Scale.
Reconcile file inconsistencies between files in IBM Spectrum Scale and their equivalents in IBM Spectrum Archive.
Reclaim tape space that is occupied by non-referenced files and non-referenced content that is present on the physical tapes.
Export tape cartridges to remove them from your IBM Spectrum Archive EE system.
Import tape cartridges to add them to your IBM Spectrum Archive EE system.
Add tape cartridges to your IBM Spectrum Archive EE system to expand the tape cartridge pool with no disruption to your system.
Obtain inventory, job, and scan status of your IBM Spectrum Archive EE solution.
1.3 IBM Spectrum Archive EE components
This section describes the components that make up IBM Spectrum Archive EE:
EE component (multi-tape management module (MMM))
LE+ component
HSM component
IBM Spectrum Scale is a required component for the IBM Spectrum Archive solution.
IBM Spectrum Archive EE is composed of multiple components that enable an IBM Spectrum Archive tape tier to be used for migration and recall with IBM Spectrum Scale. Files are migrated to, and recalled from, the IBM Spectrum Archive tape tier by using the IBM Spectrum Archive EE components that are shown in Figure 1-5 on page 8 and Figure 1-6 on page 9.
1.3.1 IBM Spectrum Archive EE terms
This list highlights the components of an IBM Spectrum Archive EE solution:
IBM Spectrum Archive EE node: An x86_64 IBM Spectrum Scale server that is running IBM Spectrum Archive EE. Each EE node must be connected to a set of tape drives in a tape library through an FC connection. One EE node cannot be connected to more than one logical library.
IBM Spectrum Archive EE Cluster: A set of EE nodes that are connected to a single IBM Spectrum Scale cluster. All nodes in a cluster can see the files on the IBM Spectrum Scale file system with the same inode number.
IBM Spectrum Scale clusters that are connected by active file management (AFM) are considered two separate IBM Spectrum Scale clusters by IBM Spectrum Archive EE.
IBM Spectrum Scale Cluster: IBM Spectrum Scale servers (non-EE nodes) and EE nodes.
IBM Spectrum Scale only Node: An IBM Spectrum Scale server that is running on a supported platform, such as Linux or Windows, without IBM Spectrum Archive EE.
Tape Pool: A set of tape cartridges of the same type (either Write Once Read Many (WORM) or Non-WORM, and either LTO or 3592) that are in one logical tape library. A tape pool uses the same generation of tapes within the pool.
A tape pool does not span multiple tape libraries.
A tape pool is assigned to only one node group.
Node Group: Nodes that are connected to the same tape library. Normally there is a one-to-one relationship between a node group and a tape library, so a dual-library EE cluster has at least two node groups. In theory, you can divide one tape library into multiple node groups, similar to partitioning.
Tape pools are assigned to only one node group, but a node group can access multiple tape pools.
Control Node: An EE node that runs the MMM service. IBM Spectrum Archive EE V1R2 and subsequent releases require you to configure one control node per tape library. The control node manages all the requests for access to its associated tape library. The control node redirects requests for access to other tape libraries to the control nodes of those tape libraries.
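The relationships among these terms can be made concrete with a small data model. The following Python sketch is an illustration under assumed names (Cartridge, TapePool, and validate_pool are hypothetical, as are the barcodes and library names); it checks the tape pool rules stated above and is not part of any IBM Spectrum Archive EE interface.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cartridge:
    barcode: str
    media: str        # "LTO" or "3592"
    generation: str   # for example "LTO7"
    worm: bool        # True for WORM cartridges
    library: str      # logical tape library that holds the cartridge


@dataclass(frozen=True)
class TapePool:
    name: str
    library: str      # a pool lives in exactly one tape library
    node_group: str   # a pool is assigned to exactly one node group
    cartridges: tuple


def validate_pool(pool):
    """Return a list of rule violations for one tape pool (empty if valid)."""
    problems = []
    if any(c.library != pool.library for c in pool.cartridges):
        problems.append("pool spans more than one tape library")
    if len({(c.media, c.worm) for c in pool.cartridges}) > 1:
        problems.append("pool mixes cartridge types")
    if len({c.generation for c in pool.cartridges}) > 1:
        problems.append("pool mixes tape generations")
    return problems


good = TapePool("pool1", "lib1", "G1", (
    Cartridge("A00001", "LTO", "LTO7", False, "lib1"),
    Cartridge("A00002", "LTO", "LTO7", False, "lib1"),
))
bad = TapePool("pool2", "lib1", "G1", (
    Cartridge("A00003", "LTO", "LTO7", False, "lib1"),
    Cartridge("A00004", "LTO", "LTO6", True, "lib2"),
))
print(validate_pool(good))
print(validate_pool(bad))
```

Because node_group is a single field, the rule that a tape pool belongs to only one node group holds by construction, while a node group can still be named by many pools.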
Figure 1-5 shows the components that make up IBM Spectrum Archive EE. The components are shown with IBM Spectrum Scale configured on separate nodes for maximum scalability.
Figure 1-5 Components of IBM Spectrum Archive EE with separate IBM Spectrum Scale nodes
In Figure 1-6, the components that make up IBM Spectrum Archive EE are shown with no separate IBM Spectrum Scale nodes. This diagram shows how IBM Spectrum Scale can be configured to run on the same nodes as the IBM Spectrum Archive EE nodes.
Figure 1-6 Components of IBM Spectrum Archive EE with no separate IBM Spectrum Scale nodes
A second tape library can be added to the configuration, expanding the storage capacity and offering the opportunity to add even more nodes and tape devices. Availability can be improved by storing redundant copies on different tape libraries.
With multiple tape library attachments, the tape libraries can be connected to an IBM Spectrum Scale cluster in a single site or can be placed in metro-distance locations through IBM Spectrum Scale synchronous mirroring (stretched cluster).
The following distances are supported for stretched cluster with synchronous mirroring using block-level replication:
For V4.2.2.3 and later, distances greater than 300 km
For V4.2.1 and later, distances up to 300 km
For V4.2 and previous, distances less than 100 km
It is important to remember that this is still a single IBM Spectrum Scale cluster. In a configuration that uses IBM Spectrum Scale replication, a single IBM Spectrum Scale cluster is defined over two geographically separated active production sites and uses tiebreaker disks.
One or more file systems are created, mounted, and accessed concurrently from the two active production sites. The data and metadata replication features of IBM Spectrum Scale are used to maintain a secondary copy of each file system block, relying on the concept of disk failure groups to control the physical placement of the individual copies:
1. Separate the set of available disk volumes into two failure groups. Define one failure group at each of the active production sites.
2. Create a replicated file system. Specify a replication factor of 2 for both data and metadata.
When allocating new file system blocks, IBM Spectrum Scale always assigns replicas of the same block to distinct failure groups. This feature provides a sufficient level of redundancy, allowing each site to continue operating independently should the other site fail.
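The placement rule can be illustrated with a short simulation. This Python sketch is a simplified model under stated assumptions, not the IBM Spectrum Scale allocator: place_replicas and the disk names are hypothetical, and disks inside a failure group are chosen round-robin purely for illustration.

```python
from itertools import cycle


def place_replicas(block_ids, disks_by_failure_group, replication=2):
    """Assign each block `replication` copies, each in a distinct failure group."""
    groups = sorted(disks_by_failure_group)
    if len(groups) < replication:
        raise ValueError("need at least one failure group per replica")
    if any(not disks_by_failure_group[g] for g in groups):
        raise ValueError("every failure group needs at least one disk")
    # One round-robin iterator per failure group spreads blocks over its disks.
    rotors = {g: cycle(disks_by_failure_group[g]) for g in groups}
    placement = {}
    for i, block in enumerate(block_ids):
        # Consecutive offsets modulo the group count are always distinct groups.
        chosen = [groups[(i + k) % len(groups)] for k in range(replication)]
        placement[block] = [(g, next(rotors[g])) for g in chosen]
    return placement


# Failure group 1 is at site A and failure group 2 is at site B.
placement = place_replicas(
    range(4),
    {1: ["siteA_disk1", "siteA_disk2"], 2: ["siteB_disk1"]},
)
for block, replicas in placement.items():
    print(block, replicas)
```

With one failure group per site and a replication factor of 2, every block ends up with one copy at each site, which is what lets either site continue after the other fails.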
For more information about synchronous mirroring that uses IBM Spectrum Scale replication, see the following website:
 
Important: Stretched cluster is available for the distances that are listed in this section. For longer distances, use the AFM feature of IBM Spectrum Scale with IBM Spectrum Archive. With the release of IBM Spectrum Archive V1.2.3.0, limited support is provided for IBM Spectrum Scale AFM. The only supported AFM mode of IBM Spectrum Scale is independent-writer (IW). AFM is used with two different IBM Spectrum Scale clusters, with one instance of IBM Spectrum Archive at each site. For more details about IBM Spectrum Scale AFM, see 2.3.5, “Active File Management” on page 29.
Figure 1-7 shows a fully configured IBM Spectrum Archive EE.
Figure 1-7 IBM Spectrum Archive EE shown with scaled-out capacity, multiple libraries, and redundant copies
1.3.2 Hierarchical Storage Manager
A Hierarchical Storage Manager (HSM) solution typically moves the file’s data to back-end storage (in most cases, physical tape media) and leaves a small stub file in the local storage file system. The stub file uses minimal space but keeps all of the file’s metadata on the local storage, so that to a user or a program the file looks like a normal, locally stored file. When the user or a program accesses the file, the HSM solution automatically recalls (moves back) the file’s data from the back-end storage and gives the reading application access to the file after all of the data is retrieved and available online again.
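The stub-and-recall behavior can be sketched in a few lines of Python. This is a toy model, not the HSM component of IBM Spectrum Archive EE: the class names HsmFile and TinyHsm and the two state labels are assumptions made for the example, and the back-end storage is only a dictionary.

```python
class HsmFile:
    """A file whose metadata stays local while its data may live elsewhere."""

    def __init__(self, name, data):
        self.name = name
        self.size = len(data)    # metadata: stays visible after migration
        self.data = data         # None while only the stub is on disk
        self.state = "resident"


class TinyHsm:
    def __init__(self):
        self.backend = {}        # stands in for tape storage

    def migrate(self, f):
        """Copy the data to the back end and leave only a stub locally."""
        self.backend[f.name] = f.data
        f.data = None
        f.state = "migrated"

    def read(self, f):
        """Return the file data, recalling it first if only the stub is local."""
        if f.state == "migrated":
            f.data = self.backend[f.name]   # recall: move the data back
            f.state = "resident"
        return f.data


hsm = TinyHsm()
f = HsmFile("report.dat", b"payload")
hsm.migrate(f)
state_after_migrate = f.state
print(f.name, f.size, state_after_migrate)  # metadata is still available
data = hsm.read(f)                          # triggers a transparent recall
print(data, f.state)
```

The reader never calls a recall function explicitly; the read path notices the stub and restores the data, which is the transparency that the paragraph above describes.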
1.3.3 Multi-Tape Management Module
This component is a service of the IBM Spectrum Archive EE control node. The MMM service implements policy-based tape cartridge selection and maintains the state of all of the resources that are available in the system. There is one control node per tape library.
The scheduler component of the control node uses policy-based cartridge selection to schedule and process job requests, such as migration and recall requests, which are fulfilled by using available system nodes and tape resources. The following tasks are done by the scheduler:
Choosing a job from the job queue
Choosing an appropriate tape cartridge and tape drive to handle the work
Starting the job
The control node also manages the creation of replicas across multiple tape libraries. For example, when the ltfsee migrate command specifies making replicas of files in multiple tape libraries, the command accesses the control node that manages the tape pool for the primary copy.
The control node puts the copy job in the job queue for the primary tape library, then passes the secondary copy job to the control node for the second tape library. The second control node puts the copy job in the job queue for the second tape library.
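The hand-off between control nodes can be pictured with a minimal Python sketch. It assumes hypothetical names throughout (ControlNode, submit_migration, the library and pool names, and the job tuple layout) and models only the queueing behavior described above, not the actual MMM implementation.

```python
from collections import deque


class ControlNode:
    """Toy control node: one job queue per tape library."""

    def __init__(self, library):
        self.library = library
        self.job_queue = deque()
        self.peers = {}          # library name -> ControlNode of that library

    def enqueue(self, job):
        self.job_queue.append(job)

    def submit_migration(self, path, targets):
        """targets: list of (library, pool) pairs, primary copy first."""
        for library, pool in targets:
            job = ("copy", path, pool)
            if library == self.library:
                self.enqueue(job)                  # local library: queue here
            else:
                self.peers[library].enqueue(job)   # pass to the other control node


lib1 = ControlNode("lib1")
lib2 = ControlNode("lib2")
lib1.peers["lib2"] = lib2

# The primary copy goes to poolA in lib1 and the replica to poolB in lib2.
lib1.submit_migration("/gpfs/fs1/a.dat", [("lib1", "poolA"), ("lib2", "poolB")])
print(list(lib1.job_queue))
print(list(lib2.job_queue))
```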
When the scheduler component of the control node selects a tape cartridge and tape drive for a migration job, it manages the following conditions:
If the migration is to a tape cartridge pool, the tape drive must belong to the node group that owns the tape cartridge pool.
If a format generation property is defined for a tape cartridge pool, the tape cartridge must be formatted as that generation, and the tape drive must support that format.
The number of tape drives that are being used for migration to a tape cartridge pool at one time must not exceed the defined mount limit.
If there are multiple candidate tapes available for selection, the scheduler tries to choose a tape cartridge that is already mounted on an available tape drive.
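The conditions above can be read as a filter followed by a preference. The following Python sketch is one possible interpretation, written under assumptions: the types Drive, Tape, and Pool, the function pick_tape_and_drive, and the simplification that an unmounted cartridge is paired only with an empty idle drive are all hypothetical and are not taken from the product.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Drive:
    name: str
    node_group: str
    formats: frozenset                   # format generations the drive supports
    busy: bool = False
    mounted_tape: Optional[str] = None   # barcode of the mounted cartridge


@dataclass
class Tape:
    barcode: str
    generation: str


@dataclass
class Pool:
    name: str
    node_group: str
    tapes: list
    mount_limit: int
    format_generation: Optional[str] = None


def pick_tape_and_drive(pool, drives, drives_migrating_to_pool):
    """Return a (tape, drive) pair for a migration job, or None."""
    # Condition: never exceed the mount limit that is defined for the pool.
    if drives_migrating_to_pool >= pool.mount_limit:
        return None
    # Condition: the drive must belong to the node group that owns the pool.
    idle = [d for d in drives if not d.busy and d.node_group == pool.node_group]
    holder = {d.mounted_tape: d for d in drives if d.mounted_tape}
    pairs = []
    for tape in pool.tapes:
        # Condition: honor the pool's format generation, if one is defined.
        if pool.format_generation and tape.generation != pool.format_generation:
            continue
        if tape.barcode in holder:
            # A mounted cartridge can be used only through the drive holding it.
            drive = holder[tape.barcode]
            candidates = [drive] if drive in idle else []
        else:
            candidates = [d for d in idle if d.mounted_tape is None]
        pairs += [(tape, d) for d in candidates if tape.generation in d.formats]
    # Preference: a cartridge that is already mounted sorts first.
    pairs.sort(key=lambda p: p[1].mounted_tape != p[0].barcode)
    return pairs[0] if pairs else None


lto7 = frozenset({"LTO7"})
drives = [
    Drive("d1", "G1", lto7),
    Drive("d2", "G1", lto7, mounted_tape="T2"),
    Drive("d3", "G2", lto7),             # other node group: never eligible
]
pool = Pool("pool1", "G1", [Tape("T1", "LTO7"), Tape("T2", "LTO7")],
            mount_limit=2, format_generation="LTO7")
choice = pick_tape_and_drive(pool, drives, drives_migrating_to_pool=0)
print(choice[0].barcode, choice[1].name)
```

In the example both cartridges are eligible, and the already mounted cartridge T2 on drive d2 is chosen over mounting T1 on the empty drive d1.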
When the scheduler selects a tape cartridge and tape drive for jobs other than migration, it makes the following choices:
Choosing an available tape drive in the node group that owns the tape cartridge and tape cartridge pool
Choosing the tape drive that has the tape drive attribute for the job
When the control node scheduler selects a tape cartridge for transparent recalls such as double-clicks or application reads, it manages the following conditions:
If the file has a replica, the scheduler always chooses the primary copy. The first tape cartridge pool that is used by the migration process contains the primary copy.
If the primary copy cannot be accessed, the scheduler automatically retries the recall job by using the other replicas if available.
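This primary-first, fall-back-to-replica behavior can be sketched as a simple loop. The Python below is illustrative only: recall_with_fallback, the pool names, and the reader callback are hypothetical, and a failed tape read is modeled as an OSError.

```python
def recall_with_fallback(copy_pools, read_from_pool):
    """Try each copy in order, primary first, and return (pool, data).

    copy_pools: pool names in the order the copies were written, so the first
    entry is the primary copy.
    read_from_pool: callable that returns the data or raises OSError.
    """
    errors = []
    for pool in copy_pools:
        try:
            return pool, read_from_pool(pool)
        except OSError as exc:
            errors.append(f"{pool}: {exc}")   # remember why this copy failed
    raise OSError("all copies failed: " + "; ".join(errors))


def fake_read(pool):
    """Stand-in reader: the primary copy is unreadable in this example."""
    if pool == "primary_pool":
        raise OSError("cartridge not accessible")
    return b"data from " + pool.encode()


used_pool, data = recall_with_fallback(["primary_pool", "copy_pool"], fake_read)
print(used_pool, data)
```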
Other functions that are provided by the control node include the following functions:
Maintains a catalog of all known drives that are assigned to each IBM Spectrum Archive node in the system
Maintains a catalog of tape cartridges in the tape library/libraries
Maintains an estimate of the free space on each tape cartridge
Allocates space on tape cartridges for new data
The MMM service is started when IBM Spectrum Archive EE is started by running the ltfsee start command. The MMM service runs on only one IBM Spectrum Archive EE control node at a time. Several operations, including migration and recall, fail if the MMM service stops. If SNMP traps are enabled, a notification is sent when the MMM service starts or stops.
 
Important: If the ltfsee start command does not return after several minutes, it might be because the firewall is running. The firewall service must be disabled on the IBM Spectrum Archive EE nodes. For more information, see 4.3.2, “Installing, upgrading, or uninstalling IBM Spectrum Archive EE” on page 64.
The ltfsee start command also unmounts the tape drives, so the process might take a long time if many tape drives are mounted.
1.3.4 IBM Spectrum Archive Library Edition Plus component
The IBM Spectrum Archive Library Edition Plus (LE+) component is the IBM Spectrum Archive tape tier of IBM Spectrum Archive EE. It is an enhanced version of the IBM Spectrum Archive LE that is designed to work with the EE.
The LE+ component is installed on all of the IBM Spectrum Scale nodes that are connected to the IBM Spectrum Archive EE library. It is the migration target for IBM Spectrum Scale. The LE+ component accesses the recording space on the physical tape cartridges through its file system interface and handles the user data as file objects and associated metadata in its namespace.
With IBM Spectrum Archive EE V1.2.4.0 and later, IBM Spectrum Archive LE+ is started automatically when running the ltfsee start command. If errors occur during start of the IBM Spectrum Archive EE system, run the ltfsee info nodes command to display which component failed to start. For more information about the updated ltfsee info nodes command, see 7.6, “IBM Spectrum Archive EE automatic node failover” on page 149.
1.4 IBM Spectrum Archive EE cluster configuration introduction
This section describes a cluster configuration for IBM Spectrum Archive EE. This configuration is for single-library, multiple-node access.
Single-library, multiple-node access enables access to the same set of IBM Spectrum Archive EE tape cartridges from more than one IBM Spectrum Archive EE node. The purpose of enabling this capability is to improve data storage and retrieval performance by assigning fewer tape drives to each node.
When this cluster configuration is used, each IBM Spectrum Archive EE node must have its own set of drives that is not shared with any other node. In addition, each IBM Spectrum Archive EE node must have at least one control path drive that is designated as a control path by an operator of the attached IBM tape library.
IBM Spectrum Archive EE uses the drive that is designated as a control path to communicate with the tape library. This type of control path is also known as a media changer device. IBM Spectrum Archive EE is scalable so you can start out with a single node and add more nodes later.
 
Important: As part of your planning, work with your IBM tape library administrator to ensure that each IBM Spectrum Archive EE node in your configuration has its own media changer device (control path) defined in its logical library.
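During planning, the two drive rules (no drive is shared between nodes, and every node has at least one control path drive) can be checked with a short script. The following Python sketch assumes a hypothetical input format, a mapping from node name to a list of (drive serial, is control path) pairs, and hypothetical node and drive names; it does not query a real tape library.

```python
def check_drive_assignment(nodes):
    """Return a list of planning problems for a node-to-drive assignment.

    nodes: dict that maps a node name to a list of
    (drive_serial, is_control_path) tuples.
    """
    problems = []
    owner = {}                   # drive serial -> first node that claimed it
    for node, drives in nodes.items():
        # Rule: each node needs at least one control path drive.
        if not any(is_cp for _, is_cp in drives):
            problems.append(f"{node}: no control path drive")
        # Rule: a drive must not be assigned to more than one node.
        for serial, _ in drives:
            if serial in owner and owner[serial] != node:
                problems.append(f"{serial}: shared by {owner[serial]} and {node}")
            else:
                owner[serial] = node
    return problems


plan_ok = {
    "node1": [("DRV001", True), ("DRV002", False)],
    "node2": [("DRV003", True)],
}
plan_bad = {
    "node1": [("DRV001", True), ("DRV002", False)],
    "node2": [("DRV002", False)],
}
print(check_drive_assignment(plan_ok))
print(check_drive_assignment(plan_bad))
```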
Figure 1-8 shows the typical setup for an IBM Spectrum Archive EE single-library, multiple-node access.
Figure 1-8 Single-library multiple-node access setup
IBM Spectrum Archive EE manages all aspects of the single-library, multiple-node access, which includes the management of the following areas:
Multiple tenancy
The contents of the tape cartridge are managed automatically by the IBM Spectrum Archive EE system so that each IBM Spectrum Archive EE node does not have to be aware of any changes made on other IBM Spectrum Archive EE nodes. The index on each tape cartridge is updated when the tape is mounted and the index is read from this tape.
Single node management of library inventory
The IBM Spectrum Archive EE system automatically keeps the library inventory up to date to manage the available drives and tape cartridges. The library inventory is kept on the node on which the MMM service runs.
Space reclaim management
When data is moved from one tape cartridge to another to reclaim the space on the first tape cartridge, the IBM Spectrum Archive EE system ensures that the internal database reflects the change in the index of the IBM Spectrum Archive physical tape cartridge.