Advances in Backup Technology

In the past, IT organizations have turned to mainframes as a solution for high-speed backup of large databases. While UNIX systems typically delivered backup throughput of 50 to 70 Gbyte/hr, mainframes with their high-speed tape drives delivered nearly six times that throughput. However, several recent developments have reversed this situation by enabling sustained backup rates of more than 1 Tbyte/hr on Sun servers while at the same time decreasing the intrusiveness of backup operations. Some of these recent developments are described in the following sections.

Faster Throughput Rates

Tape drive technology has seen dramatic improvements in throughput rates. The Mammoth 8 mm drive provides a native transfer rate of 12 Mbyte/sec with a compressed mode transfer rate of at least 30 Mbyte/sec. The STK 9840 drive, which competes with the IBM 3590 drive, provides a native transfer rate of 9 Mbyte/sec with a burst rate of 40 Mbyte/sec. A double-speed version of the STK 9840 drive is pending at the time of this writing. The DLT 7000 provides a native transfer rate of 5 Mbyte/sec. Linear Tape-Open (LTO) technology offers a native transfer rate of 12 to 15 Mbyte/sec.
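To put these transfer rates in context, the following minimal sketch estimates how many drives would have to stream in parallel to sustain the 1 Tbyte/hr figure quoted earlier. It assumes decimal units (1 Tbyte = 1,000,000 Mbyte) and that every drive runs continuously at the stated rate; the function name drives_needed is illustrative only.

```python
import math

# Assumption: decimal units, 1 Tbyte = 10**6 Mbyte.
MBYTE_PER_TBYTE = 1_000_000
SECONDS_PER_HOUR = 3600


def drives_needed(target_tbyte_per_hr: float, drive_mbyte_per_sec: float) -> int:
    """Minimum number of drives, each streaming continuously at the given
    rate, needed to sustain the target aggregate throughput."""
    target_mbyte_per_sec = target_tbyte_per_hr * MBYTE_PER_TBYTE / SECONDS_PER_HOUR
    return math.ceil(target_mbyte_per_sec / drive_mbyte_per_sec)


# Rates taken from the text above; the 1 Tbyte/hr target is about 278 Mbyte/sec.
print(drives_needed(1.0, 12))  # 12 Mbyte/sec native      -> 24
print(drives_needed(1.0, 30))  # 30 Mbyte/sec compressed  -> 10
print(drives_needed(1.0, 5))   # 5 Mbyte/sec (DLT 7000)   -> 56
```

In practice the drives must also be fed data fast enough to keep them streaming, which is the subject of the later sections on multiplexing and multistreaming.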

Greater Capacities

Along with the improvements in speed have come improvements in capacity. The Mammoth2 native capacity is 60 Gbytes, with the compressed capacity typically being 150 Gbytes. The STK 9840 native capacity is 20 Gbytes; the STK 9840 is designed to minimize time to data rather than to maximize capacity. The Linear Tape-Open (LTO) and Super DLT (SDLT) technologies offer native capacities of 100 Gbytes, with even greater capacity for compressed data.

New Approaches to Online Backups With Database Technology

Recognizing the need for high-speed backups that require no downtime, database vendors have developed approaches to online backups that enable specialized backup software such as NetBackup or Solstice Backup to transfer data from the database management system (DBMS) to backup devices, using parallel streams of data. One example is the Oracle RMAN utility. This utility manages the creation of a database snapshot and feeds parallel data streams to the backup tool for multiplexing onto tape devices. Previously, this process required dumping database tables to separate ASCII files and then backing up the files. Today, however, RMAN provides a convenient API that can be used by third-party backup and restore utilities.

Automated Backup and Recovery Management Procedures

Another important development that is changing the character of backups is the advent of management software that automates backup policies and optimally feeds data to tape devices, both ensuring integrity and speeding up the backup process. Raw tape speed and high-capacity drives are meaningless without the ability to effectively manage the transfer of data.

NetBackup and Solstice Backup offer built-in GUI-based schedulers. However, large organizations often use a third-party scheduler. A third-party scheduler takes advantage of the backup tool’s command-line interface.

Some third-party schedulers:

  • Control-M for Open Systems from New Dimensions Software

  • Tivoli Workload Scheduler (formerly Maestro) from Tivoli Systems

  • Tivoli Management Environment (TME) from Tivoli Systems

  • Platinum AutoSys from Platinum Technology

  • Event Control Server (Global ECS) from Vinzant

What are the tradeoffs between using a backup tool’s built-in GUI scheduler and a third-party scheduler? A script-based approach with a third-party scheduler is more complex to implement but provides greater automation and power.

If a large number of backup jobs need to be automated, or if other types of jobs (besides backup jobs) need to be automated, a third-party scheduler is usually the better fit. For example, one large organization backs up more than 10,000 machines each night.

Using a script-based approach with a third-party scheduler is safer than allowing an individual to manipulate the entire schedule with a GUI tool. Furthermore, if a script-based approach is used, the script can be regenerated on-the-fly each night, thereby performing more sophisticated scheduling than could be accomplished using a GUI. For example, a query could determine which file systems exist and are mounted, and a script could then be generated that backs up those file systems.
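The nightly regeneration described above might look like the following minimal sketch. It assumes a mount table whose whitespace-separated fields begin with device, mount point, and file system type (the layout used by /etc/mnttab on Solaris and /proc/mounts on Linux). The command name run_backup is a hypothetical placeholder, not the command-line interface of NetBackup or Solstice Backup.

```python
import shlex


def mounted_file_systems(mount_table: str, fs_types=("ufs", "vxfs")) -> list[str]:
    """Return mount points of the wanted file system types.

    Each line is expected to start with: device, mount point, fs type.
    """
    mount_points = []
    for line in mount_table.splitlines():
        fields = line.split()
        if len(fields) >= 3 and fields[2] in fs_types:
            mount_points.append(fields[1])
    return sorted(mount_points)


def generate_backup_script(mount_points, backup_cmd="run_backup") -> str:
    """Build a shell script with one backup command per mounted file system.

    backup_cmd is a hypothetical placeholder for the backup tool's CLI.
    """
    lines = ["#!/bin/sh", "# Regenerated each night; do not edit by hand."]
    for mount_point in mount_points:
        lines.append(f"{backup_cmd} {shlex.quote(mount_point)}")
    return "\n".join(lines) + "\n"


# Illustrative mount table; a real script would read the system's own table.
sample_table = """\
/dev/dsk/c0t0d0s0 / ufs rw 0
/proc /proc proc rw 0
/dev/dsk/c0t0d0s7 /export/home ufs rw 0
"""
print(generate_backup_script(mounted_file_systems(sample_table)))
```

Because the script is rebuilt from the current mount table, file systems that were added or unmounted during the day are picked up or dropped without anyone editing the schedule.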

Another important point to consider is that third-party schedulers are event based and can schedule jobs and react to events that are outside the domain of the backup tool. For example, you might not want to run a backup job until a particular report has finished running or until a large database update job has completed. This choice is easier to accomplish with a third-party scheduler.
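The event-based behavior can be illustrated with a small dependency check. This is only a sketch of the idea, not the interface of any scheduler listed above; the job names and the runnable_jobs function are hypothetical.

```python
def runnable_jobs(dependencies: dict[str, set[str]], completed: set[str]) -> list[str]:
    """Return the jobs that have not run yet and whose prerequisite
    jobs have all completed."""
    return sorted(
        job
        for job, prerequisites in dependencies.items()
        if job not in completed and prerequisites <= completed
    )


# Hypothetical job graph: the backup waits for a report and a database update.
dependencies = {
    "nightly_report": set(),
    "db_update": set(),
    "db_backup": {"nightly_report", "db_update"},
}

print(runnable_jobs(dependencies, set()))                            # ['db_update', 'nightly_report']
print(runnable_jobs(dependencies, {"nightly_report"}))               # ['db_update']
print(runnable_jobs(dependencies, {"nightly_report", "db_update"}))  # ['db_backup']
```

The backup job becomes eligible only once both prerequisite events have occurred, regardless of the time of day.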

Multiplexing

Backup tools such as NetBackup and Solstice Backup now make it possible to run multiple backup jobs simultaneously and to stream data to one or more devices. This technique is known as multiplexing. Multiplexing can be accomplished in two ways:

  • Across tape devices

    Multiplexing across tape devices allows high throughput to the tape subsystem.

  • Across input streams

    Multiplexing input streams allows more data to be staged to tape, allowing full tape bandwidth use.

The streams can originate from locally attached disks or from clients over a network. As jobs finish, the backup tool can dynamically add more backup streams to the backup device. The configuration can be tuned to the desired level of multiplexing for each backup device. These backup tools also make it possible to initiate parallel restores from multiplexed images on tape.

Multiplexing makes it possible to keep a fast tape drive running continuously. Continuous operation is important for DLT drives since they require a relatively long time to spin up. The multiplexing needs to be set high enough on a tape drive so that it can accept enough streams to keep it continuously busy. In some situations, multiplexing might be set to just one stream. This can be the case when one or more tape devices connect directly to a large Oracle database. The tape devices can run at full speed in this scenario. In other situations, the multiplexing might be set to as many as 20 or 30 streams per tape drive.

For example, a nationwide car rental business backs up 1000 desktop machines from various airports in the country to a centralized location. Some streams come off 56 Kbit/sec leased lines, some come off 128 Kbit/sec ISDN lines. The data comes in from all over the country, so the multiplexing is set to 20 to 30 streams per tape. This practice ensures the tape drives are kept busy.

If multiplexed tapes need to be duplicated for offsite storage or other purposes, there are two options. Exact copies of the tapes can be created so that the copies contain the data in multiplexed format. Alternatively, the tapes can be demultiplexed on-the-fly during duplication. A demultiplexed tape can be restored more quickly. However, demultiplexing requires additional machine cycles. Usually, a dedicated backup server is used for this purpose. Typically, a higher priority will be placed on demultiplexing the most important datasets so these can be restored faster.
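The relationship between a multiplexed tape and its demultiplexed copies can be sketched as follows. The sketch assumes a simple round-robin interleave in which each block is tagged with the identifier of its stream; the actual on-tape formats used by NetBackup and Solstice Backup are not described here and will differ.

```python
from collections import defaultdict
from itertools import zip_longest


def multiplex(streams: dict[str, list[bytes]]) -> list[tuple[str, bytes]]:
    """Interleave blocks from several input streams round-robin,
    tagging each block with its stream identifier."""
    tape = []
    stream_ids = list(streams)
    for blocks in zip_longest(*(streams[sid] for sid in stream_ids)):
        for stream_id, block in zip(stream_ids, blocks):
            if block is not None:  # this stream has already finished
                tape.append((stream_id, block))
    return tape


def demultiplex(tape: list[tuple[str, bytes]]) -> dict[str, bytes]:
    """Regroup tagged blocks so each stream becomes one contiguous image."""
    images = defaultdict(list)
    for stream_id, block in tape:
        images[stream_id].append(block)
    return {sid: b"".join(blocks) for sid, blocks in images.items()}


# Hypothetical client streams of unequal length.
streams = {"clientA": [b"a1", b"a2", b"a3"], "clientB": [b"b1"]}
tape = multiplex(streams)
print(tape)
print(demultiplex(tape))
```

Restoring one client from the multiplexed form means reading past every other client's blocks, whereas the demultiplexed image is contiguous, which is why it restores more quickly.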

Compression

Data compression can be used to reduce tape storage requirements, improve backup speed, and possibly reduce network traffic. Two compression options are available: the software compression functions built into tools such as NetBackup and Solstice Backup, or the compression functions provided by dedicated hardware on tape drives.

It is better to use the hardware compression on tape drives if there is sufficient network bandwidth to support noncompressed network traffic. Hardware compression offers higher performance, with compression ratios comparable to those of software compression. However, hardware compression generally uses a device-specific format, which reduces tape portability: a tape may have to be read on the same type of device that originally wrote the compressed data. If data is compressed prior to being sent over a network, it occupies less bandwidth; however, the performance of the backup client will be degraded because the compression software requires CPU cycles.
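The client-side tradeoff can be measured with a short sketch. It uses the zlib module from the Python standard library as a stand-in for a backup tool's software compression, which is an assumption made for illustration; the sample data is artificial and highly repetitive, so real data will compress less well.

```python
import time
import zlib


def compress_for_network(data: bytes, level: int = 6) -> tuple[bytes, float]:
    """Compress data on the client and report the CPU seconds it cost."""
    start = time.process_time()
    compressed = zlib.compress(data, level)
    cpu_seconds = time.process_time() - start
    return compressed, cpu_seconds


# Artificial, highly repetitive sample data (about 1 Mbyte).
data = b"customer record 0001\n" * 50_000
compressed, cpu_seconds = compress_for_network(data)

print(f"original:   {len(data)} bytes")
print(f"compressed: {len(compressed)} bytes sent over the network")
print(f"client CPU: {cpu_seconds:.4f} seconds")
```

Fewer bytes cross the network, but the CPU time is spent on the backup client, which is the degradation noted above.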

For further information, see “Compression” on page 78 and “Data Type” on page 63.

Raw Backups With File-Level Restores

NetBackup now offers a feature, called VERITAS NetBackup FlashBackup, that improves backup performance in certain situations, although restores of files backed up in this way may be slower. FlashBackup performs a fast backup of an entire raw partition, bypassing the file system. However, it does keep track of the inode information so that individual files can be restored. This approach works well if a backup of a large number of small files is required. Furthermore, backup catalog sizes are smaller with this approach, since it is not necessary to keep all the file information in the backup catalog.

True Image Restore

NetBackup also provides a function called true image restore. This function can restore a file system to its most recent state. When this feature is enabled, additional information is collected during an incremental backup: NetBackup tracks any files that were added or deleted since the last backup. For example, suppose that incremental backups were performed on Monday, Tuesday, and Wednesday, that 80 files were deleted on Tuesday, and that the disk crashed on Thursday. With a true image restore, the deleted files are not restored, since they were no longer present when the most recent snapshot of the file system was taken. This could be important in some situations. For example, a user may have purposely deleted files to make room on a disk. A full restore of all files that had existed since Monday would amount to 3.5 Gbytes of data, but the disk may only have a capacity of 2 Gbytes.
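The selection logic can be sketched as follows. The sketch assumes that each backup records both the files it copied and the full list of files present at backup time; this data model and the function names are illustrative and do not describe NetBackup's internal catalog. For simplicity, the Monday backup is assumed to have copied every file.

```python
def plain_restore_set(backups: list[dict]) -> dict[str, str]:
    """Map every file ever backed up to the day of its newest copy."""
    restore_from = {}
    for backup in backups:  # chronological order, so later copies win
        for name in backup["copied"]:
            restore_from[name] = backup["day"]
    return restore_from


def true_image_restore_set(backups: list[dict]) -> dict[str, str]:
    """Like plain_restore_set, but keep only the files that were still
    present when the most recent backup was taken."""
    present_at_last_backup = backups[-1]["present"]
    return {
        name: day
        for name, day in plain_restore_set(backups).items()
        if name in present_at_last_backup
    }


# Hypothetical history: b.dat and c.dat are deleted on Tuesday.
backups = [
    {"day": "Mon", "copied": {"a.dat", "b.dat", "c.dat"},
     "present": {"a.dat", "b.dat", "c.dat"}},
    {"day": "Tue", "copied": set(), "present": {"a.dat"}},
    {"day": "Wed", "copied": {"d.dat"}, "present": {"a.dat", "d.dat"}},
]

print(sorted(plain_restore_set(backups)))       # ['a.dat', 'b.dat', 'c.dat', 'd.dat']
print(sorted(true_image_restore_set(backups)))  # ['a.dat', 'd.dat']
```

The plain restore brings back the files deleted on Tuesday, while the true image restore reproduces the file system as it stood at the Wednesday backup.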

Automatic Multistreaming

Another new performance feature available in NetBackup is automatic multistreaming. If a server is attached to several disk drives, you can specify that all local drives are to be backed up as separate streams. With this feature, the data can be streamed to multiple tape drives, or multiple streams can be sent to a single tape drive.

This feature automatically multistreams drives from a single NetBackup class. For example, if 10 local disk drives are attached to a server, the multistreaming feature can send 10 data streams to the tape drive (or drives) which will increase performance. Additionally, multistreaming can automatically restart any failed streams by using the checkpoint restart function. This is important if a large backup job is in progress and a part of the job fails because NetBackup can restart where the job left off and redo any failed streams.
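The idea of running one stream per disk and redoing only the streams that failed can be sketched as follows. This is an illustration under stated assumptions, not NetBackup's implementation: backup_one stands for whatever backs up a single stream, and a failed stream is rerun from its beginning rather than from a checkpoint inside the stream.

```python
from concurrent.futures import ThreadPoolExecutor


def run_streams(stream_names, backup_one, max_parallel=4, max_attempts=3):
    """Back up each named stream in parallel and rerun only failed streams.

    backup_one(name) is a caller-supplied function (hypothetical here) that
    raises an exception if the stream fails. Returns a dict mapping each
    stream that still failed after max_attempts to its last exception.
    """
    pending = list(stream_names)
    failed = {}
    for _ in range(max_attempts):
        if not pending:
            break
        with ThreadPoolExecutor(max_workers=max_parallel) as pool:
            futures = {name: pool.submit(backup_one, name) for name in pending}
        # Leaving the with block waits for every submitted stream to finish.
        failed = {}
        for name, future in futures.items():
            error = future.exception()
            if error is not None:
                failed[name] = error
        pending = list(failed)  # completed streams are not repeated
    return failed


# Hypothetical stream function: disk3 fails on its first attempt only.
attempts = {}

def flaky_backup(name):
    attempts[name] = attempts.get(name, 0) + 1
    if name == "disk3" and attempts[name] == 1:
        raise IOError("simulated tape write error")

print(run_streams(["disk1", "disk2", "disk3"], flaky_backup))  # {}
print(attempts)
```

Streams that completed on the first pass are left alone, so a single failure does not force the whole job to be repeated.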

Fine-grained control can be achieved by using the command that creates a new stream, which allows the user to specify groups of subdirectories and files as individual streams. By using this feature, you can make portions of a disk into backup streams instead of entire disks as is the case with the all local drives command.

When should entire disks be multistreamed, and when should the streams be defined in terms of specific subdirectories and files? An entire disk might be streamed if it contains many subdirectories, especially if subdirectories and files are added and deleted frequently. In this case, it is not necessary to remember to update the corresponding NetBackup class whenever directories are added or deleted. On the other hand, if a disk holds one large directory that is divided into subdirectories, defining the individual subdirectories as separate streams gives finer-grained control and could increase performance.
