Expectations

It is important to develop realistic expectations when planning a backup architecture. This section discusses areas of the planning that are commonly misunderstood and that can cause a disproportionate number of problems.

Compression

Data compression has two effects. It speeds up the effective data transfer rate, and it compacts data written to tape, enabling the tape to hold more information.

Compression can be problematic for a number of reasons. The benefits of compression vary, depending on the type of data to be compressed and the compression mechanism used. When the same compression algorithm is used, different types of data compress to different degrees. The level of compression depends on how much redundancy can be identified and remapped in the time available. Some data types, for example, MPEG video, have little or no redundancy to eliminate and, therefore, do not compress well regardless of which compression mechanism is used. By contrast, raw video compresses reasonably well in most cases.

Hardware compression typically used in a tape drive relies on a buffer where data is held temporarily while being compressed. The size of the buffer places limitations on how much data can be examined for redundant patterns. Also, the amount of time necessary to locate all redundant patterns may not be available to the compression mechanism since compression happens in real time, when data is streamed onto the tape.

Administrators often expect either the 2:1 compression ratio frequently quoted in tape literature, or compression ratios similar to those of utilities such as compress(1) or gzip. The 2:1 value has been touted by manufacturers as “typical,” when in reality it is typical only of the test patterns used by the manufacturer. The compression ratios of diverse, real-world data are often lower, so performance planning based on a 2:1 compression ratio may well be inadequate for the task.

Another common mistake is to use software to compress data and then use that compression ratio to estimate the hardware compression ratio. Software compression and hardware compression are inherently different. Software compression utilities can use all the system memory to perform compression and are under no time constraints. Conversely, hardware compression is constrained by the hardware buffer size and is also limited because compression must be performed in real time. The compression ratio delivered by software utilities is generally better than drive hardware compression.
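To see the gap concretely, it can help to measure a software ratio directly. The following sketch (a minimal illustration using Python's standard zlib module; the file path is hypothetical) computes the software compression ratio of a sample file. The hardware ratio a drive achieves on the same data will generally be lower, for the buffer-size and real-time reasons noted above.

    # Measure the software compression ratio of a file with zlib.
    # This gives an upper bound for planning: drive hardware compression
    # operates on a small buffer in real time and usually achieves less.
    import zlib

    def software_ratio(path, level=6):
        """Return the compression ratio (original:compressed) as a float."""
        with open(path, "rb") as f:
            data = f.read()
        return len(data) / len(zlib.compress(data, level))

    # Hypothetical usage:
    # print(f"{software_ratio('/var/tmp/sample.dat'):.2f}:1")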

Compression ratios for various types of data (as observed in simple tests) are shown in TABLE 4-1. For hardware compression, a realistic compression ratio would be closer to 1.4:1, although some data types appear to do better. When data with little or no redundancy (MPEG, JPEG) is backed up, hardware compression should be turned off.

Table 4-1. Typical Compression Ratios

    Mode          Speedup Ratio   Compaction Ratio
    None          1:1             1:1
    Text          1.46:1          1.44:1
    Motion JPG    0.93:1          0.92:1
    Database      1.60:1          1.57:1
    File Server   1.60:1          1.63:1
    Web Server    1.57:1          1.82:1
    Aggregate     1.32:1          1.39:1

Metadata Overhead

Any backup plan must allow for metadata overhead above the data itself. Backup software keeps a catalog of the files that reside on tape, with a record for each instance of a file, and each record requires an estimated 150 to 200 bytes; Solstice Backup software typically needs slightly more per record than NetBackup does. A catalog containing a million file records therefore requires roughly 143 to 191 Mbytes of additional space for metadata. Plan to allocate fast, reliable disk space to accommodate the catalog, and include a schedule to back up the catalog itself.
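As a rough sizing aid, the catalog requirement can be computed directly from the expected record count. The sketch below is illustrative only; the 150-to-200-byte range is the estimate above, not a vendor-published constant.

    # Estimate catalog (metadata) disk space from the file-record count.
    # Per-record sizes are the rough estimates cited above; actual values
    # vary by backup product and version.
    BYTES_PER_RECORD = (150, 200)  # (low, high) estimate

    def catalog_mbytes(num_records):
        """Return (low, high) catalog size in Mbytes (2**20 bytes)."""
        return tuple(num_records * b / 2**20 for b in BYTES_PER_RECORD)

    low, high = catalog_mbytes(1_000_000)
    print(f"1M records: {low:.0f} to {high:.0f} Mbytes")  # 143 to 191 Mbytes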

Backup software also writes a certain amount of metadata to tape in order to track what is being written and where it is located. This metadata is usually small in comparison to the dataset: tests indicate that the metadata written to tape by NetBackup and Solstice Backup is commonly below 1 percent of the dataset size. Other software (for example, ufsdump) may write more metadata to tape, depending on the format used.
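Combining the compaction and metadata figures yields a rough estimate of tape consumption. The sketch below is illustrative; the 1 percent overhead and 1.4:1 compaction ratio are the ballpark values discussed above, not guarantees.

    # Rough tape consumption: dataset size plus a small metadata overhead,
    # divided by the hardware compaction ratio.
    def tape_gbytes(dataset_gbytes, metadata_overhead=0.01, compaction=1.4):
        return dataset_gbytes * (1 + metadata_overhead) / compaction

    # Hypothetical 500-Gbyte dataset:
    print(f"{tape_gbytes(500):.0f} Gbytes of tape")  # about 361 Gbytes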

Recovery Performance

A common misconception is that restore performance is identical to backup performance. A traditional rule of thumb was to anticipate a restore taking three times as long as the corresponding backup. Although this was a safe metric, recent measurements indicate that it is too conservative for the latest software and systems from Sun. With correct tuning and adequate hardware support, it is possible to implement restores that are only 10 percent slower than the corresponding backups. However, when no additional information is available, it is safer to plan for restores that are 50 to 75 percent slower.
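These rules of thumb translate directly into a planning calculation. The following sketch applies the slowdown factors above to a hypothetical measured backup time to derive a restore-window estimate.

    # Estimate a restore window from a measured backup time.
    # 1.10      = well-tuned system (about 10 percent slower than backup)
    # 1.50-1.75 = safer planning range when little else is known
    def restore_hours(backup_hours, slowdown=1.75):
        return backup_hours * slowdown

    backup = 4.0  # hours, hypothetical measured backup duration
    print(f"best case: {restore_hours(backup, 1.10):.1f} h")
    print(f"plan for:  {restore_hours(backup, 1.75):.1f} h")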

The performance discrepancy between backups and restores exists largely because disk writes often take longer than disk reads. Also, there is more demand for disk writes to be performed synchronously (to guarantee consistency). For example, creating files requires several synchronous writes to update the metadata that tracks file information.
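The cost of synchronous writes can be seen with a small experiment. The sketch below (illustrative; timings are machine-dependent) creates files with and without forcing each one to stable storage, roughly the penalty a restore pays when recreating many files and their metadata.

    # Compare buffered file creation with creation forced to stable storage
    # via fsync, analogous to the synchronous writes a restore performs.
    import os, tempfile, time

    def create_files(count, sync):
        start = time.perf_counter()
        with tempfile.TemporaryDirectory() as d:
            for i in range(count):
                with open(os.path.join(d, f"f{i}"), "wb") as f:
                    f.write(b"x" * 1024)
                    if sync:
                        f.flush()
                        os.fsync(f.fileno())
        return time.perf_counter() - start

    print(f"buffered: {create_files(200, sync=False):.2f} s")
    print(f"fsync'd:  {create_files(200, sync=True):.2f} s")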

Restore time is also increased by a browse delay at the start of a request. When a restore request is initiated, the software browses the file record database to locate all the records that must be retrieved. This takes time, especially with a large database containing millions of records.

The situation is more complicated for multiplexed restores because the software usually waits until all restore requests have been received before initiating the restore (alternatively, it may pick up additional requests after the restore begins). A delay occurs while file retrieval is synchronized. This synchronization is necessary because the data may be distributed across the media; if retrieval were not synchronized, the restore would have to be serialized, resulting in repeated rewinding of the tape to access the individual backup streams.

Ease of Use and Training Requirements

Storage management software has powerful features that can be accessed through the GUI, and library hardware has also been streamlined for ease of use (for example, the GUI touch-screen controls on the Sun StorEdge L3500 tape library). However, “ease of use” does not necessarily equate to “easy to use.” Implementing backup and data protection can be highly complex, and decisions made at the planning, installation, and operator levels affect the success of the overall system. Therefore, everyone involved in the process must either possess the required skills or receive training on the relevant hardware, software, and issues.

It would be naive to expect that a well-tuned backup solution can be put together simply by assembling the hardware and installing the software. Even a moderately complex backup installation requires experienced personnel to install, configure, and tune its various components, a process that can take anywhere from days to a few weeks, depending on the complexity of the installation.

One approach is to contract with experienced consultants. For example, Sun, Legato, and others offer professional contract staff who will install and configure a system to specific requirements. This service can include on-site training of personnel to operate and maintain the system. In addition to this basic training, the staff should have further training that enables them to modify the system configuration to deal with changing demands. Alternatively, instead of training staff, you could use a long-term contract that includes system tuning to meet the changing demands of a system.
