Protection of the Backup Index

The backup database, or index, keeps track of which files were backed up to which volume. Since the backup system can’t restore anything without this index, it becomes the single most important database in your environment. It is also the single point of failure in any backup system. As mentioned earlier, even if a volume is made with a format that is readable by a native utility, you still need the index to know what’s on it. The backup index is the greatest invention since someone created volume labels, but if it goes bad, you are out of luck.

Backup indexes usually are located on the central backup server, but they can be spread out to what are sometimes called “slave servers.” (A slave server is one that is allowed to have backup devices.) One of the first questions you might ask is, “How big will this thing get?” The typical answer is 0.5 percent to 1 percent of the amount of data that is being backed up. That answer is very misleading, completely wrong, and totally irrelevant. The total size of the data that is being backed up has absolutely nothing to do with the database size. Let me state that again.

Tip

The total size of the data that is being backed up has absolutely nothing to do with the size of the backup database. It is the number of files being backed up, not their size, that determines the size of the database.

Each file that is backed up becomes a record in the index. That record will be the same size, regardless of how big the backed-up file is.[40] The appropriate question, therefore, is, “How many bytes does each new file add to the index (a) the first time it is backed up and (b) any additional times it is backed up during incremental backups?” This number can then be multiplied by the number of files that are being backed up. This will show how big the index can get from one full backup. Multiply that number by the number of full backups the system is required to keep online. Using an estimate of a 2 to 5 percent daily volatility rate,[41] estimate how big the index will grow from each incremental backup. Multiply that number by the number of incremental backups that the system is required to keep online. Add that to the first number, then multiply by 2. The result will be a pretty realistic, albeit slightly exaggerated, estimate of how big the backup index will get.
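The arithmetic in the preceding paragraph can be sketched in a few lines of Python. This is only an illustration: the function name, the parameter names, and the sample figures (200 and 100 bytes per record, 4 fulls and 24 incrementals kept online) are hypothetical, and the sketch assumes that every full backup costs the "first time" number of bytes per file while every incremental costs the "additional times" number. Ask your vendor for the real per-file figures.

```python
def estimate_index_bytes(num_files, bytes_first, bytes_repeat,
                         fulls_kept, incrs_kept,
                         volatility=0.05, safety_factor=2):
    """Rough upper estimate of backup index size, in bytes.

    num_files     -- number of files being backed up
    bytes_first   -- bytes a file adds to the index on its first backup
    bytes_repeat  -- bytes a file adds on each later (incremental) backup
    fulls_kept    -- full backups that must stay online in the index
    incrs_kept    -- incremental backups that must stay online
    volatility    -- assumed fraction of files that change per day (2-5%)
    safety_factor -- the "multiply by 2" padding from the text
    """
    # Index growth from the full backups kept online.
    full_bytes = num_files * bytes_first * fulls_kept

    # Files expected to change, and so be re-indexed, in each incremental.
    changed_files = round(num_files * volatility)
    incr_bytes = changed_files * bytes_repeat * incrs_kept

    return (full_bytes + incr_bytes) * safety_factor


if __name__ == "__main__":
    # Hypothetical figures: one million files, 5 percent daily volatility.
    size = estimate_index_bytes(1_000_000, 200, 100, fulls_kept=4, incrs_kept=24)
    print(f"Estimated index size: {size / 1e9:.2f} GB")
```

Note that the size of the files never appears as an input; only the file count does.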

Managing the growth of the index is also a big issue. Whatever database format the product uses, one of the index files may grow larger than the filesystem permits. If that happens, the index may be corrupted immediately. The backup product should have some method of dealing with this problem. Also, the entire index may grow larger than the largest filesystem the platform allows, so the product should be able to spread that data out across multiple filesystems.
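If your product does not warn you before an index file reaches the filesystem's limit, a simple check of your own can help. The following is a minimal sketch, not a feature of any particular product: the function name, the directory argument, and the 80 percent warning threshold are all assumptions, and you must supply the maximum file size that applies to your own filesystem.

```python
import os


def index_files_near_limit(index_dir, max_file_bytes, threshold=0.8):
    """Return (path, size) pairs for files at or above threshold * limit.

    index_dir      -- directory holding the index files (site specific)
    max_file_bytes -- largest file the filesystem permits (you supply this)
    threshold      -- fraction of the limit at which to start warning
    """
    flagged = []
    for root, _dirs, files in os.walk(index_dir):
        for name in files:
            path = os.path.join(root, name)
            size = os.path.getsize(path)
            if size >= max_file_bytes * threshold:
                flagged.append((path, size))
    return sorted(flagged)
```

Run from a scheduled job, a check like this gives you time to act before a file reaches the limit.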

Just as the volumes should be platform independent, so should the backup index. You should be able to restore it to any system on which the server software runs and continue working. For this to work, the index needs to be completely platform independent. Some products are, and some aren’t. Of those that aren’t, some do provide a utility to move the index to other platforms. One of the best tests of this is to attempt to recover a Unix server’s backup index to a Windows 2000 server.

Before committing to buying a commercial backup product, test its index restore procedure. Some products can restore the index in a single step, while others require 20 pages of steps. Once you actually purchase a product, test that procedure again. Then test it on a regular basis so that you never hear yourself saying, “My whole world just crashed, including my backup server. Now what am I supposed to do again?”

A couple of minor (but nice) features also are helpful. The first is the ability to change a client’s name within the index. If a backup client’s hostname changes, and the backup product does not support this feature, there are only two choices. The first choice is to give up all backup history for that host. The second is to pay for another license, since the software will recognize the new hostname as a new client.

Another very nice feature, seen as essential by some, is the ability to reread a volume back into the index. Suppose there is a volume that has been set aside and is now expired out of the index completely. What if the only backup of a file that you need is on that volume? What if you don’t know? Some products can perform the restore without having to reread the entire volume, while others can read the volume right back into the index, making it appear as if it were just backed up. Some products are not able to reread that volume at all! One factor that goes into the product’s ability to read a volume like that is whether the vendor puts a copy of the index information onto the volume. It’s an extra step, but I think it’s well worth it. Basically, after every backup, the new portion of the backup index that was created from that backup is placed on the volume. It is possible to reread a backup volume that has no index information on it, but having the index there makes rereading the volume from scratch much easier.

The importance of the backup index, and your familiarity with how it works, cannot be overemphasized. It is the lifeblood of any backup system and should be treated like gold.



[40] Some products do use a variable-length record so that things like the length of the pathname can slightly affect the size of the record, but the size of the file still has no bearing.

[41] This is actually a huge volatility rate, but most environments don’t have any data on the number of files that change each day. Even if they’ve been monitoring their backup software, most reports talk only about how much data was backed up, not how many files were backed up.
