Database Backup Strategies in Support of High Availability

Backing up and restoring databases are fundamental to any highly available implementation. The real trick is in knowing what type of backup to do and how often to do it so that your database can be recovered easily and quickly to fulfill your high availability requirements. You have two things to protect: the integrity of the data and the availability of the data. Understanding how long it takes to back up your database or to restore your database should be well known as you plan for high availability. This must also correspond closely to the service level agreements you have committed to. Having the proper backup and recovery approach in place is key to achieving these commitments. Plan your backup strategy based on the time it takes to back up and the time it takes to fully recover a database. It is this total time that must be considered for a system that is to be highly available. For databases that have tables with varying uses, you might want to consider a multiple file group approach that keeps tables that are read-only in one file group, huge volatile tables in another, and so on. In this way, you would only need to back up the data tables that change and not the static tables (achieving even better availability).

The good news is that there are a few varying types of database backup techniques that can be leveraged to achieve an optimal recovery strategy. There is a full database-oriented approach and a file-oriented approach. Both work well. They offer up-to-the-minute recovery, point-in-time recovery, and some degree of redundancy that allows you to recover from damaged full database backups/file backups. As you are hopefully already aware, there are four backup operations that can be used with SQL Server 2000:

  • Full database backups— This effectively starts the logical point of recovery for most backup strategies (a full database backup or a file/file group backup). A full database backup will back up the entire database allocation, including the data portion (the database structures, indexes, database users, and so on) along with the transaction log portion of the database. This results in an exact image of the database stored in an optimized backup format that can easily be used to restore/recover from.

  • File/File group backups— Works pretty much the same way a full database backup works, except that the transaction log portion is not backed up. You must do this as an added step to get to a recovery point that makes sense for you. However, if you are dealing with very large databases, very often the file/file group approach makes the most sense to use.

  • Transaction log backups— Transaction log backups are the way that changes can be captured at set intervals since the last database full backup, last full file/file group backup, or the last transaction log backup (incremental log backups). These specifically back up only the “committed” portion of the transaction log. Any transaction that is still “in flight” (not completed) would not be recovered by a transaction log backup. As part of a transaction log backup, the inactive portion of the transaction log is cleared as well. Each transaction log backup essentially marks a point in time from which you can recover to.

  • Differential backups— A differential backup makes a copy of all the pages in a database that have been modified since the last database full backup. Using this approach, backups run relatively quickly and are smaller in size than other types of backups. Moreover, differential backups may be performed while users access the database. Because of their speed and low server impact, differential backups can be made more frequently than other types of database backups, decreasing the risk of data loss.

Most database-related failures that will result in you having to do a restore are

  • Server crashes— Sometimes memory or operating system related, power loss, and so on. SQL Server usually recovers automatically on its own in these cases, but there is a chance that a database will end up in a “suspect” mode.

  • Corrupt or lost data— Broken page chains for a table, wrong update, delete or insert statement being executed by a user (user error), and so on. These very often require point-in-time recovery to a point just before the corruption occurred.

  • Corrupt or lost database— Database becomes “suspect” or is damaged in some way. Often requires full database restoration.

  • Corrupt or lost transaction log— Transaction log becomes invalid somehow. Often requires full database restoration.

  • Corrupt or lost master database— This affects the SQL Server instance itself. Rebuilding a master db should be one thing you have in your arsenal, since all databases on the SQL Server instance won't be available until master is recovered properly.

  • Lost SQL Server— The disk crashes, or other factors make this entire server unavailable. You will need to be able to completely recover a full SQL Server instance within a short amount of time.

Two Backup Approaches for High Availability

Your focus should be on minimizing your unavailability and, at the same time, not sacrificing data integrity or performance. For these reasons, we recommend a couple of backup approaches that leverage full backups (database or file/file group), coupled with differential and transaction log backups for the most up-to-date recovery points possible. This is a tailored hybrid backup/restore approach that promotes fast recovery to support your high availability needs. Also, keep in mind that SQL Server allows data to be backed up while the database remains online and accessed by users. This offers some flexibility if you are worried about what to do with your database if it must be up 24 hours a day.

The Full Database Backup Approach

Making full database backups with some periodicity (like every day or weekly—depending on the database's size and volatility) will be critical to your database's availability (actually, recoverability). The traditional full database backup approach is what most of us do (and have done since SQL Server first came out). It allows you to have up-to-the-minute and point-in-time recovery.

Up-to-the-minute recovery literally gets you recovered up to the last committed transaction in the transaction log. This is highly desirable.

Point-in-time recovery allows you to pick the point in time you wish to recover to, perhaps just before an application error messed some table's data up severely. More recently, differential database backups were added for SQL Server, which help greatly in speeding recovery because large groups of log transactions can be skipped during recovery process.

Figure 10.13 shows a typical full database backup sequence that is done daily, coupled with incremental transaction logs done on-the-hour along with a mid-day differential database backup.

Figure 10.13. Full database backup daily schedule, with incremental transaction log dumps and a differential database backup.


Worst case recovery times are now minimized to not more than the time to restore the database from its full database backup (plus apply the differential database backup if your failure was after 1200 hours) and apply log transactions up to the point in time of the failure. This can be very fast depending on the database size and volatility. If the differential backup was invalid for any reason, you can still fully recover your database using the transaction log backups. And, vice versa, if the transaction logs are bad, you at least can recover to the differential backup point (reducing risk accordingly).

The Full File/File Group Backup Approach

With full file/file group backups, you can back up any file or file group that has been defined and at any time. There really are few restrictions here. The primary issue that comes to mind is that you cannot restore file/file group backups to a location other than the original location without first restoring the entire database. But this is rarely needed.

This type of strategy makes the most sense when you have some tables that are static (and read only) and a few that are highly volatile (and large). If these are isolated into their own files or file groups, you can selectively just back up the volatile tables (file/file groups) much more efficiently and can restore these selectively as well. The same type of backup sequencing and scheduling should be employed that was illustrated in Figure 10.13, but the need to do full database dumps is no longer needed. Best rule of thumb is to back up logically related tables and indexes together. Then, the same differential backup approach is used at the file/file group level (with all the same advantages). Great!

Design Note

We often use the special backup device of NUL to test out our backups. It will actually back up your database but not write the backup anywhere (it goes to the NUL device). It does allow you to accurately see the number of pages processed of both your data and log portions of the database, and the amount of time of the backup process so you can better plan your recovery (and availability). Below is an example of using the DISK = 'NUL' technique:

BACKUP DATABASE Northwind to DISK = 'NUL'
Go
Processed 320 pages for database 'Northwind',
       file 'Northwind' on file 1.
Processed 1 pages for database 'Northwind',
       file 'Northwind_log' on file 1.
BACKUP DATABASE successfully processed 321 pages in 0.365
       seconds (7.187 MB/sec)

To translate the pages processed into the actual disk space that it will occupy simply multiply the pages by 8192 bytes per page (for example, 321 pages * 8192 bytes/page = 2,629,632 bytes or 2.6MB backup file size).

And, once you have run a BACKUP command using the NUL device, SQL Server still thinks that a normal backup has occurred. So, don't forget to start the backup sequence again (for real) with a full database backup (or full file/file group backup) to a valid backup device before you go live with these strategies.


Parallel Striped Backup

SQL Server 2000 also allows you to backup a database to multiple devices in parallel, resulting in a striped backup set. You might want to pursue this type of backup strategy if you need super fast full database backups. We have not seen this used in practice because of the limitations it has on restoration. All striped backup devices must be restored together and if any one of them is damaged, the whole set is invalid. And, with the faster and faster disk drives that are available today, the speed of disk access has become less of the bottleneck.

Split-Mirror Backups (Server-less Backups)

The split-mirror or server-less backup business is intended to garner much higher database availability by providing an extremely rapid database backup and restore capability. And, with nearly instantaneous split-mirror restores of large amounts of data, you can keep your system online and available almost continuously. Split-mirror backups also help greatly to reduce the resource utilization on your server that would normally be needed to do scheduled backups. In addition, split-mirrored backups are also easily tapped to create copies of your primary database that can be used for reporting or even for fail-over if needed. This is not triple mirroring though. Triple mirroring is a purely hardware only solution at the disk-subsystem level.

To use split-mirror backup and restore capabilities, your database will have to leverage customized hardware and software from leading disk subsystem vendors such as EMC, Veritas, HP, and others. Be forewarned: There is usually a hefty price tag to this technology. These specialized offerings usually contain at least RAID 10 mirrored disks and volume management software that interacts directly with SQL Server.

The disk subsystem illustration in the top of Figure 10.14 shows the starting point of a split-mirror configuration (which is not split yet). In this case, it is a RAID 10 mirrored configuration that is mirroring to two mirrored volume sets (Mirrored Set and Second Mirrored Set) in the same disk subsystem. All SQL Server disk writes are being fully mirrored from the primary set to both mirror sets simultaneously. This, in itself, is a very high fault-tolerant configuration. To turn this into a split-mirror backup, you will “split” off the second mirrored set to become a snap shot of the primary set that can be synchronized later. So, at the time of this split, you are giving up a mirror, but gaining a database snapshot to provide you with faster backup and recovery capabilities (for high availability).

Figure 10.14. Split-Mirrored backup approach for higher database availability.


In Figure 10.14 (at the bottom), you can see the second mirrored set that gets split off. This is often referred to as a business continuance volume or a split-mirrored set. The vendor's software utility will issue a SQL Server native level backup of the database and will store it on the split-mirrored set. During this backup process, writes to the physical disks are suspended to eliminate torn pages (partially written pages to a mirror) but reads are still possible. And, the splitting itself is really fast. Once the split-mirrored backup is complete, writes at the physical disk level are resumed. This split-mirrored backup is completely usable in any way you want. It can be the basis of a fail-over, used for archiving, or for reporting. You can choose to re-synchronize this split-mirror with the primary mirror (basically a re-mirroring) and do a split-mirror restores (at the SQL Server database level).

If you do a split-mirrored restore, SQL Server will be able to read the split-mirrored set immediately and write to all mirrored sets, the net effect being that the database becomes available almost instantaneously. Following the restore, you would also be able to apply incremental log backups to your database as needed (and these are usually applied very quickly depending on the time between log backups).

You should be able to do numerous full database backups during normal operating hours to guarantee up-to-date database images and maximum availability (minimum recovery times).

Volume Shadow Copy Service (VSS)

If you haven't heard by now, Windows Server 2003 comes with a Volume Shadow Copy Service that is, as its name suggests, a volume-oriented shadowing capability. It has been integrated with SQL Server via a component named SQLWriter. It is essentially like a split-mirror but done at the OS level rather than via specialized hardware. Combined, you will be able to do full database backups (with snapshot) via the VSS service and then separately apply transaction logs to the point in time you want to recover to. Slowly, backup and recovery (of even SQL Server databases) is being brought out to the operating system level for single point of management (this is long overdue).

Monitoring/Verifying Backups

If you are serious about database backup and recovery, you will want to monitor and verify the backups that are being made. I've lost count of the number of times that I went to restore a backup and it was unreadable for a variety of reasons. There is both a database integrity check and a backup integrity check (for both data and transaction log dumps) that you can execute as part of your regular maintenance plans (or as a separate offline process). This can save the day if the integrity check just catches one bad transaction log backup or one bad full database backup. Figure 10.15 shows both the verify database integrity option and the verify backup integrity option as part of your database maintenance plan. Performance issues of these integrity checks may preclude you from making them part of the maintenance plan, but they should be run regularly somehow.

Figure 10.15. “Check Database Integrity” and “Verify the Integrity of the Backup when Completed” options of a database maintenance plan.


In addition, activating alerts to notify support personnel when backups fail, transaction logs are filling up, or integrity is in question is an essential part of maintaining high availability. Figure 10.16 shows the integrated capability of SQL Server to completely support an alert and notification process in the event of these types of problems.

Figure 10.16. Alert and notification setup and testing for the primary SQL Server issues during backup and integrity verification.


If you do not define these notifications, you are putting your company at risk.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
18.226.185.96