21

Defining Your Business Continuity Management Plan

By Chris Whitehead

SharePoint is fast becoming a key application within many organizations, whether used as the platform for a company's Internet presence, or as a departmental collaboration solution. More often than not, these types of usage scenarios would automatically deem an application as business-critical. Indeed, e-mail is a communication and collaboration tool that is often considered business-critical for most organizations with an IT function.

All too often, a business impact analysis is not performed for SharePoint within an organization. This is reflected in the lack of a suitable business continuity management plan, and associated service level agreements (SLAs) for the service. If you do not know the business importance of a service, or the costs associated with outage and data loss, you cannot effectively define SLAs for that service. These SLAs will not only define the agreements for recovery objectives and service availability; they will often help determine what backup, recovery, and availability solutions are required to meet them. Moreover, they are likely to feed into other key design aspects for a SharePoint deployment, ranging from storage planning to governance guidance for the creation and deployment of customizations.

One approach would be to back up everything as often as possible, while providing multiple redundancy solutions for hardware and software. Unfortunately, because of cost and complexity, this is simply not possible for most organizations. The challenge for any architect is to understand all the options available for backup, recovery, and availability, and then provide the best solution for meeting SLAs, while weighing the cost and complexity of the solution.

SharePoint has always provided some interesting challenges in this area, and SharePoint 2010 is no exception. However, it does provide another leap forward with the inclusion of new features such as native support for SQL Server database mirroring, unattached content database data recovery, and configuration-only backups. In fact, new features were even added in Service Pack 1, with the much sought-after addition of a Site Recycle Bin. This chapter explores both these new features and proven methods for SharePoint business continuity management.

Of course, before you can choose the tools and techniques to use, you must determine and define your SharePoint business continuity requirements.

DEFINING YOUR BUSINESS CONTINUITY REQUIREMENTS

To define your SharePoint business continuity requirements, you must first analyze the potential business-impact scenarios and threats. Once complete, you will be in a position to determine your SLAs. This is likely to be a balancing act of cost versus business risk. Let's take a look at the information that you need to complete this process.

Analyzing Business-Impact Scenarios and Threats

When analyzing business-impact scenarios and threats, you should focus on understanding which could impact the continuity of your SharePoint environment, as well as the data and services built upon it. Defining your SLAs for each scenario should be left for later.

Impact scenarios and threats could range from loss of a single file, to an entire data center failure. Following is a list of potential business-impact scenarios and threats for any SharePoint deployment. Some may not be relevant to you, and you will likely find scenarios and threats that apply to your own specific environments.

  • Loss of data (including specific scenarios for SharePoint objects such as individual items and their versions, lists and document libraries, and sites and site collections)
  • Slow recovery of data
  • Service outage caused by failure of individual hardware components (such as servers, storage, networking, and their sub-components)
  • Failure or partial failure of the entire data center
  • Failure of external services provided by third parties, such as Internet service providers (ISPs)
  • Service outage caused by software failures or human error

NOTE: You should always consider software and hardware on which SharePoint has dependencies. For example, lack of redundancy for Active Directory would be classed as a risk to SharePoint. Operating level agreements (OLAs), which are SLAs between functional IT groups and services, should be put in place where necessary.

Determining Your SLAs

When deciding on your SharePoint business continuity requirements, you will need to work with the business owners and key stakeholders to come up with realistic numbers for SLAs. They may initially provide unrealistically high numbers without thinking through the reasons and consequences of doing so. More often than not, these numbers will be accompanied by an unwillingness to provide the necessary infrastructure and resources to meet the SLAs. This is where negotiation starts, during which you will need to discuss the following:

  • What the environment is going to be used for
  • What data and services need protecting and the priority for each
  • What the business impact is if the data and services are unavailable or unrecoverable, including the following:
    • Measurement of cost or lost opportunity/revenue over time
    • Possible legal implications that need discussing
  • Any existing SLAs and the reason for choosing the proposed SLAs
  • Potential chargeback levels for different SLAs
  • Penalties for failing to meet SLAs
  • How to report on adherence to SLAs
  • What infrastructure and resources are needed to meet these SLAs

The last point is your biggest bargaining chip when negotiating SLAs. Business owners are likely to think in terms of costs, and will start to compromise in order to bring costs down. As such, you should be prepared to provide estimates of costs, and justify them accordingly.

At the end of the discussion, you should end up with SLAs around which you can design your business continuity plan. You will be in a position to choose the tools and techniques required to meet these SLAs, and should have the financial backing for the infrastructure and resources needed to put them in place and support them.

First, of course, you must understand which SLAs will need defining for any SharePoint business continuity plan.

NOTE: For further information about defining SLAs, see the “Service Level Management” section within the “Microsoft Operations Framework” at http://technet.microsoft.com/en-gb/library/cc543312.aspx.

Backup and Recovery SLAs

Backup and recovery SLAs usually identify the data and services to be backed up and recovered, and the recovery time objective, recovery point objective, and recovery level objective for each. They may also specify backup windows.

Determining Your Recovery Time Objectives

A recovery time objective (RTO) is the maximum allowed time for a recovery procedure to take place. In other words, it is the maximum amount of downtime allowed by the business for the data or service before normal operations must be resumed. For example, the RTO for an outage that can be resolved by restoring a backup includes the following:

  • The lead time between the outage being recorded and recovery being initiated
  • The time necessary to locate the backup media
  • The time necessary to restore the backup
  • The time necessary to perform any post-restoration procedures in order to resume operation of the failed system
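The components listed above can be added up to check a proposed RTO against what a restore is likely to take. The following is a minimal sketch; every number in it is a hypothetical estimate, not a measured value, and should be replaced with timings from your own recovery tests.

```powershell
# Hypothetical phase estimates, in minutes, for a restore-based recovery.
# Replace these with timings measured during your own recovery tests.
$phaseMinutes = @{
    'Lead time before recovery starts' = 30
    'Locate backup media'              = 20
    'Restore backup'                   = 90
    'Post-restore procedures'          = 40
}

# Hypothetical RTO agreed with the business (four hours).
$rtoMinutes = 240

# The estimated recovery time is the sum of all phases.
$estimate = ($phaseMinutes.Values | Measure-Object -Sum).Sum
$meetsRto = $estimate -le $rtoMinutes

"Estimated recovery time: $estimate minutes (RTO: $rtoMinutes minutes). Meets RTO: $meetsRto"
```

If the estimate exceeds the RTO, either the RTO must be renegotiated or one of the phases (usually the restore itself) must be shortened.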

You may wish to define different RTOs for differing items or times. For example, the RTO for the recovery of a single item may be different from that of the whole farm, and the RTO for each may be different on weekends.

Although a full farm is much more complicated to restore, you may actually define a shorter RTO for it than for an individual item. This is because a full farm outage incurs a much larger business impact, and because individual items are likely to be covered by end-user content recovery mechanisms, such as versioning and the SharePoint Recycle Bin.

Determining Your Recovery Point Objectives

A recovery point objective (RPO) is the maximum allowed time between the last available backup and any point in time that a failure could occur. In other words, it is the point to which data must be restorable, and represents the maximum amount of acceptable data loss. This is also known as freshness of backup, or the latency between a production data set and its redundant or replicated copy.

For example, you might establish that no more than two hours of data can be lost if a system fails. For this example, taking daily full backups and transaction log backups or log shipping at least every two hours would be a feasible solution.
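The reasoning in this example can be expressed as a simple check. The sketch below assumes that the worst-case data loss equals the interval between transaction log backups (a failure just before the next log backup loses everything since the previous one). The function name Test-RpoMet is hypothetical.

```powershell
# Hypothetical helper: returns $true when the log backup interval
# is short enough to satisfy the RPO.
function Test-RpoMet {
    param(
        [double]$RpoHours,
        [double]$LogBackupIntervalHours
    )
    # Worst case: the failure occurs just before the next log backup,
    # so all changes since the previous log backup are lost.
    return ($LogBackupIntervalHours -le $RpoHours)
}

Test-RpoMet -RpoHours 2 -LogBackupIntervalHours 2   # interval equals the RPO
Test-RpoMet -RpoHours 2 -LogBackupIntervalHours 4   # interval exceeds the RPO
```

The first call returns True and the second returns False. In practice you would leave some margin, because the time taken to complete and copy each log backup also counts toward potential data loss.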

NOTE: You do not need to measure RPO as an amount of time. Instead, you might measure it as a number of transactions or changes to the system.

Determining Your Recovery Level Objectives

A recovery level objective (RLO) defines the granularity with which you must be able to recover data. In SharePoint, this can mean the entire farm, a web application and associated databases, site collection, site, list or library, or individual item.

For example, you may have an RLO that individual items must be restorable, and associated RTO and RPO for individual items. Built-in features such as versioning may be used in this example to give an RLO of major versions only, or, alternatively, major and minor versions.

NOTE: A useful workbook to help you plan your backup and recovery SLAs and strategy for SharePoint 2010 is available from www.microsoft.com/downloads/en/details.aspx?FamilyID=a4e1a142-0797-4675-922d-6cc5cdb623f1&displaylang=en.

Availability SLAs

One of the most common measures of availability for a system or component is the percentage of time for which it is up. This is often expressed as the number of nines. For example, a system with 99.999 percent uptime is said to have “five nines of availability.”

In general, the number of nines is not often used by engineers when modeling and measuring availability. More often, engineers speak of downtime per year. The number of nines is typically reserved for high-level discussion of availability SLAs, or in marketing documents. Table 21-1 correlates the number of nines to time equivalents.

TABLE 21-1: The Number of Nines to Time Equivalents

[Table 21-1 is an image in the original and is not reproduced here.]

Notice how an availability SLA of five nines equates to a mere 5.26 minutes of downtime per year. As such, when discussing availability SLAs, you should think carefully about what the implications of designing for more than two nines will mean for costs.
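The relationship between the number of nines and downtime is simple arithmetic. The following sketch, which assumes a 365-day year and uses a hypothetical function name, converts an availability percentage into minutes of downtime per year; for five nines it yields the 5.26 minutes quoted above.

```powershell
# Hypothetical helper: converts an availability percentage into the
# maximum downtime per year, in minutes (assumes a 365-day year).
function Get-AnnualDowntimeMinutes {
    param([double]$AvailabilityPercent)
    $minutesPerYear = 365 * 24 * 60   # 525,600 minutes
    return [math]::Round($minutesPerYear * (1 - $AvailabilityPercent / 100), 2)
}

# Print the downtime allowance for two to five nines.
foreach ($p in 99, 99.9, 99.99, 99.999) {
    '{0}% -> {1} minutes per year' -f $p, (Get-AnnualDowntimeMinutes $p)
}
```

Each additional nine divides the allowed downtime by ten, which is why costs climb so steeply beyond two or three nines.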

Additionally, when calculating availability SLAs, consider that most organizations specifically exempt or add hours for planned maintenance activities. This is likely to be particularly relevant for SharePoint, where planned downtime is likely to be required for maintenance that cannot be avoided (such as patches).

Also, note that uptime and availability are not synonymous. A system can be up, but not available. For example, this may be the case if a partial network outage occurs and a web application is inaccessible to some hosts.

Disaster Recovery SLAs

For each component within a farm that is covered by a disaster recovery plan, an SLA may identify the RPOs and RTOs. Different RTOs are often set for different circumstances. For example, in the event of a natural disaster, it is likely you will want (and indeed need) to define longer RTOs than those for a simple disk failure.

Balancing Costs versus Business Risk

The need to balance costs versus business risk was briefly discussed earlier in this chapter. In reality, availability SLAs that require many nines, as well as backup and recovery SLAs requiring low RTOs, RPOs, and granular RLOs, will require very expensive and complex solutions to meet them.

The answer is to ensure that the costs of the backup, recovery, and availability solutions are balanced against the risks to the business and the costs of outage or data loss. To achieve this, you need accurate cost estimates for outage or data loss from the business, and stakeholders need to know the costs of the various solutions from you or from those responsible for providing them.

At this point, you will know what services and data you need to protect, what the SLAs are for each, and should have agreed on a budget with the business to choose the best backup, recovery, and availability solutions to help you achieve them. One final point to remember, though, is that business continuity management is an ongoing process, and you will likely need to adjust your SLAs and solutions at a later date as business requirements change.

The next section focuses on designing your backup and recovery strategy, and covers each data component within a farm, as well as some of the tools available for backing up and restoring each.

DESIGNING YOUR BACKUP AND RECOVERY STRATEGY

Once you have defined your SharePoint business continuity requirements, you can start designing an effective backup and recovery strategy. The requirements are likely to affect the following:

  • Which tools you use, based on capability of the tools versus requirements
  • Which strategy you choose for each requirement
  • The location of backups and environments

For example, if you know you must meet a specific RTO and RPO for individual items, you can plan to use the Recycle Bin and versioning to protect this content. This planning will be based on the SLAs, and will extend to the amount of space allocated and the length of time for which items are retained. If the SLAs do not cover items that have been deleted from the Recycle Bin, planning may include negotiating departmental charges for recovering such items by using an alternative solution. Of course, you will need to design and plan for these alternative solutions. However, if the business has not defined the need for item-level recovery, then there is no need to design and support solutions that enable it.
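As an illustration of turning such an SLA into configuration, the following sketch sets the Recycle Bin retention period and second-stage quota on a web application. The URL and the values are hypothetical and should come from your agreed SLAs; the script must be run on a SharePoint 2010 server.

```powershell
# Load the SharePoint cmdlets when not running in the Management Shell.
Add-PSSnapin Microsoft.SharePoint.PowerShell -ErrorAction SilentlyContinue

# Hypothetical URL; substitute your own web application.
$webApp = Get-SPWebApplication -Identity 'http://intranet.contoso.com'

# Hypothetical values derived from an item-level recovery SLA.
$webApp.RecycleBinEnabled          = $true
$webApp.RecycleBinRetentionPeriod  = 30   # days items are kept before being purged
$webApp.SecondStageRecycleBinQuota = 50   # percent of site quota for the second-stage bin

# Persist the changes to the farm configuration.
$webApp.Update()
```

A longer retention period reduces the number of restores from backup, at the cost of additional content database storage.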

The majority of recoveries for a SharePoint environment are likely to be content recovery for end users at the site, list, or item level. However, you must design a backup and recovery strategy that covers all eventualities and all data components that make up a SharePoint environment. This incorporates everything ranging from the full farm, to service applications, individual items, and customizations. The following sections detail each of these data components, and some of the tools available for backing up and restoring each.

Farm

Full farm backups can be performed in SharePoint 2010 by using Central Administration or the SharePoint 2010 Management Shell. A farm backup includes backing up all the components that make up a SharePoint farm, with the exception of some customizations, configuration settings, and physical server backups for bare-metal recovery.

You can choose from two options when you perform a farm backup:

  • If you choose content and configuration data (the default), the entire server farm is backed up, including settings from the configuration database.
  • If you choose configuration only, some of the configuration database settings for the farm are backed up without content.

NOTE: Backing up a farm backs up the configuration database and the Central Administration content database, but you cannot restore these by using SharePoint 2010 tools, because restoring these databases in SharePoint 2010 is still unsupported.

A full farm backup also includes the following:

  • Web applications, associated content databases, and various settings
  • Service applications, associated databases, and various settings

When a farm backup is initiated, a SQL Server database backup is started for content and service application databases. The search index files are backed up and synchronized with the search database backups. Configuration settings for the farm are written to XML files that are included in the backup.

NOTE: For detailed steps on how to perform a farm backup and restore of a full farm or components by using either Central Administration or the SharePoint 2010 Management Shell, see http://technet.microsoft.com/en-us/library/ee428316.aspx and http://technet.microsoft.com/en-us/library/ee428314.aspx, respectively.

Configuration-only Backup and Restore

A configuration-only backup extracts and backs up the configuration settings for the farm from a configuration database. A configuration backup can be restored to the same or any other server farm. For example, you might want to perform configuration-only restores in test, development, or standby environments. Additionally, if you are using SQL Server to back up the databases for a farm, you will want to back up the configuration separately.

In order to restore configuration settings, a new or alternate farm (configuration database) must already be provisioned. Upon restore, the settings contained within the backup will overwrite any settings in the new or alternate farm. If any settings present in the farm are not contained in the configuration backup, they will not be changed.

A configuration-only backup and restore can only be performed by using built-in tools. This includes Central Administration and the SharePoint 2010 Management Shell for both procedures. When using the SharePoint 2010 Management Shell, you can perform a configuration-only backup by using the Backup-SPFarm cmdlet with the -ConfigurationOnly parameter.

You can also perform a configuration-only backup of the current farm or a configuration database that comprises another farm by using the Backup-SPConfigurationDatabase cmdlet, as shown here:

Backup-SPConfigurationDatabase -DatabaseName SharePoint_Config -DatabaseServer SqlServer1 -Directory \\server\share\Backup

A configuration-only restore can be performed by using the Restore-SPFarm cmdlet with the -ConfigurationOnly parameter.
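Taken together, a configuration-only backup and restore with the farm cmdlets might look like the following sketch. The UNC path is hypothetical, and the restore is assumed to run on a target farm that has already been provisioned.

```powershell
# Load the SharePoint cmdlets when not running in the Management Shell.
Add-PSSnapin Microsoft.SharePoint.PowerShell -ErrorAction SilentlyContinue

# Hypothetical backup location; the farm and SQL Server accounts need access to it.
$backupDir = '\\server\share\Backup'

# Back up only the farm configuration settings.
Backup-SPFarm -Directory $backupDir -BackupMethod Full -ConfigurationOnly

# Later, on the target farm: list the available backups...
Get-SPBackupHistory -Directory $backupDir -ShowBackup

# ...and restore the configuration settings over the existing ones.
Restore-SPFarm -Directory $backupDir -RestoreMethod Overwrite -ConfigurationOnly
```

When the directory holds several backups, Restore-SPFarm accepts a -BackupId parameter to select a specific one from the history listing.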

NOTE: For detailed steps on how to perform a configuration-only backup and restore, see http://technet.microsoft.com/en-us/library/ee428320.aspx and http://technet.microsoft.com/en-us/library/ee428326.aspx, respectively.

Considerations for Using Farm Backups

There is no built-in scheduling for farm backups. If you wish to schedule full farm or configuration-only backups, you must create a backup script in Windows PowerShell, and then schedule the script to run as a Windows task.
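A scheduled backup could be sketched as follows. The script file name, backup path, schedule, and service account are all hypothetical; adapt them to your environment and test the script interactively before scheduling it.

```powershell
# Backup-Farm.ps1 (hypothetical file name)
# Load the SharePoint cmdlets, because a scheduled task does not
# start in the SharePoint 2010 Management Shell.
Add-PSSnapin Microsoft.SharePoint.PowerShell -ErrorAction SilentlyContinue

# Hypothetical backup location.
$backupDir = '\\server\share\Backup'

try {
    # Take a full farm backup; stop on the first error so the catch block runs.
    Backup-SPFarm -Directory $backupDir -BackupMethod Full -ErrorAction Stop
}
catch {
    # Surface the failure and return a non-zero exit code to Task Scheduler.
    Write-Error $_
    exit 1
}

# To register the script as a daily task at 01:00 (hypothetical account and path),
# run the following once from an elevated command prompt:
#   schtasks /Create /TN "SharePoint Farm Backup" /SC DAILY /ST 01:00 ^
#     /TR "powershell.exe -NoProfile -File C:\Scripts\Backup-Farm.ps1" ^
#     /RU CONTOSO\spfarm /RP *
```

The account that runs the task needs the same permissions as an administrator running the backup interactively.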

Additionally, since all user data is stored in SQL Server databases, unless a remote BLOB storage solution is used, it is likely that most organizations will prefer to use existing backup and recovery strategies for their SharePoint data where possible. In most cases, this will mean backup and recovery by using SQL Server tools or third-party solutions. This is a perfectly acceptable approach for content databases and most service applications, and is discussed in more depth later in this chapter.

If this approach is adopted, you can still take full farm backups when needed, or take configuration-only backups by using Central Administration or Windows PowerShell to protect farm settings.

What's Backed Up?

In addition to web applications, service applications, and the settings and databases associated with them, a full farm backup will include the following settings and features of a server farm:

  • Antivirus settings
  • Information Rights Management (IRM) settings
  • Outbound e-mail settings (only restored when performing an overwrite)
  • Customizations deployed as trusted solutions
  • Diagnostic logging settings (everything except for the trace log location)
  • Managed account automatic password change settings
  • InfoPath Forms Services settings and exempt user agents
  • Active Directory account creation mode settings
  • Quota templates
  • Managed paths for host-named site collections
  • Sandboxed user code service settings

Features at the web application level are backed up and restored as part of the default backup and restore behavior. Configuration-only backups include farm-scoped Features, as long as they are activated. Upon restore, the Features will be installed, and an attempt will be made to force-activate them.

NOTE: Web application and service application settings are not included in a configuration-only backup. Only the items described previously will be included. You can back up settings for web applications and service applications as part of a full backup, or use Windows PowerShell cmdlets to manually document and copy these settings. For more information, refer to “Document farm configuration settings (SharePoint Server 2010)” at http://technet.microsoft.com/en-us/library/ff645391.aspx.

What's Not Backed Up?

The following settings and features are not included in a full farm backup or configuration-only backup of a server farm:

  • Direct changes to web.config files that are not made through the SharePoint API
  • Customizations that are not deployed as part of a trusted solution
  • Application pool account passwords
  • HTTP compression settings
  • Time-out settings
  • Custom Internet Server Application Programming Interface (ISAPI) filters
  • Computer domain membership
  • Internet Protocol security (IPsec) settings
  • Network Load Balancing (NLB) settings
  • Secure Sockets Layer (SSL) certificates
  • Dedicated IP address settings
  • Certificates used to form trust relationships

These are settings that are stored on SharePoint servers. As such, you must plan to document these settings and features, and, where necessary, back them up manually.

Web Applications

As with farm backups, web applications can be backed up or restored by using Central Administration or the SharePoint 2010 Management Shell. You cannot back up a complete web application by using SQL Server tools. However, you can use SQL Server tools to back up the content databases individually. When you choose to back up a web application, all content databases associated with that web application and the following settings will be included in the backup:

  • Application pool name and application pool account
  • Service accounts
  • Internet Information Services (IIS) binding information, such as the protocol type, host header, and port number
  • General web application settings, such as alerts and managed paths
  • Changes to the web.config file that have been made through the SharePoint API
  • Authentication settings

NOTE: These settings are only included in a backup if made by using the SharePoint API. Changes made manually through IIS, for example, will not be included in the backup.

If you choose to use SQL Server tools or a third-party product to back up your content databases, and you perform configuration-only backups, the previously described settings will not be backed up. In order to back up these settings separately, you must either manually back them up by using other methods, or you could periodically perform a full farm backup without the content databases attached. This would allow you to back up the farm settings and the web application settings, in addition to any service applications that you select, but not duplicate content database backups. Alternatively, you could employ a scripted deployment strategy and create scripts containing these settings based on your design documentation.

Service Applications

Service applications can consist of both service settings and one or more databases, or just service settings. Central Administration and the SharePoint 2010 Management Shell are the only tools that you can use to back up and restore both the service settings and the databases. However, you can use other tools to back up and restore the databases for most service applications, and then manually reprovision the service application.

If you select the Shared Services node when using SharePoint to perform a backup, all of the shared service applications and proxies in the farm are backed up at once. For each service application, a database backup is started if the service application has an associated database, followed by a backup of the service configuration.

However, if you select individual service applications, the related proxies are not backed up. To back up both the service application and the proxy, you must perform two backups — first of the service application, and then of the proxy.
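The two-backup approach might look like the following sketch. The service application name and the item paths are assumptions for illustration; run Backup-SPFarm -ShowTree first and copy the exact item names that your farm reports.

```powershell
# Load the SharePoint cmdlets when not running in the Management Shell.
Add-PSSnapin Microsoft.SharePoint.PowerShell -ErrorAction SilentlyContinue

# Hypothetical backup location.
$backupDir = '\\server\share\Backup'

# List the components that can be backed up, to find the exact item names.
Backup-SPFarm -ShowTree

# Assumed item paths for a service application named "Managed Metadata Service";
# verify both against the -ShowTree output before use.
$appItem   = 'Farm\Shared Services\Shared Services Applications\Managed Metadata Service'
$proxyItem = 'Farm\Shared Services\Shared Services Proxies\Managed Metadata Service'

# First back up the service application, then its proxy.
Backup-SPFarm -Directory $backupDir -BackupMethod Full -Item $appItem
Backup-SPFarm -Directory $backupDir -BackupMethod Full -Item $proxyItem
```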

In the event a specific service application fails, it is likely that you will want to restore that service application only and not the complete farm. It is important to remember that some service applications provide data to other services and sites. As a result, users might experience some service interruption until the recovery process is finished. This could have significant impact on your ability to meet SLAs, and should be a key consideration when planning your backup and recovery strategy for your service applications.

In addition, if you are sharing service applications across farms, be aware that trust certificates that have been exchanged are not included in farm backups. You must back up your certificate store separately. When you restore a farm that shares a service application, you must import and redeploy the certificates, and then re-establish any inter-farm trusts. This process is likely to have an impact on your RTO for these service applications and services that depend on them.

Search Service Application

The search service application is a special case. You cannot use SQL Server tools to back up the search service application because of its distributed architecture and associated dependencies between the various search databases and index files. When a backup is started by using SharePoint, index merges are prevented, and crawling is paused when necessary to enable consistent backups. A restore will place all components and data back in the correct location, and resume search activities.

In past versions of SharePoint, if no SLAs were defined for the search service, one approach used by many was to perform a full crawl in the event of a failure. This approach is perfectly feasible; indeed, it could be faster and simpler in some environments. But you should carefully consider the time it takes to perform a full crawl and the impact it will have on your users and farm before adopting this approach. Additionally, this approach will require that you document and then reconfigure all search settings, such as content sources, crawl rules, and managed properties.

Content Databases

The SharePoint backup tools provide a good solution for farm settings and service applications. However, content databases are usually the most important item when discussing backup and recovery. For most organizations, making use of existing backup and recovery strategies and tools is the desired (and often the best) approach for content databases. They often account for the majority of the storage required for backups. As such, it is usually a sensible approach to separate backing them up from other services and settings.

In fact, this is a very common strategy. Provided you have a complete content database backup, you can restore a copy elsewhere, and then use granular backup and recovery procedures to extract content from within the content database and restore it to its original location.

If you have an RPO of 24 hours for all content in the farm, then you should ensure that backups are made of your content databases at least every 24 hours. You may choose to take weekly full backups and daily differential backups, depending on the size of your databases, your data churn, and the number of databases you must back up. If you have a much shorter RPO, or a very short RTO, you may need to adopt a strategy of transaction log backups on a frequent basis. This will allow you to recover to a much more recent point in time.
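A schedule of full, differential, and transaction log backups could be driven from Windows PowerShell by calling the sqlcmd utility, as in the following sketch. The server name, database name, and backup folder are hypothetical, and each command would normally be run from its own scheduled job rather than one after another.

```powershell
# Hypothetical names; substitute your own server, database, and backup folder.
$server = 'SqlServer1'
$db     = 'WSS_Content'
$folder = 'E:\Backup'
$stamp  = Get-Date -Format 'yyyyMMdd_HHmm'

# Weekly job: full backup of the content database.
sqlcmd -S $server -E -Q "BACKUP DATABASE [$db] TO DISK = N'$folder\${db}_full_$stamp.bak' WITH INIT"

# Daily job: differential backup (changes since the last full backup).
sqlcmd -S $server -E -Q "BACKUP DATABASE [$db] TO DISK = N'$folder\${db}_diff_$stamp.bak' WITH DIFFERENTIAL, INIT"

# Frequent job: transaction log backup.
# This requires the database to use the full recovery model.
sqlcmd -S $server -E -Q "BACKUP LOG [$db] TO DISK = N'$folder\${db}_log_$stamp.trn'"
```

The interval between log backups is what determines the achievable RPO; the full and differential backups mainly affect how long a restore takes.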

NOTE: If you are designing an environment with a large number of high-churn databases that have very short RPOs and RTOs, you may need to choose different tools, or augment existing tools, to protect your farm. Products such as System Center Data Protection Manager provide continuous protection of SharePoint by using the efficient Volume Shadow Copy Service (VSS) technology to perform incremental backups, allowing for very short RPOs and RTOs.

Another consideration is that you can attach different SLAs to different content databases and the content that resides within them. For example, a recommended approach would be to separate business-critical site collections into their own content databases, and then design appropriate backup and recovery strategies for each. Content databases that host My Site data may not be a high priority; as such, they may have an RTO of 24 hours instead of 1 hour, as defined for a content database hosting a corporate intranet.

Post-Restore Steps

If you adopt a strategy of using tools outside of SharePoint to back up and restore content databases, you must consider the post-restore steps that may be required to ensure that all sites are accessible. If you restore a content database that contains a site collection that has since been deleted from the farm, the information about that site collection will not be present in the farm configuration database upon restore.

In SharePoint 2007, you had to detach and reattach content databases after a restore to refresh the sitemap table in the configuration database. This table contains a mapping of site URLs to content databases. When a request is received for a URL, the database containing the site is looked up and the content is fetched.

In SharePoint 2010, you can refresh the sitemap table by using the traditional method through the Dismount-SPContentDatabase and Mount-SPContentDatabase cmdlets. Or, you can update the sitemap without needing to detach and reattach a content database from the farm at all. This has the benefit of allowing the database to remain online once restored.

The following example shows how to refresh the configuration database with the site information from a database named WSS_Content:

$database = Get-SPContentDatabase -Identity WSS_Content
$database.RefreshSitesInConfigurationDatabase()
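For comparison, the traditional detach-and-reattach method, and a variation that refreshes every content database in a web application, might be sketched as follows. The server name and web application URL are hypothetical.

```powershell
# Load the SharePoint cmdlets when not running in the Management Shell.
Add-PSSnapin Microsoft.SharePoint.PowerShell -ErrorAction SilentlyContinue

# Hypothetical web application URL and database server.
$webAppUrl = 'http://intranet.contoso.com'
$sqlServer = 'SqlServer1'

# Traditional method: detach the database, then reattach it.
# The sites in the database are unavailable between the two commands.
Dismount-SPContentDatabase -Identity WSS_Content -Confirm:$false
Mount-SPContentDatabase -Name WSS_Content -DatabaseServer $sqlServer -WebApplication $webAppUrl

# Online method applied to every content database in the web application.
Get-SPContentDatabase -WebApplication $webAppUrl | ForEach-Object {
    $_.RefreshSitesInConfigurationDatabase()
}
```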

Content Stored in Remote BLOB Stores

Remote BLOB Storage (RBS) is designed to move the storage of BLOBs from database servers to commodity storage solutions. RBS saves significant space, conserves expensive server resources, and provides a standardized model for applications to access BLOB data. RBS is supported in SharePoint 2010 for content databases in SQL Server 2008. As such, the data stored outside of SharePoint content databases must be considered in any backup and recovery plan.

Fortunately, content in remote BLOB storage is backed up and restored transparently along with other content (such as traditional content databases), as long as the RBS provider in use has this capability. The SharePoint 2010 backup and restore tools and SQL Server 2008 can back up and restore content that is stored in remote BLOB stores when you use the SQL FILESTREAM provider. Third-party RBS providers will likely require additional consideration.

You can use granular tools in the same way as usual to back up/export or restore/import content when using RBS. For example, during restore or import, content will be placed inside the database if you are restoring or importing it to a database without RBS enabled. However, when performing a full database restore, if a content database is set to use the SQL FILESTREAM RBS provider, the RBS provider must be installed both on the database server that is being backed up, and on the database server that is being recovered to.

Granular Backup and Recovery

A SharePoint environment is likely to have an RLO defined that requires backup and recovery of content within a content database. This granularity includes the following:

  • Site collections
  • Sites
  • Lists and libraries
  • Individual items and their versions

End-user features such as the Recycle Bin and versioning help address the need for frequent item-level restores, but only provide a certain level of protection. If this functionality satisfies the defined SLAs, then there is no need to come up with a backup and recovery strategy for granular items. However, it is highly likely that additional protection will be required where content is deleted from the Recycle Bin.

Much has been written regarding how to design and configure the Recycle Bin and versioning to support SLAs. For additional information, see the article “Plan to protect content by using recycle bins and versioning (SharePoint Server 2010)” at http://technet.microsoft.com/en-us/library/cc263011.aspx.

Recovering content from within a content database has always been a fairly strenuous task — often with the requirement for a secondary “recovery” farm that is used as a staging area for a restored copy of the database, while the offending site is extracted, before being manually restored back to its original location. SharePoint 2010 offers a number of enhancements and new features that make this task a lot easier and much better for system performance.

Site Collections

Site collections are the top-level logical container for content within SharePoint. As such, you may decide to attach SLAs at the site-collection level, and plan a strategy for site collection backup and recovery. Using this approach was examined earlier in this chapter during the discussion of content databases.

SharePoint has long provided the capability to perform site collection backup and recovery. In SharePoint 2010, you can use the Backup-SPSite and Restore-SPSite cmdlets to achieve this. New in SharePoint 2010, you can also use Central Administration to back up a site collection, as shown in Figure 21-1, although you cannot use it to restore one.

FIGURE 21-1: The “Site collection backup” page in Central Administration
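As a minimal sketch of the cmdlet-based approach, the following commands back up a site collection to a file and later restore it over the original. The URL and file path are hypothetical placeholders, and the account running the commands is assumed to have the required farm and database permissions.

```powershell
# Load the SharePoint cmdlets when not running in the SharePoint 2010 Management Shell.
Add-PSSnapin Microsoft.SharePoint.PowerShell -ErrorAction SilentlyContinue

# Hypothetical values: replace with your own site collection URL and backup location.
$siteUrl    = "http://intranet.contoso.com/sites/projects"
$backupFile = "\\backupserver\sharepoint\projects.bak"

# Back up the site collection to a single file.
Backup-SPSite -Identity $siteUrl -Path $backupFile

# Later, when recovery is needed: restore over the existing site collection.
# -Force is required to overwrite a site collection that already exists at the URL.
Restore-SPSite -Identity $siteUrl -Path $backupFile -Force
```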

An important point to remember before you start planning for multiple site collection backups is that a content database can contain many site collections. Performing an I/O-intensive operation on a site collection will affect all site collections that share the same content database.

As such, the use of site collection backups alone is not regarded as a very good backup strategy. In general, site collection backups should only be used to back up a site collection from a restored copy of a content database, and restore that site collection back to its original or a new location.

For both performance and backup and recovery reasons, a site collection that becomes large should be moved to its own content database. The upper limit for using site collection backup in SharePoint 2010 is 100 GB. However, moving a site collection to its own content database before it reaches half this size would be a sensible design decision.
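One way to act on this guidance is to report site collections that are approaching a chosen size, and then move the large ones. The sketch below assumes a 40 GB threshold (an illustrative figure, not an official limit) and uses hypothetical URL and database names; the destination database is assumed to already exist and be attached to the same web application.

```powershell
# Load the SharePoint cmdlets when not running in the SharePoint 2010 Management Shell.
Add-PSSnapin Microsoft.SharePoint.PowerShell -ErrorAction SilentlyContinue

# Assumed threshold for this sketch; choose a value that suits your SLAs.
$thresholdBytes = 40GB

# Report each site collection above the threshold with its size and content database.
Get-SPSite -Limit All |
    Where-Object { $_.Usage.Storage -gt $thresholdBytes } |
    Select-Object Url,
        @{Name = "SizeGB"; Expression = { [math]::Round($_.Usage.Storage / 1GB, 1) }},
        @{Name = "ContentDatabase"; Expression = { $_.ContentDatabase.Name }}

# Move one large site collection to its own content database (hypothetical names).
Move-SPSite -Identity "http://intranet.contoso.com/sites/projects" `
            -DestinationDatabase "WSS_Content_Projects"
```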

When restoring a site collection, if it is 1 GB or larger in size, you can use the GradualDelete parameter for better performance during the restore process. When you use this parameter, the site collection that is overwritten is marked as deleted, which immediately prevents any additional access to its content. The data in the marked site collection is then deleted gradually over time by a timer job, instead of all at the same time, which reduces the impact on server performance.
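For illustration, a restore that uses this parameter might look like the following. The URL and backup path are hypothetical, and the backup file is assumed to have been created earlier with Backup-SPSite.

```powershell
# Load the SharePoint cmdlets when not running in the SharePoint 2010 Management Shell.
Add-PSSnapin Microsoft.SharePoint.PowerShell -ErrorAction SilentlyContinue

# Hypothetical values: replace with your own site collection URL and backup file.
$siteUrl    = "http://intranet.contoso.com/sites/projects"
$backupFile = "\\backupserver\sharepoint\projects.bak"

# -Force overwrites the existing site collection; -GradualDelete marks the old
# copy as deleted and leaves its removal to a timer job.
Restore-SPSite -Identity $siteUrl -Path $backupFile -Force -GradualDelete
```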

For detailed steps on how to perform site collection backup and restore, see http://technet.microsoft.com/en-us/library/ee748617.aspx and http://technet.microsoft.com/en-us/library/ee748655.aspx, respectively.

Sites, Lists, and Libraries

Previous versions of SharePoint provided the capability to export sites by using stsadm and the Export and Import operations. SharePoint 2010 now has PowerShell equivalents: Export-SPWeb and Import-SPWeb. These cmdlets still serve the purpose of being content migration tools, as opposed to full-fidelity backup tools. However, they do need to be considered as part of any backup and recovery strategy for recovering sites and lower-level content. Indeed, unless you choose to use third-party solutions, using export and import provides the only mechanism for extracting site-level content or below from a content database.

Using the export operation saves data, but it is not the same as using the backup operation. You cannot save workflows, alerts, Features, solutions, or the Recycle Bin state by using the export operation.

It is important that you plan for this when designing your granular backup and recovery strategy. You will need to factor in the time and resources needed to perform these export and import operations. You may find that the manual process is too time-consuming or error-prone, and opt to use a third-party solution for your granular backup and recovery needs.

SharePoint 2010 provides some enhancements to export and import capabilities. You can now use Central Administration to export content, although you must use PowerShell to import the content. You can also export and import content down to the list or library level by using built-in tools, whereas previously you could only use built-in tools at the site level.

The second of these enhancements makes quite a significant difference to the time required for recovery when you just need to recover a list or library. It also makes it possible to easily design for an RLO at the list or library level without the need for third-party solutions when an item is no longer protected by the Recycle Bin. Figure 21-2 shows the Site Or List Export page in Central Administration.

FIGURE 21-2: The Site Or List Export page in Central Administration
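The list-level export and import described above can be sketched in PowerShell as follows. The site URL, library URL, and file path are hypothetical, and the choice to include all versions and user security is only one reasonable option that you should match to your own recovery requirements.

```powershell
# Load the SharePoint cmdlets when not running in the SharePoint 2010 Management Shell.
Add-PSSnapin Microsoft.SharePoint.PowerShell -ErrorAction SilentlyContinue

# Hypothetical values: replace with your own site, library, and export file.
$webUrl     = "http://intranet.contoso.com/sites/projects"
$libraryUrl = "/sites/projects/Shared Documents"
$exportFile = "\\backupserver\sharepoint\projects-documents.cmp"

# Export a single document library, including all versions and permissions.
Export-SPWeb -Identity $webUrl -ItemUrl $libraryUrl -Path $exportFile `
             -IncludeVersions All -IncludeUserSecurity

# Import the package back into the site, overwriting existing file versions.
Import-SPWeb -Identity $webUrl -Path $exportFile `
             -IncludeUserSecurity -UpdateVersions Overwrite
```

Omitting the ItemUrl parameter exports the whole site instead of a single list or library.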

For detailed steps on how to perform an export and import of content, see http://technet.microsoft.com/en-us/library/ee428301.aspx and http://technet.microsoft.com/en-us/library/ee428322.aspx, respectively.

The Site Recycle Bin

SharePoint 2010 with Service Pack 1 adds new functionality that allows deleted sites and site collections to be stored in the Recycle Bin. In prior versions, if a site was accidentally deleted, a database restore was required to a “recovery farm,” and then the site or site collection had to be backed up or exported, and restored or imported again. This approach was time-consuming and resource-intensive.

With Service Pack 1 for SharePoint 2010, a deleted site is stored in the second-stage Recycle Bin, and a deleted site collection is retained in the content database as an SPDeletedSite object.

Sites and site collections are subject to the standard web application settings for the Recycle Bin. By default, content is deleted from the Recycle Bin after 30 days. Sites will also be subject to the percent of live site quota for second-stage deleted items setting, which is set to 50 percent by default.
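To check which Recycle Bin settings actually apply in your farm, you could read them from each web application. This is a read-only sketch that assumes the standard SPWebApplication properties shown and changes nothing.

```powershell
# Load the SharePoint cmdlets when not running in the SharePoint 2010 Management Shell.
Add-PSSnapin Microsoft.SharePoint.PowerShell -ErrorAction SilentlyContinue

# List the Recycle Bin settings for every web application:
# retention period is in days, second-stage quota is a percentage of the site quota.
Get-SPWebApplication |
    Select-Object Url, RecycleBinEnabled, RecycleBinRetentionPeriod, SecondStageRecycleBinQuota
```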

A site can be restored by a site collection administrator from the “Deleted from end user Recycle Bin” view of the site collection Recycle Bin. Restoring a site collection requires access to the SharePoint 2010 Management Shell and use of the Get-SPDeletedSite and Restore-SPDeletedSite cmdlets.

For example, you can use the following commands to get a list of deleted site collections and restore a specific site collection. The SiteId is obtained by first running the Get-SPDeletedSite cmdlet:

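# List the site collections currently held as SPDeletedSite objects
Get-SPDeletedSite
# Restore the deleted site collection that matches the chosen SiteId
Get-SPDeletedSite | Where-Object {$_.SiteId -eq "e4e57440-2933-42b7-a0f9-346a79c84865"} | Restore-SPDeletedSite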

If a site has since been created using the same URL as the original site, the restore will fail. The existing site must be moved or deleted first.

You can permanently remove a site collection from the Recycle Bin by using the Remove-SPDeletedSite cmdlet. For more information on restoring a deleted site or site collection from the Recycle Bin, see “Restore a deleted site (SharePoint Server 2010)” and “Restore a deleted site collection (SharePoint Server 2010)” at http://technet.microsoft.com/en-us/library/hh272540.aspx and http://technet.microsoft.com/en-us/library/hh272537.aspx, respectively.

Unattached Content Database Data Recovery

The granular backup and recovery strategy for the content discussed previously has always required a secondary “recovery” farm or web application that is used as a staging area for backing up and extracting content from restored databases. The content would then be restored or imported back to its original location by the SharePoint administrator.

In previous versions of SharePoint, this approach was cumbersome because it required the ongoing maintenance, support, and licensing of a secondary environment. Any customizations or SharePoint updates applied to the production environment would also need to be applied to the recovery farm. Failure to do this would lead to database attach errors after restoring a copy of a database, or sites that would fail to export because of missing dependencies.

Possibly the biggest enhancement to granular backup and recovery in SharePoint 2010 is the new unattached content database data recovery feature. It allows you to connect to a content database that is not attached to a SharePoint web application, and select objects down to the list or library level for backup or export. These objects can then be restored or imported using standard approaches already discussed. Figure 21-3 shows the Unattached Content Database Data Recovery page.

FIGURE 21-3: The Unattached Content Database Data Recovery page in Central Administration

You can also use Windows PowerShell to connect to an unattached content database by using the ConnectAsUnattachedDatabase parameter of the Get-SPContentDatabase cmdlet. You can then use the Backup-SPSite or Export-SPWeb cmdlets to retrieve content from the database.
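A minimal sketch of the connection step follows. The database and server names are hypothetical, and the database is assumed to be a restored copy at the same version and build level as the farm. Enumerating the Sites collection of the returned object is an assumption of this sketch rather than a documented recovery procedure, so verify it in a test environment first.

```powershell
# Load the SharePoint cmdlets when not running in the SharePoint 2010 Management Shell.
Add-PSSnapin Microsoft.SharePoint.PowerShell -ErrorAction SilentlyContinue

# Hypothetical names: a restored copy of a content database on a recovery SQL Server instance.
$db = Get-SPContentDatabase -ConnectAsUnattachedDatabase `
                            -DatabaseName "WSS_Content_Restored" `
                            -DatabaseServer "SQLRECOVERY01"

# Assumption: list the site collections found in the unattached database
# so that you can identify the one to back up or export.
$db.Sites | Select-Object Url, ID
```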

If you try to connect to a database at a different version level than the current farm, you will get a “Compatibility range mismatch” error. The farm and the database must be upgraded to the same version and build level in order to use the feature. This scenario is only likely to occur if you need to recover content from a database backup that is older than the installation of a recent SharePoint update.

This feature alone decreases the administrative burden when supporting granular backup and recovery SLAs in SharePoint 2010. Content can now be restored much more easily and quickly without the need for a recovery environment.

Database Snapshots

With SharePoint 2010, if you are using SQL Server Enterprise Edition, the granular backup system can optionally use SQL Server database snapshots to ensure that data remains consistent while the backup or export is in progress. Database snapshots are read-only, static views of a database. Each database snapshot is transactionally consistent with the source database as of the moment when the snapshot was created.

When a backup or export is requested, a SQL Server database snapshot of the appropriate content database is taken, SharePoint 2010 uses it to create the backup or export package, and then the snapshot is deleted. This functionality is unlikely to provide much benefit when recovering content from a restored copy of a content database. But if you need to back up or export content from a live database, it offers the benefit of consistency while the content remains unaffected by the backup or export process.

In order to use a snapshot during content export or backup, you must use the UseSqlSnapshot parameter with either the Backup-SPSite or Export-SPWeb cmdlets.
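For example, the parameter could be used as shown below. The URL and paths are hypothetical, and the content database is assumed to be hosted on SQL Server Enterprise Edition, since snapshots are not available otherwise.

```powershell
# Load the SharePoint cmdlets when not running in the SharePoint 2010 Management Shell.
Add-PSSnapin Microsoft.SharePoint.PowerShell -ErrorAction SilentlyContinue

# Hypothetical values: replace with your own site collection URL and target files.
$siteUrl = "http://intranet.contoso.com/sites/projects"

# Back up the site collection from a temporary database snapshot.
Backup-SPSite -Identity $siteUrl -Path "\\backupserver\sharepoint\projects.bak" -UseSqlSnapshot

# Export the root site of the same site collection from a temporary database snapshot.
Export-SPWeb -Identity $siteUrl -Path "\\backupserver\sharepoint\projects.cmp" -UseSqlSnapshot
```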

Using database snapshots outside of the built-in SharePoint tools is also a valid backup and recovery strategy. For more information, see “Back up databases to snapshots (SharePoint Server 2010)” at http://technet.microsoft.com/en-us/library/ee748594.aspx.

Individual Items

If you are not using third-party products, the backup and recovery strategy adopted for individual items in SharePoint has always been one of having to restore a copy of a content database, and then exporting the lowest-level object before importing it into a test site for download. You must then manually download the item from the test site and upload it back to the production site. As already discussed, in previous versions of SharePoint, this would have been a site; in SharePoint 2010, it is a list or document library. Additionally, the unattached content database data recovery feature in SharePoint 2010 makes this approach a whole lot easier and faster.

This strategy is suitable for most organizations, since versioning and the Recycle Bin provide the first line of defense against accidentally deleted or overwritten items. However, the approach is both time-consuming and error-prone. If you have very tight RTOs for individual items, then you may need to look into third-party solutions for item-level recovery. These solutions automate much of this process, automatically restoring individual items back to their original location in a very short time, and without any manual steps.

Alternatively, you could use the SharePoint content migration API and the unattached content database data recovery feature to write your own solution that works at a lower level than lists or libraries.

Customizations

The way in which customizations are created and deployed has a significant impact on the tools and strategies that can be used to back up and recover them.

Customizations deployed in sandboxed solutions, or those that contain authored site elements (such as master pages, layout pages, cascading style sheets, and forms), are contained within content databases, and will be covered by your backup strategy for content databases.

Customizations deployed as trusted solutions offer the best method to package and deploy all other customizations, while also providing the easiest approach for backup and recovery. Since they are stored in the farm configuration database, the solutions are backed up in a farm backup or configuration-only backup. You simply restore and redeploy each solution when required. Solution packages are even deployed automatically to new servers, or servers that are rebuilt after a disaster.

Customizations that are not packaged as solution packages will have a more complex backup and recovery process. For example, changes to web.config files without using the SharePoint APIs, or manually copying feature files, will prevent SharePoint backup tools from “knowing” about the customizations. Where possible, you should consult the development team or customization vendor to determine if they can package the customizations in solution packages. If that is not possible, for each of these customizations, you must identify the files and settings that need backing up, and use manual procedures or other tools such as Windows Backup.

Common customizations that may be deployed to SharePoint servers without the use of solution packages can include the following:

  • Web parts, site or list definitions, custom columns, new content types, custom fields, custom actions, coded workflows, or workflow activities and conditions
  • Third-party solutions and their associated binary files and Registry keys, such as IFilters
  • Changes to standard XML files
  • Custom site definitions
  • Changes to the web.config files

As part of your development backup procedures, you should ensure that you keep separate copies of your solution packages outside of SharePoint in case the farm configuration database fails and you do not have configuration backups.
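If the original package files are no longer at hand, one possible way to obtain copies is to save each farm solution from the solution store to disk. This is a sketch only: the target folder is hypothetical, and it assumes the farm configuration database is still healthy when the script runs.

```powershell
# Load the SharePoint cmdlets when not running in the SharePoint 2010 Management Shell.
Add-PSSnapin Microsoft.SharePoint.PowerShell -ErrorAction SilentlyContinue

# Hypothetical target folder for the exported solution packages.
$exportDir = "D:\SolutionBackups"
New-Item -ItemType Directory -Path $exportDir -Force | Out-Null

# Save every farm solution (.wsp) from the solution store to the target folder.
foreach ($solution in Get-SPSolution) {
    $target = Join-Path $exportDir $solution.Name
    $solution.SolutionFile.SaveAs($target)
}
```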

Choosing Backup and Recovery Tools

Thus far, you have learned about some of the out-of-the-box backup and recovery tools. In order to choose the right tools for backup and recovery, you must determine whether you can meet the defined SLAs within your budget for providing business continuity.

It is not uncommon to use more than one tool when protecting an environment, especially when some tools better meet your needs, or they are already in use within the organization.

Following are some key factors to consider when choosing tools:

  • Speed of backup and recovery
  • Space required/backup type supported (full, differential, or incremental)
  • Support for encryption, compression, and other common features of backup and recovery tools
  • Completeness of recovery
  • Granularity offered
  • Complexity of managing the tool
  • Familiarity with the tool
  • Cost to license and support the tool

Designing your availability strategy is another key component of defining any SharePoint business continuity management plan. The next section provides guidance for redundancy of SharePoint across the application and data tiers.

DESIGNING YOUR AVAILABILITY STRATEGY

Planning for and ensuring that an IT service can continue to operate correctly after failure of an individual component is something that IT professionals deal with routinely. Common techniques for providing hardware fault tolerance of individual components include redundant disk arrays, redundant power supplies, and multiple network interface cards (NICs).

Regardless of this, protecting every component within a server is neither practical nor cost-effective. Multiple servers are often used to ensure both scalability and redundancy of server roles for any IT service. SharePoint is no exception. Where possible, you should plan for hardware fault tolerance, in addition to increased redundancy of server roles.

Figure 21-4 shows a server farm configured for a minimum level of redundancy at every tier. The focus of this section will be redundancy of server roles.

FIGURE 21-4: A typical server farm configured for redundancy at each tier

SQL Server Redundancy

Given the reliance on databases, SQL Server redundancy is extremely important for any SharePoint farm. Failover clustering and high-availability database mirroring can be used to ensure the availability of your databases in the event of a failure.

Failover Clustering

Failover clustering is one of the most common methods for providing redundancy for all databases at the instance level in SQL Server.

A failover cluster is a combination of one or more nodes (servers), and two or more shared disks. The failover cluster instance appears on the network as a single computer, but has functionality that provides failover from one node to another if the current node becomes unavailable. A basic failover cluster will consist of two nodes configured in an active/passive cluster configuration, where the passive node remains idle until the active node in the cluster fails.

A SQL Server failover cluster is seen as a single database service by SharePoint. Therefore, failover is automatic and transparent to SharePoint when it occurs. Failover clustering is the recommended approach for providing basic automated redundancy of the database role in a SharePoint farm.

For more information about failover clustering, refer to “Getting Started with SQL Server 2008 R2 Failover Clustering” at http://go.microsoft.com/fwlink/?LinkID=102837&clcid=0x409.

Database Mirroring

Database mirroring can be used to provide database redundancy on a per-database basis. It works by sending transactions from a principal database and server to a mirror database and server. A number of modes are available with database mirroring, but to provide redundancy with automatic failover, you must use high-availability mirroring, also known as high-safety mode with automatic failover.

High-availability database mirroring involves three server instances: a principal, a mirror, and a witness. To ensure consistency between principal server and mirror server, transactions are not committed on the principal server until they have been committed on the mirror server. The witness server enables automatic failover from the principal server to the mirror server (typically in a matter of seconds).

In previous versions of SharePoint, fully automatic failover was not easily possible because SharePoint was not mirroring-aware. It was necessary to manually update the name of the database server in SharePoint, or “trick” SharePoint through the use of SQL Server aliasing or some other method. Either approach required an IISReset on every server in the farm in order to refresh connection information on those servers.

SharePoint 2010 is mirroring-aware, allowing you to configure the mirror database server location for each database. Setting a mirror database location adds a parameter to the connection that SharePoint uses to connect to SQL Server. If the principal server becomes unavailable, the witness server automatically swaps the roles of the principal and mirror databases, and SharePoint automatically attempts to contact the server that is specified as the mirror location.

You can use Central Administration or Windows PowerShell to configure a mirror database server location. At the bottom of Figure 21-5, you can see the Failover Database Server option given when creating or editing the settings for a content database or service application database in Central Administration.

FIGURE 21-5: The Failover Database Server option in Central Administration

You cannot use Central Administration to configure a failover server for the Central Administration content database or the configuration database. You must use Windows PowerShell to do this by using the following commands:

# Find the database by name, register the mirror server, and save the change
$db = Get-SPDatabase | where {$_.Name -match "DatabaseName"}
$db.AddFailoverServiceInstance("MirrorServerName")
$db.Update()

You can use the following commands to find out which databases have not been set up with a failover server:

$dbs = Get-SPDatabase
foreach ($db in $dbs) {if (!$db.FailoverServiceInstance) {$db.Name}}

For information on configuring database mirroring (including requirements for database mirroring), see “Configure availability by using SQL Server database mirroring (SharePoint Server 2010)” at http://technet.microsoft.com/en-us/library/dd207314.aspx.

Database mirroring provides a good alternative to failover clustering. It provides the same automatic failover, but also protects against failed storage and allows for the use of less-expensive direct-attached storage (DAS). However, this comes at the cost of double the storage, as well as some performance overhead, because each mirrored database requires additional memory and processor resources.

When choosing the approach for your environment, you should take these points into consideration while also considering the SLAs that you need to meet. If you are planning to use RBS, you should also consider that you cannot mirror databases that have been configured to use the SQL Server FILESTREAM RBS provider.

SharePoint Server Redundancy

SharePoint servers that serve content to end users are typically labeled as Web Front End (WFE) servers. These servers have the Microsoft SharePoint Foundation Web Application role configured. To provide redundancy for WFE servers, you need more than one server hosting this role, and these servers require load-balancing technology in order to balance the load between them and provide redundancy should a server fail. You can implement load balancing by using software such as the Network Load Balancing (NLB) component of Windows Server, or by using a dedicated hardware device.

Software load balancing is cost-effective, because it is generally provided by a service running on the load-balanced servers themselves. In some cases (for example, with NLB), this consumes additional hardware resources on those servers.

Hardware load balancing is provided by a dedicated hardware device, which is generally running a proprietary operating system. The device does not consume additional resources on the load-balanced servers, because it acts independently. Hardware devices generally provide a richer feature set and greater scalability than software load balancers do.

SharePoint servers that hold roles such as search are typically labeled as application servers. These servers do not require load-balancing technology, since SharePoint provides load-balancing internally for service applications. In reality, servers labeled as either WFE servers or application servers may hold many different roles. For example, it is common for the search query role to be configured on WFE servers.

Redundancy Strategies for Service Applications

The redundancy strategy you choose for protecting service applications that run in a farm varies, depending on whether the service application stores data in a database.

Service Applications that Store Data Outside of a Database

Protecting service applications that store data outside a database is as simple as provisioning the service application on multiple application servers to provide redundancy within the environment. This keeps the service application running, but does not guarantee against data loss. If an application server fails, the active connections for that application server will be lost, and users will lose some data. This is unavoidable.

The following service applications store data outside a database:

  • Access Services
  • Excel Services Application

Service Applications that Store Data in Databases

Protecting service applications that store data in databases requires that you provision the service application on multiple application servers, and that you configure SQL Server failover clustering or database mirroring.

The following service applications store data in databases:

  • Search service application (Search Administration, Crawl, and Property databases)

    The Search service application is a special case for redundancy within a farm. For more information, see Chapter 27.

  • User Profile service (Profiles, Social, and Synchronization databases)
  • Business Data Connectivity service application
  • Application Registry service application
  • Usage and Health Data Collection service application

    Mirroring the Usage and Health Data Collection service application Logging database is not recommended because of high throughput.

  • Managed Metadata service application
  • Secure Store service application
  • State service application
  • Web Analytics service application (Reporting and Staging databases)
  • Word Automation Services service application
  • Microsoft SharePoint Foundation Subscription Settings Service
  • PerformancePoint Services

    Mirroring some service application databases is only supported when using synchronous mirroring. Additionally, at release, mirroring was not supported at all for some databases. For the latest guidance, see “Plan for availability (SharePoint Server 2010)” at http://technet.microsoft.com/en-us/library/cc748824.aspx.

Redundancy for Closely Located Data Centers

It is common for large organizations to have multiple data centers, some of which are located close to one another with high-bandwidth, low-latency links. The purpose of having multiple local data centers is to provide redundancy and automatic failover in the event of a fault in one of them. Although this design does not provide a means for disaster recovery in the event of a local or regional disaster, it does provide a suitable redundancy solution for most applications that can be spanned across two or more data centers.

A SharePoint farm can be designed to span two or more closely located data centers, provided there is less than 1 millisecond (ms) latency between SQL Server and the WFE servers in one direction, and at least 1 gigabit per second (Gbps) bandwidth. These are tough requirements to meet, but certainly achievable for data centers located a small distance apart. Such a farm is not supported if you cannot meet these requirements.

The farm is designed in the same way as usual for providing redundancy, but redundant servers are located in each data center, and a load-balancing device directs traffic between both, as shown in Figure 21-6.

FIGURE 21-6: Redundancy for closely located data centers

SQL Server code-named “Denali” introduces a new feature called AlwaysOn Availability Groups for enhancing the availability of databases. The feature looks similar to database mirroring, but it is actually a combination of both database mirroring and clustering. Each availability group contains a set of databases known as availability databases that fail over together. An availability group can have multiple failover targets (or secondary replicas), and you can configure secondary replicas to support read-only access to secondary databases. This is a vast improvement on current versions of SQL Server, and looks promising for high availability and disaster recovery in SharePoint. Importantly, this feature is supported for use with SharePoint versions from SharePoint 2010 Service Pack 1.

An availability strategy covers the planning and design for providing redundancy of components within a SharePoint farm. However, when the data center hosting the servers in the farm fails, a disaster recovery strategy is required. The next section covers the considerations for designing a SharePoint disaster recovery strategy.

DESIGNING YOUR DISASTER RECOVERY STRATEGY

Disaster recovery means different things to different people, depending on the service or data. In the context of this chapter, you should think of a disaster recovery strategy as the plan for recovering from failure of a data center that hosts your SharePoint environment. This will usually be caused by a local disaster that cannot be recovered from quickly. Equally, failure of your farm configuration database while the data center is still active could be deemed as a disaster, and be classed as a trigger for initiating your disaster recovery plan.

A SharePoint disaster recovery strategy will almost always involve a standby farm running in a different location. A standby farm is often referred to as a hot, warm, or cold standby, depending on the time to get it up and running.

A hot standby farm can resume services within seconds or minutes, whereas a cold standby farm will take hours or even days to resume services. As with everything in the realm of business continuity management, you should base your solution on your SLAs. Providing a hot standby solution when the business does not ask for it does not make sense.

A hot standby farm would usually cost a lot more to design and run than a cold standby farm. As a rule of thumb, the shorter the interval between failure and recovery of services, the more complex and expensive the solution is likely to be. Additionally, the amount of data you are replicating between farms will also have a large bearing on cost.

No matter which disaster recovery solution you decide to implement for your environment, you are likely to incur some data loss, however small.

Designing for disaster recovery in SharePoint requires knowledge of the solutions that can be used, tied together with an understanding of how those solutions are supported for use with SharePoint. There are a number of supportability considerations when using solutions such as log shipping or database mirroring in a disaster recovery scenario. This section explains the various solutions available, and the supportability considerations for each.

It is important to note that your SharePoint disaster recovery strategy must be coordinated with the disaster recovery strategy for other services on which SharePoint depends (such as Active Directory). You shouldn't assume that a viable plan is in place for such services.

Cold Standby Farm

A cold standby farm disaster recovery strategy typically ships backups for bare-metal recovery to local and regional locations. You can recover in the event of a disaster by setting up a new farm in a new location and restoring your backups. Having arrangements in place for short-term equipment hire and hosting is usually the best approach to take.

This approach involves the following steps:

  1. Install and configure SharePoint 2010 (preferably by using a scripted deployment)
  2. Restore customizations
  3. Restore a farm backup
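The final step above might be sketched as follows, run on the newly built farm. The backup share is a hypothetical path, the backup is assumed to have been created earlier with Backup-SPFarm, and the restore prompts interactively for details such as new server names and credentials.

```powershell
# Load the SharePoint cmdlets when not running in the SharePoint 2010 Management Shell.
Add-PSSnapin Microsoft.SharePoint.PowerShell -ErrorAction SilentlyContinue

# Hypothetical share that holds the farm backups shipped from the failed data center.
$backupShare = "\\drbackup\sharepoint\farm"

# Review the backups available in the share before restoring.
Get-SPBackupHistory -Directory $backupShare -ShowBackup

# Restore the most recent farm backup to the new farm. The New method is intended
# for restoring to different hardware, so the cmdlet asks for the new locations.
Restore-SPFarm -Directory $backupShare -RestoreMethod New
```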

This approach is often the cheapest option to maintain operationally, but with the disadvantage of additional costs associated with recovery from every disaster. It is also the slowest option for recovery, and relies on backups being fully restorable. If costs are an issue, and SharePoint is not a business-critical service, this is likely to be the best approach to take.

Warm Standby Farm

A warm standby farm disaster recovery strategy typically ships virtual server images to local and regional farms. Virtual images of servers in your farm are taken frequently and, when required, can be brought online to recover your farm in the secondary location.

This approach requires knowledge of virtualization options for a SharePoint farm, as well as performance and supportability implications. For example, taking a snapshot of a SharePoint server in an active farm is currently unsupported. For more information, see Chapter 23.

This approach offers a balance between the cold and hot standby farm options. It is relatively inexpensive to perform recovery, but does have relatively complex recovery procedures and operational costs. The trade-off is that you can perform a full farm recovery in a matter of minutes or hours if required.

This approach is likely to be the most suitable for most organizations that have secondary data centers, but do not need near-instant disaster recovery for their SharePoint environment. If you need near-instant disaster recovery, a hot standby farm is the approach to take.
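The relationship between your RTO and the standby options can be sketched as a simple decision function. This is an illustration only: the function name `suggest_standby` and both threshold values are assumptions made for the example, not figures from this chapter or from Microsoft guidance. You should replace them with values derived from your own SLAs and budget.

```python
def suggest_standby(rto_minutes: float,
                    hot_max_minutes: float = 15,
                    warm_max_minutes: float = 240) -> str:
    """Suggest a standby farm type for a given RTO.

    Both thresholds are hypothetical defaults chosen for illustration.
    """
    if rto_minutes <= hot_max_minutes:
        return "hot"   # near-instant recovery required
    if rto_minutes <= warm_max_minutes:
        return "warm"  # recovery within minutes or hours
    return "cold"      # slowest recovery, lowest running cost


if __name__ == "__main__":
    for rto in (5, 120, 1440):
        print(f"RTO of {rto} minutes: {suggest_standby(rto)} standby farm")
```

In practice, cost, the presence of a secondary data center, and the amount of data you replicate would also feed into this decision.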

Hot Standby Farm

A hot standby farm disaster recovery strategy typically involves maintaining a duplicate passive farm in another data center ready to be brought online in the event of a disaster. All settings, customizations, and updates are applied in both farms. Content databases and some service application databases are asynchronously mirrored or log-shipped to the secondary farm.

Upon failure, databases are brought online in the secondary farm, and DNS records are updated to resolve traffic to the servers in the secondary farm. Figure 21-7 shows a typical hot standby farm disaster recovery strategy for SharePoint before failover.

FIGURE 21-7: A typical hot standby farm disaster recovery strategy before failover

Asynchronously mirroring or log-shipping a configuration database, Central Administration content database, and some service application databases is not supported. You should maintain a separate configuration and Central Administration content database in your hot standby farm. Considerations for service applications are discussed later in this section.

If you choose to use log shipping, this strategy can be repeated across many data centers. Unfortunately, database mirroring only allows you to copy databases to a single mirror server. In larger organizations, making use of storage area network (SAN) replication is a viable alternative to log shipping or database mirroring. You should consult with your SAN vendor to determine if SAN replication is supported and feasible for your environment.
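When evaluating log shipping against your RPO, a rough estimate of the worst-case data loss is useful. The sketch below uses a deliberately simplified model and is an assumption rather than a vendor formula: it supposes that the primary is lost just before the next log backup has finished copying, so everything since the last fully copied backup is lost. The function names are hypothetical.

```python
def worst_case_data_loss_minutes(backup_interval: float,
                                 copy_duration: float) -> float:
    """Estimate worst-case data loss for log shipping, in minutes.

    Simplified model (an assumption): the primary fails just before the
    next log backup finishes copying to the secondary data center.
    """
    return backup_interval + copy_duration


def meets_rpo(backup_interval: float,
              copy_duration: float,
              rpo_minutes: float) -> bool:
    """Return True if the estimated worst case is within the agreed RPO."""
    return worst_case_data_loss_minutes(backup_interval, copy_duration) <= rpo_minutes


if __name__ == "__main__":
    # Example figures only: 15-minute log backups that take 5 minutes to copy.
    print(worst_case_data_loss_minutes(15, 5))
    print(meets_rpo(15, 5, rpo_minutes=30))
```

The time taken to restore a copied log on the secondary server is left out because it delays recovery rather than adding to data loss, so it counts against your RTO instead.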

This approach offers the fastest recovery, although this comes at a high cost, along with the additional complexity involved in configuring and maintaining both farms. If your disaster recovery SLAs require a very short RTO, this is the approach you should adopt.

You will need to refresh the sitemap in the configuration database for the secondary farm when it is brought online. Information on how to achieve this was covered in the discussion, “Content Databases,” in the “Designing Your Backup and Recovery Strategy” section earlier in this chapter.

When providing a hot standby farm disaster recovery solution, you must consider the various service applications configured in your primary farm. Where possible, services that can be run cross-farm should exist in a separate services farm that is accessible from both the primary and secondary data centers.

Service applications that cannot be run cross-farm, or that you do not wish to run cross-farm, will require a redundancy strategy that depends on your SLAs for the service applications. This will also depend on whether the databases associated with the service applications can be asynchronously mirrored or log-shipped.

For example, search requires complete synchronization between its databases and index. Because of this requirement, you cannot replicate search between farms by using an asynchronous replication mechanism. To provide up-to-date search on a failover farm, you must configure search separately on the secondary farm. This often presents a problem if you want search online quickly after a disaster, because you cannot crawl content databases while they are mirrored. A solution to this is to use SQL Server snapshots to periodically snapshot the content database, and then crawl the snapshot. Alternatively, you can log-ship in standby mode and crawl the read-only copy.

The profile database (which is part of the User Profile service application) is another example of a database that cannot be mirrored or log-shipped in this scenario. To provide redundancy for the User Profile service application, you must configure the service application in the secondary farm, and use the User Profile Replication Engine that is included in the SharePoint Administration Toolkit.

Rather than list each service application here, as well as the supportability implications and requirements for each, see “Planning for hot standby data centers” at http://technet.microsoft.com/en-us/library/ff628971.aspx#Section3 for the latest guidance.

DOCUMENTATION

Documentation to support any business continuity plan is as important as the solution itself. If you do not document the processes, policies, and procedures, your solution might fail to meet the SLAs that you have agreed to with the business. Worse still, your system might fail completely, or you could lose data.

Include time to develop this documentation within any SharePoint design and deployment process. The documentation should cover operational procedures, as well as the design of the solution. You should regularly review and maintain any documentation to ensure that it is accurate and up-to-date.

An effective business continuity plan should fully document every aspect of the plan. Some of the items that you may want to incorporate into your plan include the following:

  • An explanation of when to use the plan
  • A history of any updates
  • Permissions required to execute the plan
  • A list of key contacts
  • A step-by-step execution plan for your environment
  • The location of all installation files and customizations
  • Installation and configuration instructions
  • Testing instructions, including stability tests, performance tests, and security tests
  • Comments from previous disasters and restorations

Your SharePoint environment and business continuity plan are only as good as the testing that is performed against them, so factor in time for full tests. Before you put a SharePoint environment into production, test it against a number of simulated situations, such as the following:

  • Failure of hardware components
  • Data corruption
  • Loss of a server
  • Network outage
  • Loss of data center

Any issues arising from these tests should be addressed and documented. You should test regularly and after each major change to the environment.
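One way to keep track of this is to record when each scenario was last tested and compare that date with your most recent major change. The following sketch assumes you maintain such a record yourself. The function name `scenarios_needing_test` and the example dates are hypothetical.

```python
from datetime import date
from typing import Dict, List

# The simulated situations listed above.
SCENARIOS = [
    "Failure of hardware components",
    "Data corruption",
    "Loss of a server",
    "Network outage",
    "Loss of data center",
]


def scenarios_needing_test(last_tested: Dict[str, date],
                           last_major_change: date) -> List[str]:
    """Return scenarios never tested, or last tested before the latest major change."""
    return [
        scenario for scenario in SCENARIOS
        if scenario not in last_tested
        or last_tested[scenario] < last_major_change
    ]


if __name__ == "__main__":
    # Example record (made-up dates for illustration).
    record = {
        "Failure of hardware components": date(2011, 3, 1),
        "Data corruption": date(2011, 6, 15),
        "Network outage": date(2011, 7, 1),
    }
    for scenario in scenarios_needing_test(record, date(2011, 6, 1)):
        print("Retest required:", scenario)
```

A report like this can be added to the documentation review described earlier, so that gaps in testing are visible alongside the plan itself.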

BEST PRACTICES

Microsoft has documented a number of performance, quality assurance, and procedural best practices for SharePoint backup and recovery at http://technet.microsoft.com/en-us/library/gg266381.aspx. It is strongly recommended that you review these best practices and adopt them in your business continuity plan.

SUMMARY

This chapter covered business continuity management for SharePoint 2010, providing guidance for defining your business continuity requirements, and using them to design your backup and recovery, availability, and disaster recovery strategies.

Whether SharePoint is the primary collaboration tool in an organization, or it provides a company's Internet presence, a well-designed business continuity plan is critical for ensuring that you can provide availability for the service and fast recovery in the event of data loss or a disaster. It is important to remember that if you do not know the business importance of a service, or the costs associated with outage and data loss, you cannot effectively define SLAs for that service. These SLAs will not only define the agreements for recovery objectives and service availability, they will often help determine what backup, recovery, and availability solutions are required to meet them.

Once you know your business continuity requirements, you can plan, design, and execute a business continuity plan that works best for your environment using some of the tools and techniques discussed in this chapter.

Chapter 22 explores how to design for cloud-based solutions and multi-tenancy services.
