Chapter 8. Disaster Recovery

A book on Configuration Manager troubleshooting would not be complete without reference to Disaster Recovery (DR). What do you do when all else fails? Many administrators will never have to recover Configuration Manager in this way. However, they must be prepared for the worst-case scenario. It's a very important aspect of our job.

So what is DR? In simple terms, it is the ability to recover a service from catastrophic failure in the shortest possible time with minimal data loss. A Disaster Recovery Plan (DRP), sometimes known as a Business Continuity Plan, documents the procedures and policies required to recover services. You (or a member of your team) are responsible for the Configuration Manager DRP.

So what has to be done? What does DR mean in relation to Configuration Manager? Infrastructures vary across organizations. Some have large environments with a Central Administration Site (CAS) and several Primary Sites. Recovery techniques for these organizations may differ from those used by organizations with a single Primary Site. In this chapter, we will discuss DR solutions. The chapter is not meant to be a comprehensive walk-through for implementing a DR solution; rather, it gives you an overview of what is required. We will cover the following topics:

  • Planning for Disaster Recovery
  • Robust backup process
  • Configuration Manager Site Restore
  • High availability

Planning for Disaster Recovery

Make no mistake. Recovering from a Configuration Manager failure is a complex process. You must be skilled with the product and the integrated components. The process must be well planned in advance. All the information you need should already be available. The next section describes some of the items you should consider.

Document your environment

As a Configuration Manager administrator, you should document your environment thoroughly. Of course, this isn't just part of a DR process; it's common sense. In reality, however, many environments are not documented as well as they should be.

  1. Start by drawing a diagram of the hierarchy (very large environments may have a CAS, multiple Primary Sites, and, perhaps, multiple Secondary Sites).
  2. Illustrate the hierarchy accurately, even if you only have a single Primary Site.
  3. Include the server names in the hierarchy diagram.
  4. Create a table containing all the information you are likely to need to recover each Site System. Don't worry about including too much information. When it comes to DR, it's better to be looking at it than looking for it.

A typical table for a single Primary Site might look like the following (the specifications are examples, not recommendations):

Item                          | SERV01                                             | SERV02
------------------------------+----------------------------------------------------+-----------------------------------
Physical or virtual           | Virtual                                            | Virtual
Server specification          | 16 GB RAM, 2 vCPU                                  | 4 GB RAM, 2 vCPU
High availability             | Yes, at hypervisor level                           | No
Role(s)                       | Primary Site Server                                | Management Point
                              | Management Point                                   | Distribution Point
                              | Database                                           | Software Update Point
                              | Software Update Point                              |
                              | Reporting Services Point                           |
                              | Intune Connector                                   |
Configuration Manager version | Configuration Manager 2012 R2 SP1 (5.00.8239.1000) |
(including SPs and CUs)       |                                                    |
Site code                     | P01                                                | P01
Operating system              | Windows Server 2012 R2 (6.3.9600)                  | Windows Server 2012 R2 (6.3.9600)
Drive partitions (examples)   | C: 80 GB (OS)                                      | C: 80 GB (OS)
                              | E: 80 GB (Program files)                           | E: 80 GB (Program files)
                              | F: 80 GB (Database)                                | F: 200 GB (Content Library)
                              | L: 30 GB (Log files)                               |
                              | T: 30 GB (TempDB)                                  |
Domain                        | MyDomain.local                                     | MyDomain.local
Configuration Manager         | E:\Program Files\Microsoft Configuration Manager   |
installation folder           |                                                    |
SQL Server information        | Local SQL Server 2012 SP2 CU4 (11.0.5569.0)        |
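If you also want to keep this inventory as structured data (for example, under version control alongside your DRP), the following minimal Python sketch shows one way to do it. The `SiteSystem` class, its field names, and the `to_csv` and `missing_fields` helpers are hypothetical, chosen only for illustration; they are not part of Configuration Manager. The sample values are based on the example table.

```python
import csv
import io
from dataclasses import dataclass, field, fields


# Hypothetical schema for one row of the DR inventory table.
# Field names are illustrative only; they are not a Configuration Manager API.
@dataclass
class SiteSystem:
    name: str
    physical_or_virtual: str
    specification: str
    high_availability: str
    roles: list[str] = field(default_factory=list)
    site_code: str = ""
    operating_system: str = ""
    domain: str = ""
    cm_version: str = ""
    install_folder: str = ""
    sql_server: str = ""


def to_csv(systems: list[SiteSystem]) -> str:
    """Render the inventory as CSV text with one column per attribute."""
    columns = [f.name for f in fields(SiteSystem)]
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(columns)
    for system in systems:
        row = []
        for column in columns:
            value = getattr(system, column)
            # Multiple roles are joined into a single cell, separated by semicolons.
            row.append("; ".join(value) if isinstance(value, list) else value)
        writer.writerow(row)
    return buffer.getvalue()


def missing_fields(system: SiteSystem) -> list[str]:
    """Return the names of attributes that are still empty for this server."""
    return [f.name for f in fields(SiteSystem) if not getattr(system, f.name)]


# Sample data based on the example table (values are examples, not recommendations).
inventory = [
    SiteSystem(
        name="SERV01",
        physical_or_virtual="Virtual",
        specification="16 GB RAM, 2 vCPU",
        high_availability="Yes, at hypervisor level",
        roles=["Primary Site Server", "Management Point", "Database",
               "Software Update Point", "Reporting Services Point", "Intune Connector"],
        site_code="P01",
        operating_system="Windows Server 2012 R2",
        domain="MyDomain.local",
        cm_version="Configuration Manager 2012 R2 SP1 (5.00.8239.1000)",
        install_folder=r"E:\Program Files\Microsoft Configuration Manager",
        sql_server="Local SQL Server 2012 SP2 CU4 (11.0.5569.0)",
    ),
    SiteSystem(
        name="SERV02",
        physical_or_virtual="Virtual",
        specification="4 GB RAM, 2 vCPU",
        high_availability="No",
        roles=["Management Point", "Distribution Point", "Software Update Point"],
        site_code="P01",
        operating_system="Windows Server 2012 R2",
        domain="MyDomain.local",
    ),
]

print(to_csv(inventory))
for system in inventory:
    print(system.name, "missing:", missing_fields(system))
```

Listing empty fields gives you a quick check for gaps before you file the inventory with your DRP. Here, SERV02 is reported as missing the version, installation folder, and SQL Server entries, which correspond to the blank cells in the table; decide for each one whether a blank means "not applicable" or "not yet documented".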

 

You may have spotted a reference to High Availability (HA) in the table. Configuration Manager is not a real-time product, and a certain amount of downtime can be tolerated in most cases. However, it is still beneficial to build as much redundancy into the solution as possible to eliminate, or at least minimize, the need for DR. HA is discussed later in this chapter.

Create a Disaster Recovery Plan

A DRP details everything you may need to recover the service after a catastrophic failure. It should include at least the following items:

  • Documentation describing the environment (diagrams of infrastructure, tables containing Site System information).
  • Documented backup processes (see the Robust backup process section later in this chapter for more details).
  • Documented recovery processes (see the Configuration Manager Site Restore section later in this chapter for more details).
  • Checklist for testing after site recovery (this should be a comprehensive list of tests to verify that all features are functioning as they did previously).
  • Results from previous DR tests (which should include what was learned from previous tests). Regular DR tests should be carried out in a lab environment.

The DRP should be a living document, updated whenever major changes are made to the Configuration Manager environment.
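The required contents listed above, together with the rule that the DRP must be kept current, lend themselves to a simple automated review. The following Python sketch is illustrative only: the section titles in `REQUIRED_SECTIONS` and the `review_drp` function are hypothetical, and the sketch assumes you can tell which sections your DRP contains and when both the DRP and the environment last changed.

```python
from datetime import date

# Hypothetical section titles mirroring the DRP contents listed above.
REQUIRED_SECTIONS = (
    "Environment documentation",
    "Backup processes",
    "Recovery processes",
    "Post-recovery test checklist",
    "Previous DR test results",
)


def review_drp(sections: set[str], last_updated: date, last_major_change: date) -> list[str]:
    """Return the problems found in a DRP; an empty list means none were detected."""
    problems = [f"missing section: {name}" for name in REQUIRED_SECTIONS if name not in sections]
    # A living DRP should be revised after every major change to the environment.
    if last_updated < last_major_change:
        problems.append(
            f"DRP last updated {last_updated.isoformat()}, "
            f"before the major change on {last_major_change.isoformat()}"
        )
    return problems


# Example: a DRP with two sections missing that predates the latest major change.
issues = review_drp(
    {"Environment documentation", "Backup processes", "Recovery processes"},
    last_updated=date(2015, 3, 1),
    last_major_change=date(2015, 6, 15),
)
for issue in issues:
    print(issue)
```

Running a check like this whenever the environment changes is one way to keep the DRP live; how you decide what counts as a "major change" is up to your own change-management process.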

Note

DR testing is not easy with Configuration Manager. Because restored servers generally need to have the same names as the original servers, you cannot test in production. The only way to test DR properly is to duplicate the production environment as closely as you can on an isolated network.

When you carry out your DR tests, record how long full recovery takes. This is an important figure to share with management when you are seeking approval for your DRP.
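To make recovery times easy to report, you can log when each recovery step starts and finishes, then derive both per-step durations and the end-to-end time. The following Python sketch assumes such a log exists; the `recovery_summary` function and the step names and timings in the example are hypothetical illustrations, not measured results.

```python
from datetime import datetime, timedelta


def recovery_summary(
    steps: list[tuple[str, datetime, datetime]],
) -> tuple[dict[str, timedelta], timedelta]:
    """Return per-step durations and the end-to-end recovery time.

    The end-to-end time runs from the earliest start to the latest finish, so
    any waiting time between steps is included in the total.
    """
    if not steps:
        raise ValueError("no recovery steps recorded")
    durations: dict[str, timedelta] = {}
    for name, start, end in steps:
        if end < start:
            raise ValueError(f"step {name!r} finishes before it starts")
        durations[name] = end - start
    total = max(end for _, _, end in steps) - min(start for _, start, _ in steps)
    return durations, total


# Hypothetical DR test log; step names and times are illustrative, not measurements.
day = datetime(2015, 6, 20)
log = [
    ("Rebuild server", day.replace(hour=9), day.replace(hour=10, minute=30)),
    ("Restore site database", day.replace(hour=10, minute=45), day.replace(hour=11, minute=30)),
    ("Run site recovery", day.replace(hour=11, minute=30), day.replace(hour=12, minute=45)),
    ("Post-recovery tests", day.replace(hour=13), day.replace(hour=14)),
]

durations, total = recovery_summary(log)
for name, duration in durations.items():
    print(f"{name}: {duration}")
print(f"Full recovery: {total}")
```

In this example, the steps themselves add up to 4 hours 30 minutes, but full recovery takes 5 hours because of the gaps between steps. Reporting the end-to-end figure gives management a more realistic picture of the outage.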
