Disaster Recovery

Disaster recovery, business continuity, and forensics have become closely related topics. You might think forensics applies only to criminal activities; it often does. However, after an information technology–related disaster, forensic techniques may be the best method for determining what caused the disaster and for avoiding a repeat of that disaster, or at least mitigating its consequences, in the future.

The forensic process really begins once an incident has been discovered, but it is not fully under way until after the disaster or incident is contained. However, before you examine the forensic process for disasters, it is a good idea to start with a basic understanding of disaster recovery.

Incident Response Plan

All organizations must plan for the possibility of some disaster occurring that disrupts normal operations. When narrowing the focus to just computer-related operations, you need to consider a number of types of potential disasters. Of course, you need to plan for such events as:

  • Fire

  • Flood

  • Hurricane

  • Tornado

However, those do not involve computer forensics. So this chapter focuses only on computer disasters, such as the following, regardless of the cause:

  • Hard drive failure

  • Network outage

  • Malware infection

  • Data theft or deletion

  • Intrusion

Each of these events can disrupt normal operations for an organization’s computer systems and therefore constitutes a disaster. Most organizations have two plans for responding to such disasters: the business continuity plan (BCP) and the disaster recovery plan (DRP). A BCP is focused on keeping the organization functioning as well as possible until a full recovery can be made. A DRP is focused on executing a full recovery to normal operations. (Although some experts differentiate between the terms disaster recovery plan and incident response plan, this chapter uses the terms interchangeably.)

For example, suppose a virus takes the main web server offline. A BCP would be concerned with what can be done to get at least minimal operations going until the organization can be returned to full functionality. It might provide for temporarily using an old server that offers minimal functionality but may not be as robust. A DRP would be focused on actually returning the organization to full functionality. In the scenario just described, this would mean having a full web server, equivalent to the failed server, back online and running at full capacity.

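You should be familiar with the following standards, which bear on BCPs, DRPs, and incident response plans. Some are U.S. federal publications; others come from international or industry bodies.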

ISO 27001

This standard contains requirements for information security management systems. In the 2005 edition, Annex A.14 addresses business continuity management; later editions renumber the controls.

NIST 800-34

This is the Contingency Planning Guide for Information Technology Systems, from the U.S. National Institute of Standards and Technology (NIST). It contains a seven-step process for BCP and DRP projects.

NFPA 1600

This is the Standard on Disaster/Emergency Management and Business Continuity Programs. This is from the U.S. National Fire Protection Association.

ISO 27035

This standard guides you in how to formulate an incident response plan. It requires a structured and planned approach to detect, report, and assess information security incidents; respond to and manage information security incidents; detect, assess, and manage information security vulnerabilities; and continuously improve information security and incident management as a result of managing incidents.

NIST 800-61

This standard also will help guide you in forming an incident response plan. Establishing an incident response capability should include the following actions:

  • Creating an incident response policy and plan

  • Developing procedures for performing incident handling and reporting

  • Setting guidelines for communicating with outside parties regarding incidents

  • Selecting a team structure and staffing model

  • Establishing relationships and lines of communication between the incident response team and other groups, both internal (e.g., legal department) and external (e.g., law enforcement agencies)

  • Determining which services the incident response team should provide

  • Staffing and training the incident response team

These standards provide a good overview of what should be covered in any business continuity plan, and some, like NIST 800-34, are also applicable to disaster recovery plans. You should certainly consider reviewing these standards at some point in your career. For the purposes of forensic examination, you don’t need to be an expert in disaster recovery—just a basic overview of the process is sufficient. The essential steps are outlined here.

Business Impact Analysis

Business impact analysis (BIA) is a process whereby the disaster recovery team contemplates likely disasters and what impact each would have on the organization. For example, a company that ships goods to retail stores, but does not sell directly to the public, might be slightly affected if its web server were down for a day. A company that sells directly to the public both in store and online would be moderately affected by such an outage. And a completely e-commerce company, one that sells products only online, would be severely affected.

Usually some sort of table is created listing the various disasters being planned for and the impact they would have on the organization. In more complex scenarios, the organization may be broken down into subsections, and the impact of each disaster on each piece of the organization is rated. Whether a plan goes into great detail or not, one item that must be considered is maximum tolerable downtime (MTD); that is, how long can the system or systems be down before it is impossible for the organization to recover? Imagine if your favorite e-commerce site were down. You may be a loyal customer and wait for it to come back up. But as time goes on, fewer and fewer customers would wait, and more money would be lost, until the company reaches a point at which it simply cannot recover. If the disaster recovery team knows the MTD for the organization as well as for portions of the organization, it can then prioritize the recovery plan.
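The table described above can be sketched as a small data structure. The following is only a minimal illustration, not a prescribed format: the system names, impact ratings, and MTD values are all hypothetical, and a real business impact analysis would use figures agreed on by the organization.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ImpactEntry:
    """One row of a hypothetical business impact analysis table."""
    system: str
    disaster: str
    impact: str      # illustrative rating: "slight", "moderate", or "severe"
    mtd_hours: int   # maximum tolerable downtime in hours (made-up values)


def prioritize(entries):
    """Return the entries ordered so the shortest MTD comes first."""
    return sorted(entries, key=lambda e: e.mtd_hours)


# Hypothetical rows for illustration only.
bia_table = [
    ImpactEntry("internal wiki", "hard drive failure", "slight", 336),
    ImpactEntry("e-commerce web server", "malware infection", "severe", 24),
    ImpactEntry("email", "network outage", "moderate", 72),
]

for entry in prioritize(bia_table):
    print(f"{entry.system:<24}{entry.impact:<10}MTD {entry.mtd_hours} h")
```

Sorting by MTD is one simple way to turn the table into a recovery order; an organization might also weigh the impact rating or dependencies between systems.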

Two other terms relate to maximum tolerable downtime. Mean time to repair (MTTR) is the average time it takes to repair an item, and mean time between failures (MTBF) is the average time a given device operates before it fails through normal use. These are important figures to establish when performing a business impact analysis. If an organization cannot operate without a given piece of equipment for more than 14 days and still recover, and the mean time to repair is 7 days, then you have only 7 days after a disaster to initiate repairs; otherwise the organization will not be able to recover.
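The arithmetic in the 14-day example can be written as a one-line calculation. This is a minimal sketch; the function name is hypothetical, and it assumes the repair takes exactly the mean time to repair, which a real repair may exceed.

```python
def latest_repair_start(mtd_days, mttr_days):
    """Return how many days after a disaster repairs must begin.

    Assumes the repair takes exactly the mean time to repair (MTTR).
    Because MTTR is only an average, treat the result as an upper bound.
    """
    if mttr_days > mtd_days:
        raise ValueError("MTTR exceeds MTD: recovery in time is not possible")
    return mtd_days - mttr_days


# The example from the text: MTD of 14 days, MTTR of 7 days.
print(latest_repair_start(14, 7))  # prints 7
```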

The Recovery Plan

The recovery plan has two parts. The ultimate goal is a complete recovery, and this is outlined in the disaster recovery plan. But unless that DRP is going to get the organization back up to full capacity within 24 hours, there also needs to be a BCP, a plan for how to get at least minimal functionality until full recovery is accomplished. Both plans are based on the priorities that were established during the business impact analysis phase.

Even though the disaster recovery plan and the business continuity plan don’t have exactly the same goals, they do require that the same questions be asked:

  • Have you identified alternate equipment?

  • If needed, have you identified alternate facilities?

  • Is there a mechanism in place for contacting all affected parties (employees, vendors, customers, and contractors), even if the primary means of communication is down?

  • Is there off-site backup of the data?

  • Can that backup be readily retrieved and restored?

When considering backups and restoring backups, you need to think about what type of backups you have. Although database administrators may use a number of different types of data backups, from a security point of view, there are three primary backup types you need to be concerned with:

  • Full: All data, whether or not it has changed

  • Differential: All changes since the last full backup

  • Incremental: All changes since the last backup of any type

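If the strategy uses only full backups, then you just restore the last full backup. However, if the backup strategy includes differential or incremental backups, and it probably will, then there will be additional backup data to restore: the last full backup followed by the most recent differential, or the last full backup followed by every incremental made since then, in order.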
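Which backups must be restored, and in what order, follows directly from the three definitions. The sketch below is one way to express that logic; the function name and the (label, kind) tuple layout are assumptions made for illustration, not the interface of any real backup product.

```python
def restore_chain(backups):
    """Return the backups to restore, oldest first.

    `backups` is a chronological list of (label, kind) tuples, where kind
    is "full", "differential", or "incremental". Raises ValueError if the
    list contains no full backup.
    """
    # Every restore starts from the most recent full backup.
    last_full = max(i for i, (_, kind) in enumerate(backups) if kind == "full")
    chain = [backups[last_full]]
    for backup in backups[last_full + 1:]:
        kind = backup[1]
        if kind == "differential":
            # A differential holds everything since the full backup, so it
            # replaces any earlier differentials or incrementals.
            chain = [backups[last_full], backup]
        elif kind == "incremental":
            # An incremental holds only changes since the previous backup,
            # so each one must be applied in order.
            chain.append(backup)
    return chain


incremental_week = [("Sun", "full"), ("Mon", "incremental"), ("Tue", "incremental")]
differential_week = [("Sun", "full"), ("Mon", "differential"), ("Tue", "differential")]

print(restore_chain(incremental_week))
print(restore_chain(differential_week))
```

With incrementals, all three backups are needed; with differentials, only the full backup and the latest differential are. That is the usual trade-off: incrementals are smaller to create, while differentials are quicker to restore.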

There is another type of backup that is becoming more popular, called hierarchical storage management (HSM). HSM provides continuous online backup by using optical or tape “jukeboxes.” It appears as an infinite disk to the system, and it can be configured to provide the closest version of an available, real-time backup.

The Postrecovery Follow-Up

Different disaster recovery textbooks label this differently; however, the intention is the same. When the disaster is over and the organization has recovered, you have to find out what happened and why. This is where forensics comes into play. This phase is not necessarily about assigning blame; it is about discovering if the disaster was caused by some weakness in the system. That could be an act of negligence by an individual, it could be a gap in policy, or it could be an intentional act. But if the root cause is not discovered and addressed, the chances of the same disaster occurring again are significant.

Incident Response

When an incident occurs, regardless of the level or severity of the incident, there needs to be an organized response. For example, if a single workstation is infected with a virus, this probably does not constitute a disaster. However, if it is not responded to quickly, it may grow into a disaster as the virus spreads. Proper incident response is important. Every incident response plan must include some key steps, which are outlined in this section.

Containment

The first step is always to limit the incident. This means keeping it from affecting more systems. In the case of a virus, the strategy is to keep the virus from spreading. It is probably a good idea to have a policy in place that instructs users to disconnect their computers from the network and then call tech support if they suspect they have a virus. This contains the virus and prevents it from spreading further.

Other incidents might not have such a clear containment path. For example, if there is an intruder getting into the web server, how is that contained? First, the web server itself is isolated from the rest of the network. Then, you seek to prevent further intrusion. This can be done by changing passwords throughout the organization, on the assumption that the intruder might have compromised passwords.

Although the specifics of containment might vary, the goal does not: limit the spread of the incident as much as possible. This phase must occur before any others, because it is vital that the incident’s effects not spread further while you attempt to eradicate the problem.

Eradication

Once the incident is contained, the next step is to eradicate the problem. In the case of malware, that means removing the malware. In some cases, anti-malware products, such as Norton, McAfee, Kaspersky, or AVG, can remove it. In other cases, the IT staff will need to remove it manually.

Other attacks are not so clear. For example, if the incident is an intruder infiltrating the network via SQL injection, what does eradication entail? The first step is to fix whatever vulnerability allowed the intruder to get in. In the case of SQL injection, that means correcting the flaws in the webpage that allowed the injection to occur.
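To make the SQL injection case concrete, the sketch below contrasts a flawed query with a corrected one. It is an illustration only: the table, the function names, and the use of Python's built-in sqlite3 module are assumptions, and the plaintext password column exists purely to keep the example short.

```python
import sqlite3

# In-memory database with one hypothetical user (illustration only;
# real systems should never store plaintext passwords).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, password TEXT)")
conn.execute("INSERT INTO users VALUES ('alice', 's3cret')")


def login_vulnerable(name, password):
    # Flawed: user input is pasted directly into the SQL text.
    query = (
        "SELECT COUNT(*) FROM users "
        f"WHERE name = '{name}' AND password = '{password}'"
    )
    return conn.execute(query).fetchone()[0] > 0


def login_fixed(name, password):
    # Corrected: placeholders keep user input out of the SQL text.
    query = "SELECT COUNT(*) FROM users WHERE name = ? AND password = ?"
    return conn.execute(query, (name, password)).fetchone()[0] > 0


attack = "' OR '1'='1"
print(login_vulnerable("alice", attack))  # True: the injection bypasses the check
print(login_fixed("alice", attack))       # False: the input is treated as data
```

Parameterized queries are one common correction. Eradication would still require finding every page that builds queries the flawed way.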

Regardless of the particular incident, eradication needs to be thorough. This means a comprehensive examination of what occurred and how far it reached. It is imperative to ensure that the issue was completely addressed.

This is the stage at which forensics must begin. If the vulnerability is simply eradicated, it is likely that evidence will be eradicated along with it. It is imperative that you begin collecting evidence prior to eradicating the vulnerability. This may involve performing the forensic investigation prior to any eradication steps taking place. In some cases, it is just not possible to perform a full forensic investigation while keeping the systems on hold. In that case, image the drives involved so that a forensic investigation can be conducted at a later time.
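As a rough illustration of what imaging involves, the sketch below copies a source byte for byte and records a SHA-256 hash so that the copy can later be shown to match what was read. The function names are hypothetical, and hashing is an assumption drawn from common forensic practice rather than something this section prescribes. A real investigation would use dedicated imaging tools and write protection, not a script like this.

```python
import hashlib
import os
import tempfile


def image_with_hash(source_path, image_path, chunk_size=1024 * 1024):
    """Copy source to image byte for byte; return the SHA-256 of the data read."""
    digest = hashlib.sha256()
    with open(source_path, "rb") as src, open(image_path, "wb") as dst:
        while chunk := src.read(chunk_size):
            digest.update(chunk)
            dst.write(chunk)
    return digest.hexdigest()


def file_hash(path, chunk_size=1024 * 1024):
    """Return the SHA-256 of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        while chunk := f.read(chunk_size):
            digest.update(chunk)
    return digest.hexdigest()


if __name__ == "__main__":
    # Demonstration on a throwaway file rather than a real drive.
    with tempfile.TemporaryDirectory() as workdir:
        source = os.path.join(workdir, "source.bin")
        image = os.path.join(workdir, "image.bin")
        with open(source, "wb") as f:
            f.write(b"sample data" * 1000)
        acquired = image_with_hash(source, image)
        print("image matches source:", acquired == file_hash(image))
```

Recording the hash at acquisition time lets an examiner show later that the image analyzed is the same data that was collected.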

Recovery

Recovery involves returning the affected systems to normal status. In the case of malware, that means ensuring the system is back in full working order with absolutely no presence of the malware. In many cases, this involves restoring software and data from a backup source that has been verified to be free from the malware infection.

Follow-Up

The follow-up phase is another stage at which forensics plays a critical role. The IT team must determine how the incident occurred and what steps can be taken to prevent it from recurring. Clearly, those decisions cannot be made without input from the forensic examination.
