Responding to Incidents

In the best of worlds, all of your risk management practices will prevent any incidents. However, avoiding all incidents is highly unlikely, so organizations also come up with a plan for how to respond to incidents when they occur.

In the context of IT security, a security incident is any violation of policies or security practices that has the potential to result in an adverse event. NIST SP 800-61, Computer Security Incident Handling Guide, provides several definitions that are helpful in identifying incidents:

An event is any observable occurrence in a system or network.

Adverse events are events with a negative consequence, such as system crashes, network packet floods, unauthorized use of system privileges, unauthorized access to sensitive data, and execution of malicious code that destroys data.

A computer security incident is a violation or imminent threat of violation of computer security policies, acceptable use policies, or standard security practices.

Notice that not all events are security incidents. Events need to be investigated to determine whether they are incidents. Additionally, not all computer security incidents result in adverse events. For example, imagine that the company security policy states that users should not use computers to do personal shopping on the Internet. If a user violates this policy, it is an incident, but it probably won’t result in negative consequences, except perhaps for the employee.

SP 800-61 also provides a standardized process for preparing for and responding to incidents. Figure 7-6 shows the incident response life cycle, and the following sections explore each stage in greater depth.


Figure 7-6 Incident response life cycle recommended by NIST SP 800-61


EXAM TIP The first step in incident response is preparation. Once an incident has been detected and verified, it’s important to contain the incident as quickly as possible.


The following are some common types of incidents:

Denial of service (DoS) Any type of attack that attempts to reduce a system’s ability to provide a service is an incident. This includes both DoS and DDoS attacks. Intrusion detection systems and intrusion prevention systems attempt to detect and block DoS and DDoS attacks before they can cause any damage.

Malware The discovery of any type of malicious software such as a virus, worm, Trojan horse, and so on is an incident. Of course, if AV software finds the malware as soon as it’s introduced into the system, it’s a minor incident. Then again, if the malware is able to spread unabated through a network, it can become a major incident.

Inappropriate usage Most organizations have an acceptable use policy identifying what employees can do with IT systems. Any attempt to violate this policy is an incident.

Unauthorized access If any individual or software program is able to access data or systems that the user or program is not authorized to access, it is an incident. Depending on the value of the data or system accessed, the incident may be minor or major. For example, in March 2011, Epsilon (a marketing firm that sends e-mail to customers on behalf of various large organizations such as JPMorgan Chase, Capital One, Citi, Target, TiVo, and more) was hacked. Attackers retrieved a large number of names and e-mail addresses along with the company name and then began using this data to launch phishing attacks. Epsilon reported the incident to its customers (the large organizations). Some of these companies in turn reported the incident to their customers. In this case, the incident was a major incident resulting from unauthorized access.

Preparation

Preparing for an incident is an extremely important first step. With solid preparation, you have a much better chance of minimizing damage from an incident when it occurs. Preparation includes the steps to prevent an incident by ensuring that your systems are secure, but it also includes the steps to take when an incident occurs.

Most organizations identify individuals in a computer incident response team (CIRT). These individuals are responsible for responding to an incident when it occurs.


TIP Some organizations use different terms for their teams. CIRT is common, but you may also see computer security incident response team (CSIRT) or just incident response team (IRT). Each term refers to the same group of people who respond to an incident.


The CIRT needs ready access to tools and resources. Every minute the team spends looking for something it needs to investigate and contain an incident is a minute lost, and it gives the incident time to affect other systems. It’s possible (and recommended) to keep these tools up to date and available for the CIRT when the incident occurs. Some of the tools include the following:

Contact information This includes contact information for all CIRT members, and also key personnel in the organization who may need to be notified of an ongoing incident.

Reporting forms The organization may want to ensure that specific information is recorded about an incident. Pre-created forms help the team remember what needs to be recorded.

War room In the case of a serious incident, the team may need to use a war room for central coordination. This will likely be something like a general-purpose conference room used as a war room only during an incident.

Forensic tools Computer forensic tools help an organization collect and analyze evidence while ensuring that the evidence is not modified and can be used in legal proceedings later if necessary. Chapter 13 covers forensic issues in more depth.

Documentation This includes documentation on systems and the network infrastructure. It should include ready reference to approved changes and the status of these changes. When team members identify something that looks suspicious, such as the way that a system is configured, they need to be able to identify whether it is a valid or malicious change. For example, TFTP is often disabled on systems, but malware may enable TFTP to forward itself to other systems. It’s also possible that TFTP is a legitimate part of a system, and if it’s disabled, it will result in loss of availability of a service. In this example, up-to-date documentation will quickly indicate whether TFTP is a legitimate service or not.

Software and hardware The team may need access to hardware to perform simple tasks such as creating reports, doing research, or performing analysis. They also may need access to software used to rebuild systems.

This is by no means a complete list, and each organization’s list will likely be different. However, the important thing to remember is that the CIRT will need ready access to tools and resources.
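The TFTP example in the documentation item above can be sketched in a few lines of Python. This is only an illustration: the host names, the service sets, and the function review_services are hypothetical and stand in for whatever change-management records an organization actually keeps.

```python
# Hypothetical baseline of approved services per host, as up-to-date
# documentation might record it. Host names and services are illustrative.
APPROVED_SERVICES = {
    "file-server-01": {"ssh", "smb"},
    "pxe-boot-01": {"ssh", "tftp"},  # TFTP is a documented, legitimate service here
}


def review_services(host, running_services):
    """Compare the services observed on a host with its documented baseline."""
    approved = APPROVED_SERVICES.get(host, set())
    running = set(running_services)
    return {
        "unexpected": sorted(running - approved),  # possibly a malicious change
        "missing": sorted(approved - running),     # possibly a loss of availability
    }


# TFTP running where it was never approved is suspicious.
print(review_services("file-server-01", ["ssh", "smb", "tftp"]))
# TFTP disabled where it is a legitimate service points to an availability problem.
print(review_services("pxe-boot-01", ["ssh"]))
```

The same observation (TFTP enabled or disabled) leads to opposite conclusions on the two hosts, which is why the team needs the baseline to be current.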

Risk assessments are another important part of incident response. They start by identifying the important assets for an organization and then identify threats, vulnerabilities, and controls. Periodically updating risk assessments helps an organization keep abreast of current threats and implement safeguards to prevent incidents.

Detection and Analysis

The next stage of the incident response life cycle is detection and analysis. At this stage, the incident is discovered, investigated to determine whether it is an actual incident, and then analyzed for severity.

There are multiple methods of detecting an incident. It could be as simple as AV software alerting that a USB device is infected with malware as soon as it’s inserted into a system. In this case, the AV software usually cleans the device, effectively containing the incident immediately. Users could notice suspicious activity and report it. Systems may randomly lock up or crash. Administrators may see logs growing at an alarming rate, strange files appearing on servers, or simply an increase in network activity (and a slowdown in network response). When any of these events occurs, it needs to be investigated to determine whether an incident has occurred.

Intrusion detection systems (IDSs) provide automated detection of potential incidents and display alerts to IT personnel about them. It’s important to realize that even though an IDS alerts personnel about activity, that activity isn’t necessarily an incident. Consider a SYN flood attack. In day-to-day operation, network issues may prevent the third packet from reaching the server. If this happens once in an hour, it’s highly unlikely that it’s an attack. Then again, if a server is receiving 100 half-open connections a second, that is very likely an attack.


NOTE Chapter 5 covered SYN flood attacks. As a reminder, a SYN flood disrupts the TCP handshake process by withholding the third packet needed to complete the connection. The attacker sends the first packet (the SYN packet), the attacked system responds with a SYN/ACK packet, but the final ACK packet is never sent and the server is left waiting.


Security professionals who manage an IDS have to define the number between 1 in an hour and 100 in a second that constitutes an attack. If they set the threshold too high, they will be attacked without any notification. If they set the threshold too low, they will receive false positives (notifications of possible attacks that are actually not attacks). In an ideal world, there’s an exact number that defines the line between false positives and actual attacks, but in the real world, there is no such number. Given the choice between these two, most security professionals would rather accept some false positives than suffer attacks without being notified.
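To make the threshold idea concrete, the following Python sketch counts half-open connections in a sliding time window and raises an alert once a configured threshold is reached. It is a simplified illustration, not how any particular IDS works. The class name HalfOpenMonitor and the threshold of 50 per second are assumptions chosen for the example; as noted above, no single number is correct for every network.

```python
from collections import deque


class HalfOpenMonitor:
    """Counts half-open TCP connections seen within a sliding time window."""

    def __init__(self, threshold, window_seconds):
        self.threshold = threshold            # alert at this many within the window
        self.window_seconds = window_seconds
        self._timestamps = deque()

    def record(self, timestamp):
        """Record one half-open connection; return True if the threshold is reached."""
        self._timestamps.append(timestamp)
        # Drop observations that have fallen out of the window.
        cutoff = timestamp - self.window_seconds
        while self._timestamps and self._timestamps[0] <= cutoff:
            self._timestamps.popleft()
        return len(self._timestamps) >= self.threshold


# Assumed threshold: 50 half-open connections within one second.
monitor = HalfOpenMonitor(threshold=50, window_seconds=1.0)

# A single stray half-open connection does not trigger an alert.
print(monitor.record(0.0))

# One hour later, 100 half-open connections arrive within a single second.
alerts = [monitor.record(3600 + i / 100) for i in range(100)]
print(any(alerts))
```

Lowering the threshold makes the monitor alert on more benign activity (false positives), while raising it risks missing a real attack, which is the trade-off described above.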

With this in mind, each potential incident must be investigated and analyzed. This analysis determines whether the event is an incident, and if so, attempts to prioritize the incident. Minor incidents don’t affect critical systems or critical infrastructure. Critical incidents affect mission-critical systems and can seriously degrade the organization’s ability to perform its primary mission. Obviously, a critical incident would mandate the use of all available resources to address the incident.
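The two questions asked during analysis (is this an incident, and how severe is it?) can be expressed as a small Python sketch. The Finding fields and the triage function are hypothetical names used only for illustration. Real prioritization schemes usually weigh more factors than the two shown here.

```python
from dataclasses import dataclass


@dataclass
class Finding:
    description: str
    violates_policy: bool          # investigation confirmed a policy or practice violation
    affects_critical_system: bool  # touches mission-critical systems or infrastructure


def triage(finding):
    """Return a coarse priority label for an investigated event."""
    if not finding.violates_policy:
        return "event only"
    if finding.affects_critical_system:
        return "critical incident"
    return "minor incident"


print(triage(Finding("Dropped ACK during handshake", False, False)))
print(triage(Finding("Malware cleaned from USB drive by AV", True, False)))
print(triage(Finding("Worm spreading across production servers", True, True)))
```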

Containment, Eradication, and Recovery

The next stage of the incident life cycle is containment, eradication, and recovery. Although SP 800-61 groups these into a single stage, you can think of them as three separate steps.

Once an incident is verified, it is important to contain the incident as quickly as possible. The goal is to prevent the spread of the incident. For example, if a single system is infected with malware and it’s attempting to discover other systems on the network to infect, you can disconnect the network cable to contain the malware to a single system. In contrast, if you didn’t contain the malware, it’s possible that this single system could infect others, in which case instead of dealing with a minor incident on a single end-user computer, you would suddenly have a major incident where all your systems are infected.


EXAM TIP Once an incident has been identified, it should be contained as soon as possible. This can’t occur before it’s detected, but you can contain an incident after you’ve verified that it is an actual incident and not a false positive.


Once you’ve contained the incident, you may need to move to eradication. For example, some multipartite viruses have multiple components and each individual component must be removed. If you miss any part of it, the virus has the capability to come alive and re-infect all the files you’ve cleaned. Some malware may reconfigure systems and weaken security, enable and create new accounts used for back door access, or even open previously closed ports on firewalls.

The last step in this stage is recovery, and it’s dependent on the damage caused by the incident. For minor incidents that haven’t caused any damage, recovery may not require any steps at all. However, a major incident may require the recovery of an entire server or even an entire location. For example, if a fire destroyed a building, recovery entails activating an alternate location, moving critical systems and data to the alternate location, and bringing everything back online.


NOTE Chapter 12 covers disaster recovery operations within the context of a disaster recovery plan. It also covers hot sites, cold sites, and warm sites that provide varying levels of readiness for disaster recovery.


Post-incident Activity

The last step is post-incident activity. In this step, you examine what incident occurred and how the organization responded. The goal is to determine whether the existing plans and procedures were able to address the incident or whether there were areas that could be improved. After many incidents, organizations find that they can refine their existing processes to improve their response to subsequent incidents.

One of the ways to perform a post-incident activity is by holding a meeting with people involved in the incident and posing questions to generate discussions. Obviously, you wouldn’t hold a meeting like this for a minor incident, but if an incident negatively affected the mission of the organization, a post-incident activity meeting is warranted. The information gathered during this meeting is compiled into a post-incident report that can be used in future risk management activities such as risk assessments.

The following are some of the questions you can include in this meeting:

• Are there any lessons to be learned from this incident? What are they?

• Did we respond to the incident as quickly as possible? If not, what can we do to improve the response time for a subsequent incident?

• Was there anything that prevented the recovery from occurring in a timely manner?

• Was the documentation up to date? If not, what was not up to date and how should it be updated?

• Did the CIRT have all the tools it needed? If not, what tools need to be available to the CIRT when another incident occurs?

• Are there any preventative measures we can implement either to prevent a similar incident or at least reduce the severity of the incident?

One of the challenges of a post-incident activity meeting is avoiding finger pointing and “blame storming,” in which individuals focus on ensuring that they aren’t blamed for this event instead of on improving the process for a potential event in the future. If senior management stresses that the goal is to improve the process and not to place blame, participants may be more willing to provide constructive input. Of course, if management uses the post-incident report to hold individuals or departments accountable, it’s highly unlikely future post-incident activity meetings will have any success.

Chapter Review

Risk is the probability or likelihood that a threat will exploit a vulnerability and result in a loss. It is sometimes represented by the formula Risk = Threat * Vulnerability. Risk management includes several elements used to reduce risk to a manageable level. After risks have been mitigated, the remaining risk is known as residual risk. Senior management is responsible for any losses resulting from residual risk.

NIST publishes several documents related to IT security, and two important documents related to risk are SP 800-30 (Risk Management Guide for Information Technology Systems) and SP 800-61 Rev 1 (Computer Security Incident Handling Guide).

One of the first steps in risk management is identifying assets. Once an organization identifies the assets that it considers valuable, it then takes steps to protect them. However, if the valuable assets haven’t been identified, it’s possible that they won’t be protected and that resources will be expended to protect less valuable assets.

A risk assessment is a point-in-time evaluation. It examines a system to identify threats, vulnerabilities, and current controls. It then attempts to evaluate the likelihood of a threat exploiting a vulnerability and the impact, or magnitude of harm, from this event to determine the level of risk. It uses the risk determination to determine what risks should be further managed and identifies, evaluates, and recommends controls. The risk assessment is documented and presented to management. Management decides what controls to implement and what remaining risk to accept as residual risk.

Risk assessments commonly use either a quantitative analysis or a qualitative analysis. A quantitative analysis uses numerical figures to calculate risk. The single loss expectancy (SLE) is multiplied by the annual rate of occurrence (ARO) to determine the annual loss expectancy (ALE). If the cost of the control is less than the ALE, it is justified. If it is significantly higher than the ALE, it is not justified. If the cost of the control and the ALE are relatively close, they must be evaluated to determine a long-term return on investment (ROI).
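The quantitative calculation above can be worked through in a short Python sketch. The function names and the close_margin parameter are assumptions made for illustration. In particular, the text does not define how close "relatively close" is, so the 25 percent margin below is an arbitrary placeholder.

```python
def annual_loss_expectancy(sle, aro):
    """ALE = SLE x ARO."""
    return sle * aro


def control_verdict(control_cost, sle, aro, close_margin=0.25):
    """Compare the cost of a control with the ALE it is meant to reduce.

    close_margin is an assumed definition of "relatively close": costs within
    this fraction of the ALE are flagged for a long-term ROI evaluation.
    """
    ale = annual_loss_expectancy(sle, aro)
    if abs(control_cost - ale) <= close_margin * ale:
        return "evaluate long-term ROI"
    return "justified" if control_cost < ale else "not justified"


# SLE of $2,000 and ARO of 20 give an ALE of $40,000.
print(annual_loss_expectancy(2_000, 20))
# A $10,000 control is well below that ALE.
print(control_verdict(10_000, 2_000, 20))
# A $100,000 control is well above it.
print(control_verdict(100_000, 2_000, 20))
```

With an SLE of $5,000 and an ARO of 5, the same function gives an ALE of $25,000.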

Many organizations designate a computer security incident response team (CSIRT, CIRT, or IRT) to respond to incidents. It’s important to know the different steps in incident response. SP 800-61 Rev 1 defines these steps as (1) preparation; (2) detection and analysis; (3) containment, eradication, and recovery; and (4) post-incident activity.

Questions

1. Which of the following choices is the best formula of risk?

A. Risk = Vulnerability * Threat

B. Risk = Vulnerability * Weaknesses

C. Risk = Attacks * Threats

D. Risk = Mitigation * Controls

2. You are involved in reducing risk within your organization. Of the following activities, which one is the best choice to describe what you are doing?

A. Reducing threats

B. Increasing vulnerabilities

C. Increasing impact

D. Mitigating risk

3. Of the following choices, what does not represent a natural threat?

A. A hurricane

B. A disgruntled employee

C. A lightning storm

D. A tsunami

4. What is a vulnerability?

A. A threat source

B. A risk

C. A weakness

D. A control

5. What’s a primary method used to reduce risk?

A. Reducing threats

B. Reducing vulnerabilities

C. Increasing threats

D. Increasing vulnerabilities

6. You decide to manage risk by purchasing insurance to cover any losses. Which one of the following risk management techniques are you using?

A. Accept

B. Avoid

C. Mitigate

D. Transfer

7. An organization had a business location in Miami, Florida. Due to the risks associated with hurricanes, the organization decided to move the location to Atlanta, Georgia, away from any ocean. What risk management strategy are they using?

A. Accept

B. Avoid

C. Mitigate

D. Transfer

8. An organization has implemented several controls to mitigate risks. However, some risk remains. What is the remaining risk called?

A. Vulnerable risk

B. Mitigated risk

C. Alternate risk

D. Residual risk

9. A risk assessment recommended several controls to mitigate risks, but only some of the controls were accepted and implemented. Who is responsible for any losses that occur from the remaining risk?

A. The person completing the risk assessment

B. Senior management

C. IT personnel managing the systems

D. Security personnel

10. Of the following choices, which one most accurately reflects differences in risk management and risk assessments?

A. A risk assessment is a point-in-time event, while risk management is an ongoing process.

B. Risk management is a point-in-time event, while a risk assessment is an ongoing process.

C. Risk assessments are broad in scope, while risk management is focused on a specific system.

D. Risk management is one part of an overall risk assessment strategy for an organization.

11. Of the following choices, what is an important first step in a risk management plan?

A. Implementing controls

B. Identifying vulnerabilities

C. Identifying assets

D. Identifying threats

12. You are completing a risk assessment and using historical data. You’ve identified that a system has failed three times in the past year, and each of these outages resulted in approximately $10,000 in losses. What type of analysis does this allow you to perform?

A. Qualitative

B. Quantitative

C. Informative

D. Subjective

13. You are completing a risk assessment and using historical data. You’ve identified that a system has failed five times in each of the past two years, and each outage resulted in losses of about $5,000. What is the ARO?

A. Five

B. $5,000

C. $25,000

D. Impossible to determine with the information provided

14. You have completed a risk assessment and determined that you can purchase a control to mitigate a risk for only $10,000. The SLE is $2,000 and the ARO is 20. Is this cost justified?

A. Yes. The control is less than the ALE.

B. No. The control exceeds the ALE.

C. Yes. The control exceeds the ARO.

D. No. The control is less than the ARO.

15. Of the following choices, what best represents steps to take in response to an incident?

A. Preparation, containment, detection, analysis, eradication, and recovery

B. Preparation, detection, analysis, containment, eradication, and recovery

C. Containment, preparation, detection, analysis, eradication, and recovery

D. Containment, analysis, detection, eradication, and recovery

Answers

1. A. The formula for risk is Risk = Vulnerability * Threat, or said another way, risk is the likelihood that a threat can exploit a vulnerability, resulting in a loss. Vulnerabilities are weaknesses and by themselves are not risks. A threat is any activity that can be a possible danger, such as a potential attack. Risk is reduced or mitigated by adding controls that reduce vulnerabilities.

2. D. Risk mitigation is the process of reducing risk. You can rarely reduce threats, but you can often reduce (not increase) vulnerabilities or reduce (not increase) the impact of a risk.

3. B. While a disgruntled employee can certainly represent a significant insider threat to an organization, employees create manmade (not natural) threats. The other threats are part of natural weather or other Mother Nature events.

4. C. A vulnerability is a weakness or a flaw in a system. A threat source is something that can exploit a vulnerability, resulting in a loss. Risk is defined as a combination of vulnerability and threats, but simply having a vulnerability without any potential threat doesn’t represent a risk. Controls, or safeguards, attempt to mitigate risk by reducing vulnerabilities.

5. B. A primary method of risk mitigation is reducing vulnerabilities. Threats often can’t be reduced, and adding additional threats won’t reduce risk. You reduce vulnerabilities by implementing controls.

6. D. Insurance is one of the ways that you can manage risk by transferring the risk to a third party. Risk acceptance doesn’t take any further action to mitigate the risk. In risk avoidance, you avoid the activity that results in the risk. It’s most common to try to reduce the risk using risk mitigation.

7. B. By moving the location to a city that can’t be hit by a hurricane, the company is using risk avoidance. Risk acceptance doesn’t take any action to mitigate the risk. In risk mitigation, you attempt to reduce the risk, perhaps by ensuring that the building is built with hurricane-resistant materials. The company can transfer the risk by purchasing hurricane and flood insurance.

8. D. Residual risk is any risk that remains after controls have been implemented to mitigate the risk. It’s often not cost-effective to implement controls to eliminate all risks, so senior management must make decisions on what risk to mitigate and what risk to accept as residual risk. Alternate risk is not a valid term associated with risk management.

9. B. Senior management is responsible for making decisions on what risk to mitigate. The remaining risk is residual risk, and senior management is responsible for any losses from this residual risk.

10. A. Risk assessments are a point-in-time event and risk management is an ongoing process. Risk assessment is one element of a risk management strategy, and risk assessments are generally focused on specific systems with a limited scope, while risk management is much broader.

11. C. You must identify assets first. You can then identify threats against these assets and vulnerabilities in these assets. You can’t recommend or implement controls until you know what you want to control.

12. B. A quantitative analysis uses numerical figures to identify the actual costs associated with a risk. A qualitative analysis uses subjective terms such as low, medium, and high to analyze a risk. There is no such thing as an informative analysis.

13. A. The annual rate of occurrence (ARO) is five because the failure happened five times in each of the past two years. The single loss expectancy (SLE) is $5,000 and the annual loss expectancy (ALE) is $25,000.

14. A. Because the cost of the control is less than the annual loss expectancy (ALE), the cost is justified. The cost of the control is $10,000 and the ALE is $40,000. The annual rate of occurrence (ARO) is how many times the loss occurs in a year (20 in the example), but it is only useful when you multiply it by the single loss expectancy (SLE) to identify the ALE.

15. B. The steps recommended in NIST SP 800-61 are preparation, detection, analysis, containment, eradication, and recovery. Containment is important once an incident has been detected and analyzed, but can’t be done beforehand.
