Summarizing the Importance of Policies, Processes, and Procedures for Incident Response
This chapter covers the following topics related to Objective 4.2 (Summarize the importance of policies, processes, and procedures for incident response) of the CompTIA Security+ SY0-601 certification exam:
Incident response plans
Incident response process
Preparation
Identification
Containment
Eradication
Recovery
Lessons learned
Exercises
Tabletop
Walkthroughs
Simulations
Attack frameworks
MITRE ATT&CK
The Diamond Model of Intrusion Analysis
Cyber Kill Chain
Stakeholder management
Communication plan
Disaster recovery plan
Business continuity plan
Continuity of operations planning (COOP)
Incident response team
Retention policies
A recent report found that nearly three-quarters of organizations don’t have a consistent, enterprise-wide cybersecurity incident response (IR) plan. For organizations with IR teams and regular testing, the average data breach costs $2 million less than for those with no IR team or plan in place. Today’s enterprises are global, as are their footprints. Moving applications and resources to the cloud and supporting remote workforces have increased the potential for cyber threats. The first and most important step in the incident response lifecycle is preparation. Preparing ahead of an incident is what allows you to respond more quickly and effectively in the midst of one.
The “Do I Know This Already?” quiz enables you to assess whether you should read this entire chapter thoroughly or jump to the “Chapter Review Activities” section. If you are in doubt about your answers to these questions or your own assessment of your knowledge of the topics, read the entire chapter. Table 27-1 lists the major headings in this chapter and their corresponding “Do I Know This Already?” quiz questions. You can find the answers in Appendix A, “Answers to the ‘Do I Know This Already?’ Quizzes and Review Questions.”
Table 27-1 “Do I Know This Already?” Section-to-Question Mapping
Foundation Topics Section | Questions |
---|---|
Incident Response Plans | 1–2 |
Incident Response Process | 3–4 |
Exercises | 5 |
Attack Frameworks | 6–7 |
Stakeholder Management | 8 |
Communication Plan | 9 |
Disaster Recovery Plan | 10–11 |
Business Continuity Plan | 12 |
Continuity of Operations Planning (COOP) | 13 |
Incident Response Team | 14 |
Retention Policies | 15 |
Caution
The goal of self-assessment is to gauge your mastery of the topics in this chapter. If you do not know the answer to a question or are only partially sure of the answer, you should mark that question as wrong for purposes of the self-assessment. Giving yourself credit for an answer you correctly guess skews your self-assessment results and might provide you with a false sense of security.
1. Which high-level document is a step-by-step procedure that should be created as part of an incident response plan that can target specific incident handling like malware and ransomware?
Playbooks
Play stations
Lessons learned after action report
Security incident field report
2. Which legal portion of an incident response plan requires notification or disclosure within 72 hours of discovery of a data incident?
DR Disclosure Federal Law
NIST Disclosure Law
New Jersey Privacy Law
GDPR
3. Which incident response plan item will provide an understanding of the severity of an incident so that it can be prioritized quickly and correctly?
Disaster recovery report
Incident response report
Triage matrix
Threat matrix
4. Which phase of the incident response process should be performed within two weeks of the end of an incident?
Identification
Preparation
Lessons learned
Remediation
5. Which exercise simulates a real-life scenario of an incident response plan and is used to test the plan and highlight areas where your team excels and areas that need to be addressed?
Tabletop
Containment exercises
Recovery process
Cyber kill chain
6. The Diamond Model places the basic components of malicious activity at one of the four points on a diamond shape. What are the four points? (Choose two.)
Malware and infection vectors
Personas and biometrics
Adversary and infrastructure
Capability and victim
7. The ATT&CK Framework has 11 tactics and hundreds of techniques. Which tactic describes the way an adversary implements a technique?
Collection
Procedures
Privilege escalation
Impact
8. Which of the following is one of the five key stakeholders of the incident response team?
Security Operations
Security Guards
Public Library
Legal
9. In a communication plan, escalating communication information on a regular schedule or timeline is important. What is the appropriate frequency of this communication?
Once an hour
Once every six hours
As key information is available
As every item is uncovered
10. Which of the following is a formal document that contains details on how to respond to cyber attacks and unplanned incidents?
Incident response model
Disaster continuity plan
Disaster recovery plan
Containment process
11. Not implementing a disaster recovery plan properly can lead to which of the following?
Satisfied customers
Brand awareness
Lost revenue
Faster recovery
12. What is one thing that a BCP contains that a DR plan does not?
A standby data center
Continuity of a DRP in conjunction with a BCP
A continuity plan for the entire organization
A disaster recovery model
13. Which of the following ensures the restoration of organizational functions in the shortest possible time?
COOP
MITRE ATT&CK
Diamond Model
Cyber kill chain
14. What are some of the incidents that an incident response team might be prepared for and respond to? (Select all that apply.)
Attackers gaining access to the web server
Hackers obtaining passwords from executives
A nasty computer virus that the antivirus contained
A power outage in the data center
15. NIST SP 800-53 requires that all federal agencies retain data for how many years?
Ten years on magnetic media or 20 years on paper
Three years on magnetic media
Seven years on magnetic media
Seven years on magnetic media and 10 years on paper
In the simplest of terms, a cybersecurity incident response plan (or IR plan) is a set of instructions designed to help companies prepare for, detect, respond to, and recover from network security incidents. An incident response plan ensures that in the event of a security breach, the right personnel and procedures are in place to effectively deal with a threat. Having an incident response plan in place ensures that a structured investigation can take place to provide a targeted response to contain and remediate the threat.
An organization’s failure to have or implement an incident response plan can have serious legal repercussions. To effectively deal with a cybersecurity incident, your company may need a team that specializes in incident response. Some organizations call this team the computer security incident response team (CSIRT); there are other permutations of that acronym like security incident response team (SIRT) or computer incident response team (CIRT). The mission of this team is the same no matter what you call it—to enact the company’s established incident response plan when a cybersecurity incident occurs.
If you work in data security, you deal with security incidents on a day-to-day basis. Occasionally, a minor security issue turns out to be a real-life panic situation. When an incident occurs, will everyone on the team know what to do? Will CSIRT members know their role and responsibilities and follow the approved plan?
Simply having an IR plan is not enough; the CSIRT must have the skills and experience to deal with a potentially high-stress situation. The team needs digital forensics experts, malware analysts, incident managers, and security operations center (SOC) analysts who are all heavily involved and actively dealing with the situation. This involves making key decisions, conducting an in-depth investigation, providing feedback to key stakeholders, and ultimately giving assurances to senior management that the situation is under control.
This activity often takes place in a time crunch. Data breach notification laws are becoming more common: the General Data Protection Regulation (GDPR) in the European Union, for instance, requires that companies report data security incidents within 72 hours of discovery.
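To make the time pressure concrete, the following minimal Python sketch computes a reporting deadline from the moment of discovery. It assumes only the 72-hour window described above; the function names and the sample timestamp are hypothetical, and the sketch is not a substitute for legal guidance on when the clock actually starts.

```python
from datetime import datetime, timedelta, timezone

# Assumption: the 72-hour reporting window described in the text.
NOTIFICATION_WINDOW = timedelta(hours=72)


def notification_deadline(discovered_at: datetime) -> datetime:
    """Return the latest time by which the incident should be reported."""
    return discovered_at + NOTIFICATION_WINDOW


def hours_remaining(discovered_at: datetime, now: datetime) -> float:
    """Return the hours left before the deadline (negative once it has passed)."""
    remaining = notification_deadline(discovered_at) - now
    return remaining.total_seconds() / 3600


if __name__ == "__main__":
    # Hypothetical discovery time, in UTC to avoid time zone ambiguity.
    discovered = datetime(2024, 3, 1, 9, 0, tzinfo=timezone.utc)
    print(notification_deadline(discovered).isoformat())
```

Working in a single time zone such as UTC keeps the deadline unambiguous when responders are spread across regions.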
Having an incident response plan is imperative. The first step is identifying and having the right people with the right skill sets and experience available and ready to respond. You should regularly test and update your incident response plan. Everyone who is part of the plan should understand their role and the role of others to help reduce confusion during a real event.
The incident response process is made up of four key elements that can be developed as a company’s security posture matures. There are important considerations to be made when building an incident response plan. First and foremost, backing from senior management is paramount. Occasionally, people use the terms incident response process and IR plan interchangeably. It’s important to note that the process is a roadmap for developing a specific plan for each organization; each plan is as different as the organization it serves.
Building an incident response plan should not be a box-checking exercise or something handed off to inexperienced employees. If senior management does not support this process, there is a risk of its becoming filed away, incomplete and useless when needed. Senior leadership should be outlining critical processes, systems, and resources important to business continuity. Part of plan development includes defining the key stakeholders and obtaining contact details for key individuals and teams inside and outside of business hours; this information needs to be added to the plan.
Figure 27-1 shows the NIST incident response lifecycle. The incident response process is a business process that enables you to remain in business. The list that follows describes the four phases in more detail.
Note
For more details on NIST incident handling, see https://nvlpubs.nist.gov/nistpubs/SpecialPublications/NIST.SP.800-61r2.pdf.
Phase 1. Preparation: The quality of the response to an incident largely depends on incident response preparedness. In this phase, all components needed to effectively respond to an incident are identified, acquired, or created.
Phase 2. Detection and analysis: This phase primarily focuses on detection and discovery of indicators of compromise (IOCs). Here, having an incident reporting policy and procedure in place is critical to training.
Phase 3. Containment, eradication, and recovery: After stopping the problem from getting worse or spreading, you limit the damage and then regain control of your network and systems. Finally, recovery starts with restoration of systems to normal operations.
Phase 4. Post-incident activity: This phase centers on lessons learned to improve the incident response capability and prevent the incident from happening again.
A playbook provides manual orchestration of incident response. For example, specific incidents and threats have their own playbooks. As a result, the response that an organization takes is formalized in a step-by-step procedure.
By contrast, IT operations or security operations centers (SOCs) use runbooks as a reference for routine procedures that administrators perform both as standard mode of operations and during emergencies such as an IR incident.
A security operations center is similar to a network operations center in that it is staffed 24/7 by expert staff. In an SOC, security experts monitor, alert, respond to, and triage incidents. They also act as the front line for all security-related incidents.
The incident response plan should have an owner, and that owner is responsible for sending out communications, assigning tasks, and establishing the appropriate actions that should be taken. Also, your organization should consider who needs to be included in any incident communication and how much detail is required depending on the audience. Tasks assigned to security teams need to be precise and technical, whereas updates to the company board (executives) need to be clear and free of any technical terms. You should develop playbooks that provide guidance to the SOC when triaging an incident; they should give clear instructions on how to prioritize an incident and when and how incidents should be escalated. The playbooks should be high level and focused on specific areas such as malware, insider threats, unauthorized access, ransomware, and phishing. The playbooks and procedures should be tested with the people and teams who will be using them. Tabletop exercises are an excellent way to solidify the knowledge and see whether any improvements can be made. Playbooks and tabletop exercises are described in more detail shortly.
Tip
A risk matrix displays the probability or likelihood versus the consequences of risk. Table 27-2 provides a sample matrix.
Table 27-2 Risk Matrix
Likelihood (rows) / Consequences (columns) | Not Significant | Minor | Moderate | Major | Severe |
---|---|---|---|---|---|
Very Likely | Medium | High | High | Very High | Very High |
Likely | Medium | High | High | Very High | Very High |
Possible | Low | Medium | High | High | Very High |
Unlikely | Low | Low | Medium | Medium | High |
Rare | Low | Low | Low | Low | Medium |
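The matrix in Table 27-2 can also be expressed as a simple lookup, which is one way a triage tool might encode it. The sketch below is illustrative only: the names `RISK_MATRIX` and `risk_rating` are assumptions, and the ratings are copied from the table.

```python
# Column order for each row of the matrix (consequences, least to most severe).
CONSEQUENCES = ["Not Significant", "Minor", "Moderate", "Major", "Severe"]

# Ratings copied from Table 27-2; each list follows the CONSEQUENCES order.
RISK_MATRIX = {
    "Very Likely": ["Medium", "High", "High", "Very High", "Very High"],
    "Likely": ["Medium", "High", "High", "Very High", "Very High"],
    "Possible": ["Low", "Medium", "High", "High", "Very High"],
    "Unlikely": ["Low", "Low", "Medium", "Medium", "High"],
    "Rare": ["Low", "Low", "Low", "Low", "Medium"],
}


def risk_rating(likelihood: str, consequence: str) -> str:
    """Look up the rating for a likelihood/consequence pair."""
    if likelihood not in RISK_MATRIX:
        raise ValueError(f"unknown likelihood: {likelihood}")
    if consequence not in CONSEQUENCES:
        raise ValueError(f"unknown consequence: {consequence}")
    return RISK_MATRIX[likelihood][CONSEQUENCES.index(consequence)]


if __name__ == "__main__":
    print(risk_rating("Possible", "Major"))  # prints "High"
```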
According to an old saying, you can only successfully defend what you see. In this case, you can only remove a security threat when you know the size and scope of the incident. Step one begins with identifying “patient zero,” the initially compromised device. The goal here is to understand the root cause of the compromise; however, you should not just focus on the one device. Could the threat have spread and moved laterally, or is there another potential initially compromised device?
Actual identification of an incident comes from gathering useful indicators of compromise. Investigators should look to identify any unique IOCs that can be used to search across your network for further evidence of compromise. If the incident relates to a malware infection, then ask the following questions: What network traffic and connections does the malware generate? Does the malware connect to any specific IP address or domain? What files were downloaded? What running processes are created? What files are created in memory or on disk? Have any unique registry keys been created? This data is used to search for further evidence of compromise and identify any other infected machines in your network.
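As a rough illustration of searching for further evidence of compromise, the following Python sketch scans log lines for known IOCs. Everything in it is hypothetical: the IOC values use reserved documentation addresses and domains, the log lines are invented, and plain substring matching is a simplification that a real search tool would refine.

```python
# Hypothetical IOCs gathered during identification (documentation-range values).
IOCS = {
    "ip": {"203.0.113.45"},
    "domain": {"malicious.example"},
}


def find_ioc_hits(log_lines, iocs):
    """Return (line_number, ioc_type, ioc_value) for each IOC found in the logs.

    Uses simple substring matching, so it is a sketch rather than a parser.
    """
    hits = []
    for number, line in enumerate(log_lines, start=1):
        for ioc_type, values in iocs.items():
            for value in sorted(values):
                if value in line:
                    hits.append((number, ioc_type, value))
    return hits


if __name__ == "__main__":
    # Invented log lines for illustration only.
    sample_logs = [
        "host-07 GET /index.html from 198.51.100.7",
        "host-12 DNS query for malicious.example",
        "host-12 outbound connection to 203.0.113.45:443",
    ]
    for hit in find_ioc_hits(sample_logs, IOCS):
        print(hit)
```

Each hit points at a line worth investigating, which in turn helps identify other machines that may be infected.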
Once the scope of an incident has been successfully identified, the containment process can then begin. This is where the compromised devices within the network are isolated from the rest of the network to stop the spread. Short-term containment may be used to isolate a device that is being targeted by attack traffic. Long-term containment may be necessary when a deep-dive analysis is required, which can be time consuming. This process may involve taking a forensic image of the device and conducting detailed forensic analysis; the analysis may generate further IOCs and identification of the source, and may need to be revisited.
Many organizations don’t understand the risks associated with improper eradication and how to actually fix a malware infection or breached system. Once the incident is successfully contained, the eradication of the threat can begin. This process varies depending on what devices have been compromised and what caused the compromise. If possible, you should perform a complete wipe/reimage of the affected system. This step, of course, requires you to have good backups in place and the ability to establish the initial date/time of infection, rolling back to just before that. Besides reimaging systems, you should make sure you use the latest patches on all devices, disarm malware, disable compromised accounts, and change passwords. These are just a few examples of what may be required in the eradication phase of an incident.
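Rolling back to just before the infection amounts to choosing the most recent backup taken before the established infection time. The Python sketch below shows that selection under stated assumptions: the function name and the sample dates are hypothetical, and it presumes the infection time has already been determined reliably.

```python
from datetime import datetime


def latest_clean_backup(backup_times, infected_at):
    """Return the most recent backup taken strictly before the infection.

    Returns None when every available backup postdates the infection.
    """
    candidates = [taken for taken in backup_times if taken < infected_at]
    return max(candidates, default=None)


if __name__ == "__main__":
    # Hypothetical weekly backups and an assumed infection time.
    backups = [datetime(2024, 5, 1), datetime(2024, 5, 8), datetime(2024, 5, 15)]
    infected = datetime(2024, 5, 10, 14, 30)
    print(latest_clean_backup(backups, infected))  # prints 2024-05-08 00:00:00
```

If the function returns `None`, no backup predates the compromise, and the affected system would need to be rebuilt rather than restored.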
Regardless of how you choose to eradicate an infection, you need a plan for immediately increasing monitoring of any affected systems for some period of time after the eradication process, usually 30 days or longer. This plan is important to make sure the steps you took to fix the issue were effective and that no malware, rootkits, backdoors, or additional compromised accounts linger. This should be a daily task in which you review event logs from the affected hosts and Active Directory logs for account usage.
The IR team brings affected production systems back online carefully to ensure another incident doesn’t take place. Important decisions at the recovery stage are from which time and date to restore operations, how to test and verify that affected systems are back to normal, and how long to monitor the systems to ensure activity is back to normal. The goal of the recovery phase of an incident is to restore normal service to the business. If clean backups are available, then they can be used to restore service. Alternatively, any compromised device will need rebuilding to ensure a clean recovery. Additional monitoring of affected devices may need to be implemented.
The lessons learned phase should be performed no later than two weeks from the end of the incident. A Post Incident Review (PIR) meeting is held while information is still fresh in the team’s minds. The main purpose of this phase is to update documentation that could not be prepared during the response process and to investigate the incident further to identify its full scope, how it was contained and eradicated, and what was done to recover the attacked systems. After the threat has been fully remediated, the next step involves answering this question: How do we stop this from happening again? The PIR is the forum used to discuss what went well during the incident and what processes and procedures can be improved. At this stage, the incident response plan is refined based on the outcome of the PIR, and procedures and playbooks are amended to reflect any agreed changes. Figure 27-2 illustrates the stages of the incident response process.
IR exercises are an excellent way to solidify your organization’s knowledge and see if any improvements can be made. From these exercises, you want to ensure your organization’s increased awareness and understanding of threats, be able to evaluate your overall incident preparedness, and identify deficiencies in your IR plan, including technical, planning, procedural, and resource gaps.
A tabletop exercise begins with a security incident preparedness activity, a plan to test your IR plan. It takes participants through the process of dealing with a simulated incident scenario and provides a hands-on training exercise for participants who can then highlight flaws in incident response planning. Usually, a trained expert facilitates the discussion through multiple scenarios to determine the team’s readiness or whether there are potential gaps. The output of this exercise is used to drive enhancements to the organization’s plan and approach to identifying, analyzing, and resolving incidents and how they could be prevented in the future.
A tabletop exercise is used to validate and improve an organization’s IR plan. Real-life scenarios are used to put the response plan to the test, highlighting areas where your team excels and areas to be addressed. The tabletop exercise also ensures that everyone on your team knows their roles and responsibilities in the event of an attack. The tabletop exercise aligns everyone’s understanding of the process, helps assign the roles of each member, and provides participants with knowledge through hands-on experience. The exercise begins with the incident response plan (IRP) in a classroom-type setting and gauges team performance against the following questions: What happens when you encounter a breach? Who does what, when, how, and why? What roles will HR, legal, IT, corporate communication, and company officers play? Who is assigned to spearhead the effort, and what specific authority will they have? What resources are available to them, and when will those resources be available?
The facilitator develops the input required to meet the predetermined goal of the exercise, based on the preset specifications about the environment and goals set by leadership. Walkthroughs take place prior to tabletop exercises. Some of these steps depend on the industry, regulations, and their current security posture. The facilitator develops the test input to meet a specific level of management, as well as the appropriate technical personnel and all others who should make the appropriate decisions. In this discussion-based exercise, you walk through the steps of the plan. Typically, a project kickoff call/meeting occurs between the facilitator and the team to assign roles and responsibilities, update the IR plan, and identify any other participants needed. A typical walkthrough exercise flow consists of three essential elements: inputs, process, and outputs.
Security incident response simulations (SIRS) are internal events that provide a structured exercise to practice your team’s IR procedures and plan in a simulated realistic scenario, where components such as deadlines and external injects increase the realism of the event. Injects are scenario-based actions that help add a realistic element to the event. SIRS events are about preparing team members with realistic simulations and helping them improve response capabilities. The value in these exercises is what is obtained, retained, and fed back into your plan, starting with the following:
Validating the team and the plan’s readiness
Identifying and documenting deficiencies to facilitate constructive feedback
Helping develop team member confidence
Providing evidence of compliance, depending on your industry
The value and benefit derived from team members participating in SIRS include increases in the organization’s effectiveness during stressful live events.
There are multiple frameworks for incident response, including one from NIST and one from SANS. These are the two dominant institutes whose incident response steps have become the industry standard. Similarly, a number of attack frameworks are available for use in attack scenarios. The benefit of a framework is that it puts everyone on the same page, with a common language, and a common place to level set and start so that everyone is speaking the same terms. Imagine if three people in the same organization used three different frameworks to communicate about an attack or incident. Besides this scenario being counterproductive, the company would waste valuable time trying to identify and recover from an incident.
MITRE ATT&CK is a globally accessible knowledge base of adversary tactics, techniques, and procedures (TTPs) based on real-world observations of cybersecurity threats. The TTPs are displayed in multiple matrices that are arranged by attack stage, from initial system access to data theft or machine control.
The aim of the framework is to improve post-compromise detection of adversaries in enterprises by illustrating the actions attackers may have taken. How did attackers get in? How are they moving around? The knowledge base is designed to help answer these questions while contributing to the awareness of an organization’s security posture at the perimeter and beyond. Organizations can use the framework to identify holes in defenses and prioritize remediation of them based on risk.
Threat hunters leverage the ATT&CK framework to look for specific tactics, techniques, and procedures that adversaries may use in conjunction with other methods. The framework can be useful for gauging an environment’s level of visibility against targeted attacks with the existing tools deployed across an organization’s endpoints and perimeter.
The enterprise ATT&CK framework consists of 11 tactics. You might consider tactics the “why” part of the ATT&CK equation. What objective did attackers want to achieve with the compromise?
Initial access
Execution
Persistence
Privilege escalation
Defense evasion
Credential access
Discovery
Lateral movement
Collection
Exfiltration
Impact
Each tactic contains an array of techniques that have been observed being used by threat actor groups in compromises or by malware. You can think of techniques as the “how” part of the ATT&CK framework. How are attackers escalating privileges or exfiltrating data?
Although there are only 11 tactics in the enterprise ATT&CK framework, there are far too many techniques to list individually here; as of this writing, there are 291. They’re perhaps best researched and visualized via MITRE’s ATT&CK Navigator, an open-source web application that allows basic navigation and annotation. Techniques are referenced in ATT&CK as Txxxx; for example, spear phishing is T1192 and remote access tools are T1219. Each technique contains contextual information, such as the permissions required, the platforms on which the technique is commonly seen, and how to detect the commands and processes it is used in.
Procedures describe the way adversaries implement a technique. A procedure concerns a particular instance of a technique in use. It can be useful for understanding exactly how a technique is used, for replicating an incident through adversary emulation, and for working out how to detect that instance in use.
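One way to keep the three layers apart is to record each observation with its tactic (the “why”), its technique (the “how”), and its procedure (the specific implementation seen). The Python sketch below is a hypothetical data model, not part of ATT&CK itself: the class and function names are assumptions, the ID check only mirrors the four-digit form of the examples in the text (T1192, T1219), and the procedure text is invented.

```python
import re
from dataclasses import dataclass

# Assumption: IDs follow the "T" plus four digits form of the text's examples.
TECHNIQUE_ID = re.compile(r"^T\d{4}$")


@dataclass(frozen=True)
class Observation:
    tactic: str        # the "why": the adversary's objective
    technique_id: str  # the "how": for example, T1192
    technique: str     # human-readable technique name
    procedure: str     # the specific implementation seen in this incident

    def __post_init__(self):
        # Reject IDs that do not match the assumed format.
        if not TECHNIQUE_ID.match(self.technique_id):
            raise ValueError(f"not a technique ID: {self.technique_id}")


def tactics_covered(observations):
    """Return the sorted set of tactics seen across all observations."""
    return sorted({obs.tactic for obs in observations})


if __name__ == "__main__":
    # Invented procedure text for illustration only.
    seen = [
        Observation(
            tactic="Initial access",
            technique_id="T1192",
            technique="Spear phishing",
            procedure="Email with a link sent to the finance team",
        )
    ]
    print(tactics_covered(seen))  # prints ['Initial access']
```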
Although the framework has been around for years, it is being adopted by more organizations, the government, and end users to share threat intelligence. While there are other ways to share threat intelligence, the ATT&CK framework provides a common language that’s standardized and is globally accessible.
Note
To learn more about the ATT&CK framework, visit the MITRE site at https://attack.mitre.org/.
In the cybersecurity and threat intelligence industries, several approaches are used to analyze and track the characteristics of cyber intrusions by advanced threat actors. The Diamond Model of Intrusion Analysis emphasizes the relationships and characteristics of four basic components: the adversary, capabilities, infrastructure, and victims (see Figure 27-3). The main axiom of this model states, “For every intrusion event, there exists an adversary taking a step toward an intended goal by using a capability over infrastructure against a victim to produce a result.” This means that an intrusion event is defined as how the attacker demonstrates and uses certain capabilities and techniques over infrastructure against a target.
The Diamond Model identifies adversaries with developing capabilities and techniques that are unique to that group. The method’s context directly translates to the capability edge of that model. It can become obvious that an adversary uses distinct malware and attack vectors as part of its capabilities and TTPs.
Figure 27-3 shows specifically what an adversary might look like or what information may be gleaned from the intrusion, whereas infrastructure shows what was used, capability shows the method, and victim shows the activity that allowed the attacker in. By identifying events and linking them into activity threads, an analyst gains information regarding what occurred during an attack. By looking at the gaps in their knowledge (such as missing features), the analyst identifies where further information is needed.
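The idea of spotting gaps in an analyst’s knowledge can be sketched by modeling an event with the four vertices of the diamond and listing whichever ones are still unknown. The Python below is a minimal illustration; the class name `DiamondEvent`, its method, and the sample values are hypothetical and are not drawn from the model’s own specification.

```python
from dataclasses import dataclass, fields
from typing import List, Optional


@dataclass
class DiamondEvent:
    """One intrusion event described by the four vertices of the diamond."""

    adversary: Optional[str] = None
    capability: Optional[str] = None
    infrastructure: Optional[str] = None
    victim: Optional[str] = None

    def missing_features(self) -> List[str]:
        """Return the vertices that have not been identified yet."""
        return [f.name for f in fields(self) if getattr(self, f.name) is None]


if __name__ == "__main__":
    # Hypothetical event: the malware and the target are known so far.
    event = DiamondEvent(capability="custom malware", victim="web server")
    print(event.missing_features())  # prints ['adversary', 'infrastructure']
```

The returned list tells the analyst where further information is needed before events can be linked into an activity thread.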
MITRE ATT&CK and the Diamond Model are both very focused on the capabilities of attackers. Capabilities are the most stable attributes of a particular threat actor. The Diamond Model also discusses infrastructure, which can be used to link different attack campaigns to a particular adversary. However, infrastructure can be easily replaced and can produce false positives because multiple distinct actors can use the same infrastructure, making it a less-than-ideal basis for attribution. An attacker’s capabilities, on the other hand, require significant investment to build and are less likely to change. After developing a custom piece of malware, a threat actor is likely to use it across multiple campaigns, providing a more reliable basis for attribution.
The cyber kill chain is a series of eight steps that trace stages of a cyberattack from the early reconnaissance stages to the exfiltration of data.
The cyber kill chain enables you to understand and combat ransomware, security breaches, and advanced persistent threats (APTs). Lockheed Martin derived the cyber kill chain framework from a military model—originally established to identify, prepare to attack, engage, and destroy the target. Since its inception, the kill chain has evolved to better anticipate and recognize insider threats, ransomware, social engineering, and innovative and advanced types of attacks.
Each stage is related to a certain type of activity in a cyber attack, regardless of whether it’s an internal or external attack:
Reconnaissance: This is the observation stage, where attackers typically assess the situation from the outside in order to identify both targets and tactics required for the attack.
Intrusion: This is based on what the attackers discovered in the reconnaissance phase and how they’re able to get into your systems, which is often by leveraging malware or security vulnerabilities.
Exploitation: This is the act of exploiting vulnerabilities and delivering malicious code onto the system to get a better foothold.
Privilege Escalation: Attackers often need more privileges on a system to get access to more data and permissions; for this, they need to escalate their privileges, often to those of an administrator account.
Lateral Movement: After they are in the system, attackers can move laterally to other systems and accounts to gain more leverage or control, whether that’s higher permissions, more data, or greater access to systems.
Obfuscation/Anti-forensics: To successfully pull off a cyber attack, attackers need to cover their tracks, and in this stage, they often lay false trails, compromise data, and clear logs to confuse and/or slow down any forensics team.
Denial of Service (DoS): This is the disruption of normal access for users and systems to stop the attack from being monitored, tracked, or blocked.
Exfiltration: This is the extraction stage: getting data out of the compromised system.
Each phase of the cyber kill chain is an opportunity to stop a cyberattack in progress. With the right tools to detect and recognize the behavior of each stage, you are able to better defend against a systems or data breach.
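Because the stages are ordered, they map naturally onto an ordered enumeration, which lets a responder ask how far an attack has progressed. The Python sketch below encodes the eight stages listed above; the enum and the `furthest_stage` helper are hypothetical names, and mapping real detections to stages is left out.

```python
from enum import IntEnum


class KillChainStage(IntEnum):
    """The eight stages described in the text, in order."""

    RECONNAISSANCE = 1
    INTRUSION = 2
    EXPLOITATION = 3
    PRIVILEGE_ESCALATION = 4
    LATERAL_MOVEMENT = 5
    OBFUSCATION = 6
    DENIAL_OF_SERVICE = 7
    EXFILTRATION = 8


def furthest_stage(observed):
    """Return the latest stage among the observed ones, or None if empty."""
    return max(observed, default=None)


if __name__ == "__main__":
    # Hypothetical detections from an ongoing investigation.
    seen = [KillChainStage.INTRUSION, KillChainStage.LATERAL_MOVEMENT]
    print(furthest_stage(seen).name)  # prints LATERAL_MOVEMENT
```

Knowing the furthest stage reached indicates which later stages, such as exfiltration, there may still be an opportunity to stop.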
Note
MITRE ATT&CK may be similar to the cyber kill chain, but it focuses more on the nuances of different attack techniques to understand and defend against an attacker. The Diamond Model places the basic components of malicious activity at one of four points of a diamond shape: adversary, infrastructure, capability, and victim. Lockheed Martin developed the cyber kill chain framework to help defend its networks based on the chain of actions that an attacker takes from beginning to end. This model has since been adopted by the security industry.
Managing the expectations of leaders of your organization is one of the most important aspects of a complete incident response plan. Leaders of an organization must communicate to owners, stockholders, and other leadership within the organization. Ensuring your plan includes a regular interval in which you update management/leadership is essential to setting expectations and providing transparency in the process; this all should be addressed in the IR plan and be part of the tabletop exercise.
The use of a communication template is critical to appropriately word the message to stakeholders. Everyone on your IR team is a stakeholder, and communication and management of activities are important as well. There are five key stakeholders for any IR team: IT Services, Security Management, Legal, Human Resources, and Public Relations. Another consideration, and one that requires management approval, is who has the authority to involve law enforcement and when that notification should take place. These are difficult decisions because law enforcement involvement often changes the nature of an investigation and increases the likelihood of public attention.
As mentioned previously, there are five key stakeholders as part of the incident response. Clearly defined communication protocols and procedures help better prepare the entire team in managing an incident from start to finish and keep everyone apprised of the status.
There are three key things you can do to facilitate and engage with the appropriate members:
Step 1. Identify the right people. Find or nominate key individuals within the stakeholder groups. They do not need to be security experts, but they need to be aware of the incident response team’s existence. You should make them aware of their duties, which could be to act as a support point for any incident activity.
Step 2. Set up regular security cadence meetings with all stakeholders. People forget things, but you can minimize this issue with a regular meeting between all the stakeholders. You can use this meeting to drive improvements, review previous incidents, or plan for exercises.
Step 3. Escalate the incident response. When your team is engaged with an incident, you should have the team set up proactive alerting. The handlers don’t need to call everyone every time, but they do need to plan ahead. Your incident response team should warm up key contacts early so that a later notification doesn’t come as a surprise.
A disaster recovery plan (DRP) is a formal document created by organizations that contains detailed instructions on how to respond to unplanned incidents such as natural disasters, power outages, cyber attacks, or other disruptive events. The plan should contain strategies on minimizing the effects of a disaster so that an organization can continue to operate or be able to quickly resume key operations.
Disruptions lead to lost revenue, brand damage, and dissatisfied customers. The longer the recovery time, the greater the adverse impact to the business. This means a good disaster recovery plan should enable rapid recovery from disruptions, regardless of the source of the disruption.
A DRP is more focused than a business continuity plan and does not necessarily cover all contingencies for business processes, assets, human resources, and business partners. A successful DR solution addresses all types of operation disruptions, not just the major natural or person-made disasters that make a location unavailable. Disruptions can include power outages; telephone system outages; temporary loss of access to a facility due to fire, water, or floods; facility threats; or a low-impact nondestructive fire or other event. A DRP should be organized by type of disaster, affected systems, and the location. It must contain instructions that can be implemented by anyone; you cannot always count on the people who put the plan together to be available 24/7 during a disaster. Figure 27-4 illustrates the fundamentals of a successful disaster recovery plan.
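One way to picture a plan organized by type of disaster, affected system, and location is as a lookup table of step-by-step instructions. The following is only a minimal sketch: the `PlanKey` type, the `PLAYBOOK` entries, and the `steps_for` helper are hypothetical names invented for illustration, and the steps themselves are placeholders rather than recommended procedures.

```python
from typing import NamedTuple


class PlanKey(NamedTuple):
    """Hypothetical index for a DRP entry: disaster type, system, location."""
    disaster: str
    system: str
    location: str


# Hypothetical entries; a real plan would hold far more detail per step.
PLAYBOOK = {
    PlanKey("power outage", "phone system", "HQ"): [
        "Confirm the outage with facilities.",
        "Forward main lines to the alternate call center.",
    ],
    PlanKey("flood", "file server", "HQ"): [
        "Shut down affected racks safely.",
        "Restore the latest backup at the alternate site.",
    ],
}

# Fallback shown when no entry matches (also an illustrative assumption).
NO_ENTRY = ["No entry found; escalate to the DR coordinator."]


def steps_for(disaster: str, system: str, location: str) -> list:
    """Return the ordered instructions for one disaster/system/location."""
    key = PlanKey(disaster.lower(), system.lower(), location)
    return PLAYBOOK.get(key, NO_ENTRY)


for step in steps_for("Flood", "File Server", "HQ"):
    print(step)
```

Because the instructions are stored as plain ordered steps, anyone on shift can follow them without needing the people who wrote the plan.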
Disaster recovery planning starts with defining key stakeholders and gathering data. It then proceeds through a business impact analysis (BIA), a risk analysis, and the selection of recovery strategies, and it ends with the DR plan itself. All disaster recovery plans and business impact analyses are cyclical processes that must be maintained and exercised on a regular basis.
We rarely get advance notice that a disaster is ready to strike. Even with some lead time such as with hurricanes, multiple things can go wrong; every incident is unique and unfolds in unexpected ways. This is where a business continuity plan (BCP) becomes key to an organization’s survival. The appropriate BCP will provide your organization with the best shot at success during a disaster. You need to put a current, tested plan in the hands of all team members and personnel responsible for carrying out any part of that plan. The lack of a plan does not just mean a delay in recovering from an event or incident in your organization; it also could potentially mean the end of the business.
Business continuity (BC) refers to maintaining business functions or quickly resuming them in the event of a disruption, whether caused by a storm, flood, fire, or a malicious attack by cybercriminals. A business continuity plan outlines procedures and instructions an organization must follow in the face of such disasters; it covers business processes, assets, human resources, business partners, and more.
Many people think a DR plan is the same as a BCP, but a DR plan focuses mainly on restoring the IT infrastructure and operations after a crisis. The DR plan is actually one part of a complete business continuity plan, because a BCP looks at the continuity of the entire organization, meaning it needs to be more thorough.
Do you have a way to get Sales, Human Resources, Manufacturing, and Support services functionally up and running so the company can continue to operate and make money after a disaster? For example, if the building that houses your customer service representatives is flattened by a natural disaster like a storm, do you know how those representatives will be able to handle customer calls? Will you have them work from home temporarily or from an alternate location? The BCP addresses these types of concerns.
A business impact analysis is another part of a BCP. A BIA identifies the impact of a sudden loss of business functions, usually quantified as a direct cost to the business. This analysis can also help you evaluate whether you should outsource noncore activities in your BCP, which can come with its own risks. The BIA essentially helps you look at your entire organization’s processes and determine which are most important.
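The idea of quantifying impact as a direct cost can be sketched in a few lines. This is only an illustration under stated assumptions: the `BusinessFunction` class, the `rank_by_impact` helper, and every dollar and hour figure below are hypothetical, and a real BIA would weigh many more factors than hourly loss multiplied by outage length.

```python
from dataclasses import dataclass


@dataclass
class BusinessFunction:
    name: str
    hourly_loss: float            # assumed direct cost per hour of downtime
    expected_outage_hours: float  # assumed outage length for this scenario

    @property
    def impact(self) -> float:
        """Estimated direct cost of the outage for this function."""
        return self.hourly_loss * self.expected_outage_hours


def rank_by_impact(functions):
    """Order business functions from highest to lowest estimated impact."""
    return sorted(functions, key=lambda f: f.impact, reverse=True)


# Hypothetical figures, chosen only to make the ranking visible.
functions = [
    BusinessFunction("Sales", 5000.0, 8),
    BusinessFunction("Human Resources", 500.0, 24),
    BusinessFunction("Manufacturing", 12000.0, 8),
    BusinessFunction("Support", 2000.0, 12),
]

for f in rank_by_impact(functions):
    print(f"{f.name}: ${f.impact:,.0f}")
```

With these made-up numbers, Manufacturing ranks first and Human Resources last, which is the kind of ordering a BIA uses to decide which processes to restore first.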
Continuity of operations planning (COOP) is a federal initiative that encourages people and departments to plan how critical operations will continue under a broad range of circumstances. COOP is important as a good business practice and because the planning fosters recovery and survival in and after emergency situations. A COOP plan addresses emergencies from an all-hazards approach. A continuity of operations plan establishes policy and guidance, ensuring that critical functions continue and that personnel and resources are relocated to an alternate facility in case of emergencies. The plan should develop procedures for
Alerting, activating, notifying, and deploying employees
Identifying critical business functions
Establishing an alternate facility or work-from-home process
Creating a roster of personnel with authority and knowledge of business operational functions
Creating a continuity of business operations plan is a guided process and a team effort; it combines the team’s understanding of department operations with emergency management’s expertise in preparing for contingencies.
The plan could be activated in response to a wide range of events or situations—from a fire in the building to a natural disaster to the threat or occurrence of a terrorist attack. Any event that makes it impossible for employees to work in their regular facility could result in the activation of the continuity plan.
Note
Continuity planning is simply the good business practice of ensuring the execution of essential functions and a fundamental duty of public and private entities responsible to their stakeholders.
An incident response team is a group of business and information technology professionals in charge of preparing for and reacting to any type of organizational emergency. Responsibilities of an incident response team include developing a proactive incident response plan, testing and resolving system vulnerabilities, maintaining strong security, and exercising best practices enterprisewide. The team provides support for all incident handling measures, and their expertise typically covers various technical skills, backgrounds, and roles to be prepared for a wide range of unforeseen security incidents.
In incident response, types of emergencies are usually categorized in two ways:
Public incidents: These incidents affect an entire community. They could include natural disasters, terrorist attacks, and widespread epidemics.
Corporate/organizational incidents: These incidents are typically organization-specific and happen on a smaller scale. They could include data breaches, cybersecurity attacks, and physical location threats.
Incident response teams are trained to be prepared for both types and are common in organizations and businesses with valuable intellectual property (IP). As mentioned previously in the chapter, an incident response team could take the following forms:
Computer emergency response team (CERT): This team of professionals is in charge of handling cyber threats and vulnerabilities within an organization. In addition, CERTs tend to release their findings to the public to help other companies and teams strengthen their security infrastructure.
Computer security incident response team (CSIRT): This team of professionals is responsible for preventing and responding to security incidents. A CSIRT may also handle aspects of incident response for other departments, such as dealing with legal issues or communicating with the press.
Security operations center (SOC): Typically, this team includes investigators, threat hunters, and analysts who focus on system security incident response; they often are part of or contribute to the CSIRT investigation.
An incident response plan must have a data retention policy in preparation for executive review, internal audit, and possible prosecution. The policy should include specific steps on preserving data and documenting the chain of custody. Without this policy, the cause of the data breach could remain unknown, and a similar breach could occur again in the future. Additionally, an organization can experience numerous legal repercussions for failing to properly preserve data. During a cyber incident, legal counsel should work with the incident response team. The attorney’s participation during this process helps establish an attorney-client privilege regarding incident response inquiries, data breaches, or cyber attacks. Note that some tasks, such as when an IRT performs daily operations, do not need or receive attorney-client privilege.
Retention policies for government agencies fall under NIST SP 800-53, which outlines the requirements that contractors and federal agencies need to meet to comply with the Federal Information Security Management Act (FISMA). It requires data retention for a minimum of three years. SP 800-53 also describes how to develop specialized sets of controls tailored for specific types of missions or business functions, technologies, or operations.
Note
For more information, see https://csrc.nist.gov/publications/detail/sp/800-53/rev-5/final.
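A retention rule such as the three-year minimum described above can be checked with simple date arithmetic. The sketch below is illustrative only: the function names and the February 29 handling are assumptions made for this example, not requirements taken from SP 800-53, and an organization's own policy may call for longer periods.

```python
from datetime import date

MIN_RETENTION_YEARS = 3  # the three-year minimum stated in the text


def earliest_disposal_date(created: date,
                           years: int = MIN_RETENTION_YEARS) -> date:
    """Return the first date on which a record may be disposed of."""
    try:
        return created.replace(year=created.year + years)
    except ValueError:
        # Record created on Feb 29 and the target year is not a leap year:
        # assume the retention period ends on Mar 1 (illustrative choice).
        return date(created.year + years, 3, 1)


def must_retain(created: date, today: date) -> bool:
    """True while the record is still inside the minimum retention period."""
    return today < earliest_disposal_date(created)


# Example: evidence collected on 10 May 2021, checked on 1 January 2023.
print(must_retain(date(2021, 5, 10), date(2023, 1, 1)))
```

A check like this could run before any automated log rotation or evidence cleanup so that data needed for audit or prosecution is not deleted early.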
Use the features in this section to study and review the topics in this chapter.
Review the most important topics in the chapter, noted with the Key Topic icon in the outer margin of the page. Table 27-3 lists these key topics and the page number on which each is found.
Table 27-3 Key Topics for Chapter 27
| Key Topic Element | Description | Page Number |
|---|---|---|
| Section | Incident Response Plans | 761 |
| | Incident Response Process | 765 |
| Section | Tabletop | 765 |
| Section | Walkthroughs | 766 |
| Section | Simulations | 766 |
| Section | MITRE ATT&CK | 767 |
| Section | The Diamond Model of Intrusion Analysis | 768 |
| | Diamond Model of Intrusion Analysis | 769 |
| List | Description of cyber kill chain attack framework | 770 |
| Section | Stakeholder Management | 771 |
| List | Facilitating and engaging key stakeholders of the incident response team as part of a communication plan | 772 |
| Section | Disaster Recovery Plan | 772 |
| | Disaster Recovery Plan | 773 |
| Section | Business Continuity Plan | 773 |
| Section | Continuity of Operations Planning (COOP) | 774 |
| Section | Incident Response Team | 775 |
| Section | Retention Policies | 776 |
Define the following key terms from this chapter, and check your answers in the glossary:
Diamond Model of Intrusion Analysis
business continuity plan (BCP)
Answer the following review questions. Check your answers with the answer key in Appendix A.
1. Having an incident response plan is imperative; the first step is identifying and having the right people with the right skill sets and experience available and ready to respond. How often should you test and update your plan?
2. Part of your incident response process includes the Eradication phase. During this phase, how long afterward should you increase your monitoring?
3. Incident response simulations are fundamentally about what?
4. Which attack framework emphasizes the relationships and characteristics of four basic components?
5. The cyber kill chain is a series of eight steps that trace stages of a cyber attack. What is step 4?