Incident response is becoming the new norm in security operations. The reality is that keeping adversaries off your network and preventing unauthorized activity is not going to provide the level of security your enterprise requires. This means the system needs to be able to operate in a state of compromise yet still achieve the desired security objectives. Your mindset has to change from preventing intrusion and attack to preventing loss.
This chapter explores the use of an incident response function to achieve the goals of minimizing loss under all operating conditions. This will mean a shift in focus and a change in priorities as well as security strategy. These efforts can succeed only on top of a solid foundation of security fundamentals as presented earlier in the book, so this is not a starting place but rather the next step in the evolution of defense.
An incident is any event in an information system or network where the results are different from normal. Incident response is not just an information security operation; it is an effort that involves the entire business. The security team may form the nucleus of the effort, but the key tasks are performed by many parts of the business.
A successful incident response effort requires two components: knowledge of one’s own systems and knowledge of the adversary. The ancient warrior/philosopher Sun Tzu explains it well in The Art of War: “If you know the enemy and know yourself, you need not fear the result of a hundred battles. If you know yourself but not the enemy, for every victory gained you will also suffer a defeat. If you know neither the enemy nor yourself, you will succumb in every battle.”
Incident response is a term used to describe the steps an organization performs in response to any situation determined to be abnormal in the operation of a computer system. The causes of incidents are many, from the environment (storms), to errors on the part of users, to unauthorized actions by unauthorized users, to name a few. Although the causes may be many, the results can be sorted into classes. A low-impact incident may not result in any significant risk exposure, so no action other than repairing the broken system is needed. A moderate-risk incident will require greater scrutiny and response efforts, and a high-risk incident will require the greatest scrutiny and response. To manage incidents when they occur, a table of guidelines needs to be created for the incident response team to assist in determining the level of response.
Two major elements play a role in determining the level of response. Information criticality is the primary determinant, and this comes from the data classification and the quantity of data involved. Information criticality is defined as the relative importance of specific information to the business. Information criticality is a key measure used in the prioritization of actions throughout the incident response process. The loss of one administrator password is less serious than the loss of all of them. The second major element involves a business decision on how this incident plays into current business operations. A series of breaches, whether minor or not, indicates a pattern that can have public relations and regulatory issues.
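A guideline table of the kind described above can be sketched as a simple lookup that weighs information criticality against the quantity of data involved. This is a minimal illustration only; the classification labels, thresholds, and tier names are assumptions, and any real table must be tailored to the organization's own data classification scheme.

```python
# Hypothetical incident response guideline lookup: response level is driven
# by information criticality (data classification) and quantity of data.
# All labels and thresholds here are illustrative assumptions.

def response_level(data_classification: str, records_affected: int) -> str:
    """Return a suggested response tier for an incident."""
    # Weight by classification; these category names are assumed examples.
    weights = {"public": 0, "internal": 1, "confidential": 2, "regulated": 3}
    score = weights.get(data_classification, 1)
    if records_affected > 1000:
        score += 2
    elif records_affected > 10:
        score += 1
    if score >= 4:
        return "high"      # full CIRT activation, management notification
    if score >= 2:
        return "moderate"  # greater scrutiny, targeted response efforts
    return "low"           # repair the broken system, document, move on

print(response_level("regulated", 5000))  # high
print(response_level("internal", 3))      # low
```

The loss of one administrator password versus the loss of all of them maps naturally onto such a table: the same classification, but a very different quantity, yields a different tier.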
Once an incident happens, it is time to react, and proper reaction requires a game plan. Contrary to what many want to believe, there are no magic silver bullets to kill the security monsters. A solid, well-rehearsed incident response plan is required. This plan is custom-tailored to the information criticalities, the actual hardware and software architectures, and the people. Like all large, complex projects, the challenges rapidly become organizational in nature—budget, manpower, resources, and commitment.
CERT is a trademark of Carnegie Mellon University and is used in some team names, such as US-CERT.
Having an incident response management methodology is a key risk mitigation strategy. One of the steps that should be taken to establish a plan to handle business interruptions as a result of a cyber event of some sort is the establishment of a computer incident response team (CIRT) or a computer emergency response team (CERT).
The organization’s CIRT will conduct the investigation into the incident and make the recommendations on how to proceed. The CIRT should consist of not only permanent members but also ad hoc members who may be called upon to address special needs, depending on the nature of the incident. In addition to individuals with a technical background, the CIRT should include nontechnical personnel to provide guidance on ways to handle media attention, legal issues that may arise, and management issues regarding the continued operation of the organization. The CIRT should be created, and team members should be identified before an incident occurs. Policies and procedures for conducting an investigation should also be worked out in advance of an incident occurring. It is also advisable to have the team periodically meet to review these procedures.
The goals of an incident response process are multidimensional in nature:
Confirm or dispel incident
Promote accurate information accumulation and dissemination
Establish controls for evidence
Protect privacy rights
Minimize disruption to operations
Allow for legal/civil recourse
Provide accurate reports/recommendations
Incident response depends upon accurate information. Without it, responders risk following data in the wrong direction, missing crucial information, and finding only dead ends. The preceding goals are essential for the viability of an incident response process and its desired outcomes.
Attack frameworks provide a roadmap of the types of actions and sequence of actions used when attacking a system. Frameworks bring a sense of structure and order to the multidimensional problem associated with defending a variety of systems against multiple different types of attackers with various objectives. The objective of using a framework is to improve post-compromise detection of adversaries in enterprises by providing guidance on where an adversary’s actions may be observable and where one can take specific actions. Organizations can use frameworks to identify holes in defenses and prioritize them based on the risk associated with actions an adversary is likely to take. Three different frameworks are described in the following sections: the MITRE ATT&CK framework, the Diamond Model of Intrusion Analysis, and the Cyber Kill Chain.
Attackers have a method by which they attack a system. Although the specifics may differ from event to event, there are some steps that are commonly employed. There are also numerous types of attacks—from old-school hacking to the new advanced persistent threat (APT) attack. The differences are subtle and are related to the objectives of each form of attack.
Attacks are not a new phenomenon in enterprise security, and a historical examination of large numbers of attacks shows some common methods. The following are the traditional steps:
1. Footprinting
2. Scanning
3. Enumeration
4. Gaining access
5. Escalating privileges
6. Pilfering
7. Creating backdoors
8. Covering tracks
9. Denial of service (DoS)
Using nmap to Fingerprint an Operating System
To use nmap to fingerprint an operating system, use the -O option:
nmap -O -v scanme.nmap.org
This command performs a scan of interesting ports on the target (scanme.nmap.org) and attempts to identify the operating system. The -v option indicates that you want verbose output.
Footprinting is the determination of the boundaries of a target space. There are numerous sources of information, including websites, DNS records, and IP address registrations. Understanding the boundaries assists an attacker in knowing what is in their target range and what isn’t. Scanning is the examination of machines to determine what operating systems, services, and vulnerabilities exist. The enumeration step is a listing of the systems and vulnerabilities to build an attack game plan. The first actual incursion is gaining access to an account on the system, almost always as an ordinary user, as higher-privilege accounts are harder to target.
The next step is to gain access to a higher-privilege account by escalating privileges. From a higher-privilege account, the range of accessible activities is greater, including pilfering files, creating backdoors so you can return, and covering your tracks by erasing logs. The DoS step is commonly used as a technique to block specific services (think DNS) while an attacker injects their own response, misdirecting the victim process. The detail associated with each step may vary from hack to hack, but in most cases these steps are employed in this manner to achieve an objective.
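The footprinting step described above begins with simple, public lookups. The sketch below shows the most basic form of this using only the Python standard library: resolving a hostname to its canonical name, aliases, and registered addresses. It queries localhost so the example runs offline; in practice an attacker (or a defender auditing exposure) would point it at the target domain identified during footprinting.

```python
# Minimal footprinting sketch: hostname resolution via the standard library.
import socket

def footprint(host: str) -> dict:
    """Resolve a hostname to its canonical name, aliases, and addresses."""
    name, aliases, addresses = socket.gethostbyname_ex(host)
    return {"canonical": name, "aliases": aliases, "addresses": addresses}

# "localhost" is used here so the example runs without network access;
# a real footprinting pass would target the domain under examination.
print(footprint("localhost"))
```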
A relatively new attack phenomenon is the advanced persistent threat (APT), which is an attack that always maintains a primary focus on remaining in the network, operating undetected, and having multiple ways in and out. APTs began with nation-state attackers, but the utility of the long-term attack has proven valuable, and many sophisticated attacks have moved to this route. Most APTs begin via a phishing or spear phishing attack, which establishes a foothold in the system under attack. From this foothold, the attack methodology is similar to the traditional attack method described in the previous section, but additional emphasis is placed on the steps needed to maintain a presence on a network, as shown here:
1. Define target
2. Research target
3. Select tools
4. Test for detection
5. Initial intrusion
6. Establish outbound connection
7. Obtain credentials
8. Expand access
9. Strengthen foothold
10. Cover tracks
11. Exfiltrate data
The initial intrusion is usually performed via social engineering (spear phishing), over e-mail, using zero-day custom malware. Another popular infection method is the watering hole attack: planting the malware on a website that the victim's employees will likely visit. The use of custom malware makes detecting the attack with antivirus/antimalware programs a near impossibility. After the attackers gain access, they attempt to expand access and strengthen the foothold. This is done by planting remote-access trojan (RAT) software in the victim's network and creating network backdoors and tunnels that allow stealthy access to the victim's infrastructure.
The next steps, obtaining credentials and escalating privileges, are performed through the use of exploits and password cracking. The true objective is to acquire administrator privileges over a victim’s computer and ultimately expand it to Windows domain administrator accounts. One of the hallmarks of an APT attack is the emphasis on maintaining a presence on the system to ensure continued control over access channels and credentials acquired in previous steps. A common technique used is lateral movement across a network. Moving laterally allows an attacker to expand control to other workstations, servers, and infrastructure elements and perform data harvesting on them. Attackers also perform internal reconnaissance, collecting information on surrounding infrastructure, trust relationships, and information concerning the Windows domain structure.
APT Attack Model
The computer security investigative firm Mandiant (now a division of FireEye) was one of the pioneers in the use of incident response techniques against APT-style attacks. They published a model of an APT attack to use as a guide, listed here:
1. Initial compromise
2. Establish foothold
3. Escalate privileges
4. Internal reconnaissance
5. Move laterally
6. Maintain presence
7. Complete mission
The key step is step 5, moving laterally. Lateral movement is where the adversary traverses your network, using multiple accounts, and does so to discover material worth stealing as well as to avoid being locked out by normal operational changes. This is one element that can be leveraged to help slow down, detect, and defeat APT attacks. Blocking lateral movement can defeat APT-style attacks from spreading through a network and can limit their stealth.
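One simple heuristic for spotting the lateral movement described above is to flag accounts that authenticate to an unusually large number of distinct hosts in a short window. The sketch below is a hedged illustration only; the event format, the threshold, and the account names are assumptions, and production detection would draw on real authentication logs and tuned baselines.

```python
# Hypothetical lateral-movement heuristic: flag accounts authenticating to
# more than max_hosts distinct hosts. Format and threshold are assumptions.
from collections import defaultdict

def flag_lateral_movement(auth_events, max_hosts=3):
    """auth_events: iterable of (username, target_host) tuples."""
    hosts_by_user = defaultdict(set)
    for user, host in auth_events:
        hosts_by_user[user].add(host)
    return [u for u, hosts in hosts_by_user.items() if len(hosts) > max_hosts]

# A service account touching six workstations stands out against a normal user.
events = [("svc_backup", f"ws{i}") for i in range(6)] + [("alice", "ws1")]
print(flag_lateral_movement(events))  # ['svc_backup']
```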
The Cyber Kill Chain is a model developed by Lockheed Martin, adapted from the military kill chain concept of engagement. The model describes a series of distinct steps an attacker uses during a cyberattack, from the early reconnaissance stages to the exfiltration of data. The Cyber Kill Chain helps us understand and combat different forms of attack, from ransomware to security breaches to advanced persistent threats (APTs).
The Cyber Kill Chain, shown in Figure 22.1, has slightly different steps depending on whose version you use, but the most common implementations include the following ones:
• Figure 22.1 Cyber Kill Chain
1. Reconnaissance Research and identify targets.
2. Weaponization Exploit vulnerabilities to enter.
3. Delivery Deliver the payload (evil content).
4. Exploitation Begin the payload attack on the system and gain entry.
5. Installation Implement backdoors, persistent access, bots, and so on.
6. Command and control Communicate to outside servers for control purposes.
7. Action on objective Obtain the objective of the attack (for example, steal intellectual property).
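The seven steps above can be paired with defensive controls, which is how defenders choose a point of defense. The pairing below is a small illustrative sample of common controls, not an official Lockheed Martin mapping; the specific controls listed are assumptions chosen as typical examples.

```python
# Illustrative mapping of Cyber Kill Chain steps to example defenses.
# Controls shown are common examples, not an official mapping.
KILL_CHAIN = [
    ("Reconnaissance",       "limit public information; monitor for scans"),
    ("Weaponization",        "threat intelligence on attacker tooling"),
    ("Delivery",             "e-mail filtering; web proxies"),
    ("Exploitation",         "patching; endpoint protection"),
    ("Installation",         "application allow-listing; EDR"),
    ("Command and control",  "egress filtering; DNS monitoring"),
    ("Action on objective",  "data loss prevention; segmentation"),
]

for step, (phase, control) in enumerate(KILL_CHAIN, start=1):
    print(f"{step}. {phase}: {control}")
```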
By understanding the progression of an attack, defenders can choose their point of defense, which enables them to react to an attack with a plan and a purpose.
Developed by Lockheed Martin, the Cyber Kill Chain is a framework used to defend against the chain of events an attacker takes, from the beginning of an attack to the end of an attack.
The MITRE ATT&CK framework is a comprehensive matrix of attack elements, including the tactics and techniques used by attackers on a system. This framework can be used by threat hunters, red teamers, and defenders to better classify attacks and understand the sequential steps an adversary will be taking when attacking a system. This framework enables personnel to plan and defend, even during an attack, and it acts as a useful tool in assessing an organization’s risk.
The MITRE ATT&CK framework has a fairly simple design, with the top row of the matrix covering activities such as initial access, execution, persistence, privilege escalation, defense evasion, credential access, discovery, lateral movement, collection, command and control, exfiltration, and impact. Under each of these activities is a series of techniques and sub-techniques. Taken together, this matrix paints a comprehensive picture of paths through an organization’s IT enterprise.
The MITRE ATT&CK framework is a knowledgebase of various real-world observations and attack techniques. It is often used by organizations for threat modeling.
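The matrix structure described above (tactics as columns, techniques beneath them) can be represented as a simple nested mapping. The excerpt below is a tiny illustrative sample; the technique names are drawn from commonly known ATT&CK entries, but the selection is an assumption and the real knowledgebase is far larger.

```python
# Tiny excerpt of an ATT&CK-style matrix: tactics mapped to example
# techniques. Illustrative sample only, not the full MITRE knowledgebase.
ATTACK_EXCERPT = {
    "initial-access":   ["phishing", "valid accounts"],
    "persistence":      ["scheduled task", "boot autostart"],
    "lateral-movement": ["remote services", "pass the hash"],
    "exfiltration":     ["exfiltration over C2 channel"],
}

def techniques_for(tactic: str) -> list:
    """Return known example techniques for a tactic, or an empty list."""
    return ATTACK_EXCERPT.get(tactic, [])

print(techniques_for("lateral-movement"))
```

A defender can walk such a structure tactic by tactic to check which techniques current defenses would actually detect, which is the gap-identification use the text describes.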
The Diamond Model of Intrusion Analysis is a cognitive model used by the threat intelligence community to describe a specific event. It is based on the notion that an event has four characteristics, each comprising a corner of the diamond, as shown in Figure 22.2. Taken together, these elements describe an event. The four nodes that make up an event are adversary, infrastructure, capabilities, and victim. The adversary node is a description of the attacker and their data, including anything you know about them (e-mails, names, locations, handles, and so on). The infrastructure node is a description of what is being used in the attack, such as IP addresses, domain names, e-mail addresses, and so on. The victim node is the target, and the capabilities node is a description of what is being used (malware, stolen certificates/credentials, tools, exploits, and so on). As an example, a completed diamond could take the following form:
• Figure 22.2 Diamond Model of Intrusion Analysis
1. Adversary Whois is used to get an e-mail for the registrant—the possible attacker.
2. Infrastructure The C2 domain name resolves to an IP address.
3. Capabilities The response team finds the C2 server domain name.
4. Victim A victim discovers malware and launches an incident response.
The Diamond Model enables intrusion analysis by placing malicious activity at four points of the diamond: adversary, infrastructure, capabilities, and victim.
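A Diamond Model event record is naturally expressed as a structure with the four nodes just described. The sketch below mirrors the worked example in the text; the field values are paraphrases of that example, and the record layout itself is an illustrative assumption.

```python
# Sketch of a Diamond Model event record with the four nodes described
# above. Field values paraphrase the worked example in the text.
from dataclasses import dataclass

@dataclass
class DiamondEvent:
    adversary: str       # who: attacker identifiers (e-mails, handles, ...)
    infrastructure: str  # what carries the attack: IPs, domains, ...
    capabilities: str    # what is used: malware, tools, exploits, ...
    victim: str          # the target

event = DiamondEvent(
    adversary="registrant e-mail obtained via Whois",
    infrastructure="C2 domain name resolving to an IP address",
    capabilities="C2 server domain name found by responders",
    victim="host where malware was discovered",
)
print(event.adversary)
```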
A major tool for defenders who are hunting attackers is threat intelligence. As presented in Chapter 1, threat intelligence is actionable information about malicious actors and their tools, infrastructure, and methods. Incident response is a game of resource management. No firm has the resources to protect everything against all threats or investigate all possible hostile actions; attempting to do so would result in wasted effort. A key decision is where to apply incident response resources in response to an incident. Combine threat intelligence with the concept of the kill chain (the attacker's most likely path), and you have a means to prioritize actions against the most meaningful threats.
Threat hunting is an iterative process of proactively searching out threats inside the network. Several different models can be employed for threat hunting, but one of the most effective is based on creating a hypothesis and then examining that hypothesis. This act provides a level of scope to the hunt—rather than looking for anything in a sea of mostly normal, one is looking for specific items. A typical hypothesis would be something like “an adversary is using stolen credentials to mimic authorized users during nonworking hours.” This hypothesis is concise and can be tested by examining a set of logs for specific activities during nonworking hours.
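The example hypothesis above is testable precisely because it is scoped: examine authentication logs for activity outside working hours. The sketch below shows a minimal version of that test; the log record layout and the 08:00 to 18:00 working window are assumptions for illustration.

```python
# Testing the example hunt hypothesis: "an adversary is using stolen
# credentials to mimic authorized users during nonworking hours."
# Log layout and the working-hours window are assumptions.
from datetime import datetime

WORK_START, WORK_END = 8, 18  # assumed working hours, 24-hour clock

def off_hours_logins(log_records):
    """log_records: iterable of (iso_timestamp, username) pairs."""
    hits = []
    for ts, user in log_records:
        hour = datetime.fromisoformat(ts).hour
        if not (WORK_START <= hour < WORK_END):
            hits.append((ts, user))
    return hits

logs = [("2024-03-04T02:13:00", "jsmith"), ("2024-03-04T10:05:00", "alee")]
print(off_hours_logins(logs))  # [('2024-03-04T02:13:00', 'jsmith')]
```

Each hit is not proof of compromise, only a lead; the hunt then pivots to whether the flagged account's activity is otherwise consistent with its owner.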
The objective of threat hunting is to use current knowledge of what adversaries are doing to firms, and check to see if that is happening on your network. This can increase detection of malicious activity beyond the typical incident-type triggers. A complete explanation of threat hunting can be found in the whitepaper “A Practical Model for Conducting Cyber Threat Hunting,” by Dan Gunter and Marc Seitz (https://www.sans.org/reading-room/whitepapers/threathunting/practical-model-conducting-cyber-threat-hunting-38710).
Security operations in an enterprise environment have a lot of moving parts. From a top-level view, you have vulnerability management, threat intelligence, incident response, and automated security operations. All of these operate off of data that comes from a myriad of network appliances, intrusion detection systems, firewalls, and other security devices. This data is typically fed into a security information and event management (SIEM) system that can collect, aggregate, and apply pattern matching to the volumes of data. Alerts can then be processed by security personnel. However, this is far from complete integration. Security orchestration, automation, and response (SOAR) systems take SIEM data, as well as data from other sources, and assist in the creation of runbooks and playbooks.
Security administrators can create a series of runbooks and playbooks that can be used in response to a wide range of incident response activities. The details behind runbooks and playbooks are covered next. Combinations of runbooks and playbooks can be used to document different security processes and can provide users with approved procedures for orchestrating even the most complex security workflows. SOAR software integrates all of these elements into manageable solutions for the security operations center personnel, combining both raw and processed data into actionable steps based on approved procedures.
SOAR systems are extremely valuable when it comes to incident mitigation of severe threats because they can automate data gathering and initiate threat response.
A runbook consists of a series of action-based conditional steps to perform specific actions associated with security automation. These actions might involve data harvesting and enrichment, threat containment, alerts and notifications, and other automatable elements of a security operations process. The primary purpose of a runbook is to accelerate the incident response process by automating a series of approved steps and processes. Runbooks typically are focused on the systems and services and how they are actively managed.
A playbook is a set of approved steps and actions required to successfully respond to a specific incident or threat. Playbooks are commonly instantiated as itemized checklists, with all pertinent data prefilled in—systems, team members, actions, and so on. Playbooks provide a simple step-by-step, top-down approach to the orchestration of activities of the security team. They can include a wide range of requirements—technical requirements, personnel requirements, and legal or regulatory requirements—all in a preapproved form that alleviates spur-of-the-moment scrambling when the clock is ticking on an active event.
A runbook typically focuses on technical aspects of computer systems or networks. A playbook is more comprehensive and has more of a people/general business focus.
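A playbook instantiated as an itemized checklist, as described above, can be sketched as a small data structure with prefilled steps, owners, and completion tracking. The step wording, the roles, and the phishing scenario below are illustrative assumptions, not an approved procedure from any particular organization.

```python
# Hypothetical playbook as an itemized checklist: prefilled steps, owners,
# and completion state. Scenario and role names are assumptions.
phishing_playbook = [
    {"step": "Confirm the reported message is malicious",
     "owner": "SOC analyst", "done": False},
    {"step": "Quarantine the message for all recipients",
     "owner": "E-mail administrator", "done": False},
    {"step": "Reset credentials of users who clicked",
     "owner": "IT operations", "done": False},
    {"step": "Notify legal and public affairs if data was exposed",
     "owner": "Incident commander", "done": False},
]

def next_step(playbook):
    """Return the first incomplete step, or None when the playbook is done."""
    return next((s for s in playbook if not s["done"]), None)

print(next_step(phishing_playbook)["step"])
```

The preapproved, top-down ordering is the point: during an active event, the team works the list rather than debating what to do next.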
Incident response is the set of actions security personnel perform in response to a wide range of triggering events. These actions are vast and varied because they have to deal with a wide range of causes and consequences. Through the use of a structured framework, coupled with properly prepared processes, incident response becomes a manageable task. Without proper preparation, this task can quickly become impossible or intractably expensive.
Incident response is the new business cultural norm in information security. The key is to design the procedures to include appropriate business personnel, not to keep incident response a pure information security endeavor. The challenges are many, including timing, as the activities quickly become a case of one group of professionals pursuing another.
Incident response is a multistep process with several component elements. The first is organization preparation, followed by system preparation. An initial detection is followed by initial response and then isolation, investigation, recovery, and reporting. There are additional process steps of follow-up and lessons learned, each of which is presented in the following sections. Incident response is a key element of a security posture and must involve many different aspects of the business to properly respond. This is best built upon the foundation of a comprehensive incident response policy that details the roles and responsibilities of the organizational elements with respect to the process elements detailed in this chapter.
For the elements of the incident response process, it is important to know the names, the topics covered, and the order in which they are performed, as understanding the basic flow is important if one is to contribute to the response. The steps are as follows: preparation, identification, containment, eradication, recovery, and lessons learned.
Incident Response Defined
NIST Special Publication 800-61 defines an incident as the act of violating an explicit or implied security policy. This violation can be intentional, incidental, or accidental, with causes being wide and varied in nature. These include but are not limited to the following:
Attempts (either failed or successful) to gain unauthorized access to a system or its data
Unwanted disruption or denial of service
The unauthorized use of a system for the processing or storage of data
Changes to system hardware, firmware, or software characteristics without the owner’s knowledge, instruction, or consent
Environmental changes that result in data loss or destruction
Accidental actions that result in data loss or destruction
Incident response activities are at times closely related to other IT operations, such as disaster recovery and business continuity. These activities are not performed in a vacuum but are integrally connected to many operational procedures, and this connection is key to overall system efficiency.
Preparation is the phase of incident response that occurs before a specific incident; it includes all the tasks needed to be organized and ready to respond. The organization needs to establish the steps to be taken when an incident is discovered (or suspected); determine points of contact; train all employees and security professionals so they understand the steps to take and who to call; establish an incident response team; acquire the equipment necessary to detect, contain, and recover from an incident; establish the procedures and guidelines for the use of that equipment; and train those who will use it. Successful handling of an incident is a direct result of proper preparation.
The old adage that “those who fail to prepare, prepare to fail” certainly applies to incident response. Without preparation, an organization’s response to a security incident will be haphazard and ineffective. Establishing the processes and procedures to follow in advance of an event is critical.
Preparing an organization requires an incident response plan, both for the initial effort and for the maintenance of that effort. Over time, the organization shifts based on business objectives, personnel change, business efforts and focus change, new programs, and new capabilities; virtually any change can necessitate shifts in the incident response activities. At a minimum, the following items should be addressed and periodically reviewed in terms of incident response preparation:
Develop and maintain comprehensive incident response policies and procedures
Establish and maintain an incident response team
Obtain top-level management support
Agree to ground rules/rules of engagement
Develop scenarios and responses
Develop and maintain an incident response toolkit
System plans and diagrams
Critical asset lists
Practice response procedures
Scenarios (“Who do you call?”)
Systems require preparation for effective incident response efforts. Incident responders depend on documentation for understanding hardware, software, and network layouts. Understanding how access control is employed, including specifics across all systems, is key when determining who can do what, a common incident response question. Understanding the logging methodology and architecture will make incident response data retrieval easier. All of these questions should be addressed during planning for diagrams, access control, and logging, to ensure that these critical security elements are capturing the correct information before an incident occurs.
Preparing for Incident Detection
To ensure that discovering incidents is not an ad hoc, hit-or-miss proposition, the organization needs to establish procedures that describe the process administrators must follow to monitor for possible security events. The tools for accomplishing this task are identified during the preparation phase, as well as any required training. The procedures governing the monitoring tools used should be established as part of the specific guidelines governing the use of the tools but should include references to the incident response policy.
Having lists of critical files and their hash values, all stored offline, can make system investigation a more efficient process. In the end, when you are architecting a system, taking the time to plan for incident response processes will be crucial to a successful response once an incident occurs. Preparing systems for incident response is similar to preparing them for maintainability, so these efforts can yield regular dividends to the system owners. Determining the steps to isolate specific machines and services can be a complex endeavor and is one best accomplished before an incident, through the preparation phase.
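The "critical files and their hash values, stored offline" preparation step can be sketched with the standard library: compute a SHA-256 baseline for each critical file, store it somewhere the attacker cannot reach, and later compare current hashes against it. The file list and storage format here are illustrative assumptions.

```python
# Sketch of a file-integrity baseline: hash critical files now, verify later.
# Which files to cover and where to store the baseline are assumptions.
import hashlib
import os
import tempfile

def sha256_of(path: str) -> str:
    """Compute the SHA-256 digest of a file, reading in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()

def verify(baseline: dict) -> list:
    """Return paths whose current hash no longer matches the baseline."""
    return [p for p, digest in baseline.items() if sha256_of(p) != digest]

# Example: baseline a freshly written file, so verify() reports no changes.
tmp = tempfile.NamedTemporaryFile(delete=False)
tmp.write(b"critical config")
tmp.close()
baseline = {tmp.name: sha256_of(tmp.name)}
print(verify(baseline))  # []
os.unlink(tmp.name)
```

Because the baseline lives offline, a mismatch during an investigation is strong evidence of tampering rather than of a corrupted reference.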
After the hacker has a list of software running on the systems, they will start researching the Internet for vulnerabilities associated with that software. Numerous websites provide information on vulnerabilities in specific programs and operating systems. Understanding how hackers navigate systems is important because system administrators and security personnel can use the same steps to research potential vulnerabilities before a hacker strikes. This information is valuable to administrators who need to know what problems exist and how to patch them.
Establishing an incident response team is an essential step in the preparation phase. Although the initial response to an incident may be handled by an individual, such as a system administrator, the complete handling of an incident typically takes an entire team. An incident response team is a group of people who prepare for and respond to any emergency incident, such as a natural disaster or an interruption of business operations. A computer security incident response team in an organization typically includes key skilled members who bring a wide range of skills to bear in the response effort. Incident response teams are common in corporations as well as in public service organizations.
Incident response team members ideally are trained and prepared to fulfill the roles required by the specific situation (for example, to serve as incident commander in the event of a large-scale public emergency). Incident response teams are frequently sized dynamically to the scale and nature of an incident, and as an incident grows and more resources are drawn into the event, command of the situation may shift through several phases. In a small-scale event, or in the case of a small firm, a volunteer or ad hoc team may be all that exists to respond. In cases where the incident spreads beyond the local control of the incident response team, higher-level resources from industry groups and government groups exist to assist. Establishing working relationships with these higher-level groups in advance is an important preparation step.
The incident response team is a critical part of the incident response plan. Team membership will vary depending on the type of incident or suspected incident but may include the following members:
Internal and external subject matter experts (SMEs)
Public affairs officer
Security office contact
Incident Response Team Questions
Well-executed plans are often well tested; when and how often do you test your response plans? How will your team operate undetected in an environment owned by the adversary? Do you have a backup, separate e-mail system that is external to the enterprise solution? Is it encrypted?
In determining the specific makeup of the team for a specific incident, there are some general points to think about. The team needs a leader, preferably a higher-level manager who has the ability to obtain cooperation from employees as needed. It also needs a computer or network security analyst, since the assumption is that the team will be responding to a computer security incident. Specialists may be added to the team for specific hardware or software platforms as needed. The organization’s legal counsel should be part of the team on at least a part-time or as-needed basis. The public affairs office should also be available on an as-needed basis, because it is responsible for formulating the public response should a security incident become public. The organization’s security office should also be kept informed. It should designate a point of contact for the team in case criminal activity is suspected. In this case, care must be taken to preserve evidence should the organization decide to push for prosecution of the individuals.
This is by no means a complete list because each organization is different and needs to evaluate what the best mixture is for its own response team. Whatever the decision, the composition of the team, and how and when it will be formed, needs to be clearly addressed in the preparation phase of the incident response policy.
To function in a timely and efficient manner, ideally a team has already defined a protocol or set of actions to perform to mitigate the negative effects of most common forms of an incident. One key and often-overlooked member of the incident response team is the business. It may be an IT system being investigated, but the data, processes, and value all belong to the business, and the business is the element that understands the risk and value of what is under attack. Having key, knowledgeable business members on the incident response team is a necessity to ensure that the security actions remain aligned with the business goals and objectives of the organization.
An incident response plan is documentation associated with the steps an organization performs in response to any situation determined to be abnormal in the operation of a computer system. The value of the plan lies in its ability to facilitate execution of the required response steps. Although individual causes may vary, there is a defined response methodology in the plan, and this guides responders to the correct actions. A well-documented and approved plan also assists in providing the necessary management permissions in advance, as opposed to lengthy decision cycles when the heat of an attack is on.
Two major elements play a role in determining the level of response. Information criticality is the primary determinant, and this comes from the data classification and the quantity of data involved. The loss of one administrator password is less serious than the loss of all of them, for example. The second factor involves a business decision on how this incident plays into current business operations. A series of breaches, whether minor or not, indicates a pattern that can have public relations and regulatory issues.
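The two factors above can be sketched as a simple scoring function. This is an illustrative sketch only: the classification names, weights, and thresholds are hypothetical and would need to be tuned to an organization's own data classification scheme.

```python
# Hypothetical sketch: deriving a response level from the two factors
# named above -- information criticality (classification plus quantity of
# data) and the business dimension (e.g., a pattern of repeat breaches).
# All values and thresholds here are illustrative, not a standard.

CLASSIFICATION_WEIGHT = {"public": 1, "internal": 2, "confidential": 3, "restricted": 4}

def response_level(classification: str, records_involved: int, repeat_incident: bool) -> str:
    """Map an incident's characteristics to a response tier."""
    score = CLASSIFICATION_WEIGHT[classification]
    if records_involved > 1000:      # quantity of data raises criticality
        score += 2
    elif records_involved > 10:
        score += 1
    if repeat_incident:              # a pattern of breaches raises the stakes
        score += 1
    if score >= 5:
        return "high"
    if score >= 3:
        return "moderate"
    return "low"
```

In this scheme, the loss of a single low-sensitivity record stays a low-level response, while a large volume of restricted data escalates immediately, mirroring the administrator-password example above.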
To assist in the planning of responses and to group the myriad possible incidents into a manageable set of categories, one step of the incident response planning process is to define incident types/categories. Documented incident types/category definitions provide planners and responders with a set number of preplanned scripts that can be applied quickly, minimizing repetitive approvals and process flows. Examples of categories are interruption of a service, malicious communication, data exfiltration, malware delivery, phishing attack, and so on. This list should be customized to meet the IT needs of each firm.
It’s critical to define the roles and responsibilities of the incident response team members. These roles and responsibilities may vary slightly based on the identified categories, but defining them before an incident occurs empowers the team to perform the necessary tasks during the time-sensitive aspects of an incident. Permissions to cut connections, change servers, or start/stop services are common examples of predefined actions that are best defined in advance to prevent time-consuming approvals during an actual incident.
Planning the desired reporting requirements, including escalation steps, is an important part of the operational plan for an incident. Who will speak about the incident and to whom? How does the information flow? Who needs to be involved? When does the issue escalate to higher levels of management? These are all questions best handled in the calm of a pre-incident planning meeting where the procedures are crafted rather than determined on the fly as an incident is occurring.
Typically more than one person will respond to an incident. Defining the cyber-incident response team, including identifying key membership and backup members, is a task that needs to be done prior to an incident occurring. Once a response begins, trying to find personnel to do tasks only slows down the function and in many cases makes it unmanageable. The planning aspect of incident response needs to define who is on the team, whether it’s a dedicated team or a group of situational volunteers, and what their duties are.
You don’t really know how well a plan is crafted until it is tested. Exercises come in many forms and functions, and doing a tabletop exercise where planning and preparation steps are tested is an important final step in the planning process.
An incident is defined as a situation that departs from normal, routine operations. Distinguishing a routine anomaly from an incident that requires a formal response from the incident response team is an important triage step performed at the beginning of the discovery of an abnormal condition. A single failed login is technically an incident, but if it is followed by a correct login, it is of no consequence; in fact, this could even be considered normal. However, 10,000 failed attempts on a system, or failures across a large number of accounts, is distinctly different and may be worthy of further investigation.
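The failed-login example above can be expressed as a simple triage rule: a handful of failures is routine, but a large volume, or failures spread across many accounts, warrants escalation. The thresholds below are assumptions to be tuned per environment.

```python
from collections import Counter

# Illustrative triage rule for failed logins: escalate on either a large
# total volume of failures or failures spread across many distinct
# accounts. Both thresholds are assumptions, not standard values.

FAILURE_THRESHOLD = 100      # total failed attempts in the window
ACCOUNT_THRESHOLD = 20       # distinct accounts with failures

def needs_investigation(failed_logins: list) -> bool:
    """failed_logins is a list of account names, one entry per failed attempt."""
    per_account = Counter(failed_logins)
    return (len(failed_logins) >= FAILURE_THRESHOLD
            or len(per_account) >= ACCOUNT_THRESHOLD)
```

A single failed attempt by one user returns False (routine), while a spray across 25 accounts or 10,000 attempts against one account returns True and triggers escalation.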
Of course, an incident response team can't begin an investigation until a suspected incident has been detected. At that point, the detection phase of the incident response policy kicks in. One of the first jobs of the incident response team is to determine whether an actual security incident has occurred. Many things can be misinterpreted as a possible security incident. For example, a software bug in an application may cause a user to lose a file, and the user may blame this on a virus or similar malicious software. The incident response team must investigate each reported incident and treat it as a potential security incident until it can determine whether it is or isn't. This means that your organization will want to respond initially with a limited response team rather than waste the full team's time on a false alarm. This is the initial step to take when a report is received that a possible incident has been detected.
Security incidents can take a variety of forms, and who discovers the incident will vary as well. One of the groups most likely to discover an incident is the team of network and security administrators that runs devices such as the organization’s firewalls and intrusion detection systems.
Another common incident is a virus. Several packages are available that can help an organization detect potential virus activity or other malicious code. Administrators will often be the ones to notice something is amiss, but so might an average user who has been hit by the virus.
Social engineering is a common technique used by potential intruders to acquire information that may be useful in gaining access to computer systems, networks, or the physical facilities that house them. Anybody in the organization can be the target of a social engineering attack, so all employees need to know what to be looking for regarding this type of attack. In fact, the target might not even be one of your organization’s employees—it could be a contractor, such as somebody on the custodial staff or nighttime security staff. Whatever the type of security incident suspected, and no matter who suspects it, a reporting procedure needs to be in place for the employees to use when an incident is detected. Everybody needs to know who to call should they suspect something, and everybody needs to know what to do. A common technique is to develop a reporting template that can be supplied to an individual who suspects an incident so that the necessary information is gathered in a timely manner.
Detecting that a security event is occurring or has occurred is not necessarily an easy matter. In certain situations, such as the activation of a malicious payload for a virus or worm that deletes critical files, it will be obvious that an event has occurred. In other situations, such as where an individual has penetrated your system and has been slowly copying critical files without changing or destroying anything, the event may take a lot longer to detect. Often, the first indication that a security event has occurred might be a user or administrator noticing that something is “funny” about the system or its response.
As discussed previously, an incident is defined as any situation that departs from normal, routine operations. Whether an incident is important or not is the first point of decision in the incident response process. Identification is the decision that the information related to the incident warrants further investigation by the IR team and, in addition, the determination of which elements of the IR team are needed to respond. For example, an e-mail incident may require different response team members than an attack on web services or Active Directory (AD).
A key first step is the processing of information and the determination of whether to invoke incident response processes. Incident information can come from a wide range of sources, including logs, employees, help desk calls, system monitoring, security devices, and more. The challenge is to distinguish a genuine incident from the simple, routine errors that occur every day. When evidence accumulates, or in some cases when specific items such as security device logs indicate a potential incident, the next step is to escalate the situation to the incident response team.
Although there is no such thing as a typical incident, for any incident there is a series of questions that can be answered to form a proper initial response. Regardless of the source, the following items are important to determine during an initial response:
Current time and date
Who/what is reporting the incident
Nature of the incident
When the incident occurred
Point of contact for involved personnel
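The initial-response items above lend themselves to a standard intake record, in the spirit of the reporting template mentioned earlier in the chapter. The field names below are illustrative; a real template should follow the organization's own reporting procedure.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

# A minimal intake record capturing the initial-response items listed
# above. Field names are illustrative placeholders, not a standard form.

@dataclass
class IncidentReport:
    reporter: str           # who/what is reporting the incident
    nature: str             # nature of the incident
    occurred_at: str        # when the incident occurred (as reported)
    point_of_contact: str   # POC for involved personnel
    logged_at: str = field( # current time and date, captured automatically
        default_factory=lambda: datetime.now(timezone.utc).isoformat())
```

Capturing the log time automatically ensures the "current time and date" item is never forgotten in the rush of an initial response.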
Initial Response Errors
Mistakes such as these are common during initial response:
Failure to document findings appropriately
Failure to notify or provide accurate information to decision-makers
Failure to record and control access to digital evidence
Waiting too long before reporting
Underestimating the scope of evidence that may be found
The purpose of an initial response is to begin the incident response action and place it on a proper pathway toward success. The initial response must support the goals of the information security program. If something is critical, treating it as routine would be a mistake, so triage with respect to information criticality is important. The initial response must also be aligned with the business practices and objectives. Triage with respect to current business imperatives and conditions is important. The initial response actions need to be designed to comply with administrative and legal policies as well as to support decisions with regard to civil, administrative, or criminal investigations/actions. For these purposes, maintaining a forensically sound process from the beginning is important. It is also important that the information is delivered accurately and expeditiously to the appropriate decision-makers so that future actions can be timely. One of the greatest tools to achieve all of these goals is a simple and efficient process, so establishing fewer steps that are clear and clean is preferred. Complexity in the initial response process only leads to issues later because of delays, confusion, and incomplete information.
A cyber first responder must do as much as possible to control damage or loss of evidence. Obviously, as time passes, evidence can be tampered with or destroyed. Look around on the desk, on the Rolodex, under the keyboard, in desktop storage areas, and on cubicle bulletin boards for any information that might be relevant. Secure optical discs, flash memory cards, USB drives, tapes, and other removable media. Request copies of logs as soon as possible. Most ISPs will protect logs that could be subpoenaed. Take photos (some localities require the use of Polaroid photos because they are more difficult to modify without obvious tampering) or video. Include photos of operating computer screens and hardware components from multiple angles. Be sure to photograph internal components before removing them for analysis. The first responder can do much to prevent damage, but can also cause significant loss by altering digital evidence, even inadvertently. Collect data in a forensically sound manner (see Chapter 23 for details), and be sure to record time values so time offsets can be calculated.
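The time-offset note above can be made concrete: record the evidence system's clock against a trusted reference at collection time, then use that offset to normalize every timestamp pulled from the system. This is a minimal sketch, assuming ISO 8601 timestamps; real tooling would also account for time zones and clock drift over the collection window.

```python
from datetime import datetime, timedelta

# Sketch of the time-offset practice described above. Inputs are assumed
# to be ISO 8601 strings from the same (naive or aware) clock domain.

def clock_offset(evidence_clock: str, reference_clock: str) -> float:
    """Return evidence-system time minus trusted reference time, in seconds."""
    ev = datetime.fromisoformat(evidence_clock)
    ref = datetime.fromisoformat(reference_clock)
    return (ev - ref).total_seconds()

def normalize(evidence_timestamp: str, offset_seconds: float) -> datetime:
    """Convert an evidence-system timestamp to reference time."""
    return datetime.fromisoformat(evidence_timestamp) - timedelta(seconds=offset_seconds)
```

Recording the offset once at seizure means every log entry and file timestamp from that machine can later be placed on a common timeline with evidence from other systems.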
Common Technical Errors
Common technical mistakes during initial response include the following:
Altering time/date stamps on evidence systems
“Killing” rogue processes
Patching the system
Not recording the steps taken on the system
Not acting passively
Any of these activities can alter the state of the system, making the investigation more difficult, if not impossible.
Once the incident response team has determined that an incident has occurred and requires a response, the first step is to contain the incident and prevent it from spreading. If this is a virus or worm that is attacking database servers, then the protection of uninfected servers is paramount. Containment is the set of actions taken to constrain the incident to the minimal number of machines. This preserves as much of production as possible and ultimately makes handling the incident easier. This can be complex because in many cases to contain the problem, one has to fully understand the problem, its root cause, and the vulnerabilities involved.
Once the incident response team has determined that an incident most likely has occurred, it must attempt to quickly contain the problem. At this point or soon after containment begins, depending on the severity of the incident, management needs to decide whether the organization intends to prosecute the individual who has caused the incident (in which case collection and preservation of evidence is necessary) or simply wants to restore operations as quickly as possible without regard to possibly destroying evidence. In certain circumstances, management might not have a choice, such as if specific regulations or laws require it to report particular incidents. If management makes the decision to prosecute, specific procedures need to be followed in handling potential evidence. Individuals trained in forensics should be used in this case.
The incident response team must decide how to address containment as soon as it has determined that an actual incident has occurred. If an intruder is still connected to the organization’s system, one response is to disconnect from the Internet until the system can be restored and vulnerabilities can be patched. This, however, means that your organization is not accessible to customers over the Internet during that time, which may result in lost revenue. Another response might be to stay connected and attempt to determine the origin of the intruder. A decision will need to be made as to which is more important for your organization. Your incident response policy should identify who is authorized to make this decision.
Other possible containment activities might include adding filtering rules or modifying existing rules on firewalls, routers, and intrusion detection systems; updating antivirus software; and removing specific pieces of hardware or halting specific software applications. If an intruder has gained access through a specific account, disabling or removing that account may also be necessary.
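Scripting containment actions of the kind listed above in advance shortens response time. The sketch below is purely illustrative: the rule syntax and the directory command are generic placeholders, not the API of any real firewall or directory product.

```python
# Hypothetical sketch of pre-scripted containment: turning a list of
# attacker indicators into generic deny rules and account-disable
# actions for operator review. The command strings are illustrative
# placeholders, not a real product's syntax.

def containment_actions(attacker_ips: list, compromised_accounts: list) -> list:
    """Build an ordered, reviewable list of containment steps."""
    actions = []
    for ip in attacker_ips:
        # block further inbound traffic from the attacker's address
        actions.append(f"firewall add rule deny src {ip} any")
    for account in compromised_accounts:
        # cut off access through known-compromised credentials
        actions.append(f"directory disable-account {account}")
    return actions
```

Generating the steps as data rather than executing them directly preserves the approval point discussed earlier: the authorized decision-maker reviews the list before it is applied.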
Incident Response Team and Connection to SOC
In many organizations, a group called the Security Operations Center (SOC) exists to manage potential security incidents. Security monitoring controls across the enterprise forward information to the SOC for aggregation, assignment, and handling. The personnel in the SOC are set up in a series of tiers to efficiently handle issues requiring escalation:
Tier 1: Alert Analyst (monitors alerts and if needed escalates after analysis)
Tier 2: Incident Responder (performs investigations and remediation)
Tier 3: Subject Matter Experts (typically few in number, handle the tough cases)
Once the immediate problems have been contained, the incident response team needs to address the cause of the incident. If the incident is the result of a vulnerability that was not patched, the patch must be obtained, tested, and applied. Accounts may need to be disabled or passwords may need to be changed. Complete reloading of the operating system might be necessary if the intruder has been in the system for an unknown length of time or has modified system files. Determining when an intruder first gained access to your system or network is critical in determining how far back to go in restoring the system or network.
One method of isolating a machine is through a quarantine process. Quarantine is a process of isolating an object from its surroundings, preventing normal access methods. The machine may be allowed to run, but its connection to other machines is broken in a manner to prevent the spread of infection. Quarantine can be accomplished through a variety of mechanisms, including the erection of firewalls restricting communication between machines. This can be a fairly complex process, but if properly configured in advance, the limitations of the quarantine operation can allow the machine to continue to run for diagnostic purposes, even if it no longer processes a workload.
A more extreme response is device removal. In the event that a machine becomes compromised, it is simply removed from production and replaced. When device removal entails the physical change of hardware, this is a resource-intensive operation. The reimaging of a machine can be a time-consuming and difficult endeavor. The advent of virtual machines (VMs) changes this entirely, as the provisioning of virtual images on hardware can be accomplished in a much quicker fashion.
One key decision point in initial response is that of escalation. When a threshold of information becomes known to an operator and the operator decides to escalate the situation, the incident response process moves to a notification and escalation phase. Not all incidents are of the same risk profile, and incident response efforts should map to the actual risk level associated with the incident. When the incident response team is notified of a potential incident, its first steps are to confirm the existence, scope, and magnitude of the event and then respond accordingly. This is typically done through a two-step escalation process, where a minimal quick-response team begins and then adds members as necessitated by the issue.
Assessing the risk associated with an incident is an important first step. If the characteristics of an incident include a large number of packets destined for different services on a machine (an attack commonly referred to as a port scan), then the actions needed are different from those needed to respond to a large number of packets destined to a single machine service. Port scans are common, and to a degree relatively harmless, whereas port flooding can result in denial of service. Determining the specific downstream risks is important in prioritizing response actions.
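The two traffic patterns contrasted above differ in shape: a port scan shows many destination ports from one source, while a flood shows a high packet volume aimed at a single service. A minimal classifier, with assumed thresholds, might look like this:

```python
from collections import defaultdict

# Illustrative classifier for the two patterns described above: many
# destination ports from one source (port scan) versus a high packet
# count to one service (possible flood). Thresholds are assumptions.

def classify_traffic(packets: list, scan_ports: int = 100, flood_packets: int = 10000) -> str:
    """packets: iterable of (source_ip, destination_port) pairs."""
    ports_by_source = defaultdict(set)
    hits_by_port = defaultdict(int)
    for src, dport in packets:
        ports_by_source[src].add(dport)
        hits_by_port[dport] += 1
    if any(len(ports) >= scan_ports for ports in ports_by_source.values()):
        return "port scan"
    if any(count >= flood_packets for count in hits_by_port.values()):
        return "possible flood"
    return "normal"
```

The distinction matters for prioritization: the scan result might only be logged, while the flood result feeds the denial-of-service response path.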
The response to an incident will be highly dependent upon the particular circumstances of the intrusion. There are many paths one can take in the steps associated with an incident; the challenge is in choosing the best steps in each case. During the preparation stage, a wide range of scenarios can be examined, allowing time to formulate strategies. Even after an incident response team has planned a series of strategies to respond to various scenarios, determining how to employ those preplanned strategies to proper effect still depends on the circumstances of a particular incident. A variety of factors should be considered in the planning and deployment of strategies, including, but not limited to, the following:
How critical are the impacted systems?
How sensitive is the data?
What is the potential overall dollar loss involved/rate of loss?
How much downtime can be tolerated?
Who are the perpetrators?
What is the skill level of the attacker?
Does the incident have adverse publicity potential?
Playbooks are a set of step-by-step approved practices to aid analysts in beginning an investigation and/or responding to an uncommon event. Designed with checklists, and structured to match the automated systems data flows, playbooks reduce the cognitive load necessary to manage an incident. And because they have been planned and approved in advance, they alleviate the spur-of-the-moment scrambling when the clock is ticking on an active event. Playbooks are the best practice solution to managing incidents.
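A playbook can be represented as plain data so the checklist is versioned, approved, and machine-readable. The category and steps below are illustrative examples only, not an approved procedure.

```python
# A minimal playbook represented as data, in the checklist spirit
# described above. The category name and steps are illustrative.

PLAYBOOKS = {
    "phishing": [
        "Capture the reported message with full headers",
        "Search mail logs for other recipients of the same sender/subject",
        "Block the sender and any embedded URLs at the gateway",
        "Reset credentials for any user who clicked or replied",
        "Record all actions taken and notify the team leader",
    ],
}

def run_checklist(category: str) -> list:
    """Return the approved, pre-planned steps for an incident category."""
    return PLAYBOOKS.get(category, ["Escalate: no approved playbook for this category"])
```

Falling back to an explicit escalation step for unknown categories keeps the analyst inside an approved process even when no script exists for the event.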
These pieces of information provide boundaries for the upcoming investigations. There are still numerous issues that need to be determined with respect to the upcoming investigation. Addressing these issues helps provide focal points during the investigation.
Restore normal operations
Determine public relations play
  "To spin or not to spin?"
Determine probable attacker
  Internal: handle internally or prosecute?
  Involve law enforcement?
Determine type of attack
  DoS, theft, vandalism, policy violation?
Classify victim system
  Number of users?
  What other systems are affected?
Using the answers to these questions helps the team determine the necessary steps in the upcoming investigation phase. Although it is impossible to account for all circumstances, this level of strategy can greatly assist in scoping the work ahead during the investigation phase.
Investigation Best Practice
The first rule of incident response investigations is "Do no harm." If the investigation itself causes issues for the business, how, from a business perspective, is this different from the original attack? In fact, in advanced threats, the attackers take great care not to impact the system or business operations in any way that could lead to their discovery. It is important for the response team to exercise extreme caution and do no harm, lest it make future investigations impractical or deemed not worth pursuing.
The true investigation phase of an incident is a multistep, multiparty event. With the exception of very simple events, most incidents will involve multiple machines and potentially impact the business in multiple ways.
The primary objective of the investigative phase is to make the following determinations:
What systems are affected
What was compromised
What was the vulnerability
Who did it (if possible to determine)
What are the recovery/remediation options
Although the list appears daunting, this is where the real work of incident response occurs. It will take a team effort, partly because of workload, partly because of specialized skills, and partly because the entire effort is performed in a race against time.
Duplication of drives is a common forensics process. It is important to have accurate copies and associated hash values so that any analysis is performed under proper conditions. Forensic disk duplication is necessary to ensure all data, including metadata, is properly captured and analyzed as part of the overall process.
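The duplicate-and-hash practice above can be sketched directly: hash the original image and the working copy, and analyze only if the digests match. This is a minimal sketch; real forensic duplication also records the hashes in the chain-of-custody documentation.

```python
import hashlib

# Sketch of forensic copy verification: the working copy is acceptable
# for analysis only if it is bit-for-bit identical to the original,
# which the matching SHA-256 digests demonstrate.

def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large disk images don't exhaust memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def verified_copy(original_path: str, copy_path: str) -> bool:
    """True only if the copy matches the original exactly."""
    return sha256_of(original_path) == sha256_of(copy_path)
```

Hashing both at duplication time, and again before analysis, demonstrates that the evidence has not changed between collection and examination.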
To monitor network flow data, including who is talking to whom, one source of information is NetFlow data. NetFlow is a protocol/standard for the collection of network metadata on the flows of network traffic. NetFlow v9 forms the basis of the IETF IPFIX standard, and it allows for unidirectional captures of communication metadata. NetFlow can identify both common and unique data flows, and in the case of incident response, the new and unique flow patterns are typically of most interest to incident responders.
A flow is unidirectional, so a bidirectional conversation is recorded as two separate flows. NetFlow data is defined by these seven unique keys:
Source IP address
Destination IP address
Source port
Destination port
Layer 3 protocol
TOS byte (DSCP)
Input interface (ifIndex)
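Aggregating packets into flows by the NetFlow key fields can be sketched in a few lines. The dictionary field names below are descriptive placeholders, not an actual NetFlow record format.

```python
from collections import Counter

# Illustrative aggregation of packets into unidirectional flows keyed
# by the NetFlow seven-tuple. Field names are placeholders; a real
# exporter emits binary NetFlow/IPFIX records, not dicts.

def flow_key(pkt: dict) -> tuple:
    return (pkt["src_ip"], pkt["dst_ip"], pkt["src_port"], pkt["dst_port"],
            pkt["protocol"], pkt["tos"], pkt["input_ifindex"])

def aggregate_flows(packets: list) -> Counter:
    """Count packets per flow; a reply has swapped addresses, so it is a separate flow."""
    return Counter(flow_key(p) for p in packets)
```

Note that a request and its reply land in two different flows, illustrating the unidirectional point above.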
Once a problem has been contained to a set footprint, the next step is eradication. Eradication involves removing the problem, and in today’s complex system environment, this may mean rebuilding a clean machine. A key part of operational eradication is the prevention of reinfection. Presumably, the system that existed before the problem occurred would be prone to a repeat infection; thus, this needs to be specifically guarded against. One of the strongest value propositions for virtual machines is the ability to rebuild quickly, making the eradication step relatively easy.
After the issue has been eradicated, the recovery process begins. At this point, the investigation is complete and documented. Recovery is the returning of the asset into the business function. Eradication, the previous step, removed the problem, but in most cases the eradicated system will be isolated. The recovery process includes the steps necessary to return the systems and applications to operational status.
Recovery is an important step in all incidents. One of the first rules is to not trust a system that has been compromised, and this includes all aspects of an operating system. Whether there is known destruction or not, the safe path is one where the recovery step includes reconstruction of affected machines. Recovery efforts from an incident involve several specific elements. First, the cause of the incident needs to be determined and resolved. This is done through an incident response mechanism. Attempting to recover before the cause is known and corrected will commonly result in a continuation of the problem. Second, the data, if sensitive and subject to misuse, needs to be examined in the context of how it was lost, who would have access, and what business measures need to be taken to mitigate specific business damage as a result of the release. This may involve the changing of business plans if the release makes them suspect or subject to adverse impacts.
Recovery can be a two-step process. First, the essential business functions can be recovered, enabling business operations to resume. The second step is the complete restoration of all services and operations. Because of the difficulty and uncertainty involved in repairing systems, most best practices today involve reconstituting the underlying system and then transferring the operational data. Staging the recovery operations in a prioritized fashion allows a graceful return to an operating condition.
Restoration can be done in a wide variety of ways. For many systems, the reconstitution of a clean operating system can restore a system. This type of restoration requires a significant amount of preparation. Having a clean version of each of your assets provides for this type of restoration effort. Recovery sounds simple, but in large-scale incidents, the number of machines can be significant. Add to this the chance of reinfection as machines are restored. This means that simply replacing the machine with a clean machine is not sufficient; rather, the replacement needs protection against reinfection.
The other challenge in large-scale recovery events is the sequencing of the effort. When there are many machines to be restored and the restoration process takes time and resources, scheduling is essential. Setting up a prioritized schedule is one of the steps that needs to be considered in the planning process. The time to do this type of planning is before the hectic pace of an incident occurs.
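The prioritized restoration schedule described above amounts to a sort: essential business functions first, and among equal priorities, the quickest restorations first so something returns to service early. The priority values and tie-breaking rule below are illustrative planning assumptions.

```python
# Sketch of a prioritized recovery schedule, per the planning note
# above. Lower priority number = restore sooner; ties broken by the
# shorter restoration time. Both conventions are assumptions.

def recovery_schedule(machines: list) -> list:
    """machines: [{'name': ..., 'priority': int, 'restore_minutes': int}, ...]"""
    ordered = sorted(machines, key=lambda m: (m["priority"], m["restore_minutes"]))
    return [m["name"] for m in ordered]
```

Producing this ordering in the planning phase, before the hectic pace of an incident, is exactly the advance scheduling the paragraph above calls for.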
There are many different incident response processes in the information security space. For the CompTIA Security+ exam, you should know the steps of their process: preparation, identification, containment, eradication, recovery, and lessons learned.
A key aspect in many incidents is that of external communications. Having a communications expert who is familiar with dealing with the press and has the language nuances necessary to convey the correct information and not inflame the situation is essential to the success of any communication plan. Many firms attempt to use their legal counsel for this, but generally speaking, the legally precise language used by an attorney is not useful from a PR standpoint, and a more nuanced communicator may provide a better image. In many cases of crisis management, it is not the crisis that determines the final costs but the reaction to and communication of details after the initial crisis.
After the system has been restored, the incident response team creates a report of the incident. Detailing what was discovered, how it was discovered, what was done, and the results, this report acts as a corporate memory and can be used for future incidents. Having a knowledge base of previous incidents and the actions used is a valuable resource because it is in the context of the particular enterprise. These reports also allow a mechanism to close the loop with management over the incident and, most importantly, provide a road map of the actions that can be used in the future to prevent events of identical or similar nature.
Part of the report will be recommendations, if appropriate, to change existing policies and procedures, including disaster recovery and business continuity. The similarity in objectives makes a natural overlap, and the cross-pollination between these operations is important to make all processes as efficient as possible.
A post-mortem session should collect lessons learned and assign action items to correct weaknesses and suggest improvements. As the famous saying goes, those who fail to learn from history are destined to repeat it. The lessons learned effort serves two distinct purposes: the first is to determine what went wrong and allowed the incident to occur in the first place; the second is to recognize that failing to close that gap all but guarantees a repeat.
Once the excitement of the incident is over and operations have been restored to their pre-incident state, it is time to take care of a few last items. Senior-level management must be informed about what occurred and what was done to address it. An after-action report should be created to outline what happened and how it was addressed. Recommendations for improving processes and policies should be incorporated so that a repeat incident will not occur. If prosecution of the individual responsible is desired, additional time will be spent helping law enforcement agencies and possibly testifying in court. Training material may also need to be developed or modified as part of the new or modified policies and procedures.
In the reporting process, a critical assessment of what went right, what went wrong, what can be improved, and what should be continued is prepared as a form of lessons learned. This is a critical part of self-improvement and is not meant to place blame but rather to assist in future prevention. Having things go wrong in a complex environment is part of normal operations; having repeat failures that are preventable is not. The key to the lessons learned section of the report is to make the necessary changes so that a repeat event will not occur. Because many incidents are a result of attackers using known methods, once the attack patterns are known in an enterprise and methods exist to mitigate them, then it is the task of the entire enterprise to take the necessary actions to mitigate future events.
The computer (cyber) incident response team (CIRT) is composed of the personnel who are designated to respond to an incident. The incident response plan should identify the membership and backup members prior to an incident occurring. Once an incident response begins, trying to find personnel to perform tasks only slows down the function and in many cases makes it unmanageable. Whether it's a dedicated team or a group of situational volunteers, the planning aspect of incident response needs to address who is on the team and what their duties are.
Management needs to appoint the team members and ensure that they have time to be prepared for service. The team leader is typically a member of management who fully understands both the enterprise IT environment and IR process because their job is to lead the team with respect to the process. Subject matter experts (SMEs) on the various systems that are involved provide the actual working portion of the team, often in concert with operational IT personnel for each system. The team is responsible for all phases of the incident response process, which was covered previously in the chapter.
A critical step in the incident response planning process is to define the roles and responsibilities of the incident response team members. These roles and responsibilities may vary slightly based on the identified categories of incident, but defining them before an incident occurs empowers the team to perform the necessary tasks during the time-sensitive aspects of an incident. Permissions to cut connections, change servers, and start/stop services are common examples of actions that are best defined in advance to prevent time-consuming approvals during an actual incident.
Several specific roles are common to all IR teams: the team leader, the team communicator, and an appropriate bevy of SMEs. The team leader manages the overall IR process and should be a member of management who can navigate the corporate chain of command. The team communicator is the spokesperson for the team to all other groups, inside and outside the company. IR team members are typically SMEs whose time is valuable and should be spent on task; the team communicator shields these members from the time-consuming press-interview portion as much as possible.
One really doesn’t know how well a plan is crafted until it is tested. Exercises come in many forms and functions, and doing a tabletop exercise where the planning and preparation steps are tested is an important final step in the planning process. Having a process and a team assembled is not enough unless the team has practiced the process on the systems of the enterprise.
As previously mentioned in the chapter, a tabletop exercise is one designed for the participants to walk through all the steps of a process, ensuring all elements are covered and that the plan does not forget a key dataset or person. This is typically a fairly high-level review, designed to uncover missing or poorly covered elements and gaps in communications, both between people and between systems. The tabletop exercise is a critical final step because it validates that the planning covered the needed elements. The steps in the exercise should be performed by the principal leaders of the business and IT functions to ensure that all steps are correct. Although this will take time from senior members, it hardly seems like overkill given that the exercise validates a process for operations determined to be vital to the business.
This exercise aspect is not a one-time thing; it should be repeated after major changes to systems that impact the continuity of the operations plan or other major changes such as personnel turnover. As such, major corporations regularly exercise these types of systems on a predetermined schedule, rotating through day and night shifts, primary and backup personnel, and various systems.
Walkthroughs examine the actual steps that take place associated with a process, procedure, or event. Walkthroughs are in essence a second set of eyes, where one party either explains or demonstrates the steps to perform a task while a second person observes. The observer’s job is to examine the activity for compliance with applicable policies and directives. Is the task being accomplished correctly in terms of the process? Are the proper controls, processes, and procedures being followed? Walkthroughs can be done on elements such as computer code, where the person who wrote the code shows it to others on the team and walks them through the program, line by line. Explaining how it works and showing how it is coded allows for others to examine both syntax and process flow and provide valuable feedback on the code before it is implemented in a project. Having a supervisor observe the process for any function enables an independent determination as to whether their actions are in line with corporate security policies. Because the person doing the work relies upon training and repetitive practice, a periodic walkthrough provides evidence that proper procedures are actually being followed. Walkthroughs are commonly used by audit personnel to ensure proper processes are being followed.
A simulation is an approximation of the operation of a process or system that is designed to represent the actual system operations over a period of time. The simulation can be used in place of systems or elements that are not practical to replicate during an exercise, such as a complex element like a chemical plant or a time-consuming activity like a backup operation. Simulations are used in exercises to provide context for the participants without the expense associated with the use of a real system.
The different types of exercise elements, tabletop exercises, walkthroughs, and simulations can be used together as part of an exercise package.
Stakeholders are the parties that have an interest in a process or the outcome of a process. Stakeholders can be internal or external to an organization. With respect to incident response scenarios, all levels of management and many different business functions can be involved internally, including corporate, legal, communications, liaisons with regulators, customer support elements, and the operations personnel. Externally, there can be issues that involve vendors and customers, and there may be reporting requirements to regulators and other outside groups. With this wide range of involved parties, having a structure to manage communication with the various stakeholders is important to keep them properly informed and to separate the communication tasks from the operational tasks associated with responding to the incident. Having a stakeholder management process, including defined personnel roles and responsibilities, is essential for the management of the stakeholders and their relationships during incidents.
Planning the desired reporting requirements, including escalation steps, is an important part of the operational plan for an incident. Who will talk for the incident response team and to whom, and what will they say? How does the information flow? Who needs to be involved? When does the issue escalate to higher levels of management? These are all questions best handled in the calm of a pre-incident planning meeting, where the procedures are crafted, rather than on the fly as an incident is occurring. A communication plan as part of the incident response effort that answers the preceding questions and defines responsibilities for communication is a key element to be developed during the preparation phase.
Reporting requirements can refer to industry, regulatory, and statutory requirements in addition to internal communications. Understanding the reporting requirements to external entities is part of the responsibility of the communications lead on the team. Having the correct information in the hands of the correct people at the correct time is an essential part of reporting, and a prime responsibility of the communications lead on the team.
A modern enterprise has many data sources that can aid in the proper running of the enterprise. Some of these sources contain normal operational data that represents a normal baseline. Other data elements indicate a departure from normal conditions. Collecting all of this data, and then processing it to determine normal and abnormal elements is done through the use of tools such as a security information and event management (SIEM) system or a security orchestration, automation, and response (SOAR) system. These systems assist the SOC personnel in managing the flow of data into streams that provide for the investigation of abnormal conditions.
Log files are a primary source of information during an investigation. Software can record in log files a wide range of information as it is operating. From self-health checks, to error-related data, to operational metadata supporting the events that are happening on a system, all this data ends up in log files. These log files act as a historical record of what happened on a system. Log files require configuration because if you don’t log an event when it happens, you can’t go back in time to capture it. By the same token, logging everything creates too much data—data that must be waded through during an investigation. The key is balance: record what you need to know to make determinations—no more, no less.
Networks are filled with equipment that can provide valuable log information. Firewalls, routers, load balancers, and switches can provide a wealth of information as to what is happening on the network. Network logs tend to have a duplication issue, as packets can traverse several devices, giving multiple, nearly identical records. Removing duplicate as well as extraneous data is the challenge with network logging, but the payoff can be big because proper logging can make tracing attackers easier.
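As a rough sketch of the de-duplication challenge, the snippet below collapses records that share the same connection 5-tuple within a short time window, treating them as one packet seen by several devices. The field names and the window size are illustrative assumptions, not taken from any particular product:

```python
from datetime import datetime, timedelta

def dedup_network_logs(records, window_seconds=2):
    """Collapse near-identical records seen on multiple network devices.

    Records with the same connection 5-tuple within the window are treated
    as duplicates of one packet traversing several devices.
    """
    seen = {}   # 5-tuple -> timestamp of last record kept
    kept = []
    for rec in sorted(records, key=lambda r: r["time"]):
        key = (rec["src_ip"], rec["dst_ip"], rec["src_port"],
               rec["dst_port"], rec["proto"])
        last = seen.get(key)
        if last is None or rec["time"] - last > timedelta(seconds=window_seconds):
            kept.append(rec)
            seen[key] = rec["time"]
    return kept

t0 = datetime(2024, 1, 1, 12, 0, 0)
logs = [
    {"time": t0, "src_ip": "10.0.0.5", "dst_ip": "8.8.8.8",
     "src_port": 50000, "dst_port": 53, "proto": "UDP", "device": "fw1"},
    {"time": t0 + timedelta(seconds=1), "src_ip": "10.0.0.5", "dst_ip": "8.8.8.8",
     "src_port": 50000, "dst_port": 53, "proto": "UDP", "device": "rtr1"},
]
print(len(dedup_network_logs(logs)))  # the two device records collapse to 1
```

Real de-duplication also has to cope with clock skew between devices, which is one reason time synchronization matters for network logging.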
Virtually every operating system creates system logs. These logs can provide a very detailed history of what actions were performed on a system. Login records that indicate failed logins can be important, but so can entries that show login success. Multiple failures followed by a success can be suspicious, especially when the number of failures and timing precludes a human operator typing. What about access permission failures? These can indicate an attempt to perform unauthorized activity. What about access successes? Logging these would swamp the database with a large number of irrelevant records. This is one of the challenges of logging things on a system—which logs produce meaningful answers and which just produce noise? Also, realize that the decision to log has to happen before an event occurs; in other words, you can’t go back and have a do-over if you fail to log a crucial piece of evidence.
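The brute-force pattern described above, a burst of failures followed by a success, can be sketched as a simple detector. The event format and the thresholds are illustrative assumptions:

```python
def flag_suspicious_logins(events, max_failures=5, window_seconds=10):
    """Flag a burst of failed logins followed by a success.

    events: time-ordered list of (timestamp_seconds, account, outcome).
    A run of failures faster than a human could plausibly type, ending in
    a success, is a classic indicator worth investigating.
    """
    flagged = []
    failures = {}  # account -> list of recent failure timestamps
    for ts, account, outcome in events:
        if outcome == "FAIL":
            failures.setdefault(account, []).append(ts)
        else:
            recent = [t for t in failures.get(account, [])
                      if ts - t <= window_seconds]
            if len(recent) >= max_failures:
                flagged.append((account, ts, len(recent)))
            failures[account] = []
    return flagged

# Six failures in six seconds, then a success: not human typing speed.
events = [(i, "admin", "FAIL") for i in range(6)] + [(6, "admin", "OK")]
print(flag_suspicious_logins(events))  # [('admin', 6, 6)]
```

Note that this only works if failed logins were configured to be logged in the first place, which is exactly the point about deciding what to log before the event occurs.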
Application logs are generated by the applications themselves as they run. Some applications provide extensive logging; others minimal or even no logging. Some applications allow configuration of what is logged; others do not. Many server applications—web servers, mail servers, and database servers—have extensive logging capability, including which user performed which action and when. Other systems merely log when they start and stop operations and may log errors.
Security logs are logs kept by the OS for metadata associated with security operations. In Microsoft Windows, literally hundreds of different events can be configured to write to the Security log—system starting, system shutting down, permission failures, logins, failed logins, changing the system time, a new process creation, scheduled task changes, and more. These logs can be important, but to be important they need to be tuned to collect the information needed. In Windows, this is typically done through group policy objects. The driving force for what needs to be recorded is the system’s audit policy, a statement about what records need to be kept.
The Windows Event Viewer is used to look at Windows logs. The System log displays information related to the operating system. The Application log provides data related to applications that are run on the system. The Security log provides information regarding the success and failure of attempted logins as well as security-related audit events.
Web servers respond to specific, formatted requests for resources with responses, whether in the form of a web page or an error—and all of this activity can be logged. Web servers are specifically deployed to do this task, but they are also targets of attacks—attacks that try to run malicious scripts, perform DDoS attacks, perform injection and cross-site scripting attacks, and more. Web log files can help identify when these activities are occurring.
DNS logs, when enabled, can contain a record for every query and response. This can be a treasure trove of information for an investigator because it can reveal malware calling out to its command-and-control server, or data transfers to non-company locations. Analysis of DNS logs can show IP addresses and domain names that your systems should be communicating with as well as ones they shouldn’t be communicating with. In cases where an attacker or malware is doing the communication, these communication channels may be next to invisible on the network, but the DNS system, as part of the network architecture, can log the activity. This is one of the reasons why DNS logs are some of the most valuable logs to import into a SIEM system.
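A minimal sketch of this kind of DNS log analysis, assuming log entries have already been reduced to (client, queried name) pairs and that an allowlist of expected domains exists:

```python
def unexpected_dns_queries(dns_log, allowed_suffixes):
    """Return queried domains not under any approved suffix.

    dns_log: iterable of (client_ip, queried_name) pairs, as might be
    exported from a resolver log into a SIEM (format is illustrative).
    """
    hits = []
    for client, name in dns_log:
        if not any(name == s or name.endswith("." + s) for s in allowed_suffixes):
            hits.append((client, name))
    return hits

log = [
    ("10.0.0.7", "mail.example.com"),
    ("10.0.0.7", "a3f9c1.badc2domain.xyz"),  # possible C2 beacon
]
print(unexpected_dns_queries(log, ["example.com", "windowsupdate.com"]))
```

Production analysis is more involved (reputation feeds, entropy checks for generated domains), but the basic idea is the same: the resolver sees traffic the endpoints try to hide.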
Authentication logs contain information about successful and failed authentication attempts. The most common source of authentication log information comes from the system’s security logs, but additional sources exist as well. With the expansion of multifactor authentication services, applications that manage second factors also have logs. These logs are important, as they can show anomalies such as proper primary login data but failed second-factor data, indicating that the primary authentication information may have been disclosed.
Dump files are copies of what was in memory at a point in time—typically a point when some failure occurred. Dump files can be created by the operating system (OS) when the OS crashes, and these files can be analyzed to determine the cause of the crash. Dump files can also be created by several utilities and then shipped off to a third party for analysis when an application is not behaving correctly. Dump files can contain a wide range of sensitive information, including passwords, cryptographic keys, and more. Care should be taken when handling dump files, and especially when sharing them for analysis. Several security vendors have tools that assist in the securing of sensitive information in dump files, but the risk of secret disclosure is still present. Because of the size and complexity involved in interpreting dump files, they are not a common investigative tool, except for narrow investigations such as why a system is crashing.
Attackers, on the other hand, love to get dump files and peruse them; therefore, setting systems to not persist dump files is common to prevent hackers from crashing a server and then coming back to get the subsequent dump file.
Voice over IP (VoIP) solutions and call manager applications enable a wide range of audio and video communication services over the Internet. These systems can log a variety of data, including call information such as the number called (to and from), time of the call, and duration of the call. These records are called call detail records (CDRs). When combined with video and audio systems using VoIP, these logs can be enhanced with information as to how the information was encoded, including the codecs involved and the resolutions.
The Session Initiation Protocol (SIP) is a text-based protocol used for signaling voice, video, and messaging applications over IP. SIP provides information for initiating, maintaining, and terminating real-time sessions. SIP traffic logs are typically in the SIP Common Log Format (CLF), which mimics web server logs and captures the details associated with a communication (such as to and from).
Syslog stands for System Logging Protocol and is a standard protocol used in Linux systems to send system log or event messages to a specific server, called a syslog server. Rsyslog is an open source variant of syslog that follows the syslog specifications but also provides additional features such as content-based filtering. Syslog-ng is another open source implementation of the syslog standard. Syslog-ng also extends the original syslog model with elements such as content filtering. A primary advantage of syslog-ng over syslog and rsyslog is that it can tag, classify, and correlate in real time, which can improve SIEM performance. For Linux-based systems, these implementations are the de facto standard for managing log files. As log files are one of the primary artifact sources, investigations make significant use of log files and syslog-captured data to build histories of what actually happened on a system.
Syslog, rsyslog, and syslog-ng all move data into log files on a log server. Rsyslog and syslog-ng both extend the original syslog standard by adding capabilities such as content filtering, log enrichment, and correlation of data elements into higher-level events.
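As a rough illustration of what these tools consume, the sketch below parses a classic BSD-syslog (RFC 3164) style line, splitting the leading priority value into its facility and severity components. Real collectors handle many more format variations:

```python
import re

def parse_syslog_line(line):
    """Parse a classic BSD-syslog (RFC 3164) style line.

    The <PRI> value encodes facility * 8 + severity; the rest of the
    header is a timestamp, a hostname, and the free-form message.
    """
    m = re.match(r"<(\d{1,3})>(\w{3} [ \d]\d \d\d:\d\d:\d\d) (\S+) (.*)", line)
    if not m:
        return None
    pri = int(m.group(1))
    return {
        "facility": pri // 8,
        "severity": pri % 8,
        "timestamp": m.group(2),
        "host": m.group(3),
        "message": m.group(4),
    }

rec = parse_syslog_line("<34>Oct 11 22:14:15 host1 su: 'su root' failed")
print(rec["facility"], rec["severity"])  # 34 = facility 4 (auth), severity 2 (critical)
```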
On Linux systems, the initial daemon that launches the system is called systemd. When systemd creates log files, it does so through the systemd-journald service. Journalctl is the command that is used to view these logs. To see the various command options for journalctl, you should consult the man pages on the system. Here is an example of a journalctl command to view logs for a given system service:
journalctl -u ssh
You should understand the differences between journalctl and syslog. Journalctl is the command to examine logs on a server. Syslog (and the variants rsyslog and syslog-ng) is used to move logs to a log server and sometimes to manipulate the log file entries in transit.
NXLog is a multiplatform log management tool designed to assist in the use of log data during investigations. This tool suite is capable of handling syslog-type data as well as other log formats, including Microsoft Windows. It has advanced capabilities to enrich log files through context-based lookups, correlations, and rule-based enrichments. NXLog has connectors to most major applications and can act as a log collector, forwarder, aggregator, and investigative tool for searching through log data. As logs are one of the most used data sources in investigations, tools such as NXLog can enable investigators to identify security issues, policy violations, and operational problems in systems.
Bandwidth monitors are utilities designed to measure network bandwidth utilization over time. Bandwidth monitors can provide information as to how much bandwidth is being utilized, by service type, and how much remains. Bandwidth monitors can log this information over time and provide a historical record of network congestion problems, including by type of traffic in quality of service–enforced networks.
NetFlow and sFlow are protocols designed to capture information about packet flows (that is, a sequence of related packets) as they traverse a network. NetFlow is a proprietary standard from Cisco. Flow data is generated by the network devices themselves, including routers and switches. The data that is collected and shipped off to data collectors is a simple set of metadata—source and destination IP addresses, source and destination ports, if any (ICMP, for example, doesn't use ports), and the protocol. NetFlow does this for all packets, while sFlow (sampled flow) performs statistical sampling. On high-throughput networks, NetFlow can generate large quantities of data that require de-duplication, but having all that data means even rare security-event packets are captured. sFlow is better suited for statistical traffic monitoring. Cisco added statistical sampling to NetFlow on its high-end infrastructure routers to deal with these traffic volumes.
Both NetFlow and sFlow collect packets from routers and switches. NetFlow data can be useful in intrusion investigations. sFlow is used primarily for traffic management, although it will help with DDoS attacks.
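To illustrate how little data a flow record carries and yet how useful it is, the sketch below aggregates NetFlow-style metadata into a "top talkers" view. The record fields are illustrative:

```python
from collections import Counter

def top_talkers(flow_records, n=2):
    """Aggregate NetFlow-style metadata by conversation.

    Each record carries only flow metadata: source and destination
    addresses and ports, protocol number, and a byte count. Summing
    bytes per (src, dst) pair gives a quick view of the biggest flows.
    """
    totals = Counter()
    for rec in flow_records:
        totals[(rec["src"], rec["dst"])] += rec["bytes"]
    return totals.most_common(n)

flows = [
    {"src": "10.0.0.5", "dst": "203.0.113.9", "sport": 44321, "dport": 443,
     "proto": 6, "bytes": 900_000},
    {"src": "10.0.0.5", "dst": "203.0.113.9", "sport": 44400, "dport": 443,
     "proto": 6, "bytes": 600_000},
    {"src": "10.0.0.8", "dst": "198.51.100.2", "sport": 5353, "dport": 53,
     "proto": 17, "bytes": 4_000},
]
print(top_talkers(flows))
```

In an intrusion investigation, an unexpectedly large conversation with an external address surfaced this way can point directly at data exfiltration.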
Internet Protocol Flow Information Export (IPFIX) is an IETF protocol that’s the answer to the proprietary Cisco NetFlow standard. IPFIX is based on NetFlow version 9 and is highly configurable using a series of templates. The primary purpose of IPFIX is to provide a central monitoring station with information about the state of the network. IPFIX is a push-based protocol, where the sender sends the reports and receives no response from the receiver.
Metadata is data about data. A file entry on a storage system has the file contents plus metadata, including the filename, creation, access, and update timestamps, size, and more. Microsoft Word files have the document contents and additional fields of associated metadata. JPEGs have the same fields of metadata, including the location of the capture and the device used to create the images. Tons of metadata exists on a system, and in many cases individual elements of metadata need to be correlated with other metadata to determine activities. Take, for example, when a USB is inserted into a system. This creates metadata, but for what user? Separate metadata can tell you who was logged in at that time. Collecting, analyzing, and correlating metadata are all part of almost every investigation.
Remember that everything digital contains metadata, and correlating metadata is a part of almost every investigation.
E-mail is half metadata, half message. For short messages, the metadata can be larger than the message itself. E-mail metadata is in the header of the e-mail and includes routing information, the sender, receiver, timestamps, subject, and other information associated with the delivery of the message. The header of an e-mail includes information for the handling of the e-mail between mail user agents (MUAs), mail transfer agents (MTAs), and mail delivery agents (MDAs), as well as a host of other details. The entire message is sent via plain ASCII text, with attachments included using Base64 encoding. The e-mail header provides all of the information associated with an e-mail as it moves from sender to receiver. E-mail is covered in depth in Chapter 17.
Mobile devices generate, store, and transmit metadata. Common fields include when a call or text was made, whether it was an incoming or outgoing transmission, the duration of the call or the text message’s length (in characters), and the phone numbers of the senders and recipients. Note that the message or audio signal is not part of the metadata, but how much can you get from the metadata alone? More than meets the eye. For example, numbers can be looked up, providing the identities of senders and receivers (such as a conversation with the doctor’s office, followed by a call from a pharmacy).
Other sources of metadata include things like Wi-Fi access points connected to, GPS data in application logs, whether the device has a camera, and EXIF data (discussed later in the “File” section).
The Web provides a means of moving information between browsers and servers. There are a variety of protocols involved and a variety of sources of metadata. The web pages themselves are full of metadata, and browsers store different metadata covering what pages were accessed and when. Browser metadata is a commonly used source of forensic information, because entries of what and when a browser has accessed data can be important. Did a user go to a specific web page? Did they use a web-based e-mail client, exposing actual e-mail information as well as the fact they used e-mail? How long were they on a site? If a user hits a site that displays an image tagged by one of the security appliances, did they stay on that page or immediately go to a different site? There can be a wealth of user behavior information with respect to web browsing.
File metadata comes in two flavors: system and application. The filesystem uses metadata to keep track of the filename as well as the timestamps associated with last access, creation, and last write. The system metadata will include items needed by the OS, such as ownership information, parent object, permissions, and security descriptors.
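A quick way to see filesystem metadata is `os.stat`, which exposes the size and the access/modify timestamps described above:

```python
import os
import tempfile
import time

# Create a scratch file, then read back its filesystem metadata.
with tempfile.NamedTemporaryFile(delete=False, suffix=".txt") as f:
    f.write(b"hello")
    path = f.name

st = os.stat(path)
print("size:", st.st_size)                   # size: 5
print("modified:", time.ctime(st.st_mtime))  # last-write timestamp
print("accessed:", time.ctime(st.st_atime))  # last-access timestamp
os.unlink(path)
```

Exactly which timestamps are maintained, and at what precision, varies by filesystem, which matters when building a forensic timeline.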
Application metadata in a file is part of the file data field and is used by the application. For instance, a Microsoft Word document contains a lot of metadata, including fields for author, company, number of times edited, last print time, and so on. Currently, Word has over 90 fields of metadata that can be used/modified by a user. A JPEG file, on the other hand, has metadata that’s typically expressed in the form of EXIF data. The Exchangeable image file (EXIF) format is a standard that defines the formats of image, audio, and metadata tags used by cameras, phones, and other digital recording devices. Common EXIF metadata can include the following:
The original filename
Capture and last edited date and timestamps (with varying precision)
GPS location coordinates (degrees of latitude and longitude)
A small thumbnail of the original image
The author’s name and copyright details
Device information, including manufacturer and model
Capture information, including lens type, focal range, aperture, shutter speed, and flash settings
EXIF data exists to assist applications that use these files and can be modified independently of the file contents.
Metadata is a very valuable source of information during an investigation. Understanding what type of information and detail are present in each major category is important.
A plethora of data is available in a system for collection and use. For this to be useful, investigators have to have a sense of what is available and over what time frames. Two models used to document and provide a basis of understanding are the collection inventory matrix (CIM) and the collection management framework (CMF).
The collection inventory matrix (CIM) is a simple method used to sort your data sources with respect to a specific investigation or threat hunt. A CIM is a simple matrix with data sources listed as rows; the columns indicate what the sources cover, such as enterprise, business unit, or enclave. At the intersection of a row and column is a simple qualitative measure: Have you used this source before? Is it easy or hard to use? Do you have access? Other commonly used differentiators include automated/not automated, level of completeness, and authority to collect (do you need special permission to access or use the source?). The purpose is to allow quick sorting when chasing your hypothesis; it is a form for prioritizing the sources. Once the matrix is populated, you can apply the typical heat-map colors (red, yellow, green) for high, medium, and low to make an easy-to-use chart of what can help your current hunt or incident.
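A CIM of this kind can be sketched as a small nested mapping, with qualitative scores mapped to the heat-map colors. The sources, scopes, and scores here are illustrative:

```python
def cim_heat_map(matrix):
    """Turn qualitative CIM scores into heat-map colors.

    matrix: {data_source: {scope: score}} where score runs from
    1 (hard to use / low value) to 3 (easy / high value).
    """
    colors = {1: "red", 2: "yellow", 3: "green"}
    return {src: {scope: colors[score] for scope, score in scopes.items()}
            for src, scopes in matrix.items()}

cim = {
    "DNS logs":     {"enterprise": 3, "enclave A": 2},
    "Endpoint EDR": {"enterprise": 2, "enclave A": 1},
}
heat = cim_heat_map(cim)
print(heat["DNS logs"]["enterprise"])   # green
print(heat["Endpoint EDR"]["enclave A"])  # red
```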
The collection management framework (CMF) is the tool used to maintain a record of what information sources you have available, as well as data about those sources. Again, the rows are the different information sources, probably arranged in some fashion that helps you navigate them, either by type or by location. The columns are the descriptors for the data sources, and these can vary from company to company. Common descriptors include what the source contains, where it is, who controls it, how long it is stored, and what it is good for (what part of the kill chain does it apply to?). The idea is to have a catalog of the data available to you that can help shape your threat hunting and incident response actions. Assume you found that 10 days ago a suspicious e-mail delivered a payload, and you suspect the actor employed a C2 server. Will your DNS logs help? If DNS logs are retained for 60 days, then yes. But if a forensic investigation uncovers a 90-day-old attack, those same logs no longer help. Understanding what the data sources cover, and for how long, is very helpful to the people doing the investigations. Because this information doesn't change much over time, having it catalogued ahead of time, as opposed to figuring it out as you go, speeds up investigations and removes a lot of frustration.
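The retention question in the example reduces to a simple lookup against a CMF-style catalog. The source names and retention periods are illustrative:

```python
def usable_sources(cmf, event_age_days):
    """Return the catalogued sources whose retention still covers the event.

    cmf: {source_name: retention_days}. A 10-day-old event is covered by
    a 60-day DNS log; a 90-day-old one is not.
    """
    return sorted(name for name, retention in cmf.items()
                  if retention >= event_age_days)

cmf = {"DNS logs": 60, "Firewall logs": 30, "E-mail gateway": 365}
print(usable_sources(cmf, 10))  # all three sources still cover the event
print(usable_sources(cmf, 90))  # only the e-mail gateway
```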
Many options are available to a team when planning and performing processes and procedures. For assistance in choosing a path, the team can consult both standards and best practices in the proper development of processes. From government sources to industry sources, there are many opportunities to gather ideas and methods, even from fellow firms.
The new standard of information security involves living in a state of compromise, where you should always expect that adversaries are active in your networks. It is unrealistic to expect that you can keep attackers out of your network. Operating in a state of compromise does not mean that you must suffer significant losses. A working assumption when planning for, responding to, and managing the overall incident response process is that the systems are compromised and that prevention cannot be the only means of defense.
The National Institute of Standards and Technology, a U.S. governmental entity under the Department of Commerce, produces a wide range of Special Publications (SPs) in the area of computer security. Grouped into several different categories, the most relevant SPs for incident response come from the Special Publications 800 series:
SP 800-61 Rev. 2: Computer Security Incident Handling Guide
SP 800-126 Rev. 2: NIST Security Content Automation Protocol (SCAP)
SP 800-137: Information Security Continuous Monitoring for Federal Information Systems and Organizations
SP 800-36: Guide to Selecting Information Technology Security Products
SP 800-40 Rev. 3: Guide to Enterprise Patch Management Technologies
SP 800-51 Rev. 1: Guide to Using Vulnerability Naming Schemes [CVE/CCE]
In April 2015, the U.S. Department of Justice’s Cybersecurity Unit released a best-practices document titled “Best Practices for Victim Response and Reporting of Cyber Incidents.” This document identifies steps to take before a cyber incident, the steps to take during an incident response action, a list of actions not to take, and what to do after the incident. The URL for the document can be found in the “For More Information” section at the end of the chapter.
What Not to Do as Part of Incident Response
The U.S. Department of Justice has two specific recommended steps for what not to do as part of an incident response action:
Do not use the compromised system to communicate.
Do not hack into or damage another network or system.
The victim organization should always assume that any communications across affected machines will be compromised. This eavesdropping is standard hacker behavior, and if you tip the attacker off to your actions, those actions can be countered before you regain control of your system. Hacking, even retaliatory hacking, is illegal, and given the difficulty of attribution, attempts to respond by hacking the hacker may accidentally result in hacking an innocent third-party machine.
An indicator of compromise (IOC) is an artifact left behind from computer intrusion activity. Detecting IOCs is a quick way to jump-start a response element. Originated by the security firm Mandiant, IOCs have spread in usage to a wide range of firms. IOCs act as a tripwire for responders. An IOC can be tied to a specific observable event, which then can be traced to related events, and to stateful events such as Registry keys. One of the biggest challenges in incident response is getting on the trail of an attacker, and IOCs provide a means of getting on the trail.
There are several standards associated with IOCs, but the three main ones are Cyber Observable Expression (CybOX), a method of information sharing developed by MITRE; OpenIOC, an open source initiative established by Mandiant that is designed to facilitate rapid communication of specific threat information associated with known threats; and the Incident Object Description Exchange Format (IODEF), an XML format specified in RFC 5070 for conveying incident information between response teams, both internally and externally with respect to organizations. The “For More Information” section at the end of the chapter provides URLs for all three standards.
Common Indicators of Compromise
Here are some common indicators of compromise:
Unusual outbound traffic This probably is the clearest indicator that data is going where it shouldn’t.
Geographical irregularities Communications going to countries with which no business ties exist are another key indicator that data is going where it shouldn’t.
Unusual login activity Failed logins, login failures to nonexistent accounts, and so forth, indicate compromise.
Anomalous usage patterns for privileged accounts Changes in patterns of when administrators typically operate and what they typically access indicate compromise.
Changes in database access patterns This indicates hackers are searching for data or reading it to collect large quantities.
Automated web traffic Timing can show some requests are scripts, not humans.
Change in HTML response sizes SQL injection can result in large HTML response sizes.
Large numbers of requests for specific files Numerous requests for specific files, such as join.php, may indicate automated attack patterns.
Mismatched port to application traffic This is a common method of attempting to hide activity.
Unusual DNS requests Command-and-control server traffic often uses unusual DNS requests.
Unusual Registry changes Unexpected Registry modifications indicate abnormal changes to system state, such as malware establishing persistence.
Unexpected patching Some hackers/malware will patch to prevent other hackers from entering a target.
Bundles of data/files in wrong place Large aggregations of data, frequently encrypted, may be files being prepared for exfiltration.
Changes to mobile device profiles Mobile is the new perimeter, and changes may indicate malware.
DDoS/DoS attacks Denial of service is often used as a smokescreen or distraction for other activity.
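The tripwire idea behind IOCs can be sketched in code. The following Python fragment checks simplified connection records against two of the indicators above, unusual DNS requests and geographical irregularities. The indicator values, country list, and log record format are all invented for illustration; a real deployment would load IOCs from a shared feed such as OpenIOC or STIX.

```python
# Hypothetical IOC tripwire: match simplified log records against known
# indicators. The domains, countries, and record format are invented.
IOC_DOMAINS = {"c2.badexample.net", "updates.evil-example.org"}  # suspected C2 domains
BUSINESS_COUNTRIES = {"US", "CA", "GB"}                          # where business ties exist

def check_record(record: dict) -> list:
    """Return a list of IOC matches for one connection record."""
    hits = []
    if record.get("dns_query") in IOC_DOMAINS:
        hits.append("unusual DNS request: " + record["dns_query"])
    if record.get("dest_country") not in BUSINESS_COUNTRIES:
        hits.append("geographical irregularity: " + record["dest_country"])
    return hits

logs = [
    {"dns_query": "www.example.com", "dest_country": "US"},
    {"dns_query": "c2.badexample.net", "dest_country": "KP"},
]
alerts = [hit for rec in logs for hit in check_record(rec)]
```

Matching against a static set is deliberately simple; the point of an IOC is exactly this kind of cheap, automatable check that gets a responder on the attacker's trail.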
All data that is stored is subject to breach or compromise. Given this assumption, the question becomes, what is the best mitigation strategy to reduce the risk associated with breach or compromise? Data requires protection in each of the three states of the data lifecycle: in storage, in transit, and during processing. The level of risk in each state differs because of several factors.
Time Data tends to spend more time in storage and hence is subject to breach or compromise over longer time periods.
Quantity Data in storage tends to present a greater quantity available for breach or compromise than data in transit, and data in processing presents even less. If records are compromised during processing, only those records currently being processed are at risk.
Access Different protection mechanisms exist in each of the domains, and this has a direct effect on the risk associated with breach or compromise. Operating systems tend to have very tight controls to prevent cross-process data issues such as error and contamination.
The other aspect of risk during processing is in-process access to the data, a channel that a variety of attack techniques target specifically. Data in transit is subject to breach or compromise from a variety of network-level attacks and vulnerabilities. Some of these are under the control of the enterprise, and some are not.
One primary mitigation step is data minimization. Data minimization efforts can play a key role in both operational efficiency and security. One of the first rules associated with data is this: Don’t keep what you don’t need. A simple example of this is the case of spam remediation. If spam is separated from e-mail before it hits a mailbox, one can assert that it is not mail and not subject to storage, backup, or data retention issues. As spam can comprise greater than 50 percent of incoming mail, spam remediation can dramatically improve operational efficiency in terms of both speed and cost.
This same principle holds true for other forms of information. When credit card transactions are processed, certain data elements are required to complete the transaction, but once the transaction is approved, they have no further business value. Storing them therefore provides no benefit, yet it does represent a risk in the case of a data breach. Data storage should be governed not by what you can store but by the business need to store. What is not stored is not subject to breach, and minimizing storage to only what the business requires reduces both risk and cost to the enterprise.
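The "don't keep what you don't need" rule can be applied mechanically at storage time. This sketch drops the card verification code and other approval-only elements before a transaction record is persisted. The field names are hypothetical, not drawn from any real payment schema.

```python
# Hypothetical data-minimization step: keep only the fields with ongoing
# business value before persisting a transaction record. Field names are
# illustrative, not from a real payment standard.
RETAIN_FIELDS = {"transaction_id", "amount", "timestamp", "approval_code"}

def minimize(transaction: dict) -> dict:
    """Return a copy containing only the fields needed after approval."""
    return {k: v for k, v in transaction.items() if k in RETAIN_FIELDS}

raw = {
    "transaction_id": "T-1001",
    "amount": 42.50,
    "timestamp": "2024-05-01T10:30:00Z",
    "approval_code": "A77",
    "card_number": "4111111111111111",  # needed only to obtain approval
    "cvc": "123",                       # must never be stored post-approval
}
stored = minimize(raw)  # cvc and card number never reach storage
```

Because the sensitive elements never reach storage, they cannot be exposed by a later breach of the data store.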
Data breaches may not be preventable, but they can be mitigated through minimization and encryption efforts.
Minimization efforts begin before data even hits a system, let alone a breach. During system design, the appropriate security controls are determined and deployed, with periodic audits to ensure compliance. These controls are based on the sensitivity of the information being protected. One tool that can be used to assist in the selection of controls is a data classification scheme. Not all data is equally important, nor is it equally damaging in the event of loss. Developing and deploying a data classification scheme can assist in preventative planning efforts when designing security for data elements.
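A data classification scheme can be expressed directly in code so that control selection follows from the label rather than from ad hoc decisions. The labels and control lists below are hypothetical examples of such a scheme, not any standard; a real mapping would come from the organization's security policy.

```python
from enum import Enum

class Classification(Enum):
    """Hypothetical four-level data classification scheme."""
    PUBLIC = 1
    INTERNAL = 2
    CONFIDENTIAL = 3
    RESTRICTED = 4

# Illustrative mapping from label to minimum required controls; higher
# sensitivity accumulates the controls of lower levels plus its own.
REQUIRED_CONTROLS = {
    Classification.PUBLIC:       [],
    Classification.INTERNAL:     ["access control"],
    Classification.CONFIDENTIAL: ["access control", "encryption at rest"],
    Classification.RESTRICTED:   ["access control", "encryption at rest",
                                  "encryption in transit", "audit logging"],
}

def controls_for(label: Classification) -> list:
    """Look up the minimum controls a data element's label requires."""
    return REQUIRED_CONTROLS[label]
```

Driving control selection from the label makes the periodic compliance audit mentioned above straightforward: compare each system's deployed controls against `controls_for` its most sensitive data.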
MITRE, working together with partners from government, industry, and academia, has created a set of techniques (called Making Security Measurable) to improve the measurability of security. This is a comprehensive effort, including registries of specific baseline data, standardized languages for the accurate communication of security information, and formats and standardized processes to facilitate accurate and timely communications.
The entirety of the project is beyond the scope of this text, but Table 22.1 lists some of the common items by category, a few of which are described next in a bit more detail.
Table 22.1 Sample Elements of Making Security Measurable
MITRE has continued its efforts in the process of making security measurable and adding automation to the mix. Structured Threat Information Expression (STIX) is a structured language for cyberthreat intelligence information. MITRE created Trusted Automated Exchange of Indicator Information (TAXII) as the main transport mechanism for cyberthreat information represented by STIX. TAXII services allow organizations to share cyberthreat information in a secure and automated manner.
Cyber Observable Expression (CybOX) is a standardized schema for the communication of observed data from the operational domain. Designed to streamline communications associated with incidents, CybOX provides a means of communicating key elements, including event management, incident management, and more, in an effort to improve interoperability, consistency, and efficiency.
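To make the STIX/TAXII discussion concrete, here is a sketch of a STIX 2.x-style indicator object built as plain JSON. The identifier, timestamps, and pattern value are invented, and the STIX specification defines the authoritative set of required properties, so treat this as an illustration of the shape rather than a compliant producer.

```python
import json

# Sketch of a STIX 2.x-style indicator; the id, timestamps, and pattern
# value below are invented for illustration.
indicator = {
    "type": "indicator",
    "spec_version": "2.1",
    "id": "indicator--d81f86b9-975b-4c0b-875e-810c5ad45a4f",
    "created": "2024-05-01T10:30:00.000Z",
    "modified": "2024-05-01T10:30:00.000Z",
    "name": "Suspected C2 domain",
    "pattern": "[domain-name:value = 'c2.badexample.net']",
    "pattern_type": "stix",
    "valid_from": "2024-05-01T10:30:00Z",
}

# Serialize to JSON, the form a TAXII service would transmit to peers.
payload = json.dumps(indicator)
```

The machine-readable `pattern` field is what lets a receiving organization turn shared intelligence directly into a detection rule, which is the automation TAXII is meant to enable.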
Data retention is the storage of data records. One of the first steps in understanding data retention in an organization is the determination of what records require storage and for how long. Among the many reasons for retaining data, some of the most common are for the purposes of billing and accounting, contractual obligation, warranty history, and compliance with local, state, and national government regulations, such as IRS rules. Maintaining data stores for longer than is required is a source of risk, as is not storing the information long enough. Some information is subject to regulations requiring lengthy data retention, such as PHI for workers who have been exposed to specific hazards. Some data elements, such as the card verification code (CVC/CV2) element in a credit card transaction, are never stored as part of a transaction record. They are used for approval and destroyed to prevent loss after the transaction is concluded.
Failure to maintain retained data in a secure state is also a retention issue, as is failing to retain the data at all. In some cases, destruction of data, specifically data subject to a legal hold in a legal matter, can result in adverse court findings and sanctions. Even if the destruction is unintentional or inadvertent, it is still subject to sanction, as the firm had a responsibility to protect the data. Legal hold can add significant complexity to data retention efforts, as it can force separate storage of the data until the legal issues are resolved. Once data is on the legal hold track, its retention clock does not expire until the hold is lifted. This makes identifying, labeling, and maintaining data subject to a legal hold an added dimension to normal storage considerations.
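The interaction between a retention clock and a legal hold reduces to a simple rule: a record is purgeable only when its retention period has elapsed and no hold applies. The sketch below encodes that rule with invented record types and retention periods.

```python
from datetime import date, timedelta

# Hypothetical retention schedule; real periods come from regulation,
# contract, and business need.
RETENTION = {
    "billing": timedelta(days=7 * 365),
    "web_logs": timedelta(days=90),
}

def purgeable(record_type: str, created: date, on_legal_hold: bool,
              today: date) -> bool:
    """A record may be purged only when its retention period has
    elapsed AND it is not under legal hold."""
    if on_legal_hold:  # a hold stops the retention clock entirely
        return False
    return today - created >= RETENTION[record_type]

today = date(2024, 6, 1)
recent = purgeable("web_logs", date(2024, 5, 1), False, today)  # inside 90 days
held = purgeable("billing", date(2000, 1, 1), True, today)      # expired but held
```

The legal-hold check comes first by design: as the text notes, the retention clock does not expire while a hold is in place, no matter how old the record is.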
Data retention policies differ by organization. However, some information such as PHI may be subject to regulations requiring specific data retention rules.
The DOJ’s “Best Practices for Victim Response and Reporting of Cyber Incidents” www.justice.gov/criminal-ccips/file/1096971/download
Incident Object Description Exchange Format (IODEF) https://tools.ietf.org/html/rfc5070
Making Security Measurable https://makingsecuritymeasurable.mitre.org/
After reading this chapter and completing the exercises, you should understand the following about incident response.
There are several key attack frameworks an IR person needs to understand, including the anatomy of an APT attack, the Cyber Kill Chain, and the MITRE ATT&CK framework.
The Diamond Model of Intrusion Analysis helps responders classify different attacks.
Threat intelligence is understanding what threats are likely for the enterprise.
Threat hunting is a proactive examination of a system looking for specific threats.
The role of incident management is the control of a coordinated and comprehensive response to an incident.
Learn the anatomy of an attack—both old versions and newer APT-style attacks.
The goals of incident response in an organization are to restore systems to functioning order and prevent future risk.
The major steps in the incident response process are preparation, incident identification, initial response, incident isolation, strategy formulation, investigation, recovery, reporting, and follow-up.
Develop a detailed understanding of the components of each of the steps.
Understand the linkages and interconnections between key process steps.
Enterprises have a wide range of data sources available for use in incident response, including log files from a wide range of sources.
Understanding where data sources are and how they can be used is critical for incident responders.
Metadata is another source of valuable information for IR personnel, and it comes from items such as e-mail, mobile devices, the Web, and file metadata.
There are two main data collection models: the collection inventory matrix and the collection management framework.
Modern systems should expect to exist in a state of compromise and have policies and processes designed to operate under these conditions.
The U.S. government, including NIST and the Department of Justice, has published useful guidance.
Indicators of compromise provide early-warning triggers for incident response investigators.
Taking actions against an incident in progress can be planned using a Cyber Kill Chain philosophy.
The “Making Security Measurable” material from MITRE can assist in the incident response process.
advanced persistent threat (APT)
collection inventory matrix (CIM)
collection management framework (CMF)
computer emergency response team (CERT)
computer incident response team (CIRT)
Cyber Kill Chain
Cyber Observable Expression (CybOX)
data minimization
Diamond Model of Intrusion Analysis
incident response
incident response plan
incident response policy
indicator of compromise (IOC)
information criticality
initial response
lateral movement
lessons learned
MITRE ATT&CK framework
remote-access trojan (RAT)
reporting requirements
security orchestration, automation, and response (SOAR)
Structured Threat Information Expression (STIX)
threat hunting
threat intelligence
Trusted Automated Exchange of Indicator Information (TAXII)
Use terms from the Key Terms list to complete the sentences that follow. Don’t use the same term more than once. Not all terms will be used.
1. A(n) _______________ is any event in an information system or network where the results are different than normal.
2. When the attackers are focused on maintaining a presence during an incident, the type of attack is typically called a(n) _______________.
3. The determination of boundaries during an attack is a process called _______________.
4. The steps an organization performs in response to any situation determined to be abnormal in the operation of a computer system are called _______________.
5. One methodology for planning incident response defenses is known as _______________.
6. A(n) _______________ is an artifact that can be used to detect the presence of an attack.
7. The document that contains all the information about various data sources available to incident responders is referred to as the _______________.
8. _______________ is a proactive approach to finding an attacker in a network.
9. A key measure used to prioritize incident response actions is ________________.
10. _______________ and _______________ are used to communicate cyberthreat information between organizations.
1. Which of the following is not an indicator of compromise (IOC)?
A. Unusual outbound traffic
B. Increase in traffic over port 80
C. Traffic to unusual foreign IP addresses
D. Discovery of large encrypted data blocks that you don’t know the purpose of
2. A sysadmin thinks a machine is under attack, so he logs in as root and attempts to see what is happening on the machine. Which common technical mistake is most likely to occur?
A. The alteration of date/time stamps on files and objects in the system
B. Failure to recognize the attacker by process ID
C. Erasure of logs associated with an attack
D. The cutting of a network connection between an attacker and the current machine
3. What is the last step of the incident response process?
D. Lessons learned
4. Which of the following are critical elements in an incident response toolkit? (Choose all that apply.)
A. Accurate network diagram
B. Findings of last penetration test report
C. List of critical data/systems
D. Phone list of people on-call by area
5. Your organization experienced an APT hack in the past and is interested in preventing a reoccurrence. What step of the attack path is the best step at which to combat APT-style attacks?
A. Escalate privilege
B. Establish foothold
C. Lateral movement
D. Initial compromise
6. The goals of an incident response process include all of the following except which one?
A. Confirm or dispel an incident occurrence.
B. Minimize security expenditures.
C. Protect privacy rights.
D. Minimize system disruption.
7. During an initial response to an incident, which of the following is most important?
A. Who or what is reporting the incident
B. The time of the report
C. Who takes the initial report
D. Accurate information
8. When determining the level of risk of exposure for data in storage, in transit, or during processing, which of the following is not a factor?
C. Data type
9. What is the most useful tool to determine the next steps when investigating a common incident, like malware on a server?
B. SIEM data
D. Security orchestration, automation, and response (SOAR)
10. Which of the following activities should you not do during an incident response investigation associated with an APT?
A. Use the corporate e-mail system to communicate.
B. Determine system time offsets.
C. Use only qualified and trusted tools.
D. Create an off-network site for data collection.
1. The chief financial officer (CFO) sees you in the lunch room. Knowing that you are leading the company’s incident response initiative, she comes over to your table and asks if you have time to answer a question. You are surprised but say yes. Her question is simple and to the point: “Can you explain this incident response thing to me, in nontechnical terms, so I can respond appropriately at the next board meeting in the discussion?” In response, you offer to prepare a written outline for the CFO. In one page, outline the major points that need to be addressed and give examples in language suitable for the audience.
2. Explain the relationship between the anatomy of a hack and indicators of compromise.