Investigating a cyber breach can be a complex and often long process. The level of sophistication of the attacker, dwell time on the compromised network, and how far the attacker progressed through the cyberattack lifecycle directly contribute to the complexity of the investigation.
Incident responders follow a lifecycle-based process to investigate cyber breaches. The process typically includes multiple iterations of data collection and analysis in order to identify evidence of attacker activity. In some cases, the evidence is indisputable. In other cases, analysts must correlate and corroborate data from multiple sources to draw conclusions about attacker activity.
An investigation consists of many components that must work in tandem in order to determine the scope and extent of a breach. Investigating a large-scale breach requires a team of professionals with complementary skill sets, such as digital forensics, malware analysis, and cyber threat intelligence (CTI). Moreover, the order in which analysts discover evidence of attacker activity may not correspond to the order of the attack progression. For this reason, it is helpful to map digital evidence to a cyberattack framework in order to identify additional sources of potential evidence and reconstruct the “full picture” of an attack.
The evidence that analysts identify is crucial for containment and eradication. A remediation team uses information relating to the attacker operation to secure crucial assets and eradicate the attacker from the compromised environment.
This chapter draws on the information discussed thus far in this book, and it provides an in-depth discussion about investigating and remediating cyber breaches. The primary focus of this chapter is on large-scale network compromises. However, the reader can easily use the information presented to investigate smaller incidents as well.
Historically, enterprises leveraged traditional computer forensics to determine attacker activity on a compromised system. However, scalability was a significant challenge associated with this approach. Furthermore, the methodologies that underpinned traditional computer forensics focused on investigating computer crimes and were not flexible enough to support large-scale investigations in enterprise environments.
With the evolution of live response tools, enterprise incident response emerged. This approach combines digital forensics, live response technology, and CTI to allow organizations to respond to large-scale incidents efficiently. To facilitate enterprise incident response, victim organizations deploy incident response technology into their environments in order to collect and analyze forensic data at scale. Traditional computer forensic techniques still play a vital role in this process. However, incident responders acquire forensic images of specific systems of interest for more targeted analysis driven by investigative objectives.
With the introduction of endpoint detection and response (EDR) tools into the security market, organizations now have the ability to collect system events and other system metadata in the course of system operations to detect and investigate threats.
Regardless of the approach, effective incident response investigations follow a well-established process to gain a full understanding of an incident and remediate it successfully, as depicted in Figure 5.1.
The first step in the incident investigation process is to gather initial information about a reported incident and establish objectives for the investigation. Not all incidents require the same level of response. Furthermore, depending on the nature and scope of an incident, an organization may have different priorities and objectives for the investigation. For example, if analysts identify evidence of data theft, the discovery may lead to legal exposure if the data is subject to regulatory requirements. Consequently, the objectives of the investigation may be different from those of a ransomware attack.
Incident responders must establish objectives for an investigation and use the information to drive investigative activities. Incident response is about managing residual risk, and the response effort must be proportional to the level of risk and impact associated with an incident. The following list briefly discusses typical activities in this step.
Chapter 3 discussed data acquisition methods in detail. Furthermore, Chapter 6 discusses this topic from a legal perspective with a particular emphasis on data preservation. The following list focuses on general data acquisition and preservation considerations and best practices for small- and large-scale incidents.
It is important to emphasize that the data that analysts require to answer crucial investigative questions may not exist. This scenario frequently happens during investigations of cyber breaches. For example, it is not uncommon for IT personnel to destroy forensic data by reimaging or rebuilding systems to contain an incident. In other cases, systems may not comply with security configuration baselines or retain log data as per the data retention policy. The result is that vital forensic data is often missing. Incident responders must document any missing data and communicate it to key stakeholders. Missing data is another reason why adequate and complete documentation is vital to the acquisition and preservation process.
The final point is that organizations must establish data acquisition and retention protocols as part of their incident response plan. Attempting to determine a protocol during an incident typically leads to confusion, conflicts of priorities, and inappropriate handling of digital evidence.
Arguably, analysis is the crux of incident response. The purpose of this step is to analyze available artifacts and other data to determine the root cause and full extent of an incident.
It is vital to emphasize that analysis is an iterative process. Analysts iterate through the lifecycle until they reach diminishing returns relating to new findings, such as discovering additional compromised systems or identifying additional attacker tools, tactics, and procedures (TTP) relevant to the investigated case. In simple terms, if consecutive iterations of the lifecycle no longer produce relevant findings, that is when the process typically terminates.
Analysis is a complex discipline, and it requires skills in multiple incident response domains, such as digital forensics, CTI, or malware reverse-engineering. During large-scale investigations, organizations typically assemble a team of analysts with complementary skills to progress analysis.
During enterprise incident response, organizations typically employ a lifecycle approach, as depicted in Figure 5.2, to analyze incidents.
Analysis is an iterative process. As analysts investigate an incident, they may decide to acquire and analyze additional data in order to understand the attacker's operations. This approach is particularly applicable to large-scale investigations. For example, during a forensic examination of a compromised system, an analyst may recover new IOCs. A logical step is to scan the environment for the indicators to determine if there were previously unknown compromised systems. If there are matches, the analyst may acquire data from the new systems for further analysis.
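An IOC sweep of this kind can be sketched in a few lines. The following is a minimal illustration under stated assumptions, not a production scanner: the `sweep_for_hash_iocs` helper and its arguments are hypothetical, and a real deployment would use an EDR or live response platform rather than hashing collected files directly.

```python
import hashlib
from pathlib import Path

def sweep_for_hash_iocs(triage_dir, known_bad_sha256):
    """Scan collected triage files for SHA-256 values matching known IOCs.

    `triage_dir` is a directory of files acquired from suspect systems;
    `known_bad_sha256` is a set of hex digests recovered during analysis.
    Returns (path, digest) pairs for every match.
    """
    hits = []
    for path in Path(triage_dir).rglob("*"):
        if not path.is_file():
            continue
        digest = hashlib.sha256(path.read_bytes()).hexdigest()
        if digest in known_bad_sha256:
            hits.append((str(path), digest))
    return hits
```

Any match becomes a new lead: the host the file came from joins the list of potentially compromised systems and is queued for deeper acquisition.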
Once incident responders have a reasonable degree of confidence that they have understood the scope of an incident and have answers to the investigative questions, it is time to contain the incident and eradicate the threat actor from the compromised environment. Chapter 4 briefly discusses remediation as part of the incident response lifecycle. This topic is so important to cyber breach response that I dedicate an entire section to incident containment and eradication later in this chapter.
As previously mentioned, incident analysis is a complex process that requires skills in domains, such as digital forensics, CTI, and malware analysis. Each of these domains in turn is a complex discipline and requires specialized skills. During large-scale incidents, victim organizations typically convene a team with complementary skills across these domains. The personnel work together to answer crucial investigative questions and understand the scope of the incident. This section discusses each of the domains mentioned earlier and how they contribute to the overall investigative process.
Digital forensics is a specialized discipline that focuses on the analysis and recovery of digital data from compromised systems. Forensic analysts use specialized tools and techniques to reconstruct events on an examined system to determine how an attacker interacted with the system and to develop IOCs. Digital forensics can also help organizations recover deleted data to prove or refute a hypothesis. For example, in some cases, analysts may recover files deleted by an attacker that provide evidence of data staging.
Digital forensics is a broad discipline that requires skills and experience across various technologies. Three primary areas of digital forensics are computer, network, and mobile forensics. However, with cloud computing and other trends in technology, traditional forensics skills by themselves are no longer enough in today's evolving digital world. Incident responders must be well versed in numerous technologies and be able to adapt in order to investigate incidents effectively. Incident response professionals must also understand the underlying methodologies that underpin the digital forensics and incident response (DFIR) domain to respond to different types of incidents and effectively work with business and technology stakeholders.
The following list discusses various aspects of digital forensics that organizations need to consider when building incident response capabilities.
It is important to emphasize that during large-scale investigations, analysts typically leverage a combination of the methods mentioned earlier. One frequent use case is to leverage live response techniques for rapid triage and acquire disk and physical memory images for an in-depth analysis. This approach makes the incident response process scalable.
Network forensics can be a stand-alone discipline and is sometimes the only source of evidence of attacker activity. However, in the majority of enterprise incident response cases, network forensics complements traditional host forensics and live response, and it provides corroborating evidence of attacker activity.
There are many challenges associated with network forensics. Organizations often encrypt communications, and decrypting the data can be a challenge, as well as violate regulatory requirements in some cases. Furthermore, network address translation (NAT) and the dynamic nature of IP address assignment on client systems, most of which use Dynamic Host Configuration Protocol (DHCP) to obtain ephemeral IP addresses, make it challenging to track the sources of network connections. Finally, due to the sheer volume of network telemetry, organizations typically store network data only for a short period of time unless they specifically retain it in a long-term storage solution.
Although mobile device forensics is not a mainstream discipline in enterprise incident response, it plays an increasingly important role in internal investigations, such as employee-related cases or electronic discovery for civil litigation. Analysts may recover and analyze a variety of artifacts from mobile devices, such as historical geolocation data, Internet browsing history, call history, documents, and data from installed apps, and they may even recover deleted data, among other artifacts. As of this writing, I have not come across cases where a threat actor used a compromised mobile device to pivot to an enterprise network.
As a field, digital forensics continually evolves, and there is ongoing research into various technologies to uncover system artifacts that may be of value to forensic analysts. Another important consideration is that forensic artifacts and techniques may vary from system to system. For these reasons, incident responders must continually strive to acquire new skills and stay abreast of new developments in this field.
Operating systems create many artifacts that have temporal characteristics, including filesystem metadata, event logs, application logs, network connection logs, and registry entries, among others. Timeline analysis is a forensic technique that allows analysts to reconstruct events on the examined systems by arranging relevant events in a timeline. Timeline analysis is a powerful technique that allows analysts to answer questions relating to which events occurred before and after a given event, such as malware infection, and to gain valuable insights into attacker activity.
Timeline analysis is a particularly powerful tool during large-scale investigations. To arrive at a timeline of attacker activity during an incident, incident responders typically create timelines for individual compromised systems and combine them into a single master timeline consisting of significant events. This information is invaluable for reporting and reconstructing the picture of an attack. In my personal experience, a timeline with a narrative in business security language is particularly useful for reporting and communicating investigative findings to senior-level management.
One crucial consideration in building a timeline is to ensure that all timestamps are expressed in Coordinated Universal Time (UTC). UTC is the de facto standard for timestamping event logs. However, some organizations choose to use a time that is local to their geographic region instead. As a result, analysts must translate non-UTC time to UTC time before timelining events.
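The normalization step can be illustrated with a short sketch. The `build_timeline` helper below is hypothetical; it assumes each collected event carries the IANA time zone name of its source system and simply converts every timestamp to UTC before sorting into a master timeline.

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

def build_timeline(events):
    """Normalize event timestamps to UTC and sort them chronologically.

    `events` is a list of (naive_timestamp, source_tz_name, description)
    tuples, where non-UTC sources carry their IANA time zone name.
    """
    normalized = []
    for ts, tz_name, description in events:
        local = ts.replace(tzinfo=ZoneInfo(tz_name))
        normalized.append((local.astimezone(timezone.utc), description))
    return sorted(normalized, key=lambda item: item[0])
```

For example, an event logged at 10:00 local time in New York (UTC-5 in winter) sorts after a 14:30 UTC event, even though its raw clock value appears earlier.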
Digital forensics often relies on artifacts produced in the course of system operation, such as program execution artifacts, to answer vital investigative questions. In some cases, the data necessary to answer those questions may not be available. For example, a system may have overwritten specific artifacts in the course of its operations, or the necessary logging policies may not have been configured to capture the data of interest.
Even if specific artifacts are available, analysts may have to correlate and corroborate them with other sources of data, such as network telemetry, to determine attacker activity. For example, during data theft cases, analysts typically correlate system artifacts with data relating to network connections in order to prove or refute a data theft hypothesis.
It is essential to keep in mind that digital forensics may not always provide all the answers to questions that business stakeholders ask. Another crucial consideration is that sophisticated threat actors often leverage anti-forensic techniques to make the discovery and investigation of their attacks challenging. For example, an attacker may manipulate the timestamps of a malware binary file to minimize the possibility of detection. Incident response professionals refer to this technique as timestomping.
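As an illustration of how analysts hunt for timestomping leads, the following hedged sketch applies one common heuristic: many timestomping utilities write timestamps with whole-second granularity, so files whose sub-second component is exactly zero can merit a closer look. The `flag_possible_timestomping` helper is hypothetical, and the heuristic produces false positives (for example, files copied from FAT volumes or extracted from archives), so matches are leads rather than conclusions.

```python
import os

def flag_possible_timestomping(directory):
    """Return paths of files whose modification time has a zeroed
    sub-second component, a common side effect of timestomping tools.
    Treat results as triage leads, not proof of manipulation.
    """
    flagged = []
    for entry in os.scandir(directory):
        if entry.is_file() and entry.stat().st_mtime_ns % 1_000_000_000 == 0:
            flagged.append(entry.path)
    return flagged
```

In practice, analysts corroborate such leads with other sources, such as NTFS $FILE_NAME timestamps or event log entries, before drawing conclusions.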
To maximize the chances of answering vital investigative questions, enterprises need to configure logging and log retention policies to support incident response investigations. The enforcement of those policies across the enterprise is vital.
Chapter 1 briefly described CTI as a driver for cyber breach response. This section discusses intelligence-driven incident response, with a particular focus on tactical intelligence to support investigations.
Intelligence-driven response is an approach to incident response that leverages intelligence processes and concepts as an integral part of the overall investigation process.1 CTI augments digital forensic processes by researching and applying contextual threat information in order to identify attacker activity and scope an incident effectively. Not only is CTI necessary to understand the root cause and scope of a compromise, but it is also essential in containing an incident and eradicating the threat actor from the compromised environment.
CTI analysts leverage various techniques to gather information necessary to support incident investigations, such as attacker-centric and asset-centric approaches.2 This book discusses the attacker-centric approach only. Analysts also rely on structured analytical techniques, such as analysis of competing hypotheses (ACH),3 in order to reduce or eliminate analytical bias in the process. All too often, even the most veteran analysts make attribution assessments based on past experience instead of applying the appropriate rigor.
CTI research typically starts with known information about an incident. For example, an organization has detected malware, a network connection to a known C2 domain, or evidence of a suspicious tool, such as credential harvesting software. The initial information acts as a starting point for the research. Analysts start with the initial indicators and perform research that may provide an insight into the TTP that an attacker leverages, uncover additional IOCs, or even attribute the attack. CTI professionals often refer to this approach as intelligence enrichment. CTI research is like unraveling a thread. For example, I worked on cases where CTI enrichment of a single IOC opened up an entire investigation and allowed the analysts to uncover additional evidence of attacker activity in the compromised environment.
Attributing a cyberattack to a specific threat actor is often a part of the CTI process. Chapter 6 discusses attribution in depth. For the purposes of this chapter, it is worth noting that cyber attribution is becoming increasingly difficult as threat actors move away from using custom malware and leverage tools that they find in compromised environments to progress through the cyberattack lifecycle.
CTI analysts gather threat information and analyze the information to answer specific questions relating to an investigation. This section discusses the CTI lifecycle, with a particular emphasis on tactical intelligence to support incident response investigations.
CTI enrichment is an iterative process consisting of five phases. The last phase provides a feedback loop to the first phase to continue the process until the analysts have enough information to achieve the investigation's objectives or they reach diminishing returns relating to new findings. This process is tightly coupled with forensic analysis and monitoring for attacker activity, as depicted in Figure 5.3.
It is crucial to emphasize that the intelligence-driven approach involves all the components mentioned here to support an investigation.
The following list briefly describes common components of the attacker-centric approach that analysts often leverage in investigations.
It is important not to confuse IOCs with indicators of attack (IOA). The latter represents the actions that an attacker takes before compromising a system or network, such as reconnaissance.
Not all CTI is equal. Attackers may easily change some indicators, whereas others are more difficult to change without significantly adjusting the way that a particular attacker operates. To demonstrate this concept, David Bianco created the Pyramid of Pain,4 as represented in Figure 5.4. The pyramid orders IOCs in an increasing level of difficulty for attackers to change. At the bottom, the pyramid lists more volatile indicators that are trivial to change, such as hashes and IP addresses. Toward the top, the pyramid lists indicators that are harder for attackers to change and that have more extended longevity.
According to David Bianco, the entire point of indicators is to use them in response and remediation. For this reason, incident responders need to keep in mind that some indicators are temporal. In contrast, other indicators, such as a particular way of executing an attack, are more reliable in the long term. For example, during client incident investigations, I have observed that indicators, such as C2 IP addresses or malware hash values, change. However, I have not come across a case where an attacker would suddenly change their behavioral characteristics in response to an investigation.
The CTI community uses the same acronym “TTP” to refer to two different but related terms that describe how threat actors operate: 1) tactics, techniques, and procedures and 2) tools, tactics, and procedures. In this book, I use the latter term to call out attacker tools explicitly. However, both terms are correct, and the community uses them interchangeably.
According to the Pyramid of Pain, TTPs are associated with behavioral characteristics and are the hardest to change. Tools are software applications, such as malware, utilities, and other software that a specific threat actor uses as part of their attack operations. Tactics describe how a threat actor operates at various phases of the cyberattack lifecycle and how the activities relate to one another. An example of a tactic used to escalate privileges on a compromised system is a threat actor using credential-dumping software to harvest credentials from volatile memory and then using those credentials for lateral movement. Procedures describe a sequence of actions a threat actor executes during each phase of the cyberattack lifecycle. For example, during the lateral movement phase, an adversary may leverage harvested administrative credentials to access remote systems via services that accept remote connections, such as Secure Shell (SSH) on Unix-like systems.
Not all IOCs are equal. Analysts may have different levels of confidence in the information that they gather as part of the CTI lifecycle. One way to categorize the information is in terms of fidelity.
Another critical consideration is to classify IOCs into specific categories, such as network, host, or behavioral indicators. Grouping indicators and assigning them a fidelity level allows responders to search and filter for data of interest easily, as well as to identify specific information for further investigative steps and CTI enrichment.
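As a minimal sketch of this idea, the hypothetical `IOC` record and `filter_iocs` helper below tag each indicator with a category, a fidelity level, and its source, and then let responders filter for the subset they need. The field names and fidelity levels are illustrative assumptions, not a standard schema.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class IOC:
    value: str      # e.g., a hash, IP address, or filename
    category: str   # "network", "host", or "behavioral"
    fidelity: str   # "high", "medium", or "low"
    source: str     # where the indicator came from

def filter_iocs(iocs, category=None, min_fidelity="low"):
    """Return indicators in a category at or above a fidelity level."""
    ranks = {"low": 0, "medium": 1, "high": 2}
    return [i for i in iocs
            if (category is None or i.category == category)
            and ranks[i.fidelity] >= ranks[min_fidelity]]
```

Recording the source alongside each indicator also supports the documentation requirement discussed later: every IOC remains traceable to the analysis that produced it.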
It is also worth mentioning that the SANS Institute and many practitioners classify IOCs as follows:5
Finally, it is of crucial importance to document the indicators alongside their source and link them to the investigation. As investigations grow and include multiple, simultaneous workstreams, accurate and comprehensive documentation is what allows stakeholders to remain informed about how the threat actor operates in the compromised environment.
It is also worth noting that enterprises can choose to participate in intelligence sharing communities to gain access to CTI that otherwise might not be available to them. This information can be of vital importance during incident investigations. By participating in intelligence sharing programs, enterprises can draw on the collective knowledge and capabilities of their members. Examples of entities that create intelligence sharing programs include government agencies, cybersecurity vendors, and industry-specific CTI centers. These entities usually make CTI available for consumption in the following formats:
Some platforms and websites make specific CTI available to the cybersecurity community at no cost, such as IBM X-Force Exchange or VirusTotal. These services also offer premium access to organizations that require access to advanced features.
Another option is to partner with a specific CTI vendor that can assist with strategic, operational, and tactical intelligence packages. In my personal experience, partnering with CTI vendors during incident investigations has been invaluable in understanding and scoping enterprisewide intrusions.
Malware analysis is an integral part of incident response investigations and requires breadth and depth of understanding across various technical disciplines. Attackers leverage malware and other software tools to gain unauthorized access to a computer network, move laterally, steal data, or encrypt data for ransom, among other things.
Malware has significantly evolved over the years from primitive, self-contained binaries to highly modular malware that often implements obfuscation techniques, such as encryption or polymorphism, to make the malware difficult to detect and analyze. Sophisticated malware often relies on a complex infection chain that includes downloaders and droppers, payloads, and a C2 infrastructure. Furthermore, some malware types leverage legitimate applications as part of the infection chain. For example, many malware infections start with the execution of macros embedded in Microsoft Excel files.
With the evolution of the threat landscape, sophisticated attackers often rely on malware providers who offer malware-as-a-service on the dark web. Historically, it was common to attribute attacks based on the malware alone. With the malware-as-a-service model, attribution based purely on malware is rare.
This section discusses common malware types and malware analysis techniques that analysts leverage as part of the incident response lifecycle.
Several ways to classify malware exist, such as behavior or infection vector. Malware taxonomy is a process of classifying malware based on specific characteristics and attributes. The following list briefly discusses common malware types that threat actors leverage during enterprise intrusions.
Another way of looking at malware is in terms of how attackers use it and its purpose. Commodity malware is a category of malware that is widely available for purchase or free download and does not require any customization. Threat actors typically leverage commodity malware in opportunistic attacks by employing a “spray and pray” strategy. Dridex7 is an example of commodity malware.
In contrast, targeted malware, also referred to as bespoke malware, is very precise malware that threat actors leverage while targeting specific users or a specific entity. Malware authors often write targeted malware for very specific purposes after performing an extensive reconnaissance to maximize the chances of achieving their objectives. Targeted malware is often associated with advanced persistent threat (APT) groups, such as nation-state threat actors. Stuxnet is a primary example of targeted, destructive malware.
An important point to cover before discussing specific malware analysis techniques is fileless malware and “living off the land” techniques. Traditional malware requires attackers to place a binary file on disk, which leaves forensic artifacts behind. Fileless malware, on the other hand, lives entirely in computer memory and does not leave evidence on a disk volume. This technique makes it much harder to detect with traditional tools, such as antivirus, and requires memory forensics to uncover evidence of its execution.
Living off the land is a technique that attackers often use alongside fileless malware. Attackers increasingly rely on legitimate administrative tools that they find in the compromised environment to progress through the cyberattack lifecycle. This technique is appealing because system administrators often whitelist tools that they require for day-to-day tasks. Furthermore, the use of legitimate tools as part of an attack makes it challenging for incident responders to identify malicious activity.
Depending on the objectives of an investigation, analysts can perform basic or advanced static malware analysis or a combination thereof.
Basic static analysis is the process of examining an executable file to determine whether the file is malicious and to provide fundamental information about it. During basic static analysis, analysts do not examine the actual code instructions. Instead, they gather and examine the file metadata, including hash, file type, and size. Analysts also search for strings that may provide information about the file's functionality, such as hard-coded IP addresses and URLs.
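A minimal sketch of these basic static steps might look as follows. The `basic_static_profile` helper is hypothetical and simply gathers file size, hashes, and printable ASCII strings; real-world tooling (for example, the `strings` utility or a PE parser) goes considerably further.

```python
import hashlib
import re

def basic_static_profile(path, min_len=6):
    """Collect file metadata and printable ASCII strings from a binary."""
    with open(path, "rb") as f:
        data = f.read()
    return {
        "size": len(data),
        "md5": hashlib.md5(data).hexdigest(),
        "sha256": hashlib.sha256(data).hexdigest(),
        # Printable ASCII runs often reveal hard-coded IPs, URLs, or paths.
        "strings": [s.decode("ascii") for s in
                    re.findall(rb"[ -~]{%d,}" % min_len, data)],
    }
```

The hashes feed directly into IOC sweeps and intelligence enrichment, while the extracted strings often provide the first hints of C2 infrastructure.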
Static analysis may also check for obfuscation. Malware authors use this technique to obscure meaningful information to make it harder to analyze their malware. In some cases, basic static analysis may yield enough information to generate a signature.
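One simple and widely used obfuscation check is Shannon entropy: packed or encrypted content tends toward 8 bits of entropy per byte, whereas plain code and text score noticeably lower. The sketch below computes the score; treating values approaching 8.0 as a packing indicator is a rule of thumb, not a hard rule.

```python
import math
from collections import Counter

def shannon_entropy(data):
    """Bits of entropy per byte; packed/encrypted data approaches 8.0."""
    if not data:
        return 0.0
    counts = Counter(data)
    total = len(data)
    return -sum((n / total) * math.log2(n / total)
                for n in counts.values())
```

A repeated single byte scores 0.0, while a buffer containing every byte value equally often scores exactly 8.0, bracketing the range analysts interpret against.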
To understand a malware sample fully, analysts may resort to advanced static analysis, which focuses on disassembling malware at the code level and examining the actual instructions. Analysts typically use tools, such as a disassembler, to break down a compiled malware binary file into machine-code instructions. By performing advanced static analysis, analysts can understand the full capabilities of the malware.
Depending on the size and complexity of the malware, reverse-engineering a malware sample can take days or even weeks. For this reason, analysts resort to advanced analysis when dealing with unknown malware or when they need to understand the malware's capabilities fully to drive containment and eradication.
Dynamic analysis is the process of running a malware sample in a controlled environment in order to determine its behavior. One of the most common techniques that analysts leverage for dynamic analysis is a sandbox. A sandbox is a dedicated, isolated system with installed dynamic analysis software that collects telemetry when the malware executes.
Dynamic analysis typically focuses on the changes that malware makes to the system, such as creating new processes, making changes to the registry, network connections, or creating persistence mechanisms. The goal of dynamic analysis is to gather information necessary to identify the examined malware in the compromised environment.
During incident response investigations, malware analysis and CTI go hand in hand. CTI often provides focus and direction for malware analysis. For example, if prior analysis yielded evidence of data theft, a reverse engineer may focus on looking for code instructions that may enumerate various repositories for data of interest.
At the same time, malware analysis can provide rich information on how a particular threat actor operates, including IOCs or tactics. This information, in turn, feeds into the CTI lifecycle. This process is iterative. The information that analysts glean from malware analysis initiates a full CTI enrichment process and serves as a set of pivot points. The outcome of the enrichment process, in turn, allows analysts to identify additional malware that a threat actor may have used as part of their operations in the compromised environment.
Threat hunting is an approach to threat detection that combines a methodology, technology, skills, and CTI to proactively detect attacker activity that programmatic approaches, such as traditional antivirus software, may miss.8
Although threat hunting is primarily a proactive approach, incident responders also leverage threat hunting in situations where an organization discovers suspicious activity but there are no specific IOCs to inform the investigation.
I led investigations where clients reported suspicious behavior on their network or received a third-party notification about a potential compromise but did not identify specific IOCs. Without a “smoking gun,” our team had to resort to threat hunting techniques to identify IOCs and scope the intrusions.
The premise behind threat hunting is that programmatic approaches to threat detection, such as traditional antivirus, network traffic inspection, or even statistical analysis methods, may not uncover all attacker activity. This approach is particularly applicable to attacks where threat actors employ stealthy techniques or evasion methods, or heavily rely on “living off the land” methods.
Developing threat hunting capabilities requires commitment from senior management, dedicated resources, and an investment into appropriate technologies. The following list describes vital components that organizations must consider in order to enable threat hunting.
Threat hunting is an iterative process with clearly defined phases. Moreover, organizations have two options at their disposal to perform threat hunting: data-driven and target-driven threat hunting.9
The data-driven approach involves collecting a specific data set and analyzing it for evidence of suspicious activity. For example, an analyst may choose to collect program execution artifacts from specific environments and look for evidence of malware and other suspicious tools.
In contrast, a target-driven approach allows organizations to determine whether a particular threat is present in their environments. For example, an analyst may compile a list of IOCs associated with a specific threat actor and leverage tools, such as EDR, to query endpoints for those indicators. Organizations with a mature threat hunting capability typically combine both approaches.
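A target-driven hunt of the kind described above can be sketched as a sweep of endpoints for actor-attributed indicators. In this hedged example, `query_endpoint()` is a stand-in for a real EDR or remote-forensics API, and all hostnames, filenames, and hashes are hypothetical.

```python
# Sketch: a target-driven hunt that sweeps endpoints for file-hash and
# filename IOCs attributed to a specific threat actor. query_endpoint()
# is a stand-in for a real EDR API; all indicators are hypothetical.
ACTOR_IOCS = {
    "sha256": {"bb" * 32},            # hypothetical malware hash
    "filename": {"wceaux.dll", "mimikatz.exe"},
}

def query_endpoint(host):
    """Stand-in: would return (filename, sha256) pairs observed on host."""
    inventory = {
        "WS01": [("notepad.exe", "aa" * 32)],
        "SRV02": [("mimikatz.exe", "bb" * 32)],
    }
    return inventory.get(host, [])

def sweep(hosts):
    """Return (host, filename, sha256) tuples that match actor IOCs."""
    matches = []
    for host in hosts:
        for name, digest in query_endpoint(host):
            if name.lower() in ACTOR_IOCS["filename"] or digest in ACTOR_IOCS["sha256"]:
                matches.append((host, name, digest))
    return matches

print(sweep(["WS01", "SRV02"]))  # SRV02 matches on filename and hash
```

The same pattern scales to thousands of endpoints when the query runs server-side in an EDR platform instead of host by host.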
Figure 5.5 depicts a typical lifecycle approach that organizations can leverage to hunt for threats in their environment.10
A formal report should clearly state the purpose and scope of a threat hunting activity before discussing the outcome and findings. A sound report also briefly articulates the analysis techniques that analysts employed and any limitations and obstacles.
Some threat hunting activities may lead analysts to discover evidence of a historical or ongoing network intrusion. In such cases, organizations must consider transitioning the hunting activity into an incident investigation. By declaring an incident, organizations can ensure that they dedicate the necessary resources to appropriately respond to the incident.
Reporting is a critical but often overlooked part of the analysis phase. An effective report summarizes an incident in a succinct yet complete manner, and it helps key stakeholders understand significant incident events. There are three types of reports that incident responders typically produce during enterprise incident response.
The following paragraphs provide general recommendations that can help incident responders produce high-quality reports that communicate their findings to key stakeholders effectively.
To write in the formal style, report authors must use objective, impersonal, and precise language and avoid slang and informal expressions. Also, ensuring correct grammar, clear transitions, and logical flow between different sections of a report is vital.
I cannot emphasize enough the importance of reporting. Organizations may require investigation reports for a variety of purposes, including compliance, cyber insurance, or for anticipated litigation. Furthermore, reports are more reliable than human memory and outlive investigations. After several months or even a few years, a report may constitute the only reliable information available about a past incident.
Systems and software applications often generate significant volumes of artifacts that incident responders may leverage during their investigations to determine attacker activity in a compromised environment.
Some of the artifacts are byproducts of normal system operations that happen to have forensic value. For example, the Windows operating system implements performance and software compatibility mechanisms that generate artifacts relating to program execution. These artifacts are an invaluable source of information on malware and other tools that a threat actor executed on a compromised system.
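To make the idea of program-execution artifacts concrete, the following sketch triages a copy of the Windows Prefetch directory taken from a mounted forensic image. It relies only on the well-known `NAME.EXE-XXXXXXXX.pf` filename convention and uses the `.pf` file's modification time as an approximation of last run time; full Prefetch parsing requires a dedicated forensic tool, and the directory path in the usage note is hypothetical.

```python
# Sketch: triage Windows Prefetch artifacts copied from a mounted image.
# Prefetch filenames follow the pattern NAME.EXE-XXXXXXXX.pf, and the
# .pf file's modification time approximates the program's last run time.
# This reads only filenames/timestamps; full parsing needs a forensic tool.
import os
import re
from datetime import datetime, timezone

PF_PATTERN = re.compile(r"^(?P<exe>.+)-[0-9A-F]{8}\.pf$", re.IGNORECASE)

def triage_prefetch(prefetch_dir):
    """Return (executable, last_run_utc) pairs, most recent first."""
    results = []
    for name in os.listdir(prefetch_dir):
        match = PF_PATTERN.match(name)
        if match:
            mtime = os.path.getmtime(os.path.join(prefetch_dir, name))
            last_run = datetime.fromtimestamp(mtime, tz=timezone.utc)
            results.append((match.group("exe"), last_run))
    return sorted(results, key=lambda item: item[1], reverse=True)
```

An analyst might point this at a path such as a mounted image's `Windows/Prefetch` folder and compare the resulting execution timeline against the suspected intrusion window.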
Furthermore, system administrators configure security policies that cause systems and applications to generate artifacts in response to specific events. For example, authentication logs provide information, such as user accounts that successfully and unsuccessfully attempted to log in to a computer system.
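Authentication artifacts lend themselves to simple aggregation. The sketch below counts failed SSH logins per source address from Linux `auth.log`-style records, a quick way to surface password-guessing activity; the sample log lines are illustrative.

```python
# Sketch: summarize failed SSH logins per source IP from Linux
# auth.log-style records to spot password-guessing activity.
# The sample log lines are illustrative.
import re
from collections import Counter

FAILED_RE = re.compile(r"Failed password for (?:invalid user )?(\S+) from (\S+)")

def failed_logins_by_source(lines):
    """Count 'Failed password' events per source IP address."""
    counts = Counter()
    for line in lines:
        match = FAILED_RE.search(line)
        if match:
            counts[match.group(2)] += 1
    return counts

log = [
    "Mar  4 09:12:01 srv01 sshd[411]: Failed password for root from 203.0.113.7 port 51812 ssh2",
    "Mar  4 09:12:03 srv01 sshd[411]: Failed password for invalid user admin from 203.0.113.7 port 51814 ssh2",
    "Mar  4 09:15:40 srv01 sshd[502]: Accepted password for jsmith from 198.51.100.23 port 40022 ssh2",
]

print(failed_logins_by_source(log))  # Counter({'203.0.113.7': 2})
```

The equivalent analysis on Windows would aggregate security event IDs 4624 and 4625 by account and source workstation.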
Before diving into a detailed conversation about digital evidence, it is crucial to understand the difference between artifacts and evidence. An artifact is a piece of data that a system or software application produces in the course of its operations. For example, an event log showing a successful authentication attempt is an artifact. In contrast, an artifact that is relevant to an investigated case because it either supports or refutes a hypothesis becomes evidence. For example, if an analyst identifies a program execution artifact associated with attacker malware, that artifact becomes digital evidence.
Operating systems generate several types of artifacts that allow analysts to establish attacker activity on a compromised system. Chapter 3 discussed data acquisition in detail. Depending on the acquisition method, incident responders can collect persistent and volatile artifacts from systems of interest.
Persistent artifacts constitute data that resides in persistent storage, such as a hard drive, and outlives the process that created the data. For example, systems generate and write event logs to a persistent storage volume. If an administrator reboots the system that generated the event logs, the data is still available on the storage volume. The following list discusses typical persistent artifacts that systems generate in the course of their operations.
This list is by no means exhaustive. It is important to emphasize that artifacts may vary from system to system. For example, production Linux systems that support core applications typically do not have a GUI subsystem installed. For that reason, the systems do not generate browsing history artifacts.
In contrast to persistent artifacts, volatile artifacts constitute data that resides in system memory and is typically short-lived. The data ceases to exist when a user or administrator powers down the system or the process that created the data terminates. Analysts typically acquire volatile data through live response or by acquiring a forensic image of the system memory. The following list discusses typical volatile artifacts that systems generate in the course of their operations.
As in the case of persistent artifacts, this list is by no means comprehensive. The purpose of this section is to demonstrate the type of evidence that analysts may uncover during forensic examination of compromised systems. It is also worth mentioning that certain artifacts reside in memory first before the operating system writes them to persistent storage.
Network data is an invaluable source of evidence during incident investigations. It often augments host forensics and helps prove or refute a hypothesis surrounding an investigated incident. In some situations, network data is the only source of evidence that analysts have available.
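One common use of network telemetry is spotting command-and-control beaconing. Implants often call home at near-fixed intervals, so a destination with many connections and very low inter-arrival jitter stands out. The following heuristic sketch operates on connection timestamps; the thresholds are illustrative assumptions, and real traffic requires tuning for jittered beacons.

```python
# Sketch: a simple beaconing heuristic over connection timestamps.
# A destination contacted many times at nearly constant intervals
# (low jitter) is a candidate C2 channel. Thresholds are illustrative.
from statistics import mean, pstdev

def looks_like_beacon(timestamps, min_events=6, max_jitter_ratio=0.1):
    """True if inter-arrival times are numerous and nearly constant."""
    if len(timestamps) < min_events:
        return False
    ts = sorted(timestamps)
    deltas = [b - a for a, b in zip(ts, ts[1:])]
    avg = mean(deltas)
    return avg > 0 and pstdev(deltas) / avg <= max_jitter_ratio

beacon = [t * 300.0 for t in range(10)]        # one call every 300 seconds
browsing = [0, 4, 9, 150, 152, 600, 610, 900]  # irregular user activity

print(looks_like_beacon(beacon))    # True
print(looks_like_beacon(browsing))  # False
```

In practice, an analyst would group flow records by destination address first and apply a check of this shape to each group.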
Several use cases exist for acquiring and analyzing network telemetry as part of an incident investigation. The following list presents some common use cases:
Alerts triggered by host and network security tools are often the first indication of a cybersecurity incident. Although alerts by themselves may not be enough to determine attacker activity in a compromised environment, they often help analysts establish a hypothesis and create initial leads to drive further investigation. There are typically two types of security tools that enterprises deploy: host and network. The following list discusses typical information that analysts can glean from security alerts.
Remediation is the final step of the incident response lifecycle. Remediation encompasses the containment, eradication, and recovery phases. Each of these phases is necessary to protect crucial assets while an investigation is under way, eradicate the threat actor from the compromised environment, and recover technology into a fully operational state, respectively. Remediation is a complex process that typically starts at the onset of an investigation. It requires establishing a dedicated team and precise planning to be successful.
This section discusses vital considerations that victim organizations need to take into account when remediating large-scale breaches, including establishing a remediation team, planning, and execution. It is worth noting that in many enterprises, recovery is a function of information technology. For this reason, this section primarily addresses the containment and eradication phases of remediation.
Chapter 2 briefly discussed the incident response lifecycle. In this chapter, I focus on the overall process to remediate cyber breaches, as depicted in Figure 5.6.
Remediation requires cross-functional effort that often involves skills and expertise in multiple domains across various business, technology, and security disciplines. Chapter 2 discussed a cybersecurity incident response team (CSIRT) coordination model in detail, including roles and responsibilities of stakeholders who participate in incident response.
A CSIRT fulfills two primary functions: incident investigations and incident remediation. Although these two domains are related to each other, they often require different skills and experience. The former focuses primarily on analysis and answering vital investigative questions, whereas the latter focuses on specific actions necessary to contain the investigated incident and eradicate the threat actor from the compromised environment.
This section briefly discusses two roles that are critical and distinct to remediation: remediation lead and remediation owner. Chapter 2 discusses other business, technology, and third-party roles that may participate in remediation as part of a CSIRT.
During smaller incidents, a single individual can assume the roles of an incident manager and a remediation lead. However, during large-scale and complex incidents, organizations need to designate a dedicated individual to lead the remediation effort. In such cases, an incident manager focuses on the overall investigation, whereas a remediation lead is responsible for planning and executing containment and eradication. The following list briefly describes the skills that an effective remediation lead must possess.
A remediation owner is a senior business stakeholder who closely works with the remediation lead and holds the overall responsibility for the remediation. In many cases, a remediation owner is synonymous with the role of an incident officer, described in Chapter 2.
The remediation of large-scale incidents often results in interruption to business operations and causes anxiety and stress for senior leaders. A remediation owner provides strategic oversight of remediation and works closely with other senior leaders in order to ensure that remediation planning includes business priorities. For example, an organization might choose to switch to an alternate site or system to support core business functions during a disruptive remediation. The remediation owner must work closely with the remediation lead to ensure that technical priorities align with business priorities.
A crucial responsibility of a remediation owner is to assign authority to the remediation lead and communicate the authority downstream to ensure that technical teams prioritize remediation over day-to-day responsibilities. If there is a conflict of priorities, then the remediation owner must resolve it.
Sometimes remediations run into unexpected complications, such as the need for additional budget to hire consultants, conflicting priorities, or exceeded timelines. The remediation lead typically works with the remediation owner to resolve any complications. Figure 5.7 depicts how information flows between vital stakeholders responsible for incident response, including remediation.
The remediation of a large-scale breach is a complex undertaking and requires cross-business collaboration and coordination. For this reason, planning and project management are vital components of remediation. In some cases, an organization may choose to assign a dedicated project manager to remediation to allow the remediation lead to focus on the overall planning and preparation for execution.
As part of containment and eradication planning, the remediation lead must identify and consider business needs and priorities and ensure that technical staff have the necessary resources available. The following list briefly describes business considerations that organizations need to take into account before planning and executing containment and eradication.
Containment and eradication often require changes to the technologies that support business functions. The following list briefly describes technology considerations when planning containment and eradication.
Including logistics as part of remediation planning helps to ensure that the necessary resources, support, and services are available to the remediation personnel. The following list briefly discusses logistical matters that organizations need to consider when executing containment and eradication.
I cannot emphasize enough how vital it is to conduct a comprehensive investigation and determine the full scope and extent of a breach before executing the containment and eradication phases of the incident response lifecycle. As part of their operations, threat actors may plant backdoors on numerous systems to maintain persistence in the compromised environment. If eradication does not cover the full scope of a breach, it is just a matter of time before the threat actor comes back and continues the attack.
Moreover, the remediation team must understand and take into account attacker TTPs and the associated IOCs to ensure that containment and eradication are comprehensive and effective. Forensic and CTI analysts derive this information during their investigation and refer to it as actionable intelligence. In practical terms, this information is necessary to inform containment and eradication. Gathering actionable intelligence is another reason why a comprehensive investigation is vital before eradicating a threat actor from the compromised environment.
The remediation of large-scale incidents is a complex task, and the planning typically starts in the early phases of the investigation. However, an organization must first understand the full scope of the incident before deciding on execution timing. The only exception to this rule is when an attacker becomes destructive, and the organization must immediately prevent any further damage.
Incident responders and the remediation team should make every possible effort to prevent the attacker from becoming aware of the investigation. Depending on their level of sophistication and confidence, attackers may react in different ways, and their response may be detrimental to the investigation. In cases where the attacker compromised systems that support communications tools, such as email or instant messaging, the incident response team must consider moving incident communications to an out-of-band channel. The following list briefly describes some frequent implications of alerting an attacker.
An execution plan contains a detailed definition of the activities that a remediation team must execute as part of the remediation effort, including a listing of resources and a schedule. An effective plan typically focuses on containment and eradication activities. Recovery can take days, weeks, or even months in the most severe cases, and it often requires separate planning. The disaster recovery (DR) function typically handles this task. The following list discusses other considerations and highlights the main components of a containment and eradication plan.
A vital part of the plan is an execution sequence of the activities mentioned earlier with clearly defined start and end times. Furthermore, it is a good practice to create milestones to demonstrate progress, as well as to include technical and management updates in the execution sequence.
Containment focuses on protecting critical assets, whereas eradication allows the victim organization to remove the attacker from the compromised environment. The following sections examine these two phases.
The remediation team determines and executes containment actions to deny the attacker access to vital assets in order to minimize impact and prevent further damage. Containment does not remove the attacker's access to the compromised environment. Instead, it limits the attacker's ability to achieve their objective. For example, if an attacker gains access to a system that handles highly sensitive data, the remediation team may put measures in place to prevent further access and data theft. The measures may include multifactor authentication (MFA), access control lists (ACLs), and other access control mechanisms that limit access to the system or network segment.
The remediation lead must plan containment measures with key technology stakeholders and consult with management about the plan before execution. It is important to emphasize that the involvement of third-party organizations, such as external vendors and technology outsourcing partners, often adds a layer of complexity to this process that the remediation lead must take into account.
Containment often requires changes to how users and administrators access certain technologies. The remediation lead must identify and prioritize, with the help of management and business stakeholders, mission-critical assets that the victim organization must protect. Furthermore, the remediation lead also must work with technical groups to understand technical constraints and the potential impact of implementing containment measures such as MFA.
Some containment actions are short-term and designed to prevent immediate damage. For example, during a ransomware outbreak, the victim organization may take immediate actions, such as isolating network segments from the rest of the network to limit the spread of the malware. The premise behind short-term containment is to “stop the bleeding.”
In contrast, the remediation team executes long-term containment actions as temporary fixes. The term “long-term” may be misleading in this context. What it really means is that the organization makes temporary changes after “stopping the bleeding” before planning long-term, strategic controls to secure the environment. For example, the victim organization may implement a temporary jump server and lock down access to an environment that hosts highly sensitive data. This measure may prevent the attacker from accessing the environment while allowing the organization to investigate the incident and create an eradication plan. The following list includes examples of typical containment actions:
The purpose of eradication is to remove the attacker from the compromised environment. The remediation team executes eradication during an agreed-on window, which typically lasts several hours.
One crucial consideration in remediating large-scale breaches is network isolation. In severe cases where an attacker established a significant foothold in the enterprise environment, the victim organization may need to disconnect their internal networks from the Internet entirely. This approach is necessary to ensure that the attacker cannot react in response to eradication, and the organization can execute the plan and verify all steps before reconnecting the internal network back to the Internet. This approach is the most effective, but it also has a significant impact on business operations. For this reason, organizations use it relatively infrequently.
An alternative approach is to isolate the network segments in scope for the remediation. However, the victim organization must have confidence that the attacker did not compromise systems in any other environment. As with the previous approach, the remediation team must execute eradication activities before reconnecting the network segments to the Internet and other internal networks.
Forgoing network isolation typically works only for small incidents that have no significant impact on business operations. Even in such cases, it is a good practice to isolate or disconnect the affected systems from the network. However, remediating a large-scale incident without network isolation is often a counterproductive effort that puts eradication at risk.
One vital consideration in eradication is scoping. The incident response team must fully understand the scope and extent of the breach before executing eradication, and the remediation team relies on the actionable intelligence that incident responders deliver to plan it. Another consideration is timing. If possible, the remediation team should schedule the eradication window outside the attacker's standard operating hours to minimize the possibility of the attacker attempting to interrupt the eradication.
The following list provides examples of typical eradication activities that the remediation team may execute:
Once the victim organization has executed the eradication plan, there is one more thing to do: monitor for attacker activity. In some cases, even a thorough investigation may not uncover all attacker backdoors and entry points into the network, or every possible IOC associated with the attacker. More advanced attackers plan for being discovered and take the necessary measures to maintain persistence in the compromised environment. In other cases, an attacker may re-compromise the network to continue their attack. Attackers often invest significant resources into their operations and may not give up that easily, even after successful eradication.
For the reasons mentioned here, organizations must import the IOCs discovered during the analysis phase into their security tools and monitor for attacker activity for several weeks before closing the investigation. Even after the formal incident closure, the victim organization should still monitor for any indicators as part of day-to-day operations.
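At its core, post-eradication monitoring means sweeping incoming telemetry for the case IOCs. The sketch below matches a small indicator set against log lines; in practice the indicators would be imported into a SIEM or EDR platform, and the domains, addresses, and hashes here are illustrative (the domain is written defanged with `[.]` and normalized before matching).

```python
# Sketch: match case IOCs (a domain, an IP, a file hash) against log
# lines during the post-eradication monitoring period. In practice the
# IOCs would be imported into a SIEM/EDR; all indicators are illustrative.
CASE_IOCS = {
    "updates-cdn[.]example[.]com".replace("[.]", "."),  # refang defanged domain
    "203.0.113.77",
    "d41d8cd98f00b204e9800998ecf8427e",
}

def scan_logs(lines, iocs=CASE_IOCS):
    """Return (line_number, indicator) pairs for every IOC sighting."""
    sightings = []
    for number, line in enumerate(lines, start=1):
        for ioc in iocs:
            if ioc in line:
                sightings.append((number, ioc))
    return sightings

logs = [
    "2024-03-04T10:01:02 dns query updates-cdn.example.com from WS07",
    "2024-03-04T10:05:10 allowed outbound 198.51.100.4:443",
]
print(scan_logs(logs))  # [(1, 'updates-cdn.example.com')]
```

Any sighting during the monitoring window should trigger re-investigation rather than automatic closure of the incident.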
Incident responders leverage a lifecycle-based approach to investigate incidents. The first step is to generate an incident hypothesis and establish objectives to ensure that business priorities drive the investigation. The next step is to acquire and preserve the necessary data to support the investigation. Incident responders typically acquire data from host and network systems, event logs from enterprise services, and data generated by security tools.
Data analysis is a complex, iterative process that includes domains, such as digital forensics, malware analysis, and CTI. Digital forensics focuses on the analysis and recovery of digital data to answer investigative questions. CTI augments forensic analysis by providing contextual threat information necessary to scope and remediate an incident. As part of an investigation, incident responders may also analyze malware to understand the threat. Analysts leverage static and dynamic analysis techniques to achieve this objective. The final step in the analysis process is to produce an analysis report tailored to the target audience.
Enterprises can employ threat hunting techniques to identify evidence of a historical or ongoing compromise. This approach is necessary when an organization detects suspicious activity or receives a third-party incident notification but has no specific incident information available, such as IOCs, to establish leads and generate a hypothesis. Organizations can leverage data-driven and target-driven approaches to threat hunting.
The final phase of an incident investigation is remediation. Victim organizations must start planning containment and eradication activities as soon as an investigation commences. Containment allows organizations to deny the attacker access to crucial assets. In contrast, eradication removes the attacker from the compromised environment. After containing and eradicating the attacker, the final step is to restore technology to a fully operational state.
www.cia.gov/library/center-for-the-study-of-intelligence/csi-publications/books-and-monographs/psychology-of-intelligence-analysis/index.html
detect-respond.blogspot.com/2013/03/the-pyramid-of-pain.html
www.sans.org/blog/security-intelligence-attacking-the-cyber-kill-chain
www.enisa.europa.eu/news/enisa-news/stuxnet-analysis
www.us-cert.gov/ncas/alerts/aa19-339a
www.carbonblack.com/resources/definitions/what-is-cyber-threat-hunting
resources.infosecinstitute.com/category/enterprise/threat-hunting/threat-hunting-process/threat-hunting-techniques/conducting-the-hunt/#gref
www.sans.org/reading-room/whitepapers/threathunting/paper/38710
18.117.183.172