The ability to generate intelligence related to friendly and hostile systems can be the defining factor that makes or breaks an investigation. This chapter begins with an introduction to the traditional intelligence cycle and how it relates to NSM analysis intelligence. Following this, we look at methods for generating friendly intelligence by generating asset data from network scans and leveraging PRADS data. Finally, we examine the types of threat intelligence and discuss some basic methods for researching tactical threat intelligence related to hostile hosts.
Network Security Monitoring; Analysis; Intelligence; Threat; Hostile; Friendly; PRADS; nmap; Tactical; Strategic; Intel
The Intelligence Cycle for NSM
Generating Friendly Intelligence
The Network Asset History and Physical
Defining a Network Asset Model
Passive Real-time Asset Detection System (PRADS)
Generating Threat Intelligence
Intelligence has many definitions depending on the application. The definition that most closely aligns to NSM and information security is drawn from Department of Defense Joint Publication 1-02, and says that “intelligence is a product resulting from the collection, processing, integration, evaluation, analysis, and interpretation of available information concerning foreign nations, hostile or potentially hostile forces or elements, or areas of actual or potential operations.”1
While this definition might not fit perfectly for a traditional SOC performing NSM services (particularly the part about information concerning foreign nations), it does provide the all-important framing required to begin thinking about generating intelligence. The key component of this definition is that intelligence is a product. This doesn’t mean that it is bought or sold for profit, but more specifically, that it is produced from collected data, based upon a specific requirement. This means that an IP address, or the registered owner of that address, or the common characteristics of the network traffic generated by that IP address are not intelligence products. When those things are combined with context through the analysis process and delivered to meet a specific requirement, they become an intelligence product.
Most SOC environments are generally concerned with the development of two types of intelligence products: friendly intelligence and threat intelligence. In this chapter, we will take a look at the traditional intelligence cycle and methods that can be used to generate these intelligence products. This includes the creation of friendly intelligence products, as well as threat products associated with tactical threat intelligence. While reading, you should keep in mind that there are many components to intelligence as a whole, and we are only covering a small subset of that here.
The generation of intelligence products in a SOC requires the coordinated effort of multiple stakeholders within the organization. Because there are so many moving parts to the process, it helps to organize intelligence generation into a structured, repeatable framework. The framework that the government and military intelligence community (IC) has relied on for years is called the Intelligence Cycle.
Depending on the source you reference, the intelligence cycle can be broken down into any number of steps. For the purposes of this book, we will look at a model that uses six steps: defining requirements, planning, collection, processing, analysis, and dissemination. These steps form a cycle that can continually feed itself, ultimately allowing its products to shape how newer products are developed (Figure 14.1).
Let’s go through each of these steps to illustrate how this cycle applies to the development of friendly and hostile intelligence for NSM.
An intelligence product is generated based upon a defined requirement. This requirement is what all other phases of the intelligence cycle are derived from. Just like a movie can’t be produced without a script, an intelligence product can’t be produced without a clearly defined intelligence requirement.
In terms of information security and NSM, that requirement is generally focused on a need for information related to assets you are responsible for protecting (friendly intelligence), or focused on information related to hosts that pose a potential threat to friendly assets (hostile intelligence).
These requirements are, essentially, requests for information and context that can help NSM analysts make judgments relevant to their investigations. This phase is ultimately all about asking the right questions, and those questions depend on whether the intelligence requirement is continual or situational. For instance, the development of a friendly intelligence product is a continual process, meaning that questions should be phrased in a broad, repeatable manner.
Some examples of questions designed to create baselines for friendly communication patterns might be:
• What are the normal communication patterns occurring between friendly hosts?
• What are the normal communication patterns occurring between sensitive friendly hosts and unknown external entities?
• What services are normally provided by friendly hosts?
• What is the normal ratio of inbound to outbound communication for friendly hosts?
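The last of these questions lends itself to a quick calculation. The following is a minimal sketch in Python, assuming session records have already been exported as (source IP, destination IP, bytes) tuples. The record layout, the sample addresses, and the function name are illustrative assumptions rather than the output of any particular tool.

```python
from collections import defaultdict
from ipaddress import ip_address, ip_network

# Hypothetical friendly range and session records: (src, dst, bytes).
FRIENDLY_NET = ip_network("172.16.16.0/24")
SESSIONS = [
    ("172.16.16.10", "192.0.2.50", 1200),   # outbound from .10
    ("192.0.2.50", "172.16.16.10", 4800),   # inbound to .10
    ("172.16.16.10", "198.51.100.7", 300),  # outbound from .10
    ("203.0.113.9", "172.16.16.20", 900),   # inbound to .20
]


def inbound_outbound_bytes(sessions, friendly_net):
    """Total inbound and outbound bytes for each friendly host."""
    totals = defaultdict(lambda: {"in": 0, "out": 0})
    for src, dst, nbytes in sessions:
        if ip_address(src) in friendly_net:
            totals[src]["out"] += nbytes
        if ip_address(dst) in friendly_net:
            totals[dst]["in"] += nbytes
    return dict(totals)


totals = inbound_outbound_bytes(SESSIONS, FRIENDLY_NET)
for host, t in sorted(totals.items()):
    # A host with no outbound traffic has no finite ratio.
    ratio = t["in"] / t["out"] if t["out"] else float("inf")
    print(f"{host}: in={t['in']} out={t['out']} ratio={ratio:.2f}")
```

Computed regularly over the same time window, a figure like this gives analysts a baseline against which a sudden shift toward outbound traffic stands out.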
On the other end of the spectrum, the development of a threat intelligence product is a situational process, meaning that questions are often specific, and designed to generate a single intelligence product for a current investigation:
• Has the specific hostile host ever communicated with friendly hosts before, and if so, to what extent?
• Is the specific hostile host registered to an ISP where previous hostile activity has originated?
• How does the content of the traffic generated by the specific hostile host compare to activity that is known to be associated with currently identified hostile entities?
• Can the timing of this specific event be tied to the goals of any particular organization?
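The first of these questions can often be answered directly from session data. The following is a minimal sketch, assuming session records are available as (Unix timestamp, source IP, destination IP) tuples. The record layout, the sample addresses, and the function name `prior_contact` are hypothetical.

```python
from ipaddress import ip_address, ip_network

FRIENDLY_NET = ip_network("172.16.16.0/24")  # assumed friendly range
# Hypothetical session records: (unix_timestamp, src_ip, dst_ip).
SESSIONS = [
    (1380000000, "172.16.16.145", "203.0.113.9"),
    (1380003600, "203.0.113.9", "172.16.16.10"),
    (1380007200, "172.16.16.145", "198.51.100.7"),
]


def prior_contact(sessions, hostile_ip, friendly_net):
    """Summarize sessions between one external host and friendly hosts."""
    peers, times = set(), []
    for ts, src, dst in sessions:
        if hostile_ip not in (src, dst):
            continue
        # The other end of the session is the candidate friendly host.
        other = dst if src == hostile_ip else src
        if ip_address(other) in friendly_net:
            peers.add(other)
            times.append(ts)
    return {
        "sessions": len(times),
        "friendly_peers": sorted(peers),
        "first_seen": min(times) if times else None,
        "last_seen": max(times) if times else None,
    }


summary = prior_contact(SESSIONS, "203.0.113.9", FRIENDLY_NET)
print(summary)
```

The summary answers both halves of the question: whether contact occurred at all, and to what extent (how many sessions, with which friendly hosts, over what period).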
Once you have asked the right question, the rest of the cards should begin to fall into place. We will delve further into the nature of friendly and threat intelligence requirements later in their respective sections.
With an intelligence requirement defined, appropriate planning can ensure that the remaining steps of the intelligence cycle can be completed. This involves planning each of these steps and assigning resources to them. In NSM terms, this means different things for different steps. For instance, during the collection phase this may mean assigning level three analysts (thinking back to our Chapter 1 discussion of classifying analysts) and systems administrators to work with sensors and collection tools. In the processing and analysis phase this may mean assigning level one and two analysts to these processes and sectioning off a portion of their time to work on this task.
Of course, the types of resources, both human and technical, that you assign to these tasks will vary depending upon your environment and the makeup of your technical teams. In larger organizations you may have a separate team specifically for generating intelligence products. In smaller organizations, you might be a one-man show responsible for the entirety of intelligence product creation. No matter how large or small your organization, you can participate in the development of friendly and threat intelligence.
The collection phase of the intelligence cycle deals with the mechanisms used for collecting the data that supports the outlined requirements. This data will eventually be processed, analyzed, and disseminated as the intelligence product.
In a SOC environment, you may find that your collection needs for intelligence purposes will force you to modify your overall collection plan. For the purposes of continual friendly intelligence collection, this can include the collection of useful statistics, like those discussed in Chapter 11, or the collection of passive real-time asset data, like the data generated with a tool we will discuss later, called PRADS.
When it comes to situational threat intelligence collection, data will typically be collected from existing NSM data sources like FPC or session data. This data will generally be focused on what interaction the potentially hostile entity had with trusted network assets. In addition, open source intelligence gathering processes are utilized to ascertain publicly available information related to the potentially hostile entity. This might include items like information about the registrant of an IP address, or known intelligence surrounding a mysterious suspicious file.
In order for intelligence collection to occur in an efficient manner, collection processes for certain types of data (FPC, PSTR, Session, etc.) should be well-documented and easily accessible.
Once data has been collected, some types of data must be further processed to become useful for analysis. This can mean a lot of different things for a lot of different types of data.
At a higher level, processing can mean just paring down the collected data set into something more immediately useful. This might mean applying filters to a PCAP file to shrink the total working data set, or selecting log files of only a certain type from a larger log file collection.
At a more granular level, this might mean taking the output from a third party or custom tool and using some BASH commands to format the output of those tools into something more easily readable. In cases where an organization is using a custom tool or database for intelligence collection, it might mean writing queries to insert data into this format, or pull it out of that format into something more easily readable.
Ultimately, processing can sometimes be seen as an extension of collection where collected data is pared down, massaged, and tweaked into a form that is ideal for the analyst.
The analysis phase is where multiple collected and processed items are examined, correlated, and given the necessary context to make them useful. This is where intelligence goes from just being loosely related pieces of data to a finished product that is useful for decision-making.
In the analysis and generation of both friendly and threat intelligence products, the analyst will take the output of several tools and data sources and combine those data points on a per host basis, painting a picture of an individual host. A great deal more intelligence will be available for local hosts, and might allow this picture to include details about the tendencies and normal communication partners of the host. The analysis of potentially hostile hosts will be generated from a much smaller data set, and require the incorporation of open source intelligence into the analysis process.
What ultimately results from this process is the intelligence product, ready to be parsed by the analyst.
In most practical cases, an organization won’t have a dedicated intelligence team, meaning the NSM analysts will be generating intelligence products for their own use. This is a unique advantage, because the consumer of the intelligence will usually be the same person who generated it, or will at least be in the same room or under the same command structure. In the final phase of the intelligence cycle, the intelligence product is disseminated to the individual or group who initially identified the intelligence requirement.
In most cases, the intelligence product is constantly being evaluated and improved. The positive and negative aspects of the final product are critiqued, and this critique goes back into defining intelligence requirements and planning the product creation process. This is what makes this an intelligence cycle, rather than just an intelligence chain.
The remainder of this chapter is devoted to the friendly and threat intelligence products, and ways to generate and obtain that data. While the intelligence framework might not be referenced exclusively, the actions described in these sections will most certainly fit into this framework in a manner that can be adapted to nearly any organization.
You cannot effectively defend your network if you do not know what is on it, and how it communicates. This statement cannot be emphasized enough. No matter how simple or sophisticated an attack may be, if you don’t know the roles of the devices on your network, especially those where critical data exists, then you won’t be able to effectively identify when an incident has occurred, contain that incident, or eradicate the attacker from the network. That’s why the development of friendly intelligence is so important.
In the context of this book, we present friendly intelligence as a continually evolving product that can be referenced to obtain information about hosts an analyst is responsible for protecting. This information should include everything the analyst needs to aid in the event of an investigation, and should be able to be referenced at any given time. Generally, an analyst might be expected to reference friendly intelligence about a single host any time they are investigating alert data associated with that host. This would typically be when the friendly host appears to be the target of an attack. Because of that, it isn’t uncommon for an analyst to reference this data dozens of times per shift for a variety of hosts. Beyond this, you should also consider that the analysis of friendly intelligence could also result in the manual observance of anomalies that can spawn investigations. Let’s look at a few ways to create friendly intelligence from network data.
When a physician assesses a new patient, the first thing they perform is an evaluation of the medical history and physical condition of the patient. This is called a patient history and physical, or an H&P. This concept provides a useful framework that can be applied to the friendly intelligence of network assets.
The patient history assessment includes current and previous medical conditions that could impact the patient’s current or future health. This also usually includes a history of the patient’s family’s health conditions, so that risk factors for those conditions in the patient can be identified and mitigated.
Shifting this concept to a network asset, we can translate a network asset’s medical history to its connection history. This involves assessing previous communication transactions between the friendly host and other hosts on the network, as well as hosts outside of the network. This connection profiling extends beyond the hosts involved in the communication to the services used by the host, both as a client and a server. If we can assess this connection history, we can make educated guesses about the validity of new connections a friendly host makes in the context of an investigation.
The patient physical exam captures the current state of a patient’s physical health, and measures items such as the patient’s demographic information, their height and weight, their blood pressure, and so on. The product of the physical exam is an overall assessment of a patient’s health. Often physical exams will be conducted with a targeted goal, such as assessments that are completed for the purposes of health insurance, or for clearance to play a sport.
When we think about a friendly network asset in terms of the patient physical exam, we can begin to identify criteria that help define the state of the asset on the network, as opposed to the state of health of a patient. These criteria include items such as the IP address and DNS name of the asset, the VLAN it is located in, the role of the device (workstation, web server, etc.), the operating system architecture of the device, or its physical network location. The product of this assessment on the friendly network asset is a state of its operation on the network, which can be used to make determinations about the activity the host is presenting in the context of an investigation.
Now, we will talk about some methods that can be used to create a network asset H&P. This will include using tools like Nmap to define the “physical exam” portion of an H&P through the creation of an asset model, as well as the use of PRADS to help with the “history” portion of the H&P by collecting passive real-time asset data.
A network asset model is, very simply, a list of every host on your network and the critical information associated with it. This includes things like the host’s IP address, DNS name, general role (server, workstation, router, etc), the services it provides (web server, SSH server, proxy server, etc), and the operating system architecture. This is the most basic form of friendly intelligence, and something all SOC environments should strive to generate.
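To make the idea concrete, here is a minimal sketch of how one entry in an asset model might be represented in Python. The class name, field names, and sample values are illustrative assumptions, not a standard schema; adapt them to whatever your organization already tracks.

```python
from dataclasses import dataclass, field, asdict
import json


@dataclass
class Asset:
    """One entry in a network asset model (field names are illustrative)."""
    ip: str
    dns_name: str = ""
    role: str = ""   # e.g. server, workstation, router
    os: str = ""     # operating system architecture
    services: list = field(default_factory=list)  # e.g. ["ssh", "http"]


# The model itself is a lookup table keyed by IP address.
asset_model = {}


def add_asset(model, asset):
    """Insert or overwrite the entry for this asset's IP address."""
    model[asset.ip] = asset


add_asset(asset_model, Asset("172.16.16.10", "web01.example.com",
                             "server", "Linux", ["ssh", "http"]))
add_asset(asset_model, Asset("172.16.16.145", role="workstation"))

# Serialize to JSON so analysts can search the model with ordinary tools.
print(json.dumps({ip: asdict(a) for ip, a in asset_model.items()}, indent=2))
```

Keying the model by IP address mirrors how analysts usually look things up during an investigation: starting from an address in an alert.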
As you might imagine, there are a number of ways to build a network asset model. Most organizations will employ some form of enterprise asset management software, and this software often has the capacity to provide this data. If that is true for your organization, then that is often the easiest way to get this data to your analysts.
If your organization doesn’t have anything like that in place, then you may be left to generate this type of data yourself. In my experience, there is no discrete formula for creating an asset model. If you walk into a dozen organizations, you will likely find a dozen different methods used to generate the asset model and a dozen more ways to access and view that data. The point of this section isn’t to tell you exactly how to generate this data, because that is something that will really have to be adapted from the technologies that exist in your organization. The goal here is simply to provide an idea of what an asset model looks like, and to provide some idea of how you might start generating this data in the short term.
One way to actively generate asset data is through internal port scanning. This can be done with commercial software, or with free software like Nmap. For instance, you can run a basic ping scan with this command:
nmap -sn 172.16.16.0/24
This command will perform a basic ICMP (ping) scan against all hosts in the 172.16.16.0/24 network range, and generate output similar to Figure 14.2.
As you can see in the data shown above, any host that is allowed to respond to ICMP echo request packets will respond with an ICMP echo reply. Assuming all of the hosts on your network are configured to respond to ICMP traffic (or they have an exclusion in a host-based firewall), this should allow you to map the active hosts on the network. The information provided to us is a basic list of IP addresses.
We can take this a step further by utilizing more advanced scans. A SYN scan will attempt to communicate with any host on the network that has an open TCP port. This command can be used to initiate a SYN scan:
nmap -sS 172.16.16.0/24
This command will send a TCP SYN packet to the top 1000 most commonly used ports of every host on the 172.16.16.0/24 network. The output is shown in Figure 14.3.
This SYN scan gives us a bit more information. So now, in addition to IP addresses of live hosts on the network, we also have a listing of open ports on these devices, which can indicate the services they provide.
We can extend this even further by using the version detection and operating system fingerprinting features of Nmap:
nmap -sV -O 172.16.16.0/24
The command will perform a standard SYN port scan, followed by tests that will attempt to assess the services listening on open ports, and a variety of tests that will attempt to guess the operating system architecture of the device. This output is shown in Figure 14.4.
This type of scan will generate quite a bit of additional traffic on the network, but it will help round out the asset model by providing the operating system architecture and helping clarify the services running on open ports.
The data shown in the screenshots above is easy to read when Nmap outputs it in its default format, but it isn’t the easiest to search through. We can fix this by forcing Nmap to output its results in a single-line (“grepable”) format. This format is easily searchable with the grep tool, and very practical for analysts to reference. To force Nmap to output its results in this format, simply add -oG <filename> at the end of any of the commands shown above. In Figure 14.5, we use the grep command to search for data associated with a specific IP address (172.16.16.10) in a file that is generated using this format (data.scan).
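Because each host occupies a single line in this format, it is also straightforward to load into a script. The following is a minimal sketch that extracts open ports per host from grepable output. The sample lines are abbreviated, hypothetical data, and the parser assumes a plain port scan; version strings produced by -sV can contain commas and would need more careful handling.

```python
def parse_grepable(lines):
    """Map each host IP to a list of (port, protocol, service) for open ports."""
    hosts = {}
    for line in lines:
        if not line.startswith("Host:"):
            continue  # skip comment lines that begin with "#"
        fields = line.rstrip("\n").split("\t")
        ip = fields[0].split()[1]
        hosts.setdefault(ip, [])
        for f in fields[1:]:
            if not f.startswith("Ports:"):
                continue
            for entry in f[len("Ports:"):].split(","):
                # Each entry: port/state/protocol/owner/service/rpc-info/version/
                parts = entry.strip().split("/")
                if len(parts) >= 5 and parts[0].isdigit() and parts[1] == "open":
                    hosts[ip].append((int(parts[0]), parts[2], parts[4]))
    return hosts


# Abbreviated, hypothetical sample of a file written with -oG.
SAMPLE = [
    "# Nmap scan initiated as: nmap -sS -oG data.scan 172.16.16.0/24\n",
    "Host: 172.16.16.10 ()\tStatus: Up\n",
    "Host: 172.16.16.10 ()\tPorts: 22/open/tcp//ssh///, 80/open/tcp//http///"
    "\tIgnored State: closed (998)\n",
]

hosts = parse_grepable(SAMPLE)
print(hosts)
```

To run this against a real scan, pass an open file object instead of `SAMPLE`, for example `parse_grepable(open("data.scan"))`.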
You should keep in mind that using a scanner like Nmap isn’t always the most conclusive way to build friendly intelligence. Most organizations schedule noisy scans like these in the evening, and this creates a scenario where devices might be missed in the scan because they are turned off. This also doesn’t account for mobile devices that are only periodically connected to the network, like laptops that employees take home at night, or laptops belonging to traveling staff. Because of this, intelligence built from network scan data should combine the results of multiple scans taken at different time periods. You may also need to use multiple scan types to ensure that all devices are detected. Generating an asset model with scan data is much more difficult than firing off a single scan and storing the results. It requires a concerted effort and may take quite a bit of finessing in order to get the results you are looking for on a consistent basis.
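Combining scans taken at different times can be as simple as tracking when each address was first and last seen. The following is a minimal sketch, assuming the live hosts from each scan have already been reduced to a set of IP addresses. The timestamps, addresses, and function name are hypothetical.

```python
def merge_scans(scans):
    """Combine scans into {ip: {"first_seen", "last_seen", "times_seen"}}.

    `scans` is an iterable of (label, set_of_live_ips) in chronological order.
    """
    merged = {}
    for label, live_ips in scans:
        for ip in live_ips:
            rec = merged.setdefault(
                ip, {"first_seen": label, "last_seen": label, "times_seen": 0})
            rec["last_seen"] = label
            rec["times_seen"] += 1
    return merged


# Hypothetical results: two overnight scans and one daytime scan.
SCANS = [
    ("2013-09-01 02:00", {"172.16.16.1", "172.16.16.10"}),
    ("2013-09-01 14:00", {"172.16.16.1", "172.16.16.10", "172.16.16.145"}),
    ("2013-09-02 02:00", {"172.16.16.1", "172.16.16.10"}),
]

merged = merge_scans(SCANS)
# Hosts missing from at least one scan are likely laptops or other
# devices that are powered off or disconnected part of the time.
intermittent = sorted(ip for ip, r in merged.items()
                      if r["times_seen"] < len(SCANS))
print(intermittent)
```

Here the workstation at 172.16.16.145 appears only in the daytime scan, which is exactly the kind of device an evening-only schedule would miss.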
No matter how reliable your scan data may seem, it should be combined with another data source that can be used to validate the results. This can be something that is already generated on your network, like DNS transaction logs, or something that is part of your NSM data set, like session data. Chapters 4 and 11 describe some useful techniques for generating friendly host data with session data using SiLK. Another option is to use a passive tool, like PRADS, which we will talk about next.
PRADS is a tool that is designed to listen to network traffic and gather data about hosts and services that can be used to map your network. It is based upon two other very successful tools, PADS, the Passive Asset Detection System, and P0f, the passive OS fingerprinting tool. PRADS combines the functionality of these tools into a single service that is effective for building friendly intelligence. It does this by generating data that can be loosely compared to session data that might be used by SiLK or Argus.
PRADS is included in Security Onion by default, so we can examine this data by creating a query in Sguil. We will talk more about Sguil in the next chapter, but if you remember our brief mention of Sguil in Chapter 9, then you know that it is an analyst console that can be used for viewing alerts from detection mechanisms and data from other NSM collection and detection tools.
You can access Sguil by launching the Sguil client from the Security Onion desktop, or by launching the client from another device and connecting remotely. Once there, you can sort the visible alerts by the Event Message column to find PRADS entries. You may notice that Sguil still references PADS for these events, but don’t worry, this is certainly PRADS data. Figure 14.6 shows sample PRADS log entries.
There are a couple of different types of entries shown in this image. New Asset alerts are generated when a host that hasn’t been seen communicating on the network before is observed. Changed Asset alerts are generated when a host that has been seen before exhibits a communication behavior that hasn’t been observed, such as a new HTTP user agent, or a new service.
To better understand how these determinations are made, let’s look at an example of PRADS log data. In a default Security Onion installation, PRADS runs with a command similar to this one:
prads -i eth1 -c /etc/nsm/<sensor-name>/prads.conf -u sguil -g sguil -L /nsm/sensor_data/<sensor-name>/sancp/ -f /nsm/sensor_data/<sensor-name>/pads.fifo -b 'ip or (vlan and ip)'
The arguments shown here, along with a few other useful PRADS command-line arguments, are:
• -b <filter>: Listen to network traffic based upon BPFs.
• -c <config file>: The PRADS configuration file.
• -f <file>: Logs assets to a FIFO (first in, first out) file.
• -g <group>: The group that PRADS will run as.
• -i <interface>: The interface to listen on. PRADS will default to the lowest numbered interface if this is not specified.
• -L <directory>: Logs cxtracker type output to the specified directory.
• -l <file>: Logs assets to a flat file.
• -r <file>: Read from a PCAP file instead of listening on the wire.
• -u <username>: The user that PRADS will run as.
In the case of SO, PRADS runs as the Sguil user and listens for data on the wire. Collected data is stored in a FIFO file so that it can be sucked into a database that Sguil can access.
Since most of the runtime options for PRADS in SO are configured with command-line arguments, the only real purpose that prads.conf serves is to identify the home_nets IP range variable (Figure 14.7). This variable tells PRADS which networks it should consider assets that it should monitor. In most situations you will configure this similarly to the $HOME_NET variable used by Snort or Suricata, since it is used in a similar manner.
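The effect of a home network definition can be illustrated with a short membership test. This is only a sketch of the concept in Python, not how PRADS implements it. The value of `HOME_NETS` below is an assumed example written in CIDR notation, so check your own prads.conf for the ranges and syntax actually in use.

```python
from ipaddress import ip_address, ip_network

# Assumed example value, written in CIDR form for illustration.
HOME_NETS = "172.16.16.0/24,10.0.0.0/8"


def parse_home_nets(value):
    """Turn a comma-separated list of ranges into network objects."""
    return [ip_network(n.strip()) for n in value.split(",") if n.strip()]


def is_asset(ip, nets):
    """True if the address falls inside any monitored range."""
    addr = ip_address(ip)
    return any(addr in net for net in nets)


nets = parse_home_nets(HOME_NETS)
for ip in ("172.16.16.145", "10.20.30.40", "192.0.2.50"):
    label = "asset" if is_asset(ip, nets) else "not an asset"
    print(f"{ip}: {label}")
```

Addresses outside every listed range are simply not treated as assets, which is why a range that is too narrow silently hides hosts from your friendly intelligence.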
PRADS data stored in a database format is really convenient for querying asset data or writing tools that leverage this data, but it isn’t the greatest for viewing it in its raw form. Fortunately, asset data is also stored as a flat text file at /var/log/prads-asset.log. A sample of this file is shown in Figure 14.8.
The first line of this file defines the format for log entries. This is:
asset,vlan,port,proto,service,[service-info],distance,discovered
These fields break down as such:
• Asset: The IP address of the asset in the home_nets variable that is detected
• VLAN: The VLAN tag of the asset
• Port: The port number of the detected service
• Proto: The protocol number of the detected service
• Service: The service PRADS has identified as being in use. This can involve the asset interacting with the service as a CLIENT or a SERVER.
• Service Info: The fingerprint that matches the identifying service, along with its output.
• Distance: The distance to the asset based upon a guessed initial time-to-live value
• Discovered: The Unix timestamp when the data was collected
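The fields listed above can be split apart with a few lines of Python. The following is a minimal sketch, assuming the eight comma-separated fields appear in the order listed and that the service info is wrapped in square brackets. The sample lines are hypothetical. Because the service info can itself contain commas, the parser takes five fields from the left and two from the right rather than splitting on every comma.

```python
def parse_prads_line(line):
    """Parse one asset log entry into a dict, or return None for non-data lines."""
    line = line.strip()
    if not line or line.startswith("asset,"):
        return None  # blank line or the header row
    asset, vlan, port, proto, service, rest = line.split(",", 5)
    info, distance, discovered = rest.rsplit(",", 2)
    if info.startswith("[") and info.endswith("]"):
        info = info[1:-1]  # drop the surrounding brackets
    return {
        "asset": asset,
        "vlan": int(vlan),
        "port": int(port),
        "proto": int(proto),
        "service": service,
        "service_info": info,
        "distance": int(distance),
        "discovered": int(discovered),
    }


# Hypothetical sample entries.
SAMPLE = [
    "asset,vlan,port,proto,service,[service-info],distance,discovered",
    "172.16.16.145,0,80,6,CLIENT,"
    "[http:Mozilla/5.0 (X11; Linux x86_64), like Gecko],1,1380000000",
]

records = [r for r in map(parse_prads_line, SAMPLE) if r is not None]
print(records[0]["asset"], records[0]["service_info"])
```

With entries in dictionary form, grouping by asset or filtering on the CLIENT and SERVER roles becomes a one-line operation.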
Based upon this log data, you can see that PRADS itself doesn’t actually make the determination we saw earlier in Sguil of whether or not an asset is new or changed. PRADS simply logs the data it observes and leaves any additional processing to the user or other third party scripts or applications. This means that the New and Changed Asset alerts we were seeing in Sguil are actually generated by Sguil itself based on PRADS data, and not by PRADS itself.
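The distinction between a new and a changed asset can be sketched in a few lines. The following illustrates the idea only and is not the logic Sguil actually uses: an observation is “new” when the asset has never been seen, and “changed” when a known asset shows a service or fingerprint that has not been recorded for it. The tuple layout and sample values are assumptions.

```python
def classify(records):
    """Label each observation as 'new asset', 'changed asset', or 'known'.

    `records` is an iterable of (asset_ip, service, service_info) tuples in
    the order they were observed.
    """
    seen = {}    # asset_ip -> set of (service, service_info) already observed
    labels = []
    for asset, service, info in records:
        key = (service, info)
        if asset not in seen:
            seen[asset] = {key}
            labels.append((asset, "new asset"))
        elif key not in seen[asset]:
            seen[asset].add(key)
            labels.append((asset, "changed asset"))
        else:
            labels.append((asset, "known"))
    return labels


# Hypothetical observations in arrival order.
RECORDS = [
    ("172.16.16.145", "CLIENT", "http:Mozilla/5.0"),
    ("172.16.16.145", "CLIENT", "http:Mozilla/5.0"),
    ("172.16.16.145", "CLIENT", "http:curl/7.29"),
    ("172.16.16.10", "SERVER", "ssh:OpenSSH"),
]

labels = classify(RECORDS)
for asset, label in labels:
    print(asset, label)
```

Any consumer of PRADS data needs some persistent memory of what it has already seen. Here that memory is an in-process dictionary, whereas a production tool would keep it in a database.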
There are a couple of ways that we can use PRADS for friendly intelligence. The first method is to actually use Sguil and its notification of New and Changed assets. As an example, consider Figure 14.9.
In the figure above, I’ve made a Sguil query for all of the events related to a single alert. This can be done pretty easily in Sguil by right-clicking an event associated with a host, hovering over Quick Query, then Query Event Table, and selecting the SrcIP or DstIP option depending on which IP address you want events for. Here, we see a number of events associated with the host at 172.16.16.145. This includes some Snort alerts, visited URLs, and more PRADS alerts.
Of the PRADS alerts shown, there are 4 New Asset alerts that show the first time this host has ever connected to each of the individual destination IP addresses listed in the alert:
• Alert ID 4.66: HTTP Connection to 220.127.116.11
• Alert ID 4.67: HTTPS Connection to 18.104.22.168
• Alert ID 4.68: HTTPS Connection to 22.214.171.124
When investigating this event, this provides useful context that can help you immediately determine whether a friendly device has ever connected to a specific remote device. In a case where you are seeing suspicious traffic going to an unknown address, the fact that the friendly device has never communicated with this address before might be an indicator that something suspicious is going on, and more investigation is required.
The figure also shows one Changed Asset alert, indicating the use of a new HTTP client user agent string.
This type of context demonstrates that a friendly host is doing something that it has never done before. While this can mean something as simple as a user downloading a new browser, this can also be an indicator of malicious activity. You should take extra notice of devices that begin offering new services, especially when those devices are user workstations that shouldn’t be acting as servers.
At this point, we have the ability to discern any new behavior or change in behavior for a friendly host, which is an incredibly powerful form of friendly intelligence. While it may take some time for PRADS to “learn” your network when you first configure it, eventually, it can provide a wealth of information that would otherwise require a fair bit of session data analysis to accomplish.
Another way to make PRADS data actionable is to use it to define a baseline asset model. Since PRADS stores all of the asset information it collects for assets defined in the home_nets variable, this data can be parsed to show all of the data it has gathered on a per host basis. This is accomplished by using the prads-asset-report script, which is a Perl script that is included with PRADS. This script will take the output from a PRADS asset log file, and output a listing of all of the information it knows about each IP address. If you are using PRADS to log data to /var/log/prads-asset.log, then you can simply run the command prads-asset-report to generate this data. Otherwise, you can specify the location of PRADS asset data by using the -r <file> argument. A sample of this data is shown in Figure 14.10.
Notice in this output that PRADS also makes its best guess at the operating system architecture of each device. In the figure above, it can only identify a single device. PRADS is able to guess more accurately the more it can observe devices communicating on the network.
In some cases it might make the most sense to generate this report regularly and provide it in a format where analysts can access and search it easily. You can save the file that this script generates by adding the -w <filename> argument. In other cases, analysts might have direct access to the PRADS log data, which means they can use the prads-asset-report script itself to generate near real-time data. This can be done on the basis of an individual IP address, using the -i switch like this:
prads-asset-report -i 172.16.16.145
The output of this command is shown in Figure 14.11.
When generating an asset model from PRADS, remember it is a passive tool that can only report on devices it sees communicate across a sensor boundary. This means that devices that only communicate within a particular network segment and never talk upstream through a link that a sensor is monitoring will never be observed by PRADS. Because of this, you should pair PRADS with another technique like active scanning to ensure that you are accurately defining network assets.
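Pairing the two techniques comes down to comparing what each one saw. The following is a minimal sketch, assuming the hosts found by active scanning and the hosts observed passively have each been reduced to a set of IP addresses. The sample addresses and the function name are hypothetical.

```python
def compare_sources(scanned, passive):
    """Return hosts confirmed by both sources and hosts seen by only one."""
    return {
        "confirmed": sorted(scanned & passive),
        "scan_only": sorted(scanned - passive),
        "passive_only": sorted(passive - scanned),
    }


# Hypothetical results from an active scan and from passive observation.
scanned = {"172.16.16.1", "172.16.16.10", "172.16.16.30"}
passive = {"172.16.16.1", "172.16.16.10", "172.16.16.145"}

report = compare_sources(scanned, passive)
for category, ips in report.items():
    print(f"{category}: {', '.join(ips)}")
```

Hosts that appear only in scan data may never communicate across a sensor boundary, while hosts that appear only in passive data may have been powered off or filtering probes when the scan ran. Both lists are worth reviewing before the asset model is treated as complete.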
PRADS is an incredibly powerful but elegantly simple tool that can be used to build friendly intelligence. Because of its minimal requirements and flexibility, it can find its way into most SOC environments. You can read more about PRADS at http://gamelinux.github.io/prads/.
Once you know your network, you are prepared to begin to know your adversary. With this in mind, we begin to dive into threat intelligence. If you work in information security then you are no stranger to this term. With the prevalence of targeted attacks occurring daily, almost every vendor claims to offer a solution that will allow you to “generate threat intelligence to stop the APT.” While this is typically a bunch of vendor sales garbage gone awry, the generation of threat intelligence is a critical component of analysis in NSM, and pivotal for the success of a SOC.
Threat intelligence is a subset of intelligence as we defined it earlier in this chapter. This subset focuses exclusively on the hostile component of that definition, and seeks to gather data to support the creation of an intelligence product that can be used to make determinations about the nature of the threat. This type of intelligence can be broken down into three subcategories: strategic, operational, and tactical threat intelligence (Figure 14.12).
Strategic Intelligence is information related to the strategy, policy, and plans of an attacker at a high level. Typically, intelligence collection and analysis at this level is only performed by government or military organizations in response to threats from other governments or militaries. With that said, larger organizations are now developing these capabilities, and some of these organizations now sell strategic intelligence as a service. Strategic intelligence is focused on the long-term goals of the force supporting the individual attacker or unit. Artifacts of this type of intelligence can include policy documents, war doctrine, position statements, and government, military, or group objectives.
Operational Intelligence is information related to how an attacker or group of attackers plans and supports the operations that support strategic objectives. This is different from strategic intelligence because it focuses on narrower goals, often more timed for short-term objectives that are only a part of the big picture. While this is, once again, usually more within the purview of government or military organizations, it is common that individual organizations will fall victim to attackers who are performing actions aimed at satisfying operational goals. Because of this, some public organizations will have visibility into these attacks, with an ability to generate operational intelligence. Artifacts of this type of intelligence are similar, but often more focused versions of artifacts used for the creation of strategic intelligence.
Tactical Intelligence refers to the information regarding specific actions taken in conducting operations at the mission or task level. This is where we dive into the tools, tactics, and procedures used by an attacker, and where 99% of SOCs performing NSM will focus their efforts. It is here that the individual actions of an attacker or group of attackers are analyzed and collected. This often includes artifacts such as indicators of compromise (IP addresses, file names, text strings) or listings of attacker specific tools. This intelligence is the most transient, and becomes outdated quickly.
When analyzing tactical intelligence, the threat will typically begin as an IP address that shows up in an IDS alert or some other detection mechanism. Other times, it may manifest as a suspicious file downloaded by a client. Tactical threat intelligence is generated by researching this data and tying it together in an investigation. The remainder of this chapter is devoted to providing strategies for generating tactical threat intelligence about adversarial items that typically manifest in an NSM environment.
When an alert is generated for suspicious communication between a friendly host and a potentially hostile host, one of the steps an analyst should take is to generate tactical threat intelligence related to the potentially hostile host. After all, the most the IDS alert will typically provide you with is the host’s IP address and a sample of the communication that tripped the alert. In this section we will look at information that can be gained from having only the host’s IP address or a domain name.
The quickest way to obtain information about external and potentially hostile hosts is to examine the internal data sources you already have available. If you are concerned about a potentially hostile host, this is likely because it has already communicated with one of your hosts. If that is the case, then you should have collected some of this data. The questions you want to answer with this data are:
1. Has the hostile host ever communicated with this friendly host before?
2. What is the nature of this host’s communication with the friendly host?
3. Has the hostile host ever communicated with other friendly hosts on the network?
The answers to these questions can lie within different data sources.
Question 1 can be answered easily if you have the appropriate friendly intelligence available, such as the PRADS data we examined earlier. With this in place, you should be able to determine if this is the first time these hosts began communicating, or if it occurred at an earlier time. You might even be able to determine the operating system architecture of the host. If this data isn’t available, then session data is probably the quickest way to get this answer.
Question 2 is something that can only be answered by a data source with a higher level of granularity. While session data can tell you some basics of when the communication occurred and the ports that are in use, it doesn’t provide the depth necessary to accurately describe exactly what is occurring. In some cases, the detection tool that generated the initial alert will provide this detail. Snort and Suricata will typically provide the offending packet that tripped one of their signatures, and tools like Bro will provide as much additional data as you’ve configured it to. In other scenarios, you may need to look to FPC data or PSTR data to find answers. In these cases, packet analysis skills will come in handy.
Answering Question 3 will typically begin with session data, as it is the quickest way to get information pertaining to communication records between hosts. With that said, if you find that communication has occurred between the hostile host and other friendly devices then you will probably want to turn to another data source like FPC or PSTR data to determine the exact nature of the communication. If this data isn’t available, then PRADS data is another way to arrive at an answer.
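Questions 1 and 3 reduce to simple lookups once session data is in a queryable form. The following is a minimal sketch, assuming session records have been exported as (start time, source IP, destination IP) tuples; the record layout, function names, and addresses are hypothetical and are not the output format of any particular session tool.

```python
from datetime import datetime

# Hypothetical, simplified session records: (start time, source IP, destination IP).
SESSIONS = [
    (datetime(2013, 6, 1, 9, 0), "172.16.16.145", "203.0.113.7"),
    (datetime(2013, 6, 3, 14, 30), "172.16.16.145", "203.0.113.7"),
    (datetime(2013, 6, 3, 14, 45), "172.16.16.20", "203.0.113.7"),
    (datetime(2013, 6, 4, 8, 15), "172.16.16.20", "198.51.100.9"),
]


def first_contact(sessions, friendly, hostile):
    """Question 1: return the earliest session time between two hosts, or None."""
    times = [t for t, src, dst in sessions if {src, dst} == {friendly, hostile}]
    return min(times) if times else None


def other_friendly_peers(sessions, hostile, known_friendly):
    """Question 3: list other hosts that exchanged sessions with the hostile host."""
    peers = set()
    for _, src, dst in sessions:
        if hostile in (src, dst):
            # Record whichever end of the session is not the hostile host.
            peers.add(dst if src == hostile else src)
    return sorted(peers - {known_friendly})


print(first_contact(SESSIONS, "172.16.16.145", "203.0.113.7"))
print(other_friendly_peers(SESSIONS, "203.0.113.7", "172.16.16.145"))
```

Question 2 is deliberately left out of this sketch, since describing the nature of the communication requires packet or string data rather than session records.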
The internal analysis performed at this level is all about connecting the dots and looking for patterns. At a high level, these patterns might include a hostile host communicating with devices using a specific service, at specific time intervals, or in conjunction with other real world or technical events. At a more granular level, you might find patterns that indicate the hostile host is using a custom C2 protocol, or that the communication is responsible for several clients downloading suspicious files from other hosts.
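Communication at specific time intervals is one of the easier patterns to test for. The following is a rough sketch, assuming you have the session start times for a single pair of hosts as epoch seconds; the function name and the thresholds are hypothetical choices for illustration, and a real C2 channel may add enough jitter to defeat a check this simple.

```python
from statistics import mean, pstdev


def looks_periodic(timestamps, max_jitter_ratio=0.1, min_events=4):
    """Return True if the gaps between events are nearly constant.

    timestamps: session start times in epoch seconds.
    max_jitter_ratio: allowed standard deviation of the gaps, relative to the mean gap.
    """
    if len(timestamps) < min_events:
        return False  # too few events to say anything about regularity
    ordered = sorted(timestamps)
    gaps = [later - earlier for earlier, later in zip(ordered, ordered[1:])]
    average = mean(gaps)
    if average == 0:
        return False
    return pstdev(gaps) / average <= max_jitter_ratio


# Hypothetical start times: roughly every five minutes versus irregular.
regular = [0, 300, 601, 899, 1200]
irregular = [0, 10, 500, 520, 3000]
print(looks_periodic(regular), looks_periodic(irregular))
```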
The combined answers to these three questions will help you build threat intelligence surrounding the behaviors of the hostile host on your network. Often, analyzing the behavior of the hostile host in relation to a single event or communication sequence won’t provide the evidence necessary to further an investigation, but that same analysis applied to communication across the network could be the key to determining whether an incident has occurred.
Once you’ve looked inward, it is time to examine other available intelligence sources. Open source intelligence (OSINT) is a classification given to intelligence that is collected from publicly available resources. In NSM, this typically refers to intelligence gathered from open websites. The key distinction with OSINT is that it allows you to gather information about a hostile entity without ever directly sending packets to them.
Now we will look at a few websites that can be used to perform OSINT research related to IP addresses, domain names, and malicious files. This is a broad topic with a variety of different approaches, and the topic of OSINT research could easily have its own book. If you’d like a much more detailed list of websites that can be used to perform OSINT research, then check out http://www.appliednsm.com/osint-resources.
The Internet Assigned Numbers Authority (IANA) is a department of the Internet Corporation for Assigned Names and Numbers (ICANN) that is responsible for overseeing the allocation of IP addresses, autonomous system number (ASN) allocation, DNS root zone management, and more. IANA delegates the allocation of addresses, based upon region, to five individual Regional Internet Registries (RIRs). These organizations are responsible for maintaining records that associate each IP address with its registered owner. They are listed in Table 14.1.
Each of these registries allows you to query them for the registration records associated with an IP address. Figure 14.13 shows the results from querying the ARIN database for the registration records associated with an IP address in the 126.96.36.199/9 range. This was done from http://whois.arin.net/ui/advanced.jsp.
In this case, we can see that this block of IP addresses is allocated to Comcast. We can also click on links that will provide contact information for representatives at this organization, including abuse, technical, and administrative Points of Contact (POCs). This is useful when you detect a hostile device in IP space that is owned by a reputable company attempting to break into your network. In a lot of cases this will indicate that the hostile device has been compromised by another adversary and is being used as a hop point for launching an attack. When this occurs, it’s a common practice to notify the abuse contact for the organization that the attack appears to be coming from.
In a lot of cases, you will find that an IP address is registered to an ISP. In that case, you may have luck contacting the ISP if someone on their IP address space is attempting to attack your network, but in my experience this is rarely fruitful. This is especially true when dealing with ISPs outside of US jurisdiction.
Because IP addresses are divided amongst the five RIRs, you won’t necessarily know which one is responsible for a specific IP until you search for it. Fortunately, if you search for an IP address at an RIR’s website and the RIR isn’t responsible for that IP address, it will point you towards the correct RIR so that you can complete your search there. Another solution is to use a service that will make this determination for you, like Robtex, which we will look at in a moment.
Another useful piece of information that the registry record gives us is the Autonomous System Number (ASN) associated with the IP address. An ASN is a number used to identify a single network or group of networks controlled by a common entity. These are commonly assigned to ISPs, large corporations, or universities. While two IP addresses might be registered to two different entities, their sharing the same ASN might allow you to conclude that there is some relationship between the two addresses, though this is something to be evaluated on a case-by-case basis. You can search for ASN information specifically from each registry.
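Once you have recorded the ASN for each address under investigation, grouping addresses by ASN makes shared networks stand out. The following is a small sketch, assuming you have already collected the ASN for each address from the registry records by hand; the function name is hypothetical, and the sample addresses and ASNs come from ranges reserved for documentation and private use rather than from real lookups.

```python
from collections import defaultdict


def group_by_asn(ip_to_asn):
    """Return only the ASNs that are shared by more than one address."""
    groups = defaultdict(list)
    for ip, asn in ip_to_asn.items():
        groups[asn].append(ip)
    return {asn: sorted(ips) for asn, ips in groups.items() if len(ips) > 1}


# Hypothetical lookup results: IP address -> ASN.
lookups = {
    "203.0.113.7": 64512,
    "198.51.100.9": 64512,
    "192.0.2.44": 64600,
}

shared = group_by_asn(lookups)
print(shared)
```

A shared ASN is only a starting point. Two customers of the same large ISP will share an ASN without having any other relationship.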
Just like with IP addresses, researching domain names usually begins with finding the registered owner of the domain. However, it is important to remember to distinguish between an actual physical host and a domain name. IP space is finite and exists with certain limitations. In general, if you see an IP address in your logs then you can usually assume that the data you have collected in relation to that host actually did come from that IP address (at least, for session-oriented communication). You can also have a reasonable amount of faith that the IP address does exist under the ownership of the entity that registered it, even though that machine might have been compromised and be controlled by someone else.
A domain name serves as a pointer to a location. When you see a domain name in your logs, usually because one of your friendly hosts is accessing that domain in some way, the truth is that the domain can be configured to point to any address at any given time. This means that the domain name you are researching from yesterday’s logs might point to a different IP address now. For that matter, the domain you research now may not point to anything later. It is common for attackers to compromise a host and then reassign domain names to the IP addresses of those hosts to serve malware or act in another malicious capacity. When the owner of that host discovers that it has been compromised and eradicates the attacker’s presence, the attacker will reassign the domain name to another compromised IP address. Even further to this point, malware now has the ability to use domain name generation algorithms to randomly register domains that can be used for command and control. Because of all this, a domain isn’t compromised; the IP address the domain points to is compromised. However, a domain name can be used for malicious purposes. This should be considered when researching potentially malicious domain names.
With that said, domain name registration is managed by ICANN, which delegates this authority to domain name registrars. Whenever someone registers a domain name, they are required to provide contact information that is associated with this domain. Unfortunately, this process usually involves very little verification, so there is nothing to say that the domain registration information for any particular domain is actually valid. Furthermore, a lot of registrars provide anonymous registration services, where they will mask the actual registered owner of the domain and provide their own information. Even so, there are plenty of instances where useful information can be obtained from domain name registration.
Domain information can be queried in a number of ways. One way is to simply pick any domain name registrar such as GoDaddy or Network Solutions and perform a whois query from their website. Another method is to use the whois command from a Unix command line (Figure 14.14). This uses the simple syntax:
whois <domain name>
Figure 14.14 Whois Query Results for ESPN.com
You can see that this tells us the registrant information, as well as a few other useful pieces of information. The registration dates can be helpful in determining the validity of a domain. If you suspect that a domain you’ve seen one of your hosts communicating with is hosting malicious content and you find that the domain was registered only a couple of days ago, this would indicate a higher potential for malicious activity actually occurring.
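The registration date check can be automated once you have the text of a whois response. The following is a minimal sketch, assuming the response contains a line of the form "Creation Date: YYYY-MM-DD..."; whois output is not standardized, so that field label, the function names, the 30-day threshold, and the sample text are all assumptions to adjust for the responses you actually see.

```python
import re
from datetime import date

# Assumed field label; other whois servers use different wording.
CREATION_RE = re.compile(
    r"^\s*Creation Date:\s*(\d{4})-(\d{2})-(\d{2})",
    re.IGNORECASE | re.MULTILINE,
)


def domain_age_days(whois_text, today):
    """Return the domain's age in days, or None if no creation date is found."""
    match = CREATION_RE.search(whois_text)
    if match is None:
        return None
    created = date(*(int(part) for part in match.groups()))
    return (today - created).days


def recently_registered(whois_text, today, threshold_days=30):
    """Flag domains registered within the threshold (a hypothetical cutoff)."""
    age = domain_age_days(whois_text, today)
    return age is not None and age <= threshold_days


# Hypothetical whois excerpt, not a real registration record.
sample = "Domain Name: EXAMPLE.TEST\n   Creation Date: 2013-06-01T04:00:00Z\n"
print(domain_age_days(sample, date(2013, 6, 4)))
```

A recent registration date raises suspicion but does not prove anything by itself, so treat the result as one input among several.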
This output also lists the DNS servers associated with the domain, which can be used to find correlation between multiple suspicious domains. You can also use some additional DNS Kung Fu to attempt various techniques (like zone transfers) to enumerate subdomains and DNS host names, but this isn’t recommended in most instances since it will involve actually interacting with potentially hostile DNS servers. If you want to know more about doing this, there are a fair number of guides on the Internet, as well as videos at http://www.securitytube.net.
Rather than going around to all of these different websites in order to research IP addresses and domain name registration information, I tend to use publicly available websites that will provide all of this information at a single stop. One of my favorites is Robtex (http://www.robtex.com). Robtex provides a lot of useful information, including everything we’ve discussed up until this point summarized in a very useful interface. In Figure 14.15 I’ve done a search for espn.com and browsed Robtex’s Records tab.
In this image, you can see that Robtex provides all of the DNS information that was obtained, the associated IP addresses, and the ASNs tied to those IP addresses. The interface helps an analyst build a usable picture quickly.
In Chapter 8, we discussed IP and domain reputation at length. While reputation can be useful for detection, it is also immensely useful for analysis. If an IP address or domain has been associated with malicious activity in the past, then there is a good chance that it might be associated with malicious activity in the present.
I listed a few of my favorite sources of reputation information for detection purposes in Chapter 8. Those sites were optimized for detection because they generate lists that can be fed into detection mechanisms. Those sites can also be used for analysis of IP addresses and domain names, but I won’t rehash those here. Instead, I’ll discuss a couple of my other favorite reputation websites that are more suited for post-detection analysis: IPVoid and URLVoid.
IPVoid (http://www.ipvoid.com/) and URLVoid (http://www.urlvoid.com/) are two sites that were developed as free services by a company called NoVirusThanks. These services connect to multiple other reputation lists (including a few of those discussed in this book) and provide results that indicate whether or not the IP or domain you’ve entered is found on any of those lists. Figures 14.16 and 14.17 show example output from both services.
In both outputs, the services will provide a header with basic information about the IP or domain, along with a statistic of the number of blacklists that match your search. In the case of Figure 14.16 you can see that the domain was found on 3/28 (11%) of the blacklists that were searched by URLVoid. Those 3 blacklists are listed at the top of the report, and each blacklist has a link in the Info column that will take you directly to the reference to this domain name at those sites. The output shown in Figure 14.17 has had the IP address information from IPVoid trimmed off for size, but shows several of the IP blacklist services that are used.
IPVoid and URLVoid are a great one-stop shop for determining whether an IP address or domain has found its way onto a reputation blacklist somewhere. While this isn’t always a clear-cut indicator of malicious activity, it frequently points in that direction.
When performing analysis, keep in mind that sometimes multiple domains exist on a single IP address. While you may find that a single domain appears on multiple public blacklists, that doesn’t necessarily mean that every other domain whose content is hosted on the same IP address is also malicious. This is especially true of shared hosting servers. On these servers, it is typically a web application flaw that results in one site being compromised. More often than not, this compromise is limited to just the affected domain. There are certainly exceptions to this line of thought, but this is something you should keep in mind when analyzing IP and domain reputation.
One way to quickly identify domains that are hosted on an IP address while ensuring that you aren’t communicating with any remote DNS servers yourself is to use the Domains by IP service (http://www.domainsbyip.com/). The output of this tool is shown in Figure 14.18.
In the image above, the results tell us that four different domains are hosted on this IP address. This looks like it is probably a shared hosting server based upon the number of domains with no clear link between them. We can also see that the service provides a listing of “nearby” IP addresses. These are addresses that are numerically close to the IP address we searched for, and also host domains. This service is very useful, but it isn’t all-inclusive, so your mileage may vary when using it.
Now that we’ve looked at a few ways to get OSINT information on hosts, let’s look at OSINT sources for files.
After IP addresses and host names, the next most common artifacts you will encounter while performing NSM analysis are files. Sometimes this might just be a file name, other times it could include an MD5 hash, and in the best of scenarios you may have access to the entire file. Suspicious files are usually observed being downloaded from suspicious hosts, or in relation to an alert generated by a detection mechanism, such as an IDS. Regardless of how much of this information you have or where it came from, intelligence related to files can be used to build tactical intelligence about the threat you are investigating.
Just like with host intelligence, there are a number of sources available on the Internet that can be used for researching suspicious files. Let’s take a look at a few of these resources.
If you have the actual file that you suspect to be malicious, the easiest thing to do is perform a behavioral analysis of this file. This is something that can be done in house, but if you don’t have that capability, you may be better off submitting the file to an online malware sandbox. These sandboxes allow users to submit files and automatically perform a behavioral analysis based upon the changes the malware makes to the system and the type of actions it tries to take. Let’s take a look at a few of these sandboxes.
Perhaps the easiest way to determine if a file is malicious is to run an antivirus tool against it. Unfortunately, the detection rate for antivirus in the modern security landscape is very low, and the chances that a single antivirus product will be able to detect a strain of malware are 50/50 or less. Because of this, the chances of detecting malware are increased by submitting a malware sample to multiple antivirus engines. It isn’t entirely feasible to configure a single system with multiple AV engines, nor is it cheap to license them. However, there is an online solution called VirusTotal.
VirusTotal (http://www.virustotal.com) is a free service that was bought by Google in 2012, and analyzes suspicious files and URLs using multiple antivirus engines. There are multiple ways to submit files to VirusTotal, including their website, by e-mail, or by any tool that uses their API. My preferred mechanism is their Google Chrome extension. Once you submit the file, VirusTotal will perform its analysis and generate a report indicating which antivirus engines detected a match for the file or its content, and the name of the string(s) that match.
An example of this output is shown in Figure 14.19. VirusTotal currently supports 49 different antivirus engines, including all of those from the larger and more popular antivirus providers.
In the example above, you can see that this report indicates the file that was submitted was detected as malware by 7 out of 48 different antivirus engines. Two of the engines that detected this are shown: the Antiy-AVL and Baidu-International engines. They both detect this file as some sort of VNC-based application, which can be used to remotely control a system. The meter at the top right of the screen shows an indication of whether the file is actually malicious based upon the number of matches and a few other factors. In this case, it thinks that the file we’ve submitted is probably malicious.
While VirusTotal doesn’t share submitted samples publicly, it does share samples that match at least one antivirus engine with antivirus companies. Keep this in mind when submitting files that might be highly sensitive or involved in targeted attacks.
One of the most popular sandbox environments for malware analysis is Cuckoo. Cuckoo (http://www.cuckoosandbox.org) will launch an instance of a virtual machine, execute malware, and perform a variety of analysis tasks. This includes recording the changes and actions the malware makes, any changes to the system that occur, Windows API calls, and files that are created or deleted. Beyond this, Cuckoo can also create a full memory dump of the system or selected processes, and takes screenshots of the virtual machine as the malware is executing. All of this goes into a final report that Cuckoo can generate. Cuckoo is designed around a modular system that allows the user to customize exactly what occurs during the processing of malware and the reporting of findings.
Cuckoo sandbox is a tool that you can download and deploy internally, and one that I’ve seen used successfully in a lot of environments. However, this section is about online malware analysis sandboxes, and that is what exists at http://www.malwr.com. Malwr is a website that utilizes Cuckoo to perform malware analysis services for free. It is operated as a non-commercial site that is run by volunteer security professionals with the exclusive intent to help the community. The files you submit are not shared publicly or privately unless you specify that this is allowed when you submit.
Figures 14.20 and 14.21 show an excerpt of a Cuckoo report from Malwr.
In these figures, the first image shows Cuckoo providing information about signatures that the malware has matched, indicating that those sections of the report should be examined in more detail. This also shows screenshots from the virtual machine where the malware was executed. Figure 14.21 shows results from the behavioral analysis performed by Cuckoo. In this case, we see some of the actions taken by the file mypcbackup0529.exe.
Malwr publishes shared analysis reports on its home page, so you can go there and view these reports to get a real idea of the power that Cuckoo provides. You can also search these reports based on the MD5 hash of a malware sample to see if a report already exists for the file. This will get you to the results you want to see faster without waiting for analysis to be completed.
If you have the capacity to do so, setting up a Cuckoo sandbox internally is a useful venture for any SOC or NSM environment. The setup is a bit long and complicated, but it provides much more flexibility than you will find from the online service, including the ability to customize analysis routines and reporting. I think that you will find that Cuckoo is a very full-featured malware analysis sandbox that can come in handy in a variety of situations during daily analysis.
ThreatExpert is another online sandbox that provides similar functionality to Cuckoo and Malwr. ThreatExpert (http://www.threatexpert.com) allows for the submission of suspicious files via its website. It will execute submitted files in a sandbox environment to perform a limited behavioral analysis of the file. The end result of this analysis is a report that details the actions that the suspicious file took in relation to the file system, system registry, and more. Figures 14.22 and 14.23 show excerpts from a ThreatExpert Report.
In the first image, we can see that the file that was submitted appears to be packed with UPX, and that ThreatExpert thinks that this file contains characteristics that represent a security risk, including the creation of a startup registry entry, and communication with a remote IRC server. The second figure provides more technical details associated with these findings, including memory modifications, registry modifications, and the creation of a mutex and opening of a port.
ThreatExpert also has a very robust search feature. It will allow you to search for files that have already been analyzed by searching based upon the files’ MD5 or SHA1 hash, so that you don’t have to wait for it to re-analyze a file that may have already been submitted. Its most powerful feature, however, is the ability to search for terms within reports. This means that you can search for arbitrary text, file names, IP addresses, or domain names that you have found elsewhere on your network to see if they show up in relation to any of ThreatExpert’s malware analysis reports. I’ve been involved in many investigations where the only intelligence I was able to find regarding a certain address was within a ThreatExpert malware report, and many of those times that has been enough to lead me down the right path towards figuring out whether an incident had occurred.
While ThreatExpert is an efficient sandbox, it doesn’t go into quite as much detail as Cuckoo, and it doesn’t have the option of being downloaded and installed locally. With that said, in a lot of instances it will get the job done just fine, and its search feature makes it incredibly valuable for NSM analysis.
The quickest way to identify any file is by its cryptographic hash, typically MD5 but sometimes SHA1. This is advantageous because a single hash value can be used to identify a file regardless of its name. We’ve already seen instances where both Malwr and ThreatExpert identify files using these hashes, so it makes sense that it would be relatively easy for someone to compile a list of hashes of known malware. That is exactly what Team Cymru did.
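If you have the file itself, computing both hashes takes only a few lines. The following is a minimal sketch using Python’s standard hashlib module; the function name and chunk size are arbitrary choices, and the file is read in chunks so that large samples don’t need to fit in memory.

```python
import hashlib


def file_hashes(path, chunk_size=65536):
    """Return the (MD5, SHA1) hex digests of the file at path."""
    md5, sha1 = hashlib.md5(), hashlib.sha1()
    with open(path, "rb") as handle:
        # Read fixed-size chunks until read() returns an empty bytes object.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            md5.update(chunk)
            sha1.update(chunk)
    return md5.hexdigest(), sha1.hexdigest()
```

Calling file_hashes on a suspicious file returns the two values you would paste into a search at Malwr or ThreatExpert.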
The Team Cymru Malware Hash Registry (http://www.team-cymru.org/Services/MHR/) is a database containing known malware hashes from multiple sources. This database can be queried in a lot of ways, and provides a quick and efficient way to determine if a file you’ve collected during detection or analysis is malicious.
The easiest way to query the registry is actually with the WHOIS command. This may seem a bit odd, but it works surprisingly well. You can query the database by issuing a WHOIS command in the following format:
whois -h hash.cymru.com <hash>
The results of two of these queries are shown in Figure 14.24.
In the figure above, we complete two queries that each return three columns. The first column contains the hash value itself. The second column contains the timestamp (in epoch format, which you can convert to local time by using the date -d command) of the last time that the hash was observed. The third column shows the percentage of antivirus detection engines that classified the file as malicious. In the first submission we see that the file was detected as malicious by 79% of antivirus engines. The second submission lists NO_DATA for this field, which means that the hash registry has no record for that hash value. The malware hash registry will not keep records on hash values with a detection rate below 5%.
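If you save the whois responses to a file, the columns described above are easy to parse. The following is a minimal sketch, assuming each response line is whitespace-separated as hash, epoch timestamp, and detection percentage, with NO_DATA standing in when the registry has no record; the function name and the sample lines are hypothetical and are not real registry output.

```python
from datetime import datetime, timezone


def parse_mhr_line(line):
    """Parse one response line into a dictionary of hash, last-seen time, and rate."""
    fields = line.split()
    if len(fields) < 2:
        raise ValueError("unexpected response line: %r" % line)
    result = {"hash": fields[0], "known": False, "last_seen": None, "detection": None}
    if "NO_DATA" in fields[1:]:
        return result  # the registry has no record for this hash
    if len(fields) != 3:
        raise ValueError("unexpected response line: %r" % line)
    result["known"] = True
    # Epoch seconds converted to a timezone-aware UTC datetime.
    result["last_seen"] = datetime.fromtimestamp(int(fields[1]), tz=timezone.utc)
    result["detection"] = int(fields[2])
    return result


# Hypothetical sample lines for illustration only.
hit = parse_mhr_line("900150983cd24fb0d6963f7d28e17f72 1277221946 79")
miss = parse_mhr_line("d41d8cd98f00b204e9800998ecf8427e NO_DATA")
print(hit["last_seen"].isoformat(), hit["detection"], miss["known"])
```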
The Team Cymru Malware Hash Registry can be useful for the individual analysis of suspicious files, but because of the extensive number of ways you can query the database, it also lends itself well to automated analysis. For instance, Bro provides functionality to use its file extraction framework (Chapter 10) in conjunction with its intelligence framework (Chapter 8) to selectively extract files and automatically compare their hashes against the hash registry. This is incredibly valuable from a detection perspective, and can ease the analysis burden of an event.
You can read more about the malware hash registry and the numerous ways you can query it by visiting its website, listed above.
Combining all of the IP and domain name intelligence we’ve discussed here with the observations that you’ve made from your own network data should give you the resources you need to begin building a tactical intelligence product.
Know your network, know your adversary, and you will be able to get to the bottom of any investigation. The capability to collect and generate intelligence for friendly assets combined with the ability to research and derive information about potentially hostile entities is critical for the successful analysis of any network security event. In this chapter we discussed methods for doing all of these things. While there are a number of ways to approach intelligence, it is key that intelligence is approached as a product that is designed to help analysts make decisions that lead to escalation. In the next chapter we will discuss the analysis process, which will draw upon information gained during the intelligence collection and generation process.