Open source information is one of the most overlooked resources available to analysts and researchers. Simply put, it’s any publicly available data that, when correlated and analyzed, can become actionable intelligence. At that point, researchers consider it open source intelligence (OSINT). While anyone can find open source information from resources such as the internet, books, and published research, these resources are vast, and unfortunately the sheer amount of data can overwhelm even experienced researchers. It’s easy to spend too much time hunting only to yield too few (or too many) results.
Luckily, plenty of publicly available tools, both free and paid, can help you with your investigations. This chapter will discuss these tools and the capabilities they provide, as well as how to leverage each in your research and analysis. I’ve selected the tools covered here based on their capability and availability. While some charge for certain features, they all have free, limited versions that you can leverage.
Most open source tools fall into one of two categories: active and passive. Passive tools do not alter or interfere with the endpoint system against which they are run. For example, you may use a tool to query DNS servers in search of IP addresses associated with a specific domain or URL. The tool uses legitimate queries to discover what other domains are present on the same infrastructure, and it does so without actively interacting with the target. Instead, it learns about the target infrastructure from domain records kept by unrelated third-party DNS servers.
Now let’s say you have a list of probable attacker-related IP addresses and domains. You decide that you want to identify any open ports and vulnerabilities present on the identified infrastructure. To accomplish this, you might use network and vulnerability scanning tools to profile the infrastructure. These are active tools: to scan the machines, your tool must actually connect to the remote hosts, which creates noise. Moreover, interacting with the attacker’s domains and IPs consumes not only your system’s resources but also your targets’. The interaction could alert the attacker that you’re on to them. Even worse, the attacker could use it to trace the activity back to you.
For all of these reasons, you should know ahead of time whether the tool you’re leveraging uses active or passive means to achieve the desired results. Most tools have documentation or at least a README file providing such details.
It’s important to protect yourself by integrating operational security (OPSEC) into your open source research tasks.1 OPSEC is the act of protecting your anonymity when engaging in online research or operations. Much like spies, good researchers never get caught, except when writing or talking about their work publicly. Think of it like this: if you witnessed a bank robbery, would you want the criminals to know your name, where you live, and that you saw what they were doing? Of course not, yet exposing themselves in this way is a frequent mistake that security analysts and researchers make.
People write entire books about the topic of OPSEC, and we can’t cover every aspect of the subject in just one chapter. However, to ensure your anonymity, at a minimum, you should use the following:
As an additional precaution, you can use a virtual machine (VM) to create an image with your preferred operating systems and tools to use while conducting research. Once you configure the VM, you can take a snapshot, which captures the state of the system at the time at which it was created. Now, when you finish your research, you can revert the VM to its clean, original state. Several open source VMs exist that come prebuilt with tools and configurations geared toward safety and research. However, these tools change constantly, so I recommend you do some research to find what best suits your needs.
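The snapshot-and-revert routine is easy to script, so it becomes part of every session. The sketch below assumes VirtualBox as the hypervisor and its VBoxManage command line tool. The text doesn’t name a hypervisor, and the VM and snapshot names are hypothetical placeholders. By default, it only prints the commands.

```python
import subprocess

# Assumption: VirtualBox is the hypervisor and VBoxManage is on your PATH.
# The VM and snapshot names are hypothetical placeholders.
VM_NAME = "osint-research"
SNAPSHOT = "clean-baseline"


def take_snapshot_cmd(vm: str, snapshot: str) -> list:
    """Capture the VM's current (clean) state under a snapshot name."""
    return ["VBoxManage", "snapshot", vm, "take", snapshot]


def poweroff_cmd(vm: str) -> list:
    """Stop a running VM so its snapshot can be restored."""
    return ["VBoxManage", "controlvm", vm, "poweroff"]


def restore_snapshot_cmd(vm: str, snapshot: str) -> list:
    """Revert the (powered-off) VM to the named snapshot."""
    return ["VBoxManage", "snapshot", vm, "restore", snapshot]


def run(cmd: list, dry_run: bool = True) -> None:
    """Print the command and only execute it when dry_run is False."""
    print(" ".join(cmd))
    if not dry_run:
        subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run(take_snapshot_cmd(VM_NAME, SNAPSHOT))     # once, right after setup
    # ... research session happens here ...
    run(poweroff_cmd(VM_NAME))                    # end of session
    run(restore_snapshot_cmd(VM_NAME, SNAPSHOT))  # back to the clean state
```

Other hypervisors have equivalent snapshot commands. Check their documentation rather than assuming these flags carry over.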
Make OPSEC part of your standard research methodology. You can never be too careful when you’re dealing with criminals and nation-states that may try to hack into your organization’s infrastructure.
Before you begin hunting, consider any legal and ethical boundaries you may unintentionally cross if you misuse a tool. Laws differ in many parts of the world, and some tools obtain information through means that may not be legal.
Many of the tools we’ll discuss use both passive and active techniques to achieve the desired results. Unless you’re a penetration tester or have received the proper authorization, stick to passive techniques. Some active techniques may be considered hacking and therefore carry legal penalties. This usually happens when a researcher downloads a tool with a feature they don’t truly understand. For example, certain network enumeration tools attempt to brute-force the DNS server to obtain the names of subdomains and infrastructure associated with the queried domain. This may yield the results you’re looking for, but it could still be (and likely is) illegal in the country or region where you reside.
Additionally, use open source tools only on approved systems. Often, corporate networks won’t allow this type of activity, and the network may mistake it for something malicious. This can bring unwanted attention to your research, which you want to avoid whenever possible.
Open source information can sometimes help you identify an adversary’s infrastructure, that is, if you know where to look and what to look for. Use the tools in this category to identify attacker resources used to distribute malware, exfiltrate victim data, and control attacks. Then enumerate attacker domains to identify the subdomains and IP addresses that host them. This intel can then help you discover additional malware, victims, and tactics that attackers use.
DNSDB2 is a for-pay service, offered by Farsight Security, that provides access to passive DNS data. The service includes information on the first and last time a domain resolved to a known IP address and vice versa. It also shows other domains hosted on the same IP address and the domains’ hosting timestamps. This passive DNS data is a valuable resource that you can use to identify additional adversary infrastructure.
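To see how passive DNS data reveals related infrastructure, consider the pivot from a domain to the IPs it resolved to, and from those IPs to other domains. The sketch below is a minimal version of that pivot over records you have already exported. The records are hypothetical, and the field names only loosely follow DNSDB-style output (rrname, rdata, time_first, time_last). Check your provider’s actual schema.

```python
from collections import defaultdict
from datetime import datetime, timezone

# Hypothetical passive DNS records. Field names loosely follow DNSDB-style
# output (rrname, rdata, time_first, time_last); check your provider's schema.
RECORDS = [
    {"rrname": "login-update.example", "rdata": "203.0.113.10",
     "time_first": 1583020800, "time_last": 1585699200},
    {"rrname": "cdn-sync.example", "rdata": "203.0.113.10",
     "time_first": 1584230400, "time_last": 1586908800},
    {"rrname": "login-update.example", "rdata": "198.51.100.7",
     "time_first": 1585785600, "time_last": 1588291200},
    {"rrname": "unrelated.example", "rdata": "192.0.2.55",
     "time_first": 1577836800, "time_last": 1609372800},
]


def cohosted_domains(records, domain):
    """For every IP `domain` resolved to, list the other names seen on that IP."""
    names_by_ip = defaultdict(set)
    for rec in records:
        names_by_ip[rec["rdata"]].add(rec["rrname"])
    ips = {rec["rdata"] for rec in records if rec["rrname"] == domain}
    return {ip: names_by_ip[ip] - {domain} for ip in sorted(ips)}


def seen_window(record):
    """Convert first/last-seen epoch seconds into UTC dates."""
    def to_date(ts):
        return datetime.fromtimestamp(ts, tz=timezone.utc).date().isoformat()
    return to_date(record["time_first"]), to_date(record["time_last"])


if __name__ == "__main__":
    for ip, others in cohosted_domains(RECORDS, "login-update.example").items():
        print(ip, sorted(others))
```

Treat co-hosted domains as leads, not conclusions. Shared hosting places many unrelated sites on one IP, so compare the first-seen and last-seen windows before linking a domain to the attacker.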
Passive DNS providers usually charge a subscription fee for access to their data, though most offer some level of free access to passive DNS records. Fortunately, if you are a researcher, you can request a free account from Farsight. Apply for a “grant” account, or find out how to purchase Farsight’s DNSDB service for professional use, on their website.
PassiveTotal3 provides access to data that you can use to footprint, or discover and enumerate, infrastructure. It includes several useful data sources that you can query through a single web interface. Use it to find passive DNS information, domain registration records, and other infrastructure-related data. Free accounts are limited to 15 queries per day; paying members receive greater access. For more information about PassiveTotal, visit its website.
DomainTools4 is a service that lets you view domain registration and IP resolution data. Like many of the tools and resources discussed in this section, DomainTools offers free and for-pay services. Use it to find the following:
Unfortunately, DomainTools’s usefulness has degraded in recent years, with the increasing use of domain privacy protection services and the enforcement of the European Union’s General Data Protection Regulation (GDPR). Still, it offers many unique domain registration correlation capabilities that most other vendors don’t.
Whoisology5 maintains both current and historical domain registration records. It also lets you conduct queries against them. Unlike other services, this tool lets you cross-reference registration data. For example, an analyst using Whoisology can query an email address—or even the physical address—used to register a domain and return every other domain registered with the same information. Sometimes this information reveals an attacker’s additional infrastructure. Whoisology allows for a limited number of free queries per day and a larger volume for paying members.
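The cross-referencing Whoisology performs is essentially an index from registration values to domains. The sketch below reproduces that idea over a handful of hypothetical WHOIS records. It is not Whoisology’s API, and the record fields are assumptions.

```python
from collections import defaultdict

# Hypothetical WHOIS records; real services expose many more fields.
WHOIS = [
    {"domain": "login-update.example", "email": "ops.mail@example.net", "address": "12 Fake St"},
    {"domain": "cdn-sync.example", "email": "ops.mail@example.net", "address": "99 Other Rd"},
    {"domain": "invoice-portal.example", "email": "billing@example.org", "address": "12 Fake St"},
    {"domain": "unrelated.example", "email": "admin@example.com", "address": "1 Main St"},
]


def build_index(records, field):
    """Map each normalized value of `field` to the domains registered with it."""
    index = defaultdict(set)
    for rec in records:
        value = rec.get(field)
        if value:
            index[value.strip().lower()].add(rec["domain"])
    return index


def pivot(records, seed_domain, fields=("email", "address")):
    """Domains that share any registration value with seed_domain."""
    seed = next(r for r in records if r["domain"] == seed_domain)
    related = set()
    for field in fields:
        value = seed.get(field)
        if value:
            related |= build_index(records, field)[value.strip().lower()]
    related.discard(seed_domain)
    return related


if __name__ == "__main__":
    print(sorted(pivot(WHOIS, "login-update.example")))
```

In practice, exclude values belonging to privacy-protection services before pivoting. Otherwise, one proxy email links thousands of unrelated domains.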
DNSmap6 is a command line tool used to discover subdomains. As the name suggests, the tool relies on DNS records to map out infrastructure. This is useful in cases when you’ve identified an attacker-created domain and want to locate additional infrastructure. Note that the tool uses both passive and active methods to enumerate subdomains. Even so, it’s one of the best free and publicly available tools for finding adversary infrastructure.
Malware is at the center of almost all cyberattacks. As you’ve discovered throughout this book, you can often learn about your attacker by analyzing this malware. For example, if you know the malware’s purpose, you can infer the attacker’s motivations. Additionally, most malware communicates with adversary-controlled infrastructure, and if you can identify the IP addresses and domains used in these communications, you might be able to identify other malware associated with the same infrastructure.
Two types of malware analysis tools exist: dynamic analysis tools and static analysis tools. Dynamic analysis tools perform automated analysis using software that runs malicious binaries in a sandbox for monitoring and analysis purposes, without user interaction. A sandbox is an isolated, protected environment that mimics a legitimate system. The sandbox cannot access legitimate systems or resources, allowing malware to run safely for analysis purposes. The analysis software notes any changes made to the sandbox, as well as any network communications, and produces a human-readable report. Dynamic analysis is fast and efficient; however, in some cases attackers build antianalysis functions into their malware to detect and prevent analysis.
In these cases, you’ll have to perform static analysis. Static analysis is when a human, not automated software, manually examines a binary to determine its malicious purposes. In static analysis, you’ll often reverse engineer the malware and then go through its code and document your findings.
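Static analysis usually starts with simple triage before any reverse engineering: hashing the sample so you can look it up in repositories, and pulling out readable strings such as URLs, file paths, or error messages. The sketch below shows both steps, assuming you read the sample as raw bytes. It never executes the file and is no substitute for a disassembler.

```python
import hashlib
import re


def file_hashes(data: bytes) -> dict:
    """Hashes commonly used to look a sample up in malware repositories."""
    return {
        "md5": hashlib.md5(data).hexdigest(),
        "sha1": hashlib.sha1(data).hexdigest(),
        "sha256": hashlib.sha256(data).hexdigest(),
    }


def ascii_strings(data: bytes, min_len: int = 4) -> list:
    """Runs of printable ASCII, similar in spirit to the Unix `strings` tool."""
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    return [m.group().decode("ascii") for m in pattern.finditer(data)]
```

To use it, read a sample with `open(path, "rb").read()` inside your isolated analysis VM and pass the bytes to both functions. Packed or encrypted malware will yield few meaningful strings, which is itself a hint that deeper analysis is needed.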
VirusTotal is one of the world’s largest and most popular malware repositories. Analysts can conduct a limited number of queries per day with a free account, while other features require a for-pay account. Figure 7-1 shows VirusTotal’s front-end web interface.
One great use of VirusTotal is determining whether an IP or domain has ever been associated with malware. When malware attempts to communicate with other domains or IP addresses (regardless of whether they’re good or bad), VirusTotal captures that information. You can then query a domain or IP address to get the hashes of malware seen calling out to it, or query a malware hash to see the infrastructure it contacted.
What’s more, you can also see historical IP address resolution. While VirusTotal contains far less historical DNS data than a passive DNS provider like the ones covered in this chapter, it does provide another angle on the DNS-related data you’re querying: it provides the hashes of any malware associated with the domain or IP address in question.
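The same lookups can be scripted against VirusTotal’s API. The sketch below assumes the v3 REST API and its relationship endpoints, which return JSON with a `data` list. Two relationships are relevant here: `communicating_files` (malware that called out to the indicator) and `resolutions` (DNS history). It also assumes that your key’s tier permits these relationships; some may require a premium key. Check the current API reference before relying on it.

```python
import json
import urllib.parse
import urllib.request

API_ROOT = "https://www.virustotal.com/api/v3"


def relationship_url(indicator: str, relationship: str,
                     kind: str = "domains", limit: int = 10) -> str:
    """kind: "domains" or "ip_addresses"; relationship: for example
    "communicating_files" (malware calling out) or "resolutions" (DNS history)."""
    quoted = urllib.parse.quote(indicator, safe="")
    return f"{API_ROOT}/{kind}/{quoted}/{relationship}?limit={limit}"


def fetch(url: str, api_key: str) -> dict:
    """Authenticated GET; needs network access and your own API key."""
    request = urllib.request.Request(url, headers={"x-apikey": api_key})
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)


def sample_hashes(response: dict) -> list:
    """File objects in a relationship response use their SHA-256 as the id."""
    return [item["id"] for item in response.get("data", [])
            if item.get("type") == "file"]
```

A typical call is `sample_hashes(fetch(relationship_url("evil.example", "communicating_files"), api_key="YOUR_KEY"))`. For the reverse direction, the file object’s contacted-domain and contacted-IP relationships serve the same purpose. Remember that free keys are rate limited.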
VirusTotal offers many useful features with its for-pay membership. For example, you can download malware from VirusTotal into your own environment when you want to conduct additional analysis. VirusTotal also provides the packet capture (PCAP) recorded at the time of analysis. Using third-party tools like Wireshark, which we detail later in this chapter, you can review the malware’s network communication at the packet level. You can also write your own Yara rules, which describe unique characteristics of a malicious binary, and run them against VirusTotal data to identify additional malware samples that share those characteristics. VirusTotal supports many search operators7 and variables that allow you to comb through its data for specific types of information. Take the time to learn the various operators, because they’ll make the tool much more useful.
Hybrid Analysis8 is another malware repository that can provide dynamic analysis of malware and assist in discovering related infrastructure and samples. It can even provide context about the functions and purpose of a malicious binary. Anyone can submit files and query the repository with a free account, while other features require a paid membership (Figure 7-2).
One of the useful features Hybrid Analysis provides is a screenshot of the file while it’s actively running. For instance, when researching a lure document, a fraudulent document the attacker tricks the victim into opening, you might want to view what the victim would see when opening the file for the first time. Keep in mind that any file you submit will be publicly available.
Individuals can run queries and submit files for free, while other features require a paid subscription. Combining the analysis reports from both VirusTotal and Hybrid Analysis can be extremely useful, as each service provides different information about the malware, allowing you to fill in gaps. (For users with paid memberships, both sites provide similar information.)
Joe Sandbox9 is a malware repository that has both free and for-pay services. A free account allows users to search for malware samples using their hashes or other identifying traits. This tool is particularly useful when you’re looking for specific files, thanks to several built-in filters that it constantly updates with information from users and its own automation.
Joe Sandbox categorizes samples by the platform they’re designed to infect (such as Windows, Mac, or Linux). Figure 7-3 shows some of the filters and interface options that Joe Sandbox presents.
Joe Sandbox also provides the ability to query command line parameters. When malware executes, it will often run commands on the infected systems, and the ability to query these commands can help you find other related samples.
Another unique feature is the tool’s static analysis options. Although it requires a paid account, it allows users to submit a sample for static analysis. This may be necessary when dealing with malware in which the developer built in antianalysis components that prevent the malware from running in a sandbox.
A commercial platform, Hatching Triage has features available for free and additional capabilities as part of paid researcher accounts. Because Triage was developed for enterprise use, it might provide analysis results on samples that don’t execute in other sandbox environments due to antianalysis capabilities designed into the malware.10
Hatching Triage is especially useful when analyzing ransomware. The interface provides you with the ransom note, any of the attacker email addresses used to communicate with the victim, and any URLs included in the attack, such as payment and data-leak websites, making it easy to review and extract pertinent information. You can also look for samples by searching for the ransomware family name. This is a quick way to identify fresh samples and see if the attacker updated information such as their contact email or domains.
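Once you have a ransom note’s text, pulling out contact emails, URLs, and onion addresses is quick to automate when you compare many samples of one family. The sketch below is a regex-based heuristic over plain text you have already copied from a report. It is not a Triage feature. The onion pattern assumes current (v3) addresses of 56 base32 characters, and the URL pattern may capture trailing punctuation.

```python
import re

EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
URL = re.compile(r"https?://[^\s\"'<>]+")
# v3 onion service names: 56 base32 characters followed by ".onion".
ONION = re.compile(r"\b[a-z2-7]{56}\.onion\b")


def extract_iocs(note: str) -> dict:
    """Deduplicated contact details and links found in a ransom note."""
    return {
        "emails": sorted(set(EMAIL.findall(note))),
        "urls": sorted(set(URL.findall(note))),
        "onions": sorted(set(ONION.findall(note.lower()))),
    }


if __name__ == "__main__":
    # Hypothetical note text for illustration only.
    note = ("Your files are encrypted. Contact recover.files@example.com "
            "or visit http://" + "a" * 56 + ".onion/pay now.")
    print(extract_iocs(note))
```

Running this across notes from the same family makes changes obvious, such as a new contact email or leak-site address.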
Cuckoo11 is different from the other malware analysis tools discussed thus far. While those malware repositories are owned by commercial companies, you can host and run Cuckoo Sandbox locally, in your own environment. Thus, the malware you analyze won’t be made public, as it would be with the commercially owned solutions.
You can also tailor Cuckoo to fit your needs. For example, Cuckoo lets you execute malware within a virtual machine, monitor what the malware does, and document any changes it makes to the victim system.12 Then, once the automated analysis is complete, Cuckoo generates a report documenting these details and even provides screenshots of things like the lure document or fraudulent file that a victim might see. Cuckoo can also decode or decrypt encrypted and encoded binaries, along with their communications to command-and-control infrastructure, making it one of the best free tools available for researchers and analysts who might not have strong reverse engineering skills.
Cuckoo is the backend technology used in many of the for-pay repositories discussed in this chapter, so it can do many of the same things: it provides the files, registry keys, and services that the malware created; the detection names of signatures that identified the malware; and any associated infrastructure. Once you set up Cuckoo, you can choose to direct your local Cuckoo implementation to publicly available malware feeds. This allows researchers to populate their own internal databases with malicious binaries. Cuckoo then analyzes these samples and provides an output in both a printable report and an HTML interface to simplify its use. Figure 7-4 shows the Cuckoo user interface.
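Because a local Cuckoo instance writes its results to a JSON report, you can pull out the artifacts listed above with a short script. The sketch below assumes key names from Cuckoo 2.x reports: `network.domains`, `network.hosts`, `behavior.summary`, and `signatures`. Verify them against a report from your own version. The `.get()` defaults keep it from crashing when a section is absent.

```python
def summarize(report: dict) -> dict:
    """Pull IOC-style fields out of a parsed Cuckoo JSON report.

    Key names are assumptions based on Cuckoo 2.x reports."""
    network = report.get("network", {})
    summary = report.get("behavior", {}).get("summary", {})
    # Hosts may be plain IP strings or dicts with an "ip" key, depending on version.
    hosts = [h.get("ip") if isinstance(h, dict) else h
             for h in network.get("hosts", [])]
    return {
        "domains": sorted({d.get("domain") for d in network.get("domains", [])
                           if d.get("domain")}),
        "hosts": sorted({h for h in hosts if h}),
        "files_created": summary.get("file_created", []),
        "registry_written": summary.get("regkey_written", []),
        "signatures": [s.get("name") for s in report.get("signatures", [])],
    }
```

Load a report with `json.load()` and pass the resulting dictionary to `summarize`. In a default setup, reports typically live under the analysis storage directory for each task, but that path is an assumption to check on your install.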
Cuckoo is open source and modular, which allows analysts and researchers to tailor it to fit their needs. The tool is extremely robust and does much more than the high-level functions discussed here. Explore its other features on its website.13
Search engines are one of the most powerful and underused tools available to analysts. They’re a great source of publicly available information, particularly in cyber research. For example, search engines can be useful in researching infrastructure and hosting records associated with each system you discover. They can also provide insight into malware and how it’s used. You can use them to find analysis and research blogs, reports done by other researchers, or even details about the past operations of an advanced attacker.
Most search engines, like Google, have their own query operators. These let you build advanced queries that can identify specific types of information, such as additional subdomains associated with known attacker infrastructure. For example, in Figure 7-5 the site operator is used to search for any results from the website you enter after the operator.
There are some limitations to this method, however. First, if the website’s administrator has applied the noindex directive to pages, those pages won’t be included in your results. Second, if the domains were recently created and not yet crawled, they also won’t appear in your query results. Still, it takes only a few seconds to run this query, and it often provides useful results.
To learn more about Google search operators, try running the query site:.com and "google" AND "hacks" OR "dorks" to find all websites ending in .com that include the terms google hacks or google dorks. This will present you with many websites that provide information on this topic (Figure 7-6). Give it a try!
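If you run site queries often, a small helper can build them consistently. The sketch below assumes you want to surface lesser-known subdomains of a suspect domain by excluding hosts you already know about with Google’s -site: exclusion. That technique and the helper names are assumptions, not from the text. The output is just a URL to open in your (OPSEC-protected) browser.

```python
from urllib.parse import urlencode


def site_query(domain: str, exclude=("www",)) -> str:
    """Query for pages on `domain`, minus subdomains you already know about."""
    terms = [f"site:{domain}"] + [f"-site:{sub}.{domain}" for sub in exclude]
    return " ".join(terms)


def search_url(query: str) -> str:
    """Percent-encode the query into a Google search URL."""
    return "https://www.google.com/search?" + urlencode({"q": query})


if __name__ == "__main__":
    print(search_url(site_query("example.com", exclude=("www", "mail"))))
```

Each new subdomain you find can be added to `exclude` and the query rerun, which gradually pushes less prominent hosts into the results.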
When adversaries compromise a website, they’ll usually modify the page’s source code, even if it’s only to redirect visitors elsewhere. In these cases, you can use the modified portion of the source code to identify other web pages that share the same code, which is particularly useful if you’re researching an ongoing attack.
Source code search engines, such as NerdyData, let you search the source code of web pages themselves, as opposed to the content you see when you navigate to a page. If you’ve ever viewed the HTML code used to create a web page, that is what these engines collect and index, so you can search it. For example, during a 2017 watering-hole campaign against banks, which researchers attributed to North Korea’s Lazarus Group rather than to the NotPetya operators, attackers compromised the websites of a number of legitimate financial regulatory organizations, knowing that staff at the targeted banks would visit them. The attackers introduced malicious code into these financial regulator websites. This code silently redirected visitors to attacker-controlled infrastructure, which then infected their systems:16
<iframe name='forma' src='https://sap.misapor.ch/vishop/view.jsp?pagenum=1' width='145px' height='146px' style='left:-2144px;position:absolute;top
The URL https://sap.misapor.ch/vishop/view.jsp?pagenum=1 isn’t visible on the web page itself, but it’s present in the page’s HTML code. Traditional search engines don’t index this information, but source code search engines do.
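To see why the redirect hides in the markup rather than the rendered page, you can scan HTML you have already saved for iframes pushed far off-screen, like the one above. The sketch below uses Python’s standard HTML parser. The “off-screen” heuristic (a negative left or top offset of three or more digits) and the sample pages are assumptions for illustration. It analyzes local copies and does not crawl.

```python
import re
from html.parser import HTMLParser

# Inline styles that push an element far off-screen (e.g. left:-2144px).
OFFSCREEN = re.compile(r"(?:left|top)\s*:\s*-\d{3,}px", re.IGNORECASE)


class IframeFinder(HTMLParser):
    """Collect every iframe's src and whether its style hides it off-screen."""

    def __init__(self):
        super().__init__()
        self.iframes = []

    def handle_starttag(self, tag, attrs):
        if tag != "iframe":
            return
        attributes = dict(attrs)
        style = attributes.get("style") or ""
        self.iframes.append({
            "src": attributes.get("src"),
            "hidden": bool(OFFSCREEN.search(style)),
        })


def find_iframes(html: str) -> list:
    parser = IframeFinder()
    parser.feed(html)
    parser.close()
    return parser.iframes
```

A hidden iframe’s `src` is exactly the kind of snippet you would paste into a source code search engine to find other compromised sites.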
This malicious code and its associated domain have since been sanitized and removed. However, when first discovered, researchers could have taken this malicious code and used a source code search engine to conduct a query for any website sharing the same or similar code. They could have then identified other compromised sites, leading to a quicker mitigation. Figure 7-7 shows the query builder for NerdyData.17
You don’t have to be an HTML expert to search NerdyData’s interface. If you’ve identified malicious code on a page and want to find other sites that share that code, simply copy and paste it into the query window.
Social media is a great source of information. Twitter is especially useful to researchers because other researchers often use it to share news about their own findings. Navigating through all the available information, however, can be difficult. To help, you can use tools such as TweetDeck,18 whose dashboard integrates with Twitter, allowing you to search and track social media posts in an organized manner. You can search multiple accounts at the same time, which is convenient if you use separate accounts to track different types of content or to follow users you don’t want to follow from your primary account. One of TweetDeck’s most useful features is its ability to run concurrent searches. TweetDeck will save the search and update it in real time, alerting you when it identifies a new tweet matching the search criteria.
Attackers often leverage the anonymity and isolation that the Dark Web provides. The resources on the Dark Web are more difficult to access, often requiring invitations from other members, but if you can get on these sites, you might find data about attackers and their malware.
For example, in the summer of 2020, an individual using the moniker “Wexford” posted to a Russian-speaking forum on the Dark Web. In his post, Wexford claimed he worked for and supported the SunCrypt ransomware gang but never got paid. He listed a number of problems with the gang’s operations, including issues with the encryption method used by the malware, which kept the gang from being able to decrypt victims’ files.19 When this fact became apparent, victims refused to pay the ransom, leaving Wexford to work for months without pay. In the forum, Wexford and the gang went back and forth, arguing about who was at fault and giving analysts an interesting insight into the inner workings of a Russian organized crime gang.
Due to its design, you can’t access the Dark Web through a traditional web browser. To reach its unindexed and hidden websites, you need to connect through the encrypted relays that make up The Onion Router (Tor) network. Tor is anonymity software that routes your traffic through these relays, giving you access to the Darknet. The Tor Browser20 is freely available and comes preconfigured with both browser-based anonymity tools and everything you need to reach Dark Web sites.
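For scripted collection, the usual pattern is to send traffic through Tor’s local SOCKS proxy. The sketch below assumes the common defaults: port 9050 for the standalone tor service, or 9150 while the Tor Browser is running. It uses the `socks5h` scheme so hostname resolution also happens inside Tor instead of leaking to your local DNS resolver. A script does not get the Tor Browser’s fingerprinting protections, so for manual browsing stick with the browser.

```python
import re

# Tor's local SOCKS listener: commonly 9050 for the tor service,
# 9150 while the Tor Browser is running. Adjust for your setup.
TOR_SOCKS_PORT = 9050

# v3 onion service names: 56 base32 characters followed by ".onion".
V3_ONION = re.compile(r"[a-z2-7]{56}\.onion")


def tor_proxies(port: int = TOR_SOCKS_PORT) -> dict:
    """Proxy mapping; "socks5h" makes Tor, not your machine, resolve hostnames."""
    proxy = f"socks5h://127.0.0.1:{port}"
    return {"http": proxy, "https": proxy}


def is_v3_onion(host: str) -> bool:
    """Check that a hostname is a well-formed v3 onion address."""
    return V3_ONION.fullmatch(host.lower()) is not None
```

With the third-party requests library and its SOCKS extra installed, you would pass `proxies=tor_proxies()` to each request. Validating onion names first helps catch typos and lookalike addresses copied from link lists.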
Even once you understand how to access the Dark Web, finding what you’re looking for can be challenging, and knowing where to look for the information you need can be a daunting task. For the most part, the Dark Web doesn’t have a search engine like Google that allows you to simply search for a site or topic. At least one such service, known as Grams, has existed in the past, but unfortunately, it’s no longer operational. To get around this hurdle, you need to spend time on the Dark Web and catalog its useful website addresses. This can be a difficult task; however, various resources regularly enumerate and document links to Darknet websites. The website https://deeponionweb.com/ is a good place to find information on underground criminal markets. Often, resources come and go over time, but new sites pop up regularly. You can find other websites that track Darknet websites by simply searching for Darknet or Dark Web sites in a search engine.
Of course, not all analysts will need to access the Dark Web, and many probably shouldn’t unless they have a firm understanding of how to operate safely and interact with Dark Web entities. It’s also especially important not to do this from your employer’s infrastructure without their knowledge and consent, as you may get yourself in trouble. Most organizations aren’t going to want any part of their legitimate infrastructure touching a marketplace full of malware and malicious content. Another alternative is to purchase subscriptions to third-party resources, such as Flashpoint, a Dark Web intelligence provider that monitors, categorizes, and collects data from the Dark Web in a safe and controlled environment. Another benefit of using Dark Web data providers is the anonymity they provide. Since you access the data from a third party, you do not have to actively search shady and possibly malicious Dark Web sites. Nor do you leave behind any evidence that can be traced back to you.
If you are a researcher or work for an organization without a large security budget, these third-party resources may not be an option for you. In those cases, a good analyst with the right tools can find the same information, as long as it’s still available. The downside is that it will likely take much longer to find on your own and will require you to accept additional risk by manually searching through the Dark Web yourself.
When conducting research, it’s important to mask the source of your activity, just as criminals mask theirs while conducting attacks. The worst thing you can do when investigating nation-state or criminal activity is draw unwanted attention to yourself. To prevent this, you must cover your tracks and remove any traces of your online presence that could lead someone back to you. Thus, one of the most important resources to protect you while conducting online research is a virtual private network (VPN). The Tor Browser also anonymizes your traffic, but it serves a specific purpose, and because it routes traffic through several volunteer-run relays, it isn’t known for its speed. You’ll need a for-pay VPN provider for the day-to-day activities of conducting research.
A VPN provides online anonymity by masking the infrastructure from which your network traffic originates. Every time you visit a website or conduct a search in your browser, you leave a record of the time at which you accessed the resource and the IP address from which you accessed it, among other things. A VPN routes your traffic through an encrypted tunnel to the provider’s server, which acts as a proxy: the sites you visit see the server’s IP address instead of your own. This makes it very difficult for cybercriminals, governments, or anyone else observing the network to follow your activity or read your data.
Furthermore, most providers have proxy relays located all over the world. This allows you to choose the region of the world from which your traffic originates, which is useful for an analyst. For instance, some websites restrict access by a country’s IP address space, and they’ll block or filter content based on those regional settings. Using VPN infrastructure, you can bypass these restrictions by giving the appearance that your requests originated from an unrestricted region.
There are many VPN providers, and the VPN market changes frequently, so do your research. When selecting a provider, consider a few things. In addition to speed and cost, pick a service that does not log your data or track your location internally. If a provider keeps such records, a government can subpoena the provider for all your activity records, or an attacker who breaches the VPN provider can steal them. That completely defeats the purpose of using a VPN. Even so, some providers log your information regardless, and several have been known to lie about doing so. By contrast, VPN service providers such as ExpressVPN21 and NordVPN22 were unable to provide log data about their customers even when ordered to do so by a judge, because they never collected it in the first place.
Also, select a provider that regularly conducts third-party auditing of its products and services. Auditing validates the provider’s security claims and ensures the provider is not tracking its customers.
Throughout an investigation, you’ll often collect large amounts of data. At that point, you’ll need to get organized, as you’ll want to piece the evidence together and document how various elements relate to one another. Moreover, you may have to address questions about a case you worked on months ago. Investigation tracking tools make it easier to review these details. They also allow you to share research findings with other analysts, which encourages internal collaboration.
ThreatNote is an open source threat intelligence platform; it provides a centralized platform to collect and track cyberattack-related content and events. You can use it to store various kinds of data collected during a cyber investigation, whether they be endpoint and network indicators or context about an attack campaign. You can also use this tool to keep track of details about the threat actors themselves or their victims. ThreatNote is best suited for small groups, teams, or individual analysts and researchers.
Once it’s downloaded, install the tool either locally or on an internal server, which will allow an entire team to access it. You can then log in and access ThreatNote’s dashboard through a web browser. The main dashboard (Figure 7-8) includes the various metrics derived from indicators and threat group activities created during your investigation.23
One of the benefits ThreatNote provides is the ability to track threat groups and their associated indicators of compromise; you can link indicators to groups and tag their associations. While this isn’t necessary for many general, untargeted threats, threat group association is imperative when tracking targeted and advanced attackers.
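Under the hood, linking groups to tagged indicators is a simple data model. The sketch below is a minimal, hypothetical version of that model, not ThreatNote’s actual schema. It shows how an indicator linked to more than one tracked group surfaces immediately.

```python
from dataclasses import dataclass, field


@dataclass
class Indicator:
    value: str
    kind: str                                        # e.g. "domain", "ip", "sha256"
    tags: set = field(default_factory=set)


@dataclass
class ThreatGroup:
    name: str
    indicators: dict = field(default_factory=dict)   # value -> Indicator

    def link(self, value: str, kind: str, *tags: str) -> Indicator:
        """Associate an indicator with this group and add any tags to it."""
        indicator = self.indicators.setdefault(value, Indicator(value, kind))
        indicator.tags.update(tags)
        return indicator


def groups_for(value: str, groups) -> list:
    """Names of every tracked group linked to this indicator."""
    return sorted(g.name for g in groups if value in g.indicators)
```

An indicator that appears under two groups is worth a closer look: it may reflect shared infrastructure, a misattribution, or simply a common hosting provider.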
ThreatNote also integrates with third-party tools for gathering passive DNS and Whois data, among other information. Consider using ThreatNote if you find yourself tracking attack data in a text editor or spreadsheet.
Another free resource, the Malware Information Sharing Platform (MISP)24, is an open source threat intelligence sharing platform that allows organizations to share indicators of compromise seen in attacks. MISP accepts indicators and attack data in a common format, and it supports exchange standards such as the Structured Threat Information eXpression (STIX), a standard used to format threat data, and the Trusted Automated eXchange of Intelligence Information (TAXII), which defines how to transmit and receive STIX threat data. Essentially, MISP provides a security platform that teams and organizations can use to manage and share threat data on a larger scale.
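To make the STIX format concrete, the sketch below hand-builds a STIX 2.1 Indicator object for a suspicious domain using only the standard library. It includes the properties the specification requires for indicators (type, spec_version, id, created, modified, pattern, pattern_type, valid_from). The domain and name are placeholders. In production you would more likely use a dedicated library such as OASIS’s stix2 package, or MISP’s own import and export, than hand-built dictionaries.

```python
import json
import uuid
from datetime import datetime, timezone


def stix_timestamp(dt: datetime) -> str:
    """RFC 3339 UTC timestamp with millisecond precision, e.g. ...T00:00:00.000Z."""
    return (dt.astimezone(timezone.utc)
            .isoformat(timespec="milliseconds")
            .replace("+00:00", "Z"))


def escape(value: str) -> str:
    """Escape backslashes and single quotes inside a STIX pattern string literal."""
    return value.replace("\\", "\\\\").replace("'", "\\'")


def domain_indicator(domain: str, name: str) -> dict:
    now = stix_timestamp(datetime.now(timezone.utc))
    return {
        "type": "indicator",
        "spec_version": "2.1",
        "id": f"indicator--{uuid.uuid4()}",
        "created": now,
        "modified": now,
        "name": name,
        "pattern": f"[domain-name:value = '{escape(domain)}']",
        "pattern_type": "stix",
        "valid_from": now,
    }


if __name__ == "__main__":
    print(json.dumps(domain_indicator("evil.example", "Suspected C2 domain"), indent=2))
```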
Analyst1 is a for-pay threat intelligence platform. Sometimes, free resources like ThreatNote can’t scale or provide the necessary level of support. In other situations, companies may not want open source software used in their production environment.
Analyst1 can ingest threat feeds, reports, and indicators of compromise and then use artificial intelligence to correlate and organize the data. By design, it supports investigations of nation-states, not just criminal activity. For example, the tool has a built-in feature for creating threat actor profiles, which can include the targets of nation-state operations, the malware and infrastructure the adversary used, and even details of the vulnerabilities exploited to accomplish the breach. Profiles that analysts create manually will likely be more useful, detailed, and relevant to your organization than automatically generated ones. However, not all organizations have the expertise needed to create them. In those situations, tools such as Analyst1 can provide a basic profile derived from security reporting, indicators of compromise, and artificial intelligence.
Additionally, the tool can connect to defensive sensors, automatically adding threat information detected on your own network to the threat intelligence platform. You can then identify malicious activity on your network by checking it against the platform's artificial intelligence and the external sources it has ingested.
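At its core, that workflow matches what your sensors see against the indicators the platform holds. The sketch below shows the idea in its simplest form: it reads a sensor log exported as CSV and flags rows whose IP addresses or DNS queries match known indicators. The CSV column names and the indicator set are hypothetical; a real platform would pull indicators from its own database and handle many more indicator types.

```python
import csv
import io
from typing import Dict, Iterator, Set, Tuple

# Hypothetical indicators exported from a threat intelligence platform.
KNOWN_BAD: Dict[str, Set[str]] = {
    "ip": {"198.51.100.7", "203.0.113.25"},
    "domain": {"update.example.net"},
}


def match_sensor_log(csv_text: str) -> Iterator[Tuple[dict, str]]:
    """Yield (row, indicator_type) for each log row that hits a known indicator.

    Assumes hypothetical 'timestamp', 'src_ip', 'dst_ip', and 'dns_query' columns.
    """
    for row in csv.DictReader(io.StringIO(csv_text)):
        if row["src_ip"] in KNOWN_BAD["ip"] or row["dst_ip"] in KNOWN_BAD["ip"]:
            yield row, "ip"
        else:
            # Normalize case and any trailing root dot before comparing domains.
            query = (row.get("dns_query") or "").rstrip(".").lower()
            if query in KNOWN_BAD["domain"]:
                yield row, "domain"
```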
DEVONthink26 is an academic research tool (Figure 7-11). While it’s not designed for cyber investigations, several of its data management features are extremely useful: they let you store web pages (either local copies or bookmarks), emails, office documents, attack diagrams, PDFs, and notes. Additionally, DEVONthink allows you to tag and organize data, making it easy to sort and filter through your findings. Another useful feature is its built-in browser, which allows you to browse web pages and display files and documents from within the application itself.
DEVONthink’s only limitation is its platform availability. Currently, it’s available only on macOS and iOS operating systems. You can download and use DEVONthink for free for 30 days, and you can install it locally or on a network.
Wireshark is a tool that analyzes network traffic at the packet level. It’s especially useful for analyzing network communications between malware and its corresponding command-and-control server.
To see how it works, take a look at Figure 7-12, which shows Wireshark's interface as it analyzes a packet capture generated by the malware known as Trojan.Sakurel. You can see the network communication between the malware and the attacker's command-and-control server.
There are several ways to acquire a PCAP file (the file format of captured network traffic), depending on your environment.
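Whichever capture method you use, it helps to know what the classic PCAP format contains: a 24-byte global header followed by a 16-byte record header for each packet. The minimal reader below, written with only the Python standard library, walks those headers and returns each packet's timestamp and raw bytes. It's an illustration rather than a replacement for Wireshark: it doesn't decode any protocols, and it doesn't handle pcapng, the format Wireshark saves by default (captures written with `tcpdump -w` use the classic format).

```python
import struct
from typing import List, Tuple

# Classic libpcap magic numbers and the timestamp resolution each implies.
MAGIC_TICKS = {0xA1B2C3D4: 1_000_000,      # microsecond timestamps
               0xA1B23C4D: 1_000_000_000}  # nanosecond timestamps


def read_pcap(data: bytes) -> Tuple[int, List[Tuple[float, bytes]]]:
    """Parse a classic pcap file; return (link_type, [(timestamp, packet_bytes), ...])."""
    if len(data) < 24:
        raise ValueError("too short to hold a pcap global header")
    # The magic number reveals which byte order the capturing machine used.
    for endian in ("<", ">"):
        (magic,) = struct.unpack(endian + "I", data[:4])
        if magic in MAGIC_TICKS:
            break
    else:
        raise ValueError("not a classic pcap file (pcapng needs a different parser)")
    ticks = MAGIC_TICKS[magic]

    # Global header: magic, version major/minor, thiszone, sigfigs, snaplen, link type.
    link_type = struct.unpack(endian + "IHHiIII", data[:24])[6]

    packets = []
    offset = 24
    while offset + 16 <= len(data):
        # Record header: seconds, sub-second ticks, captured length, original length.
        ts_sec, ts_frac, incl_len, _orig_len = struct.unpack(
            endian + "IIII", data[offset:offset + 16])
        offset += 16
        payload = data[offset:offset + incl_len]
        if len(payload) < incl_len:
            break  # truncated final record, common when a capture is cut off
        packets.append((ts_sec + ts_frac / ticks, payload))
        offset += incl_len
    return link_type, packets
```

To try it, read a capture file in binary mode and pass in its bytes, for example `read_pcap(open("capture.pcap", "rb").read())` (the file name is a placeholder). A link type of 1 indicates Ethernet frames.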
Open source recon frameworks can help you collect specific information about infrastructure, vulnerabilities, web pages, email, social media, companies, and even people. These recon frameworks are modular by design, allowing anyone to add or develop their own module. For example, you could create a module that enumerates your own dataset—say, your company’s corporate directory of usernames and password hashes—and then search the web for known data leaks matching the usernames and passwords. In this manner, the module could identify vulnerable accounts an attacker could use to gain access to your organization. Alternatively, you could search for an attacker’s username and identify their email and password.
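The sketch below illustrates that modular idea with a toy plug-in registry and one module that performs the check described above: it flags directory accounts whose username and current password hash both appear in a leaked-credential dump. Everything here is hypothetical, including the module name and the data layout. It is not the module interface of Recon-ng or any other framework, and it assumes both sides use the same unsalted, hex-encoded hash scheme, which real leak data often doesn't.

```python
from typing import Callable, Dict, Iterable, List, Tuple

# Hypothetical plug-in registry: module name -> function.
MODULES: Dict[str, Callable[..., List[str]]] = {}


def module(name: str):
    """Decorator that registers a function as a module under `name`."""
    def register(func):
        MODULES[name] = func
        return func
    return register


@module("leaks/credential-match")
def credential_match(directory: Dict[str, str],
                     leaked: Iterable[Tuple[str, str]]) -> List[str]:
    """Return usernames whose current password hash appears in a leaked dump.

    `directory` maps username -> password hash; `leaked` holds (username, hash) pairs.
    Comparisons ignore case for both usernames and hex-encoded hashes.
    """
    leaked_pairs = {(user.lower(), pw_hash.lower()) for user, pw_hash in leaked}
    return sorted(user for user, pw_hash in directory.items()
                  if (user.lower(), pw_hash.lower()) in leaked_pairs)


def run(name: str, *args) -> List[str]:
    """Look up a registered module by name and execute it."""
    return MODULES[name](*args)
```

Calling `run("leaks/credential-match", {"jdoe": "ab12", "asmith": "cd34"}, [("JDOE", "AB12")])` returns `["jdoe"]`. Adding a new capability means writing one more decorated function, which is what makes community-contributed modules practical.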
Many researchers develop their own modules and post them publicly on software repositories such as GitHub for others to use. Frameworks offer many benefits for threat research, but because most lack graphical interfaces and must be run from the command line, they are highly underused. Let's discuss a few that you can use in your investigations.
Recon-ng is a free, publicly available reconnaissance framework. The tool, written in Python, is designed and laid out much like the Metasploit framework.28 The two share a similar command line syntax, and both use modules to perform various tasks. For example, Recon-ng can identify public-facing infrastructure, existing subdomains, email addresses, protocols and ports in use, and technologies and operating systems used in the target environment, among other details useful for profiling a target. Because Recon-ng is module based, new and updated modules appear regularly, making it a go-to resource for many researchers. Figure 7-13 displays the Recon-ng interface.
Recon-ng can write its results to a user-defined output file, which lets you keep your data in one central location. You can also write your own Recon-ng modules to conduct research tailored to your or your organization's needs. Furthermore, you can add API keys for many other open source tools and datasets, which lets Recon-ng's built-in modules query those resources and extends the tool's capabilities.
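Recon-ng itself keeps results in a per-workspace database and offers reporting modules for export. If you're merging results from several tools, though, the same idea is easy to reproduce. The minimal sketch below stores discovered hosts in a SQLite table, drops duplicates, records which tool reported each finding, and exports everything to CSV. The table layout and the source labels are hypothetical, not Recon-ng's schema.

```python
import csv
import io
import sqlite3
from typing import Iterable, Optional, Tuple


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) a central results database with a hypothetical hosts table."""
    conn = sqlite3.connect(path)
    # ip defaults to '' rather than NULL so the UNIQUE constraint also catches
    # duplicate hosts with no known address (SQLite treats NULLs as distinct).
    conn.execute(
        """CREATE TABLE IF NOT EXISTS hosts (
               host   TEXT NOT NULL,
               ip     TEXT NOT NULL DEFAULT '',
               source TEXT NOT NULL,
               UNIQUE (host, ip, source)
           )"""
    )
    return conn


def record(conn: sqlite3.Connection, source: str,
           rows: Iterable[Tuple[str, Optional[str]]]) -> None:
    """Insert (host, ip) pairs reported by one tool, ignoring duplicates."""
    conn.executemany(
        "INSERT OR IGNORE INTO hosts (host, ip, source) VALUES (?, ?, ?)",
        [(host.lower(), ip or "", source) for host, ip in rows],
    )
    conn.commit()


def export_csv(conn: sqlite3.Connection) -> str:
    """Dump every stored finding to CSV text, sorted for stable output."""
    out = io.StringIO()
    writer = csv.writer(out)
    writer.writerow(["host", "ip", "source"])
    writer.writerows(conn.execute(
        "SELECT host, ip, source FROM hosts ORDER BY host, ip, source"))
    return out.getvalue()
```

Keeping the source column means that when two tools report the same host, you keep both rows and can later judge how well corroborated each finding is.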
Because some Recon-ng modules are aggressive (they interact with the target directly, as active tools do), make sure to research exactly what each one does before executing it.
Another modular information gathering tool designed for penetration testing, TheHarvester is similar to Recon-ng but has fewer capabilities.30 However, TheHarvester excels at collecting email addresses and enumerating infrastructure. Its fewer options also make it easier for less experienced investigators to use. Like Recon-ng, TheHarvester gathers information about infrastructure, email, and companies, but it does not require loading modules or advanced knowledge. TheHarvester queries sources like Google, LinkedIn, DNS servers, and several other web-based resources to collect information on a target. You can also save the tool's output in several formats, making the data easier to parse, store, and ingest into automation.
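TheHarvester's real value lies in its sources, but the step that turns raw search results into findings is essentially pattern matching. As a simplified illustration, not TheHarvester's own code, the sketch below pulls email addresses and hostnames belonging to a target domain out of text you've already collected. The regular expressions are deliberately simple and assume ASCII hostnames.

```python
import re
from typing import Set, Tuple


def harvest(text: str, domain: str) -> Tuple[Set[str], Set[str]]:
    """Return (emails, hostnames) in `text` that belong to `domain` or its subdomains."""
    d = re.escape(domain.lower())
    # Zero or more labels in front of the target domain (e.g. vpn.example.com),
    # with lookarounds so look-alikes such as my-example.com don't match.
    host = rf"(?<![a-z0-9.-])(?:[a-z0-9-]+\.)*{d}(?![a-z0-9-])"
    email_re = re.compile(rf"[a-z0-9._%+-]+@{host}", re.IGNORECASE)
    host_re = re.compile(host, re.IGNORECASE)
    emails = {m.group(0).lower() for m in email_re.finditer(text)}
    hosts = {m.group(0).lower() for m in host_re.finditer(text)}
    return emails, hosts
```

For example, `harvest("Contact ops@vpn.example.com or see cdn.example.com", "example.com")` returns `({"ops@vpn.example.com"}, {"vpn.example.com", "cdn.example.com"})`.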
SpiderFoot31 is a free open source tool whose graphical interface allows users to make queries against various data types. It is useful for day-to-day investigations, and it can save you time when you're researching open source information. SpiderFoot brings many tools together in one central interface, tying into several other tools discussed in this chapter, including VirusTotal and Hybrid Analysis, among others.
Many of the resources that SpiderFoot works with offer free API keys, though most limit the number of queries you can make without paying for their services. For an individual researcher, the free API tiers should suffice. Companies wanting to leverage the resource will likely want to purchase subscriptions. SpiderFoot receives regular updates, which often add new features.
SpiderFoot provides four types of queries, each of which comes with a description of the type of scanning taking place. More importantly, SpiderFoot offers a passive search, which gathers information without interacting with the target directly, making it easier and safer for beginners. Finally, unlike many resources discussed, SpiderFoot can enumerate IPv6 infrastructure.
After you run your query, SpiderFoot can render the results as an interconnected diagram. This is useful, as SpiderFoot often returns a lot of data, which can be overwhelming. The diagram feature can help you sift through it all, as well as show which data came from where, so you can validate it later.
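Much of the diagram's value comes from keeping each relationship tied to where it came from. As a rough sketch of the idea, not SpiderFoot's internal data model, the class below stores findings as an undirected graph whose edges remember every source that reported them, then uses a breadth-first search to collect everything connected to a starting entity. The source names in the usage note are placeholders.

```python
from collections import defaultdict, deque
from typing import DefaultDict, Dict, Set


class ResultGraph:
    """Undirected graph of findings; each edge records which sources reported it."""

    def __init__(self) -> None:
        # node -> {neighbor: {source, ...}}
        self.edges: DefaultDict[str, Dict[str, Set[str]]] = defaultdict(dict)

    def link(self, a: str, b: str, source: str) -> None:
        """Connect two entities and note the source that reported the link."""
        self.edges[a].setdefault(b, set()).add(source)
        self.edges[b].setdefault(a, set()).add(source)

    def connected(self, seed: str) -> Set[str]:
        """Breadth-first search: every entity reachable from `seed`, including itself."""
        seen, queue = {seed}, deque([seed])
        while queue:
            node = queue.popleft()
            for neighbor in self.edges.get(node, {}):
                if neighbor not in seen:
                    seen.add(neighbor)
                    queue.append(neighbor)
        return seen

    def sources(self, a: str, b: str) -> Set[str]:
        """Which sources reported the a-b link (empty if none), for later validation."""
        return set(self.edges.get(a, {}).get(b, set()))
```

For example, after `g.link("example.com", "192.0.2.10", "passive-dns")` and `g.link("192.0.2.10", "mail.example.com", "reverse-dns")`, `g.connected("example.com")` returns all three entities, and `g.sources("example.com", "192.0.2.10")` returns `{"passive-dns"}`.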
Maltego32 is a visual data analysis tool created by Paterva. It accepts entities, or indicators, and then runs code, known as transforms (often written in Python), to conduct various actions against an entity.
Like several other tools discussed, Maltego works in conjunction with most of the resources covered in this chapter. In fact, many security vendors and developers make their own Maltego transforms to query their datasets. For example, if you have a VirusTotal subscription, you can use your VirusTotal API key and VirusTotal's custom transforms to query its malware samples from within Maltego, then see the results mapped and displayed in your Maltego chart. Maltego also allows you to import and export tabular data, such as spreadsheets, which is especially useful for working with log data. You can download Maltego for free; however, you'll need a paid subscription for unlimited use.
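Conceptually, a transform is a function that takes one entity and returns related entities, which Maltego then draws as new nodes linked to the original. The minimal sketch below mimics that flow in plain Python: it loads entities from CSV text, as you might from a spreadsheet, and runs a hypothetical domain-to-IP transform backed by a small, made-up passive DNS table. It does not use Maltego's actual transform interface; real transforms are written against Maltego's own libraries and usually call an external data source rather than a local dictionary.

```python
import csv
import io
from typing import Callable, List, NamedTuple, Tuple


class Entity(NamedTuple):
    kind: str   # e.g. "domain" or "ip"
    value: str


# Hypothetical passive DNS data standing in for an external service.
PASSIVE_DNS = {
    "update.example.net": ["198.51.100.7", "203.0.113.25"],
    "cdn.example.net": ["203.0.113.25"],
}


def domain_to_ip(entity: Entity) -> List[Entity]:
    """Transform: resolve a domain entity to the IP entities seen for it."""
    if entity.kind != "domain":
        return []  # a transform only applies to the entity types it accepts
    return [Entity("ip", ip) for ip in PASSIVE_DNS.get(entity.value.lower(), [])]


def load_entities(csv_text: str) -> List[Entity]:
    """Read entities from spreadsheet-style CSV with 'kind' and 'value' columns."""
    return [Entity(row["kind"].strip(), row["value"].strip())
            for row in csv.DictReader(io.StringIO(csv_text))]


def run_transform(transform: Callable[[Entity], List[Entity]],
                  entities: List[Entity]) -> List[Tuple[Entity, Entity]]:
    """Apply a transform to each entity; return (input, output) links, like edges on a chart."""
    return [(entity, result) for entity in entities for result in transform(entity)]
```

Running `domain_to_ip` over both example domains links each of them to `203.0.113.25`, the kind of shared infrastructure that stands out once results are drawn on a chart.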
This chapter discussed several analytical tools that you can use to comb through open source data, whether on your own or within an organization. Each of these tools maximizes the usefulness of the public data you might discover during a cyber investigation. Maltego is a great example of this: a user can provide their data and then apply transforms to visually analyze and discover relationships across various datasets. Free resources such as Google let you craft tailored queries with their search syntax to comb through data and discover information about your target. Free malware analysis tools like Cuckoo provide many of the advanced capabilities found in commercial applications, and you can run them in your own environment at no cost. Finally, you can store and correlate information discovered from threat research in a threat intelligence platform to track and maintain indicators of compromise and other attack data. Many of the tools we've discussed have more than one purpose, and they may provide benefits not mentioned in this chapter. However, we've discussed the primary ways analysts use them in a cyber investigation against targeted threats.
Of course, while we have detailed the specifics of multiple tools in this chapter, the most important thing is understanding each tool's capabilities. Developers often abandon tools, and changes in the underlying technology can make a particular tool less relevant. Understanding what each tool accomplishes will make it much easier to replace tools when they grow obsolete (and, as mentioned, may keep you out of trouble).