Chapter 8: Email Forensics – Investigation Techniques

Email is just one portion of the global internet, but it has become a daily resource in the consumer and corporate realms and one of the primary communication tools used by nearly every citizen of the industrialized world. Now that email has become part of everyday life, it is very common for criminals to use this vector to commit crimes and to collaborate with their co-conspirators.

It can be difficult for the digital forensic investigator to trace an email from its destination back to the source. The digital forensic investigator will have to be educated in the methods and delivery systems of the email life cycle. When the digital forensic investigator is successful in identifying the source of the email, that will lead to additional forensic investigations of the digital evidence that was found at the source.

Where can you find digital evidence relating to an email investigation? Potential sources include the local machine, which will have the destination version of the email; the email server(s); the device that was used to access the email, such as a cell phone; and logs from the internet service provider. The digital forensic investigator will have to be knowledgeable about which tools can analyze emails and the compound mailbox files that are used by some email suites. Knowing how to present this information to a non-technical person will be paramount for conveying the relevance of the data that was recovered. By the end of this chapter, you will understand the protocols that are used to send and receive emails, how to decode the email headers, and how to analyze client and web-based emails.

We will cover the following topics in this chapter:

  • Understanding email protocols
  • Decoding emails
  • Understanding client-based email analysis
  • Understanding WebMail analysis

Understanding email protocols

An email protocol is a standard that is used to allow two computer hosts to exchange email communication. When an email is sent, it travels from the sender's host to an email server. The email server can forward the email through a series of relays until it arrives at an email server close to the recipient's host. The recipient will receive a notification stating that an email is available; the recipient will then reach out to the email server to get the email.

Users typically use an email client to access emails. An email client can use different protocols to access the email. We will now discuss some email protocols you may come across when conducting a digital forensics investigation.

Understanding SMTP – Simple Mail Transfer Protocol 

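SMTP is the protocol for email transmission. It is an internet standard that was first defined in RFC 821 and is currently specified in RFC 5321. RFC 5322 defines the format of the message itself, and RFC 3207 adds the STARTTLS extension for encrypting the session.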

Tip

RFC stands for Request for Comments. RFCs are the documents used to publish internet and communications standards. An RFC may come from different bodies, such as the Internet Architecture Board/Internet Engineering Task Force, or even from an independent researcher. The series was initially designed to track the development of the original ARPANET but has now evolved into a source of official documentation regarding internet specifications and communication protocols.

Mail servers use SMTP to send and receive email messages from all points of the internet. Typically, you will find an SMTP server utilizing Transmission Control Protocol (TCP) port 25 on the network. The path from the sender to the recipient is outlined in the following diagram:

Figure 8.1 – SMTP map

When the user sends an email, it will travel from the host through a series of SMTP servers until it reaches the destination SMTP server. The recipient will have to use a different protocol to retrieve the email. The first of these that we will discuss is POP3.

Understanding the Post Office Protocol

POP3 is the standardized protocol that allows users to access their inbox and download emails. POP3 is specifically designed only to receive emails; the system does not allow users to send emails. This protocol allows the user to be offline when drafting, reading, or replying and, at the user's request, can access the online mailbox on demand. Be aware that the email you are conducting your digital forensic examination on may be the only copy. The user has the option to not leave a copy of the email on the server. Once the email has been downloaded, the system can remove it from the server to reduce storage use.

You will find POP utilizing port 110 on the network. 

In the following diagram, you can see the general functionality of the SMTP-POP process:

Figure 8.2 – SMTP-POP map

Here, you can see the path the email takes, which is as follows:

  1. The email originates from the sender.
  2. The SMTP server forwards it to the destination.
  3. The recipient collects the email from the server. The recipient can decide if a copy of the email stays on the server or whether the email will be deleted when the user downloads the email from the server.  

The next protocol performs a function similar to POP, but with some significant differences. We will discuss these differences in the next section.

IMAP – Internet Message Access Protocol

IMAP, the Internet Message Access Protocol, is a standard protocol used by an email client to access emails on an email server. The protocol was designed with the goal of complete inbox management from multiple clients. In most cases, email messages will be left on the server until the user deletes them. IMAP is a newer protocol than POP, but both remain in common use today. The most significant difference between them is that POP downloads the contents of the mailbox to the client, whereas IMAP was designed to work with a mailbox that remains on the server.

In the following diagram, you can see the general functionality of the SMTP-IMAP process:

Figure 8.3 – IMAP map

Here, you can see the path the email takes:

  1. The email originates from the sender.
  2. The SMTP server forwards it to the destination.
  3. The recipient collects the email from the server. A copy of the email stays on the server until the user explicitly deletes it.   

All three protocols we just discussed are typically used in the email client-server relationship. Users also have another option when it comes to accessing emails known as web-based email, which is the topic of the next section.

Understanding web-based email

Web-based email is a service the user accesses with a web browser. Some standard webmail providers are Gmail, Yahoo Mail, and Outlook/Hotmail. Some internet service providers also provide an email account that can be accessed with a web browser.

Emails that the user deletes from a web-based account typically remain on the server until the system purges them. A characteristic feature of web-based email is that when the user deletes an email, it is moved from the inbox into a Deleted/Trash folder, where it can still be accessed. After the email has remained in the Deleted folder for a set timeframe, the system will permanently delete it from the user's account.

With that, we have gone over the different methods of how a user may access email services. However, once you have the email dataset available for examination, you may find the contents of the email encoded. How do you decode the contents of the email to determine whether a crime/violation has/has not been committed?

In the next topic, we will decode the email header so that you can make an informed choice about your investigative endeavors.

Decoding email

An email has many globally unique identifiers for a digital forensic investigator to identify and to track down.  The mailbox and domain name, along with the message ID, will allow a digital forensic investigator to serve judicially approved subpoenas/search warrants on the vendor to follow any investigative leads. 

In this section, we will break down the email header one section at a time so that you can make a decision regarding how to conduct your investigation. We will start off by discussing the email envelope.

Understanding the email message format

The vast majority of email users are only familiar with basic email information, such as this:

Subject: background checks
Date: 07/19/2008 23:39:57 +0
Sender: [email protected]
Recipients: [email protected]

We are back to dealing with our friend Jean, and from looking at the email, we can see several fields commonly associated with an email. Here, we know the subject, background checks, the date and time when the email was sent, the sender, and the recipient. We also have the content of the email, as shown here:

Jean,

One of the potential investors that I've been dealing with has asked me to get a background check of our current employees. Apparently they recently had some problems at some other company they funded.

Could you please put together for me a spreadsheet specifying each of our employees, their current salary, and their SSN?

Please do not mention this to anybody.

Thanks.

(ps: because of the sensitive nature of this, please do not include the text of this email in your message to me. Thanks.)

As we look at the email, it appears that the email was sent to Jean from Alison. Alison is requesting a spreadsheet of employee confidential information. Based on the basic examination of this email, there is nothing to contradict what it initially appears to be. 

The user supplies the information in the To and From fields, as well as the subject and the content of the email. The date and time are taken from the system clock, which the user can also set.

Underneath the typical email information, there is another layer of information that is particularly useful when you are conducting your investigations. This is referred to as the email header, and it contains information about the source, transmission, and destination of a specific email. 

Most email clients would require an additional command to view the email header. For example, Gmail requires you to click Show original to see the email header. The following is the email header for the email Jean received from Alison:

-----HEADERS-----
Return-Path: <[email protected]>
X-Original-To: [email protected]
Delivered-To: [email protected]
Received: from smarty.dreamhost.com (sd-green-bigip-81.dreamhost.com [208.97.132.81])
    by spunkymail-mx8.g.dreamhost.com (Postfix) with ESMTP id E32634D80F
    for <[email protected]>; Sat, 19 Jul 2008 16:39:57 -0700 (PDT)
Received: from xy.dreamhostps.com (apache2-xy.xy.dreamhostps.com [208.97.188.9])
    by smarty.dreamhost.com (Postfix) with ESMTP id 6E408EE23D
    for <[email protected]>; Sat, 19 Jul 2008 16:39:57 -0700 (PDT)
Received: by xy.dreamhostps.com (Postfix, from userid 558838)
    id 64C683B1DAE; Sat, 19 Jul 2008 16:39:57 -0700 (PDT)
To: [email protected]
From: [email protected]
subject: background checks
Message-Id: <[email protected]>
Date: Sat, 19 Jul 2008 16:39:57 -0700 (PDT)

The email header shows where the email originated from and what servers it touched upon. Starting from the bottom, we can see the Message-Id field:

Message-Id: <[email protected]>

The Message-Id field is a unique identification for every email that has been sent. When a user sends an email, it will receive its message ID at the first email server it touches. The design of a message ID is that it will be globally unique, which means there should not be another email with the same message ID. If you find different emails that contain the same message ID, you are dealing with one of two scenarios:

  • The email server is not compliant with the standard.
  •  A user has altered the email.
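A check for reused message IDs across a set of messages can be sketched in a few lines of Python. This is only an illustration: the sample messages, IDs, and addresses are invented, and the function name duplicate_message_ids is hypothetical.

```python
from collections import Counter
from email import message_from_string


def duplicate_message_ids(raw_messages):
    """Return the Message-Id values that appear in more than one message."""
    ids = []
    for raw in raw_messages:
        msg = message_from_string(raw)
        msg_id = msg.get("Message-Id")  # header lookup is case-insensitive
        if msg_id:
            ids.append(msg_id.strip())
    counts = Counter(ids)
    return sorted(mid for mid, n in counts.items() if n > 1)


# Hypothetical sample data; the IDs and addresses are made up.
samples = [
    "Message-Id: <aaa.1@example.com>\nSubject: one\n\nbody",
    "Message-Id: <bbb.2@example.com>\nSubject: two\n\nbody",
    "Message-Id: <aaa.1@example.com>\nSubject: three\n\nbody",
]
duplicates = duplicate_message_ids(samples)
print(duplicates)
```

Any ID that this reports is a lead to examine by hand, since either of the two scenarios above could explain it.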

When you look at the message ID, you will see a string of random alphanumeric characters, including the @ symbol and a domain name. Sometimes, the arbitrary string of alphanumeric characters may contain a date/timestamp. If we look at the preceding example, we can see the numbers 20080719233957, which can be translated to 2008 07 19 – the year, month, and day. 23:39:57 is the time in hours, minutes, and seconds (GMT) when the email touched the first server.
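As a small worked example, the digits from this message ID can be converted into a timestamp and compared with the Received lines. The sketch below assumes the 14 digits really are a GMT timestamp in year-month-day-hour-minute-second order, which is not guaranteed for every mail server; the function name is hypothetical.

```python
from datetime import datetime, timedelta, timezone


def parse_embedded_timestamp(digits):
    """Interpret a 14-digit YYYYMMDDHHMMSS string as a UTC datetime."""
    parsed = datetime.strptime(digits, "%Y%m%d%H%M%S")
    return parsed.replace(tzinfo=timezone.utc)


sent_utc = parse_embedded_timestamp("20080719233957")

# The Received lines in this header report a -0700 offset (PDT).
pdt = timezone(timedelta(hours=-7))
sent_local = sent_utc.astimezone(pdt)

print(sent_utc.isoformat())
print(sent_local.isoformat())
```

Converted to the -0700 offset, the time becomes 16:39:57, which agrees with the time recorded in the Received lines of this header.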

Continuing from the bottom to the top, we can see the first Received line. This email traverses three different email servers. As the email crosses each server on its journey to its destination, that server adds a Received line on top of the preceding Received line, so you can follow the email's path from source to destination by reading the lines from the bottom up. In the email we are examining, the first server the email touched recorded the following:

Received: by xy.dreamhostps.com (Postfix, from userid 558838) id 64C683B1DAE; Sat, 19 Jul 2008 16:39:57 -0700 (PDT)

This is the first server the email touched; we have the domain name, dreamhostps.com, along with a user ID. The next logical step would be to subpoena the ISP and try to identify the subscriber with user ID 558838. The term Postfix identifies the email server software. Postfix is a free, open source mail transfer agent, so it could be running on a commercial email server or on an email server maintained by a potential bad actor.

The next two Received lines identify the subsequent servers on the path to the destination:

Received: from smarty.dreamhost.com (sd-green-bigip-81.dreamhost.com [208.97.132.81])
    by spunkymail-mx8.g.dreamhost.com (Postfix) with ESMTP id E32634D80F
    for <[email protected]>; Sat, 19 Jul 2008 16:39:57 -0700 (PDT)
Received: from xy.dreamhostps.com (apache2-xy.xy.dreamhostps.com [208.97.188.9])
    by smarty.dreamhost.com (Postfix) with ESMTP id 6E408EE23D
    for <[email protected]>; Sat, 19 Jul 2008 16:39:57 -0700 (PDT)

In both cases, we now have IP addresses of the specific servers (and server names) that touched the email.
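Reading the Received lines from the bottom up can be automated. The following Python sketch uses the standard email package to list each hop in chronological order, with the receiving server name and any bracketed IP address. The header text is taken from the example above, except that the redacted "for <...>" clauses are left out; the regular expressions and the function name received_hops are illustrative assumptions, and real-world Received lines vary enough that this will not parse every case.

```python
import re
from email import message_from_string

# Received lines from the example header; the redacted "for" clauses are omitted.
RAW = """\
Received: from smarty.dreamhost.com (sd-green-bigip-81.dreamhost.com [208.97.132.81])
\tby spunkymail-mx8.g.dreamhost.com (Postfix) with ESMTP id E32634D80F;
\tSat, 19 Jul 2008 16:39:57 -0700 (PDT)
Received: from xy.dreamhostps.com (apache2-xy.xy.dreamhostps.com [208.97.188.9])
\tby smarty.dreamhost.com (Postfix) with ESMTP id 6E408EE23D;
\tSat, 19 Jul 2008 16:39:57 -0700 (PDT)
Received: by xy.dreamhostps.com (Postfix, from userid 558838)
\tid 64C683B1DAE; Sat, 19 Jul 2008 16:39:57 -0700 (PDT)
Subject: background checks

(body omitted)
"""

IP_PATTERN = re.compile(r"\[(\d{1,3}(?:\.\d{1,3}){3})\]")
BY_PATTERN = re.compile(r"\bby\s+(\S+)")


def received_hops(raw_message):
    """Return (receiving server, sending IP or None) tuples, oldest hop first."""
    msg = message_from_string(raw_message)
    hops = []
    # Servers add Received lines at the top, so reverse for chronological order.
    for value in reversed(msg.get_all("Received", [])):
        flat = " ".join(value.split())  # undo header folding
        by = BY_PATTERN.search(flat)
        ip = IP_PATTERN.search(flat)
        hops.append((by.group(1) if by else None, ip.group(1) if ip else None))
    return hops


hops = received_hops(RAW)
for server, ip in hops:
    print(server, ip)
```

The first hop has no IP address because that Received line records a local submission on the server rather than a connection from another host.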

What's interesting is when we look at the Return-Path field:

Return-Path: <[email protected]>

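The Return-Path is the address where undeliverable messages will be sent. It records the envelope sender supplied during the SMTP transaction and does not have to match the From field that the user will see. You will often see a different Return-Path on mailing list and bulk email, where bounces go to the list software rather than to the author of the post. (The address that replies go to is controlled by a separate field, Reply-To.)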
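One simple triage step is to compare the domain in the Return-Path with the domain in the From field. The sketch below shows one way to do that in Python; the sample addresses and the function name sender_domains are invented for illustration. A mismatch is not proof of forgery, because legitimate bulk mail services routinely use their own bounce address, but it tells you which provider to look at.

```python
from email import message_from_string
from email.utils import parseaddr


def sender_domains(raw_message):
    """Return the lowercased domain of the Return-Path and From addresses."""
    msg = message_from_string(raw_message)
    result = {}
    for field in ("Return-Path", "From"):
        _, address = parseaddr(msg.get(field, ""))
        result[field] = address.rpartition("@")[2].lower()
    return result


# Hypothetical header; the addresses are placeholders.
sample = (
    "Return-Path: <bounce@mailer.example.net>\n"
    "From: Alison <alison@example.com>\n"
    "Subject: background checks\n"
    "\n"
    "body"
)
domains = sender_domains(sample)
mismatch = domains["Return-Path"] != domains["From"]
print(domains, mismatch)
```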

There are optional fields that you may come across in your investigations. These fields typically start with X-, as shown here:

X-Priority: 3
X-Mailer: PHPMailer 5.2.9 (https://github.com/PHPMailer/PHPMailer/)
Message-Id: <[email protected]>
X-Report-Abuse: Please forward a copy of this message, including all headers, to [email protected]
X-Report-Abuse: You can also report abuse here: http://mandrillapp.com/contact/abuse?id=30514476.1925a088d66f450cb25a4034f3ec6942
X-Mandrill-User: md_30514476

These fields are not part of the email protocol standard. They can contain information about virus scans, spam scans, or the server. As you can see, this example provides contact information for reporting abuse, such as spam. You may also see an optional field called X-Originating-IP that may contain the IP address of the sender when the message was sent. An email provider can strip that information and replace it with a server address, which is what happens when a message is sent from Gmail.

A note about IP addresses. There are two different types of IPv4 addresses: public and private. You may see both in the email header. If you see a private IP address, you cannot identify the provider (unless you are investigating within the organization). Private IPv4 addresses fall within the following ranges:

  • 10.X.X.X
  • 172.16.X.X through 172.31.X.X
  • 192.168.X.X

You may also see 127.X.X.X, which is the loopback range that a host uses to refer to itself; like the private ranges, it cannot be traced to a provider.
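Python's standard ipaddress module can sort the addresses found in a header into these categories. The following is a minimal sketch; the function name classify_ipv4 and the sample addresses (other than the two taken from the example header) are illustrative. Note that is_private in this module also returns True for some other reserved ranges, so treat "private" here as "not publicly routable".

```python
import ipaddress


def classify_ipv4(text):
    """Label an IPv4 address as loopback, private, or public."""
    addr = ipaddress.ip_address(text)
    if addr.is_loopback:      # 127.0.0.0/8
        return "loopback"
    if addr.is_private:       # includes 10/8, 172.16/12, and 192.168/16
        return "private"
    return "public"


candidates = (
    "208.97.132.81",   # from the example header
    "208.97.188.9",    # from the example header
    "10.1.2.3",
    "172.16.5.4",
    "172.32.0.1",      # just outside the 172.16-172.31 private range
    "192.168.1.10",
    "127.0.0.1",
)
labels = {ip: classify_ipv4(ip) for ip in candidates}
for ip, label in labels.items():
    print(ip, label)
```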

We will discuss email attachments in the next section.

Email attachments

MIME is the acronym for Multipurpose Internet Mail Extensions, which is the internet standard that allows emails to carry text in character sets other than ASCII, binary attachments, multi-part message bodies, and non-ASCII header information. When you are viewing the header, you will see MIME indicated with the following:

MIME-Version: 1.0

An example of this is as follows:

MIME-Version: 1.0
Content-Type: text/html; charset=us-ascii
Content-Transfer-Encoding: 7bit

Here, we can see the content type, which is HTML, and on the following line, we see that it uses 7-bit encoding. If there was an attachment, we would typically also see Base64 encoding, which converts the binary data into ASCII text.

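The system will separate the body of the email based upon the data type of each segment. For example, a JPEG image will occupy one segment, and ASCII text will be stored in a different segment. The segments are separated by a boundary string that is declared in the boundary parameter of the Content-Type field, and each segment starts with its own MIME header. Some email software generates boundary strings that include the keyword _PART_.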
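To see how the segments look in practice, the sketch below builds a small two-part message with Python's standard email package and then walks its parts, printing each content type and transfer encoding. The message is invented for illustration, and the attachment bytes are a placeholder rather than a real JPEG.

```python
from email.message import EmailMessage

msg = EmailMessage()
msg["Subject"] = "MIME demo"
msg.set_content("Plain ASCII body text.")
# Placeholder bytes standing in for a JPEG; this is not a valid image.
msg.add_attachment(
    b"\xff\xd8\xff\xe0 fake jpeg data",
    maintype="image",
    subtype="jpeg",
    filename="photo.jpg",
)

# The boundary string is generated when the message is serialized.
serialized = msg.as_string()
boundary = msg.get_boundary()

parts = [
    (part.get_content_type(), str(part.get("Content-Transfer-Encoding", "none")))
    for part in msg.walk()
]
for content_type, encoding in parts:
    print(content_type, encoding)
print("boundary:", boundary)
```

The container part has no transfer encoding of its own, the text segment is 7bit, and the image segment is base64.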

Now that we have discussed the email and header, we need to look at some of the clients the user may use to access the emails. 

Understanding client-based email analysis

There are many email clients a user has access to in order to retrieve, read, and send emails. Depending on the environment, consumer versus commercial, you may run into different email clients.  In the consumer market, you will find that Microsoft Outlook/Outlook Express will prevail because it is preinstalled on the system. Microsoft Outlook comes with the Microsoft Office suite.  There are also freeware options available such as the Thunderbird email client.

You can conduct an email examination by exporting the container used by the client and opening it with the email client installed on your forensic computer. Another option is to utilize specialized commercial forensic software that is created for email examinations. The more common forensic suites will typically be able to analyze the more common email client containers.

We will discuss some more common email clients in the following sections.

Exploring Microsoft Outlook/Outlook Express

Outlook stores email information in several file types, such as .pst, .mdb, or .ost. We will find the PST file on the user's hard disk at the following path:

Users\$USER$\AppData\Local\Microsoft\Outlook

The OST file is an offline file that may also be stored on the user's hard drive in the same path as the PST file. You will find the MDB file on the server. Typically, this file is found when you are investigating a corporate environment.

The system will store all the content used with the Outlook client in the PST/OST file. Be aware that the user can change the default location, as well as the naming convention. You do not need a login to access the PST/OST file.

If you need to carve out a PST/OST file from the unallocated space of the storage device, you may have to deal with fragmentation because of the potential size of the PST/OST file.
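A first step in carving is locating candidate file headers. PST files begin with the four-byte signature "!BDN" (to my understanding, OST files share it), so a raw image can be scanned for that value. The Python sketch below only finds candidate offsets in a byte string that is already in memory; the test data is synthetic, the function name is hypothetical, and a real carver would read the image in chunks, validate the rest of the header, and deal with the fragmentation described above.

```python
PST_MAGIC = b"!BDN"  # first four bytes of a PST file header


def find_signature_offsets(data, signature=PST_MAGIC):
    """Return every byte offset at which the signature occurs."""
    offsets = []
    start = 0
    while True:
        index = data.find(signature, start)
        if index == -1:
            return offsets
        offsets.append(index)
        start = index + 1


# Synthetic "disk image": two signatures placed on 512-byte sector boundaries.
image = (
    b"\x00" * 512
    + b"!BDN" + b"\x11" * 100 + b"\x00" * 408
    + b"!BDN" + b"\x22" * 20
)
offsets = find_signature_offsets(image)
print(offsets)
```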

Microsoft has replaced Outlook Express with Windows Live Mail. The next section will provide details about this client.

Exploring Microsoft Windows Live Mail

Starting with Windows Vista and Windows 7, Windows Live Mail became the default email client shipping with the Windows operating system. (Note that it has been discontinued and that Windows Mail is now included with Windows 10 instead.) The client stores email messages in the following path:

Users\$USER$\AppData\Local\Microsoft\Windows Live Mail

Users can use this client to access their web-based emails as well. Windows Live Mail will download the contents of those accounts and then create the folder structure within the user's path.

The client will store the emails as .eml files under the Windows Live Mail folder, as shown here:

Windows Live Mail (96)
│  ├──Calendars (21)
│  │  ├──DBStore (11)
│  │  │  ├──LogFiles (4)
│  │  │  └──Backup (3)
│  │  │     └──new (3)
│  │  └──[email protected] (10)
│  │     └──DBStore (10)
│  │        ├──Backup (3)
│  │        │  └──new (3)
│  │        └──LogFiles (4)
│  ├──Outbox (0)
│  ├──Sentinel (2)
│  ├──[email protected] (1)
│  ├──Hotmail (54)
│  │  ├──Inbox (30)
│  │  ├──Drafts (0)
│  │  ├──Junk email (2)
│  │  ├──Sent items (15)
│  │  ├──Deleted items (6)

As you can see, this user was using Hotmail with the Windows Live Mail application.  You can see the email address, [email protected], and see that 54 emails are being stored in the user's folders.
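Because each message is saved as its own .eml file, a folder like this can be summarized with a short script. The following Python sketch walks a directory, parses every .eml file with the standard email package, and collects a few header fields. The sample file, its addresses, and the function name summarize_eml are invented for illustration; on a real case, you would point it at a working copy of the exported folder, never the original evidence.

```python
import tempfile
from email import policy
from email.parser import BytesParser
from pathlib import Path


def summarize_eml(path):
    """Return selected header fields from a single .eml file."""
    with open(path, "rb") as handle:
        msg = BytesParser(policy=policy.default).parse(handle)
    fields = ("From", "To", "Subject", "Date")
    return {name.lower(): str(msg.get(name, "")) for name in fields}


# Demonstration with a temporary folder containing one made-up message.
with tempfile.TemporaryDirectory() as folder:
    sample = Path(folder) / "message1.eml"
    sample.write_bytes(
        b"From: alison@example.com\r\n"
        b"To: jean@example.com\r\n"
        b"Subject: background checks\r\n"
        b"\r\n"
        b"body\r\n"
    )
    summaries = [summarize_eml(p) for p in sorted(Path(folder).rglob("*.eml"))]

for summary in summaries:
    print(summary)
```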

The emails are in the standard text format, .eml, which can be read by any forensic tool. Alternatively, you can use a text editor. The next client is also popular and free: Mozilla Thunderbird.

Mozilla Thunderbird

Thunderbird is a free, open source email client provided by Mozilla. Thunderbird will store emails within an MBOX file. MBOX is a generic term for a family of file formats used to store emails. Each mail folder is stored as a single file that contains all of that folder's emails, one after another. By default, the MBOX file can be found in the following path:

$USERNAME$\AppData\Roaming\Thunderbird\Profiles

The following is the folder structure you will see when Thunderbird is installed:

u2xziaos.default-release (106)
├──minidumps (0)
├──crashes (1)
│  └──events (0)
├──extensions (1)
├──calendar-data (4)
├──storage (12)
│  └──permanent (12)
│     └──chrome (12)
│        └──idb (11)
│           └──3870112724rsegmnoittet-es.files (0)
├──ImapMail (16)
│  └──imap.mail.yahoo.com (15)
└──Mail (4)
   └──Local Folders (4)

The profile name is created by Thunderbird. The release version of the software the user has installed can also be seen here. As we analyze the folder structure, we will see that it contains information about crashes and stores data in a minidump when a crash occurs. There may also be calendar data and mailboxes.

Here, the user is using the IMAP protocol to access their Yahoo mail account, and there are 15 items stored within the folder.

When we look in the folder, we will see the following files:

  • Archive.msf
  • Archives.msf
  • Bulk Mail.msf
  • Draft.msf
  • Drafts.msf
  • INBOX
  • INBOX.msf
  • msgFilterRules.dat
  • Sent-1.msf
  • Sent.msf
  • Templates.msf
  • Trash.msf

The MSF files are Mail Summary files, which are one part of how the email is stored. The email client, Thunderbird, stores the email data in two different parts. The first part is the MBOX file, which does not have a file extension. The MSF files are the index files for Thunderbird and contain email headers and a summary. Thunderbird uses these files as an index to locate the email stored in the MBOX.

In the following screenshot, you can see three emails are being stored in the MBOX. When X-Ways parses out the inbox, the emails will have a .eml file extension:

Figure 8.4 – Thunderbird inbox

The MBOX format is used by many email clients, including Apple Mail, Opera Mail, and Thunderbird. Most commercial and open source forensic suites will process the MBOX and provide access to emails.  
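If you want to look inside an MBOX file without a forensic suite, Python's standard mailbox module can iterate over the messages it contains. The sketch below creates a small MBOX file named INBOX in a temporary folder, purely so that the example is self-contained, and then reads the subjects back. The messages and the function name list_subjects are invented, and MBOX variants differ between clients, so treat this as a starting point rather than a guaranteed parser for every file.

```python
import mailbox
import os
import tempfile
from email.message import EmailMessage


def list_subjects(mbox_path):
    """Return the Subject of every message stored in an MBOX file."""
    box = mailbox.mbox(mbox_path, create=False)
    try:
        return [message.get("Subject", "") for message in box]
    finally:
        box.close()


with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "INBOX")  # Thunderbird MBOX files have no extension

    # Build a made-up mailbox with three messages.
    writer = mailbox.mbox(path)
    for number in range(1, 4):
        message = EmailMessage()
        message["From"] = "sender@example.com"
        message["Subject"] = f"message {number}"
        message.set_content("body")
        writer.add(message)
    writer.flush()
    writer.close()

    subjects = list_subjects(path)

print(subjects)
```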

While the user can access their email from a client, there is another popular option that allows the user to access their email without using a client: webmail.

Understanding WebMail analysis 

Web-based email has become increasingly popular as we transition from the twentieth to the twenty-first century. It provides ease of access, requires little to no configuration from the user, and is available from any computer. In the simplest terms, WebMail is just another internet artifact for conducting browser analysis (we will cover internet artifacts in Chapter 9, Internet Artifacts).

The service provider maintains the user's email and may provide additional services, such as address books and calendars. Users have the option of using a client to access web-based email, but I have found that those users are in the minority. When content is being hosted by the service provider, that provides additional obstacles to the digital forensic investigator. The only artifacts relating to the content may be in the user's internet history, and that may be fragmented. If a digital forensic investigator wants to access the content of a user's web-based email, they will have to serve a search warrant (in the United States; your jurisdiction may have different requirements) on the service provider. You may be unable to access or recover any deleted emails from the account. It will depend on the specific set of circumstances for each service provider.

If the digital forensic investigator wants to investigate the user's use of web-based email, then they will have to analyze the temporary internet files or the internet cache on the user's system. The temporary internet files/cache contains images, text, or any component of the web page the user has viewed in their browser. 

The browser saves this information in the temporary internet files/cache location to enhance the user experience by presenting pages faster. Instead of continually redownloading the content, the browser can reach back into the cache and present that information to the user.

Gmail is very popular, and when its web application was first deployed, it changed the way WebMail was presented to users. Static web pages no longer displayed the content of the email and the user's email folders; instead, Gmail created content dynamically for each user. Image files and text were no longer being saved to the user's local storage device; instead, Gmail used Asynchronous JavaScript and XML (AJAX). This new method did not allow for a web page to be rebuilt by investigators.

You can still recover artifacts within the internet cache and other potential sources such as RAM or the pagefile on the user's local storage device. You will need to conduct keyword searches for email addresses or keyword searches for terms related to your investigation.
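A keyword search for email addresses across raw data, such as a cache file or an exported pagefile, can be sketched with a regular expression over bytes. The pattern below is a deliberately simple assumption that matches common ASCII addresses only; it will miss addresses stored as UTF-16 and may produce false positives, so every hit still needs review. The sample data and the function name are made up.

```python
import re

# Simple ASCII-only pattern; an assumption, not a complete address grammar.
EMAIL_PATTERN = re.compile(rb"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def find_email_addresses(blob):
    """Return the unique, lowercased email addresses found in raw bytes."""
    hits = {match.decode("ascii").lower() for match in EMAIL_PATTERN.findall(blob)}
    return sorted(hits)


# Synthetic binary data with addresses embedded between non-text bytes.
blob = (
    b"\x00\x01garbage joe@example.com\x00\xff"
    b"more JOE@example.com \x10 jane.doe@mail.example.org\x00"
)
found = find_email_addresses(blob)
print(found)
```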

Before I look into the cache, I want to look into the internet history of the installed browser to see if the user has accessed web-based email. For the Chrome browser, you will find the history stored in an SQLite database named History at the following path:

$USER$\AppData\Local\Google\Chrome\User Data\Default

The analysis of the History database shows the user accessed the Gmail web-based service, as shown in the following screenshot:

Figure 8.5 – Email – History

We have a date/time stamp, along with the email address. The artifact also shows that the user had two unread emails in the inbox when they accessed the service.
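The same kind of lookup can be done directly against a working copy of the History database. The Python sketch below assumes the database has a urls table with url, title, and last_visit_time columns, and that last_visit_time counts microseconds since January 1, 1601 (UTC); that matches the Chrome versions I am aware of, but verify it against the schema in your copy. To stay self-contained, the example builds an in-memory table with one invented row instead of opening a real file.

```python
import sqlite3
from datetime import datetime, timedelta, timezone

# Chrome timestamps are assumed to count microseconds from this epoch.
WEBKIT_EPOCH = datetime(1601, 1, 1, tzinfo=timezone.utc)


def webkit_to_datetime(microseconds):
    """Convert a Chrome/WebKit timestamp to a timezone-aware UTC datetime."""
    return WEBKIT_EPOCH + timedelta(microseconds=microseconds)


def webmail_visits(connection, keyword="mail.google.com"):
    """Return (url, title, last visit) rows whose URL contains the keyword."""
    rows = connection.execute(
        "SELECT url, title, last_visit_time FROM urls "
        "WHERE url LIKE ? ORDER BY last_visit_time",
        (f"%{keyword}%",),
    )
    return [(url, title, webkit_to_datetime(stamp)) for url, title, stamp in rows]


# In-memory stand-in for a copy of the History file; the rows are made up.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE urls (id INTEGER PRIMARY KEY, url TEXT, title TEXT, "
    "visit_count INTEGER, typed_count INTEGER, last_visit_time INTEGER, hidden INTEGER)"
)
conn.executemany(
    "INSERT INTO urls (url, title, visit_count, typed_count, last_visit_time, hidden) "
    "VALUES (?, ?, 1, 0, ?, 0)",
    [
        ("https://mail.google.com/mail/u/0/#inbox",
         "Inbox (2) - user@example.com - Gmail", 13000000000000000),
        ("https://www.example.org/", "Example", 13000000001000000),
    ],
)
visits = webmail_visits(conn)
conn.close()

for url, title, visited in visits:
    print(visited.isoformat(), title, url)
```

On a real case, open a copy of the History file with sqlite3.connect instead of the in-memory database, and never query the original evidence file.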

I found this from the internet cache for the Google Chrome browser, which can be found at the following location:

$USER$\AppData\Local\Google\Chrome\User Data\Default\History Provider Cache

As you can see from the following screenshot showing the Chrome cache, the content is not easily decipherable and does not give us a lot to follow up with:

Figure 8.6 – Chrome cache displayed

If we keep searching for the email address we found in the cache, [email protected], we may find other artifacts, such as the following:

{"endpoint_info_list":[{"endpoint":"smtp:[email protected]","c_id":"d24c.2d00","c_name":"Joe Badguy Smith"},{"endpoint":"smtp:[email protected]","c_id":"e80f.5b71","c_name":"John Badguy Smith"},{"endpoint":"smtp:[email protected]","c_id":"624f.10f0","c_name":"Yahoo! Inc."}]}

This artifact, which can also be found within the cache, gives us another email address, [email protected], to follow up. The content of the email still remains out of reach.
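Artifacts like this are JSON, so once you have carved one out of the cache, it can be parsed rather than read by eye. The Python sketch below pulls the contact names and addresses out of a structure shaped like the one above. The addresses in the sample are placeholders because the real ones are redacted, and the function name extract_contacts is hypothetical.

```python
import json

# Same structure as the recovered artifact; the addresses are placeholders.
artifact = (
    '{"endpoint_info_list":['
    '{"endpoint":"smtp:joe@example.com","c_id":"d24c.2d00","c_name":"Joe Badguy Smith"},'
    '{"endpoint":"smtp:john@example.com","c_id":"e80f.5b71","c_name":"John Badguy Smith"},'
    '{"endpoint":"smtp:noreply@example.net","c_id":"624f.10f0","c_name":"Yahoo! Inc."}'
    "]}"
)


def extract_contacts(text):
    """Return (display name, address) pairs from an endpoint_info_list artifact."""
    data = json.loads(text)
    contacts = []
    for entry in data.get("endpoint_info_list", []):
        endpoint = entry.get("endpoint", "")
        scheme, _, address = endpoint.partition(":")
        # Strip the "smtp:" prefix when present; otherwise keep the raw value.
        contacts.append((entry.get("c_name", ""), address if scheme == "smtp" else endpoint))
    return contacts


contacts = extract_contacts(artifact)
for name, address in contacts:
    print(name, address)
```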

Let's look at the Firefox cache and see whether it can give us a better look at the cache and history.

The cache and history for the Firefox browser can be found at the following location:

$USERS$\AppData\Local\Mozilla\Firefox\Profiles\<profile>\cache2

Firefox will store the internet history and cache underneath the user's profile. The folder structure you will see may look like this:

Mozilla (1,505)
└──Firefox (1,505)
   └──Profiles (1,504)
      ├──55abhq00.default-release (1,504)
      │  ├──safebrowsing (50)
      │  │  └──google4 (10)
      │  ├──jumpListCache (5)
      │  ├──startupCache (236)
      │  ├──cache2 (1,162)
      │  │  ├──entries (1,160)
      │  │  └──doomed (0)
      │  ├──thumbnails (0)
      │  ├──OfflineCache (1)
      │  └──safebrowsing-updating (49)
      │     └──google4 (9)
      └──cqr6ioib.default (0)

It looks like the visual depiction of the content of the Firefox cache is not much better:

"matches": [
    {
      "lookupId": "[email protected]",
      "personId": [
        "114987255021342983529"
      ]
    }
  ],
  "people": {
    "114987255021342983529": {
      "personId": "114987255021342983529",
      "metadata": {
        "lastUpdateTimeMicros": "1567030765000",
        "identityInfo": {
          "originalLookupToken": [
            "[email protected]"

It does not provide a wealth of information, but it does supply breadcrumbs for us to follow up and conduct additional investigative efforts with.

In the world of forensics, the artifacts you rely upon can quickly change with new updates to the software or changes in the operating system. Be flexible with your investigative techniques so that you can jump into the latest technology to make your investigation successful. Once you have identified that the subject of your investigation is using web-based email, your best course of action is to serve the service provider with the appropriate judicial paperwork to freeze the account and get the required content.  

Summary

In this chapter, we have gone over standard email protocols; the system uses SMTP for sending emails, while POP and IMAP are used for receiving emails. IMAP also includes features that can be used to manage the user's inbox. We went over the email header and the components that make up the header. WebMail and email clients were also discussed.

You now have the skills necessary to read an email header and determine the servers that were used to transmit the email, as well as what protocols the system used to send and receive the email. When conducting a digital forensic examination, you can now identify artifacts from typical email clients and web-based email.

In the next chapter, you will learn about internet artifacts, which include the traces left behind by web-based email.

Questions

  1. Which of the following is not an email protocol?

    a. HTML

    b. POP

    c. SMTP

    d. IMAP

  2. Which of the following will allow the user to manage their inbox?

    a. COC

    b. POP

    c. FreeBSD

    d. IMAP

  3. The email header is created by user input information.

    a. True

    b. False

  4. Thunderbird stores emails in which file?

    a. Inbox

    b. Outbox

    c. MBOX

    d. Letterbox

  5. Which email client uses a PST file?

    a. Thunderbird

    b. Gmail

    c. Yahoo Mail

    d. Outlook

  6. Windows Live Mail was replaced with which client?

    a. Outlook Express

    b. Outlook

    c. Windows Mail

    d. Windows Email

  7. You will always find the content of web-based email in the user's cache.

    a. True

    b. False

You will find the answers at the back of this book, under Assessments.
