14.2 Internet Email

Internet email is perhaps the oldest of today’s widely used internet applications. Network email sporting an “@” sign originated in the early 1970s. Even though raw file transfers accounted for most ARPANET traffic during its history, email was the most familiar and useful ARPANET service to most users.

Although both the ARPANET and today’s internet are packet-oriented, email is message-oriented. The email system accepts and delivers each message as a single unit. If a message must traverse several email servers, the receiving server collects the entire message before sending it on to the next one.

Internet email standards fall into two categories:

  1. Formatting standards: These describe the layout of an email message: the headers we use and how we format attachments.

  2. Protocol standards: These describe how email clients and servers interact to either deliver mail or pick up mail. There is a single standard for email delivery, but multiple standards for retrieving email.

The rest of this section will discuss formatting standards. The following section discusses protocol standards.

Message Formatting Standards

Although there are dozens of standards that describe details of internet email, most build upon other standards. Underlying all email formatting standards is one called “RFC 822,” which is the 822nd Request for Comments published by the ARPANET/internet community. Internet protocol standards begin as “Internet Drafts” and become “RFCs” when they are ready for serious consideration by the internet technical community at large. The RFC becomes a standard after it is accepted by the internet community.

RFC 822 was published in 1982, which illustrates how old internet email systems are. Subsequent changes to email format have been extensions to RFC 822. A major extension is the Multipurpose Internet Mail Extensions (MIME), which describes how to include email attachments in different formats.

FIGURE 14.1 contains an email message sent from “rsmith6” to “kev1111.” An email message consists of two parts: the headers and the body. The headers start at the beginning, of course, and continue until the first blank line appears. The message body follows: a single line of text in this case.

A screenshot of an email message is shown.

FIGURE 14.1 Elements of email format.

Message Headers

As with individual packets, the real data is at the end with technically elaborate headers attached to the beginning. When we first compose a message, we identify the recipient and subject, and we type in the body. Our email client constructs four standard headers:

  1. From:

  2. To: (also Cc: and Bcc:)

  3. Date:

  4. Subject:

Modern email clients often insert additional headers for technical reasons. Many include headers describing the format and character set used, so the recipient’s email client can distinguish between “richly formatted” email and plain text in different languages. Many also include a “Message ID” so that every email message contains a unique identifier.

When we write an email, the client software constructs those initial headers. Our client software inserts the date and formats the email addresses of the recipients. It also adds the other formatting and identification headers.
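The following is a minimal sketch of what those client-generated headers look like, using Python’s standard email package. All names and addresses are hypothetical, and real clients add further headers of their own. The sketch also shows that the first blank line separates the headers from the body.

```python
from email.message import EmailMessage
from email.utils import formataddr, formatdate, make_msgid


def compose(author, to, subject, body, cc=()):
    """Build a message with the standard headers a client typically adds.

    author, to, and cc use (display name, address) pairs; the example
    values below are hypothetical.
    """
    msg = EmailMessage()
    msg["From"] = formataddr(author)
    msg["To"] = ", ".join(formataddr(a) for a in to)
    if cc:
        msg["Cc"] = ", ".join(formataddr(a) for a in cc)
    msg["Date"] = formatdate(localtime=True)     # client inserts the date
    msg["Subject"] = subject
    # Unique identifier, using the author's domain after the "@".
    msg["Message-ID"] = make_msgid(domain=author[1].split("@")[1])
    msg.set_content(body)  # also adds MIME-Version and Content-Type headers
    return msg


msg = compose(("Kevin", "kevin@example.com"),
              [("Bob", "bob@example.org")],
              "Lunch", "See you at noon.")
# The first blank line ends the headers; the body follows it.
headers, body = msg.as_string().split("\n\n", 1)
print(headers)
```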

The From: Header

The client software is responsible for constructing the “From” field to identify the email’s author. Many people own several email addresses. For example, Kevin has five addresses. One address is for his school account. Another address is for his employer. A third belongs to a free email service he uses from a website.

The other two belong to the “einsec.com” domain name he purchased; both addresses go to a single mailbox provided by the ISP that handles his domain for him. When Kevin sends an email, his email client lets him choose which address should appear in the “From” field.

Many email clients have no way to validate the “From” field; thus, Kevin could choose any address he wants to appear in that field. For example, he could choose the address of the school’s president or of the President of the United States (see Section 14.2.3). However, if anyone sent a reply to his forged From address, it would not go to Kevin’s email.

The To: Header

We must provide a list of one or more email addresses to receive the message. These addresses may appear in any of three headers: the “To” field, the “Cc” field, or the “Bcc” field. The difference between the To and Cc fields is largely a matter of etiquette: Everyone in both lists receives the email. The Bcc field identifies other recipients but doesn’t disclose those recipients to others who receive the same email. The client reformats the message to omit the Bcc names before submitting it to an email server for delivery.
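The Bcc handling can be sketched in a few lines of Python. This is a minimal illustration, assuming hypothetical addresses and a hypothetical helper name: it collects every To, Cc, and Bcc address as the list of recipients to hand to the server, then deletes the Bcc header so no delivered copy reveals those names. Python’s own smtplib.SMTP.send_message() behaves similarly, since it never transmits Bcc headers.

```python
from email.message import EmailMessage
from email.utils import getaddresses


def prepare_for_submission(msg):
    """Return all recipient addresses, then strip Bcc from the headers.

    Every To, Cc, and Bcc address still receives a copy (the returned list
    is what a client would give the mail server), but the transmitted
    message no longer names the Bcc recipients.
    """
    fields = msg.get_all("To", []) + msg.get_all("Cc", []) + msg.get_all("Bcc", [])
    recipients = [addr for _name, addr in getaddresses([str(f) for f in fields])]
    del msg["Bcc"]  # removes every Bcc header; no error if none exists
    return recipients


msg = EmailMessage()
msg["From"] = "kevin@example.com"
msg["To"] = "bob@example.org"
msg["Cc"] = "alice@example.org"
msg["Bcc"] = "eve@example.net"
msg["Subject"] = "Budget"
msg.set_content("Draft attached.")

print(prepare_for_submission(msg))
print("Bcc" in msg)
```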

Additional Headers

Originally, email clients displayed the entire email, headers and all, because the headers provided essential email details. As email has evolved, new headers have appeared. Many have to do with email formatting and others have to do with email delivery (see Section 14.2.2). The email client produces the formatting headers, and email servers add the delivery headers.

Modern email clients don’t usually display all email headers. Most clients extract the major headers listed above and display their information at the top of a message. The client hides the remaining headers, although most clients provide a way to examine them.

MIME Formatting

MIME formatting allows us to include non-ASCII text and files in email messages. Email messages originally were limited to 7-bit ASCII characters. MIME formatting transforms 8-bit data files into a 7-bit format that travels safely through email software. MIME supports graphically rich emails that include boldface text, italics, bulleted lists, international character sets, and graphics. FIGURE 14.2 shows a MIME email from Bob’s boss.

A screenshot of the email in rich text format is shown. The subject in the email header appears in bold. The line addressing the recipient is also in bold, and the text includes a bulleted list.

FIGURE 14.2 Email in a rich text format.

Courtesy of Dr. Richard Smith

We can’t embed such text directly into an ASCII email. Moreover, we can’t transmit 8-bit data files through plain ASCII email. If an email software component only supports 7-bit ASCII characters, it might change the 8th bit. Most application files, like spreadsheets or word-processing documents, contain 8-bit data. MIME encoding allows us to send 8-bit application files and graphics by converting the data to a 7-bit ASCII format.
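The problem and the fix can be illustrated with Python’s standard base64 module, which implements the base64 transfer encoding that MIME uses for binary attachments. This sketch simulates a 7-bit channel that clears the 8th bit, then shows that base64 output survives such a channel because every output byte is plain ASCII.

```python
import base64

data = bytes(range(256))             # every possible 8-bit byte value

# A 7-bit channel that clears the 8th bit corrupts half of these bytes.
mangled = bytes(b & 0x7F for b in data)
print(mangled == data)               # False

# Base64 maps each 3 input bytes to 4 ASCII characters and breaks the
# output into lines of at most 76 characters, as MIME requires.
encoded = base64.encodebytes(data)
print(all(b < 0x80 for b in encoded))        # True: safe for 7-bit transport
print(base64.decodebytes(encoded) == data)   # True: original bytes recovered
```

The cost of this safety is size: base64 output is about a third larger than the original data.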

The email text shown in Figure 14.2 uses a large typeface, a non-ASCII accent character, boldface, and a bulleted list. FIGURE 14.3 shows the raw 7-bit ASCII text for this message. We have omitted Received headers and others that aren’t relevant here.

A screenshot of the Email text with MIME formatting is shown.

FIGURE 14.3 Email text with MIME formatting.

The Content-Type header signals the presence of MIME parts. The header first describes the type of MIME formatting being used. In this case, the message is a multipart message, indicating that it has two or more separate sections, each with its own MIME formatting.

The header also defines a special string, the boundary marker, to mark the boundary between different sections of the message. In this case, the text string starting with “Apple-Mail” serves as the boundary marker. It divides the message body into two separate sections: the plain text section and the formatted text section.

The message’s plain text section begins immediately after the first boundary marker. The section begins with headers that describe the encoding and character set used in that section. Following the headers, the section contains a “plain text” version of the message. The email client constructs this as a convenience in case the recipient’s email client can’t display the formatted message in the second section. However, even this “plain text” is more than ASCII; it uses the ISO 8859-1 character set, which is essentially ASCII extended with common accented characters.

The second boundary marker indicates the start of the second section. Figure 14.3 shows only the beginning lines of that section. The format is “text/html,” and HTML (Hypertext Markup Language) refers to the formatting used on web pages. We describe this further in Section 15.1.
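A message with the same two-section structure can be built with Python’s standard email package. This is a minimal sketch, assuming hypothetical addresses and a hypothetical boundary string (real clients generate their own, such as the “Apple-Mail” string above). Both sections use quoted-printable encoding so the finished message is pure 7-bit ASCII; the accented “é” appears as an encoded escape sequence.

```python
from email.message import EmailMessage

msg = EmailMessage()
msg["From"] = "boss@example.com"      # hypothetical addresses
msg["To"] = "bob@example.com"
msg["Subject"] = "Schedule"

# First section: plain text in ISO 8859-1, quoted-printable to stay 7-bit.
msg.set_content("Meet at the caf\u00e9 at noon.\n",
                charset="iso-8859-1", cte="quoted-printable")
# Second section: the same text as HTML. add_alternative() turns the
# message into multipart/alternative and appends this part after the first.
msg.add_alternative("<p><b>Meet at the caf\u00e9 at noon.</b></p>\n",
                    subtype="html", cte="quoted-printable")
msg.set_boundary("hypothetical-boundary-1234")  # normally generated for us

raw = msg.as_bytes().decode("ascii")   # succeeds: every byte is 7-bit ASCII
print(raw)
```

In the printed text, the Content-Type header names the boundary, and each section starts with a line consisting of two hyphens followed by that boundary string.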

In some systems, the email client uses a web browser, or parts of one, to display HTML-formatted email. This allows such emails to incorporate any formatting that a web page might use, including variations in type size, weight, and style, as well as special formats like the bulleted list in Figure 14.2. However, the client may also retrieve and display images from websites and, in some cases, execute scripts or other programs. We discuss those issues further in Section 14.3.

14.2.1 Email Protocol Standards

There are two types of email protocols:

  1. Mailbox protocols: Describe how email client software on a user’s computer retrieves email from a personal mailbox stored on a server

  2. Delivery protocols: Describe how email client software takes a message and gives it to an email server for delivery, then how servers exchange messages among themselves

All of these protocols use TCP as the transport protocol. Each has specific port numbers assigned to servers. Most of these protocols initially evolved in the 1980s or early 1990s, and they trust the internet backbone. Modern sites often provide these protocols with SSL protection. Users may protect the protocol with SSL crypto by connecting to the appropriate port number.

Mailbox Protocols

Many email users retrieve their email by visiting their email service using a web browser. Others rely on email client software residing on their computer. Client software uses a mailbox protocol to retrieve the email. The protocol logs in to the server containing the user’s mailbox. It then examines the mailbox and tells the user which messages are available. Some protocols automatically copy all messages to the user’s client. Others leave the messages on the server until the user deletes them.

There are several well-known mailbox protocols, including the Post Office Protocol (POP3) and the Internet Message Access Protocol (IMAP). Other users rely on Microsoft’s proprietary “Messaging API,” or MAPI, to access its Exchange Server product. POP3 and IMAP are supported by most email client software, including Mozilla’s Thunderbird, Qualcomm’s Eudora, Apple Mail on MacOS, and Microsoft’s email products.

POP3: An Example

As with all internet protocols, we find the official description for POP3 in an RFC; this one is RFC 1939, “Post Office Protocol—Version 3.” The POP3 protocol includes a strategy to incorporate new features, and these are documented in a few later RFCs. However, RFC 1939 describes the fundamentals of the POP3 operation.

POP3 is a text-oriented protocol that uses a single TCP connection. The client opens the connection, and the server replies with a greeting that begins with “+OK.” The client sends commands, and the server responds to each one with a status line that begins with “+OK” or “-ERR.” Some commands yield a single-line response, like those for authenticating. Others yield multiple lines, like those that list available messages or retrieve messages. The server marks the end of a multi-line response with a line containing only a period.
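The response rules just described can be sketched as a small Python parser. This is a minimal illustration, assuming the server’s lines arrive through an iterator (the hypothetical name read_response and the simulated replies are ours, not RFC 1939’s). RFC 1939 also specifies “byte-stuffing”: if a line of a multi-line response itself begins with a period, the server adds an extra period, which the client removes.

```python
def read_response(lines, multiline=False):
    """Parse one POP3 server response from an iterator of text lines.

    The status line starts with "+OK" or "-ERR". A multi-line response
    continues until a line holding only ".", and the server prefixes an
    extra "." to any body line that already starts with ".".
    """
    status = next(lines)
    if not status.startswith("+OK"):
        raise RuntimeError(f"POP3 error: {status}")
    if not multiline:
        return status, []
    body = []
    for line in lines:
        if line == ".":                 # end-of-response marker
            return status, body
        if line.startswith("."):        # undo byte-stuffing
            line = line[1:]
        body.append(line)
    raise ConnectionError("connection closed before the final '.' line")


# Simulated server output: a greeting, then a reply to a RETR command.
replies = iter([
    "+OK POP3 server ready",
    "+OK message follows",
    "Subject: Hello",
    "",
    "..a line that began with a dot",
    ".",
])
print(read_response(replies))                  # greeting
print(read_response(replies, multiline=True))  # retrieved message
```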

Here’s a transcript of a simple POP3 exchange in which Bob’s email client finds and retrieves a single email message. We have omitted the email text.

A transcript of the POP3 exchange is shown.

The client’s first real action is to provide the user ID and password in Steps 2 and 4. This immediately indicates a vulnerability: If attackers sniff this connection, they can retrieve Bob’s user ID and password. This is why most ISPs today allow users to protect POP3 connections with SSL.

The “STAT” command in Step 6 asks for the amount of mail in the mailbox. The “+OK” response in Step 7 includes the number of messages (1) and the size of the mailbox in bytes (1227). The next two commands, “UIDL” and “LIST,” retrieve information about the message to construct the list of available messages.
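A client might turn these replies into its list of available messages roughly as follows. This Python sketch assumes the RFC 1939 reply formats (“+OK count size” for STAT, and “number value” lines in the LIST and UIDL bodies). The function names and the LIST and UIDL lines in the example are hypothetical; since the example mailbox holds one message, we assume its LIST size matches the mailbox total.

```python
def parse_stat(reply):
    """Parse a STAT reply such as "+OK 1 1227" into (count, mailbox size)."""
    fields = reply.split()
    if not fields or fields[0] != "+OK":
        raise RuntimeError(f"POP3 error: {reply}")
    return int(fields[1]), int(fields[2])


def parse_listing(body_lines):
    """Map message numbers to the second field of LIST or UIDL body lines.

    LIST lines carry a size in bytes; UIDL lines carry an opaque unique ID.
    """
    listing = {}
    for line in body_lines:
        number, value = line.split(maxsplit=1)
        listing[int(number)] = value
    return listing


count, size = parse_stat("+OK 1 1227")               # the Step 7 reply
sizes = parse_listing(["1 1227"])                   # hypothetical LIST body
uids = parse_listing(["1 hypothetical-uid-0001"])   # hypothetical UIDL body
for n in range(1, count + 1):
    print(n, uids[n], int(sizes[n]))
```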

The “RETR” command in Step 14 takes a message number as an argument and retrieves that message (Steps 15 and 16). Once Bob’s client has retrieved messages, it sends a QUIT command; then the client and server close the connection.

Email Delivery

When we create an email message for delivery, we don’t place it directly in the recipient’s mailbox. Instead, we contact an email server, transmit the message to the server, and let the server deliver it. The email delivery protocol is called the Simple Mail Transfer Protocol (SMTP), and it has been the workhorse of internet email since 1982. These SMTP servers, also called message transfer agents (MTAs), form the backbone of internet email delivery.

The SMTP protocol is extremely simple. We open a connection to the SMTP server and send a series of simple text commands to identify the recipients of the email, then we transmit the email message and close the connection. The server takes responsibility for delivering the message to its recipients.
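The client’s half of such a session can be sketched in Python. This is a minimal illustration with hypothetical host names and addresses: it lists the commands a client sends for one message, one RCPT TO per recipient, and omits the numeric replies (such as “250 OK”) that a real client waits for after each command. Like POP3, SMTP ends the message text with a line holding only a period and doubles any leading period in the text (“dot-stuffing”).

```python
def smtp_client_lines(helo_name, sender, recipients, message):
    """Return the text lines an SMTP client sends to deliver one message.

    A real client reads the server's reply after each command; this
    sketch lists only what the client transmits.
    """
    lines = [f"HELO {helo_name}", f"MAIL FROM:<{sender}>"]
    lines += [f"RCPT TO:<{r}>" for r in recipients]    # one per recipient
    lines.append("DATA")
    for line in message.splitlines():
        # Dot-stuffing: a leading "." is doubled so the line cannot be
        # mistaken for the end-of-data marker.
        lines.append("." + line if line.startswith(".") else line)
    lines.append(".")        # a line holding only "." ends the message
    lines.append("QUIT")
    return lines


session = smtp_client_lines(
    "client.example.com", "kevin@example.com", ["bob@example.org"],
    "From: kevin@example.com\nTo: bob@example.org\nSubject: Hi\n\n.hidden\n",
)
print("\r\n".join(session))   # SMTP lines end with CR LF on the wire
```

In practice, Python’s standard smtplib module carries out this exchange, including the dot-stuffing, in its SMTP.sendmail() method.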

In simple cases, we connect directly to an MTA, and that MTA places the email in the recipient’s mailbox. This often happens when sending email between users in the same site. When email must travel across the internet to a different site, it often passes through a series of MTAs. Section 14.2.2 provides a detailed example of this.

When two MTAs exchange email, they operate in a peer-to-peer relationship. Either MTA can initiate the connection, and either can respond. SMTP is a client/server protocol; the MTAs use it as peers because either host can play the role of client or server.

Port Number Summary

TABLE 14.1 lists TCP port numbers used by internet email protocols. Email client protocols like POP3 and IMAP may choose between a plaintext or SSL-protected connection by selecting the appropriate port number. When using SMTP, some sites associate port 465 with SSL protection, though this port is also used for a different protocol.

TABLE 14.1 Port Numbers for Email Protocols

Mail Protocol | Port Number for Traditional Plaintext | Port Number for SSL Protection
SMTP | 25, 587 | 25, 465, 587
POP3 | 110 | 995
IMAP | 143 | 993

SMTP does not in fact require a separate port number to support SSL. Recent SMTP servers incorporate a “STARTTLS” command that transforms a plaintext SMTP connection into an SSL-protected one.

14.2.2 Tracking an Email

Although SMTP may be simple by itself, modern email delivery is an elaborate process. Messages rarely flow directly from a client to a server and then directly into the recipient’s mailbox. Most emails traverse several MTAs on their trip to their destination. We can track this process by looking at an email’s header. For example, let’s trace the route of the email in Figure 14.1. We reproduce the email in FIGURE 14.4 and underline all headers. The figure also numbers the Received headers in their order of creation.

A screenshot of the email depicts its journey.

FIGURE 14.4 Tracking an email’s journey.

As the email travels through the network, each MTA adds a “Received” header to the front of the message. When the MTA creates the header, it includes one or more of the following sections:

  • “By” provides the MTA’s identity: a domain name and/or IP address, and optionally, the MTA software package being used

  • “With” identifies the mail protocol being used, and also may identify the MTA software

  • “From” provides the sender’s identity: a domain name and/or IP address, and possibly a user name if authentication was used when submitting the email for delivery

  • “ID” provides a local identifier used by the MTA while processing this email message

  • A timestamp ends the header

The series of Received headers tells the story of an email’s travels in reverse. FIGURE 14.5 illustrates the trail of our example email as it traversed the four different MTAs.

An illustration traces the email’s journey.

FIGURE 14.5 Following the email in Figure 14.4.
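Reading the Received headers in reverse can be automated. This Python sketch uses the standard email package on a hypothetical two-hop message and reports the “from” and “by” hosts of each hop, oldest first. The regular expressions are a deliberate simplification, assuming each header begins with its “from” clause; real MTAs format these headers in many different ways.

```python
import re
from email import message_from_string


def received_trail(raw_message):
    """List (from, by) host names from Received headers, oldest hop first.

    Each MTA adds its Received header at the top, so the bottom header is
    the oldest; reversing the header list restores chronological order.
    """
    msg = message_from_string(raw_message)
    trail = []
    for value in reversed(msg.get_all("Received", [])):
        text = " ".join(value.split())         # unfold, collapse whitespace
        sender = re.match(r"from\s+(\S+)", text)
        receiver = re.search(r"\bby\s+(\S+)", text)
        trail.append((sender.group(1) if sender else None,
                      receiver.group(1) if receiver else None))
    return trail


SAMPLE = (  # hypothetical message that crossed two MTAs
    "Received: from relay.example.net (relay.example.net [198.51.100.7])\n"
    "\tby mx.example.org with ESMTP id B2; Mon, 1 Jan 2024 10:00:05 +0000\n"
    "Received: from client.example.com ([192.0.2.10])\n"
    "\tby relay.example.net with SMTP id A1; Mon, 1 Jan 2024 10:00:00 +0000\n"
    "From: kevin@example.com\n"
    "To: bob@example.org\n"
    "Subject: Test\n"
    "\n"
    "Hello\n"
)
for hop, (src, dst) in enumerate(received_trail(SAMPLE), start=1):
    print(f"#{hop}: {src} -> {dst}")
```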

#1: From UC123 to USM01

The trip begins with the last Received header, which appears right before the From field. The Received header is three lines long in this example. When an email header fills more than one line, each additional line begins with at least one whitespace character (a space or a tab). Here is what the header says:

  • Received from client UC123, using Microsoft’s MAPI

  • Arrived at server USM01, running Microsoft Exchange
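Software that reads headers first “unfolds” multi-line headers back into a single line. RFC 5322 defines unfolding as removing each line break that is immediately followed by whitespace, while keeping the whitespace itself. A minimal Python sketch, using a hypothetical header modeled on the one above:

```python
import re


def unfold(header_text):
    """Unfold a header: drop each line break that precedes a space or tab."""
    return re.sub(r"\r?\n(?=[ \t])", "", header_text)


folded = ("Received: from uc123.example.edu (192.0.2.10) by\r\n"
          " usm01.example.edu (192.0.2.20) with MAPI;\r\n"
          " Mon, 1 Jan 2024 10:00:00 -0600")
print(unfold(folded))
```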

#2: From USM01 to USM02

Moving upward, the next Received header shows that the Exchange server USM01 uses SMTP to forward the email to USM02, an external email server:

  • Received from USM01

  • Received by USM02, running MailMarshal

The USM02 server runs a commercial package called MailMarshal that scans both incoming and outgoing email for spam, viruses, or other problems. Now the email leaves stthomas.edu in search of the email service for “einsec.com,” Kevin’s personal domain.

We use DNS to look up the email host that serves a particular domain. Although most DNS records we examined in Section 12.3 indicate a general-purpose server at that address, DNS provides additional records to look up email servers. These are called mail exchange (MX) records. When USM02 looks up Kevin’s domain, the answer directs it to server MMS01 at secureserver.net.
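Each MX record carries a preference number along with a host name, and a sending MTA tries the hosts in order of increasing preference value. This Python sketch orders a hypothetical MX answer accordingly, shuffling hosts of equal preference and falling back to the domain itself when no MX records exist, as RFC 5321 describes. It does not perform the DNS query: Python’s standard library has no MX lookup function, so a real program would need a third-party resolver library.

```python
import random


def delivery_order(mx_records, domain):
    """Order MX hosts for delivery attempts.

    mx_records holds (preference, host) pairs from a DNS answer. Lower
    preference values are tried first; ties are shuffled to spread load.
    With no MX records, the domain's own address is used instead.
    """
    if not mx_records:
        return [domain]
    ranked = sorted(mx_records, key=lambda rec: (rec[0], random.random()))
    return [host for _pref, host in ranked]


# Hypothetical answer to an MX query for example.org
records = [(20, "backup-mx.example.net"), (10, "mx1.example.net")]
print(delivery_order(records, "example.org"))
```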

#3: From USM02 to MMS01

Now we move up to the next-to-first Received line. Here the MMS01 server at secureserver.net accepts the incoming email using “ESMTP.” This is a generic identifier for “extended” SMTP service. Servers that support ESMTP recognize additional commands to improve email handling security and efficiency. To summarize, the field says:

  • Received from USM02

  • Received by MMS01 using ESMTP

MMS01 doesn’t immediately deliver the message to its next destination. Before forwarding the email, the server applies a commercial antispam package, called “IronPort.” The package adds its own header to the email; headers that start with an “X” are ignored by standard email software but may be used by specialized, experimental, or proprietary software.

#4: From MMS01 to MMS02

Server MMS02 produces the topmost Received line. It is the final SMTP server to process the email. MMS02 runs “qmail” software. Qmail is an open-source SMTP server that can both forward email and deliver it to local mailboxes. Qmail delivers the email to Kevin’s mailbox. When Kevin’s client retrieves the email, it adds no headers of its own.

Each email may follow its own path and encounter different MTA software. We can track any email using DNS lookups to translate domain names. (See Section 12.3.2.) We use whois lookups to track down domain owners or IP addresses. (See Section 12.3.4.) We use web searches to identify MTA software packages.

14.2.3 Forging an Email Message

FIGURE 14.6 shows an email message received by a Microsoft Outlook email client. Examine the From header. Is it real?

A screenshot of a browser window displaying the Email message is shown.

FIGURE 14.6 Is this really an email from the president of the United States?

Used with permission from Microsoft

Although this may be an actual email written by the president of the United States, it may also be a forgery. We can’t tell simply by looking at the From header. We might find evidence in the Received headers. If the earliest Received header isn’t from the White House, the Executive Office of the President, or a related government entity, the message is a forgery.

The email in Figure 14.6 was displayed by Microsoft Outlook. Like many email clients, Outlook filters out the messy and complicated email headers and displays only the information from the most familiar ones. Because it hides the Received headers, we can’t immediately judge the message’s authenticity. Fortunately, the Outlook desktop client provides an option to retrieve the detailed headers.

FIGURE 14.7 contains the headers from the “presidential” email as retrieved from the Outlook client. Note that there are two additional headers in the original message: a Message ID and a Return Path, both specifying the White House as the email’s source.

A screenshot shows the headers from the “presidential” email.

FIGURE 14.7 Headers from the “presidential” email.

The forgery becomes obvious when we look at the earliest Received header. The IP address belongs to an ISP in Minneapolis, Minnesota. That’s half a continent away from the White House in Washington, DC. Moreover, the header says that the email originated from the email account of “[email protected].” That’s a surprising identity for the president to use.

We conclude that Kevin forged this email. To produce it, Kevin configured his email client to provide the president’s email information as one of his “From” addresses. He could then write an email and attribute it to the president simply by selecting that From address.

Even though we may identify many forgeries by examining the oldest Received header, we won’t detect all forgeries. A more sophisticated attacker could create an even more plausible forgery. The attacker could construct the entire email, including one or more fake Received headers. The forger could add a Received header that includes a White House source address.

We might still detect such a forgery by carefully analyzing each Received header. If one doesn’t clearly refer to the previous one, then we’ve detected a forgery. However, different MTAs provide different amounts of detail in their Received headers. The forger might produce a believable fake by sending it via an MTA that doesn’t clearly identify the source of incoming messages.
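The first check described above, comparing the earliest Received header against the networks the claimed sender actually uses, can be sketched in Python. This is a heuristic under stated assumptions: the messages and networks below are hypothetical documentation addresses, only bracketed IPv4 addresses are recognized, and, as the discussion above warns, a match proves little because a forger can plant fake Received headers beneath the real ones.

```python
import ipaddress
import re
from email import message_from_string


def earliest_hop_ip(raw_message):
    """Return the bracketed IPv4 address in the oldest (bottom) Received header."""
    received = message_from_string(raw_message).get_all("Received", [])
    if not received:
        return None
    match = re.search(r"\[(\d{1,3}(?:\.\d{1,3}){3})\]", received[-1])
    return ipaddress.ip_address(match.group(1)) if match else None


def origin_matches(raw_message, expected_networks):
    """True if the oldest hop came from one of the claimed sender's networks.

    A False result suggests forgery; a True result is not proof of
    authenticity, since Received headers can themselves be forged.
    """
    ip = earliest_hop_ip(raw_message)
    return ip is not None and any(
        ip in ipaddress.ip_network(net) for net in expected_networks)


RAW = (  # hypothetical message claiming an official sender
    "Received: from mx.example.org ([198.51.100.7])\n"
    "\tby inbox.example.org with ESMTP; Mon, 1 Jan 2024 10:00:05 +0000\n"
    "Received: from home-pc ([192.0.2.10])\n"
    "\tby smtp.example-isp.net with ESMTPA; Mon, 1 Jan 2024 10:00:00 +0000\n"
    "From: Official Sender <office@official.example>\n"
    "Subject: Important announcement\n"
    "\n"
    "Text\n"
)
# Suppose the claimed sender's mail servers live in 203.0.113.0/24.
print(origin_matches(RAW, ["203.0.113.0/24"]))
```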

Authenticated Email

The only reliable way to authenticate email is to apply a cryptographic technique to the message itself. The most common approach is for the author to digitally sign the email. FIGURE 14.8 shows an excerpt from an email that was digitally signed using PGP software. We introduced PGP and other email security protocols in Section 13.3.

A screenshot of an excerpt from a signed email message is shown.

FIGURE 14.8 Excerpt from a signed email message.

The signed portion of the email begins with a “BEGIN PGP SIGNED MESSAGE” line and ends with the signature itself in a special ASCII encoding. The process for signing email is the same one described earlier in Section 8.5. Note that any reformatting of the email text will invalidate the signature. Email servers must be sure to preserve the exact contents of email they transfer, or signature checks will fail even on legitimate messages.
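Why reformatting breaks a signature becomes clear when we recall that the signature is computed over a hash of the message text. Python’s standard library has no public-key signing, so this sketch shows only the hashing step, using SHA-256 as an assumed hash algorithm. The canonicalization is loosely modeled on OpenPGP cleartext signatures (RFC 4880), which ignore trailing whitespace and normalize line endings; other changes, such as rewrapping a line, produce a different hash and therefore a failed signature check.

```python
import hashlib


def canonical_digest(text):
    """SHA-256 digest of text after a light canonicalization.

    Trailing spaces and tabs are dropped and line endings become CR LF
    before hashing, loosely following OpenPGP cleartext signatures.
    """
    lines = [line.rstrip(" \t") for line in text.splitlines()]
    return hashlib.sha256("\r\n".join(lines).encode("utf-8")).hexdigest()


original = "Please approve the budget by Friday.\nThanks, Bob\n"
trailing = "Please approve the budget by Friday.   \r\nThanks, Bob\r\n"
rewrapped = "Please approve the budget\nby Friday.\nThanks, Bob\n"

print(canonical_digest(original) == canonical_digest(trailing))   # True
print(canonical_digest(original) == canonical_digest(rewrapped))  # False
```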

PGP is a relatively old program; the simple format shown in Figure 14.8 is peculiar to PGP. There is an internet standard for secure email based on a MIME extension, called Secure MIME, or S/MIME, and there is also a MIME extension called “PGP MIME.” Most modern email clients support these extensions either directly or through plug-in software. It can, however, be challenging to ensure that all—or most—email recipients have the right software to verify a digital signature.

Secure email poses another challenge because all recipients must have copies of the author’s public-key certificate. Many authors include the certificate in the email along with the signature, or they provide a weblink to a site that contains the certificate. Then the recipients can validate the certificate and the signature.

Unfortunately, not all secure email users understand this shortcoming. The email in Figure 14.8 did not include a certificate, nor did it provide any way of locating the certificate. Thus, most recipients had no way to validate the email.
