Email: Semistructured Documents

The notion that messages carry semistructured data is central to this book. RFC934 (January 1985, Proposed Standard for Message Encapsulation) introduces the idea of a message body that is logically divided into regions separated by an “encapsulation boundary.” This idea was elaborated in a series of MIME RFCs, from RFC1341 (June 1992, MIME (Multipurpose Internet Mail Extensions)) to RFC2045 (November 1996, Multipurpose Internet Mail Extensions (MIME) Part One: Format of Internet Message Bodies).

This series spells out the basic idea of MIME: a Content-Type: header can specify that a message body contains structured text, image data, other application-specific data, or a composite of these types.
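
As a rough illustration of how software can act on that header, here is a minimal sketch using Python's standard email package. The sample message, its addresses, and the describe helper are made up for this example; only the Content-Type: header itself comes from the MIME specifications.

```python
from email import message_from_string
from email.policy import default

# A made-up message; only the Content-Type: header matters here.
RAW = (
    "From: sender@example.com\n"
    "To: recipient@example.com\n"
    "Subject: demo\n"
    "MIME-Version: 1.0\n"
    'Content-Type: text/html; charset="us-ascii"\n'
    "\n"
    "<p>Hello</p>\n"
)

msg = message_from_string(RAW, policy=default)


def describe(message):
    """Return (maintype, subtype, charset) as declared by Content-Type."""
    return (
        message.get_content_maintype(),
        message.get_content_subtype(),
        message.get_content_charset(),
    )


print(describe(msg))  # ('text', 'html', 'us-ascii')
```

A mail reader could branch on the first two values to decide whether to render the body as text, hand it to an image viewer, or descend into a composite message.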

The author of RFC1049 (March 1988, A Content-Type Header Field for Internet Messages) wrote, “A standardized Content-Type field allows mail reading systems to automatically identify the type of a structured message body and to process it for display accordingly.” This idea would become central not only to mailers and newsreaders, which use the Content-Type: header to identify and process rich content and attachments, but also to browsers. RFC2046 (November 1996, Multipurpose Internet Mail Extensions (MIME) Part Two: Media Types) extended and revised RFC1049.

RFC2048 (November 1996, Multipurpose Internet Mail Extensions (MIME) Part Four: Registration Procedures) describes rules and procedures for registering new MIME content types.

A series beginning with RFC1872 (December 1995, The MIME Multipart/Related Content-Type) and ending with RFC2387 (August 1998, same title) defines how email programs can format compound documents made of interrelated parts. It suggests the use of the cid: (Content-ID) URL scheme, supported in modern HTML-aware mailreaders, as a way to form intradocument links.

RFC1873 (December 1995, Message/External-Body Content-ID Access Type), a companion to RFC1872, defines the use of a Content-ID: header as a mechanism for intradocument references.

To illustrate how this can work, suppose I drag an image into an HTML mail message I’m writing with Netscape Composer. To the recipient, it appears that the image is embedded within the text, like this:

As you can see in this picture: 
 
+----------+ 
| picture  | 
+----------+ 
 
the graphic is shown inline.

If you inspect the body of such a message, you’ll see how the MIME multipart/related Content-Type, the cid: URL scheme, and the Content-ID: header interact:

Content-Type: multipart/related; 
 boundary="------------9F32153EFCC9C5CAFE0BDFE9" 
 
 
--------------9F32153EFCC9C5CAFE0BDFE9 
Content-Type: text/html; charset=us-ascii 
Content-Transfer-Encoding: 7bit 
 
<!doctype html public "-//w3c//dtd html 4.0 transitional//en"> 
<html> 
As you can see in this picture: 
<p><img SRC="cid:[email protected]" ALT="" 
BORDER=0 height=62 width=150> 
<p>the graphic is shown inline. 
 
--------------9F32153EFCC9C5CAFE0BDFE9 
Content-Type: image/jpeg 
Content-ID: <[email protected]> 
Content-Transfer-Encoding: base64 
Content-Disposition: inline; filename="C:\TEMP\nsmailN4.jpeg" 
 
/9j/4AAQSkZJRgABAgEASABIAAD/7QE0UGhvdG9zaG9wIDMuMAA4QklNA+0AAAAAABAASAAA 
AAEAAQBIAAAAAQABOEJJTQPzAAAAAAAIAAAAAAAAAAA4QklNJxAAAAAAAAoAAQAAAAAAAAAC 
OEJJTQP1AAAAAABIAC9mZgABAGxmZgAGAAAAAAABAC9mZgABAKGZmgAGAAAAAAABADIAAAAB 
AFoAAAAGAAAAAAABADUAAAABAC0AAAAGAAAAAAABOEJJTQP4AAAAAABwAAD///////////// 
////////////////A+gAAAAA/////////////////////////////wPoAAAAAP//////////
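
The same structure can be produced and taken apart programmatically. The following is a sketch only, using Python's standard email package rather than anything Netscape Composer does internally. The FAKE_JPEG bytes, the example.com domain, and the two helper functions are assumptions made for illustration.

```python
from email.message import EmailMessage
from email.utils import make_msgid

# Hypothetical stand-in for real JPEG data.
FAKE_JPEG = b"\xff\xd8\xff\xe0" + b"\x00" * 16


def build_message(image_bytes):
    """Build a multipart/related message: an HTML root plus one inline image."""
    msg = EmailMessage()
    msg["Subject"] = "Inline image demo"
    cid = make_msgid(domain="example.com")  # value includes the angle brackets
    html = (
        "<html><body>As you can see in this picture:"
        f'<p><img src="cid:{cid[1:-1]}" alt="">'
        "<p>the graphic is shown inline.</body></html>"
    )
    msg.set_content(html, subtype="html")
    # add_related turns the message into multipart/related and attaches
    # the image with a matching Content-ID: header.
    msg.add_related(image_bytes, maintype="image", subtype="jpeg", cid=cid)
    return msg, cid


def find_part_by_cid(msg, cid_url):
    """Return the body part whose Content-ID: matches a cid: URL, or None."""
    wanted = "<" + cid_url[len("cid:"):] + ">"
    for part in msg.walk():
        if part["Content-ID"] == wanted:
            return part
    return None


message, cid = build_message(FAKE_JPEG)
image_part = find_part_by_cid(message, "cid:" + cid[1:-1])
print(message.get_content_type())     # multipart/related
print(image_part.get_content_type())  # image/jpeg
```

The lookup in find_part_by_cid mirrors what an HTML-aware mail reader has to do when it meets an img tag whose source is a cid: URL: strip the scheme, restore the angle brackets, and search the sibling parts for a matching Content-ID: header.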

Another series, from RFC1523 (September 1993, The text/enriched MIME Content-Type) to RFC1896 (February 1996, same title), documents a predecessor to HTML email. It defines a simple, HTML-like tag language used to format ASCII text messages:

<bold>Now</bold> is the time for 
<italic>all</italic> good men
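
To show how close this tag language is to HTML, here is a deliberately small sketch that maps the two commands shown above onto HTML tags. It is an illustration under stated assumptions, not a full text/enriched implementation: the TAG_MAP table and the enriched_to_html function are hypothetical names, only bold and italic are translated, any other command is silently dropped, and the format's line-break rules are ignored.

```python
import html
import re

# Assumption: only these two formatting commands are translated.
TAG_MAP = {"bold": "b", "italic": "i"}

# Matches the "<<" escape for a literal "<", or a command such as
# <bold> or </bold>.
_TOKEN = re.compile(r"<<|<(/?)([A-Za-z0-9-]+)>")


def enriched_to_html(text):
    """Convert a small subset of text/enriched markup to HTML."""
    out = []
    pos = 0
    for match in _TOKEN.finditer(text):
        out.append(html.escape(text[pos:match.start()]))
        pos = match.end()
        if match.group(0) == "<<":
            out.append("&lt;")
            continue
        slash, name = match.group(1), match.group(2).lower()
        if name in TAG_MAP:
            out.append(f"<{slash}{TAG_MAP[name]}>")
        # Commands outside TAG_MAP are dropped in this sketch.
    out.append(html.escape(text[pos:]))
    return "".join(out)


sample = "<bold>Now</bold> is the time for\n<italic>all</italic> good men"
print(enriched_to_html(sample))
```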

The mechanism supporting HTML email is described in RFC2110 (March 1997, MIME E-Mail Encapsulation of Aggregate Documents), which was superseded by RFC2557 (March 1999, MIME Encapsulation of Aggregate Documents, such as HTML (MHTML)). Say the authors of RFC2557:

In order to transfer a complete HTML multimedia document in a single email message, it is necessary to: a) aggregate a text/html root resource and all of the subsidiary resources it references into a single composite message structure, and b) define a means by which URIs in the text/html root can reference subsidiary resources within that composite message structure.

HTML email messages need to be able to refer, by means of hyperlinks, to messages and to parts of messages. The mid: and cid: URL schemes defined in RFC2392 (August 1998, Content-ID and Message-ID Uniform Resource Locators) serve this purpose.
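
The mapping between these URLs and the corresponding headers is mechanical: drop the angle brackets from the header value and percent-encode it, or reverse those steps. The sketch below shows one way to do that with Python's urllib.parse. The function names are hypothetical, and the choice to leave only "@" unencoded beyond the usual unreserved characters is an assumption of this example rather than a rule quoted from RFC2392.

```python
from urllib.parse import quote, unquote


def _strip_brackets(header_value):
    """Remove the surrounding angle brackets from an ID header value."""
    value = header_value.strip()
    if value.startswith("<") and value.endswith(">"):
        value = value[1:-1]
    return value


def content_id_to_cid_url(content_id):
    """Turn a Content-ID: header value into a cid: URL."""
    return "cid:" + quote(_strip_brackets(content_id), safe="@")


def cid_url_to_content_id(url):
    """Turn a cid: URL back into a Content-ID: header value."""
    if not url.lower().startswith("cid:"):
        raise ValueError("not a cid: URL")
    return "<" + unquote(url[4:]) + ">"


def mid_url(message_id, content_id=None):
    """Build a mid: URL for a message, or for one part within it."""
    url = "mid:" + quote(_strip_brackets(message_id), safe="@")
    if content_id is not None:
        url += "/" + quote(_strip_brackets(content_id), safe="@")
    return url


print(content_id_to_cid_url("<foo4%foo1@bar.net>"))  # cid:foo4%25foo1@bar.net
```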
