email: Parsing and Composing Mails

The second edition of this book used a handful of standard library modules (rfc822, StringIO, and more) to parse the contents of messages, and simple text processing to compose them. Additionally, that edition included a section on extracting and decoding attached parts of a message using modules such as mhlib, mimetools, and base64.

Those tools are still available, but were, frankly, a bit clumsy and error-prone. Parsing attachments from messages, for example, was tricky, and composing even basic messages was tedious (in fact, an early printing of the prior edition contained a potential bug, because I forgot one character in a complex string formatting operation). Adding attachments to sent messages wasn’t even attempted, due to the complexity of the formatting involved.

Luckily, things are much simpler today. Since the second edition, Python has sprouted a new email package—a powerful collection of tools that automate most of the work behind parsing and composing email messages. This module gives us an object-based message interface and handles all the textual message structure details, both analyzing and creating it. Not only does this eliminate a whole class of potential bugs, it also promotes more advanced mail processing.

Things like attachments, for instance, become accessible to mere mortals (and authors with limited book real estate). In fact, the entire section on manual attachment parsing and decoding has been deleted in this edition—it’s essentially automatic with email. The new package parses and constructs headers and attachments; generates correct email text; decodes and encodes base64, quoted-printable, and uuencoded data; and much more.

We won’t cover the email package in its entirety in this book; it is well documented in Python’s library manual. Our goal here is to give some example usage code, which you can study in conjunction with the manuals. But to help get you started, let’s begin with a quick overview. In a nutshell, the email package is based around the Message object it provides:

Parsing mail

A mail’s full text, fetched from poplib or imaplib, is parsed into a new Message object, with an API for accessing its components. In the object, mail headers become dictionary-like keys, and components become a payload that can be walked with a generator interface (more on payloads in a moment).

Creating mail

New mails are composed by creating a Message object, using an API to attach headers and parts, and asking the object for its print representation—a correctly formatted mail message text, ready to be passed to the smtplib module for delivery. Headers are added by key assignment and attachments by method calls.

In other words, the Message object is used both for accessing existing messages and for creating new ones from scratch. In both cases, email can automatically handle details like encodings (e.g., attached binary images can be treated as text with base64 encoding and decoding), content types, and more.

Message Objects

Since the email module’s Message object is at the heart of its API, you need a cursory understanding of its form to get started. In short, it is designed to reflect the structure of a formatted email message. Each Message consists of three main pieces of information:

Type

A content type (plain text, HTML text, JPEG image, and so on), encoded as a MIME main type and a subtype. For instance, “text/html” means the main type is text and the subtype is HTML (a web page); “image/jpeg” means a JPEG photo. A “multipart/mixed” type means there are nested parts within the message.

Headers

A dictionary-like mapping interface, with one key per mail header (“From”, “To”, and so on). This interface supports almost all of the usual dictionary operations, and headers may be fetched or set by normal key indexing.

Content

A payload, which represents the mail’s content. This can be either a string for simple messages, or a list of additional Message objects for multipart container messages with attached or alternative parts. For some oddball types, the payload may be a Python None object.
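
As a rough illustration of these three pieces, the sketch below parses a small hand-written message and queries its type, headers, and content. It is written for Python 3, whose module names differ from the older spellings used in this chapter's sessions; the addresses and the message text are made up for the example.

```python
from email import message_from_string

# A hand-written message used only for illustration (addresses are hypothetical)
raw = (
    "From: Sue Jones <sue@example.com>\n"
    "To: bob@example.com\n"
    "Subject: Owls\n"
    "Content-Type: text/html\n"
    "\n"
    "<p>The owls are not what they seem...</p>\n"
)

msg = message_from_string(raw)

# Type: the MIME main type and subtype
content_type = msg.get_content_type()
main_type = msg.get_content_maintype()
sub_type = msg.get_content_subtype()

# Headers: a dictionary-like interface; header names match case-insensitively
sender = msg["from"]
header_names = msg.keys()
has_cc = "Cc" in msg
cc = msg.get("Cc", "(none)")

# Content: a plain string here, because this is not a multipart message
body = msg.get_payload()
is_container = msg.is_multipart()
```

Indexing a header that is absent returns None rather than raising an exception, which is one of the places where the interface departs from a real dictionary.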

For example, mails with attached images may have a main top-level Message (type multipart/mixed), with three more Message objects in its payload—one for its main text (type text/plain), followed by two of type image for the photos (type image/jpeg). The photo parts may be encoded for transmission as text with base64 or another scheme; the encoding type, as well as the original image filename, are specified in the part’s headers.
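
A minimal sketch of that structure, assuming Python 3 module names, might look like the following. The image bytes are placeholders rather than a real photo, and the filenames and address are hypothetical; the subtype is passed explicitly so that no guessing from the data is needed.

```python
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText
from email.mime.image import MIMEImage

# Placeholder bytes standing in for real JPEG data (an assumption for this sketch)
fake_jpeg = b"\xff\xd8\xff\xe0" + b"not really a photo"

top = MIMEMultipart()                      # the default subtype is "mixed"
top["From"] = "Sue Jones <sue@example.com>"
top.attach(MIMEText("Photos from the trip\n"))

for name in ("photo1.jpg", "photo2.jpg"):
    part = MIMEImage(fake_jpeg, _subtype="jpeg")
    part.add_header("Content-Disposition", "attachment", filename=name)
    top.attach(part)

# One (type, transfer encoding, filename) tuple per nested part
summary = [
    (p.get_content_type(), p["Content-Transfer-Encoding"], p.get_filename())
    for p in top.get_payload()
]

# Asking for a decoded payload undoes the base64 step
first_photo = top.get_payload()[1].get_payload(decode=True)
```

Here the encoding scheme and the filename both live in each part's own headers, not in the root message.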

Similarly, mails that include both simple text and an HTML alternative will have two nested Messages in their payload, of type plain text (text/plain) and HTML text (text/html), along with a main root Message of type multipart/alternative. Your mail client decides which part to display, often based on your preferences.
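
The following sketch, again assuming Python 3 module names, builds such a message and shows one way a client might choose between the two parts. The pick_part helper is hypothetical, written only for this example; real clients apply their own rules.

```python
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText

root = MIMEMultipart("alternative")
root["Subject"] = "Hello"
root.attach(MIMEText("Hello, world\n", "plain"))
root.attach(MIMEText("<p>Hello, <b>world</b></p>\n", "html"))

def pick_part(message, preferred="text/html"):
    """Return the first nested part of the preferred type, else the first part."""
    parts = message.get_payload()
    for part in parts:
        if part.get_content_type() == preferred:
            return part
    return parts[0]

html_part = pick_part(root)
text_part = pick_part(root, preferred="text/plain")
```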

Simpler messages may have just a root Message of type text/plain or text/html, representing the entire message body. The payload for such mails is a simple string. They may also have no explicitly given type at all, which generally defaults to text/plain. Some single-part messages are text/html, with no text/plain alternative—they require a web browser or other HTML viewer (or a very keen-eyed user).
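
The default can be seen directly in a short sketch (Python 3 module names assumed): a message built with no Content-Type header still reports a type of text/plain.

```python
from email.message import Message

bare = Message()
bare.set_payload("just a body\n")

explicit_header = bare["Content-Type"]   # no such header was ever set
default_type = bare.get_content_type()   # falls back to the default type
```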

Other combinations are possible, including some types that are not commonly seen in practice, such as message/delivery-status. Most messages have a main text part, though it is not required, and it may be nested in a multipart or other construct.

In all cases, these message structures are automatically generated when mail text is parsed, and are created by your method calls when new messages are composed. For instance, when creating messages, the message attach method adds parts for multipart mails, and set_payload sets the entire payload to a string for simple mails.

Message objects also have assorted properties (e.g., the filename of an attachment), and they provide a convenient walk generator method, which returns the next Message in the payload each time through a for loop. Because the walker yields the root Message object first (i.e., self), single-part mails need no special-case handling: a nonmultipart message is effectively a Message with a single item in its payload, itself.
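
To make the walk behavior concrete, here is a small sketch using Python 3 module names. The walk_types helper is a hypothetical convenience for this example only.

```python
from email.message import Message
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText

# A simple message: its payload is a string
simple = Message()
simple.set_payload("body")

# A multipart message: its payload is a list of nested Message objects
multi = MIMEMultipart()
multi.attach(MIMEText("part one\n"))
multi.attach(MIMEText("part two\n"))

def walk_types(message):
    """List the content type of every Message that walk() yields."""
    return [part.get_content_type() for part in message.walk()]

simple_parts = list(simple.walk())   # only the root itself
multi_parts = list(multi.walk())     # the root first, then each nested part
```

Because both cases go through the same loop, code that processes parts does not need to test for multipart mail up front.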

Ultimately, the Message object structure closely mirrors the way mails are formatted as text. Special header lines in the mail’s text give its type (e.g., plain text or multipart), as well as the separator used between the content of nested parts. Since the underlying textual details are automated by the email package—both when parsing and when composing—we won’t go into further formatting details here.
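
For the curious, the sketch below (Python 3 module names assumed) shows where the separator appears. The boundary string is a made-up value passed in only to keep the example predictable; normally the package generates one for you.

```python
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText

# The boundary value here is hypothetical; omit it to let the package choose one
top = MIMEMultipart(boundary="XXXX-hypothetical-boundary")
top.attach(MIMEText("first\n"))
top.attach(MIMEText("second\n"))

text = top.as_string()
boundary = top.get_boundary()
type_header = top["Content-Type"]    # names both the type and the separator

# Each part is introduced by "--" plus the boundary; the last line adds a trailing "--"
separator_lines = [
    line for line in text.splitlines() if line.startswith("--" + boundary)
]
```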

If you are interested in seeing how this translates to real emails, a great way to learn mail structure is by inspecting the full raw text of messages displayed by the email clients we’ll meet in this book. For more on the Message object, and email in general, consult the email package’s entry in Python’s library manual. We’re skipping details such as its available encoders and MIME object classes here in the interest of space.

Beyond the email package, the Python library includes other tools for mail-related processing. For instance, mimetypes maps a filename to and from a MIME type:

mimetypes.guess_type(filename)

Maps a filename to a MIME type. Name spam.txt maps to text/plain.

mimetypes.guess_extension(contype)

Maps a MIME type to a filename extension. Type text/html maps to .html.

We also used the mimetypes module earlier in this chapter to guess FTP transfer modes from filenames (see Example 14-10), as well as in Chapter 6, where we used it to guess a media player for a filename (see the examples there, including playfile.py, Example 6-16). For email, these can come in handy when attaching files to a new message (guess_type) and saving parsed attachments that do not provide a filename (guess_extension). In fact, this module’s source code is a fairly complete reference to MIME types. See the library manual for more on these tools.
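
As a sketch of both uses, the two helpers below are hypothetical names invented for this example, written for Python 3. The fallback type, the ".bin" extension, and the "part-NNN" naming scheme are assumptions as well, not conventions of the email package.

```python
import mimetypes
from email.mime.text import MIMEText

def split_type_for(filename):
    """Return (maintype, subtype) for a file about to be attached."""
    contype, encoding = mimetypes.guess_type(filename)
    if contype is None or encoding is not None:
        # Unknown extension, or a compressed file: fall back to generic binary
        contype = "application/octet-stream"
    maintype, subtype = contype.split("/", 1)
    return maintype, subtype

def filename_for(part, index):
    """Pick a save name for a parsed part, inventing one if none was sent."""
    name = part.get_filename()
    if name is None:
        ext = mimetypes.guess_extension(part.get_content_type()) or ".bin"
        name = "part-%03d%s" % (index, ext)
    return name

text_type = split_type_for("spam.txt")
unknown_type = split_type_for("mystery")

# A part with no filename in its headers gets a generated name
unnamed = MIMEText("<p>hi</p>", "html")
saved_as = filename_for(unnamed, 1)
```

Note that guess_type returns a pair: the second item reports a compression encoding such as gzip, which is why the helper checks it before trusting the type.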

Basic email Interfaces in Action

Although we can’t provide an exhaustive reference here, let’s step through a simple interactive session to illustrate the fundamentals of email processing. To compose the full text of a message (to be delivered with smtplib, for instance), make a Message, assign headers to its keys, and set its payload to the message body. Converting to a string yields the mail text. This process is substantially simpler and less error-prone than the text operations we used earlier in Example 14-19:

>>> from email.Message import Message
>>> m = Message( )
>>> m['from'] = 'Sue Jones <[email protected]>'
>>> m['to']   = '[email protected]'
>>> m.set_payload('The owls are not what they seem...')
>>> s = str(m)
>>> print s
From nobody Sun Jan 22 21:26:53 2006
from: Sue Jones <[email protected]>
to: [email protected]

The owls are not what they seem...

Parsing a message’s text (like the kind you obtain with poplib) is similarly simple, and essentially the inverse: we get back a Message object from the text, with keys for headers and a payload for the body:

>>> from email.Parser import Parser
>>> x = Parser( ).parsestr(s)
>>> x
<email.Message.Message instance at 0x00A7DA30>
>>> x['From']
'Sue Jones <[email protected]>'
>>> x.get_payload( )
'The owls are not what they seem...'
>>> x.items( )
[('from', 'Sue Jones <[email protected]>'), ('to', '[email protected]')]

This isn’t much different from the older rfc822 module, but as we’ll see in a moment, things get more interesting when there is more than one part. For simple messages like this one, the message walk generator treats it as a single-part mail, of type plain text:

>>> for part in x.walk( ):
...     print part.get_content_type( )
...     print part.get_payload( )
...
text/plain
The owls are not what they seem...

Making a mail with attachments is a little more work, but not much: we just make a root Message and attach nested Message objects created from the MIME type object that corresponds to the type of data we’re attaching. The root message is where we store the main headers of the mail, and we attach parts here, instead of setting the entire payload (the payload is a list now, not a string).

>>> from email.MIMEMultipart import MIMEMultipart
>>> from email.MIMEText import MIMEText
>>>
>>> top = MIMEMultipart( )
>>> top['from'] = 'Art <[email protected]>'
>>> top['to']   = '[email protected]'
>>>
>>> sub1 = MIMEText('nice red uniforms...\n')
>>> sub2 = MIMEText(open('data.txt').read( ))
>>> sub2.add_header('Content-Disposition', 'attachment', filename='data.txt')
>>> top.attach(sub1)
>>> top.attach(sub2)

When we ask for the text, a correctly formatted full mail text is returned, separators and all, ready to be sent with smtplib. That is quite a trick, if you’ve ever tried this by hand:

>>> text = top.as_string( )    # same as str( ) or print
>>> print text
Content-Type: multipart/mixed; boundary="===============0257358049=="
MIME-Version: 1.0
from: Art <[email protected]>
to: [email protected]

--===============0257358049==
Content-Type: text/plain; charset="us-ascii"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit

nice red uniforms...

--===============0257358049==
Content-Type: text/plain; charset="us-ascii"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
Content-Disposition: attachment; filename="data.txt"

line1
line2
line3

--===============0257358049==--

If we are sent this message and retrieve it via poplib, parsing its full text yields a Message object just like the one we built to send this. The message walk generator allows us to step through each part, fetching their types and payloads:

>>> from email.Parser import Parser
>>> msg = Parser( ).parsestr(text)
>>> msg['from']
'Art <[email protected]>'

>>> for part in msg.walk( ):
...     print part.get_content_type( )
...     print part.get_payload( )
...     print
...
multipart/mixed
[<email.Message.Message instance at 0x00A82058>,      # line-break added
<email.Message.Message instance at 0x00A82260>]

text/plain
nice red uniforms...


text/plain
line1
line2
line3
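
A natural next step is to pull out just the attached parts. The sketch below assumes Python 3 module names and a hypothetical attachments helper; it rebuilds a message like the one above with a made-up address, parses its text, and collects each part that carries a filename.

```python
from email import message_from_string
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText

# Rebuild a message shaped like the session's, then parse its text back
top = MIMEMultipart()
top["From"] = "Art <art@example.com>"          # hypothetical address
top.attach(MIMEText("nice red uniforms...\n"))
data = MIMEText("line1\nline2\nline3\n")
data.add_header("Content-Disposition", "attachment", filename="data.txt")
top.attach(data)
msg = message_from_string(top.as_string())

def attachments(message):
    """Yield (filename, bytes) for each part that carries a filename."""
    for part in message.walk():
        if part.is_multipart():
            continue                            # containers hold parts, not data
        name = part.get_filename()
        if name is not None:
            # decode=True reverses any base64 or quoted-printable encoding
            yield name, part.get_payload(decode=True)

found = list(attachments(msg))
```

The main text part is skipped here only because it has no filename; a fuller program would likely also check the Content-Disposition header.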

Although this captures the basic flavor of the interface, we need to step up to a larger example to see more of the email package’s power. The next section takes us on the first of those steps.
