The second edition of this book used a handful of standard library
modules (rfc822, StringIO, and more) to parse the contents of
messages, and simple text processing to compose them. Additionally,
that edition included a section on extracting and decoding attached
parts of a message using modules such as mhlib, mimetools, and base64.
Those tools are still available, but were, frankly, a bit clumsy
and error-prone. Parsing attachments from messages, for example, was
tricky, and composing even basic messages was tedious (in fact, an
early printing of the prior edition contained a potential bug, because
I forgot one
character in a
complex string formatting operation). Adding attachments to sent
messages wasn’t even attempted, due to the complexity of the
formatting involved.
Luckily, things are much simpler today. Since the second edition,
Python has sprouted a new email package, a powerful collection of
tools that automate most of the work behind parsing and composing
email messages. This package gives us an object-based message
interface and handles all the textual message structure details, both
analyzing and creating it. Not only does this eliminate a whole class
of potential bugs, it also promotes more advanced mail processing.
Things like attachments, for instance, become accessible to mere
mortals (and authors with limited book real estate). In fact, the
entire section on manual attachment parsing and decoding has been
deleted in this edition; it’s essentially automatic with email. The
new package parses and constructs headers and attachments; generates
correct email text; decodes and encodes base64, quoted-printable, and
uuencoded data; and much more.
We won’t cover the email package in its entirety in this book; it is
well documented in Python’s library manual. Our goal here is to give
some example usage code, which you can study in conjunction with the
manuals. But to help get you started, let’s begin with a quick
overview. In a nutshell, the email package is based around the Message
object it provides:
A mail’s full text, fetched from poplib or imaplib, is parsed into a new Message object, with an API for accessing its components. In the object, mail headers become dictionary-like keys, and components become a payload that can be walked with a generator interface (more on payloads in a moment).
New mails are composed by creating a Message object, using an API to attach headers and parts, and asking the object for its print representation: a correctly formatted mail message text, ready to be passed to the smtplib module for delivery. Headers are added by key assignment and attachments by method calls.
In other words, the Message object is used both for accessing existing
messages and for creating new ones from scratch. In both cases, email
can automatically handle details like encodings (e.g., attached binary
images can be treated as text with base64 encoding and decoding),
content types, and more.
Since the email package’s Message object is at the heart of its API,
you need a cursory understanding of its form to get started. In short,
it is designed to reflect the structure of a formatted email message.
Each Message consists of three main pieces of information:
A content type (plain text, HTML text, JPEG image, and so on), encoded as a MIME main type and a subtype. For instance, “text/html” means the main type is text and the subtype is HTML (a web page); “image/jpeg” means a JPEG photo. A “multipart/mixed” type means there are nested parts within the message.
A dictionary-like mapping interface, with one key per mail header (“From”, “To”, and so on). This interface supports almost all of the usual dictionary operations, and headers may be fetched or set by normal key indexing.
A payload, which represents the mail’s content. This can be either a string for simple messages, or a list of additional Message objects for multipart container messages with attached or alternative parts. For some oddball types, the payload may be a Python None object.
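The three pieces just listed can be inspected directly on a small message. The following is only a sketch in Python 3 syntax, using the lowercase email.message module name rather than the older capitalized name that appears in this section's sessions; the addresses are made-up placeholders.

```python
from email.message import Message

# Build a simple single-part message to inspect (placeholder addresses).
msg = Message()
msg['From'] = 'sue@example.com'
msg['To'] = 'bob@example.com'
msg.set_payload('The owls are not what they seem...')

# 1. Content type: no Content-Type header was set, so the default applies.
content_type = msg.get_content_type()
main_type = msg.get_content_maintype()
sub_type = msg.get_content_subtype()

# 2. Headers: dictionary-like access, matched case-insensitively.
sender = msg['from']
header_names = msg.keys()

# 3. Payload: a string here; it would be a list for a multipart message.
payload = msg.get_payload()
is_container = msg.is_multipart()

print(content_type, main_type, sub_type)
print(sender, header_names)
print(repr(payload), is_container)
```

Note that header lookup ignores case, so msg['from'] finds the header stored as From.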
For example, mails with attached images may have a main top-level
Message (type multipart/mixed), with three more Message objects in its
payload: one for its main text (type text/plain), followed by two of
type image for the photos (type image/jpeg).
The photo parts may be encoded for transmission as text with base64
or another scheme; the encoding type, as well as the original image
filename, are specified in the part’s headers.
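As a rough illustration of that layout, the sketch below builds a multipart/mixed message with one text part and two image parts, then checks where the encoding and filename end up. It assumes Python 3 and the email.mime classes; the image bytes are a stand-in rather than a real photo, and the filenames are hypothetical.

```python
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText
from email.mime.image import MIMEImage

# Stand-in bytes, not a real photo; the subtype is passed explicitly so
# MIMEImage does not have to guess it from the data.
fake_jpeg = b'\xff\xd8\xff\xe0' + b'\x00' * 16

root = MIMEMultipart()                       # multipart/mixed by default
root.attach(MIMEText('Photos attached.'))    # main text, text/plain
for name in ('photo1.jpg', 'photo2.jpg'):    # hypothetical filenames
    part = MIMEImage(fake_jpeg, _subtype='jpeg')
    part.add_header('Content-Disposition', 'attachment', filename=name)
    root.attach(part)

part_types = [p.get_content_type() for p in root.get_payload()]

# The encoding scheme and the filename both live in the part's headers.
photo = root.get_payload()[1]
encoding = photo['Content-Transfer-Encoding']
filename = photo.get_filename()
original_bytes = photo.get_payload(decode=True)   # undo the base64 step

print(root.get_content_type(), part_types)
print(encoding, filename)
```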
Similarly, mails that include both simple text and an HTML alternative
will have two nested Messages in their payload, of type plain text
(text/plain) and HTML text (text/html), along with a main root Message
of type multipart/alternative.
Your mail client decides which part to display, often based on your
preferences.
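A text-plus-HTML mail of this kind might be put together as follows. This is a minimal sketch, assuming Python 3 and the email.mime classes; the subject and body text are invented for illustration.

```python
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText

# The root is a container whose subtype says the parts are alternatives.
root = MIMEMultipart('alternative')
root['Subject'] = 'Greetings'

# Two renditions of the same content: plain text first, then HTML.
root.attach(MIMEText('Hello, world', 'plain'))
root.attach(MIMEText('<html><body><b>Hello</b>, world</body></html>', 'html'))

alt_types = [part.get_content_type() for part in root.get_payload()]
print(root.get_content_type(), alt_types)
```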
Simpler messages may have just a root Message
of type text/plain or text/html,
representing the entire message body. The payload for such mails is
a simple string. They may also have no explicitly given type at all,
which generally defaults to text/plain. Some single-part messages
are text/html, with no text/plain alternative—they require a web
browser or other HTML viewer (or a very keen-eyed user).
Other combinations are possible, including some types that are not commonly seen in practice, such as message/delivery-status. Most messages have a main text part, though it is not required, and it may be nested in a multipart or other construct.
In all cases, these message structures are automatically
generated when mail text is parsed, and are created by your method
calls when new messages are composed. For instance, when creating
messages, the message attach
method adds parts for multipart mails, and set_payload
sets the entire payload to a
string for simple mails.
Message objects also have assorted properties (e.g., the filename of
an attachment), and they provide a convenient walk generator method,
which returns the next Message in the payload each time through in a
for loop. Because the walker yields the root Message object first
(i.e., self), this doesn’t become a special case; a nonmultipart
message is effectively a Message with a single item in its payload:
itself.
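The walk order described here can be checked with a short experiment. The sketch below assumes Python 3 module names, and types_walked is a helper invented for this illustration, not part of the email package.

```python
from email.message import Message
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText

# A simple message: its payload is a plain string.
simple = Message()
simple.set_payload('just text')

# A container with two nested parts.
multi = MIMEMultipart()
multi.attach(MIMEText('main text'))
multi.attach(MIMEText('<p>extra</p>', 'html'))

def types_walked(msg):
    """Hypothetical helper: content types in the order walk() yields them."""
    return [part.get_content_type() for part in msg.walk()]

# walk() yields the root first, so the simple case needs no special code.
simple_parts = list(simple.walk())
multi_types = types_walked(multi)

print(types_walked(simple))
print(multi_types)
```

The same loop therefore serves both single-part and multipart mails.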
Ultimately, the Message
object structure closely mirrors the way mails are formatted as
text. Special header lines in the mail’s text give its type (e.g.,
plain text or multipart), as well as the separator used between the
content of nested parts. Since the underlying textual details are
automated by the email
package—both when parsing and when composing—we won’t go into
further formatting details here.
If you are interested in seeing how this translates to real
emails, a great way to learn mail structure is by inspecting the
full raw text of messages displayed by the email clients we’ll meet
in this book. For more on the Message
object, and email
in general, consult the email
package’s entry in Python’s library
manual. We’re skipping details such as its available encoders and
MIME object classes here in the interest of space.
Beyond the email package, the Python library includes other tools for
mail-related processing. For instance, mimetypes maps a filename to
and from a MIME type:
mimetypes.guess_type(filename)
Maps a filename to a MIME type. Name spam.txt maps to text/plain.
mimetypes.guess_extension(contype)
Maps a MIME type to a filename extension. Type text/html maps to .html.
We also used the mimetypes module earlier in this chapter to guess FTP
transfer modes from filenames (see Example 14-10), as well as in
Chapter 6, where we used it to guess a media player for a filename
(see the examples there, including playfile.py, Example 6-16). For
email, these can come in handy when attaching files to a new message
(guess_type) and saving parsed attachments that do not provide a
filename (guess_extension). In fact, this module’s source code is a
fairly complete reference to MIME types. See the library manual for
more on these tools.
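The two uses mentioned for email might look roughly like this. Both function names, the fallback type, and the default file stem are assumptions made for this sketch; only guess_type and guess_extension come from the mimetypes module itself.

```python
import mimetypes

def type_for_attachment(filename):
    """Guess a MIME type for a file to attach, with a generic fallback."""
    content_type, encoding = mimetypes.guess_type(filename)
    # Unknown names, and compressed files such as .tar.gz (which report
    # an encoding), are treated here as opaque binary data.
    if content_type is None or encoding is not None:
        content_type = 'application/octet-stream'
    return content_type

def name_for_part(content_type, stem='part'):
    """Invent a filename for a parsed attachment that did not supply one."""
    extension = mimetypes.guess_extension(content_type) or '.bin'
    return stem + extension

print(type_for_attachment('spam.txt'))
print(type_for_attachment('archive.tar.gz'))
print(name_for_part('text/html'))
```

The exact extension chosen for a type can vary between platforms and Python versions, so code that depends on it should allow for alternatives.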
Although we can’t provide an exhaustive reference here, let’s step
through a simple interactive session to illustrate the fundamentals of
email processing. To compose the full text of a message (to be
delivered with smtplib, for instance), make a Message, assign headers
to its keys, and set its payload to the message body. Converting to a
string yields the mail text. This process is substantially simpler and
less error-prone than the text operations we used earlier in Example
14-19:
>>> from email.Message import Message
>>> m = Message( )
>>> m['from'] = 'Sue Jones <[email protected]>'
>>> m['to'] = '[email protected]'
>>> m.set_payload('The owls are not what they seem...')
>>> s = str(m)
>>> print s
From nobody Sun Jan 22 21:26:53 2006
from: Sue Jones <[email protected]>
to: [email protected]

The owls are not what they seem...
Parsing a message’s text, like the kind you obtain with poplib, is
similarly simple, and essentially the inverse: we get back a Message
object from the text, with keys for headers and a payload for the
body:
>>> from email.Parser import Parser
>>> x = Parser( ).parsestr(s)
>>> x
<email.Message.Message instance at 0x00A7DA30>
>>> x['From']
'Sue Jones <[email protected]>'
>>> x.get_payload( )
'The owls are not what they seem...'
>>> x.items( )
[('from', 'Sue Jones <[email protected]>'), ('to', '[email protected]')]
This isn’t much different from the older rfc822
module, but as we’ll see in a
moment, things get more interesting when there is more than one
part. For simple messages like this one, the message walk
generator treats it as a single-part
mail, of type plain text:
>>> for part in x.walk( ):
...     print part.get_content_type( )
...     print part.get_payload( )
...
text/plain
The owls are not what they seem...
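The simple compose-and-parse sessions above use Python 2 syntax and the package's older capitalized module names. For readers on Python 3, the same round trip might be sketched as follows, assuming the lowercase email.message module and the email.message_from_string convenience function; the addresses are made-up placeholders, not the ones from the sessions.

```python
import email
from email.message import Message

# Compose: headers by key assignment, body by set_payload.
m = Message()
m['from'] = 'Sue Jones <sue@example.com>'     # placeholder address
m['to'] = 'reader@example.com'                # placeholder address
m.set_payload('The owls are not what they seem...')

# as_string() returns the full mail text: headers, blank line, body.
text = m.as_string()

# Parse: the inverse step, from text back to a Message object.
x = email.message_from_string(text)

print(text)
print(x['From'])
print(x.get_payload())
print(x.items())
```

Header names keep the case they were assigned with, but lookups such as x['From'] still match them.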
Making a mail with attachments is a
little more work, but not much: we just make a root Message
and attach nested Message
objects created from the MIME type
object that corresponds to the type of data we’re attaching. The
root message is where we store the main headers of the mail, and we
attach parts here, instead of setting the entire payload (the
payload is a list now, not a string).
>>> from email.MIMEMultipart import MIMEMultipart
>>> from email.MIMEText import MIMEText
>>>
>>> top = MIMEMultipart( )
>>> top['from'] = 'Art <[email protected]>'
>>> top['to'] = '[email protected]'
>>>
>>> sub1 = MIMEText('nice red uniforms... ')
>>> sub2 = MIMEText(open('data.txt').read( ))
>>> sub2.add_header('Content-Disposition', 'attachment', filename='data.txt')
>>> top.attach(sub1)
>>> top.attach(sub2)
When we ask for the text, a correctly formatted full mail text is
returned, separators and all, ready to be sent with smtplib. That is
quite a trick, if you’ve ever tried this by hand:
>>> text = top.as_string( )          # same as str( ) or print
>>> print text
Content-Type: multipart/mixed; boundary="===============0257358049=="
MIME-Version: 1.0
from: Art <[email protected]>
to: [email protected]

--===============0257358049==
Content-Type: text/plain; charset="us-ascii"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit

nice red uniforms...
--===============0257358049==
Content-Type: text/plain; charset="us-ascii"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
Content-Disposition: attachment; filename="data.txt"

line1
line2
line3
--===============0257358049==--
If we are sent this message and retrieve it via poplib, parsing its
full text yields a Message object just like the one we built to send
this. The message walk generator allows us to step through each part,
fetching their types and payloads:
>>> from email.Parser import Parser
>>> msg = Parser( ).parsestr(text)
>>> msg['from']
'Art <[email protected]>'
>>> for part in msg.walk( ):
...     print part.get_content_type( )
...     print part.get_payload( )
...
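When stepping through a parsed multipart mail like this, it is often useful to skip the container itself and pick out each part's filename and decoded content. The sketch below assumes Python 3 and rebuilds a similar message inline instead of reading data.txt from disk; the addresses are placeholders.

```python
import email
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText

# Rebuild a message like the one in the session (placeholder addresses).
top = MIMEMultipart()
top['from'] = 'art@example.com'
top['to'] = 'reader@example.com'
top.attach(MIMEText('nice red uniforms...'))
attachment = MIMEText('line1\nline2\nline3\n')
attachment.add_header('Content-Disposition', 'attachment', filename='data.txt')
top.attach(attachment)

# Parse the generated text, as if it had just been fetched from a server.
msg = email.message_from_string(top.as_string())

found = []
for part in msg.walk():
    if part.is_multipart():
        continue                     # container: its payload is a list of parts
    found.append((part.get_content_type(),
                  part.get_filename(),              # None if no filename given
                  part.get_payload(decode=True)))   # bytes, transfer-decoded

for content_type, filename, data in found:
    print(content_type, filename, data)
```

Passing decode=True asks get_payload to undo any transfer encoding named in the part's headers, which matters for base64 or quoted-printable parts.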
Although this captures the basic flavor of the interface, we
need to step up to a larger example to see more of the email
package’s power. The next section
takes us on the first of those steps.