The mailtools Utility Package

The email package used by the pymail example of the prior section is a collection of powerful tools—in fact, perhaps too powerful to remember completely. At the minimum, some reusable boilerplate code for common use cases can help insulate you from some of its details. To simplify email interfacing for more complex mail clients, and to further demonstrate the use of standard library email tools, I developed the custom utility modules listed in this section—a package called mailtools.

mailtools is a Python module package: a directory of code, with one module per tool class, and an initialization module run when the directory is first imported. This package’s modules are essentially just a wrapper layer above the standard library’s email package, as well as its poplib and smtplib modules. They make some assumptions about the way email is to be used, but these assumptions are reasonable and allow us to forget some of the underlying complexity of the standard library tools employed.

In a nutshell, the mailtools package provides three classes—to fetch, send, and parse email messages. These classes can be used as superclasses to mix in their methods to an application-specific class or to create standalone or embedded objects that export their methods. We’ll see these classes deployed both ways in this text.
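The two deployment styles can be sketched roughly as follows. The `Fetcher` class is a hypothetical stand-in, not the package's real fetcher; it only illustrates the difference between inheriting a tool's methods and holding a tool object. The sketch uses Python 3 syntax.

```python
# Hypothetical stand-in for a mailtools class; not the package's real code.
class Fetcher:
    def downloadMessage(self, msgnum):
        return 'text of message %d' % msgnum

# Style 1: mix the tool's methods into an application-specific class.
class ConsoleClient(Fetcher):
    def show(self, msgnum):
        return self.downloadMessage(msgnum).upper()      # inherited method

# Style 2: embed a standalone tool object and delegate to it.
class GuiClient:
    def __init__(self):
        self.fetcher = Fetcher()                         # embedded object

    def show(self, msgnum):
        return self.fetcher.downloadMessage(msgnum).upper()

mixed = ConsoleClient().show(1)
embedded = GuiClient().show(1)
print(mixed)
print(embedded)
```

Both styles produce the same result; the choice mostly affects how tightly the client class is coupled to the tool.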

One design note worth mentioning up front: none of the code in this package knows anything about the user interface it will be used in (console, GUI, web, or other), or does anything about things like threads; it is just a toolkit. As we’ll see, its clients are responsible for deciding how it will be deployed. By focusing on just email processing here, we simplify the code, as well as the programs that will use it.

As a simple example of this package’s tools in action, its selftest.py module serves as a self-test script. When run, it sends a message from you, to you, which includes the selftest.py file as an attachment. It also fetches and displays some mail headers and contents. These interfaces, along with some user-interface magic, will lead us to full-blown email clients and web sites in later chapters.

The next few sections list mailtools source code. We won’t cover all of this package’s code in depth; study its listings for more details, and see its self-test module for a usage example. Also, flip ahead to the three clients that will use it for examples: the modified pymail2.py following this listing, the PyMailGUI client in Chapter 15, and the PyMailCGI server in Chapter 17. By sharing and reusing this package, all three systems inherit its utility, as well as any future enhancements.

Initialization File

The module in Example 14-21 implements the initialization logic of the mailtools package; as usual, its code is run automatically the first time a script imports through the package’s directory. Notice how this file collects the contents of all the nested modules into the directory’s namespace with from * statements—because mailtools began life as a single .py file, this provides backward compatibility for existing clients. Since this is the root module, global comments appear here as well.

Example 14-21. PP3E\Internet\Email\mailtools\__init__.py

################################################################################
# interface to mail server transfers, used by pymail, PyMailGUI and PyMailCGI;
# does loads, sends, parsing, composing, and deleting, with attachment parts,
# encoding, etc.;  the parser, fetcher, and sender classes here are designed
# to be mixed-in to subclasses which use their methods, or used as embedded or
# standalone objects;  also has convenience subclasses for silent mode, etc.;
# loads all if pop server doesn't do top;  doesn't handle threads or UI here,
# and allows askPassword to differ per subclass;  progress callback funcs get
# status;  all calls raise exceptions on error--client must handle in GUI/other;
# this changed from file to package: nested modules imported here for bw compat;
# TBD: in saveparts, should file be opened in text mode for text/ contypes?
# TBD: in walkNamedParts, should we skip oddballs like message/delivery-status?
################################################################################

# collect modules here, when package dir imported directly
from mailFetcher import *
from mailSender  import *
from mailParser  import *

# export nested modules here, when from mailtools import *
__all__ = 'mailFetcher', 'mailSender', 'mailParser'

# test case moved to selftest.py to allow mailconfig's
# path to be set before importing nested modules above
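The namespace-collecting effect of `from *` statements in an initialization file can be tried on a throwaway package. This is only a sketch: the package and module names (`demotools`, `fetcher`, `sender`) are hypothetical. It uses Python 3 syntax, where imports inside a package need a leading dot; the Python 2 listing above omits it.

```python
import importlib, os, sys, tempfile

# Build a tiny throwaway package (hypothetical names) in a temp directory.
root = tempfile.mkdtemp()
pkgdir = os.path.join(root, 'demotools')
os.mkdir(pkgdir)

files = {
    'fetcher.py': 'class Fetcher: pass\n',
    'sender.py':  'class Sender: pass\n',
    # Python 3 needs the leading dot in these package-relative imports.
    '__init__.py': 'from .fetcher import *\n'
                   'from .sender import *\n'
                   "__all__ = ['fetcher', 'sender']\n",
}
for name, text in files.items():
    with open(os.path.join(pkgdir, name), 'w') as f:
        f.write(text)

sys.path.insert(0, root)
importlib.invalidate_caches()
demotools = importlib.import_module('demotools')

# Class names are reachable at the package's top level...
print(demotools.Fetcher, demotools.Sender)

# ...and the nested modules are still importable by their own paths.
from demotools.fetcher import Fetcher
same = Fetcher is demotools.Fetcher
print(same)
```

Clients written against a single-file version can keep using top-level names, while newer code can import from the nested modules directly.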

MailTool Class

Example 14-22 contains common superclasses for the other classes in the package. At present, these are used only to enable or disable trace message output (some clients, such as web-based programs, may not want text to be printed to the output stream). Subclasses mix in the silent variant to turn off output.

Example 14-22. PP3E\Internet\Email\mailtools\mailTool.py

###############################################################################
# common superclasses: used to turn trace messages on/off
###############################################################################

class MailTool:                    # superclass for all mail tools
    def trace(self, message):      # redef me to disable or log to file
        print message

class SilentMailTool:              # to mixin instead of subclassing
    def trace(self, message):
        pass
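The silent variant works because of attribute lookup order: when it is listed first among the superclasses, its `trace` is found before the printing version. The following is a minimal sketch in Python 3 syntax; `Sender` and `SilentSender` are hypothetical stand-ins for the package's tool classes.

```python
class MailTool:
    def trace(self, message):
        print(message)

class SilentMailTool:
    def trace(self, message):
        pass

class Sender(MailTool):                      # hypothetical tool class
    def send(self):
        self.trace('Sending...')             # printed by MailTool.trace
        return 'sent'

class SilentSender(SilentMailTool, Sender):  # mixin listed first
    pass                                     # SilentMailTool.trace wins

order = [c.__name__ for c in SilentSender.__mro__]
print(order)
result = SilentSender().send()               # runs, but prints no trace
```

Listing the mixin after the tool class instead would leave the printing `trace` in effect, so the order of superclasses matters here.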

MailSender Class

The class used to compose and send messages is coded in Example 14-23. This module provides a convenient interface that combines standard library tools we’ve already met in this chapter—the email package to compose messages with attachments and encodings, and the smtplib module to send the resulting email text. Attachments are passed in as a list of filenames—MIME types and any required encodings are determined automatically with the module mimetypes. Moreover, date and time strings are automated with an email.Utils call. Study this file’s code and comments for more on its operation.
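As a rough illustration of the composition steps just described, the sketch below builds (but does not send) a message with one attachment. It uses the Python 3 spellings of the email package's modules rather than the Python 2 names in the listing, and the `compose` function, the addresses, and the file name are all assumptions made for the example.

```python
import mimetypes, os, tempfile
from email import encoders
from email.mime.base import MIMEBase
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText
from email.utils import formatdate

def compose(From, To, Subj, bodytext, attaches):
    """Build a multipart message; a simplified, hypothetical helper."""
    msg = MIMEMultipart()
    msg['From'] = From
    msg['To'] = ', '.join(To)                    # To is a list of addresses
    msg['Subject'] = Subj
    msg['Date'] = formatdate()                   # current date and time
    msg.attach(MIMEText(bodytext))               # main text/plain part
    for filename in attaches:
        contype, encoding = mimetypes.guess_type(filename)
        if contype is None or encoding is not None:
            contype = 'application/octet-stream' # generic default
        maintype, subtype = contype.split('/', 1)
        with open(filename, 'rb') as f:
            data = f.read()
        if maintype == 'text':
            part = MIMEText(data.decode('utf-8'), _subtype=subtype)
        else:
            part = MIMEBase(maintype, subtype)
            part.set_payload(data)
            encoders.encode_base64(part)         # encode binary data
        part.add_header('Content-Disposition', 'attachment',
                        filename=os.path.basename(filename))
        msg.attach(part)
    return msg

# Try it on a throwaway text file; addresses are placeholders.
path = os.path.join(tempfile.mkdtemp(), 'notes.txt')
with open(path, 'w') as f:
    f.write('spam\n')
msg = compose('me@example.com', ['you@example.com'], 'Test', 'Hello', [path])
types = [part.get_content_type() for part in msg.walk()]
print(types)
```

The text returned by `msg.as_string()` is what would then be handed to an smtplib `sendmail` call.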

Example 14-23. PP3E\Internet\Email\mailtools\mailSender.py

###############################################################################
# send messages, add attachments (see __init__ for docs, test)
###############################################################################

import mailconfig                                      # client's mailconfig
import smtplib, os, mimetypes                          # mime: name to type
import email.Utils, email.Encoders                     # date string, base64
from mailTool import MailTool, SilentMailTool

from email.Message       import Message                # general message
from email.MIMEMultipart import MIMEMultipart          # type-specific messages
from email.MIMEAudio     import MIMEAudio
from email.MIMEImage     import MIMEImage
from email.MIMEText      import MIMEText
from email.MIMEBase      import MIMEBase

class MailSender(MailTool):
    """
    send mail: format message, interface with SMTP server
    works on any machine with Python+Inet, doesn't use cmdline mail
    a nonauthenticating client: see MailSenderAuth if login required
    """
    def __init__(self, smtpserver=None):
        self.smtpServerName  = smtpserver or mailconfig.smtpservername

    def sendMessage(self, From, To, Subj, extrahdrs, bodytext, attaches,
                                            saveMailSeparator=(('='*80)+'PY\n')):
        """
        format,send mail: blocks caller, thread me in a GUI
        bodytext is main text part, attaches is list of filenames
        extrahdrs is list of (name, value) tuples to be added
        raises uncaught exception if send fails for any reason
        saves sent message text in a local file if successful

        assumes that To, Cc, Bcc hdr values are lists of 1 or more already
        stripped addresses (possibly in full name+<addr> format); client
        must split these on delimiters, parse, or use multiline input;
        note that SMTP allows full name+<addr> format in recipients
        """
        if not attaches:
            msg = Message( )
            msg.set_payload(bodytext)
        else:
            msg = MIMEMultipart( )
            self.addAttachments(msg, bodytext, attaches)

        recip = To
        msg['From']    = From
        msg['To']      = ', '.join(To)              # poss many: addr list
        msg['Subject'] = Subj                       # servers reject ';' sept
        msg['Date']    = email.Utils.formatdate( )      # curr datetime, rfc2822 utc
        for name, value in extrahdrs:               # Cc, Bcc, X-Mailer, etc.
            if value:
                if name.lower( ) not in ['cc', 'bcc']:
                    msg[name] = value
                else:
                    msg[name] = ', '.join(value)     # add commas between
                    recip += value                   # some servers reject ['']
        fullText = msg.as_string( )                  # generate formatted msg

        # sendmail call raises except if all Tos failed,
        # or returns failed Tos dict for any that failed

        self.trace('Sending to...'+ str(recip))
        self.trace(fullText[:256])
        server = smtplib.SMTP(self.smtpServerName)           # this may fail too
        self.getPassword( )                                       # if srvr requires
        self.authenticateServer(server)                      # login in subclass
        try:
            failed = server.sendmail(From, recip, fullText)  # except or dict
        finally:
            server.quit( )                                    # iff connect OK
        if failed:
            class SomeAddrsFailed(Exception): pass
            raise SomeAddrsFailed('Failed addrs:%s\n' % failed)
        self.saveSentMessage(fullText, saveMailSeparator)
        self.trace('Send exit')

    def addAttachments(self, mainmsg, bodytext, attaches):
        # format a multipart message with attachments
        msg = MIMEText(bodytext)                 # add main text/plain part
        mainmsg.attach(msg)
        for filename in attaches:                # absolute or relative paths
            if not os.path.isfile(filename):     # skip dirs, etc.
                continue

            # guess content type from file extension, ignore encoding
            contype, encoding = mimetypes.guess_type(filename)
            if contype is None or encoding is not None:  # no guess, compressed?
                contype = 'application/octet-stream'     # use generic default
            self.trace('Adding ' + contype)

            # build sub-Message of appropriate kind
            maintype, subtype = contype.split('/', 1)
            if maintype == 'text':
                data = open(filename, 'r')
                msg  = MIMEText(data.read( ), _subtype=subtype)
                data.close( )
            elif maintype == 'image':
                data = open(filename, 'rb')
                msg  = MIMEImage(data.read( ), _subtype=subtype)
                data.close( )
            elif maintype == 'audio':
                data = open(filename, 'rb')
                msg  = MIMEAudio(data.read( ), _subtype=subtype)
                data.close( )
            else:
                data = open(filename, 'rb')
                msg  = MIMEBase(maintype, subtype)
                msg.set_payload(data.read( ))
                data.close( )                            # make generic type
                email.Encoders.encode_base64(msg)         # encode using base64

            # set filename and attach to container
            basename = os.path.basename(filename)
            msg.add_header('Content-Disposition',
                           'attachment', filename=basename)
            mainmsg.attach(msg)

        # text outside mime structure, seen by non-MIME mail readers
        mainmsg.preamble = 'A multi-part MIME format message.\n'
        mainmsg.epilogue = ''  # make sure message ends with a newline

    def saveSentMessage(self, fullText, saveMailSeparator):
        # append sent message to local file if worked
        # client: pass separator used for your app, splits
        # caveat: user may change file at same time (unlikely)
        try:
            sentfile = open(mailconfig.sentmailfile, 'a')
            if fullText[-1] != '\n': fullText += '\n'
            sentfile.write(saveMailSeparator)
            sentfile.write(fullText)
            sentfile.close( )
        except:
            self.trace('Could not save sent message')    # not a show-stopper

    def authenticateServer(self, server):
        pass  # no login required for this server/class

    def getPassword(self):
        pass  # no login required for this server/class


################################################################################
# specialized subclasses
################################################################################

class MailSenderAuth(MailSender):
    """
    use for servers that require login authorization;
    client: choose MailSender or MailSenderAuth super
    class based on mailconfig.smtpuser setting (None?)
    """
    def __init__(self, smtpserver=None, smtpuser=None):
        MailSender.__init__(self, smtpserver)
        self.smtpUser = smtpuser or mailconfig.smtpuser
        self.smtpPassword = None

    def authenticateServer(self, server):
        server.login(self.smtpUser, self.smtpPassword)

    def getPassword(self):
        """
        get SMTP auth password if not yet known;
        may be called by superclass auto, or client manual:
        not needed until send, but don't run in GUI thread;
        get from client-side file or subclass method
        """
        if not self.smtpPassword:
            try:
                localfile = open(mailconfig.smtppasswdfile)
                self.smtpPassword = localfile.readline( )[:-1]
                self.trace('local file password' + repr(self.smtpPassword))
            except:
                self.smtpPassword = self.askSmtpPassword( )

    def askSmtpPassword(self):
        assert False, 'Subclass must define method'

class MailSenderAuthConsole(MailSenderAuth):
    def askSmtpPassword(self):
        import getpass
        prompt = 'Password for %s on %s?' % (self.smtpUser, self.smtpServerName)
        return getpass.getpass(prompt)

class SilentMailSender(SilentMailTool, MailSender):
    pass   # replaces trace

MailFetcher Class

The class defined in Example 14-24 does the work of interfacing with a POP email server—loading, deleting, and synchronizing.

General usage

This module deals strictly in email text; parsing email after it has been fetched is delegated to a different module in the package. Moreover, this module doesn’t cache already loaded information; clients must add their own mail-retention tools if desired. Clients must also provide password input methods or pass one in, if they cannot use the console input subclass here (e.g., GUIs and web-based programs).

The loading and deleting tasks use the standard library poplib module in ways we saw earlier in this chapter, but notice that there are interfaces for fetching just message header text with the TOP action in POP. This can save substantial time if clients need to fetch only basic details for an email index.

This module also supports the notion of progress indicators—for methods that perform multiple downloads or deletions, callers may pass in a function that will be called as each mail is processed. This function will receive the current and total step numbers. It’s left up to the caller to render this in a GUI, console, or other user interface.
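The callback protocol might be pictured as in the sketch below. The `download_all` function is a hypothetical loader that stands in for a real server fetch; only the `progress(current, total)` calling convention is taken from the description above. The code uses Python 3 syntax.

```python
def download_all(items, progress=None):
    """Hypothetical loader: calls progress(current, total) for each item."""
    results = []
    for count, item in enumerate(items, start=1):
        if progress:
            progress(count, len(items))       # caller decides how to display
        results.append(item.upper())          # stand-in for a server fetch
    return results

calls = []
def console_progress(current, total):
    calls.append((current, total))            # record each call for inspection
    print('%d of %d' % (current, total))      # a GUI might update a bar here

loaded = download_all(['a', 'b', 'c'], progress=console_progress)
```

Because the loader knows nothing about how progress is displayed, the same code can serve console, GUI, and web clients.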

Inbox synchronization tools

Also notice that Example 14-24 devotes substantial code to detecting synchronization errors between an email list held by a client, and the current state of the inbox at the POP email server. Normally, POP assigns relative message numbers to email in the inbox, and only adds newly arrived emails to the end of the inbox. As a result, relative message numbers from an earlier fetch may usually be used to delete and fetch in the future.

However, although rare, it is not impossible for the server’s inbox to change in ways that invalidate previously fetched message numbers. For instance, emails may be deleted in another client, and the server itself may move mails from the inbox to an undeliverable state on download errors (this may vary per ISP). In both cases, email may be removed from the middle of the inbox, throwing some prior relative message numbers out of sync with the server.

This situation can result in fetching the wrong message in an email client—users receive a different message than the one they thought they had selected. Worse, this can make deletions inaccurate—if a mail client uses a relative message number in a delete request, the wrong mail may be deleted if the inbox has changed since the index was fetched.
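The problem can be simulated without a server. In this toy model, which is only an assumption-laden sketch in Python 3 syntax, the inbox is a list and a POP relative message number is simply a 1-based position in it.

```python
# Server inbox modeled as a list; message numbers are 1-based positions.
inbox = ['msg A', 'msg B', 'msg C', 'msg D']

# The client fetches an index and remembers each message's number.
index = {text: num for num, text in enumerate(inbox, start=1)}
target = 'msg C'
stale_num = index[target]             # the number recorded at fetch time

# Meanwhile, a message is removed from the middle of the inbox,
# for instance by a different client.
inbox.remove('msg B')

# The remembered number now names a different message.
now_at_num = inbox[stale_num - 1]
print(stale_num, target, now_at_num)
```

A delete request issued with the remembered number would now remove `msg D` rather than the intended `msg C`.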

To assist clients, Example 14-24 includes tools that match message headers on deletions to ensure accuracy, and that perform general inbox synchronization tests on demand. These tools can be used only by clients that retain the fetched email list as state information. We’ll use these in the PyMailGUI client in Chapter 15. There, deletions use the safe interface, and loads run the synchronization test on demand; on errors, the inbox index is automatically reloaded. For now, see the source code and comments of Example 14-24 for more details.

Note that the synchronization tests try a variety of matching techniques, but require the complete headers text and, in the worst case, must parse headers and match many header fields. In many cases, the single previously fetched message-id header field would be sufficient for matching against messages in the server’s inbox. However, because this field is optional and can be forged to have any value, it might not always be a reliable way to identify messages. In other words, a same-valued message-id may not suffice to guarantee a match, although it can be used to identify a mismatch; in Example 14-24, the message-id is used to rule out a match if either message has one, and they differ in value. This test is performed before falling back on slower parsing and multiple header matches.
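The cheap tests described here can be condensed into a short sketch. The `quick_match` helper below is hypothetical and written in Python 3 syntax; it covers only the string comparisons and the message-id rule-out, and returns None where the real code would fall back on full header parsing. The sample header text is made up.

```python
def strip_status(hdrtext):
    # Drop "Status:" lines, which some servers change over time.
    return [ln for ln in hdrtext.splitlines() if not ln.startswith('Status:')]

def message_ids(hdrtext):
    return [ln for ln in hdrtext.splitlines()
            if ln[:11].lower() == 'message-id:']

def quick_match(h1, h2):
    """Return True or False when the cheap tests decide, else None."""
    if h1 == h2 or strip_status(h1) == strip_status(h2):
        return True
    ids1, ids2 = message_ids(h1), message_ids(h2)
    if (ids1 or ids2) and ids1 != ids2:
        return False                  # differing ids rule out a match
    return None                       # undecided: parse and compare headers

old   = 'From: a@example.com\nMessage-Id: <1@x>\nStatus: U'
new   = 'From: a@example.com\nMessage-Id: <1@x>\nStatus: RO'
other = 'From: a@example.com\nMessage-Id: <2@x>\nStatus: U'

same_mail = quick_match(old, new)              # differ only in Status
diff_mail = quick_match(old, other)            # message-ids differ
undecided = quick_match('From: a', 'From: b')  # no ids to compare
```

Note that equal message-ids alone never produce a True result here; only a difference is treated as conclusive.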

Example 14-24. PP3E\Internet\Email\mailtools\mailFetcher.py

###############################################################################
# retrieve, delete, match mail from a POP server (see __init__ for docs, test)
###############################################################################

import poplib, mailconfig       # client's mailconfig: script dir or pythonpath
print 'user:', mailconfig.popusername

from mailParser import MailParser                  # for headers matching
from mailTool   import MailTool, SilentMailTool    # trace control supers

# index/server msgnum out of synch tests
class DeleteSynchError(Exception): pass            # msg out of synch in del
class TopNotSupported(Exception): pass             # can't run synch test
class MessageSynchError(Exception): pass           # index list out of sync

class MailFetcher(MailTool):
    """
    fetch mail: connect, fetch headers+mails, delete mails
    works on any machine with Python+Inet; subclass me to cache
    implemented with the POP protocol; IMAP requires new class
    """
    def __init__(self, popserver=None, popuser=None, poppswd=None, hastop=True):
        self.popServer   = popserver or mailconfig.popservername
        self.popUser     = popuser   or mailconfig.popusername
        self.srvrHasTop  = hastop
        self.popPassword = poppswd  # ask later if None

    def connect(self):
        self.trace('Connecting...')
        self.getPassword( )                          # file, GUI, or console
        server = poplib.POP3(self.popServer)
        server.user(self.popUser)                    # connect,login POP server
        server.pass_(self.popPassword)               # pass is a reserved word
        self.trace(server.getwelcome( ))             # print returned greeting
        return server

    def downloadMessage(self, msgnum):
        """
        load full raw text of one mail msg, given its
        POP relative msgnum; caller must parse content
        """
        self.trace('load '+str(msgnum))
        server = self.connect( )
        try:
            resp, msglines, respsz = server.retr(msgnum)
        finally:
            server.quit( )
        return '\n'.join(msglines)                 # concat lines for parsing

    def downloadAllHeaders(self, progress=None, loadfrom=1):
        """
        get sizes, raw header text only, for all or new msgs
        begins loading headers from message number loadfrom
        use loadfrom to load newly arrived mails only
        use downloadMessage to get a full msg text later
        progress is a function called with (count, total);
        returns: [headers text], [mail sizes], loadedfull?
        """
        if not self.srvrHasTop:                    # not all servers support TOP
            return self.downloadAllMessages(progress, loadfrom)  # naively load full msg text
        else:
            self.trace('loading headers')
            server = self.connect( )                # mbox now locked until quit
            try:
                resp, msginfos, respsz = server.list( )   # 'num size' lines list
                msgCount = len(msginfos)                   # alt to srvr.stat[0]
                msginfos = msginfos[loadfrom-1:]           # drop already loadeds
                allsizes = [int(x.split( )[1]) for x in msginfos]
                allhdrs  = []
                for msgnum in range(loadfrom, msgCount+1):          # poss empty
                    if progress: progress(msgnum, msgCount)         # callback?
                    resp, hdrlines, respsz = server.top(msgnum, 0)  # hdrs only
                    allhdrs.append('\n'.join(hdrlines))
            finally:
                server.quit( )                          # make sure unlock mbox
            assert len(allhdrs) == len(allsizes)
            self.trace('load headers exit')
            return allhdrs, allsizes, False

    def downloadAllMessages(self, progress=None, loadfrom=1):
        """
        load full message text for all msgs from loadfrom..N,
        despite any caching that may be being done in the caller;
        much slower than downloadAllHeaders, if just need hdrs;
        """
        self.trace('loading full messages')
        server = self.connect( )
        try:
            (msgCount, msgBytes) = server.stat( )          # inbox on server
            allmsgs  = []
            allsizes = []
            for i in range(loadfrom, msgCount+1):          # empty if low >= high
                if progress: progress(i, msgCount)
                (resp, message, respsz) = server.retr(i)  # save text on list
                allmsgs.append('\n'.join(message))        # leave mail on server
                allsizes.append(respsz)                   # diff from len(msg)
        finally:
            server.quit( )                                    # unlock the mail box
        assert len(allmsgs) == (msgCount - loadfrom) + 1  # msg nums start at 1
       #assert sum(allsizes) == msgBytes                  # not if loadfrom > 1
        return allmsgs, allsizes, True

    def deleteMessages(self, msgnums, progress=None):
        """
        delete multiple msgs off server; assumes email inbox
        unchanged since msgnums were last determined/loaded;
        use if msg headers not available as state information;
        fast, but poss dangerous: see deleteMessagesSafely
        """
        self.trace('deleting mails')
        server = self.connect( )
        try:
            for (ix, msgnum) in enumerate(msgnums):   # don't reconnect for each
                if progress: progress(ix+1, len(msgnums))
                server.dele(msgnum)
        finally:                                      # changes msgnums: reload
            server.quit( )

    def deleteMessagesSafely(self, msgnums, synchHeaders, progress=None):
        """
        delete multiple msgs off server, but use TOP fetches to
        check for a match on each msg's header part before deleting;
        assumes the email server supports the TOP interface of POP,
        else raises TopNotSupported - client may call deleteMessages;

        use if the mail server might change the inbox since the email
        index was last fetched, thereby changing POP relative message
        numbers;  this can happen if email is deleted in a different
        client;  some ISPs may also move a mail from inbox to the
        undeliverable box in response to a failed download;

        synchHeaders must be a list of already loaded mail hdrs text,
        corresponding to selected msgnums (requires state);  raises
        exception if any out of synch with the email server;  inbox is
        locked until quit, so it should not change between TOP check
        and actual delete: synch check must occur here, not in caller;
        may be enough to call checkSynchError+deleteMessages, but check
        each msg here in case deletes and inserts in middle of inbox;
        """
        if not self.srvrHasTop:
            raise TopNotSupported('Safe delete cancelled')

        self.trace('deleting mails safely')
        errmsg  = 'Message %s out of synch with server.\n'
        errmsg += 'Delete terminated at this message.\n'
        errmsg += 'Mail client may require restart or reload.'

        server = self.connect( )                       # locks inbox till quit
        try:                                            # don't reconnect for each
            (msgCount, msgBytes) = server.stat( )      # inbox size on server
            for (ix, msgnum) in enumerate(msgnums):
                if progress: progress(ix+1, len(msgnums))
                if msgnum > msgCount:                            # msgs deleted
                    raise DeleteSynchError(errmsg % msgnum)
                resp, hdrlines, respsz = server.top(msgnum, 0)   # hdrs only
                msghdrs = '\n'.join(hdrlines)
                if not self.headersMatch(msghdrs, synchHeaders[msgnum-1]):
                    raise DeleteSynchError(errmsg % msgnum)
                else:
                    server.dele(msgnum)                # safe to delete this msg
        finally:                                       # changes msgnums: reload
            server.quit( )                             # unlock inbox on way out

    def checkSynchError(self, synchHeaders):
        """
        check to see if already loaded hdrs text in synchHeaders
        list matches what is on the server, using the TOP command in
        POP to fetch headers text; use if inbox can change due to
        deletes in other client, or automatic action by email server;
        raises except if out of synch, or error while talking to server;

        for speed, only checks last in last: this catches inbox deletes,
        but assumes server won't insert before last (true for incoming
        mails); check inbox size first: smaller if just deletes;  else
        top will differ if deletes and newly arrived messages added at
        end;  result valid only when run: inbox may change after return;
        """
        self.trace('synch check')
        errormsg  = 'Message index out of synch with mail server.\n'
        errormsg += 'Mail client may require restart or reload.'
        server = self.connect( )
        try:
            lastmsgnum = len(synchHeaders)                      # 1..N
            (msgCount, msgBytes) = server.stat( )                    # inbox size
            if lastmsgnum > msgCount:                           # fewer now?
                raise MessageSynchError(errormsg)               # none to cmp
            if self.srvrHasTop:
                resp, hdrlines, respsz = server.top(lastmsgnum, 0)  # hdrs only
                lastmsghdrs = '\n'.join(hdrlines)
                if not self.headersMatch(lastmsghdrs, synchHeaders[-1]):
                    raise MessageSynchError(errormsg)
        finally:
            server.quit( )

    def headersMatch(self, hdrtext1, hdrtext2):
        """
        may not be as simple as a string compare: some servers add
        a "Status:" header that changes over time; on one ISP, it
        begins as "Status: U" (unread), and changes to "Status: RO"
        (read, old) after fetched once - throws off synch tests if
        new when index fetched, but have been fetched once before
        delete or last-message check;  "Message-id:" line is unique
        per message in theory, but optional, and can be anything if
        forged; match more common: try first; parsing costly: try last
        """
        # try match by simple string compare
        if hdrtext1 == hdrtext2:
            self.trace('Same headers text')
            return True

        # try match without status lines
        split1 = hdrtext1.splitlines( )       # s.split('\n'), but no final ''
        split2 = hdrtext2.splitlines( )
        strip1 = [line for line in split1 if not line.startswith('Status:')]
        strip2 = [line for line in split2 if not line.startswith('Status:')]
        if strip1 == strip2:
            self.trace('Same without Status')
            return True

        # try mismatch by message-id headers if either has one
        msgid1 = [line for line in split1 if line[:11].lower( ) == 'message-id:']
        msgid2 = [line for line in split2 if line[:11].lower( ) == 'message-id:']
        if (msgid1 or msgid2) and (msgid1 != msgid2):
            self.trace('Different Message-Id')
            return False

        # try full hdr parse and common headers if msgid missing or trash
        tryheaders  = ('From', 'To', 'Subject', 'Date')
        tryheaders += ('Cc', 'Return-Path', 'Received')
        msg1 = MailParser( ).parseHeaders(hdrtext1)
        msg2 = MailParser( ).parseHeaders(hdrtext2)
        for hdr in tryheaders:                          # poss multiple Received
            if msg1.get_all(hdr) != msg2.get_all(hdr):  # case insens, dflt None
                self.trace('Diff common headers')
                return False

        # all common hdrs match and don't have a diff message-id
        self.trace('Same common headers')
        return True

    def getPassword(self):
        """
        get POP password if not yet known
        not required until go to server
        from client-side file or subclass method
        """
        if not self.popPassword:
            try:
                localfile = open(mailconfig.poppasswdfile)
                self.popPassword = localfile.readline( )[:-1]
                self.trace('local file password' + repr(self.popPassword))
            except:
                self.popPassword = self.askPopPassword( )

    def askPopPassword(self):
        assert False, 'Subclass must define method'


################################################################################
# specialized subclasses
################################################################################

class MailFetcherConsole(MailFetcher):
    def askPopPassword(self):
        import getpass
        prompt = 'Password for %s on %s?' % (self.popUser, self.popServer)
        return getpass.getpass(prompt)

class SilentMailFetcher(SilentMailTool, MailFetcher):
    pass   # replaces trace

MailParser Class

Example 14-25 implements the last major class in the mailtools package—given the text of an email message, its tools parse the mail’s content into a message object, with headers and decoded parts. This module is largely just a wrapper around the standard library’s email package, but it adds convenience tools—finding the main text part of a message, filename generation for message parts, saving attached parts to files, and so on. See the code for more information. Also notice the parts walker here: by coding its search logic in one place, we guarantee that all three clients implement the same traversal.
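To make the idea of a parts walker concrete, here is a minimal sketch in Python 3 syntax. The `named_parts` generator is a simplified, hypothetical version: it skips multipart containers as described, but reports only the filename recorded in each part (or None) instead of generating names as the package's code does. The message text is invented for the example.

```python
import email

text = '\n'.join([
    'From: me@example.com',
    'Subject: demo',
    'MIME-Version: 1.0',
    'Content-Type: multipart/mixed; boundary="XX"',
    '',
    '--XX',
    'Content-Type: text/plain',
    '',
    'main text',
    '--XX',
    'Content-Type: text/html',
    'Content-Disposition: attachment; filename="page.html"',
    '',
    '<p>hi</p>',
    '--XX--',
    ''])

message = email.message_from_string(text)    # parse text into Message object

def named_parts(message):
    """Yield (index, filename or None, content type, part); skip containers."""
    for ix, part in enumerate(message.walk()):       # walk includes message
        if part.get_content_maintype() == 'multipart':
            continue                                 # multipart/*: container
        yield ix, part.get_filename(), part.get_content_type(), part

found = [(ix, name, ctype) for ix, name, ctype, part in named_parts(message)]
print(found)
```

The index starts at 1 for the first real part because the walk visits the multipart container itself first, at index 0.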

Example 14-25. PP3E\Internet\Email\mailtools\mailParser.py

###############################################################################
# parsing and attachment extract, analyse, save (see __init__ for docs, test)
###############################################################################

import os, mimetypes                               # mime: type to name
import email.Parser
from email.Message import Message
from mailTool import MailTool

class MailParser(MailTool):
    """
    methods for parsing message text, attachments

    subtle thing: Message object payloads are either a simple
    string for non-multipart messages, or a list of Message
    objects if multipart (possibly nested); we don't need to
    distinguish between the two cases here, because the Message
    walk generator always returns self first, and so works fine
    on non-multipart messages too (a single object is walked);

    for simple messages, the message body is always considered
    here to be the sole part of the mail;  for multipart messages,
    the parts list includes the main message text, as well as all
    attachments;  this allows simple messages not of type text to
    be handled like attachments in a UI (e.g., saved, opened);
    Message payload may also be None for some oddball part types;
    """

    def walkNamedParts(self, message):
        """
        generator to avoid repeating part naming logic
        skips multipart headers, makes part filenames
        message is already parsed email.Message object
        doesn't skip oddball types: payload may be None
        """
        for (ix, part) in enumerate(message.walk( )):    # walk includes message
            maintype = part.get_content_maintype( )      # ix includes multiparts
            if maintype == 'multipart':
                continue                                  # multipart/*: container
            else:
                filename, contype = self.partName(part, ix)
                yield (filename, contype, part)

    def partName(self, part, ix):
        """
        extract filename and content type from message part;
        filename: tries Content-Disposition, then Content-Type
        name param, or generates one based on mimetype guess;
        """
        filename = part.get_filename( )                # filename in msg hdrs?
        contype  = part.get_content_type( )            # lower maintype/subtype
        if not filename:
            filename = part.get_param('name')          # try content-type name
        if not filename:
            if contype == 'text/plain':                # hardcode plain text ext
                ext = '.txt'                           # else guesses .ksh!
            else:
                ext = mimetypes.guess_extension(contype)
                if not ext: ext = '.bin'              # use a generic default
            filename = 'part-%03d%s' % (ix, ext)
        return (filename, contype)

    def saveParts(self, savedir, message):
        """
        store all parts of a message as files in a local directory;
        returns [('maintype/subtype', 'filename')] list for use by
        callers, but does not open any parts or attachments here;
        get_payload decodes base64, quoted-printable, uuencoded data;
        mail parser may give us a None payload for oddball types we
        probably should skip over: convert to str here to be safe;
        """
        if not os.path.exists(savedir):
            os.mkdir(savedir)
        partfiles = []
        for (filename, contype, part) in self.walkNamedParts(message):
            fullname = os.path.join(savedir, filename)
            fileobj  = open(fullname, 'wb')             # use binary mode
            content  = part.get_payload(decode=1)       # decode base64,qp,uu
            fileobj.write(str(content))                 # make sure is a str
            fileobj.close( )
            partfiles.append((contype, fullname))       # for caller to open
        return partfiles

    def saveOnePart(self, savedir, partname, message):
        """
        ditto, but find and save just one part by name
        """
        if not os.path.exists(savedir):
            os.mkdir(savedir)
        fullname = os.path.join(savedir, partname)
        (contype, content) = self.findOnePart(partname, message)
        open(fullname, 'wb').write(str(content))
        return (contype, fullname)

    def partsList(self, message):
        """
        return a list of filenames for all parts of an
        already parsed message, using same filename logic
        as saveParts, but do not store the part files here
        """
        validParts = self.walkNamedParts(message)
        return [filename for (filename, contype, part) in validParts]

    def findOnePart(self, partname, message):
        """
        find and return part's content, given its name
        intended to be used in conjunction with partsList
        we could also mimetypes.guess_type(partname) here
        we could also avoid this search by saving in dict
        """
        for (filename, contype, part) in self.walkNamedParts(message):
            if filename == partname:
                content = part.get_payload(decode=1)          # base64,qp,uu
                return (contype, content)

    def findMainText(self, message):
        """
        for text-oriented clients, return the first text part;
        for the payload of a simple message, or all parts of
        a multipart message, looks for text/plain, then text/html,
        then text/*, before deducing that there is no text to
        display;  this is a heuristic, but covers most simple,
        multipart/alternative, and multipart/mixed messages;
        content-type defaults to text/plain if not in simple msg;

        handles message nesting at top level by walking instead
        of list scans;  if non-multipart but type is text/html,
        returns the HTML as the text with an HTML type: caller
        may open in web browser;  if nonmultipart and not text,
        no text to display: save/open in UI;  caveat: does not
        try to concatenate multiple inline text/plain parts
        """
        # try to find a plain text
        for part in message.walk( ):                        # walk visits message
            type = part.get_content_type( )                 # if nonmultipart
            if type == 'text/plain':
                return type, part.get_payload(decode=1)     # may be base64,qp,uu

        # try to find an HTML part
        for part in message.walk( ):
            type = part.get_content_type( )
            if type == 'text/html':
                return type, part.get_payload(decode=1)    # caller renders

        # try any other text type, including XML
        for part in message.walk( ):
            if part.get_content_maintype( ) == 'text':
                return part.get_content_type( ), part.get_payload(decode=1)

        # punt: could use first part, but it's not marked as text
        return 'text/plain', '[No text to display]'

    # returned when parses fail
    errorMessage = Message( )
    errorMessage.set_payload('[Unable to parse message - format error]')

    def parseHeaders(self, mailtext):
        """
        parse headers only, return root email.Message object
        stops after headers parsed, even if nothing else follows (top)
        email.Message object is a mapping for mail header fields
        payload of message object is None, not raw body text
        """
        try:
            return email.Parser.Parser( ).parsestr(mailtext, headersonly=True)
        except:
            return self.errorMessage

    def parseMessage(self, fulltext):
        """
        parse entire message, return root email.Message object
        payload of message object is a string if not is_multipart( )
        payload of message object is more Messages if multiple parts
        the call here same as calling email.message_from_string( )
        """
        try:
            return email.Parser.Parser( ).parsestr(fulltext)       # may fail!
        except:
            return self.errorMessage     # or let call handle? can check return

    def parseMessageRaw(self, fulltext):
        """
        parse headers only, return root email.Message object
        stops after headers parsed, for efficiency (not yet used here)
        payload of message object is raw text of mail after headers
        """
        try:
            return email.Parser.HeaderParser( ).parsestr(fulltext)
        except:
            return self.errorMessage
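
The part-naming traversal in walkNamedParts and partName can be tried without a mail server. The following is a minimal standalone sketch, not part of mailtools: it assumes Python 3 and the current lowercase module names of the standard library (email.mime rather than the older capitalized names this listing imports), and the function name named_parts is hypothetical. It builds a two-part message in memory and applies the same rules: skip multipart containers, prefer a filename from the headers, and otherwise generate one from the part's index and content type.

```python
import mimetypes
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText
from email.mime.application import MIMEApplication

def named_parts(message):
    """Hypothetical stand-in for walkNamedParts plus partName."""
    for ix, part in enumerate(message.walk()):       # walk() yields the message itself first
        if part.get_content_maintype() == 'multipart':
            continue                                  # container only; ix still counts it
        contype = part.get_content_type()
        # try Content-Disposition filename, then the Content-Type name param
        filename = part.get_filename() or part.get_param('name')
        if not filename:
            if contype == 'text/plain':
                ext = '.txt'                          # hardcoded, as in partName
            else:
                ext = mimetypes.guess_extension(contype) or '.bin'
            filename = 'part-%03d%s' % (ix, ext)
        yield filename, contype, part

# test message: an unnamed text body plus one named binary attachment
msg = MIMEMultipart()
msg.attach(MIMEText('Here is my source code'))
attachment = MIMEApplication(b'\x00\x01\x02')
attachment.add_header('Content-Disposition', 'attachment', filename='data.bin')
msg.attach(attachment)

names = [(filename, contype) for filename, contype, part in named_parts(msg)]
print(names)
```

Because the index counts the multipart container too, the unnamed text body here is numbered 001 rather than 000. That is harmless, since the generated names only need to be unique and stable across calls.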

Self-Test Script

The last file in the mailtools package, Example 14-26, lists the self-test code for the package. This code is a separate script file, in order to allow for import search path manipulation—it emulates a real client, which is assumed to have a mailconfig.py module in its own source directory (this module can vary per client).

Example 14-26. PP3E\Internet\Email\mailtools\selftest.py

###############################################################################
# self-test when this file is run as a program
###############################################################################

#
# mailconfig normally comes from the client's source directory or
# sys.path; for testing, get it from Email directory one level up
#
import sys
sys.path.append('..')
import mailconfig
print 'config:', mailconfig.__file__

# get these from __init__
from mailtools import MailFetcherConsole, MailSender, MailSenderAuthConsole

if not mailconfig.smtpuser:
    sender = MailSender( )
else:
    sender = MailSenderAuthConsole( )

sender.sendMessage(From      = mailconfig.myaddress,
                   To        = [mailconfig.myaddress],
                   Subj      = 'testing 123',
                   extrahdrs = [('X-Mailer', 'mailtools')],
                   bodytext  = 'Here is my source code',
                   attaches  = ['selftest.py'])

fetcher = MailFetcherConsole( )
def status(*args): print args

hdrs, sizes, loadedall = fetcher.downloadAllHeaders(status)
for num, hdr in enumerate(hdrs[:5]):
    print hdr
    if raw_input('load mail?') in ['y', 'Y']:
        print fetcher.downloadMessage(num+1), '\n', '-'*70

last5 = len(hdrs)-4
msgs, sizes, loadedall = fetcher.downloadAllMessages(status, loadfrom=last5)
for msg in msgs:
    print msg[:200], '\n', '-'*70
raw_input('Press Enter to exit')

Updating the pymail Console Client

Finally, to give a use case for the mailtools module package of the preceding sections, Example 14-27 provides an updated version of the pymail program we met earlier, which uses mailtools to access email instead of older tools. Compare its code to the original pymail in this chapter to see how mailtools is employed here. You’ll find that its mail download and send logic is substantially simpler.

Example 14-27. pymail2.py

#!/usr/local/bin/python
##########################################################################
# pymail2 - simple console email interface client in Python; this
# version uses the mailtools package, which in turn uses poplib,
# smtplib, and the email package for parsing and composing emails;
# displays first text part of mails, not entire full text;
# fetches just mail headers initially, using the TOP command;
# fetches full text of just email selected to be displayed;
# caches already fetched mails; caveat: no way to refresh index;
# uses standalone mailtools objects - they can also be superclasses;
##########################################################################

mailcache = {}

def fetchmessage(i):
    try:
        fulltext = mailcache[i]
    except KeyError:
        fulltext = fetcher.downloadMessage(i)
        mailcache[i] = fulltext
    return fulltext

def sendmessage( ):
    from pymail import inputmessage
    From, To, Subj, text = inputmessage( )
    sender.sendMessage(From, To, Subj, [], text, attaches=None)

def deletemessages(toDelete, verify=True):
    print 'To be deleted:', toDelete
    if verify and raw_input('Delete?')[:1] not in ['y', 'Y']:
        print 'Delete cancelled.'
    else:
        print 'Deleting messages from server.'
        fetcher.deleteMessages(toDelete)

def showindex(msgList, msgSizes, chunk=5):
    count = 0
    for (msg, size) in zip(msgList, msgSizes):      # email.Message, int
        count += 1
        print '%d:\t%d bytes' % (count, size)
        for hdr in ('From', 'Date', 'Subject'):
            print '\t%s=>%s' % (hdr, msg.get(hdr, '(unknown)'))
        if count % chunk == 0:
            raw_input('[Press Enter key]')         # pause after each chunk

def showmessage(i, msgList):
    if 1 <= i <= len(msgList):
        fulltext = fetchmessage(i)
        message  = parser.parseMessage(fulltext)
        ctype, maintext = parser.findMainText(message)
        print '-'*80
        print maintext                # main text part, not entire mail
        print '-'*80                  # and not any attachments after
    else:
        print 'Bad message number'

def savemessage(i, mailfile, msgList):
    if 1 <= i <= len(msgList):
        fulltext = fetchmessage(i)
        open(mailfile, 'a').write('\n' + fulltext + '-'*80 + '\n')
    else:
        print 'Bad message number'

def msgnum(command):
    try:
        return int(command.split( )[1])
    except:
        return -1   # assume this is bad

helptext = """
Available commands:
i     - index display
l n?  - list all messages (or just message n)
d n?  - mark all messages for deletion (or just message n)
s n?  - save all messages to a file (or just message n)
m     - compose and send a new mail message
q     - quit pymail
?     - display this help text
"""

def interact(msgList, msgSizes, mailfile):
    showindex(msgList, msgSizes)
    toDelete = []
    while 1:
        try:
            command = raw_input('[Pymail] Action? (i, l, d, s, m, q, ?) ')
        except EOFError:
            command = 'q'
        if not command: command = '*'

        if command == 'q':                     # quit
            break

        elif command[0] == 'i':                # index
            showindex(msgList, msgSizes)

        elif command[0] == 'l':                # list
            if len(command) == 1:
                for i in range(1, len(msgList)+1):
                    showmessage(i, msgList)
            else:
                showmessage(msgnum(command), msgList)

        elif command[0] == 's':                # save
            if len(command) == 1:
                for i in range(1, len(msgList)+1):
                    savemessage(i, mailfile, msgList)
            else:
                savemessage(msgnum(command), mailfile, msgList)

        elif command[0] == 'd':                # mark for deletion later
            if len(command) == 1:
                toDelete = range(1, len(msgList)+1)
            else:
                delnum = msgnum(command)
                if (1 <= delnum <= len(msgList)) and (delnum not in toDelete):
                    toDelete.append(delnum)
                else:
                    print 'Bad message number'

        elif command[0] == 'm':                # send a new mail via SMTP
            try:
                sendmessage( )
            except:
                print 'Error - mail not sent'

        elif command[0] == '?':
            print helptext
        else:
            print 'What? -- type "?" for commands help'
    return toDelete

def main( ):
    global parser, sender, fetcher
    import mailtools, mailconfig
    mailserver = mailconfig.popservername
    mailuser   = mailconfig.popusername
    mailfile   = mailconfig.savemailfile

    parser     = mailtools.MailParser( )
    sender     = mailtools.MailSender( )
    fetcher    = mailtools.MailFetcherConsole(mailserver, mailuser)

    def progress(i, max): print i, 'of', max
    hdrsList, msgSizes, ignore = fetcher.downloadAllHeaders(progress)
    msgList = [parser.parseHeaders(hdrtext) for hdrtext in hdrsList]

    print '[Pymail email client]'
    toDelete   = interact(msgList, msgSizes, mailfile)
    if toDelete: deletemessages(toDelete)

if __name__ == '__main__': main( )

This program is used interactively, the same as the original. In fact, the output is nearly identical, so we won’t go into further details. Here’s a quick look at this script in action; run this on your own machine to see it firsthand:

C:\...\PP3E\Internet\Email>pymail2.py
user: pp3e
loading headers
Connecting...
Password for pp3e on pop.earthlink.net?
+OK NGPopper vEL_6_10 at earthlink.net ready <[email protected]...
1 of 5
2 of 5
3 of 5
4 of 5
5 of 5
load headers exit
[Pymail email client]
1:      876 bytes
        From=>[email protected]
        Date=>Wed, 08 Feb 2006 05:23:13 -0000
        Subject=>I'm a Lumberjack, and I'm Okay
2:      800 bytes
        From=>[email protected]
        Date=>Wed, 08 Feb 2006 05:24:06 -0000
        Subject=>testing
3:      818 bytes
        From=>[email protected]
        Date=>Tue Feb 07 22:51:08 2006
        Subject=>A B C D E F G
4:      770 bytes
        From=>[email protected]
        Date=>Tue Feb 07 23:19:51 2006
        Subject=>testing smtpmail
5:      819 bytes
        From=>[email protected]
        Date=>Tue Feb 07 23:34:23 2006
        Subject=>a b c d e f g
[Press Enter key]
[Pymail] Action? (i, l, d, s, m, q, ?) l 5
load 5
Connecting...
+OK NGPopper vEL_6_10 at earthlink.net ready <[email protected]...
--------------------------------------------------------------------------------

Spam; Spam and eggs; Spam, spam, and spam


--------------------------------------------------------------------------------

[Pymail] Action? (i, l, d, s, m, q, ?) s 1
load 1
Connecting...
+OK NGPopper vEL_6_10 at earthlink.net ready <[email protected]...
[Pymail] Action? (i, l, d, s, m, q, ?) m
From? [email protected]
To?   [email protected]
Subj? test pymail2 send
Type message text, end with line="."
Run away! Run away!
.
Sending to...['[email protected]']
From: [email protected]
To: [email protected]
Subject: test pymail2 send
Date: Wed, 08 Feb 2006 07:09:40 -0000

Run away! Run away!

Send exit
[Pymail] Action? (i, l, d, s, m, q, ?) q

As you can see, this version’s code eliminates some complexities, such as the manual formatting of composed mail message text. It also does a better job of displaying a mail’s text—instead of blindly listing the full mail text (attachments and all), it uses mailtools to fetch the first text part of the message. The messages we’re using are too simple to show the difference, but for a mail with attachments, this new version will be more focused about what it displays.
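
To make that difference concrete, here is a minimal standalone sketch of the same selection order that findMainText uses (text/plain, then text/html, then any other text type). It is an illustration under stated assumptions rather than the mailtools code itself: it targets Python 3 and the current lowercase standard library module names, and the function name main_text is hypothetical. The test message carries a short body and a binary attachment, so the full mail text includes the encoded attachment while the main text does not.

```python
import email
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText
from email.mime.application import MIMEApplication

def main_text(message):
    """Hypothetical stand-in for findMainText: first plain text, then HTML, then any text/*."""
    for wanted in ('text/plain', 'text/html'):
        for part in message.walk():                   # walk() also handles non-multipart mail
            if part.get_content_type() == wanted:
                return wanted, part.get_payload(decode=True)
    for part in message.walk():
        if part.get_content_maintype() == 'text':
            return part.get_content_type(), part.get_payload(decode=True)
    return 'text/plain', b'[No text to display]'      # nothing marked as text

# compose a mail with a text body and a base64-encoded binary attachment
msg = MIMEMultipart()
msg['Subject'] = 'test pymail2 send'
msg.attach(MIMEText('Run away! Run away!'))
msg.attach(MIMEApplication(b'\x00' * 30, name='blob.bin'))

fulltext = msg.as_string()                            # what a server would hand back
parsed = email.message_from_string(fulltext)
ctype, text = main_text(parsed)

print(ctype, text.decode('ascii').strip())            # just the body text
print(len(fulltext), 'characters in the full mail text')
```

In Python 3, get_payload(decode=True) returns bytes, so the sketch decodes the result before printing. The Python 2 code in this chapter receives an ordinary string instead.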

Moreover, because the interface to mail is encapsulated in the mailtools package’s modules, if it ever must change, it will need to be changed only in that package, regardless of how many mail clients use its tools. And because this code is shared, once we know it works for one client, other clients can rely on it too, instead of debugging new mail-processing code of their own.

On the other hand, pymail2 doesn’t really leverage much of the power of either mailtools or the underlying email package it uses. Things like attachments and inbox synchronization are not handled at all, for example. To see the full scope of the email package, we need to explore a larger email system, such as PyMailGUI or PyMailCGI. The first of these is the topic of the next chapter, and the second appears in Chapter 17. First, though, let’s quickly survey a handful of additional client-side protocol tools.
