Chapter 13. Network Programming

Introduction

Credit: Guido van Rossum, creator of Python

Network programming is one of my favorite Python applications. I wrote or started most of the network modules in the Python Standard Library, including the socket and select extension modules and most of the protocol client modules (such as ftplib). I also wrote a popular server framework module, SocketServer, and two web browsers in Python, the first predating Mosaic. Need I say more?

Python’s roots lie in a distributed operating system, Amoeba, which I helped design and implement in the late 1980s. Python was originally intended to be the scripting language for Amoeba, since it turned out that the Unix shell, while ported to Amoeba, wasn’t very useful for writing Amoeba system administration scripts. Of course, I designed Python to be platform independent from the start. Once Python was ported from Amoeba to Unix, I taught myself BSD socket programming by wrapping the socket primitives in a Python extension module and then experimenting with them using Python; this was one of the first extension modules.

This approach proved to be a great early testimony of Python’s strengths. Writing socket code in C is tedious: the code necessary to do error checking on every call quickly overtakes the logic of the program. Quick: in which order should a server call accept, bind, connect, and listen? This is remarkably difficult to find out if all you have is a set of Unix manpages. In Python, you don’t have to write separate error-handling code for each call, making the logic of the code stand out much clearer. You can also learn about sockets by experimenting in an interactive Python shell, where misconceptions about the proper order of calls and the argument values that each call requires are cleared up quickly through Python’s immediate error messages.

Python has come a long way since those first days, and now few applications use the socket module directly; most use much higher-level modules such as urllib or smtplib, and third-party extensions such as the Twisted framework, whose popularity keeps growing. The examples in this chapter are a varied bunch: some construct and send complex email messages, while others dwell on lower-level issues such as tunneling. My favorite is Recipe 13.11, which implements PyHeartBeat: it’s useful, it uses the socket module, and it’s simple enough to be an educational example. I do note, with that mixture of pride and sadness that always accompanies a parent’s observation of children growing up, that, since the Python Cookbook’s first edition, even PyHeartBeat has acquired an alternative server implementation based on Twisted!

Nevertheless, my own baby, the socket module itself, is still the foundation of all network operations in Python. It’s a plain transliteration of the socket APIs—first introduced in BSD Unix and now widespread on all platforms—into the object-oriented paradigm. You create socket objects by calling the socket.socket factory function, then you call methods on these objects to perform typical low-level network operations. You don’t have to worry about allocating and freeing memory for buffers and the like—Python handles that for you automatically. You express IP addresses as (host, port) pairs, in which host is a string in either dotted-quad ('1.2.3.4') or domain-name ('www.python.org') notation. As you can see, even low-level modules in Python aren’t as low level as all that.
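As a minimal sketch of the call order on each side of a TCP connection, the following runs a one-shot server and a client in the same process over the loopback interface. It is written for Python 3 (an assumption; the recipes in this chapter use Python 2 syntax), and the helper names echo_once and tcp_demo are hypothetical.

```python
import socket
import threading

def echo_once(server_sock):
    # accept blocks until a client connects, then returns a new socket
    conn, addr = server_sock.accept()
    try:
        data = conn.recv(1024)
        conn.sendall(data.upper())
    finally:
        conn.close()

def tcp_demo():
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.settimeout(5)
    server.bind(("127.0.0.1", 0))   # server step 1: bind (port 0 = any free port)
    server.listen(1)                # server step 2: listen
    host, port = server.getsockname()
    worker = threading.Thread(target=echo_once, args=(server,))
    worker.daemon = True
    worker.start()                  # server step 3: accept, in the worker thread
    client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    client.settimeout(5)
    client.connect((host, port))    # the client side only needs connect
    client.sendall(b"hello")
    reply = client.recv(1024)
    client.close()
    worker.join()
    server.close()
    return reply
```

The server calls bind, listen, and accept, in that order; connect belongs to the client. Addresses are plain (host, port) tuples throughout.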

Despite the various conveniences, the socket module still exposes the actual underlying functionality of your operating system’s network sockets. If you’re at all familiar with sockets, you’ll quickly get the hang of Python’s socket module, using Python’s own Library Reference. You’ll then be able to play with sockets interactively in Python to become a socket expert, if that is what you want. The classic, highly recommended work on this subject is W. Richard Stevens, UNIX Network Programming, Volume 1: Networking APIs - Sockets and XTI, 2d ed. (Prentice-Hall). For many practical uses, however, higher-level modules will serve you better.

The Internet uses a sometimes dazzling variety of protocols and formats, and the Python Standard Library supports many of them. In the Python Standard Library, you will find dozens of modules dedicated to supporting specific Internet protocols (such as smtplib to support SMTP for sending mail and nntplib to support the Network News Transfer Protocol (NNTP) for sending and receiving Network News). In addition, you’ll find about as many modules that support specific Internet formats (such as htmllib to parse HTML data and the email package to parse and compose various formats related to email, including attachments and encoding).

I cannot even come close to doing justice to the powerful array of tools mentioned in this introduction, nor will you find all of these modules and packages used in this chapter, nor in this book, nor in most programming shops. You may never need to write any program that deals with Network News, for example; if that is the case, you don’t need to study nntplib. But it is still reassuring to know it’s there (part of the “batteries included” approach of the Python Standard Library).

Two higher-level modules that stand out from the crowd, however, are urllib and urllib2. Each of these two modules can deal with several protocols through the magic of URLs: those now-familiar strings, such as http://www.python.org/index.html, that identify a protocol (such as http), a host and port (such as www.python.org, port 80 being the default for HTTP), and a specific resource at that address (such as /index.html). urllib is very simple to use, but urllib2 is more powerful and extensible. HTTP is the most popular protocol for URLs, but these modules also support several others, such as FTP. In many cases, you’ll be able to use these modules to write typical client-side scripts that interact with any of the supported protocols much more quickly and with less effort than it might take with the various protocol-specific modules.
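To make the anatomy of a URL concrete, here is a small sketch that splits one into the pieces just described. It assumes Python 3, where the splitting function lives in urllib.parse (in Python 2 it is in the urlparse module). The url_parts helper and the DEFAULT_PORTS table are hypothetical names introduced for illustration.

```python
from urllib.parse import urlsplit

# Well-known default ports, used when the URL does not name one
DEFAULT_PORTS = {"http": 80, "https": 443, "ftp": 21}

def url_parts(url):
    """Return (protocol, host, port, resource path) for a URL."""
    parts = urlsplit(url)
    port = parts.port or DEFAULT_PORTS.get(parts.scheme)
    return parts.scheme, parts.hostname, port, parts.path

print(url_parts("http://www.python.org/index.html"))
```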

To illustrate, I’d like to conclude with a cookbook example of my own. It’s similar to Recipe 13.2, but, rather than a program fragment, it’s a little script. I call it wget.py because it does everything for which I’ve ever needed wget. (In fact, I originally wrote this script on a system where wget wasn’t installed but Python was; writing wget.py was a more effective use of my time than downloading and installing the real thing.)

import sys, urllib
def reporthook(*a): print a
for url in sys.argv[1:]:
    i = url.rfind('/')
    file = url[i+1:]
    print url, "->", file
    urllib.urlretrieve(url, file, reporthook)

Pass this script one or more URLs as command-line arguments; the script retrieves them into local files whose names match the last components of the URLs. The script also prints progress information of the form:

(block number, block size, total size)

Obviously, it’s easy to improve on this script; but it’s only seven lines, it’s readable, and it works—and that’s what’s so cool about Python.
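One possible improvement, sketched here for Python 3 as an assumption, is to turn the raw progress tuple into a percentage and to cope with URLs that end in a slash (for which the script above would compute an empty filename). The helper names and the "index.html" default are hypothetical.

```python
import posixpath
from urllib.parse import urlsplit

def local_name(url, default="index.html"):
    """Pick a local filename from the last path component of url."""
    name = posixpath.basename(urlsplit(url).path)
    return name or default

def percent_done(block_number, block_size, total_size):
    """Return progress as an integer percentage, or None if the size is unknown."""
    if total_size <= 0:
        return None
    done = min(block_number * block_size, total_size)
    return done * 100 // total_size

def reporthook(block_number, block_size, total_size):
    # Same three arguments as the tuple printed by the original reporthook
    pct = percent_done(block_number, block_size, total_size)
    if pct is not None:
        print("%3d%%" % pct)
```

In Python 3 the download call itself would be urllib.request.urlretrieve(url, local_name(url), reporthook).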

Another cool thing about Python is that you can incrementally improve a program like this, and after it’s grown by two or three orders of magnitude, it’s still readable, and it still works! To see what this particular example might evolve into, check out Tools/webchecker/websucker.py in the Python source distribution. Enjoy!

13.1. Passing Messages with Socket Datagrams

Credit: Jeff Bauer

Problem

You want to communicate small messages between machines on a network in a lightweight fashion, without needing absolute assurance of reliability.

Solution

This task is just what the UDP protocol is for, and Python makes it easy for you to access UDP via datagram sockets. You can write a UDP server script (server.py) as follows:

import socket
port = 8081
s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
# Accept UDP datagrams, on the given port, from any sender
s.bind(("", port))
print "waiting on port:", port
while True:
    # Receive up to 1,024 bytes in a datagram
    data, addr = s.recvfrom(1024)
    print "Received:", data, "from", addr

You can write a corresponding UDP client script (client.py) as follows:

import socket
port = 8081
host = "localhost"
s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
s.sendto("Holy Guido! It's working.", (host, port))

Discussion

Sending short text messages with socket datagrams is simple to implement and provides a lightweight message-passing idiom. Socket datagrams should not be used, however, when reliable delivery of data must be guaranteed. If the server isn’t available, your message is lost. However, in many situations, you won’t care whether the message gets lost, or, at least, you do not want to abort a program just because a message can’t be delivered.

Note that the sender of a UDP datagram (the “client” in this example) does not bind the socket before calling the sendto method. On the other hand, to receive UDP datagrams, the socket does have to be bound before calling the recvfrom method.
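The asymmetry between sender and receiver can be seen in a single process. This is a minimal sketch for Python 3 (an assumption; note that messages must be bytes there), using the loopback interface and an OS-chosen port. The function name udp_roundtrip is hypothetical.

```python
import socket

def udp_roundtrip(message):
    # The receiver must be bound before it can call recvfrom
    receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    receiver.bind(("127.0.0.1", 0))     # port 0: let the OS pick a free port
    receiver.settimeout(5)              # do not wait forever for a lost datagram
    address = receiver.getsockname()
    # The sender is never bound explicitly; sendto is enough
    sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        sender.sendto(message, address)
        data, addr = receiver.recvfrom(1024)
    finally:
        sender.close()
        receiver.close()
    return data, addr
```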

Don’t use this recipe’s simple code to send large datagram messages, especially under Windows, which may not respect the buffer limit. To send larger messages, you may want to do something like this:

BUFSIZE = 1024
while msg:
    bytes_sent = s.sendto(msg[:BUFSIZE], (host, port))
    msg = msg[bytes_sent:]

The sendto method returns the number of bytes it has actually managed to send, so each time, you retry from the point where you left off, while ensuring that no more than BUFSIZE octets are sent in each datagram.

Note that with datagrams (UDP) you have no guarantee that all (or any) of the pieces that you send as separate datagrams arrive at the destination, nor that the pieces that do arrive are in the same order in which they were sent. If you need to worry about any of these reliability issues, you may be better off with a TCP connection, which gives you all of these assurances and handles many delicate behind-the-scenes aspects nicely on your behalf. Still, I often use socket datagrams for debugging, especially (but not exclusively) where an application spans more than one machine on the same, reliable local area network. The Python Standard Library’s logging module also supports optional use of UDP for its logging output.
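If you do split a message over several datagrams, the receiver needs some way to detect missing or reordered pieces. One simple scheme, which is not part of the recipe and is shown only as a sketch for Python 3, prefixes each piece with a sequence number and the total piece count. The 4-byte header layout and both function names are assumptions made for illustration.

```python
import struct

# Hypothetical header: 16-bit sequence number, 16-bit total piece count
HEADER = struct.Struct("!HH")

def split_message(msg, payload_size):
    """Cut msg (bytes) into datagrams of at most payload_size payload bytes."""
    pieces = [msg[i:i + payload_size]
              for i in range(0, len(msg), payload_size)] or [b""]
    total = len(pieces)
    return [HEADER.pack(seq, total) + piece
            for seq, piece in enumerate(pieces)]

def reassemble(datagrams):
    """Rebuild the message from datagrams in any order; None if any is missing."""
    pieces = {}
    total = None
    for datagram in datagrams:
        seq, total = HEADER.unpack(datagram[:HEADER.size])
        pieces[seq] = datagram[HEADER.size:]
    if total is None or len(pieces) != total:
        return None
    return b"".join(pieces[seq] for seq in range(total))
```

This only detects loss and fixes ordering; it does not retransmit anything, so TCP remains the better choice when delivery must be guaranteed.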

See Also

Recipe 13.11 for a typical, useful application of UDP datagrams in network operations; documentation for the standard library modules socket and logging in the Library Reference and Python in a Nutshell.

13.2. Grabbing a Document from the Web

Credit: Gisle Aas, Magnus Bodin

Problem

You need to grab a document from a URL on the Web.

Solution

urllib.urlopen returns a file-like object, and you can call the read method on that object to get all of its contents:

from urllib import urlopen
doc = urlopen("http://www.python.org").read( )
print doc

Discussion

Once you obtain a file-like object from urlopen, you can read it all at once into one big string by calling its read method, as I do in this recipe. Alternatively, you can read the object as a list of lines by calling its readlines method, or, for special purposes, just get one line at a time by looping over the object in a for loop. In addition to these file-like operations, the object that urlopen returns offers a few other useful features. For example, the following snippet gives you the headers of the document:

doc = urlopen("http://www.python.org")
print doc.info( )

such as the Content-Type header (text/html in this case) that defines the MIME type of the document. doc.info returns a mimetools.Message instance, so you can access it in various ways besides printing it or otherwise transforming it into a string. For example, doc.info( ).getheader('Content-Type') returns the 'text/html' string. The maintype attribute of the mimetools.Message object is the 'text' string, subtype is the 'html' string, and type is also the 'text/html' string. If you need to perform sophisticated analysis and processing, all the tools you need are right there. At the same time, if your needs are simpler, you can meet them in very simple ways, as this recipe shows.
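The same kind of header inspection can be tried without a network connection. The sketch below assumes Python 3, where mimetools no longer exists and the object returned by info( ) is based on the email package's message class. It parses a hand-written header block (the header values are made up for illustration) and queries it with email.message.Message methods.

```python
from email import message_from_string

# A hypothetical header block, similar to what an HTTP server might send
raw_headers = "Content-Type: text/html; charset=utf-8\nContent-Length: 1234\n\n"
headers = message_from_string(raw_headers)

print(headers["Content-Type"])           # the raw header value
print(headers.get_content_type())        # main type and subtype together
print(headers.get_content_maintype())    # counterpart of the maintype attribute
print(headers.get_content_subtype())     # counterpart of the subtype attribute
print(headers.get_param("charset"))      # one parameter of the Content-Type header
```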

If what you need to do with the document you grab from the Web is specifically to save it to a local file, urllib.urlretrieve is just what you need, as the “Introduction” to this chapter describes.

urllib implicitly supports the use of proxies (as long as the proxies do not require authentication: the current implementation of urllib does not support authentication-requiring proxies). Just set environment variable HTTP_PROXY to a URL, such as 'http://proxy.domain.com:8080', to use the proxy at that URL. If the environment variable HTTP_PROXY is not set, urllib may also look for the information in other platform-specific locations, such as the Windows registry if you’re running under Windows.
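To check which proxies would be picked up, you can ask the library directly. This sketch assumes Python 3, where the function is urllib.request.getproxies (urllib.getproxies in Python 2). It uses the lowercase spelling http_proxy; as far as I know the environment lookup is case-insensitive, with the lowercase name preferred. The helper name is hypothetical.

```python
import os
import urllib.request

def proxies_with_http_proxy(proxy_url):
    """Temporarily set http_proxy and return the proxy mapping urllib sees."""
    saved = os.environ.get("http_proxy")
    os.environ["http_proxy"] = proxy_url
    try:
        return urllib.request.getproxies()
    finally:
        # Restore the environment exactly as it was found
        if saved is None:
            del os.environ["http_proxy"]
        else:
            os.environ["http_proxy"] = saved

print(proxies_with_http_proxy("http://proxy.domain.com:8080").get("http"))
```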

If you have more advanced needs, such as using proxies that require authentication, you may use the more sophisticated urllib2 module of the Python Standard Library, rather than simple module urllib. At http://pydoc.org/2.3/urllib2.html, you can find an example of how to use urllib2 for the specific task of accessing the Internet through a proxy that does require authentication.

See Also

Documentation for the standard library modules urllib, urllib2, and mimetools in the Library Reference and Python in a Nutshell.

13.3. Filtering a List of FTP Sites

Credit: Mark Nenadov

Problem

Several of the FTP sites on your list of sites could be down at any time. You want to filter that list and obtain the list of those sites that are currently up.

Solution

Clearly, we first need a function to check whether one particular site is up:

import socket, ftplib
def isFTPSiteUp(site):
    try:
        ftplib.FTP(site).quit( )
    except socket.error:
        return False
    else:
        return True

Now, a simple list comprehension can perform the recipe’s task, but we may as well wrap that list comprehension inside another function:

def filterFTPsites(sites):
    return [site for site in sites if isFTPSiteUp(site)]

Alternatively, filter(isFTPSiteUp, sites) returns exactly the same resulting list as the list comprehension.

Discussion

Lists of FTP sites are sometimes difficult to maintain, since sites may be closed or temporarily down for all sorts of reasons. The code in this recipe is simple and suitable, for example, for use inside a small interactive program that must let the user choose among FTP sites—we may as well not even present for choice those sites we know are down! If you run this code regularly a few times a day and append the results to a file, the results may also be a basis for long-term maintenance of a list of FTP sites. Any site that has been down for more than a certain number of days should probably be moved away from the main list and into a list of sites that may well have croaked.

Very similar ideas could be used to filter lists of sites that serve protocols other than FTP, by using, instead of standard Python library module ftplib, other such modules, such as nntplib for the NNTP protocol, httplib for the Hypertext Transfer Protocol (HTTP), and so on.

When you’re checking many FTP sites within one program run, it could be much faster to use multiple threads to check on multiple sites at once (so that the delays while waiting for the various sites to respond can overlap), or else use an asynchronous approach. The simple approach presented in this recipe is easiest to program and to understand, but for most real-life networking programs, you do want to enhance performance by using either multithreading or asynchronous approaches, as other recipes in this chapter demonstrate.
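A minimal sketch of the multithreaded variant follows, written for Python 3 as an assumption. It starts one thread per site, which is fine for short lists but would call for a bounded pool on long ones. The checking function is passed in as a parameter (is_up, a hypothetical name), so any predicate with the same shape as isFTPSiteUp can be used.

```python
import threading

def filter_sites_threaded(sites, is_up):
    """Return the sites for which is_up(site) is true, checking them in parallel."""
    results = [False] * len(sites)

    def check(index, site):
        # Each thread writes only its own slot, so no lock is needed
        results[index] = is_up(site)

    threads = [threading.Thread(target=check, args=(i, site))
               for i, site in enumerate(sites)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    # Preserve the original order of the list
    return [site for site, up in zip(sites, results) if up]
```

With the recipe's function, the call would be filter_sites_threaded(sites, isFTPSiteUp); the total wait is then roughly that of the slowest site rather than the sum over all sites.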

See Also

Documentation for the standard library modules socket, ftplib, nntplib, and httplib, and built-in function filter, in the Library Reference and Python in a Nutshell.

13.4. Getting Time from a Server via the SNTP Protocol

Credit: Simon Foster

Problem

You need to contact an SNTP (Simple Network Time Protocol) server (which respects RFC 2030) to obtain the time of day as returned by that server.

Solution

SNTP is quite simple to implement, for example in a small script:

import socket, struct, sys, time
TIME1970 = 2208988800L                        # Thanks to F.Lundh
client = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
data = '\x1b' + 47 * '\0'
client.sendto(data, (sys.argv[1], 123))
data, address = client.recvfrom(1024)
if data:
    print 'Response received from:', address
    t = struct.unpack('!12I', data)[10]
    t -= TIME1970
    print '\tTime=%s' % time.ctime(t)

Discussion

An SNTP exchange begins with a client sending a 48-byte UDP datagram which starts with byte '\x1b'. The server answers with a 48-byte UDP datagram made up of twelve network-order longwords (4 bytes each). We can easily unpack the server’s returned datagram into a tuple of ints, by using standard Python library module struct’s unpack function. Then, for simplicity, we look only at the eleventh of those twelve longwords. That integer gives the time in seconds, but it measures time from an epoch that’s different from the 1970-based one normally used in Python. The difference in epochs is easily fixed by subtracting the magic number (kindly supplied by F. Lundh) that is named TIME1970 in the recipe. After the subtraction, we have a time in seconds from the epoch that complies with Python’s standard time module, and we can handle it with the functions in module time. In this recipe, we just display it on standard output as formatted by function time.ctime.
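The packet handling can be separated from the network calls and tried offline. The following sketch assumes Python 3; the function names are hypothetical. It shows where the first byte comes from (leap indicator 0, version 3, mode 3 for "client", packed into one byte) and where the magic number comes from (70 years of 365 days plus 17 leap days between 1900 and 1970).

```python
import struct

# Seconds between the NTP epoch (1900) and the Unix epoch (1970)
TIME1970 = (70 * 365 + 17) * 86400      # == 2208988800

def build_request():
    """Return a 48-byte SNTP client request."""
    # 2 bits leap indicator, 3 bits version number, 3 bits mode
    first_byte = (0 << 6) | (3 << 3) | 3    # == 0x1b
    return bytes([first_byte]) + 47 * b"\0"

def parse_transmit_time(data):
    """Return the server's transmit time as seconds since the 1970 epoch."""
    words = struct.unpack("!12I", data[:48])
    return words[10] - TIME1970             # eleventh longword: whole seconds
```

The value returned by parse_transmit_time can be passed straight to time.ctime, as the recipe does.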

See Also

Documentation for the standard library modules socket, struct and time in the Library Reference and Python in a Nutshell; the SNTP protocol is defined in RFC 2030 (http://www.ietf.org/rfc/rfc2030.txt), and the richer NTP protocol is defined in RFC 1305 (http://www.ietf.org/rfc/rfc1305.txt); Chapter 3 for general issues dealing with time in Python.

13.5. Sending HTML Mail

Credit: Art Gillespie

Problem

You need to send HTML mail and accompany it with a plain text version of the message’s contents, so that the email message is also readable by MUAs that are not HTML-capable.

Solution

Although the modern Python way to perform any mail manipulation is with the standard Python library email package, the functionality we need for this recipe is also supplied by the MimeWriter and mimetools modules (which are also in the Python Standard Library). We can easily code a function that just accesses and uses that functionality:

def createhtmlmail(subject, html, text=None):
    " Create a mime-message that will render as HTML or text, as appropriate"
    import MimeWriter, mimetools, cStringIO
    if text is None:
        # Produce an approximate textual rendering of the HTML string,
        # unless you have been given a better version as an argument
        import htmllib, formatter
        textout = cStringIO.StringIO( )
        formtext = formatter.AbstractFormatter(formatter.DumbWriter(textout))
        parser = htmllib.HTMLParser(formtext)
        parser.feed(html)
        parser.close( )
        text = textout.getvalue( )
        del textout, formtext, parser
    out = cStringIO.StringIO( )              # output buffer for our message
    htmlin = cStringIO.StringIO(html)    # input buffer for the HTML
    txtin = cStringIO.StringIO(text)     # input buffer for the plain text
    writer = MimeWriter.MimeWriter(out)
    # Set up some basic headers. Place subject here because smtplib.sendmail
    # expects it to be in the message, as relevant RFCs prescribe.
    writer.addheader("Subject", subject)
    writer.addheader("MIME-Version", "1.0")
    # Start the multipart section of the message.  Multipart/alternative seems
    # to work better on some MUAs than multipart/mixed.
    writer.startmultipartbody("alternative")
    writer.flushheaders( )
    # the plain-text section: just copied through, assuming iso-8859-1
    subpart = writer.nextpart( )
    pout = subpart.startbody("text/plain", [("charset", 'iso-8859-1')])
    pout.write(txtin.read( ))
    txtin.close( )
    # the HTML subpart of the message: quoted-printable, just in case
    subpart = writer.nextpart( )
    subpart.addheader("Content-Transfer-Encoding", "quoted-printable")
    pout = subpart.startbody("text/html", [("charset", 'us-ascii')])
    mimetools.encode(htmlin, pout, 'quoted-printable')
    htmlin.close( )
    # You're done; close your writer and return the message as a string
    writer.lastpart( )
    msg = out.getvalue( )
    out.close( )
    return msg

Discussion

This recipe’s module is completed in the usual style with a few lines to ensure that, when run as a script, it runs a self-test by composing and sending a sample HTML mail:

if __name__ == "__main__":
    import smtplib
    f = open("newsletter.html", 'r')
    html = f.read( )
    f.close( )
    try:
        f = open("newsletter.txt", 'r')
        text = f.read( )
    except IOError:
        text = None
    subject = "Today's Newsletter!"
    message = createhtmlmail(subject, html, text)
    server = smtplib.SMTP("localhost")
    server.sendmail('[email protected]',
        '[email protected]', message)
    server.quit( )

Sending HTML mail is a popular concept, and (as long as you avoid sending it to newsgroups and open mailing lists) there’s no reason your Python scripts shouldn’t do it. When you do send HTML mail, never forget to embed a text-only version of your message along with the HTML version. Lots of folks still prefer character-mode mail readers (technically known as MUAs), and it makes no sense to alienate those users by sending mail that they can’t conveniently read. This recipe shows how easy Python makes the task of sending an email in both HTML and text forms.
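For comparison, here is a minimal sketch of the same kind of message built with the email package instead of MimeWriter. It assumes Python 3 and the email.mime classes, and it is not the recipe's own code: the function name create_html_mail_with_email is hypothetical, and UTF-8 is used for both parts as a simplifying choice rather than the charsets the recipe uses.

```python
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText

def create_html_mail_with_email(subject, html, text):
    """Return a multipart/alternative message, as a string, with text and HTML parts."""
    msg = MIMEMultipart("alternative")
    msg["Subject"] = subject
    # Plain text first, HTML last: the last alternative is the preferred one
    msg.attach(MIMEText(text, "plain", "utf-8"))
    msg.attach(MIMEText(html, "html", "utf-8"))
    return msg.as_string()

print(create_html_mail_with_email("Today's Newsletter!", "<h1>Hello</h1>", "Hello"))
```

The resulting string can be handed to smtplib's sendmail in the same way as the output of createhtmlmail.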

Ideally, your input will be a properly formatted text version of the message, as well as the HTML version. But, if you don’t have such nice textual input, you can still prepare a text version on the fly starting from the HTML version; one way to prepare such text is shown in the recipe. Remember that htmllib has some limitations, so you may want to use alternative approaches, such as saving the HTML string to disk and then using:

text = os.popen('lynx -dump %s' % tempfile).read( )

or whatever works best for you. Alternatively, if all you have as input is plain text (following some specific conventions, such as empty lines to mark paragraphs and underlines for emphasis), you can parse the text and throw together some HTML markup on the fly.

The emails generated by this code have been successfully read on Outlook 2000, Eudora 4.2, Hotmail, and Netscape Mail. It’s likely that they will work in other HTML-capable MUAs as well. Mutt has been used to test the acceptance of messages generated by this recipe in text-only MUAs. Again, other such MUAs can be expected to work just as acceptably.

See Also

Recipe 13.6 shows how the email package in the Python Standard Library can also be used to compose a MIME multipart message; documentation in the Library Reference and Python in a Nutshell about the standard library package email, as well as modules mimetools, MimeWriter, htmllib, formatter, cStringIO, and smtplib; Henry Minsky’s article about MIME (http://www.arsdigita.com/asj/mime/) for information on various issues related to sending HTML mail.

13.6. Bundling Files in a MIME Message

Credit: Matthew Dixon Cowles, Hans Fangohr, John Pywtorak

Problem

You want to create a multipart MIME (Multipurpose Internet Mail Extensions) message that includes all files in the current directory.

Solution

If you often deal with composing or parsing mail messages, or mail-like messages such as Usenet news posts, the Python Standard Library email package gives you very powerful tools to work with. Here is a module that uses email to solve the task posed in the “Problem”:

#!/usr/bin/env python
import base64, quopri
import mimetypes, email.Generator, email.Message
import cStringIO, os
# sample addresses
toAddr = "[email protected]"
fromAddr = "[email protected]"
outputFile = "dirContentsMail"
def main( ):
    mainMsg = email.Message.Message( )
    mainMsg["To"] = toAddr
    mainMsg["From"] = fromAddr
    mainMsg["Subject"] = "Directory contents"
    mainMsg["Mime-version"] = "1.0"
    mainMsg["Content-type"] = "Multipart/mixed"
    mainMsg.preamble = "Mime message\n"
    mainMsg.epilogue = "" # to ensure that message ends with newline
    # Get names of plain files (not subdirectories or special files)
    fileNames = [f for f in os.listdir(os.curdir) if os.path.isfile(f)]
    for fileName in fileNames:
        contentType, ignored = mimetypes.guess_type(fileName)
        if contentType is None:     # If no guess, use generic opaque type
            contentType = "application/octet-stream"
        contentsEncoded = cStringIO.StringIO( )
        f = open(fileName, "rb")
        mainType = contentType[:contentType.find("/")]
        if mainType=="text":
            cte = "quoted-printable"
            quopri.encode(f, contentsEncoded, 1)   # 1 to also encode tabs
        else:
            cte = "base64"
            base64.encode(f, contentsEncoded)
        f.close( )
        subMsg = email.Message.Message( )
        subMsg.add_header("Content-type", contentType, name=fileName)
        subMsg.add_header("Content-transfer-encoding", cte)
        subMsg.set_payload(contentsEncoded.getvalue( ))
        contentsEncoded.close( )
        mainMsg.attach(subMsg)
    f = open(outputFile, "wb")
    g = email.Generator.Generator(f)
    g.flatten(mainMsg)
    f.close( )
    return None
if __name__ == "__main__":
    main( )

Discussion

The email package makes manipulating MIME messages a snap. The Python Standard Library also offers other older modules that can serve many of the same purposes, but I suggest you look into email as an alternative to all such other modules. email requires some study because it is a very functionally rich package, but it will amply repay the time you spend studying it.

MIME is the Internet standard for sending files and non-ASCII data by email. The standard is specified in RFCs 2045-2049. A few points are especially worth keeping in mind:

  • The original specification for the format of an email (RFC 822) didn’t allow for non-ASCII characters and had no provision for attaching or enclosing a file along with a text message. Therefore, not surprisingly, MIME messages are very common these days.

  • Messages that follow the MIME standard are backward compatible with ordinary RFC 822 (now RFC 2822) messages. An old mail reader (technically, an MUA) that doesn’t understand the MIME specification will probably not be able to display a MIME message in a way that’s useful to the user, but the message will still be legal and therefore shouldn’t cause unexpected behavior.

  • An RFC 2822 message consists of a set of headers, a blank line, and a body. MIME handles attachments and other multipart documents by specifying a format for the message’s body. In multipart MIME messages, the body is divided into submessages, each of which has a set of headers, a blank line, and a body. Generally, each submessage is referred to as a MIME part, and parts may nest recursively.

  • MIME parts (whether or not in a multipart message) that contain characters outside of the strict US-ASCII range are encoded as either base-64 or quoted-printable data, so that the resulting mail message contains only ordinary ASCII characters. Data can be encoded with either method, but, generally, only data that has few non-ASCII characters (basically text, possibly with a few extra characters outside of the ASCII range, such as national characters in Latin-1 and similar codes) is worth encoding as quoted-printable, because even without decoding it may be readable. If the data is essentially binary, with all bytes being equally likely, base-64 encoding is more compact.
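The trade-off between the two encodings described in the last point can be measured directly. This is a small sketch for Python 3 (an assumption); the sample data and the encoded_sizes helper are made up for illustration.

```python
import base64
import quopri

def encoded_sizes(data):
    """Return (base64 length, quoted-printable length) for some bytes."""
    return len(base64.b64encode(data)), len(quopri.encodestring(data))

# Mostly ASCII text with two Latin-1 characters
text = "na\xefve caf\xe9, mostly plain ASCII text".encode("latin-1")
# "Essentially binary" data: every byte value once
binary = bytes(range(256))

print(quopri.encodestring(text))        # still largely readable
print(encoded_sizes(text))
print(encoded_sizes(binary))
```

Base-64 always grows the data by about a third, while quoted-printable costs three bytes for every byte it has to escape, so it wins only when few bytes need escaping.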

Not surprisingly, given all of these issues, manipulating MIME messages is often considered to be a nuisance. In the old times, back before Python 2.2, the standard library’s modules for dealing with MIME messages were quite useful but rather miscellaneous. In particular, putting MIME messages together and taking them apart required two distinct approaches. The email package, which was added in Python 2.2, unified and simplified these two related jobs.

See Also

Recipe 13.7 shows how the email package can be used to unpack a MIME message; documentation for the standard library modules email, mimetypes, base64, quopri, and cStringIO in the Library Reference and Python in a Nutshell.

13.7. Unpacking a Multipart MIME Message

Credit: Matthew Cowles

Problem

You want to unpack a multipart MIME message.

Solution

The walk method of message objects generated by the email package makes this task really easy. Here is a script that uses email to solve the task posed in the “Problem”:

import email.Parser
import os, sys
def main( ):
    if len(sys.argv) != 2:
        print "Usage: %s filename" % os.path.basename(sys.argv[0])
        sys.exit(1)
    mailFile = open(sys.argv[1], "rb")
    p = email.Parser.Parser( )
    msg = p.parse(mailFile)
    mailFile.close( )
    partCounter = 1
    for part in msg.walk( ):
        if part.get_main_type( ) == "multipart":
            continue
        name = part.get_param("name")
        if name is None:
            name = "part-%i" % partCounter
        partCounter += 1
        # In real life, make sure that name is a reasonable filename 
        # for your OS; otherwise, mangle that name until it is!
        f = open(name, "wb")
        f.write(part.get_payload(decode=1))
        f.close( )
        print name
if __name__ == "__main__":
    main( )

Discussion

The email package makes parsing MIME messages reasonably easy. This recipe shows how to unbundle a MIME message with the email package by using the walk method of message objects.

You can create a message object in several ways. For example, you can instantiate the email.Message.Message class and build the message object’s contents with calls to its methods. In this recipe, however, I need to read and analyze an existing message, so I work the other way around, calling the parse method of an email.Parser.Parser instance. The parse method takes as its only argument a file-like object (in the recipe, I pass it a real file object that I just opened for binary reading with the built-in open function) and returns a message object, on which you can call message object methods.

The walk method is a generator (i.e., it returns an iterator object on which you can loop with a for statement). You usually will use this method exactly as I use it in this recipe:

for part in msg.walk( ):

The iterator sequentially returns (depth-first, in case of nesting) the parts that make up the message. If the message is not a container of parts (i.e., has no attachments or alternates—message.is_multipart returns false), no problem: the walk method will then return an iterator with a single element—the message itself. In any case, each element of the iterator is also a message object (an instance of email.Message.Message), so you can call on it any of the methods that a message object supplies.
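The traversal order is easy to observe on a message built in memory. This sketch assumes Python 3 and the email.mime helper classes; the message contents and the walk_types helper are made up for illustration. Note that walk yields the container message itself before descending into its parts.

```python
from email.mime.application import MIMEApplication
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText

def walk_types(msg):
    """Return the content types in the order in which walk visits them."""
    return [part.get_content_type() for part in msg.walk()]

# A nested message: mixed(alternative(plain, html), binary attachment)
inner = MIMEMultipart("alternative")
inner.attach(MIMEText("plain body", "plain"))
inner.attach(MIMEText("<p>html body</p>", "html"))
outer = MIMEMultipart("mixed")
outer.attach(inner)
outer.attach(MIMEApplication(b"\x00\x01\x02", name="data.bin"))

# A message that is not a container at all
single = MIMEText("just text", "plain")

print(walk_types(outer))
print(walk_types(single))
```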

In a multipart message, parts with a type of 'multipart/something' (i.e., a main type of 'multipart') may be present. In this recipe, I skip them explicitly since they’re just glue holding the true parts together. I use the get_main_type method to obtain the main type and check it for equality with 'multipart'; if equality holds, I skip this part and move to the next one with a continue statement. When I know I have a real part in hand, I locate its name (or synthesize one if it has no name), open that name as a file, and write the message’s contents (also known as the message’s payload), which I get by calling the get_payload method, into the file. I use the decode=1 argument to ensure that the payload is decoded back to a binary content (e.g., an image, a sound file, a movie) if needed, rather than remaining in text form. If the payload is not encoded, decode=1 is innocuous, so I don’t have to check before I pass it.
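The following sketch, for Python 3 (an assumption), shows payload decoding on a hand-written base-64 part, together with one conservative way to follow the advice in the recipe's comment about making the name a reasonable filename. The safe_filename helper and its allow-list of characters are hypothetical; adapt them to your own operating system's rules.

```python
import os
import re
from email import message_from_string

# A hypothetical single-part message whose name tries to escape the directory
RAW = """\
MIME-Version: 1.0
Content-Type: application/octet-stream; name="../odd name?.bin"
Content-Transfer-Encoding: base64

AAEC/w==
"""

def safe_filename(name, fallback="part"):
    """Drop directories and replace anything outside a small safe character set."""
    name = os.path.basename(name.replace("\\", "/"))
    name = re.sub(r"[^A-Za-z0-9._-]", "_", name).lstrip(".")
    return name or fallback

part = message_from_string(RAW)
payload = part.get_payload(decode=True)     # bytes, base-64 already undone
filename = safe_filename(part.get_param("name"))
print(filename, payload)
```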

See Also

Recipe 13.6; documentation for the standard library package email in the Library Reference.

13.8. Removing Attachments from an Email Message

Credit: Anthony Baxter

Problem

You’re handling email in Python and need to remove from email messages any attachments that might be dangerous.

Solution

Regular expressions can help us identify dangerous content types and file extensions, and thus code a function to remove any potentially dangerous attachments:

ReplFormat = """
This message contained an attachment that was stripped out.
The filename was: %(filename)s,
The original type was: %(content_type)s
(and it had additional parameters of:
%(params)s)
"""
import re
BAD_CONTENT_RE = re.compile('application/(msword|msexcel)', re.I)
BAD_FILEEXT_RE = re.compile(r'(\.exe|\.zip|\.pif|\.scr|\.ps)$')
def sanitise(msg):
    ''' Strip out all potentially dangerous payloads from a message '''
    ct = msg.get_content_type( )
    fn = msg.get_filename( )
    if BAD_CONTENT_RE.search(ct) or (fn and BAD_FILEEXT_RE.search(fn)):
        # bad message-part, pull out info for reporting then destroy it
        # present the parameters to the content-type, list of key, value
        # pairs, as key=value forms joined by comma-space
        params = msg.get_params( )[1:]
        params = ', '.join([ '='.join(p) for p in params ])
        # put informative message text as new payload
        replace = ReplFormat % dict(content_type=ct, filename=fn, params=params)
        msg.set_payload(replace)
        # now remove parameters and set contents in content-type header
        for k, v in msg.get_params( )[1:]:
            msg.del_param(k)
        msg.set_type('text/plain')
        # Also remove headers that make no sense without content-type
        del msg['Content-Transfer-Encoding']
        del msg['Content-Disposition']
    else:
        # Now we check for any sub-parts to the message
        if msg.is_multipart( ):
            # Call sanitise recursively on any subparts
            payload = [ sanitise(x) for x in msg.get_payload( ) ]
            # Replace the payload with our list of sanitised parts
            msg.set_payload(payload)
    # Return the sanitised message
    return msg
# Add a simple driver/example to show how to use this function
if __name__ == '__main__':
    import email, sys
    m = email.message_from_file(open(sys.argv[1]))
    print sanitise(m)

Discussion

This issue has come up a few times on the newsgroup comp.lang.python, so I decided to post a cookbook entry to show how easy it is to deal with this kind of task. Specifically, this recipe shows how to read in an email message, strip out any dangerous or suspicious attachments, and replace them with a harmless text message informing the user of the alterations that we’ve performed.

This kind of task is particularly important when end users are using something like Microsoft Outlook, which is targeted by harmful virus and worm messages (collectively known as malware) on a daily basis.

The email parser in Python 2.4 has been completely rewritten to be robust first, correct second. Prior to that version, the parser was written for correctness first. But focusing on correctness was a problem because many virus/worm messages and other malware routinely send email messages that are broken and nonconformant—malformed to the point that the old email parser chokes and dies. The new parser is designed to never actually break when reading a message. Instead, it tries its best to fix whatever it can fix in the message. (If you have a message that causes the parser to crash, please let us, the core Python developers, know. It’s a bug, and we’ll fix it. Please include a copy of the message that makes the parser crash, or else it’s very unlikely that we can reproduce your problem!)

The recipe’s code itself is fairly well commented and should be easy enough to follow. A mail message consists of one or more parts; each of these parts can contain nested parts. We call the sanitise function on the top-level Message object, and it calls itself recursively on the subobjects if and as needed.

The sanitise function first checks the Content-Type of the part, and if there’s a filename, it also checks that filename’s extension against a known-to-be-bad list. If the message part is bad, we replace its payload with a short text description of the now-removed part, set the part’s Content-Type to 'text/plain', and remove the other headers (Content-Transfer-Encoding and Content-Disposition) that no longer make sense.
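The test that decides whether a part is bad can be tried on its own. The following is a small sketch, not the recipe's code: the function name is_suspicious is hypothetical, and making the extension check case-insensitive is an assumption of this example (the recipe's pattern is case-sensitive):

```python
import re

BAD_CONTENT_RE = re.compile(r'application/(msword|msexcel)', re.I)
# The dot is escaped so that it matches only a literal dot before the extension.
BAD_FILEEXT_RE = re.compile(r'\.(exe|zip|pif|scr|ps)$', re.I)

def is_suspicious(content_type, filename=None):
    """True if the content type or the filename extension is on the bad list."""
    if BAD_CONTENT_RE.search(content_type):
        return True
    return bool(filename and BAD_FILEEXT_RE.search(filename))

print(is_suspicious('application/msword'))                   # True
print(is_suspicious('application/octet-stream', 'run.EXE'))  # True
print(is_suspicious('text/plain', 'notes.txt'))              # False
```

A part without a filename is judged on its content type alone, which mirrors the `fn and ...` guard in sanitise.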

Finally, we check whether the message is a multipart message. If so, it means the message has subparts, so we recursively call the sanitise function on each of them. We then replace the payload with our list of sanitized subparts.

If you’re interested in working further on this recipe, the most important extra functionality, which is easy to add with a small amount of work, might be to store the attached file in some directory (instead of destroying all suspect attachments), and give the user a link to that file. Also consider extending the check in sanitise that filters dangerous attachments to have it verify more than just the content type and file extension; other headers may be able to carry known signs of worm or virus messages.
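One possible shape for the "store the attachment instead of destroying it" extension is sketched below. Everything here is an assumption made for illustration: the helper name quarantine_part, the filename-cleaning rule, and the use of Python 3's email.message.EmailMessage to build a test message. How you present the saved path to the user is left open:

```python
import os
import re
import tempfile
from email.message import EmailMessage

def quarantine_part(part, directory):
    """Save the decoded payload of `part` under `directory`; return the path."""
    data = part.get_payload(decode=True) or b''
    # Never trust a sender-supplied name: keep only the last path component
    # and replace anything outside a conservative character set.
    name = os.path.basename(part.get_filename() or 'unnamed')
    name = re.sub(r'[^A-Za-z0-9._-]', '_', name)
    fd, path = tempfile.mkstemp(prefix='quarantine-', suffix='-' + name,
                                dir=directory)
    with os.fdopen(fd, 'wb') as f:
        f.write(data)
    return path

def demo():
    """Build a message with one attachment, quarantine it, read it back."""
    msg = EmailMessage()
    msg.set_content('see attachment')
    msg.add_attachment(b'MZ\x90\x00', maintype='application',
                       subtype='octet-stream', filename='../evil file.exe')
    part = next(msg.iter_attachments())
    with tempfile.TemporaryDirectory() as directory:
        path = quarantine_part(part, directory)
        with open(path, 'rb') as f:
            return os.path.basename(path), f.read()
```

Using mkstemp means two attachments with the same name never overwrite each other, and the stored file never lands outside the quarantine directory.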

See Also

Documentation for the standard library modules email and re in the Library Reference and Python in a Nutshell.

13.9. Fixing Messages Parsed by Python 2.4 email.FeedParser

Credit: Matthew Cowles

Problem

You’re using Python 2.4’s new email.FeedParser module, but sometimes, when dealing with badly malformed incoming messages, that module produces message objects that are internally inconsistent (e.g., a message has a content-type header that says the message is multipart, but the body isn’t), and you need to fix those inconsistencies.

Solution

Python 2.4’s new standard library module email.FeedParser is very useful, but a little post-processing on the messages it returns can heuristically fix some inconsistencies and make it even better. Here’s a module containing a class and a few functions to help with this task:

import email, email.FeedParser
import re, sys, sgmllib
# what chars are non-Ascii, what max fraction of them can be in a text part
kGuessBinaryThreshold = 0.2
kGuessBinaryRE = re.compile(r"[\000-\025\200-\377]")
# what max fraction of HTML tags can be in a text (non-HTML) part
kGuessHTMLThreshold = 0.05
class Cleaner(sgmllib.SGMLParser):
    entitydefs = {"nbsp": " "}  # I'll break if I want to
    def __init__(self):
        sgmllib.SGMLParser.__init__(self)
        self.result = [  ]
    def do_p(self, *junk):
        self.result.append('\n')
    def do_br(self, *junk):
        self.result.append('\n')
    def handle_data(self, data):
        self.result.append(data)
    def cleaned_text(self):
        return ''.join(self.result)
def stripHTML(text):
    ''' return text, with HTML tags stripped '''
    c = Cleaner( )
    try:
        c.feed(text)
    except sgmllib.SGMLParseError:
        return text
    else:
        return c.cleaned_text( )
def guessIsBinary(text):
    ''' return whether we can heuristically guess 'text' is binary '''
    if not text: return False
    nMatches = float(len(kGuessBinaryRE.findall(text)))
    return nMatches/len(text) >= kGuessBinaryThreshold
def guessIsHTML(text):
    ''' return whether we can heuristically guess 'text' is HTML '''
    if not text: return False
    lt = len(text)
    textWithoutTags = stripHTML(text)
    tagsChars = float(lt-len(textWithoutTags))
    return tagsChars/lt >= kGuessHTMLThreshold
def getMungedMessage(openFile):
    openFile.seek(0)
    p = email.FeedParser.FeedParser( )
    p.feed(openFile.read( ))
    m = p.close( )
    # Fix up multipart content-type when message isn't multi-part
    if m.get_main_type( )=="multipart" and not m.is_multipart( ):
        t = m.get_payload(decode=1)
        if guessIsBinary(t):
            # Use generic "opaque" type
            m.set_type("application/octet-stream")
        elif guessIsHTML(t):
            m.set_type("text/html")
        else:
            m.set_type("text/plain")
    return m

Discussion

FeedParser is a new module in the Python 2.4 Standard Library’s email package. The module’s name comes from the fact that it maintains a buffer, so that you don’t have to give it all the text at once. Possibly more interesting is that the module doesn’t raise an error when called on malformed messages; instead, it tries to make some sense of them and return a useful email.Message object. That’s useful because so much mail is spam and so much spam is malformed.

The other side of the coin, given that the heroic feed parser works on incorrect messages, is that you can get back from it an email.Message object that’s internally inconsistent. This recipe tries to make sense of one kind of inconsistency: a message with a content-type header that says that the message is multipart, but the body isn’t.
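Both points, feeding the text piecemeal and getting back an inconsistent message, can be seen in a few lines. This sketch assumes Python 3, where the module is spelled email.feedparser and get_content_maintype replaces get_main_type. The helper names are invented for the example:

```python
from email.feedparser import FeedParser

def parse_in_chunks(chunks):
    """Feed the parser one chunk at a time; chunks need not end on line breaks."""
    parser = FeedParser()
    for chunk in chunks:
        parser.feed(chunk)
    return parser.close()

def is_inconsistent_multipart(msg):
    """True when the header claims multipart but the body holds no subparts."""
    return msg.get_content_maintype() == 'multipart' and not msg.is_multipart()

# The header claims multipart/mixed, but there is no boundary and no subparts.
chunks = ['Content-Type: multi', 'part/mixed\nSubject: broken\n',
          '\nno boundary here\n']
broken = parse_in_chunks(chunks)

print(is_inconsistent_multipart(broken))   # True
print(broken.get_payload())                # the body, kept as a plain string
print(broken.defects)                      # what the parser noticed along the way
```

The parser raises no exception; it records what it found odd in the message's defects list and hands back the body as a string, which is exactly the situation getMungedMessage tries to repair.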

The heuristics that the recipe uses to guess at the correct content-type are inevitably messy. Still, better to have such messy heuristics in recipes, rather than embedded forever in the Python Standard Library.

See Also

Documentation for the standard library package email in the Python 2.4 Library Reference.

13.10. Inspecting a POP3 Mailbox Interactively

Credit: Xavier Defrang

Problem

You have a POP3 mailbox somewhere, perhaps on a slow connection, and need to examine messages and possibly mark them for deletion interactively.

Solution

The poplib module of the Python Standard Library lets you write a script to solve this task quite easily:

# Interactive script to clean POP3 mailboxes from malformed or too-large mails
#
# Iterates over nonretrieved mails, prints selected elements from the headers,
# prompts interactively about whether each message should be deleted
import sys, getpass, poplib, re
# Change according to your needs: POP host, userid, and password
POPHOST = "pop.domain.com"
POPUSER = "jdoe"
POPPASS = ""
# How many lines to retrieve from body, and which headers to retrieve
MAXLINES = 10
HEADERS = "From To Subject".split( )
args = len(sys.argv)
if args>1: POPHOST = sys.argv[1]
if args>2: POPUSER = sys.argv[2]
if args>3: POPPASS = sys.argv[3]
if args>4: MAXLINES= int(sys.argv[4])
if args>5: HEADERS = sys.argv[5:]
# An RE to identify the headers you're actually interested in
rx_headers  = re.compile('|'.join(HEADERS), re.IGNORECASE)
try:
    # Connect to the POP server and identify the user
    pop = poplib.POP3(POPHOST)
    pop.user(POPUSER)
    # Authenticate user
    if not POPPASS or POPPASS=='=':
        # If no password was supplied, ask for the password
        POPPASS = getpass.getpass("Password for %s@%s:" % (POPUSER, POPHOST))
    pop.pass_(POPPASS)
    # Get and print some general information (msg_count, box_size)
    stat = pop.stat( )
    print "Logged in as %s@%s" % (POPUSER, POPHOST)
    print "Status: %d message(s), %d bytes" % stat
    bye = False
    count_del = 0
    for msgnum in range(1, 1+stat[0]):
        # Retrieve headers
        response, lines, bytes = pop.top(msgnum, MAXLINES)
        # Print message info and headers you're interested in
        print "Message %d (%d bytes)" % (msgnum, bytes)
        print "-" * 30
        print "\n".join(filter(rx_headers.match, lines))
        print "-" * 30
        # Input loop
        while True:
            k = raw_input("(d=delete, s=skip, v=view, q=quit) What? ")
            k = k[:1].lower( )
            if k == 'd':
                # Mark message for deletion
                k = raw_input("Delete message %d? (y/n) " % msgnum)
                if k in ("y", "Y"):
                    pop.dele(msgnum)
                    print "Message %d marked for deletion" % msgnum
                    count_del += 1
                    break
            elif k == 's':
                print "Message %d left on server" % msgnum
                break
            elif k == 'v':
                print "-" * 30
                print "\n".join(lines)
                print "-" * 30
            elif k == 'q':
                bye = True
                break
        # Time to say goodbye?
        if bye:
            print "Bye"
            break
    # Summary
    print "Deleting %d message(s) in mailbox %s@%s" % (
        count_del, POPUSER, POPHOST)
    # Commit operations and disconnect from server
    print "Closing POP3 session"
    pop.quit( )
except poplib.error_proto, detail:
    # Fancy error handling
    print "POP3 Protocol Error:", detail

Discussion

Sometimes your POP3 mailbox is behind a slow Internet link, and you don’t want to wait for that funny 10MB MPEG movie that you already received twice yesterday to be fully downloaded before you can read your mail. Or maybe a peculiar malformed message is hanging your MUA. Issues of this kind are best tackled interactively, but you need a helpful script to let you examine data about each message and determine which messages should be removed.

I used to deal with this kind of thing by telneting to the POP (Post Office Protocol) server and trying to remember the POP3 protocol commands (while hoping that the server implements the help command in particular). Nowadays, I use the script presented in this recipe to inspect my mailbox and do some cleaning. Basically, the Python Standard Library POP3 module, poplib, remembers the protocol commands on my behalf, and this script helps me use those commands appropriately.

The script in this recipe uses the poplib module to connect to your mailbox. It then prompts you about what to do with each undelivered message. You can view the top of the message, leave the message on the server, or mark the message for deletion. No particular tricks or hacks are used in this piece of code: it’s a simple example of poplib usage. In addition to being practically useful in emergencies, it can show you how poplib works. The poplib.POP3 call returns an object that is ready for connection to a POP3 server specified as its argument. We complete the connection by calling the user and pass_ methods to specify a user ID and password. Note the trailing underscore in pass_: this method could not be called pass because that is a Python keyword (the do-nothing statement), and by convention, such issues are often solved by appending an underscore to the identifier.

After connection, we keep working with methods of the pop object. The stat method returns the number of messages and the total size of the mailbox in bytes. The top method takes a message-number argument and returns information about that message, as well as the message itself as a list of lines. (You can specify a second argument n to ensure that no more than n lines are returned.) The dele method also takes a message-number argument and deletes that message from the mailbox (without renumbering all other messages). When we’re done, we call the quit method. If you’re familiar with the POP3 protocol, you’ll notice the close correspondence between these methods and the POP3 commands.
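The header filtering that the script applies to the lines returned by top can be tried without a server. The sketch below is a variation, not the script's exact logic: the function name select_headers is hypothetical, the pattern requires a colon right after the header name, and scanning stops at the first blank line. It assumes Python 3, where poplib returns each line as a bytes object:

```python
import re

def select_headers(lines, wanted=('From', 'To', 'Subject')):
    """Return the header lines whose name is in `wanted`, ignoring case."""
    names = '|'.join(re.escape(name) for name in wanted)
    pattern = re.compile(r'(?:%s):' % names, re.IGNORECASE)
    selected = []
    for line in lines:
        if isinstance(line, bytes):
            line = line.decode('latin-1')   # never fails, whatever the bytes
        if not line:
            break                           # a blank line ends the headers
        if pattern.match(line):
            selected.append(line)
    return selected

# Lines shaped like the second item of the tuple that pop.top returns.
lines = [b'From: a@example.com', b'Total: 3', b'subject: hi', b'',
         b'To: this is body text']
print(select_headers(lines))   # ['From: a@example.com', 'subject: hi']
```

Requiring the colon keeps a header such as Total from matching the name To, and stopping at the blank line keeps body text out of the summary.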

See Also

Documentation for the standard library modules poplib and getpass in the Library Reference and Python in a Nutshell; the POP protocol is described in RFC 1939 (http://www.ietf.org/rfc/rfc1939.txt).

13.11. Detecting Inactive Computers

Credit: Nicola Larosa

Problem

You need to monitor the working state of a number of computers connected to a TCP/IP network.

Solution

The key idea in this recipe is to have every computer periodically send a heartbeat UDP packet to a computer acting as the server for this heartbeat-monitoring service. The server keeps track of how much time has passed since each computer last sent a heartbeat and reports on computers that have been silent for too long.

Here is the “client” program, HeartbeatClient.py, which must run on every computer we need to monitor:

""" Heartbeat client, sends out a UDP packet periodically """
import socket, time
SERVER_IP = '192.168.0.15'; SERVER_PORT = 43278; BEAT_PERIOD = 5
print 'Sending heartbeat to IP %s , port %d' % (SERVER_IP, SERVER_PORT)
print 'press Ctrl-C to stop'
while True:
    hbSocket = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    hbSocket.sendto('PyHB', (SERVER_IP, SERVER_PORT))
    if __debug__:
        print 'Time: %s' % time.ctime( )
    time.sleep(BEAT_PERIOD)

The server program, which receives and keeps track of these “heartbeats”, must run on the machine whose address is given as SERVER_IP in the “client” program. The server must support concurrency, since many heartbeats from different computers might arrive simultaneously. A server program has essentially two ways to support concurrency: multithreading, or asynchronous operation. Here is a multithreaded ThreadedBeatServer.py, using only modules from the Python Standard Library:

""" Threaded heartbeat server """
import socket, threading, time
UDP_PORT = 43278; CHECK_PERIOD = 20; CHECK_TIMEOUT = 15
class Heartbeats(dict):
    """ Manage shared heartbeats dictionary with thread locking """
    def __init__(self):
        super(Heartbeats, self).__init__( )
        self._lock = threading.Lock( )
    def __setitem__(self, key, value):
        """ Create or update the dictionary entry for a client """
        self._lock.acquire( )
        try:
            super(Heartbeats, self).__setitem__(key, value)
        finally:
            self._lock.release( )
    def getSilent(self):
        """ Return a list of clients with heartbeat older than CHECK_TIMEOUT """
        limit = time.time( ) - CHECK_TIMEOUT
        self._lock.acquire( )
        try:
            silent = [ip for (ip, ipTime) in self.items( ) if ipTime < limit]
        finally:
            self._lock.release( )
        return silent
class Receiver(threading.Thread):
    """ Receive UDP packets and log them in the heartbeats dictionary """
    def __init__(self, goOnEvent, heartbeats):
        super(Receiver, self).__init__( )
        self.goOnEvent = goOnEvent
        self.heartbeats = heartbeats
        self.recSocket = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        self.recSocket.settimeout(CHECK_TIMEOUT)
        self.recSocket.bind(('', UDP_PORT))
    def run(self):
        while self.goOnEvent.isSet( ):
            try:
                data, addr = self.recSocket.recvfrom(5)
                if data == 'PyHB':
                    self.heartbeats[addr[0]] = time.time( )
            except socket.timeout:
                pass
def main(num_receivers=3):
    receiverEvent = threading.Event( )
    receiverEvent.set( )
    heartbeats = Heartbeats( )
    receivers = [  ]
    for i in range(num_receivers):
        receiver = Receiver(goOnEvent=receiverEvent, heartbeats=heartbeats)
        receiver.start( )
        receivers.append(receiver)
    print 'Threaded heartbeat server listening on port %d' % UDP_PORT
    print 'press Ctrl-C to stop'
    try:
        while True:
            silent = heartbeats.getSilent( )
            print 'Silent clients: %s' % silent
            time.sleep(CHECK_PERIOD)
    except KeyboardInterrupt:
        print 'Exiting, please wait...'
        receiverEvent.clear( )
        for receiver in receivers:
            receiver.join( )
        print 'Finished.'
if __name__ == '__main__':
    main( )

As an alternative, here is an asynchronous AsyncBeatServer.py program based on the powerful Twisted framework:

import time
from twisted.application import internet, service
from twisted.internet import protocol
from twisted.python import log
UDP_PORT = 43278; CHECK_PERIOD = 20; CHECK_TIMEOUT = 15
class Receiver(protocol.DatagramProtocol):
    """ Receive UDP packets and log them in the "client"s dictionary """
    def datagramReceived(self, data, (ip, port)):
        if data == 'PyHB':
            self.callback(ip)
class DetectorService(internet.TimerService):
    """ Detect clients not sending heartbeats for too long """
    def __init__(self):
        internet.TimerService.__init__(self, CHECK_PERIOD, self.detect)
        self.beats = {  }
    def update(self, ip):
        self.beats[ip] = time.time( )
    def detect(self):
        """ Log a list of clients with heartbeat older than CHECK_TIMEOUT """
        limit = time.time( ) - CHECK_TIMEOUT
        silent = [ip for (ip, ipTime) in self.beats.items( ) if ipTime < limit]
        log.msg('Silent clients: %s' % silent)
application = service.Application('Heartbeat')
# define and link the silent clients' detector service
detectorSvc = DetectorService( )
detectorSvc.setServiceParent(application)
# create an instance of the Receiver protocol, and give it the callback
receiver = Receiver( )
receiver.callback = detectorSvc.update
# define and link the UDP server service, passing the receiver in
udpServer = internet.UDPServer(UDP_PORT, receiver)
udpServer.setServiceParent(application)
# each service is started automatically by Twisted at launch time
log.msg('Asynchronous heartbeat server listening on port %d\n'
    'press Ctrl-C to stop\n' % UDP_PORT)

Discussion

When a number of computers are connected by a TCP/IP network, we are often interested in monitoring their working state. The client and server programs presented in this recipe help you detect when a computer stops working, while having minimal impact on network traffic and requiring very little setup. Note that this recipe does not monitor the working state of single, specific services running on a machine, just that of the TCP/IP stack and the underlying operating system and hardware components.

This PyHeartBeat approach is made up of two files: a client program, HeartbeatClient.py, sends UDP packets to the server, while a server program, either ThreadedBeatServer.py (using only modules from the Python Standard Library to implement a multithreaded approach) or AsyncBeatServer.py (implementing an asynchronous approach based on the powerful Twisted framework), runs on a central computer to listen for such packets and detect inactive clients. Client programs, running on any number of computers, periodically send UDP packets to the server program that runs on the central computer. The server program, in either version, dynamically builds a dictionary that stores the IP addresses of the “client” computers and the timestamp of the last packet received from each one. At the same time, the server program periodically checks the dictionary, checking whether any of the timestamps are older than a defined timeout, to identify clients that have been silent too long.

In this kind of application, there is no need to use reliable TCP connections since the loss of a packet now and then does not produce false alarms, as long as the server-checking timeout is kept suitably larger than the “client”-sending period. Since we may have hundreds of computers to monitor, it is best to keep the bandwidth used and the load on the server at a minimum: we do this by periodically sending a small UDP packet, instead of setting up a relatively expensive TCP connection per client.
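To see how little is involved in one heartbeat, here is a sketch of a single send and a single receive over the loopback interface. It assumes Python 3 (so the payload is a bytes object), and the helper names send_heartbeat and receive_one are made up for the example:

```python
import socket

def send_heartbeat(address, payload=b'PyHB'):
    """Send one datagram: no connection setup, no acknowledgment expected."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        sock.sendto(payload, address)
    finally:
        sock.close()

def receive_one(sock, bufsize=16):
    """Return (data, sender_ip), or None if nothing arrives before the timeout."""
    try:
        data, addr = sock.recvfrom(bufsize)
    except socket.timeout:
        return None
    return data, addr[0]

# Demo on the loopback interface; port 0 lets the system pick a free port.
server = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
server.bind(('127.0.0.1', 0))
server.settimeout(2)
send_heartbeat(server.getsockname())
received = receive_one(server)
server.close()
print(received)
```

A lost datagram simply shows up as a timeout on the receiving side, which the server treats as "no news yet" rather than as an error.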

The packets are sent from each client every 5 seconds, while the server checks the dictionary every 20 seconds, and the server’s timeout defaults to 15 seconds. These parameters, along with the server IP number and port used, can be adapted to one’s needs.
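The check that the server performs on each pass is just a comparison of timestamps, and can be sketched as a pure function. The name silent_clients and the explicit now argument are choices made for this example so that the arithmetic is easy to follow:

```python
def silent_clients(beats, now, timeout=15):
    """Return the sorted IPs whose last heartbeat is older than `timeout`."""
    limit = now - timeout
    return sorted(ip for ip, last in beats.items() if last < limit)

# Last heartbeat times, in seconds, for two clients.
beats = {'10.0.0.1': 100.0, '10.0.0.2': 112.0}

# At time 116 the limit is 101: only the first client has been silent too long.
print(silent_clients(beats, now=116.0))   # ['10.0.0.1']
```

With a 5-second sending period and a 15-second timeout, a client can lose roughly two consecutive packets before it is reported, which is the margin the defaults are meant to provide.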

Threaded server

In the threaded server, a small number of worker threads listen to the UDP packets coming from the “clients”, while the main thread periodically checks the recorded heartbeats. The shared data structure, a dictionary, must be locked and released at each access, both while writing and reading, to avoid data corruption on concurrent access. Such data corruption would typically manifest itself as intermittent, time-dependent bugs that are difficult to reproduce, investigate, and correct.

A very sound alternative to such meticulous use of locking around access to a resource is to dedicate a specialized thread to be the only one interacting with the resource (in this case, the dictionary), while all other threads send work requests to the specialized thread with a Queue.Queue instance. A Queue-based approach is more scalable when per-resource locking gets too complicated to manage easily: Queue is less bug-prone and, in particular, avoids worries about deadlocks. See Recipe 9.3, Recipe 9.5, Recipe 9.4, and Recipe 11.9 for more information about Queue and examples of using Queue to structure the architecture of a multithreaded program.
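A bare-bones version of that Queue-based architecture might look as follows. This is a sketch under assumptions: it uses Python 3, where the module is named queue rather than Queue, and the names owner_loop and demo, as well as the None sentinel that stops the owner thread, are conventions picked for the example:

```python
import queue
import threading

def owner_loop(requests, beats):
    """The only code that ever touches `beats`: no lock is needed."""
    while True:
        item = requests.get()
        if item is None:          # sentinel: time to stop
            break
        ip, timestamp = item
        beats[ip] = timestamp

def demo():
    requests = queue.Queue()
    beats = {}
    owner = threading.Thread(target=owner_loop, args=(requests, beats))
    owner.start()

    def worker(ip):
        # Workers never touch the dictionary; they only post work requests.
        for t in range(3):
            requests.put((ip, float(t)))

    workers = [threading.Thread(target=worker, args=('10.0.0.%d' % i,))
               for i in range(1, 4)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    requests.put(None)
    owner.join()
    return beats

print(demo())
```

Because the queue preserves the order in which each worker posts its requests, the dictionary ends up holding the latest timestamp for every address, and no thread other than the owner ever reads or writes it while work is in progress.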

Asynchronous server

The Twisted server employs an asynchronous, event-driven model based on the Twisted framework (http://www.twistedmatrix.com/). The framework is built around a central “reactor” that dispatches events from a queue in a single thread, and monitors network and host resources. The user program is composed of short code fragments invoked by the reactor when dispatching the matching events. Such a working model guarantees that only one user code fragment is executing at any given time, eliminating at the root all problems of concurrent access to shared data structures. Asynchronous servers can provide excellent performance and scalability under very heavy loads, by avoiding the threading and locking overheads of multithreaded servers.
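The same single-threaded, callback-driven style can be tried without installing Twisted. The sketch below deliberately swaps in a different technique: it uses Python 3's standard asyncio module, whose event loop plays the role of the reactor. It is not Twisted code, and the names HeartbeatProtocol and demo are invented for the example:

```python
import asyncio
import time

class HeartbeatProtocol(asyncio.DatagramProtocol):
    """Call `callback(ip)` for every well-formed heartbeat datagram."""
    def __init__(self, callback):
        self.callback = callback

    def datagram_received(self, data, addr):
        if data == b'PyHB':
            self.callback(addr[0])

async def demo(timeout=2.0):
    loop = asyncio.get_running_loop()
    beats = {}
    got_beat = asyncio.Event()

    def update(ip):
        # Runs inside the event loop's single thread: no lock is needed.
        beats[ip] = time.time()
        got_beat.set()

    # Port 0 asks the operating system for any free port (demo only).
    server, _ = await loop.create_datagram_endpoint(
        lambda: HeartbeatProtocol(update), local_addr=('127.0.0.1', 0))
    port = server.get_extra_info('sockname')[1]
    client, _ = await loop.create_datagram_endpoint(
        asyncio.DatagramProtocol, remote_addr=('127.0.0.1', port))
    try:
        client.sendto(b'PyHB')
        await asyncio.wait_for(got_beat.wait(), timeout)
    finally:
        client.close()
        server.close()
    return beats

beats = asyncio.run(demo())
print(sorted(beats))
```

As in the Twisted version, the protocol object only receives datagrams and forwards the sender's address to a callback, while the dictionary is updated by code that the event loop runs one fragment at a time.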

The asynchronous server program presented in this recipe is composed of one application and two services, the UDPServer and the DetectorService, respectively. It is invoked at any command shell by means of the twistd command, with the following options:

$ twistd -ony AsyncBeatServer.py

The twistd command controls the reactor, and many other delicate facets of a server’s operation, leaving the script it loads the sole responsibility of defining a global variable named application, implementing the needed services, and connecting the service objects to the application object.

Normally, twistd runs as a daemon and logs to a file (or to other logging facilities, depending on configuration options), but in this case, with the -ony flags, we’re specifically asking twistd to run in the foreground and with logging to standard output, so we can better see what’s going on. Note that the most popular file extension for scripts to be loaded by twistd is .tac, although in this recipe I have used the more generally familiar extension .py. The choice of file extension is just a convention, in this case: twistd can work with Python source files with any file extension, since you pass the full filename, extension included, as an explicit command-line argument anyway.

See Also

Documentation for the standard library modules socket, threading, Queue and time in the Library Reference and Python in a Nutshell; twisted is at http://www.twistedmatrix.com; Jeff Bauer has a related program, known as Mr. Creosote (http://starship.python.net/crew/jbauer/creosote/), using UDP for logging information; UDP is described in depth in W. Richard Stevens, UNIX Network Programming, Volume 1: Networking APIs-Sockets and XTI, 2d ed. (Prentice-Hall); for the truly curious, the UDP protocol is defined in the two-page RFC 768 (http://www.ietf.org/rfc/rfc768.txt), which, when compared with current RFCs, shows how much the Internet infrastructure has evolved in 20 years.

13.12. Monitoring a Network with HTTP

Credit: Magnus Lyckå

Problem

You want to implement special-purpose HTTP servers to enable you to monitor your network.

Solution

The Python Standard Library BaseHTTPServer module makes it easy to implement special-purpose HTTP servers. For example, here is a special-purpose HTTP server program that runs local commands on the server host to get the data for replies to each GET request:

import BaseHTTPServer, shutil, os
from cStringIO import StringIO
class MyHTTPRequestHandler(BaseHTTPServer.BaseHTTPRequestHandler):
    # HTTP paths we serve, and what commandline-commands we serve them with
    cmds = {'/ping': 'ping www.thinkware.se',
            '/netstat' : 'netstat -a',
            '/tracert': 'tracert www.thinkware.se',
            '/srvstats': 'net statistics server',
            '/wsstats': 'net statistics workstation',
            '/route' : 'route print',
            }
    def do_GET(self):
        """ Serve a GET request. """
        f = self.send_head( )
        if f:
            f = StringIO( )
            machine = os.popen('hostname').readlines( )[0]
            if self.path == '/':
                heading = "Select a command to run on %s" % (machine)
                body = (self.getMenu( ) +
                        "<p>The screen won't update until the selected "
                        "command has finished. Please be patient.")
            else:
                heading = "Execution of ``%s'' on %s" % (
                           self.cmds[self.path], machine)
                cmd = self.cmds[self.path]
                body = '<a href="/">Main Menu</a><pre>%s</pre>\n' % \
                       os.popen(cmd).read( )
                # Translation CP437 -> Latin 1 needed for Swedish Windows.
                body = body.decode('cp437').encode('latin1')
            f.write("<html><head><title>%s</title></head>\n" % heading)
            f.write('<body><H1>%s</H1>\n' % (heading))
            f.write(body)
            f.write('</body></html>\n')
            f.seek(0)
            self.copyfile(f, self.wfile)
            f.close( )
        return f
    def do_HEAD(self):
        """ Serve a HEAD request. """
        f = self.send_head( )
        if f:
            f.close( )
    def send_head(self):
        path = self.path
        if not path in ['/'] + self.cmds.keys( ):
            head = 'Command "%s" not found. Try one of these:<ul>' % path
            msg = head + self.getMenu( )
            self.send_error(404, msg)
            return None
        self.send_response(200)
        self.send_header("Content-type", 'text/html')
        self.end_headers( )
        f = StringIO( )
        f.write("A test %s\n" % self.path)
        f.seek(0)
        return f
    def getMenu(self):
        keys = self.cmds.keys( )
        keys.sort( )
        msg = [  ]
        for k in keys:
            msg.append('<li><a href="%s">%s => %s</a></li>' %(
                                     k,  k,    self.cmds[k]))
        msg.append('</ul>')
        return "\n".join(msg)
    def copyfile(self, source, outputfile):
        shutil.copyfileobj(source, outputfile)
def main(HandlerClass = MyHTTPRequestHandler,
         ServerClass = BaseHTTPServer.HTTPServer):
    BaseHTTPServer.test(HandlerClass, ServerClass)
if __name__ == '__main__':
    main( )

Discussion

The Python Standard Library module BaseHTTPServer makes it easy to set up custom web servers on an internal network. This way, you can run commands on various machines by just visiting those servers with a browser. The code in this recipe is Windows-specific, indeed specific to the version of Windows normally run in Sweden, because it knows about code page 437 providing the encoding for the various commands’ results. The commands themselves are Windows ones, but that’s just as easy to customize for your own purposes as the encoding issue—for example, using traceroute (the Unix spelling of the command) instead of tracert (the way Windows spells it).

In this recipe, all substantial work is performed by external commands invoked by os.popen calls. Of course, it would be perfectly feasible to satisfy some or all of the requests by running actual Python code within the same process as the web server. We would normally not worry about concurrency issues for this kind of special-purpose, ad hoc, administrative server (unlike most web servers): the scenario it’s intended to cover is one system administrator sitting at her system and visiting, with her browser, various machines on the network being administered/monitored—concurrency is not really needed. If your scenario is somewhat different so that you do need concurrency, then multithreading and asynchronous operations, shown in several other recipes, are your fundamental options.
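As a sketch of the in-process alternative, the handler below answers each path by calling a Python function instead of spawning an external command. It assumes Python 3, where BaseHTTPServer lives in http.server. The class name StatusHandler, the two example pages, and the fetch helper are all hypothetical:

```python
import html
import http.client
import http.server
import platform
import threading
import time

class StatusHandler(http.server.BaseHTTPRequestHandler):
    # HTTP paths we serve, and the Python callables that produce the data
    pages = {
        '/host': lambda: platform.node(),
        '/time': lambda: time.ctime(),
    }

    def do_GET(self):
        func = self.pages.get(self.path)
        if func is None:
            self.send_error(404, 'Unknown page')
            return
        text = '<html><body><pre>%s</pre></body></html>' % html.escape(func())
        body = text.encode('utf-8')
        self.send_response(200)
        self.send_header('Content-Type', 'text/html; charset=utf-8')
        self.send_header('Content-Length', str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass   # keep the demo quiet

def fetch(path):
    """Start a server on a free local port, request `path` once, shut down."""
    server = http.server.HTTPServer(('127.0.0.1', 0), StatusHandler)
    thread = threading.Thread(target=server.serve_forever)
    thread.start()
    try:
        conn = http.client.HTTPConnection('127.0.0.1',
                                          server.server_address[1], timeout=5)
        conn.request('GET', path)
        response = conn.getresponse()
        result = response.status, response.read().decode('utf-8')
        conn.close()
        return result
    finally:
        server.shutdown()
        thread.join()
        server.server_close()

status, page = fetch('/time')
print(status)
```

The server handles one request at a time, which matches the single-administrator scenario described above. The fetch helper exists only to exercise the handler from the same script.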

See Also

Documentation for the standard library modules BaseHTTPServer, shutil, os, and cStringIO in the Library Reference and Python in a Nutshell.

13.13. Forwarding and Redirecting Network Ports

Credit: Simon Foster

Problem

You need to forward a network port to another host (forwarding), possibly to a different port number (redirecting).

Solution

Classes using the threading and socket modules can provide port forwarding and redirecting:

import sys, socket, time, threading
LOGGING = True
loglock = threading.Lock( )
def log(s, *a):
    if LOGGING:
        loglock.acquire( )
        try:
            print '%s:%s' % (time.ctime( ), (s % a))
            sys.stdout.flush( )
        finally:
            loglock.release( )
class PipeThread(threading.Thread):
    pipes = [  ]
    pipeslock = threading.Lock( )
    def __init__(self, source, sink):
        threading.Thread.__init__(self)
        self.source = source
        self.sink = sink
        log('Creating new pipe thread %s ( %s -> %s )',
             self, source.getpeername( ), sink.getpeername( ))
        self.pipeslock.acquire( )
        try: self.pipes.append(self)
        finally: self.pipeslock.release( )
        self.pipeslock.acquire( )
        try: pipes_now = len(self.pipes)
        finally: self.pipeslock.release( )
        log('%s pipes now active', pipes_now)
    def run(self):
        while True:
            try:
                data = self.source.recv(1024)
                if not data: break
                self.sink.sendall(data)   # plain send may write only part of data
            except:
                break
        log('%s terminating', self)
        self.pipeslock.acquire( )
        try: self.pipes.remove(self)
        finally: self.pipeslock.release( )
        self.pipeslock.acquire( )
        try: pipes_left = len(self.pipes)
        finally: self.pipeslock.release( )
        log('%s pipes still active', pipes_left)
class Pinhole(threading.Thread):
    def __init__(self, port, newhost, newport):
        threading.Thread.__init__(self)
        log('Redirecting: localhost:%s -> %s:%s', port, newhost, newport)
        self.newhost = newhost
        self.newport = newport
        self.sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        self.sock.bind(('', port))
        self.sock.listen(5)
    def run(self):
        while True:
            newsock, address = self.sock.accept( )
            log('Creating new session for %s:%s', *address)
            fwd = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
            fwd.connect((self.newhost, self.newport))
            PipeThread(newsock, fwd).start( )
            PipeThread(fwd, newsock).start( )

A short ending to this pinhole.py module, with the usual guard to run this part only when pinhole is run as a main script rather than imported, lets us offer this recipe’s functionality as a command-line script:

if __name__ == '__main__':
    print 'Starting Pinhole port forwarder/redirector'
    import sys
    # get the arguments, give help in case of errors
    try:
        port = int(sys.argv[1])
        newhost = sys.argv[2]
        try: newport = int(sys.argv[3])
        except IndexError: newport = port
    except (ValueError, IndexError):
        print 'Usage: %s port newhost [newport]' % sys.argv[0]
        sys.exit(1)
    # start operations
    sys.stdout = open('pinhole.log', 'w')
    Pinhole(port, newhost, newport).start( )

Discussion

Port forwarding and redirecting can often come in handy when you’re operating a network, even a small one. Applications or other services, possibly not under your control, may be hardwired to connect to servers on certain addresses or ports; by interposing a forwarder and redirector, you can send such applications’ connection requests onto any other host and/or port that suits you better.

The code in this recipe supplies two classes that liberally use threading to provide this functionality and a small “main script” at the end, with the usual if __name__ == '__main__' guard, to deliver this functionality as a command-line script. For once, the small “main script” is not just for demonstration and testing purposes but is actually quite useful on its own. For example:

# python pinhole.py 80 webserver

forwards all incoming HTTP sessions on standard port 80 to host webserver;

# python pinhole.py 23 localhost 2323

redirects all incoming telnet sessions on standard port 23 to port 2323 on this same host (since localhost is the conventional hostname for “this host” in all TCP/IP implementations).
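The core of each PipeThread is a one-way copy loop from one socket to another. The sketch below isolates that loop so it can be tried without a real network: pump and demo are hypothetical names, the two ends are local socket pairs rather than accepted connections, and the code assumes Python 2.6 or later on a platform that provides socket.socketpair. It uses sendall, which keeps writing until the whole chunk has been sent.

```python
import socket
import threading

def pump(source, sink, bufsize=1024):
    """Copy bytes from source to sink until source reports end of stream."""
    while True:
        data = source.recv(bufsize)
        if not data:
            break
        sink.sendall(data)      # sendall retries until every byte is written
    try:
        sink.shutdown(socket.SHUT_WR)   # pass the end of stream on to the reader
    except socket.error:
        pass

def demo(payload=b'hello through the pipe'):
    """Relay payload through pump over two local socket pairs; return what arrives."""
    client, relay_in = socket.socketpair()
    relay_out, server = socket.socketpair()
    worker = threading.Thread(target=pump, args=(relay_in, relay_out))
    worker.start()
    client.sendall(payload)
    client.shutdown(socket.SHUT_WR)     # tell the relay that no more data follows
    chunks = []
    while True:
        data = server.recv(1024)
        if not data:
            break
        chunks.append(data)
    worker.join()
    for s in (client, relay_in, relay_out, server):
        s.close()
    return b''.join(chunks)
```

A full forwarder runs two such loops per session, one in each direction, which is why Pinhole starts two PipeThread instances for every accepted connection.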

See Also

Documentation for the standard library modules socket and threading in the Library Reference and Python in a Nutshell.

13.14. Tunneling SSL Through a Proxy

Credit: John Nielsen

Problem

You need to tunnel SSL (Secure Socket Layer) communications through a proxy, but the Python Standard Library doesn’t support that functionality out of the box.

Solution

We can code a generic proxy, defaulting to SSL but, in fact, good for all kinds of network protocols. Save the following code as module file pytunnel.py somewhere along your Python sys.path:

import threading, socket, traceback, sys, base64, time
def recv_all(the_socket, timeout=1):
    ''' receive all data available from the_socket, waiting no more than
        ``timeout'' seconds for new data to arrive; return data as string.'''
    # use non-blocking sockets
    the_socket.setblocking(0)
    total_data = [  ]
    begin = time.time( )
    while True:
        ''' loop until timeout '''
        if total_data and time.time( )-begin > timeout:
            break     # if you got some data, then break after timeout seconds
        elif time.time( )-begin > timeout*2:
            break     # if you got no data at all yet, wait a little longer
        try:
            data = the_socket.recv(4096)
            if data:
                total_data.append(data)
                begin = time.time( )       # reset start-of-wait time
            else:
                time.sleep(0.1)           # give data some time to arrive
        except:
            pass
    return ''.join(total_data)
class thread_it(threading.Thread):
    ''' thread instance to run a tunnel, or a tunnel-client '''
    done = False
    def __init__(self, tid='', proxy='', server='', tunnel_client='',
                 port=0, ip='', timeout=1):
        threading.Thread.__init__(self)
        self.tid = tid
        self.proxy = proxy
        self.port = port
        self.server = server
        self.tunnel_client = tunnel_client
        self.ip = ip; self._port = port
        self.data = {  }     #   store data here to get later
        self.timeout = timeout
    def run(self):
        try:
            if self.proxy and self.server:
                ''' running tunnel operation, so bridge server <-> proxy '''
                new_socket = False
                while not thread_it.done:    # loop until termination
                    if not new_socket:
                        new_socket, address = self.server.accept( )
                    else:
                        self.proxy.sendall(
                            recv_all(new_socket, timeout=self.timeout))
                        new_socket.sendall(
                            recv_all(self.proxy, timeout=self.timeout))
            elif self.tunnel_client:
                ''' running tunnel client, just mark down when it's done '''
                self.tunnel_client(self.ip, self.port)
                thread_it.done = True     # normal termination
        except Exception, error:
            traceback.print_exc( )        # prints the traceback and the error
            thread_it.done = True         # orderly termination upon exception
class build(object):
    ''' build a tunnel object, ready to run two threads as needed '''
    def __init__(self, host='', port=443, proxy_host='', proxy_port=80,
                 proxy_user='', proxy_pass='', proxy_type='', timeout=1):
        self._port=port; self.host=host; self._phost=proxy_host
        self._puser=proxy_user; self._pport=proxy_port; self._ppass=proxy_pass
        self._ptype=proxy_type; self.ip='127.0.0.1'; self.timeout=timeout
        self._server, self.server_port = self.get_server( )
    def get_proxy(self):
        if not self._ptype:
            proxy = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
            proxy.connect((self._phost, self._pport))
            proxy_authorization = ''
            if self._puser:
                proxy_authorization = 'Proxy-authorization: Basic '+\
                    base64.encodestring(self._puser+':'+self._ppass
                                       ).strip( )+'\r\n'
            proxy_connect = 'CONNECT %s:%s HTTP/1.0\r\n' % (
                             self.host, self._port)
            user_agent = 'User-Agent: pytunnel\r\n'
            proxy_pieces = proxy_connect+proxy_authorization+user_agent+'\r\n'
            proxy.sendall(proxy_pieces+'\r\n')
            response = recv_all(proxy, timeout=0.5)
            status = response.split( )[1]    # second field is the status code
            if int(status)/100 != 2:
                print 'error', response
                raise RuntimeError(status)
            return proxy
    def get_server(self):
        port = 2222
        server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        server.bind(('localhost', port))
        server.listen(5)
        return server, port
    def run(self, func):
        Threads = [  ]
        Threads.append(thread_it(tid=0, proxy=self.get_proxy( ),
                                 server=self._server, timeout=self.timeout))
        Threads.append(thread_it(tid=1, tunnel_client=func, ip=self.ip,
                                 port=self.server_port, timeout=0.5))
        for Thread in Threads:
            Thread.start( )
        for Thread in Threads:
            Thread.join( )

Discussion

Here is how you would typically use this pytunnel module in a small example script that tunnels an SSL connection through a proxy:

import pytunnel, httplib
def tunnel_this(ip, port):
    conn = httplib.HTTPSConnection(ip, port=port)
    conn.putrequest('GET', '/')
    conn.endheaders( )
    response = conn.getresponse( )
    print response.read( )
tunnel = pytunnel.build(host='login.yahoo.com', proxy_host='h1',
                        proxy_user='u', proxy_pass='p')
tunnel.run(tunnel_this)

This example assumes you have a proxy server running on host h1, which is ready to accept basic authentication for a proxy user named u with a proxy password of p. Since it’s unlikely that this is, in fact, your specific setup, you’ll have to tweak these parameters if you want to see an example of this recipe’s code running. But you understand the general idea: you instantiate class pytunnel.build, with all appropriate parameters passed with named-argument syntax, to build a tunnel object; then, you call the tunnel object’s method run, passing as its argument your function that you want to be “tunneled” through the proxy. That function, in turn, receives as its arguments an IP address and a port number, and can connect to that address and port via SSL or any protocol implying SSL/TLS (Transport Layer Security), such as HTTPS.

Internally, the tunnel object instantiates two threads that are instances of thread_it, one to run the tunnel client function, the other to perform the tunneling operation itself. The tunneling operation, in turn, is nothing more than an endless loop where all data available are received from one party and resent to the other, and vice versa; function recv_all deals with the task of receiving all available data, while the socket method sendall does the sending. The thread_it instance that runs the tunneling operation, therefore, does no more than an endless loop of just such calls.
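Before that relay loop can start, get_proxy has to ask the proxy to open the tunnel with an HTTP CONNECT request and check the status code of the reply. The following sketch shows that handshake as two pure functions, so it can be examined without a proxy at hand. The names connect_request and connect_succeeded are hypothetical, base64.b64encode stands in for the recipe's base64.encodestring, and the code assumes Python 2.6 or later.

```python
import base64

def connect_request(host, port, user='', password=''):
    """Build the bytes of an HTTP CONNECT request, with optional basic auth."""
    lines = ['CONNECT %s:%d HTTP/1.0' % (host, port)]
    if user:
        credentials = ('%s:%s' % (user, password)).encode('ascii')
        token = base64.b64encode(credentials).decode('ascii')
        lines.append('Proxy-Authorization: Basic ' + token)
    lines.append('User-Agent: pytunnel')
    # header lines end with CRLF; an empty line ends the request
    return ('\r\n'.join(lines) + '\r\n\r\n').encode('ascii')

def connect_succeeded(response):
    """Return True if the proxy's reply (bytes) carries a 2xx status code."""
    fields = response.split(None, 2)
    if len(fields) < 2 or not fields[1].isdigit():
        return False
    return fields[1][:1] == b'2'
```

For example, connect_request('login.yahoo.com', 443, 'u', 'p') produces a request whose first line is CONNECT login.yahoo.com:443 HTTP/1.0. Once the proxy answers with a 2xx status, everything sent on the socket is passed through to the target host untouched.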

The code shown in this recipe is still being actively developed at the time of writing. For the latest version, see http://ftp.gnu.org/pub/savannah/files/pytunnel/pytunnel.py. Another alternative worth considering for tunneling and forwarding is Twisted’s simple proxy (http://www.twistedmatrix.com/), but I have not personally tried that one yet.

See Also

For SSL/TLS standards, http://www.ietf.org/html.charters/tls-charter.html; documentation for the standard library modules socket, threading and time in the Library Reference and Python in a Nutshell.

13.15. Implementing the Dynamic IP Protocol

Credit: Nicola Paolucci, Mark Rowe, Andrew Notspecified

Problem

You use a Dynamic DNS service that accepts the GnuDIP protocol (such as yi.org), and you need a command-line script to update the IP address recorded with that service.

Solution

The Twisted framework has plenty of power for all kinds of network tasks, so we can use it to write a script to implement GnuDIP:

import md5, sys
from twisted.internet import protocol, reactor
from twisted.protocols import basic
from twisted.python import usage
def hashPassword(password, salt):
    ''' compute and return MD5 hash for given password and `salt'. '''
    p1 = md5.md5(password).hexdigest( ) + '.' + salt.strip( )
    return md5.md5(p1).hexdigest( )
class DIPProtocol(basic.LineReceiver):
    """ Implementation of GnuDIP protocol(TCP) as described at:
    http://gnudip2.sourceforge.net/gnudip-www/latest/gnudip/html/protocol.html
    """
    delimiter = '\n'
    def connectionMade(self):
        ''' at connection, we start in state "expecting salt". '''
        basic.LineReceiver.connectionMade(self)
        self.expectingSalt = True
    def lineReceived(self, line):
        ''' we received a full line, either "salt" or normal response '''
        if self.expectingSalt:
            self.saltReceived(line)
            self.expectingSalt = False
        else:
            self.responseReceived(line)
    def saltReceived(self, salt):
        """ Override this 'abstract method' """
        raise NotImplementedError
    def responseReceived(self, response):
        """ Override this 'abstract method' """
        raise NotImplementedError
class DIPUpdater(DIPProtocol):
    """ A simple class to update an IP, then disconnect. """
    def saltReceived(self, salt):
        ''' having received `salt', login to the DIP server '''
        password = self.factory.getPassword( )
        username = self.factory.getUsername( )
        domain = self.factory.getDomain( )
        msg = '%s:%s:%s:2' % (username, hashPassword(password, salt), domain)
        self.sendLine(msg)
    def responseReceived(self, response):
        ''' response received: show errors if any, then disconnect. '''
        code = response.split(':', 1)[0]
        if code == '0':
            pass  # OK
        elif code == '1':
            print 'Authentication failed'
        else:
            print 'Unexpected response from server:', repr(response)
        self.transport.loseConnection( )
class DIPClientFactory(protocol.ClientFactory):
     """ Factory used to instantiate DIP protocol instances with
         correct username, password and domain.
     """
     protocol = DIPUpdater
     # simply collect data for login and provide accessors to them
     def __init__(self, username, password, domain):
         self.u = username
         self.p = password
         self.d = domain
     def getUsername(self):
         return self.u
     def getPassword(self):
         return self.p
     def getDomain(self):
         return self.d
     def clientConnectionLost(self, connector, reason):
         ''' terminate script when we have disconnected '''
         reactor.stop( )
     def clientConnectionFailed(self, connector, reason):
         ''' show error message in case of network problems '''
         print 'Connection failed. Reason:', reason
class Options(usage.Options):
     ''' parse options from commandline or config script '''
     optParameters = [['server', 's', 'gnudip2.yi.org', 'DIP Server'],
                      ['port', 'p', 3495, 'DIP Server  port'],
                      ['username', 'u', 'durdn', 'Username'],
                      ['password', 'w', None, 'Password'],
                      ['domain', 'd', 'durdn.yi.org', 'Domain']]
if __name__ == '__main__':
     # running as main script: first, get all the needed options
     config = Options( )
     try:
         config.parseOptions( )
     except usage.UsageError, errortext:
         print '%s: %s' % (sys.argv[0], errortext)
         print '%s: Try --help for usage details.' % (sys.argv[0])
         sys.exit(1)
     server = config['server']
     port = int(config['port'])
     password = config['password']
     if not password:
         print 'Password not entered. Try --help for usage details.'
         sys.exit(1)
     # and now, start operations (via Twisted's ``reactor'')
     reactor.connectTCP(server, port,
            DIPClientFactory(config['username'], password, config['domain']))
     reactor.run( )

Discussion

I wanted to use a Dynamic DNS Service called yi.org, but I did not like the option of installing the suggested small client application to update my IP address on my OpenBSD box. So I resorted to writing the script shown in this recipe. I put it into my crontab to keep my domain always up-to-date with my dynamic IP address at home.

This little script is now at version 0.4, and its development history is quite instructive. I thought that even the first version, 0.1, which I got working in a few minutes, effectively demonstrated the power of the Twisted framework in developing network applications, so I posted that version on the ActiveState cookbook site. Lo and behold, Mark first, then Andrew, showered me with helpful suggestions, and I repeatedly updated the script in response to their advice. So it now demonstrates even better not just the power of Twisted but, more generally, the power of collaborative development in an open-source or free-software community.

To give just one example: originally, I had overridden buildProtocol and passed the factory object to the protocol object explicitly. The factory object, in the Twisted framework architecture, is where shared state is kept (in this case, the username, password, and domain), so I had to ensure the protocol knew about the factory—I thought. It turns out that, exactly because just about every protocol needs to know about its factory object, Twisted takes care of it in its own default implementation of buildProtocol, making the factory object available as the factory attribute of every protocol object. So, my code, which duplicated Twisted’s built-in functionality in this regard, was simply ripped out, and the recipe’s code is simpler and better as a result.

Too often, software is presented as a finished and polished artifact, as if it sprang pristine and perfect like Athena from Zeus’ forehead. This gives entirely the wrong impression to budding software developers, making them feel inadequate because their code isn’t born perfect and fully developed. So, as a counterweight, I thought it important to present one little story about how software actually grows and develops!

One last detail: it’s tempting to place methods updateIP and removeIP in the DIPProtocol class, to ease the writing of subclasses such as DIPUpdater. However, in my view, that would be an over-generalization, overkill for such a simple, lightweight recipe as Python and Twisted make this one. In practice we won’t need all that many dynamic IP protocol subclasses, and if it turns out that we’re wrong and we do, in fact, need them, hey, refactoring is clearly not a hard task with such a fluid, dynamic language and powerful frameworks to draw on. So, respect the prime directive: “do the simplest thing that can possibly work.”

In a sense, the code in this recipe could be said to violate the prime directive, because it uses an elegant object-oriented architecture with an abstract base class, a concrete subclass to specialize it, and, in the factory class, accessor methods rather than simple attribute access for the login data (i.e., user, password, domain). All of these niceties are lifesavers in big programs, but they admittedly could be foregone for a program of only 120 lines (which would shrink a little further if it didn’t use all these niceties). However, adopting a uniform style of program architecture, even for small programs, eases the refactoring task in those not-so-rare cases where a small program grows into a big one. So, I have deliberately developed the habit of always coding in such an “elegant OO way”, and once the habit is acquired, I find that it enhances, rather than reduces, my productivity.
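The login exchange that DIPUpdater performs can also be tried out as plain functions, without Twisted or a network connection. This is only a sketch: hash_password, login_line, and describe_response are hypothetical names, the hashlib module stands in for the md5 module that the recipe imports, and the trailing 2 is simply the request code that the recipe itself sends.

```python
import hashlib

def hash_password(password, salt):
    """Return the MD5 hex digest of '<MD5 hex of password>.<salt>'."""
    inner = hashlib.md5(password.encode('utf-8')).hexdigest()
    outer = inner + '.' + salt.strip()      # the salt line may end with a newline
    return hashlib.md5(outer.encode('utf-8')).hexdigest()

def login_line(username, password, domain, salt):
    """Build the colon-separated login line sent after the salt arrives."""
    return '%s:%s:%s:2' % (username, hash_password(password, salt), domain)

def describe_response(response):
    """Turn the server's reply line into a short human-readable status."""
    code = response.split(':', 1)[0]
    if code == '0':
        return 'ok'
    if code == '1':
        return 'authentication failed'
    return 'unexpected response: %r' % (response,)
```

Because the password is hashed together with a salt that the server supplies for each connection, the clear-text password never travels over the wire.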

13.16. Connecting to IRC and Logging Messages to Disk

Credit: Gian Mario Tagliaretti, J P Calderone

Problem

You want to connect to an IRC (Internet Relay Chat) server, join a channel, and store private messages into a file on your hard disk for future reading.

Solution

The Twisted framework has excellent support for many network protocols, including IRC, so we can perform this recipe’s task with a very simple script:

from twisted.internet import reactor, protocol
from twisted.protocols import irc
class LoggingIRCClient(irc.IRCClient):
    logfile = file('/tmp/msg.txt', 'a+')
    nickname = 'logging_bot'
    def signedOn(self):
        self.join('#test_py')
    def privmsg(self, user, channel, message):
        self.logfile.write(user.split('!')[0] + ' -> ' + message + '\n')
        self.logfile.flush( )
def main( ):
    f = protocol.ReconnectingClientFactory( )
    f.protocol = LoggingIRCClient
    reactor.connectTCP('irc.freenode.net', 6667, f)
    reactor.run( )
if __name__ == '__main__':
    main( )

Discussion

If, for some strange reason, you cannot use Twisted, then you can implement similar functionality from scratch based only on the Python Standard Library. Here’s a reasonable approach, nowhere near as simple, solid, and robust as Twisted and lacking its performance benefits, but nevertheless sort of workable:

import socket
SERVER = 'irc.freenode.net'
PORT = 6667
NICKNAME = 'logging_bot'
CHANNEL = '#test_py'
IRC = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
def irc_conn( ):
    IRC.connect((SERVER, PORT))
def send_data(command):
    IRC.send(command + '\n')
def join(channel):
    send_data("JOIN %s" % channel)
def login(nickname, username='user', password=None,
          realname='Pythonist', hostname='Helena', servername='Server'):
    send_data("USER %s %s %s %s" %
               (username, hostname, servername, realname))
    send_data("NICK %s" % nickname)
irc_conn( )
login(NICKNAME)
join(CHANNEL)
filetxt = open('/tmp/msg.txt', 'a+')
try:
    while True:
        buffer = IRC.recv(1024)
        msg = buffer.split( )
        if msg[0] == "PING":
            # answer PING with PONG, as RFC 1459 specifies
            send_data("PONG %s" % msg[1])  
        if msg[1] == 'PRIVMSG' and msg[2] == NICKNAME:
            nick_name = msg[0][:msg[0].find("!")]
            message = ' '.join(msg[3:])
            filetxt.write(nick_name.lstrip(':') + ' -> ' +
                          message.lstrip(':') + '\n')
            filetxt.flush( )
finally:
    filetxt.close( )

For this roll-our-own reimplementation, we do need some understanding of the protocol’s RFC, such as the need to answer a server’s PING with a proper PONG to confirm that our connection is alive. In any case, since the code has already grown to over twice as much as Twisted requires, we’ve omitted niceties (which are very important for reliable unattended operation) such as automatic reconnection attempts when the connection drops, which Twisted gives us effortlessly via its protocol.ReconnectingClientFactory.
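One more nicety the socket version skips is line handling: a single recv call may deliver several IRC lines at once, or only part of one. A minimal sketch of how the receive loop could cope with that is shown below. The functions split_lines and handle_line are hypothetical names, and the sketch assumes the received data is already a text string (on Python 3, the bytes from recv would need decoding first).

```python
def split_lines(pending, data):
    """Add data to the pending text; return (complete lines, leftover text)."""
    pending += data
    lines = pending.split('\n')
    leftover = lines.pop()          # last piece is an incomplete line, or ''
    return [line.rstrip('\r') for line in lines], leftover

def handle_line(line, nickname):
    """Return ('send', reply), ('log', text), or None for one IRC line."""
    words = line.split()
    if not words:
        return None
    if words[0] == 'PING' and len(words) > 1:
        # answer PING with PONG to show that the connection is alive
        return ('send', 'PONG %s' % words[1])
    if len(words) > 3 and words[1] == 'PRIVMSG' and words[2] == nickname:
        sender = words[0].lstrip(':').split('!')[0]
        message = ' '.join(words[3:])
        if message.startswith(':'):
            message = message[1:]   # drop the marker that starts the message text
        return ('log', '%s -> %s' % (sender, message))
    return None
```

The main loop would keep the leftover text between recv calls, pass each complete line to handle_line, and then either send the reply or write the log entry to the file.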

See Also

Documentation for the standard library module socket in the Library Reference and Python in a Nutshell; twisted is at http://www.twistedmatrix.com.

13.17. Accessing LDAP Servers

Credit: John Nielsen

Problem

You need to access an LDAP (Lightweight Directory Access Protocol) server from your Python programs.

Solution

The simplest solution is offered by the freely downloadable third-party extension ldap (http://python-ldap.sourceforge.net). This script shows a few LDAP operations with ldap:

import ldap
try:
    path = 'cn=people,ou=office,o=company'
    l = ldap.open('hostname')
    # set which protocol to use, if you do not like the default
    l.protocol_version = ldap.VERSION2
    l.simple_bind('cn=root,ou=office,o=company','password')
    # search for surnames beginning with a
    # available options for how deep a search you want:
    # ldap.SCOPE_BASE, ldap.SCOPE_ONELEVEL, ldap.SCOPE_SUBTREE
    a = l.search_s(path, ldap.SCOPE_SUBTREE, 'sn='+'a*')
    # delete fred
    l.delete_s('cn=fred,'+path)
    # add barney
    # note: objectclass depends on the LDAP server
    user_info = {'uid':'barney123',
                'givenname':'Barney',
                'cn':'barney123',
                'sn':'Smith',
                'telephonenumber':'123-4567',
                'facsimiletelephonenumber':'987-6543',
                'objectclass':('Remote-Address','person', 'Top'),
                'physicaldeliveryofficename':'Services',
                'mail':'[email protected]',
                'title':'programmer',
                }
    id = 'cn=barney,'+path
    l.add_s(id, user_info.items( ))
except ldap.LDAPError, error:
    print 'problem with ldap:', error

Discussion

The ldap module wraps the open source OpenLDAP C API. However, with ldap, your Python program can talk to various versions of LDAP servers, as long as they’re standards compliant, not just to OpenLDAP servers.

The recipe shows a script with a few example uses of the ldap module. For simplicity, all the functions the recipe calls from the library are the '_s' versions (e.g., search_s): this means the functions are synchronous—that is, they wait for a response or an error code and don’t return control to your program until either an error or a response appears from the server. Asynchronous programming is less elementary than synchronous, although it can often offer far better performance and scalability.

LDAP is widely used to keep and coordinate network-accessible information, particularly in large and geographically distributed organizations. Essentially, LDAP lets you organize information, search for it, create new items, and delete existing items. The ldap module lets your Python program perform the search, creation, and deletion functions.
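The recipe builds its search filter by plain string concatenation ('sn='+'a*'). When part of a filter comes from user input, characters such as * and parentheses have a special meaning in LDAP filters and should be escaped first. The following pure-Python sketch shows the idea; escape_filter_value and surname_prefix_filter are hypothetical helpers written for illustration, and the ldap.filter module of python-ldap offers a similar ready-made function, escape_filter_chars.

```python
def escape_filter_value(value):
    """Escape characters that are special inside an LDAP search filter."""
    replacements = [
        ('\\', '\\5c'),     # backslash first, so later escapes are not re-escaped
        ('*', '\\2a'),
        ('(', '\\28'),
        (')', '\\29'),
        ('\x00', '\\00'),
    ]
    for char, escaped in replacements:
        value = value.replace(char, escaped)
    return value

def surname_prefix_filter(prefix):
    """Build a filter matching surnames that begin with the given prefix."""
    return '(sn=%s*)' % escape_filter_value(prefix)
```

The resulting string can be passed as the filter argument of search_s in place of the hand-built one.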

See Also

http://python-ldap.sourceforge.net/docs.shtml for all the documentation about the ldap module and other relevant pointers.
