Chapter 16. Network Programming

For more than a decade at the time this book is being written, one of the main reasons driving the purchase of personal computers has been the desire to get online: to connect in various ways to other computers throughout the world. Network connectivity, and Internet connectivity in particular, is the "killer app" for personal computing, the feature that got a computer-illiterate general population to start learning about and buying personal computers en masse.

Without networking, you can do amazing things with a computer, but your audience is limited to the people who can come over to look at your screen or who can read the printouts or load the CDs and DVDs you distribute. Connect the same computer to the Internet and you can communicate across town or across the world.

The Internet's architecture supports an unlimited number of applications, but it boasts two killer apps of its own — two applications that people get online just to use. One is, of course, the incredibly popular World Wide Web, which is covered in Chapter 20, "Web Applications and Web Services."

The Internet's other killer app is e-mail, which is covered in depth in this chapter.

In this chapter you learn:

  • To use standard libraries to write applications that compose, send, and receive e-mail

  • To create programs that send and receive data in custom formats

  • The basics of socket programming

smtplib takes its name from SMTP, the Simple Mail Transfer Protocol. That's the protocol, or standard, defined for sending Internet mail. As you'll see, Python comes packaged with modules that help you speak many Internet protocols, and each module is usually named after its protocol: imaplib, poplib, ftplib, http.client, and so on.

Put your own e-mail address in me@mydomain, and if you've got a mail server running on your machine, you should be able to send mail to yourself, as shown in Figure 16-1.

Figure 16-1

However, you probably don't have a mail server running on your machine. (You might have one if you're running these scripts on a shared computer, or if you set the mail server up yourself, in which case you probably already know a bit about networking and are impatiently waiting for the more advanced parts of this chapter.) If there's no mail server on the machine where you run this script, you'll get an exception when you try to instantiate the remote SMTP mail server object, something similar to this:

Traceback (most recent call last):
  File "<pyshell#9>", line 1, in <module>
    server=smtplib.SMTP("localhost",25)
  File "C:\Python31\lib\smtplib.py", line 239, in __init__
    (code, msg) = self.connect(host, port)
  File "C:\Python31\lib\smtplib.py", line 295, in connect
    self.sock = self._get_socket(host, port, self.timeout)
  File "C:\Python31\lib\smtplib.py", line 273, in _get_socket
    return socket.create_connection((host, port), timeout)
  File "C:\Python31\lib\socket.py", line 307, in create_connection
    raise error(msg)
socket.error: [Errno 10061] No connection could be made because the target machine actively refused it

What's going on here? Look at the line that caused the exception:

>>> server = smtplib.SMTP("localhost", 25)

The constructor for the smtplib.SMTP class is trying to start up a network connection using IP, the Internet Protocol. The string "localhost" and the number 25 identify the Internet location of the putative mail server. Because you're not running a mail server, there's nothing at the other end of the connection, and when Python discovers this fact, it can't continue.
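If you would rather have a script report the problem than die with a traceback, you can catch the failure. The following is a minimal sketch rather than part of the original session: the helper name can_reach_smtp and the five-second timeout are assumptions chosen for illustration.

```python
import smtplib

def can_reach_smtp(host, port=25, timeout=5):
    """Return True if an SMTP server answers at host:port, False otherwise."""
    try:
        # The constructor opens the network connection and reads the greeting.
        server = smtplib.SMTP(host, port, timeout=timeout)
        server.quit()
    except (OSError, smtplib.SMTPException) as error:
        # A refused or timed-out connection raises a subclass of OSError.
        print("No mail server at %s:%d (%s)" % (host, port, error))
        return False
    return True

if __name__ == "__main__":
    print(can_reach_smtp("localhost", 25))
```

On a machine with no mail server, the call should print the reason for the failure and then False instead of raising an exception.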

To understand the mystical meanings of "localhost" and 25, it helps to know a little about protocols, and the Internet Protocol in particular.

Understanding Protocols

A protocol is a convention for structuring the data sent between two or more parties on a network. It's analogous to the role of protocol or etiquette in relationships between humans. For instance, suppose that you wanted to go out with friends to dinner or get married to someone. Each culture has defined conventions describing the legal and socially condoned behavior in such situations. When you go out for dinner, there are conventions about how to behave in a restaurant, how to use the eating utensils, and how to pay. Marriages are carried out according to conventions regarding rituals and contracts, conventions that can be very elaborate.

These two activities are very different, but the same lower-level social protocols underlie both of them. These protocols set standards for things such as politeness and the use of a mutually understood language. On the lowest level, you may be vibrating your vocal cords in a certain pattern, but on a higher level you're finalizing your marriage by saying "I do." Violate a lower-level protocol (say, by acting rudely in the restaurant) and your chances of carrying out your high-level goal can be compromised. All of these aspects of protocols for human behavior have their correspondence in protocols for computer networking.

Comparing Protocols and Programming Languages

Thousands of network protocols for every imaginable purpose have been invented over the past few decades; it might be said that the history of networking is the history of protocol design. Why so many protocols? To answer this question, consider another analogy to the world of network protocols: Why so many programming languages? Network protocols have the same types of interrelation as programming languages, and people create new protocols for the same reasons they create programming languages.

Different programming languages have been designed for different purposes. It would be madness to write a word processor in the FORTRAN language, not because FORTRAN is objectively "bad," but because it was designed for mathematical and scientific research, not end-user GUI applications.

Similarly, different protocols are intended for different purposes. SMTP, the protocol you just got a brief look at, could be used for all sorts of things besides sending mail. No one does this because it makes more sense to use SMTP for the purpose for which it was designed, and use other protocols for other purposes.

A programming language may be created to compete with others in the same niche. The creators of a new language may see technical or aesthetic flaws in existing languages and want to make their own tasks easier. A language author may covet the riches and fame that come with being the creator of a popular language. A person may invent a new protocol because he's come up with a new type of application that requires one.

Some programming languages are designed specifically for teaching students how to program, or, at the other end of programming literacy, how to write compilers. Some languages are designed to explore new ideas, not for real use, and other languages are created as a competitive tool by one company for use against another company.

These factors also come into play in protocol design. Companies sometimes invent new, incompatible protocols to try to take business from a competitor. Some protocols are intended only for pedagogical purposes. For instance, this chapter, under the guise of teaching network programming, also teaches designing protocols for things like online chat rooms. Perfectly good protocols for this already exist, but they're too complex to be given a proper treatment in the available space.

The Ada programming language was defined by the U.S. Department of Defense to act as a common language across all military programming projects. The Internet Protocol was created to enable multiple previously incompatible networks to communicate with one another (hence the name "Internet").

Nowadays, even internal networks (intranets) usually run atop the Internet Protocol, but the old motives (the solving of new problems, competition, and so on) remain in play at higher and lower levels, which brings us to the most interesting reason for the proliferation of programming languages and protocols.

The Internet Protocol Stack

Different programming languages operate at different levels of abstraction. Python is a very high-level language capable of all kinds of tasks, but the Python interpreter itself isn't written in Python: It's written in C, a lower-level language. C, in turn, is compiled into a machine language specific to your computer architecture. Whenever you type a statement into a Python interpreter, there is a chain of abstraction reaching down to the machine code, and even lower to the operation of the digital circuits that actually drive the computer.

There's a Python interpreter written in Java (Jython), but the Java virtual machine that runs it is itself written in C and C++. PyPy is a project that aims to implement a Python interpreter in Python, but PyPy runs on top of the C or Java implementation. You can't escape C!

In one sense, when you type a statement into the Python interpreter, the computer simply "does what you told it to." In another, it runs the Python statement you typed. In a third sense, it runs a longer series of C statements, written by the authors of Python and merely activated by your Python statement. In a fourth sense, the computer runs a very long, nearly incomprehensible series of machine code statements. In a fifth, it doesn't "run" any program at all: You just cause a series of timed electrical impulses to be sent through the hardware. The reason we have high-level programming languages is that they're easier to use than the lower-level ones. That doesn't make lower-level languages superfluous, though.

English is a very high-level human language capable of all kinds of tasks, but one can't speak English just by "speaking English." To speak English, one must actually make some noises, but a speaker can't just "make some noises" either: We have to send electrical impulses from our brains that force air out of our lungs and constantly reposition our tongues and lips. It's a very complicated process, but we don't even think about the lower levels, only about the words we're saying and the concepts we're trying to convey.

The soup of network protocols can be grouped into a similar hierarchical structure based on levels of abstraction, or layers. On the physical layer, the lowest level, it's all just electrical impulses and EM radiation. Just above the physical layer, every type of network hardware needs its own protocol, implemented in software (for instance, the Ethernet protocol for networks that run over LAN wires). The electromagnetic phenomena of the physical layer can now be seen as the sending and receiving of bits from one device to another. This is called the data link layer. As you go up the protocol stack, these raw bits take on meaning: They become routing instructions, commands, responses, images, web pages, and so on.

Because different pieces of hardware communicate in different ways, connecting (for example) an Ethernet network to a wireless network requires a protocol that works on a higher level than the data link layer. As mentioned earlier, the common denominator for most networks nowadays is the Internet Protocol (IP), which implements the network layer and connects all those networks together.

Directly atop the network layer is the transport layer, which makes sure the information sent over IP gets to its destination reliably, in the right order, and without errors. IP doesn't care about reliability or error-checking: It just takes some data and a destination address, sends it across the network, and assumes it gets to that address intact.

TCP, the Transmission Control Protocol, does care about these things. TCP implements the transport layer of the protocol stack, making reliable, orderly communication possible between two points on the network. It's so common to stack TCP on top of IP that the two protocols are often treated as one and given a unified name, TCP/IP.
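To make the idea of reliable, ordered delivery concrete, here is a small sketch that opens a TCP connection from your computer to itself and checks that the bytes arrive in the order they were sent. It uses Python's socket module, which this chapter covers properly later; the variable names are illustrative only.

```python
import socket

# Listen on the loopback address; port 0 asks the operating system
# to pick any free port.
listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))
listener.listen(1)

# Connect to the listener over TCP, then accept the connection.
client = socket.create_connection(listener.getsockname())
connection, peer = listener.accept()

# Send three separate chunks, then close to signal the end of the stream.
for chunk in (b"first ", b"second ", b"third"):
    client.sendall(chunk)
client.close()

# Read until the other side has closed the connection.
received = b""
while True:
    data = connection.recv(1024)
    if not data:
        break
    received += data

connection.close()
listener.close()
print(received)
```

TCP treats the data as one continuous stream: the three chunks come out the other end as a single run of bytes, in order, without the program doing any bookkeeping of its own.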

All of the network protocols you study and design in this chapter are based on top of TCP/IP. These protocols are at the application layer and are designed to solve specific user problems. Some of these protocols are known by name even to nonprogrammers: You may have heard of HTTP, FTP, BitTorrent, and so on.

When people think of designing protocols, they usually think of the application layer, the one best suited to Python implementations. The other current field of interest is at the other end in the data link layer: embedded systems programming for connecting new types of devices to the Internet. Thanks to the overwhelming popularity of the Internet, TCP/IP has more or less taken over the middle of the protocol stack.

A Little Bit About the Internet Protocol

Now that you understand where the Internet Protocol fits into the protocol stack your computer uses, there are only two things you really need to know about it: addresses and ports.

Internet Addresses

Each computer on the Internet (or on a private TCP/IP network) has one or more IP addresses, usually represented as a dotted series of four numbers, like "208.215.179.178." That same computer may also have one or more hostnames, which look like "wrox.com."

To connect to a service running on a computer, you need to know its IP address or one of its hostnames. (Hostnames are managed by DNS, a protocol that runs on top of TCP/IP and silently turns hostnames into IP addresses.) Recall the script at the beginning of this chapter that sent out mail. When it tried to connect to a mail server, it mentioned the seemingly magic string "localhost":

>>> server = smtplib.SMTP("localhost", 25)

"localhost" is a special hostname that always refers to the computer you're using when you mention it (each computer also has a special IP address that does the same thing: 127.0.0.1). The hostname is how you tell Python where on the Internet to find your mail server.

It's generally better to use hostnames instead of IP addresses, even though the former immediately gets turned into the latter. Hostnames tend to be more stable over time than IP addresses. Another example of the protocol stack in action: The DNS protocol serves to hide the low-level details of IP's addressing scheme.
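You can watch a hostname being turned into an IP address with the socket module. This is a minimal sketch; the helper name resolve is an assumption made for illustration, and the address printed for "localhost" depends on how your machine is configured.

```python
import ipaddress
import socket

def resolve(hostname):
    """Return the IPv4 address for hostname as a string, or None on failure."""
    try:
        return socket.gethostbyname(hostname)
    except OSError:
        # socket.gaierror, raised when the lookup fails, is an OSError.
        return None

address = resolve("localhost")
print(address)
if address is not None:
    # True when the address refers back to this computer.
    print(ipaddress.ip_address(address).is_loopback)
```

Passing a string that is already an IPv4 address, such as "127.0.0.1", returns it unchanged.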

Of course, if you don't run a mail server on your computer, "localhost" won't work. The organization that gives you Internet access should be letting you use its mail server, possibly located at mail.[your ISP].com or smtp.[your ISP].com. Whatever mail client you use, it probably has the hostname of a mail server somewhere in its configuration, so that you can use it to send out mail. Substitute that for "localhost" in the example code listed previously and you should be able to send mail from Python:

>>> fromAddress = '[email protected]'
>>> toAddress = '[your e-mail address]'
>>> msg = "Subject: Hello\n\nThis is the body of the message."
>>> import smtplib
>>> server = smtplib.SMTP("mail.[your ISP].com", 25)
>>> server.sendmail(fromAddress, toAddress, msg)
{}

Unfortunately, you still might not be able to send mail, for any number of reasons. Your SMTP server might demand authentication, which this sample session doesn't provide. It might not accept mail from the machine on which you're running your script (try the same machine you normally use to send mail). It might be running on a nonstandard port (see the following section). The server might not like the format of this bare-bones message, and expect something more like a "real" e-mail message; if so, the email module described later in this chapter might help. If all else fails, ask your system administrator for help.
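If your server demands authentication, a session along the following lines may work. Treat it as a sketch under stated assumptions: the function name send_with_login is invented for this example, the server is assumed to support TLS, and the hostname, user name, and password in the usage note are placeholders you would replace with your provider's details.

```python
import smtplib

def send_with_login(host, port, username, password,
                    from_address, to_address, text, timeout=10):
    """Send text through an SMTP server that requires TLS and a login."""
    with smtplib.SMTP(host, port, timeout=timeout) as server:
        server.starttls()                  # upgrade the connection to TLS
        server.login(username, password)   # authenticate before sending
        # Returns a dictionary of refused recipients (empty on full success).
        return server.sendmail(from_address, to_address, text)
```

Many providers accept authenticated mail on port 587 rather than 25, so a call might look like send_with_login("smtp.example.com", 587, "me", "secret", fromAddress, toAddress, msg).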

Internet Ports

The string "localhost" has been explained as a DNS hostname that masks an IP address. That leaves the mysterious number 25. What does it mean? Well, consider the fact that a single computer may host more than one service. A single machine with one IP address may have a web server, a mail server, a database server, and a dozen other servers. How should clients distinguish between an attempt to connect to the web server and an attempt to connect to the database server?

A computer that implements the Internet Protocol can expose up to 65,536 numbered ports. When you start an Internet server (say, a web server), the server process "binds" itself to one or more of the ports on your computer (say, port 80, the conventional port for a web server) and begins listening for outside connections to that port. If you've ever seen a website address that looked like "http://www.example.com:8000/", that number is the port number for the web server; in this case, a port number that violates convention. The enforcer of convention in this case is the Internet Assigned Numbers Authority (IANA).

The IANA list of protocols and conventional port numbers is published at www.iana.org/assignments/port-numbers.

According to the IANA, the conventional port number for SMTP is 25. That's why the constructor to the SMTP object in the above example received 25 as its second argument (if you don't specify a port number at all, the SMTP constructor will assume 25):

>>> server = smtplib.SMTP("localhost", 25)

The IANA divides the port numbers into "well-known ports" (ports from 0 to 1023), "registered ports" (from 1024 to 49151), and "dynamic ports" (from 49152 to 65535). On most operating systems, you must have administrator privileges to bind a server to a well-known port, because processes that bind to those ports are often themselves given administrator privileges. Anyone can bind servers to ports in the registered range, and that's what you'll do for the custom servers written in this chapter. The dynamic range is used by clients, not servers; we cover that later when talking about sockets.
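The three ranges are easy to express in code. The function below is only an illustration of the boundaries just described; the name port_range is made up for this example.

```python
def port_range(port):
    """Return the name of the IANA range that a port number falls into."""
    if not 0 <= port <= 65535:
        raise ValueError("port numbers run from 0 to 65535")
    if port <= 1023:
        return "well-known"
    if port <= 49151:
        return "registered"
    return "dynamic"

for port in (25, 80, 8000, 50000):
    print(port, port_range(port))
```

Ports 25 and 80 fall in the well-known range, 8000 in the registered range, and 50000 in the dynamic range.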

Sending Internet E-mail

With a basic understanding of how TCP/IP works, the Python session from the beginning of this chapter should now make more sense:

>>> fromAddress = '[email protected]'
>>> toAddress = '[email protected]'
>>> msg = "Subject: Hello\n\nThis is the body of the message."
>>> import smtplib
>>> server = smtplib.SMTP("localhost", 25)
>>> server.sendmail(fromAddress, toAddress, msg)
{}

If you don't have an SMTP server running on your machine, you should now be able to find out a hostname and port number that will work for you. The only aspect of the code I haven't explained is why the e-mail message looks the way it does.

The E-mail File Format

In addition to the large number of e-mail–related protocols that have been created, Internet engineers have designed a couple of file formats for packaging the parts of an e-mail message. Both of these protocols and file formats have been published in numbered documents called RFCs.

Throughout this chapter, until you start writing your own protocols, you'll be working with protocols and formats designed by others and specified in RFCs. These documents often contain formal language specifications and other not-quite-light reading, but for the most part they're pretty readable.

The current standard defining the format of e-mail messages is RFC 2822. Published in 2001, it updated the venerable RFC 822, which dates from 1982 (maybe RFC 2822 would have been published earlier if they hadn't had to wait for the numbers to match up). You may still see references to "RFC 822" as shorthand for "the format of e-mail messages," such as in the name of Python 2's rfc822 module, which was deprecated and then removed in Python 3.

To find a particular RFC, you can just search the web for "RFC x", or look on the official site at www.ietf.org/rfc.html. RFC 2822 is hosted at (among other places) www.ietf.org/rfc/rfc2822.txt.

An e-mail message consists of a set of headers (metadata describing the message) and a body (the message itself). The headers are actually sent in a form like key-value pairs in which a colon and a space separate the key and the value (for instance, "Subject: Hello"). The body is just that: the text of the message.
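Python's standard library can split a raw message into these two pieces. The sketch below parses a made-up message with email.message_from_string; the To address is a placeholder invented for the example.

```python
import email

# Headers come first; a blank line separates them from the body.
raw = ("Subject: Hello\n"
       "To: you@example.com\n"
       "\n"
       "This is the body of the message.")

message = email.message_from_string(raw)

# Each header is a key-value pair.
for name, value in message.items():
    print("%s => %s" % (name, value))

# Everything after the blank line is the body.
print(message.get_payload())
```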

You can create RFC 2822–compliant messages with Python using the Message class in Python's email module. The Message object acts like a dictionary that maps message header names to their values. It also has a "payload," which is the body text:

>>> from email.message import Message
>>> message = Message()
>>> message['Subject'] = 'Hello'
>>> message.set_payload('This is the body of the message')
>>> print(str(message))
Subject: Hello

This is the body of the message

That's more code than just specifying the e-mail string, but it's less error-prone, especially for a complex message. Also, you'll notice that you got back something you didn't put into the message: the blank line that separates the headers from the body. The Message class adds it for you, because RFC 2822 requires it.

RFC 2822 defines some standard message headers, described in the following table. It also defines data representation standards for some of the header values (for instance, it defines a way of representing e-mail addresses and dates). The standard also gives you space to define custom headers for use in your own programs that send and receive e-mail.

Header | Example | Purpose
To | To: Leonard Richardson <[email protected]> | Addresses of people who should receive the message
From | From: Peter C. Norton <[email protected]> | The e-mail address of the person who (allegedly) sent the message
Date | Date: Wed, 16 Mar 2009 14:36:07 -0500 (EST) | The date the message was sent
Subject | Subject: Python book | A summary or title of the message, intended for human consumption
Cc | Cc: [email protected], Jason Diamond <[email protected]> | Addresses of people who should receive the message, even though it's not addressed to them

Note a few restrictions on the content of the message. RFC 2822 requires that each line be no longer than 998 characters, not counting the line ending, and recommends keeping lines to 78 characters or fewer. A more onerous restriction is that your headers and body can contain only U.S. ASCII characters (that is, the 128 characters of 7-bit ASCII): no "international" or binary characters are allowed. By itself this doesn't make sense, because you've probably already seen e-mail messages in other languages. How that happens is explained next.
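A quick way to find out whether a piece of text respects these restrictions is to check it line by line. The function below is a sketch, not a full validator: its name is made up, and it assumes the limit of 998 characters per line (not counting the line ending), which is how RFC 2822 states the rule.

```python
def find_rfc2822_problems(text, max_line_length=998):
    """Return a list of human-readable problems found in a message's text."""
    problems = []
    for number, line in enumerate(text.splitlines(), start=1):
        if len(line) > max_line_length:
            problems.append("line %d is %d characters long"
                            % (number, len(line)))
        try:
            line.encode("ascii")
        except UnicodeEncodeError:
            problems.append("line %d contains non-ASCII characters" % number)
    return problems

print(find_rfc2822_problems("Subject: Hello\n\nJust a soupçon of soup."))
```

The sample message is flagged because of the "ç" on its third line; a plain-ASCII message with short lines yields an empty list.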

MIME Messages

If RFC 2822 requires that your e-mail message contain only U.S. ASCII characters, how is it possible that people routinely send e-mail with graphics and other binary files attached? This is achieved with an extension to the RFC 2822 standard called MIME, the Multipurpose Internet Mail Extensions.

MIME is a series of standards designed around fitting non-U.S.-ASCII data into the 128 7-bit characters that make up U.S. ASCII. Thanks to MIME, you can attach binary files to e-mail messages, write messages and even headers (such as your name) using non-English characters, and have it all come out right on the other end (assuming the other end understands MIME, which almost everyone does nowadays).

The main MIME standard is RFC 1521, which describes how to fit binary data into the body of e-mail messages. RFC 1522 describes how to do the same thing for the headers of e-mail messages. (Both have since been superseded by the series RFC 2045 through RFC 2049.)
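To see what a MIME-encoded header looks like, you can ask Python's email.header module to encode one and then decode it again. This is a small sketch; the sample text is arbitrary, and the exact encoded form Python chooses is not shown here because it depends on the text being encoded.

```python
from email.header import Header, decode_header, make_header

name = "Un soupçon"

# Encode the text so that it uses only U.S. ASCII characters.
encoded = Header(name, "utf-8").encode()
print(encoded)

# Decode it again, as a MIME-aware mail reader would.
decoded = str(make_header(decode_header(encoded)))
print(decoded)
```

The first line printed contains only ASCII characters and names the original character set; the second is the original text, recovered intact.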

MIME Encodings: Quoted-printable and Base64

The most important parts of MIME are its encodings, which provide ways of encoding 8-bit characters into 7 bits. MIME defines two encodings: quoted-printable encoding and Base64 encoding. Python provides a module for moving strings into and out of each encoding.

The quoted-printable encoding is intended for text that contains only a few 8-bit characters, with the majority of characters being U.S. ASCII. The advantage of the quoted-printable encoding is that the text remains mostly legible once encoded, making it ideal for text written in or borrowing words from Western European languages (languages that can be represented in U.S. ASCII except for a few characters that use diacritical marks). Even if the recipient of your message can't decode the quoted-printable message, they should still be able to read it. They'll just see some odd-looking equal signs and hexadecimal numbers in the middle of words.

The Python module for quoted-printable encoding and decoding is quopri:

>>> import quopri
>>> encoded = quopri.encodestring(bytes("I will have just a soupçon of soup.", 'latin-1'))
>>> print(encoded)
b'I will have just a soup=E7on of soup.'
>>> print(quopri.decodestring(encoded))
b'I will have just a soup\xe7on of soup.'

The last line shows "\xe7" instead of the actual "ç" character because decodestring returns a bytes object, and "\xe7" is how Python displays the single byte that represents "ç" in the Latin-1 encoding, just as "=E7" is the quoted-printable representation of that byte. To get the text back as a str, call decode('latin-1') on the result. (Note that the str is converted to a bytes object before encoding because the encodestring function requires bytes. A str is a sequence of characters, which is different from a sequence of bytes.)

The Base64 encoding, on the other hand, is intended for binary data. It should not be used for human-readable text, because it totally obscures the text:

>>> import base64
>>> encoded = base64.encodebytes(bytes("I will have just a soupçon of soup.", 'utf-8'))
>>> print(encoded)
b'SSB3aWxsIGhhdmUganVzdCBhIHNvdXDDp29uIG9mIHNvdXAu\n'
>>> print(base64.decodebytes(encoded).decode('utf-8'))
I will have just a soupçon of soup.

Why bother with Base64 when quoted-printable works on anything and doesn't mangle human-readable text? Apart from the fact that it would be kind of misleading to encode something as "quoted-printable" when it's not "printable" in the first place, Base64 encoding is much more efficient at representing binary data than quoted-printable encoding. Here's a comparison of the two encodings against a long string of random binary characters:

>>> import random
>>> import quopri
>>> import base64
>>> length = 10000
>>> randomBinary = bytes(random.randint(0, 255) for x in range(length))
>>> len(quopri.encodestring(randomBinary)) / length
2.0663999999999998
>>> len(base64.encodebytes(randomBinary)) / length
1.3512

Those numbers will vary slightly across runs because the strings are randomly generated, but if you try this experiment you should get similar results to these every time. A binary string encoded as quoted-printable encoding is safe to send in an e-mail, but it's (on average) about twice as long as the original, unsendable string. The same binary string, encoded with Base64 encoding, is just as safe, but only about 1.35 times as long as the original. Using Base64 to encode mostly binary data saves space and bandwidth.

At the same time, it would be overkill to encode an ASCII string with Base64 just because it contains a few characters outside of the U.S. ASCII range. Here's the same comparison done with a long random string that's almost entirely composed of U.S. ASCII characters:

>>> import random
>>> import quopri
>>> import base64
>>> length = 10000
>>> randomBinary = bytes(random.randint(0, 128) for x in range(length))
>>> len(quopri.encodestring(randomBinary)) / length
1.0661
>>> len(base64.encodebytes(randomBinary)) / length
1.3512

Here, the quoted-printable representation is barely larger than the original text (it's almost the same as the original text), but the Base64 representation is 1.35 times as long as the original, just as before. This demonstrates why MIME supports two different encodings: to quote RFC 1521, "a 'readable' encoding [quoted-printable] and a 'dense' encoding [Base64]."

MIME is more "multi-purpose" than its name implies. Many features of MIME have been picked up for use outside of e-mail applications. The idea of using Base64 or quoted-printable to turn non-ASCII characters into ASCII shows up in other domains. Base64 encoding is also sometimes used to obscure text from human readability without actually encrypting it.

MIME Content Types

The other important part of MIME is its idea of a content type. Suppose you send your friend an e-mail message: "Here's that picture I took of you." and attach an image. Thanks to Base64 encoding, the recipient will get the encoded data as you sent it, but how is her mail reader supposed to know that it's an image and not some other form of binary data?

MIME solves this problem by defining a custom RFC 2822–format header called Content-Type. This header describes what kind of file the body is, so that the recipient's mail client can figure out how to display it. Content types include text/plain (what you'd get if you put a normal e-mail message into a MIME envelope), text/html, image/jpeg, video/mpeg, audio/mp3, and so on. Each content type has a "major type" and a "minor type," separated by a slash. The major types are very general and there are only seven of them, defined in the MIME standard itself. The minor types usually designate particular file formats.
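Python's mimetypes module can guess a content type from a filename, which makes it easy to see the major and minor types side by side. The filenames below are made up for the example, and the guess is based only on the file extension, not on the file's contents.

```python
import mimetypes

for filename in ("photo.jpg", "notes.txt", "page.html"):
    # guess_type returns a (content type, encoding) pair.
    content_type, encoding = mimetypes.guess_type(filename)
    major, minor = content_type.split("/")
    print("%s: major type %s, minor type %s" % (filename, major, minor))
```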

The idea of a string having a "Content-Type," which tells the recipient what to do with it, is another invention of MIME used outside of the e-mail world. The most common use is in HTTP, the protocol used by the World Wide Web and covered in Chapter 20. Every HTTP response is supposed to have a "Content-Type" header (just like a MIME e-mail message), which tells the web browser how to display the response.

MIME Multipart Messages

There's just one problem. This isn't quite the e-mail message described earlier. That message was a short piece of text ("Here's that picture I took of you.") and an attached image. This message is just the image. There's no space for the text portion in the body of the message; putting it there would compromise the image file. The Content-Type header of a mail message can be text/plain or image/jpeg; it can't be both. So how do mail clients create messages with attachments?

In addition to classifying the file formats defined by other standards (for instance, image for image file formats), MIME defines a special major type called multipart. A message with a major content type of multipart can contain other MIME messages in its body, each with its own set of headers and its own content type.

The best way to see how this works is to create a multipart message using the email.mime.multipart module, in conjunction with the email.mime.* modules for the files you want to attach. Here is a script called FormatMimeMultipartMessage.py, a slightly more complicated version of the previous example:

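#!/usr/bin/python
import os
import sys
from email.mime.image import MIMEImage
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText

# The path to the image file is passed on the command line.
filename = sys.argv[1]

msg = MIMEMultipart()
msg['From'] = 'Me <[email protected]>'
msg['To'] = 'You <[email protected]>'
msg['Subject'] = 'Your picture'

# The text part of the message.
text = MIMEText("Here's that picture I took of you.")
msg.attach(text)

# The image part; MIMEImage encodes the file's contents in Base64.
with open(filename, 'rb') as imageFile:
    image = MIMEImage(imageFile.read(), name=os.path.split(filename)[1])
msg.attach(image)

# Print the finished message, including the leading "From nobody" line.
print(msg.as_string(unixfrom=True))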

Run this script, passing in the path to an image file, and you'll see a MIME multipart e-mail message that includes a brief text message and the image file, encoded in Base64:

# python FormatMimeMultipartMessage.py ./photo.jpg
From nobody Sun June 20 15:41:23 2009
Content-Type: multipart/mixed; boundary="===============1011273258=="
MIME-Version: 1.0
From: Me <[email protected]>
To: You <[email protected]>
Subject: Your picture

--===============1011273258==
Content-Type: text/plain; charset="us-ascii"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit

Here's that picture I took of you.
--===============1011273258==
Content-Type: image/jpeg; name="photo.jpg"
MIME-Version: 1.0
Content-Transfer-Encoding: base64

/4AAQSkZJRgABAQEASABIAAD//gAXQ3JlYXRlZCB3aXRoIFRoZSBHSU1Q/9sAQwAIBgYHBgUI
...
[As before, much base64 encoded text omitted.]
...
3f7kklh4dg+UTZ1TsAAv1F69UklmZ9hrzogZibOqSSA8gZySSSJI/9k=
--===============1011273258==--

When you send this message, it will show up at the other end looking more like you'd expect a message with an attachment to look. This is the kind of e-mail your e-mail client creates when you send a message with attachments.

Several features of this e-mail bear mentioning:

  • The content type (multipart/mixed) isn't enough, by itself, to make sense of the message body. MIME also requires the definition of a "boundary," a string generated semi-randomly by Python and used in the body of the message to note where one part stops and another begins.

  • The message as a whole has all the headers you associate with e-mail messages: Subject, From, To, and the MIME-specific Content-Type header. In addition to this, each part of the message has a separate set of headers. These are not message headers, although they're in the RFC 2822 header format; and some headers (MIME-Version and Content-Type) show up in both the message headers and the body. These are MIME message body headers, interpreted by the MIME parser. As far as RFC 2822 is concerned, they're part of the message body, just like the files they describe, the boundaries that separate MIME parts, and the text "Here's that picture I took of you."

  • The MIME part containing the body of the message has an encoding of 7bit. This just means that the part is not encoded at all. Every character in the part body was U.S. ASCII, so there was no need to encode it.
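
The features listed above can be seen from Python by building a small multipart message, parsing it back, and walking its parts. This is only a sketch: the "image" bytes are a made-up stand-in rather than a real JPEG (so the subtype is passed explicitly), and the variable names are chosen for illustration.

```python
from email import message_from_string
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText
from email.mime.image import MIMEImage

# Build a small multipart message. The image bytes are a stand-in,
# not a real JPEG, so the subtype is given explicitly.
msg = MIMEMultipart()
msg['Subject'] = 'Your picture'
msg.attach(MIMEText("Here's that picture I took of you."))
fake_image = b'\xff\xd8\xff\xe0 not really a JPEG'
msg.attach(MIMEImage(fake_image, _subtype='jpeg', name='photo.jpg'))

# Serialize the message and parse it back, as a receiving program would.
parsed = message_from_string(msg.as_string())
print('Boundary:', parsed.get_boundary())

# walk() visits the multipart container first, then each part in order.
summary = []
for part in parsed.walk():
    encoding = part.get('Content-Transfer-Encoding')
    summary.append((part.get_content_type(), encoding))
    print(part.get_content_type(), encoding)

# Decoding the second part recovers the original bytes.
recovered = parsed.get_payload(1).get_payload(decode=True)
```

The container itself carries no Content-Transfer-Encoding header; only the individual parts do, which is why the text part reports 7bit and the image part reports base64.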

Python's mail classes are very useful once you know what kind of mail you want to construct: for text-only messages, use the simple email.message.Message class. To attach a file to a message, use one of the email.mime.* classes. To send multiple files, or a combination of text and files, use email.mime.multipart in conjunction with the other email.mime.* classes.

A problem arises when you're not sure ahead of time which class to use to represent your e-mail message. Here's a class called SmartMessage for building e-mail messages that starts out keeping body text in a simple Message representation, but which will switch to MIMEMultipart if you add an attachment. This strategy will generate the same range of e-mail message bodies as a typical end-user mail application: simple RFC 2822 bodies for simple messages, and complex MIME bodies for messages with attachments. Put this class in a file called SendMail.py:

from email import encoders as Encoders
from email.message import Message
from email.mime.text import MIMEText
from email.mime.multipart import MIMEMultipart
from email.mime.nonmultipart import MIMENonMultipart
import mimetypes
class SmartMessage:
    """A simplified interface to Python's library for creating e-mail
    messages, with and without MIME attachments."""
    def __init__(self, fromAddr, toAddrs, subject, body):
        """Start off on the assumption that the message will be a simple RFC
        2822 message with no MIME."""
        self.msg = Message()
        self.msg.set_payload(body)
        self['Subject'] = subject
        self.setFrom(fromAddr)
        self.setTo(toAddrs)
        self.hasAttachments = False
    def setFrom(self, fromAddr):
        "Sets the address of the sender of the message."
        if not fromAddr or not isinstance(fromAddr, str):
            raise Exception('A message must have one and only one sender.')
        self['From'] = fromAddr
    def setTo(self, to):
        "Sets the address or addresses that will receive this message."
        if not to:
            raise Exception('A message must have at least one recipient.')
        self._addresses(to, 'To')
        #Also store the addresses as a list, for the benefit of future
        #code that will actually send this message.
        self.to = to
    def setCc(self, cc):
        """Sets the address or addresses that should receive this message,
        even though it's not addressed directly to them ("carbon-copy")."""
        self._addresses(cc, 'Cc')
    def addAttachment(self, attachment, filename, mimetype=None):
        "Attaches the given file to this message."
        #Figure out the major and minor MIME type of this attachment,
        #given its filename.
        if not mimetype:
            mimetype = mimetypes.guess_type(filename)[0]
        if not mimetype:
            raise Exception("Couldn't determine MIME type for ", filename)
        if '/' in mimetype:
            major, minor = mimetype.split('/')
        else:
            major = mimetype
            minor = None
        #The message was constructed under the assumption that it was
        #a single-part message. Now that we know there's to be at
        #least one attachment, we need to change it into a multi-part
        #message, with the first part being the body of the message.
        if not self.hasAttachments:
            body = self.msg.get_payload()
            newMsg = MIMEMultipart()
            newMsg.attach(MIMEText(body))
            #Copy over the old headers to the new object.
            for header, value in self.msg.items():
                newMsg[header] = value
            self.msg = newMsg
            self.hasAttachments = True
        subMessage = MIMENonMultipart(major, minor, name=filename)
        subMessage.set_payload(attachment)
        #Encode text attachments as quoted-printable, and all other
        #types as base64.
        if major == 'text':
            encoder = Encoders.encode_quopri
        else:
            encoder = Encoders.encode_base64
        encoder(subMessage)
        #Link the MIME message part with its parent message.
        self.msg.attach(subMessage)
    def _addresses(self, addresses, key):
        """Sets the given header to a string representation of the given
        list of addresses."""
        #In Python 3 a string also has __iter__, so test for str
        #explicitly instead of checking for that attribute.
        if not isinstance(addresses, str):
            addresses = ', '.join(addresses)
        self[key] = addresses
    #A few methods to let scripts treat this object more or less like
    #a Message or MultipartMessage, by delegating to the real Message
    #or MultipartMessage this object holds.
    def __getitem__(self, key):
        "Return a header of the underlying message."
        return self.msg[key]
    def __setitem__(self, key, value):
        "Set a header of the underlying message."
        self.msg[key] = value
    def __getattr__(self, key):
        return getattr(self.msg, key)
    def __str__(self):
        "Returns a string representation of this message."
        return self.msg.as_string()
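
For comparison, newer versions of Python ship a class that makes a similar switch from a simple message to a multipart one on its own. The following is a minimal sketch, assuming Python 3.6 or later, where email.message.EmailMessage is available; the addresses and the attachment bytes are placeholders invented for the example, and this is not part of SendMail.py.

```python
from email.message import EmailMessage

# Start with a plain text message, much as SmartMessage does.
msg = EmailMessage()
msg['From'] = 'Me <me@example.com>'      # placeholder address
msg['To'] = 'You <you@example.com>'      # placeholder address
msg['Subject'] = 'Your picture'
msg.set_content("Here's that picture I took of you.")
before = msg.get_content_type()

# Adding an attachment converts the message to multipart/mixed,
# with the original text as its first part.
placeholder = b'not really image data'
msg.add_attachment(placeholder, maintype='image', subtype='jpeg',
                   filename='photo.jpg')
after = msg.get_content_type()
print(before, '->', after)
```

The content type changes only when the first attachment is added, which is the same idea SmartMessage implements by hand.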

Sending Mail with SMTP and smtplib

Now that you know how to construct e-mail messages, it's appropriate to revisit in a little more detail the protocol used to send them. This is SMTP, another TCP/IP-based protocol, defined in RFC 2821.

Look at the original example one more time:

>>> fromAddress = '[email protected]'
>>> toAddress = [your e-mail address]
>>> msg = "Subject: Hello\n\nThis is the body of the message."
>>> import smtplib
>>> server = smtplib.SMTP("localhost", 25)
>>> server.sendmail(fromAddress, toAddress, msg)
{}

You connect to an SMTP server (at port 25 on localhost) and send a string message from one address to another. Of course, the location of the SMTP server shouldn't be hard-coded, and because some servers require authentication, it would be nice to be able to accept authentication information when creating the SMTP object. Here's a class that works with the SmartMessage class defined in the previous section to make it easier to send mail. Because the two classes go together, add this class to SendMail.py, the file that also contains the SmartMessage class:

from smtplib import SMTP
class MailServer(SMTP):
    "A more user-friendly interface to the default SMTP class."

    def __init__(self, server, serverUser=None, serverPassword=None, port=25):
        "Connect to the given SMTP server."
        SMTP.__init__(self, server, port)
        self.user = serverUser
        self.password = serverPassword
        #Uncomment this line to see the SMTP exchange in detail.
        #self.set_debuglevel(True)

    def sendMessage(self, message):
        "Sends the given message through the SMTP server."
        #Some SMTP servers require authentication.
        if self.user:
            self.login(self.user, self.password)

        #The message contains a list of destination addresses that
        #might have names associated with them. For instance,
        #"J. Random Hacker <[email protected]>".  Some mail servers
        #will only accept bare e-mail addresses, so we need to create a
        #version of this list that doesn't have any names associated
        #with it.
        destinations = message.to
        if isinstance(destinations, str):
            destinations = self._cleanAddress(destinations)
        else:
            #Build a real list: sendmail needs to know its length.
            destinations = [self._cleanAddress(d) for d in destinations]
        self.sendmail(message['From'], destinations, str(message))

    def _cleanAddress(self, address):
        "Transforms 'Name <e-mail@domain>' into 'e-mail@domain'."
        parts = address.split('<', 1)
        if len(parts) > 1:
            #This address is actually a real name plus an address:
            newAddress = parts[1]
            endAddress = newAddress.find('>')
            if endAddress != -1:
                address = newAddress[:endAddress]
        return address
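
The standard library can also do this kind of address clean-up for you. Here is a short sketch using email.utils.parseaddr and email.utils.getaddresses as an alternative to a hand-written _cleanAddress; the addresses are placeholders made up for the example.

```python
from email.utils import parseaddr, getaddresses

# parseaddr splits one address into (real name, bare address).
name, address = parseaddr('J. Random Hacker <jrh@example.com>')
print(name, '|', address)

# getaddresses does the same for a list of address strings.
recipients = ['You <you@example.com>', 'other@example.com']
bare = [addr for _, addr in getaddresses(recipients)]
print(bare)
```

The hand-written version has the advantage of showing exactly what is being stripped; the library functions cope with more of the odd address formats that show up in real mail.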

Retrieving Internet E-mail

Now that you've seen how to send mail, it's time to go all the way toward fulfilling Jamie Zawinski's prophecy and expand your programs so that they can read mail. There are three main ways to do this, and the choice is probably not up to you. How you retrieve mail depends on your relationship with the organization that provides your Internet access.

Parsing a Local Mail Spool with mailbox

If you have a UNIX shell account on your mail server (because, for instance, you run a mail server on your own computer), mail for you is appended to a file (probably /var/spool/mail/[your username]) as it comes in. If this is how your mail setup works, your existing mail client is probably set up to parse that file. It may also be set up to move messages out of the spool file and into your home directory as they come in.

The incoming mailbox in /var/spool/mail/ is kept in a particular format called "mbox format." You can parse these files (as well as mailboxes in other formats such as MH or Maildir) by using the classes in the mailbox module.

Here's a simple script, MailboxSubjectLister.py, that iterates the messages in a mailbox file, printing out the subject of each one:

#!/usr/bin/python
import mailbox
import sys
if len(sys.argv) < 2:
    print("Usage: %s [path to mailbox file]" % sys.argv[0])
    sys.exit(1)
path = sys.argv[1]
subjects = []
#create=False keeps mbox from creating an empty file if the path is wrong.
for message in mailbox.mbox(path, create=False):
    subjects.append(message['Subject'])
print('%s message(s) in mailbox "%s":' % (len(subjects), path))
for subject in subjects:
    print('', subject)

The mbox class (like the other Mailbox classes in the mailbox module) takes the path to the mailbox as its first constructor argument, and optionally a factory function that builds a message object from a file-like object. Older versions of the module offered UnixMailbox and PortableUnixMailbox classes that took an open file instead; Python 3 no longer includes them. With no factory given, iterating over an mbox yields mboxMessage objects, which are subclasses of the email package's Message class. For messages that arrive some other way, the email module's message_from_file and message_from_string functions are the most common ways of creating Python representations of messages you receive.

You can work on these Message objects just as you could with the Message objects created from scratch in earlier examples, where the point was to send e-mail messages. Python uses the same classes to represent incoming and outgoing messages.
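
If you don't have a mail spool handy, you can make one to experiment with. The sketch below writes two messages into a temporary mbox file and then reads the subjects back; the file name, address, and subjects are invented for the example.

```python
import mailbox
import os
import tempfile
from email.message import Message

# Create a small mbox file in a temporary directory.
path = os.path.join(tempfile.mkdtemp(), 'test.mbox')
box = mailbox.mbox(path)
for subject in ('First message', 'Second message'):
    msg = Message()
    msg['From'] = 'me@example.com'    # placeholder address
    msg['Subject'] = subject
    msg.set_payload('Body of %s' % subject.lower())
    box.add(msg)
box.close()   # close() writes any pending changes to disk

# Reopen the file and list the subjects, as the script above does.
box = mailbox.mbox(path, create=False)
subjects = [message['Subject'] for message in box]
box.close()
print(subjects)
```

The same Message class is used on both sides: once to build the messages that go into the mailbox, and again (through a subclass) for the messages that come out of it.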

Fetching Mail from a POP3 Server with poplib

Parsing a local mail spool didn't require going over the network, because you ran the script on the same machine that had the mail spool. There was no need to involve a network protocol, only a file format (the format of UNIX mailboxes, derived mainly from RFC 2822).

However, most people don't have a UNIX shell account on their mail server (or if they do, they want to read mail on their own machine instead of on the server). To fetch mail from your mail server, you need to go over a network, which means you must use a protocol. There are two popular protocols for doing this. The first, which was once near-universal though now it's waning in popularity, is POP3, the third revision of the Post Office Protocol.

POP3 is defined in RFC 1939, but as with most popular Internet protocols, you don't need to delve very deeply into the details, because Python includes a module that wraps the protocol in a Python interface.

Here's POP3SubjectLister, a POP3-based implementation of the same idea as the mailbox parser script. This script prints the subject line of each message on the server:

#!/usr/bin/python
from poplib import POP3
import email.parser
class SubjectLister(POP3):
    """Connect to a POP3 mailbox and list the subject of every message
    in the mailbox."""
    def __init__(self, server, username, password):
        "Connect to the POP3 server."
        POP3.__init__(self, server, 110)
        #Uncomment this line to see the details of the POP3 protocol.
        #self.set_debuglevel(2)
        self.user(username)
        response = self.pass_(password)
        #poplib returns server responses as bytes.
        if response[:3] != b'+OK':
            #There was a problem connecting to the server.
            raise Exception(response)
    def summarize(self):
        "Retrieve each message, parse it, and print the subject."
        numMessages = self.stat()[0]
        print('%d message(s) in this mailbox.' % numMessages)
        parser = email.parser.Parser()
        for messageNum in range(1, numMessages+1):
            #retr returns the message as a list of byte strings, one
            #per line. Join and decode them before parsing.
            lines = self.retr(messageNum)[1]
            messageString = b'\n'.join(lines).decode('utf-8', 'replace')
            message = parser.parsestr(messageString)
            #To parse only the headers, pass True as the second argument:
            #message = parser.parsestr(messageString, True)
            print('', message['Subject'])

After the data is on this side of the network, there's no fundamental difference between the way it's handled with this script and the one based on the mailbox module. As with the mailbox script, you use the email package to parse each message into a Python data structure (although here, you use the Parser class, defined in the email.parser module, rather than a convenience function such as message_from_file).

The downside of using POP3 for this purpose is that the POP3.retr method has side effects. When you call retr on a message on the server, the server marks that message as having been read. If you use a mail client or a program like fetchmail to retrieve new mail from the POP3 server, then running this script might confuse the other program. The message will still be on the server, but your client might not download it if it thinks the message has already been read.

POP3 also defines a command called top, which doesn't mark a message as having been read and which only retrieves the headers of a message. Both of these properties are ideal for the purposes of this script: you'll save bandwidth (not having to retrieve the whole message just to get the subject) and your script won't interfere with the operation of other programs that use the same POP3 mailbox. Unfortunately, not all POP3 servers implement the top command correctly. Because it's so useful when implemented correctly, though, here's a subclass of the SubjectLister class that uses the top command to get message headers instead of retrieving the whole message. If you know your server supports top correctly, this is a better implementation:

class TopBasedSubjectLister(SubjectLister):

    def summarize(self):
        """Retrieve the first part of the message and find the 'Subject:'
        header."""
        numMessages = self.stat()[0]
        print('%d message(s) in this mailbox.' % numMessages)
        for messageNum in range(1, numMessages+1):
            #Just get the headers of each message. Scan the headers
            #looking for the subject. poplib returns each line as bytes.
            for header in self.top(messageNum, 0)[1]:
                if header.startswith(b'Subject:'):
                    print(header[len(b'Subject:'):].decode('utf-8', 'replace'))
                    break

Both SubjectLister and TopBasedSubjectLister will yield the same output, but you'll find that TopBasedSubjectLister runs a lot faster (assuming your POP3 server implements top correctly).

Finally, you'll create a simple command-line interface to the POP3-based SubjectLister class, just as you did for the MailboxSubjectLister.py. This time, however, you need to provide a POP3 server and credentials on the command line, instead of the path to a file on disk:

if __name__ == '__main__':
    import sys
    if len(sys.argv) < 4:
        print('Usage: %s [POP3 hostname] [POP3 user] [POP3 password]' % sys.argv[0])
        sys.exit(0)
    lister = TopBasedSubjectLister(sys.argv[1], sys.argv[2], sys.argv[3])
    lister.summarize()

Fetching Mail from an IMAP Server with imaplib

The other protocol for accessing a mailbox on a remote server is IMAP, the Internet Message Access Protocol. The most recent revision of IMAP is defined in RFC 3501, and it has significantly more features than POP3. It's also gaining in popularity over POP3.

The main difference between POP3 and IMAP is that POP3 is designed to act like a mailbox: It just holds your mail for a while until you collect it. IMAP is designed to keep your mail permanently stored on the server. Among other things, you can create folders on the server, sort mail into them, and search them. These are more complex features that are typically associated with end-user mail clients. With IMAP, a mail client only needs to expose these features of IMAP; it doesn't need to implement them on its own.

Keeping your mail on the server makes it easier to keep the same mail setup while moving from computer to computer. Of course, you can still download mail to your computer and then delete it from the server, as with POP3.

Here's IMAPSubjectLister.py, an IMAP version of the script you've already written twice, which prints out the subject lines of all mail on the server. IMAP has more features than POP3, so this script exercises proportionately fewer of them. However, even for the same functionality, it's a great improvement over the POP3 version of the script. IMAP saves bandwidth by retrieving the message subjects and nothing else: a single subject header per message. Even when POP3's top command is implemented correctly, it can't do better than fetching all of the headers as a group.

What's the catch? As the imaplib module says of itself, "to use this module, you must read the RFCs pertaining to the IMAP4 protocol." The imaplib module provides a function corresponding to each of the IMAP commands, but it doesn't do many transformations between the Python data structures you're used to creating and the formatted strings used by the IMAP protocol. You'll need to keep a copy of RFC 3501 on hand or you won't know what to pass into the imaplib methods.

For instance, to pass a list of message IDs into imaplib, you need to pass in a string like "1,2,3", not the Python list [1,2,3]. To make sure only the subject is pulled from the server, IMAPSubjectLister.py passes the string "(BODY[HEADER.FIELDS (SUBJECT)])" as an argument to an imaplib method. The result of that command is a nested list of formatted strings, only some of which are actually useful to the script.
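
Converting between the two forms is easy enough to do yourself. Here's a small sketch; message_set is a hypothetical helper written for this example, not something imaplib provides.

```python
def message_set(numbers):
    "Format message numbers as an IMAP message set such as '1,2,3'."
    return ','.join(str(n) for n in numbers)

ids = message_set([1, 2, 3])
print(ids)

# A contiguous range is written with a colon instead.
everything = '%d:%d' % (1, 10)
print(everything)

# Either string could then be passed as the first argument of
# IMAP4.fetch, for instance:
#   connection.fetch(ids, '(BODY[HEADER.FIELDS (SUBJECT)])')
```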

This is not exactly the kind of intuitiveness one comes to expect from Python. imaplib is certainly useful, but it doesn't do a very good job of hiding the details of IMAP from the programmer:

#!/usr/bin/python
from imaplib import IMAP4
class SubjectLister(IMAP4):
    """Connect to an IMAP4 mailbox and list the subject of every message
    in the mailbox."""
    def __init__(self, server, username, password):
        "Connect to the IMAP server."
        IMAP4.__init__(self, server)
        #Uncomment this line to see the details of the IMAP4 protocol.
        #self.debug = 4
        self.login(username, password)
    def summarize(self, mailbox='Inbox'):
        "Retrieve the subject of each message in the given mailbox."
        #The SELECT command makes the given mailbox the 'current' one,
        #and returns the number of messages in that mailbox. Each message
        #is accessible via its message number. If there are 10 messages
        #in the mailbox, the messages are numbered from 1 to 10.
        numberOfMessages = int(self._result(self.select(mailbox)))

        print('%s message(s) in mailbox "%s":' % (numberOfMessages, mailbox))
        #The FETCH command takes a comma-separated list of message
        #numbers, and a string designating what parts of the
        #message you want. In this case, we want only the
        #'Subject' header of the message, so we'll use an argument
        #string of '(BODY[HEADER.FIELDS (SUBJECT)])'.
        #
        #See section 6.4.5 of RFC3501 for more information on the
        #format of the string used to designate which part of the
        #message you want. To get the entire message, in a form
        #acceptable to the e-mail parser, ask for '(RFC822)'.
        subjects = self._result(self.fetch('1:%d' % numberOfMessages,
                                           '(BODY[HEADER.FIELDS (SUBJECT)])'))
        for subject in subjects:
            #The useful results are (envelope, data) tuples; the rest
            #are bare byte strings such as b')' that can be skipped.
            if isinstance(subject, tuple):
                subject = subject[1].decode('utf-8', 'replace')
                print('', subject.split('\n', 1)[0].rstrip())
    def _result(self, result):
        """Every method of imaplib returns a list containing a status
        code and a set of the actual result data. This convenience
        method throws an exception if the status code is other than
        "OK", and returns the result data if everything went all
        right."""
        status, result = result
        if status != 'OK':
            raise self.error(result)
        if len(result) == 1:
            result = result[0]
        return result
if __name__ == '__main__':
    import sys
    if len(sys.argv) < 4:
        print('Usage: %s [IMAP hostname] [IMAP user] [IMAP password]' % sys.argv[0])
        sys.exit(0)
    lister = SubjectLister(sys.argv[1], sys.argv[2], sys.argv[3])
    lister.summarize()

IMAP's Unique Message IDs

Complaints about imaplib's user-friendliness aside, you might have problems writing IMAP scripts if you assume that the message numbers don't change over time. If another IMAP client deletes messages from a mailbox while this script is running against it (suppose you have your mail client running, and you use it to delete some spam while this script is running), the message numbers will be out of sync from that point on.

The IMAP-based SubjectLister class minimizes this risk by getting the subject of every message in one operation, immediately after selecting the mailbox:

self.fetch('1:%d' % numberOfMessages, '(BODY[HEADER.FIELDS (SUBJECT)])')

If there are 10 messages in the inbox, the first argument to fetch will be 1:10. This is a slice of the mailbox, similar to a slice of a Python list, which returns all of the messages: message 1 through message 10 (IMAP and POP3 messages are numbered starting from 1).

Getting the data you need as soon as you connect to the server minimizes the risk that you'll pass a no-longer-valid message number onto the server, but you can't always do that. You may write a script that deletes a mailbox's messages, or that files them in a second mailbox. After you change a mailbox, you may not be able to trust the message numbers you originally got.

POP3 servers also support UIDs, though it's less common for multiple clients to access a single POP3 mailbox simultaneously. A POP3 object's uidl method will retrieve the UIDs of the messages in its mailbox, each paired with the message's current number. The POP3 object's other methods, such as retr and top, still take message numbers, so use the uidl listing to look up a message's current number from its UID before calling them. IMAP's UIDs are numeric; POP3's are opaque strings chosen by the server, often "message digests": hexadecimal signatures derived from the contents of each message.
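
One way to work with POP3 UIDs is to turn the uidl listing into a lookup table. The sketch below assumes each listing line has the form b'<number> <uid>', which is the shape RFC 1939 describes; uid_map is a hypothetical helper, and the sample lines stand in for what a real server would return.

```python
def uid_map(uidl_lines):
    "Map each unique ID to the message number it currently has."
    mapping = {}
    for line in uidl_lines:
        number, uid = line.decode('ascii').split(' ', 1)
        mapping[uid] = int(number)
    return mapping

# Sample lines in the shape of POP3.uidl()[1]; the UIDs are examples.
sample = [b'1 whqtswO00WBw418f9t5JxYwZ', b'2 QhdPYR:00WBw1Ph7x7']
numbers = uid_map(sample)
print(numbers)

# With a live connection this might be used as:
#   numbers = uid_map(connection.uidl()[1])
#   connection.top(numbers[someUid], 0)
```

Because the table is built from the server's current listing, rebuilding it after you delete messages gives you fresh message numbers for the UIDs you care about.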

Secure POP3 and IMAP

Both the POP3 and IMAP examples covered earlier in this section have a security problem: They send your username and password over the network without encrypting them. That's why both POP and IMAP are often run atop the Secure Sockets Layer (SSL). This is a generic encryption layer also used to secure HTTP connections on the World Wide Web. POP and IMAP servers that support SSL run on different ports from the ones that don't: The standard port number for POP over SSL is 995 instead of 110, and IMAP over SSL uses port 993 instead of port 143.

If your POP3 or IMAP server supports SSL, you can get an encrypted connection to it by just swapping out the POP3 or IMAP4 class for the POP3_SSL or IMAP4_SSL class. Each SSL class is in the same module and has the same interface as its insecure counterpart but encrypts all data before sending it over the network.
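
Because the secure and insecure classes share an interface, a script can choose between them at run time. This is a minimal sketch, assuming your Python was built with SSL support (otherwise the _SSL classes are not defined); pop3_class and imap_class are hypothetical helpers, and the port constants are the defaults defined by poplib and imaplib.

```python
import imaplib
import poplib

def pop3_class(use_ssl=True):
    "Return the poplib class and default port for the chosen transport."
    if use_ssl:
        return poplib.POP3_SSL, poplib.POP3_SSL_PORT
    return poplib.POP3, poplib.POP3_PORT

def imap_class(use_ssl=True):
    "Return the imaplib class and default port for the chosen transport."
    if use_ssl:
        return imaplib.IMAP4_SSL, imaplib.IMAP4_SSL_PORT
    return imaplib.IMAP4, imaplib.IMAP4_PORT

for chooser in (pop3_class, imap_class):
    for use_ssl in (False, True):
        cls, port = chooser(use_ssl)
        print(cls.__name__, port)
```

A subject lister could then be built on whichever class is returned, for example by using it as the base class instead of naming POP3 or IMAP4 directly.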

Webmail Applications Are Not E-mail Applications

If you use a webmail system such as Yahoo! Mail or Gmail, you're not technically using a mail application at all: You're using a web application that happens to have a mail application on the other side. The scripts in this section won't help you fetch mail from or send mail through these services, because they implement HTTP, not any of the e-mail protocols (however, Yahoo! Mail offers POP3 access for a fee). Instead, you should look at Chapter 20 for information on how web applications work.

The libgmail project aims to create a Python interface to Gmail, one that can treat Gmail as an SMTP, POP3, or IMAP server. The libgmail homepage is at http://libgmail.sourceforge.net/.

Socket Programming

So far, you've concerned yourself with the protocols and file formats surrounding a single Internet application: e-mail. E-mail is certainly a versatile and useful application, but e-mail–related protocols account for only a few of the hundreds implemented atop the Internet Protocol. Python makes it easier to use the e-mail–related protocols (and a few other protocols not covered in this chapter) by providing wrapper libraries, but Python doesn't come with a library for every single Internet protocol. It certainly won't have one for any new protocols you decide to create for your own Internet applications.

To write your own protocols, or to implement your own Python libraries along the lines of imaplib or poplib, you'll need to go down a level and learn how programming interfaces to IP-based protocols actually work. Fortunately, it's not hard to write such code: smtplib, poplib, and others do it without becoming too complicated. The secret is the socket library, which makes reading and writing to a network interface look a lot like reading and writing to files on disk.

Introduction to Sockets

In many of the previous examples, you connected to a server on a particular port of a particular machine (for instance, port 25 of localhost for a local SMTP server). When you tell imaplib or smtplib to connect to a port on a certain host, behind the scenes Python is opening a connection to that host and port. Once the connection is made, the server opens a reciprocal connection to your computer. A single Python "socket" object hides the outgoing and incoming connections under a single interface. A socket is like a file you can read from and write to at the same time.

To implement a client for a TCP/IP-based protocol, you open a socket to an appropriate server. You write data to the socket to send it to the server, and read from the socket the data the server sends you. To implement a server, it's just the opposite: You bind a socket to a hostname and a port and wait for a client to connect to it. Once you have a client on the line, you read from your socket to get data from the client, and write to the socket to send data back.

It takes an enormous amount of work to send a single byte over the network, but between TCP/IP and the socket library, you get to skip almost all of it. You don't have to figure out how to get your data halfway across the world to its destination, because TCP/IP handles that for you. Nor need you worry about turning your data into TCP/IP packets, because the socket library handles that for you.

Just as e-mail and the Web are the killer apps for the use of the Internet, sockets might be considered the killer app for the adoption of TCP/IP. Sockets were introduced in an early version of BSD UNIX, but since then just about every TCP/IP implementation has used sockets as its metaphor for how to write network programs. Sockets make it easy to use TCP/IP (at least, easier than any alternative), and this has been a major driver of TCP/IP's popularity.

As a first socket example, here's a super-simple socket server, SuperSimpleSocketServer.py:

#!/usr/bin/python
import socket
import sys
if len(sys.argv) < 3:
    print('Usage: %s [hostname] [port number]' % sys.argv[0])
    sys.exit(1)
hostname = sys.argv[1]
port = int(sys.argv[2])
#Set up a standard Internet socket. The setsockopt call lets this
#server use the given port even if it was recently used by another
#server (for instance, an earlier incarnation of
#SuperSimpleSocketServer).
sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
#Bind the socket to a port, and bid it listen for connections.
sock.bind((hostname, port))
sock.listen(1)
print("Waiting for a request.")
#Handle a single request.
request, clientAddress = sock.accept()
print("Received request from", clientAddress)
request.send(bytes('-=SuperSimpleSocketServer 3000=-\n', 'utf-8'))
request.send(bytes('Go away!\n', 'utf-8'))
request.shutdown(2) #Stop the client from reading or writing anything.
print("Have handled request, stopping server.")
sock.close()

This server will serve only a single request. As soon as any client connects to the port to which it's bound, it will tell the client to go away, close the connection, stop serving requests, and exit.
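
To see both ends of such an exchange in one place, here is a sketch that runs a one-shot server in a background thread and connects to it with a client socket. It is not SuperSimpleSocketServer.py itself: the serve_once function is a stand-in written for this example, and binding to port 0 (which asks the operating system for any free port) is an assumption made so that the example doesn't collide with anything already running.

```python
import socket
import threading

def serve_once(listener):
    "Accept one connection, send a greeting, and hang up."
    connection, _ = listener.accept()
    connection.sendall(b'-=SuperSimpleSocketServer 3000=-\n')
    connection.sendall(b'Go away!\n')
    connection.close()

# Port 0 asks the operating system for any free port.
listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(('127.0.0.1', 0))
listener.listen(1)
port = listener.getsockname()[1]
thread = threading.Thread(target=serve_once, args=(listener,))
thread.start()

# The client: connect, then read until the server closes the connection.
client = socket.create_connection(('127.0.0.1', port))
chunks = []
while True:
    data = client.recv(1024)
    if not data:
        break
    chunks.append(data)
client.close()
thread.join()
listener.close()

reply = b''.join(chunks).decode('utf-8')
print(reply)
```

The client reads in a loop because a single recv call is not guaranteed to return everything the server sent; an empty result is the signal that the other side has closed the connection.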

Binding to an External Hostname

If you tried to telnet into the SuperSimpleSocketServer from another machine, as suggested previously, you might have noticed that you weren't able to connect to the server. If so, it may be because you started the server by binding it to localhost. The special "localhost" hostname is an internal hostname, one that can't be accessed from another machine. After all, from someone else's perspective, "localhost" means their computer, not yours.

This is actually very useful because it enables you to test out the servers from this chapter (and Chapter 20) without running the risk of exposing your computer to connections from the Internet at large (of course, if you are running these servers on a multiuser machine, you might have to worry about the other users on the same machine, so try to run these on a system that you have to yourself). However, when it comes time to host a server for real, and external connections are what you want, you need to bind your server to an external hostname.

If you can log in to your computer remotely via SSH, or you already run a web server, or you ever make a reference to your computer from another one, you already know an external hostname for your computer. On the other hand, if you have a dial-up or broadband connection, you're probably assigned a hostname along with an IP address whenever you connect to your ISP. Find your computer's IP address and do a DNS lookup on it to find an external hostname for your computer. If all else fails, you can bind servers directly to your external IP address (not 127.0.0.1, because that will have the same problem as binding to "localhost").

If you bind a server to an external hostname and still can't connect to it from the outside, there may be a firewall in the way. Fixing that is beyond what this book can cover. You should ask your local computer guru to help you with this.

The Mirror Server

Here's a server that's a little more complex (though not more useful) and that shows how Python enables you to treat socket connections like files. This server accepts lines of text from a socket, just as a script might on standard input. It reverses the text and writes the reversed version back through the socket, just as a script might on standard output. When it receives a blank line, it terminates the connection:

#!/usr/bin/python
import socket

class MirrorServer:
    """Receives text on a line-by-line basis and sends back a reversed
    version of the same text."""

    def __init__(self, port):
        "Binds the server to the given port."
        self.socket = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        self.socket.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        self.socket.bind(port)
        #Queue up to five requests before turning clients away.
        self.socket.listen(5)

    def run(self):
        "Handles incoming requests forever."
        while True:
            request, client_address = self.socket.accept()
            #Turn the incoming and outgoing connections into files.
            input = request.makefile('rb', 0)
            output = request.makefile('wb', 0)
            l = True
            try:
                while l:
                    l = input.readline().strip()
                    if l:
                        output.write(l[::-1] + bytes('\n','utf-8'))
                    else:
                        #A blank line indicates a desire to terminate the
                        #connection.
                        request.shutdown(2) #Shut down both reads and writes.
            except socket.error:
                #Most likely the client disconnected.
                pass

if __name__ == '__main__':
    import sys
    if len(sys.argv) < 3:
        print('Usage: %s [hostname] [port number]' % sys.argv[0])
        sys.exit(1)
    hostname = sys.argv[1]
    port = int(sys.argv[2])
    MirrorServer((hostname, port)).run()

The Mirror Client

Though you've just seen that the mirror server is perfectly usable through telnet, not everyone is comfortable using telnet. What you need is a flashy mirror server client with bells and whistles, so that even networking novices can feel the thrill of typing in text and seeing it printed out backward. Here's a simple client that takes command-line arguments for the server destination and the text to reverse. It connects to the server, sends the data, and prints the reversed text:

#!/usr/bin/python
import socket

class MirrorClient:
    "A client for the mirror server."

    def __init__(self, server, port):
        "Connect to the given mirror server."
        self.socket = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        self.socket.connect((server, port))

    def mirror(self, s):
        "Sends the given string to the server, and returns the response."
        if not s.endswith('\n'):
            s += '\n'
        self.socket.sendall(bytes(s, 'utf-8'))

        #Read server response in chunks until we get a newline; that
        #indicates the end of the response.
        buf = []
        chunk = b''
        while b'\n' not in chunk:
            try:
                chunk = self.socket.recv(1024)
            except socket.error:
                break
            if not chunk:
                #The server closed the connection.
                break
            buf.append(chunk)
        return b''.join(buf).decode('utf-8')[:-1]

    def close(self):
        #We don't want to mirror anything else.
        self.socket.send(bytes('\n', 'utf-8'))
        self.socket.close()

if __name__ == '__main__':
    import sys
    if len(sys.argv) < 4:
        print('Usage: %s [host] [port] [text to be mirrored]' % sys.argv[0])
        sys.exit(1)
    hostname = sys.argv[1]
    port = int(sys.argv[2])
    toMirror = sys.argv[3]

    m = MirrorClient(hostname, port)
    print(m.mirror(toMirror))
    m.close()

The mirror server turns its socket connection into a pair of files, but this client reads from and writes to the socket directly. There's no compelling reason for this; I just felt this chapter should include at least one example that used the lower-level socket API. Note how the server response is read in chunks, and each chunk is scanned for the newline character that indicates the end of the response. If this example had created a file for the incoming socket connection, that code would have been as simple as calling input.readline.
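To make that last point concrete, here is a small sketch of the file-based alternative. It is not part of MirrorClient: it uses socket.socketpair to create two already-connected sockets so that no server needs to be running, and the variable names are chosen only for this illustration.

```python
import socket

# socketpair() returns two already-connected sockets, which stand in for
# a client/server connection without needing a running server.
client_side, server_side = socket.socketpair()

# Pretend to be the mirror server: send back one reversed line.
server_side.sendall(b'olleh\n')

# Wrap the client's socket in a file object and read one whole line.
reader = client_side.makefile('rb')
response = reader.readline().decode('utf-8').rstrip('\n')
print(response)

reader.close()
client_side.close()
server_side.close()
```

The single call to readline replaces the loop that collects chunks and scans each one for a newline.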

It's important to know when the response has ended, because calling socket.recv (or input.readline) will block your process until the server sends some more data. If the server is waiting for more data from the client, your process will block forever.
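One common way to guard against blocking forever is to put a timeout on the socket. The following is only a sketch under that assumption: recv_or_give_up is a hypothetical helper, and the connected pair from socket.socketpair plays the role of a client and a server that stays silent.

```python
import socket

def recv_or_give_up(sock, seconds):
    "Wait up to the given number of seconds for data; return None on timeout."
    sock.settimeout(seconds)
    try:
        return sock.recv(1024)
    except socket.timeout:
        return None
    finally:
        sock.settimeout(None)  # Restore the default blocking behavior.

# Two connected sockets stand in for the client and a silent server.
client_side, server_side = socket.socketpair()
quiet = recv_or_give_up(client_side, 0.2)   # Nothing was sent, so this times out.
server_side.sendall(b'olleh\n')
answer = recv_or_give_up(client_side, 0.2)  # Data is waiting, so this returns it.
print(quiet, answer)

client_side.close()
server_side.close()
```

A timeout doesn't tell you that the response has ended, only that nothing more arrived in time, so a protocol with a clear end-of-response marker (like the mirror server's newline) is still the better design.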

SocketServer

Sockets are very useful, but Python isn't satisfied with providing the same C-based socket interface you can get with most languages on most operating systems. Python goes one step further and provides socketserver, a module full of classes that let you write sophisticated socket-based servers with very little code.

Most of the work in building a socketserver is defining a request handler class. This is a subclass of the socketserver module's BaseRequestHandler class, and the purpose of each request handler object is to handle a single client request for as long as the client is connected to the server. This is implemented in the handler's handle method. The handler may also define per-request setup and tear-down code by overriding setup and finish.

The methods of a BaseRequestHandler subclass have access to the following three members:

  • request: A socket object representing the client request: the same object obtained from socket.accept in the MirrorServer example.

  • client_address: A 2-tuple containing the hostname and port to which any data the server outputs will be sent. The other object obtained from socket.accept in the MirrorServer example.

  • server: A reference to the socketserver that created the request handler object.
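The life cycle described above can be sketched with a throwaway handler that records when each of its methods runs. Everything named here (LoggingHandler, events, the echo behavior) is made up for the illustration. Port 0 asks the operating system for any free port, and handle_request serves exactly one connection.

```python
import socket
import socketserver
import threading

events = []  # Records the order in which the handler's methods run.

class LoggingHandler(socketserver.BaseRequestHandler):
    def setup(self):
        events.append('setup')

    def handle(self):
        # self.request is the connected socket; echo back whatever arrives.
        data = self.request.recv(1024)
        events.append('handle')
        self.request.sendall(data)

    def finish(self):
        events.append('finish')

# Port 0 lets the operating system pick a free port for this demonstration.
server = socketserver.TCPServer(('127.0.0.1', 0), LoggingHandler)
port = server.server_address[1]

# handle_request() serves exactly one connection and then returns.
worker = threading.Thread(target=server.handle_request)
worker.start()

client = socket.create_connection(('127.0.0.1', port))
client.sendall(b'ping')
reply = client.recv(1024)
client.close()

worker.join()
server.server_close()
print(events, reply)
```

Each new connection gets its own handler object, so anything you store on self lasts only as long as that one client stays connected.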

By subclassing StreamRequestHandler instead of BaseRequestHandler, you also get access to the file-like objects that let you read from and write to the socket connection. StreamRequestHandler gives you access to two other members:

  • rfile: The file corresponding to the data that comes in over the socket (from the client if you're writing a server, from the server if you're writing a client). Equivalent to what you get when you call request.makefile('rb').

  • wfile: The file corresponding to the data that you send over the socket (to the client if you're writing a server, to the server if you're writing a client). Equivalent to what you get when you call request.makefile('wb').

By rewriting the MirrorServer as a socketserver server (specifically, a TCPServer), you can eliminate a lot of code to do with socket setup and teardown, and focus on the arduous task of reversing text. Here's MirrorSocketServer.py:

#!/usr/bin/python
import socketserver

class RequestHandler(socketserver.StreamRequestHandler):
    "Handles one request to mirror some text."

    def handle(self):
        """Read from StreamRequestHandler's provided rfile member,
        which contains the input from the client. Mirror the text
        and write it to the wfile member, which contains the output
        to be sent to the client."""
        l = True
        while l:
            l = self.rfile.readline().strip()
            if l:
                self.wfile.write(l[::-1] + bytes('\n', 'utf-8'))

if __name__ == '__main__':
    import sys
    if len(sys.argv) < 3:
        print('Usage: %s [hostname] [port number]' % sys.argv[0])
        sys.exit(1)
    hostname = sys.argv[1]
    port = int(sys.argv[2])

    socketserver.TCPServer((hostname, port), RequestHandler).serve_forever()

Almost all of the socket-specific code is gone. Whenever anyone connects to this server, the TCPServer class will create a new RequestHandler with the appropriate members and call its handle method to handle the request.

The MirrorClient you wrote earlier will work equally well with this server, because across the network both servers take the same input and yield the same output. The same principle applies as when you change the implementation of a function in a module to get rid of redundant code but leave the interface the same.

Multithreaded Servers

One problem with both of these implementations of the mirror server is that only one client at a time can connect to a running server. If you open two telnet sessions to a running server, the second session won't finish connecting until you close the first one. If real servers worked this way, nothing would ever get done. That's why most real servers spawn threads or subprocesses to handle multiple connections.

The socketserver module defines two useful classes for handling multiple connections at once: ThreadingMixIn and ForkingMixIn. A server class that inherits from ThreadingMixIn will automatically spawn a new thread to handle each incoming request. A server class that inherits from ForkingMixIn will automatically fork a new subprocess to handle each incoming request. I prefer ThreadingMixIn because threads are more efficient and more portable than subprocesses. It's also much easier to write code for a thread to communicate with its parent than for a subprocess to communicate with its parent.

See Chapter 9 for an introduction to threads and subprocesses.

Here's MultithreadedMirrorServer.py, a multithreaded version of the MirrorSocketServer. Note that it uses the exact same RequestHandler definition as MirrorSocketServer.py. The difference here is that instead of running a TCPServer, you run a ThreadingTCPServer, a standard class that inherits both from ThreadingMixIn and TCPServer:

#!/usr/bin/python
import socketserver

class RequestHandler(socketserver.StreamRequestHandler):
    "Handles one request to mirror some text."

    def handle(self):
        """Read from StreamRequestHandler's provided rfile member,
        which contains the input from the client. Mirror the text
        and write it to the wfile member, which contains the output
        to be sent to the client."""
        l = True
        while l:
            l = self.rfile.readline().strip()
            if l:
                self.wfile.write(l[::-1] + bytes('\n', 'utf-8'))

if __name__ == '__main__':
    import sys
    if len(sys.argv) < 3:
        print('Usage: %s [hostname] [port number]' % sys.argv[0])
        sys.exit(1)
    hostname = sys.argv[1]
    port = int(sys.argv[2])
    server = socketserver.ThreadingTCPServer((hostname, port), RequestHandler)
    server.serve_forever()

With this server running, you can run a large number of telnet sessions and MirrorClient sessions in parallel. ThreadingMixIn hides the details of spawning threads, just as TCPServer hides the details of sockets. The goal of all these helper classes is to keep your focus on what you send and receive over the network.
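If you want to see the effect without opening several telnet windows, the following sketch builds a threaded server by hand from ThreadingMixIn and TCPServer and then talks to it over two connections at once. UpperHandler and ThreadedServer are hypothetical names used only here; the handler upper-cases each line rather than mirroring it, to keep the example short.

```python
import socket
import socketserver
import threading

class UpperHandler(socketserver.StreamRequestHandler):
    "Hypothetical handler: sends each line back in upper case."
    def handle(self):
        for line in self.rfile:
            self.wfile.write(line.upper())

class ThreadedServer(socketserver.ThreadingMixIn, socketserver.TCPServer):
    "The mix-in is listed first so that its process_request method is used."
    daemon_threads = True  # Handler threads won't hold up interpreter exit.

server = ThreadedServer(('127.0.0.1', 0), UpperHandler)
port = server.server_address[1]
threading.Thread(target=server.serve_forever, daemon=True).start()

# Open two connections at once, then use the second one first.
first = socket.create_connection(('127.0.0.1', port))
second = socket.create_connection(('127.0.0.1', port))
second.sendall(b'second\n')
with second.makefile('rb') as reader:
    reply_second = reader.readline()
first.sendall(b'first\n')
with first.makefile('rb') as reader:
    reply_first = reader.readline()

first.close()
second.close()
server.shutdown()
server.server_close()
print(reply_second, reply_first)
```

With a plain TCPServer in place of ThreadedServer, the read on the second connection would wait until the first connection was closed.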

The Python Chat Server

For the mirror server, the capability to support multiple simultaneous connections is useful but it doesn't change what the server actually does. Each client interacts only with the server, and not even indirectly with the other clients. This model is a popular one; web servers and mail servers use it, among others.

There is another type of server, though, that exists to connect clients to each other. For many applications, it's not the server that's interesting: it's who else is connected to it. The most popular applications of this sort are online chat rooms and games. In this section, you design and build a simple chat server and client.

Perhaps the original chat room was the (non-networked) UNIX wall command, which enables you to broadcast a message to everyone logged in on a UNIX system. Internet Relay Chat, invented in 1988 and described in RFC 1459, is the most popular TCP/IP-based chat room software. The chat software you write here will have some of the same features as IRC, although it won't be compatible with IRC.

Design of the Python Chat Server

In IRC, a client that connects to a server must provide a nickname: a short string identifying the person who wants to chat. A nickname must be unique across a server so that users can't impersonate one another. Your server will carry on this tradition.

An IRC server provides an unlimited number of named channels, or rooms, and each user can join any number of rooms. Your server will provide only a single, unnamed room, which all connected users will inhabit.

Entering a line of text in an IRC client broadcasts it to the rest of your current room, unless it starts with the slash character. A line starting with the slash character is treated as a command to the server. Your server will act the same way.

IRC implements a wide variety of server commands: For instance, you can use a server command to change your nickname, join another room, send a private message to another user, or try to send a file to another user.

For example, if you issue the command /nick leonardr to an IRC server, you're attempting to change your nickname from its current value to leonardr. Your attempt might or might not succeed, depending on whether or not there's already a leonardr on the IRC server.

Your server will support the following three commands, taken from IRC and simplified:

  • /nick [nickname]: As described earlier, this attempts to change your nickname. If the nickname is valid and not already taken, your nickname will be changed and the change will be announced to the room. Otherwise, you'll get a private error message.

  • /quit [farewell message]: This command disconnects the user from the chat server. Your farewell message, if any, will be broadcast to the room.

  • /names: This retrieves the nicknames of the users in the chat room as a space-separated string.
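Before looking at a full server, it may help to see the slash convention in isolation. The function below is a standalone sketch, not the server's own parsing code; parse_line is a name chosen for this illustration.

```python
def parse_line(line):
    """Split one line of client input into (command, argument).

    Returns (None, line) for ordinary chat text."""
    if not line.startswith('/'):
        return None, line
    parts = line[1:].split(' ', 1)
    command = parts[0]
    argument = parts[1] if len(parts) == 2 else None
    return command, argument

print(parse_line('/nick leonardr'))
print(parse_line('/quit See you later'))
print(parse_line('/names'))
print(parse_line('Hello!'))
```

Splitting on only the first space keeps a multiword farewell message together as a single argument.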

The Python Chat Server Protocol

Having decided on a feature set and a design, you must now define an application-specific protocol for your Python Chat Server. This protocol will be similar to SMTP, HTTP, and the IRC protocol in that it will run atop TCP/IP to provide the structure for a specific type of application. However, it will be much simpler than any of those protocols.

The mirror server also defined a protocol, though it was so simple it may have escaped notice. The mirror server protocol consists of three simple rules:

  1. Send lines of text to the server.

  2. Every time you send a newline, the server will send you back that line of text, reversed, with a newline at the end.

  3. Send a blank line to terminate the connection.
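Those three rules are simple enough to model without any networking at all. The following sketch (mirror_session is a made-up name) takes the lines a client would send and returns the replies the rules call for:

```python
def mirror_session(lines):
    "Return the replies the mirror server would send for these lines."
    replies = []
    for line in lines:
        text = line.strip()
        if not text:
            break  # Rule 3: a blank line ends the session.
        replies.append(text[::-1] + '\n')  # Rule 2: reversed, plus a newline.
    return replies

print(mirror_session(['Hello\n', 'mirror\n', '\n', 'ignored\n']))
```

Anything sent after the blank line gets no reply, because by then the connection has been terminated.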

The protocol for the Python Chat Server will be a little more complex than that, but by the standards of protocol design it's still a fairly simple protocol. The following description is more or less the information that would go into an RFC for this protocol. If you were actually writing an RFC, you would go into a lot more detail and provide a formal definition of the protocol; that's not as necessary here, because the protocol definition will be immediately followed by an implementation in Python.

Of course, if you did write an RFC for this, it wouldn't be accepted. The IRC protocol already has an RFC, and it's a much more useful protocol than this example.

Your Hypothetical Protocol in Action

One good way to figure out the problems involved in defining a protocol is to write a sample session to see what the client and server need to say to each other. Here's a sample session of the Python Chat Server. In the following transcript, a user nicknamed jamesp connects to a chat room in which a shady character nicknamed nrini is already lurking. The diagram shows what jamesp might send to the server, what the server would send to him in response, and what it would send to the other client (nrini) as a result of jamesp's input.

Me to the Server | The Server to Me                                  | The Server to nrini
-----------------+---------------------------------------------------+-----------------------------
                 | Who are you?                                      |
jamesp           |                                                   |
                 | Hello, jamesp, welcome to the Python Chat Server. | jamesp has joined the chat.
/names           |                                                   |
                 | nrini jamesp                                      |
Hello!           |                                                   |
                 | <jamesp> Hello!                                   | <jamesp> Hello!
/nick nrini      |                                                   |
                 | There's already a user named nrini here.          |
/nick james      |                                                   |
                 | jamesp is now known as james                      | jamesp is now known as james
Hello again!     |                                                   |
                 | <james> Hello again!                              | <james> Hello again!
/quit Goodbye    |                                                   |
                 |                                                   | james has quit: Goodbye

Initial Connection

After establishing a connection between the client and server, the first stage of the protocol is to get a nickname for the client. A client can't be allowed into a chat room without a nickname because that would be confusing to the other users. Therefore, the server will ask each new client: "Who are you?" and expect a nickname in response, terminated by a newline. If what's sent is an invalid nickname or the nickname of a user already in the chat room, the server will send an error message and terminate the connection. Otherwise, the server will welcome the client to the chat room and broadcast an announcement to all other users that someone has joined the chat.

Chat Text

After a client is admitted into the chat room, any line of text he sends will be broadcast to every user in the room, unless it's a server command. When a line of chat is broadcast, it will be prefaced with the nickname of the user who sent it, enclosed in angle brackets (for example, "<jamesp> Hello, all."). This will prevent confusion about who said what, and visually distinguish chat messages from system messages.

Server Commands

If the client sends a recognized server command, the command is executed and a private system message may be sent to that client. If the execution of the command changes the state of the chat room (for instance, a user changes his nickname or quits), all users will receive a system message notifying them of the change (for example, "jamesp is now known as james"). An unrecognized server command will result in an error message for the user who sent it.
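The difference between a message that goes to everyone and one that skips its originator can be sketched in memory. In the following illustration, the names users and broadcast are assumptions made for the example, and io.BytesIO objects stand in for real network connections:

```python
import io

# Stand-ins for each connected user's outgoing file object.
users = {'jamesp': io.BytesIO(), 'nrini': io.BytesIO()}

def broadcast(sender, message, include_sender=True):
    "Write one newline-terminated message to every (other) user."
    data = (message.rstrip('\n') + '\n').encode('utf-8')
    for nickname, output in users.items():
        if include_sender or nickname != sender:
            output.write(data)

# A system message that the user who caused it doesn't need to see.
broadcast('jamesp', 'jamesp has joined the chat.', False)
# Ordinary chat text goes to everyone, including the sender.
broadcast('jamesp', '<jamesp> Hello!')

print(users['jamesp'].getvalue())
print(users['nrini'].getvalue())
```

Here jamesp sees only his own chat line, while nrini sees both the announcement and the chat line.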

General Guidelines

For the sake of convenience and readability, the chat protocol is designed to have a line-based and human-readable format. This makes the chat application usable even without a special client (although you will write a special client to make chatting a little easier). Many TCP/IP protocols work in similar ways, but it's not a requirement. Some protocols send only binary data, to save bandwidth or because they encrypt data before transmitting it.
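For contrast, here is a sketch of one common binary approach, length-prefixed framing, built on the standard struct module. It is not part of the Python Chat Server protocol, and the function names pack_message and unpack_messages are invented for the example.

```python
import struct

def pack_message(text):
    "Prefix the UTF-8 payload with its length as a 4-byte big-endian integer."
    payload = text.encode('utf-8')
    return struct.pack('!I', len(payload)) + payload

def unpack_messages(data):
    "Split a byte string of back-to-back frames into the original texts."
    messages = []
    offset = 0
    while offset < len(data):
        (length,) = struct.unpack_from('!I', data, offset)
        offset += 4
        messages.append(data[offset:offset + length].decode('utf-8'))
        offset += length
    return messages

wire = pack_message('Hello!') + pack_message('two\nlines')
print(wire)
print(unpack_messages(wire))
```

Because the length comes first, a message can contain newlines, which a line-based protocol can't allow. The price is that you can no longer read or type the traffic by hand in telnet.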

Here's the server code, in PythonChatServer.py. Like MultithreadedMirrorServer, its actual server class is a ThreadingTCPServer. It keeps a persistent map from users' nicknames to the wfile members of their request handlers, which lets the server send data to those users. This is how one user's input can be broadcast to everyone in the chat room:

#!/usr/bin/python
import socketserver
import re
import socket

class ClientError(Exception):
    "An exception thrown because the client gave bad input to the server."
    pass

class PythonChatServer(socketserver.ThreadingTCPServer):
    "The server class."

    def __init__(self, server_address, RequestHandlerClass):
        """Set up an initially empty mapping between a user's nickname
        and the file-like object used to send data to that user."""
        socketserver.ThreadingTCPServer.__init__(self, server_address,
                                                 RequestHandlerClass)
        self.users = {}

class RequestHandler(socketserver.StreamRequestHandler):
    """Handles the life cycle of a user's connection to the chat
    server: connecting, chatting, running server commands, and
    disconnecting."""

    NICKNAME = re.compile('^[A-Za-z0-9_-]+$') #Regex for a valid nickname

    def handle(self):
        """Handles a connection: gets the user's nickname, then
        processes input from the user until they quit or drop the
        connection."""
        self.nickname = None

        self.privateMessage('Who are you?')
        nickname = self._readline()
        done = False
        try:
            self.nickCommand(nickname)
            self.privateMessage('Hello, %s, welcome to the Python Chat Server.'
                                % nickname)
            self.broadcast('%s has joined the chat.' % nickname, False)
        except ClientError as error:
            self.privateMessage(error.args[0])
            done = True
        except socket.error:
            done = True

        #Now they're logged in; let them chat.
        while not done:
            try:
                done = self.processInput()
            except ClientError as error:
                self.privateMessage(str(error))
            except socket.error:
                done = True

    def finish(self):
        "Automatically called when handle() is done."
        if self.nickname:
            #The user successfully connected before disconnecting.
            #Broadcast that they're quitting to everyone else.
            message = '%s has quit.' % self.nickname
            if hasattr(self, 'partingWords'):
                message = '%s has quit: %s' % (self.nickname,
                                              self.partingWords)
            self.broadcast(message, False)

            #Remove the user from the list so we don't keep trying to
            #send them messages.
            if self.server.users.get(self.nickname):
                del(self.server.users[self.nickname])
        try:
            self.request.shutdown(2)
        except socket.error:
            #The client has already gone away.
            pass
        self.request.close()

    def processInput(self):
        """Reads a line from the socket input and either runs it as a
        command, or broadcasts it as chat text."""
        done = False
        l = self._readline()
        command, arg = self._parseCommand(l)
        if command:
            done = command(arg)
        else:
            l = '<%s> %s\n' % (self.nickname, l)
            self.broadcast(l)
        return done

Each server command is implemented as a method. The _parseCommand method, defined later, takes a line that looks like /nick and looks up the corresponding method (in this case, nickCommand), which processInput then calls:

    #Below are implementations of the server commands.

    def nickCommand(self, nickname):
        "Attempts to change a user's nickname."
        if not nickname:
            raise ClientError('No nickname provided.')
        if not self.NICKNAME.match(nickname):
            raise ClientError('Invalid nickname: %s' % nickname)
        if nickname == self.nickname:
            raise ClientError('You are already known as %s.' % nickname)
        if self.server.users.get(nickname, None):
            raise ClientError("There's already a user named %s here." % nickname)
        oldNickname = None
        if self.nickname:
            oldNickname = self.nickname
            del(self.server.users[self.nickname])
        self.server.users[nickname] = self.wfile
        self.nickname = nickname
        if oldNickname:
            self.broadcast('%s is now known as %s' % (oldNickname, self.nickname))

    def quitCommand(self, partingWords):
        """Tells the other users that this user has quit, then makes
        sure the handler will close this connection."""
        if partingWords:
            self.partingWords = partingWords
        #Returning True makes sure the user will be disconnected.
        return True

    def namesCommand(self, ignored):
        "Sends this user a space-separated list of the users in this chat room."
        self.privateMessage(' '.join(self.server.users.keys()))

    # Below are helper methods.

    def broadcast(self, message, includeThisUser=True):
        """Send a message to every connected user, possibly exempting the
        user who's the cause of the message."""
        data = self._ensureNewline(message).encode('utf-8')
        #Iterate over a copy, because other threads may add or remove users.
        for user, output in list(self.server.users.items()):
            if includeThisUser or user != self.nickname:
                output.write(data)

    def privateMessage(self, message):
        "Send a private message to this user."
        self.wfile.write(self._ensureNewline(message).encode('utf-8'))

    def _readline(self):
        "Reads a line as text, removing any surrounding whitespace."
        return self.rfile.readline().decode('utf-8', 'replace').strip()

    def _ensureNewline(self, s):
        "Makes sure a string ends in a newline."
        if s and s[-1] != '\n':
            s += '\n'
        return s

    def _parseCommand(self, input):
        """Try to parse a string as a command to the server. If it's an
        implemented command, return the corresponding method."""
        commandMethod, arg = None, None
        if input and input[0] == '/':
            if len(input) < 2:
                raise ClientError('Invalid command: "%s"' % input)
            commandAndArg = input[1:].split(' ', 1)
            if len(commandAndArg) == 2:
                command, arg = commandAndArg
            else:
                command, = commandAndArg
            commandMethod = getattr(self, command + 'Command', None)
            if not commandMethod:
                raise ClientError('No such command: "%s"' % command)
        return commandMethod, arg

if __name__ == '__main__':
    import sys
    if len(sys.argv) < 3:
        print('Usage: %s [hostname] [port number]' % sys.argv[0])
        sys.exit(1)
    hostname = sys.argv[1]
    port = int(sys.argv[2])
    PythonChatServer((hostname, port), RequestHandler).serve_forever()

The Python Chat Client

As with the mirror server, this chat server defines a simple, human-readable protocol. It's possible to use the chat server through telnet, but most people would prefer to use a custom client.

Here's PythonChatClient.py, a simple text-based client for the Python Chat Server. It has a few niceties that are missing when you connect with telnet. First, it handles the authentication stage on its own: If you run it on a UNIX-like system, you won't even have to specify a nickname, because it will use your account name as a default. Immediately after connecting, the Python Chat Client runs the /names command and presents the user with a list of everyone in the chat room.

After connecting, this client acts more or less like a telnet client would. It spawns a separate thread to handle user input from the keyboard even as it reads the server's output from the network:

#!/usr/bin/python
import socket
import select
import sys
import os
from threading import Thread

class ChatClient:

    def __init__(self, host, port, nickname):
        self.socket = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        self.socket.connect((host, port))
        self.input = self.socket.makefile('rb', 0)
        self.output = self.socket.makefile('wb', 0)

        #Send the given nickname to the server.
        authenticationDemand = self.input.readline().decode('utf-8')
        if not authenticationDemand.startswith("Who are you?"):
            raise Exception("This doesn't seem to be a Python Chat Server.")
        self.output.write(bytes(nickname + '\n', 'utf-8'))
        response = self.input.readline().decode('utf-8').strip()
        if not response.startswith("Hello"):
            raise Exception(response)
        print(response)

        #Start out by printing out the list of members.
        self.output.write(bytes('/names\n', 'utf-8'))
        print("Currently in the chat room:",
              self.input.readline().decode('utf-8').strip())

        self.run()

    def run(self):
        """Start a separate thread to gather the input from the
        keyboard even as we wait for messages to come over the
        network. This makes it possible for the user to simultaneously
        send and receive chat text."""

        propagateStandardInput = self.PropagateStandardInput(self.output)
        propagateStandardInput.start()

        #Read from the network and print everything received to standard
        #output. Once data stops coming in from the network, it means
        #we've disconnected.
        inputText = True
        while inputText:
            inputText = self.input.readline()
            if inputText:
                print(inputText.decode('utf-8').strip())
        propagateStandardInput.done = True

    class PropagateStandardInput(Thread):
        """A class that mirrors standard input to the chat server
        until it's told to stop."""

        def __init__(self, output):
            """Make this thread a daemon thread, so that if the Python
            interpreter needs to quit it won't be held up waiting for this
            thread to die."""
            Thread.__init__(self)
            self.daemon = True
            self.output = output
            self.done = False

        def run(self):
            "Echo standard input to the chat server until told to stop."
            while not self.done:
                inputText = sys.stdin.readline().strip()
                if inputText:
                    self.output.write(bytes(inputText + '\n', 'utf-8'))

if __name__ == '__main__':
    #See if the user has an OS-provided 'username' we can use as a default
    #chat nickname. If not, they have to specify a nickname.
    try:
        import pwd
        defaultNickname = pwd.getpwuid(os.getuid())[0]
    except ImportError:
        defaultNickname = None

    if len(sys.argv) < 3 or not defaultNickname and len(sys.argv) < 4:
        print('Usage: %s [hostname] [port number] [username]' % sys.argv[0])
        sys.exit(1)

    hostname = sys.argv[1]
    port = int(sys.argv[2])

    if len(sys.argv) > 3:
        nickname = sys.argv[3]
    else:
        #We must be on a system with usernames, or we would have
        #exited earlier.
        nickname = defaultNickname

    ChatClient(hostname, port, nickname)

A more advanced chat client might have a GUI that put incoming text in a separate window from the text the user types, to keep input from being visually confused with output. As it is, in a busy chat room, you might be interrupted by an incoming message while you're typing, and lose your place.

Single-Threaded Multitasking with select

The reason PythonChatClient spawns a separate thread to gather user input is that a call to sys.stdin.readline won't return until the user enters a chat message or server command. A naïve chat client might call sys.stdin.readline and wait for the user to type something in, but while it was waiting the other users would keep chatting and the socket connection from the server would fill up with a large backlog of chat. No chat messages would be displayed until the user pressed the Enter key (causing sys.stdin.readline to return), at which time the whole backlog would come pouring onto the screen. Trying to read from the socket connection would cause the opposite problem: The user would be unable to enter any chat text until someone else in the chat room said something. Using two threads avoids these problems: One thread can keep an eye on standard input while the other keeps an eye on the socket connection.

However, it's possible to implement the chat client without using threads. (After all, telnet works more or less the same way as PythonChatClient, and telnet clients were doing this long before threads were commonly available.) The secret is to just peek at standard input and the socket connection: not trying to read from them, just seeing if there's anything to read. You do this by using the select function, provided by Python's select module.

select takes three lists of file-type objects: one for objects you want to read from (like sys.stdin), one for objects you want to write to (like sys.stdout), and one for objects you want to watch for exceptional conditions. By default, a call to select will block (wait for input), but only until at least one of the file-type objects you passed in is ready to be used. It will then return three lists, each containing a subset of the objects you passed in: only the ones that are ready for the program to pay attention to. You might think of select as acting sort of like Python's built-in filter function, filtering out the objects that aren't ready for use. By using select, you can avoid the trap of calling read on a file-type object that doesn't have any data to read. Note that on Windows, select works only with sockets, so watching sys.stdin this way requires a UNIX-like system.
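Here is a minimal sketch of that filtering behavior, using two pairs of connected sockets from socket.socketpair instead of the keyboard and a chat server. The variable names are assumptions for the example, and the fourth argument to select is an optional timeout in seconds.

```python
import select
import socket

# Two connected pairs; only the first will have data waiting to be read.
a_reader, a_writer = socket.socketpair()
b_reader, b_writer = socket.socketpair()
a_writer.sendall(b'hello\n')

# Wait up to one second for either reader to become ready.
readable, writable, exceptional = select.select([a_reader, b_reader], [], [], 1.0)

# Only a_reader has data, so it's the only object that comes back.
print(readable == [a_reader])

for sock in (a_reader, a_writer, b_reader, b_writer):
    sock.close()
```

Reading from a_reader at this point is guaranteed not to block, whereas reading from b_reader would wait until something was sent to it.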

Here's a subclass of ChatClient that uses a loop over select to check whether standard input or the server input have unread data:

import select
import sys

class SelectBasedChatClient(ChatClient):

    def run(self):
        """In a loop, see whether the user has entered any input
        or whether there's any from the network. Keep doing this until
        the network connection returns EOF."""
        socketClosed = False
        while not socketClosed:
            #Block until the network or the keyboard has something to read.
            #(Passing sys.stdin to select works on Unix, but not on Windows,
            #where select accepts only sockets.)
            toRead, ignore, ignore = select.select([self.input, sys.stdin],
                                                   [], [])
            for source in toRead:
                if source == self.input:
                    inputText = self.input.readline()
                    if inputText:
                        print(inputText.strip())
                    else:
                        #The attempt to read failed. The socket is closed.
                        socketClosed = True
                elif source == sys.stdin:
                    userText = sys.stdin.readline().strip()
                    if userText:
                        self.output.write(userText + '\n')

You must pass in three lists to select, but you pass in empty lists of output files and error files. All you care about are the two sources of input (from the keyboard and the network), because those are the ones that might block and cause problems when you try to read them.

In one sense, this code is more difficult to understand than the original ChatClient, because it uses a trick to rapidly switch between doing two things, instead of just doing both things at once. In another sense, it's less complex than the original ChatClient because it's less code and it doesn't involve multithreading, which can be difficult to debug.

It's possible to use select to write servers without forking or threading, but I don't recommend writing such code yourself.

Other Topics

Many aspects of network programming are not covered in this chapter. The most obvious omission (the technologies and philosophies that drive the World Wide Web) is taken up in Chapter 20. The following sections outline some other topics in networking that are especially interesting or important from the perspective of a Python programmer.

Miscellaneous Considerations for Protocol Design

The best way to learn about protocol design is to study existing, successful protocols. Protocols are usually well documented, and you can learn a lot by using them and reading their RFCs. Here are some common design considerations not covered earlier in this chapter.

Trusted Servers

The Python Chat Server is used by one client to broadcast information to all other clients. Sometimes, however, the role of a server is to mediate between its clients. To this end, the clients are willing to trust the server with information they wouldn't trust to another client.

This happens often on websites that bring people together, such as auction sites and online payment systems. It's also implemented at the protocol level in many online games, in which the server acts as referee.

Consider a game in which players chase each other around a map. If one player knew another's location on the map, that player would gain an unfair advantage. At the same time, if players were allowed to keep their locations secret, they could cheat by teleporting to another part of the map whenever a pursuer got too close. Players give up the ability to cheat in exchange for a promise that other players won't be allowed to cheat either. A trusted server creates a level playing field.
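A referee server of this kind might be sketched as follows. Everything here is an assumption made for illustration: the map is a grid of (x, y) coordinates, players move at most max_step squares per turn, and they can see sight squares in any direction. The server checks each requested move and tells each player only about the opponents within view.

```python
def is_legal_move(old, new, max_step=1):
    """Return True if a player could walk from old to new in one turn.

    A request that jumps farther than max_step squares is a teleport.
    """
    dx = abs(new[0] - old[0])
    dy = abs(new[1] - old[1])
    return max(dx, dy) <= max_step


def visible_positions(positions, viewer, sight=3):
    """Return only the other players close enough for viewer to see."""
    vx, vy = positions[viewer]
    return {
        name: (x, y)
        for name, (x, y) in positions.items()
        if name != viewer and max(abs(x - vx), abs(y - vy)) <= sight
    }


# The server alone holds the full picture of the map.
positions = {"alice": (0, 0), "bob": (2, 1), "carol": (9, 9)}

walk_ok = is_legal_move((0, 0), (1, 1))      # one square: allowed
teleport_ok = is_legal_move((0, 0), (5, 5))  # five squares: rejected
seen_by_alice = visible_positions(positions, "alice")
```

Because both checks run on the server, no client ever needs to be trusted with another player's hidden location.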

Terse Protocols

Information that can be pieced together by a client is typically not put into the protocol. It would be wasteful for a server that ran chess games to transfer a representation of the entire board to both players after every successful move. It would suffice to send "Your move was accepted." to the player who made the move, and describe the move to the other player. State-based protocols usually transmit the changes in state, rather than send the whole state every time it changes.
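To make the idea concrete, here is a rough sketch of a client that keeps its own copy of the board and updates it from a four-character move message. The board representation (a dictionary mapping square names to piece letters) and the "e2e4" message format are assumptions chosen for brevity, and special moves such as castling are ignored.

```python
def apply_move(board, move):
    """Update board in place from a move message such as 'e2e4'.

    board maps square names to piece letters. Special cases (castling,
    en passant, promotion) are deliberately left out of this sketch.
    """
    source, target = move[:2], move[2:4]
    # Moving onto an occupied square overwrites (captures) the piece there.
    board[target] = board.pop(source)
    return board


# A client's local copy of (part of) the board.
board = {"e2": "P", "e7": "p", "g1": "N"}

# The server sends only the change in state, not the whole board.
message = "e2e4"
apply_move(board, message)
```

The message stays four characters long no matter how many pieces are on the board, whereas a full board description would grow with the amount of state.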

The protocol for the Python Chat Server sends status messages in complete English sentences. This makes the code easier to understand and the application easier to use through telnet. The client behavior depends on those status messages: For instance, PythonChatClient expects the string "Who are you?" as soon as it connects to the server. Doing a protocol this way makes it difficult for the server to customize the status messages, or for the client to translate them into other languages. Many protocols define numeric codes or short abbreviations for status messages and commands, and explain their meanings in the protocols' RFC or other definition document.
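One way to do that is sketched below. The numeric codes and the STATUS_TEXT table are invented for this example and are not part of the Python Chat Server protocol. The point is that the client acts on the number, so the text after it can be reworded or translated freely.

```python
# Hypothetical status codes; the real Python Chat Server defines none.
STATUS_TEXT = {
    100: "Who are you?",
    200: "Welcome.",
    400: "That nickname is already in use.",
}


def format_status(code):
    """Build the line a server would send for a status code."""
    return "%d %s\r\n" % (code, STATUS_TEXT[code])


def parse_status(line):
    """Split a status line into its numeric code and human-readable text."""
    code, _, text = line.strip().partition(" ")
    return int(code), text


greeting = format_status(100)
code, text = parse_status(greeting)

# A server with different wording still produces the same code.
reworded_code, _ = parse_status("100 Please identify yourself.\r\n")
```

A client written this way would test code == 100 rather than compare the message against the exact string "Who are you?".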

The Peer-to-Peer Architecture

All of the protocols developed in this chapter were designed according to the client-server architecture. This architecture divides the work of networking between two different pieces of software: the clients, which request data or services, and the servers, which provide the data or carry out the services. This architecture assumes a few powerful computers will act as servers, and a large number of computers will act as clients. Information tends to be centralized on the server: to allow for central control, to ensure fairness (for instance, in a game with hidden information), to make it unnecessary for clients to trust each other, or just to make information easier to find.

The other popular architecture is the peer-to-peer architecture. In this architecture, every client is also a server. A peer-to-peer protocol may define "client" actions and "server" actions, but every process that makes requests is also capable of serving them.
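A bare-bones sketch of that idea follows. The Peer class, its one-line request and response format, and the use of a background thread are all assumptions made for illustration, not a real peer-to-peer protocol. Each peer listens on its own port (the "server" action) and can also connect to any other peer (the "client" action).

```python
import socket
import threading


class Peer:
    """A process that both serves requests and makes them (hypothetical)."""

    def __init__(self, name):
        self.name = name
        # Server half: listen on any free port on the local machine.
        self.listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        self.listener.bind(("127.0.0.1", 0))
        self.listener.listen(5)
        self.address = self.listener.getsockname()
        threading.Thread(target=self._serve, daemon=True).start()

    def _serve(self):
        """Answer each incoming one-line request with a one-line reply."""
        while True:
            try:
                conn, _ = self.listener.accept()
            except OSError:
                return  # The listening socket was closed.
            with conn, conn.makefile("r", encoding="utf-8") as reader:
                request = reader.readline().strip()
                reply = "%s heard: %s\n" % (self.name, request)
                conn.sendall(reply.encode("utf-8"))

    def ask(self, other_address, text):
        """Client half: send one line to another peer and return its reply."""
        with socket.create_connection(other_address) as conn:
            conn.sendall((text + "\n").encode("utf-8"))
            with conn.makefile("r", encoding="utf-8") as reader:
                return reader.readline().strip()


alice = Peer("alice")
bob = Peer("bob")

# Each peer can act as the client of the other.
reply_from_bob = alice.ask(bob.address, "ping")
reply_from_alice = bob.ask(alice.address, "pong")
```

Neither process is "the server" here: which role a peer plays depends only on who opened the connection.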

Though most of the protocols implemented on top of it use the client-server architecture, TCP/IP is a peer-to-peer protocol. Recall that a socket connection actually covers two unidirectional TCP/IP connections: one from you to your destination and one going the other way. You can't be a TCP/IP client without also being a TCP/IP server: you'd be sending data without any way of receiving a response.

At the application level, the most popular peer-to-peer protocol is BitTorrent. BitTorrent makes it easy to distribute a large file by sharing the cost of the bandwidth across all of the people who download it. Under the client-server architecture, someone who wanted to host a file would put it on her server and bear the full cost of the bandwidth for every download. The original BitTorrent implementation is written in Python, and the first release was in 2001. BitTorrent is proof positive that there's still room for clever new TCP/IP protocols, and that it's possible to implement high-performance protocols in Python.

Summary

Python provides high-level tools for using existing TCP/IP-based protocols, making it easy to write custom clients. It also comes packaged with tools that help you design your own networked applications. Whether you just want to send mail from a script, or you have an idea for the Internet's next killer app, Python can do what you need.

The key points to take away from this chapter are:

  • The smtplib module takes its name from SMTP, the Simple Mail Transfer Protocol. That's the protocol, or standard, defined for sending Internet mail.

  • A protocol is a convention for structuring the data sent between two or more parties on a network.

  • Localhost is a special hostname that always refers to the computer you're using when you mention it. The hostname is how you tell Python where on the Internet to find your mail server.

  • MIME is a series of standards designed around fitting non-U.S.-ASCII data into the 128 7-bit characters that make up U.S. ASCII.

  • You can use the mailbox module to parse files of the mbox type.

  • POP3 stands for Post Office Protocol. The 3 stands for the version.

  • IMAP stands for Internet Message Access Protocol.

Exercises

  1. Distinguish between the following e-mail-related standards: RFC 2822, SMTP, IMAP, MIME, and POP.

  2. Write a script that connects to a POP server, downloads all of the messages, and sorts the messages into files named after the sender of the message. (For instance, if you get two e-mails from , they should both go into a file called "").

    What would be the corresponding behavior if you had an IMAP server instead? Write that script, too (use RFC 3501 as a reference).

  3. Suppose that you were designing an IRC-style protocol for low-bandwidth embedded devices such as cell phones. What changes to the Python Chat Server protocol would it be useful to make?

  4. A feature of IRC not cloned in the Python Chat Server is the /msg command, which enables one user to send a private message to another instead of broadcasting it to the whole room. How could the /msg command be implemented in the Python Chat Server?

  5. When does it make sense to design a protocol using a peer-to-peer architecture?
