Chapter 10. System Administration

Introduction

Credit: Donn Cave, University of Washington

In this chapter, we consider a class of programmer—the humble system administrator—in contrast to other chapters’ focus on functional domains. As a programmer, the system administrator faces most of the same problems that other programmers face and should find the rest of this book of at least equal interest.

Python’s advantages in the system administration domain are also quite familiar to other Python programmers, but Python’s competition is different. On Unix platforms, at any rate, the landscape is dominated by a handful of lightweight languages such as the Bourne shell and awk that aren’t exactly made obsolete by Python. These little languages can often support a simpler, clearer, and more concise solution than Python, particularly for commands that you’re typing interactively at the shell command prompt. But Python can do things these languages can’t, and it’s often more robust when dealing with issues such as unusually large data inputs. Another notable competitor, especially on Unix systems, is Perl (which isn’t really a little language at all), with just about the same overall power as Python, and usable for typing a few commands interactively at the shell’s command prompt. Python’s strength here is readability and maintainability: when you dust off a script you wrote in a hurry eight months ago, because you need to make some changes to it, you don’t spend an hour figuring out exactly what you had in mind when you wrote this or that subtle trick. You just don’t use any tricks at all, subtle or gross, so that your Python scripts work just fine and you don’t burn your time, months later, striving to reverse-engineer them for understanding.

One item that stands out in this chapter’s solutions is the wrapper: the alternative, programmed interface to a software system. On Unix (including, these days, Mac OS X), this is usually a fairly prosaic matter of diversion and analysis of text I/O. Life is easy when the programs you’re dealing with are able to just give clean textual output, without requiring complex interaction (see Eric Raymond, The Art of Unix Programming, http://www.faqs.org/docs/artu/, for an informative overview of how programs should be architected to make your life easy). However, even when you have to wrap a program that’s necessarily interactive, all is far from lost. Python has very good support in this area, thanks, first of all, to the fact that it places C-level pseudo-TTY functions at your disposal (see the pty module of the Python Standard Library). The pseudo-TTY device is like a bidirectional pipe with TTY driver support, so it’s essential for things such as password prompts that insist on a TTY. Because it appears to be a TTY, applications writing to a pseudo-TTY normally use line buffering, instead of the block buffering that gives problems with pipes. Pipes are more portable and less trouble to work with, but they don’t work for interfacing to every application. Excellent third-party extensions exist that wrap pty into higher-level layers for ease of use, most notably Pexpect, http://pexpect.sourceforge.net/.

On Windows, the situation often is not as prosaic as on Unix-like platforms, since the information you need to do your system administration job may be somewhere in the registry, may be available via some Windows APIs, and/or may be available via COM. The standard Python library _winreg module, Mark Hammond’s PyWin32 package, and Thomas Heller’s ctypes, taken together, give the Windows administrator reasonably easy access to all of these sources, and you’ll see more Windows administration recipes here than you will ones for Unix. The competition for Python as a system administration language on Windows is feeble compared to that on Unix, which is yet another reason for the platform’s prominence here. The PyWin32 extensions are available for download at http://sourceforge.net/projects/pywin32/. PyWin32 also comes with ActiveState’s ActivePython distribution of Python (http://www.activestate.com/ActivePython/). To use this rich and extremely useful package most effectively, you also need Mark Hammond and Andy Robinson, Python Programming on Win32 (O’Reilly). ctypes is available for download at http://sourceforge.net/projects/ctypes.

While it may sometimes be difficult to see what brought all the recipes together in this chapter, it isn’t difficult to see why system administrators deserve their own chapter: Python would be nowhere without them! Who else, back when Python was still an obscure, fledgling language, could bring it into an organization and almost covertly infiltrate it into the working environment? If it weren’t for the offices of these benevolent and pragmatic anarchists, Python might well have languished in obscurity despite its merits.

10.1. Generating Random Passwords

Credit: Devin Leung

Problem

You need to create new passwords randomly—for example, to assign them automatically to new user accounts.

Solution

One of the chores of system administration is installing new user accounts. Assigning a different, totally random password to each new user is a good idea. Save the following code as makepass.py:

from random import choice
import string
def GenPasswd(length=8, chars=string.letters+string.digits):
    return ''.join([ choice(chars) for i in range(length) ])

Discussion

This recipe is useful when you are creating new user accounts and assigning each of them a different, totally random password. For example, you can print six passwords of length 12:

>>> import makepass
>>> for i in range(6):
...    print makepass.GenPasswd(12)
...
uiZWGSJLWjOI
FVrychdGsAaT
CGCXZAFGjsYI
TPpQwpWjQEIi
HMBwIvRMoIvh

Of course, such totally random passwords, while providing an excellent theoretical basis for security, are impossibly hard to remember for most users. If you require users to stick with their assigned passwords, many users will probably write them down. The best you can hope for is that new users will set their own passwords at their first login, assuming, of course, that the system you’re administering lets each user change his own password. (Most operating systems do, but you might be assigning passwords for other kinds of services that unfortunately often lack such facilities.)

A password that is written down anywhere is a serious security risk: pieces of paper get lost, misplaced, and peeked at. From a pragmatic point of view, you might be better off assigning passwords that are not totally random; users are more likely to remember them and less likely to write them down (see Recipe 10.2). This practice may violate the theory of password security, but, as all practicing system administrators know, pragmatism trumps theory.
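If you do stay with totally random passwords, the source of randomness matters too: the default generator in module random is designed for simulation work rather than for security. The following is only a sketch, not the recipe's code. It assumes that random.SystemRandom is available on your platform, it uses string.ascii_letters (the locale-independent spelling of the recipe's string.letters) so that it also runs under Python 3, and the name gen_passwd_os is made up for illustration.

```python
import random
import string

# SystemRandom draws from the operating system's randomness source
# (os.urandom); assumed to be available on the platform in use
_sysrand = random.SystemRandom()

def gen_passwd_os(length=8, chars=string.ascii_letters + string.digits):
    # hypothetical variant of the recipe's GenPasswd: same interface,
    # different source of randomness
    return ''.join([_sysrand.choice(chars) for i in range(length)])
```

The interface mirrors GenPasswd, so a call such as gen_passwd_os(12) can replace makepass.GenPasswd(12) in the example session.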

See Also

Recipe 10.2; documentation of the standard library module random in the Library Reference and Python in a Nutshell.

10.2. Generating Easily Remembered Somewhat-Random Passwords

Credit: Luther Blissett

Problem

You need to create new passwords randomly—for example, to assign them automatically to new user accounts. You want the passwords to be somewhat feasible to remember for typical users, so they won’t be written down.

Solution

We can use a pastiche approach for this, mimicking letter n-grams in actual English words. A grander way to look at the same approach is to call it a Markov Chain Simulation of English:

import random, string
class password(object):
    # Any substantial file of English words will do just as well: we
    # just need self.data to be a big string, the text we'll pastiche
    data = open("/usr/share/dict/words").read( ).lower( )
    def renew(self, n, maxmem=3):
        ''' accumulate into self.chars `n' random characters, with a
            maximum-memory "history" of `maxmem` characters back. '''
        self.chars = [  ]
        for i in range(n):
            # Randomly "rotate" self.data
            randspot = random.randrange(len(self.data))
            self.data = self.data[randspot:] + self.data[:randspot]
            # Get the n-gram
            where = -1
            # start by trying to locate the last maxmem characters in
            # self.chars.  If i<maxmem, we actually only get the last
            # i, i.e., all of self.chars -- but that's OK: slicing
            # is quite tolerant in this way, and it fits the algorithm
            locate = ''.join(self.chars[-maxmem:])
            while where<0 and locate:
                # Locate the n-gram in the data
                where = self.data.find(locate)
                # Back off to a shorter n-gram if necessary
                locate = locate[1:]
            # if where==-1 and locate='', we just pick self.data[0] --
            # it's a random item within self.data, thanks to the rotation
            c = self.data[where+len(locate)+1]
            # we only want lowercase letters, so, if we picked another
            # kind of character, we just choose a random letter instead
            if not c.islower( ): c = random.choice(string.lowercase)
            # and finally we record the character into self.chars
            self.chars.append(c)
    def __str__(self):
        return ''.join(self.chars)
if __name__ == '__main__':
    "Usage: pastiche [passwords [length [memory]]]"
    import sys
    if len(sys.argv)>1: dopass = int(sys.argv[1])
    else: dopass = 8
    if len(sys.argv)>2: length = int(sys.argv[2])
    else: length = 10
    if len(sys.argv)>3: memory = int(sys.argv[3])
    else: memory = 3
    onepass = password( )
    for i in range(dopass):
        onepass.renew(length, memory)
        print onepass

Discussion

This recipe is useful when creating new user accounts and assigning each user a different, random password: it uses passwords that a typical user will find it feasible to remember, hopefully so they won’t get written down. See Recipe 10.1 if you prefer totally random passwords.

The recipe’s idea is based on the good old pastiche concept. Each letter (always lowercase) in the password is chosen pseudo-randomly from data that is a collection of words in a natural language familiar to the users. This recipe uses the file /usr/share/dict/words, which is supplied with Linux systems (on my machine, a file of over 45,000 words), but any large document in plain text will do just as well. The trick that makes the passwords sort of memorable, and not fully random, is that each letter is chosen based on the last few letters already picked for the password as it stands so far. Thus, letter transitions will tend to be “repetitive” according to patterns that are familiar to the user.

The code in the recipe takes some care to locate, each time, a random occurrence (in the text being pastiched) of the last maxmem characters picked so far. Since it’s easy to find the first occurrence of a substring, the code “rotates” the text string randomly, to ensure that the first occurrence is a random one from the point of view of the original text. If the substring made up of the last maxmem characters picked is not found in the text, the code “backs down” to search for just the last maxmem-1, and so on, backing down until, worst case, it just picks the first character in the rotated text (which is a random character from the point of view of the original text).

A break in this Markov Chain process occurs when this picking procedure chooses a character that is not a lowercase letter, in which case, a random lowercase letter is chosen instead (any lowercase letter is picked with equal probability).
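The rotate, search, and back-off steps described above can be isolated into one small function. This is a sketch for illustration, not the recipe's code: the name next_char and its rng parameter are assumptions, it wraps around at the end of the rotated text (which the recipe does not do), and it leaves out the replacement of non-lowercase characters.

```python
import random

def next_char(text, history, maxmem=3, rng=random):
    """Pick the character that follows a random occurrence, in text,
    of the last maxmem characters of history, backing off to shorter
    suffixes when the longer one never occurs."""
    # rotate the text at a random spot, so that the first match found
    # is a random occurrence with respect to the original text
    spot = rng.randrange(len(text))
    rotated = text[spot:] + text[:spot]
    # history[-0:] would be the whole string, hence the guard
    suffix = history[-maxmem:] if maxmem > 0 else ''
    while suffix:
        where = rotated.find(suffix)
        if where >= 0:
            # wrap around, in case the match sits at the very end
            return rotated[(where + len(suffix)) % len(rotated)]
        suffix = suffix[1:]     # back off to a shorter n-gram
    # nothing matched, or no history yet: the first character of the
    # rotated text is a random character of the original text
    return rotated[0]
```

Passing an instance such as random.Random(42) as rng makes a run repeatable, which is handy when experimenting with different values of maxmem.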

Here are a couple of typical sample runs of this pastiche.py password-generation script:

[situ@tioni cooker]$ python pastiche.py
yjackjaceh
ackjavagef
aldsstordb
dingtonous
stictlyoke
cvaiwandga
lidmanneck
olexnarinl
[situ@tioni cooker]$ python pastiche.py
ptiontingt
punchankin
cypresneyf
sennemedwa
iningrated
fancejacev
sroofcased
nryjackman

As you can see, some of these are definitely word-like, others less so, but for a typical human being, none are more problematic to remember than a sequence of even fewer totally random, uncorrelated letters. No doubt some theoretician will complain (justifiably, in a way) that they aren’t as random as all that. Well, tough. My point is that they had better not be, if some poor fellow is going to have to remember them! You can compensate for this limitation by making them a bit longer. If said theoretician demonstrates how to compute the entropy per character of this method of password generation (versus the obvious 4.7 bits/character, the base-2 logarithm of 26, for passwords made up of totally random lowercase letters), now that would be a useful contribution indeed. Meanwhile, I’ll keep generating passwords this way, rather than in a totally random way. If nothing else, it’s the closest thing I’ve found to a useful application for the lovely pastiche concept.
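For reference, the baseline figure quoted above is easy to reproduce. The sketch below covers only the totally random case, where every character is chosen uniformly and independently; it does not attempt the harder estimate for pastiche passwords. The function name bits_of_entropy is made up for illustration.

```python
import math

def bits_of_entropy(length, alphabet_size):
    # each uniformly and independently chosen character contributes
    # log2(alphabet_size) bits
    return length * math.log(alphabet_size, 2)

per_char = bits_of_entropy(1, 26)      # one random lowercase letter
ten_lower = bits_of_entropy(10, 26)    # ten random lowercase letters
# eight characters from both letter cases plus digits (62 symbols),
# the default alphabet of Recipe 10.1
eight_mixed = bits_of_entropy(8, 62)
```

Ten random lowercase letters and eight random mixed-case alphanumerics both come out at roughly 47 bits, which gives a feel for how much extra length a restricted or less random alphabet has to buy back.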

The concept of passwords that are not totally random, but rather a bit more memorable, goes back a long way—at least to the 1960s and to works by Morrie Gasser and Daniel Edwards. A Federal Information Processing Standard (FIPS), FIPS 181, specifies in detail how “pronounceable” passwords are to be generated; see http://www.itl.nist.gov/fipspubs/fip181.htm.

See Also

Recipe 10.1; documentation of the standard library module random in the Library Reference and Python in a Nutshell.

10.3. Authenticating Users by Means of a POP Server

Credit: Magnus Lyckå

Problem

You are writing a Python application that must authenticate users. All of the users have accounts on some POP servers, and you’d like to reuse, for your own authentication, the user IDs and passwords that your users have on those servers.

Solution

To log into the application, a user must provide the server, user ID and password for his mail account. We try logging into that POP server with these credentials—if that attempt succeeds, then the user has authenticated successfully. (Of course, we don’t peek into the user’s mailbox!)

def popauth(popHost, user, passwd):
    """ Log in and log out, only to verify user identity.
        Raise exception in case of failure.
    """
    import poplib
    try:
        pop = poplib.POP3(popHost)
    except:
        raise RuntimeError("Could not establish connection "
                           "to %r for password check" % popHost)
    try:
        # Log in and perform a small sanity check
        pop.user(user)
        pop.pass_(passwd)
        length, size = pop.stat( )
        assert type(length) == type(size) == int
        pop.quit( )
    except:
        # close the connection (best effort) before reporting the failure
        try: pop.quit( )
        except: pass
        raise RuntimeError("Could not verify identity. \n"
              "User name %r or password incorrect." % user)

Discussion

To use this recipe, the application must store somewhere the list of known users and either the single POP server they all share, or the specific POP server on which each user authenticates—it need not be the same POP server for all users. Either a text file, or a simple table in any kind of database, will do just fine for this purpose.

This solution is neat, but it does have some weaknesses:

  • Users must trust that any application implementing this authentication system won’t abuse their email accounts.

  • POP passwords are, alas!, sent in plain text over the Internet.

  • We have to trust that the POP server security isn’t compromised.

  • Logging in might take a few seconds if the POP server is slow.

  • Logging in won’t work if the POP server is down.

However, offsetting all of these potential drawbacks is the convenience of applications not having to store any passwords, and of not forcing a poor overworked system administrator to administer password changes. It’s also quite simple! In short, I wouldn’t use this approach for a bank system, but I would have no qualms using it, for example, to give users rights to edit web pages at a somewhat restricted WikiWiki, or similarly low-risk applications.
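The plain-text weakness listed above can be reduced when the POP server also accepts connections over SSL/TLS. The following is a sketch under that assumption, not a drop-in replacement for the recipe: it relies on poplib.POP3_SSL (which connects to port 995 by default and requires Python's ssl support), it returns True or False instead of raising when the credentials are rejected, and the connect parameter is a hypothetical hook added only so that the function can be exercised without a real server.

```python
import poplib

def popauth_ssl(pop_host, user, passwd, connect=poplib.POP3_SSL):
    """Return True if the POP server accepts the credentials, False if
    it rejects them.  Raise RuntimeError if no connection can be made."""
    try:
        pop = connect(pop_host)
    except Exception:
        raise RuntimeError("Could not establish connection "
                           "to %r for password check" % (pop_host,))
    try:
        try:
            pop.user(user)
            pop.pass_(passwd)
        except poplib.error_proto:
            # the server answered with an error: bad user ID or password
            return False
        return True
    finally:
        # always try to log out politely, whatever happened above
        try:
            pop.quit()
        except Exception:
            pass
```

Returning a Boolean keeps "wrong password" apart from "server unreachable", which the caller may want to report differently to the user.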

See Also

Documentation of the standard library module poplib in the Library Reference and Python in a Nutshell.

10.4. Calculating Apache Hits per IP Address

Credit: Mark Nenadov, Ivo Woltring

Problem

You need to examine a log file from Apache to count the number of hits recorded from each individual IP address that accessed it.

Solution

Many of the chores of administering a web server have to do with analyzing Apache logs, which Python makes easy:

def calculateApacheIpHits(logfile_pathname):
    ''' return a dict mapping IP addresses to hit counts '''
    ipHitListing = {  }
    contents = open(logfile_pathname, "r")
    # go through each line of the logfile
    for line in contents:
        # split the string to isolate the IP address
        ip = line.split(" ", 1)[0]
        # Ensure length of the IP address is proper (see discussion)
        if 6 < len(ip) <= 15:
            # Increase by 1 if IP exists; else set hit count = 1
            ipHitListing[ip] = ipHitListing.get(ip, 0) + 1
    return ipHitListing

Discussion

This recipe supplies a function that returns a dictionary containing the hit counts for each individual IP address that has accessed your Apache web server, as recorded in an Apache log file. For example, a typical use would be:

HitsDictionary = calculateApacheIpHits(
                 "/usr/local/nusphere/apache/logs/access_log")

This function has many quite useful applications. For example, I often use it in my code to determine the number of hits that are actually originating from locations other than my local host. This function is also used to chart which IP addresses are most actively viewing the pages that are served by a particular installation of Apache.
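Finding the most active addresses, as mentioned above, only takes a sort of the returned dictionary by hit count. A minimal sketch follows; the helper name top_ips and its n parameter are made up for illustration and are not part of the recipe.

```python
def top_ips(hits, n=10):
    """Return the n (ip, count) pairs with the highest counts, most
    active first, from a dict such as the one that
    calculateApacheIpHits returns."""
    by_count = sorted(hits.items(), key=lambda item: item[1], reverse=True)
    return by_count[:n]
```

Excluding your local host before ranking is then a matter of filtering the dictionary first, for example by dropping the key '127.0.0.1'.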

This function performs a modest validation of each IP address, which is really just a length check: an IP address cannot be longer than 15 characters (4 sets of triplets and 3 periods) nor shorter than 7 (4 sets of single digits and 3 periods). This validation is not stringent, but it does reduce, at tiny runtime cost, the probability of placing into the dictionary some data that is obviously garbage. As a general technique, low-cost, highly approximate sanity checks for data that is expected to be OK (but one never knows for sure) are worth considering. However, if you want to be stricter, regular expressions can help. Change the loop in this recipe’s function’s body to:

    import re
    # an IP is: 4 strings, each of 1-3 digits, joined by periods
    ip_specs = r'\.'.join([r'\d{1,3}']*4)
    re_ip = re.compile(ip_specs)
    for line in contents:
        match = re_ip.match(line)
        if match:
            # Increase by 1 if IP exists; else set hit count = 1
            ip = match.group( )
            ipHitListing[ip] = ipHitListing.get(ip, 0) + 1

In this variant, we use a regular expression to extract and validate the IP at the same time. This approach enables us to avoid the split operation as well as the length check, and thus amortizes most of the runtime cost of matching the regular expression. This variant is only a few percentage points slower than the recipe’s solution.

Of course, the pattern given here as ip_specs is not entirely precise either, since it accepts, as components of an IP quad, arbitrary strings of one to three digits, while the components should be more constrained. But to ward off garbage lines, this level of sanity check is sufficient.
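If you do want the regular expression itself to constrain each component to the range 0 to 255, the pattern gets longer but stays manageable. This is a sketch of one possible pattern, not part of the recipe; the names octet and re_strict_ip are made up, and the pattern deliberately rejects components with leading zeros, which may or may not suit your logs.

```python
import re

# one component of the quad: 250-255, 200-249, 100-199, or 0-99
# (without leading zeros)
octet = r'(?:25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)'
# four components joined by periods; the lookahead refuses a match
# that would stop in the middle of a longer run of digits
re_strict_ip = re.compile(r'\.'.join([octet] * 4) + r'(?!\d)')

def leading_ip(line):
    """Return the IP address at the start of line, or None."""
    match = re_strict_ip.match(line)
    if match:
        return match.group()
    return None
```

Using leading_ip(line) in the loop in place of re_ip.match(line) keeps the rest of the counting code unchanged.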

Another alternative is to convert and check the address: extract string ip just as we do in this recipe’s Solution, then:

        # Ensure the IP address is proper
        try:
            quad = map(int, ip.split('.'))
        except ValueError:
            pass
        else:
            if len(quad)==4 and min(quad)>=0 and max(quad)<=255:
                # Increase by 1 if IP exists; else set hit count = 1
                ipHitListing[ip] = ipHitListing.get(ip, 0) + 1

This approach is more work, but it does guarantee that only IP addresses that are formally valid get counted at all.

See Also

The Apache web server is available and documented at http://httpd.apache.org; regular expressions are covered in the docs of the re module in the Library Reference and Python in a Nutshell.

10.5. Calculating the Rate of Client Cache Hits on Apache

Credit: Mark Nenadov

Problem

You need to monitor how often client requests are refused by your Apache web server because the client’s cache of the page is already up to date.

Solution

When a browser queries a server for a page that the browser has in its cache, the browser lets the server know about the cached data, and the server returns a special status code (rather than serving the page again) if the client’s cache is up to date. Here’s how to find the statistics for such occurrences in your server’s logs:

def clientCachePercentage(logfile_pathname):
    contents = open(logfile_pathname, "r")
    totalRequests = 0
    cachedRequests = 0
    for line in contents:
        totalRequests += 1
        if line.split(" ")[8] == "304":
            # if server returned "not modified"
            cachedRequests += 1
    return int(0.5+float(100*cachedRequests)/totalRequests)

Discussion

The percentage of requests to your Apache server that are met by the client’s own cache is an important factor in the perceived performance of your server. The code in this recipe helps you get this information from the server’s log. Typical use would be:

log_path = "/usr/local/nusphere/apache/logs/access_log"
print "Percentage of requests that were client-cached: " + str(
       clientCachePercentage(log_path)) + '%'

The recipe reads the log file one line at a time by looping over the file—the normal way to read a file nowadays. Trying to read the whole log file in memory, by calling the readlines method on the file object, would be an unsuitable approach for very large files, which server log files can certainly be. That approach might not work at all, or might work but damage performance considerably by swamping your machine’s virtual memory. Even when it works, readlines offers no advantage over the approach used in this recipe.

The body of the for loop calls the split method on each line string, with a string of a single space as the argument, to split the line into a list of its space-separated fields. Then it uses indexing ([8]) to get the ninth such field. Apache puts the status code into the ninth field of each line in the log. Code "304" means “not modified” (i.e., the client’s cache was already correctly updated). We count those cases in the cachedRequests variable and all lines in the log in the totalRequests variable, so that, in the end, we can return the percentage of cache hits. The expression we use in the return statement computes the percentage as a float number, then rounds it correctly to the closest int, because an integer result is most useful in practice.
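To see the field positions concretely, it helps to run the same logic over a few lines held in memory. The sketch below is a variant for experimentation rather than the recipe's function: cache_percentage is a made-up name, the sample lines are invented entries in the common log format, and, unlike the recipe, it skips lines with fewer than nine fields and returns 0 for an empty log instead of dividing by zero.

```python
def cache_percentage(lines):
    """Percentage (rounded to the nearest int) of log lines whose
    ninth space-separated field is "304"."""
    total = cached = 0
    for line in lines:
        fields = line.split(" ")
        if len(fields) < 9:
            continue            # too short to be a log entry: skip it
        total += 1
        if fields[8] == "304":
            cached += 1
    if not total:
        return 0                # no usable lines: avoid dividing by zero
    return int(0.5 + 100.0 * cached / total)

# invented sample entries; the status code is the field at index 8
sample = [
    '127.0.0.1 - - [10/Oct/2000:13:55:36 -0700] "GET /a.html HTTP/1.0" 200 2326',
    '127.0.0.1 - - [10/Oct/2000:13:55:37 -0700] "GET /a.html HTTP/1.0" 304 -',
    '127.0.0.1 - - [10/Oct/2000:13:55:38 -0700] "GET /b.html HTTP/1.0" 200 512',
]
percentage = cache_percentage(sample)
```

Because the function accepts any iterable of lines, passing an open file object works just as well as passing a list.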

See Also

The Apache web server is available and documented at http://httpd.apache.org.

10.6. Spawning an Editor from a Script

Credit: Larry Price, Peter Cogolo

Problem

You want users to work with their favorite text-editing programs to edit text files, to provide input to your script.

Solution

Module tempfile lets you create temporary files, and module os has many tools to check the environment and to work with files and external programs, such as text editors. A couple of functions can wrap this functionality into an easy-to-use form:

import sys, os, tempfile
def what_editor( ):
    editor = os.getenv('VISUAL') or os.getenv('EDITOR')
    if not editor:
        # on Windows, sys.platform is 'win32', not 'windows'
        if sys.platform.startswith('win'):
            editor = 'Notepad.Exe'
        else:
            editor = 'vi'
    return editor
def edited_text(starting_text=''):
    temp_fd, temp_filename = tempfile.mkstemp(text=True)
    os.write(temp_fd, starting_text)
    os.close(temp_fd)
    editor = what_editor( )
    x = os.spawnlp(os.P_WAIT, editor, editor, temp_filename)
    if x:
        raise RuntimeError, "Can't run %s %s (%s)" % (editor, temp_filename, x)
    result = open(temp_filename).read( )
    os.unlink(temp_filename)
    return result
if __name__=='__main__':
    text = edited_text('''Edit this text a little,
go ahead,
it's just a demonstration, after all...!
''')
    print 'Edited text is:', text

Discussion

Your scripts may often need a substantial amount of textual input from the user. Letting users edit the text with their favorite text editor is an excellent feature for your script to have, and this recipe shows how you can obtain it. I have used variants of this approach for such purposes as adjusting configuration files, writing blog posts, and sending emails.

If your scripts do not need to run on Windows, a more secure and slightly simpler way to code function edited_text is available:

def edited_text(starting_text=''):
    temp_file = tempfile.NamedTemporaryFile( )
    temp_file.write(starting_text)
    temp_file.seek(0)
    editor = what_editor( )
    x = os.spawnlp(os.P_WAIT, editor, editor, temp_file.name)
    if x:
        raise RuntimeError, "Can't run %s %s (%s)" % (editor, temp_file.name, x)
    return temp_file.read( )

Unfortunately, this alternative relies on the editor we’re spawning being able to open and modify the temporary file while we are holding that file open, and this capability is not supported on most versions of Windows. The version of edited_text given in the recipe is more portable.

When you’re using this recipe to edit text files that must respect some kind of syntax or other constraints, such as a configuration file, you can make your script simpler and more effective by using a cycle of “input/parse/re-edit in case of errors,” providing immediate feedback to users when you can diagnose they’ve made a mistake in editing the file. Ideally, in such cases, you should reopen the editor already pointing at the line in error, which is possible with most Unix editors by passing them a first argument such as '+23', specifying that they start editing at line 23, before the filename argument. Unfortunately, such an argument would confuse many Windows editors, so you have to make some hard decisions here (if you do need to support Windows).
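The “input/parse/re-edit” cycle can be sketched independently of any particular editor. Everything named here is hypothetical: ParseError stands for whatever exception your own parser raises, editor_argv builds an argument list using the '+N' convention described above, and edit_until_valid expects you to supply an edit callable (for example, one that wraps this recipe's edited_text and spawns the editor with the list that editor_argv returns).

```python
class ParseError(Exception):
    """Hypothetical error type: carries the offending line number."""
    def __init__(self, message, lineno):
        Exception.__init__(self, message)
        self.lineno = lineno

def editor_argv(editor, filename, lineno=None):
    # many Unix editors accept +N before the filename to start at line N
    if lineno is None:
        return [editor, filename]
    return [editor, '+%d' % lineno, filename]

def edit_until_valid(text, parse, edit, max_tries=5):
    """Call edit(text, lineno), then parse(text), until parsing
    succeeds; return whatever parse returns."""
    lineno = None
    for attempt in range(max_tries):
        text = edit(text, lineno)
        try:
            return parse(text)
        except ParseError as err:
            lineno = err.lineno   # reopen the editor at the bad line
    raise RuntimeError("giving up after %d attempts" % max_tries)
```

Capping the number of attempts is a design choice of this sketch; an interactive script might instead ask users whether they want to try again or abandon their edits.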

See Also

Documentation for modules tempfile and os in the Library Reference and Python in a Nutshell.

10.7. Backing Up Files

Credit: Anand Pillai, Tiago Henriques, Mario Ruggier

Problem

You want to make frequent backup copies of all files you have modified within a directory tree, so that further changes won’t accidentally obliterate some of your editing.

Solution

Version-control systems, such as RCS, CVS, and SVN, are very powerful and useful, but sometimes a simple script that you can easily edit and customize can be even handier. The following script checks for new files to back up in a tree that you specify. Run the script periodically to keep your backup copies up to date.

import sys, os, shutil, filecmp
MAXVERSIONS=100
def backup(tree_top, bakdir_name='bakdir'):
    for dir, subdirs, files in os.walk(tree_top):
        # ensure each directory has a subdir called bakdir
        backup_dir = os.path.join(dir, bakdir_name)
        if not os.path.exists(backup_dir):
            os.makedirs(backup_dir)
        # stop any recursing into the backup directories
        subdirs[:] = [d for d in subdirs if d != bakdir_name]
        for file in files:
            filepath = os.path.join(dir, file)
            destpath = os.path.join(backup_dir, file)
            # check existence of previous versions
            for index in xrange(MAXVERSIONS):
                backup = '%s.%2.2d' % (destpath, index)
                if not os.path.exists(backup): break
            if index > 0:
                # no need to backup if file and last version are identical
                old_backup = '%s.%2.2d' % (destpath, index-1)
                try:
                    if os.path.isfile(old_backup
                       ) and filecmp.cmp(filepath, old_backup, shallow=False):
                        continue
                except OSError:
                    pass
            try:
                shutil.copy(filepath, backup)
            except OSError:
                pass
if __name__ == '__main__':
    # run backup on the specified directory (default: the current directory)
    try: tree_top = sys.argv[1]
    except IndexError: tree_top = '.'
    backup(tree_top)

Discussion

Although version-control systems are more powerful, this script can be useful in development work. I often customize it, for example, to keep backups only of files with certain extensions (or, when that’s handier, of all files except those with certain extensions); it suffices to add an appropriate test at the very start of the for file in files loop, such as:

        name, ext = os.path.splitext(file)
        if ext not in ('.py', '.txt', '.doc'): continue

This snippet first uses function splitext from the standard library module os.path to extract the file extension (starting with a period) into local variable ext, then conditionally executes statement continue, which passes to the next leg of the loop, unless the extension is one of a few that happen to be the ones of interest in the current subtree.

Other potentially useful variants include backing files up to some other subtree (potentially on a removable drive, which has some clear advantages for backup purposes) rather than the current one, compressing the files that are being backed up (look at standard library module gzip for this purpose), and more refined ones yet. However, rather than complicating function backup by offering all of these variants as options, I prefer to copy the entire script to the root of each of the various subtrees of interest, and customize it with a little simple editing. While this strategy would be a very bad one for any kind of complicated, highly reusable production-level code, it is reasonable for a simple, straightforward system administration utility such as the one in this recipe.
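As an illustration of the compression variant mentioned above, the call to shutil.copy could be replaced by a helper along these lines. This is a sketch and not part of the recipe; the name copy_compressed and the choice of appending '.gz' to the backup name are assumptions.

```python
import gzip
import shutil

def copy_compressed(filepath, backup):
    """Copy filepath into a gzip-compressed file named backup + '.gz'
    and return that name."""
    destination = backup + '.gz'
    source = open(filepath, 'rb')
    try:
        target = gzip.open(destination, 'wb')
        try:
            # stream the data in chunks, so large files are fine
            shutil.copyfileobj(source, target)
        finally:
            target.close()
    finally:
        source.close()
    return destination
```

Note that the recipe's checks for existing versions and for unchanged files would then also have to take the '.gz' suffix and the compression into account, since filecmp.cmp compares raw bytes.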

Worthy of note in this recipe’s implementation is the use of function os.walk, a generator from the standard Python library’s module os, which makes it very simple to iterate over all or most of a filesystem subtree, with no need for such subtleties as recursion or callbacks, just a straightforward for statement. To avoid backing up the backups, this recipe uses one advanced feature of os.walk: the second one of the three values that os.walk yields at each step through the loop is a list of subdirectories of the current directory. We can modify this list in place, removing some of the subdirectory names it contains. When we perform such an in-place modification, os.walk does not recurse through the subdirectories whose names we removed. The following steps deal only with the subdirectories whose names are left in. This subtle but useful feature of os.walk is one good example of how a generator can receive information from the code that’s iterating on it, to affect details of the iteration being performed.
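The pruning technique described above is easy to try in isolation. In this sketch, walk_pruned is a made-up helper that just records which directories get visited; the point to notice is the slice assignment, which changes the very list object that os.walk is holding.

```python
import os

def walk_pruned(tree_top, skip_name='bakdir'):
    """Return the directories visited under tree_top, never descending
    into any subdirectory named skip_name."""
    visited = []
    for dirpath, subdirs, files in os.walk(tree_top):
        visited.append(dirpath)
        # assigning to subdirs[:] modifies the list in place; a plain
        # "subdirs = [...]" would only rebind the local name, and
        # os.walk would still descend into the unwanted directories
        subdirs[:] = [d for d in subdirs if d != skip_name]
    return visited
```

This works because os.walk proceeds top-down by default; with topdown=False the subdirectories have already been visited by the time their parent is yielded, so pruning has no effect.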

See Also

Documentation of standard library modules os, shutil, and gzip in the Library Reference and Python in a Nutshell.

10.8. Selectively Copying a Mailbox File

Credit: Noah Spurrier, Dave Benjamin

Problem

You need to selectively copy a large mailbox file (in mbox style), passing each message through a filtering function that may alter or skip the message.

Solution

The Python Standard Library package email is the modern Python approach for this kind of task. However, standard library modules mailbox and rfc822 can also supply the base functionality to implement this task:

import mailbox
def process_mailbox(mailboxname_in, mailboxname_out, filter_function):
    mbin = mailbox.PortableUnixMailbox(file(mailboxname_in,'r'))
    fout = file(mailboxname_out, 'w')
    for msg in mbin:
        if msg is None: break
        document = filter_function(msg, msg.fp.read( ))
        if document:
            assert document.endswith('\n\n')
            fout.write(msg.unixfrom)
            fout.writelines(msg.headers)
            fout.write('\n')
            fout.write(document)
    fout.close( )

Discussion

I often write lots of little scripts to filter my mailbox, so I wrote this recipe’s small module. I can import the module from each script and call the module’s function process_mailbox as needed. Python’s future direction is to perform email processing with the standard library package email, but lower-level modules, such as mailbox and rfc822, are still available in the Python Standard Library. They are sometimes easier to use than the rich, powerful, and very general functionality offered by package email.

The function you pass to process_mailbox as the third argument, filter_function, must take two arguments: msg, an rfc822 message object, and document, a string that is the message’s entire body, ending with two line-end characters (\n\n). filter_function can return False, meaning that this message must be skipped (i.e., not copied at all to the output), or else it must return a string terminated with \n\n that is written to the output as the message body. Normally, filter_function returns either False or the same document argument it was called with, but in some cases you may find it useful to write to the output file an altered version of the message’s body rather than the original message body.

Here is an example of a filter function that removes duplicate messages:

import sets
found_ids = sets.Set( )
def no_duplicates(msg, document):
    msg_id = msg.getheader('Message-ID')
    if msg_id in found_ids:
        return False
    found_ids.add(msg_id)
    return document

In Python 2.4, you could use the built-in set rather than sets.Set, but for a case as simple as this, it makes no real difference in performance (and the usage is exactly the same, anyway).
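For readers on Python 3, where rfc822 and mailbox.PortableUnixMailbox are no longer available, the same idea can be sketched with the mailbox.mbox class. This is only an approximation of the recipe under stated assumptions: the names copy_mailbox and make_duplicate_filter are invented for the example, and the filter here is a simple predicate on the message rather than a function that may return an altered body.

```python
import mailbox

def copy_mailbox(path_in, path_out, keep):
    """Copy messages from one mbox file to another, keeping only those
    for which keep(msg) is true. Returns the number of messages copied."""
    source = mailbox.mbox(path_in, create=False)
    target = mailbox.mbox(path_out)
    copied = 0
    try:
        for msg in source:
            if keep(msg):
                target.add(msg)
                copied += 1
        target.flush()
    finally:
        target.close()
        source.close()
    return copied

def make_duplicate_filter():
    """Return a predicate that accepts each Message-ID only once."""
    seen = set()
    def keep(msg):
        msg_id = msg['Message-ID']
        if msg_id in seen:
            return False
        seen.add(msg_id)
        return True
    return keep
```

Building the filter with a factory function keeps the set of seen IDs private to one run, instead of relying on a module-level global as the Python 2 example does.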

See Also

Documentation about modules mailbox and rfc822, and package email, in the Library Reference and Python in a Nutshell.

10.9. Building a Whitelist of Email Addresses From a Mailbox

Credit: Noah Spurrier

Problem

To help you configure an antispam system, you want a list of email addresses, commonly known as a whitelist, that you can trust won’t send you spam. The addresses to which you send email are undoubtedly good candidates for this whitelist.

Solution

Here is a script to output “To” addresses given a mailbox path:

#!/usr/bin/env python
""" Extract and print all 'To:' addresses from a mailbox """
import mailbox
def main(mailbox_path):
    addresses = {  }
    mb = mailbox.PortableUnixMailbox(file(mailbox_path))
    for msg in mb:
        toaddr = msg.getaddr('To')[1]
        addresses[toaddr] = 1
    addresses = addresses.keys( )
    addresses.sort( )
    for address in addresses:
        print address
if __name__ == '__main__':
    import sys
    main(sys.argv[1])

Discussion

In addition to bypassing spam filters, identifying addresses of people you’ve sent mail to may also help in other ways, such as flagging emails from them as higher priority, depending on your mail-reading habits and your mail reader’s capabilities. As long as your mail reader keeps mail you have sent in some kind of “Sent Items” mailbox in standard mailbox format, you can call this script with the path to the mailbox as its only argument, and the addresses to which you’ve sent mail will be emitted to standard output.

The script is simple because the Python Standard Library module mailbox does all the hard work. All the script needs to do is collect the set of email addresses as it loops through all messages, then emit them. While collecting, we keep addresses as a dictionary, since that’s much faster than keeping a list and checking each toaddr in order to append it only if it wasn’t already in the list. When we’re done collecting, we just extract the addresses from the dictionary as a list because we want to emit its items in sorted order. In Python 2.4, function main can be made even slightly more elegant, thanks to the new built-ins set and sorted:

def main(mailbox_path):
    addresses = set( )
    mb = mailbox.PortableUnixMailbox(file(mailbox_path))
    for msg in mb:
        toaddr = msg.getaddr('To')[1]
        addresses.add(toaddr)
    for address in sorted(addresses):
        print address

If your mailbox is not in the Unix mailbox style supported by mailbox.PortableUnixMailbox, you may want to use other classes supplied by the Python Standard Library module mailbox. For example, if your mailbox is in Qmail maildir format, you can use the mailbox.Maildir class to read it.
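One limitation worth noting is that the script reads a single address from each “To” header, although a message may have several recipients. A possible refinement, sketched here for Python 3 with the standard email.utils.getaddresses function, collects every address; the helper name recipient_addresses is an assumption made for this example.

```python
from email.utils import getaddresses

def recipient_addresses(header_values):
    """Return the sorted, de-duplicated addresses found in a list of
    raw address-header strings (each may hold several recipients)."""
    pairs = getaddresses(header_values)  # list of (real name, address)
    return sorted(set(addr for name, addr in pairs if addr))
```

With message objects from the email package, the header strings could be gathered with msg.get_all('To', [ ]) for each message and passed to this function in one list.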

See Also

Documentation of the standard library module mailbox in the Library Reference and Python in a Nutshell.

10.10. Blocking Duplicate Mails

Credit: Marina Pianu, Peter Cogolo

Problem

Many of the mails you receive are duplicates. You need to block the duplicates with a fast, simple filter before they reach a more time-consuming step, such as an anti-spam filter, in your email pipeline.

Solution

Many mail systems, such as the popular procmail, and KDE’s KMail, enable you to control your mail-reception pipeline. Specifically, you can insert in the pipeline your filter programs, which get messages on standard input, may modify them, and emit them again on standard output. Here is one such filter, with the specific purpose of performing the task described in the Problem—blocking messages that are duplicates of other messages that you have received recently:

#!/usr/bin/python
import time, sys, os, email
now = time.time( )
# get archive of previously-seen message-ids and times
kde_dir = os.path.expanduser('~/.kde')
if not os.path.isdir(kde_dir):
    os.mkdir(kde_dir)
arfile = os.path.join(kde_dir, 'duplicate_mails')
duplicates = {  }
try:
    archive = open(arfile)
except IOError:
    pass
else:
    for line in archive:
        when, msgid = line[:-1].split(' ', 1)
        duplicates[msgid] = float(when)
    archive.close( )
redo_archive = False
# suck message in from stdin and study it
msg = email.message_from_file(sys.stdin)
msgid = msg['Message-ID']
if msgid:
    if msgid in duplicates:
        # duplicate message: alter its subject
        subject = msg['Subject']
        if subject is None:
            msg['Subject'] = '**** DUP **** ' + msgid
        else:
            del msg['Subject']
            msg['Subject'] = '**** DUP **** ' + subject
    else:
        # non-duplicate message: redo the archive file
        redo_archive = True
        duplicates[msgid] = now
else:
    # invalid (missing message-id) message: alter its subject
    subject = msg['Subject']
    if subject is None:
        msg['Subject'] = '**** NID **** '
    else:
        del msg['Subject']
        msg['Subject'] = '**** NID **** ' + subject
# emit message back to stdout
print msg
if redo_archive:
    # redo archive file, keep only msgs from the last two hours
    keep_last = now - 2*60*60.0
    archive = file(arfile, 'w')
    for msgid, when in duplicates.iteritems( ):
        if when > keep_last:
            archive.write('%9.2f %s\n' % (when, msgid))
    archive.close( )

Discussion

Whether it is because of spammers’ malice or incompetence, or because of hiccups at my ISP (Internet service provider), at times I get huge amounts of duplicate messages that can overload my mail-reception pipeline, particularly antispam filters. Fortunately, like many other mail systems, KDE’s KMail, the one I use, lets me insert my own filters in the mail reception pipeline. In particular, I can diagnose duplicate messages, alter their headers (I use “Subject” for clarity), and tell later stages in the filters’ pipeline to throw away messages with such subjects or to shunt them aside into a dedicated mailbox for later perusal, without passing them on to the antispam and other filters.

The email module from the Python Standard Library performs all the required parsing of the message and lets me access headers with dictionary-like indexing syntax. I need some “memory” of recently seen messages. Fortunately, I have noticed all duplicates happen within a few minutes of each other, so I don’t have to keep that memory for long—two hours are plenty. Therefore, I keep that memory in a simple text file, which records the time when a message was received and the message ID. I thought I might have to find a more advanced way to keep this kind of FIFO (first-in, first-out) archive, but I tried a simple approach first—a simple text file that is entirely rewritten whenever a new nonduplicate message arrives. This approach appears to perform quite adequately for my needs (at most a couple hundred messages an hour), even on my somewhat dated PC. “Do the simplest thing that could possibly work” strikes again!
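The archive handling can be separated from the mail handling, which makes it easier to reason about and to test. The following sketch (Python 3; the function names and the TWO_HOURS constant are assumptions introduced for the example) covers the three steps: reading the “time message-id” lines, dropping entries older than the retention window, and writing the survivors back out.

```python
TWO_HOURS = 2 * 60 * 60.0

def parse_archive(lines):
    """Turn 'timestamp message-id' lines into {message-id: timestamp}."""
    seen = {}
    for line in lines:
        line = line.rstrip('\n')
        if not line:
            continue
        when, msgid = line.split(' ', 1)
        seen[msgid] = float(when)
    return seen

def prune(seen, now, max_age=TWO_HOURS):
    """Keep only the entries received within max_age seconds of now."""
    cutoff = now - max_age
    return dict((msgid, when) for msgid, when in seen.items() if when > cutoff)

def format_archive(seen):
    """Produce the archive lines, one 'timestamp message-id' per entry."""
    return ['%.2f %s\n' % (when, msgid) for msgid, when in sorted(seen.items())]
```

The sketch formats timestamps with %.2f rather than a fixed-width format, so that no leading space can ever confuse the split on the first blank when the file is read back.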

See Also

Documentation about package email and modules time, sys and os in the Library Reference and Python in a Nutshell.

10.11. Checking Your Windows Sound System

Credit: Anand Pillai

Problem

You need to check whether the sound subsystem on your Windows PC is properly configured.

Solution

The winsound module of the Python Standard Library makes this check really simple:

import winsound
try:
    winsound.PlaySound("*", winsound.SND_ALIAS)
except RuntimeError, e:
    print 'Sound system has problems,', e
else:
    print 'Sound system is OK'

Discussion

The sound system might pass this test and still be unable to produce sound correctly, due to a variety of possible problems—starting from simple ones such as powered loudspeakers being turned off (there’s no sensible way you can check for that in your program!), all the way to extremely subtle and complicated ones. When sound is a problem in your applications, using this recipe at least you know whether you should be digging into a subtle issue of device driver configuration or start by checking whether the loudspeakers are on!

See Also

Documentation on the Python Standard Library winsound module.

10.12. Registering or Unregistering a DLL on Windows

Credit: Bill Bell

Problem

You want to register or unregister a DLL in Windows, just as it is normally done by regsvr32.exe, but you want to do it from Python, without requiring that executable to be present or bothering to find it.

Solution

All that Microsoft’s regsvr32.exe does is load a DLL and call its entries named DllRegisterServer or DllUnregisterServer. This behavior is very easy to replicate via Thomas Heller’s ctypes extension:

from ctypes import windll
dll = windll[r'C:\Path\To\Some.DLL']
result = dll.DllRegisterServer( )
result = dll.DllUnregisterServer( )

The result is of Windows type HRESULT, so, if you wish, ctypes can also implicitly check it for you, raising a ctypes.WindowsError exception when an error occurs; you just need to use ctypes.oledll instead of ctypes.windll. In other words, to have the result automatically checked and an exception raised in case of errors, instead of the previous script, use this one:

from ctypes import oledll
dll = oledll[r'C:\Path\To\Some.DLL']
dll.DllRegisterServer( )
dll.DllUnregisterServer( )

Discussion

Thomas Heller’s ctypes enables your Python code to load DLLs on Windows (and similar dynamic/shared libraries on other platforms) and call functions from such libraries, and it manages to perform these tasks with a high degree of both power and elegance. On Windows, in particular, it offers even further “added value” through such mechanisms as the oledll object, which, besides loading DLLs and calling functions from them, also checks the returned HRESULT instances and raises appropriate exceptions when the HRESULT values indicate errors.

In this recipe, we’re using ctypes (either the windll or oledll objects from that module) specifically to avoid the need to use Microsoft’s regsvr32.exe to register or unregister DLLs that implement in-process COM servers for some CLSIDs. (A CLSID is a globally unique identifier that identifies a COM class object, and the abbreviation presumably stands for class identifier.) The cases in which you’ll use this specific recipe are only those in which you need to register or unregister such COM DLLs (whether they’re implemented in Python or otherwise makes no difference). Be aware, however, that the applicability of ctypes is far wider, as it extends to any case in which you wish your Python code to load and interact with a DLL (or, on platforms other than Windows, equivalent dynamically loaded libraries, such as .so files on Linux and .dylib files on Mac OS X).

The protocol that regsvr32.exe implements is well documented and very simple, so our own code can reimplement it in a jiffy. That’s much more practical than requiring regsvr32.exe to be installed on the machine on which we want to register or unregister the DLLs, not to mention finding where the EXE file might be to run it directly (via os.spawn or whatever) and also finding an effective way to detect errors and show them to the user.
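To show errors to the user rather than let an exception escape, the two calls can be wrapped in a small helper. What follows is a sketch for Python 3 (where ctypes is part of the standard library), not a definitive implementation: the name call_dll_entry and the loader parameter are assumptions of this example, the latter added only so that the logic can be exercised without a real DLL.

```python
import ctypes

def call_dll_entry(dll_path, unregister=False, loader=None):
    """Load dll_path and call its DllRegisterServer entry (or
    DllUnregisterServer when unregister is true).
    Returns (True, '') on success or (False, error message)."""
    if loader is None:
        # ctypes.oledll exists on Windows only, so look it up lazily;
        # it raises OSError when the returned HRESULT signals failure.
        loader = ctypes.oledll.LoadLibrary
    entry_name = 'DllUnregisterServer' if unregister else 'DllRegisterServer'
    try:
        dll = loader(dll_path)
        getattr(dll, entry_name)()
    except (OSError, AttributeError) as exc:
        # OSError: load failure or failing HRESULT;
        # AttributeError: the DLL does not export the entry point.
        return False, str(exc)
    return True, ''
```

A caller might then write ok, message = call_dll_entry(path) and display message in whatever way suits the application when ok is false.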

10.13. Checking and Modifying the Set of Tasks Windows Automatically Runs at Login

Credit: Daniel Kinnaer

Problem

You need to check which tasks Windows is set to automatically run at login and possibly change this set of tasks.

Solution

When administering Windows machines, it’s crucial to keep track of the tasks each machine runs at login. Like so many Windows tasks, this requires working with the registry, and standard Python module _winreg enables this:

import _winreg as wr
aReg = wr.ConnectRegistry(None, wr.HKEY_LOCAL_MACHINE)
try:
    targ = r'SOFTWARE\Microsoft\Windows\CurrentVersion\Run'
    print "*** Reading from", targ, "***"
    aKey = wr.OpenKey(aReg, targ)
    try:
        for i in xrange(1024):
            try:
                n, v, t = wr.EnumValue(aKey, i)
                print i, n, v, t
            except EnvironmentError:
                print "You have", i, "tasks starting at logon"
                break
    finally:
        wr.CloseKey(aKey)
    print "*** Writing to", targ, "***"
    aKey = wr.OpenKey(aReg, targ, 0, wr.KEY_WRITE)
    try:
        try:
            wr.SetValueEx(aKey, "MyNewKey", 0, wr.REG_SZ, r"c:\winnt\explorer.exe")
        except EnvironmentError:
            print "Encountered problems writing into the Registry..."
            raise
    finally:
        wr.CloseKey(aKey)
finally:
    wr.CloseKey(aReg)

Discussion

The Windows registry holds a wealth of crucial system administration data, and the Python standard module _winreg makes it feasible to read and alter data held in the registry. One of the items held in the Windows registry is a list of tasks to be run at login (in addition to other lists held elsewhere, such as the user-specific Startup folder that this recipe does not deal with).

This recipe shows how to examine the registry list of login tasks, and how to add a task to the list so it is run at login. (This recipe assumes you have Explorer installed at the specific location c:\winnt. If you have it installed elsewhere, edit the recipe accordingly.)

If you want to remove the specific key added by this recipe, you can use the following simple script:

import _winreg as wr
aReg = wr.ConnectRegistry(None, wr.HKEY_LOCAL_MACHINE)
targ = r'SOFTWARE\Microsoft\Windows\CurrentVersion\Run'
aKey = wr.OpenKey(aReg, targ, 0, wr.KEY_WRITE)
wr.DeleteValue(aKey, "MyNewKey")
wr.CloseKey(aKey)
wr.CloseKey(aReg)

The try/finally constructs used in the recipe are far more robust than the simple sequence of function calls used in this latest snippet, since they ensure that everything is closed correctly regardless of whether the intervening calls succeed or fail. This care and prudence are strongly advisable for scripts that are meant to be run in production, particularly for system-administration scripts that must generally run with administrator privileges. Such scripts therefore might harm a system’s setup if they don’t clean up after themselves properly. However, you can omit the try/finally when you know the calls will succeed or don’t care what happens if they fail. In this case, if you have successfully added a task with the recipe’s script, the calls in this simple cleanup script should work just fine.
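The enumeration loop in the Solution stops at an arbitrary upper bound of 1024 values. A generator can express “call EnumValue until it fails” without any bound. The sketch below targets Python 3, where the module is named winreg and the end of the values is signaled by OSError; the function name iter_registry_values and its enum_value parameter are assumptions of this example, the parameter being there so the loop can be tried without a registry.

```python
def iter_registry_values(key, enum_value=None):
    """Yield (name, value, type) for each value of an open registry key,
    stopping when the enumeration function raises OSError."""
    if enum_value is None:
        import winreg  # Windows only, hence the late import
        enum_value = winreg.EnumValue
    index = 0
    while True:
        try:
            item = enum_value(key, index)
        except OSError:
            return  # no more values under this key
        yield item
        index += 1

# Possible use on Windows (not run here):
# with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, targ) as key:
#     for name, value, kind in iter_registry_values(key):
#         print(name, value, kind)
```

In Python 3, registry key handles can be used in a with statement, as the commented usage suggests, which closes the key much as the recipe’s try/finally does.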

See Also

Documentation for the standard module _winreg in the Library Reference; Windows API documentation available from Microsoft (http://msdn.microsoft.com); information on what is where in the registry tends to be spread among many sources, but for some useful collections of such information, see http://www.winguides.com/registry and http://www.activewin.com/tips/reg/index.shtml.

10.14. Creating a Share on Windows

Credit: John Nielsen

Problem

You want to share a folder of your Windows PC on a LAN.

Solution

PyWin32’s win32net module makes this task very easy:

import win32net
import win32netcon
shinfo={  }
shinfo['netname'] = 'python test'
shinfo['type'] = win32netcon.STYPE_DISKTREE
shinfo['remark'] = 'data files'
shinfo['permissions'] = 0
shinfo['max_uses'] = -1
shinfo['current_uses'] = 0
shinfo['path'] = r'c:\my_data'
shinfo['passwd'] = ''
server = 'servername'
win32net.NetShareAdd(server, 2, shinfo)

Discussion

While the task of sharing a folder is indeed fairly easy to accomplish, finding the information on how you do so isn’t. All I could find in the win32net documentation was that you needed to pass a dictionary holding the share’s data “in the format of SHARE_INFO_*.” I finally managed to integrate this tidbit with the details from the Windows SDK (http://msdn.microsoft.com) and produce the information in this recipe. One detail that took me some effort to discover is that the constants you need to use as the value for the 'type' entry are “hidden away” in the win32netcon module.

See Also

PyWin32 docs at http://sourceforge.net/projects/pywin32/; Microsoft’s MSDN site, http://msdn.microsoft.com.

10.15. Connecting to an Already Running Instance of Internet Explorer

Credit: Bill Bell, Graham Fawcett

Problem

Instantiating Internet Explorer to access its interfaces via COM is easy, but you want to connect to an already running instance.

Solution

The simplest approach is to rely on Internet Explorer’s CLSID:

from win32com.client import Dispatch
ShellWindowsCLSID = '{9BA05972-F6A8-11CF-A442-00A0C90A8F39}'
ShellWindows = Dispatch(ShellWindowsCLSID)
print '%d instances of IE' % len(ShellWindows)
print
for shellwindow in ShellWindows:
    print shellwindow
    print shellwindow.LocationName
    print shellwindow.LocationURL
    print

Discussion

Dispatching on the CLSID provides a sequence of all the running instances of the application with that class. Of course, there could be none, one, or more. If you’re interested in a specific instance, you may be able to identify it by checking, for example, for its properties LocationName and LocationURL.

You’ll see that Windows Explorer and Internet Explorer have the same CLSID—they’re basically the same application. If you need to distinguish between them, you can try adding at the start of your script the statement:

from win32gui import GetClassName

and then checking each shellwindow in the loop with:

    if GetClassName(shellwindow.HWND) == 'IEFrame':...

'IEFrame' is supposed to result from this call (according to the docs) for all Internet Explorer instances and those only. However, I have not found this check to be wholly reliable across all versions and patch levels of Windows and Internet Explorer, so, take this approach as just one possibility (which is why I haven’t added this check to the recipe’s official “Solution”).

This recipe does not let you receive IE events. The most important event is probably DocumentComplete. You can roughly substitute checks on the Busy property for the inability to wait for that event, but remember not to poll too frequently (for that or any other property) or you may slow down your PC excessively. Something like:

    while shellwindow.Busy:
        time.sleep(0.2)

Sleeping 0.2 seconds between checks may be a reasonable compromise between responding promptly and not loading your PC too heavily with a busy-waiting loop.
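A polling loop of this kind waits forever if the page never finishes loading, so it is often prudent to add a timeout. Here is a minimal sketch for Python 3; the helper name wait_while and all of its parameters are assumptions introduced for this example, and is_busy stands for any zero-argument callable, such as lambda: shellwindow.Busy.

```python
import time

def wait_while(is_busy, interval=0.2, timeout=30.0,
               sleep=time.sleep, clock=time.monotonic):
    """Poll is_busy() every interval seconds until it returns false.
    Returns True when the wait ended normally, False on timeout."""
    deadline = clock() + timeout
    while is_busy():
        if clock() >= deadline:
            return False
        sleep(interval)
    return True
```

The sleep and clock parameters exist only to make the helper easy to test; in normal use you would leave them at their defaults.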

See Also

PyWin32 docs at http://sourceforge.net/projects/pywin32/; Microsoft’s MSDN site, http://msdn.microsoft.com.

10.16. Reading Microsoft Outlook Contacts

Credit: Kevin Altis

Problem

Your Microsoft Outlook Contacts house a wealth of useful information, and you need to extract some of it in text form.

Solution

Like many other problems of system administration on Windows, this one is best approached by using COM. The most popular way to interface Python to COM is to use the win32com package, which is part of Mark Hammond’s pywin32 extension package:

import sys
from win32com.client import gencache, constants
DEBUG = False
class MSOutlook(object):
    def __init__(self):
        try:
            self.oOutlookApp = gencache.EnsureDispatch("Outlook.Application")
            self.outlookFound = True
        except:
            print "MSOutlook: unable to load Outlook"
            self.outlookFound = False
        self.records = [  ]
    def loadContacts(self, keys=None):
        if not self.outlookFound: return
        onMAPI = self.oOutlookApp.GetNamespace("MAPI")
        ofContacts = onMAPI.GetDefaultFolder(constants.olFolderContacts)
        if DEBUG: print "number of contacts:", len(ofContacts.Items)
        for oc in range(len(ofContacts.Items)):
            contact = ofContacts.Items.Item(oc + 1)
            if contact.Class == constants.olContact:
                if keys is None:
                    # no keys were specified, so build up a list of all keys
                    # that belong to some types we know we can deal with
                    good_types = int, str, unicode
                    keys = [key for key in contact._prop_map_get_
                        if isinstance(getattr(contact, key), good_types) ]
                    if DEBUG:
                        print "Fields\n== == == == == == == == == == == =="
                        keys.sort( )
                        for key in keys: print key
                record = {  }
                for key in keys:
                    record[key] = getattr(contact, key)
                self.records.append(record)
                if DEBUG:
                    print oc, contact.FullName
if __name__ == '__main__':
    if '-d' in sys.argv:
        DEBUG = True
    if DEBUG:
        print "attempting to load Outlook"
    oOutlook = MSOutlook( )
    if not oOutlook.outlookFound:
        print "Outlook not found"
        sys.exit(1)
    fields = ['FullName', 'CompanyName',
              'MailingAddressStreet', 'MailingAddressCity',
              'MailingAddressState', 'MailingAddressPostalCode',
              'HomeTelephoneNumber', 'BusinessTelephoneNumber',
              'MobileTelephoneNumber', 'Email1Address', 'Body',
             ]
    if DEBUG:
        import time
        print "loading records..."
        startTime = time.time( )
    # to get all fields just call oOutlook.loadContacts( )
    # but getting a specific set of fields is much faster
    oOutlook.loadContacts(fields)
    if DEBUG:
        print "loading took %f seconds" % (time.time( ) - startTime)
    print "Number of contacts: %d" % len(oOutlook.records)
    print "Contact: %s" % oOutlook.records[0]['FullName']
    print "Body:\n%s" % oOutlook.records[0]['Body']

Discussion

This recipe’s code could use more error-checking, and you could get it by using nested try/except blocks, but I didn’t want to obscure the code’s fundamental simplicity in this recipe. This recipe should work with different versions of Outlook, but I’ve tested it only with Outlook 2000. If you have applied the Outlook security patches then you will be prompted with a dialog requesting access to Outlook for 1-10 minutes from an external program, which in this case is Python.

The code has already been optimized in two important ways. First, by ensuring that the Python COM wrappers for Outlook have been generated, which is guaranteed by calling gencache.EnsureDispatch. Second, in the loop that reads the contacts, the Contact reference is obtained only once and then kept in a local variable contact to avoid repeated references. This simple but crucial optimization is the role of the statement:

contact = ofContacts.Items.Item(oc + 1)

Both of these optimizations have a dramatic impact on total import time, and both are important enough to keep in mind. Specifically, the EnsureDispatch idea is important for most uses of COM in Python; the concept of getting an object reference, once, into a local variable (rather than repeating indexing, calls, and attribute accesses) is even more important and applies to every use of Python.

Simple variations of this script can be applied to other elements of the Outlook object model such as the Calendar and Tasks. You’ll want to look at the Python wrappers generated for Outlook in the C:\Python23\Lib\site-packages\win32com\gen_py directory. I also suggest that you look at the Outlook object model documentation on MSDN and/or pick up a book on the subject.

See Also

PyWin32 docs at http://sourceforge.net/projects/pywin32/; Microsoft’s MSDN site, http://msdn.microsoft.com.

10.17. Gathering Detailed System Information on Mac OS X

Credit: Brian Quinlan

Problem

You want to retrieve detailed information about a Mac OS X system. You want either complete information about the system or information about particular keys in the system-information database.

Solution

Mac OS X’s system_profiler command can provide system information as an XML stream that we can parse and examine:

#!/usr/bin/env python
from xml import dom
from xml.dom.xmlbuilder import DOMInputSource, DOMBuilder
import datetime, time, os
def group(seq, n):
    """group([0, 3, 4, 10, 2, 3, 1], 3) => [[0, 3, 4], [10, 2, 3]]
       Group a sequence into n-subseqs, discarding incomplete subseqs.
    """
    return [ seq[i:i+n] for i in xrange(0, len(seq)-n+1, n) ]
def remove_whitespace_nodes(node):
    """Removes all of the whitespace-only text descendants of a DOM node."""
    remove_list = [  ]
    for child in node.childNodes:
        if child.nodeType == dom.Node.TEXT_NODE and not child.data.strip( ):
            remove_list.append(child)
        elif child.hasChildNodes( ):
            remove_whitespace_nodes(child)
    for child in remove_list:
        node.removeChild(child)
        child.unlink( )
class POpenInputSource(DOMInputSource):
    "Use stdout from an external program as a DOMInputSource"
    def __init__(self, command):
        super(POpenInputSource, self).__init__( )
        self.byteStream = os.popen(command)
class OSXSystemProfiler(object):
    "Provide information from the Mac OS X System Profiler"
    def __init__(self, detail=-1):
        """detail can range from -2 to +1.  Larger numbers return more info.
           Beware of +1, can take many minutes to get all info!"""
        b = DOMBuilder( )
        self.document = b.parse(
            POpenInputSource('system_profiler -xml -detailLevel %d' % detail))
        remove_whitespace_nodes(self.document)
    def _content(self, node):
        "Get the text node content of an element, or an empty string"
        if node.firstChild:
            return node.firstChild.nodeValue
        else:
            return ''
    def _convert_value_node(self, node):
        """Convert a 'value' node (i.e. anything but 'key') into a Python data
           structure"""
        if node.tagName == 'string':
            return self._content(node)
        elif node.tagName == 'integer':
            return int(self._content(node))
        elif node.tagName == 'real':
            return float(self._content(node))
        elif node.tagName == 'date': #  <date>2004-07-05T13:29:29Z</date>
            return datetime.datetime(
                *time.strptime(self._content(node), '%Y-%m-%dT%H:%M:%SZ')[:6])
        elif node.tagName == 'array':
            return [self._convert_value_node(n) for n in node.childNodes]
        elif node.tagName == 'dict':
            return dict([(self._content(n), self._convert_value_node(m))
                          for n, m in group(node.childNodes, 2)])
        else:
            raise ValueError, 'Unknown tag %r' % node.tagName
    def __getitem__(self, key):
        from xml import xpath
        # pyxml's xpath does not support /element1[...]/element2...
        nodes = xpath.Evaluate('//dict[key=%r]' % key, self.document)
        results = [  ]
        for node in nodes:
            v = self._convert_value_node(node)[key]
            if isinstance(v, dict) and '_order' in v:
                # this is just information for display
                pass
            else:
                results.append(v)
        return results
    def all(self):
        """Return the complete information from the system profiler
           as a Python data structure"""
        return self._convert_value_node(
            self.document.documentElement.firstChild)
def main( ):
    from optparse import OptionParser
    from pprint import pprint
    info = OSXSystemProfiler( )
    parser = OptionParser( )
    parser.add_option("-f", "--field", action="store", dest="field",
                      help="display the value of the specified field")
    options, args = parser.parse_args( )
    if args:
        parser.error("no arguments are allowed")
    if options.field is not None:
        pprint(info[options.field])
    else:
        # print some keys known to exist in only one important dict
        for k in ['cpu_type', 'current_processor_speed', 'l2_cache_size',
                  'physical_memory', 'user_name', 'os_version', 'ip_address']:
            print '%s: %s' % (k, info[k][0])
if __name__ == '__main__':
    main( )

Discussion

Mac OS X puts at your disposal a wealth of information about your system through the system_profiler application. This recipe shows how to access that information from your Python code. First, you have to instantiate class OSXSystemProfiler, for example, via a statement such as info = OSXSystemProfiler( ); once you have done that, you can obtain all available information by calling info.all( ), or information for one specific key by indexing info[thekey]. The main function in the recipe, which executes when you run this module as a main script, emits information to standard output—either a specific key, requested by using switch -f when invoking the script, or, by default, a small set of keys known to be generally useful.

For example, when run on the old Apple iBook belonging to one of this book’s editors (no prize for guessing which one), the script in this recipe emits the following output:

cpu_type: PowerPC G4  (3.3)
current_processor_speed: 800 MHz
l2_cache_size: 256 KB
physical_memory: 640 MB
user_name: Alex (alex)
os_version: Mac OS X 10.3.6 (7R28)
ip_address: [u'192.168.0.190']

system_profiler returns XML data in Apple’s property list (plist) format, so this recipe implements a partial plist parser, using Python’s standard library XML-parsing facilities, and the xpath implementation from the PyXML extensions. More information about Python’s facilities that help you deal with XML can be found in Chapter 12.
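In Python 3, the standard library module plistlib can parse property-list XML directly, which removes the need for a hand-written converter and for PyXML. The sketch below is an alternative to the recipe’s approach, not a transcription of it: find_key is a name invented for this example, the SAMPLE data is a made-up fragment shaped like system_profiler output, and unlike the recipe’s indexing it does not skip the display-only dictionaries that contain an '_order' key.

```python
import plistlib

def find_key(data, key):
    """Collect every value stored under key, at any depth, in the
    nested dicts and lists that plistlib produces."""
    results = []
    if isinstance(data, dict):
        for k, v in data.items():
            if k == key:
                results.append(v)
            results.extend(find_key(v, key))
    elif isinstance(data, list):
        for item in data:
            results.extend(find_key(item, key))
    return results

# Hypothetical sample; real input could come from something like
# subprocess.check_output(['system_profiler', '-xml']) on Mac OS X.
SAMPLE = b"""<?xml version="1.0" encoding="UTF-8"?>
<plist version="1.0">
<array>
  <dict>
    <key>_items</key>
    <array>
      <dict>
        <key>cpu_type</key>
        <string>PowerPC G4</string>
        <key>physical_memory</key>
        <string>640 MB</string>
      </dict>
    </array>
  </dict>
</array>
</plist>
"""

info = plistlib.loads(SAMPLE)
```

After this, find_key(info, 'cpu_type') plays the role of info['cpu_type'] in the recipe, returning a list with one item per occurrence of the key.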

See Also

Documentation of the standard Python library support for XML in the Library Reference and Python in a Nutshell; PyXML docs at http://pyxml.sourceforge.net/; Mac OS X system_profiler docs at http://developer.apple.com/documentation/Darwin/Reference/ManPages/man8/system_profiler.8.html; Chapter 12.
