We will take a quick tour through the standard library sys and os modules in the first few sections of this chapter before moving on to larger system programming concepts. As you can tell from the length of their attribute lists, both of these are large modules (their content may vary slightly per Python version and platform):
>>> import sys, os
>>> len(dir(sys))              # 56 attributes
56
>>> len(dir(os))               # 118 on Windows, more on Unix
118
>>> len(dir(os.path))          # a nested module within os
43
As I’m not going to demonstrate every item in every built-in module, the first thing I want to do is show you how to get more details on your own. Officially, this task also serves as an excuse for introducing a few core system scripting concepts; along the way, we’ll code a first script to format documentation.
Most system-level interfaces in Python are shipped in just two modules: sys and os. That’s somewhat oversimplified; other standard modules belong to this domain too. Among them are the following:
glob
For filename expansion
socket
For network connections and Inter-Process Communication (IPC)
thread and queue
For concurrent threads
time
For accessing system time details
fcntl
For low-level file control
In addition, some built-in functions are actually system interfaces as well (e.g., open). But sys and os together form the core of Python’s system tools arsenal.
In principle at least, sys exports components related to the Python interpreter itself (e.g., the module search path), and os contains variables and functions that map to the operating system on which Python is run. In practice, this distinction may not always seem clear-cut (e.g., the standard input and output streams show up in sys, but they are arguably tied to operating system paradigms). The good news is that you’ll soon use the tools in these modules so often that their locations will be permanently stamped on your memory.[*]
The os module also attempts to provide a portable programming interface to the underlying operating system; its functions may be implemented differently on different platforms, but to Python scripts, they look the same everywhere. In addition, the os module exports a nested submodule, os.path, which provides a portable interface to file and directory processing tools.
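As a small illustration of that portability, the sketch below builds a pathname and takes it apart again with os.path calls. It is written in Python 3 syntax (the listings in this chapter use Python 2), and the directory and file names are arbitrary examples, not files the code expects to exist.

```python
import os

# Join path components with whatever separator the current platform uses.
path = os.path.join('PP3E', 'System', 'more.py')   # example names only

# Split the result back into directory, filename, base name, and extension.
dirname, filename = os.path.split(path)
base, ext = os.path.splitext(filename)

print(path)                # separator shown depends on the platform
print(dirname, filename)
print(base, ext)
```

The same script runs unchanged on Windows and Unix; only the separator character in the printed path differs.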
As you can probably deduce from the preceding paragraphs, learning to write system scripts in Python is mostly a matter of learning about Python’s system modules. Luckily, there are a variety of information sources to make this task easier—from module attributes to published references and books.
For instance, if you want to know everything that a built-in module exports, you can read its library manual entry, study its source code (Python is open source software, after all), or fetch its attribute list and documentation string interactively. Let’s import sys and see what it has:
C:\...\PP3E\System>python
>>> import sys
>>> dir(sys)
['__displayhook__', '__doc__', '__excepthook__', '__name__', '__stderr__',
'__stdin__', '__stdout__', '_getframe', 'api_version', 'argv',
'builtin_module_names', 'byteorder', 'call_tracing', 'callstats', 'copyright',
'displayhook', 'dllhandle', 'exc_clear', 'exc_info', 'exc_traceback',
'exc_type', 'exc_value', 'excepthook', 'exec_prefix', 'executable', 'exit',
'exitfunc', 'getcheckinterval', 'getdefaultencoding', 'getfilesystemencoding',
'getrecursionlimit', 'getrefcount', 'getwindowsversion', 'hexversion',
'maxint', 'maxunicode', 'meta_path', 'modules', 'path', 'path_hooks',
'path_importer_cache', 'platform', 'prefix', 'ps1', 'ps2', 'setcheckinterval',
'setprofile', 'setrecursionlimit', 'settrace', 'stderr', 'stdin', 'stdout',
'version', 'version_info', 'warnoptions', 'winver']
The dir function simply returns a list containing the string names of all the attributes in any object with attributes; it’s a handy memory jogger for modules at the interactive prompt. For example, we know there is something called sys.version, because the name version came back in the dir result. If that’s not enough, we can always consult the __doc__ string of built-in modules:
>>> sys.__doc__
"This module provides access to some objects used or maintained by the\ninterpreter
and to functions that interact strongly with the interpreter.\n\nDynamic
objects:\n\nargv -- command line arguments; argv[0] is the script pathname if
known\npath -- module search path; path[0] is the script directory, else ''\nmodules
...
...lots of text deleted here...
...
"
The __doc__ built-in attribute usually contains a string of documentation, but it may look a bit weird when displayed this way: it’s one long string with embedded end-line characters that print as \n, not as a nice list of lines. To format these strings for a more humane display, you can simply use a print statement:
>>> print sys.__doc__
This module provides access to some objects used or maintained by the
interpreter and to functions that interact strongly with the interpreter.

Dynamic objects:

argv -- command line arguments; argv[0] is the script pathname if known
...
...lots of lines deleted here...
...
The print statement, unlike interactive displays, interprets end-line characters correctly. Unfortunately, print doesn’t, by itself, do anything about scrolling or paging and so can still be unwieldy on some platforms. Tools such as the built-in help function can do better:
>>> help(sys)
Help on built-in module sys:

NAME
    sys

FILE
    (built-in)

MODULE DOCS
    http://www.python.org/doc/current/lib/module-sys.html

DESCRIPTION
    This module provides access to some objects used or maintained by the
    interpreter and to functions that interact strongly with the interpreter.

    Dynamic objects:

    argv -- command line arguments; argv[0] is the script pathname if known
...
...lots of lines deleted here...
...
The help function is one interface provided by the PyDoc system, code that ships with Python and renders documentation (documentation strings, as well as structural details) related to an object in a formatted way. The format is either like a Unix manpage, which we get for help, or an HTML page, which is more grandiose. It’s a handy way to get basic information when working interactively, and it’s a last resort before falling back on manuals and books. It is also fairly fixed in the way it displays information; although it attempts to page the display in some contexts, its page size isn’t quite right on some of the machines I use. When I want more control over the way help text is printed, I usually use a utility script of my own, like the one in Example 3-1.
Example 3-1. PP3E\System\more.py
#########################################################
# split and interactively page a string or file of text;
#########################################################

def more(text, numlines=15):
    lines = text.split('\n')
    while lines:
        chunk = lines[:numlines]
        lines = lines[numlines:]
        for line in chunk: print line
        if lines and raw_input('More?') not in ['y', 'Y']: break

if __name__ == '__main__':
    import sys                               # when run, not imported
    more(open(sys.argv[1]).read(), 10)       # page contents of file on cmdline
The meat of this file is its more function, and if you know any Python at all, it should be fairly straightforward. It simply splits up a string around end-line characters, and then slices off and displays a few lines at a time (15 by default) to avoid scrolling off the screen. A slice expression, lines[:15], gets the first 15 items in a list, and lines[15:] gets the rest; to show a different number of lines each time, pass a number to the numlines argument (e.g., the last line in Example 3-1 passes 10 to the numlines argument of the more function).
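For readers on Python 3, here is one possible rendering of the same slicing idea, offered as a sketch rather than as the book's listing. It assumes print and input are functions, moves the slicing into a hypothetical pages generator, and adds a hypothetical ask parameter so the prompt can be replaced by a canned reply when experimenting.

```python
def pages(text, numlines=15):
    """Yield lists of at most numlines lines, in order."""
    lines = text.split('\n')
    while lines:
        yield lines[:numlines]          # the first numlines items
        lines = lines[numlines:]        # everything after them

def more(text, numlines=15, ask=input):
    """Print text one page at a time, asking before each further page."""
    chunks = list(pages(text, numlines))
    for index, chunk in enumerate(chunks):
        for line in chunk:
            print(line)
        is_last = index == len(chunks) - 1
        if not is_last and ask('More?') not in ('y', 'Y'):
            break

# Demo with a canned reply so the example does not wait for the keyboard.
sample = '\n'.join('line %d' % i for i in range(1, 8))
more(sample, numlines=3, ask=lambda prompt: 'y')
```

Keeping the slicing in its own function makes the paging logic easy to check without any interaction.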
The split string object method call that this script employs returns a list of substrings (e.g., ["line", "line",...]). In recent Python releases, a new splitlines method does similar work:
>>> line = 'aaa\nbbb\nccc\n'
>>> line.split('\n')
['aaa', 'bbb', 'ccc', '']
>>> line.splitlines()
['aaa', 'bbb', 'ccc']
As we’ll see in the next chapter, the end-of-line character is always \n (which stands for a byte having a binary value of 10) within a Python script, no matter what platform it is run upon. (If you don’t already know why this matters, DOS \r characters are dropped when read.)
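The difference between the two calls, and the effect of a leftover DOS \r, can be seen in a short Python 3 sketch. The dos_text value is a made-up literal standing in for text whose \r characters were not dropped on input.

```python
text = 'aaa\nbbb\nccc\n'
print(text.split('\n'))          # ['aaa', 'bbb', 'ccc', '']
print(text.splitlines())         # ['aaa', 'bbb', 'ccc']

# A DOS-style string: each line ends with \r\n instead of \n alone.
dos_text = 'aaa\r\nbbb\r\n'
print(dos_text.split('\n'))      # ['aaa\r', 'bbb\r', '']
print(dos_text.splitlines())     # ['aaa', 'bbb']

print(ord('\n'))                 # 10
```

Note that split leaves an empty string after a trailing end-line, while splitlines does not.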
Now, this is a simple Python program, but it already brings up three important topics that merit quick detours here: it uses string methods, reads from a file, and is set up to be run or imported. Python string methods are not a system-related tool per se, but they see action in most Python programs. In fact, they are going to show up throughout this chapter as well as those that follow, so here is a quick review of some of the more useful tools in this set. String methods include calls for searching and replacing:
>>> str = 'xxxSPAMxxx'
>>> str.find('SPAM')                     # return first offset
3
>>> str = 'xxaaxxaa'
>>> str.replace('aa', 'SPAM')            # global replacement
'xxSPAMxxSPAM'
>>> str = ' Ni '
>>> str.strip()                          # remove whitespace
'Ni'
The find call returns the offset of the first occurrence of a substring, and replace does global search and replacement. Like all string operations, replace returns a new string instead of changing its subject in-place (recall that strings are immutable). With these methods, substrings are just strings; in Chapter 21, we’ll also meet a module called re that allows regular expression patterns to show up in searches and replacements.
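A brief Python 3 sketch makes the immutability point concrete; the variable names are chosen for illustration only. It also shows what find returns when the substring is absent.

```python
text = 'xxxSPAMxxx'

offset = text.find('SPAM')               # offset of the first occurrence
missing = text.find('Ni')                # -1 when the substring is not found
changed = text.replace('SPAM', 'eggs')   # builds and returns a new string

print(offset, missing)                   # 3 -1
print(changed)                           # xxxeggsxxx
print(text)                              # xxxSPAMxxx (the original is unchanged)
```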
String methods also provide functions that are useful for things such as case conversions, and a standard library module named string defines some useful preset variables, among other things:
>>> str = 'SHRUBBERY'
>>> str.lower()                          # case converters
'shrubbery'
>>> str.isalpha()                        # content tests
True
>>> str.isdigit()
False
>>> import string                        # case constants
>>> string.lowercase
'abcdefghijklmnopqrstuvwxyz'
There are also methods for splitting up strings around a substring delimiter and putting them back together with a substring in between. We’ll explore these tools later in this book, but as an introduction, here they are at work:
>>> str = 'aaa,bbb,ccc'
>>> str.split(',')                       # split into substrings list
['aaa', 'bbb', 'ccc']
>>> str = 'a b c d'
>>> str.split()                          # default delimiter: whitespace
['a', 'b', 'c', 'd']
>>> delim = 'NI'
>>> delim.join(['aaa', 'bbb', 'ccc'])    # join substrings list
'aaaNIbbbNIccc'
>>> ' '.join(['A', 'dead', 'parrot'])    # add a space between
'A dead parrot'
>>> chars = list('Lorreta')              # convert to characters list
>>> chars
['L', 'o', 'r', 'r', 'e', 't', 'a']
>>> chars.append('!')
>>> ''.join(chars)                       # to string: empty delimiter
'Lorreta!'
These calls turn out to be surprisingly powerful. For example, a line of data columns separated by tabs can be parsed into its columns with a single split call; the more.py script uses it to split a string into a list of line strings. In fact, we can emulate the replace call we saw earlier in this section with a split/join combination:
>>> str = 'xxaaxxaa'
>>> 'SPAM'.join(str.split('aa'))         # replace, the hard way
'xxSPAMxxSPAM'
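The tab-separated case mentioned above might look like the following in Python 3. The record contents are invented for the example, and the update step is just one way to show split and join working as a pair.

```python
record = 'Bob\t42\tdeveloper'            # made-up line of tab-separated columns

columns = record.split('\t')             # one call parses the line
print(columns)                           # ['Bob', '42', 'developer']

columns[1] = str(int(columns[1]) + 1)    # change one field
rebuilt = '\t'.join(columns)             # put the line back together
print(repr(rebuilt))                     # 'Bob\t43\tdeveloper'
```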
For future reference, also keep in mind that Python doesn’t automatically convert strings to numbers, or vice versa; if you want to use one as you would use the other, you must say so with manual conversions:
>>> int("42"), eval("42")                   # string to int conversions
(42, 42)
>>> str(42), repr(42), ("%d" % 42)          # int to string conversions
('42', '42', '42')
>>> "42" + str(1), int("42") + 1            # concatenation, addition
('421', 43)
In the last command here, the first expression triggers string concatenation (since both sides are strings), and the second invokes integer addition (because both objects are numbers). Python doesn’t assume you meant one or the other and convert automatically; as a rule of thumb, Python tries to avoid magic whenever possible. String tools will be covered in more detail later in this book (in fact, they get a full chapter in Part V), but be sure to also see the library manual for additional string method tools.
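To see what "no automatic conversion" means in practice, the Python 3 sketch below mixes a string and a number without converting, and then attempts to convert a string that does not hold a number. The variable names are illustrative.

```python
# Mixing a string and an integer without converting raises TypeError.
try:
    mixed = "42" + 1
except TypeError as error:
    mixed = None
    print('mixing types failed:', error)

# Converting a string that is not a number raises ValueError.
try:
    number = int("spam")
except ValueError as error:
    number = None
    print('conversion failed:', error)

print("42" + str(1))       # 421  (string concatenation)
print(int("42") + 1)       # 43   (integer addition)
```

Python reports the mismatch instead of guessing, so the conversion you intend has to be spelled out.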
A section on the original string module was removed in this edition. In the past, string method calls were also available by importing the string module and passing the string object as an argument to functions corresponding to the current methods. For instance, given a name str assigned to a string object, the older call form:
import string
string.replace(str, old, new)            # requires an import
is the same as the more modern version:
str.replace(old, new)
But the latter form does not require a module import, and it will run quicker (the older module call form incurs an extra call along the way). You should use string object methods today, not string module functions, but you may still see the older function-based call pattern in some Python code. Although most of its functions are now deprecated, the original string module today still contains predefined constants (such as string.lowercase) and a new template interface in 2.4.
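The constants and the template interface can be tried with a few lines. This sketch targets Python 3, where the constant is spelled string.ascii_lowercase (string.lowercase is not available there); the template text and substitution values are invented for the example.

```python
import string

print(string.ascii_lowercase)            # abcdefghijklmnopqrstuvwxyz
print(string.digits)                     # 0123456789

# Template objects substitute $name placeholders from keyword arguments.
template = string.Template('$count files in $folder')
message = template.substitute(count=3, folder='System')
print(message)                           # 3 files in System
```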
The more.py script also opens the external file whose name is listed on the command line using the built-in open function, and reads that file’s text into memory all at once with the file object read method. Since file objects returned by open are part of the core Python language itself, I assume that you have at least a passing familiarity with them at this point in the text. But just in case you’ve flipped to this chapter early on in your Pythonhood, the calls:
open('file').read()         # read entire file into string
open('file').read(N)        # read next N bytes into string
open('file').readlines()    # read entire file into line strings list
open('file').readline()     # read next line, through '\n'
load a file’s contents into a string, load a fixed-size set of bytes into a string, load a file’s contents into a list of line strings, and load the next line in the file into a string, respectively. As we’ll see in a moment, these calls can also be applied to shell commands in Python to read their output. File objects also have write methods for sending strings to the associated file. File-related topics are covered in depth in the next chapter, but making an output file and reading it back is easy in Python:
>>> file = open('spam.txt', 'w')         # create file spam.txt
>>> file.write(('spam' * 5) + '\n')
>>> file.close()

>>> file = open('spam.txt')              # or open('spam.txt').read()
>>> text = file.read()
>>> text
'spamspamspamspamspam\n'
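The four read calls listed earlier can be compared side by side on one small file. The following is a Python 3 sketch that assumes a scratch directory from tempfile and uses with blocks to close files automatically; neither appears in the original listings, and the file contents are made up.

```python
import os
import tempfile

folder = tempfile.mkdtemp()                  # scratch directory for the demo
path = os.path.join(folder, 'spam.txt')

with open(path, 'w') as output:
    output.write('line1\nline2\nline3\n')

with open(path) as source:
    whole = source.read()                    # entire file as one string

with open(path) as source:
    first = source.read(4)                   # next 4 characters: 'line'
    rest = source.readline()                 # remainder of that line: '1\n'
    others = source.readlines()              # remaining lines as a list

print(repr(whole))
print(repr(first), repr(rest), others)

os.remove(path)                              # clean up the scratch file
os.rmdir(folder)
```

Each call picks up where the previous one on the same file object left off, which is why readlines returns only the last two lines here.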
The last few lines in the more.py file also introduce one of the first big concepts in shell tool programming. They instrument the file to be used in either of two ways: as a script or as a library. Every Python module has a built-in __name__ variable that Python sets to the __main__ string only when the file is run as a program, not when it’s imported as a library. Because of that, the more function in this file is executed automatically by the last line in the file when this script is run as a top-level program, not when it is imported elsewhere. This simple trick turns out to be one key to writing reusable script code: by coding program logic as functions rather than as top-level code, you can also import and reuse it in other scripts.
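Stripped to its essentials, the pattern looks like the Python 3 sketch below. The count_words function is a hypothetical stand-in for any program logic worth reusing.

```python
def count_words(text):
    """Return the number of whitespace-separated words in text."""
    return len(text.split())

if __name__ == '__main__':
    # Reached only when this file is run as a program, not when imported.
    print(count_words('A dead parrot'))      # 3
```

Another script can import this file and call count_words without triggering the self-test code at the bottom.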
The upshot is that we can run more.py by itself or import and call its more function elsewhere. When running the file as a top-level program, we list on the command line the name of a file to be read and paged: as I’ll describe in depth later in this chapter, words typed in the command that is used to start a program show up in the built-in sys.argv list in Python. For example, here is the script file in action, paging itself (be sure to type this command line in your PP3E\System directory, or it won’t find the input file; more on command lines later):
C:\...\PP3E\System>python more.py more.py
#########################################################
# split and interactively page a string or file of text;
#########################################################

def more(text, numlines=15):
    lines = text.split('\n')
    while lines:
        chunk = lines[:numlines]
        lines = lines[numlines:]
        for line in chunk: print line
More?y
        if lines and raw_input('More?') not in ['y', 'Y']: break

if __name__ == '__main__':
    import sys                               # when run, not imported
    more(open(sys.argv[1]).read(), 10)       # page contents of file on cmdline
When the more.py file is imported, we pass an explicit string to its more function, and this is exactly the sort of utility we need for documentation text. Running this utility on the sys module’s documentation string gives us a bit more information in human-readable form about what’s available to scripts:
C:\...\PP3E\System>python
>>> from more import more
>>> import sys
>>> more(sys.__doc__)
This module provides access to some objects used or maintained by the
interpreter and to functions that interact strongly with the interpreter.

Dynamic objects:

argv -- command line arguments; argv[0] is the script pathname if known
path -- module search path; path[0] is the script directory, else ''
modules -- dictionary of loaded modules

displayhook -- called to show results in an interactive session
excepthook -- called to handle any uncaught exception other than SystemExit
  To customize printing in an interactive session or to install a custom
  top-level exception handler, assign other functions to replace these.

exitfunc -- if sys.exitfunc exists, this routine is called when Python exits
More?
Pressing “y” or “Y” here makes the function display the next few lines of documentation, and then prompt again, unless you’ve run past the end of the lines list. Try this on your own machine to see what the rest of the module’s documentation string looks like.
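The dir and __doc__ tools used in this section can also be combined in a loop to survey a module at a glance. The following Python 3 sketch is one way to do it; summarize and its limit parameter are hypothetical names introduced here, and the exact rows printed depend on the Python version in use.

```python
import sys

def summarize(module, limit=5):
    """Return (name, first docstring line) pairs for public callables."""
    rows = []
    for name in sorted(dir(module)):
        if name.startswith('_'):
            continue                              # skip private and __x__ names
        obj = getattr(module, name)
        doc = (getattr(obj, '__doc__', None) or '').strip()
        if callable(obj) and doc:
            rows.append((name, doc.splitlines()[0]))
    return rows[:limit]

for name, summary in summarize(sys):
    print('%-20s %s' % (name, summary))
```

Passing a larger limit, or a different module such as os, gives a quick index of what is worth looking up in more detail.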
If that still isn’t enough detail, your next step is to read the Python library manual’s entry for sys to get the full story. All of Python’s standard manuals ship as HTML pages, so you should be able to read them in any web browser you have on your computer. They are installed with Python on Windows, but here are a few simple pointers:
On Windows, click the Start button, pick Programs, select the Python entry there, and then choose the manuals item. The manuals should magically appear on your display within a browser like Internet Explorer. As of Python 2.4, the manuals are provided as a Windows help file and so support searching and navigation.
On Linux, you may be able to click on the manuals’ entries in a file explorer, or start your browser from a shell command line and navigate to the library manual’s HTML files on your machine.
If you can’t find the manuals on your computer, you can always read them online. Go to Python’s web site at http://www.python.org and follow the documentation links.
However you get started, be sure to pick the Library manual for things such as sys; Python’s standard manual set also includes a short tutorial, language reference, extending references, and more.
At the risk of sounding like a marketing droid, I should mention that you can also purchase the Python manual set, printed and bound; see the book information page at http://www.python.org for details and links. Commercially published Python reference books are also available today, including Python Essential Reference (Sams) and Python Pocket Reference (O’Reilly). The former is more complete and comes with examples, but the latter serves as a convenient memory jogger once you’ve taken a library tour or two.[*] Also useful are O’Reilly’s Python in a Nutshell and Python Standard Library.
[*] They may also work their way into your subconscious. Python newcomers sometimes appear on Internet discussion forums to discuss their experiences “dreaming in Python” for the first time.
[*] I also wrote the latter as a replacement for the reference appendix that appeared in the first edition of this book; it’s meant to be a supplement to the text you’re reading. Insert self-serving plug here.