One of the more common tasks in the shell utilities domain is applying an operation to a set of files in a directory -- a “folder” in Windows-speak. By running a script on a batch of files, we can automate (that is, script) tasks we might have to otherwise run repeatedly by hand.
For instance, suppose you need to search all of your Python files in a development directory for a global variable name (perhaps you’ve forgotten where it is used). There are many platform-specific ways to do this (e.g., the grep command in Unix), but Python scripts that accomplish such tasks will work on every platform where Python works -- Windows, Unix, Linux, Macintosh, and just about any other in common use today. Simply copy your script to any machine you wish to use it on, and it will work, regardless of which other tools are available there.
The most common way to go about writing such tools is to first grab hold of a list of the names of the files you wish to process, and then step through that list with a Python for loop, processing each file in turn. The trick we need to learn here, then, is how to get such a directory list within our scripts. There are at least three options: running shell listing commands with os.popen, matching filename patterns with glob.glob, and getting directory listings with os.listdir. They vary in interface, result format, and portability.
Quick: How did you go about getting directory file listings before you heard of Python? If you’re new to shell tools programming, the answer may be: “Well, I started a Windows file explorer and clicked on stuff,” but I’m thinking in terms of less GUI-oriented command-line mechanisms here (and answers submitted in Perl and Tcl only get partial credit).
On Unix, directory listings are usually obtained by typing ls in a shell; on Windows, they can be generated with a dir command typed in an MS-DOS console box. Because Python scripts may use os.popen to run any command line we can type in a shell, such commands are also the most general way to grab a directory listing inside a Python program. We met os.popen earlier in this chapter; it runs a shell command string and gives us a file object from which we can read the command’s output. To illustrate, let’s first assume the following directory structures (yes, I have both dir and ls commands on my Windows laptop; old habits die hard):
C:\temp>dir /B
about-pp.html
python1.5.tar.gz
about-pp2e.html
about-ppr2e.html
newdir

C:\temp>ls
about-pp.html     about-ppr2e.html  python1.5.tar.gz
about-pp2e.html   newdir

C:\temp>ls newdir
more   temp1  temp2  temp3
The newdir name is a nested subdirectory in C:\temp here. Now, scripts can grab a listing of file and directory names at this level by simply spawning the appropriate platform-specific command line and reading its output (the text normally printed to the console window):
C:\temp>python
>>> import os
>>> os.popen('dir /B').readlines( )
['about-pp.html\012', 'python1.5.tar.gz\012', 'about-pp2e.html\012', 'about-ppr2e.html\012', 'newdir\012']
Lines read from a shell command come back with a trailing end-line character, but it’s easy enough to slice off:
>>> for line in os.popen('dir /B').readlines( ):
...     print line[:-1]
...
about-pp.html
python1.5.tar.gz
about-pp2e.html
about-ppr2e.html
newdir
Both dir and ls commands let us be specific about filename patterns to be matched and directory names to be listed; again, we’re just running shell commands here, so anything you can type at a shell prompt goes:
>>> os.popen('dir *.html /B').readlines( )
['about-pp.html\012', 'about-pp2e.html\012', 'about-ppr2e.html\012']

>>> os.popen('ls *.html').readlines( )
['about-pp.html\012', 'about-pp2e.html\012', 'about-ppr2e.html\012']

>>> os.popen('dir newdir /B').readlines( )
['temp1\012', 'temp2\012', 'temp3\012', 'more\012']

>>> os.popen('ls newdir').readlines( )
['more\012', 'temp1\012', 'temp2\012', 'temp3\012']
These calls use general tools and all work as advertised. As we noted earlier, though, the downsides of os.popen are that it is nonportable (it doesn’t work well in a Windows GUI application in Python 1.5.2 and earlier, and requires using a platform-specific shell command), and it incurs a performance hit to start up an independent program. The following two alternative techniques do better on both counts.
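As a minimal sketch of the os.popen technique, here is a small helper written in present-day Python 3 syntax (an assumption; the interactive sessions in this section use the older print statement). The name shell_listing is hypothetical, and the command string passed in remains platform-specific, which is exactly the portability cost just described.

```python
import os

def shell_listing(command):
    """Run a shell command; return its output lines without line ends.

    Hypothetical helper for illustration only: the command string is
    still platform-specific ('ls' on Unix, 'dir /B' on Windows).
    """
    pipe = os.popen(command)        # file-like object tied to the command's output
    try:
        return [line.rstrip('\r\n') for line in pipe]
    finally:
        pipe.close()                # waits for the spawned command to exit
```

A call such as shell_listing('ls') on Unix or shell_listing('dir /B') on Windows would return the names as a plain list, ready for a for loop.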
The term “globbing” comes from the * wildcard character in filename patterns -- per computing folklore, a * matches a “glob” of characters. In less poetic terms, globbing simply means collecting the names of all entries in a directory -- files and subdirectories -- whose names match a given filename pattern. In Unix shells, globbing expands filename patterns within a command line into all matching filenames before the command is ever run. In Python, we can do something similar by calling the glob.glob built-in with a pattern to expand:
>>> import glob
>>> glob.glob('*')
['about-pp.html', 'python1.5.tar.gz', 'about-pp2e.html', 'about-ppr2e.html', 'newdir']

>>> glob.glob('*.html')
['about-pp.html', 'about-pp2e.html', 'about-ppr2e.html']

>>> glob.glob('newdir/*')
['newdir\\temp1', 'newdir\\temp2', 'newdir\\temp3', 'newdir\\more']
The glob call accepts the usual filename pattern syntax used in shells (e.g., ? means any one character, * means any number of characters, and [] is a character selection set).[20] The pattern should include a directory path if you wish to glob in something other than the current working directory, and the module accepts either Unix or DOS-style directory separators (/ or \). This call also is implemented without spawning a shell command, and so is likely to be faster and more portable across all Python platforms than the os.popen schemes shown earlier.
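To see the three pattern operators side by side, the following sketch builds a throwaway directory and globs it. It is written for Python 3 (an assumption; tempfile.TemporaryDirectory and glob.escape are not available in the Python release used elsewhere in this section), and the filenames and the function name demo_patterns are made up for illustration.

```python
import glob
import os
import tempfile

def demo_patterns():
    """Create a few empty files in a scratch directory and glob them."""
    with tempfile.TemporaryDirectory() as scratch:
        for name in ('a1.txt', 'a2.txt', 'b1.txt', 'notes.html'):
            open(os.path.join(scratch, name), 'w').close()

        def match(pattern):
            # escape the directory part so only `pattern` is treated as a pattern
            paths = glob.glob(os.path.join(glob.escape(scratch), pattern))
            return sorted(os.path.basename(p) for p in paths)

        return {
            '*.txt':     match('*.txt'),      # * matches any run of characters
            '?1.txt':    match('?1.txt'),     # ? matches exactly one character
            '[ab]2.txt': match('[ab]2.txt'),  # [] matches one character from a set
        }
```

The results are sorted here only to make them predictable; glob itself makes no promise about result order.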
Technically speaking, glob is a bit more powerful than described so far. In fact, using it to list files in one directory is just one use of its pattern-matching skills. For instance, it can also be used to collect matching names across multiple directories, simply because each level in a passed-in directory path can be a pattern too:
C:\temp>python
>>> import glob
>>> for name in glob.glob('*examples/L*.py'): print name
...
cpexamples\Launcher.py
cpexamples\Launch_PyGadgets.py
cpexamples\LaunchBrowser.py
cpexamples\launchmodes.py
examples\Launcher.py
examples\Launch_PyGadgets.py
examples\LaunchBrowser.py
examples\launchmodes.py

>>> for name in glob.glob(r'*\*\visitor_find*.py'): print name
...
cpexamples\PyTools\visitor_find.py
cpexamples\PyTools\visitor_find_quiet2.py
cpexamples\PyTools\visitor_find_quiet1.py
examples\PyTools\visitor_find.py
examples\PyTools\visitor_find_quiet2.py
examples\PyTools\visitor_find_quiet1.py
In the first call here, we get back filenames from two different directories that matched the *examples pattern; in the second, both of the first directory levels are wildcards, so Python collects all possible ways to reach the base filenames. Using os.popen to spawn shell commands only achieves the same effect if the underlying shell or listing command does too.
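The directory-level wildcard can be tried without the book's example tree. The sketch below (Python 3 syntax, an assumption) creates three hypothetical directories in a scratch area and matches only those whose names end in "examples"; the function name demo_multi_dir and all file and directory names are invented for the demonstration.

```python
import glob
import os
import tempfile

def demo_multi_dir():
    """Glob with a wildcard at the directory level as well as the file level."""
    with tempfile.TemporaryDirectory() as root:
        for dirname in ('examples', 'cpexamples', 'docs'):
            os.mkdir(os.path.join(root, dirname))
            for fname in ('Launcher.py', 'other.py'):
                open(os.path.join(root, dirname, fname), 'w').close()

        # '*examples' is expanded first, then 'L*.py' inside each match
        pattern = os.path.join(glob.escape(root), '*examples', 'L*.py')
        found = glob.glob(pattern)
        return sorted(os.path.relpath(path, root) for path in found)
```

The docs directory is skipped because its name does not match the first pattern level, and other.py is skipped in every directory because it does not match the second.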
The os module’s listdir call provides yet another way to collect filenames in a Python list. It takes a simple directory name string, not a filename pattern, and returns a list containing the names of all entries in that directory -- both simple files and nested directories -- for use in the calling script:
>>> os.listdir('.')
['about-pp.html', 'python1.5.tar.gz', 'about-pp2e.html', 'about-ppr2e.html', 'newdir']

>>> os.listdir(os.curdir)
['about-pp.html', 'python1.5.tar.gz', 'about-pp2e.html', 'about-ppr2e.html', 'newdir']

>>> os.listdir('newdir')
['temp1', 'temp2', 'temp3', 'more']
This too is done without resorting to shell commands, and so is portable to all major Python platforms. The result is in no particular order (but can be sorted with the list sort method), contains base filenames without their directory path prefixes, and includes names of both files and directories at the listed level.
To compare all three listing techniques, let’s run them side by side on an explicit directory here. They differ in some ways but are mostly just variations on a theme -- os.popen sorts names and returns end-of-lines, glob.glob accepts a pattern and returns filenames with directory prefixes, and os.listdir takes a simple directory name and returns names without directory prefixes:
>>> os.popen('ls C:\PP2ndEd').readlines( )
['README.txt\012', 'cdrom\012', 'chapters\012', 'etc\012', 'examples\012', 'examples.tar.gz\012', 'figures\012', 'shots\012']

>>> glob.glob('C:\PP2ndEd\*')
['C:\\PP2ndEd\\examples.tar.gz', 'C:\\PP2ndEd\\README.txt', 'C:\\PP2ndEd\\shots', 'C:\\PP2ndEd\\figures', 'C:\\PP2ndEd\\examples', 'C:\\PP2ndEd\\etc', 'C:\\PP2ndEd\\chapters', 'C:\\PP2ndEd\\cdrom']

>>> os.listdir('C:\PP2ndEd')
['examples.tar.gz', 'README.txt', 'shots', 'figures', 'examples', 'etc', 'chapters', 'cdrom']
Of these three, glob and listdir are generally better options if you care about script portability, and listdir seems fastest in recent Python releases (but gauge its performance yourself -- implementations may change over time).
In the last example, I pointed out that glob returns names with directory paths, but listdir gives raw base filenames. For convenient processing, scripts often need to split glob results into base files, or expand listdir results into full paths. Such translations are easy if we let the os.path module do all the work for us. For example, a script that intends to copy all files elsewhere will typically need to first split off the base filenames from glob results so it can add different directory names on the front:
>>> dirname = r'C:\PP2ndEd'
>>> for file in glob.glob(dirname + '/*'):
...     head, tail = os.path.split(file)
...     print head, tail, '=>', ('C:\\Other\\' + tail)
...
C:\PP2ndEd examples.tar.gz => C:\Other\examples.tar.gz
C:\PP2ndEd README.txt => C:\Other\README.txt
C:\PP2ndEd shots => C:\Other\shots
C:\PP2ndEd figures => C:\Other\figures
C:\PP2ndEd examples => C:\Other\examples
C:\PP2ndEd etc => C:\Other\etc
C:\PP2ndEd chapters => C:\Other\chapters
C:\PP2ndEd cdrom => C:\Other\cdrom
Here, the names after the => represent names that files might be moved to. Conversely, a script that means to process all files in a different directory than the one it runs in will probably need to prepend listdir results with the target directory name, before passing filenames on to other tools:
>>> for file in os.listdir(dirname):
...     print os.path.join(dirname, file)
...
C:\PP2ndEd\examples.tar.gz
C:\PP2ndEd\README.txt
C:\PP2ndEd\shots
C:\PP2ndEd\figures
C:\PP2ndEd\examples
C:\PP2ndEd\etc
C:\PP2ndEd\chapters
C:\PP2ndEd\cdrom
Notice, though, that all of the preceding techniques only return the names of files in a single directory. What if you want to apply an operation to every file in every directory and subdirectory in a directory tree?
For instance, suppose again that we need to find every occurrence of a global name in our Python scripts. This time, though, our scripts are arranged into a module package: a directory with nested subdirectories, which may have subdirectories of their own. We could rerun our hypothetical single-directory searcher in every directory in the tree manually, but that’s tedious, error-prone, and just plain no fun.
Luckily, in Python it’s almost as easy to process a directory tree as it is to inspect a single directory. We can either collect names ahead of time with the find module, write a recursive routine to traverse the tree, or use a tree-walker utility built into the os module. Such tools can be used to search, copy, compare, and otherwise process arbitrary directory trees on any platform that Python runs on (and that’s just about everywhere).
The first way to go hierarchical is to collect a list of all names in a directory tree ahead of time, and step through that list in a loop. Like the single-directory tools we just met, a call to the find.find built-in returns a list of both file and directory names. Unlike the tools described earlier, find.find also returns pathnames of matching files nested in subdirectories, all the way to the bottom of a tree:
C:\temp>python
>>> import find
>>> find.find('*')
['.\\about-pp.html', '.\\about-pp2e.html', '.\\about-ppr2e.html', '.\\newdir', '.\\newdir\\more', '.\\newdir\\more\\xxx.txt', '.\\newdir\\more\\yyy.txt', '.\\newdir\\temp1', '.\\newdir\\temp2', '.\\newdir\\temp3', '.\\python1.5.tar.gz']

>>> for line in find.find('*'): print line
...
.\about-pp.html
.\about-pp2e.html
.\about-ppr2e.html
.\newdir
.\newdir\more
.\newdir\more\xxx.txt
.\newdir\more\yyy.txt
.\newdir\temp1
.\newdir\temp2
.\newdir\temp3
.\python1.5.tar.gz
We get back a list of full pathnames, each of which includes the top-level directory’s path. By default, find collects names matching the passed-in pattern in the tree rooted at the current working directory, known as “.”. If we want a more specific list, we can pass in both a filename pattern and a directory tree root to start at; here’s how to collect HTML filenames at “.” and below:
>>> find.find('*.html', '.')
['.\\about-pp.html', '.\\about-pp2e.html', '.\\about-ppr2e.html']
Incidentally, find.find is also the Python library’s equivalent to platform-specific shell commands such as a find -print on Unix and Linux, and dir /B /S on DOS and Windows. Since we can usually run such shell commands in a Python script with os.popen, the following does the same work as find.find, but is inherently nonportable, and must start up a separate program along the way:
>>> import os
>>> for line in os.popen('dir /B /S').readlines( ): print line,
...
C:\temp\about-pp.html
C:\temp\python1.5.tar.gz
C:\temp\about-pp2e.html
C:\temp\about-ppr2e.html
C:\temp\newdir
C:\temp\newdir\temp1
C:\temp\newdir\temp2
C:\temp\newdir\temp3
C:\temp\newdir\more
C:\temp\newdir\more\xxx.txt
C:\temp\newdir\more\yyy.txt
If the find calls don’t seem to work in your Python, try changing the import statement used to load the module from import find to from PP2E.PyTools import find. Alas, the Python standard library’s find module has been marked as “deprecated” as of Python 1.6. That means it may be deleted from the standard Python distribution in the future, so pay attention to the next section; we’ll use its topic later to write our own find module -- one that is also shipped on this book’s CD (see http://examples.oreilly.com/python2).
To make it easy to apply an operation to all files in a tree, Python also comes with a utility that scans trees for us, and runs a provided function at every directory along the way. The os.path.walk function is called with a directory root, function object, and optional data item, and walks the tree at the directory root and below. At each directory, the function object passed in is called with the optional data item, the name of the current directory, and a list of filenames in that directory (obtained from os.listdir). Typically, the function we provide scans the filenames list to process files at each directory level in the tree.
That description might sound horribly complex the first time you hear it, but os.path.walk is fairly straightforward once you get the hang of it. In the following code, for example, the lister function is called from os.path.walk at each directory in the tree rooted at “.”. Along the way, lister simply prints the directory name, and all the files at the current level (after prepending the directory name). It’s simpler in Python than in English:
>>> import os
>>> def lister(dummy, dirname, filesindir):
...     print '[' + dirname + ']'
...     for fname in filesindir:
...         print os.path.join(dirname, fname)         # handle one file
...
>>> os.path.walk('.', lister, None)
[.]
.\about-pp.html
.\python1.5.tar.gz
.\about-pp2e.html
.\about-ppr2e.html
.\newdir
[.\newdir]
.\newdir\temp1
.\newdir\temp2
.\newdir\temp3
.\newdir\more
[.\newdir\more]
.\newdir\more\xxx.txt
.\newdir\more\yyy.txt
In other words, we’ve coded our own custom and easily changed recursive directory listing tool in Python. Because this may be something we would like to tweak and reuse elsewhere, let’s make it permanently available in a module file, shown in Example 2-15, now that we’ve worked out the details interactively.
Example 2-15. PP2E\System\Filetools\lister_walk.py
# list file tree with os.path.walk
import sys, os

def lister(dummy, dirName, filesInDir):              # called at each dir
    print '[' + dirName + ']'
    for fname in filesInDir:                         # includes subdir names
        path = os.path.join(dirName, fname)          # add dir name prefix
        if not os.path.isdir(path):                  # print simple files only
            print path

if __name__ == '__main__':
    os.path.walk(sys.argv[1], lister, None)          # dir name in cmdline
This is the same code, except that directory names are filtered out of the filenames list by consulting the os.path.isdir test, to avoid listing them twice (see -- it’s been tweaked already). When packaged this way, the code can also be run from a shell command line. Here it is being launched from a different directory, with the directory to be listed passed in as a command-line argument:
C:\...\PP2E\System\Filetools>python lister_walk.py C:\Temp
[C:\Temp]
C:\Temp\about-pp.html
C:\Temp\python1.5.tar.gz
C:\Temp\about-pp2e.html
C:\Temp\about-ppr2e.html
[C:\Temp\newdir]
C:\Temp\newdir\temp1
C:\Temp\newdir\temp2
C:\Temp\newdir\temp3
[C:\Temp\newdir\more]
C:\Temp\newdir\more\xxx.txt
C:\Temp\newdir\more\yyy.txt
The walk paradigm also allows functions to tailor the set of directories visited by changing the file list argument in place. The library manual documents this further, but it’s probably more instructive to simply know what walk truly looks like. Here is its actual Python-coded implementation for Windows platforms, with comments added to help demystify its operation:
def walk(top, func, arg):                 # top is the current dirname
    try:
        names = os.listdir(top)           # get all file/dir names here
    except os.error:                      # they have no path prefix
        return
    func(arg, top, names)                 # run func with names list here
    exceptions = ('.', '..')
    for name in names:                    # step over the very same list
        if name not in exceptions:        # but skip self/parent names
            name = join(top, name)        # add path prefix to name
            if isdir(name):
                walk(name, func, arg)     # descend into subdirs here
Notice that walk generates filename lists at each level with os.listdir, a call that collects both file and directory names in no particular order, and returns them without their directory paths. Also note that walk uses the very same list returned by os.listdir and passed to the function you provide, to later descend into subdirectories (variable names). Because lists are mutable objects that can be changed in place, if your function modifies the passed-in filenames list, it will impact what walk does next. For example, deleting directory names will prune traversal branches, and sorting the list will order the walk.
The os.path.walk tool does tree traversals for us, but it’s sometimes more flexible, and hardly any more work, to do it ourselves. The following script recodes the directory listing script with a manual recursive traversal function. The mylister function in Example 2-16 is almost the same as lister in the prior script, but calls os.listdir to generate file paths manually, and calls itself recursively to descend into subdirectories.
Example 2-16. PP2E\System\Filetools\lister_recur.py
# list files in dir tree by recursion
import sys, os

def mylister(currdir):
    print '[' + currdir + ']'
    for file in os.listdir(currdir):              # list files here
        path = os.path.join(currdir, file)        # add dir path back
        if not os.path.isdir(path):
            print path
        else:
            mylister(path)                        # recur into subdirs

if __name__ == '__main__':
    mylister(sys.argv[1])                         # dir name in cmdline
This version is packaged as a script too (this is definitely too much code to type at the interactive prompt); its output is identical when run as a script:
C:\...\PP2E\System\Filetools>python lister_recur.py C:\Temp
[C:\Temp]
C:\Temp\about-pp.html
C:\Temp\python1.5.tar.gz
C:\Temp\about-pp2e.html
C:\Temp\about-ppr2e.html
[C:\Temp\newdir]
C:\Temp\newdir\temp1
C:\Temp\newdir\temp2
C:\Temp\newdir\temp3
[C:\Temp\newdir\more]
C:\Temp\newdir\more\xxx.txt
C:\Temp\newdir\more\yyy.txt
But this file is just as useful when imported and called elsewhere:
C:\temp>python
>>> from PP2E.System.Filetools.lister_recur import mylister
>>> mylister('.')
[.]
.\about-pp.html
.\python1.5.tar.gz
.\about-pp2e.html
.\about-ppr2e.html
[.\newdir]
.\newdir\temp1
.\newdir\temp2
.\newdir\temp3
[.\newdir\more]
.\newdir\more\xxx.txt
.\newdir\more\yyy.txt
We will make better use of most of this section’s techniques in later examples in Chapter 5, and this book at large. For example, scripts for copying and comparing directory trees use the tree-walker techniques listed previously. Watch for these tools in action along the way. If you are interested in directory processing, also see the coverage of Python’s old grep module in Chapter 5; it searches files, and can be applied to all files in a directory when combined with the glob module, but simply prints results and does not traverse directory trees by itself.
Over the last eight years, I’ve learned to trust Python’s Benevolent Dictator. Guido generally does the right thing, and if you don’t think so, it’s usually only because you haven’t yet realized how your own position is flawed. Trust me on this. On the other hand, it’s not completely clear why the standard find module I showed you seems to have fallen into deprecation; it’s a useful tool. In fact, I use it a lot -- it is often nice to be able to grab a list of files to process in a single function call, and step through it in a for loop. The alternatives -- os.path.walk, and recursive functions -- are more code-y, and tougher for beginners to digest.
I suppose the find module’s followers (if there be any) could have defended it in long, drawn-out debates on the Internet, that would have spanned days or weeks, been joined by a large cast of heroic combatants, and gone just about nowhere. I decided to spend ten minutes whipping up a custom alternative instead. The module in Example 2-17 uses the standard os.path.walk call described earlier to reimplement a find operation for Python.
Example 2-17. PP2E\PyTools\find.py
#!/usr/bin/python
########################################################
# custom version of the now deprecated find module
# in the standard library--import as "PyTools.find";
# equivalent to the original, but uses os.path.walk,
# has no support for pruning subdirs in the tree, and
# is instrumented to be runnable as a top-level script;
# results list sort differs slightly for some trees;
# exploits tuple unpacking in function argument lists;
########################################################

import fnmatch, os

def find(pattern, startdir=os.curdir):
    matches = []
    os.path.walk(startdir, findvisitor, (matches, pattern))
    matches.sort( )
    return matches

def findvisitor((matches, pattern), thisdir, nameshere):
    for name in nameshere:
        if fnmatch.fnmatch(name, pattern):
            fullpath = os.path.join(thisdir, name)
            matches.append(fullpath)

if __name__ == '__main__':
    import sys
    namepattern, startdir = sys.argv[1], sys.argv[2]
    for name in find(namepattern, startdir): print name
There’s not much to this file; but calling its find function provides the same utility as the deprecated find standard module, and is noticeably easier than rewriting all of this file’s code every time you need to perform a find-type search. To process every Python file in a tree, for instance, I simply type:
from PP2E.PyTools import find
for name in find.find('*.py'):
    ...do something with name...
As a more concrete example, I use the following simple script to clean out any old output text files located anywhere in the book examples tree:
C:\...\PP2E>type PyTools\cleanoutput.py
import os                                 # delete old output files in tree
from PP2E.PyTools.find import find        # only need full path if I'm moved
for filename in find('*.out.txt'):        # use cat instead of type in Linux
    print filename
    if raw_input('View?') == 'y':
        os.system('type ' + filename)
    if raw_input('Delete?') == 'y':
        os.remove(filename)

C:\temp\examples>python %X%\PyTools\cleanoutput.py
.\Internet\Cgi-Web\Basics\languages.out.txt
View?
Delete?
.\Internet\Cgi-Web\PyErrata\AdminTools\dbaseindexed.out.txt
View?
Delete?y
To achieve such code economy, the custom find module calls os.path.walk to register a function to be called per directory in the tree, and simply adds matching filenames to the result list along the way.
New here, though, is the fnmatch module -- a standard Python module that performs Unix-like pattern matching against filenames, and was also used by the original find. This module supports common operators in name pattern strings: * (to match any number of characters), ? (to match any single character), and [...] and [!...] (to match any character inside the bracket pairs, or not); other characters match themselves.[21] To make sure that this alternative’s results are similar, I also wrote the test module shown in Example 2-18.
Example 2-18. PP2E\PyTools\find-test.py
############################################################
# test custom find; the builtin find module is deprecated:
# if it ever goes away completely, replace all "import find"
# with "from PP2E.PyTools import find" (or add PP2E\PyTools
# to your path setting and just "import find"); this script
# takes 4 seconds total time on my 650mhz Win98 notebook to
# run 10 finds over a directory tree of roughly 1500 names;
############################################################

import sys, os, string
for dir in sys.path[:]:        # loop over a copy: the list is changed below
    if string.find(os.path.abspath(dir), 'PyTools') != -1:
        print 'removing', repr(dir)
        sys.path.remove(dir)   # else may import both finds from PyTools, '.'!

import find                    # get deprecated builtin (for now)
import PP2E.PyTools.find       # later use: from PP2E.PyTools import find
print find
print PP2E.PyTools.find

assert find.find != PP2E.PyTools.find.find        # really different?
assert string.find(str(find), 'Lib') != -1        # should be after path remove
assert string.find(str(PP2E.PyTools.find), 'PyTools') != -1

startdir = r'C:\PP2ndEd\examples\PP2E'
for pattern in ('*.py', '*.html', '*.c', '*.cgi', '*'):
    print pattern, '=>'
    list1 = find.find(pattern, startdir)
    list2 = PP2E.PyTools.find.find(pattern, startdir)
    print len(list1), list1[-1]
    print len(list2), list2[-1]
    print list1 == list2,; list1.sort( ); print list1 == list2
There is some magic at the top of this script that I need to explain. To make sure that it can load both the standard library’s find module and the custom one in PP2E\PyTools, it must delete the entry (or entries) on the module search path that point to the PP2E\PyTools directory, and import the custom version with a full package directory -- PP2E.PyTools.find. If not, we’d always get the same find module, the one in PyTools, no matter where this script is run from.
Here’s why. Recall that Python always adds the directory containing a script being run to the front of sys.path. If we didn’t delete that entry here, the import find statement would always load the custom find in PyTools, because the custom find.py module is in the same directory as the find-test.py script. The script’s home directory would effectively hide the standard library’s find. If that doesn’t make sense, go back and reread Section 2.7 earlier in this chapter.
Below is the output of this tester, along with a few command-line invocations; unlike the original find, the custom version in Example 2-17 can be run as a command-line tool too. If you study the test output closely, you’ll notice that the custom find differs only in an occasional sort order that I won’t go into further here (the original find module used a recursive function, not os.path.walk); the “0 1” lines mean that results differ in order, but not content. Since find callers don’t generally depend on precise filename result ordering, this is trivial:
C:\temp>python %X%\PyTools\find-test.py
removing 'C:\\PP2ndEd\\examples\\PP2E\\PyTools'
<module 'find' from 'C:\Program Files\Python\Lib\find.pyc'>
<module 'PP2E.PyTools.find' from 'C:\PP2ndEd\examples\PP2E\PyTools\find.pyc'>
*.py =>
657 C:\PP2ndEd\examples\PP2E\tounix.py
657 C:\PP2ndEd\examples\PP2E\tounix.py
0 1
*.html =>
37 C:\PP2ndEd\examples\PP2E\System\Filetools\template.html
37 C:\PP2ndEd\examples\PP2E\System\Filetools\template.html
1 1
*.c =>
46 C:\PP2ndEd\examples\PP2E\Other\old-Integ\embed.c
46 C:\PP2ndEd\examples\PP2E\Other\old-Integ\embed.c
0 1
*.cgi =>
24 C:\PP2ndEd\examples\PP2E\Internet\Cgi-Web\PyMailCgi\onViewSubmit.cgi
24 C:\PP2ndEd\examples\PP2E\Internet\Cgi-Web\PyMailCgi\onViewSubmit.cgi
1 1
* =>
1519 C:\PP2ndEd\examples\PP2E\xferall.linux.csh
1519 C:\PP2ndEd\examples\PP2E\xferall.linux.csh
0 1

C:\temp>python %X%\PyTools\find.py *.cxx C:\PP2ndEd\examples\PP2E
C:\PP2ndEd\examples\PP2E\Extend\Swig\Shadow\main.cxx
C:\PP2ndEd\examples\PP2E\Extend\Swig\Shadow\number.cxx

C:\temp>python %X%\PyTools\find.py *.asp C:\PP2ndEd\examples\PP2E
C:\PP2ndEd\examples\PP2E\Internet\Other\asp-py.asp

C:\temp>python %X%\PyTools\find.py *.i C:\PP2ndEd\examples\PP2E
C:\PP2ndEd\examples\PP2E\Extend\Swig\Environ\environ.i
C:\PP2ndEd\examples\PP2E\Extend\Swig\Shadow\number.i
C:\PP2ndEd\examples\PP2E\Extend\Swig\hellolib.i

C:\temp>python %X%\PyTools\find.py setup*.csh C:\PP2ndEd\examples\PP2E
C:\PP2ndEd\examples\PP2E\Config\setup-pp-embed.csh
C:\PP2ndEd\examples\PP2E\Config\setup-pp.csh
C:\PP2ndEd\examples\PP2E\EmbExt\Exports\ClassAndMod\setup-class.csh
C:\PP2ndEd\examples\PP2E\Extend\Swig\setup-swig.csh

[filename sort scheme]
C:\temp>python
>>> l = ['ccc', 'bbb', 'aaa', 'aaa.xxx', 'aaa.yyy', 'aaa.xxx.nnn']
>>> l.sort( )
>>> l
['aaa', 'aaa.xxx', 'aaa.xxx.nnn', 'aaa.yyy', 'bbb', 'ccc']
Finally, if an example in this book fails in a future Python release because there is no find to be found, simply change find-module imports in the source code to say from PP2E.PyTools import find instead of import find. The former form will find the custom find module in the book’s example package directory tree; the old module in the standard Python library is ignored (if it is still there at all). And if you are brave enough to add the PP2E\PyTools directory itself to your PYTHONPATH setting, all original import find statements will continue to work unchanged.
Better still, do nothing at all -- most find-based examples in this book automatically pick the alternative by catching import exceptions, just in case they aren’t located in the PyTools directory:
try:
    import find
except ImportError:
    from PP2E.PyTools import find
The find module may be gone, but it need not be forgotten.
[20] In fact, glob just uses the standard fnmatch module to match name patterns; see the fnmatch description later in this chapter in Section 2.12.3 for more details.
[21] Unlike the re module, fnmatch supports only common Unix shell matching operators, not full-blown regular expression patterns; to understand why this matters, see Chapter 18 for more details.