The heart of the prior script was findFiles, a function that knows how to portably collect matching file and directory names in an entire tree, given a list of filename patterns. It doesn’t do much more than the built-in find.find call, but it can be augmented for our own purposes. Because this logic was bundled up in a function, though, it automatically becomes a reusable tool.
For example, the next script imports and applies findFiles to collect all filenames in a directory tree, by using the filename pattern * (it matches everything). I use this script to fix a legacy problem in the book’s examples tree. The names of some files created under MS-DOS were made all uppercase; for example, spam.py became SPAM.PY somewhere along the way. Because case is significant both in Python and on some platforms, an import statement such as import spam will sometimes fail for uppercase filenames.
To repair the damage everywhere in the thousand-file examples tree, I wrote and ran Example 7-6. It works like this: for every filename in the tree, it checks to see whether the name is all uppercase and asks the console user whether the file should be renamed with the os.rename call. To make this easy, it also comes up with a reasonable default for most new names: the old one in all-lowercase form.
Example 7-6. PP3E\PyTools\fixnames_all.py
##########################################################################
# Use: "python ..\..\PyTools\fixnames_all.py".
# find all files with all uppercase names at and below the current
# directory ('.'), for each, ask the user for a new name to rename the
# file to; used to catch old uppercase filenames created on MS-DOS
# (case matters, when importing Python module files); caveats: this
# may fail on case-sensitive machines if directory names are converted
# before their contents--the original dir name in the paths returned by
# find may no longer exist; the allUpper heuristic also fails for
# odd filenames that are all non-alphabetic (ex: '.').
##########################################################################

import os, string
listonly = False

def allUpper(name):
    for char in name:
        if char in string.lowercase:    # any lowercase letter disqualifies
            return 0                    # else all upper, digit, or special
    return 1

def convertOne(fname):
    fpath, oldfname = os.path.split(fname)
    if allUpper(oldfname):
        prompt = 'Convert dir=%s file=%s? (y|Y)' % (fpath, oldfname)
        if raw_input(prompt) in ['Y', 'y']:
            default  = oldfname.lower()
            newfname = raw_input('Type new file name (enter=%s): ' % default)
            newfname = newfname or default
            newfpath = os.path.join(fpath, newfname)
            os.rename(fname, newfpath)
            print 'Renamed: ', fname
            print 'to:      ', str(newfpath)
            raw_input('Press enter to continue')
            return 1
    return 0

if __name__ == '__main__':
    patts = "*"                              # inspect all filenames
    from fixeoln_all import findFiles        # reuse finder function
    matches = findFiles(patts)

    ccount = vcount = 0
    for matchlist in matches:                # list of lists, one per pattern
        for fname in matchlist:              # fnames are full directory paths
            print vcount+1, '=>', fname      # includes names of directories
            if not listonly:
                ccount += convertOne(fname)
            vcount += 1
    print 'Converted %d files, visited %d' % (ccount, vcount)
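The listing's header comment notes that the allUpper heuristic misfires on names with no letters at all, such as '.'. The following is a minimal sketch of that caveat, written for current Python 3 rather than the Python 2 of the listing. The names all_upper and all_upper_strict are hypothetical and not part of the book's code, and the strict variant, which relies on the built-in str.isupper method, is only one possible fix.

```python
import string

def all_upper(name):
    """Loose test, mirroring the listing: no lowercase ASCII letters present."""
    return not any(char in string.ascii_lowercase for char in name)

def all_upper_strict(name):
    """Stricter test: all cased characters are uppercase, and at least one exists."""
    return name.isupper()

# Compare the two tests on ordinary names and on letter-free names.
for name in ['SPAM.PY', 'spam.py', 'Spam.py', 'README.1ST', '.', '123']:
    print('%-12r loose=%-5s strict=%s'
          % (name, all_upper(name), all_upper_strict(name)))
```

The two tests agree on names that contain letters and differ only for names such as '.' and '123', which the loose test flags for renaming and the strict test skips.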
As before, the findFiles function returns a list of simple filename lists, representing the expansion of all patterns passed in (here, just one result list, for the wildcard pattern *).[*] For each file and directory name in the result, this script’s convertOne function prompts for name changes; an os.path.split and an os.path.join call combination portably tacks the new filename onto the old directory name. Here is a renaming session in progress on Windows:
C:\temp\examples>python %X%\PyTools\fixnames_all.py
Using Python find
1 => .\.cshrc
2 => .\LaunchBrowser.out.txt
3 => .\LaunchBrowser.py
 ...
 ...more deleted...
 ...
218 => .\Ai
219 => .\Ai\ExpertSystem
220 => .\Ai\ExpertSystem\TODO
Convert dir=.\Ai\ExpertSystem file=TODO? (y|Y)n
221 => .\Ai\ExpertSystem\__init__.py
222 => .\Ai\ExpertSystem\holmes
223 => .\Ai\ExpertSystem\holmes\README.1ST
Convert dir=.\Ai\ExpertSystem\holmes file=README.1ST? (y|Y)y
Type new file name (enter=readme.1st):
Renamed:  .\Ai\ExpertSystem\holmes\README.1st
to:       .\Ai\ExpertSystem\holmes\readme.1st
Press enter to continue
224 => .\Ai\ExpertSystem\holmes\README.2ND
Convert dir=.\Ai\ExpertSystem\holmes file=README.2ND? (y|Y)y
Type new file name (enter=readme.2nd):readme-more
Renamed:  .\Ai\ExpertSystem\holmes\README.2nd
to:       .\Ai\ExpertSystem\holmes\readme-more
Press enter to continue
 ...
 ...more deleted...
 ...
1471 => .\todos.py
1472 => .\tounix.py
1473 => .\xferall.linux.csh
Converted 2 files, visited 1473
This script could simply convert every all-uppercase name to an all-lowercase equivalent automatically, but that’s potentially dangerous (some names might require mixed case). Instead, it asks for input during the traversal and shows the results of each renaming operation along the way.
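The split-and-join step that convertOne uses to build each new path can be tried in isolation. This is a small sketch in current Python 3; the function name lowercase_target is hypothetical, and it only computes the default target path without renaming anything.

```python
import os

def lowercase_target(path):
    """Return path with only its final component converted to lowercase."""
    dirpart, name = os.path.split(path)           # directory part, last component
    return os.path.join(dirpart, name.lower())    # reattach using the platform separator

# Build a sample path portably, then show the default rename target for it.
sample = os.path.join('.', 'Ai', 'ExpertSystem', 'holmes', 'README.1ST')
print(sample)
print(lowercase_target(sample))
```

Because only the last component is changed, any mixed-case directory names earlier in the path are left alone, which matches the way the session above treats each name on its own.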
Notice, though, that the pattern-matching power of the find.find call goes completely unused in this script. Because this call must always visit every file in the tree, the os.path.walk interface we studied in Chapter 4 would work just as well and avoids any initial pause while a filename list is being collected (that pause is negligible here but may be significant for larger trees). Example 7-7 is an equivalent version of this script that does its tree traversal with the walk callbacks-based model.
Example 7-7. PP3E\PyTools\fixnames_all2.py
###########################################################################
# Use: "python ..\..\PyTools\fixnames_all2.py".
# same, but use the os.path.walk interface, not find.find; to make this
# work like the simple find version, puts off visiting directories until
# just before visiting their contents (find.find lists dir names before
# their contents); renaming dirs here can fail on case-sensitive platforms
# too--walk keeps extending paths containing old dir names;
###########################################################################

import os
listonly = False
from fixnames_all import convertOne

def visitname(fname):
    global ccount, vcount
    print vcount+1, '=>', fname
    if not listonly:
        ccount += convertOne(fname)
    vcount += 1

def visitor(myData, directoryName, filesInDirectory):  # called for each dir
    visitname(directoryName)                           # do dir we're in now,
    for fname in filesInDirectory:                     # and non-dir files here
        fpath = os.path.join(directoryName, fname)     # fnames have no dirpath
        if not os.path.isdir(fpath):
            visitname(fpath)

ccount = vcount = 0
os.path.walk('.', visitor, None)
print 'Converted %d files, visited %d' % (ccount, vcount)
This version does the same job but visits one extra file (the topmost root directory), and it may visit directories in a different order (os.listdir results are unordered). Both versions run in similar time for the examples directory tree on my computer.[*] We’ll revisit this script, as well as the fixeoln line-end fixer, in the context of a general tree-walker class hierarchy later in this chapter.
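Both versions share a directory-renaming hazard on case-sensitive platforms, and one workaround is to walk the tree bottom-up so that a directory's contents are seen before the directory itself. The sketch below shows that ordering with os.walk and topdown=False, in current Python 3. It is not the book's code: find_uppercase_bottom_up is a hypothetical name, it uses str.isupper rather than the allUpper test, and it only reports candidate renames on a throwaway tree instead of calling os.rename.

```python
import os, tempfile

def find_uppercase_bottom_up(root):
    """Yield (old_path, new_path) pairs below root, deepest entries first."""
    for dirpath, dirnames, filenames in os.walk(root, topdown=False):
        # With topdown=False, a subdirectory's own entries are generated
        # before the subdirectory's name appears in its parent's dirnames.
        for name in filenames + dirnames:
            if name.isupper():
                yield (os.path.join(dirpath, name),
                       os.path.join(dirpath, name.lower()))

# Demonstrate the ordering on a temporary tree: SPAM/EGGS.PY, then SPAM.
with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, 'SPAM'))
    open(os.path.join(root, 'SPAM', 'EGGS.PY'), 'w').close()
    for old, new in find_uppercase_bottom_up(root):
        print(os.path.relpath(old, root), '=>', os.path.relpath(new, root))
```

Because the file inside SPAM is reported before SPAM itself, a caller that renames each pair as it arrives would never need a path built from an already-renamed directory name. Whether that is enough for a given tree is an assumption worth testing on a copy first.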
[*] Interestingly, using string '*' for the patterns list works the same way as using list ['*'] here, only because a single-character string is a sequence that contains itself; compare the results of map(find.find, '*') with map(find.find, ['*']) interactively to verify.
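The sequence behavior this note relies on can be checked without the find module. In this sketch, written for current Python 3, expand is a hypothetical stand-in for any function that loops over a sequence of patterns; it is not find.find or findFiles.

```python
def expand(patterns):
    """Stand-in for a finder: collect each item seen when iterating patterns."""
    return [pattern for pattern in patterns]

print(expand('*'))       # a one-character string iterates to itself
print(expand(['*']))     # the explicit one-item list gives the same items
print(expand('*.py'))    # a longer string is split into single characters
```

The last call shows why the shortcut is fragile: a bare multi-character pattern such as '*.py' would be treated as four separate one-character patterns, so passing a list is the safer habit.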
[*] A very subtle thing: both versions of this script might fail on platforms where case matters if they rename directories along the way. If a directory is renamed before the contents of that directory have been visited (e.g., a directory SPAM renamed to spam), then later references to the directory’s contents using the old name (e.g., SPAM/filename) will no longer be valid on case-sensitive platforms. This can happen in the find.find version, because directories can and do show up in the result list before their contents. It’s also a potential problem with the os.path.walk version, because the prior directory path (with original directory names) keeps being extended at each level of the tree. I use this script only on Windows (DOS), so I haven’t been bitten by this in practice. Workarounds (ordering find result lists, walking trees in a bottom-up fashion, making two distinct passes for files and directories, queuing up directory names on a list to be renamed later, or simply not renaming directories at all) are all complex enough to be delegated to the realm of reader experiments (see the newer os.walk walker in Chapter 4 for bottom-up traversal options). As a rule of thumb, changing a tree’s names or structure while it is being walked is a risky venture.