Armed with the portable search_all
script from Example
7-10, I was able to better pinpoint files to be edited every
time I changed the book examples tree structure. At least initially,
in one window I ran search_all
to
pick out suspicious files and edited each along the way by hand in
another window.
Pretty soon, though, this became tedious too. Manually typing filenames into editor commands is no fun, especially when the number of files to edit is large; the search for “Part2” shown earlier returned 74 files, for instance. Since I occasionally have better things to do than manually start 74 editor sessions, I looked for a way to automatically run an editor on each suspicious file.
Unfortunately, search_all simply prints results to the screen. Although that text could be intercepted and parsed, a more direct approach that spawns edit sessions during the search might be easier, but could require major changes to the tree search script as currently coded. At this point, two thoughts came to mind.
First, I knew it would be easier in the long run to be able to add features to a general directory searcher as external components, not by changing the original script. Because editing files was just one possible extension (what about automating text replacements too?), a more generic, customizable, and reusable search component seemed the way to go.
Second, after writing a few directory walking utilities, it
became clear that I was rewriting the same sort of code over and over
again. Traversals could be even further simplified by wrapping common
details for easier reuse. The os.path.walk
tool helps, but its use tends
to foster redundant operations (e.g., directory name joins), and its
function-object-based interface doesn’t quite lend itself to
customization the way a class can.
Of course, both goals point to using an object-oriented
framework for traversals and searching. Example 7-11 is one concrete
realization of these goals. It exports a general FileVisitor
class that mostly just wraps os.path.walk
for easier use and extension,
as well as a generic SearchVisitor
class that generalizes the notion of directory
searches. By itself, SearchVisitor
simply does what search_all
did,
but it also opens up the search process to customization; bits of its
behavior can be modified by overloading its methods in subclasses.
Moreover, its core search logic can be reused everywhere we need to
search. Simply define a subclass that adds search-specific extensions.
As is usual in programming, once you repeat
tactical tasks often enough, they tend to inspire
this kind of strategic thinking.
Example 7-11. PP3E\PyTools\visitor.py
##########################################################################
# Test: "python ..\..\PyTools\visitor.py testmask [string]". Uses OOP,
# classes, and subclasses to wrap some of the details of os.path.walk
# usage to walk and search; testmask is an integer bitmask with 1 bit
# per available selftest; see also: visitor_edit/replace/find/fix*/.py
# subclasses, and the fixsitename.py client script in Internet\Cgi-Web;
##########################################################################

import os, sys
listonly = False

class FileVisitor:
    """
    visits all nondirectory files below startDir;
    override visitfile to provide a file handler
    """
    def __init__(self, data=None, listonly=False):
        self.context  = data
        self.fcount   = 0
        self.dcount   = 0
        self.listonly = listonly
    def run(self, startDir=os.curdir):                 # default start='.'
        os.path.walk(startDir, self.visitor, None)
    def visitor(self, data, dirName, filesInDir):      # called for each dir
        self.visitdir(dirName)                         # do this dir first
        for fname in filesInDir:                       # do non-dir files
            fpath = os.path.join(dirName, fname)       # fnames have no path
            if not os.path.isdir(fpath):
                self.visitfile(fpath)
    def visitdir(self, dirpath):                       # called for each dir
        self.dcount += 1                               # override or extend me
        print dirpath, '...'
    def visitfile(self, filepath):                     # called for each file
        self.fcount += 1                               # override or extend me
        print self.fcount, '=>', filepath              # default: print name

class SearchVisitor(FileVisitor):
    """
    search files at and below startDir for a string
    """
    skipexts = ['.gif', '.exe', '.pyc', '.o', '.a']    # skip binary files
    def __init__(self, key, listonly=False):
        FileVisitor.__init__(self, key, listonly)
        self.scount = 0
    def visitfile(self, fname):                        # test for a match
        FileVisitor.visitfile(self, fname)
        if not self.listonly:
            if os.path.splitext(fname)[1] in self.skipexts:
                print 'Skipping', fname
            else:
                text = open(fname).read()
                if text.find(self.context) != -1:
                    self.visitmatch(fname, text)
                    self.scount += 1
    def visitmatch(self, fname, text):                 # process a match
        raw_input('%s has %s' % (fname, self.context)) # override me lower

# self-test logic
dolist   = 1
dosearch = 2     # 3=do list and search
donext   = 4     # when next test added

def selftest(testmask):
    if testmask & dolist:
        visitor = FileVisitor()
        visitor.run('.')
        print 'Visited %d files and %d dirs' % (visitor.fcount, visitor.dcount)
    if testmask & dosearch:
        visitor = SearchVisitor(sys.argv[2], listonly)
        visitor.run('.')
        print 'Found in %d files, visited %d' % (visitor.scount, visitor.fcount)

if __name__ == '__main__':
    selftest(int(sys.argv[1]))    # e.g., 3 = dolist | dosearch
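The module above is Python 2 code built on os.path.walk, which was removed in Python 3. Below is a rough sketch of how the same FileVisitor/SearchVisitor split might look on Python 3, with the os.walk generator swapped in. Two details are assumptions of this sketch and not part of the original module: files are decoded as UTF-8 with undecodable bytes ignored, and the default visitmatch prints instead of pausing for input.

```python
import os


class FileVisitor:
    """Visit every directory and nondirectory file at and below startDir.

    run() loops over the os.walk generator, which (unlike os.path.walk)
    already separates subdirectory names from file names.
    """
    def __init__(self, data=None, listonly=False):
        self.context = data
        self.fcount = 0
        self.dcount = 0
        self.listonly = listonly

    def run(self, startDir=os.curdir):
        for dirName, subdirs, filesInDir in os.walk(startDir):
            self.visitdir(dirName)                    # this dir first
            for fname in filesInDir:                  # then its files
                self.visitfile(os.path.join(dirName, fname))

    def visitdir(self, dirpath):                      # override or extend me
        self.dcount += 1
        print(dirpath, '...')

    def visitfile(self, filepath):                    # override or extend me
        self.fcount += 1
        print(self.fcount, '=>', filepath)


class SearchVisitor(FileVisitor):
    """Search files at and below startDir for a string."""
    skipexts = ['.gif', '.exe', '.pyc', '.o', '.a']   # skip binary files

    def __init__(self, key, listonly=False):
        FileVisitor.__init__(self, key, listonly)
        self.scount = 0

    def visitfile(self, fname):
        FileVisitor.visitfile(self, fname)
        if self.listonly:
            return
        if os.path.splitext(fname)[1] in self.skipexts:
            print('Skipping', fname)
            return
        # assumption: text is UTF-8; undecodable bytes are dropped
        with open(fname, encoding='utf-8', errors='ignore') as f:
            text = f.read()
        if self.context in text:
            self.visitmatch(fname, text)
            self.scount += 1

    def visitmatch(self, fname, text):                # override me
        print(fname, 'has', self.context)             # original pauses here
```

Subclasses would customize this sketch the same way as the original: override visitmatch (or visitfile) and inherit the rest.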
This module primarily serves to export classes for external use, but it does something useful when run standalone too. If you invoke it as a script with a single argument, 1, it makes and runs a FileVisitor object and prints an exhaustive listing of every file and directory at and below the place you are at when the script is invoked (i.e., “.”, the current working directory):
C:\temp>python %X%\PyTools\visitor.py 1
. ...
1 => .\autoexec.bat
2 => .\cleanall.csh
3 => .\echoEnvironment.pyw
4 => .\Launcher.py
5 => .\Launcher.pyc
6 => .\Launch_PyGadgets.py
7 => .\Launch_PyDemos.pyw
...more deleted...
479 => .\Gui\Clock\plotterGui.py
480 => .\Gui\Clock\plotterText.py
481 => .\Gui\Clock\plotterText1.py
482 => .\Gui\Clock\__init__.py
.\Gui\gifs ...
483 => .\Gui\gifs\frank.gif
484 => .\Gui\gifs\frank.note
485 => .\Gui\gifs\gilligan.gif
486 => .\Gui\gifs\gilligan.note
...more deleted...
1352 => .\PyTools\visitor_fixnames.py
1353 => .\PyTools\visitor_find_quiet2.py
1354 => .\PyTools\visitor_find.pyc
1355 => .\PyTools\visitor_find_quiet1.py
1356 => .\PyTools\fixeoln_one.doc.txt
Visited 1356 files and 119 dirs
If you instead invoke this script with a 2 as its first argument, it makes and runs a SearchVisitor object using the second argument as the search key. This form is equivalent to running the search_all.py script we met earlier; it pauses for an Enter key press after each matching file is reported (the “has Part3” lines here):
C:\temp\examples>python %X%\PyTools\visitor.py 2 Part3
. ...
1 => .\autoexec.bat
2 => .\cleanall.csh
.\cleanall.csh has Part3
3 => .\echoEnvironment.pyw
4 => .\Launcher.py
.\Launcher.py has Part3
5 => .\Launcher.pyc
Skipping .\Launcher.pyc
6 => .\Launch_PyGadgets.py
7 => .\Launch_PyDemos.pyw
8 => .\LaunchBrowser.out.txt
9 => .\LaunchBrowser.py
10 => .\Launch_PyGadgets_bar.pyw
11 => .\makeall.csh
.\makeall.csh has Part3
...
...more deleted...
...
1353 => .\PyTools\visitor_find_quiet2.py
1354 => .\PyTools\visitor_find.pyc
Skipping .\PyTools\visitor_find.pyc
1355 => .\PyTools\visitor_find_quiet1.py
1356 => .\PyTools\fixeoln_one.doc.txt
Found in 49 files, visited 1356
Technically, passing this script a first argument of 3
runs both a FileVisitor
and a SearchVisitor
(two separate traversals are
performed). The first argument is really used as a bit mask to select
one or more supported self-tests; if a test’s bit is on in the binary
value of the argument, the test will be run. Because 3 is 011 in
binary, it selects both a search (010) and a listing (001). In a more
user-friendly system, we might want to be more symbolic about that
(e.g., check for -search
and
-list
arguments), but bit masks
work just as well for this script’s scope.
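Here is a sketch of the more symbolic interface mentioned above. It uses Python 3's standard argparse module to turn named flags into the same bitmask that selftest checks with &. The --list and --search option names and the options_to_mask helper are hypothetical; visitor.py itself accepts only a number.

```python
import argparse

# bit values mirroring visitor.py's self-test masks
DOLIST, DOSEARCH = 1, 2

def options_to_mask(argv):
    """Translate hypothetical --list/--search flags into a selftest bitmask.

    Returns (mask, search_key); search_key is None if --search is absent.
    """
    parser = argparse.ArgumentParser(prog='visitor.py')
    parser.add_argument('--list', action='store_true', help='list all files')
    parser.add_argument('--search', metavar='KEY', help='search files for KEY')
    args = parser.parse_args(argv)
    mask = 0
    if args.list:
        mask |= DOLIST                 # turn on bit 001
    if args.search is not None:
        mask |= DOSEARCH               # turn on bit 010
    return mask, args.search

# both bits on: 3 == 0b11, exactly like passing "3" on the command line
mask, key = options_to_mask(['--list', '--search', 'Part3'])
print(mask, key, bin(mask))            # prints: 3 Part3 0b11
```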
Now, after genericizing tree traversals and searches, it’s an easy step to add automatic file editing in a brand-new, separate component. Example 7-12 defines a new EditVisitor class that simply customizes the visitmatch method of the SearchVisitor class to open a text editor on the matched file. Yes, this is the complete program. It needs to do something special only when visiting matched files, and so it needs to provide only that behavior; the rest of the traversal and search logic is unchanged and inherited.
Example 7-12. PP3E\PyTools\visitor_edit.py
###############################################################
# Use: "python PyTools\visitor_edit.py string".
# add auto-editor startup to SearchVisitor in an external
# component (subclass), not in-place changes; this version
# automatically pops up an editor on each file containing the
# string as it traverses; you can also use editor='edit' or
# 'notepad' on Windows; 'vi' and 'edit' run in console window;
# editor=r'python Gui\TextEditor\textEditor.py' may work too;
# caveat: we might be able to make this smarter by sending
# a search command to go to the first match in some editors;
###############################################################

import os, sys
from visitor import SearchVisitor
listonly = False

class EditVisitor(SearchVisitor):
    """
    edit files at and below startDir having string
    """
    editor = 'vi'  # ymmv
    def visitmatch(self, fname, text):
        os.system('%s %s' % (self.editor, fname))

if __name__ == '__main__':
    visitor = EditVisitor(sys.argv[1], listonly)
    visitor.run('.')
    print 'Edited %d files, visited %d' % (visitor.scount, visitor.fcount)
When we make and run an EditVisitor, a text editor is started with the os.system command-line spawn call, which usually blocks its caller until the spawned program finishes. On my machines, each time this script finds a matched file during the traversal, it starts up the vi text editor within the console window where the script was started; exiting the editor resumes the tree walk.
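Because os.system hands a single command string to the shell, a filename containing spaces would need quoting before it could be edited this way. One alternative is subprocess.call with an argument list, sketched here for Python 3 with a hypothetical edit_file helper. It also blocks until the editor exits and returns the editor's exit status, so it could plausibly stand in for the os.system call inside visitmatch.

```python
import subprocess

def edit_file(editor, fname):
    """Run an editor on fname and wait for it to finish, like os.system.

    editor is a program name (e.g. 'vi') or a list such as
    ['python', 'textEditor.py']; no shell is involved, so fname needs
    no quoting even if it contains spaces. Returns the exit status.
    """
    if isinstance(editor, (list, tuple)):
        cmd = list(editor)
    else:
        cmd = [editor]
    return subprocess.call(cmd + [fname])   # blocks until the child exits

# usage inside a visitmatch override (hypothetical):
#     edit_file('vi', fname)
```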
Let’s find and edit some files. When run as a script, we pass
this program the search string as a command argument (here, the
string -exec
is the search key,
not an option flag). The root directory is always passed to the
run
method as “.”, the current
run directory. Traversal status messages show up in the console as
before, but each matched file now automatically pops up in a text
editor along the way. Here, the editor is started eight
times:
C:\...\PP3E>python PyTools\visitor_edit.py -exec
1 => .\autoexec.bat
2 => .\cleanall.csh
3 => .\echoEnvironment.pyw
4 => .\Launcher.py
5 => .\Launcher.pyc
Skipping .\Launcher.pyc
...more deleted...
1340 => .\old_Part2\Basics\unpack2.py
1341 => .\old_Part2\Basics\unpack2b.py
1342 => .\old_Part2\Basics\unpack3.py
1343 => .\old_Part2\Basics\__init__.py
Edited 8 files, visited 1343
This, finally, is the exact tool I was looking for to simplify global book examples tree maintenance. After major changes to things such as shared modules and file and directory names, I run this script on the examples root directory with an appropriate search string and edit any files it pops up as needed. I still need to change files by hand in the editor, but that’s often safer than blind global replacements.
But since I brought it up, given a general tree traversal class, it’s easy to code a global search-and-replace subclass too. The SearchVisitor subclass in Example 7-13, ReplaceVisitor, customizes the visitmatch method to globally replace any appearances of one string with another, in all text files at and below a root directory. It also collects the names of all files that were changed in a list just in case you wish to go through and verify the automatic edits applied (a text editor could be automatically popped up on each changed file, for instance).
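For comparison, here is a sketch of the same search-and-replace as a standalone Python 3 function over os.walk; the name replace_in_tree is hypothetical. It assumes the tree holds UTF-8 text and skips files that fail to decode. Opening files with newline='' keeps their existing line endings. An empty search string is refused, because replacing '' would insert toStr at every position in every file.

```python
import os

def replace_in_tree(root, fromStr, toStr,
                    skipexts=('.gif', '.exe', '.pyc', '.o', '.a')):
    """Replace fromStr with toStr in text files at and below root.

    Returns the list of changed paths, like ReplaceVisitor.changed.
    """
    if not fromStr:
        raise ValueError('empty search string')
    changed = []
    for dirpath, dirnames, filenames in os.walk(root):
        for fname in filenames:
            if os.path.splitext(fname)[1] in skipexts:
                continue                          # likely binary: skip
            path = os.path.join(dirpath, fname)
            try:
                with open(path, encoding='utf-8', newline='') as f:
                    text = f.read()
            except UnicodeDecodeError:
                continue                          # not UTF-8 text: leave it alone
            if fromStr in text:
                with open(path, 'w', encoding='utf-8', newline='') as f:
                    f.write(text.replace(fromStr, toStr))
                changed.append(path)
    return changed
```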
Example 7-13. PP3E\PyTools\visitor_replace.py
################################################################
# Use: "python PyTools\visitor_replace.py fromStr toStr".
# does global search-and-replace in all files in a directory
# tree--replaces fromStr with toStr in all text files; this
# is powerful but dangerous!! visitor_edit.py runs an editor
# for you to verify and make changes, and so is much safer;
# use CollectVisitor to simply collect a list of matched files;
################################################################

import sys
from visitor import SearchVisitor
listonly = False

class ReplaceVisitor(SearchVisitor):
    """
    change fromStr to toStr in files at and below startDir;
    files changed available in obj.changed list after a run
    """
    def __init__(self, fromStr, toStr, listonly=False):
        self.changed = []
        self.toStr   = toStr
        SearchVisitor.__init__(self, fromStr, listonly)
    def visitmatch(self, fname, text):
        fromStr, toStr = self.context, self.toStr
        text = text.replace(fromStr, toStr)
        open(fname, 'w').write(text)
        self.changed.append(fname)

if __name__ == '__main__':
    if raw_input('Are you sure?') == 'y':
        visitor = ReplaceVisitor(sys.argv[1], sys.argv[2], listonly)
        visitor.run(startDir='.')
        print 'Visited %d files' % visitor.fcount
        print 'Changed %d files:' % len(visitor.changed)
        for fname in visitor.changed: print fname
To run this script over a directory tree, go to the directory to be changed and run the following sort of command line with “from” and “to” strings. On my current machine, doing this on a 1,354-file tree and changing 75 files along the way takes roughly six seconds of real clock time when the system isn’t particularly busy.
C:\temp\examples>python %X%/PyTools/visitor_replace.py Part2 SPAM2
Are you sure?y
. ...
1 => .\autoexec.bat
2 => .\cleanall.csh
3 => .\echoEnvironment.pyw
4 => .\Launcher.py
5 => .\Launcher.pyc
Skipping .\Launcher.pyc
6 => .\Launch_PyGadgets.py
...more deleted...
1351 => .\PyTools\visitor_find_quiet2.py
1352 => .\PyTools\visitor_find.pyc
Skipping .\PyTools\visitor_find.pyc
1353 => .\PyTools\visitor_find_quiet1.py
1354 => .\PyTools\fixeoln_one.doc.txt
Visited 1354 files
Changed 75 files:
.\Launcher.py
.\LaunchBrowser.out.txt
.\LaunchBrowser.py
.\PyDemos.pyw
.\PyGadgets.py
.\README-PP3E.txt
...more deleted...
.\PyTools\search_all.out.txt
.\PyTools\visitor.out.txt
.\PyTools\visitor_edit.py

[to delete, use an empty toStr]
C:\temp\examples>python %X%/PyTools/visitor_replace.py SPAM ""
This is both wildly powerful and dangerous. If the string to be replaced can show up in places you didn’t anticipate, you might just ruin an entire tree of files by running the ReplaceVisitor object defined here. On the other hand, if the string is something very specific, this object can obviate the need to edit suspicious files by hand. For instance, we will use this approach to automatically change web site addresses in HTML files in Chapter 16; the addresses are likely too specific to show up in other places by chance.
The scripts so far search and replace in directory trees, using the same traversal code base (the visitor module). Suppose, though, that you just want to get a Python list of files in a directory containing a string. You could run a search and parse the output messages for “found” messages. It’s much simpler to knock off another SearchVisitor subclass that collects the list along the way, as in Example 7-14.
Example 7-14. PP3E\PyTools\visitor_collect.py
#################################################################
# Use: "python PyTools\visitor_collect.py searchstring".
# CollectVisitor simply collects a list of matched files, for
# display or later processing (e.g., replacement, auto-editing);
#################################################################

import sys
from visitor import SearchVisitor

class CollectVisitor(SearchVisitor):
    """
    collect names of files containing a string;
    run this and then fetch its obj.matches list
    """
    def __init__(self, searchstr, listonly=False):
        self.matches = []
        SearchVisitor.__init__(self, searchstr, listonly)
    def visitmatch(self, fname, text):
        self.matches.append(fname)

if __name__ == '__main__':
    visitor = CollectVisitor(sys.argv[1])
    visitor.run(startDir='.')
    print 'Found these files:'
    for fname in visitor.matches: print fname
CollectVisitor
is just a
tree search again, with a new kind of specialization—collecting
files instead of printing messages. This class is useful from other
scripts that mean to collect a matched files list for later
processing; it can be run by itself as a script too:
C:\...\PP3E>python PyTools\visitor_collect.py -exec
...
...more deleted...
...
1342 => .\old_Part2\Basics\unpack2b.py
1343 => .\old_Part2\Basics\unpack3.py
1344 => .\old_Part2\Basics\__init__.py
Found these files:
.\package.csh
.\README-PP3E.txt
.\readme-old-pp1E.txt
.\PyTools\cleanpyc.py
.\PyTools\fixeoln_all.py
.\System\Processes\output.txt
.\Internet\Cgi-Web\fixcgi.py
Here, the items in the collected list are displayed at the end: all the files containing the string -exec. Notice, though, that traversal status messages are still printed along the way (in fact, I deleted about 1,600 lines of such messages here!). In a tool meant to be called from another script, that may be an undesirable side effect; the calling script’s output may be more important than the traversal’s.
We could add mode flags to SearchVisitor to turn off status messages, but that makes it more complex. Instead, the following two files show how we might go about collecting matched filenames without letting any traversal messages show up in the console, all without changing the original code base. The first, shown in Example 7-15, simply takes over and copies the search logic, without print statements. It’s a bit redundant with SearchVisitor, but only in a few lines of mimicked code.
Example 7-15. PP3E\PyTools\visitor_collect_quiet1.py
##############################################################
# Like visitor_collect, but avoid traversal status messages
##############################################################

import os, sys
from visitor import FileVisitor, SearchVisitor

class CollectVisitor(FileVisitor):
    """
    collect names of files containing a string, silently;
    """
    skipexts = SearchVisitor.skipexts
    def __init__(self, searchStr):
        self.matches = []
        self.context = searchStr
    def visitdir(self, dname): pass
    def visitfile(self, fname):
        if (os.path.splitext(fname)[1] not in self.skipexts and
            open(fname).read().find(self.context) != -1):
            self.matches.append(fname)

if __name__ == '__main__':
    visitor = CollectVisitor(sys.argv[1])
    visitor.run(startDir='.')
    print 'Found these files:'
    for fname in visitor.matches: print fname
When this class is run, only the contents of the matched filenames list show up at the end; no status messages appear during the traversal. Because of that, this form may be more useful as a general-purpose tool used by other scripts:
C:\...\PP3E>python PyTools\visitor_collect_quiet1.py -exec
Found these files:
.\package.csh
.\README-PP3E.txt
.\readme-old-pp1E.txt
.\PyTools\cleanpyc.py
.\PyTools\fixeoln_all.py
.\System\Processes\output.txt
.\Internet\Cgi-Web\fixcgi.py
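On Python 3, a quiet collector like Example 7-15 can shrink to a generator over os.walk. It prints nothing and yields matching paths one at a time; callers who want a list can wrap it in list(). This is a sketch: the name collect_matches is hypothetical, and it assumes UTF-8 text with undecodable bytes ignored.

```python
import os

def collect_matches(root, key, skipexts=('.gif', '.exe', '.pyc', '.o', '.a')):
    """Yield paths of files at and below root whose text contains key.

    Produces no status output, so callers see only the results.
    """
    for dirpath, dirnames, filenames in os.walk(root):
        for fname in filenames:
            if os.path.splitext(fname)[1] in skipexts:
                continue                       # skip likely binary files
            path = os.path.join(dirpath, fname)
            with open(path, encoding='utf-8', errors='ignore') as f:
                if key in f.read():
                    yield path

# usage: matches = list(collect_matches('.', '-exec'))
```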
A more interesting and less redundant way to suppress printed text during a traversal is to apply the stream redirection tricks we met in Chapter 3. Example 7-16 sets sys.stdout to a NullOut object that throws away all printed text for the duration of the traversal (its write method does nothing). We could also use the StringIO module we met in Chapter 3 for this purpose, but it’s overkill here; we don’t need to retain printed text.
The only real complication with this scheme is that there is no good place to insert a restoration of sys.stdout at the end of the traversal; instead, we code the restore in the __del__ destructor method and require clients to delete the visitor to resume printing as usual. An explicitly called method would work just as well, if you prefer less magical interfaces.
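The restoration can also be made explicit instead of relying on __del__, as in the Python 3 sketch below; the run_quietly helper is hypothetical. A try/finally statement swaps sys.stdout back even if the wrapped call raises an exception, so clients need not delete anything.

```python
import sys

class NullOut:
    """File-like object that discards everything written to it."""
    def write(self, text):
        pass
    def flush(self):
        pass

def run_quietly(func, *args):
    """Call func(*args) with printed output discarded.

    The finally clause restores sys.stdout even if func raises, so no
    __del__ method (or del statement in the client) is needed.
    """
    saveout, sys.stdout = sys.stdout, NullOut()
    try:
        return func(*args)
    finally:
        sys.stdout = saveout

# usage (hypothetical): run_quietly(visitor.run, '.'); then read visitor.matches
```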
Example 7-16. PP3E\PyTools\visitor_collect_quiet2.py
##############################################################
# Like visitor_collect, but avoid traversal status messages
##############################################################

import sys
from visitor import SearchVisitor

class NullOut:
    def write(self, line): pass

class CollectVisitor(SearchVisitor):
    """
    collect names of files containing a string, silently
    """
    def __init__(self, searchstr, listonly=False):
        self.matches = []
        self.saveout, sys.stdout = sys.stdout, NullOut()
        SearchVisitor.__init__(self, searchstr, listonly)
    def __del__(self):
        sys.stdout = self.saveout
    def visitmatch(self, fname, text):
        self.matches.append(fname)

if __name__ == '__main__':
    visitor = CollectVisitor(sys.argv[1])
    visitor.run(startDir='.')
    matches = visitor.matches
    del visitor
    print 'Found these files:'
    for fname in matches: print fname
When this script is run, output is identical to the prior
run—just the matched filenames at the end. Perhaps better still,
why not code and debug just one verbose CollectVisitor
utility class, and
require clients to wrap calls to its run
method in the redirect.redirect
function we wrote in
Example 3-10?
>>> from PP3E.PyTools.visitor_collect import CollectVisitor
>>> from PP3E.System.Streams.redirect import redirect
>>> walker = CollectVisitor('-exec')                 # object to find '-exec'
>>> output = redirect(walker.run, ('.',), '')        # function, args, input
>>> for line in walker.matches: print line           # print items in list
...
.\package.csh
.\README-PP3E.txt
.\readme-old-pp1E.txt
.\PyTools\cleanpyc.py
.\PyTools\fixeoln_all.py
.\System\Processes\output.txt
.\Internet\Cgi-Web\fixcgi.py
The redirect
call
employed here resets standard input and output streams to
file-like objects for the duration of any
function call; because of that, it’s a more general way to
suppress output than recoding every outputter. Here, it has the
effect of intercepting (and hence suppressing) printed messages
during a walker.run('.')
traversal. They really are printed, but show
up in the string result of the redirect
call, not on the screen:
>>> output[:60]
'. ...\n1 => .\\autoexec.bat\n2 => .\\cleanall.csh\n3 => .\\echoEnv'
>>> len(output), len(output.split('\n'))             # bytes, lines
(67609, 1592)
>>> walker.matches
['.\\package.csh', '.\\README-PP3E.txt', '.\\readme-old-pp1E.txt', '.\\PyTools\\cleanpyc.py', '.\\PyTools\\fixeoln_all.py', '.\\System\\Processes\\output.txt', '.\\Internet\\Cgi-Web\\fixcgi.py']
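Python 3.4 and later include a standard-library tool in the same spirit as redirect. contextlib.redirect_stdout temporarily replaces sys.stdout, and pairing it with io.StringIO captures printed text as a string. The capture_output and fake_walk names below are hypothetical. Unlike redirect, this sketch handles only the output stream, not standard input.

```python
import contextlib
import io

def capture_output(func, *args):
    """Call func(*args) and return (result, printed_text)."""
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):   # print() now writes to buffer
        result = func(*args)
    return result, buffer.getvalue()

def fake_walk(n):
    """Stand-in for a chatty traversal: prints one status line per file."""
    for i in range(1, n + 1):
        print(i, '=>', 'file%d' % i)
    return n

result, output = capture_output(fake_walk, 3)
print(result, len(output.splitlines()))        # prints: 3 3
```

As with redirect, every captured line stays in memory until the call returns.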
Because redirect saves printed text in a string, it may be less appropriate than the two quiet CollectVisitor variants for functions that generate much output. Here, for example, 67,609 bytes of output were queued up in an in-memory string (see the len call results); whether a buffer of that size matters depends on the application.
In more general terms, redirecting sys.stdout to dummy objects as done here is a simple way to turn off outputs (and is the equivalent of the Unix notion of redirecting output to the file /dev/null, a file that discards everything sent to it). For instance, we’ll pull this trick out of the bag again in the context of server-side Internet scripting, to prevent utility status messages from showing up in generated web page output streams.[*]
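The /dev/null analogy can be made literal. os.devnull names the platform's null device: '/dev/null' on Unix-like systems and 'nul' on Windows. The minimal Python 3 sketch below sends printed output there, so nothing accumulates in memory the way it does with a string buffer; run_silenced is a hypothetical name.

```python
import contextlib
import os

def run_silenced(func, *args):
    """Call func(*args) with printed output sent to the null device."""
    with open(os.devnull, 'w') as devnull, contextlib.redirect_stdout(devnull):
        return func(*args)

# usage (hypothetical): run_silenced(walker.run, '.')
```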
Be warned: once you’ve written and debugged a class that knows how to do something useful like walking directory trees, it’s easy for it to spread throughout your system utility libraries. Of course, that’s the whole point of code reuse. For instance, very soon after writing the visitor classes presented in the prior sections, I recoded both the fixnames_all.py and the fixeoln_all.py directory walker scripts listed earlier in Examples 7-6 and 7-4, respectively, to use visitor rather than proprietary tree-walk logic (they both originally used find.find). Example 7-17 combines the original convertEndlines function (to fix end-of-lines in a single file) with visitor’s tree walker class, to yield an alternative implementation of the line-end converter for directory trees.
Example 7-17. PP3E\PyTools\visitor_fixeoln.py
##############################################################
# Use: "python visitor_fixeoln.py todos|tounix".
# recode fixeoln_all.py as a visitor subclass: this version
# uses os.path.walk, not find.find to collect all names first;
# limited but fast: if os.path.splitext(fname)[1] in patts:
##############################################################

import visitor, sys, fnmatch, os
from fixeoln_dir import patts
from fixeoln_one import convertEndlines

class EolnFixer(visitor.FileVisitor):
    def visitfile(self, fullname):                  # match on basename
        basename = os.path.basename(fullname)       # to make result same
        for patt in patts:                          # else visits fewer
            if fnmatch.fnmatch(basename, patt):
                convertEndlines(self.context, fullname)
                self.fcount += 1                    # could break here
                                                    # but results differ
if __name__ == '__main__':
    walker = EolnFixer(sys.argv[1])
    walker.run()
    print 'Files matched (converted or not):', walker.fcount
As we saw in Chapter 4,
the built-in fnmatch
module
performs Unix shell-like filename matching; this script uses it to
match names to the previous version’s filename patterns (simply
looking for filename extensions after a “.” is simpler, but not as
general):
C:\temp\examples>python %X%/PyTools/visitor_fixeoln.py tounix
. ...
Changing .\echoEnvironment.pyw
Changing .\Launcher.py
Changing .\Launch_PyGadgets.py
Changing .\Launch_PyDemos.pyw
...more deleted...
Changing .\PyTools\visitor_find.py
Changing .\PyTools\visitor_fixnames.py
Changing .\PyTools\visitor_find_quiet2.py
Changing .\PyTools\visitor_find_quiet1.py
Changing .\PyTools\fixeoln_one.doc.txt
Files matched (converted or not): 1065

C:\temp\examples>python %X%/PyTools/visitor_fixeoln.py tounix
...more deleted...
.\Extend\Swig\Shadow ...
. ...
.\EmbExt\Exports ...
.\EmbExt\Exports\ClassAndMod ...
.\EmbExt\Regist ...
.\PyTools ...
Files matched (converted or not): 1065
If you run this script and the original
fixeoln_all.py on the book examples tree,
you’ll notice that this version visits two fewer matched files. This
simply reflects the fact that fixeoln_all
also collects and skips over
two directory names for its patterns in the find.find
result (both called “Output”).
In all other ways, this version works the same way even when it
could do better; adding a break
statement after the convertEndlines
call here avoids visiting
files that appear redundantly in the original’s find results
lists.
The second command here takes roughly two-thirds as long as the first to finish on my computer (there are no files to be converted). That’s roughly 33 percent faster than the original find.find-based version of this script, but they differ in the amount of output, and benchmarks are usually much subtler than you imagine. Most of the real clock time is likely spent scrolling text in the console, not doing any real directory processing. Since both are plenty fast for their intended purposes, finer-grained performance figures are left as exercises.
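Here is a small Python 3 sketch of the matching logic in Example 7-17. fnmatch handles a shell pattern such as 'visitor*', which an extension test with os.path.splitext could not express. The stop_at_first flag shows why leaving out the break counts a file once per matching pattern. The pattern list is made up for illustration; the real one is imported from fixeoln_dir and isn't shown here.

```python
import fnmatch

# hypothetical patterns; the real list comes from fixeoln_dir.patts
patts = ['*.py', '*.txt', 'visitor*']

def count_matches(names, patts, stop_at_first):
    """Count pattern hits the way EolnFixer.visitfile increments fcount.

    Without a break, a name matching several patterns is counted (and
    would be converted) once per pattern; with one, just once.
    """
    count = 0
    for name in names:
        for patt in patts:
            if fnmatch.fnmatch(name, patt):
                count += 1
                if stop_at_first:
                    break
    return count

names = ['visitor.py', 'notes.txt', 'logo.gif']
print(count_matches(names, patts, False))   # prints: 3 (visitor.py hit twice)
print(count_matches(names, patts, True))    # prints: 2
```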
The script in Example
7-18 combines the original convertOne
function (to rename a single
file or directory) with the visitor’s tree walker class, to create a
directory tree-wide fix for uppercase filenames. Notice that we
redefine both file and directory visitation methods here, as we need
to rename both.
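Renaming directories during a top-down walk can be tricky, because the walker may still need the old name to reach the files inside. The Python 3 sketch below avoids that by walking bottom-up with os.walk(topdown=False), so each directory is renamed only after its contents. The helper lowercase_names is hypothetical, and it simply lowercases names rather than asking first as convertOne does. It also assumes no two names in a directory differ only by case; otherwise os.rename could overwrite an existing file or fail, depending on the platform.

```python
import os

def lowercase_names(root, listonly=False):
    """Rename files and directories below root to all-lowercase names.

    Walks bottom-up so a directory is renamed only after everything
    inside it; root itself is left alone. Returns the number of names
    that needed converting (renamed unless listonly is true).
    """
    ccount = 0
    for dirpath, dirnames, filenames in os.walk(root, topdown=False):
        for name in filenames + dirnames:
            if name != name.lower():
                ccount += 1
                if not listonly:
                    os.rename(os.path.join(dirpath, name),
                              os.path.join(dirpath, name.lower()))
    return ccount
```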
Example 7-18. PP3E\PyTools\visitor_fixnames.py
###############################################################
# recode fixnames_all.py name case fixer with the Visitor class
# note: "from fixnames_all import convertOne" doesn't help at
# top level of the fixnames class, since it is assumed to be a
# method and called with extra self argument (an exception);
###############################################################

from visitor import FileVisitor

class FixnamesVisitor(FileVisitor):
    """
    check filenames at and below startDir for uppercase
    """
    import fixnames_all
    def __init__(self, listonly=False):
        FileVisitor.__init__(self, listonly=listonly)
        self.ccount = 0
    def rename(self, pathname):
        if not self.listonly:
            convertflag = self.fixnames_all.convertOne(pathname)
            self.ccount += convertflag
    def visitdir(self, dirname):
        FileVisitor.visitdir(self, dirname)
        self.rename(dirname)
    def visitfile(self, filename):
        FileVisitor.visitfile(self, filename)
        self.rename(filename)

if __name__ == '__main__':
    walker = FixnamesVisitor()
    walker.run()
    allnames = walker.fcount + walker.dcount
    print 'Converted %d files, visited %d' % (walker.ccount, allnames)
This version is run like the original find.find-based version, fixnames_all, but visits one more name (the top-level root directory), and there is no initial delay while filenames are collected on a list; we’re using os.path.walk again, not find.find. It’s also close to the original os.path.walk version of this script but is based on a class hierarchy, not direct function callbacks:
C:\temp\examples>python %X%/PyTools/visitor_fixnames.py
...more deleted...
303 => .\__init__.py
304 => .\__init__.pyc
305 => .\Ai\ExpertSystem\holmes.tar
306 => .\Ai\ExpertSystem\TODO
Convert dir=.\Ai\ExpertSystem file=TODO? (y|Y)
307 => .\Ai\ExpertSystem\__init__.py
308 => .\Ai\ExpertSystem\holmes\cnv
309 => .\Ai\ExpertSystem\holmes\README.1ST
Convert dir=.\Ai\ExpertSystem\holmes file=README.1ST? (y|Y)
...more deleted...
1353 => .\PyTools\visitor_find.pyc
1354 => .\PyTools\visitor_find_quiet1.py
1355 => .\PyTools\fixeoln_one.doc.txt
Converted 1 files, visited 1474
Both of these fixer scripts work roughly the same way as the originals, but because the directory-walking logic lives in just one file (visitor.py), it needs to be debugged only once. Moreover, improvements in that file will automatically be inherited by every directory-processing tool derived from its classes. Even when coding system-level scripts, reuse and reduced redundancy pay off in the end.
Just in case the preceding visitor-client sections weren’t quite enough to convince you of the power of code reuse, another piece of evidence surfaced very late in this book project. It turns out that copying files off a CD using Windows drag-and-drop sometimes makes them read only in the copy. That’s less than ideal for the book examples distribution if it is obtained on CD; you must copy the directory tree onto your hard drive to be able to experiment with program changes (naturally, files on CD can’t be changed in place). But if you copy with drag-and-drop, you may wind up with a tree of more than 1,000 read-only files.
The book CD use cases described for this and some other examples in this chapter are something of a historic artifact today. As mentioned in the Preface, as of this third edition, the book’s examples are made available on the Web instead of on an enclosed CD.
The Web is more pervasive today and allows for much more dynamic updates. However, even though the book CD is a vestige of the past, the examples that were originally coded to manage it still apply to other types of CDs and so are generally useful tools.
Since drag-and-drop is perhaps the most common way to copy off a CD on Windows, I needed a portable and easy-to-use way to undo the read-only setting. Asking readers to make all of these writable by hand would be impolite, to say the least. Writing a full-blown install system seemed like overkill. Providing different fixes for different platforms doubles or triples the complexity of the task.
Much better, the Python script in Example 7-19 can be run in the
root of the copied examples directory to repair the damage of a
read-only drag-and-drop operation. It specializes the traversal
implemented by the FileVisitor
class again, this time to run an os.chmod
call on every file and directory
visited along the way.
Example 7-19. PP3E\PyTools\fixreadonly-all.py
#!/usr/bin/env python
###########################################################################
# Use: python PyTools\fixreadonly-all.py
# run this script in the top-level examples directory after copying all
# examples off the book's CD-ROM, to make all files writable again--by
# default, copying files off the CD with Windows drag-and-drop (at least)
# may create them as read-only on your hard drive; this script traverses
# entire directory tree at and below the dir it is run in (all subdirs);
###########################################################################

import os
from PP3E.PyTools.visitor import FileVisitor       # os.path.walk wrapper
listonly = False

class FixReadOnly(FileVisitor):
    def __init__(self, listonly=0):
        FileVisitor.__init__(self, listonly=listonly)
    def visitdir(self, dname):                       # must match the base
        FileVisitor.visitdir(self, dname)            # class's method name
        if self.listonly:
            return
        os.chmod(dname, 0777)
    def visitfile(self, fname):
        FileVisitor.visitfile(self, fname)
        if self.listonly:
            return
        os.chmod(fname, 0777)

if __name__ == '__main__':
    # don't run auto if clicked
    go = raw_input('This script makes all files writeable; continue?')
    if go != 'y':
        raw_input('Canceled - hit enter key')
    else:
        walker = FixReadOnly(listonly)
        walker.run()
        print 'Visited %d files and %d dirs' % (walker.fcount, walker.dcount)
As we saw in Chapter 3, the built-in os.chmod call changes the permission settings on an external file (here, to 0777: global read, write, and execute permissions). Because os.chmod and the FileVisitor’s operations are portable, this same script will work to set permissions in an entire tree on both Windows and Unix-like platforms; on Windows, os.chmod honors only the write permission bit, which maps to the file’s read-only flag, and that is exactly the setting this script needs to clear. Notice that it asks whether you really want to proceed when it first starts up, just in case someone accidentally clicks the file’s name in an explorer GUI. Also note that Python must already be installed before this script can be run to make files writable; that seems a fair assumption to make about users who are about to change Python scripts.
C:\temp\examples>python PyTools\fixreadonly-all.py
This script makes all files writeable; continue?y
. ...
1 => .\autoexec.bat
2 => .\cleanall.csh
3 => .\echoEnvironment.pyw
...more deleted...
1352 => .\PyTools\visitor_find.pyc
1353 => .\PyTools\visitor_find_quiet1.py
1354 => .\PyTools\fixeoln_one.doc.txt
Visited 1354 files and 119 dirs
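Granting 0777 to everything is a blunt instrument, since it also marks every data file executable on Unix. A gentler alternative, sketched below in Python 3 rather than the 2.x used by this chapter's listings, adds only the owner-write bit and leaves the other permission bits alone. It walks with os.walk instead of the FileVisitor class, and make_writable and make_tree_writable are hypothetical names, not part of the book's PyTools package.

```python
import os
import stat


def make_writable(path):
    """Add the owner-write bit to path, keeping its other permission bits.

    On Windows, os.chmod honors only this bit (it maps to the read-only
    flag); on Unix-like systems the existing bits are preserved.
    """
    mode = stat.S_IMODE(os.stat(path).st_mode)    # current permission bits only
    os.chmod(path, mode | stat.S_IWRITE)          # S_IWRITE is the owner-write bit


def make_tree_writable(root):
    """Apply make_writable to root and everything below it; return (files, dirs)."""
    nfiles = ndirs = 0
    for dirpath, dirnames, filenames in os.walk(root):
        make_writable(dirpath)
        ndirs += 1
        for name in filenames:
            make_writable(os.path.join(dirpath, name))
            nfiles += 1
    return nfiles, ndirs
```

For example, make_tree_writable('.') run in the root of a copied examples tree returns a (files, dirs) pair comparable to the "Visited" line above; because stat.S_IWRITE is the bit Windows' os.chmod pays attention to, the same call clears the read-only flag there.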
Finally, the following script does something more specialized: it uses the visitor classes to replace the “#!” lines at the top of all scripts in a directory tree (this line gives the path to the Python interpreter on Unix-like machines). It’s easy to do this with the visitor_replace script of Example 7-13 that we coded earlier. For example, say something like this to replace all #!/usr/bin/python lines with #!\Python24\python:
C:\...\PP3E>python PyTools\visitor_replace.py #!/usr/bin/python #!\Python24\python
Lots of status messages scroll by unless redirected to a file.
visitor_replace does a simple global search-and-replace operation on all nonbinary files in an entire directory tree. It’s also a bit naïve: it won’t change other “#!” line patterns that mention python (e.g., you’ll have to run it again to change #!/usr/local/bin/python), and it may also change matching text that appears elsewhere in a file, not just on the first line. That probably won’t matter, but if it does, it’s easy to write your own visitor subclass to be more accurate.
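To see the difference concretely, here is a hedged, string-level sketch in Python 3 (the function names are hypothetical, and no files are touched) contrasting a global replacement in the spirit of visitor_replace with a first-line-only rule that rewrites any “#!” line naming python.

```python
def naive_replace(text, old, new):
    """Global replace in the spirit of visitor_replace: every occurrence changes."""
    return text.replace(old, new)


def fix_first_line(text, new_first_line):
    """Rewrite only a first line that starts with '#!' and mentions 'python'."""
    lines = text.splitlines(True)                    # True keeps each line's ending
    if lines and lines[0].startswith('#!') and 'python' in lines[0]:
        first = lines[0]
        ending = first[len(first.rstrip('\r\n')):]   # '\n', '\r\n', or ''
        lines[0] = new_first_line + ending
    return ''.join(lines)


script = '#!/usr/local/bin/python\nprint("see #!/usr/bin/python")\n'
print(naive_replace(script, '#!/usr/bin/python', '#!/opt/py/python'))
print(fix_first_line(script, '#!/opt/py/python'))
```

The naive version misses the #!/usr/local/bin/python header yet rewrites text inside the script's body; the first-line version fixes the header, leaves the body alone, and keeps the file's original line-end characters.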
When run, the script in Example 7-20 converts all “#!” lines in all script files in an entire tree. It changes every first line that starts with “#!” and names “python” to a line you pass in on the command line or assign in the script, like this:
C:\...\PP3E>python PyTools\visitor_poundbang.py #!\MyPython24\python
Are you sure?y
. ...
1 => .\__init__.py
2 => .\PyDemos2.pyw
3 => . owriteable.py
...
1474 => .\Integrate\Mixed\Exports\ClassAndMod\output.prog1
1475 => .\Integrate\Mixed\Exports\ClassAndMod\setup-class.csh
Visited 1475 files and 133 dirs, changed 190 files
. owriteable.py
.\Launch_PyGadgets.py
.\Launch_PyDemos.pyw
...
C:\...\PP3E>type .\Launch_PyGadgets.py
#!\MyPython24\python
###############################################
# PyGadgets + environment search/config first
...
This script caught and changed 190 files (more than visitor_replace did), so there must be other “#!” line patterns lurking in the examples tree besides #!/usr/bin/python.
Example 7-20. PP3E\PyTools\visitor_poundbang.py
##########################################################################
# change all "#!...python" source lines at the top of scripts to either
# commandline arg or changeToDefault, in all files in all dirs at and
# below the dir where run; could skip binary filename extensions too,
# but works ok; this version changes all #! first lines that name python,
# and so is more accurate than a simple visitor_replace.py run;
##########################################################################
r"""
Run me like this, to convert all scripts in the book examples tree,
and redirect/save messages to a file:
C:\...\PP3E>python PyTools\visitor_poundbang.py #!\MyPython24\python > out.txt
"""

import sys
from PP3E.PyTools.visitor import FileVisitor       # reuse the walker classes
changeToDefault = r'#!\Python24\python'             # used if no cmdline arg

class PoundBangFixer(FileVisitor):
    def __init__(self, changeTo=changeToDefault):
        FileVisitor.__init__(self)
        self.changeTo = changeTo
        self.clist = []                             # names of changed files
    def visitfile(self, fullname):
        FileVisitor.visitfile(self, fullname)
        try:
            lines = open(fullname, 'r').readlines()
            if (len(lines) > 0 and
                lines[0][0:2] == '#!' and           # or lines[0].startswith()
                'python' in lines[0]                # or lines[0].find() != -1
               ):
                lines[0] = self.changeTo + '\n'     # new first line ends with newline
                open(fullname, 'w').writelines(lines)
                self.clist.append(fullname)
        except:
            print 'Error translating %s -- skipped' % fullname
            print '...', sys.exc_info()

if __name__ == '__main__':
    if raw_input('Are you sure?') != 'y': sys.exit()
    if len(sys.argv) == 2: changeToDefault = sys.argv[1]
    walker = PoundBangFixer(changeToDefault)
    walker.run()
    print 'Visited %d files and %d dirs,' % (walker.fcount, walker.dcount),
    print 'changed %d files' % len(walker.clist)
    for fname in walker.clist: print fname
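Because this run found more “#!” variants than the simple replacement did, it can help to survey a tree before choosing what to change. The following Python 3 sketch (pound_bang_patterns is a hypothetical helper, not part of PP3E, and it walks with os.walk rather than FileVisitor) tallies each distinct python “#!” first line it finds, reading in binary mode so that nontext files do not raise decoding errors.

```python
import os
from collections import Counter


def pound_bang_patterns(root):
    """Count distinct '#!' first lines that mention 'python' in all files under root."""
    counts = Counter()
    for dirpath, dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                with open(path, 'rb') as f:      # binary: tolerate nontext files
                    first = f.readline()
            except OSError:
                continue                          # unreadable file: skip it
            if first.startswith(b'#!') and b'python' in first:
                counts[first.strip().decode('latin-1')] += 1
    return counts
```

Calling pound_bang_patterns('.') and printing counts.most_common() would list each header variant with the number of files using it, which makes it easier to decide whether a single replacement string covers them all.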
We’ve seen a few techniques for scanning directory trees in this book so far. To summarize and contrast, this section briefly lists four scripts that count the number of lines in all program source files in an entire tree. Each script uses a different directory traversal scheme, but returns the same result.
I counted 41,938 source lines of code (SLOC) in the book examples distribution with these scripts, as of November 2001 (for the second edition of this book). Study these scripts’ code for more details. They don’t count everything (e.g., they skip makefiles), but are comprehensive enough for ballpark figures. Here’s the output for the visitor class version when run on the root of the book examples tree; the root of the tree to walk is passed in as a command-line argument, and the last output line is a dictionary that keeps counts for the specific file-type extensions in the tree:
C:\temp>python wcall_visitor.py %X%
...lines deleted...
C:\PP2ndEd\examples\PP3E\Integrate\Mixed\Exports\ClassAndMod\cinterface.py
C:\PP2ndEd\examples\PP3E\Integrate\Mixed\Exports\ClassAndMod\main-table.c
Visited 1478 files and 133 dirs
--------------------------------------------------------------------------------
Files=> 903 Lines=> 41938
{'.c': 46, '.cgi': 24, '.html': 41, '.pyw': 11, '.cxx': 2, '.py': 768,
'.i': 3, '.h': 8}
The first version, listed in Example 7-21, counts lines using the standard library’s os.path.walk call, which we met in Chapter 4 (using os.walk would be similar, but we would replace the callback function with a for loop, and subdirectories and files would be segregated into two lists of names).
Example 7-21. PP3E\PyTools\wcall.py
##################################################################
# count lines in all source files in tree; os.path.walk version
##################################################################

import os, sys
allLines = allFiles = 0
allExts = ['.py', '.pyw', '.cgi', '.html', '.c', '.cxx', '.h', '.i']
allSums = dict.fromkeys(allExts, 0)        # files counted per extension

def sum(dir, file, ext):
    global allFiles, allLines
    print file
    fname = os.path.join(dir, file)
    lines = open(fname).readlines()
    allFiles += 1                          # or allFiles = allFiles + 1
    allLines += len(lines)
    allSums[ext] += 1

def wc(ignore, dir, fileshere):
    for file in fileshere:                 # includes subdirectory names too
        for ext in allExts:
            if file.endswith(ext):         # or file[-len(ext):] == ext
                sum(dir, file, ext)
                break

if __name__ == '__main__':
    os.path.walk(sys.argv[1], wc, None)    # cmd arg=root dir
    print '-'*80
    print 'Files=>', allFiles, 'Lines=>', allLines
    print allSums
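The os.walk alternative mentioned before Example 7-21 might look something like the following Python 3 sketch; count_lines and SOURCE_EXTS are hypothetical names, and this is not one of the book's four scripts. Because os.walk separates file names from subdirectory names, this variant never tallies a directory that happens to have a source-like extension.

```python
import os
import sys

SOURCE_EXTS = ['.py', '.pyw', '.cgi', '.html', '.c', '.cxx', '.h', '.i']


def count_lines(root, exts=SOURCE_EXTS):
    """Return (files, lines, per-extension file counts) for source files under root."""
    all_files = all_lines = 0
    per_ext = dict.fromkeys(exts, 0)
    for dirpath, dirnames, filenames in os.walk(root):  # for loop replaces the callback
        for name in filenames:                          # files only; subdirs are in dirnames
            for ext in exts:
                if name.endswith(ext):
                    path = os.path.join(dirpath, name)
                    with open(path, 'rb') as f:          # binary avoids decode errors
                        all_lines += sum(1 for _ in f)
                    all_files += 1
                    per_ext[ext] += 1
                    break
    return all_files, all_lines, per_ext


if __name__ == '__main__' and len(sys.argv) > 1:
    files, lines, sums = count_lines(sys.argv[1])
    print('-' * 80)
    print('Files=>', files, 'Lines=>', lines)
    print(sums)
```

Given a root directory argument on the command line, it prints the same three summary lines as Example 7-21; without one, it just defines count_lines for use elsewhere.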
Example 7-22, which counts with the find module we wrote at the end of Chapter 4, is noticeably simpler, though we must wait for the full list of files to be collected before any counting starts.
Example 7-22. PP3E\PyTools\wcall_find.py
###################################################################
# count lines in all source files in tree; find file list version
###################################################################

import sys
from wcall import allExts
from PP3E.PyTools.find import find

allLines = allFiles = 0
allSums = dict.fromkeys(allExts, 0)

def sum(fname, ext):
    global allFiles, allLines
    print fname
    lines = open(fname).readlines()
    allFiles += 1
    allLines += len(lines)
    allSums[ext] += 1

for file in find('*', sys.argv[1]):
    for ext in allExts:
        if file.endswith(ext):
            sum(file, ext)
            break

print '-'*80
print 'Files=>', allFiles, 'Lines=>', allLines
print allSums
The prior script collected every file in the tree with find and manually checked their extensions; the next script (Example 7-23) uses the pattern-matching capability in find to collect only source files in the result list.
Example 7-23. PP3E\PyTools\wcall_find_patt.py
##################################################################
# count lines in all source files in tree; find patterns version
##################################################################

import sys
from wcall import allExts
from PP3E.PyTools.find import find

allLines = allFiles = 0
allSums = dict.fromkeys(allExts, 0)

def sum(fname, ext):
    global allFiles, allLines
    print fname
    lines = open(fname).readlines()
    allFiles += 1
    allLines += len(lines)
    allSums[ext] += 1

for ext in allExts:
    files = find('*' + ext, sys.argv[1])
    for file in files:
        sum(file, ext)

print '-'*80
print 'Files=>', allFiles, 'Lines=>', allLines
print allSums
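In Python 3, the standard library's pathlib module offers the same kind of pattern-driven collection without the book's find module. This hedged sketch swaps in Path.rglob for find (a different technique, not the chapter's code); count_by_pattern and SOURCE_EXTS are hypothetical names. Its is_file check also sidesteps the directory-name miscount that the next paragraph describes.

```python
from pathlib import Path

SOURCE_EXTS = ['.py', '.pyw', '.cgi', '.html', '.c', '.cxx', '.h', '.i']


def count_by_pattern(root, exts=SOURCE_EXTS):
    """Collect files per extension with a glob pattern, then count their lines."""
    all_files = all_lines = 0
    per_ext = dict.fromkeys(exts, 0)
    for ext in exts:
        for path in Path(root).rglob('*' + ext):   # recursive, like find('*' + ext)
            if not path.is_file():                  # skip dirs with source-like names
                continue
            with path.open('rb') as f:
                all_lines += sum(1 for _ in f)
            all_files += 1
            per_ext[ext] += 1
    return all_files, all_lines, per_ext
```

Glob patterns match whole names, so '*.h' does not pick up .html files and '*.py' does not pick up .pyw files; each source file is counted once.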
And finally, Example 7-24 is the SLOC counting logic refactored to use the visitor class framework we wrote in this chapter; OOP adds a bit more code here, but this version is more accurate (if a directory name happens to have a source-like extension, the prior versions will incorrectly tally it). More importantly, by using OOP:
We get the superclass’s walking logic for free, including a directory counter.
We have a self-contained package of names that supports multiple independent instances and can be used more easily in other contexts.
We can further customize this operation because it is a class.
We will automatically inherit any changes made to visitor in the future.
Even in the system tools domain, strategic thinking can pay off eventually.
Example 7-24. PP3E\PyTools\wcall_visitor.py
##################################################################
# count lines in all source files in tree; visitor class version
##################################################################

import sys
from wcall import allExts
from PP3E.PyTools.visitor import FileVisitor

class WcAll(FileVisitor):
    def __init__(self):
        FileVisitor.__init__(self)
        self.allLines = self.allFiles = 0
        self.allSums = dict.fromkeys(allExts, 0)
    def sum(self, fname, ext):
        print fname
        lines = open(fname).readlines()
        self.allFiles += 1
        self.allLines += len(lines)
        self.allSums[ext] += 1
    def visitfile(self, filepath):
        self.fcount += 1                   # count here, not via superclass visitfile
        for ext in allExts:
            if filepath.endswith(ext):
                self.sum(filepath, ext)
                break

if __name__ == '__main__':
    walker = WcAll()
    walker.run(sys.argv[1])
    print 'Visited %d files and %d dirs' % (walker.fcount, walker.dcount)
    print '-'*80
    print 'Files=>', walker.allFiles, 'Lines=>', walker.allLines
    print walker.allSums
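To make the customization point concrete without depending on the PP3E package, here is a Python 3 sketch built on MiniVisitor, a hypothetical, stripped-down stand-in for FileVisitor (its method names and behavior are assumptions and differ from the real class). LineCounter plays the role of WcAll, and NonBlankLineCounter customizes it by redefining one hook method.

```python
import os


class MiniVisitor:
    """Hypothetical stand-in for a FileVisitor-style walker (not the PP3E class)."""
    def __init__(self):
        self.fcount = self.dcount = 0

    def run(self, start_dir=os.curdir):
        for dirpath, dirnames, filenames in os.walk(start_dir):
            self.visit_dir(dirpath)
            for name in filenames:
                self.visit_file(os.path.join(dirpath, name))

    def visit_dir(self, dirpath):
        self.dcount += 1

    def visit_file(self, filepath):
        self.fcount += 1


class LineCounter(MiniVisitor):
    """Count lines in files whose names end with one of exts."""
    def __init__(self, exts=('.py', '.c', '.h')):
        MiniVisitor.__init__(self)
        self.exts = tuple(exts)
        self.all_files = self.all_lines = 0

    def count(self, lines):
        return len(lines)                      # hook: subclasses may redefine

    def visit_file(self, filepath):
        MiniVisitor.visit_file(self, filepath)
        if filepath.endswith(self.exts):       # str.endswith accepts a tuple
            with open(filepath, 'rb') as f:
                self.all_lines += self.count(f.readlines())
            self.all_files += 1


class NonBlankLineCounter(LineCounter):
    """Customization by subclassing: skip blank lines."""
    def count(self, lines):
        return sum(1 for line in lines if line.strip())
```

Both counters inherit the same walking and file/directory counting logic; only the count hook differs, which is the kind of reuse the list above describes.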
[*] For the impatient: see commonhtml.runsilent in the PyMailCGI system presented in Chapter 17. It’s a variation on redirect.redirect that discards output as it is printed (instead of retaining it in a string), returns the return value of the function called (not the output string), and lets exceptions pass via a try/finally statement (instead of catching and reporting them with a try/except). It’s still redirection at work, though.
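The behavior this footnote describes can be sketched in a few lines of Python 3. This is not the PyMailCGI commonhtml.runsilent code; run_silent and _NullWriter are hypothetical names chosen for illustration. Modern Python also provides contextlib.redirect_stdout for the same kind of temporary redirection.

```python
import sys


class _NullWriter:
    """File-like object that discards everything written to it."""
    def write(self, text):
        return len(text)

    def flush(self):
        pass


def run_silent(func, *args, **kwargs):
    """Call func with stdout discarded; return its result and let exceptions propagate."""
    saved = sys.stdout
    sys.stdout = _NullWriter()            # output is dropped as it is printed
    try:
        return func(*args, **kwargs)      # the function's result, not its output
    finally:
        sys.stdout = saved                # restored even if func raises
```

Because the restore happens in a finally clause rather than an except clause, any exception raised by func reaches the caller unchanged, with sys.stdout already put back.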