Copying Directory Trees

The next three sections conclude this chapter by exploring a handful of additional utilities for processing directories (a.k.a. folders) on your computer with Python. They present directory copy, deletion, and comparison scripts that demonstrate system tools at work. All of these were born of necessity, are generally portable among all Python platforms, and illustrate Python development concepts along the way.

Some of these scripts do something too specialized for the visitor module’s classes we’ve been applying in earlier sections of this chapter, and so require custom solutions (e.g., we can’t remove directories we intend to walk through). Most have platform-specific equivalents too (e.g., drag-and-drop copies), but the Python utilities shown here are portable, easily customized, callable from other scripts, and surprisingly fast.

A Python Tree Copy Script

My CD writer sometimes does weird things. In fact, copies of files with odd names can be totally botched on the CD, even though other files show up in one piece. That’s not necessarily a showstopper; if just a few files are trashed in a big CD backup copy, I can always copy the offending files to floppies one at a time. Unfortunately, Windows drag-and-drop copies don’t play nicely with such a CD: the copy operation stops and exits the moment the first bad file is encountered. You get only as many files as were copied up to the error, but no more.

In fact, this is not limited to CD copies. I’ve run into similar problems when trying to back up my laptop’s hard drive to another drive—the drag-and-drop copy stops with an error as soon as it reaches a file with a name that is too long to copy (common in saved web pages). The last 45 minutes spent copying is wasted time; frustrating, to say the least!

There may be some magical Windows setting to work around this feature, but I gave up hunting for one as soon as I realized that it would be easier to code a copier in Python. The cpall.py script in Example 7-25 is one way to do it. With this script, I control what happens when bad files are found—I can skip over them with Python exception handlers, for instance. Moreover, this tool works with the same interface and effect on other platforms. It seems to me, at least, that a few minutes spent writing a portable and reusable Python script to meet a need is a better investment than looking for solutions that work on only one platform (if at all).

Example 7-25. PP3E\System\Filetools\cpall.py

############################################################################
# Usage: "python cpall.py dirFrom dirTo".
# Recursive copy of a directory tree.  Works like a "cp -r dirFrom/* dirTo"
# Unix command, and assumes that dirFrom and dirTo are both directories.
# Was written to get around fatal error messages under Windows drag-and-drop
# copies (the first bad file ends the entire copy operation immediately),
# but also allows for coding customized copy operations.  May need to
# do more file type checking on Unix: skip links, fifos, etc.
############################################################################

import os, sys
verbose = 0
dcount = fcount = 0
maxfileload = 500000
blksize = 1024 * 100

def cpfile(pathFrom, pathTo, maxfileload=maxfileload):
    """
    copy file pathFrom to pathTo, byte for byte
    """
    if os.path.getsize(pathFrom) <= maxfileload:
        bytesFrom = open(pathFrom, 'rb').read()   # read small file all at once
        open(pathTo, 'wb').write(bytesFrom)       # need b mode on Windows
    else:
        fileFrom = open(pathFrom, 'rb')           # read big files in chunks
        fileTo   = open(pathTo,   'wb')           # need b mode here too
        while 1:
            bytesFrom = fileFrom.read(blksize)    # get one block, less at end
            if not bytesFrom: break               # empty after last chunk
            fileTo.write(bytesFrom)

def cpall(dirFrom, dirTo):
    """
    copy contents of dirFrom and below to dirTo
    """
    global dcount, fcount
    for file in os.listdir(dirFrom):                      # for files/dirs here
        pathFrom = os.path.join(dirFrom, file)
        pathTo   = os.path.join(dirTo,   file)            # extend both paths
        if not os.path.isdir(pathFrom):                   # copy simple files
            try:
                if verbose > 1: print 'copying', pathFrom, 'to', pathTo
                cpfile(pathFrom, pathTo)
                fcount = fcount+1
            except:
                print 'Error copying', pathFrom, 'to', pathTo, '--skipped'
                print sys.exc_info()[0], sys.exc_info()[1]
        else:
            if verbose: print 'copying dir', pathFrom, 'to', pathTo
            try:
                os.mkdir(pathTo)                          # make new subdir
                cpall(pathFrom, pathTo)                   # recur into subdirs
                dcount = dcount+1
            except:
                print 'Error creating', pathTo, '--skipped'
                print sys.exc_info()[0], sys.exc_info()[1]

def getargs():
    try:
        dirFrom, dirTo = sys.argv[1:]
    except:
        print 'Use: cpall.py dirFrom dirTo'
    else:
        if not os.path.isdir(dirFrom):
            print 'Error: dirFrom is not a directory'
        elif not os.path.exists(dirTo):
            os.mkdir(dirTo)
            print 'Note: dirTo was created'
            return (dirFrom, dirTo)
        else:
            print 'Warning: dirTo already exists'
            if dirFrom == dirTo or (hasattr(os.path, 'samefile') and
                                    os.path.samefile(dirFrom, dirTo)):
                print 'Error: dirFrom same as dirTo'
            else:
                return (dirFrom, dirTo)

if __name__ == '__main__':
    import time
    dirstuple = getargs()
    if dirstuple:
        print 'Copying...'
        start = time.time()
        cpall(*dirstuple)
        print 'Copied', fcount, 'files,', dcount, 'directories',
        print 'in', time.time() - start, 'seconds'

This script implements its own recursive tree traversal logic and keeps track of both the “from” and “to” directory paths as it goes. At every level, it copies over simple files, creates directories in the “to” path, and recurs into subdirectories with “from” and “to” paths extended by one level. There are other ways to code this task (e.g., other cpall variants in the book’s examples distribution change the working directory along the way with os.chdir calls), but extending paths on descent works well in practice.

Notice this script’s reusable cpfile function—just in case there are multigigabyte files in the tree to be copied, it uses a file’s size to decide whether it should be read all at once or in chunks (remember, the file read method without arguments actually loads the entire file into an in-memory string). We choose fairly large file and block sizes, because the more we read at once in Python, the faster our scripts will typically run. This is more efficient than it may sound; strings left behind by prior reads will be garbage collected and reused as we go.
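
The size test described above can be restated in a few lines. The following is only a sketch in Python 3 syntax (the listing above is Python 2), and the names copy_file, MAX_FILE_LOAD, and BLOCK_SIZE are hypothetical stand-ins for the listing's cpfile, maxfileload, and blksize. It adds with statements so that both files are closed explicitly, and it returns a byte count purely for illustration:

```python
import os

MAX_FILE_LOAD = 500000      # hypothetical name; same value as the listing's maxfileload
BLOCK_SIZE = 1024 * 100     # hypothetical name; same value as the listing's blksize

def copy_file(path_from, path_to,
              max_file_load=MAX_FILE_LOAD, block_size=BLOCK_SIZE):
    """Copy path_from to path_to byte for byte; return the bytes written."""
    copied = 0
    # binary mode on both ends, so no line-end translation occurs
    with open(path_from, 'rb') as file_from, open(path_to, 'wb') as file_to:
        if os.path.getsize(path_from) <= max_file_load:
            data = file_from.read()             # small file: load all at once
            file_to.write(data)
            copied = len(data)
        else:
            while True:                         # big file: one block at a time
                chunk = file_from.read(block_size)
                if not chunk:                   # empty bytes after the last block
                    break
                file_to.write(chunk)
                copied += len(chunk)
    return copied
```

Passing a small max_file_load forces the chunked branch, which is a convenient way to exercise both paths on an ordinary file.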

Also note that this script creates the “to” directory if needed, but it assumes that the directory is empty when a copy starts up; be sure to remove the target directory before copying a new tree to its name (more on this in the next section).

Here is a copy of a large book examples tree in action on Windows. Pass in the names of the “from” and “to” directories to kick off the process; redirect the output to a file if there are too many error messages to read all at once (e.g., > output.txt); and run an rm shell command (or a similar platform-specific tool) to delete the target directory first if needed:

C:\temp>rm -rf cpexamples

C:\temp>python %X%\system\filetools\cpall.py examples cpexamples
Note: dirTo was created
Copying...
Copied 1356 files, 118 directories in 2.41999995708 seconds

C:\temp>fc /B examples\System\Filetools\cpall.py
              cpexamples\System\Filetools\cpall.py
Comparing files examples\System\Filetools\cpall.py and
cpexamples\System\Filetools\cpall.py
FC: no differences encountered

At the time I wrote this example in 2000, this test run copied a tree of 1,356 files and 118 directories in 2.4 seconds on my 650 MHz Windows 98 laptop (the built-in time.time call can be used to query the system time in seconds). It runs a bit slower if some other programs are open on the machine, and may run arbitrarily faster or slower for you. Still, this is at least as fast as the best drag-and-drop I’ve timed on Windows.

So how does this script work around bad files on a CD backup? The secret is that it catches and ignores file exceptions, and it keeps walking. To copy all the files that are good on a CD, I simply run a command line such as this one:

C:\temp>python %X%\system\filetools\cpall_visitor.py
                            g:\PP3rdEd\examples\PP3E cpexamples

Because the CD is addressed as “G:” on my Windows machine, this is the command-line equivalent of drag-and-drop copying from an item in the CD’s top-level folder, except that the Python script will recover from errors on the CD and get the rest. On copy errors, it prints a message to standard output and continues; for big copies, you’ll probably want to redirect the script’s output to a file for later inspection.
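
The catch-and-continue idea can be sketched on its own as follows. This is an illustration in Python 3 syntax, not the book's code: the function name copy_tree_skipping_errors is hypothetical, shutil.copyfile stands in for the cpfile function, and failures are collected in a list instead of being printed, so that a caller can decide how to report them. It assumes that dir_to already exists.

```python
import os
import shutil

def copy_tree_skipping_errors(dir_from, dir_to, errors=None):
    """
    Copy the contents of dir_from into the existing directory dir_to.
    Items that fail are recorded and skipped instead of ending the copy.
    Returns (files_copied, dirs_created, errors).
    """
    if errors is None:
        errors = []
    fcount = dcount = 0
    for name in os.listdir(dir_from):
        path_from = os.path.join(dir_from, name)    # extend both paths
        path_to = os.path.join(dir_to, name)
        try:
            if os.path.isdir(path_from):
                os.mkdir(path_to)                   # make the new subdirectory
                dcount += 1
                f, d, _ = copy_tree_skipping_errors(path_from, path_to, errors)
                fcount += f
                dcount += d
            else:
                shutil.copyfile(path_from, path_to)
                fcount += 1
        except OSError as exc:
            errors.append((path_from, exc))         # note the failure, keep walking
    return fcount, dcount, errors
```

Catching OSError rather than every exception is a design choice made for this sketch; the listing above uses a bare except, which also traps unrelated errors.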

In general, cpall can be passed any absolute directory path on your machine, even those that indicate devices such as CDs. To make this go on Linux, try the directory where your CD is mounted, such as /mnt/cdrom or something similar; a device file such as /dev/cdrom is not itself a directory that can be listed.

Recoding Copies with a Visitor-Based Class

When I first wrote the cpall script just discussed, I couldn’t see a way that the visitor class hierarchy we met earlier would help. Two directories needed to be traversed in parallel (the original and the copy), and visitor is based on climbing one tree with os.path.walk. There seemed no easy way to keep track of where the script was in the copy directory.

The trick I eventually stumbled onto is not to keep track at all. Instead, the script in Example 7-26 simply replaces the “from” directory path string with the “to” directory path string, at the front of all directory names and pathnames passed in from os.path.walk. The results of the string replacements are the paths to which the original files and directories are to be copied.
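
The path-mapping idea is independent of the visitor classes and can be sketched with the standard os.walk generator. The code below is an illustration in Python 3 syntax (os.path.walk is not available in Python 3); the function name copy_tree_by_prefix is hypothetical, os.path.relpath plays the role of slicing off the “from” prefix, and shutil.copyfile stands in for cpfile. It assumes that to_dir does not yet exist but its parent does, and it makes no attempt to skip errors.

```python
import os
import shutil

def copy_tree_by_prefix(from_dir, to_dir):
    """
    Walk from_dir once and map each path into to_dir by swapping
    the leading from_dir part for to_dir.
    Returns (files_copied, dirs_created), counting the root directory.
    """
    fcount = dcount = 0
    for dirpath, dirnames, filenames in os.walk(from_dir):   # top-down by default
        rel = os.path.relpath(dirpath, from_dir)              # '.' at the root
        to_path = os.path.normpath(os.path.join(to_dir, rel))
        os.mkdir(to_path)                 # parents are visited before children
        dcount += 1
        for name in filenames:
            shutil.copyfile(os.path.join(dirpath, name),
                            os.path.join(to_path, name))
            fcount += 1
    return fcount, dcount
```

Because the root directory is itself created and counted, a copier written this way reports one more directory than a recursive copier that counts only subdirectories, which would account for a difference such as 119 versus 118 directories between two runs over the same tree.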

Example 7-26. PP3E\System\Filetools\cpall_visitor.py

###########################################################
# Use: "python cpall_visitor.py fromDir toDir"
# cpall, but with the visitor classes and os.path.walk;
# the trick is to do string replacement of fromDir with
# toDir at the front of all the names walk passes in;
# assumes that the toDir does not exist initially;
###########################################################

import os
from PP3E.PyTools.visitor import FileVisitor
from cpall import cpfile, getargs
verbose = True

class CpallVisitor(FileVisitor):
    def __init__(self, fromDir, toDir):
        self.fromDirLen = len(fromDir) + 1
        self.toDir      = toDir
        FileVisitor.__init__(self)
    def visitdir(self, dirpath):
        toPath = os.path.join(self.toDir, dirpath[self.fromDirLen:])
        if verbose: print 'd', dirpath, '=>', toPath
        os.mkdir(toPath)
        self.dcount += 1
    def visitfile(self, filepath):
        toPath = os.path.join(self.toDir, filepath[self.fromDirLen:])
        if verbose: print 'f', filepath, '=>', toPath
        cpfile(filepath, toPath)
        self.fcount += 1

if __name__ == '__main__':
    import sys, time
    fromDir, toDir = sys.argv[1:3]
    if len(sys.argv) > 3: verbose = 0
    print 'Copying...'
    start = time.time()
    walker = CpallVisitor(fromDir, toDir)
    walker.run(startDir=fromDir)
    print 'Copied', walker.fcount, 'files,', walker.dcount, 'directories',
    print 'in', time.time() - start, 'seconds'

This version accomplishes roughly the same goal as the original, but it has made a few assumptions to keep code simple. The “to” directory is assumed not to exist initially, and exceptions are not ignored along the way. Here it is copying the book examples tree again on Windows:

C:\temp>rm -rf cpexamples

C:\temp>python %X%\system\filetools\cpall_visitor.py
                                           examples cpexamples -quiet
Copying...
Copied 1356 files, 119 directories in 2.09000003338 seconds

C:\temp>fc /B examples\System\Filetools\cpall.py
              cpexamples\System\Filetools\cpall.py
Comparing files examples\System\Filetools\cpall.py and
cpexamples\System\Filetools\cpall.py
FC: no differences encountered

Despite the extra string slicing going on, this version runs just as fast as the original. For tracing purposes, this version also prints all the “from” and “to” copy paths during the traversal unless you pass in a third argument on the command line or set the script’s verbose variable to False or 0:

C:\temp>python %X%\system\filetools\cpall_visitor.py examples cpexamples
Copying...
d examples => cpexamples
f examples\autoexec.bat => cpexamples\autoexec.bat
f examples\cleanall.csh => cpexamples\cleanall.csh
...more deleted...
d examples\System => cpexamples\System
f examples\System\System.txt => cpexamples\System\System.txt
f examples\System\more.py => cpexamples\System\more.py
f examples\System\reader.py => cpexamples\System\reader.py
...more deleted...
Copied 1356 files, 119 directories in 2.31000006199 seconds
