As we saw in the
previous chapter, sockets see plenty of action on the Net. For
instance, the getfile
example at the end of that
chapter allowed us to transfer entire files between machines. In
practice, though, higher-level protocols are behind much of what
happens on the Net. Protocols run on top of sockets, but hide much of
the complexity of the network scripting examples we’ve just
seen.
FTP -- the File Transfer
Protocol -- is one of the more commonly used Internet protocols.
It defines a higher-level conversation model that is based on
exchanging command strings and file contents over sockets. By using
FTP, we can accomplish the same task as the prior chapter’s
getfile
script, but the interface is simpler, and
standard -- FTP lets us ask for files from any server machine that
supports FTP, without requiring that it run our custom
getfile
script. FTP also supports more advanced
operations such as uploading files to the server, getting remote
directory listings, and more.
Really, FTP runs on top of two sockets: one for passing control
commands between client and server (port 21), and another for
transferring bytes. By using a two-socket model, FTP avoids the
possibility of deadlocks (i.e., transfers on the data socket do not
block dialogs on the control socket). Ultimately, though,
Python’s ftplib
support module allows us to
upload and download files at a remote server machine by FTP, without
dealing in raw socket calls or FTP protocol details.
Because the Python FTP interface is so easy to use, let’s jump right into a realistic example. The script in Example 11-1 automatically fetches and builds Python with Python. No, this isn’t a recursive chicken-and-egg thought exercise -- you must already have installed Python to run this program. More specifically, this Python script does the following:
Downloads the Python source distribution by FTP
Unpacks and compiles the distribution into a Python executable
The download portion will run on any machine with Python and sockets; the unpacking and compiling code assumes a Unix-like build environment as coded here, but could be tweaked to work with other platforms.
Example 11-1. PP2E\Internet\Ftp\getpython.py
#!/usr/local/bin/python
###############################################################
# A Python script to download and build Python's source code.
# Uses ftplib, the ftp protocol handler which uses sockets.
# Ftp runs on 2 sockets (one for data, one for control--on
# ports 20 and 21) and imposes message text formats, but the
# Python ftplib module hides most of this protocol's details.
###############################################################

import os
from ftplib import FTP                  # socket-based ftp tools
Version = '1.5'                         # version to download
tarname = 'python%s.tar.gz' % Version   # remote/local file name

print 'Connecting...'
localfile  = open(tarname, 'wb')        # where to store download
connection = FTP('ftp.python.org')      # connect to ftp site
connection.login()                      # default is anonymous login
connection.cwd('pub/python/src')        # xfer 1k at a time to localfile

print 'Downloading...'
connection.retrbinary('RETR ' + tarname, localfile.write, 1024)
connection.quit()
localfile.close()

print 'Unpacking...'
os.system('gzip -d ' + tarname)         # decompress
os.system('tar -xvf ' + tarname[:-3])   # strip .gz

print 'Building...'
os.chdir('Python-' + Version)           # build Python itself
os.system('./configure')                # assumes unix-style make
os.system('make')
os.system('make test')
print 'Done: see Python-%s/python.' % Version
Most of the FTP protocol details are encapsulated by the Python
ftplib
module imported here. This script uses some
of the simplest interfaces in ftplib
(we’ll
see others in a moment), but they are representative of the module in
general:
connection = FTP('ftp.python.org') # connect to ftp site
To
open a connection to a remote (or local) FTP server, create an
instance of the ftplib.FTP
object, passing in the
name (domain or IP-style) of the machine you wish to connect to.
Assuming this call doesn’t throw an exception, the resulting
FTP object exports methods that correspond to the usual FTP
operations. In fact, Python scripts act much like typical FTP client
programs -- just replace commands you would normally type or
select with method calls:
connection.login()                      # default is anonymous login
connection.cwd('pub/python/src')        # xfer 1k at a time to localfile
Once connected, we log in, and go to the remote directory we want to
fetch a file from. The login
method allows us to
pass in additional optional arguments to specify a username and
password; by default it performs anonymous FTP:
connection.retrbinary('RETR ' + tarname, localfile.write, 1024)
connection.quit()
Once we’re in the target directory, we simply call the
retrbinary method to download the target server
file in binary mode. The retrbinary call will take
a while to complete, since it must download a big file. It gets three
arguments:
An FTP command string -- here, a string of the form RETR filename,
which is the standard format for FTP retrievals.
A function or method to which Python passes each chunk of the
downloaded file’s bytes -- here, the
write
method of a newly created and opened local
file.
A size for those chunks of bytes -- here, 1024 bytes are downloaded at a time, but the default is reasonable if this argument is omitted.
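To make the callback argument concrete, here is a minimal sketch of the pattern that retrbinary follows: read a block, hand it to the callback, and repeat until no bytes remain. The feed_chunks function is a hypothetical stand-in written for illustration, not ftplib's own code, and it reads from an in-memory stream rather than a socket so that it runs without a server. It uses Python 3 syntax, unlike the listings in this chapter.

```python
import io

def feed_chunks(source, callback, blocksize=1024):
    """Hypothetical stand-in for a retrbinary-style loop: pass each block to callback."""
    count = 0
    while True:
        block = source.read(blocksize)     # read up to blocksize bytes
        if not block:                      # an empty result means end of data
            break
        callback(block)                    # for example, localfile.write
        count += 1
    return count

remote = io.BytesIO(b'x' * 2500)           # pretend this is the data connection
local = io.BytesIO()                       # pretend this is the open local file
blocks = feed_chunks(remote, local.write, 1024)
print(blocks, len(local.getvalue()))
```

Because the callback is just a callable that accepts bytes, anything of that shape will do: a file's write method, a list's append method, or a function that updates a progress display.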
Because this script opens a local file object, localfile, under the
same name as the remote file being fetched, and passes its write
method to the FTP retrieval method, the remote file’s contents will
automatically appear in a local, client-side file after the download
is finished. By the way, notice that this file is opened in
“wb” binary output mode; if this script is run on
Windows, we want to avoid automatically expanding any \n
bytes into \r\n byte
sequences (that happens automatically on Windows when writing files
opened in “w” text mode).
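One way to see the difference on any platform is to ask Python for Windows-style line endings explicitly. The following is a minimal sketch in Python 3 syntax; the file names are made up for the demonstration, and the newline argument is used here only to imitate what text mode does on Windows.

```python
import os
import tempfile

data = 'line one\nline two\n'              # 18 characters, two of them \n

workdir = tempfile.mkdtemp()
textpath = os.path.join(workdir, 'text.out')
binpath = os.path.join(workdir, 'binary.out')

# newline='\r\n' imitates Windows text mode: each \n written becomes \r\n
with open(textpath, 'w', newline='\r\n') as f:
    f.write(data)

# binary mode stores the bytes exactly as given
with open(binpath, 'wb') as f:
    f.write(data.encode('ascii'))

text_size = os.path.getsize(textpath)
binary_size = os.path.getsize(binpath)
print(text_size, binary_size)
```

The text-mode file comes out two bytes larger, one extra byte per line. For a compressed tar file, stray bytes like these would corrupt the download.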
Finally, we call the FTP quit
method to break the
connection with the server and manually close
the
local file to force it to be complete before it is further processed
by the shell commands spawned by os.system
(it’s not impossible that parts of the file are still held in
buffers before the close
call):
connection.quit()
localfile.close()
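The effect of the close call can be sketched without FTP at all. This small example, in Python 3 syntax with a made-up file name, writes a few bytes and checks the file's size on disk only after closing it, which is the point at which other programs can rely on the contents being complete.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), 'download.bin')   # hypothetical local file
localfile = open(path, 'wb')
localfile.write(b'a few bytes')     # may still be sitting in an output buffer
localfile.close()                   # flushes buffered bytes out to the file

size_after_close = os.path.getsize(path)
print(size_after_close)
```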
And that’s all there is to it; all the FTP, socket, and
networking details are hidden behind the ftplib
interface module. Here is this script in action on a Linux machine,
with a couple thousand output lines cut in the interest of brevity:
[lutz@starship test]$ python getpython.py
Connecting...
Downloading...
Unpacking...
Python-1.5/
Python-1.5/Doc/
Python-1.5/Doc/ref/
Python-1.5/Doc/ref/.cvsignore
Python-1.5/Doc/ref/fixps.py
...
...lots of tar lines deleted...
...
Python-1.5/Tools/webchecker/webchecker.py
Python-1.5/Tools/webchecker/websucker.py
Building...
creating cache ./config.cache
checking MACHDEP... linux2
checking CCC...
checking for --without-gcc... no
checking for gcc... gcc
...
...lots of build lines deleted...
...
Done: see Python-1.5/python.

[lutz@starship test]$ cd Python-1.5/
[lutz@starship Python-1.5]$ ./python
Python 1.5 (#1, Jul 12 2000, 12:35:52) [GCC egcs-2.91.66 19990314/Li on linux2
Copyright 1991-1995 Stichting Mathematisch Centrum, Amsterdam
>>> print 'The Larch!'
The Larch!
Such a script could be automatically executed at regular intervals (e.g., by a Unix cron job) to update a local Python install with a fresh build. But the thing to notice here is that this otherwise typical Python script fetches information from an arbitrarily remote FTP site and machine. Given an Internet link, any information published by an FTP server on the Net can be fetched by and incorporated into Python scripts using interfaces such as these.
In fact, FTP is just one way to transfer
information across the Net, and there are more general tools in the
Python library to accomplish the prior script’s download.
Perhaps the most straightforward is the Python
urllib module: given an Internet address
string -- a URL, or Uniform Resource Locator -- this module
opens a connection to the specified server and returns a file-like
object ready to be read with normal file object method calls (e.g.,
read, readlines).
We can use such a higher-level interface to download anything with an
address on the Web -- files published by FTP sites (using URLs
that start with “ftp://”), web pages and outputs of
scripts that live on remote servers (using “http://”
URLs), local files (using “file://” URLs), Gopher server
data, and more. For instance, the script in Example 11-2 does the same as the one in Example 11-1, but it uses the general
urllib module to fetch the source distribution
file, instead of the protocol-specific ftplib.
Example 11-2. PP2E\Internet\Ftp\getpython-urllib.py
#!/usr/local/bin/python
###################################################################
# A Python script to download and build Python's source code
# use higher-level urllib instead of ftplib to fetch file
# urllib supports ftp, http, and gopher protocols, and local files
# urllib also allows downloads of html pages, images, text, etc.;
# see also Python html/xml parsers for web pages fetched by urllib;
###################################################################

import os
import urllib                           # socket-based web tools
Version = '1.5'                         # version to download
tarname = 'python%s.tar.gz' % Version   # remote/local file name

remoteaddr = 'ftp://ftp.python.org/pub/python/src/' + tarname
print 'Downloading', remoteaddr

# this works too:
# urllib.urlretrieve(remoteaddr, tarname)

remotefile = urllib.urlopen(remoteaddr)     # returns input file-like object
localfile  = open(tarname, 'wb')            # where to store data locally
localfile.write(remotefile.read())
localfile.close()
remotefile.close()

# the rest is the same
execfile('buildPython.py')
Don’t sweat the details of the URL string used here;
we’ll talk much more about URLs in the next chapter.
We’ll also use urllib
again in this and
later chapters to fetch web pages, format generated URL strings, and
get the output of remote scripts on the Web.[80] Technically
speaking, urllib
supports a variety of Internet
protocols (HTTP, FTP, Gopher, and local files) and is only used for
reading remote objects (not writing or uploading them); its
retrievals must generally be run in threads if blocking is a concern.
But the basic interface shown in this script is straightforward. The
call:
remotefile = urllib.urlopen(remoteaddr) # returns input file-like object
contacts the server named in the remoteaddr
URL
string and returns a file-like object connected to its download
stream (an FTP-based socket). Calling this file’s
read
method pulls down the file’s contents,
which are written to a local client-side file. An even simpler
interface:
urllib.urlretrieve(remoteaddr, tarname)
also does the work of opening a local file and writing the downloaded bytes into it -- things we do manually in the script as coded. This comes in handy if we mean to download a file, but is less useful if we want to process its data immediately.
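Both styles can be tried without a network connection by pointing them at a “file://” URL. The sketch below is written for Python 3, where these calls live in the urllib.request module rather than in urllib itself; the temporary file names are invented for the demonstration.

```python
import os
import pathlib
import shutil
import tempfile
from urllib.request import urlopen, urlretrieve

# build a small local "server file" and a file:// URL that names it
workdir = tempfile.mkdtemp()
source = os.path.join(workdir, 'source.txt')
with open(source, 'wb') as f:
    f.write(b'spam\n' * 3)
remoteaddr = pathlib.Path(source).as_uri()

# manual style: open the URL, then copy its bytes to a local file
copy1 = os.path.join(workdir, 'copy1.txt')
remotefile = urlopen(remoteaddr)               # input file-like object
with open(copy1, 'wb') as localfile:
    shutil.copyfileobj(remotefile, localfile)
remotefile.close()

# one-call style: urlretrieve opens and writes the local file itself
copy2 = os.path.join(workdir, 'copy2.txt')
urlretrieve(remoteaddr, copy2)
```

The manual style leaves room to inspect or transform the bytes as they arrive; the one-call style is shorter when all we want is a copy on disk.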
Either way, the end result is the same: the desired server file shows
up on the client machine. The remainder of the script -- unpacking
and building -- is identical to the original version, so
it’s been moved to a reusable Python file run with the
execfile
built-in (recall that
execfile
runs a file as though its code were
pasted into the place where the execfile
appears).
The script is shown in Example 11-3.
Example 11-3. PP2E\Internet\Ftp\buildPython.py
#!/usr/local/bin/python
###############################################################
# A Python script to build Python from its source code.
# Run me in directory where Python source distribution lives.
###############################################################

import os
Version = '1.5'                         # version to build
tarname = 'python%s.tar.gz' % Version   # remote/local file name

print 'Unpacking...'
os.system('gzip -d ' + tarname)         # decompress file
os.system('tar -xvf ' + tarname[:-3])   # untar without '.gz'

print 'Building...'
os.chdir('Python-' + Version)           # build Python itself
os.system('./configure')                # assumes unix-style make
os.system('make')
os.system('make test')
print 'Done: see Python-%s/python.' % Version
The output this time is almost identical to the output of Example 11-1, so I’ll show only a few portions (the
gzip
message appears if you don’t delete a
tar file left by a run in the past):
[lutz@starship test]$ python getpython-urllib.py
Downloading ftp://ftp.python.org/pub/python/src/python1.5.tar.gz
Unpacking...
gzip: python1.5.tar already exists; do you wish to overwrite (y or n)? y
...tar lines...
Building...
...build lines...
Done: see Python-1.5/python.

[lutz@starship test]$ python buildPython.py
Unpacking...
...tar and build lines...
In fact, although the original script is all top-level code that runs immediately and accomplishes only one task, there really are two potentially reusable activities within it: fetching a file and building Python from source. By splitting each part off into a module of its own, we can reuse its program logic in other contexts, which naturally leads us to the topic in the next section.
Almost invariably, when I present the ftplib
interfaces in Python classes, students ask why programmers need to
supply the RETR string in the retrieval method. It’s a good
question -- the RETR string is the name of the download command in
the FTP protocol, but ftplib
is supposed to
encapsulate that protocol. As we’ll see in a moment, we have to
supply an arguably odd STOR string for uploads as well. It’s
boilerplate code that you accept on faith once you see it, but that
begs the question. You could always email Guido a proposed
ftplib
patch, but that’s not really a good
answer for beginning Python students.[81]
A better answer is that there is no law against extending the
standard library modules with higher-level interfaces of our
own -- with just a few lines of reusable code, we can make the FTP
interface look any way we want in Python. For instance, we could,
once and for all, write utility modules that wrap the
ftplib
interfaces to hide the RETR string. If we
place these utility modules in a directory on PYTHONPATH, they become
just as accessible as ftplib
itself, automatically
reusable in any Python script we write in the future. Besides
removing the RETR string requirement, a wrapper module could also
make assumptions that simplify FTP operations into single function
calls.
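As a rough idea of how little code such a wrapper takes, here is a minimal sketch in Python 3 syntax. The fetch_binary name and its signature are hypothetical, invented for this illustration; the only ftplib call it relies on is the retrbinary method of an already connected FTP object.

```python
def fetch_binary(connection, filename, blocksize=1024):
    """Hypothetical helper: download filename to a same-named local file."""
    with open(filename, 'wb') as localfile:
        # the RETR command string is supplied here, once, instead of by every caller
        connection.retrbinary('RETR ' + filename, localfile.write, blocksize)

# Intended use, assuming a reachable server and an existing remote file:
#   from ftplib import FTP
#   connection = FTP('ftp.python.org')
#   connection.login()
#   connection.cwd('pub/python/src')
#   fetch_binary(connection, 'python1.5.tar.gz')
#   connection.quit()
```

Callers of a function like this never see the protocol command string at all.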
For instance, given a module that encapsulates and simplifies
ftplib, our Python fetch-and-build script could be
further reduced to the script shown in Example 11-4
-- essentially just a function call and file execution.
Example 11-4. PP2E\Internet\Ftp\getpython-modular.py
#!/usr/local/bin/python
################################################################
# A Python script to download and build Python's source code.
# Uses getfile.py, a utility module which encapsulates ftp step.
################################################################

import getfile
Version = '1.5'                         # version to download
tarname = 'python%s.tar.gz' % Version   # remote/local file name

# fetch with utility
getfile.getfile(tarname, 'ftp.python.org', 'pub/python/src')

# rest is the same
execfile('buildPython.py')
Besides having a line count that is much more impressive to
marketeers, the meat of this script has been split off into files for
reuse elsewhere. If you ever need to download a file again, simply
import an existing function rather than copying code with
cut-and-paste editing. Changes in download operations would need to
be made in only one file, not everywhere we’ve copied
boilerplate code; getfile.getfile
could even be
changed to use urllib
instead of
ftplib
without affecting any of its clients.
It’s good engineering.
So just how would we go about writing such an FTP interface wrapper
(he asks, knowingly)? Given the ftplib
library
module, wrapping downloads of a particular file in a particular
directory is straightforward. Connected FTP objects support two
download methods:
The
retrbinary
method downloads the requested file in
binary mode, sending its bytes in chunks to a supplied function,
without line-feed mapping. Typically, the supplied function is a
write method of an open local file object, such that the bytes are
placed in the local file on the client.
The retrlines
method downloads the requested file
in ASCII text mode, sending each line of text to a supplied function
with all end-of-line characters stripped. Typically, the supplied
function adds a \n
newline (mapped appropriately
for the client machine), and writes the line to a local file.
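The line-oriented callback described above might look like the following minimal sketch, in Python 3 syntax. The line_writer helper is hypothetical; it is exercised here with a short list of strings standing in for the lines that retrlines would deliver, one call per line.

```python
import io

def line_writer(localfile):
    """Return a callback that puts back the newline stripped from each line."""
    def write_line(line):
        localfile.write(line + '\n')       # text-mode files map \n for the platform
    return write_line

local = io.StringIO()                      # stands in for a file opened in 'w' mode
callback = line_writer(local)
for line in ['first line', 'second line']:
    callback(line)
print(repr(local.getvalue()))

# With a connected FTP object the same callback would be passed as:
#   connection.retrlines('RETR ' + filename, line_writer(localfile))
```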
We will meet the retrlines
method in a later
example; the getfile
utility module in Example 11-5 always transfers in binary mode with
retrbinary. That is, files are downloaded exactly
as they were on the server, byte for byte, with the server’s
line-feed conventions in text files. You may need to convert
line-feeds after downloads if they look odd in your text
editor -- see the converter tools in Chapter 5 for pointers.
Example 11-5. PP2E\Internet\Ftp\getfile.py
#!/usr/local/bin/python
#################################################
# Fetch an arbitrary file by ftp.  Anonymous
# ftp unless you pass a user=(name, pswd) tuple.
# Gets the Monty Python theme song by default.
#################################################

from ftplib  import FTP          # socket-based ftp tools
from os.path import exists       # file existence test

file = 'sousa.au'                # default file coordinates
site = 'ftp.python.org'          # monty python theme song
dir  = 'pub/python/misc'

def getfile(file=file, site=site, dir=dir, user=(), verbose=1, force=0):
    """
    fetch a file by ftp from a site/directory
    anonymous or real login, binary transfer
    """
    if exists(file) and not force:
        if verbose: print file, 'already fetched'
    else:
        if verbose: print 'Downloading', file
        local = open(file, 'wb')                # local file of same name
        try:
            remote = FTP(site)                  # connect to ftp site
            apply(remote.login, user)           # anonymous=() or (name, pswd)
            remote.cwd(dir)
            remote.retrbinary('RETR ' + file, local.write, 1024)
            remote.quit()
        finally:
            local.close()                       # close file no matter what
        if verbose: print 'Download done.'      # caller handles exceptions

if __name__ == '__main__': getfile()            # anonymous python.org login
This module is mostly just a repackaging of the FTP code we used to
fetch the Python source distribution earlier, to make it simpler and
reusable. Because it is a callable function, the exported
getfile.getfile
here tries to be as robust and
generally useful as possible, but even a function this small implies
some design decisions. Here are a few usage notes:
The getfile function in this script runs in anonymous
FTP mode by default, but a two-item tuple containing a username and
password string may be passed to the user argument
to log in to the remote server in non-anonymous mode. To use
anonymous FTP, either don’t pass the user argument or pass it
an empty tuple, (). The FTP object login
method allows two optional arguments to
denote a username and password, and the apply call
in Example 11-5 sends it whatever argument tuple you
pass to user.
If passed, the last two arguments (verbose, force)
allow us to turn off status messages
printed to the stdout stream (perhaps undesirable
in a GUI context) and force downloads to happen even if the file
already exists locally (the download overwrites the existing local
file).
The caller is expected to handle exceptions; this function wraps
downloads in a try
/finally
statement to guarantee that the local output file is closed, but lets
exceptions propagate. If used in a GUI or run from a thread, for
instance, exceptions may require special handling unknown in this
file.
If run standalone, this file downloads a sousa.au audio file from ftp.python.org as a self-test, but the function will normally be passed FTP filenames, site names, and directory names explicitly.
This script is careful to open the local output file in
“wb” binary mode to suppress end-line mapping, in case it
is run on Windows. As we learned in Chapter 2,
it’s not impossible that true binary data files may have bytes
whose value is equal to a \n
line-feed character;
opening in “w” text mode instead would make these bytes
be automatically expanded to a \r\n
two-byte
sequence when written locally on Windows. This is only an issue for
portability to Windows (mode “w” works elsewhere). Again,
see Chapter 5 for line-feed converter tools.
This function currently uses the same filename to identify both the
remote file and the local file where the download should be stored.
As such, it should be run in the directory where you want the file to
show up; use os.chdir
to move to directories if
needed. (We could instead assume filename is the
local file’s name, and strip the local directory with
os.path.split
to get the remote name, or accept
two distinct filename arguments -- local and remote.)
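Since the function leaves error handling to its caller, it may help to see what the calling side could look like. This is a minimal sketch in Python 3 syntax: try_fetch and failing_fetch are hypothetical names, and failing_fetch merely stands in for a getfile-style function whose server rejects the request. The ftplib.all_errors tuple is part of the standard module and groups the exceptions an FTP session can raise.

```python
import ftplib

def try_fetch(fetch, *args, **kwargs):
    """Call a getfile-style function; return None on success, else the error raised."""
    try:
        fetch(*args, **kwargs)
    except ftplib.all_errors as exc:       # FTP reply errors plus socket and I/O errors
        return exc
    return None

def failing_fetch(filename):               # hypothetical stand-in for a failed download
    raise ftplib.error_perm('550 %s: No such file' % filename)

error = try_fetch(failing_fetch, 'nosuchfile.au')
if error is not None:
    print('Download failed:', error)
```

A GUI caller might show the error in a pop-up instead of printing it; either way, the download function itself stays free of display logic.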
Notice also that, despite its name, this module is very different
from the getfile.py script we studied at the end
of the sockets material in the previous chapter. The socket-based
getfile
implemented client and server-side logic
to download a server file to a client machine over raw sockets.
This new getfile
here is a client-side tool only.
Instead of raw sockets, it uses the simpler FTP protocol to request a
file from a server; all socket-level details are hidden in the
ftplib
module’s implementation of the FTP
client protocol. Furthermore, the server here is a perpetually
running program on the server machine, which listens for and responds
to FTP requests on a socket, on the dedicated FTP port (number 21).
The net functional effect is that this script requires an FTP server
to be running on the machine where the desired file lives, but such a
server is much more likely to be available.
While we’re at it, let’s write a script to upload a single file by FTP to a remote machine. The upload interfaces in the FTP module are symmetric with the download interfaces. Given a connected FTP object:
Its storbinary
method can be used to upload bytes
from an open local file object.
Its storlines
method can be used to upload text in
ASCII mode from an open local file object.
Unlike the download interfaces, both of these methods are passed a
file object as a whole, not a file object method (or other function).
We will meet the storlines
method in a later
example. The utility module in Example 11-6 uses
storbinary
such that the file whose name is passed
in is always uploaded verbatim -- in binary mode, without
line-feed translations for the target machine’s conventions. If
this script uploads a text file, it will arrive exactly as stored on
the machine it came from, client line-feed markers and all.
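The asymmetry with downloads is easy to see in a wrapper of our own. The following is a minimal sketch in Python 3 syntax; store_binary is a hypothetical helper name, and it assumes an FTP object that is already connected, logged in, and positioned in the target directory.

```python
def store_binary(connection, filename, blocksize=1024):
    """Hypothetical helper: upload a local file under the same remote name."""
    with open(filename, 'rb') as localfile:      # binary mode: no line-feed mapping
        # the file object itself is passed, not its read method
        connection.storbinary('STOR ' + filename, localfile, blocksize)

# Intended use, assuming an account on the server:
#   import ftplib
#   connection = ftplib.FTP('starship.python.net')
#   connection.login('lutz', password)
#   connection.cwd('upload')
#   store_binary(connection, 'sousa.au')
#   connection.quit()
```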
Example 11-6. PP2E\Internet\Ftp\putfile.py
#!/usr/local/bin/python
##################################################
# Store an arbitrary file by ftp.  Anonymous
# ftp unless you pass a user=(name, pswd) tuple.
##################################################

import ftplib                    # socket-based ftp tools

file = 'sousa.au'                # default file coordinates
site = 'starship.python.net'     # monty python theme song
dir  = 'upload'

def putfile(file=file, site=site, dir=dir, user=(), verbose=1):
    """
    store a file by ftp to a site/directory
    anonymous or real login, binary transfer
    """
    if verbose: print 'Uploading', file
    local  = open(file, 'rb')               # local file of same name
    remote = ftplib.FTP(site)               # connect to ftp site
    apply(remote.login, user)               # anonymous or real login
    remote.cwd(dir)
    remote.storbinary('STOR ' + file, local, 1024)
    remote.quit()
    local.close()
    if verbose: print 'Upload done.'

if __name__ == '__main__':
    import sys, getpass
    pswd = getpass.getpass(site + ' pswd?')                # filename on cmdline
    putfile(file=sys.argv[1], user=('lutz', pswd))         # non-anonymous login
Notice that for portability, the local file is opened in
“rb” binary mode this time to suppress automatic
line-feed character conversions in case this is run on Windows; if
this is binary information, we don’t want any bytes that happen
to have the value of the \r
carriage-return
character to mysteriously go away during the transfer.
Also
observe that the standard Python getpass.getpass
is used to ask for an FTP password in standalone mode. Like the
raw_input
built-in function, this call prompts for
and reads a line of text from the console user; unlike
raw_input, getpass does not
echo typed characters on the screen at all (in fact, on Windows it
uses the low-level direct keyboard interface we met in the stream
redirection section of Chapter 2). This comes in
handy for protecting things like passwords from potentially prying
eyes.
Like the download utility, this script uploads a local copy of an
audio file by default as a self-test, but you will normally pass in
real remote filename, site name, and directory name strings. Also
like the download utility, you may pass a
(username,
password)
tuple to
the user
argument to trigger non-anonymous FTP
mode (anonymous FTP is the default).
Wake up -- it’s time for a bit of fun. Let’s make use of these scripts to transfer and play the Monty Python theme song audio file maintained at Python’s web site. First off, let’s write a module that downloads and plays the sample file, as shown in Example 11-7.
Example 11-7. PP2E\Internet\Ftp\sousa.py
#!/usr/local/bin/python
#################################################
# Usage: % sousa.py
# Fetch and play the Monty Python theme song.
# This may not work on your system as is: it
# requires a machine with ftp access, and uses
# audio filters on Unix and your .au player on
# Windows.  Configure playfile.py as needed.
#################################################

import os, sys
from PP2E.Internet.Ftp.getfile  import getfile
from PP2E.Internet.Ftp.playfile import playfile
sample = 'sousa.au'

getfile(sample)     # fetch audio file by ftp
playfile(sample)    # send it to audio player
This script will run on any machine with Python, an Internet link, and a recognizable audio player; it works on my Windows laptop with a dialup Internet connection (if I could insert an audio file hyperlink here to show what it sounds like, I would):
C:\...\PP2E\Internet\Ftp>python sousa.py
Downloading sousa.au
Download done.

C:\...\PP2E\Internet\Ftp>python sousa.py
sousa.au already fetched
The getfile
and putfile
modules
can be used to move the sample file around, too. Both can either be
imported by clients that wish to use their functions, or run as
top-level programs to trigger self-tests. Let’s run these
scripts from a command line and the interactive prompt to see how
they work. When run standalone, parameters are passed in the command
line, and the default file settings are used:
C:\...\PP2E\Internet\Ftp>python putfile.py sousa.au
starship.python.net pswd?
Uploading sousa.au
Upload done.
When imported, parameters are passed explicitly to functions:
C:\...\PP2E\Internet\Ftp>python
>>> from getfile import getfile
>>> getfile(file='sousa.au', site='starship.python.net', dir='upload',
...         user=('lutz', '****'))
Downloading sousa.au
Download done.
>>> from playfile import playfile
>>> playfile('sousa.au')
I’ve left one piece out of the puzzle: all that’s left is to write a module that attempts to play an audio file portably (see Example 11-8). Alas, this is the least straightforward task because audio players vary per platform. On Windows, the following module uses the DOS start command to launch whatever you have registered to play audio files (exactly as if you had double-clicked on the file’s icon in a file explorer); on the Windows 98 side of my Sony notebook machine, this DOS command line:
C:\...\PP2E\Internet\Ftp>python playfile.py sousa.au
pops up a media bar playing the sample. On Unix, it attempts to pass
the audio file to a command-line player program, if one has been
added to the unixfilter table -- tweak this for
your system (cat’ing audio files to /dev/audio
works on some Unix systems, too). On
other platforms, you’ll need to do a bit more; there has been
some work towards portable audio interfaces in Python, but audio is
notoriously platform-specific. Web browsers generally know how to
play audio files, so passing the filename in a URL to a browser
located via the LaunchBrowser.py script we met
in Chapter 4 is perhaps a portable solution here
as well (see that chapter for interface details).
Example 11-8. PP2E\Internet\Ftp\playfile.py
#!/usr/local/bin/python
#################################################
# Try to play an arbitrary audio file.
# This may not work on your system as is; it
# uses audio filters on Unix, and filename
# associations on Windows via the start command
# line (i.e., whatever you have on your machine
# to run *.au files--an audio player, or perhaps
# a web browser); configure me as needed.  We
# could instead launch a web browser here, with
# LaunchBrowser.py.  See also: Lib/audiodev.py.
#################################################

import os, sys
sample = 'sousa.au'  # default audio file

unixhelpmsg = """
Sorry: can't find an audio filter for your system!
Add an entry for your system to the "unixfilter"
dictionary in playfile.py, or play the file manually.
"""

unixfilter = {'sunos5':  '/usr/bin/audioplay',
              'linux2':  '<unknown>',
              'sunos4':  '/usr/demo/SOUND/play'}

def playfile(sample=sample):
    """
    play an audio file: use name associations
    on windows, filter command-lines elsewhere
    """
    if sys.platform[:3] == 'win':
        os.system('start ' + sample)   # runs your audio player
    else:
        if not (unixfilter.has_key(sys.platform) and
                os.path.exists(unixfilter[sys.platform])):
            print unixhelpmsg
        else:
            theme = open(sample, 'r')
            audio = os.popen(unixfilter[sys.platform], 'w')  # spawn shell tool
            audio.write(theme.read())                        # send to its stdin

if __name__ == '__main__': playfile()
If you read the last chapter, you’ll
recall that it concluded with a quick look at scripts that added a
user interface to a socket-based getfile
script -- one that transferred files over a proprietary socket
dialog, instead of FTP. At the end of that presentation, I mentioned
that FTP is a much more generally useful way to move files around,
because FTP servers are so widely available on the Net. For
illustration purposes, Example 11-9 shows a simple
mutation of the last chapter’s user interface, implemented as a
new subclass of the last chapter’s general form builder.
Example 11-9. PP2E\Internet\Ftp\getfilegui.py
###############################################################
# launch ftp getfile function with a reusable form gui class;
# uses os.chdir to goto target local dir (getfile currently
# assumes that filename has no local directory path prefix);
# runs getfile.getfile in thread to allow more than one to be
# running at once and avoid blocking gui during downloads;
# this differs from socket-based getfilegui, but reuses Form;
# supports both user and anonymous ftp as currently coded;
# caveats: the password field is not displayed as stars here,
# errors are printed to the console instead of shown in the
# gui (threads can't touch the gui on Windows), this isn't
# 100% thread safe (there is a slight delay between os.chdir
# here and opening the local output file in getfile) and we
# could display both a save-as popup for picking the local dir,
# and a remote directory listings for picking the file to get;
###############################################################

from Tkinter import Tk, mainloop
from tkMessageBox import showinfo
import getfile, os, sys, thread                 # ftp getfile here, not socket
from PP2E.Internet.Sockets.form import Form     # reuse form tool in socket dir

class FtpForm(Form):
    def __init__(self):
        root = Tk()
        root.title(self.title)
        labels = ['Server Name', 'Remote Dir', 'File Name',
                  'Local Dir',   'User Name?', 'Password?']
        Form.__init__(self, labels, root)
        self.mutex = thread.allocate_lock()
        self.threads = 0
    def transfer(self, filename, servername, remotedir, userinfo):
        try:
            self.do_transfer(filename, servername, remotedir, userinfo)
            print '%s of "%s" successful'  % (self.mode, filename)
        except:
            print '%s of "%s" has failed:' % (self.mode, filename),
            print sys.exc_info()[0], sys.exc_info()[1]
        self.mutex.acquire()
        self.threads = self.threads - 1
        self.mutex.release()
    def onSubmit(self):
        Form.onSubmit(self)
        localdir   = self.content['Local Dir'].get()
        remotedir  = self.content['Remote Dir'].get()
        servername = self.content['Server Name'].get()
        filename   = self.content['File Name'].get()
        username   = self.content['User Name?'].get()
        password   = self.content['Password?'].get()
        userinfo   = ()
        if username and password:
            userinfo = (username, password)
        if localdir:
            os.chdir(localdir)
        self.mutex.acquire()
        self.threads = self.threads + 1
        self.mutex.release()
        ftpargs = (filename, servername, remotedir, userinfo)
        thread.start_new_thread(self.transfer, ftpargs)
        showinfo(self.title, '%s of "%s" started' % (self.mode, filename))
    def onCancel(self):
        if self.threads == 0:
            Tk().quit()
        else:
            showinfo(self.title,
                     'Cannot exit: %d threads running' % self.threads)

class FtpGetfileForm(FtpForm):
    title = 'FtpGetfileGui'
    mode  = 'Download'
    def do_transfer(self, filename, servername, remotedir, userinfo):
        getfile.getfile(filename, servername, remotedir, userinfo, 0, 1)

if __name__ == '__main__':
    FtpGetfileForm()
    mainloop()
If you flip back to the end of the previous chapter, you’ll
find that this version is similar in structure to its counterpart
there; in fact, it has the same name (and is distinct only because it
lives in a different directory). The class here, though, knows how to
use the FTP-based getfile
module from earlier in
this chapter, instead of the socket-based getfile
module we met a chapter ago. When run, this version also implements
more input fields, as we see in Figure 11-1.
Notice that a full file path is entered for the local directory here.
Otherwise, the script assumes the current working directory, which
changes after each download and can vary depending on where the GUI
is launched (e.g., the current directory differs when this script is
run by the PyDemos
program at the top of the
examples tree). When we click this GUI’s Submit button (or
press the Enter key), this script simply passes the form’s
input field values as arguments to the
getfile.getfile
FTP utility function shown earlier
in this section. It also posts a pop-up to tell us the download has
begun (Figure 11-2).
As currently coded, further download status messages from this point on show up in the console window; here are the messages for a successful download, as well as one that failed when I mistyped my password (no, it’s not really “xxxxxx”):
User Name? => lutz
Server Name => starship.python.net
Local Dir => c:\temp
Password? => xxxxxx
File Name => index.html
Remote Dir => public_html/home
Download of "index.html" successful

User Name? => lutz
Server Name => starship.python.net
Local Dir => c:\temp
Password? => xxxxxx
File Name => index.html
Remote Dir => public_html/home
Download of "index.html" has failed: ftplib.error_perm 530 Login incorrect.
Given a username and password, the downloader logs into the specified account. To do anonymous FTP instead, leave the username and password fields blank. Let’s start an anonymous FTP connection to fetch the Python source distribution; Figure 11-3 shows the filled-out form.
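The choice between an account login and anonymous FTP comes down to which arguments reach the FTP object's login method. The following is a minimal sketch of that decision, assuming a modern Python 3 interpreter rather than the release used for this book's examples; the make_userinfo and connect names are hypothetical, and connect is defined but not called because it needs a live server.

```python
from ftplib import FTP

def make_userinfo(username, password):
    """Return () for anonymous FTP, or (username, password) for an account login."""
    if username and password:
        return (username, password)
    return ()

def connect(servername, userinfo=()):
    """Open a connection and log in (hypothetical helper; requires a reachable server)."""
    connection = FTP(servername)
    connection.login(*userinfo)      # calling login() with no arguments logs in anonymously
    return connection

if __name__ == '__main__':
    print(make_userinfo('', ''))             # both fields blank: anonymous
    print(make_userinfo('lutz', 'xxxxxx'))   # both fields filled: account login
```

Note that, as in the GUI's onSubmit method, a username without a password falls back to the anonymous case in this sketch.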
Pressing Submit on this form starts a download running in the background as before; we get the pop-up shown in Figure 11-4 to verify the startup.
Now, to illustrate the threading capabilities of this GUI, let’s start another download while this one is in progress. The GUI stays active while downloads are under way, so we simply change the input fields and press Submit again, as done in Figure 11-5.
This second download starts in parallel with the one attached to ftp://ftp.python.org, because each download is run in a thread, and more than one Internet connection can be active at once. In fact, the GUI itself stays active during downloads only because downloads are run in threads; if they were not, even screen redraws wouldn’t happen until a download finished.
We discussed threads in Chapter 3, but this script illustrates some practical thread concerns:
This program takes care to not do anything GUI-related in a download thread. At least in the current release on Windows, only the thread that makes GUIs can process them (a Windows-only rule that has nothing to do with Python or Tkinter).
To avoid killing spawned download threads on some platforms, the GUI must also be careful to not exit while any downloads are in progress. It keeps track of the number of in-progress threads, and just displays the pop-up in Figure 11-6 if we try to kill the GUI while both of these downloads are in progress by pressing the Cancel button.
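The thread bookkeeping described above can be sketched in isolation. What follows is a minimal illustration, not the FtpForm code itself: it assumes a modern Python 3 interpreter and its threading module rather than the older thread module used in Example 11-9, and the TransferTracker class and its method names are hypothetical.

```python
import threading

class TransferTracker:
    """Count in-progress transfer threads so a GUI can refuse to exit early."""
    def __init__(self):
        self.mutex = threading.Lock()
        self.active = 0

    def start(self, action, *args):
        # Bump the counter before the thread starts, so can_exit is never
        # briefly true while a transfer is about to begin.
        with self.mutex:
            self.active += 1
        worker = threading.Thread(target=self._run, args=(action,) + args)
        worker.start()
        return worker

    def _run(self, action, *args):
        try:
            action(*args)                     # the download or upload call
        except Exception as exc:
            print('transfer failed:', exc)    # report on the console, not in the GUI
        finally:
            with self.mutex:
                self.active -= 1              # always decrement, even on failure

    def can_exit(self):
        with self.mutex:
            return self.active == 0

if __name__ == '__main__':
    import time
    tracker = TransferTracker()
    worker = tracker.start(time.sleep, 0.2)   # stand-in for a slow transfer
    print('can exit while running?', tracker.can_exit())
    worker.join()
    print('can exit after finishing?', tracker.can_exit())
```

The worker thread only prints and updates a counter; it never touches a widget, which mirrors the no-GUI rule for threads noted above.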
We’ll see ways to work around the no-GUI rule for threads when
we explore the PyMailGui
example near the end of
this chapter. To be portable, though, we can’t really close the
GUI until the active-thread count falls to zero. Here is the sort of
output that appears in the console window for these two downloads:
C:\...\PP2E\Internet\Ftp>python getfilegui.py
User Name? =>
Server Name => ftp.python.org
Local Dir => c:\temp
Password? =>
File Name => python1.5.tar.gz
Remote Dir => pub/python/src
User Name? => lutz
Server Name => starship.python.net
Local Dir => c:\temp
Password? => xxxxxx
File Name => about-pp.html
Remote Dir => public_html/home
Download of "about-pp.html" successful
Download of "python1.5.tar.gz" successful
None of this is much more useful than a command-line-based tool, of course, but it can be easily modified by changing its Python code, and it provides enough of a GUI to qualify as a simple, first-cut FTP user interface. Moreover, because this GUI runs downloads in Python threads, more than one can be run at the same time from this GUI without having to start or restart a different FTP client tool.
While we’re in a GUI mood, let’s add a simple interface
to the putfile
utility, too. The script in Example 11-10 creates a dialog that starts uploads in
threads. It’s almost the same as the getfile
GUI we just wrote, so there’s nothing new to say. In fact,
because get and put operations are so similar from an interface
perspective, most of the get form’s logic was deliberately
factored out into a single generic class (FtpForm)
such that changes need only be made in a single place. That is, the
put GUI here is mostly just a reuse of the get GUI, with distinct
output labels and transfer method. It’s in a file by itself to
make it easy to launch as a standalone program.
Example 11-10. PP2E\Internet\Ftp\putfilegui.py
###############################################################
# launch ftp putfile function with a reusable form gui class;
# see getfilegui for notes: most of the same caveats apply;
# the get and put forms have been factored into a single
# class such that changes need only be made in one place;
###############################################################

from Tkinter import mainloop
import putfile, getfilegui

class FtpPutfileForm(getfilegui.FtpForm):
    title = 'FtpPutfileGui'
    mode  = 'Upload'
    def do_transfer(self, filename, servername, remotedir, userinfo):
        putfile.putfile(filename, servername, remotedir, userinfo, 0)

if __name__ == '__main__':
    FtpPutfileForm()
    mainloop()
Running this script looks much like running the download GUI, because
it’s almost entirely the same code at work. Let’s upload
a couple of files from the client machine to the
starship
server; Figure 11-7
shows the state of the GUI while starting one.
And here is the console window output we get when uploading two files in parallel; here again, uploads run in threads, so if we start a new upload before one in progress is finished, they overlap in time:
User Name? => lutz
Server Name => starship.python.net
Local Dir => c:\stuff\website\public_html
Password? => xxxxxx
File Name => about-pp2e.html
Remote Dir => public_html
User Name? => lutz
Server Name => starship.python.net
Local Dir => c:\stuff\website\public_html
Password? => xxxxxx
File Name => about-ppr2e.html
Remote Dir => public_html
Upload of "about-pp2e.html" successful
Upload of "about-ppr2e.html" successful
Finally, we can bundle up both GUIs in a single launcher script that knows how to start the get and put interfaces, regardless of which directory we are in when the script is started, and independent of the platform on which it runs. Example 11-11 shows this process.
Example 11-11. PP2E\Internet\Ftp\PyFtpGui.pyw
################################################################
# spawn ftp get and put guis no matter what dir I'm run from;
# os.getcwd is not necessarily the place this script lives;
# could also hard-code a path from $PP2EHOME, or guessLocation;
# could also do this but need the DOS popup for status messages:
# from PP2E.launchmodes import PortableLauncher
# PortableLauncher('getfilegui', '%s/getfilegui.py' % mydir)()
################################################################

import os, sys
from PP2E.Launcher import findFirst
mydir = os.path.split(findFirst(os.curdir, 'PyFtpGui.pyw'))[0]

if sys.platform[:3] == 'win':
    os.system('start %s/getfilegui.py' % mydir)
    os.system('start %s/putfilegui.py' % mydir)
else:
    os.system('python %s/getfilegui.py &' % mydir)
    os.system('python %s/putfilegui.py &' % mydir)
When this script is started, both the get
and
put
GUIs appear as distinct, independently running
programs; alternatively, we might attach both forms to a single
interface. We could get much fancier than these two interfaces, of
course. For instance, we could pop up local file selection dialogs,
and we could display widgets that give status of downloads and
uploads in progress. We could even list files available at the remote
site in a selectable list box by requesting remote directory listings
over the FTP connection. To learn how to add features like that,
though, we need to move on to the next section.
Once upon a time, Telnet was all I needed. My web site lived at an Internet Service Provider (ISP) that provided general and free Telnet access for all its customers. It was a simple time. All of my site’s files lived only in one place -- at my account directory on my ISP’s server machine. To make changes to web pages, I simply started a Telnet session connected to my ISP’s machine and edited my web pages there online. Moreover, because Telnet sessions can be run from almost any machine with an Internet link, I was able to tweak my web pages everywhere -- from my PC, from machines I had access to on the training road, from archaic machines I played with when I was bored at my day job, and so on. Life was good.
But times have changed. Due to a security breach, my ISP made a blanket decision to revoke Telnet access from all of their customers (except, of course, those who elected to pay a substantial premium to retain it). Seemingly, we weren’t even supposed to have known about Telnet in the first place. As a replacement, the ISP mandated that all Telnet-inclined users should begin maintaining web page files locally on their own machines, and upload them by FTP after every change.
That’s nowhere near as nice as editing files kept in a single place from almost any computer on the planet, of course, and this triggered plenty of complaints and cancellations among the technically savvy. Unfortunately, the technically savvy are a financially insignificant subset; more to the point, my web page’s address had by this time been published in multiple books sold around the world, so changing ISPs would have been no less painful than changing update procedures.
After the shouting, it dawned on me that Python could help here: by writing Python scripts to automate the upload and download tasks associated with maintaining my web site on my PC, I could at least get back some of the mobility and ease of use that I’d lost. Because Python FTP scripts will work on any machine with sockets, I could run them both on my PC and on nearly any other computer where Python was installed. Furthermore, the same scripts used to transfer page files to and from my PC could be used to copy (“mirror”) my site to another web server as a backup copy, should my ISP experience an outage (trust me -- it happens).
The following two scripts were born of all of the above frustrations. The first, mirrorflat.py, automatically downloads (i.e., copies) by FTP all the files in a directory at a remote site, to a directory on the local machine. I keep the main copy of my web site files on my PC these days, but I really use this script in two ways:
To download my web site to client machines where I want to make edits, I fetch the contents of my public_html web directory of my account on my ISP’s machine.
To mirror my site to my account on the starship.python.net server, I run this script periodically from a Telnet session on the starship machine (as I wrote this, starship still clung to the radical notion that users are intelligent enough to run Telnet).
More generally, this script (shown in Example 11-12) will download a directory full of files to any machine with Python and sockets, from any machine running an FTP server.
Example 11-12. PP2E\Internet\Ftp\mirrorflat.py
#!/bin/env python
###########################################################
# use ftp to copy (download) all files from a remote site
# and directory to a directory on the local machine; e.g.,
# run me periodically to mirror a flat ftp site;
###########################################################

import os, sys, ftplib
from getpass import getpass

remotesite = 'home.rmi.net'
remotedir  = 'public_html'
remoteuser = 'lutz'
remotepass = getpass('Please enter password for %s: ' % remotesite)
localdir   = (len(sys.argv) > 1 and sys.argv[1]) or '.'
if sys.platform[:3] == 'win': raw_input()           # clear stream
cleanall   = raw_input('Clean local directory first? ')[:1] in ['y', 'Y']

print 'connecting...'
connection = ftplib.FTP(remotesite)                 # connect to ftp site
connection.login(remoteuser, remotepass)            # login as user/password
connection.cwd(remotedir)                           # cd to directory to copy

if cleanall:
    for localname in os.listdir(localdir):          # try to delete all locals
        try:                                        # first to remove old files
            print 'deleting local', localname
            os.remove(os.path.join(localdir, localname))
        except:
            print 'cannot delete local', localname

count = 0                                           # download all remote files
remotefiles = connection.nlst()                     # nlst() gives files list
                                                    # dir() gives full details
for remotename in remotefiles:
    localname = os.path.join(localdir, remotename)
    print 'copying', remotename, 'to', localname
    if remotename[-4:] == 'html' or remotename[-3:] == 'txt':
        # use ascii mode xfer
        localfile = open(localname, 'w')
        callback  = lambda line, file=localfile: file.write(line + '\n')
        connection.retrlines('RETR ' + remotename, callback)
    else:
        # use binary mode xfer
        localfile = open(localname, 'wb')
        connection.retrbinary('RETR ' + remotename, localfile.write)
    localfile.close()
    count = count+1

connection.quit()
print 'Done:', count, 'files downloaded.'
There is not a whole lot new to speak of in this script, compared to other FTP examples we’ve seen thus far. We open a connection with the remote FTP server, log in with a username and password for the desired account (this script never uses anonymous FTP), and go to the desired remote directory. New here, though, are loops to iterate over all the files in local and remote directories, text-based retrievals, and file deletions:
This script has a
cleanall
option, enabled by interactive prompt. If
selected, the script first deletes all the files in the local
directory before downloading, to make sure there are no extra files
there that aren’t also on the server (there may be junk here
from a prior download). To delete local files, the script calls
os.listdir
to get a list of filenames in the
directory, and os.remove
to delete each; see Chapter 2 earlier in this book (or the Python library
manual) for more details if you’ve forgotten what these calls
do.
Notice the use of os.path.join
to concatenate a
directory path and filename according to the host platform’s
conventions; os.listdir
returns filenames without
their directory paths, and this script is not necessarily run in the
local directory where downloads will be placed. The local directory
defaults to the current directory (“.”), but can be set
differently with a command-line argument to the script.
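The cleanup step can be tried out on its own, away from any FTP connection. Here is a minimal sketch of the same os.listdir, os.path.join, and os.remove combination, assuming a modern Python 3 interpreter rather than the release used for this book's examples; the clean_directory function name and its return value are hypothetical additions for illustration.

```python
import os

def clean_directory(localdir):
    """Try to delete every entry in localdir; return (deleted, failed) name lists."""
    deleted, failed = [], []
    for localname in os.listdir(localdir):             # names only, no directory path
        localpath = os.path.join(localdir, localname)  # platform-appropriate separator
        try:
            os.remove(localpath)
            deleted.append(localname)
        except OSError:                                # e.g., a subdirectory or a locked file
            failed.append(localname)
    return deleted, failed

if __name__ == '__main__':
    import tempfile
    with tempfile.TemporaryDirectory() as scratch:
        open(os.path.join(scratch, 'old-page.html'), 'w').close()
        print(clean_directory(scratch))
```

Catching OSError here is narrower than the bare except in the script; subdirectories are simply reported as failures, which matches the flat-directory assumption.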
To grab all the files in a remote directory, we first need a list of
their names. The FTP object’s nlst method is the remote equivalent of
os.listdir: nlst returns a list of the string names of all
files in the current remote directory. Once we have this list, we
simply step through it in a loop, running FTP retrieval commands for
each filename in turn (more on this in a minute).
The nlst
method is, more or less, like requesting
a directory listing with an ls
command in typical
interactive FTP programs, but Python automatically splits up the
listing’s text into a list of filenames. We can pass it a
remote directory to be listed; by default it lists the current server
directory. A related FTP method, dir, returns the
list of line strings produced by an FTP LIST
command; its result is like typing a dir command
in an FTP session, and its lines contain complete file information,
unlike nlst. If you need to know more about all
the remote files, parse the result of a dir method
call.
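To make the idea of parsing a dir result concrete, here is a minimal sketch that separates directories from plain files. It assumes a modern Python 3 interpreter and, more importantly, a server that answers LIST with Unix "ls -l" style lines; that format is common but not guaranteed by the FTP protocol, so treat this as an illustration only. The split_listing and list_remote names are hypothetical, and symbolic links are not given any special handling.

```python
def split_listing(lines):
    """Split Unix-style LIST lines into (directories, files) name lists."""
    dirs, files = [], []
    for line in lines:
        fields = line.split(None, 8)      # ninth field is the name; it may contain spaces
        if len(fields) < 9:
            continue                      # skip 'total N' and unrecognized lines
        name = fields[8]
        if name in ('.', '..'):
            continue
        if line.startswith('d'):          # permissions string begins with 'd' for directories
            dirs.append(name)
        else:
            files.append(name)
    return dirs, files

def list_remote(connection, remotedir='.'):
    """Collect LIST output from an ftplib.FTP connection and classify it."""
    lines = []
    connection.dir(remotedir, lines.append)   # dir() passes each line to the callback
    return split_listing(lines)

if __name__ == '__main__':
    sample = ['drwxr-xr-x   2 lutz  users  4096 Jan  5 10:00 images',
              '-rw-r--r--   1 lutz  users  1024 Jan  5 10:01 index.html']
    print(split_listing(sample))
```

Only split_listing is exercised here; list_remote needs a live connection, and passing a callback to dir keeps the listing from being printed to standard output.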
To keep line-feeds in sync with the
machines that my web files live on, this script distinguishes between
binary and text files. It uses a simple heuristic to do so: filenames
ending in .html
or .txt
are
assumed to be ASCII text data (HTML web pages and simple text files),
and all others are assumed to be binary files (e.g., GIF and JPEG
images, audio files, tar archives). This simple rule won’t work
for every web site, but it does the trick at mine.
Binary files are pulled down with the retrbinary
method we met earlier and a local open mode of “wb” to
suppress line-feed byte mapping (this script may be run on Windows or
Unix-like platforms). We don’t use a chunk size third argument
here, though -- it defaults to a reasonable 8K if omitted.
For ASCII text files, the script instead uses the
retrlines method, passing in a function to be
called for each line in the text file downloaded. The text line
handler function mostly just writes the line to a local file. But
notice that the handler function created by the
lambda here also adds a \n
newline character to the end of the line it is passed. Python’s
retrlines method strips all line-feed characters
from lines to side-step platform differences. By adding a
\n, the script is sure to add the proper line-feed
marker character sequence for the local platform on which this script
runs (\n or \r\n). For this
automapping of the \n in the script to work, of
course, we must also open text output files in “w” text
mode, not “wb” -- the mapping from \n
to \r\n on Windows happens
when data is written to the file.
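The line handler is easier to see outside of a lambda. The sketch below shows the same callback idea as a small closure, assuming a modern Python 3 interpreter (where retrlines hands the callback decoded strings with line ends already removed); the make_line_writer and download_text names are hypothetical, and download_text is defined but not called because it needs a live connection.

```python
import io

def make_line_writer(textfile):
    """Return a retrlines callback that restores the line end retrlines strips."""
    def write_line(line):
        textfile.write(line + '\n')       # text-mode files map '\n' to the local convention
    return write_line

def download_text(connection, remotename, localname):
    """Fetch one text file over an ftplib.FTP connection in ASCII mode."""
    with open(localname, 'w') as localfile:          # text mode, not 'wb'
        connection.retrlines('RETR ' + remotename, make_line_writer(localfile))

if __name__ == '__main__':
    buffer = io.StringIO()                # stands in for a local text file
    writer = make_line_writer(buffer)
    for line in ['<html>', '</html>']:    # what retrlines would pass, one line at a time
        writer(line)
    print(repr(buffer.getvalue()))
```

A named closure does the same job as the default-argument lambda in the script; which form reads better is largely a matter of taste.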
All of this is simpler in action than in words. Here is the command I use to download my entire web site from my ISP server account to my Windows 98 laptop PC, in a single step:
C:\Stuff\Website\public_html>python %X%\internet\ftp\mirrorflat.py
Please enter password for home.rmi.net:
Clean local directory first?
connecting...
copying UPDATES to .\UPDATES
copying PythonPowered.gif to .\PythonPowered.gif
copying Pywin.gif to .\Pywin.gif
copying PythonPoweredAnim.gif to .\PythonPoweredAnim.gif
copying PythonPoweredSmall.gif to .\PythonPoweredSmall.gif
copying about-hopl.html to .\about-hopl.html
copying about-lp.html to .\about-lp.html
...
...lines deleted...
...
copying training.html to .\training.html
copying trainingCD.GIF to .\trainingCD.GIF
copying uk-1.jpg to .\uk-1.jpg
copying uk-2.jpg to .\uk-2.jpg
copying uk-3.jpg to .\uk-3.jpg
copying whatsnew.html to .\whatsnew.html
copying whatsold.html to .\whatsold.html
copying xlate-lp.html to .\xlate-lp.html
copying uploadflat.py to .\uploadflat.py
copying ora-lp-france.gif to .\ora-lp-france.gif
Done: 130 files downloaded.
This can take a while to complete (it’s bound by network speed
constraints), but it is far more accurate and far easier than downloading
files by hand. The script simply iterates over all the remote files
returned by the nlst method, and downloads each
with the FTP protocol (i.e., over sockets) in turn. It uses text
transfer mode for names that clearly imply text data, and binary
mode for all others.
With the script running this way, I make sure the initial assignments
in it reflect the machines involved, and then run the script from the
local directory where I want the site copy to be stored. Because the
download directory is usually not where the script lives, I need to
give Python the full path to the script file (%X%
evaluates a shell variable containing the top-level path to book
examples on my machine). When run on the
starship server in a Telnet session window, the
execution and script directory paths are different, but the script
works the same way.
If you elect to delete local files in the download directory, you may also see a batch of “deleting local...” messages scroll by on the screen before any “copying...” lines appear:
...
deleting local uploadflat.py
deleting local whatsnew.html
deleting local whatsold.html
deleting local xlate-lp.html
deleting local old-book.html
deleting local about-pp2e.html
deleting local about-ppr2e.html
deleting local old-book2.html
deleting local mirrorflat.py
...
copying about-pp-japan.html to ./about-pp-japan.html
copying about-pp.html to ./about-pp.html
copying about-ppr-germany.html to ./about-ppr-germany.html
copying about-ppr-japan.html to ./about-ppr-japan.html
copying about-ppr-toc.html to ./about-ppr-toc.html
...
By the way, if you botch the input of the remote site password, a Python exception is raised; I sometimes need to run again (and type slower):
C:\Stuff\Website\public_html>python %X%\internet\ftp\mirrorflat.py
Please enter password for home.rmi.net:
Clean local directory first?
connecting...
Traceback (innermost last):
  File "C:\PP2ndEd\examples\PP2E\internet\ftp\mirrorflat.py", line 20, in ?
    connection.login(remoteuser, remotepass) # login as user/pass..
  File "C:\Program Files\Python\Lib\ftplib.py", line 316, in login
    if resp[0] == '3': resp = self.sendcmd('PASS ' + passwd)
  File "C:\Program Files\Python\Lib\ftplib.py", line 228, in sendcmd
    return self.getresp()
  File "C:\Program Files\Python\Lib\ftplib.py", line 201, in getresp
raise error_perm, resp
ftplib.error_perm: 530 Login incorrect.
It’s worth noting that this script is at least partially configured by assignments near the top of the file. In addition, the password and deletion options are given by interactive inputs, and one command-line argument is allowed -- the local directory name to store the downloaded files (it defaults to “.”, the directory where the script is run). Command-line arguments could be employed to universally configure all the other download parameters and options, too; but because of Python’s simplicity and lack of compile/link steps, changing settings in the text of Python scripts is usually just as easy as typing words on a command line.
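For readers who do prefer command-line configuration, the idea might look something like the following. This is only a sketch under stated assumptions: it uses the argparse module from modern Python 3, which postdates the Python release used for this book's examples, and the parse_settings function and the --site, --remotedir, --user, and --clean option names are all hypothetical. The password is deliberately left out, since it is better prompted for with getpass than typed on a command line.

```python
import argparse

def parse_settings(argv):
    """Parse mirror settings; defaults mirror the assignments in the script."""
    parser = argparse.ArgumentParser(description='Mirror a flat FTP directory')
    parser.add_argument('localdir', nargs='?', default='.',
                        help='local directory for downloaded files')
    parser.add_argument('--site', default='home.rmi.net')
    parser.add_argument('--remotedir', default='public_html')
    parser.add_argument('--user', default='lutz')
    parser.add_argument('--clean', action='store_true',
                        help='delete local files before downloading')
    return parser.parse_args(argv)

if __name__ == '__main__':
    import sys
    settings = parse_settings(sys.argv[1:])
    print(settings.site, settings.remotedir, settings.user,
          settings.localdir, settings.clean)
```

Passing the argument list in explicitly, rather than reading sys.argv inside the function, makes the parser easy to exercise from other code.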
Windows input note: If you study the previous code closely, you’ll
notice that an extra raw_input
call is made on
Windows only, after the getpass
password input
call and before the cleanall
option setting is
input. This is a workaround for what seems like a bug in Python 1.5.2
for Windows.
Oddly, the Windows port sometimes doesn’t
synchronize command-line input and output streams as expected. Here,
this seems to be due to a getpass
bug or
constraint -- because getpass
uses the
low-level msvcrt
keyboard interface module we met
in Chapter 2, it appears to not mix well with the
stdin stream buffering used by raw_input, and botches the input stream in the
process. The extra raw_input
clears the input
stream (sys.stdin.flush
doesn’t help).
In
fact, without the superfluous raw_input
for
Windows, this script prompts for cleanall
option
input, but never stops to let you type a reply! This effectively
disables cleanall
altogether. To force distinct
input and output lines and correct raw_input
behavior, some scripts in this book run extra
print
statements or raw_input
calls to sync up streams before further user interaction. There may
be other fixes, and this may be improved in future releases; try this
script without the extra raw_input
to see if this
has been repaired in your Python.
Uploading a full directory is symmetric to downloading: it’s mostly a matter of swapping the local and remote machines and operations in the program we just met. The script in Example 11-13 uses FTP to copy all files in a directory on the local machine on which it runs, up to a directory on a remote machine.
I really use this script, too, most often to upload all of the files maintained on my laptop PC to my ISP account in one fell swoop. I also sometimes use it to copy my site from my PC to its starship mirror machine, or from the mirror machine back to my ISP. Because this script runs on any computer with Python and sockets, it happily transfers a directory from any machine on the Net to any machine running an FTP server. Simply change the initial settings in this module as appropriate for the transfer you have in mind.
Example 11-13. PP2E\Internet\Ftp\uploadflat.py
#!/bin/env python
##########################################################################
# use ftp to upload all files from a local dir to a remote site/directory;
# e.g., run me to copy a web/ftp site's files from your PC to your ISP;
# assumes a flat directory upload: uploadall.py does nested directories.
# to go to my ISP, I change setting to 'home.rmi.net', and 'public_html'.
##########################################################################

import os, sys, ftplib, getpass

remotesite = 'starship.python.net'                  # upload to starship site
remotedir  = 'public_html/home'                     # from win laptop or other
remoteuser = 'lutz'
remotepass = getpass.getpass('Please enter password for %s: ' % remotesite)
localdir   = (len(sys.argv) > 1 and sys.argv[1]) or '.'
if sys.platform[:3] == 'win': raw_input()           # clear stream
cleanall   = raw_input('Clean remote directory first? ')[:1] in ['y', 'Y']

print 'connecting...'
connection = ftplib.FTP(remotesite)                 # connect to ftp site
connection.login(remoteuser, remotepass)            # login as user/password
connection.cwd(remotedir)                           # cd to directory to copy

if cleanall:
    for remotename in connection.nlst():            # try to delete all remotes
        try:                                        # first to remove old files
            print 'deleting remote', remotename
            connection.delete(remotename)
        except:
            print 'cannot delete remote', remotename

count = 0
localfiles = os.listdir(localdir)                   # upload all local files
                                                    # listdir() strips dir path
for localname in localfiles:
    localpath = os.path.join(localdir, localname)
    print 'uploading', localpath, 'to', localname
    if localname[-4:] == 'html' or localname[-3:] == 'txt':
        # use ascii mode xfer
        localfile = open(localpath, 'r')
        connection.storlines('STOR ' + localname, localfile)
    else:
        # use binary mode xfer
        localfile = open(localpath, 'rb')
        connection.storbinary('STOR ' + localname, localfile, 1024)
    localfile.close()
    count = count+1

connection.quit()
print 'Done:', count, 'files uploaded.'
Like the mirror download script, the program here illustrates a handful of new FTP interfaces and a set of FTP scripting techniques:
Just
like the mirror script, the upload begins by asking if we want to
delete all the files in the remote target directory before copying
any files there. This cleanall
option is useful if
we’ve deleted files in the local copy of the directory in the
client -- the deleted files would remain on the server-side copy
unless we delete all files there first. To implement the remote
cleanup, this script simply gets a listing of all the files in the
remote directory with the FTP nlst
method, and
deletes each in turn with the FTP delete
method.
Assuming we have delete permission, the directory will be emptied
(file permissions depend on the account we logged into when
connecting to the server). We’ve already moved to the target
remote directory when deletions occur, so no directory paths must be
prepended to filenames here.
To apply the
upload operation to each file in the local directory, we get a list
of local filenames with the standard os.listdir
call, and take care to prepend the local source directory path to
each filename with the os.path.join
call. Recall
that os.listdir
returns filenames without
directory paths, and the source directory may not be the same as the
script’s execution directory if passed on the command line.
This
script may be run on both Windows and Unix-like clients, so we need
to handle text files specially. Like the mirror download, this script
picks text or binary transfer modes by inspecting each
filename’s extension -- HTML and text files are moved in FTP
text mode. We’ve already met the storbinary
FTP object method used to upload files in binary mode -- an exact,
byte-for-byte copy appears at the remote site.
Text mode transfers work almost identically: the
storlines
method accepts an FTP command string and
a local file (or file-like) object opened in text mode, and simply
copies each line in the local file to a same-named file on the remote
machine. As usual, if we run this script on Windows, opening the
input file in “r” text mode means that DOS-style
\r\n end-of-line sequences are mapped to the \n
character as lines are read. When the script is
run on Unix and Linux, lines end in a single \n
already, so no such mapping occurs. The net effect is that data is
read portably, with \n characters to represent
end-of-line. For binary files, we open in “rb” mode to
suppress such automatic mapping everywhere (we don’t want bytes
that happen to have the same value as \r to
magically disappear when read on Windows).[82]
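The per-file upload decision described above can be pulled out into a helper. The following is a minimal sketch, assuming a modern Python 3 interpreter; in that version of ftplib, storlines expects a file opened in binary mode and takes care of line ends on the wire itself, so the file is opened with "rb" in both branches, unlike the "r" mode used in Example 11-13. The is_text_name and upload_file names and the TEXT_EXTENSIONS tuple are hypothetical, and the extension test is a slight variation on the script's string slices (it requires the dot and ignores case).

```python
import os

TEXT_EXTENSIONS = ('.html', '.txt')       # assumed set of text file types

def is_text_name(filename):
    """Guess text versus binary from the filename extension alone."""
    return os.path.splitext(filename)[1].lower() in TEXT_EXTENSIONS

def upload_file(connection, localpath, remotename):
    """Upload one file through an ftplib.FTP-like connection; return the mode used."""
    with open(localpath, 'rb') as localfile:
        if is_text_name(remotename):
            connection.storlines('STOR ' + remotename, localfile)         # ASCII mode
            return 'text'
        connection.storbinary('STOR ' + remotename, localfile, 1024)      # 1K blocks
        return 'binary'

if __name__ == '__main__':
    for name in ('index.html', 'UPDATES', 'uk-1.jpg', 'notes.TXT'):
        print(name, '=>', 'text' if is_text_name(name) else 'binary')
```

Because upload_file only calls storlines and storbinary on whatever object it is given, it can be exercised with a stand-in object before being pointed at a real server.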
As in the mirror download script, this program simply iterates over all files to be transferred (files in the local directory listing this time), and transfers each in turn -- in either text or binary mode, depending on the files’ names. Here is the command I use to upload my entire web site from my laptop Windows 98 PC to the remote Unix server at my ISP, in a single step:
C:\Stuff\Website\public_html>python %X%\Internet\Ftp\uploadflat.py
Please enter password for starship.python.net:
Clean remote directory first?
connecting...
uploading .\LJsuppcover.jpg to LJsuppcover.jpg
uploading .\PythonPowered.gif to PythonPowered.gif
uploading .\PythonPoweredAnim.gif to PythonPoweredAnim.gif
uploading .\PythonPoweredSmall.gif to PythonPoweredSmall.gif
uploading .\Pywin.gif to Pywin.gif
uploading .\UPDATES to UPDATES
uploading .\about-hopl.html to about-hopl.html
uploading .\about-lp.html to about-lp.html
uploading .\about-pp-japan.html to about-pp-japan.html
...
...lines deleted...
...
uploading .\trainingCD.GIF to trainingCD.GIF
uploading .\uk-1.jpg to uk-1.jpg
uploading .\uk-2.jpg to uk-2.jpg
uploading .\uk-3.jpg to uk-3.jpg
uploading .\uploadflat.py to uploadflat.py
uploading .\whatsnew.html to whatsnew.html
uploading .\whatsold.html to whatsold.html
uploading .\xlate-lp.html to xlate-lp.html
Done: 131 files uploaded.
Like the mirror example, I usually run this command from the local directory where my web files are kept, and I pass Python the full path to the script. When I run this on the starship Linux server, it works the same, but the paths to the script and my web files directory differ. If you elect to clean the remote directory before uploading, you’ll get a bunch of “deleting remote...” messages before the “uploading...” lines here, too:
...
deleting remote uk-3.jpg
deleting remote whatsnew.html
deleting remote whatsold.html
deleting remote xlate-lp.html
deleting remote uploadflat.py
deleting remote ora-lp-france.gif
deleting remote LJsuppcover.jpg
deleting remote sonyz505js.gif
deleting remote pic14.html
...
Perhaps the biggest limitation of the web site download and upload scripts we just met is that they assume the site directory is flat (hence their names) -- i.e., both transfer simple files only, and neither handles nested subdirectories within the web directory to be transferred.
For my purposes, that’s a reasonable constraint. I avoid nested subdirectories to keep things simple, and I store my home web site as a simple directory of files. For other sites (including one I keep at the starship machine), site transfer scripts are easier to use if they also automatically transfer subdirectories along the way.
It turns out that supporting directories is fairly simple -- we need to add only a bit of recursion and remote directory creation calls. The upload script in Example 11-14 extends the one we just saw, to handle uploading all subdirectories nested within the transferred directory. Furthermore, it recursively transfers subdirectories within subdirectories -- the entire directory tree contained within the top-level transfer directory is uploaded to the target directory at the remote server.
Example 11-14. PP2E\Internet\Ftp\uploadall.py
#!/bin/env python
##########################################################################
# use ftp to upload all files from a local dir to a remote site/directory;
# this version supports uploading nested subdirectories too, but not the
# cleanall option (that requires parsing ftp listings to detect remote
# dirs, etc.); to upload subdirectories, uses os.path.isdir(path) to see
# if a local file is really a directory, FTP().mkd(path) to make the dir
# on the remote machine (wrapped in a try in case it already exists there),
# and recursion to upload all files/dirs inside the nested subdirectory.
# see also: uploadall-2.py, which doesn't assume the topremotedir exists.
##########################################################################

import os, sys, ftplib
from getpass import getpass

remotesite   = 'home.rmi.net'        # upload from pc or starship to rmi.net
topremotedir = 'public_html'
remoteuser   = 'lutz'
remotepass   = getpass('Please enter password for %s: ' % remotesite)
toplocaldir  = (len(sys.argv) > 1 and sys.argv[1]) or '.'

print 'connecting...'
connection = ftplib.FTP(remotesite)              # connect to ftp site
connection.login(remoteuser, remotepass)         # login as user/password
connection.cwd(topremotedir)                     # cd to directory to copy to
                                                 # assumes topremotedir exists
def uploadDir(localdir):
    global fcount, dcount
    localfiles = os.listdir(localdir)
    for localname in localfiles:
        localpath = os.path.join(localdir, localname)
        print 'uploading', localpath, 'to', localname
        if os.path.isdir(localpath):
            # recur into subdirs
            try:
                connection.mkd(localname)
                print localname, 'directory created'
            except:
                print localname, 'directory not created'
            connection.cwd(localname)
            uploadDir(localpath)
            connection.cwd('..')
            dcount = dcount+1
        else:
            if localname[-4:] == 'html' or localname[-3:] == 'txt':
                # use ascii mode xfer
                localfile = open(localpath, 'r')
                connection.storlines('STOR ' + localname, localfile)
            else:
                # use binary mode xfer
                localfile = open(localpath, 'rb')
                connection.storbinary('STOR ' + localname, localfile, 1024)
            localfile.close()
            fcount = fcount+1

fcount = dcount = 0
uploadDir(toplocaldir)
connection.quit()
print 'Done:', fcount, 'files and', dcount, 'directories uploaded.'
Like the flat upload script, this one can be run on any machine with Python and sockets and upload to any machine running an FTP server; I run it both on my laptop PC and on starship by Telnet to upload sites to my ISP.
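For readers on Python 3, the recursion in this script can be sketched as a function that takes the connection as an argument and returns its counts instead of updating globals. This is only a minimal sketch, not the book's code: the names upload_tree and TEXT_SUFFIXES are made up for illustration, text files are recognized by a dotted suffix (an assumption), and files are opened in binary mode in both branches because Python 3's storlines reads bytes lines.

```python
import os
import ftplib

# Hypothetical set of suffixes sent in text (ASCII) mode; adjust to taste.
TEXT_SUFFIXES = ('.html', '.txt')

def upload_tree(connection, localdir):
    """Upload the contents of localdir into the connection's current
    remote directory, recursing into subdirectories.
    Returns a (files_uploaded, dirs_uploaded) tuple."""
    fcount = dcount = 0
    for localname in sorted(os.listdir(localdir)):
        localpath = os.path.join(localdir, localname)
        if os.path.isdir(localpath):
            try:
                connection.mkd(localname)      # create same-named remote dir
            except ftplib.error_perm:
                pass                           # most likely already exists
            connection.cwd(localname)          # descend on the server
            files, dirs = upload_tree(connection, localpath)
            connection.cwd('..')               # climb back to remote parent
            fcount += files
            dcount += dirs + 1
        else:
            with open(localpath, 'rb') as localfile:
                if localname.endswith(TEXT_SUFFIXES):
                    connection.storlines('STOR ' + localname, localfile)
                else:
                    connection.storbinary('STOR ' + localname, localfile, 1024)
            fcount += 1
    return fcount, dcount
```

With a logged-in ftplib.FTP object already changed to the target directory, a call such as upload_tree(connection, '.') would walk the local tree the same way uploadDir does. Because the connection is a parameter, the function can also be exercised with a stand-in object that merely records the mkd, cwd, and STOR calls.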
In the interest of space, I’ll leave studying this variant in more depth as a suggested exercise. Two quick pointers, though:
The crux of the matter here is the os.path.isdir test near the top; if this test detects a directory in the current local directory, we create a same-named directory on the remote machine with connection.mkd and descend into it with connection.cwd, and recur into the subdirectory on the local machine. Like all FTP object methods, mkd and cwd methods issue FTP commands to the remote server. When we exit a local subdirectory, we run a remote cwd('..') to climb to the remote parent directory and continue. The rest of the script is roughly the same as the original.
Note that this script handles only directory tree uploads; recursive uploads are generally more useful than recursive downloads, if you maintain your web sites on your local PC and upload to a server periodically, as I do. If you also want to download (mirror) a web site that has subdirectories, see the mirror scripts in the Python source distribution’s Tools directory (currently, at file location Tools/scripts/ftpmirror.py). It’s not much extra work, but requires parsing the output of a remote listing command to detect remote directories, and that is just complicated enough for me to omit here. For the same reason, the recursive upload script shown here doesn’t support the remote directory cleanup option of the original -- such a feature would require parsing remote listings as well.
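To give a feel for the listing parsing mentioned here, the following Python 3 sketch separates directories from files in Unix-style "ls -l" lines. It is not taken from ftpmirror.py; the function name split_listing is hypothetical, and the nine-column layout is an assumption, since FTP servers are free to format their listings differently.

```python
def split_listing(lines):
    """Split Unix-style 'ls -l' listing lines into (dirs, files).
    Assumes nine whitespace-separated columns with the name last;
    lines in any other shape are ignored."""
    dirs, files = [], []
    for line in lines:
        fields = line.split(None, 8)       # keep spaces inside the name
        if len(fields) < 9:
            continue                       # e.g. a leading 'total 8' line
        mode, name = fields[0], fields[8]
        if name in ('.', '..'):
            continue                       # skip self and parent entries
        if mode.startswith('d'):
            dirs.append(name)              # 'd' flags a directory
        elif mode.startswith('-'):
            files.append(name)             # '-' flags a plain file
        # anything else (symbolic links, devices) is skipped
    return dirs, files
```

The lines themselves could be collected with a call such as connection.retrlines('LIST', lines.append) on an ftplib.FTP object, where lines is an initially empty list. Servers that support the newer MLSD command offer a machine-readable alternative through the FTP object's mlsd method in recent Python 3 releases.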
For more context, also see the uploadall-2.py version of this script in the examples distribution; it’s similar, but coded so as not to assume that the top-level remote directory already exists.
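One way to avoid assuming that the remote directory exists is to try the cwd first and create the directory only if the server refuses. The Python 3 sketch below illustrates that idea only; it is not the code in uploadall-2.py (which is not shown here), and the function name enter_remote_dir is made up.

```python
import ftplib

def enter_remote_dir(connection, dirname):
    """Change into dirname on the server, creating it if the cwd fails.
    Returns True if the directory had to be created, else False."""
    try:
        connection.cwd(dirname)            # succeeds if it already exists
        return False
    except ftplib.error_perm:              # permanent 5xx reply from server
        connection.mkd(dirname)            # create it, then try again
        connection.cwd(dirname)
        return True
```

A script could call enter_remote_dir(connection, topremotedir) in place of a bare connection.cwd(topremotedir). If the failure has some other cause, such as missing permissions, the mkd call raises its own error rather than hiding the problem.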
[80] For more urllib download examples, see the section on HTTP in this chapter. In bigger terms, tools like urllib.urlopen allow scripts to both download remote files and invoke programs that are located on a remote server machine. In Chapter 12, we’ll also see that urllib includes tools for formatting (escaping) URL strings for safe transmission.
[81] This is one point in the class where I also usually threaten to write Guido’s home phone number on the whiteboard. But that’s generally an empty promise made just for comic effect. If you do want to discuss Python language issues, Guido’s email address, as well as contact points for other Python core developers, are readily available on the Net. As someone who’s gotten anonymous Python-related calls at home, I never do give out phone numbers (and dialing 1-800-Hi-Guido is only funny the first time).
[82] Technically, Python’s storlines method automatically sends all lines to the server with \r\n line-feed sequences, no matter what it receives from the local file’s readline method (\n or \r\n). Because of that, the most important distinctions for uploads are to use “rb” mode for binary files and the storlines method for text. Consult module ftplib.py in the Python source library directory for more details.