Opening URLs

The protocol portion of a URL can be anything that the processing software understands. The most common URL protocols (also called schemes) are probably HTTP, FTP, and FILE. HTTP connects to web servers, FTP retrieves files from FTP servers, and FILE retrieves a file from the local filesystem. Python’s urllib module handles all three.
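As a small illustration of what the scheme is, the sketch below pulls it out of a few URLs. It is written for Python 3, where the parsing helpers live in urllib.parse; the helper name scheme_of and the file: path are made up for this example.

```python
from urllib.parse import urlsplit

def scheme_of(url):
    """Return the lowercased scheme (protocol) of a URL, or '' if it has none."""
    return urlsplit(url).scheme

# The file: path below is a hypothetical location used only for illustration.
for url in ("http://www.example.com/pyxml.xml",
            "ftp://ftp.oreilly.com",
            "file:///tmp/order.xml",
            "order.xml"):
    print(repr(scheme_of(url)), url)
```

A bare filename such as order.xml has no scheme at all, which is why it is treated as a local file.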

The urllib.urlopen function takes care of opening URLs of all kinds and can give you back a file-like object to work with. To retrieve a local file, just use the filename. For example, to open an XML document in the local directory, you can use the following syntax:

>>> from urllib import urlopen
>>> fd = urlopen("order.xml")
>>> print fd.read(  )
<?xml version="1.0"?>
<!DOCTYPE order SYSTEM "order.dtd">
<order>
        <customer_name>eDonkey Enterprises</customer_name>
        <sku>343-3940938</sku>
        <unit_price>39.95</unit_price>
</order>

>>> fd.close(  )

Because the object returned by urlopen is file-like, you can read and display its contents just as you would with an ordinary file. Closing the object also releases the underlying resource, whether that is a connection to a remote server or a handle on a local file.
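The session above uses the Python 2 urllib module. In Python 3, urlopen lives in urllib.request and expects a full URL rather than a bare filename, so a local path has to be turned into a file: URL first. The following is a minimal sketch of that approach, not the book's own code; the helper name read_local and the throwaway sample file are assumptions made for the example.

```python
import tempfile
from pathlib import Path
from urllib.request import urlopen

def read_local(path):
    """Read a local file through urlopen by converting its path to a file: URL."""
    url = Path(path).resolve().as_uri()   # absolute path -> file:// URL
    with urlopen(url) as fd:              # the object is closed when the block exits
        return fd.read()                  # returns bytes, not str

# A temporary file stands in for order.xml so the sketch runs anywhere.
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "order.xml"
    sample.write_text('<?xml version="1.0"?>\n<order><sku>343-3940938</sku></order>\n')
    print(read_local(sample).decode("utf-8"))
```

Using a with block plays the same role as the explicit fd.close( ) call in the session above.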

Using FTP

The urlopen function works for remote files just as easily as it does for local files, provided you’re connected to the Internet. For example, if you supply a URL for an FTP server’s root directory, you may be able to pull back its contents, as shown here:

>>> fd = urlopen("ftp://ftp.oreilly.com")
>>> print fd.read(  )
total 64
drwxr-xr-x   3 61           512 Aug 29  2000 bin
drwxr-xr-x   2 3            512 Aug 30  2000 dev
drwxr-xr-x   4 61           512 Oct 16  2000 etc
lrwxrwxrwx   1 1             12 Aug 31  2000 examples -> pub/examples
drwxrwx-wx   2 100          512 May  7 22:22 incoming
drwxrws--x  48 61         17408 May  6 04:00 intl
drwxr-xr-x   2 1            512 Sep  1  2000 lost+found
drwxrws--x  55 61          4608 May  7 22:22 outgoing
drwxrwsr-x  21 61           512 Mar 30 21:47 pub
drwxr-xr-x   2 61           512 Aug 31  2000 published
drwxr-sr-x   4 100          512 Apr 17 17:17 software
dr-xr-xr-x   5 61           512 Aug 30  2000 usr

>>> fd.close(  )

Retrieving URLs

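urlretrieve is similar to urlopen, but it copies the resource to a local file instead of handing you a file-like object. The function optionally accepts a filename under which to store the remote file, and it returns a tuple containing the local filename and a mimetools.Message object that holds the headers of the response (not the data itself), as shown here: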

>>> from urllib import urlretrieve
>>> ob = urlretrieve("ftp://ftp.oreilly.com", "menu.txt")
>>> ob
('menu.txt', <mimetools.Message instance at 007F382C>)

The first argument is the actual URL to connect to, while the second argument is the name of a local file to hold the data.
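To make the return value concrete, here is a hedged Python 3 sketch (urlretrieve is still available there as a legacy function in urllib.request). It copies a local file: URL rather than contacting a server, so it runs offline; the helper name fetch and the file names are assumptions made for the example.

```python
import tempfile
from pathlib import Path
from urllib.request import urlretrieve

def fetch(url, dest):
    """Copy url to dest; return the local filename and the Content-Length header."""
    filename, headers = urlretrieve(url, str(dest))
    # headers is a message object describing the response, not the data itself.
    return filename, headers.get("Content-Length")

with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "menu-source.txt"      # stand-in for a remote resource
    src.write_bytes(b"bin\ndev\netc\n")
    filename, length = fetch(src.as_uri(), Path(tmp) / "menu.txt")
    print(filename, length)
    # The data itself ends up in the local file named by the first tuple element.
    print(Path(filename).read_bytes() == src.read_bytes())
```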

One of the most useful features of urlretrieve is its callback support. When retrieving a document, you can supply a callback function as an optional third parameter to receive progress reports as the resource is downloaded.

If you supply a callback, urlretrieve calls it with three arguments. The first is the number of blocks transferred so far. The second is the size of each block in bytes, and the third is the total size of the file, which may be -1 if the server does not report it. Example 8-1 shows a simple routine that reports on its progress.

Example 8-1. retrieve.py
"""
retrieve.py example
"""
from urllib import urlretrieve

def callback(blocknum, blocksize, totalsize):
    # Called once when the transfer starts and once after each block is read.
    print "Downloaded " + str((blocknum * blocksize)),
    print " of ", totalsize

urlretrieve("http://www.example.com/pyxml.xml", "px.xml", callback)
print "Download Complete"

Running the example produces output like this:

C:\WINDOWS\Desktop\oreilly\pythonxml\c8>python retrieve.py
Downloaded 0  of  116063
Downloaded 8192  of  116063
Downloaded 16384  of  116063
Downloaded 24576  of  116063
Downloaded 32768  of  116063
Downloaded 40960  of  116063
Downloaded 49152  of  116063
Downloaded 57344  of  116063
Downloaded 65536  of  116063
Downloaded 73728  of  116063
Downloaded 81920  of  116063
Downloaded 90112  of  116063
Downloaded 98304  of  116063
Downloaded 106496  of  116063
Downloaded 114688  of  116063
Downloaded 122880  of  116063
Downloaded 131072  of  116063
Download Complete

The callback is well suited to tracking FTP progress, and it is just as useful any time you need to keep tabs on a long download or report progress to a busy, frustrated end user.
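In the output above, the last lines report more bytes than the file contains, because blocknum * blocksize counts whole blocks even when the final block is only partly filled. One way to tidy this up is sketched below for Python 3: the callback caps the running count at the total and falls back to a plain byte count when the total is unknown. The helper name bytes_done and the locally generated test file are assumptions made for the example.

```python
import tempfile
from pathlib import Path
from urllib.request import urlretrieve

def bytes_done(blocknum, blocksize, totalsize):
    """Estimate the bytes received so far, never exceeding a known total."""
    done = blocknum * blocksize
    if totalsize >= 0:                 # a negative total means the size is unknown
        done = min(done, totalsize)
    return done

def callback(blocknum, blocksize, totalsize):
    done = bytes_done(blocknum, blocksize, totalsize)
    if totalsize > 0:
        print("Downloaded %d of %d (%d%%)" % (done, totalsize, 100 * done // totalsize))
    else:
        print("Downloaded %d bytes" % done)

# A local file: URL stands in for the remote document so the sketch runs offline.
with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "pyxml.xml"
    src.write_bytes(b"x" * 20000)
    urlretrieve(src.as_uri(), str(Path(tmp) / "px.xml"), callback)
    print("Download Complete")
```

Keeping the arithmetic in a separate function makes it easy to reuse the same calculation for a progress bar or a log message.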
