So far, we’ve settled on a dictionary-based representation for our database of records, and we’ve reviewed some Python data structure concepts along the way. As mentioned, though, the objects we’ve seen so far are temporary—they live in memory and they go away as soon as we exit Python or the Python program that created them. To make our people persistent, they need to be stored in a file of some sort.
One way to keep our data around between program runs is to write all the data out to a simple text file, in a formatted way. Provided the saving and loading tools agree on the format selected, we’re free to use any custom scheme we like.
So that we don’t have to keep working interactively, let’s first write a script that initializes the data we are going to store (if you’ve done any Python work in the past, you know that the interactive prompt tends to become tedious once you leave the realm of simple one-liners). Example 2-1 creates the sort of records and database dictionary we’ve been working with so far, but because it is a module, we can import it repeatedly without having to retype the code each time. In a sense, this module is a database itself, but its program code format doesn’t support automatic or end-user updates as is.
Example 2-1. PP3E\Preview\initdata.py
# initialize data to be stored in files, pickles, shelves

# records
bob = {'name': 'Bob Smith', 'age': 42, 'pay': 30000, 'job': 'dev'}
sue = {'name': 'Sue Jones', 'age': 45, 'pay': 40000, 'job': 'mus'}
tom = {'name': 'Tom',       'age': 50, 'pay': 0,     'job': None}

# database
db = {}
db['bob'] = bob
db['sue'] = sue
db['tom'] = tom

if __name__ == '__main__':       # when run as a script
    for key in db:
        print key, '=> ', db[key]
As usual, the __name__ test at the bottom of Example 2-1 is true only when this file is run, not when it is imported. When run as a top-level script (e.g., from a command line, via an icon click, or within the IDLE GUI), the file's self-test code under this test dumps the database's contents to the standard output stream (remember, that's what print statements do by default).
Here is the script in action, run from a system command line on Windows. Type the following command in a Command Prompt window after a cd to the directory where the file is stored, and use a similar console window on other types of computers:
...\PP3E\Preview> python initdata.py
bob => {'job': 'dev', 'pay': 30000, 'age': 42, 'name': 'Bob Smith'}
sue => {'job': 'mus', 'pay': 40000, 'age': 45, 'name': 'Sue Jones'}
tom => {'job': None, 'pay': 0, 'age': 50, 'name': 'Tom'}
Now that we’ve started running script files, here are a few quick startup hints:
On some platforms, you may need to type the full directory path to the Python program on your machine; on recent Windows systems, you don't need python on the command line at all (just type the file's name to run it).
You can also run this file inside Python’s standard IDLE GUI (open the file and use the Run menu in the text edit window), and in similar ways from any of the available third-party Python IDEs (e.g., Komodo, Eclipse, and the Wing IDE).
If you click the program's file icon to launch it on Windows, be sure to add a raw_input() call to the bottom of the script to keep the output window up. On other systems, icon clicks may require a #! line at the top and executable permission via a chmod command.
I’ll assume here that you’re able to run Python code one way or another. Again, if you’re stuck, see other books such as Learning Python for the full story on launching Python programs.
Now, all we have to do is store all of this in-memory data on a file. There are a variety of ways to accomplish this; one of the most basic is to write one piece of data at a time, with separators between each that we can use to break the data apart when we reload. Example 2-2 shows one way to code this idea.
Example 2-2. PP3E\Preview\make_db_file.py
####################################################################
# save in-memory database object to a file with custom formatting;
# assume 'endrec.', 'enddb.', and '=>' are not used in the data;
# assume db is dict of dict;  warning: eval can be dangerous - it
# runs strings as code; could also eval() record dict all at once
####################################################################

dbfilename = 'people-file'
ENDDB  = 'enddb.'
ENDREC = 'endrec.'
RECSEP = '=>'

def storeDbase(db, dbfilename=dbfilename):
    "formatted dump of database to flat file"
    dbfile = open(dbfilename, 'w')
    for key in db:
        print >> dbfile, key
        for (name, value) in db[key].items():
            print >> dbfile, name + RECSEP + repr(value)
        print >> dbfile, ENDREC
    print >> dbfile, ENDDB
    dbfile.close()

def loadDbase(dbfilename=dbfilename):
    "parse data to reconstruct database"
    dbfile = open(dbfilename)
    import sys
    sys.stdin = dbfile
    db = {}
    key = raw_input()
    while key != ENDDB:
        rec = {}
        field = raw_input()
        while field != ENDREC:
            name, value = field.split(RECSEP)
            rec[name] = eval(value)
            field = raw_input()
        db[key] = rec
        key = raw_input()
    return db

if __name__ == '__main__':
    from initdata import db
    storeDbase(db)
This is a somewhat complex program, partly because it has both saving and loading logic and partly because it does its job the hard way; as we’ll see in a moment, there are better ways to get objects into files than by manually formatting and parsing them. For simple tasks, though, this does work; running Example 2-2 as a script writes the database out to a flat file. It has no printed output, but we can inspect the database file interactively after this script is run, either within IDLE or from a console window where you’re running these examples (as is, the database file shows up in the current working directory):
...\PP3E\Preview> python make_db_file.py

...\PP3E\Preview> python
>>> for line in open('people-file'):
...     print line,
...
bob
job=>'dev'
pay=>30000
age=>42
name=>'Bob Smith'
endrec.
sue
job=>'mus'
pay=>40000
age=>45
name=>'Sue Jones'
endrec.
tom
job=>None
pay=>0
age=>50
name=>'Tom'
endrec.
enddb.
This file is simply our database’s content with added formatting. Its data originates from the test data initialization module we wrote in Example 2-1 because that is the module from which Example 2-2’s self-test code imports its data. In practice, Example 2-2 itself could be imported and used to store a variety of databases and files.
Notice how the data to be written is formatted with the as-code repr() call and is re-created with the eval() call, which treats strings as Python code. That allows us to store and re-create things like the None object, but it is potentially unsafe; you shouldn't use eval() if you can't be sure that the database won't contain malicious code. For our purposes, however, there's probably no cause for alarm.
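If safety ever does become a concern, the standard library's ast.literal_eval is a drop-in substitute for eval here, since our stored values are all simple literals anyway. A minimal sketch (in modern Python syntax, unlike the book's Python 2 listings):

```python
import ast

# literal_eval evaluates only literal syntax: numbers, strings,
# tuples, lists, dicts, booleans, and None -- never arbitrary code
rec = ast.literal_eval("{'name': 'Bob Smith', 'pay': 30000, 'job': None}")
print(rec['job'])                        # None

# a non-literal expression is rejected instead of being executed
try:
    ast.literal_eval("__import__('os').getcwd()")
except ValueError:
    print('rejected')                    # rejected
```

The values parsed this way are exactly the repr() strings our formatter writes out, so nothing else in the scheme needs to change.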
To test further, Example 2-3 reloads the database from a file each time it is run.
Example 2-3. PP3E\Preview\dump_db_file.py
from make_db_file import loadDbase

db = loadDbase()
for key in db:
    print key, '=> ', db[key]
print db['sue']['name']
And Example 2-4 makes changes by loading, updating, and storing again.
Example 2-4. PP3E\Preview\update_db_file.py
from make_db_file import loadDbase, storeDbase

db = loadDbase()
db['sue']['pay'] *= 1.10
db['tom']['name'] = 'Tom Tom'
storeDbase(db)
Here are the dump script and the update script in action at a system command line; both Sue’s pay and Tom’s name change between script runs. The main point to notice is that the data stays around after each script exits—our objects have become persistent simply because they are mapped to and from text files:
...\PP3E\Preview> python dump_db_file.py
bob => {'pay': 30000, 'job': 'dev', 'age': 42, 'name': 'Bob Smith'}
sue => {'pay': 40000, 'job': 'mus', 'age': 45, 'name': 'Sue Jones'}
tom => {'pay': 0, 'job': None, 'age': 50, 'name': 'Tom'}
Sue Jones

...\PP3E\Preview> python update_db_file.py

...\PP3E\Preview> python dump_db_file.py
bob => {'pay': 30000, 'job': 'dev', 'age': 42, 'name': 'Bob Smith'}
sue => {'pay': 44000.0, 'job': 'mus', 'age': 45, 'name': 'Sue Jones'}
tom => {'pay': 0, 'job': None, 'age': 50, 'name': 'Tom Tom'}
Sue Jones
As is, we’ll have to write Python code in scripts or at the interactive command line for each specific database update we need to perform (later in this chapter, we’ll do better by providing generalized console, GUI, and web-based interfaces instead). But at a basic level, our text file is a database of records. As we’ll learn in the next section, though, it turns out that we’ve just done a lot of pointless work.
The formatted file scheme of the prior section works, but it has some major limitations. For one thing, it has to read the entire database from the file just to fetch one record, and it must write the entire database back to the file after each set of updates. For another, it assumes that the data separators it writes out to the file will not appear in the data to be stored: if the characters => happen to appear in the data, for example, the scheme will fail. Perhaps worse, the formatter is already complex without being general: it is tied to the dictionary-of-dictionaries structure, and it can't handle anything else without being greatly expanded. It would be nice if a general tool existed that could translate any sort of Python data to a format that could be saved on a file in a single step.
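The separator clash is easy to demonstrate. The following is a hypothetical sketch (in modern Python syntax; the field value is invented for illustration) of what happens when a stored value happens to contain the => text:

```python
# a record field whose value happens to contain the separator text
field = "motto=>'go => stop'"

# the loader expects exactly one separator per field line...
parts = field.split('=>')
print(len(parts))                        # 3, not the expected 2

# ...so the two-way unpack used by the loader fails outright
try:
    name, value = field.split('=>')
except ValueError as exc:
    print('parse error:', exc)
```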
That is exactly what the Python pickle module is designed to do. The pickle module translates an in-memory Python object into a serialized byte stream, a string of bytes that can be written to any file-like object. The pickle module also knows how to reconstruct the original object in memory, given the serialized byte stream: we get back the exact same object. In a sense, the pickle module replaces proprietary data formats; its serialized format is general and efficient enough for any program. With pickle, there is no need to manually translate objects to data when storing them persistently.
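The round trip can be sketched in a few lines using pickle's in-memory interface, where dumps and loads work on byte strings rather than files (shown in modern Python syntax; the book's own listings use the file-based dump and load):

```python
import pickle

record = {'name': 'Bob Smith', 'age': 42, 'job': None, 'skills': ['dev', 'mgr']}
data = pickle.dumps(record)              # serialize to a byte string
clone = pickle.loads(data)               # reconstruct an equivalent object

print(clone == record)                   # True: same value...
print(clone is record)                   # False: ...but a new, distinct object
```

Note that nested structures and the None object survive the trip with no custom formatting or parsing code on our part.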
The net effect is that pickling allows us to store and fetch native Python objects as they are and in a single step; we use normal Python syntax to process pickled records. Despite what it does, the pickle module is remarkably easy to use. Example 2-5 shows how to store our records in a flat file, using pickle.
Example 2-5. PP3E\Preview\make_db_pickle.py
from initdata import db
import pickle

dbfile = open('people-pickle', 'w')
pickle.dump(db, dbfile)
dbfile.close()
When run, this script stores the entire database (the dictionary of dictionaries defined in Example 2-1) to a flat file named people-pickle in the current working directory. The pickle module handles the work of converting the object to a string. Example 2-6 shows how to access the pickled database after it has been created; we simply open the file and pass its content back to pickle to remake the object from its serialized string.
Example 2-6. PP3E\Preview\dump_db_pickle.py
import pickle

dbfile = open('people-pickle')
db = pickle.load(dbfile)
for key in db:
    print key, '=> ', db[key]
print db['sue']['name']
Here are these two scripts at work, at the system command line again; naturally, they can also be run in IDLE, and you can open and inspect the pickle file by running the same sort of code interactively as well:
...\PP3E\Preview> python make_db_pickle.py

...\PP3E\Preview> python dump_db_pickle.py
bob => {'pay': 30000, 'job': 'dev', 'age': 42, 'name': 'Bob Smith'}
sue => {'pay': 40000, 'job': 'mus', 'age': 45, 'name': 'Sue Jones'}
tom => {'pay': 0, 'job': None, 'age': 50, 'name': 'Tom'}
Sue Jones
Updating with a pickle file is similar to a manually formatted file, except that Python is doing all of the formatting work for us. Example 2-7 shows how.
Example 2-7. PP3E\Preview\update_db_pickle.py
import pickle

dbfile = open('people-pickle')
db = pickle.load(dbfile)
dbfile.close()

db['sue']['pay'] *= 1.10
db['tom']['name'] = 'Tom Tom'

dbfile = open('people-pickle', 'w')
pickle.dump(db, dbfile)
dbfile.close()
Notice how the entire database is written back to the file after the records are changed in memory, just as for the manually formatted approach; this might become slow for very large databases, but we’ll ignore this for the moment. Here are our update and dump scripts in action—as in the prior section, Sue’s pay and Tom’s name change between scripts because they are written back to a file (this time, a pickle file):
...\PP3E\Preview> python update_db_pickle.py

...\PP3E\Preview> python dump_db_pickle.py
bob => {'pay': 30000, 'job': 'dev', 'age': 42, 'name': 'Bob Smith'}
sue => {'pay': 44000.0, 'job': 'mus', 'age': 45, 'name': 'Sue Jones'}
tom => {'pay': 0, 'job': None, 'age': 50, 'name': 'Tom Tom'}
Sue Jones
As we'll learn in Chapter 19, the Python pickling system supports nearly arbitrary object types: lists, dictionaries, class instances, nested structures, and more. There, we'll also explore the faster cPickle module, as well as the pickler's binary storage protocols, which require files to be opened in binary mode; the default text protocol used in the preceding examples is slightly slower, but it generates readable ASCII data. As we'll see later in this chapter, the pickler also underlies shelves and ZODB databases, and pickled class instances provide both data and behavior for objects stored.
In fact, pickling is more general than these examples may imply. Because they accept any object that provides an interface compatible with files, pickling and unpickling may be used to transfer native Python objects to a variety of media. Using a wrapped network socket, for instance, allows us to ship pickled Python objects across a network and provides an alternative to larger protocols such as SOAP and XML-RPC.
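Because pickle accepts any object with file-like read or write methods, the receiving end of such a transfer can be simulated with an in-memory byte stream. This sketch (modern Python syntax) uses io.BytesIO as a stand-in for the file wrapper a real socket's makefile() call would provide:

```python
import io
import pickle

channel = io.BytesIO()                   # stand-in for a socket's file wrapper
pickle.dump({'cmd': 'update', 'key': 'sue'}, channel)

channel.seek(0)                          # the "receiver" reads the same stream
message = pickle.load(channel)
print(message['cmd'])                    # update
```

The sender and receiver never agree on a record format; the pickled byte stream itself is the protocol.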
As mentioned earlier, one potential disadvantage of this section's examples so far is that they may become slow for very large databases: because the entire database must be loaded and rewritten to update a single record, this approach can waste time. We could improve on this by storing each record in the database in a separate flat file. The next three examples show one way to do so; Example 2-8 stores each record in its own flat file, using each record's original key as its filename with a .pkl appended (it creates the files bob.pkl, sue.pkl, and tom.pkl in the current working directory).
Example 2-8. PP3E\Preview\make_db_pickle_recs.py
from initdata import bob, sue, tom
import pickle

for (key, record) in [('bob', bob), ('tom', tom), ('sue', sue)]:
    recfile = open(key + '.pkl', 'w')
    pickle.dump(record, recfile)
    recfile.close()
Next, Example 2-9 dumps the entire database by using the standard library's glob module to do filename expansion and thus collect all the files in this directory with a .pkl extension. To load a single record, we open its file and deserialize with pickle; we need to load only one record file, though, not the entire database, to fetch one record.
Example 2-9. PP3E\Preview\dump_db_pickle_recs.py
import pickle, glob

for filename in glob.glob('*.pkl'):      # for 'bob', 'sue', 'tom'
    recfile = open(filename)
    record = pickle.load(recfile)
    print filename, '=> ', record

suefile = open('sue.pkl')
print pickle.load(suefile)['name']       # fetch sue's name
Finally, Example 2-10 updates the database by fetching a record from its file, changing it in memory, and then writing it back to its pickle file. This time, we have to fetch and rewrite only a single record file, not the full database, to update.
Example 2-10. PP3E\Preview\update_db_pickle_recs.py
import pickle

suefile = open('sue.pkl')
sue = pickle.load(suefile)
suefile.close()

sue['pay'] *= 1.10
suefile = open('sue.pkl', 'w')
pickle.dump(sue, suefile)
suefile.close()
Here are our file-per-record scripts in action; the results are about the same as in the prior section, but database keys become real filenames now. In a sense, the filesystem becomes our top-level dictionary—filenames provide direct access to each record.
...\PP3E\Preview> python make_db_pickle_recs.py

...\PP3E\Preview> python dump_db_pickle_recs.py
bob.pkl => {'pay': 30000, 'job': 'dev', 'age': 42, 'name': 'Bob Smith'}
tom.pkl => {'pay': 0, 'job': None, 'age': 50, 'name': 'Tom'}
sue.pkl => {'pay': 40000, 'job': 'mus', 'age': 45, 'name': 'Sue Jones'}
Sue Jones

...\PP3E\Preview> python update_db_pickle_recs.py

...\PP3E\Preview> python dump_db_pickle_recs.py
bob.pkl => {'pay': 30000, 'job': 'dev', 'age': 42, 'name': 'Bob Smith'}
tom.pkl => {'pay': 0, 'job': None, 'age': 50, 'name': 'Tom'}
sue.pkl => {'pay': 44000.0, 'job': 'mus', 'age': 45, 'name': 'Sue Jones'}
Sue Jones
Pickling objects to files, as shown in the preceding section, is an optimal scheme in many applications. In fact, some applications use pickling of Python objects across network sockets as a simpler alternative to network protocols such as the SOAP and XML-RPC web services architectures (also supported by Python, but much heavier than pickle).
Moreover, assuming your filesystem can handle as many files as you’ll need, pickling one record per file also obviates the need to load and store the entire database for each update. If we really want keyed access to records, though, the Python standard library offers an even higher-level tool: shelves.
Shelves automatically pickle objects to and from a keyed-access filesystem. They behave much like dictionaries that must be opened, and they persist after each program exits. Because they give us key-based access to stored records, there is no need to manually manage one flat file per record—the shelve system automatically splits up stored records and fetches and updates only those records that are accessed and changed. In this way, shelves provide utility similar to per-record pickle files, but are usually easier to code.
The shelve interface is just as simple as pickle: it is identical to dictionaries, with extra open and close calls. In fact, to your code, a shelve really does appear to be a persistent dictionary of persistent objects; Python does all the work of mapping its content to and from a file. For instance, Example 2-11 shows how to store our in-memory dictionary objects in a shelve for permanent keeping.
Example 2-11. make_db_shelve.py
from initdata import bob, sue
import shelve

db = shelve.open('people-shelve')
db['bob'] = bob
db['sue'] = sue
db.close()
This script creates one or more files in the current directory with the name people-shelve as a prefix; you shouldn’t delete these files (they are your database!), and you should be sure to use the same name in other scripts that access the shelve. Example 2-12, for instance, reopens the shelve and indexes it by key to fetch its stored records.
Example 2-12. dump_db_shelve.py
import shelve

db = shelve.open('people-shelve')
for key in db:
    print key, '=> ', db[key]
print db['sue']['name']
db.close()
We still have a dictionary of dictionaries here, but the top-level dictionary is really a shelve mapped onto a file. Much happens when you access a shelve's keys: it uses pickle to serialize and deserialize, and it interfaces with a keyed-access filesystem. From your perspective, though, it's just a persistent dictionary. Example 2-13 shows how to code shelve updates.
Example 2-13. update_db_shelve.py
from initdata import tom
import shelve

db = shelve.open('people-shelve')
sue = db['sue']             # fetch sue
sue['pay'] *= 1.50
db['sue'] = sue             # update sue
db['tom'] = tom             # add a new record
db.close()
Notice how this code fetches sue by key, updates it in memory, and then reassigns it to the key to update the shelve; this is a requirement of shelves, but not always of more advanced shelve-like systems such as ZODB (covered in Chapter 19). Also note how shelve files are explicitly closed; some underlying keyed-access filesystems may require this in order to flush output buffers after changes.
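The shelve module also offers a writeback option that removes the manual reassignment step: open the shelve with writeback=True, and in-place changes to fetched records are cached and flushed when the shelve is closed, at the cost of extra memory. A brief sketch (modern Python syntax; the temporary file path is just for illustration):

```python
import os
import shelve
import tempfile

path = os.path.join(tempfile.mkdtemp(), 'people-wb')   # throwaway shelve file
db = shelve.open(path, writeback=True)
db['sue'] = {'name': 'Sue Jones', 'pay': 40000}
db['sue']['pay'] *= 1.10                 # in-place change is tracked...
db.close()                               # ...and written out here

db = shelve.open(path)
print(db['sue']['pay'])                  # 44000.0
db.close()
```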
Finally, here are the shelve-based scripts on the job, creating, changing, and fetching records. The records are still dictionaries, but the database is now a dictionary-like shelve which automatically retains its state in a file between program runs:
...\PP3E\Preview> python make_db_shelve.py

...\PP3E\Preview> python dump_db_shelve.py
bob => {'pay': 30000, 'job': 'dev', 'age': 42, 'name': 'Bob Smith'}
sue => {'pay': 40000, 'job': 'mus', 'age': 45, 'name': 'Sue Jones'}
Sue Jones

...\PP3E\Preview> python update_db_shelve.py

...\PP3E\Preview> python dump_db_shelve.py
tom => {'pay': 0, 'job': None, 'age': 50, 'name': 'Tom'}
bob => {'pay': 30000, 'job': 'dev', 'age': 42, 'name': 'Bob Smith'}
sue => {'pay': 60000.0, 'job': 'mus', 'age': 45, 'name': 'Sue Jones'}
Sue Jones
When we ran the update and dump scripts here, we added a new record for key tom and increased Sue's pay field by 50 percent. These changes are permanent because the record dictionaries are mapped to an external file by shelve. (In fact, this is a particularly good script for Sue: something she might consider scheduling to run often, using a cron job on Unix, or a Startup folder or msconfig entry on Windows.)