Flat files are handy for simple persistence tasks, but they are generally geared toward a sequential processing mode. Although it is possible to jump around to arbitrary locations with seek calls, flat files don’t provide much structure to data beyond the notion of bytes and text lines.
DBM files, a standard tool in the Python library for database management, improve on that by providing key-based access to stored text strings. They implement a random-access, single-key view on stored data. For instance, information related to objects can be stored in a DBM file using a unique key per object and later can be fetched back directly with the same key. DBM files are implemented by a variety of underlying modules (including one coded in Python), but if you have Python, you have a DBM.
Although DBM filesystems have to do a bit of work to map chunks of stored data to keys for fast retrieval (technically, they generally use a technique called hashing to store data in files), your scripts don’t need to care about the action going on behind the scenes. In fact, DBM is one of the easiest ways to save information in Python—DBM files behave so much like in-memory dictionaries that you may forget you’re actually dealing with a file. For instance, given a DBM file object:
Indexing by key fetches data from the file.
Assigning to an index stores data in the file.
DBM file objects also support common dictionary operations such as fetching the list of keys, testing for a key, and deleting a key. The DBM library itself is hidden behind this simple model. Since it is so simple, let’s jump right into an interactive example that creates a DBM file and shows how the interface works:
% python
>>> import anydbm                           # get interface: dbm, gdbm, ndbm,..
>>> file = anydbm.open('movie', 'c')        # make a DBM file called 'movie'
>>> file['Batman'] = 'Pow!'                 # store a string under key 'Batman'
>>> file.keys()                             # get the file's key directory
['Batman']
>>> file['Batman']                          # fetch value for key 'Batman'
'Pow!'
>>> who  = ['Robin', 'Cat-woman', 'Joker']
>>> what = ['Bang!', 'Splat!', 'Wham!']
>>> for i in range(len(who)):
...     file[who[i]] = what[i]              # add 3 more "records"
...
>>> file.keys()
['Joker', 'Robin', 'Cat-woman', 'Batman']
>>> len(file), file.has_key('Robin'), file['Joker']
(4, 1, 'Wham!')
>>> file.close()                            # close sometimes required
Internally, importing anydbm automatically loads whatever DBM interface is available in your Python interpreter, and opening the new DBM file creates one or more external files with names that start with the string 'movie' (more on the details in a moment). But after the import and open, a DBM file is virtually indistinguishable from a dictionary. In effect, the object called file here can be thought of as a dictionary mapped to an external file called movie.
Unlike normal dictionaries, though, the contents of file are retained between Python program runs. If we come back later and restart Python, our dictionary is still available. DBM files are like dictionaries that must be opened:
% python
>>> import anydbm
>>> file = anydbm.open('movie', 'c')        # open existing DBM file
>>> file['Batman']
'Pow!'
>>> file.keys()                             # keys gives an index list
['Joker', 'Robin', 'Cat-woman', 'Batman']
>>> for key in file.keys(): print key, file[key]
...
Joker Wham!
Robin Bang!
Cat-woman Splat!
Batman Pow!
>>> file['Batman'] = 'Ka-Boom!'             # change Batman slot
>>> del file['Robin']                       # delete the Robin entry
>>> file.close()                            # close it after changes
Apart from having to import the interface and open and close the DBM file, Python programs don’t have to know anything about DBM itself. DBM modules achieve this integration by overloading the indexing operations and routing them to more primitive library tools. But you’d never know that from looking at this Python code—DBM files look like normal Python dictionaries, stored on external files. Changes made to them are retained indefinitely:
% python
>>> import anydbm
>>> file = anydbm.open('movie', 'c')        # open DBM file again
>>> for key in file.keys(): print key, file[key]
...
Joker Wham!
Cat-woman Splat!
Batman Ka-Boom!
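To picture how a DBM module can route indexing to more primitive tools, here is a minimal sketch of the operator-overloading idea. The KeyedStore class and its _fetch, _store, and _delete methods are hypothetical names invented for illustration; a real DBM module calls into a keyed-file library rather than an in-memory dictionary.

```python
class KeyedStore:
    """Toy stand-in for a DBM file object (hypothetical, not a real backend)."""

    def __init__(self):
        self._records = {}              # pretend this is the external file

    # "primitive" operations, standing in for the underlying library calls
    def _fetch(self, key):
        return self._records[key]

    def _store(self, key, value):
        self._records[key] = value

    def _delete(self, key):
        del self._records[key]

    # overloaded operators simply route to the primitives above
    def __getitem__(self, key):         # runs for: value = store[key]
        return self._fetch(key)

    def __setitem__(self, key, value):  # runs for: store[key] = value
        self._store(key, value)

    def __delitem__(self, key):         # runs for: del store[key]
        self._delete(key)

    def __len__(self):                  # runs for: len(store)
        return len(self._records)

    def keys(self):
        return list(self._records)


store = KeyedStore()
store['Batman'] = 'Pow!'
store['Robin'] = 'Bang!'
del store['Robin']
print(store['Batman'])
print(len(store))
```

None of this machinery is visible to code that uses the object; the caller just indexes, assigns, and deletes.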
As you can see, this is about as simple as it can be. Table 19-1 lists the most commonly used DBM file operations. Once such a file is opened, it is processed just as though it were an in-memory Python dictionary. Items are fetched by indexing the file object by key and are stored by assigning to a key.
Table 19-1. DBM file operations
Python code | Action | Description |
---|---|---|
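import anydbm | Import | Get the DBM interface: dbm, gdbm, ndbm, and so on |
file = anydbm.open('filename', 'c') | Open | Create or open an existing DBM file |
file['key'] = 'value' | Store | Create or change the entry for key |
value = file['key'] | Fetch | Load the value for the entry key |
count = len(file) | Size | Return the number of entries stored |
index = file.keys() | Index | Fetch the stored keys list |
found = file.has_key('key') | Query | See if there’s an entry for key |
del file['key'] | Delete | Remove the entry for key |
file.close() | Close | Manual close, not always needed |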
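For readers on a current Python, the same operations can be sketched as a short script. This is not the code from the sessions above: it assumes Python 3, where the anydbm module is named dbm, has_key is spelled with the in operator, and stored keys and values come back as bytes. The temporary directory and the name db are also choices made here for illustration.

```python
import dbm                      # Python 3 name for what this section calls anydbm
import os
import tempfile

# scratch location so the example leaves your working directory alone
path = os.path.join(tempfile.mkdtemp(), 'movie')

db = dbm.open(path, 'c')        # Open: create, or open if it exists
db['Batman'] = 'Pow!'           # Store
db['Robin'] = 'Bang!'
value = db['Batman']            # Fetch: bytes in Python 3
count = len(db)                 # Size
index = sorted(db.keys())       # Index: keys are bytes too
found = 'Robin' in db           # Query: replaces db.has_key('Robin')
del db['Robin']                 # Delete
remaining = len(db)
db.close()                      # Close

print(value, count, index, found, remaining)
```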
Despite the dictionary-like interface, DBM files really do map to one or more external files. For instance, the underlying gdbm interface writes two files, movie.dir and movie.pag, when a GDBM file called movie is made. If your Python was built with a different underlying keyed-file interface, different external files might show up on your computer.
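If you are curious which external files your own installation writes, a small experiment along the following lines will show them. It is a sketch that assumes Python 3 (where anydbm is named dbm) and uses a temporary directory chosen here for illustration; the filenames printed depend on the keyed-file interface your Python selects, so no particular output is promised.

```python
import dbm                      # Python 3 name for anydbm
import os
import tempfile

folder = tempfile.mkdtemp()     # empty scratch directory

db = dbm.open(os.path.join(folder, 'movie'), 'c')
db['Batman'] = 'Pow!'
db.close()

# whatever shows up here was written by the DBM interface in use
created = sorted(os.listdir(folder))
print(created)
```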
Technically, the module anydbm is really an interface to whatever DBM-like filesystem you have available in your Python. When creating a new file, anydbm today tries to load the dbhash, gdbm, and dbm keyed-file interface modules; Pythons without any of these automatically fall back on an all-Python implementation called dumbdbm. When opening an already existing DBM file, anydbm tries to determine the system that created it with the whichdb module instead. You normally don’t need to care about any of this, though (unless you delete the files your DBM creates).
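The detection step can be watched in isolation. The sketch below assumes Python 3, where dumbdbm lives on as dbm.dumb and whichdb as the function dbm.whichdb. It deliberately creates the file with the all-Python implementation so that the result does not depend on which other interfaces happen to be installed.

```python
import dbm                      # Python 3 name for anydbm
import dbm.dumb                 # Python 3 name for dumbdbm
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), 'movie')

db = dbm.dumb.open(path, 'c')   # force the all-Python implementation
db['Batman'] = 'Pow!'
db.close()

kind = dbm.whichdb(path)        # inspect the files and name the module
print(kind)

db = dbm.open(path, 'r')        # generic open defers to the module whichdb named
value = db['Batman']
db.close()
print(value)
```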
Note that DBM files may or may not need to be explicitly closed, per the last entry in Table 19-1. Some DBM files don’t require a close call, but some depend on it to flush changes out to disk. On such systems, your file may be corrupted if you omit the close call. Unfortunately, the default DBM as of the 1.5.2 Windows Python port, dbhash (a.k.a. bsddb), is one of the DBM systems that requires a close call to avoid data loss. As a rule of thumb, always close your DBM files explicitly after making changes and before your program exits, to avoid potential problems. This rule extends by proxy to shelves, which is a topic we’ll meet later in this chapter.
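One way to make the close call hard to forget is to let a with statement issue it. The following is a sketch rather than a required idiom: it assumes Python 3 (anydbm is named dbm there, and values come back as bytes) and uses contextlib.closing from the standard library, which calls close on the object when the block exits, even if an exception is raised inside it.

```python
import contextlib
import dbm                      # Python 3 name for anydbm
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), 'movie')

# closing() guarantees db.close() runs when the block ends
with contextlib.closing(dbm.open(path, 'c')) as db:
    db['Batman'] = 'Ka-Boom!'

# reopen to confirm the change reached the file
with contextlib.closing(dbm.open(path, 'r')) as db:
    saved = db['Batman']

print(saved)
```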
In Python versions 1.5.2 and later, be sure to also pass a string 'c' as a second argument when calling anydbm.open, to force Python to create the file if it does not yet exist, and to simply open it otherwise. This used to be the default behavior but is no longer. You do not need the 'c' argument when opening the shelves discussed ahead; they still use an “open or create” mode by default if passed no open mode argument. Other open mode strings can be passed to anydbm.open (e.g., 'n' to always create the file and 'r' for read-only, the new default); see the library reference manuals for more details.
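The effect of these mode strings is easy to check for yourself. The sketch below assumes Python 3, where anydbm is named dbm and dbm.error is the exception raised when a file cannot be opened; the temporary path is a choice made here for illustration.

```python
import dbm                      # Python 3 name for anydbm
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), 'movie')

# default mode is 'r': the file must already exist
try:
    dbm.open(path)
    opened_missing = True
except dbm.error:
    opened_missing = False

db = dbm.open(path, 'c')        # 'c': create if missing, else open
db['Batman'] = 'Pow!'
db.close()

db = dbm.open(path, 'c')        # a second 'c' open keeps existing entries
kept = len(db)
db.close()

db = dbm.open(path, 'n')        # 'n': always start over with an empty file
fresh = len(db)
db.close()

print(opened_missing, kept, fresh)
```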