Pickling allows you to store arbitrary objects on files and file-like objects, but it’s still a fairly unstructured medium; it doesn’t directly support easy access to members of collections of pickled objects. Higher-level structures can be added, but they are not inherent:
You can sometimes craft your own higher-level pickle file organizations with the underlying filesystem (e.g., you can store each pickled object in a file whose name uniquely identifies the object), but such an organization is not part of pickling itself and must be manually managed.
You can also store arbitrarily large dictionaries in a pickled file and index them by key after they are loaded back into memory, but this will load the entire dictionary all at once when unpickled, not just the entry you are interested in.
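The whole-dictionary approach just described can be sketched as follows. This is an illustration only, written for Python 3; the filename table.pkl and the sample records are made up for the example.

```python
import os
import pickle
import tempfile

# Hypothetical table of records, keyed by name
table = {'brian': {'age': 33}, 'knight': {'motto': 'Ni!'}}

path = os.path.join(tempfile.mkdtemp(), 'table.pkl')   # illustrative filename

# Store: the whole dictionary becomes a single pickle
with open(path, 'wb') as f:
    pickle.dump(table, f)

# Fetch: the whole dictionary is rebuilt in memory, even for one entry
with open(path, 'rb') as f:
    loaded = pickle.load(f)

print(loaded['knight'])
```

Indexing by key works only after the full load, which is the cost that shelves avoid.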
Shelves provide structure to collections of pickled objects that removes some of these constraints. They are a type of file that stores arbitrary Python objects by key for later retrieval, and they are a standard part of the Python system. Really, they are not much of a new topic—shelves are simply a combination of DBM files and object pickling:
To store an in-memory object by key, the shelve module first serializes the object to a string with the pickle module, and then it stores that string in a DBM file by key with the anydbm module.
To fetch an object back by key, the shelve module first loads the object’s serialized string by key from a DBM file with the anydbm module, and then converts it back to the original in-memory object with the pickle module.
Because shelve uses pickle internally, it can store any object that pickle can: strings, numbers, lists, dictionaries, cyclic objects, class instances, and more. In other words, shelve is just a go-between; it serializes and deserializes objects so that they can be placed in DBM files. The net effect is that shelves let you store nearly arbitrary Python objects on a file by key and fetch them back later with the same key.
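The two steps above can be imitated by hand. The following is a minimal sketch rather than the shelve module's actual source; it is written for Python 3, where the anydbm module is named dbm, and the store and fetch helper names and the filename are hypothetical.

```python
import dbm
import os
import pickle
import tempfile

def store(db, key, obj):
    """Pickle obj to bytes, then file the bytes under key in the DBM file."""
    db[key] = pickle.dumps(obj)

def fetch(db, key):
    """Load the pickled bytes filed under key and rebuild the object."""
    return pickle.loads(db[key])

path = os.path.join(tempfile.mkdtemp(), 'minishelf')   # illustrative filename
db = dbm.open(path, 'c')                               # 'c': create if missing
store(db, 'brian', {'name': 'Brian', 'age': 33})
db.close()

db = dbm.open(path, 'r')                               # reopen read-only
record = fetch(db, 'brian')
db.close()
print(record)
```

A shelve wraps this pairing behind a dictionary-like interface, so scripts never call the two modules themselves.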
Your scripts never see all of this interfacing, though. Like DBM files, shelves provide an interface that looks like a dictionary that must be opened. In fact, a shelve is simply a persistent dictionary of persistent Python objects—the shelve dictionary’s content is automatically mapped to a file on your computer so that it is retained between program runs. This is quite a trick, but it’s simpler to your code than it may sound. To gain access to a shelve, import the module and open your file:
import shelve
dbase = shelve.open("mydbase")
Internally, Python opens a DBM file with the name mydbase, or creates it if it does not yet exist. Assigning to a shelve key stores an object:
dbase['key'] = object
Internally, this assignment converts the object to a serialized byte stream and stores it by key on a DBM file. Indexing a shelve fetches a stored object:
value = dbase['key']
Internally, this index operation loads a string by key from a DBM file and unpickles it into an in-memory object that is the same as the object originally stored. Most dictionary operations are supported here too:
len(dbase)        # number of items stored
dbase.keys()      # stored item key index
And except for a few fine points, that’s really all there is to using a shelve. Shelves are processed with normal Python dictionary syntax, so there is no new database API to learn. Moreover, objects stored and fetched from shelves are normal Python objects; they do not need to be instances of special classes or types to be stored away. That is, Python’s persistence system is external to the persistent objects themselves. Table 19-2 summarizes these and other commonly used shelve operations.
Table 19-2. Shelve file operations
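| Python code | Action | Description |
|---|---|---|
| `import shelve` | Import | Get the shelve module |
| `dbase = shelve.open('filename')` | Open | Create or open an existing DBM file |
| `dbase['key'] = object` | Store | Create or change the entry for key |
| `value = dbase['key']` | Fetch | Load the value for the entry key |
| `count = len(dbase)` | Size | Return the number of entries stored |
| `index = dbase.keys()` | Index | Fetch the stored keys list |
| `found = dbase.has_key('key')` | Query | See if there’s an entry for key |
| `del dbase['key']` | Delete | Remove the entry for key |
| `dbase.close()` | Close | Manual close, not always needed |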
Because shelves export a dictionary-like interface too, this table is almost identical to the DBM operation table. Here, though, the module name anydbm is replaced by shelve, open calls do not require a second 'c' argument, and stored values can be nearly arbitrary kinds of objects, not just strings. You still should close shelves explicitly after making changes to be safe, though; shelves use anydbm internally, and some underlying DBMs require closes to avoid data loss or damage.
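As a quick check of the operations in the table, here is a short script that exercises each one and closes the shelve explicitly. It is a sketch written for Python 3, so the membership test uses the in operator rather than has_key; the file location and the record contents are made up.

```python
import os
import shelve
import tempfile

path = os.path.join(tempfile.mkdtemp(), 'mydbase')   # illustrative location

dbase = shelve.open(path)                 # Open: create or open the shelve
dbase['brian'] = {'name': 'Brian'}        # Store
dbase['knight'] = {'name': 'Knight'}
value = dbase['knight']                   # Fetch
count = len(dbase)                        # Size
index = sorted(dbase.keys())              # Index (sorted: key order may vary)
found = 'brian' in dbase                  # Query
del dbase['brian']                        # Delete
remaining = list(dbase.keys())
dbase.close()                             # Close: make sure changes are flushed

print(value, count, index, found, remaining)
```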
Let’s run an interactive session to experiment with shelve interfaces. As mentioned, shelves are essentially just persistent dictionaries of objects, which you open and close:
% python
>>> import shelve
>>> dbase = shelve.open("mydbase")
>>> object1 = ['The', 'bright', ('side', 'of'), ['life']]
>>> object2 = {'name': 'Brian', 'age': 33, 'motto': object1}
>>> dbase['brian'] = object2
>>> dbase['knight'] = {'name': 'Knight', 'motto': 'Ni!'}
>>> dbase.close()
Here, we open a shelve and store two fairly complex dictionary and list data structures away permanently by simply assigning them to shelve keys. Because shelve uses pickle internally, almost anything goes here: the trees of nested objects are automatically serialized into strings for storage. To fetch them back, just reopen the shelve and index:
% python
>>> import shelve
>>> dbase = shelve.open("mydbase")
>>> len(dbase)                             # entries
2
>>> dbase.keys()                           # index
['knight', 'brian']
>>> dbase['knight']                        # fetch
{'motto': 'Ni!', 'name': 'Knight'}
>>> for row in dbase.keys():
...     print row, '=>'
...     for field in dbase[row].keys():
...         print '   ', field, '=', dbase[row][field]
...
knight =>
    motto = Ni!
    name = Knight
brian =>
    motto = ['The', 'bright', ('side', 'of'), ['life']]
    age = 33
    name = Brian
The nested loops at the end of this session step through nested dictionaries—the outer scans the shelve and the inner scans the objects stored in the shelve. The crucial point to notice is that we’re using normal Python syntax, both to store and to fetch these persistent objects, as well as to process them after loading.
One of the more useful kinds of objects to store in a shelve is a class instance. Because its attributes record state and its inherited methods define behavior, persistent class objects effectively serve the roles of both database records and database-processing programs. We can also use the underlying pickle module to serialize instances to flat files and other file-like objects (e.g., trusted network sockets), but the higher-level shelve module also gives us a convenient keyed-access storage medium. For instance, consider the simple class shown in Example 19-2, which is used to model people.
Example 19-2. PP3E\Dbase\person.py (version 1)
# a person object: fields + behavior

class Person:
    def __init__(self, name, job, pay=0):
        self.name = name
        self.job  = job
        self.pay  = pay               # real instance data
    def tax(self):
        return self.pay * 0.25        # computed on call
    def info(self):
        return self.name, self.job, self.pay, self.tax()
Nothing about this class suggests it will be used for database records—it can be imported and used independent of external storage. It’s easy to use it for a database, though: we can make some persistent objects from this class by simply creating instances as usual, and then storing them by key on an opened shelve:
C:\...\PP3E\Dbase>python
>>> from person import Person
>>> bob = Person('bob', 'psychologist', 70000)
>>> emily = Person('emily', 'teacher', 40000)
>>>
>>> import shelve
>>> dbase = shelve.open('cast')          # make new shelve
>>> for obj in (bob, emily):             # store objects
...     dbase[obj.name] = obj            # use name for key
...
>>> dbase.close()                        # need for bsddb
Here we used the instance objects’ name attribute as their key in the shelve database. When we come back and fetch these objects in a later Python session or script, they are re-created in memory as they were when they were stored:
C:\...\PP3E\Dbase>python
>>> import shelve
>>> dbase = shelve.open('cast')          # reopen shelve
>>>
>>> dbase.keys()                         # both objects are here
['emily', 'bob']
>>> print dbase['emily']
<person.Person instance at 799940>
>>>
>>> print dbase['bob'].tax()             # call: bob's tax
17500.0
Notice that calling Bob’s tax method works even though we didn’t import the Person class here. Python is smart enough to link this object back to its original class when unpickled, such that all the original methods are available through fetched objects.
Technically, Python reimports a class to re-create its stored instances as they are fetched and unpickled. Here’s how this works:
When Python pickles a class instance to store it in a shelve, it saves the instance’s attributes plus a reference to the instance’s class. In effect, pickled class instances in the prior example record the self attributes assigned in the class. Really, Python serializes and stores the instance’s __dict__ attribute dictionary along with enough source file information to be able to locate the class’s module later.
When Python unpickles a class instance fetched from a shelve, it re-creates the instance object in memory by reimporting the class, assigning the saved attribute dictionary to a new empty instance, and linking the instance back to the class.
The key point in this is that the class and stored instance data are separate. The class itself is not stored with its instances, but is instead located in the Python source file and reimported later when instances are fetched.
The upshot is that by modifying external classes in module files, we can change the way stored objects’ data is interpreted and used without actually having to change those stored objects. It’s as if the class is a program that processes stored records.
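One rough way to see this separation is to inspect the bytes that pickle produces for an instance. The sketch below is written for Python 3 and relies on an assumption made only for the demo: instead of a real person.py file, it builds an in-memory module with the hypothetical name person_demo and registers it in sys.modules so that pickle can import it by name.

```python
import pickle
import sys
import types

# Demo-only stand-in for a module file named person_demo.py (hypothetical name)
source = '''
class Person:
    def __init__(self, name, pay=0):
        self.name = name
        self.pay = pay
    def tax(self):
        return self.pay * 0.25
'''
module = types.ModuleType('person_demo')
exec(source, module.__dict__)
sys.modules['person_demo'] = module       # lets "import person_demo" succeed

bob = module.Person('bob', 70000)
data = pickle.dumps(bob)

print(bob.__dict__)                       # the instance state that is saved
print(b'person_demo' in data)             # the module name is recorded
print(b'Person' in data)                  # the class name is recorded
print(b'tax' in data)                     # the method is not part of the pickle
```

The pickle carries names and attribute values only; the method code stays in the module and is found again by import at load time.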
To illustrate, suppose the Person class from the previous section was changed to the source code in Example 19-3.
Example 19-3. PP3E\Dbase\person.py (version 2)
# a person object: fields + behavior
# change: the tax method is now a computed attribute

class Person:
    def __init__(self, name, job, pay=0):
        self.name = name
        self.job  = job
        self.pay  = pay               # real instance data
    def __getattr__(self, attr):      # on person.attr
        if attr == 'tax':
            return self.pay * 0.30    # computed on access
        else:
            raise AttributeError      # other unknown names
    def info(self):
        return self.name, self.job, self.pay, self.tax
This revision has a new tax rate (30 percent), introduces a __getattr__ qualification overload method, and deletes the original tax method. Tax attribute references are intercepted and computed when accessed:
C:\...\PP3E\Dbase>python
>>> import shelve
>>> dbase = shelve.open('cast')          # reopen shelve
>>>
>>> print dbase.keys()                   # both objects are here
['emily', 'bob']
>>> print dbase['emily']
<person.Person instance at 79aea0>
>>>
>>> print dbase['bob'].tax               # no need to call tax()
21000.0
Because the class has changed, tax is now simply qualified, not called. In addition, because the tax rate was changed in the class, Bob pays more this time around. Of course, this example is artificial, but when used well, this separation of classes and persistent instances can eliminate many traditional database update programs. In most cases, you can simply change the class, not each stored instance, for new behavior.
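The same effect can be reproduced in a single Python 3 script with pickle alone. This is a sketch under a stated assumption: the hypothetical load_class_module helper stands in for editing the class's module file and importing it again in a later session, and person_demo is a made-up module name. The stored bytes are never touched between the two loads.

```python
import pickle
import sys
import types

def load_class_module(source, name='person_demo'):
    """Demo-only stand-in for editing name.py and importing it afresh."""
    module = types.ModuleType(name)
    exec(source, module.__dict__)
    sys.modules[name] = module
    return module

version1 = '''
class Person:
    def __init__(self, name, pay=0):
        self.name = name
        self.pay = pay
    def tax(self):
        return self.pay * 0.25
'''
version2 = '''
class Person:
    def __init__(self, name, pay=0):
        self.name = name
        self.pay = pay
    def __getattr__(self, attr):
        if attr == 'tax':
            return self.pay * 0.30
        raise AttributeError(attr)
'''

person = load_class_module(version1)
stored = pickle.dumps(person.Person('bob', 70000))   # what a shelve would save
old_tax = pickle.loads(stored).tax()                 # version 1: method call

load_class_module(version2)                          # the class "file" changes
new_tax = pickle.loads(stored).tax                   # same bytes, new behavior
print(old_tax, new_tax)
```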
Although shelves are generally straightforward to use, there are a few rough edges worth knowing about.
First, although shelves can store arbitrary objects, keys must still be strings. The following fails, unless you convert the integer 42 to the string '42' manually first:
dbase[42] = value # fails, but str(42) will work
This is different from in-memory dictionaries, which allow any immutable object to be used as a key, and derives from the shelve’s use of DBM files internally.
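A small Python 3 sketch of this restriction follows. The exact exception raised for a nonstring key differs across Python versions, so the example catches both likely candidates; the filename is illustrative.

```python
import os
import shelve
import tempfile

path = os.path.join(tempfile.mkdtemp(), 'keysdemo')   # illustrative filename
dbase = shelve.open(path)

try:
    dbase[42] = 'value'                   # integer key: rejected
    int_key_ok = True
except (TypeError, AttributeError):       # exact exception varies by version
    int_key_ok = False

dbase[str(42)] = 'value'                  # string key: accepted
keys = list(dbase.keys())
dbase.close()
print(int_key_ok, keys)
```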
Although the shelve module is smart enough to detect multiple occurrences of a nested object and re-create only one copy when fetched, this holds true only within a given slot:
dbase[key] = [object, object]    # OK: only one copy stored and fetched

dbase[key1] = object
dbase[key2] = object             # bad?: two copies of object in the shelve
When key1 and key2 are fetched, they reference independent copies of the original shared object; if that object is mutable, changes from one won’t be reflected in the other. This really stems from the fact that each key assignment runs an independent pickle operation: the pickler detects repeated objects, but only within each pickle call. This may or may not be a concern in your practice, and it can be avoided with extra support logic, but an object can be duplicated if it spans keys.
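The difference between sharing within one key and sharing across keys can be checked directly. This is a Python 3 sketch with made-up key names and an illustrative filename.

```python
import os
import shelve
import tempfile

path = os.path.join(tempfile.mkdtemp(), 'shared')   # illustrative filename
shared = ['spam']

dbase = shelve.open(path)
dbase['pair'] = [shared, shared]      # one pickle call: sharing is recorded
dbase['key1'] = shared                # two separate pickle calls:
dbase['key2'] = shared                # each stores its own copy
dbase.close()

dbase = shelve.open(path)
pair = dbase['pair']
first, second = dbase['key1'], dbase['key2']
dbase.close()

same_within_key = pair[0] is pair[1]  # still one object after the fetch
same_across_keys = first is second    # two independent objects
first.append('eggs')                  # changes only this in-memory copy
print(same_within_key, same_across_keys, second)
```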
Because objects fetched from a shelve don’t know that they came from a shelve, operations that change components of a fetched object change only the in-memory copy, not the data on a shelve:
dbase[key].attr = value # shelve unchanged
To really change an object stored on a shelve, fetch it into memory, change its parts, and then write it back to the shelve as a whole by key assignment:
object = dbase[key]      # fetch it
object.attr = value      # modify it
dbase[key] = object      # store back; shelve changed
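Both cases can be compared side by side. The following Python 3 sketch uses a made-up record and an illustrative filename.

```python
import os
import shelve
import tempfile

path = os.path.join(tempfile.mkdtemp(), 'updates')   # illustrative filename
dbase = shelve.open(path)
dbase['bob'] = {'pay': 70000}

dbase['bob']['pay'] = 0               # changes a temporary copy only
after_in_place = dbase['bob']['pay']

record = dbase['bob']                 # fetch it
record['pay'] = 80000                 # modify it
dbase['bob'] = record                 # store it back: shelve changed
after_store_back = dbase['bob']['pay']
dbase.close()
print(after_in_place, after_store_back)
```

Later Python releases also let shelve.open take a writeback=True argument, which caches fetched entries and writes them back when the shelve is synced or closed, at the cost of extra memory.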
The shelve module does not currently support simultaneous updates. Simultaneous readers are OK, but writers must be given exclusive access to the shelve. You can trash a shelve if multiple processes write to it at the same time, which is a real possibility in programs such as Common Gateway Interface (CGI) server-side scripts. If your shelves may be hit by multiple processes, be sure to wrap updates in calls to the fcntl.flock or os.open built-ins to lock files and provide exclusive access.
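One possible shape for such a wrapper is sketched below for Python 3 on Unix-like systems, where the fcntl module is available. The locked_update function and the separate .lock file are assumptions made for this example, not part of the shelve module, and flock locks are advisory: they protect the shelve only if every writer goes through the same wrapper.

```python
import fcntl                          # Unix-only module
import os
import shelve
import tempfile

def locked_update(path, key, value):
    """Update one shelve entry while holding an exclusive advisory lock."""
    with open(path + '.lock', 'w') as lockfile:       # hypothetical lock file
        fcntl.flock(lockfile, fcntl.LOCK_EX)          # block until we own it
        try:
            dbase = shelve.open(path)
            try:
                dbase[key] = value
            finally:
                dbase.close()                         # flush before unlocking
        finally:
            fcntl.flock(lockfile, fcntl.LOCK_UN)

path = os.path.join(tempfile.mkdtemp(), 'hits')       # illustrative filename
locked_update(path, 'count', 1)
locked_update(path, 'count', 2)
```

Locking a companion file rather than the shelve's own files avoids depending on which files the underlying DBM happens to create.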
With shelves, the files created by an underlying DBM system used to store your persistent objects are not necessarily compatible with all possible DBM implementations or Pythons. For instance, a file generated by gdbm on Linux, or by the BSD library on Windows, may not be readable by a Python with other DBM modules installed.
Technically, when a DBM file (or by proxy, a shelve) is created, the anydbm module tries to import all possible DBM system modules in a predefined order and uses the first that it finds. When anydbm later opens an existing file, it attempts to determine which DBM system created it by inspecting the file(s) using the module whichdb. Because the BSD system is tried first at file creation time and is available on both Windows and many Unix-like systems, your DBM file is portable as long as your Pythons support BSD on both platforms. If the system used to create a DBM file is not available on the underlying platform, though, the DBM file cannot be used.
If DBM file portability is a concern, make sure that all the Pythons that will read your data use compatible DBM modules. If that is not an option, use the pickle module directly and flat files for storage, or use the ZODB system we’ll meet later in this chapter.
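To find out which DBM system produced a given shelve, you can ask the same detection logic directly. The sketch below is written for Python 3, where this function lives at dbm.whichdb rather than in a separate whichdb module; the filename is illustrative, and the name reported depends on which DBM modules your Python has installed.

```python
import dbm
import os
import shelve
import tempfile

path = os.path.join(tempfile.mkdtemp(), 'portable')   # illustrative filename
dbase = shelve.open(path)
dbase['key'] = 'value'
dbase.close()

kind = dbm.whichdb(path)      # name of the DBM module that created the file
print(kind)
```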
In addition to these shelve constraints, storing class instances in a shelve adds a set of additional rules you need to be aware of. Really, these are imposed by the pickle module, not by shelve, so be sure to follow these if you store class objects with pickle directly too:
The Python pickler stores instance attributes only when pickling an instance object, and it reimports the class later to re-create the instance. Because of that, the classes of stored objects must be importable when objects are unpickled: they must be coded unnested at the top level of a module file that is accessible on the module import search path at load time (e.g., named in PYTHONPATH or in a .pth file).
Further, they must be associated with a real module when instances are pickled, not with a top-level script (with the module name __main__), unless they will only ever be used in the top-level script.
You need to be careful about moving class modules after instances are stored. When an instance is unpickled, Python must find its class’s module on the module search path using the original module name (including any package path prefixes) and fetch the class from that module using the original class name. If the module or class has been moved or renamed, it might not be found.
In applications where pickled objects are shipped over network sockets, it’s possible to deal with this constraint by shipping the text of the class along with stored instances; recipients may simply store the class in a local module file on the import search path prior to unpickling received instances. Where this is inconvenient, simpler objects such as lists and dictionaries with nesting may be transferred instead.
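The importability rule is easy to trip over, so here is a sketch of what the failure looks like in Python 3. As an assumption made only for the demo, an in-memory module with the hypothetical name person_demo stands in for a real module file, and deleting it from sys.modules simulates a module that has been moved or renamed.

```python
import pickle
import sys
import types

# Demo-only stand-in for a module file named person_demo.py (hypothetical name)
module = types.ModuleType('person_demo')
exec('class Person:\n    pass\n', module.__dict__)
sys.modules['person_demo'] = module

bob = module.Person()
bob.name = 'bob'
data = pickle.dumps(bob)              # records the names person_demo and Person

del sys.modules['person_demo']        # simulate a moved or renamed module
try:
    pickle.loads(data)
    outcome = 'loaded'
except ImportError as exc:            # ModuleNotFoundError is a subclass
    outcome = 'failed: %s' % exc
print(outcome)
```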
Although Python lets you change a class while instances of it are stored on a shelve, those changes must be backward compatible with the objects already stored. For instance, you cannot change the class to expect an attribute not associated with already stored persistent instances unless you first manually update those stored instances or provide extra conversion protocols on the class.
Shelves also inherit the pickling systems’ nonclass limitations. As discussed earlier, some types of objects (e.g., open files and sockets) cannot be pickled, and thus cannot be stored in a shelve.
In a prior Python release, persistent object classes also had to either use constructors with no arguments or provide defaults for all constructor arguments (much like the notion of a C++ copy constructor). This constraint was dropped as of Python 1.5.2—classes with nondefaulted constructor arguments now work fine in the pickling system.[*]
Finally, although shelves store objects persistently, they are not really object-oriented database systems. Such systems also implement features such as automatic write-through on changes, transaction commits and rollbacks, safe concurrent updates, and object decomposition and delayed (“lazy”) component fetches based on generated object ID. Parts of larger objects are loaded into memory only as they are accessed. It’s possible to extend shelves to support such features manually, but you don’t need to—the ZODB system provides an implementation of a more complete object-oriented database system. It is constructed on top of Python’s built-in pickling persistence support, but it offers additional features for advanced data stores. For more on ZODB, let’s move on to the next section.
[*] Subtle thing: internally, Python now avoids calling the class to re-create a pickled instance and instead simply makes a class object generically, inserts instance attributes, and sets the instance’s __class__ pointer to the original class directly. This avoids the need for defaults, but it also means that class __init__ constructors are no longer called as objects are unpickled, unless you provide extra methods to force the call. See the library manual for more details, and see the pickle module’s source code (pickle.py in the source library) if you’re curious about how this works. Better yet, see the formtable module listed ahead in this chapter; it does something very similar with __class__ links to build an instance object from a class and dictionary of attributes, without calling the class’s __init__ constructor. This makes constructor argument defaults unnecessary in classes used for records browsed by PyForm, but it’s the same idea.
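The constructor behavior described in this note can be observed with a class that records each __init__ call. This is a Python 3 sketch; the in-memory module named person_demo and its calls list are hypothetical devices for the demo, standing in for a real module file.

```python
import pickle
import sys
import types

source = '''
calls = []

class Person:
    def __init__(self, name):         # no default for name
        calls.append(name)            # record each constructor call
        self.name = name
'''
module = types.ModuleType('person_demo')      # hypothetical module name
exec(source, module.__dict__)
sys.modules['person_demo'] = module

bob = module.Person('bob')                    # __init__ runs here
clone = pickle.loads(pickle.dumps(bob))       # but not here
print(module.calls, clone.name)
```

The clone has its name attribute even though the constructor ran only once, for the original object.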