Stacks are a common and straightforward data structure, used in a variety of applications: language processing, graph searches, and so on. In short, stacks are a last-in-first-out collection of objects—the last item added to the collection is always the next one to be removed. Clients use stacks by:
Pushing items onto the top
Popping items off the top
Depending on client requirements, there may also be tools for such tasks as testing whether the stack is empty, fetching the top item without popping it, iterating over a stack’s items, testing for item membership, and so on.
In Python, a simple list is often adequate for implementing a
stack: because we can change lists in place, we can add and delete
items from either the beginning (left) or the end (right). Table 20-1 summarizes various
built-in operations available for implementing stack-like behavior
with Python lists, depending on whether the stack “top” is the first
or the last node in the list. In this table, the string 'c'
is the top item on the stack.
Other coding schemes are possible as well. For instance, Python
1.5 introduced a list pop
method
designed to be used in conjunction with append
to implement stacks—for example, to
push run list.append(value)
and to
pop run x=list.pop( )
. By default,
the pop
method is equivalent to
fetching, and then deleting the last item at offset -1 (and is
equivalent to the two statements in the last row in column 1 of Table 20-1). With an argument,
pop
deletes and returns the item at
that offset—list.pop(0)
is the same
as the table’s last rows in columns 2 and 3. And del stack[0]
is yet another way to delete
the first item in a list-based stack.
This list arrangement works and will be relatively fast. But it also binds stack-based programs to the stack representation chosen: all stack operations will be hardcoded. If we later want to change how a stack is represented or extend its basic operations, we’re stuck. Every stack-based program will have to be updated.
For instance, to add logic that monitors the number of stack operations a program performs, we’d have to add code around each hardcoded stack operation. In a large system, this operation may be nontrivial. As we’ll see in the next part of the book, we may also decide to move stacks to a C-based implementation, if they prove to be a performance bottleneck. As a general rule, hardcoded operations on built-in data structures don’t support future migrations as well as we’d sometimes like.
Built-in types such as lists are actually class-like objects in Python that we can subclass to customize. Unless we anticipate future changes and make instances of a subclass, though, we still have a maintenance issue if we use built-in list operations and ever want to extend what they do in the future (more on subclassing built-in types later in this chapter).
Perhaps a better approach is to encapsulate—that is, wrap up—stack implementations behind interfaces, using Python’s code reuse tools. As long as clients stick to using the interfaces, we’re free to change the interfaces’ implementations arbitrarily without having to change every place they are called. Let’s begin by implementing a stack as a module containing a Python list, plus functions to operate on it (see Example 20-1).
Example 20-1. PP3EDstructBasicstack1.py
stack = [] # on first import class error(Exception): pass # local excs, stack1.error def push(obj): global stack # use 'global' to change stack = [obj] + stack # add item to the front def pop( ): global stack if not stack: raise error, 'stack underflow' # raise local error top, stack = stack[0], stack[1:] # remove item at front return top def top( ): if not stack: # raise local error raise error, 'stack underflow' # or let IndexError occur return stack[0] def empty( ): return not stack # is the stack []? def member(obj): return obj in stack # item in stack? def item(offset): return stack[offset] # index the stack def length( ): return len(stack) # number entries def dump( ): print '<Stack:%s>' % stack
This module creates a list object (stack
) and exports functions to manage
access to it. The stack is declared global in functions that change
it, but not in those that just reference it. The module also defines
an error object (error
) that can
be used to catch exceptions raised locally in this module. Some
stack errors are built-in exceptions: the method item
triggers IndexError
for out-of-bounds
indexes.
Most of the stack’s functions just delegate the operation to the embedded list used to represent the stack. In fact, the module is really just a wrapper around a Python list. But this extra layer of interface logic makes clients independent of the actual implementation of the stack. So, we’re able to change the stack later without impacting its clients.
As usual, one of the best ways to understand such code is to see it in action. Here’s an interactive session that illustrates the module’s interfaces:
C:...PP3EDstructBasic>python
>>>import stack1
>>>stack1.push('spam')
>>>stack1.push(123)
>>>stack1.top( )
123 >>>stack1.stack
[123, 'spam'] >>>stack1.pop( )
123 >>>stack1.dump( )
<Stack:['spam']> >>>stack1.pop( )
'spam' >>>stack1.empty( )
1 >>>for c in 'spam': stack1.push(c)
... >>>while not stack1.empty( ):
...print stack1.pop( ),
... m a p s >>> >>>stack1.pop( )
Traceback (most recent call last): File "<stdin>", line 1, in ? File "stack1.py", line 11, in pop raise error, 'stack underflow' # raise local error stack1.error: stack underflow
Other operations are analogous, but the main thing to notice
here is that all stack operations are module
functions. For instance, it’s possible to
iterate over the stack, but we need to use counter-loops and
indexing function calls (item
).
Nothing is preventing clients from accessing (and changing) stack1.stack
directly, but doing so
defeats the purpose of interfaces like this one.
Perhaps the biggest drawback of the module-based
stack is that it supports only a single stack object. All
clients of the stack
module
effectively share the same stack. Sometimes we want this feature: a
stack can serve as a shared-memory object for multiple modules. But
to implement a true stack datatype, we need to use classes.
To illustrate, let’s define a full-featured stack
class. The Stack
class shown in Example 20-2 defines a new
datatype with a variety of behaviors. Like the module, the class
uses a Python list to hold stacked objects. But this time, each
instance gets its own list. The class defines both “real” methods,
and specially named methods that implement common type operations.
Comments in the code describe special methods.
Example 20-2. PP3EDstructBasicstack2.py
class error(Exception): pass # when imported: local exception class Stack: def _ _init_ _(self, start=[]): # self is the instance object self.stack = [] # start is any sequence: stack.. for x in start: self.push(x) self.reverse( ) # undo push's order reversal def push(self, obj): # methods: like module + self self.stack = [obj] + self.stack # top is front of list def pop(self): if not self.stack: raise error, 'underflow' top, self.stack = self.stack[0], self.stack[1:] return top def top(self): if not self.stack: raise error, 'underflow' return self.stack[0] def empty(self): return not self.stack # instance.empty( ) # overloads def _ _repr_ _(self): return '[Stack:%s]' % self.stack # print, backquotes,.. def _ _cmp_ _(self, other): return cmp(self.stack, other.stack) # '==', '>, '<=', '!=',.. def _ _len_ _(self): return len(self.stack) # len(instance), not instance def _ _add_ _(self, other): return Stack(self.stack + other.stack) # instance1 + instance2 def _ _mul_ _(self, reps): return Stack(self.stack * reps) # instance * reps def _ _getitem_ _(self, offset): return self.stack[offset] # instance[offset], in, for def _ _getslice_ _(self, low, high): return Stack(self.stack[low : high]) # instance[low:high] def _ _getattr_ _(self, name): return getattr(self.stack, name) # instance.sort()/reverse( )/..
Now distinct instances are created by calling the Stack
class like a function. In most
respects, the Stack
class
implements operations exactly like the stack
module in Example 20-1. But here, access
to the stack is qualified by self
, the subject instance object. Each
instance has its own stack
attribute, which refers to the instance’s own list. Furthermore,
instance stacks are created and initialized in the _ _init_ _
constructor method, not when
the module is imported. Let’s make a couple of stacks to see how all
this works in practice:
>>>from stack2 import Stack
>>>x = Stack( )
# make a stack object, push items >>>x.push('spam')
>>>x.push(123)
>>>x
# _ _repr_ _ prints a stack [Stack:[123, 'spam']] >>>y = Stack( )
# two distinct stack objects >>>y.push(3.1415)
# they do not share content >>>y.push(x.pop( ))
>>>x, y
([Stack:['spam']], [Stack:[123, 3.1415]]) >>>z = Stack( )
# third distinct stack object >>>for c in 'spam': z.push(c)
... >>>while z: print z.pop( ),
# _ _len_ _ tests stack truth ... m a p s >>>z = x + y
# _ _add_ _ handles stack + >>>z
# holds three different types [Stack:['spam', 123, 3.1415]] >>>for item in z: print item,
# _ _getitem_ _ does for ... spam 123 3.1415
Like lists and dictionaries, Stack
defines both methods and operators for manipulating instances by attribute
qualification and expressions. Additionally, it defines the _ _getattr_ _
metaclass method to
intercept references to attributes not defined in the class and to
route them to the wrapped list object (to support list methods:
sort
, append
, reverse
, and so on). Many of the module’s
operations become operators in the class. Table 20-2 shows the
equivalence of module and class operations (columns 1 and 2) and
gives the class method that comes into play for each (column
3).
Table 20-2. Module/class operation comparison
Module operations | Class operations | Class method |
---|---|---|
module.empty( ) | | _ _len_ _ |
module.member(x) | | _ _getitem_ _ |
module.item(i) | instance[i] | _ _getitem_ _ |
module.length( ) | len(instance) | _ _len_ _ |
module.dump( ) | | _ _repr_ _ |
| | _ _getitem_ _ |
| | _ _add_ _ |
module.stack.reverse( ) | instance.reverse( ) | _ _getattr_ _ |
module.push/pop/top | instance.push/pop/top | push/pop/top |
In effect, classes let us extend Python’s set of built-in types with reusable types implemented in Python modules. Class-based types may be used just like built-in types: depending on which operation methods they define, classes can implement numbers, mappings, and sequences, and may or may not be mutable. Class-based types may also fall somewhere in between these categories.
So far we’ve seen how classes support multiple instances and integrate better with Python’s object model by defining operator methods. One of the other main reasons for using classes is to allow for future extensions and customizations. By implementing stacks with a class, we can later add subclasses that specialize the implementation for new demands.
For instance, suppose we’ve started using the Stack
class in Example 20-2, but we start
running into performance problems. One way to isolate bottlenecks is to
instrument data structures with logic that keeps track of usage
statistics, which we can analyze after running client applications.
Because Stack
is a class, we can
add such logic in a new subclass without affecting the original
stack module (or its clients). The subclass in Example 20-3 extends Stack
to keep track of overall push/pop
usage frequencies and to record the maximum size of each
instance.
Example 20-3. PP3EDstructBasicstacklog.py
from stack2 import Stack # extends imported Stack class StackLog(Stack): # count pushes/pops, max-size pushes = pops = 0 # shared/static class members def _ _init_ _(self, start=[]): # could also be module vars self.maxlen = 0 Stack._ _init_ _(self, start) def push(self, object): Stack.push(self, object) # do real push StackLog.pushes += 1 # overall stats self.maxlen = max(self.maxlen, len(self)) # per-instance stats def pop(self): StackLog.pops += 1 # overall counts return Stack.pop(self) # not 'self.pops': instance def stats(self): return self.maxlen, self.pushes, self.pops # get counts from instance
This subclass works the same as the original Stack
; it just adds monitoring logic. The
new stats
method is used to get a
statistics tuple through an instance:
>>>from stacklog import StackLog
>>>x = StackLog( )
>>>y = StackLog( )
# make two stack objects >>>for i in range(3): x.push(i)
# and push object on them ... >>>for c in 'spam': y.push(c)
... >>>x, y
# run inherited _ _repr_ _ ([Stack:[2, 1, 0]], [Stack:['m', 'a', 'p', 's']]) >>>x.stats(), y.stats( )
((3, 7, 0), (4, 7, 0)) >>> >>>y.pop(), x.pop( )
('m', 2) >>>x.stats(), y.stats( )
# my maxlen, all pushes, all pops ((3, 7, 2), (4, 7, 2))
Notice the use of class attributes to record overall pushes and pops, and instance attributes for per-instance maximum length. By hanging attributes on different objects, we can expand or narrow their scopes.
One of the nice things about wrapping objects up in classes is that you are free to change the underlying implementation without breaking the rest of your program. Optimizations can be added in the future, for instance, with minimal impact; the interface is unchanged, even if the internals are. There are a variety of ways to implement stacks, some more efficient than others. So far, our stacks have used slicing and concatenation to implement pushing and popping. This method is relatively inefficient: both operations make copies of the wrapped list object. For large stacks, this practice can add a significant time penalty.
One way to speed up such code is to change the underlying data
structure completely. For example, we can store the stacked objects
in a binary tree of tuples: each item may be recorded as a pair,
(object, tree)
, where object
is the stacked item and tree
is either another tuple pair giving
the rest of the stack or None
to
designate an empty stack. A stack of items [1,2,3,4]
would be internally stored as a
tuple tree (1,(2,(3,(4,None))))
.
This tuple-based representation is similar to the notion of
“cons-cells” in Lisp-family languages: the object on the left is the
car
, and the rest of the tree on
the right is the cdr
. Because we
add or remove only a top tuple to push and pop items, this structure
avoids copying the entire stack. For large stacks, the benefit might
be significant. The next class, shown in Example 20-4, implements these
ideas.
Example 20-4. PP3EDstructBasicstack3.py
class Stack: def _ _init_ _(self, start=[]): # init from any sequence self.stack = None # even other (fast)stacks for i in range(-len(start), 0): self.push(start[-i - 1]) # push in reverse order def push(self, node): # grow tree 'up/left' self.stack = node, self.stack # new root tuple: (node, tree) def pop(self): node, self.stack = self.stack # remove root tuple return node # TypeError if empty def empty(self): return not self.stack # is it 'None'? def _ _len_ _(self): # on: len, not len, tree = 0, self.stack while tree: len, tree = len+1, tree[1] # visit right subtrees return len def _ _getitem_ _(self, index): # on: x[i], in, for len, tree = 0, self.stack while len < index and tree: # visit/count nodes len, tree = len+1, tree[1] if tree: return tree[0] # IndexError if out-of-bounds else: raise IndexError # so 'in' and 'for' stop def _ _repr_ _(self): return '[FastStack:' + repr(self.stack) + ']'
This class’s _ _getitem_ _
method handles indexing, in
tests, and for
loop iteration as
before, but this version has to traverse a tree to find a node by
index. Notice that this isn’t a subclass of the original Stack
class. Since nearly every operation
is implemented differently here, inheritance won’t really help. But
clients that restrict themselves to the operations that are common
to both classes can still use them interchangeably—they just need to
import a stack class from a different module to switch
implementations. Here’s a session with this stack version; as long
as we stick to pushing, popping, indexing, and iterating, this
version is essentially indistinguishable from the original:
>>>from stack3 import Stack
>>>x = Stack( )
>>>y = Stack( )
>>>for c in 'spam': x.push(c)
... >>>for i in range(3): y.push(i)
... >>>x
[FastStack:('m', ('a', ('p', ('s', None))))] >>>y
[FastStack:(2, (1, (0, None)))] >>>len(x), x[2], x[-1]
(4, 'p', 'm') >>>x.pop( )
'm' >>>x
[FastStack:('a', ('p', ('s', None)))] >>> >>>while y: print y.pop( ),
... 2 1 0
Perhaps a better way to speed up the stack object, though, is
to fall back on the mutability of Python’s list object. Because
lists can be changed in place, they can be modified more quickly
than any of the prior examples. In-place change operations such as
append
are prone to complications
when a list is referenced from more than one place. But because the
list inside the stack object isn’t meant to be used directly, we’re
probably safe.
The module in Example
20-5 shows one way to implement a stack with in-place
changes; some operator overloading methods have been dropped to keep
this simple. The new Python pop
method it uses is equivalent to indexing and deleting the item at
offset -1 (top is end-of-list here).
Example 20-5. PP3EDstructBasicstack4.py
class error(Exception): pass # when imported: local exception class Stack: def _ _init_ _(self, start=[]): # self is the instance object self.stack = [] # start is any sequence: stack... for x in start: self.push(x) def push(self, obj): # methods: like module + self self.stack.append(obj) # top is end of list def pop(self): if not self.stack: raise error, 'underflow' return self.stack.pop( ) # like fetch and delete stack[-1] def top(self): if not self.stack: raise error, 'underflow' return self.stack[-1] def empty(self): return not self.stack # instance.empty( ) def _ _len_ _(self): return len(self.stack) # len(instance), not instance def _ _getitem_ _(self, offset): return self.stack[offset] # instance[offset], in, for def _ _repr_ _(self): return '[Stack:%s]' % self.stack
This version works like the original in module stack2
too; just replace stack2
with stack4
in the previous interaction to get
a feel for its operation. The only obvious difference is that stack
items are in reverse when printed (i.e., the top is the end):
>>>from stack4 import Stack
>>>x = Stack( )
>>>x.push('spam')
>>>x.push(123)
>>>x
[Stack:['spam', 123]] >>> >>>y = Stack( )
>>>y.push(3.1415)
>>>y.push(x.pop( ))
>>>x, y
([Stack:['spam']], [Stack:[3.1415, 123]]) >>>y.top( )
123
The in-place changes stack object probably runs faster
than both the original and the tuple-tree versions, but the only way
to really be sure how much faster is to time the alternative
implementations. Since this could be something we’ll want to do more
than once, let’s first define a general module for timing functions in Python. In Example 20-6, the built-in
time
module provides a clock
function that we can use to get the
current CPU time in floating-point seconds, and the function
timer.test
simply calls a
function reps
times and returns
the number of elapsed seconds by subtracting stop from start CPU
times.
Example 20-6. PP3EDstructBasic imer.py
def test(reps, func, *args): import time start = time.clock( ) # current CPU time in float seconds for i in xrange(reps): # call function reps times func(*args) # discard any return value return time.clock( ) - start # stop time - start time
Next, we define a test driver script (see Example 20-7). It expects three command-line arguments: the number of pushes, pops, and indexing operations to perform (we’ll vary these arguments to test different scenarios). When run at the top level, the script creates 200 instances of the original and optimized stack classes and performs the specified number of operations on each stack.[*] Pushes and pops change the stack; indexing just accesses it.
Example 20-7. PP3EDstructBasicstacktime.py
import stack2 # list-based stacks: [x]+y import stack3 # tuple-tree stacks: (x,y) import stack4 # in-place stacks: y.append(x) import timer # general function timer function rept = 200 from sys import argv pushes, pops, items = eval(argv[1]), eval(argv[2]), eval(argv[3]) def stackops(stackClass): #print stackClass._ _module_ _ x = stackClass('spam') # make a stack object for i in range(pushes): x.push(i) # exercise its methods for i in range(items): t = x[i] for i in range(pops): x.pop( ) print 'stack2:', timer.test(rept, stackops, stack2.Stack) # pass class to test print 'stack3:', timer.test(rept, stackops, stack3.Stack) # rept*(push+pop+ix) print 'stack4:', timer.test(rept, stackops, stack4.Stack)
Here are some of the timings reported by the test driver script. The three outputs represent the measured run times in seconds for the original, tuple, and in-place stacks. For each stack type, the first test creates 200 stack objects and performs roughly 120,000 stack operations (200 repetitions × (200 pushes + 200 indexes + 200 pops)) in the test duration times listed. These results were obtained on a 650 MHz Pentium III Windows machine and a Python 1.5.2 install:
C:...PP3EDstructBasic>python stacktime.py 200 200 200
stack2: 1.67890008213 stack3: 7.70020952413 stack4: 0.694291724635 C:...PP3EDstructBasic>python stacktime.py 200 50 200
stack2: 1.06876246669 stack3: 7.48964866994 stack4: 0.477584270605 C:...PP3EDstructBasic>python stacktime.py 200 200 50
stack2: 1.34536448817 stack3: 0.795615917129 stack4: 0.57297976835 C:...PP3EDstructBasic>python stacktime.py 200 200 0
stack2: 1.33500477715 stack3: 0.300776077373 stack4: 0.533050336077
If you look closely enough, you’ll notice that the results
show that the tuple-based stack (stack3
) performs better when we do more
pushing and popping, but worse if we do much indexing. Indexing
lists is extremely fast for built-in lists (stack2
and stack4
), but very slow for tuple
trees—the Python class must traverse the tree manually.
The in-place change stacks (stack4
) are almost always fastest,
unless no indexing is done at all—tuples won by a hair in the last
test case. When there is no indexing (the last test), the tuple
and in-place change stacks are roughly four and three times
quicker than the simple list-based stack, respectively. Since
pushes and pops are most of what clients would do to a stack,
tuples are a contender, despite their poor indexing
performance.
Of course, we’re talking about fractions of a second after many tens of thousands of operations; in many applications, your users probably won’t care either way. If you access a stack millions of times in your program, though, this difference may accumulate to a significant amount of time.
Performance results like those of the prior section are prone to change from release to release in Python, because ongoing optimization work always finds its way into the interpreter over time. For the third edition of this book, I reran the tests of the prior section on a machine that was roughly twice as fast (1.2 GHz), and under Python 2.4. The results are very different, though relatively similar:
C:...PP3EDstructBasic>python stacktime.py 200 200 200
stack2: 0.383535616957 stack3: 1.74261840541 stack4: 0.160929391864 C:...PP3EDstructBasic>python stacktime.py 200 50 200
stack2: 0.320245170825 stack3: 1.70567264834 stack4: 0.121246694761 C:...PP3EDstructBasic>python stacktime.py 200 200 50
stack2: 0.335854417251 stack3: 0.208287086767 stack4: 0.124496549142 C:...PP3EDstructBasic>python stacktime.py 200 200 0
stack2: 0.353687130627 stack3: 0.0953431232182 stack4: 0.110205067963
This time, if you study the results long enough, you’ll
notice that the relative performance of stack2
(simple lists) and stack4
(in-place list changes) is
roughly the same—the in-place list stack is usually about three
times quicker again, regardless of the amount of indexing going on
(which makes sense, given that the two list-based stacks index at
the same speed). And as before, in the last test when there is no
indexing as is common for stacks, the tuple-based stack3
still performs best of all three:
roughly four times better than simple lists, and slightly better
than in-place lists.
The results, though, seem to reflect the fact that all of the stack code has been optimized in Python itself since this book’s prior edition. All three stacks are roughly four times faster today, likely reflecting a 2X boost in hardware, plus a 2X boost in Python itself. In this case, the relative performance results are similar; but in other cases, such optimizations may invalidate conclusions derived from tests run under previous Python releases.[*]
The short story here is that you must collect timing data for your code, on your machine, and under your version of Python. All three factors can skew results arbitrarily. In the next section, we’ll see a more dramatic version impact on the relative performance of set alternatives, and we’ll learn how to use the Python profiler to collect performance data in a more generic fashion.
[*] If you have a copy of the first edition of this book lying around, you might notice that I had to scale all test factors way up to get even close to the run times I noticed before. Both Python and chips have gotten a lot faster in five years.
[*] Trust me on this. I once made a sweeping statement in
another book about map
and
list comprehensions being twice as fast as for
loops, only to be made wrong by
a later Python release that optimized the others much more
than map
, except in select
use cases. Performance measurement in Python is an ongoing
task.
18.226.251.206