Putting it into practice – an AVL tree

We're going to walk step-by-step through the process of using doctest to create a testable specification for a data structure called an AVL tree. An AVL tree is a way to organize key-value pairs so that they can be quickly located by key. In other words, it's a lot like Python's built-in dictionary type. The name AVL references the initials of the people who invented this data structure.

Note

While AVL trees are similar to Python dictionaries, they have some significantly different properties. For one thing, the keys stored in an AVL tree can be iterated over in a sorted order with no overhead. Another difference is that, while inserting and removing objects in an AVL tree is slower than a Python dict in many cases, it's faster in the worst case.

As its name suggests, an AVL tree organizes the keys that are stored in it into a tree structure, with each key having up to two child keys —one child key that is less than the parent key by comparison, and one that is more. In the following figure, the Elephant key has two child keys, Goose has one, and Aardvark and Frog both have none.

The AVL tree is special because it keeps one side of the tree from getting much taller than the other, which means that users can expect it to perform reliably and efficiently no matter what. In the following figure, the AVL tree will reorganize to stay balanced if Frog gains a child:

Putting it into practice – an AVL tree

We're going to write tests for an AVL tree implementation here, rather than writing the implementation itself, so we're going to gloss over the details of how an AVL tree works, in favor of looking at what it should do when it works right.

Note

If you want to know more about AVL trees, you will find many good references on the Internet. Wikipedia's entry on this subject is a good place to start with: http://en.wikipedia.org/wiki/AVL_tree.

We're going to start with a plain-language specification, and then interject tests between the paragraphs. You don't have to actually type all of this into a text file; it is here for you to read and to think about.

English specification

The first step is to describe what the desired result should be, in normal language. This might be something that you do for yourself, or it might be something that somebody else does for you. If you're working for somebody, hopefully you and your employer can sit down together and work this part out.

In this case, there's not much to work out, because AVL trees have been fully described for decades. Even so, the description here isn't quite like the one you'd find elsewhere. This capacity for ambiguity is exactly the reason why a plain-language specification isn't good enough. We need an unambiguous specification, and that's exactly what the tests in a doctest file can give us.

The following text goes in a file called AVL.txt, (that you can find in its final form in the accompanying code archive; at this stage of the process, the file contains only the normal language specification):

An AVL Tree consists of a collection of nodes organized in a binary tree structure. Each node has left and right children, each of which may be either None or another tree node. Each node has a key, which must be comparable via the less-than operator. Each node has a value. Each node also has a height number, measuring how far the node is from being a leaf of the tree -- a node with height 0 is a leaf.

The binary tree structure is maintained in ordered form, meaning that of a node's two children, the left child has a key that compares less than the node's key and the right child has a key that compares greater than the node's key.

The binary tree structure is maintained in a balanced form, meaning that for any given node, the heights of its children are either the same or only differ by 1.

The node constructor takes either a pair of parameters representing a key and a value, or a dict object representing the key-value pairs with which to initialize a new tree.

The following methods target the node on which they are called, and can be considered part of the internal mechanism of the tree:

Each node has a recalculate_height method, which correctly sets the height number.

Each node has a make_deletable method, which exchanges the positions of the node and one of its leaf descendants, such that the tree ordering of the nodes remains correct.

Each node has rotate_clockwise and rotate_counterclockwise methods. Rotate_clockwise takes the node's right child and places it where the node was, making the node into the left child of its own former child. Other nodes in the vicinity are moved so as to maintain the tree ordering. The opposite operation is performed by rotate_counterclockwise.

Each node has a locate method, taking a key as a parameter, which searches the node and its descendants for a node with the specified key, and either returns that node or raises a KeyError.

The following methods target the whole tree rooted at the current node. The intent is that they will be called on the root node:

Each node has a get method taking a key as a parameter, which locates the value associated with the specified key and returns it, or raises KeyError if the key is not associated with any value in the tree.

Each node has a set method taking a key and a value as parameters, and associating the key and value within the tree.

Each node has a remove method taking a key as a parameter, and removing the key and its associated value from the tree. It raises KeyError if no value was associated with that key.

Node data

The first three paragraphs of the specification describe the member variables of an AVL tree node, and tell us what the valid values for the variables are. They also tell us how the tree height should be measured and define what a balanced tree means. It's our job now to take these ideas, and encode them into tests that the computer can eventually use to check our code.

We can check these specifications by creating a node and then testing the values, but that would really just be a test of the constructor. It's important to test the constructor, but what we really want to do is to incorporate checks that the node variables are left in a valid state into our tests of each member function.

To that end, we'll define functions that our tests can call to check that the state of a node is valid. We'll define these functions just after the third paragraph, because they provide extra details related to the content of the first three paragraphs:

Note

Notice that the node data test is written as if the AVL tree implementation already existed. It tries to import an avl_tree module containing an AVL class, and it tries to use the AVL class in specific ways. Of course, at the moment there is no avl_tree module, so the tests will fail. This is as it should be. All that the failure means is that, when the time comes to implement the tree, we should do so in a module called avl_tree, with contents that function as our tests assume. Part of the benefit of testing like this is being able to test-drive your code before you even write it.

>>> from avl_tree import AVL

>>> def valid_state(node):
...     if node is None:
...         return
...     if node.left is not None:
...         assert isinstance(node.left, AVL)
...         assert node.left.key < node.key
...         left_height = node.left.height + 1
...     else:
...         left_height = 0
...
...     if node.right is not None:
...         assert isinstance(node.right, AVL)
...         assert node.right.key > node.key
...         right_height = node.right.height + 1
...     else:
...         right_height = 0
...
...     assert abs(left_height - right_height) < 2
...     node.key < node.key
...     node.value

>>> def valid_tree(node):
...     if node is None:
...         return
...     valid_state(node)
...     valid_tree(node.left)
...     valid_tree(node.right)

Notice that we didn't actually call these functions yet. They aren't tests, as such, but tools that we'll use to simplify writing tests. We define them here, rather than in the Python module that we're going to test, because they aren't conceptually part of the tested code, and because anyone who reads the tests will need to be able to see what the helper functions do.

Testing the constructor

The fourth paragraph describes the constructor of the AVL class. According to this paragraph, the constructor has two modes of operation: it can create a single initialized node, or it can create and initialize a whole tree of nodes based on the contents of a dictionary.

The test for the single node mode is easy. We'll add it after the fourth paragraph:

>>> valid_state(AVL(2, 'Testing is fun'))

We don't even have to write an expected result, since we wrote the function to raise an AssertionError if there's a problem and to return None if everything is fine. AssertionError is triggered by the assert statement in our test code, if the expression in the assert statement produces a false value.

The test for the second mode looks just as easy, and we'll add it right after the other:

>>> valid_tree(AVL({1: 'Hello', 2: 'World', -3: '!'}))

There's a bit of buried complexity here, though. In all probability, this constructor will function by initializing a single node and then using that node's set method to add the rest of the keys and values to the tree. This means that our second constructor test isn't a unit test, it's an integration test that checks the interaction of multiple units.

Specification documents often contains integration-level and system-level tests, so this isn't really a problem. It's something to be aware of, though, because if this test fails it won't necessarily show you where the problem really lies. Your unit tests will do that.

Something else to notice is that we didn't check whether the constructor fails appropriately when given bad inputs. These tests are very important, but the English specification didn't mention these points at all, which means that they're not really among the acceptance criteria. We'll add these tests to the unit test suite instead.

Recalculating height

The recalculate_height() method is described in the fifth paragraph of the specification. To test it, we're going to need a tree for it to operate on, and we don't want to use the second mode of the constructor to create it —after all, we want this test to be independent of any errors that might exist there. We'd really prefer to make the test entirely independent of the constructor but, in this case, we need to make a small exception to the rule, since it's mighty difficult to create an object without calling its constructor in some way.

What we're going to do is define a function that builds a specific tree and returns it. This function will be useful in several of our later tests as well:

>>> def make_test_tree():
...     root = AVL(7, 'seven')
...     root.height = 2
...     root.left = AVL(3, 'three')
...     root.left.height = 1
...     root.left.right = AVL(4, 'four')
...     root.right = AVL(10, 'ten')
...     return root

Now that we have the make_test_tree() function, testing recalculate_height() is easy:

>>> tree = make_test_tree()
>>> tree.height = 0
>>> tree.recalculate_height()
>>> tree.height
2

Making a node deletable

The sixth paragraph of the specification described the make_deletable() method. You can't delete a node that has children, because that would leave the node's children disconnected from the rest of the tree. Consider the tree with animal names in it that we looked at earlier. If we delete the Elephant node from the bottom of the tree, what do we do about Aardvark, Goose, and Frog? If we delete Goose, how do we find Frog afterwards?

Making a node deletable

The way around that is to have the node swap places with its largest leaf descendant on the left side (or its smallest leaf descendant on the right side, but we're not doing it that way).

We'll test this by using the same make_test_tree() function that we defined earlier to create a new tree to work on, and then check whether make_deletable() swaps correctly:

>>> tree = make_test_tree()
>>> target = tree.make_deletable()
>>> (tree.value, tree.height)
('four', 2)
>>> (target.value, target.height)
('seven', 0)

Rotation

The two rotate functions, described in paragraph seven of the specification, perform a somewhat tricky manipulation of the links in a tree. You probably found the plain language description of what they do a bit confusing. This is one of those times when a little bit of code makes a whole lot more sense than any number of sentences.

While tree rotation is usually defined in terms of rearranging the links between nodes in the tree, we'll check whether it worked by looking at the values rather than by looking directly at the left and right links. This allows the implementation to swap the contents of nodes, rather than the nodes themselves, when it wishes. After all, it's not important to the specification which operation happens, so we shouldn't rule out a perfectly reasonable implementation choice:

>>> tree = make_test_tree()
>>> tree.value
'seven'
>>> tree.left.value
'three'
>>> tree.rotate_counterclockwise()
>>> tree.value
'three'
>>> tree.left is None
True
>>> tree.right.value
'seven'
>>> tree.right.left.value
'four'
>>> tree.right.right.value
'ten'
>>> tree.right.left.value
'four'
>>> tree.left is None
True

>>> tree.rotate_clockwise()
>>> tree.value
'seven'
>>> tree.left.value
'three'
>>> tree.left.right.value
'four'
>>> tree.right.value
'ten'
>>> tree.right.left is None
True
>>> tree.left.left is None
True

Locating a node

According to the eighth paragraph of the specification, the locate() method is expected to return a node, or raise a KeyError exception, depending on whether the key exists in the tree or not. We'll use our specially built testing tree again, so that we know exactly what the tree's structure looks like:

>>> tree = make_test_tree()
>>> tree.locate(4).value
'four'
>>> tree.locate(17) # doctest: +ELLIPSIS
Traceback (most recent call last):
KeyError: ...

Tip

Downloading the example code

You can download the example code files for all Packt books you have purchased from your account at http://www.packtpub.com. If you purchased this book elsewhere, you can visit http://www.packtpub.com/support and register to have the files e-mailed directly to you.

The rest of the specification

The remaining paragraphs of the specification describe higher-level functions that operate by calling the already described functions. This means that, until we learn the tricks of mock objects in Chapter 4, Decoupling Units with unittest.mock, we're stuck with writing integration-level tests here. As I mentioned earlier, this is not a terrible thing to do in a specification document, so we'll go ahead and do it:

Each node has a get method taking a key as a parameter, which locates
the value associated with the specified key and returns it, or raises
KeyError if the key is not associated with any value in the tree.

>>> tree = make_test_tree()
>>> tree.get(10)
'ten'
>>> tree.get(97) # doctest: +ELLIPSIS
Traceback (most recent call last):
KeyError: ...

Each node has a set method taking a key and a value as parameters, and
associating the key and value within the tree.

>>> tree = make_test_tree()
>>> tree.set(10, 'foo')
>>> tree.locate(10).value
'foo'

Each node has a remove method taking a key as a parameter, and
removing the key and its associated value from the tree. It raises
KeyError if no values was associated with that key.

>>> tree = make_test_tree()
>>> tree.remove(3)
>>> tree.remove(3) # doctest: +ELLIPSIS
Traceback (most recent call last):
KeyError: ...
..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
3.144.26.138