CHAPTER 10

Trees

This chapter explains trees, highly recursive data structures that you can use to store hierarchical data and model decision processes. For example, you can store in a tree a company organizational chart or the parts that make up a complex machine such as a car.

This chapter explains how to build relatively simple trees and provides the background you need to understand the more complicated trees described in Chapters 11 and 12.

Tree Terminology

Trees borrow terminology from genealogy, horticulture, and computer science. Trees use a lot of terms, but many of them are intuitive because you probably already understand what they mean in another context.

A tree consists of nodes connected by branches. Usually the nodes contain some sort of data, and the branches do not.

NOTE Trees are a special type of network or graph, so sometimes network and graph terms leak into discussions of trees. For example, branches are sometimes called links or edges, although those terms are more appropriate for networks and graphs. Chapters 13 and 14 have more to say about networks.

The branches in a tree are usually directed so that they define a parent/child relationship between the nodes they connect. Normally branches are drawn as arrows pointing from the parent node to the child node. Two nodes that have the same parent are sometimes called siblings.

Each node in the tree has exactly one parent node, except for a single root node, which has no parent.

The children, the children's children, and so on for a node are that node's descendants. A node's parent, its parent's parent, and so on up to the root are that node's ancestors.

All these relationship-oriented terms make sense if you think of the tree as a family tree. You can even define terms such as cousin, nephew, and grandparent without confusing anyone, although those terms are uncommon.

Depending on the type of tree, nodes may have any number of children. The number of children a node has is the node's degree. A tree's degree is the maximum degree of its nodes. For example, in a degree 2 tree, which is usually called a binary tree, each node can have at most two children.

A node with no children is called a leaf node or an external node. A node that has at least one child is called an internal node.

Unlike real trees, tree data structures usually are drawn with the root at the top and the branches growing downward, as shown in Figure 10-1.


Figure 10-1: Tree data structures usually are drawn with the root at the top.

All these definitions explain what a tree is intuitively. You can also recursively define a tree to be either:

  • A single root node
  • A root node connected by branches to one or more smaller trees

A node's level or depth in the tree is the distance from the node to the root. In particular, the root's level is 0.

A node's height is the length of the longest path from the node downward through the tree to a leaf node. In other words, it's the distance from the node to the bottom of the tree.

A tree's height is the same as the height of the root node.

A subtree of a tree T rooted at the node R is the node R and all its descendants. For example, in Figure 10-1 the subtree rooted at node 5 is the tree containing the nodes 5, 7, 6, and 8.

An ordered tree is one in which the order of the children is important. For example, many algorithms treat the left and right child in a binary tree differently. An unordered tree is one in which the order of the children doesn't matter. (Usually a tree has an ordering, even if it's not particularly important to the algorithm. This is true simply because the children or branches are stored in an array or some other collection that imposes an ordering on them.)

For any two nodes, the first common ancestor (or least common ancestor) is the node that is the ancestor of both nodes that is closest to the nodes. Another way to think about this is to start at one node and move up toward the root until you reach the first node that is an ancestor of the other node. For example, in Figure 10-1 the first common ancestor of nodes 3 and 5 is the root 4.

Note that the first common ancestor of two nodes might be one of the nodes. For example, in Figure 10-1 the first common ancestor of nodes 5 and 6 is node 5.

Note also that there is a unique path between any two nodes in a tree that doesn't cross any branch more than once. The path starts at the first node, moves up the tree to the nodes' first common ancestor, and then moves down the tree to the second node.

A full tree is one in which every node has either zero children or as many children as the tree's degree. For example, in a full binary tree, every node has either zero or two children. The tree shown in Figure 10-1 is not full because node 5 has only one child.

A complete tree is one in which every level is completely full, except possibly the bottom level, where all the nodes are pushed as far to the left as possible. Figure 10-2 shows a complete binary tree. Notice that this tree is not full because the third node from the left on level 2 has only one child.

A perfect tree is full, and all the leaves are at the same level. In other words, the tree holds every possible node for a tree of its height.

Figure 10-3 shows examples of full, complete, and perfect binary trees.


Figure 10-2: In a complete binary tree, every level is completely full, except possibly the bottom level, where the nodes are pushed as far to the left as possible.


Figure 10-3: Full, complete, and perfect binary trees contain an increasing number of nodes for a given height.

That's a lot of terminology all at once, so Table 10-1 summarizes these tree terms to make remembering them a bit easier.

Table 10-1: Summary of Tree Terminology

TERM MEANING
ancestor A node's parent, its parent's parent, and so on to the root are the node's ancestors.
binary tree A tree with degree 2.
branch Connects nodes in a tree.
child A child node is connected to its parent in the tree. Normally a child is drawn below its parent.
complete tree A tree in which every level is completely full, except possibly the bottom level, where all the nodes are pushed as far to the left as possible.
degree For a node, the number of children the node has. For a tree, the maximum degree of any of its nodes.
depth Level.
descendant A node's children, their children, and so on are the node's descendants.
external node A leaf node.
first (or least) common ancestor For any two nodes, the node that is the ancestor of both nodes that is closest to the nodes.
full tree A tree in which every node has either zero children or as many children as the tree's degree.
height For a node, the length of the longest path from the node downward through the tree to a leaf node. For a tree, this is the same as the root's height.
internal node A tree node that has at least one child.
leaf node A tree node with no children.
level A tree node's level is the distance between it and the root node.
node An object that holds data in a tree. Connected to other nodes by branches.
ordered tree A tree in which the ordering of each node's children matters.
parent A parent node is connected to its child nodes by branches. Every node except the root has exactly one parent. Normally the parent is drawn above its children.
perfect tree A full tree where all the leaves are at the same level.
root The unique node at the top of the tree that has no parent.
sibling Two nodes in a tree that have the same parent are siblings.
subtree A node and all its descendants in a tree.

Having learned all these terms, you're ready to start learning some of the properties and uses of trees.

Binary Tree Properties

Binary trees are useful in many algorithms, partly because lots of problems can be modeled using binary choices and partly because binary trees are relatively easy to understand. The following are some useful facts about binary trees:

  • The number of branches B in a binary tree containing N nodes is B = N − 1.
  • The number of nodes N in a perfect binary tree of height H is N = 2^(H+1) − 1.
  • Conversely, if a perfect binary tree contains N nodes, it has a height of log2(N + 1) − 1.
  • The number of leaf nodes L in a perfect binary tree of height H is L = 2^H. Because the total number of nodes in a perfect binary tree of height H is 2^(H+1) − 1, the number of internal nodes I is I = N − L = (2^(H+1) − 1) − 2^H = (2^(H+1) − 2^H) − 1 = 2^H × (2 − 1) − 1 = 2^H − 1.
  • This means that in a perfect binary tree, almost exactly half of the nodes are leaves and almost exactly half are internal nodes. More precisely, I = L − 1.
  • The number of missing branches (places where a child could be added) M in a binary tree that contains N nodes is M = N + 1.
  • If a binary tree has N0 leaf nodes and N2 nodes with degree 2, N0 = N2 + 1. In other words, there is one more leaf node than nodes with degree 2.
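These formulas are easy to sanity-check numerically. The following Python sketch (an illustrative translation; the chapter itself uses pseudocode) verifies the perfect-tree counts for small heights:

```python
# Verify the perfect binary tree formulas for heights 0 through 5.
for H in range(6):
    N = 2 ** (H + 1) - 1   # total nodes
    L = 2 ** H             # leaf nodes
    I = 2 ** H - 1         # internal nodes
    assert N == L + I      # every node is either a leaf or internal
    assert I == L - 1      # one more leaf than internal nodes
```
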

LEAF AND FULL NODES

The last fact is not very intuitive, so here's a proof.

  1. Let N be the total number of nodes; B be the total number of branches; and N0, N1, and N2 be the number of nodes of degree 0, 1, and 2, respectively.
  2. Consider the branches leading into nodes. Every node has a single branch leading into it from its parent, so B = N − 1.
  3. Next, consider the branches leading out of nodes. The N0 nodes have no branches leading out of them, the N1 nodes have one branch leading out of them, and the N2 nodes have two branches leading out of them, so the total number of branches B = N1 + 2 × N2.
  4. Setting these two equations for B equal to each other gives N − 1 = N1 + 2 × N2. Adding 1 to both sides of the equation changes this to N = N1 + 2 × N2 + 1.
  5. Adding up the three kinds of nodes, you know that N = N0 + N1 + N2.
  6. Setting these two equations for N equal to each other gives N1 + 2 × N2 + 1 = N0 + N1 + N2. Then subtracting N1 + N2 from both sides makes this N2 + 1 = N0.
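You can also check the result on a concrete tree. This Python sketch (nested tuples stand in for node objects; the names are illustrative) counts the nodes of Figure 10-1's tree by degree and confirms that N0 = N2 + 1:

```python
# Figure 10-1's tree as nested (name, left, right) tuples.
tree = ("4",
        ("2", ("1", None, None), ("3", None, None)),
        ("5", None, ("7", ("6", None, None), ("8", None, None))))

def count_by_degree(node, counts):
    """Tally how many nodes have 0, 1, or 2 children."""
    if node is None:
        return
    _, left, right = node
    counts[(left is not None) + (right is not None)] += 1
    count_by_degree(left, counts)
    count_by_degree(right, counts)

counts = {0: 0, 1: 0, 2: 0}
count_by_degree(tree, counts)
assert counts[0] == counts[2] + 1   # N0 = N2 + 1
```
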

These facts often make it easier to calculate the run time for algorithms that work with trees. For example, if an algorithm must search a perfect binary tree containing N nodes from the root to a leaf node, you know that the algorithm needs only O(log(N)) steps.

INDUCTIVE REASONING

You can prove many of these properties of binary trees inductively. In an inductive proof, you first establish a base case for a small problem. Then you make an inductive step in which you prove that if the property is true for some value K, it must also be true for the value K + 1. Together, those two steps show that the property holds for all values of K from the base case up.

For example, consider the property described a moment ago: the number of nodes N in a perfect binary tree of height H is N = 2^(H+1) − 1. The following shows an inductive proof. (Here H plays the role of K in the general description of an inductive proof.)

BASE CASE:

Consider a perfect tree of height H = 0. This tree has a single root node and no branches. In that case the number of nodes N is 1. Note that 2^(H+1) − 1 = 2^(0+1) − 1 = 2^1 − 1 = 2 − 1 = 1, so N = 2^(H+1) − 1 as desired.

INDUCTIVE STEP:

Suppose the property holds true for perfect binary trees of height H. A perfect binary tree of height H + 1 consists of a root node connected to two perfect binary subtrees of height H. Because we assume the property is true for trees of height H, the total number of nodes in each subtree is 2^(H+1) − 1. Adding the root node, that means the total number of nodes in the tree of height H + 1 is 2 × (2^(H+1) − 1) + 1. Rearranging this a bit gives (2^((H+1)+1) − 2) + 1 = 2^((H+1)+1) − 1. This is the formula for the number of nodes in a perfect binary tree of height H + 1 (just plug H + 1 into the formula), so the property is true for a tree of height H + 1.

This proves that the number of nodes in a perfect binary tree of height H is 2^(H+1) − 1 for all H.

If a binary tree containing N nodes is fat (isn't too tall and skinny), such as if it's a complete tree, its statistics are similar to those of a perfect binary tree in terms of Big O notation. For example, if a fat binary tree contains N nodes, it has O(log N) height, O(N÷2) = O(N) leaves, O(N÷2) = O(N) internal nodes, and O(N) missing branches.

These properties also are true for fat trees of higher degrees but with different log bases. For example, a fat degree 10 tree containing N nodes has height O(log10 N). Because all log bases are the same in Big O notation, this is the same as O(log N), although in practice the constants ignored by Big O notation may make a big difference.

Chapter 11 describes balanced trees, which are prevented from growing too tall and thin so that these properties are guaranteed to hold.

Tree Representations

You can use classes to represent a tree's nodes. For complete trees, you can also store the tree in an array. The following two sections describe these approaches.

Building Trees in General

You can use classes to represent a tree's nodes much as you can use them to make the cells in a linked list. Give the class whatever properties it needs to hold data. Give it object references to represent the branches to the node's children.

In a binary tree, you can use separate properties named LeftChild and RightChild for the branches.

The following pseudocode shows how you might create a binary node class. The details will differ depending on your programming language:

Class BinaryNode
    String: Name
    BinaryNode: LeftChild
    BinaryNode: RightChild

    Constructor(String: name)
        Name = name
    End Constructor
End Class

The class begins by declaring a public property called Name to hold the node's name. It then defines two properties named LeftChild and RightChild to hold references to the node's children.

The class's constructor takes a string as a parameter and saves it in the node's Name property.

The following pseudocode shows how you could use this class to build the tree shown in Figure 10-1:

BinaryNode: root = New BinaryNode("4")
BinaryNode: node1 = New BinaryNode("1")
BinaryNode: node2 = New BinaryNode("2")
BinaryNode: node3 = New BinaryNode("3")
BinaryNode: node5 = New BinaryNode("5")
BinaryNode: node6 = New BinaryNode("6")
BinaryNode: node7 = New BinaryNode("7")
BinaryNode: node8 = New BinaryNode("8")

root.LeftChild = node2
root.RightChild = node5
node2.LeftChild = node1
node2.RightChild = node3
node5.RightChild = node7
node7.LeftChild = node6
node7.RightChild = node8

This code first creates a BinaryNode object to represent the root. It then creates other BinaryNode objects to represent the tree's other nodes. After it has created all the nodes, the code sets the nodes' left and right child references.
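In a concrete language, the same construction might look like the following Python sketch (the class and property names are illustrative, not prescribed by the text):

```python
class BinaryNode:
    """A binary tree node holding a name and two child references."""
    def __init__(self, name):
        self.name = name
        self.left_child = None
        self.right_child = None

# Build the tree from Figure 10-1.
root = BinaryNode("4")
node2 = BinaryNode("2")
node5 = BinaryNode("5")
node7 = BinaryNode("7")
root.left_child = node2
root.right_child = node5
node2.left_child = BinaryNode("1")
node2.right_child = BinaryNode("3")
node5.right_child = node7
node7.left_child = BinaryNode("6")
node7.right_child = BinaryNode("8")

assert root.left_child.name == "2"
assert root.right_child.right_child.left_child.name == "6"
```
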

If the tree's degree is greater than 2, or if it is unordered (so the order of a node's children is unimportant), it is usually more convenient to put the child references in an array, list, or some other collection. That lets the program loop through the children, doing something to each, instead of requiring you to write a separate line of code for each child.

The following pseudocode shows how you could create a TreeNode class that allows each node to have any number of children:

Class TreeNode
    String: Name
    List Of TreeNode: Children

    Constructor(String: name)
        Name = name
    End Constructor
End Class

This class is similar to the preceding one, except that it stores its children in a List of references to TreeNode objects instead of in separate properties.

Notice that these representations only have links from a node to its child nodes. They don't include a link from a node up to its parent. Most tree algorithms work in a top-down manner, so they move from parent to child down into the tree. If you really need to be able to find a node's parent, however, you can add a Parent property to the node class.

Most tree algorithms store data in each node, but a few store information in the branches. If you need to store information in the branches, you can either add the information to the parent node or create a separate Branch class.

The following pseudocode demonstrates the first approach:

Class TreeNode
    String: Name
    List Of TreeNode: Children
    List Of Data: ChildData

    Constructor(String: name)
        Name = name
    End Constructor
End Class

In this class, when you add a child to a node, you also need to add data for the branch that leads to that child. Then ChildData[i] holds the data for the branch leading to Children[i].

The following pseudocode shows a Branch class that you could use to store information about branches:

Class Branch
    Data: BranchData
    TreeNode: Child
End Class

This class holds whatever data is necessary, plus a reference to the child object. If you use this class to store branch information, you need to modify the node class to use it. The following pseudocode shows how you might modify the TreeNode class:

Class TreeNode
    String: Name
    List Of Branch: Branches

    Constructor(String: name)
        Name = name
    End Constructor
End Class

Now, to examine the node's children, you loop through the Branches list and use each Branch object's Child property.

XML AND TREES

XML (eXtensible Markup Language) is a markup language for representing data. XML documents are hierarchical: you define elements nested within other elements.

XML's hierarchical structure makes it a natural choice for storing trees persistently and for transmitting trees from one program or computer to another. For example, you could build a large tree representing your company's organizational chart, save it in an XML file, and then share that file with other programs throughout your company.

For more information on XML, see http://en.wikipedia.org/wiki/XML or http://www.w3schools.com/xml/xml_whatis.asp, or get a book about XML such as Beginning XML, 5th Edition by Joe Fawcett, et al (Wrox, 2012).

Building Complete Trees

The heapsort algorithm described in Chapter 6 uses a complete binary tree stored in an array to represent a heap, a binary tree in which every node holds a value that is at least as large as the values in all its children. Figure 10-4 shows a heap represented as a tree and stored in an array.


Figure 10-4: You can store a heap, or any complete binary tree, conveniently in an array.

If a node is stored at index i in the array, the indices of its children are 2 × i + 1 and 2 × i + 2.

If a node has index j, its parent has index ⌊(j − 1) ÷ 2⌋, where ⌊ ⌋ means to truncate the result to the next-smaller integer. For example, ⌊2.9⌋ is 2 and ⌊2⌋ is also 2.
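In code, these index calculations reduce to one line each. A Python sketch (the helper names are hypothetical):

```python
def child_indices(i):
    """Return the array indices of node i's left and right children."""
    return 2 * i + 1, 2 * i + 2

def parent_index(j):
    """Return the array index of node j's parent.
    Integer division (//) performs the required truncation."""
    return (j - 1) // 2

# The root (index 0) has children at indices 1 and 2,
# and both of those children point back to index 0.
assert child_indices(0) == (1, 2)
assert parent_index(1) == 0 and parent_index(2) == 0
assert parent_index(5) == 2   # index 5 is the left child of index 2
```
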

This provides a concise format for storing any complete binary tree in an array. Working with this kind of tree can be a bit awkward and confusing, however, particularly if you need to resize the array frequently. For those reasons, you may want to stick to using classes to build trees.

NOTE Now that you know more about trees, you may want to reread the section “Heapsort” in Chapter 6 to see how you can use classes instead of an array to build a heap.

Tree Traversal

One of the most basic and important tree operations is traversal. In a traversal, the goal is for the algorithm to visit all the nodes in the tree in some order and perform an operation on them. The most basic traversal simply enumerates the nodes so that you can see their ordering in the traversal.

TRAVERSAL AND SEARCHING

Many algorithms search a tree for a particular node. In general, these searches are traversals, and you can use any traversal as the basis for a search.

Chapter 11 describes some special cases in which you can use the structure of the data in a tree to search it efficiently.

Binary trees have four common kinds of traversals: preorder, inorder, postorder, and breadth-first.

Preorder Traversal

In a preorder traversal, the algorithm processes a node, and then its left child, and then its right child. For example, consider the tree shown in Figure 10-5. Suppose you're writing an algorithm to simply display the tree's nodes in a preorder traversal.


Figure 10-5: Traversals process a tree's nodes in different orders.

To produce the tree's preorder traversal, the algorithm first visits the root node, so it immediately outputs the value D. The algorithm then moves to the root node's left child.

It visits that node, so it outputs B and then moves to that node's left child. There the algorithm outputs A. That node has no children, so the algorithm returns to the previous node, B, and visits that node's right child.

Next the algorithm outputs C. That node also has no children, so the algorithm returns to the previous node, B. It has finished visiting that node's children, so the algorithm moves up the tree again to node D and visits that node's right child.

The algorithm outputs the next node, E. That node also has no children, so the algorithm returns to the previous node, which is the root node. It has finished visiting that node's children, so the algorithm is done producing the traversal.

The full traversal order is D, B, A, C, E.

Notice that the algorithm examines or visits the nodes in one order but processes the nodes to produce an output in a different order. The following list shows the series of steps the algorithm follows while producing the preorder traversal for the tree shown in Figure 10-5:

  1. Visit D
  2. Output D
  3. Visit B
  4. Output B
  5. Visit A
  6. Output A
  7. Visit B
  8. Visit C
  9. Output C
  10. Visit B
  11. Visit D
  12. Visit E
  13. Output E
  14. Visit D

The following pseudocode shows a natural recursive implementation of this algorithm:

TraversePreorder(BinaryNode: node)
    <Process node>
    If (node.LeftChild != null) Then TraversePreorder(node.LeftChild)
    If (node.RightChild != null) Then TraversePreorder(node.RightChild)
End TraversePreorder

This algorithm simply follows the definition of a preorder traversal. It starts by processing the current node. In a real program, you would insert whatever code you needed to execute for each node here. For example, you might use code that adds the current node's label to an output string, or you might examine the node to see if you had found a particular target item.

Next the algorithm determines whether the node has a left child. If it does, the algorithm calls itself recursively to traverse the left child's subtree. The algorithm then repeats that step for the right child and is done.

The algorithm is extremely short and simple.

To traverse the entire tree, a program would simply call TraversePreorder, passing it the root node as a parameter.

This algorithm works quite well, but its code must be placed somewhere in the program—perhaps in the main program, in a code module, or in a helper class. Often it is more convenient to place code that manipulates a tree inside its node class. The following pseudocode shows the same algorithm implemented inside the BinaryNode class:

Class BinaryNode
    String: Name
    BinaryNode: LeftChild
    BinaryNode: RightChild

    Constructor(String: name)
        Name = name
    End Constructor

    TraversePreorder()
        <Process this node>
        If (LeftChild != null) Then TraversePreorder(LeftChild)
        If (RightChild != null) Then TraversePreorder(RightChild)
    End TraversePreorder
End Class

This is almost the same as the previous version, except that the code is running within a BinaryNode object, so it has direct access to that object's LeftChild and RightChild properties. This makes the code slightly simpler and keeps it nicely encapsulated within the BinaryNode class.

Now, to traverse the entire tree, you invoke the root node's TraversePreorder method.

PASSING METHODS

In most programming languages, you can pass a reference to a method as a parameter to a method. In this example, that means you could pass a method to use on a node into the TraversePreorder method. When TraversePreorder reaches the step <Process node>, it would call the method that was passed into it as a parameter.

This lets you use a single TraversePreorder method to do anything you want to the tree by passing it an appropriate node processing method.

You can use a similar technique to make other traversal algorithms perform arbitrary actions on a tree's nodes.
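In Python, for example, functions are first-class values, so a traversal can accept its node-processing step as a parameter. A sketch under those assumptions (nested tuples stand in for node objects; the tree mirrors Figure 10-5):

```python
# Figure 10-5's tree as nested (name, left, right) tuples.
tree = ("D",
        ("B", ("A", None, None), ("C", None, None)),
        ("E", None, None))

def traverse_preorder(node, process):
    """Process a node, then its left subtree, then its right subtree."""
    if node is None:
        return
    name, left, right = node
    process(name)
    traverse_preorder(left, process)
    traverse_preorder(right, process)

visited = []
traverse_preorder(tree, visited.append)   # pass the processing step in
assert visited == ["D", "B", "A", "C", "E"]
```
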

Although this discussion is about binary trees, you can also define a preorder traversal for trees of higher degrees. The rule is simply to visit the node and then visit its children.

Inorder Traversal

In an inorder or symmetric traversal, the algorithm processes a node's left child, the node, and then the node's right child. For the tree shown in Figure 10-5, the algorithm starts at the root node and moves to its left child B. The algorithm then moves to that node's left child A.

That node has no left child, so the algorithm visits the node and outputs A.

That node has no right child, so the algorithm returns to the parent node B.

Having finished with node B's left child, the algorithm outputs node B. It then moves to that node's right child C.

Node C has no left child, so the algorithm visits the node and outputs C. The node also has no right child, so the algorithm returns to the parent node B.

The algorithm has finished with node B's right child, so it returns to the root node D. The algorithm has finished with D's left child, so it outputs D and then moves to its right child E.

Node E has no left child, so the algorithm visits the node and outputs E. The node also has no right child, so the algorithm returns to the parent node D.

The full traversal order is A, B, C, D, E. Notice that this outputs the tree's nodes in sorted order. Normally the term sorted tree means that the tree's nodes are arranged so that an inorder traversal processes them in sorted order like this.

The following pseudocode shows a recursive implementation of this algorithm:

TraverseInorder(BinaryNode: node)
    If (node.LeftChild != null) Then TraverseInorder(node.LeftChild)
    <Process node>
    If (node.RightChild != null) Then TraverseInorder(node.RightChild)
End TraverseInorder

This algorithm simply follows the definition of an inorder traversal. It recursively processes the node's left child if it exists, processes the node, and then recursively processes the node's right child if it exists.

To traverse the entire tree, a program would simply call TraverseInorder, passing it the root node as a parameter.

The following pseudocode shows the same algorithm implemented inside the BinaryNode class:

Class BinaryNode
    String: Name
    BinaryNode: LeftChild
    BinaryNode: RightChild

    Constructor(String: name)
        Name = name
    End Constructor

    TraverseInorder()
        If (LeftChild != null) Then TraverseInorder(LeftChild)
        <Process this node>
        If (RightChild != null) Then TraverseInorder(RightChild)
    End TraverseInorder
End Class
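A Python rendering of the inorder traversal (again using nested tuples as a stand-in for node objects) shows the left-node-right order directly:

```python
# Figure 10-5's tree as nested (name, left, right) tuples.
tree = ("D",
        ("B", ("A", None, None), ("C", None, None)),
        ("E", None, None))

def traverse_inorder(node, out):
    """Process the left subtree, then the node, then the right subtree."""
    if node is None:
        return
    name, left, right = node
    traverse_inorder(left, out)
    out.append(name)
    traverse_inorder(right, out)

result = []
traverse_inorder(tree, result)
assert result == ["A", "B", "C", "D", "E"]   # sorted order
```
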

Unlike the preorder traversal, it's unclear how you would define an inorder traversal for a tree with a degree greater than 2. You could define it to mean that the algorithm should process the first half of a node's children, and then the node, and then the remaining children. That's an atypical traversal, though.

Postorder Traversal

In a postorder traversal, the algorithm processes a node's left child, and then its right child, and then the node. For the tree shown in Figure 10-5:

  1. The algorithm starts at the root node and moves to its left child B.
  2. The algorithm then moves to that node's left child A.
  3. Node A has no left or right child, so the algorithm outputs the node A and returns to the parent node B.
  4. Having finished with the left child, the algorithm moves to the node's right child C.
  5. Node C has no left or right child, so the algorithm outputs node C and returns to the parent node B.
  6. Now that it has visited node B's left and right child, the algorithm outputs node B. It then returns to the parent node D.
  7. The algorithm has visited node D's left child, so now it moves to the right child E.
  8. Node E has no left or right child, so the algorithm outputs node E and returns to the parent node D.
  9. The algorithm has finished with node D's left and right children, so the algorithm outputs D and is done.

The full traversal is A, C, B, E, D.

The following pseudocode shows a recursive implementation of this algorithm:

TraversePostorder(BinaryNode: node)
    If (node.LeftChild != null) Then TraversePostorder(node.LeftChild)
    If (node.RightChild != null) Then TraversePostorder(node.RightChild)
    <Process node>
End TraversePostorder

This algorithm recursively processes the node's left child and right child if they exist. It then processes the node.

To traverse the entire tree, a program would simply call TraversePostorder, passing it the root node as a parameter.

The following pseudocode shows the same algorithm implemented inside the BinaryNode class:

Class BinaryNode
    String: Name
    BinaryNode: LeftChild
    BinaryNode: RightChild

    Constructor(String: name)
        Name = name
    End Constructor

    TraversePostorder()
        If (LeftChild != null) Then TraversePostorder(LeftChild)
        If (RightChild != null) Then TraversePostorder(RightChild)
        <Process this node>
    End TraversePostorder
End Class
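The same traversal in Python (tuple-based, with illustrative names) makes it easy to confirm the children-before-parent order:

```python
# Figure 10-5's tree as nested (name, left, right) tuples.
tree = ("D",
        ("B", ("A", None, None), ("C", None, None)),
        ("E", None, None))

def traverse_postorder(node, out):
    """Process the left subtree, then the right subtree, then the node."""
    if node is None:
        return
    name, left, right = node
    traverse_postorder(left, out)
    traverse_postorder(right, out)
    out.append(name)

result = []
traverse_postorder(tree, result)
assert result == ["A", "C", "B", "E", "D"]   # the root comes last
```
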

Like the preorder traversal, you can easily define a postorder traversal for trees with a degree greater than 2. The algorithm should visit all of a node's children before visiting the node itself.

Breadth-first Traversal

In a breadth-first traversal (also called a level-order traversal), the algorithm processes all the nodes at a given level of the tree in left-to-right order before processing the nodes at the next level. For the tree shown in Figure 10-5, the algorithm starts at the root node's level and outputs D.

The algorithm then moves to the next level and outputs B and E.

The algorithm finishes at the bottom level by outputting the nodes A and C.

The full traversal is D, B, E, A, C.

This algorithm does not naturally follow the structure of the tree as the previous traversal algorithms do. The tree shown in Figure 10-5 has no child link from node E to node A, so it's not obvious how the algorithm moves from node E to node A.

One solution is to add a node's children to a queue and then process the queue later, after you've finished processing the parents' level. The following pseudocode uses this approach:

TraverseBreadthFirst(BinaryNode: root)
    // Create a queue to hold children for later processing.
    Queue<BinaryNode>: children = New Queue<BinaryNode>()

    // Place the root node on the queue.
    children.Enqueue(root)

    // Process the queue until it is empty.
    While (children Is Not Empty)
        // Get the next node in the queue.
        BinaryNode: node = children.Dequeue()

        // Process the node.
        <Process node>

        // Add the node's children to the queue.
        If (node.LeftChild != null) Then children.Enqueue(node.LeftChild)
        If (node.RightChild != null) Then children.Enqueue(node.RightChild)
    End While
End TraverseBreadthFirst

This algorithm starts by making a queue and placing the root node in it. It then enters a loop that continues until the queue is empty.

Inside the loop, the algorithm removes the first node from the queue, processes it, and adds the node's children to the queue.

Because a queue processes items in first-in-first-out order, all the nodes at a particular level in the tree are processed before any of their child nodes are processed. Because the algorithm adds each node's left child to the queue before it adds the right child, the nodes on a particular level are processed in left-to-right order. (If you want to be more precise, you can prove these facts by induction.)
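The queue-based algorithm translates almost line for line into Python, where collections.deque provides an efficient first-in-first-out queue (function and tuple names are illustrative):

```python
from collections import deque

# Figure 10-5's tree as nested (name, left, right) tuples.
tree = ("D",
        ("B", ("A", None, None), ("C", None, None)),
        ("E", None, None))

def traverse_level_by_level(root):
    """Process each level's nodes, left to right, before the next level."""
    order = []
    queue = deque([root])
    while queue:
        name, left, right = queue.popleft()   # dequeue the next node
        order.append(name)                    # process it
        if left is not None:                  # enqueue its children
            queue.append(left)
        if right is not None:
            queue.append(right)
    return order

assert traverse_level_by_level(tree) == ["D", "B", "E", "A", "C"]
```
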

Traversal Run Times

The three recursive algorithms for preorder, inorder, and postorder traversal all travel down the tree to the leaf nodes. Then, as the recursive calls unwind, they travel back up to the root. After an algorithm visits a node and then returns to the node's parent, the algorithm doesn't visit that node again. That means the algorithms visit each node once. So if a tree contains N nodes, they all have O(N) run time.

Those three algorithms don't need any extra storage space, because they use the tree's structure to keep track of where they are in the traversal. They do, however, have depth of recursion equal to the tree's height. If the tree is very tall, that could cause a stack overflow.

The breadth-first traversal algorithm processes nodes as they move through a queue. Each node enters the queue once, so if the tree has N nodes, the algorithm takes O(N) time.

This algorithm isn't recursive, so it doesn't have problems with large depths of recursion. Instead, it needs extra space to build its queue. In the worst case, if the tree is a perfect binary tree, its bottom level holds roughly half the total number of nodes (see the earlier section “Facts About Binary Trees”), so if the tree holds N nodes, the queue holds O(N ÷ 2) = O(N) nodes.

More generally, a tree of arbitrary degree might consist of a root node that has every other node as a child. In that case, the queue might need to hold N − 1 nodes, so the space requirement is still O(N).

Sorted Trees

As mentioned earlier, a sorted tree's nodes are arranged so that an inorder traversal processes them in sorted order. Another way to think of this is that each node's value is larger than the value of its left child and less than (or equal to) the value of its right child. Figure 10-6 shows a sorted tree.


Figure 10-6: In a sorted tree, a node's value lies between the values of its left child and its right child.

To use a sorted tree, you need three algorithms to add, delete, and find nodes.

Adding Nodes

Building a sorted tree is fairly easy. To add a value to a node's subtree, compare the new value to the node's value, and recursively move down the left or right branch as appropriate. When you try to move down a missing branch, add the new value there.

The following pseudocode shows the algorithm for a BinaryNode class. The code assumes that the class has a Value property to hold the node's data:

// Add a node to this node's sorted subtree.
AddNode(Data: new_value)
    // See if this value is smaller than ours.
    If (new_value &lt; Value) Then
        // The new value is smaller. Add it to the left subtree.
        If (LeftChild == null) Then LeftChild = New BinaryNode(new_value)
        Else LeftChild.AddNode(new_value)
    Else
        // The new value is not smaller. Add it to the right subtree.
        If (RightChild == null) Then RightChild = New BinaryNode(new_value)
        Else RightChild.AddNode(new_value)
    End If
End AddNode

The algorithm compares the new value to the node's value. If the new value is smaller than the node's value, the algorithm should place the new value in the left subtree. If the left child reference is null, the algorithm gives the current node a new left child node and places the new value there. If the left child is not null, the algorithm calls the child node's AddNode method recursively to place the new value in the left subtree.

If the new value is not smaller than the node's value, the algorithm should place the new value in the right subtree. If the right child reference is null, the algorithm gives the current node a new right child node and places the new value there. If the right child is not null, the algorithm calls the child node's AddNode method recursively to place the new value in the right subtree.

NOTE As was the case for the linked lists described in Chapter 3, it is sometimes helpful to use a sentinel at the top of a tree. For a sorted tree, if you set the root node's value to something smaller than any possible value the tree might need to contain, you can simply add nodes to the tree without worrying about whether it is empty. All the nodes you add end up in the right subtree below the root.

The run time for this algorithm depends on the order in which you add the items to the tree. If the items are initially ordered in a reasonably random way, the tree grows relatively short and wide. In that case, if you add N nodes to the tree, it has height O(log N). When you add an item to the tree, you must search to the bottom of the tree, and that takes O(log N) steps. Adding N nodes at O(log N) steps each makes the total time to build the tree O(N log N).

RANDOM TREE HEIGHT

When you build a nice, wide sorted tree, it may not be obvious that you're adding O(N) nodes at a height of O(log N). After all, what if most of the nodes fit near the top of the tree so that the tree is short while you're adding most of the nodes?

Recall from the earlier section “Facts About Binary Trees” that roughly half of the nodes in a perfect binary tree are at the bottom of the tree. That means after you have added half of the nodes to the tree, you have built all but the last level of the tree, so it already has height log(N) − 1. Now you need to add the remaining half of the nodes at a depth of log(N) − 1, so the total number of steps is N ÷ 2 × (log(N) − 1) = O(N log N).

If the values in the tree are initially randomly ordered, you get a reasonably wide tree. However, if you add the values in some orders, you get a tall, thin tree. In the worst case, if you add the values in sorted or reverse sorted order, every node has a single child, and you get a tree containing N nodes that has height N.

In that case, after you add N ÷ 2 nodes, the tree has height N ÷ 2. That means it will take more than N ÷ 2 steps to add the remaining N ÷ 2 nodes, giving a total run time of O(N ÷ 2 × N ÷ 2) = O(N²).

You can use the AddNode algorithm to build a sorting algorithm called treesort. In the treesort algorithm, you use the previous AddNode algorithm to add values to a sorted tree. You then use an inorder traversal to output the items in sorted order. The AddNode algorithm takes expected time O(N log N), and the inorder traversal takes O(N) time, so the total run time is O(N log N + N) = O(N log N).

In the worst case, building the sorted tree takes O(N²) time. Adding the O(N) time for the inorder traversal gives a total run time of O(N² + N) = O(N²).
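The treesort algorithm can be sketched in Python as follows (the names are illustrative; as in the AddNode algorithm, duplicates go into the right subtree):

```python
class SortNode:
    """A node in a sorted binary tree."""
    def __init__(self, value):
        self.value = value
        self.left = None
        self.right = None

    def add_node(self, new_value):
        """Recursively place new_value in this node's sorted subtree."""
        if new_value < self.value:
            if self.left is None:
                self.left = SortNode(new_value)
            else:
                self.left.add_node(new_value)
        else:
            if self.right is None:
                self.right = SortNode(new_value)
            else:
                self.right.add_node(new_value)

def treesort(values):
    """Sort values by building a sorted tree, then traversing it inorder."""
    root = SortNode(values[0])
    for value in values[1:]:
        root.add_node(value)

    result = []
    def inorder(node):
        if node is None:
            return
        inorder(node.left)
        result.append(node.value)
        inorder(node.right)
    inorder(root)
    return result
```

The inorder traversal visits the left subtree, the node itself, and then the right subtree, so the values come out in sorted order.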

Finding Nodes

After you build a sorted tree, you can search for specific items in it. For example, nodes might represent employee records, and the values used to order the tree might be a record's employee ID. The following pseudocode shows a method provided by the BinaryNode class that searches a node's subtree for a target value:

// Find a node with a given target value.
BinaryNode: FindNode(Key: target)
    // If we've found the target value, return this node.
    If (target == Value) Then Return &lt;this node&gt;

    // See if the desired value is in the left or right subtree.
    If (target &lt; Value) Then
        // Search the left subtree.
        If (LeftChild == null) Then Return null
        Return LeftChild.FindNode(target)
    Else
        // Search the right subtree.
        If (RightChild == null) Then Return null
        Return RightChild.FindNode(target)
    End If
End FindNode

First the algorithm checks the current node's value. If that value equals the target value, the algorithm returns the current node.

Next, if the target value is less than the current node's value, the desired node lies in this node's left subtree. If the left child branch is null, the algorithm returns null to indicate that the target item isn't in the tree. If the left child isn't null, the algorithm recursively calls the left child's FindNode method to search that subtree.

If the target value is greater than the current node's value, the algorithm performs similar steps to search the right subtree.

If the tree contains N nodes and is reasonably balanced so that it isn't tall and thin, it has height O(log N), so this search can take at most O(log N) steps.
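Because the search follows a single path down the tree, it can also be written without recursion. The following Python sketch shows an iterative variant of the search (names are illustrative):

```python
class BinaryNode:
    """A node in a sorted binary tree."""
    def __init__(self, value, left=None, right=None):
        self.value = value
        self.left = left
        self.right = right

def find_node(root, target):
    """Walk down the sorted tree, branching left or right at each node,
    until the target is found or we fall off the tree.
    Returns the matching node, or None if the target is not present."""
    node = root
    while node is not None:
        if target == node.value:
            return node
        # Smaller values are in the left subtree, others in the right.
        node = node.left if target < node.value else node.right
    return None
```

The iterative version does the same O(log N) work on a balanced tree but avoids deep recursion on a tall, thin one.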

Deleting Nodes

Deleting a node from a sorted tree is a bit more complicated than adding one.

The first step is finding the node to delete. The preceding section explained how to do that.

The next step depends on the position of the target node in the tree. To understand the different situations, consider the tree shown in Figure 10-7.


Figure 10-7: How you delete a node from a sorted binary tree depends on the node's position in the tree.

If the target node is a leaf node, you can simply delete it, and the tree is still a sorted tree. For example, if you remove node 89 from the tree shown in Figure 10-7, you get the tree shown in Figure 10-8.


Figure 10-8: If you remove a leaf node from a sorted binary tree, it remains a sorted binary tree.

If the target node is not a leaf and it has only one child, you can replace the node with its child. For example, if you remove node 71 from the tree shown in Figure 10-8, you get the tree shown in Figure 10-9.


Figure 10-9: To remove an internal node that has one child, replace it with its child.

The trickiest case occurs when the target node has two children. In that case, the general strategy is to replace the node with its left child, but that leads to two subcases.

First, if the target node's left child has no right child, you can simply replace the target node with its left child. For example, if you remove node 21 from the tree shown in Figure 10-9, you get the tree shown in Figure 10-10.


Figure 10-10: To remove a target node with two children and whose left child has no right child, replace the target node with its left child.

The final case occurs when the target node has two children and its left child has a right child. In that case, you should search down the tree to find the rightmost node below the target node's left child. If that node has no children, simply replace the target node with it. If that node has a left child, replace it with its left child, and then replace the target node with the rightmost node.

Figure 10-11 shows this case where you want to remove node 35. Now 35 has two children, and its left child (17) has a right child (24). The algorithm moves down from the left child (17) as far as possible by following right child links. In this example, that leads to node 24, but in general that rightmost child could be farther down the tree.


Figure 10-11: Removing a target node with two children whose left child has a right child is the most complicated operation for a sorted binary tree.

To delete the target node, the algorithm replaces the rightmost node with its child (if it has one) and then replaces the target node with the rightmost node. In this example, the program replaces node 24 with node 23 and then replaces node 35 with node 24, resulting in the tree on the right in Figure 10-11.
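The deletion cases described above can be sketched in Python. This sketch copies the rightmost node's value into the target node instead of splicing node objects around, which is an equivalent simplification; the helper names are illustrative:

```python
class Node:
    """A node in a sorted binary tree."""
    def __init__(self, value):
        self.value = value
        self.left = None
        self.right = None

def insert(node, value):
    """Add value to the sorted subtree; return the subtree's root."""
    if node is None:
        return Node(value)
    if value < node.value:
        node.left = insert(node.left, value)
    else:
        node.right = insert(node.right, value)
    return node

def inorder(node):
    """Return the subtree's values in sorted order."""
    if node is None:
        return []
    return inorder(node.left) + [node.value] + inorder(node.right)

def delete(node, target):
    """Remove target from the sorted subtree rooted at node.
    Returns the (possibly new) root of the subtree."""
    if node is None:
        return None
    if target < node.value:
        node.left = delete(node.left, target)
    elif target > node.value:
        node.right = delete(node.right, target)
    else:
        # Leaf node: just remove it.
        if node.left is None and node.right is None:
            return None
        # One child: replace the node with its child.
        if node.left is None:
            return node.right
        if node.right is None:
            return node.left
        # Two children: find the rightmost node in the left subtree,
        # move its value here, and delete it from the left subtree.
        rightmost = node.left
        while rightmost.right is not None:
            rightmost = rightmost.right
        node.value = rightmost.value
        node.left = delete(node.left, rightmost.value)
    return node
```

Deleting 35 from the tree built from 35, 17, 44, 24, 23 reproduces the situation in Figure 10-11: node 24 takes 35's place, and node 23 takes 24's old position.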

Threaded Trees

A thread is a series of links that allow you to move through the nodes in a tree or network in a way other than by following normal branches or links. A threaded tree is a tree that contains one or more threads.

For example, Figure 10-12 shows a tree with threads represented by dashed arrows.

The threads shown in Figure 10-12 point from a node to the nodes that come before and after it in an inorder traversal. They allow an algorithm to perform an inorder traversal or reverse traversal more quickly than is possible by using the branches alone.


Figure 10-12: A threaded tree contains references that let you move through the tree without following its normal branches.

NOTE You can define other threads in a tree, but this type is the most common. Because it includes threads forward and backward through the tree's inorder traversal, this kind of tree is sometimes called a symmetrically threaded tree.

NOTE Notice that all the nodes shown in Figure 10-12 have either a left branch or a left thread and a right branch or a right thread. You can use the same references for both branches and threads if you can somehow distinguish between them. For example, if you give the node class two boolean variables, HasLeftBranch and HasRightBranch, you can store threads in the child links by setting those variables to False whenever a link holds a thread rather than a branch.

You can even pack the two boolean values into a byte and use byte operations to see if they are set.

In practice, the savings in memory may not be worth the extra complexity and potential confusion unless you're working with an extremely large tree.

To use this kind of threaded tree, you need to know two things: how to build the tree and how to traverse it.

Building Threaded Trees

A threaded tree starts with a single node that has no branches and with threads set to null. Creating that node is simple.

The trick is adding a new node to the tree. There are two cases, depending on whether you add the new node as the left or right child of its parent.

First, suppose you're adding the new node as a left child. Suppose you're adding the node 3 as the left child of node 4 in the tree shown in Figure 10-12.

Because of where the new node is placed, its value is the next smaller value compared to its parent's value. (In this example, 3 is the next smaller value after 4.) That means the node before the new one in the traversal is the node that was formerly the node before the parent. (In this example, the node before 3 is the node that was before 4—in this case, 2.) When creating the new node, set the new node's left thread equal to the value of the parent's left thread.

The parent's predecessor in the traversal is now the new node. The parent's left branch points to the new node, so the parent no longer needs its left thread, and you should set it equal to null.

The new node's right thread should point to the next node in the tree's traversal. Because of where the new node is placed, that's the parent node, so you should set the new node's right thread to its parent. (In this example, node 3's right thread should point to node 4.)

Figure 10-13 shows the updated tree with node 3 added.


Figure 10-13: When you insert a node as a left child, its left thread points where the parent's left thread used to point.

When you add a node as the right child of an existing node, the steps are similar, with the roles of the left and right branches and threads reversed. The new node's right thread takes the value that the parent's right thread had, and the parent's right thread is set to null. The new node's left thread points to the parent node.

Figure 10-14 shows the tree in Figure 10-13 with the new node 8 inserted.

The following pseudocode shows an algorithm for inserting a node into a threaded sorted tree:

// Add a node to this node's sorted subtree.
AddNode(Data: new_value)
    // See if the new value is smaller than ours.
    If (new_value &lt; this.Value) Then
        // The new value is smaller. Add it to the left subtree.
        If (this.LeftChild != null) Then this.LeftChild.AddNode(new_value)
        Else
            // Add the new child here.
            ThreadedNode: child = New ThreadedNode(new_value)
            child.LeftThread = this.LeftThread
            child.RightThread = this
            this.LeftChild = child
            this.LeftThread = null
        End If
    Else
        // The new value is not smaller. Add it to the right subtree.
        If (this.RightChild != null) Then this.RightChild.AddNode(new_value)
        Else
            // Add the new child here.
            ThreadedNode: child = New ThreadedNode(new_value)
            child.LeftThread = this
            child.RightThread = this.RightThread
            this.RightChild = child
            this.RightThread = null
        End If
    End If
End AddNode


Figure 10-14: When you insert a node as a right child, its right thread points where the parent's right thread used to point.

The algorithm first compares the new value to the node's value. If the new value is smaller, the algorithm adds it to the left subtree.

If the node has a left child, the algorithm recursively calls its AddNode method.

If the node has no left child, the algorithm adds the new node here. It creates the new node, sets its left thread equal to the current node's left thread, and sets its right thread equal to the current node. It sets the current node's left branch to point to the new node and sets the current node's left thread to null.

If the new value is greater than or equal to the current node's value, the algorithm performs similar steps to place the new value in the right subtree. The steps are the same as the case when the new value is smaller than the current node's value, with the roles of the left and right branches and threads reversed.

This algorithm is very similar to the previous algorithm for adding a node to a sorted tree. Both versions recursively search down through the tree to find the new node's location. The only difference is that this version takes extra action to sort out threads when it finally creates the new node.

As with the previous version, if you use this method to build a threaded sorted tree containing N nodes, this algorithm takes O(N log N) time if the values are initially arranged randomly. The algorithm takes O(N2) time in the worst case when the values are sorted initially or sorted in reverse order.

Using Threaded Trees

The following pseudocode uses threads to perform an inorder traversal:

InorderWithThreads(BinaryNode: root)
    // Start at the root.
    BinaryNode: node = root

    // Remember whether we got to a node via a branch or thread.
    // Pretend we go to the root via a branch so we go left next.
    Boolean: via_branch = True

    // Repeat until the traversal is done.
    While (node != null)
        // If we got here via a branch, go
        // down and to the left as far as possible.
        If (via_branch) Then
            While (node.LeftChild != null)
                node = node.LeftChild
            End While
        End If

        // Process this node.
        &lt;Process node&gt;

        // Find the next node to process.
        If (node.RightChild == null) Then
            // Use the thread.
            node = node.RightThread
            via_branch = False
        Else
            // Use the right branch.
            node = node.RightChild
            via_branch = True
        End If
    End While
End InorderWithThreads

The algorithm starts by initializing the variable node to the root node. It also initializes the variable via_branch to True to indicate that the algorithm got to the current node via a branch. Treating the root node in this way makes the algorithm move to the leftmost node in the tree in the next step.

The algorithm then enters a loop that continues until the variable node drops off the tree at the end of the traversal.

If the algorithm got to the current node via a branch, it should not necessarily process that node just yet. If that node has a left branch, the nodes down that subtree have values smaller than the current node, so the algorithm must consider them first. To do that, the algorithm moves as far down the left branches as possible. (In Figure 10-12, this occurs when the algorithm moves from node 6 to node 9. The algorithm must first move down to node 7 before it processes node 9.)

The algorithm then processes the current node.

Next, if the node has no right branch, the algorithm follows the node's right thread. If the right thread is null, the node is set to null, and the While loop ends. The algorithm sets via_branch to False to indicate that it got to the new node via a thread, not a branch. (In Figure 10-12, this happens several times, such as when the algorithm moves from node 4 to node 5. Because via_branch is False, the algorithm will process node 5 next.)

If the current node's right branch is not null, the algorithm follows it and sets via_branch to True so that it moves down that node's left subtree during the next trip through the While loop.

The following list describes the steps taken by this algorithm to traverse the tree shown in Figure 10-12:

  1. Start at the root, and set via_branch to True.
  2. Variable via_branch is True, so follow the left branches to 2 and then 1. Process node 1.
  3. Follow the right thread to 2, and set via_branch to False.
  4. Variable via_branch is False, so process node 2.
  5. Follow the right branch to 4, and set via_branch to True.
  6. Variable via_branch is True, so try to move down the left branches. There is no left branch here, so stay at node 4 and process node 4.
  7. Follow the right thread to 5, and set via_branch to False.
  8. Variable via_branch is False, so process node 5.
  9. Follow the right branch to 6, and set via_branch to True.
  10. Variable via_branch is True, so try to move down the left branches. There is no left branch here, so stay at node 6 and process node 6.
  11. Follow the right branch to 9, and set via_branch to True.
  12. Variable via_branch is True, so follow the left branch to 7 and process node 7.
  13. Follow the right thread to 9, and set via_branch to False.
  14. Variable via_branch is False, so process node 9.
  15. Follow the right thread to null, and set via_branch to False.
  16. Variable node is now null, so the While loop ends.

This algorithm still follows all the nodes' branches, and it visits every node, so it has run time O(N). However, it doesn't need to let recursive calls unwind back up the child branches, so it saves a bit of time over a normal traversal. It also doesn't use recursion, so it doesn't have problems with deep levels of recursion. It also doesn't need any extra storage space, unlike a breadth-first traversal.
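Both the threaded insertion and the thread-following traversal can be sketched in Python. This sketch stores the threads in separate fields rather than reusing the child links (a simplification compared to the HasLeftBranch trick described earlier); names are illustrative:

```python
class ThreadedNode:
    """A sorted-tree node with inorder predecessor/successor threads."""
    def __init__(self, value):
        self.value = value
        self.left = None          # left branch
        self.right = None         # right branch
        self.left_thread = None   # previous node in inorder traversal
        self.right_thread = None  # next node in inorder traversal

    def add_node(self, new_value):
        """Insert new_value, maintaining the threads as described above."""
        if new_value < self.value:
            if self.left is not None:
                self.left.add_node(new_value)
            else:
                child = ThreadedNode(new_value)
                child.left_thread = self.left_thread
                child.right_thread = self
                self.left = child
                self.left_thread = None
        else:
            if self.right is not None:
                self.right.add_node(new_value)
            else:
                child = ThreadedNode(new_value)
                child.left_thread = self
                child.right_thread = self.right_thread
                self.right = child
                self.right_thread = None

def inorder_with_threads(root):
    """Traverse the tree inorder, following right threads between subtrees."""
    order = []
    node = root
    via_branch = True  # pretend we reached the root via a branch
    while node is not None:
        if via_branch:
            # Move as far down the left branches as possible.
            while node.left is not None:
                node = node.left
        order.append(node.value)
        if node.right is None:
            node = node.right_thread
            via_branch = False
        else:
            node = node.right
            via_branch = True
    return order
```

Because the traversal follows threads instead of unwinding recursive calls, it needs no stack and no queue.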

Specialized Tree Algorithms

Over the years programmers have developed many specialized tree algorithms to solve specific problems. This chapter can't possibly describe every algorithm, but the following sections describe four algorithms that are particularly interesting. They demonstrate the useful techniques of updating a tree to include new data, evaluating recursive expressions, and subdividing geometric areas. The final section explains tries, which are well-known in algorithmic studies.

The Animal Game

In the animal game, the user thinks of an animal, and the program's simple artificial intelligence tries to guess what it is. The program is a learning system, so over time it gets better at guessing the user's animal.

The program stores information about animals in a binary tree. Each internal node holds a yes-or-no question that guides the program down the left or right branch. Leaf nodes represent animals.

The program asks the questions at each node and follows the appropriate branch until it reaches a leaf node where it guesses that node's animal.

If the program is wrong, it asks the user to type a question it can ask to differentiate between the animal it guessed and the correct answer. It adds a new internal node containing the question and gives that node leaves holding the correct and incorrect animals.

Figure 10-15 shows a small knowledge tree for the game.

For example, suppose the user is thinking about a snake. Table 10-2 shows the questions the program asks and the answers the user gives.

For another example, suppose the user is thinking about a giraffe. Table 10-3 shows the questions the program asks and the answers the user gives in this example.


Figure 10-15: This knowledge tree can differentiate among dog, cat, fish, snake, and bird.

Table 10-2: The Animal Game Trying to Guess Snake

THE PROGRAM ASKS: THE USER ANSWERS:
Is it a mammal? No
Does it have scales? Yes
Does it breathe water? No
Is it a snake? Yes

Table 10-3: The Animal Game Trying to Guess Giraffe

THE PROGRAM ASKS: THE USER ANSWERS:
Is it a mammal? Yes
Does it bark? No
Is it a cat? No
What is your animal? Giraffe
What question could I ask to differentiate between a cat and a giraffe? Does it have a long neck?
Is the answer to this question true for a giraffe? Yes

The program then updates its knowledge tree to hold the new question and animal. The new tree is shown in Figure 10-16.


Figure 10-16: This knowledge tree can now differentiate between cat and giraffe.
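The guess-and-learn cycle can be sketched in Python. In this minimal sketch, the answer callback stands in for the user's yes/no input, and the names are illustrative:

```python
class KnowledgeNode:
    """An internal node holds a question; a leaf holds an animal name."""
    def __init__(self, text, yes=None, no=None):
        self.text = text   # question (internal node) or animal (leaf)
        self.yes = yes
        self.no = no

    def is_leaf(self):
        return self.yes is None and self.no is None

def guess(node, answer):
    """Walk the tree using answer(question) -> bool; return the leaf reached."""
    while not node.is_leaf():
        node = node.yes if answer(node.text) else node.no
    return node

def learn(leaf, new_animal, question, answer_is_yes_for_new):
    """Replace a wrong leaf with a question node that differentiates
    the guessed animal from the correct one."""
    old_animal = leaf.text
    leaf.text = question
    if answer_is_yes_for_new:
        leaf.yes = KnowledgeNode(new_animal)
        leaf.no = KnowledgeNode(old_animal)
    else:
        leaf.yes = KnowledgeNode(old_animal)
        leaf.no = KnowledgeNode(new_animal)
```

After a wrong guess of "cat," calling learn with the giraffe question converts the cat leaf into an internal question node with cat and giraffe leaves, exactly as in Figure 10-16.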

Expression Evaluation

You can model many situations with trees. You can model mathematical expressions by creating an internal node for each operator and a leaf node for each numeric value.

Mathematical expressions naturally break into subexpressions that you must evaluate before you can evaluate the expression as a whole. For example, consider the expression (6 × 14) ÷ (9 + 12). To evaluate this expression, you must first evaluate 6 × 14 and 9 + 12. You can then divide the results of those calculations to get the final result.

To model this expression as a tree, you build subtrees to represent the subexpressions. You then join the subtrees with a root node that represents the operation that combines the subexpressions—in this case, division.

Figure 10-17 shows the tree representing the expression (6 × 14) ÷ (9 + 12).


Figure 10-17: You can use trees to evaluate mathematical expressions.

Each internal node has children representing its operands. For example, binary operators such as + and / have left and right children that the operator must combine.

You can think of the leaf nodes as special operators that convert a text value into a numeric one. In that case the leaf nodes must hold their text.

The only thing missing from the arithmetic node class is a method to evaluate the node. That method should examine the type of node and then return an appropriate result. For example, if the operator is +, the method should recursively make its operands evaluate their subexpressions, and then it can add the results.

The following pseudocode creates an enumeration that defines values that indicate a node's operator type:

Enumeration Operators
    Literal
    Plus
    Minus
    Times
    Divide
    Negate
End Enumeration

This enumeration defines operator types for literal values such as 8, addition, subtraction, multiplication, division, and unary negation (as in −5). You can add other operators such as square root, exponentiation, sine, cosine, and others.

The following pseudocode shows an ExpressionNode class that represents a node in a mathematical expression tree:

Class ExpressionNode
    Operators: Operator
    ExpressionNode: LeftOperand, RightOperand
    String: LiteralText

    // Evaluate the expression.
    Float: Evaluate()
        Case Operator
            Literal:
                Return Float.Parse(LiteralText)
            Plus:
                Return LeftOperand.Evaluate() + RightOperand.Evaluate()
            Minus:
                Return LeftOperand.Evaluate() - RightOperand.Evaluate()
            Times:
                Return LeftOperand.Evaluate() * RightOperand.Evaluate()
            Divide:
                Return LeftOperand.Evaluate() / RightOperand.Evaluate()
            Negate:
                Return -LeftOperand.Evaluate()
        End Case
    End Evaluate
End ExpressionNode

The class begins by declaring its properties. The Operator property is a value from the Operators enumerated type.

The LeftOperand and RightOperand properties hold links to the node's left and right children. If the node represents a unary operator such as negation, only the left child is used. If the node represents a literal value, neither child is used.

The LiteralText property is used only by literal nodes. For a literal node, it contains the node's textual value, such as 12.

The Evaluate method examines the node's Operator property and takes appropriate action. For example, if Operator is Plus, the method calls the Evaluate method for its left and right children, adds their results, and returns the sum.

After you build an expression tree, evaluating it is easy. You simply call the root node's Evaluate method.
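A Python version of the expression node might look like the following sketch. It uses operator strings instead of an enumeration, which is a simplification; the names are illustrative:

```python
class ExpressionNode:
    """A node in an expression tree: an operator with operands,
    or a literal leaf holding its text."""
    def __init__(self, operator, left=None, right=None, literal_text=None):
        self.operator = operator          # 'literal', '+', '-', '*', '/', 'negate'
        self.left = left
        self.right = right
        self.literal_text = literal_text  # used only by literal nodes

    def evaluate(self):
        """Recursively evaluate this node's subexpression."""
        if self.operator == 'literal':
            return float(self.literal_text)
        if self.operator == '+':
            return self.left.evaluate() + self.right.evaluate()
        if self.operator == '-':
            return self.left.evaluate() - self.right.evaluate()
        if self.operator == '*':
            return self.left.evaluate() * self.right.evaluate()
        if self.operator == '/':
            return self.left.evaluate() / self.right.evaluate()
        if self.operator == 'negate':
            return -self.left.evaluate()
        raise ValueError("unknown operator: " + self.operator)

def lit(text):
    """Build a literal leaf node."""
    return ExpressionNode('literal', literal_text=text)

# Build the tree for (6 * 14) / (9 + 12), as in Figure 10-17.
expr = ExpressionNode('/',
                      ExpressionNode('*', lit('6'), lit('14')),
                      ExpressionNode('+', lit('9'), lit('12')))
```

Calling `expr.evaluate()` evaluates the two subtrees (84 and 21) and divides them, giving 4.0.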

The hardest part of evaluating mathematical expressions is building the expression tree from a string such as (6 × 14) ÷ (9 + 12). Doing that is a string operation rather than a tree operation, so this topic is deferred until Chapter 15, which covers strings in depth.

Quadtrees

Quadtrees are tree data structures that help locate objects in two-dimensional space. For example, suppose you have an application that displays several thousand delivery locations. If the user clicks the map, the program needs to search through all the locations to find the one closest to where the user clicked. If the locations are stored in a simple list, the program must perform a sequential search to find the closest point. A quadtree can make the search much faster.

In a quadtree, a node represents a rectangle in two-dimensional space. The node contains a list of items that are contained within its rectangle.

If a quadtree node contains too many items, it is split into four child nodes representing the parent node's northwest, northeast, southeast, and southwest quadrants. The items are then moved into the appropriate child nodes.

To use a quadtree to find an item with given X and Y coordinates, you start at the root node. If the quadtree node has children, you use the item's coordinates to determine which child contains the item and then recursively search that child for the item. If you reach a node that doesn't have children, you search the items it contains for an item at the target location.

To see why this makes the search faster, suppose the mapping application described a moment ago contains a list of 1,500 locations. Searching that list linearly will require on average roughly 750 comparisons.

In contrast, suppose you store the items in a quadtree where each node can hold at most 100 items. If the nodes are reasonably evenly distributed around the map, the quadtree would logically look like the one shown in Figure 10-18.


Figure 10-18: If each leaf node can hold 100 items and the items are evenly distributed, this quadtree can hold roughly 1,600 items.

The root node's area is divided into four quadrants, each of which is divided into four smaller quadrants. Each leaf node representing a smaller quadrant can hold up to 100 items.

To find the item that the user clicked in this quadtree, you need to determine which larger quadrant contains the item and then which smaller quadrant within the larger quadrant contains the item. Then you need to search up to 100 items in the leaf node. The result is an average of two quadrant tests plus roughly 50 item tests. The relative speed of the quadrant tests and item tests may vary depending on your implementation. But the speed generally is much faster than the 750 item tests required by a simple list.

If a quadtree contains N nodes, each of which can hold up to K items, and the items are distributed reasonably evenly in the map area, the tree has a height of roughly log4(N ÷ K). In the previous example, N = 1,500 and K = 100, so the height should be roughly log4(1,500 ÷ 100) = log4(15) ≈ 1.95, which is close to the height of 2 for the tree shown in Figure 10-18.
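You can check this height estimate directly, assuming Python's math.log with its optional base argument:

```python
import math

def quadtree_height(n_items, capacity):
    """Approximate height of an evenly filled quadtree: log4(N / K).
    Each split divides the area into 4 quadrants, hence the base of 4."""
    return math.log(n_items / capacity, 4)

# 1,500 items with 100 items per leaf gives a height of roughly 1.95.
estimate = quadtree_height(1500, 100)
```

Rounding the estimate to two decimal places reproduces the 1.95 figure in the text.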

Figure 10-19 shows another way to visualize a quadtree. In this figure, the quadtree contains 200 items, and each quadtree node can hold at most 10 items. (In a real program, you would probably want to let each node hold more items so that they don't split too often.) The program draws a box around each quadtree node so that you can see how the area is divided.


Figure 10-19: Each box shows a quadtree node's area.

In the tree shown in Figure 10-19, the full map area is divided into four quadrants, and each of these is divided into smaller quadrants. Some of the smaller quadrants are divided again, and some of those areas are divided one last time.

To manage a quadtree, you need algorithms to add a new item to a subtree and to find an item in a subtree. You may also want an algorithm to draw the items on a map.

The following pseudocode shows the basic class definition for a quadtree node:

Class QuadtreeNode
    // The maximum number of points allowed in a quadtree node.
    Integer: MaxItems = 10

    // The items in this quadtree node.
    List Of Data: Items

    // The area that this quadtree node represents.
    Rectangle: Area

    // The middle X and Y values.
    Float: Xmid, Ymid

    // The child quadtree nodes.
    QuadtreeNode: NWchild, NEchild, SEchild, SWchild

    // Initializing constructor.
    Constructor(Rectangle: area)
        Area = area
        Xmid = (Area.Left + Area.Right) / 2
        Ymid = (Area.Top + Area.Bottom) / 2
    End Constructor
End QuadtreeNode

The value MaxItems indicates the maximum number of items a node can hold before it must be split into quadrants.

The Items property contains the items that the node holds. If the node is internal, Items is null, and the items are stored in the node's child subtrees.

The Area property holds the rectangle that the node represents in two-dimensional space.

The Xmid and Ymid values give the middle X and Y coordinates of the Area rectangle. These are used to determine which quadrant contains an item.

This class provides a constructor that initializes the node's Area property and initializes the Xmid and Ymid properties. You could calculate Xmid and Ymid whenever you need them, but those values are used frequently (at least for nonleaf nodes), so you can save time by initializing them now.

The following pseudocode shows how a QuadtreeNode can add a new item to its subtree:

// Add an item to this node's subtree.
AddItem(Item: new_item)
    // See if this quadtree node is full.
    If ((Items != null) And (Items.Count + 1 &gt; MaxItems)) Then
        // Divide this quadtree node.
        Float: wid = (Area.Right - Area.Left) / 2
        Float: hgt = (Area.Bottom - Area.Top) / 2
        NWchild = New QuadtreeNode(
            New Rectangle(Area.Left, Area.Top, wid, hgt))
        NEchild = New QuadtreeNode(
            New Rectangle(Area.Left + wid, Area.Top, wid, hgt))
        SEchild = New QuadtreeNode(
            New Rectangle(Area.Left + wid, Area.Top + hgt, wid, hgt))
        SWchild = New QuadtreeNode(
            New Rectangle(Area.Left, Area.Top + hgt, wid, hgt))

        // Move the points into the appropriate subtrees.
        For Each item In Items
            If (item.Y &lt; Ymid) Then
                If (item.X &lt; Xmid) Then NWchild.AddItem(item)
                Else NEchild.AddItem(item)
            Else
                If (item.X &lt; Xmid) Then SWchild.AddItem(item)
                Else SEchild.AddItem(item)
            End If
        Next item

        // Remove this node's Items list.
        Items = null
    End If    // End if the quadtree node is full.

    // Add the item to the appropriate subtree.
    If (Items != null) Then Items.Add(new_item)
    Else
        If (new_item.Y &lt; Ymid) Then
            If (new_item.X &lt; Xmid) Then NWchild.AddItem(new_item)
            Else NEchild.AddItem(new_item)
        Else
            If (new_item.X &lt; Xmid) Then SWchild.AddItem(new_item)
            Else SEchild.AddItem(new_item)
        End If
    End If
End AddItem

If the current node is a leaf node and adding one more item would give it too many items, the algorithm splits the node. It creates the four child nodes and then loops through the items, recursively calling the child nodes' AddItem methods to move the items into the appropriate child subtree. It then sets the Items list to null to indicate that this is an internal node.

Next, if the Items property is not null, this is a leaf node, so the algorithm adds the new item to the Items list.

If the Items property is null, this is an internal node. In that case, the algorithm performs a series of If-Then tests to see which subtree should contain the item, and it recursively calls the corresponding child's AddItem method.
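The splitting and routing logic can be sketched in Python. This is a minimal sketch, not the book's implementation; the names (QuadtreeNode, add_item, _child_for) and the capacity constant MAX_ITEMS are illustrative assumptions:

```python
MAX_ITEMS = 4  # assumed capacity before a leaf splits

class Item:
    def __init__(self, x, y):
        self.x, self.y = x, y

class QuadtreeNode:
    def __init__(self, left, top, wid, hgt):
        self.left, self.top, self.wid, self.hgt = left, top, wid, hgt
        self.xmid = left + wid / 2   # precomputed quadrant boundaries
        self.ymid = top + hgt / 2
        self.items = []              # None for internal nodes
        self.children = None         # [NW, NE, SW, SE] after a split

    def add_item(self, item):
        # Split a full leaf into four quadrants.
        if self.items is not None and len(self.items) + 1 > MAX_ITEMS:
            w, h = self.wid / 2, self.hgt / 2
            self.children = [
                QuadtreeNode(self.left,     self.top,     w, h),  # NW
                QuadtreeNode(self.left + w, self.top,     w, h),  # NE
                QuadtreeNode(self.left,     self.top + h, w, h),  # SW
                QuadtreeNode(self.left + w, self.top + h, w, h),  # SE
            ]
            old_items, self.items = self.items, None
            for old in old_items:
                self._child_for(old).add_item(old)
        # Add the new item to this leaf or to the proper subtree.
        if self.items is not None:
            self.items.append(item)
        else:
            self._child_for(item).add_item(item)

    def _child_for(self, item):
        # Pick the quadrant that contains the item's point.
        if item.y < self.ymid:
            return self.children[0] if item.x < self.xmid else self.children[1]
        return self.children[2] if item.x < self.xmid else self.children[3]
```

After the fifth insertion the root splits and later items flow directly into the child subtrees.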

The following pseudocode shows at a high level how a QuadtreeNode can find the item at a given point. The algorithm returns the item it finds through the parameter result. It returns True if it finds the item and False otherwise:

// Find the item at a specified point. Return True if we find it.
Boolean: FindItem(Point: target, Item: result)
    // See if we have children.
    If (Items == null) Then
        // We have children. Search the appropriate child subtree.
        If (target.Y < Ymid) Then
            If (target.X < Xmid) Then Return NWchild.FindItem(target, result)
            Else Return NEchild.FindItem(target, result)
        Else
            If (target.X < Xmid) Then Return SWchild.FindItem(target, result)
            Else Return SEchild.FindItem(target, result)
        End If
    Else
        // We have no children. Search the current node's items.
        For Each item In Items
            // See if this is the item.
            If ((item.X == target.X) And (item.Y == target.Y)) Then
                result = item
                Return True
            End If
        Next item

        // We did not find the item.
        Return False
    End If
End FindItem

The algorithm first checks the Items property to see if the current node has children. If the node has children, it determines which child subtree contains the item and recursively calls the corresponding child's FindItem method.

If the current node doesn't have children, the algorithm searches the node's Items list to find the item at the target location. If the algorithm doesn't find an item at that position, it returns False to indicate that it didn't find the item.
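The search can be sketched as a short recursive Python method over a hand-built two-level tree. The node layout here (items stored as (x, y) tuples, children ordered NW, NE, SW, SE) is an assumption for illustration:

```python
class QuadtreeNode:
    def __init__(self, xmid, ymid, items=None, children=None):
        self.xmid, self.ymid = xmid, ymid
        self.items = items        # list of (x, y) tuples for leaves, else None
        self.children = children  # [NW, NE, SW, SE] for internal nodes

    def find_item(self, x, y):
        # Internal node: descend into the quadrant containing the point.
        if self.items is None:
            if y < self.ymid:
                child = self.children[0] if x < self.xmid else self.children[1]
            else:
                child = self.children[2] if x < self.xmid else self.children[3]
            return child.find_item(x, y)
        # Leaf node: scan this node's item list.
        for item in self.items:
            if item == (x, y):
                return item
        return None

# Hand-built two-level tree covering a 100 x 100 area.
root = QuadtreeNode(50, 50, children=[
    QuadtreeNode(25, 25, items=[(10, 10)]),  # NW
    QuadtreeNode(75, 25, items=[(60, 20)]),  # NE
    QuadtreeNode(25, 75, items=[]),          # SW
    QuadtreeNode(75, 75, items=[(80, 90)]),  # SE
])
```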

This algorithm is reasonably straightforward, but in practice things usually aren't so simple. Often the items stored in the tree are things other than simple points, such as circles, rectangles, or line segments. In that case, where do you store an item that straddles the edge between two quadtree nodes?

One approach is to store the item in both quadtree nodes so that the algorithm can find it no matter which area the user clicks. If you use this approach, you need to change the algorithms to work with two-dimensional items. For example, the search algorithm cannot simply compare the target point to the item's location. Instead, it must use some method to see if the target point lies within the item.

One problem with this approach is that it requires duplicate items representing the same object in different quadtree nodes. That wastes space and fills quadtree nodes sooner than they would be filled otherwise, so they must split more often and make the tree deeper.

Another approach is to represent each item with a specific point, perhaps its center or upper-left corner. Then, when you need to find an item, you search quadtree nodes with areas that overlap an area around the target point that is big enough to include the largest item.

For example, in the program shown in Figure 10-19, the items are circles with radius 5 that are represented by their center points. When searching for an item at location (A, B), the program examines any quadtree node with an area that intersects the rectangle A − 5 ≤ X ≤ A + 5 and B − 5 ≤ Y ≤ B + 5.
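That window test can be sketched as follows; the helper names (rects_intersect, search_window, node_might_contain) are hypothetical, and rectangles are (left, top, right, bottom) tuples:

```python
RADIUS = 5  # the example's circle radius

def rects_intersect(a, b):
    # Two rectangles overlap if they overlap on both axes.
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]

def search_window(x, y, r=RADIUS):
    # Any circle whose center lies in this window could contain (x, y).
    return (x - r, y - r, x + r, y + r)

def node_might_contain(node_area, x, y):
    # Examine a quadtree node only if its area intersects the window.
    return rects_intersect(node_area, search_window(x, y))
```

A search then recurses only into the child quadrants whose areas pass this test, so a click near a quadrant boundary still finds circles stored on the other side.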

The changes to the algorithms aren't too complicated, but they make the code quite a bit longer.

OCTTREES

An octtree is similar to a quadtree, except that it stores objects in three dimensions. An octtree node represents a three-dimensional volume. When an octtree node contains too many items, its volume is divided into eight octants that are represented by eight child nodes, and the items are distributed among the child subtrees.

Tries

A trie (the word comes from “retrieval” but is usually pronounced “try”) is a tree that holds strings. Each internal node represents a single letter. Leaf nodes may represent more than one letter. A path from the root to a leaf node corresponds to a string.

A partial path from the root to an internal node forms a prefix for longer paths, so tries are sometimes called prefix trees.

A path that represents a key string, whether it ends at an internal node or at a leaf node, has an associated value.

Figure 10-20 shows a trie that holds the keys and values shown in Table 10-4.

Table 10-4: Keys and Values for the Example Trie

KEY VALUE
WANE 29
WISP 72
WANT 36

images

Figure 10-20: A path through a trie defines a string.

For example, consider the path from the root to the node E. The nodes visited correspond to the letters W, A, N, and E, so that node represents the key WANE. That key's value is 29.

For another example, consider the path to the node T. The nodes visited correspond to the letters W, A, N, and T, so the node represents the key WANT. That key's value is 36.

Notice that the path to the node N forms the string WAN, which is a prefix of both WANE and WANT.

Notice also that a leaf node may represent more than one letter. In this example, the node ISP represents three letters. The path from the root to that node represents the key WISP and has value 72.

To add a new key to the trie, you use the key's letters to follow the appropriate path through the trie. If you reach a leaf node and the key has still more letters that are not represented by the current path, you add a new child node to represent the rest of the key.

For example, suppose you want to add the key WANTED to the trie. You follow the path through the nodes W, A, N, and T. The key still has the letters ED, so you add a new node to hold them. Figure 10-21 shows the new trie.

images

Figure 10-21: To add a key that is longer than the corresponding path through the tree, add a new leaf node.

Sometimes when you add a new key to the trie, you find it early. For example, the trie shown in Figure 10-21 already has the nodes needed to represent the key WAN. In that case, all you need to do is add a value to the appropriate node, as shown in Figure 10-22.

images

Figure 10-22: If a trie already contains nodes to represent a key, simply add a value to the key's final node.

Instead of storing a letter in each internal node, you can figure out the node's letter by keeping track of the path you took from the root to get there. For example, you could store a node's children in an array where Children[0] is the branch for the letter A, Children[1] is the branch for the letter B, and so forth.
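The index arithmetic can be sketched in Python; child_index and branch_letter are hypothetical helper names:

```python
def child_index(letter):
    # Convert the letter into an array index by "subtracting" A.
    return ord(letter) - ord('A')

def branch_letter(index):
    # Recover a branch's letter from its child's array index.
    return chr(ord('A') + index)
```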

Figure 10-23 shows the trie from Figure 10-22 with internal node letters moved to the branches. Notice that the node with value 29 doesn't need any extra information because the key it represents is fully specified by the path to the node. In contrast, the path to the node with value 10 specifies only the letters W, A, N, T, and E, so the node needs to store the final D.

images

Figure 10-23: Instead of storing an internal node's letter in the node, you can figure out the node from the path to the node.

The following pseudocode shows how you can add an item to a trie. In this code the phrase “remaining node key” means part of a key stored in a leaf node, such as D and SP in Figure 10-23:

AddValue(String: new_key, String: new_value)
    <If new_key is not blank and matches the remaining node key,
     place the value in this node and return.>
    <If new_key is blank and the remaining node key is too,
     place the value here and return.>
    <If new_key is blank but the node's remaining key isn't blank,
     move the node's remaining key (minus the first letter) into a child,
     place the value here, and return.>

    // If we get to this point, we need a child node.
    If <The Children array is null> Then
        // Make the Children array.
        Children = New TrieNode[26]
        <If the node's remaining key is not blank,
         move it into the appropriate child.>
    End If

    // Convert the letter into an integer by "subtracting" A.
    Integer: index = new_key[0] - 'A'

    If (Children[index] == null) Then
        // This child doesn't exist. Make it and
        // let it represent the rest of the new key.
        Children[index] = New TrieNode()
        Children[index].RemainingKey = new_key.Substring(1)
        Children[index].Value = new_value
        Return
    End If

    // Search the appropriate subtrie.
    Children[index].AddValue(new_key.Substring(1), new_value)
End AddValue

This is a fairly confusing algorithm. It may be easier to understand if you draw a trie and then walk through the algorithm, updating the trie in various ways.

As the algorithm moves through the trie, it removes the letters from the new key corresponding to the branches it crosses. The body of the algorithm then considers the current value of the new key.

First, if the new key is not blank and it matches the remaining node key, the algorithm places the new value in the current node. This would happen in Figure 10-23 if you were setting the value for WANTED. When the algorithm reaches the node labeled D, the new key value is D and the node's remaining key is also D.

Next, if the new key is blank and the node's remaining key is too, the algorithm places the value in this node. This occurs if you set the value of WAN in Figure 10-23. When the algorithm crosses the N branch, the new key is reduced to an empty string. The node at the end of that branch doesn't have any remaining key (only leaf nodes can have remaining key values), so this is where the value for WAN belongs.

Next, if the new key is blank but the node's remaining key isn't, the algorithm moves the node's remaining key into a child. This would happen if the trie contains WANE and WANTED but not WANT. In that case, the path for WANTED will be W, A, N, T, ED. When you add WANT and cross the T branch, the new key value is blank because the path represents the entire new key WANT. But that node's remaining key is ED. The algorithm moves the ED down to a child, creating a new E branch and a new node with D as its remaining key.

If the algorithm gets past all the previous steps, the algorithm must move into a child node's subtrie. Before it tries to do that, it determines whether the node's Children array has been initialized. If it has not, the algorithm creates the Children array. If the node's remaining key is not blank, the algorithm also moves the remaining key (minus its first letter) into the appropriate child.

The algorithm then examines the child that should contain the new key. If that child doesn't exist, the algorithm creates it and stores the rest of the new key and the new value in it.

If that child does exist, the algorithm recursively calls itself to add the new key (minus its first letter) to that child's subtrie.
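The steps above can be sketched in Python. This is an illustrative sketch, not the book's code; the method _push_down_remaining, which handles both "move the remaining key into a child" cases, is a hypothetical helper:

```python
class TrieNode:
    def __init__(self):
        self.remaining_key = ""  # leftover letters in a leaf, such as D or SP
        self.value = None
        self.children = None     # array of 26 child slots, or None

    def add_value(self, new_key, new_value):
        # The key matches this node's remaining key (both may be blank):
        # the value belongs here.
        if new_key == self.remaining_key:
            self.value = new_value
            return
        # The key is blank but a different suffix is stored here:
        # push the suffix down into a child, then store the value here.
        if new_key == "":
            self._push_down_remaining()
            self.value = new_value
            return
        # We need a child node. Make the children array if necessary,
        # pushing down any stored suffix.
        if self.children is None:
            self._push_down_remaining()
        index = ord(new_key[0]) - ord('A')
        if self.children[index] is None:
            # The child doesn't exist. Let it hold the rest of the key.
            child = TrieNode()
            child.remaining_key = new_key[1:]
            child.value = new_value
            self.children[index] = child
        else:
            # Recurse into the appropriate subtrie.
            self.children[index].add_value(new_key[1:], new_value)

    def _push_down_remaining(self):
        # Move this node's leftover suffix (and its value) into a child.
        if self.children is None:
            self.children = [None] * 26
        if self.remaining_key:
            child = TrieNode()
            child.remaining_key = self.remaining_key[1:]
            child.value = self.value
            self.children[ord(self.remaining_key[0]) - ord('A')] = child
            self.remaining_key = ""
            self.value = None
```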

A search for a value in a trie follows the same path through the trie but uses a much simpler algorithm because there are fewer special cases to consider. The following pseudocode shows how a search algorithm might work in a trie:

// Find a value in this node's subtrie.
Data: FindValue(String: target_key)
    // If the target key matches the
    // remaining node key, return this node's value.
    If (target_key == RemainingKey) Then Return Value

    // Search the appropriate child.
    If (Children == null) Then Return null
    Integer: index = target_key[0] - 'A'
    If (Children[index] == null) Then Return null
    Return Children[index].FindValue(target_key.Substring(1))
End FindValue

The algorithm first compares the target key to the node's remaining key value. If these are the same, two things may have happened. First, the algorithm may have used up the target key and reached a node that doesn't have a remaining value. (This happens if you search for WAN in Figure 10-22.) Second, the algorithm may have reached a node where the remaining target key matches the node's remaining key. (This happens if you search for WANTED in Figure 10-22.) In either case, this node matches the target key, so the algorithm returns its value.

If the remaining target key doesn't match the node's remaining key, the algorithm must search a child node. If the current node has no children, the target key isn't in the trie, so the algorithm returns null.

If the node has children, the algorithm calculates the index of the target key's child. If that child doesn't exist, the target key isn't in the trie, so the algorithm again returns null.

Finally, if the target key's child exists, the algorithm calls itself recursively for that child to find the target key (minus its first letter).
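The search can be sketched as a short recursive Python method over a hand-built version of the trie in Figure 10-23. The node layout and the link helper are illustrative assumptions:

```python
class TrieNode:
    def __init__(self, remaining_key="", value=None):
        self.remaining_key = remaining_key  # leftover letters in a leaf
        self.value = value
        self.children = None                # array of 26 child slots, or None

    def find_value(self, target_key):
        # If the target key matches the remaining node key
        # (both may be blank), return this node's value.
        if target_key == self.remaining_key:
            return self.value
        # Otherwise descend into the child for the key's first letter.
        if self.children is None:
            return None
        child = self.children[ord(target_key[0]) - ord('A')]
        if child is None:
            return None
        return child.find_value(target_key[1:])

def link(parent, letter, child):
    # Attach child along the branch for the given letter.
    if parent.children is None:
        parent.children = [None] * 26
    parent.children[ord(letter) - ord('A')] = child
    return child

# Hand-build the WANE/WANT/WISP trie from Figure 10-23.
root = TrieNode()
w = link(root, 'W', TrieNode())
a = link(w, 'A', TrieNode())
n = link(a, 'N', TrieNode())
link(n, 'E', TrieNode("", 29))    # WANE
link(n, 'T', TrieNode("", 36))    # WANT
link(w, 'I', TrieNode("SP", 72))  # WISP
```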

Summary

Trees can be useful for storing and manipulating hierarchical data. After you build a tree, you can enumerate its values in different orders and search for values within the tree.

The performance of many tree algorithms is related to the tree's height. If a tree holding N nodes is relatively short and wide, its height is O(log N), and those algorithms are fairly quick. If the tree is tall and thin, it could have height O(N), and some of those algorithms perform badly. For example, building a sorted binary tree takes O(N × log N) time in the best case and O(N²) time in the worst case.

Because a tree's height is important to these algorithms, special trees have been devised that rebalance themselves so that they cannot grow too tall and thin. The next chapter describes several kinds of balanced trees, including the B-trees and B+trees used by many database systems to store and search indices efficiently.

Exercises

Asterisks indicate particularly difficult problems.

  1. Can a perfect binary tree hold an even number of nodes?
  2. A perfect tree is full and complete, although not all full and complete trees are perfect. Draw a tree that is full and complete but not perfect.
  3. Use induction to prove that the number of branches B in a binary tree containing N nodes is B = N − 1.
  4. Prove that the number of branches B in a binary tree containing N nodes is B = N − 1 without using induction.
  5. *Use induction to prove that the number of leaf nodes L in a perfect binary tree of height H is L = 2^H.
  6. **Use induction to prove that the number of missing branches (places where a child could be added) M in a binary tree that contains N nodes is M = N + 1.
  7. What is the preorder traversal for the tree shown in Figure 10-24?

    images

    Figure 10-24: Tree data structures usually are drawn with the root at the top.

  8. What is the inorder traversal for the tree shown in Figure 10-24?
  9. What is the postorder traversal for the tree shown in Figure 10-24?
  10. What is the depth-first traversal for the tree shown in Figure 10-24?
  11. Write a program that finds the preorder, inorder, postorder, and depth-first traversals for the tree shown in Figure 10-24.
  12. What happens if you use a queue instead of a stack in the depth-first traversal algorithm described in the section “Depth-first Traversal?” How could you generate the same traversal recursively?
  13. Write a program similar to the one shown in Figure 10-25 that uses a preorder traversal to display a textual representation of the tree shown in Figure 10-24.

    images

    Figure 10-25: A preorder traversal can generate a textual display of a tree similar to the one used by Windows Explorer to display a directory hierarchy.

  14. **Write a program similar to the one shown in Figure 10-26 to display a more intuitive picture of a tree. (Hints: Give the node class a PositionSubtree method that positions the node's subtree. It should take as parameters the minimum x- and y-coordinates that the node's subtree can occupy, and it should calculate the rectangle that the subtree will cover. It will need to recursively call the PositionSubtree method of its left and right child subtrees and use the subtrees' sizes to see how big to make the original subtree. Also give the node class methods to recursively draw the tree's links and nodes.)

    images

    Figure 10-26: To draw a tree, a program must first position it.

  15. **The tree shown in Figure 10-26 is particularly useful for unordered trees, but for ordered binary trees it can be hard to tell whether a node is the left or right child of its parent. For example, in Figure 10-26 it's unclear whether node C is the left or right child of node D.

    Modify the program you wrote for Exercise 14 to produce a display similar to the one shown in Figure 10-27. Here, if a node has only one child, the program allows some space for the missing child, so you can tell whether the other child is a left or right child.

    images

    Figure 10-27: In an ordered binary tree, you can leave space to indicate missing children.

  16. Write pseudocode to perform a reverse inorder traversal on a threaded sorted tree.
  17. *Write a program that builds a threaded sorted tree and displays its inorder and reverse inorder traversals.
  18. **Expand the program you built for Exercise 17 so that it displays the tree shown in Figure 10-28. The circles in the drawing show a node's value and the values of the nodes to which its threads lead. For example, node 4 has its left thread set to null (displayed as -- in the program) and its right thread pointing to node 5.

    images

    Figure 10-28: This program builds and displays threaded sorted trees.

  19. In general, is the knowledge tree used by the animal game full, complete, perfect, none of those, or a combination of those?
  20. The animal game can use the following node class to store information:

Class AnimalNode
    String: Question
    AnimalNode: YesChild, NoChild
End Class

    If you use this class, how can you tell whether a node represents a question or an animal?

  21. Write a program that implements the animal game.
  22. Draw expression trees for the following expressions:
    • (15 ÷ 3) + (24 ÷ 6)
    • 8 × 12 − 14 × 32
    • 1 ÷ 2 + 1 ÷ 4 + 1 ÷ 20
  23. Write a program that evaluates mathematical expressions. Because parsing expressions to build mathematical expression trees is deferred until Chapter 15, this program doesn't need to do that. Instead, make the program use code to build and evaluate the expressions in Exercise 22.
  24. Draw expression trees for the following expressions:
    • images
    • 5! ÷ ((5 − 3)! × 3!)
    • Sine(45°)²
  25. *Extend the program you wrote for Exercise 23 to evaluate the expressions in Exercise 24.
  26. **Write a program similar to the one shown in Figure 10-19. Let the user click to select a circle. If the user clicks outside all the circles, select no circle. When you draw the map, draw the selected circle (if there is one) in a different color.
  27. Draw a trie to represent the following keys and values:

    images

  28. **Write a program that lets you add and find items in a trie.