In the previous chapter, we presented heap trees. These are nearly complete binary trees in which each node has a value greater than its children, so the largest item is always found in the root node.
In this chapter, we present another balanced tree structure, the red-black tree. Like the AVL tree presented in Chapter 10, the red-black tree data structure is aimed at efficient insertion, deletion, and searching of items stored in the tree.
In the next section, we introduce red-black trees.
12.1 Red-Black Trees
An interesting and important balanced binary search tree is the red-black tree. Rudolf Bayer invented this tree structure in 1972, ten years after the AVL tree was invented.
Red-black trees, like AVL trees, are self-balancing: after an insertion or deletion, the resulting tree is still a red-black tree. Like AVL trees, the computational complexity of insertion, deletion, and search is O(log₂ n).
Insertion and deletion for red-black trees generally involve fewer rotational corrections, but the resulting tree is less balanced than an AVL tree. In applications that expect many insertions and deletions and fewer searches, red-black trees may be preferable to AVL trees.
Because of the complexity of red-black trees, we limit ourselves in this chapter to implementing insertion into a red-black tree. The interested reader will find an implementation for deletion in Chapter 13 (page 545) of my book, Modern Software Development Using C#.Net, Thompson, 2006.
Definition of Red-Black Tree
1. Every node is assigned a color of red or black.
2. The root node is always black.
3. The children of a red node are black.
4. Every path from the root node to a leaf node contains the same number of black nodes.
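The properties above can be captured in a small node type. The following is a minimal sketch in Go; the names (`Color`, `Node`, `isRed`) are illustrative and not taken from the chapter's actual implementation.

```go
package main

import "fmt"

// Color of a red-black tree node.
type Color bool

const (
	Red   Color = true
	Black Color = false
)

// Node is a hypothetical minimal node type for a red-black tree of ints.
// A parent pointer is included because the insertion fix-up logic needs
// to reach the parent, grandparent, and uncle of a node.
type Node struct {
	Value       int
	Color       Color
	Left, Right *Node
	Parent      *Node
}

// isRed treats a nil child as black, which makes property 3
// ("the children of a red node are black") easy to check.
func isRed(n *Node) bool {
	return n != nil && n.Color == Red
}

func main() {
	root := &Node{Value: 100, Color: Black}
	root.Left = &Node{Value: 50, Color: Red, Parent: root}
	fmt.Println(isRed(root), isRed(root.Left), isRed(root.Right)) // false true false
}
```

Treating nil children as black is a common convention; it lets the checking code avoid special cases for missing children.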
Example of Red-Black Tree
In this ten-node red-black tree, every path from the root to a leaf node contains exactly two black nodes.
Some terminology we will use includes parent, grandparent, and uncle.
As an example, the parent of node 217 is 250. The uncle of 217 is 150 (sibling of parent). The grandparent of 217 is 175.
In the next section, we discuss the logic of inserting an item into a red-black tree. We “walk” through an example in detail to illustrate the process.
12.2 Insertion Process
We discuss the logic of insertion with a series of examples.
The first step for insertion is to do an ordinary search-tree insertion.
The new node added to the tree is always colored red. Our goal is to keep the number of black nodes on every path from the root to a leaf unchanged.
If the new inserted node has a red parent, this violates condition 3 in the preceding text, which requires the child of a red node to be black. We then must take corrective action.
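The first step can be sketched as an ordinary binary-search-tree insertion with the new node colored red. The type and function names below are hypothetical, not the chapter's; the red-black fix-up that follows the insertion is not shown here.

```go
package main

import "fmt"

// Node is a hypothetical minimal red-black tree node for ints.
type Node struct {
	Value       int
	Red         bool
	Left, Right *Node
	Parent      *Node
}

// bstInsert performs an ordinary binary-search-tree insertion of n,
// which the caller has colored red, and returns the root. The
// red-black fix-up (recoloring and rotations) would run afterward.
func bstInsert(root, n *Node) *Node {
	if root == nil {
		n.Red = false // an inserted root is simply colored black
		return n
	}
	cur := root
	for {
		if n.Value < cur.Value {
			if cur.Left == nil {
				cur.Left, n.Parent = n, cur
				return root
			}
			cur = cur.Left
		} else {
			if cur.Right == nil {
				cur.Right, n.Parent = n, cur
				return root
			}
			cur = cur.Right
		}
	}
}

func main() {
	root := bstInsert(nil, &Node{Value: 100, Red: true})
	root = bstInsert(root, &Node{Value: 50, Red: true})
	root = bstInsert(root, &Node{Value: 25, Red: true})
	// 25's parent (50) is red, so condition 3 is violated and a
	// corrective step would be required at this point.
	fmt.Println(root.Left.Red && root.Left.Left.Red) // true
}
```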
The first case we consider is when the parent of the node inserted is red and the uncle of the node inserted exists and is red. Consider the following tree after inserting 25. The uncle of 25 is 150 and is red.
We perform a correction by recoloring: the parent and the uncle change from red to black, and the grandparent changes to red if it is not the root (which it is in this case). If 100 were not the root, we might have to continue the search for violations up the tree after changing node 100 to red. The result of performing the color modification is shown as follows:
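The recoloring step for the red-uncle case can be sketched as follows. The helper name `recolorRedUncle` is invented for illustration; it assumes the parent and uncle are the grandparent's two direct children.

```go
package main

import "fmt"

// Node is a hypothetical minimal red-black tree node for ints.
type Node struct {
	Value       int
	Red         bool
	Left, Right *Node
	Parent      *Node
}

// recolorRedUncle handles the case where the inserted node's parent
// and uncle are both red: both become black, and the grandparent
// becomes red unless it is the root. If the grandparent does turn
// red, the fix-up would continue upward from it.
func recolorRedUncle(grandparent *Node) {
	grandparent.Left.Red = false
	grandparent.Right.Red = false
	if grandparent.Parent != nil { // grandparent is not the root
		grandparent.Red = true
	}
}

func main() {
	// The example from the text: root 100 (black) with red children
	// 50 and 150; 25 was just inserted below 50, so its parent and
	// uncle are both red.
	root := &Node{Value: 100}
	root.Left = &Node{Value: 50, Red: true, Parent: root}
	root.Right = &Node{Value: 150, Red: true, Parent: root}
	root.Left.Left = &Node{Value: 25, Red: true, Parent: root.Left}

	recolorRedUncle(root)
	fmt.Println(root.Red, root.Left.Red, root.Right.Red) // false false false
}
```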
The next case we consider is when the parent of the node inserted is red and the uncle is black or does not exist. There are four cases to consider.
In the first case, we insert 25. Parent is red and uncle does not exist.
In the second case, we again insert 25. Parent is red and uncle does not exist.
The other two cases are symmetric with respect to the root node (are on the right side of the root).
The corrective action we take involves tree rotations as follows:
We take an inorder traversal of the three-node subtree rooted at the grandparent and label the nodes first, second, and third in traversal order. The second node becomes the new root of the subtree, with the first node as its left child and the third node as its right child.
In case 1, the traversal produces first = 25, second = 50, and third = 100.
In case 2, the traversal produces first = 25, second = 50, and third = 100.
We recolor the new subroot black, and its two children remain red.
This produces the corrected tree.
In case 1, we perform a right rotate on node 100. In case 2, we perform a left rotate on node 25 (producing case 1) and then a right rotate on node 100. Cases 3 and 4 follow a symmetric pattern.
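The two rotations can be sketched as plain BST rotations. Colors and parent pointers are omitted here to keep the focus on the structural change (a full implementation would also update them); each function returns the new root of the rotated subtree.

```go
package main

import "fmt"

// Node is a minimal BST node; colors and parent pointers are omitted
// from this sketch.
type Node struct {
	Value       int
	Left, Right *Node
}

// rightRotate rotates the subtree rooted at y to the right: y's left
// child x becomes the new subtree root, y becomes x's right child,
// and x's old right subtree moves to y's left.
func rightRotate(y *Node) *Node {
	x := y.Left
	y.Left = x.Right
	x.Right = y
	return x
}

// leftRotate is the mirror image of rightRotate.
func leftRotate(x *Node) *Node {
	y := x.Right
	x.Right = y.Left
	y.Left = x
	return y
}

func main() {
	// Case 1 from the text: grandparent 100, parent 50, new node 25
	// form a left-leaning chain; a right rotate on 100 makes 50 the
	// subtree root with 25 and 100 as its children.
	g := &Node{Value: 100, Left: &Node{Value: 50, Left: &Node{Value: 25}}}
	g = rightRotate(g)
	fmt.Println(g.Value, g.Left.Value, g.Right.Value) // 50 25 100
}
```

In case 2 the left rotate on the parent first straightens the zig-zag into the case-1 chain, after which the same right rotate applies.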
Detailed Walk-Through of Many Insertions
To solidify our understanding of insertion, we construct a red-black tree, step by step, by inserting the sequence of values: 10, 20, 4, 15, 17, 40, 50, 60, 70, 35, 38, 18, 19, 45, 30, 25. We show the work for some of the insertions and leave the rest as an exercise.
After inserting 10, 20, and 4, we have
After inserting 15, we have
Since the parent of 15 is red and uncle is red, we do recoloring to produce
After inserting 17, we get
But this needs correction. Since the parent of 17 is red and uncle does not exist, we perform rotational corrections (left on 15 and right on 20) and recoloring to get
We next insert 40. We show only the result after reconfiguring (recoloring case) because the parent and uncle are red.
We next insert 50. This is a case 4 requiring one left rotational correction on 20 producing
We next insert 60. Because of the red parent and red uncle, this requires only recoloring. The result is
We are halfway there! As an exercise, please continue the insertions and show that the final red-black tree is
A careful inspection of this tree shows that the number of black nodes from root 17 to every leaf is exactly 3. Every red node has only black children.
This tree is clearly less balanced than an AVL tree (the maximum depth on the right side of the root is 5 and the maximum depth on the left side of the root is 2).
In the next section, we present an implementation of insertion into a red-black tree. The details are complex because of the many special cases.
12.3 Implementation of Red-Black Tree
The implementation details for insertion into a red-black tree are daunting because of the number of rotational and color corrections that are possible, based on the logic discussed and illustrated in Section 12.2.
The best strategy for unraveling the logic in the implementation presented in Listing 12-1 is to “walk” as far as you can, step by step, through the example presented in Section 12.2.
A few small changes to the display tree function, defined and discussed in Section 8.3, were made for drawing a red-black tree. The changes in this portion of the implementation are shown in boldface.
Listing 12-1 Red-black tree
The output produced by main is shown in the following. This is the same as the tree constructed in Section 12.2.
The OrderedStringer interface was brought back into play because the display tree function requires it to create the labels for each tree node.
Comparing the Performance of Red-Black Tree to AVL Tree
A benchmark test was performed to see how long it takes to construct a red-black tree from a sequence of 100,000 random integers. The same test was performed to see the time required to build an AVL tree from 100,000 random integers.
The results are interesting:
Insertion time for red-black tree: 27.62615ms
Search time for red-black tree: 16.037945ms
Insertion time for AVL tree: 48.315163ms
Search time for AVL tree: 3.914522ms
Benchmark Conclusion
The red-black tree takes about half as long to build but takes four times as long to search compared to the AVL tree. The AVL tree is more balanced than the red-black tree but requires many more rotations during construction.
Since we typically build search trees to perform many fast lookups, the AVL tree is generally preferable in such cases.
12.4 Summary
The logic for building a red-black tree was presented and illustrated. An implementation of a generic red-black tree was presented with the Insert method along with many supporting methods. With small modifications, the code for drawing a red-black tree was shown. The performance of a red-black tree was compared to an AVL tree. Red-black trees can be more efficiently generated but are less efficient to search than AVL trees.
In the next chapter, we introduce expression trees.