40 Searching and Priority Queues in o(log n) Time*

Arne Andersson

Uppsala University

40.1 Introduction

40.2 Model of Computation

40.3 Overview

40.4 Achieving Sub-Logarithmic Time per Element by Simple Means
    Range Reduction • Packing Keys • Combining

40.5 Deterministic Algorithms and Linear Space
    Fusion Trees • Exponential Search Trees

40.6 From Amortized Update Cost to Worst-Case

40.7 Sorting and Priority Queues
    Range Reduction • Packed Sorting • Combining the Techniques • Further Techniques and Faster Randomized Algorithms

References

40.1 Introduction

In many cases of algorithm design, the comparison-based model of computation is not the obvious choice. In this chapter, we show how to design data structures with very good complexity on a realistic model of computation where keys are regarded as binary strings, each one contained in one or more machine words (registers). This model is sometimes referred to as the RAM model, and it may be argued that it reflects real computers more accurately than the comparison-based model.

In the RAM model, the word length becomes a natural part of the model. A comparison does not necessarily take constant time; on the other hand, we may use a larger variety of operations on data. This model allows comparison-based algorithms to be used, but also algorithms like tries, bucket sort, radix sort, etc., which are known to be efficient in practice.

40.2 Model of Computation

We use a unit-cost RAM with word size w. In the standard case we assume that the n keys are w-bit keys that can be treated as binary strings or integers, but we may also consider keys that occupy multiple words. It should be noted that the assumption that keys can be treated as binary strings or integers also holds for floating-point numbers (cf. IEEE 754 floating-point standard [1, p. 228]).

In the RAM-model, we can use other operations than comparisons, for instance indirect addressing, shifting, bitwise logical operations, and multiplication. Without loss of generality, we assume that w = Ω(log n), since otherwise we could not even fit the number n, or a pointer, into a machine word. (If we can not fit the number n into a constant number of words, the traditional analysis for comparison-based algorithms would also fail.)

Our complexity analysis has two parameters, the number of keys n and the word length w. In cases where the complexity is expressed only in terms of n, it is supposed to hold for any possible value of w, and vice versa.

For the searching problem, we assume that an ordered set is maintained and that operations like range queries and neighbour queries are supported. We say that we study ordered dictionaries, as defined below.

Definition 40.1

A dictionary is ordered if neighbour queries and range queries are supported at the same cost as member queries (plus the reporting cost), and if the keys can be reported in sorted order in linear time.

40.3 Overview

The purpose of this chapter is to introduce some of the basic techniques and give references to recent developments:

We start by presenting some simple data structures, which allow us to explain how the “information-theoretic O(log n) barrier” may be surpassed. These data structures use a two-step approach: First, range reduction is used to decrease key length, such that we only need to consider keys that are much shorter than w. Secondly, we treat these short keys efficiently by packed computation, where many keys are packed together in words.

Next, we discuss some more elaborate data structures. In particular, we show how to achieve low worst-case complexity in linear space.

i. The fusion tree, the first data structure presented that achieved sublogarithmic complexity.

ii. The exponential search tree, which achieves tight worst-case bounds on dynamic ordered dictionaries.

We also give references to recent results on efficient priority queue implementations and sorting.

40.4 Achieving Sub-Logarithmic Time per Element by Simple Means

In this section, we show that it is surprisingly simple to achieve a complexity that is sublogarithmic in n and independent of w, which implies sorting and searching asymptotically faster than comparison-based algorithms.

We use indirect addressing and large arrays. As a consequence, the data structures will need a lot of space. However, all algorithms presented here can be made to fit in linear space with randomization (i.e., with universal hashing [2]).

In some cases we will consider keys that are shorter than w; we will then use b or k to denote the key length.

In this section, we will use F(n, b) to express the complexity of searching, as specified below.

Definition 40.2

Let F(n, b) be the worst-case cost of performing one search or update in an ordered dictionary storing n keys of length b.

Unless we use hashing to obtain linear space, the methods discussed in this section can all be implemented with a simple instruction set. All necessary instructions are standard; they are even in AC0. (An instruction is in AC0 if it is implementable by a constant-depth, unbounded fan-in (AND, OR, NOT)-circuit of size w^{O(1)}. An example of a non-AC0 instruction is multiplication [3].)

40.4.1 Range Reduction

One way to simplify a computational problem is by range reduction. In this case, we reduce the problem of dealing with w-bit keys to that of dealing with k-bit keys, k < w.

Assume that we view our w-bit keys as consisting of two w/2-bit characters and store these in a trie of height 2. Each internal node in the trie contains

a reference to the min-element below the node; the min-element is not stored in any subtrie;

a table of subtries, where each existing subtrie is represented by a w/2-bit key;

a data structure for efficient neighbour search among the w/2-bit keys representing the subtries.

Since each node except the root has one incoming edge and each node contains exactly one element (the min-element), the trie has exactly n nodes and n − 1 edges.

We make neighbour searches in the following way: Traverse down the trie. If we find a leaf, the search ends; otherwise we end up at an empty entry in the subtrie table of some node. By making a neighbour search in that node, we are done. The cost of traversing the trie is O(1) and the cost of a local neighbour search is O(F(n, w/2)) by definition.

The space requirements depend on how the table of subtrie pointers is implemented. If the table is implemented as an array of length 2^{w/2}, each node in the trie requires Θ(2^{w/2}) space. If we instead represent each table as a hash table, the total space of all hash tables is proportional to the total number of edges in the trie, which is n − 1.

We summarize this in the following equation.

F(n, w) = O(1) + F(n, w/2).

(40.1)

We can use the same construction recursively. That is, the local data structure for neighbour search among w/2-bit keys can be a trie of height 2 where w/4 bits are used for branching, etc.

In order to apply recursion properly, we have to be a bit careful with the space consumption. First, note that if the number of edges in a trie with n elements was larger than n, for instance 2n, the total space (number of edges) would grow exponentially with the number of recursive levels. Therefore, we need to ensure that the number of edges in a trie is not just O(n) but actually at most n. This is the reason why each node contains a min-element; in this way we guarantee that the number of edges is n − 1.

Secondly, even when we can guarantee that the space per recursive level does not increase, we are still faced with Θ(n) space (with hashing) per level. If we use more than a constant number of levels, this will require superlinear space. This is handled in the following way: When we apply the recursive construction r times, we only keep a small part of the elements in the recursive structure. Instead, the elements are kept in sorted lists of size Θ(r), and we keep only the smallest element from each list in our recursive trie. When searching for a key, we first search for its list in the recursive trie structure, we then scan the list. Insertions and deletions are made in the lists, and the sizes of the lists are maintained by merging and splitting lists. Now, the total space taken by each level of the recursive trie construction is Θ(n/r) and the total space for r recursive levels is Θ(n). Searching, splitting and merging within the lists only takes O(r) time. In summary, setting r = log(w/k) we get the following lemma.

Lemma 40.1

F(n, w) = O(log (w/k)) + F(n, k).
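
Unrolling Equation 40.1 level by level shows where the log(w/k) term in the lemma comes from:

F(n, w) = O(1) + F(n, w/2) = O(2) + F(n, w/4) = … = O(log(w/k)) + F(n, k).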

This recursive reduction was first used in van Emde Boas trees [4–7].

40.4.2 Packing Keys

If the word length is small enough—as in today’s computers—the range reduction technique discussed above will decrease the key length to a constant at a low cost. However, in order to make a really convincing comparison between comparison-based algorithms and algorithms based on indirect addressing, we must make the complexity independent of the word size. This can be done by combining range reduction with packed computation. The basic idea behind packed computation is to exploit the bit-parallelism in a computer; many short keys can be packed in a word and treated simultaneously.

The central observation is due to Paul and Simon [8]; they observed that one subtraction can be used to perform comparisons in parallel. Assume that the keys are of length k. We may then pack Θ(w/k) keys in a word in the following way: Each key is represented by a (k + 1)-bit field. The first (leftmost) bit is a test bit and the following bits contain the key, cf. Figure 40.1. Let X and Y be two words containing the same number of packed keys, all test bits in X are 0 and all test bits in Y are 1. Let M be a fixed mask in which all test bits are 1 and all other bits are 0. Let

R ← (Y − X) AND M.

(40.2)

Figure 40.1 A multiple comparison in a packed B-tree.

Then, the ith test bit in R will be 1 if and only if yi > xi. All other test bits, as well as all other bits, in R will be 0.
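
A minimal sketch of this comparison in C, assuming w = 64 and 7-bit keys in 8-bit fields (eight keys per word); all constants and helper names are illustrative. Note that the subtraction sets test bit i exactly when y_i ≥ x_i, which coincides with the strict comparison above whenever the two fields never hold equal values.

#include <inttypes.h>
#include <stdio.h>

/* The Paul-Simon packed comparison (Expression 40.2), sketched for
 * w = 64 and k = 7: eight keys per word, each in an 8-bit field whose
 * top bit is the test bit.  Constants and names are illustrative. */

#define K 7                 /* key length                   */
#define FLD (K + 1)         /* field length: test bit + key */
#define D (64 / FLD)        /* keys per word (8)            */

static uint64_t test_mask(void) {                 /* the mask M */
    uint64_t m = 0;
    for (int i = 0; i < D; i++) m |= (uint64_t)1 << (i * FLD + K);
    return m;
}

static uint64_t pack(const unsigned k[D], unsigned test) {
    uint64_t word = 0;                            /* field 0 in the low-order bits */
    for (int i = 0; i < D; i++)
        word |= (uint64_t)(k[i] | (test << K)) << (i * FLD);
    return word;
}

int main(void) {
    unsigned xs[D] = {10, 20, 30, 40, 50, 60, 70, 80};
    unsigned ys[D] = {15, 15, 35, 35, 50, 50, 90, 10};
    uint64_t R = (pack(ys, 1) - pack(xs, 0)) & test_mask();   /* Expression 40.2 */
    for (int i = 0; i < D; i++)                   /* test bit i is set iff y_i >= x_i */
        printf("field %d: y=%2u x=%2u -> %d\n",
               i, ys[i], xs[i], (int)((R >> (i * FLD + K)) & 1));
    return 0;
}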

We use packed comparisons to achieve the following result.

Lemma 40.2

F(n, k) = O(log(w/k) + log n/log(w/k)).

Proof. (Sketch) We use a packed B-tree [9].

The packed B-tree has nodes of degree Θ(w/k). In each node, the search keys are packed together in a single word, in sorted order from left to right. When searching for a k-bit key x in a packed B-tree, we take the following two steps:

1. We construct a word X containing multiple copies of the query key x. X is created by a simple doubling technique: Starting with a word containing x in the rightmost part, we copy the word, shift the copy k + 1 steps and unite the words with a bitwise or. The resulting word is copied, shifted 2k + 2 steps and united, etc. Altogether X is generated in O(log(w/k)) time.

2. After the word X has been constructed, we traverse the tree. At each node, we compute the rank of x in constant time with a packed comparison. The cost of the traversal is proportional to the height of the tree, which is O(log n/log(w/k)).

A packed comparison at a node is done as in Expression 40.2. The keys in the B-tree node are stored in Y and X contains multiple copies of the query key. After subtraction and masking, the rightmost p test bits in R will be 1 if and only if there are p keys in Y which are greater than x. This is illustrated in Figure 40.1. Hence, by finding the position of the leftmost 1-bit in R we can compute the rank of x among the keys in Y. In order to find the leftmost 1-bit, we can simply store all possible values of R in a lookup table. Since the number of possible values equals the number of keys in a B-tree node plus one, a hash table implementation of this lookup table would require only Θ(w/k) space.
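
The two steps can be sketched as follows, again assuming w = 64 and 7-bit keys. The (GCC/Clang) popcount builtin stands in for the small lookup table just described, and rank is taken here to mean the number of node keys strictly smaller than the query key; all constants and names are illustrative.

#include <inttypes.h>
#include <stdio.h>

/* Search in one packed B-tree node, sketched for w = 64 and k = 7
 * (eight keys per node).  The popcount builtin replaces the lookup
 * table described in the text; names and constants are illustrative. */

#define K 7
#define FLD (K + 1)
#define D (64 / FLD)

static uint64_t test_mask(void) {
    uint64_t m = 0;
    for (int i = 0; i < D; i++) m |= (uint64_t)1 << (i * FLD + K);
    return m;
}

/* Step 1: replicate the query key by shift-and-or doubling,
 * using O(log(w/k)) shifts and ors.  Test bits stay 0. */
static uint64_t replicate(unsigned x) {
    uint64_t X = x;
    for (int copies = 1; copies < D; copies *= 2)
        X |= X << (copies * FLD);
    return X;
}

/* Step 2: number of node keys strictly smaller than x.  Y holds the
 * node's keys with test bits set; the largest key sits in the
 * low-order field, so the keys read in increasing order from left
 * to right, as in the text. */
static int node_rank(uint64_t Y, unsigned x) {
    uint64_t R = (Y - replicate(x)) & test_mask();
    return D - __builtin_popcountll(R);      /* a test bit is set iff that key >= x */
}

int main(void) {
    unsigned keys[D] = {90, 70, 58, 47, 33, 20, 12, 5};  /* sorted, largest first */
    uint64_t Y = 0;
    for (int i = 0; i < D; i++)
        Y |= (uint64_t)(keys[i] | (1u << K)) << (i * FLD);
    printf("keys smaller than 40: %d\n", node_rank(Y, 40));  /* 4 */
    printf("keys smaller than 90: %d\n", node_rank(Y, 90));  /* 7 */
    return 0;
}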

Above, we omitted a lot of details, such as how to perform updates and how pointers within a packed B-tree are represented. Details can be found in [9].

40.4.3 Combining

We can now derive our first bounds for searching. First, we state bounds in terms of w. The following bound holds for searching [4–6]:

Theorem 40.1

F(n, w) = O(log w).

Proof. (Sketch) Apply Lemma 40.1 with k = 1.

Next, we show how to remove the dependency of word length [9]:

Theorem 40.2

F(n, w) = O(√(log n)).

Proof. (Sketch) If log w = O(√(log n)), Theorem 40.1 is sufficient. Otherwise, Lemma 40.1 with k = w/2^{√(log n)} gives F(n, w) = O(√(log n)) + F(n, w/2^{√(log n)}). Lemma 40.2 gives that F(n, w/2^{√(log n)}) = O(√(log n)).
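
Spelling out the calculation: with k = w/2^{√(log n)} we have log(w/k) = √(log n), so Lemma 40.2 gives

F(n, k) = O(√(log n) + log n/√(log n)) = O(√(log n)),

and the two terms of Lemma 40.1 together give F(n, w) = O(√(log n)).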

40.5 Deterministic Algorithms and Linear Space

The data structures in this section are more complicated than the previous ones. They also need more powerful—but standard—instructions, like multiplication. On the other hand, these structures achieve linear space without randomization (i.e., without hashing).

Definition 40.3

Let D(n) be the worst-case search cost and the amortized update cost in an ordered dictionary storing n keys in O(n) space.

40.5.1 Fusion Trees

The fusion tree was the first data structure to surpass the logarithmic barrier for searching. The central part of the fusion tree [10] is a static data structure with the following properties:

Lemma 40.3

For any d, d = O(w^{1/6}), a static data structure containing d keys can be constructed in O(d^4) time and space, such that it supports neighbour queries in O(1) worst-case time.

Proof. (Sketch) The main idea behind the fusion tree is to view the keys as stored in an implicit binary trie and concentrate on the branching levels in this trie. We say that branching occurs at significant bit positions. We illustrate this view with an example, shown in Figure 40.2.

Figure 40.2 Searching in the internal trie in a fusion tree node. Horizontal lines represent significant bit positions. The thin paths represent the 6 keys in the trie, while the fat path represents the search key x (which is not present in the trie). The arrow marks the position of the compressed key x′ among the keys in the trie.

In the example, w = 16 and d = 6. We store a set Y of keys y1,…, yd. Each key in Y is represented as a path in a binary trie. In the figure, a left edge denotes a 0 and a right edge denotes a 1. For example, y3 is 1010010101011010. The significant bit positions correspond to the branching levels in the trie. In this example the levels are 4, 9, 10, and 15, marked by horizontal lines. By extracting the significant bit positions from each key, we create a set Y′ of compressed keys y1′,…, yd′. In our example the compressed keys are 0000, 0001, 0011, 0110, 1001, and 1011. Since the trie has exactly d leaves, it contains exactly d − 1 binary nodes. Therefore, the number of significant bit positions, and the length of a compressed key, is at most d − 1. This implies that we can pack the d compressed keys, including test bits, in d^2 bits. Since d = O(w^{1/6}), the packed keys fit in a constant number of words.

This extraction of bits is nontrivial; it can be done with multiplication and masking. However, the extraction is not as perfect as described here; in order to avoid problems with carry bits etc, we need to extract some more bits than just the significant ones. Here, we ignore these problems and assume that we can extract the desired bits properly. For details we refer to Fredman and Willard [10].

The d compressed keys may be used to determine the rank of a query key among the original d keys with packed computation. Assume that we search for x = 1010011001110100, represented as the fat path in Figure 40.2. First, we extract the proper bits to form a compressed key x′ = 0010. Then, we use packed searching to determine the rank of x′ among y1′,…, yd′. In this case, the packed searching will place x′ between y2′ and y3′, as indicated by the arrow in Figure 40.2. This is not the proper rank of the original key x, but nevertheless it is useful. The important information is obtained by finding the position of the first differing bit of x and one of the keys y2 and y3. In this example, the 7th bit is the first differing bit and, since x has a 1 at this bit position, we can conclude that it is greater than all keys in Y with the same 6-bit prefix. Furthermore, the remaining bits in x are insignificant. Therefore, we can replace x by the key 1010011111111111, where all the last bits are 1s. When compressed, this new key becomes 0111. By making a second packed search with this key instead, the proper rank will be found.

Hence, in constant time we can determine the rank of a query key among our d keys.
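
The compression in this example can be reproduced with a simple loop over the significant positions; the real fusion tree performs the extraction in constant time with multiplication and masking, so the loop below is for illustration only. It uses the values given above: the positions 4, 9, 10, 15 (counted from the most significant bit), the key y3, the query key x, and the 1-filled variant of x.

#include <stdio.h>

/* Reproducing the key compression of Figure 40.2 with a plain loop.
 * A real fusion tree extracts the significant bits in O(1) with
 * multiplication and masking; this loop is for illustration only. */

#define W 16                                       /* key length in the example */

/* extract the bits at the given positions (1-indexed from the left) */
static unsigned compress(unsigned key, const int *pos, int npos) {
    unsigned out = 0;
    for (int i = 0; i < npos; i++)
        out = (out << 1) | ((key >> (W - pos[i])) & 1u);
    return out;
}

static void print4(const char *name, unsigned v) { /* print a 4-bit compressed key */
    printf("%s = %u%u%u%u\n", name, (v >> 3) & 1, (v >> 2) & 1, (v >> 1) & 1, v & 1);
}

int main(void) {
    const int sig[] = {4, 9, 10, 15};              /* significant positions in Fig. 40.2 */
    unsigned y3     = 0xA55A;                      /* 1010010101011010, key y3           */
    unsigned x      = 0xA674;                      /* 1010011001110100, the query key x  */
    unsigned filled = 0xA7FF;                      /* 1010011111111111, x filled with 1s */
    print4("y3'", compress(y3, sig, 4));           /* 0011, as in the text */
    print4("x' ", compress(x, sig, 4));            /* 0010 */
    print4("x''", compress(filled, sig, 4));       /* 0111, used in the second search */
    return 0;
}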

The original method by Fredman and Willard is slightly different. Instead of filling the query keys with 1s (or 0s) and making a second packed searching, they use a large lookup table in each node. Fusion trees can be implemented without multiplication, using only AC0 instructions, provided that some simple non-standard instructions are allowed [11].

Theorem 40.3

D(n) = O(log n/ log log n).

Proof. (Sketch) Based on Lemma 40.3, we use a B-tree where only the upper levels in the tree contain B-tree nodes, all having the same degree (within a constant factor). At the lower levels, traditional (i.e., comparison-based) weight-balanced trees are used. The reason for using weight-balanced trees is that the B-tree nodes are costly to reconstruct; the trees at the bottom ensure that few updates propagate to the upper levels. In this way, the amortized cost of updating a B-tree node is small.

The amortized cost of searches and updates is O(log n/log d + log d) for any d = O(w^{1/6}). The first term corresponds to the number of B-tree levels and the second term corresponds to the height of the weight-balanced trees. Since w ≥ log n (otherwise a pointer would not fit in a word), the cost becomes at most O(log n/log log n).

40.5.2 Exponential Search Trees

The exponential search tree [12,13] allows efficient dynamization of static dictionary structures. The key feature is:

Any static data structure for searching that can be constructed in polynomial time and space can be efficiently used in a dynamic data structure.

The basic data structure is a multiway tree where the degrees of the nodes decrease doubly-exponentially down the tree. In each node, we use a static data structure for navigation. The way the tree is maintained, we can guarantee that, before an update occurs at a certain node, a polynomial number of updates will be made below it. Hence, even if an update requires a costly reconstruction of a static data structure, this will occur with large enough intervals.

Lemma 40.4

Suppose a static data structure containing d keys can be constructed in O(d^4) time and space, such that it supports neighbour queries in O(S(d)) worst-case time. Then,

D(n) = O(S(n^{1/5})) + D(n^{4/5}).

Proof. (Sketch) We use an exponential search tree. It has the following properties:

Its root has degree Θ(n^{1/5}).

The keys of the root are stored in a local (static) data structure, with the properties stated above. During a search, the local data structure is used to determine in which subtree the search is to be continued.

The subtrees are exponential search trees of size Θ(n^{4/5}).

First, we show that, given n sorted keys, an exponential search tree can be constructed in linear time and space. The cost of constructing a node of degree d is O(d^4), and the total construction cost C(n) is (essentially) given by

C(n) = O((n^{1/5})^4) + n^{1/5} · C(n^{4/5})  ⇒  C(n) = O(n).

(40.3)

Furthermore, with a similar equation, the space required by the data structure can be shown to be O(n).

Balance is maintained by joining and splitting subtrees. The basic idea is the following: A join or split occurs when the size of a subtree has changed significantly, that is, after Ω(n^{4/5}) updates. Then, a constant number of subtrees will be reconstructed; according to Equation 40.3, the cost of this is linear in the size of the subtrees, that is, O(n^{4/5}). Also, some keys will be inserted into or deleted from the root, causing a reconstruction of the root; the cost of this is by definition O(n^{4/5}). Amortizing these two costs over the Ω(n^{4/5}) updates, we get O(1) amortized cost per update. Hence, the restructuring cost is dominated by the search cost.

Finally, the search cost follows immediately from the description of the exponential search tree.

Exponential search trees may be combined with various other data structures, as illustrated by the following two lemmas:

Lemma 40.5

A static data structure containing d keys can be constructed in O(d^4) time and space, such that it supports neighbour queries in O(log d/log w + 1) worst-case time.

Proof. (Sketch) We just construct a static B-tree where each node has the largest possible degree according to Lemma 40.3. That is, each node has degree min(d, w^{1/6}). This tree satisfies the conditions of the lemma.

Lemma 40.6

A static data structure containing d keys and supporting neighbour queries in O(log w) worst-case time can be constructed in O(d^4) time and space.

Proof. (Sketch) We study two cases.

Case 1: w > d^{1/3}. Lemma 40.5 gives constant query cost.

Case 2: w ≤ d^{1/3}. The basic idea is to combine a van Emde Boas tree (Theorem 40.1) with perfect hashing. The data structure of Theorem 40.1 uses much space, which can be reduced to O(d) by hash coding. Since we can afford a rather slow construction, we can use the deterministic algorithm by Fredman, Komlós, and Szemerédi [14]. With this algorithm, we can construct a perfectly hashed van Emde Boas tree in O(d^3 w) = o(d^4) time.

Combining these two lemmas, we get a significantly improved upper bound on deterministic sorting and searching in linear space:

Theorem 40.4

D(n) = O(√(log n)).

Proof. (Sketch) If we combine Lemmas 40.4–40.6, we obtain the following equation

D(n) = O(min(1 + log n/log w, log w)) + D(n^{4/5})

(40.4)

which, when solved, gives the theorem.
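
To see why Equation 40.4 solves to O(√(log n)), note that min(1 + log n/log w, log w) ≤ 1 + √(log n) for every w: if log w ≥ √(log n) the first term is at most 1 + √(log n), and otherwise the second term is below √(log n). Since the subproblem at recursion depth i contains n^{(4/5)^i} keys, the contributions form an essentially geometric series:

D(n) = Σ_{i ≥ 0} O(1 + √((4/5)^i log n)) = Σ_{i ≥ 0} O(1 + (4/5)^{i/2} √(log n)) = O(√(log n)),

where the O(1) terms contribute only O(log log n) in total, one per recursion level.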

Taking both n and w as parameters, D(n) is o(√(log n)) in many cases [12]. For example, it can be shown that D(n) = O(log w log log n).

The strongest possible bound is achieved by using the following result by Beame and Fich [3]:

Lemma 40.7: (Beame and Fich [3])

In polynomial time and space, we can construct a deterministic data structure over d keys supporting searches in O(min{√(log d/log log d), log w/log log w}) time.

Combining this with the exponential search tree we get, among others, the following theorem.

Theorem 40.5

D(n) = O(√(log n/log log n)).

Since a matching lower bound was also given by Beame and Fich, this bound is optimal.

40.6 From Amortized Update Cost to Worst-Case

In fact, there are worst-case efficient versions of the data structures above. Willard [15] gives a short sketch on how to make fusion trees worst-case efficient, and as shown by Andersson and Thorup [13], the exponential search tree can be modified into a worst-case data structure. Here, we give a brief description of how exponential search trees are modified.

In the above definition of exponential search trees, the criteria for when a subtree is too large or too small depend on the degree of its parent. Therefore, when a node is joined or split, the requirements on its children will change. Above, we handled that by simply rebuilding the entire subtree at each join or split, but in a worst-case setting, we need to let the children of a node remain unchanged at a join or split. In order to do this, we need to switch from a top-down definition to a bottom-up definition.

Definition 40.4: (Worst-case efficient exponential search trees)

In an exponential search tree, all leaves are at the same depth. Let the height of a node be the distance from the node to the leaves descending from it. For a non-root node v at height i > 0, the weight (number of descending leaves) is |v| = Θ(n_i), where n_i = α^{(1 + 1/(k − 1))^i} and α = Θ(1). If the root has height h, its weight is O(n_h).

With the exception of the root, Definition 40.4 follows our previous definition of exponential search trees (when k = 5), that is, if v is a non-root node, it has Θ(|v|^{1/k}) children, each of weight Θ(|v|^{1 − 1/k}).
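
A quick check that the bottom-up definition reproduces this branching: n_{i−1} = α^{(1 + 1/(k − 1))^{i−1}} = n_i^{(k−1)/k} = n_i^{1 − 1/k}, so a node of weight Θ(n_i) has Θ(n_i/n_{i−1}) = Θ(n_i^{1/k}) children, each of weight Θ(n_i^{1 − 1/k}), just as in the top-down definition.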

The worst-case efficiency is mainly based on careful scheduling: the static search structures in the nodes are rebuilt in the background so that they remain sufficiently updated as nodes get joined and split. This scheduling is developed in terms of a general theorem about rebuilding, which has some interesting properties as a tool for other de-amortization applications [13].

40.7 Sorting and Priority Queues

In the comparison-based model of computation, the cost per element is the same (O(log n)) for searching and sorting. However, in the RAM model of computation, the sorting problem can be solved faster than searching. The simple intuitive explanation of this is that the bit-parallelism in packed computation can be utilized more efficiently when a number of keys are treated simultaneously, as in sorting, than when they are treated one-by-one as in searching.

Moreover, it turns out that priority queues can be implemented as efficiently as sorting. (The intuitive reason for this is that a priority queue can use sorting as a subroutine, and only keep a small part of the queue perfectly sorted.) Thorup [16] has shown the following reduction:

Theorem 40.6

If we can sort n keys in time S(n) per key, then we can implement a priority queue supporting find-min in constant time and updates (insert and delete) in S(n) time.

In the following, we will use T(n, b) to denote the cost of sorting n b-bit keys.

40.7.1 Range Reduction

In sorting algorithms, range reduction is an often used technique. For example, we may view traditional radix sort, where we sort long strings by dividing them into shorter parts, as range reduction.

For our purposes, we will use a range reduction technique by Kirkpatrick and Reisch [17], which is similar to the van Emde Boas tree, cf. Section 40.4.1. The only difference from Section 40.4.1 is that instead of letting each trie node contain a data structure for efficient neighbour search among the outgoing edges, we just keep an unsorted list of all outgoing edges (plus the array for constant-time indexing of edges). Then, after all elements have been inserted into the trie, we create sorted lists of edges at all nodes by the following method:

1. Mark each edge with its parent node.

2. Concatenate all edge lists and sort the entire list.

3. Scan the sorted list and put each edge back in the proper node.

4. All edge lists are now sorted. By a recursive traversal of the trie we can report all leaves in sorted order.

Other details, such as the need to store one key per node to avoid space blow up, are handled in the same way as in Section 40.4.1. Altogether, we get the reduction

T(n, w) = O(n) + T(n, w/2).

(40.5)

Applied recursively, this gives

T(n, w) = O(n log(w/k)) + T(n, k).

(40.6)
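
A minimal sketch in C of steps 1 through 3 of the edge-list sorting above: each edge is tagged with its parent node, the concatenated list is sorted once (the library qsort stands in for the recursive range-reduced sort), and a single scan then yields every node's outgoing edges in sorted order. The node identifiers and edge labels are illustrative.

#include <stdio.h>
#include <stdlib.h>

/* Sketch of steps 1-3 above.  Each edge is tagged with its parent node
 * (step 1), the concatenated list is sorted once (step 2; qsort stands
 * in for the recursive range-reduced sort), and a scan groups the edges
 * back per node in sorted order (step 3).  All values are illustrative. */

typedef struct {
    int      node;     /* parent trie node of this edge */
    unsigned label;    /* the edge's w/2-bit character  */
} Edge;

static int cmp(const void *a, const void *b) {
    const Edge *x = a, *y = b;
    if (x->node != y->node) return (x->node > y->node) - (x->node < y->node);
    return (x->label > y->label) - (x->label < y->label);
}

int main(void) {
    Edge edges[] = { {1, 9}, {0, 4}, {1, 2}, {0, 7}, {2, 5}, {0, 1} };
    int m = sizeof edges / sizeof edges[0];
    qsort(edges, m, sizeof(Edge), cmp);            /* one global sort */
    for (int i = 0; i < m; i++)                    /* edges of each node are now contiguous */
        printf("node %d: edge %u\n", edges[i].node, edges[i].label);   /* and sorted */
    return 0;
}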

40.7.2 Packed Sorting

For the sorting problem, multiple comparisons can be utilized more efficiently than in a packed B-tree. In the packed B-tree, we used the bit-parallelism to compare one key to many keys; in this way we implemented a parallel version of linear search, which is not the most efficient search method.

For sorting, however, we can utilize the packed computation more efficiently. It turns out that algorithms for sorting networks are well suited for implementation by packed computation. A sorting network is a “static” algorithm; it contains a number of compare-and-swap operations, and these are the same regardless of the outcome of the comparisons. The merging technique by Batcher [18], originally used to design odd-even merge sort, can be efficiently implemented with packed computation. As a sorting network, Batcher’s merging algorithm has depth Θ(log n) where each level has O(n) compare-and-swap units. Based on the merging, we can sort in Θ(log^2 n) time where the total work is Θ(n log^2 n).

Batcher’s merging technique is well suited for combination with the Paul-Simon technique, as shown by Albers and Hagerup [19].

Lemma 40.8

T(n, w/ log n) ≤ O(n log log n).

Proof. (Sketch) The key idea is that by packing Θ(log n) keys in a machine word, we can combine Batcher’s algorithm with packed computation to merge two sorted sequences in O(log log n) time. And, if we can merge two sequences of length Θ(log n) in O(log log n) time (instead of O(log n) by a comparison-based algorithm), we can use this as a subroutine to implement a variant of merge sort that sorts n keys in O(n log log n) time (instead of O(n log n)).
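
A back-of-the-envelope accounting, under the assumption that the word-merger extends to merging longer packed sequences at a cost of O(log log n) per word: in a bottom-up merge sort every key takes part in O(log n) merge rounds, and with Θ(log n) keys per word each round costs O(log log n/log n) amortized per key, for a total of O(n · log n · log log n/log n) = O(n log log n).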

40.7.3 Combining the Techniques

First, the bound on searching from Theorem 40.1 has a corresponding theorem for sorting [17]:

Theorem 40.7

T(n, w) = O(n log(w/log n)).

Proof. (Sketch) Apply Eq. 40.6 with k = log n. Keys of length log n can be sorted in linear time with bucket sort.

Secondly, we combine range reduction with packed computation. We get the following bound [20]:

Theorem 40.8

T(n, w) = O(n log log n).

Proof. (Sketch) If log w = O(log log n), Theorem 40.7 is sufficient. Otherwise, Eq. 40.6 with k = w/log n gives T(n, w) = O(n log log n) + T(n, w/log n). Lemma 40.8 gives the final bound.

40.7.4 Further Techniques and Faster Randomized Algorithms

Apart from these rather simple techniques, there are a number of more elaborate techniques that allow the complexity to be improved further. Examples of such techniques are signature sort [20] and packed bucketing [21]. Here, we give a short sketch of signature sorting.

Consider a situation where the word length w is very large, and we wish to reduce the problem of sorting w-bit keys to that of sorting k-bit keys, k ≫ log n. Instead of treating these k-bit keys directly, we represent each such key by a b-bit signature, where the b bits are a hash function of the k bits. In fact, for one w-bit key, we can in constant time replace it by a shorter key, consisting of q b-bit signatures (for details, we refer to the literature [20,21]).

1. Replace each w-bit key by a qb-bit key of concatenated signatures.

2. Sort the qb-bit keys.

3. Compute, for each qb-bit key, its first distinguishing signature. This can be done by constructing a signature-based trie of all keys.

4. If we know the first distinguishing signature in a qb-bit key, we know the first distinguishing k-bit field in the corresponding w-bit key. Finding these k-bit fields, the range reduction is completed and we can continue by sorting these shorter keys.

It should be noted that the sorted set of qb-bit keys does not correspond to a sorted set of w-bit keys. However, the ordering we get is enough to find the proper distinguishing fields. Furthermore, since we use hash coding, we might get collisions, in which case the method will not work. By choosing b large enough, the risk of failure is small enough that we can afford to redo the entire sorting in case of failure; still, the expected time for the range reduction step will be linear.
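
A sketch of step 1 under illustrative parameters: a 64-bit key is viewed as q = 4 fields of k = 16 bits, and each field is hashed down to a b = 8-bit signature, so the signature key has 32 bits. The multiplicative hash and all constants are assumptions made only for this example; as noted above, the chapter's version produces the whole signature key in constant time.

#include <inttypes.h>
#include <stdio.h>

/* Step 1 of signature sort with illustrative parameters: a 64-bit key
 * is viewed as q = 4 fields of k = 16 bits; each field is hashed to a
 * b = 8-bit signature and the signatures are concatenated, shrinking
 * the key from 64 to 32 bits.  The multiplicative hash and all
 * constants are assumptions made only for this example. */

#define Q 4
#define KBITS 16
#define B 8

static unsigned sig(unsigned field) {              /* hash k bits down to b bits */
    return (field * 2654435761u) >> (32 - B);
}

static uint32_t signature_key(uint64_t key) {
    uint32_t out = 0;
    for (int i = Q - 1; i >= 0; i--) {             /* most significant field first */
        unsigned field = (unsigned)((key >> (i * KBITS)) & 0xFFFFu);
        out = (out << B) | sig(field);
    }
    return out;
}

int main(void) {
    uint64_t keys[] = { 0x0123456789ABCDEFull, 0x0123456700000000ull };
    for (int i = 0; i < 2; i++)
        printf("%016" PRIX64 " -> %08" PRIX32 "\n", keys[i], signature_key(keys[i]));
    return 0;
}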

As an important recent result, Han and Thorup [21] present a linear-time algorithm for splitting n integers into subsets, where each subset is of size O(log n). Combining this with techniques like signature sorting, they manage to improve the randomized complexity of sorting to O(n√(log log n)). This, in turn, implies that a priority queue can be implemented with O(√(log log n)) time per update and find-min in constant time.

Other relevant reading can be found in the cited articles, or in [22–28].

References

1.J. L. Hennessy and D. A. Patterson. Computer Organization and Design: The Hardware/Software Interface. Morgan Kaufmann Publ., San Mateo, CA, 1994.

2.J. L. Carter and M. N. Wegman. Universal classes of hash functions. Journal of Computer and System Sciences, 18:143–154, 1979.

3.P. Beame and F. Fich. Optimal bounds for the predecessor problem. In Proc. 31st STOC, pages 295–304, 1999.

4.P. van Emde Boas. Preserving order in a forest in less than logarithmic time. In Proceedings of the 16th Annual IEEE Symposium on Foundations of Computer Science, pages 75–84, 1975.

5.P. van Emde Boas. Preserving order in a forest in less than logarithmic time and linear space. Inf. Proc. Lett., 6(3):80–82, 1977.

6.P. van Emde Boas, R. Kaas, and E. Zijlstra. Design and implementation of an efficient priority queue. Math. Syst. Theory, 10:99–127, 1977.

7.K. Mehlhorn and S. Näher. Bounded ordered dictionaries in O(log log n) time and O(n) space. Inf. Proc. Lett., 35(4):183–189, 1990.

8.W. J. Paul and J. Simon. Decision trees and random access machines. In Logic and Algorithmic: An International Symposium Held in Honour of Ernst Specker, pages 331–340. L’Enseignement Mathématique, Université de Genevè, 1982.

9.A. Andersson. Sublogarithmic searching without multiplications. In Proc. 36th FOCS, pages 655–663, 1995.

10.M. L. Fredman and D. E. Willard. Surpassing the information theoretic bound with fusion trees. J. Comput. Syst. Sci., 47:424–436, 1993. Announced at STOC’90.

11.A. Andersson, P. B. Miltersen, and M. Thorup. Fusion trees can be implemented with AC0 instructions only. Theoretical Computer Science, 215(1-2):337–344, 1999.

12.A. Andersson. Faster deterministic sorting and searching in linear space. In Proc. 37th FOCS, pages 135–141, 1996.

13.A. Andersson and M. Thorup. Tight(er) worst-case bounds on dynamic searching and priority queues. In Proc. 32nd STOC, 2000.

14.M. L. Fredman, J. Komlós, and E. Szemerédi. Storing a sparse table with O(1) worst case access time. Journal of the ACM, 31(3):538–544, 1984.

15.D. E. Willard. Examining computational geometry, van Emde Boas trees, and hashing from the perspective of the fusion tree. SIAM Journal on Computing, 29(3):1030–1049, 2000. Announced at SODA’92.

16.M. Thorup. Equivalence between priority queues and sorting. In Proc. FOCS’02, 2002.

17.D. Kirkpatrick and S. Reisch. Upper bounds for sorting integers on random access machines. Theor. Comp. Sci., 28:263–276, 1984.

18.K. E. Batcher. Sorting networks and their applications. In Proceedings of the AFIPS Spring Joint Computer Conference, pages 307–314, 1968. Volume 32.

19.S. Albers and T. Hagerup. Improved parallel integer sorting without concurrent writing. Inf. Contr., 136:25–51, 1997. Announced at SODA ’92.

20.A. Andersson, T. Hagerup, S. Nilsson, and R. Raman. Sorting in linear time? J. Comp. Syst. Sc., 57:74–93, 1998. Announced at STOC’95.

21.Y. Han and M. Thorup. Integer sorting in O(n√(log log n)) expected time and linear space. In Proc. FOCS ’02, 2002.

22.A. Andersson and S. Nilsson. A new efficient radix sort. In Proc. 35th Annual IEEE Symposium on Foundations of Computer Science, pages 714–721. IEEE Computer Society Press, 1994.

23.A. Brodnik, P. B. Miltersen, and I. Munro. Trans-dichotomous algorithms without multiplication - some upper and lower bounds. In Proc. 5th WADS, LNCS 1272, pages 426–439, 1997.

24.M. L. Fredman and D. E. Willard. Trans-dichotomous algorithms for minimum spanning trees and shortest paths. J. Comput. Syst. Sci., 48:533–551, 1994.

25.Y. Han. Fast integer sorting in linear space. In Proc. 34th STOC, pages 602–608, 2002.

26.Y. Han. Improved fast integer sorting in linear space. Inform. Comput., 170(8):81–94, 2001. Announced at STACS’00 and SODA’01.

27.R. Raman. Priority queues: small, monotone and trans-dichotomous. In Proc. 4th ESA, LNCS 1136, pages 121–137, 1996.

28.M. Thorup. On RAM priority queues. SIAM J. Comp., 30(1):86–109, 2000.

*This chapter has been reprinted from first edition of this Handbook, without any content updates.

The term RAM is used for many models. There are also RAM models with infinite word length.
