8.6.1. Delayed Duplicate Detection for BFS
A variant of Munagala and Ranade's algorithm for BFS in implicit graphs has become known under the term delayed duplicate detection for frontier search. Let s be the initial node, and Succ be the implicit successor generation function. The algorithm maintains BFS layers on disk. Layer Open(i − 1) is scanned and the set of successors is put into a buffer of a size close to the main memory capacity. If the buffer becomes full, internal sorting followed by a duplicate elimination phase generates a sorted duplicate-free node sequence in the buffer that is flushed to disk. The outcome of this phase are k presorted files. Note that delayed internal duplicate elimination can be improved by using hash tables for the blocks before they are flushed to disk. Since the node set in the hash table has to be stored anyway, the savings of early duplicate detection are often small.
In the next step, external merging is applied to unify the k files into Open(i) by a simultaneous scan. The size of the output files is chosen such that a single pass suffices. Duplicates are eliminated. Since the files were presorted, the complexity is given by the scanning time of all files. We also have to eliminate the nodes of Open(i − 1) and Open(i − 2) from Open(i) to avoid recomputations; that is, nodes extracted from the external queue are not immediately deleted, but kept until the layer has been completely generated and sorted, at which point duplicates can be eliminated using a parallel scan. The process is repeated until Open(i) becomes empty, or the goal has been found.
The corresponding pseudo code is shown in Algorithm 8.2. Note that the partition of the set of successors into blocks remains implicit. Termination is not shown, but imposes no additional implementation problem.
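To make the layered exploration concrete, here is a minimal Python sketch of BFS with delayed duplicate detection. Sorted lists stand in for disk files, `buffer_size` plays the role of the internal memory bound, and the names (`external_bfs`, `succ`) are our own, not those of Algorithm 8.2; the layer subtraction is done with in-memory sets where the external algorithm would use a parallel scan of sorted files.

```python
import heapq

def external_bfs(s, succ, buffer_size=4):
    """Sketch of BFS with delayed duplicate detection.

    Layers are sorted Python lists standing in for sorted disk files;
    buffers are sorted and "flushed" as presorted runs, which are then
    merged with simultaneous duplicate elimination.
    """
    prev, frontier = [], [s]          # Open(i-1) and Open(i)
    depth = {s: 0}
    level = 0
    while frontier:
        runs, buf = [], []
        for u in frontier:            # scan the current layer
            for v in succ(u):
                buf.append(v)
                if len(buf) == buffer_size:   # buffer full: sort and flush
                    runs.append(sorted(set(buf)))
                    buf = []
        if buf:
            runs.append(sorted(set(buf)))
        # external merge of the presorted runs, dropping duplicates
        merged, last = [], object()
        for v in heapq.merge(*runs):
            if v != last:
                merged.append(v)
                last = v
        # subtract Open(i) and Open(i-1); externally this is a parallel scan
        nxt = [v for v in merged
               if v not in set(frontier) and v not in set(prev)]
        level += 1
        for v in nxt:
            depth[v] = level
        prev, frontier = frontier, nxt
    return depth
```

On an undirected cycle of six nodes, for instance, the sketch reproduces the expected BFS depths.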
Theorem 8.2
(Efficiency of Implicit External BFS) On an undirected implicit problem graph, external BFS with delayed duplicate elimination requires at most O(sort(|E|) + scan(|V|)) I/Os.
Proof
The proof for implicit problem graphs is essentially the same as for explicit problem graphs, with the exception that the access to the external device for generating the successors of a state, which accounts for at most O(|V| + |E|/B) I/Os, is not needed. As in the algorithm of Munagala and Ranade, delayed duplicate detection applies O(sort(|E|)) I/Os. As no explicit access to the adjacency lists is needed, with sort(|E|) for duplicate elimination and scan(|V|) for subtracting the previous layers, the total execution time is O(sort(|E|) + scan(|V|)) I/Os.
In search problems with a bounded branching factor we have |E| = O(|V|), and thus the complexity of implicit external BFS reduces to O(sort(|V|)) I/Os. If we keep each layer Open(i) in a separate file, then for sparse problem graphs (e.g., simple chains) file opening and closing would accumulate to O(|V|) I/Os. The solution for this case is to store the nodes of Open(i), Open(i + 1), and so forth, consecutively in internal memory, so that I/O is needed only once at least B nodes have accumulated.
The algorithm shares similarities with the internal frontier search algorithm (see
Ch. 6) that was used for solving the
Multiple Sequence Alignment problem. In fact, the implementation has been applied to external memory search with considerable success. The BFS algorithm extends to graphs with bounded locality. For this case and to ease the description of upcoming algorithms, we assume to be given a general file subtraction procedure as implemented in
Algorithm 8.3.
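For sorted, duplicate-free inputs, file subtraction reduces to a single parallel scan. A minimal in-memory sketch (Python lists stand in for files; the name `subtract` is ours, not the one used in Algorithm 8.3):

```python
def subtract(a, b):
    """Subtract sorted, duplicate-free sequence b from sorted sequence a.

    Both inputs are read strictly left to right, as in an external
    parallel scan; survivors are written to an output sequence, which
    in the external setting would be a temporary file.
    """
    out, j = [], 0
    for x in a:
        while j < len(b) and b[j] < x:
            j += 1                   # advance b until b[j] >= x
        if j == len(b) or b[j] != x:
            out.append(x)            # x not present in b: keep it
    return out
```

Since each element of both sequences is inspected at most once, the cost is the scanning time of the two files.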
In an internal, memory-unrestricted setting, a plan is constructed by backtracking from the goal node to the start node. This is facilitated by saving with every node a pointer to its predecessor. For memory-limited frontier search, a divide-and-conquer solution reconstruction is needed, for which certain relay layers have to be stored in main memory. In external search, divide-and-conquer solution reconstruction and relay layers are not needed, since the exploration fully resides on disk.
There is one subtle problem: predecessor pointers are not available on disk. This is resolved as follows. Plans are reconstructed by saving the predecessor together with every state, by scanning the stored files with decreasing depth, and by looking for matching predecessors. Any reached node that is a predecessor of the current node is its predecessor on an optimal solution path. This results in an I/O complexity that is at most linear in the scanning time O(scan(|V|)).
Even if conceptually simpler, there is no need to store the Open list in different files Open(0), Open(1), and so on; we may store successive layers appended in one file.
8.6.4. External A*
In the following we study how to extend external breadth-first exploration in implicit graphs to an A*-like search. If the heuristic is consistent, then on each search path the evaluation function f is nondecreasing: no successor has a smaller f-value than the current node. Therefore, the A* algorithm, which traverses the node set in f-order, expands each node at most once. Take, for example, a sliding-tile puzzle. As the Manhattan distance heuristic is consistent, for every two successive nodes u and v the difference of the heuristic estimates, h(v) − h(u), is either −1 or 1. Together with the increase of the g-value by 1, the f-value either remains unchanged or increases by 2.
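This parity behavior can be checked directly. The following sketch (the helper names `manhattan` and `moves` are ours, not part of the text's algorithms) computes the Manhattan distance of a sliding-tile state and generates its successors; for every legal move the estimate changes by exactly ±1:

```python
def manhattan(state, n):
    """Sum of Manhattan distances of tiles 1..n*n-1 to their goal cells.

    `state` is a tuple of length n*n in row-major order; 0 is the blank.
    Tile t belongs in cell t-1 of the goal configuration.
    """
    d = 0
    for pos, tile in enumerate(state):
        if tile:  # skip the blank
            d += abs(pos // n - (tile - 1) // n) \
               + abs(pos % n - (tile - 1) % n)
    return d

def moves(state, n):
    """Generate all successor states of a sliding-tile position."""
    b = state.index(0)
    r, c = divmod(b, n)
    for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        if 0 <= r + dr < n and 0 <= c + dc < n:
            t = (r + dr) * n + (c + dc)
            s = list(state)
            s[b], s[t] = s[t], s[b]   # slide the neighboring tile
            yield tuple(s)
```

Since g grows by 1 per move and h by ±1, the f-value of a successor equals that of its parent or exceeds it by 2.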
As earlier, external A* maintains the search frontier on disk, possibly partitioned into main memory–sized sequences. In fact, the disk files correspond to an external representation of a bucket implementation of a priority queue data structure (see Ch. 3). In the course of the algorithm, each bucket addressed with index i contains all nodes u in the set Open that have priority f(u) = i. An external representation of this data structure memorizes each bucket in a different file.
We introduce a refinement of the data structure that distinguishes between nodes with different g-values, and designates bucket Open(g, h) to all nodes u with path length g(u) = g and heuristic estimate h(u) = h. As in external BFS, we do not change the identifier Open to separate generated from expanded nodes. In external A* (see Alg. 8.7), bucket Open(g, h) refers to nodes that are in the current search frontier or belong to the set of expanded nodes. During the exploration process, only nodes from one currently active bucket Open(g, h), with f = g + h, are expanded, up to its exhaustion. Buckets are selected in lexicographic order for (f, g); then, the buckets Open(g′, h′) with g′ + h′ < f are closed, whereas the buckets Open(g′, h′) with g′ + h′ > f, or with g′ + h′ = f and g′ > g, are open. Depending on the actual node expansion progress, nodes in the active bucket are either open or closed.
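The bucket discipline can be summarized in a few lines. The following toy model (class and method names are ours; the real algorithm keeps one disk file per bucket) selects the next active bucket by smallest f = g + h, breaking ties toward smaller g:

```python
from collections import defaultdict

class BucketOpen:
    """Toy model of the Open list partitioned into (g, h) buckets.

    Each bucket stands in for one disk file; `next_active` returns the
    (g, h) address of the bucket expanded next: smallest f = g + h,
    ties broken toward smaller g (lexicographic order on (f, g)).
    """
    def __init__(self):
        self.buckets = defaultdict(list)   # (g, h) -> node list

    def insert(self, node, g, h):
        self.buckets[(g, h)].append(node)

    def next_active(self):
        nonempty = [(g + h, g, h)
                    for (g, h), b in self.buckets.items() if b]
        if not nonempty:
            return None
        f, g, h = min(nonempty)            # lexicographic (f, g) order
        return (g, h)
```

Exhausting a bucket and re-querying `next_active` reproduces the expansion order described above.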
To estimate the maximum number of buckets, we once more consider Figure 7.14, as introduced in the analysis of the number of iterations in Symbolic A* (see Ch. 7), in which the g-values are plotted with respect to the h-values, such that nodes with the same f = g + h value are located on the same diagonal. The successors of nodes expanded in Open(g, h) fall into Open(g + 1, h − 1), Open(g + 1, h), or Open(g + 1, h + 1). The number of entries on each diagonal is an upper bound on the number of buckets that are needed. In Chapter 7, we have already seen that this number is bounded by O((f*)²), where f* is the optimal solution cost.
By the parity restriction on the f-values in the (n² − 1)-Puzzle, only about half the number of buckets have to be allocated. Note that f* is not known in advance, so we have to construct and maintain the files on the fly.
Figure 8.5 shows the memory profile of external A* on a Thirty-Five-Puzzle instance (with 14 tiles permuted). The exploration started in bucket Open(0, h(s)) and terminated while expanding the goal bucket Open(f*, 0). Similar to external BFS, but in contrast to ordinary A*, external A* terminates while generating the goal, since all states in the search frontier with a smaller g-value have already been expanded. For this experiment, three disjoint 3-tile and three disjoint 5-tile pattern databases were loaded, which, together with the buckets for reading and flushing, consumed about 4.9 gigabytes of RAM. The total disk space taken was 1,298,389,180,652 bytes, or about 1.2 terabytes. The state vector consists of 32 bytes for the state itself plus information for incremental heuristic evaluation: 1 byte for each value stored, multiplied by the six sets of at most 12 pattern databases plus 1 value each for their sum; a factor of 2 is due to symmetry lookups. The exploration took about two weeks.
The following result restricts duplicate detection to buckets of the same h-value.
Proof
As in the algorithm of Munagala and Ranade, we exploit the observation that in an undirected problem graph, duplicates of a node with BFS level i can only occur in levels i, i + 1, and i + 2. In addition, since h is a total function, we have h(u) = h(v) if u = v.
For ease of describing the algorithm, we consider each bucket of the Open list as a different file. Very sparse graphs can lead to bad I/O performance, because they may give rise to buckets that contain far fewer than B elements and dominate the I/O complexity. For the following, we generally assume large graphs, in which the number of buckets is small compared to the number of nodes, so that nonempty buckets typically contain considerably more than B elements.
Algorithm 8.7 depicts the pseudo code of the external A* algorithm for consistent estimates and unit cost, undirected graphs. The algorithm maintains the two values gmin and fmin to address the currently considered buckets. The buckets of fmin are traversed for increasing gmin up to fmin. According to their different h-values, successors are arranged into three different frontier lists: A(fmin), A(fmin + 1), and A(fmin + 2); hence, at each instance only four buckets have to be accessed by I/O operations. For each of them, we keep a separate buffer of size B; altogether this keeps the internal memory requirements below M. If a buffer becomes full, it is flushed to disk. As in BFS, it is practical to presort the buffers of one bucket immediately by an efficient internal algorithm to ease merging, but we could equivalently sort the unsorted buffers of one bucket externally.
There are two cases that can give rise to duplicates within an active bucket (see Fig. 8.6, black bucket): two different nodes of the same predecessor bucket generating a common successor, and two nodes belonging to different predecessor buckets generating a duplicate. These two cases can be dealt with by merging all the presorted buffers corresponding to the same bucket, resulting in one sorted file. This file can then be scanned to remove the duplicate nodes from it. In fact, the merging and the duplicate removal can be done simultaneously.
Another special case of duplicate nodes arises when nodes that have already been evaluated in upper layers are generated again (see Fig. 8.6). These duplicates have to be removed by a file subtraction process for the next active bucket Open(g, h), removing any node that has appeared in Open(g − 1, h) or Open(g − 2, h) (buckets shaded in light gray). This file subtraction can be done by a mere parallel scan of the presorted files, using a temporary file in which the intermediate result is stored. It suffices to remove duplicates only in the bucket that is expanded next, Open(g, h). The other buckets might not have been fully generated, and hence we can save the redundant scanning of the files for every iteration of the innermost while loop.
When merging the presorted sets A(fmin), A(fmin + 1), and A(fmin + 2) with the previously existing Open buckets (both residing on disk), duplicates are eliminated, leaving the sets Open(g + 1, h − 1), Open(g + 1, h), and Open(g + 1, h + 1) duplicate free. Then the next active bucket, Open(g + 1, h − 1), is refined not to contain any node in Open(g, h − 1) or Open(g − 1, h − 1). This can be achieved through a parallel scan of the presorted files, using a temporary file in which the intermediate result is stored, before Open(g + 1, h − 1) is updated. It suffices to perform the file subtraction lazily, only for the bucket that is expanded next.
Theorem 8.6
(Optimality of External A*) In a unit cost graph external A* is complete and optimal.
Proof
Since external A* simulates A* and only changes the order of expanded nodes that have the same f-value, completeness and optimality are inherited from the properties shown for A*.
Theorem 8.7
(I/O Performance of External A* in Undirected Graphs) The complexity of external A* in an implicit unweighted and undirected graph with a consistent estimate is bounded by O(sort(|E|) + scan(|V|)) I/Os.
Proof
By simulating internal A*, delayed duplicate elimination ensures that each edge in the problem graph is looked at at most once. As in the analysis for external implicit BFS, O(sort(|E|)) I/Os are needed to eliminate duplicates in the successor lists. Since each node is expanded at most once, this adds O(sort(|E|)) I/Os to the overall runtime.
Filtering, evaluating nodes, and merging lists are all possible in the scanning time of the buckets in consideration. During the exploration, each bucket Open(g, h) will be referred to at most six times: once for expansion, at most three times as a successor bucket, and at most two times for duplicate elimination as a predecessor bucket of the same h-value as the currently active bucket. Therefore, evaluating, merging, and file subtraction add O(scan(|V|) + scan(|E|)) I/Os to the overall runtime. Hence, the total execution time is O(sort(|E|) + scan(|V|)) I/Os.
If we additionally have |E| = O(|V|), the complexity reduces to O(sort(|V|)) I/Os. We next generalize the result to directed graphs with bounded locality.
Theorem 8.8
(I/O Performance of External A* in Graphs with Bounded Locality) The complexity of external A* in an implicit unweighted problem graph with bounded locality l and a consistent estimate is bounded by O(sort(|E|) + l · scan(|V|)) I/Os.
Proof
Consistency implies that we do not generate successors with an f-value smaller than the current minimum. If we subtract from the active bucket Open(g, h) every bucket Open(g′, h) with g − g′ smaller than the locality l, then we arrive at full duplicate detection. Consequently, during the exploration each problem graph node and edge is considered at most once. The efforts of removing duplicates in each bucket individually accumulate to at most O(sort(|E|)) I/Os, and subtraction adds O(l · scan(|V|)) I/Os to the overall complexity.
Internal costs have been neglected in this analysis. Since each node is considered only once for expansion, the internal costs are |V| times the time for successor generation, plus the efforts for internal duplicate elimination and sorting. By setting the weight of each edge (u, v) to w(u, v) = 1 + h(v) − h(u) for a consistent heuristic h, A* can be cast as a variant of Dijkstra's algorithm, which requires internal costs of O(|V| + |E|) on a bucket-based priority queue. Due to consistency we have w(u, v) ≥ 0 for every node u and successor v of u, so that, given constant-time successor generation, the internal costs are bounded by O(|V| + |E| + I_s), where I_s refers to the total internal sorting effort.
To reconstruct a solution path, we could store predecessor information with each node on disk (thus doubling the state vector size) and apply backward chaining, starting with the target node. However, this is not strictly necessary: for a node in depth g, we intersect the set of its possible predecessors with the buckets of depth g − 1. Any node in the intersection is reachable on an optimal solution path, so we can iterate the construction process. The time complexity is bounded by the scanning time of all buckets in consideration, namely O(scan(|V|)) I/Os.
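A minimal sketch of this pointer-free backward chaining, with in-memory sets standing in for the bucket files of each depth (the helper name `reconstruct` is ours; we also assume an undirected graph, so that the possible predecessors of a node coincide with its successors):

```python
def reconstruct(goal, layers, succ):
    """Backward chaining without stored predecessor pointers.

    `layers[g]` is the set of nodes at depth g (one disk file each in
    the external setting), with the goal in the last layer. For the
    node at depth g we scan layer g-1 for a node having it among its
    successors; any match lies on an optimal solution path.
    """
    path = [goal]
    g = len(layers) - 1
    while g > 0:
        u = path[-1]
        pred = next(v for v in layers[g - 1] if u in succ(v))
        path.append(pred)
        g -= 1
    path.reverse()
    return path
```

Each layer is scanned at most once, matching the O(scan(|V|)) bound stated above.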
Up to this point, we have made the assumption of unit cost graphs; in the rest of this section, we generalize the algorithm to small integer weights in {1, …, C}. Due to the consistency of the heuristic, for every node u and every successor v of u it holds that h(u) ≤ h(v) + w(u, v). Moreover, since the graph is undirected, we equally have h(v) ≤ h(u) + w(u, v), or h(u) ≥ h(v) − w(u, v); hence, |h(u) − h(v)| ≤ w(u, v) ≤ C. This means that the successors of the nodes in the active bucket are no longer spread across three, but over O(C²) buckets. In Figure 8.7, the region of successors is shaded in dark gray, and the region of predecessors is shaded in light gray.
For duplicate reduction, it is sufficient to subtract the 2C buckets Open(g − 1, h), …, Open(g − 2C, h) from the active bucket Open(g, h) prior to the expansion of its nodes (indicated by the shaded rectangle in Fig. 8.7). We assume that the I/Os for opening and closing the files are negligible.
Theorem 8.9
(I/O Performance of External A* in Nonunit Cost Graphs) The I/O complexity of external A* in an implicit undirected graph, where the weights are in {1, …, C}, with a consistent estimate, is bounded by O(sort(|E|) + C · scan(|V|)).
Proof
It can be shown by induction over the bucket expansion order that no duplicates remain with respect to smaller buckets. The claim is trivially true for the initial bucket. In the induction step, assume to the contrary that for some node v the active bucket contains a duplicate v′ with a smaller g-value; let u be the predecessor of v on its generating path. Then, by the undirected graph structure, there must be a duplicate of u in an even smaller bucket. But since those buckets have already been subtracted, this is a contradiction to the induction hypothesis.
The derivation of the I/O complexity is similar to the unit cost case; the difference is that each bucket is referred to at most O(C) times for bucket subtraction and expansion. As before, each edge in the problem graph is considered at most once.
If we do not impose a bound C on the maximum integer weight, or if we allow directed graphs, the runtime increases, since all buckets of the same h-value and smaller g-value must be subtracted from the active bucket. For larger edge weights and h-values, buckets become sparse and should be handled more carefully.
Let us consider how to externally solve Fifteen-Puzzle problem instances that cannot be solved internally with A* and the Manhattan distance estimate. Internal sorting is implemented by applying Quicksort. The external merge is performed by maintaining the file pointers for every flushed buffer and merging them into a single sorted file. Since the operating system imposes a bound on the number of simultaneously open file pointers, a two-phase merging applies. Duplicate removal and bucket subtraction are performed in single passes through the bucket file. As noted above, the successor's f-value either equals that of the parent node or exceeds it by exactly 2.
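Under such a limit on simultaneously open files, a two-phase merge first combines groups of runs and then merges the intermediate results. A minimal sketch (the name `two_phase_merge` and the `fan_in` parameter are ours; real implementations would keep the intermediate runs on disk):

```python
import heapq

def two_phase_merge(runs, fan_in):
    """Merge many sorted runs under a bound on simultaneously open files.

    Phase 1 merges groups of at most `fan_in` runs into intermediate
    runs; phase 2 merges those. Two phases suffice whenever the number
    of runs is at most fan_in squared.
    """
    assert len(runs) <= fan_in * fan_in
    intermediate = [
        list(heapq.merge(*runs[i:i + fan_in]))   # phase 1: group merges
        for i in range(0, len(runs), fan_in)
    ]
    return list(heapq.merge(*intermediate))       # phase 2: final merge
```

At most `fan_in` input files are open at any time, while every element is still read and written only a constant number of times.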
In
Table 8.3 we show the diagonal pattern of nodes that is developed during the exploration for a simple problem instance.
Table 8.4 illustrates the impact of duplicate removal and bucket subtraction for problem instances of increasing complexity. In some cases, the experiment was terminated because of the limited hard disk capacity.
Table 8.3 Nodes inserted in the buckets. Rows denote depth of search (g-value) and columns denote estimated goal distance (h-value).
g\h | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 |
---|
0 | – | – | – | 1 | – | – | – | – | – | – | – |
1 | – | – | – | – | 2 | – | – | – | – | – | – |
2 | – | – | – | 4 | – | 2 | – | – | – | – | – |
3 | – | – | – | – | 10 | – | 4 | – | – | – | – |
4 | – | – | – | 7 | – | 17 | – | 10 | – | – | – |
5 | – | – | – | – | 20 | – | 34 | – | 24 | – | – |
6 | – | – | – | 6 | – | 38 | – | 74 | – | 44 | – |
7 | – | – | – | – | 19 | – | 71 | – | 156 | – | 76 |
8 | – | – | – | 8 | – | 40 | – | 185 | – | 195 | – |
9 | – | – | – | – | 21 | – | 97 | – | 203 | – | – |
10 | – | – | – | 3 | – | 62 | – | 92 | – | – | – |
11 | – | – | – | – | 21 | – | 46 | – | – | – | – |
12 | – | – | – | 5 | – | 31 | – | – | – | – | – |
13 | – | – | 2 | – | 10 | – | – | – | – | – | – |
14 | – | 2 | – | 5 | – | – | – | – | – | – | – |
15 | 2 | – | 5 | – | – | – | – | – | – | – | – |
Table 8.4 Impact of duplicate removal and bucket subtraction on the number of generated nodes.
Instance | N | N (dupl. removal) | N (dupl. removal + subtraction) |
---|
1 | 530,401 | 2,800 | 1,654 |
2 | 71,751,166 | 611,116 | 493,990 |
3 | <out of disk space> | 7,532,113 | 5,180,710 |
4 | <out of disk space> | <out of disk space> | 297,583,236 |
5 | <out of disk space> | <out of disk space> | 2,269,240,000 |
6 | <out of disk space> | <out of disk space> | 2,956,384,330 |
One interesting feature of our approach from a practical point of view is the ability to pause and resume the program execution in large problem instances. This is desirable, for example, in the case when the limits of secondary storage are reached, because we can resume the execution with more disk space.