It is easy to create a pattern database by conducting a breadth-first search in a backward direction, starting at the abstract goal. This assumes that for each action a we can devise an inverse action a⁻¹ such that v = a(u) if and only if u = a⁻¹(v). If the set of backward actions A⁻¹ = {a⁻¹ | a ∈ A} is equal to A, the problem is reversible (leading to an undirected problem graph). For backward pattern database construction, the uniqueness of the actions' inverses is sufficient. The set of all states generated by applying inverse actions to a state u is denoted as Succ⁻¹(u). It is generated by the inverse successor generating function Expand⁻¹. Pattern databases can cope with weighted state spaces. Moreover, by additionally associating the shortest path predecessor with each state, it is possible to maintain the shortest abstract path that leads to the abstract goal. To construct a pattern database for a weighted graph, the shortest path exploration in abstract space uses inverse actions and Dijkstra's algorithm.
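As a concrete illustration, the backward construction can be sketched as follows. The toy abstract space, the function names, and the unit-cost assumption are ours for the sake of the example, not part of any particular implementation:

```python
from collections import deque

def build_pdb(abstract_goal, inverse_successors):
    """Backward breadth-first search from the abstract goal.

    inverse_successors(u) returns all abstract states from which u is
    reachable by one action, i.e., the result of applying inverse
    actions to u.  With unit costs, the first visit of a state fixes
    its abstract goal distance; for weighted graphs, Dijkstra's
    algorithm (a priority queue instead of the FIFO queue) is used.
    """
    pdb = {abstract_goal: 0}
    queue = deque([abstract_goal])
    while queue:
        u = queue.popleft()
        for v in inverse_successors(u):
            if v not in pdb:              # first visit = shortest distance
                pdb[v] = pdb[u] + 1
                queue.append(v)
    return pdb

# Toy reversible abstract space: states 0..4 on a line, goal state 0.
pdb = build_pdb(0, lambda u: [v for v in (u - 1, u + 1) if 0 <= v <= 4])
print(pdb)  # {0: 0, 1: 1, 2: 2, 3: 3, 4: 4}
```

The returned dictionary plays the role of the Closed set discussed next: it maps every reachable abstract state to its goal distance.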
Since pattern databases represent the set Closed of expanded nodes in abstract space, a straightforward implementation for storing and retrieving the computed distance information is a hash table. As introduced in Chapter 3, many different options are available, such as hash tables with chaining, open addressing, suffix lists, and others. For search domains with a regular structure (like the (n²−1)-Puzzle), the database can be implemented time- and space-efficiently as a perfect hash table. In this case the hash address uniquely identifies the state that is searched, and the hash entries consist of the shortest path distance value only; the state itself need not be stored. A simple and fast (though not memory-efficient) implementation of such a perfect hash table is a multidimensional array that is addressed by the state components. More space-efficient indices for permutation games have been introduced in Chapter 3. They can be adapted to partial state/pattern addresses.
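A common choice of perfect hash function for permutation states is the lexicographic rank. The following sketch is our own simplified O(n²) variant; it maps each permutation of {0, …, n−1} to a unique index in [0, n!), so the table needs to store only the distance values:

```python
from math import factorial

def perm_rank(p):
    """Lexicographic rank of a permutation p of {0, ..., n-1}.

    Each position contributes (number of smaller elements to its
    right) times the factorial of the remaining positions, which
    yields a bijection onto [0, n!) -- a perfect hash address.
    """
    n = len(p)
    rank = 0
    for i in range(n):
        smaller = sum(p[j] < p[i] for j in range(i + 1, n))
        rank += smaller * factorial(n - 1 - i)
    return rank

# A distance table addressed by rank instead of by state, e.g.:
#   pdb = bytearray(factorial(n)); pdb[perm_rank(state)] = distance
print(perm_rank((0, 1, 2)), perm_rank((2, 1, 0)))  # 0 5
```

For partial state (pattern) addresses, the same idea applies to the rank of the pattern tiles' positions among all board cells.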
The pattern database technique was first applied to define heuristics for sliding-tile puzzles. The space required for pattern database construction can be bounded by the length of the abstract goal distance encoding times the size of the perfect hash table.
4.4.2. Rubik's Cube
A state in Rubik's Cube (see Sec. 1.7.2) is uniquely specified by the position and orientation of the 12 edge cubies and the 8 corner cubies. For the implementation, this can be represented as an array of 20 elements, one for each cubie. The values encode the position and orientation as one of 24 different values: 8·3 for the corners (8 positions times 3 orientations) and 12·2 for the edges (12 positions times 2 orientations).
Uninformed search is impractical for Rubik's Cube, since the number of generated nodes grows rapidly: at depth 10 we expect 244,686,773,808 nodes, and at depth 18 more than 2.46·10²⁰ nodes.
If we consider the eight corner cubies as a pattern, the position and orientation of the last corner cubie is determined by the remaining seven, so there are exactly 8!·3⁷ = 88,179,840 possible combinations. Using backward breadth-first search starting from the goal state, we can enumerate these states and build a pattern database. Perfect hashing can be used, allocating four bits to encode the heuristic value for each abstract state. The mean heuristic value is about 8.764. In addition, we may consider the edge cubies separately. Since taking all edge cubies into consideration would lead to a memory-exceeding pattern database, the set of edge cubies is divided into two halves, leading to two pattern databases with 12!/6!·2⁶ = 42,577,920 states each. The mean heuristic value for the maximum of all three pattern database heuristics is about 8.878.
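The sizes of these abstract state spaces follow directly from counting positions and orientations; the following back-of-the-envelope check also derives the memory requirement under the stated 4-bits-per-entry encoding:

```python
from math import factorial

# Corner pattern: 8! position assignments, and 7 free orientations
# of 3 values each (the eighth orientation is determined).
corners = factorial(8) * 3 ** 7
# One half of the edges: ordered choice of 6 of 12 edge positions,
# with 2 orientations for each of the 6 cubies.
edges6 = (factorial(12) // factorial(6)) * 2 ** 6

print(corners)        # 88179840
print(edges6)         # 42577920
print(corners // 2)   # bytes needed at 4 bits per entry: 44089920
```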
Using the previous databases, Korf solved 10 random instances of Rubik's Cube move-optimally. The solvable instances were generated by making 100 random moves each, starting from the goal state. His results are shown in Table 4.2. The optimal solution for the hardest possible instance has 20 moves (see Ch. 1).
Table 4.2 Solutions to 10 random Rubik's Cube instances.
Problem | Depth | Nodes Generated |
---|---|---
1 | 16 | 3,720,885,493 |
2 | 17 | 11,485,155,726 |
3 | 17 | 64,937,508,623 |
4 | 18 | 126,005,368,381 |
5 | 18 | 262,228,269,081 |
6 | 18 | 344,770,394,346 |
7 | 18 | 502,417,601,953 |
8 | 18 | 562,494,969,937 |
9 | 18 | 626,785,460,346 |
10 | 18 | 1,021,814,815,051 |
4.4.4. Korf's Conjecture
In the following we are interested in the performance of pattern databases. We will argue that the expected value of an (admissible) heuristic function is a very good predictor of its effectiveness. This average can be approximated by random sampling or, for pattern databases, determined exactly as the average of the database values. This value is exact for the distribution of heuristic values in the abstract state space but, as abstractions are nonuniform in general, only approximate for the concrete state space.
In general, the larger the values of an admissible heuristic, the better the corresponding database should be judged. This is because the heuristic values directly influence the search efficiency in the original search space. As a consequence, we compute the mean heuristic value for each database. More formally, the average estimate of a pattern database PDB with m entries is h̄ = (1/m) · Σu PDB[u], where the sum ranges over all abstract states u stored in the database.
A fundamental question about memory-based heuristics concerns the relationship between the size of the pattern database and the number of nodes expanded when the heuristic is used to guide the search. One problem with relating the performance of a search algorithm to the accuracy of the heuristic is that the accuracy is hard to measure: determining the exact distance to the goal is computationally infeasible for large problems.
If the heuristic value of every state were equal to its expected value h̄, then a search to depth d would be equivalent to searching to depth d − h̄ without a heuristic, since the f-value of every state would be its depth plus h̄. However, this estimate turns out to be much too low in practice. The reason for the discrepancy is that the states encountered are not random samples: states with large heuristic values are pruned, and states with small heuristic values spawn more children.
On the other hand, we can predict the expected value of a pattern database heuristic during the search. The minimal depth of a search tree covering a search space of n nodes with constant branching factor b will be around log_b n. This is because with d moves we can generate about b^d nodes. Since we are ignoring possible duplicates, this estimate is generally too low.
We assume d to be the average optimal solution length for a random instance and that our pattern database is generated by caching heuristic values for all states up to distance d from the goal. If the abstract search tree also branches with factor b, a lower bound on the expected value of a pattern database heuristic is h̄ ≈ log_b m, where m is the number of stored states in the database (being equal to the size of the abstract state space). The derivation of log_b m is similar to the one of log_b n for the concrete state space.
The hope is that the combination of an overly optimistic estimate with a too-pessimistic estimate results in a more realistic measure. Let t be the number of nodes generated in an A* search (without duplicate detection). Since d is the depth to which A* must search, and a heuristic with mean value h̄ reduces the effective search depth to d − h̄, we can estimate t ≈ b^(d−h̄). Moreover, as argued earlier, we have d ≈ log_b n and h̄ ≈ log_b m. Substituting the values for d and h̄ yields t ≈ b^(log_b n − log_b m) = n/m.
Since the treatment is insightful but informal, this estimate has been denoted as Korf's conjecture; it states that the number of generated nodes in an A* search without duplicate detection using a pattern database may be approximated by n/m, the size of the problem space divided by the available memory. Experimental data from the Rubik's Cube problem shows that the prediction is very good. We have n ≈ 4.3252·10¹⁹, m = 173,335,680, n/m ≈ 2.4953·10¹¹, and an observed average of t ≈ 3.53·10¹¹ generated nodes, which is off only by a factor of 1.4.
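The arithmetic behind this comparison can be replayed from the node counts in Table 4.2 (the state count n is the well-known size of the Rubik's Cube group):

```python
n = 43_252_003_274_489_856_000        # states of the Rubik's Cube
m = 88_179_840 + 2 * 42_577_920       # corner PDB plus two 6-edge PDBs

nodes = [3_720_885_493, 11_485_155_726, 64_937_508_623,
         126_005_368_381, 262_228_269_081, 344_770_394_346,
         502_417_601_953, 562_494_969_937, 626_785_460_346,
         1_021_814_815_051]           # generated nodes, Table 4.2

t = sum(nodes) / len(nodes)           # observed average, about 3.53e11
predicted = n / m                     # Korf's conjecture, about 2.50e11
print(round(t / predicted, 2))        # 1.41
```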
4.4.5. Multiple Pattern Databases
The most successful applications of pattern databases all use multiple ones.
This raises a question about the best use of main memory: Is it better to use one large database, or to split the available space into several smaller ones? Let m be the number of patterns that we can store in the available memory, and p be the number of pattern databases. In many experiments on the performance of p pattern databases of size m/p (e.g., in the domain of the sliding-tile puzzle and in Rubik's Cube) it has been observed that small values of p are suboptimal. The general observation is that maximizing over several smaller pattern databases reduces the number of expanded nodes. For example, heuristic search in the Eight-Puzzle with 20 pattern databases of size 252 performs fewer state expansions (318) than 1 pattern database of size 5,040 (yielding 2,160 state expansions).
The observation remains true if maximization is performed on a series of different partitions into databases. The first heuristic of the Twenty-Four-Puzzle partitions the tiles into four groups of 6 tiles each. Partitioning the 24 tiles instead into eight different pattern databases, with four groups of 5 tiles and one group of 4 tiles, results in roughly one-third as many patterns as the first heuristic generates. However, the second heuristic performs better, with a ratio of nodes generated between 1.62 and 2.53.
Of course, the number of pattern databases cannot be scaled arbitrarily. With very few states per database, the distances in abstract state space become very imprecise. Moreover, since node generation in sliding-tile puzzles is very fast, the gains of a smaller node count are counterbalanced by the larger effort of addressing the multiple databases and computing the maximum.
The explanation of this phenomenon, that many smaller pattern databases may perform better than one larger one, is based on two observations:
• The use of smaller pattern databases instead of one large pattern database usually reduces the number of patterns with high h-value; maximizing the values of the smaller pattern databases can make the number of patterns with low h-values significantly smaller than the number of low-valued patterns in the larger pattern database.
• Eliminating low h-values is more important for improving search performance than retaining large h-values.
The first assertion is intuitively clear. A smaller pattern database means a smaller pattern space with fewer patterns with high h-values. Maximization of the smaller pattern databases reduces the number of patterns with very small h-values.
The second assertion refers to the number of nodes expanded. If pattern databases differ only in their maximum value, this only affects the nodes with a large h-value, corresponding to a number of nodes that is typically small. If the two pattern databases, on the contrary, differ in the fraction of nodes with a small h-value, this has a large effect on the number of nodes expanded, since the number of nodes that participate in those values is typically large.
As multiple pattern database lookups can be time consuming, we gain efficiency by computing bounds on the heuristic estimate prior to the search, and avoiding database lookups if the bounds are exceeded.
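One way to realize this is to order the lookups and stop as soon as the running maximum already exceeds the bound of interest (e.g., the remaining cost allowance in an IDA* iteration). The interface below is a sketch of our own, not a fixed API:

```python
def max_heuristic(state, abstractions, pdbs, bound=None):
    """Maximize over several pattern database lookups.

    abstractions[i] maps a concrete state to the key of pdbs[i].
    If a bound is given, the remaining lookups are skipped once one
    value already proves a cutoff: the exact maximum no longer
    matters for the pruning decision.
    """
    h = 0
    for phi, pdb in zip(abstractions, pdbs):
        h = max(h, pdb[phi(state)])
        if bound is not None and h > bound:
            break                     # cutoff certain, stop early
    return h

# Two toy databases over trivial one-key abstractions:
pdbs = [{"x": 3}, {"x": 5}]
phis = [lambda s: "x", lambda s: "x"]
print(max_heuristic("s", phis, pdbs))           # 5
print(max_heuristic("s", phis, pdbs, bound=2))  # 3 (early exit)
```

Ordering the databases by decreasing mean heuristic value makes an early exit more likely.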
4.4.6. Disjoint Pattern Databases
Disjoint pattern databases are important for deriving admissible estimates. It is immediate that the maximum of two admissible heuristics is admissible. On the other hand, we would like to add the heuristic estimates of two pattern databases to arrive at an even better estimate. Unfortunately, adding heuristics does not necessarily preserve admissibility. Additivity can be applied if the cost of a subproblem is composed of the costs of objects from the corresponding pattern only. For the (n² − 1)-Puzzle every operator moves only one tile, but Rubik's Cube is a counterexample, since every twist moves several cubies at once.
Consider the example of a small graph with four nodes
s,
u,
v, and
t arranged along one path (see
Fig. 4.6), where
s is the start and
t is the goal node. The first abstraction merges nodes
s and
u, and the second abstraction merges
u with
v. Because self-loops do not contribute to optimal solutions they have been omitted from the abstract graph. As the incoming edge to
t remains in both abstractions, being in state
v gives the cumulated abstract distance value 2, which is larger than the concrete distance 1.
The reason why we could only take the maximum of the fringe and corner heuristics for the
Eight-
Puzzle is that we want to avoid counting some action twice. The minimum number of moves stored in the database includes not only the moves of tiles that are part of the actual pattern. Since the nonpattern moves can be part of the abstract solution path for the other pattern, adding the two values might result in an inadmissible heuristic.
A solution for the problem is not to record the total solution path length, but to count moves of the tiles in the pattern for computing the heuristic estimate only. Since at each point in time only one tile is moved, this makes it possible to add the heuristic values rather than maximizing them. As an extreme case, we can think of the Manhattan distance as the sum of n2 − 1 patterns consisting of one tile each. Since each move changes only the shifted tile, addition is admissible. Generally, we can resort to this partitioning technique if we make sure that the subgoal solutions are independent.
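The construction differs from plain breadth-first search only in the edge costs: a move costs 1 if it shifts a pattern tile and 0 otherwise, which calls for a 0-1 shortest path search. The sketch below, including the toy 2×2 puzzle encoding, is our own:

```python
from collections import deque

def build_additive_pdb(goal, inverse_moves, pattern):
    """0-1 cost backward search: only pattern tile moves cost 1.

    inverse_moves(state) yields (predecessor, moved_tile) pairs.
    Zero-cost edges go to the front of the deque, unit-cost edges
    to the back, so states are settled in cost order (0-1 BFS).
    """
    dist = {goal: 0}
    dq = deque([goal])
    while dq:
        u = dq.popleft()
        for v, tile in inverse_moves(u):
            c = 1 if tile in pattern else 0
            if v not in dist or dist[u] + c < dist[v]:
                dist[v] = dist[u] + c
                (dq.appendleft if c == 0 else dq.append)(v)
    return dist

# 2x2 sliding puzzle; cells 0..3 row-major, value 0 is the blank.
ADJ = {0: (1, 2), 1: (0, 3), 2: (0, 3), 3: (1, 2)}

def inv_moves(s):
    b = s.index(0)
    for a in ADJ[b]:
        t = list(s)
        t[b], t[a] = t[a], t[b]
        yield tuple(t), s[a]          # s[a] is the tile that moved

pdb1 = build_additive_pdb((0, 1, 2, 3), inv_moves, pattern={1})
pdb23 = build_additive_pdb((0, 1, 2, 3), inv_moves, pattern={2, 3})
# pdb1[u] + pdb23[u] is a lower bound on the true goal distance of u.
```

Since the tile sets {1} and {2, 3} are disjoint and each move shifts exactly one tile, no move is counted in both databases, so the sum remains admissible.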
Different disjoint pattern databases' heuristics can additionally be combined using the maximization of their outcome. For example, when solving random instances of the
Twenty-
Four-
Puzzle, we might compute the maximum of two additive groups of four disjoint pattern databases each. As shown in
Figure 4.7, each tile group (indicated by the enclosed areas) consists of six tiles, generating databases with 25!/19! = 127,512,000 patterns each (the location of the blank is indicated using a black square).
If, for all states, the same partitioning is applied, we speak of statically partitioned disjoint pattern databases. There is an alternative way of dynamically choosing among several possible partitions the one with the maximum heuristic value. For example, a straightforward generalization of the Manhattan distance for sliding-tile puzzles is to precompute the shortest solution for every pair of tiles, rather than considering each tile individually. Then, we can construct an admissible heuristic by choosing half of these pairs such that each tile is covered exactly once. With an odd number of tiles, one of them will be left out, and simply contributes with its Manhattan distance.
To compute the most accurate heuristic for a given state, we have to solve a
maximum weighted bipartite matching problem in a graph where each vertex in both sets corresponds to a tile, and each
edge between the two sets is labeled with the pairwise solution cost of the corresponding pair of tiles. An algorithm is known that accomplishes this task in O(k³) time, where k is the number of tiles. However, it has also been shown that the corresponding matching problem for triples of tiles is NP-complete. Thus, in general, dynamic partitioning might not be efficiently computable, and we would have to resort to approximating the largest heuristic value.
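For small tile sets the dynamic choice can also be made by brute force instead of a matching algorithm. The helper below is entirely our own illustration; it enumerates all perfect pairings of an even tile set and keeps the most expensive one:

```python
def best_pairing(tiles, pair_cost):
    """Maximum-cost perfect pairing of an even set of tiles.

    pair_cost[(a, b)] with a < b holds the admissible pairwise
    solution cost.  Recursion: pair the first tile with every
    possible partner and solve the remainder.  Exponential time,
    so for illustration only; a polynomial matching algorithm
    replaces this in practice.
    """
    tiles = tuple(tiles)
    if len(tiles) < 2:
        return 0
    first, rest = tiles[0], tiles[1:]
    best = 0
    for i, partner in enumerate(rest):
        cost = pair_cost[tuple(sorted((first, partner)))]
        remaining = rest[:i] + rest[i + 1:]
        best = max(best, cost + best_pairing(remaining, pair_cost))
    return best

# Hypothetical pairwise costs for four tiles:
pc = {(1, 2): 3, (1, 3): 2, (1, 4): 1, (2, 3): 1, (2, 4): 2, (3, 4): 4}
print(best_pairing((1, 2, 3, 4), pc))  # 7: pairs (1,2) and (3,4)
```

With an odd number of tiles, the leftover tile would contribute its individual Manhattan distance, as described above.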
If we partition the state variables into disjoint subsets (patterns), such that no action affects variables in more than one subset, then a lower bound for the optimal solution of an instance is the sum of the optimal costs of solving optimally each pattern corresponding to the variable values of the instance.
Definition 4.3
(Disjoint State Space Abstractions) Let an action be trivial (a no-op) if it induces a self-loop in the abstract state space graph. Two state space abstractions ϕ1 and ϕ2 are disjoint if no action a of the concrete state space induces both a nontrivial action ϕ1(a) in the abstraction generated by ϕ1 and a nontrivial action ϕ2(a) in the abstraction generated by ϕ2.
Trivial actions correspond to self-loops in the problem graph.
If we have more than one pattern database, then for each state u in the concrete space and each abstraction ϕi we compute the values hi(u) = δ(ϕi(u), ϕi(T)). The heuristic estimate h(u) is the accumulated cost of the costs in the different abstractions; that is, h(u) = Σi hi(u). To preserve admissibility, we require disjointness, where two pattern databases with regard to the abstractions ϕ1 and ϕ2 are disjoint if for all u we have h1(u) + h2(u) ≤ δ(u, T).
Theorem 4.3
(Additivity of Disjoint Pattern Databases) Two disjoint pattern databases are additive, meaning that their distance estimates can be added while still providing a lower bound on the optimal solution path length.
Proof
Let P1 and P2 be abstractions of P according to ϕ1 and ϕ2, respectively, and let π = (a1, …, ak) be an optimal sequential plan for P. Then the abstracted plan π1 = (ϕ1(a1), …, ϕ1(ak)) is a solution for the state space problem P1, and π2 = (ϕ2(a1), …, ϕ2(ak)) is a solution for the state space problem P2. We assume that all trivial actions in π1 and π2, if any, are removed. Let k1 and k2 be the resulting respective lengths of π1 and π2. Since the pattern databases are disjoint, no action ai remains nontrivial in both π1 and π2, so that k1 + k2 ≤ k. Therefore, h1(u) + h2(u) ≤ k1 + k2 ≤ k = δ(u, T).
Consider a slight modification of the example graph with four nodes
s,
u,
v, and
t, now arranged as shown in
Figure 4.8. The first abstraction merges nodes
s with
u and
v with
t and the second abstraction merges
s with
v and
u with
t. Now each edge remains valid in only one abstraction, so that being in state
v gives the cumulated abstract distance value 1, which is equal to the concrete distance 1.
In
Figure 4.10 a plain pattern database (left) and a pair of disjoint pattern databases (right) are shown. All pattern databases (gray bars) refer to underlying partial state vectors (represented as thin rectangles). The first rectangle in both cases represents the state vector in original space, with all parts of it being relevant (no shading). The second (and third) rectangle additionally indicates the don't care variables selected in the state vector (shaded in black) for each abstraction. The heights of the pattern database bars erected on top of the state vector indicate the sizes of the pattern databases (number of states stored), and their widths correlate with the selected parts of the state vector. The maximum size of a pattern database is determined by the amount of main memory available and is illustrated by a line above the databases.
Finding disjoint state space abstractions is difficult in general. Therefore, in pattern database practice, an alternative approach for enforcing disjointness is used: If an action has a nonempty intersection with more than one chosen pattern, it is assigned cost 0 in all but one database. Alternatively, we can assign it cost 1 divided by the number of abstractions in which the action is nontrivial.
For the (n² − 1)-Puzzle at most one tile can move at a time. Hence, if we restrict the count to pattern tile moves only, we can add entries of pattern databases with disjoint tile sets.
Table 4.3 shows the effect of disjoint pattern databases in reducing the number of search nodes and increasing the mean heuristic value.
Table 4.3 Effect of disjoint pattern databases in the Fifteen-Puzzle.
Heuristic | Nodes | Mean Heuristic Value |
---|---|---
Manhattan Distance | 401,189,630 | 36.942 |
Disjoint 7- and 8-Tile Pattern Databases | 576,575 | 45.632 |
Reconsider the example of the graph with four nodes
s,
u,
v, and
t arranged along a path with its two abstraction functions. The edge to
t is assigned to cost 1 in the first abstraction and to cost 0 in the second abstraction such that, when being in state
v, the cumulated abstract distance value is now 1, which is equal to the concrete distance 1. The resulting mapping is shown in
Figure 4.9.
In general, we cannot expect that each action contributes cost to only one pattern database. In this case, a concrete action can be counted multiple times in different abstractions, so that the inferred heuristic is no longer admissible. To see this, suppose that such an action immediately reaches the goal with cost 1; the cumulated cost for being nontrivial in two abstractions, however, is 2.
Another option is to count an action in only one abstraction. In a modified breadth-first pattern database construction algorithm this is achieved as follows: in each BFS level we compute the transitive closure of zero-cost actions; that is, each zero-cost action is applied until no zero-cost action is applicable anymore. In other words, the cost of an action is added to the overall cost only if it has not already been counted in the construction of another pattern database.
A general theory of additive state space abstraction has been defined on top of a directed abstract graph by assigning two weights per edge: a primary cost p(e) and a residual cost r(e). Having two costs per abstract edge instead of just one is inspired by counting distinguished moves differently from don't care moves: the primary cost is the cost associated with the distinguished moves, and the residual cost is the cost associated with the don't care moves.
It is not difficult to see that if for all edges (u, v) and state space abstractions ϕ in the original space the edge (ϕ(u), ϕ(v)) is contained in the abstract space, and if for all paths π in the original space we have p(ϕ(π)) + r(ϕ(π)) ≤ w(π), the resulting heuristic is consistent. Moreover, if for all paths π we have Σi p(ϕi(π)) ≤ w(π), the resulting additive heuristic is also consistent.