10.1.1. Substring Pruning
Without explicitly mentioning it, most implementations of memory-restricted search algorithms already perform a basic form of pruning: when the successors of a node are generated, they prohibit using an inverse action to return to the node's parent. For example, in an infinitely large Gridworld state space (see Fig. 10.1, left) with actions U, D, L, and R, the action sequence LR will always produce a duplicate node. Rejecting all inverse action pairs, including RL, UD, and DU as well, reduces the number of children of a search node from four to three (see Fig. 10.1, middle).
In this section, we describe a method for pruning duplicate nodes that extends this idea. The approach is suitable for heuristic search algorithms like IDA* that have imperfect duplicate detection due to memory restrictions; it can be seen as an alternative to the use of transposition or hash tables.
We take advantage of the fact that the problem spaces of most combinatorial problems are described implicitly, and consider the state space problem in labeled representation, so that Σ is the set of different action labels. The aim of substring pruning is to prune paths from the search process that contain one of a set D ⊆ Σ* of forbidden words. These words, called duplicates, are forbidden because it is known a priori that the same state can be reached through a different, possibly shorter, action sequence called a shortcut; only the latter path is allowed.
To distinguish between shortcuts and duplicates, we impose a lexicographical ordering ≤l on the set of actions in Σ. In the Gridworld example, it is possible to further reduce the search according to the following rule: Go straight in the x-direction first, if at all, and then straight in the y-direction, if at all, making at most one turn. Rejecting the duplicates DR, DL, UR, and UL (with respective shortcuts RD, LD, RU, and LU) from the search space according to this rule leads to the state space of Figure 10.1 (right). As a result, each point (x, y) in the Gridworld example is generated via a unique path. This reduces the complexity of the search tree to an optimal number of nodes (quadratic in the search depth), an exponential gain. The set of pruning rules we have examined depends on the fact that some paths always generate duplicates of nodes generated by other paths.
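To make the quadratic growth concrete, the following sketch (the forbidden-word set encodes the rules from the text; all function names are our own illustration) enumerates all depth-d action sequences, discards those containing a forbidden two-letter word, and checks that every surviving path ends in a distinct cell:

```python
from itertools import product

# Inverse pairs plus the turn-rule duplicates DR, DL, UR, UL from the text.
FORBIDDEN = {"LR", "RL", "UD", "DU", "DR", "DL", "UR", "UL"}
MOVES = {"U": (0, 1), "D": (0, -1), "L": (-1, 0), "R": (1, 0)}

def pruned_paths(depth):
    """All depth-`depth` action strings that contain no forbidden word."""
    out = []
    for seq in product("UDLR", repeat=depth):
        s = "".join(seq)
        if not any(s[i:i + 2] in FORBIDDEN for i in range(len(s) - 1)):
            out.append(s)
    return out

def endpoint(path):
    x = y = 0
    for a in path:
        dx, dy = MOVES[a]
        x, y = x + dx, y + dy
    return (x, y)

paths = pruned_paths(5)
# 4^5 = 1024 raw sequences shrink to 4*5 = 20, one per cell at L1-distance 5.
assert len(paths) == 20
assert len({endpoint(p) for p in paths}) == 20
```

Each surviving string has the form X…XY…Y with X ∈ {L, R} and Y ∈ {U, D}, which is exactly the "straight in the x-direction first, then straight in the y-direction" rule.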
How can we find such string pairs (d, c), where d is a duplicate and c is a shortcut, and how can we guarantee admissible pruning, meaning that no optimal solution is rejected?
First, we can initially conduct an exploratory (e.g., breadth-first) search up to a fixed depth threshold. A hash table records all encountered states. Whenever a duplicate is indicated by a hash conflict, the lexicographically larger action sequence is recorded as a duplicate.
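A minimal sketch of this breadth-first duplicate discovery for the Gridworld follows; the tie-breaking order L, R, U, D (which prefers x-moves, matching the rule in the text) and all identifiers are our own choices:

```python
MOVES = {"U": (0, 1), "D": (0, -1), "L": (-1, 0), "R": (1, 0)}
RANK = {c: i for i, c in enumerate("LRUD")}  # assumed tie-breaking order

def lex_key(path):
    return [RANK[c] for c in path]

def find_duplicates(depth):
    """BFS with a hash table; each conflict yields a (duplicate, shortcut) pair."""
    seen = {(0, 0): ""}            # state -> first generating path
    frontier = [((0, 0), "")]
    pairs = set()
    for _ in range(depth):
        next_frontier = []
        for (x, y), path in frontier:
            for a, (dx, dy) in MOVES.items():
                state, p = (x + dx, y + dy), path + a
                if state in seen:
                    d, c = p, seen[state]
                    # for equal lengths the lexicographically larger is the duplicate
                    if len(d) == len(c) and lex_key(d) < lex_key(c):
                        d, c = c, d
                    i = 0                  # strip the longest common prefix
                    while i < min(len(d), len(c)) and d[i] == c[i]:
                        i += 1
                    pairs.add((d[i:], c[i:]))
                else:
                    seen[state] = p
                    next_frontier.append((state, p))
        frontier = next_frontier
    return pairs

pairs = find_duplicates(2)
assert ("UD", "") in pairs and ("LR", "") in pairs      # inverse action pairs
assert ("UR", "RU") in pairs and ("UL", "LU") in pairs  # turn-rule duplicates
```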
Another option, applicable in undirected search spaces, is to search for cycles, comparing a newly generated node in the search tree with all nodes in its generating path. The resulting cycle is split into one (or many) duplicate-and-shortcut pair(s). For example, the cycle in the (n² − 1)-Puzzle RDLURDLURDLU with inverse DRULDRULDRUL is split as shown in Table 10.1. When splitting a full-length cycle, one part has to be inverted (all its actions have to be reversed). In case the length of one string is equal to the length of the other, we need some further criterion to draw a distinction between the two. One valid option is a lexicographic order on the set of actions Σ (and subsequently on the set of action strings Σ*). For actions Σ = {U, R, D, L}, let U ≤l R ≤l D ≤l L. The problem graph is undirected, so we impose L to be inverse to R (written as L⁻¹ = R), and U to be inverse to D. For the cycle RDLURDLURDLU and duplicate DRULDR, we have RDLURD as its corresponding shortcut. The example is illustrated in Figure 10.2.
Table 10.1 Splitting a cycle into duplicates and shortcuts.

First Part (Duplicate) | Second Part | Inverse (Shortcut)
---|---|---
RDLURDLURDLU | ϵ | ϵ
RDLURDLURDL | U | D
RDLURDLURD | LU | DR
RDLURDLUR | DLU | DRU
RDLURDLU | RDLU | DRUL
RDLURDL | URDLU | DRULD
RDLURD | LURDLU | DRULDR
DRULDRU | LDRUL | RDLUR
DRULDRUL | DRUL | RDLU
DRULDRULD | RUL | RDL
DRULDRULDR | UL | RD
DRULDRULDRU | L | R
DRULDRULDRUL | ϵ | ϵ
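The splits in Table 10.1 can be generated mechanically: take every prefix of the cycle as the first part and invert the remaining suffix to obtain the shortcut. A short sketch (function names are ours):

```python
INVERSE = {"U": "D", "D": "U", "L": "R", "R": "L"}

def invert(path):
    """Inverse of an action string: reverse it and invert every action."""
    return "".join(INVERSE[a] for a in reversed(path))

def splits(cycle):
    """All (first part, second part, shortcut) triples of a cycle."""
    return [(cycle[:i], cycle[i:], invert(cycle[i:]))
            for i in range(len(cycle), -1, -1)]

cycle = "RDLURDLURDLU"
assert invert(cycle) == "DRULDRULDRUL"
table = splits(cycle)
assert ("RDLURDLURD", "LU", "DR") in table    # a row of Table 10.1
assert ("RDLURDLU", "RDLU", "DRUL") in table
```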
For generating cycles, a heuristic can be used that minimizes the distance back to the initial state from which the search was invoked. For such cycle-detecting searches, state-to-state estimates like the Hamming distance on the state vector are appropriate.
The setting for unit-cost state space graphs naturally extends to weighted search spaces with a weight function w on the set of action strings as follows.
Definition 10.1
(Pruning Pair) Let G be a weighted problem graph with action label set Σ and weight function w: Σ* → ℝ. A pair (d, c) ∈ Σ* × Σ* is a pruning pair if:
1. w(d) > w(c), or, if w(d) = w(c), then c ≤l d;
2. for all u ∈ S: d is applicable in u if and only if c is applicable in u; and
3. for all u ∈ S, applying d to u yields the same state as applying c to u if they are both applicable; that is, d(u) = c(u).
For sliding-tile puzzles, the applicability of an action sequence (condition 2) merely depends on the position of the blank. If in c the blank moves a larger distance in the x- or y-direction than in d, it cannot be a shortcut; namely, for some positions, d will be applicable, but not c. An illustration is provided in Figure 10.3. If all three conditions for a pruning pair are valid, the shortcut is no longer needed, and the search refinement can rely only on the duplicates found, which in turn admissibly prune the search. It is easy to see that truncating a path that contains d preserves the existence of an optimal path to a goal state (see Exercises).
If conditions 2 and 3 cannot be established a priori, the validity of a proposed pair (d, c) has to be checked dynamically during the actual search process. Condition 2 is tested by executing the sequence d⁻¹c at every encountered state, and condition 3 is tested by comparing the respective states before and after executing d⁻¹c.
Note that the state space does not have to include invertible actions to apply d⁻¹ for testing conditions 2 and 3. Instead, we can use |d| parent pointers to climb up the search tree starting with the current state.
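Since applicability in the sliding-tile puzzle depends only on the blank's position, condition 2 of Definition 10.1 can be checked by simulating both sequences from every possible blank location. A sketch under that assumption (board size and helper names are our own):

```python
MOVES = {"U": (0, -1), "D": (0, 1), "L": (-1, 0), "R": (1, 0)}

def applicable(seq, blank, n=3):
    """True if the blank can execute `seq` without leaving the n x n board."""
    x, y = blank
    for a in seq:
        dx, dy = MOVES[a]
        x, y = x + dx, y + dy
        if not (0 <= x < n and 0 <= y < n):
            return False
    return True

def condition2(d, c, n=3):
    """d applicable if and only if c applicable, for every blank position."""
    return all(applicable(d, (x, y), n) == applicable(c, (x, y), n)
               for x in range(n) for y in range(n))

# The length-6 pair from Table 10.1 keeps the blank inside the same 2 x 2 box:
assert condition2("DRULDR", "RDLURD")
# A candidate that drives the blank farther than the duplicate is rejected:
assert not condition2("U", "DDU")
```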
Pruning Automata
Assuming that we do not have to check conditions 2 or 3 for each encountered state, especially for depth-first oriented search algorithms like IDA*, searching for a duplicate in (the suffix of) the current search tree path can slow down the exploration considerably if the set of potential duplicates is large.
The problem of efficiently finding any of a set of strings in a text is known as the bibliographic search problem; it can be elegantly solved by constructing a substring acceptor in the form of a finite-state automaton, and feeding the text to it.
The automaton runs synchronously with the original exploration and prunes the search tree whenever it reaches an accepting state. Each ordinary transition induces an automaton transition and vice versa. Substring pruning is important for search algorithms like IDA*, since constant-time pruning efforts fit well with enhanced DFS explorations. The reason is that the time to generate one successor for these algorithms is already reduced to a constant in many cases.
As an example, Figure 10.4 (left) shows the automaton for substring pruning with the string DU. The initial state is the leftmost state and the accepting state is the rightmost state in the automaton. Figure 10.4 (right) depicts the automaton for full predecessor elimination with accepting states omitted.
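To see the automaton running in lockstep with a depth-first search, the following sketch hard-codes a three-state acceptor for the single forbidden word DU, in the spirit of Figure 10.4 (left), and counts the surviving depth-3 Gridworld paths; the transition table and counts are our own illustration:

```python
# Deterministic acceptor for the forbidden word DU:
# state 0 = no progress, 1 = just read D, 2 = accepting (substring DU seen).
# Missing entries lead back to state 0.
DELTA = {(0, "D"): 1, (1, "D"): 1, (1, "U"): 2}

def count_paths(depth, state=0):
    """Depth-first path enumeration; the automaton state travels with the path."""
    if state == 2:        # accepting state: the path contains DU, so prune it
        return 0
    if depth == 0:
        return 1
    return sum(count_paths(depth - 1, DELTA.get((state, a), 0))
               for a in "UDLR")

assert count_paths(3) == 56   # 64 raw length-3 paths minus the 8 containing DU
```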
Table 10.2 shows the impact of substring pruning in the (n² − 1)-Puzzle and in Rubik's Cube (see Ch. 1). Note that the branching factor 13.34 for the Rubik's Cube is already the result of substring pruning from the naive space with a branching factor of 18, and the branching factors for the (n² − 1)-Puzzle are already results of predecessor elimination (see Ch. 5). Different construction methods have been applied: BFS denotes that the search algorithm for generating duplicates operates breadth-first, with a hash conflict denoting the clash of two different search paths. An alternative method to find duplicates is cycle-detecting heuristic best-first search (CDBF). By unifying the generated sets of duplicates, both search methods can be applied in parallel (denoted as BFS + CDBF). We display the number of states in the pruning automaton and the number of duplicate strings that are used as forbidden words for substring matching.
Table 10.2 Reducing the branching factor by substring pruning.

Puzzle | Construction | Duplicates | Automaton States | Branching Without | Branching With
---|---|---|---|---|---
Eight-Puzzle | BFS | 35,858 | 137,533 | 1.732 | 1.39
Fifteen-Puzzle | BFS | 16,442 | 55,441 | 2.130 | 1.98
Fifteen-Puzzle | CDBF | 58,897 | 246,768 | 2.130 | 1.96
Twenty-Four-Puzzle | BFS + CDBF | 127,258 | 598,768 | 2.368 | 2.235
2³ Rubik's Cube | BFS | 31,999 | 24,954 | 6.0 | 4.73
3³ Rubik's Cube | BFS | 28,210 | 22,974 | 13.34 | 13.26
As pruning strategies cut off branches in the search tree, they reduce the average node branching factor. We show this value with and without substring pruning (assuming that some basic pruning rules have been applied already).
Substring pruning automata can be included in the prediction of the search tree growth (see Ch. 5). This is illustrated in Figure 10.5 for substring pruning the grid by eliminating predecessors. In the beginning (depth 0) there is only one state. In the next iteration (depth 1) we have 4 states, then 6 (depth 2), 8 (depth 3), 10 (depth 4), and so on.
To accept several strings m1, …, mn, one solution is to construct a nondeterministic automaton that accepts them as substrings, representing the regular expression Σ*m1 Σ*|Σ* m2 Σ*| ⋯ |Σ* mn Σ*, and then convert the nondeterministic automaton into a deterministic one. Although this approach is possible, the conversion and minimization can be computationally hard.
Aho-Corasick Algorithm
The algorithm of Aho and Corasick constructs a deterministic finite-state machine for a number of search strings. It first generates a trie of the set of duplicate strings. Figure 10.6 shows a trie for the duplicates in the Gridworld state space at depth 2. Every leaf corresponds to one forbidden action sequence and is considered to be accepting.
In the second step, the algorithm computes a failure function on the set of trie nodes. Based on the failure function, substring search will be available in linear time. Let u be a node in the trie and string(u) the corresponding string. The failure node failure(u) is defined as the location of the longest proper suffix of string(u) that is also the prefix of a string in the trie (see Fig. 10.7, bold arrows).
The failure values are computed in a complete breadth-first traversal, where, inductively, the values in depth i rely on the values computed in depths j < i. Algorithm 10.1 shows the corresponding pseudo code. To highlight the different branching in a search tree and in a trie, we use T(u, a) to denote a possible successor of u via a and write ⊥ if this successor along a is not available.
First, all nodes in BFS level 1 are determined and inserted into the queue Q. For the example this includes the four nodes in depth 1. As long as Q is nonempty, the top element u is deleted and its successors v are processed. To compute failure(v), the first node in the sequence failure(u), failure²(u), failure³(u), … is determined that enables a transition with the chosen character a. As a rationale, if string(failure(u)) is the longest suffix of string(u) in the trie, and failure(u) has a child transition available labeled a, then string(failure(u))a is the longest suffix of string(u)a that is contained in the trie.
In the example, to determine the failure value for the node u in level 2 that was generated by applying action R, we go back via the link of its predecessor to the root node s and take the transition labeled R to u′. As a consequence, failure(u) = u′.
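A compact rendition of the construction, with the trie, the BFS failure computation, and a transition routine that follows failure links; all identifiers are our own:

```python
from collections import deque

def aho_corasick(patterns):
    """Build goto/failure/output tables for the given duplicate strings."""
    goto, fail, out = [{}], [0], [set()]
    for p in patterns:                       # step 1: the trie
        s = 0
        for ch in p:
            if ch not in goto[s]:
                goto.append({}); fail.append(0); out.append(set())
                goto[s][ch] = len(goto) - 1
            s = goto[s][ch]
        out[s].add(p)
    queue = deque(goto[0].values())          # step 2: failure links via BFS
    while queue:
        r = queue.popleft()
        for ch, s in goto[r].items():
            queue.append(s)
            f = fail[r]
            while f and ch not in goto[f]:   # climb failure chain of the parent
                f = fail[f]
            fail[s] = goto[f][ch] if ch in goto[f] and goto[f][ch] != s else 0
            out[s] |= out[fail[s]]           # inherit matches of the failure node
    return goto, fail, out

def step(goto, fail, state, ch):
    """One automaton transition, following failure links as needed."""
    while state and ch not in goto[state]:
        state = fail[state]
    return goto[state].get(ch, 0)

goto, fail, out = aho_corasick(["UD", "DU", "LR", "RL"])
state, hits = 0, []
for i, ch in enumerate("UULRD"):
    state = step(goto, fail, state, ch)
    hits += [(i, m) for m in out[state]]
assert hits == [(3, "LR")]   # the forbidden word LR ends at index 3
```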
Each node is processed only once. Assuming the size of the alphabet to be bounded by a constant, an amortized analysis shows that computing the failure function takes time O(d) in total, where d is the sum of the string lengths in D, d = ∑m∈D |m|.
Theorem 10.1
(Time Complexity of Failure Computation) Let D be the set of duplicates and d be the total length of all strings in D. The construction of the failure function is O(d).
Proof
Let |string(failure(u))| be the length of the longest proper suffix of string(u) that is a prefix of some string in D, where a suffix is proper if it is shorter than the string itself. If u′ and u are nodes on the path from the root node to a leaf in T and u′ is the predecessor of u, we have |string(failure(u))| ≤ |string(failure(u′))| + 1.
Choose any string mi that corresponds to a path pi in the trie for D. Then the total increase in the length of the failure function values along pi is ∑u∈pi (|string(failure(u))| − |string(failure(u′))|) ≤ |mi|. On the other hand, |string(failure(u))| decreases on each failure transition by at least one and remains nonnegative throughout. Therefore, the total number of failure transitions followed while processing pi is at most |mi|, and summing over all strings in D gives O(d).
To construct the substring pruning automaton A from T, we also use breadth-first search, traversing the trie T together with its failure function. In other words, we have to compute ΔA(u, a) for all automaton states u and all actions a in Σ. The skeleton of A is the trie T itself. The transitions ΔA(u, a) for u ∈ T and a ∈ Σ that are not in the skeleton can be derived from the function failure as follows. At a given node u we execute a at failure(u), using the already computed values of ΔA(failure(u), a). In the example, we have included the transitions for node u in Figure 10.8. The time complexity to generate A is O(d).
Searching existing substrings of a (path) string of length n is available in linear time by simply traversing automaton A. This leads to the following result.
Theorem 10.2
(Time Complexity Substring Matching) The earliest match of k strings of total length d in a text of length n can be determined in time O(n + d).
Automaton A can now be used for substring pruning as follows. With each node u in the search process, we store a value state(u), denoting the corresponding state in the automaton; since A has d states, this requires about ⌈log₂ d⌉ memory bits per node. If u has successor v via an action labeled a, we have state(v) = ΔA(state(u), a). This operation is available in constant time. Therefore, we can check in constant time if the generating path to v has a duplicate string as a suffix.
Incremental Duplicate Learning A*
A challenge is to apply substring pruning dynamically, during the execution of a search algorithm. In fact, we have to solve a variant of the dynamic dictionary matching problem. Unfortunately, the algorithm of Aho and Corasick is not well suited to this problem.
In contrast to the static dictionary approach to prune the search space, generalized suffix trees (as introduced in Ch. 3) make it possible to insert and delete strings while maintaining a quick substring search. They can be adapted to a dictionary automaton as follows.
For incremental duplicate pruning with a set of strings, we need an efficient way to determine Δ(q, a) for a state q and action character a, avoiding the reconstruction of the path to the current state in the search tree.
Theorem 10.3
(Generalized Suffix Trees as FSMs) Suppose the string a1, …, aj−1 has been read. There is a procedure Δ for which the returned value on input aj is the longest suffix ai, …, aj of a1, …, aj that is also a substring of one of the strings m stored in the generalized suffix tree. The amortized time complexity for Δ is constant.
Proof
An automaton state q in the generalized suffix tree automaton is a pair q = (i, l) consisting of the extended locus l and the current index i on the contracted edge incident to l. Recall that substrings referred to at l are denoted as intervals [first, last] of the strings stored.
To find q′ = Δ(q, a) starting at state (i, l), we search for a new extended locus l′ and a new integer offset i′, such that a corresponds to the transition stored at index position first + i of the substring stored at l. We use existing suffix links of the contracted loci and possible rescans if the character a does not match, until we have found a possible continuation for a. The extended locus of the edge and the reached substring index i′ determine the pair (l′, i′) for the next state. In case of reaching a suffix tree node that corresponds to a full suffix, we have encountered an accepting state q*. The returned value in q* is the substring corresponding to the path from the root to the new location. By amortization we establish the stated result.
For a dynamic learning algorithm, we interleave duplicate detection with A*. As A* already has full duplicate detection based on the states stored in the hash table, using substring pruning is optional. However, starting from this case, it is not difficult to apply dynamic substring pruning to other memory-restricted search algorithms. For the sake of simplicity, we also assume that the strings m stored in the generalized suffix tree are mutually substring free; that is, no string is a substring of another one.
The resulting algorithm is denoted as incremental duplicate learning A* (IDLA*). Its pseudo code is shown in Algorithm 10.2. As usual, the updates of the underlying data structures Open and Closed are hidden in the call of Improve, which mainly implements the relaxation step f(v) ← min{f(v), f(u) + w(u, v)}. The algorithm takes a state space problem and an (empty or preinitialized) dictionary automaton data structure D as input parameters. If, according to the assumed regularity of the search space, the detected tuples (d, c) are pruning pairs, we only have to store string d. If we are uncertain about the regularity, at the accepting state of the dictionary automaton for d we additionally store the proposed shortcut c and check whether or not it is actually applicable. Either way, inherited from the optimality of A*, the search algorithm outputs an optimal solution path.
The algorithm provides combined duplicate detection and usage. Before looking up a node in the hash table, we check whether we encounter an accepting state in D. Depending on the regularity of the search space, we might or might not check whether the proposed shortcut is valid. If D does not accept, then we use the hash table to find a synonym v′ of the successor node v of u. If we do not find a match, then we insert the new node into the ordinary search structures only.
If there is a counterpart v′ of v in the hash table for Closed (computed using the generic Lookup procedure; see Ch. 3), we prune the longest common prefix (lcp) of both generating paths to construct the pair (d, c). This is useful to enhance the generality and pruning power of (d, c): the shorter the strings, the larger the potential for pruning. The correctness argument is simple: If (d, c) is a pruning pair, then every common prefix extension is also a pruning pair. As a result, the reduced pair is inserted into the dictionary.
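A sketch of this reduction step follows; Python's built-in string comparison stands in for the ordering ≤l, and path lengths stand in for the weight function w, both of which are our own simplifications:

```python
def make_pruning_pair(path_to_v, path_to_v_prime):
    """Strip the longest common prefix of both generating paths and order the
    remainders so that the longer (or lexicographically larger) string
    becomes the duplicate d and the other one the shortcut c."""
    i = 0
    while (i < min(len(path_to_v), len(path_to_v_prime))
           and path_to_v[i] == path_to_v_prime[i]):
        i += 1
    d, c = path_to_v[i:], path_to_v_prime[i:]
    if (len(d), d) < (len(c), c):   # assumed unit costs and built-in string order
        d, c = c, d
    return d, c

assert make_pruning_pair("LR", "") == ("LR", "")      # inverse pair, shortcut epsilon
assert make_pruning_pair("DL", "LD") == ("LD", "DL")  # equal length: lex tie-break
assert make_pruning_pair("UUL", "ULU") == ("UL", "LU")  # common prefix U stripped
```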
For a simple example we once more consider the Gridworld. As a heuristic we choose the Manhattan distance estimate. Suppose that the algorithm is invoked with initial state (3, 3) and goal state (0, 0). Initially, we have Open = {((3, 3), 6)}, where the first entry denotes the state vector in the form of a Gridworld location and the second entry denotes its f-value, the path cost so far plus the heuristic estimate. Expanding the initial state yields the successor set {((4, 3), 8), ((3, 4), 8), ((3, 2), 6), ((2, 3), 6)}. Since no successor is hashed, all elements are inserted into Open and Closed. In the next step, u = ((2, 3), 6) is expanded and one successor v = ((3, 3), 8) turns out to have a counterpart v′ = ((3, 3), 6) in the hash table. This leads to the duplicate LR with corresponding shortcut ϵ, which is inserted into the dictionary automaton data structure. Next, ((3, 2), 6) is expanded. We establish another duplicate string DU with shortcut sequence ϵ. Additionally, we encounter the pruning pair (LD, DL), and so on.
10.1.2. Pruning Dead-Ends
In domains like the Fifteen-Puzzle, every configuration that we can reach by executing an arbitrary move sequence remains solvable. This is not always the case, though. In several domains there are actions that can never be reversed (doors may shut if we go through). This directly leads to situations where a solution is out of reach.
A simple case of a dead-end in Sokoban can be obtained by pushing a ball into a corner, from where it cannot be moved. The goal of dead-end pruning is to recognize and avoid such branches as early as possible. There is, of course, an exception: a ball in a corner is a dead-end if and only if the corner position is not a goal field. In the following we exclude this subtle issue from our considerations.
This section presents an algorithm that allows us to generate, memorize, and generalize situations that are dead-ends. The strategy of storing unsolvable subpositions is very general, but to avoid introducing extensive notation, we take the Sokoban problem (see Ch. 1) as a welcome case study.
Two simple ways of identifying some special cases of dead-end positions in Sokoban can be described by the two procedures IsolatedField and Stuck. The former checks for one or two connected, unoccupied fields that are surrounded by balls and not reachable by the man without pushes. If it is not possible to connect the squares by pushing the surrounding balls only "from the outside," the position is a dead-end. The latter procedure, Stuck, checks if a ball is free; that is, if it has either no horizontal neighbor or no vertical neighbor. Initially, all nonfree balls are placed into a queue. Then, ignoring all free balls, we iteratively try to remove balls that have become free from the queue, until it becomes empty or nothing changes anymore. If in one iteration every ball stays inside the queue, some of which are not located on goal fields, the position is a dead-end. In the worst case, Stuck needs O(n²) operations for n balls, but in most cases it turns out to operate in linear time. The correctness is given by the fact that to free a ball with the man from a given position, at least two of its neighbors have to be free.
Figure 10.9 illustrates the procedure and shows an example of a dead-end in Sokoban. Balls 1, 2, 5, and 6 cannot be moved, but balls 3, 4, 7, and 8 can. It is obvious that without detecting the dead-end in these balls early, this position can become the root of a very large search tree. Initially all balls are put in a queue. If a ball is free it is removed, otherwise it is enqueued again. After a few iterations we reach a fixpoint: Balls 1, 2, 5, and 6 cannot be set free. Such a position is a valid end state only if these balls are all placed on goal fields.
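A minimal sketch of Stuck under these definitions (the coordinate encoding and the blocked test are our own simplifications):

```python
def stuck(walls, balls, goals):
    """Fixpoint computation: repeatedly remove balls that are free, i.e.,
    have both horizontal or both vertical neighbors unblocked.
    Returns True if an unremovable ball sits off a goal field (dead-end)."""
    remaining = set(balls)

    def blocked(sq):
        return sq in walls or sq in remaining

    changed = True
    while changed:
        changed = False
        for (x, y) in list(remaining):
            horizontally_free = not blocked((x - 1, y)) and not blocked((x + 1, y))
            vertically_free = not blocked((x, y - 1)) and not blocked((x, y + 1))
            if horizontally_free or vertically_free:
                remaining.discard((x, y))
                changed = True
    return any(b not in goals for b in remaining)

corner_walls = {(0, 1), (1, 0)}
assert stuck(corner_walls, {(1, 1)}, goals=set())         # ball jammed in a corner
assert not stuck(corner_walls, {(1, 1)}, goals={(1, 1)})  # same square is a goal
assert not stuck(corner_walls, {(5, 5)}, goals=set())     # free ball is removed
```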
We could perform these checks prior to every node expansion. Note, however, that they provide only sufficient but not necessary conditions for dead-ends. We can strengthen the pruning capabilities based on the following two observations:
• A dead-end position can be recursively defined as being either a position that can be immediately recognized as a dead-end, or a nongoal position in which every move that is available leads to a dead-end position. If all successors of a position are dead-ends, the position itself is a dead-end as well.
• Many domains allow a generalization of dead-end patterns; we can define a relation ⊑ and write v ⊑ u if v is a subproblem of u. If v is not solvable, then neither is u. For example, a (partial) dead-end Sokoban position remains a dead-end if we add balls to it. Thus, ⊑ turns out to be a simple pattern matching relation.
Decomposing a problem into parts has been successfully applied in divide-and-conquer algorithms, and storing solutions to already solved subproblems is called memoization. The main difference to these approaches is that we concentrate on parts of a position to be retrieved. For Sokoban, decomposing a position should separate unconnected positions and remove movable balls to concentrate on the intrinsic patterns that are responsible for a dead-end. For example, in Figure 10.9 the position can be decomposed by splitting the two ball groups. The idea of decomposing a position is a natural generalization of the isolated field heuristic: A position with nongoal fields that the man can never reach is a likely dead-end. Take the graph G of all empty squares and partition G into connected components (using linear time). Examine each component separately. If every empty square can be reached by the man, the position is likely to be alive. If one component is a dead-end, and indeed often one is, the entire position itself is a dead-end.
Our aim is to learn and generalize dead-end positions when they are encountered in the search; some authors also refer to this aspect as bootstrapping. Each dead-end subproblem found and inserted into the subset dictionary (see Ch. 3) can be used immediately to prune the search tree and therefore to get deeper into the search tree.
Depending on the given resources and heuristics, decomposition could be invoked in every expansion step, every once in a while, or only in critical situations. The decomposition itself has to be carefully chosen to allow a fast dead-end detection and therefore a shallow search. A good trade-off has to be found: The characteristics responsible for the dead-end should, on the one hand, appear in only one component and, on the other hand, the problem parts should be far easier to analyze than the original one. For Sokoban, we can consider the partial position that consists of all balls that cannot be safely removed by the procedure Stuck.
Before we turn to the implementation, we study the search tree for the Sokoban puzzle shown in Figure 10.10. It demonstrates how the preceding two observations, in conjunction with a simple dead-end detection, can use bottom-up propagation to generalize more complex patterns.
The root position s is a dead-end, and although the IsolatedField procedure would immediately recognize this, we assume for the sake of the argument that only Stuck is used as the basic dead-end position detection. Initially, the subset dictionary is empty and the status of s is undefined. We expand u and generate its successor set Succ(u). Three of the five elements in Succ(u) are found to be a dead-end; the responsible balls are marked as filled circles. Marking can be realized through a Boolean array, with true denoting that the corresponding ball is relevant for the dead-end, and false otherwise. For usual transitions, a ball is a necessary constituent of a dead-end pattern if it is responsible for the unsolvability of one successor state; hence, unifying the relevant vectors is done by a Boolean or-operation. Generally, a state u is a dead-end if relevant(u) ⊑ u is a dead-end. Initially, relevant is the constantly false function for all states.
Since in this instance the status of two successors is not defined, overall solvability remains open. Therefore, the leftmost and rightmost nodes are both expanded (thin edges) and decomposed (bold edges). Decomposition is preferred, which is done by assigning a g-value of zero to the newborn child.
The expansion of the partial states is easier, and we find all of their successors dead-ended within one more step. As the root is a dead-end, we want to find a minimal responsible pattern by means of backpropagation. By unification, we see that all balls of the expanded nodes in the second-to-last level are relevant. Both positions are newly found dead-ends and are inserted into the subset dictionary. Since we have now reached a decomposition node, it is sufficient to copy the relevant part to the parent; the failure of one component implies the failure of the complete state. All generated nodes in Succ(u) are dead-ends, and we can further propagate our knowledge bottom-up by forming the disjunction of the successors' relevance information. Finally, the root is also inserted into the subset dictionary.
The corresponding pseudo code for this kind of bottom-up propagation is given in Algorithm 10.3. We assume that in addition to the vector of flags relevant, each node maintains a counter succ of the number of nondead-end children.
The main procedure Abstraction-Decomposition-A* (see Alg. 10.4) carries out the search for solvability (of possibly decomposed states) and for an optimal solution (in the main search tree) in an interleaved fashion; a flag decomposed(u) remembers which of the two modes is currently applied to state u. Moreover, solvable(u) keeps track of its status: true, false, or unknown.
Like in A*, until the priority queue gets empty, the node with the best f-value is selected. If it is a goal node and we are in the top search level, we have found the solution to the overall problem and can terminate; otherwise, the partial position is obviously solvable. For nongoal nodes, we try to establish their solvability by applying a simple dead-end check (e.g., using procedure Stuck), or by recognizing a previously stored pattern. Unsolvable partial positions are used to find larger dead-end patterns by backpropagation, as described earlier.
Since we are only interested in solvability of decomposed nodes, we do not have to search the subproblems all the way to the end. If we have enough information for a subproblem to be alive, we can back up this knowledge in the search tree in an analogous way. We might also allow a one-sided error without harming overall admissibility (i.e., let the alive procedure be overly optimistic).
The main difference to the usual A* search is that for a node expansion of u, in addition to the set of successor nodes Succ(u), decomposed positions Δ(u) are generated and then inserted into Open with g set to zero to distinguish the new roots from other search tree nodes. The ordinary successors are inserted, dropped, and reopened as usual in A*. Instead of using a lower-bound estimate h to solve the puzzle, in the learning process we can also try to actively focus the search to produce dead-ends.
Efficiently storing and searching dead-end (sub)positions is central to the whole learning process. We have called the abstract data structure providing insertion and lookup of substructures a pattern store. Possible implementations of such a dictionary data structure have been provided in Chapter 3.
10.1.3. Penalty Tables
In this section, an approach related to Abstraction-Decomposition-A* is described that tries to generalize, store, and reuse minimal dead-end patterns as well, but uses auxiliary, separate pattern searches instead of a dedicated decomposition/generalization procedure.
The approach can be conveniently described for the Sokoban domain. We refer to two different mazes: the original maze (i.e., the maze containing all the balls of the current position) and the test maze, which will be used for pattern searches. The pattern search algorithm works iteratively and starts when a dead-end position has been found. Only the last ball moved is put into the test maze, and the algorithm tries to solve this simplified problem. If it succeeds, another A* test search is triggered, with some other ball of the original maze added; those balls are preferred that interfere with the path of the man or of a ball in the previous solution. If no solution is found, a dead-end is detected. An example is provided in Figure 10.11. Since this pattern need not be minimal, successive iterations try to remove one ball at a time while preserving the dead-end property.
To fine-tune the trade-off between main and auxiliary search, a number of parameters control the algorithm, such as the maximum number of expansions maxn in each individual pattern search, the maximal pattern size maxb, and the frequency of pattern searches. One way to decide on the latter is to trigger a pattern search for a position if the number of its descendants explored in the main search exceeds a certain threshold.
To improve search efficiency, we can make some further simplifications to the A* procedure: Since the pattern search is only concerned with solvability, as a heuristic it suffices to consider each ball's shortest distance to any goal field rather than computing a minimal matching, as usual. Moreover, balls on goal fields that are reachable and free can be immediately removed.
We can regard a dead-end subset dictionary as a table that associates each matching position with a heuristic value of ∞. This idea leads to the straightforward extension of dead-end detection to
penalty tables, which maintain a corrective amount by which the simpler, procedural heuristic differs from the true goal distance. When a partial position has actually turned out to be solvable, the found solution distance might be larger than the standard heuristic indicates; in other words, the latter underestimates. Multiple corrections from more than one applicable pattern can be added if the patterns do not overlap. Again, possible implementations for penalty tables are subset dictionaries as discussed in
Chapter 3.
To apply pattern search in other domains, two properties are required: reducibility of the heuristic and splittability of the heuristic with regard to the state vector. These conditions are defined as follows. A state description S (viewed as a set of values) is called splittable if for any two disjoint subsets S1 and S2 of S we have δ(S) = δ(S1) + δ(S2) + δ(S \ (S1 ∪ S2)) + C; this means that the solution of S is at least as long as both subsolutions added. The third term accounts for additional steps that might be needed for conditions that are neither in S1 nor in S2, and C ≥ 0 stands for subproblem interactions. A heuristic is reducible if h(S′) ≤ h(S) for S′ ⊆ S. A heuristic is splittable if for any two disjoint subsets S1 and S2 of S we have h(S) = h(S1) + h(S2) + h(S \ (S1 ∪ S2)). The minimum matching heuristic introduced in Chapter 1 is not splittable, but the sum of the distances to the closest goal results in a splittable heuristic.
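A small numeric sketch of this contrast (ball positions and goal squares invented for illustration): the closest-goal heuristic is a plain sum over the balls, so it splits exactly over any partition of the balls, whereas minimum matching can lose value when a subproblem is solved in isolation.

```python
from itertools import permutations

def manhattan(a, b):
    return abs(a[0] - b[0]) + abs(a[1] - b[1])

def closest_goal_h(balls, goals):
    # splittable: a plain sum over the balls
    return sum(min(manhattan(b, g) for g in goals) for b in balls)

def matching_h(balls, goals):
    # minimum-cost assignment of balls to distinct goals (brute force)
    return min(sum(manhattan(b, g) for b, g in zip(balls, perm))
               for perm in permutations(goals, len(balls)))

goals = [(0, 0), (5, 5)]
S = [(0, 1), (1, 0)]        # both balls are closest to goal (0, 0)
S1, S2 = [S[0]], [S[1]]

# the closest-goal heuristic splits exactly over the partition of S
assert closest_goal_h(S, goals) == closest_goal_h(S1, goals) + closest_goal_h(S2, goals)

# minimum matching does not: in isolation each ball claims goal (0, 0)
print(matching_h(S, goals), matching_h(S1, goals) + matching_h(S2, goals))  # 10 2
```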
If a heuristic is admissible, then for S′ ⊆ S we additionally have a nonnegative gap δ(S′) − h(S′) ≥ 0. We define the penalty with respect to S′ as pen(S′) = δ(S′) − h(S′).
Theorem 10.4
(Additive Penalties) Let h be an admissible, reducible, and splittable heuristic; S be a state set; and S1, S2 ⊆ S be disjoint subsets of S. Then we have pen(S) ≥ pen(S1) + pen(S2).
Proof
Using splittability, admissibility of h, and the definition of penalties, we deduce pen(S) = δ(S) − h(S) = δ(S1) + δ(S2) + δ(S \ (S1 ∪ S2)) + C − h(S1) − h(S2) − h(S \ (S1 ∪ S2)) = pen(S1) + pen(S2) + pen(S \ (S1 ∪ S2)) + C ≥ pen(S1) + pen(S2), since admissibility gives pen(S \ (S1 ∪ S2)) ≥ 0 and C ≥ 0.
As a corollary, the improved heuristic h′(S) = h(S) + pen(S1) + pen(S2) is admissible since h′(S) ≤ h(S) + pen(S) = δ(S). In other words, assuming reducibility and splittability for the domain and the heuristic, the penalties of nonoverlapping patterns can be added without affecting admissibility.
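A minimal sketch of a penalty table along these lines (names and the greedy pattern selection are our own, not the text's): stored penalties of pairwise disjoint applicable patterns are added on top of the base heuristic, and a dead-end pattern carries an infinite penalty.

```python
def improved_h(state, base_h, penalty_table):
    """h'(S): add penalties of pairwise disjoint patterns matching state.
    penalty_table maps frozenset patterns to pen = delta - h >= 0."""
    h = base_h(state)
    used = set()
    # greedily pick nonoverlapping applicable patterns, largest penalty first
    # (optimal selection would be a weighted set-packing problem)
    for pattern, pen in sorted(penalty_table.items(), key=lambda kv: -kv[1]):
        if pattern <= state and not (pattern & used):
            h += pen
            used |= pattern
    return h

table = {
    frozenset({(1, 1), (1, 2)}): 4,      # solvable, but 4 moves worse than h
    frozenset({(3, 3)}): float("inf"),   # a dead-end pattern
}
state = frozenset({(1, 1), (1, 2), (5, 5)})
print(improved_h(state, lambda s: len(s), table))  # 3 + 4 = 7
```

The greedy choice keeps the estimate admissible because any set of disjoint penalties may be added; it merely need not find the largest such sum.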
Pattern search was one of the most effective techniques for solving many Sokoban instances. It has also been used to improve the performance of search in the sliding-tile puzzle domains. In the Fifteen-Puzzle the savings in the number of nodes amount to about three to four orders of magnitude compared to the Manhattan distance heuristic.
10.1.4. Symmetry Reduction
For multiple lookups in pattern databases (see
Ch. 4), we already utilized the power of state space symmetries for reducing the search effort. For every physical symmetry valid in the goal, we apply it to the current state and obtain another estimate, which can be larger than the original lookup and thus lead to stronger pruning. In this section, we expand on this observation and embed it in a more general context.
As a motivating example of the power of state space reduction through existing symmetries in the problem description, consider the Arrow Puzzle problem, in which the task is to transform a given arrangement of up- and down-pointing arrows into a goal arrangement, where the set of allowed actions is restricted to reversing two adjacent arrows at a time. One important observation for a fast solution to the problem is that the order of arrow reversals does not matter. This exploits a form of action symmetry that is inherent in the problem. Moreover, the two outermost arrows do not participate in an optimal solution, and at least three reversals are needed. Therefore, applying the reversals at the arrow index pairs (2,3), (3,4), and (4,5) in any order leads to an optimal solution for the problem.
For state space reduction with respect to symmetries, which we expect to be provided by the domain expert, we use equivalence relations. Let P = (S, A, s, T) be a state space problem as introduced in Chapter 1.
Definition 10.2
(Equivalence Classes, Quotients, Congruences) A relation ∼ ⊆ S × S is an equivalence relation if the following three conditions are satisfied: ∼ is reflexive (i.e., for all u in S we have u ∼ u); ∼ is symmetric (i.e., for all u and v in S, u ∼ v implies v ∼ u); and ∼ is transitive (i.e., for all u, v, and w in S, u ∼ v and v ∼ w imply u ∼ w). Equivalence relations naturally yield equivalence classes [u] = {v ∈ S | u ∼ v} and a (disjoint) partition of the search space into equivalence classes S = [u1] ∪ ⋯ ∪ [uk] for some k. The state space (S/∼) = {[u] | u ∈ S} is called the quotient state space. An equivalence relation ∼ on S is a congruence relation on P if for all u, u′ ∈ S with u ∼ u′ and each action a ∈ A with a(u) = v, there is a v′ ∈ S with v ∼ v′ and an action a′ ∈ A with a′(u′) = v′. Any congruence relation induces a quotient state space problem (P/∼) = ((S/∼), (A/∼), [s], {[t] | t ∈ T}). In (P/∼) an action [a] ∈ (A/∼) is defined as follows. We have [a]([u]) = [v] if and only if there is an action a ∈ A mapping some u′ ∈ [u] to some v′ ∈ [v].
Note that the notation (A/∼) is sloppy, as it suggests that the congruence relation ∼ partitions the action space A, whereas ∼ is in fact a relation on states.
Definition 10.3
(Symmetry) A bijection ϕ: S → S is said to be a symmetry if ϕ(s) = s, ϕ(t) ∈ T for all t ∈ T, and for any successor v of u there exists an action from ϕ(u) to ϕ(v). Any set Y of symmetries generates a subgroup g(Y), called a symmetry group. The subgroup g(Y) induces an equivalence relation ∼Y on states, defined as u ∼Y v if and only if ϕ(u) = v for some ϕ ∈ g(Y). Such an equivalence relation is called a symmetry relation on P induced by Y.
Any symmetry relation on P is a congruence on P. Moreover, u is reachable from s if and only if [u]Y is reachable from [s]Y. This reduces the search for goal t ∈ T to the reachability of state [t]Y.
To explore a state space with respect to a state symmetry, we use a function Canonicalize, which determines a representative element of each equivalence class, each time a new successor node is generated. Note that finding symmetries automatically is not easy, since it relates to the computational problem of graph isomorphism, which is hard.
If a symmetry is known, the implementation of symmetry detection is immediate, since both the Open set and the Closed set can simply maintain the outcome of the Canonicalize function.
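A minimal sketch of Canonicalize, assuming (our choice of example) an infinite Gridworld whose goal sits at the origin, so that the eight reflections and rotations of the grid are symmetries: the representative of a class is simply the lexicographically smallest image of the state under the group.

```python
def canonicalize(state, group):
    """Representative of state's equivalence class: the lexicographically
    smallest image under the symmetry group (which contains the identity)."""
    return min(phi(state) for phi in group)

# the eight symmetries of the grid fixing the origin (dihedral group D4)
group = [
    lambda p: ( p[0],  p[1]),   # identity
    lambda p: (-p[0],  p[1]),   # reflect in y-axis
    lambda p: ( p[0], -p[1]),   # reflect in x-axis
    lambda p: (-p[0], -p[1]),   # point reflection
    lambda p: ( p[1],  p[0]),   # diagonal flip
    lambda p: (-p[1],  p[0]),
    lambda p: ( p[1], -p[0]),
    lambda p: (-p[1], -p[0]),
]

# Open and Closed store only canonical representatives:
closed = set()
for u in [(2, -3), (-3, 2), (3, -2)]:
    closed.add(canonicalize(u, group))
print(closed)  # all three equivalent states collapse to {(-3, -2)}
```

Taking the minimum over the full group (rather than over a mere generating set) is what guarantees that equivalent states map to the same representative.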
One particular technique for finding symmetries is as follows. In some search problems states correspond to (variable) valuations and goals are specified with partial valuations. Actions are often defined independently of the assignment of values to the variables. For example, an important feature of (parametric) planning domains is that actions are generic with respect to bindings of the parameters to the objects.
As a consequence, a symmetry in the form of a permutation of variables in the current and in the goal state has no effect on the set of parametric actions.
Let V be the set of variables with |V| = n. There are n! possible permutations of V, a number likely too large to be checked for symmetries. Taking into account information on variable types reduces the number of possible permutations of the variables to t1! ⋅ ⋯ ⋅ tk!, where ti is the number of variables with type i, i ∈ {1, …, k}. Even for moderately sized domains, this number is likely still too large. To reduce the number of potential symmetries to a tractable size, we further restrict symmetries to variable transpositions [v ↔ v′] (i.e., a permutation of only two variables v, v′ ∈ V), for which we have at most n(n − 1)/2 ∈ O(n2) candidates. Using type information this number reduces to t1(t1 − 1)/2 + ⋯ + tk(tk − 1)/2, which often is a practical value for symmetry detection.
We denote the set of (typed) variable transpositions by M. The outcome of a transposition of variables (v, v′) ∈ M applied to a state u = (v1, …, vn), written as u[v ↔ v′], is defined as (v′1, …, v′n), with v′i = vi if vi ∉ {v, v′}, v′i = v′ if vi = v, and v′i = v if vi = v′. It is simple to extend the definition to parametric actions.
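A short sketch of applying a transposition, under one concrete reading of the definition in which transposing two variables exchanges their values in the state vector (the example values are invented):

```python
def transpose(u, i, j):
    """Apply the variable transposition [v_i <-> v_j] to state u,
    read as exchanging the values of variables i and j."""
    s = list(u)
    s[i], s[j] = s[j], s[i]
    return tuple(s)

u = ("kitchen", "garden", "garden")   # locations of three persons
assert transpose(u, 0, 1) == ("garden", "kitchen", "garden")
# the two identities noted in the text:
assert transpose(u, 0, 1) == transpose(u, 1, 0)
assert transpose(transpose(u, 0, 1), 0, 1) == u
```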
In the example depicted in Figure 10.12, the variables denote the locations of persons. We observe that u[v ↔ v′] = u[v′ ↔ v] and u[v ↔ v′][v ↔ v′] = u.
Theorem 10.5
(Time Complexity for Object Symmetry Detection) The worst-case complexity to determine the set of all variable transpositions for which a problem P = (S, A, s, T) is symmetric is bounded by O(|M| ⋅ n) operations.
Proof
A state space problem P = (S, A, s, T) is symmetric with respect to the transposition [v ↔ v′], abbreviated as P[v ↔ v′], if s[v ↔ v′] = s and T[v ↔ v′] = T. The brute-force time complexity for computing u[v ↔ v′] is of order O(n), so checking all transpositions in M requires O(|M| ⋅ n) operations. Moreover, by precomputing a lookup table containing the index of u′ = u[v ↔ v′] for all (v, v′) ∈ M, this complexity can be reduced to one lookup per transposition.
For the example problem, the goal contains the three target locations of dan, ernie, and scott. In the initial state the problem contains no symmetry, since s[scott ↔ ernie] ≠ s and T[dan ↔ ernie] ≠ T. In forward-chaining search, which explores the state space in a forward direction starting with the initial state s, symmetries that are present in s may vanish or reappear. Goals, however, do not change over time. Therefore, in an offline computation we can refine the set of symmetries M to M′ = {(v, v′) ∈ M | T[v ↔ v′] = T}. Usually, |M′| is much smaller than |M|. For the example problem instance, the only variable symmetry left in M′ is the transposition of scott and ernie. The remaining task is to compute the set M″(u) = {(v, v′) ∈ M′ | u[v ↔ v′] = u} of symmetries that are present in the current state u. In the initial state s of the example problem, we have M″(s) = ∅, but once scott and ernie share the same location in a state u, this variable pair is included in M″(u). With respect to Theorem 10.5, this additional restriction reduces the time complexity to detect all remaining variable symmetries to O(|M′| ⋅ n).
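The offline refinement of M to M′ and the online computation of M″(u) can be sketched as follows (reading a transposition as exchanging the values of the two variables; all instance data here is invented to mirror the dan/ernie/scott narrative):

```python
def transpose(u, i, j):
    # exchange the values of variables i and j
    s = list(u)
    s[i], s[j] = s[j], s[i]
    return tuple(s)

def refine_offline(M, T):
    # M': transpositions under which the (fixed) goal is invariant
    return [(i, j) for (i, j) in M if transpose(T, i, j) == T]

def current_symmetries(u, M_prime):
    # M''(u): transpositions in M' that also leave the current state fixed
    return [(i, j) for (i, j) in M_prime if transpose(u, i, j) == u]

# variables 0, 1, 2 hold the locations of three persons
M = [(0, 1), (0, 2), (1, 2)]
T = ("office", "cafe", "cafe")        # persons 1 and 2 share a goal location
M_prime = refine_offline(M, T)        # only (1, 2) survives
s = ("home", "bar", "gym")
print(current_symmetries(s, M_prime))   # [] : no symmetry in the initial state
u = ("home", "bar", "bar")
print(current_symmetries(u, M_prime))   # [(1, 2)] once they share a location
```

Since only |M′| transpositions are tested per state, this matches the O(|M′| ⋅ n) bound above.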
If a state space problem with current state u ∈ S is symmetric with respect to the variable transposition [v ↔ v′], then some applications of actions can be neglected, significantly reducing the branching factor. The pruning set A′(u) of a set of applicable actions A(u) retains only those actions that either have no symmetric counterpart in A(u) or are of minimal index among their symmetric counterparts.
Theorem 10.6
(Optimality of Symmetry Pruning) Reducing the action set A(u) to A′(u) for all states u during the exploration of state space problem P = (S, A, s, T) preserves optimality.
Proof
Suppose that for some expanded state u, reducing the action set A(u) to A′(u) during the exploration of the state space problem P = (S, A, s, T) does not preserve optimality. Furthermore, let u be the state with this property that is maximal in the search order. Then there is a solution π = (a1, …, ak) in Pu = (S, A, u, T) with associated state sequence (u0 = u, …, uk), uk ∈ T. Obviously, ai ∈ A(ui − 1) for i ∈ {1, …, k}. By the definition of the pruning set there exists a′1 = a1[v ↔ v′] ∈ A′(u0) of minimal index that is applicable in u0.
Since Pu = (S, A, u, T) coincides with Pu[v ↔ v′] = (S, A, u[v ↔ v′] = u, T[v ↔ v′] = T), we obtain a solution (a1[v ↔ v′], …, ak[v ↔ v′]) with state sequence (u0[v ↔ v′] = u0, u1[v ↔ v′], …, uk[v ↔ v′] ∈ T) that reaches the goal T with the same cost. This contradicts the assumption that reducing the set A(u) to A′(u) does not preserve optimality.