5.6.1. Asymptotic Branching Factors
Consider
Rubik's Cube with the two pruning rules described in
Section 1.7.2. Recall that we divided the faces of the cube into two classes; a twist of a first face can be followed by a twist of any second face, but a twist of a second face cannot be followed immediately by a twist of the opposite first face. We call nodes where the last move was a twist of a first face
type-1 nodes, and those where it was a twist of a second face
type-2 nodes. The branching factors of these two types are 15 and 12, respectively, which also gives us bounds on the asymptotic branching factor.
To determine the asymptotic branching factor exactly, we need the proportion of type-1 and type-2 nodes. Define the
equilibrium fraction of type-1 nodes as the number of type-1 nodes at a given depth, divided by the total number of nodes at that depth, in the limit of large depth. The fraction of type-2 nodes is one minus the fraction of type-1 nodes. The equilibrium fraction is not obvious, because the two node types generate children at different rates: each type-1 node generates six type-1 nodes and nine type-2 nodes as children, the difference being that you can't twist the same first face again. Each type-2 node generates six type-1 nodes and six type-2 nodes, since you can't twist the opposite first face next, or the same second face again. Thus, the number of type-1 nodes at a given depth is six times the number of type-1 nodes at the previous depth, plus six times the number of type-2 nodes at the previous depth. The number of type-2 nodes at a given depth is nine times the number of type-1 nodes at the previous depth, plus six times the number of type-2 nodes at the previous depth.
Let f1 be the fraction of type-1 nodes, and f2 = 1 − f1 the fraction of type-2 nodes at a given depth. If n is the total number of nodes at that depth, then there will be f1·n type-1 nodes and f2·n type-2 nodes at that depth. In the limit of large depth, the fraction of type-1 nodes will converge to the equilibrium fraction, and remain constant. Thus, at large depth,

f1 = (6f1·n + 6f2·n) / (15f1·n + 12f2·n) = 6 / (12 + 3f1).

Cross multiplying gives us the quadratic equation 3f1^2 + 12f1 − 6 = 0, which has a positive root at f1 = √6 − 2 ≈ 0.44949. This gives us an asymptotic branching factor of 15f1 + 12(1 − f1) = 12 + 3f1 = 6 + 3√6 ≈ 13.34847.
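This fixed point can be verified numerically by iterating the two-type recurrence until the fractions settle; below is a small sketch (the root is treated as a single type-1 node, an arbitrary choice that does not affect the limit):

```python
# Iterate the Rubik's Cube two-type recurrence:
#   n1(d) = 6*n1(d-1) + 6*n2(d-1)
#   n2(d) = 9*n1(d-1) + 6*n2(d-1)
n1, n2 = 1, 0  # treat the root as a single type-1 node (does not affect the limit)
for _ in range(60):
    n1, n2 = 6 * n1 + 6 * n2, 9 * n1 + 6 * n2

f1 = n1 / (n1 + n2)           # equilibrium fraction of type-1 nodes
print(round(f1, 5))           # ~0.44949, i.e., sqrt(6) - 2
print(round(12 + 3 * f1, 5))  # asymptotic branching factor ~13.34847, i.e., 6 + 3*sqrt(6)
```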
In general, this analysis produces a system of simultaneous equations. For another example, consider the
Five-
Puzzle, the 2 × 3 version of the well-known sliding-tile puzzles shown in
Figure 5.4 (left). In this problem, the branching factor of a node depends on the blank position. The position types are labeled
s and
c, representing side and corner positions, respectively (see
Fig. 5.4, middle). We don't generate the parent of a node as one of its children, to avoid duplicate nodes representing the same state. This requires keeping track of both the current and previous blank positions. Let
cs denote a node where the blank is currently in a side position, and the last blank position was a corner position. Define
ss,
sc, and
cc nodes analogously. Since
cs and
ss nodes have two children each, and
sc and
cc nodes have only one child each, we have to know the equilibrium fractions of these different types of nodes to determine the asymptotic branching factor.
Figure 5.4 (right) shows the different types of states, with arrows indicating the type of children they generate. For example, the double arrow from
ss to
sc indicates that each ss node generates two sc nodes at the next level.
Let Nt(d) be the number of nodes of type t at depth d in the search tree. Then we can write the following recurrence relations directly from the graph in Figure 5.4:

Nss(d) = Ncs(d − 1)
Ncs(d) = Ncc(d − 1)
Ncc(d) = Nsc(d − 1)
Nsc(d) = 2·Nss(d − 1) + Ncs(d − 1)

For example, the last equation comes from the fact that there are two arrows from ss to sc, and one arrow from cs to sc.
The initial conditions are that the first move either generates an
ss node and two
sc nodes, or a
cs node and a
cc node, depending on whether the blank starts in a side or corner position, respectively.
A simple way to compute the branching factor is to numerically compute the values of successive terms of these recurrences, until the relative frequencies of different state types converge. Let fss, fcs, fcc, and fsc be the number of nodes of each type at a given depth, divided by the total number of nodes at that depth. After a hundred iterations, we get the equilibrium fractions fss ≈ 0.1501, fcs ≈ 0.2031, fcc ≈ 0.2749, and fsc ≈ 0.3719. Since cs and ss states generate two children each, and the others generate one child each, the asymptotic branching factor is 2(fss + fcs) + (fcc + fsc) ≈ 1.3532. Alternatively, we can simply compute the ratio between the total nodes at two successive depths to get the branching factor. The running time of this algorithm is the product of the number of different types of states (e.g., four in this case) and the search depth. In contrast, searching the actual tree to depth 100 would generate over 10^13 states.
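The iteration is a few lines of Python; the recurrences and initial conditions below are the ones just described, for a blank starting in a side position:

```python
# Five-Puzzle recurrences with predecessor elimination:
#   ss(d) = cs(d-1); cs(d) = cc(d-1); cc(d) = sc(d-1); sc(d) = 2*ss(d-1) + cs(d-1)
ss, cs, cc, sc = 1, 0, 0, 2  # depth 1: blank started in a side position
for _ in range(100):
    ss, cs, cc, sc = cs, cc, sc, 2 * ss + cs

total = ss + cs + cc + sc
f_ss, f_cs, f_cc, f_sc = ss / total, cs / total, cc / total, sc / total
# ss and cs nodes have two children each; sc and cc nodes have one:
b = 2 * (f_ss + f_cs) + (f_sc + f_cc)
print(round(b, 4))  # ~1.3532
```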
To compute the exact branching factor, we assume that the fractions eventually converge to constant values. This generates a set of equations, one from each recurrence. Let b represent the asymptotic branching factor. This allows us to rewrite the recurrences as the following set of equations, the last of which constrains the fractions to sum to one:

b·fss = fcs
b·fcs = fcc
b·fcc = fsc
b·fsc = 2fss + fcs
fss + fcs + fcc + fsc = 1

Repeated substitution to eliminate variables reduces this system of five equations in five unknowns to the single equation b^4 = b + 2, with a solution of b ≈ 1.3532. In general, the degree of the polynomial will be the number of different types of states. The
Fifteen-
Puzzle without predecessor elimination has three types of states: corner or c-nodes with node branching factor 2, side or s-nodes with node branching factor 3, and middle or m-nodes with node branching factor 4.
Figure 5.5 shows the type transition graphs for the Eight-Puzzle, Fifteen-Puzzle, and Twenty-Four-Puzzle.
For the
Twenty-
Four-
Puzzle, however, the search trees below two different side positions or two different middle positions may differ. For this case we need six classes, with the blank at position 1, 2, 3, 7, 8, or 13 according to the tile labeling in Figure 1.10. In the general case the number of different node branching classes in the (n^2 − 1)-Puzzle (without predecessor elimination) is ⌈n/2⌉(⌈n/2⌉ + 1)/2, which is about n^2/8. This still compares well to a partition according to the n^2 equivalence classes in the first factorization (savings of a factor of about eight) and, of course, to the (n^2)!/2 states in the overall search space (exponential savings).
Let F be the vector of node frequencies and P the transpose of the adjacency matrix of the state-type graph G. Then the underlying mathematical issue turns out to be an eigenvalue problem. Transforming P·F = b·F leads to (P − b·I)·F = 0 for the identity matrix I. The solutions for b are the roots of the characteristic equation det(P − b·I) = 0, where det is the determinant of the matrix. Since det(A) = det(A^T), the transposition of the equivalence graph matrix preserves the value of b. For the case of the Fifteen-Puzzle with corner, side, and middle nodes, we have

det(P − b·I) = det
( −b      1       0   )
(  2    1 − b     2   )
(  0      1     2 − b )
= 0,

which simplifies to b^3 − 3b^2 − 2b + 4 = 0, or (b − 1)(b^2 − 2b − 4) = 0. The solutions to this equation are 1, 1 + √5, and 1 − √5. The value 1 + √5 ≈ 3.2361 matches experimental data for the asymptotic branching factor.
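The characteristic equation can be checked numerically. The polynomial below is the hand-expanded determinant for the corner/side/middle transition structure (a reconstruction consistent with the roots cited in the text); the sketch verifies the roots and reports the dominant one:

```python
from math import sqrt

def charpoly(b):
    # b**3 - 3*b**2 - 2*b + 4, the expanded characteristic polynomial
    # (up to sign) of the corner/side/middle matrix for the 4x4 puzzle
    return b ** 3 - 3 * b ** 2 - 2 * b + 4

roots = (1.0, 1 + sqrt(5), 1 - sqrt(5))
for r in roots:
    assert abs(charpoly(r)) < 1e-9  # all three are roots

print(round(max(roots), 4))  # dominant root ~3.2361, the asymptotic branching factor
```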
The equation F(d) = P·F(d − 1) can be unrolled to F(d) = P^d·F(0). We briefly sketch how to compute P^d for large values of d. Matrix P is diagonalizable if there exists an invertible matrix C and a diagonal matrix Q with P = C·Q·C^(−1). This simplifies the calculation of P^d, since we have P^d = (C·Q·C^(−1))^d = C·Q^d·C^(−1) (the remaining terms C^(−1)·C cancel). By the diagonal shape of Q, the value of Q^d is obtained by simply taking the matrix elements to the power of d. These elements are the eigenvalues of P.
For the Fifteen-Puzzle the eigenvalues are 1 + √5, 1 − √5, and 1, so Q = diag(1 + √5, 1 − √5, 1), and a basis-transformation matrix C (one valid choice, with the corresponding eigenvectors of P as columns, in the order corner, side, middle) is

C =
(    2         2        1 )
( 2 + 2√5   2 − 2√5     1 )
( 3 + √5     3 − √5    −1 )

with C^(−1) obtained by inversion. The vector of node counts is F(d) = C·Q^d·C^(−1)·F(0), such that the exact total number of nodes in depth d is the sum of the components of F(d). The number of corner nodes (c(d)), the number of side nodes (s(d)), and the number of middle nodes (m(d)) grow as expected. The largest eigenvalue, 1 + √5, dominates the growth of the search tree in the limit for large values of d.
When incorporating pruning into the search, symmetry of the underlying graph structure may be affected. We consider the Eight-Puzzle. The adjacency matrix for predecessor elimination now consists of four classes: cs, sc, ms, and sm, where the class ij indicates that the predecessor of a j-node in the search tree is an i-node, and m stands for the center position.
In this case we cannot infer diagonalizability over the set of real numbers. Fortunately, we know that the branching factor is a positive real value, since the iteration process is real. Therefore, we may perform all calculations to predict the search tree growth with complex numbers, over which the characteristic polynomial factorizes into linear terms. The branching factor and the search tree growth can be calculated analytically, and the iteration process eventually converges.
In the example, the set of (complex) eigenvalues is √3, −√3, i√2, and −i√2, the roots of the characteristic equation b^4 = b^2 + 6. Therefore, the asymptotic branching factor is √3 ≈ 1.732. The vector F(d) is again equal to C·Q^d·C^(−1)·F(0), now with complex entries in C and Q. Finally, the total number of nodes in depth d is N(d) = (6·3^(d/2) − (−2)^(d/2))/5 for even d, and N(d) = 2·N(d − 1) for odd d. For small values of d the value N(d) equals 1, 2, 4, 8, 10, 20, 34, 68, 94, 188, and so on.
Table 5.3 gives the even- and odd-depth branching factors of the (n^2 − 1)-Puzzle up to 10 × 10. As n goes to infinity, all the values converge to 3, the branching factor of an infinite sliding-tile puzzle, since most positions have four neighbors, one of which was the previous blank position.
Table 5.3 The asymptotic branching factor for the (n^2 − 1)-Puzzle with predecessor elimination. The last column is the geometric mean of the even- and odd-depth values (the square root of their product), the best estimate of the overall branching factor.
n | n^2 − 1 | Even Depth | Odd Depth | Mean |
---|---|---|---|---|
3 | 8 | 1.5 | 2 | 1.73205 |
4 | 15 | 2.1304 | 2.1304 | 2.1304 |
5 | 24 | 2.30278 | 2.43426 | 2.36761 |
6 | 35 | 2.51964 | 2.51964 | 2.51964 |
7 | 48 | 2.59927 | 2.64649 | 2.62277 |
8 | 63 | 2.69590 | 2.69590 | 2.69590 |
9 | 80 | 2.73922 | 2.76008 | 2.74963 |
10 | 99 | 2.79026 | 2.79026 | 2.79026 |
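As a check on the Mean column, the geometric mean of the even- and odd-depth values can be recomputed directly (values copied from the table):

```python
from math import sqrt

# (even-depth, odd-depth) branching factors from Table 5.3
rows = {3: (1.5, 2.0), 4: (2.1304, 2.1304), 5: (2.30278, 2.43426),
        7: (2.59927, 2.64649), 9: (2.73922, 2.76008)}
for n, (even, odd) in sorted(rows.items()):
    print(n, round(sqrt(even * odd), 5))  # n=3 gives sqrt(3) ~ 1.73205
```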
In some problem spaces, every node has the same branching factor. In other spaces, every node may have a different branching factor, requiring exhaustive search to compute the average branching factor. The technique described earlier determines the size of a brute-force search tree in intermediate cases, where there are a small number of different types of states, the generation of which follows a regular pattern.
5.6.2. IDA* Search Tree Prediction
We measure the time complexity of IDA* by the number of node expansions. If a node can be expanded and its children evaluated in constant time, the asymptotic time complexity of IDA* is simply the number of node expansions. Otherwise, it is the product of the number of node expansions and the time to expand a node. Given a consistent heuristic function, both A* and IDA* must expand all nodes of which the total cost, f(u) = g(u) + h(u), is less than c, the cost of an optimal solution. Some nodes with the optimal solution cost may be expanded as well, until a goal node is chosen for expansion, and the algorithms terminate. In other words, f(u) < c is a sufficient condition for A* or IDA* to expand node u, and f(u) ≤ c is a necessary condition. For a worst-case analysis, we adopt the weaker necessary condition.
An easy way to understand the node expansion condition is that any search algorithm that guarantees optimal solutions must continue to expand every partial solution path as long as its cost is smaller than the cost of an optimal solution. On the final iteration of IDA*, the cost threshold will equal c, the cost of an optimal solution. In the worst case, IDA* will expand all nodes u of which the cost f(u) = g(u) + h(u) ≤ c. We will see later that this final iteration determines the overall asymptotic time complexity of IDA*.
We characterize a heuristic function by the distribution of heuristic values over the nodes in the problem space. In other words, we need to know how many states have heuristic value 0, how many have heuristic value 1, how many have heuristic value 2, and so forth. Equivalently, we can specify this distribution by a set of parameters D(h), which is the fraction of the total states of the problem whose heuristic value is less than or equal to h. We refer to this set of values as the overall distribution of the heuristic.
D(h) can also be defined as the probability that a state chosen randomly and uniformly from all states in the problem space has a heuristic value less than or equal to h. The heuristic value h can range from zero to infinity, but for all values of h greater than or equal to the maximum value of the heuristic, D(h) = 1.
Table 5.4 shows the overall distribution for the Manhattan distance heuristic on the
Five-
Puzzle.
Table 5.4 Heuristic distributions for the Manhattan distance on the Five-Puzzle. The first column gives the heuristic value. The second column gives the number of states of the Five-Puzzle with each heuristic value. The third column gives the total number of states with a given or smaller heuristic value, which is simply the cumulative sum of the values from the second column. The fourth column gives the overall heuristic distribution D(h). These values are computed by dividing the value in the third column by 360, the total number of states in the problem space. The remaining columns are explained in the text.
h | States | Sum | D(h) | Corner | Side | Csum | Ssum | P(h) |
---|---|---|---|---|---|---|---|---|
0 | 1 | 1 | 0.002778 | 1 | 0 | 1 | 0 | 0.002695 |
1 | 2 | 3 | 0.008333 | 1 | 1 | 2 | 1 | 0.008333 |
2 | 3 | 6 | 0.016667 | 1 | 2 | 3 | 3 | 0.016915 |
3 | 6 | 12 | 0.033333 | 5 | 1 | 8 | 4 | 0.033333 |
4 | 30 | 42 | 0.116667 | 25 | 5 | 33 | 9 | 0.115424 |
5 | 58 | 100 | 0.277778 | 38 | 20 | 71 | 29 | 0.276701 |
6 | 61 | 161 | 0.447222 | 38 | 23 | 109 | 52 | 0.446808 |
7 | 58 | 219 | 0.608333 | 41 | 17 | 150 | 69 | 0.607340 |
8 | 60 | 279 | 0.775000 | 44 | 16 | 194 | 85 | 0.773012 |
9 | 48 | 327 | 0.908333 | 31 | 17 | 225 | 102 | 0.906594 |
10 | 24 | 351 | 0.975000 | 11 | 13 | 236 | 115 | 0.974503 |
11 | 8 | 359 | 0.997222 | 4 | 4 | 240 | 119 | 0.997057 |
12 | 1 | 360 | 1.000000 | 0 | 1 | 240 | 120 | 1.000000 |
The overall distribution is easily obtained for any heuristic. For heuristics implemented in the form of a pattern database, the distribution can be determined exactly by scanning the table. Alternatively, for a heuristic computed by a function, such as Manhattan distance on large sliding-tile puzzles, we can randomly sample the problem space to estimate the overall distribution to any desired degree of accuracy. For heuristics that are the maximum of several different heuristics, we can approximate the distribution of the combined heuristic from the distributions of the individual heuristics by assuming that the individual heuristic values are independent.
The distribution of a heuristic function is not a measure of its accuracy, and says little about the correlation of heuristic values with actual costs. The only connection between the accuracy of a heuristic and its distribution is that given two admissible heuristics, the one with higher values will be more accurate than the one with lower values on average.
Although the overall distribution is the easiest to understand, the complexity of IDA* depends on a potentially different distribution. The equilibrium distribution P(h) is defined as the probability that a node chosen randomly and uniformly among all nodes at a given depth of the brute-force search tree has a heuristic value less than or equal to h, in the limit of large depth.
If all states of the problem occur with equal frequency at large depths in the search tree, then the equilibrium distribution is the same as the overall distribution. For example, this is the case with the
Rubik's Cube search tree. In general, however, the equilibrium distribution may not equal the overall distribution. In the
Five-
Puzzle, for example, the overall distribution assumes
that all states, and, hence, all blank positions, are equally likely. At deep levels in the tree, however, the blank is in a side position in more than one-third of the nodes, and in a corner position in less than two-thirds of the nodes. In the limit of large depth, the equilibrium frequency of side positions is 0.35321. Similarly, the frequency of corner positions is 0.64679. Thus, to compute the equilibrium distribution, we have to take these equilibrium fractions into account. The fifth and sixth columns of Table 5.4, labeled corner and side, give the number of states with the blank in a corner or side position, respectively, for each heuristic value. The seventh and eighth columns give the cumulative numbers of corner and side states with heuristic values less than or equal to each particular heuristic value. The last column gives the equilibrium distribution P(h). The probability P(h) that the heuristic value of a node is less than or equal to h is the probability that it is a corner node, 0.64679, times the probability that its heuristic value is less than or equal to h, given that it is a corner node, plus the probability that it is a side node, 0.35321, times the probability that its heuristic value is less than or equal to h, given that it is a side node. For example, P(0) = 0.64679 · (1/240) + 0.35321 · (0/120) ≈ 0.002695. This differs from the overall distribution value D(0) = 1/360 ≈ 0.002778.
The equilibrium heuristic distribution is not a property of a problem, but of a problem space. For example, including the parent of a node as one of its children can affect the equilibrium distribution by changing the equilibrium fractions of different types of states. When the equilibrium distribution differs from the overall distribution, it can still be estimated from a pattern database, or by random sampling of the problem space, combined with the equilibrium fractions of different types of states, as illustrated earlier.
To provide some intuition behind our main result,
Figure 5.6 shows a schematic representation of a search tree generated by an iteration of IDA* on an abstract problem instance, where all edges have unit cost. The numbers were generated by assuming that each node generates one child each with a heuristic value one less, equal to, and one greater than the heuristic value of the parent. For example, there are six nodes at depth 3 with heuristic value 2, one with a parent that has a heuristic value 1, two with parents that have a heuristic value 2, and three with parents that have a heuristic value 3. In this example, the maximum value of the heuristic is 4, and the heuristic value of the initial state is 3.
One assumption of our analysis is that the heuristic is consistent. Because of this, and since all edges have unit cost (w(u, v) = 1 for all edges (u, v)) in this example, the heuristic value of a child must be at least the heuristic value of its parent, minus one. We assume a cutoff threshold of eight moves for this iteration of IDA*. Solid arrows represent sets of fertile nodes that will be expanded, and dotted arrows represent sets of sterile nodes that will not be expanded because their total cost, g + h, exceeds the cutoff threshold.
The values at the far right of Figure 5.6 show the number of nodes expanded at each depth, which is the number of fertile nodes at that depth. Here Ni is the number of nodes in the brute-force search tree at depth i, and P(h) is the equilibrium heuristic distribution. The number of nodes generated is the branching factor times the number expanded.
Consider the graph from top to bottom. There is a root node at depth 0, which generates N1 children. These nodes collectively generate N2 child nodes at depth 2. Since the cutoff threshold is eight moves, in the worst case, all nodes n whose total cost f(n) = g(n) + h(n) ≤ 8 will be expanded. Since 4 is the maximum heuristic value, all nodes down to depth 8 − 4 = 4 will be expanded. Thus, for d ≤ 4, the number of nodes expanded at depth d will be Nd, the same as in a brute-force search. Since 4 is the maximum heuristic value, P(8 − d) = 1 for d ≤ 4, and, hence, Nd = Nd · P(8 − d).
The nodes expanded at depth 5 are the fertile nodes, or those for which g + h ≤ 8, or h ≤ 3. At sufficiently large depths, the distribution of heuristic values converges to the equilibrium distribution. Assuming that the heuristic distribution at depth 5 approximates the equilibrium distribution, the fraction of nodes at depth 5 with h ≤ 3 is approximately P(3). Since all nodes at depth 4 are expanded, the total number of nodes at depth 5 is N5, and the number of fertile nodes is N5 · P(3).
There exist nodes at depth 6 with heuristic values from 0 to 4, but their distribution differs from the equilibrium distribution. In particular, nodes with heuristic values 3 and 4 are underrepresented relative to the equilibrium distribution, because such nodes are generated by parents with heuristic values from 2 to 4; at depth 5, however, the nodes with heuristic value 4 are sterile, producing no offspring at depth 6, hence reducing the number of nodes at depth 6 with heuristic values 3 and 4. The number of nodes at depth 6 with h ≤ 2 is completely unaffected by any pruning, however, since their parents are nodes at depth 5 with h ≤ 3, all of which are fertile. In other words, the number of nodes at depth 6 with h ≤ 2, which are the fertile nodes, is exactly the same as in the brute-force search tree, or N6 · P(2).
Due to consistency of the heuristic function, all possible parents of fertile nodes are themselves fertile. Thus, the number of nodes to the left of the diagonal line in Figure 5.6 is exactly the same as in the brute-force search tree. In other words, heuristic pruning of the tree has no effect on the number of fertile nodes, although it does affect the sterile nodes. If the heuristic were inconsistent, then the distribution of fertile nodes would change at every level where pruning occurred, making the analysis far more complex.
When all edges have unit cost, the number of fertile nodes at depth i is Ni · P(d − i), where Ni is the number of nodes in the brute-force search tree at depth i, d is the cutoff depth, and P is the equilibrium heuristic distribution. The total number of nodes expanded by an iteration of IDA* to depth d is therefore

Σ_{i=0}^{d} Ni · P(d − i).
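As an illustration, the formula can be evaluated for the Five-Puzzle, taking the level sizes Ni from the type recurrences given earlier and P(h) from Table 5.4 (using the equilibrium distribution at every level is only an approximation at shallow depths):

```python
# Level sizes N[i] of the Five-Puzzle brute-force tree (blank starting at a side)
ss, cs, cc, sc = 1, 0, 0, 2      # node types at depth 1
N = [1]                          # the root
for _ in range(12):
    N.append(ss + cs + cc + sc)
    ss, cs, cc, sc = cs, cc, sc, 2 * ss + cs

# Equilibrium distribution P(h), last column of Table 5.4
P = [0.002695, 0.008333, 0.016915, 0.033333, 0.115424, 0.276701, 0.446808,
     0.607340, 0.773012, 0.906594, 0.974503, 0.997057, 1.0]

def expansions(d):
    # Predicted nodes expanded by an IDA* iteration with depth threshold d
    return sum(N[i] * P[d - i] for i in range(d + 1))

for d in (8, 10, 12):
    print(d, round(expansions(d), 2))
```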
Let us now generalize this result to nonunit edge costs. First, we assume that there is a minimum edge cost; we can, without loss of generality, express all costs as multiples of this minimum cost, thereby normalizing it to one. Moreover, for ease of exposition the transformed edge costs and heuristic values are assumed to be integers; this restriction can easily be lifted.
We replace the depth of a node u by g(u), the sum of the edge costs from the root to the node. Let Ni be the number of nodes u in the brute-force search tree with g(u) = i. We assume that the heuristic is consistent, meaning that for any two nodes u and v, h(u) ≤ δ(u, v) + h(v), where δ(u, v) is the cost of an optimal path from u to v.
Theorem 5.5
(Node Prediction Formula) For large values of c, the expected number of nodes expanded by IDA* up to cost c, given a problem-space tree with Ni nodes of cost i, with a heuristic characterized by the equilibrium distribution P, is

E(c) = Σ_{i=0}^{c} Ni · P(c − i).

Proof
Consider the nodes u for which g(u) = i, which is the set of nodes of cost i in the brute-force search tree. There are Ni such nodes. The nodes of cost i that will be expanded by IDA* in an iteration with cost threshold c are those for which g(u) + h(u) ≤ c, or h(u) ≤ c − i. By definition of P, in the limit of large i, the number of such nodes in the brute-force search tree is Ni · P(c − i). It remains to show that all these nodes in the brute-force search tree are also in the tree generated by IDA*.
Consider an ancestor node v of such a node u. Then there is only one path between them in the tree, and g(u) = g(v) + k(v, u), where k(v, u) is the cost of the path from node v to node u in the tree. Since the heuristic is consistent, h(v) ≤ δ(v, u) + h(u), where δ(v, u) is the cost of an optimal path from v to u in the problem graph, and hence, since δ(v, u) ≤ k(v, u), we have h(v) ≤ k(v, u) + h(u). Thus, g(v) + h(v) ≤ g(v) + k(v, u) + h(u) = g(u) + h(u). Since g(u) + h(u) ≤ c, we obtain g(v) + h(v) ≤ c. This implies that node v is fertile and will be expanded during the search. Therefore, since all ancestors of node u are fertile and will be expanded, node u must eventually be generated itself. In other words, all nodes u in the brute-force search tree for which g(u) + h(u) ≤ c are also in the tree generated by IDA*. Since there can't be any nodes in the IDA* tree that are not in the brute-force search tree, the number of such nodes at level i in the IDA* tree is Ni · P(c − i), which implies the claim.
The effect of earlier iterations (small values of c) on the time complexity of IDA* depends on the rate of growth of node expansions in successive iterations. The heuristic branching factor is the ratio of the number of nodes expanded in a search to cost threshold c, divided by the number of nodes expanded in a search to cost threshold c − 1; writing E(c) for the number of expansions with threshold c, this ratio is E(c)/E(c − 1), where the normalized minimum edge cost is 1. Assume that the size of the brute-force search tree grows exponentially as Ni = b^i, where b is the brute-force branching factor. In that case, the heuristic branching factor is

E(c)/E(c − 1) = (Σ_{i=0}^{c} b^i · P(c − i)) / (Σ_{i=0}^{c−1} b^i · P(c − 1 − i)).

The first term of the numerator, b^0 · P(c), is less than or equal to one, and can be dropped without significantly affecting the ratio. Factoring b out of the remaining numerator gives

Σ_{i=1}^{c} b^i · P(c − i) = b · Σ_{i=0}^{c−1} b^i · P(c − 1 − i) = b · E(c − 1),

so the ratio approaches b. Thus, if the brute-force tree grows exponentially with branching factor b, then the running time of successive iterations of IDA* also grows by a factor of b. In other words, the heuristic branching factor is the same as the brute-force branching factor. In that case, it is easy to show that the overall time complexity of IDA* is b/(b − 1) times the complexity of the last iteration (see Exercises).
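The convergence of the ratio E(c)/E(c − 1) to b can be observed numerically with Ni = b^i and an arbitrary cumulative distribution (the P values below are made up for illustration):

```python
b = 2.13                 # illustrative brute-force branching factor
# An arbitrary cumulative heuristic distribution (values are made up):
P = [0.01, 0.05, 0.2, 0.5, 0.8, 0.95] + [1.0] * 40

def E(c):
    # Expected expansions with N_i = b**i
    return sum(b ** i * P[c - i] for i in range(c + 1))

print(round(E(30) / E(29), 6))  # ~2.13 = b
```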
Our analysis shows that on an exponential tree, the effect of a heuristic is to reduce the search complexity from O(b^c) to O(b^(c − k)), for some constant k, which depends only on the heuristic function; contrary to previous analyses, however, the branching factor remains basically the same.
5.6.3. *Convergence Criteria
We have not yet looked closely at the convergence conditions of the process for computing the asymptotic branching factor.
The matrix formulation for calculating the population of nodes implies F(d) = P·F(d − 1), with F(d) being the vector of the numbers of nodes of the different types at depth d. The asymptotic branching factor b is the limit of N(d + 1)/N(d), where N(d) is the sum of the components of F(d). We observe that in most cases the per-type ratios Ft(d + 1)/Ft(d) converge to the same limit for every t in {1, ..., k}, where k is the number of state types. Evaluating F(d) = P^d·F(0) for increasing depth d is exactly what is considered in the algorithm of von Mises for approximating the largest eigenvalue (in absolute terms) of P. The algorithm is also referred to as the power iteration method.
As a precondition, the algorithm requires that P be diagonalizable; that is, each eigenvalue λ with multiplicity m(λ) has m(λ) linearly independent eigenvectors, so that the eigenvectors of λ1, ..., λn together span the whole space. Without loss of generality, we assume that the eigenvalues are given in order of decreasing absolute value, |λ1| ≥ |λ2| ≥ ... ≥ |λn|. The algorithm further requires that the start vector F(0) has a representation in the basis of eigenvectors in which the coefficient belonging to λ1 is nonzero.
We distinguish the following two cases: |λ1| > |λ2| and |λ1| = |λ2|. In the first case we obtain that (independent of the choice of F(0)) the value of N(d + 1)/N(d) converges to λ1. Similarly, in the second case, which for the puzzles above arises with λ2 = −λ1, the limit of N(d + 2)/N(d) is in fact λ1^2. The cases |λi| = |λ1| for i > 2 are dealt with analogously. The outcome of the algorithm, and therefore the limiting ratio of the numbers of nodes in layers at depth difference l, is λ1^l, so that once more the geometric mean of the single-step branching factors turns out to be λ1.
We indicate the proof of the first case only. Diagonalizability implies a basis of eigenvectors v1, ..., vn. Due to |λ2| < |λ1|, the quotient (λi/λ1)^d converges to zero for large values of d and all i > 1. If the initial vector F(0) with respect to the eigenbasis is given as F(0) = a1·v1 + ... + an·vn, applying P^d yields P^d·F(0) = a1·P^d·v1 + ... + an·P^d·vn by linearity of P, which further reduces to a1·λ1^d·v1 + ... + an·λn^d·vn by the definition of eigenvalues and eigenvectors. The term a1·λ1^d·v1 will dominate the sum for increasing values of d. Factorizing λ1^(d+1) in the numerator and λ1^d in the denominator of the quotient N(d + 1)/N(d) results in an equation of the form λ1·(1 + R(d + 1))/(1 + R(d)), where R(d) converges to zero, since apart from the leading term a1·v1, both the numerator and the denominator only involve expressions of the form ai·(λi/λ1)^d·vi with |λi/λ1| < 1. Therefore, to find the asymptotic branching factor analytically, it suffices to determine the set of eigenvalues of P and take the largest one.
P and take the largest one. This corresponds to the results of the asymptotic branching factors in the (
n2 − 1)-
Puzzle.
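The power iteration itself is only a few lines; applied to the corner/side/middle transition matrix of the Fifteen-Puzzle (rows reconstructed from the grid adjacencies), it converges to 1 + √5 ≈ 3.23607:

```python
from math import sqrt

# Transposed state-type adjacency matrix (corner, side, middle),
# reconstructed from the 4x4 grid adjacencies:
P = [[0, 1, 0],
     [2, 1, 2],
     [0, 1, 2]]

def matvec(M, v):
    return [sum(M[i][j] * v[j] for j in range(3)) for i in range(3)]

v = [0.0, 1.0, 0.0]  # any start vector with a nonzero leading coefficient
est = 0.0
for _ in range(60):
    w = matvec(P, v)
    est = sum(w) / sum(v)        # growth of the layer total
    v = [x / sum(w) for x in w]  # renormalize to keep entries bounded

print(round(est, 5))             # ~3.23607 = 1 + sqrt(5)
```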
For the Fifteen-Puzzle, for increasing depth d the ratio N(d + 1)/N(d) takes the values 1, 3, 13/5, 45/13, 47/15, 461/141, 1485/461, 4813/1485, 15565/4813, 50381/15565, 163021/50381, 527565/163021 = 3.236178161, and so on, a sequence approximating 1 + √5 ≈ 3.236068. Moreover, the ratio of N(d + 2) and N(d) quickly converges to (1 + √5)^2 = 6 + 2√5 ≈ 10.472.