2.1. Uninformed Graph Search Algorithms
In implicit graph search, no graph representation is available at the beginning; a partial picture of the graph evolves only as the search progresses, from those nodes that are actually explored. In each iteration, a node is expanded by generating all adjacent nodes that are reachable via edges of the implicit graph (the possible edges can be described, for example, by a set of transition rules). This means applying all allowed actions to the state. Nodes that have been generated earlier in the search can be kept track of; however, we have no access to nodes that have not been generated so far. All nodes have to be reached at least once on a path from the initial node through successor generation. Consequently, we can divide the set of reached nodes into the set of expanded nodes and the set of generated nodes that are not yet expanded. In AI literature the latter set is often referred to as the Open list or the search frontier, and the former set as the Closed list. The denotation as a list is a legacy of the first implementations, namely as simple linked lists. However, we will see later that realizing them using the right data structures is crucial for a search algorithm's characteristics and performance.
The set of all explicitly generated paths rooted at the start node and whose leaves are the Open nodes constitutes the search tree of the underlying problem graph. Note that while the problem graph is defined solely by the problem domain description, the search tree characterizes the part explored by a search algorithm at some snapshot during its execution. Figure 2.1 gives a visualization of a problem graph and a corresponding search tree.
In tree-structured problem spaces, each node can be reached on only a single path. However, it is easy to see that for finite acyclic graphs, the search tree can be exponentially larger than the original search space. This is due to the fact that a node can be reached multiple times, at different stages of the search, via different paths. We call such a node a duplicate; for example, in Figure 2.1 all shown leaves at depth 3 are duplicates. Moreover, if the graph contains cycles, the search tree can be infinite, even if the graph itself is finite.
In Algorithm 2.1 we sketch a framework for a general node expansion search algorithm.
Definition 2.1
(Closed/Open Lists) The set of already expanded nodes is called Closed, and the set of generated but not yet expanded nodes is called Open. The latter is also denoted as the search frontier.
As long as no solution path has been established, a frontier node u in Open is selected and its successors are generated. The successors are then dealt with in the subroutine Improve, which updates Open and Closed accordingly (in the simplest case, it just inserts the child node into Open). At this point, we deliberately leave the details of how to Select and Improve a node unspecified; their subsequent refinement leads to different search algorithms.
Open and Closed were introduced as data structures for sets, offering operations to insert and delete nodes. In particular, an important role of Closed is duplicate detection; therefore, it is often implemented as a hash table with fast lookup operations.
Duplicate identification is total if, in each iteration of the algorithm, each node in Open and Closed has one unique representation and generation path. In this chapter, we are concerned with algorithms with total duplicate detection. However, imperfect detection of already expanded nodes is quite frequent in state space search out of necessity, because very large state spaces are difficult to store within given memory limitations. We will see many different solutions to this crucial problem in upcoming chapters.
Generation paths do not have to be fully represented for each individual node in the search tree. Rather, they can be conveniently stored by equipping each node u with a predecessor link parent(u), which is a pointer to the parent in the search tree (or ⊥ for the root s). More formally, parent(v) = u if v has been generated as a successor of u. By tracing the links back in bottom-up direction until we arrive at the root s, we can reconstruct a solution path (s = u_0, u_1, ..., u_k = t) of length k as u_{i-1} = parent(u_i), for i = k, ..., 1 (see Alg. 2.2).
Algorithm 2.3 sketches an implementation of Improve with duplicate detection and predecessor link updates. Note that this first, very crude implementation does not attempt to find a shortest path; it merely decides if a path exists at all from the start node to a goal node.
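To make the framework concrete, the following is a minimal Python sketch of the expansion loop together with the crude Improve of Algorithm 2.3 and the path reconstruction of Algorithm 2.2. The function names successors and is_goal, and the FIFO selection order, are illustrative assumptions rather than part of the book's pseudo code.

```python
from collections import deque

def search(start, successors, is_goal):
    """Node-expansion framework (cf. Alg. 2.1): Open holds generated but
    unexpanded nodes, Closed the expanded ones, parent the predecessor links."""
    parent = {start: None}            # generated nodes and their predecessor links
    open_list = deque([start])        # the search frontier; selection order left open here
    closed = set()
    while open_list:
        u = open_list.popleft()       # Select (here: FIFO order, for concreteness)
        if is_goal(u):
            return path(u, parent)
        closed.add(u)
        for v in successors(u):       # Expand: apply all allowed actions
            improve(u, v, parent, open_list, closed)
    return None                       # Open ran empty: no solution path exists

def improve(u, v, parent, open_list, closed):
    """Crude Improve (cf. Alg. 2.3): duplicate detection only, no shortest paths."""
    if v not in parent and v not in closed:
        parent[v] = u                 # set predecessor link
        open_list.append(v)           # Insert into Open

def path(u, parent):
    """Trace predecessor links back to the root (cf. Alg. 2.2)."""
    p = []
    while u is not None:
        p.append(u)
        u = parent[u]
    return p[::-1]
```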
To illustrate the behavior of the search algorithms, we take the simple example of searching for a goal node at (5,5) from node (3,3) in the Gridworld of Figure 2.2. Note that the potential number of paths of length i in a grid grows exponentially in i: with at most four possible moves in each step, there are at most 4^i paths of length i.
2.1.1. Depth-First Search
For depth-first search (DFS), the Open list is implemented as a stack (a.k.a. a LIFO, or last-in first-out queue), so that Insert is in fact a push operation and Select corresponds to a pop operation. Operation push places an element at the top of this data structure, and operation pop extracts the element from the top. Successors are simply pushed onto the stack. Thus, each step greedily generates a successor of the last visited node, unless it has none, in which case the search backtracks to the parent and explores another not yet explored sibling.
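The following is a minimal Python sketch of DFS under the same assumed successors/is_goal interface as above; it is one possible realization, not the book's pseudo code.

```python
def dfs(start, successors, is_goal):
    """Depth-first search: Open is a stack, so Insert = push and Select = pop."""
    parent = {start: None}
    stack = [start]                   # Python lists support push/pop at the end
    closed = set()
    while stack:
        u = stack.pop()               # pop the most recently pushed node
        if is_goal(u):
            return u, parent
        closed.add(u)
        for v in successors(u):
            if v not in parent and v not in closed:
                parent[v] = u
                stack.append(v)       # push; the deepest node is expanded next
    return None, parent
```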
It is easy to see that in finite search spaces DFS is complete (i.e., it will find a solution path if one exists), since each node is expanded at most once. It is, however, not optimal. Depending on which successor is expanded first, any path is possible. Take, for example, the solution ((3,3), (3,2), (2,2), (2,3), (2,4), (2,5), (3,5), (3,4), (4,4), (4,3), (5,3), (5,4), (5,5)) in the Gridworld example. The path length, defined as the number of state transitions, is 12 and hence larger than the minimum.
Table 2.1 and Figure 2.3 show the expansion steps and the search tree explored by DFS when run on the example of Figure 2.1. Without loss of generality, we assume that children are expanded in alphabetical order.
Table 2.1 Steps in DFS (with duplicate detection) for the example in Figure 2.1.

| Step | Selection | Open | Closed | Remarks |
|---|---|---|---|---|
| 1 | {} | {a} | {} | |
| 2 | a | {b,c,d} | {a} | |
| 3 | b | {e,f,c,d} | {a,b} | |
| 4 | e | {f,c,d} | {a,b,e} | f is duplicate |
| 5 | f | {c,d} | {a,b,e,f} | e is duplicate |
| 6 | c | {d} | {a,b,e,f,c} | d is duplicate |
| 7 | d | {g} | {a,b,e,f,c,d} | c is duplicate |
| 8 | g | {} | {a,b,e,f,c,d,g} | Goal reached |
Without duplicate elimination, DFS can get trapped in cycles of the problem graph and loop forever without finding a solution at all.
2.1.2. Breadth-First Search
For breadth-first search (BFS), the set Open is realized as a first-in first-out queue (FIFO). The Insert operation is called Enqueue, and adds an element to the end of the queue; the Dequeue operation selects and removes its first element. As a result, the neighbors of the source node are generated layer by layer (one edge apart, two edges apart, and so on).
As for DFS, Closed is implemented as a hash table, preventing nodes from being expanded more than once. Since BFS also expands one new node at a time, it is complete in finite graphs. It is optimal in uniformly weighted graphs (i.e., the first solution path found is the shortest possible one), since the nodes are generated in level order with respect to the tree expansion of the problem graph.
One BFS search order in the Gridworld example is ((3,3), (3,2), (2,3), (4,3), (3,4), (2,2), (4,4), (4,2), (2,4), (3,5), (5,3), (1,3), (3,1), …, (5,5)). The returned solution path ((3,3), (4,3), (4,4), (4,5), (5,5)) is optimal.
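A corresponding Python sketch of BFS, again under the assumed successors/is_goal interface; the usage example at the bottom runs it on an obstacle-free 5x5 grid, since the blocked cells of Figure 2.2 are not reproduced here.

```python
from collections import deque

def bfs(start, successors, is_goal):
    """Breadth-first search: Open is a FIFO queue, nodes are expanded layer by layer."""
    parent = {start: None}
    queue = deque([start])            # Enqueue at the tail, Dequeue at the head
    closed = set()
    while queue:
        u = queue.popleft()           # Dequeue the oldest generated node
        if is_goal(u):
            return u, parent          # in uniformly weighted graphs this path is shortest
        closed.add(u)
        for v in successors(u):
            if v not in parent and v not in closed:
                parent[v] = u
                queue.append(v)       # Enqueue
    return None, parent

# Illustrative use on a 5x5 grid without obstacles (the blocked cells of
# Figure 2.2 are not reproduced here): neighbors are the four adjacent cells.
def grid_successors(cell):
    x, y = cell
    moves = [(x - 1, y), (x + 1, y), (x, y - 1), (x, y + 1)]
    return [(a, b) for a, b in moves if 1 <= a <= 5 and 1 <= b <= 5]

goal, parents = bfs((3, 3), grid_successors, lambda c: c == (5, 5))
```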
Table 2.2 Steps in BFS (with duplicate detection) for the example in Figure 2.1.

| Step | Selection | Open | Closed | Remarks |
|---|---|---|---|---|
| 1 | {} | {a} | {} | |
| 2 | a | {b,c,d} | {a} | |
| 3 | b | {c,d,e,f} | {a,b} | |
| 4 | c | {d,e,f} | {a,b,c} | d is duplicate |
| 5 | d | {e,f,g} | {a,b,c,d} | c is duplicate |
| 6 | e | {f,g} | {a,b,c,d,e} | f is duplicate |
| 7 | f | {g} | {a,b,c,d,e,f} | e is duplicate |
| 8 | g | {} | {a,b,c,d,e,f,g} | Goal reached |
A possible drawback of BFS in large problem graphs is its large memory consumption. Unlike DFS, which can reach goals at large search depths while storing only a small part of the search tree, BFS stores all nodes with depth smaller than the shortest possible solution length.
2.1.3. Dijkstra's Algorithm
So far we have looked at uniformly weighted graphs only; that is, each edge counts the same. Now let us consider the generalization in which edges are weighted by a weight function (a.k.a. cost function) w. In weighted graphs, BFS loses its optimality. Take, for example, weights of 1/12 on the edges of the DFS solution path p from above, and weights of 1 for edges not on p. Then p has total weight 1, whereas the BFS solution path ((3,3), (4,3), (4,4), (4,5), (5,5)) has weight 4.
To compute the shortest (cheapest) path in graphs with nonnegative weights, Dijkstra proposed a greedy search strategy based on the principle of optimality. It states that an optimal path has the property that whatever the initial conditions and control variables (choices) over some initial period, the control (or decision) variables chosen over the remaining period must be optimal for the remaining problem, with the node resulting from the early decisions taken to be the initial condition. Applying the principle developed by Richard Bellman to shortest path search results in

δ(s, v) = min_{(u,v) ∈ E} { δ(s, u) + w(u, v) }.

In words, the minimum distance from s to v is equal to the minimum, over all predecessors u of v, of the distance from s to u plus the edge weight between u and v. This equation implies that any subpath of an optimal path is itself optimal (otherwise it could be replaced to yield a shorter path).
The search algorithm maintains a tentative value of the shortest distance; more precisely, an upper bound f(u) on δ(s, u) for each node u. Initially set to ∞, f(u) is successively decreased until it matches δ(s, u); from this point on, it remains constant throughout the rest of the algorithm.
A suitable data structure for maintaining Open is a priority queue, which associates each element with its f-value, and provides operations Insert and DeleteMin (accessing the element with the minimum f-value and simultaneously removing it from the priority queue). Additionally, the DecreaseKey operation can be thought of as deleting an arbitrary element and reinserting it with a lower associated f-value; executing these two steps together can be performed more efficiently in some implementations. Note that the signature of Insert requires now an additional parameter: the value used to store the node in the priority queue.
The algorithm initially inserts s into the priority queue with f(s) set to zero. Then, in each iteration, an Open node u with minimum f-value is selected, and all its children v reachable by outgoing edges are generated. The subroutine Improve of Algorithm 2.1 now updates the stored estimate f(v) if the newly found path via u is shorter than the best previous one; that is, if f(u) + w(u, v) < f(v), then f(v) is decreased to f(u) + w(u, v) and v is inserted into Open, in turn. The pseudo code is listed in Algorithm 2.4. This update step is also called a node relaxation. An example is given in Figure 2.5.
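A Python sketch of Dijkstra's algorithm under the assumed interface that successors(u) yields (v, w) pairs with nonnegative edge weights. Since Python's heapq offers no DecreaseKey, the sketch uses the common lazy-deletion substitute: outdated queue entries are simply skipped when popped.

```python
import heapq

def dijkstra(start, successors, is_goal):
    """Dijkstra's algorithm: Open is a priority queue ordered by the f-value."""
    f = {start: 0}                          # tentative distances (upper bounds on delta(s,u))
    parent = {start: None}
    open_heap = [(0, start)]                # entries are (f-value, node)
    closed = set()
    while open_heap:
        fu, u = heapq.heappop(open_heap)    # DeleteMin
        if u in closed or fu > f[u]:
            continue                        # outdated entry (lazy deletion)
        if is_goal(u):
            return fu, parent               # f(u) = delta(s,u) at first selection (Lemma 2.1)
        closed.add(u)
        for v, w in successors(u):
            if v not in closed and fu + w < f.get(v, float("inf")):
                f[v] = fu + w               # node relaxation
                parent[v] = u
                heapq.heappush(open_heap, (f[v], v))
    return float("inf"), parent
```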
For illustration, we generalize our running example by assuming edge weights, as given in Figure 2.6. The execution of the algorithm is shown in Table 2.3 and Figure 2.7.
Table 2.3 Steps in Dijkstra's algorithm for the example in Figure 2.6. Numbers in brackets denote the f-values.

| Step | Selection | Open | Closed | Remarks |
|---|---|---|---|---|
| 1 | {} | {a(0)} | {} | |
| 2 | a | {b(2),c(6),d(10)} | {a} | |
| 3 | b | {e(6),f(6),c(6),d(10)} | {a,b} | Ties broken arbitrarily |
| 4 | e | {f(6),c(6),d(10)} | {a,b,e} | f is duplicate |
| 5 | f | {c(6),d(10)} | {a,b,e,f} | e is duplicate |
| 6 | c | {d(9)} | {a,b,e,f,c} | f(d) decreased to 9, parent changes to c |
| 7 | d | {g(14)} | {a,b,e,f,c,d} | a is duplicate |
| 8 | g | {} | {a,b,e,f,c,d,g} | Goal reached |
The correctness argument of the algorithm is based on the following fact (Lemma 2.1): for a node u with minimum f-value in Open, f is exact; that is, f(u) = δ(s, u).
Proof
Assume the contrary, and let u be the first node selected from Open with f(u) ≠ δ(s, u); that is, f(u) > δ(s, u). Furthermore, let p be a shortest path from s to u, and let y be the first node on p that is not expanded (see Fig. 2.8). Then the predecessor of y on p has been expanded, so the edge leading to y has been relaxed and f(y) = δ(s, y). Hence, f(y) = δ(s, y) ≤ δ(s, u) < f(u) (in the second-to-last step, the nonnegativity of the weights is used). This inequality is a contradiction to the selection of u from the priority queue instead of y.
It is important to observe that Lemma 2.1 suits Dijkstra's exploration scheme to implicit enumeration, because at the first encounter of a goal node t we already have f(t) = δ(s, t).
Theorem 2.1
(Correctness of Dijkstra's Algorithm) In weighted graphs with a nonnegative weight function, Dijkstra's algorithm is optimal; that is, for the first goal node t that is selected for expansion, we have f(t) = δ(s, t) = δ(s, T).
Proof
With nonnegative edge weights, each successor v of an expanded node u is inserted into or updated in Open with f(v) = f(u) + w(u, v) ≥ f(u). Therefore, the f-values of the nodes selected from the priority queue are monotonically nondecreasing. Together with Lemma 2.1, this proves that for the first selected goal node t we have f(t) = δ(s, t) = δ(s, T); any goal node selected later cannot have a smaller f-value.
In infinite graphs we have to guarantee that a goal node will eventually be reached.
Theorem 2.2
(Dijkstra's Algorithm on Infinite Graphs) If the weight function w of a problem graph is strictly positive and if the weight of every infinite path is infinite, then Dijkstra's algorithm terminates with an optimal solution.
Proof
The premises imply that if the cost of a path is finite, the path itself is finite. Therefore, there are only finitely many paths of cost at most δ(s, T). We further observe that no path of cost larger than δ(s, T) can be a prefix of an optimal solution path. Therefore, Dijkstra's algorithm examines only a finite part of the problem graph. A goal node t with f(t) = δ(s, T) will eventually be reached, so that Dijkstra's algorithm terminates. The solution is optimal by the correctness argument of Theorem 2.1.
Note that for all nodes u in Closed, an optimal path from s to u has been found. Thus, a slight modification of Dijkstra's algorithm that stops only when Open runs empty finds not only the shortest path between a single source s and a single target t, but the shortest paths from s to all other nodes (provided, of course, that the number of nodes is finite).
2.1.4. Negatively Weighted Graphs
Unfortunately, the correctness and optimality argument of Lemma 2.1 is no longer true for graphs with negative edge weights. As a simple example, consider a graph consisting of three nodes s, u, and v with edges (s, u) of weight 1, (s, v) of weight 2, and (v, u) of weight -2. Dijkstra's algorithm expands u with f(u) = 1 before v is selected, so it computes the value 1 instead of the correct value, δ(s, u) = 0.
An even worse observation is that negatively weighted graphs may contain negatively weighted cycles, so that the shortest path may be infinitely long and of value -∞. This has led to the Bellman-Ford algorithm, to be described later. However, we can still handle graphs with negative weights using a modified Dijkstra algorithm if we impose a slightly less restrictive condition on the graph, namely that δ(u, T) ≥ 0 for all u; that is, the distance from each node to the goal is nonnegative. Figuratively speaking, we can have negative edges when far from the goal, but they get "eaten up" when coming closer. The condition implies that no negatively weighted cycles exist.
In the sequel, we will denote the extended version of Dijkstra's algorithm as algorithm A. As can be gleaned from the comparison between Algorithm 2.5 and Algorithm 2.4, with negative edges it can become necessary not only to improve the f-values of Open nodes, but also to reopen Closed ones.
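A minimal sketch of the Improve step of algorithm A, assuming f and parent are Python dictionaries and Open and Closed are sets (abstracting from the priority-queue bookkeeping). The essential difference from the Improve of Algorithm 2.4 is that a Closed node is reopened when a cheaper path to it is found.

```python
def improve_a(u, v, weight, f, parent, open_set, closed):
    """Improve step in the spirit of Algorithm 2.5: with negative edge weights,
    a cheaper path to v may be found even after v was expanded, so Closed
    nodes are reopened."""
    new_f = f[u] + weight
    if new_f < f.get(v, float("inf")):
        f[v] = new_f                  # node relaxation
        parent[v] = u
        if v in closed:
            closed.remove(v)          # reopen: v has to be expanded again
        open_set.add(v)               # (re)insert into Open, or DecreaseKey
```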
The optimality of algorithm A rests on the following invariance (I) (Lemma 2.2): as long as an optimal solution path has not been completely processed, there is a node v_i in the Open list that lies on an optimal solution path p = (s = v_0, v_1, ..., v_k = t) and satisfies f(v_i) = δ(s, v_i).
Proof
Without loss of generality, let i be maximal among the indices satisfying (I), and let u be the node selected from Open in the current iteration. We distinguish the following cases:
1. Node u is not on p, or u ≠ v_i. Then node v_i remains in Open. Since node relaxations never increase f-values, and f(v) ≥ δ(s, v) holds for every generated node v, v_i keeps its optimal merit δ(s, v_i); moreover, no node other than u is added to Closed. Hence, (I) is preserved.
2. Node u is on p and u = v_i. If i = k, that is, u is the goal node of p, there is nothing to show.
First assume f(v_{i+1}) > δ(s, v_{i+1}). Then Improve will be called for the edge (v_i, v_{i+1}); for all other successors of u, the argument of case 1 holds. If v = v_{i+1} is in Closed, it will be reinserted into Open with f(v) = f(u) + w(u, v) = δ(s, u) + w(u, v) = δ(s, v). If v is neither in Open nor in Closed, it is inserted into Open with this merit. Otherwise, the DecreaseKey operation will set it to δ(s, v). In either case, v guarantees the invariance (I).
Now suppose f(v_{i+1}) = δ(s, v_{i+1}) already holds. By the maximality assumption on i, v_{i+1} is not in Open; since it has been generated, it is contained in Closed with optimal merit δ(s, v_{i+1}), and no relaxation can change this value. Repeating the argument along p, either some v_j with j > i+1 and f(v_j) = δ(s, v_j) remains in Open, in which case it preserves (I), or the goal v_k = t has already reached its optimal merit, so that nothing remains to show.
The invariance implies the optimality of algorithm A on graphs with δ(u, T) ≥ 0 for all u: the first goal node selected from Open has been reached on a shortest path.
Proof
Assume the contrary, namely that the algorithm terminates at a goal node t with f(t) > δ(s, T). According to (I) there is a node u with f(u) = δ(s, u) in Open, which lies on an optimal solution path to a goal node. We have f(u) = δ(s, u) ≤ δ(s, u) + δ(u, T) = δ(s, T) < f(t), in contradiction to the fact that t is selected from Open.
In infinite graphs we can essentially apply the proof of Theorem 2.2.
Theorem 2.4
(A in Infinite Graphs) If the weight of every infinite path is infinite, then algorithm A terminates with an optimal solution.
Proof
Since δ(u, T) ≥ 0 for all u, no path of cost larger than δ(s, T) can be a prefix of an optimal solution path. The rest follows as in the proof of Theorem 2.2.
2.1.5. Relaxed Node Selection
Dijkstra's algorithm is bound to always expand an Open node with minimum f-value. However, as we will see in later chapters, sometimes it can be more efficient to choose nodes based on other criteria. For example, in route finding in large maps we might want to explore neighboring streets in subregions together to optimize disk access.
In Algorithm 2.6 we give a pseudo-code implementation of a relaxed node selection scheme that gives us precisely this freedom. In contrast to algorithm A and Dijkstra's algorithm, reaching the first goal node no longer guarantees optimality of the established solution path. Hence, the algorithm has to continue until the Open list runs empty. A global current best solution path length U is maintained and updated; the algorithm improves the solution quality over time.
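A Python sketch of the relaxed node-selection scheme under the assumed successors/is_goal interface: the selection strategy is passed in as a function, and the bound U is tightened whenever a cheaper solution is found. For optimality the conditions discussed below (Theorems 2.5/2.6) must hold; the simple pruning rule f > U used here corresponds to the conditioned case.

```python
def node_selection_a(start, successors, is_goal, select):
    """Relaxed node selection (cf. Algorithm 2.6): nodes may be taken from Open
    in any order given by select(open_set, f); the search runs until Open is empty."""
    f = {start: 0}
    parent = {start: None}
    open_set = {start}
    closed = set()
    U, best = float("inf"), None               # incumbent solution bound and goal
    while open_set:
        u = select(open_set, f)                # arbitrary selection strategy
        open_set.remove(u)
        if is_goal(u):
            if f[u] < U:
                U, best = f[u], u              # improve the incumbent solution
            continue
        closed.add(u)
        for v, w in successors(u):
            new_f = f[u] + w
            if new_f < f.get(v, float("inf")) and new_f <= U:   # prune against bound U
                f[v] = new_f
                parent[v] = u
                closed.discard(v)              # reopen if necessary
                open_set.add(v)
    return U, best, parent
```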
If we want the algorithm to be optimal we have to impose the same restriction on negatively weighted graphs as in the case of Algorithm A.
Theorem 2.5
(Optimality Node-Selection A, Conditioned) If we have for all nodes , then node selection A terminates with an optimal solution.
Proof
Upon termination, each node inserted into Open must have been selected at least once. Suppose that invariance (I) is preserved in each loop; that is, there is always a node v in the Open list on an optimal path with f(v) = δ(s, v). Thus, the algorithm cannot terminate without eventually selecting the goal node on this path, and since by definition it is not more expensive than any found solution path and best maintains the currently shortest path, an optimal solution will be returned. It remains to show that the invariance (I) holds in each iteration. If the extracted node u is not equal to v, there is nothing to show. Otherwise, f(u) = f(v) = δ(s, v). The bound U denotes the currently best solution length. If f(u) ≤ U, no pruning takes place. On the other hand, f(u) > U leads to a contradiction, since f(u) = δ(s, u) ≤ δ(s, T) ≤ U (the first inequality is justified by δ(u, T) ≥ 0, the latter by the fact that U is the cost of an already found solution path).
If we do allow δ(u, T) to become negative, we can at least achieve the following optimality result.
Theorem 2.6
(Optimality Node-Selection A, Unconditioned) With f(u) > U - min_{v ∈ V} δ(v, T) as the pruning condition in the node-selection A algorithm, it is optimal.
Proof
By analogy to the previous theorem, it remains to show that (I) holds in each iteration. If the extracted node u is not equal to v, there is nothing to show. Otherwise, f(u) = f(v) = δ(s, v). The bound U denotes the currently best solution length. If f(u) ≤ U - min_{w ∈ V} δ(w, T), no pruning takes place. On the other hand, f(u) > U - min_{w ∈ V} δ(w, T) leads to a contradiction, since f(u) = δ(s, u) = δ(s, T) - δ(u, T) ≤ δ(s, T) - min_{w ∈ V} δ(w, T) ≤ U - min_{w ∈ V} δ(w, T), which is impossible given that U denotes the cost of some solution path; that is, U ≥ δ(s, T).
Unfortunately, we do not know the value of δ(u, T) in advance, so the only thing that we can do is to approximate it; in other words, to devise a bound for it.
2.1.6. * Algorithm of Bellman-Ford
Bellman and Ford's algorithm is the standard alternative to Dijkstra's algorithm when searching graphs with negative edge weights. It can handle any such finite graphs (not just those with nonnegative goal distances), and will detect if negative cycles exist.
The basic idea of the algorithm is simple: relax all edges in each of n - 1 passes (n is the number of nodes in the problem graph), where the relaxation of an edge (u, v) is one update of the form f(v) ← min{f(v), f(u) + w(u, v)}. The algorithm satisfies the invariant that after pass i, all cheapest paths that use at most i edges have been found. In a final pass, each edge is checked once again. If any edge can be further relaxed at this point, a negative cycle must exist; the algorithm reports this and terminates. The price we pay for the possibility of negative edges is a time complexity of O(ne), worse than Dijkstra's algorithm by roughly a factor of n.
Most of the time, the Bellman-Ford algorithm is described in terms of explicit graphs, and is used to compute shortest paths from a source to all other nodes. In the following, however, we develop an implicit version of the algorithm of Bellman and Ford that makes it comparable to the previously introduced algorithms. One advantage is that we can exploit the fact that it is necessary to relax the outgoing edges of a node u in iteration i only if the f-value of u has changed in iteration i - 1.
Note that the Bellman-Ford algorithm can be made to look almost identical to Dijkstra's algorithm by utilizing a queue instead of a priority queue: for each node u extracted from one end of the queue, relax every successor v of u, and insert v at the tail of the queue. The reasoning is as follows. For graphs with negative edge weights it is not possible to make a perfect choice of the extracted element that is known to be contained in the Open list by invariance (I) (see page 58). As we have seen, considering already expanded nodes is necessary. Suppose that u is the extracted node. Before u is selected for the next time, the node v with f(v) = δ(s, v) guaranteed by (I) has to be selected at least once, such that the optimal solution path prefix associated with (I) is extended by at least one edge. To implement this objective, for convenience we redisplay the Improve procedure devised so far for the situation where the Open list is a queue. Algorithm 2.7 shows the pseudo code.
The implicit version of Bellman and Ford is listed in Algorithm 2.8. In the original algorithm, detection of negative cycles is accomplished by checking for optimal paths longer than the total number of nodes, after all edges have been relaxed n - 1 times. In our implicit algorithm, this can be done more efficiently: we can maintain the length of each generating path, and as soon as any one becomes longer than n, we can exit with a failure notice. Also, more stringent checking for duplicates in a path can be implemented.
We omitted the termination condition at a goal node, but it can be implemented analogously to the node-selection A algorithm. That is, it is equivalent whether we keep track of the current best solution during the search or (as in the original formulation) scan all solutions after completion of the algorithm.
Theorem 2.7
(Optimality of Implicit Bellman-Ford) Implicit Bellman-Ford is correct and computes optimal cost solution paths.
Proof
Since the algorithm only changes the ordering of nodes that are selected, the arguments for the correctness and optimality of the implicit version of Bellman and Ford's and the node selection A algorithm are the same.
Theorem 2.8
(Complexity of Implicit Bellman-Ford) Implicit Bellman-Ford applies no more than O(ne) node generations.
Proof
Let Open_i(u) denote the content of the queue when u is removed for the i-th time from Open. Then, by applying the invariant (I) (Lemma 2.2), Open_i(u) contains at least one element v with optimal cost f(v) = δ(s, v). Since Open is organized as a queue, v is deleted from Open before u is deleted for the (i+1)-th time. Since v then has its optimal merit and lies on the optimal path, it will never be added to Open again; hence, the number of iterations i is smaller than the number of nodes in the expanded problem graph. This proves that each node is removed from Open at most n times, so that each edge is relaxed at most n times and at most O(ne) nodes are generated.
2.1.7. Dynamic Programming
The divide-and-conquer strategy in algorithm design suggests solving a problem recursively by splitting it into smaller subproblems, solving each of them separately, and then combining the partial results into an overall solution. Dynamic programming was invented as a similarly general paradigm. It addresses the problem that a recursive evaluation can give rise to solving overlapping subproblems repeatedly, invoked for different main goals. It suggests storing subresults in a table so that they can be reused. Such a tabulation is most efficient if an additional node order is given that defines the possible subgoal relationships.
All Pairs Shortest Paths
For example, consider the problem of finding the shortest distance between each pair of nodes of a graph. We could run one of the Single-Source Shortest Paths algorithms discussed so far—BFS or Dijkstra's algorithm—repeatedly, starting from each node i in turn, but this would traverse the whole graph several times. A better solution is to apply the All Pairs Shortest Paths algorithm of Floyd-Warshall. Here, all distances are recorded in an n-by-n matrix D, where element D_{i,j} indicates the shortest path cost from i to j. A sequence of matrices D^0, D^1, ..., D^n is computed, where D^0 contains only the edge weights (it is the adjacency matrix), and D^k contains the shortest distances between nodes under the constraint that intermediate nodes have no index larger than k.
According to the principle of optimality it holds that

D^k_{i,j} = min{ D^{k-1}_{i,j}, D^{k-1}_{i,k} + D^{k-1}_{k,j} }.

In particular, if no shortest path between i and j passes through k, then D^k_{i,j} = D^{k-1}_{i,j}.
Algorithm 2.9 solves the All Pairs Shortest Paths problem in O(n^3) time and O(n^2) space.
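A Python sketch of the Floyd-Warshall recurrence, updating the matrices D^k in place; the input convention (infinite weight for missing edges, zero on the diagonal) is an assumption for illustration.

```python
def floyd_warshall(weights):
    """All Pairs Shortest Paths (cf. Algorithm 2.9): weights is an n x n matrix
    with float('inf') for missing edges and 0 on the diagonal."""
    n = len(weights)
    D = [row[:] for row in weights]            # D^0: the adjacency (weight) matrix
    for k in range(n):                         # allow node k as an intermediate node
        for i in range(n):
            for j in range(n):
                if D[i][k] + D[k][j] < D[i][j]:
                    D[i][j] = D[i][k] + D[k][j]
    return D                                   # D[i][j] = shortest path cost from i to j
```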
Multiple Sequence Alignment
Dynamic programming is a very effective means in many domains. Here, we will give a Multiple Sequence Alignment example (see Sec. 1.7.6). Let w(a, b) define the cost of substituting character a with character b (where a gap is treated as a blank character _), and denote the distance between two strings m and m' as δ(m, m'). Then, according to the principle of optimality, the following recurrence relation holds for the prefixes m_1...m_i and m'_1...m'_j:

δ(m_1...m_i, m'_1...m'_j) = min{ δ(m_1...m_{i-1}, m'_1...m'_{j-1}) + w(m_i, m'_j), δ(m_1...m_{i-1}, m'_1...m'_j) + w(m_i, _), δ(m_1...m_i, m'_1...m'_{j-1}) + w(_, m'_j) }.
A pairwise alignment can be conveniently depicted as a path between two opposite corners in a two-dimensional grid: one sequence is placed on the horizontal axis from left to right, the other one on the vertical axis from top to bottom. If there is no gap in either string, the path moves diagonally down and right; a gap in the vertical (horizontal) string is represented as a horizontal (vertical) move right (down), since a letter is consumed in only one of the strings. The alignment graph is directed and acyclic, where a (nonborder) vertex has incoming edges from the left, top, and top-left adjacent vertices, and outgoing edges to the right, bottom, and bottom-right vertices.
The algorithm progressively builds up alignments of prefixes of m and m' in a bottom-up fashion. The costs of partial alignments are stored in a matrix D, where D_{i,j} contains the distance between the prefixes m_1...m_i and m'_1...m'_j. The exact order of the scan can vary (e.g., rowwise or columnwise), as long as it is compatible with a topological order of the graph; a topological order of a directed, acyclic graph is a sorting of the nodes v_1, v_2, ... such that if v_j is reachable from v_i, then it must hold that i ≤ j. In particular, v_1 has no incoming edges, and if the number of nodes is some finite n, then v_n has no outgoing edges. In general, many different topological orderings can be constructed for a given graph.
For instance, in the alignment of two sequences, a cell value depends on the values of the cells to the left, top, and diagonally top-left, and these have to be explored before it. Algorithm 2.10 shows the case of columnwise traversal. Another particular such ordering is that of antidiagonals, which are diagonals running from top-right to bottom-left. The antidiagonal number of a node is simply the sum of its coordinates.
As an example, the completed matrix for the edit distance between the strings sport and sort is shown in Figure 2.9. After all matrix entries have been computed, the solution path has to be reconstructed to obtain the actual alignment. This can be done iteratively in a backward direction, starting from the bottom-right corner and going up to the top-left corner, selecting in every step a parent node that allows a transition with the given cost. Alternatively, we could store in each cell an additional pointer to the relevant predecessor.
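A Python sketch of the pairwise alignment recurrence; unit gap and substitution costs are assumed for illustration, so the result is the classic edit distance rather than the exact cost function underlying Figure 2.9.

```python
def edit_distance(m, m_prime, sub_cost=lambda a, b: 0 if a == b else 1, gap_cost=1):
    """Pairwise alignment by dynamic programming (cf. Algorithm 2.10):
    D[i][j] holds the distance between the prefixes m[:i] and m_prime[:j]."""
    rows, cols = len(m) + 1, len(m_prime) + 1
    D = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        D[i][0] = i * gap_cost                 # leading gaps in m_prime
    for j in range(1, cols):
        D[0][j] = j * gap_cost                 # leading gaps in m
    for i in range(1, rows):                   # rowwise scan: one valid topological order
        for j in range(1, cols):
            D[i][j] = min(D[i - 1][j - 1] + sub_cost(m[i - 1], m_prime[j - 1]),  # substitute
                          D[i - 1][j] + gap_cost,                                # gap in m_prime
                          D[i][j - 1] + gap_cost)                                # gap in m
    return D[-1][-1]

# Example: the edit distance between "sport" and "sort" (cf. Figure 2.9) is 1.
assert edit_distance("sport", "sort") == 1
```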
It is straightforward to generalize pairwise sequence alignment to the case of aligning k sequences simultaneously, by considering higher-dimensional lattices. For example, an alignment of three sequences can be visualized as a path in a cube. Figure 2.10 illustrates an example for the alignment
A B C _ B
_ B C D _
_ _ _ D B
If the sequence length is at most n, the generalization of Algorithm 2.10 requires O(n^k) time and space to store the dynamic programming table. In Section 6.3.2, we will present a refined algorithm that reduces the space complexity by one order of magnitude. An example of how successor costs are calculated, with the cost matrix of Figure 2.10 and a gap opening penalty of 4, is shown in Figure 2.11.
Markov Decision Process Problems
A common way of calculating an optimal policy is by means of dynamic programming, using either policy iteration or value iteration.
Both policy iteration and value iteration are based on the Bellman optimality equation; written for a cost model with action costs c(u, a) and transition probabilities P(v | u, a), it reads

f(u) = min_{a ∈ A(u)} [ c(u, a) + Σ_v P(v | u, a) · f(v) ].

In some cases, we apply a discount factor γ ∈ [0, 1) to handle infinite paths. Roughly speaking, we can define the value of a state as the total reward/cost an agent can expect to accumulate when traversing the graph according to its policy, starting from that state. The discount factor defines how much more we should value immediate costs/rewards, compared to costs/rewards that are only attainable after two or more steps. Formally, the corresponding equation according to the principle of optimality is

f(u) = min_{a ∈ A(u)} [ c(u, a) + γ · Σ_v P(v | u, a) · f(v) ].
Policy iteration successively improves a policy π by setting

π(u) ← argmin_{a ∈ A(u)} [ c(u, a) + Σ_v P(v | u, a) · f^π(v) ]

for each state u, where the evaluation of π, f^π, can be computed by solving a system of linear equations (one equation per state):

f^π(u) = c(u, π(u)) + Σ_v P(v | u, π(u)) · f^π(v).
A pseudo-code implementation of policy iteration is shown in Algorithm 2.11.
Value iteration improves the estimated cost-to-go function f by successively performing the following operation for each state u:

f(u) ← min_{a ∈ A(u)} [ c(u, a) + Σ_v P(v | u, a) · f(v) ].
The algorithm exits if an error bound on the policy evaluation falls below a user-supplied threshold ϵ, or if a maximum number of iterations has been executed. If the optimal cost f*(u) is known for each state, the optimal policy can be easily extracted by choosing an operation according to a single application of the Bellman equation. Value iteration is shown in pseudo code in Algorithm 2.12. The procedure takes a heuristic h for initializing the value function as an additional parameter.
The error bound on the value function is also called the residual, and can, for example, be computed as the maximum absolute change of any state value between two consecutive iterations. A residual of zero denotes that the process has converged. An advantage of policy iteration is that it converges to the exact optimum, whereas value iteration usually only reaches an approximation. On the other hand, the latter technique is usually more efficient on large state spaces.
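A Python sketch of value iteration for a cost-minimizing model; the functions actions, cost, and transition, as well as the termination parameters, are illustrative assumptions about the interface, not a prescribed API.

```python
def value_iteration(states, actions, cost, transition, epsilon=1e-6, max_iter=1000):
    """Value iteration sketch: f(u) <- min_a [ cost(u,a) + sum_v P(v|u,a) * f(v) ].
    transition(u, a) is assumed to return a list of (v, probability) pairs."""
    f = {u: 0.0 for u in states}                    # could also be initialized with a heuristic h
    for _ in range(max_iter):
        residual = 0.0
        for u in states:
            if not actions(u):                      # terminal state: value stays fixed
                continue
            new_f = min(cost(u, a) + sum(p * f[v] for v, p in transition(u, a))
                        for a in actions(u))
            residual = max(residual, abs(new_f - f[u]))   # track the largest change
            f[u] = new_f
        if residual < epsilon:                      # converged within tolerance
            break
    return f
```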
For implicit search graphs the algorithms proceed in two phases. In the first phase, the whole state space is generated from the initial state s. In this process, an entry in a hash table (or vector) is allocated to store the f-value for each state u; this value is initialized to the cost of u if u is a terminal state, or to a given (not necessarily admissible) heuristic estimate (or zero if no estimate is available) if u is nonterminal. In the second phase, iterative scans of the state space are performed, updating the values of nonterminal states u as

f(u) ← min_{a ∈ A(u)} q(u, a),

where the value q(u, a) of applying action a in state u depends on the search model (see Sec. 1.8.4).
Value iteration converges to the optimal value function, provided that its values are finite for all states u. In the case of MDPs, which may have cyclic solutions, the number of iterations is not bounded and value iteration typically only converges in the limit. For this reason, for MDPs, value iteration is often terminated after a predefined number of iterations has been performed, or when the residual falls below a given ϵ > 0.
Monte Carlo policy evaluation estimates f^π(u), the value of a state u under a given policy π. Given a set of iterations (sampled episodes), the value function f^π is approximated by following π. To estimate f^π(u), we count the visits to a fixed state u, and the value f^π(u) is computed by averaging the returns observed in this set of iterations. Monte Carlo policy evaluation converges to f^π(u) as the number of visits goes to infinity. The main argument is that, by the law of large numbers, the sequence of averages converges to its expectation.
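A Python sketch of first-visit Monte Carlo policy evaluation; policy, step, and is_terminal are assumed interfaces, with step(u, a) sampling a successor state and the incurred cost according to the transition model.

```python
def mc_policy_evaluation(start, policy, step, is_terminal, episodes=1000):
    """First-visit Monte Carlo policy evaluation: estimate f^pi(u) by averaging
    the returns observed after the first visit to u in each sampled episode."""
    returns = {}                                       # state -> list of sampled returns
    for _ in range(episodes):
        trajectory = []                                # (state, cost) pairs of one episode
        u = start
        while not is_terminal(u):
            v, c = step(u, policy(u))
            trajectory.append((u, c))
            u = v
        G, first_return = 0.0, {}
        for state, c in reversed(trajectory):          # accumulate costs backward
            G += c
            first_return[state] = G                    # keeps the return of the earliest visit
        for state, G in first_return.items():
            returns.setdefault(state, []).append(G)
    return {u: sum(g) / len(g) for u, g in returns.items()}
```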
For convenience of terminology, in the sequel we will continue referring to nodes when dealing with the search algorithm.
2.4. Summary
In this chapter, we discussed several search algorithms and their properties. Some search algorithms find a path from the given start state to any of the given goal states; others find paths from the given start state to all other states; and still others find paths from all states to all other states. Desirable properties of search algorithms include their correctness (they find a path if and only if one exists), optimality (they find a shortest path), a small runtime, and small memory consumption. In some cases, the correctness or optimality of a search algorithm is only guaranteed on special kinds of graphs, such as unit-cost or nonnegative-cost graphs, or graphs in which the goal distances of all states are nonnegative. We are often forced to make trade-offs between the different properties, for example, to find suboptimal paths because the runtime or memory consumption would otherwise be too large.
We discussed that most search algorithms perform dynamic programming as an underlying technique. Dynamic programming is a general problem-solving technique that assembles the solutions of complex problems from solutions to simpler problems, which are calculated once and reused several times. Dynamic programming works over the entire state space as a whole, in contrast to heuristic search algorithms, which focus on finding only the optimal path for a single current state and prune everything else for efficiency.
On the other hand, most heuristic search algorithms apply dynamic programming updates and traverse the state space in the same way. They build a tree from the given start state to the goal state, maintaining an estimate of the start distances of all states in the search tree. The interior nodes are in the Closed list and the leaf nodes are in the Open list. They repeatedly pick a leaf of the search tree and expand it; that is, generate its successors in the state space and then add them as its children in the search tree. (They differ in which leaves they pick.) They can find a path from the start state to any of the given goal states if they stop when they are about to expand a goal state and return the only path from the root of the search tree to this goal state. They can find paths from the start state to all other states if they stop only when they have expanded all leaf nodes of the search tree. We distinguished uninformed and informed search algorithms. Uninformed search algorithms exploit no knowledge in addition to the graphs they search. We discussed depth-first search, breadth-first search, Dijkstra's algorithm, the Bellman-Ford algorithm, and the Floyd-Warshall algorithm in this context.
Informed (or synonymously, heuristic) search algorithms exploit estimates of the goal distances of the nodes (heuristic values) to be more efficient than uninformed search algorithms. We discussed A* in this context. We discussed the properties of A* in detail since we discuss many variants of it in later chapters. A* with consistent heuristics has many desirable properties:
• A* can find a shortest path even though it expands every state at most once. It does not need to reexpand states that it has expanded already.
• A* is at least as efficient as every other search algorithm in the sense that every search algorithm (that has the same heuristic values available as A*) needs to expand at least the states that A* expands (modulo tie-breaking; that is, possibly except for some states of which the f-values are equal to the length of the shortest path).
• A* with any given heuristic values cannot expand more states than A* with heuristic values that are dominated by the given heuristic values (again modulo tie-breaking).
Many of these properties followed from the properties of Dijkstra's algorithm, since A* for a given search problem and heuristic values behave identically to Dijkstra's algorithm on a search problem that can be derived from the given search problem by changing the edge costs to incorporate the heuristic values.
Table 2.5 summarizes the uninformed and informed search algorithms, namely how to implement their Open and Closed lists, whether they can exploit heuristic values, which values their edge costs can take on, whether they find shortest paths, and whether or not they can reexpand states that they have expanded already. (The figures that describe their pseudo code are given in parentheses.) Most search algorithms build either on depth-first search or on breadth-first search, where depth-first search trades off runtime and optimality for a small memory consumption and breadth-first search does the opposite. Many discussed algorithms relate to breadth-first search: Dijkstra's algorithm is a specialized version of A* in case no heuristic values are available. It is also a specialized and optimized version of the Bellman-Ford algorithm in case all edge costs are nonnegative. Breadth-first search, in turn, is a specialized and optimized version of Dijkstra's algorithm in case all edge costs are uniform. Finally, we discussed algebraic extensions of the operator costs and generalizations to search problems in which the edge costs are vectors of numbers (e.g., the cost and time required for executing an action), explaining why dynamic programming often cannot solve them as efficiently as problems in which the edge costs are scalar numbers, because many paths from the start state to states in the search tree may have to be maintained.
2.6. Bibliographic Notes
Shortest path search is a classic topic in the field of combinatorial optimization. The algorithm of Dijkstra (1959) to find shortest paths in graphs with nonnegative edge weights is one of the most important algorithms in computer science. Sound introductions to shortest path search are given by Tarjan (1983) and Mehlhorn (1984). The linear-time shortest path algorithm for acyclic graphs can be found in Lawler (1976).
Using lower bounds to prune search goes back to the late 1950s, when branch-and-bound search was proposed. The first use of distance goal estimates to guide state space search is probably due to Doran and Michie (1966) in a program called GraphTraverser. Original A* search as proposed by Hart, Nilsson, and Raphael (1968) refers to Dijkstra's algorithm, but in most AI textbooks the link to standard graph theory has vanished. The reweighting transformation is found in Cormen, Leiserson, and Rivest (1990) in the context of the All Pairs Shortest Paths problem of Johnson (1977). It makes it abundantly clear that heuristics do not change the branching factor of a space, but affect the relative depth of the goal, an issue stressed when predicting the search effort of IDA*. The general problem of finding the shortest path in a graph containing cycles of negative length is NP-hard (Garey and Johnson, 1979).
Bellman (1958) and Ford and Fulkerson (1962) discovered the algorithm for shortest path search in negative graphs independently. A nondeterministic version is discussed by Ahuja, Magnanti, and Orlin (1989). Gabow and Tarjan (1989) showed that if C is the weight of the largest edge, the algorithm of Bellman and Ford can be improved upon by scaling techniques. To alleviate the problem of exponentially many node expansions in A*, Martelli (1977) suggested a variation of the Bellman-Ford algorithm, where the relaxation is performed for the entire set of expanded nodes each time a new node is considered. His algorithm has been slightly improved by Bagchi and Mahanti (1983). The heuristic search versions of the Bellman-Ford algorithm were entitled C, ProbA, and ProbC (Bagchi and Mahanti, 1985).
Mero (1984) has analyzed heuristic search with a modifiable estimate in an algorithm B′ that extends the B algorithm of Martelli (1977). A taxonomy of shortest path search has been given by Deo and Pang (1984), and a general framework that includes heuristic search has been presented by Pijls and Kolen (1992).
Sturtevant, Zhang, Holte, and Schaeffer (2008) have analyzed inconsistent heuristics for A* search, and propose a new algorithm that introduces a delay list for a better trade-off.
The technique of dynamic programming has been introduced by Bellman (1958). One of the most important applications of this principle in computer science is probably the parsing of context-free languages (Hopcroft and Ullman, 1979; Younger, 1967). Computing edit distances with dynamic programming is a result that goes back to mathematicians like Ulam and Knuth. In computational biology, the work of Needleman and Wunsch (1970) is considered the first publication that applies dynamic programming to compute the similarity of two strings. A neat application also used for compiling this textbook is the optimal line break algorithm of Knuth and Plass (1981). A good introduction to the subject of dynamic programming is found in the textbook of Cormen et al. (1990). The lattice edge representation for multiple sequence alignment has been adopted in the program MSA by Gupta, Kececioglu, and Schaffer (1996).
The term best-first search appears in the AI literature in two different meanings. Pearl (1985) uses this term to define a general search algorithm that includes A* as a special case. Others use it to describe an algorithm that always expands the node estimated closest to the goal. Russell and Norvig (2003) coined the term greedy best-first search to avoid confusion. In our notation an algorithm is optimal if it computes a shortest solution path; we thus speak of an optimal rather than an admissible algorithm, the term used by Pearl (1985). Optimality in our notation does not mean optimal efficiency. The optimal efficiency argument on the number of node expansions in A* with consistent heuristics has been given by Dechter and Pearl (1983). Common misconceptions of heuristic search have been highlighted by Holte (2009).
The cost formalism extends the one of Sobrinho (2002) with an additional set, to which the heuristic function maps. The problem of Cartesian products and power constructions as defined for semirings by Bistarelli, Montanari, and Rossi (1997) is that they provide only partial orders. Observations on multiobjective A* have been provided by Mandow and de-la Cruz (2010a,b). Multiobjective IDA* has been considered by Coego, Mandow, and de-la Cruz (2009). Speed-up techniques for a Dijkstra-like search have been studied by Berger, Grimmer, and Müller-Hannemann (2010). Advances in multiobjective search also include work on minimum spanning trees by Galand, Perny, and Spanjaard (2010), as well as Knapsack problems by Delort and Spanjaard (2010).