Optimal planners compute the best possible plan. They minimize the number of (sequential or parallel) plan steps or optimize the plan with respect to the plan metric. There are many different proposals for optimal planning, ranging from planning graph encodings and satisfiability solving to integer programming and constraint satisfaction, to name just a few. Many modern approaches perform heuristic search planning based on admissible estimates.
We first introduce optimal planning via layered planning graphs and via satisfiability planning. Next, admissible heuristics are devised that are based on dynamic programming or on pattern databases.
15.1.1. Graphplan
Graphplan is a parallel-optimal planner for propositional planning problems. The rough working of the
Graphplan algorithm is the following. First, the parametric input is grounded. Starting with a planning graph that consists only of the initial state in the first layer, the graph is incrementally constructed. In one stage of the algorithm, the planning graph is extended with one action and one propositional layer, followed by a plan extraction phase. If this search terminates with no solution, the plan horizon
is extended with one additional layer. As an example, consider the
Rocket domain for transporting cargo in space (see
Fig. 15.1). The layered planning graph is shown in
Table 15.1.
Table 15.1 Layered planning graph for the Rocket domain. Propositional and action levels alternate. The first propositional level includes the initial state. Action layer i contains all actions whose preconditions are satisfied in propositional layer i. A proposition is added to layer i + 1 if it is already contained in layer i or there is an applicable action in layer i that has the proposition as an add-effect.
Prop. Level 1 | Action Level 1 | Prop. Level 2 | Action Level 2 | Prop. Level 3 |
---|---|---|---|---|
| (load b l) | (in b r) | (noop) | (in b r) |
| (load a l) | (in a r) | (noop) | (in a r) |
| (move l p) | (at r p) | (noop) | (at r p) |
| | | (unload a p) | (at a p) |
| | | (unload b p) | (at b p) |
(at a l) | (noop) | (at a l) | (noop) | (at a l) |
(at b l) | (noop) | (at b l) | (noop) | (at b l) |
(at r l) | (noop) | (at r l) | (noop) | (at r l) |
(fuel r) | (noop) | (fuel r) | (noop) | (fuel r) |
In the forward phase the next action layer is determined as follows. For each action and each instantiation of it, a node is inserted, provided that no two preconditions are marked as mutually exclusive (mutex). Two propositions a and b are marked mutex if all options to generate a are exclusive of all options to generate b. Additionally, so-called noop actions are included that merely propagate existing propositions to the next layer. Next, the actions are tested for exclusiveness, and for each action a list of all other actions with which it is exclusive is recorded. To generate the next propositional layer, the add-effects are inserted and taken as the input for the next stage.
A fixpoint is reached when the set of propositions no longer changes. As one subtle problem, this test for termination is not sufficient. Consider a goal in
Blocksworld given by
(on a b),
(on b c), and
(on c a). Any two of the conditions are jointly satisfiable, but not all three at once. Thus, the forward phase terminates if the set of propositions
and the goal set in the previous stage have not changed.
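The forward phase just described can be sketched compactly. The following Python sketch builds one stage of the layered graph; it deliberately omits the mutex bookkeeping for brevity, and the `Action` tuple and Rocket-style atom names are illustrative assumptions, not Graphplan's original data structures:

```python
from typing import FrozenSet, List, NamedTuple, Set, Tuple

class Action(NamedTuple):
    name: str
    pre: FrozenSet[str]
    add: FrozenSet[str]
    delete: FrozenSet[str]

def extend_graph(props: Set[str],
                 actions: List[Action]) -> Tuple[Set[str], List[Action]]:
    """One forward stage: build the next action layer (applicable actions
    plus noops) and from it the next propositional layer."""
    layer = [a for a in actions if a.pre <= props]      # preconditions satisfied
    layer += [Action("noop-" + p, frozenset([p]), frozenset([p]), frozenset())
              for p in props]                           # noops propagate atoms
    nxt = set(props)
    for a in layer:
        nxt |= a.add                                    # insert the add-effects
    return nxt, layer

# One stage in a tiny Rocket-style example (hypothetical atom names):
load_a = Action("(load a l)", frozenset({"(at a l)", "(at r l)"}),
                frozenset({"(in a r)"}), frozenset({"(at a l)"}))
props1 = {"(at a l)", "(at r l)"}
props2, layer1 = extend_graph(props1, [load_a])
# props2 now contains (in a r) and, via the noops, everything from props1
```

Iterating this stage until the propositional layer (and the satisfied goal set) stabilizes yields the fixpoint test described above.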
It is not difficult to see that the size of the planning graph is polynomial in the number
n of objects, the number
m of action schemas, the number
p of propositions in the initial state, the length
l of the longest add-list, and the number of layers in the planning graph. Let
k be the largest number of action parameters. The number of instantiated effects is bounded by l · n^k. Therefore, the maximal number of nodes in every propositional layer is bounded by p + m · l · n^k. Since every action schema can be instantiated in at most n^k different ways, the maximal number of nodes in each action layer is bounded by m · n^k.
In the generated planning graph, Graphplan constructs a valid plan, chaining backward from the set of goals. In contrast to most other planners, this processing works layer by layer to take care of preserving the mutex relations. For a given time step t, Graphplan tries to extract all possible actions (including noops) from time step t − 1 that have the current subgoal among their add-effects. The preconditions of these actions build the subgoals to be satisfied at time step t − 1 in order to satisfy the chosen subgoal at time step t. If the set of subgoals is not solvable, Graphplan chooses a different set of actions, and recurs until all subgoals are satisfied or all possible combinations have been exhausted. In the latter case, no plan (with respect to the imposed depth threshold) exists, as it can be shown that Graphplan terminates with failure if and only if there is no solution to the current planning problem. The complexity of the backward phase is exponential (and has to be, unless PSPACE = P). By its layered construction, Graphplan computes a plan with the smallest number of parallel plan steps.
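A minimal recursive sketch of this backward extraction follows; it again omits the mutex tests and assumes action layers that already include the noop actions, with `Action` tuples carrying `pre`/`add`/`delete` fields (an assumed representation):

```python
from collections import namedtuple

Action = namedtuple("Action", "name pre add delete")

def extract(goals, action_layers, init, t):
    """Chain backward from the goals at propositional layer t: pick
    achievers from action layer t - 1 and recur on their preconditions.
    Returns a list of action sets (one per layer) or None on failure."""
    goals = frozenset(goals)
    if t == 0:
        return [] if goals <= init else None

    def cover(remaining, chosen):
        if not remaining:                      # all subgoals achieved
            pre = set().union(*(a.pre for a in chosen)) if chosen else set()
            rest = extract(pre, action_layers, init, t - 1)
            return None if rest is None else rest + [chosen]
        g = next(iter(remaining))
        for a in action_layers[t - 1]:         # try every achiever of g
            if g in a.add:
                plan = cover(remaining - a.add, chosen + [a])
                if plan is not None:
                    return plan
        return None                            # backtrack on this subgoal set

    return cover(goals, [])

# Hypothetical one-layer example: A achieves g from precondition p
A = Action("A", frozenset({"p"}), frozenset({"g"}), frozenset())
plan = extract({"g"}, [[A]], {"p"}, 1)
```

The backtracking over achiever choices in `cover` is exactly the exponential-in-the-worst-case search mentioned above.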
15.1.2. Satplan
To cast a fully instantiated action planning task as a satisfiability problem we assign to each proposition a time stamp denoting when it is valid. As an example, for the initial state and goal condition of a sample
Blocksworld instance, we may have generated the formula

(on a b 1) ∧ (on b t 1) ∧ (clear a 1) ∧ (on b a 3) ∧ (on a t 3).
Further formulas express action execution. They include action effects, such as

(move a b t 1) ⇒ (on a t 2) ∧ (clear b 2) ∧ ¬(on a b 2),

and
noop clauses for propositions to persist. To express that no action with unsatisfied preconditions is executed, we may introduce rules in which additional action literals imply their preconditions, such as

(move a b t 1) ⇒ (on a b 1) ∧ (clear a 1).
Effects are dealt with analogously. Moreover, (for sequential plans) we have to express that at one point in time only one action can be executed. We have

¬(a i) ∨ ¬(a′ i)

for each step i and each pair of distinct actions a ≠ a′.
Finally, for a given step threshold
N we require that at every step
i <
N at least one action has to be executed; for all
i <
N we require

(a1 i) ∨ (a2 i) ∨ ⋯ ∨ (an i),

where a1, …, an are the instantiated actions.
The only model of this planning problem is (move a b t 1), (move b t a 2). Satplan is a combination of a planning problem encoding and a SAT solver. The planner's performance improves alongside the development of ever more efficient SAT solvers.
Extensions of Satplan to parallel planning encode the planning graph and the mutexes of Graphplan. For this approach, Satplan does not find a plan with the minimal total number of steps, but one with the minimal number of parallel steps.
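The exclusion and at-least-one-action clauses of the sequential encoding can be generated mechanically. The sketch below emits clauses as lists of signed literals, a hypothetical representation chosen for readability (real Satplan encodings map literals to DIMACS integers for the SAT solver):

```python
from itertools import combinations

def sequential_axioms(actions, steps):
    """Clauses forcing exactly one action per step: pairwise exclusion
    (at-most-one) plus a single at-least-one clause per step.
    A literal is a tuple (sign, action, step)."""
    clauses = []
    for i in range(1, steps + 1):
        for a, b in combinations(actions, 2):
            clauses.append([("-", a, i), ("-", b, i)])   # not both at step i
        clauses.append([("+", a, i) for a in actions])   # some action at step i
    return clauses

cls = sequential_axioms(["(move a b t)", "(move b t a)"], 2)
# 2 actions, 2 steps: one exclusion pair plus one at-least-one clause
# per step, i.e., 4 clauses in total
```

Initial state, goal, precondition, and effect axioms would be generated in the same clause format and handed to the solver together.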
15.1.3. Dynamic Programming
One of the first estimates for heuristic search planning is the
max-atom heuristic. It is an approximation of the optimal cost for solving a relaxed problem in which the delete lists are ignored. An illustration is provided in
Figure 15.2 (left).
The heuristic is based on dynamic programming. Consider a grounded propositional planning problem
(see
Ch. 1) with planning states
u and
v, and propositions
p and
q. Value
is the sum of the approximations
g(
u,
p), to reach
p from
u, where
is a fixpoint equation. The recursion starts with
g(
u,
p) = 0, for
p ∈
u, and
g(
u,
p) = ∞, otherwise. The values of
g are computed iteratively, until the values no longer change. The costs
g(
u,
C) for the set
C is computed as
. In fact, the cost of achieving all the propositions in a set can be computed as either a sum (if sequentiality is assumed) or as a max (if parallelism is assumed instead). The
max-atom heuristic chooses the maximum of the cost values with
. The pseudo code is shown in
Algorithm 15.1. This heuristic is consistent, since by taking the maximum cost values, the heuristic on an edge can decrease by at most 1.
As the heuristic determines the backward distance of atoms with respect to the initial state, the previous definition of the heuristic is of use only for backward or regression search. However, it is not difficult to implement a variant that is applicable to forward or progression search, too.
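The fixpoint computation can be sketched directly. The following is a simplified progression variant under an assumed `Action` tuple representation, not a transcription of the book's Algorithm 15.1:

```python
import math
from collections import namedtuple

Action = namedtuple("Action", "name pre add delete")

def atom_costs(u, actions):
    """Iterate g(u, p) to a fixpoint: g = 0 for p in u, infinity otherwise;
    an action a improves each p in add(a) to 1 + g(u, pre(a)),
    where the set cost of pre(a) is taken as a max (max-atom)."""
    props = set(u)
    for a in actions:
        props |= a.pre | a.add
    g = {p: (0 if p in u else math.inf) for p in props}
    changed = True
    while changed:
        changed = False
        for a in actions:
            cost = max((g[q] for q in a.pre), default=0)   # max-atom set cost
            for p in a.add:
                if 1 + cost < g[p]:
                    g[p] = 1 + cost
                    changed = True
    return g

def h_max(u, goal, actions):
    g = atom_costs(u, actions)
    return max((g[p] for p in goal), default=0)

# Hypothetical two-step chain p -> q -> r:
acts = [Action("a1", frozenset({"p"}), frozenset({"q"}), frozenset()),
        Action("a2", frozenset({"q"}), frozenset({"r"}), frozenset())]
# h_max({"p"}, {"r"}, acts) evaluates to 2
```

Replacing the two `max` operations by sums gives the additive (sequential) variant of the same table.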
The
max-atom heuristic has been extended to
max-atom-pair to include larger atom sets, which enlarges the extracted information (without losing admissibility) by approximating the costs of atom pairs:

g(u, {p, q}) = min { 1 + g(u, pre(a)) for a ∈ A with p & q,
1 + g(u, pre(a) ∪ {q}) for a ∈ A with p | q,
1 + g(u, pre(a) ∪ {p}) for a ∈ A with q | p },
where
p & q denotes that
p and
q are both contained in the add list of the action, and where
p|
q denotes that
p is contained in the add list and
q is neither contained in the add list nor in the delete list. If all goal distances are precomputed, then these distances can be retrieved from a table, yielding a fast heuristic.
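Once the pair costs are precomputed, evaluating the heuristic reduces to table lookups. A sketch, where the table keyed by atom sets is an assumed representation:

```python
from itertools import combinations

def h2_lookup(goal, pair_cost):
    """max-atom-pair estimate: maximize the precomputed costs over all
    singletons and pairs of goal atoms; missing entries default to 0."""
    best = max((pair_cost.get(frozenset({p}), 0) for p in goal), default=0)
    for p, q in combinations(sorted(goal), 2):
        best = max(best, pair_cost.get(frozenset({p, q}), 0))
    return best

# Hypothetical precomputed table of atom and atom-pair costs:
table = {frozenset({"x"}): 1, frozenset({"y"}): 1, frozenset({"x", "y"}): 3}
# h2_lookup({"x", "y"}, table) -> 3
```

The pair entry can exceed the maximum of its singleton entries, which is exactly the extra information gained over max-atom.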
The extension to h^m with atom sets of up to m elements has to be expressed with the following recursion:

g(u, C) = min {g(u, C), 1 + g(u, D) | (D, O) ∈ R(C)},

where R(C) denotes all pairs (D, O) with O ∈ A, add(O) ∩ C ≠ ∅, del(O) ∩ C = ∅, and D = (C \ add(O)) ∪ pre(O). It is rarely used in practice.
It has been argued that the layered arrangement of propositions in Graphplan can be interpreted as a special case of a search with h^2.
15.1.4. Planning Pattern Databases
In the following, we study how to apply
pattern databases (see
Ch. 4) to action planning. For a fixed state vector (in the form of a set of instantiated predicates) we provide a general abstraction function. The ultimate goal is to create admissible heuristics for planning problems fully automatically.
Planning patterns omit propositions from the planning space. Therefore, in an abstract planning problem with regard to a set R of propositions, a domain abstraction is induced by ϕ(u) = u ∩ R. (Later on we will discuss how
R can be selected automatically.) The interpretation of
ϕ is that all Boolean variables for propositions not in
R are mapped to
don't care. An
abstract planning problem of a propositional planning problem (
S,
A,
s,
T) with respect to a set of propositional atoms
R is defined by P|R = (S|R, A|R, s|R, T|R), where S|R = {u ∩ R | u ∈ S}, s|R = s ∩ R, T|R = {t ∩ R | t ∈ T}, and a|R for a = (pre(a), add(a), del(a)) ∈ A is given as a|R = (pre(a) ∩ R, add(a) ∩ R, del(a) ∩ R). An illustration is provided in
Figure 15.2 (right).
This means that abstract actions are derived from concrete ones by intersecting their precondition, add, and delete lists with the subset of predicates in the abstraction. Restriction of actions in the concrete space may yield
noop actions in the abstract planning problem, which are discarded from the action set
A|
R.
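Under the same (pre, add, delete)-list representation used earlier, restriction to a pattern R amounts to a few set intersections, with abstract noops filtered out. A sketch:

```python
from collections import namedtuple

Action = namedtuple("Action", "name pre add delete")

def restrict(actions, R):
    """Project each action onto the pattern R and discard abstract noops,
    i.e., actions whose restricted add and delete lists are both empty."""
    R = frozenset(R)
    result = []
    for a in actions:
        a_R = Action(a.name, a.pre & R, a.add & R, a.delete & R)
        if a_R.add or a_R.delete:
            result.append(a_R)
    return result

# Hypothetical actions: a2 touches no atom in R, becomes a noop, is dropped
acts = [Action("a1", frozenset({"p"}), frozenset({"q"}), frozenset({"p"})),
        Action("a2", frozenset({"x"}), frozenset({"y"}), frozenset({"x"}))]
abstract = restrict(acts, {"p", "q"})
```

Dropping the noops keeps the abstract action set small without affecting abstract goal distances.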
A
planning pattern database with respect to a set of propositions
R and a propositional planning problem P is a collection of pairs (
d,
v) with v ∈ S|R and
d being the shortest distance from v to the abstract goal. Restriction |
R is
solution preserving; that is, for any sequential plan
π for the propositional planning problem
P there exists a plan π|R for the abstraction P|R. Moreover, an optimal abstract plan for
P|
R is shorter than or equal to an optimal plan for
P. Strict inequality holds if some abstract actions are noops, or if there are even shorter paths in the abstract space. We can also prove consistency of pattern database heuristics as follows. For each action
a that maps
u to
v in the concrete space, a|R maps ϕ(u) to ϕ(v) in the abstract space. By the triangular inequality of shortest paths, the abstract goal distance of
ϕ(
v) plus one is larger than or equal to the abstract goal distance of
ϕ(
u).
Figure 15.3 illustrates a standard (left) and multiple pattern database (right) showing that different abstractions may refer to different parts of the state vector.
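A pattern database in the style of Table 15.2 can be sketched as a breadth-first regression from the abstract goal. The relevance test and regression rule below are the standard ones; representing patterns as plain atom sets is an assumption made for brevity:

```python
from collections import deque, namedtuple

Action = namedtuple("Action", "name pre add delete")

def pattern_db(abstract_goal, abstract_actions):
    """Breadth-first regression from the abstract goal. An action is a
    relevant achiever of pattern v if it adds part of v and deletes
    nothing in v; the regressed pattern is (v - add) | pre."""
    start = frozenset(abstract_goal)
    db = {start: 0}                     # pattern -> abstract goal distance
    queue = deque([start])
    while queue:
        v = queue.popleft()
        for a in abstract_actions:
            if a.add & v and not (a.delete & v):
                u = frozenset((v - a.add) | a.pre)
                if u not in db:
                    db[u] = db[v] + 1
                    queue.append(u)
    return db

# Tiny hypothetical abstraction with one action achieving the goal atom,
# mirroring the first two entries of Table 15.2:
acts = [Action("a", frozenset({"(clear a)"}), frozenset({"(on c a)"}),
               frozenset({"(clear a)"}))]
db = pattern_db({"(on c a)"}, acts)
# db maps {(on c a)} -> 0 and {(clear a)} -> 1
```

During search, the heuristic value of a concrete state is the stored distance of its projected pattern.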
Encoding
Propositional encodings are not necessarily the best state space representations for solving propositional planning problems. A multivalued variable encoding is often better. It transforms a propositional planning task into an
SAS+ planning problem, which is defined as a quintuple (S, V, A, s, T), with V being the set of state variables of the states in S (each v ∈ V has a finite domain Dv),
A being the set of actions, each given by a pair (
P,
E) of preconditions and effects, and
s and
T being the initial state and the goal, respectively, in the form of a partial assignment. A
partial assignment for X ⊆ V is a function
s over
X, such that s(v) ∈ Dv, if
s(
v) is defined.
The process of finding a suitable multivalued variable encoding is illustrated with a simple planning problem, where a truck is supposed to deliver a package from Los Angeles to San Francisco. The initial state is defined by the atoms (ispackage package), (istruck truck), (location los-angeles), (location san-francisco), (at package los-angeles), and (at truck los-angeles). Goal states have to satisfy the condition (at package san-francisco). The domain provides three action schemas named load to load a truck with a certain package at a certain location, the inverse operation unload, and drive to move a truck from one location to another.
The first preprocessing step will detect that only the at (denoting the presence of a given truck or package at a certain location) and in predicates (denoting that a package is loaded in a certain truck) change over time and need to be encoded. The labeling predicates ispackage, istruck, location are not affected by any action and thus do not need to be specified in a state encoding.
In a next step, some mutual exclusion constraints are discovered. In our case, we will detect that a given object will always be at or in at most one other object, so propositions such as (at package los-angeles) and (in package truck) are mutually exclusive. This result is complemented by fact space exploration: ignoring the negative (delete) effects of actions, we exhaustively enumerate all propositions that can be satisfied by any legal sequence of actions applied to the initial state, thus ruling out illegal propositions such as (in los-angeles package), (at package package), or (in truck san-francisco). Now all the information that is needed to devise an efficient state encoding scheme for this particular problem is in the planner's hands. We discover that three multivalued variables are needed: the first encodes the current city of the truck, and the other two encode the status of the package.
With the multivalued variable encoding, the number of possible planning states shrinks drastically, whereas the number of reachable planning states remains unchanged. As another consequence, the multivalued variable encoding provides better predictions of the expected sizes of the abstract planning spaces.
Multiple Pattern Databases
As abstract planning problems are defined by a selection R of atoms there are several different candidates for a pattern database. This remains true when projecting multivalued variables.
We exemplify our considerations in Blocksworld. The domain is specified with the four action schemas pick-up, put-down, stack, and unstack, and the propositions on, ontable, handempty, and clear. The example problem contains four blocks a, b, c, and d used for grounding. For example, (pick–up a) is defined as the precondition list {(clear a), (ontable a), (handempty)}, the add list {(holding a)}, and the delete list {(ontable a), (clear a), (handempty)}. The goal of the instance is {(on d c), (on c a), (on a b)} and the initial state is given by {(clear b), (ontable d), (on b c), (on c a), (on a d)}.
One possible multivalued variable encoding for the domain, consisting of groups of mutually exclusive propositions and automatically inferred by a static analyzer (as explained earlier), has nine variable domains:
Dv1 = {(on c a), (on d a), (on b a), (clear a), (holding a)} (for the blocks on top of a);
Dv2 = {(on a c), (on d c), (on b c), (clear c), (holding c)} (for the blocks on top of c);
Dv3 = {(on a d), (on c d), (on b d), (clear d), (holding d)} (for the blocks on top of d);
Dv4 = {(on a b), (on c b), (on d b), (clear b), (holding b)} (for the blocks on top of b);
Dv5 = {(ontable a), none} (is block a bottom-most in a tower);
Dv6 = {(ontable c), none} (is block c bottom-most in a tower);
Dv7 = {(ontable d), none} (is block d bottom-most in a tower);
Dv8 = {(ontable b), none} (is block b bottom-most in a tower); and
Dv9 = {(handempty), none} (for the gripper).
Each state can be expressed by selecting one proposition per variable; none refers to the absence of all other propositions in the group.
We may select the variables with even index to define the abstraction ϕeven and the variables with odd index to define the abstraction ϕodd. The resulting pattern databases are depicted in
Table 15.2. Note that only three atoms are present in the goal state, so that one of the pattern databases only contains patterns of length one. Abstraction ϕodd corresponds to
v1, and ϕeven corresponds to the union of
v2 and
v4.
Table 15.2 Two pattern databases for the example problem. Pairs of pattern and goal distances are shown. The upper one refers to the projection onto group of v1 with its set of possible assignments (on c a), (on d a), (on b a), (clear a), and (holding a). It is generated starting from (on c a). The second database projects to the cross-product of v2 covering (on a c), (on d c), (on b c), (clear c), (holding c), and v4 covering (on a b), (on c b), (on d b), (clear b), and (holding b); starting from the partial goal state (on d c), (on a b).
((on c a), 0) |
((clear a), 1) |
((holding a), 2) |
((on b a), 2) |
((on d a), 2) |
((on d c), (on a b), 0) | |
((on d c)(clear b), 1) | ((on a b)(clear c), 1) |
((on d c)(holding b), 2) | ((clear c)(clear b), 2) |
((on d c)(on d b), 2) | ((on a b)(holding c), 2) |
((on a c)(on a b), 2) | |
((clear c)(holding b), 3) | ((clear b)(holding c), 3) |
((on a c)(clear b), 3) | ((on d b)(clear c), 3) |
((holding c)(holding b), 4) | ((on b c)(clear b), 4) |
((on a c)(holding b), 4) | ((on c b)(clear c), 4) |
((on d b)(holding c), 4) | ((on a c)(on d b), 4) |
((on b c)(holding b), 5) | ((on a b)(on b c), 5) |
((on d b)(on b c), 5) | ((on c b)(holding c), 5) |
((on a c)(on c b), 5) | ((on c b)(on d c), 5) |
To construct an explicit-state pattern database, one can use a hash table for storing the goal distance for each abstract state. In contrast,
symbolic planning pattern databases are planning pattern databases that have been constructed with BDD exploration for later use in either symbolic or explicit heuristic search. Each search layer is represented by a BDD. In planning practice, symbolic pattern database construction appears to scale better.
Pattern Addressing
To store the pattern databases, large hash tables are required. The estimate of a state is then a mere hash lookup (one for each pattern database). To improve efficiency of the lookup, it is not difficult to devise an incremental hashing scheme.
First, we assume a propositional encoding. For a state u we select

h(u) = (Σ_{p ∈ u} 2^{i(p)}) mod q

as the hash function, with prime q denoting the size of the hash table and i(p) denoting the index of proposition p. The hash value of the successor v = (u \ del(a)) ∪ add(a) can be incrementally calculated as

h(v) = (h(u) − Σ_{p ∈ del(a)} 2^{i(p)} + Σ_{p ∈ add(a)} 2^{i(p)}) mod q.

Of course, hashing induces a (clearly bounded) number of collisions that have to be resolved by using one of the techniques presented in
Chapter 3. Since 2^{i(p)} mod q can be precomputed for all propositions p, we have an incremental running time to compute the hash address of order O(|add(a)| + |del(a)|), which is almost constant for most propositional planning problems (see
Ch. 1). For constant time complexity we store (Σ_{p ∈ add(a)} 2^{i(p)} − Σ_{p ∈ del(a)} 2^{i(p)}) mod q together with each action
a. Either complexity is small when compared with the size of the planning state.
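A sketch of this scheme follows; the table size Q and the lexicographic atom indexing are assumptions:

```python
from collections import namedtuple

Action = namedtuple("Action", "name pre add delete")

Q = 10007                                  # assumed prime hash table size

def weights(atoms):
    """Precompute 2^{i(p)} mod Q for every atom p (indexed lexicographically)."""
    return {p: pow(2, i, Q) for i, p in enumerate(sorted(atoms))}

def hash_state(u, w):
    return sum(w[p] for p in u) % Q

def hash_successor(h_u, a, w):
    """Incremental update: subtract the delete-effects, add the add-effects;
    runs in time O(|add(a)| + |del(a)|)."""
    return (h_u - sum(w[p] for p in a.delete)
                + sum(w[p] for p in a.add)) % Q

atoms = {"p", "q", "r"}
w = weights(atoms)
a = Action("a", frozenset({"p"}), frozenset({"q"}), frozenset({"p"}))
u = {"p", "r"}
v = (u - a.delete) | a.add                 # successor state {q, r}
# hash_successor(hash_state(u, w), a, w) agrees with hash_state(v, w)
```

Storing the combined delta (add-weights minus delete-weights, mod Q) with each action reduces the update to a single addition.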
It is not difficult to extend incremental hash addressing to multivalued variables. According to the partition into variables, a hash function is defined as follows. Let v1, …, vk be the selected variables in the current abstraction. Furthermore, let index(p) and pos(p) be, respectively, the variable index and the position of proposition
p in its variable group. Then, the hash value of state
u is

h(u) = Σ_{p ∈ u} pos(p) · Π_{j < index(p)} |Dvj|.

The incremental computation of
h(
v) for the successor v = (u \ del(a)) ∪ add(a) is immediate.
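With multivalued variables the hash value is in effect a mixed-radix number. A sketch over assumed group sizes, taking a state as the tuple of chosen positions per variable group:

```python
def mv_hash(assignment, domain_sizes):
    """Mixed-radix hash of a state: assignment[j] is the position of the
    chosen proposition within variable group j; the weight of group j is
    the product of the sizes of all earlier groups."""
    h, base = 0, 1
    for pos, size in zip(assignment, domain_sizes):
        h += pos * base
        base *= size
    return h

# Two groups of five propositions each (as for v2 and v4 in the example):
# choosing position 2 in the first group and position 3 in the second
# yields 2 * 1 + 3 * 5 = 17
```

Because the mapping is a bijection onto 0, …, Π|Dvj| − 1, the hash is perfect and the pattern database needs no collision handling.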
Automated Pattern Selection
Even in our simple example planning problem, the number of variables and the sizes of the generated pattern databases for the two abstractions differ considerably. Since we perform a complete exploration for pattern database construction, time and space resources may become exhausted. Therefore, an automatic way to find a balanced partition with regard to a given memory limit is required. Instead of imposing a bound on the total size of all pattern databases together, we impose a threshold on the sizes of the individual pattern databases, which has to be adapted to a given infrastructure.
As shown in
Chapter 4, one option for pattern selection is
Pattern Packing: According to the domain sizes of the variables, it finds a selection of pattern databases so that their combined sizes do not exceed the given threshold.
In the following, we consider genetic algorithms for improved pattern selection, where
pattern genes denote which variable appears in which pattern. Pattern genes have a two-dimensional layout (see
Fig. 15.4). It is recommended to initialize the genes with
Pattern Packing. To prevent all genes of the initial population from being identical, the variable ordering for
Pattern Packing can be randomized.
A pattern mutation flips bits with a small probability. This allows us to add or delete variables in patterns. We extended the mutation operator to enable insertion and removal of entire patterns. Using selection, an enlarged population produced by recombination is truncated to its original size, based on the genes' fitness. The (normalized) fitness of the population is interpreted as a distribution function for selecting the next population. Consequently, genes with higher fitness are chosen with higher probability.
The higher the distance-to-goal values in the abstract space, the better the quality of the corresponding database in accelerating search in the concrete search space. As a consequence, we compute the mean heuristic value for each of the databases and superimpose it over a pattern partitioning. If Pi, i ∈ {1, …, k}, is the ith pattern database and h̄i its mean h-value, then the fitness for gene g is

e(g) = Σ_{i=1}^{k} h̄i   or   e(g) = max_{i ∈ {1, …, k}} h̄i,

where the operator depends on the disjointness of the multiple pattern databases.
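The fitness evaluation and the fitness-proportional selection step can be sketched as follows; reducing a gene to the list of mean h-values of its databases is an assumption made for brevity:

```python
import random

def fitness(mean_h_values, disjoint):
    """Fitness of a pattern gene: sum the mean h-values of its databases
    if they are disjoint (their estimates may be added), max otherwise."""
    return sum(mean_h_values) if disjoint else max(mean_h_values)

def next_population(population, fitnesses, size):
    """Roulette-wheel selection: genes are drawn with probability
    proportional to their (normalized) fitness."""
    return random.choices(population, weights=fitnesses, k=size)

genes = [[2.0, 3.0], [1.0, 4.5]]
scores = [fitness(g, disjoint=True) for g in genes]   # [5.0, 5.5]
survivors = next_population(genes, scores, size=2)
```

Mutation and recombination then perturb the surviving pattern genes before the next round of database construction and evaluation.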