6.3. The Discussion of Statistical Heuristic Search

6.3.1. Statistical Heuristic Search and Quotient Space Theory

Heuristic search is generally implemented on a tree. A tree is a special case of a sequence of quotient spaces: if, in a sequence of hierarchical quotient spaces, the number of elements at each level is finite, then the sequence is a tree. Assume that SA implements its search on a uniform m-ary tree. Let image be the sub-nodes at the first level. From the quotient space point of view, this is equivalent to the partition image of domain image, where image is the set of leaf nodes of the subtree rooted at image. Thus, set image is a quotient set of the set of all leaf nodes. The statistic image of image is extracted from set image.
A heuristic search on a tree or graph can therefore be restated as follows: a statistical heuristic search samples some level (quotient space) of the search space, extracts statistics, and makes statistical inferences on that level. So we can transfer statistical heuristic search from a tree (graph) to a sequence of quotient spaces.
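To make the correspondence concrete, the following sketch partitions the leaves of a uniform m-ary tree by their ancestor at a given level; each equivalence class is one element of the quotient set, and a statistic is extracted from each class. It is a minimal illustration only: m, the depth and the leaf evaluation leaf_value are assumptions, not taken from the text.

import itertools
import statistics

m, depth = 3, 4  # a uniform m-ary tree of the given depth (illustrative values)

# Identify each leaf with its path from the root: a tuple in {0, ..., m-1}^depth.
leaves = list(itertools.product(range(m), repeat=depth))

def leaf_value(leaf):
    # Stand-in for the evaluation attached to a leaf node.
    return sum(leaf)

def quotient_set(level):
    # Partition the leaves by their ancestor at the given level. Each class
    # collects the leaves of one subtree rooted at that level, i.e., one
    # element of the corresponding quotient set.
    classes = {}
    for leaf in leaves:
        classes.setdefault(leaf[:level], []).append(leaf)
    return classes

# The statistic of a first-level node is extracted from its class of leaves.
for root, cls in quotient_set(1).items():
    print(root, statistics.mean(leaf_value(leaf) for leaf in cls))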
1. The Quotient Space Model of Heuristic Search
image is a function defined on image. To find image, there are several ways. In heuristic search, an estimation function image of image and a tree structure on X are introduced; image is then used as a guide to find the optimal solution image on the tree. The same problem can be solved by the quotient space method. First, we transform the problem of finding the optimal solution on space image into one on a set of its quotient spaces image. Then, by letting the grain-size of the quotient space [X] approach zero, the optimal solution on [X] approaches the optimal solution on X. If an estimation function of [ f ] on image is introduced, then we may solve the problem on the quotient spaces. From the above statement we can see the connection between heuristic search and the quotient space problem-solving method.
On the other hand, in statistical inference some statistic is used to estimate function image. Thus, statistical inference methods can be used to judge which subset contains the solution.
This means that we can integrate heuristic search, the quotient space method and statistical inference to form a new statistical heuristic search model – a quotient space model of statistical heuristic search.
Assume that image is a set of random variables in a basic probability space image, image is a sequence of hierarchical quotient spaces of image, and image are their corresponding equivalence relations, where image denotes that image is a quotient space of image, and image is a finite set.
image, let image be a statistic of image. Based on image, implement a statistical inference S; if image is accepted and image, then image is a goal and the search succeeds; otherwise it fails.
If image, then image is partitioned into image by the equivalence relation image. From image we extract the statistic image and implement statistical inference S based on image, and so on. Here the statistic image is extracted from a subset of image, so it represents the global information of that subset.
This is a quotient space model of statistical heuristic search.
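The model can be summarized as the following schematic loop. It is a sketch only: the statistic extraction sample, the inference accepts, and the refinement refine into the next quotient space are placeholders to be supplied by a concrete SA algorithm.

def sa_search(classes, sample, accepts, refine, is_singleton, is_goal):
    # classes: the current quotient space, given as a list of equivalence classes.
    while True:
        # Extract a statistic from each class (the global information of that subset).
        stats = [sample(c) for c in classes]
        # Statistical inference S: look for a class that is accepted.
        accepted = [c for c, s in zip(classes, stats) if accepts(s, stats)]
        if not accepted:
            return None                       # the inference fails on this level
        best = accepted[0]
        if is_singleton(best):                # a single element: check the goal test
            return best if is_goal(best) else None
        # Pass to the finer quotient space: partition the accepted class by the
        # next equivalence relation and repeat the inference there.
        classes = refine(best)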

6.3.2. Hypothesis I

All conclusions we have made about SA are under Hypothesis I. We now discuss the hypothesis further.
Hypothesis I
Assume that G is a uniform m-ary tree. image let image be a subtree rooted at n. The statistic image extracted from T(n) (called a global statistic) satisfies
(1) image is an i.i.d. random variable with a finite fourth moment; image is the mean of a(n).
(2) image (L is a solution path), image; image, image.
In essence, this means that the statistics extracted from image should differ statistically from those extracted from image. In order to apply statistical inference, the hypothesis is, in a sense, necessary. But the constraint given in Hypothesis I (2) may be relaxed. Let us examine some relaxed cases.
1. Inconsistent cases
(1) In Hypothesis I, for each image, image and image, image, i.e., the above equalities hold consistently. We may now relax the constraints as follows. There exist constants image and image such that for each image, image; image, image; and image is finite.
Since image, i.e., image is inversely proportional to image, or proportional to image, where image, image only depends on the difference between image and image. Therefore, as long as image, even if image and image change, the order of the mean complexity of SA does not change.
(2) In some cases, although image is less than image, there does not exist a constant c, independent of N, by which the former differs from the latter.
We now discuss these kinds of relaxed conditions.
Assume that image and image are nodes at the k-th level, where image. If there exists a constant image such that image, then for image subtrees implementing statistical inference S, the order of the complexity is

image

Thus, the order of the total complexity of SA is

image

where image.
Hence the order of the mean complexity of SA remains polynomial. We have the following theorem.

Theorem 6.6

G is a tree. image, image and image are nodes at the k-th level, where image. If there exist constants image and image such that image (image), then the order of the total complexity of the SA algorithm is

image

2. Mixed Cases
‘False Goals': If there are A(N) nodes not belonging to L for which image does not hold, i.e., the global statistics extracted from the subtrees rooted at such nodes do not satisfy image, then in the search process those nodes are statistically not much different from the nodes belonging to L. There appear to be A(N) ‘false goals' in G. The complexity of SA will increase by a factor of A(N) at most. As long as A(N) is a polynomial in N, where N is the depth at which the goal is located, SA can still avoid exponential explosion.
3. image is not i.i.d.
In Hypothesis I, it is assumed that the statistic image is i.i.d. and has a finite fourth moment. Now we relax the constraints and only assume that image is independent and has variance image, i.e., we give up the requirement of identical distribution.
In the proof of the polynomial complexity of SA, we use formulas image and image. These two formulas are based on the central limit theorem and the Chow-Robbins lemma. However, the precondition of the central limit theorem is the i.i.d. assumption on image. But the i.i.d. assumption is only a sufficient condition, not a necessary one. In Gnedenko (1956), the central limit theorem is established under the relaxed conditions shown in Lemma 6.2.

Lemma 6.2

image are mutually independent random variables. Let
image, where image is the variance of image. If there exists image such that, when image,

image(6.11)

Thus, image, we uniformly have

image(6.12)

The above lemma does not require image to be identically distributed. We then have the following corollary.

Corollary 6.4

Assume that image are mutually independent and have finite fourth moments, and that their variances are image. Then Formula (6.12) holds uniformly in x.

Proof

Since image, image. Let image. Since image has a finite fourth moment, image.
Substituting the above formula into the left-hand side of Formula (6.11), we have:

image

Namely, Formula (6.11) holds. Then, from Lemma 6.2, Formula (6.12) holds uniformly in x.
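For orientation, the estimate behind this proof is the standard Lyapunov-type computation. Assuming the variances are bounded below by some \(\sigma^2>0\) and the fourth central moments are bounded above by a constant \(C\) (assumptions consistent with, but not spelled out in, the text), and writing \(B_n^2=\sum_{i=1}^n \sigma_i^2\), one gets

$$\frac{1}{B_n^{4}}\sum_{i=1}^{n}E\,|\xi_i-a_i|^{4}\;\le\;\frac{nC}{(n\sigma^{2})^{2}}\;=\;O\!\left(\frac{1}{n}\right)\;\longrightarrow\;0,$$

which is exactly the kind of bound required for Formula (6.11).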
We replace the i.i.d. condition on image by the following conditions: image are mutually independent and have variances image and finite fourth moments. Similarly, we can revise the Chow-Robbins lemma under the same relaxed conditions. Since many statistical inference methods are based on the central limit theorem, we have the following theorem.

Theorem 6.7

image, image is the global statistic of nodes and satisfies
(1) Random variables image are mutually independent and have variances image and finite fourth moments.
(2) image and image, image, constant image, where image and image are brother nodes.
Then, the corresponding SA can find the goal with probability one, and the mean complexity is image.
In the following discussion, when we say that image satisfies Hypothesis I, we always mean that image satisfies the above relaxed conditions.

6.3.3. The Extraction of Global Statistics

When a statistical heuristic search algorithm is used to solve an optimization problem by finding the minimum (or maximum) of its objective function, the ‘mean' statistic is generally used. However, the optimal solution having the minimal (or maximal) objective function value does not necessarily fall in the subset with the minimal (or maximal) mean objective function. Therefore, the solution obtained by this method is not necessarily the true optimal solution.
To overcome this defect, we introduce one of the better approaches below: the MAX statistic.
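A toy illustration of the defect (the numbers are made up): the subset with the larger mean need not contain the global maximum.

# Subset B has the larger mean, but the global maximum lies in subset A.
A = [0.0, 0.1, 0.2, 9.0]   # contains the true maximum, 9.0
B = [5.0, 5.1, 5.2, 5.3]   # larger mean, smaller maximum

print(sum(A) / len(A), sum(B) / len(B))   # 2.325 vs. 5.15: the mean prefers B
print(max(A), max(B))                     # 9.0 vs. 5.3: the maximum is in A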
1. The Sequential Statistic
We introduce a new sequential statistic and its properties as follows (Kolmogorov, 1950).
Assume that image is a sub-sample with n elements from a population, and that their values are image. Sorting these in ascending order, we have image. If image has values image, then define image as image. image is called the set of sequential statistics of image.

Lemma 6.3

Assume that population image has distribution density f(x). If ( image) is a simple random sample of image and image is its set of sequential statistics, then the joint distribution density function is

image

Let X be the maximal statistic of a sub-sample of size n. X has the distribution density function below

image

where image. From Lemma 6.3, we have

image
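For reference, with F the population distribution function and f its density, this density has the standard closed form (stated here in conventional notation; the displayed formulas above should reduce to it):

$$g_n(x)\;=\;n\,[F(x)]^{\,n-1}\,f(x),\qquad F(x)=\int_{-\infty}^{x} f(t)\,dt,$$

since the maximum is at most x exactly when all n independent observations are at most x, i.e., \(P(X\le x)=[F(x)]^{n}\).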

Definition 6.4

Under the above notation, let:

image

image is called the empirical distribution function of image.
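As a direct sketch (the names are illustrative), the empirical distribution function can be computed from the sequential statistics of a sample as follows:

import bisect

def empirical_cdf(sample):
    xs = sorted(sample)   # the sequential statistics x_(1) <= ... <= x_(n)
    n = len(xs)
    def F_n(x):
        # F_n(x) = (number of observations <= x) / n: it is 0 below x_(1),
        # k/n between x_(k) and x_(k+1), and 1 above x_(n).
        return bisect.bisect_right(xs, x) / n
    return F_n

F_n = empirical_cdf([0.3, 0.1, 0.7, 0.4])
print(F_n(0.0), F_n(0.35), F_n(1.0))   # 0.0 0.5 1.0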

Lemma 6.4

Assume that image is a simple random sub-sample from a population with distribution function image, and that image is its empirical distribution function. Then for a fixed x, −∞<x<∞, we have

image

When n→∞, the distribution function of image approaches N(0,1).
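In conventional notation, this is the de Moivre-Laplace normal approximation: since \(nF_n(x)\) counts how many of the n observations fall at or below x, it is binomially distributed with parameters n and F(x), and hence

$$\frac{\sqrt{n}\,\bigl(F_n(x)-F(x)\bigr)}{\sqrt{F(x)\,\bigl(1-F(x)\bigr)}}\;\xrightarrow[n\to\infty]{d}\;N(0,1).$$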
2. MAX Statistical Test
Definition 6.5
The MAX1 test with parameter image is defined as follows.
Given image and image, let X and Y be two random variables whose observations have upper bounds. Let n be the sample size. image and image are their maximal statistics, respectively, when the sequential test is used.
The orders of the observations of X and Y are image and image, respectively. Assuming that image, we have image. Let d=k/n.

image(6.13)

image and it has the same order as image.
If image, stop and the algorithm fails.
Otherwise,

image(6.14)

we may conclude that the maximum of X is greater than the maximum of Y.
If n<N′, the observation continues.
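Since the thresholds in the definition appear above as formulas, the following sketch fixes them as explicit parameters d0 (the required separation d) and N_prime (the sampling budget N′); these names, and the rank-gap computation, are assumptions used only to illustrate the shape of the test, not the exact MAX1 criterion.

def max1_test(xs, ys, d0, N_prime):
    # Compare the maxima of two bounded samples via their order statistics.
    # Returns 'X', 'Y', 'fail', or 'continue' (take more observations).
    n = len(xs)                          # common sample size
    leader, other, label = (xs, ys, 'X') if max(xs) >= max(ys) else (ys, xs, 'Y')
    # k: how many of the largest pooled observations exceed every observation
    # of the other sample; d = k / n measures the separation of the maxima.
    cut = max(other)
    k = sum(1 for v in leader if v > cut)
    d = k / n
    if d >= d0:                          # separation large enough to decide
        return label                     # the leader's maximum is judged larger
    if n >= N_prime:                     # budget exhausted: MAX1 stops and fails
        return 'fail'
    return 'continue'                    # n < N': the observation continues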

Definition 6.6

The MAX2 test with parameter image is defined as follows.
In the definition of the MAX1 test, the statement ‘If image, stop and the algorithm fails' is replaced by ‘If image, when image we may conclude that the maximum of X(Y) is greater than the maximum of Y(X)'; we then have the MAX2 test.

Definition 6.7

If in the i-th level search of SA, MAX1 (or MAX2) with parameter image is used as the statistical inference method, then the corresponding SA search is called an SA(MAX1) (or SA(MAX2)) search with significance level image (image) and precision image.
3. The Precision and Complexity of MAX Algorithms
Lemma 6.5 (Kolmogorov Theorem)
image is a continuous distribution function. Let image. Then we have image.
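In standard notation, with \(F_n\) the empirical distribution function of n observations and \(D_n=\sup_x\,|F_n(x)-F(x)|\), the theorem states

$$\lim_{n\to\infty} P\bigl(\sqrt{n}\,D_n\le\lambda\bigr)\;=\;K(\lambda)\;=\;\sum_{k=-\infty}^{\infty}(-1)^{k}e^{-2k^{2}\lambda^{2}},\qquad \lambda>0.$$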

Lemma 6.6

Assume that image is the empirical distribution function of image and that image is continuous. Given image, when image, we have

image

Proof

In Lemma 6.5, we use the limit distribution to approximate the distribution function image. Thus, we have

image

When image, we have image, i.e. image.

Proposition 6.1

X and Y are two bounded random variables with continuous distribution functions image and image, respectively. Their maxima are image and image, respectively, where image. Let image. Assume that image. Given image, let

image

Thus, if image and image, then we can judge that the maximum of X(Y) is greater than the maximum of Y(X) with probability image.

Proof

Since image is continuous, image. Assume that the corresponding orders of the observations of X and Y are image and image, where image; then we have image. Let image and image.
When image, we have

image(6.15)

When image is small enough, we have image. Thus, when image, by letting image and image, from Lemma 6.6 we have

image(6.16)

When image holds, from Lemma 6.2 and Formula (6.15), we have

image(6.17)

Thus, when image, from Formulas (6.16) and (6.17), the correct rate of the judgment is image and the computational complexity is image.
In fact, the value of image is not known in advance, so in the MAX1 test we replace image by image. This produces some error. In order to guarantee image, we may replace image in Formula (6.13) by the following value:

image(6.18)

Corollary 6.5

Under the assumptions of Proposition 6.1, the correct rate of judgment by the MAX1 test is image, and its complexity is image.
4. The Applications of SA(MAX) Algorithms
The SA search based on the statistical inference method MAX is called the SA(MAX) algorithm. In this section, we use SA(MAX) to find the maximum of a function.
image is a bounded function on D, where D is a subset of an n-dimensional Euclidean space. We therefore need to transform the maximum-finding problem for functions into an SA(MAX) search problem.
Assume that image is a measurable function, where D is a measurable set in an n-dimensional space. The measure of D is image, a finite number. Assume that image.
Regarding image as a measure space, define a random variable image from image as follows.

image

When using the SA(MAX) algorithm to find the maximum of a function, the given requirements are the following (see the sketch after this list).
(1) image, where c is a given positive number.
(2) When D is finite, image, generally let c=1.
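A minimal sketch of the transformation just described: regard f on D as a random variable by sampling D uniformly, so that observations of the random variable are values of f, and the maximum-finding problem becomes one about the maximal statistic of a sample. The objective f and the domain D=[0,1] here are illustrative assumptions.

import random

def observe(f, lo, hi, n):
    # Draw n observations of the random variable xi = f(U),
    # where U is uniformly distributed on D = [lo, hi].
    return [f(random.uniform(lo, hi)) for _ in range(n)]

f = lambda x: x * (1.0 - x)          # toy objective on D = [0, 1]
sample = observe(f, 0.0, 1.0, 1000)
print(max(sample))                   # the empirical maximal statistic, near 0.25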
Now we consider the precision and complexity of the SA(MAX) algorithm in finding the maximum of a function.

Theorem 6.8

image is a measurable function on D and image. Assume that image is continuous. The SA(MAX) algorithm is used to find the maximum of the function. Given image and image, MAX1 is used as the inference method, and at the i-th level the parameter used is image. When the algorithm succeeds, the probability of image is greater than image, and the complexity is image, where image is the maximum of image, image, and L is the total number of levels that SA reaches.

Proof

If the algorithm succeeds, i.e., the MAX1 judgment succeeds every time, then the probability that each judgment is correct is greater than image. The total probability of correctness is greater than image.
From Proposition 6.1, we have:

image

Thus,

image

The algorithm is performed through L levels. The total complexity of SA(MAX) is image.

Theorem 6.9

Under the same conditions as Theorem 6.8, given image and image, the SA(MAX2) algorithm with parameter image at each level is used to find the maximum of function f. When the algorithm terminates, we have the maximum image; the probability of image, or image, is greater than image, and the order of complexity is image.

Proof

As in the proof of Theorem 6.8, we only need to prove the image case. Assume that the search reaches the i-th level image and that we find the maximum image. From Proposition 6.1, we have that the probability of image is greater than image. On the other hand, image is a monotonically increasing function with respect to image. Let image. We further have that the probability of image is greater than image.
Since the complexity at each level is image, the complexity of an L-level search is at most image.
From Theorem 6.9 it is known that, unlike the SA(MAX1) algorithm, SA(MAX2) never fails, but the conclusion made by SA(MAX2) is weaker than that of SA(MAX1). Second, since the constants image and image can be arbitrarily small, the maximum can be found by SA(MAX2) with arbitrary credibility and precision.
5. Examples
For comparison, the SA(MAX) and GA (genetic algorithm) algorithms are used to solve the same problem (Zhang and Zhang, 1997a, 1997b).

Example 6.1

The goal is to find the maximum of function image (Fig. 6.2).
The relation between the results obtained by the two algorithms and N is shown in Fig. 6.3, where N is the total number of evaluations of function image. The ‘black dots' show the results obtained by the SA(MAX) algorithm when N=64, 128,…. We can see that the maximum of image obtained by the algorithm in each case is the true maximal value. The ‘white dots' show the results obtained by the GA algorithm when N=100, 200,…. We can see that the best result obtained by that algorithm is the value of image at x=0.3333216.
image
Figure 6.2 image
image
Figure 6.3 The Relation between the Maximum of image Obtained and N

Example 6.2

The goal is to find the maximum of function image (Fig. 6.4).
The ‘black dots' and ‘white dots' in Fig. 6.5 show the results obtained by the SA(MAX) and GA algorithms, respectively. The maximum obtained by SA(MAX) is 0.1481475 (x=0.6675003). In GA, the iteration is run for 20 generations and each generation has 100 individuals. The maximum obtained is 0.1479531 (x=0.6624222).
image
Figure 6.4 image
image
Figure 6.5 The Relation between the Maximum of image Obtained and N

Example 6.3

The goal is to find the maximum of image (Fig. 6.6).
The results are shown in Fig. 6.7. We can see that SA(MAX) finds two maxima of image, i.e., 2231.01075 at x=0.1749125 and 2231.01075 at x=0.8250875, but GA finds only one maximum of image, i.e., 2052.376 at x=0.8246953.
image
Figure 6.6 image
image
Figure 6.7 The Relation between the Maximum of image Obtained and N
From the above results, it is clear that the performance of SA(MAX) is better than that of GA.

6.3.4. SA Algorithms

In statistical heuristic search, the statistical inference method is introduced into heuristic search as a global judgment on subsets, so that search efficiency is improved. Under a given significance level, if a search direction is accepted by SA, the probability image of finding the goal can be ensured. When a wrong direction is chosen, SA will terminate with at most polynomial mean complexity. By using SA search successively, the goal can be found with probability one and with polynomial complexity. In fact, in a new round of search, a new significance level or a new statistical inference method may be used, based on the results obtained in the previous round. So a variety of SA algorithms can be constructed.
Now, we summarize the SA procedure as follows.
If a statistical inference method S and a heuristic search algorithm A are given, then we have an SA algorithm (a schematic implementation follows the steps below).
(1) Set up a list OPEN of nodes. Expanding the root node image, we have m sub-nodes, i.e., m image-subtrees or m equivalence classes in some quotient space. Put them into m sub-lists of OPEN, each corresponding to one image-subtree. Set up a closed list CLOSED and a waiting list WAIT; initially, they are empty. Set up a depth index i with initially i=1.
(2) LOOP. If OPEN is empty, go to (11).
(3) From each sub-list of OPEN choose a node, remove it from OPEN to CLOSED, and call it node n.
(4) If n is a goal, success.
(5) Expanding node n, we have m sub-nodes; put them into OPEN. Establish a pointer from each sub-node to node n. Reorder the nodes in the sub-lists by the values of their statistics. Perform statistical inference S on each sub-list, i.e., each sub-tree.
(6) If some image-subtree T is accepted, remove the image-subtrees except T from OPEN to WAIT and go to (10).
(7) If no image-subtree is rejected, go to LOOP.
(8) Remove the rejected image-subtrees from OPEN to WAIT.
(9) If there is more than one image-subtree in OPEN, go to LOOP.
(10) Increase index i by 1 image. Repartition the image-subtree in OPEN into its sub-subtrees and reorder the sub-subtrees based on their statistics. Go to LOOP.
(11) If WAIT is empty, fail.
(12) Remove all nodes in WAIT to OPEN, let image, and go to (10).
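The following compact sketch mirrors steps (1)-(12). The inference S is abstracted as infer, which labels a sub-list 'accept', 'reject' or 'undecided'; expand, is_goal, statistic and the round bound max_rounds are illustrative placeholders, not part of the original procedure.

def sa(root, expand, is_goal, statistic, infer, max_rounds=10):
    open_lists = [[c] for c in expand(root)]       # (1): m sub-lists, one per subtree
    wait, i = [], 1
    for _ in range(max_rounds):
        while any(open_lists):                     # (2): LOOP while OPEN is nonempty
            for sub in open_lists:                 # (3): take one node per sub-list
                if not sub:
                    continue
                n = sub.pop(0)                     # moving n to CLOSED is implicit
                if is_goal(n):                     # (4): goal test
                    return n
                sub.extend(expand(n))              # (5): expand n into OPEN
                sub.sort(key=statistic)            # (5): reorder by statistics
            labels = [infer(sub) for sub in open_lists]   # (5): inference S
            if 'accept' in labels:                 # (6): keep only the accepted subtree
                t = labels.index('accept')
                wait += [n for j, s in enumerate(open_lists) if j != t for n in s]
                i += 1                             # (10): repartition one level deeper
                open_lists = [[n] for n in open_lists[t]]
            elif 'reject' in labels:               # (8): move rejected subtrees to WAIT
                wait += [n for s, l in zip(open_lists, labels) if l == 'reject' for n in s]
                open_lists = [s for s, l in zip(open_lists, labels) if l != 'reject']
                if len(open_lists) == 1:           # (9) fails, so (10): repartition
                    i += 1
                    open_lists = [[n] for n in open_lists[0]]
            # (7): if nothing is accepted or rejected, simply continue the loop
        if not wait:                               # (11): WAIT empty, so failure
            return None
        open_lists, wait, i = [[n] for n in wait], [], i + 1   # (12): new round
    return None                                    # round budget exhausted (sketch only)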