6.2. The Computational Complexity

6.2.1. SPA Algorithms

Definition 6.1

Assume that in SA search SPRT is used as statistical inference method S, and for judging m image subtrees the significance level image, image, is chosen. Under the above conditions, the SA search is called SPA1 with significant level image, or simply SPA1 algorithm.
Variable image obeys the image distribution. Construct a simple hypothesis:

image

In Formulas (6.4) and (6.5), image, image and image replacing image, image and image, respectively, then we have the following lemma.

Lemma 6.1

For judging m image subtrees, the asymptotically mean complexity of SPA1 is

image

where image

Proof

From Formula (6.5), when image is image, the mean of stopping variable R can be represented by the following expression approximately.

image

image

Therefore, in order to judge m image subtrees with significant level image, the asymptotically mean complexity is

image

where image.

Theorem 6.2

image and image are given. Let image. image has a normal distribution image. Using SPA1 algorithm under level image, the mean complexity of finding a solution path in G is image with probability image. b is the error probability, where image, the Type I error image and Type II error image.

Proof

From Lemma 6.1, the mean complexity for judging m i-subtrees is image. Thus, using SPA1 algorithm, the mean complexity of finding a goal is:

image

From the Sterling formula image, we obtain:

image

For judging m 1-subtrees, image . In general, for judging m i-subtrees, image. The total error probability of Type I is:

image

Similarly, image.
It is noted that we use the minimum of mean statistics among all i-subtrees to estimate image, and the average of mean statistics of the rest (except the minimal one) of i-subtrees to estimate image. This will produce new errors. We will construct new SA algorithms to overcome the defect.

Corollary 6.2

Assume that image has a distribution function image. Let

image

The mean complexity of SPA1 algorithm is O(N ln N).
The theorem and corollary show that SPA1 algorithm can overcome the exponential explosion of complexity only in average sense. This means that in some cases, the statistical inference will still encounter a huge computational complexity. In order to overcome the shortage, we will discuss the revised version SPA2 of SPA1.

Definition 6.2

In SA, SPRT is performed over m i-subtrees using a level image and a given threshold image, image, where image. If the sample size exceeds image then the hypothesis image is rejected. We define the SA as SPA2 under level image, and denoted by SPA2 for short.
It’s noted that parameters image and image are generally unknown. We may use the following formula to estimate image.

image

where image.

Theorem 6.3

image and image are given. Let image. image has a normal distribution. Using SPA2 under level image, the upper bound of the complexity of finding a solution path in G is image with probability image. b is the error probability, where image, the Type I error image and Type II error image.

Proof

In i-subtrees, the threshold is image. So the upper bound of the total complexity is image.
Thus, the upper bound of complexity of SPA2 image.
Now, we consider the error probability.
If in the searching process the sample size has never surpassed the threshold, from Formulas (6.5) and (6.6), it is noted that judging m i-subtrees, the error probability of Type I image. So the total error probability is:

image

In some searching stage, if the sample size surpasses the threshold and H0 is rejected, the error probability does not change if the subtrees being deleted do not contain the goal, and the error probability will increase if the subtrees being deleted contain the goal. We estimate the incremental error probability as follows.
From Property 6.6 in Section 6.1.2, the distribution of the stopping variable R of SPRT is

image(6.10)

Assume image. In i-subtrees their level is image and the mean of R is image, where image. From Formula (6.10), we have image.
Thus, image, image is the value of c corresponding to i-subtrees.
The probability that the sample size surpasses the threshold image is

image

Namely, when the sample size surpasses the threshold, the rejection of H0 will cause the new error probabilityimage. The totally incremental error probability of Type I is

image

Finally, the total error probability of Type I is image.
Similarly, Type II error image.
Certainly, when the sample size surpasses the threshold, the rejection of H0 does not change the error probability of Type II.

Corollary 6.3

Assume that image has a distribution function image and satisfies

image

The upper bound of the complexity of SPA2 is image.
SPA algorithms constructed have the following shortcoming. The distribution function image should be known beforehand. Generally this is unpractical so it has to be assumed as a normal distribution image sometime. Even so, its parameters image and image are still unknown generally. Although their values can be estimated from some data it will cause new errors certainly. We will use ASM statistical inference method to overcome the shortcoming.

6.2.2. SAA Algorithms

image is given. Let image and image. Assume that image. For any node image, there are m successors image. image is a subtree rooted at image. For any subtree image, let image. Apply ASM statistical inference to SA search, we have confidence intervals image.
Assume that image is the leftmost interval among m confidence intervals along a number line. If image and image are disjoint, image is accepted; otherwise all subtrees are rejected and the algorithm fails.
In fact, ASM is a sequential testing method. First, letting R=1 and using Formula (6.6) as hypothesis testing, if the formula is satisfied, then we have the corresponding interval image, otherwise the sampling continues.

Definition 6.3

In SA search, if the ASM is used as statistical inference method S, and when testing i-subtrees image, then the SA search is called SAA algorithm with level image, or simply SAA algorithm.

Theorem 6.4

Assume that image satisfies Hypothesis I and has a finite forth moment. Given image, letting image, image, then SAA algorithm with level image can find the goal with probabilityimage, and the order of its mean complexity is image.

Proof

Since for i-subtrees image, from Formula (6.8) we have that the order of mean complexity of ASM for testing i-subtrees is

image

Thus, the order of total mean complexity is

image

For judging i-subtrees, the error probability of Type I is image and Type II isimage. The total error probability for judging i-subtrees is image.
Thus, the total error probability of SAA algorithm is

image

SAA is superior to SPA in that it's no need to know what distribution function image is in advance, and the calculation of statistics is quite simple.
In general, image is unknown. So it’s difficulty to choose a proper width of confidence interval in SAA based on image. In practice, approximate image can be chosen as a rule of thumb. Sometime, SAA may work in the following way. Given an arbitrary constant δ1, if image intersects with other intervals image, new constant image is tried, until a proper value of δ is got.
The revised SAA is as follows.
Let image be a sequence of strictly and monotonically decreasing positive numbers, e.g., image. In the i-th turn search, let the width of confidence interval be image. Since image so long as image, c is a constant, there always exists image such that if image then image. So we can overcome the difficulty brought about by the unknown c; certainly the computational complexity of SAA will increase image times, i.e., the order of complexity becomes image.
If we choose a lower bound image of image, when image image no longer decreases, i.e., let image. Then, the order of mean complexity of SAA will not increase.
Under the same significance level, the mean image of stopping variable of SPRT is minimal, i.e., the mean complexity of SA constructed by SPRT is minimal. But the distribution function image should be known beforehand in the SPA search, i.e., under a more rigor condition.

6.2.3. Different Kinds of SA

In Section 6.2.1 and 6.2.2 we construct SA by using SPRT and ASM as statistical inference methods. Since the two methods are sequential and fully similar to search, it’s easy to understand that the introducing the methods to the benefit of search. If there is any other kind of statistical inference method, e.g., non-sequential, can this get the same effect? We'll discuss below.
Assume that image and image are two i.i.d. random variables having finite fourth moments. Their distribution functions are image are image, respectively. Let simple hypotheses be image, image. From statistics, it's known that if a statistical inference method S satisfies the following properties, the statistical decision can be made in a polynomial time.
The properties are
(1) Given significance level image, the mean of the stopping variable R satisfies image, where E(R) is the mean of R and image.
(2) image, S terminates with probability one.
(3) When n>0, image, where c>0 is a constant.
As we know, both SPRT and ASM satisfy the above properties. image is proportional to image which underlies the complexity reduction of the SA search algorithms by using SPRT and ASM as statistical inference methods. If the variances image and image of image and image are known, imagetest and t-test may be adopted. For example, using imagetest to determine the validity of image, that is, whether the mean of random variable X is equal to that of Y, we may use the following composite statistic.

image

where image and image are the means of image and image, respectively, l and n are the sample sizes of image and image, respectively.
When search reaches node p it has m sub-nodes image. Let image be a sub-tree rooted at image. The means of statistics image of sub-trees image are assumed to be image, respectively.
Now, we use imagetesting method to judge whether the means of image and image are equal. Significance level image and sample size image are given, where image and image are the numbers of expanded nodes of image and image, respectively. In the testing process, sample size image is gradually increased, for example, image.
From image, we have image, where image is a standard normal distribution function.
If image then image are deleted.
If image then the total sample size image is increased by 2, i.e., sub-trees image and image are expanded by search algorithm A. imagetest continues.
It’s noted that in the imagetest, when calculating the composite statistic image we replace image by image and image by image.
In order to terminate the search in time, we choose a threshold image for the i-th level nodes. If the sample sizeimage, then sub-tree image is accepted.
It can be proved that the above algorithm has the same order of mean complexity as that of SPA algorithm.
If the variances of image and image are finite but unknown, the sequential t-test constructed from Cox theorem can be used.
By combining different kinds of statistical inference methods and heuristic searches and successively using these searches, a variety of SA algorithms can be obtained.
If the global statistics extracted from each subtree in G satisfy Hypothesis I, then the SA search constructed from S has the following properties which are our main conclusions about SA.
(1) The mean complexity of the SA is image, N is the depth at which the goal is located.
(2) Given image, using the SA search for the first time the goal can be found with probability α.
(3) Based on the property image, given a proper threshold, then the upper bound of the complexity of each time SA search is image.
(4) In some SA search stage, a wrong search direction might be chosen but the search can terminate in a polynomial mean time, due to the polynomial judgment time of the statistical inference. Consequently, by applying the SA search successively, the goal can also be found with polynomial mean complexity.

6.2.4. The Successive Algorithms

In a word, under a given significance level image, the SPA (SPA1 or SPA2) search can avoid the exponential explosion and results in the polynomial complexity image (or image). Unfortunately, the solution path may mistakenly be pruned off or a wrong path may be accepted, i.e., the error probability is image. In other words, in light of the SPA search, a real goal can only be found with probability (1-b).
The mean of stopping variable R of the statistical inference is image. No matter the i-subtree containing goal can be found or not, the search stops with mean computation image certainly. Thus, the search stops with mean complexity image in the first round search. Imagine that if the goal node cannot be found in the first round search, SA search is applied to the remaining part of G once again. Thus, the probability of finding a real goal is increased by image, or error probability is decreased to image,…, the repeated usage of SA continues until the goal is found. We call this procedure successive SA, or SA for short. How about its computational complexity? Does it still remain the polynomial time?
Using the SA search, in the first time the probability of finding the goal is image. Its complexity is image. Thus, the equivalent complexity is

image

In general, the probability that the goal is found by SA search just in the i-th time is image, and the complexity is image. The equivalent complexity is

image

The total mean complexity of SA is

image

Since image, we have

image

We have the following theorem.

Theorem 6.5

In a uniform m-ary tree, using the successive SA search, a goal can be found with probability one, and the order of its mean complexity remains image.
..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
3.135.247.68