Direct Statistical Estimation of GA Landscape Properties

Colin R. Reeves [email protected]    School of Mathematical and Information Sciences Coventry University UK

Abstract

A variety of predictive measures have been suggested for assessing how difficult it might be to solve a particular problem instance using a particular algorithm. However, most of these measures have been indirect. For neighbourhood search methods, one direct indicator of problem difficulty is the number of local optima that exist in the problem landscape. In the case of evolutionary algorithms, the concept of a local optimum is not easy to define, but it is known that GA populations, for example, commonly converge to fixed points or ‘attractors’. Whether we speak of local optima or attractors, however, estimating the number of them is not an easy problem. In this paper some probability distributions are derived for quantities that can be measured in repeated application of heuristic search methods. It is then shown how this can be used to provide direct statistical estimates of the number of attractors using maximum likelihood methods. We discuss practical questions of numerical estimation, provide some illustrations of how the method works in the case of a GA, and discuss some implications of the assumptions made in deriving the estimates.

1 INTRODUCTION

Many attempts have been made in the last 10 years to answer a question that recurs in using evolutionary algorithms such as GAs to solve optimization problems: which algorithm is best suited to optimize this particular function? It is a natural corollary of the ‘No-Free-Lunch’ (NFL) theorem (Wolpert and Macready, 1997) that there will be differences among algorithms in any specific case, but finding ways of distinguishing between algorithms seems to be problematic.

Some have focused on measures of problem difficulty that can be computed by sampling the universe of all potential solutions (Davidor, 1990; Davidor, 1991; Aizawa, 1997). These measures rely simply on the function itself, which is of uncertain utility if we wish to compare algorithms. The underlying factor in these approaches is the Walsh decomposition of the function, which is captured in a single value known as the ‘epistasis variance’. Reeves provides a recent survey and critique (Reeves, 1999a) of some of these ideas, and shows that there are considerable practical and interpretative problems in using such measures.

Others have tried to take account heuristically of the way that the proposed algorithm works (Jones and Forrest, 1995). More recently Stadler and Wagner have put in place a more rigorous mathematical foundation (Stadler and Wagner, 1997) whereby the ‘amplitude spectrum’ of the landscape induced by a particular operator can be estimated. Interestingly, this measure again depends in a fundamental way on the Walsh decomposition.

However, all of these methods can really only compute a proxy for the properties that make a problem instance difficult. These include the number of local optima, their depth and width, and the size of their basins of attraction. Such concepts can be defined quite straightforwardly in the case of a deterministic neighbourhood search (NS). However, the concept of a local optimum is rather harder to pin down in the case of GAs, since the landscape in any specific generation depends not merely on the operators used (as is solely the case for NS), but also on the nature of the current population, and on the particular pairs of parents selected for mating. Nevertheless, the work of (Vose, 1999) has shown that GAs commonly converge towards fixed points in the sense of stable populations of strings, and that these tend to be near the corners of the population simplex—i.e., where the population is a set of identical strings.

In this paper we shall be interested in estimating the number of these fixed points or ‘attractors’ for a given instance. An important point to realize is that while at least one of the local optima in NS is sure to be the global optimum, we cannot of course be certain that any attractor of a GA is actually the global optimum. Nevertheless, many successful practical applications of GAs provide fairly convincing evidence that the set of such attractors has a high probability of including the global optimum, or at least other very high-quality points.

Clearly, while not the only characteristic of interest, and not the only determinant of solution quality, the number of attractors is one property of the ‘landscape’ that is likely to influence how difficult the problem is. Methods based on Walsh decomposition cannot be certain of measuring this well. As is shown by (Reeves, 2000), it is possible to generate large numbers of functions that have identical epistasis variance measures, or amplitude spectra, and yet the number of NS local optima can vary widely. To have a direct measure of the number of attractors would therefore be very useful.

In this paper we describe a methodology for computing an estimate of the number of attractors using data obtained from repeated sampling. Although we shall couch the argument in terms of attractors for an evolutionary algorithm, it also obviously applies mutatis mutandis to any search algorithm. For example, in the case of deterministic neighbourhood search, for ‘attractors’ we simply read ‘local optima’; for simulated annealing (say) we might be interested in the number of ‘high-quality’ local optima, since it would be assumed that the chance of being trapped in low-quality ones is reduced almost to zero.

Previous work in this direction has been hinted at in (Schaffer et al., 1989), and the technique used there has been reported recently by (Caruana and Mullins, 1999). Ideas similar to those which we consider in the current paper have also been presented by (Kallel and Garnier, 2000), but with a different focus.

In the next section of this paper, the probability model that we shall use is introduced. Questions of estimation, approximations and numerical methods are considered, along with some details on the statistical properties of the estimators derived. A parameter of importance is κ, the proportion of distinct attractors found in repeated sampling. Sections 3 and 4 consider specifically the cases where κ is large (close to 1), and small (below 0.5) respectively. Finally, in section 5 some applications are presented in the context of a GA solving some relatively simple optimization problems.

2 A PROBABILITY MODEL

Suppose there are ν attractors, and that when using a GA from an initial random population the chance of encountering each attractor is the same. This may be interpreted as an assumption that all attractors have the same size basin of attraction—a first approximation that is probably not valid, but it is a reasonable starting point for the development of a model. We shall return to this point later.

Suppose the GA is run to convergence—i.e., an attractor is discovered. The process is then repeated many times. When there are many attractors, many repetitions may take place and very few attractors are found more than once. On the other hand, if the number of attractors is small, we will soon encounter previously discovered attractors. We can devise a probability model that describes this behaviour as follows:

Suppose that the GA is repeated r times, and that the trials are independent. The random variable K describes the number of distinct attractors that are found.
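Before proceeding, it is worth noting that this model is trivial to simulate. The following sketch (ours, not from the original analysis; names are illustrative) draws r runs uniformly over ν equally likely attractors, as the equal-basins assumption prescribes, and returns the average number of distinct attractors seen; this can be checked against the exact mean given in Eq. 1 below.

```python
import random

def simulate_distinct_attractors(nu, r, trials=10000):
    # Under the equal-basins assumption, each of r independent runs lands
    # on one of nu equally likely attractors; K is the number of distinct
    # attractors observed.  Average K over many trials to estimate E[K].
    total = 0
    for _ in range(trials):
        total += len({random.randrange(nu) for _ in range(r)})
    return total / trials   # compare with nu * (1 - (1 - 1/nu)**r), Eq. 1
```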

Proposition 1

The probability distribution of K is given by

$$P[K=k] = \frac{\nu(\nu-1)\cdots(\nu-k+1)\,S(r,k)}{\nu^r}, \quad 1 \le k \le \min(r,\nu), \quad\text{or}\quad P[K=k] = \frac{\nu!}{(\nu-k)!}\cdot\frac{S(r,k)}{\nu^r},$$

where S(r,k) is the Stirling number of the second kind.

The proof of this proposition follows a straightforward application of the inclusion-exclusion principle of combinatorics—see (Johnson and Kotz, 1969), who call this the Arfwedson distribution.

Johnson and Kotz further give the mean as

$$E[K] = \nu\left\{1-\left(1-1/\nu\right)^r\right\}  \qquad (1)$$

However, in our case, ν is an unknown parameter which we wish to estimate. The well-known principle of maximum likelihood (Beaumont, 1980) provides a well-tested pathway to obtain an estimate $\hat{\nu}$. Maximum likelihood (ML) estimators have certain useful properties, such as asymptotic consistency, that make them generally better to use than simple moment estimators, for example. The first step is to write the log-likelihood as

$$l(\nu) = \log P = \log\nu! - \log(\nu-k)! + \log S(r,k) - r\log\nu,$$

and then find its maximum by solving

$$\Delta l = l(\nu) - l(\nu-1) = 0$$

since ν is a discrete quantity. This is straightforward in principle, since $\Delta\log\nu! = \log\nu$, and we find that the equation reduces to

$$r\log(1-1/\nu) - \log(1-k/\nu) = 0  \qquad (2)$$

which can be solved by numerical methods. In fact this equation can also be derived from the relationship for the mean above. In other words, the ML estimator and the moment estimator here coincide.

2.1 Approximations

In the case of large ν, we can expand the logarithms in Eq. 2 in powers of $1/\nu$, truncate, and equate the two sides of the equation in order to provide approximate values for $\hat{\nu}$. Suppressing—temporarily—the ˆ symbol for clarity, we obtain the equation

$$\frac{k}{\nu} + \frac{k^2}{2\nu^2} + \frac{k^3}{3\nu^3} + \cdots = \frac{r}{\nu} + \frac{r}{2\nu^2} + \frac{r}{3\nu^3} + \cdots$$

Truncating after two terms reduces this to

$$\frac{k}{\nu} + \frac{k^2}{2\nu^2} = \frac{r}{\nu} + \frac{r}{2\nu^2}$$

which implies that

$$2\nu(r-k) = k^2 - r \quad\text{or}\quad \hat{\nu} \approx \frac{k^2-r}{2(r-k)}$$

Adding a 3rd term gives

$$6\nu^2 k + 3\nu k^2 + 2k^3 = 6\nu^2 r + 3\nu r + 2r$$

collecting terms

$$6\nu^2(r-k) + 3\nu(r-k^2) + 2(r-k^3) = 0$$

so that finally

$$\hat{\nu} \approx \frac{3(k^2-r) \pm \sqrt{9(r-k^2)^2 + 48(r-k)(k^3-r)}}{12(r-k)}  \qquad (3)$$

2.2 Bias, variance and confidence intervals

In fact the procedure we have described is formally identical to the so-called ‘Schnabel census’ used by ecologists to estimate the number of animals in a closed population (Seber, 1982). Thus results from that field—first obtained by (Darroch, 1958)—may be adapted for the case we are considering. In (Darroch, 1958) it is shown that the ML estimate has a bias given by

$$b \approx \frac{\theta^2}{2\left(e^\theta - 1 - \theta\right)}, \quad\text{where } \theta = r/\nu$$

In most cases of interest here, θ will be small, and $\hat{\nu}$ will overestimate ν by about 1, which is hardly important when we expect ν to be several orders of magnitude larger! The variance of $\hat{\nu}$ is shown to be

$$V[\hat{\nu}] = \frac{\nu}{e^\theta - 1 - \theta}.$$

In order to obtain numerical estimates for these quantities, the value ν must be replaced by $\hat{\nu}$, the estimate obtained as in Eq. 2. In (Darroch, 1958) an upper bound is also derived for $\hat{\nu}$:

$$\nu^* = r/\theta^*  \qquad (4)$$

where θ* is the solution of

$$\frac{1-e^{-\theta}}{\theta} = \frac{k}{r}.$$

In many cases, this is also a good approximation to $\hat{\nu}$.

The expression for the variance could be used directly to find a confidence interval for $\hat{\nu}$. We can estimate its first two moments, as above, but the distribution of $\hat{\nu}$ is hardly Normal, and Darroch recommends that confidence limits should be found indirectly from the distribution of K, which is much closer to Normal. This entails solving the equations

$$k \pm z_{\alpha/2}\,s(\nu) = m(\nu)  \qquad (5)$$

where m(ν) is the right-hand side of Eq. 1, considered as a function of ν, $z_{\alpha/2}$ is the appropriate Normal distribution ordinate for a 100(1 − α)% confidence interval, and

$$s(\nu) = \sqrt{\left(\nu - m(\nu)\right)\left(m(\nu) - m(\nu-1)\right)}.$$

Of course, a closed form solution is impossible, and the equations have to be solved numerically.
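To illustrate, here is a minimal numerical sketch (ours, not Darroch's) that solves Eq. 5 by bisection on ν, using the fact (which holds over the ranges tabulated below) that both m(ν) + z s(ν) and m(ν) − z s(ν) increase with ν.

```python
import math

def m(nu, r):
    # m(nu): expected number of distinct attractors in r runs (Eq. 1)
    return nu * (1.0 - (1.0 - 1.0 / nu) ** r)

def s(nu, r):
    # s(nu): approximate standard deviation of K, as defined above
    return math.sqrt((nu - m(nu, r)) * (m(nu, r) - m(nu - 1.0, r)))

def confidence_interval(r, k, z=1.96):
    # Solve k = m(nu) +/- z*s(nu) (Eq. 5) for nu by bisection
    def solve(sign):
        lo, hi = k + 2.0, 1e9
        while hi - lo > 0.5:
            mid = 0.5 * (lo + hi)
            if m(mid, r) + sign * z * s(mid, r) > k:
                hi = mid
            else:
                lo = mid
        return round(0.5 * (lo + hi))
    return solve(+1.0), solve(-1.0)   # (lower, upper) limits for nu
```

For example, confidence_interval(500, 350) should reproduce, up to rounding, the 581-745 interval quoted in the caption of Table 2.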

2.3 Numerical methods

Eq. 2 can be rewritten as

$$g(\nu) = r\log(1-1/\nu) - \log(1-k/\nu) = 0$$

and then an iterative scheme such as binary search or Newton-Raphson can be used to find a numerical solution. Approximations such as those given above can be used to provide an initial estimate of ν for this iteration. In experimenting with several approaches, the author found the most reliable method was to use the upper bound of Eq. 4 for the initial estimate, with the discrete version of Newton-Raphson

$$\nu^{(i+1)} = \nu^{(i)} - \frac{g(\nu^{(i)})}{g(\nu^{(i)}) - g(\nu^{(i)}-1)}$$

as suggested by (Robson and Regier, 1968). Tables 1 and 2 display, respectively, point estimates and 95% confidence interval estimates of ν for a representative range of values of κ = k/r and r.
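In Python, the whole procedure might be sketched as follows (our code; identifiers are illustrative). The starting value is the upper bound of Eq. 4, obtained by a bisection for θ*, and the refinement is the discrete Newton-Raphson iteration above.

```python
import math

def g(nu, r, k):
    # g(nu) = r*log(1 - 1/nu) - log(1 - k/nu), the rewritten Eq. 2
    return r * math.log(1.0 - 1.0 / nu) - math.log(1.0 - k / nu)

def initial_upper_bound(r, k):
    # Darroch's upper bound (Eq. 4): nu* = r/theta*, where theta* solves
    # (1 - exp(-theta))/theta = k/r; the left-hand side decreases in theta.
    lo, hi = 1e-9, 50.0
    while hi - lo > 1e-10:
        mid = 0.5 * (lo + hi)
        if (1.0 - math.exp(-mid)) / mid > k / r:
            lo = mid
        else:
            hi = mid
    return r / (0.5 * (lo + hi))

def estimate_nu(r, k):
    # ML estimate of nu by the discrete Newton-Raphson of Robson and Regier
    if k == r:
        raise ValueError("k = r: we can only say there are at least k attractors")
    nu = max(initial_upper_bound(r, k), k + 2.0)
    for _ in range(100):
        step = g(nu, r, k) / (g(nu, r, k) - g(nu - 1.0, r, k))
        nu = max(nu - step, k + 2.0)   # keep the iterate in the region nu > k + 1
        if abs(step) < 0.5:
            break
    return round(nu)
```

For example, estimate_nu(100, 90) should recover the value 462 found in the top-right cell of Table 1.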

Table 1

Estimates of ν (expected values) at some values of κ = k/r and r.

      r     κ=0.5     κ=0.6     κ=0.7     κ=0.8     κ=0.9
    100        62        88       130       214       462
    200       125       177       262       429       928
    500       313       443       656      1075      2326
   1000       627       887      1312      2152      4656
   2000      1255      1775      2626      4307      9317
   5000      3137      4439      6566     10769     23300
  10000      6275      8878     13132     21540     46604
  20000     12550     17757     26265     43082     93212
  50000     31375     44394     65665    107707    233035
 100000     62750     88789    131330    215417    466075
 200000    125500    177578    262661    430835    932154
 500000    313750    443946    656655   1077091   2330392


Regression using a varying-coefficients model for the expected values results in a fitted equation

$$\hat{\nu} = 0.0277\,r\,(324)^\kappa$$

which shows a (not unexpected) linear dependence on r, but an exponential relationship with κ. This demonstrates that as κ increases (i.e., as the proportion of distinct attractors in the sample increases) the expected number of attractors in the search space increases dramatically. The results for confidence intervals in Table 2 show that there will be considerable uncertainty for small samples (r < 10000, say, at least for relatively large κ), but of course, with larger samples, the estimate becomes more precise. The other noteworthy feature is the asymmetric nature of the intervals, reflecting a highly-skewed distribution for $\hat{\nu}$. Finally, for large values of κ (> 0.95, say), it is not possible to use Eq. 5 at all, since the implied value of k when the + sign of Eq. 5 is taken exceeds r.

Table 2

95% confidence intervals for ν at some values of κ = k/r and r. For example, if κ = 0.7 and r = 500, we are 95% confident that ν lies between 581 and 745.

      r            κ=0.5            κ=0.6            κ=0.7              κ=0.8              κ=0.9
    100            53-73           72-110          101-176            153-331           288-1040
    200          112-140          153-206          217-322            336-576           653-1539
    500          292-337          404-488          581-745           917-1286          1842-3116
   1000          596-660          830-950        1204-1436          1920-2437          3930-5680
   2000        1210-1301        1693-1862        2470-2797          3969-4696         8244-10684
   5000        3066-3210        4308-4575        6315-6831        10222-11368        21532-25357
  10000        6174-6377        8692-9070      12775-13504        20756-22375        44051-49445
  20000      12407-12694      17493-18027      25758-26788        41964-44252        89547-97165
  50000      31148-31603      43975-44819      64857-66487      105925-109542      227163-239197
 100000      62429-63072      88195-89388    130185-132490      212885-218000      457712-474726
 200000    125046-125956    176737-178425    261039-264299      427244-434478      920270-944328
 500000    313031-314470    442614-445283    654086-659240    1071398-1082835    2311519-2349554


On the other hand, in cases where r is large relative to ν, the ratio k/r may be very small, and care is needed in attempting to solve Eq. 2 numerically. In such cases, the best estimate of ν is often simply k or k + 1. Likewise, the approximation in Eq. 3 is not suitable for small values of κ—in fact, simple numerical calculations show that unless κ > 0.55, Eq. 3 yields a value for ν that is actually smaller than k, which is clearly impossible.

In practice this is likely to be irrelevant for most optimization problems—the interesting cases are those where we expect that κ > 0.7, say, and ν will be much larger than r. Nevertheless, we now consider both issues: the cases of large and small κ.

3 FIRST REPETITION WAITING TIME DISTRIBUTION

As remarked above, in many cases ν will be a very large number, as will be k—so it might be that every one of the attractors found is distinct unless r itself is quite large. Of course, if we find that k = r there will be no solution to Eq. 2, and all that we can say for certain is that there are at least k attractors. This outcome would be most unlikely in practice. One possibility is to keep sampling the set of all attractors until the first re-occurrence of a previously encountered attractor. Suppose this occurs on iteration T. We will call this the waiting time to the first repetition.

Proposition 2

The probability distribution of T is given by

$$P[T=t] = \frac{\nu!\,(t-1)}{(\nu-t+1)!\,\nu^t}, \quad 2 \le t \le \nu+1, \quad\text{or}\quad P[T=t] = \frac{\nu(\nu-1)\cdots(\nu-t+2)\,(t-1)}{\nu^t}$$

The proof of this proposition again follows a straightforward combinatorial argument: the number of ways of choosing (t − 1) distinct objects from ν (in any order) is $\binom{\nu}{t-1}(t-1)!$, while the total number of possible arrangements in (t − 1) trials is $\nu^{t-1}$. The ratio of these quantities gives the probability that (t − 1) distinct attractors are encountered in (t − 1) trials. The probability that the t-th trial encounters one of the attractors previously seen is (t − 1)/ν. Hence the result.

The mean of T is

$$E[T] = \frac{2}{\nu} + \sum_{t=3}^{\nu+1} \frac{t(t-1)\,\nu!}{(\nu-t+1)!\,\nu^t}$$

which seems to have no obvious closed-form solution. However, this and other statistics can in principle be computed for a given value of ν using the obvious recurrence formula

$$P[T=t+1] = \frac{t}{t-1}\cdot\frac{\nu-t+1}{\nu}\,P[T=t]$$

(with P[T = 2] = 1/ν) to calculate the probabilities. For large values of ν this may take some time, and it is quicker to evaluate the median. Figure 1 shows the results of computations of the median waiting time, and associated upper and lower quartiles, for a range of values of ν.

Figure 1: Waiting times for first repetition of an attractor for a range of values of ν, the total number of attractors.
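A minimal sketch (ours) of the median computation via this recurrence:

```python
def median_waiting_time(nu):
    # Accumulate P[T=t] from P[T=2] = 1/nu using the recurrence
    # P[T=t+1] = (t/(t-1)) * ((nu-t+1)/nu) * P[T=t], until the CDF passes 1/2.
    p = 1.0 / nu            # P[T = 2]
    cdf = p
    t = 2
    while cdf < 0.5 and t <= nu:
        p *= (t / (t - 1.0)) * ((nu - t + 1.0) / nu)
        t += 1
        cdf += p
    return t
```

The quartiles are obtained in the same way, by stopping when the accumulated probability passes 0.25 and 0.75 respectively.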

From this diagram it appears that the average waiting time increases as something like $\nu^{0.5}$. A linear regression analysis confirmed that this is a plausible model, the actual relationship estimated over the range of values in the diagram being

$$t_{\mathrm{med}} = 0.673 + 1.25\sqrt{\nu}$$

with a coefficient of determination (R2) of almost 100%.

3.1 Maximum likelihood estimation

For a single determination T = t the log-likelihood is

$$l(\nu) = \log\nu! + \log(t-1) - \log(\nu-t+1)! - t\log\nu  \qquad (6)$$

and the first difference is

$$\Delta l = \log\nu - \log(\nu-t+1) - t\left(\log\nu - \log(\nu-1)\right)$$

which gives an equation that is clearly a special case of Eq. 2 (with k = t − 1 and r = t):

$$\log\left(1 - \frac{t-1}{\nu}\right) = t\log(1-1/\nu)$$

Thus the numerical methods described in section 2.3 can also be used for this case. If we carry out multiple runs of the algorithm, so that we have m determinations $(t_1,\ldots,t_m)$, the generalization of the above is simply

$$\sum_{i=1}^{m}\log\left(1 - \frac{t_i-1}{\nu}\right) = \log(1-1/\nu)\sum_{i=1}^{m} t_i.$$

This can be simplified to

$$l(\nu) = \sum_{i=1}^{m}\left(\sum_{k=\nu-t_i+2}^{\nu}\log k \;-\; t_i\log\nu\right) + \text{terms not involving }\nu  \qquad (7)$$

From this a direct line search algorithm can be devised for finding the maximum of l(ν) and hence the maximum likelihood estimator of ν. (Note that further algorithmic simplifications can be made by realising that many of the terms in the sum over k are repeated for all values of i—in fact all of them after $k = \nu - t_{\min} + 1$, where $t_{\min}$ is the smallest value among the $t_i$.) The author found that a golden section search based on this approach worked almost as well as the Newton-Raphson method used in section 2.3.
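The sketch below (ours) illustrates the idea; we substitute an integer-valued ternary search for the author's golden section search (both presume that l(ν) is unimodal) and use the lgamma function for the log-factorials in Eq. 7.

```python
import math

def log_likelihood(nu, times):
    # Eq. 7, up to terms not involving nu:
    # sum_i [ log nu! - log(nu - t_i + 1)! - t_i log nu ]
    return sum(math.lgamma(nu + 1.0) - math.lgamma(nu - t + 2.0)
               - t * math.log(nu) for t in times)

def estimate_nu_from_waits(times, hi=1e8):
    # Direct line search for the maximiser of l(nu)
    lo = float(max(times))           # need nu >= t - 1 for every t
    while hi - lo > 1.0:
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if log_likelihood(m1, times) < log_likelihood(m2, times):
            lo = m1
        else:
            hi = m2
    return round(0.5 * (lo + hi))
```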

We can also find what appears to be a fairly good approximation, using natural logarithms in Eq. 6, and Stirling’s approximation

$$\ln\nu! \approx \frac{\ln 2\pi}{2} + \left(\nu + \frac{1}{2}\right)\ln\nu - \nu$$

Setting $dl(\nu)/d\nu = 0$ gives the following, after some algebra:

$$\nu = \frac{\dfrac{x}{2(1-x)} + t}{-\ln(1-x)}  \qquad (8)$$

where x = (t − 1)/ν. This can easily be iterated to a solution. (It is very easily handled in a spreadsheet, for example.) Again, if we have multiple determinations of T, the generalization is easy to obtain:

$$\nu = \frac{\sum_{i=1}^{m}\left[\dfrac{x_i}{2(1-x_i)} + t_i\right]}{-\sum_{i=1}^{m}\ln(1-x_i)}, \quad\text{where } x_i = (t_i-1)/\nu.$$

Note that even when we exhaust our resources and still have k = r, we could use the techniques described above on the assumption that iteration r + 1 would provide the first re-occurrence. In this way, we could calculate a conservative estimate of ν.

3.2 Further Approximations

Since we can expect $x \ll 1$ in most interesting instances of optimization problems, it is possible to expand the right-hand side of Eq. 8 in powers of x, truncate, and solve the resulting equation for ν. This produces

$$\nu\left(x + \frac{x^2}{2} + \frac{x^3}{3} + \cdots\right) = \frac{x}{2} + \frac{x^2}{2} + \frac{x^3}{2} + \cdots + t$$

Truncating after the 1st term clearly fails to provide a sensible estimate, but truncating after the second-order term gives the following:

$$\nu\left(x + \frac{x^2}{2}\right) = \frac{x}{2} + \frac{x^2}{2} + t.$$

On substituting for x, and after some algebra, we find

$$2\nu^2 - (t-1)(t-2)\,\nu + (t-1)^2 = 0,$$

and solving the quadratic for ν (and taking the positive square root) leads to

$$\nu^* \approx \frac{(t-1)(t-2)}{2},$$

which confirms the empirical observation in Fig. 1. In the case of multiple runs the quadratic is

$$a\nu^2 + b\nu + c = 0$$

with coefficients

$$a = 2m, \qquad b = -(U_2 - 3U_1 + 2m), \qquad c = U_2 - 2U_1 + m$$

where

$$U_k = \sum_{i=1}^{m}(t_i)^k.$$
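In code, this closed-form approximation is immediate (our sketch; note that the discriminant is positive only when the waiting times are not too small):

```python
import math

def quadratic_estimate(times):
    # Larger root of 2m nu^2 - (U2 - 3U1 + 2m) nu + (U2 - 2U1 + m) = 0,
    # generalising nu* ~ (t-1)(t-2)/2 for a single run.
    m = len(times)
    u1 = float(sum(times))
    u2 = float(sum(t * t for t in times))
    a, b, c = 2.0 * m, -(u2 - 3.0 * u1 + 2.0 * m), u2 - 2.0 * u1 + m
    return (-b + math.sqrt(b * b - 4.0 * a * c)) / (2.0 * a)
```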

4 ESTIMATING THE NUMBER OF SAMPLES

We might expect that most of the time we will be faced with problems having large values of ν (and thus of κ). However, some problem instances may have relatively small values, in which case we have a different but still interesting question: how long should we sample before we can reasonably be sure that all attractors have been found? We can model this situation as follows.

Suppose there are ν attractors, and that we have already found k of them. The waiting time $W_k$ for the (k + 1)st attractor will have a geometric distribution

$$f(t \mid k) = \frac{\nu-k}{\nu}\left(\frac{k}{\nu}\right)^{t-1}$$

with expectation $\nu/(\nu-k)$ and variance $k\nu/(\nu-k)^2$. The overall waiting time will be

$$W = 1 + \sum_{k=1}^{\nu-1} W_k$$

and since these waiting times are independent, we can approximate the expectation and variance as follows:

$$E[W] = 1 + \sum_{k=1}^{\nu-1}\frac{\nu}{\nu-k} = \nu\sum_{k=1}^{\nu}\frac{1}{k} \approx \nu\left(\ln\nu + \gamma\right)$$

where γ = 0.57721… is Euler's constant, and

$$V[W] = 0 + \sum_{k=1}^{\nu-1}\frac{k\nu}{(\nu-k)^2} = \nu^2\sum_{k=1}^{\nu-1}\frac{1}{k^2} - \nu\sum_{k=1}^{\nu-1}\frac{1}{k} \approx \nu^2\zeta(2) - \nu\left(\ln\nu + \gamma - 1/\nu\right)$$

where $\zeta(s) = \sum_{k=1}^{\infty} 1/k^s$ is Riemann's zeta function. For s = 2, $\zeta(2) = \pi^2/6$, so finally

$$V[W] \approx \frac{(\nu\pi)^2}{6} + 1 - \nu\left(\ln\nu + \gamma\right)$$

Although both expressions are approximations, the errors are fairly small. The error in using Euler's constant is already below 1% for ν = 15, and while larger and slower to fall, the error in the infinite sum approximation drops below 2.5% for ν = 25. Further, the errors are in opposite directions—one is a lower and the other an upper bound—so the net effect is likely to be unimportant.

We can use these estimates together with a Normal approximation to establish, with probability α, that the waiting time will exceed a given value:

$$W > \nu(\ln\nu + \gamma) + z_\alpha\sqrt{\frac{(\nu\pi)^2}{6} + 1 - \nu(\ln\nu + \gamma)}.$$

where $z_\alpha$ is the appropriate standard Normal value. Finally, the right-hand side of the above expression can be inverted (numerically) to find a confidence limit on ν. Table 3 gives some representative values, obtained very simply by using a spreadsheet equation solver.
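A programmatic equivalent of the spreadsheet inversion (our sketch; z = 2.326 and z = 3.090 are the one-sided Normal points for the 99% and 99.9% limits respectively):

```python
import math

def upper_limit_nu(r, z=2.326):
    # Largest nu whose waiting-time bound does not exceed r samples
    gamma = 0.5772156649
    def bound(nu):
        ew = nu * (math.log(nu) + gamma)              # approximates E[W]
        vw = (nu * math.pi) ** 2 / 6.0 + 1.0 - ew     # approximates V[W]
        return ew + z * math.sqrt(vw)
    lo, hi = 2.0, float(r)
    while hi - lo > 0.5:
        mid = 0.5 * (lo + hi)
        if bound(mid) <= r:
            lo = mid
        else:
            hi = mid
    return int(lo)
```

For instance, upper_limit_nu(1000) should return 120, in agreement with Table 3.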

Table 3

Upper confidence limits for ν at some representative values of r. For example, if we sample 2000 times and find 210 local optima, we can be 99% (but not 99.9%) confident that we have found them all.

No. samples (r)     upper limit on ν
                      99%     99.9%
            100        16        14
            200        29        26
            500        65        59
           1000       120       109
           2000       223       204
           5000       511       468
          10000       959       884
          20000      1808      1673


If the actual number of distinct local optima found in r samples is no more than the value given by such calculations, we can be confident (to approximately the degree specified) that we have found all the local optima. As already remarked, while the estimates of both mean and variance have small errors, they are unlikely to have a large influence on the degree of confidence. However, while the assumption of Normality is convenient, it is probably not well-founded: there is clearly a lower bound on the value of W, since it cannot be less than ν, while the upper tail of the distribution may be very long. Thus, we should perhaps make do with Chebyshev's inequality

$$\Pr\left[\frac{\left|W - E[W]\right|}{\sqrt{V[W]}} \ge c\right] \le \frac{1}{c^2}$$

which gives a less sharp, but still useful, confidence limit.

5 AN APPLICATION

In principle, this approach could be useful in providing some assurance of the quality of solutions obtained in the course of a heuristic search. Not only does it tell us something (although not everything) about the landscape, but at least tentatively, we now have a statistical way of assessing the amount of search that we need in order to find the global optimum with a reasonable probability.

It also gives us a means of comparing different representations and operators. For example, in the case of a GA, we could estimate the number of attractors resulting from the application of a GA, and compare this with what happens using a deterministic neighbourhood search using a bit flip operator, as a way of measuring the ‘improvement’ resulting from using a GA. (Of course, just looking at the numbers is not the only criterion—the quality of the respective solutions should also be taken into account.)

In (Reeves, 2000) it is shown that it is possible to generate sets of equivalent NK-landscapes—as introduced by (Kauffman, 1993)—equivalent in the sense that all have the same epistasis variance, while having widely differing numbers of local optima with respect to a bit-flip neighbourhood search. (The use of this operator implies that the underlying landscape is based on the Hamming metric; thus we shall call it the Hamming landscape.) However, how these landscapes compare with respect to a GA is not clear. For each of N = 10 and N = 12 (with K = 4), two NK-functions were chosen, representing those furthest apart in terms of numbers of Hamming landscape local optima. These are labelled ‘easy’ and ‘hard’ below. Each problem was solved repeatedly using a steady-state GA with the following parameters: population size 32; linear rank-based selection of parents; crossover always performed (either one- or two-point), followed by mutation at a rate of either 0.05 or 0.10 per bit. One new offspring was produced at each iteration, and it replaced one of the worst 50% of the existing population. The run was terminated at convergence to an attractor.

At this point, we should explain what is meant by ‘convergence’. In the sense of (Vose, 1999), an attractor is a population, which does not necessarily coincide with a corner point of the simplex—i.e., it may not consist of multiple copies of a single string. Whenever mutation is involved, there is always the possibility of some variation, so detecting that an attractor (in the Vose sense) has been found is somewhat imprecise. As a reasonable heuristic, in these experiments it was defined as the situation when 90% of the population agreed about the allele at each locus. In order to guard against very long convergence times in the computer experiments, a termination criterion was also included in the code: that the run should be ended after 512 strings (a large fraction of the entire search space for these problems) had been evaluated. As a matter of fact, however, this criterion was never needed for the parameters tested, although with higher mutation rates and a softer selection pressure such a criterion might become necessary.
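For concreteness, the convergence test just described might be coded as follows (our sketch; the population is assumed to be held as a list of equal-length 0/1 sequences):

```python
def has_converged(population, agreement=0.9):
    # Converged when, at every locus, at least 90% of the population
    # share the same allele.
    n = len(population)
    for locus in range(len(population[0])):
        ones = sum(individual[locus] for individual in population)
        if max(ones, n - ones) < agreement * n:
            return False
    return True
```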

The numbers of attractors found on this basis in a sample of 100 replicate runs were counted and an estimate made using the methodology developed above. The results were as shown in Table 4. The numbers of local optima with respect to the neighbourhood search were obtained by exact counting using a version of Jones’s ‘reverse hill-climbing’ approach (Jones, 1995). However, it is not possible to compute the numbers of attractors exactly for a GA, even when the parameters are fixed, since there is so much randomization involved. What we estimate therefore is in some sense the number of the ‘most common’ attractors, ignoring possible pathological behaviour.

Table 4

A comparison of two equally ‘epistatic’ NK-landscapes with respect to NS and GAs for two different mutation rates and for 1-point and 2-point crossover.

Landscape      # NS-local optima    xover    # GA attractors
                                             MutRate 0.10    MutRate 0.05
N = 10, K = 4
'easy'                 9               1           15              17
                                       2           12              14
'hard'                35               1           16              23
                                       2           18              18
N = 12, K = 4
'easy'                16               1           14              25
                                       2           14              18
'hard'                52               1           23              24
                                       2           16              22


This is merely a small example, but even this demonstrates some interesting differences in the performances of the GAs. The number of GA attractors for the ‘easy’ landscapes is generally smaller (with a notable exception for the N = 12 case with mutation at 0.05) than for the ‘hard’ ones, and there does appear to be a difference between the 1-point and 2-point crossovers. We might also intuit that high mutation rates would imply fewer attractors, and this is evident in most cases (with the N = 12 case again an exception). It is also clear that just examining the NS landscape may not be adequate in predicting the performance of a GA. The number of attractors is clearly more than the number of NS-local optima in the ‘easy’ case, and seems likely to be fewer in the ‘hard’ case.

In some further experiments, a set of 35 equivalent NK-functions (with N = 15, K = 4) was investigated; for each case a GA was run (as described above, with 1-point crossover and a 0.10/bit mutation rate) for 100 independent trials. The number of attractors was estimated and compared with the (known) number of bit-flip local optima for each of the related landscapes. The correlation between the number of GA attractors and the number of Hamming local optima was 0.72, a statistically significant value. When the actual strings were compared, the set of GA attractors was almost always a subset of the Hamming local optima. Thus, although the landscapes are different in general, it seems that their fixed points tend to coincide, and despite the conceptual difference between deterministic point-based NS methods and stochastic population-based GAs, their respective landscapes seem to have similar properties.

Finally, the numbers of GA attractors were re-estimated using 500 independent runs instead of 100. In every case, the initial estimated number of attractors was exceeded, and although the numbers of attractors found using 500 runs were well predicted by the numbers estimated using 100 runs (all fell within the initial 95% confidence intervals), the correlation with the numbers of Hamming local optima was less strong (0.51 instead of 0.72). Now the set of attractors included points that were not Hamming local optima, so this reduction in correlation was not altogether a surprise. However, the ‘new’ attractors tended to be less fit than those found in the first 100 runs, and obviously they had smaller basins of attraction. This has some implications for the usefulness of the methodology, which will be discussed in the next, final, section.

6 CONCLUSIONS AND FURTHER WORK

Several procedures have been presented for an empirical investigation of one aspect of a GA landscape. Previous approaches have not been able directly to answer the question of how many attractors there are, but the approach described here is able to estimate this quantity. Several practical questions of implementation have been reviewed, and an application has been made in order to demonstrate the methodology. The techniques described are not limited to GAs, but can in principle be applied to any heuristic search method.

There are some caveats that we should mention. Firstly, there is clearly a substantial computational burden in such investigations, and the size of the problem instance is clearly a factor in the feasibility of this approach. The number of samples needed may become infeasibly large as the size of the problem space increases.

Secondly, any estimate is useful only as long as the assumptions upon which it is based are valid. The fundamental assumption made above is that the attractors are isotropically distributed in the landscape, which implies that their basins of attraction are of more or less equal size. In fact, this is almost certainly not the case for many instances of NS landscapes. Such landscapes have been investigated empirically for such problems as the TSP, graph bisection and flowshop scheduling (Kauffman, 1993; Boese et al., 1994; Reeves, 1999b), and in many cases it is found that the local optima tend to be clustered in a ‘big valley’—closer to each other (and to the global optima) than a pair of randomly chosen points would be. Furthermore, the better optima tend to be closer to the global optima, and to have larger basins of attraction, than the poorer ones. Thus, optima found by repeated local searches are disproportionately likely to be these ‘more attractive’ (and hopefully fitter) ones. The number of local optima found by assuming an isotropic distribution is therefore likely to be an under-estimate of the true value. Nevertheless, we might hope that the statistical estimates would still be useful, as many of the local optima that we failed to find in a non-isotropic landscape would not (if the big valley conjecture holds) be very important anyway.

We do not know if similar patterns affect GA landscapes and attractors, but we have to recognize the possibility. Certainly, this is suggested by the analysis of the 35 equivalent NK-functions in the previous section. When the number of runs was extended from 100 to 500, several more attractors were found, more than had been estimated by the methods of the previous sections. These attractors seemed to have small basins, and to be less fit, so the remarks above relating to local optima seem to be echoed for the case of GA landscapes too.

In the applications above the effect of a non-isotropic distribution of attractors is probably not acute. When we compare the relative performance of different operators, for example, the effects of a non-isotropic distribution may apply in the same way to all of them. Also, on the empirical evidence of many fairly large studies of NS landscapes, and on the (admittedly much smaller) examples of GA landscapes investigated above, the attractors in the ‘tail’ of the distribution of basin sizes may be relatively unimportant. The assumption of approximate uniformity among the ones that we do care about is much more reasonable, and might mean that the estimates we obtain are fairly accurate, as long as we interpret them as referring to these ‘good’ attractors.

Nevertheless, if we wish to find still better estimates of the number of attractors, the effects of a non-uniform spread of attractors should be taken into account. A second paper that is currently in preparation will both examine the extent of the effect of the assumptions in the above analysis, and also explore stochastic models of non-isotropic landscapes.

References

Aizawa A. Fitness landscape characterization by variance of decompositions. In: Belew RK, Vose MD, eds. Foundations of Genetic Algorithms 4. San Francisco, CA: Morgan Kaufmann; 1997:225–245.

Beaumont GP. Intermediate Mathematical Statistics. London: Chapman & Hall; 1980.

Boese KD, Kahng AB, Muddu S. A new adaptive multi-start technique for combinatorial global optimizations. Operations Research Letters. 1994;16:101–113.

Caruana R, Mullins M. Estimating the number of local minima in complex search spaces. In: Proc. International Conference on Artificial Intelligence: Workshop on Optimization; 1999.

Darroch JN. The multiple-recapture census, 1: estimation of a closed population. Biometrika. 1958;45:343–359.

Davidor Y. Epistasis variance: suitability of a representation to genetic algorithms. Complex Systems. 1990;4:369–383.

Davidor Y. Epistasis variance: a viewpoint on GA-hardness. In: Rawlins GJE, ed. Foundations of Genetic Algorithms. San Mateo, CA: Morgan Kaufmann; 1991.

Jones T, Forrest S. Fitness distance correlation as a measure of problem difficulty for genetic algorithms. In: Eshelman LJ, ed. Proceedings of the 6th International Conference on Genetic Algorithms. San Mateo, CA: Morgan Kaufmann; 1995:184–192.

Johnson NL, Kotz S. Discrete distributions. New York: Wiley; 1969.

Jones TC. Evolutionary Algorithms, Fitness Landscapes and Search. Doctoral dissertation. Albuquerque, NM: University of New Mexico; 1995.

Kallel L, Garnier J. How to detect all maxima of a function. Talk given at Seminar 00071 on the Theory of Evolutionary Algorithms, Schloss Dagstuhl, Germany; February 2000.

Kauffman S. The Origins of Order: Self-Organization and Selection in Evolution. Oxford University Press; 1993.

Reeves CR. Predictive measures for problem difficulty. In: Proceedings of the 1999 Congress on Evolutionary Computation. IEEE Press; 1999a:736–743.

Reeves CR. Landscapes, operators and heuristic search. Annals of Operations Research. 1999b;86:473–490.

Reeves CR. Experiments with tunable landscapes. In: Schoenauer M, Deb K, Rudolph G, Yao X, Lutton E, Merelo JJ, Schwefel H.-P., eds. Parallel Problem-Solving from Nature—PPSN VI. Berlin: Springer-Verlag; 2000:139–148.

Robson DS, Regier HA. Estimation of population number and mortality rates. In: Ricker WE, ed. Methods for Assessment of Fish Production in Fresh Waters. Oxford: Blackwell Scientific Publications; 1968:124–158.

Schaffer JD, Caruana RA, Eshelman LJ, Das R. A study of control parameters affecting online performance of genetic algorithms for function optimization. In: Schaffer JD, ed. Proceedings of 3rd International Conference on Genetic Algorithms. Los Altos, CA: Morgan Kaufmann; 1989:51–60.

Seber GAF. The Estimation of Animal Abundance. London: Charles Griffin; 1982.

Stadler PF, Wagner GP. Algebraic theory of recombination spaces. Evolutionary Computation. 1997;5:241–275.

Vose MD. What are genetic algorithms? A mathematical perspective. In: Davis LD, De Jong K, Vose MD, Whitley LD, eds. Evolutionary Algorithms: IMA Volumes in Mathematics and its Applications. New York: Springer-Verlag; 1999:251–276.

Wolpert DH, Macready WG. No free lunch theorems for optimization. IEEE Trans. Ev. Comp. 1997;1:67–82.
