Local Performance of the (μ/μI, λ)-ES in a Noisy Environment

Dirk V. Arnold [email protected]; Hans-Georg Beyer [email protected]    Department of Computer Science XI University of Dortmund 44221 Dortmund, Germany

Abstract

While noise is a phenomenon present in many real-world optimization problems, the understanding of its potential effects on the performance of evolutionary algorithms is still incomplete. In the realm of evolution strategies in particular, it can frequently be observed that one-parent strategies are outperformed by multi-parent strategies in noisy environments. However, mathematical analyses of the performance of evolution strategies in noisy environments have so far been restricted to the simpler one-parent strategies.

This paper investigates the local performance of a multi-parent evolution strategy employing intermediate multi-recombination on a noisy sphere in the limit of infinite parameter space dimension. The performance law that is derived neatly generalizes a number of previously obtained results. Genetic repair is shown to be present and unaffected by the noise. In contrast to previous findings in a noise-free environment, the efficiency of the simple (l+1)-ES can be exceeded by multi-parent strategies in the presence of noise. It is demonstrated that a much improved performance as compared to one-parent strategies can be achieved. For large population sizes the effects of noise all but vanish.

1 INTRODUCTION

Noise is a common phenomenon in many real-world optimization problems. It can stem from sources as different as errors of measurement in experiments, from the use of stochastic sampling or simulation procedures, or from interactions with users, to name but a few. Reduced convergence velocity or even inability to approach the optimum are commonly observed consequences of the presence of noise on optimization strategies.

It can frequently be observed in experiments that evolutionary algorithms (EAs) are comparatively robust with regard to the effects of noise. In fact, noisy environments are considered to be a prime area for the application of EAs. For genetic algorithms (GA), optimization in noisy environments has been investigated, among others, by Fitzpatrick and Grefenstette [10], Miller and Goldberg [12], and Rattray and Shapiro [15]. Their work has led to recommendations regarding population sizing and the use of resampling, and it has been shown that in the limit of an infinitely large population the effects of noise on the performance of a GA employing Boltzmann selection are entirely removed.

In the realm of evolution strategies (ESs), theoretical studies of the effects of noise date back to early work of Rechenberg [16] who analyzed the performance of a (1 + 1)-ES in a noisy corridor. An analysis of the performance of the (1 + A)-ES on a noisy sphere by Beyer [3] sparked empirical research by Hammel and Bäck [11] who concluded that the findings made for GAs do not always easily translate to ES. Arnold and Beyer [2] address the effects of overvaluation of the parental fitness using a (1 + 1)-ES on a noisy sphere. Both Rechenberg [17] and Beyer [7, 8] give results regarding rescaled mutations in (1, λ)- ES. For an overview of the status quo of ES research concerned with noisy environments and for additional references see [1, 8].

Empirical evidence presented by Rechenberg [17] and by Nissen and Propach [13] as well as by a number of others suggests that population-based optimization methods are generally superior to point-based methods in noisy environments. However, analytically obtained results in the field of ES have been published only for single-parent strategies. The present paper attempts to close part of this gap between theory and empirical observations by presenting a formal analysis of the local performance of a (μ/μI, λ)-ES on a noisy sphere in the limit of infinite parameter space dimensionality. The results of the analysis can serve as an approximation for large but finite parameter space dimensionality provided that population sizes are chosen not too large.

Much of the theoretically-oriented research on ES relies on the examination of the local behavior of the strategies in very simple fitness environments. While such environments bear only little resemblance to real-world problems, they have the advantage of mathematical tractability. Frequently, consideration of simple fitness environments, such as the sphere or the ridge, leads to an understanding of working mechanisms of ES that can explain phenomena observable or be of practical use in more complex fitness environments. The discovery of the genetic repair principle [5] and the recommendation regarding the sizing of the learning parameter in self-adaptive strategies in [19] are but two examples. The present paper continues this line of research by analyzing the behavior of the (μ/μI, λ)-ES on a sphere with fitness measurements disturbed by Gaussian noise. The results derived are accurate in the limit of an infinite-dimensional parameter space and can serve as approximations for large but finite-dimensional parameter spaces.

This paper is organized as follows. Section 2 outlines the (μ/μI, λ)-ES algorithm and the fitness environment for which its performance will be analyzed. The notion of fitness noise is introduced. In Section 3, a local performance measure – the expected fitness gain from one generation to the next – is computed. It is argued that for the spherically symmetric environment that is investigated, in the limit of an infinite-dimensional parameter space the expected fitness gain formally agrees with the progress rate. Section 4 discusses the results. It is demonstrated that the performance gain as compared with one-parent strategies is present for small population sizes already. It is shown that in contrast to the noise-free situation, the efficiency of the (1 + 1)-ES can be exceeded by that of a (μ/μI, λ)-ES if noise is present. Section 5 concludes with a brief summary and suggestions for future research.

2 ALGORITHM AND FITNESS ENVIRONMENT

Section 2.1 describes the (μ/μI, λ)-ES with isotropic normal mutations applied to optimization problems with a fitness function of the form f: r1N → r1. Without loss of generality, it can be assumed that the task at hand is minimization, i.e. that high values of f correspond to low fitness and vice versa. Section 2.2 outlines the fitness environment for which the performance of the algorithm is analyzed in the succeeding sections.

2.1 THE (μ/μI, λ)-ES

As an evolutionary algorithm, the(μ/μI, λ)-ES strives to drive a population of candidate solutions to an optimization problem towards increasingly better regions of the search space by means of variation and selection. The parameters λ and μ refer to the number of candidate solutions generated per time step and the number of those retained after selection, respectively. Obviously, λ is also the number of fitness function evaluations required per time step.

Selection simply consists in retaining the μ best of the candidate solutions and discarding the remaining ones. The comma in (μ/μ, λ) indicates that the set of candidate solutions to choose from consists only of the λ offspring, whereas a plus would indicate that selection is from the union of the set of offspring and the parental population. As shown in [2], plus-selection in noisy environments introduces overvaluation as an additional factor to consider, thus rendering the analysis considerably more complicated. Moreover, as detailed by Schwefel [19], comma-selection is known to be preferable if the strategy employs mutative self-adaptation of the mutation strength, making it the more interesting variant to consider.

Variation is accomplished by means of recombination and mutation. As indicated by the second μ and the subscript I in (μ/μ, λ), recombination is “global intermediate”. More specifically, let x(t)r1N, i = 1,…, μ, be the parameter space locations of the parent individuals. Recombination consists in computing the centroid

x=1μμt=1x(t)

si5_e

of the parental population. Mutations are isotropically normal. For every descendant y(j)r1N, j = 1,…, λ, a mutation vector z(j) = 1,…, λ, which consists of N independent, normally distributed components with mean 0 and variance σ2, is generated and added to the centroid of the parental population. That is,

y(j)=x+z(j)

si6_e

The standard deviation σ of the components of the mutation vectors is referred to as the mutation strength.

The fitness advantage associated with a mutation z applied to search space location x is defined as

qx(z)=f(x)f(x+z).

si7_e  (1)

For a fixed location x in parameter space, qx is a scalar random variate with a distribution that depends both on the fitness function and on the mutation strength.

The local performance of the algorithm can be measured either in parameter space or in terms of fitness. The corresponding performance measures are the progress rate φ and the expected fitness gain q, respectively. The expected fitness gain is the expected difference in fitness between the centroids of the population at consecutive time steps. That is, letting 〈x〉(g) denote the centroid of the parental population at generation g, the expected fitness gain is the expected value of f(x(g))f(x(g+`))si8_e. The progress rate is the expected distance in parameter space in direction of the location of the optimum covered by the population’s centroid from one generation to the next. That is, letting R(g) denote the Euclidean distance between the centroid of the parental population and the optimum of the fitness function, the progress rate is the expected value of R(g)R(g + 1). For the fitness environment outlined in Section 2.2 the two performance measures coincide in the limit N → ∞ if appropriately normalized as will be seen in Section 3.

2.2 THE FITNESS ENVIRONMENT

Computing the expected fitness gain is a hopeless task for all but the most simple fitness functions. In ES theory, a repertoire of fitness functions simple enough to be amenable to mathematical analysis while at the same time interesting enough to yield non-trivial results and insights has been established [4, 14]. The most commonplace of these fitness functions is the quadratic sphere

f(x)=Nt=1(ˆxtxt)2,

si9_e

where x = (x1,…, xN)T is the location in parameter space of the optimal fitness value. This fitness function simply maps points in r1N to the square of their Euclidean distance to the optimum at ˆxsi10_e. By formally letting the parameter space dimension N tend to infinity and introducing appropriate normalizations as defined below, a number of simplifying conditions hold true. Note that often results that have been obtained for the sphere can be extended to more general situations. Rudolph [18] derives bounds for the performance of ES for arbitrary convex fitness functions. Beyer [4] presents a differential geometric approach that locally approximates arbitrary fitness functions and presents a fitness gain theory that applies to general quadratic fitness functions.

In what follows it is assumed that there is noise involved in the process of evaluating the fitness function. This form of noise has therefore been termed fitness noise. Loosely speaking, it deceives the selection mechanism. An individual at parameter space location x has an ideal fitness f(x) and a perceived fitness that may differ from its ideal fitness. Fitness noise is commonly modeled by means of an additive, normally distributed random term with mean zero. That is, in a noisy environment, evaluation of the fitness function at parameter space location x yields perceived fitness f(x) + σepsδ, where δ is a standard normally distributed random variate. Quite naturally, σeps is referred to as the noise strength and may depend on the parameter space location x.

3 PERFORMANCE

This section computes the expected fitness gain, or, equivalently, the progress rate, for the (μ/μ, λ)-ES in the environment outlined in Section 2.2. It relies on a decomposition of mutation vectors suggested in both [3] and in [17] and is illustrated in Figure 1. A mutation vector z originating at parameter space location x can be written as the sum of two vectors zA and zB, where zA is parallel to ˆxxsi11_e and zB is in the hyper-plane perpendicular to that. Due to the isotropy of mutations, it can without loss of generality be assumed that zA = σ(z1,0,…, 0)T and zB = σ(0, z2,…, zN)T, where the zt, I,…, N, are independent, standard normally distributed random variates and σ is the mutation strength. Using elementary geometry and denoting the respective distances of x and of y = x + z to the location of the optimum by R and r, respectively, it can be seen from Figure 1 that

r2=(Rσz1)2+||zB||2=R22Rσz1+σ2z21+σ2Ni=2z21

si12_e  (2)

f08-01-9781558607347
Figure 1 Decomposition of a mutation vector z into two components ZA and ZB. Vector ZA is parallel to xxsi1_e, vector zB is in the hyper-plane perpendicular to that. The starting and end points, x and y, of the mutation are at distances R and r, respectively, from the location of the optimum.

Introducing normalized quantities

σ*=σNR,σ*ε=σεN2R2andq*x(z)=qx(z)N2R2,

si13_e

the normalized fitness advantage associated with mutation vector z from Equation (1) is thus

q*(z)=N2R2(R2r2)=σ*z1σ*22Nz21σ*22NN1=2z21

si14_e  (3)

Note that the index x of the normalized fitness advantage has been dropped as the normalized fitness advantage is independent of the location in parameter space due to the choice of normalizations.

The second summand on the right hand side of Equation (3) can for large N with overwhelming probability be neglected compared to the third summand. The term Nt=2z2t/Nsi15_e contained in the third summand on the right hand side of Equation (3) is according to the Central Limit Theorem of Statistics asymptotically normal with mean (N – 1)/N ≈ 1 and variance 2(N – 1)/N2 ≈ 2/N. In the limit N → ∞, the third summand and therefore the contribution of the zB component to the fitness advantage is thus simply σ*2/2si16_e. Note that the components zt, i = 2, …, N of the mutation vector do not influence the fitness advantage associated with a mutation as for large enough N they statistically average out. Thus, the Zi, i = 2, …, N, are selectively neutral and for large N the fitness advantage associated with mutation vector z can be approximated as

q*(z)σ*z1σ*22.

si17_e  (4)

As shown in Appendix A this relation is exact in the limit N → ∞.

Selection is performed on the basis of perceived fitness rather than ideal fitness. With the definition of fitness noise from Section 2.2, the perceived normalized fitness advantage associated with mutation vector z in the limit N → ∞ is

q*(z)+σ*εδ=σ*(z1+ϑδ)σ*22,

si18_e  (5)

where δ is a standard normal random variate and ϑ=σ*ε/σ*si19_e is the noise-to-signal ratio. Those μ offspring individuals with the highest perceived fitness are selected to form the parental population of the succeeding time step. Let (k; λ), k = 1,…, λ, be the index of the offspring individual with the kth highest perceived fitness. The difference between the parameter space location of the centroid of the selected offspring and the centroid of the parental population of the present time step is the progress vector

z=1μμk=`z(k;λ),

si20_e

which consists of components zi=μk=11zt(k;λ)/μ.si21_e The expected fitness gain of the strategy is the expected value of the fitness advantage

q*(z)=σ*(z1)σ*22Nz12σ*22NNt=2zt2

si22_e  (6)

associated with this vector.

The vector 〈z〉 can be decomposed into two components 〈zA〉 and 〈zB〉 in very much the same manner as shown above and as illustrated in Figure 1 for mutation vectors. As the zB components of mutation vectors are selectively neutral, the zB components of the mutation vectors that are averaged are independent. Therefore, each of the 〈zi〉, I = 2,…, N, is normal with mean 0 and variance 1/μ. Following the line of argument outlined in the derivation leading to Equation (4), the contribution of the zB components to the expected fitness gain is thus –σ*2/2 μ. Note the occurence of the factor μ in the denominator that reflects the presence of genetic repair, the influence that has in [5] been found to be responsible for the improved performance of the (μ/μI, λ)-ESas compared with the (1, λ)-ES.

We now argue that the second summand on the right hand side of Equation (6) can be neglected compared with the third provided that N is large enough. This is the case if μz12Nsi23_e. Assume that A is fixed and finite. The value of 〈z12 is independent of N. Its mean is not greater than the square of the average of the first μ order statistics of λ independent, standard normally distributed random variables. Therefore, for any eps > 0 an N can be given such that the probability that μz12Nsi24_e does not hold is less than eps. As a consequence, the second summand can be neglected.

Computing the expected value of the first summand on the right hand side of Equation (6) is technically involved but straightforward. The calculations are referred to Appendix B. Denoting the cumulative distribution function of the standard normal distribution by Φ and following [5] in defining the (μ/μ, λ)-progress coefficient as

cμ/μ,λ=λμ2π(λμ)ex2[Φ(x)]λμ1[1Φ(x)]μ1dx,

si25_e  (7)

the result arrived at in Equation (13) reads

¯z1=cμ/μ,λ1+ϑ2

si26_e

Therefore, the expected fitness gain of the (μ/μI, λ)-ES on the noisy, infinite-dimensional quadratic sphere is

ˉq*=cμ/μ,λσ*1+ϑ2σ*22μ

si27_e  (8)

The relevance of this result is discussed in Section 4.

We now make plausible that this result formally agrees with the result for the normalized progress rate, i.e. that in the limit N → ∞ the progress rate law

φ*=cμ/μ,λσ*1+ϑ2σ*22μ,

si28_e  (9)

where φ* is the expected value of N(Rr)R, holds. For finite normalized mutation strength the normalized fitness gain is finite. Therefore, with increasing N the expected value of the difference between the distance R of the parental centroid from the location of the optimum and the distance r of the centroid of the selected offspring to the location of the optimum tends to zero. As

NR2r22R2=NRrRR+r2R,

si29_e

and as (R + r)/2R tends to one, it follows that Equation (9) holds.

4 DISCUSSION

Equation (9) generalizes several previously obtained results and confirms a number of previously made hypotheses and observations. In particular, for σ*ε=0,si30_e Equation (9) agrees formally with the result of the progress rate analysis of the (μ/μI, λ)-ES on the noise-free sphere published in [5]. For μ = 1, results of the analysis of the (1, λ)-ES on the noisy sphere that have been derived in [3] and [17] appear as special cases. Moreover, solving Equation (9) for the distance to the optimum R proves the validity of a hypothesis regarding the residual location error made in [8]. Finally, Equation (9) confirms empirical findings regarding the performance of the (μ/μI, λ)-ES made by Herdy that are quoted in [17].

Figure 2 is obtained by numerically solving Equation (9) for the normalized mutation strength ˆσ*si31_e that maximizes the normalized progress rate. Throughout this section, this mutation strength is considered optimal. Both the optimal normalized mutation strength and the resulting normalized progress rate are shown as functions of the normalized noise strength. Note that the axes are scaled by dividing by the optimal normalized mutation strength ˆσ*(σ*ε=0)=μcμ/μ,λsi32_e in the absence of noise for normalized noise strength and normalized mutation strength and by the maximal normalized progress rate ˆσ*(σ*ε=0)=μc2μ/μ/λ/2si33_e in the absence of noise for the normalized progress rate. Using the scaled quantities the resulting graphs are independent of the population size parameters. For the noisy (1, λ)-ES the figure can already be found in [17].

f08-02-9781558607347
Figure 2 Optimal normalized mutation strength ˆσ*si2_e and maximal progress rate φ*si3_e as functions of normalized noise strength σε* Note the scaling of the axes: normalized mutation strength and normalized progress rate are scaled by their respective optimal values in the absence of noise; the normalized noise strength is scaled by the optimal normalized mutation strength in the absence of noise.

It can be seen from Figure 2 that a (1, λ)-ES is capable of progress up to a normalized noise strength of σ*ε=2c1,λ,si34_e while a, (μ/μI, λ)-ES is capable of progress up to a normalized noise strength of 2μcμ/μ,λ. That is, a positive progress rate can be achieved for any noise strength by increasing the population size. The reason for this improved performance is – as in the absence of noise – the genetic repair effect. Genetic repair acts in the noisy environment as it does in the noise-free one: the components of the mutation vectors that do not lead towards the optimum but away from it are reduced by the averaging effect of recombination as signified by the presence of the factor μ in the denominator of the loss term in Equation (9). As a consequence, it is possible to explore the search space at higher mutation strengths. In a noisy environment, these increased mutation strengths have the additional benefit of leading to a reduced noise-to-signal ratio ϑ that in limit of very large population sizes tends to zero. Note however that care is in place here as finite population sizes had to be assumed in the limiting process N → ∞. Therefore, no statements regarding the limiting case μ → ∞ can be made. A performance analysis of the (μ/μI, λ)-ES in finite-dimensional search spaces will be necessary to arrive at conclusive statements regarding the behavior of strategies employing very large populations or to obtain conclusive results in rather low dimensional search spaces.

The benefit of increased mutation strengths in noisy environments has been recognized before and has led to the formulation of (1, λ)-ES using rescaled mutations in [17, 7, 8]. The present analysis shows that the positive effect of an increased mutation strength can naturally be achieved in (μ/μI, λ)-ES, and that no rescaling is required. However, this does not solve the problems related to mutation strength adaptation of (1, λ)-ES using rescaled mutations pointed out in [8] as the self-adaptation of mutation strengths in (μ/μI, λ)-ES does not work optimally either.

Defining the efficiency η of an ES as the expected normalized fitness gain, or equivalently the normalized progress rate, per fitness function evaluation for optimally chosen mutation strength, Beyer [6] has shown that on the sphere in the absence of noise, the efficiency of the (l-t-l)-ES cannot be exceeded by any (μ/μI, λ)-ES but only asymptotically be reached for very large population sizes. This is not so in noisy environments. Figure 3a) shows the number of offspring per generation λ above which an optimally adjusted (μ/μI, λ)-ES outperforms an optimally adjusted (1 + 1)-ES on the infinite-dimensional noisy sphere. It has been obtained by numerically comparing the efficiency of the (μ/μI, λ)-ES that follows from dividing the normalized progress rate φ* in Equation (9) by λ with the normalized progress rate of the (1 + 1)-ES computed in [2]. It can be seen that the efficiency of the (1 + 1)-ES can be exceeded even for moderate population sizes if there is noise present. For example, for normalized noise strengths above about 0.59, a (2/2I, 6)-ES can be more efficient than a (1 + 1)-ES. Only for relatively small normalized noise strengths below 0.2 are large population sizes required to outperform the (1 + 1)-ES. This is a strong point in favor of the use of multi-parent rather than single-parent optimization strategies in noisy environments.

f08-03-9781558607347
Figure 3 a) The number of offspring per generation λ above which an optimally adjusted (μ/μI, λ)-ES outperforms an optimally adjusted (1 + 1)-ES as a function of normalized noise strength, b) Efficiency ˆηsi4_e of the μ/μI, λ)-ES with optimally chosen number of parents μ as a function of the number of offspring λ. The lines correspond to, from top to bottom, normalized noise strengths σε* = 0.0. 1.0, 2.0, 4.0, 8.0, and 16.0.

Figure 3b) shows the efficiency of (μ/μI, λ)-ES with optimally chosen parameter μ as a function of the number of offspring per generation λ for a number of noise levels. It has been obtained by numerically optimizing the efficiency while considering μ as a real valued parameter in order to smoothen the curves. It can be seen from the asymptotic behavior of each line in the figure that by sufficiently increasing the population size the efficiency can be increased up to its maximal value of 0.202 in the absence of noise. However, it is once more stressed that the relevance of this result for finite-dimensional parameter spaces is rather limited.

5 CONCLUSION

To conclude, the local performance of the (μ/μI, λ)-ES on the noisy, infinite-dimensional quadratic sphere has been analyzed. It has been shown that by increasing the population size the effects of noise can be entirely removed. This theoretical result seems to support the validity for ES of empirical observations made by Fitzpatrick and Grefenstette [10] for GAs. For (μ/μI, λ)-ES the improved performance is due to genetic repair which makes it possible to use increased mutation strengths and thereby to decrease the noise-to-signal ratio. It has also been demonstrated that, in contrast to the noise-free case, the efficiency of the (1 + 1)-ES can be exceeded by a multi-parent strategy employing recombination, and that even for relatively moderate noise strengths rather small population sizes are sufficient to achieve this.

Future work includes analyzing the performance of the (μ/μI, λ)-ES in other fitness environments and extending the analysis to strategies employing dominant rather than intermediate recombination by means of surrogate mutations as exemplified for noise-free environments in [17, 5]. Of particular interest is the extension of the analysis presented here to finite-dimensional parameter spaces. Furthermore, the analysis of the performance of ES employing no recombination or recombination with fewer than μ parents remains as a challenge for the future. The results of such an analysis may prove to be of particular interest as empirical results quoted in [17] indicate that performance improvements not accounted for by the present theory can be observed. Finally, an investigation of the effects of noise on the behavior of mutation strength adaptation schemes such as mutative self-adaptation is of great interest as such schemes are an essential ingredient of ES.

APPENDIX A: SHOWING THE ASYMPTOTIC NORMALITY OF THE FITNESS ADVANTAGE ASSOCIATED WITH A MUTATION VECTOR

The goal of this appendix is to formally show that in the limit N → ∞ the distribution of the normalized fitness advantage is normal with mean – σ*2/2 and variance σ*2. Let

f1(z1)=σ*z1σ*22Nz2I

si35_e

and

f2(z2,,zN)=σ*22NNt=2z2t.

si36_e

Then q*(z) = f1(z1+ f2(z2, , zN) and the characteristic function of q*(z) is the product of the characteristic functions of f1(z1) and f2(z2,…,zN) due to the independence of the summands.

The characteristic function of f2(z2,…, zN) is easily determined. As the sum Nt=2z2isi37_e is χ2N1si38_e -distributed, the charateristic function of f2(z2,…, zN) is

ϕ2(t)=(1+σ*2Nit)12(N1)

si39_e

where i denotes the imaginary unit.

The characteristic function of f1(z1) is not as easily obtained. It can be verified that for y ≤ N/2

f1(z1)<yz1<Nσ*[112yN]orz1>Nσ*[1+12yN]

si40_e

and that f1(z1) < y with probability 1.0 if y > N/2. Recall that z1 is standard normally distributed. Let

Φ(x)=12πxe12t2dt

si41_e

denote the cumulative distribution function of the standard normal distribution. The cumulative distribution function of f1(z1) is then

P1(y)={1Φ(Nσ*(112yN))+1Φ(Nσ*(1+12yN))ifyN/2ify<N/2

si42_e

The corresponding probability density function

p1(y)={012πσ*12y/N[exp(N22σ*2(112yN)2)+exp(N22σ*2(1+12yN)2)]ifyN/2ify<N/2

si43_e

is obtained by differentiation. Using substitution z=12y/Nsi44_e it follows

ϕ1(t)=p1(y)eitydy=N2πσ*0[exp(N2(1z)22σ*2)+exp(N2(1+z)22σ*2)]exp(itN2(1z2))dz=11+σ*2it/Nexp(12σ*2t21+σ*2it/N).

si45_e

for the characteristic function of f1(z1), where the final step can be verified using an algebraic manipulation system.

Thus, overall the characteristic function of q*(z) is

ϕ(t)=ϕ1(t)ϕ2(t)=1(1+σ*2it/N)N/2exp(12σ*2t21+σ*2it/N).

si46_e  (10)

We now use a property that Stuart and Ord [20] refer to as The Converse of the First Limit Theorem on characteristic functions: If a sequence of characteristic functions converges to some characteristic function everywhere in its parameter space, then the corresponding sequence of distribution functions tends to a distribution function that corresponds to that characteristic function. Due to the relationship

limN1(1+x/N)N/2=exp(x2),

si47_e

the first factor on the right hand side of Equation (10) tends to exp(–σ*2 it/2). The second factor tends to exp(–σ*2 t2/2). Therefore, it follows that for any t

limNϕ(t)=exp(1σ*22(t2+it)).

si48_e

As this is the characteristic function of a normal distribution with mean –σ*2/2 and variance σ*2 the asymptotic normality of the fitness advantage has been shown.

APPENDIX B COMPUTING THE MEAN OF 〈z1

Let z1(t) and δ(i), i = 1, …, λ, be independent standard normally distributed random variables. Let (k; λ) denote the index with the kth largest value of z1(i) + ϑδ(t) The goal of this appendix is to compute the expected value of the average

z1=1μμk=1z(k;λ)1,

si49_e

of those μ of the z1(i) that have the largest values of z(i)1+ϑδ(i)si50_e.

Denoting the probability density function of z1(kλ) as Pk, the expected value can be written as

¯z1=1μμk=1xpk;λ(x)dx.

si51_e  (11)

Determining the density pk requires the use of order statistics. The z1(i) have standard normal density. The terms z(t)1+ϑδ(t)si52_e are independently normally distributed with mean 0 and variance 1 + ϑ2. The value of z(t)1+ϑδ(t)si53_e is for given z(i)1=xsi54_e normally distributed with mean x and variance ϑ2. For an index to have the kth largest value of z(t)1+ϑδ(t)si55_e, k – 1 of the indices must have larger values of z(t)1+ϑδ(t)si56_e and λ – k must have smaller values of z(t)1+ϑδ(t)si57_e As there are λ times k – 1 out of λ – 1 different such cases, it follows

pk;λ(x)=12πϑλ!(λk)!(k1)!e12x2exp(12(yxϑ)2)[Φ(y1+ϑ2)]λk[1Φ(y1+ϑ2)]k1dy

si58_e  (12)

for the density of zk

Inserting Equation (12) into Equation (11) and swapping the order of integration and summation it follows

¯z1=12πμϑxe12x2exp(12(yxϑ)2)μk=1λ!(λk)!(k1)![Φ(y1+ϑ2)]λk[1Φ(y1+ϑ2)]k1dydx.

si59_e

Using the identity (compare [6], Equation (5.14))

μk=1λ!(λk)!(k1)!Pλk[1P]k1=λ!(λμ1)!(μ1)!P0zλμ1[1z]μ1dzsi60_e

it follows

¯z1=λμ2πϑ(λμ)xe12x2exp(12(yxϑ)2)Φ(y/1+ϑ2)0zλμ1[1z]μ1dzdydx.si61_e

Substituting z=Φ(u/1+ϑ2)si62_e yields

¯z1=λμ2π3ϑ1+ϑ2(λμ)xe12x2exp(12(yxϑ)2)yexp(12u21+ϑ2)[Φ(u1+ϑ2)]λμ1[1Φ(u1+ϑ2)]μ1dudydx,

si63_e

and changing the order of the integrations results in

¯z1=λμ2π3ϑ1+ϑ2(λμ)exp(12u21+ϑ2)[Φ(u1+ϑ2)]λμ1[1Φ(u1+ϑ2)]μ1xe12x2uexp(12(yxϑ)2)dydxdu.

si64_e

The inner integrals can easily be solved. It is readily verified that

12πϑxe12x2uexp(12(yxϑ)2)dydx=11+ϑ2exp(12(u1+ϑ2)2)

si65_e

Therefore,

¯z1=λμ2π(1+ϑ2)(λμ)exp(u21+ϑ2)[Φ(u1+ϑ2)]λμ1[1Φ(u1+ϑ2)]μ1du.

si66_e

Using substitution z=u/1+ϑ2si67_e it follows

¯z1=λμ2π(1+ϑ2)(λμ)ex2[Φ(x)]λμ1[1Φ(x)]μ1dx=cμ/μ,λ1+ϑ2,

si68_e  (13)

Where cμ/μ, denotes the (μ/μ,)-progress coefficient defined in Equation (7).

Acknowledgements

Support by the Deutsche Forschungsgemeinschaft (DFG) under grants Be 1578/4-1 and Be 1578/6-1 is gratefully acknowledged. The second author is a Heisenberg fellow of the DFG.

References

[1] Arnold DV. Evolution Strategies in Noisy Environments - A Survey of Existing Work. In: Kallel L, Naudts B, Rogers A, eds. Theoretical Aspects of Evolutionary Computing. Berlin: Springer; 2000:241–252.

[2] Technical Report Cl-80/00, SFB 531 Arnold DV, Beyer H-G. Local Performance of the (1 + 1)-ES in a Noisy Environment. Universitat Dortmund; 2000.

[3] Beyer H-G. Toward a Theory of Evolution Strategies: Some Asymptotical Results from the (1 + λ)-Theory. Evolutionary Computation. 1993;1(2):165–188.

[4] Beyer H-G. Towards a Theory of ‘Evolution Strategies’: Progress Rates and Quality Gain for (1 + λ)-Strategies on (Nearly) Arbitrary Fitness Functions. In: Davidor Y, Männer R, Schwefel H.-P., eds. Parallel Problem Solving from Nature. Heidelberg: Springer; 1994:58–67.

[5] Beyer H-G. Toward a Theory of Evolution Strategies: On the Benefit of Sex - the (μ/μ,λ)-Theory. Evolutionary Computation. 1995;3(1):81–111.

[6] Beyer H-G. Zur Analyse der Evolutionsstrategien. Universität Dortmund: Habilitationsschrift; 1996 (see also [9]).

[7] Beyer H-G. Mutate Large, but Inherit Small! On the Analysis of Rescaled Mutations in (, gl)-ES with Noisy Fitness Data. In: Eiben AE, Back T, Schoenauer M, Schwefel H.-P., eds. Parallel Problem Solving from Nature. Berlin: Springer; 1998:159–168.

[8] Beyer H-G. Computer Methods in Mechanics and Applied Engineering. Evolutionary Algorithms in Noisy Environments: Theoretical Issues and Guidelines for Practice. 2000;186:239–267.

[9] Beyer H-G. The Theory of Evolution Strategies. Heidelberg: Springer; 2000.

[10] Fitzpatrick JM, Grefenstette JJ. Genetic Algorithms in Noisy Environments. In: Langley P, ed. Machine Learning. Dordrecht: Kluwer; 1988:101–120.

[11] Hammel U, Back T. Evolution Strategies on Noisy Functions. How to Improve Convergence Properties. In: Davidor Y, Manner R, Schwefel H.-P., eds. Parallel Problem Solving from Nature. Heidelberg: Springer; 1994:159–168.

[12] Miller BL, Goldberg DE. Genetic Algorithms, Selection Schemes, and the Varying Effects of Noise. Evolutionary Computation. 1997;4(2):113–131.

[13] Nissen V, Propach J. Optimization with Noisy Function Evaluations. In: Eiben AE, Back T, Schoenauer M, Schwefel H.-P., eds. Parallel Problem Solving from Nature. Heidelberg: Springer; 1998:34–43.

[14] Oyman AI, Beyer H-G, Schwefel H-P. Where Elitists Start Limping: Evolution Strategies at Ridge Functions. In: Eiben AE, Back T, Schoenauer M, Schwefel H.-P., eds. Parallel Problem Solving from Nature. Heidelberg: Springer; 1998:34–43.

[15] Rattray M, Shapiro J. Noisy Fitness Evaluation in Genetic Algorithms and the Dynamics of Learning. In: Belew RK, Vose MD, eds. Foundations of Genetic Algorithms. San Mateo, CA: Morgan Kaufmann; 1997.

[16] Rechenberg I. Evolutionsstrategie: Optimierung Technischer Systeme nach den Prinzipien der biologischen Evolution. Stuttgart: Frommann-Holzboog; 1973.

[17] Rechenberg I. Evolutionsstrategie '94. Stuttgart: Frommann-Holzboog; 1994.

[18] Rudolph G. Convergence Properties of Evolutionary Algorithms. Hamburg: Kovac; 1997.

[19] Schwefel H-P. Numerische Optimierung von Computer-Modellen mittels der Evolutionsstrategie. Basel: Birkhauser; 1977.

[20] Stuart A, Ord JK. Kendall's Advanced Theory of Statistics. Distribution Theory. Sixth edition London: Arnold; 1994.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
18.189.178.53