Mutation-Selection Algorithm: a Large Deviation Approach

Paul Albuquerque, Dept. of Computer Science, University of Geneva, 24 rue Général-Dufour, CH-1211 Geneva 4, Switzerland
Christian Mazza, Laboratoire de Probabilités, Université Claude Bernard Lyon-I, 43 Bd du 11 Novembre 1918, 69622 Villeurbanne Cedex, France

Abstract

We consider a two-operator mutation–selection algorithm designed to optimize a fitness function on the space of fixed length binary strings. Mutation acts as in classical genetic algorithms, while the fitness-based selection operates through a Gibbs measure (Boltzmann selection). The selective pressure is controlled by a temperature parameter. We provide a mathematical analysis of the convergence of the algorithm, based on the probabilistic theory of large deviations. In particular, we obtain convergence to optimum fitness by resorting to an annealing process, which makes the algorithm asymptotically equivalent to simulated annealing.

1 INTRODUCTION

Genetic algorithms (GAs) are stochastic optimization algorithms designed to solve hard, typically NP-complete, problems (Goldberg, 1989), (Bäck, 1996), (Vose, 1999). Introduced by Holland (Holland, 1975), these algorithms mimic the genetic mechanisms of natural evolution. An initial random population of potential solutions is evolved by applying genetically inspired operators: mutation, crossover and selection. With time, “better” solutions emerge in the population. The quality of a solution is evaluated in terms of a fitness function. The original optimization problem now translates into finding a global optimum of this function. Note that in general, the convergence of a GA to an optimal solution is not guaranteed. Only a few rigorous mathematical results ensuring convergence of GAs are available.

For the past ten years, increasing efforts have been put into providing rigorous mathematical analyses of GAs (Rawlins, 1991), (Whitley, 1993), (Whitley, 1995), (Belew, 1997), (Banzhaf and Reeves, 1999). Towards this end, GAs have been modeled with Markov chains (Nix and Vose, 1992), (Rudolph, 1994). Application of Markov chain techniques has proved very successful in the study of simulated annealing (SA). This approach has produced an extensive mathematical literature describing the dynamics and investigating convergence properties of SA (Aarts and Laarhoven, 1987), (Aarts and Korst, 1988), (Hajek, 1988), (Catoni, 1992), (Deuschel and Mazza, 1994). It was therefore natural to try to carry over SA formalism to GAs. This was to our knowledge initiated by Goldberg (Goldberg, 1990), who borrowed the notions of thermal equilibrium and Boltzmann distribution from SA and adapted them to GA practice. A theoretical basis was later elaborated by Davis for simple GAs (Davis and Principe, 1991). His approach was further developed by Suzuki and led to a convergence result (Suzuki, 1997).

We believe the first mathematically well-founded convergence results for GAs were obtained by Cerf (Cerf, 1996a), (Cerf, 1996b), (Cerf, 1998), who constructed an asymptotic theory for the simple GA comparable in scope to that of SA. The asymptotic dynamics was investigated using the powerful tools developed by Freidlin and Wentzell (Freidlin and Wentzell, 1984) for the study of random perturbations of dynamical systems. Cerf’s pioneering work takes place in the wider context of generalized simulated annealing, which was defined by Trouvé (Trouvé, 1992a), (Trouvé, 1992b), extending results of Catoni for SA (Catoni, 1992). The dynamics for simulations in various contexts, like statistical mechanics, image processing, neural computing and optimization, can be described in this setting.

Complementary to the asymptotic approach, novel work has been achieved by Rabinovich and Wigderson, who provide an original mathematical analysis of a crossover–selection algorithm (Rabinovich and Wigderson, 1999). Both analyses shed some light on the behavior of GAs. Let us also mention a paper by François, who proves convergence of an alternate mutation-selection algorithm (François, 1998), again within the framework of generalized simulated annealing.

In this contribution, we address the problem of optimizing a fitness function F: Ω → ℝ>0 on the space Ω of binary strings of length l (Ω is the l–dimensional hypercube). We apply to this problem a mutation-selection algorithm, which was introduced in a slightly different form by Davis (Davis and Principe, 1991) and extensively studied in greater generality by Cerf (Cerf, 1996a), (Cerf, 1996b) (the difference resides in the Boltzmann selection, to which we add a noise component). We will show that this mutation–selection algorithm is asymptotically equivalent to SA. This emphasizes the importance of the crossover operator for GAs. Our treatment also takes place within the framework defined by the theory of Freidlin–Wentzell for random perturbations of dynamical systems. The main mathematical object consists of irreducible Markov kernels with exponentially vanishing coefficients.

The paper is organized as follows. In section 2 we describe the mutation–selection algorithm and state some convergence results. The proofs of these results are sketched in section 3 where we perform a large deviation analysis of the algorithm. The algorithm is run in three different ways depending on how we let the temperature for the Boltzmann selection and the mutation probability go to zero. We finally draw some conclusions in section 4.

2 MUTATION–SELECTION ALGORITHM

We now describe a two–operator mutation-selection algorithm on the search space Ωp of populations consisting of p individuals (Ω = {0,1}l).

Mutation acts as in classical GAs. Each bit of an individual in Ω independently flips with probability 0 < τ < 1. At the population level, all individuals mutate independently of each other. Mutation is fitness–independent and operates as a blind search over Ωp.
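For concreteness, the mutation operator can be sketched as follows (a minimal Python sketch; the function names and the sample population are ours, not part of the paper):

```python
import random

def mutate_individual(ind, tau, rng):
    # flip each bit independently with probability tau
    return [b ^ (rng.random() < tau) for b in ind]

def mutate_population(pop, tau, rng):
    # all individuals mutate independently; mutation never looks at fitness
    return [mutate_individual(ind, tau, rng) for ind in pop]

rng = random.Random(0)
pop = [[0, 1, 0, 1], [1, 1, 1, 1]]  # a population of p = 2 strings of length l = 4
mutated = mutate_population(pop, 0.1, rng)
```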

We consider a modified version of the selection procedure of classical GAs. We begin by adding some noise g(ξ, .) to log(F(ξ)) for technical reasons: it helps lift the degeneracy over the global maxima set of F. The real-valued random variables g(ξ, .), indexed by ξ ∈ Ω, are defined on a sample space I (e.g. a subinterval of ℝ). They are independent identically distributed (i.i.d.) with mean zero and satisfy

|g(ξ, ω)| < (1/2) min { |log(F(ξ1)/F(ξ2))| : F(ξ1) ≠ F(ξ2), ξ1, ξ2 ∈ Ω },  for all ω ∈ I.    (1)

Hence the random variables f(ξ, .) = log(F(ξ)) + g(ξ, .) have mean log(F(ξ)), and by assumption (1) the function f(., ω) has the same optima set as F, but a unique global maximum for almost every sample point ω ∈ I.

Fix a point ω in the sample space I. Given a population x = (x1, …, xp) ∈ Ωp of size p, individual xi is selected under a Gibbs distribution (Boltzmann selection) with probability

exp(β f(xi, ω)) / Σ_{j=1}^p exp(β f(xj, ω)),    (2)

from population x. The parameter β ≥ 0 corresponds to an inverse temperature as in simulated annealing and controls the selective pressure. Note that if we remove the noise and set β = 1, the above selection procedure reduces to classical GA fitness-proportional selection.
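The selection step (2) can be sketched as follows (a minimal Python sketch; the max-shift in the exponent is a standard numerical-stability device, and all names are illustrative):

```python
import math
import random

def boltzmann_probs(fitvals, beta):
    # Gibbs weights exp(beta * f(x_i)) normalized over the population, eq. (2);
    # subtracting the max keeps the exponentials in floating-point range
    m = max(fitvals)
    w = [math.exp(beta * (f - m)) for f in fitvals]
    s = sum(w)
    return [x / s for x in w]

def select(pop, fitvals, beta, rng):
    # build the next population by sampling p individuals i.i.d. from (2);
    # every individual of the result is drawn from the current population
    probs = boltzmann_probs(fitvals, beta)
    idx = rng.choices(range(len(pop)), weights=probs, k=len(pop))
    return [pop[i][:] for i in idx]
```

Note that β = 0 gives uniform selection, while large β concentrates the weights on the fittest individual.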

The algorithm is run only after the fitness function has been perturbed by the noise component. For any given sample point ω and τ, β fixed, we get an irreducible Markov chain on the search space Ωp by successively applying mutation and selection. Denote by μωτ,β its stationary probability distribution and by μτ,β the probability distribution obtained by averaging the μωτ,β over the sample space.

Before stating results, we introduce some notations and terminology. We define the set of uniform populations

Ω= = {(x1, …, xp) ∈ Ωp : x1 = … = xp}    (3)

and the set of populations consisting only of maximal fitness individuals

Fmax = {(x1, …, xp) ∈ Ωp : F(xi) = max_{ξ∈Ω} F(ξ), i = 1, …, p}.    (4)

We also recall that the support of a probability distribution on Ωp consists of all populations having positive probability.

Each theorem stated below corresponds to a specific way of running the mutation-selection algorithm.

Theorem 1 Let β > 0 be fixed. Then, as τ → 0, the probability distribution μτ,β converges to a probability distribution μ0,β with support(μ0,β) = Ω=. Moreover, the limit probability distribution limβ→∞ μ0,β concentrates on Ω= ∩ Fmax.

The first assertion in theorem 1 was already obtained by Davis (Davis and Principe, 1991) and the second by Suzuki (Suzuki, 1997) by directly analyzing the transition probabilities of the Markov chain. However their algorithm did not include the added noise component. We give a different proof in the next section.

Theorem 1 implies that, for β, τ both fixed and τ ≈ 0, the mutation-selection algorithm concentrates on a neighborhood of Ω= in Ωp. Hence a run has a positive probability of ending on any population in Ω=. The hope remains that the stationary probability distribution has a peak on Ω= ∩ Fmax. This is actually the case, since the probability distribution limβ→∞ μ0,β concentrates on Ω= ∩ Fmax. The latter statement can be obtained as a consequence of theorem 3.

Notice that GAs are usually run with τ ≈ 0. We believe that the crossover operator improves the convergence speed, but probably not the shape of the stationary probability distribution.

Theorem 2 Let 0 < τ < 1 be fixed. Then, as β → ∞, the probability distribution μτ,β converges to a probability distribution μτ,∞ with support(μτ,∞) = Ω=. Moreover, the limit probability distribution limτ→0 μτ,∞ concentrates on Ω=.

Theorem 2 shows that, in terms of probability distribution support, increasing the selective pressure is equivalent to diminishing the mutation probability. However, limτ→0 μτ,∞ remains concentrated on Ω=.

Consequently, it is a natural idea to link the mutation probability τ to the inverse temperature β. The algorithm becomes a simulated annealing process: the intensity of mutation is decreased, while selection becomes stronger. This actually ensures convergence of the algorithm to an optimal solution.

Theorem 3 Let τ = τ(ε, κ, β) = ε exp(−κβ) with 0 < ε < 1 and κ > 0. Then, for κ large enough, the probability distribution μτ,β converges, as β → ∞, to the uniform probability distribution over Ω= ∩ Fmax. Asymptotically, the algorithm behaves like simulated annealing on Ω= with energy function −p log F.

Notice that the initial mutation probability ε does not influence the convergence.
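A sketch of the annealed process of theorem 3, with the cooling schedule β ~ log n mentioned in section 4 (a minimal Python sketch; the OneMax-style fitness and all parameter values are illustrative assumptions, and the technical noise component g is omitted for simplicity):

```python
import math
import random

def fitness(ind):
    # illustrative positive fitness (OneMax-like): F > 0 so log F is defined
    return 1.0 + sum(ind)

def annealed_run(l=8, p=4, eps=0.5, kappa=2.0, n_gens=300, seed=0):
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(l)] for _ in range(p)]
    best = max(fitness(ind) for ind in pop)
    trace = [best]
    for n in range(n_gens):
        beta = math.log(n + 2)               # cooling schedule beta ~ log n
        tau = eps * math.exp(-kappa * beta)  # coupled mutation rate, theorem 3
        # mutation: every bit flips independently with probability tau
        pop = [[b ^ (rng.random() < tau) for b in ind] for ind in pop]
        # Boltzmann selection with f = log F (max-shifted for stability)
        f = [math.log(fitness(ind)) for ind in pop]
        m = max(f)
        w = [math.exp(beta * (v - m)) for v in f]
        idx = rng.choices(range(p), weights=w, k=p)
        pop = [pop[i][:] for i in idx]
        best = max(best, max(fitness(ind) for ind in pop))
        trace.append(best)
    return pop, trace

final_pop, trace = annealed_run()
```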

The first assertion in theorem 3 was obtained by Cerf (Cerf, 1996a), (Cerf, 1996b), in a much more general setting, but again with a mutation-selection algorithm not including the added noise component. However, we hope that our proof, presented below in the simple case of binary strings, is more intuitive and easier to grasp. It may also illustrate the importance of Cerf’s work and the richness of the Freidlin–Wentzell theory.

3 LARGE DEVIATION ANALYSIS

In analogy with the original treatment of simulated annealing, we prefer to deal with the energy function Uω = −f(., ω). The optimization problem now amounts to finding the global minima set of Uω, which can be thought of as the set of fundamental states of the energy function. For almost every ω, Uω has a unique fundamental state.

Denote by ρ(.,.) the Hamming distance on Ω and set

d(x, y) = Σ_{i=1}^p ρ(xi, yi),

with x = (x1,…, xp) and y = (y1,…, yp) populations in Ωp; d(.,.) is a metric on Ωp.

Let Mτ be the transition matrix for the mutation process on Ωp. The probability that a population x ∈ Ωp is transformed into y ∈ Ωp by mutation is

Mτ(x, y) = τ^{d(x,y)} (1 − τ)^{lp − d(x,y)}.    (5)

We define the partial order relation ⊴ on Ωp by:

x ⊴ y  ⇔  xi ∈ {y1, …, yp},  for all i ∈ {1, …, p}.

In words, x ⊴ y if and only if all individuals in population x belong to population y.
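These two definitions, the population metric d and the partial order, are straightforward to implement (a minimal Python sketch with illustrative names):

```python
def hamming(a, b):
    # Hamming distance rho on Omega = {0,1}^l
    return sum(u != v for u, v in zip(a, b))

def d(x, y):
    # population metric: sum of the individual Hamming distances
    return sum(hamming(xi, yi) for xi, yi in zip(x, y))

def dominated(x, y):
    # the partial order: x is dominated by y iff every individual of x occurs in y
    return all(xi in y for xi in x)
```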

Let Sωβ be the transition matrix for the selection process on Ωp. The probability that a population x ∈ Ωp is transformed into y ∈ Ωp by selection (see (2)) is given by

Sωβ(x, y) = exp(−β Σ_{i=1}^p Uω(yi)) / (Σ_{i=1}^p exp(−β Uω(xi)))^p  if y ⊴ x, and Sωβ(x, y) = 0 otherwise.    (6)

The transition matrix of the Markov chain corresponding to our mutation-selection algorithm is Sωβ Mτ. From eqs. (5) and (6), we compute the transition probabilities

Sωβ Mτ(x, y) = Σ_z Mτ(x, z) Sωβ(z, y) = Σ_{z : y ⊴ z} τ^{d(x,z)} (1 − τ)^{lp − d(x,z)} exp(−β Σ_{i=1}^p Uω(yi)) / (Σ_{i=1}^p exp(−β Uω(zi)))^p.    (7)

The mutation and selection processes are simple to treat on their own. However, their combined effect proves to be more complicated. A way of dealing with this increase in complexity is to consider these processes as asymptotically vanishing perturbations of a simple random process. We study three cases. In the first, mutation acts as the perturbation, while selection plays this role in the second. In the third case, the combination of mutation and selection acts as a perturbation of a simple selection scheme, namely equiprobable selection among the best individuals in the current population.

We will now compute three different communication cost functions corresponding to various ways of running the mutation-selection algorithm. The communication cost reflects the asymptotic difficulty for passing from one population to another under the considered random process.

Write τ = τ(α) = e^{−α} with α > 0. Asymptotically, for β fixed and α → ∞, eq. (7) yields

lim_{α→∞} −(1/α) log(Sωβ Mτ(α)(x, y)) = min_{z : y ⊴ z} d(x, z).

Henceforth, we will use the asymptotic notation

Sωβ Mτ(α)(x, y) ≍ exp(−α min_{z : y ⊴ z} d(x, z)).

The communication cost for asymptotically vanishing mutation is given by

VM(x → y) = min_{z : y ⊴ z} d(x, z).    (8)
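On a tiny instance, the cost (8) can be brute-forced by enumerating all populations z with y ⊴ z; one can check, for instance, that on uniform populations it reduces to the Hamming distance ρ, as used in section 3.2 (a Python sketch; the instance is an illustrative assumption):

```python
from itertools import product

def hamming(a, b):
    return sum(u != v for u, v in zip(a, b))

def d(x, y):
    return sum(hamming(xi, yi) for xi, yi in zip(x, y))

def V_M(x, y, l):
    # eq. (8): minimum of d(x, z) over all populations z with y dominated by z
    p = len(x)
    strings = list(product((0, 1), repeat=l))
    costs = [d(x, z) for z in product(strings, repeat=p)
             if all(yi in z for yi in y)]
    return min(costs)
```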

Define the total energy function uω: Ωp → ℝ by

uω(y) = Σ_{i=1}^p Uω(yi),

and notice that min_{v ⊴ z} uω(v) = p min_{1≤i≤p} Uω(zi). Asymptotically, for τ fixed and β → ∞, eq. (7) yields

Sωβ Mτ(x, y) ≍ exp(−β min_{z : y ⊴ z} (uω(y) − min_{v ⊴ z} uω(v))).

The associated communication cost is given by

VS,ω(x → y) = min_{z : y ⊴ z} (uω(y) − min_{v ⊴ z} uω(v)).    (9)

Next, let τ = τ(ε, κ, β) = ε exp(−κβ) with 0 < ε < 1 and κ > 0. Asymptotically, for ε, κ fixed and β → ∞, eq. (7) yields

Sωβ Mτ(ε,κ,β)(x, y) ≍ exp(−β min_{z : y ⊴ z} (κ d(x, z) + uω(y) − min_{v ⊴ z} uω(v))).

The associated communication cost is given by

VκMS,ω(x → y) = min_{z : y ⊴ z} (κ d(x, z) + uω(y) − min_{v ⊴ z} uω(v)).    (10)

3.1 KEY LEMMATA

The lemmata we state below are crucial for the proofs of theorems 1, 2 and 3. We comment on these lemmata in the appendix.

We consider a family of Markov chains on a finite set S, indexed by a real parameter α > 0, with irreducible transition matrices {qα(x, y)}x,y∈S satisfying

qα(x, y) ≍ exp(−α V(x → y)),  x, y ∈ S,

where 0 ≤ V(x → y) ≤ ∞ is the communication cost function.

Let μα be the stationary probability distribution on S associated to {qα(x, y)}x,y∈S and define μ = lim_{α→∞} μα, the limit probability distribution on S.

A function ψ: S → ℝ is a potential for the communication cost function V(x → y) if V(x → y) − V(y → x) = ψ(y) − ψ(x) for all x, y ∈ S such that V(x → y) < ∞.

The following two results are derived from the so-called Freidlin–Wentzell theory (Freidlin and Wentzell, 1984).

Lemma 4 Assume there is a potential ψ: S → R for the communication cost function V(xy). Then

support(μ) = {x ∈ S : ψ(x) = min_{y∈S} ψ(y)}.

Lemma 5 Let S* be a subset of S with the properties:

1. for any x ∈ S+ = S \ S*, there exists y ∈ S* such that V(x → y) = 0;

2. for any x ∈ S*, we have V(x → y) > 0 for all y ∈ S+.

Then the dynamics can be restricted from S onto S*.

The sets S+ and S* can respectively be thought of as repulsive and attractive sets for the dynamics. The dynamics can be restricted from S onto S*, because in the asymptotic regime the cost for entering S* is zero, while the cost for leaving S* is positive.

3.2 ASYMPTOTICALLY VANISHING MUTATION

In this subsection, we sketch the proof of theorem 1.

Hereafter, we will write (ξ) in place of (ξ, …, ξ) ∈ Ω=.

If x ∈ Ωp \ Ω=, there exists (ξ) ∈ Ω= such that VM(x → (ξ)) = 0. This is readily seen by taking ξ = xi, for any i ∈ {1, …, p}, and computing VM(x → (ξ)) from (8). However, for (ξ) ∈ Ω=, the communication cost VM((ξ) → x) > 0 for all x ∈ Ωp \ Ω=.

Applying lemma 5 to S = Ωp and S* = Ω= for the communication cost function VM(x → y), we can restrict the dynamics from Ωp onto Ω=.

It is easy to check that VM((η) → (ξ)) = VM((ξ) → (η)) = ρ(η, ξ). Thus, on Ω= the cost function is symmetric and given by the Hamming distance. Therefore,

0 = VM((η) → (ξ)) − VM((ξ) → (η)),

so the function ψ ≡ 0 is a potential for VM on Ω=. Using lemma 4, we conclude that support(μω0,β) = Ω= for all ω. Finally, theorem 1 follows by averaging the probability distributions μω0,β over ω.

3.3 INCREASING THE SELECTIVE PRESSURE

We now take care of theorem 2.

Define the set

ΩUω = {(x1, …, xp) ∈ Ωp : Uω(x1) = … = Uω(xp)}    (11)

of equi-fitness populations.

Since VS,ω(x → y) = min_{z : y ⊴ z} (uω(y) − min_{v ⊴ z} uω(v)) (see (9)) only depends on y, the function ψ(y) = VS,ω(x → y) is itself a potential on Ωp.

Let y ∈ ΩUω. Then, taking z = y, we have

0 ≤ ψ(y) ≤ uω(y) − min_{v ⊴ y} uω(v) = 0.    (12)

Thus ψ(y) = 0 for all y ∈ ΩUω.

If y ∈ Ωp \ ΩUω, there exist i ≠ j ∈ {1, …, p} such that Uω(yi) ≠ Uω(yj), by definition of ΩUω in (11). Hence, for any z with y ⊴ z,

uω(y) − min_{v ⊴ z} uω(v) ≥ uω(y) − min_{v ⊴ y} uω(v) > 0,    (13)

because {v ⊴ y} ⊂ {v ⊴ z} by transitivity of ⊴. Therefore ψ(y) > 0 for all y ∈ Ωp \ ΩUω. Lemma 4 implies that support(μωτ,∞) = ΩUω. Notice that the probability that ΩUω \ Ω= ≠ ∅ is zero, because the noise component (see (1)) removes the degeneracy from the fitness function and hence, for almost every ω, level sets of Uω contain at most one element. We get the statement of theorem 2 by averaging the probability distributions μωτ,∞ over ω.

3.4 AN ANNEALING PROCESS

We go on to sketch the proof of theorem 3.

We begin by defining

Δ = max_{ξ∈Ω} Uω(ξ) − min_{ξ∈Ω} Uω(ξ)    (14)

the energy barrier. Until the end of this subsection, we assume that the exponential decay rate κ of the mutation probability is greater than pΔ.

Let y ⊴ x. Taking z = x, we get

κ d(x, z) + uω(y) − min_{v ⊴ z} uω(v) = uω(y) − min_{v ⊴ x} uω(v).    (15)

For z ≠ x, d(x, z) ≥ 1 and therefore,

κ d(x, z) + uω(y) − min_{v ⊴ z} uω(v) > uω(y) − min_{v ⊴ x} uω(v),    (16)

since κ > pΔ ≥ min_{v ⊴ z} uω(v) − min_{v ⊴ x} uω(v).

Consequently, eqs. (15) and (16) above imply that, in (10), the minimum over all z with y ⊴ z is realized by z = x. Comparing with (9), we get

VκMS,ω(x → y) = VS,ω(x → y)    (17)

for y ⊴ x.

Let x ∈ Ωp \ ΩUω. There exists y ∈ ΩUω, with y ⊴ x, such that VκMS,ω(x → y) = 0. Just recall that VS,ω(x → y) = 0 for any y ∈ ΩUω (see eq. (12)).

However, for x ∈ ΩUω, we have VκMS,ω(x → y) > 0 for all y ∈ Ωp \ ΩUω. This follows from eqs. (13) and (17).

Applying lemma 5 to S = Ωp and S* = ΩUω for the communication cost function VκMS,ω(x → y), we can restrict the dynamics from Ωp onto ΩUω. Since the probability that ΩUω \ Ω= ≠ ∅ is zero, we will assume that Ω= = ΩUω.

Let (ξ), (η) ∈ Ω= with ξ ≠ η. Naturally, neither (ξ) ⊴ (η) nor (η) ⊴ (ξ). Now let ẑ = (ξ, …, ξ, η, ξ, …, ξ). Then, if z with (η) ⊴ z is not of the form ẑ,

κ d((ξ), z) ≥ κ (ρ(ξ, η) + 1) > κ d((ξ), ẑ) + uω((η)) − min_{v ⊴ ẑ} uω(v),

where we used the assumption κ > p∆ and eq. (14). Hence,

VκMS,ω((ξ) → (η)) = κ d((ξ), ẑ) + uω((η)) − min_{v ⊴ ẑ} uω(v)
                 = κ ρ(ξ, η) + p Uω(η) − p min_{1≤i≤p} Uω(ẑi)
                 = κ ρ(ξ, η) + p Uω(η) − p min(Uω(ξ), Uω(η))

and thus

VκMS,ω((ξ) → (η)) = κ ρ(ξ, η) + p (Uω(η) − Uω(ξ))+,

where (·)+ denotes the positive part.

Therefore,

uω((ξ)) − uω((η)) = p (Uω(ξ) − Uω(η)) = VκMS,ω((η) → (ξ)) − VκMS,ω((ξ) → (η)),

implying that the energy function p Uω is a potential on Ω=.
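On a toy instance one can brute-force (10) and check the closed form derived above, VκMS,ω((ξ) → (η)) = κ ρ(ξ, η) + p (Uω(η) − Uω(ξ))+, on uniform populations (a Python sketch; the energy values and the choice κ > pΔ are illustrative assumptions):

```python
from itertools import product

def hamming(a, b):
    return sum(u != v for u, v in zip(a, b))

l, p = 2, 2
strings = list(product((0, 1), repeat=l))
U = {s: float(i) for i, s in enumerate(strings)}  # toy energies, Delta = 3
kappa = 7.0                                       # kappa > p * Delta = 6

def V_MS(x, y):
    # eq. (10): min over z dominating y of kappa*d(x,z) + u(y) - min_{v <| z} u(v),
    # using min_{v <| z} u(v) = p * min_i U(z_i)
    u_y = sum(U[s] for s in y)
    costs = []
    for z in product(strings, repeat=p):
        if all(yi in z for yi in y):
            dist = sum(hamming(a, b) for a, b in zip(x, z))
            costs.append(kappa * dist + u_y - p * min(U[s] for s in z))
    return min(costs)

def closed_form(xi, eta):
    # the formula derived in the text for uniform populations
    return kappa * hamming(xi, eta) + p * max(U[eta] - U[xi], 0.0)
```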

For almost every ω, Ω= = ΩUω and Uω thus has a unique minimum. As a consequence of lemma 4, the probability distribution μωτ(ε,κ,β),β converges for almost every ω, as β → ∞, to a point mass on the unique minimum of Uω. The latter is a maximum of the fitness function. Finally, since the random variables g(ξ, .) are i.i.d., the minimum of Uω can equiprobably be any maximum of the fitness function. Therefore, averaging over ω yields the uniform probability distribution over the set of global maxima of the fitness function (seen as a subset of Ω=).

4 CONCLUSION

The mutation-selection algorithm considered in this work helps us improve our understanding of GAs. It differs slightly from the algorithm considered by Davis (Davis and Principe, 1991) and Cerf (Cerf, 1996a), (Cerf, 1996b), because we added a noise component in the Boltzmann selection. This difference is however only technical. We studied this two-operator algorithm by resorting to the theory of Freidlin–Wentzell and by making use of the approach developed in (Deuschel and Mazza, 1994).

The first result (theorem 1) explains the behavior of the algorithm for small mutation probability τ ≈ 0. Under this condition, the algorithm is forced to narrow its search to a neighborhood in Ωp of the set of uniform populations Ω= (defined in (3)). The latter set is a copy of Ω and corresponds to the diagonal of the search space Ωp. GAs are usually run with τ ≈ 0. Setting β = 1 and removing the noise component, our selection procedure (see (2)) reduces to classical GA fitness-proportional selection. Hence, we can infer how the combined effect of a low mutation probability and a comparatively strong selection, as in classical GAs, removes diversity from the populations in the long–run. The second result (theorem 2) shows that the effect of increasing the selective pressure is similar, in terms of restricting the search, to lowering the mutation probability.

Since gradually decreasing τ to zero and then letting β tend to infinity actually constrains the search to the set of maximal fitness solutions in Ω=, it was natural to link the mutation probability τ and the selective pressure β. Theorem 3 shows that, by exponentially decreasing τ with β, the algorithm behaves like simulated annealing on Ω= with energy function p U. This procedure ensures convergence to an optimal solution in the limit β → ∞. Therefore, if we fix β large enough and choose τ accordingly (see theorem 3), the algorithm will end up, in the long–run, in a neighborhood of the set of maximal fitness solutions in Ω=. Of course, the convergence performance of the algorithm will be no better than that of simulated annealing. In practical situations, the annealed version of the mutation-selection algorithm should be implemented using a cooling schedule of the form β ~ log n, where n is the generation number, as is the case for simulated annealing. Of course, parameters like the initial temperature and initial mutation probability ε, as well as the exponential rate κ for τ, must be set empirically. Prior knowledge about the energy barrier Δ (see (14)) would yield the estimate κ ≈ pΔ, where p is the population size.

The most interesting question arises in the comparison of the mutation–selection algorithm with classical GAs. Crossover is the distinctive feature between this algorithm and GAs. A generation in a GA is the succession of mutation, crossover and selection. From the considerations of the previous paragraph, we understand more clearly the role of crossover. The latter operator is expected to speed up the search, provided it is not fitness–decreasing on average.

More precisely, the search proceeds in successive stages. First it progresses towards the set Ω=. On Ω=, it converges towards the optimal set Ω= ∩ Fmax (see (4)). For GAs, crossover is expected to speed up the preliminary convergence stage towards Ω=, if on average the mean fitness of the offspring is not smaller than that of the parents. However, populations in Ω= are invariant under its action. Therefore, on Ω=, the GA performance is similar to that of simulated annealing.

Finally, we did not study the inhomogeneous Markov chain. It would be interesting to investigate finite time convergence and convergence rates.

APPENDIX

In this appendix, we introduce some notions from the theory of Freidlin–Wentzell (Freidlin and Wentzell, 1984).

As in subsection 3.1, consider a family of Markov chains on a finite set S, indexed by a real parameter α > 0, with irreducible transition matrices {qα(x, y)}x,y∈S satisfying

qα(x, y) ≍ exp(−α V(x → y)),  x, y ∈ S,

where 0 ≤ V(x → y) ≤ ∞ is the communication cost function. Let μα denote the stationary probability distribution on S associated to {qα(x, y)}x,y∈S and μ = lim_{α→∞} μα.

Let us consider the elements of S as the vertices of the complete oriented graph G on |S| vertices. A spanning tree of G pointing at x ∈ S is a subtree T of G such that

1. S is the vertex set of T,

2. for every y ∈ S, there is a directed path in T from y to x.

Denote by G(x) the set of all spanning trees of G pointing at x ∈ S and define

V(γ) = Σ_{(y→z)∈γ} V(y → z)  for γ ∈ G(x),   V(x) = min_{γ∈G(x)} V(γ),   Vmin = min_{x∈S} V(x).

The stationary probability distribution μα can be estimated in terms of the spanning trees defined above. This is formulated in the result of Freidlin and Wentzell (Freidlin and Wentzell, 1984, lemma 3.1, p.177) stated below.

Lemma 6 We have the following representation formula for μα:

μα(x) = Qα(x) / Σ_{x′∈S} Qα(x′),  x ∈ S,

where Qα(x) = Σ_{γ∈G(x)} Π_{(y→z)∈γ} qα(y, z).

Moreover,

μα(x) ≍ exp(−α (V(x) − Vmin)).    (18)
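The representation formula of lemma 6 is a form of the Markov chain tree theorem, and can be checked numerically on a tiny chain (a Python sketch; the 3-state transition matrix is an illustrative assumption):

```python
from itertools import product

# an illustrative irreducible transition matrix on S = {0, 1, 2}
Q = [[0.5, 0.3, 0.2],
     [0.1, 0.6, 0.3],
     [0.4, 0.4, 0.2]]
S = range(3)

def stationary(Q, iters=5000):
    # power iteration for the stationary distribution mu = mu Q
    mu = [1.0 / 3] * 3
    for _ in range(iters):
        mu = [sum(mu[i] * Q[i][j] for i in S) for j in S]
    return mu

def tree_weight(x):
    # sum over spanning trees pointing at x of the product of edge weights;
    # each y != x gets exactly one out-edge and must reach x without cycling
    others = [y for y in S if y != x]
    total = 0.0
    for heads in product(S, repeat=len(others)):
        succ = dict(zip(others, heads))
        if any(y == h for y, h in succ.items()):
            continue  # no self-loops in a spanning tree
        def reaches_x(y):
            seen = set()
            while y != x and y not in seen:
                seen.add(y)
                y = succ[y]
            return y == x
        if all(reaches_x(y) for y in others):
            w = 1.0
            for y, h in succ.items():
                w *= Q[y][h]
            total += w
    return total
```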

As a corollary to this lemma, we get

support(μ) = {x ∈ S : V(x) = Vmin}    (19)

from (18).

The next lemma is lemma 4 of subsection 3.1 and can be found in (Deuschel and Mazza, 1994). We reproduce the proof here for the sake of clarity.

Lemma 7 Assume there is a potential ψ: S → ℝ for the communication cost function V(x → y). Then

support(μ) = {x ∈ S : ψ(x) = min_{y∈S} ψ(y)}.

Proof: Let x, y ∈ S. For an arbitrary spanning tree γ ∈ G(x), let g_{y→x} be the unique geodesic of γ starting at y and ending at x. Define a mapping R: G(x) → G(y) by R(γ) = (γ \ g_{y→x}) ∪ g_{x→y}, where g_{x→y} is just the reversed geodesic from x to y. This mapping R is well–defined and onto. Moreover,

V(R(γ)) = V(γ) − V(g_{y→x}) + V(g_{x→y})
        = V(γ) + Σ_{(e−→e+)∈g_{x→y}} (V(e− → e+) − V(e+ → e−))
        = V(γ) + Σ_{(e−→e+)∈g_{x→y}} (ψ(e+) − ψ(e−)),

which implies V(R(γ)) = V(γ) + ψ(y) − ψ(x) for any γ ∈ G(x), since the last sum telescopes. It follows that

V(y) = min_{γ′∈G(y)} V(γ′) = min_{γ∈G(x)} V(R(γ)) = min_{γ∈G(x)} (V(γ) + ψ(y) − ψ(x)) = V(x) + ψ(y) − ψ(x).    (20)

Thus V(y) − V(x) = ψ(y) − ψ(x) for any x, y ∈ S, so there is a constant C such that V(x) = ψ(x) + C for any x ∈ S. Combining this with (19) yields the lemma.

Notice from (20) that, if the cost function V(x → y) admits a potential, then the function V(x) is also a potential for V(x → y). Two potentials differ by a constant.

The lemma below is lemma 5 of section 3.1.

Lemma 8 Let S* be a subset of S with the properties:

1. for any x ∈ S+ = S \ S*, there exists y ∈ S* such that V(x → y) = 0;

2. for any x ∈ S*, we have V(x → y) > 0 for all y ∈ S+.

Then the dynamics can be restricted from S onto S*.

Proof: It follows from (Freidlin and Wentzell, 1984, lemmata 4.1–3, pp. 185–189) that

1. for x ∈ S*, V(x) is computable from graphs over S*,

2. for y ∈ S+, V(y) = min_{x∈S*} (V(x) + V̄(x → y)),

where the path communication cost V̄(x → y) is defined as

V̄(x → y) = min_{2≤k≤|S|} min_{z2,…,zk−1} Σ_{i=2}^k V(zi−1 → zi),

with x = z1 and y = zk.
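The path communication cost V̄ is a min-plus shortest-path problem over the one-step costs, so it can be computed by a Floyd-Warshall-type recursion (a Python sketch; the 3-point cost matrix is an illustrative assumption):

```python
INF = float("inf")

def path_costs(V):
    # min-plus transitive closure (Floyd-Warshall): cheapest chain
    # z_1 = x, z_2, ..., z_k = y of one-step communication costs
    n = len(V)
    Vbar = [row[:] for row in V]
    for k in range(n):
        for i in range(n):
            for j in range(n):
                Vbar[i][j] = min(Vbar[i][j], Vbar[i][k] + Vbar[k][j])
    return Vbar

# illustrative one-step costs on a 3-point space (INF = no direct transition)
V = [[0.0, 2.0, INF],
     [INF, 0.0, 1.0],
     [1.0, INF, 0.0]]
Vbar = path_costs(V)
```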

Since by assumption V(x → y) > 0 for any x ∈ S* and y ∈ S+, assertions 1 and 2 above justify the restriction of the dynamics to S*.

Acknowledgements

This work is supported by the Swiss National Science Foundation and the Région Rhône–Alpes.

References

Aarts E, Korst J. Simulated annealing and Boltzmann machines. New York: John Wiley and Sons; 1988.

Aarts E, van Laarhoven P. Simulated annealing: theory and applications. Kluwer Academic; 1987.

Banzhaf W, Reeves C, eds. Foundations of Genetic Algorithms-5. San Francisco, CA: Morgan Kaufmann; 1999.

Bäck T. Evolutionary Algorithms in Theory and Practice. Oxford University Press; 1996.

Belew R, ed. Foundations of Genetic Algorithms-4. San Francisco, CA: Morgan Kaufmann; 1997.

Catoni O. Rough large deviations estimates for simulated annealing, application to exponential schedules. Annals of Probability. 1992;20(3):1109–1146.

Cerf R. An asymptotic theory of genetic algorithms. In: Alliot J.-M., Lutton E, Ronald E, Schoenauer M, Snyers D, eds. Artificial Evolution, volume 1063 of Lecture Notes in Computer Science. Heidelberg: Springer-Verlag; 1996a:37–53.

Cerf R. The dynamics of mutation-selection algorithms with large population sizes. Annales de l'Institut Henri Poincaré. 1996b;32(4):455–508.

Cerf R. Asymptotic convergence of genetic algorithms. Advances in Applied Probability. 1998;30(2):521–550.

Davis T, Principe JC. A simulated annealing like convergence theory for the simple genetic algorithm. In: Belew R, Booker L, eds. Proc. of the Fourth International Conference on Genetic Algorithms. San Mateo, CA: Morgan Kaufmann; 1991:174–181.

Deuschel J-D, Mazza C. L2 convergence of time nonhomogeneous Markov processes: I. spectral estimates. Annals of Applied Probability. 1994;4(4):1012–1056.

François O. An evolutionary strategy for global minimization and its Markov chain analysis. IEEE Transactions on Evolutionary Computation. 1998;2(3):77–91.

Freidlin M, Wentzell A. Random perturbations of dynamical systems. New York: Springer-Verlag; 1984.

Goldberg DE. Genetic Algorithms in Search, Optimization, and Machine Learning. Reading, MA: Addison-Wesley; 1989.

Goldberg DE. A note on Boltzmann tournament selection for genetic algorithms and population-oriented simulated annealing. Complex Systems. 1990;4:445–460.

Hajek B. Cooling schedules for optimal annealing. Math. Oper. Res. 1988;13:311–329.

Holland J. Adaptation in Natural and Artificial Systems. Ann Arbor, MI: The University of Michigan Press; 1975.

Nix A, Vose M. Modeling genetic algorithms with Markov chains. Ann. Math. Art. Intell. 1992;5(1):79–88.

Rabinovich Y, Wigderson A. Techniques for bounding the rate of convergence of genetic algorithms. Random Structures and Algorithms. 1999;14:111–138.

Rawlins GJE, ed. Foundations of Genetic Algorithms-1. San Mateo, CA: Morgan Kaufmann; 1991.

Rudolph G. Convergence analysis of canonical genetic algorithms. IEEE Trans. on Neural Networks, special issue on Evolutionary Computation. 1994;5(1):96–101.

Suzuki J. A further result on the Markov chain model of genetic algorithms and its application to a simulated annealing-like strategy. In: Belew RK, Vose MD, eds. Foundations of Genetic Algorithms-4. Morgan Kaufmann; 1997:53–72.

Trouvé A. Massive parallelization of simulated annealing: a mathematical study. In: Azencott R, ed. Simulated Annealing: Parallelization Techniques. New York: John Wiley and Sons; 1992a.

Trouvé A. Optimal convergence rate for generalized simulated annealing. C.R. Acad. Sci. Paris, Serie 1. 1992b;315:1197–1202.

Vose MD. The Simple Genetic Algorithm: Foundations and Theory. Complex Adaptive Systems. Cambridge, MA: MIT Press; 1999.

Whitley D, ed. Foundations of Genetic Algorithms-2. San Mateo, CA: Morgan Kaufmann; 1993.

Whitley D, ed. Foundations of Genetic Algorithms-3. San Francisco, CA: Morgan Kaufmann; 1995.
