Paul Albuquerque; Christian Mazza Dept. of Computer Science, University of Geneva, 24 rue Général–Dufour, CH-1211 Geneva 4, Switzerland
Laboratoire de Probabilités, Université Claude Bernard Lyon-I, 43 Bd du 11–Novembre-1918, 69622 Villeurbanne Cedex, France
We consider a two-operator mutation–selection algorithm designed to optimize a fitness function on the space of fixed length binary strings. Mutation acts as in classical genetic algorithms, while the fitness-based selection operates through a Gibbs measure (Boltzmann selection). The selective pressure is controlled by a temperature parameter. We provide a mathematical analysis of the convergence of the algorithm, based on the probabilistic theory of large deviations. In particular, we obtain convergence to optimum fitness by resorting to an annealing process, which makes the algorithm asymptotically equivalent to simulated annealing.
Genetic algorithms (GAs) are stochastic optimization algorithms designed to solve hard, typically NP-complete, problems (Goldberg, 1989), (Bäck, 1996), (Vose, 1999). Introduced by Holland (Holland, 1975), these algorithms mimic the genetic mechanisms of natural evolution. An initial random population of potential solutions is evolved by applying genetically inspired operators: mutation, crossover and selection. With time, “better” solutions emerge in the population. The quality of a solution is evaluated in terms of a fitness function. The original optimization problem now translates into finding a global optimum of this function. Note that in general, the convergence of a GA to an optimal solution is not guaranteed. Only a few rigorous mathematical results ensuring convergence of GAs are available.
For the past ten years, increasing efforts have been put into providing rigorous mathematical analyses of GAs (Rawlins, 1991), (Whitley, 1993), (Whitley, 1995), (Belew, 1997), (Banzhaf and Reeves, 1999). Towards this end, GAs have been modeled with Markov chains (Nix and Vose, 1992), (Rudolph, 1994). Application of Markov chain techniques has proved very successful in the study of simulated annealing (SA). This approach has produced an extensive mathematical literature describing the dynamics and investigating convergence properties of SA (Aarts and Laarhoven, 1987), (Aarts and Korst, 1988), (Hajek, 1988), (Catoni, 1992), (Deuschel and Mazza, 1994). It was therefore natural to try to carry over SA formalism to GAs. This was to our knowledge initiated by Goldberg (Goldberg, 1990), who borrowed the notions of thermal equilibrium and Boltzmann distribution from SA and adapted them to GA practice. A theoretical basis was later elaborated by Davis for simple GAs (Davis and Principe, 1991). His approach was further developed by Suzuki and led to a convergence result (Suzuki, 1997).
We believe the first mathematically well-founded convergence results for GAs were obtained by Cerf (Cerf, 1996a), (Cerf, 1996b), (Cerf, 1998), who constructed an asymptotic theory for the simple GA comparable in scope to that of SA. The asymptotic dynamics was investigated using the powerful tools developed by Freidlin and Wentzell (Freidlin and Wentzell, 1984) for the study of random perturbations of dynamical systems. Cerf’s pioneering work takes place in the wider context of generalized simulated annealing, which was defined by Trouvé (Trouvé, 1992a), (Trouvé, 1992b), extending results of Catoni for SA (Catoni, 1992). The dynamics for simulations in various contexts, like statistical mechanics, image processing, neural computing and optimization, can be described in this setting.
Complementary to the asymptotic approach, novel work has been achieved by Rabinovich and Wigderson in providing an original mathematical analysis of a crossover–selection algorithm (Rabinovich and Wigderson, 1999). Both analyses shed some light on the behavior of GAs. We also mention a paper by François, who proves convergence of an alternative mutation-selection algorithm (François, 1998), also within the framework of generalized simulated annealing.
In this contribution, we address the problem of optimizing a fitness function F : Ω → ℝ>0 on the space Ω of binary strings of length l (Ω is the l–dimensional hypercube). We apply to this problem a mutation-selection algorithm, which was introduced in a slightly different form by Davis (Davis and Principe, 1991) and extensively studied in greater generality by Cerf (Cerf, 1996a), (Cerf, 1996b) (the difference resides in the Boltzmann selection, to which we add a noise component). We will show that this mutation–selection algorithm is asymptotically equivalent to SA. This emphasizes the importance of the crossover operator for GAs. Our treatment also takes place within the framework defined by the theory of Freidlin–Wentzell for random perturbations of dynamical systems. The main mathematical object consists of irreducible Markov kernels with exponentially vanishing coefficients.
The paper is organized as follows. In section 2 we describe the mutation–selection algorithm and state some convergence results. The proofs of these results are sketched in section 3 where we perform a large deviation analysis of the algorithm. The algorithm is run in three different ways depending on how we let the temperature for the Boltzmann selection and the mutation probability go to zero. We finally draw some conclusions in section 4.
We now describe a two–operator mutation-selection algorithm on the search space Ωp of populations consisting of p individuals (Ω = {0,1}l).
Mutation acts as in classical GAs. Each bit of an individual in Ω independently flips with probability 0 < τ < 1. At the population level, all individuals mutate independently of each other. Mutation is fitness–independent and operates as a blind search over Ωp.
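As an illustration (our own sketch, not part of the paper's formal development), bitwise mutation of a population of binary strings can be implemented as follows:

```python
import random

def mutate(population, tau, rng=random):
    """Flip every bit of every individual independently with probability tau."""
    return ["".join(bit if rng.random() >= tau else "10"[int(bit)] for bit in ind)
            for ind in population]

# A population of p = 3 individuals in Omega = {0,1}^4.
pop = ["0000", "1111", "0101"]
mutated = mutate(pop, tau=0.1)
```

With τ = 0 the population is left unchanged, and with τ = 1 every bit is flipped, matching the two degenerate limits of the mutation kernel.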
We consider a modified version of the selection procedure of classical GAs. We begin by adding some noise g(ξ,.) to log(F(ξ)) for technical reasons. It helps lift the degeneracy over the global maxima set of F. The real-valued random variables g(ξ,.), indexed by ξ ∈ Ω, are defined on a sample space I (e.g. a subinterval of ℝ). They are independent identically distributed (i.i.d.) with mean zero and satisfy
|g(ξ,ω)| < (1/2) min{ |log(F(ξ_1)/F(ξ_2))| : F(ξ_1) ≠ F(ξ_2), ξ_1, ξ_2 ∈ Ω },  ∀ ω ∈ I.   (1)
Hence the random variables f(ξ,.) = log(F(ξ)) + g(ξ,.) have mean log(F(ξ)) and the function f(.,ω) has the same optima set as F by assumption (1), but a unique global maximum for almost every sample point ω ∈ I.
Fix a point ω in the sample space I. Given a population x = (x_1, …, x_p) ∈ Ωp of size p, individual x_i is selected under a Gibbs distribution (Boltzmann selection) with probability
exp(β f(x_i,ω)) / ∑_{j=1}^{p} exp(β f(x_j,ω)),   (2)
from population x. The parameter β ≥ 0 corresponds to an inverse temperature as in simulated annealing and controls the selective pressure. Note that if we remove the noise and set β = 1, the above selection procedure reduces to classical GA fitness-proportional selection.
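As a concrete sketch (our own illustration; f below stands for the perturbed log-fitness f(·,ω) with ω fixed), the selection probabilities of eq. (2) and a corresponding sampler can be written as:

```python
import math
import random

def gibbs_weights(population, f, beta):
    """Selection probabilities of eq. (2): P(x_i) proportional to exp(beta * f(x_i))."""
    w = [math.exp(beta * f(x)) for x in population]
    total = sum(w)
    return [wi / total for wi in w]

def boltzmann_select(population, f, beta, rng=random):
    """Draw one individual from the population under the Gibbs distribution."""
    return rng.choices(population, weights=gibbs_weights(population, f, beta))[0]
```

With β = 0 selection is uniform; as β grows the mass concentrates on the fittest individual; and with β = 1 and f = log F one recovers fitness-proportional selection.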
The algorithm is run only after the fitness function has been perturbed by the noise component. For any given sample point ω and τ, β fixed, we get an irreducible Markov chain on the search space Ωp by successively applying mutation and selection. Denote by μ^ω_{τ,β} its stationary probability distribution and by μ_{τ,β} the probability distribution obtained after averaging the μ^ω_{τ,β}'s over the sample space.
Before stating results, we introduce some notation and terminology. We define the set of uniform populations
Ω= = {(x_1,…,x_p) ∈ Ωp : x_1 = ··· = x_p}   (3)
and the set of populations consisting only of maximal fitness individuals
Fmax = {(x_1,…,x_p) ∈ Ωp : F(x_i) = max_{ξ∈Ω} F(ξ), ∀ i ∈ {1,…,p}}.   (4)
We also recall that the support of a probability distribution on Ωp consists of all populations having positive probability.
Each theorem stated below corresponds to a specific way of running the mutation-selection algorithm.
Theorem 1 Let β > 0 be fixed. Then, as τ → 0, the probability distribution μτ, β converges to a probability distribution μ0, β with support(μ0,β) = Ω=. Moreover, the limit probability distribution limβ→∞ μ0, β concentrates on Ω= ∩ Fmax.
The first assertion in theorem 1 was already obtained by Davis (Davis and Principe, 1991) and the second by Suzuki (Suzuki, 1997) by directly analyzing the transition probabilities of the Markov chain. However their algorithm did not include the added noise component. We give a different proof in the next section.
Theorem 1 implies that, for β, τ both fixed and τ ≈ 0, the mutation-selection algorithm concentrates on a neighborhood of Ω= in Ωp. Hence a run has a positive probability of ending on any population in Ω=. The hope remains that the stationary probability distribution has a peak on Ω= ∩ Fmax. This is actually the case, since the limit probability distribution lim_{β→∞} μ_{0,β} concentrates on Ω= ∩ Fmax. The latter statement can be obtained as a consequence of theorem 3.
Notice that GAs are usually run with τ ≈ 0. We believe that the crossover operator improves the convergence speed, but probably not the shape of the stationary probability distribution.
Theorem 2 Let 0 < τ < 1 be fixed. Then, as β → ∞, the probability distribution μ_{τ,β} converges to a probability distribution μ_{τ,∞} with support(μ_{τ,∞}) = Ω=. Moreover, the limit probability distribution lim_{τ→0} μ_{τ,∞} concentrates on Ω=.
Theorem 2 shows that, in terms of probability distribution support, increasing the selective pressure is equivalent to diminishing the mutation probability. However, limτ→0 μτ,∞ remains concentrated on Ω=.
Consequently, it is a natural idea to link the mutation probability τ to the inverse temperature β. The algorithm becomes a simulated annealing process: the intensity of mutation is decreased, while selection becomes stronger. This actually ensures convergence of the algorithm to an optimal solution.
Theorem 3 Let τ = τ(ε, κ, β) = ε exp(−κβ) with 0 < ε < 1 and κ > 0. Then, for κ large enough, the probability distribution μ_{τ,β} converges, as β → ∞, to the uniform probability distribution over Ω= ∩ Fmax. Asymptotically, the algorithm behaves like simulated annealing on Ω= with energy function −p log F.
Notice that the initial mutation probability does not influence the convergence.
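To make theorem 3 concrete, here is a minimal annealed run (our own sketch; the logarithmic schedule β_n = β_0 log(n+1), the constants and the fitness function are illustrative choices, not prescriptions from the paper):

```python
import math
import random

def annealed_run(F, l, p, eps=0.5, kappa=1.0, beta0=1.0, generations=500, seed=0):
    """Mutation-selection with tau = eps * exp(-kappa * beta) and beta ~ log n."""
    rng = random.Random(seed)
    pop = ["".join(rng.choice("01") for _ in range(l)) for _ in range(p)]
    for n in range(1, generations + 1):
        beta = beta0 * math.log(n + 1)
        tau = eps * math.exp(-kappa * beta)
        # mutation: flip each bit independently with probability tau
        pop = ["".join(b if rng.random() >= tau else "10"[int(b)] for b in x)
               for x in pop]
        # Boltzmann selection: weights exp(beta * log F) = F ** beta
        pop = rng.choices(pop, weights=[F(x) ** beta for x in pop], k=p)
    return pop

# Hypothetical fitness: 1 + number of ones (maximal at the all-ones string).
final = annealed_run(lambda s: 1.0 + s.count("1"), l=6, p=8)
```

As β grows, mutation switches off while selection hardens, so the population tends to collapse onto a uniform population of high fitness.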
The first assertion in theorem 3 was obtained by Cerf (Cerf, 1996a), (Cerf, 1996b), in a much more general setting, but again with a mutation-selection algorithm not including the added noise component. However, we hope that our proof, presented below in the simple case of binary strings, is more intuitive and easier to grasp. Perhaps it will illustrate the importance of Cerf’s work and the richness of the Freidlin–Wentzell theory.
In analogy with the original treatment of simulated annealing, we prefer to work with the energy function U_ω = −f(·,ω). The optimization problem now amounts to finding the global minima set of U_ω, which can be thought of as the set of fundamental states of the energy function U_ω. For almost every ω, U_ω has a unique fundamental state.
Denote by ρ(.,.) the Hamming distance on Ω and set
d(x,y) = ∑_{i=1}^{p} ρ(x_i, y_i),
with x = (x1,…, xp) and y = (y1,…, yp) populations in Ωp; d(.,.) is a metric on Ωp.
Let M_τ be the transition matrix for the mutation process on Ωp. The probability that a population x ∈ Ωp is transformed into y ∈ Ωp by mutation is
M_τ(x,y) = τ^{d(x,y)} (1−τ)^{lp−d(x,y)}.   (5)
We define the partial order relation ≺ on Ωp by
x ≺ y ⇔ x_i ∈ {y_1,…,y_p}, ∀ i ∈ {1,…,p}.
In words, x ≺ y if and only if all individuals in population x belong to population y.
Let S^ω_β be the transition matrix for the selection process on Ωp. The probability that a population x ∈ Ωp is transformed into y ∈ Ωp by selection (see (2)) is given by
S^ω_β(x,y) = exp(−β ∑_{i=1}^{p} U_ω(y_i)) / (∑_{i=1}^{p} exp(−β U_ω(x_i)))^p  if x ≻ y,  and S^ω_β(x,y) = 0 if x ⊁ y.   (6)
The transition matrix of the Markov chain corresponding to our mutation-selection algorithm is S^ω_β ∘ M_τ. From eqs. (5) and (6), we compute the transition probabilities
S^ω_β ∘ M_τ(x,y) = ∑_{z≻y} M_τ(x,z) S^ω_β(z,y) = ∑_{z≻y} τ^{d(x,z)} (1−τ)^{lp−d(x,z)} exp(−β ∑_{i=1}^{p} U_ω(y_i)) / (∑_{i=1}^{p} exp(−β U_ω(z_i)))^p.   (7)
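For very small l and p, the kernel of eq. (7) can be tabulated by brute force and checked to be stochastic. A sketch (our own; unlike the simplified expression in eq. (6), the selection step below includes the multiplicity of repeated individuals, i.e. the exact sampling probabilities):

```python
import itertools
from math import exp

def build_kernel(l, p, tau, beta, f):
    """Explicit transition matrix of one mutation-then-selection step on Omega^p.
    Brute force, only feasible for tiny l and p; f(xi) plays the role of the
    perturbed log-fitness f(xi, omega) with omega fixed."""
    omega = ["".join(bits) for bits in itertools.product("01", repeat=l)]
    pops = list(itertools.product(omega, repeat=p))

    def dist(x, y):  # the metric d: summed Hamming distances
        return sum(u != v for xi, yi in zip(x, y) for u, v in zip(xi, yi))

    def M(x, y):     # mutation kernel, eq. (5)
        d = dist(x, y)
        return tau ** d * (1 - tau) ** (l * p - d)

    def S(z, y):     # Boltzmann selection: each y_i drawn independently from z
        total = sum(exp(beta * f(zi)) for zi in z)
        prob = 1.0
        for yi in y:
            # multiplicity of the value y_i in z times its Gibbs weight
            prob *= sum(exp(beta * f(zi)) for zi in z if zi == yi) / total
        return prob

    return [[sum(M(x, z) * S(z, y) for z in pops) for y in pops] for x in pops]
```

Each row of the resulting matrix sums to one, since mutation and selection are both stochastic kernels.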
The mutation and selection processes are simple to treat on their own. However, their combined effect proves to be more complicated. A way of dealing with this increase in complexity is to consider these processes as asymptotically vanishing perturbations of a simple random process. We study three cases. In the first, mutation acts as the perturbation, while selection plays this role in the second. In the third case, the combination of mutation and selection acts as a perturbation of a simple selection scheme, namely equiprobable selection among the best individuals in the current population.
We will now compute three different communication cost functions corresponding to various ways of running the mutation-selection algorithm. The communication cost reflects the asymptotic difficulty for passing from one population to another under the considered random process.
Write τ = τ(α) = e^{−α} with α > 0. Asymptotically, for β fixed and α → ∞, eq. (7) yields
lim_{α→∞} −(1/α) log(S^ω_β ∘ M_{τ(α)}(x,y)) = min_{z≻y} d(x,z).
Henceforth, we will use the asymptotic notation
S^ω_β ∘ M_{τ(α)}(x,y) ≍ exp(−α min_{z≻y} d(x,z)).
The communication cost for asymptotically vanishing mutation is given by
V_M(x→y) = min_{z≻y} d(x,z).   (8)
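The cost V_M of eq. (8) can be evaluated by brute force for tiny instances (our own illustration): z ranges over the populations whose individuals include every individual of y, i.e. z ≻ y.

```python
import itertools

def hamming_pop(x, z):
    """The metric d on populations: summed Hamming distances."""
    return sum(u != v for xi, zi in zip(x, z) for u, v in zip(xi, zi))

def V_M(x, y, l):
    """V_M(x -> y): minimal d(x, z) over z with z > y in the partial order,
    i.e. every individual of y occurs in z; brute force over all z."""
    omega = ["".join(b) for b in itertools.product("01", repeat=l)]
    return min(hamming_pop(x, z)
               for z in itertools.product(omega, repeat=len(x))
               if set(y) <= set(z))
```

For uniform populations this reproduces V_M((ξ) → (η)) = ρ(ξ, η), the Hamming distance used in the proof of theorem 1.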
Define the total energy function u_ω : Ωp → ℝ by
u_ω(y) = ∑_{i=1}^{p} U_ω(y_i),
and notice that min_{u≺z} u_ω(u) = p min_{1≤i≤p} U_ω(z_i). Asymptotically, for τ fixed and β → ∞, eq. (7) yields
S^ω_β ∘ M_τ(x,y) ≍ exp(−β min_{z≻y} (u_ω(y) − min_{v≺z} u_ω(v))).
The associated communication cost is given by
V_{S,ω}(x→y) = min_{z≻y} (u_ω(y) − min_{v≺z} u_ω(v)).   (9)
Next, let τ = τ(ε, κ, β) = ε exp(−κβ) with 0 < ε < 1 and κ > 0. Asymptotically, for ε, κ fixed and β → ∞, eq. (7) yields
S^ω_β ∘ M_{τ(ε,κ,β)}(x,y) ≍ exp(−β min_{z≻y} (κ d(x,z) + u_ω(y) − min_{v≺z} u_ω(v))).
The associated communication cost is given by
V^κ_{MS,ω}(x→y) = min_{z≻y} (κ d(x,z) + u_ω(y) − min_{v≺z} u_ω(v)).   (10)
The lemmata we state below are crucial for the proofs of theorems 1, 2 and 3. We comment on these lemmata in the appendix.
We consider a family of Markov chains on a finite set S, indexed by a real parameter α > 0, with irreducible transition matrix {q_α(x,y)}_{x,y∈S} satisfying
q_α(x,y) ≍ exp(−α V(x→y)),  x, y ∈ S,
where 0 ≤ V(x → y) ≤ ∞ is the communication cost function.
Let μ_α be the stationary probability distribution on S associated to {q_α(x,y)}_{x,y∈S} and define μ_∞ = lim_{α→∞} μ_α, the limit probability distribution on S.
A function ψ : S → ℝ is a potential for the communication cost function V(x→y) if V(x→y) − V(y→x) = ψ(y) − ψ(x) for all x, y ∈ S such that V(x→y) < ∞.
The following two results are derived from the so-called Freidlin–Wentzell theory (Freidlin and Wentzell, 1984).
Lemma 4 Assume there is a potential ψ: S → R for the communication cost function V(x → y). Then
support(μ_∞) = {x ∈ S : ψ(x) = min_{y∈S} ψ(y)}.
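Lemma 4 can be illustrated numerically (our own toy example, with a hypothetical ψ): pick any function ψ on a small S, set V(x→y) = max(ψ(y) − ψ(x), 0), which admits ψ as a potential, build a row-normalized kernel q_α, and watch μ_α concentrate on the minimizers of ψ as α grows.

```python
import math

def stationary(psi, alpha, iters=5000):
    """Stationary distribution of q_alpha(x, y) proportional to
    exp(-alpha * V(x -> y)) with V(x -> y) = max(psi[y] - psi[x], 0)."""
    n = len(psi)
    q = [[math.exp(-alpha * max(psi[y] - psi[x], 0.0)) for y in range(n)]
         for x in range(n)]
    q = [[v / sum(row) for v in row] for row in q]   # normalize rows
    mu = [1.0 / n] * n
    for _ in range(iters):                           # power iteration
        mu = [sum(mu[x] * q[x][y] for x in range(n)) for y in range(n)]
    return mu
```

With ψ = (0, 1, 0.5), the limit distribution puts essentially all its mass on state 0, the unique minimizer of ψ.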
Lemma 5 Let S– be a subset of S with the properties:
1. for any x ∈ S+ = S \ S–, there exists y ∈ S– such that V(x → y) = 0.
2. for any y ∈ S–, we have V(y → x) > 0 for all x ∈ S+.
Then the dynamics can be restricted from S onto S–.
The sets S+ and S– can be thought of as repulsive and attractive sets for the dynamics, respectively. The dynamics can be restricted from S onto S–, because in the asymptotic regime the cost for entering S– is zero, while the cost for leaving S– is positive.
In this subsection, we sketch the proof of theorem 1.
Hereafter, we will write (ξ) in place of (ξ,…,ξ) ∈ Ω=.
If x ∈ Ωp \ Ω=, there exists (ξ) ∈ Ω= such that V_M(x → (ξ)) = 0. This is readily seen by taking ξ = x_i, for any i ∈ {1,…,p}, and computing V_M(x → (ξ)) from (8). However, for (ξ) ∈ Ω=, the communication cost V_M((ξ) → x) > 0 for all x ∈ Ωp \ Ω=.
Applying lemma 5 to S = Ωp and S– = Ω= for the communication cost function VM(x→ y), we can restrict the dynamics from Ωp onto Ω=.
It is easy to check that V_M((η) → (ξ)) = V_M((ξ) → (η)) = ρ(η, ξ). Thus, on Ω= the cost function is symmetric and is given by the Hamming distance. Therefore,
0=VM((η)→(ξ))−VM((ξ)→(η)),
so the function ψ ≡ 0 is a potential for V_M((η) → (ξ)) on Ω=. Using lemma 4, we conclude that support(μ^ω_{0,β}) = Ω= for all ω. Finally, theorem 1 follows by averaging out the probability distributions μ^ω_{0,β} over ω.
We now take care of theorem 2.
Define the set
Ω_{U_ω} = {(x_1,…,x_p) ∈ Ωp : U_ω(x_1) = ··· = U_ω(x_p)}   (11)
of equi-fitness populations.
Since V_{S,ω}(x → y) = min_{z≻y} (u_ω(y) − min_{v≺z} u_ω(v)) (see (9)) only depends on y, the function ψ(y) = V_{S,ω}(x → y) is itself a potential on Ωp.
Let y ∈ Ω_{U_ω}. Then, taking z = y, we have
0 ≤ ψ(y) ≤ u_ω(y) − min_{v≺y} u_ω(v) = 0.   (12)
Thus ψ(y) = 0 for all y ∈ Ω_{U_ω}.
If y ∈ Ωp \ Ω_{U_ω}, there exist i ≠ j ∈ {1,…,p} such that U_ω(y_i) ≠ U_ω(y_j) by definition of Ω_{U_ω} in (11). Hence, for any z ≻ y,
u_ω(y) − min_{v≺z} u_ω(v) ≥ u_ω(y) − min_{v≺y} u_ω(v) > 0,   (13)
because {v ≺ y} ⊂ {v ≺ z} by transitivity of ≺. Therefore ψ(y) > 0 for all y ∈ Ωp \ Ω_{U_ω}. Lemma 4 implies that support(μ^ω_{τ,∞}) = Ω_{U_ω}. Notice that the probability that Ω_{U_ω} \ Ω= ≠ ∅ is zero, because the noise component (see (1)) removes the degeneracy from the fitness function and hence, for almost every ω, level sets of U_ω contain at most one element. We get the statement of theorem 2 by averaging out the probability distributions μ^ω_{τ,∞} over ω.
We go on to sketch the proof of theorem 3.
We begin by defining
Δ = max_{ξ∈Ω} U_ω(ξ) − min_{ξ∈Ω} U_ω(ξ)   (14)
the energy barrier. Until the end of this subsection, we assume that the exponential decrease rate κ of the mutation probability is greater than pΔ.
Let x ≻ y. Taking z = x, we get
κ d(x,z) + u_ω(y) − min_{v≺z} u_ω(v) = u_ω(y) − min_{v≺x} u_ω(v).   (15)
For z ≠ x, d(x, z)≥1 and therefore,
κ d(x,z) + u_ω(y) − min_{v≺z} u_ω(v) > u_ω(y) − min_{v≺x} u_ω(v).   (16)
Consequently, eqs. (15) and (16) above imply that, in (10), the minimum over all z ≻ y is realized by z = x. Comparing with (9), we get
V^κ_{MS,ω}(x→y) = V_{S,ω}(x→y)   (17)
for y ≺ x.
Let x ∈ Ωp \ Ω_{U_ω}. There exists y ∈ Ω_{U_ω}, y ≺ x, such that V^κ_{MS,ω}(x→y) = 0. Just recall that V_{S,ω}(x→y) = 0 for any y ∈ Ω_{U_ω} (see eq. (12)).
However, for x ∈ Ω_{U_ω}, we have V^κ_{MS,ω}(x→y) > 0 for all y ∈ Ωp \ Ω_{U_ω}. This follows from eqs. (13) and (17).
Applying lemma 5 to S = Ωp and S– = Ω_{U_ω} for the communication cost function V^κ_{MS,ω}(x→y), we can restrict the dynamics from Ωp onto Ω_{U_ω}. Since the probability that Ω_{U_ω} \ Ω= ≠ ∅ is zero, we will assume that Ω= = Ω_{U_ω}.
Let (ξ), (η) ∈ Ω= with ξ ≠ η. Naturally (ξ) ⊀ (η) and (η) ⊀ (ξ). Now let ẑ = (ξ,…,ξ,η,ξ,…,ξ). Then, if z ≻ (η) is not of the form ẑ,
κ d((ξ),z) ≥ κ (ρ(ξ,η) + 1) > κ d((ξ),ẑ) + u_ω((η)) − min_{v≺ẑ} u_ω(v),
where we used the assumption κ > p∆ and eq. (14). Hence,
V^κ_{MS,ω}((ξ)→(η)) = κ d((ξ),ẑ) + u_ω((η)) − min_{v≺ẑ} u_ω(v) = κ ρ(ξ,η) + p U_ω(η) − p min_{1≤i≤p} U_ω(ẑ_i) = κ ρ(ξ,η) + p U_ω(η) − p min(U_ω(ξ), U_ω(η)),
and thus
V^κ_{MS,ω}((ξ)→(η)) = κ ρ(ξ,η) + p (U_ω(η) − U_ω(ξ))^+, where (·)^+ denotes the positive part.
Therefore,
u_ω((ξ)) − u_ω((η)) = p (U_ω(ξ) − U_ω(η)) = V^κ_{MS,ω}((η)→(ξ)) − V^κ_{MS,ω}((ξ)→(η)),
implying that the energy function p U_ω is a potential on Ω=.
For almost every ω, Ω= = Ω_{U_ω} and thus U_ω has a unique minimum. As a consequence of lemma 4, the probability distribution μ^ω_{τ(ε,κ,β),β} converges for almost every ω, as β → ∞, to a point mass on the unique minimum of U_ω. The latter is a maximum of the fitness function. Finally, since the random variables g(ξ,.) are i.i.d., the minimum of U_ω can equiprobably be any maximum of the fitness function. Therefore, averaging over ω yields the uniform probability distribution over the set of global maxima of the fitness function (seen as a subset of Ω=).
The mutation-selection algorithm considered in this work helps us improve our understanding of GAs. It differs slightly from the algorithm considered by Davis (Davis and Principe, 1991) and Cerf (Cerf, 1996a), (Cerf, 1996b), because we added a noise component in the Boltzmann selection. This difference is however only technical. We studied this two-operator algorithm by resorting to the theory of Freidlin–Wentzell and by making use of the approach developed in (Deuschel and Mazza, 1994).
The first result (theorem 1) explains the behavior of the algorithm for small mutation probability τ ≈ 0. Under this condition, the algorithm is forced to narrow its search to a neighborhood in Ωp of the set of uniform populations Ω= (defined in (3)). The latter set is a copy of Ω and corresponds to the diagonal of the search space Ωp. GAs are usually run with τ ≈ 0. Setting β = 1 and removing the noise component, our selection procedure (see (2)) reduces to classical GA fitness-proportional selection. Hence, we can infer how the combined effect of a low mutation probability and a comparatively strong selection, as in classical GAs, removes diversity from the populations in the long run. The second result (theorem 2) shows that the effect of increasing the selective pressure is similar, in terms of restricting the search, to lowering the mutation probability.
Since gradually decreasing τ to zero and then letting β tend to infinity constrains the search to the set of maximal fitness solutions in Ω=, it was natural to link the mutation probability τ and the selective pressure β. Theorem 3 shows that, by exponentially decreasing τ with β, the algorithm behaves like simulated annealing on Ω= with energy function pU. This procedure ensures convergence to an optimal solution in the limit β → ∞. Therefore, if we fix β large enough and choose τ accordingly (see theorem 3), the algorithm will end up, in the long run, in a neighborhood of the set of maximal fitness solutions in Ω=. Of course, the convergence performances of the algorithm will be no better than those of simulated annealing. In practical situations, the annealed version of the mutation-selection algorithm should be implemented using a cooling schedule of the form β ~ log n, where n is the generation number, as is the case for simulated annealing. Of course, parameters like the initial temperature and initial mutation probability, as well as the exponential rate κ for τ, must be set empirically. Prior knowledge about the energy barrier Δ (see (14)) would yield the estimate κ ≈ pΔ, where p is the population size.
The most interesting question arises in the comparison of the mutation–selection algorithm with classical GAs. Crossover is the distinctive feature between this algorithm and GAs. A generation in a GA is the succession of mutation, crossover and selection. From the considerations of the previous paragraph, we understand more clearly the role of crossover. The latter operator is expected to speed up the search, provided it is not fitness–decreasing on average.
More precisely, the search proceeds in successive stages. First it progresses towards the set Ω=. On Ω=, it converges towards the optimal set Ω= ∩ Fmax (see (4)). For GAs, crossover is expected to speed up the preliminary convergence stage towards Ω=, if on average the mean fitness of the offspring is not smaller than that of the parents. However, populations in Ω= are invariant under its action. Therefore, on Ω=, the GA performances are similar to those of simulated annealing.
Finally, we did not study the inhomogeneous Markov chain. It would be interesting to investigate finite time convergence and convergence rates.
In this appendix, we introduce some notions from the theory of Freidlin–Wentzell (Freidlin and Wentzell, 1984).
As in subsection 3.1, consider a family of Markov chains on a finite set S, indexed by a real parameter α > 0, with irreducible transition matrix {q_α(x,y)}_{x,y∈S} satisfying
q_α(x,y) ≍ exp(−α V(x→y)),  x, y ∈ S,
where 0 ≤ V(x→y) ≤ ∞ is the communication cost function. Let μ_α denote the stationary probability distribution on S associated to {q_α(x,y)}_{x,y∈S} and μ_∞ = lim_{α→∞} μ_α.
Let us consider the elements of S as the vertices of the complete oriented graph G on |S| vertices. A spanning tree of G pointing at x ∈ S is a subtree T of G such that
1. every vertex y ∈ S, y ≠ x, has exactly one outgoing edge in T;
2. for every y ∈ S, there is a directed path in T from y to x.
Denote by G(x) the set of all spanning trees of G pointing at x ∈ S and define
V(γ) = ∑_{(y→z)∈γ} V(y→z),  for γ ∈ G(x),
V(x) = min_{γ∈G(x)} V(γ),
V_min = min_{x∈S} V(x).
The stationary probability distribution μα can be estimated in terms of the spanning trees defined above. This is formulated in the result of Freidlin and Wentzell (Freidlin and Wentzell, 1984, lemma 3.1, p.177) stated below.
Lemma 6 We have the following representation formula for μα:
μ_α(x) = Q_α(x) / ∑_{x′∈S} Q_α(x′),  ∀ x ∈ S,
where Q_α(x) = ∑_{γ∈G(x)} ∏_{(y→z)∈γ} q_α(y,z).
Moreover,
μ_α(x) ≍ exp(−α (V(x) − V_min)).   (18)
As a corollary to this lemma, we get
support(μ_∞) = {x ∈ S : V(x) = V_min}
from (18).
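The representation formula of lemma 6 (the Markov chain tree theorem) can be verified directly on a small chain (our own sketch): enumerate the spanning trees pointing at each state, form the Q_α's, and check stationarity.

```python
import itertools

def tree_weight_sum(x, q):
    """Sum, over all spanning trees of the complete graph on range(len(q))
    pointing at x, of the product of edge probabilities q[y][t(y)] (lemma 6)."""
    n = len(q)
    others = [v for v in range(n) if v != x]
    total = 0.0
    # assign each vertex y != x a single out-neighbour t(y)
    for targets in itertools.product(range(n), repeat=len(others)):
        t = dict(zip(others, targets))
        if any(t[y] == y for y in others):
            continue                       # no self-loops in a tree
        ok = True
        for y in others:                   # every y must reach x without cycling
            seen, v = set(), y
            while v != x:
                if v in seen:
                    ok = False
                    break
                seen.add(v)
                v = t[v]
            if not ok:
                break
        if ok:
            prod = 1.0
            for y in others:
                prod *= q[y][t[y]]
            total += prod
    return total

def tree_stationary(q):
    """Stationary distribution via the representation formula of lemma 6."""
    qs = [tree_weight_sum(x, q) for x in range(len(q))]
    s = sum(qs)
    return [v / s for v in qs]
```

For any irreducible stochastic matrix q, the resulting vector satisfies μ q = μ, in agreement with the lemma.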
The next lemma is lemma 4 of subsection 3.1 and can be found in (Deuschel and Mazza, 1994). We reproduce the proof here for the sake of clarity.
Lemma 7 Assume there is a potential ψ : S → ℝ for the communication cost function V(x→y). Then
support(μ_∞) = {x ∈ S : ψ(x) = min_{y∈S} ψ(y)}.
Proof: Let x, y ∈ S. For an arbitrary spanning tree γ ∈ G(x), let g_{yx} be the unique geodesic of γ starting at y and ending at x. Define a mapping R : G(x) → G(y) by R(γ) = (γ \ g_{yx}) ∪ g_{xy}, where g_{xy} is just the reversed geodesic from x to y. This mapping R is well-defined and onto. Moreover,
V(R(γ)) = V(γ) − V(g_{yx}) + V(g_{xy}) = V(γ) + ∑_{(e−→e+)∈g_{xy}} (V(e−→e+) − V(e+→e−)) = V(γ) + ∑_{(e−→e+)∈g_{xy}} (ψ(e+) − ψ(e−)),
which implies V(R(γ)) = V(γ) + ψ(y) − ψ(x) for any γ ∈ G(x). It follows that
V(y) = min_{γ̄∈G(y)} V(γ̄) = min_{γ∈G(x)} V(R(γ)) = min_{γ∈G(x)} (V(γ) + ψ(y) − ψ(x)) = V(x) + ψ(y) − ψ(x).   (20)
Thus V(y) − V(x) = ψ(y) − ψ(x) for any x, y ∈ S, so there is a constant C such that V(x) = ψ(x) + C for any x ∈ S.
Notice from (20) that, if the cost function V(x → y) admits a potential, then the function V(x) is also a potential for V(x → y). Two potentials differ by a constant.
The lemma below is lemma 5 of subsection 3.1.
Lemma 8 Let S– be a subset of S with the properties:
1. for any x ∈ S+ = S \ S–, there exists y ∈ S– such that V(x → y) = 0.
2. for any y ∈ S–, we have V(y → x) > 0 for all x ∈ S+.
Then the dynamics can be restricted from S onto S−.
Proof: It follows from (Freidlin and Wentzell, 1984, lemmata 4.1–3, pp.185–189) that
1. ∀x ∈ S–, V(x) is computable from graphs over S–,
2. ∀ y ∈ S+, V(y) = min_{x∈S–} (V(x) + V̄(x→y)),
where the path communication cost V̄(x→y) is defined as
V̄(x→y) = min_{2≤k≤|Ω|^p} min_{z_2,…,z_{k−1}} ∑_{i=2}^{k} V(z_{i−1}→z_i),
with x = z_1 and y = z_k.
Since by assumption V(x→y) > 0 for any x ∈ S– and y ∈ S+, assertions 1 and 2 above justify the restriction of the dynamics to S–.
This work is supported by the Swiss National Science Foundation and the Région Rhône–Alpes.