Chapter 5
Monte Carlo methods

Monte Carlo methods based on Markov chains are often called MCMC methods, the acronym “MCMC” standing for “Markov Chain Monte Carlo.” These are often the only effective methods for the approximate computation of highly combinatorial quantities of interest and may be introduced even in situations in which the basic model is deterministic.

The corresponding research field is at the crossroads of disciplines such as statistics, stochastic processes, and computer science, as well as of various applied sciences that use it as a computation tool. It is the subject of a vast modern literature.

We are going to explain the main foundations of these methods and illustrate them on some classic examples, in the hope of helping readers appreciate their adaptability and efficiency. This chapter thus provides a better understanding of the practical importance of Markov chains, as well as of some of the issues they raise.

5.1 Approximate solution of the Dirichlet problem

5.1.1 General principles

The central idea is to use Theorem 2.2.2, not any longer for computing

equation

by solving the equation as we have done, but on the contrary to use this probabilistic representation in order to approximate the solution of the Dirichlet problem. If

equation

then the strong law of large numbers yields a Monte Carlo method for this.

The corresponding MCMC method consists in simulating c05-math-0003 independent chains of matrix c05-math-0004 starting at c05-math-0005, denoted by c05-math-0006 for c05-math-0007, each until its first exit time c05-math-0008 from c05-math-0009, and in using the fact that

In order to obtain confidence intervals for this method, one can use for instance

  • the central limit theorem, which requires information on the variance
    equation
  • exponential Markov inequalities, or a large deviation principle, which requires information on the exponential moments of c05-math-0012.

The central limit theorem or a large deviation principle actually only yield asymptotic confidence intervals.
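
As an illustration of this generic scheme, here is a minimal Python sketch under assumptions of our own: the chain is a symmetric nearest-neighbor walk on {0, ..., L} stopped on exit from the interior (a gambler's-ruin setting), and the boundary condition is f(0) = 0, f(L) = 1, so that the exact solution is x/L. The parameter names and numerical choices are illustrative, not taken from the text; the sketch simulates independent chains until exit and reports the empirical mean with an asymptotic confidence interval from the central limit theorem.

```python
import math
import random

def simulate_exit(x0, L):
    """Symmetric nearest-neighbor walk on {0, ..., L} started at x0,
    run until it exits the interior {1, ..., L-1}; returns (exit state, steps)."""
    x, steps = x0, 0
    while 0 < x < L:
        x += random.choice((-1, 1))
        steps += 1
    return x, steps

def mc_dirichlet(x0, L, f, n_sim=10_000, z=1.96):
    """Monte Carlo estimate of u(x0) = E_{x0}[f(X_tau)] together with an
    asymptotic 95% confidence half-width from the central limit theorem."""
    values, total_steps = [], 0
    for _ in range(n_sim):
        exit_state, steps = simulate_exit(x0, L)
        values.append(f(exit_state))
        total_steps += steps
    mean = sum(values) / n_sim
    var = sum((v - mean) ** 2 for v in values) / (n_sim - 1)
    half_width = z * math.sqrt(var / n_sim)
    return mean, half_width, total_steps

if __name__ == "__main__":
    # Boundary condition f(0) = 0, f(L) = 1: the exact solution is u(x) = x / L.
    est, hw, steps = mc_dirichlet(x0=3, L=10, f=lambda x: float(x == 10))
    print(f"estimate = {est:.4f} +/- {hw:.4f}  (exact 0.3), total steps = {steps}")
```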

Naturally, this requires that c05-math-0013, and even that c05-math-0014, since another application of the strong law of large numbers yields that the total number of time steps for c05-math-0015 simulations is

equation

Hence, obtaining exact or approximate solutions of the equations in Theorems 2.2.5 and 2.2.6 would make it possible to handle some essential problems, such as

  • the estimation of the mean number of steps
    equation

    required for each simulation,

  • the establishment of confidence intervals on the duration of the simulation of one of the Markov chains and on the total duration
    equation

    of the c05-math-0019 simulations: this can be obtained using c05-math-0020 and c05-math-0021 and the central limit theorem, or using c05-math-0022 and exponential Markov inequalities if the convergence radius is greater than c05-math-0023,

  • the development of criteria for abandoning simulations that are too long, and for compensating the bias induced by these censored data.

5.1.2 Heat equation in equilibrium

5.1.2.1 Dirichlet problem

The Dirichlet problem for the stationary heat equation on a domain c05-math-0024 of c05-math-0025 with boundary condition c05-math-0026 is given, denoting the Laplacian by c05-math-0027, by

equation

It corresponds to the physical problem of describing the temperature distribution in equilibrium of a homogeneous solid occupying c05-math-0029, which is heated at its boundary c05-math-0030 according to a temperature distribution given by c05-math-0031. Physically, c05-math-0032 w.r.t. the absolute zero, and the solution is the minimal solution.

5.1.2.2 Discretization

Let us discretize space by a regular grid with mesh of size c05-math-0033. A natural approximation of c05-math-0034 is given by

equation

Considering the transition matrix c05-math-0036 of the symmetric nearest-neighbor random walk c05-math-0037 on the grid, an approximation c05-math-0038 of c05-math-0039 on the grid, the set c05-math-0040 of gridpoints outside c05-math-0041, which are at a distance c05-math-0042, and an appropriate approximation c05-math-0043 of c05-math-0044 on c05-math-0045, yields the discretized problem

equation

The quality of the approximation of the initial problem by the discretized problem can be treated by analytic methods or by probabilistic methods using the approximation of Brownian motion by random walks. Note that if c05-math-0047 is bounded with “characteristic length” c05-math-0048 then the number of points in c05-math-0049 is of order c05-math-0050.

5.1.2.3 Monte Carlo approximation

Let c05-math-0051 be the exit time of c05-math-0052 from c05-math-0053. Theorem 2.2.2 yields that if c05-math-0054 is nonnegative then the least nonnegative solution of the discretized problem is given by the probabilistic representation

equation

and that if c05-math-0056 is bounded and moreover c05-math-0057 for every c05-math-0058 then c05-math-0059 is the unique bounded solution. For c05-math-0060, a Monte Carlo approximation c05-math-0061 of c05-math-0062 is obtained as in (5.1.1).

Note that c05-math-0063 as soon as c05-math-0064 is bounded in at least one principal direction: the effective displacement along this axis will form a symmetric nearest-neighbor random walk on c05-math-0065, which always eventually reaches a unilateral boundary, as we have seen.

When c05-math-0066 is bounded, the result (2.3.10) on the exit time from a box provides a bound on the expectation of the duration of a simulation. In particular, if c05-math-0067 has “characteristic length” c05-math-0068 then the mean number of steps for obtaining the Monte Carlo approximation is of order

equation
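
As a concrete sketch of this Monte Carlo approximation, the following Python code assumes, as an illustration only, that the domain is the unit box discretized into the grid {0, ..., n}^d with mesh h = 1/n, and that the boundary temperature is 1 on one face and 0 elsewhere; it averages the boundary temperature at the exit points of independent symmetric nearest-neighbor walks.

```python
import random

def mc_heat_equilibrium(start, n, f, d=2, n_sim=5_000):
    """Estimate the discretized stationary-heat-equation solution u at a grid
    point of the box {0, ..., n}^d (mesh h = 1/n): run symmetric nearest-neighbor
    walks from `start` until some coordinate hits 0 or n, then average the
    boundary temperature f at the exit point.

    The box domain and the boundary temperature f are illustrative assumptions."""
    total = 0.0
    for _ in range(n_sim):
        y = list(start)
        while all(0 < yi < n for yi in y):      # still strictly inside the box
            i = random.randrange(d)             # choose a coordinate uniformly
            y[i] += random.choice((-1, 1))      # symmetric step of one mesh
        total += f(tuple(y))
    return total / n_sim

if __name__ == "__main__":
    # Hypothetical boundary condition: temperature 1 on the face x_1 = 1, 0 elsewhere.
    n = 10
    f = lambda y: 1.0 if y[0] == n else 0.0
    print(mc_heat_equilibrium(start=(5, 5), n=n, f=f, d=2))
```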

5.1.3 Heat equation out of equilibrium

5.1.3.1 Dirichlet problem

There is an initial heat distribution c05-math-0070 on c05-math-0071, and for c05-math-0072, a heat distribution c05-math-0073 is applied to c05-math-0074. The Monte Carlo method is well adapted to this nonequilibrium situation, in which time will be considered as a supplementary spatial dimension, and the initial condition as a boundary condition.

With an adequate choice of the units of time and space in terms of the thermal conductance of the medium,

equation

5.1.3.2 Discretization

Discretizing by a spatial mesh of size c05-math-0076 and a temporal mesh of size c05-math-0077 yields

equation

which can be written as c05-math-0079 for the transition matrix c05-math-0080 given by

equation

Hence, c05-math-0082 and c05-math-0083 should be of the same order for this diffusive phenomenon. By adequately approximating c05-math-0084 by c05-math-0085 on c05-math-0086, the discretized problem can be written

equation

This constitutes a discretization scheme, implicit in time, for the heat equation, which is unconditionally stable: there is no need for a CFL condition. This scheme has many advantages over an explicit scheme.

The quality of the approximation of the initial problem by the discretized problem can be treated by analytic methods or by probabilistic methods using the approximation of Brownian motion by random walks.

Curse of dimensionality

The deterministic numerical solution of this scheme requires solving a linear system at each time step. If the domain c05-math-0088 is bounded with “characteristic length” c05-math-0089, then c05-math-0090 has cardinality of order c05-math-0091. Deterministic methods hence quickly become intractable as c05-math-0092 increases, even to moderate sizes. Such difficulties are referred to as “the curse of dimensionality,” a term due to Bellman.

5.1.3.3 Monte Carlo approximation

Monte Carlo methods provide very good alternative techniques bypassing this curse. Let c05-math-0093 be the hitting time of

equation

by the chain c05-math-0095 with matrix c05-math-0096. For c05-math-0097 in c05-math-0098, Theorem 2.2.2 yields that if c05-math-0099 is nonnegative (resp. bounded) then the least nonnegative solution (resp. the unique bounded solution) of the above discretization scheme is given by the probabilistic representation

equation

At each time step, the first coordinate of c05-math-0101 advances by c05-math-0102 with probability c05-math-0103, and hence c05-math-0104, where the c05-math-0105 for c05-math-0106 are i.i.d. and have the geometric law given by c05-math-0107 for c05-math-0108. Simple computations show that c05-math-0109 is a.s. finite, has some finite exponential moments, and

equation

so that we have bounds linear in the dimension c05-math-0111 for the expectation and the standard deviation.

Then, c05-math-0112 can be approximated using the Monte Carlo method by c05-math-0113 given in (5.1.1). The central limit theorem yields that the precision is of order c05-math-0114 with a constant that depends on the variance and hence very reasonably on the dimension c05-math-0115.
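
The following Python sketch illustrates this space–time Monte Carlo method under the illustrative scaling in which the time step equals the square of the spatial mesh, so that the chain steps back one time level with probability 1/(1+2d) and otherwise moves to one of the 2d spatial neighbors; the spatial box, the initial condition u0, and the boundary temperature g are assumptions of ours and not taken from the text.

```python
import random

def mc_heat_nonequilibrium(t_steps, x, u0, g, d=1, n_sim=5_000):
    """Estimate u(t, x) for the discretized heat equation out of equilibrium,
    under the illustrative scaling delta_t = h**2, for which the implicit scheme
    corresponds to a space-time chain that, at each step, either goes back one
    time level (probability 1/(1+2d)) or moves to one of the 2d spatial
    neighbors (probability 1/(1+2d) each).

    t_steps : time index (number of time levels above 0) of the point (t, x)
    u0      : initial temperature, used when the chain reaches time level 0
    g       : boundary temperature on the spatial boundary (here |x_i| = n),
              used when the chain leaves the spatial box first
    All concrete choices (box boundary, u0, g) are assumptions for the sketch."""
    n = 10  # half-width of the spatial box {-n, ..., n}^d, an arbitrary choice
    total = 0.0
    for _ in range(n_sim):
        k, y = t_steps, list(x)
        while k > 0 and all(abs(yi) < n for yi in y):
            r = random.randrange(1 + 2 * d)
            if r == 0:
                k -= 1                       # step back one time level
            else:
                i, s = divmod(r - 1, 2)      # which coordinate, which direction
                y[i] += 1 if s == 0 else -1  # spatial nearest-neighbor move
        total += u0(tuple(y)) if k == 0 else g(k, tuple(y))
    return total / n_sim

if __name__ == "__main__":
    # Hypothetical data: initial hot spot at the origin, cold spatial boundary.
    u0 = lambda y: 1.0 if all(yi == 0 for yi in y) else 0.0
    g = lambda k, y: 0.0
    print(mc_heat_nonequilibrium(t_steps=20, x=(0,), u0=u0, g=g, d=1))
```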

This is a huge advantage of Monte Carlo methods over deterministic methods, which suffer from the curse of dimensionality and whose complexity increases much faster with the dimension.

5.1.4 Parabolic partial differential equations

5.1.4.1 Dirichlet problem

More general parabolic problems may be considered. Let a time-dependent elliptic operator acting on c05-math-0116 in c05-math-0117 be given by

equation

where the matrix c05-math-0119 is symmetric nonnegative. Consider the Dirichlet problem

equation

5.1.4.2 Discretization: diagonal case

Let us first assume that c05-math-0121 is diagonal. It is always possible to find an orthonormal basis in which this is locally the case, corresponding to the principal axes of c05-math-0122, and it is a good idea to find these axes if c05-math-0123 is constant. Discretizing the c05-math-0124th coordinate by a step of c05-math-0125 and time by a step of c05-math-0126 yields

equation

This can be written as c05-math-0128 for the transition matrix c05-math-0129 equal to

equation

in which the normalizing constant is given by

equation

This is again a discretization scheme for the PDE, implicit in time and unconditionally stable. Then, c05-math-0132 should be of the same order as c05-math-0133 if c05-math-0134 and of the same order as c05-math-0135 if c05-math-0136. The relative choices for the c05-math-0137 should be influenced by the sizes of the c05-math-0138 or the c05-math-0139; in particular, if c05-math-0140 is constant in c05-math-0141, then a good choice would be c05-math-0142 if c05-math-0143 and c05-math-0144 if c05-math-0145.

5.1.4.3 Monte Carlo approximation

The matrix c05-math-0146 is a transition matrix, and the previous probabilistic representations and Monte Carlo methods are readily generalized to this situation.

5.1.4.4 Discretization: nondiagonal case

In general, a natural discretization of the Dirichlet problem is given by

equation

which features “diagonal” jumps, and thus extended c05-math-0148 and c05-math-0149 w.r.t. the previous situation.

5.1.4.5 Monte Carlo approximation

In order to have an interpretation in terms of a transition matrix, it is necessary and sufficient that

equation

which is not always true when c05-math-0151, as can be seen with the matrix in which all terms are c05-math-0152. If this holds for all c05-math-0153, then the previous probabilistic representations and Monte Carlo methods can easily be extended to this situation. Otherwise, other methods should be used that do not involve a spatial grid discretization, as such a discretization causes a strong anisotropy.

5.2 Invariant law simulation

In this section, we mostly assume that c05-math-0154 is finite and set c05-math-0155.

5.2.1 Monte Carlo methods and ergodic theorems

Let c05-math-0156 be a state space, and c05-math-0157 a probability measure on c05-math-0158, which can be interpreted as the invariant law for an irreducible transition matrix c05-math-0159. Finding an explicit expression for c05-math-0160 may raise the following difficulties.

  1. The invariant measure c05-math-0161 is a nonnegative nonzero solution of the linear system c05-math-0162. This develops into a system of c05-math-0163 equations for c05-math-0164 unknowns, usually highly coupled and as such difficult to solve, see the global balance equations (3.2.4). If c05-math-0165 is reversible, then the local balance equations (3.2.5) can be used, yielding for c05-math-0166 the semiexplicit expression in Lemma 3.3.8.
  2. The invariant law c05-math-0167 has total mass c05-math-0168, and thus the following normalization problem arises. Once an invariant law c05-math-0169 is obtained, then c05-math-0170 must be computed in order to make c05-math-0171 explicit. In general, such a summation is an NP-complete problem in terms of c05-math-0172.

These difficulties are elegantly bypassed by Monte Carlo approximation methods. These replace the resolution of an equation, which implicitly defines an invariant measure, and the normalization of the solution, by the simulation of a Markov chain, which is explicitly given by its transition matrix.

5.2.1.1 Pointwise ergodic theorem

Assume that a real function c05-math-0173 on c05-math-0174 is given and that the mean of c05-math-0175 w.r.t. c05-math-0176 is sought. We then are interested in

equation

Even if c05-math-0178 were explicitly known, such a summation would again, in general, be an NP-complete problem. Recall that c05-math-0179 is usually at best known only up to a normalizing constant, that is, as an invariant measure c05-math-0180.

If a term c05-math-0181 of c05-math-0182 or the probability c05-math-0183 of some subset of c05-math-0184 is sought, then this corresponds to taking c05-math-0185 or c05-math-0186. Note that if an invariant law c05-math-0187 is known and a term c05-math-0188 has been computed, then

equation

The pointwise ergodic theorem (Theorem 4.1.1) yields a Monte Carlo method for approximating c05-math-0190. Indeed, if c05-math-0191 is a Markov chain with matrix c05-math-0192 and arbitrary initial condition, then

equation

The Markov chain central limit theorem (Theorem 4.1.5) yields confidence intervals for this convergence.
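
As a minimal Python illustration of this time-average estimator, the following sketch uses an assumed two-state chain whose invariant law is known in closed form, so that the output of the ergodic average can be checked; the chain and the function f are illustrative choices.

```python
import random

def ergodic_average(p, f, n_steps=100_000, x0=0):
    """Estimate pi(f) = sum_x pi(x) f(x) by the time average (1/n) sum f(X_k)
    along a single trajectory of the chain with transition matrix p (a list of
    rows, each a probability vector), as in the pointwise ergodic theorem.
    The concrete two-state chain used below is an illustrative assumption."""
    x, total = x0, 0.0
    for _ in range(n_steps):
        x = random.choices(range(len(p)), weights=p[x])[0]  # one step of the chain
        total += f(x)
    return total / n_steps

if __name__ == "__main__":
    a, b = 0.3, 0.1
    p = [[1 - a, a], [b, 1 - b]]          # two-state chain, invariant law (b, a)/(a+b)
    est = ergodic_average(p, f=lambda x: float(x == 1))
    print(f"estimate of pi(1): {est:.4f}  (exact {a / (a + b):.4f})")
```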

5.2.1.2 Kolmogorov's ergodic theorem

Some situations require simulating random variables of law c05-math-0194, for instance in order to integrate them in simulations of more complex systems. Even if c05-math-0195 were explicitly known, this would usually be a very difficult task if c05-math-0196 is large; moreover, c05-math-0197 is usually at best known only up to a normalizing constant.

If c05-math-0198 is aperiodic, then the Kolmogorov ergodic theorem yields a Monte Carlo method for obtaining samples drawn from an approximation of c05-math-0199. Indeed, if c05-math-0200 is a Markov chain with matrix c05-math-0201 and arbitrary initial condition, then

equation

Exponential convergence bounds are given by Theorem 1.3.4, in terms of the Doeblin condition, which is satisfied, but usually these bounds are very poor. Much better exponential bounds are given by various results and developments around Dirichlet forms, spectral gap estimates, and Poincaré inequalities, which were introduced in Section 4.3.

5.2.2 Metropolis algorithm, Gibbs law, and simulated annealing

5.2.2.1 Metropolis algorithm

Let c05-math-0203 be a law of interest, determined at least up to a normalizing constant. It may be given for instance by statistical mechanics considerations. The first step is to interpret c05-math-0204 as the reversible law (and thus the invariant law) of an explicit transition matrix c05-math-0205.

Let c05-math-0206 be an irreducible transition matrix c05-math-0207 on c05-math-0208, called the selection matrix, s.t. c05-math-0209 for c05-math-0210 in c05-math-0211, and a function

equation

for instance c05-math-0213 or c05-math-0214. Let

equation

which depends on c05-math-0216 only up to a normalizing constant. Let c05-math-0217 be defined by

equation

with c05-math-0219 if c05-math-0220 and c05-math-0221.

It is a simple matter to check that c05-math-0222 is an irreducible transition matrix, which is reversible w.r.t. c05-math-0223, and that c05-math-0224 is aperiodic if c05-math-0225 is aperiodic or if c05-math-0226. Hence, the above-mentioned Monte Carlo methods for approximating quantities related to c05-math-0227 can be implemented using c05-math-0228.

The Metropolis algorithm is a method for sequentially drawing a sample c05-math-0229 of a Markov chain c05-math-0230 of transition matrix c05-math-0231, using directly c05-math-0232 and c05-math-0233:

  • Step c05-math-0234 (initialization): draw c05-math-0235 according to the arbitrary initial law.
  • Step c05-math-0236 (from c05-math-0237 to c05-math-0238): draw c05-math-0239 according to the law c05-math-0240:
    • with probability c05-math-0241 set c05-math-0242
    • else set c05-math-0243.

This is an acceptance–rejection method: a candidate for the new state c05-math-0244 is chosen according to c05-math-0245, accepted with probability c05-math-0246, and otherwise rejected. The actual evolution thus follows c05-math-0247, which has invariant law c05-math-0248, instead of c05-math-0249.

A classic case uses c05-math-0250 given by c05-math-0251. Then, c05-math-0252 is accepted systematically if

equation

and else with probability

equation
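
Here is a minimal Python sketch of the Metropolis algorithm with this classic acceptance function, for an assumed unnormalized target on a small finite state space and a symmetric selection matrix given by the nearest-neighbor walk on a cycle; the target weights and the selection matrix are illustrative choices, not the book's.

```python
import random

def metropolis_chain(w, q_sample, q_prob, x0, n_steps):
    """Generate a trajectory of the Metropolis chain for an unnormalized target
    weight w (proportional to the law of interest) and a selection matrix given
    by a sampler q_sample(x) and its probabilities q_prob(x, y), using the
    classic acceptance function h(u) = min(1, u).
    The concrete target and selection matrix below are illustrative assumptions."""
    x, path = x0, []
    for _ in range(n_steps):
        y = q_sample(x)                                   # candidate from the selection matrix
        ratio = (w(y) * q_prob(y, x)) / (w(x) * q_prob(x, y))
        if random.random() < min(1.0, ratio):             # accept the candidate ...
            x = y
        path.append(x)                                    # ... else keep the current state
    return path

if __name__ == "__main__":
    # Illustrative target on {0, ..., 9}: weights w(x) = x + 1, so pi(x) = (x+1)/55.
    K = 10
    w = lambda x: x + 1
    q_sample = lambda x: (x + random.choice((-1, 1))) % K  # symmetric walk on the cycle
    q_prob = lambda x, y: 0.5                              # symmetric, so it cancels out
    path = metropolis_chain(w, q_sample, q_prob, x0=0, n_steps=200_000)
    freq9 = path.count(9) / len(path)
    print(f"empirical pi(9) = {freq9:.4f}  (exact {10 / 55:.4f})")
```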

5.2.2.2 Gibbs laws and Ising model

Let c05-math-0255 be a function and c05-math-0256 a real number. The Gibbs law with energy function c05-math-0257 and temperature c05-math-0258 is given by

equation

This law characterizes the thermodynamical equilibrium of a system in which an energy c05-math-0260 corresponds to each configuration c05-math-0261. More precisely, it is well known that among all laws c05-math-0262 for which the mean energy c05-math-0263 has a given value c05-math-0264, the physical entropy

equation

is maximal for a Gibbs law c05-math-0266 for a well-defined c05-math-0267.

Owing to the normalizing constant, c05-math-0268 does not change if a constant is added to c05-math-0269, and hence the limit of the Gibbs law as the temperature goes to zero is

the uniform law on the set of minima of c05-math-0271.

Ising model

A classic example is given by the Ising model. This is a magnetic model, in which c05-math-0272 and the energy of configuration c05-math-0273 is given by

equation

in which the matrix c05-math-0275 can be taken symmetric and with a null diagonal, and c05-math-0276 and c05-math-0277 are the two possible orientations of a magnetic element at site c05-math-0278. The term c05-math-0279 quantifies the interaction between the two sites c05-math-0280, for instance c05-math-0281 for a ferromagnetic interaction in which configurations c05-math-0282 are favored. The term c05-math-0283 corresponds to the effect of an external magnetic field on site c05-math-0284.

The state space c05-math-0285 has cardinality c05-math-0286, so that the computation or even the estimation of the normalizing constant c05-math-0287, called the partition function, is very difficult as soon as c05-math-0288 is not very small.

Simulation

The simulation by the Metropolis algorithm is straightforward. Let c05-math-0289 be an irreducible matrix on c05-math-0290, for instance corresponding to choosing uniformly c05-math-0291 in c05-math-0292 and changing c05-math-0293 into c05-math-0294. The algorithm proceeds as follows:

  • Step c05-math-0295 (initialization): draw c05-math-0296 according to the arbitrary initial law.
  • Step c05-math-0297 (from c05-math-0298 to c05-math-0299): draw c05-math-0300 according to the law c05-math-0301:
    • set c05-math-0302 with probability equation
    • else set c05-math-0304.

If moreover c05-math-0305 and c05-math-0306, then c05-math-0307 is accepted systematically if

equation

and else with probability

equation
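
The following Python sketch implements this single-spin-flip Metropolis simulation for a ferromagnetic Ising model on a small periodic square lattice with no external field; the lattice size, the coupling constant, and the temperature are illustrative assumptions.

```python
import math
import random

def ising_metropolis(L=10, T=2.0, J=1.0, n_sweeps=200):
    """Metropolis simulation of a ferromagnetic Ising model on the L x L torus
    with coupling J, no external field, at temperature T (illustrative choices).
    A candidate flip of one uniformly chosen spin is accepted with probability
    min(1, exp(-delta_H / T)); otherwise the configuration is kept."""
    spins = [[random.choice((-1, 1)) for _ in range(L)] for _ in range(L)]
    for _ in range(n_sweeps * L * L):
        i, j = random.randrange(L), random.randrange(L)
        s = spins[i][j]
        # Sum of the four nearest-neighbor spins on the torus.
        nb = (spins[(i + 1) % L][j] + spins[(i - 1) % L][j]
              + spins[i][(j + 1) % L] + spins[i][(j - 1) % L])
        delta_h = 2.0 * J * s * nb          # energy increase if the spin is flipped
        if delta_h <= 0 or random.random() < math.exp(-delta_h / T):
            spins[i][j] = -s                # accept the flip, else keep the state
    magnetization = sum(sum(row) for row in spins) / (L * L)
    return spins, magnetization

if __name__ == "__main__":
    _, m = ising_metropolis(T=1.5)
    print(f"empirical magnetization per site after the run: {m:+.3f}")
```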

5.2.2.3 Global optimization and simulated annealing

A deterministic algorithm for the minimization of c05-math-0310 would only accept states that actually decrease the energy function c05-math-0311. Such algorithms find a local minimum of c05-math-0312, which may be far from the global minimum.

The Metropolis algorithm also accepts certain states that increase the energy, with a probability that decreases sharply if the energy increase is large or the temperature c05-math-0313 is low. This allows it to escape from local minima and explore the state space, and its instantaneous laws converge to the Gibbs law c05-math-0314, which distributes its mass mainly close to the global minima. We recall (5.2.2): the limit as c05-math-0315 goes to zero of c05-math-0316 is the uniform law on the minima of c05-math-0317.

Let c05-math-0318 be a sequence of temperatures, and c05-math-0319 the inhomogeneous Markov chain for which

equation

where the transition matrix c05-math-0321 has invariant law c05-math-0322 and, for instance, corresponds to the Metropolis algorithm with energy function c05-math-0323 and temperature c05-math-0324.

A natural question is whether it is possible to choose c05-math-0325 decreasing to c05-math-0326 in such a way that the law of c05-math-0327 converges to the uniform law on the minima of c05-math-0328. This will yield a stochastic optimization algorithm for c05-math-0329, in which the chain c05-math-0330 will be simulated for a sufficient length of time (also to be estimated), and then the instantaneous values should converge to a global minimum of c05-math-0331.

From a physical viewpoint, this is similar to annealing techniques in metallurgy, in which an alloy is heated and then its temperature is allowed to decrease sufficiently slowly that the final state is close to an energy minimum. In quenching techniques, on the contrary, the alloy is plunged into a cold bath in order to obtain a desirable state close to a local minimum.

Clearly, the temperature should be allowed to decrease sufficiently slowly. The corresponding theoretical results, see for example Duflo, M. (1996, Section 6.4, p. 264), hold for logarithmic temperature decrease, for instance of the form c05-math-0332 or c05-math-0333 c05-math-0334 for large enough c05-math-0335.

This is much too slow in practice, and temperatures are decreased much faster than that in actual algorithms. These nevertheless provide good results, in particular high-quality suboptimal values in combinatorial optimization problems.
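
As an illustration, the following Python sketch runs simulated annealing on a toy rugged energy landscape on a cycle of eight states with a logarithmic temperature schedule; the landscape, the schedule constant, and the number of steps are assumptions made for the example.

```python
import math
import random

def simulated_annealing(energy, n_states, temp, n_steps=50_000, x0=0):
    """Simulated annealing on {0, ..., n_states-1} with nearest-neighbor moves on
    the cycle: a candidate move is always accepted when it does not increase the
    energy, and accepted with probability exp(-increase / T_n) otherwise, where
    T_n = temp(n) is a decreasing temperature schedule.
    The toy energy landscape and the schedule below are illustrative assumptions."""
    x = x0
    for n in range(1, n_steps + 1):
        y = (x + random.choice((-1, 1))) % n_states
        delta = energy(y) - energy(x)
        t = temp(n)
        if delta <= 0 or random.random() < math.exp(-delta / t):
            x = y
    return x

if __name__ == "__main__":
    # Toy rugged landscape on a cycle of 8 states; the unique global minimum is state 5.
    landscape = [3, 2, 3, 1, 4, 0, 4, 2]
    energy = lambda x: landscape[x]
    # A logarithmic schedule c / log(1 + n) as in the theory, with an illustrative constant.
    temp = lambda n: 4.0 / math.log(1.0 + n)
    print("final state:", simulated_annealing(energy, len(landscape), temp))
```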

5.2.3 Exact simulation and backward recursion

5.2.3.1 Identities in law and backward recursion

The random recursion in Theorem 1.2.3, given by

equation

where c05-math-0337 is a sequence of i.i.d. random functions from c05-math-0338 to c05-math-0339, which is independent of the r.v. c05-math-0340 of c05-math-0341, allows the construction of a Markov chain c05-math-0342 on c05-math-0343 with matrix c05-math-0344 of generic term c05-math-0345. In particular, c05-math-0346 has law c05-math-0347.

As the c05-math-0348 are i.i.d., the random functions c05-math-0349 and c05-math-0350 have the same law, and if c05-math-0351 has law c05-math-0352 and is independent of c05-math-0353 then

equation

also has law c05-math-0355 for c05-math-0356, as c05-math-0357.

Far past start interpretation

A possible description of c05-math-0358 is as follows. If c05-math-0359 is interpreted as a draw taking the state at time c05-math-0360 to time c05-math-0361 of c05-math-0362, then c05-math-0363 can be seen as the instantaneous value at time c05-math-0364 of a Markov chain of matrix c05-math-0365 started in the past at time c05-math-0366 at c05-math-0367. Letting c05-math-0368 go to infinity then corresponds to letting the initial instant tend to c05-math-0369, the randomness at the different following instants remaining fixed.

5.2.3.2 Coalescence and invariant law

On the event in which the random function c05-math-0370 is constant on c05-math-0371, for any c05-math-0372, it holds that

equation

Let c05-math-0374 for c05-math-0375, and their first coalescence time

equation

This is a stopping time, after which all c05-math-0377 are constant and equal. On c05-math-0378 let c05-math-0379 be defined by the constant value taken by c05-math-0380, so that c05-math-0381 for all c05-math-0382 and c05-math-0383. If c05-math-0384, then for any initial r.v. c05-math-0385, it holds that

equation

5.2.3.3 Propp and Wilson algorithm

The Propp and Wilson algorithm uses this for the exact simulation of draws from the invariant law c05-math-0408 of c05-math-0409, using the random functions c05-math-0410. The naive idea is the following.

  • Start with c05-math-0411 being the identical mapping on c05-math-0412,
  • for c05-math-0413: draw c05-math-0414 and compute c05-math-0415, that is, c05-math-0416, and then test for coalescence:
    • if c05-math-0417 is constant, then the algorithm terminates issuing this constant value,
    • else increment c05-math-0418 by c05-math-0419 and continue.

If the law of the c05-math-0420 is s.t. c05-math-0421, then the algorithm terminates after a random number of iterations and issues a draw from c05-math-0422.
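
Here is a minimal Python sketch of this naive algorithm for a small finite state space, with the random functions constructed (as one possible choice, not prescribed by the text) by inverse-CDF sampling of each row from a single common uniform variable; the two-state chain used to test it is also an illustrative assumption.

```python
import random

def random_map(p):
    """Draw one random function f on {0, ..., K-1} such that f(x) has law p[x]
    for each x, using a single common uniform (inverse-CDF coupling)."""
    u = random.random()
    f = []
    for row in p:
        acc, image = 0.0, len(row) - 1
        for y, prob in enumerate(row):
            acc += prob
            if u < acc:
                image = y
                break
        f.append(image)
    return f

def propp_wilson(p):
    """Naive Propp-Wilson algorithm (coupling from the past): compose the already
    drawn functions with a newly drawn one applied first (one more step into the
    past) until the composition is constant, and return that constant value,
    which is an exact draw from the invariant law of p."""
    k = len(p)
    composed = list(range(k))                    # identity map on the state space
    while len(set(composed)) > 1:                # not yet coalesced
        new = random_map(p)
        composed = [composed[new[x]] for x in range(k)]  # new function acts first
    return composed[0]

if __name__ == "__main__":
    a, b = 0.3, 0.1
    p = [[1 - a, a], [b, 1 - b]]                 # invariant law (b, a)/(a+b) = (0.25, 0.75)
    draws = [propp_wilson(p) for _ in range(20_000)]
    print("empirical frequency of state 1:", sum(draws) / len(draws), "(exact 0.75)")
```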

5.2.3.4 Criteria on the coalescence time

It is important to obtain verifiable criteria for having c05-math-0423 and good estimates on c05-math-0424.

One criterion is to choose the law of c05-math-0425 so that there exists c05-math-0426 s.t.

equation

Then c05-math-0428, where the c05-math-0429 are i.i.d. with geometric law c05-math-0430 for c05-math-0431. This implies that c05-math-0432 and c05-math-0433 and yields exponential bounds on the duration of the simulation.

If c05-math-0434 satisfies the Doeblin condition in Theorem 1.3.4 for c05-math-0435 and c05-math-0436 and a law c05-math-0437, then we may consider independent r.v. c05-math-0438 of law c05-math-0439, random functions c05-math-0440 from c05-math-0441 to c05-math-0442 satisfying

equation

which may be constructed as in Section 1.2.3, and r.v. c05-math-0444 s.t. c05-math-0445 and c05-math-0446. Then, for c05-math-0447,

equation

define i.i.d. random functions s.t. c05-math-0449, for which c05-math-0450 has geometric law c05-math-0451 for c05-math-0452.
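
As a sketch of this Doeblin-based construction, the following Python function draws one random map using the splitting P(x, .) = eps * rho(.) + (1 - eps) * R(x, .): with probability eps all states are sent to a common draw from rho, which forces coalescence, and such maps can be plugged into the Propp and Wilson sketch given earlier. The numerical values of eps, rho, and the two-state matrix in the example are illustrative assumptions.

```python
import random

def doeblin_random_map(p, eps, rho):
    """Draw one random function using the Doeblin splitting
    P(x, .) = eps * rho(.) + (1 - eps) * R(x, .):
    with probability eps the map is constant (all states jump to a common draw
    from rho), which forces coalescence at that step; otherwise each state moves
    according to the residual kernel R. The pair (eps, rho) is assumed to satisfy
    P(x, y) >= eps * rho(y) for all x, y."""
    k = len(p)
    if random.random() < eps:
        target = random.choices(range(k), weights=rho)[0]
        return [target] * k                      # constant map: coalescence occurs here
    maps = []
    for x in range(k):
        residual = [(p[x][y] - eps * rho[y]) / (1 - eps) for y in range(k)]
        maps.append(random.choices(range(k), weights=residual)[0])
    return maps

if __name__ == "__main__":
    # Example: eps = 0.4 and rho = (0.25, 0.75) satisfy P >= eps * rho entrywise
    # for the two-state matrix below (an illustrative check of the construction).
    p = [[0.7, 0.3], [0.1, 0.9]]
    print(doeblin_random_map(p, eps=0.4, rho=[0.25, 0.75]))
```

With such maps, the number of iterations before coalescence is bounded by a geometric random variable of parameter eps, in line with the bound stated above.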

Note that the algorithm can be applied to some well-chosen power c05-math-0453 of c05-math-0454, which has the same invariant law c05-math-0455 and is more likely to satisfy one of the above-mentioned criteria.

5.2.3.5 Monotone systems

An important problem with the algorithm is that in order to check for coalescence it is in general necessary to simulate the c05-math-0456 for all c05-math-0457 in c05-math-0458, which is intractable.

Some systems are monotone, in the sense that there exists a partial ordering on c05-math-0459 denoted by c05-math-0460 s.t. one can choose random mappings c05-math-0461 for the random recursion satisfying, a.s.,

equation

If there exist a least element c05-math-0463 and a greatest element c05-math-0464 in c05-math-0465, that is, s.t.

equation

then clearly c05-math-0467, and in order to check for coalescence, it is enough to simulate c05-math-0468 and c05-math-0469.

Many queuing models are of this kind, for instance the queue with capacity c05-math-0470 given by

equation

is monotone, with c05-math-0472 and c05-math-0473.
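
The following Python sketch illustrates coupling from the past for a monotone system, assuming for the example the bounded-queue update x -> min(max(x + xi, 0), C) with i.i.d. increments in {-1, +1} (an illustrative choice of recursion, since only monotonicity matters): only the trajectories started from the least state 0 and the greatest state C need to be simulated, and the same increments are reused when the window is extended further into the past.

```python
import random

def queue_update(x, xi, capacity):
    """One illustrative monotone queue update: x -> min(max(x + xi, 0), capacity)."""
    return min(max(x + xi, 0), capacity)

def monotone_cftp(capacity, p_arrival=0.45, max_doublings=30):
    """Coupling from the past for a monotone system: simulate only from the least
    state 0 and the greatest state `capacity`, reusing the same increments when
    extending further into the past, until the two trajectories coalesce at time 0.
    The +1/-1 increment dynamics and p_arrival are illustrative assumptions."""
    xi = []                                   # xi[k]: increment used for the step ending at time -k
    t = 1
    for _ in range(max_doublings):
        while len(xi) < t:                    # draw the increments for the new, older steps
            xi.append(1 if random.random() < p_arrival else -1)
        low, high = 0, capacity               # start from the extreme states at time -t
        for k in range(t - 1, -1, -1):        # run forward up to time 0, oldest increment first
            low = queue_update(low, xi[k], capacity)
            high = queue_update(high, xi[k], capacity)
        if low == high:                       # sandwiched: every starting point has coalesced
            return low
        t *= 2                                # otherwise look further into the past
    raise RuntimeError("no coalescence within the allowed window")

if __name__ == "__main__":
    draws = [monotone_cftp(capacity=5) for _ in range(5_000)]
    print("empirical law of the queue length:",
          [round(draws.count(k) / len(draws), 3) for k in range(6)])
```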

Ising model

Another interesting example is given by the ferromagnetic Ising model, in which adjacent spins tend to have the same orientations c05-math-0474 or c05-math-0475. The natural partial order is that c05-math-0476 if and only if c05-math-0477 for all sites c05-math-0478. The least state c05-math-0479 is the configuration consisting of all c05-math-0480, and the greatest state c05-math-0481 is the configuration consisting of all c05-math-0482. Many natural dynamics preserve the partial order, for instance the Metropolis algorithm described earlier. This is the scope of the original study in Propp, J.G. and Wilson, D.B. (1996).
