A gentle introduction to Markov chains

In order to discuss the MCMC algorithms, it's necessary to introduce the concept of Markov chains. While direct sampling methods draw samples in no particular order, the MCMC strategies draw a sequence of samples, moving from one sample to the next according to a precise transition probability.

Let's consider a time-dependent random variable X(t), and let's assume a discrete time sequence X1, X2, ..., Xt, Xt+1, ..., where Xt represents the value assumed at time t. In the following diagram, there's a schematic representation of this sequence:

[Figure: Structure of a generic Markov chain]

We can suppose that there are N different states si, for i = 1, ..., N; therefore, it's possible to consider the probability P(Xt=si|Xt-1=sj, ..., X1=sp). X(t) is defined as a first-order Markov process if:

P(Xt=si|Xt-1=sj, ..., X1=sp) = P(Xt=si|Xt-1=sj)

In other words, in a Markov process (from now on, we omit the first-order qualifier, even though there are cases where it's useful to consider more than one previous state), the probability that X(t) is in a certain state depends only on the state assumed in the previous time instant. Therefore, we can define a transition probability for every pair of states i, j:

P(i → j) = P(Xt=sj|Xt-1=si)

Considering all the pairs (i, j), it's also possible to build a transition probability matrix T(i, j) = P(i → j). The marginal probability that Xt=si, using a standard notation, is defined as:

πi(t) = P(Xt=si)

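To make these definitions concrete, here is a minimal Python/NumPy sketch (the 3-state matrix, the seed, and all variable names are illustrative assumptions, not taken from the text) that builds a transition matrix T(i, j) = P(i → j) and samples a short trajectory; at every step, the next state is drawn using only the row of T corresponding to the current state, which is exactly the first-order Markov property:

import numpy as np

# Hypothetical 3-state transition matrix: T[i, j] = P(i -> j).
# Every row is a probability distribution over the next state
T = np.array([[0.5, 0.3, 0.2],
              [0.1, 0.6, 0.3],
              [0.2, 0.3, 0.5]])

rng = np.random.default_rng(1000)

x = 0  # Initial state (0-based index)
trajectory = [x]

for _ in range(10):
    # The next state depends only on the current one (Markov property)
    x = rng.choice(len(T), p=T[x])
    trajectory.append(x)

print(trajectory)
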
At this point, it's easy to prove (this is the Chapman-Kolmogorov equation) that:

πi(t+1) = Σj P(j → i) πj(t) = Σj T(j, i) πj(t)

In the previous expression, in order to compute πi(t+1), we need to sum over all possible previous states, weighting each one by the corresponding transition probability. This operation can be rewritten in matrix form, using a vector π(t) containing the probabilities of all states and the transition probability matrix T (the uppercase superscript T means that the matrix is transposed). The evolution of the chain can be computed recursively:

π(t+1) = Tᵀ π(t) = (Tᵀ)² π(t-1) = ... = (Tᵀ)ᵗ π(1)

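In code, this recursion is just a repeated matrix-vector product. Here is a minimal sketch, reusing the same hypothetical matrix and propagating an initial distribution π(1) concentrated on the first state:

import numpy as np

# Same hypothetical 3-state transition matrix as before
T = np.array([[0.5, 0.3, 0.2],
              [0.1, 0.6, 0.3],
              [0.2, 0.3, 0.5]])

pi = np.array([1.0, 0.0, 0.0])  # pi(1): the chain starts in the first state

for t in range(5):
    pi = T.T @ pi  # pi(t+1) = T^T pi(t)
    print('pi({}) = {}'.format(t + 2, pi))
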
For our purposes, it's important to consider Markov chains that are able to reach a stationary distribution πs:

πs = Tᵀ πs

In other words, the stationary distribution does not depend on the initial condition π(1), and once reached, it no longer changes. The stationary distribution is unique if the underlying Markov process is ergodic. Ergodicity means that the process has the same properties whether averaged over time (which is often impossible in practice) or averaged, at a fixed time instant, over the states (which is simpler in the majority of cases).

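Continuing the hypothetical example, we can iterate the recursion until the distribution stops changing and then verify numerically that the limit satisfies πs = Tᵀ πs and coincides with the eigenvector of Tᵀ associated with the eigenvalue 1 (a numerical check on a made-up matrix, of course, not a proof):

import numpy as np

T = np.array([[0.5, 0.3, 0.2],
              [0.1, 0.6, 0.3],
              [0.2, 0.3, 0.5]])

# Iterate pi(t+1) = T^T pi(t) until the distribution stops changing
pi = np.array([1.0, 0.0, 0.0])
while not np.allclose(T.T @ pi, pi, rtol=0, atol=1e-12):
    pi = T.T @ pi

print('Stationary distribution:', pi)
print('pi_s = T^T pi_s:', np.allclose(T.T @ pi, pi))

# Cross-check: eigenvector of T^T associated with the eigenvalue 1
w, v = np.linalg.eig(T.T)
pi_eig = np.real(v[:, np.argmin(np.abs(w - 1.0))])
pi_eig /= pi_eig.sum()
print('Matches the unit eigenvector:', np.allclose(pi, pi_eig))
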
The ergodicity of a Markov chain is assured by two conditions. The first is aperiodicity for all states, which means that it is impossible to find an integer p > 1 such that the chain returns to the same state only after a number of time instants that is a multiple of p. The second condition is that all states must be positive recurrent: this means that, given a random variable Ninstants(i) describing the number of time instants needed to return to the state si, E[Ninstants(i)] < ∞; therefore, all the states can potentially be revisited in a finite time.

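To see why aperiodicity matters, consider a hypothetical two-state chain that deterministically flips its state at every step: it has period p = 2, so the distribution π(t) oscillates forever and, starting from a generic initial condition, never settles:

import numpy as np

# Periodic (p = 2) chain: the state flips deterministically, so the
# distribution oscillates and never converges from this initial condition
T_periodic = np.array([[0.0, 1.0],
                       [1.0, 0.0]])

pi = np.array([1.0, 0.0])
for t in range(6):
    pi = T_periodic.T @ pi
    print('pi({}) = {}'.format(t + 2, pi))  # Alternates [0, 1] and [1, 0]
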
The reason why we need the ergodicity condition, and hence the existence of a unique stationary distribution, is that we are modeling the sampling process as a Markov chain, where the next value is sampled according to the current state. The transition from one state to another is performed in order to find better samples, as we're going to see in the Metropolis-Hastings sampler, where we can also decide to reject a sample and keep the chain in the same state. For this reason, we need to be sure that the algorithms converge to the unique stationary distribution (which approximates the real full joint distribution of our Bayesian network). It's possible to prove that a chain always reaches a stationary distribution if, for every pair of states (i, j):

π(i) P(i → j) = π(j) P(j → i)

The previous equation is called detailed balance, and it implies the reversibility of the chain. Intuitively, it means that the probability of finding the chain in state A times the probability of a transition to state B is equal to the probability of finding the chain in state B times the probability of a transition back to state A.

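As a numerical illustration (again with made-up matrices), the following sketch compares the probability flows π(i) P(i → j) and π(j) P(j → i) over all pairs: a symmetric transition matrix gives a reversible chain whose stationary distribution is uniform, so detailed balance holds exactly, while the asymmetric matrix from the earlier snippets has a unique stationary distribution but is not reversible:

import numpy as np

def max_detailed_balance_violation(T, pi):
    # flow[i, j] = pi(i) * P(i -> j); detailed balance <=> flow is symmetric
    flow = pi[:, None] * T
    return np.max(np.abs(flow - flow.T))

# Symmetric chain: reversible, with a uniform stationary distribution
T_sym = np.array([[0.6, 0.3, 0.1],
                  [0.3, 0.4, 0.3],
                  [0.1, 0.3, 0.6]])
print(max_detailed_balance_violation(T_sym, np.full(3, 1 / 3)))  # 0.0

# The asymmetric matrix used before: unique stationary distribution,
# but the chain is not reversible, so detailed balance fails
T = np.array([[0.5, 0.3, 0.2],
              [0.1, 0.6, 0.3],
              [0.2, 0.3, 0.5]])
w, v = np.linalg.eig(T.T)
pi = np.real(v[:, np.argmin(np.abs(w - 1.0))])
pi /= pi.sum()
print(max_detailed_balance_violation(T, pi))  # > 0
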
For both methods that we are going to discuss, it's possible to prove that the detailed balance condition is satisfied, and therefore their convergence is assured.
