Gibbs sampling

Let's suppose that we want to obtain the full joint probability of a Bayesian network, P(x1, x2, x3, ..., xN); however, the number of variables is large and the problem cannot easily be solved in closed form. Moreover, imagine that we would like to compute a marginal distribution, such as P(x2); to do so, we would have to integrate the full joint probability, which is an even harder task. Gibbs sampling allows us to approximate all the marginal distributions with an iterative process. If we have N variables, the algorithm proceeds with the following steps:

  1. Initialize the variable NIterations
  2. Initialize a vector S with shape (N, NIterations)
  3. Randomly initialize x1(0), x2(0), ..., xN(0) (the superscript index refers to the iteration)
  4. For t=1 to NIterations:
    1. Sample x1(t) from p(x1|x2(t-1), x3(t-1), ..., xN(t-1)) and store it in S[0, t]
    2. Sample x2(t) from p(x2|x1(t), x3(t-1), ..., xN(t-1)) and store it in S[1, t]
    3. Sample x3(t) from p(x3|x1(t), x2(t), ..., xN(t-1)) and store it in S[2, t]
    4. ...
    5. Sample xN(t) from p(xN|x1(t), x2(t), ..., xN-1(t)) and store it in S[N-1, t]

At the end of the iterations, the vector S will contain NIterations samples for each variable. As we need to determine the probabilities, it's necessary to proceed as in the direct sampling algorithm, counting the number of occurrences of each single value and normalizing by dividing by NIterations. If the variables are continuous, it's possible to consider intervals, counting how many samples fall into each of them.
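
To make the steps concrete, here is a minimal sketch in Python (with NumPy), under the assumption of a standard bivariate Gaussian target with correlation rho = 0.8, for which both full conditionals are available in closed form (the names S and n_iterations mirror the variables used in the steps above):

```python
import numpy as np

# A minimal sketch of the procedure above for N = 2 variables, assuming a
# standard bivariate Gaussian with correlation rho, so that both full
# conditionals are Gaussian and known in closed form:
#   x1 | x2 ~ N(rho * x2, 1 - rho^2)  (and symmetrically for x2 | x1)
rho = 0.8
n_iterations = 10000

# S[i, t] stores the sample of variable x_{i+1} at iteration t
S = np.zeros((2, n_iterations))

# Random initialization of x1(0) and x2(0)
x1, x2 = np.random.uniform(-1.0, 1.0, size=2)

sigma = np.sqrt(1.0 - rho ** 2)
for t in range(n_iterations):
    x1 = np.random.normal(rho * x2, sigma)  # sample from p(x1 | x2)
    x2 = np.random.normal(rho * x1, sigma)  # sample from p(x2 | x1)
    S[0, t], S[1, t] = x1, x2

# The variables are continuous, so the marginal p(x2) is approximated by
# counting how many samples fall into each interval (a normalized histogram)
hist, edges = np.histogram(S[1, :], bins=50, density=True)
```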

For small networks, this procedure is very similar to direct sampling, but with very large networks the sampling process could become slow; however, the algorithm can be simplified after introducing the concept of the Markov blanket of Xi, which is the set of random variables that are predecessors, successors, and successors' predecessors of Xi (in some books, the terms parents and children are used). In a Bayesian network, a variable Xi is conditionally independent of all the other variables given its Markov blanket. Therefore, if we define the function MB(Xi), which returns the set of variables in the blanket, the generic sampling step can be rewritten as p(xi|MB(Xi)), and there's no longer any need to consider all the other variables.

To understand this concept, let's consider the network shown in the following diagram:

Bayesian network for the Gibbs sampling example

The Markov blankets are:

  • MB(X1) = { X2, X3 }
  • MB(X2) = { X1, X3, X6 }
  • MB(X3) = { X1, X2, X4, X5 }
  • MB(X4) = { X3 }
  • MB(X5) = { X3 }
  • MB(X6) = { X2 }
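
Since the original diagram is not reproduced here, the following sketch assumes a hypothetical edge list (X1 → X2, X1 → X3, X2 → X3, X2 → X6, X3 → X4, X3 → X5) that is consistent with the blankets listed above, and shows how the three components of the definition (predecessors, successors, and successors' predecessors) combine:

```python
# A sketch of how Markov blankets can be derived from the structure of a
# Bayesian network. The edge list below is an assumption consistent with
# the blankets listed above (the original diagram is not reproduced here).
parents = {
    'X1': [],
    'X2': ['X1'],
    'X3': ['X1', 'X2'],
    'X4': ['X3'],
    'X5': ['X3'],
    'X6': ['X2'],
}

def markov_blanket(node):
    # Predecessors (parents) of the node
    blanket = set(parents[node])
    for other, other_parents in parents.items():
        if node in other_parents:
            # Successors (children) of the node...
            blanket.add(other)
            # ...and the successors' other predecessors
            blanket.update(p for p in other_parents if p != node)
    return blanket

for x in sorted(parents):
    print(x, '->', sorted(markov_blanket(x)))
# X1 -> ['X2', 'X3'], X2 -> ['X1', 'X3', 'X6'], ..., X6 -> ['X2']
```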

In general, if N is very large, the cardinality of MB(Xi) is much smaller than N (|MB(Xi)| << N), thus simplifying the process (vanilla Gibbs sampling needs N-1 conditions for each variable). We can prove that Gibbs sampling generates samples from a Markov chain that is in detailed balance:

p(x) T(x → x') = p(x') T(x' → x)

where p is the target joint distribution and T(x → x') is the probability of the chain moving from state x to state x'.
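
This property can be checked numerically for each single-variable update. The following sketch assumes a toy joint distribution over two binary variables (the matrix p is purely illustrative) and verifies the condition for the step that resamples x1:

```python
import numpy as np

# Hypothetical joint distribution p(x1, x2) over two binary variables
p = np.array([[0.1, 0.3],
              [0.4, 0.2]])  # p[x1, x2]

# Conditional p(x1 | x2) used by the Gibbs step that resamples x1
p_x1_given_x2 = p / p.sum(axis=0, keepdims=True)

# Detailed balance: p(x1, x2) T(x1 -> x1' | x2) = p(x1', x2) T(x1' -> x1 | x2)
for x2 in (0, 1):
    for x1 in (0, 1):
        for x1_new in (0, 1):
            lhs = p[x1, x2] * p_x1_given_x2[x1_new, x2]
            rhs = p[x1_new, x2] * p_x1_given_x2[x1, x2]
            assert np.isclose(lhs, rhs)
print('Detailed balance holds for each single-variable Gibbs step')
```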

Therefore, the procedure converges to the unique stationary distribution. This algorithm is quite simple; however, its performance is not excellent, because the random walks are not tuned to explore the right regions of the state-space, where the probability of finding good samples is high. Moreover, the trajectory can also return to bad states, slowing down the whole process. An alternative (also implemented by PyMC3 for continuous random variables) is the No-U-Turn algorithm, which we don't discuss in this book. The reader interested in this topic can find a full description in The No-U-Turn Sampler: Adaptively Setting Path Lengths in Hamiltonian Monte Carlo, Hoffman M. D., Gelman A., arXiv:1111.4246.
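
For reference, the following sketch shows how PyMC3 ends up using this sampler; the model and the observed data are purely illustrative, and pm.sample selects NUTS automatically because all the unobserved random variables are continuous:

```python
import pymc3 as pm

# A minimal sketch: when no step method is specified, PyMC3 automatically
# assigns the NUTS sampler to continuous random variables. The model and
# the observed data below are purely illustrative.
with pm.Model() as model:
    mu = pm.Normal('mu', mu=0.0, sd=10.0)
    x = pm.Normal('x', mu=mu, sd=1.0, observed=[0.5, 1.2, 0.9])
    trace = pm.sample(2000, tune=1000)
```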
