Gibbs sampling

Let's suppose that we want to obtain the full joint probability of a Bayesian network, P(x1, x2, x3, ..., xN); however, the number of variables is large and the problem cannot easily be solved in closed form. Moreover, imagine that we would like to compute a marginal distribution, such as P(x2); to do so, we would have to integrate the full joint probability, which is an even harder task. Gibbs sampling allows us to approximate all the marginal distributions with an iterative process. If we have N variables, the algorithm proceeds with the following steps:

  1. Initialize the variable NIterations
  2. Initialize a vector S with shape (N, NIterations)
  3. Randomly initialize x1(0), x2(0), ..., xN(0) (the superscript index refers to the iteration)
  4. For t=1 to NIterations:
    1. Sample x1(t) from p(x1|x2(t-1), x3(t-1), ..., xN(t-1)) and store it in S[0, t]
    2. Sample x2(t) from p(x2|x1(t), x3(t-1), ..., xN(t-1)) and store it in S[1, t]
    3. Sample x3(t) from p(x3|x1(t), x2(t), ..., xN(t-1)) and store it in S[2, t]
    4. ...
    5. Sample xN(t) from p(xN|x1(t), x2(t), ..., xN-1(t)) and store it in S[N-1, t]

At the end of the iterations, the vector S will contain NIterations samples for each variable. As we need to determine the probabilities, it's necessary to proceed as in the direct sampling algorithm, counting the number of occurrences of each single value and normalizing by dividing by NIterations. If the variables are continuous, it's possible to consider intervals, counting how many samples fall into each of them.
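
To make the steps concrete, here is a minimal sketch in Python (with NumPy), under the assumption of a standard bivariate Gaussian target with correlation rho = 0.8, for which both full conditionals are available in closed form (the names S and n_iterations mirror the variables used in the steps above):

```python
import numpy as np

# A minimal sketch of the procedure above for N = 2 variables, assuming a
# standard bivariate Gaussian with correlation rho, so that both full
# conditionals are Gaussian and known in closed form:
#   x1 | x2 ~ N(rho * x2, 1 - rho^2)  (and symmetrically for x2 | x1)
rho = 0.8
n_iterations = 10000

# S[i, t] stores the sample of variable x_{i+1} at iteration t
S = np.zeros((2, n_iterations))

# Random initialization of x1(0) and x2(0)
x1, x2 = np.random.uniform(-1.0, 1.0, size=2)

sigma = np.sqrt(1.0 - rho ** 2)
for t in range(n_iterations):
    x1 = np.random.normal(rho * x2, sigma)  # sample from p(x1 | x2)
    x2 = np.random.normal(rho * x1, sigma)  # sample from p(x2 | x1)
    S[0, t], S[1, t] = x1, x2

# The variables are continuous, so the marginal p(x2) is approximated by
# counting how many samples fall into each interval (a normalized histogram)
hist, edges = np.histogram(S[1, :], bins=50, density=True)
```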

For small networks, this procedure is very similar to direct sampling, but with very large networks the sampling process could become slow; however, the algorithm can be simplified after introducing the concept of the Markov blanket of Xi, which is the set of random variables that are predecessors, successors, and successors' predecessors of Xi (in some books, the terms parents and children are used). In a Bayesian network, a variable Xi is conditionally independent of all the other variables given its Markov blanket. Therefore, if we define the function MB(Xi), which returns the set of variables in the blanket, the generic sampling step can be rewritten as p(xi|MB(Xi)), and there's no longer any need to consider all the other variables.

To understand this concept, let's consider the network shown in the following diagram:

Bayesian network for the Gibbs sampling example

The Markov blankets are:

  • MB(X1) = { X2, X3 }
  • MB(X2) = { X1, X3, X6 }
  • MB(X3) = { X1, X2, X4, X5 }
  • MB(X4) = { X3 }
  • MB(X5) = { X3 }
  • MB(X6) = { X2 }
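
Since the original diagram is not reproduced here, the following sketch assumes a hypothetical edge list (X1 → X2, X1 → X3, X2 → X3, X2 → X6, X3 → X4, X3 → X5) that is consistent with the blankets listed above, and shows how the three components of the definition (predecessors, successors, and successors' predecessors) combine:

```python
# A sketch of how Markov blankets can be derived from the structure of a
# Bayesian network. The edge list below is an assumption consistent with
# the blankets listed above (the original diagram is not reproduced here).
parents = {
    'X1': [],
    'X2': ['X1'],
    'X3': ['X1', 'X2'],
    'X4': ['X3'],
    'X5': ['X3'],
    'X6': ['X2'],
}

def markov_blanket(node):
    # Predecessors (parents) of the node
    blanket = set(parents[node])
    for other, other_parents in parents.items():
        if node in other_parents:
            # Successors (children) of the node...
            blanket.add(other)
            # ...and the successors' other predecessors
            blanket.update(p for p in other_parents if p != node)
    return blanket

for x in sorted(parents):
    print(x, '->', sorted(markov_blanket(x)))
# X1 -> ['X2', 'X3'], X2 -> ['X1', 'X3', 'X6'], ..., X6 -> ['X2']
```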

In general, if N is very large, the cardinality of MB(Xi) is much smaller than N (|MB(Xi)| << N), thus simplifying the process (vanilla Gibbs sampling needs N-1 conditions for each variable). We can prove that Gibbs sampling generates samples from a Markov chain that is in detailed balance:

p(x) T(x → x') = p(x') T(x' → x)

where p is the target joint distribution and T(x → x') is the probability of the chain moving from state x to state x'.
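
This property can be checked numerically for each single-variable update. The following sketch assumes a toy joint distribution over two binary variables (the matrix p is purely illustrative) and verifies the condition for the step that resamples x1:

```python
import numpy as np

# Hypothetical joint distribution p(x1, x2) over two binary variables
p = np.array([[0.1, 0.3],
              [0.4, 0.2]])  # p[x1, x2]

# Conditional p(x1 | x2) used by the Gibbs step that resamples x1
p_x1_given_x2 = p / p.sum(axis=0, keepdims=True)

# Detailed balance: p(x1, x2) T(x1 -> x1' | x2) = p(x1', x2) T(x1' -> x1 | x2)
for x2 in (0, 1):
    for x1 in (0, 1):
        for x1_new in (0, 1):
            lhs = p[x1, x2] * p_x1_given_x2[x1_new, x2]
            rhs = p[x1_new, x2] * p_x1_given_x2[x1, x2]
            assert np.isclose(lhs, rhs)
print('Detailed balance holds for each single-variable Gibbs step')
```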

Therefore, the procedure converges to the unique stationary distribution. This algorithm is quite simple; however, its performance is not excellent, because the random walks are not tuned to explore the right regions of the state-space, where the probability of finding good samples is high. Moreover, the trajectory can also return to bad states, slowing down the whole process. An alternative (also implemented by PyMC3 for continuous random variables) is the No-U-Turn algorithm, which we don't discuss in this book. The reader interested in this topic can find a full description in The No-U-Turn Sampler: Adaptively Setting Path Lengths in Hamiltonian Monte Carlo, Hoffman M. D., Gelman A., arXiv:1111.4246.
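
For reference, the following sketch shows how PyMC3 ends up using this sampler; the model and the observed data are purely illustrative, and pm.sample selects NUTS automatically because all the unobserved random variables are continuous:

```python
import pymc3 as pm

# A minimal sketch: when no step method is specified, PyMC3 automatically
# assigns the NUTS sampler to continuous random variables. The model and
# the observed data below are purely illustrative.
with pm.Model() as model:
    mu = pm.Normal('mu', mu=0.0, sd=10.0)
    x = pm.Normal('x', mu=mu, sd=1.0, observed=[0.5, 1.2, 0.9])
    trace = pm.sample(2000, tune=1000)
```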
