Search in book...
Toggle Font Controls
Create new playlist

Name your new playlist

Playlist description (optional)
Sign In

Email address

Password

Forgot Password?

or

Continue with Facebook

Continue with Google
Sign Up

Full Name

Email address

Confirm Email Address

Password

or

Continue with Facebook

Continue with Google

Chapter 10
Stochastic Processes

Package(s): sna

Dataset(s): testtpm, testtpm2, testtpm3

10.1 Introduction

Stochastic processes commonly refer to a collection of random variables. Such a collection may vary over time, sample number, geographical location, etc. It is to be noted that a stochastic process may consist of finite, countably infinite, or uncountably infinite collections of random variables. An important characteristic of the stochastic process is that it allows us to relax the assumption of independence for a sequence of random variables.

A few examples are in order.

Example 10.1.1. Examples of Stochastic Processes

We will consider situations where a stochastic process may be discrete or continuous.

The maximum temperature of the day. Let $c10-math-0003$ denote the maximum temperature recorded on the $c10-math-0004$ day. Then, the collection $c10-math-0005$ forms an example of a stochastic process.

Since a very hot day is unlikely to be immediately followed by a chilly day, it may be easily seen that the $c10-math-0006$ does not form an independent sequence of random variables.
The Closure Points of the SENSEX. Consider the points of closure at the end of the day of BSE-SENSEX. This collection is also an example of a stochastic process.
Sum of Heads in $c10-math-0007$ successive throws of a coin. Define

For $c10-math-0009$ , define the sum of heads as

10.2

Then, $c10-math-0011$ , is a collection of random variables which forms a stochastic process. Again, it may be easily seen that $c10-math-0012$ , and hence $c10-math-0013$ forms a sequence of dependent RVs.
Number of Customers arriving at a Ticket Counter. Let $c10-math-0014$ denote the number of customers arriving at a ticket counter up to time $c10-math-0015$ . Then $c10-math-0016$ is again a stochastic process. $c10-math-0017$

Section 10.2 will ensure that the probability space for the stochastic process is properly defined. A very important class of stochastic process is the Markov Chains, and this will be detailed in Section 10.3. The chapter will close with a brief discussion of how Markov Chains are useful in the practical area of computational statistics in Section 10.4.

10.2 Kolmogorov's Consistency Theorem

As seen in the introductory section, a stochastic process is a collection of infinite random variables. It needs to be ensured that the probability measures are well defined for such a collection of RVs. An affirmative answer in this direction is provided by Prof Kolmogorov. The related definitions and theorems (without proof) have been adapted from Adke and Manjunath (1984) and Athreya and Lahiri (2005). We consider the probability space $c10-math-0018$ and follow it up with a sequence of RVs $c10-math-0019$ . Here $c10-math-0020$ is any non-empty set. Then for any $c10-math-0021$ , the random vector $c10-math-0022$ has a joint probability distribution $c10-math-0023$ over $c10-math-0024$ , and here $c10-math-0025$ is the Borel $c10-math-0026$ -field over $c10-math-0027$ .

For any $c10-math-0030$ , and any $c10-math-0031$ , the family of fdds satisfies the following two conditions:

[C1].
10.3
[C2]. For any permutation $c10-math-0033$ of $c10-math-0034$ :
10.4

Kolmogorov's consistency theorem addresses the issue that if there exists a family of distributions $c10-math-0036$ in finite dimensional Euclidean spaces, then there exists a real valued stochastic process $c10-math-0037$ whose fdds coincides with $c10-math-0038$ .

Having being assured of the existence of probability measures for the stochastic processes, let us now look at the important family of stochastic processes: Markov Chains.

10.3 Markov Chains

In earlier chapters, we assumed the observations were independent. In many random phenomenon, the observations are not independent. As seen in the examples in Section 10.1, the maximum temperature of the current day may depend on the maximum temperature of the previous day. The sum of heads in $c10-math-0055$ trials depends on the corresponding sum in the first $c10-math-0056$ trials. Markov Chains are useful for tackling such dependent observations. In this section, we will be considering only discrete phenomenon.

To begin with, we need to define the state space associated with the sequence of random variables. The state space is the set of possible values taken by the stochastic process. In Example 10.1.1 of Section 10.1, the state space for the sum of heads in $c10-math-0057$ throws of a coin is $c10-math-0058$ . The state space considered in this section is at most countably infinite. At time $c10-math-0059$ , the stochastic process $c10-math-0060$ will take one of the values in $c10-math-0061$ , that is, $c10-math-0062$ .

This definition says that the probability of $c10-math-0069$ being observed in a state $c10-math-0070$ , given the entire history $c10-math-0071$ , depends only on the recent past state of $c10-math-0072$ , and not on the history. The matrix array $c10-math-0073$ is called the Transition Probability Matrix, abbreviated as TPM, of the Markov Chain.

Example 10.3.1. The Ehrenfest Model

Consider a system of two urns. Among the two urns, we have $c10-math-0074$ balls. At each instance, a ball is selected from one of the urns and placed in the other. Assume that there are $c10-math-0075$ balls in Urn I, and the remaining $c10-math-0076$ balls in Urn II. Then at any instance, the probability of selecting a ball from Urn I and placing it in Urn II is $c10-math-0077$ , and the other way of placing a ball from Urn II to Urn I is $c10-math-0078$ . At each instant we let the number $c10-math-0079$ of balls in the Urn I to be the state of the system. Thus, the state space is $c10-math-0080$ . Then we can pass from state $c10-math-0081$ only to either of the states $c10-math-0082$ or $c10-math-0083$ , and the transition probabilities are given by

10.6

Here, $c10-math-0085$ . Suppose that $c10-math-0086$ . Then the TPM of such an Ehrenfest model is given by

It can be shown that the above TPM $c10-math-0088$ forms a Markov Chain, that is, the definition 10.5 is satisfied for all $c10-math-0089$ . The Ehrenfest function is first defined here and obtained for the cases of $c10-math-0090$ . The program will be described following the brief R session.

> Ehrenfest <- function(n) {
+ States <- c(0, seq(1,2*n))
+ TPM <- matrix(0,nrow=length(States),ncol=length(States),dimnames=
+ list(seq(0,2*n),seq(0,2*n)))
+ tran_prob <- function(i,n) {
+ tranRow <- rep(0,2*n+1)
+ if(i==0) tranRow[2] <- 1
+ if(i==2*n) tranRow[(2*n+1)-1] <- 1
+ if(i!=0 & i!=2*n)  {
+ j=i+1
+ tranRow[j-1] <- i/(2*n)
+ tranRow[j+1] <- 1-i/(2*n)
+  }
+ return(tranRow)
+  }
+ for(j in 0:(2*n))TPM[j+1,] <- tran_prob(j,n)
+ return(TPM)
+  }
> Ehrenfest(2)
     0   1    2   3    4
0 0.00 1.0 0.00 0.0 0.00
1 0.25 0.0 0.75 0.0 0.00
2 0.00 0.5 0.00 0.5 0.00
3 0.00 0.0 0.75 0.0 0.25
4 0.00 0.0 0.00 1.0 0.00
> Ehrenfest(3)
       0      1      2      3      4      5      6
0 0.0000 1.0000 0.0000 0.0000 0.0000 0.0000 0.0000
1 0.1667 0.0000 0.8333 0.0000 0.0000 0.0000 0.0000
2 0.0000 0.3333 0.0000 0.6667 0.0000 0.0000 0.0000
3 0.0000 0.0000 0.5000 0.0000 0.5000 0.0000 0.0000
4 0.0000 0.0000 0.0000 0.6667 0.0000 0.3333 0.0000
5 0.0000 0.0000 0.0000 0.0000 0.8333 0.0000 0.1667
6 0.0000 0.0000 0.0000 0.0000 0.0000 1.0000 0.0000

The Ehrenfest function accepts $c10-math-0091$ as an input and returns the TPM for the Ehrenfest model associated with $c10-math-0092$ . The two objects are defined in this function; States and TPM. The object States generates the related state space $c10-math-0093$ . Essentially, the number of balls in Urn I varies from $c10-math-0094$ , and for each of the possible states, we need to obtain the transition probabilities to the other states according to Equation 10.6 . The R function tran_prob within the Ehrenfest function gives us this row probabilities. The transition probability row for the cases $c10-math-0095$ and $c10-math-0096$ is obtained in a routine manner. The if condition i!=0 & i!=2*n is coded to ensure that the computations on the right-hand side of Equation 10.6 are completely met. The indexing is defined in a slightly vague way with j=i+1. However, it should not be intriguing to reason out why it is required in this way, and especially a similar trick is used in the loop for(j in 0:(2*n)) TPM[j+1,]. The function Ehrenfest is defined in a way which will return the TPM for the Ehrenfest model, and Ehrenfest(2) and Ehrenfest(3) give us the required TPMs. $c10-math-0097$

In the next subsection, we will consider how the states of a Markov Chain can be classified into a meaningful set of various characteristics.

10.3.1 The m-Step TPM

The TPM $c10-math-0103$ is a one-step transition probability array, namely, its elements give the probability of moving from state $c10-math-0104$ to state $c10-math-0105$ in the next step. The $c10-math-0106$ -step transition probability of a movement from state $c10-math-0107$ to state $c10-math-0108$ is defined by

10.7

Let $c10-math-0110$ denote the $c10-math-0111$ -step transition probability matrix. It can be shown that

10.8

Equation 10.8 is based on the well-known Chapman-Kolmogorov equation. The Chapman-Kolmogorov lemma says that for any $c10-math-0113$

We can easily use the Chapman-Kolmogorov relationship to obtain the $c10-math-0115$ -step TPM of a Markov Chain. The R program for obtaining the $c10-math-0116$ -step TPM through msteptpm is given in the next illustration.

Example 10.3.3. The Ehrenfest Model

Consider the Ehrenfest model for $c10-math-0117$ balls. The 4-step TPM for the Ehrenfest Markov model can be easily obtained using R. This is done as follows:

> msteptpm <- function(TPM,m){
+ if(m==1) return(TPM) else {
+ temp <- TPM
+ for(i in 1:(m-1)) temp=temp
+ return(temp)
+ }
+ }
> EF2 <- Ehrenfest(2)
> msteptpm(as.matrix(EF2),4)
        0      1    2      3       4
0 0.15625 0.0000 0.75 0.0000 0.09375
1 0.00000 0.5312 0.00 0.4688 0.00000
2 0.12500 0.0000 0.75 0.0000 0.12500
3 0.00000 0.4688 0.00 0.5312 0.00000
4 0.09375 0.0000 0.75 0.0000 0.15625
> EF2

The for loop in the msteptmp function simply carries out the required computation required in Equation 10.8. The interpretations are as follows. If we begin in Urn I with 0 balls, the probability of Urn I containing 1 to 4 balls after four steps is 0.0000, 0.75, 0.0000, and 0.09375 respectively. Similarly, the probabilities in the rest of the matrix $c10-math-0118$ may be interpreted. $c10-math-0119$

10.3.2 Classification of States

In the Ehrenfest example we see that it is possible to move from a particlar state to one of its adjacent states only. However, it is possible for us to move from state 1 to states 3 and 4 in 2 and 3 steps. The question that then arises is how to identify the accessible states? Towards this discussion, a few definitions are required.

Accessibility is denoted by $c10-math-0124$ .

Communication between two states is denoted by $c10-math-0127$ . The collection of the states where each communicates with the other is said to belong to the same class. Some properties of communication, without proof, are listed below:

$c10-math-0128$ .
If $c10-math-0129$ , then $c10-math-0130$ .
If $c10-math-0131$ and $c10-math-0132$ , then $c10-math-0133$ .

Irreducible Markov Chains are also called ergodic Markov Chains. A stronger requirement than irreducibility is given next.

It can be easily seen that the presence of an absorbing state implies that the Markov Chain is neither regular nor irreducible.

A state $c10-math-0140$ is said to have period $c10-math-0141$ if $c10-math-0142$ whenever $c10-math-0143$ is not divisible by $c10-math-0144$ and $c10-math-0145$ is the greatest integer with this property. The period of a state is denoted by $c10-math-0146$ . If it is not possible to return to state $c10-math-0147$ and be retained in $c10-math-0148$ (starting from state $c10-math-0149$ of course), the period of the state $c10-math-0150$ is infinite. On the other hand, if a state has period 1, we call that state aperiodic.

Digraph is a powerful visualizing tool for understanding the accessible and communicating states of a Markov Chain. Here, we use the package sna for achieving this. Note that this package is developed for the purpose of an emerging field Social Network Analysis. The gplot function from this package is useful for our purpose though.

Example 10.3.4. Digraph Plots for Markov Chains

Contd. The TPM of Markov Chains may be visualized using digraphs. In R, we use the gplot graph function of the sna package to achieve this. We plot the digraphs of the Ehrenfest Markov model, and some three artificial TPMs. The R program next gives the necessary codes.

> library(sna,quietly=TRUE)
> ehrenfest <- Ehrenfest(2)
> rownames(ehrenfest) <- colnames(ehrenfest)=0:4
> ehrenfest
     0   1    2   3    4
0 0.00 1.0 0.00 0.0 0.00
1 0.25 0.0 0.75 0.0 0.00
2 0.00 0.5 0.00 0.5 0.00
3 0.00 0.0 0.75 0.0 0.25
4 0.00 0.0 0.00 1.0 0.00
> data(testtpm)
> rownames(testtpm) <- colnames(testtpm)
> testtpm
       A    B     C    D    E      F
A 1.0000 0.00 0.000 0.00 0.00 0.0000
B 0.2500 0.50 0.000 0.25 0.00 0.0000
C 0.0000 0.00 0.000 1.00 0.00 0.0000
D 0.0625 0.25 0.125 0.25 0.25 0.0625
E 0.0000 0.00 0.000 0.25 0.50 0.2500
F 0.0000 0.00 0.000 0.00 0.00 1.0000
> data(testtpm2)
> rownames(testtpm2) <- colnames(testtpm2)
> testtpm2
  A B    C    D   E    F
A 0 1 0.00 0.00 0.0 0.00
B 1 0 0.00 0.00 0.0 0.00
C 0 0 0.25 0.45 0.0 0.30
D 0 0 0.00 0.50 0.5 0.00
E 0 0 0.25 0.00 0.5 0.25
F 0 0 0.00 0.00 0.3 0.70
> data(testtpm3)
> rownames(testtpm3) <- colnames(testtpm3)
> testtpm3
    A   B   C   D   E   F   G
A 0.3 0.1 0.2 0.2 0.1 0.1 0.0
B 0.0 0.5 0.0 0.0 0.0 0.0 0.5
C 0.0 0.0 0.4 0.6 0.0 0.0 0.0
D 0.0 0.0 0.3 0.2 0.0 0.5 0.0
E 0.0 0.0 0.2 0.3 0.4 0.1 0.0
F 0.0 0.0 1.0 0.0 0.0 0.0 0.0
G 0.0 0.8 0.0 0.0 0.0 0.0 0.2
> par(mfrow=c(2,2))
> gplot(ehrenfest,diag=TRUE,vertex.cex=6,vertex.sides=4,
+ vertex.col=1:5,vertex.border=2:6,vertex.rot=(0:4)*100,
+ displaylabels=TRUE, main="A: Digraph for Ehrenfest Model")
> gplot(testtpm,diag=TRUE,vertex.cex=6,vertex.sides=4,vertex.col=1:6,
+ vertex.border=2:7,vertex.rot=(0:5)* 100,displaylabels=TRUE,
+ main="B: Digraph for testtpm")
> gplot(testtpm2,diag=TRUE,vertex.cex=6,vertex.sides=4,
+ vertex.col=1:6,vertex.border=2:7,vertex.rot=(0:5)*100,
+ displaylabels=TRUE, main="C: Digraph for testtpm2")
> gplot(testtpm3,diag=TRUE,vertex.cex=6,vertex.sides=4,
+ vertex.col=1:7, vertex.border=2:8,vertex.rot=(0:6)*100,
+ displaylabels=TRUE, main="D: Digraph for testtpm3")

The digraphs helps us to understand the nature of states, Figure 10.1. We can classify the states into transition states and recurrent states. Let us discuss the digraph in detail.

Figure 10.1 Digraphs for Classification of States of a Markov Chain

In the Ehrenfest model we see that $c10-math-0151$ , $c10-math-0152$ , $c10-math-0153$ , and $c10-math-0154$ . Hence, we can see by the commutative property of accessible states that all the states communicate with each other. Thus, we have an irreducible Markov Chain. In an irreducible Markov Chain, all the states are recurrent states (a proof is not in the scope of this text).

For the example where the TPM is given by testtpm, we make the following observations. The pair of communicative states are the following: $c10-math-0155$ , $c10-math-0156$ , $c10-math-0157$ , and consequently these states are accessible from each other. However, it is clear from the diagraph that states $c10-math-0158$ and $c10-math-0159$ are absorbing states as no other state is accessible from them. Thus testtpm is not a regular or irreducible Markov Chain.

In the case of testtpm2, it is easily seen to be decomposible into two Markov Chains. One Markov Chain consists of states $c10-math-0160$ and $c10-math-0161$ , and the other $c10-math-0162$ . In light of the fact that $c10-math-0163$ , we can easily conclude the periodicity of the two states is 2. Interpretation for testtpm3 is left as an exercise. $c10-math-0164$

10.3.3 Canonical Decomposition of an Absorbing Markov Chain

In the previous discussion we have seen different types of TPM characteristics for the Markov Chains. In an irreducibile Markov Chain, all the states are recurrent states. However, if there is an absorbing state, as in testtpm, we may be interested in the following issues:

1. The probability of the process ending up in an absorbing state.
2. The average time for the Markov Chain to get absorbed.
3. The average time spent in each of the transient states.

The canonical decomposition helps with the answer to the above questions.

Arrange the states of an absorbing Markov Chain in the form of (TRANSIENT, ABSORBING). For example, reorder the states of testtpm2 as $c10-math-0165$ . Let $c10-math-0166$ be the number of absorbing states and $c10-math-0167$ the number of transient states. The total number of states is $c10-math-0168$ . Arrange the TPM as below:

10.9

Here $c10-math-0170$ is an $c10-math-0171$ identity matrix, $c10-math-0172$ an $c10-math-0173$ zero matrix, $c10-math-0174$ a $c10-math-0175$ matrix, and $c10-math-0176$ a non-zero $c10-math-0177$ matrix. Let us closely look at the matrix $c10-math-0178$ and calculate $c10-math-0179$ for some large $c10-math-0180$ . Note that it is very easy to rearrange the matrix in the required form in R.

> testtpm <- as.matrix(testtpm)
> testtpm <- testtpm[c(2,3,4,5,1,6),c(2,3,4,5,1,6)]
> Q <- testtpm[c(1:4),c(1:4)]
> R <- testtpm[c(1:4),c(5,6)]
> Q
     B     C    D    E
B 0.50 0.000 0.25 0.00
C 0.00 0.000 1.00 0.00
D 0.25 0.125 0.25 0.25
E 0.00 0.000 0.25 0.50
> R
       A      F
B 0.2500 0.0000
C 0.0000 0.0000
D 0.0625 0.0625
E 0.0000 0.2500
> testtpm
     B     C    D    E      A      F
B 0.50 0.000 0.25 0.00 0.2500 0.0000
C 0.00 0.000 1.00 0.00 0.0000 0.0000
D 0.25 0.125 0.25 0.25 0.0625 0.0625
E 0.00 0.000 0.25 0.50 0.0000 0.2500
A 0.00 0.000 0.00 0.00 1.0000 0.0000
F 0.00 0.000 0.00 0.00 0.0000 1.0000
> msteptpm(testtpm,n=100)[c(1:4),c(1:4)]
             B            C            D            E
B 1.635836e-10 3.124169e-11 2.022004e-10 1.635836e-10
C 2.499335e-10 4.773305e-11 3.089348e-10 2.499335e-10
D 2.022004e-10 3.861685e-11 2.499335e-10 2.022004e-10
E 1.635836e-10 3.124169e-11 2.022004e-10 1.635836e-10

We can easily then see that

10.10

The Fundamental Matrix for an absorbing Markov chain is given by

10.11

The elements $c10-math-0183$ of $c10-math-0184$ gives the expected number of times the process will be in transient state $c10-math-0185$ if the process started in state $c10-math-0186$ . For the testtpm example, we have the answers in the output given below:

> N <- solve(diag(rep(1,nrow(Q)))-Q)
> N
          B         C        D         E
B 2.6666667 0.1666667 1.333333 0.6666667
C 1.3333333 1.3333333 2.666667 1.3333333
D 1.3333333 0.3333333 2.666667 1.3333333
E 0.6666667 0.1666667 1.333333 2.6666667

Starting from a transient state $c10-math-0187$ , the expected number of steps before the Markov Chain is absorbed and the probabilities of it being absorbed into one of the absorbing states are respectively given by

10.12

10.13

where $c10-math-0190$ is an $c10-math-0191$ column of 1 and $c10-math-0192$ and $c10-math-0193$ are respectively the expected number of steps and probabilities of the absorption matrix. For our dummy example, the computation concludes with the program below.

> t <- N
> t
      [,1]
B 4.833333
C 6.666667
D 5.666667
E 4.833333
> B <- N %*% R
> B
     A    F
B 0.75 0.25
C 0.50 0.50
D 0.50 0.50
E 0.25 0.75

The three questions asked about an absorbing Markov Chain are also applicable to an Ergodic Markov Chain in a slightly different sense. This topic will be taken up next.

10.3.4 Stationary Distribution and Mean First Passage Time of an Ergodic Markov Chain

If we have an ergodic Markov Chain, we know that each state will be visited infinitely often. However, this implies that over the long run, the number of times it will be in a given state may be obtained.

A stationary distribution $c10-math-0197$ is a (left) eigenvector of the TPM whose associated eigenvalue is equal to one. For an ergodic Markov Chain, the next program gives us the stationary distribution.

> stationdistTPM <- function(M){
+       eigenprob <- eigen(t(M))
+       temp <- which(round(eigenprob$values,1)==1)
+       stationdist <- eigenprob$vectors[,temp]
+       stationdist <- stationdist/sum(stationdist)
+       return(stationdist)
+ }
> P <- matrix(nrow=3,ncol=3) # An example
> P[1,] <- c(1/3,1/3,1/3)
> P[2,] <- c(1/4,1/2,1/4)
> P[3,] <- c(1/6,1/3,1/2)
> stationdistTPM(P)
[1] 0.24 0.40 0.36

The function uses the eigen function to obtain the eigenvalues and eigenvectors.

The mean recurrence time of an ergodic Markov Chain is given by

10.16

For the previous example:

> 1/stationdistTPM(P)
[1] 4.166667 2.500000 2.777778

We will next consider the concept of passage time.

The matrix of mean first passage time is denoted by $c10-math-0207$ . We will next briefly state the formulas to obtain $c10-math-0208$ . Let $c10-math-0209$ be a matrix where each row consists of the stationary probability vector. Define $c10-math-0210$ , where $c10-math-0211$ is an identity matrix, and $c10-math-0212$ is a matrix where each row is the stationary probability vector $c10-math-0213$ . The elements of $c10-math-0214$ are then given by

For the Ehrenfest model, the mean recurrence times are given below:

> ehrenfest <- as.matrix(ehrenfest)
> w <- stationdistTPM(ehrenfest)
> W <- matrix(rep(w,each=nrow(ehrenfest)),nrow=nrow(ehrenfest))
> Z <- solve(diag(rep(1,nrow(ehrenfest)))-ehrenfest+W)
> M <- ehrenfest*0
> for(i in 1:nrow(ehrenfest)){
+     for(j in 1:nrow(ehrenfest)){
+ M[i,j] <- (Z[j,j]-Z[i,j])/W[j,j]
+ }
+ }
> M
         0        1        2        3        4
0  0.00000 1.000000 2.666667 6.333333 21.33333
1 15.00000 0.000000 1.666667 5.333333 20.33333
2 18.66667 3.666667 0.000000 3.666667 18.66667
3 20.33333 5.333333 1.666667 0.000000 15.00000
4 21.33333 6.333333 2.666667 1.000000  0.00000

For details, refer to Chapter 11 of Grinstead and Snell (2002).

10.3.5 Time Reversible Markov Chain

Consider a stationary and ergodic Markov Chain with stationary distribution $c10-math-0216$ .

In simple words, for a stationary ergodic Markov Chain with $c10-math-0222$ , the backward chain $c10-math-0223$ is also a Markov Chain. An intuitive explanation of the phenomenon is that given the present state, the past and future states are independent events. The TPM of the reversed Markov Chain is given by $c10-math-0224$ :

10.17

The Gamblers random walk in a finite state space is an example of time reversible Markov Chain.

$c10-math-0226$

10.4 Application of Markov Chains in Computational Statistics

Modern computations are driven by hi-speed computers and without the latter some of the algorithms cannot be put to good use. In this section two of the famous Monte Carlo techniques will be discussed whose premise is in the usage of Markov Chains, viz., the Metropolis-Hastings algorithm and the Gibbs sampler. The current section relies on Chapter 10 of Ross (2006).¹ Robert and Casella (1999–2004) is also an excellent exposition for the two algorithms to be discussed here.

10.4.1 The Metropolis-Hastings Algorithm

Consider a finite sequence of positive numbers $c10-math-0227$ , for some large integer $c10-math-0228$ . The positive numbers may be interpreted as the weights of an RV $c10-math-0229$ taking the values $c10-math-0230$ . Define $c10-math-0231$ , and suppose that $c10-math-0232$ is a difficult number to compute. The PMF of $c10-math-0233$ is then given by

10.18

It will be seen in Chapter 11 that for large $c10-math-0235$ values, simulation from the probability distribution $c10-math-0236$ becomes a daunting task. The Metropolis-Hastings algorithm builds a time-reversible Markov Chain argument for simulation from $c10-math-0237$ . The requirement is then to find a Markov Chain with TPM $c10-math-0238$ , which is easier to simulate and its stationary distribution must be the same as $c10-math-0239$ . Let $c10-math-0240$ represent the TPM of an irreducible time-reversible Markov Chain. A Markov Chain $c10-math-0241$ , useful for simulation from $c10-math-0242$ , is set up as follows. Suppose that the current state is $c10-math-0243$ , that is, $c10-math-0244$ . Generate an RV $c10-math-0245$ with PMF $c10-math-0246$ , $c10-math-0247$ . Then $c10-math-0248$ is assigned the state $c10-math-0249$ with probability $c10-math-0250$ , or the state $c10-math-0251$ with probability $c10-math-0252$ . The simulation problem is solved if we can determine these $c10-math-0253$ probabilities. The TPM $c10-math-0254$ of the Markov Chain should thus satisfy the following condition:

10.19

10.20

The Markov Chain with TPM $c10-math-0257$ is a time reversible Markov Chain if

This relationship is satisfied for the choice of $c10-math-0259$ given by

10.21

The reader should verify the need of 1 in Equation 10.21! The Metropolis-Hastings algorithm for generation of a Markov Chain $c10-math-0261$ can be summarized as follows:

1. Select a time-reversible irreducible Markov Chain $c10-math-0262$ .
2. Choose an integer $c10-math-0263$ .
3. Set $c10-math-0264$ , and generate an RV $c10-math-0265$ such that $c10-math-0266$ .
4. Generate a random number $c10-math-0267$ between 0 and 1, see Section 11.2, and set $c10-math-0268$ if
10.22

else, $c10-math-0270$ .
5. Set $c10-math-0271$ and return to Step 2.

The quantity $c10-math-0272$ is called the Metropolis-Hastings acceptance probability.

Alternately, the Metropolis-Hastings algorithm can be stated for the continuous RVs case too, see Chapter 7 of Robert and Casella (1999–2004) for more details. Assume that $c10-math-0273$ represents the pdf of interest and that a conditional density $c10-math-0274$ is available, which is a dominating measure with respect to $c10-math-0275$ . The Metropolis-Hastings algorithm can be implemented in practice if the two conditions hold: (i) the density $c10-math-0276$ is known to the extent that the ratio $c10-math-0277$ is known up to a constant which is independent of $c10-math-0278$ , and (ii) the density $c10-math-0279$ is either explicitly available or symmetric in the sense of $c10-math-0280$ . The density $c10-math-0281$ is known as the target density, while $c10-math-0282$ is called the instrumental or proposal density. The Metropolis-Hastings algorithm starting with $c10-math-0283$ is then given by

1. Simulate $c10-math-0284$ .
2. Simulate $c10-math-0285$ as follows:
10.23

where

The Metropolis-Hastings algorithm will be illustrated following a discussion of the Gibbs sampler.

10.4.2 Gibbs Sampler

The Gibbs sampler is a particular case of the Metropolis-Hastings algorithm. However, its intuitive and appealing steps have made it more popular and thus wide applications are carried using it. The algorithm description is as follows.

Suppose $c10-math-0288$ is a (discrete) random vector with probability measure $c10-math-0289$ . Assume that the measure $c10-math-0290$ is specified up to a constant, that is, $c10-math-0291$ , where $c10-math-0292$ is a multiplicative constant. The Gibbs sampler deals with the problem of generating an observation from $c10-math-0293$ . The Gibbs sampler essentially uses the Metropolis-Hastings algorithm with the state space $c10-math-0294$ as $c10-math-0295$ . The transition probabilities in this state space are set up as follows. Assume that the present state is $c10-math-0296$ . A coordinate of the state space $c10-math-0297$ is selected at random, that is, an observation from the index 1, 2,…, $c10-math-0298$ , is selected as a sample from a discrete uniform distribution with $c10-math-0299$ number of points. The main assumption of the Gibbs sampler is that for any state $c10-math-0300$ and values $c10-math-0301$ , a random variable $c10-math-0302$ can be simulated with pmf

10.24

Now, the coordinate $c10-math-0304$ is selected at random and using the elements of $c10-math-0305$ an observation with value $c10-math-0306$ is simulated which will replace the previous $c10-math-0307$ value. Thus, the new state is $c10-math-0308$ . In other words, the Gibbs sampler uses the Metropolis-Hastings algorithm where

10.25

The necessity of the stationary distribution to be $c10-math-0310$ requires that the new vector $c10-math-0311$ be accepted as the new state with probability

10.26

Applications of the Metropolis-Hastings algorithm and Gibbs sampler will be described in the next subsection.

10.4.3 Illustrative Examples

Three examples for each of the algorithms will be discussed briefly. As a fitting tribute to the inventor of the algorithm, the next example of a random walk generation will be discussed, which was originally illustrated by Hastings (1970) in his breakthrough paper. The first two examples are from Robert and Casella (2004).

Example 10.4.1. Random Walk Generation

Random walk/Gamblers walk is seen in Section 10.3 as an important example of a Markov Chain. Consider the continuous version of random walk where the sequence of RVs $c10-math-0313$ is given by

Here, $c10-math-0315$ 's are generated independently of the $c10-math-0316$ 's and $c10-math-0317$ . In the original work, Hastings considers a uniform distribution on $c10-math-0318$ as the proposal density, see Robert and Casella (2004). Then, the probability of acceptance is

The R program is thus set up in the following.

> yMH <- 0
> Trials <- 15000
> delta <- .1
> for(i in 2:Trials){
+ z <- runif(1,-delta,delta)
+ alpha <- min(1,exp((yMH[i-1]ˆ2-zˆ2)/2))
+ wp <- runif(1)
+ ifelse(wp<alpha,yMH[i]<-z,yMH[i]<-yMH[i-1])
+ }
> plot.ts(yMH,main="A: Gamblers Walk")

The random walk begins at 0, yMH <- 0, and the proposal density is initiated as the uniform distribution with $c10-math-0320$ , runif(1,-delta,delta).² Now, at each (next) iteration of the random walk, a point z is generated in this interval and the Metropolis-Hastings acceptance probability alpha is calculated. A random observation is then generated in the unit interval and if the acceptance probability is higher than this number wp<alpha, the random walk is updated to that of yMH[i]<-z, or else the previous iteration yMH[i-1] is retained. The result of the program is given in Part A of Figure 10.2.

Figure 10.2 Metropolis-Hastings Algorithm in Action

Example 10.4.2. Simulation from Gamma Distribution

Simulation from gamma distribution is always a complex task, that is, until we resort to built-in modules. To simulate observations from Gamma $c10-math-0321$ , the Metropolis-Hastings algorithm is set up as follows. Assume that $c10-math-0322$ , where $c10-math-0323$ denotes the integer part of the argument. Next, generate $c10-math-0324$ . The Metropolis-Hastings algorithm for simulation from gamma distribution is given by the following:

1. Simulate an observation $c10-math-0325$ .
2. Set the next value of observation as follows:
where

The R program which implements this Metropolis-Hastings algorithm is given next.

> theta <- 2.3; k <- 1
> b <- floor(theta)/theta
> yMH <- 1/theta
> Trials <- 1e5
> for(i in 2:Trials){
+ z <- rgamma(1,shape=b,rate=k)
+ alpha <- min(1,((yMH[i-1]/z)*exp((z-yMH[i-1])/theta))ˆ(theta-floor(theta)))
+ wp <- runif(1)
+ ifelse(wp<alpha,yMH[i]<-z,yMH[i]<-yMH[i-1])
+ }
> hist(yMH,prob=TRUE,main="B: Gamma RV Simulation")
> curve(dgamma(x,shape=b,rate=theta),add=TRUE)

The algorithm is implemented as follows. First, the values of $c10-math-0328$ and $c10-math-0329$ are arbitrarily chosen as theta <- 2.3 and k <- 1. An initial value of yMH is selected as $c10-math-0330$ . The values of z and yMH[i] are easily seen to follow the steps of the Metropolis-Hastings algorithm. The actual gamma curve is imposed to check if the simulated values of the $c10-math-0331$ in the Metropolis-Hastings algorithm is a good sample or not. The final result is shown in Part B of Figure 10.2. $c10-math-0332$

Next, the use of Gibbs sampler will be illustrated.

Example 10.4.3. Generating n Random Points at d Distance in a Circle

Some of the probability problems are indeed interesting! Suppose that we are asked to find the probability that if a sample of $c10-math-0333$ points are randomly selected from a circle with unit radius, what is the probability that each of the points is at least at a $c10-math-0334$ distance from the other points? Though this problem can be completely solved using the Gibbs sampler, the problem of obtaining such a sample is discussed here. The answer may be obtained easily though and it is an exercise for the reader.

Now, without loss of generality, the circle may be assumed to have 0 as its origin. The Gibbs sample for obtaining the required sample distribution is implemented in the following algorithm:

1. Initialize $c10-math-0335$ points such that no two points are within a distance of $c10-math-0336$ .
2. Select one of the points from 1– $c10-math-0337$ at random, say $c10-math-0338$ .
3. Simulate a point at random in the circle.
4. If the simulated point is within a distance of $c10-math-0339$ with respect to any of the $c10-math-0340$ points except the $c10-math-0341$ th one, reject the simulated observation.
5. If the simulated point is not within $c10-math-0342$ distance of any of the $c10-math-0343$ points, replace the $c10-math-0344$ th point with the simulated point.
6. Repeat the above process a large number of times, say $c10-math-0345$ .
The final collection of $c10-math-0346$ points has the desired probability. The next program codes the above algorithm. First, a circle is drawn with the $c10-math-0347$ initial points and all the points are given at the end of the algorithm.

> radius <- 1
> ddist <- 0.1
> n <- floor(2*pi/ddist)
> GetRandomPoint <- function(radius){
+ rr <- runif(1,0,radius)
+ theta <- runif(1,0,2*pi)
+ return(c(rr*sin(theta),rr*cos(theta)))
+ }
> # Testing the working of GetRandomPoint
> windows(width=10,height=10)
> theta <- seq(0,2*pi,length.out=200)
> plot(radius*sin(theta),radius*cos(theta),"l",xlab="x",ylab="y")
> abline(h=c(-1,1),v=c(-1,1))
> InitialPoints <- cbind(radius*sin(seq(1:n)*(1/(2*pi))),radius*cos(seq(1:n)
+ *(1/(2*pi))))
> points(InitialPoints,col="red")
> # Gibbs sampling
> Trials <- 1000
> testCounter <- 0
> for(i in 1:Trials){
+ CurrPoints <- InitialPoints
+ testPoint <- GetRandomPoint(radius)
+ testIndex <- sample(1:n,1)
+ CurrPoints <- CurrPoints[-testIndex,]
+ CurrPoints <- rbind(testPoint,CurrPoints)
+ if(min(as.matrix(dist(CurrPoints,upper=TRUE))[1,-1])>=ddist){
+ InitialPoints[testIndex,] <- testPoint
+ testCounter <- testCounter+1
+ }
+ }
> points(InitialPoints,col="green")
> min(dist(CurrPoints))
[1] 0.1002729

If the distance $c10-math-0348$ is given, we can generate at most n <- floor(2*pi/ddist) points on the circumference of the circle. The function GetRandomPoint generates a random point within the unit circle around the origin. This is easily seen since rr <- runif(1,0,radius) has a random distance of less than or equal to the radius, while a random angle is generated with theta <- runif(1,0,2*pi). It is easy to see that the polar transformation leads to a random point in the unit circle. A circle is first generated on the graphics window, and its explanation may be followed by the reader. The InitialPoints are then plotted on the circumference of the circle and each point is colored as red.

The Gibbs sampler implementation is now explained. It is a difficult proposition to simulate the $c10-math-0349$ initial points and hence a deterministic choice is made here. Thus, the points are initialized with InitialPoints, where radius*sin(seq(1:n)*(1/(2*pi))) gives the $c10-math-0350$ -coordinate and radius*cos(seq(1:n)*(1/(2*pi))) returns the $c10-math-0351$ -coordinate. The Gibbs sampler implementation is indeed easier to follow here. The result is shown in Figure 10.3. $c10-math-0352$

Figure 10.3 Gibbs Sampler in Action

Example 10.4.4. Exponential RVs with Sum Greater than some c

Suppose $c10-math-0353$ . Define $c10-math-0354$ . The problem is to generate the random vector $c10-math-0355$ given that the event $c10-math-0356$ has been observed, where $c10-math-0357$ is a large positive number. The Gibbs sampling algorithm to solve this problem is given in the following:

1. Initialize a vector $c10-math-0358$ such that each $c10-math-0359$ and $c10-math-0360$ .
2. Select an observation at random, that is, choose an integer from 1, 2, …, $c10-math-0361$ . Suppose that the integer chosen is $c10-math-0362$ .
3. Simulate an observation from exponential distribution with rate $c10-math-0363$ , say $c10-math-0364$ , and set, see page 254 of Ross (2006), the value of $c10-math-0365$ as
where $c10-math-0367$ takes the value $c10-math-0368$ if $c10-math-0369$ and 0 otherwise.
4. Repeat the previous steps a large number of times.

The expression for $c10-math-0370$ needs an explanation. When $c10-math-0371$ , we need to simulate an exponential observation $c10-math-0372$ with rate $c10-math-0373$ conditional on the event $c10-math-0374$ . Equivalently, we need to simulate an observation of $c10-math-0375$ which is conditional on the event $c10-math-0376$ . A useful result here is that if an exponential RV is greater than a positive constant, then it is distributed as an exponential distribution (with rate $c10-math-0377$ ) plus that constant. Hence, the value of $c10-math-0378$ is chosen as that given in the above equation.

The implementation of the program is given in the following:

> # Exponential Sum S = sum_i=ˆn X_i> c
> n <- 15
> constant <- 200
> rate <- 1/(1:n)
> x0 <- sort((constant*rate)/sum(rate))
> Trials <- 1e5
> xGS <- x0
> for(i in 1:Trials){
+ currIndex <- sample(1:n,1)
+ Sum <- sum(xGS[-currIndex])
+ xGS[currIndex] <- max(constant-Sum,0)-log(runif(1))/rate[currIndex]
+ }
> xGS
 [1]   1.33068757   0.07708722     13.07502344   9.00714096
[12]   1.50248658  21.37157461  21.83680094 104.26333132
> 1/rate
 [1]  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15

In the above program the average of the $c10-math-0379$ -th exponential RV is $c10-math-0380$ . The program is again simple to follow and hence it is left to the reader to interpret it. $c10-math-0381$

The next problem is a continuation of the exponential RVs sum.

Example 10.4.5. Probability of Product of Exponential RVs

Suppose that $c10-math-0382$ , and that $c10-math-0383$ . Assume that the sum is observed to be 15 and the problem is to find the probability that the product exceeds 120, that is, we need to compute

This example is again adapted from Ross (2006), page 260. To obtain this probability, the Gibbs sampler will randomly select two observations from $c10-math-0385$ . Let us denote the two selected observations as $c10-math-0386$ and $c10-math-0387$ , which are independent with rates $c10-math-0388$ and $c10-math-0389$ . Without loss of generality, assume that $c10-math-0390$ . Then, the conditional distribution of $c10-math-0391$ given $c10-math-0392$ is

Thus, the conditional distribution of $c10-math-0394$ is again an exponential distribution with rate $c10-math-0395$ , which is conditioned to be less than $c10-math-0396$ . Hence, the conditional distribution of $c10-math-0397$ and $c10-math-0398$ , given the other values is the conditional distribution of two exponential distributions with rate $c10-math-0399$ and $c10-math-0400$ given that their sum is $c10-math-0401$ , where $c10-math-0402$ indicates that the elements $c10-math-0403$ and $c10-math-0404$ are removed from $c10-math-0405$ .Now, the values of $c10-math-0406$ and $c10-math-0407$ can be simulated by first simulating an observation from $c10-math-0408$ , which is conditional on $c10-math-0409$ and setting the value of $c10-math-0410$ to meet the constraint $c10-math-0411$ . This process is then repeated a large number of times, so that the proportion of times the product exceeds 120 is accepted as the required probability. The next R program implements this Gibbs sampling technique.

> itexp <- function(u, m, t) { -log(1-u*(1-exp(-t*m)))/m }
> rtexp <- function(n, m, t) { itexp(runif(n), m, t) }
> # http://www.r-bloggers.com/r-help-follow-up-truncated-exponential/
> rate <- 1/(1:5)
> x0 <- c(1.08,2.38,2.84,3.84,4.86)
> Sx0 <- sum(x0); Px0 <- 120
> Trials <- 1000; prodYes <- 0
> for(i in 1:Trials){
+ twoIndex <- sort(sample(1:5,2))
+ a <- sum(x0)-sum(x0[-twoIndex])
+ x0[twoIndex[1]] <- rtexp(1,rate[twoIndex[2]-rate[twoIndex[1]]],a)
+ x0[twoIndex[1]] <- Sx0-x0[twoIndex[1]]-sum(x0[-twoIndex])
+ if(prod(x0)>Px0) prodYes <- prodYes+1
+ }
> prob_beta <- prodYes/Trials
> prob_beta
[1] 0.317

Two functions are required to simulate observations from the truncated exponential distribution and these programs have been adapted from the link given in the program. The initial vector x0 is quite arbitrary. The twoIndex randomly draws two observations, and a performs the computation as described earlier. The reader can now complete the remainder of the description, and based on the 1000 trials, the estimated probability is 0.317.

The purpose of this section has been to introduce the two algorithms discussed here. However, it must be said that there is much more detail to the use and applications of these algorithms than even indicated here. However, it is the stochastic process part of these algorithms which is of interest here. The applications of these techniques to Bayesian inference will be considered in the next chapter.

$c10-math-0412$

10.5 Further Reading

Feller's two volumes are again useful for a host of theories and applications of stochastic processes. Doob (1953) is the first book on stochastic processes. Karlin and Taylor's (1975, 1981) two volumes have been a treatise on this subject. Taylor and Karlin (1998) is another variant of the two volumes by Karlin and Taylor. Bhattacharya and Waymire (1990–2009) has been found to be very useful by the students. Ross (1996) and Medhi (1992) are nice introductory texts.

Feldman and Valdez-Flores (2010) considers the modern applications of stochastic processes. Adke and Manjunath (1984) consider statistical inference related to the finite Markov process.

10.6 Complements, Problems, and Programs

Problem 10.1 The TPM of a gamblers walk consists of infinite states. Restricting the matrix over $c10-math-0413$ states, that is considering only the corresponding rows and columns and not the restricted gamblers walk, obtain the digraph using the sna package.
Problem 10.2 Using the msteptpm function, obtain $c10-math-0414$ for testtpm, testtpm2, and testtpm3 TPM's.
Problem 10.3 Carry out the canonical decomposition for testtpm3.
Problem 10.4 Find the stationary distribution for the Ehrenfest Markov Chain.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.