Chapter 10
Stochastic Processes

Package(s): sna

Dataset(s): testtpm, testtpm2, testtpm3

10.1 Introduction

A stochastic process is a collection of random variables. Such a collection may be indexed by time, sample number, geographical location, etc. A stochastic process may consist of a finite, countably infinite, or uncountably infinite collection of random variables. An important characteristic of a stochastic process is that it allows us to relax the assumption of independence for a sequence of random variables.

A few examples are in order.

Section 10.2 will ensure that the probability space for a stochastic process is properly defined. A very important class of stochastic processes is the Markov Chain, which will be detailed in Section 10.3. The chapter will close, in Section 10.4, with a brief discussion of how Markov Chains are useful in the practical area of computational statistics.

10.2 Kolmogorov's Consistency Theorem

As seen in the introductory section, a stochastic process may be an infinite collection of random variables, and it needs to be ensured that probability measures are well defined for such a collection of RVs. An affirmative answer in this direction was provided by Kolmogorov. The related definitions and theorems (without proof) have been adapted from Adke and Manjunath (1984) and Athreya and Lahiri (2005). We consider a probability space $(\Omega, \mathcal{F}, P)$ and follow it up with a collection of RVs $\{X_t, t \in T\}$, where the index set $T$ is any non-empty set. Then for any $t_1, \ldots, t_n \in T$, the random vector $(X_{t_1}, \ldots, X_{t_n})$ has a joint probability distribution $\mu_{t_1, \ldots, t_n}$ over $(\mathbb{R}^n, \mathcal{B}(\mathbb{R}^n))$, where $\mathcal{B}(\mathbb{R}^n)$ is the Borel $\sigma$-field over $\mathbb{R}^n$. The family $\{\mu_{t_1, \ldots, t_n}\}$ is called the family of finite-dimensional distributions (fdds) of the process.

For any $n \geq 1$, any $t_1, \ldots, t_n \in T$, and any $A_1, \ldots, A_n \in \mathcal{B}(\mathbb{R})$, the family of fdds satisfies the following two conditions:

  1. [C1].
     $$\mu_{t_1, \ldots, t_n}(A_1 \times \cdots \times A_{n-1} \times \mathbb{R}) = \mu_{t_1, \ldots, t_{n-1}}(A_1 \times \cdots \times A_{n-1}) \qquad (10.3)$$
  2. [C2]. For any permutation $\pi$ of $(1, 2, \ldots, n)$:
     $$\mu_{t_{\pi(1)}, \ldots, t_{\pi(n)}}(A_{\pi(1)} \times \cdots \times A_{\pi(n)}) = \mu_{t_1, \ldots, t_n}(A_1 \times \cdots \times A_n) \qquad (10.4)$$

Kolmogorov's consistency theorem addresses the converse issue: if there exists a family of distributions $\{\mu_{t_1, \ldots, t_n}\}$ on finite-dimensional Euclidean spaces satisfying [C1] and [C2], then there exists a real-valued stochastic process $\{X_t, t \in T\}$ whose fdds coincide with $\{\mu_{t_1, \ldots, t_n}\}$.

Having been assured of the existence of probability measures for stochastic processes, let us now look at an important family of stochastic processes: Markov Chains.

10.3 Markov Chains

In earlier chapters, we assumed the observations were independent. In many random phenomena, the observations are not independent. As seen in the examples in Section 10.1, the maximum temperature of the current day may depend on the maximum temperature of the previous day, and the sum of heads in $n$ trials depends on the corresponding sum in the first $n-1$ trials. Markov Chains are useful for tackling such dependent observations. In this section, we will be considering only discrete phenomena.

To begin with, we need to define the state space associated with the sequence of random variables. The state space $S$ is the set of possible values taken by the stochastic process. In Example 10.1.1 of Section 10.1, the state space for the sum of heads in $n$ throws of a coin is $S = \{0, 1, \ldots, n\}$. The state spaces considered in this section are at most countably infinite. At time $n$, the stochastic process $\{X_n, n \geq 0\}$ will take one of the values in $S$, that is, $X_n \in S$.

This definition says that the probability of $X_{n+1}$ being observed in state $j$, given the entire history $X_0, X_1, \ldots, X_n$, depends only on the most recent state $X_n$, and not on the earlier history. The matrix array $P = [p_{ij}]$, with $p_{ij} = P(X_{n+1} = j \mid X_n = i)$, is called the Transition Probability Matrix, abbreviated as TPM, of the Markov Chain.

In the following subsections, we will consider how the states of a Markov Chain can be classified according to various characteristics.

10.3.1 The m-Step TPM

The TPM $P$ is a one-step transition probability array, namely, its elements give the probability of moving from state $i$ to state $j$ in the next step. The $m$-step transition probability of a movement from state $i$ to state $j$ is defined by

$$p_{ij}^{(m)} = P(X_{n+m} = j \mid X_n = i), \quad i, j \in S \qquad (10.7)$$

Let $P^{(m)}$ denote the $m$-step transition probability matrix. It can be shown that

$$P^{(m)} = P^m \qquad (10.8)$$

Equation 10.8 is based on the well-known Chapman-Kolmogorov equation. The Chapman-Kolmogorov lemma says that for any $m, n \geq 1$:

$$p_{ij}^{(m+n)} = \sum_{k \in S} p_{ik}^{(m)} p_{kj}^{(n)}$$

We can easily use the Chapman-Kolmogorov relationship to obtain the $m$-step TPM of a Markov Chain. The R program for obtaining the $m$-step TPM through msteptpm is given in the next illustration.
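As a minimal sketch (the actual msteptpm implementation accompanying the book may differ), such a function may simply multiply the one-step TPM by itself $n$ times, in line with Equation 10.8:

> # A sketch of an msteptpm-type function: the m-step TPM is the
> # m-th power of the one-step TPM (Equation 10.8)
> msteptpm <- function(M, n){
+       Mn <- diag(nrow(M))   # identity matrix, the 0-step TPM
+       for(i in seq_len(n)) Mn <- Mn %*% M
+       return(Mn)
+ }

The Chapman-Kolmogorov lemma can then be verified numerically for any TPM P by checking that msteptpm(P, m + n) agrees with msteptpm(P, m) %*% msteptpm(P, n).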

10.3.2 Classification of States

In the Ehrenfest example we see that it is possible to move from a particular state to one of its adjacent states only. However, it is possible to move from state 1 to states 3 and 4 in 2 and 3 steps, respectively. The question that then arises is how to identify the accessible states. Towards this discussion, a few definitions are required.

Accessibility of state $j$ from state $i$ is denoted by $i \rightarrow j$.

Communication between two states $i$ and $j$ is denoted by $i \leftrightarrow j$. A collection of states, each of which communicates with the others, is said to belong to the same class. Some properties of communication, without proof, are listed below:

  • $i \leftrightarrow i$.
  • If $i \leftrightarrow j$, then $j \leftrightarrow i$.
  • If $i \leftrightarrow j$ and $j \leftrightarrow k$, then $i \leftrightarrow k$.

Irreducible Markov Chains are also called ergodic Markov Chains. A stronger requirement than irreducibility is given next.

It can be easily seen that the presence of an absorbing state implies that the Markov Chain is neither regular nor irreducible.

A state $i$ is said to have period $d$ if $p_{ii}^{(n)} = 0$ whenever $n$ is not divisible by $d$, and $d$ is the greatest integer with this property, that is, $d(i) = \gcd\{n \geq 1 : p_{ii}^{(n)} > 0\}$. The period of a state is denoted by $d(i)$. If it is never possible to return to state $i$ (starting from state $i$, of course), the period of state $i$ is defined as infinite. On the other hand, if a state has period 1, we call that state aperiodic.
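The period of a state can be computed naively from this definition by scanning the diagonal entries of successive powers of the TPM; the following sketch (the function names and the cut-off maxn are our own choices, not from the book's programs) illustrates the idea:

> # Euclid's algorithm for the greatest common divisor
> gcd <- function(a, b) if(b == 0) a else gcd(b, a %% b)
> # Naive period of state i: gcd of all n <= maxn for which the
> # n-step return probability to i is positive
> period <- function(M, i, maxn = 50){
+       Mn <- M
+       returns <- NULL
+       for(n in 1:maxn){
+             if(Mn[i, i] > 0) returns <- c(returns, n)
+             Mn <- Mn %*% M
+       }
+       if(length(returns) == 0) return(Inf)
+       Reduce(gcd, returns)
+ }

For the Ehrenfest chain, every state has period 2, since a return to any state is possible only in an even number of steps.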

A digraph is a powerful visualization tool for understanding the accessible and communicating states of a Markov Chain. Here, we use the package sna to obtain it. Note that this package was developed for the emerging field of Social Network Analysis; the gplot function from this package is nevertheless useful for our purpose.
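As a brief sketch of this use of gplot, the positive entries of a TPM define the arcs of the digraph, so a 0-1 adjacency matrix can be passed to gplot; here we assume that testtpm has already been loaded:

> library(sna)
> # Positive transition probabilities define the arcs of the digraph
> adjmat <- (as.matrix(testtpm) > 0) * 1
> gplot(adjmat, displaylabels = TRUE)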

10.3.3 Canonical Decomposition of an Absorbing Markov Chain

In the previous discussion we have seen different types of TPM characteristics for Markov Chains. In an irreducible Markov Chain, all the states are recurrent states. However, if there is an absorbing state, as in testtpm, we may be interested in the following issues:

  1. The probability of the process ending up in an absorbing state.
  2. The average time for the Markov Chain to get absorbed.
  3. The average time spent in each of the transient states.

The canonical decomposition helps answer the above questions.

Arrange the states of an absorbing Markov Chain in the form (TRANSIENT, ABSORBING). For example, reorder the states of testtpm as $(B, C, D, E, A, F)$. Let $r$ be the number of absorbing states and $t$ the number of transient states, so that the total number of states is $n = r + t$. Arrange the TPM as below:

$$P = \begin{pmatrix} Q & R \\ 0 & I \end{pmatrix} \qquad (10.9)$$

Here $I$ is an $r \times r$ identity matrix, $0$ an $r \times t$ zero matrix, $Q$ a $t \times t$ matrix, and $R$ a non-zero $t \times r$ matrix. Let us closely look at the matrix $Q$ and calculate $Q^n$ for some large $n$. Note that it is very easy to rearrange the matrix in the required form in R.

> testtpm <- as.matrix(testtpm)
> testtpm <- testtpm[c(2,3,4,5,1,6),c(2,3,4,5,1,6)]
> Q <- testtpm[c(1:4),c(1:4)]
> R <- testtpm[c(1:4),c(5,6)]
> Q
     B     C    D    E
B 0.50 0.000 0.25 0.00
C 0.00 0.000 1.00 0.00
D 0.25 0.125 0.25 0.25
E 0.00 0.000 0.25 0.50
> R
       A      F
B 0.2500 0.0000
C 0.0000 0.0000
D 0.0625 0.0625
E 0.0000 0.2500
> testtpm
     B     C    D    E      A      F
B 0.50 0.000 0.25 0.00 0.2500 0.0000
C 0.00 0.000 1.00 0.00 0.0000 0.0000
D 0.25 0.125 0.25 0.25 0.0625 0.0625
E 0.00 0.000 0.25 0.50 0.0000 0.2500
A 0.00 0.000 0.00 0.00 1.0000 0.0000
F 0.00 0.000 0.00 0.00 0.0000 1.0000
> msteptpm(testtpm,n=100)[c(1:4),c(1:4)]
             B            C            D            E
B 1.635836e-10 3.124169e-11 2.022004e-10 1.635836e-10
C 2.499335e-10 4.773305e-11 3.089348e-10 2.499335e-10
D 2.022004e-10 3.861685e-11 2.499335e-10 2.022004e-10
E 1.635836e-10 3.124169e-11 2.022004e-10 1.635836e-10

We can then easily see that

$$Q^n \rightarrow 0, \text{ as } n \rightarrow \infty \qquad (10.10)$$

The Fundamental Matrix for an absorbing Markov Chain is given by

$$N = (I - Q)^{-1} \qquad (10.11)$$

The elements $n_{ij}$ of $N$ give the expected number of times the process is in the transient state $j$ if it starts in the transient state $i$. For the testtpm example, we have the answers in the output given below:

> N <- solve(diag(rep(1,nrow(Q)))-Q) # N = (I-Q)^{-1}, Equation 10.11
> N
          B         C        D         E
B 2.6666667 0.1666667 1.333333 0.6666667
C 1.3333333 1.3333333 2.666667 1.3333333
D 1.3333333 0.3333333 2.666667 1.3333333
E 0.6666667 0.1666667 1.333333 2.6666667

Starting from a transient state $i$, the expected number of steps before the Markov Chain is absorbed, and the probabilities of it being absorbed into each of the absorbing states, are respectively given by

$$t = N\mathbf{1} \qquad (10.12)$$
$$B = NR \qquad (10.13)$$

where $\mathbf{1}$ is a $t \times 1$ column vector of 1s, $t$ is the vector of expected numbers of steps, and $B$ is the matrix of absorption probabilities. For our dummy example, the computation concludes with the program below.

> t <- N %*% rep(1, nrow(N)) # t = N1, Equation 10.12
> t
      [,1]
B 4.833333
C 6.666667
D 5.666667
E 4.833333
> B <- N %*% R # B = NR, Equation 10.13
> B
     A    F
B 0.75 0.25
C 0.50 0.50
D 0.50 0.50
E 0.25 0.75

The three questions asked about an absorbing Markov Chain are also applicable, in a slightly different sense, to an ergodic Markov Chain. This topic will be taken up next.

10.3.4 Stationary Distribution and Mean First Passage Time of an Ergodic Markov Chain

If we have an ergodic Markov Chain, we know that each state will be visited infinitely often. This implies that, over the long run, the proportion of time the chain spends in a given state may be obtained.

A stationary distribution $\pi$ is a (left) eigenvector of the TPM whose associated eigenvalue is equal to one, that is, $\pi P = \pi$ with $\sum_i \pi_i = 1$. For an ergodic Markov Chain, the next program gives us the stationary distribution.

> stationdistTPM <- function(M){
+       # left eigenvectors of M are the eigenvectors of t(M)
+       eigenprob <- eigen(t(M))
+       # locate the eigenvalue equal to one
+       temp <- which(round(eigenprob$values,1)==1)
+       stationdist <- eigenprob$vectors[,temp]
+       # normalize the eigenvector into a probability vector
+       stationdist <- stationdist/sum(stationdist)
+       return(stationdist)
+ }
> P <- matrix(nrow=3,ncol=3) # An example
> P[1,] <- c(1/3,1/3,1/3)
> P[2,] <- c(1/4,1/2,1/4)
> P[3,] <- c(1/6,1/3,1/2)
> stationdistTPM(P)
[1] 0.24 0.40 0.36

The function uses the eigen function to obtain the eigenvalues and eigenvectors.

The mean recurrence time of state $i$ of an ergodic Markov Chain is given by

$$r_i = \frac{1}{\pi_i} \qquad (10.16)$$

For the previous example:

> 1/stationdistTPM(P)
[1] 4.166667 2.500000 2.777778

We will next consider the concept of passage time.

The matrix of mean first passage times is denoted by $M = [m_{ij}]$. We next briefly state the formulas to obtain $M$. Let $W$ be a matrix in which each row is the stationary probability vector $\pi$, and let $I$ be the identity matrix. Define $Z = (I - P + W)^{-1}$. The elements of $M$ are then given by

$$m_{ij} = \frac{z_{jj} - z_{ij}}{\pi_j}$$

For the Ehrenfest model, the mean first passage times are given below:

> ehrenfest <- as.matrix(ehrenfest)
> w <- stationdistTPM(ehrenfest)
> # W: each row is the stationary probability vector w
> W <- matrix(rep(w,each=nrow(ehrenfest)),nrow=nrow(ehrenfest))
> # Z = (I - P + W)^{-1}
> Z <- solve(diag(rep(1,nrow(ehrenfest)))-ehrenfest+W)
> M <- ehrenfest*0
> for(i in 1:nrow(ehrenfest)){
+     for(j in 1:nrow(ehrenfest)){
+         M[i,j] <- (Z[j,j]-Z[i,j])/W[j,j]
+     }
+ }
> M
         0        1        2        3        4
0  0.00000 1.000000 2.666667 6.333333 21.33333
1 15.00000 0.000000 1.666667 5.333333 20.33333
2 18.66667 3.666667 0.000000 3.666667 18.66667
3 20.33333 5.333333 1.666667 0.000000 15.00000
4 21.33333 6.333333 2.666667 1.000000  0.00000

For details, refer to Chapter 11 of Grinstead and Snell (2002).

10.3.5 Time Reversible Markov Chain

Consider a stationary and ergodic Markov Chain with stationary distribution $\pi$.

In simple words, for a stationary ergodic Markov Chain $\{X_n\}$, the backward chain $\{X_n, X_{n-1}, X_{n-2}, \ldots\}$ is also a Markov Chain. An intuitive explanation of this phenomenon is that, given the present state, the past and future states are independent. The TPM of the reversed Markov Chain is given by $Q = [q_{ij}]$, where

$$q_{ij} = \frac{\pi_j p_{ji}}{\pi_i} \qquad (10.17)$$

The gambler's random walk on a finite state space is an example of a time-reversible Markov Chain.
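Time reversibility is equivalent to the detailed balance condition $\pi_i p_{ij} = \pi_j p_{ji}$ for all pairs of states. As a small sketch (the function name is our own), this condition can be checked numerically for a given TPM using the stationdistTPM function of the previous subsection:

> # Check detailed balance pi_i * p_ij = pi_j * p_ji for all i, j
> isreversible <- function(M, tol = 1e-8){
+       # eigen may return complex values with zero imaginary part
+       w <- Re(stationdistTPM(M))
+       flows <- diag(w) %*% M   # flows[i,j] = pi_i * p_ij
+       all(abs(flows - t(flows)) < tol)
+ }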


10.4 Application of Markov Chains in Computational Statistics

Modern computations are driven by high-speed computers, and without them some algorithms cannot be put to good use. In this section, two famous Monte Carlo techniques whose premise is the use of Markov Chains will be discussed, viz., the Metropolis-Hastings algorithm and the Gibbs sampler. The current section relies on Chapter 10 of Ross (2006).1 Robert and Casella (1999–2004) is also an excellent exposition of the two algorithms discussed here.

10.4.1 The Metropolis-Hastings Algorithm

Consider a finite sequence of positive numbers $b_1, b_2, \ldots, b_m$, for some large integer $m$. The positive numbers may be interpreted as the weights of an RV $X$ taking the values $1, 2, \ldots, m$. Define $B = \sum_{j=1}^{m} b_j$, and suppose that $B$ is a difficult number to compute. The PMF of $X$ is then given by

$$\pi_j = P(X = j) = \frac{b_j}{B}, \quad j = 1, \ldots, m \qquad (10.18)$$

It will be seen in Chapter 11 that for large $m$ values, simulation from the probability distribution $\{\pi_j\}$ becomes a daunting task. The Metropolis-Hastings algorithm builds a time-reversible Markov Chain argument for simulation from $\{\pi_j\}$. The requirement is to find a Markov Chain that is easy to simulate and whose stationary distribution is the same as $\{\pi_j\}$. Let $Q = [q_{ij}]$ represent the TPM of an irreducible time-reversible Markov Chain on the states $1, \ldots, m$. A Markov Chain $\{X_n\}$, useful for simulation from $\{\pi_j\}$, is set up as follows. Suppose that the current state is $i$, that is, $X_n = i$. Generate an RV $Y$ with PMF $P(Y = j) = q_{ij}$, $j = 1, \ldots, m$. If $Y = j$, then $X_{n+1}$ is assigned the state $j$ with probability $\alpha(i, j)$, or the state $i$ with probability $1 - \alpha(i, j)$. The simulation problem is solved if we can determine the acceptance probabilities $\alpha(i, j)$. The TPM $P = [p_{ij}]$ of the Markov Chain $\{X_n\}$ should thus satisfy the following conditions:

$$p_{ij} = q_{ij}\,\alpha(i, j), \quad j \neq i \qquad (10.19)$$
$$p_{ii} = q_{ii} + \sum_{j \neq i} q_{ij}\,(1 - \alpha(i, j)) \qquad (10.20)$$

The Markov Chain with TPM $P$ is a time-reversible Markov Chain with stationary distribution $\{\pi_j\}$ if

$$\pi_i p_{ij} = \pi_j p_{ji}, \quad j \neq i$$

This relationship is satisfied for the choice of $\alpha(i, j)$ given by

$$\alpha(i, j) = \min\left\{\frac{\pi_j q_{ji}}{\pi_i q_{ij}}, 1\right\} = \min\left\{\frac{b_j q_{ji}}{b_i q_{ij}}, 1\right\} \qquad (10.21)$$

Note that the difficult-to-compute constant $B$ cancels in the ratio, which is precisely what makes the algorithm practical. The reader should verify the need of 1 in Equation 10.21! The Metropolis-Hastings algorithm for generation of a Markov Chain $\{X_n\}$ can be summarized as follows:

  1. Select a time-reversible irreducible Markov Chain with TPM $Q = [q_{ij}]$.
  2. Choose an integer $k$ between 1 and $m$, and set $X_0 = k$.
  3. Given $X_n = i$, generate an RV $Y$ such that $P(Y = j) = q_{ij}$.
  4. Generate a random number $U$ between 0 and 1, see Section 11.2, and set $X_{n+1} = Y$ if
     $$U < \frac{b_Y\, q_{Y i}}{b_i\, q_{i Y}} \qquad (10.22)$$
     else, set $X_{n+1} = i$.
  5. Set $n = n + 1$ and return to Step 3.

The quantity $\alpha(i, j)$ is called the Metropolis-Hastings acceptance probability.
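A minimal sketch of these steps in R, with our own simple choice of $Q$ that proposes any state uniformly (so that $q_{ij} = q_{ji} = 1/m$ and the acceptance ratio reduces to $b_j / b_i$), may be written as follows:

> # Hedged sketch of the discrete Metropolis-Hastings algorithm.
> # b: vector of positive weights; N: number of iterations.
> MHdiscrete <- function(b, N = 10000){
+       m <- length(b)
+       X <- numeric(N)
+       X[1] <- sample(1:m, 1)
+       for(n in 2:N){
+             Y <- sample(1:m, 1)               # uniform proposal
+             alpha <- min(b[Y]/b[X[n-1]], 1)   # acceptance probability
+             X[n] <- ifelse(runif(1) < alpha, Y, X[n-1])
+       }
+       return(X)
+ }
> # Empirical frequencies should approximate b/sum(b) = (0.1,0.2,0.3,0.4)
> table(MHdiscrete(c(1, 2, 3, 4)))/10000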

Alternatively, the Metropolis-Hastings algorithm can be stated for the case of continuous RVs too, see Chapter 7 of Robert and Casella (1999–2004) for more details. Assume that $f$ represents the pdf of interest and that a conditional density $q(y|x)$, whose support dominates that of $f$, is available. The Metropolis-Hastings algorithm can be implemented in practice if two conditions hold: (i) the density $f$ is known to the extent that the ratio $f(y)/f(x)$ is known up to a constant independent of $x$ and $y$, and (ii) the density $q(\cdot|x)$ is either explicitly available or symmetric in the sense of $q(x|y) = q(y|x)$. The density $f$ is known as the target density, while $q$ is called the instrumental or proposal density. The Metropolis-Hastings algorithm starting with $x^{(0)}$ is then given by the following steps (a short R sketch follows them):

  1. Simulate $Y_t \sim q(y|x^{(t)})$.
  2. Take
     $$x^{(t+1)} = \begin{cases} Y_t & \text{with probability } \rho(x^{(t)}, Y_t), \\ x^{(t)} & \text{with probability } 1 - \rho(x^{(t)}, Y_t), \end{cases} \qquad (10.23)$$
     where
     $$\rho(x, y) = \min\left\{\frac{f(y)}{f(x)} \frac{q(x|y)}{q(y|x)}, 1\right\}$$
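A hedged sketch of the continuous version, with a normal random-walk proposal (symmetric, so the $q$ ratio cancels) and our own illustrative target, the standard Cauchy density known only up to a constant, is given below:

> # Random-walk Metropolis-Hastings: the proposal is symmetric,
> # hence rho = min(f(y)/f(x), 1)
> MHcontinuous <- function(f, N = 10000, sd = 1){
+       x <- numeric(N)   # chain starts at x[1] = 0
+       for(t in 2:N){
+             y <- rnorm(1, mean = x[t-1], sd = sd)   # proposal
+             rho <- min(f(y)/f(x[t-1]), 1)
+             x[t] <- ifelse(runif(1) < rho, y, x[t-1])
+       }
+       return(x)
+ }
> # Target: standard Cauchy, specified up to the normalizing constant
> xsim <- MHcontinuous(function(x) 1/(1 + x^2))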

The Metropolis-Hastings algorithm will be illustrated following a discussion of the Gibbs sampler.

10.4.2 Gibbs Sampler

The Gibbs sampler is a particular case of the Metropolis-Hastings algorithm. However, its intuitive and appealing steps have made it more popular, and it is thus widely applied. The algorithm is described as follows.

Suppose $\mathbf{X} = (X_1, \ldots, X_n)$ is a (discrete) random vector with PMF $p(\mathbf{x})$. Assume that $p$ is specified up to a constant, that is, $p(\mathbf{x}) = C g(\mathbf{x})$, where $g$ is known and $C$ is a multiplicative constant. The Gibbs sampler deals with the problem of generating an observation from $p$. It essentially uses the Metropolis-Hastings algorithm with the state space being the set of possible values of $\mathbf{X}$. The transition probabilities in this state space are set up as follows. Assume that the present state is $\mathbf{x} = (x_1, \ldots, x_n)$. A coordinate of the state space is selected at random, that is, an index from $1, 2, \ldots, n$ is selected as a sample from a discrete uniform distribution with $n$ points. The main assumption of the Gibbs sampler is that, for any coordinate $i$ and values $x_j, j \neq i$, a random variable $X$ can be simulated with PMF

$$P(X = x) = P(X_i = x \mid X_j = x_j, j \neq i) \qquad (10.24)$$

Now, if the coordinate $i$ is selected at random and, using the elements of $\mathbf{x}$, an observation with value $x$ is simulated from (10.24) to replace the previous value $x_i$, the new state is $\mathbf{y} = (x_1, \ldots, x_{i-1}, x, x_{i+1}, \ldots, x_n)$. In other words, the Gibbs sampler uses the Metropolis-Hastings algorithm with proposal

$$q(\mathbf{x}, \mathbf{y}) = \frac{1}{n} P(X_i = x \mid X_j = x_j, j \neq i) = \frac{p(\mathbf{y})}{n\, P(X_j = x_j, j \neq i)} \qquad (10.25)$$

The necessity of the stationary distribution being $p$ requires that the new vector $\mathbf{y}$ be accepted as the new state with probability

$$\alpha(\mathbf{x}, \mathbf{y}) = \min\left\{\frac{p(\mathbf{y})\, q(\mathbf{y}, \mathbf{x})}{p(\mathbf{x})\, q(\mathbf{x}, \mathbf{y})}, 1\right\} = \min\left\{\frac{p(\mathbf{y})\, p(\mathbf{x})}{p(\mathbf{x})\, p(\mathbf{y})}, 1\right\} = 1 \qquad (10.26)$$

that is, the Gibbs sampler always accepts the proposed move.
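As a brief hedged sketch, consider the (continuous) case of a standard bivariate normal vector with correlation $\rho$, our own illustrative target, whose full conditionals are the well-known univariate normals $X_1 \mid X_2 = x_2 \sim N(\rho x_2, 1 - \rho^2)$ and $X_2 \mid X_1 = x_1 \sim N(\rho x_1, 1 - \rho^2)$:

> # Gibbs sampler for a standard bivariate normal with correlation rho
> gibbsBVN <- function(N = 10000, rho = 0.8){
+       x1 <- x2 <- numeric(N)   # chain starts at (0, 0)
+       for(t in 2:N){
+             # simulate each coordinate from its full conditional
+             x1[t] <- rnorm(1, rho*x2[t-1], sqrt(1-rho^2))
+             x2[t] <- rnorm(1, rho*x1[t], sqrt(1-rho^2))
+       }
+       cbind(x1, x2)
+ }
> xy <- gibbsBVN()
> cor(xy[,1], xy[,2])   # should be close to rho = 0.8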

Applications of the Metropolis-Hastings algorithm and Gibbs sampler will be described in the next subsection.

10.4.3 Illustrative Examples

Three examples for each of the algorithms will be discussed briefly. The first two examples are from Robert and Casella (2004). As a fitting tribute to the inventor of the algorithm, the next example, a random walk generation, originally illustrated by Hastings (1970) in his breakthrough paper, will then be discussed.

Next, the use of the Gibbs sampler will be illustrated.

The next problem is a continuation of the sum of exponential RVs.

The purpose of this section has been to introduce the two algorithms discussed here. It must be said that there is much more to the use and applications of these algorithms than indicated here; however, it is the stochastic process aspect of these algorithms that is of interest in this chapter. The applications of these techniques to Bayesian inference will be considered in the next chapter.


10.5 Further Reading

Feller's two volumes are again useful for a host of theories and applications of stochastic processes. Doob (1953) is the first book on stochastic processes. Karlin and Taylor's (1975, 1981) two volumes have been a treatise on this subject. Taylor and Karlin (1998) is another variant of the two volumes by Karlin and Taylor. Bhattacharya and Waymire (1990–2009) has been found very useful by students. Ross (1996) and Medhi (1992) are nice introductory texts.

Feldman and Valdez-Flores (2010) considers the modern applications of stochastic processes. Adke and Manjunath (1984) consider statistical inference related to the finite Markov process.

10.6 Complements, Problems, and Programs

  1. Problem 10.1 The TPM of a gambler's walk consists of an infinite number of states. Restricting the matrix to a finite number of states, that is, considering only the corresponding rows and columns and not the restricted gambler's walk, obtain the digraph using the sna package.

  2. Problem 10.2 Using the msteptpm function, obtain the $m$-step TPM for the testtpm, testtpm2, and testtpm3 TPMs.

  3. Problem 10.3 Carry out the canonical decomposition for testtpm3.

  4. Problem 10.4 Find the stationary distribution for the Ehrenfest Markov Chain.
