In this chapter, we continue our exploration of the world of simulation. Unlike the previous Monte Carlo simulations, where the scenario was purely static (there were no dynamics in the simulation), we are now going to study the dynamics of a system.
The Markov chain, specifically the discrete-time Markov chain, is named after the Russian mathematician Andrey Andreyevich Markov. He was a pioneer in the study of stochastic processes and the first to introduce the concept that now bears his name.
Let’s introduce the Markov chain with a simple example of weather forecasting.
Weather Forecasting
Suppose we have a weather forecasting system that predicts the weather in the next hour. The weather can take only three values: sunny, cloudy, or rainy. We call these possibilities the states. We won’t predict the weather continuously but rather forecast it hour by hour. This makes our system discrete in time.
The continuous-time Markov chain is beyond the scope of this book. It requires more rigorous analysis. However, the fundamental ideas of the discrete-time Markov chain remain unchanged.
Weather transition probabilities
| | Sunny | Cloudy | Rainy |
|---|---|---|---|
| Sunny (next hour) | | | |
| Cloudy (next hour) | | | |
| Rainy (next hour) | | | |
The way to interpret the table is to read it column-wise. For example, if the current weather is sunny, then the probability of the next hour’s weather being cloudy is given by the entry in the Cloudy row of the Sunny column.
We used the notation of conditional probability in the expression P(s1 = cloudy | s0 = sunny). It simply means: given that the weather in the previous hour was sunny, which is the condition, this is the probability that the weather in the next hour is cloudy.
It is still possible to continue the calculation manually, but we would like to leave the labor to computers.
- 1.
The first one is called the Markovian or memoryless property. It means that the system remembers only the immediate past state and nothing further back. For example, the weather forecasting system uses only the current weather to forecast the next hour’s weather, not the weather from earlier hours.
- 2.
The second one is a simplification, which can be removed if you want to match real-world scenarios: our Markov transition probabilities are fixed regardless of the time index i. In real life, as seasons change, the transition probabilities should change too.
In the nested for loop, we implemented the logic that one state can be reached through multiple paths. For example, the weather after 3 hours can be reached through 3^3 = 27 different paths.
Indeed, it looks like we not only reach a stable distribution, but the distribution is also independent of our initial weather.
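The path-summing logic above is equivalent to repeated matrix-vector multiplication, and we can sketch it compactly in numpy. The transition matrix values below are hypothetical stand-ins, since the chapter’s actual numbers come from the table earlier; each column sums to 1 because it holds the probabilities of all next states given one current state.

```python
import numpy as np

# Hypothetical transition matrix (read column-wise, as in the chapter's table):
# tm[i, j] = P(next state is i | current state is j).
tm = np.array([
    [0.6, 0.3, 0.2],   # next hour: sunny
    [0.3, 0.5, 0.3],   # next hour: cloudy
    [0.1, 0.2, 0.5],   # next hour: rainy
])

state = np.array([1.0, 0.0, 0.0])  # start from a sunny hour
for _ in range(3):
    state = tm @ state  # one multiplication sums over all paths of that length
print(state)           # distribution over {sunny, cloudy, rainy} after 3 hours
```

Each `tm @ state` step implicitly enumerates every path, so three steps cover all 3^3 = 27 paths without any explicit nested loop.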
Eigenstates of Markov Chains
Is there anything special about the stable probability distribution? It turns out that it is an eigenvector of the transition matrix with an eigenvalue of 1. In other words, it represents the eigenstate of the Markov chain. One can decompose any square matrix to find its eigenvalues and eigenvectors. You can use numpy.linalg.eig(tm)[0] to obtain the eigenvalues of the matrix tm and numpy.linalg.eig(tm)[1] to obtain the corresponding eigenvectors.
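As a sketch of this calculation (again using a hypothetical three-state transition matrix, since the chapter’s actual values are in the table above), we can pick out the eigenvector whose eigenvalue is 1 and rescale it so its entries sum to 1:

```python
import numpy as np

# Hypothetical transition matrix; each column sums to 1.
tm = np.array([
    [0.6, 0.3, 0.2],
    [0.3, 0.5, 0.3],
    [0.1, 0.2, 0.5],
])

eigenvalues, eigenvectors = np.linalg.eig(tm)

# Locate the eigenvalue that is (numerically) 1 and take its eigenvector.
idx = np.argmin(np.abs(eigenvalues - 1.0))
stationary = np.real(eigenvectors[:, idx])

# eig returns unit-norm vectors; rescale so the entries sum to 1,
# which also fixes an arbitrary overall sign.
stationary /= stationary.sum()
print(stationary)  # the eigenstate (stationary distribution)
```

Multiplying `tm @ stationary` returns `stationary` unchanged, which is exactly what it means for the distribution to be stable.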
Note that eigenstate has a special meaning in quantum mechanics. Here, I just borrow the word as it makes sense in this context. It also has other names like stationary distribution, equilibrium distribution, limit distribution, etc.
Note that the calculation ignores some negligible precision-caused numerical errors.
As you can see, no matter what our initial state s0 is, as long as its elements sum up to 1, the output sn is the eigenstate. This is a remarkable fact.
Exercise
- 1.
Find out the eigenstates of the transition matrix with a slower convergence rate.
- 2.
All of our previous simulation is based on analytical probability calculation. Can you do a simulation that just picks a random weather and evolves it according to the transition probabilities? Let’s say you do it for 10,000 steps, which is roughly 400 days, and count how many of these 10,000 data points are sunny, cloudy, and rainy. What’s your expectation? Does your discovery agree with your expectation?
Markov Chain Applications
In this section, let’s look at two interesting applications of the Markov chain. First, we will see how the Markov chain can be used to answer a nontrivial probability question. Then, we will use the Markov chain as a generative model to generate some natural language.
A Random Walk That Has an End
The question is, what are the probabilities that the tortoise eventually reaches the banana and apple?
- 1.
Given enough time, intuitively the tortoise will reach one of the fruits. Our tube is just 7 inches long, and the tortoise just keeps moving.
- 2.
The tortoise should have a higher probability, though probably not much higher, of reaching the banana than the apple, since the setting is not symmetric.
It does agree with our intuition that the tortoise is slightly more likely to reach the banana. If you check the end of the bananas/(bananas+apples) array, you will find that the probability of reaching the banana is about 0.575.
The 8 × 8 transition matrix of the tortoise’s random walk (read column-wise, as before):

    [[1,   1/2, 0,   0,   0,   0,   0,   0  ],
     [0,   0,   1/2, 0,   0,   0,   0,   0  ],
     [0,   1/2, 0,   1/2, 0,   0,   0,   0  ],
     [0,   0,   1/2, 0,   1/2, 0,   0,   0  ],
     [0,   0,   0,   1/2, 0,   1/2, 0,   0  ],
     [0,   0,   0,   0,   1/2, 0,   1/2, 0  ],
     [0,   0,   0,   0,   0,   1/2, 0,   0  ],
     [0,   0,   0,   0,   0,   0,   1/2, 1  ]]
This Markov chain differs from the weather one mainly in that it has two so-called absorbing states: the left and right ends of the tube. Once the tortoise reaches one of the ends, it stops moving. The first and last elements on the transition matrix diagonal are 1, which indicates the absorbing states.
The transition matrix raised to a large power:

    [[1.00, 0.86, 0.71, 0.57, 0.43, 0.28, 0.14, 0.00],
     [0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00],
     [0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00],
     [0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00],
     [0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00],
     [0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00],
     [0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00],
     [0.00, 0.14, 0.28, 0.43, 0.57, 0.71, 0.86, 1.00]]
The symmetry is pretty clear. The closer the tortoise is to the left end, the more likely it is to reach the banana; the highest probability is 0.86. This agrees with our intuition: the tortoise still has a chance of reaching the apple even when the banana is just 1 inch away. We multiply this matrix by the initial state numpy.array([0,0,0,1,0,0,0,0]) to get the probability of eventually reaching the banana. Note that it agrees with the simulation we performed earlier.
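The whole calculation can be sketched in a few lines of numpy: build the 8 × 8 absorbing transition matrix described above, raise it to a large power, and multiply by the initial state. (The power 10,000 is an arbitrary “large enough” choice to approximate the long-run behavior.)

```python
import numpy as np

# Column j is the current position; rows give the next position.
# Positions 0 (banana end) and 7 (apple end) are absorbing: the tortoise stays.
tm = np.zeros((8, 8))
tm[0, 0] = 1.0
tm[7, 7] = 1.0
for j in range(1, 7):       # interior positions: step left/right with prob. 1/2
    tm[j - 1, j] = 0.5
    tm[j + 1, j] = 0.5

# A large matrix power approximates the long-run (absorbing) probabilities.
long_run = np.linalg.matrix_power(tm, 10_000)

start = np.zeros(8)
start[3] = 1.0              # the tortoise starts 3 inches from the banana
print(long_run @ start)     # ≈ [0.57, 0, ..., 0, 0.43]
```

The first entry, about 0.57 (exactly 4/7), is the probability of ending up at the banana, matching both the 0.575 simulation estimate and the 0.57 entry in the powered matrix above.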
Alright. This is the end of our first section. You have seen how to use the Markov chain to predict the weather and simulate a hungry tortoise’s movement. Next, let’s try to use the Markov chain as a generative model to write some poems.
Sonnet Written by Drunk Shakespeare
A Markov chain can be used to model human language as a simplistic first approach. Human language has intrinsic patterns such that the probabilities that a certain word follows another word are very different. A Markov chain fits in this scenario perfectly.
Let’s try to grab some sonnets from Shakespeare and turn his text into a Markov chain with corresponding probabilities.
First, we need to process the raw text by removing the punctuation, etc. We are going to use a third-party library called “nltk” to do the lemmatization. However, we will build the Markov chain on our own. If you are interested in comparing your result with output from mature libraries, check this repository. The text file that contains the sonnets is the sonnet.txt file, which is provided in the associated GitHub repo. You can also find the sonnets on the Project Gutenberg website.
Lemmatization is the process of converting a word or phrase into its base form. For example, the word “dogs” is converted to “dog,” and the word “went” is converted to “go.” This is helpful because our sonnet dataset is not large, so reducing words with different forms but similar meanings to a single base form concentrates our transition probabilities somewhat. You are free to explore the nonlemmatized version and compare the differences.
From fairest creatures we desire increase,
That thereby beauty’s rose might never die,
But as the riper should by time decease,
His tender heir might bear his memory:
But thou contracted to thine own bright eyes,
Feed’st thy light’s flame with self-substantial fuel,
Making a famine where abundance lies,
Thy self thy foe, to thy sweet self too cruel:
Thou that art now the world’s fresh ornament,
And only herald to the gaudy spring,
Within thine own bud buriest thy content,
And, tender churl, mak’st waste in niggarding:
Pity the world, or else this glutton be,
To eat the world’s due, by the grave and thee.
Note that you may need to uncomment the wordnet download line to download additional data. The sonnets variable is a list of sentences. We are simplifying the problem by handling only lowercase text and stripping punctuation, so a word like beauty’s becomes beauty.
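The core of building the chain from text can be sketched without any third-party library: count how often each word follows each other word (bigrams), normalize the counts into transition probabilities, and then walk the chain, sampling the next word at each step. The two-line toy corpus below stands in for the preprocessed sonnet sentences.

```python
import random
from collections import defaultdict

# Toy corpus standing in for the preprocessed sonnet lines; in the chapter,
# these sentences come from sonnet.txt after lowercasing and lemmatization.
sonnets = [
    "from fairest creature we desire increase",
    "that thereby beauty rose might never die",
]

# Count bigrams: how often each word follows each other word.
counts = defaultdict(lambda: defaultdict(int))
for line in sonnets:
    words = line.split()
    for prev, nxt in zip(words, words[1:]):
        counts[prev][nxt] += 1

# Normalize the counts into transition probabilities.
chain = {
    prev: {nxt: c / sum(nxts.values()) for nxt, c in nxts.items()}
    for prev, nxts in counts.items()
}

def generate(start, length, seed=None):
    """Walk the chain from `start`, sampling the next word at each step."""
    rng = random.Random(seed)
    words = [start]
    for _ in range(length - 1):
        nxts = chain.get(words[-1])
        if not nxts:                # dead end: word has no observed successor
            break
        words.append(rng.choices(list(nxts), weights=list(nxts.values()))[0])
    return " ".join(words)

print(generate("from", 6))  # → from fairest creature we desire increase
```

With this tiny corpus every word has a single successor, so the walk is deterministic; on the full sonnet corpus each step becomes a genuine weighted random draw, which is where the “drunk Shakespeare” flavor comes from.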
You can try to manipulate the value of the edge threshold. You will be able to see more fine structures at the cost of more overlapping edges.
i praise that keep thee lie onward and lovely argument too much a foe commend.
doom and the rest forgot upon my mistress over wrack
always write of their rank before the dead wood whose worth in it in them
wear this book of forebemoaned moan the rose have given thee i my heart right
hindmost hold his memory death eternal slave to register and wrinkle graven there reign love
authorizing thy part wa false painting set a far remote where your soundless deep a
audit canst thou not be it not for my lameness and by that million of
brain inhearse
epitaph to whom thou take them say thy picture sight would have any be brought
leapt with the gentle thou hast thou dost advance
tonguetied muse brings forth your broad main doth nightly make the blazon of that heaven
added feather to lay upon thy beauty wear this purpose laid by this shall burn
tempest and no matter then did i may be taken
shower are mine eye corrupt by thy lusty day they themselves a the world common
wrong than hawk and beauty thou be scorned like her wish i see what merit
Not bad, is it? Just look at this sentence: tempest and no matter then did i may be taken. It reads like something Shakespeare would write when he was drunk.
Alrighty. This concludes this chapter. Before moving on to our next chapter, here are some exercises.
Exercise
- 1.
Can you adjust the transition probability matrix of the tortoise’s movement to make it more realistic? For example, the tortoise can perhaps smell the fruit’s scent when it’s near and move toward it. How will the change impact the result?
- 2.
For the tortoise problem, if you try to find the eigenvalues of the transition matrix, you should see two identical eigenvalues and some others. What are the repeated eigenvalues? What is the implication of the repeated eigenvalues?
- 3.
Can you try trigrams instead of bigrams in the text generation?
- 4.
Try to use a large corpus to improve the accuracy of your text generation model. For example, nltk has a large corpus of English texts. You can download the corpus from here.
Summary
We continued our journey of simulating with randomness. We studied the Markov chain model and the mathematics behind it, and we applied the Markovian model to weather prediction, absorbing-state problems, and sonnet writing.