The Markov chain

What do all the Markov-type models have in common? All of them are essentially based on the same stochastic process, better known as the Markov chain. In this section, you will find a brief review of the fundamental ideas and properties behind the Markov chain.

Even though this section relies on some statistical terms and matrix notation, it is designed to be accessible regardless.

Stochastic processes are frequently assumed to be independent and identically distributed (i.i.d.). Borrowing the dice analogy, this assumption states that no matter what the past rolls were, they won't affect the probability of the next roll. The Markov chain doesn't assume i.i.d.; instead, it defines a set of transition probabilities for the system, meaning that the probability of the next state (outcome) is a function of the current state.
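To make the contrast concrete, here is a minimal R sketch: the i.i.d. draws ignore history entirely, while the Markov draws condition each step on the current state. The three-state transition matrix in it is invented purely for illustration.

```r
set.seed(42)

# An i.i.d. process: every draw ignores the past entirely
iid_draws <- sample(1:3, size = 10, replace = TRUE)

# A Markov process: each draw is conditioned on the current state
# (this transition matrix is made up purely for illustration)
trans <- matrix(c(0.5, 0.3, 0.2,
                  0.1, 0.6, 0.3,
                  0.2, 0.2, 0.6), nrow = 3, byrow = TRUE)

markov_draws <- numeric(10)
markov_draws[1] <- 1                 # start in state 1
for (t in 2:10) {
  markov_draws[t] <- sample(1:3, size = 1,
                            prob = trans[markov_draws[t - 1], ])
}
markov_draws
```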

Using set notation, a system S with three states could be described as follows:

$$S = \{S_1, S_2, S_3\}$$

Where S1, S2, and S3 represent different states of a system. States could be (almost) anything: they could tell whether the weather is sunny, rainy, or snowy, or whether an internet user is currently navigating website A, B, or C. Given each state, the Markov chain defines transition probabilities, which list the chances for the system's state in the next time step.

For the sake of simplicity, this chapter treats the Markov chain as a discrete-time process. Markov chains can also be designed with continuous time, which is a more realistic assumption in many settings.

A transition probability can be expressed as a conditional probability:

$$p_{ji} = P(s_{t+1} = S_i \mid s_t = S_j)$$

Generically, the transition probability pji describes the probability of the system assuming the state Si during time step t+1, given the state Sj during time step t. This elegant device, the transition probability, drops the independence assumption from the stochastic process, adding an extra layer of complexity to it.

Transition probabilities are directly related to one of the most frequently cited properties of the Markov chain, the Markov property. It asserts that the system's chance of reaching a certain state in the next time step depends only on the state it is currently in. Thus, the more distant past doesn't matter if you're willing to make inferences about the future.

Higher-order Markov chains can also be designed to look further time steps into the past.

For a system with n states, there will be n² transition probabilities. These are frequently expressed as a matrix. Starting from the set of states S described previously, the transition matrix (T) would look like the following:

$$T = \begin{bmatrix} p_{11} & p_{12} & p_{13} \\ p_{21} & p_{22} & p_{23} \\ p_{31} & p_{32} & p_{33} \end{bmatrix}$$

Where p12 is the probability of the system jumping into state two (S2), given it was in state one (S1) before; p32 would be the probability of jumping into state two (S2), given it was in state three (S3) before. No element should be negative, and each row should sum to one.

The transition matrix could also be represented in an alternative (transposed) way, so that the columns sum to one. A matrix where no element is less than zero and each row (or column) sums to one is called a Markov matrix.
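As a quick sanity check, a small helper function (my own, not from any package) can verify both conditions over the rows of a candidate matrix:

```r
# Check that a matrix is a valid (row-stochastic) Markov matrix:
# no negative elements, and every row sums to one
is_markov_matrix <- function(M, tol = 1e-9) {
  all(M >= 0) && all(abs(rowSums(M) - 1) < tol)
}

trans <- matrix(c(0.5, 0.3, 0.2,
                  0.1, 0.6, 0.3,
                  0.2, 0.2, 0.6), nrow = 3, byrow = TRUE)
is_markov_matrix(trans)   # TRUE
```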

Representing the transition probabilities through the transition matrix is useful for computational purposes. On the other hand, a story, represented through the following diagram, can make things much easier to understand. It might look a little silly, but it's for a very good reason.

The tale goes like this: there was a little polygon called Circled Squared Triangle (CST). The little CST was cursed by a witch, who said: every morning you may or may not turn yourself into a different shape, and the odds will only be related to your current shape:

Figure 9.1: CST diagram representation

The preceding diagram illustrates both how the Markov chain works and the tale of CST. Keeping in mind that this is a time-ordered process, the diagram displays all the basic elements of a Markov process: a set of states, square (S1), triangle (S2), and circle (S3), to iterate through, and a set of probabilities ruling the transitions between them.

Markov chains frequently come along with a vector of initial probabilities (π), which gives the probability of the system starting in each state.

Although the diagram is a great way of explaining how the whole process goes, the matrix is the usual way to represent it. The matrix form is also useful for doing computations. To draw the transition matrix, the rows have to sum to one. The current example's transition matrix would look like the following:

A matrix for which all the rows individually sum to one is called a Markov matrix, as pointed out earlier. The element p12 (first row, second column) shows that, if the system is at state one (row one), there is a 65% chance that it will turn into state two (column two) during the next time step. The whole idea is pretty straightforward, but what is it used for?
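Since the book's figure holds the actual values, here is a hedged R sketch of what such a matrix could look like: the 0.65 in the first row, second column comes from the text, while every other entry is a placeholder chosen only so that the rows sum to one. Multiplying a state vector by the matrix gives the next morning's distribution of shapes:

```r
# The 0.65 in position (1, 2) comes from the text; every other entry
# is a made-up placeholder (the book's figure holds the actual values)
states <- c("square", "triangle", "circle")
trans  <- matrix(c(0.20, 0.65, 0.15,
                   0.30, 0.40, 0.30,
                   0.25, 0.25, 0.50),
                 nrow = 3, byrow = TRUE,
                 dimnames = list(states, states))

# If CST wakes up as a square today, tomorrow's distribution of shapes
# is simply the first row of the matrix
p_today    <- c(1, 0, 0)          # currently a square
p_tomorrow <- p_today %*% trans   # 0.20 square, 0.65 triangle, 0.15 circle
p_tomorrow
```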

In a Markov process, it's possible for a state to lead back to that very same state in the next step. This is called a self-transition. Probabilities for self-transitions are displayed on the main diagonal of the transition matrix.

To begin with, you can use it to answer questions such as: how likely is it to get the sequence 1, 1, 2, 3, 1? Analytically, the Markov chain is able to give you the probability of a particular sequence of states happening. You can either assume a transition matrix to do so or easily estimate one. But there is far more to it than that.
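For instance, the probability of the sequence 1, 1, 2, 3, 1 is the initial probability of state 1 multiplied by the transition probabilities along the path. A minimal sketch, assuming made-up values for π and T:

```r
# pi_0 and trans are assumed values for illustration;
# only the chain-rule mechanics matter here
pi_0  <- c(0.5, 0.3, 0.2)
trans <- matrix(c(0.20, 0.65, 0.15,
                  0.30, 0.40, 0.30,
                  0.25, 0.25, 0.50), nrow = 3, byrow = TRUE)

path <- c(1, 1, 2, 3, 1)
prob <- pi_0[path[1]]
for (t in 2:length(path)) {
  prob <- prob * trans[path[t - 1], path[t]]
}
prob   # = pi_1 * p11 * p12 * p23 * p31
```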

Generally speaking, Markov models can aid in almost any kind of time-related pattern discovery process, as can be guessed from the previous explanation. What would be the next letter typed, based on the letter being entered now? What would be the next video watched, based on the current one? All of these are questions that could be answered through Markov models.
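As an illustration of the next-letter question, the sketch below estimates a letter-to-letter transition matrix by counting transitions in a toy string (the input text is arbitrary and chosen only for the example):

```r
# A toy text; any character sequence would do
text  <- "the theme of the thesis"
chars <- strsplit(text, "")[[1]]
from  <- chars[-length(chars)]
to    <- chars[-1]

# Count observed transitions and normalize each row to sum to one
counts <- table(from, to)
trans  <- prop.table(counts, margin = 1)

# Most likely character to follow "t" in this toy corpus
names(which.max(trans["t", ]))   # "h"
```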

The real question is: does a sequence of events produce a pattern that can be modeled and exploited?

Basically, if you can model the subject as a sequence of events, there is a great chance of uncovering a pattern using a Markov model, provided there is a relevant pattern to be discovered. This explains why this kind of model has been used in such a wide range of research fields, from French poetry all the way to finance. There are some really cool applications of Markov models in finance, which we will approach later in this chapter.

Before we dive into the practical example, there is another Markov-type model worthy of a mention: the hidden Markov model (HMM). While regular Markov models assume that the real system's state is directly observable, HMMs go the other way: the actual system's states are not directly observable; instead, you can only look at a variable that is probabilistically related to the system's states.

Think of it this way. In a regular Markov model, the states of the system could be the weather (sunny, cloudy, or rainy), and such information would be steadily available: you can peek at the weather. In a hidden Markov model, the states of the system would still be the weather, but you could only look at your neighbors' outfits to infer whether the weather is sunny, cloudy, or rainy.

In the HMM, the states are invisible (hidden), but they generate a set of observations that are probabilistically related to them. These observations are called visible states or, simply, observations.

The name hidden Markov is very self-explanatory when viewed through this prism. HMMs are regular Markov chains with an extra layer of complexity. Besides the transition probabilities (T) and the vector of initial probabilities (π), the HMM also requires a matrix of observation probabilities (B), which dictates the probability of each invisible state Si generating each visible state Mj.

HMM models can be expressed as a function of T, π, and B.
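As a concrete sketch, the HMM package on CRAN lets you assemble these three ingredients into a model with initHMM(). The states and symbols below follow the weather/outfit analogy from earlier, and all the probabilities are made-up placeholders:

```r
# install.packages("HMM") if needed
library(HMM)

hmm <- initHMM(
  States  = c("sunny", "cloudy", "rainy"),        # hidden states
  Symbols = c("t-shirt", "sweater", "raincoat"),  # visible observations
  startProbs    = c(0.5, 0.3, 0.2),               # pi (placeholder values)
  transProbs    = matrix(c(0.6, 0.3, 0.1,         # T, rows sum to one
                           0.3, 0.4, 0.3,
                           0.2, 0.3, 0.5), nrow = 3, byrow = TRUE),
  emissionProbs = matrix(c(0.7, 0.2, 0.1,         # B, rows sum to one
                           0.3, 0.5, 0.2,
                           0.1, 0.3, 0.6), nrow = 3, byrow = TRUE)
)
```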

Essentially, Markov models, both hidden and regular, are used to solve two kinds of problems:

  • Evaluation problem: This consists of estimating the probability of a model M generating a given sequence of states. The HMM evaluates the probability given a sequence of visible states instead.
  • Learning problem: This is a problem where the parameters of a Markov model (transition matrix, initial probabilities, observation probabilities) have to be learned/estimated from a training sample.

Additionally, there is an extra kind of problem solved exclusively through the HMM: the decoding problem, which consists of inferring the most likely sequence of hidden states based on an observed sequence of visible states (observations).
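Continuing the hypothetical weather sketch, the viterbi() function from the HMM package recovers the most likely hidden sequence behind a run of observed outfits, using the hmm object built above:

```r
# Decoding: the most likely hidden weather sequence behind a run of
# observed outfits (hmm was defined in the previous sketch)
outfits <- c("t-shirt", "t-shirt", "sweater", "raincoat", "raincoat")
viterbi(hmm, outfits)
```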

Think about a word recognition problem. One way of tackling it is by building a single HMM for all the words. The hidden states would be all the characters in the alphabet. Transition and initial probabilities would be calculated from a language model, while the observations would be typed characters segmented from images.

Now, all that is left is to determine the best sequence of hidden states produced by an image; that is, word recognition. The solution just described looks at the word recognition problem as a decoding problem. The proposed model is only one out of several ways of solving it using the HMM.

How do you solve problems like the ones just discussed? Brute force might get things done when there are only a few possibilities to explore. On the other hand, if the possibilities are plentiful, there are cleverer and more efficient ways of doing it:

  • Evaluation problems can be solved through the forward-backward algorithm
  • Decoding problems are most easily solved by the Viterbi algorithm
  • Learning problems can be solved through the Baum-Welch algorithm
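Staying with the same hypothetical weather model and the HMM package, the other two problems look like this in code (forward() returns log-scaled probabilities, so the sequence likelihood is the exponentiated sum of its last column):

```r
# Evaluation: likelihood of an observed sequence under the model
# (hmm was defined in the earlier sketch; all numbers are placeholders)
outfits <- c("t-shirt", "sweater", "raincoat")
f <- forward(hmm, outfits)      # log-scaled forward probabilities
sum(exp(f[, ncol(f)]))          # P(outfits | model)

# Learning: re-estimate the model parameters from the observations
# with the Baum-Welch algorithm
fitted <- baumWelch(hmm, outfits, maxIterations = 50)
fitted$hmm$transProbs
```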

Although the HMM is applied in a great many fields, it is heavily used in finance. The next section shows how you can build and analyze an HMM on financial data using R. Keep in mind that you shouldn't invest your money based on that alone. Investing is always risky, and you need to study a lot before trusting any strategy; there are no shortcuts.
