As seen in the previous chapter, the Naive Bayes model does not make any assumptions regarding the order of events. The likelihood and prior probabilities are computed by aggregating counts. However, some applications such as text or voice recognition, language translation, or decoding, rely on an ordered sequence of states or events [7:1].
Chapter 3, Data Pre-processing, introduced some deterministic solutions to modeling sequences of data, collectively known as time series analysis. An alternative probabilistic solution relies on Markov processes and models.
The broad universe of Markov models encompasses computational concepts such as the Markov decision process, discrete Markov processes, Markov chain Monte Carlo for Bayesian networks, and hidden Markov models.
The first section of this chapter introduces and describes the hidden Markov model along with the dynamic programming techniques used in the evaluation, decoding, and training of models for sequential events.
The second and last section of the chapter is dedicated to a discriminative model alternative to the hidden Markov model: conditional random fields. Our example leverages the open source CRF Java library authored by Sunita Sarawagi from the Indian Institute of Technology, Bombay [7:2].
This first section also describes the basic concepts you need to know to understand, develop, and apply the hidden Markov model, starting with the Markov property.
The Markov property is a characteristic of a stochastic process where the conditional probability distribution of a future state depends only on the current state and not on its past states. When the transition between states occurs at discrete times, the process is known as a discrete Markov chain.
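The Markov property can be sketched in a few lines of Scala: sampling the next state consults only the row of the transition matrix indexed by the current state, never any earlier states. The matrix values below are hypothetical, chosen purely for illustration.

```scala
// A minimal sketch of a discrete Markov chain step. The next state is
// sampled from a distribution that depends only on the current state
// (the Markov property). Transition probabilities are hypothetical.
object MarkovProperty {
  // a(j)(k) = p(next state = k | current state = j); each row sums to 1
  val a: Array[Array[Double]] = Array(
    Array(0.6, 0.3, 0.1),
    Array(0.2, 0.5, 0.3),
    Array(0.3, 0.3, 0.4)
  )

  // Sample the next state using only the current state's row
  def next(current: Int, rand: scala.util.Random): Int = {
    val u = rand.nextDouble()
    val cdf = a(current).scanLeft(0.0)(_ + _).tail // cumulative distribution
    cdf.indexWhere(u <= _)
  }
}
```

Note that no history of previously visited states is threaded through `next`; the current state index is a sufficient statistic for the future of the process.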
The following example is taken from Introduction to Machine Learning by E. Alpaydin [7:3].
Let's consider the following use case. N balls of different colors are hidden in N boxes (one each). The balls can have only three colors {Blue, Red, and Green}. The experimenter draws the balls one by one. The state of the discovery process is defined by the color of the latest ball drawn from one of the boxes: S0 = Blue, S1 = Red, and S2 = Green.
Let {π0, π1, π2} be the initial probabilities of observing each of the colors on the first draw.
Let qt denote the color of the ball drawn at time t. The probability of drawing a ball of color Sk at time t after drawing a ball of color Sj at time t-1 is defined as the following:
p(qt = Sk | qt-1 = Sj ) = ajk.
The probability of drawing a red ball on the first attempt is p(q0 = S1 ) = π1.
The probability of drawing a red ball on the first attempt and a blue ball on the second is as follows:
p(q0 = S1 ) p(q1 = S0 |q0 = S1 ) = π1 a10.
The process is repeated to create a sequence of states {St } = {Red, Blue, Blue, Green, …} with the following probability:
p(q0 = S1) p(q1 = S0 | q0 = S1) p(q2 = S0 | q1 = S0) p(q3 = S2 | q2 = S0) … = π1 a10 a00 a02 …
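This chain rule of products translates directly into code: the probability of a sequence is the initial probability of its first state multiplied by one transition probability per subsequent state. The sketch below encodes Blue = 0, Red = 1, Green = 2; the values of `pi` and `a` are hypothetical, not taken from the text.

```scala
// Probability of a sequence of states under a first-order Markov chain:
// p(q0, q1, ..., qn) = pi(q0) * a(q0)(q1) * a(q1)(q2) * ... * a(qn-1)(qn)
object SequenceProbability {
  // States: 0 = Blue, 1 = Red, 2 = Green (probabilities are hypothetical)
  val pi: Array[Double] = Array(0.3, 0.4, 0.3)
  val a: Array[Array[Double]] = Array(
    Array(0.5, 0.3, 0.2), // transitions from Blue
    Array(0.4, 0.4, 0.2), // transitions from Red
    Array(0.3, 0.3, 0.4)  // transitions from Green
  )

  // Multiply the initial probability by each successive transition
  def prob(states: List[Int]): Double =
    states.zip(states.tail).foldLeft(pi(states.head)) {
      case (p, (j, k)) => p * a(j)(k)
    }
}
```

For the sequence {Red, Blue, Blue, Green}, `prob(List(1, 0, 0, 2))` computes π1 a10 a00 a02, matching the product above.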
The sequence of states/colors can be represented as follows:
Let's estimate the probabilities p using historical data (learning phase): the initial probability πi is estimated as the fraction of observed sequences that start in the state Si, and the transition probability ajk as the number of observed transitions from Sj to Sk divided by the total number of transitions out of Sj.
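The learning phase reduces to counting. A minimal sketch of these maximum-likelihood estimates, assuming the same hypothetical encoding Blue = 0, Red = 1, Green = 2, could look as follows:

```scala
// Estimate initial and transition probabilities from observed sequences
// by counting (maximum-likelihood estimates for a first-order chain).
object MarkovLearning {
  def estimate(sequences: Seq[List[Int]], numStates: Int)
      : (Array[Double], Array[Array[Double]]) = {
    val initCounts = Array.fill(numStates)(0.0)
    val transCounts = Array.fill(numStates, numStates)(0.0)

    sequences.foreach { seq =>
      initCounts(seq.head) += 1.0 // count which state starts the sequence
      seq.zip(seq.tail).foreach { case (j, k) => transCounts(j)(k) += 1.0 }
    }
    // pi(i): fraction of sequences starting in state i
    val pi = initCounts.map(_ / sequences.size)
    // a(j)(k): transitions j -> k normalized by all transitions out of j
    val a = transCounts.map { row =>
      val total = row.sum
      if (total > 0) row.map(_ / total) else row
    }
    (pi, a)
  }
}
```

Rows with no observed outgoing transitions are left at zero here; in practice, smoothing (for example, adding a small pseudo-count) avoids assigning zero probability to unseen transitions.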
Nth-order Markov:
The Markov property is popular, mainly because of its simplicity. As you will discover while studying the hidden Markov model, having a state solely dependent on the previous state allows us to apply efficient dynamic programming techniques. However, some problems require dependencies between more than two states. These models are known as Markov random fields.
Although the discrete Markov process can be applied to trial and error types of applications, its applicability is limited to solving problems for which the observations do not depend on hidden states. Hidden Markov models are a commonly applied technique to meet such a challenge.