Chapter 7. Sequential Data Models

As seen in the previous chapter, the Naive Bayes model does not make any assumptions regarding the order of events. The likelihood and prior probabilities are computed by aggregating counts. However, some applications, such as text or voice recognition, language translation, or decoding, rely on an ordered sequence of states or events [7:1].

Chapter 3, Data Pre-processing, introduced some deterministic solutions to modeling sequences of data, collectively known as time series analysis. An alternative probabilistic solution relies on Markov processes and models.

The broad universe of Markov models encompasses computational concepts such as the Markov decision process, the discrete Markov process, Markov chain Monte Carlo for Bayesian networks, and hidden Markov models.

The first section of this chapter introduces and describes the hidden Markov model along with the dynamic programming techniques used in the evaluation, decoding, and training of models for sequential events.

The second and last section of the chapter is dedicated to a discriminative model alternative to the hidden Markov model: conditional random fields. Our example leverages the open source CRF Java library authored by Sunita Sarawagi from the Indian Institute of Technology, Bombay [7:2].

Markov decision processes

This first section describes the basic concepts you need to know to understand, develop, and apply the hidden Markov model, starting with the Markov property.

The Markov property

The Markov property is a characteristic of a stochastic process where the conditional probability distribution of a future state depends only on the current state and not on its past states. In this case, the transitions between states occur at discrete times, and the process is known as a discrete Markov chain.

The first-order discrete Markov chain

The following example is taken from Introduction to Machine Learning by E. Alpaydin [7:3].

Let's consider the following use case. N balls of different colors are hidden in N boxes (one each). The balls can have only three colors {Blue, Red, and Green}. The experimenter draws the balls one by one. The state of the discovery process is defined by the color of the latest ball drawn from one of the boxes: S0 = Blue, S1 = Red, and S2 = Green.

Let {π0, π1, π2} be the initial probabilities that the first ball drawn is Blue, Red, or Green, respectively.

Let qt denote the color of the ball drawn at the time t. The probability of drawing a ball of color Sk at the time t, given that a ball of color Sj was drawn at the time t-1, is defined as the following:

p(qt = Sk | qt-1 = Sj ) = ajk

The probability of drawing a red ball in the first attempt is p(q0 = S1 ) = π1.

The probability of drawing a red ball in the first attempt followed by a blue ball in the second attempt is as follows:

p(q0 = S1 ) p(q1 = S0 | q0 = S1 ) = π1 a10

The process is repeated to create a sequence of states {qt } = {Red, Blue, Blue, Green, …} with the following probability:

p(q0 = S1 ).p(q1 = S0 | q0 = S1 ).p(q2 = S0 | q1 = S0 ).p(q3 = S2 | q2 = S0 )… = π1 .a10 .a00 .a02
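The chain rule above can be sketched in a few lines of code. This is a minimal illustration, assuming hypothetical values for the initial probabilities π and the transition matrix A; the values are not from the book:

```python
# States of the ball-and-boxes example: S0 = Blue, S1 = Red, S2 = Green.
BLUE, RED, GREEN = 0, 1, 2

pi = [0.5, 0.3, 0.2]      # initial state probabilities (hypothetical values)
a = [                      # a[j][k] = p(q_t = S_k | q_t-1 = S_j) (hypothetical)
    [0.6, 0.3, 0.1],
    [0.4, 0.4, 0.2],
    [0.3, 0.3, 0.4],
]

def sequence_probability(states, pi, a):
    """Probability of an observed state sequence under a first-order chain:
    p(q0) * p(q1|q0) * p(q2|q1) * ..."""
    prob = pi[states[0]]
    for prev, curr in zip(states, states[1:]):
        prob *= a[prev][curr]
    return prob

# {Red, Blue, Blue, Green} -> pi_1 * a10 * a00 * a02
p = sequence_probability([RED, BLUE, BLUE, GREEN], pi, a)
```

With these illustrative values, p evaluates to π1·a10·a00·a02 = 0.3 × 0.4 × 0.6 × 0.1.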

The sequence of states/colors can be represented as follows:

Illustration of the ball and boxes example

Let's estimate the probabilities p using historical data (learning phase):

  • The probability of drawing a red ball (S1) in the first attempt, π1, is estimated as the number of sequences starting with S1 (red) divided by the total number of sequences.
  • The transition probability a10 is estimated as the number of times a blue ball is drawn immediately after a red ball divided by the total number of times a red ball is drawn, and so on.
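The counting estimates above can be sketched as follows. The training sequences are hypothetical, introduced only to illustrate the learning phase:

```python
from collections import Counter

# States: S0 = Blue, S1 = Red, S2 = Green.
BLUE, RED, GREEN = 0, 1, 2
NUM_STATES = 3

sequences = [                      # hypothetical historical draws
    [RED, BLUE, BLUE, GREEN],
    [BLUE, BLUE, GREEN, RED],
    [RED, GREEN, BLUE, BLUE],
]

# pi_k: fraction of sequences whose first draw is state k.
starts = Counter(seq[0] for seq in sequences)
pi = [starts[k] / len(sequences) for k in range(NUM_STATES)]

# a_jk: fraction of transitions out of state j that land in state k.
transitions = Counter()
for seq in sequences:
    for prev, curr in zip(seq, seq[1:]):
        transitions[(prev, curr)] += 1

a = [[0.0] * NUM_STATES for _ in range(NUM_STATES)]
for j in range(NUM_STATES):
    total = sum(transitions[(j, k)] for k in range(NUM_STATES))
    for k in range(NUM_STATES):
        a[j][k] = transitions[(j, k)] / total if total else 0.0
```

Each row of the estimated transition matrix sums to one, as expected of a conditional distribution over next states.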

    Note

    Nth-order Markov:

    The Markov property is popular, mainly because of its simplicity. As you will discover while studying the hidden Markov model, having a state depend solely on the previous state allows us to apply efficient dynamic programming techniques. However, some problems require dependencies spanning more than two states; these are known as Nth-order Markov models. A related class of models with undirected dependencies between states is known as Markov random fields.
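A higher-order dependency can always be folded back into a first-order chain by treating consecutive state tuples as composite states. The following sketch (not from the book) illustrates this for a second-order chain over the three colors:

```python
from itertools import product

states = ["Blue", "Red", "Green"]

# Composite states are ordered pairs (q_t-1, q_t); a second-order transition
# p(q_t | q_t-1, q_t-2) becomes a first-order transition between pairs.
composite = list(product(states, states))

def compatible(pair_from, pair_to):
    """A move (a, b) -> (b, c) is the only legal first-order transition
    between composite states: the pairs must overlap on the shared state."""
    return pair_from[1] == pair_to[0]

# Of the 9 x 9 candidate transitions, only those with overlapping pairs remain.
legal = sum(compatible(p, q) for p, q in product(composite, composite))
```

The cost of this reduction is a larger state space (9 composite states instead of 3, and 27 legal transitions), which is why first-order models remain attractive.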

Although the discrete Markov process can be applied to trial and error types of applications, its applicability is limited to solving problems for which the observations do not depend on hidden states. Hidden Markov models are a commonly applied technique to meet such a challenge.
