Building Hidden Markov Models for sequential data

Hidden Markov Models (HMMs) are powerful tools for analyzing sequential data. They are used extensively in finance, speech analysis, weather forecasting, word sequencing, and so on. In these settings, we are often interested in uncovering hidden patterns that appear over time.

Any data source that produces a sequence of outputs may contain such patterns. Note that HMMs are generative models, which means that they can generate data once they learn the underlying structure. In their basic form, HMMs do not discriminate between classes. This is in contrast to discriminative models, which learn to discriminate between classes but cannot generate data.

Getting ready

For example, let's say that we want to predict whether the weather will be sunny, chilly, or rainy tomorrow. To do this, we look at observable parameters, such as temperature, pressure, and so on, whereas the underlying state is hidden. Here, the underlying state is one of three options: sunny, chilly, or rainy. If you wish to learn more about HMMs, check out the tutorial at https://www.robots.ox.ac.uk/~vgg/rg/slides/hmm.pdf.
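To make the idea of hidden states and observed values concrete, here is a minimal NumPy sketch of how such a model generates data. All probabilities, temperature means, and standard deviations are hypothetical values chosen for illustration; they are not taken from any dataset, and the sketch does not use hmmlearn.

```python
import numpy as np

# Hypothetical parameters, chosen only for illustration.
states = ["sunny", "chilly", "rainy"]
start_prob = np.array([0.5, 0.3, 0.2])
# trans_prob[i, j] = P(tomorrow is state j | today is state i)
trans_prob = np.array([
    [0.7, 0.2, 0.1],
    [0.3, 0.4, 0.3],
    [0.2, 0.3, 0.5],
])
# Each hidden state emits a temperature from its own Gaussian.
temp_mean = np.array([28.0, 12.0, 18.0])
temp_std = np.array([3.0, 2.0, 2.5])


def sample_weather(num_days, seed=0):
    """Draw a hidden state path and the temperatures it emits."""
    rng = np.random.default_rng(seed)
    hidden = np.empty(num_days, dtype=int)
    temps = np.empty(num_days)
    state = rng.choice(len(states), p=start_prob)
    for day in range(num_days):
        hidden[day] = state
        temps[day] = rng.normal(temp_mean[state], temp_std[state])
        state = rng.choice(len(states), p=trans_prob[state])
    return hidden, temps


hidden, temps = sample_weather(7)
for h, t in zip(hidden, temps):
    print(f"{states[h]:>6}: {t:.1f}")
```

In practice, we only see the temperatures; the task of an HMM is to recover the parameters and the state path from them.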

We will use hmmlearn to build and train HMMs. Make sure that you install this before you proceed. You can find the installation instructions at http://hmmlearn.readthedocs.org/en/latest.

How to do it…

Create a new Python file, and import the following packages:

import datetime

import numpy as np
import matplotlib.pyplot as plt
from hmmlearn.hmm import GaussianHMM

from convert_to_timeseries import convert_data_to_timeseries

We will use the data from a file named data_hmm.txt that is already provided to you. This file contains comma-separated lines. Each line contains three values: a year, a month, and a floating-point value. Let's load this into a NumPy array:
```
# Load data from input file
input_file = 'data_hmm.txt'
data = np.loadtxt(input_file, delimiter=',')
```
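If you don't have data_hmm.txt at hand, the following sketch shows what np.loadtxt returns for this layout. The three rows are made up for illustration and are read from an in-memory string instead of a file.

```python
import io
import numpy as np

# Made-up rows in the year,month,value layout described above.
sample_text = "1940,1,2.5\n1940,2,3.1\n1940,3,2.8\n"
sample = np.loadtxt(io.StringIO(sample_text), delimiter=',')

print(sample.shape)   # one row per line, one column per field
print(sample[:, 2])   # the third column holds the values to model
```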
Let's stack the data column-wise for analysis. Technically, we don't need to column-stack here because there is only one column. However, if you had more than one column to analyze, you could use this structure:
```
# Arrange data for training 
X = np.column_stack([data[:,2]])
```
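As a quick illustration of the multi-column case, the sketch below stacks two measurement columns. The array data_demo and its fourth column are hypothetical; the recipe's file has only one value column.

```python
import numpy as np

# Made-up rows: year, month, and two hypothetical measurement columns.
data_demo = np.array([
    [1940, 1, 2.5, 10.0],
    [1940, 2, 3.1, 11.5],
    [1940, 3, 2.8, 9.7],
])

# One column: shape (n_samples, 1), as in the recipe.
X_single = np.column_stack([data_demo[:, 2]])
# Two columns: shape (n_samples, 2), one feature per column.
X_multi = np.column_stack([data_demo[:, 2], data_demo[:, 3]])

print(X_single.shape, X_multi.shape)
```

Either way, the result is a two-dimensional array with one row per observation, which is the shape that fit expects.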
Create and train the HMM using four components. The number of components is a hyperparameter that we have to choose. Here, by selecting four, we assume that the data is generated by four underlying states. We will see how the performance varies with this parameter soon:
```
# Create and train Gaussian HMM 
print("\nTraining HMM....")
num_components = 4
model = GaussianHMM(n_components=num_components, covariance_type="diag", n_iter=1000)
model.fit(X)
```

Run the predictor to get the hidden states:

# Predict the hidden states of HMM 
hidden_states = model.predict(X)
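
To give a feel for what decoding hidden states involves, here is a minimal NumPy sketch of the Viterbi algorithm for one-dimensional Gaussian emissions. hmmlearn's documentation lists Viterbi as the default decoder, but this sketch is not the library's implementation; the function names and the two-state parameters are hypothetical.

```python
import numpy as np


def gaussian_log_pdf(x, mean, var):
    """Log density of a 1-D Gaussian, evaluated elementwise."""
    return -0.5 * (np.log(2 * np.pi * var) + (x - mean) ** 2 / var)


def viterbi(obs, start_prob, trans_prob, means, variances):
    """Return the most likely hidden state path for 1-D observations."""
    n_obs, n_states = len(obs), len(start_prob)
    log_trans = np.log(trans_prob)
    # log_emit[t, k] = log p(obs[t] | state k)
    log_emit = gaussian_log_pdf(obs[:, None], means[None, :], variances[None, :])

    score = np.empty((n_obs, n_states))            # best log-probability so far
    back = np.zeros((n_obs, n_states), dtype=int)  # best predecessor state
    score[0] = np.log(start_prob) + log_emit[0]
    for t in range(1, n_obs):
        cand = score[t - 1][:, None] + log_trans   # cand[i, j]: move from i to j
        back[t] = cand.argmax(axis=0)
        score[t] = cand.max(axis=0) + log_emit[t]

    # Trace the best path backwards from the best final state.
    path = np.empty(n_obs, dtype=int)
    path[-1] = score[-1].argmax()
    for t in range(n_obs - 2, -1, -1):
        path[t] = back[t + 1, path[t + 1]]
    return path


# Hypothetical two-state model: state 0 emits near 0, state 1 near 10.
obs = np.array([0.2, -0.1, 9.8, 10.3, 0.1])
path = viterbi(
    obs,
    start_prob=np.array([0.5, 0.5]),
    trans_prob=np.array([[0.9, 0.1], [0.1, 0.9]]),
    means=np.array([0.0, 10.0]),
    variances=np.array([1.0, 1.0]),
)
print(path)
```

Working in log space avoids numerical underflow when many small probabilities are multiplied along a long sequence.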

Compute the mean and variance of the hidden states:

print("\nMeans and variances of hidden states:")
for i in range(model.n_components):
    print("\nHidden state", i + 1)
    print("Mean =", round(model.means_[i][0], 3))
    print("Variance =", round(np.diag(model.covars_[i])[0], 3))
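
One way to sanity-check these numbers is to compute the empirical mean and variance of the observations assigned to each decoded state. The sketch below does this on made-up arrays (X_demo and states_demo are hypothetical stand-ins for X and hidden_states). Expect the values to be close to, but not identical to, the fitted parameters, since training weighs observations by state probabilities while decoding assigns each observation to exactly one state.

```python
import numpy as np

# Made-up observations and a made-up decoded state sequence.
X_demo = np.array([[1.0], [1.2], [5.0], [5.4], [0.8], [5.2]])
states_demo = np.array([0, 0, 1, 1, 0, 1])

for k in np.unique(states_demo):
    # Select the observations that were assigned to state k.
    values = X_demo[states_demo == k, 0]
    print("State", k + 1, "mean =", round(values.mean(), 3),
          "variance =", round(values.var(), 3))
```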

As we discussed earlier, HMMs are generative models. So, let's generate, for example, 1000 samples and plot them:

# Generate data using model
num_samples = 1000
samples, _ = model.sample(num_samples) 
plt.plot(np.arange(num_samples), samples[:,0], c='black')
plt.title('Number of components = ' + str(num_components))

plt.show()

The full code is given in the hmm.py file that is already provided to you. If you run the code, a figure showing the generated samples is displayed.

You can experiment with the n_components parameter to see how the curve changes as you increase it. Allowing a larger number of hidden states gives the model more freedom to fit the data. If you increase it to 8, the curve becomes smoother; if you increase it to 12, it becomes smoother still.

In the Terminal, the script prints the mean and variance of each hidden state.
