Hidden Markov Models (HMMs) are powerful tools for analyzing sequential data. They are used extensively in finance, speech analysis, weather forecasting, word sequencing, and so on. We are often interested in uncovering hidden patterns that appear over time.
Any source of data that produces a sequence of outputs could produce patterns. Note that HMMs are generative models, which means that they can generate the data once they learn the underlying structure. HMMs cannot discriminate between classes in their base forms. This is in contrast to discriminative models that can learn to discriminate between classes but cannot generate data.
For example, let's say that we want to predict whether the weather will be sunny, chilly, or rainy tomorrow. To do this, we look at observable parameters such as temperature and pressure, whereas the underlying state is hidden. Here, the underlying state refers to the three available options: sunny, chilly, or rainy. If you wish to learn more about HMMs, check out this tutorial at https://www.robots.ox.ac.uk/~vgg/rg/slides/hmm.pdf.
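To make the weather example concrete, here is a minimal sketch of what "generative" means: we pick a hidden state each day and let it emit a temperature. It uses plain NumPy rather than hmmlearn, and every probability, mean, and standard deviation in it is a made-up assumption for illustration, not something estimated from data.

```python
import numpy as np

# Hypothetical weather HMM; every number below is made up for illustration.
states = ["sunny", "chilly", "rainy"]
start_prob = np.array([0.6, 0.3, 0.1])
# trans_prob[i, j] = P(tomorrow is state j | today is state i)
trans_prob = np.array([
    [0.7, 0.2, 0.1],
    [0.3, 0.4, 0.3],
    [0.2, 0.3, 0.5],
])
# Each hidden state emits a temperature (deg C) from its own Gaussian.
temp_mean = np.array([25.0, 10.0, 15.0])
temp_std = np.array([3.0, 2.0, 2.5])

def sample_weather(num_days, seed=0):
    """Draw a hidden state path and the temperatures it emits."""
    rng = np.random.default_rng(seed)
    hidden = np.empty(num_days, dtype=int)
    hidden[0] = rng.choice(len(states), p=start_prob)
    for t in range(1, num_days):
        # The next state depends only on the current state.
        hidden[t] = rng.choice(len(states), p=trans_prob[hidden[t - 1]])
    temps = rng.normal(temp_mean[hidden], temp_std[hidden])
    return hidden, temps

hidden, temps = sample_weather(7)
for h, temp in zip(hidden, temps):
    print(f"{states[h]:>6}  {temp:5.1f}")
```

In practice we only see the temperatures. Training an HMM means estimating the transition and emission parameters from such observations alone.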
We will use hmmlearn to build and train HMMs. Make sure that you install this before you proceed. You can find the installation instructions at http://hmmlearn.readthedocs.org/en/latest.
    import datetime

    import numpy as np
    import matplotlib.pyplot as plt
    from hmmlearn.hmm import GaussianHMM

    from convert_to_timeseries import convert_data_to_timeseries
We will use the data from the data_hmm.txt file that is already provided to you. This file contains comma-separated lines. Each line contains three values: a year, a month, and a floating-point value. Let's load this into a NumPy array:

    # Load data from input file
    input_file = 'data_hmm.txt'
    data = np.loadtxt(input_file, delimiter=',')
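If you want to check what np.loadtxt returns for this layout without the real file, the following sketch parses a few rows from an in-memory string. The three rows are invented placeholders in the year, month, value format described above; they are not taken from data_hmm.txt.

```python
import io

import numpy as np

# Three made-up rows in the year,month,value layout described above.
sample_text = "1940,1,17.4\n1940,2,21.3\n1940,3,19.8\n"
sample = np.loadtxt(io.StringIO(sample_text), delimiter=',')

print(sample.shape)   # (rows, columns)
print(sample[:, 2])   # the third column holds the measured value
```

Note that np.loadtxt returns a single floating-point array by default, so the year and month columns are stored as floats as well.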
    # Arrange data for training
    X = np.column_stack([data[:,2]])
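The column_stack call is there because hmmlearn expects a 2-D array of shape (n_samples, n_features), while data[:,2] on its own is 1-D. A small sketch with made-up values (the names values and X_demo are only for illustration) shows the difference:

```python
import numpy as np

values = np.array([17.4, 21.3, 19.8])  # made-up stand-in for data[:, 2]
X_demo = np.column_stack([values])

print(values.shape)  # 1-D: one entry per sample
print(X_demo.shape)  # 2-D: one row per sample, one feature column
```

Calling values.reshape(-1, 1) would give the same result. To train on more than one feature, you would pass additional columns in the list given to column_stack.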
    # Create and train Gaussian HMM
    print("Training HMM....")
    num_components = 4
    model = GaussianHMM(n_components=num_components,
                        covariance_type="diag", n_iter=1000)
    model.fit(X)
    # Predict the hidden states of HMM
    hidden_states = model.predict(X)
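By default, hmmlearn decodes the most likely state sequence with the Viterbi algorithm. The following is a minimal NumPy sketch of that idea for one-dimensional Gaussian emissions. It is not hmmlearn's implementation, and the function name viterbi_gaussian and the two-state parameters are assumptions made up for this example.

```python
import numpy as np

def viterbi_gaussian(obs, start_prob, trans_prob, means, variances):
    """Most likely hidden state path for 1-D Gaussian emissions."""
    obs = np.asarray(obs, dtype=float)
    n_states = len(start_prob)
    # log N(obs[t]; means[k], variances[k]) for every time t and state k
    log_emit = -0.5 * (np.log(2 * np.pi * variances)
                       + (obs[:, None] - means) ** 2 / variances)
    log_trans = np.log(trans_prob)
    score = np.log(start_prob) + log_emit[0]
    back = np.zeros((len(obs), n_states), dtype=int)
    for t in range(1, len(obs)):
        # cand[i, j] = best score ending in state i at t-1, then moving to j
        cand = score[:, None] + log_trans
        back[t] = cand.argmax(axis=0)
        score = cand.max(axis=0) + log_emit[t]
    # Trace the best path backwards from the best final state.
    path = np.empty(len(obs), dtype=int)
    path[-1] = score.argmax()
    for t in range(len(obs) - 1, 0, -1):
        path[t - 1] = back[t, path[t]]
    return path

# Hypothetical two-state model: state 0 sits near 0.0, state 1 near 5.0.
path = viterbi_gaussian(
    obs=[0.1, -0.2, 5.1, 4.8, 0.3],
    start_prob=np.array([0.5, 0.5]),
    trans_prob=np.array([[0.9, 0.1], [0.1, 0.9]]),
    means=np.array([0.0, 5.0]),
    variances=np.array([1.0, 1.0]),
)
print(path)
```

Working in log space avoids numerical underflow on long sequences, which is why the scores are sums of logarithms rather than products of probabilities.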
    print("Means and variances of hidden states:")
    for i in range(model.n_components):
        print("Hidden state", i + 1)
        print("Mean =", round(model.means_[i][0], 3))
        print("Variance =", round(np.diag(model.covars_[i])[0], 3))
Let's generate 1000 samples and plot this:

    # Generate data using model
    num_samples = 1000
    samples, _ = model.sample(num_samples)
    plt.plot(np.arange(num_samples), samples[:,0], c='black')
    plt.title('Number of components = ' + str(num_components))
    plt.show()
The code for this section is in the hmm.py file that is already provided to you. If you run the code, you will see the following figure:

You can experiment with the n_components parameter to see how the curve gets smoother as you increase it. Allowing a larger number of hidden states gives the model more freedom to fit the data. If you increase it to 8, you will see the following figure:

If you increase it to 12, it will get even smoother:
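One rough way to see how much extra freedom a larger n_components gives the model is to count its free parameters. The sketch below does this for a Gaussian HMM with diagonal covariances, which is the setup used above. The helper name gaussian_hmm_free_params is made up for this example and is not part of hmmlearn.

```python
def gaussian_hmm_free_params(n_states, n_features=1):
    """Free parameters of a Gaussian HMM with diagonal covariances."""
    start = n_states - 1                      # start probabilities sum to 1
    transitions = n_states * (n_states - 1)   # each transition row sums to 1
    means = n_states * n_features             # one mean per state and feature
    variances = n_states * n_features         # one variance per state and feature
    return start + transitions + means + variances

for n in (4, 8, 12):
    print(n, gaussian_hmm_free_params(n))
```

The count grows roughly with the square of the number of states because of the transition matrix. Keep this in mind when you have only a short sequence to train on.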