Building Hidden Markov Models for sequential data

Hidden Markov Models (HMMs) are well suited to analyzing sequential data. They are used extensively in finance, speech analysis, weather forecasting, word sequencing, and so on. In these applications, we are often interested in uncovering hidden patterns that appear over time.

Any source of data that produces a sequence of outputs could produce patterns. Note that HMMs are generative models, which means that they can generate new data once they have learned the underlying structure. In their base form, HMMs cannot discriminate between classes. This is in contrast to discriminative models, which can learn to discriminate between classes but cannot generate data.

Getting ready

For example, let's say that we want to predict whether tomorrow's weather will be sunny, chilly, or rainy. To do this, we look at observable parameters, such as temperature and pressure, whereas the underlying state is hidden. Here, the underlying state is one of the three available options: sunny, chilly, or rainy. If you wish to learn more about HMMs, check out this tutorial at https://www.robots.ox.ac.uk/~vgg/rg/slides/hmm.pdf.
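
To make the weather example concrete, here is a minimal sketch of how an HMM generates data: a hidden state evolves from day to day, and each day we only observe a temperature drawn from that state's distribution. All of the numbers below (start probabilities, transition matrix, temperature means and standard deviations) and the sample_weather helper are hypothetical values chosen for illustration, not parameters learned from real data:

```python
import numpy as np

states = ["sunny", "chilly", "rainy"]

# Hypothetical parameters, chosen only for illustration
start_prob = np.array([0.5, 0.3, 0.2])       # P(state on day 0)
trans_prob = np.array([                      # P(next state | current state)
    [0.7, 0.2, 0.1],
    [0.3, 0.4, 0.3],
    [0.2, 0.3, 0.5],
])
temp_mean = np.array([25.0, 10.0, 15.0])     # mean temperature per state
temp_std = np.array([3.0, 2.0, 2.5])         # temperature spread per state

def sample_weather(num_days, seed=0):
    """Sample a hidden state sequence and the observed temperatures."""
    rng = np.random.RandomState(seed)
    hidden = np.empty(num_days, dtype=int)
    observed = np.empty(num_days)
    state = rng.choice(len(states), p=start_prob)
    for day in range(num_days):
        hidden[day] = state
        # The observation depends only on the current hidden state
        observed[day] = rng.normal(temp_mean[state], temp_std[state])
        # The next state depends only on the current state (Markov property)
        state = rng.choice(len(states), p=trans_prob[state])
    return hidden, observed

hidden, observed = sample_weather(10)
for h, t in zip(hidden, observed):
    print("{:<7s} {:.1f}".format(states[h], t))
```

In practice, we see only the observed column. Training an HMM amounts to estimating the probabilities and emission parameters above from such observations.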

We will use hmmlearn to build and train HMMs. Make sure that you install this before you proceed. You can find the installation instructions at http://hmmlearn.readthedocs.org/en/latest.

How to do it…

  1. Create a new Python file, and import the following packages:
    import datetime
    
    import numpy as np
    import matplotlib.pyplot as plt
    from hmmlearn.hmm import GaussianHMM
    
    from convert_to_timeseries import convert_data_to_timeseries
  2. We will use the data from a file named data_hmm.txt that is already provided to you. This file contains comma-separated lines. Each line contains three values: a year, a month, and a floating-point value. Let's load this into a NumPy array:
    # Load data from input file
    input_file = 'data_hmm.txt'
    data = np.loadtxt(input_file, delimiter=',')
  3. Let's stack the data column-wise for analysis. Technically, we don't need to column-stack here because we are analyzing only one column. However, if you have more than one column to analyze, you can use the same structure:
    # Arrange data for training 
    X = np.column_stack([data[:,2]])
  4. Create and train the HMM using four components. The number of components is a hyperparameter that we have to choose. Here, by selecting four, we say that the data is being generated using four underlying states. We will see how the performance varies with this parameter soon:
    # Create and train Gaussian HMM 
    print("\nTraining HMM....")
    num_components = 4
    model = GaussianHMM(n_components=num_components, covariance_type="diag", n_iter=1000)
    model.fit(X)
  5. Run the predictor to get the hidden states:
    # Predict the hidden states of HMM 
    hidden_states = model.predict(X)
  6. Compute the mean and variance of the hidden states:
    print("\nMeans and variances of hidden states:")
    for i in range(model.n_components):
        print("\nHidden state {}".format(i + 1))
        print("Mean = {}".format(round(model.means_[i][0], 3)))
        print("Variance = {}".format(round(np.diag(model.covars_[i])[0], 3)))
  7. As we discussed earlier, HMMs are generative models. So, let's generate 1,000 samples from the trained model and plot them:
    # Generate data using model
    num_samples = 1000
    samples, _ = model.sample(num_samples) 
    plt.plot(np.arange(num_samples), samples[:,0], c='black')
    plt.title('Number of components = ' + str(num_components))
    
    plt.show()
  8. The full code is given in the hmm.py file that is already provided to you. If you run the code, you will see the following figure:
    [Figure: 1000 samples generated by the model, titled "Number of components = 4"]
  9. You can experiment with the n_components parameter to see how the generated curve gets smoother as you increase it. Allowing a larger number of hidden states gives the model more freedom to fit the training data. If you increase it to 8, you will see the following figure:
    [Figure: 1000 samples generated by the model, titled "Number of components = 8"]
  10. If you increase this to 12, it will get even smoother:
    [Figure: 1000 samples generated by the model, titled "Number of components = 12"]
  11. In the Terminal, you will get the following output:
    [Terminal output: the mean and variance of each hidden state; not reproduced here]