Summary

In this chapter, we introduced ourselves to one of the most active areas of research in machine learning, namely the field of probabilistic graphical models. These models use a graphical structure to encode conditional independence relations between random variables. We saw how Bayes' Theorem, a very simple formula that essentially tells us how we can infer a cause by observing its effects, can be used to build a simple classifier known as the Naïve Bayes classifier. This is a simple model in which we try to predict the output class that best explains a set of observed features, all of which are assumed to be independent of each other given the output class.
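To make the decision rule concrete, here is a minimal sketch of Bayes' Theorem applied as a Naïve Bayes classifier. The two classes, the two word features, and every probability value below are made-up toy numbers for illustration, not values from the chapter.

```python
# A minimal sketch of the Naive Bayes decision rule built on Bayes' Theorem.
# All priors and per-feature likelihoods are invented toy numbers.

priors = {"positive": 0.5, "negative": 0.5}

# P(feature | class) for two word features, assumed independent given the class.
likelihoods = {
    "positive": {"great": 0.40, "boring": 0.05},
    "negative": {"great": 0.10, "boring": 0.30},
}

def posterior_scores(observed_features):
    """Score each class as prior * product of feature likelihoods, i.e. Bayes'
    rule without the normalizing constant, which is the same for every class."""
    scores = {}
    for cls, prior in priors.items():
        score = prior
        for feat in observed_features:
            score *= likelihoods[cls][feat]
        scores[cls] = score
    return scores

scores = posterior_scores(["great"])
prediction = max(scores, key=scores.get)
print(scores, prediction)  # the class with the highest score best explains the features
```

The class with the largest unnormalized score is the prediction; dividing by the sum of the scores would recover the actual posterior probabilities.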

We used this model to predict user sentiment on a set of movie reviews, where the features were the words present in each review. Although we obtained reasonable accuracy, we found that the assumptions of our model were quite strict and prevented us from doing substantially better. Often, a Naïve Bayes model is built early in the modeling process to provide a baseline performance that we expect more sophisticated models to exceed.
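As an illustration of this workflow, here is a minimal sketch of bag-of-words sentiment classification with a Naïve Bayes model. It assumes scikit-learn is available and uses a two-review placeholder corpus, not the movie review data discussed in the chapter.

```python
# A minimal sketch of bag-of-words sentiment classification with Naive Bayes.
# The two "reviews" are placeholders, not the chapter's movie review data.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

reviews = ["a wonderful, moving film", "dull plot and wooden acting"]
labels = ["positive", "negative"]

# The words present in each review become the features; the model assumes they
# are conditionally independent given the sentiment class.
model = make_pipeline(CountVectorizer(), MultinomialNB())
model.fit(reviews, labels)

print(model.predict(["a moving and wonderful story"]))
```

In practice, the same pipeline would be fit on the full training split and evaluated on held-out reviews to obtain the baseline accuracy mentioned above.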

We also studied hidden Markov models (HMMs), which are typically used to label and predict sequences. Every position in the sequence consists of a hidden state and an observation emitted from that state. The key assumption of the model is that every state is independent of the entire sequence history given the state that immediately preceded it. In addition, all observations are independent of each other, as well as of all other states in the sequence, given the state from which they were emitted.
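These two independence assumptions mean that the joint probability of a state sequence and its observations factorizes into one transition probability and one emission probability per position. The following sketch computes this joint probability for a toy model; the state names, observation symbols, and all probabilities are invented for illustration.

```python
# A minimal sketch of the HMM factorization: P(states, observations) is a product
# of transition and emission probabilities. All values below are toy numbers.
import numpy as np

states = ["Rainy", "Sunny"]          # hypothetical hidden states
symbols = ["walk", "shop", "clean"]  # hypothetical observed symbols

initial = np.array([0.6, 0.4])                 # P(first state)
transition = np.array([[0.7, 0.3],             # P(next state | current state)
                       [0.4, 0.6]])
emission = np.array([[0.1, 0.4, 0.5],          # P(observation | state)
                     [0.6, 0.3, 0.1]])

def joint_probability(state_seq, obs_seq):
    """P(states, observations) under the Markov and emission-independence assumptions."""
    s = [states.index(x) for x in state_seq]
    o = [symbols.index(x) for x in obs_seq]
    prob = initial[s[0]] * emission[s[0], o[0]]
    for t in range(1, len(s)):
        prob *= transition[s[t - 1], s[t]] * emission[s[t], o[t]]
    return prob

print(joint_probability(["Rainy", "Rainy", "Sunny"], ["walk", "clean", "shop"]))
```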

When we have labeled sequences, we can train a hidden Markov model by using state transition and symbol emission counts obtained directly from the data. It is also possible to train an HMM without labels using the Baum-Welch algorithm, an instance of the expectation maximization (EM) algorithm. Even though we did not dive into the algorithmic details, we saw an example of how this works in practice by training an HMM on sequences of characters in English words.
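The supervised case reduces to counting and normalizing. Here is a minimal sketch, assuming two short hand-made sequences of (state, observation) pairs; the sequences themselves are placeholders, not data from the chapter.

```python
# A minimal sketch of supervised HMM training: transition and emission
# probabilities are estimated by normalizing counts from labeled sequences.
from collections import Counter, defaultdict

labeled_sequences = [
    [("Rainy", "clean"), ("Rainy", "shop"), ("Sunny", "walk")],
    [("Sunny", "walk"), ("Sunny", "shop"), ("Rainy", "clean")],
]

transition_counts = defaultdict(Counter)
emission_counts = defaultdict(Counter)

for seq in labeled_sequences:
    for (prev_state, _), (state, _) in zip(seq, seq[1:]):
        transition_counts[prev_state][state] += 1   # state -> next state counts
    for state, obs in seq:
        emission_counts[state][obs] += 1            # state -> symbol counts

def normalize(counter):
    total = sum(counter.values())
    return {key: count / total for key, count in counter.items()}

transition_probs = {s: normalize(c) for s, c in transition_counts.items()}
emission_probs = {s: normalize(c) for s, c in emission_counts.items()}
print(transition_probs)
print(emission_probs)
```

When the hidden states are not observed, Baum-Welch replaces these hard counts with expected counts computed under the current model and iterates until convergence.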

From this, we saw that the resulting model picked up on some interesting properties of language. Incidentally, even though we did not cover it, it is also possible to train a Naïve Bayes model when class labels are missing, again using the EM algorithm. Despite their relatively strict independence assumptions, HMMs are quite powerful and have been applied successfully to a wide variety of domains, from speech processing to molecular biology.
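As a rough sketch of that idea, the loop below fits an unsupervised Naïve Bayes model over binary word features by alternating soft class assignments (E-step) with parameter re-estimation (M-step). The random data, the number of classes, and the smoothing are all assumptions made for illustration.

```python
# A minimal sketch of EM for unsupervised Naive Bayes over binary features.
# The data, class count, and smoothing are toy assumptions.
import numpy as np

rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(20, 6)).astype(float)  # 20 "documents", 6 binary features
n_classes = 2
n_feats = X.shape[1]

# Random initialization of class priors and per-class feature probabilities.
priors = np.full(n_classes, 1.0 / n_classes)
theta = rng.uniform(0.25, 0.75, size=(n_classes, n_feats))

for _ in range(50):
    # E-step: posterior responsibility of each class for each document.
    log_lik = X @ np.log(theta).T + (1 - X) @ np.log(1 - theta).T + np.log(priors)
    log_lik -= log_lik.max(axis=1, keepdims=True)
    resp = np.exp(log_lik)
    resp /= resp.sum(axis=1, keepdims=True)

    # M-step: re-estimate priors and feature probabilities from soft counts.
    priors = resp.mean(axis=0)
    theta = (resp.T @ X + 1) / (resp.sum(axis=0)[:, None] + 2)  # Laplace smoothing

print(priors)
```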

In the next chapter, we will look at analyzing and making predictions on time series. Many real-world applications involve taking measurements over a particular period of time and using them to make predictions about the future. For example, we might want to predict tomorrow's weather based on the weather today, or tomorrow's stock market index based on market fluctuations over the past few weeks.
