Conditional Random Fields (CRFs) are probabilistic models used to analyze structured data. They are frequently used to label and segment sequential data. CRFs are discriminative models, as opposed to HMMs, which are generative models. CRFs are used extensively to analyze sequential data such as stock prices, speech, and text. In these models, given a particular observation sequence, we define a conditional probability distribution over the possible label sequences. This is in contrast with HMMs, where we define a joint distribution over the label sequence and the observed sequence.
HMMs assume that, given the current hidden state, the current output is statistically independent of the previous outputs. HMMs need this assumption to keep inference tractable. However, this assumption need not always be true! The current output in a time series setup, more often than not, depends on previous outputs. One of the main advantages of CRFs over HMMs is that they are conditional by nature, which means that we are not assuming any independence between output observations. CRFs also tend to outperform HMMs in a number of applications, such as linguistics, bioinformatics, and speech analysis. In this recipe, we will learn how to use CRFs to analyze sequences of letters.
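The difference between the two model families can be written out explicitly. The following is a standard textbook formulation rather than anything specific to this recipe; the symbols (observations x, labels y, feature functions f_k, weights λ_k, and the convention p(y_1 | y_0) = p(y_1)) are notation assumed here for illustration:

```latex
% HMM: joint distribution over labels and observations
p(\mathbf{x}, \mathbf{y}) = \prod_{t=1}^{T} p(y_t \mid y_{t-1}) \, p(x_t \mid y_t)

% Linear-chain CRF: conditional distribution over labels given observations
p(\mathbf{y} \mid \mathbf{x}) = \frac{1}{Z(\mathbf{x})}
  \exp\!\left( \sum_{t=1}^{T} \sum_{k} \lambda_k \, f_k(y_{t-1}, y_t, \mathbf{x}, t) \right),
\qquad
Z(\mathbf{x}) = \sum_{\mathbf{y}'} \exp\!\left( \sum_{t=1}^{T} \sum_{k} \lambda_k \, f_k(y'_{t-1}, y'_t, \mathbf{x}, t) \right)
```

Each feature function in the CRF may look at the whole observation sequence, which is why no independence assumption between observations is required.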
We will use a library called pystruct to build and train CRFs. Make sure that you install it before you proceed. You can find the installation instructions at https://pystruct.github.io/installation.html.
Import the following packages:

import os
import argparse
import pickle

import numpy as np
import matplotlib.pyplot as plt
from pystruct.datasets import load_letters
from pystruct.models import ChainCRF
from pystruct.learners import FrankWolfeSSVM
Define an argument parser that takes the C value as an input argument. C is a regularization hyperparameter that controls how specific you want your model to be without losing the power to generalize:

def build_arg_parser():
    parser = argparse.ArgumentParser(description='Trains the CRF classifier')
    parser.add_argument("--c-value", dest="c_value", required=False,
            type=float, default=1.0,
            help="The C value that will be used for training")
    return parser
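As a quick check of how this parser behaves, the sketch below calls parse_args with explicit argument lists instead of reading the command line. The parser is repeated so that the snippet runs on its own; the value 0.1 is an arbitrary example, not a recommended setting.

```python
import argparse

def build_arg_parser():
    parser = argparse.ArgumentParser(description='Trains the CRF classifier')
    parser.add_argument("--c-value", dest="c_value", required=False,
                        type=float, default=1.0,
                        help="The C value that will be used for training")
    return parser

# An explicit list stands in for the command line, so sys.argv is not read
default_args = build_arg_parser().parse_args([])
custom_args = build_arg_parser().parse_args(["--c-value", "0.1"])

print(default_args.c_value)  # 1.0 (the default)
print(custom_args.c_value)   # 0.1 (converted to float by type=float)
```

On the command line, the same value would be passed as --c-value 0.1 after the script name.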
Define a class to handle all CRF-related processing:

class CRFTrainer(object):

Define an __init__ function to initialize the values:

    def __init__(self, c_value, classifier_name='ChainCRF'):
        self.c_value = c_value
        self.classifier_name = classifier_name

        # Only the chain-structured CRF model is supported
        if self.classifier_name == 'ChainCRF':
            model = ChainCRF()
            # Train the CRF with a structured SVM learner
            self.clf = FrankWolfeSSVM(model=model, C=self.c_value, max_iter=50)
        else:
            raise TypeError('Invalid classifier type')
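    # Load the letters dataset
    def load_data(self):
        letters = load_letters()

        # Each sample is one word; words differ in length, so the
        # sequences are stored in object arrays
        X, y, folds = letters['data'], letters['labels'], letters['folds']
        X, y = np.array(X, dtype=object), np.array(y, dtype=object)
        return X, y, folds

    # Train the CRF. X is a numpy array of samples where each sample
    # has the shape (n_letters, n_features)
    def train(self, X_train, y_train):
        self.clf.fit(X_train, y_train)

    # Evaluate the accuracy of the CRF on test data
    def evaluate(self, X_test, y_test):
        return self.clf.score(X_test, y_test)

    # Run the classifier on input data and return the labels
    # predicted for the first sequence
    def classify(self, input_data):
        return self.clf.predict(input_data)[0]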
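Prediction in a chain-structured model amounts to finding the label sequence with the highest total score, where the score combines per-position (unary) terms with terms for adjacent label pairs (pairwise). The following is a minimal sketch of that idea using Viterbi decoding in plain Python. It is not pystruct's implementation, and the function name viterbi and the score tables are made up for illustration.

```python
def viterbi(unary, pairwise):
    """Return the highest-scoring label sequence for a chain.

    unary[t][k]    : score of label k at position t
    pairwise[j][k] : score of label j being followed by label k
    """
    n_labels = len(unary[0])
    # best[k] = best score of any path that ends in label k so far
    best = list(unary[0])
    backpointers = []
    for t in range(1, len(unary)):
        new_best, pointers = [], []
        for k in range(n_labels):
            # Choose the previous label that gives the best path into k
            prev = max(range(n_labels), key=lambda j: best[j] + pairwise[j][k])
            pointers.append(prev)
            new_best.append(best[prev] + pairwise[prev][k] + unary[t][k])
        best = new_best
        backpointers.append(pointers)

    # Start from the best final label and follow the pointers backwards
    label = max(range(n_labels), key=lambda k: best[k])
    path = [label]
    for pointers in reversed(backpointers):
        label = pointers[label]
        path.append(label)
    return path[::-1]

# Hypothetical scores: two labels, three positions
unary = [[2.0, 0.0], [0.9, 1.0], [2.0, 0.0]]
pairwise = [[1.0, 0.0], [0.0, 1.0]]  # staying on the same label is rewarded

independent = [max(range(2), key=lambda k: row[k]) for row in unary]
print(independent)                # [0, 1, 0]
print(viterbi(unary, pairwise))   # [0, 0, 0]
```

Labeling each position on its own picks label 1 in the middle, while the chain model overrides that weak preference because the neighboring labels make label 0 the better overall choice. This is the kind of context a sequence model can exploit when recognizing letters in a word.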
# Convert an array of integer labels (0-25) into a string of letters
def decoder(arr):
    alphabets = 'abcdefghijklmnopqrstuvwxyz'
    return ''.join(alphabets[i] for i in arr)
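To see what the decoder does, here is a small standalone example. It assumes, as the decoder itself implies, that each label is an integer index into the lowercase alphabet; the input list is handwritten for illustration and is not taken from the dataset.

```python
def decoder(arr):
    alphabets = 'abcdefghijklmnopqrstuvwxyz'
    return ''.join(alphabets[i] for i in arr)

# Handwritten label indices: c=2, o=14, m=12, a=0, n=13, d=3, i=8, g=6
labels = [2, 14, 12, 12, 0, 13, 3, 8, 13, 6]
print(decoder(labels))  # commanding
```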
Define the main function and parse the input arguments:

if __name__ == '__main__':
    args = build_arg_parser().parse_args()
    c_value = args.c_value
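Initialize the CRF trainer with the chosen C value:

    crf = CRFTrainer(c_value)

    # Load the input data and the fold assignments
    X, y, folds = crf.load_data()

    # Split the data into training and test sets using the fold labels
    X_train, X_test = X[folds == 1], X[folds != 1]
    y_train, y_test = y[folds == 1], y[folds != 1]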
    # Train the CRF model
    print("\nTraining the CRF model...")
    crf.train(X_train, y_train)

    # Evaluate the accuracy on the test set
    score = crf.evaluate(X_test, y_test)
    print("\nAccuracy score = " + str(round(score * 100, 2)) + '%')

    # Compare the true and predicted labels for the first test word
    print("\nTrue label = " + decoder(y_test[0]))
    predicted_output = crf.classify([X_test[0]])
    print("Predicted output = " + decoder(predicted_output))
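One natural way to read the accuracy score for sequence labeling is as the fraction of individual letters that were labeled correctly. The sketch below computes that metric in plain Python on handwritten toy labels. It is an assumption about what such a score represents, not a restatement of how pystruct computes it, and letter_accuracy is a hypothetical helper name.

```python
def letter_accuracy(true_seqs, pred_seqs):
    """Fraction of positions, over all sequences, where the labels agree."""
    correct = 0
    total = 0
    for y_true, y_pred in zip(true_seqs, pred_seqs):
        correct += sum(1 for t, p in zip(y_true, y_pred) if t == p)
        total += len(y_true)
    return correct / float(total)

# Toy data: two words, one wrong letter out of seven
y_true = [[2, 14, 12], [3, 8, 13, 6]]
y_pred = [[2, 14, 13], [3, 8, 13, 6]]

score = letter_accuracy(y_true, y_pred)
print("Accuracy score = " + str(round(score * 100, 2)) + '%')  # 85.71%
```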
The full code is given in the crf.py file that is already provided to you. If you run this code, the results are printed on your Terminal. As we can see, the word is supposed to be "commanding". The CRF does a pretty good job of predicting all the letters.