K-fold cross-validation

The input data is split into K parts (folds). In each round, one fold is reserved for testing and the other K-1 folds are used for training. This process is repeated K times, so that every fold serves as the test set exactly once, and the evaluation metrics are averaged over the K rounds. This helps in estimating how well a model would generalize to new data.
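The splitting procedure can be sketched in plain Python. This is only a minimal illustration, not the implementation used by scikit-learn: the function name k_fold_indices is hypothetical, and for simplicity the folds are taken in order without shuffling.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, test_indices) pairs for k-fold cross-validation."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    # Spread the remainder so that fold sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        # The current fold is the test set; everything else is used for training.
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size


# Hypothetical usage: 10 observations split into 5 folds.
folds = list(k_fold_indices(10, 5))
for fold_number, (train, test) in enumerate(folds, start=1):
    print(fold_number, "train:", train, "test:", test)
```

Each observation appears in exactly one test fold, so every data point is used for both training and testing across the K rounds.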

In our example, we have labeled 96 observations in three classes (positive, negative, and neutral). We used 80 observations (83%) as a training set and the remaining 16 observations (17%) as a test set. Many tweets are ambiguous for sentiment classification, even for human beings. Therefore, we would expect a precision of around 80%.
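As a rough sketch of such a holdout split, the following plain-Python snippet shuffles 96 observation ids and sets 16 of them aside for testing. The helper holdout_split and the fixed seed are assumptions made for illustration; the chapter's own split may have been produced differently.

```python
import random


def holdout_split(items, test_size, seed=0):
    """Shuffle a copy of items and split off test_size of them for testing."""
    shuffled = list(items)
    # A seeded generator keeps the split reproducible between runs.
    random.Random(seed).shuffle(shuffled)
    return shuffled[test_size:], shuffled[:test_size]


# Stand-in for the 96 labeled tweets: observation ids only.
observations = list(range(96))
train_set, test_set = holdout_split(observations, test_size=16)
test_share = len(test_set) / len(observations)
print(len(train_set), len(test_set), round(100 * test_share))
```

The printed values are the training set size, the test set size, and the test share in percent (80, 16, and 17).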

We have split our tests into three parts:

Training set 83% - Test set 17%
Cross validation
Qualitative verbatim evaluation

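from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
from sklearn.model_selection import cross_val_predict
# nb and the train/test vectors and labels are assumed to be defined in the earlier steps
test_predicted = nb.predict(test_vectors)
print("Naive Bayes")
print(classification_report(test_labels, test_predicted))
print(confusion_matrix(test_labels, test_predicted))
# 10-fold cross-validation on the training set
predicted = cross_val_predict(nb, train_vectors, train_labels, cv=10)
print("Cross validation %s" % accuracy_score(train_labels, predicted))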

The first test showed a support-weighted average precision of 75%, which is acceptable for a dataset with so few labeled observations, although the average recall is much lower at 44%:

Naive Bayes   precision   recall   f1-score   support
negative      0.80        0.50     0.62       8
neutral       1.00        0.20     0.33       5
positive      0.20        0.67     0.31       3
avg / total   0.75        0.44     0.47       16
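The avg / total row is consistent with an average of the per-class scores weighted by support (the number of test observations in each class). The short check below reproduces it from the rows of the report; the dictionary layout and the helper weighted_average are illustrative choices, not part of the chapter's code.

```python
# Per-class rows copied from the report above: (precision, recall, f1, support).
report = {
    "negative": (0.80, 0.50, 0.62, 8),
    "neutral": (1.00, 0.20, 0.33, 5),
    "positive": (0.20, 0.67, 0.31, 3),
}

total_support = sum(row[3] for row in report.values())


def weighted_average(column):
    """Average one metric column, weighting each class by its support."""
    weighted_sum = sum(row[column] * row[3] for row in report.values())
    return round(weighted_sum / total_support, 2)


# Columns 0, 1 and 2 are precision, recall and f1-score.
averages = [weighted_average(column) for column in range(3)]
print(averages, total_support)
```

For precision, for example, this gives (0.80 × 8 + 1.00 × 5 + 0.20 × 3) / 16 = 0.75. The weighting also explains why the low precision of the positive class, which has only three test observations, has little effect on the average.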

With 10-fold cross-validation on the training set, we obtained an accuracy of around 74%:

Cross validation 0.7375

The human check of the sentiment of the tweets also looks very promising. We have extracted some random verbatims to illustrate the results:
Positive: The success of the Premier League, with its record-breaking takings is impacting in Europe https://t.co/JDulKICszb
Neutral: Arsenal and Manchester United home fixtures moved https://t.co/kyg7H1H6BN, #saintsfc
Negative: Wenger's future at Arsenal plunged into further uncertainty as Palace profit https://t.co/gvbbdLH9gi

As you can see, certain verbatims are too ambiguous for even humans to interpret correctly, so a perfect sentiment analysis algorithm is unrealistic. When we analyze content on a specific topic, such as football in this chapter, creating a custom sentiment analysis algorithm is a good idea. For mixed content, where the topic is not evident, one can use a readily available open source module. However, when building a custom algorithm, it is critical to use validation techniques to be sure of a minimum accuracy.
