K-fold cross-validation

The input data is split into K parts (folds). In each round, one fold is reserved for testing and the other K-1 folds are used for training. This process is repeated K times, so that each fold serves as the test set exactly once, and the evaluation metrics are averaged. This helps in determining how well a model would generalize to new data.
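The splitting procedure described above can be sketched in a few lines of plain Python. This is only an illustration: the helper names `k_fold_indices` and `cross_validate` are hypothetical, the folds are contiguous and unshuffled, and a real project would normally shuffle or stratify the data first.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_idx, test_idx) pairs for k contiguous folds."""
    # Spread the remainder so that fold sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0)
                  for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = list(range(start, start + size))
        # Everything outside the current fold is used for training.
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, test_idx
        start += size


def cross_validate(n_samples, k, score_fold):
    """Average a per-fold score over all k folds."""
    scores = [score_fold(train, test)
              for train, test in k_fold_indices(n_samples, k)]
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # 80 observations and 10 folds give 10 test folds of 8 observations each.
    folds = list(k_fold_indices(80, 10))
    print(len(folds), [len(test) for _, test in folds])
```

Here `score_fold` stands for any function that trains on the training indices and returns a metric computed on the test indices; the average of its K results is the cross-validated score.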

In our example, we have labeled 96 observations in three classes (positive, negative, and neutral). We used 80 observations as a training set and 16 observations (17%) as a test set. Many tweets are ambiguous for sentiment classification, even for human beings. Therefore, we would expect a precision of around 80%.

We have split our tests into three parts:

  • Training set 83% - Test set 17%
  • Cross validation
  • Qualitative verbatim evaluation
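
# Assumes nb, train_vectors, train_labels, test_vectors and test_labels are already defined
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
from sklearn.model_selection import cross_val_predict
print("Naive Bayes")
test_predicted = nb.predict(test_vectors)  # predict once and reuse for both reports
print(classification_report(test_labels, test_predicted))
print(confusion_matrix(test_labels, test_predicted))
predicted = cross_val_predict(nb, train_vectors, train_labels, cv=10)  # 10-fold CV on the training set
print("Cross validation %s" % accuracy_score(train_labels, predicted))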
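The snippet above relies on objects built elsewhere, so here is a self-contained sketch of the same cross-validation step. It is an assumption-laden illustration rather than the chapter's code: the tweets and labels are invented toy data, the function name `cross_validated_accuracy` is hypothetical, only 3 folds are used because the toy set is tiny, and a scikit-learn pipeline is swapped in so that the vectorizer is refit inside each fold.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics import accuracy_score
from sklearn.model_selection import cross_val_predict
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Hypothetical toy data, invented purely for illustration.
texts = [
    "great win for the team", "brilliant goal and a superb match",
    "fans love the new signing", "fantastic season so far",
    "terrible defeat at home", "awful refereeing ruined the game",
    "manager under pressure after loss", "injury crisis gets worse",
    "fixture moved to monday", "kick off at three",
    "squad announced for saturday", "tickets on sale today",
]
labels = ["positive"] * 4 + ["negative"] * 4 + ["neutral"] * 4


def cross_validated_accuracy(texts, labels, folds=3):
    """Return the accuracy of out-of-fold predictions for a Naive Bayes model."""
    # The pipeline refits the bag-of-words vocabulary on each training fold.
    model = make_pipeline(CountVectorizer(), MultinomialNB())
    predicted = cross_val_predict(model, texts, labels, cv=folds)
    return accuracy_score(labels, predicted)


if __name__ == "__main__":
    print("Cross validation %s" % cross_validated_accuracy(texts, labels))
```

Every observation is predicted by a model that never saw it during training, which is what makes the resulting accuracy a fairer estimate than a score computed on the training data itself.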

The first test showed a weighted-average precision of 75%, which is acceptable for a dataset with so few labeled observations, although the average recall is noticeably lower at 44%:

Naive Bayes

              precision    recall  f1-score   support

   negative        0.80      0.50      0.62         8
    neutral        1.00      0.20      0.33         5
   positive        0.20      0.67      0.31         3

avg / total        0.75      0.44      0.47        16
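The avg / total row of the report is consistent with a support-weighted mean of the per-class scores. The short check below recomputes it from the rounded figures in the report; the helper name `weighted_average` is hypothetical, and small rounding differences are to be expected because the inputs are already rounded to two decimals.

```python
# Per-class (precision, recall, support) copied from the report above.
report = {
    "negative": (0.80, 0.50, 8),
    "neutral": (1.00, 0.20, 5),
    "positive": (0.20, 0.67, 3),
}


def weighted_average(values_and_weights):
    """Mean of the values, each weighted by its class support."""
    total = sum(weight for _, weight in values_and_weights)
    return sum(value * weight for value, weight in values_and_weights) / total


avg_precision = weighted_average([(p, s) for p, _, s in report.values()])
avg_recall = weighted_average([(r, s) for _, r, s in report.values()])
print(round(avg_precision, 2), round(avg_recall, 2))
```

The high average precision is driven mostly by the negative and neutral classes, which together account for 13 of the 16 test observations.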

In terms of k-fold cross-validation (10 folds on the training set), we obtained an accuracy of around 74%:

  • Cross validation = 0.7375

The human check of the sentiment of the tweets also looks very promising. We have extracted some random verbatims to illustrate the results:

  • Positive: The success of the Premier League, with its record-breaking takings is impacting in Europe https://t.co/JDulKICszb
  • Neutral: Arsenal and Manchester United home fixtures moved https://t.co/kyg7H1H6BN, #saintsfc
  • Negative: Wenger's future at Arsenal plunged into further uncertainty as Palace profit https://t.co/gvbbdLH9gi

As you can see, certain verbatims are too ambiguous for even humans to interpret correctly, so a perfect sentiment analysis algorithm is unrealistic. When we analyze content on a specific topic, such as football in this chapter, creating a custom sentiment analysis algorithm is a good idea. For mixed content, where the topic is not evident, one can use a readily available open source module. However, when building a custom algorithm, it's critical to use validation techniques to be sure of a minimum accuracy.
