Chapter 6. Classification II – Sentiment Analysis

For companies, it is vital to closely monitor the public reception of key events such as product launches or press releases. With real-time access to user-generated content on Twitter, it is now possible to classify the sentiment of tweets. Sentiment analysis, sometimes also called opinion mining, is an active field of research in which several companies are already selling products. Since this shows that a market exists, we have good reason to use the classification muscles we built in the previous chapter to build our own home-grown sentiment classifier.

Sketching our roadmap

Sentiment analysis of tweets is particularly hard because of Twitter's 140-character limit. This leads to a special syntax, creative abbreviations, and sentences that are seldom well formed. The typical approach of analyzing sentences, aggregating their sentiment per paragraph, and then calculating the overall sentiment of the document therefore does not work here.

Clearly, we will not try to build a state-of-the-art sentiment classifier. Instead, we want to:

  • Use this scenario as a vehicle to introduce yet another classification algorithm: Naive Bayes
  • Explain how part-of-speech (POS) tagging works and how it can help us
  • Show some more tricks from the scikit-learn toolbox that come in handy from time to time