Chapter 5. Sentiment Analysis in Text

One of the most powerful skills we can master in data mining is learning how to deal with large amounts of unstructured or semi-structured textual data. Textual data, sometimes just called text, is important because it is everywhere, and because it conveys so much detail about the human experience in so many formats: books, news media, journals, government reports, case law, e-mail messages, chat logs, product reviews, and so on. We also find text data in places we might not expect. For example, when the spoken word is written down it also becomes text, as do song lyrics and video transcripts. When we look at the code that makes up web pages and computer programs, we find text. When we need a computer to leave a record of what activities have transpired, we have it create a text log file. When we need a common, universally interoperable medium for communicating between devices, we often use plain text to do so.

Over the next few chapters, we will be exploring some of the ways that we can find patterns in text data, and in particular, we will look for patterns in natural human language text. First, in this chapter, we will learn how to detect the opinions or sentiments expressed in a text. This task, called sentiment analysis, helps us understand texts better by discerning the mood or tone of the human who wrote the text. We will learn:

  • What sentiment analysis is, and why we might care about it
  • How to understand some of the most common techniques for finding the sentiment in a text, and what software tools are available to implement these techniques
  • How to apply sentiment analysis to two real-world collections of text

What is sentiment analysis?

Many texts contain language that can be described as emotional. Whether to express the feelings of the writer, or to inspire a particular feeling in the reader, human language can convey anger, disappointment, disgust, joy, happiness, amusement, and so on. Discovering this type of emotional content can tell us a great deal about the writer, including what the writer's intention was and the expected response of the reader. Even noticing the absence of emotional content in a text can be interesting. Once we understand how to discern the emotional content of a text, or lack thereof, we can compare texts and writers to each other in terms of their emotional content, track emotional content over time, and sometimes even predict how a reader will respond to a particular text.

Analyzing a text for its emotional content can take many forms. In this chapter, we will be primarily concerned with sentiment analysis, sometimes called opinion mining. Sentiment analysis looks for the subjective feelings presented in text by the writer, and attempts to label a text accordingly.

Common application areas for sentiment analysis are product reviews (how do shoppers feel about this product?) and political pulse-taking (how do the voters feel about this candidate's position on an issue?). We can extract these feelings from texts of different lengths, for example from news articles, movie reviews, product reviews, tweets, e-mails, and text messages. Many sentiment analysis applications attempt to create a summary, or tally, of feelings, described in terms of polarity, such as positive/negative or like/dislike. Examples of summarized sentiment described in terms of polarity are:

  • 60% of tweets are positive about the candidate's speech on Issue X
  • 9 out of 10 reviews of this movie are negative
  • The users of chat room A used more negative language than the users of chat room B when discussing Issue X
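
To make the idea of a polarity tally concrete, here is a minimal Python sketch. It assumes each text has already been given a polarity label by some earlier step; the function name summarize_polarity and the ten sample labels are hypothetical and exist only for illustration.

```python
from collections import Counter


def summarize_polarity(labels):
    """Return the share of each polarity label as a percentage of all labels."""
    counts = Counter(labels)
    total = sum(counts.values())
    if total == 0:
        return {}  # nothing to summarize
    return {label: 100.0 * n / total for label, n in counts.items()}


# Hypothetical labels for ten tweets about a candidate's speech (invented data).
tweet_labels = ["positive"] * 6 + ["negative"] * 4

summary = summarize_polarity(tweet_labels)
print(summary)  # {'positive': 60.0, 'negative': 40.0}
```

The tally itself is the easy part. The hard part, assigning a label to each text in the first place, is what sentiment analysis techniques try to solve.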

Some of the difficulties we have with sentiment analysis are the same ones that humans have in communicating with each other. People use words differently, and often in unexpected ways. The meaning of words can change in subtle ways, which is both a blessing and a curse for communication. Words, sentences, and entire documents can certainly express multiple complex feelings, sometimes contradicting each other or negating and confirming each other in the same sentence. Training a computer program to detect the subtleties of language is a tricky task indeed.
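
A small Python sketch can show how easily a simple program is fooled by ordinary language. It assumes a deliberately naive approach that counts words from two tiny word lists; the lists and the function name naive_score are hypothetical, invented for this illustration, and are not a recommended technique.

```python
# Hypothetical, deliberately tiny word lists (assumptions for illustration only).
POSITIVE = {"good", "great", "love"}
NEGATIVE = {"bad", "terrible", "hate"}


def naive_score(text):
    """Count positive words minus negative words, ignoring all context."""
    words = text.lower().split()
    positives = sum(word in POSITIVE for word in words)
    negatives = sum(word in NEGATIVE for word in words)
    return positives - negatives


print(naive_score("this movie was good"))      # 1
print(naive_score("this movie was not good"))  # 1, because the negation is ignored
```

Both sentences receive the same score even though they express opposite opinions, because the word "not" carries no weight in a simple word count.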

There is a growing body of work describing the research and theory of sentiment analysis, but much of it is found in the academic literature and is quite dense. However, if you need more information on any of the concepts in this chapter, two references that I would recommend are Bo Pang and Lillian Lee's 2008 paper, "Opinion mining and sentiment analysis" (available on the second author's website at http://www.cs.cornell.edu/home/llee/omsa/omsa.pdf), and Bing Liu's 2012 paper, "Sentiment analysis and opinion mining" (available at the author's site at https://www.cs.uic.edu/~liub/FBS/SentimentAnalysis-and-OpinionMining.pdf). Both are considered classics in the field, and have been cited hundreds of times each. As survey papers, they both provide many links to other papers upon which this research field is based.
