Sentiment polarity – data and classification

Social media mining primarily involves the following two steps:

  1. Identifying and retrieving content related to the topic of interest.
  2. Measuring the polarity of each datum.

The first step, message retrieval, requires some a priori insight into the topic of interest. The goal of message retrieval is to seek out only the messages or pieces of text that contain sentiment-laden content related to a particular topic. This topic could be almost anything of interest, subject to the constraint that information exists about it on public social media. For instance, in Chapter 3, Mining Twitter with R, we examined the topic Big Data, and in Chapter 6, Social Media Mining – Case Studies, we delve into social issues such as abortion and the economy. Atmospherics, that is, data gathered in an effort to track local sentiments with regard to economic, cultural, or political topics, can also be analyzed, as we do in Chapter 6, Social Media Mining – Case Studies. Lest readers think that social media is too diffuse to be useful, as of the writing of this book, at least one hedge fund uses atmospherics gleaned from Twitter to gauge stock prices.

To gather data, we generally collect content that contains one or more manually specified keywords. This keyword set is called the target. For example, the target for presidential approval would use the topic keyword obama. We may wish to add context to analyses of particular keywords by including opposing or more specific keywords. For example, in addition to obama, we could add romney to provide a counterpoint if we were studying the 2012 presidential election campaigns. Depending on the purpose of our analysis, we could also search jointly for, say, obama and economy to target more specific subjects.
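
The keyword-based retrieval described above can be sketched as a simple filter. The following is a minimal illustration in Python rather than code from this book; the function name `matches_target`, the `require_all` flag, and the sample messages are all hypothetical and exist only for this example.

```python
import re

def matches_target(message, keywords, require_all=False):
    """Return True if the message mentions the target keywords.

    keywords: iterable of lowercase keyword strings (the "target").
    require_all: if True, every keyword must appear (obama AND economy);
                 if False, any one keyword suffices (obama OR romney).
    """
    # Split into lowercase word tokens so "Obama's" still matches "obama".
    tokens = set(re.findall(r"[a-z0-9]+", message.lower()))
    hits = [kw in tokens for kw in keywords]
    return all(hits) if require_all else any(hits)

# Hypothetical sample messages, invented for this sketch.
messages = [
    "Obama's speech on the economy was impressive",
    "Romney leads in the latest poll",
    "I had a great sandwich today",
]

# Either candidate: a search that includes the counterpoint keyword.
either = [m for m in messages if matches_target(m, ["obama", "romney"])]

# Joint search: narrows the target to a more specific subject.
joint = [m for m in messages
         if matches_target(m, ["obama", "economy"], require_all=True)]
```

Here `either` keeps the first two messages and `joint` keeps only the first. Matching on whole tokens rather than substrings is a design choice of this sketch; it avoids false hits on longer words that merely contain a keyword.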

Topic models represent a second, more sophisticated, and potentially more thorough way of capturing bits of text that are relevant to a particular analysis. These models take very large sets of documents as their inputs and group them probabilistically into estimated topics. That is, each document is modeled as a mixture of one or more topics that are themselves estimated from the data. This allows users to find texts that are related to a topic even when those texts do not explicitly use a particular keyword. The details of this class of statistical models are outside the scope of this text; however, in the Appendix, Conclusions and Next Steps, we point readers to references on the theory and estimation of this exciting new class of tools.

Social data mining is the detection of attitudes, and the easiest way to understand it is through the following structure:

sentiment = {data source, source, target, sentiment, polarity}

The parameters are explained in detail as follows:

  • Data source: This relates to understanding the source of the data; that is, is the source a sentence or an entire document? Twitter or a blog?
  • Source or holder: This is the person or entity that expresses a sentiment or an opinion.
  • Target or aspect: This is what or whom the sentiment is directed toward.
  • Type of sentiment: This is the type(s) of emotion(s) expressed, for example, like, love, hate, value, desire, and so on.
  • Polarity: This is where the sentiment falls on a dimension with opposing ends, for example, positive or negative.
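
One way to make this structure concrete is as a small record type. The sketch below is an illustration in Python, not code from this book: the class name `SentimentRecord`, its field names, and the example values are hypothetical, chosen only to mirror the five components listed above.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class SentimentRecord:
    data_source: str     # where the text comes from, e.g. a tweet or a blog
    holder: str          # who expresses the sentiment
    target: str          # what or whom the sentiment is directed toward
    sentiment_type: str  # kind of emotion, e.g. "like", "love", "hate"
    polarity: str        # position on the dimension: "positive" or "negative"

    def __post_init__(self):
        # This sketch restricts polarity to the two values named in the text.
        if self.polarity not in ("positive", "negative"):
            raise ValueError(f"unexpected polarity: {self.polarity!r}")

# Hypothetical example values, invented for illustration.
record = SentimentRecord(
    data_source="tweet",
    holder="some_user",
    target="obama",
    sentiment_type="like",
    polarity="positive",
)
```

Keeping the holder, target, and polarity as separate fields makes it possible to group or filter records by any one of them.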

The following examples highlight these components and also some of the challenges involved in sentiment analysis. We have parts of two reviews: one about Steven Spielberg and another about John Carpenter. In both examples, the data source is the Internet Movie Database (IMDb), which describes itself as the world's most popular and authoritative source for movie, TV, and celebrity content. The holder is the person who wrote the review, and the targets are Steven Spielberg and John Carpenter, respectively. However, the target is complicated by mentions of various movies over time. Further complicating matters is the variety of sentiment types and polarities.

Steven Spielberg's second epic film on World War II is an unquestioned masterpiece. Spielberg, ever the student on film, has managed to resurrect the war genre by producing one of its grittiest and most powerful entries. He also managed to cast this era's greatest answer to Jimmy Stewart, Tom Hanks, who delivers a performance that is nothing short of an astonishing miracle for about 160 out of its 170 minutes; Saving Private Ryan is flawless, literally!

There was a time when John Carpenter was a great horror director. Of course, his best film was 1978's masterpiece Halloween; however, he also made The Fog in the 1980s and 1987's underrated Prince of Darkness. Heck, he even made a good film, In the Mouth of Madness, in 1995. However, something terribly wrong happened to him in 1992 with the terrible comedy Memoirs of an Invisible Man.
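
The mixed polarity within the second review can be illustrated with a naive word-counting scorer. This is a minimal sketch in Python and not a method taken from this text: the function name `polarity_score` and the tiny word lists `POSITIVE` and `NEGATIVE` are hypothetical, hand-picked for this example only.

```python
import re

# Hypothetical, hand-picked word lists; assumptions for this sketch only.
POSITIVE = {"great", "best", "masterpiece", "good"}
NEGATIVE = {"terrible", "terribly", "wrong"}

def polarity_score(text):
    """Count positive words minus negative words in the text."""
    tokens = re.findall(r"[a-z']+", text.lower())
    pos = sum(t in POSITIVE for t in tokens)
    neg = sum(t in NEGATIVE for t in tokens)
    return pos - neg

# Two sentences taken from the John Carpenter review above.
early = "There was a time when John Carpenter was a great horror director."
late = ("However, something terribly wrong happened to him in 1992 "
        "with the terrible comedy Memoirs of an Invisible Man.")

early_score = polarity_score(early)  # above zero: positive words dominate
late_score = polarity_score(late)    # below zero: negative words dominate
```

Both sentences share the same holder and the same target, yet they score on opposite sides of zero, which is why a single polarity label for a whole review can mislead.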
