The nature of social media data

The mechanical differences between Twitter and blogs or consumer reviews are obvious. Blogs and consumer review sites let users contribute large, though finite, amounts of text, whereas Twitter limits each document to 140 characters. At first this may seem like a major limitation of Twitter data, but it proves quite useful. Faced with this stark limit, users tend to be pithy and accurate rather than loquacious and artful. This brevity makes sentiment extraction much simpler than it is for longer documents, although it brings its own challenges.

Throughout this book, social media data, and especially Twitter data, will be examined in increasing detail and with increasing sophistication. To do so, we first need a sense of some of the complicating factors involved in studying sentiment. The first is that sentiment is often nonhomogeneous. Groups of people may be split in their opinions about a topic, and even a single person may hold conflicting views. Condensing group-level or individual-level sentiment into a single numeric measure can obscure this heterogeneity. For instance, a group of people with neutral views on a topic and a group made up of two vehemently opposed subgroups would both score nearly zero (or neutral) on many additive scales.
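
The point about additive scales can be made concrete with a small sketch. The scores below are invented for illustration (a hypothetical scale from -5 to +5, not real Twitter data), and the helper name summarize is an assumption of this example rather than anything defined in the text. It shows that an average alone cannot tell a neutral group from a polarized one, while a simple measure of spread can.

```python
from statistics import mean, pstdev

# Hypothetical sentiment scores on a -5..+5 scale (illustrative values only).
neutral_group = [0, 1, -1, 0, 0, 1, -1, 0]        # everyone is roughly indifferent
polarized_group = [5, -5, 4, -4, 5, -5, 4, -4]    # two vehemently opposed subgroups


def summarize(scores):
    """Return the additive (mean) score and the spread of a group's sentiment."""
    return {"mean": mean(scores), "spread": pstdev(scores)}


neutral_summary = summarize(neutral_group)
polarized_summary = summarize(polarized_group)

# Both means are zero, so the additive score cannot separate the groups.
print("neutral:  ", neutral_summary)
print("polarized:", polarized_summary)
```

Reporting a spread statistic next to the mean is only one possible safeguard; inspecting the full distribution of scores would serve the same purpose.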
