Quantitative approaches

In this research, we aim to mine and summarize online opinions in reviews, tweets, blogs, forum discussions, and so on. Our approach is highly quantitative (that is, mathematical and/or statistical) as opposed to qualitative (that is, involving close study of a few instances). In the social sciences, these two approaches are sometimes at odds, or at least their practitioners are. In this section, we lay out the rationale for a quantitative approach to understanding online opinions. Our use of quantitative methods is entirely pragmatic rather than dogmatic. We do, however, find that Bill James's famous words on the tension between quantitative and qualitative reasoning resonate with our pragmatic stance:

"The alternative to good statistics is not "no statistics", it's bad statistics. People who argue against statistical reasoning often end up backing up their arguments with whatever numbers they have at their command, over- or under-adjusting in their eagerness to avoid anything systematic."

One traditional rationale for using qualitative approaches to sentiment analysis, such as focus groups, is lack of available data. Looking closely at what a handful of consumers think about a product is a viable way to generate opinion data if none, or very little, exists. However, in the era of big social data, analysts are awash in opinion-laden text and online actions. In fact, the use of statistical approaches is often necessary to handle the sheer volume of data generated by the social web. Furthermore, the explosion of data is obviating traditional hypothesis-testing concerns about sampling, as samples converge in size towards the population of interest.
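To make the sampling point concrete, the following sketch simulates how an estimate of the share of positive opinions stabilizes as the sample grows toward the full population. It is a minimal illustration in Python using numpy (our choice of tool here, not one named in the text), and the population size and 62% positive rate are invented for the example.

import numpy as np

# Hypothetical "population" of 100,000 online opinions, of which 62%
# are positive (True) and the rest negative (False). Both numbers are
# invented purely for illustration.
rng = np.random.default_rng(42)
population = rng.random(100_000) < 0.62

print(f"Population share of positive opinions: {population.mean():.3f}")

# As the sample grows toward the population, the estimate converges,
# and traditional sampling-error concerns shrink accordingly.
for n in (10, 100, 1_000, 10_000, 100_000):
    sample = rng.choice(population, size=n, replace=False)
    print(f"n={n:>6}: estimated share = {sample.mean():.3f}")

At n = 100,000 the sample is the population, so the "estimate" is exact; the point is that as samples converge in size towards the population of interest, sampling error ceases to be the binding concern.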

The exploration of large sets of opinion data is what Openshaw (1988) would call a data-rich but theory-poor environment. Qualitative methods are often well suited to inductively deriving theories from small numbers of test cases. However, our aim as sentiment analysts is usually less theoretical and more descriptive; that is, we want to measure opinions, not understand the process by which they are generated. As such, this book covers important quantitative methods that reflect the state of the discipline and that allow the data to have a voice. This type of analysis accomplishes what Gould (1981) refers to as "letting the data speak for itself."

Perhaps the strongest reason to choose quantitative methods over qualitative ones is the ability of quantitative methods, when coupled with large and valid datasets, to generate accurate measures in the face of analyst biases. Qualitative methods, even when applied correctly, expose researchers to a plethora of inferential problems. Foremost is apophenia, the human tendency to discover patterns where there are none; in effect, a Type I error of sorts, dubbed "patternicity" by Michael Shermer (2008). A second pitfall of qualitative work is the atomistic fallacy, that is, the problem of generalizing from an insufficient number of individual observations. The atomistic fallacy is real: most people rely on advice from only a few sources, over-weighting information from within their own networks rather than from third parties such as Consumer Reports. Allowing a single observation (for example, one opinion) to influence our actions or decisions is unreliable compared to drawing on the kind of sensible samples Consumer Reports uses.
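To see how easily apparent patterns arise by chance, here is a minimal sketch of patternicity as a multiple-testing problem, again in Python with numpy and scipy (both our assumptions, not tools named in the text). It correlates purely random sentiment scores against many random candidate "drivers"; at a 0.05 significance threshold, roughly 5% of the tests come out "significant" even though no real pattern exists.

import numpy as np
from scipy.stats import pearsonr

rng = np.random.default_rng(0)
n_opinions, n_features = 200, 400

# Purely random data: sentiment scores and candidate explanatory
# features with no true relationship between them.
sentiment = rng.normal(size=n_opinions)
features = rng.normal(size=(n_features, n_opinions))

# Count how many features look "significantly" correlated with
# sentiment at the conventional 0.05 threshold.
false_hits = sum(pearsonr(f, sentiment)[1] < 0.05 for f in features)
print(f"{false_hits} of {n_features} random features appear significant")
# Expect roughly 400 * 0.05 = 20 spurious "patterns": apophenia in action.

An analyst eyeballing a handful of cases has no such error rate to consult, which is precisely the inferential risk the paragraph above describes.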

The natural sciences benefited from the invention and proliferation of a host of new measurement tools during the twentieth century; advances in microscopes, for example, led to a range of discoveries. The advent of the social web, with its seemingly endless supply of opinionated data, together with new measurement tools such as the ones covered in this book, sets the stage for a new set of discoveries. This book introduces readers to tools that will assist in that pursuit.
