Traditional versus nontraditional social data

We compare social media data with traditional social science data to provide context for its analysis. In this section, we also make readers aware of the general classes of pitfalls that affect any measurement or inference drawn from such data. The goal of this discussion is to set out a sound analytical framework and to foster awareness of the limitations of social data and methods, so that analysts can responsibly "measure (all) what is measurable, and make measurable (all) what is not so" (Galileo, 2008).

Prior to the era of high-velocity data, computational social scientists relied on two sources of observational (that is, nonexperimental) data: data they hand-coded themselves, or large, slow-moving datasets collated by governments and other organizations. Collecting one's own data is slow and costly, involving interviews with experts and cross-checking of reference sources. It is also prone to error arising from differences between coders, or within a single coder over time if the collection is ever repeated. The alternative is to use data sources that are large but slow to change.

For instance, the United States Census Bureau is a leading provider of demographic and economic data. The Census Bureau compiles mountains of data with broad coverage and high accuracy. However, the Bureau's best-known collection, the Census of Population and Housing, is conducted only once a decade and is often too coarse for the sophisticated questions we ask today.

Institutional surveys such as these have been crucial to social scientific research. Their multivariate nature provides rich sets of independent and dependent variables. However, the aggregate and infrequent character of institutional data often yields pooled cross sections of randomly sampled individuals at different points in time. Rather than analyzing groups at discrete points in time, or aggregating them into heterogeneous bundles, social media data make it possible to track groups and individuals continuously over time.

Social media data are captured at the individual level and at an extraordinary rate. Their continuity over time and space makes them ideal for multivariate analysis and cross-correlation. Furthermore, aggregating social data often proves far simpler than attempting to disaggregate institutional data. Put differently, the inferential challenges surrounding the study of individuals are much more easily overcome than those surrounding the study of groups. We explore these issues one by one in the forthcoming sections.
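The asymmetry between aggregation and disaggregation can be made concrete with a minimal sketch. The records, field names, and values below are hypothetical, invented purely for illustration: rolling individual-level observations up to a group level is a one-pass group-by, whereas the reverse operation is impossible from the aggregates alone.

```python
from collections import defaultdict

# Hypothetical individual-level records: one row per user per day.
posts = [
    {"user": "a", "city": "Austin", "day": 1, "n_posts": 3},
    {"user": "b", "city": "Austin", "day": 1, "n_posts": 1},
    {"user": "a", "city": "Austin", "day": 2, "n_posts": 2},
    {"user": "c", "city": "Boston", "day": 1, "n_posts": 4},
]

# Aggregating upward (individual -> city-day totals) is a simple group-by.
totals = defaultdict(int)
for row in posts:
    totals[(row["city"], row["day"])] += row["n_posts"]

# Individual trajectories survive in the raw data: user "a" can be
# followed across days 1 and 2, panel-style.
trajectory_a = [r["n_posts"] for r in posts if r["user"] == "a"]

# The reverse is not possible: from `totals` alone there is no way to
# recover who posted what -- the group-by discarded that information.
```

The same point holds for real institutional data: a published city-level count is consistent with many different underlying individual histories, which is why disaggregation poses the harder inferential problem.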

The conversation about social media data often becomes a conversation about Big Data and how hard Big Data is to analyze. While some debates center on defining Big Data in terms of storage (too big to fit in a relational database) or analysis (too big for standard maximum likelihood techniques), we find these definitional points less crucial than a deep understanding of the measurement and inferential challenges inherent in social media data.
