3 Dimensions of information quality and InfoQ assessment

3.1 Introduction

Information quality (InfoQ) is a holistic abstraction or a construct. To be able to assess such a construct in practice, we operationalize it into measurable variables. Like InfoQ, data quality is also a construct that requires operationalization. The issue of assessing data quality has been discussed and implemented in several fields and by several international organizations. We start this chapter by looking at the different approaches to operationalizing data quality. We then take, in Section 3.2, a similar approach for operationalizing InfoQ. Section 3.3 is about methods for assessing InfoQ dimensions and Section 3.4 provides an example of an InfoQ rating‐based assessment. Additional in‐depth examples are provided in Part II.

3.1.1 Operationalizing “data quality” in marketing research

In marketing research and in the medical literature, data quality is assessed by defining the criteria of recency, accuracy, availability, and relevance of a dataset (Patzer, 1995):

  1. Recency refers to the duration between the time of data collection and the time the study is conducted.
  2. Accuracy refers to the correctness of the data, that is, how closely the recorded values correspond to the true values being measured.
  3. Availability describes the information in the data made available to the analyst.
  4. Relevance refers to the relevance of the data to the analysis goal: whether the data contains the required variables in the right form and whether they are drawn from the population of interest.

Kaynak and Herbig (2014) mention four criteria to consider for data quality in cross‐cultural marketing research:

  1. Compatibility and comparability—When comparing different sets of data from different countries, are similar units of measurements and definitions used?
  2. Accuracy and reliability of the data—Has the data been consciously distorted, or was the collection flawed?
  3. Recency—How recent is the data, and is it updated frequently and predictably?
  4. Availability—Is the data accessible to the researcher?

The four criteria of Patzer and of Kaynak and Herbig consider data (X) and goal (g), but do not consider data analysis method (f) and utility (U). Specifically, recency, accuracy, reliability, availability, and comparability are all characteristics of the dataset and relate implicitly to the analysis goal, while only relevance relates directly to the data and the analysis goal.

3.1.2 Operationalizing “data quality” in public health research

Boslaugh (2007) considers three main questions to help assess the quality of secondary data (data collected for purposes other than the study at hand):

  1. What was the original purpose for which the data was collected?
  2. What kind of data is it, and when and how was the data collected?
  3. What cleaning and/or recoding procedures have been applied to the data?

These questions are useful at the prestudy stage, when one must evaluate the usefulness of a dataset for the study at hand. The concepts in the three questions can be summarized into collection purpose, data type, data age, data collection instrument and process, and data preprocessing. They can be grouped into “source quality” and “data quality” criteria (Kaynak and Herbig, 2014). Obviously, source quality affects data quality:

It is almost impossible to know too much about the data collection process because it can influence the quality of the data in many ways, some of them not obvious.

Boslaugh (2007, p. 5) further considers availability, completeness, and data format:

A secondary data set should be examined carefully to confirm that it includes the necessary data, that the data are defined and coded in a manner that allows for the desired analysis, and that the researcher will be allowed to access the data required.

We again note that the questions and criteria mentioned relate to the data and goal, but not to an analysis method or utility; the InfoQ definition, however, requires all four components.

3.1.3 Operationalizing “data quality” in management information systems

In the field of management information systems (MIS), data quality is defined as the level of conformance to specifications or standards. Wang et al. (1993) define data quality as “conformance to requirements.” They operationalize this construct by defining quality indicators that are based on objective measures such as data source, creation time, and collection method, as well as subjective measures such as the credibility level of the data at hand, as determined by the researcher.

As mentioned in Chapter 2, Lee et al. (2002) propose a methodology for assessment and benchmarking of InfoQ of IT systems called AIMQ. They collate 15 dimensions from academic papers in MIS: accessibility, appropriate amount, believability, completeness, concise representation, consistent representation, ease of operation, free of error, interpretability, objectivity, relevancy, reputation, security, timeliness, and understandability. They then group the 15 dimensions into four categories: intrinsic, contextual, representational, and accessibility. While they use the term IQ, it is different from InfoQ. The concept of IQ indicates a consideration of the user of the IT system (and therefore some of the dimensions include relevance, timeliness, etc.). However, IQ does not consider data analysis at all. To operationalize the four categories, Lee et al. (2002) develop a questionnaire with eight items for each of the 15 dimensions. This instrument is then used for scoring an IT system of an organization and for benchmarking it against best practice and other organizations.

3.1.4 Operationalizing “data quality” at government and international organizations

Assessing data quality is one of the core aspects of statistical agencies’ work. Government agencies and international organizations that collect data for decision making have developed operationalizations of data quality by considering multiple dimensions. The abstract construct of data quality is usually defined as “fitness for use” in terms of user needs. This construct is operationalized by considering a set of dimensions. We briefly list the dimensions used by several notable organizations.

The concept of quality of statistical data has been developed and used in European official statistics as well as in organizations such as the International Monetary Fund (IMF), Statistics Canada, and the Organization for Economic Cooperation and Development (OECD). The OECD operationalizes this construct by defining seven dimensions for quality assessment (see chapter 5 in Giovannini, 2008):

  1. Relevance—A qualitative assessment of the value contributed by the data
  2. Accuracy—The degree to which the data correctly estimate or describe the quantities or characteristics that they are designed to measure
  3. Timeliness and punctuality—The length of elapsed time between data availability and the phenomenon described
  4. Accessibility—How readily the data can be located and accessed
  5. Interpretability—The ease with which the data may be understood and analyzed
  6. Coherence—The degree to which the data is logically connected and mutually consistent
  7. Credibility—Confidence of users in the data based on their perception of the data producer

The European Commission’s Eurostat agency uses six dimensions for assessing the quality of data from surveys (Ehling and Körner, 2007):

  1. Relevance of statistical concept refers to whether all statistics that are needed are produced and the extent to which concepts used (definitions, classifications, etc.) reflect user needs.
  2. Accuracy of estimates denotes the closeness of computations or estimates to the exact or true values.
  3. Timeliness and punctuality in disseminating results—Timeliness of information reflects the length of time between its availability and the event or phenomenon it describes; punctuality refers to the time lag between the release date of data and the target date when it should have been delivered.
  4. Accessibility and clarity of the information—Accessibility refers to the physical conditions in which users can obtain data; clarity refers to the data’s information environment (whether data are accompanied with appropriate metadata, illustrations such as graphs and maps, etc.).
  5. Comparability is the extent to which differences between statistics are attributed to differences between the true values of the statistical characteristic.
  6. Coherence of statistics refers to their adequacy to be reliably combined in different ways and for various uses.

The US Environmental Protection Agency (EPA) has developed the Quality Assurance (QA) Project Plan as a tool for project managers and planners to document the type and quality of data and information needed for making environmental decisions. The program aims to control and enhance data quality in terms of precision, accuracy, representativeness, completeness, and comparability (PARCC) of environmental measurements used in its studies. They define these dimensions as follows:

  1. Precision is the degree of agreement among repeated measurements of the same characteristic on the same sample or on separate samples collected as close as possible in time and place.
  2. Accuracy is a measure of confidence in a measurement. The smaller the difference between the measurement of a parameter (the estimate) and its “true” or expected value, the more accurate the measurement.
  3. Representativeness is the extent to which measurements actually depict the true environmental condition or population being evaluated.
  4. Completeness is a measure of the number of samples you must take to be able to use the information, as compared to the number of samples you originally planned to take.
  5. Comparability is the extent to which data from one study can be compared directly to either past data from the current project or data from another study.

The World Health Organization (WHO) established a data quality framework called the Health Metrics Network Framework (HMN, 2008), based on the IMF Data Quality Assessment Framework (DQAF) and the IMF General Data Dissemination System (GDDS). The framework uses six criteria for assessing the quality of health‐related data and indicators that are generated from health information systems:

  1. Timeliness—The period between data collection and its availability to a higher level or its publication
  2. Periodicity—The frequency with which an indicator is measured
  3. Consistency—The internal consistency of data within a dataset as well as consistency between datasets and over time and the extent to which revisions follow a regular, well‐established, and transparent schedule and process
  4. Representativeness—The extent to which data adequately represents the population and relevant subpopulations
  5. Disaggregation—The availability of statistics stratified by sex, age, socioeconomic status, major geographical or administrative region, and ethnicity, as appropriate
  6. Confidentiality, data security, and data accessibility—The extent to which practices are in accordance with guidelines and other established standards for storage, backup, transport of information (especially over the Internet), and retrieval

These examples provide the background for the assessment of InfoQ. Our goal in presenting the InfoQ dimensions is to propose a generic structure that applies to any empirical analysis and expands on the data quality approaches described above.

3.2 The eight dimensions of InfoQ

Taking an approach that is similar to data quality assessment described in the previous section, we define eight dimensions for assessing InfoQ that consider and affect not only the data and goal, X and g, but also the method of analysis (f) and the utility of the study (U). With this approach, we provide a decomposition of InfoQ that can be used for assessing and improving research initiatives and for evaluating completed studies.

3.2.1 Data resolution

Data resolution refers to the measurement scale and aggregation level of X. The measurement scale of the data should be carefully evaluated in terms of its suitability to the goal, the analysis methods to be used and the required resolution of U. Given the original recorded scale, the researcher should evaluate its adequacy. It is usually easy to produce a more aggregated scale (e.g., two income categories instead of ten), but not a finer scale. Data might be recorded by multiple instruments or by multiple sources. To choose between the multiple measurements, supplemental information about the reliability and precision of the measuring devices or sources of data is useful. A finer measurement scale is often associated with more noise; hence the choice of scale can affect the empirical analysis directly.

The data aggregation level must also be evaluated relative to g. For example, consider daily purchases of over‐the‐counter medications at a large pharmacy. If the goal of the analysis is to forecast future inventory levels of different medications, and restocking is done weekly, then weekly aggregates are preferable to daily aggregates owing to fewer data recording errors and less noise. However, for the early detection of disease outbreaks, where alerts generated a day or two earlier can make a significant difference in terms of treatment, weekly aggregates are of low quality. In addition to data frequency, the aggregation level is also important: for inventory purposes, medication‐level information is required, whereas for disease outbreak detection medications can be grouped by symptoms, and the symptom‐aggregated daily series would be preferable.
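To make the aggregation tradeoff concrete, here is a minimal sketch in Python (pandas) using simulated daily sales; the dates, rates, and variable names are our own illustration, not data from any study cited here:

```python
import numpy as np
import pandas as pd

# Simulated daily over-the-counter sales for one medication (illustrative only).
rng = np.random.default_rng(0)
days = pd.date_range("2023-01-01", periods=60, freq="D")
sales = pd.Series(rng.poisson(lam=40, size=60), index=days, name="units_sold")

# Inventory goal (weekly restocking): weekly totals suffice and smooth out noise.
weekly_totals = sales.resample("W").sum()

# Outbreak-detection goal: keep the daily series, since a one- or two-day delay matters.
daily_series = sales

print(weekly_totals.head())
print(daily_series.head())
```

The same raw data X supports both aggregation levels; which one carries higher InfoQ depends on g.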

Another example relates to the case studies on online auctions in Chapter 1. In many online auction platforms, bid times are typically recorded in seconds and prices in a currency unit. On eBay, for example, bid times are reported at the level of seconds (e.g., August 20, 2010, 03.14.07 Pacific Daylight Time) and prices at the dollar and cent level (e.g., $23.01). The forecasting model by Wang et al. (2008) uses bid times at second level and cent‐level bid amounts until the time of prediction to produce forecasts of price in cents for any second during the auction. In contrast, the forecasting model by Ghani and Simmons (2004) produces forecasts of the final price in terms of $5 intervals, using only information available at the start of the auction.

The concept of rational subgroup that is used in statistical process control is a special case of aggregation level. The rational subgroup setup determines the level of process variability and the type of signals to detect. If the rational subgroup consists of measurements within a short period of a production process, then statistical process control methods will pick up short‐term out‐of‐control signals, whereas rational subgroups spread over longer periods will support detection of longer‐term trends and out‐of‐control signals (see Kenett et al., 2014). Using our notation, f is the statistical process control method, X is the data, g1 is the short‐term signal, g2 is the long‐term signal, and U is a measure of desirable alerting behavior.
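To illustrate how the rational subgroup choice shapes what a control chart treats as noise, here is a rough sketch of X‐bar chart limits under two subgroup sizes. It is a simplification (it omits the usual c4 bias correction for the within‐subgroup standard deviation), and the simulated process is hypothetical:

```python
import numpy as np

def xbar_limits(x, n):
    """Approximate X-bar chart center line and 3-sigma limits for subgroups of size n.

    The within-subgroup standard deviation defines the chart's "noise", so the
    subgrouping scheme determines which signals the chart can detect.
    (Omits the c4 bias correction for brevity.)"""
    x = np.asarray(x, dtype=float)[: len(x) // n * n].reshape(-1, n)
    center = x.mean()
    sigma_within = x.std(axis=1, ddof=1).mean()
    half_width = 3 * sigma_within / np.sqrt(n)
    return center - half_width, center, center + half_width

# Simulated process: short-term noise plus a slow drift.
rng = np.random.default_rng(1)
data = rng.normal(10, 1, size=200) + np.linspace(0, 1.5, 200)

print(xbar_limits(data, 4))   # subgroups of 4 consecutive points: short-term focus (g1)
print(xbar_limits(data, 25))  # subgroups spanning longer stretches: long-term focus (g2)
```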

3.2.2 Data structure

Data structure relates to the type(s) of data and data characteristics such as corrupted and missing values due to the study design or data collection mechanism. Data types include structured numerical data in different forms (e.g., cross‐sectional, time series, and network data) as well as unstructured nonnumerical data (e.g., text, text with hyperlinks, audio, video, and semantic data). The InfoQ level of a certain data type depends on the goal at hand. Bapna et al. (2006) discussed the value of different “data types” for answering new research questions in electronic commerce research:

For each research investigation, we seek to identify and utilize the best data type, that is, that data which is most appropriate to help achieve the specific research goals.

An example from the online auctions literature is related to the effect of “seller feedback” on the auction price. Sellers on eBay receive numerical feedback ratings and textual comments. Although most explanatory studies of price determinants use the numerical feedback ratings as a covariate, a study by Pavlou and Dimoka (2006) showed that using the textual comments as a covariate in a model for price leads to much higher R2 values (U) than using the numerical rating.

Corrupted and missing values require handling by removal, imputation, data recovery, or other methods, depending on g. Wrong values may be treated as missing when the purpose is to estimate a population parameter, such as in surveys where respondents intentionally enter wrong answers. Yet, for other goals, intentionally submitted wrong values might themselves be informative and therefore should not be discarded or “corrected.”
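The following minimal sketch shows the same dataset handled two ways depending on the goal; the sentinel value and column names are hypothetical:

```python
import numpy as np
import pandas as pd

# Hypothetical survey responses; -1 marks an intentionally wrong (protest) answer.
df = pd.DataFrame({"income": [52000, -1, 61000, 58000, -1, 47000]})

# Goal g1: estimate mean income -> treat intentionally wrong values as missing.
income_clean = df["income"].replace(-1, np.nan)
mean_income = income_clean.mean()  # NaN values are skipped by default

# Goal g2: study the protest behavior itself -> the wrong values are the signal.
df["protest"] = df["income"].eq(-1)
protest_rate = df["protest"].mean()

print(mean_income, protest_rate)
```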

3.2.3 Data integration

Integrating multiple sources and/or types of data often creates new knowledge regarding the goal at hand, thereby increasing InfoQ. An example is the study estimating consumer surplus in online auctions (Bapna et al., 2008a; see Chapter 1), where data from eBay (X1) that lacked the highest bid values were combined with data from a website called Cniper.com (now no longer active) (X2) that contained the missing information. Estimating consumer surplus was impossible by using either X1 or X2 alone; only their combination yielded sufficient InfoQ. In the auction example of Pavlou and Dimoka (2006), textual comments were integrated with standard numerical auction data as covariates.

New analysis methodologies, such as functional data analysis and text mining, are aimed at increasing InfoQ of new data types and their combination. For example, in the online auction forecasting study by Wang et al. (2008) (see Chapter 1), functional data analysis was used to integrate temporal bid sequences with cross‐sectional auction and seller information. The combination allowed more precise forecasts of final prices compared to models based on cross‐sectional data alone. The functional approach has also enabled quantifying the effects of different factors on the price process during an auction (Bapna et al., 2008b).

Another aspect of data integration is linking records across databases. Although record linkage algorithms are popular for increasing InfoQ, studies that use record linkage often employ masking techniques that reduce risks of identification and breaches of privacy and confidentiality. Such techniques (e.g., removing identifiers, adding noise, data perturbation, and microaggregation) can obviously decrease InfoQ, even to the degree of making the combined dataset useless for the goal at hand. Solutions, such as “privacy‐preserving data mining” and “selective revelation,” are aimed at utilizing the linked dataset with high InfoQ without compromising privacy (see, e.g., Fienberg, 2006).
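The InfoQ cost of masking can be made tangible with a small sketch of two common techniques, additive noise and microaggregation, on made‐up salary values (the noise scale and group size k = 3 are arbitrary choices for illustration):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(5)
salary = pd.Series(rng.normal(60000, 15000, size=9), name="salary")

# Masking by additive noise: protects individuals but inflates variance,
# lowering InfoQ for goals that require precise individual-level values.
noised = salary + rng.laplace(scale=2000, size=len(salary))

# Microaggregation with k = 3: sort, then replace each value by its group mean,
# trading individual resolution for privacy.
s = salary.sort_values()
micro = s.groupby(np.arange(len(s)) // 3).transform("mean")

print(pd.DataFrame({"original": salary, "noised": noised,
                    "microaggregated": micro.sort_index()}).round(0))
```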

3.2.4 Temporal relevance

The process of deriving knowledge from data can be placed on a timeline that includes the data collection, data analysis, and study deployment periods as well as the temporal gaps between these periods (as depicted in Figure 3.1). These different durations and gaps can each affect InfoQ. The data collection duration can increase or decrease InfoQ, depending on the study goal (e.g., studying longitudinal effects versus a cross‐sectional goal). Similarly, uncontrollable transitions during the collection phase can be useful or disruptive, depending on g.


Figure 3.1 Timeline of study, from data collection to study deployment.

For this reason, online auction studies that collect data on fashionable or popular products (which generate large amounts of data) for estimating an effect try to restrict the data collection period as much as possible. The experiment by Katkar and Reiley (2006) on the effect of reserve prices on online auction prices (see Chapter 1) was conducted over a two‐week period in April 2000. The data on auctions for Harry Potter books and Microsoft Xbox consoles in Wang et al. (2008) was collected in the nonholiday months of August and September 2005. In contrast, a study that is interested in comparing preholiday with postholiday bidding or selling behaviors would require collection over a period that includes both preholiday and postholiday times. The gap between data collection and analysis, which coincides with the recency criterion in Section 3.1, is typically larger for secondary data (data not collected for the purpose of the study). In predictive modeling, where the context of prediction should be as close as possible to the data collection context, temporal lags can significantly decrease InfoQ. For instance, a 2010 dataset of online auctions for iPads on eBay will probably be of low InfoQ for forecasting or even estimating current iPad prices because of the fast changing interest in electronic gadgets.

Another aspect affecting temporal relevance is analysis timeliness, or the timeliness of f(X|g). Raiffa (1970, p. 264) calls this an “error of the fourth kind: solving the right problem too late.” Analysis timeliness is affected by the nature of X, by the complexity of f and ultimately by the application of f to X. The nature of a dataset (size, sparseness, etc.) can affect analysis timeliness and in turn affect its utility for the goal at hand. For example, computing summary statistics for a very large dataset might take several hours, thereby rendering InfoQ low for real‐time tasks (g1) but high for retrospective analysis (g2). The computational complexity of f also determines analysis time: Markov chain Monte Carlo estimation methods and computationally intensive predictive algorithms take longer than estimating linear models or computing summary statistics. In the online auction price forecasting example, the choice of a linear forecasting model was needed for producing timely forecasts of an ongoing auction. Wang et al. (2008) used smoothing splines to estimate price curves for each auction in the dataset—information which is then used in the forecasting model. Although smoothing splines do not necessarily produce monotone curves (as would be expected of a price curve from the start to the end of an eBay‐type auction), this method is much faster than fitting monotone smoothing splines, which do produce monotonic curves. Therefore, in this case smoothing splines generated higher InfoQ than monotone splines for real‐time forecasting applications. Temporal relevance and analysis timeliness obviously depend on the availability of software and hardware as well as on the efficiency of the researcher or analysis team.

3.2.5 Chronology of data and goal

The choice of variables to collect, the temporal relationship between them and their meaning in the context of g all critically affect InfoQ. We must consider the retrospective versus prospective nature of the goal as well as its type in terms of causal explanation, prediction, or description (Shmueli, 2010). In predictive studies, the input variables must be available at the time of prediction, whereas in explanatory models, causal arguments determine the relationship between dependent and independent variables. Endogeneity, which arises, for example, when a causal input variable is omitted from a model or when causation is reversed, results in biased parameter estimates. Endogeneity therefore yields low InfoQ in explanatory studies, but not necessarily in predictive studies, where omitting input variables can lead to higher predictive accuracy (see Shmueli, 2010). Also related is the Granger causality test (Granger, 1969), aimed at determining whether a lagged time series X contains useful information for predicting future values of another time series Y by using a regression model.
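As an illustration of the Granger test mentioned above, here is a minimal sketch using statsmodels on simulated series in which X leads Y by one period; the data are synthetic, and note that grangercausalitytests asks whether the second column helps predict the first:

```python
import numpy as np
import pandas as pd
from statsmodels.tsa.stattools import grangercausalitytests

# Synthetic series: y depends on x lagged by one period, plus noise.
rng = np.random.default_rng(2)
x = rng.normal(size=300)
y = 0.8 * np.roll(x, 1) + rng.normal(scale=0.5, size=300)
data = pd.DataFrame({"y": y, "x": x}).iloc[1:]  # drop the wrapped-around first row

# H0: lagged x adds no information for predicting y beyond y's own lags.
results = grangercausalitytests(data[["y", "x"]], maxlag=2)
```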

In the online auction context, the level of InfoQ that is contained in the “number of bidders” for models of auction price depends on the study goal. Classic auction theory specifies the number of bidders as an important factor influencing price: the more bidders, the higher the price. Hence, data on number of bidders is of high quality in an explanatory model of price. However, for the purpose of forecasting prices of ongoing online auctions, where the number of bidders is unknown until the end of the auction, the InfoQ of “number of bidders,” even if available in a retrospective dataset, is very low. For this reason, the forecasting model by Wang et al. (2008) described in Chapter 1 excludes the number of bidders or number of bids and instead uses the cumulative number of bids until the time of prediction.

3.2.6 Generalizability

The utility of f(X|g) is dependent on the ability to generalize f to the appropriate population. Two types of generalizability are statistical and scientific generalizability. Statistical generalizability refers to inferring from a sample to a target population. Scientific generalizability refers to applying a model based on a particular target population to other populations. This can mean either generalizing an estimated population pattern or model f to other populations or applying f estimated from one population to predict individual observations in other populations.

Determining the level of generalizability requires careful characterization of g. For instance, for inference about a population parameter, statistical generalizability and sampling bias are the focus, and the question of interest is, “What population does the sample represent?” (Rao, 1985). In contrast, for predicting the values of new observations, the question of interest is whether f captures associations in the training data X (the data that are used for model building) that are generalizable to the to‐be‐predicted data.

Generalizability is a dimension useful for clarifying the concepts of reproducibility, repeatability, and replicability (Kenett and Shmueli, 2015). The three terms are referred to with different and sometimes conflicting meanings, both between and within fields (see Chapter 11). Here we only point out that the distinction between replicating insights and replicating exact identical numerical results is similar and related to the distinction between InfoQ (insights) and data or analysis quality (numerical results).

Another type of generalization, in the context of ability testing, is the concept of specific objectivity (Rasch, 1977). Specific objectivity is achieved if the outcomes of questions in a questionnaire used to compare students' ability levels are independent of the specific questions asked and of the other students being compared. In other words, the purpose is to generalize from data on certain students answering a set of questions to the population of outcomes, irrespective of the particular responders or particular questions.

The type of required generalizability affects the choice of f and U. For instance, data‐driven methods are more prone to overfitting, which conflicts with scientific generalizability. Statistical generalizability is commonly evaluated by using measures of sampling bias and goodness of fit. In contrast, scientific generalizability for predicting new observations is typically evaluated by the accuracy of predicting a holdout set from the to‐be‐predicted population, to protect against overfitting.
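The two evaluation styles can be sketched as follows with scikit‐learn, using synthetic stand‐in data (the features named in the comment are hypothetical):

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split

# Synthetic stand-in for auction data: three features and a final price.
rng = np.random.default_rng(3)
X = rng.normal(size=(500, 3))  # e.g., opening bid, seller rating, auction duration
y = X @ np.array([5.0, 2.0, -1.0]) + rng.normal(scale=2.0, size=500)

# Predictive (scientific) generalizability: accuracy on a holdout set,
# which guards against overfitting to the training sample.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
model = LinearRegression().fit(X_tr, y_tr)
print("holdout MAE:", mean_absolute_error(y_te, model.predict(X_te)))

# Statistical generalizability would instead focus on the sampling design,
# standard errors, and goodness of fit on the full sample.
print("in-sample R^2:", model.score(X_tr, y_tr))
```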

The online auction studies from Chapter 1 illustrate the different generalizability types. The “effect of reserve price on final price” study (Katkar and Reiley, 2006) is concerned with statistical generalizability. Katkar and Reiley (2006) designed the experiment so that it produces a representative sample. Their focus is on standard errors and statistical significance. The forecasting study by Wang et al. (2008) is concerned with generalizability to new individual auctions. They evaluated predictive accuracy on a holdout set. The third study on “consumer surplus in eBay” is concerned with statistical generalizability from the sample to all eBay auctions in 2003. Because the sample was not drawn randomly from the population, Bapna et al. (2008a) performed a special analysis, comparing their sample with a randomly drawn sample (see appendix B in Bapna et al., 2008a).

3.2.7 Operationalization

Two types of operationalization of the analysis results are considered: construct operationalization and action operationalization.

3.2.7.1 Construct operationalization

Constructs are abstractions that describe a phenomenon of theoretical interest. Measurable data is an operationalization of underlying constructs. For example, psychological stress can be measured via a questionnaire or by physiological measures, such as cortisol levels in saliva (Kirschbaum and Hellhammer, 1989), and economic prosperity can be measured via income or by unemployment rate. The relationship between the underlying construct χ and its operationalization X = θ(χ) can vary, and its level relative to g is another important aspect of InfoQ. The role of construct operationalization is dependent on g(X = θ(χ|g)) and especially on whether the goal is explanatory, predictive, or descriptive. In explanatory models, based on underlying causal theories, multiple operationalizations might be acceptable for representing the construct of interest. As long as X is assumed to measure χ, the variable is considered adequate. Using our earlier example in the preceding text, both questionnaire answers and physiological measurements would be acceptable for measuring psychological stress. In contrast, in a predictive task, where the goal is to create sufficiently accurate predictions of a certain measurable variable, the choice of operationalized variable is critical. Predicting psychological stress as reported in a questionnaire (X1) is different from predicting levels of a physiological measure (X2). Hence, the InfoQ in predictive studies relies more heavily on the quality of X and its stability across the periods of model building and deployment, whereas in explanatory studies InfoQ relies more on the adequacy of X for measuring χ.

Returning to the online auction context, the consumer surplus study relies on observable bid amounts, which are considered to reflect an underlying “willingness‐to‐pay” construct for a bidder. The same construct is operationalized differently in other types of studies. In contrast, in price forecasting studies the measurable variable of interest is auction price, which is always defined very similarly. An example is the work by McShane and Wyner (2011) in the context of climate change, showing that for purposes of predicting temperatures, theoretically based “natural covariates” are inferior to “pseudoproxies” that are lower dimension approximations of the natural covariates. Descriptive tasks are more similar to predictive tasks in the sense of the focus on the observable level. In descriptive studies, the goal is to uncover a signal in a dataset (e.g., to estimate the income distribution or to uncover the temporal patterns in a time series). Because there is no underlying causal theory behind descriptive studies, and because results are reported at the level of the measured variables, InfoQ relies, as in predictive tasks, on the quality of the measured variables rather than on their relationship to an underlying construct.

3.2.7.2 Action operationalization

Action operationalization is about deriving concrete actions from the information provided by a study. When a report, presenting an analysis of a given dataset in the context of specific goals, leads to clear follow‐up actions, we consider such a report to be of higher InfoQ. The dimension of action operationalization has been discussed in various contexts. In business and industry settings, an operational definition consists of (i) a criterion to be applied to an object or a group of objects, (ii) a test of compliance for the object or group, and (iii) a decision rule for interpreting the test results as to whether the object or group is, or is not, in compliance. This definition by Deming (2000) closely parallels Shewhart’s opening statement in his book Statistical Method from the Viewpoint of Quality Control (Shewhart, 1986):

Broadly speaking there are three steps in a quality control process: the specification of what is wanted, the production of things to satisfy the specification, and the inspection of the things produced to see if they satisfy the specification.

In a broad context of organizational performance, Deming (2000) poses three important questions to help assess the level of action operationalization of a specific organizational study. These are the following:

  1. What do you want to accomplish?
  2. By what method will you accomplish it?
  3. How will you know when you have accomplished it?

In the context of an educational system, the National Education Goals Panel (NEGP) in the United States recommended that states answer four questions on their student reports that are of interest to parents (Goodman and Hambleton, 2004):

  1. How did my child do?
  2. What types of skills or knowledge does his or her performance reflect?
  3. How did my child perform in comparison to other students in the school, district, state, and, if available, the nation?
  4. What can I do to help my child improve?

The action operationalization of official statistics has also been discussed extensively by official statistics agencies, internally, and in the literature. Quoting Forbes and Brown (2012):

An issue that can lead to misconception is that many of the concepts used in official statistics often have specific meanings which are based on, but not identical to, their everyday usage meaning… Official statistics “need to be used to be useful” and utility is one of the overarching concepts in official statistics… All staff producing statistics must understand that the conceptual frameworks underlying their work translate the real world into models that interpret reality and make it measurable for statistical purposes… The first step … is to define the issue or question(s) that statistical information is needed to inform. That is, to define the objectives for the framework, and then work through those to create its structure and definitions. An important element … is understanding the relationship between the issues and questions to be informed and the definitions themselves.

3.2.8 Communication

Effective communication of the analysis f(X|g) and its utility U directly affects InfoQ. Common communication media include visual, textual, and verbal presentations and reports. Within research environments, communication focuses on written publications and conference presentations. Research mentoring and the refereeing process are aimed at improving communication (and InfoQ) within the research community. Research results are communicated to the public via articles in the popular media, television interviews, conferences such as www.ted.com and, more recently, blogs and other Internet media. Here the risk of miscommunication is much greater. For example, the “consumer surplus in eBay auctions” study was covered by public media, but the main results were not always conveyed properly by journalists. The nytimes.com article (http://bits.blogs.nytimes.com/2008/01/28/tracking‐consumer‐savings‐on‐ebay/) failed to mention that the study results were evaluated under different assumptions, thereby affecting generalizability. As a result, some readers doubted the study results (“Is the Cniper sample skewed?”), and one of the study coauthors posted an online clarification in response.

In industry, communication is typically done via internal presentations and reports. The failure potential of O‐rings at low temperatures, which caused the NASA shuttle Challenger disaster, was ignored because the engineers failed to communicate the results of their analysis: the 13 charts circulated during the prelaunch teleconferences did not clearly show the relationship between launch temperature and O‐ring condition across the 22 previous launches (see Tufte, 1992). In terms of our notation, the meaning of f—in this case risk analysis—and its implications were not properly communicated.

In discussing scientific writing, Gopen and Swan (1990) state that if the reader is to grasp what the writer means, the writer must understand what the reader needs. In general, this is an essential element in effective communication. It is important to emphasize that scientific discourse is not the mere presentation of information, but rather its actual communication. It does not matter how pleased an author might be to have converted all the right data into sentences and paragraphs; it matters only whether a large majority of the reading audience accurately perceives what the author had in mind. Communication is the eighth InfoQ dimension.

3.3 Assessing InfoQ

The eight InfoQ dimensions allow us to evaluate InfoQ for an empirical study (whether implemented or proposed), by evaluating each of the dimensions. In the following, we describe five assessment approaches. The approaches offer different views of the study and one can implement more than a single approach for achieving deeper understanding.

3.3.1 Rating‐based evaluation

Similar to the use of “data quality” dimensions by statistical agencies for evaluating data quality, we evaluate each of the eight InfoQ dimensions to assess InfoQ. This evaluation integrates different aspects of a study and assigns an overall InfoQ score based on experts’ ratings. The broad perspective of InfoQ dimensions is designed to help researchers enhance the added value of their studies.

Assessing InfoQ using quantitative metrics can be done in several ways. We present a rating‐based approach that examines a study report and scores each of the eight InfoQ dimensions. A rough approach is to rate each dimension on a 1–5 scale:

Very low   Low   Acceptable   High   Very high
   1        2        3          4         5

The ratings for each of the eight dimensions (Yi, i = 1, …, 8) can then be normalized into a desirability function (see Figini et al., 2010) separately for each dimension (0 ≤ d(Yi) ≤ 1). The desirability scores are then combined to produce an overall InfoQ score using the geometric mean of the individual desirabilities:

InfoQ score = [d(Y1) × d(Y2) × ⋯ × d(Y8)]^(1/8)

The desirability approach produces a zero overall score when at least one of the dimensions is rated at the lowest value of the scale; in other words, if any dimension receives the lowest rating, InfoQ is considered to be zero. Smoother options consist of averaging the rating scores themselves with an arithmetic or geometric mean. In the examples in this book, we used the desirability approach.
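A minimal sketch of this scoring scheme in Python, assuming the linear desirability mapping d(1) = 0, d(2) = 0.25, …, d(5) = 1 used in the examples in this book:

```python
import numpy as np

def desirability(rating):
    """Map a 1-5 rating to a desirability in [0, 1]: d(1)=0, d(2)=0.25, ..., d(5)=1."""
    return (rating - 1) / 4.0

def infoq_score(ratings):
    """Overall InfoQ score: the geometric mean of the eight desirabilities.

    Any dimension rated 1 ('very low') has desirability 0 and drives the
    whole score to 0, as described above."""
    d = np.array([desirability(r) for r in ratings], dtype=float)
    return float(np.prod(d) ** (1.0 / len(d)))
```

Replacing the geometric mean with np.mean(d) gives the smoother arithmetic‐mean alternative mentioned above.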

We illustrate the use of this rating‐based approach for the Katkar and Reiley (2006) study in Section 3.4. We also use this approach for each of the studies described in Parts II and III of the book.

3.3.2 Scenario building

A different approach to assessing InfoQ, especially at the “proof of concept” stage, is to spell out the types of answers that the analysis is expected to yield and then examine the data in an exploratory fashion. Alternatively, one can specify the ideal data, as if the data analyst had control over the data collection, and then compare the existing results to the ideal results.

For example, some studies in biosurveillance are aimed at evaluating the usefulness of tracking prediagnostic data for detecting disease outbreaks earlier than traditional diagnostic measures. To evaluate the usefulness of such data (and potential algorithms) in the absence of real outbreak data requires building scenarios of how disease outbreaks manifest themselves in prediagnostic data. Building scenarios can rely on knowledge such as singular historic cases (e.g., Goldenberg et al., 2002) or on integrating epidemiological knowledge into a wide range of data simulations to generate “data with outbreaks” (e.g., Lotze et al., 2010). The wide range of simulations reflects the existing uncertainty in mapping the epidemiological knowledge into data footprints.

3.3.3 Pilot sampling

In many fields it is common practice to begin the analysis with a pilot study based on a small sample. This approach provides initial insights on the dimensions of InfoQ. Following such a pilot, the dataset can be augmented, a new time window for recording the data can be decided, and more in‐depth elicitation of the problem at hand and the key stakeholders can be initiated. This strategy is common practice also in survey design, where a pilot with representative responders is conducted to determine the validity and usability of a questionnaire (Kenett and Salini, 2012).

3.3.4 Exploratory data analysis (EDA)

Modern statistical and visualization software provides a range of visualization techniques such as matrix plots, parallel coordinate plots, and dynamic bubble plots and capabilities such as interactive visualization. These techniques support the analyst in exploring and determining, with “freehand format,” the level of InfoQ in the data. Exploratory data analysis (EDA) is often conducted iteratively by zooming in on salient features and outliers and triggering further investigations and additional data collection. Other exploratory tools that are useful for assessing InfoQ, termed “exploratory models” by De Veaux (2009), include classification and regression trees, cluster analysis, and data reduction techniques. EDA is therefore another strategy for evaluating and increasing InfoQ.
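As a small illustration, a few lines of Python produce the matrix plot and parallel coordinate plot mentioned above on synthetic data (the variables and grouping are invented for the example):

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from pandas.plotting import parallel_coordinates, scatter_matrix

# Synthetic data with a grouping variable, purely for illustration.
rng = np.random.default_rng(4)
df = pd.DataFrame(rng.normal(size=(200, 4)), columns=list("ABCD"))
df["group"] = np.where(df["A"] > 0, "high", "low")

scatter_matrix(df[list("ABCD")], figsize=(6, 6))      # matrix (pairwise scatter) plot
plt.figure()
parallel_coordinates(df, "group", cols=list("ABCD"))  # parallel coordinate plot
plt.show()
```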

3.3.5 Sensitivity analysis

Sensitivity analysis is an important type of quantitative assessment applied in a wide range of domains that involve policy making, including economic development, transportation systems, urban planning, and environmental trends. InfoQ provides an efficient approach to sensitivity analysis by changing one of the InfoQ components while holding the other three constant. For example, one might evaluate InfoQ for three different goals, g1, g2, g3, given the same dataset X, a specific analysis method f, and specific utility U. Differences between the InfoQ derived for the different goals can then indicate the boundaries of usefulness of X, f, and U.

For example, consider the use of ensemble models (combining different models from different sources) in predicting climate change. In an incisive review of models used in climate change studies, Saltelli et al. (2015) state that ensembles are not representative of the range of possible (and plausible) models that fit the data generated by the physical model. This implies that the models used represent structural elements with poor generalizability to the physical model. They also claim that the sensitivity analysis performed on these models varies only a subset of the assumptions, and only one at a time. Such single‐assumption manipulation precludes interactions among the uncertain inputs, which may be highly relevant to climate projections. This also indicates poor generalizability. In terms of operationalization, the authors distinguish policy simulation from policy justification. Policy simulations represent alternative scenarios; policy justification requires establishing a causal link. The operationalization of climate models by policy makers requires an ability to justify specific actions, and this is the problematic aspect that the authors emphasize. An InfoQ assessment of the various studies quoted by the authors can help distinguish between studies providing policy simulations and studies providing policy justifications.

3.4 Example: InfoQ assessment of online auction experimental data

As described in Chapter 1, Katkar and Reiley (2006) investigated the effect of two types of reserve price on the final auction price on eBay. Their data X came from an experiment selling 25 identical pairs of Pokémon cards, where each card was auctioned twice, once with a public reserve price and once with a secret reserve price. The data consists of complete information on all 50 auctions. Katkar and Reiley used linear regression (f) to test for the effect of a private versus public reserve on the final price and to quantify it. The utility (U) consisted of statistical significance, for evaluating the effect of a private versus public reserve price, and the regression coefficient, for quantifying the magnitude of the effect. They conclude that

A secret‐reserve auction will generate a price $0.63 lower, on average, than will a public‐reserve auction.

We evaluate the eight InfoQ dimensions on the basis of the paper by Katkar and Reiley (2006). A more thorough evaluation would have required interaction with the authors of the study and access to their data. For demonstration purposes we use a 1–5 scale and generate an InfoQ score based on a desirability function with d(1) = 0, d(2) = 0.25, d(3) = 0.5, d(4) = 0.75, and d(5) = 1.

3.4.1 Data resolution

The experiment was conducted over two weeks in April 2000. We therefore have no data on possible seasonal effects during other periods of the year. Data resolution was in USD cents, but individual bids were dropped and only the final price was considered. Other time series (e.g., the cumulative number of bids) were also aggregated to create end‐of‐auction statistics such as “total number of bids.” Given the general goal of quantifying the effect of using a secret versus public reserve price on the final price of an auction, the data appears somewhat restrictive. The two‐week data window allows for good control of the experiment but limits data resolution for studying a more general effect. Hence we rate the data resolution as Y1 = 4 (high).

3.4.2 Data structure

The data included only information on the factor levels that were set by the researchers and the three outcomes: final price, whether the auction transacted, and the number of bids received. The data was either set by the experimenters or collected from the auction website. Although time series data was potentially available for the 50 auctions (e.g., the series of bids and cumulative number of bidders), the researchers aggregated them into auction totals. Textual data was available but not used. For example, bidder usernames can be used to track individual bidders who place multiple bids. With respect to corrupted data, one auction winner unexpectedly rated the seller, despite the researchers’ request to refrain from doing so (to keep the rating constant across the experiment). Luckily, this corruption did not affect the analysis, owing to the study design. Another unexpected source of data corruption was eBay’s policy of disallowing bids below a public reserve price. Hence, the total number of bids in auctions with a secret reserve price could not be compared with the same measure in public reserve price auctions. The researchers resorted to deriving a new “total serious bids” variable, which counts the number of bids above the secret reserve price.

Given the level of detailed attention to the experimental conditions, but the lack of use of available time series and textual data, we rate this dimension as Y2 = 4 (high).

3.4.3 Data integration

The researchers analyzed the two‐week data in the context of an experimental design strategy. The integration with the DOE factors was clearly achieved. No textual or other semantic data seems to have been integrated. We rate this dimension as Y3 = 4 (high).

3.4.4 Temporal relevance

The short duration of the experiment and the experimental design assured that the results would not be confounded with the effect of time. The experimenters tried to avoid confounding the results with a changing seller rating and therefore actively requested winners to avoid rating the seller. Moreover, the choice of Pokémon cards was aligned with timeliness, since at the time such items were in high demand. Finally, because of the retrospective nature of the goal, there is no urgency in conducting the data analysis shortly after data collection. We rate this dimension as Y4 = 5 (very high).

3.4.5 Chronology of data and goal

The causal variable (secret or public reserve) and the blocking variable (week) were determined at the auction design stage and manipulated before the auction started. We rate this dimension as Y5 = 5 (very high).

3.4.6 Generalizability

The study is concerned with statistical generalizability: do effects that were found in the sample generalize to the larger context of online auctions? One possible bias, acknowledged by the authors, is their seller's rating of zero (indicating a new seller), which limits the generalizability of the study to more reputable sellers. In addition, the results pertain to low‐value items and might not generalize to more expensive items. We rate this dimension as Y6 = 3 (acceptable).

3.4.7 Operationalization

In construct operationalization, the researchers considered two theories that explain the effect of a secret versus public reserve price on the final price. One is a psychological explanation: bidders can become “caught up in the bidding” at low bid amounts and end up bidding more than they would have had the bidding started higher. The second theory is a model of rational bidders: “an auction with a low starting bid and a high secret reserve can provide more information to bidders than an auction with a high starting bid.” Although these two theories rely on operationalizing constructs such as “information” and “caught up in the bidding,” the researchers limited their study to eBay’s measurable reserve price options and final prices.

In terms of action operationalization, the study results can be directly used by buyers and sellers in online auction platforms, as well as auction sites (given the restrictions of generalizing beyond eBay and beyond Pokémon cards). Recall that the study examined the effect of a reserve price not only on the final auction price but also on the probability of the auction resulting in a sale. The authors concluded:

Only 46% of secret‐reserve auctions resulted in a sale, compared with 70% of public‐reserve auctions for the same goods. Secret‐reserve auctions resulted in 0.72 fewer serious bidders per auction, and $0.62 less in final auction price, than did public‐reserve auctions on average. We can therefore recommend that sellers avoid the use of secret reserve prices, particularly for Pokémon cards.

The authors limit their recommendation to low‐cost items by quoting from The Official eBay Guide (Kaiser and Kaiser, 1999): “If your minimum sale price is below $25, think twice before using a reserve auction. Bidders frequently equate reserve with expensive.”

Note that because the study result is applicable to the “average auction,” it is most actionable for either an online auction platform which holds many auctions or for sellers who sell many items. The results do not tell us about the predictive accuracy for a single auction.

We rate this dimension as Y7 = 4 (high).

3.4.8 Communication

This research study communicated the analysis via a paper published in a peer‐reviewed journal. Analysis results are presented in the form of a scatter plot, a series of estimated regression models (estimated effects and standard errors) and their interpretation in the text. We assume that the researchers made additional dissemination efforts (e.g., the paper is publicly available online as a working paper). The paper’s abstract is written in nontechnical and clear language and can therefore be easily understood not only by academics and researchers but also by eBay participants. The main communication weakness of the analysis is in terms of visualization, where plots would have conveyed some of the results more clearly. We therefore rate this dimension as Y8 = 4 (high).

3.4.9 Information quality score

The scores that we assigned for each of the dimensions were the following:

1. Data resolution: 4
2. Data structure: 4
3. Data integration: 4
4. Temporal relevance: 5
5. Chronology of data and goal: 5
6. Generalizability: 3
7. Operationalization: 4
8. Communication: 4

On the basis of these subjective assessments, which represent expert opinions derived from the single publication on the auction experiments, we obtain an InfoQ score, based on the geometric mean of desirabilities, of 77%, that is, relatively high. The relatively weak dimension is generalizability; the strongest dimensions are temporal relevance and chronology of data and goal. Reviewing the scores after some time showed them to be robust, even though expert opinions tend to differ to a certain extent. To derive consensus‐based scores, one can ask a number of experts (three to five) to review the case and compare their scores. If the scores are consistent, one can derive a consolidated InfoQ score. If they show discrepancies, one would conduct a consensus meeting of the experts where the reasoning behind each score is discussed and some reconciliation is attempted. If a range of scores remains, then the InfoQ score can be presented as a range of values.
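For transparency, the 77% figure can be reproduced directly from the eight ratings; this short computation uses the desirability mapping d(r) = (r − 1)/4 stated at the start of Section 3.4:

```python
import numpy as np

# Ratings from Sections 3.4.1-3.4.8: resolution, structure, integration, temporal
# relevance, chronology, generalizability, operationalization, communication.
ratings = np.array([4, 4, 4, 5, 5, 3, 4, 4])
d = (ratings - 1) / 4.0             # desirabilities: 0.75 ... 1.0 ... 0.5 ...
score = np.prod(d) ** (1 / len(d))  # geometric mean of the desirabilities
print(round(float(score), 2))       # 0.77, i.e., the 77% reported above
```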

3.5 Summary

In this chapter we break down the InfoQ concept into eight dimensions, each dimension relating to a different aspect of the goal–data–analysis–utility components. Given an empirical study, we can then assess the level of its InfoQ by examining each of the eight dimensions. We present five assessment approaches and illustrate the rating‐based approach by applying it to the study by Katkar and Reiley (2006) on the effect of reserve prices in online auctions.

The InfoQ assessment can be done at the planning phase of a study, during a study, or after the study has been reported. In Chapter 13 we discuss the application of InfoQ assessment to research proposals of graduate students. In Chapters 4 and 5, we focus on statistical methods that can be applied, a priori or a posteriori, to enhance InfoQ, and Chapters 6–10 are about InfoQ assessments of completed studies. Such assessments provide opportunities for InfoQ enhancement at the study design stage, during the study, or after it has been completed.

Each of the InfoQ dimensions relates to methods for InfoQ improvement that require multidisciplinary skills. For example, data integration is related to IT capabilities such as extract–transform–load (ETL) technologies, and action operationalization can be related to management processes where action items are defined in order to launch focused interventions. For a comprehensive treatment of data analytic techniques, see Shmueli et al. (2016).

In Part II, we examine a variety of studies from different areas using the rating‐based approach for assessing the eight dimensions of InfoQ. The combination of application area and InfoQ assessment provides context‐based examples. We suggest starting with a specific domain of interest, reviewing the examples in the respective chapter and then moving on to other domains and chapters. This combination of domain‐specific examples and cross‐domain case studies was designed to provide in‐depth and general perspectives of the added value of InfoQ assessments.

References

  1. Bapna, R., Goes, P., Gopal, R. and Marsden, J.R. (2006) Moving from data‐constrained to data‐enabled research: experiences and challenges in collecting, validating and analyzing large‐scale e‐commerce data. Statistical Science, 21, pp. 116–130.
  2. Bapna, R., Jank, W. and Shmueli, G. (2008a) Consumer surplus in online auctions. Information Systems Research, 19, pp. 400–416.
  3. Bapna, R., Jank, W. and Shmueli, G. (2008b) Price formation and its dynamics in online auctions. Decision Support Systems, 44, pp. 641–656.
  4. Boslaugh, S. (2007) Secondary Data Sources for Public Health: A Practical Guide. Cambridge University Press, Cambridge, UK.
  5. De Veaux, R.D. (2009) Successful Exploratory Data Mining in Practice. JMP Explorer Series. http://www.williams.edu/Mathematics/rdeveaux/success.pdf (accessed May 24, 2016).
  6. Deming, W.E. (2000) Out of Crisis. MIT Press, Cambridge, MA.
  7. Ehling, M. and Körner, T. (2007) Eurostat Handbook on Data Quality Assessment Methods and Tools, Wiesbaden. http://ec.europa.eu/eurostat/web/quality/quality‐reporting (accessed April 30, 2016).
  8. Fienberg, S.E. (2006) Privacy and confidentiality in an e‐commerce world: data mining, data warehousing, matching and disclosure limitation. Statistical Science, 21, pp. 143–154.
  9. Figini, S., Kenett, R.S. and Salini, S. (2010) Integrating operational and financial risk assessments. Quality and Reliability Engineering International, 26, pp. 887–897.
  10. Forbes, S. and Brown, D. (2012) Conceptual thinking in national statistics offices. Statistical Journal of the IAOS, 28, pp. 89–98.
  11. Ghani, R. and Simmons, H. (2004) Predicting the End‐Price of Online Auctions. International Workshop on Data Mining and Adaptive Modelling Methods for Economics and Management, Pisa.
  12. Giovannini, E. (2008) Understanding Economic Statistics. Organisation for Economic Cooperation and Development Publishing, Paris.
  13. Goldenberg, A., Shmueli, G., Caruana, R.A. and Fienberg, S.E. (2002) Early statistical detection of anthrax outbreaks by tracking over‐the‐counter medication sales. Proceedings of the National Academy of Sciences, 99(8), pp. 5237–5240.
  14. Goodman, D. and Hambleton, R. (2004) Student test score reports and interpretive guides: review of current practices and suggestions for future research. Applied Measurement in Education, 17(2), pp. 145–220.
  15. Gopen, G. and Swan, J. (1990) The science of scientific writing. American Scientist, 78, pp. 550–558.
  16. Granger, C.W.J. (1969) Investigating causal relations by econometric models and cross‐spectral methods. Econometrica, 37, pp. 424–438.
  17. Health Metrics Network Secretariat (2008) Health Metrics Network Framework and Standards for Country Health Information Systems, 2nd edition. World Health Organization, Health Metrics Network, Geneva.
  18. Kaiser, L.F. and Kaiser, M. (1999) The Official eBay Guide to Buying, Selling, and Collecting Just About Anything. Simon & Schuster, New York.
  19. Katkar, R. and Reiley, D.H. (2006) Public versus secret reserve prices in eBay auctions: results from a Pokemon field experiment. Advances in Economic Analysis and Policy, 6(2), article 7.
  20. Kaynak, E. and Herbig, P. (2014) Handbook of Cross‐Cultural Marketing. Routledge, London.
  21. Kenett, R.S. and Salini, S. (2012) Modern Analysis of Customer Satisfaction Surveys: With Applications Using R. John Wiley & Sons, Ltd, Chichester, UK.
  22. Kenett, R.S. and Shmueli, G. (2015) Clarifying the terminology that describes scientific reproducibility. Nature Methods, 12, p. 699.
  23. Kenett, R.S., Zacks, S. and Amberti, D. (2014) Modern Industrial Statistics: With Applications in R, MINITAB and JMP, 2nd edition. John Wiley & Sons, Chichester, West Sussex, UK.
  24. Kirschbaum, C. and Hellhammer, D.H. (1989) Salivary cortisol in psychobiological research: an overview. Neuropsychobiology, 22, pp. 150–169.
  25. Lee, Y., Strong, D., Kahn, B. and Wang, R. (2002) AIMQ: a methodology for information quality assessment. Information & Management, 40, pp. 133–146.
  26. Lotze, T., Shmueli, G. and Yahav, I. (2010) Simulating and Evaluating Biosurveillance Datasets, in Biosurveillance: Methods and Case Studies, Kass‐Hout, T. and Zhang, X. (editors), CRC Press, Boca Raton, FL.
  27. McShane, B.B. and Wyner, A.J. (2011) A statistical analysis of multiple temperature proxies: are reconstructions of surface temperatures over the last 1000 years reliable? Annals of Applied Statistics, 5, pp. 5–44.
  28. Patzer, G.L. (1995) Using Secondary Data in Marketing Research. Praeger, Westport, CT.
  29. Pavlou, P.A. and Dimoka, A. (2006) The nature and role of feedback text comments in online marketplaces: implications for trust building, price premiums, and seller differentiation. Information Systems Research, 17(4), pp. 392–414.
  30. Raiffa, H. (1970) Decision Analysis: Introductory Lectures on Choices under Uncertainty. Addison‐Wesley, Reading, MA.
  31. Rao, C.R. (1985) Weighted Distributions Arising Out of Methods of Ascertainment: What Population Does a Sample Represent?, in A Celebration of Statistics: The ISI Centenary Volume, Atkinson, A.C. and Fienberg, S.E. (editors), Springer, New York, pp. 543–569.
  32. Rasch, G. (1977) On specific objectivity: an attempt at formalizing the request for generality and validity of scientific statements. Danish Yearbook of Philosophy, 14, pp. 58–93.
  33. Saltelli, A., Stark, P., Becker, W. and Stano, P. (2015) Climate models as economic guides: scientific challenge or quixotic quest? Issues in Science and Technology, 31(3). http://issues.org/31‐3/climate‐models‐as‐economic‐guides‐scientific‐challenge‐or‐quixotic‐quest (accessed April 30, 2016).
  34. Shewhart, W.A. (1986) Statistical Method from the Viewpoint of Quality Control, Deming, W.D. (editor), Dover Publications, New York.
  35. Shmueli, G. (2010) To explain or to predict? Statistical Science, 25(3), pp. 289–310.
  36. Shmueli, G., Bruce, P. and Patel, N.R. (2016) Data Mining for Business Analytics: Concepts, Techniques, and Applications in Microsoft Office Excel with XLMiner, 3rd edition. John Wiley & Sons, Inc., Hoboken, NJ.
  37. Tufte, R.E. (1992) The Visual Display of Quantitative Information. Graphics Press, Cheshire, CT.
  38. Wang, R.Y., Kon, H.B. and Madnick, S.E. (1993) Data Quality Requirements Analysis and Modeling. 9th International Conference on Data Engineering, Vienna.
  39. Wang, S., Jank, W. and Shmueli, G. (2008) Explaining and forecasting online auction prices and their dynamics using functional data analysis. Journal of Business and Economic Statistics, 26, pp. 144–160.