Information quality (InfoQ) is a holistic abstraction or a construct. To be able to assess such a construct in practice, we operationalize it into measurable variables. Like InfoQ, data quality is also a construct that requires operationalization. The issue of assessing data quality has been discussed and implemented in several fields and by several international organizations. We start this chapter by looking at the different approaches to operationalizing data quality. We then take, in Section 3.2, a similar approach for operationalizing InfoQ. Section 3.3 is about methods for assessing InfoQ dimensions and Section 3.4 provides an example of an InfoQ rating‐based assessment. Additional in‐depth examples are provided in Part II.
In marketing research and in the medical literature, data quality is assessed by defining the criteria of recency, accuracy, availability, and relevance of a dataset (Patzer, 1995):
Kaynak and Herbig (2014) mention four criteria to consider for data quality in cross‐cultural marketing research:
The criteria of Patzer and of Kaynak and Herbig consider the data (X) and goal (g), but not the data analysis method (f) or the utility (U). Specifically, recency, accuracy, reliability, availability, and comparability are all characteristics of the dataset that relate only implicitly to the analysis goal; only relevance relates directly to both the data and the analysis goal.
Boslaugh (2007) considers three main questions to help assess the quality of secondary data (data collected for purposes other than the study at hand):
These questions are useful at the prestudy stage, when one must evaluate the usefulness of a dataset for the study at hand. The concepts in the three questions can be summarized into collection purpose, data type, data age, data collection instrument and process, and data preprocessing. They can be grouped into “source quality” and “data quality” criteria (Kaynak and Herbig, 2014). Obviously, source quality affects data quality:
It is almost impossible to know too much about the data collection process because it can influence the quality of the data in many ways, some of them not obvious.
Boslaugh (2007, p. 5) further considers availability, completeness, and data format:
A secondary data set should be examined carefully to confirm that it includes the necessary data, that the data are defined and coded in a manner that allows for the desired analysis, and that the researcher will be allowed to access the data required.
We again note that the questions and criteria mentioned relate to the data and goal, but not to an analysis method or utility; the InfoQ definition, however, requires all four components.
In the field of management information systems (MIS), data quality is defined as the level of conformance to specifications or standards. Wang et al. (1993) define data quality as “conformance to requirements.” They operationalize this construct by defining quality indicators that are based on objective measures such as data source, creation time, and collection method, as well as subjective measures such as the credibility level of the data at hand, as determined by the researcher.
As mentioned in Chapter 2, Lee et al. (2002) propose a methodology for assessment and benchmarking of InfoQ of IT systems called AIMQ. They collate 15 dimensions from academic papers in MIS: accessibility, appropriate amount, believability, completeness, concise representation, consistent representation, ease of operation, free of error, interpretability, objectivity, relevancy, reputation, security, timeliness, and understandability. They then group the 15 dimensions into four categories: intrinsic, contextual, representational, and accessibility. While they use the term IQ, it is different from InfoQ. The concept of IQ indicates a consideration of the user of the IT system (and therefore some of the dimensions include relevance, timeliness, etc.). However, IQ does not consider data analysis at all. To operationalize the four categories, Lee et al. (2002) develop a questionnaire with eight items for each of the 15 dimensions. This instrument is then used for scoring an IT system of an organization and for benchmarking it against best practice and other organizations.
Assessing data quality is one of the core aspects of statistical agencies’ work. Government agencies and international organizations that collect data for decision making have developed operationalizations of data quality by considering multiple dimensions. The abstraction “data quality” is usually defined as “fitness for use” in terms of user needs, and the construct is operationalized through a set of dimensions. We briefly list the dimensions used by several notable organizations.
The concept of quality of statistical data has been developed and used in European official statistics as well as organizations such as the International Monetary Fund (IMF), Statistics Canada, and the Organization for Economic Cooperation and Development (OECD). The OECD operationalizes this construct by defining seven dimensions for quality assessment (see chapter 5 in Giovanni, 2008):
The European Commission’s Eurostat agency uses seven dimensions for assessing the quality of data from surveys (Ehling and Körner, 2007):
The US Environmental Protection Agency (EPA) has developed the Quality Assurance (QA) Project Plan as a tool for project managers and planners to document the type and quality of data and information needed for making environmental decisions. The program aims to control and enhance data quality in terms of precision, accuracy, representativeness, completeness, and comparability (PARCC) of environmental measurements used in its studies. They define these dimensions as follows:
The World Health Organization (WHO) established a data quality framework called Health Metrics Network Framework (HMN, 2006), based on the IMF Data Quality Assessment Framework (DQAF) and IMF General Data Dissemination System (GDDS). The framework uses six criteria for assessing the quality of health‐related data and indicators that are generated from health information systems:
These examples provide the background for the assessment of InfoQ. Our goal in presenting the InfoQ dimensions is to propose a generic structure that applies to any empirical analysis and expands on the data quality approaches described above.
Taking an approach that is similar to data quality assessment described in the previous section, we define eight dimensions for assessing InfoQ that consider and affect not only the data and goal, X and g, but also the method of analysis (f) and the utility of the study (U). With this approach, we provide a decomposition of InfoQ that can be used for assessing and improving research initiatives and for evaluating completed studies.
Data resolution refers to the measurement scale and aggregation level of X. The measurement scale of the data should be carefully evaluated in terms of its suitability to the goal, the analysis methods to be used and the required resolution of U. Given the original recorded scale, the researcher should evaluate its adequacy. It is usually easy to produce a more aggregated scale (e.g., two income categories instead of ten), but not a finer scale. Data might be recorded by multiple instruments or by multiple sources. To choose between the multiple measurements, supplemental information about the reliability and precision of the measuring devices or sources of data is useful. A finer measurement scale is often associated with more noise; hence the choice of scale can affect the empirical analysis directly.
The data aggregation level must also be evaluated relative to g. For example, consider daily purchases of over‐the‐counter medications at a large pharmacy. If the goal of the analysis is to forecast future inventory levels of different medications, and restocking is done weekly, then weekly aggregates are preferable to daily aggregates owing to fewer data recording errors and less noise. However, for the early detection of disease outbreaks, where alerts generated a day or two earlier can make a significant difference in treatment, weekly aggregates are of low quality. Beyond data frequency, the level of aggregation across items also matters: for inventory purposes, medication‐level information is required, whereas for disease outbreak detection medications can be grouped by symptoms, and the symptom‐aggregated daily series would be preferable.
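The aggregation choice can be sketched in a few lines. The daily counts below are purely illustrative (not from any pharmacy dataset); the point is that the same raw series yields two different X's depending on g:

```python
# Sketch: aggregating hypothetical daily medication sales into weekly totals.
# Weekly totals suit inventory forecasting with weekly restocking (g1);
# the original daily counts are needed for early outbreak detection (g2).

def to_weekly(daily_counts):
    """Sum a daily series into consecutive 7-day totals (partial weeks dropped)."""
    n_weeks = len(daily_counts) // 7
    return [sum(daily_counts[7 * w: 7 * w + 7]) for w in range(n_weeks)]

daily = [12, 15, 9, 14, 11, 20, 18,   # week 1
         13, 16, 10, 15, 12, 22, 19]  # week 2
weekly = to_weekly(daily)
print(weekly)  # -> [99, 107]
```

The information lost in aggregation is irreversible: the weekly series can be derived from the daily one, but not the reverse, which is why the finer resolution should be retained when the goal demands it.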
Another example relates to the case studies on online auctions in Chapter 1. In many online auction platforms, bid times are typically recorded in seconds and prices in a currency unit. On eBay, for example, bid times are reported at the level of seconds (e.g., August 20, 2010, 03.14.07 Pacific Daylight Time) and prices at the dollar and cent level (e.g., $23.01). The forecasting model by Wang et al. (2008) uses bid times at second level and cent‐level bid amounts until the time of prediction to produce forecasts of price in cents for any second during the auction. In contrast, the forecasting model by Ghani and Simmons (2004) produces forecasts of the final price in terms of $5 intervals, using only information available at the start of the auction.
The concept of rational subgroup that is used in statistical process control is a special case of aggregation level. The rational subgroup setup determines the level of process variability and the type of signals to detect. If the rational subgroup consists of measurements within a short period of a production process, then statistical process control methods will pick up short‐term out‐of‐control signals, whereas rational subgroups spread over longer periods will support detection of longer‐term trends and out‐of‐control signals (see Kenett et al., 2014). Using our notation, f is the statistical process control method, X is the data, g1 is the short‐term signal, g2 is the long‐term signal, and U is a measure of desirable alerting behavior.
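The dependence of control limits on the rational subgroup can be illustrated with a minimal X‐bar chart computation. This is a sketch with illustrative measurements; it pools within‐subgroup standard deviations by a simple average, omitting the usual c4 bias correction used in practice:

```python
# Sketch: Shewhart X-bar control limits computed from rational subgroups.
# The subgroup size determines both the estimate of process variability
# and the width of the limits on subgroup means, and hence which signals
# (short-term vs. longer-term) the chart will detect.
from statistics import mean, stdev

def xbar_limits(measurements, subgroup_size):
    """Estimate the X-bar chart center line and 3-sigma limits."""
    n = subgroup_size
    groups = [measurements[i:i + n]
              for i in range(0, len(measurements) - n + 1, n)]
    xbars = [mean(g) for g in groups]
    # Pool within-subgroup standard deviations (no c4 correction, for brevity)
    sigma = mean(stdev(g) for g in groups)
    center = mean(xbars)
    half_width = 3 * sigma / n ** 0.5
    return center - half_width, center, center + half_width

process = [10.1, 9.8, 10.3, 10.0, 9.9, 10.2, 10.4, 9.7,
           10.0, 10.1, 9.9, 10.2, 10.3, 9.8, 10.1, 10.0]
lcl, cl, ucl = xbar_limits(process, subgroup_size=4)
```

Running the same data through `xbar_limits(process, subgroup_size=2)` produces different limits from the same measurements: changing the subgrouping changes f's behavior without changing X.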
Data structure relates to the type(s) of data and data characteristics such as corrupted and missing values due to the study design or data collection mechanism. Data types include structured numerical data in different forms (e.g., cross‐sectional, time series, and network data) as well as unstructured nonnumerical data (e.g., text, text with hyperlinks, audio, video, and semantic data). The InfoQ level of a certain data type depends on the goal at hand. Bapna et al. (2006) discussed the value of different “data types” for answering new research questions in electronic commerce research:
For each research investigation, we seek to identify and utilize the best data type, that is, that data which is most appropriate to help achieve the specific research goals.
An example from the online auctions literature is related to the effect of “seller feedback” on the auction price. Sellers on eBay receive numerical feedback ratings and textual comments. Although most explanatory studies of price determinants use the numerical feedback ratings as a covariate, a study by Pavlou and Dimoka (2006) showed that using the textual comments as a covariate in a model for price leads to much higher R2 values (U) than using the numerical rating.
Corrupted and missing values require handling by removal, imputation, data recovery, or other methods, depending on g. Wrong values may be treated as missing values when the purpose is to estimate a population parameter, such as in surveys where respondents intentionally enter wrong answers. Yet, for some goals, intentionally submitted wrong values might be informative and therefore should not be discarded or “corrected.”
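The goal‐dependent treatment of wrong values can be sketched as follows, with hypothetical survey ages and an illustrative validity range:

```python
# Sketch: the same suspicious values handled differently under two goals.
# For estimating a population mean (g1), out-of-range ages are discarded;
# for studying response behavior (g2), they are kept and flagged as
# potentially intentional wrong answers. Data and range are illustrative.

def clean_for_estimation(ages, valid_range=(0, 120)):
    """Drop out-of-range values before estimating a population parameter (g1)."""
    lo, hi = valid_range
    return [a for a in ages if lo <= a <= hi]

def flag_for_behavior_study(ages, valid_range=(0, 120)):
    """Keep every record, adding a wrong-value indicator (g2)."""
    lo, hi = valid_range
    return [(a, not (lo <= a <= hi)) for a in ages]

ages = [34, 27, 999, 45, -1, 52]
print(clean_for_estimation(ages))        # -> [34, 27, 45, 52]
print(flag_for_behavior_study(ages)[2])  # -> (999, True)
```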
Integrating multiple sources and/or types of data often creates new knowledge regarding the goal at hand, thereby increasing InfoQ. An example is the study estimating consumer surplus in online auctions (Bapna et al., 2008a; see Chapter 1), where data from eBay (X1) that lacked the highest bid values were combined with data from a website called Cniper.com (now no longer active) (X2) that contained the missing information. Estimating consumer surplus was impossible using either X1 or X2 alone; only their combination yielded sufficient InfoQ. The study by Pavlou and Dimoka (2006), which integrated textual feedback comments as covariates into a model of price, is another instance of combining data types.
New analysis methodologies, such as functional data analysis and text mining, are aimed at increasing InfoQ of new data types and their combination. For example, in the online auction forecasting study by Wang et al. (2008) (see Chapter 1), functional data analysis was used to integrate temporal bid sequences with cross‐sectional auction and seller information. The combination allowed more precise forecasts of final prices compared to models based on cross‐sectional data alone. The functional approach has also enabled quantifying the effects of different factors on the price process during an auction (Bapna et al., 2008b).
Another aspect of data integration is linking records across databases. Although record linkage algorithms are popular for increasing InfoQ, studies that use record linkage often employ masking techniques that reduce risks of identification and breaches of privacy and confidentiality. Such techniques (e.g., removing identifiers, adding noise, data perturbation, and microaggregation) can obviously decrease InfoQ, even to the degree of making the combined dataset useless for the goal at hand. Solutions, such as “privacy‐preserving data mining” and “selective revelation,” are aimed at utilizing the linked dataset with high InfoQ without compromising privacy (see, e.g., Fienberg, 2006).
The process of deriving knowledge from data can be placed on a timeline that includes the data collection, data analysis, and study deployment periods as well as the temporal gaps between these periods (as depicted in Figure 3.1). These different durations and gaps can each affect InfoQ. The data collection duration can increase or decrease InfoQ, depending on the study goal (e.g., studying longitudinal effects versus a cross‐sectional goal). Similarly, uncontrollable transitions during the collection phase can be useful or disruptive, depending on g.
For this reason, online auction studies that collect data on fashionable or popular products (which generate large amounts of data) for estimating an effect try to restrict the data collection period as much as possible. The experiment by Katkar and Reiley (2006) on the effect of reserve prices on online auction prices (see Chapter 1) was conducted over a two‐week period in April 2000. The data on auctions for Harry Potter books and Microsoft Xbox consoles in Wang et al. (2008) was collected in the nonholiday months of August and September 2005. In contrast, a study that is interested in comparing preholiday with postholiday bidding or selling behaviors would require collection over a period that includes both preholiday and postholiday times. The gap between data collection and analysis, which coincides with the recency criterion in Section 3.1, is typically larger for secondary data (data not collected for the purpose of the study). In predictive modeling, where the context of prediction should be as close as possible to the data collection context, temporal lags can significantly decrease InfoQ. For instance, a 2010 dataset of online auctions for iPads on eBay will probably be of low InfoQ for forecasting or even estimating current iPad prices because of the fast changing interest in electronic gadgets.
Another aspect affecting temporal relevance is analysis timeliness, or the timeliness of f(X|g). Raiffa (1970, p. 264) calls this an “error of the fourth kind: solving the right problem too late.” Analysis timeliness is affected by the nature of X, by the complexity of f, and ultimately by the application of f to X. The nature of a dataset (size, sparseness, etc.) can affect analysis timeliness and in turn its utility for the goal at hand. For example, computing summary statistics for a very large dataset might take several hours, rendering InfoQ low for real‐time tasks (g1) but high for retrospective analysis (g2). The computational complexity of f also determines analysis time: Markov chain Monte Carlo estimation methods and computationally intensive predictive algorithms take longer than estimating linear models or computing summary statistics. In the online auction price forecasting example, a linear forecasting model was chosen to produce timely forecasts of an ongoing auction. Wang et al. (2008) used smoothing splines to estimate price curves for each auction in the dataset; this information is then used in the forecasting model. Although smoothing splines do not necessarily produce monotone curves (as would be expected of a price curve from the start to the end of an eBay‐type auction), they are much faster to fit than monotone smoothing splines, which do guarantee monotonic curves. Therefore, in this case smoothing splines generated higher InfoQ than monotone splines for real‐time forecasting applications. Temporal relevance and analysis timeliness obviously depend on the availability of software and hardware, as well as on the efficiency of the researcher or analysis team.
The choice of variables to collect, the temporal relationship between them, and their meaning in the context of g all critically affect InfoQ. We must consider the retrospective versus prospective nature of the goal, as well as its type in terms of causal explanation, prediction, or description (Shmueli, 2010). In predictive studies, the input variables must be available at the time of prediction, whereas in explanatory models, causal arguments determine the relationship between dependent and independent variables. Endogeneity, which can arise from reverse causation or, for example, when a causal input variable is omitted from a model, results in biased parameter estimates. Endogeneity therefore yields low InfoQ in explanatory studies, but not necessarily in predictive studies, where omitting input variables can even lead to higher predictive accuracy (see Shmueli, 2010). Also related is the Granger causality test (Granger, 1969), aimed at determining whether a lagged time series X contains useful information for predicting future values of another time series Y by using a regression model.
In the online auction context, the level of InfoQ that is contained in the “number of bidders” for models of auction price depends on the study goal. Classic auction theory specifies the number of bidders as an important factor influencing price: the more bidders, the higher the price. Hence, data on number of bidders is of high quality in an explanatory model of price. However, for the purpose of forecasting prices of ongoing online auctions, where the number of bidders is unknown until the end of the auction, the InfoQ of “number of bidders,” even if available in a retrospective dataset, is very low. For this reason, the forecasting model by Wang et al. (2008) described in Chapter 1 excludes the number of bidders or number of bids and instead uses the cumulative number of bids until the time of prediction.
The utility of f(X|g) is dependent on the ability to generalize f to the appropriate population. Two types of generalizability are statistical and scientific generalizability. Statistical generalizability refers to inferring from a sample to a target population. Scientific generalizability refers to applying a model based on a particular target population to other populations. This can mean either generalizing an estimated population pattern or model f to other populations or applying f estimated from one population to predict individual observations in other populations.
Determining the level of generalizability requires careful characterization of g. For instance, for inference about a population parameter, statistical generalizability and sampling bias are the focus, and the question of interest is, “What population does the sample represent?” (Rao, 1985). In contrast, for predicting the values of new observations, the question of interest is whether f captures associations in the training data X (the data that are used for model building) that are generalizable to the to‐be‐predicted data.
Generalizability is a dimension useful for clarifying the concepts of reproducibility, repeatability, and replicability (Kenett and Shmueli, 2015). The three terms are referred to with different and sometimes conflicting meanings, both between and within fields (see Chapter 11). Here we only point out that the distinction between replicating insights and replicating exact identical numerical results is similar and related to the distinction between InfoQ (insights) and data or analysis quality (numerical results).
Another type of generalization, in the context of ability testing, is the concept of specific objectivity (Rasch, 1977). Specific objectivity is achieved if outcomes of questions in a questionnaire that is used to compare levels of students are independent of the specific questions and of other students. In other words, the purpose is to generalize from data on certain students answering a set of questions to the population of outcomes, irrespective of the particular responders or particular questions.
The type of required generalizability affects the choice of f and U. For instance, data‐driven methods are more prone to overfitting, which conflicts with scientific generalizability. Statistical generalizability is commonly evaluated by using measures of sampling bias and goodness of fit. In contrast, scientific generalizability for predicting new observations is typically evaluated by the accuracy of predicting a holdout set from the to‐be‐predicted population, to protect against overfitting.
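The holdout evaluation described above can be sketched with a deliberately naive “model” (the training mean) standing in for f. The prices below are illustrative, and any fitted model could take the mean predictor's place:

```python
# Sketch: assessing predictive generalizability on a holdout set.
# A model is fit on training data only and scored on held-out
# observations via RMSE; comparing in-sample and holdout error
# guards against overfitting. All numbers are illustrative.
from statistics import mean

def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    return (sum((a - p) ** 2 for a, p in zip(actual, predicted))
            / len(actual)) ** 0.5

prices = [21.0, 23.5, 19.8, 22.1, 20.4, 24.0, 21.7, 22.8]
train, holdout = prices[:6], prices[6:]

prediction = mean(train)  # "model" estimated from training data only
in_sample_err = rmse(train, [prediction] * len(train))
holdout_err = rmse(holdout, [prediction] * len(holdout))
```

Goodness‐of‐fit measures computed on `train` speak to statistical generalizability of the estimated pattern; `holdout_err` speaks to generalizability to new, to‐be‐predicted observations.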
The online auction studies from Chapter 1 illustrate the different generalizability types. The “effect of reserve price on final price” study (Katkar and Reiley, 2006) is concerned with statistical generalizability. Katkar and Reiley (2006) designed the experiment so that it produces a representative sample. Their focus is on standard errors and statistical significance. The forecasting study by Wang et al. (2008) is concerned with generalizability to new individual auctions. They evaluated predictive accuracy on a holdout set. The third study on “consumer surplus in eBay” is concerned with statistical generalizability from the sample to all eBay auctions in 2003. Because the sample was not drawn randomly from the population, Bapna et al. (2008a) performed a special analysis, comparing their sample with a randomly drawn sample (see appendix B in Bapna et al., 2008a).
Two types of operationalization of the analysis results are considered: construct operationalization and action operationalization.
Constructs are abstractions that describe a phenomenon of theoretical interest. Measurable data are an operationalization of underlying constructs. For example, psychological stress can be measured via a questionnaire or by physiological measures, such as cortisol levels in saliva (Kirschbaum and Hellhammer, 1989), and economic prosperity can be measured via income or by unemployment rate. The relationship between the underlying construct χ and its operationalization X = θ(χ) can vary, and its adequacy relative to g is another important aspect of InfoQ. The role of construct operationalization depends on the goal, X = θ(χ|g), and especially on whether the goal is explanatory, predictive, or descriptive. In explanatory models, based on underlying causal theories, multiple operationalizations might be acceptable for representing the construct of interest; as long as X is assumed to measure χ, the variable is considered adequate. In the earlier example, both questionnaire answers and physiological measurements would be acceptable for measuring psychological stress. In contrast, in a predictive task, where the goal is to create sufficiently accurate predictions of a certain measurable variable, the choice of operationalized variable is critical: predicting psychological stress as reported in a questionnaire (X1) is different from predicting levels of a physiological measure (X2). Hence, InfoQ in predictive studies relies more heavily on the quality of X and its stability across the periods of model building and deployment, whereas in explanatory studies InfoQ relies more on the adequacy of X for measuring χ.
Returning to the online auction context, the consumer surplus study relies on observable bid amounts, which are taken to reflect a bidder’s underlying “willingness‐to‐pay” construct; the same construct is operationalized differently in other types of studies. In contrast, in price forecasting studies the measurable variable of interest is the auction price, which is defined essentially the same way across studies. A related example is the work by McShane and Wyner (2011) in the context of climate change, showing that for purposes of predicting temperatures, theoretically based “natural covariates” are inferior to “pseudoproxies,” which are lower‐dimensional approximations of the natural covariates. Descriptive tasks are similar to predictive tasks in their focus on the observable level. In descriptive studies, the goal is to uncover a signal in a dataset (e.g., to estimate the income distribution or to uncover the temporal patterns in a time series). Because there is no underlying causal theory behind descriptive studies, and because results are reported at the level of the measured variables, InfoQ relies, as in predictive tasks, on the quality of the measured variables rather than on their relationship to an underlying construct.
Action operationalization is about deriving concrete actions from the information provided by a study. When a report presenting an analysis of a given dataset in the context of specific goals leads to clear follow‐up actions, we consider the report to be of higher InfoQ. The dimension of action operationalization has been discussed in various contexts. In business and industry settings, an operational definition consists of (i) a criterion to be applied to an object or a group of objects, (ii) a test of compliance for the object or group, and (iii) a decision rule for interpreting the test results as to whether the object or group is, or is not, in compliance. This definition by Deming (2000) closely parallels Shewhart’s opening statement in his book Statistical Method from the Viewpoint of Quality Control (Shewhart, 1986):
Broadly speaking there are three steps in a quality control process: the specification of what is wanted, the production of things to satisfy the specification, and the inspection of the things produced to see if they satisfy the specification.
In a broad context of organizational performance, Deming (2000) poses three important questions to help assess the level of action operationalization of a specific organizational study. These are the following:
In the context of an educational system, the National Education Goals Panel (NEGP) in the United States recommended that states answer four questions on their student reports that are of interest to parents (Goodman and Hambleton, 2004):
The action operationalization of official statistics has also been discussed extensively by official statistics agencies, internally, and in the literature. Quoting Forbes and Brown (2012):
An issue that can lead to misconception is that many of the concepts used in official statistics often have specific meanings which are based on, but not identical to, their everyday usage meaning… Official statistics “need to be used to be useful” and utility is one of the overarching concepts in official statistics… All staff producing statistics must understand that the conceptual frameworks underlying their work translate the real world into models that interpret reality and make it measurable for statistical purposes… The first step … is to define the issue or question(s) that statistical information is needed to inform. That is, to define the objectives for the framework, and then work through those to create its structure and definitions. An important element … is understanding the relationship between the issues and questions to be informed and the definitions themselves.
Effective communication of the analysis f(X|g) and its utility U directly affects InfoQ. Common communication media include visual, textual, and verbal presentations and reports. Within research environments, communication focuses on written publications and conference presentations; research mentoring and the refereeing process are aimed at improving communication (and InfoQ) within the research community. Research results are communicated to the public via articles in the popular media, interviews on television, talks at conferences such as TED (www.ted.com), and more recently through blogs and other Internet media. Here the risk of miscommunication is much greater. For example, the “consumer surplus in eBay auctions” study was covered by the public media, but the main results were not always conveyed properly by journalists. The nytimes.com article (http://bits.blogs.nytimes.com/2008/01/28/tracking‐consumer‐savings‐on‐ebay/) failed to mention that the study results were evaluated under different assumptions, thereby affecting generalizability. As a result, some readers doubted the study results (“Is the Cniper sample skewed?”). In response, one of the study coauthors posted an online clarification.
In industry, communication is typically done via internal presentations and reports. The failure potential of O‐rings at low temperatures that caused the NASA shuttle Challenger disaster was ignored because the engineers failed to communicate the results of their analysis: the 13 charts that were circulated to the teleconferences did not clearly show the relationship between the temperature in 22 previous launches and the 22 recordings of O‐ring conditions (see Tufte, 1992). In terms of our notation, the meaning of f—in this case risk analysis—and its implications were not properly communicated.
In discussing scientific writing, Gopen and Swan (1990) state that if the reader is to grasp what the writer means, the writer must understand what the reader needs. In general, this is an essential element in effective communication. It is important to emphasize that scientific discourse is not the mere presentation of information, but rather its actual communication. It does not matter how pleased an author might be to have converted all the right data into sentences and paragraphs; it matters only whether a large majority of the reading audience accurately perceives what the author had in mind. Communication is the eighth InfoQ dimension.
The eight InfoQ dimensions allow us to evaluate the InfoQ of an empirical study (whether implemented or proposed) by evaluating each of the dimensions. In the following, we describe five assessment approaches. The approaches offer different views of the study, and more than one approach can be implemented to achieve deeper understanding.
Similar to the use of “data quality” dimensions by statistical agencies for evaluating data quality, we evaluate each of the eight InfoQ dimensions to assess InfoQ. This evaluation integrates different aspects of a study and assigns an overall InfoQ score based on experts’ ratings. The broad perspective of InfoQ dimensions is designed to help researchers enhance the added value of their studies.
Assessing InfoQ using quantitative metrics can be done in several ways. We present a rating‐based approach that examines a study report and scores each of the eight InfoQ dimensions. A rough approach is to rate each dimension on a 1–5 scale:
1 = Very low, 2 = Low, 3 = Acceptable, 4 = High, 5 = Very high
The ratings for each of the eight dimensions (Yi, i = 1, …, 8) can then be normalized into desirability scores (see Figini et al., 2010), separately for each dimension (0 ≤ d(Yi) ≤ 1). The desirability scores are then combined into an overall InfoQ score using the geometric mean of the individual desirabilities: InfoQ score = [d(Y1) × d(Y2) × ⋯ × d(Y8)]^(1/8).
The desirability approach produces an overall score of zero when at least one dimension is rated at the lowest value of the scale; in other words, if any dimension receives the lowest rating, InfoQ is considered to be zero. Smoother options average the rating scores themselves with an arithmetic or geometric mean. In the examples in this book, we used the desirability approach.
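The rating-to-score pipeline can be sketched in a few lines of code. This is an illustration of the approach described above, not code from the book; the desirability mapping d(1) = 0, …, d(5) = 1 follows the one used later in this chapter:

```python
# Sketch of the rating-based InfoQ score: map each 1-5 rating to a
# desirability and take the geometric mean of the eight desirabilities.
import math

DESIRABILITY = {1: 0.0, 2: 0.25, 3: 0.5, 4: 0.75, 5: 1.0}

def infoq_score(ratings):
    """Geometric mean of the desirabilities of the eight dimension ratings."""
    d = [DESIRABILITY[r] for r in ratings]
    return math.prod(d) ** (1 / len(d))

# A single lowest-rated dimension (desirability 0) zeroes the overall score:
print(infoq_score([1, 5, 5, 5, 5, 5, 5, 5]))  # 0.0
```

Replacing the geometric mean of desirabilities with an arithmetic mean of the raw ratings gives one of the smoother alternatives mentioned above, since no rating maps to zero in that case.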
We illustrate the use of this rating‐based approach for the Katkar and Reiley (2006) study in Section 3.4. We also use this approach for each of the studies described in Parts II and III of the book.
A different approach to assessing InfoQ, especially at the “proof of concept” stage, is to spell out the types of answers that the analysis is expected to yield and then to examine the data in an exploratory fashion. Alternatively, one can specify the ideal data, as if the data analyst had control over the data collection, and then compare the results from the existing data to the ideal results.
For example, some studies in biosurveillance are aimed at evaluating the usefulness of tracking prediagnostic data for detecting disease outbreaks earlier than traditional diagnostic measures. To evaluate the usefulness of such data (and potential algorithms) in the absence of real outbreak data requires building scenarios of how disease outbreaks manifest themselves in prediagnostic data. Building scenarios can rely on knowledge such as singular historic cases (e.g., Goldenberg et al., 2002) or on integrating epidemiological knowledge into a wide range of data simulations to generate “data with outbreaks” (e.g., Lotze et al., 2010). The wide range of simulations reflects the existing uncertainty in mapping the epidemiological knowledge into data footprints.
In many fields it is common practice to begin the analysis with a pilot study based on a small sample. This approach provides initial insights on the dimensions of InfoQ. Following such a pilot, the dataset can be augmented, a new time window for recording the data can be chosen, and a more in‐depth elicitation of the problem at hand and of the key stakeholders can be initiated. This strategy is also common practice in survey design, where a pilot with representative respondents is conducted to determine the validity and usability of a questionnaire (Kenett and Salini, 2012).
Modern statistical and visualization software provides a range of visualization techniques such as matrix plots, parallel coordinate plots, and dynamic bubble plots and capabilities such as interactive visualization. These techniques support the analyst in exploring and determining, with “freehand format,” the level of InfoQ in the data. Exploratory data analysis (EDA) is often conducted iteratively by zooming in on salient features and outliers and triggering further investigations and additional data collection. Other exploratory tools that are useful for assessing InfoQ, termed “exploratory models” by De Veaux (2009), include classification and regression trees, cluster analysis, and data reduction techniques. EDA is therefore another strategy for evaluating and increasing InfoQ.
Sensitivity analysis is an important type of quantitative assessment applied in a wide range of domains that involve policy making, including economic development, transportation systems, urban planning, and environmental trends. InfoQ provides an efficient approach to sensitivity analysis by changing one of the InfoQ components while holding the other three constant. For example, one might evaluate InfoQ for three different goals, g1, g2, g3, given the same dataset X, a specific analysis method f, and specific utility U. Differences between the InfoQ derived for the different goals can then indicate the boundaries of usefulness of X, f, and U.
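As a sketch of this one-component-at-a-time analysis, suppose the eight dimensions are rated against three different goals while X, f, and U are held fixed. The ratings below are hypothetical, invented purely for illustration:

```python
# Hypothetical sensitivity analysis: score the same dataset X, method f,
# and utility U against three different goals g1, g2, g3. The dimension
# ratings are invented; the desirability mapping d(1)=0, ..., d(5)=1
# follows the one used later in this chapter.
import math

DESIRABILITY = {1: 0.0, 2: 0.25, 3: 0.5, 4: 0.75, 5: 1.0}

def infoq_score(ratings):
    d = [DESIRABILITY[r] for r in ratings]
    return math.prod(d) ** (1 / len(d))

ratings_by_goal = {
    "g1": [5, 4, 4, 5, 5, 4, 4, 4],   # hypothetical ratings per goal
    "g2": [3, 4, 3, 4, 4, 2, 3, 3],
    "g3": [2, 3, 2, 3, 3, 1, 2, 2],   # one dimension at the lowest level
}
scores = {g: infoq_score(r) for g, r in ratings_by_goal.items()}
# A sharp drop in score (here g3 scores 0.0) marks a goal that lies
# outside the boundaries of usefulness of the given X, f, and U.
```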
For example, consider the use of ensemble models (combining different models from different sources) in predicting climate change. In an incisive review of models used in climate change studies, Saltelli et al. (2015) state that ensembles are not representative of the range of possible (and plausible) models that fit the data generated by the physical model. This implies that the models used represent structural elements with poor generalizability to the physical model. They also claim that the sensitivity analysis performed on these models varies only a subset of the assumptions, and only one at a time. Such one‐at‐a‐time manipulation precludes interactions among the uncertain inputs, which may be highly relevant to climate projections; this, too, indicates poor generalizability. In terms of operationalization, the authors distinguish policy simulation from policy justification: policy simulations represent alternative scenarios, whereas policy justification requires establishing a causal link. Operationalization of the climate models by policy makers requires an ability to justify specific actions, and this is the problematic aspect the authors emphasize. An InfoQ assessment of the various studies cited by the authors can help distinguish between studies providing policy simulations and studies providing policy justifications.
As described in Chapter 1, Katkar and Reiley (2006) investigated the effect of two types of reserve price on the final auction price on eBay. Their data X came from an experiment selling 25 identical pairs of Pokémon cards, where each card was auctioned twice, once with a public reserve price and once with a secret reserve price. The data consists of complete information on all 50 auctions. Katkar and Reiley used linear regression (f) to test for the effect of private or public reserve on the final price and to quantify it. The utility (U) was statistical significance to evaluate the effect of private or public reserve price and the regression coefficient for quantifying the magnitude of the effect. They conclude that
A secret‐reserve auction will generate a price $0.63 lower, on average, than will a public‐reserve auction.
We evaluate the eight InfoQ dimensions on the basis of the paper by Katkar and Reiley (2006). A more thorough evaluation would have required interaction with the authors of the study and access to their data. For demonstration purposes we use a 1–5 scale and generate an InfoQ score based on a desirability function with d(1) = 0, d(2) = 0.25, d(3) = 0.5, d(4) = 0.75, and d(5) = 1.
The experiment was conducted over two weeks in April 2000. We therefore have no data on possible seasonal effects during other periods of the year. Data resolution was in USD cents, but individual bids were dropped and only the final price was considered. Other time series (e.g., the cumulative number of bids) were also aggregated to create end‐of‐auction statistics such as “total number of bids.” Given the general goal of quantifying the effect of using a secret versus public reserve price on the final price of an auction, the data appears somewhat restrictive. The two‐week data window allows for good control of the experiment but limits data resolution for studying a more general effect. Hence we rate the data resolution as Y1 = 4 (high).
The data included only information on the factor levels that were set by the researchers and on the three outcomes: final price, whether the auction transacted, and the number of bids received. The data was either set by the experimenters or collected from the auction website. Although time series data was potentially available for the 50 auctions (e.g., the series of bids and the cumulative number of bidders), the researchers aggregated them into auction totals. Textual data was available but not used; for example, bidder usernames can be used to track individual bidders who place multiple bids. With respect to corrupted data, one auction winner unexpectedly rated the seller, despite the researchers’ request to refrain from doing so (to keep the rating constant across the experiment). Luckily, this corruption did not affect the analysis, owing to the study design. Another unexpected source of data corruption was eBay’s policy of disallowing bids below a public reserve price. Hence, the total number of bids in auctions with a secret reserve price could not be compared with the same measure in public reserve price auctions. The researchers resorted to deriving a new “total serious bids” variable, which counts the number of bids above the secret reserve price.
Given the level of detailed attention to the experimental conditions, but the lack of use of available time series and textual data, we rate this dimension as Y2 = 4 (high).
The researchers analyzed the two‐week data in the context of an experimental design strategy. The integration with the DOE factors was clearly achieved. No textual or other semantic data seems to have been integrated. We rate this dimension as Y3 = 4 (high).
The short duration of the experiment and the experimental design assured that the results would not be confounded with the effect of time. The experimenters tried to avoid confounding the results with a changing seller rating and therefore actively requested winners to avoid rating the seller. Moreover, the choice of Pokémon cards was aligned with timeliness, since at the time such items were in high demand. Finally, because of the retrospective nature of the goal, there is no urgency in conducting the data analysis shortly after data collection. We rate this dimension as Y4 = 5 (very high).
The causal variable (secret or public reserve) and the blocking variable (week) were determined at the auction design stage and manipulated before the auction started. We rate this dimension as Y5 = 5 (very high).
The study is concerned with statistical generalizability: do effects that were found in the sample generalize to the larger context of online auctions? One possible bias, acknowledged by the authors, is their seller’s rating of zero (indicating a new seller), which limits the generalizability of the study to more reputable sellers. In addition, they limited the generality of their results to low‐value items, noting that the findings might not generalize to more expensive items. We rate this dimension as Y6 = 3 (acceptable).
In construct operationalization, the researchers considered two theories that explain the effect of a secret versus public reserve price on the final price. One is a psychological explanation: bidders can become “caught up in the bidding” at low bid amounts and end up bidding more than they would have had the bidding started higher. The second theory is a model of rational bidders: “an auction with a low starting bid and a high secret reserve can provide more information to bidders than an auction with a high starting bid.” Although these two theories rely on operationalizing constructs such as “information” and “caught up in the bidding,” the researchers limited their study to eBay’s measurable reserve price options and final prices.
In terms of action operationalization, the study results can be directly used by buyers and sellers in online auction platforms, as well as auction sites (given the restrictions of generalizing beyond eBay and beyond Pokémon cards). Recall that the study examined the effect of a reserve price not only on the final auction price but also on the probability of the auction resulting in a sale. The authors concluded:
Only 46% of secret‐reserve auctions resulted in a sale, compared with 70% of public‐reserve auctions for the same goods. Secret‐reserve auctions resulted in 0.72 fewer serious bidders per auction, and $0.62 less in final auction price, than did public‐reserve auctions on average. We can therefore recommend that sellers avoid the use of secret reserve prices, particularly for Pokémon cards.
The authors limit their recommendation to low‐cost items by quoting from The Official eBay Guide (Kaiser and Kaiser, 1999): “If your minimum sale price is below $25, think twice before using a reserve auction. Bidders frequently equate reserve with expensive.”
Note that because the study result is applicable to the “average auction,” it is most actionable for either an online auction platform which holds many auctions or for sellers who sell many items. The results do not tell us about the predictive accuracy for a single auction.
We rate this dimension as Y7 = 4 (high).
This research study communicated the analysis via a paper published in a peer‐reviewed journal. Analysis results are presented in the form of a scatter plot, a series of estimated regression models (estimated effects and standard errors), and their interpretation in the text. We assume that the researchers made additional dissemination efforts (e.g., the paper is publicly available online as a working paper). The paper’s abstract is written in clear, nontechnical language and can therefore be easily understood not only by academics and researchers but also by eBay participants. The main communication weakness of the analysis is its limited use of visualization: plots would have conveyed some of the results more clearly. We therefore rate this dimension as Y8 = 4 (high).
The scores that we assigned for each of the dimensions were the following:
1. Data resolution | 4 |
2. Data structure | 4 |
3. Data integration | 4 |
4. Temporal relevance | 5 |
5. Chronology of data and goal | 5 |
6. Generalizability | 3 |
7. Operationalization | 4 |
8. Communication | 4 |
On the basis of these subjective assessments, which represent expert opinions derived from the single publication on the auction experiments, we obtain an InfoQ score, based on the geometric mean of desirabilities, of 77%, that is, relatively high. The relatively weak dimension is generalizability; the strongest dimensions are temporal relevance and chronology of data and goal. Reviewing the scores after some time showed them to be robust, even though expert opinions tend to differ to some extent. To derive consensus‐based scores, one can ask a number of experts (three to five) to review the case and compare their scores. If the scores are consistent, one can derive a consensus InfoQ score. If they show discrepancies, one would hold a consensus meeting of the experts, where the reasoning behind each score is discussed and some reconciliation is attempted. If a range of scores remains, the InfoQ score can be presented as a range of values.
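The 77% figure can be reproduced directly from the eight ratings listed above. The following is an illustrative computation, not the authors' code:

```python
# Reproduce the overall InfoQ score from the eight dimension ratings
# (data resolution ... communication), using the desirability mapping
# d(1)=0, d(2)=0.25, d(3)=0.5, d(4)=0.75, d(5)=1 and the geometric mean.
import math

ratings = [4, 4, 4, 5, 5, 3, 4, 4]
desirability = {1: 0.0, 2: 0.25, 3: 0.5, 4: 0.75, 5: 1.0}
d = [desirability[r] for r in ratings]
score = math.prod(d) ** (1 / 8)
print(round(100 * score))  # 77
```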
In this chapter we break down the InfoQ concept into eight dimensions, each relating to a different aspect of the goal–data–analysis–utility components. Given an empirical study, we can then assess its level of InfoQ by examining each of the eight dimensions. We present five assessment approaches and illustrate the rating‐based approach by applying it to the study by Katkar and Reiley (2006) on the effect of reserve prices in online auctions.
The InfoQ assessment can be done at the planning phase of a study, during a study, or after the study has been reported. In Chapter 13 we discuss the application of InfoQ assessment to research proposals of graduate students. In Chapters 4 and 5, we focus on statistical methods that can be applied, a priori or a posteriori, to enhance InfoQ, and Chapters 6–10 are about InfoQ assessments of completed studies. Such assessments provide opportunities for InfoQ enhancement at the design stage, during a study, or after it has been completed.
Each of the InfoQ dimensions relates to methods for InfoQ improvement that require multidisciplinary skills. For example, data integration is related to IT capabilities such as extract–transform–load (ETL) technologies, and action operationalization can be related to management processes where action items are defined in order to launch focused interventions. For a comprehensive treatment of data analytic techniques, see Shmueli et al. (2016).
In Part II, we examine a variety of studies from different areas using the rating‐based approach for assessing the eight dimensions of InfoQ. The combination of application area and InfoQ assessment provides context‐based examples. We suggest starting with a specific domain of interest, reviewing the examples in the respective chapter and then moving on to other domains and chapters. This combination of domain‐specific examples and cross‐domain case studies was designed to provide in‐depth and general perspectives of the added value of InfoQ assessments.