Official statistics are produced by a variety of organizations including central bureaus of statistics, regulatory healthcare agencies, educational systems, and national banks. Official statistics are designed to be used, and increasing their utility is one of the overarching concepts in official statistics. An issue that can lead to misconception is that many of the terms used in official statistics have specific meanings not identical to their everyday usage. Forbes and Brown (2012) state: “All staff producing statistics must understand that the conceptual frameworks underlying their work translate the real world into models that interpret reality and make it measurable for statistical purposes…. The first step in conceptual framework development is to define the issue or question(s) that statistical information is needed to inform. That is, to define the objectives for the framework, and then work through those to create its structure and definitions. An important element of conceptual thinking is understanding the relationship between the issues and questions to be informed and the definitions themselves.”
In an interview‐based study of 58 educators and policymakers, Hambleton (2002) found that the majority misinterpreted the official statistics reports on reading proficiency that compare results across school grades and across years. This finding was particularly distressing since policymakers rely on such reports for funding appropriations and for making other key decisions. In terms of information quality (InfoQ), the quality of the information provided by the reports was low. The translation from statistics to the domain of education policy was faulty.
The US Environmental Protection Agency, together with the Department of Defense and Department of Energy, launched the Quality Assurance Project Plan (see EPA, 2005), which presents “steps … to ensure that environmental data collected are of the correct type and quality required for a specific decision or use.” They used the term data quality objectives to describe “statements that express the project objectives (or decisions) that the data will be expected to inform or support.” These statements relate to descriptive goals, such as “Determine with greater than 95% confidence that contaminated surface soil will not pose a human exposure hazard.” These statements are used to guide the data collection process. They are also used for assessing the resulting data quality.
Central bureaus of statistics are now combining survey data with administrative data in dynamically updated studies that have replaced the traditional census approach, so that proper integration of data sources is becoming a critical requirement. We suggest in this chapter, as in related papers, that evaluating InfoQ can contribute significantly to studies such as the examples described earlier (Kenett and Shmueli, 2016).
The chapter proceeds as follows: Section 10.2 reviews the InfoQ dimensions in the context of official statistics research studies. Section 10.3 presents quality standards applicable to official statistics and their relationship with InfoQ dimensions, and Section 10.4 describes standards used in customer surveys and their relationship to InfoQ. We conclude with a chapter summary in Section 10.5.
We revisit here the eight InfoQ dimensions with guiding questions that can be used in planning, designing, and evaluating official statistics reports. We accompany these with examples from official statistics studies.
Data resolution refers to the measurement scale and aggregation level of the data. The measurement scale of the data should be carefully evaluated in terms of its suitability to the goal (g), the analysis methods used (f), and the required resolution of the utility (U). Questions one should ask to determine the strength of this dimension include the following:
A low rating on data resolution can be indicative of low trust in the usefulness of the study’s findings. An example of data resolution, in the context of official statistics, is the Google Flu Trends (www.google.org/flutrends) case. The goal of the original application by Google was to forecast the prevalence of influenza on the basis of the type and extent of Internet search queries. These forecasts were shown to strongly correlate with the official figures published by the Centers for Disease Control and Prevention (CDC) (Ginsberg et al., 2009). The advantage of Google’s tracking system over the CDC system is that data is available immediately and forecasts have only a day’s delay, compared to the week or more that it takes for the CDC to assemble a picture based on reports from confirmed laboratory cases. Google is faster because it tracks the outbreak by finding a correlation between what users search for online and whether they have flu symptoms. In other words, it uses immediately available data on searches that correlate with symptoms. Although Google initially claimed very high forecast accuracy, it turned out that there were extreme mispredictions and other challenges (David et al., 2014; Lazer et al., 2014). Following the criticism, Google has reconfigured Flu Trends to include data from the CDC to better forecast the flu season (Marbury, 2014). Integrating the two required special attention to the different resolutions of the two data sources.
Data structure refers to the type(s) of data and data characteristics such as corrupted and missing values due to the study design or data collection mechanism. As discussed in Chapter 3, data types include structured, numerical data in different forms (e.g., cross‐sectional, time series, and network data) as well as unstructured, nonnumerical data (e.g., text, text with hyperlinks, audio, video, and semantic data). Another type of data generated by official statistics surveys, called paradata, is related to the process by which the survey data was collected. Examples of paradata include the times of day at which the survey interviews were conducted, how long the interviews took, how many contacts or contact attempts were made with each interviewee, the interviewee's reluctance, and the mode of communication (phone, Web, email, or in person). These attributes affect the costs and management of a survey, the findings of a survey, evaluations of interviewers, and analysis of nonresponders.
Questions one should ask to determine the data structure include the following:
With the variety of data sources and data types available today, studies sometimes integrate data from multiple sources and/or types to create new knowledge regarding the goal at hand. Such integration can increase InfoQ, but in other cases, it can reduce InfoQ, for example, by creating privacy breaches.
Questions that help assess data integration levels include the following:
One example of data integration and integrated analysis is the calibration of data from a company survey with official statistics data (Dalla Valle and Kenett, 2015), where Bayesian networks (BNs) were used to model, by conditioning, the relationships between variables in the official and the administrative data sets. Another example of data integration between official data and company data is the Google Flu Trends case described in Section 10.2.1, which now integrates data from the CDC. For further examples of data integration in official statistics, see also Penny and Reale (2004), Figini et al. (2010), and Vicard and Scanu (2012).
The process of deriving knowledge from data can be put on a timeline that includes data collection, data analysis, and results’ usage periods as well as the temporal gaps between these three stages. The different durations and gaps can each affect InfoQ. The data collection duration can increase or decrease InfoQ, depending on the study goal, for example, studying longitudinal effects versus a cross‐sectional goal. Similarly, if the collection period includes uncontrollable transitions, this can be useful or disruptive, depending on the study goal.
Questions that help assess temporal relevance include the following:
A low rating on temporal relevance can be indicative of an analysis with low relevance to decision makers due to data collected in a different contextual condition. This can happen in economic studies, with policy implications based on old data. The Google Flu Trends application, which now integrates CDC data, is a case where temporal relevance is key. The original motivation for using Google search of flu‐related keywords in place of the official CDC data of confirmed cases was the delay in obtaining laboratory data. The time gap between data collection and its availability is not an issue with Google search, as opposed to CDC data. Yet, in the new Google Flu Trends application, a way was found to integrate CDC data while avoiding long delays in forecasts. In this case, data collection, data analysis, and deployment (generating forecasts) are extremely time sensitive, and this time sensitivity was the motivation for multiple studies exploring alternative data sources and algorithms for detecting influenza and other disease outbreaks (see Goldenberg et al. (2002) and Shmueli and Burkom (2010)).
The choice of variables to collect, the temporal relationship between them, and their meaning in the context of the goal at hand affect InfoQ. This is especially critical in predictive goals and in inferring causality (e.g., impact studies).
Questions that help assess this dimension include the following:
A low rating on chronology of data and goal can be indicative of low relevance of a specific data analysis due to misaligned timing. For example, consider the reporting of the consumer price index (CPI). This index measures changes in the price level of a market basket of consumer goods and services purchased by households and is used as a measure of inflation, with impact on wages, salaries, and pensions. A delay in reporting the CPI would have a large economic impact, which is a matter of temporal relevance. In terms of chronology of data and goal, we must make sure that the household data on consumer goods and services, from which the CPI is computed, is available early enough to produce the index on schedule.
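The CPI computation itself is, at its core, a fixed-basket (Laspeyres) price index. A minimal sketch with an invented three-item basket (real CPI baskets contain hundreds of items with survey-derived expenditure weights):

```python
# Illustrative basket: base-period quantities with base and current prices.
# Items, prices, and quantities are made up for the example.
basket = {
    # item: (base_qty, base_price, current_price)
    "bread":       (100, 2.00, 2.10),
    "electricity": (300, 0.15, 0.18),
    "transport":   (50,  1.50, 1.55),
}

# Laspeyres price index: cost of the fixed base-period basket at current
# prices, relative to its cost at base-period prices, scaled to 100.
base_cost    = sum(q * p0 for q, p0, p1 in basket.values())
current_cost = sum(q * p1 for q, p0, p1 in basket.values())
cpi = 100 * current_cost / base_cost
inflation = cpi - 100
print(f"CPI = {cpi:.1f}, implied inflation = {inflation:.1f}%")
```

The chronology requirement is visible in the inputs: both the current prices and the base-period quantities (the household expenditure data) must be in hand before the index can be published.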
The utility of f(X|g) is dependent on the ability to generalize to the appropriate population. Official statistics are mostly concerned with statistical generalizability. Statistical generalizability refers to inferring f from a sample to a target population. Scientific generalizability refers to generalizing an estimated population pattern/model f to other populations or applying f estimated from one population to predict individual observations in other populations.
While census data are, by design, general to the entire population of a country, they can be used to estimate a model which can be compared with models in other countries or used to predict outcomes in another country, thereby invoking scientific generalizability. In addition, using census data to forecast future values is also a generalization issue, where forecasts are made to a yet unknown context.
Questions that help assess the generalization type and aspects include the following:
We consider two types of operationalization: construct operationalization and action operationalization. Constructs are abstractions that describe a phenomenon of theoretical interest. Official statistics are often collected by organizations and governments to study constructs such as poverty, well‐being, and unemployment. These constructs are carefully defined and economic or other measures as well as survey questions are crafted to operationalize the constructs for different purposes. Construct operationalization questions include the following:
Action operationalization refers to the following three questions posed by W. Edwards Deming (1982):
A low rating on operationalization indicates that the study might have academic value but no practical impact. Indeed, many statistical agencies see their role only as data providers, leaving the dimension of action operationalization to others. In contrast, Forbes and Brown (2012) state clearly that official statistics “need to be used to be useful” and that utility is one of the overarching concepts in official statistics. With this approach, operationalization is a key dimension in the InfoQ of official statistics.
Effective communication of the analysis and its utility directly impacts InfoQ. Communicating official statistics is especially sensitive, since they are usually intended for a broad nontechnical audience. Moreover, such statistics can have important implications, so communicating the methodology used to reach such results can be critical. An example is reporting of agricultural yield forecasts generated by government agencies such as the National Agricultural Statistics Service (NASS) at the US Department of Agriculture (USDA), where such forecasts can be seen as political, leading to funding and other important policies and decisions:
NASS has provided detailed descriptions of their crop estimating and forecasting procedures. Still, market participants continue to demonstrate a lack of understanding of NASS methodology for making acreage, yield, and production forecasts and/or a lack of trust in the objectives of the forecasts… Beyond misunderstanding, some market participants continue to express the belief that the USDA has a hidden agenda associated with producing the estimates and forecasts. This “agenda” centers on price manipulation for a variety of purposes, including such things as managing farm program costs and influencing food prices
(Good and Irwin, 2011).
In education, a study of how decision makers understand National Assessment of Educational Progress (NAEP) reports was conducted by Hambleton (2002) and Goodman and Hambleton (2004). Among other things, the study shows that a table presenting the level of advanced proficiency of grade 4 students was misunderstood by 53% of the respondents who read the report. These readers assumed that the number represented the percentage of students in that category when, in fact, it represented the percentage in all categories, up to advanced proficiency, that is, basic, proficient, and advanced proficiency. The implication is that the report showed a much gloomier situation than the one understood by more than half of the readers.
Questions that help assess the level of communication include the following:
A concept of Quality of Statistical Data was developed and used in European official statistics and international organizations such as the International Monetary Fund (IMF) and the Organization for Economic Cooperation and Development (OECD). This concept refers to the usefulness of summary statistics produced by national statistics agencies and other producers of official statistics. Quality is evaluated, in this context, in terms of the usefulness of the statistics for a particular goal. The OECD uses seven dimensions for quality assessment: relevance, accuracy, timeliness and punctuality, accessibility, interpretability, coherence, and credibility (see Chapter 5 in Giovannini, 2008). Eurostat’s quality dimensions are relevance of statistical concepts, accuracy of estimates, timeliness and punctuality in disseminating results, accessibility and clarity of the information, comparability, coherence, and completeness. See also Biemer and Lyberg (2003), Biemer et al. (2012), and Eurostat (2003, 2009).
In the United States, the National Center for Science and Engineering Statistics (NCSES), formerly the Division of Science Resources Statistics, was established within the National Science Foundation with a general responsibility for statistical data. Part of its mandate is to provide information that is useful to practitioners, researchers, policymakers, and the public. NCSES prepares about 30 reports per year based on surveys.
The purpose of survey standards is to set a framework for assuring data and reporting quality. Guidance documents are meant to help (i) increase the reliability and validity of data, (ii) promote common understanding of desired methodology and processes, (iii) avoid duplication and promote the efficient transfer of ideas, and (iv) remove ambiguities and inconsistencies. The goal is to provide the clearest possible presentations of data and its analysis. Guidelines typically focus on technical issues involved in the work rather than issues of contract management or publication formats.
Specifically, NCSES aims to adhere to the ideals set forth in “Principles and Practices for a Federal Statistical Agency.” As a US federal statistical agency, NCSES must conduct its surveys according to the guidelines and policies set forth in the Paperwork Reduction Act and other legislation related to surveys. For example, NCSES surveys must follow the implementation guidance, survey clearance policies, response rate requirements, and related orders prepared by the Office of Management and Budget (OMB). The following standards are based on US government standards for statistical surveys (see www.nsf.gov/statistics/). We list them with an annotation mapping to InfoQ dimensions, when relevant (Table 10.1 summarizes these relationships). See also Office for National Statistics (2007).
Standard 1.1: Agencies initiating a new survey or major revision of an existing survey must develop a written plan that sets forth a justification, including goals and objectives, potential users, the decisions the survey is designed to inform, key survey estimates, the precision required of the estimates (e.g., the size of differences that need to be detected), the tabulations and analytic results that will inform decisions and other uses, related and previous surveys, steps taken to prevent unnecessary duplication with other sources of information, when and how frequently users need the data and the level of detail needed in tabulations, confidential microdata, and public‐use data files.
This standard requires explicit declaration of goals and methods for communicating results. It also raises the issue of data resolution in terms of dissemination and generalization (estimate precision).
Standard 1.2: Agencies must develop a survey design, including defining the target population, designing the sampling plan, specifying the data collection instruments and methods, developing a realistic timetable and cost estimate, and selecting samples using generally accepted statistical methods (e.g., probabilistic methods that can provide estimates of sampling error). Any use of nonprobability sampling methods (e.g., cutoff or model‐based samples) must be justified statistically and be able to measure estimation error. The size and design of the sample must reflect the level of detail needed in tabulations and other data products and the precision required of key estimates. Documentation of each of these activities and resulting decisions must be maintained in the project files for use in documentation (see Standards 7.3 and 7.4).
This standard advises on data resolution, data structure, and data integration. The questionnaire design addresses the issue of construct operationalization, and estimation error relates to generalizability.
Standard 1.3: Agencies must design the survey to achieve the highest practical rates of response, commensurate with the importance of survey uses, respondent burden, and data collection costs, to ensure that survey results are representative of the target population so that they can be used with confidence to inform decisions. Nonresponse bias analyses must be conducted when unit or item response rates or other factors suggest the potential for bias to occur.
The main focus here is on statistical generalization, but this standard also deals with action operationalization. The survey must be designed and conducted in a way that encourages respondents to take action and respond.
Standard 1.4: Agencies must ensure that all components of a survey function as intended when implemented in the full‐scale survey and that measurement error is controlled by conducting a pretest of the survey components or by having successfully fielded the survey components on a previous occasion.
Pretesting is related to data resolution and to the question of whether the collection instrument is sufficiently reliable and precise.
Standard 2.1: Agencies must ensure that the frames for the planned sample survey or census are appropriate for the study design and are evaluated against the target population for quality.
Sampling frame development is crucial for statistical generalization. Here we also ensure chronology of data and goal in terms of the survey deployment.
Standard 2.2: Agencies must ensure that each collection of information instrument clearly states the reasons why the information is planned to be collected, the way such information is planned to be used to further the proper performance of the functions of the agency, whether responses to the collection of information are voluntary or mandatory (citing authority), the nature and extent of confidentiality to be provided, if any (citing authority), an estimate of the average respondent burden together with a request that the public direct to the agency any comments concerning the accuracy of this burden estimate and any suggestions for reducing this burden, the OMB control number, and a statement that an agency may not conduct and a person is not required to respond to an information collection request unless it displays a currently valid OMB control number.
This is another aspect of action operationalization.
Standard 2.3: Agencies must design and administer their data collection instruments and methods in a manner that achieves the best balance between maximizing data quality and controlling measurement error while minimizing respondent burden and cost.
The standards in Section 10.3.3 are focused on the data component and, in particular, assuring data quality and confidentiality.
Standard 3.1: Agencies must edit data appropriately, based on available information, to mitigate or correct detectable errors.
Standard 3.2: Agencies must appropriately measure, adjust for, report, and analyze unit and item nonresponse to assess their effects on data quality and to inform users. Response rates must be computed using standard formulas to measure the proportion of the eligible sample that is represented by the responding units in each study, as an indicator of potential nonresponse bias.
This relates to generalizability.
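The “standard formulas” referred to in Standard 3.2 are typically the AAPOR response rate definitions. A minimal sketch in that spirit, computing a unit response rate of the RR1 type (the disposition counts below are invented, and real AAPOR formulas distinguish more disposition categories and weight cases of unknown eligibility):

```python
# Invented final disposition counts for a survey sample.
completes        = 640   # completed interviews
partials         = 45    # partial interviews
refusals         = 180   # eligible units that refused
noncontacts      = 90    # eligible units never reached
unknown_eligible = 45    # unknown eligibility, conservatively counted as eligible

# RR1-style rate: completed interviews over all (potentially) eligible units.
eligible = completes + partials + refusals + noncontacts + unknown_eligible
rr1 = completes / eligible
print(f"unit response rate (RR1-style): {rr1:.1%}")

# A low rate by itself does not prove bias; per the standard, it triggers a
# nonresponse bias analysis comparing respondents and nonrespondents on
# variables known for the whole frame.
```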
Standard 3.3: Agencies must add codes to the collected data to identify aspects of data quality from the collection (e.g., missing data) in order to allow users to appropriately analyze the data. Codes added to convert information collected as text into a form that permits immediate analysis must use standardized codes, when available, to enhance comparability.
Standard 3.4: Agencies must implement safeguards throughout the production process to ensure that survey data are handled confidentially to avoid disclosure.
Standard 3.5: Agencies must evaluate the quality of the data and make the evaluation public (through technical notes and documentation included in reports of results or through a separate report) to allow users to interpret results of analyses and to help designers of recurring surveys focus improvement efforts.
This is related to communication.
Standard 4.1: Agencies must use accepted theory and methods when deriving direct survey‐based estimates, as well as model‐based estimates and projections that use survey data. Error estimates must be calculated and disseminated to support assessment of the appropriateness of the uses of the estimates or projections. Agencies must plan and implement evaluations to assess the quality of the estimates and projections.
This standard is aimed at statistical generalizability and focuses on the quality of the data analysis (deriving estimates can be considered part of the data analysis component).
Standard 5.1: Agencies must develop a plan for the analysis of survey data prior to the start of a specific analysis to ensure that statistical tests are used appropriately and that adequate resources are available to complete the analysis.
This standard is again focused on analysis quality.
Standard 5.2: Agencies must base statements of comparisons and other statistical conclusions derived from survey data on acceptable statistical practice.
Standard 6.1: Agencies are responsible for the quality of information that they disseminate and must institute appropriate content/subject matter, statistical, and methodological review procedures to comply with OMB and agency InfoQ guidelines.
Standard 7.1: Agencies must release information intended for the general public according to a dissemination plan that provides for equivalent, timely access to all users and provides information to the public about the agencies’ dissemination policies and procedures including those related to any planned or unanticipated data revisions.
This standard touches on chronology of data and goal and communication. It can also affect temporal relevance of studies that rely on the dissemination schedule.
Standard 7.2: When releasing information products, agencies must ensure strict compliance with any confidentiality pledge to the respondents and all applicable federal legislation and regulations.
Standard 7.3: Agencies must produce survey documentation that includes those materials necessary to understand how to properly analyze data from each survey, as well as the information necessary to replicate and evaluate each survey’s results (see also Standard 1.2). Survey documentation must be readily accessible to users, unless it is necessary to restrict access to protect confidentiality. Proper documentation is essential for proper communication.
Standard 7.4: Agencies that release microdata to the public must include documentation clearly describing how the information is constructed and provide the metadata necessary for users to access and manipulate the data (see also Standard 1.2). Public‐use microdata documentation and metadata must be readily accessible to users. This standard is aimed at adequate communication of the data (not the results).
These standards provide a comprehensive framework for the various activities involved in planning and implementing official statistics surveys. Section 10.4 is focused on customer satisfaction surveys such as the surveys on service of general interest (SGI) conducted within the European Union (EU).
Customer satisfaction, according to the ISO 10004:2010 standards of the International Organization for Standardization (ISO), is the “customer’s perception of the degree to which the customer’s requirements have been fulfilled.” It is “determined by the gap between the customer’s expectations and the customer’s perception of the product [or service] as delivered by the organization”.
ISO describes the importance of standards on their website: “ISO is a nongovernmental organization that forms a bridge between the public and private sectors. Standards ensure desirable characteristics of products and services such as quality, environmental friendliness, safety, reliability, efficiency, and interchangeability—and at an economical cost.”
ISO’s work program ranges from standards for traditional activities such as agriculture and construction, to mechanical engineering, manufacturing, and distribution, to transport, medical devices, information and communication technologies, and standards for good management practice and for services. Its primary aim is to share concepts, definitions, and tools to guarantee that products and services meet expectations. When standards are absent, products may turn out to be of poor quality, they may be incompatible with available equipment, or they may be unreliable or even dangerous.
The goals and objectives of customer satisfaction surveys are clearly described in ISO 10004. “The information obtained from monitoring and measuring customer satisfaction can help identify opportunities for improvement of the organization’s strategies, products, processes, and characteristics that are valued by customers, and serve the organization’s objectives. Such improvements can strengthen customer confidence and result in commercial and other benefits.”
We now provide a brief description of the ISO 10004 standard which provides guidelines for monitoring and measuring customer satisfaction. The rationale of the ISO 10004 standard—as reported in Clause 1—is to provide “guidance in defining and implementing processes to monitor and measure customer satisfaction.” It is intended for use “by organizations regardless of type, size, or product provided,” but it is related only “to customers external to the organization.”
The ISO approach outlines three phases in the processes of measuring and monitoring customer satisfaction: planning (Clause 6), operation (Clause 7), and maintenance and improvement (Clause 8). We examine each of these three and their relation to InfoQ. Table 10.2 summarizes these relationships.
Table 10.2 Relationship between ISO 10004 guidelines and InfoQ dimensions. Cells marked ✓ indicate an existing relationship.

ISO 10004 phase             | Data resolution | Data structure | Data integration | Temporal relevance | Data–goal chronology | Generalizability | Operationalization | Communication
Planning                    | ✓               | ✓              |                  | ✓                  | ✓                    |                  | ✓                  |
Operation                   | ✓               |                | ✓                |                    |                      | ✓                |                    | ✓
Maintenance and improvement |                 |                | ✓                | ✓                  |                      | ✓                | ✓                  | ✓
The planning phase refers to “the definition of the purposes and objectives of measuring customer satisfaction and the determination of the frequency of data gathering (regularly, on an occasional basis, dictated by business needs or specific events).” For example, an organization might be interested in investigating reasons for customer complaints after the release of a new product or for the loss of market share. Alternatively it might want to regularly compare its position relative to other organizations. Moreover, “Information regarding customer satisfaction might be obtained indirectly from the organization’s internal processes (e.g., customer complaints handling) or from external sources (e.g., reported in the media) or directly from customers.” In determining the frequency of data collection, this clause is related to chronology of data and goal as well as to temporal relevance. The “definition of … customer satisfaction” concerns construct operationalization. The collection of data from different sources indirectly touches on data structure and resolution. Yet, the use of “or” for choice of data source indicates no intention of data integration.
The operation phase represents the core of the standard and introduces operational steps an organization should follow in order to meet the requirements of ISO 10004. These steps are as follows:
Statistical issues mentioned in ISO 10004 relate to the number of customers to be surveyed (sample size), the method of sampling (Clause 7.3.3.3 and Annexes C.3.1 and C.3.2), and the choice of the scale of measurement (Clause 7.3.3.4 and Annex C.4). Identifying the population of interest and the sample design is related to generalization. Communication is central to step (d). Step (e) refers to data integration, and the choice of measurement scale is related to data resolution.
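The sample-size guidance in such clauses typically rests on the classical formula for estimating a proportion. A hedged sketch (the 95% z-value and the conservative p = 0.5 are the usual conventions, not values prescribed by ISO 10004):

```python
from math import ceil

def sample_size(margin: float, z: float = 1.96, p: float = 0.5) -> int:
    """Classical sample size for estimating a proportion of satisfied
    customers within +/- margin at ~95% confidence (z = 1.96).
    p = 0.5 is the conservative worst case for the variance p(1 - p)."""
    return ceil(z**2 * p * (1 - p) / margin**2)

print(sample_size(0.05))  # 385 customers for +/-5 percentage points
print(sample_size(0.03))  # 1068 customers for +/-3 percentage points
```

Note that the formula assumes simple random sampling from a large population; stratified or clustered designs, also allowed by the standard, require design-effect adjustments.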
The maintenance and improvement phase includes periodic review, evaluation, and continual improvement of processes for monitoring and measuring customer satisfaction. This phase is aimed at maintaining generalizability and temporal relevance as well as the appropriateness of construct operationalization (“reviewing the indirect indicators of customer satisfaction”). Data integration is used to validate the information against other sources. Communication and action operationalization are also mentioned (see Table 10.2).
As mentioned in the introduction to this chapter, official statistics are produced by a variety of organizations including central bureaus of statistics, regulatory healthcare agencies, educational systems, and national banks. A common trend is the integration of official statistics and organizational data to derive insights at the local and global levels. An example is provided by the Intesa Sanpaolo Bank in Italy, which maintains an integrated database supporting analytic research requests by management and various decision makers (Forsti et al., 2012). The bank uses regression models applied to internal data integrated with data from a range of official statistics providers such as:
In this example, the competitiveness of an enterprise was assessed using factors such as innovation and R&D; intangibles (e.g., human capital, brands, quality, and environmental awareness); and foreign direct investment. One of the challenges in building a coherent integrated database was incomplete matching when using the tax ID number as the key, since patent, certification, and trademark archives contain only the business name and address of the enterprise. An algorithm was therefore developed to match a business name and address to other databases containing both that information and the tax ID number. With this approach, different business names and addresses may appear for the same enterprise (for instance, abbreviated names, acronyms with or without full stops, presence of an abbreviated legal form, etc.), and the tax ID number of an enterprise may also change over the years. Handling these issues properly is key to the quality of the information generated by the regression analysis. For more details, see Forsti et al. (2012).
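A record-linkage step of this kind can be sketched in a few lines. The normalization rules, the similarity weights, the threshold, and the sample registry below are illustrative assumptions for this sketch; the bank's actual matching algorithm is not published in this level of detail.

```python
from difflib import SequenceMatcher

def normalize(name: str) -> str:
    """Crude normalization: lowercase, drop punctuation, strip common
    Italian legal-form abbreviations (an assumed, minimal list)."""
    legal_forms = {"spa", "srl", "snc", "sas"}
    cleaned = "".join(c if c.isalnum() or c.isspace() else " " for c in name.lower())
    return " ".join(t for t in cleaned.split() if t not in legal_forms)

def match_score(a: str, b: str) -> float:
    """Similarity in [0, 1] between two normalized strings."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()

def link(record: dict, registry: dict, threshold: float = 0.85):
    """Return the tax ID of the best-matching registry entry above the
    threshold, or None. Name similarity is weighted more than address."""
    best_id, best = None, threshold
    for tax_id, (name, address) in registry.items():
        s = 0.7 * match_score(record["name"], name) \
            + 0.3 * match_score(record["address"], address)
        if s > best:
            best_id, best = tax_id, s
    return best_id

# Hypothetical registry keyed by tax ID, and a trademark-archive record
# carrying only a (differently spelled) name and address:
registry = {"IT01234567890": ("Rossi Costruzioni S.p.A.", "Via Roma 1, Milano")}
record = {"name": "ROSSI COSTRUZIONI SPA", "address": "Via Roma 1 Milano"}
```

In practice a production linkage pipeline would add blocking (e.g., by province) to avoid comparing every pair, and manual review of borderline scores.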
This section is about information derived from analyzing official statistics data integrated with administrative datasets, as in the Intesa bank example. We describe the use of graphical models for performing the integration. The objective is to provide decision makers with high-quality information, and we use the InfoQ framework to evaluate the quality of the information such analyses deliver to decision makers and other stakeholders. We refer here to two case studies analyzed in Dalla Valle and Kenett (2015), one from the field of education and the other from accident reporting. The case studies demonstrate a methodology, developed by those authors, for calibrating official statistics with administrative data. For other examples, see Dalla Valle (2014).
Bayesian networks (BNs) are directed acyclic graphs (DAGs) whose nodes represent variables and whose edges represent causal relationships between the variables. The variables are associated with conditional probability functions that, together with the DAG, provide a compact representation of high-dimensional distributions. For an introduction to BNs and more details about their definitions and main properties, see Chapters 7 and 8. These models were applied in official statistics data analysis by Penny and Reale (2004), who used graphical models to identify relevant components in a saturated structural VAR model of the quarterly gross domestic product, which aggregates a large number of economic time series. More recently, Vicard and Scanu (2012) also applied BNs to official statistics, showing that the use of poststratification allows integration and missing data imputation. For general applications of BNs, see Kenett and Salini (2009), Kenett and Salini (2012), and Kenett (2016). For an application of BNs in healthcare, see Section 8.4.
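To make the "compact representation" point concrete, here is a toy sketch of a three-node BN evaluated by exhaustive enumeration. The structure (Rain → WetRoad → Accident) and all probabilities are invented for illustration and are unrelated to the case studies below.

```python
from itertools import product

# Conditional probability tables for the toy DAG Rain -> WetRoad -> Accident.
P_rain = {True: 0.2, False: 0.8}             # P(Rain)
P_wet_given_rain = {True: 0.9, False: 0.05}  # P(WetRoad=True | Rain)
P_acc_given_wet = {True: 0.3, False: 0.02}   # P(Accident=True | WetRoad)

def joint(rain: bool, wet: bool, acc: bool) -> float:
    """Chain-rule factorization implied by the DAG:
    P(r, w, a) = P(r) * P(w | r) * P(a | w)."""
    p = P_rain[rain]
    p *= P_wet_given_rain[rain] if wet else 1 - P_wet_given_rain[rain]
    p *= P_acc_given_wet[wet] if acc else 1 - P_acc_given_wet[wet]
    return p

def posterior_rain_given_accident() -> float:
    """P(Rain=True | Accident=True), conditioning by enumeration."""
    num = sum(joint(True, wet, True) for wet in (True, False))
    den = sum(joint(rain, wet, True)
              for rain, wet in product((True, False), repeat=2))
    return num / den
```

The three small tables (2 + 2 + 2 entries) replace the 2³-entry joint table; in larger networks this gap is what makes the representation compact, and libraries such as pgmpy automate the conditioning step sketched here.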
The methodology proposed by Dalla Valle and Kenett (2015) increases InfoQ via the integration of official and administrative data, thus enhancing temporal relevance and chronology of data and goal. The idea is in the same spirit as external benchmarking in small area estimation (Pfeffermann, 2013), where benchmarking robustifies the inference by forcing the model-based predictors to agree with a design-based estimator. Similarly, the calibration methodology of Dalla Valle and Kenett (2015) performs qualitative data calibration by conditioning graphical models, so that official statistics estimates are updated to agree with more timely estimates from administrative data. The calibration methodology is structured in three phases:
These phases are applied to each of the datasets to be integrated. In Sections 10.5.3 and 10.5.4, we demonstrate the application of these three phases using case studies from education and transportation.
The dataset used here was collected by the Italian Stella association in 2009. Stella is an interuniversity initiative aiming at cooperation and coordination of the activities of supervision, statistical analysis, and evaluation of the graduate and postgraduate paths. The initiative includes universities from the north and the center of Italy. The Stella dataset contains information about postdoctoral placements after 12 months for graduates who obtained a PhD in 2005, 2006, and 2007. The dataset includes 665 observations and eight variables.
The variables are as follows:
The second dataset contains information collected through a small internal survey conducted locally in a few universities of Lombardy, in Northern Italy, and in Rome. The sample survey on university graduates’ vocational integration is based on interviews with graduates who attained their university degree in 2004. The survey aims at detecting graduates’ employment conditions about four years after graduation. From the initial sample, the researchers considered only those individuals who had completed their PhD and were currently employed. After removing missing values, they obtained a total of 52 observations. The variables in this dataset are as follows:
We now describe the approach taken by Dalla Valle and Kenett (2015) for integrating the Stella data with the Graduates data for the purpose of updating the 2004 survey data with 2009 data from the Stella data.
Furthermore, the Stella dataset is conditioned on low values of begsal and emp and on a high value of yPhD. The variable yPhD is considered a proxy of the starting year of employment used in the Graduates dataset. Under this conditioning, the average salary decreases to a figure similar to the average value in the Graduates dataset, as shown in Figure 10.3.
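The effect of this kind of conditioning can be illustrated with a minimal sketch. The records, codings (1 = low), and salary figures below are invented and are not the Stella data; filtering raw records is only a stand-in for conditioning the fitted BN, but it shows why the conditional average can drop below the unconditional one.

```python
# Hypothetical records mimicking the Stella variables discussed above:
# begsal (beginning salary band, 1 = low), emp (employment category,
# 1 = low code), yPhD (year of PhD), salary (monthly salary, EUR).
records = [
    {"begsal": 1, "emp": 1, "yPhD": 2007, "salary": 1400},
    {"begsal": 1, "emp": 1, "yPhD": 2005, "salary": 1600},
    {"begsal": 2, "emp": 1, "yPhD": 2007, "salary": 2600},
    {"begsal": 2, "emp": 2, "yPhD": 2006, "salary": 2400},
    {"begsal": 1, "emp": 1, "yPhD": 2007, "salary": 1500},
]

def conditional_mean(records, target, **evidence):
    """Mean of `target` over the records consistent with the evidence."""
    matched = [r[target] for r in records
               if all(r[k] == v for k, v in evidence.items())]
    return sum(matched) / len(matched)

overall = conditional_mean(records, "salary")  # unconditioned estimate
calibrated = conditional_mean(records, "salary",
                              begsal=1, emp=1, yPhD=2007)
```

Conditioning on low begsal and emp and a recent yPhD removes the high-salary records, so the calibrated average falls below the overall average, mirroring the direction of the change reported for Figure 10.3.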
Based on this analysis, we summarize the InfoQ scores for each dimension as shown in Table 10.3. The overall InfoQ score for this study is 74%.
Table 10.3 Scores for InfoQ dimensions for Stella education case study.
| InfoQ dimension | Score |
| --- | --- |
| Data resolution | 3 |
| Data structure | 3 |
| Data integration | 5 |
| Temporal relevance | 3 |
| Chronology of data and goal | 5 |
| Generalizability | 4 |
| Operationalization | 5 |
| Communication | 5 |
Scores are on a 5‐point scale.
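The overall percentage is consistent with a scoring rule in which each 1–5 rating s is rescaled to a desirability (s − 1)/4 and the overall InfoQ score is the geometric mean of the eight desirabilities. The sketch below assumes that rule; it reproduces both the 74% above and the 81% reported for the second case study in Table 10.4.

```python
from math import prod

def infoq_score(scores):
    """Overall InfoQ score: geometric mean of the desirabilities
    (s - 1) / 4, where each s is a 1-5 rating of one InfoQ dimension."""
    desirabilities = [(s - 1) / 4 for s in scores]
    return prod(desirabilities) ** (1 / len(desirabilities))

stella = [3, 3, 5, 3, 5, 4, 5, 5]  # Table 10.3 (education case study)
nhtsa = [3, 4, 5, 3, 5, 5, 5, 5]   # Table 10.4 (vehicle safety case study)
```

A consequence of the geometric mean is that a single score of 1 (desirability 0) drives the overall score to zero, so weak dimensions penalize the total more than an arithmetic mean would.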
The second case study is based on the Vehicle Safety dataset. The National Highway Traffic Safety Administration (NHTSA), under the US Department of Transportation, was established by the Highway Safety Act of 1970, as the successor to the National Highway Safety Bureau, to carry out safety programs under the National Traffic and Motor Vehicle Safety Act of 1966 and the Highway Safety Act of 1966. NHTSA also carries out consumer programs established by the Motor Vehicle Information and Cost Savings Act of 1972 (www.nhtsa.gov).

NHTSA is responsible for reducing deaths, injuries, and economic losses resulting from motor vehicle crashes. This is accomplished by setting and enforcing safety performance standards for motor vehicles and motor vehicle equipment and through grants to state and local governments to enable them to conduct effective local highway safety programs. NHTSA investigates safety defects in motor vehicles; sets and enforces fuel economy standards; helps states and local communities to reduce the threat of drunk drivers; promotes the use of safety belts, child safety seats, and air bags; investigates odometer fraud; establishes and enforces vehicle anti-theft regulations; and provides consumer information on motor vehicle safety topics. NHTSA also conducts research on driver behavior and traffic safety to develop the most efficient and effective means of bringing about safety improvements.
The Vehicle Safety data represents official statistics. After removing the missing data, we obtain a final dataset of 1241 observations, where each observation includes 14 variables on a car manufacturer, covering the period from the late 1980s to the early 1990s. The variables are as follows:
The Crash Test dataset contains information about vehicle crash tests collected by a car manufacturer for marketing purposes. The data contains variables measuring injuries in actual crash tests and is collected following good accuracy standards. We consider this administrative or organizational data.
The dataset is a small sample of 176 observations on vehicle crash tests. A range of US-made vehicles, with dummies in the driver and front passenger seats, were crashed into a test wall at 35 miles per hour, and information was collected on how each crash affected the dummies. The injury variables describe the extent of head injuries, chest deceleration, and left and right femur load. The data file also contains information on the type and safety features of each vehicle. A brief description of the variables is as follows:
We now describe the graphical approach for combining the official and administrative datasets.
Based on this analysis, we summarize the InfoQ scores for each dimension as shown in Table 10.4. The overall InfoQ score for this study is 81%.
Table 10.4 Scores for InfoQ dimensions for the NHTSA safety case study.
| InfoQ dimension | Score |
| --- | --- |
| Data resolution | 3 |
| Data structure | 4 |
| Data integration | 5 |
| Temporal relevance | 3 |
| Chronology of data and goal | 5 |
| Generalizability | 5 |
| Operationalization | 5 |
| Communication | 5 |
Scores are on a 5‐point scale.
With the increased availability of data sources and the ubiquity of analytic technologies, the challenge of transforming data into information and knowledge is growing in importance (Kenett, 2008). Official statistics play a critical role in this context, and applied research using official statistics needs to ensure the generation of high InfoQ (Kenett and Shmueli, 2014, 2016). In this chapter, we discussed the various elements that determine the quality of such information and described several proposed approaches for achieving it. We compared the InfoQ concept with NCSES and ISO standards and also discussed examples of how official statistics and data from internal sources are integrated to generate higher InfoQ.
The sections on quality standards in official statistics and ISO standards related to customer surveys discuss aspects related to five of the InfoQ dimensions: data resolution, data structure, data integration, temporal relevance, and chronology of data and goal. Considering each of these InfoQ dimensions, with their associated questions, can help in increasing InfoQ. The chapter ends with two examples in which official statistics datasets are combined with organizational data in order to derive, through analysis, information of higher quality. An analysis using BNs permits calibration of the data, thus strengthening the quality of the information derived from the official data. As before, the InfoQ dimensions involved in such calibration include data resolution, data structure, data integration, temporal relevance, and chronology of data and goal. These two case studies demonstrate how concern for the quality of the information derived from analyzing a given dataset requires attention to several dimensions beyond the quality of the analysis method used. The eight InfoQ dimensions provide a general template for identifying and evaluating such challenges.