CHAPTER 3

Total Survey Error

Error is inevitable in any survey. If you need perfection, don’t bother doing research. What is important is to identify the possible sources of error and then try to minimize them.

Herbert Weisberg defines error as the “difference between an obtained value and the true value.”1 Typically we don’t know what the true value is, but that doesn’t change our definition of error. When we discussed sampling in the previous chapter, the true value was the population value. When we focus on measurement, it is the true or actual value of whatever is being measured. Error is the difference between that true value and whatever the obtained or observed value turns out to be. Keep in mind that error can occur at any point in the survey process, from the initial design of the survey through the writing of the report.

Weisberg points out that error can be random or systematic.2 For example, when we select a sample from a population, there will be sampling error. No sample is a perfect representation of the population. Assuming we are using probability sampling, this error will be random. However, sometimes some elements in the population are systematically left out of the sample. For example, if we are doing a phone survey and rely exclusively on landlines, that could produce a systematic error because we have left out cell-phone-only households. Systematic error is often referred to as bias. We need to be aware of both random and systematic error.

There are many types of error that can occur. In the past, the focus was on sampling error and nonresponse error, which occurs as a result of refusals or the inability to contact respondents. Instead of focusing on just a couple of types of error, we should focus on all possible types of survey error. This is often referred to as total survey error.3 Paul Biemer defines total survey error as the “accumulation of all errors that may arise in the design, collection, processing and analysis of survey data.”4

There are various ways of categorizing the different types of survey error. Typically we consider the following types of error5:

  • Sampling error
  • Coverage error
  • Nonresponse error
  • Measurement error

Weisberg also discusses survey administration issues such as the following:

  • Mode effects, which refer to the fact that different modes of survey delivery, such as telephone, face-to-face, mailed, and web surveys, sometimes produce different results; and
  • Postsurvey error, which occurs during the processing and analysis of data.6

To this we would add error that occurs in the reporting of survey data.

We’re going to look at each of these types of error, discuss some of the research findings about each type, and talk about how you can try to minimize error.

Sampling Error

Sampling error is one of the issues in sample design and occurs whenever you select a sample from a population. No sample is ever a perfect picture of the population. Let’s say that your population is all households in the city in which you live. You select a sample of 500 households from this population.i You’re interested in the proportion of households that recycle such things as cans, bottles, and other recyclable materials. It turns out that 45 percent of the sample recycles. That doesn’t mean that 45 percent of the population recycles. Why? Sampling always carries with it some amount of sampling error. It’s inevitable.

Here’s another way to understand sampling error. We can use sample data to estimate population values. If you were to select repeated random samples of the same size from the same population, your sample estimates would vary from sample to sample. If you think about it, this makes sense. Each sample will contain a different set of households. So why would you expect all the samples to give you the same estimate of the households that recycle?

One of the advantages of probability sampling is that you can estimate the amount of sampling error there will be from sample to sample. Assuming that you used probability sampling to get your sample and that you properly selected your sample, the resulting sampling error will be random. And to make things even better, there are things you can do to reduce sampling error.
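
To make this concrete, here is a minimal sketch, in Python, of how the sampling error for the recycling example might be estimated. It assumes a simple random sample and uses the usual normal approximation; the numbers (45 percent of a sample of 500 households) come from the example above.

```python
import math

n = 500          # sample size from the recycling example
p_hat = 0.45     # proportion of sampled households that recycle

# Standard error of a proportion under simple random sampling
se = math.sqrt(p_hat * (1 - p_hat) / n)

# Approximate 95% margin of error (normal approximation)
moe = 1.96 * se

print(f"standard error = {se:.3f}")            # about 0.022
print(f"95% margin of error = +/- {moe:.1%}")  # about +/- 4.4 points
```

So the 45 percent estimate would typically be reported with a margin of error of roughly plus or minus 4 percentage points.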

Minimizing Sampling Error

Here are two ways you can reduce sampling error.

  • Increase the size of your sample. Of course, there are practical limits to how big a sample you can choose. You’re limited by the cost and time it will take to collect the data. If you can decide how much sampling error you’re willing to tolerate, you can determine the size of the sample you will need (a quick version of this calculation is sketched just after this list).
  • You can also stratify your sample to reduce sampling error. With stratified sampling, you start by dividing your population into homogeneous groups, such as males and females. Then you sample from each of these groups. Often you choose your sample so that it has the same proportion of males and females as your population. If you stratify your sample by some variable that is related to what you want to estimate, this will reduce sampling error.7
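
As promised in the first bullet, here is a minimal sketch of the sample-size calculation. It again assumes simple random sampling and the normal approximation; the 3 percent and 5 percent tolerances are just illustrative choices.

```python
import math

def sample_size(margin_of_error, p=0.5, z=1.96):
    """Sample size needed to estimate a proportion under simple random
    sampling; p = 0.5 is the most conservative (largest) assumption."""
    return math.ceil((z ** 2) * p * (1 - p) / margin_of_error ** 2)

print(sample_size(0.03))  # about 1,068 respondents for +/- 3 points
print(sample_size(0.05))  # about 385 respondents for +/- 5 points
```

Notice that tightening the tolerable error from 5 points to 3 points nearly triples the required sample, which is why cost and time quickly become the limiting factors.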

Coverage Error

Earl Babbie distinguishes between the population and the study population. The population is the “theoretically specified aggregation of study elements,” while the study population is the “aggregation of elements from which a sample is actually selected.”8 In other words, the population you want to make statements about can be different from the study population from which you draw your sample. The sampling frame is the actual list from which the sample is selected.9

Coverage error occurs when the sampling frame does not match the population. In other words, sometimes the list from which the sample is selected does not match the population, and this produces coverage error. For example, some elements in the population may have been left out of the list from which the sample is selected.ii Let’s look at some examples.

  • The university wants to know how students feel about raising student fees to partially fund a new student center. The population is all students registered at the university during the current semester (or quarter). The list from which the sample is drawn is the registrar’s official student roster. In this case, the list from which the sample is drawn almost perfectly matches the population. The only coverage error would be a result of errors in the registrar’s list.
  • Our research group has a contract to do a consumer attitudes survey in your county. We want to know how people feel about consumer spending, investments, borrowing, and savings. The population is all adults (18 years and above) living in your county at the time of the survey. We decide to do a telephone survey but we’re not sure how to select our sample. Here are some possibilities that have been suggested.
    • One member of the team suggests that we draw our sample from all individuals listed in the phone directory published by the telephone company. However, this is quickly rejected when another member points out that this would systematically omit all people with unlisted numbers and those with only cell phones. That would create coverage error, since we would be systematically omitting a large proportion of our population, and people with listed numbers are systematically different from those with unlisted numbers or with only cell phones.
    • Another team member suggests using a random-digit dialing approach, in which residential prefixes of landlines in your county are sampled and then random digits are added to these prefixes to produce the list of phone numbers that we would call.10 This is also rejected when someone points out that while we would be including those with unlisted landlines, we would still be omitting households that have only cell phones or no landline.
    • Then someone tells us that the U.S. Postal Service has a list of residential addresses that is available through commercial providers and might work for us.
      This is referred to as address-based sampling. Shelley Roth et al. suggest that this provides “nearly complete coverage of residential addresses in the United States.”11 David McNabb notes that there are some coverage issues you might encounter when using this approach: the list might undercover rural areas and groups such as college students living in dorms, and it cannot keep up perfectly with new homes that are constantly being built or with homes destroyed by fire and natural disasters.12 So there might still be coverage error, but it would be considerably less than in the first two options.13
  • The General Social Survey (GSS) is a large national probability survey that began in 1972 and is now conducted every other year by the National Opinion Research Center at the University of Chicago.14 The population is all adults (18 years and above) residing in households in the United States as of a particular date. The sample was originally drawn from all adults who speak English and live in noninstitutionalized settings. In 2006, Spanish-speaking individuals were added to the sample. That means that individuals living in institutionalized settings are systematically excluded; prior to 2006, non-English-speaking individuals were excluded, and from 2006 onward, those who speak neither English nor Spanish have been excluded. If those who are excluded are a small part of the population, this will probably introduce only a small amount of coverage error into the design, and cost considerations may compel the researcher to live with this small amount of bias.
  • Let’s say you want to do a survey of church members in your county. You want to find out why some church members are more active in their church than other members. First, you have to compile a list of all churches in your county. You’re surprised to find out that such a list is not immediately available, but with some work, you assemble the list. Now you select a sample of churches to which you plan to send your survey. You contact the churches in your sample and ask them to send you their membership lists so you can select a sample of members from each church in the sample.iii Some churches are not willing to send you their membership list, but most offer to send the list on the condition that you do not share it with anyone other than the project staff. However, many of the churches tell you that their membership list is out of date. After more discussion, you find out that there are several problems.
    • Some members have moved away but are still on the membership list.
    • Not all new members have been added to the list.
    • It’s possible that some members appear twice on the list.

    You realize this is going to produce coverage error. The best solution is to work with each church in your sample to delete members who are no longer there, add in the new members, and take out the duplicates. It takes some work, but it’s worth it because it reduces coverage error.
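
As a rough illustration of that cleanup step, the sketch below merges one church's outdated roster with its list of new members, drops members who have moved away, and removes duplicate entries. The names and list structure are hypothetical; a real frame would carry more identifying information so that near-duplicates could be caught.

```python
# Hypothetical membership lists for one church in the sample
roster = ["Ann Lee", "Bob Cruz", "Bob Cruz", "Carla Diaz", "Dan Okafor"]
new_members = ["Eve Park"]
moved_away = ["Dan Okafor"]

# Add new members, drop members who have left, and remove duplicates,
# preserving the original order of the roster
frame = []
for name in roster + new_members:
    if name not in moved_away and name not in frame:
        frame.append(name)

print(frame)  # ['Ann Lee', 'Bob Cruz', 'Carla Diaz', 'Eve Park']
```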

Minimizing Coverage Error

So how do we try to reduce coverage error? First, we have to ask the right questions. Don Dillman et al. suggest that there are certain questions that we ought to always consider.15

  • “Does the sample frame contain everyone in the sample population?”
  • “Does the sample frame include names of people who are not in the study population?”
  • “Are the same sample units included on the frame more than once?”
  • “How is the frame maintained and updated?”
  • “Does the frame contain other information that can be used to improve the survey?” This could include information such as phone numbers and e-mail addresses, which could be used to follow up those who don’t respond to our survey.
  • “What to do when no list is available?” Dillman et al. use the example of visitors to a national park. In cases like this, we might sample people as they enter or leave the park.16

So the general strategy for dealing with coverage error is to first identify the sources of error. Once we know what the problems are, we can try to reduce them, keeping in mind that eliminating all coverage error is probably not possible. This can be done in several ways.

  • We can try to make the list from which we draw our sample more complete by taking out the elements that shouldn’t be in the list, adding in the missing elements, and deleting duplicates. Using the example of sampling church members discussed earlier in this chapter, we could work with the staff of the churches to bring their membership lists up to date.
  • We can look to see if there are other lists that we can use to improve coverage. For example, churches might have lists of new members or members who have transferred out of the church even though they haven’t updated their membership lists.
  • Even if we can’t completely eliminate all coverage error, we can at least be aware of the error that exists and take this into account when we write our report. We need to be careful to limit the degree to which we generalize our results. For example, with the GSS discussed previously, we should be careful to only generalize to adults living in noninstitutionalized settings who speak English or Spanish.

Nonresponse Error

Ideally, we want everyone in our sample to complete the survey, but we know that this probably isn’t possible, for two reasons.

  • We probably won’t be able to contact every person in our sample. For example, in a phone survey, some people are seldom at home or use caller ID to screen their calls and only answer when they know who is calling.
  • Some of the people who we do contact will refuse to do the survey. Refusals can occur in two ways.
    • People might completely refuse our request to do the survey. In other words, they don’t answer any of our questions. This is sometimes referred to as unit nonresponse. We’ll discuss this next.
    • Other people consent to being interviewed but refuse to answer certain questions, such as family income or race. This is often referred to as item nonresponse since they are refusing to answer particular questions or items in our survey.

Theories of Survey Participation

It helps to think about the different reasons that people might agree or refuse to be interviewed. These are often referred to as theories of participation.

  • Social exchange theory. This approach looks at interviewing as a type of social exchange. Dillman suggests that “people are more likely to comply . . . if they believe and trust that the rewards for complying with that request will eventually exceed the costs of complying.”17 Costs include such things as the time it takes to complete the survey or the amount of energy required to do the survey. Individuals also receive rewards from doing the survey, such as monetary incentives they might be given or the satisfaction of helping. Another important factor affecting participation is the trust respondents have that completing the survey will “provide a valued reward in the future.”18 Eleanor Singer suggests that the sharp decrease in “willingness to participate in surveys” is partially due to the decline in “trust in all social institutions.”19 From the perspective of social exchange theory, participation can be encouraged by reducing the costs associated with the survey, increasing the rewards from survey participation, and ensuring the trust of the respondent that rewards will be forthcoming.
  • Leverage-salience theory. Robert Groves et al. developed the leverage-salience theory of survey participation. Groves outlines this theory.
    Under the theory, different persons place different importance on features of the survey request (e.g., the topic of the survey, how long the interview might take, the sponsor of the survey, what the data will be used for). Some persons may positively value some attributes, others negatively. Of course, these differences are generally unknown to the survey researcher. When the survey approach is made to the sample person, one or more of these attributes are made salient in the interaction with the interviewer or the survey materials provided to the sample person. Depending on what is made salient and how much the person negatively or positively values the attribute, the result could be a refusal or an acceptance.20

In other words, different things are important to different people. Some place importance on the length of the survey, while others care more about the topic or the incentives. Groves refers to this as leverage. Researchers, in turn, emphasize different aspects of the survey when making the request. Some emphasize the topic, while others stress the length, particularly if the survey is short. Groves refers to this as salience. This approach suggests that we should try to understand what is important to our respondents and emphasize those aspects of the survey. It also suggests that we ought not to focus on only one aspect of the survey when contacting respondents but should instead highlight the different aspects that might matter to different respondents.

Nonresponse and Nonresponse Bias

It’s clear that nonresponse has been increasing and that this is a critical problem for surveys. Roger Tourangeau and Thomas Plewes looked at a number of large national surveys conducted in the United States and concluded that “nonresponse rates continue to increase in all types of cross-sectional surveys, with little to suggest that the trend has plateaued.”21 They also examined the different ways of calculating response rates suggested by the American Association for Public Opinion Research (AAPOR).22
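
As a rough sketch of what a response rate calculation involves, the function below computes the simplest of the AAPOR rates, RR1: completed interviews divided by all eligible and possibly eligible cases. The case counts are hypothetical, and AAPOR defines several other rates that treat partial interviews and cases of unknown eligibility differently.

```python
def aapor_rr1(complete, partial, refusal, noncontact, other, unknown):
    """AAPOR Response Rate 1: completed interviews divided by all
    eligible plus possibly eligible cases."""
    return complete / (complete + partial + refusal +
                       noncontact + other + unknown)

# Hypothetical case counts from a telephone survey
rate = aapor_rr1(complete=600, partial=50, refusal=250,
                 noncontact=300, other=20, unknown=280)
print(f"RR1 = {rate:.1%}")  # 40.0%
```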

Edith de Leeuw et al. focus on the difference in response rates for different modes of survey delivery. They concluded that

in general, face-to-face surveys tend to obtain higher response rates than comparable telephone surveys, and mail surveys tend to have a lower response rate than comparable face-to-face and in lesser degree to telephone surveys. In addition, the response rates for both telephone and face-to-face surveys are declining, although such a trend is not as evident for mail surveys.23

But why is nonresponse a critical problem for surveys? One reason is that nonresponse has become sizable, and this can increase the risk of nonresponse bias. The other reason is that people who don’t respond to surveys are often systematically different from those who do respond, and this has the potential for creating bias in our survey data. If the difference between those who respond and those who don’t respond is related to what the survey is about, then bias will occur. However, nonresponse does not necessarily lead to bias when “nonrespondents are missing at random.”24
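
A simple way to see why this matters is the standard deterministic approximation of nonresponse bias: the bias in an estimate based only on respondents is roughly the nonresponse rate times the difference between respondents and nonrespondents. The figures below are hypothetical.

```python
# Hypothetical figures for a survey estimating voter turnout
nonresponse_rate = 0.40       # 40% of the sample did not respond
mean_respondents = 0.75       # 75% of respondents actually voted
mean_nonrespondents = 0.55    # 55% of nonrespondents actually voted

# Approximate bias in the respondent-only estimate of turnout
bias = nonresponse_rate * (mean_respondents - mean_nonrespondents)
print(f"bias = {bias:.2f}")   # 0.08, i.e., turnout overstated by 8 points
```

If nonrespondents did not differ from respondents, the difference term would be zero and there would be no bias no matter how high the nonresponse rate, which is the “missing at random” point made above.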

Let’s consider some examples of nonresponse bias. Andy Peytchev et al. looked at self-reports of abortion and concluded that “those with a lower likelihood to participate in the survey were also more likely to underreport such experiences.”25 Many researchers have observed that voting in elections tends to be overreported. Roger Tourangeau et al. note that “nonvoters were both less likely to take part in the survey and more likely to misreport if they did take part.”26 In both these examples, those who took part in the survey were different from those who did not take part, and this difference was related to the focus of the survey.

Another example of nonresponse bias is described in Thomas Holmes and James Schmitz’s analysis of the “Characteristics of Business Owners Survey.” Holmes and Schmitz focus on estimating “the probability that an individual discontinues ownership of his or her business.”27 Data are based on a sample of tax returns from 1982. The survey was mailed to respondents in 1986. We would expect that those who still owned their business in 1986 would be more likely to return the survey than those who did not currently own their business. Since we want to estimate the probability that a person has “terminated his or her ownership share over the 1982–1986 period,”28 we would expect that the data would underestimate the probability of termination, and, in fact, Holmes and Schmitz’s analysis shows this to be the case.

Increasing Response

If nonresponse bias is a problem, then what can we do about it? Increasing response is not a guarantee of low bias, but a high nonresponse rate raises the possibility of nonresponse bias. Let’s look at some ways in which we can increase response.

Groves et al. suggest that there are five factors that affect survey participation.29

  • Societal factors such as the frequency of surveys in the society and public opinion regarding the legitimacy and worth of surveys.
  • The survey design itself, including such factors as the length of the survey and how respondents are chosen.
  • The respondents, including such things as gender, income, and age.
  • The interviewers, including their experience and expectations regarding the interview.
  • The interaction between the respondent and the interviewer.

We can’t do much about some of these factors. For example, we can’t do much about the increase in surveys in our society and the fact that some people may have recently been asked to do a survey. We can’t do much about the growing trend for people to express doubts about the worth of surveys. But we can do something about the survey itself. Dillman has written extensively about reducing the burden on respondents.30 This is a logical consequence of social exchange theory. If we can reduce the costs of doing the survey, then we will increase the likelihood that people will respond. We can make the survey as easy to take as possible. We can create a survey that flows naturally from question to question. We can avoid asking unnecessary questions, which will reduce the length of the survey.

There are psychological principles that we can use to try to increase survey participation. When we ask someone to agree to be interviewed, we hope that the person will comply with our request. Robert Cialdini suggests that there are certain rules of behavior that can be used to increase compliance.31 Here are some of these rules as summarized by Groves et al.

  • The rule of reciprocity suggests that “one should be more willing to comply with a request to the extent that the compliance constitutes the repayment of a perceived gift, favor, or concession.”32 For example, if respondents ask how long the survey will take and we respond 20 minutes, they might refuse, saying that’s too long. If we then ask them to take a shorter version of the survey, perhaps 5 minutes, this might be seen as a concession and increase the likelihood that they will comply with our request.
  • The rule of social validation suggests that “one should be more willing to comply with a request to the degree that one believes that similar others would comply with it.”33 If we tell the respondent that others have been willing to be interviewed and found it interesting, this might increase the likelihood that they will agree to be interviewed.
  • The rule of scarcity suggests that “one should be more willing to comply with requests to secure opportunities that are scarce.”34 If we tell people that we are only contacting a small proportion of the population, this might increase the likelihood of their participation.

There is considerable evidence that offering a prepaid cash incentive increases the likelihood that a person will respond to the survey.35 You have probably received a request for donations from nonprofits or political candidates. Often the request comes with a small gift such as a pencil, a key chain, or some other small item. This is similar to the prepaid cash incentive. Incentives given before the individual responds to the survey have been shown to be more effective than postpaid incentives in increasing survey participation.36 Offering the person the chance to be entered into a drawing for a large prize, such as a computer tablet or cash, does not appear to be as effective.

One of the most effective ways of increasing participation is multiple follow-ups. Dillman, talking about mailed surveys, says that “multiple contacts are essential for maximizing response.”37 The same thing can be said for any type of survey—face-to-face, mailed, phone, and web surveys. In face-to-face and telephone surveys, multiple contacts can add considerably to your cost, but they are essential for increasing response rate.

Measurement Error

Measurement error is the difference between the true value of some variable and the value that the respondent gives you. A simple example is measuring age. Often, we ask respondents how old they were on their last birthday. But if you are young and order an alcoholic drink in a bar, the bartender may ask your age. The age you give the bartender could easily be an overestimate. This would be an example of “measurement error due to the respondent.”38 In other words, respondents might not give you an accurate answer because of their self-interest in appearing older. Weisberg contrasts this with “measurement error due to the interviewer.”39 We know that the interviewer’s gender, race, and age can influence how respondents answer our questions. We’re going to talk about both types of measurement error.

We’ll start by discussing error that occurs as a result of question wording and question order. It’s important to understand that measurement error, like all types of error, cannot be eliminated. But it can be minimized. Minimizing error is only possible if you are, first, aware of the ways in which error can occur and, second, take steps to minimize it.

Measurement Error Associated with Question Wording

Measurement error can occur as a result of question wording. One of the classic examples is found in Howard Schuman and Stanley Presser’s discussion of the difference between “forbidding” and “not allowing” certain types of behavior, such as “making public speeches against democracy.” They conclude that “Americans are much more willing to not allow speeches than they are to forbid them, although the two actions appear to be logically equivalent.”40 Numerous studies have replicated this finding. However, Schuman notes that regardless of which wording is used, there is a clear trend over time toward greater tolerance of such speeches. Thus, even with questions like the forbid and not allow questions, you can still track changes over time.41

Barbara Bickart et al. studied the accuracy of reports of other people’s behavior. They asked couples to “search for information about a vacation they could win.”42 The couples then discussed and actually planned the vacation. Afterward they “were asked to either count or estimate the number of accommodations, restaurants, and activities that they/their partner examined during the information search task.”43 The analysis showed that questions asking for counts produced more accurate answers than questions asking for estimates.

Still another example is found in a series of questions in the GSS conducted by the National Opinion Research Center. The GSS asks a series of questions about whether the United States should be spending more money, less money, or about the same amount on such things as welfare. They conducted an experiment by randomly asking one-half of the respondents about “welfare,” while the other random half was asked about “assistance to the poor.” Tom Smith analyzed GSS data and concluded that “‘welfare’ typically produces much more negative evaluations than ‘the poor.’”44 Gregory Huber and Celia Paris point out that this assumes that these two terms are equivalent to the respondent. Their research suggests that this is not the case. They conclude that “respondents are twice as likely . . . to believe that programs like soup kitchens, homeless shelters, and food banks are ATP [assistance to the poor] as opposed to welfare.”45 In other words, the questions are not equivalent because the words “welfare” and “assistance to the poor” bring to mind different things. Huber and Paris’s findings point out that we shouldn’t be too quick to conclude that question wording is behind the different responses but that we need to look below the surface and consider how respondents interpret different question wording.
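
A wording experiment of this kind is straightforward to set up and analyze. The sketch below randomly assigns each respondent one of two wordings and then compares the share answering that we spend too much under each; the data are simulated purely for illustration, and the assumed effect sizes are not from the GSS.

```python
import random

random.seed(1)
wordings = ["welfare", "assistance to the poor"]

# Simulate a split-ballot experiment: each respondent is randomly
# assigned one wording, and the (artificial) probability of saying
# "too much is being spent" depends on the wording seen.
results = {w: [] for w in wordings}
for _ in range(1000):
    wording = random.choice(wordings)
    p_too_much = 0.45 if wording == "welfare" else 0.25  # assumed effect
    results[wording].append(random.random() < p_too_much)

for wording, answers in results.items():
    share = sum(answers) / len(answers)
    print(f"'{wording}': {share:.1%} say too much is being spent")
```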

Another example is questions that ask for opinions about global warming and climate change. Jonathon Schuldt et al. found that “Republicans were less likely to endorse that the phenomenon is real when it was referred to as ‘global warming’ . . . rather than ‘climate change’ . . . whereas Democrats were unaffected by question wording.”46 They point out that the difference between Republicans and Democrats is much greater when the question is framed in terms of global warming. Lorraine Whitmarsh looked at what respondents think these terms mean and discovered that global warming is more likely to be associated with human causes than climate change.47 Again this suggests that respondents attach different meaning to these terms. Thus, it becomes critical how the question is worded when making comparisons between Republicans and Democrats.

Questions are often asked about people’s attitudes toward abortion. Sometimes a single question is used to ask respondents about their attitude toward abortion in general. For example, do you think abortion should be legal or not? However, the GSS includes a series of seven questions asking whether people think abortion should be legal in various scenarios—in the case of rape, in the case of a serious defect in the baby, in the case of a woman who has low income and can’t afford more children, and other such situations. The data show that people are much more likely to feel abortion should be legal in the case of rape or a serious defect in the baby than in the case of low-income women who can’t afford more children. Howard Schuman offers the advice of asking “several different questions about any important issue.”48 The abortion example illustrates this point.

Eleanor Singer and Mick Couper analyzed a series of questions from the GSS that shows that changes in question wording do not necessarily affect respondents.

At intervals since 1990, the General Social Survey (GSS) has asked a series of four questions inquiring into knowledge of genetic testing and attitudes toward prenatal testing and abortion, most recently in 2010. The questions about prenatal testing and abortion were framed in terms of “baby”. But in the current anti-abortion climate, it seemed possible that the word “fetus” would carry more abstract, impersonal connotations than “baby” and might therefore lead to different responses, especially in the case of abortion. To resolve this issue, we designed the question-wording experiment reported in this research note. We found no significant differences by question wording for abortion preferences in the sample as a whole and small but significant differences for prenatal testing, in a direction opposite to that expected. However, question wording did make substantial differences in the responses of some demographic subgroups.49

Still another example is found in asking about voting. You would think that whether you voted or who you voted for is pretty straightforward, but here again, question wording makes a difference. Janet Box-Steffensmeier et al. report on a change that was made in the American National Election Study’s (NES) question on whether and how one voted in House of Representatives contests. Prior to 1978, there was little difference between the actual House vote and the vote reported in the NES. Since 1978 the NES has reported a much higher vote for the incumbent than the actual vote. Box-Steffensmeier suggests that the following changes in question wording might account for this finding.

  • Question used prior to 1978—“How about the vote for Congressman—that is, for the House of Representatives in Washington? Did you vote for a candidate in Congress? [If yes] Who did you vote for? Which party was that?”50
  • In 1978 and afterward, a ballot card was given to the respondent listing the candidates and their party, and the following question was asked—“Here is a list of candidates for major races in this district. How about the election for House of Representatives in Washington? Did you vote for a candidate in the U.S. House of Representatives? [If yes] Who did you vote for?”51

Box-Steffensmeier concludes that “the ballot format evidently exaggerates the incumbent’s support because people are far more likely to recognize . . . the incumbent’s name than . . . the challenger’s name.”52 This study also showed that you can reduce the proincumbent bias by making the candidates’ party stand out by bolding and italicizing it and using a different font. This reduced but did not eliminate the bias.53

Measurement Error Associated with Question Order

It’s clear that question wording can affect what people tell us. Question order also makes a difference. Think about a survey in your community that deals with quality of life. One of the questions you might ask is “What is the most pressing problem facing your community today?” You might also want to ask more specific questions about crime, the public schools, and jobs. Would the order of the questions make a difference? If you asked about crime first, then respondents would probably be more likely to mention crime as one of the most pressing problems. Order matters.

David Moore provides us with some interesting examples of order effects using data from a Gallup Poll that was conducted in 1997. The question was “Do you generally think that [Bill Clinton/Al Gore] is honest and trustworthy?”54 A random half of the respondents was asked the question with Clinton’s name first, and the other random half was asked with Gore’s name first. The data show

that when respondents (half the sample) were asked about Clinton first, 50 percent said he was honest and trustworthy; when the other half of the sample was asked about Gore first, 68% said the vice president was honest and trustworthy.55

In other words, Gore was considered honest and trustworthy by 18 percentage points more than Clinton. But when Moore took into account the order of the questions, he found that when Clinton’s name appeared second, 57 percent said he was honest and trustworthy, and when Gore’s name appeared second, 60 percent saw him as honest and trustworthy. The 18 percentage point difference is reduced to three percentage points. He concludes that “this is a classic case of people trying to make their ratings of the two men somewhat consistent” and he refers to this as a consistency effect.56

On the same poll, respondents were given the following question: “I’m going to read some personal characteristics and qualities. As I read each one, please tell me whether you think it applies to [Newt Gingrich/Bob Dole] . . . Honest and trustworthy.”57 Again the order of the names was randomly assigned, with half the respondents receiving Gingrich’s name first and the other half given Dole’s name first. Dole was rated more honest and trustworthy than Gingrich by 19 percentage points when each man’s name appeared first, but the gap increased to 31 percentage points when each name appeared second. Moore calls this a contrast effect because the data show that “when people think of Dole and Gingrich, they tend to emphasize the differences between the two men rather than the similarities.”58 This is not to say that the order of the questions always affects what people tell us. But we should be aware of this possibility. The examples provided by Moore show us how this might occur.59

Measurement Error Associated with Respondents’ Characteristics

Satisficing

Answering questions often requires a lot of effort on the part of respondents. Charles Cannell et al. suggest that respondents go through a process like the following when trying to answer a question.60

  • First, they have to understand what the question means.
  • Then they have to process the information that is necessary to answer the question. This involves determining what information they need, retrieving this information from their memory or records, and then organizing this material.
  • Next they have to determine whether this information actually answers the interviewer’s question, as well as evaluate the information in terms of other things that are important to them, such as their self-esteem.

In order to reduce the amount of effort required to answer survey questions, respondents sometimes look for ways to reduce this burden. This is called satisficing. Jon Krosnick defines satisficing as “giving minimally acceptable answers, rather than optimal answers,”61 and it can take various forms, including:

  • Answering don’t know to questions;
  • Skipping questions or saying they have no opinion;
  • Choosing answers randomly; and
  • Giving one-word answers to open-ended questions.62

For example, let’s think about the quality-of-life survey that we mentioned earlier that asks, “What is the most pressing problem facing your community today?” Some respondents might give you a one-word answer such as crime or education or jobs. This doesn’t really tell us much about what respondents are thinking. Other respondents might say they don’t know or that they have no opinion. Such answers reduce the workload of respondents.

Some survey questions give respondents a list of possible response categories from which they are asked to select their answer. Sometimes they are limited to one choice, while other times, they may select multiple responses. Marta Galesic et al. used eye-tracking data from a web survey to show that respondents often spend “more time looking at the first few options in a list of response options than those at the end of the list.”63 They also found that “the eye-tracking data reveal that respondents are reluctant to invest effort in reading definitions of survey concepts that are only a mouse click away or paying attention to initially hidden response options.”64 These are also examples of satisficing.

Another interesting example of satisficing is often referred to as “heaping.”65 Often when respondents are asked to respond with a number or a count, they respond with a rounded value. For example, when asked about the years in which events occurred, responses were more likely to be in multiples of five.
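
One simple check for heaping is to see how many numeric answers land on round values. The sketch below flags reported ages that are multiples of five; only about one answer in five would do so by chance, so a much larger share suggests rounding. The data are made up.

```python
# Hypothetical reported ages from a survey
ages = [25, 40, 40, 37, 45, 50, 50, 62, 55, 30, 41, 60]

rounded = [age for age in ages if age % 5 == 0]
share = len(rounded) / len(ages)

# Roughly 20% of ages would be multiples of five by chance alone;
# a much higher share is a sign of heaping.
print(f"{share:.0%} of reported ages are multiples of five")
```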

Krosnick suggests that

the likelihood that a given respondent will satisfice . . . is a function of three factors: the first is the inherent difficulty of the task that the respondent confronts; the second is the respondent’s ability to perform the required task; and the third is the respondent’s motivation to perform the task.66

Other researchers have suggested that satisficing occurs more frequently in certain types of surveys. Heerwegh and Loosveldt found that satisficing occurred more frequently in web surveys than in face-to-face surveys,67 and Holbrook et al. discovered that satisficing occurred more often in telephone surveys than in face-to-face surveys.68 Krosnick et al. also found that some respondents are more likely to satisfice than other respondents. For example, low-education respondents were more likely to say that they had no opinion than those with more education.69

Social Desirability

Some types of behavior or attitudes are viewed as more socially desirable than others. For example, voting is often seen as a responsibility of citizens and as a socially desirable action. On the other hand, cheating on exams is typically viewed as socially undesirable. There is considerable evidence that respondents tend to overreport socially desirable behaviors and attitudes and underreport those that are socially undesirable.

Brian Duff et al. compared the actual voting turnout in the 2000 and 2002 elections with the turnout reported in the 2000 and 2002 American NES. In 2000, reported turnout exceeded actual turnout by 17.7 percentage points, and in the 2002 election, by 16.6 percentage points.70

Matthew Streb et al. looked at a different question—whether people would vote for a woman for president if they thought she was qualified. Public opinion data show that the percent of people who say they would vote for a woman increased from slightly over 30 percent in 1945 to slightly over 90 percent in 2005.71 Clearly the norms of equality and fairness suggest that one ought to be willing to vote for a woman who is qualified. Some people might be giving this answer because they see it as the socially desirable response.

Frauke Kreuter et al. looked at reports of socially desirable and undesirable behaviors in a survey of university alumni. The types of behavior included dropping a class, receiving a D or F, receiving academic honors, belonging to the Alumni Association, and donating money to the university. Clearly receiving a D or F would be socially undesirable. Using university records, Kreuter found that approximately 61 percent of the respondents who answered this question had received such a grade. Of these respondents, approximately 27 percent failed to report receiving that grade.72 Kreuter also found that underreporting of the socially undesirable response was less in web surveys than in telephone surveys.

Roger Tourangeau and Ting Yan suggest that “misreporting about sensitive topics is quite common and . . . it is largely situational.”73 While research supports this claim, interestingly, Patrick Egan and Jeffrey Lax et al. found no evidence of a social desirability effect when it came to support for same-sex marriage.74

Measurement Error Associated with the Interviewer

Characteristics of the interviewer could refer to physical characteristics such as race, gender, and age or to characteristics such as perceived friendliness. These characteristics can affect what respondents tell us. They can interact with respondent characteristics to produce different effects for males and females or for blacks and whites or for other categories of respondents. We’re going to focus on two characteristics of interviewers that have been shown to affect what people tell us—race and gender.

Race of the Interviewer

Two classic studies dealt with questions about race in surveys conducted in Detroit in 1968 and 1971. Howard Schuman and Jean Converse showed that blacks appeared more militant and expressed more hostility toward whites when interviewed by blacks than when interviewed by whites.75 Shirley Hatchett and Schuman found that whites gave more “liberal or pro-Black opinions when the interviewer is Black.”76 Both of these studies interviewed respondents face-to-face where the race of both the interviewer and the respondent was generally apparent.

Other studies focused on voting. Barbara Anderson et al. used five election surveys ranging in time from 1964 to 1984. Their data showed the following.77

Black nonvoters . . . who lived in predominately Black neighborhoods and were interviewed by Black interviewers were more likely to report falsely that they voted than Black respondents interviewed by White interviewers. Black respondents in Black neighborhoods who were interviewed by Black interviewers were also more likely actually to vote . . . than Blacks interviewed by Whites.

Steven Finkel et al. used a 1989 survey in Virginia that looked at voting in a gubernatorial election in which Douglas Wilder, who was black, ran against Marshall Coleman, who was white. Finkel found that “whites are 8–11 percentage points more likely to voice support for the Black candidate to Blacks than to Whites.”78

Darren Davis and Brian Silver focused on political knowledge in a telephone survey of adults in Michigan. They considered both the actual race of the interviewer and the race perceived by the respondent. For whites, neither the actual race nor the perceived race of the interviewer was related to political knowledge. However, “when Black respondents identify the test-giver as Black, they do much better on the test than when they identify the test-giver as White or when the race of the interviewer is ambiguous.”79 This study is important because it explicitly measured the perceived race of the interviewer and showed perceived race to be an important variable. It also showed that race can be an important factor for some respondents but not for other respondents.

Gender of the Interviewer

Research has also shown that the gender of the interviewer can affect what people tell us. Emily Kane and Laura Macaulay analyzed data from a national sample of households and found that “male respondents offer significantly different responses to male and female interviewers on questions dealing with gender inequality in employment.”80 Men voiced more egalitarian views to female interviewers than to male interviewers.

Other studies focused on health-related information. Timothy Johnson and Jennifer Parsons reported that the homeless (both male and female) are more likely to report substance abuse to male interviewers than to female interviewers.81 However, Melvin Pollner found that both male and female respondents were more likely to report substance abuse to female interviewers than to male interviewers, suggesting that gender affects respondents differently in various settings.82

These studies show that interviewer characteristics such as race and gender can influence what respondents tell us, suggesting that we ought to consider the interviewer’s race and gender as variables in our analysis of survey data. They also suggest that interviewers ought to be randomly assigned to respondents rather than matched to the respondents’ race and gender.83

Recognizing and Minimizing Measurement Error

  • Some measurement error is associated with question wording and order.
    • One strategy is to embed an experiment into the survey. Identify two or three different ways to word the question, and assign each version to a random half or third of the sample. This will allow you to determine if the different ways to word the question produce similar or different responses. (See the discussion of global warming versus climate change earlier in this chapter.) The same strategy can be used with question order.84
    • Ask your respondents to describe what they think the question means or what they are thinking when they answer the question. George Bishop calls this asking respondents “to think out loud” about how they arrived at their answers.85 It can be used with a random part of your sample. Howard Schuman calls this a “random probe.”86
    • Ask people who are survey experts to review your questions and identify questions that might be problematic. Where question wording might be an issue, follow Schuman’s advice and ask “several different questions” about that issue.87 (See discussion of questions on abortion earlier in this chapter.)
  • Other measurement error is associated with respondent behavior such as satisficing and social desirability.
    • If satisficing is a result of the burden of answering questions, then it follows that reducing this burden might decrease satisficing. For example, instead of asking for the exact total family income in the previous year, we could give respondents a set of categories and ask them to place themselves in one of these categories. We can make sure that the interview is clearly worded and that it flows naturally from question to question. We can avoid asking unnecessary questions, thus reducing the length of the survey.
    • Research shows that question wording can reduce the tendency to give socially desirable responses. Duff reports that by “providing respondents with socially acceptable excuses for not voting, we [can reduce] . . . the over-reporting of turnout in the 2002 National Election Study by about 8 percentage points.”88 For example, the question can give respondents the option of saying that they thought about voting but didn’t or that they usually voted but didn’t vote this time.
    • Streb used a list experiment to decrease the tendency to offer the socially desirable response to a question about voting for a woman for political office. He selected two random samples from the population. The first sample was “asked how many of the following four statements make them angry or upset.”
      1. “The way gasoline prices keep going up.”
      2. “Professional athletes getting million-dollar-plus salaries.”
      3. “Requiring seat belts to be used when driving.”
      4. “Large corporations polluting the environment.”
        The second group was given a fifth statement in addition to the first four statements:
      5. “A woman serving as president.”89

    To get the percent that was angry or upset about a woman as president, all he had to do was to subtract “the average number of items in the baseline condition [the first group] from the average number of items in the test condition [the second group] and . . . [multiply] by 100.”90
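
That difference-in-means calculation is easy to reproduce. The sketch below uses hypothetical group means to estimate the percent upset by the sensitive item, following the subtraction Streb describes.

```python
# Hypothetical mean number of statements that made respondents angry or upset
baseline_mean = 1.9  # group shown only the four nonsensitive statements
test_mean = 2.2      # group shown those four plus the sensitive statement

# The difference, multiplied by 100, estimates the percent upset by the added item
percent_upset = (test_mean - baseline_mean) * 100
print(f"Estimated {percent_upset:.0f}% upset by a woman serving as president")
```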

Mode Effects

The method or mode of survey delivery might affect what people tell us. This is referred to as mode effects. The four basic modes are face-to-face, telephone, mailed, and web surveys, although there are many variations of these four modes. This isn’t error but simply differences due to the mode of delivery. We’re going to consider several studies that illustrate the nature of mode effects.

  • Cong Ye et al. reviewed 18 experimental studies that compared telephone surveys to other modes and found that respondents in telephone surveys are more likely “to give extremely positive answers . . . but are not more likely to give extremely negative responses” compared to other modes.91
  • Holbrook compared a telephone survey to a face-to-face survey and found that telephone respondents were more likely to satisfice and to give socially desirable responses than face-to-face respondents.92
  • Peter Preisendorfer and Felix Wolter compared a face-to-face survey and a mailed survey and found that mailed surveys were somewhat more likely to elicit truthful answers to a question about having been convicted of a criminal offense.93
  • Exit polls in elections are common. Typically, an interviewer approaches the respondent outside the polling area on Election Day and asks the respondent to fill out a paper-and-pencil questionnaire. Since more and more voters are voting before Election Day, the paper-and-pencil survey has been supplemented by a phone survey of early voters. Michael McDonald and Matthew Thornburg compared the Election Day paper-and-pencil survey with the telephone survey and found that telephone respondents had higher item nonresponse on the family-income question than the paper-and-pencil respondents.94
  • Douglas Currivan et al. compared a telephone survey with a telephone audio computer-assisted self-interview, in which an interviewer contacts respondents and gets their consent and then the interview itself is conducted over the phone without the interviewer’s presence. Respondents answer questions that are prerecorded by pressing keys on their touch-tone phone. Respondents were youth who were asked about tobacco use. They found that “girls, regardless of race/ethnicity, seem more likely to report smoking if they can do so by pushing a button on their touch-tone phone rather than by providing answers aloud to a human interviewer.”95
  • Dirk Heerwegh and Geert Loosveldt compared a web survey with a face-to-face interview.96 The web survey had more don’t knows and more item nonresponses than did the face-to-face survey. In other words, the web survey demonstrated more satisficing.
  • Trung Ha and Julia Soulakova found that the percent “of smoke-free homes was lower for personal interviews . . . than for phone interviews . . . .”97

Dealing with Mode Effects

Mode effects are not survey error. Rather, they occur because the mode of survey delivery affects respondents in different ways. Telephone surveys represent a different interview environment than face-to-face interviews, and it’s not surprising that this might result in greater satisficing, as found by Holbrook and by McDonald and Thornburg. How then should we deal with mode effects?

  • First, we need to be aware of the possibility of mode effects in our data.
  • Second, we need to take the possibility of mode effects into account when we report our findings.
  • Third, if we combine different modes of survey delivery in our study, we need to compare our findings across the various modes to try to identify what, if any, mode effects are present.

Postsurvey Error

Error can also occur after the survey data have been collected. Error can occur in the processing of data. If we enter data manually in a spreadsheet or statistical program, there is the possibility of error. If we code open-ended questions such as “what is the most pressing problem facing your community today?” we might make coding errors. The solution here is to check our data entry and our coding to see if there are errors. We can have another person independently code or enter the data and then compare the results to determine if there are discrepancies. These discrepancies can then be corrected.
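
Checking independently entered data is largely a matter of comparing the two files record by record. Here is a minimal sketch; the records are hypothetical, and in practice you would match on a respondent ID rather than rely on list position.

```python
# Two hypothetical, independently entered versions of the same records
entry_a = [(1, "crime"), (2, "schools"), (3, "jobs")]
entry_b = [(1, "crime"), (2, "school"), (3, "jobs")]

# Flag any record where the two data-entry passes disagree
for (id_a, value_a), (_, value_b) in zip(entry_a, entry_b):
    if value_a != value_b:
        print(f"Check respondent {id_a}: '{value_a}' vs '{value_b}'")
```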

Error can occur in the analysis of our data. Most quantitative analyses use some type of statistical package, such as SPSS, SAS, Stata, or R, and many qualitative analyses use some type of computer program, such as NVivo or Atlas.ti. A simple type of mistake might occur in writing the data-definition statements that create the variable labels and value labels and designate the missing values. A more difficult error to catch is using the wrong type of statistical analysis. Our best advice is to talk with a statistical consultant if there is any doubt about the proper method of analysis.
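
As an example of the kind of data-definition step described above, the sketch below, written in Python with pandas although the same idea applies in SPSS, SAS, Stata, or R, attaches value labels to a coded variable and designates a missing-value code. The codes themselves are hypothetical.

```python
import pandas as pd

# Hypothetical coded responses; 9 is the survey's code for "refused"
df = pd.DataFrame({"recycles": [1, 2, 1, 9, 2]})

# Designate the missing-value code (9 becomes missing) ...
df["recycles"] = df["recycles"].mask(df["recycles"] == 9)

# ... and attach value labels to the remaining codes
labels = {1: "Yes", 2: "No"}
df["recycles_label"] = df["recycles"].map(labels)

print(df)
```

Getting a label or a missing-value code wrong at this stage silently distorts every analysis that follows, which is why it pays to check the coded file against the questionnaire.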

Error can occur in the reporting of data. For example, if we conducted a telephone survey of households in our county and we only sampled landline numbers, it would be an error to claim that our findings apply to all households in the county. This would be an example of overgeneralization. Rather, we should generalize to all households with landline numbers. We’ll discuss reporting further in Chapter 5 (Volume II).

Summary

Here’s a brief summary of what we have covered in this chapter.

  • Error is inevitable in any survey.
  • Error can be either random or systematic. Systematic error is referred to as bias.
  • Error is typically categorized as follows.
    • Sampling error occurs whenever we select a sample from a population in order to make inferences about the population from our sample data.
    • Coverage error occurs whenever the sampling frame does not match the population. In other words, sometimes the list from which the sample is selected does not match the population and this produces coverage error.
    • Nonresponse error occurs when the individuals who respond to our survey are different from those who do not respond and these differences are related to what we are asking in our survey.
    • Measurement error is the difference between the true value of some variable and the answer that the respondent gives you. Measurement error can be associated with question wording; question order; respondent behavior, such as satisficing and giving the socially desirable response; and with interviewer characteristics, such as race and gender.
    • Error can also occur in the processing, analysis, and reporting of data.
    • Mode effects are not error but occur when the mode of survey delivery affects what people tell us.
  • How can we minimize survey error?
    • We should be aware of the possibility of error and try to identify possible sources of error in our data.
    • We should carefully inspect the list from which we draw our sample and try to identify elements in the population that are left off the list, elements that are on the list but are not part of the population, and elements that occur more than once on our list.
    • We should take steps to minimize nonresponse. However, it’s important to recognize that increasing the response rate “will not necessarily reduce nonresponse bias.”98 Jeffrey Rosen et al. note that “nonresponse follow-up interventions are successful in reducing nonresponse bias to the extent that they secure participation from (underrepresented) nonrespondents who are unlike cases already interviewed.”99
  • We should try to reduce the burden on the respondent of answering our questions. This might reduce the possibility of satisficing.
  • Social desirability can be reduced by considering alternative question wording. For example, Duff gave respondents the option of saying they had thought about voting but didn’t.
  • We should check and recheck our data to make sure that we didn’t make errors in creating our data file.
  • We should seek advice from a statistical consultant to make sure that we are using the proper method of analysis.
  • We should take the possibility of survey error into account when reporting our findings.

Annotated Bibliography

Total Survey Error

  • The best place to start is Herbert Weisberg’s The Total Survey Error Approach. Weisberg makes the point that we need to focus on all possible types of error.100
  • Don Dillman’s books are the next place to go: Mail and Telephone Surveys—The Total Design Approach; Mail and Internet Surveys—The Tailored Design Method; Internet, Mail, and Mixed-Mode Surveys—The Tailored Design Method; and Internet, Phone, Mail, and Mixed-Mode Surveys—The Tailored Design Method. These books are full of examples of the different types of survey error and how to try to minimize them.101

Sampling Error

  • Your favorite statistics book probably has a good discussion of sampling error.
  • If you don’t have a favorite statistics book, take a look at Social Statistics for a Diverse Society by Chava Frankfort-Nachmias and Anna Leon-Guerrero102 and Working with Sample Data by Priscilla Chaffe-Stengel and Donald N. Stengel.103 Earl Babbie’s The Practice of Social Research also has a good discussion of sampling error.104
  • Another excellent source is Leslie Kish’s Survey Sampling.105 But be warned, this is a more difficult book.
  • Paul Biemer et al. have published a book—Total Survey Error in Practice.106

Other Types of Survey Error

  • Weisberg’s and Dillman’s works cited previously have a good discussion of the other types of survey error (coverage, nonresponse, and measurement).

iWe discussed sampling in an earlier chapter, so we’re not going to revisit the details of sampling here.

iiAnother problem occurs when elements that are not part of the population are included in the sampling frame. Sometimes this can be dealt with by screening. For example, in a phone survey some phone numbers that are outside the geographical area you want to cover might be included in your sampling frame. If you are aware of this possibility, you could include a screening question in which you ask if the household is in the desired geographical area.

iiiThis is often referred to as a multistage cluster sample.
