Chapter 2

Taking Control: So Many Numbers, So Little Time

IN THIS CHAPTER

  • Examining the extent of statistics abuse
  • Feeling the impact of statistics gone wrong

The sheer amount of statistics in daily life can leave you feeling overwhelmed and confused. This chapter gives you a tool to help you deal with statistics: skepticism! Not radical skepticism like “I can’t believe anything anymore,” but healthy skepticism like “Hmm, I wonder where that number came from?” and “I need to find out more information before I believe these results.” To develop healthy skepticism, you need to understand how the chain of statistical information works.

Statistics end up on your TV and in your newspaper as a result of a process. First, the researchers who study an issue generate results; this group is composed of pollsters, doctors, marketing researchers, government researchers, and other scientists. They are considered the original sources of the statistical information.

After they get their results, these researchers naturally want to tell people about them, so they typically either put out a press release or publish a journal article. Enter the journalists or reporters, who are considered the media sources of the information. Journalists hunt for interesting press releases and sort through journals, basically searching for the next headline. When reporters complete their stories, statistics are immediately sent out to the public through all forms of media. Now the information is ready to be taken in by the third group — the consumers of the information (you). You and other consumers of information are faced with the task of listening to and reading the information, sorting through it, and making decisions about it.

At any stage in the process of doing research, communicating results, or consuming information, errors can take place, either unintentionally or by design. The tools and strategies you find in this chapter give you the skills to be a good detective.

Detecting Errors, Exaggerations, and Just Plain Lies

Statistics can go wrong for many different reasons. First, a simple, honest error can occur; this can happen to anyone, right? Other times, the error is something other than a simple, honest mistake. In the heat of the moment, because someone feels strongly about a cause and the numbers don’t quite bear out the point the researcher wants to make, statistics get tweaked or, more commonly, exaggerated, either in their values or in how they’re represented and discussed.

Another type of error is an error of omission: information that is left out but that would have made a big difference in getting a handle on the real story behind the numbers. Omissions make the issue of correctness especially difficult to address, because you don’t know what you weren’t told.

You may even encounter situations in which the numbers have been completely fabricated and can’t be repeated by anyone because they never happened. This section gives you tips to help you spot errors, exaggerations, and lies, along with some examples of each type of error that you, as an information consumer, may encounter.

Checking the math

The first thing you want to do when you come upon a statistic or the result of a statistical study is to ask, “Is this number correct?” Don’t assume it is! You’d probably be surprised at the number of simple arithmetic errors that occur when statistics are collected, summarized, reported, or interpreted.

Tip To spot arithmetic errors or omissions in statistics:

  • Check to be sure everything adds up. In other words, do the percentages in the pie chart add up to 100 (or close enough, allowing for rounding)? Does the number of people in each category add up to the total number surveyed?
  • Double-check even the most basic calculations.
  • Always look for a total so you can put the results into proper perspective, and ignore results based on tiny sample sizes.
  • Examine whether the projections are reasonable. For example, if three deaths due to a certain condition are said to happen per minute, that adds up to over 1.5 million such deaths in a year. Depending on what condition is being reported, this number may be unreasonable. (The sketch after this list shows both of these checks in action.)
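
To make these checks concrete, here is a minimal Python sketch; all the figures in it are made up for illustration. It tests whether a pie chart’s percentages add up and scales a per-minute figure to a yearly total:

```python
# Quick sanity checks on reported statistics (all figures here are made up).

# Check 1: Do the percentages in a pie chart add up to about 100?
pie_slices = [45.0, 30.2, 15.1, 9.6]  # hypothetical reported percentages
total_percent = sum(pie_slices)
print(f"Pie chart total: {total_percent:.1f}%")  # 99.9 is fine (rounding); 88.0 is not

# Check 2: Is a projection reasonable? "Three deaths per minute" scaled to a year:
deaths_per_minute = 3
deaths_per_year = deaths_per_minute * 60 * 24 * 365
print(f"Projected deaths per year: {deaths_per_year:,}")  # 1,576,800 -- over 1.5 million
```

A few seconds with a calculator (or a script like this one) is often all it takes to catch a statistic that can’t possibly be right.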

Uncovering misleading statistics

By far, the most common abuses of statistics are subtle, yet effective, exaggerations of the truth. Even when the math checks out, the underlying statistics themselves can be misleading if they exaggerate the facts. Misleading statistics are harder to pinpoint than simple math errors, but they can have a huge impact on society, and, unfortunately, they occur all the time.

Breaking down statistical debates

Crime statistics are a great example of how statistics are used to show two sides of a story, only one of which is really correct. Crime is often discussed in political debates, with one candidate (usually the incumbent) arguing that crime has gone down during her tenure and the challenger arguing that crime has gone up (giving the challenger something to criticize the incumbent for). How can two candidates draw such different conclusions from the same data set? It turns out that, depending on how you measure crime, either result is possible.

Table 2-1 shows the population of the United States from 2000 to 2008, along with the number of reported crimes and the crime rates (crimes per 100,000 people), calculated by dividing the number of crimes by the population size and multiplying by 100,000.

Table 2-1 Number of Crimes, Estimated Population Size, and Crime Rates in the U.S.

Year    Number of Crimes    Population Size    Crime Rate per 100,000 People
2000    11,608,072          281,421,906        4,124.8
2001    11,876,669          285,317,559        4,162.6
2002    11,878,954          287,973,924        4,125.0
2003    11,826,538          290,690,788        4,068.4
2004    11,679,474          293,656,842        3,977.3
2005    11,565,499          296,507,061        3,900.6
2006    11,401,511          299,398,484        3,808.1
2007    11,251,828          301,621,157        3,730.5
2008    11,149,927          304,059,784        3,667.0

Source: U.S. Crime Victimization Survey.

Now, compare the number of crimes and the crime rates for 2001 and 2002 in Table 2-1. In column 2, you see that the number of crimes increased by 2,285 from 2001 to 2002 (11,878,954 − 11,876,669). This represents an increase of 0.019 percent (dividing the difference, 2,285, by the number of crimes in 2001, 11,876,669). Note the population size (column 3) also increased from 2001 to 2002, by 2,656,365 people (287,973,924 − 285,317,559), or 0.931 percent (dividing this difference by the population size in 2001). However, in column 4, you see the crime rate decreased from 2001 to 2002 from 4,162.6 (per 100,000 people) in 2001 to 4,125.0 (per 100,000) in 2002. How did the crime rate decrease? Although the number of crimes and the number of people both went up, the number of crimes increased at a slower rate than the increase in population size (0.019 percent compared to 0.931 percent).
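
If you want to verify the arithmetic yourself, here is a minimal Python sketch that recomputes the 2001-to-2002 comparison straight from the Table 2-1 figures:

```python
# Counts went up from 2001 to 2002, but the rate per 100,000 went down,
# because population grew faster than crime did.

crimes = {2001: 11_876_669, 2002: 11_878_954}
population = {2001: 285_317_559, 2002: 287_973_924}

def crime_rate(year):
    """Crimes per 100,000 people."""
    return crimes[year] / population[year] * 100_000

crime_change = (crimes[2002] - crimes[2001]) / crimes[2001] * 100
pop_change = (population[2002] - population[2001]) / population[2001] * 100

print(f"Crimes rose by {crime_change:.3f}%")      # 0.019%
print(f"Population rose by {pop_change:.3f}%")    # 0.931%
print(f"Rate in 2001: {crime_rate(2001):,.1f}")   # 4,162.6 per 100,000
print(f"Rate in 2002: {crime_rate(2002):,.1f}")   # 4,125.0 per 100,000
```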

So how should the crime trend be reported? Did crime actually go up or down from 2001 to 2002? Based on the crime rate, which is the more accurate gauge, you can conclude that crime decreased during that year. But watch out for the politician who wants to show that the incumbent didn’t do the job; they will be tempted to look at the raw number of crimes, claim that crime went up, and create an artificial controversy, resulting in confusion (not to mention skepticism) on the part of the voters. (Aren’t election years fun?)

Remember To create a level playing field when measuring how often an event occurs, convert each count to what statisticians call a rate: divide by the total and, if it helps, scale to a convenient base (per 100 gives a percentage; per 100,000 gives a rate like the one in Table 2-1). Rates are usually better than raw counts because rates allow you to make fair comparisons when the totals are different.

Untwisting tornado statistics

Which state has the most tornados? It depends on how you look at it. If you just count the number of tornados in a given year (which is how I’ve seen the media report it most often), the top state is Texas. But think about it. Texas is the second-biggest state (after Alaska). Yes, Texas is in that part of the U.S. called “Tornado Alley,” and yes, it gets a lot of tornados, but it also has a huge surface area for those tornados to land and run.

A fairer comparison, and the one meteorologists use, is the number of tornados per 10,000 square miles. Using this statistic (depending on your source), Florida comes out on top, followed by Oklahoma, Indiana, Iowa, Kansas, Delaware, Louisiana, Mississippi, and Nebraska; Texas weighs in at number 10 (although I’m sure that’s one statistic Texans are happy to rank low on, as opposed to their AP rankings in NCAA football).

Other tornado statistics that are measured and reported include the state with the highest percentage of killer tornados among all its tornados (Tennessee) and the greatest total length of tornado paths per 10,000 square miles (Mississippi). Note that each of these statistics is appropriately reported as a rate (amount per unit).
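
To see why area matters, here is a small Python sketch. The state names, tornado counts, and areas are hypothetical stand-ins, not actual weather-service figures, but the normalization is the same per-10,000-square-miles calculation:

```python
# Normalizing tornado counts by land area (illustrative numbers only):
# a big state can lead on the raw count yet trail on the rate.

states = {
    # state: (tornados in a year, area in square miles) -- hypothetical values
    "Big State":   (140, 268_000),
    "Small State": (60, 66_000),
}

for state, (count, area) in states.items():
    rate = count / area * 10_000  # tornados per 10,000 square miles
    print(f"{state}: {count} tornados, {rate:.1f} per 10,000 sq mi")

# Big State wins on the raw count (140 vs. 60) but loses on the rate
# (about 5.2 vs. 9.1 per 10,000 square miles).
```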

Remember Before believing statistics indicating “the highest XXX” or “the lowest XXX,” take a look at how the variable is measured to see whether it’s fair and whether there are other statistics that should be examined to get the whole picture. Also make sure the units are appropriate for making fair comparisons.

Zeroing in on what the scale tells you

Charts and graphs are useful for making a quick and clear point about your data. Unfortunately, in many cases the charts and graphs accompanying everyday statistics aren’t done correctly or fairly. One of the most important elements to watch for is the way that the chart or graph is scaled. The scale of a graph is the quantity used to represent each tick mark on the axis of the graph. Do the tick marks increase by 1s, 10s, 20s, 100s, 1,000s, or what? The scale can make a big difference in terms of the way the graph or chart looks.

For example, the Kansas Lottery routinely shows its recent results from the Pick 3 Lottery. One of the statistics reported is the number of times each number (0 through 9) is drawn among the three winning numbers. Table 2-2 shows the number of times each number was drawn during 1,613 total Pick 3 games (4,839 single numbers drawn), along with the percentage of times each number was drawn. Depending on how you choose to look at these results, you can again make the statistics appear to tell very different stories.

Table 2-2 Numbers Drawn in the Pick 3 Lottery

Number Drawn    No. of Times Drawn out of 4,839    Percentage of Times Drawn (No. of Times Drawn ÷ 4,839)
0               485                                10.0%
1               468                                9.7%
2               513                                10.6%
3               491                                10.1%
4               484                                10.0%
5               480                                9.9%
6               487                                10.1%
7               482                                10.0%
8               475                                9.8%
9               474                                9.8%

The way lotteries typically display results like those in Table 2-2 is shown in Figure 2-1a. Notice that in this chart, it seems that the number 1 doesn’t get drawn nearly as often (only 468 times) as number 2 does (513 times). The difference in the height of these two bars appears to be very large, exaggerating the difference in the number of times these two numbers were drawn. To put this in perspective, though, the actual difference is only 513 − 468 = 45 out of a total of 4,839 numbers drawn. In terms of percentages, the difference between how often the number 1 and the number 2 were drawn is 45 ÷ 4,839 ≈ 0.009, or only nine-tenths of 1 percent.

What makes this chart exaggerate the differences? Two issues come to mind. First, notice that the vertical axis, which shows the number of times (or frequency) that each number is drawn, goes up by increments of 5. So a difference of 5 out of a total of 4,839 numbers drawn appears significant. Stretching the scale so that differences appear larger than they really are is a common trick used to exaggerate results. Second, the chart starts counting at 465, not at 0. Only the top part of each bar is shown, which also exaggerates the results. In comparison, Figure 2-1b graphs the percentage of times each number was drawn. Normally the shape of a graph wouldn’t change when going from counts to percentages; however, this chart uses a more realistic scale than the one in Figure 2-1a (going by 2 percent increments) and starts at 0, both of which make the differences appear as they really are — not much different at all. Boring, huh?

Maybe the lottery folks thought so, too. In fact, maybe they use Figure 2-1a rather than Figure 2-1b because they want you to think that some “magic” is involved in the numbers — and you can’t blame them; that’s their business.

Remember Looking at the scale of a graph or chart can really help you keep the reported results in proper perspective. Stretching the scale out, or starting the y-axis just below the smallest value in the data instead of at 0, makes differences appear larger than they really are; squeezing the scale down, or starting the y-axis much lower than needed, makes differences appear smaller.

FIGURE 2-1: Bar charts showing a) number of times each number was drawn; and b) percentage of times each number was drawn.
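
If you want to experiment with the scaling trick yourself, here is a sketch using Python’s matplotlib that rebuilds both panels of Figure 2-1 from the Table 2-2 counts. The axis settings follow the description above (starting at 465 and ticking by 5s in panel a; 2-percent increments from 0 in panel b); the rest of the styling is guesswork:

```python
import matplotlib.pyplot as plt

# Counts from Table 2-2: how often each number 0-9 was drawn (4,839 draws).
counts = [485, 468, 513, 491, 484, 480, 487, 482, 475, 474]
numbers = list(range(10))
total = sum(counts)  # 4,839
percentages = [c / total * 100 for c in counts]

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))

# (a) Truncated axis starting at 465, ticking by 5s: only the tops of the
# bars show, so a difference of a few draws looks dramatic.
ax1.bar(numbers, counts)
ax1.set_ylim(465, 515)
ax1.set_yticks(range(465, 520, 5))
ax1.set_title("(a) Counts, truncated scale")
ax1.set_xlabel("Number drawn")
ax1.set_ylabel("Times drawn")

# (b) Percentages on an axis that starts at 0 with 2% increments: the bars
# look nearly identical, which is the honest picture.
ax2.bar(numbers, percentages)
ax2.set_ylim(0, 12)
ax2.set_yticks(range(0, 13, 2))
ax2.set_title("(b) Percentages, full scale")
ax2.set_xlabel("Number drawn")
ax2.set_ylabel("Percent of draws")

plt.tight_layout()
plt.show()
```

The data are identical in both panels; only the scale changes, and with it the story the chart seems to tell.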

Checking your sources

When examining the results of any study, check the source of the information. The best results are often published in reputable journals that are well known by the experts in the field. For example, in the world of medical science, the Journal of the American Medical Association (JAMA), the New England Journal of Medicine, The Lancet, and the British Medical Journal are all reputable journals that doctors use to publish results and read about new findings.

Tip Consider the source and who financially supported the research. Many companies finance research and use it for advertising their products. Although that in itself isn’t necessarily a bad thing, in some cases a conflict of interest on the part of researchers can lead to biased results. And if the results are very important to you, ask whether more than one study was conducted, and if so, ask to examine all the studies that were conducted, not just those whose results were published in journals or appeared in advertisements.

Counting on sample size

Sample size isn’t everything, but it does count for a great deal in surveys and studies. If the study is designed and conducted correctly, and if the participants are selected randomly (that is, with no bias), then sample size is an important factor in determining the accuracy and repeatability of the results. (See Chapters 17 and 18 for more information on designing and carrying out studies including random samples.)

Many surveys are based on large numbers of participants, but that isn’t always true for other types of research, such as carefully controlled experiments. Because of the high cost of some types of research in terms of time and money, some studies are based on a small number of participants or products. Researchers have to find the appropriate balance when determining sample size.

Remember The most unreliable results are those based on anecdotes, stories that recount a single incident in an attempt to sway opinion. Have you ever told someone not to buy a product because you had a bad experience with it? Remember that an anecdote (or story) is really a nonrandom sample of size one.

Considering cause and effect

Headlines often simplify or skew the “real” information, especially when the stories involve statistics and the studies that generated the statistics.

A study conducted a few years back evaluated videotaped sessions of 1,265 patient appointments with 59 primary-care physicians and 6 surgeons in Colorado and Oregon. This study found that physicians who had not been sued for malpractice spent an average of 18 minutes with each patient, compared to 16 minutes for physicians who had been sued. The media reported the study with the headline, “Bedside manner fends off malpractice suits.” The headline makes it seem that if you’re a doctor who gets sued, all you have to do is spend more time with your patients and you’re off the hook. (And when did bedside manner get characterized as time spent?)

Beyond that, are we supposed to believe that a doctor who has been sued needs only add a couple more minutes of time with each patient to avoid being sued in the future? Maybe what the doctor does during that time counts much more than how much time the doctor actually spends with each patient. You tackle the issues of cause-and-effect relationships between variables in Chapter 19.

Finding what you want to find

You may wonder how two political candidates can discuss the same topic and get two opposing conclusions, both based on “scientific surveys.” Even small differences in a survey can create big differences in results.

One common source of skewed survey results comes from question wording. Here are three different questions that are trying to get at the same issue — public opinion regarding the line-item veto option available to the president:

  • Should the line-item veto be available to the president to eliminate waste (yes/no/no opinion)?
  • Does the line-item veto give the president too much individual power (yes/no/no opinion)?
  • What is your opinion on the presidential line-item veto? Choose 1–5, with 1 = strongly oppose and 5 = strongly support.

The first two questions are misleading and will lead to biased results in opposite directions. The third version will draw results that are more accurate in terms of what people really think. However, not all surveys are written with the purpose of finding the truth; many are written to support a certain viewpoint.

Remember Research shows that even small changes in wording affect survey outcomes, leading to results that conflict when different surveys are compared. If you can tell from the wording of the question how they want you to respond to it, you know you’re looking at a leading question; and leading questions lead to biased results.

Looking for lies in all the right places

Every once in a while, you hear about someone who faked their data or “fudged the numbers.” Probably the most common lie involving statistics and data is throwing out data points that don’t fit the hypothesis, don’t fit the pattern, or appear to be outliers. When someone has clearly made an error (for example, a person’s age is recorded as 200), removing that erroneous data point or trying to correct the error makes sense. Eliminating data for any other reason is ethically wrong; yet it happens.

Regarding missing data from experiments, a commonly used phrase is “among those who completed the study.” What about the people who didn’t complete the study, especially a medical one? Did they get tired of the side effects of the experimental drug and quit? If so, losing those participants biases the results toward positive outcomes.

Remember Before believing the results of a study, check out how many people were chosen to participate, how many finished the study, and what happened to all the participants, not just the ones who experienced a positive result.

Surveys are not immune to problems from missing data, either. For example, it’s known by statisticians that the opinions of people who respond to a survey can be very different from the opinions of those who don’t. In general, the lower the percentage of people who respond to a survey (the response rate), the less credible the results will be. For more about surveys and missing data, see Chapter 17.

Feeling the Impact of Misleading Statistics

You make decisions every day based on statistics and statistical studies that you’ve heard about or seen, many times without even realizing it. Misleading statistics affect your life in small or large ways, depending on the type of statistics that cross your path and what you choose to do with the information you’re given. Here are some little everyday scenarios where statistics slip in:

  • “Gee, I hope Rex doesn’t chew up my rugs again while I’m at work. I heard somewhere that dogs on Prozac deal better with separation anxiety. How did they figure that out? And what would I tell my friends?”
  • “I thought everyone was supposed to drink eight glasses of water a day, but now I hear that too much water could be bad for me; what should I believe?”
  • “A study says that people spend two hours a day at work checking and sending personal emails. How is that possible? No wonder my boss is paranoid.”

You may run into other situations involving statistics that can have a larger impact on your life, and you need to be able to sort it all out. Here are some examples:

  • A group lobbying for a new skateboard park tells you 80 percent of the people surveyed agree that taxes should be raised to pay for it, so you should too. Will you feel the pressure to say yes?
  • The radio news at the top of the hour says cellphones cause brain tumors. Your spouse uses their cellphone all the time. Should you panic and throw away all cellphones in your house?
  • You see an advertisement that tells you a certain drug will cure your particular illness. Do you run to your doctor and demand a prescription?

Remember Although not all statistics are misleading and not everyone is out to get you, you do need to be vigilant. By sorting out the good information from the suspicious and bad information, you can steer clear of statistics that go wrong. The tools and strategies in this chapter are designed to help you to stop and say, “Wait a minute!” so you can analyze and critically think about the issues and make good decisions.
