CHAPTER 6

FACTS AND TRUTH

THE BLURRED EDGE OF PERSUASION AND DECEPTION

EVERY CHART is a manipulation.

Behind every chart are dozens of decisions, conscious and subconscious, that influence what someone sees and thinks about that chart.

This idea makes some people uncomfortable. Data visualization has what’s called “high facticity”—that is, people feel like charts represent some reality accurately.1 That data itself is dispassionate. That numbers don’t lie. The whole point of data is that it’s objective, right? And visualization is just a way to show data.

Well, yes. But also, no. Data visualization is not just a visualization of facts; it’s the manipulation of them. Here’s an exercise to reinforce the idea. I need to plot year-by-year data of my LDL or “bad cholesterol” level for five years.2 Which set of axes should I use?

First, we need to know the proper distance to put between years on an x-axis. What is the correct amount of space between two years?

Obviously, it’s an absurd question. The space between years on a two-dimensional surface is not a real thing; it’s an arbitrary decision based on any number of factors that have nothing to do with time. If I wanted, I could put six meters between each year and I wouldn’t be wrong. I’d be impractical, but not incorrect.

Even in this bare example, before I’ve plotted any data, I have several decisions to make about my axes: the length of the x- and y-axes; the range I use for the y-axis; the number of axis labels I use; the number of tick marks and where they’re placed. I’ve probably at this point also made several decisions about my data: What years should I include or leave out? What other data will I include or leave out?

And once I make those choices, I face dozens more decisions. What chart type do I use? What colors? How many colors? Similar colors or different ones? How thick do I make my lines or how big do I make my dots or how much space do I put between my bars? What’s the title and subtitle? What fonts do I use? Do I add a caption? Where’s the key? Should I label specific numerical values? Which ones? And on and on.

Some of these questions I’ll barely think about. I might just go with what the software gives me, or I might just do what I usually do. Others I’ll consider more carefully. But in every case, I’m manipulating the visualization in ways that will affect what the user of the chart sees, feels, and understands about the data. I cannot avoid this. Every chart is a manipulation.

Sometimes, the manipulation is automated. Take these two curves:

They tell different stories. With the first, you notice a rolling trend, almost like a plane’s trajectory. The second’s sheer and bumpy—like a roller-coaster ride. It seems sharper, more volatile.

But these curves plot the same data on the same y-axis. The only difference between them is the length of the x-axis, a change that’s merely the result of tilting a phone from landscape to portrait mode. So which chart is more “objective”? More “correct”?

Which is true?

A MAGIC TRICK

Let’s be clear: The word manipulation as used above is reasonably neutral. Its denotative, nonpejorative meaning is simply to work something with your hands. (Its Latin origins are the words for “hand” and “fill.”3)

But it’s also not a passive word. Some manipulation is the product of happenstance—decisions you don’t even know you’re making. But much is deliberate and skillful. And the more you understand how you control the truth you present to people, the more powerful your manipulations become.

Here’s an example of manipulation that doesn’t change a single data point but flips the meaning of the chart. We start with the chart below.

As a user of this chart, I see a clear message that the immigrant population is rising. In fact, the line has almost filled its bounding box, seemingly making a point about this population reaching a high it hasn’t seen in 100 years. There are two variables in my data, immigrants and nonimmigrants, but I’ve labeled only one, and I’ve truncated my y-axis. The box reaches only to a 14% share, which is obviously part of 100%, but we don’t see all that other data space; you literally can’t see the data that represents the nonimmigrant population (which, again, I haven’t labeled). I’m telling you which variable matters and making it hard for you to see or think about the other one. Still, you know the value is 14%, so maybe you can imagine how much that is, or what the other 86% looks like. Try to imagine what 14% looks like on a full y-axis that shows all the data. When you do see it, the story feels different.

Now I’ve plotted both variables entirely, but I’ve still only labeled one. Also the second one is white—negative space in the visual, meaning I think of it as background information. I’ve made sure you focus on the variable I want you to see, as small as it now looks compared to the previous chart.

I could easily change your focus.

Now both variables are labeled and given strong colors. Now the massive size of the nonimmigrant variable draws the eye. I also changed the title to reinforce that I want you to think about both variables.

One more step: Let’s do what the initial chart did, but in reverse. One variable shown. Title reinforcing what I want you to see. I could have truncated the y-axis but chose not to, as this variable comes so close to using the whole axis.

Keep in mind this is the exact data used in the first chart, just manipulated to deliver a completely different truth. The decisions I made with color, axes, labels, and titles drive you to see what I want you to see or what I think you need to see.

THE BLURRED EDGE OF TRUTH

My transformation of the immigration data wasn’t meant to be devious or manipulative in the pejorative sense. It was only to show the broad spectrum of truths you can create with simple alterations.

You’ve probably come across real-world examples of visualizations that are designed to deceive, hide, or otherwise alter the story in data in an unfair or unethical way. I’m often asked in seminars and workshops how to know where the line is between visual persuasion and visual dishonesty.

Even if it were a fine line, at least we could see it and stay on the ethical side of it. But, of course, no such line exists. Instead, we have to negotiate a blurred and shifting borderland between truthfulness and unfair manipulation.

On one side of this indefinite border are the persuasion techniques outlined in chapter 5 and on the other are the four types of deception:

Visual Persuasion Techniques

Emphasis: Drawing the eye to the main idea.
Example: Making one line in a trend chart thicker and brighter-colored than the others.

Isolation: Drawing the eye away from other ideas.
Example: Making all dots in a scatter plot gray except for the group you want to discuss.

Added or removed reference points: Adjusting how much data is around the main idea to shift the context.
Example: Removing U.S. data from a bar chart to focus only on European data.

Shifted reference points: Adding new or different data to create a new context.
Example: Layering in a stock index trend line to compare to your stock performance.

Visual Deception Techniques

Exaggeration: Making an idea look more important or dramatic than it warrants.
Example: Truncating a y-axis to make an upward sales trend look like steeper gains.

Falsification: Changing or altering an idea in a way not supported by the data.
Example: Using two distinct y-axes to create a correlation where none exists.

Omission: Leaving out data that would discount the viability of your idea.
Example: Removing U.S. data from a bar chart to make overall performance look better than it was.

Equivocation: Using unnecessary elements to hide ideas or make them vague or unclear.
Example: Adding dozens of unnecessary stock performance trend lines to a chart so it’s difficult to focus on one company’s stock performance.

I won’t dwell on falsification; the commandments should be obvious: Don’t lie. Don’t deliberately mislead. Don’t create a chart like this one.

It looks like a positive revenue trend, but here each bar is cumulative, accounting for all previous years’ revenue as well as new revenue. Year 1 is counted five times (see the chart below), although that revenue was earned only once.

This is continuous data, a trend line, hiding in a categorical form: We expect each bar to represent a discrete value. The breakdown shown here is the more honest depiction of the revenue trend.

Of course, many charts don’t fall neatly into one category or the other. One person’s isolation is another’s omission. It’s easy to see how emphasis, applied too forcefully, might slip into exaggeration.

Imagine, for example, your boss asks you to prepare a presentation version of this chart that she quickly generated on current and expected job satisfaction across careers. She attached a note to it.

Data and rough visual attached. For the board presentation, want to show the big change, the U-curve for current satisfaction and the huge gap in current vs. expected for young employees, which closes and flips in midcareer. Important to show where we need to address employee satisfaction issues before we propose funding for engagement programs.

You can see everything the boss is describing, but you also know that this satisfaction survey was scored on a 1 to 10 scale. This chart only shows from 6.4 to 7.8 on that scale, 14% of the actual range. When you reproduce this and compare it to a version with a full y-axis, you see a remarkable disparity.

Remember the boss’s keywords from her note: big change, U-curve, huge gap, flips. Those were clear in the original version, but the new version looks almost changeless—a small gap that converges in an unremarkable crossover.

What do you do? The boss thinks it’s acceptable to “zoom in” like this, and indeed, we see this kind of truncation all the time. The boss insists that even though it’s only about a point-and-a-half change, that is remarkably significant in this kind of data. By truncating, you’re emphasizing what’s important, not exaggerating. And it needs to be emphasized if the career programs are to get funded, which you agree are a good idea. No way they’ll fund if you show the flat lines.

Which way do you go? Some of you will say show the whole range. You don’t have to be a “y-axis fundamentalist” to see how dramatically truncation alters the idea that emerges from the data.4 Others will say truncate the axis: the changes may look small on a full-axis chart, but they matter, so they should be made to look less flat and more dynamic. If anything, some will argue, the full-axis version is deceptive because it makes something deemed significant look insignificant.

There’s no easy answer here.

EXPLORING THE GRAY AREA

Think of manipulating your dataviz like wielding a knife. Knives can be used in any number of ways: professionally by someone who’s well trained; skillfully by a careful amateur; carelessly by someone not paying attention; recklessly by someone who isn’t careful or considerate; even illicitly by a bad actor.

How you wield this knife really comes down to your intentions and your execution.

You strive to reach the top right here, but that empty space in the middle is where we sometimes end up. Unpacking the ways in which charts slip into deception, even if we don’t mean them to, is like learning to handle a knife so that you don’t accidentally cut yourself or others.

These cases, like the earlier example of the career satisfaction chart, aren’t cut and dried, so my advice isn’t either. Rather than trying to create a doctrinaire list of dos and don’ts, I’ll deconstruct four of the most common techniques that put charts in this gray area, explain why and when you might want to use them, and lay out why and when they may not be okay.

The truncated y-axis: exaggerating trends. The debate over the y-axis is visualization’s version of grammarians arguing about ending a sentence with a preposition. Even if we think it’s wrong, we do it because the proper alternative often feels awkward.

Why it may be effective. It emphasizes an idea. Cutting empty ranges out of an axis increases the physical distance between values, revealing more texture in the changes and making change look more dramatic, as shown in the career satisfaction example earlier.

It’s clearly true that not truncating makes it harder to see change and difference. The full-axis version uses 7% of the y-axis to show a 7% gap. The truncated version uses almost 50% of the chart’s vertical space to represent a 7% gap. Truncation is a way of zooming in and isolating the main idea. It’s not unlike looking through a magnifying glass.
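The magnification is easy to quantify. Here’s a rough sketch, treating the career satisfaction chart’s drawn axis as 0 to 10 and its truncated window as 6.4 to 7.8; the 0.7-point gap is illustrative:

```python
# How much vertical space does a fixed data gap occupy
# on a full axis versus a truncated one?

def axis_fraction(gap, axis_min, axis_max):
    """Fraction of the drawn y-axis that a gap of `gap` units covers."""
    return gap / (axis_max - axis_min)

gap = 0.7  # a 0.7-point gap on the satisfaction scale

full = axis_fraction(gap, 0, 10)          # full axis, 0 to 10
truncated = axis_fraction(gap, 6.4, 7.8)  # truncated axis window

print(round(full, 2))              # 0.07 -- 7% of the axis
print(round(truncated, 2))         # 0.5  -- half the axis
print(round(truncated / full, 1))  # 7.1  -- the visual magnification
```

The same 0.7-point gap fills about seven times more of the truncated chart than of the full one, which is the whole visual argument for, and against, truncation.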

It’s also true that if a range of data is consistently far from zero, you’ll need much more space to effectively unflatten the visual while maintaining a full y-axis.6 You’ll have to manipulate the height and width of the chart. This quickly becomes an impractical exercise: It yields strangely formatted charts that, although they preserve some detail of the curves, ultimately distract the viewer.

Why it may be deceptive. Some will argue that truncation acts less like a magnifying glass than like a fun house mirror, distorting reality by exaggerating select parts of it. The line on the Taking a Vacation chart above represents a drop of 25 percentage points, from 80% to 55%. But its physical descent covers almost the entire y-axis. In other words, the line descends 100% of the y-axis to represent a 25% decline. Truncation also hides representative space. The line here divides space that represents vacationers (below) and nonvacationers (above), but neither space accurately represents the proportions between the two at any given point. We can see this when we chart the space devoted to each variable, as shown below. In the truncated version, the proportions are simply inaccurate.

Another good way to understand the effect of truncation is to pluck three points from the data set and turn them into stacked bars, one group with a truncated y-axis and one that spans from zero to one hundred, as shown below.

Rather than persuasive or even deceptive, the truncated-axis chart looks plain wrong, and it is. Its 1995 bar, for example, at 67%, should be two-thirds dark yellow and one-third pale yellow, but it’s split about 50/50. Truncation with categorical data doesn’t work. We see it used like this mostly when deception is the goal.7 And yet the original line chart represents a similar dividing of space, except with many more data points along a continuum.
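You can check a stacked bar’s split with two lines of arithmetic. In this sketch, the truncated axis range of 55 to 79 is my own assumption, chosen to reproduce the roughly 50/50 split described above:

```python
def bar_split(value, axis_min, axis_max):
    """How a stacked bar divides at `value`, given the axis range actually drawn."""
    below = (value - axis_min) / (axis_max - axis_min)
    return below, 1 - below

# On an honest 0-to-100 axis, a 67% value splits two-thirds / one-third:
print([round(x, 2) for x in bar_split(67, 0, 100)])   # [0.67, 0.33]

# On a hypothetical truncated axis from 55 to 79, the same value
# draws as an even split the data doesn't support:
print([round(x, 2) for x in bar_split(67, 55, 79)])   # [0.5, 0.5]
```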

Sometimes people equate truncating the y-axis with not starting at zero. But even if it starts at zero, lopping off the top of an axis’s true range also produces a distortionary effect, as it did in the immigration chart series. That kind of truncation is less often noticed and produces fewer outbursts from y-axis fundamentalists, but it can hide representative space in the same way.

The double y-axis: comparing apples and oranges. Compared with truncation, double-y-axis charts provoke little agitation. An internet search for “truncated y-axis” returns top results about lying with charts, but a search for “secondary y-axis” turns up mostly sites that teach you how to add one in Excel. Still, charts with two y-axes deserve similar scrutiny.

Why it may be effective. It compels an audience to make comparisons. Instead of trying to convince people that there’s a relationship between two variables, it creates a relationship by fiat. Above is an example I created for a humorous essay on the use of the term “apples and oranges” in the media.

You can’t look at this chart and consider each plot on its own merits. The fact that they’re together forces you to think about them as one thing, not two things that happen to share a space. What does this chart say? You’ve probably formed the narrative I wanted you to: Stock market gains lead to more people using the term “apples and oranges.”

Of course, that idea is absurd on its face—but it’s almost impossible not to make the connection. I knew that (or at least I sensed it; this was created long before I had considered the mechanics of chart making) and leveraged it to send you down a path of trying to figure out why this relationship exists and to make a funny point. (This is one case where visual deception is allowed: in humor, when the audience knows you’re being deceptive to make a funny point.) Two y-axes can shape a narrative that goes in the direction you want it to, and it is economical, using the space of one chart to plot two.

Why it may be deceptive. The relative sameness or difference in the shapes of lines or the heights of bars being measured on two different scales is much less meaningful than it appears to be. The simplest illustration is a chart that uses two axes representing the same type of value but in different ranges.

In the chart above, it appears that gold and silver are roughly the same price, and their prices move together. But the range of the secondary y-axis is two orders of magnitude lower than that of the primary y-axis. (In addition, they’re truncated, so the closeness of the lines is artificial.) That means we’re seeing lines that interact in fake ways. When the blue line is higher on the chart, the price of silver isn’t higher than the price of gold. When the lines cross over, prices aren’t crossing over. Both axes measure U.S. dollars, so why not use just one y-axis?

That’s what this gold and silver chart shows, and it’s simply less useful.

We can’t see what’s happening to silver prices. One solution to this dilemma would be to show relative change in price rather than raw price, as the chart below shows.

The price of silver, a flat line in the previous chart, is actually more volatile than the price of gold—an idea we don’t see in the first chart. If anything, the price of gold looks more dynamic in that first chart, but the relative change from $1,300 to $1,200 is smaller than the change from $21 to $18, even though the slopes match when we use separate y-axes in the same space. Still, this new version creates new challenges. It shifts the main idea from the price of precious metals to the change in price—from value to volatility. Knowing the actual price of gold and silver at any given time is not possible in a percentage change chart.
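The underlying arithmetic is worth making explicit. A quick sketch using round-number prices like those above:

```python
def pct_change(start, end):
    """Relative change from start to end, in percent."""
    return (end - start) / start * 100

# Gold falls $100 from a high base; silver falls $3 from a low one.
gold = pct_change(1300, 1200)
silver = pct_change(21, 18)

print(round(gold, 1))    # -7.7  -- the move that looks dramatic on the raw chart
print(round(silver, 1))  # -14.3 -- nearly twice the relative decline
```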

Things get even murkier when the second y-axis uses a different value altogether. In this chart, it’s hard to miss the narrative that Tesla’s market share is going to come on strong in light vehicle sales. Its line reaches higher and higher into the bars that represent all light vehicle sales.

Unfortunately, that narrative is illusory. In 2025 the line reaches about a third of the way up the total light vehicle sales bar, which suggests Tesla will approach 10 million vehicle sales. Except that its y-axis is measured in percentage, not raw numbers. In 2025 it would have just a 3% market share—only 1/33rd of that year’s plotted bar. The chart below is an accurate portrayal of the scenario.
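The gap between what the eye reads off the bars and what the right-hand axis actually says can be sketched with simple arithmetic. The 30-million-vehicle total here is an assumption, chosen only to illustrate the mechanics:

```python
total = 30_000_000        # assumed height of a year's sales bar (illustrative)

apparent = total // 3     # where the percentage line seems to reach: a third up the bar
actual = total * 3 // 100 # what a 3% market share actually is

print(apparent)                     # 10000000 -- what the visual implies
print(actual)                       # 900000   -- what the data says
print(round(apparent / actual, 1))  # 11.1     -- the overstatement factor
```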

When two measures bear no relationship at all, things get truly weird, as with the next chart.

We see events in physical space—crossovers, meeting points, divergences, convergences—that suggest a relationship that doesn’t exist. Time on page didn’t cross over or go higher than page views between the seventh and eighth weeks—and what would it even mean for seconds to be higher than page views? It’s as if rugby and baseball are being played on the same field and we’re trying to make sense of both as one game.

Nevertheless, when we see data charted together, our minds want to form a narrative around what we see. Charts can be concocted that combine truncation with dual y-axes to manipulate the curves into similar shapes to encourage that narrative-seeking, such as the chart below. The two variables here are correlated, but that’s just an accident of statistics. The tempting if unlikely causal narrative is that eating more cheese increases the chances you’ll suffocate in your bedsheets.8

What happens when this visual parlor trick is applied to less silly examples? In an age of very big data sets and sophisticated tools for mining them, it becomes easy, as the Stanford professor of medicine John Ioannidis puts it, to “confer spurious precision status to noise.”9 Chart 1 in the series is a good example.

Sales and customer service calls map closely over the course of the day. The tight link might make a manager think that customer service should be staffed according to how much money the company expects to be bringing in at that time of day. More money, more reps. But the way these lines stick together, as much as we might want to believe it means something, is artificial. First, the lines stick together in part because they use separate grids.

Chart 2 in the series exposes the grid lines to show that the tight connection between lines is artificial.

It’s almost as if each chart were on a semitransparent piece of paper and we slid one over the other until the curves aligned. In chart 3, when the axes are lined up to share a single grid, the picture changes.

Similarity remains, but now calls are always lower than sales (keep in mind this is all still nonsensical since the values are completely different). Even so, we get the sense that sales and calls go up and down together. This chart still might persuade us that staffing should follow the day’s sales trends.

But what if we take a view of the data that doesn’t rely on an artificial similarity in the shape of curves? Using the same data, let’s recalculate to compare sales per customer service call each hour as a ratio, shown in chart 4 in the series.

If sales and customer service calls really were as closely linked as the original chart suggests, this line would be essentially flat—as sales rise, calls rise. But this view tells a different, somewhat more nuanced story: The customer service team is handling 30% more calls for every $100,000 earned at 9 a.m. compared to 9 p.m. And the ratio bounces up and down all morning. In the first chart in this series, morning was when the lines were almost perfectly in sync, yet that’s when the ratio of calls to sales changes most.
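That ratio view takes only a few lines to compute. Here’s a sketch with made-up hourly figures (not the chart’s actual data), chosen so that the 9 a.m. ratio runs about 30% above the 9 p.m. one:

```python
# Hypothetical hourly figures, for illustration only.
hours = ["9am", "1pm", "9pm"]
sales = [200_000, 350_000, 300_000]   # dollars earned in the hour
calls = [52, 70, 60]                  # customer service calls in the hour

# Calls handled per $100,000 of sales, hour by hour.
ratios = {h: c / (s / 100_000) for h, s, c in zip(hours, sales, calls)}

print(ratios["9am"])                                # 26.0
print(ratios["9pm"])                                # 20.0
print(round(ratios["9am"] / ratios["9pm"] - 1, 2))  # 0.3 -- 30% more calls per sales dollar at 9 a.m.
```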

Comparisons are one of the most basic and useful things we do with charts. They form a narrative, and narrative is persuasive. But it should be obvious by now that there are no easy ways to handle different ranges and measures in a single space. Pushing down one misleading problem can cause another to pop up. More-accurate portrayals, such as percentage change, may be less accessible or useful, or even alter the idea being conveyed.

The simplest way to fix this is to avoid it. Placing charts side by side rather than on top of each other, and using presentation techniques that we’ll talk about in chapter 7, can help create comparisons without creating false narratives.

The map: Misrepresenting Montana and Manhattan. Maps are themselves information visualizations, but they’re also popular containers for dataviz. Assigning values from spreadsheets to geographic spaces has become essential practice in public policy circles and politics especially. The rise in popularity of color-coded maps, or choropleths, has spawned one of the toughest dataviz challenges in toeing the line between effectiveness and deceptiveness.

Why it may be effective. Maps make data based on geography more accessible by making it simple to find and compare reference points, because we are generally familiar with where places are. Comparing country data, for example, is easier when we embed values in maps, especially as the number of locations being measured increases. Looking at the Solar Capacity charts below, see how long it takes you to complete the fairly simple task of comparing the United States with Japan, then Spain with France, and finally Germany with Australia on the bar chart. Then do the same on the map.

Choropleths also help us see regional trends that other forms of charts cannot. It’s difficult, for example, to look at the bar chart and form ideas about, say, the European versus Asian deployment of solar capacity, but in the map we can make those assessments almost without thinking.

Why it may be deceptive. The size of geographical space usually over- or underrepresents the variable encoded within it. This is especially true with maps that represent populations, as we see during elections. You might call this the Montana-Manhattan problem.

More people live in Manhattan, even though Montana is almost 6,400 times bigger. Another way to express this is to show how many people live in one square mile of each place. Each dot represents seven people.
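Those comparisons are easy to verify. A sketch using rough public figures (approximate, for illustration only):

```python
# Approximate figures: area in square miles, population.
montana_area, montana_pop = 147_000, 1_030_000
manhattan_area, manhattan_pop = 23, 1_630_000

print(round(montana_area / manhattan_area))   # 6391  -- Montana's size advantage on a map
print(round(montana_pop / montana_area))      # 7     -- people per square mile in Montana
print(round(manhattan_pop / manhattan_area))  # 70870 -- people per square mile in Manhattan
print(round((manhattan_pop / montana_pop - 1) * 100))  # 58 -- roughly 60% more people in Manhattan
```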

It may be hard to see, but Montana’s square mile contains one dot. So, when Montana votes one way during an election, the visual representation is of a colored-in area that’s more than 6,400 times the size of the one for Manhattan, even though 60% more people live in Manhattan. This happens all over the world. Below are the election results for Scotland’s referendum on independence plotted as a map and as a simple proportional bar chart.

Geographically it looks like about 95% of the country voted no. But what looks like an overwhelming victory isn’t actually so one-sided. Less than 5% of the landmass on the map represents a yes vote; 38% of eligible voters voted yes. Consider that in the Highlands, that massive northernmost red region on Scotland’s mainland, only about 166,000 people voted in total—fewer than the 195,000 who voted yes in Glasgow, one of the small blue wedges. The hexagon version of this map is an attempt to add some of the nuance back into the data, and it succeeds in downplaying the value of large spaces that have relatively small numbers of voters, but you do lose some sense of actual geographical navigation. For example, try to locate the Scottish highlands on this map.

Moving away from maps, though, reintroduces the problems that maps are meant to solve by using our knowledge of where things are to make values more accessible. The proportional bar chart below, for example, makes it nearly impossible to connect places to values quickly or to make regional estimations.

More-accurate representations of the data lead to less accessible geographic information. Conversely, good maps tend to misrepresent data values. This paradox has vexed designers, cartographers, and data scientists for some time, and they continue to look for solutions to this challenge; none has taken hold as a standard.

Grid maps provide an alternative solution. In a grid map, every region is assigned an equal size and placed roughly where we imagine it belongs on a regular geographical map. Some use squares, some hexagons, and some use compound hexagons that all have the same area but can change shape, as shown here.10 It still takes more work to grab locations in these grids than it would in a regular map. Find New York in this grid map, for example. When I looked for Texas, I found Louisiana.

Other maps use proportional circles overlaying states, which can be striking, but they’re hard to read when too many circles crash into one another, and it’s still difficult to make comparisons between geographically disparate circles, say, Washington State and Maine (and if the values encoded in the circles have a wide range, the circles can become overwhelmingly disparate in size). Some use three-dimensional bars rising up from geographies. These also can be striking, but comparing values in this form is difficult; they tend to be best deployed when one geography is an outlying large value that draws the eye.

These efforts are less misrepresentative than the ones that use real area to encode other variables, but they also flout a deeply ingrained convention in our heads—the shape of the world—and make us work harder, sometimes much harder, to find what we’re looking for. That can be frustrating and therefore less persuasive.

Uncertainty: The paradox of showing potential futures. How do you show what hasn’t happened, and might not happen? The paradox of charts showing uncertainty is that they force you to visually determine the undetermined. You must show what might be, but the act of showing it makes it appear to be. Humans struggle to process probability. Combine that with the high facticity of data visualization—when we see things charted, they seem to reflect a true reality—and you have a steep challenge: making uncertainty visible while not making it seem certain.

Why it may be effective. This is not to say we shouldn’t visualize uncertainty. It’s a highly valuable way to discuss multiple potential futures and ranges of possible outcomes. It’s most effective when there is some certainty within the range of possibilities, and even more effective if you can assign probabilities to those potential outcomes. The classic academic approach is a box-and-whisker plot that shows a certain range as a bar and lines extending from either side to show the full potential range of outcomes. For statisticians and those used to using them, box-and-whisker plots are fine, but most audiences don’t parse them so easily.

Another approach is to change a solid line to a dotted or semitransparent line where real data becomes projected data, signaling that we think this is going to happen, or that on the current trajectory this is what will happen, but that it hasn’t happened yet. Again, this may be rendered as multiple scenarios.

The most popular approach is what’s sometimes called a “fan chart” because of the way the range of outcomes fans out, as with this drone chart. The lighter-hued band of data around a “most expected trend” line represents the range of possible outcomes. In some charts, the saturation of the uncertainty color deepens as probability increases and pales as it decreases, a smart way to signal likelihood using the heuristic in our brains that as color empties, so do values.

Why it may be deceptive. There’s no getting around the fact that plotting uncertainty gives it a veneer of certainty. For example, look at the drone chart again. The high forecast is extremely unlikely. Let’s say it has a one-in-a-thousand chance of happening, and the base forecast has a one-in-five chance. Would you get the sense from this visual that the high forecast is 200 times less likely to occur than the base forecast? Can you even say what “200 times less likely” should look like?

This difficulty in accurately representing uncertainty manifests as anxiety and frustration in users of charts. Famously, during the 2016 election, the New York Times used a needle hovering over vote percentages for each candidate, like a pressure gauge. The needle jittered as results came in to represent uncertainty in the outcome. Its frenetic moves left and right didn’t represent any real probability values, even though the needle hovered over real data values; it was meant only to metaphorically represent “uncertainty.” The gauge generated only confusion, complaints, anger, and angst.

The pandemic became a master class in the problems with visualizing uncertainty. Charting potential deaths in such a fluid situation was a precarious task, even with probabilities attached to the outcomes. Experts were desperately hoping that visualizing the potential dire consequences would change people’s actions to minimize poor outcomes.

But such charts also produce anxiety, because they make “worst-case scenarios” feel real; the mere act of visualizing such a thing can affect the audience in ways disproportionate to what the data suggests is likely.

This is also true in another classic uncertainty visualization, the hurricane projected path map. Such maps are widely deployed as storms barrel toward land. Often, they’re animated. But they present a host of problems that can make them deceptive.11 The ones shown below, for example, show two approaches. The sprayed lines give me no sense whatever of the likelihood of any one of these paths. What if the line that curls back into Mississippi is 90% likely to occur and all the others are a combined 10% likely? What’s more, I don’t get a good sense of the area that will be affected. The path of the storm is less important to me as a user of the visual than the swath it will pummel.

So, the second one fixes that, right? Actually, no. It’s a clearer depiction of probability: I see the most likely path, and the shaded area represents all other paths. But that cone looks more like the area the storm will affect. I see one path and the swath of the storm getting bigger as it hits land, when in fact that’s not what the cone is meant to represent at all. These visual decisions have real consequences. When Hurricane Ian hit Florida, people were making decisions to stay in place or evacuate based on what they saw in a similar cone projection, mistaking the visual for a depiction of the area that would be affected. But these charts show neither the affected area nor secondary effects of a storm such as flooding and tide surges.12

I don’t mean these observations about deceptive charts as criticism, and they are not accusations that anyone is trying to deceive. Remember the manipulation matrix: many visualizations fall into a gray area, and we’re just trying to spot the reasons they might, and how to avoid that. Visualizing uncertainty remains one of the most difficult challenges for any chart maker. These observations are offered only to help you avoid becoming deceptive when you visualize uncertainty and probability.

WHAT’S GOING ON HERE?

By now it should be clear that facts and truth are different. You can create multiple truths from one set of facts. That’s strange when you think about it. Data is data. How can I present the same data to two people and get them to believe different, even opposite, truths?

The answer goes back to how we process visuals, and a concept called the Law of Prägnanz. This word translates roughly to “pithiness.” Without getting too deep into gestalt psychology theory, all this means is that the simplest organization, requiring the least cognitive effort, will emerge as the figure. That is, our brains and visual perception systems do as little work as possible to find the easiest meaning.

This is true even if the figure that emerges in your mind isn’t actually there. Here is a famous example.

There are no circles or triangles in that figure. But your brain can’t not see them. As gestalt psychologist Kurt Koffka put it, “The whole is other than the sum of the parts.”

Applied to data visualization, that means we don’t assemble the parts into a whole idea. We don’t process all the data points and note their arrangement; compare their placement, their colors, and all the other marks on the page; and say to ourselves, “This all adds up to an idea.”

We don’t evaluate that picture and think, There are three circular shapes, two on top and one centered below them, each with a 25-degree wedge removed, and the radius that forms each edge of a wedge pointing toward a corresponding radius on one of the others.

We see it and think, There’s a triangle on top of three circles.

The same is true for data visualization. When we look at the Scottish referendum map, we’re not thinking about percentages of people who voted one way or another; we’re thinking, “No” won by a massive landslide. When we look at a steep up-and-down curve, we think, That’s a volatile trend. When there’s an outlier on a scatter plot, we think, That’s different from the others.

It’s crucial to remember that, when you are creating a persuasive visualization and trying to avoid being deceptive, the audience is not reading your data. They are not parsing statistical information. They are seeing a whole and only afterward thinking about the parts.

First, they feel something. Then they try to relate to it, make sense of it. And then they think. But by that time, they’ve already formed an idea about it, and often those ideas are based on the heuristics and conventions we talked about in chapter 2: the shortcuts our minds use to rapidly grab meaning so that we don’t have to think much about something we see all the time. Up is positive, down is negative. Time goes left to right. And so on.13

When the line in the vacation chart approaches the bottom—the “end” or the “floor”—of a chart, we take that as a cue that it’s approaching zero, or nothing. This creates a false sense of termination. We expect the bottom to be zero, and our brains want to process it that way. When we realize it’s not zero, we have to expend more mental energy trying to understand what we’re actually looking at. Conversely, we see the top of the chart as the maximum, pinnacle, or ceiling. The truncated-axis vacation chart leads us toward the idea that everybody used to go on vacation and now no one does. But compare it to the full y-axis version below it.

Okay, the number of vacationers is indeed declining, but more people than not still take a vacation. Did that idea come through from the truncated version? Did you see it first? Was it an accessible idea? Did you get the sense that on average, over nearly four decades, a vast majority of people took vacations and a majority still do?

You can see how someone might use how we process information to engineer persuasion or deception into a visualization. Look at this chart, based on one that made the rounds on Twitter.

I love that this chart is accurately plotted. It’s an inane plot, but it’s not wrong. The chart maker has engineered this knowing that you see this and don’t think about temperature changes and their significance. You look at this and see a flat line, and flat means no change, status quo, safe. The trick here is to drastically exaggerate the y-axis (a kind of anti-truncation). Global temperature averages will never range more than a few degrees, but this chart includes a possible range of 120 degrees. In truth, a half-degree change is significant, but you can’t see that significance here, never mind feel it. A half-degree is only 0.4% of the y-axis. Significant changes have been designed out of the chart.
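You can quantify that design choice. A minimal sketch, with an assumed 2-degree zoomed range for comparison, showing how much of a chart’s vertical extent the same half-degree change occupies under different axis ranges:

```python
# A sketch of how the chosen y-axis range alone changes a chart's story.
# The change in the data is fixed; only the axis we draw it on varies.
def visual_share(change, axis_min, axis_max):
    """Fraction of the chart's vertical extent a given change occupies."""
    return change / (axis_max - axis_min)

half_degree = 0.5  # a half-degree temperature change

# Anti-truncated axis spanning 120 degrees, as in the viral chart
wide = visual_share(half_degree, -10, 110)
# Axis zoomed to a 2-degree data range (range values assumed for illustration)
narrow = visual_share(half_degree, 56.5, 58.5)

print(f"On a 120-degree axis, a half-degree is {wide:.1%} of the height")   # 0.4%
print(f"On a 2-degree axis, the same change is {narrow:.0%} of the height")  # 25%
```

The data never changes; one axis choice designs the significance out of the chart, the other designs it in.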

There are two ways to combat such deception. One is to show a different view that engineers the opposite feeling in the audience, and many critics of this chart did just that, truncating the y-axis and showing steeply rising curves and dark red zones of data that all looked dangerous and menacing. When you do this, you have to be prepared to demonstrate how small changes are significant. Even if they look significant in a truncated-axis chart, that’s just what the chart shows. Can you demonstrate significance in another way? Can you, for example, show the correlation between a half-degree change and famine?

Another way is to overcome the heuristic itself: demonstrate to the audience that flat lines are sometimes very bad. Using the same techniques that the chart maker used for the global temperature charts, I created a visualization that I hoped could change the conversation on flat lines.

This works, but it’s a lot of work to constantly challenge how people’s brains naturally process information. One way that deception can be overcome is through datavisual literacy. The more we know about how our brains process data visualizations, and the more we know the techniques that are used to persuade or deceive, many of which we’ve learned here, the more prepared we are to detect and disqualify deceptive visualizations, whether they’re deceptive by accident or on purpose.

KNIFE SKILLS

I described the borderland between persuasion and deception as blurred. It should be obvious why. Most of the examples deconstructed here feel not perfectly right or wrong but, rather, endlessly debatable.

I also described the borderland as shifting, and in some ways that’s the more difficult characteristic of persuasion techniques to reconcile. A truncated y-axis chart may be fine in one setting and violative in another. Even two colleagues in the same meeting might disagree about whether it’s convincing or spurious.

Judging whether your visualization crosses that indefinite line will, like any other ethical consideration, come down to one of those difficult, honest conversations with yourself. Ask yourself:

  • Does my chart make it easier to see the idea, or is it actively changing the idea?
  • If it’s changing the idea, does the new idea contradict or fight with the one in the less persuasive chart?
  • Does eliminating information hide something that would rightfully challenge the idea I’m showing?
  • Does the chart make me feel or see something I know doesn’t reflect reality?
  • Would I feel duped if someone else presented me with a chart like this?

If you find yourself answering yes to questions like these, you’ve probably entered deceptive territory. Another way to check yourself is to imagine someone challenging your chart as you present it. You might even recruit a colleague to practice. Do you have the supporting evidence to counter a challenge? Could you defend your chart, and yourself, against attacks on its credibility and yours?

The line between visual persuasion and visual deception will never be completely clear. The most important thing you can do is to not think about the design techniques you use as right or wrong but rather make sure that the idea those techniques help you convey is defensible.

RECAP FACTS AND TRUTH

Every chart is a manipulation. We make dozens of decisions, conscious and subconscious, about what we’re showing and how we show it that affect the truth an audience will see.

One set of facts can lead to multiple truths represented in multiple data visualizations.

We use manipulations of visuals to persuade, but used too aggressively or recklessly, persuasion techniques—emphasis, isolation, adding or removing reference points—can become deceptive techniques: exaggeration, omission, equivocation. The line between persuasive and deceptive isn’t always clear. The best way to negotiate it is to understand the most common techniques that put charts in the gray area, understand why you’d be tempted to use them, and realize why they might not be okay. Here are four:

1. THE TRUNCATED Y-AXIS

What it is:

A chart that removes valid value ranges from the y-axis, thereby removing data from the visual field. Most often it doesn’t start the y-axis at zero.

Why it may be effective:

It emphasizes change, making curves curvier and distance from one point to another bigger. It acts as a magnifying glass, zooming in on the space where data occurs and avoiding empty space where data isn’t plotted.

Why it may be deceptive:

It can exaggerate or misrepresent change, making modest increases or declines look “steep.” It disrupts our expectation that the y-axis starts at zero, making it possible or even likely that the chart will be misread.

2. THE DOUBLE Y-AXIS

What it is:

A chart that includes two vertical scales for different data sets in the visual field—for example, one for a line that tracks revenues and one for a line that tracks share price.

Why it may be effective:

It compels the viewer to make a comparison between data sets that may not naturally go together. Plotting different values in the same space establishes a relationship between the two.

Why it may be deceptive:

Relationships between different values are artificial. Plotting those values in the same space creates crossovers, matching curves, or gaps that don’t actually mean anything.

3. THE MAP

What it is:

A map that uses geographical boundaries to encode values related to that location, such as voting results by region.

Why it may be effective:

Geography is a convention that allows us to find data quickly on the basis of location rather than searching through a list of locations to match data. It also allows us to see trends at local, regional, and global levels simultaneously.

Why it may be deceptive:

The size of a region doesn’t necessarily reflect the data encoded within it. A voting map, for example, may be 80% red but represent only 40% of the vote, because fewer people live in some larger spaces.
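A toy calculation, with invented regions and vote counts chosen to match the 80%/40% example, shows how easily map area and vote share diverge:

```python
# Invented regions and vote counts (not real data), chosen so the map
# is 80% red by area while red wins only 40% of the vote.
regions = [
    # (area in sq. miles, red votes, blue votes)
    (80_000, 400_000, 100_000),   # large, sparsely populated, votes red
    (20_000, 300_000, 950_000),   # small, dense, votes blue
]

red_area = sum(area for area, red, blue in regions if red > blue)
total_area = sum(area for area, _, _ in regions)
red_votes = sum(red for _, red, _ in regions)
total_votes = sum(red + blue for _, red, blue in regions)

print(f"Red share of map area: {red_area / total_area:.0%}")   # prints 80%
print(f"Red share of the vote: {red_votes / total_votes:.0%}")  # prints 40%
```

The map colors regions by who won them, so the visual weight tracks acreage, not people.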

4. UNCERTAINTY

What it is:

A chart that depicts something that isn’t real or only has a chance of becoming real.

Why it may be effective:

Modeling multiple futures can help drive better decision making, especially if those uncertain futures are weighted by probability.

Why it may be deceptive:

It makes tangible something that doesn’t yet exist and may not ever exist. It may overweight highly unlikely outcomes so that they look more possible than they are. It creates anxiety in an audience, especially when the data projects bad outcomes, like potential deaths or where a hurricane might hit.

THE LAW OF PRÄGNANZ

The reason we can see many truths from one data set is because our minds don’t read data, they find the simplest, fastest explanation for a picture, a gestalt psychology principle called the Law of Prägnanz, or pithiness. The whole is not the sum of the parts; it’s other than the sum of the parts. We see the whole to make sense of the parts.

Understanding this, we can design persuasion into our charts, but persuasion can slip into deception if we’re not careful, and there are no hard-and-fast lines between the two. Situational context may make a chart persuasive in one setting and deceptive in another.

It’s crucial to avoid accidentally, or intentionally, deceiving an audience with data visualization. Datavisual literacy helps combat visual deception.

Judging whether your visualization crosses that indefinite line between persuasion and manipulation will, like all other ethical considerations, come down to a difficult, honest conversation with yourself. Ask:

•  Does my chart make it easier to see the idea, or is it actively changing the idea?

•  If it’s changing the idea, does the new idea contradict or fight with the one in the less persuasive chart?

•  Does eliminating information hide something that would rightfully challenge the idea I’m showing?

•  Would I feel duped if someone else presented me with a chart like this?
