CHAPTER 1

A BRIEF HISTORY OF DATA VISUALIZATION

THE ART AND SCIENCE THAT BUILT A NEW LANGUAGE

HERE’S A BREAKNECK SYNOPSIS of data visualization’s development from simple communication tool to burgeoning cross-disciplinary science.

ANTECEDENTS

The first data visualization was probably drawn in the dirt with a stick, when one hunter-gatherer scratched out a map for another hunter-gatherer to show where they could find food. If data is information about the world, and if communication is conveying information from one person to another, and if people use five senses to communicate, and if, of those five senses, sight accounts for more than half our brain activity, then visualization must have been a survival tactic.1 Far from being a new trend, it’s primal.

For a long time, visualization was probably limited to cave paintings and simple counting; eventually, maps, calendars, networks (for example, genealogies), musical notation, and structural diagrams emerged. In a sense, an abacus provides a visualization of data. No matter, I’m flying forward: Tables arrived in the late seventeenth or early eighteenth century and created spatial regularity that made reading many data points much less taxing. Ledgers were born. For two centuries, tables dominated information design.

What we think of as data visualization today—charts and graphs—dates to the late 1700s and a man named William Playfair, who in 1786 published The Commercial and Political Atlas, which was full of line charts and bar charts. He later added pie charts. Histories of infographics often start with a celebrated 1869 diagram by Charles Minard that shows the decimation of Napoleon’s army during his doomed Russian campaign. Praise also goes to Florence Nightingale’s “coxcomb diagrams” of British casualties in the Crimean War, published a decade before Minard’s well-known chart. Nightingale’s work is credited with improving sanitation in hospitals because it showed how disease, above all, was what killed soldiers.

BRINTON TO BERTIN TO TUKEY TO TUFTE

It’s no accident that charting began to take off with the Industrial Revolution. Visualization is an abstraction, a way to reduce complexity, and industrialization brought unprecedented complexity to human life. The railroad companies were charting pioneers. They created some of the first organizational charts and plotted operational data such as “revenue-tons per train mile” (line chart) and “freight car-floats at a railroad terminal” (dual-axis timeline).2 The work of their skilled teams of draftsmen (alas, they were all men) was a prime inspiration for what can be considered the first business book about data visualization: Graphic Methods for Presenting Facts, by Willard C. Brinton, published in 1914.

William Playfair, Florence Nightingale, and Charles Minard, the big three of early modern charting.

Willard Brinton’s Graphic Methods for Presenting Facts provided advice to chart makers and critiques of charts from the early twentieth century.

Brinton parses railroad companies’ charts (and many others) and suggests improvements. He documents some rules for presenting data and gives examples of chart types to use and types to avoid. Some of his work is delightfully archaic—he expounds, for example, on the best kind of pushpin for maps and how to prepare piano wire for use as a pin connector (“heated in a gas flame so as to remove some of the spring temper”).

Then again, many of his ideas were in the vanguard. Brinton lays out the case for using small multiples (he doesn’t call them that), currently a popular way to show a series of simple graphs with the same axes, rather than piling lines on top of one another in a single graph. He shows examples of bump charts and slope graphs, styles many people assume are more modern inventions. He looks askance at spider graphs (they should be “banished to the scrap heap”), and he questions the efficacy of pie charts a century ahead of today’s gurus.
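Brinton’s small-multiples idea translates directly into modern tools. Here is a minimal sketch—using Python’s matplotlib and invented data, neither of which appears in this chapter—of a grid of simple graphs that share the same axes, rather than piling every series onto one chart:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; renders without a display
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
years = np.arange(2000, 2021)
# Four hypothetical series, e.g., revenue by region (invented data)
series = {name: 100 + rng.normal(0, 5, len(years)).cumsum()
          for name in ["North", "South", "East", "West"]}

# Small multiples: one simple chart per series, identical shared axes
fig, axes = plt.subplots(2, 2, sharex=True, sharey=True, figsize=(6, 4))
for ax, (name, values) in zip(axes.flat, series.items()):
    ax.plot(years, values, linewidth=1.5)
    ax.set_title(name, fontsize=9)
fig.suptitle("Small multiples: same axes, one series each")
fig.savefig("small_multiples.png")
```

Because every panel shares the same scales, the eye can compare series without untangling overlapping lines—the effect Brinton was after a century before the term “small multiples” existed.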

Eventually, Brinton lays out a system for creating “curves for the executive” which can “tell the complete story [of the business] in every detail if placed in proper graphic form.”

By midcentury, the U.S. government had become a complex and data-driven enterprise that demanded abstraction in unprecedented volume. Fortunately for the feds, they employed Mary Eleanor Spear, a charting pioneer who worked for dozens of government agencies and taught at American University. She produced two books in the spare, directive prose of someone who has a lot of work to do and not a lot of time to explain. Charting Statistics (1952) arose as a response to “problems encountered during years of analyzing and presenting data” in government. Practical Charting Techniques (1969) updated and expanded on its predecessor, advocating for the power of data visualization: “The eye absorbs written statistics, but only slowly does the brain receive the message hidden behind the written words and numbers. The correct graph, however, reveals that message briefly and simply.”

Spear’s books, like Brinton’s, are filled with smart, commonsensical advice, along with some now-obsolete passages of her own (she expertly lays out how to apply various cross-hatching patterns to distinguish variables on black-and-white charts; the resulting material is beautifully executed). And she engaged in some ahead-of-her-time thinking—in 1952 she included tips and techniques for presenting charts on color TV.

Jacques Bertin, a cartographer, wanted to ground all this practical advice about chart making in some theoretical foundation. So he formed a theory of information visualization in his watershed 1967 book, Sémiologie Graphique. Rather than focus on which chart types to use and how to use them, Bertin describes an elemental system that still frames and provides the vocabulary for contemporary dataviz theory. He broadly defines seven “visual variables” with which we encode data: position, size, shape, color, brightness, orientation, and texture.3
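Bertin’s visual variables map neatly onto the encodings that modern plotting libraries expose. As an illustrative sketch (matplotlib and the made-up data are my choices, not anything from Bertin), a single scatter plot can carry four of the seven variables at once—horizontal position, vertical position, size, and color:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; renders without a display
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(1)
n = 40
x = rng.uniform(0, 10, n)        # visual variable 1: horizontal position
y = 2 * x + rng.normal(0, 2, n)  # visual variable 2: vertical position
size = rng.uniform(20, 200, n)   # visual variable 3: size
hue = rng.uniform(0, 1, n)       # visual variable 4: color

fig, ax = plt.subplots()
scatter = ax.scatter(x, y, s=size, c=hue, cmap="viridis", alpha=0.7)
fig.colorbar(scatter, ax=ax, label="fourth variable (color)")
ax.set_xlabel("first variable (x position)")
ax.set_ylabel("second variable (y position)")
fig.savefig("bertin_variables.png")
```

Shape, brightness, orientation, and texture—the remaining variables—are also available in most libraries; the chart maker’s job, in Bertin’s terms, is choosing which encodings carry which data.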

Bertin also established two ideas that remain deeply influential to this day. The first is the principle of expressiveness: Say everything you want to say—no more, no less—and don’t mislead. This is a reasonably universal idea: It’s editing. Writers, composers, directors, cooks, people in any creative pursuit, strive (okay, struggle) to pare down their work to the essential.

The second is the principle of effectiveness: Use the best method available for showing your data. That is, choose the visual form that will most efficiently and most accurately convey the data’s meaning. If position is the best way to show your data, use that. If color is more effective, use that. This second principle is obviously trickier, because even today, determining the “best” or “most appropriate” method isn’t easy. Often, what’s best comes down to convention, or taste, or what’s readily available. We’re still learning, scientifically, what’s best, and the process is complicated by the fact that in a world of digital interactivity and animation, what’s best may change from page to screen, or even from screen to screen.

Bertin was followed in the 1970s by John Tukey, a statistician and scientist who was making 3-D scatter plots way back in the mainframe era. Tukey can be credited with popularizing the concepts of exploratory and confirmatory visualization—terms I’ll borrow and use later in this book. Roughly, exploratory visualization is used to find patterns you don’t know are there, while confirmatory visualization is used to show what you know is there.

Jock Mackinlay built on Bertin’s work in his influential 1986 PhD thesis.4 Mackinlay focused on automatically encoding data with software so that people could spend more time exploring what emerged in the visuals and less time thinking about how to create them. He also added an eighth variable to Bertin’s list: motion. Working in computer science at the dawn of the PC era, he could see animation’s powerful application for communicating data.

If Brinton is modern data visualization’s first apostle, and Spear and Bertin its early disciples, Edward Tufte is its current pope. With disciplined design principles and a persuasive voice, Tufte created an enduring theory of information design in The Visual Display of Quantitative Information (1983) and ensuing tomes. For some, Display is visualization gospel, its famous commandments oft repeated. For example: “Above all else show the data” and “Chartjunk can turn bores into disasters, but it can never rescue a thin data set.” Even though his work was rooted in scientific precision, Tufte is to the design-driven tradition what Bertin was to the scientific. A generation of designers and data-driven journalists grew up under the influence of Tufte’s minimalist approach.5

EARLY EVIDENCE

While Tufte was declaring the best ways to create beautiful, effective charts, researchers were learning how people read them. In 1984 William S. Cleveland and Robert McGill took on “graphical perception” by testing how well people could decipher simple charts.6 Pie charts have seemingly been under assault as long as they’ve existed, but Cleveland and McGill provided the first evidence that people find the curved area of pie slices more difficult to parse than other proportional forms. The two instigated a decade-plus of research aimed at understanding how we read charts and applying the results to a burgeoning visual grammar.7 They felt duty-bound to challenge accepted wisdom: “If progress is to be made in graphics,” they concluded, “we must be prepared to set aside old procedures when better ones are developed, just as is done in other areas of science.” A few old procedures were set aside; a few new ones were developed.8 This research deeply influenced the rapidly developing computer science community. Foundational texts that emerged from this era were Cleveland’s The Elements of Graphing Data (1985) and The Grammar of Graphics (1999) by Leland Wilkinson.

Viz communities grew apart. Computer scientists increasingly focused on automation and new ways to see complex data, scientific visualization using 3-D modeling, and other highly specialized techniques. They were comfortable with visualizations that didn’t look great. (In some ways this was unavoidable; computers weren’t very good at graphics yet.) Meanwhile, designers and journalists focused on capturing the mass market with eye-catching, dramatic, and decorated charts and information graphics.

Wedged between these two worlds was Chart Wizard, the Microsoft innovation in its Excel spreadsheet program that married the automation of computer-generated visualization with some design options built in—albeit design options much maligned for their superfluous ineffectiveness. From extraneous three-dimensionality to limited and unintuitive color palettes, Excel charts have become an immediately identifiable trope.

Still, Excel was a democratizing moment that put dataviz in the hands of millions, and the effect of that can’t be overstated.

The internet happened and messed up everything.

REFORMATION

Tufte couldn’t have anticipated when he published Display that the PC, which debuted about the same time as his book, would, along with the internet that runs through it, ultimately overwhelm his restrained, efficient approach to dataviz. This century has brought broad access to digital visualization tools, mass experimentation, and ubiquitous publishing and sharing.9

The early twenty-first century’s explosion of infoviz—good and bad—has spurred a kind of reformation. The two traditions have dozens of offshoots. The followers of Tufte are just one sect now, Catholics surrounded by so many Protestant denominations, each practicing in its own way, sometimes flouting what they consider stale principles from an academic, paper-and-ink world.

Some offshoots have mastered design-driven visualization in which delight and attractiveness are as valuable as precision.10 Others view dataviz as an art form in which embellishment and aesthetics create an emotional response that supersedes numerical understanding.11 There are new storytellers and journalists who use visualization to bolster reporting and to lure and engage audiences.12 Some use it as a means of persuasion, in which accuracy or restraint may be counterproductive.13

No one owns the idea of what data visualization is or should be anymore, because everyone does.

This transfer of ownership from experts to everyone has diminished the influence of scientific research from the 1980s and 1990s. Cleveland and McGill’s results are sound, but most of their work focused on learning how people see static, mostly black-and-white charts, and it was limited to simple tasks such as identifying larger and smaller values. In a full-color, digital, interactive world, new research is needed.

Additionally, two assumptions were embedded in that early research: The first is that chart makers already have the undivided attention of the person decoding the chart. They don’t. You need only look at a Twitter feed, or at all the faces staring down at smartphones during presentations, to know that every chart must fight to be seen. Early research didn’t test how charts gain attention in the first place, which requires different and possibly conflicting techniques from the ones that show data most effectively. For example, complexity and color catch the eye; they’re captivating. They can also make it harder to extract meaning from a chart.

The second assumption is that the most efficient and effective transfer of the encoded data is always our primary goal when creating a visualization. It’s not. Our judgments may not be as precise with pie charts as they are with bar charts, but they may be accurate enough. If one chart type is most effective, that doesn’t mean others are ineffective. Managers know they must make trade-offs: Maybe the resources required to use the best chart type aren’t worth the time or effort. Maybe a colleague just seems to respond more positively to pie charts. Context matters.

EMERGING SCIENCE

The next key moment in the history of dataviz is now. This disruptive, democratizing moment has fractured data visualization into a thousand different ideas, with little agreed-upon science to help put it back together. But a group of active, mostly young researchers have flocked to the field to try. While honoring the work of the 1980s and 1990s, they’re also moving past it, attempting to understand dataviz as a physiological and psychological phenomenon. They’re borrowing from contemporary research in visual perception, neuroscience, cognitive psychology, and even behavioral economics.

Here are some important findings from this new school of researchers:

Chartjunk may not be so bad. Chartjunk is Tufte’s term for embellishment or manipulation—such as 3-D bars, icons, and illustrations—that doesn’t add to data’s meaning or clarity. It has long been scoffed at, but new research suggests that it can make some charts more memorable.14 This does not suggest that overloading a data visualization with adornment is necessarily a good idea—most professionals know the value of restraint. It only suggests that an absolute dictum against chartjunk may be officious. Even if you’re not adding to the meaning, you may be drawing someone’s attention, or you may be giving them a memorable visual cue.

Other studies are evaluating the role of aesthetics, persuasiveness, and memorability in chart effectiveness. The findings aren’t yet definitive, but they won’t all align with the long-held design principles of the past. Some research even suggests that if you have only a few categories of information, a pie chart is probably fine.15

A chart’s effectiveness is not an absolute consideration. Of course, reality is turning out to be far more complicated than “Don’t use pie charts” or “Line charts work best for trends.” Personality type, gender, display media, even the mood you’re in when you see the chart—all will change your perception of the visualization and its effectiveness.16 There may even be times to forgo visualization altogether.17 Research shows that charts help people see and correct their factual misperceptions when they’re uncertain or lack strong opinions about a topic. But when we understand a topic well or feel deep opposition to the idea being presented, visuals don’t persuade us. Charts that present ideas counter to our strongly held beliefs threaten our sense of identity; when that happens, simply presenting more and more visuals to prove a point seems to backfire. (The research goes on to suggest that what’s more persuasive in those situations is affirmation—being reminded that we’re good, thoughtful people.18)

Our visual systems are quite good at math. In some cases we can process multiple visual cues simultaneously (say, color, size, and position), and when we’re looking at charts with multiple variables, our ability to identify average values and variability is more precise than when we’re looking at numbers. That is, show me many numbers in a spreadsheet and ask me to estimate their average, or how much change occurs within them, and I won’t do as well as if you show me, say, a scatter plot and ask me to do the same. Ronald Rensink at the University of British Columbia and, later, Lane Harrison at Tufts University have also shown that we can sense correlation in charts in a predictable way, and how effective that sense is varies from chart type to chart type—allowing us to rank order the effectiveness of certain visual forms for showing correlation (more on this in the next chapter).

All of this suggests that visual representation is even more powerful than we know and sometimes a more intuitive and human way to understand values than statistics is.19

Visualization literacy can be measured. Some researchers are attempting to create standard visual literacy levels. Early results suggest that most people test just below what could be considered “dataviz literate,” but that they can be taught to become proficient or even fluent with charts and graphs.20 This research also shows that we don’t trust our judgments of charts as much as we should: Even when we correctly identify the idea a chart conveys, we want to check whether we’re right. Helen Kennedy, a professor and researcher at Leeds University, has done groundbreaking work here on defining what seems to matter with datavisual literacy and our confidence. Many of the findings are expected—we need to be confident in our math abilities and our familiarity with visual forms. But others are surprising; for example, emotions play a large role in how people respond to visualization. (More to come on this topic in chapter 6.)

In just over a century, data visualization has evolved from manuals of simple visual grammar to frameworks for understanding the practice to, now, more-sophisticated discussions about visualization’s role in the world. Whereas Brinton and Spear were concerned with simply helping people get their cross-hatching right, today entire books have been written on visualization and misinformation, and data and feminism. Kennedy herself has researched understanding visualization’s role in public discourse, diversity, and how we live. She coedited Data Visualization in Society in 2020, a collection of scholarly articles that aims to do no less than create a philosophy of visualization, attempting to answer questions such as: Can visualizations be objective? What is the value of beauty in data visualization? How do charts affect policies and institutions? What role do emotions play in dataviz?21

You may look askance at these questions; you’re just here to learn how to make some good charts. Don’t worry, this book focuses on those practical skills and techniques. Still, as you learn them, you’ll find these questions naturally emerge when you search for visual solutions to presenting your data and when you observe others’ work—just as they emerged during the pandemic, when charts became a key force in informing and debating trends. They’re worth considering.

A RETURN TO CRAFT

The science of visualization and information design is hurtling forward, but it will not stamp out the art of it. If anything, the science has demonstrated the need for the art. We know, empirically, that skillful design plays a role in effective visual communication. Humans have subjective feelings about data visualizations that can’t be ignored in the process of creating them.

And so, the two broad communities in the visualization world—the computer-driven science community and the design-driven creative community, the Tukeys and the Tuftes—that were cleaved in the late twentieth century have drifted back toward each other.

This is mostly out of necessity. The volume of data we have demands automation and machine processing; at the same time, the tools we have to turn these massive pools of data into visual information still aren’t good at understanding (never mind setting) the human context the data will be used in.

So on the one side, the technology is limited in its ability to intuit human needs and desires. For example, no computer program can ever know the needs of my audience and what part of my data is most important to them. If I generate a line chart with five lines on it, the software treats them all equally—each gets a unique color, they are all the same thickness, and none looks more important than another. But usually, one line is more important than the others for my audience. Usually, I want the audience to focus on that line while using the rest as reference points that give the primary data context. I need to design my intention into the visualization. Only a person, not a computer, can know how to group the variables, or how to change the range of the axis to create a certain focus, or how to overlay information that’s not in the data set to bring an idea into high relief.
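To make this concrete, here is a minimal sketch (matplotlib, with invented data; the “focus one line, gray out the rest” treatment is a common design convention, not a prescription from this chapter) of designing intention into a default five-line chart:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; renders without a display
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(2)
months = np.arange(1, 13)
# Five hypothetical series the software would otherwise treat equally
lines = {f"Product {c}": 50 + rng.normal(0, 4, 12).cumsum()
         for c in "ABCDE"}
focus = "Product C"  # the series the audience should see first

fig, ax = plt.subplots()
for name, values in lines.items():
    if name == focus:
        ax.plot(months, values, color="crimson", linewidth=2.5,
                zorder=3, label=name)   # emphasized: the message
    else:
        ax.plot(months, values, color="0.8", linewidth=1,
                zorder=1, label=name)   # de-emphasized: context
ax.set_xlabel("Month")
ax.legend(fontsize=8)
fig.savefig("focused_line_chart.png")
```

The data and defaults come from the machine; the decision about which line is crimson comes from a person who knows the audience.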

On the other side, to find signals in the noise, I need to process and visualize hundreds, thousands, millions of data points. I need to make hypotheses and test them by generating quick visuals to see what’s in the data. I need to be able to react to dynamic, changing data sources. This is the work of the machines; no human can manage this.

We need both the science and the art. And as we’ll see later, we need teams to marry objective data with the needs of human beings in specific situations. It’s the combination of tools and people that makes effective visualization.


As the grammar of graphics evolves (and it will continue to evolve, just as linguistic grammar does), visualization will remain what it always has been—an intermingling of the scientific and design traditions. It will be a mash-up of art and science, of taste and proof. But even if the grammar were already fully developed, understanding it alone wouldn’t ensure good charts, just as knowing the rules for prepositions and the passive voice doesn’t ensure good writing. The task at hand remains the same: We must learn to think visually, to understand the context, and to design charts that communicate ideas, not data sets.

And the best way to start learning how to produce good charts is to understand how people consume them. That starts by understanding some of the basics of visual perception.

RECAP

A BRIEF HISTORY OF DATA VISUALIZATION

Visual communication is primal, but what we now think of as data visualization started just two centuries ago. The history of visualization provides a foundation for learning and helps dispel several misconceptions about the practice. Above all, it allows us to dismiss the myth that dataviz is a fully formed science with rules that must be obeyed. In fact, dataviz is a craft that relies on both art and science, in which experimentation and innovation should be rewarded, not punished.

A TIMELINE OF SOME KEY MOMENTS:

Late 1700s

William Playfair produces what are often considered the first modern charts, including line charts, bar charts, pie charts, and timelines.

1858

Florence Nightingale produces “coxcomb diagrams” that show the devastating effect of disease on the British army.

1869

Charles Minard publishes a diagram showing the toll taken on Napoleon’s army by his march on Russia.

1914

Willard Brinton publishes Graphic Methods for Presenting Facts, the first book about visualization for business.

1952

Mary Eleanor Spear publishes Charting Statistics, a book of chart-making best practices based on decades of work with many groups in the U.S. government.

1967

Jacques Bertin publishes Sémiologie Graphique, the first overarching theory of visualization, and one that remains deeply influential. Bertin describes seven “visual variables”: position, size, shape, color, brightness, orientation, and texture. He also establishes two core principles: the principle of expressiveness (show what you need to; no more, no less) and the principle of effectiveness (use the most efficient method available to visualize your information).

1970s

John Tukey pioneers the use of visualization with computers and popularizes the concepts of exploratory visualization (finding patterns in data that you don’t know are there) and confirmatory visualization (showing patterns in the data that you know are there).

1983

Edward Tufte publishes The Visual Display of Quantitative Information, combining statistical rigor with clear, clean design principles and inspiring two generations of information designers and data journalists.

1984

William Cleveland and Robert McGill publish the first of several research papers that attempt to measure “graphical perception,” setting off two decades of research into what makes visualizations effective.

1986

Jock Mackinlay publishes his highly influential PhD thesis, which carries Jacques Bertin’s work into the digital age.

1990

Microsoft introduces Chart Wizard into its Excel spreadsheet program, allowing millions to create fast, quasi-designed visualizations.

1990s–2000s

The computer-driven, scientific visualization community and the design-driven, journalistic visualization community diverge in their approaches to dataviz.

2010

Ronald Rensink publishes research suggesting that our perception of correlation in a scatter plot follows what’s known as Weber’s law and, for the first time, that a method for calculating a chart type’s effectiveness may exist.

2010s

The social internet, cheap and easy-to-use software, and massive volumes of data democratize the practice of visualization, creating mass experimentation. Viz is no longer the province of a small community of experts; it’s an internet phenomenon.

2014

Lane Harrison replicates Rensink’s findings and applies them to additional chart types. He creates a ranking of chart-type effectiveness for showing correlation. Harrison’s work is part of a new generation of research into establishing science around graphical perception, which draws on many other disciplines, including psychology, neuroscience, and economics.

2016

Helen Kennedy publishes an influential paper on visual literacy and the critical role emotions and “feelings of numbers” play in helping us make sense of data. She will push her research in the coming years into new territory of understanding the role of visualization in influencing people, affecting equality, diversity, and the effectiveness of institutions.

Today

Experimentation continues across a broad spectrum of disciplines. Tools for visualizing increasingly improve. They create better charts faster and allow for interactivity and dynamic updating of visuals. Social media is rife with visualization by professionals and amateurs. The discipline is a mass phenomenon.
