CHAPTER 2

A Simple Exercise to Help You Think Like a Data Scientist

by Thomas C. Redman

For 20 years, I’ve used a simple exercise to help those with an open mind (and a pencil, paper, and calculator) get started with data. One activity won’t make you data savvy, but it will help you become data literate, open your eyes to the millions of small data opportunities, and enable you to work a bit more effectively with data scientists, analytics, and all things quantitative.

While the exercise is very much a how-to, each step also illustrates an important concept in analytics—from understanding variation to visualization.

First, start with something that interests, even bothers, you at work, like consistently late-starting meetings. Form it up as a question and write it down: “Meetings always seem to start late. Is that really true?”

Next, think through the data that can help answer your question and develop a plan for creating it. Write down all the relevant definitions and your protocol for collecting the data. For this particular example, you have to define when the meeting actually begins. Is it the time someone says, “OK, let’s begin”? Or the time the real business of the meeting starts? Does kibitzing count?
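One way to pin down the definitions is to write them out as a record, so every observation captures the same fields. The sketch below is only an illustration: the class name, the field names, and the sample observation are assumptions, and it takes "actual start" to mean the moment the real business of the meeting begins.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class MeetingRecord:
    """One observed meeting. All field names are illustrative assumptions."""
    scheduled_start: datetime  # time on the calendar invite
    actual_start: datetime     # per the protocol: when real business begins
    organizer: str             # who called the meeting

    @property
    def minutes_late(self) -> float:
        """Minutes between scheduled and actual start (0 means on time)."""
        delta = self.actual_start - self.scheduled_start
        return delta.total_seconds() / 60


# Hypothetical observation, not data from the text.
record = MeetingRecord(
    scheduled_start=datetime(2013, 11, 4, 9, 0),
    actual_start=datetime(2013, 11, 4, 9, 12),
    organizer="Team lead",
)
print(record.minutes_late)  # 12.0
```

If the definition of "actual start" changes partway through, note the date of the change so earlier and later observations are not mixed silently.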

Now collect the data. It is critical that you trust the data. And, as you go, you’re almost certain to find gaps in data collection. You may find that even though a meeting has started, it starts anew when a more senior person joins in. Modify your definition and protocol as you go along.

Sooner than you think, you’ll be ready to start drawing some pictures. Good pictures make it easier for you to both understand the data and communicate main points to others. There are plenty of good tools to help, but I like to draw my first picture by hand. My go-to plot is a time-series plot, where the horizontal axis has the date and time and the vertical axis has the variable of interest. Thus, each point on the graph in figure 2-1 plots the date and time of a meeting against the number of minutes it started late.
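For readers who would rather sketch the first picture on a screen than on paper, here is a rough text-only version of a time-series plot. It is a minimal sketch under stated assumptions: the observations are hypothetical, the function name `ascii_time_series` is invented for this example, and delays are grouped into 5-minute bands to keep the chart short.

```python
from datetime import datetime

# Hypothetical observations: (scheduled start, minutes late).
observations = [
    (datetime(2013, 11, 4, 9, 0), 12),
    (datetime(2013, 11, 4, 14, 0), 0),
    (datetime(2013, 11, 5, 10, 0), 18),
    (datetime(2013, 11, 6, 13, 0), 8),
    (datetime(2013, 11, 7, 11, 0), 27),
]


def ascii_time_series(points, step=5):
    """Render minutes late (vertical) against meetings in time order (horizontal)."""
    points = sorted(points)  # chronological order
    bands = [late // step for _, late in points]  # band 0 covers 0-4 minutes, etc.
    lines = []
    for band in range(max(bands), -1, -1):
        marks = "".join(" o " if b == band else "   " for b in bands)
        lines.append(f"{band * step:>3} |{marks}")
    lines.append("    +" + "---" * len(points))
    # Label each column with the day of the month of the meeting.
    lines.append("     " + "".join(f" {when:%d}" for when, _ in points))
    return "\n".join(lines)


chart = ascii_time_series(observations)
print(chart)
```

Note that the horizontal axis here shows meetings in chronological order, labeled by day of the month, rather than a true time scale. That is usually good enough for a first look.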

Now return to the question that you started with and develop summary statistics. Have you discovered an answer? In this case, “Over a two-week period, 10% of the meetings I attended started on time. And on average, they started 12 minutes late.”
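The two summary statistics quoted above take only a few lines to compute. In this sketch the list of delays is hypothetical, chosen so that it reproduces the 10% and 12-minute figures; it is not the data behind the chapter's example.

```python
from statistics import mean

# Hypothetical delays in minutes for ten meetings (0 means on time).
minutes_late = [0, 8, 10, 12, 14, 15, 9, 20, 13, 19]

on_time_share = sum(1 for m in minutes_late if m == 0) / len(minutes_late)
average_late = mean(minutes_late)

print(f"{on_time_share:.0%} of meetings started on time")
print(f"Average delay: {average_late:.0f} minutes")
```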

FIGURE 2-1

How late are meetings?

But don’t stop there. Ask yourself, “So what?” In this case, “If those two weeks are typical, I waste an hour a day. And that costs the company x dollars a year.”
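The "so what?" arithmetic can be written down explicitly. This is a sketch only: the function name and every input other than the 12-minute average are assumptions. Five meetings a day is simply the count that turns a 12-minute average delay into an hour a day, and the hourly cost and working days are placeholders to replace with your own figures.

```python
def annual_cost_of_late_starts(avg_minutes_late, meetings_per_day,
                               hourly_cost, working_days_per_year):
    """Return (hours lost per day, cost per year) from waiting for meetings to start."""
    hours_lost_per_day = avg_minutes_late * meetings_per_day / 60
    cost_per_year = hours_lost_per_day * hourly_cost * working_days_per_year
    return hours_lost_per_day, cost_per_year


# All inputs except the 12-minute average are hypothetical placeholders.
hours, cost = annual_cost_of_late_starts(
    avg_minutes_late=12,
    meetings_per_day=5,
    hourly_cost=50.0,
    working_days_per_year=230,
)
print(f"{hours:.1f} hours lost per day, {cost:,.0f} dollars per year")
```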

Many analyses end because there is no “so what?” Certainly if 80% of meetings start within a few minutes of their scheduled start times, the answer to the original question is, “No, meetings start pretty much on time,” and there is no need to go further.

But this case demands more, as some analyses do. Get a feel for variation. Understanding variation leads to a better feel for the overall problem, deeper insights, and novel ideas for improvement. Note on the graph that 8–20 minutes late is typical. A few meetings start right on time, others nearly a full 30 minutes late. It would be great if you could conclude, “I can get to meetings 10 minutes late, just in time for them to start,” but the variation is too great.
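A few numbers can back up what the eye sees in the plot. The sketch below uses hypothetical delays (not the data in figure 2-1) to compute the range, the quartiles, and the standard deviation, and then checks the "arrive 10 minutes late" idea directly by counting how many meetings would already be under way.

```python
from statistics import quantiles, stdev

# Hypothetical delays in minutes; not the data behind figure 2-1.
minutes_late = [0, 8, 9, 11, 12, 14, 15, 17, 20, 28]

spread = max(minutes_late) - min(minutes_late)
q1, median, q3 = quantiles(minutes_late, n=4)  # quartile cut points
typical_swing = stdev(minutes_late)            # sample standard deviation

# Share of meetings already under way if you habitually arrive 10 minutes late.
arrival = 10
missed_share = sum(1 for m in minutes_late if m < arrival) / len(minutes_late)

print(f"Range: {spread} minutes; middle half: {q1:.1f} to {q3:.1f}")
print(f"Standard deviation: {typical_swing:.1f} minutes")
print(f"Arriving {arrival} minutes late misses the start of {missed_share:.0%} of meetings")
```

With these made-up numbers, showing up 10 minutes late would mean walking into three meetings in ten after they had begun, which is the sense in which the variation is too great.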

Now ask, “What else does the data reveal?” It strikes me that six meetings began exactly on time, while every other meeting began at least seven minutes late. In this case, bringing meeting notes to bear reveals that all six on-time meetings were called by the vice president of finance. Evidently, she starts all her meetings on time.
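Bringing meeting notes to bear amounts to grouping the delays by a second variable, here the person who called the meeting. The records below are hypothetical and the organizer labels are invented for illustration; the point is only to show how a pattern like the VP's on-time meetings would surface.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical records: (organizer, minutes late).
meetings = [
    ("VP of finance", 0),
    ("VP of finance", 0),
    ("Team lead", 12),
    ("Team lead", 18),
    ("Project manager", 9),
    ("Project manager", 27),
]

# Collect the delays for each organizer.
by_organizer = defaultdict(list)
for organizer, late in meetings:
    by_organizer[organizer].append(late)

# For each organizer: number of meetings, average delay, worst delay.
summary = {
    organizer: (len(delays), mean(delays), max(delays))
    for organizer, delays in by_organizer.items()
}

# Organizers whose worst delay is zero started every meeting on time.
always_on_time = [name for name, (_, _, worst) in summary.items() if worst == 0]
print(always_on_time)
```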

So where do you go from here? Are there important next steps? This example illustrates a common dichotomy. On a personal level, results pass both the “interesting” and “important” test. Most of us would give almost anything to get back an hour a day. And you may not be able to make all meetings start on time, but if the VP can, you can certainly start the meetings you control promptly.

On the company level, results so far pass only the interesting test. You don’t know whether your results are typical, nor whether others can be as hard-nosed as the VP when it comes to starting meetings. But a deeper look is surely in order: Are your results consistent with others’ experiences in the company? Are some days worse than others? Which starts later: conference calls or face-to-face meetings? Is there a relationship between meeting start time and most senior attendee? Return to step one, pose the next group of questions, and repeat the process. Keep the focus narrow—two or three questions at most.

I hope you’ll have fun with this exercise. Many find joy in teasing insights from data. But whether you experience that joy or not, don’t take this exercise lightly. There are fewer and fewer places for the “data illiterate” and, in my humble opinion, no more excuses.

__________

Thomas C. Redman, “the Data Doc,” is President of Data Quality Solutions. He helps companies and people, including startups, multinationals, executives, and leaders at all levels, chart their courses to data-driven futures. He places special emphasis on quality, analytics, and organizational capabilities.


Adapted from “How to Start Thinking Like a Data Scientist” on hbr.org, November 29, 2013.
