Step 3: Extracting Statistics from the Data

Brief Introduction to Statistics

As discussed in the previous section, a dataset is not usually helpful on its own. The next step is to extract from your data just a few representative numbers – we call these statistics – that tell us something about the data and therefore give us condensed information about the piece of the world we are studying. Here are some examples:
  1. An average: The simple average (say, the average age of your employees) is a good example. It is a single number that summarizes the age data of all your employees.
  2. Statistics that measure relationships between concepts: Other types of statistics summarize relationships between concepts. Figure 2.1 (The fundamental process of statistics) gives an example, in which we wish to test whether employee engagement (one concept measured by data) is related to customer retention (another concept and set of data). Perhaps you wish to see if higher employee engagement leads to higher customer retention. Just a single statistical number (or perhaps a few) can summarize the relationship between these two concepts.
  3. Patterns over time: Perhaps you are interested in the movement of a single data variable over time, like the sales of a product over many months. All the sales figures may be boiled down to a few statistics that tell us the extent to which the sales figures have been growing, shrinking or staying stable, or perhaps statistics that summarize whether there are seasonal highs or lows in the data.
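The three kinds of statistics listed above can be sketched in a few lines of code. This is a minimal illustration in Python rather than the SAS used elsewhere in the book, and every number in it is invented for the example. The helper names `pearson_r` and `trend_slope` are hypothetical, chosen here for illustration only.

```python
from statistics import mean

# Hypothetical data, invented purely for illustration.
ages = [25, 32, 41, 38, 29]                   # employee ages
engagement = [3.1, 3.8, 4.2, 2.9, 4.5]        # engagement score per branch
retention = [0.71, 0.80, 0.86, 0.69, 0.88]    # customer retention per branch
monthly_sales = [100, 104, 110, 113, 121, 125]


def pearson_r(x, y):
    """Pearson correlation: one number summarizing a linear relationship."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def trend_slope(y):
    """Least-squares slope of y against its position in time (0, 1, 2, ...)."""
    t = list(range(len(y)))
    mt, my = mean(t), mean(y)
    num = sum((a - mt) * (b - my) for a, b in zip(t, y))
    den = sum((a - mt) ** 2 for a in t)
    return num / den


average_age = mean(ages)                  # 1. an average
r = pearson_r(engagement, retention)      # 2. a relationship between concepts
slope = trend_slope(monthly_sales)        # 3. a pattern over time

print(f"Average age: {average_age}")
print(f"Engagement/retention correlation: {r:.2f}")
print(f"Average sales change per month: {slope:.2f}")
```

Note that a correlation only summarizes how closely two variables move together. On its own it does not show that engagement leads to retention.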
This is not the end of the process. Once you have extracted these statistics you need to assess two things about them.

Do We Trust the Statistics? Are They Accurate?

Just because you extracted a representative number from a dataset – such as an average – does not mean you should believe or trust the result. An important initial step in the statistical universe is scrutinizing the results for accuracy, even before we understand what the statistics mean. This is a topic that Chapter 12 addresses in detail.
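One common way to gauge how far an average can be trusted is to attach a margin of uncertainty to it. The sketch below is only an illustration under stated assumptions: the ages are invented, the data is treated as a random sample, and the interval uses the normal approximation (z = 1.96), which is rough for a sample this small. It is not a summary of the methods in Chapter 12.

```python
from statistics import mean, stdev

# Hypothetical sample of employee ages, invented for illustration.
ages = [25, 32, 41, 38, 29, 35, 44, 27]

n = len(ages)
avg = mean(ages)

# Standard error: how much the average would typically vary
# from one random sample of this size to another.
std_err = stdev(ages) / n ** 0.5

# Rough 95% interval using the normal approximation (z = 1.96).
# Assumption: for small samples a t-distribution would be more appropriate.
low = avg - 1.96 * std_err
high = avg + 1.96 * std_err

print(f"Average age: {avg:.1f} (roughly {low:.1f} to {high:.1f})")
```

A wide interval is a signal that the single number, taken alone, deserves less trust.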

What Do the Statistics Mean?

Having established that the statistic is accurate enough to trust, the major step is to establish what it means. Getting a representative statistic does not necessarily mean that you understand what it is telling you!
The skill of understanding what a statistic actually means is one that many people lack, even if they are good at deriving the statistics. So what if you can find some statistical number that represents the effectiveness of your new drug on a disease? Do you really understand what the statistic is saying about the effectiveness? Can you think further about the potential demand for the drug, its profitability, how it compares to other remedies, and what effect the drug would then have on people’s broader lives? Really understanding the meaning and impact of a statistical number is therefore perhaps the most crucial skill of all.
Based on the above, there are two levels of understanding a statistic:
  1. Understanding a statistic in its context: Every statistic needs to be assessed for meaning and impact in its own context. By this I mean asking questions like whether the statistic is big or not. This would have to be decided in comparison to some benchmarks. For instance, if you have a statistic informing you about how customers react to an advertisement, you need to ask whether the average reactions are good enough (big enough) or not. What is your benchmark? Is it no (zero) customer reaction to the ad? Is it reactions to your past ads, to your competitors’ ads? Your benchmark is highly contextual, and different benchmarks can affect your judgment of what the statistic means. Chapter 11 further addresses this skill of assessing the meaning of statistics.
  2. Extrapolating the meaning of the statistic to broader contexts: Really good business statisticians are able to go beyond the statistic on its own. They are able to extrapolate the meaning to further contexts. For instance, continue with the example above in which statistics measure the impact of an ad on customers. Showing you have impacted customers is one thing. However, can you then take the changes in customer behavior and show how much profitability or return on investment the ad can generate overall? If you are testing a drug, can you combine statistics on its healing efficacy with other information to extrapolate the ultimate impact on the buyer’s lifespan or quality of life? Can you again estimate return on investment? Chapter 17 discusses this in more detail.
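Both levels of understanding can be illustrated with the advertising example. The sketch below is hypothetical throughout: the response rate, the benchmarks, the audience size, the profit per response, and the ad cost are all invented, and the function name `return_on_investment` is an illustrative choice rather than anything defined in the book.

```python
def return_on_investment(extra_profit, cost):
    """ROI as a fraction of cost: (gain - cost) / cost."""
    return (extra_profit - cost) / cost


# Level 1: the same statistic judged against different benchmarks.
response_rate = 0.045  # hypothetical share of customers reacting to the new ad
benchmarks = {
    "no reaction": 0.0,
    "past ads": 0.030,
    "competitor ads": 0.050,
}
differences = {name: response_rate - b for name, b in benchmarks.items()}
for name, diff in differences.items():
    print(f"Versus {name}: {diff:+.3f}")

# Level 2: extrapolating the statistic to a broader business outcome.
audience = 200_000          # hypothetical number of customers reached
profit_per_response = 12.0  # hypothetical profit from each extra response
ad_cost = 50_000.0          # hypothetical cost of the campaign

extra_responses = differences["past ads"] * audience
extra_profit = extra_responses * profit_per_response
roi = return_on_investment(extra_profit, ad_cost)

print(f"Extra responses versus past ads: {extra_responses:.0f}")
print(f"Return on investment: {roi:.0%}")
```

With these invented numbers the ad beats two of the three benchmarks, yet the extrapolated return on investment is negative. The statistic alone would not have revealed that.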
In addition, if you are a complete beginner in statistics, note the following.

The Process of Generating Statistics: Math Versus Computers

For centuries prior to the advent of computerization, if you wanted to take raw data and get statistics you needed to use mathematics and hard work. Underlying most statistics is some form of mathematics. I discuss this further in Chapter 11.
These days, luckily, we usually use computer programs to do the hard number crunching for us. Examples of such programs are SAS, SPSS, Stata, Statistica, NCSS, Microsoft Excel, and a great many more. These programs look at your raw data and – based on what you tell them you want – report the estimated statistics, from averages to complex relational statistics.
As discussed in the preface to the book, this book uses SAS, which is one of the world’s most used statistics suites and is the leading player in business analytics in particular.
It is crucial to note that many statistics mistakes occur because people give the computer program bad data or the wrong instructions, or because people do not adequately scrutinize the computer results. Please understand that the fact that the computer has given you a statistical result does not mean that it is the right output, or indeed that the result should be taken seriously! The computer is just a tool, and cannot do the real work for you, like making sure your data is right, understanding which statistical tests are appropriate, and understanding the results and what to do about them. One of my favorite acronyms is “PICNIC,” which stands for “Problem In Chair Not In Computer.” So, try not to rely on the computer for more than it can do.
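A small sketch shows how easily bad data produces a confident but wrong answer. It assumes a hypothetical dataset in which missing ages were entered as the code 999. Both the data and the coding convention are invented for illustration.

```python
from statistics import mean

MISSING = 999  # hypothetical code that the data-entry staff used for "age unknown"

# Hypothetical raw data: two employees have no recorded age.
raw_ages = [25, 32, 999, 41, 38, 999, 29]

# The computer happily averages whatever it is given.
naive_average = mean(raw_ages)

# The person in the chair has to know that 999 is not a real age.
valid_ages = [age for age in raw_ages if age != MISSING]
cleaned_average = mean(valid_ages)

print(f"Average without checking the data: {naive_average}")
print(f"Average after removing missing-value codes: {cleaned_average}")
```

The program raises no error in either case. Only someone who knows the data can tell that an average age of 309 is nonsense.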
Last updated: April 18, 2017