Step 3: Interpret the Pattern

Introduction to Interpreting Patterns

If a pattern is detected in Step 2, the third step is to interpret this pattern in as much detail, and with as much creativity, as you can. This is typically where many researchers fail, because it’s one thing to find a pattern but another to know what to do about it.

Steps in Interpreting Statistical Patterns

The following are steps that I usually advocate:

Summarize the pattern into statistical coefficients and parameters (the model): In Step 2 (the previous section) we discussed the fact that finding patterns in data most often means comparing the actual data to a mathematical model, which is a precise data relationship, like a normal distribution, an exact correlation-type straight line, another type of line (perhaps a parabola), and so on. Once fit is established, we assume that, because the data fits an exact mathematical model, the mathematical model in fact represents the data. Therefore we stop looking at the data and start analyzing what the model tells us. As discussed in Step 2, mathematical models can usually be reduced to a few summary statistical numbers (we call them “coefficients” or “parameters”) that describe the shape of the model. We find these statistical parameters.
Analyze the model: We analyze the statistical coefficients from the previous step, which describe and summarize the pattern in the data. This analysis includes several things:
1. The size of each statistical parameter
2. The accuracy of each statistical measure (see Chapter 12 for more)
3. The practical and theoretical implications of the coefficients

Let us think further about the example in Figure 11.7 Fitting a straight line to data. Here we have data and hope to fit a straight line to it. If the data lie close to the straight line, we start interpreting what the straight line tells us.

In the example we said that the straight line equation might be Sales = $24,500 + $7,200*(Age). The two numbers in the equation are, in the case of the straight line, the intercept and slope (generally, such numbers are called parameters or coefficients).

So what do these parameters mean? The intercept tells us the starting point before age is considered. The slope is often of greatest interest when straight lines are considered, and we would ask the following questions of the slope of $7,200:

What does it mean, literally? In this case, for every year older a salesperson gets, on average he or she sells $7,200 more.
Is the coefficient positive or negative? In this case it is positive, because as age increases, sales increase.
Is the slope big or small? Is $7,200 a big increase? It depends on the context of the data and research. How much variation in ages between salespeople do we expect? Is $7,200 an important jump in sales given the context and measurement of sales?
Is the slope trustworthy, i.e. accurate enough that we would place our trust in it? Is it so inaccurate that it could just as easily be zero as $7,200? I cover accuracy in a lot more detail in the following chapter.
What theories explain the finding?
What are the implications of the coefficients? Theoretically, have we learned anything new? Practically, should the company now hire older salespeople because they sell more? Are increased sales worth the expenses and any other issues of an older salesforce?
Why might we doubt the findings?
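The first few questions above (literal meaning, sign, size, and trustworthiness of the slope) can be sketched in code. The following is only a minimal illustration, not the method behind Figure 11.7: the data are synthetic, generated around the example line Sales = $24,500 + $7,200*(Age) with an assumed amount of noise, and the function name `fit_line` and the "two standard errors" rule of thumb are choices made for this sketch.

```python
import math
import random


def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope * x.

    Returns (intercept, slope, slope_standard_error).
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Residual variance uses n - 2 degrees of freedom (two fitted parameters).
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    residual_var = sum(r ** 2 for r in residuals) / (n - 2)
    slope_se = math.sqrt(residual_var / sxx)
    return intercept, slope, slope_se


# Synthetic data (an assumption for illustration, not real sales figures).
rng = random.Random(0)
ages = [rng.uniform(22, 60) for _ in range(200)]
sales = [24_500 + 7_200 * age + rng.gauss(0, 40_000) for age in ages]

intercept, slope, slope_se = fit_line(ages, sales)

# Rough accuracy check: a band of about two standard errors around the slope.
low, high = slope - 2 * slope_se, slope + 2 * slope_se

print(f"Intercept: ${intercept:,.0f}")
print(f"Slope: ${slope:,.0f} more sales per extra year of age")
print("Sign:", "positive" if slope > 0 else "negative or zero")
print(f"Rough range for the slope: ${low:,.0f} to ${high:,.0f}")
print("Could the slope plausibly be zero?", low <= 0 <= high)
```

If the rough range included zero, the slope could just as easily be zero as $7,200, and the remaining questions about theory and implications would have little to stand on. The remaining questions cannot be answered by code; they need knowledge of the context.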

Analyzing the implications of models is too often a step that researchers omit or do badly, so the next section discusses this in more detail.

Implications of the Model and Coefficients

Reflecting On & Expressing the Implications

When you get the few statistical numbers (coefficients and parameters) that summarize the shape you found in your data, these may or may not have useful implications for you. You need to decide whether they do and, if so, to think those implications through thoroughly.

First and foremost, some of the coefficients will be too small or inaccurate to take seriously (Chapter 12 discusses these concepts in more detail). This occurs even when you find a strong pattern, and you must be willing to accept that although you found a pattern, its implications are not what you perhaps hoped for. Perhaps you find a pattern, but its parameters suggest the specific relationship you are interested in is small, or even runs in the opposite direction to what you expected.

For an illustration of this thinking, let us once again examine straight lines (linear relationships). Looking for such relationships in data is a classic example of the difference between finding a pattern and interpreting the pattern. We will pick up again on these concepts in Chapter 13. Take a look at Figure 11.8 Finding sloped versus flat straight lines in data, where in both of two different plots we see a straight-line relationship between the data of two variables. In both cases Step 2 of the global statistics process is fulfilled: there is a strong pattern found in the data! However, on the left hand side of Figure 11.8 Finding sloped versus flat straight lines in data the straight line is upward sloping, while on the right it is flat. These patterns then will have very different interpretations:

The pattern on the left of Figure 11.8 Finding sloped versus flat straight lines in data indicates a linear association between Variable 1 and Variable 2: a positive correlation. Implications might arise: you may choose to believe that knowing the level of Variable 2 now allows you to predict the level of Variable 1. Perhaps Variable 1 is sales: predicting low versus high levels of sales is useful and could make you a lot of money.

Figure 11.8 Finding sloped versus flat straight lines in data

On the other hand, while the right hand side of Figure 11.8 Finding sloped versus flat straight lines in data shows a straight line, this line is flat. Regardless of the level of Variable 2, Variable 1 occurs within a narrow band of possible values. There is a near zero correlation here, as there is no upward or downward slope (remember that a correlation means that when one variable is higher, the other tends to be higher or lower: a slope). There is still an interpretation: in this case Variable 1 stays close to its average at every level of Variable 2, as shown by the dashed line in the right panel of Figure 11.8 Finding sloped versus flat straight lines in data. However, if you were expecting a correlation, you do not get what you wanted here. You need to interpret accordingly, not try to torture an upward or downward slope out of this data.
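The contrast between the two panels can be reproduced with a small sketch. The two datasets below are synthetic stand-ins for the panels of Figure 11.8, not the figure's actual data; the names `sloped` and `flat`, the noise level, and the helper `slope_and_correlation` are assumptions made for illustration.

```python
import math
import random


def slope_and_correlation(x, y):
    """Return (least-squares slope of y on x, Pearson correlation)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return sxy / sxx, sxy / math.sqrt(sxx * syy)


rng = random.Random(1)
variable_2 = [rng.uniform(0, 10) for _ in range(300)]

# Left panel stand-in: Variable 1 rises with Variable 2.
sloped = [2.0 * v + rng.gauss(0, 1) for v in variable_2]
# Right panel stand-in: Variable 1 hovers around its average of 5.
flat = [5.0 + rng.gauss(0, 1) for _ in variable_2]

sloped_slope, sloped_r = slope_and_correlation(variable_2, sloped)
flat_slope, flat_r = slope_and_correlation(variable_2, flat)

print(f"Sloped: slope={sloped_slope:.2f}, correlation={sloped_r:.2f}")
print(f"Flat:   slope={flat_slope:.2f}, correlation={flat_r:.2f}")
```

Both datasets sit tightly around a straight line, so both pass the "is there a pattern" test, but only the first has a slope and correlation worth interpreting as an association.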

Hopefully you can find a pattern that also has important implications for you. The next section discusses interpreting such patterns.

Theoretical and Practical Implications of Models

When you find a pattern in data, I encourage you to ask and report on the following four questions:

If I found a pattern I was expecting, what theory backs this pattern up? (In the case of theory-based analysis you would have established this; in data mining, you should seek a theory that explains your finding). Reference the theories that back your findings up!
If I found a pattern I was not expecting, what alternate theories might explain it? (For an example, see again the Gary Becker illustration in What to Do When You Don’t Find Your Expected Pattern.)
What practical implications does my pattern hold? For example, in business studies you might ask:
1. What business strategies or tactics might my findings stimulate?
2. Can I extrapolate my model to other important implications, such as profitability?
If I were advising future researchers following in my footsteps, what improvements or extensions on this research would I advise?
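As a rough sketch of the second practical question (extrapolating a model to profitability), the earlier straight line Sales = $24,500 + $7,200*(Age) can be pushed one step further. Only the sales equation comes from the example; the profit margin and the extra cost per year of age below are purely hypothetical numbers chosen for illustration, as is the function name `predicted_profit`.

```python
def predicted_sales(age):
    """Straight-line model from the example: Sales = $24,500 + $7,200 * Age."""
    return 24_500 + 7_200 * age


def predicted_profit(age, margin=0.20, extra_cost_per_year=1_000):
    """Hypothetical extrapolation from sales to profit.

    margin: assumed share of sales kept as profit (not from the source).
    extra_cost_per_year: assumed extra employment cost per year of age
        (not from the source).
    """
    return margin * predicted_sales(age) - extra_cost_per_year * age


# Compare two hypothetical salespeople who differ by ten years of age.
gain = predicted_profit(45) - predicted_profit(35)
print(f"Predicted extra profit from ten more years of age: ${gain:,.0f}")

# With these assumptions, each extra year adds margin * 7,200 in profit
# and costs extra_cost_per_year, so the sign of the gain depends on both.
```

The point of the exercise is that the answer to "should the company hire older salespeople?" depends on assumptions outside the statistical model, which is why those assumptions should be stated and tested rather than left implicit.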

Practical implications are often quite difficult for novice researchers to think about, but this is so important!

For instance, imagine you are a pharmaceutical company testing a drug, say a cholesterol drug. Your studies will look at the symptoms and physiology of high cholesterol sufferers and test the drug on these focus variables. If you find patterns, you will model these into summary coefficients that express in a few numbers how strongly the drug affects patients (e.g. decreases bad cholesterol counts in blood by x%, lowers the odds of some secondary effect by so-much, etc.).

Now, use these coefficients to consider practical implications. What do these improvements really mean in the lives of patients? If this is the improvement, will doctors prescribe it (answering this may require surveys of doctors)? How much can we sell it for? Taking into account the cost to produce the drug, would this be profitable? Take the analysis as far as you can or as far as is useful. Stopping at “this is the statistical effect of the drug” is less useful than following through on all these extrapolations.

Don’t assume that you can interpret statistics only when you find the pattern you thought you were going to find. As discussed, you should come up with alternative explanations when you don’t find your pattern, as well as practical implications.

Example of Interpreting When a Pattern Is Not Found

Consider again the tenure data in Figure 11.2 Examining the distribution of a single variable, and the skewness and kurtosis measures discussed in Example 2: Testing Data for a Normal Distribution. The tenure data was probably not normally distributed, i.e. there seems to be evidence to reject a perfect normal distribution. You should probably do the following:

Explain why the pattern was not found. This is easy in this case: long-serving employees make for a lengthened right-hand tail.
What does the lack of this particular pattern imply practically? For one thing, the skewness in the data probably means average tenure is a poor measure of typical tenure (because the long right tail drags the average to the right too). If you are using tenure, as discussed, to track the success of a retention campaign, then it might be better to use median tenure, since this is less affected by long-serving employees who are actually not your concern in a retention campaign.
See if alternate patterns can be confirmed. Just because we rejected normality does not mean there is no pattern! In fact, this type of pattern forms what we call a lognormal distribution. You would go back to the step in which you look for a pattern. This time instead of trying to fit and test a normal curve, you would try the lognormal.
Interpret any alternate patterns found. If we do confirm a lognormal pattern with certain statistical model features, interpret it and use it in future tenure analyses.
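The steps above can be sketched with synthetic data. The tenure values below are generated from a lognormal distribution with assumed parameters; they are not the data in Figure 11.2. The check used here (skewness of the logged values being close to zero) is one simple indication of a lognormal shape, not a formal test.

```python
import math
import random
import statistics


def skewness(values):
    """Moment-based sample skewness (0 for a symmetric distribution)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


# Synthetic tenure in years (assumed lognormal parameters, for illustration).
rng = random.Random(2)
tenure_years = [rng.lognormvariate(1.0, 0.8) for _ in range(1000)]

# Step 1-2: the long right tail shows up as positive skewness and drags
# the mean above the median.
print(f"Skewness of tenure: {skewness(tenure_years):.2f}")
print(f"Mean tenure:   {statistics.mean(tenure_years):.2f} years")
print(f"Median tenure: {statistics.median(tenure_years):.2f} years")

# Step 3: if tenure is lognormal, log(tenure) should look roughly normal,
# so its skewness should be near zero.
log_tenure = [math.log(t) for t in tenure_years]
print(f"Skewness of log(tenure): {skewness(log_tenure):.2f}")
```

If the logged values still looked clearly skewed, the lognormal would also be a poor fit, and you would return once more to the pattern-finding step with another candidate model.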

This process of thinking through all implications and permutations is thorough, thoughtful, and necessary if you are to make the world of statistics really work for you.

Last updated: April 18, 2017
