Assessing One-Way Frequencies

Introduction to the One-Way Case

Let us start with the simplest example: examining the distribution of observations in a single categorical variable, like the Size variable in the textbook example. We refer to counts derived from a single categorical variable as “one-way” frequencies. There are two major possibilities here, namely:
  • Assessing the distribution of one-way frequencies
  • Assessing one specific category
The following sections expand on these options.

Assessing Distribution of One-Way Frequencies

The distribution of a single categorical variable can generally only be assessed against some benchmark distribution. In the previous section, I gave two examples, namely:
  • Testing your actual distribution against equal cell distributions
  • Testing your actual distribution against some other benchmark distribution
The second example is slightly more complex, so let us use it. In the example above, I suggested testing the Size distribution against an industry average of 50% big customers, 30% medium and 20% small. Figure 15.3 Comparing one-way frequencies to hypothetical test distribution below illustrates this test:
Figure 15.3 Comparing one-way frequencies to hypothetical test distribution
We would usually start assessing these comparisons with the “Chi-Square test”. This simple test gives you a p-value that indicates whether your sample frequencies differ from your comparative set of frequencies by more than chance alone would explain (a low p-value), or are roughly in line with them.
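As a brief aside, the statistic behind this test can be written out as follows. The symbols are notation introduced here for illustration, not taken from the figures: O_i is the observed count in category i, p_i is the benchmark proportion for that category (0.50, 0.30 and 0.20 in the Size example), n is the total sample size, and k is the number of categories.

```latex
\chi^2 = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i},
\qquad E_i = n\,p_i,
\qquad \text{df} = k - 1
```

With the three Size categories, k = 3, so the statistic is compared against a Chi-square distribution with 2 degrees of freedom. The larger the gaps between observed and expected counts, the larger the statistic and the smaller the p-value.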
To implement this test in SAS, you need to generate a dataset that expresses your summary frequencies, and then test these against your chosen comparison distribution. Figure 15.4 SAS code for running a one-way chi square test below shows code (which you can run in the “Code15a one-way chi square test” file in the “Textbook Materials” folder).
Figure 15.4 SAS code for running a one-way chi square test
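The approach described above (a summary dataset tested against benchmark percentages) might be sketched roughly as follows. This is a minimal illustration and not necessarily the code in the “Code15a” file: the dataset name work.size_counts is hypothetical, and apart from the 120 Big customers mentioned later in the text, the counts are invented purely so that the example runs.

```sas
/* Hypothetical summary counts. Only the 120 Big customers comes from */
/* the text; the Medium and Small counts are invented for illustration. */
data work.size_counts;
   input Size $ Count;
   datalines;
Big 120
Medium 49
Small 110
;
run;

ods graphics on;   /* needed for the deviation plot */

proc freq data=work.size_counts order=data;
   weight Count;   /* each row stands for Count observations */
   /* TESTP= lists the benchmark percentages in the same order as the */
   /* levels of Size appear in the table (Big, Medium, Small here).   */
   tables Size / chisq testp=(50 30 20) plots=deviationplot;
run;
```

The WEIGHT statement is what lets PROC FREQ work from summary counts rather than one row per customer. The TESTP= percentages must follow the order of the Size levels in the output table, so it is worth checking that order before interpreting the result.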
The resulting output is quite simple. You will see a confirmation of your test percentages, a Chi-Square table like that in Figure 15.5 Output of the one-way Chi-Square test below, and a graph of the extent to which categories differ from their benchmark percentages.
Figure 15.5 Output of the one-way Chi-Square test
In Figure 15.5 Output of the one-way Chi-Square test above, we see that the Chi-square statistic is statistically significant (a very low p-value), indicating significant deviations from the benchmark percentages. The graph suggests that the largest deviation is for small companies, whose share of your sample is almost 100% higher than their benchmark percentage (that is, almost double).

Assessing a Single Category through Binomial Proportions

There is a special type of data known as dichotomous or binomial, which means that only two categorical options are being considered.
Some data is naturally binomial. Consider employee turnover, which is sometimes analyzed as “leaver vs. stayer.” Consider business failure data, which is sometimes analyzed as “failed vs. did not fail.” In marketing, we may have “bought vs. did not buy.” In numerical terms, you can think of such variables as 0/1 data (coded as 0 or 1).
You can, however, look at any given category – no matter how many other categories there are in the variable – in binomial terms. To do this, we would “zoom in” on the category and analyze it on its own, so that the categories would implicitly become “belongs to this category” versus “does not belong to this category.” In the Size data, although there are three categories, we could zoom in on (say) Big companies only. Our focus would become the probability of belonging to this category, versus other sizes.
Usually, we are interested in comparing the actual percentage of observations in the focus category to a benchmark percentage. In the previous section, the Chi-square test suggested that the distribution of sizes differed significantly from a benchmark set of sizes. Now, we may wish to focus on the Big companies. The benchmark percentage for these was 50%, while in our sample there are 120 big companies, making up 43% of the sample.
Figure 15.6 Focusing in on one category using binomial analysis shows code for analyzing binomial proportions. See “Code15b One-way binomial proportions” in the “Textbook Materials” folder to run this code for yourself.
Figure 15.6 Focusing in on one category using binomial analysis
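A binomial analysis of this kind might be sketched as follows. This is an illustrative sketch under stated assumptions rather than the contents of the “Code15b” file: the dataset name work.size_counts is hypothetical, the total of 279 is only inferred from 120 being about 43% of the sample, and the Medium and Small counts are invented.

```sas
/* Hypothetical summary counts. 120 Big is from the text; the other   */
/* counts are invented so that Big is roughly 43% of the total (279). */
data work.size_counts;
   input Size $ Count;
   datalines;
Big 120
Medium 49
Small 110
;
run;

proc freq data=work.size_counts;
   weight Count;   /* each row stands for Count observations */
   /* LEVEL= picks the category to "zoom in" on; P= is the benchmark  */
   /* proportion for the null hypothesis (the 50% industry figure).   */
   tables Size / binomial(level='Big' p=0.5) alpha=0.05;
   exact binomial; /* also request exact p-values */
run;
```

The LEVEL= value must match the category label as it appears in the data, including case. All other categories are implicitly pooled into “does not belong to this category.”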
The resulting analysis, seen in Figure 15.7 Results of standard binomial test for one category below, gives several confidence intervals and p-values[1]. In this case all agree, at least with 95% confidence, that the company’s 43% proportion of big customers is statistically significantly lower than the industry benchmark of 50%: the confidence intervals exclude 50%, and both p-values are low, suggesting rejection of the null hypothesis that the company and industry figures are equal.
Figure 15.7 Results of standard binomial test for one category
Note again that the one-sided p-value supports a specific directional hypothesis, for example a test of whether the company specifically has a lower proportion of big customers than the industry.
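To see roughly where such a result comes from, one common large-sample form of the test statistic can be worked by hand. The notation is introduced here for illustration, and the sample size n = 279 is an assumption inferred from 120 big companies being about 43% of the sample; it is not stated in the text.

```latex
z = \frac{\hat{p} - p_0}{\sqrt{p_0 (1 - p_0) / n}}
  = \frac{120/279 - 0.50}{\sqrt{0.50 \times 0.50 / 279}}
  \approx \frac{-0.0699}{0.0299}
  \approx -2.33
```

Under that assumption, the magnitude of z exceeds the familiar 1.96 cutoff for a two-sided test at the 5% level, and its negative sign points in the “lower than benchmark” direction, which is consistent with the conclusion drawn above.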
There are more tests that can be performed with binomial proportions. One possibility is to test for equivalence, which is a specific test for whether a categorical proportion in your data is statistically equivalent to a benchmark value (whereas the usual binomial test looks at difference). The interested reader can find out how to run these tests in the SAS 9 help files (the SAS/STAT 13.2 User’s Guide or the like). The binomial test example in the PROC FREQ section includes an example of equivalence testing.
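As a rough sketch only, an equivalence test of this kind might look like the following. The dataset name and the counts other than 120 Big are hypothetical, and the margin of 0.10 (ten percentage points either side of the benchmark) is an arbitrary choice made for illustration; in practice the margin should come from what counts as a practically negligible difference in your setting.

```sas
/* Hypothetical summary counts. Only the 120 Big customers comes from */
/* the text; the Medium and Small counts are invented for illustration. */
data work.size_counts;
   input Size $ Count;
   datalines;
Big 120
Medium 49
Small 110
;
run;

proc freq data=work.size_counts;
   weight Count;   /* each row stands for Count observations */
   /* EQUIV requests an equivalence test around the benchmark P=;     */
   /* MARGIN= sets how far from P= still counts as "equivalent".      */
   /* The 0.1 margin is an illustrative assumption, not a guideline.  */
   tables Size / binomial(level='Big' p=0.5 equiv margin=0.1);
run;
```

Note that the logic is reversed relative to the usual test: here a low p-value supports the claim that the proportion lies within the margin around the benchmark.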
Last updated: April 18, 2017