Relating Categorical Variables

Introduction to Categorical Variable Links

What about categorical variables like gender, industry, or share category? You cannot run correlation-type analyses on these, because the categories have no natural order from low to high. However, you can link two or more categorical variables to each other, and you can also link categorical variables to continuous or ordinal variables.

Linking Categorical Variables Together: Crosstabs

We use contingency tables to link two or more categorical variables together.
For instance, in the Accu-Phi case study you may wish to associate license and size. This analysis would tell you the number and percentage of premium versus freeware users in different size bands of customer. Figure 8.8 shows an example of this analysis, which is widely called “crosstabs.”
Figure 8.8 Crosstab example of relating two categorical variables
There are formal statistical tests that can assess whether the differences between these cells are large enough to be called a systematic pattern (for instance, whether big customers are significantly more likely to have a premium license). I discuss some of these tests in Chapter 15.
SAS generates contingency tables using the PROC FREQ procedure. To see an example of a contingency table in our data, open and run the file “Code08c Categ association crosstabs.” In this example, we want to link the ordinal variable “Size” to the categorical variable “License.” You will get tables and charts comparing and associating the categories, like Figure 8.8 above, although with a little more detail.
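As a rough idea of what such a crosstab request looks like, here is a minimal sketch. It is not the content of “Code08c Categ association crosstabs”: the data set name work.customers_demo and every data value are invented for illustration, and the variable names Size and License are assumed from the text.

```sas
/* Hypothetical toy data: variable names follow the text, values are invented. */
data work.customers_demo;
    length Size $6 License $8;
    input Size $ License $;
    datalines;
Small Freeware
Small Freeware
Small Premium
Medium Freeware
Medium Premium
Medium Premium
Big Premium
Big Premium
Big Freeware
;
run;

/* Two-way contingency table: Size forms the rows, License the columns.  */
/* By default each cell reports the frequency, the overall percentage,   */
/* the row percentage, and the column percentage.                        */
proc freq data=work.customers_demo;
    tables Size*License;
run;
```

Adding further variables to the TABLES request (for example a third categorical variable joined with another asterisk) produces layered tables, which is one way to link more than two categorical variables.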

Linking a Categorical Variable to Continuous Variables

When linking categorical to continuous variables, the basic process is to estimate summary statistics such as the average and standard deviation of the continuous variables for each category.
In “Data03_Aggregated” I have created aggregate trust and satisfaction variables (Chapter 5 discusses how to do this). To see if trust, satisfaction, enquiries, and sales differ by customer size and license type, we need to estimate the means and standard deviations (possibly also medians and IQRs) for each continuous variable within each category and within combinations of categories. Figure 8.9 shows an example of the final analysis, in which we see each continuous variable analyzed separately by each size and license combination.
Figure 8.9 Summary table relating size & license to continuous variables
You can also test statistically whether these means differ substantially. For instance, in Figure 8.9 medium-sized firms actually have the highest sales. The sales levels across sizes look different, but are the differences large enough to be called statistically significant? A set of statistics called “Comparison of Means,” introduced in Chapter 14, can test such differences further.
To get the means and standard deviations of continuous variables split by categories, you can use various procedures in SAS. I suggest PROC MEANS[1]. Open and run the file “Code08d Categorical Continuous” to see how SAS presents the raw output, which you can format into a table such as Figure 8.9. In that file, I also provide code that analyzes each continuous variable across combinations of size and license.
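The following is a minimal sketch of this kind of PROC MEANS request, not the content of “Code08d Categorical Continuous.” The data set work.aggregated_demo and all of its values are invented, and the variable names (Size, License, Trust, Satisfaction, Enquiries, Sales) are assumptions based on the text; the real names in “Data03_Aggregated” may differ.

```sas
/* Hypothetical toy data: names are assumed from the text, values are invented. */
data work.aggregated_demo;
    length Size $6 License $8;
    input Size $ License $ Trust Satisfaction Enquiries Sales;
    datalines;
Small Freeware 3.2 3.5 2 1200
Small Premium 3.9 4.1 4 2500
Small Freeware 3.0 3.3 1 900
Medium Freeware 3.4 3.6 3 4100
Medium Premium 4.2 4.4 6 7800
Medium Premium 4.0 4.3 5 7200
Big Freeware 3.1 3.2 2 3900
Big Premium 4.1 4.0 7 6500
Big Premium 3.8 4.2 5 6100
;
run;

/* Summary statistics for each continuous variable within categories.     */
/* CLASS names the grouping variables; VAR names the analysis variables.  */
/* TYPES requests three breakdowns: by Size alone, by License alone,      */
/* and by every Size and License combination.                             */
/* QRANGE is the interquartile range; MAXDEC=2 limits printed decimals.   */
proc means data=work.aggregated_demo n mean std median qrange maxdec=2;
    class Size License;
    var Trust Satisfaction Enquiries Sales;
    types Size License Size*License;
run;
```

Leaving out the TYPES statement prints only the full Size by License combinations, which may be all you need for a table like Figure 8.9.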
Last updated: April 18, 2017