Example 3.4. Displaying Basic Frequency Counts and Percentages

Goal

Compute frequency counts and percentages for categories in a data set.

Report

                     Gender Distribution within Job Classes
                                for Four Regions

---------------------------------------------------------------------------------
|                                                |      Gender       |          |
|                                                |-------------------|   All    |
|                                                | Female  |  Male   |Employees |
|------------------------------------------------+---------+---------+----------|
|Job Class              |                        |         |         |          |
|-----------------------+------------------------|         |         |          |
|Technical              |Number of employees     |       16|       18|        34|
|                       |------------------------+---------+---------+----------|
|                       |Percent of row total    |     47.1|     52.9|     100.0|
|                       |------------------------+---------+---------+----------|
|                       |Percent of column total |     26.2|     29.0|      27.6|
|                       |------------------------+---------+---------+----------|
|                       |Percent of total        |     13.0|     14.6|      27.6|
|-----------------------+------------------------+---------+---------+----------|
|Manager/Supervisor     |Number of employees     |       20|       15|        35|
|                       |------------------------+---------+---------+----------|
|                       |Percent of row total    |     57.1|     42.9|     100.0|
|                       |------------------------+---------+---------+----------|
|                       |Percent of column total |     32.8|     24.2|      28.5|
|                       |------------------------+---------+---------+----------|
|                       |Percent of total        |     16.3|     12.2|      28.5|
|-----------------------+------------------------+---------+---------+----------|
|Clerical               |Number of employees     |       14|       14|        28|
|                       |------------------------+---------+---------+----------|
|                       |Percent of row total    |     50.0|     50.0|     100.0|
|                       |------------------------+---------+---------+----------|
|                       |Percent of column total |     23.0|     22.6|      22.8|
|                       |------------------------+---------+---------+----------|
|                       |Percent of total        |     11.4|     11.4|      22.8|
|-----------------------+------------------------+---------+---------+----------|
|Administrative         |Number of employees     |       11|       15|        26|
|                       |------------------------+---------+---------+----------|
|                       |Percent of row total    |     42.3|     57.7|     100.0|
|                       |------------------------+---------+---------+----------|
|                       |Percent of column total |     18.0|     24.2|      21.1|
|                       |------------------------+---------+---------+----------|
|                       |Percent of total        |      8.9|     12.2|      21.1|
|-----------------------+------------------------+---------+---------+----------|
|All Jobs               |Number of employees     |       61|       62|       123|
|                       |------------------------+---------+---------+----------|
|                       |Percent of row total    |     49.6|     50.4|     100.0|
|                       |------------------------+---------+---------+----------|
|                       |Percent of column total |    100.0|    100.0|     100.0|
|                       |------------------------+---------+---------+----------|
|                       |Percent of total        |     49.6|     50.4|     100.0|
---------------------------------------------------------------------------------


Example Features

Data Set                                JOBCLASS
Featured Step                           PROC TABULATE
Featured Step Statements and Options    TABLE statement: ROWPCTN, COLPCTN, and REPPCTN statistics
Formatting Features                     PROC TABULATE statement: FORMAT= option
                                        TABLE statement: RTS= option when sending output to the LISTING destination
Related Technique                       PROC FREQ
A Closer Look                           Analyzing the Structure of the Report
                                        Understanding the Percentage Statistics in PROC TABULATE
                                        Specifying a Denominator for a Percentage Statistic
Other Examples That Use This Data Set   Examples 3.5, 3.6, 6.3, and 6.7

Example Overview

Crosstabulation tables (also called contingency tables) show combined frequency distributions for two or more variables.

This report shows frequency counts for females and males within each of four job classes. The table also shows the percentage of the following totals that each frequency count represents:

the total number of women and men in that job class (row percentage)
the total number of employees of that gender in all job classes (column percentage)
the total number of employees (report percentage)

Each observation in JOBCLASS corresponds to the information for one employee.
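The three percentages listed above differ only in their denominators. The following Python sketch is not part of the SAS example; it simply reproduces the arithmetic from the frequency counts shown in the report (hard-coded here as an assumption), rounding to one decimal place as the FORMAT=8.1 option does. The names COUNTS, count, and percentages are hypothetical.

```python
# Frequency counts taken from the report: (job class, gender) -> number of employees.
COUNTS = {
    ("Technical", "Female"): 16, ("Technical", "Male"): 18,
    ("Manager/Supervisor", "Female"): 20, ("Manager/Supervisor", "Male"): 15,
    ("Clerical", "Female"): 14, ("Clerical", "Male"): 14,
    ("Administrative", "Female"): 11, ("Administrative", "Male"): 15,
}
JOBS = ["Technical", "Manager/Supervisor", "Clerical", "Administrative"]
GENDERS = ["Female", "Male"]


def count(job, gender):
    """N for a cell; "All" in either position plays the role of the ALL class variable."""
    jobs = JOBS if job == "All" else [job]
    genders = GENDERS if gender == "All" else [gender]
    return sum(COUNTS[(j, g)] for j in jobs for g in genders)


def percentages(job, gender):
    """Return (row %, column %, report %) for one cell, rounded to one decimal."""
    n = count(job, gender)
    row_total = count(job, "All")       # all employees in this job class
    col_total = count("All", gender)    # all employees of this gender
    report_total = count("All", "All")  # all employees in the report
    return tuple(round(100 * n / d, 1) for d in (row_total, col_total, report_total))


for job in JOBS + ["All"]:
    for gender in GENDERS + ["All"]:
        print(f"{job:<20}{gender:<8}{count(job, gender):>5}", *percentages(job, gender))
```

For example, percentages("Technical", "Female") gives (47.1, 26.2, 13.0), the three percentages in the report for technical women.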

Program

Define formats to associate with the variables GENDER and OCCUPAT.
proc format;
  value gendfmt 1='Female'
                2='Male';
  value occupfmt 1='Technical'
                 2='Manager/Supervisor'
                 3='Clerical'
                 4='Administrative';
run;

Specify a default format for each table cell.
proc tabulate data=jobclass format=8.1;
  title 'Gender Distribution within Job Classes';
  title2 'for Four Regions';

Specify the classification variables.
  class gender occupat;

Establish the layout of the table.
  table

Specify the row dimension of the table. Enclose the row classifications in parentheses so that the expression for the set of statistics that follows the asterisk has to be written only once. Specify headings for columns, rows, and statistics.
      (occupat='Job Class'

Summarize at the bottom of the report the rows defined by the values of OCCUPAT.
       all='All Jobs')

Nest statistics beneath each category defined by the row classifications. Enclose in parentheses the set of statistics that should be computed for each category.
       *(n='Number of employees'*f=9.

Specify the percentage statistics that PROC TABULATE should calculate. Place these percentages in the row dimension. Calculate the percentage of females and males within each job class; because OCCUPAT (job class) is in the row dimension, ROWPCTN produces this percentage.
         rowpctn='Percent of row total'

Calculate the percentage that each job class contributes to the total for each gender; because GENDER is in the column dimension, COLPCTN produces this percentage.
         colpctn='Percent of column total'

Calculate the percentage each cell contributes to the total in the report. Terminate the row specification with a comma and begin the column dimension specification.
         reppctn='Percent of total'),

Specify the column dimension of the table.
         gender='Gender'

Summarize in the rightmost column of the report the columns defined by the values of GENDER.
         all='All Employees' /

When sending output to the LISTING destination, specify the space allocated to row titles with the RTS= option.
         rts=50;


  format  gender gendfmt. occupat occupfmt.;
run;


Related Technique

PROC FREQ can also produce crosstabulation reports of counts and percentages. Although this procedure does not provide the customization and formatting features of PROC TABULATE, it automatically calculates totals as well as row, column, and total percentages. PROC FREQ can also compute a wider range of statistics, including several chi-square statistics and odds ratios.

Figure 3.4 shows the output from PROC FREQ. By default, when PROC FREQ computes a two-way table (row*column), four values are presented in each cell of the report. The key to the values in each cell is shown in the upper-left corner of the table.
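As a sketch of what each PROC FREQ cell holds, the hypothetical Python function below (not part of SAS) computes the four values for one cell from its count, its row total, its column total, and the table total, rounded to two decimals as in Figure 3.4. Rounding the same ratios to one decimal gives the values in the PROC TABULATE report.

```python
def freq_cell(n, row_total, col_total, table_total):
    """Return (Frequency, Percent, Row Pct, Col Pct) in the order PROC FREQ stacks them."""
    return (
        n,
        round(100 * n / table_total, 2),  # Percent: share of the whole table
        round(100 * n / row_total, 2),    # Row Pct: share of the row (job class)
        round(100 * n / col_total, 2),    # Col Pct: share of the column (gender)
    )


# Technical/Female: 16 of 34 technical employees, of 61 females, of 123 employees.
print(freq_cell(16, 34, 61, 123))  # (16, 13.01, 47.06, 26.23)
# Administrative/Male: 15 of 26, of 62 males, of 123.
print(freq_cell(15, 26, 62, 123))  # (15, 12.2, 57.69, 24.19)
```

Python drops the trailing zero, so 12.2 here corresponds to the 12.20 printed by PROC FREQ.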

Figure 3.4. Output from PROC FREQ
                      Gender Distribution within Job Classes
                                 for Four Regions

                                The FREQ Procedure

                            Table of occupat by gender

                   occupat(Job class)     gender

                   Frequency        |
                   Percent          |
                   Row Pct          |
                   Col Pct          |Female  |Male     |  Total
                                    |        |         |
                                    |        |         |
                   -----------------+--------+---------+
                   Technical        |     16 |     18  |     34
                                    |  13.01 |  14.63  |  27.64
                                    |  47.06 |  52.94  |
                                    |  26.23 |  29.03  |
                   -----------------+--------+---------+
                   Manager/Supervis |     20 |     15  |     35
                   or               |  16.26 |  12.20  |  28.46
                                    |  57.14 |  42.86  |
                                    |  32.79 |  24.19  |
                   -----------------+--------+---------+
                   Clerical         |     14 |     14  |     28
                                    |  11.38 |  11.38  |  22.76
                                    |  50.00 |  50.00  |
                                    |  22.95 |  22.58  |
                   -----------------+--------+---------+
                   Administrative   |     11 |     15  |     26
                                    |   8.94 |  12.20  |  21.14
                                    |  42.31 |  57.69  |
                                    |  18.03 |  24.19  |
                   -----------------+--------+---------+
                   Total                  61       62       123
                                       49.59    50.41    100.00

The following PROC FREQ step produces the report in Figure 3.4.

 
proc format;
  value gendfmt 1='Female'
                2='Male';
  value occupfmt 1='Technical'
                 2='Manager/Supervisor'
                 3='Clerical'
                 4='Administrative';
run;

proc freq data=jobclass;
  title 'Gender Distribution within Job Classes';
  title2 'for Four Regions';

Place the row dimension to the left of the column dimension. Separate the dimensions with an asterisk (*).
  tables occupat*gender;
  label occupat='Job class';
  format gender gendfmt. occupat occupfmt.;
run;


A Closer Look

Analyzing the Structure of the Report

The combinations of the row and column classifications define the categories of the report. The two classification variables in the row dimension of the report are OCCUPAT and the universal class variable ALL. The two classification variables in the column dimension of the report are GENDER and the universal class variable ALL.

The PROC TABULATE step computes frequency percentages for each of the four possible combinations that result from crossing the two row and two column classifications. Table 3.4a describes the combinations.

Table 3.4a. Combinations of the Classification Variables
Class Variables (row and column)   Description                                                    Number of Categories
OCCUPAT and GENDER                 Number of females in each job or number of males in each job   8
ALL and GENDER                     Total number of females or total number of males               2
OCCUPAT and ALL                    Number of employees in each job                                4
ALL and ALL                        Total number of employees in all jobs                          1
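To make the crossing explicit, here is a small Python sketch (an illustration, not SAS code) that crosses the two row elements with the two column elements and counts the categories in each resulting subtable. The dictionaries ROW_ELEMENTS and COL_ELEMENTS are hypothetical stand-ins for the CLASS variables and the universal class variable ALL.

```python
from itertools import product

# Row-dimension elements and their categories (ALL always has exactly one).
ROW_ELEMENTS = {
    "OCCUPAT": ["Technical", "Manager/Supervisor", "Clerical", "Administrative"],
    "ALL": ["All Jobs"],
}
# Column-dimension elements and their categories.
COL_ELEMENTS = {
    "GENDER": ["Female", "Male"],
    "ALL": ["All Employees"],
}

# Crossing every row element with every column element yields one subtable per pair.
subtables = {
    (r, c): len(ROW_ELEMENTS[r]) * len(COL_ELEMENTS[c])
    for r, c in product(ROW_ELEMENTS, COL_ELEMENTS)
}
for (r, c), n_categories in subtables.items():
    print(f"{r} and {c}: {n_categories} categories")
print("cells per statistic:", sum(subtables.values()))  # 15 = 5 rows x 3 columns
```

The 15 cells match the five row headings (four job classes plus All Jobs) times the three columns (Female, Male, All Employees) for each statistic in the report.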

You can think of each combination of a row and a column classification as a subtable. Figure 3.4 illustrates this concept as applied to this example’s report.

Figure 3.4. Illustration of the Four Subtables


Understanding the concept of viewing a PROC TABULATE table as a collection of subtables is especially useful when you need to specify denominator definitions. See “Specifying a Denominator for a Percentage Statistic” later in this section for a discussion of how to form denominator definitions.

Understanding the Percentage Statistics in PROC TABULATE

Table 3.4b lists the eight dimension-specific percentages that PROC TABULATE can compute: four percentages are computed based on the N statistic, and four are computed based on the SUM statistic.

Table 3.4b. Dimension-Specific Percentages That PROC TABULATE Can Compute
Statistic    Dimension       Description
REPPCTN      Entire report   Percentage of the frequency count in a table cell in relation
                             to the total frequency count in the report
REPPCTSUM    Entire report   Percentage of the sum of an analysis variable in a table cell
                             in relation to the total sum of the analysis variable in the report
COLPCTN      Column          Percentage of the frequency count in a table cell in relation
                             to the total frequency count in the column of the table cell
COLPCTSUM    Column          Percentage of the sum of an analysis variable in a table cell
                             in relation to the total sum in the column of the table cell
ROWPCTN      Row             Percentage of the frequency count in a table cell in relation
                             to the total frequency count in the row of the table cell
ROWPCTSUM    Row             Percentage of the sum of an analysis variable in a table cell
                             in relation to the total sum in the row of the table cell
PAGEPCTN     Page            Percentage of the frequency count in a table cell in relation
                             to the total frequency count on the page of the table cell
PAGEPCTSUM   Page            Percentage of the sum of an analysis variable in a table cell
                             in relation to the total sum on the page of the table cell

The total (or denominator) on which a percentage is based is the total of the statistic in the specific dimension.

When selecting a dimension-specific percentage, make sure that your table includes the dimension of the statistic. In the example above, the TABLE statement specifies a row and a column dimension. Including the PAGEPCTN statistic in this example’s TABLE statement generates an error, because no page dimension was specified.

Two additional percentages, PCTN and PCTSUM, allow you to specify the denominators of the percentage calculations explicitly. The next section briefly discusses ways to write denominator definitions and applies this to writing the denominator definitions for the percentages in the main example.

Specifying a Denominator for a Percentage Statistic

The main example computes percentages by simply using the dimension-specific percentages. When your tables are more complex, however, the dimension-specific percentages might not produce the percentages required for your report. In those situations, using the PCTN and PCTSUM statistics enables you to specify the denominator of your percentage calculation.

When coding a PCTN or PCTSUM statistic that requires a denominator definition, follow the PCTN or PCTSUM keyword with the denominator definition enclosed in angle brackets. The denominator definition specifies the classifications to tally in order to calculate the denominator.

Table 3.4c shows how to code the denominator definition for the PCTN statistic so that PROC TABULATE calculates the same percentages as the corresponding dimension-specific percentages in the above example.

Table 3.4c. Constructing Denominator Definitions for Percentage Calculations That Are Equivalent to Percentage Statistics in This Example
Statistic   Equivalent Statistic Using PCTN with a Denominator Definition
ROWPCTN     PCTN<GENDER ALL>
COLPCTN     PCTN<OCCUPAT ALL>
REPPCTN     PCTN or PCTN<OCCUPAT*GENDER ALL>

A version of the TABLE statement that uses the PCTN statistic follows:

table (occupat='Job Class' all='All Jobs')*
      (n='Number of employees'*f=9.
       pctn<gender all>='Percent of row total'
       pctn<occupat all>='Percent of column total'
       pctn='Percent of total'),
       gender='Gender' all='All Employees'
       / rts=50;

Table 3.4c shows that the ROWPCTN statistic is equivalent to the PCTN statistic with a denominator definition of GENDER and ALL. The report has two classification variables in the row dimension: OCCUPAT and ALL. For each row of the report (each value of OCCUPAT, and ALL), a row percentage divides a cell's N by the N for the whole row. The columns are GENDER and ALL, concatenated, so the denominator definition names both: GENDER supplies the denominator for the Female and Male columns, and ALL supplies it for the All Employees column. Therefore, “GENDER ALL” becomes the denominator definition of the PCTN statistic that computes the row percentages.

Example 3.5 features a program that uses denominator definitions.

Interpreting Denominator Definitions

The TABLE statement in the preceding topic defines denominator definitions for computing percentages. Each use of PCTN in the TABLE statement nests a row of statistics within each value of OCCUPAT and ALL. Each denominator definition tells PROC TABULATE the frequency counts to sum for the denominators in that row. This section explains how PROC TABULATE interprets these denominator definitions.

Row Percentages

The following part of the TABLE statement calculates the row percentages and labels the row:

   pctn<gender all>='Percent of row total'

Consider how PROC TABULATE interprets this denominator definition for each of the four subtables.

Occupat and Gender

PROC TABULATE looks at the first element in the denominator definition, GENDER, and asks if GENDER contributes to the subtable. Because GENDER does contribute to the subtable, PROC TABULATE uses it as the denominator definition. This denominator definition tells PROC TABULATE to sum the frequency counts for all occurrences of GENDER within the same value of OCCUPAT.

For example, the denominator for the category female, technical is the sum of all frequency counts for all categories in this subtable for which the value of OCCUPAT is technical. There are two such categories: female, technical and male, technical. The corresponding frequency counts are 16 and 18. Therefore, the denominator for this category is 16+18=34.

All and Gender

PROC TABULATE looks at the first element in the denominator definition, GENDER, and asks if GENDER contributes to the subtable. Because GENDER does contribute to the subtable, PROC TABULATE uses it as the denominator definition. This denominator definition tells PROC TABULATE to sum the frequency counts for all occurrences of GENDER in the subtable.

For example, the denominator for the category all, female is the sum of the frequency counts for all, female and all, male. The corresponding frequency counts are 61 and 62. Therefore, the denominator for cells in this subtable is 61+62=123.

Occupat and All

PROC TABULATE looks at the first element in the denominator definition, GENDER, and asks if GENDER contributes to the subtable. Because GENDER does not contribute to the subtable, PROC TABULATE looks at the next element in the denominator definition, which is ALL. ALL does contribute to this subtable, so PROC TABULATE uses it as the denominator definition. ALL is a reserved class variable with only one category. Therefore, this denominator definition tells PROC TABULATE to use the frequency count of ALL as the denominator.

For example, the denominator for the category clerical, all is the frequency count for that category, 28.

Note: In these table cells, because the numerator and denominator are the same, the row percentages in this subtable are all 100.

All and All

PROC TABULATE looks at the first element in the denominator definition, GENDER, and asks if GENDER contributes to the subtable. Because GENDER does not contribute to the subtable, PROC TABULATE looks at the next element in the denominator definition, which is ALL. ALL does contribute to this subtable, so PROC TABULATE uses it as the denominator definition. ALL is a reserved class variable with only one category. Therefore, this denominator definition tells PROC TABULATE to use the frequency count of ALL as the denominator.

There is only one category in this subtable: all, all. The denominator for this category is 123.

Note: In this table cell, because the numerator and denominator are the same, the row percentage in this subtable is 100.

Column Percentages

The following part of the TABLE statement calculates the column percentages and labels the row:

   pctn<occupat all>='Percent of column total'

Consider how PROC TABULATE interprets this denominator definition for each subtable.

Occupat and Gender

PROC TABULATE looks at the first element in the denominator definition, OCCUPAT, and asks if OCCUPAT contributes to the subtable. Because OCCUPAT does contribute to the subtable, PROC TABULATE uses it as the denominator definition. This denominator definition tells PROC TABULATE to sum the frequency counts for all occurrences of OCCUPAT within the same value of GENDER.

For example, the denominator for the category manager/supervisor, male is the sum of all frequency counts for all categories in this subtable for which the value of GENDER is male. There are four such categories: technical, male; manager/supervisor, male; clerical, male; and administrative, male. The corresponding frequency counts are 18, 15, 14, and 15. Therefore, the denominator for this category is 18+15+14+15=62.

All and Gender

PROC TABULATE looks at the first element in the denominator definition, OCCUPAT, and asks if OCCUPAT contributes to the subtable. Because OCCUPAT does not contribute to the subtable, PROC TABULATE looks at the next element in the denominator definition, which is ALL. Because ALL does contribute to this subtable, PROC TABULATE uses it as the denominator definition. ALL is a reserved class variable with only one category. Therefore, this denominator definition tells PROC TABULATE to use the frequency count for ALL as the denominator.

For example, the denominator for the category all, female is the frequency count for that category, 61.

Note: In these table cells, because the numerator and denominator are the same, the column percentages in this subtable are all 100.

Occupat and All

PROC TABULATE looks at the first element in the denominator definition, OCCUPAT, and asks if OCCUPAT contributes to the subtable. Because OCCUPAT does contribute to the subtable, PROC TABULATE uses it as the denominator definition. This denominator definition tells PROC TABULATE to sum the frequency counts for all occurrences of OCCUPAT in the subtable.

For example, the denominator for the category technical, all is the sum of the frequency counts for technical, all; manager/supervisor, all; clerical, all; and administrative, all. The corresponding frequency counts are 34, 35, 28, and 26. Therefore, the denominator for this category is 34+35+28+26=123.

All and All

PROC TABULATE looks at the first element in the denominator definition, OCCUPAT, and asks if OCCUPAT contributes to the subtable. Because OCCUPAT does not contribute to the subtable, PROC TABULATE looks at the next element in the denominator definition, which is ALL. Because ALL does contribute to this subtable, PROC TABULATE uses it as the denominator definition. ALL is a reserved class variable with only one category. Therefore, this denominator definition tells PROC TABULATE to use the frequency count of ALL as the denominator.

There is only one category in this subtable: all, all. The frequency count for this category is 123.

Note: In this calculation, because the numerator and denominator are the same, the column percentage in this subtable is 100.
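The walk-throughs for row and column percentages all apply one rule: take the first element of the denominator definition that contributes to the cell's subtable, then sum the counts over that element's categories while holding the other dimension fixed. The Python sketch below encodes that rule as described in this section, for this example's layout only (one row variable, one column variable, and ALL). It is an illustration, not PROC TABULATE's implementation; the names n, denominator, and pctn are hypothetical, and crossed elements such as OCCUPAT*GENDER are not modeled.

```python
# Frequency counts from the report: (job class, gender) -> N.
COUNTS = {
    ("Technical", "Female"): 16, ("Technical", "Male"): 18,
    ("Manager/Supervisor", "Female"): 20, ("Manager/Supervisor", "Male"): 15,
    ("Clerical", "Female"): 14, ("Clerical", "Male"): 14,
    ("Administrative", "Female"): 11, ("Administrative", "Male"): 15,
}
JOBS = ["Technical", "Manager/Supervisor", "Clerical", "Administrative"]
GENDERS = ["Female", "Male"]


def n(job, gender):
    """N for a cell; "ALL" in either position totals over that variable."""
    jobs = JOBS if job == "ALL" else [job]
    genders = GENDERS if gender == "ALL" else [gender]
    return sum(COUNTS[(j, g)] for j in jobs for g in genders)


def denominator(job, gender, definition):
    """Denominator of cell (job, gender) for a PCTN<...> definition given as a list."""
    row_element = "ALL" if job == "ALL" else "OCCUPAT"
    col_element = "ALL" if gender == "ALL" else "GENDER"
    for element in definition:
        if element not in (row_element, col_element):
            continue  # does not contribute to this subtable: try the next element
        if element == "GENDER":
            return n(job, "ALL")     # sum over genders within the same row
        if element == "OCCUPAT":
            return n("ALL", gender)  # sum over job classes within the same column
        return n(job, gender)        # ALL has one category: the cell's own count
    raise ValueError("no element of the definition contributes to this subtable")


def pctn(job, gender, definition):
    return round(100 * n(job, gender) / denominator(job, gender, definition), 1)


ROW_DEF, COL_DEF = ["GENDER", "ALL"], ["OCCUPAT", "ALL"]
print(denominator("Technical", "Female", ROW_DEF))         # 34
print(denominator("Manager/Supervisor", "Male", COL_DEF))  # 62
print(pctn("Clerical", "ALL", ROW_DEF), pctn("Clerical", "ALL", COL_DEF))  # 100.0 22.8
```

The printed denominators match the worked examples above: 16+18=34 for the technical row and 18+15+14+15=62 for the male column.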

Total Percentages

The following part of the TABLE statement calculates the total percentages and labels the row:

   pctn='Percent of total'

If you do not specify a denominator definition, PROC TABULATE obtains the denominator for a cell by totaling all the frequency counts in the subtable. Table 3.4d summarizes the process for all subtables in this example.

Table 3.4d. Denominators for Total Percentages
Class Variables Contributing to the Subtable   Frequency Counts                 Total
OCCUPAT and GENDER                             16, 18, 20, 15, 14, 14, 11, 15   123
OCCUPAT and ALL                                34, 35, 28, 26                   123
GENDER and ALL                                 61, 62                           123
ALL and ALL                                    123                              123

Consequently, the denominator for total percentages is always 123.
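The arithmetic in Table 3.4d can be checked with a few lines of Python. This is only an illustration of the rule stated above (with no denominator definition, each cell is divided by the total of its subtable); the name SUBTABLE_COUNTS is hypothetical and holds the counts listed in the table.

```python
# Frequency counts listed for each subtable in Table 3.4d.
SUBTABLE_COUNTS = {
    "OCCUPAT and GENDER": [16, 18, 20, 15, 14, 14, 11, 15],
    "OCCUPAT and ALL": [34, 35, 28, 26],
    "GENDER and ALL": [61, 62],
    "ALL and ALL": [123],
}

# With no denominator definition, PCTN divides by the total of the cell's subtable.
denominators = {name: sum(counts) for name, counts in SUBTABLE_COUNTS.items()}
print(denominators)

# Technical/Female as a percentage of the total, printed 8 wide with 1 decimal like 8.1.
print(repr(f"{100 * 16 / denominators['OCCUPAT and GENDER']:8.1f}"))  # '    13.0'
```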

Where to Go from Here

PROC FREQ reference, usage information, and additional examples. See “The FREQ Procedure” in the “Procedures” section of Base SAS 9.1 Procedures Guide.

PROC TABULATE reference, usage information, and additional examples. See “The TABULATE Procedure” in the “Procedures” section of Base SAS 9.1 Procedures Guide.
