Compute frequency counts and percentages for categories in a data set.
Gender Distribution within Job Classes
for Four Regions

| Job Class | Statistic | Female | Male | All Employees |
|---|---|---|---|---|
| Technical | Number of employees | 16 | 18 | 34 |
| | Percent of row total | 47.1 | 52.9 | 100.0 |
| | Percent of column total | 26.2 | 29.0 | 27.6 |
| | Percent of total | 13.0 | 14.6 | 27.6 |
| Manager/Supervisor | Number of employees | 20 | 15 | 35 |
| | Percent of row total | 57.1 | 42.9 | 100.0 |
| | Percent of column total | 32.8 | 24.2 | 28.5 |
| | Percent of total | 16.3 | 12.2 | 28.5 |
| Clerical | Number of employees | 14 | 14 | 28 |
| | Percent of row total | 50.0 | 50.0 | 100.0 |
| | Percent of column total | 23.0 | 22.6 | 22.8 |
| | Percent of total | 11.4 | 11.4 | 22.8 |
| Administrative | Number of employees | 11 | 15 | 26 |
| | Percent of row total | 42.3 | 57.7 | 100.0 |
| | Percent of column total | 18.0 | 24.2 | 21.1 |
| | Percent of total | 8.9 | 12.2 | 21.1 |
| All Jobs | Number of employees | 61 | 62 | 123 |
| | Percent of row total | 49.6 | 50.4 | 100.0 |
| | Percent of column total | 100.0 | 100.0 | 100.0 |
| | Percent of total | 49.6 | 50.4 | 100.0 |
Data Set | JOBCLASS |
Featured Step | PROC TABULATE |
Featured Step Statements and Options | TABLE statement: ROWPCTN, COLPCTN, and REPPCTN statistics |
Formatting Features | PROC TABULATE statement: FORMAT= option; TABLE statement: RTS= option when sending output to the LISTING destination |
Related Technique | PROC FREQ |
A Closer Look | Analyzing the Structure of the Report; Understanding the Percentage Statistics in PROC TABULATE; Specifying a Denominator for a Percentage Statistic |
Other Examples That Use This Data Set | Examples 3.5, 3.6, 6.3, and 6.7 |
Crosstabulation tables (also called contingency tables) show combined frequency distributions for two or more variables.
This report shows frequency counts for females and males within each of four job classes. The table also shows the percentage of the following totals that each frequency count represents:
- the total number of women and men in that job class (row percentage)
- the total for that gender in all job classes (column percentage)
- the total number of employees (total percentage)
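As a quick check of the arithmetic, the following sketch computes the three percentages for a single cell. It is written in Python rather than SAS so it can run without a SAS session. The `counts` dictionary only re-keys the frequency counts shown in the report, and `cell_percentages` is a hypothetical helper name.

```python
# Frequency counts from the report, keyed by (job class, gender).
counts = {
    ("Technical", "Female"): 16, ("Technical", "Male"): 18,
    ("Manager/Supervisor", "Female"): 20, ("Manager/Supervisor", "Male"): 15,
    ("Clerical", "Female"): 14, ("Clerical", "Male"): 14,
    ("Administrative", "Female"): 11, ("Administrative", "Male"): 15,
}

def cell_percentages(job, gender):
    """Return (row %, column %, total %) for one job-class/gender cell."""
    n = counts[(job, gender)]
    # Row total: everyone in this job class, both genders.
    row_total = sum(v for (j, _), v in counts.items() if j == job)
    # Column total: everyone of this gender, all job classes.
    col_total = sum(v for (_, g), v in counts.items() if g == gender)
    # Grand total: every employee.
    grand_total = sum(counts.values())
    return (100 * n / row_total, 100 * n / col_total, 100 * n / grand_total)

row_pct, col_pct, tot_pct = cell_percentages("Technical", "Female")
print(f"{row_pct:.1f} {col_pct:.1f} {tot_pct:.1f}")  # 47.1 26.2 13.0
```

For the Technical/Female cell, the three denominators are 34, 61, and 123.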
Each observation in JOBCLASS corresponds to the information for one employee.
```sas
/* Define formats to associate with the variables GENDER and OCCUPAT. */
proc format;
  value gendfmt 1='Female'
                2='Male';
  value occupfmt 1='Technical'
                 2='Manager/Supervisor'
                 3='Clerical'
                 4='Administrative';
run;

/* Specify a default format for each table cell. */
proc tabulate data=jobclass format=8.1;
  title 'Gender Distribution within Job Classes';
  title2 'for Four Regions';

  /* Specify the classification variables. */
  class gender occupat;

  /* Establish the layout of the table. */
  table
    /* Row dimension. Enclosing the row classifications in parentheses means
       the set of statistics after the asterisk is written only once.
       ALL='All Jobs' summarizes the OCCUPAT rows at the bottom of the report.
       The quoted text supplies headings for rows, columns, and statistics. */
    (occupat='Job Class' all='All Jobs')
    /* Nest the statistics beneath each category defined by the row
       classifications. The percentages are placed in the row dimension. */
    *(n='Number of employees'*f=9.
      /* Percentage of females and males within a job class
         (OCCUPAT is in the row dimension). */
      rowpctn='Percent of row total'
      /* Percentage of job classes within each gender
         (GENDER is in the column dimension). */
      colpctn='Percent of column total'
      /* Percentage that each cell contributes to the total in the report. */
      reppctn='Percent of total'),
    /* The comma ends the row dimension. Column dimension:
       ALL='All Employees' summarizes the GENDER columns at the right. */
    gender='Gender' all='All Employees'
    /* For the LISTING destination, RTS= sets the space for row titles. */
    / rts=50;

  format gender gendfmt. occupat occupfmt.;
run;
```
Related Technique

PROC FREQ can also produce crosstabulation reports of counts and percentages. Although this procedure does not provide the customization and formatting features of PROC TABULATE, it automatically calculates totals as well as row, column, and total percentages. PROC FREQ can also compute a wider range of statistics, including several chi-square statistics and odds ratios.
Figure 3.4 shows the output from PROC FREQ. By default, when PROC FREQ computes a two-way table (row*column), four values are presented in each cell of the report. The key to the values in each cell is shown in the upper-left corner of the table.
```
Gender Distribution within Job Classes
for Four Regions

The FREQ Procedure

Table of occupat by gender

occupat(Job class)     gender

Frequency       |
Percent         |
Row Pct         |
Col Pct         |Female  |Male    |  Total
----------------+--------+--------+
Technical       |     16 |     18 |     34
                |  13.01 |  14.63 |  27.64
                |  47.06 |  52.94 |
                |  26.23 |  29.03 |
----------------+--------+--------+
Manager/Supervis|     20 |     15 |     35
or              |  16.26 |  12.20 |  28.46
                |  57.14 |  42.86 |
                |  32.79 |  24.19 |
----------------+--------+--------+
Clerical        |     14 |     14 |     28
                |  11.38 |  11.38 |  22.76
                |  50.00 |  50.00 |
                |  22.95 |  22.58 |
----------------+--------+--------+
Administrative  |     11 |     15 |     26
                |   8.94 |  12.20 |  21.14
                |  42.31 |  57.69 |
                |  18.03 |  24.19 |
----------------+--------+--------+
Total                 61       62      123
                   49.59    50.41   100.00
```
The following PROC FREQ step produces the report in Figure 3.4.
```sas
proc format;
  value gendfmt 1='Female'
                2='Male';
  value occupfmt 1='Technical'
                 2='Manager/Supervisor'
                 3='Clerical'
                 4='Administrative';
run;

proc freq data=jobclass;
  title 'Gender Distribution within Job Classes';
  title2 'for Four Regions';
  /* Place the row dimension to the left of the column dimension.
     Separate the dimensions with an asterisk (*). */
  tables occupat*gender;
  label occupat='Job class';
  format gender gendfmt. occupat occupfmt.;
run;
```
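PROC FREQ derives all four values in each cell, plus the Total row and column, from the same counts. The following Python sketch is an illustration of that arithmetic, not PROC FREQ itself. The function names are hypothetical. It reproduces the cell key of the PROC FREQ report (frequency, percent, row percent, column percent) with two-decimal rounding.

```python
# Frequency counts from the report, keyed by (job class, gender).
counts = {
    ("Technical", "Female"): 16, ("Technical", "Male"): 18,
    ("Manager/Supervisor", "Female"): 20, ("Manager/Supervisor", "Male"): 15,
    ("Clerical", "Female"): 14, ("Clerical", "Male"): 14,
    ("Administrative", "Female"): 11, ("Administrative", "Male"): 15,
}
jobs = ["Technical", "Manager/Supervisor", "Clerical", "Administrative"]
genders = ["Female", "Male"]
total = sum(counts.values())

def freq_cell(job, gender):
    """Frequency, percent, row pct, and col pct for one two-way cell."""
    n = counts[(job, gender)]
    row_n = sum(counts[(job, g)] for g in genders)
    col_n = sum(counts[(j, gender)] for j in jobs)
    return (n,
            round(100 * n / total, 2),
            round(100 * n / row_n, 2),
            round(100 * n / col_n, 2))

def row_margin(job):
    """Row total and its percentage of all observations (Total column)."""
    row_n = sum(counts[(job, g)] for g in genders)
    return row_n, round(100 * row_n / total, 2)

def col_margin(gender):
    """Column total and its percentage of all observations (Total row)."""
    col_n = sum(counts[(j, gender)] for j in jobs)
    return col_n, round(100 * col_n / total, 2)

for job in jobs:
    print(job, [freq_cell(job, g) for g in genders], row_margin(job))
print("Total", [col_margin(g) for g in genders], total)
```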
A Closer Look

Analyzing the Structure of the Report
The combinations of the row and column classifications define the categories of the report. The two classification variables in the row dimension of the report are OCCUPAT and the universal class variable ALL. The two classification variables in the column dimension of the report are GENDER and the universal class variable ALL.
The PROC TABULATE step computes frequency counts and percentages for each of the four combinations that result from crossing the two row classifications with the two column classifications. Table 3.4a describes the combinations.
Class Variables (row and column) | Description | Number of Categories |
---|---|---|
OCCUPAT and GENDER | Number of females in each job or number of males in each job | 8 |
ALL and GENDER | Total number of females or total number of males | 2 |
OCCUPAT and ALL | Number of employees in each job | 4 |
ALL and ALL | Total number of employees in all jobs | 1 |
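The four combinations in Table 3.4a come from crossing each row classification with each column classification. The Python sketch below models this (it is not SAS). The hypothetical value `"all"` stands in for the single level of the universal class variable ALL. The sketch enumerates the categories of each combination.

```python
from itertools import product

# Levels of each classification. ALL has exactly one level, written here
# as the stand-in value "all".
levels = {
    "OCCUPAT": ["Technical", "Manager/Supervisor", "Clerical", "Administrative"],
    "GENDER": ["Female", "Male"],
    "ALL": ["all"],
}
row_classes = ["OCCUPAT", "ALL"]  # row dimension: occupat all
col_classes = ["GENDER", "ALL"]   # column dimension: gender all

# Each (row class, column class) pair is one combination. Its categories
# are all pairs of their levels.
subtables = {
    (r, c): list(product(levels[r], levels[c]))
    for r, c in product(row_classes, col_classes)
}

for (r, c), cells in subtables.items():
    print(f"{r} and {c}: {len(cells)} categories")
print("Total categories:", sum(len(cells) for cells in subtables.values()))
```

The counts match the last column of Table 3.4a (8, 4, 2, and 1), for 15 categories in all.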
You can think of each combination of a row and a column classification as a subtable. Figure 3.4 illustrates this concept as applied to this example’s report.
Understanding the concept of viewing a PROC TABULATE table as a collection of subtables is especially useful when you need to specify denominator definitions. See “Specifying a Denominator for a Percentage Statistic” later in this section for a discussion of how to form denominator definitions.
Understanding the Percentage Statistics in PROC TABULATE

Table 3.4b lists the eight dimension-specific percentages that PROC TABULATE can compute: four percentages are computed based on the N statistic, and four are computed based on the SUM statistic.
Statistic | Dimension | Description |
---|---|---|
REPPCTN | Entire report | Percentage of the frequency count in a table cell in relation to the total frequency count in the report |
REPPCTSUM | Entire report | Percentage of the sum of an analysis variable in a table cell in relation to the total sum of the analysis variable in the report |
COLPCTN | Column | Percentage of the frequency count in a table cell in relation to the total frequency count in the column of the table cell |
COLPCTSUM | Column | Percentage of the sum of an analysis variable in a table cell in relation to the total sum in the column of the table cell |
ROWPCTN | Row | Percentage of the frequency count in a table cell in relation to the total frequency count in the row of the table cell |
ROWPCTSUM | Row | Percentage of the sum of an analysis variable in a table cell in relation to the total sum in the row of the table cell |
PAGEPCTN | Page | Percentage of the frequency count in a table cell in relation to the total frequency count on the page of the table cell |
PAGEPCTSUM | Page | Percentage of the sum of an analysis variable in a table cell in relation to the total sum on the page of the table cell |
The denominator of a dimension-specific percentage is the total of the underlying statistic (N or SUM) in the dimension that the percentage names: the row, the column, the page, or the entire report.
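The following Python sketch (an illustration, not SAS) models the three N-based percentages used in this example. It assumes the subtable view from "Analyzing the Structure of the Report":

- A ROWPCTN denominator sums the counts in the cell's subtable that share the cell's row.
- A COLPCTN denominator sums the counts in the cell's subtable that share the cell's column.
- A REPPCTN denominator sums the whole subtable.

Under that assumption, the sketch reproduces the report, including the 100.0 values in the ALL row and column. The names `n` and `pct`, and the stand-in value `"all"`, are hypothetical.

```python
counts = {
    ("Technical", "Female"): 16, ("Technical", "Male"): 18,
    ("Manager/Supervisor", "Female"): 20, ("Manager/Supervisor", "Male"): 15,
    ("Clerical", "Female"): 14, ("Clerical", "Male"): 14,
    ("Administrative", "Female"): 11, ("Administrative", "Male"): 15,
}
levels = {
    "OCCUPAT": ["Technical", "Manager/Supervisor", "Clerical", "Administrative"],
    "GENDER": ["Female", "Male"],
    "ALL": ["all"],
}

def n(occ, gen):
    """Frequency count for a category; the value "all" collapses that class."""
    return sum(v for (o, g), v in counts.items()
               if occ in ("all", o) and gen in ("all", g))

def pct(stat, occ, gen):
    """ROWPCTN, COLPCTN, or REPPCTN for one cell of the report."""
    row_class = "ALL" if occ == "all" else "OCCUPAT"
    col_class = "ALL" if gen == "all" else "GENDER"
    if stat == "ROWPCTN":    # same row, across the subtable's columns
        denom = sum(n(occ, g) for g in levels[col_class])
    elif stat == "COLPCTN":  # same column, across the subtable's rows
        denom = sum(n(o, gen) for o in levels[row_class])
    elif stat == "REPPCTN":  # every category in the subtable
        denom = sum(n(o, g) for o in levels[row_class] for g in levels[col_class])
    else:
        raise ValueError(f"unsupported statistic: {stat}")
    return 100 * n(occ, gen) / denom

# Print the percentage rows of the report using the 8.1 format.
for occ in levels["OCCUPAT"] + ["all"]:
    for stat in ("ROWPCTN", "COLPCTN", "REPPCTN"):
        values = [pct(stat, occ, g) for g in levels["GENDER"] + ["all"]]
        print(f"{occ:<20}{stat:<9}" + "".join(f"{v:8.1f}" for v in values))
```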
When selecting a dimension-specific percentage, make sure that your table includes the dimension of the statistic. In the example above, the TABLE statement specifies a row and a column dimension. Including the PAGEPCTN statistic in this example’s TABLE statement generates an error, because no page dimension was specified.
Two additional percentages, PCTN and PCTSUM, allow you to specify the denominators of the percentage calculations explicitly. The next section briefly discusses ways to write denominator definitions and applies this to writing the denominator definitions for the percentages in the main example.
Specifying a Denominator for a Percentage Statistic

The main example computes percentages by simply using the dimension-specific percentages. When your tables are more complex, however, the dimension-specific percentages might not produce the percentages required for your report. In those situations, using the PCTN and PCTSUM statistics enables you to specify the denominator of your percentage calculation.
When coding a PCTN or PCTSUM statistic that requires a denominator definition, follow the PCTN or PCTSUM keyword with the denominator definition enclosed in angle brackets. The denominator definition specifies the classifications to tally in order to calculate the denominator.
Table 3.4c shows how to code the denominator definition for the PCTN statistic so that PROC TABULATE calculates the same percentages as the corresponding dimension-specific percentages in the above example.
A version of the TABLE statement that uses the PCTN statistic follows:
```sas
table (occupat='Job Class' all='All Jobs')
      *(n='Number of employees'*f=9.
        pctn<gender all>='Percent of row total'
        pctn<occupat all>='Percent of column total'
        pctn='Percent of total'),
      gender='Gender' all='All Employees'
      / rts=50;
```
Table 3.4c shows that the ROWPCTN statistic is equivalent to the PCTN statistic with a denominator definition of GENDER and ALL. The row dimension of the report has two classification variables, OCCUPAT and ALL. For each value of OCCUPAT (one row of the report), the denominator of a row percentage is the N for the columns in that row. Those columns are defined by GENDER and ALL, which are concatenated in the column dimension. Therefore, "GENDER ALL" becomes the denominator definition of the PCTN statistic that computes the row percentages.
Example 3.5 features a program that uses denominator definitions.
The TABLE statement in the preceding topic uses denominator definitions to compute percentages. Each use of PCTN in the TABLE statement nests a row of statistics within each value of OCCUPAT and ALL. Its denominator definition tells PROC TABULATE which frequency counts to sum for the denominators in that row. This section explains how PROC TABULATE interprets these denominator definitions.
The following part of the TABLE statement calculates the row percentages and labels the row:
pctn<gender all>='Percent of row total'
Consider how PROC TABULATE interprets this denominator definition for each of the four subtables.
In the subtable defined by OCCUPAT and GENDER, PROC TABULATE looks at the first element in the denominator definition, GENDER, and asks whether GENDER contributes to the subtable. Because GENDER does contribute to the subtable, PROC TABULATE uses it as the denominator definition. This denominator definition tells PROC TABULATE to sum the frequency counts for all occurrences of GENDER within the same value of OCCUPAT.
For example, the denominator for the category technical, female is the sum of the frequency counts for all categories in this subtable whose value of OCCUPAT is technical. There are two such categories: technical, female and technical, male. The corresponding frequency counts are 16 and 18. Therefore, the denominator for this category is 16+18=34.
In the subtable defined by ALL (row) and GENDER (column), PROC TABULATE looks at the first element in the denominator definition, GENDER, and asks whether GENDER contributes to the subtable. Because GENDER does contribute to the subtable, PROC TABULATE uses it as the denominator definition. This denominator definition tells PROC TABULATE to sum the frequency counts for all occurrences of GENDER in the subtable.
For example, the denominator for the category all, female is the sum of the frequency counts for all, female and all, male. The corresponding frequency counts are 61 and 62. Therefore, the denominator for cells in this subtable is 61+62=123.
In the subtable defined by OCCUPAT (row) and ALL (column), PROC TABULATE looks at the first element in the denominator definition, GENDER, and asks whether GENDER contributes to the subtable. Because GENDER does not contribute to the subtable, PROC TABULATE looks at the next element in the denominator definition, which is ALL. ALL does contribute to this subtable, so PROC TABULATE uses it as the denominator definition. ALL is a reserved class variable with only one category. Therefore, this denominator definition tells PROC TABULATE to use the frequency count of ALL as the denominator.
For example, the denominator for the category clerical, all is the frequency count for that category, 28.
Note: In these table cells, because the numerator and denominator are the same, the row percentages in this subtable are all 100.
In the subtable defined by ALL (row) and ALL (column), PROC TABULATE looks at the first element in the denominator definition, GENDER, and asks whether GENDER contributes to the subtable. Because GENDER does not contribute to the subtable, PROC TABULATE looks at the next element in the denominator definition, which is ALL. ALL does contribute to this subtable, so PROC TABULATE uses it as the denominator definition. ALL is a reserved class variable with only one category. Therefore, this denominator definition tells PROC TABULATE to use the frequency count of ALL as the denominator.
There is only one category in this subtable: all, all. The denominator for this category is 123.
Note: In this table cell, because the numerator and denominator are the same, the row percentage in this subtable is 100.
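The rule applied to each subtable can be sketched in Python as an illustrative model of the behavior described above, not as SAS code. The rule is: use the first element of the denominator definition that contributes to the subtable, then sum the counts over that classification. The helper names `n`, `resolve`, and `pctn_denominator` are hypothetical, and the value `"all"` stands in for the single category of ALL.

```python
counts = {
    ("Technical", "Female"): 16, ("Technical", "Male"): 18,
    ("Manager/Supervisor", "Female"): 20, ("Manager/Supervisor", "Male"): 15,
    ("Clerical", "Female"): 14, ("Clerical", "Male"): 14,
    ("Administrative", "Female"): 11, ("Administrative", "Male"): 15,
}
levels = {
    "OCCUPAT": ["Technical", "Manager/Supervisor", "Clerical", "Administrative"],
    "GENDER": ["Female", "Male"],
    "ALL": ["all"],
}

def n(occ, gen):
    """Frequency count for a category; the value "all" collapses that class."""
    return sum(v for (o, g), v in counts.items()
               if occ in ("all", o) and gen in ("all", g))

def resolve(denominator_def, row_class, col_class):
    """First element of the denominator definition that contributes to the
    subtable defined by (row_class, col_class)."""
    for element in denominator_def:
        if element in (row_class, col_class):
            return element
    raise ValueError("no element of the denominator definition contributes")

def pctn_denominator(denominator_def, occ, gen):
    row_class = "ALL" if occ == "all" else "OCCUPAT"
    col_class = "ALL" if gen == "all" else "GENDER"
    chosen = resolve(denominator_def, row_class, col_class)
    if chosen == col_class:
        # Sum across the column classification; the row value stays fixed.
        return sum(n(occ, g) for g in levels[col_class])
    # Otherwise sum across the row classification; the column value stays fixed.
    return sum(n(o, gen) for o in levels[row_class])

row_def = ["GENDER", "ALL"]  # pctn<gender all>
print(pctn_denominator(row_def, "Technical", "Female"))  # 34
print(pctn_denominator(row_def, "all", "Female"))        # 123
print(pctn_denominator(row_def, "Clerical", "all"))      # 28
print(pctn_denominator(row_def, "all", "all"))           # 123
```

The four printed denominators match the four subtables walked through above.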
The following part of the TABLE statement calculates the column percentages and labels the row:
pctn<occupat all>='Percent of column total'
Consider how PROC TABULATE interprets this denominator definition for each subtable.
In the subtable defined by OCCUPAT and GENDER, PROC TABULATE looks at the first element in the denominator definition, OCCUPAT, and asks whether OCCUPAT contributes to the subtable. Because OCCUPAT does contribute to the subtable, PROC TABULATE uses it as the denominator definition. This denominator definition tells PROC TABULATE to sum the frequency counts for all occurrences of OCCUPAT within the same value of GENDER.
For example, the denominator for the category manager/supervisor, male is the sum of all frequency counts for all categories in this subtable for which the value of GENDER is male. There are four such categories: technical, male; manager/supervisor, male; clerical, male; and administrative, male. The corresponding frequency counts are 18, 15, 14, and 15. Therefore, the denominator for this category is 18+15+14+15=62.
In the subtable defined by ALL (row) and GENDER (column), PROC TABULATE looks at the first element in the denominator definition, OCCUPAT, and asks whether OCCUPAT contributes to the subtable. Because OCCUPAT does not contribute to the subtable, PROC TABULATE looks at the next element in the denominator definition, which is ALL. Because ALL does contribute to this subtable, PROC TABULATE uses it as the denominator definition. ALL is a reserved class variable with only one category. Therefore, this denominator definition tells PROC TABULATE to use the frequency count for ALL as the denominator.
For example, the denominator for the category all, female is the frequency count for that category, 61.
Note: In these table cells, because the numerator and denominator are the same, the column percentages in this subtable are all 100.
In the subtable defined by OCCUPAT (row) and ALL (column), PROC TABULATE looks at the first element in the denominator definition, OCCUPAT, and asks whether OCCUPAT contributes to the subtable. Because OCCUPAT does contribute to the subtable, PROC TABULATE uses it as the denominator definition. This denominator definition tells PROC TABULATE to sum the frequency counts for all occurrences of OCCUPAT in the subtable.
For example, the denominator for the category technical, all is the sum of the frequency counts for technical, all; manager/supervisor, all; clerical, all; and administrative, all. The corresponding frequency counts are 34, 35, 28, and 26. Therefore, the denominator for this category is 34+35+28+26=123.
In the subtable defined by ALL (row) and ALL (column), PROC TABULATE looks at the first element in the denominator definition, OCCUPAT, and asks whether OCCUPAT contributes to the subtable. Because OCCUPAT does not contribute to the subtable, PROC TABULATE looks at the next element in the denominator definition, which is ALL. Because ALL does contribute to this subtable, PROC TABULATE uses it as the denominator definition. ALL is a reserved class variable with only one category. Therefore, this denominator definition tells PROC TABULATE to use the frequency count of ALL as the denominator.
There is only one category in this subtable: all, all. The frequency count for this category is 123.
Note: In this calculation, because the numerator and denominator are the same, the column percentage in this subtable is 100.
The following part of the TABLE statement calculates the total percentages and labels the row:
pctn='Percent of total'
If you do not specify a denominator definition, PROC TABULATE obtains the denominator for a cell by totaling all the frequency counts in the subtable. Table 3.4d summarizes the process for all subtables in this example.
Class Variables Contributing to the Subtable | Frequency Counts | Total |
---|---|---|
OCCUPAT and GENDER | 16, 18, 20, 15, 14, 14, 11, 15 | 123 |
OCCUPAT and ALL | 34, 35, 28, 26 | 123 |
GENDER and ALL | 61, 62 | 123 |
ALL and ALL | 123 | 123 |
Consequently, in this report the denominator for every total percentage is 123.
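The following Python sketch (an illustrative model, not SAS) extends the first-contributing-element rule to cover both of the remaining cases. With an empty definition (plain `pctn`), the denominator is the total of the cell's subtable. With the column definition `<occupat all>`, it reproduces the denominators 62, 61, 123, and 123 derived above. The helper names and the stand-in value `"all"` are hypothetical.

```python
counts = {
    ("Technical", "Female"): 16, ("Technical", "Male"): 18,
    ("Manager/Supervisor", "Female"): 20, ("Manager/Supervisor", "Male"): 15,
    ("Clerical", "Female"): 14, ("Clerical", "Male"): 14,
    ("Administrative", "Female"): 11, ("Administrative", "Male"): 15,
}
levels = {
    "OCCUPAT": ["Technical", "Manager/Supervisor", "Clerical", "Administrative"],
    "GENDER": ["Female", "Male"],
    "ALL": ["all"],
}

def n(occ, gen):
    """Frequency count for a category; the value "all" collapses that class."""
    return sum(v for (o, g), v in counts.items()
               if occ in ("all", o) and gen in ("all", g))

def resolve(denominator_def, row_class, col_class):
    """First element of the denominator definition that contributes."""
    for element in denominator_def:
        if element in (row_class, col_class):
            return element
    raise ValueError("no element of the denominator definition contributes")

def pctn_denominator(denominator_def, occ, gen):
    row_class = "ALL" if occ == "all" else "OCCUPAT"
    col_class = "ALL" if gen == "all" else "GENDER"
    if not denominator_def:
        # No denominator definition: total every count in the subtable.
        return sum(n(o, g) for o in levels[row_class] for g in levels[col_class])
    chosen = resolve(denominator_def, row_class, col_class)
    if chosen == col_class:
        return sum(n(occ, g) for g in levels[col_class])
    return sum(n(o, gen) for o in levels[row_class])

def pctn(denominator_def, occ, gen):
    return 100 * n(occ, gen) / pctn_denominator(denominator_def, occ, gen)

col_def = ["OCCUPAT", "ALL"]  # pctn<occupat all>
print(pctn_denominator(col_def, "Manager/Supervisor", "Male"))  # 62
print(pctn_denominator(col_def, "all", "Female"))               # 61
print(pctn_denominator(col_def, "Technical", "all"))            # 123
print(pctn_denominator([], "Technical", "Female"))              # 123
print(round(pctn([], "Technical", "Female"), 1))                # 13.0
```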
PROC FREQ reference, usage information, and additional examples. See “The FREQ Procedure” in the “Procedures” section of Base SAS 9.1 Procedures Guide.
PROC TABULATE reference, usage information, and additional examples. See “The TABULATE Procedure” in the “Procedures” section of Base SAS 9.1 Procedures Guide.