Example 3.1. Summarizing Data

Goal

Summarize the observations in a data set by several categories.

Report

   Calls Received by Technical Support on January
 31, 2005

                                           
 Percent   Average
                                   Number   within
   Length of
Hour   Status of Problems         of Calls   Hour 
     Calls

9:00   Automated Troubleshooting     10       15% 
    0:05:19
       Resolved on Initial Call      44       68% 
    0:05:38
       Tracked to Specialist         11       17% 
    0:09:20
-----                            --------  
 -------  --------
9:00                                 65      100% 
    0:06:13

10:00  Automated Troubleshooting     16       21% 
    0:09:26
       Resolved on Initial Call      52       68% 
    0:06:53
       Tracked to Specialist          8       11% 
    0:12:21
-----                            --------  
 -------  --------
10:00                                76      100% 
    0:08:00

11:00  Automated Troubleshooting     12       16% 
    0:11:44
       Resolved on Initial Call      48       64% 
    0:07:08
       Tracked to Specialist         15       20% 
    0:13:48
-----                            --------  
 -------  --------
11:00                                75      100% 
    0:09:12

12:00  Automated Troubleshooting      5        9% 
    0:10:07
       Resolved on Initial Call      37       66% 
    0:08:22
       Tracked to Specialist         14       25% 
    0:11:08
-----                            --------  
 -------  --------
12:00                                56      100% 
    0:09:13

1:00   Automated Troubleshooting      5        8% 
    0:04:20
       Resolved on Initial Call      43       70% 
    0:05:53
       Tracked to Specialist         13       21% 
    0:10:39
-----                            --------  
 -------  --------
1:00                                 61      100% 
    0:06:47

2:00   Automated Troubleshooting     13       15% 
    0:07:44
       Resolved on Initial Call      64       74% 
    0:07:57
       Tracked to Specialist         10       11% 
    0:15:21
-----                            --------  
 -------  --------
2:00                                 87      100% 
    0:08:46

3:00   Automated Troubleshooting     13       15% 
    0:09:59
       Resolved on Initial Call      61       72% 
    0:06:46
       Tracked to Specialist         11       13% 
    0:07:29
-----                            --------  
 -------  --------
3:00                                 85      100% 
    0:07:21

4:00   Automated Troubleshooting      3        4% 
    0:08:00
       Resolved on Initial Call      57       75% 
    0:07:39
       Tracked to Specialist         16       21% 
    0:06:13
-----                            --------  
 -------  --------
4:00                                 76      100% 
    0:07:21

       =========================  ======== 
 =======  =========
       Total for Day                581      100% 
    0:07:53


Example Features

Data SetPHONDATA
Featured StepPROC REPORT
Featured Step Statements and OptionsDEFINE statement: ANALYSIS, COMPUTED, GROUP, MEAN, and ORDER= options

COMPUTE blocks

BREAK AFTER statement: SUMMARIZE option

RBREAK AFTER statement: SUMMARIZE option
Formatting FeaturesCustomizing the RBREAK summary line

BREAK AFTER statement: OL and SKIP options when sending output to the LISTING destination

RBREAK AFTER statement: DOL option when sending output to the LISTING destination
Related TechniquePROC TABULATE, PCTN percentage calculation
A Closer LookComparing PROC REPORT and PROC TABULATE for Generating the Report in this Example

Example Overview

This report summarizes phone calls to technical support by the hour the call was received and by the status of the call. As shown in the example the status of each call is classified into one of three groups. For each combination of hour and status, the report presents the number of calls and the average length of each call. The percentage of each status grouping is computed within each hour.

The first two columns of the report correspond to variables in the PHONDATA data set. These two variables define the categories for which statistics are computed. The N statistic is presented in the third column. It is equal to the number of observations represented in the row. The fourth column is a computed column that is based on the N statistic computed in column 3. The fifth column is the MEAN statistic computed on an analysis variable in the data set.

Program

 
proc format;

Define a format to group the types of calls.
   value $statfmt
         'PRIM/RES'='Resolved on Initial Call'
         'AUTOMATED'='Automated Troubleshooting'
         'TOTAL'='Total for Day'
          other='Tracked to Specialist';

Define a format to label the hour in which technical support received the call.
   value hourfmt  9='9:00'
                 10='10:00'
                 11='11:00'
                 12='12:00'
                 13='1:00'
                 14='2:00'
                 15='3:00'
                 16='4:00';
run;

proc report data=phondata nowindows;
  title1 'Calls Received by Technical Support on
          January 31, 2005';

List the columns in the order in which they should appear in the report. Include in the list the statistics columns and the computed column. Ensure that the columns used to calculate the computed variable precede the computed variable in the list.
  column hour status n hourpct primtime;

Group the observations by the values of HOUR and STATUS. Establish the order of the rows by the position of the grouping variables in the COLUMN statement.
  define hour     / group format=hourfmt. center

Present the groups in their unformatted order. This overrides the default ordering of the rows by the formatted values of the group variable.
                    order=internal
                    'Hour';

Group the values of the variable STATUS by the $STATFMT format.
  define status   / group format=$statfmt.
                   'Status of Problems';

When sending output to the LISTING destination, specify the width of the column.
                    width=25

Specify the characteristics of the column that presents the N statistic.
  define n        / format=3. width=8 center
                    'Number/of Calls';

Define the computed column and write the results with the PERCENT format.
  define hourpct  / computed format=percent.
                    width=7 center
                    'Percent/within/Hour';

Compute the MEAN statistic.
  define primtime / analysis mean format=time8.
                    width=9 center
                    'Average/Length of/Calls';

Retain the total N for each HOUR category so that this value can be used as the denominator in calculating HOURPCT for each row within the HOUR category. Determine this value before listing the rows in the HOUR category. Note that it is not necessary to define TOTALN in the COLUMN statement or on a DEFINE statement. By default, an undefined variable that is not in the data set is a numeric computed, nondisplay variable.
  compute before hour;
    totaln=n;
  endcomp;

Compute the value of HOURPCT for each row of the report. Determine the percentage of each type of call within each HOUR category. For the denominator use the TOTALN variable, which is a computed, nondisplay variable. Specify the N statistic as the numerator, which is equal to the N statistic for the current row.
  compute hourpct;
    hourpct=n/totaln;
  endcomp;

Create a summary line after the last row for each value of HOUR.
  break after hour / summarize

When sending output to the LISTING destination, skip a line after the summary row and place a line above the summary row.
                     skip
                     ol;

Create a summary line at the end of the report.
  rbreak after / summarize

When sending the report to the LISTING destination, place a double line above the summary row.
                 dol;

Modify the contents of the summary line at the end of the report.
  compute after;

Specify a unique value for STATUS that reflects the summary line. Note that this value was specified in the $STATFMT format that was defined in the PROC FORMAT step.
    status='TOTAL';

Replace the calculated value of HOURPCT at the end of the report. Assign a value of 1 to HOURPCT so that the PERCENT format writes the value as 100%.
    hourpct=1;
  endcomp;
run;


Related Technique

PROC TABULATE can produce a report similar to the one that PROC REPORT produced, as shown in Figure 3.1.

Figure 3.1. Output Produced by PROC TABULATE
                       Calls Received by Technical Support on January 31, 2005
           -------------------------------------------------------------------------------
           |                                                     |Number|Percent|Average  |
           |                                                     |  of  |within | Length  |
           |                                                     |Calls | Hour  |of Calls |
           |-----------------------------------------------------+------+-------+---------|
           |Hour                      |Status                    |      |       |         |
           |--------------------------+--------------------------|      |       |         |
           |9:00                      |Automated Troubleshooting |    10|     15| 0:05:19 |
           |                          |--------------------------+------+-------+---------|
           |                          |Tracked to Specialist     |    11|     17| 0:09:20 |
           |                          |--------------------------+------+-------+---------|
           |                          |Resolved on Initial Call  |    44|     68| 0:05:38 |
           |                          |--------------------------+------+-------+---------|
           |                          |Total for Hour            |    65|    100| 0:06:13 |
           |--------------------------+--------------------------+------+-------+---------|
           |10:00                     |Status                    |      |       |         |
           |                          |--------------------------|      |       |         |
           |                          |Automated Troubleshooting |    16|     21| 0:09:26 |
           |                          |--------------------------+------+-------+---------|
           |                          |Tracked to Specialist     |     8|     11| 0:12:21 |
           |                          |--------------------------+------+-------+---------|
           |                          |Resolved on Initial Call  |    52|     68| 0:06:53 |
           |                          |--------------------------+------+-------+---------|
           |                          |Total for Hour            |    76|    100| 0:08:00 |
           |--------------------------+--------------------------+------+-------+---------|
           |11:00                     |Status                    |      |       |         |
           |                          |--------------------------|      |       |         |
           |                          |Automated Troubleshooting |    12|     16| 0:11:44 |
           |                          |--------------------------+------+-------+---------|
           |                          |Tracked to Specialist     |    15|     20| 0:13:48 |
           |                          |--------------------------+------+-------+---------|
           |                          |Resolved on Initial Call  |    48|     64| 0:07:08 |
           |                          |--------------------------+------+-------+---------|
           |                          |Total for Hour            |    75|    100| 0:09:12 |
           |--------------------------+--------------------------+------+-------+---------|
           |12:00                     |Status                    |      |       |         |
           |                          |--------------------------|      |       |         |
           |                          |Automated Troubleshooting |     5|      9| 0:10:07 |
           |                          |--------------------------+------+-------+---------|
           |                          |Tracked to Specialist     |    14|     25| 0:11:08 |
           |                          |--------------------------+------+-------+---------|
           |                          |Resolved on Initial Call  |    37|     66| 0:08:22 |
           |                          |--------------------------+------+-------+---------|
           |                          |Total for Hour            |    56|    100| 0:09:13 |
           |--------------------------+--------------------------+------+-------+---------|
           |1:00                      |Status                    |      |       |         |
           |                          |--------------------------|      |       |         |
           |                          |Automated Troubleshooting |     5|      8| 0:04:20 |
           |                          |--------------------------+------+-------+---------|
           |                          |Tracked to Specialist     |    13|     21| 0:10:39 |
           |                          |--------------------------+------+-------+---------|
           |                          |Resolved on Initial Call  |    43|     70| 0:05:53 |
           |                          |--------------------------+------+-------+---------|
           |                          |Total for Hour            |    61|    100| 0:06:47 |
           -------------------------------------------------------------------------------
           |2:00                      |Automated Troubleshooting |    13|     15| 0:07:44 |
           |                          |--------------------------+------+-------+---------|
           |                          |Tracked to Specialist     |    10|     11| 0:15:21 |
           |                          |--------------------------+------+-------+---------|
           |                          |Resolved on Initial Call  |    64|     74| 0:07:57 |
           |                          |--------------------------+------+-------+---------|
           |                          |Total for Hour            |    87|    100| 0:08:46 |
           |--------------------------+--------------------------+------+-------+---------|
           |3:00                      |Status                    |      |       |         |
           |                          |--------------------------|      |       |         |
           |                          |Automated Troubleshooting |    13|     15| 0:09:59 |
           |                          |--------------------------+------+-------+---------|
           |                          |Tracked to Specialist     |    11|     13| 0:07:29 |
           |                          |--------------------------+------+-------+---------|
           |                          |Resolved on Initial Call  |    61|     72| 0:06:46 |
           |                          |--------------------------+------+-------+---------|
           |                          |Total for Hour            |    85|    100| 0:07:21 |
           |--------------------------+--------------------------+------+-------+---------|
           |4:00                      |Status                    |      |       |         |
           |                          |--------------------------|      |       |         |
           |                          |Automated Troubleshooting |     3|      4| 0:08:00 |
           |                          |--------------------------+------+-------+---------|
           |                          |Tracked to Specialist     |    16|     21| 0:06:13 |
           |                          |--------------------------+------+-------+---------|
           |                          |Resolved on Initial Call  |    57|     75| 0:07:39 |
           |                          |--------------------------+------+-------+---------|
           |                          |Total for Hour            |    76|    100| 0:07:21 |
           |-----------------------------------------------------+------+-------+---------|
           |Total for Day                                        |   581|    100| 0:07:53 |
           -------------------------------------------------------------------------------

The following program produces the report shown in Figure 3.1.

 
proc format;
   value $statfmt
         'AUTOMATED'='Automated Troubleshooting'
         'PRIM/RES'='Resolved on Initial Call'
         other='Tracked to Specialist';
    value hourfmt  9='9:00'
                  10='10:00'
                  11='11:00'
                  12='12:00'
                  13='1:00'
                  14='2:00'
                  15='3:00'
                  16='4:00';
run;
proc tabulate data=phondata;

Specify the classification variables.
   class hour status;

Specify the analysis variable.
   var primtime;

Place HOUR as the major classification variable of the report and put it in the row dimension.
   table hour='Hour'*

Nest the formatted values of STATUS within each formatted value of HOUR. Summarize the information over all values of STATUS within one formatted value of HOUR by using the ALL keyword.
       (status='Status' all='Total for Hour')

Compute an overall summary of the data and place this summary as the last row in the report.
        all='Total for Day',

Place the analyses of PRIMTIME in the column dimension.
        primtime=' '*

Compute three statistics on PRIMTIME.
       (n='Number of Calls'*f=6.

Override the default denominator for computing percentages by specifying a different denominator in angle brackets. (The default denominator is the total number of observations analyzed in the report.) To compute the percentages within each HOUR category, use as the denominator the total number of observations analyzed over all values of STATUS for that category of HOUR. (Specify STATUS within the angle brackets.) To compute the overall percentages presented in the last row of the report, use as the denominator the total number of observations analyzed in the report. (Specify ALL within the same set of angle brackets.)
       pctn<status all>='Percent within Hour'*f=7.
       mean='Average Length of Calls'*f=time8.)

When sending output to the LISTING destination, override the default determination of the amount of space to allocate to the row titles (HOUR and STATUS).
       / rts=55;
  format status $statfmt. hour hourfmt.;
run;


A Closer Look

Comparing PROC REPORT and PROC TABULATE for Generating the Report in This Example

As demonstrated in the previous sections, either PROC REPORT or PROC TABULATE can summarize data by categories. Each procedure has strengths and weaknesses.

Strengths of PROC REPORT

In this example, you might prefer PROC REPORT for these reasons:

More control over spacing between groups. When sending output to the LISTING destination, you can add options to skip lines between groups as well as underline them and overline them. This helps you distinguish between lower-level and higher-level summaries in the report. PROC TABULATE inserts a separator line between each row of the report in this example, whether or not it’s sent to the LISTING destination. You can remove these lines by specifying NOSEPS on the PROC TABULATE statement. However, NOSEPS removes all lines, and some of these may be useful in making the report easier to understand.
Less space required to display the report. This advantage relates to the advantage described above and also applies to the LISTING destination. The output from the PROC REPORT step does not have any separator lines between the three types of calls within each hour, and it labels the STATUS column only once at the top of each page. The PROC TABULATE report separates the three types of call status within each hour with a line and repeats the label for STATUS within each hour. You could save a line within each hour by suppressing the column heading for STATUS in the PROC TABULATE step, but then you would not have a column heading anywhere else for STATUS. To suppress the heading for STATUS, use status=‘ ’ in the PROC TABULATE TABLE statement instead of status='Status'.

Strengths of PROC TABULATE

You might prefer PROC TABULATE for these reasons:

Easier to specify the calculation of percentages. PROC REPORT includes percentage statistics, but the denominator for these calculations is based on the total number of observations presented in the report. To compute percentages using other denominator definitions requires the use of COMPUTE blocks. The options in PROC TABULATE allow you to base your percentage calculations on the combinations of classification variables or report dimensions that are appropriate for the information you want to convey, without adding programming statements.
Less coding required to produce summary rows and columns. The placement of the keyword ALL in a PROC TABULATE TABLE statement summarizes all of the categories for class variables in the same parenthetical group or dimension. With PROC REPORT, you must include BREAK and RBREAK statements to summarize information.

Where to Go from Here

PROC REPORT reference, usage information, and additional examples. See “The REPORT Procedure” in the “Procedures” section of Base SAS 9.1 Procedures Guide.

PROC TABULATE reference, usage information, and additional examples. See “The TABULATE Procedure” in the “Procedures” section of Base SAS 9.1 Procedures Guide.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
3.149.241.48