Example 2.7. Presenting Multiple Observations per Report Row

Goal

Present in one row data from more than one observation. Use the values of a grouping variable to select the observations to present in one row.

Analyze specific variables within a row and present the results in computed columns in the report. Some of the calculations will be derived from the multiple observations in the row.

Report

                         Exercise Program Results

                                             Chol/
  Chol/
          Study  Pre Program   Post Program   HDL 
   HDL    Chol/HDL
 Gender      ID   Chol    HDL   Chol    HDL   Pre 
  Post    Change
 Males     1005    296     47    272     51    6.3
    5.3   Improved
           1006    155     31    152     33    5.0
    4.6   Minimal Change
           1007    250     55    231     60    4.5
    3.9   Improved
           1008    264     43    195     44    6.1
    4.4   Improved
           1013    183     51    192     49    3.6
    3.9   Minimal Change
           1014    256     43    235     43    6.0
    5.5   Minimal Change
           1015    235     43    216     44    5.5
    4.9   Improved
           1016    238     36    207     36    6.6
    5.8   Improved
           1017    215     50    205     52    4.3
    3.9   Minimal Change
           1018    190     31    164     32    6.1
    5.1   Improved
           1019    168     52    172     44    3.2
    3.9   Worsened
           1020    219     57    207     58    3.8
    3.6   Minimal Change
           1021    203     28    169     28    7.3
    6.0   Improved
           1022    215     51    205     52    4.2
    3.9   Minimal Change
           1023    222     32    210     32    6.9
    6.6   Minimal Change

 Females   1001    156     48    150     50    3.3
    3.0   Minimal Change
           1002    151     50    139     54    3.0
    2.6   Improved
           1003    165     51    149     51    3.2
    2.9   Minimal Change
           1004    158     51    143     53    3.1
    2.7   Improved
           1009    187     71    174     69    2.6
    2.5   Minimal Change
           1010    161     64    155     66    2.5
    2.3   Minimal Change
           1011    164     72    149     73    2.3
    2.0   Improved
           1012    160     66    168     61    2.4
    2.8   Worsened
           1024    196     61    186     62    3.2
    3.0   Minimal Change
           1025    216     51    171     54    4.2
    3.2   Improved
           1026    195     60    195     60    3.3
    3.3   Minimal Change


Example Features

Data SetLIPIDS
Featured StepPROC REPORT
Featured Step Statements and OptionsDEFINE statement: ACROSS, COMPUTED, DESCENDING, DISPLAY, and GROUP options

COMPUTE blocks

Stacking one column above other columns
Formatting FeaturesControlling line breaks in column headings BREAK statement: SKIP option when sending output to the LISTING destination
A Closer LookUnderstanding the Processing of the BREAK Statement in This Example

Constructing Columns from Column Specifications

Reshaping a Report by Using GROUP and ACROSS Variables
ODS Enhanced VersionExample 6.4
Other Examples That Use This Data SetExamples 3.3, 6.4, 6.8, and 6.9

Example Overview

This report presents test results for participants in a study that measures lipids before and after participation in an exercise program. The goal of the study is to reduce cholesterol and raise HDL. Results are measured by the ratio of cholesterol to HDL. The report should present an evaluation of the change in the ratio before and after participation in an exercise program.

Each observation in the LIPIDS data set corresponds to the lipid measurements for one participant during one of the two testing periods.

Simply listing the observations with PROC PRINT or with PROC REPORT produces a long listing with many rows. The structure of this simple presentation makes it difficult to draw conclusions about each participant’s test results in the two time periods.

Grouping the observations by GENDER and STUDYID and specifying TESTPERIOD as an ACROSS column reshapes the long listing into a more concise report that puts each participant’s results on one row. This new structure makes it easier to compare the results for a study participant. It also makes it possible to calculate changes for a participant before and after the exercise program and to include these evaluations in the report.

Each row in the report contains data from two observations in the LIPIDS data set. The two observations are the test results for a study participant before and after participation in the exercise program. The last three columns are computed columns defined by PROC REPORT, and their values are determined in COMPUTE blocks.

Program

 
proc format;
  value ratio low-<-.1='Improved'
              -.1-.1='Minimal Change'
              >.1-high='Worsened';
  value $gender 'M'='Males'
                'F'='Females';

Place the dollar sign character in the format label in the positions where the value label should split over lines.
  value $results 'Pre'='Pre$Program$Results'
                 'Post'='Post$Program$Results';
run;

proc report data=lipids nowindows

Replace the default split character (/) with another character because some of the column labels in the report contain the slash (/).
     split='$';



  title 'Exercise Program Results';

  column gender studyid
	 testperiod,(chol hdl)	

Stack one column above two other columns.
List the three computed columns. Ensure that any column that is based on another computed column follows that computed column.
         ratiopre ratiopost results;

Identify the variables that group the observations and define the rows. Order the values of GENDER in descending order so that the data for males is displayed before that for females.
  define gender / group descending 'Gender'
                  format=$gender.;
  define studyid / group 'Study ID' width=5;

Identify TESTPERIOD as the variable whose values define and label the columns.
  define testperiod / across

Order the values of TESTPERIOD in descending order so that the “PRE” data values are displayed before the “POST” data values.
           descending

Suppress the column heading for TESTPERIOD.
           ' '
           center format=$results.;

Specify the display variables that are listed side-by-side beneath the values of TESTPERIOD. Specify the column width when sending output to the LISTING destination.
  define chol / display 'Chol' width=5;
  define hdl / display 'HDL' width=5;

Specify the computed columns.
  define ratiopre / computed format=5.1
                    'Chol/HDL Pre' center;
  define ratiopost / computed format=5.1
                     'Chol/HDL Post' center;

Characterize the pre- and post-exercise results by formatting the value of the computed column.
  define results / computed format=ratio.
                   'Chol/HDL Change' left width=14;

Skip a line after each section that is grouped by GENDER when sending output to the LISTING destination.
  break after gender / skip;

Compute two columns whose values are based on other columns. Reference these columns explicitly by number, because they do not have a single name to reference them.
  compute ratiopre;
    ratiopre=_c3_/_c4_;
  endcomp;
  compute ratiopost;
    ratiopost=_c5_/_c6_;
  endcomp;

Compute an additional column based on the two computed columns, and reference these columns by name, even though their COMPUTE blocks use explicit column numbers to create them.
  compute results;
    results=(ratiopost-ratiopre)/ratiopre;
  endcomp;
run;


A Closer Look

Understanding the Processing of the BREAK Statement in This Example

The SAS log for this program generates the following note:

   NOTE: Missing values were generated as a result of performing
   an operation on missing values.
   Each place is given by: (Number of times) at (Line):(Column).
   2 at 1:15   2 at 1:16   2 at 1:20

Reviewing the reason for this note can further help you understand how PROC REPORT processes BREAK statements.

The BREAK statement executes before it processes the observations within a group. Since CHOL and HDL are DISPLAY variables, their values are missing when the BREAK statement executes. Thus the three COMPUTE blocks generate missing values. A missing value is generated twice in each COMPUTE block, which corresponds to each value of GENDER.

One way to suppress the note is to define CHOL and HDL as analysis variables that compute the mean statistic. Each row has an N of 1, so the means end up being equal to the actual detail value. Mean statistic values for columns 3, 4, 5, and 6 are available when the BREAK statement executes. The means, however, are not displayed, because the SUMMARIZE option is not included in the BREAK statement.

Another way to suppress the note is to test the value of the automatic variable, _BREAK_. PROC REPORT assigns specific values to _BREAK_ that you can test to control execution of the statements in the COMPUTE blocks. The value of _BREAK_ is missing when PROC REPORT is not processing a BREAK line. The COMPUTE blocks could be rewritten as follows to test the value of _BREAK_.

   compute ratiopre;
     if _break_=' ' then ratiopre=_c3_/_c4_;
   endcomp;
   compute ratiopost;
     if _break_=' ' then ratiopost=_c5_/_c6_;
   endcomp;
   compute results;
     if _break_=' ' then
       results=(ratiopost-ratiopre)/ratiopre;
   endcomp;

Example 3.8 conditionally executes COMPUTE block statements by testing the value of _BREAK_. Further discussion of _BREAK_ is included in the “A Closer Look” section in that example.

Constructing Columns from Column Specifications

The COLUMN statement in this report uses two types of column specifications. One is the simple specification of a single report item. The other is the specification of report items separated by a comma.

   column gender studyid testperiod,(chol hdl)
          ratiopre ratiopost results;

A third way to specify report columns is to assign an alias to a report item. If you want to use the same report item more than once in a COLUMN statement, you can assign an alias to each occurrence. This allows you to specify a different DEFINE statement for each occurrence. The Related Technique in Example 3.9 assigns aliases to multiple occurrences of the statistic N.

Table 2.7 describes three ways to specify report columns.

Table 2.7. Constructing Columns from Column Specifications
This type of column specification...Results in this type of structure...Corresponding report items in this example
A single report itemA single columnGENDER

STUDYID

RATIOPRE

RATIOPOST

RESULTS
Two or more report items separated by a commaItems that collectively determine the contents of the column or columns. All items are used to create column headings with the heading for the leftmost item on top. (The heading for TESTPERIOD is a blank; the formatted values of TESTPERIOD become the column heading for TESTPERIOD.) If one of the items is an analysis variable, a computed variable, or a statistic, its values fill the cells. Otherwise, PROC REPORT fills the cells with frequency counts. The parentheses in this example’s specification create a column for each of the variables within the parentheses beneath each value of the report item to the left of the comma. (In this example, the values for TESTPERIOD are “Pre” and “Post.”)TESTPERIOD,(CHOL HDL)
A single report item and an aliasA single column. The alias enables you to use two different DEFINE statements for the single report item so that you can specify different attributes for the different occurrences of the variable.See Example 3.9

Reshaping a Report by Using GROUP and ACROSS Variables

The first option in each DEFINE statement for this report describes how PROC REPORT should use each variable in the report. This report defines four kinds of variables:

group variables: GENDER and STUDYID
across variables: TESTPERIOD
display variables: CHOL and HDL
computed variables: RATIOPRE, RATIOPOST, and RESULTS

Group and across variables affect the layout of rows and columns in the report. A variable defined as DISPLAY lists the values of the variable. (An analysis variable could be specified in the same manner as the display variable, and the values presented would be the statistic associated with the variable.) Computed variables are calculated from other items in the report. The rest of this section explains the different usages in more detail.

Group Variables

A group is a set of observations that have a unique combination of formatted values for all group variables. PROC REPORT summarizes all the observations in a group in one row of the report.

In this report, the group variables are GENDER and STUDYID. Figure 2.7a illustrates what happens when PROC REPORT creates groups for GENDER and STUDYID from the first six observations in the input data set. These six observations represent three rows in the report.

Figure 2.7a. Grouping Six Observations


Across Variables

PROC REPORT creates a column for each formatted value of an across variable. In this report, it creates a column for each combination of testing time period and the two tests, cholesterol and HDL. By default, the values that fill the cells created by an across variable are frequency counts. This report, however, overrides this default. The values of each test fill the cells because the test variables, CHOL and HDL, are nested underneath the across variable and are defined as display variables.

Figure 2.7b uses the first six observations to illustrate how PROC REPORT places values for TESTPERIOD, CHOL, and HDL in the report. Note that TRI is not included in the final report.

Figure 2.7b. Layout Created by Across Variables

Input Data Set

   STUDYID GENDER TESTPERIOD CHOL HDL TRI
   1001      F       Pre     156  48  134
   1001      F       Post    150  50  127
   1002      F       Pre     151  50  102
   1002      F       Post    139  54  81
   1003      F       Pre     165  51  114
   1003      F       Post    149  51  101


Report

   Pre         Post
   Program     Program
   Results     Results
Chol   HDL  Chol   HDL
 156    48   150    50
 151    50   139    54
 165    51   149    51


Putting Together the Pieces

This example uses group and across variables together to reshape and present the data in one step. Figure 2.7a lists the first six observations in the LIPIDS data set. Figure 2.7b shows the transformation of these first six observations into three rows of the report.

The computed variables add columns to the report. The calculation of a computed variable value uses information in the row in which the value is placed. Because each row presents test values from several observations, the calculation therefore ends up being performed on data from several observations. Figure 2.7c includes the three computed variables.

Figure 2.7c. Constructing the Whole Report

Input Data Set

STUDYID GENDER TESTPERIOD CHOL HDL TRI
1001      F       Pre      156  48  134
1001      F       Post     150  50  127
1002      F       Pre      151  50  102
1002      F       Post     139  54  81
1003      F       Pre      165  51  114
1003      F       Post     149  51  101


Report

                 Pre         Post
                 Program     Program    Chol/  Chol/
         Study   Results     Results     HDL   
 HDL    Chol/HDL
Gender      ID  Chol   HDL  Chol   HDL   Pre  
 Post    Change
Females   1001   156    48   150    50    3.3    3
.0   Minimal Change
          1002   151    50   139    54    3.0    2
.6   Improved
          1003   165    51   149    51    3.2    2
.9   Minimal Change


Where to Go from Here

PROC FORMAT reference, usage information, and additional examples. See “The FORMAT Procedure” in the “Procedures” section of Base SAS 9.1 Procedures Guide, and “Formatted Values” in the “Fundamental Concepts for Using Base SAS Procedures” section of Base SAS 9.1 Procedures Guide.

PROC REPORT reference, usage information, and additional examples. See “The REPORT Procedure,” in the “Procedures” section of Base SAS 9.1 Procedures Guide.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
18.217.68.197