Example 3.12 Combining and Collapsing Observations Based on a Common Variable

Goal

Reshape a transaction data set by turning related observations into single ones. Match each collapsed observation from the transaction data set with an appropriate observation from the master data set, based on the value of a key variable.

Example 3.13 also combines and collapses observations based on a common variable, but its transaction data set is indexed on the common variable and it uses index features to combine the observations.

Example Features

Featured StepPROC TRANSPOSE and DATA step
Featured Step Options and StatementsPROC TRANSPOSE: BY, ID, and VAR statements, PREFIX= option DATA step: match-merge by common variable
Related TechniqueDATA step, arrays, RETAIN statement
A Closer LookNaming the Variables in the Transposed Data Set

Input Data Sets

Data set WEIGHT_LOSS identifies five participants in a weight loss group. Data set WEIGHTS contains the weights for four of the five participants over a four-week period. The variable in common is ID. No weights were recorded for S005. Only one (S003) of the five participants has a weight recorded for each of the four weeks.


            WEIGHT_LOSS

                       goal_
       Obs     id     weight
        1     S001      190
        2     S002      176
        3     S003      136
        4     S004      118
        5     S005      135


                 WEIGHTS

     Obs     id     week    weight
       1    S001      1       231
       2    S002      1       187
       3    S003      1       154
       4    S004      1       134
       5    S001      2       223
       6    S003      2       151
       7    S004      2       133
       8    S002      3       176
       9    S003      3       148
      10    S004      4       129
      11    S003      4       142

Resulting Data Set

Output 3.12 WEIGHT_UPDATES Data Set

Example 3.12 WEIGHT_UPDATES Data Set Created with PROC TRANSPOSE

          goal_
Obs  id  weight wt_week1 wt_week2 wt_week3 wt_week4

 1  S001   190     231       223        .        .
 2  S002   176     187         .      176        .
 3  S003   136     154       151      148      142
 4  S004   118     134       133        .      129
 5  S005   135       .         .        .        .


Example Overview

This example transposes the observations in a transaction data set and match-merges the transposed observations to a master data set. It shows you how to use PROC TRANSPOSE to collapse observations based on a common variable and control how the variables are named and arranged in the output data set. A DATA step then merges this new transaction data set to the master data set.

Participants in a weight loss group and their goal weight are identified in data set WEIGHT_LOSS. Each observation in data set WEIGHTS records the weight for a participant at a specific week.

The goal of the program is to rearrange the observations in WEIGHTS so that all the weights for a participant are in one observation rather than several observations, and then merge the transposed observations to the master data set WEIGHT_LOSS by common variable ID. All observations in WEIGHT_LOSS are to be kept even if there is no match in WEIGHTS.

An advantage of this program over the DATA step in the Related Technique is that you do not have to know in advance how many values of WEEK are in WEIGHT_LOSS so that you define the correct number of variables.

One disadvantage of this program over the DATA step in the Related Technique is if you have multiple variables to transpose. Within each BY group, PROC TRANSPOSE creates an observation for each variable named in the VAR statement. Data set WEIGHT_LOSS is collapsed into four observations, one for each study subject in WEIGHT_LOSS, because the procedure transposes only one variable, WEIGHT. If two variables were transposed, PROC TRANSPOSE would create an output data set with eight observations and you would have to perform additional processing to get all the information for one subject on one observation.

Program

Sort data set WEIGHTS by ID because the goal is to transpose WEIGHT values for each subject. Order the observations for each subject by WEEK so that PROC TRANSPOSE will define the transposed variables in order by the values of WEEK as they are encountered in the data set. Transpose data set WEIGHTS. Save the transposed observations in TRANSWT. Drop the automatic variable _NAME_, which contains the name of the variable that is being transposed. Specify a prefix to use in constructing the names of the transposed variables. Do not enclose the text in quotation marks. For each ID value, create an observation for each transposed variable. Name the variable to transpose. Specify the variable whose values name the transposed variables in the output data set. By specifying the PREFIX= option, precede the values of WEEK in the variable name with the PREFIX= text. Create data set WEIGHT_UPDATES. Combine WEIGHT_LOSS and the transposed data set TRANSWT by the common variable ID. Specify the IN= data set option on WEIGHT_LOSS so that it can be determined when observations are contributed from WEIGHT_LOSS. Keep all observations in WEIGHT_LOSS whether or not they have a match in TRANSWT.

proc sort data=weights;
  by id week;
run;

proc transpose data=weights
               out=transwt(drop=_name_)

               prefix=wt_week;


  by id;


  var weight;
  id week;


run;

data weight_updates;
  merge weight_loss(in=ingroup) transwt;
    by id;


  if ingroup;


run;

Related Technique

The following DATA step creates a data set equivalent to the one that was created in the main example. It requires that you know beforehand the possible values of WEEK in data set WEIGHTS, and it relies on the values of week being sequential from 1 to 4. If the data values of the variable you want to transpose are not sequential, you must modify the code that tests the values so that the values are placed in the proper array elements.

The DATA step places each value of WEIGHT in the WT_WEEK array based on its associated value for WEEK. The elements of the WT_WEEK array are retained across observations. When processing the first observation in an ID BY group, all of the elements of the WT_WEEK array are set to missing. The DATA step outputs observations to WEIGHT_UPDATES only after processing the last observation in an ID BY group.

Both input data sets must be sorted or indexed by ID before executing the DATA step.

The dimension of the WT_WEEK array is the maximum number of observations in a BY group, which is 4. The ARRAY statement is the only location where that value is explicitly specified.

Create data set WEIGHT_UPDATES. Combine WEIGHT_LOSS and WEIGHTS by the common variable ID. Specify the IN= data set option on WEIGHT_LOSS so that it can be determined when observations are contributed from WEIGHT_LOSS. Merge the two data sets by common variable ID and define BY-group processing. Define an array that will hold the transposed values. Specify as the dimension of the array the maximum number of observations in a BY group.

Retain the elements of the WT_WEEK array across iterations of the DATA step. Drop variables not needed in the output data set, including WEIGHT whose values are moved to the elements of the WT_WEEK array, and WEEK whose values are represented by the suffix on the WT_WEEK series of variables. At the beginning of each BY group, set to missing all the elements of the WT_WEEK array.

For known values of WEEK, copy the value of WEIGHT to the corresponding WT_WEEK array element. Output an observation only when processing the last observation in an ID BY group.

data weight_updates;
  merge weight_loss(in=ingroup) weights;


    by id;

  array wt_week{4};


  retain wt_week:;


  drop weight week;




 if first.id then call missing(of wt_week[*]);

  if 1 le week le dim(wt_week) then
                        wt_week{week}=weight;

  if last.id then output;


run;

A Closer Look

Naming the Variables in the Transposed Data Set

The PREFIX= option in the PROC TRANSPOSE statement in conjunction with the ID statement control how the main example names the variables that were created by transposing variable WEIGHT.

PROC TRANSPOSE transposes the values of WEIGHT for each study subject into a series of variables that have the prefix WT_WEEK. The suffix for the WT_WEEK variables is the week of the measurement. Because variable WEEK has four values—1, 2, 3, and 4—the data set that contains the transposed observations will contain four WT_WEEK variables: WT_WEEK1, WT_WEEK2, WT_WEEK3, and WT_WEEK4.

The values of WEEK determine the suffix on the WT_WEEK variables. For example, if your WEEK values were 1, 4, 8, and 12, the names of the four variables would be WT_WEEK1, WT_WEEK4, WT_WEEK8, and WT_WEEK12.

The order in which PROC TRANSPOSE encounters the values of WEEK determines the order in which it defines the transposed variables. Subject S001 has a weight value for the first two weeks, subject S002 has values for the first and third weeks, and subject S003 has values for all four weeks. Therefore, PROC TRANSPOSE defines the series of WT_WEEK variables in sequential order: WT_WEEK1, WT_WEEK2, WT_WEEK3, and WT_WEEK4. However, if subject S001 did not have a value for week 1, the order of the variables in the output data set would be WT_WEEK2, WT_WEEK1, WT_WEEK3, and WT_WEEK4.

You could also move the ARRAY and RETAIN statements before the MERGE statement to ensure these variables were defined in the Program Data Vector in a specific order.

Ensure that the variable you name in the ID statement does not contain duplicate values within the BY group. None of the subjects in this example have more than one weight recorded per week. If a subject did have duplicate values for a week, SAS would stop the PROC TRANSPOSE step and issue an error and warning message. If you are aware of the duplicate condition, you could add the LET option to the PROC TRANSPOSE statement. This prevents the error condition and causes PROC TRANSPOSE to transpose the observation that contains the last occurrence of the particular ID value within the data set or BY group.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
3.138.35.229