Example 8.9 Adding Observations to a SAS Data Set So the Values of a Variable Are Consecutive throughout the BY Group

Goal

Add observations to a data set to fill out each BY group so that the value of a specified variable consistently increments for every observation in the BY group.

Example 8.10 is similar to Example 8.9 except that the data are more complex in Example 8.10 and require different programming steps.

The example uses the same input data set as Example 8.12.

Example Features

Featured Step: DATA step
Featured Step Options and Statements: BY-group processing, FIRST.variable and LAST.variable temporary variables, RETAIN statement
Related Technique: PROC MEANS, COMPLETETYPES, NOPRINT and NWAY options, ID and OUTPUT statements
A Closer Look: Examining the Processing of Data Set WEIGHT_BMI by the DATA Step

Input Data Set

Data set WEIGHT_BMI contains weight, BMI, and session information for four study participants over a four-week period. Only ID S003 had data recorded for all four weeks.

                   WEIGHT_BMI

Obs     id     week    weight     bmi    session
  1    S001      1       231     29.7      AM1
  2    S001      2       223     28.6      AM2
  3    S002      1       187     28.4      AM1
  4    S002      3       176     26.8      AM2
  5    S003      1       154     27.3      PM1
  6    S003      2       151     26.7      AM1
  7    S003      3       148     26.2      PM1
  8    S003      4       142     25.2      PM1
  9    S004      1       134     25.3      PM3
 10    S004      2       133     25.1      PM3
 11    S004      4       129     24.4      PM3
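
To run the programs in this example, the input data set must exist first. The following is a minimal sketch that builds WEIGHT_BMI from the listing above. The informats ($4. and $3.) are assumptions chosen to fit the values shown, not definitions taken from the original study data.

```sas
/* Sketch: re-create WEIGHT_BMI from the listing. The character lengths
   ($4. and $3.) are assumed from the values shown. */
data weight_bmi;
  input id :$4. week weight bmi session :$3.;
  datalines;
S001 1 231 29.7 AM1
S001 2 223 28.6 AM2
S002 1 187 28.4 AM1
S002 3 176 26.8 AM2
S003 1 154 27.3 PM1
S003 2 151 26.7 AM1
S003 3 148 26.2 PM1
S003 4 142 25.2 PM1
S004 1 134 25.3 PM3
S004 2 133 25.1 PM3
S004 4 129 24.4 PM3
;
run;
```

The observations are entered in order by ID and WEEK, so the BY-group processing later in the example needs no PROC SORT step.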

Resulting Data Set

Output 8.9 WEIGHT_BMI_4WEEKS Data Set

Example 8.9 WEIGHT_BMI_4WEEKS Data Set Created with DATA Step

        Obs     id     week    weight     bmi    session

          1    S001      1       231     29.7      AM1
          2    S001      2       223     28.6      AM2
          3    S001      3         .       .
          4    S001      4         .       .
          5    S002      1       187     28.4      AM1
          6    S002      2         .       .
          7    S002      3       176     26.8      AM2
          8    S002      4         .       .
          9    S003      1       154     27.3      PM1
         10    S003      2       151     26.7      AM1
         11    S003      3       148     26.2      PM1
         12    S003      4       142     25.2      PM1
         13    S004      1       134     25.3      PM3
         14    S004      2       133     25.1      PM3
         15    S004      3         .       .
         16    S004      4       129     24.4      PM3


Example Overview

This example shows you how to add observations to fill out BY groups in a data set so that each BY group has the same number of observations. The observations it adds fill specific positions in the BY group based on the requirement that a specific variable's values be sequentially represented.

Data set WEIGHT_BMI has weights and BMIs for four study participants over a four-week period. Only one of the participants, S003, has information recorded for all four weeks. The goal is to add observations for the other three participants so that they each also have four observations. The variable WEEK records the week of the measurement and its values are 1, 2, 3, and 4. The program examines the observations for each participant and adds an observation for missing weeks. The values for WEIGHT, BMI, and SESSION for that inserted week are set to missing.

The DATA step processes WEIGHT_BMI in BY groups that are defined by the values of variable ID. It starts by renaming variables WEEK, WEIGHT, BMI, and SESSION so that the processing in the DATA step does not overwrite the values that are read from WEIGHT_BMI.

The DATA step tests when it is at the beginning of the BY group and assigns a value of 1 to WEEK in case the BY group is missing an observation for WEEK=1. Next it tests whether the current observation is the last in the BY group.

The results of testing whether the current observation is the first or last in the BY group determine the bounds of a DO loop. Each iteration of the DO loop outputs one observation. When an observation is not the last in the BY group, the loop executes between the current value of WEEK and the current value of HOLDWEEK, which is the name of the renamed WEEK variable. When an observation is the last in the BY group, it executes between the current value of WEEK and the value 4, which is the maximum number of observations per BY group. This upper bound value is assigned to variable UPPER.

Table 8.1 in "Examining the Processing of Data Set WEIGHT_BMI by the DATA Step" in the "A Closer Look" section lists the bounds of the DO loop on each iteration of the DATA step.

Example 8.10 is similar in that it fills out BY groups. It is more complex because the BY variables include character variables whose values are not numerically sequential as they are in this example.

Program

Create data set WEIGHT_BMI_4WEEKS.

Read the observations from WEIGHT_BMI. Rename all variables except for the BY variable ID so that the values of these variables are available in the DATA step and do not get overwritten.

Process the data set in groups defined by the values of ID.

Retain the values of WEEK across iterations of the DATA step.

Drop the renamed variables since their values are assigned to new variables later in the DATA step.

Initialize WEEK to 1 at the beginning of each BY group.

When processing the last observation in the BY group, set the value of the upper bound of the DO loop to the maximum number of observations per BY group.

When not processing the last observation in the BY group, assign to the upper bound of the DO loop the renamed value of WEEK. This causes the DO loop to possibly iterate more than once because the value of WEEK might be less than HOLDWEEK. This allows for observations with missing data for the intervening weeks to be added to the output data set.

Process a DO loop between the bounds that are defined by the preceding code.

Assign missing values to variables WEIGHT, BMI, and SESSION when WEEK is not equal to HOLDWEEK because an observation for the current value of WEEK is not present in the BY group.

Assign values to WEIGHT, BMI, and SESSION from the renamed variables when WEEK is equal to HOLDWEEK because an observation for the current value of WEEK is present in the BY group.

Output an observation on each iteration of the DO loop. The DO loop iterates in increments of 1 so that even when a value of WEEK has no observation for the BY group in WEIGHT_BMI, it writes an observation to WEIGHT_BMI_4WEEKS.

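data weight_bmi_4weeks;
  /* Rename every variable except BY variable ID so that the values read
     from WEIGHT_BMI stay available while new variables are assigned. */
  set weight_bmi(rename=(week=holdweek weight=holdweight
                         bmi=holdbmi session=holdsession));
  by id;

  /* Keep WEEK from one DATA step iteration to the next. */
  retain week;
  drop holdweek holdweight holdbmi holdsession upper;

  /* Start each BY group at week 1. */
  if first.id then week=1;

  /* Upper bound: 4 on the last observation of the BY group,
     otherwise the week that was just read. */
  if last.id then upper=4;
  else upper=holdweek;

  /* Write one observation for each week from WEEK through UPPER. */
  do week=week to upper;
    if week ne holdweek then do;
      /* No observation exists for this week. */
      weight=.;
      bmi=.;
      session='   ';
    end;
    else do;
      weight=holdweight;
      bmi=holdbmi;
      session=holdsession;
    end;
    output;
  end;
run;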
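
One way to confirm the result is to count the observations and the range of WEEK within each ID. This is a sketch of such a check, not part of the original program. The column aliases N_OBS, FIRST_WEEK, and LAST_WEEK are hypothetical names.

```sas
/* Sketch: summarize WEIGHT_BMI_4WEEKS by participant. */
proc sql;
  select id,
         count(*)  as n_obs,
         min(week) as first_week,
         max(week) as last_week
    from weight_bmi_4weeks
    group by id;
quit;
```

For the data in this example, each ID should show four observations with weeks running from 1 to 4.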

Related Technique

The following program uses PROC MEANS to create a data set equivalent to the data set that was created in the main example. By adding the COMPLETETYPES and NWAY options to the PROC MEANS statement, the PROC MEANS step creates a data set that has all possible combinations of ID and WEEK.

The COMPLETETYPES option creates all possible combinations of the classification variables even if a combination does not occur in the input data set. The NWAY option causes PROC MEANS to save only the combinations of the two variables. Without NWAY, the output data set would also contain the classifications of ID alone and WEEK alone and an observation for the entire data set.
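
The effect of NWAY can be seen by leaving it out. The following sketch is a side illustration, separate from the program that this section describes, and the output data set name ALL_TYPES is hypothetical. It keeps the automatic variable _TYPE_ and counts the observations at each classification level.

```sas
/* Sketch: same classification as the related technique, but without NWAY. */
proc means data=weight_bmi completetypes noprint;
  class id week;
  var weight;
  output out=all_types min=;
run;

/* _TYPE_=0 is the entire data set, 1 is WEEK alone,
   2 is ID alone, and 3 is each combination of ID and WEEK. */
proc freq data=all_types;
  tables _type_;
run;
```

For WEIGHT_BMI this should report 1, 4, 4, and 16 observations for _TYPE_ values 0 through 3. Adding NWAY keeps only the 16 observations with _TYPE_=3.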

The combinations of ID and WEEK are saved in an output data set along with the participant's values of WEIGHT, BMI, and SESSION. The VAR statement specifies the two numeric variables, WEIGHT and BMI.

The ID statement specifies the character variable, SESSION. When there is more than one observation in a combination of the CLASS variables, PROC MEANS saves in the output data set the maximum value of the variable that is specified in the ID statement. If the variable is a character variable, PROC MEANS determines the maximum value by the sort order of the characters.

The OUTPUT statement requests that the MIN statistic be saved in an output data set. Because there is only one observation per combination of ID and WEEK in WEIGHT_BMI, the MIN statistic equals the actual data value.

The following program is a good choice if you don't need to add programming statements to a DATA step. Also, when your values do not increase in uniform increments or when you are using character variables, programming the bounds of the DO loop in the main example can be complicated. PROC MEANS handles the complexity of defining the combinations without additional programming.

PROC FREQ and the SPARSE option in the TABLES statement can provide a similar solution. This method is used in the Related Technique in Example 8.10.

Analyze data set WEIGHT_BMI.

Create all possible combinations of the variables listed in the CLASS statement even if a combination does not occur in the input data set.

Suppress the output report.

Specify that PROC MEANS compute statistics only for the groups that are defined by the combinations of ID and WEEK.

Specify the classification variables whose values define the combinations for which to compute statistics.

Include SESSION in the output data set. Because each combination of ID and WEEK has at most one observation, the actual value of SESSION is saved in the output data set.

Specify the two numeric variables whose values will be saved in the output data set.

Save the results in WEIGHT_BMI_4WEEKS. Do not keep the automatic variables that PROC MEANS creates.

Compute the MIN statistic on WEIGHT and BMI. Name the statistics WEIGHT and BMI. Because each combination of ID and WEEK has at most one observation, the MIN statistics for WEIGHT and BMI equal their actual values.

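proc means data=weight_bmi
           completetypes   /* include ID-WEEK combinations absent from the data */
           noprint         /* suppress the output report */
           nway;           /* keep only the ID-by-WEEK combinations */
  class id week;
  id session;              /* carry SESSION into the output data set */
  var weight bmi;
  /* MIN= with no names stores each statistic under its analysis variable's name. */
  output out=weight_bmi_4weeks(drop=_freq_ _type_)
         min=;
run;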
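
The PROC FREQ approach mentioned earlier can be sketched as follows. This is an illustration under assumptions, not necessarily the code that Example 8.10 uses: the intermediate data set name ALL_WEEKS is hypothetical, and the match-merge relies on WEIGHT_BMI already being in order by ID and WEEK.

```sas
/* Sketch: build every ID-WEEK combination, then merge in the data. */
proc freq data=weight_bmi noprint;
  /* SPARSE writes zero-count combinations to the OUT= data set. */
  tables id*week / sparse out=all_weeks(drop=count percent);
run;

data weight_bmi_4weeks;
  /* Combinations with no match in WEIGHT_BMI get missing values. */
  merge all_weeks weight_bmi;
  by id week;
run;
```

Because the merge brings in the original variables directly, this variant needs no statistic keyword or ID statement to carry WEIGHT, BMI, and SESSION into the result.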

A Closer Look

Examining the Processing of Data Set WEIGHT_BMI by the DATA Step

Table 8.1 shows how the DO loop bounds are set on each iteration of the DATA step in the main example. Because the value of new variable WEEK is retained across iterations of the DATA step, each iteration starts with the value that WEEK had when the DO loop ended in the previous iteration, unless the observation begins a new BY group, in which case WEEK is reset to 1. For example, when the DO loop ends on the first iteration of the DATA step, the value of WEEK is 2. The second iteration starts out with WEEK=2, which is the lower bound of the DO loop.

Table 8.1. DO Loop Bounds on Each Iteration of the DATA Step

                                --------------------- DO Loop ---------------------
Observation   DATA Step         Lower bound           Upper bound        Times DO loop
ID            Iteration (_N_)   (new variable WEEK)   (variable UPPER)   iterates
S001           1                1                     1                  1
S001           2                2                     4                  3
S002           3                1                     1                  1
S002           4                2                     4                  3
S003           5                1                     1                  1
S003           6                2                     2                  1
S003           7                3                     3                  1
S003           8                4                     4                  1
S004           9                1                     1                  1
S004          10                2                     2                  1
S004          11                3                     4                  2
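
The bounds in Table 8.1 can be reproduced by writing them to the SAS log. The following sketch reuses the bound logic of the main DATA step but creates no data set. The variables LOWER and TIMES are hypothetical names added for the trace.

```sas
/* Sketch: trace the DO loop bounds for each observation read. */
data _null_;
  set weight_bmi(rename=(week=holdweek));
  by id;
  retain week;

  if first.id then week=1;
  if last.id then upper=4;
  else upper=holdweek;

  lower=week;          /* value of WEEK before the loop starts */
  times=0;
  do week=week to upper;
    times=times+1;     /* count iterations instead of writing observations */
  end;

  put id= _n_= lower= upper= times=;
run;
```

Each log line should correspond to one row of Table 8.1. For example, the second observation for S001 shows a lower bound of 2, an upper bound of 4, and three iterations.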
