Example 3.13 Combining and Collapsing Observations Based on a Common Variable When the Transaction Data Set Is Indexed

Goal

Reshape a data set by turning related observations into single ones. Match the collapsed observations from this data set with an appropriate observation from another data set based on the value of a key variable that is indexed in the first data set.

Example 3.12 also combines and collapses observations based on a common variable, but Example 3.12 does not use code that relies on indexes.

Example Features

Featured StepDATA step
Featured Step Options and StatementsKEY= option in the SET statement Automatic variable _IORC_ and %SYRC autocall macro program

Input Data Sets

Data set KIDS_IDS contains identification information about five pediatric patients with one observation per patient. Data set IMMUNIZATIONS contains 36 immunization records for patients with one immunization per observation and multiple observations possible per patient. The variable in common between the two data sets is ID. Four of the observations in KIDS_IDS have at least one matching observation in IMMUNIZATIONS. ID C1901 does not. ID C0054 is not in KIDS_IDS, but it does have two observations in IMMUNIZATIONS.

        KIDS_IDS

Obs     id              dob
 1     C0402     07/15/2001
 2     C1593     06/30/2003
 3     C1374     04/23/2007
 4     C3811     02/01/2009
 5     C1901     03/18/2009


                 IMMUNIZATIONS

Obs     id      type     sequence       received
  1    C3811    POLIO        1        04/01/2009
  2    C0402    DTAP         1        09/12/2001
  3    C0402    POLIO        1        09/12/2001
  4    C0402    DTAP         2        11/16/2001
  5    C0402    POLIO        2        11/16/2001
  6    C0402    DTAP         3        01/10/2002
  7    C0402    POLIO        3        04/14/2002
  8    C0402    CPOX         1        07/30/2002
  9    C0402    MMR          1        07/30/2002
 10    C0402    DTAP         4        11/20/2002
 11    C0402    CPOX         2        04/15/2006
 12    C0402    MMR          2        04/15/2006
 13    C0402    DTAP         5        08/15/2006
 14    C0402    POLIO        4        08/15/2006
 15    C1593    DTAP         1        09/05/2003
 16    C1593    POLIO        1        09/05/2003
 17    C1593    DTAP         2        10/29/2003
 18    C1593    POLIO        2        10/29/2003
 19    C1593    DTAP         3        01/03/2004
 20    C1593    CPOX         1        08/04/2004
 21    C1593    MMR          1        08/04/2004
 22    C1593    DTAP         4        10/20/2004
 23    C1593    DTAP         5        07/16/2008
 24    C1593    POLIO        3        07/16/2008
 25    C1593    CPOX         2        08/23/2008
 26    C1593    MMR          2        08/23/2008
 27    C1374    DTAP         1        06/28/2007
 28    C1374    POLIO        1        06/28/2007
 29    C1374    DTAP         2        08/22/2007
 30    C1374    POLIO        2        08/22/2007
 31    C1374    DTAP         3        10/20/2007
 32    C1374    POLIO        3        01/22/2008
 33    C1374    CPOX         1        05/03/2008
 34    C1374    MMR          1        05/03/2008
 35    C0054    DTAP         1        07/01/2000
 36    C0054    POLIO        1        07/01/2000

Resulting Data Set

Output 3.13 DTAP_KIDS Data Set

                Example 3.13 DTAP_KIDS Data Set Created with DATA Step

     Obs  id          dob dtap_date1 dtap_date2 dtap_date3 dtap_date4 dtap_date5

      1  C0402 07/15/2001 09/12/2001 11/16/2001 01/10/2002 11/20/2002 08/15/2006
      2  C1593 06/30/2003 09/05/2003 10/29/2003 01/03/2004 10/20/2004 07/16/2008
      3  C1374 04/23/2007 06/28/2007 08/22/2007 10/20/2007 06/15/2008          .


Example Overview

This example shows you how to link two data sets where the key values are unique in one data set and where the second data set can contain multiple observations for the same key value. The second data set with multiple observations per key value is indexed by the key variable.

The program reads the first data set sequentially while using the KEY= option to directly access observations in the indexed data set. The code shows you how to collapse the multiple observations per key value in the second data set into a single observation and combine the single observation with an observation in the first data set.

Data set KID_IDS contains a list of pediatric patients age 6 and under and uniquely identifies them by variable ID. Data set IMMUNIZATIONS contains immunization information about pediatric patients. Each observation in IMMUNIZATIONS stores the information for one immunization for a child. The variable in common between the two data sets is ID. The variable TYPE in IMMUNIZATIONS identifies the type of immunization. The variable SEQUENCE in IMMUNIZATIONS identifies the shot number in the series of immunizations for a specific type.

The goal of the DATA step is to find all the diphtheria, tetanus, and pertussis (DTAP) immunizations in IMMUNIZATIONS for the pediatric patients in KIDS_IDS. Following pediatric medicine standards, it is expected that the child will receive up to five DTAP immunizations by the age of 6. Therefore, it is expected the value of SEQUENCE can range from 1 to 5 for a child.

The dates of the DTAP immunizations are added to the observations read from KIDS_IDS. Only observations with at least one DTAP immunization are output. The IF statement in the second WHEN block checks that the observation has at least one immunization date before outputting an observation.

Five new variables defined by array ALLSHOTS store the series of DTAP immunization dates: DTAP_DATE1, DTAP_DATE2, DTAP_DATE3, DTAP_DATE4, and DTAP_DATE5. The program copies each DTAP immunization date to the ALLSHOTS array element that is defined by the value of SEQUENCE.

The DATA step starts by reading KIDS_IDS sequentially. A DO UNTIL loop follows that looks for all observations in IMMUNIZATIONS that match by ID. The observations are found by directly accessing IMMUNIZATIONS through a simple index on ID. The return codes from the search operations are tested so that appropriate code paths are followed.

DTAP immunizations are found for three children: C0402, C1593, and C1374. Because child C3811 has only a polio shot, the DATA step does not output an observation for this child. Child C1901 has no observations in IMMUNIZATIONS and therefore no observation in the output data set. Child C0054 has two immunization observations in IMMUNIZATIONS and none in KIDS_IDS so an observation for this child is not in the output data set.

Note that you do not need to sort KIDS_IDS by ID prior to the DATA step. The technique shown does not perform a match-merge that relies on ordering the observations by a BY variable. Instead, matching observations in IMMUNIZATIONS are accessed directly by the key variable ID.

The "A Closer Look" section in Example 3.11 describes in more detail how to use the _IORC_ automatic variable and %SYSRC autocall macro program.

Program

Create data set DTAP_KIDS. Read data set KIDS_IDS sequentially. Prepare to collapse up to five DTAP immunization observations from the transaction data set into one observation by defining array ALLSHOTS.

Drop unneeded variables that come from IMMUNIZATIONS and whose values are collapsed into the DTAP_DATE variables. Execute a DO UNTIL loop until all observations in IMMUNIZATIONS with the current value of ID from KIDS_IDS have been read. Specify that it test whether the return code from the keyed search, which is saved in automatic variable _IORC_, equates to the not found condition. Read an observation from IMMUNIZATIONS based on the current value of the key variable ID from KIDS_IDS. Direct processing of the results of the indexed search based on the value of the automatic variable _IORC_. Test if the search was successful by passing the mnemonic _SOK to autocall macro program %SYSRC. Process only observations of type DTAP. Update the ALLSHOTS array element that corresponds to the current value of SEQUENCE for values of SEQUENCE between 1 and 5. Write an error message to the SAS log when the values of SEQUENCE are invalid.

Execute this block of code when the indexed search does not find a match, which includes when all observations from IMMUNIZATIONS with the current value of ID from KIDS_IDS have been read. Because there are no more observations with the current value of ID in IMMUNIZATIONS, conclude that all DTAP_DATE values present for the ID value have been copied to the elements of ALLSHOTS. Write the completed observation to DTAP_KIDS only if the child had at least one DTAP immunization.

Reset automatic variable _ERROR_ to 0 to prevent writing error messages to the SAS log when no match is found. When the search does not find a match, SAS sets _ERROR_ to 1.

For an unexpected _IORC_ condition, write a message to the SAS log and stop the DATA step.

data dtap_kids;
  set kids_ids;
  array allshots{5} dtap_date1-dtap_date5;



  format dtap_date1-dtap_date5 mmddyy10.;
  drop sequence received type;


  do until (_iorc_=%sysrc(_dsenom));






    set immunizations key=id;


    select (_iorc_);


      when(%sysrc(_sok)) do;


        if type='DTAP' then do;

          if 1 le sequence le 5 then
                allshots{sequence}=received;


          else putlog
'ERROR: DTAP_DATE cannot be updated. Value of SEQUENCE
                            is not 1-5.'  /
                   id= sequence= received=;
      end;
      when (%sysrc(_dsenom)) do;




        if allshots{1} ne . then output;

        _error_=0;


      end;
      otherwise do;
        putlog "ERROR: Unexpected error _IORC_=" _iorc_;
        stop;
       end;
    end;
  end;
run;

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
3.133.146.237