Reshape a data set by turning related observations into single ones. Match the collapsed observations from this data set with an appropriate observation from another data set based on the value of a key variable that is indexed in the first data set.
Example 3.12 also combines and collapses observations based on a common variable, but Example 3.12 does not use code that relies on indexes.
Featured Step | DATA step |
Featured Step Options and Statements | KEY= option in the SET statement Automatic variable _IORC_ and %SYRC autocall macro program |
Data set KIDS_IDS contains identification information about five pediatric patients with one observation per patient. Data set IMMUNIZATIONS contains 36 immunization records for patients with one immunization per observation and multiple observations possible per patient. The variable in common between the two data sets is ID. Four of the observations in KIDS_IDS have at least one matching observation in IMMUNIZATIONS. ID C1901 does not. ID C0054 is not in KIDS_IDS, but it does have two observations in IMMUNIZATIONS.
KIDS_IDS Obs id dob 1 C0402 07/15/2001 2 C1593 06/30/2003 3 C1374 04/23/2007 4 C3811 02/01/2009 5 C1901 03/18/2009 IMMUNIZATIONS Obs id type sequence received 1 C3811 POLIO 1 04/01/2009 2 C0402 DTAP 1 09/12/2001 3 C0402 POLIO 1 09/12/2001 4 C0402 DTAP 2 11/16/2001 5 C0402 POLIO 2 11/16/2001 6 C0402 DTAP 3 01/10/2002 7 C0402 POLIO 3 04/14/2002 8 C0402 CPOX 1 07/30/2002 9 C0402 MMR 1 07/30/2002 10 C0402 DTAP 4 11/20/2002 11 C0402 CPOX 2 04/15/2006 12 C0402 MMR 2 04/15/2006 13 C0402 DTAP 5 08/15/2006 14 C0402 POLIO 4 08/15/2006 15 C1593 DTAP 1 09/05/2003 16 C1593 POLIO 1 09/05/2003 17 C1593 DTAP 2 10/29/2003 18 C1593 POLIO 2 10/29/2003 19 C1593 DTAP 3 01/03/2004 20 C1593 CPOX 1 08/04/2004 21 C1593 MMR 1 08/04/2004 22 C1593 DTAP 4 10/20/2004 23 C1593 DTAP 5 07/16/2008 24 C1593 POLIO 3 07/16/2008 25 C1593 CPOX 2 08/23/2008 26 C1593 MMR 2 08/23/2008 27 C1374 DTAP 1 06/28/2007 28 C1374 POLIO 1 06/28/2007 29 C1374 DTAP 2 08/22/2007 30 C1374 POLIO 2 08/22/2007 31 C1374 DTAP 3 10/20/2007 32 C1374 POLIO 3 01/22/2008 33 C1374 CPOX 1 05/03/2008 34 C1374 MMR 1 05/03/2008 35 C0054 DTAP 1 07/01/2000 36 C0054 POLIO 1 07/01/2000
Output 3.13 DTAP_KIDS Data SetExample 3.13 DTAP_KIDS Data Set Created with DATA Step Obs id dob dtap_date1 dtap_date2 dtap_date3 dtap_date4 dtap_date5 1 C0402 07/15/2001 09/12/2001 11/16/2001 01/10/2002 11/20/2002 08/15/2006 2 C1593 06/30/2003 09/05/2003 10/29/2003 01/03/2004 10/20/2004 07/16/2008 3 C1374 04/23/2007 06/28/2007 08/22/2007 10/20/2007 06/15/2008 . |
This example shows you how to link two data sets where the key values are unique in one data set and where the second data set can contain multiple observations for the same key value. The second data set with multiple observations per key value is indexed by the key variable.
The program reads the first data set sequentially while using the KEY= option to directly access observations in the indexed data set. The code shows you how to collapse the multiple observations per key value in the second data set into a single observation and combine the single observation with an observation in the first data set.
Data set KID_IDS contains a list of pediatric patients age 6 and under and uniquely identifies them by variable ID. Data set IMMUNIZATIONS contains immunization information about pediatric patients. Each observation in IMMUNIZATIONS stores the information for one immunization for a child. The variable in common between the two data sets is ID. The variable TYPE in IMMUNIZATIONS identifies the type of immunization. The variable SEQUENCE in IMMUNIZATIONS identifies the shot number in the series of immunizations for a specific type.
The goal of the DATA step is to find all the diphtheria, tetanus, and pertussis (DTAP) immunizations in IMMUNIZATIONS for the pediatric patients in KIDS_IDS. Following pediatric medicine standards, it is expected that the child will receive up to five DTAP immunizations by the age of 6. Therefore, it is expected the value of SEQUENCE can range from 1 to 5 for a child.
The dates of the DTAP immunizations are added to the observations read from KIDS_IDS. Only observations with at least one DTAP immunization are output. The IF statement in the second WHEN block checks that the observation has at least one immunization date before outputting an observation.
Five new variables defined by array ALLSHOTS store the series of DTAP immunization dates: DTAP_DATE1, DTAP_DATE2, DTAP_DATE3, DTAP_DATE4, and DTAP_DATE5. The program copies each DTAP immunization date to the ALLSHOTS array element that is defined by the value of SEQUENCE.
The DATA step starts by reading KIDS_IDS sequentially. A DO UNTIL loop follows that looks for all observations in IMMUNIZATIONS that match by ID. The observations are found by directly accessing IMMUNIZATIONS through a simple index on ID. The return codes from the search operations are tested so that appropriate code paths are followed.
DTAP immunizations are found for three children: C0402, C1593, and C1374. Because child C3811 has only a polio shot, the DATA step does not output an observation for this child. Child C1901 has no observations in IMMUNIZATIONS and therefore no observation in the output data set. Child C0054 has two immunization observations in IMMUNIZATIONS and none in KIDS_IDS so an observation for this child is not in the output data set.
Note that you do not need to sort KIDS_IDS by ID prior to the DATA step. The technique shown does not perform a match-merge that relies on ordering the observations by a BY variable. Instead, matching observations in IMMUNIZATIONS are accessed directly by the key variable ID.
The "A Closer Look" section in Example 3.11 describes in more detail how to use the _IORC_ automatic variable and %SYSRC autocall macro program.
Create data set DTAP_KIDS. Read data set KIDS_IDS sequentially. Prepare to collapse up to five DTAP immunization observations from the transaction data set into one observation by defining array ALLSHOTS.
Drop unneeded variables that come from IMMUNIZATIONS and whose values are collapsed into the DTAP_DATE variables. Execute a DO UNTIL loop until all observations in IMMUNIZATIONS with the current value of ID from KIDS_IDS have been read. Specify that it test whether the return code from the keyed search, which is saved in automatic variable _IORC_, equates to the not found condition. Read an observation from IMMUNIZATIONS based on the current value of the key variable ID from KIDS_IDS. Direct processing of the results of the indexed search based on the value of the automatic variable _IORC_. Test if the search was successful by passing the mnemonic _SOK to autocall macro program %SYSRC. Process only observations of type DTAP. Update the ALLSHOTS array element that corresponds to the current value of SEQUENCE for values of SEQUENCE between 1 and 5. Write an error message to the SAS log when the values of SEQUENCE are invalid.
Execute this block of code when the indexed search does not find a match, which includes when all observations from IMMUNIZATIONS with the current value of ID from KIDS_IDS have been read. Because there are no more observations with the current value of ID in IMMUNIZATIONS, conclude that all DTAP_DATE values present for the ID value have been copied to the elements of ALLSHOTS. Write the completed observation to DTAP_KIDS only if the child had at least one DTAP immunization.
Reset automatic variable _ERROR_ to 0 to prevent writing error messages to the SAS log when no match is found. When the search does not find a match, SAS sets _ERROR_ to 1.
For an unexpected _IORC_ condition, write a message to the SAS log and stop the DATA step.
data dtap_kids; set kids_ids; array allshots{5} dtap_date1-dtap_date5; format dtap_date1-dtap_date5 mmddyy10.; drop sequence received type; do until (_iorc_=%sysrc(_dsenom)); set immunizations key=id; select (_iorc_); when(%sysrc(_sok)) do; if type='DTAP' then do; if 1 le sequence le 5 then allshots{sequence}=received; else putlog 'ERROR: DTAP_DATE cannot be updated. Value of SEQUENCE is not 1-5.' / id= sequence= received=; end; when (%sysrc(_dsenom)) do; if allshots{1} ne . then output; _error_=0; end; otherwise do; putlog "ERROR: Unexpected error _IORC_=" _iorc_; stop; end; end; end; run;
3.133.146.237