Example 6.4 Adding Observations and Variables to the Master Data Set When Duplicate Matching Variable Values Exist in the Transaction Data Set

Goal

Match a master data set and transaction data set where the master data set has only unique combinations of the values of the matching variables and the transaction data set can have duplicate combinations. Keep the multiple observations for the duplicate combinations of the matching variables in the output data set. Only keep information from the transaction data set for observations with a match in the master data set.

Examples 6.1, 6.2, 6.3, 6.4, and 6.5 illustrate ways to update common variables when combining data sets. The five examples apply the methods to the same master data set. Different transaction data sets are used in Examples 6.1, 6.2, and 6.3. Examples 6.4 and 6.5 use the same transaction data set and produce different resulting data sets.

Example Features

Featured StepDATA step
Featured Step Options and StatementsMatch-merge with MERGE and BY statements IN= data set option
Related TechniquePROC SQL, left join

Input Data Sets

Master data set BOOKLIST contains information about the chapters of three books. Transaction data set SUBMISSIONS contains updates of information to apply to BOOKLIST.

                                         BOOKLIST

 Obs bookid     booktitle      chapter author                  duedate       editor
   1 NF0586 Current Narratives     1   Smith, Rebecca       09/04/2010 Williams, Stephanie
   2 NF0586 Current Narratives     2   Williams, Susan      09/04/2010 Williams, Stephanie
   3 NF0586 Current Narratives     3   Torres, Christopher  09/11/2010 Williams, Stephanie
   4 NF0586 Current Narratives     4   Torres, Christopher  09/11/2010 Williams, Stephanie
   5 NF0586 Current Narratives     5   Powell, George       09/11/2010 Williams, Stephanie
   6 NF0586 Current Narratives     6   Thompson, Tonya      09/11/2010 Williams, Stephanie
   7 NF0586 Current Narratives     7   Allen, Linda         09/11/2010 Williams, Stephanie
   8 NF0586 Current Narratives     8   Johnson, Tammy       09/11/2010 Williams, Stephanie
   9 NF0586 Current Narratives     9   Kelly, Melissa       09/11/2010 Williams, Stephanie
  10 NF0586 Current Narratives    10   Thompson, Tonya      09/11/2010 Williams, Stephanie
  11 NF2413 Political Comments     1   Jones, Robin         07/31/2010 White, Michelle
  12 NF2413 Political Comments     2   Sanchez, Brandon     08/07/2010 White, Michelle
  13 NF2413 Political Comments     3   Jones, Robin         07/31/2010 White, Michelle
  14 NF2413 Political Comments     4   Perez, Joshua        07/31/2010 White, Michelle
  15 NF2413 Political Comments     5   Williams, Nicholas   07/31/2010 White, Michelle
  16 NF2413 Political Comments     6   Patterson, Mary      08/14/2010 White, Michelle
  17 NF2413 Political Comments     7   Torres, Christopher  08/07/2010 White, Michelle
  18 NF2413 Political Comments     8   Robinson, Bonnie     08/07/2010 White, Michelle
  19 NF2413 Political Comments     9   Brown, Patricia      08/07/2010 White, Michelle
  20 NF8141 Favorite Essays        1   Clark, Todd          10/02/2010 Patterson, Daniel
  21 NF8141 Favorite Essays        2   Barnes, David        10/02/2010 Patterson, Daniel
  22 NF8141 Favorite Essays        3   Young, Richard       09/18/2010 Patterson, Daniel
  23 NF8141 Favorite Essays        4   Barnes, David        10/02/2010 Patterson, Daniel
  24 NF8141 Favorite Essays        5   Anderson, Daniel     09/18/2010 Patterson, Daniel
  25 NF8141 Favorite Essays        6   Anderson, Daniel     09/18/2010 Patterson, Daniel
  26 NF8141 Favorite Essays        7   Morris, Laura        09/18/2010 Patterson, Daniel
  27 NF8141 Favorite Essays        8   Powell, George       09/18/2010 Patterson, Daniel

Data set BOOKLIST has unique occurrences of the combinations of BOOKID and CHAPTER. Data set SUBMISSIONS contains multiple occurrences for two combinations of BOOKID and CHAPTER.

                         SUBMISSIONS

 Obs    bookid    chapter          author             draftdate
  1     NF0586       3       Torres, Christopher     05/13/2010
  2     NF0586       3       Torres, Christopher     06/17/2010
  3     NF0586       3       Torres, Christopher     06/30/2010
  4     NF2413       2       Sanchez, Brandon        04/22/2010
  5     NF2413       2       Sanchez, Brandon        06/02/2010
  6     NF2413       8       Robinson, Bonnie        04/01/2010
  7     NF8141       5       Anderson, Daniel        05/26/2010
  8     NF8141       6       Anderson, Daniel        07/01/2010

Resulting Data Set

Output 6.4 ALLDRAFTS Data Set

                      Example 6.4 ALLDRAFTS Data Set Created with DATA Step

Obs bookid     booktitle      chapter author                 duedate       editor          draftdate

  1 NF0586 Current Narratives     1   Smith, Rebecca      09/04/201 0 Williams, Stephanie          .
  2 NF0586 Current Narratives     2   Williams, Susan     09/04/201 0 Williams, Stephanie          .
  3 NF0586 Current Narratives     3   Torres, Christopher 09/11/201 0 Williams, Stephanie 05/13/2010
  4 NF0586 Current Narratives     3   Torres, Christopher 09/11/201 0 Williams, Stephanie 06/17/2010
  5 NF0586 Current Narratives     3   Torres, Christopher 09/11/201 0 Williams, Stephanie 06/30/2010
  6 NF0586 Current Narratives     4   Torres, Christopher 09/11/201 0 Williams, Stephanie          .
  7 NF0586 Current Narratives     5   Powell, George      09/11/201 0 Williams, Stephanie          .
  8 NF0586 Current Narratives     6   Thompson, Tonya     09/11/201 0 Williams, Stephanie          .
  9 NF0586 Current Narratives     7   Allen, Linda        09/11/201 0 Williams, Stephanie          .
 10 NF0586 Current Narratives     8   Johnson, Tammy      09/11/201 0 Williams, Stephanie          .
 11 NF0586 Current Narratives     9   Kelly, Melissa      09/11/201 0 Williams, Stephanie          .
 12 NF0586 Current Narratives    10   Thompson, Tonya     09/11/201 0 Williams, Stephanie          .
 13 NF2413 Political Comments     1   Jones, Robin        07/31/201 0 White, Michelle              .
 14 NF2413 Political Comments     2   Sanchez, Brandon    08/07/201 0 White, Michelle     04/22/2010
 15 NF2413 Political Comments     2   Sanchez, Brandon    08/07/201 0 White, Michelle     06/02/2010
 16 NF2413 Political Comments     3   Jones, Robin        07/31/201 0 White, Michelle              .
 17 NF2413 Political Comments     4   Perez, Joshua       07/31/201 0 White, Michelle              .
 18 NF2413 Political Comments     5   Williams, Nicholas  07/31/201 0 White, Michelle              .
 19 NF2413 Political Comments     6   Patterson, Mary     08/14/201 0 White, Michelle              .
 20 NF2413 Political Comments     7   Torres, Christopher 08/07/201 0 White, Michelle              .
 21 NF2413 Political Comments     8   Robinson, Bonnie    08/07/201 0 White, Michelle     04/01/2010
 22 NF2413 Political Comments     9   Brown, Patricia     08/07/201 0 White, Michelle              .
 23 NF8141 Favorite Essays        1   Clark, Todd         10/02/201 0 Patterson, Daniel            .
 24 NF8141 Favorite Essays        2   Barnes, David       10/02/201 0 Patterson, Daniel            .
 25 NF8141 Favorite Essays        3   Young, Richard      09/18/201 0 Patterson, Daniel            .
 26 NF8141 Favorite Essays        4   Barnes, David       10/02/201 0 Patterson, Daniel            .
 27 NF8141 Favorite Essays        5   Anderson, Daniel    09/18/201 0 Patterson, Daniel   05/26/2010
 28 NF8141 Favorite Essays        6   Anderson, Daniel    09/18/201 0 Patterson, Daniel   07/01/2010
 29 NF8141 Favorite Essays        7   Morris, Laura       09/18/201 0 Patterson, Daniel            .
 30 NF8141 Favorite Essays        8   Powell, George      09/18/201 0 Patterson, Daniel


Example Overview

This example illustrates match-merge processing when your transaction data set contains multiple values of matching variables. The DATA step is written so that all observations found in the master data set are copied to the output data set whether or not they have a match in the transaction data set.

The transaction data set contributes to the output data set only when a transaction observation matches an observation in the master data set. Multiple observations with the same matching variable values in the transaction data set form the same number of multiple observations in the output data set. Also, the DATA step adds to the output data set all variables that are unique to the transaction data set. Missing values are assigned to these variables for observations in the master data set without matches in the transaction data set.

At the beginning of each BY group, the values of the variables from the master data set are loaded into the Program Data Vector (PDV). The values of the variables that are unique to the master data set remain in the PDV during the processing of the matches in the transaction data set. Unless otherwise changed by statements in the DATA step, they are copied to the output data set.

Each match in the transaction data set, along with the information from the master data set already in the PDV, is written to the output data set as one observation.

Data set BOOKLIST is the master data set, and it tracks information for the chapters of three books. Data set SUBMISSIONS contains updates of information to apply to BOOKLIST.

The DATA step uses the MERGE and BY statements to combine the two data sets. The IN= data set option that is applied to BOOKLIST defines the temporary variable INLIST, which the program tests to determine whether data set BOOKLIST contributes to the current observation. The subsetting IF statement tests INLIST and writes out observations only if they are found in BOOKLIST. This means that the DATA step does not output an observation in SUBMISSIONS that has a combination of BOOKID and CHAPTER not found in BOOKLIST.

Example 6.4 and Example 6.5 use the same transaction data set, SUBMISSIONS, and produce different resulting data sets. The resulting data set for Example 6.4 has an observation for each matching observation from SUBMISSIONS. Thus, the data set has more observations than in BOOKLIST because of multiple observations for two combinations of BOOKID and CHAPTER values found in SUBMISSIONS. The resulting data set for Example 6.5 has only one observation from each of the two combinations of BOOKID and CHAPTER values found in SUBMISSIONS that have multiple observations. Therefore, the resulting data set in Example 6.5 has the same number of observations as found in BOOKLIST.

Before processing the DATA step, both data sets BOOKLIST and SUBMISSIONS must either be sorted or indexed by BOOKID and CHAPTER.

Program

Create data set ALLDRAFTS. Combine the two data sets. Place the master data set first in the MERGE statement. Define temporary variable INLIST so that the IF statement can test its value to determine whether the current observation has information contributed from BOOKLIST. Match the two data sets by BOOKID and CHAPTER. Keep all observations from BOOKLIST.

data alldrafts;
  merge booklist(in=inlist)
        submissions;




  by bookid chapter;

  if inlist;

run;

Related Technique

The following PROC SQL step creates a table equivalent to the data set that was created by the DATA step in the main example. A left join combines the two tables because the goal is to save all rows from BOOKLIST in ALLDRAFTS along with the matches of specific rows that were contributed from SUBMISSIONS. The program does not write to the output table the rows that are found only in SUBMISSIONS.

The table alias that is attached to the three columns in common is for table BOOKLIST. Because the step saves all rows present in BOOKLIST and only those rows from SUBMISSIONS that have a match in BOOKLIST, the values for the columns in common should come from BOOKLIST.

Create table ALLDRAFTS. Specify the columns to save in the output table. Specify the origin of the three columns in common between the two tables being joined by preceding the column name with the table alias. Combine the two tables with a left join. Specify an alias for each of the two tables. Specify the columns that match the two tables.

proc sql;
  create table alldrafts as
    select b.bookid, booktitle, b.chapter,
           b.author, duedate, editor,
           draftdate


      from booklist b
             left join
           submissions s
      on b.bookid=s.bookid and b.chapter=s.chapter;

quit;

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
3.22.51.241