Match a master data set and transaction data set where the master data set has only unique combinations of the values of the matching variables and the transaction data set can have duplicate combinations. Keep the multiple observations for the duplicate combinations of the matching variables in the output data set. Only keep information from the transaction data set for observations with a match in the master data set.
Examples 6.1, 6.2, 6.3, 6.4, and 6.5 illustrate ways to update common variables when combining data sets. The five examples apply the methods to the same master data set. Different transaction data sets are used in Examples 6.1, 6.2, and 6.3. Examples 6.4 and 6.5 use the same transaction data set and produce different resulting data sets.
Featured Step | DATA step |
Featured Step Options and Statements | Match-merge with MERGE and BY statements IN= data set option |
Related Technique | PROC SQL, left join |
Master data set BOOKLIST contains information about the chapters of three books. Transaction data set SUBMISSIONS contains updates of information to apply to BOOKLIST.
BOOKLIST Obs bookid booktitle chapter author duedate editor 1 NF0586 Current Narratives 1 Smith, Rebecca 09/04/2010 Williams, Stephanie 2 NF0586 Current Narratives 2 Williams, Susan 09/04/2010 Williams, Stephanie 3 NF0586 Current Narratives 3 Torres, Christopher 09/11/2010 Williams, Stephanie 4 NF0586 Current Narratives 4 Torres, Christopher 09/11/2010 Williams, Stephanie 5 NF0586 Current Narratives 5 Powell, George 09/11/2010 Williams, Stephanie 6 NF0586 Current Narratives 6 Thompson, Tonya 09/11/2010 Williams, Stephanie 7 NF0586 Current Narratives 7 Allen, Linda 09/11/2010 Williams, Stephanie 8 NF0586 Current Narratives 8 Johnson, Tammy 09/11/2010 Williams, Stephanie 9 NF0586 Current Narratives 9 Kelly, Melissa 09/11/2010 Williams, Stephanie 10 NF0586 Current Narratives 10 Thompson, Tonya 09/11/2010 Williams, Stephanie 11 NF2413 Political Comments 1 Jones, Robin 07/31/2010 White, Michelle 12 NF2413 Political Comments 2 Sanchez, Brandon 08/07/2010 White, Michelle 13 NF2413 Political Comments 3 Jones, Robin 07/31/2010 White, Michelle 14 NF2413 Political Comments 4 Perez, Joshua 07/31/2010 White, Michelle 15 NF2413 Political Comments 5 Williams, Nicholas 07/31/2010 White, Michelle 16 NF2413 Political Comments 6 Patterson, Mary 08/14/2010 White, Michelle 17 NF2413 Political Comments 7 Torres, Christopher 08/07/2010 White, Michelle 18 NF2413 Political Comments 8 Robinson, Bonnie 08/07/2010 White, Michelle 19 NF2413 Political Comments 9 Brown, Patricia 08/07/2010 White, Michelle 20 NF8141 Favorite Essays 1 Clark, Todd 10/02/2010 Patterson, Daniel 21 NF8141 Favorite Essays 2 Barnes, David 10/02/2010 Patterson, Daniel 22 NF8141 Favorite Essays 3 Young, Richard 09/18/2010 Patterson, Daniel 23 NF8141 Favorite Essays 4 Barnes, David 10/02/2010 Patterson, Daniel 24 NF8141 Favorite Essays 5 Anderson, Daniel 09/18/2010 Patterson, Daniel 25 NF8141 Favorite Essays 6 Anderson, Daniel 09/18/2010 Patterson, Daniel 26 NF8141 Favorite Essays 7 Morris, Laura 09/18/2010 Patterson, Daniel 27 NF8141 Favorite Essays 8 Powell, George 09/18/2010 Patterson, Daniel
Data set BOOKLIST has unique occurrences of the combinations of BOOKID and CHAPTER. Data set SUBMISSIONS contains multiple occurrences for two combinations of BOOKID and CHAPTER.
SUBMISSIONS Obs bookid chapter author draftdate 1 NF0586 3 Torres, Christopher 05/13/2010 2 NF0586 3 Torres, Christopher 06/17/2010 3 NF0586 3 Torres, Christopher 06/30/2010 4 NF2413 2 Sanchez, Brandon 04/22/2010 5 NF2413 2 Sanchez, Brandon 06/02/2010 6 NF2413 8 Robinson, Bonnie 04/01/2010 7 NF8141 5 Anderson, Daniel 05/26/2010 8 NF8141 6 Anderson, Daniel 07/01/2010
Output 6.4 ALLDRAFTS Data SetExample 6.4 ALLDRAFTS Data Set Created with DATA Step Obs bookid booktitle chapter author duedate editor draftdate 1 NF0586 Current Narratives 1 Smith, Rebecca 09/04/201 0 Williams, Stephanie . 2 NF0586 Current Narratives 2 Williams, Susan 09/04/201 0 Williams, Stephanie . 3 NF0586 Current Narratives 3 Torres, Christopher 09/11/201 0 Williams, Stephanie 05/13/2010 4 NF0586 Current Narratives 3 Torres, Christopher 09/11/201 0 Williams, Stephanie 06/17/2010 5 NF0586 Current Narratives 3 Torres, Christopher 09/11/201 0 Williams, Stephanie 06/30/2010 6 NF0586 Current Narratives 4 Torres, Christopher 09/11/201 0 Williams, Stephanie . 7 NF0586 Current Narratives 5 Powell, George 09/11/201 0 Williams, Stephanie . 8 NF0586 Current Narratives 6 Thompson, Tonya 09/11/201 0 Williams, Stephanie . 9 NF0586 Current Narratives 7 Allen, Linda 09/11/201 0 Williams, Stephanie . 10 NF0586 Current Narratives 8 Johnson, Tammy 09/11/201 0 Williams, Stephanie . 11 NF0586 Current Narratives 9 Kelly, Melissa 09/11/201 0 Williams, Stephanie . 12 NF0586 Current Narratives 10 Thompson, Tonya 09/11/201 0 Williams, Stephanie . 13 NF2413 Political Comments 1 Jones, Robin 07/31/201 0 White, Michelle . 14 NF2413 Political Comments 2 Sanchez, Brandon 08/07/201 0 White, Michelle 04/22/2010 15 NF2413 Political Comments 2 Sanchez, Brandon 08/07/201 0 White, Michelle 06/02/2010 16 NF2413 Political Comments 3 Jones, Robin 07/31/201 0 White, Michelle . 17 NF2413 Political Comments 4 Perez, Joshua 07/31/201 0 White, Michelle . 18 NF2413 Political Comments 5 Williams, Nicholas 07/31/201 0 White, Michelle . 19 NF2413 Political Comments 6 Patterson, Mary 08/14/201 0 White, Michelle . 20 NF2413 Political Comments 7 Torres, Christopher 08/07/201 0 White, Michelle . 21 NF2413 Political Comments 8 Robinson, Bonnie 08/07/201 0 White, Michelle 04/01/2010 22 NF2413 Political Comments 9 Brown, Patricia 08/07/201 0 White, Michelle . 23 NF8141 Favorite Essays 1 Clark, Todd 10/02/201 0 Patterson, Daniel . 24 NF8141 Favorite Essays 2 Barnes, David 10/02/201 0 Patterson, Daniel . 25 NF8141 Favorite Essays 3 Young, Richard 09/18/201 0 Patterson, Daniel . 26 NF8141 Favorite Essays 4 Barnes, David 10/02/201 0 Patterson, Daniel . 27 NF8141 Favorite Essays 5 Anderson, Daniel 09/18/201 0 Patterson, Daniel 05/26/2010 28 NF8141 Favorite Essays 6 Anderson, Daniel 09/18/201 0 Patterson, Daniel 07/01/2010 29 NF8141 Favorite Essays 7 Morris, Laura 09/18/201 0 Patterson, Daniel . 30 NF8141 Favorite Essays 8 Powell, George 09/18/201 0 Patterson, Daniel |
This example illustrates match-merge processing when your transaction data set contains multiple values of matching variables. The DATA step is written so that all observations found in the master data set are copied to the output data set whether or not they have a match in the transaction data set.
The transaction data set contributes to the output data set only when a transaction observation matches an observation in the master data set. Multiple observations with the same matching variable values in the transaction data set form the same number of multiple observations in the output data set. Also, the DATA step adds to the output data set all variables that are unique to the transaction data set. Missing values are assigned to these variables for observations in the master data set without matches in the transaction data set.
At the beginning of each BY group, the values of the variables from the master data set are loaded into the Program Data Vector (PDV). The values of the variables that are unique to the master data set remain in the PDV during the processing of the matches in the transaction data set. Unless otherwise changed by statements in the DATA step, they are copied to the output data set.
Each match in the transaction data set, along with the information from the master data set already in the PDV, is written to the output data set as one observation.
Data set BOOKLIST is the master data set, and it tracks information for the chapters of three books. Data set SUBMISSIONS contains updates of information to apply to BOOKLIST.
The DATA step uses the MERGE and BY statements to combine the two data sets. The IN= data set option that is applied to BOOKLIST defines the temporary variable INLIST, which the program tests to determine whether data set BOOKLIST contributes to the current observation. The subsetting IF statement tests INLIST and writes out observations only if they are found in BOOKLIST. This means that the DATA step does not output an observation in SUBMISSIONS that has a combination of BOOKID and CHAPTER not found in BOOKLIST.
Example 6.4 and Example 6.5 use the same transaction data set, SUBMISSIONS, and produce different resulting data sets. The resulting data set for Example 6.4 has an observation for each matching observation from SUBMISSIONS. Thus, the data set has more observations than in BOOKLIST because of multiple observations for two combinations of BOOKID and CHAPTER values found in SUBMISSIONS. The resulting data set for Example 6.5 has only one observation from each of the two combinations of BOOKID and CHAPTER values found in SUBMISSIONS that have multiple observations. Therefore, the resulting data set in Example 6.5 has the same number of observations as found in BOOKLIST.
Before processing the DATA step, both data sets BOOKLIST and SUBMISSIONS must either be sorted or indexed by BOOKID and CHAPTER.
Create data set ALLDRAFTS. Combine the two data sets. Place the master data set first in the MERGE statement. Define temporary variable INLIST so that the IF statement can test its value to determine whether the current observation has information contributed from BOOKLIST. Match the two data sets by BOOKID and CHAPTER. Keep all observations from BOOKLIST.
data alldrafts; merge booklist(in=inlist) submissions; by bookid chapter; if inlist; run;
The following PROC SQL step creates a table equivalent to the data set that was created by the DATA step in the main example. A left join combines the two tables because the goal is to save all rows from BOOKLIST in ALLDRAFTS along with the matches of specific rows that were contributed from SUBMISSIONS. The program does not write to the output table the rows that are found only in SUBMISSIONS.
The table alias that is attached to the three columns in common is for table BOOKLIST. Because the step saves all rows present in BOOKLIST and only those rows from SUBMISSIONS that have a match in BOOKLIST, the values for the columns in common should come from BOOKLIST.
Create table ALLDRAFTS. Specify the columns to save in the output table. Specify the origin of the three columns in common between the two tables being joined by preceding the column name with the table alias. Combine the two tables with a left join. Specify an alias for each of the two tables. Specify the columns that match the two tables.
proc sql; create table alldrafts as select b.bookid, booktitle, b.chapter, b.author, duedate, editor, draftdate from booklist b left join submissions s on b.bookid=s.bookid and b.chapter=s.chapter; quit;
3.22.51.241