Example 6.1 Updating a Data Set and Controlling Whether Common Variables Are Overwritten with Missing Values

Search in book...
Toggle Font Controls
Create new playlist

Name your new playlist

Playlist description (optional)
Sign In

Email address

Password

Forgot Password?

or

Continue with Facebook

Continue with Google
Sign Up

Full Name

Email address

Confirm Email Address

Password

or

Continue with Facebook

Continue with Google

Example 6.1 Updating a Data Set and Controlling Whether Common Variables Are Overwritten with Missing Values

Goal

Update a master data set with values from another data set. Do not replace a nonmissing value in the master data set with a missing value when the value for a common variable in the transaction data is missing. Also add a new variable to the updated master data set.

Examples 6.1, 6.2, 6.3, 6.4, and 6.5 illustrate ways to update common variables when combining data sets. The five examples apply the methods to the same master data set. Different transaction data sets are used in Examples 6.1, 6.2, and 6.3. Examples 6.4 and 6.5 use the same transaction data set and produce different resulting data sets.

Example Features

Featured Step	DATA step
Featured Step Options and Statements	UPDATE and BY statements
Related Technique	PROC SQL, full join using the ON clause, COALESCE function
A Closer Look	Allowing Missing Values to Replace Existing Values in the Master Data Set When Updating a Master Data Set
	Comparing the UPDATE and MERGE Statements
	Using UPDATE and MERGE When There Are Multiple Observations in a BY Group

Input Data Sets

Master data set BOOKLIST contains information about the chapters of three books. Transaction data set BOOKUPDATES contains updates of information that should be applied to BOOKLIST. It also has variable UPDATEDBY, which does not exist in BOOKLIST. Both data sets have unique occurrences of the combinations of values of BOOKID and CHAPTER.


                                        BOOKLIST

Obs bookid     booktitle      chapter author                  duedate       editor
  1 NF0586 Current Narratives     1   Smith, Rebecca       09/04/2010 Williams, Stephanie
  2 NF0586 Current Narratives     2   Williams, Susan      09/04/2010 Williams, Stephanie
  3 NF0586 Current Narratives     3    Torres, Christopher 09/11/2010 Williams, Stephanie
  4 NF0586 Current Narratives     4    Torres, Christopher 09/11/2010 Williams, Stephanie
  5 NF0586 Current Narratives     5   Powell, George       09/11/2010 Williams, Stephanie
  6 NF0586 Current Narratives     6   Thompson, Tonya      09/11/2010 Williams, Stephanie
  7 NF0586 Current Narratives     7   Allen, Linda         09/11/2010 Williams, Stephanie
  8 NF0586 Current Narratives     8   Johnson, Tammy       09/11/2010 Williams, Stephanie
  9 NF0586 Current Narratives     9   Kelly, Melissa       09/11/2010 Williams, Stephanie
 10 NF0586 Current Narratives    10   Thompson, Tonya      09/11/2010 Williams, Stephanie
 11 NF2413 Political Comments     1   Jones, Robin         07/31/2010 White, Michelle
 12 NF2413 Political Comments     2   Sanchez, Brandon     08/07/2010 White, Michelle
 13 NF2413 Political Comments     3   Jones, Robin         07/31/2010 White, Michelle
 14 NF2413 Political Comments     4   Perez, Joshua        07/31/2010 White, Michelle
 15 NF2413 Political Comments     5    Williams, Nicholas  07/31/2010 White, Michelle
 16 NF2413 Political Comments     6   Patterson, Mary      08/14/2010 White, Michelle
 17 NF2413 Political Comments     7    Torres, Christopher 08/07/2010 White, Michelle
 18 NF2413 Political Comments     8   Robinson, Bonnie     08/07/2010 White, Michelle
 19 NF2413 Political Comments     9   Brown, Patricia      08/07/2010 White, Michelle
 20 NF8141 Favorite Essays        1   Clark, Todd          10/02/2010 Patterson, Daniel
 21 NF8141 Favorite Essays        2   Barnes, David        10/02/2010 Patterson, Daniel
 22 NF8141 Favorite Essays        3   Young, Richard       09/18/2010 Patterson, Daniel
 23 NF8141 Favorite Essays        4   Barnes, David        10/02/2010 Patterson, Daniel
 24 NF8141 Favorite Essays        5   Anderson, Daniel     09/18/2010 Patterson, Daniel
 25 NF8141 Favorite Essays        6   Anderson, Daniel     09/18/2010 Patterson, Daniel
 26 NF8141 Favorite Essays        7   Morris, Laura        09/18/2010 Patterson, Daniel
 27 NF8141 Favorite Essays        8   Powell, George       09/18/2010 Patterson, Daniel

                         BOOKUPDATES

Obs   bookid   chapter      author         duedate   updatedby
 1    NF0586      4      Banks, James            .      JWE
 2    NF0586      9      King, Weston   09/18/2010      JWE
 3    NF8141      6                     10/02/2010      SAW

Resulting Data Set

Output 6.1a REVISEDLIST Data Set


                      Example 6.1 REVISEDLIST Data Set Created with DATA Step
  
Obs bookid     booktitle      chapter author                  duedate       editor        updatedby
  1 NF0586 Current Narratives     1   Smith, Rebecca       09/04/2010 Williams, Stephanie
  2 NF0586 Current Narratives     2   Williams, Susan      09/04/2010 Williams, Stephanie
  3 NF0586 Current Narratives     3   Torres, Christopher  09/11/2010 Williams, Stephanie
  4 NF0586 Current Narratives     4   Banks, James         09/11/2010 Williams, Stephanie    JWE
  5 NF0586 Current Narratives     5   Powell, George       09/11/2010 Williams, Stephanie
  6 NF0586 Current Narratives     6   Thompson, Tonya      09/11/2010 Williams, Stephanie
  7 NF0586 Current Narratives     7   Allen, Linda         09/11/2010 Williams, Stephanie
  8 NF0586 Current Narratives     8   Johnson, Tammy       09/11/2010 Williams, Stephanie
  9 NF0586 Current Narratives     9   King, Weston         09/18/2010 Williams, Stephanie    JWE
 10 NF0586 Current Narratives    10   Thompson, Tonya      09/11/2010 Williams, Stephanie
 11 NF2413 Political Comments     1   Jones, Robin         07/31/2010 White, Michelle
 12 NF2413 Political Comments     2   Sanchez, Brandon     08/07/2010 White, Michelle
 13 NF2413 Political Comments     3   Jones, Robin         07/31/2010 White, Michelle
 14 NF2413 Political Comments     4   Perez, Joshua        07/31/2010 White, Michelle
 15 NF2413 Political Comments     5   Williams, Nicholas   07/31/2010 White, Michelle
 16 NF2413 Political Comments     6   Patterson, Mary      08/14/2010 White, Michelle
 17 NF2413 Political Comments     7   Torres, Christopher  08/07/2010 White, Michelle
 18 NF2413 Political Comments     8   Robinson, Bonnie     08/07/2010 White, Michelle
 19 NF2413 Political Comments     9   Brown, Patricia      08/07/2010 White, Michelle
 20 NF8141 Favorite Essays        1   Clark, Todd          10/02/2010 Patterson, Daniel
 21 NF8141 Favorite Essays        2   Barnes, David        10/02/2010 Patterson, Daniel
 22 NF8141 Favorite Essays        3   Young, Richard       09/18/2010 Patterson, Daniel
 23 NF8141 Favorite Essays        4   Barnes, David        10/02/2010 Patterson, Daniel
 24 NF8141 Favorite Essays        5   Anderson, Daniel     09/18/2010 Patterson, Daniel
 25 NF8141 Favorite Essays        6   Anderson, Daniel     10/02/2010 Patterson, Daniel      SAW
 26 NF8141 Favorite Essays        7   Morris, Laura        09/18/2010 Patterson, Daniel
 27 NF8141 Favorite Essays        8   Powell, George       09/18/2010 Patterson, Daniel

Example Overview

This example shows you how to update a master data set with only nonmissing values from observations in a transaction data set. The UPDATE and BY statements in the DATA step update the master data set for observations in the two data sets that match according to the values of the BY variables.

Data set BOOKLIST is the master data set, and it tracks information for the chapters of three books. The transaction data set BOOKUPDATES contains updated information about selected chapters in the books.

No duplicates are present in either the master data set or the transaction data set. Additional programming statements are required when duplicate BY values exist. The only variables that are present in the transaction data set besides the matching variables are the variables to be updated. This program also adds one new variable to the output data set that exists in the transaction data set.

The DATA step updates the two variables, AUTHOR and DUEDATE, in data set BOOKLIST with information from BOOKUPDATES when a variable value is not missing. It uses the default action of the UPDATE and BY statements to perform the updates. Observations are matched by two BY variables, BOOKID and CHAPTER.

Missing values are present in the first and third observations of the transaction data set BOOKUPDATES.

Assume both data sets were sorted or indexed by BOOKID and CHAPTER prior to the DATA step.

A MERGE statement could be used instead of the UPDATE statement. However, in this example, it is simpler to use the UPDATE statement. You can take advantage of its default action of not replacing a variable's nonmissing value in the master data set with a missing value when the variable value is missing in the matching observation in the transaction data set. For more information about when to choose UPDATE or MERGE, see "Comparing the UPDATE and MERGE Statements" in the following "A Closer Look" section and SAS documentation.

Program

Create data set REVISEDLIST. Update data set BOOKLIST with information in BOOKUPDATES. Place the master data set first in the UPDATE statement. Specify how to match observations in the two data sets.

 data revisedlist;
   update booklist bookupdates;


  by bookid chapter;

run;

Related Technique

The following PROC SQL step creates a table equivalent to the data set that was created by the DATA step in the main example. A full join using the ON clause combines the two tables so that all matching and nonmatching rows in the two input tables are present in output table REVISEDBOOKLIST along with updates of specific rows that were contributed from BOOKUPDATES.

The COALESCE function performs the update of columns AUTHOR and DUEDATE in BOOKLIST. The first argument to each call to COALESCE is the column from BOOKUPDATES. The second argument is the column from BOOKLIST. The COALESCE function checks each of its arguments from left to right until it finds a nonmissing value. When the first argument is nonmissing, PROC SQL replaces the value in BOOKLIST with the value from BOOKUPDATES. When the first argument is missing, the value in BOOKLIST for the column remains the same.

The step also applies the COALESCE function to the matching columns, BOOKID and CHAPTER. In this example, you could select BOOKID and CHAPTER from the master table (B.BOOKID and B.CHAPTER) because all the transactions have a match in the master data set. However, if you have rows in your transaction table that do not have a match in your master table, the values of B.BOOKID are missing. Using the COALESCE function ensures that the unmatched values for the matching columns are copied to the output table.

Missing values are present in the first and third rows of table BOOKUPDATES. The missing values are not copied to REVISEDBOOKLIST.

The SELECT statement includes column UPDATEDBY, which is found only in the transaction table. Compared to the main example, you need to specify all columns you want to include in the output table when using PROC SQL. Unless you use the DROP or KEEP statements or options, the UPDATE statement in the main example copies all new variables in the transaction data set to the updated data set.

Create table REVISEDLIST. Apply the COALESCE function to the two matching columns so that if there are nonmatches in the transaction table, the nonmissing unmatched value is copied to the output table. Apply the COALESCE function to the columns that are being updated. Put the column contributed from the transaction table BOOKUPDATES first so that when it is nonmissing, it replaces the value in BOOKLIST.

Combine the two tables with a full join so that all matching and nonmatching rows are present in the output table. Specify an alias for each table. Specify the columns that match the two tables.

proc sql;
  create table revisedlist as
    select coalesce(u.bookid,b.bookid) as bookid,
           booktitle,
           coalesce(u.chapter,b.chapter) as chapter,


           coalesce(u.author,b.author) as author,
           coalesce(u.duedate,b.duedate) as duedate
                  format=mmddyy10.,



           editor,
           updatedby
      from booklist b
        full join
      bookupdates u
      on b.bookid=u.bookid and b.chapter=u.chapter;

quit;

A Closer Look

Allowing Missing Values to Replace Existing Values in the Master Data Set When Updating a Master Data Set

The DATA step in the preceding main example relies on the default action of the UPDATE statement that prevents missing values in the transaction data set from replacing existing values in the master data set. This action is controlled by UPDATE statement option UPDATEMODE=. This option can have one of two values: MISSINGCHECK or NOMISSINGCHECK. The default value for UPDATEMODE= is MISSINGCHECK. Because the default action is what this example requires, it was not necessary to add option UPDATEMODE= to the UPDATE statement.

If you want to allow missing values in the transaction data set to replace existing values in the master data set, add UPDATEMODE=NOMISSINGCHECK to the UPDATE statement. The DATA step in the main example is modified here to include this option:

data revisedmissing;
  update booklist bookupdates updatemode=nomissingcheck;
  by bookid chapter;
run;

Output 6.1b displays a PROC PRINT of the output data set REVISEDMISSING that was created by the preceding DATA step. Two missing values have replaced existing values in the updated data set. These correspond to the two missing values in the transaction data set:

variable DUEDATE for BOOKID= "NF0586" CHAPTER=4
variable AUTHOR for BOOKID= "NF8141" CHAPTER=6

Output 6.1b REVISEDMISSING Data Set

                    Example 6.1 REVISEDMISSING Data Set Created with DATA Step

Obs bookid     booktitle      chapter author                 duedate       editor        updatedby
  1 NF0586 Current Narratives     1   Smith, Rebecca       09/04/2010 Williams, Stephanie
  2 NF0586 Current Narratives     2   Williams, Susan      09/04/2010 Williams, Stephanie
  3 NF0586 Current Narratives     3    Torres, Christopher 09/11/2010 Williams, Stephanie
  4 NF0586 Current Narratives     4   Banks, James                 . Williams, Stephanie     JWE
  5 NF0586 Current Narratives     5   Powell, George       09/11/2010 Williams, Stephanie
  6 NF0586 Current Narratives     6   Thompson, Tonya      09/11/2010 Williams, Stephanie
  7 NF0586 Current Narratives     7   Allen, Linda         09/11/2010 Williams, Stephanie
  8 NF0586 Current Narratives     8   Johnson, Tammy       09/11/2010 Williams, Stephanie
  9 NF0586 Current Narratives     9   King, Weston         09/18/2010 Williams, Stephanie    JWE
 10 NF0586 Current Narratives    10   Thompson, Tonya      09/11/2010 Williams, Stephanie
 11 NF2413 Political Comments     1   Jones, Robin         07/31/2010 White, Michelle
 12 NF2413 Political Comments     2   Sanchez, Brandon     08/07/2010 White, Michelle
 13 NF2413 Political Comments     3   Jones, Robin         07/31/2010 White, Michelle
 14 NF2413 Political Comments     4   Perez, Joshua        07/31/2010 White, Michelle
 15 NF2413 Political Comments     5    Williams, Nicholas  07/31/2010 White, Michelle
 16 NF2413 Political Comments     6   Patterson, Mary      08/14/2010 White, Michelle
 17 NF2413 Political Comments     7    Torres, Christopher 08/07/2010 White, Michelle
 18 NF2413 Political Comments     8   Robinson, Bonnie     08/07/2010 White, Michelle
 19 NF2413 Political Comments     9   Brown, Patricia      08/07/2010 White, Michelle
 20 NF8141 Favorite Essays        1   Clark, Todd          10/02/2010 Patterson, Daniel
 21 NF8141 Favorite Essays        2   Barnes, David        10/02/2010 Patterson, Daniel
 22 NF8141 Favorite Essays        3   Young, Richard       09/18/2010 Patterson, Daniel
 23 NF8141 Favorite Essays        4   Barnes, David        10/02/2010 Patterson, Daniel
 24 NF8141 Favorite Essays        5   Anderson, Daniel     09/18/2010 Patterson, Daniel
 25 NF8141 Favorite Essays        6                        10/02/2010 Patterson, Daniel      SAW
 26 NF8141 Favorite Essays        7   Morris, Laura        09/18/2010 Patterson, Daniel
 27 NF8141 Favorite Essays        8   Powell, George       09/18/2010 Patterson, Daniel

Comparing the UPDATE and MERGE Statements

The UPDATE and MERGE statements can perform the similar task of updating a master data set with information that is supplied in a transaction data set. It is important to understand their usage differences so that you choose the better tool for your application. Table 6.1 presents an abridged comparison of the two statements. For detailed information about these statements, see SAS documentation.

Table 6.1. Comparing the UPDATE and MERGE Statements
UPDATE Statement	MERGE Statement
Can process only two data sets at a time	Can process an unlimited number of data sets
Can update and add observations to the master data set	Can update and add observations to the output data set
Can add and delete variables in the master data set	Can add and delete variables in the output data set
Must list master data set first	Usually list master data set first, but not required
Requires a BY statement	Updating usually uses a BY statement, but not required
Requires the master and transaction data sets be sorted or indexed by the matching variables in the BY statement	Updating usually expects the input data sets to be sorted or indexed by the matching variables in the BY statement, but not required
Must be no more than one observation in the BY group in the master data set (the DATA step will execute but with warnings, and SAS sets the ERROR variable to 1)	Updating usually expects no more than one observation in the BY group in the master data set, but not required (this condition does not set the ERROR variable to 1)
Outputs observation at the end of the BY group	Outputs each observation at the bottom of the DATA step or by explicit OUTPUT statement
Does not replace missing values in the master data set with missing values found in the matched observations in the transaction data set unless the UPDATE statement option UPDATEMODE= is set to NOMISSINGCHECK or if using special missing values	Replaces existing values in the first data set with missing values found in the matched observations in the second data set if the variables have the same names

The next section describes with examples how the UPDATE and MERGE statements process your master and transaction data sets when they have multiple observations in a BY group.

Using UPDATE and MERGE When There Are Multiple Observations in a BY Group

The UPDATE and MERGE statements differ in how they combine data sets when either the master or transaction data set contains multiple observations per BY group. The next code samples contrast how the two statements handle these duplicates.

Transaction data set with multiple observations in a BY group: When you use the UPDATE statement and have more than one observation in the transaction data set in a BY group, only the last in the series of multiple observations updates the matching observation in the master data set. When you use the MERGE statement and have more than one observation in the transaction data set in a BY group, all transaction observations in that BY group are added to the master data set.

For example, the following DATA step creates the transaction data set DUPDATES and it has three observations in the BY group of BOOKID= "NF8141" and CHAPTER=6.

data dupdates;
  input bookid $ 1-6 chapter duedate : mmddyy10.;
  format duedate mmddyy10.;
datalines;
NF8141 6  10/02/2010
NF8141 6  10/22/2010
NF8141 6  11/01/2010
;;;;

A DATA step that uses the UPDATE statement and a DATA step that uses the MERGE statement follow. Both DATA steps apply the transactions in DUPDATES to the original BOOKLIST data set that was created in the main example, and the two DATA steps produce different output data sets.

Using the UPDATE Statement	Using the MERGE Statement
data upddups; update booklist dupdates; by bookid chapter; run;	data mrgdups; merge booklist dupdates; by bookid chapter; run;

PROC PRINTs of UPDDUPS and MRGDUPS show the differences in the results. Data set UPDDUPS has only one observation for BOOKID= "NF8141" CHAPTER=6, and this observation has the value for DUEDATE equal to the value of DUEDATE in the las observation in DUPDATES (11/01/2010). Data set MRGDUPS has three observations for BOOKID= "NF8141" CHAPTER=6, one for each of the three observations in MRGDUPS.

The output lists only variables BOOKID, CHAPTER, and DUEDATE for the observations where BOOKID= "NF8141". The rows for CHAPTER=6 in each output display are highlighted.

Using the UPDATE Statement	Using the MERGE Statement
UPDDUPS Obs bookid chapter duedate 20 NF8141 1 10/02/2010 21 NF8141 2 10/02/2010 22 NF8141 3 09/18/2010 23 NF8141 4 10/02/2010 24 NF8141 5 09/18/2010 25 NF8141 6 11/01/2010 26 NF8141 7 09/18/2010 27 NF8141 8 09/18/2010	MRGDUPS Obs bookid chapter duedate 20 NF8141 1 10/02/2010 21 NF8141 2 10/02/2010 22 NF8141 3 09/18/2010 23 NF8141 4 10/02/2010 24 NF8141 5 09/18/2010 25 NF8141 6 10/02/2010 26 NF8141 6 10/22/2010 27 NF8141 6 11/01/2010 28 NF8141 7 09/18/2010 29 NF8141 7 09/18/2010

Using the UPDATE Statement

Using the MERGE Statement

            UPDDUPS

Obs  bookid  chapter     duedate
20   NF8141     1       10/02/2010
21   NF8141     2       10/02/2010
22   NF8141     3       09/18/2010
23   NF8141     4       10/02/2010
24   NF8141     5       09/18/2010
25   NF8141     6       11/01/2010
26   NF8141     7       09/18/2010
27   NF8141     8      09/18/2010

            MRGDUPS

Obs  bookid  chapter    duedate
20   NF8141     1      10/02/2010
21   NF8141     2      10/02/2010
22   NF8141     3      09/18/2010
23   NF8141     4      10/02/2010
24   NF8141     5      09/18/2010
25   NF8141     6      10/02/2010
26   NF8141     6      10/22/2010
27   NF8141     6      11/01/2010
28   NF8141     7      09/18/2010
29   NF8141     7      09/18/2010

Master data set with multiple observations in a BY group: When you use the UPDATE statement and have more than one observation in the master data set in a BY group, the variables in common between the master and transaction data sets on the first matching observation in the BY group are updated. The variables in common between the master and transaction data sets in subsequent matching observations in that BY group of multiple observations are not updated. Additionally, values for any new variables that are present in the transaction data set are copied only to the first matching observation in a BY group with multiple observations. Missing values are assigned to these new variables for all subsequent matching observations in that BY group.

SAS issues a WARNING and sets the automatic variable _ERROR_ to 1 when your master data set contains more than one observation in a BY group. The DATA step can execute, but this is a process you should program carefully because the design of the UPDATE statement is to process only one observation per BY group in the master data set. Here is the WARNING message that SAS displays in the situation of multiple observations per BY group:

WARNING: The MASTER data set contains more than one
         observation for a BY group.

When you use the MERGE statement and have more than one observation in the master data set in a BY group, only the variables in common between the master and transaction data sets in the first matching observation in the BY group are updated. This part of the MERGE statement process is identical to that of the UPDATE statement.

The process of adding new variables when using the MERGE statement differs from that of the UPDATE statement. With the MERGE statement, the values for any new variables that are present in the transaction data set are copied to all matching observations in the series of duplicates, not just the first observation as is done with the UPDATE statement. The DATA step loads the observation in the transaction data set into the Program Data Vector (PDV). With new variables found only in the transaction data set, the values for those variables remain in the PDV until the next observation in the transaction data set is processed (unless you change them with SAS language statements in the DATA step).

Unlike the UPDATE statement, the MERGE statement does not consider multiple observations in a BY group to be an error, and it does not write a message to the SAS log. However, SAS does write a warning if more than one of the data sets in your MERGE statement has multiple observations in a BY group.

For example, the following DATA step creates the transaction data set NEWTITLES, and it has one observation with the BY-variable value of BOOKID= "NF8141." The variable VERSION in NEWTITLES is not present in the master data set BOOKLIST.

data newtitles;
  input bookid $ 1-6 booktitle $ 8-25 version;
datalines;
NF8141 Popular Essays 2
;;;;

The master data set BOOKLIST has eight observations with the BY-variable value of BOOKID= "NF8141".

A DATA step that uses the UPDATE statement and a DATA step that uses the MERGE statement follow. Both DATA steps apply the transactions in NEWTITLES to the original BOOKLIST data set that was created in the main example, and they produce different output data sets. Note that these DATA steps have only one BY variable (BOOKID) compared to the two BY variables, BOOKID and CHAPTER, in the preceding DATA steps.

Using the UPDATE Statement	Using the MERGE Statement
data updtitles; update booklist newtitles; by bookid; run;	data mrgtitles; merge booklist newtitles; by bookid; run;

PROC PRINTs of UPDTITLES and MRGTITLES show the similarities and differences in the results. Both data sets have an updated value for BOOKTITLE in only the first of the eight observations for BOOKID= "NF8141". A nonmissing value for VERSION is found only on the first of the eight observations for BOOKID= "NF8141" in data set UPDTITLES, while all eight observations for BOOKID= "NF8141" in data set MRGTITLES have a value for VERSION.

The following output lists only variables BOOKID, BOOKTITLE, CHAPTER, and VERSION for the observations where BOOKID= "NF8141".

Remember that using the UPDATE statement when there are multiple observations in a BY group in the master data set causes SAS to write warnings to the SAS log and to set the automatic variable _ERROR_ to 1.

Using the UPDATE Statement	Using the MERGE Statement
UPDTITLES Obs bookid booktitle chapter version 20 NF8141 Popular Essays 1 2 21 NF8141 Favorite Essays 2 . 22 NF8141 Favorite Essays 3 . 23 NF8141 Favorite Essays 4 . 24 NF8141 Favorite Essays 5 . 25 NF8141 Favorite Essays 6 . 26 NF8141 Favorite Essays 7 . 27 NF8141 Favorite Essays 8 .	MRGTITLES Obs bookid booktitle chapter version 20 NF8141 Popular Essays 1 2 21 NF8141 Favorite Essays 2 2 22 NF8141 Favorite Essays 3 2 23 NF8141 Favorite Essays 4 2 24 NF8141 Favorite Essays 5 2 25 NF8141 Favorite Essays 6 2 26 NF8141 Favorite Essays 7 2 27 NF8141 Favorite Essays 8 2

Using the UPDATE Statement

Using the MERGE Statement

               UPDTITLES

Obs bookid booktitle     chapter version
 20 NF8141 Popular Essays   1        2
 21 NF8141 Favorite Essays  2        .
 22 NF8141 Favorite Essays  3        .
 23 NF8141 Favorite Essays  4        .
 24 NF8141 Favorite Essays  5        .
 25 NF8141 Favorite Essays  6        .
 26 NF8141 Favorite Essays  7        .
 27 NF8141 Favorite Essays  8        .

               MRGTITLES

Obs bookid booktitle     chapter version
 20 NF8141 Popular Essays   1        2
 21 NF8141 Favorite Essays  2        2
 22 NF8141 Favorite Essays  3        2
 23 NF8141 Favorite Essays  4        2
 24 NF8141 Favorite Essays  5        2
 25 NF8141 Favorite Essays  6        2
 26 NF8141 Favorite Essays  7        2
 27 NF8141 Favorite Essays  8        2

See Example 6.4 for how to write a DATA step that uses the MERGE statement to update a master data set that has multiple observations in a BY group.

Master and transaction data sets with multiple observations in a BY group: Updating or merging data sets when both master and transaction data sets have multiple observations in a BY group can be problematic. You should understand your data well before programming a DATA step where this situation exists.

The concepts that were illustrated in the previous two sections apply. When using the UPDATE statement, only the last observation in a BY group of multiple observations updates the first observation in the matching BY group of multiple observations. When using the MERGE statement, one-to-one matching occurs between observations in the master and transaction data sets within the matching BY group of multiple observations.

For example, the following DATA step creates the transaction data set MULTTITLES, and it has two observations with the BY-variable value of BOOKID= "NF8141". The variable VERSION in MULTTITLES is not present in the master data set BOOKLIST.

data multtitles;
  input bookid $ 1-6 booktitle $ 8-25 version;
datalines;
NF8141 Popular Essays      2
NF8141 Essays for All      3
;;;;

The master data set BOOKLIST has eight observations with the BY-variable value of BOOKID= "NF8141".

A DATA step that uses the UPDATE statement and a DATA step that uses the MERGE statement follow. Both DATA steps apply the transactions in MULTTITLES to the original BOOKLIST data set that was created in the main example, and they produce different output data sets. Note that these DATA steps have only one BY variable (BOOKID) compared to the two BY variables, BOOKID and CHAPTER, in some of the preceding DATA steps.

Using the UPDATE Statement	Using the MERGE Statement
data updmult; update booklist multtitles; by bookid; run;	data mrgmult; merge booklist multtitles; by bookid; run;

The following output for data set UPDMULT shows that only the first of the eight observations for BOOKID= "NF8141" has an updated value for BOOKTITLE and that this value comes from the second observation in MULTTITLES. The value of VERSION on the first observation also comes from the second observation in MULTTITLES.

The output for MRGMULT shows for BY group BOOKID= "NF8141" one-to-one matching of the two observations in MULTTITLES to the first two observations in BOOKLIST. The first observation in the BY group has values for BOOKTITLE and VERSION that were copied from the first observation in MULTTITLES. The second observation in the BY group has values for BOOKTITLE and VERSION copied from the second observation in MULTTITLES. The remaining six observations also have a value of VERSION from the second observation in MULTTITLES. Similar to the previous MERGE statement example, the value of VERSION from the second observation in MULTITLES is moved to the PDV and remains there until the next BY group is processed.

The following output lists only variables BOOKID, BOOKTITLE, CHAPTER, and VERSION for the observations where BOOKID= "NF8141".

Use of the UPDATE statement when there are multiple observations in the master data set BY group causes SAS to write warnings to the SAS log and to set the automatic variable _ERROR_ to 1.

When there are multiple observations in more than one of the data sets in the MERGE statement, SAS writes the following note to the SAS log and does not flag this as a warning or error.

NOTE: MERGE statement has more than one data set with
      repeats of BY values.

Using the UPDATE Statement	Using the MERGE Statement
UPDMULT Obs bookid booktitle chapter version 20 NF8141 Essays for All 1 3 21 NF8141 Favorite Essays 2 . 22 NF8141 Favorite Essays 3 . 23 NF8141 Favorite Essays 4 . 24 NF8141 Favorite Essays 5 . 25 NF8141 Favorite Essays 6 . 26 NF8141 Favorite Essays 7 . 27 NF8141 Favorite Essays 8 .	MRGMULT Obs bookid booktitle chapter version 20 NF8141 Popular Essays 1 2 21 NF8141 Essays for All 2 3 22 NF8141 Favorite Essays 3 3 23 NF8141 Favorite Essays 4 3 24 NF8141 Favorite Essays 5 3 25 NF8141 Favorite Essays 6 3 26 NF8141 Favorite Essays 7 3 27 NF8141 Favorite Essays 8 3

Using the UPDATE Statement

Using the MERGE Statement

               UPDMULT

Obs bookid booktitle     chapter version
 20 NF8141 Essays for All   1        3
 21 NF8141 Favorite Essays  2        .
 22 NF8141 Favorite Essays  3        .
 23 NF8141 Favorite Essays  4        .
 24 NF8141 Favorite Essays  5        .
 25 NF8141 Favorite Essays  6        .
 26 NF8141 Favorite Essays  7        .
 27 NF8141 Favorite Essays  8        .

               MRGMULT

Obs bookid booktitle     chapter version
 20 NF8141 Popular Essays   1        2
 21 NF8141 Essays for All   2        3
 22 NF8141 Favorite Essays  3        3
 23 NF8141 Favorite Essays  4        3
 24 NF8141 Favorite Essays  5        3
 25 NF8141 Favorite Essays  6        3
 26 NF8141 Favorite Essays  7        3
 27 NF8141 Favorite Essays  8        3

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.

Table of Contents for Example 6.1 Updating a Data Set and Controlling Whether Common Variables Are Overwritten with Missing Values

Create new playlist

Sign In

Sign Up

Example 6.1 Updating a Data Set and Controlling Whether Common Variables Are Overwritten with Missing Values

Goal

Example Features

Input Data Sets

Resulting Data Set

Output 6.1a REVISEDLIST Data Set

Example Overview

Program

Related Technique

A Closer Look

Allowing Missing Values to Replace Existing Values in the Master Data Set When Updating a Master Data Set

Output 6.1b REVISEDMISSING Data Set

Comparing the UPDATE and MERGE Statements

Using UPDATE and MERGE When There Are Multiple Observations in a BY Group

Table of Contents for
Example 6.1 Updating a Data Set and Controlling Whether Common Variables Are Overwritten with Missing Values