Example 6.2 Updating a Data Set and Allowing Some Values to Be Updated with Missing Values

Goal

Update a master data set with values from another data set. The second data set contains missing values. Control which missing values in the second data set can overwrite existing values in the master data set.

Examples 6.1, 6.2, 6.3, 6.4, and 6.5 illustrate ways to update common variables when combining data sets. The five examples apply the methods to the same master data set. Different transaction data sets are used in Examples 6.1, 6.2, and 6.3. Examples 6.4 and 6.5 use the same transaction data set and produce different resulting data sets.

Example Features

Featured StepDATA step
Featured Step Options and StatementsMISSING statement

UPDATE and BY statements

Special missing values
Related TechniquePROC SQL, full join, CASE expression, COALESCE function, special missing values

Input Data Sets

Master data set BOOKLIST contains information about the chapters of three books. Transaction data set BOOKMISS contains updates of information that should be applied to BOOKLIST. Some of the values for DUEDATE are missing, and some values are coded with the special missing value, "_" (underscore).

                                         BOOKLIST

Obs bookid     booktitle      chapter author                  duedate       editor
  1 NF0586 Current Narratives     1   Smith, Rebecca       09/04/2010 Williams, Stephanie
  2 NF0586 Current Narratives     2   Williams, Susan      09/04/2010 Williams, Stephanie
  3 NF0586 Current Narratives     3   Torres, Christopher  09/11/2010 Williams, Stephanie
  4 NF0586 Current Narratives     4   Torres, Christopher  09/11/2010 Williams, Stephanie
  5 NF0586 Current Narratives     5   Powell, George       09/11/2010 Williams, Stephanie
  6 NF0586 Current Narratives     6   Thompson, Tonya      09/11/2010 Williams, Stephanie
  7 NF0586 Current Narratives     7   Allen, Linda         09/11/2010 Williams, Stephanie
  8 NF0586 Current Narratives     8   Johnson, Tammy       09/11/2010 Williams, Stephanie
  9 NF0586 Current Narratives     9   Kelly, Melissa       09/11/2010 Williams, Stephanie
 10 NF0586 Current Narratives    10   Thompson, Tonya      09/11/2010 Williams, Stephanie
 11 NF2413 Political Comments     1   Jones, Robin         07/31/2010 White, Michelle
 12 NF2413 Political Comments     2   Sanchez, Brandon     08/07/2010 White, Michelle
 13 NF2413 Political Comments     3   Jones, Robin         07/31/2010 White, Michelle
 14 NF2413 Political Comments     4   Perez, Joshua        07/31/2010 White, Michelle
 15 NF2413 Political Comments     5   Williams, Nicholas   07/31/2010 White, Michelle
 16 NF2413 Political Comments     6   Patterson, Mary      08/14/2010 White, Michelle
 17 NF2413 Political Comments     7   Torres, Christopher  08/07/2010 White, Michelle
 18 NF2413 Political Comments     8   Robinson, Bonnie     08/07/2010 White, Michelle
 19 NF2413 Political Comments     9   Brown, Patricia      08/07/2010 White, Michelle
 20 NF8141 Favorite Essays        1   Clark, Todd          10/02/2010 Patterson, Daniel
 21 NF8141 Favorite Essays        2   Barnes, David        10/02/2010 Patterson, Daniel
 22 NF8141 Favorite Essays        3   Young, Richard       09/18/2010 Patterson, Daniel
 23 NF8141 Favorite Essays        4   Barnes, David        10/02/2010 Patterson, Daniel
 24 NF8141 Favorite Essays        5   Anderson, Daniel     09/18/2010 Patterson, Daniel
 25 NF8141 Favorite Essays        6   Anderson, Daniel     09/18/2010 Patterson, Daniel
 26 NF8141 Favorite Essays        7   Morris, Laura        09/18/2010 Patterson, Daniel
 27 NF8141 Favorite Essays        8   Powell, George       09/18/2010 Patterson, Daniel

Both data sets have unique occurrences of the combinations of values of BOOKID and CHAPTER.

                            BOOKMISS

Obs    bookid    chapter    author              duedate
 1     NF0586       3                                 _
 2     NF0586       4                                 _
 3     NF2413       4       Loren, Marie              .
 4     NF2413       7                                 _
 5     NF8141       5                        10/03/2010
 6     NF8141       6                        10/03/2010

Resulting Data Set

Output 6.2 REVISEDMISS Data Set

                    Example 6.2 REVISEDMISS Data Set Created with DATA Step

Obs  bookid      booktitle       chapter  author                  duedate        editor

  1  NF0586  Current Narratives      1    Smith, Rebecca        09/04/2010  Williams, Stephanie
  2  NF0586  Current Narratives      2    Williams, Susan       09/04/2010  Williams, Stephanie
  3  NF0586  Current Narratives      3    Torres, Christopher            .  Williams, Stephanie
  4  NF0586  Current Narratives      4    Torres, Christopher            .  Williams, Stephanie
  5  NF0586  Current Narratives      5    Powell, George        09/11/2010  Williams, Stephanie
  6  NF0586  Current Narratives      6    Thompson, Tonya       09/11/2010  Williams, Stephanie
  7  NF0586  Current Narratives      7    Allen, Linda          09/11/2010  Williams, Stephanie
  8  NF0586  Current Narratives      8    Johnson, Tammy        09/11/2010  Williams, Stephanie
  9  NF0586  Current Narratives      9    Kelly, Melissa        09/11/2010  Williams, Stephanie
 10  NF0586  Current Narratives     10    Thompson, Tonya       09/11/2010  Williams, Stephanie
 11  NF2413  Political Comments      1    Jones, Robin          07/31/2010  White, Michelle
 12  NF2413  Political Comments      2    Sanchez, Brandon      08/07/2010  White, Michelle
 13  NF2413  Political Comments      3    Jones, Robin          07/31/2010  White, Michelle
 14  NF2413  Political Comments      4    Loren, Marie          07/31/2010  White, Michelle
 15  NF2413  Political Comments      5    Williams, Nicholas    07/31/2010  White, Michelle
 16  NF2413  Political Comments      6    Patterson, Mary       08/14/2010  White, Michelle
 17  NF2413  Political Comments      7    Torres, Christopher            .  White, Michelle
 18  NF2413  Political Comments      8    Robinson, Bonnie      08/07/2010  White, Michelle
 19  NF2413  Political Comments      9    Brown, Patricia       08/07/2010  White, Michelle
 20  NF8141  Favorite Essays         1    Clark, Todd           10/02/2010  Patterson, Daniel
 21  NF8141  Favorite Essays         2    Barnes, David         10/02/2010  Patterson, Daniel
 22  NF8141  Favorite Essays         3    Young, Richard        09/18/2010  Patterson, Daniel
 23  NF8141  Favorite Essays         4    Barnes, David         10/02/2010  Patterson, Daniel
 24  NF8141  Favorite Essays         5    Anderson, Daniel      10/03/2010  Patterson, Daniel
 25  NF8141  Favorite Essays         6    Anderson, Daniel      10/03/2010  Patterson, Daniel
 26  NF8141  Favorite Essays         7    Morris, Laura         09/18/2010  Patterson, Daniel
 27  NF8141  Favorite Essays         8    Powell, George        09/18/2010  Patterson, Daniel


Example Overview

This example shows you how to allow some missing values to replace existing values when updating a master data set. The UPDATE and BY statements update the information in the master data set for observations that are matched by the values of the BY variables. By default, the UPDATE statement does not replace existing values with missing values. Example 6.1 discussed in the "A Closer Look" section the use of UPDATE statement option UPDATEMODE= to control this replacement action.

This example does not use the UPDATEMODE= option to selectively replace existing values in the master data set with missing values. Instead it maintains the default value of MISSINGCHECK for option UPDATEMODE=, and it uses special missing values and regular missing values to replace some existing values with missing values.

Data set BOOKLIST is the master data set, and it tracks information about the chapters of three books.

The transaction data set BOOKMISS contains values for DUEDATE that have regular missing values and values that have the underscore character (_) assigned as a special missing value. Special missing values apply to numeric variables. The design of the UPDATE statement with UPDATEMODE=MISSINGCHECK in effect causes an existing value for a variable in the master data set to be overwritten with a regular missing value. This occurs if a special missing value has been assigned to the variable in its matched observation in the transaction data set. In this example, a regular missing value in the transaction data set still does not overwrite an existing value in the master data set.

The DATA step updates two variables, AUTHOR and DUEDATE, in data set BOOKLIST with information from BOOKMISS. Observations are matched by two BY variables, BOOKID and CHAPTER.

Missing values for DUEDATE are present in the first four observations of transaction data set BOOKMISS. Special missing values are found in observations 1, 2, and 4. Observations 5 and 6 have regular missing values for variable AUTHOR and new values for variable DUEDATE. Observation 3 has a new value for variable AUTHOR and a regular missing value for DUEDATE.

Assume both data sets were sorted or indexed by BOOKID and CHAPTER prior to the DATA step.

The DATA step in this example is identical to the one in Example 6.1. The difference between the two examples is the assignment of the special missing value prior to the DATA step.

Program

Precede the DATA step that creates BOOKMISS with the MISSING statement. Assign the underscore (_) character as a special missing value. Create data set BOOKMISS. Note the underscores in the DUEDATE fields in rows 1, 2, and 4. Note the regular missing value for DUEDATE in row 3.

Create data set REVISEDMISS. Update data set BOOKLIST with information in BOOKMISS. Place the master data set first in the UPDATE statement. Specify how to match observations in the two data sets.

missing _;


data bookmiss;
  input bookid $ 1-6 chapter author $ 11-30
        duedate : mmddyy10.;
  format duedate mmddyy10.;
datalines;
NF0586 3                        _
NF0586 4                        _
NF2413 4  Loren, Marie          .
NF2413 7                        _
NF8141 5                        10/03/2010
NF8141 6                        10/03/2010
;;;;
data revisedmiss;
  update booklist bookmiss;

  by bookid chapter;

run;

Related Technique

The PROC SQL step that follows creates a table equivalent to the REVISEDMISS data set that was created by the DATA step in the main example. It uses CASE expressions and the COALESCE function to determine how to update the BOOKLIST table. The COALESCE function returns the first nonmissing value in the arguments that are supplied to it.

The two input tables are combined with a full join. All matching and nonmatching rows in both tables are present in REVISEDMISS along with the updates to specific rows and columns as specified in BOOKMISS.

As in the preceding DATA step, the following PROC SQL step distinguishes between regular and special missing values so that some missing values can replace existing values in the master data set when updating the master data set. The code is written so that a column with a special missing value in the transaction data set is assigned a regular missing value in the output table. A regular missing value in a column in the transaction data set does not replace an existing value in the master data set.

The CASE expression tests when the value for DUEDATE in the transaction table has the special missing value of underscore. When testing a numeric variable, the value is written as the regular missing value of period followed by the special missing value. When a value for DUEDATE is the special missing value of underscore, PROC SQL assigns a regular missing value to the value of DUEDATE in the output table. Otherwise, PROC SQL applies the COALESCE function to the two values of DUEDATE from the two input tables BOOKMISS and BOOKLIST in that order. When the value of DUEDATE in BOOKMISS is a regular missing value, the COALESCE function returns the value for DUEDATE from BOOKLIST if it is nonmissing. If both are missing, the result is missing.

Create table REVISEDMISS. Apply the COALESCE function to the two matching columns so that if there are nonmatches in the transaction table, the nonmissing unmatched value is copied to the output table. If the value for DUEDATE in the transaction table is the special missing value of underscore (_), assign a regular missing value to DUEDATE. Otherwise, assign it the first nonmissing value of the two arguments. End the CASE expression. Format the values of DUEDATE in the output table.

Combine the two tables with a full join so that all matching and nonmatching rows are present in the output table. Specify an alias for each of the two tables. Specify the columns that match the two tables.

proc sql;
  create table revisedmiss as
     select coalesce(m.bookid,b.bookid) as bookid,
            booktitle,
            coalesce(m.chapter,b.chapter) as chapter,
            coalesce(m.author,b.author) as author,

            case when m.duedate=._ then .
                 else coalesce(m.duedate,b.duedate)




            end as duedate format=mmddyy10.,

            editor
       from booklist b
         full join
       bookmiss m


       on b.bookid=m.bookid and b.chapter=m.chapter;

quit;

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
18.219.215.178