Example 7.2 Modifying a Non-Indexed Data Set in Place by Matching by a Common Variable

Goal

Modify a non-indexed master data set in place with transactions supplied in a second data set. Add and delete observations from the master data set. Match the two data sets by variables in common that uniquely identify the observations. The transaction data set can contain duplicate BY values of the variables in common while the master data set does not.

Example Features

Featured StepDATA step
Featured Step Options and StatementsMODIFY and BY statements

_IORC_ automatic variable

Macro function %SYSRC
A Closer LookControlling When the MODIFY Statement Outputs an Observation Handling Missing Values When Using the MODIFY and BY Statements

Input Data Sets

Data set DATACONFERENCE stores the schedule for one day of a conference. This data set is also used in Example 7.1.

                             DATACONFERENCE

  Obs   section    starttime talktime speaker             topic
   1 Applications  8:00 AM     50    Brooks, Jason        Customer Service
   2 Data Mining   8:00 AM     50    Jones, Janet         Direct Marketing
   3 Statistics    8:00 AM     50    Thomas, Linda        New Analytic Features
   4 Tutorials     8:00 AM     50    Evans, Matthew       Graphics
   5 Applications  9:00 AM     50    Washington, Debra    TBD
   6 Data Mining   9:00 AM     50    Edwards, Andrew      Text Mining Overview
   7 Statistics    9:00 AM     50    Cox, David           Mixed Models
   8 Tutorials     9:00 AM     50    Mitchell, Jeremy     Writing Functions
   9 Applications 10:30 AM     50    Robinson, Catherine Clinical Trials
  10 Data Mining  10:30 AM     50    Wilson, Joshua       Retail Industry
  11 Statistics   10:30 AM     50    Anderson, James      TBD
  12 Tutorials    10:30 AM     50    Nelson, Stephen      Reporting
  13 Applications  1:30 PM     50    Moore, Patrick       Energy Industry
  14 Data Mining   1:30 PM     50    Harris, Michael      Credit Risk
  15 Statistics    1:30 PM     50    Brown, Brandon       Cluster Analysis
  16 Tutorials     1:30 PM     50    White, Elizabeth     External Files
  17 Applications  3:00 PM     50    TBD                  TBD
  18 Data Mining   3:00 PM     50    Johnson, Joshua      Fraud Detection
  19 Statistics    3:00 PM     50    Cox, Mary            Predictive Modeling
  20 Tutorials     3:00 PM     50    Torres, Tara         Efficient Programming
  21 Applications  4:00 PM     50    Sanders, Joyce       Healthcare
  22 Data Mining   4:00 PM     50    White, Kimberly      Decision Making
  23 Statistics    4:00 PM     50    Richardson, Lisa     Bayesian Basics
  24 Tutorials     4:00 PM     50    Morris, Nicole       Customized Output

Data set CONFERENCECHANGES contains the updates to apply to DATACONFERENCE. Observations 4, 5, and 6 are not in DATACONFERENCE. Observations 3 and 7 have the same values for SECTION and STARTTIME.

                               CONFERENCECHANGES
 Obs  section       starttime  talktime  speaker           topic
  1   Applications     9:00        .                       Interfaces
  2   Statistics      10:30        .                       Statistical Graphics
  3   Applications    15:00        .     Brown, Cynthia
  4   Tutorials       17:00       30     Richardson, Lisa  Basic Statistics
  5   Break           10:15       15     (none)             (none)
  6   Break           14:45       15     (none)             (none)
  7   Applications    15:00        .                       Quality Control
  8   Data Mining     16:00        .     REMOVE

Resulting Data Set

Output 7.2 DATACONFERENCE Data Set

              Example 7.2 DATACONFERENCE Data Set Modified with DATA Step

  Obs section      starttime talktime speaker             topic

   1 Applications  8:00 AM     50    Brooks, Jason        Customer Service
   2 Data Mining   8:00 AM     50    Jones, Janet         Direct Marketing
   3 Statistics    8:00 AM     50    Thomas, Linda        New Analytic Features
   4 Tutorials     8:00 AM     50    Evans, Matthew       Graphics
   5 Applications  9:00 AM     50    Washington, Debra    Interfaces
   6 Data Mining   9:00 AM     50    Edwards, Andrew      Text Mining Overview
   7 Statistics    9:00 AM     50    Cox, David           Mixed Models
   8 Tutorials     9:00 AM     50    Mitchell, Jeremy     Writing Functions
   9 Applications 10:30 AM     50    Robinson, Catherine  Clinical Trials
  10 Data Mining  10:30 AM     50    Wilson, Joshua       Retail Industry
  11 Statistics   10:30 AM     50    Anderson, James      Statistical Graphics
  12 Tutorials    10:30 AM     50    Nelson, Stephen      Reporting
  13 Applications  1:30 PM     50    Moore, Patrick       Energy Industry
  14 Data Mining   1:30 PM     50    Harris, Michael      Credit Risk
  15 Statistics    1:30 PM     50    Brown, Brandon       Cluster Analysis
  16 Tutorials     1:30 PM     50    White, Elizabeth     External Files
  17 Applications  3:00 PM     50    Brown, Cynthia       Quality Control
  18 Data Mining   3:00 PM     50    Johnson, Joshua      Fraud Detection
  19 Statistics    3:00 PM     50    Cox, Mary            Predictive Modeling
  20 Tutorials     3:00 PM     50    Torres, Tara         Efficient Programming
  21 Applications  4:00 PM     50    Sanders, Joyce       Healthcare
  23 Statistics    4:00 PM     50    Richardson, Lisa     Bayesian Basics
  24 Tutorials     4:00 PM     50    Morris, Nicole       Customized Output
  25 Tutorials     5:00 PM     30    Richardson, Lisa     Basic Statistics
  26 Break        10:15 AM     15    (none)               (none)
  27 Break         2:45 PM     15    (none)               (none)


Example Overview

This example shows how to update a master data set in place with transactions that are matched to the master data set by the values of variables in common. The transactions in this example cause values in the master data set to be updated. They also add and delete observations.

Because it is not necessary to change the data set by adding variables, deleting variables, or modifying the descriptor portion of the data set, the data set can be modified in place and the MODIFY statement can be used. A DATA step that uses the MODIFY statement does not make a copy of the data set as it would if it used the SET statement or the UPDATE statement.

A BY statement matches the two data sets by the values of variables SECTION and STARTTIME. When you use a MODIFY statement, unlike when you match-merge with the SET statement, you do not have to sort or index your master and transaction data sets by the BY variables. However, sorting or indexing your data sets prior to the step does make the step process more efficiently.

The reason you do not have to sort or index your data sets is because the MODIFY statement completes the modifications by using dynamic WHERE processing. The only observations it processes in the master data set are the ones that match to an observation in the transaction data set. Every observation in the transaction data set is processed one at a time, and the MODIFY statement issues a dynamic WHERE statement on the master data set for the BY variable values in the observation in the transaction data set that is currently being processed.

Data set DATACONFERENCE stores the schedule for one day of a conference. The transactions in CONFERENCECHANGES that are identified in the following list by observation number result in these updates of data set DATACONFERENCE:

  • overlay of values in DATACONFERENCE with values from CONFERENCECHANGES for matched observations (observations 1, 2, 3, and 7) where the value in CONFERENCECHANGES is not missing.

  • addition of new observations (observations 4, 5, and 6 in CONFERENCECHANGES) to DATACONFERENCE.

  • removal of one observation (observation 8) from DATACONFERENCE. The special variable value of "REMOVE" for SPEAKER indicates in the DATA step code when an observation should be removed.

Note that the PROC PRINT report in Output 7.2 shows the three new observations appended to the end of DATACONFERENCE in the order in which they were read from CONFERENCECHANGES. The modified version of DATACONFERENCE after executing the DATA step has 26 observations. The observation for the "Data Mining" talk at 4:00 PM is marked for deletion, but not physically removed from the data set. Therefore, SAS does not list observation 22 in Output 7.2.

Data set CONFERENCECHANGES has two observations with the same BY values for SECTION and STARTTIME. The DATA step applies the transactions from both observations one after the other to the matching observation in the maser data set.

The DATA step processes each observation in CONFERENCECHANGES and attempts to link it to an observation in DATACONFERENCE. A numeric return code value is assigned to automatic variable _IORC_ to indicate the success of the match. This variable is created automatically when you use the MODIFY statement.

The values of the _IORC_ automatic variable are internal and subject to change. Instead of hardcoding the values in your SAS statements, you use the autocall macro program %SYSRC that SAS has supplied to enable you to test the values of _IORC_ while protecting your code from future changes in _IORC_ values. For more information about _IORC_ and %SYSRC, see "Checking for Errors When Using the MODIFY Statement or the SET Statement with the KEY= Option" in Example 3.11 in the "A Closer Look" section.

Program

Specify the name of the data set that the DATA step will modify. Modify data set DATACONFERENCE by applying the transactions in CONFERENCECHANGES. Match the two data sets by the values of SECTION and STARTTIME. Test the values of automatic variable _IORC_ that indicate the status of the search in the master data set for the current value of BY variables supplied by the transaction data set. Test if _IORC_ indicates that the search was successful. Replace data values in the DATACONFERENCE observation that is matched to an observation in CONFERENCECHANGES. Replace a value only if the value from CONFERENCECHANGES is not missing. Remove observations from DATACONFERENCE when SPEAKER= "REMOVE".

Test if _IORC_ indicates that the search did not find a match. Prevent writing an error message to the SAS log when no match is found. Add this new observation to the end of the master data set.

Stop the program for all other values of _IORC_, which indicate an unexpected error. Write messages to the SAS log about the error.

data dataconference;

  modify dataconference conferencechanges;


  by section starttime;

  select (_iorc_);



    when (%sysrc(_sok)) do;

      if speaker ne 'REMOVE' then replace;
      else remove;



    end;
    when (%sysrc(_dsenmr)) do;

      _error_=0;

      output;

    end;

    otherwise do;
      putlog 'ERROR: Program stopped at record ' _n_ '
                            of POLICY_UPDATES';
      stop;
    end;
  end;
run;

A Closer Look

Controlling When the MODIFY Statement Outputs an Observation

As a DATA step with a MODIFY statement processes a data set, each observation's variable values are moved into the Program Data Vector (PDV). By default, when the DATA step finishes processing the observation, it writes the observation back to the same physical position in the data set from where it was retrieved.

Using the REPLACE, OUTPUT, and REMOVE statements in conjunction with the MODIFY statement can control how the MODIFY statement handles the observations it is processing. The default action of a DATA step that uses the MODIFY statement and does not include at least one of these three statements is the same as if the REPLACE statement was placed as the last statement executed by the DATA step.

If you use any one of these three statements in a DATA step, you must explicitly program each action that you want to take.

  • The REPLACE statement rewrites the current observation in the data set to the same location from where it was retrieved. Any values you change during processing of the observation replace existing values in the data set. You would use this statement when you want to modify an observation. The REPLACE statement executes at the point it is processed in the DATA step.

  • The OUTPUT statement writes the current observation to the end of the data set. You would use this statement when you want to add an observation to the data set you are modifying. The OUTPUT statement executes at the point it is processed in the DATA step.

  • The REMOVE statement deletes the current observation in the data set. This might be either a physical or logical deletion. Output 7.2 shows how PROC PRINT lists a data set where a logical deletion has been made. Observation 22 is missing from the report.

If you modify variable values in an observation and use OUTPUT instead of REPLACE, you will have two occurrences of the observation in the data set. The original observation will remain as it was when read in. The modified version of the observation will be added to the end of the data set because of the OUTPUT statement.

Handling Missing Values When Using the MODIFY and BY Statements

The transaction data set in this example has several missing values, but these missing values do not replace nonmissing values in the master data set. The default action when using the MODIFY and BY statements is to not replace values in the master data set with missing values from the transaction data set. If the missing values are special missing values, then the MODIFY and BY statements do replace the existing values in the master data set with regular missing values.

Similar to the UPDATE statement that was used in Chapter 6, the MODIFY statement has the UPDATEMODE= option. This option can have one of two values: MISSINGCHECK or NOMSSINGCHECK. The default value for UPDATEMODE= is MISSINGCHECK. Because the default action is what this example requires, it was not necessary to add option UPDATEMODE= to the MODIFY statement.

If you want to allow missing values in the transaction data set to replace existing values in the master data set, add UPDATEMODE=NOMISSINGCHECK to the MODIFY statement.

For more information about the UPDATEMODE= option, see "Allowing Missing Values to Replace Existing Values in the Master Data Set When Updating a Master Data Set" in Example 6.1 in the "A Closer Look" section. This topic is discussed in the context of the UPDATE statement, but the uses and results are similar to that of the MODIFY statement.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
18.223.106.33