Modifying Observations Using a Transaction Data Set

Overview

You can use a MODIFY statement to update all observations in a data set, but there are times when you want to update only selected observations. You can modify a master SAS data set with values in a transaction data set by using the MODIFY statement with a BY statement to apply updates by matching observations.
General form, MODIFY statement with a BY statement:
DATA SAS-data-set;
MODIFY SAS-data-set transaction-data-set;
BY key-variable;
RUN;
Here is an explanation of the syntax:
SAS-data-set
is the name of the SAS data set that you want to modify (also called the master data set).
transaction-data-set
is the name of the SAS data set with updated values.
key-variable
is the name of the variable whose values are matched in the master and transaction data sets.
Note: In the MODIFY statement, you must list the master data set followed by the transaction data set.
The BY statement matches observations from the transaction data set with observations in the master data set. When the MODIFY statement reads an observation from the transaction data set, it uses dynamic WHERE processing (SAS internally generates a WHERE statement) to locate the matching observation in the master data set. The matching observation in the master data set can be replaced, deleted, or appended. By default, the observation is replaced.
Note: Because the MODIFY statement uses WHERE processing to locate matching observations, neither data set requires sorting. However, having the master data set sorted or indexed and the transaction data set sorted reduces processing overhead, especially for large files.

Example

Suppose you have a master data set, Capacity, which has route numbers for an airline. Some of the route numbers have changed, and the changes are stored in a transaction data set, Newrtnum. The master data set is updated by matching values of the variable FlightID.
proc print data=capacity(obs=5);
run;

data capacity;
   modify capacity sasuser.newrtnum;
   by flightid;
run;

proc print data=capacity(obs=5);
run;
The following PROC PRINT output displays the first five rows of the data set Capacity before updates were applied.
output of Capacity
For each matching observation, the values for RouteID are updated.
output for capacity

Handling Duplicate Values

When you use the MODIFY and BY statements to update a data set, WHERE processing starts at the top of the master data set, finds the first match, and updates it. Consider what happens if there are duplicate values in the master or transaction data sets. Suppose you have the following code to make updates to the master data set M using the transaction data set T:
data m;
    modify m t;
    by a;
run;
If duplicate values of the BY variable exist in the master data set, only the first observation in the group of duplicate values is updated.
master data set
If duplicate values of the BY variable exist in the etransaction data set, the transactions overwrite each other so that only the last transaction in the group is the result in the master data set.
transaction data set
Alternatively, you can write code to accumulate the numeric value of each transaction.

Handling Missing Values

If there are missing values in the transaction data set, SAS does not replace the data in the master data set with missing values unless they are special missing values.
Note: A special missing value is a type of numeric missing value that enables you to represent different categories of missing data by using the letters A-Z or an underscore. You designate special missing values using the MISSING statement in the DATA step. For more information, see the SAS documentation.
You can specify how missing values in the transaction data set are handled by using the UPDATEMODE= option in the MODIFY statement.
General form, MODIFY statement with the UPDATEMODE= option:
MODIFY master-data-set transaction-data-set
UPDATEMODE=MISSINGCHECK | NOMISSINGCHECK;
Here is an explanation of the syntax:
master-data-set
is the name of the SAS data set that you want to modify.
transaction-data-set
is the name of the SAS data set in which the updated values are stored.
MISSINGCHECK
prevents missing values in the transaction data set from replacing values in the master data set unless they are special missing values. MISSINGCHECK is the default.
NOMISSINGCHECK
allows missing values in the transaction data set to replace the values in the master data set. Special missing values in the transaction data set still replace values in the master data set.
..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
18.219.71.21