• SAS creates the FIRST.variable and LAST.variable for each variable that is
listed in the BY statement.
Execution – Step 1
SAS looks at the first observation in each data set that is named in the UPDATE
statement to determine which BY group should appear first. If the transaction BY
value precedes the master BY value, SAS reads from the transaction data set only
and sets the variables from the master data set to missing. If the master BY value
precedes the transaction BY value, SAS reads from the master data set only and sets
the unique variables from the transaction data set to missing. If the BY values in the
master and transaction data sets are equal, SAS applies the first transaction by copying
its nonmissing values into the program data vector.
Execution – Step 2
After completing the first transaction, SAS looks at the next observation in the
transaction data set. If SAS finds one with the same BY value, it applies that
transaction too. The observation in the program data vector then contains the new values
from both transactions. If no other transactions exist for that observation, SAS writes the
observation to the new data set and sets the values in the program data vector to
missing. SAS repeats these steps until it has read all observations from all BY groups
in both data sets.
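The two execution steps can be traced with a small sketch. The data set names (master, trans, updated), the variables, and the values are hypothetical and chosen only for illustration. Both data sets are already in order by id. The transaction data set holds two transactions for id 2 and one for id 3, which has no match in the master data set.

```sas
/* Hypothetical master data set: one observation per BY value (id). */
data master;
   input id name $ weight;
   datalines;
1 Ann 120
2 Bob 150
4 Dee 135
;

/* Hypothetical transactions: two for id=2, one for id=3.      */
/* A period is read as a missing value for either data type.   */
data trans;
   input id name $ weight;
   datalines;
2 . 155
2 Rob .
3 Cal 140
;

/* Apply the transactions to the master, matching on id. */
data updated;
   update master trans;
   by id;
run;

proc print data=updated noobs;
run;
```

Under these assumptions, the observation for id 1 is written unchanged, the observation for id 2 receives weight 155 from the first transaction and name Rob from the second, id 3 is built from its transaction alone, and id 4 is written unchanged.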
Updating with Nonmatched Observations, Missing Values, and New Variables
In the UPDATE statement, if an observation in the master data set does not have a
corresponding observation in the transaction data set, SAS writes the observation to the
new data set without modifying it. Any observation from the transaction data set that
does not correspond to an observation in the master data set is written to the program
data vector and becomes the basis for an observation in the new data set. The data in the
program data vector can be modified by other transactions before it is written to the new
data set. If a master data set observation does not need updating, the corresponding
observation can be omitted from the transaction data set.
By default, SAS does not replace existing values in the master data set with missing
values that are coded as periods (for numeric variables) or blanks (for character
variables) in the transaction data set. To replace existing values with missing values,
you must either create a transaction data set in which missing values are coded with
special missing value characters, or use the UPDATEMODE=NOMISSINGCHECK option in the
UPDATE statement.
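The following sketch contrasts the default behavior with the two ways of forcing a missing value into the master data set. All data set and variable names are hypothetical. The third step assumes that the special missing value underscore (written ._ in SAS code) produces an ordinary missing value in the result; verify this against your SAS release.

```sas
/* Hypothetical master and transaction data sets, both in order by id. */
data master;
   input id weight height;
   datalines;
1 120 60
2 150 70
;

data trans;
   input id weight height;
   datalines;
1 . 61
2 . 71
;

/* Default behavior: the missing weight values in trans are ignored, */
/* so only height changes.                                           */
data keep_values;
   update master trans;
   by id;
run;

/* UPDATEMODE=NOMISSINGCHECK: missing transaction values replace     */
/* the existing master values, so weight becomes missing.            */
data overwrite_values;
   update master trans updatemode=nomissingcheck;
   by id;
run;

/* Special missing value: ._ in a transaction is assumed to set the  */
/* master value to an ordinary missing value, even by default.       */
data trans_special;
   id = 1;
   weight = ._;
   height = 61;
run;

data special_values;
   update master trans_special;
   by id;
run;
```

The UPDATEMODE= option affects every variable in every transaction, whereas a special missing value targets individual values, which can be the safer choice when only a few values must be blanked out.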
With UPDATE, the transaction data set can contain new variables to be added to all
observations in the master data set.
To view a sample program, see “Example 3: Using UPDATE for Processing
Nonmatched Observations, Missing Values, and New Variables” on page 503.
Sort Requirements for the UPDATE Statement
If you do not use an index, both the master data set and the transaction data set must be
sorted by the same variable or variables that you specify in the BY statement that
accompanies the UPDATE statement. The values of the BY variable should be unique
for each observation in the master data set. If you use more than one BY variable, the
combination of values of all BY variables should be unique for each observation in the
master data set. The BY variable or variables should be ones that you never need to
update.
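As a minimal sketch of these requirements, the steps below sort two hypothetical data sets (master and trans) by the BY variable, check the master data set for duplicate BY values, and then run the update. The names and values are assumptions made for illustration.

```sas
/* Hypothetical unsorted inputs. */
data master;
   input id weight;
   datalines;
3 135
1 120
2 150
;

data trans;
   input id weight;
   datalines;
2 155
1 125
;

/* Sort both data sets by the variable named in the BY statement. */
proc sort data=master;
   by id;
run;

proc sort data=trans;
   by id;
run;

/* Report any BY value that occurs more than once in the master. */
data _null_;
   set master;
   by id;
   if not (first.id and last.id) then
      put 'WARNING: duplicate BY value in master: ' id=;
run;

/* With both inputs sorted, the update can proceed. */
data updated;
   update master trans;
   by id;
run;
```

The check only writes a message to the log. It does not remove duplicates, so deciding which observation to keep remains a separate step.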
Note: The MODIFY statement does not require sorted files. However, sorting the data
improves efficiency.