The number of observations in the new data set is the sum of the number of observations
in the master data set and the number of unmatched observations in the transaction data
set.
For complete information about the UPDATE and the MODIFY statements, see SAS
Statements: Reference.
Syntax of the UPDATE Statement
Use this form of the UPDATE statement to update a master data set:
UPDATE master-data-set transaction-data-set;
BY variable-list;
where
master-data-set
names the SAS data set that is used as the master file.
transaction-data-set
names the SAS data set that contains the changes to be applied to the master data set.
variable-list
specifies the variables by which observations are matched.
If the transaction data set contains duplicate values of the BY variable, SAS applies both
transactions to the observation. The last values that are copied into the program data
vector are written to the new data set. If your data is in this form, use the MODIFY
statement instead of the UPDATE statement to process your data.
CAUTION:
Values of the BY variable must be unique for each observation in the master
data set. If the master data set contains two observations with the same value of the
BY variable, the first observation is updated and the second observation is ignored.
SAS writes a warning message to the log when the DATA step executes.
For complete information about the UPDATE statement, see SAS Statements: Reference.
Syntax of the MODIFY Statement
This form of the MODIFY statement is used in the examples that follow:
MODIFY master-data–set;
BY variable-list;
where
master-data–set
specifies the SAS data set that you want to modify.
variable-list
names each variable by which the data set is ordered.
Note: The MODIFY statement does not support changing the descriptor portion of a
SAS data set, such as adding a variable.
For complete information, see MODIFY Statement in the SAS Statements: Reference.
DATA Step Processing with the UPDATE Statement
Compilation phase
SAS reads the descriptor information of each data set that is named in the
UPDATE statement and creates a program data vector that contains all the
variables from all data sets as well as variables created by the DATA step.
498 Chapter 21 Reading, Combining, and Modifying SAS Data Sets
SAS creates the FIRST.variable and LAST.variable for each variable that is
listed in the BY statement.
Execution – Step 1
SAS looks at the first observation in each data set that is named in the UPDATE
statement to determine which BY group should appear first. If the transaction BY
value precedes the master BY value, SAS reads from the transaction data set only
and sets the variables from the master data set to missing. If the master BY value
precedes the transaction BY value, SAS reads from the master data set only and sets
the unique variables from the transaction data set to missing. If the BY values in the
master and transaction data sets are equal, it applies the first transaction by copying
the nonmissing values into the program data vector.
Execution – Step 2
After completing the first transaction, SAS looks at the next observation in the
transaction data set. If SAS finds one with the same BY value, it applies that
transaction too. The first observation then contains the new values from both
transactions. If no other transactions exist for that observation, SAS writes the
observation to the new data set and sets the values in the program data vector to
missing. SAS repeats these steps until it has read all observations from all BY groups
in both data sets.
Updating with Nonmatched Observations, Missing Values, and New
Variables
In the UPDATE statement, if an observation in the master data set does not have a
corresponding observation in the transaction data set, SAS writes the observation to the
new data set without modifying it. Any observation from the transaction data set that
does not correspond to an observation in the master data set is written to the program
data vector and becomes the basis for an observation in the new data set. The data in the
program data vector can be modified by other transactions before it is written to the new
data set. If a master data set observation does not need updating, the corresponding
observation can be omitted from the transaction data set.
SAS does not replace existing values in the master data set with missing values if those
values are coded as periods (for numeric variables) or blanks (for character variables) in
the transaction data set. To replace existing values with missing values, you must either
create a transaction data set in which missing values are coded with the special missing
value characters, or use the UPDATEMODE=NOMISSINGCHECK statement option.
With UPDATE, the transaction data set can contain new variables to be added to all
observations in the master data set.
To view a sample program, see “Example 3: Using UPDATE for Processing
Nonmatched Observations, Missing Values, and New Variables” on page 503.
Sort Requirements for the UPDATE Statement
If you do not use an index, both the master data set and the transaction data set must be
sorted by the same variable or variables that you specify in the BY statement that
accompanies the UPDATE statement. The values of the BY variable should be unique
for each observation in the master data set. If you use more than one BY variable, the
combination of values of all BY variables should be unique for each observation in the
master data set. The BY variable or variables should be ones that you never need to
update.
Note: The MODIFY statement does not require sorted files. However, sorting the data
improves efficiency.
Combining SAS Data Sets: Methods 499
Using an Index with the MODIFY Statement
The MODIFY statement maintains the index. You do not have to rebuild the index like
you do for the UPDATE statement.
Choosing between UPDATE or MODIFY with BY
Using the UPDATE statement is comparable to using MODIFY with BY to apply
transactions to a data set. MODIFY is a more powerful tool with several other
applications, but UPDATE is still the tool of choice in some cases. The following table
helps you choose whether to use UPDATE or MODIFY with BY.
Table 21.3 MODIFY with BY versus UPDATE
Issue MODIFY with BY UPDATE
Disk space saves disk space because it updates data in place requires more disk space because it
produces an updated copy of the data
set
Sort and index sorted input data sets are not required, although
for good performance, it is strongly
recommended that both data sets be sorted and
that the master data set be indexed
requires only that both data sets be
sorted
When to use use only when you expect to process a SMALL
portion of the data set
use if you expect to need to process
most of the data set
Where to specify the
modified data set
specify the updated data set in both the DATA
and the MODIFY statements
specify the updated data set in the
DATA and the UPDATE statements
Duplicate BY-values allows duplicate BY-values in both the master
and the transaction data sets
allows duplicate BY-values in the
transaction data set only (If duplicates
exist in the master data set, SAS issues
a warning.)
Scope of changes cannot change the data set descriptor
information, so changes such as adding or
deleting variables, variable labels, and so on, are
not valid
can make changes that require a change
in the descriptor portion of a data set,
such as adding new variables, and so on
Error checking has error-checking capabilities using the
_IORC_ automatic variable and the SYSRC
autocall macro
needs no error checking because
transactions without a corresponding
master record are not applied but are
added to the data set
Data set integrity data might be only partially updated due to an
abnormal task termination
no data loss occurs because UPDATE
works on a copy of the data
For more information about tools for combining SAS data sets, see Table 21.2 on page
474.
Primary Uses of the MODIFY Statement
The MODIFY statement has three primary uses:
modifying observations in a single SAS data set
500 Chapter 21 Reading, Combining, and Modifying SAS Data Sets
modifying observations in a single SAS data set directly, either by observation
number or by values in an index
modifying observations in a master data set, based on values in a transaction data set.
MODIFY with BY is similar to using the UPDATE statement
Several of the examples that follow demonstrate these uses.
Example 1: Using UPDATE for Basic Updating
In this example, the data set Master contains original values of the variables Animal and
Plant. The data set NEWPlant is a transaction data set with new values of the variable
Plant. The following shows the Master and the NEWPlant input data sets:
Master NEWPlant
OBS Common Animal Plant OBS Common Plant
1 a Ant Apple 1 a Apricot
2 b Bird Banana 2 b Barley
3 c Cat Coconut 3 c Cactus
4 d Dog Dewberry 4 d Date
5 e Eagle Eggplant 5 e Escarole
6 f Frog Fig 6 f Fennel
The following program updates Master with the transactions in the data set NEWPlant,
writes the results to UPDATE_FILE, and prints the results:
data update_file;
update master newplant;
by common;
run;
proc print data=update_file;
title 'Data Set Update_File';
run;
Output 21.14 Master Data Set Updated by Transaction Data Set
Each observation in the new data set contains a new value for the variable Plant.
Combining SAS Data Sets: Methods 501
Example 2: Using UPDATE with Duplicate Values of the BY Variable
If the master data set contains two observations with the same value of the BY variable,
the first observation is updated and the second observation is ignored. SAS writes a
warning message to the log. If the transaction data set contains duplicate values of the
BY variable, SAS applies both transactions to the observation. The last values copied
into the program data vector are written to the new data set. The following shows the
Master1 and the DupPlant input data sets.
Master1 DupPlant
OBS Common Animal1 Plant1 OBS Common Plant1
1 a Ant Apple 1 a Apricot
2 b Bird Banana 2 b Barley
3 b Bird Banana 3 c Cactus
4 c Cat Coconut 4 d Date
5 d Dog Dewberry 5 d Dill
6 e Eagle Eggplant 6 e Escarole
7 f Frog Fig 7 f Fennel
The following program applies the transactions in DupPlant to Master1 and prints the
results:
data update1;
update master1 dupplant;
by Common;
run;
proc print data=update1;
title 'Data Set Update1';
run;
Output 21.15 Updating Data Sets with Duplicate BY Values
When this DATA step executes, SAS generates a warning message stating that there is
more than one observation for a BY group. However, the DATA step continues to
process, and the data set Update1 is created.
502 Chapter 21 Reading, Combining, and Modifying SAS Data Sets
..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
3.143.5.201