You can use either the MODIFY or UPDATE statement to update a master data set with information in a transaction data set. Chapter 6 includes examples that use the UPDATE statement. Chapter 7 includes examples that use the MODIFY statement.
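The following minimal sketch applies the same transactions both ways. The data set names MASTER, TRANS, and MASTER_UPD and the variables ID, NAME, and SCORE are hypothetical. Both data sets are assumed to be sorted by ID, and every transaction is assumed to match a master observation.

```sas
/* Hypothetical master data set, sorted by ID */
data master;
   input id name $ score;
   datalines;
1 Ann 80
2 Bob 75
3 Cam 90
;
run;

/* Hypothetical transaction data set, sorted by ID; every ID exists in MASTER */
data trans;
   input id score;
   datalines;
2 85
3 95
;
run;

/* UPDATE: reads both data sets and writes a new, updated copy */
data master_upd;
   update master trans;
   by id;
run;

/* MODIFY with BY: rewrites the matching MASTER observations in place */
data master;
   modify master trans;
   by id;
run;
```

UPDATE writes MASTER_UPD here only so that both results can be kept side by side. Writing DATA MASTER; UPDATE MASTER TRANS; is also valid, but SAS still builds a complete new copy and replaces MASTER when the step ends.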
The MODIFY statement has many applications, while the UPDATE statement is limited to updating a master data set. You can use the MODIFY statement to perform the following tasks:
- process a file sequentially to apply updates in place (without a BY statement)
- make changes to a master data set in place by applying transactions from a transaction data set
- update the values of variables by directly accessing observations based on observation numbers
- update the values of variables by directly accessing observations based on the values of one or more key variables
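The three forms of MODIFY that have no UPDATE counterpart can be sketched as follows. The data sets MASTER and TRANS and the variables ID, SCORE, NEWSCORE, and OBSNUM are hypothetical. The KEY= step assumes a simple index on ID, which PROC DATASETS creates first.

```sas
/* Hypothetical master data set, sorted by ID */
data master;
   input id score;
   datalines;
1 80
2 75
3 90
;
run;

/* 1. Sequential access without BY: each observation is read and    */
/*    rewritten in place by the implicit REPLACE at end of iteration */
data master;
   modify master;
   score = score + 5;
run;

/* 2. Direct access by observation number with POINT= */
data master;
   obsnum = 2;                  /* hypothetical: update only observation 2 */
   modify master point=obsnum;
   score = 100;
   replace;                     /* explicit, because STOP ends the iteration */
   stop;                        /* POINT= never reaches end of file, so STOP is required */
run;

/* 3. Direct access by key value with KEY=: build a simple index on ID first */
proc datasets library=work nolist;
   modify master;
   index create id;
quit;

/* Hypothetical transactions keyed by ID; NEWSCORE is not a MASTER variable */
data trans;
   input id newscore;
   datalines;
3 60
;
run;

data master;
   set trans;                   /* each transaction supplies an ID value */
   modify master key=id;        /* fetch the MASTER observation with that ID */
   if _iorc_ = %sysrc(_sok) then do;
      score = newscore;
      replace;
   end;
   else _error_ = 0;            /* no match: skip it and clear the error flag */
run;
```

The transaction variable is named NEWSCORE rather than SCORE because a successful KEY= lookup reads the master values into the program data vector and would overwrite a transaction variable of the same name.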
Only one application of MODIFY is comparable to UPDATE: using MODIFY with the BY statement to apply transactions to a data set. While MODIFY is a more powerful tool than UPDATE, UPDATE is still the tool of choice in some cases. Table 1.4 helps you choose whether to use UPDATE or MODIFY with BY.
Issue | MODIFY with BY | UPDATE |
---|---|---|
Disk space | Saves disk space because it updates data in place. | Requires more disk space because it produces an updated copy of the data set. |
Sort and index | For good performance, it is strongly recommended that both data sets be sorted and that the master data set be indexed. | Requires that both data sets be sorted. |
When to use | Use only when you expect to process a small portion of the data set. | Use if you expect to process most of the data set. |
Duplicate BY values | Allows duplicate BY values in both the master and transaction data sets. | Allows duplicate BY values in only the transaction data set. |
Scope of changes | Cannot change the data set descriptor information, so changes such as adding or deleting variables or variable labels are not valid. | Can make changes that require a change in the descriptor portion of a data set, such as adding new variables. |
Error checking | Automatically generates the _IORC_ return code variable, whose value can be examined for error checking. | Needs no error checking because a transaction without a corresponding master record is not applied as an update; instead, it is added to the data set as a new observation. |
Data set integrity | Data can be only partially updated if the task terminates abnormally. | No data loss occurs because UPDATE works on a copy of the data. |
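To illustrate the error-checking row, the following sketch examines _IORC_ with the %SYSRC autocall macro so that MODIFY with BY adds an unmatched transaction as a new observation, much as UPDATE does. The data sets and variables are hypothetical, and the transaction data set is assumed to contain every variable in the master data set so that an added observation is complete.

```sas
/* Hypothetical master and transaction data sets, both sorted by ID */
data master;
   input id name $ score;
   datalines;
1 Ann 80
2 Bob 75
3 Cam 90
;
run;

data trans;
   input id name $ score;
   datalines;
2 Bob 85
4 Dee 70
;
run;

data master;
   modify master trans;
   by id;
   select (_iorc_);
      when (%sysrc(_sok)) replace;   /* match found: rewrite the master observation */
      when (%sysrc(_dsenmr)) do;     /* no master observation has this BY value */
         output;                     /* append the transaction as a new observation */
         _error_ = 0;                /* clear the error flag set by the failed match */
      end;
      otherwise do;
         put 'ERROR: Unexpected value for ' _iorc_=;
         stop;
      end;
   end;
run;
```

Unlike UPDATE, which writes ID 4 into its sorted position in a new copy, OUTPUT under MODIFY appends the new observation to the end of MASTER. In this example the result is the same because 4 is the largest ID.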