Using a Transaction Data Set

Overview

Sometimes, rather than just combining data from two data sets, you might want to update the data in one data set with data that is stored in another data set. That is, you might want to update a master data set by overwriting certain values with values that are stored in a transaction data set.
For example, suppose the data set Mylib.Empmaster contains data that is outdated. The current data is stored in another data set named Mylib.Empchanges. Mylib.Empmaster contains 148 observations, and Mylib.Empchanges contains six observations. The variable EmpID contains unique values in both data sets.
A partial listing of Mylib.Empmaster and the full listing of Mylib.Empchanges is shown below. Notice that there is one observation in each data set with a value of 1065 for EmpID. The values of JobCode and Salary are different in these observations.
Table 15.9 Mylib.Empmaster (Partial Listing)
DateOfBirth
DateOfHire
EmpID
Gender
JobCode
Salary
05MAR1957
30MAR1990
1009
M
TA1
$40,432
01JAN1956
20OCT1979
1017
M
TA3
$57,201
23MAY1963
27OCT1982
1036
F
TA3
$55,149
14APR1962
17SEP1990
1037
F
TA1
$39,98
13NOV1967
26NOV1989
1038
F
TA1
$37,146
17JUL1961
27AUG1984
1050
M
ME2
$49,234
29JAN1942
10JAN1985
1065
M
ME2
$49,126
18OCT1970
06OCT1989
1076
M
PT1
$93,181
Table 15.10 Mylib.Empchanges
DateOfBirth
DateOfHire
EmpID
Gender
JobCode
Salary
30JUN1955
31JAN1982
1639
F
TA3
$59,164
29JAN1942
10JAN1985
1065
M
ME3
$53,326
03DEC1961
10Oct1985
1561
M
TA3
$51,120
25SEP1965
07OCT1989
1221
F
FA3
$41,854
11AUG1970
01NOV2000
1447
F
FA1
$30,340
13SEP1968
05NOV2000
1998
M
SCP
$32,240
If you could see the full listing of Mylib.Empmaster, you would see that each of the observations in Mylib.Empchanges has a matching observation in Mylib.Empmaster based on the values of EmpID. There are also many observations in Mylib.Empmaster that do not have a matching observation in Mylib.Empchanges. To update Mylib.Empmaster, you want to find all of the matching observations and change their values for JobCode and Salary to the new values from Mylib.Empchanges. You can use the UPDATE statement to make these changes.

Using the UPDATE Statement

You use the UPDATE statement to update a master data set with a transaction data set. The UPDATE statement can perform the following tasks:
  • change the values of variables in the master data set
  • add observations to the master data set
  • add variables to the master data set
General form, UPDATE statement:
DATA master-data-set;
UPDATE master-data-set transaction-data-set;
BY by-variable(s);
RUN;
Here is an explanation of the syntax:
master-data-set
names the SAS data set used as the master file.
transaction-data-set
names the SAS data set that contains the changes to be applied to the master data set.
by-variable(s)
names a variable that appears in both master-data-set and in transaction-data-set. Each observation in master-data-set must have a unique value for by-variable, but transaction-data-set can contain more than one observation with the same by-variable value.
The UPDATE statement replaces values in the master data set with values from the transaction data set for each observation with a matching value of the BY variable. Any observations in either the master data set or the transactional data set that have nonmatching values for the BY variable are included in the output data set. Also, by default, SAS does not replace existing values in the master data set with missing values if those values are coded as periods (for numeric variables) or blanks (for character variables) in the transaction data set.
When you use the UPDATE statement, keep in mind the following restrictions.
  • Only two data set names can appear in the UPDATE statement.
  • The master data set must be listed first.
  • A BY statement that gives the matching variable must be used.
  • Both data sets must be sorted by or have indexes based on the BY variable.
  • In the master data set, each observation must have a unique value for the BY variable.

Example

Remember that you want to update the master data set Mylib.Empmaster with the transactional data set Mylib.Empchanges. You can use the UPDATE statement to accomplish this task, as shown in the program below. Remember, both data sets must be sorted by or indexed on the BY variable.
proc sort data=mylib.empmaster;
   by empid;
run;

proc sort data=mylib.empchanges;
   by empid;
run;

data mylib.empmaster;
   update mylib.empmaster mylib.empchanges;
   by empid;
run;
The first eight observations of the updated Mylib.Empmaster data set are shown below. Notice that the observation that has a value of 1065 for EmpID now contains the updated values for JobCode and Salary.
proc print data=mylib.empmaster (obs=8) noobs;
run;
Using a Transactional Data Set
..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
18.222.110.183