values for the variables that are unique to that data set. In this example, the data sets
Animal2 and Plant2 have different values of the Common variable. The following shows
the Animal2 and the Plant2 input data sets:
Animal2 Plant2
OBS Common Animal OBS Common Plant
1 a Ant 1 a Apple
2 c Cat 2 b Banana
3 d Dog 3 c Coconut
4 e Eagle 4 e Eggplant
5 f Fig
The following program produces the data set MERGE2 and prints the results:
/* This program illustrates undesirable results. */
data merge2;
merge animal2 plant2;
run;
proc print data=merge2;
title 'Data Set MERGE2';
run;
Output 21.10 Undesirable Results with Different Values of Common Variables
Comments and Comparisons
The results from a one-to-one merge are similar to the results obtained from using two or
more SET statements to combine observations. However, with the one-to-one merge,
SAS continues processing all observations in all data sets that were named in the
MERGE statement.
Match-Merging
Definition
Match-merging combines observations from two or more SAS data sets into a single
observation in a new data set according to the values of a common variable. The number
of observations in the new data set is the sum of the largest number of observations in
each BY group in all data sets. To perform a match-merge, use the MERGE statement
Combining SAS Data Sets: Methods 493
with a BY statement. Before you can perform a match-merge, all data sets must be
sorted by the variables that you specify in the BY statement or they must have an index.
Syntax
Use this form of the MERGE statement to match-merge data sets:
MERGE data-set(s);
BY variable(s);
where
data-set
names at least two existing SAS data sets from which observations are read.
variable
names each variable by which the data set is sorted or indexed. These variables are
referred to as BY variables.
For a complete description of the MERGE and the BY statements, see SAS Statements:
Reference.
DATA Step Processing during Match-Merging
Compilation phase
SAS reads the descriptor information of each data set that is named in the MERGE
statement and then creates a program data vector that contains all the variables from
all data sets as well as variables created by the DATA step. SAS creates the
FIRST.variable and LAST.variable for each variable that is listed in the BY
statement.
Execution – Step 1
SAS looks at the first BY group in each data set that is named in the MERGE
statement to determine which BY group should appear first in the new data set. The
DATA step reads into the program data vector the first observation in that BY group
from each data set, reading the data sets in the order in which they appear in the
MERGE statement. If a data set does not have observations in that BY group, the
program data vector contains missing values for the variables unique to that data set.
Execution – Step 2
After processing the first observation from the last data set and executing other
statements, SAS writes the contents of the program data vector to the new data set.
SAS retains the values of all variables in the program data vector except those
variables that were created by the DATA step; SAS sets those values to missing. SAS
continues to merge observations until it writes all observations from the first BY
group to the new data set. When SAS has read all observations in a BY group from
all data sets, it sets all variables in the program data vector (except those created by
SAS) to missing. SAS looks at the next BY group in each data set to determine
which BY group should appear next in the new data set.
Execution – Step 3
SAS repeats these steps until it reads all observations from all BY groups in all data
sets.
Example 1: Combining Observations Based on a Criterion
The SAS data sets Animal and Plant each contain the BY variable Common, and the
observations are arranged in order of the values of the BY variable. The following shows
the Animal and the Plant input data sets:
Animal Plant
494 Chapter 21 Reading, Combining, and Modifying SAS Data Sets
OBS Common Animal OBS Common Plant
1 a Ant 1 a Apple
2 b Bird 2 b Banana
3 c Cat 3 c Coconut
4 d Dog 4 d Dewberry
5 e Eagle 5 e Eggplant
6 f Frog 6 f Fig
The following program merges the data sets according to the values of the BY variable
Common, and prints the results:
data combined;
merge animal plant;
by Common;
run;
proc print data=combined;
title 'Data Set Combined';
run;
Output 21.11 Data Sets Combined by Match-Merging
Each observation in the new data set contains all the variables from all the data sets.
Example 2: Match-Merge with Duplicate Values of the BY Variable
When SAS reads the last observation from a BY group in one data set, SAS retains its
values in the program data vector for all variables that are unique to that data set until all
observations for that BY group have been read from all data sets. In the following
example, the data sets Animal1 and Plant1 contain duplicate values of the BY variable
Common. The following shows the Animal1 and the Plant1 input data sets:
Animal1 Plant1
OBS Common Animal1 OBS Common Plant1
1 a Ant 1 a Apple
2 a Ape 2 b Banana
3 b Bird 3 c Coconut
4 c Cat 4 c Celery
5 d Dog 5 d Dewberry
Combining SAS Data Sets: Methods 495
6 e Eagle 6 e
Eggplant
The following program produces the merged data set MATCH1, and prints the results:
data match1;
merge animal1 plant1;
by Common;
run;
proc print data=match1;
title 'Data Set MATCH1';
run;
Output 21.12 Match-Merged Data Set with Duplicate BY Values
In observation 2 of the output, the value of the variable Plant1 is retained until all
observations in the BY group are written to the new data set. Match-merging also
produced duplicate values in Animal1 for observations 4 and 5.
Note: The MERGE statement does not produce a Cartesian product on a many-to-many
match-merge. Instead, it performs a one-to-one merge while there are observations in
the BY group in at least one data set. When all observations in the BY group have
been read from one data set and there are still more observations in another data set,
SAS performs a one-to-many merge until all observations have been read for the BY
group.
Example 3: Match-Merge with Nonmatched Observations
When SAS performs a match-merge with nonmatched observations in the input data
sets, SAS retains the values of all variables in the program data vector even if the value
is missing. The data sets Animal2 and Plant2 do not contain all values of the BY variable
Common. The following shows the Animal2 and the Plant2 input data sets:
Animal2 Plant2
OBS Common Animal2 OBS Common Plant2
1 a Ant 1 a Apple
2 c Cat 2 b Banana
3 d Dog 3 c Coconut
496 Chapter 21 Reading, Combining, and Modifying SAS Data Sets
4 e Eagle 4 e Eggplant
5 f
Fig
The following program produces the merged data set MATCH2, and prints the results:
data match2;
merge animal2 plant2;
by Common;
run;
proc print data=match2;
title 'Data Set MATCH2';
run;
Output 21.13 Match-Merged Data Set with Nonmatched Observations
As the output shows, all values of the variable Common are represented in the new data
set, including missing values for the variables that are in one data set but not in the other.
Updating with the UPDATE and the MODIFY Statements
Definitions
Updating a data set refers to the process of applying changes to a master data set. To
update data sets, you work with two input data sets. The data set containing the original
information is the master data set, and the data set containing the new information is the
transaction data set.
You can update data sets by using the UPDATE statement or the MODIFY statement:
UPDATE
uses observations from the transaction data set to change the values of corresponding
observations from the master data set. You must use a BY statement with the
UPDATE statement because all observations in the transaction data set are keyed to
observations in the master data set according to the values of the BY variable.
MODIFY
can replace, delete, and append observations in an existing data set. Using the
MODIFY statement can save disk space because it modifies data in place, without
creating a copy of the data set.
Combining SAS Data Sets: Methods 497
..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
18.117.230.81