with a BY statement. Before you can perform a match-merge, every data set must either be
sorted by the variables that you specify in the BY statement or have an index on those variables.
Syntax
Use this form of the MERGE statement to match-merge data sets:
MERGE data-set(s);
BY variable(s);
where
data-set
   names at least two existing SAS data sets from which observations are read.
variable
   names each variable by which the data sets are sorted or indexed. These variables are
   referred to as BY variables.
For a complete description of the MERGE and the BY statements, see SAS Statements:
Reference.
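As a minimal sketch of this syntax, the following program sorts two data sets and then match-merges them. The data set names (work.customers, work.orders), the variable names, and the values are hypothetical and are used only for illustration.

```sas
/* Hypothetical input data sets; names and values are for illustration only. */
data work.orders;
   input CustomerID Amount;
   datalines;
102 25
101 40
103 15
;
run;

data work.customers;
   input CustomerID Name $;
   datalines;
103 Chen
101 Avery
102 Blake
;
run;

/* Sort each data set by the BY variable before the match-merge. */
proc sort data=work.orders;
   by CustomerID;
run;

proc sort data=work.customers;
   by CustomerID;
run;

/* Match-merge: observations with the same CustomerID value are combined. */
data work.combined;
   merge work.customers work.orders;
   by CustomerID;
run;

proc print data=work.combined;
run;
```

If the PROC SORT steps are omitted here, the DATA step stops with an error because the input observations are not in order of the BY variable.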
DATA Step Processing during Match-Merging
Compilation phase
SAS reads the descriptor information of each data set that is named in the MERGE
statement and then creates a program data vector that contains all the variables from
all data sets as well as variables created by the DATA step. SAS also creates the
temporary variables FIRST.variable and LAST.variable for each variable that is listed
in the BY statement. These temporary variables are not written to the new data set.
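The following sketch illustrates the FIRST.variable and LAST.variable values that are available during a match-merge. Because these variables are temporary, the example copies them into ordinary variables so that they appear in the output. The data sets work.a and work.b, and the variables IsFirst and IsLast, are hypothetical.

```sas
/* Hypothetical data sets, already in order of the BY variable ID. */
data work.a;
   input ID X;
   datalines;
1 10
1 11
2 20
;
run;

data work.b;
   input ID Y;
   datalines;
1 100
2 200
2 201
;
run;

data work.flags;
   merge work.a work.b;
   by ID;
   /* FIRST.ID and LAST.ID are not written to the output data set, */
   /* so copy their values into ordinary variables to inspect them. */
   IsFirst = first.ID;
   IsLast  = last.ID;
run;

proc print data=work.flags;
run;
```

In the output, IsFirst is 1 on the first observation of each ID group and IsLast is 1 on the last observation of each group.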
Execution – Step 1
SAS looks at the first BY group in each data set that is named in the MERGE
statement to determine which BY group should appear first in the new data set. The
DATA step reads into the program data vector the first observation in that BY group
from each data set, reading the data sets in the order in which they appear in the
MERGE statement. If a data set does not have observations in that BY group, the
program data vector contains missing values for the variables unique to that data set.
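A short sketch can make this step concrete. In the following program, the BY group ID=1 exists only in the first data set and ID=3 exists only in the second. The IN= data set option is used here to record which data set contributed to each observation. The data set and variable names are hypothetical.

```sas
/* Hypothetical data sets, already in order of the BY variable ID. */
data work.left;
   input ID A $;
   datalines;
1 a1
2 a2
;
run;

data work.right;
   input ID B $;
   datalines;
2 b2
3 b3
;
run;

data work.merged;
   /* IN= creates a temporary 0/1 flag for each input data set. */
   merge work.left(in=inLeft) work.right(in=inRight);
   by ID;
   /* Copy the temporary flags so that they appear in the output. */
   FromLeft  = inLeft;
   FromRight = inRight;
run;

proc print data=work.merged;
run;
```

The observation for ID=1 has a missing value for B, and the observation for ID=3 has a missing value for A, because the other data set has no observations in that BY group.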
Execution – Step 2
After processing the first observation from the last data set and executing other
statements, SAS writes the contents of the program data vector to the new data set.
SAS retains the values of all variables in the program data vector except those
variables that were created by the DATA step; SAS sets those values to missing. SAS
continues to merge observations until it writes all observations from the first BY
group to the new data set. When SAS has read all observations in a BY group from
all data sets, it sets all variables in the program data vector (except those created by
SAS) to missing. SAS looks at the next BY group in each data set to determine
which BY group should appear next in the new data set.
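The retention behavior described in this step can be seen in a one-to-many merge. In the following sketch, Manager is read from a data set, so its value is retained for every observation in the BY group. Note is created by the DATA step, so it is reset to missing at the start of each iteration. All data set names, variable names, and values are hypothetical.

```sas
/* Hypothetical data sets, already in order of the BY variable Dept. */
data work.dept;
   input Dept $ Manager $;
   datalines;
A Kim
B Lee
;
run;

data work.staff;
   input Dept $ Employee $;
   datalines;
A Ana
A Ben
B Cy
;
run;

data work.roster;
   merge work.dept work.staff;
   by Dept;
   /* Note is assigned only on the first observation of each BY group. */
   /* On later observations in the group it is missing, because        */
   /* variables created by the DATA step are reset on each iteration.  */
   if first.Dept then Note = 'first';
run;

proc print data=work.roster;
run;
```

For Dept A, both output observations carry Manager=Kim, but only the first has Note='first'.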
Execution – Step 3
SAS repeats these steps until it reads all observations from all BY groups in all data
sets.
Example 1: Combining Observations Based on a Criterion
The SAS data sets Animal and Plant each contain the BY variable Common, and the
observations are arranged in order of the values of the BY variable. The following shows
the Animal and the Plant input data sets:
Animal Plant