Combining SAS Data Sets: Methods (4/6)

Search in book...
Toggle Font Controls
Create new playlist

Name your new playlist

Playlist description (optional)
Sign In

Email address

Password

Forgot Password?

or

Continue with Facebook

Continue with Google
Sign Up

Full Name

Email address

Confirm Email Address

Password

or

Continue with Facebook

Continue with Google

values for the variables that are unique to that data set. In this example, the data sets

Animal2 and Plant2 have different values of the Common variable. The following shows

the Animal2 and the Plant2 input data sets:

Animal2 Plant2

OBS Common Animal OBS Common Plant

1 a Ant 1 a Apple

2 c Cat 2 b Banana

3 d Dog 3 c Coconut

4 e Eagle 4 e Eggplant

5 f Fig

The following program produces the data set MERGE2 and prints the results:

/* This program illustrates undesirable results. */

data merge2;

merge animal2 plant2;

run;

proc print data=merge2;

title 'Data Set MERGE2';

run;

Output 21.10 Undesirable Results with Different Values of Common Variables

Comments and Comparisons

The results from a one-to-one merge are similar to the results obtained from using two or

more SET statements to combine observations. However, with the one-to-one merge,

SAS continues processing all observations in all data sets that were named in the

MERGE statement.

Match-Merging

Definition

Match-merging combines observations from two or more SAS data sets into a single

observation in a new data set according to the values of a common variable. The number

of observations in the new data set is the sum of the largest number of observations in

each BY group in all data sets. To perform a match-merge, use the MERGE statement

Combining SAS Data Sets: Methods 493

with a BY statement. Before you can perform a match-merge, all data sets must be

sorted by the variables that you specify in the BY statement or they must have an index.

Syntax

Use this form of the MERGE statement to match-merge data sets:

MERGE data-set(s);

BY variable(s);

where

data-set

names at least two existing SAS data sets from which observations are read.

variable

names each variable by which the data set is sorted or indexed. These variables are

referred to as BY variables.

For a complete description of the MERGE and the BY statements, see SAS Statements:

Reference.

DATA Step Processing during Match-Merging

Compilation phase

SAS reads the descriptor information of each data set that is named in the MERGE

statement and then creates a program data vector that contains all the variables from

all data sets as well as variables created by the DATA step. SAS creates the

FIRST.variable and LAST.variable for each variable that is listed in the BY

statement.

Execution – Step 1

SAS looks at the first BY group in each data set that is named in the MERGE

statement to determine which BY group should appear first in the new data set. The

DATA step reads into the program data vector the first observation in that BY group

from each data set, reading the data sets in the order in which they appear in the

MERGE statement. If a data set does not have observations in that BY group, the

program data vector contains missing values for the variables unique to that data set.

Execution – Step 2

After processing the first observation from the last data set and executing other

statements, SAS writes the contents of the program data vector to the new data set.

SAS retains the values of all variables in the program data vector except those

variables that were created by the DATA step; SAS sets those values to missing. SAS

continues to merge observations until it writes all observations from the first BY

group to the new data set. When SAS has read all observations in a BY group from

all data sets, it sets all variables in the program data vector (except those created by

SAS) to missing. SAS looks at the next BY group in each data set to determine

which BY group should appear next in the new data set.

Execution – Step 3

SAS repeats these steps until it reads all observations from all BY groups in all data

sets.

Example 1: Combining Observations Based on a Criterion

The SAS data sets Animal and Plant each contain the BY variable Common, and the

observations are arranged in order of the values of the BY variable. The following shows

the Animal and the Plant input data sets:

Animal Plant

494 Chapter 21 • Reading, Combining, and Modifying SAS Data Sets

OBS Common Animal OBS Common Plant

1 a Ant 1 a Apple

2 b Bird 2 b Banana

3 c Cat 3 c Coconut

4 d Dog 4 d Dewberry

5 e Eagle 5 e Eggplant

6 f Frog 6 f Fig

The following program merges the data sets according to the values of the BY variable

Common, and prints the results:

data combined;

merge animal plant;

by Common;

run;

proc print data=combined;

title 'Data Set Combined';

run;

Output 21.11 Data Sets Combined by Match-Merging

Each observation in the new data set contains all the variables from all the data sets.

Example 2: Match-Merge with Duplicate Values of the BY Variable

When SAS reads the last observation from a BY group in one data set, SAS retains its

values in the program data vector for all variables that are unique to that data set until all

observations for that BY group have been read from all data sets. In the following

example, the data sets Animal1 and Plant1 contain duplicate values of the BY variable

Common. The following shows the Animal1 and the Plant1 input data sets:

Animal1 Plant1

OBS Common Animal1 OBS Common Plant1

1 a Ant 1 a Apple

2 a Ape 2 b Banana

3 b Bird 3 c Coconut

4 c Cat 4 c Celery

5 d Dog 5 d Dewberry

Combining SAS Data Sets: Methods 495

6 e Eagle 6 e

Eggplant

The following program produces the merged data set MATCH1, and prints the results:

data match1;

merge animal1 plant1;

by Common;

run;

proc print data=match1;

title 'Data Set MATCH1';

run;

Output 21.12 Match-Merged Data Set with Duplicate BY Values

In observation 2 of the output, the value of the variable Plant1 is retained until all

observations in the BY group are written to the new data set. Match-merging also

produced duplicate values in Animal1 for observations 4 and 5.

Note: The MERGE statement does not produce a Cartesian product on a many-to-many

match-merge. Instead, it performs a one-to-one merge while there are observations in

the BY group in at least one data set. When all observations in the BY group have

been read from one data set and there are still more observations in another data set,

SAS performs a one-to-many merge until all observations have been read for the BY

group.

Example 3: Match-Merge with Nonmatched Observations

When SAS performs a match-merge with nonmatched observations in the input data

sets, SAS retains the values of all variables in the program data vector even if the value

is missing. The data sets Animal2 and Plant2 do not contain all values of the BY variable

Common. The following shows the Animal2 and the Plant2 input data sets:

Animal2 Plant2

OBS Common Animal2 OBS Common Plant2

1 a Ant 1 a Apple

2 c Cat 2 b Banana

3 d Dog 3 c Coconut

496 Chapter 21 • Reading, Combining, and Modifying SAS Data Sets

4 e Eagle 4 e Eggplant

5 f

Fig

The following program produces the merged data set MATCH2, and prints the results:

data match2;

merge animal2 plant2;

by Common;

run;

proc print data=match2;

title 'Data Set MATCH2';

run;

Output 21.13 Match-Merged Data Set with Nonmatched Observations

As the output shows, all values of the variable Common are represented in the new data

set, including missing values for the variables that are in one data set but not in the other.

Updating with the UPDATE and the MODIFY Statements

Definitions

Updating a data set refers to the process of applying changes to a master data set. To

update data sets, you work with two input data sets. The data set containing the original

information is the master data set, and the data set containing the new information is the

transaction data set.

You can update data sets by using the UPDATE statement or the MODIFY statement:

UPDATE

uses observations from the transaction data set to change the values of corresponding

observations from the master data set. You must use a BY statement with the

UPDATE statement because all observations in the transaction data set are keyed to

observations in the master data set according to the values of the BY variable.

MODIFY

can replace, delete, and append observations in an existing data set. Using the

MODIFY statement can save disk space because it modifies data in place, without

creating a copy of the data set.

Combining SAS Data Sets: Methods 497

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.

Table of Contents for Combining SAS Data Sets: Methods (4/6)

Create new playlist

Sign In

Sign Up

Table of Contents for
Combining SAS Data Sets: Methods (4/6)