Match-Merge Processing

The Basics of Match-Merge Processing

The match-merging examples in this book are straightforward. However, match-merging can be more complex, depending on your data and on the output data set that you want to create. To predict the results of match-merges correctly, you need to understand how the DATA step performs match-merges.

When you submit a DATA step, it is processed in two phases:

the compilation phase, in which SAS checks the syntax of the SAS statements and compiles them (translates them into machine code). During this phase, SAS also sets up descriptor information for the output data set and creates the program data vector (PDV), the area of memory in which SAS builds each observation.
the execution phase, in which the DATA step reads data and executes any subsequent programming statements. During execution, data values are read into the appropriate variables in the PDV. At the end of each iteration of the DATA step, the values in the PDV are written to the output data set as a single observation.
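To see the two phases at work, the following minimal sketch uses a hypothetical data set, Work.Demo, whose names and values are illustrative only. SAS checks and compiles the second DATA step before it reads any data. During the execution phase, the PUTLOG _ALL_ statement runs once per iteration and writes the current PDV contents to the log, including the automatic variables _N_ and _ERROR_.

```sas
/* Hypothetical input data; names and values are illustrative only */
data work.demo;
  input ID Score;
  datalines;
1 10
2 20
;
run;

/* Compilation phase: SAS checks this step, reads the descriptor
   portion of work.demo, and builds a PDV that holds ID, Score,
   Double, _N_, and _ERROR_.
   Execution phase: the statements below run once for each
   observation that the SET statement reads. */
data work.demo2;
  set work.demo;
  Double = Score * 2;
  /* Write the current PDV contents to the log on every iteration */
  putlog _all_;
  /* An implicit OUTPUT at the end of each iteration writes the
     PDV to work.demo2 as one observation */
run;
```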

The Compilation Phase: Setting Up a New Data Set

To prepare to merge data sets, SAS does the following:

reads the descriptor portions of the data sets that are listed in the MERGE statement
reads the rest of the DATA step program
creates the PDV for the merged data set
assigns a tracking pointer to each data set that is listed in the MERGE statement

If a variable with the same name appears in more than one data set, the length of that variable is determined by the first data set in the MERGE statement that contains it.

Figure 10.11 The Compilation Phase: Setting Up the New Data Set

After reading the descriptor portions of the data sets Clients and Amounts, SAS does the following:

creates a PDV for the new Claims data set. The PDV contains all variables from the two data sets. Note that although Name appears in both input data sets, it appears in the PDV only once.
assigns tracking pointers to Clients and Amounts.
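The length rule for same-named variables can be checked with a small sketch. The data sets Work.First and Work.Second and the variable Note are hypothetical. Because Work.First is listed first in the MERGE statement, Note gets length 4 in the PDV. The longer value read from Work.Second therefore overwrites the value from Work.First and is truncated. Depending on system option settings, SAS might also write a warning about multiple lengths to the log.

```sas
/* Hypothetical data sets: Note has length 4 in work.first
   and length 12 in work.second */
data work.first;
  length ID 8 Note $ 4;
  ID = 1;
  Note = 'abc';
run;

data work.second;
  length ID 8 Note $ 12;
  ID = 1;
  Note = 'overwritten';
run;

/* work.first is listed first, so Note has length 4 in the PDV.
   The value from work.second overwrites 'abc' but is truncated
   to 4 characters. */
data work.both;
  merge work.first work.second;
  by ID;
run;

/* The Length column shows 4 for Note */
proc contents data=work.both;
run;
```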

The Execution Phase: Match-Merging Observations

After compiling the DATA step, SAS match-merges observations sequentially. It moves the pointers down through each data set one observation at a time and checks whether the BY values match.

If the BY values match, the observations are read into the PDV in the order in which the data sets appear in the MERGE statement. When a variable with the same name exists in more than one data set, its value from an earlier data set is overwritten by its value from a later data set. SAS writes the combined observation to the new data set and retains the values in the PDV until the BY value changes in all the data sets.
If the BY values do not match, SAS determines which BY value comes first in sort order and reads only the observation that contains this value into the PDV. Then the contents of the PDV are written.
When the BY value changes in all the input data sets, the PDV is initialized to missing.

The DATA step merge continues to process every observation in each data set until it has processed all observations in all data sets.
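The following sketch traces the execution phase for a one-to-many match-merge. Work.Owners, Work.Pets, and their values are hypothetical, and both data sets are already in ID order. The PUTLOG statement writes selected PDV values at the end of each iteration. For ID 1, Owner is still Smith in the second observation because the value is retained while the BY group continues. When the BY group changes, the PDV is reset to missing, so ID 2 has no Pet and ID 3 has no Owner.

```sas
/* Hypothetical data sets, already sorted by ID */
data work.owners;
  input ID Owner $;
  datalines;
1 Smith
2 Jones
;
run;

data work.pets;
  input ID Pet $;
  datalines;
1 Cat
1 Dog
3 Fish
;
run;

data work.merged;
  merge work.owners work.pets;
  by ID;
  /* Trace the PDV at the end of each iteration:
     ID 1 appears twice (Owner retained), ID 2 comes only from
     work.owners, and ID 3 comes only from work.pets */
  putlog _n_= ID= Owner= Pet=;
run;
```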

Handling Unmatched Observations and Missing Values

By default, all observations that are read into the PDV, including observations that have missing data and no matching BY values, are written to the output data set. If you specify a subsetting IF statement to select observations, then only those that meet the IF condition are written.

If an observation contains missing values for a variable, then the observation in the output data set contains the missing values as well. Observations that have missing values for the BY variable appear at the top of the output data set because missing values sort first in ascending order.
If an input data set does not have a matching BY value, then the observation in the output data set contains missing values for the variables that are unique to that input data set.
In the Claims example, the last observation in Cert.Clients is written after the last observation in Cert.Amounts.

The PROC PRINT output is shown below. The FORMAT statement in the PROC PRINT step displays the values of the Date variable with the DATE9. format. To learn how to apply a format, see SAS Formats and Informats.

proc print data=work.claims noobs;
  format date date9.;
run;

Figure 10.12 PROC PRINT Output of Merged Data
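The sketch below illustrates the rules for unmatched observations and missing BY values. Work.Visits and Work.Labs are hypothetical. The first merge keeps every observation. The observation with a missing ID appears first because missing values sort first. The second merge uses the IN= data set option to create temporary variables that equal 1 when a data set contributes to the current observation. The subsetting IF then keeps only the observations that have a match in both data sets.

```sas
/* Hypothetical data; one observation has a missing ID */
data work.visits;
  input ID Visit $;
  datalines;
3 Jan
. Feb
1 Mar
;
run;

/* Missing ID values sort before all nonmissing values */
proc sort data=work.visits;
  by ID;
run;

data work.labs;
  input ID Lab $;
  datalines;
1 Blood
2 Urine
;
run;

/* Default: matched and unmatched observations are all written.
   Variables unique to a data set with no matching ID are missing. */
data work.all;
  merge work.visits work.labs;
  by ID;
run;

/* IN= flags are 1 when that data set contributed to the current
   observation; the subsetting IF keeps only matches */
data work.matched;
  merge work.visits(in=inVisits) work.labs(in=inLabs);
  by ID;
  if inVisits and inLabs;
run;
```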

Last updated: August 23, 2018
