Figure 21.7 One-to-One Reading and One-to-One Merging
COMBINEDDATA2DATA1
VarXVarYVarX
X1
X2
X3
X4
X5
d a t a c o mb i n e d ;
s e t d a t a 1 ;
s e t d a t a 2 ;
r u n ;
d a t a c o mb i n e d ;
me r g e d a t a 1 d a t a 2 ;
r u n ;
+
=
Y1
Y2
Y3
Y4
Y5
VarY
X1
X2
X3
X4
X5
Y1
Y2
Y3
Y4
Y5
Match-Merging
The following figure shows the results of match-merging. Match-merging combines
observations from two or more SAS data sets into a single observation in a new data set
based on the values of one or more common variables. Data set Combined shows the
results.
Figure 21.8 Match-Merging Two Data Sets
d a t a c o mb i n e d ;
me r g e d a t a 1 d a t a 2 ;
b y Y e a r ;
r u n ;
+
=
COMBINED
1991
1991
1992
1993
1994
1995
Year
X1
X1
X2
X3
X4
X5
Y1
Y2
Y3
Y4
Y5
VarYVarX
DATA2
1991
1991
1993
1994
1995
VarYYear
Y1
Y2
Y3
Y4
Y5
DATA1
VarX
X1
X2
X3
X4
X5
Year
1991
1992
1993
1994
1995
Updating
The following figure shows the results of updating a master data set. Updating uses
information from observations in a transaction data set to delete, add, or alter
information in observations in a master data set. You can update a master data set by
using the UPDATE statement or the MODIFY statement. If you use the UPDATE
statement, your input data sets must be sorted by the values of the variables listed in the
BY statement. (In this example, Master and Transaction are both sorted by Year.) If you
use the MODIFY statement, your input data does not need to be sorted.
UPDATE replaces an existing file with a new file, allowing you to add, delete, or rename
columns. MODIFY performs an update in place by rewriting only those records that
have changed, or by appending new records to the end of the file.
Combining SAS Data Sets: Basic Concepts 473
Note that by default, UPDATE and MODIFY do not replace nonmissing values in a
master data set with missing values from a transaction data set.
Figure 21.9 Updating a Master Data Set
d a t a ma s t e r ;
u p d a t e ma s t e r t r a n s a c t i o n ;
b y Y e a r ;
r u n ;
+
=
MASTER
X1
X1
X1
X1
X1
X1
X2
X2
X2
X1
X2
1985
1986
1987
1988
1989
1990
1991
1992
1993
1994
1995
Y1
Y1
Y1
Y1
Y1
Y1
Y1
Y2
Y2
Y1
Y2
Year
VarYVarX
Y1
Y1
Y1
Y1
Y1
Y1
Y1
Y1
Y1
Y1
MASTER
VarXYear VarY
X1
X1
X1
X1
X1
X1
X1
X1
X1
X1
1985
1986
1987
1988
1989
1990
1991
1992
1993
1994
TRANSACTION
1991
1992
1993
1993
1995
Y2
Y2
Y2
X2
X2
X2
X2
VarXYear VarY
Overview of Tools for Combining SAS Data Sets
Using Statements and Procedures
Once you understand the basics of establishing relationships among data, the ways to
access data, and the ways that you can combine SAS data sets, you can choose from a
variety of SAS tools for accessing, combining, and processing your data. The following
table lists and briefly describes the DATA step statements and the procedures that you
can use for combining SAS data sets.
Table 21.2 Statements or Procedures for Combining SAS Data Sets
Access Method
Statement or
Procedure Action Performed Sequential Direct
Can Use with
BY statement Comments
BY Controls the operation of a
SET, MERGE, UPDATE, or
MODIFY statement in the
DATA step and sets up
special grouping variables.
NA NA NA BY-group processing
is a means of
processing
observations that have
the same values of
one or more variables.
474 Chapter 21 Reading, Combining, and Modifying SAS Data Sets
Access Method
Statement or
Procedure Action Performed Sequential Direct
Can Use with
BY statement Comments
MERGE Reads observations from
two or more SAS data sets
and joins them into a single
observation.
X X When using MERGE
with BY, the data
must be sorted or
indexed on the BY
variable.
MODIFY Processes observations in a
SAS data set in place.
(Contrast with UPDATE.)
X X X Sorted or indexed data
are not required for
use with BY, but are
recommended for
performance.
SET Reads an observation from
one or more SAS data sets.
X X X Use KEY= or
POINT= statement
options for directly
accessing data.
UPDATE Applies transactions to
observations in a master
SAS data set. UPDATE
does not update
observations in place; it
produces an updated copy
of the current data set.
X X Both the master and
transaction data sets
must be sorted by or
indexed on the BY
variable.
PROC
APPEND
Adds the observations from
one SAS data set to the end
of another SAS data set.
X
PROC SQL
*
Reads an observation from
one or more SAS data sets;
reads observations from up
to 32 SAS data sets and
joins them into single
observations; manipulates
observations in a SAS data
set in place; easily produces
a Cartesian product.
X X X All three access
methods are available
in PROC SQL, but the
access method is
chosen by the internal
optimizer.
*
PROC SQL is the SAS implementation of Structured Query Language. In addition to expected SQL capabilities, PROC SQL includes
additional capabilities specific to SAS, such as the use of formats and SAS macro language.
Using Error Checking
You can use the _IORC_ automatic variable and the SYSRC autocall macro to perform
error checking in a DATA step. Use these tools with the MODIFY statement or with the
SET statement and the KEY= option. For more information about these tools, see “Error
Checking When Using Indexes to Randomly Access or Update Data” on page 507.
Combining SAS Data Sets: Basic Concepts 475
How to Prepare Your Data Sets
Guidelines to Prepare Your Data Sets
Before combining SAS data sets, follow these guidelines to produce the results that you
want:
Know the structure and the contents of the data sets.
Look at sources of common problems.
Ensure that observations are in the correct order, or that they can be retrieved in the
correct order (for example, by using an index).
Test your program.
Knowing the Structure and Contents of the Data Sets
To help determine how your data is related, look at the structure of the data sets. To see
the data set structure, execute the DATASETS procedure, the CONTENTS procedure, or
access the SAS Explorer window in your windowing environment to display the
descriptor information. Descriptor information includes the number of observations in
each data set, the name and attributes of each variable, an alphabetic list of extended
attributes (including data set and variable extended attributes), and a list of indexes and
index attributes. To print a sample of the observations, use the PRINT procedure or the
REPORT procedure.
You can also use functions such as VTYPE, VLENGTH, and VLENGTHX to show
specific descriptor information. For complete information about these functions, see SAS
Functions and CALL Routines: Reference.
Looking at Sources of Common Problems
If your program does not execute correctly, review your input data for the following
errors:
variables that have the same name but that represent different data
SAS includes only one variable of a given name in the new data set. If you are
merging two data sets that have variables with the same names but different data, the
values from the last data set that was read are written over the values from other data
sets.
To correct the error, you can rename variables before you combine the data sets by
using the RENAME= data set option in the SET, UPDATE, or MERGE statement.
Or you can use the DATASETS procedure.
common variables with the same data but different attributes
The way SAS handles these differences depends on which attributes are different:
type attribute
If the type attribute is different, SAS stops processing the DATA step and issues
an error message stating that the variables are incompatible.
To correct this error, you must use a DATA step to re-create the variables. The
SAS statements that you use depend on the nature of the variable.
length attribute
If the length attribute is different, SAS takes the length from the first data set that
contains the variable. In the following example, all data sets that are listed in the
476 Chapter 21 Reading, Combining, and Modifying SAS Data Sets
MERGE statement contain the variable Mileage. In Quarter1, the length of the
variable Mileage is four bytes; in Quarter2, it is eight bytes and in Quarter3 and
Quarter4, it is six bytes. In the output data set Yearly, the length of the variable
Mileage is four bytes, which is the length derived from Quarter1.
data yearly;
merge quarter1 quarter2 quarter3 quarter4;
by Account;
run;
To override the default and set the length yourself, specify the appropriate length
in a LENGTH statement that precedes the SET, MERGE, or UPDATE statement.
Note: If the length of a variable changes as a result of combining data sets, SAS
prints a warning message to the log and issues a nonzero return code (for
example, on z/OS, SYSRC=4). If you expect truncation of data (for example,
when removing insignificant blanks from the end of character values), the
warning is expected and you do not want SAS to issue a nonzero return code.
In this case, you can turn this warning off by setting the VARLENCHK
system option to NOWARN. For more information, see “VARLENCHK=
System Option” in SAS System Options: Reference.
label, format, and informat attributes
If any of these attributes are different, SAS takes the attribute from the first data
set that contains the variable with that attribute. However, any label, format, or
informat that you explicitly specify overrides a default. If all data sets contain
explicitly specified attributes, the one specified in the first data set overrides the
others. To ensure that the new output data set has the attributes that you prefer,
use an ATTRIB statement.
You can also use SAS File I/O functions, such as VLABEL, VLABELX, and
other Variable Information functions to access this information. For complete
information about these functions, see SAS Functions and CALL Routines:
Reference.
extended attributes
Like formats and labels, extended attributes are automatically passed from the
input data set to the output data set in a DATA step. If two input data sets
containing extended attributes are combined, then SAS preserves the extended
attributes from the first data set seen in the program where the variable is read
and applies those attributes to the output data set. To ensure that the new output
data set has the extended attributes that you prefer, use the DATASETS
procedure to add, delete, remove, set, and update extended attributes. For more
information about the DATASETS procedure see “Extended Attributes” in Base
SAS Procedures Guide.
Ensuring Correct Order
If you use BY-group processing with the UPDATE, SET, and MERGE statements to
combine data sets, ensure that the observations in the data sets are sorted in the order of
the variables that are listed in the BY statement, or that the data sets have an appropriate
index. If you use BY-group processing in a MODIFY statement, your data does not need
to be sorted, but sorting the data improves efficiency. The BY variable or variables must
be common to both data sets, and they must have the same attributes. For more
information, see Chapter 20, “BY-Group Processing in the DATA Step,” on page 449.
Combining SAS Data Sets: Basic Concepts 477
..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
18.217.254.118