Testing Your Program
As a final step in preparing your data sets, you should test your program. Create small
temporary SAS data sets that contain a sample of observations that test all of your
program's logic. If your logic is faulty and you get unexpected output, you can use the
DATA step debugger to debug your program. For complete information about the DATA
Step Debugger, see SAS Data Set Options: Reference.
Combining SAS Data Sets: Methods
Concatenating
Definition
Concatenating data sets is the combining of two or more data sets, one after the other,
into a single data set. The number of observations in the new data set is the sum of the
number of observations in the original data sets. The order of observations is sequential.
All observations from the first data set are followed by all observations from the second
data set, and so on.
In the simplest case, all input data sets contain the same variables. If the input data sets
contain different variables, observations from one data set have missing values for
variables defined only in other data sets. In either case, the variables in the new data set
are the same as the variables in the old data sets.
Syntax
Use this form of the SET statement to concatenate data sets:
SET data-set(s);
where
data-set
specifies any valid SAS data set name.
For a complete description of valid SAS data set names, see the SET statement in SAS
Statements: Reference.
DATA Step Processing during Concatenation
Compilation phase
SAS reads the descriptor information of each data set that is named in the SET
statement and then creates a program data vector that contains all the variables from
all data sets as well as variables created by the DATA step.
Execution — Step 1
SAS reads the first observation from the first data set into the program data vector. It
processes the first observation and executes other statements in the DATA step. It
then writes the contents of the program data vector to the new data set.
The SET statement does not reset the values in the program data vector to missing,
except for variables whose value is calculated or assigned during the DATA step.
Variables that are created by the DATA step are set to missing at the beginning of
each iteration of the DATA step. Variables that are read from a data set are not.
Execution — Step 2
SAS continues to read one observation at a time from the first data set until it finds
an end-of-file indicator. The values of the variables in the program data vector are
478 Chapter 21 Reading, Combining, and Modifying SAS Data Sets
then set to missing, and SAS begins reading observations from the second data set,
and so on, until it reads all observations from all data sets.
Example 1: Concatenation Using the DATA Step
In this example, each data set contains the variables Common and Number, and the
observations are arranged in the order of the values of Common. Generally, you
concatenate SAS data sets that have the same variables. In this case, each data set also
contains a unique variable to show the effects of combining data sets more clearly. The
following shows the Animal and the Plant input data sets in the library that is referenced
by the libref Example:
Animal Plant
OBS Common Animal Number OBS Common Plant Number
1 a Ant 5 1 g Grape 69
2 b Bird 2 h Hazelnut 55
3 c Cat 17 3 i Indigo .
4 d Dog 9 4 j Jicama 14
5 e Eagle 5 k Kale 5
6 f Frog 76 6 l Lentil 77
The following program uses a SET statement to concatenate the data sets and then prints
the results:
data concatenation;
set animal plant;
run;
proc print data=concatenation;
var Common Animal Plant Number;
title 'Data Set CONCATENATION';
run;
Combining SAS Data Sets: Methods 479
Output 21.1 Concatenated Data Sets (DATA Step)
The resulting data set CONCATENATION has 12 observations, which is the sum of the
observations from the combined data sets. The program data vector contains all variables
from all data sets. The values of variables found in one data set but not in another are set
to missing.
Example 2: Concatenation Using SQL
You can also use the SQL language to concatenate tables. In this example, SQL reads
each row in both tables and creates a new table named Combined. The following shows
the YEAR1 and YEAR2 input tables:
YEAR1 YEAR2
Date1 Date2
2009
2010 2010
2011 2011
2012 2012
2013
2014
The following SQL code creates and prints the table Combined.
proc sql;
title 'SQL Table Combined';
create table combined as
select * from year1
480 Chapter 21 Reading, Combining, and Modifying SAS Data Sets
union all
select * from year2;
select * from combined;
quit;
Output 21.2 Concatenated Tables (SQL)
Appending Files
Instead of concatenating data sets or tables, you can append them and produce the same
results as concatenation. SAS concatenates data sets (DATA step) and tables (SQL) by
reading each row of data to create a new file. To avoid reading all the records, you can
append the second file to the first file by using the APPEND procedure:
proc append base=year1 data=year2;
run;
The YEAR1 file contains all rows from both tables.
Note: You cannot use PROC APPEND to add observations to a SAS data set in a
sequential library.
Efficiency
If no additional processing is necessary, using PROC APPEND or the APPEND
statement in PROC DATASETS is more efficient than using a DATA step to concatenate
data sets.
Interleaving
Definition
Interleaving uses a SET statement and a BY statement to combine multiple data sets into
one new data set. The number of observations in the new data set is the sum of the
number of observations from the original data sets. However, the observations in the
new data set are arranged by the values of the BY variable or variables and, within each
Combining SAS Data Sets: Methods 481
BY group, by the order of the data sets in which they occur. You can interleave data sets
either by using a BY variable or by using an index.
Syntax
Use this form of the SET statement to interleave data sets when you use a BY variable:
SET data-set(s);
BY variable(s);
where
data-set
specifies a one-level name, a two-level name, or one of the special SAS data set
names.
variable
specifies each variable by which the data set is sorted. These variables are referred to
as BY variables for the current DATA or PROC step.
Use this form of the SET statement to interleave data sets when you use an index:
SET data-set-1 . . . data-set-n KEY= index;
where
data-set
specifies a one-level name, a two-level name, or one of the special SAS data set
names.
index
provides nonsequential access to observations in a SAS data set, which are based on
the value of an index variable or key.
For a complete description of the SET statement, including SET with the KEY= option,
see the SET statement in SAS Statements: Reference.
Sort Requirements
Before you can interleave data sets, the observations must be sorted or grouped by the
same variable or variables that you use in the BY statement, or you must have an
appropriate index for the data sets.
DATA Step Processing during Interleaving
Compilation phase
SAS reads the descriptor information of each data set that is named in the SET
statement and then creates a program data vector that contains all the variables
from all data sets as well as variables created by the DATA step.
SAS creates the FIRST.variable and LAST.variable for each variable listed in the
BY statement.
Execution — Step 1
SAS compares the first observation from each data set that is named in the SET
statement to determine which BY group should appear first in the new data set. It
reads all observations from the first BY group from the selected data set. If this BY
group appears in more than one data set, it reads from the data sets in the order in
which they appear in the SET statement. The values of the variables in the program
data vector are set to missing each time SAS starts to read a new data set and when
the BY group changes.
482 Chapter 21 Reading, Combining, and Modifying SAS Data Sets
..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
18.119.122.82