Interleaving Data Sets

Preparing to Interleave Data Sets

Before you can interleave data sets, each data set must be sorted by the variable or variables that you name in the BY statement that accompanies your SET statement.
For example, the Research and Development division and the Publications division of a company both maintain data sets containing information about each project currently under way. Each data set includes these variables:
Project
specifies a unique code that identifies the project.
Department
specifies the name of a department that is involved in the project.
Manager
specifies the last name of the manager from Department.
StaffCount
specifies the number of people working for Manager on this project.
Senior management for the company wants to combine the data sets by Project so that the new data set shows the resources that both divisions are devoting to each project. Both data sets must be sorted by Project before they can be interleaved.
The following program creates and displays the data set RESEARCH_DEVELOPMENT. Note that the input data is already sorted by Project.
data research_development;
   length Department Manager $ 10;
   input Project $ Department $ Manager $ StaffCount;
   datalines;
MP971 Designing Daugherty 10
MP971 Coding Newton 8
MP971 Testing Miller 7
SL827 Designing Ramirez 8
SL827 Coding Cho 10
SL827 Testing Baker 7
WP057 Designing Hascal 11
WP057 Coding Constant 13
WP057 Testing Slivko 10
;
run;

proc print data=research_development;
   title 'Research and Development Project Staffing';
run;
The following output displays the RESEARCH_DEVELOPMENT data set:
Display 18.1 The RESEARCH_DEVELOPMENT Data Set
The following program creates, sorts, and displays the second data set, PUBLICATIONS. The input data is not in Project order, so the program uses PROC SORT to sort the data set by Project before printing it.
data publications;
   length Department Manager $ 10;
   input Manager $ Department $ Project $ StaffCount;
   datalines;
Cook Writing WP057 5
Deakins Writing SL827 7
Franscombe Editing MP971 4
Henry Editing WP057 3
King Production SL827 5
Krysonski Production WP057 3
Lassiter Graphics SL827 3
Miedema Editing SL827 5
Morard Writing MP971 6
Posey Production MP971 4
Spackle Graphics WP057 2
;
run;

proc sort data=publications;
   by Project;
run;

proc print data=publications;
   title 'Publications Project Staffing';
run;
The following output displays the PUBLICATIONS data set:
Display 18.2 The PUBLICATIONS Data Set
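If the original order of a data set needs to be preserved, PROC SORT can write the sorted observations to a separate data set instead of replacing the input. The following is a minimal sketch of that alternative. The OUT= option is a standard PROC SORT option; the name PUBLICATIONS_SORTED is a hypothetical name used only for this illustration.

```sas
/* Sketch only: PUBLICATIONS_SORTED is an illustrative name. */
/* OUT= writes the sorted observations to a new data set and */
/* leaves PUBLICATIONS in its original order.                */
proc sort data=publications out=publications_sorted;
   by Project;
run;
```

With this approach, the sorted copy rather than the original would be named in the SET statement when interleaving.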

Understanding the Interleaving Process

When you interleave data sets, SAS creates a new data set as follows:
  1. Before executing the SET statement, SAS reads the descriptor portion of each data set that you name in the SET statement. Then SAS creates a program data vector that, by default, contains all the variables from all data sets as well as any variables created by the DATA step. SAS sets the value of each variable to missing.
  2. SAS looks at the first BY group in each data set in the SET statement in order to determine which BY group should appear first in the new data set.
  3. SAS copies to the new data set all observations in that BY group from each data set that contains observations in the BY group. SAS copies from the data sets in the same order as they appear in the SET statement.
  4. SAS looks at the next BY group in each data set to determine which BY group should appear next in the new data set.
  5. SAS sets the value of each variable in the program data vector to missing.
  6. SAS repeats steps 3 through 5 until it has copied all observations to the new data set.
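
The steps above can be observed with a minimal sketch that assumes the RESEARCH_DEVELOPMENT and PUBLICATIONS data sets created earlier. The IN= data set option and the automatic FIRST.Project and LAST.Project variables are standard SAS features. The data set name RND_PUBS_TRACE and the variables Source, FirstInGroup, and LastInGroup are hypothetical names chosen for this illustration.

```sas
/* Sketch only: RND_PUBS_TRACE, Source, FirstInGroup, and LastInGroup */
/* are illustrative names, not part of the original example.          */
data rnd_pubs_trace;
   set research_development(in=in_rnd)
       publications(in=in_pubs);
   by Project;

   /* An IN= variable is 1 when the current observation was read */
   /* from the corresponding data set, and 0 otherwise.          */
   length Source $ 20;
   if in_rnd then Source = 'RESEARCH_DEVELOPMENT';
   else if in_pubs then Source = 'PUBLICATIONS';

   /* FIRST. and LAST. variables are temporary, so copy them */
   /* to ordinary variables to keep them in the output.      */
   FirstInGroup = first.Project;
   LastInGroup  = last.Project;
run;

proc print data=rnd_pubs_trace;
   title 'Tracing the Interleaving Process';
run;
```

In the printed result, each BY group should list the observations from RESEARCH_DEVELOPMENT before those from PUBLICATIONS. FirstInGroup should be 1 on the first observation of each Project value, and LastInGroup should be 1 on the last.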

Using the Interleaving Process

The following program uses the SET and BY statements to interleave the data sets RESEARCH_DEVELOPMENT and PUBLICATIONS.
data rnd_pubs;
   set research_development publications;
   by Project;
run;

proc print data=rnd_pubs;
   title 'Project Participation by Research and Development';
   title2 'and Publications Departments';
   title3 'Sorted by Project';
run;
The new data set, RND_PUBS, includes all observations from both data sets. Each BY group in the new data set contains observations from RESEARCH_DEVELOPMENT followed by observations from PUBLICATIONS.
Display 18.3 Interleaving Two Data Sets
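One way to check the result is to count the observations in each BY group. The following sketch assumes the RND_PUBS data set created above. PROC FREQ is used here only as a convenient check and is not part of the original example. Based on the input data shown earlier, MP971 should have 6 observations (3 from each data set), and SL827 and WP057 should each have 7 (3 from RESEARCH_DEVELOPMENT and 4 from PUBLICATIONS), for 20 observations in all.

```sas
/* Sketch only: count the observations in each Project BY group. */
/* NOCUM and NOPERCENT suppress the cumulative and percentage    */
/* columns so that only the frequency counts are shown.          */
proc freq data=rnd_pubs;
   tables Project / nocum nopercent;
   title 'Observations per Project in RND_PUBS';
run;
```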