Interleave the observations from three data sets in order by the values of a specific variable. Define a variable that identifies the data set from which the observation originated.
Featured Step | DATA step |
Featured Step Options and Statements | SET statement, IN= option BY statement |
Related Technique | PROC SQL, OUTER UNION set operator and CORR keyword, ORDER BY clause |
Data sets SPRINGGRADS, SUMMERGRADS, and FALLGRADS contain lists of biology students who graduated in spring, summer, and fall. The students' final GPAs are included.
Example 2.1 also uses the three data sets.
SPRINGGRADS Obs student concentration gpa 1 Johnson, Allison Biochemistry 3.86 2 Davis, Jeffrey General Biology 3.91 3 Hall, Daniel Genetics 3.43 4 Hill, Courtney Ecology 4.00 5 Scott, Tiffany Plant Biology 3.58 6 Martinez, Teresa Zoology 3.21
SUMMERGRADS Obs student concentration gpa 1 Robinson, Adam Ecology 3.87 2 Cox, William Zoology 3.61
FALLGRADS Obs student concentration gpa 1 Mitchell, Frank Biochemistry 3.25 2 Rogers, Melissa Microbiology 4.00 3 Brown, Beverly Molecular Biology 3.42
Output 2.2 GPAGRADS Data SetExample 2.2 GPAGRADS Data Set Created by DATA Step Obs student concentration gpa graduated 1 Hill, Courtney Ecology 4.00 Spring 2 Rogers, Melissa Microbiology 4.00 Fall 3 Davis, Jeffrey General Biology 3.91 Spring 4 Robinson, Adam Biophysics 3.87 Summer 5 Johnson, Allison Biochemistry 3.86 Spring 6 Cox, William Zoology 3.61 Summer 7 Scott, Tiffany Plant Biology 3.58 Spring 8 Hall, Daniel Genetics 3.43 Spring 9 Brown, Beverly Molecular Biology 3.42 Fall 10 Mitchell, Frank Biochemistry 3.25 Fall 11 Martinez, Teresa Zoology 3.21 Spring |
This program is similar to the main example program in Example 2.1 because both use a SET statement to vertically combine the observations from three data sets. This example differs because instead of concatenating the observations from one data set after another, it interleaves the observations in SPRINGRADS, SUMMERGRADS, and FALLGRADS in the order that is specified by the values of the variables in the BY statement. The two variables that control the interleaving are GPA and STUDENT.
Observations in output data set GPAGRADS are arranged in descending GPA order (highest to lowest). For identical GPAs, the observations are further arranged alphabetically by the student's last name.
As in Example 2.1, this DATA step applies the IN= data set option to each input data set to define temporary variables INSPRING, INSUMMER, and INFALL. IF-ELSE statements test the values of these variables to determine the data set source for the observation that is currently being processed and to assign values to new variable GRADUATED.
Each of the three data sets contains the same three variables. Therefore, ALLGRADS contains these three variables plus new variable GRADUATED.
Three PROC SORT steps precede the DATA step in which the observations in each of the three data sets are sorted in the order in which they will be interleaved.
If a data set has been indexed by the variables in the BY statement in the DATA step, you might not need to sort the data set. This depends on whether simple and/or composite indexes have been defined prior to the step. For more information about when indexes are used and not used in accessing observations, see your SAS documentation. In this example, none of the data sets are indexed by GPA and STUDENT.
Sort the observations in the three data sets in the order in which you want to interleave them.
Concatenate three data sets. Add the IN= data set option to each input data set so that the data set origin of each observation can be determined.
Specify the order in which to interleave the observations from the three data sets. Define GRADUATED and assign it values based on the origin of the observation.
proc sort data=springgrads; by descending gpa student; run; proc sort data=summergrads; by descending gpa student; run; proc sort data=fallgrads; by descending gpa student; run; data allgrads; set springgrads(in=inspring) summergrads(in=insummer) fallgrads(in=infall); by descending gpa student; attrib graduated length=$6 label='Semester Graduated'; if inspring then graduated='Spring'; else if insummer then graduated='Summer'; else if infall then graduated='Fall'; run;
The following PROC SQL step creates a table equivalent to the data set that was created by the DATA step in the main example. It concatenates three queries by using the OUTER UNION set operator and the CORR keyword. (CORR is the short version of CORRESPONDING.) Each query creates the same new column, GRADUATED, whose values reflect the origin of the row.
The OUTER UNION set operator concatanates tables. The CORR keyword matches and aligns in the output table the same-named columns that were extracted from the two input tables. The ORDER BY clause arranges the rows in the output table.
The ORDER BY clause interleaves the rows so that the rows in table GPAGRADS are arranged from highest GPA to lowest GPA. The DESC keyword arranges the values of GPA in descending order. For identical values of GPA, the ORDER BY clause arranges the rows arranged alphabetically by the student's last name (STUDENT).
Note that the keyword DESC follows the column name that is to be arranged in descending order. In BY statements, the keyword DESCENDING precedes the variable name that is to be arranged in descending order.
The only difference between this PROC SQL step and the PROC SQL step in Related Technique 1 of Example 2.1 is the addition of the ORDER BY clause. For details about all other statements in the step, see that example.
Repeat the SELECT clauses from Related Technique 1.
Arrange the rows in table GPAGRADS from highest GPA to lowest GPA and within identical GPAs, arrange the rows alphabetically by the student's last name.
proc sql; create table gpagrads as select *, 'Spring' as graduated length=6 label='Semester Graduated' from springgrads outer union corr select *, 'Summer' as graduated length=6 label='Semester Graduated' from summergrads outer union corr select *, 'Fall' as graduated length=6 label='Semester Graduated' from fallgrads order by gpa desc, student; quit;
3.15.220.219