Example 2.2 Interleaving Observations from Two or More Data Sets Based on a Common Variable

Goal

Interleave the observations from three data sets in order by the values of a specific variable. Define a variable that identifies the data set from which the observation originated.

Example Features

Featured StepDATA step
Featured Step Options and StatementsSET statement, IN= option BY statement
Related TechniquePROC SQL, OUTER UNION set operator and CORR keyword, ORDER BY clause

Input Data Sets

Data sets SPRINGGRADS, SUMMERGRADS, and FALLGRADS contain lists of biology students who graduated in spring, summer, and fall. The students' final GPAs are included.

Example 2.1 also uses the three data sets.

                SPRINGGRADS

Obs      student       concentration      gpa
 1   Johnson, Allison  Biochemistry      3.86
 2   Davis, Jeffrey    General Biology   3.91
 3   Hall, Daniel      Genetics          3.43
 4   Hill, Courtney    Ecology           4.00
 5   Scott,  Tiffany   Plant  Biology    3.58
 6   Martinez, Teresa  Zoology           3.21

                SUMMERGRADS

Obs     student       concentration    gpa
 1   Robinson, Adam   Ecology          3.87
 2   Cox, William     Zoology          3.61

                FALLGRADS

Obs     student        concentration      gpa
 1   Mitchell, Frank   Biochemistry       3.25
 2   Rogers, Melissa   Microbiology       4.00
 3   Brown, Beverly    Molecular Biology  3.42

Resulting Data Set

Output 2.2 GPAGRADS Data Set

        Example 2.2 GPAGRADS Data Set Created by DATA Step

  Obs       student       concentration         gpa   graduated

    1   Hill, Courtney     Ecology             4.00    Spring
    2   Rogers, Melissa    Microbiology        4.00    Fall
    3   Davis, Jeffrey     General Biology     3.91    Spring
    4   Robinson, Adam     Biophysics          3.87    Summer
    5   Johnson, Allison   Biochemistry        3.86    Spring
    6   Cox, William       Zoology             3.61    Summer
    7   Scott, Tiffany     Plant Biology       3.58    Spring
    8   Hall, Daniel       Genetics            3.43    Spring
    9   Brown, Beverly     Molecular Biology   3.42    Fall
   10   Mitchell, Frank    Biochemistry        3.25    Fall
   11   Martinez, Teresa   Zoology             3.21    Spring


Example Overview

This program is similar to the main example program in Example 2.1 because both use a SET statement to vertically combine the observations from three data sets. This example differs because instead of concatenating the observations from one data set after another, it interleaves the observations in SPRINGRADS, SUMMERGRADS, and FALLGRADS in the order that is specified by the values of the variables in the BY statement. The two variables that control the interleaving are GPA and STUDENT.

Observations in output data set GPAGRADS are arranged in descending GPA order (highest to lowest). For identical GPAs, the observations are further arranged alphabetically by the student's last name.

As in Example 2.1, this DATA step applies the IN= data set option to each input data set to define temporary variables INSPRING, INSUMMER, and INFALL. IF-ELSE statements test the values of these variables to determine the data set source for the observation that is currently being processed and to assign values to new variable GRADUATED.

Each of the three data sets contains the same three variables. Therefore, ALLGRADS contains these three variables plus new variable GRADUATED.

Three PROC SORT steps precede the DATA step in which the observations in each of the three data sets are sorted in the order in which they will be interleaved.

If a data set has been indexed by the variables in the BY statement in the DATA step, you might not need to sort the data set. This depends on whether simple and/or composite indexes have been defined prior to the step. For more information about when indexes are used and not used in accessing observations, see your SAS documentation. In this example, none of the data sets are indexed by GPA and STUDENT.

Program

Sort the observations in the three data sets in the order in which you want to interleave them.

Concatenate three data sets. Add the IN= data set option to each input data set so that the data set origin of each observation can be determined.

Specify the order in which to interleave the observations from the three data sets. Define GRADUATED and assign it values based on the origin of the observation.

proc sort data=springgrads;
  by descending gpa student;
run;
proc sort data=summergrads;
  by descending gpa student;
run;
proc sort data=fallgrads;
  by descending gpa student;
run;
data allgrads;
  set springgrads(in=inspring)
      summergrads(in=insummer)
      fallgrads(in=infall);

  by descending gpa student;


  attrib graduated length=$6 label='Semester Graduated';
  if inspring then graduated='Spring';
  else if insummer then graduated='Summer';
  else if infall then graduated='Fall';
run;

Related Technique

The following PROC SQL step creates a table equivalent to the data set that was created by the DATA step in the main example. It concatenates three queries by using the OUTER UNION set operator and the CORR keyword. (CORR is the short version of CORRESPONDING.) Each query creates the same new column, GRADUATED, whose values reflect the origin of the row.

The OUTER UNION set operator concatanates tables. The CORR keyword matches and aligns in the output table the same-named columns that were extracted from the two input tables. The ORDER BY clause arranges the rows in the output table.

The ORDER BY clause interleaves the rows so that the rows in table GPAGRADS are arranged from highest GPA to lowest GPA. The DESC keyword arranges the values of GPA in descending order. For identical values of GPA, the ORDER BY clause arranges the rows arranged alphabetically by the student's last name (STUDENT).

Note that the keyword DESC follows the column name that is to be arranged in descending order. In BY statements, the keyword DESCENDING precedes the variable name that is to be arranged in descending order.

The only difference between this PROC SQL step and the PROC SQL step in Related Technique 1 of Example 2.1 is the addition of the ORDER BY clause. For details about all other statements in the step, see that example.

Repeat the SELECT clauses from Related Technique 1.

Arrange the rows in table GPAGRADS from highest GPA to lowest GPA and within identical GPAs, arrange the rows alphabetically by the student's last name.

             proc sql;
               create table gpagrads as
                 select *,
                        'Spring' as graduated length=6
                            label='Semester Graduated'
                     from springgrads
                   outer union corr
                 select *,
                        'Summer' as graduated length=6
                            label='Semester Graduated'
                     from summergrads
                   outer union corr
                 select *,
                        'Fall' as graduated length=6
                             label='Semester Graduated'
                    from fallgrads
                 order by gpa desc, student;



             quit;

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
18.119.192.55