Example 2.4 Selecting Unique Rows When Concatenating Tables

Goal

Select the unique rows among two or more tables based on the value of a specific column.

Example Features

Featured StepPROC SQL
Featured Step Options and StatementsUNION set operator
Related TechniqueDATA step PROC SORT, NODUPKEY option

Input Tables

Tables MARCHRUNNERS, APRILRUNNERS, and MAYRUNNERS contain the names and running times for the runners in the month that is specified in the table name.

             MARCHRUNNERS

   Obs  runner         racetime
     1  Sanchez, SO    35:49.4
     2  Flores, RX     38:32.8
     3  Martinez, KF   36:12.3
     4  Hayes, MU      35:24.7
     5  Carter, RT     39:38.1
     6  Rogers, LX     40:22.1
     7  Clark, SQ      40:05.8
     8  Taylor, TP     38:30.2
     9  Smith, JD      37:03.5
    10  Green, TF      37:50.0
    11  Brown, GL      35:59.2
    12  Lee, AO        39:36.4
    13  Jones, LW      37:38.3
    14  Smith, KB      38:10.3

        APRILRUNNERS

Obs  runner           racetime
 1   Martinez, KF     37:55.1
 2   Rogers, LX       37:21.7
 3   Smith, JD        38:32.7
 4   Lee, AO          37:18.8
 5   Jones, LW        35:04.1
 6   Patterson, BB    38:26.1
 7   Smith, KB        37:27.1
 8   Alexander, SW    38:55.0
 9   Young, SX        37:14.8

The three input tables are also used in Examples 2.5 and 2.6.

         MAYRUNNERS

Obs  runner           racetime
  1  Lee, AO          35:36.0
  2  Williams, JO     40:26.7
  3  Martin, JF       39:12.4
  4  Jones, LW        36:37.6
  5  Young, MX        39:17.6
  6  Brown, NV        35:37.6
  7  Patterson, BB    36:09.8
  8  Johnson, AY      37:37.9
  9  Cox, SR          37:16.6
 10  Smith, KB        40:12.7
 11  Martin, MG       35:37.9
 12  Johnson, GR      38:14.6
 13  Alexander, SW    36:36.8
 14  Martinez, WL     37:06.6
 15  Miller, JN       36:40.3
 16  Young, SX        37:15.4
 17  Wilson, WU       38:53.1

Resulting Table

Output 2.4 SPRINGRUNNERS Table

       Example 2.4 SPRINGRUNNERS Table Created by PROC SQL

                      Obs    runner

                        1    Alexander, SW
                        2    Brown, GL
                        3    Brown, NV
                        4    Carter, RT
                        5    Clark, SQ
                        6    Cox, SR
                        7    Flores, RX
                        8    Green, TF
                        9    Hayes, MU
                       10    Johnson, AY
                       11    Johnson, GR
                       12    Jones, LW
                       13    Lee, AO
                       14    Martin, JF
                       15    Martin, MG
                       16    Martinez, KF
                       17    Martinez, WL
                       18    Miller, JN
                       19    Patterson, BB
                       20    Rogers, LX
                       21    Sanchez, SO
                       22    Smith, JD
                       23    Smith, KB
                       24    Taylor, TP
                       25    Williams, JO
                       26    Wilson, WU
                       27    Young, MX
                       28    Young, SX


Example Overview

The following PROC SQL step combines three tables vertically. The step performs three queries that each select the same column and it returns one row for each unique value of the column.

The UNION operator that connects the queries concatenates the rows that the three queries return. The process that PROC SQL follows is to combine the first two queries, and then combine that result with the third query.

A UNION operator that is used without the ALL keyword removes duplicate rows from the output table. PROC SQL determines uniqueness of a row by the values of the columns in the SELECT clauses.

The program combines tables MARCHRUNNERS, APRILRUNNERS, and MAYRUNNERS by column RUNNER. Each input table records the names and running times for the runners in the month that is specified in the table name.

The goal of the program is to find the names of all runners during the spring season from March to May. Some runners ran only once while others ran two or three times. Output table SPRINGRUNNERS should contain the names of all the runners with no duplicate rows for those running two or three times. There were 14 runners in March, 9 in April, and 17 in May. Output table SPRINGRUNNERS contains 28 rows.

PROC SQL determines uniqueness of rows by the columns in the SELECT clauses. For example, if you expanded the list of columns selected by coding SELECT * instead of SELECT RUNNER, PROC SQL would select all rows from each of the three input tables because none of the repeat runners had identical race times.

Even though the CREATE TABLE statement did not include the ORDER BY clause, the rows in the output table are arranged alphabetically by the runner's name. With the ALL keyword omitted, PROC SQL first concatenates all the rows that are returned by the three queries. Next, it sorts the rows by the column in common, and last, it removes duplicate rows. The result is that the output table is ordered by the runner's name.

Program

Create table SPRINGRUNNERS. Specify three queries. Select the same column from each of the three tables. Find the unique values of the column in the three queries by using the UNION set operator.

proc sql;
  create table springrunners as
     select runner from marchrunners
       union
     select runner from aprilrunners
       union
    select runner from mayrunners;
quit;

Related Technique

The following program creates a data set equivalent to the table that was produced in the main example with PROC SQL. A DATA step concatenates the three input data sets. Next, PROC SORT sorts the new data set by RUNNER. With the PROC SORT option NODUPKEY specified, PROC SORT saves unique occurrences of variable RUNNER in the output data set SPRINGRUNNERS.

Concatenate the three input data sets. Save only variable RUNNER in data set ALLENTRIES.

Sort data set ALLENTRIES and save the sorted observations in SPRINGRUNNERS. Do not save duplicate values of RUNNER in SPRINGRUNNERS. Sort the data set by RUNNER.

data allentries;
  set marchrunners aprilrunners mayrunners;
  keep runner;
run;
proc sort data=allentries out=springrunners

           nodupkey;

  by runner;
run;

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
18.191.97.85