Select the unique rows among two or more tables based on the value of a specific column.
Featured Step | PROC SQL |
Featured Step Options and Statements | UNION set operator |
Related Technique | DATA step PROC SORT, NODUPKEY option |
Tables MARCHRUNNERS, APRILRUNNERS, and MAYRUNNERS contain the names and running times for the runners in the month that is specified in the table name.
MARCHRUNNERS Obs runner racetime 1 Sanchez, SO 35:49.4 2 Flores, RX 38:32.8 3 Martinez, KF 36:12.3 4 Hayes, MU 35:24.7 5 Carter, RT 39:38.1 6 Rogers, LX 40:22.1 7 Clark, SQ 40:05.8 8 Taylor, TP 38:30.2 9 Smith, JD 37:03.5 10 Green, TF 37:50.0 11 Brown, GL 35:59.2 12 Lee, AO 39:36.4 13 Jones, LW 37:38.3 14 Smith, KB 38:10.3
APRILRUNNERS Obs runner racetime 1 Martinez, KF 37:55.1 2 Rogers, LX 37:21.7 3 Smith, JD 38:32.7 4 Lee, AO 37:18.8 5 Jones, LW 35:04.1 6 Patterson, BB 38:26.1 7 Smith, KB 37:27.1 8 Alexander, SW 38:55.0 9 Young, SX 37:14.8
The three input tables are also used in Examples 2.5 and 2.6.
MAYRUNNERS Obs runner racetime 1 Lee, AO 35:36.0 2 Williams, JO 40:26.7 3 Martin, JF 39:12.4 4 Jones, LW 36:37.6 5 Young, MX 39:17.6 6 Brown, NV 35:37.6 7 Patterson, BB 36:09.8 8 Johnson, AY 37:37.9 9 Cox, SR 37:16.6 10 Smith, KB 40:12.7 11 Martin, MG 35:37.9 12 Johnson, GR 38:14.6 13 Alexander, SW 36:36.8 14 Martinez, WL 37:06.6 15 Miller, JN 36:40.3 16 Young, SX 37:15.4 17 Wilson, WU 38:53.1
Output 2.4 SPRINGRUNNERS TableExample 2.4 SPRINGRUNNERS Table Created by PROC SQL Obs runner 1 Alexander, SW 2 Brown, GL 3 Brown, NV 4 Carter, RT 5 Clark, SQ 6 Cox, SR 7 Flores, RX 8 Green, TF 9 Hayes, MU 10 Johnson, AY 11 Johnson, GR 12 Jones, LW 13 Lee, AO 14 Martin, JF 15 Martin, MG 16 Martinez, KF 17 Martinez, WL 18 Miller, JN 19 Patterson, BB 20 Rogers, LX 21 Sanchez, SO 22 Smith, JD 23 Smith, KB 24 Taylor, TP 25 Williams, JO 26 Wilson, WU 27 Young, MX 28 Young, SX |
The following PROC SQL step combines three tables vertically. The step performs three queries that each select the same column and it returns one row for each unique value of the column.
The UNION operator that connects the queries concatenates the rows that the three queries return. The process that PROC SQL follows is to combine the first two queries, and then combine that result with the third query.
A UNION operator that is used without the ALL keyword removes duplicate rows from the output table. PROC SQL determines uniqueness of a row by the values of the columns in the SELECT clauses.
The program combines tables MARCHRUNNERS, APRILRUNNERS, and MAYRUNNERS by column RUNNER. Each input table records the names and running times for the runners in the month that is specified in the table name.
The goal of the program is to find the names of all runners during the spring season from March to May. Some runners ran only once while others ran two or three times. Output table SPRINGRUNNERS should contain the names of all the runners with no duplicate rows for those running two or three times. There were 14 runners in March, 9 in April, and 17 in May. Output table SPRINGRUNNERS contains 28 rows.
PROC SQL determines uniqueness of rows by the columns in the SELECT clauses. For example, if you expanded the list of columns selected by coding SELECT * instead of SELECT RUNNER, PROC SQL would select all rows from each of the three input tables because none of the repeat runners had identical race times.
Even though the CREATE TABLE statement did not include the ORDER BY clause, the rows in the output table are arranged alphabetically by the runner's name. With the ALL keyword omitted, PROC SQL first concatenates all the rows that are returned by the three queries. Next, it sorts the rows by the column in common, and last, it removes duplicate rows. The result is that the output table is ordered by the runner's name.
Create table SPRINGRUNNERS. Specify three queries. Select the same column from each of the three tables. Find the unique values of the column in the three queries by using the UNION set operator.
proc sql; create table springrunners as select runner from marchrunners union select runner from aprilrunners union select runner from mayrunners; quit;
The following program creates a data set equivalent to the table that was produced in the main example with PROC SQL. A DATA step concatenates the three input data sets. Next, PROC SORT sorts the new data set by RUNNER. With the PROC SORT option NODUPKEY specified, PROC SORT saves unique occurrences of variable RUNNER in the output data set SPRINGRUNNERS.
Concatenate the three input data sets. Save only variable RUNNER in data set ALLENTRIES.
Sort data set ALLENTRIES and save the sorted observations in SPRINGRUNNERS. Do not save duplicate values of RUNNER in SPRINGRUNNERS. Sort the data set by RUNNER.
data allentries; set marchrunners aprilrunners mayrunners; keep runner; run; proc sort data=allentries out=springrunners nodupkey; by runner; run;
18.191.97.85