Combine two tables that have a common column where the common column has duplicate values in both tables. Produce a table that contains all possible combinations of the rows where the values from the common column match.
Featured Step | PROC SQL |
Featured Step Options and Statements | Full join with WHERE clause |
Related Technique | Hash iterator object in the DATA step |
Table MEETINGS contains events to be held on two days and the expected attendance for each event. Table MEETINGROOMS contains meeting rooms that are available for the same two days. The column in common between the two tables is MEETINGDATE with multiple occurrences of the two values in each of the tables.
MEETINGS approx_ Obs meetingdate event attendance 1 09/14/2009 Civil Engineering Seminar 1 25 2 09/14/2009 Economics Panel Discussion 150 3 09/14/2009 Psychology Working Group 15 4 09/14/2009 Statistics Users Group 45 5 09/15/2009 Civil Engineering Seminar 2 25 6 09/15/2009 Energy Solutions Committee 50 7 09/15/2009 Language Arts Teachers 60 8 09/15/2009 Management Certification 25 9 09/15/2009 Transit Planning Hearing 40 MEETINGROOMS room_ Obs meetingdate number capacity 1 09/14/2009 A140 100 2 09/14/2009 A020 40 3 09/14/2009 B200 300 4 09/14/2009 B220 75 5 09/14/2009 C030 15 6 09/14/2009 C125 30 7 09/15/2009 A140 100 8 09/15/2009 A120 25 9 09/15/2009 A200 300 10 09/15/2009 B110 50 11 09/15/2009 B220 25 12 09/15/2009 C050 25 13 09/15/2009 C070 10 14 09/15/2009 C125 20
Output 3.10 POSSIBLE_LOCS TableExample 3.10 POSSIBLE_LOCS Table Created with PROC SQL approx_ room_ Obs meetingdate event attendance number capacity 1 09/14/2009 Civil Engineering Seminar 1 25 A020 40 2 09/14/2009 Civil Engineering Seminar 1 25 A140 100 3 09/14/2009 Civil Engineering Seminar 1 25 B200 300 4 09/14/2009 Civil Engineering Seminar 1 25 B220 75 5 09/14/2009 Civil Engineering Seminar 1 25 C125 30 6 09/14/2009 Economics Panel Discussion 150 B200 300 7 09/14/2009 Psychology Working Group 15 A020 40 8 09/14/2009 Psychology Working Group 15 A140 100 9 09/14/2009 Psychology Working Group 15 B200 300 10 09/14/2009 Psychology Working Group 15 B220 75 11 09/14/2009 Psychology Working Group 15 C030 15 12 09/14/2009 Psychology Working Group 15 C125 30 13 09/14/2009 Statistics Users Group 45 A140 100 14 09/14/2009 Statistics Users Group 45 B200 300 15 09/14/2009 Statistics Users Group 45 B220 75 16 09/15/2009 Civil Engineering Seminar 2 25 A120 25 17 09/15/2009 Civil Engineering Seminar 2 25 A140 100 18 09/15/2009 Civil Engineering Seminar 2 25 A200 300 19 09/15/2009 Civil Engineering Seminar 2 25 B110 50 20 09/15/2009 Civil Engineering Seminar 2 25 B220 25 21 09/15/2009 Civil Engineering Seminar 2 25 C050 25 22 09/15/2009 Energy Solutions Committee 50 A140 100 23 09/15/2009 Energy Solutions Committee 50 A200 300 24 09/15/2009 Energy Solutions Committee 50 B110 50 25 09/15/2009 Language Arts Teachers 60 A140 100 26 09/15/2009 Language Arts Teachers 60 A200 300 27 09/15/2009 Management Certification 25 A120 25 28 09/15/2009 Management Certification 25 A140 100 29 09/15/2009 Management Certification 25 A200 300 30 09/15/2009 Management Certification 25 B110 50 31 09/15/2009 Management Certification 25 B220 25 32 09/15/2009 Management Certification 25 C050 25 33 09/15/2009 Transit Planning Hearing 40 A140 100 34 09/15/2009 Transit Planning Hearing 40 A200 300 35 09/15/2009 Transit Planning Hearing 40 B110 50 |
This example uses PROC SQL to produce all possible combinations of the rows from two tables that match on a column common to the two tables. And from the product of the two tables, it selects rows that meet a specific condition.
The goal of the program is to create a list of suitable meeting rooms for several events on two days. Table MEETINGS stores event information. Table MEETINGROOMS stores meeting room information. The compound WHERE clause in the SELECT statement matches the two tables by MEETINGDATE, which is the column in common, and it selects matches based on whether the meeting room has the capacity for the event.
Create table POSSIBLE_LOCS. Select all columns from MEETINGS, which has the alias m, and select two columns from MEETINGDATE. Combine the two tables. Specify an alias for each table. Select rows where MEETINGDATE matches. Produce all combinations of rows that match on MEETINGDATE. Select from the rows that match on MEETINGDATE those rows that pass the comparison of APPROX_ATTENDANCE and CAPACITY. Order the rows in the output table.
proc sql; create table possible_locs as select m.*, room_number, capacity from meetings m, meetingrooms r where m.meetingdate=r.meetingdate and
approx_attendance le capacity order by meetingdate, event, room_number; quit;
This related technique uses a hash iterator object in a DATA step to find all the possible combinations of observations between the two data sets MEETINGS and MEETINGROOMS that match by meeting date. The DATA step selects matches that have a value of CAPACITY at least equal to the value of APPROX_ATTENDANCE. It produces a data set equivalent to the table in the main example.
Both data sets in this example are small and each easily fits in memory. A data set that is loaded into a hash table is read only once. Retrieval from memory is fast. The hash table in this example is used like a dynamic array, and it can store an indefinite amount of data.
The DATA step on its first iteration declares hash object ROOMS as a hash iterator object and loads data set MEETINGROOMS into ROOMS. One key is defined and multiple data items for each key value are allowed.
After reading an observation from MEETINGS, the DATA step applies the FIRST method to return the first data item in hash object ROOMS. The DATA step then uses a DO WHILE loop to apply the NEXT method to move through the hash object. When the DATA step detects the end of the hash object, the iteration of the DATA step is complete.
To make key variables available for output and for processing by SAS language statements, you must define them as data variables. The DATA step defines key variable ROOMDATE as data with the DEFINEDATA method so that its values are available for testing in the DO WHILE group.
Because the SAS compiler cannot detect the key and data variable assignments that the hash object and the hash iterator make, you will see notes in the SAS log about uninitialized variables unless you define the variables with SAS language statements. This example defines the three items in the hash object as data variables by supplying them as arguments to the CALL MISSING function.
Create data set POSSIBLE_LOCS. Specify attributes for the variables that are processed in this DATA step.
On the first iteration of the DATA step, define hash object ROOMS. Load data set MEETINGROOMS into ROOMS. Rename MEETINGDATE so that its values in the hash table can be compared to MEETINGDATE in data set MEETINGS. Allow multiple data items for each key value. Specify retrieval of items from ROOMS in ascending order of the key values. Declare that ROOMS is a hash iterator object. Define the key in ROOMS. Define all the variables in MEETINGROOMS as data variables, which includes key ROOMDATE. Close the definition of hash object ROOMS. Prevent SAS notes about uninitialized variables.
On each iteration of the DATA step, read one observation from MEETINGS. Drop variables not needed in the output data set, including the renamed version of MEETINGDATE. Return the first set of data values in ROOMS, which are in ascending order by the key values. Assign to variable RC the return code that indicates whether the FIRST method was successful. Iterate through ROOMS. Stop the loop when the return code from the NEXT method is not 0, which indicates that the end of ROOMS has been reached. Output an observation when the set of currently retrieved items meets the conditions in the IF statement. Because the values of ROOMDATE are in ascending order in ROOMS, stop processing the DO WHILE loop if the value of ROOMDATE is greater than the value of MEETINGDATE. Return the next set of values from ROOMS. Assign to variable RC the return code that indicates whether the NEXT method was successful.
data possible_locs; attrib meetingdate length=8 format=mmddyy10. event length=$30 approx_attendance length=8 roomdate length=8 format=mmddyy10. capacity length=8 room_number length=$4; if _n_=1 then do; declare hash rooms(dataset: 'work.meetingrooms(rename=(meetingdate=roomdate))', multidata: 'yes' ordered: 'a'), declare hiter iter('rooms'), rooms.definekey('roomdate'), rooms.definedata(all:'yes'), rooms.definedone(); call missing(roomdate, room_number, capacity); end; set meetings; drop rc roomdate; rc=iter.first(); do while (rc=0); if meetingdate=roomdate and approx_attendance le capacity then output; if roomdate > meetingdate then leave; rc=iter.next(); end; run;
18.227.111.58