Example 3.10 Generating Every Combination of Rows between Tables Based on a Common Column

Goal

Combine two tables that have a common column where the common column has duplicate values in both tables. Produce a table that contains all possible combinations of the rows where the values from the common column match.

Example Features

Featured StepPROC SQL
Featured Step Options and StatementsFull join with WHERE clause
Related TechniqueHash iterator object in the DATA step

Input Tables

Table MEETINGS contains events to be held on two days and the expected attendance for each event. Table MEETINGROOMS contains meeting rooms that are available for the same two days. The column in common between the two tables is MEETINGDATE with multiple occurrences of the two values in each of the tables.

                            MEETINGS

                                                       approx_
Obs    meetingdate               event               attendance
 1     09/14/2009     Civil Engineering Seminar 1         25
 2     09/14/2009     Economics Panel Discussion         150
 3     09/14/2009     Psychology Working Group            15
 4     09/14/2009     Statistics Users Group              45
 5     09/15/2009     Civil Engineering Seminar 2         25
 6     09/15/2009     Energy Solutions Committee          50
 7     09/15/2009     Language Arts Teachers              60
 8     09/15/2009     Management Certification            25
 9     09/15/2009     Transit Planning Hearing            40

              MEETINGROOMS

                       room_
Obs    meetingdate     number    capacity
  1    09/14/2009      A140         100
  2    09/14/2009      A020          40
  3    09/14/2009      B200         300
  4    09/14/2009      B220          75
  5    09/14/2009      C030          15
  6    09/14/2009      C125          30
  7    09/15/2009      A140         100
  8    09/15/2009      A120          25
  9    09/15/2009      A200         300
 10    09/15/2009      B110          50
 11    09/15/2009      B220          25
 12    09/15/2009      C050          25
 13    09/15/2009      C070          10
 14    09/15/2009      C125          20

Resulting Table

Output 3.10 POSSIBLE_LOCS Table

        Example 3.10 POSSIBLE_LOCS Table Created with PROC SQL

                                              approx_  room_
Obs meetingdate            event            attendance number  capacity
  1 09/14/2009  Civil Engineering Seminar 1      25     A020       40
  2 09/14/2009  Civil Engineering Seminar 1      25     A140      100
  3 09/14/2009  Civil Engineering Seminar 1      25     B200      300
  4 09/14/2009  Civil Engineering Seminar 1      25     B220       75
  5 09/14/2009  Civil Engineering Seminar 1      25     C125       30
  6 09/14/2009  Economics Panel Discussion      150     B200      300
  7 09/14/2009  Psychology Working Group         15     A020       40
  8 09/14/2009  Psychology Working Group         15     A140      100
  9 09/14/2009  Psychology Working Group         15     B200      300
 10 09/14/2009  Psychology Working Group         15     B220       75
 11 09/14/2009  Psychology Working Group         15     C030       15
 12 09/14/2009  Psychology Working Group         15     C125       30
 13 09/14/2009  Statistics Users Group           45     A140      100
 14 09/14/2009  Statistics Users Group           45     B200      300
 15 09/14/2009  Statistics Users Group           45     B220       75
 16 09/15/2009  Civil Engineering Seminar 2      25     A120       25
 17 09/15/2009  Civil Engineering Seminar 2      25     A140      100
 18 09/15/2009  Civil Engineering Seminar 2      25     A200      300
 19 09/15/2009  Civil Engineering Seminar 2      25     B110       50
 20 09/15/2009  Civil Engineering Seminar 2      25     B220       25
 21 09/15/2009  Civil Engineering Seminar 2      25     C050       25
 22 09/15/2009  Energy Solutions Committee       50     A140      100
 23 09/15/2009  Energy Solutions Committee       50     A200      300
 24 09/15/2009  Energy Solutions Committee       50     B110       50
 25 09/15/2009  Language Arts Teachers           60     A140      100
 26 09/15/2009  Language Arts Teachers           60     A200      300
 27 09/15/2009  Management Certification         25     A120       25
 28 09/15/2009  Management Certification         25     A140      100
 29 09/15/2009  Management Certification         25     A200      300
 30 09/15/2009  Management Certification         25     B110       50
 31 09/15/2009  Management Certification         25     B220       25
 32 09/15/2009  Management Certification         25     C050       25
 33 09/15/2009  Transit Planning Hearing         40     A140      100
 34 09/15/2009  Transit Planning Hearing         40     A200      300
 35 09/15/2009  Transit Planning Hearing         40     B110       50


Example Overview

This example uses PROC SQL to produce all possible combinations of the rows from two tables that match on a column common to the two tables. And from the product of the two tables, it selects rows that meet a specific condition.

The goal of the program is to create a list of suitable meeting rooms for several events on two days. Table MEETINGS stores event information. Table MEETINGROOMS stores meeting room information. The compound WHERE clause in the SELECT statement matches the two tables by MEETINGDATE, which is the column in common, and it selects matches based on whether the meeting room has the capacity for the event.

Program

Create table POSSIBLE_LOCS. Select all columns from MEETINGS, which has the alias m, and select two columns from MEETINGDATE. Combine the two tables. Specify an alias for each table. Select rows where MEETINGDATE matches. Produce all combinations of rows that match on MEETINGDATE. Select from the rows that match on MEETINGDATE those rows that pass the comparison of APPROX_ATTENDANCE and CAPACITY. Order the rows in the output table.

proc sql;
  create table possible_locs as
    select m.*,
           room_number, capacity

      from meetings m, meetingrooms r

      where m.meetingdate=r.meetingdate and

            approx_attendance le capacity



     order by meetingdate, event, room_number;
quit;

Related Technique

This related technique uses a hash iterator object in a DATA step to find all the possible combinations of observations between the two data sets MEETINGS and MEETINGROOMS that match by meeting date. The DATA step selects matches that have a value of CAPACITY at least equal to the value of APPROX_ATTENDANCE. It produces a data set equivalent to the table in the main example.

Both data sets in this example are small and each easily fits in memory. A data set that is loaded into a hash table is read only once. Retrieval from memory is fast. The hash table in this example is used like a dynamic array, and it can store an indefinite amount of data.

The DATA step on its first iteration declares hash object ROOMS as a hash iterator object and loads data set MEETINGROOMS into ROOMS. One key is defined and multiple data items for each key value are allowed.

After reading an observation from MEETINGS, the DATA step applies the FIRST method to return the first data item in hash object ROOMS. The DATA step then uses a DO WHILE loop to apply the NEXT method to move through the hash object. When the DATA step detects the end of the hash object, the iteration of the DATA step is complete.

To make key variables available for output and for processing by SAS language statements, you must define them as data variables. The DATA step defines key variable ROOMDATE as data with the DEFINEDATA method so that its values are available for testing in the DO WHILE group.

Because the SAS compiler cannot detect the key and data variable assignments that the hash object and the hash iterator make, you will see notes in the SAS log about uninitialized variables unless you define the variables with SAS language statements. This example defines the three items in the hash object as data variables by supplying them as arguments to the CALL MISSING function.

Create data set POSSIBLE_LOCS. Specify attributes for the variables that are processed in this DATA step.

On the first iteration of the DATA step, define hash object ROOMS. Load data set MEETINGROOMS into ROOMS. Rename MEETINGDATE so that its values in the hash table can be compared to MEETINGDATE in data set MEETINGS. Allow multiple data items for each key value. Specify retrieval of items from ROOMS in ascending order of the key values. Declare that ROOMS is a hash iterator object. Define the key in ROOMS. Define all the variables in MEETINGROOMS as data variables, which includes key ROOMDATE. Close the definition of hash object ROOMS. Prevent SAS notes about uninitialized variables.

On each iteration of the DATA step, read one observation from MEETINGS. Drop variables not needed in the output data set, including the renamed version of MEETINGDATE. Return the first set of data values in ROOMS, which are in ascending order by the key values. Assign to variable RC the return code that indicates whether the FIRST method was successful. Iterate through ROOMS. Stop the loop when the return code from the NEXT method is not 0, which indicates that the end of ROOMS has been reached. Output an observation when the set of currently retrieved items meets the conditions in the IF statement. Because the values of ROOMDATE are in ascending order in ROOMS, stop processing the DO WHILE loop if the value of ROOMDATE is greater than the value of MEETINGDATE. Return the next set of values from ROOMS. Assign to variable RC the return code that indicates whether the NEXT method was successful.

data possible_locs;
  attrib meetingdate length=8 format=mmddyy10.
         event length=$30
         approx_attendance length=8
         roomdate length=8 format=mmddyy10.
         capacity length=8
         room_number length=$4;
  if _n_=1 then do;

    declare hash rooms(dataset:
     'work.meetingrooms(rename=(meetingdate=roomdate))',



                        multidata: 'yes'

                        ordered: 'a'),

    declare hiter iter('rooms'),

    rooms.definekey('roomdate'),
    rooms.definedata(all:'yes'),
    rooms.definedone();

    call missing(roomdate, room_number, capacity);

  end;
  set meetings;


  drop rc roomdate;

  rc=iter.first();


  do while (rc=0);


    if meetingdate=roomdate and
             approx_attendance le capacity then output;

      if roomdate > meetingdate then leave;

      rc=iter.next();

  end;
run;

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
18.227.111.58