Example 3.6 Combining Rows When There Is No Common Column

Goal

Combine rows from two tables based on specific criteria, even when there is no column common to the two tables.

Example Features

Featured StepPROC SQL
Featured Step Options and StatementsFull join, WHERE clause with BETWEEN operator
Related TechniqueDATA step, multiple SET statements, POINT= and NOBS= options

Input Tables

Tables BUILD_PROJECTS and BUILD_BILLS have no common column. The dates associated with each project in BUILD_PROJECTS do not overlap with any other project.

          BUILD_PROJECTS

Obs  start_date    end_date  project
 1   01/08/2010  01/27/2010  Basement
 2   02/01/2010  02/12/2010  Frame
 3   02/15/2010  02/20/2010  Roofing
 4   02/22/2010  02/27/2010  Plumb
 5   03/02/2010  03/05/2010  Wire
 6   03/08/2010  03/29/2010  Brick

           BUILD_BILLS

             completion_
Obs  workid     date        charge
 1   1234    01/18/2010    $944.80
 2   2225    02/18/2010   $1280.94
 3   3879    03/04/2010    $888.90
 4   8888    03/19/2010   $2280.87

Resulting Table

Output 3.6 BUILD_COMPLETE Table

 Example 3.6 BUILD_COMPLETE Table Created with PROC SQL

                                           completion_
 Obs start_date   end_date project  workid    date      charge

  1  01/08/2010 01/27/2010 Basement  1234  01/18/2010  $944.80
  2  02/15/2010 02/20/2010 Roofing   2225  02/18/2010 $1280.94
  3  03/02/2010 03/05/2010 Wire      3879  03/04/2010  $888.90
  4  03/08/2010 03/29/2010 Brick     8888  03/19/2010 $2280.87


Example Overview

The PROC SQL step in this example shows how you can combine two tables that do not have a variable in common.

Table BUILD_PROJECTS contains a list of start and end dates for six home repair projects. Table BUILD_BILLS contains a list of four charges with a completion date. The two data sets do not have a variable in common. None of the start and end dates of one project overlap with another.

The objective of this program is to bill charges to the correct phase of a construction project. A row in BUILD_BILLS is matched to a row in BUILD_PROJECTS when the completion date falls within the start and end dates of a project.

Conceptually, the PROC SQL join results in an internal table that matches every row in BUILD_PROJECTS with every row in BUILD_BILLS. From that internal table, the WHERE clause outputs the rows with a value of COMPLETION_DATE that is between START_DATE and END_DATE.

Program

Create table BUILD_COMPLETE. Select all columns from both tables. Combine every row from BUILD_PROJECTS with every row from BUILD_BILLS. Select from the joined rows those where the completion date is between the start and end date of a project.

proc sql;
  create table build_complete as
    select *
      from build_projects, build_bills

      where completion_date between
              start_date and end_date;

quit;

Related Technique

The DATA step in this related technique creates a data set equivalent to the table that was created by the PROC SQL step in the main example. It programmatically combines the two data sets by using an iterative DO-UNTIL loop and SET statements that control how the two input data sets are read. It does not use MERGE and BY statements to do this.

For each completion date in BUILD_BILLS, the DATA step looks for a range of dates in BUILD_PROJECTS in which the value of COMPLETION_DATE fits. All observations from BUILD_BILLS are output to BUILD_COMPLETE. Not necessarily all observations from BUILD_PROJECTS are output to BUILD_COMPLETE.

Each iteration of the DATA step reads one observation from the first data set, BUILD_BILLS, and each iteration also potentially reads all observations from the second, BUILD_COMPLETE. It starts with a simple SET statement that identifies the first data set. The iterative DO-UNTIL loop that follows processes the second data set.

The POINT= and OBS= data set options in the SET statement for the second data set access observations in BUILD_BILLS sequentially by observation number until a match is found or until all observations in BUILD_BILLS have been read. The POINT= option references the index variable I whose value directly accesses each observation in BUILD_BILLS by observation number. The NOBS= option assigns the number of observations in BUILD_BILLS to the variable N. This assignment occurs before the DATA step executes.

The iterative DO-UNTIL loop iterates once for each observation in BUILD_BILLS until a match is found. If a match is not found, the loop stops when it has iterated the number of times equal to the value saved in N.

When a match is found, the DATA step writes the matching observation to BUILD_COMPLETE, and it sets variable FOUND to 1. The condition of FOUND=1 stops the iterative DO-UNTIL loop so that no more observations are read from BUILD_BILLS. Variable FOUND is initialized to zero prior to the DO-UNTIL loop.

This program might be inefficient if the second data set is large because potentially this large data set could be read over and over again.

The program could be improved to include programming statements that write messages to the SAS log when an observation does not have a match in BUILD_BILLS or when multiple matches are found. Neither of these conditions exists in this example.

Create BUILD_COMPLETE. Read an observation from BUILD_PROJECTS.

Initialize FOUND whose value is used to control execution of the iterative DO-UNTIL loop.

Read observations from BUILD_BILLS until a match is found or until all observations have been read. Directly access an observation in BUILD_BILLS by pointing to the Ith observation. Save the total number of observations in BUILD_BILLS in variable N. Set the upper index value of the iterative DO-UNTIL loop to N. When the condition is met, set FOUND to 1 so that the iterative DO-UNTIL loop stops, and then write the matching observation to BUILD_COMPLETE.

data build_complete;
  set build_projects;

  drop found;
  found=0;
  do i=1 to n until (found);
    set build_bills point=i nobs=n;


    if start_date le completion_date le end_date
            then do;
      found=1;
      output;
    end;
  end;
run;

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
3.12.146.87