Example 3.8 Combining Multiple Data Sets without a Variable Common to All the Data Sets

Goal

Combine three tables that do not all share a common column. Each table has one column in common with a second table, but not with the third. Use this relationship to combine all three tables. Compute a new column by using information from two of the tables.

Example Features

Featured StepPROC SQL
Featured Step Options and StatementsCompound WHERE clause
Related TechniqueHash object in the DATA step

Input Tables

Table MAINTENANCE contains a list of maintenance tasks performed on four vehicles identified by VEHICLE_ID. Table VEHICLES contains a list of customer vehicles. Table MAINTCHARGES contains a list of tasks identified by TASK_CODE and the charge for each task.

               MAINTENANCE

     vehicle_   task_
Obs     id      code   discount mechanic
 1    195331    M001     0.00     REW
 2    195331    M005     0.05     REW
 3    321551    M003     0.08     PFG
 4    371616    M006     0.12     KJH
 5    371616    M003     0.00     JNB
 6    911192    M002     0.00     TRA

              VEHICLES

     vehicle_
Obs    id      type        customer
 1   195331    Minivan     Lee, HG
 2   152843    Sedan       Gomez, UR
 3   321551    SUV         Carlson, BG
 4   430912    Sedan       Quinn, IP
 5   371616    Minivan     Monte, YR
 6   843200    Hatchback   Weeks, CA
 7   911192    SUV         Lane, NH

Column VEHICLE_ID is found only in tables MAINTENANCE and VEHICLES. Column TASK_CODE is found only in tables MAINTENANCE and MAINTCHARGES.

                MAINTCHARGES
     task_
Obs  code   task_desc              charge
 1   M001   Oil change             $25.00
 2   M002   Filters                $30.00
 3   M003   Tire rotation          $40.00
 4   M004   Brake pads            $100.00
 5   M005   Radiator flush        $100.00
 6   M006   Battery replacement   $120.00

Resulting Table

Output 3.8 MAINTBILLS Table

                  Example 3.8 MAINTBILLS Table Created with PROC SQL

       ehicle_                           task_
Obs      id      customer      type      code    task_desc                cost

 1     195331    Lee, HG       Minivan   M001    Oil change             $25.00
 2     195331    Lee, HG       Minivan   M005    Radiator flush         $95.00
 3     321551    Carlson, BG   SUV       M003    Tire rotation          $36.80
 4     371616    Monte, YR     Minivan   M003    Tire rotation          $40.00
 5     371616    Monte, YR     Minivan   M006    Battery replacement   $105.60
 6     911192    Lane, NH      SUV       M002    Filters                $30.00


Example Overview

The following PROC SQL step demonstrates how to combine three tables that do not have a matching column common to the three tables. Each table has one column in common with a second table, but not with the third. The tables are combined with a compound WHERE clause, and new columns are computed that combine columns from different tables.

Data set MAINTENANCE contains a list of maintenance tasks performed on four vehicles identified by VEHICLE_ID. Data set VEHICLES contains a list of customer vehicles. Data set MAINTCHARGES contains a list of tasks identified by TASK_CODE and their charges.

The objective of this program is to join the MAINTENANCE, VEHICLES, and MAINTCHARGES tables to find vehicle information and final costs for each task in MAINTENANCE. The MAINTENANCE table can be linked to the VEHICLES table by column VEHICLE_ID, and it can be linked to the MAINTCHARGES table by column TASK_CODE.

As a result of the join, all columns from all three tables are available to process. By joining MAINTENANCE and MAINTCHARGES, you can compute the final cost for the maintenance task by multiplying DISCOUNT and CHARGES. By joining MAINTENANCE and VEHICLES, you can find information about each vehicle serviced.

Vehicles 430912 and 843200 do not have any maintenance activity and do not have rows in the MAINTENANCE table. Also, no vehicles in MAINTENANCE have task M004. Therefore, no rows for these vehicles or this task is present in output table MAINTBILLS because no conditions that include these vehicles or task satisfy the compound WHERE clause.

Create table MAINTBILLS. Select specific columns from the three input tables. Specify the origin of the columns by preceding the column name with an alias for those columns in more than one table. Compute a new column by using column CHARGE selected from MAINTCHARGES and column DISCOUNT selected from MAINTENANCE. Identify the three input tables. Assign an alias to each table.

Join the tables with a compound WHERE clause. Precede each column name with the alias for the table source. Arrange the rows in table MAINTBILLS.

proc sql;
  create table maintbills as
    select m.vehicle_id, customer, type,
           m.task_code, task_desc,



           (charge-discount*charge) as cost
                         format=dollar8.2



      from maintenance as m,
           vehicles as v,
           maintcharges as c

      where m.vehicle_id=v.vehicle_id and
            m.task_code=c.task_code

      order by m.vehicle_id, m.task_code;

Related Technique

The following program creates a data set equivalent to the table that was created in the main example. It creates two hash objects: V loaded from data set VEHICLES and C loaded from data set MAINTCHARGES. The DATA step reads data set MAINTENANCE. For each observation in MAINTENANCE, the DATA step looks up information in hash object V based on the values of VEHICLE_ID. It also looks up information in hash object C based on the values of TASK_CODE.

When each hash lookup finds a match, SAS copies the data part of the hash object into the variables that are specified by its DEFINEDATA method. When a match is found in both hash objects, an assignment statement computes COST.

Because the SAS compiler cannot detect the data variable assignments that the hash objects make, you will see notes in the SAS log about uninitialized variables unless you define the variables with SAS language statements. This example defines the four items in the two hash objects as data variables by supplying them as arguments to the CALL MISSING function.

An improvement to the DATA step would be to add code that handles the situation when matches are not found in one or both of the hash objects. In this example, each observation in MAINTENANCE finds a match in both hash objects.

Create data set MAINTBILLS. Define the variables saved in MAINTBILLS and used in the DATA step.

On the first iteration of the DATA step, create, name, and specify attributes of the two hash objects. Create hash object V. Load V with data from data set VEHICLES. Specify one key variable and two data variables.

Create hash object C. Load C with data from data set MAINTCHARGES. Specify one key variable and two data variables.

Prevent SAS notes about uninitialized variables that are data items in the hash objects.

Read data set MAINTENANCE. Determine whether the current values of VEHICLE_ID and TASK_CODE in MAINTENANCE have matches in the two hash objects. Save the return codes of the search results in two different variables. When both lookups succeed in finding a match, compute COST.

data maintbills;
  attrib vehicle_id length=8
         customer length=$15
         type      length=$9
         task_code length=$8
         task_desc length=$20
         cost      length=8 format=dollar10.2
         charge    length=8
         discount  length=8;
   keep vehicle_id customer type task_code task_desc
        cost;
   if _n_=1 then do;

     declare hash v(dataset:"work.vehicles");
     v.defineKey('vehicle_id'),
     v.definedata('type','customer'),
     v.definedone();

     declare hash c(dataset:"work.maintcharges");
     c.defineKey('task_code'),
     c.definedata('task_desc','charge'),
     c.definedone();

     call missing(type, customer, task_desc, charge);

   end;

   set maintenance;
   rcv=v.find(key: vehicle_id);
   rcc=c.find(key: task_code);


   if rcv=0 and rcc=0 then cost=charge-discount*charge;

run;

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
3.145.40.189