Combine three tables that do not all share a common column. Each table has one column in common with a second table, but not with the third. Use this relationship to combine all three tables. Compute a new column by using information from two of the tables.
Featured Step | PROC SQL |
Featured Step Options and Statements | Compound WHERE clause |
Related Technique | Hash object in the DATA step |
Table MAINTENANCE contains a list of maintenance tasks performed on four vehicles identified by VEHICLE_ID. Table VEHICLES contains a list of customer vehicles. Table MAINTCHARGES contains a list of tasks identified by TASK_CODE and the charge for each task.
MAINTENANCE vehicle_ task_ Obs id code discount mechanic 1 195331 M001 0.00 REW 2 195331 M005 0.05 REW 3 321551 M003 0.08 PFG 4 371616 M006 0.12 KJH 5 371616 M003 0.00 JNB 6 911192 M002 0.00 TRA VEHICLES vehicle_ Obs id type customer 1 195331 Minivan Lee, HG 2 152843 Sedan Gomez, UR 3 321551 SUV Carlson, BG 4 430912 Sedan Quinn, IP 5 371616 Minivan Monte, YR 6 843200 Hatchback Weeks, CA 7 911192 SUV Lane, NH
Column VEHICLE_ID is found only in tables MAINTENANCE and VEHICLES. Column TASK_CODE is found only in tables MAINTENANCE and MAINTCHARGES.
MAINTCHARGES task_ Obs code task_desc charge 1 M001 Oil change $25.00 2 M002 Filters $30.00 3 M003 Tire rotation $40.00 4 M004 Brake pads $100.00 5 M005 Radiator flush $100.00 6 M006 Battery replacement $120.00
Output 3.8 MAINTBILLS TableExample 3.8 MAINTBILLS Table Created with PROC SQL ehicle_ task_ Obs id customer type code task_desc cost 1 195331 Lee, HG Minivan M001 Oil change $25.00 2 195331 Lee, HG Minivan M005 Radiator flush $95.00 3 321551 Carlson, BG SUV M003 Tire rotation $36.80 4 371616 Monte, YR Minivan M003 Tire rotation $40.00 5 371616 Monte, YR Minivan M006 Battery replacement $105.60 6 911192 Lane, NH SUV M002 Filters $30.00 |
The following PROC SQL step demonstrates how to combine three tables that do not have a matching column common to the three tables. Each table has one column in common with a second table, but not with the third. The tables are combined with a compound WHERE clause, and new columns are computed that combine columns from different tables.
Data set MAINTENANCE contains a list of maintenance tasks performed on four vehicles identified by VEHICLE_ID. Data set VEHICLES contains a list of customer vehicles. Data set MAINTCHARGES contains a list of tasks identified by TASK_CODE and their charges.
The objective of this program is to join the MAINTENANCE, VEHICLES, and MAINTCHARGES tables to find vehicle information and final costs for each task in MAINTENANCE. The MAINTENANCE table can be linked to the VEHICLES table by column VEHICLE_ID, and it can be linked to the MAINTCHARGES table by column TASK_CODE.
As a result of the join, all columns from all three tables are available to process. By joining MAINTENANCE and MAINTCHARGES, you can compute the final cost for the maintenance task by multiplying DISCOUNT and CHARGES. By joining MAINTENANCE and VEHICLES, you can find information about each vehicle serviced.
Vehicles 430912 and 843200 do not have any maintenance activity and do not have rows in the MAINTENANCE table. Also, no vehicles in MAINTENANCE have task M004. Therefore, no rows for these vehicles or this task is present in output table MAINTBILLS because no conditions that include these vehicles or task satisfy the compound WHERE clause.
Create table MAINTBILLS. Select specific columns from the three input tables. Specify the origin of the columns by preceding the column name with an alias for those columns in more than one table. Compute a new column by using column CHARGE selected from MAINTCHARGES and column DISCOUNT selected from MAINTENANCE. Identify the three input tables. Assign an alias to each table.
Join the tables with a compound WHERE clause. Precede each column name with the alias for the table source. Arrange the rows in table MAINTBILLS.
proc sql; create table maintbills as select m.vehicle_id, customer, type, m.task_code, task_desc, (charge-discount*charge) as cost format=dollar8.2 from maintenance as m, vehicles as v, maintcharges as c where m.vehicle_id=v.vehicle_id and m.task_code=c.task_code order by m.vehicle_id, m.task_code;
The following program creates a data set equivalent to the table that was created in the main example. It creates two hash objects: V loaded from data set VEHICLES and C loaded from data set MAINTCHARGES. The DATA step reads data set MAINTENANCE. For each observation in MAINTENANCE, the DATA step looks up information in hash object V based on the values of VEHICLE_ID. It also looks up information in hash object C based on the values of TASK_CODE.
When each hash lookup finds a match, SAS copies the data part of the hash object into the variables that are specified by its DEFINEDATA method. When a match is found in both hash objects, an assignment statement computes COST.
Because the SAS compiler cannot detect the data variable assignments that the hash objects make, you will see notes in the SAS log about uninitialized variables unless you define the variables with SAS language statements. This example defines the four items in the two hash objects as data variables by supplying them as arguments to the CALL MISSING function.
An improvement to the DATA step would be to add code that handles the situation when matches are not found in one or both of the hash objects. In this example, each observation in MAINTENANCE finds a match in both hash objects.
Create data set MAINTBILLS. Define the variables saved in MAINTBILLS and used in the DATA step.
On the first iteration of the DATA step, create, name, and specify attributes of the two hash objects. Create hash object V. Load V with data from data set VEHICLES. Specify one key variable and two data variables.
Create hash object C. Load C with data from data set MAINTCHARGES. Specify one key variable and two data variables.
Prevent SAS notes about uninitialized variables that are data items in the hash objects.
Read data set MAINTENANCE. Determine whether the current values of VEHICLE_ID and TASK_CODE in MAINTENANCE have matches in the two hash objects. Save the return codes of the search results in two different variables. When both lookups succeed in finding a match, compute COST.
data maintbills; attrib vehicle_id length=8 customer length=$15 type length=$9 task_code length=$8 task_desc length=$20 cost length=8 format=dollar10.2 charge length=8 discount length=8; keep vehicle_id customer type task_code task_desc cost; if _n_=1 then do; declare hash v(dataset:"work.vehicles"); v.defineKey('vehicle_id'), v.definedata('type','customer'), v.definedone(); declare hash c(dataset:"work.maintcharges"); c.defineKey('task_code'), c.definedata('task_desc','charge'), c.definedone(); call missing(type, customer, task_desc, charge); end; set maintenance; rcv=v.find(key: vehicle_id); rcc=c.find(key: task_code); if rcv=0 and rcc=0 then cost=charge-discount*charge; run;
3.145.40.189