Example 4.1 Performing a Simple Table Lookup

Goal

Find a match in a lookup table for the current value of the key column in the primary table. For observations in the primary table with no match in the lookup table, assign a default value to the lookup column.

Example Features

Featured StepPROC SQL
Featured Step Options and StatementsLeft join ON clause
Related Technique 1Hash object in the DATA step
Related Technique 2DATA step match-merge, MERGE statement with IN= option

Input Tables

Table SALARIES contains salaries for four employees identified by column EMPNUM. Table BRACKETS contains tax brackets for five employees. Column EMPNUM is common to both tables.

     SALARIES

Obs  empnum     salary
 1    1234    $125,000
 2    3333     $85,000
 3    4876     $54,000
 4    5489     $29,000

      BRACKETS
                 tax_
 Obs  empnum  bracket
  1    1111      0.28
  2    1234      0.33
  3    3333      0.28
  4    4222      0.15
  5    4876      0.25

Resulting Table

Output 4.1 NETPAY Table

      Example 4.1 NETPAY Table Created with PROC SQL
                                       tax_
      Obs    empnum        salary    bracket       net_pay

       1      1234       $125,000      0.33        $83,750
       2      3333        $85,000      0.28        $61,200
       3      4876        $54,000      0.25        $40,500
       4      5489        $29,000      0.10        $26,100


Example Overview

This example shows you how to look up information for every row in a table by linking to another table by the values of a key column that is common to both tables.

Table SALARIES contains the salaries for four employees. Table BRACKETS contains tax brackets for five employees. Column EMPNUM is common to both tables and uniquely identifies the rows in each table.

The goal of this example is to look up the tax bracket for each employee in SALARIES and to compute the net pay for that employee. One employee, EMPNUM 5489, does not have a matching row in BRACKETS.

A PROC SQL left outer join correctly combines the tables because all rows from the master table, SALARIES, are to be output while values for TAX_BRACKET from only matched rows in the lookup table, BRACKETS, are retrieved. On the left side of the join is SALARIES and on the right side is BRACKETS.

The PROC SQL step matches the rows in SALARIES to the rows in BRACKETS by common column EMPNUM. The left join returns the rows that satisfy the condition in the ON clause and the rows in SALARIES that do not have a match in BRACKETS.

The CASE clause creates a column called TAX_BRACKET. When a row in SALARIES matches a row in BRACKETS, the value assigned to TAX_BRACKET is simply the value for TAX_BRACKET in BRACKETS. When a row in SALARIES has no match in BRACKETS, the value 0.10 is assigned to the TAX_BRACKET column in NETPAY.

Because employee 5489 has no row in BRACKETS, and thus no value for TAX_BRACKET in BRACKETS, the CASE clause assigns 0.10 to the calculated column TAX_BRACKET for employee 5489. It is then possible to compute NET_PAY for employee 5489.

Program

Create table NETPAY. Select specific columns from SALARIES. Use the value of 0.10 for the calculated column TAX_BRACKET in table NETPAY when a value for TAX_BRACKET is not found in BRACKETS. Compute the new column NET_PAY by using information from both SALARIES and BRACKETS. Precede column name TAX_BRACKET with the keyword CALCULATED so that the value used is the value that was computed earlier in the CASE expression and not a value taken directly from BRACKETS. Perform a left join of SALARIES and BRACKETS based on the values of EMPNUM. Assign an alias to the SALARIES table.

proc sql;
  create table netpay as
    select s.empnum, s.salary,

           case when tax_bracket eq . then 0.10
                else tax_bracket
           end as tax_bracket format=4.2,


           salary*(1-calculated tax_bracket)
             as net_pay format=dollar10.



        from salaries s left join brackets
        on s.empnum=brackets.empnum;


quit;

Related Technique 1

The DATA step in this related technique creates a data set equivalent to the table that was created by the PROC SQL step in the main example. It accomplishes the same lookup task by defining a hash object from which to find the information stored in data set BRACKETS through values of the common key variable, EMPNUM.

A hash object provides a fast, easy way to perform lookups. Data that are loaded into a hash object do not need to be sorted or indexed. Because a hash object is stored in memory, it provides quick retrieval. Using a hash object for this example is a good choice because the lookup data set in this example is small and can fit easily into memory.

The DATA step loads data set LOOKUP into hash object B. It defines EMPNUM in BRACKETS as the single hash key and TAX_BRACKET in BRACKETS as data. For each observation in SALARIES, the DATA step looks for a match in hash object B for the current value of EMPNUM. For those observations without a match in BRACKETS, and for those observations with a match in BRACKETS but no value for TAX_BRACKET, the DATA step assigns a default value to TAX_BRACKET before computing NET_PAY.

Specify attributes for the variables saved in data set NETPAY.

Define a default value for the tax bracket.

On the first iteration of the DATA step, create, name, and specify attributes of the hash object. Create hash object B and load the hash object with data from data set BRACKETS. Specify one key variable. Specify one data variable. Close the definition of hash object B.

Read data set SALARIES one observation at a time. Determine whether the current value of EMPNUM from SALARIES is stored in hash object B. Save the return code of the search results in variable RC. When the hash lookup finds a match in hash object B, SAS copies the data part of the hash object into the variables that are specified by DEFINEDATA, which is TAX_BRACKET. When the hash lookup does not find a match in hash object B, or if the value for TAX_BRACKET that was returned from B is missing, assign a default value to TAX_BRACKET. Compute NET_PAY.

data netpay;
  attrib empnum length=8
         salary length=8 format=dollar10.
         tax_bracket length=8 format=4.2
         net_pay length=8 format=dollar10.;
  retain default_bracket .10;

  drop default_bracket rc;
  if _n_=1 then do;

    declare hash b(dataset:'brackets'),

    b.defineKey('empnum'),
    b.defineData('tax_bracket'),
    b.defineDone();
  end;
  set salaries;

  rc=b.find();





  if rc ne 0 or tax_bracket=. then
           tax_bracket=default_bracket;


  net_pay=salary*(1-tax_bracket);
run;

Related Technique 2

This related technique program also uses a DATA step to look up values. It does a match-merge of data sets SALARIES and BRACKETS by variable EMPNUM. Many of the examples in Chapter 3 used match-merges to combine data sets. Match-merging can also be used as a lookup technique. Here, information is looked up in BRACKETS for each observation in SALARIES.

Because the DATA step does the match-merge by using a BY statement, both input data sets must be sorted or indexed by variable EMPNUM. The DATA step uses the IN= option on SALARIES to keep all the observations in SALARIES whether or not the observation has a match in BRACKETS.

Sort the two input data sets by the variable that will match them.

Create data set NETPAY. Merge the two data sets. Add the IN= option to data set SALARIES so it can be determined when an observation is found in SALARIES. Match the data sets by EMPNUM. Keep all observations from SALARIES.

Assign a default value to TAX_BRACKET for those observations in SALARIES without a match in BRACKETS.

proc sort data=salaries;
  by empnum;
run;
proc sort data=brackets;
  by empnum;
run;
data netpay;
  merge salaries(in=insal) brackets;


    by empnum;
  if insal;
  format net_pay dollar10.;
  if tax_bracket=. then tax_bracket=.


  net_pay=salary*(1-tax_bracket);
run;

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
3.144.86.121