Example 4.2 Performing a Table Lookup in a Small Lookup Data Set

Goal

Perform a table lookup based on more than one key where the lookup data set is small and not indexed. Subset the observations from the primary data set into one of two output data sets based on specific criteria that are supplied by the lookup data set.

Example Features

Featured StepDATA step
Featured Step Options and StatementsSET statement, NOBS= option Two-dimensional arrays
Related TechniqueHash object in the DATA step
A Closer LookProcessing the Two-Dimensional Array WT Understanding Temporary Arrays

Input Data Sets

Data set BTEAM contains data on team members' height, weight, and body type.

                          BTEAM

   Obs    lname        gender    height    weight     type

     1    Adams          M         67        160        2
     2    Alexander      M         69        115        1
     3    Apple          M         69        139        1
     4    Arthur         M         66        125        2
     5    Avery          M         66        152        2
     6    Barefoot       M         68        158        2
     7    Baucom         M         70        170        3
     8    Blair          M         69        133        1
     9    Blalock        M         68        148        2
    10    Bostic         M         74        170        3

Data set IDEAL shows the ideal weight for males based on one of three body types.

                  IDEAL

Obs    height    small    medium    large

  1      66       126       138      149
  2      67       130       141      154
  3      68       134       145      158
  4      69       138       149      162
  5      70       142       153      167
  6      71       146       157      172
  7      72       150       161      177
  8      73       154       165      181
  9      74       158       169      185
 10      75       162       173      189

Resulting Data Sets

Output 4.2a INSHAPE Data Set

       Example 4.2 INSHAPE Data Set

Obs    height     lname     weight    type

 1       69      Apple        139       1
 2       70      Baucom       170       3
 3       69      Blair        133       1
 4       68      Blalock      148       2


Output 4.2b OUTOFSHAPE Data Set

      Example 4.2 OUTOFSHAPE Data Set

Obs    height    lname        weight    type

 1       67      Adams         160        2
 2       69      Alexander     115        1
 3       66      Arthur        125        2
 4       66      Avery         152        2
 5       68      Barefoot      158        2
 6       74      Bostic        170        3


Example Overview

This example shows you how to look up information for every observation in a data set based on the values of more than one key variable. A DATA step performs the lookup and places the lookup information in a multidimensional array. In this example, an array as a lookup table is a good choice because the lookup data set that fills the array is relatively small and can easily fit into memory. Additionally, the lookup data set is already sorted and organized in the way that is needed to easily fill the array.

The related technique also works just as well in performing this lookup in a small data set.

The goal of the program is to create subsets from the BTEAM data set based on whether a team member's weight is considered to be in shape or out of shape. A DATA step looks up each team member's ideal weight to make the fitness determination based on the team member's height (variable HEIGHT) and body type (variable TYPE). The ideal weights are copied from data set IDEAL to an array that is defined in the DATA step.

Each observation in IDEAL stores the ideal weights for the three body types for a specific height sequentially from 66 inches to 75 inches. The body type variables in IDEAL are SMALL, MEDIUM, and LARGE, which correspond to BTEAM variable TYPE values of 1, 2, and 3, respectively.

The DATA step starts by filling the temporary array WT with all the values from IDEAL. The following "A Closer Look" sections describe characteristics of temporary arrays. The first dimension of the array defines the 10 sequential height values. The second dimension of the array specifies the three body types. Therefore, array WT has a total of 30 elements.

The first IF statement checks that the observation currently being processed is valid for lookup. For valid observations, a second IF statement compares the current value for WEIGHT to the ideal weight for the person's body type and height.

Observations for team members within five pounds of the ideal weight are output to the INSHAPE data set while those not within five pounds of the ideal weight are output to OUTOFSHAPE. For observations that are not valid for lookup, the DATA step writes a message to the SAS log. These observations are not saved in either output data set.

Program

Create INSHAPE and OUTOFSHAPE data sets.

Define a temporary two-dimensional array. Specify the first dimension range from 66 to 75, which corresponds to the heights that were saved in IDEAL. Specify three levels in the second dimension, which correspond to the three body types of small, medium, and large, respectively. On the first iteration of the DATA step, load the WT temporary array with the information that was saved in IDEAL. Execute an iterative DO loop to read each observation in IDEAL. Set the upper index of the DO loop to ALL, which is the total number of observations in IDEAL and is defined by the NOBS= option in the following SET statement. Read data set IDEAL. Assign weight values from IDEAL to the correct array elements.

Read each observation in BTEAM. Process observations that are valid for lookup.

Compare the team member's actual weight to his ideal weight. Find the ideal weight value in the WT array based on the team member's height and body type. Output an observation to either INSHAPE or OUTOFSHAPE based on the results of the comparison.

Write a message to the SAS log if the observation is not valid for lookup.

data inshape outofshape;

  keep lname height weight type;
  array wt(66:75,3) _temporary_;


  if _n_=1 then do;


    do i=1 to all;





       set ideal nobs=all;
       wt(height,1)=small;
       wt(height,2)=medium;
       wt(height,3)=large;
    end;
  end;
  set bteam;
  if gender='M' and
      (1 le type le 3) and
     (66 le height le 75) then do;
    if wt(height,type)-5 le weight le wt(height,type)+5
                              then output inshape;
    else output outofshape;



  end;
  else putlog
         'WARNING: Observation out of lookup range: '
                              _all_;
run;

Related Technique

The DATA step in this related technique creates data sets equivalent to the data sets that were created in the main example. Instead of filling an array with the lookup information, it defines a hash object and loads it with this information. A hash object resides in memory and does not require that you sort the data set before loading it into the hash object.

For each observation in BTEAM that is valid for lookup, the DATA step looks for a match in hash object WT for the current value of HEIGHT. For those observations with a match, assignment statements select the ideal weight based on the team member's body type. IF statements compare the team member's weight to that retrieved from the hash object. If the team member's weight is within five pounds of the ideal weight, the code outputs the observations to INSHAPE. Otherwise, the code outputs the observation to OUTOFSHAPE.

As in the main example, the DATA step writes a message to the SAS log for those observations that are not valid for lookup in IDEAL.

Specify attributes for the variables that are saved in data sets INSHAPE and OUTOFSHAPE.

On the first iteration of the DATA step, create, name, and specify attributes of the hash object. Create hash object WT and load the hash object with data from data set IDEAL. Specify one key variable. Specify three data variables. Close the definition of hash object WT. Prevent SAS notes about uninitialized variables with the CALL MISSING statement. (The SAS compiler does not see variable assignments in the hash object and these three variables do not have explicit assignment statements elsewhere in the DATA step.)

Read data set BTEAM. Process observations valid for lookup.

Look up weight classification information in hash object WT for the current value of HEIGHT from BTEAM. Save the return code of the search results in variable RC. Execute this DO group for matches that are found in hash object WT, which is when the FIND method returns a code of 0. For matches, SAS copies the data part of the hash object into the variables that are specified by DEFINEDATA. Determine the ideal weight based on the value of TYPE.

Output an observation to either INSHAPE or OUTOFSHAPE based the results of the comparison.

For observations valid for lookup, but without a match in the hash object, write a message to the SAS log.

Write a message to the SAS log if the observation is not valid for lookup.

data inshape outofshape;
  attrib lname length=$10
         gender length=$1
         height length=8
         weight length=8;
  keep lname gender height weight;
  if _n_=1 then do;


    declare hash wt(dataset:'ideal'),

     wt.defineKey('height'),
     wt.defineData('small','medium','large'),
     wt.defineDone();
     call missing(small,medium,large);





  end;
  set bteam;
  if 66 le height le 75 and gender='M' and
           type in (1,2,3) then do;
    rc=wt.find();



     if rc=0 then do;



      if type=1 then ideal=small;
      else if type=2 then ideal=medium;
      else if type=3 then ideal=large;
      if ideal-5 le weight le ideal+5 then
            output inshape;
      else output outofshape;
    end;
    else putlog 'WARNING: Height not found in hash
                              object: ' _all_;

  end;
  else putlog 'WARNING: Observation out of range: '
                             _all_;

run;

A Closer Look

Processing the Two-Dimensional Array WT

To help visualize the processing in this example, Figure 4.1 represents the two-dimensional array WT, beginning with the lower bound of 66. If you compare it to the IDEAL data set, you can see how it was constructed.

Figure 4.1. Representation of Two-Dimensional Array WT

This statement processes the array:

if wt(height,type)-5 le weight le wt(height,type)+5 then
        output inshape;

On the first iteration of the DATA step, the first observation from BTEAM is processed:

Adams M 67 160 2

The cell in the array that is the intersection of column 67 (HEIGHT) and row 2 (TYPE) contains the weight value 141. The IF-THEN statement processes these values:

if 141-5 le weight le 141+5 then
        output inshape;

Understanding Temporary Arrays

When elements in an array are constants that are needed only during the duration of the DATA step, you can save execution time by using temporary arrays instead of creating variables, as shown in this ARRAY statement:

array wt(66:75,3) _temporary_;

In addition to saving execution time, temporary array elements differ from variables in the following ways:

  • They are not written to the output data set.

  • They do not have names and can be referenced only by their array names and dimensions.

  • They are automatically retained instead of reset to missing at the beginning of each iteration of the DATA step.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
18.189.195.34