Perform a table lookup based on more than one key where the lookup data set is small and not indexed. Subset the observations from the primary data set into one of two output data sets based on specific criteria that are supplied by the lookup data set.
Featured Step | DATA step |
Featured Step Options and Statements | SET statement, NOBS= option Two-dimensional arrays |
Related Technique | Hash object in the DATA step |
A Closer Look | Processing the Two-Dimensional Array WT Understanding Temporary Arrays |
Data set BTEAM contains data on team members' height, weight, and body type.
BTEAM Obs lname gender height weight type 1 Adams M 67 160 2 2 Alexander M 69 115 1 3 Apple M 69 139 1 4 Arthur M 66 125 2 5 Avery M 66 152 2 6 Barefoot M 68 158 2 7 Baucom M 70 170 3 8 Blair M 69 133 1 9 Blalock M 68 148 2 10 Bostic M 74 170 3
Data set IDEAL shows the ideal weight for males based on one of three body types.
IDEAL Obs height small medium large 1 66 126 138 149 2 67 130 141 154 3 68 134 145 158 4 69 138 149 162 5 70 142 153 167 6 71 146 157 172 7 72 150 161 177 8 73 154 165 181 9 74 158 169 185 10 75 162 173 189
Output 4.2a INSHAPE Data SetExample 4.2 INSHAPE Data Set Obs height lname weight type 1 69 Apple 139 1 2 70 Baucom 170 3 3 69 Blair 133 1 4 68 Blalock 148 2 |
Output 4.2b OUTOFSHAPE Data SetExample 4.2 OUTOFSHAPE Data Set Obs height lname weight type 1 67 Adams 160 2 2 69 Alexander 115 1 3 66 Arthur 125 2 4 66 Avery 152 2 5 68 Barefoot 158 2 6 74 Bostic 170 3 |
This example shows you how to look up information for every observation in a data set based on the values of more than one key variable. A DATA step performs the lookup and places the lookup information in a multidimensional array. In this example, an array as a lookup table is a good choice because the lookup data set that fills the array is relatively small and can easily fit into memory. Additionally, the lookup data set is already sorted and organized in the way that is needed to easily fill the array.
The related technique also works just as well in performing this lookup in a small data set.
The goal of the program is to create subsets from the BTEAM data set based on whether a team member's weight is considered to be in shape or out of shape. A DATA step looks up each team member's ideal weight to make the fitness determination based on the team member's height (variable HEIGHT) and body type (variable TYPE). The ideal weights are copied from data set IDEAL to an array that is defined in the DATA step.
Each observation in IDEAL stores the ideal weights for the three body types for a specific height sequentially from 66 inches to 75 inches. The body type variables in IDEAL are SMALL, MEDIUM, and LARGE, which correspond to BTEAM variable TYPE values of 1, 2, and 3, respectively.
The DATA step starts by filling the temporary array WT with all the values from IDEAL. The following "A Closer Look" sections describe characteristics of temporary arrays. The first dimension of the array defines the 10 sequential height values. The second dimension of the array specifies the three body types. Therefore, array WT has a total of 30 elements.
The first IF statement checks that the observation currently being processed is valid for lookup. For valid observations, a second IF statement compares the current value for WEIGHT to the ideal weight for the person's body type and height.
Observations for team members within five pounds of the ideal weight are output to the INSHAPE data set while those not within five pounds of the ideal weight are output to OUTOFSHAPE. For observations that are not valid for lookup, the DATA step writes a message to the SAS log. These observations are not saved in either output data set.
Create INSHAPE and OUTOFSHAPE data sets.
Define a temporary two-dimensional array. Specify the first dimension range from 66 to 75, which corresponds to the heights that were saved in IDEAL. Specify three levels in the second dimension, which correspond to the three body types of small, medium, and large, respectively. On the first iteration of the DATA step, load the WT temporary array with the information that was saved in IDEAL. Execute an iterative DO loop to read each observation in IDEAL. Set the upper index of the DO loop to ALL, which is the total number of observations in IDEAL and is defined by the NOBS= option in the following SET statement. Read data set IDEAL. Assign weight values from IDEAL to the correct array elements.
Read each observation in BTEAM. Process observations that are valid for lookup.
Compare the team member's actual weight to his ideal weight. Find the ideal weight value in the WT array based on the team member's height and body type. Output an observation to either INSHAPE or OUTOFSHAPE based on the results of the comparison.
Write a message to the SAS log if the observation is not valid for lookup.
data inshape outofshape; keep lname height weight type; array wt(66:75,3) _temporary_; if _n_=1 then do; do i=1 to all; set ideal nobs=all; wt(height,1)=small; wt(height,2)=medium; wt(height,3)=large; end; end; set bteam; if gender='M' and (1 le type le 3) and (66 le height le 75) then do; if wt(height,type)-5 le weight le wt(height,type)+5 then output inshape; else output outofshape; end; else putlog 'WARNING: Observation out of lookup range: ' _all_; run;
The DATA step in this related technique creates data sets equivalent to the data sets that were created in the main example. Instead of filling an array with the lookup information, it defines a hash object and loads it with this information. A hash object resides in memory and does not require that you sort the data set before loading it into the hash object.
For each observation in BTEAM that is valid for lookup, the DATA step looks for a match in hash object WT for the current value of HEIGHT. For those observations with a match, assignment statements select the ideal weight based on the team member's body type. IF statements compare the team member's weight to that retrieved from the hash object. If the team member's weight is within five pounds of the ideal weight, the code outputs the observations to INSHAPE. Otherwise, the code outputs the observation to OUTOFSHAPE.
As in the main example, the DATA step writes a message to the SAS log for those observations that are not valid for lookup in IDEAL.
Specify attributes for the variables that are saved in data sets INSHAPE and OUTOFSHAPE.
On the first iteration of the DATA step, create, name, and specify attributes of the hash object. Create hash object WT and load the hash object with data from data set IDEAL. Specify one key variable. Specify three data variables. Close the definition of hash object WT. Prevent SAS notes about uninitialized variables with the CALL MISSING statement. (The SAS compiler does not see variable assignments in the hash object and these three variables do not have explicit assignment statements elsewhere in the DATA step.)
Read data set BTEAM. Process observations valid for lookup.
Look up weight classification information in hash object WT for the current value of HEIGHT from BTEAM. Save the return code of the search results in variable RC. Execute this DO group for matches that are found in hash object WT, which is when the FIND method returns a code of 0. For matches, SAS copies the data part of the hash object into the variables that are specified by DEFINEDATA. Determine the ideal weight based on the value of TYPE.
Output an observation to either INSHAPE or OUTOFSHAPE based the results of the comparison.
For observations valid for lookup, but without a match in the hash object, write a message to the SAS log.
Write a message to the SAS log if the observation is not valid for lookup.
data inshape outofshape; attrib lname length=$10 gender length=$1 height length=8 weight length=8; keep lname gender height weight; if _n_=1 then do; declare hash wt(dataset:'ideal'), wt.defineKey('height'), wt.defineData('small','medium','large'), wt.defineDone(); call missing(small,medium,large); end; set bteam; if 66 le height le 75 and gender='M' and type in (1,2,3) then do; rc=wt.find(); if rc=0 then do; if type=1 then ideal=small; else if type=2 then ideal=medium; else if type=3 then ideal=large; if ideal-5 le weight le ideal+5 then output inshape; else output outofshape; end; else putlog 'WARNING: Height not found in hash object: ' _all_; end; else putlog 'WARNING: Observation out of range: ' _all_; run;
To help visualize the processing in this example, Figure 4.1 represents the two-dimensional array WT, beginning with the lower bound of 66. If you compare it to the IDEAL data set, you can see how it was constructed.
This statement processes the array:
if wt(height,type)-5 le weight le wt(height,type)+5 then output inshape;
On the first iteration of the DATA step, the first observation from BTEAM is processed:
Adams M 67 160 2
The cell in the array that is the intersection of column 67 (HEIGHT) and row 2 (TYPE) contains the weight value 141. The IF-THEN statement processes these values:
if 141-5 le weight le 141+5 then output inshape;
When elements in an array are constants that are needed only during the duration of the DATA step, you can save execution time by using temporary arrays instead of creating variables, as shown in this ARRAY statement:
array wt(66:75,3) _temporary_;
In addition to saving execution time, temporary array elements differ from variables in the following ways:
They are not written to the output data set.
They do not have names and can be referenced only by their array names and dimensions.
They are automatically retained instead of reset to missing at the beginning of each iteration of the DATA step.
18.189.195.34