Using Hash Objects as Lookup Tables

Overview

Beginning with SAS 9, the hash object is available for use in a DATA step. The hash object provides an efficient, convenient mechanism for quick data storage and retrieval.
Unlike an array, which uses a series of consecutive integers to address array elements, a hash object can use any combination of numeric and character values as addresses. Also unlike arrays, which return only a single value, hash objects can return multiple values from a given lookup. A hash object can be loaded from hardcoded values or a SAS data set, is sized dynamically, and exists for the duration of the DATA step.
The hash object is a good choice for lookups that use unordered data that can fit into memory because it provides in-memory storage and retrieval and does not require the data to be sorted or indexed.
SAS Data Set, Hash Objects

The Structure of a Hash Object

When a lookup depends on character key values, you can use the hash object. A hash object resembles a table with rows and columns and contains a key component and a data component.
The key component has these characteristics:
  • might consist of numeric and character values
  • maps key values to data rows
  • must be unique
  • can be a composite
The data component has these characteristics:
  • can contain multiple data values per key value
  • can consist of numeric and character values

Example

Suppose you have a data set named Sasuser.Contrib that lists the quarterly contributions to a retirement fund. You can use the hash object to calculate the difference between the actual contribution and the goal amount for each quarter.
SAS Data Set Sasuser. Contrib
The following program creates a hash object that stores the quarterly goal for employee contributions to the retirement fund. To calculate the difference between actual contribution and the goal amount, the program retrieves the goal amount from the hash object based on the key values.
data work.difference (drop= goalamount);
   if _N_ = 1 then do;
      declare hash goal( );
      goal.definekey("QtrNum");
      goal.definedata("GoalAmount");
      goal.definedone( ); 
      call missing(goalamount);
      goal.add(key:'qtr1', data:10 );
      goal.add(key:'qtr2', data:15 );
      goal.add(key:'qtr3', data: 5 );
      goal.add(key:'qtr4', data:15 );
  end;
  set sasuser.contrib;
  goal.find();
  Diff = amount - goalamount;
run;
We will see how the hash object is set up.

DATA Step Component Objects

The hash object is a DATA step component object. Component objects are data elements that consist of attributes and methods. Attributes are the properties that specify the information that is associated with an object. Methods define the operations that an object can perform.
To use a DATA step component object in your SAS program, you must first declare and create (instantiate) the object.

Declaring the Hash Object

You declare a hash object using the DECLARE statement.
General form, DECLARE statement:
DECLARE object-name <(<argument_tag-1: value-1<, ...argument_tag-n: value-n>>)>;
object
specifies the component object.
object-name
specifies the name for the component object.
arg_tag
specifies the information that is used to create an instance of the component object.
value
specifies the value for an argument tag. Valid values depend on the component object.
Valid values for object are as follows:
  • hash indicates a hash object.
  • hiter indicates a hash iterator object.
Note: The hash iterator object retrieves data from the hash object in ascending or descending key order.
The following DECLARE statement creates a hash object named Goal.
data work.difference (drop= goalamount);
   if _N_ = 1 then do;
      declare hash goal();
At this point, you have declared a hash object named Goal.
Note: The DECLARE statement is an executable statement.

Defining Keys and Data

Remember that the hash object uses lookup keys to store and retrieve data. The keys and the data are DATA step variables that you use to initialize the hash object by using object dot notation method calls.
General form, object dot notation method calls:
object.method(<argument_tag-1: value-1<, ...argument_tag-n: value-n>>);
Here is an explanation of the syntax:
object
specifies the name for the DATA step component object.
method
specifies the name of the method to invoke.
argument-tag
identifies the arguments that are passed to the method.
value
specifies the argument value.
A key component is defined by passing the key variable name to the DEFINEKEY method. A data component is defined by passing the data variable name to the DEFINEDATA method. When all key and data components are defined, the DEFINEDONE method is called. Keys and data can consist of any number of character or numeric DATA step variables.
The following code initializes the key variable QtrNum and the data variable GoalAmount.
data work.difference (drop= goalamount);
   length goalamount 8;
   if _N_ = 1 then do;
      declare hash goal();
      goal.definekey ("QtrNum");
      goal.definedata ("GoalAmount");
      goal.definedone();

Using the Call Missing Routine

Since the variable GOALAMOUNT is not in an input data set and does not appear on the left side of an equal sign in an assignment statement, SAS issues a note stating that this variable is not initialized.
To avoid receiving these notes, use the CALL MISSING routine with the key and data variables as parameters. The CALL MISSING routine assigns a missing value to the specified character or numeric variables.
data Work.Difference (drop= goalamount);
if _N_ = 1 then do; declare hash goal();
   goal.definekey("QtrNum");
   goal.definedata("GoalAmount");
   goal.definedone();
   call missing(goalamount);

Loading Key and Data Values

So far, you have declared and instantiated the hash object, and initialized the hash object's key and data variables. You are now ready to populate the hash object using the ADD method. The following code uses the ADD method to load the key values qtr1, qtr2, qtr3, and qtr4 and the corresponding data values 10, 15, 5, and 15 into the hash object.
data work.difference (drop= goalamount);
if _N_ = 1 then do;
   declare hash goal( );
   goal.definekey("QtrNum");
   goal.definedata("GoalAmount");
   goal.definedone( ); 
   call missing(goalamount);
   goal.add(key:'qtr1', data:10 );
   goal.add(key:'qtr2', data:15 );
   goal.add(key:'qtr3', data: 5 );
   goal.add(key:'qtr4', data:15 );
end;

Retrieving Matching Data

Use the FIND method to retrieve matching data from the hash object. The FIND method generates a numeric return code that indicates whether the key is found in the hash object. A return code of zero indicates a successful find. A nonzero value indicates a find failure. If the key is in the hash object, then the FIND method also sets the data variable to the value of the data item so that it is available for use after the method call.
data work.difference (drop= goalamount);
   if _N_ = 1 then do;
      declare hash goal( );
      goal.definekey("QtrNum");
      goal.definedata("GoalAmount");
      goal.definedone( ); 
      call missing(goalamount);
      goal.add(key:'qtr1', data:10 );
      goal.add(key:'qtr2', data:15 );
      goal.add(key:'qtr3', data: 5 );
      goal.add(key:'qtr4', data:15 );
   end;
   set sasuser.contrib;
   rc=goal.find();
   Diff = amount - goalamount;
run;

Hash Object Processing

We will consider what happens when the program is submitted for execution.
data Work.Difference (drop= goalamount);
   if _N_ = 1 then do;
      declare hash goal( );
      goal.definekey("QtrNum");
      goal.definedata("GoalAmount");
      goal.definedone( ); 
      call missing(goalamount);
      goal.add(key:'qtr1', data:10 );
      goal.add(key:'qtr2', data:15 );
      goal.add(key:'qtr3', data: 5 );
      goal.add(key:'qtr4', data:15 );
   end;
   set sasuser.contrib;
   rc=goal.find();
   Diff = amount - goalamount;
run;
The program executes until the SET statement encounters end of file on sasuser.contrib. PROC PRINT output shows the completed data set.
proc print data=work.difference;
run;
proc print output

Creating a Hash Object from a SAS Data Set

Suppose you need to create a report that shows revenue, expenses, profits, and airport information. You have two data sets that contain portions of the required data. The SAS data set Sasuser.Revenue contains flight revenue data. The SAS data set Sasuser.Acities contains airport data, including the airport code, location, and name.
Table 16.27 SAS Data Set Sasuser.Revenue (first five observations)
Origin
Dest
FlightID
Date
Rev1st
RevBusiness
RevEcon
ANC
RDU
IA03400
02DEC1999
15829
28420
68688
ANC
RDU
IA03400
14DEC1999
20146
26460
72981
ANC
RDU
IA03400
26DEC1999
20146
23520
59625
ANC
RDU
IA03401
09DEC1999
15829
22540
58671
ANC
RDU
IA03401
21DEC1999
20146
22540
65826
Table 16.28 SAS Data Set Sasuser.Acities (first five observations)
City
Code
Name
Country
Auckland
AKL
International
New Zealand
Amsterdam
AMS
Schiphol
Netherlands
Anchorage, AK
ANC
Anchorage International Airport
USA
Stockholm
ARN
Arlanda
Sweden
Athens (Athinai)
ATH
Hellinikon International Airport
Greece
To create the report, you can use a hash object to retrieve matching airport data from Sasuser.Acities.
In the following code, the DECLARE statement creates the Airports hash object and loads it from Sasuser.Acities.
data work.report;
   if _N_=1 then do;
      declare hash airports (dataset: "sasuser.acities");

Using a Non-Executing SET Statement

To initialize the attributes of hash variables that originate from an existing SAS data set, you can use a non-executing SET statement.
Because the IF condition is false during execution, the SET statement is compiled but not executed. The PDV is created with the variables Code, City, and Name from Sasuser.Acities.
data work.report;
   if _N_=1 then do;
      if 0 then
         set sasuser.acities (keep=Code City Name);
When you use this technique, CALL MISSING is not required.

Working with Multiple Data Variables

The hash object that you worked with earlier in this chapter contains one key variable and one data variable. In this example, you need to associate more than one data value with each key.
In the following code, the DECLARE statement creates the Airports hash object and loads it from Sasuser.Acities. The DEFINEKEY method call defines the key to be the value of the variable Code. The DEFINEDATA method call defines the data to be the values of the variables City and Name.
data work.report;
      if 0 then
         set sasuser.acities (keep=Code City Name);
   if _N_=1 then do;
   declare hash airports (dataset: "sasuser.acities")
   airports.definekey ("Code");
   airports.definedata ("City", "Name");
   airports.definedone();
end;
Table 16.29 Hash Object Airports
Key: Code
Data: City
Data: Name
ANC
Anchorage, AK
Anchorage International Airport
BNA
Nashville, TN
Nashville International Airport
CDG
Paris
Charles de Gaulle
LAX
Los Angeles, CA
Los Angeles International Airport
RDU
Raleigh-Durham, NC
Raleigh-Durham International Airport
Note: To define all data set variables as data variables for the hash object, use the ALL: “YES” option. Here is an example:
hashobject.DEFINEDATA (ALL:“YES”);
Note: The hash object can store multiple key variables as well as multiple data variables.

Retrieving Multiple Data Values

You can use the FIND method multiple times in order to retrieve multiple data values. In the following program, the FIND method retrieves the values of City and Name from the Airports hash object based on the value of Origin.
data work.report;
   if _N_=1 then do;
        if 0 then set sasuser.acities (keep=Code City Name);
      declare hash airports (dataset: "sasuser.acities");
      airports.definekey ("Code");
      airports.definedata ("City", "Name");
      airports.definedone();
   end;
 set sasuser.revenue;
 rc=airports.find(key:origin);
 OriginCity=city;
 OriginAirport=name;
 rc=airports.find(key:dest);
DestCity=city;
DestAirport=name;
run;
PROC PRINT output shows the completed data set.
proc print data=work.report;
   var origin dest flightid date origincity originairport
      destcity destairport;
run;
Figure 16.4 Partial Output (first five observations of selected variables)
Partial Output (first five observations of selected variables)

Using Return Codes with the FIND Method

Remember that method calls generate a numeric return code that indicates whether the method succeeded or failed. A value of 0 indicates that the method succeeded. A nonzero value indicates that the method failed.
To store the value of the return code in a variable, specify the variable name RC at the beginning of the method call. Here is an example:
     rc=hashobject.find (key:keyvalue);
The return code variable can be used in conditional logic to ensure that the FIND method found a KEY value in the hash object that matches the KEY value from the PDV.

Example

Error messages are written to the log when the following program is submitted.
data work.report;
   if _N_=1 then do;
  	     if 0 then set sasuser.acities (keep=Code City Name);
      declare hash airports (dataset: "sasuser.acities");
      airports.definekey ("Code");
      airports.definedata ("City", "Name");
      airports.definedone();
   end;
set sasuser.revenue;
airports.find(key:origin);
OriginCity=city;
OriginAirport=name;
airports.find(key:dest);
DestCity=city;
DestAirport=name;
run;
Table 16.30 SAS Log
NOTE: There were 50 observations read from the data set SASUSER.ACITIES.
ERROR: Key not found.
ERROR: Key not found.
ERROR: Key not found.
ERROR: Key not found.
ERROR: Key not found.
ERROR: Key not found.
NOTE: The SAS System stopped processing this step because of errors.
NOTE: There were 142 observations read from the data set SASUSER.REVENUE.
WARNING: The data set WORK.REPORT1 may be incomplete. When this step was 
         stopped there were 142 observations and 14 variables.
A closer examination of the output shows that the data set Work.Report contains errors. For example, notice that in observations 6 through 8 the value of both OriginCity and DestCity is Canberra, Australian C and the values of OriginAirport and DestAirport are missing.
The errors occur because the Airports hash object does not include the key value WLG or a corresponding Name value for the key value CBR.
Figure 16.5 SAS Data Set Work.Report (observations 6 through 8 of selected variables)
SAS Data Set Work.Report (observations 6 through 8 of selected variables)
Conditional logic can be added to the program to create blank values if the values loaded from the input data set, Sasuser.Revenue, cannot be found in the Airports hash object:
  • If the return code for the FIND method call has a value of 0, indicating that the method succeeded, the values of City and Name are assigned to the appropriate variables (OriginCity and OriginAirport or DestCity and DestAirport).
  • If the return code for the FIND method call has a nonzero value, indicating the method failed, the values of City and Name are assigned blank values.
data work.report;
   if _N_=1 then do;
  	     if 0 then set sasuser.acities(keep=Code City Name);
      declare hash airports (dataset: "sasuser.acities");
      airports.definekey ("Code");
      airports.definedata ("City", "Name");
      airports.definedone();
end;
set sasuser.revenue;
rc=airports.find(key:origin);
if rc=0 then do;
    OriginCity=city;
    OriginAirport=name;
end;
else do;
   OriginCity='';
   OriginAirport='';
end;
rc=airports.find(key:dest);
if rc=0 then do;
   DestCity=city;
   DestAirport=name;
end;
else do;
   DestCity='';
   DestAirport='';
end;
run;
PROC PRINT output shows the completed data set. Notice that in observations 6 through 8, the value of DestCity is now blank and no error messages appear in the log.
proc print data=work.report;
   var origin dest flightid date origincity originairport
      destcity destairport;
run;
Figure 16.6 SAS Data Set Work.Report (first eight observations of selected variables)
SAS Data Set Work.Report (first eight observations of selected variables)
Table 16.31 SAS Log
NOTE: There were 50 observations read from the data set SASUSER.ACITIES.
NOTE: There were 142 observations read from the data set SASUSER.REVENUE.
NOTE: The data set WORK.REPORT2 has 142 observations and 15 variables.
..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
3.129.194.106