Summary

Text Summary

Introduction

Sometimes, you need to combine data from two or more data sets into a single observation in a new data set according to the values of a common variable. When data sources do not have a common structure, you can use a lookup table to match them.

Using Multidimensional Arrays

When a lookup operation depends on more than one ordinal numeric key, you can use a multidimensional array. Use an ARRAY statement to create an array. The ARRAY statement defines a set of elements that you plan to process as a group.

Using Stored Array Values

In many cases, you might prefer to load an array with values that are stored in a SAS data set rather than loading them in an ARRAY statement. Lookup tables should be stored in a SAS data set when the following conditions are true:
  • there are too many values to initialize easily in the array
  • the values change frequently
  • the same values are used in many programs
The first step in loading an array from a data set is to create an array to hold the values from the source data set. The next step is to load the array elements. One method for accomplishing this task is to load the array within a DO loop. The last step is to read the base data set.

Using PROC TRANSPOSE

The TRANSPOSE procedure can be used to prepare data when the orientation of the data sets differs. PROC TRANSPOSE creates an output data set by restructuring the values in a SAS data set, thereby transposing selected variables into observations. The transposed (output) data set can then be merged with another data set in order to match the data.
The output data set contains several default variables.
  • _NAME_ is the default name of the variable that PROC TRANSPOSE creates to identify the source of the values in each observation in the output data set. This variable is a character variable whose values are the names of the variables that are transposed from the input data set. To override the default name, use the NAME= option.
  • The remaining transposed variables are named COL1...COLn by default. To override the default names, use the PREFIX= option.

Merging the Transposed Data Set

You might need to use a BY statement with PROC TRANSPOSE in order to correctly structure the data for a merge. For each BY group, PROC TRANSPOSE creates one observation for each variable that it transposes. The BY variable itself is not transposed. In order to structure the data for a merge, you might also need to sort the output data set. Any other source data sets might need to be reorganized and sorted as well. When the data is structured correctly, the data sets can be merged.

Using Hash Objects as Lookup Tables

Beginning with SAS 9, the hash object is available for use in a DATA step. The hash object provides an efficient, convenient mechanism for quick data storage and retrieval.
A hash object resembles a table with rows and columns; it contains a key component and a data component. Unlike an array, which uses a series of consecutive integers to address array elements, a hash object can use any combination of numeric and character values as addresses.
The hash object is a DATA step component object. Component objects are data elements that consist of attributes and methods. To use a DATA step component object in your SAS program, you must first declare and create (instantiate) the object. After you declare the hash object's key and data components, you can populate the hash object from hardcoded values or a SAS data set.
Use the FIND method call return code that is a numeric value. The value specifies whether the method succeeded or failed. A value of 0 indicates that the method succeeded. A nonzero value indicates that the method failed. The return code variable can be used in conditional logic to ensure that the FIND method found a KEY value in the hash object that matches the KEY value from the PDV.

Sample Programs

Using a Multidimensional Array

data work.wndchill (drop = column row);
  array WC{4,2} _temporary_ 
     (-22,-16,-28,-22,-32,-26,-35,-29);
  set sasuser.flights;
  row = round(wspeed,5)/5;
  column = (round(temp,5)/5)+3;
  WindChill= wc{row,column};
run;

Using Stored Array Values

data work.lookup1;
   array Targets{1997:1999,12} _temporary_; 
   if _n_=1 then do i= 1 to 3;
      set sasuser.ctargets;
      array Mnth{*} Jan--Dec;
      do j=1 to dim(mnth);
         targets{year,j}=mnth{j};
      end;
   end;
   set sasuser.monthsum(keep=salemon revcargo monthno);
   year=input(substr(salemon,4),4.); 
   Ctarget=targets{year,monthno}; 
   format ctarget dollar15.2; 
run;

Using PROC TRANSPOSE and a Merge

 proc transpose data=sasuser.ctargets 
      out=work.ctarget2
      name=Month
      prefix=Ctarget;
    by year;
 run;   
  
proc sort data=work.ctarget2;
   by year month;
run; 

data work.mnthsum2;
   set sasuser.monthsum(keep=SaleMon RevCargo);
   length Month $ 8;
   Year=input(substr(SaleMon,4),4.);
   Month=substr(SaleMon,1,1)
         ||lowcase(substr(SaleMon,2,2));
run; 
   
proc sort data=work.mnthsum2;
   by year month;
run;

data work.merged;
   merge work.mnthsum2 work.ctarget2;
   by year month;
run;

Loading a Hash Object from Hardcoded Values

data work.difference (drop= goalamount);
   if _N_ = 1 then do;
      declare hash goal( );
      goal.definekey("QtrNum");
      goal.definedata("GoalAmount");
      goal.definedone( ); 
      call missing(goalamount);
      goal.add(key:'qtr1', data:10 );
      goal.add(key:'qtr2', data:15 );
      goal.add(key:'qtr3', data: 5 );
      goal.add(key:'qtr4', data:15 );
  end;
  set sasuser.contrib;
  rc=goal.find();
  Diff = amount - goalamount;
run;

Loading a Hash Object from a SAS Data Set

data work.report;
   if _N_=1 then do;
 	      if 0 then set sasuser.acities(keep=Code City Name);
     declare hash airports (dataset: "sasuser.acities");
     airports.definekey ("Code");
     airports.definedata ("City", "Name");
     airports.definedone();
end;
set sasuser.revenue;
rc=airports.find(key:origin);
if rc=0 then do;
   OriginCity=city;
   OriginAirport=name;
end;
else do;
   OriginCity='';
   OriginAirport='';
end;
rc=airports.find(key:dest);
if rc=0 then do;
   DestCity=city;
   DestAirport=name;
end;
else do;
   DestCity='';
   DestAirport='';
end;
run;

Points to Remember

  • The name of an array must be a SAS name that is not the name of a SAS function or SAS variable in the same DATA step.
  • Array elements must be either all numeric or all character.
  • The initial values specified for an array can be numeric values or character strings. You must enclose all character strings in quotation marks.
  • The input SAS data set must be sorted or indexed before using a BY statement with PROC TRANSPOSE unless you use the NOTSORTED option.
  • The hash object is a good choice for lookups using unordered data that can fit into memory because it provides in-memory storage and retrieval and does not require the data to be sorted.
  • The hash object is sized dynamically, and exists for the duration of the DATA step.
..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
18.118.12.186