Creating Indexes in the DATA Step

Overview

To create an index at the same time that you create a data set, use the INDEX= data set option in the DATA statement.
General form, DATA statement with the INDEX= option:
DATA SAS-data-file-name (INDEX=
(index-specification-1</UNIQUE><...index-specification-n>
</UNIQUE>));
Here is an explanation of the syntax:
SAS-data-file-name
is a valid SAS data set name.
index-specification
for a simple index is the name of the key variable.
index-specification
for a composite index is (index-name=(variable-1...variable-n)).
UNIQUE
specifies that values for the key variable must be unique for each observation.
You can create multiple indexes on a single SAS data set. However, keep in mind that creating and storing indexes does use system resources. Therefore, you should create indexes only on variables that are commonly used to select observations.
The UNIQUE option guarantees that values for the key variable or the combination of a composite group of variables remain unique for every observation in the data set. In an existing data set, if the variable or variables on which you attempt to create a unique index have duplicate values, the index is not created. Similarly, if an update tries to add an observation with a duplicate value for the index variable to that data set, the update is rejected.

Examples

The following example creates a simple index on the Simple data set. The index is named Division, and it contains values of the Division variable.
data simple (index=(division));
   set sasuser.empdata;
run;
The following example creates two simple indexes on the Simple2 data set. The first index is named Division, and it contains values of the Division variable. The second index is called EmpID, and it contains unique values of the EmpID variable.
data simple2 (index=(division empid/unique));
   set sasuser.empdata;
run;
The following example creates a composite index on the Composite data set. The index is named Empdiv, and it contains concatenated values of the Division variable and the EmpID variable.
data composite (index=(Empdiv=(division empid)));
   set sasuser.empdata;
run;
When you create or use an index, you might want to verify that it has been created or used correctly. To display information in the SAS log concerning index creation or index usage, set the value of the MSGLEVEL= system option to I.
General form, MSGLEVEL= system option:
OPTIONS MSGLEVEL= N|I;
Here is an explanation of the syntax:
N
prints notes, warnings, and error messages only. This is the default.
I
prints additional notes or INFO messages pertaining to index usage, merge processing, and sort utilities along with standard notes, warnings, and error messages.

Example

The following code sets the MSGLEVEL= system option to I and creates the Sasuser.Sale2000 data set with two indexes:
options msglevel=i;
data sasuser.sale2000(index=(origin
     flightdate=(flightid date)/unique));
   infile sale2000 dsd;
   input FlightID $ RouteID $ Origin $
         Dest $ Cap1st CapBusiness
         CapEcon CapTotal CapCargo
         Date Psgr1st PsgrBusiness
         PsgrEcon Rev1st RevBusiness
         RevEcon SaleMon $ CargoWgt
         RevCargo;
   format date date9.;
run;
Here are the messages that are written to the SAS log when the program above is submitted.
Table 13.1 SAS Log
NOTE: The infile SALE2000 is:
      File Name=C:My SAS Files9.0sale2000.dat,
      RECFM=V,LRECL=256

NOTE: 153 records were read from the infile SALE2000.
      The minimum record length was 82.
      The maximum record length was 100.
NOTE: The data set SASUSER.SALE2000 has 153 observations
      and 19 variables.
NOTE: Composite index flightdate has been defined.
NOTE: Simple index origin has been defined.
NOTE: DATA statement used (Total process time):
      real time           1.08 seconds
      cpu time            0.04 seconds

Determining Whether SAS Used an Index

It is not always possible or more efficient for SAS to use an existing index to access specific observations directly. An index is not used in these circumstances:
  • with a subsetting IF statement in a DATA step
  • with particular WHERE expressions
  • if SAS determines it is more efficient to read the data sequentially

Example

You can use the MSGLEVEL= option to determine whether SAS used an index. The following SAS logs show examples of the INFO messages that indicate whether an index was used.
Table 13.2 SAS Log
6  options msglevel=i;
7
8  proc print data=sasuser.revenue;
9     where flightid ne 'IA11200';
INFO: Index FlightID not used. Increasing bufno to 3 may help.
Table 13.3 SAS Log
11  options msglevel=i;
12
13  data somflights;
14     set sasuser.revenue;
15     where flightid > 'IA11200';
INFO: Index FlightID selected for WHERE clause optimization.
..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
3.135.182.107