• With an index on variable SSN, SAS accesses the observation directly. SAS satisfies
the condition using the index and goes straight to the observation that contains the
value without having to read each observation.
You can either create an index when you create a data file or create an index for an
existing data file. The data file can be either compressed or uncompressed. For each data
file, you can create one or multiple indexes. Once an index exists, SAS treats it as part of
the data file. That is, if you add or delete observations or modify values, the index is
automatically updated.
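The two ways of creating an index described above can be sketched as follows. This is a minimal illustration, not a recommended layout: the data file WORK.EMPLOYEES, its variables (SSN, LASTNAME, DEPT), and the sample values are all made up for the example.

```sas
/* Hypothetical sample data; every name and value here is illustrative. */

/* 1. Create a simple index on SSN while creating the data file,       */
/*    using the INDEX= data set option.                                 */
data work.employees (index=(ssn / unique));
   input ssn :$11. lastname :$12. dept :$8.;
   datalines;
333-33-3333 Jones Sales
111-11-1111 Smith Research
222-22-2222 Brown Sales
;
run;

/* 2. Add indexes to the existing data file with PROC DATASETS:        */
/*    a simple index on DEPT and a composite index on LASTNAME + DEPT.  */
proc datasets library=work nolist;
   modify employees;
   index create dept;
   index create name_dept=(lastname dept);
run;
quit;
```

After these steps the data file carries three indexes, and SAS maintains all of them whenever observations are added, deleted, or modified.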
Benefits of an Index
In general, SAS can use an index to improve performance in the following situations:
• For WHERE processing, an index can provide faster and more efficient access to a
subset of data. To process a WHERE expression, SAS by default decides whether to
use an index or to read the data file sequentially.
• For BY processing, an index returns observations in the index order, which is in
ascending value order, without using the SORT procedure even when the data file is
not stored in that order.
Note: If you use the SORT procedure, the index is not used.
• For the SET and MODIFY statements, the KEY= option enables you to specify an
index in a DATA step to retrieve particular observations in a data file.
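The three situations in the list above might look like this in practice. The sketch assumes a hypothetical data file WORK.EMPLOYEES with simple indexes on SSN and DEPT, and a hypothetical lookup file WORK.REQUESTS; all names and values are invented for illustration. Setting MSGLEVEL=I asks SAS to write informational notes about index usage to the log, but whether an index is actually chosen for a given WHERE expression remains SAS's decision, especially for a file this small.

```sas
options msglevel=i;   /* request index-usage notes in the log */

/* Hypothetical indexed data file (illustrative values only). */
data work.employees (index=(ssn dept));
   input ssn :$11. lastname :$12. dept :$8.;
   datalines;
333-33-3333 Jones Sales
111-11-1111 Smith Research
222-22-2222 Brown Sales
;
run;

/* WHERE processing: SAS decides whether to use the index on SSN. */
data work.one_emp;
   set work.employees;
   where ssn = '111-11-1111';
run;

/* BY processing: the index on DEPT supplies ascending DEPT order,  */
/* so no PROC SORT is needed even though the file is unsorted.      */
proc print data=work.employees;
   by dept;
run;

/* KEY= processing: look up each requested SSN through the index. */
data work.requests;
   input ssn :$11.;
   datalines;
222-22-2222
999-99-9999
;
run;

data work.matched;
   set work.requests;
   set work.employees key=ssn / unique;
   if _iorc_ = 0 then output;   /* a match was found               */
   else _error_ = 0;            /* no match: clear the error flag  */
run;
```

Checking the automatic variable _IORC_ after the keyed SET statement is the usual way to separate matches from non-matches in this pattern.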
In addition, an index can benefit other areas of SAS. In SCL (SAS Component
Language), an index improves the performance of table lookup operations. For the SQL
procedure, an index enables the software to process certain classes of queries more
efficiently (for example, join queries). For the SAS/IML software, you can explicitly
specify that an index be used for read, delete, list, or append operations.
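As one illustration of the SQL case, the sketch below creates an index with PROC SQL and then runs a join on the indexed column. The tables WORK.EMPLOYEES and WORK.PAYROLL and their contents are hypothetical, and the example does not guarantee that the query optimizer picks the index for this particular join.

```sas
/* Hypothetical tables; all names and values are illustrative. */
data work.employees;
   input ssn :$11. lastname :$12.;
   datalines;
333-33-3333 Jones
111-11-1111 Smith
222-22-2222 Brown
;
run;

data work.payroll;
   input ssn :$11. amount;
   datalines;
111-11-1111 4200
333-33-3333 3900
;
run;

proc sql;
   /* A simple index must have the same name as its column. */
   create unique index ssn on work.employees(ssn);

   /* A join on the indexed column; PROC SQL can consider  */
   /* the index when it chooses how to process the query.  */
   select e.lastname, p.amount
      from work.payroll as p, work.employees as e
      where p.ssn = e.ssn;
quit;
```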
Even though an index can reduce the time required to locate a set of observations,
especially for a large data file, there are costs associated with creating, storing, and
maintaining the index. When deciding whether to create an index, you must weigh the
increased resource usage against the expected performance improvement.
Note: An index is never used for the subsetting IF statement in a DATA step, or for the
FIND and SEARCH commands in the FSEDIT procedure.
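The difference between a WHERE statement and a subsetting IF statement can be seen in a short sketch. The data file WORK.EMPLOYEES and its index on DEPT are assumptions made for this example. Both steps produce the same subset, but only the first gives SAS the opportunity to use the index; the second reads every observation and then tests it.

```sas
/* Hypothetical indexed data file (illustrative values only). */
data work.employees (index=(dept));
   input ssn :$11. lastname :$12. dept :$8.;
   datalines;
333-33-3333 Jones Sales
111-11-1111 Smith Research
222-22-2222 Brown Sales
;
run;

/* WHERE statement: SAS can consider the index on DEPT. */
data work.via_where;
   set work.employees;
   where dept = 'Sales';
run;

/* Subsetting IF: each observation is read into the program  */
/* data vector first, so the index is never used.            */
data work.via_if;
   set work.employees;
   if dept = 'Sales';
run;
```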
The Index File
The index file is a SAS file that has the same name as its associated data file, and that
has a member type of INDEX. There is only one index file per data file. That is, all
indexes for a data file are stored in a single file.
The index file might be a separate file or part of the data file, depending on the
operating environment. In any case, the index file is stored in the same SAS library as its
data file.
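Because the index file is managed by SAS rather than opened directly, one way to see which indexes a data file has is to ask for its contents. The sketch below assumes a hypothetical data file WORK.EMPLOYEES with two indexes; the names and values are invented for the example.

```sas
/* Hypothetical data file with a simple and a composite index. */
data work.employees (index=(ssn name_dept=(lastname dept)));
   input ssn :$11. lastname :$12. dept :$8.;
   datalines;
333-33-3333 Jones Sales
111-11-1111 Smith Research
222-22-2222 Brown Sales
;
run;

/* The CONTENTS output includes a list of the indexes */
/* that are associated with the data file.            */
proc contents data=work.employees;
run;
```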
The index file consists of entries that are organized hierarchically and connected by
pointers, all of which are maintained by SAS. The lowest level in the index file hierarchy
consists of entries that represent each distinct value for an indexed variable, in ascending
value order. Each entry contains this information:
• a distinct value