Managing Index Usage

Overview

To manage indexes effectively, it is important to know
  • how SAS decides whether to use an index and which index to use
  • how to determine whether SAS is using an index
  • how to control whether SAS uses an index, or which index it uses.

Understanding How SAS Decides Whether to Use an Index

By default, each time you submit a query (or other SAS program) that contains a WHERE expression, SAS decides whether to use an index, or to read all the observations in the data file sequentially. To make this decision, SAS does the following:
  1. Identifies an available index or indexes.
  2. Estimates the number of rows that would be qualified. If multiple indexes are available, SAS selects the index that it estimates returns the smallest subset of rows.
  3. Compares resource usage to decide whether it is more efficient to satisfy the WHERE expression by using the index or by reading all the observations sequentially.
Next, consider how you can find out whether SAS is using an index.

Determining Whether SAS Is Using an Index

After you create an index, it is important to monitor whether the index is being used. If an index is not being used, the costs of maintaining the index might be greater than the benefits, and you should consider dropping (deleting) the index.
By default, when a PROC SQL query or any other program is submitted in SAS, only notes, warnings, and error messages are written to the SAS log. To display additional messages, such as information about indexes that have been defined and that have been used in processing the program, specify the SAS system option MSGLEVEL=I. You specify the MSGLEVEL= option in the OPTIONS statement, before the PROC SQL statement.
General form, MSGLEVEL= option:
OPTIONS MSGLEVEL=N | I;
Here is an explanation of the syntax:
N
displays notes, warnings, and error messages only. This is the default.
I
displays additional notes pertaining to index usage, merge processing, and sort utilities along with standard notes, warnings, and error messages.
Usually, the option MSGLEVEL= is set to I for debugging and testing, and set to N for production jobs.

Example: Query That Uses an Index

Suppose you are writing a PROC SQL query that references the temporary table Marchflights. Earlier in this chapter, a unique composite index named daily was created on the columns FlightNumber and Date in Marchflights. The WHERE expression in your query specifies the key column FlightNumber. To determine whether PROC SQL uses the index daily when your query is processed, you specify MSGLEVEL=I before the query:
options msglevel=i;
proc sql;
   select *
      from marchflights
      where flightnumber='182';
The message in the SAS log shows that the index was used in processing.
Table 6.6 SAS Log
INFO: Index daily selected for WHERE clause optimization.

Example: Query That Does Not Use an Index

Suppose you submit a different query that also references the key column FlightNumber:
proc sql;
   select *
      from marchflights
      where flightnumber in ('182','202'),
In this example, the SAS log shows that the query does not use the index.
Table 6.7 SAS Log
INFO: Index daily not used. Sorting into index order may help.
INFO: Index daily not used. Increasing bufno to 8 may help.
Note: For more information about the BUFSIZE= option, see Using the BUFSIZE= Option.
Note: SAS Version 8 displays informational messages that indicate when an index is used, but does not display messages that indicate when an index is not used.
Tip
Because the OPTIONS statement is global, the settings remain in effect until you modify them or until you end your SAS session. Therefore, you do not need to specify MSGLEVEL=I in this second query or any subsequent queries until you want to change the setting or until your SAS session ends.

Controlling Index Usage

In general, it is recommended that you allow SAS to decide whether to use an index, or which index to use, in processing a PROC SQL query (or other SAS program). However, in some situations, such as testing, you might find it useful to control the use of indexes by SAS.
To control index usage, use the IDXWHERE= and IDXNAME= SAS data set options to override the default settings. You can use either of these options, but you cannot use both options at the same time. As with other SAS data set options, you specify the IDXWHERE= or IDXNAME= option in parentheses after the table name in the FROM clause of a PROC SQL query.

Using IDXWHERE= to Direct SAS to Use or Not to Use an Index

The IDXWHERE= option enables you to override the decision that SAS makes about whether to use an index.
General form, IDXWHERE= option:
IDXWHERE=YES | NO;
Here is an explanation of the syntax:
YES
tells SAS to choose the best index to optimize a WHERE expression, and to disregard the possibility that a sequential search of the table might be more resource-efficient.
NO
tells SAS to ignore all indexes and satisfy the conditions of a WHERE expression with a sequential search of the table.
Note: Use the IDXWHERE=NO option when you know an available index does not optimize WHERE clause processing.

Example

In an earlier example, you used the option MSGLEVEL=I to verify that PROC SQL does use an index to process the following query:
options msglevel=i;     
proc sql;
   select *
      from marchflights
      where flightnumber='182';
To force SAS to ignore the index and to process the rows of the table sequentially, specify IDXWHERE=NO in the query:
proc sql;
   select *
      from marchflights (idxwhere=no)
      where flightnumber='182';
A message in the SAS log indicates that SAS was forced to process the data sequentially.
Table 6.8 SAS Log
INFO: Data set option (IDXWHERE=NO) forced a sequential pass of the data 
rather than use of an index for where-clause processing.

Using IDXNAME= to Direct SAS to Use a Specified Index

The IDXNAME= option directs SAS to use an index that you specify, even if SAS would have selected not to use an index or to use a different index.
General form, IDXNAME= option:
IDXNAME=index-name;
Here is an explanation of the syntax:
index-name
specifies the name of the index that should be used for processing.
SAS uses the specified index if the following conditions are true:
  • The specified index must exist.
  • The specified index must be suitable by having at least its first or only column match a condition in the WHERE expression.
Note: Use the IDXNAME= option when you know the better index so that SAS does not have to do the evaluation.

Example

In an earlier example, a composite index named daily was defined on the columns FlightNumber and Date in the temporary table Marchflights. Suppose you create a second index, a simple index, on the column Date (the secondary key in the composite index) by using the following PROC SQL step:
proc sql;
   create index Date    
      on work.marchflights(Date);
Next, you submit the following query:
proc sql;
   select *
      from marchflights
      where date='01MAR2000'd;
The WHERE clause in this query references the key column Date. By default, SAS decides whether to use an index and, if an index is used, which index to use. The SAS log indicates that, with both a simple index and a composite index defined on Date, PROC SQL used the simple index Date to process the query.
Table 6.9 SAS Log
INFO: Index Date selected for WHERE clause optimization.
Note: This example assumes that the option MSGLEVEL=I, which was specified in the previous example, is still in effect.
You decide that you want to force PROC SQL to use the index daily instead of Date, so you add IDXNAME= to your query:
proc sql;
   select *
      from marchflights (idxname=daily)
      where flightnumber='182';
After this query is submitted, a message in the SAS log indicates that PROC SQL used the index daily:
Table 6.10 SAS Log
INFO: Index daily selected for WHERE clause optimization.
..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
3.137.220.92