Identifying Available Indexes

Overview

The first step for SAS is to determine whether there are any existing indexes that might be used to process the WHERE expression. Specifically, SAS checks the variable in each condition in the WHERE expression to determine whether the variable is a key variable in an index.
index on Order_Date
SAS can use either a simple index or a composite index to optimize a WHERE expression. To be considered for use in optimizing a single WHERE condition, one of the following requirements must be met:
  • the variable in the WHERE condition is the key variable in a simple index
  • the variable in the WHERE condition is the first key variable in a composite index
SAS identifies all indexes that are defined on any variable in the WHERE expression. However, no matter how many indexes are available, SAS can use only one index to process a WHERE expression. So, if multiple indexes are available, SAS must choose between them.
When SAS looks for available indexes, there are three possible outcomes:
Index
Outcome
There is no index defined on any variables in the WHERE expression.
SAS does not continue with the decision process. SAS must use sequential access to process the WHERE expression.
There is one available index that is defined on one or more variables in the WHERE expression.
SAS continues with the decision process and determines whether using the available index is more efficient than using sequential access.
There are multiple available indexes. Each is defined on one or more of the variables in the WHERE expression.
SAS continues with the decision process. SAS must choose between the available indexes in the next few steps. SAS tries to select the index that satisfies the most conditions and that selects the smallest subset of observations.
Note: If a program specifies both a WHERE expression and a BY statement, SAS looks for one index that satisfies conditions for both. If such an index is not found, the BY statement takes precedence so that SAS can ensure that the data is returned in sorted order. With a BY statement, SAS cannot use an index to optimize a WHERE expression if the optimization invalidates the BY order.

Example: Identifying One Available Index

Suppose you submit a program that contains the following WHERE statement, and suppose that the data set has one index, as shown below:
WHERE Statement
Available Index
where delivery_date='02jul2000'd
simple index defined on Delivery_Date
This WHERE expression has one condition, and the variable in that condition (Delivery_Date) is the key variable in the simple index. If all other requirements for optimization are met in later steps, then SAS can use this index to optimize the WHERE expression.
Likewise, if the only available index is a composite index in which Delivery_Date is the first key variable, then SAS can use the index if all other requirements for optimization are met.
Even if a WHERE statement has multiple conditions, SAS can use either a simple index or a composite index to optimize just one of the conditions. For example, suppose your program contains a WHERE statement that has two conditions, and suppose that the data set has one index, as shown below:
WHERE Statement
Available Index
where order_date='01jan2000'd and 
delivery_date='02jul2000'd';
simple index defined on Delivery_Date
Assuming that all other requirements for optimization are met, SAS can use this index to optimize the second condition in this WHERE expression.

Example: Identifying Multiple Available Indexes

Suppose your program contains a WHERE statement with two conditions, and suppose that each condition references a key variable in a different index, as shown below:
WHERE Statement
Available Index
where order_date='01jan2000'd and 
delivery_date='02jul2000'd';
  • simple index defined on Order_Date
  • simple index defined on Delivery_Date.
Although two indexes are available, SAS can use only one index to optimize a WHERE statement. In a later step of the process, SAS tries to select the index that satisfies the most conditions and that selects the smallest subset of observations.

Compound Optimization

SAS usually uses an index to process just one condition, no matter how many conditions and variables a WHERE expression contains. However, in a process called compound optimization, SAS can use a composite index to optimize multiple conditions on multiple variables, which are joined with a logical operator such as AND. Constructing your WHERE expression to take advantage of multiple key variables in a single index can greatly improve performance.
In order for compound optimization to occur, at least the first two key variables in the composite index must be used in the WHERE conditions. Later in this chapter, you learn about other requirements that must be met in order for compound optimization to occur.
Note: The WHERE expression can also contain non-indexed variables, and the key variables and non-indexed variables can appear in any order in the expression.

Example: Composite Index That Can Be Used to Optimize Multiple Conditions

Suppose your program contains a WHERE statement that has two conditions, and suppose that each condition references one of the first two key variables in a composite index:
WHERE Statement
Available Index
where order_date='01jan2000'd and 
delivery_date='02jul2000'd';
composite index defined on the following variables:
  • Order_Date (first key variable)
  • Delivery_Date (second key variable)
  • Product_ID (third key variable)
Because the two variables that are referenced in the WHERE expression are the first two key variables in the composite index, SAS can use the composite index for compound optimization if the WHERE conditions meet all other requirements for optimization.

Example: Composite Index That Can Be Used to Optimize One Condition

The following WHERE statement also contains two conditions, and each condition references one of the variables in the composite index:
WHERE Statement
Available Index
where order_date='01jan2000'd and 
product_id='220101400106';
composite index defined on the following variables:
  • Order_Date (first key variable)
  • Delivery_Date (second key variable)
  • Product_ID (third key variable)
As in the previous WHERE statement, Order_Date is the first key variable in the index. However, in this situation, the composite index can be used to optimize only the first condition. The second condition references the third key variable, Product_ID, but the WHERE expression does not reference the second key variable, Delivery_Date. Without a reference to both the first and second key variables, compound optimization cannot occur.

Example: Composite Index That Cannot Be Used for Optimizing

Now suppose your program contains a WHERE statement that references only the second and third key variables in the composite index, as shown below:
WHERE Statement
Available Index
where delivery_date='02jul2000'd' and 
product_id='220101400106';
composite index defined on the following variables:
  • Order_Date (first key variable)
  • Delivery_Date (second key variable)
  • Product_ID (third key variable)
In this situation, SAS cannot use the index for optimization at all because the WHERE statement does not reference the first key variable.
..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
3.141.192.183