6.3. Indexes

An index consists of one or more columns used to uniquely identify each row within a table. Operating as a SAS object containing the values in one or more columns in a table, an index is composed of one or more columns and may be defined as numeric, character, or a combination of both.

There is no rule that says a table has to have an index, but they can often make information retrieval more efficient and considerably faster.

For example, if you know a specific part number and its location from a list of thousands, then you can look up the part and find its manufacturer, cost, and location far more efficiently than if you did not know this information.

6.3.1. Defining Indexes

When defining an index, you should first understand the purpose the index is to serve. The most important thing to keep in mind about indexes is that they should be created only when they are absolutely needed. Too many, or unnecessary, indexes use up computer resources. An index also takes up space and has to be updated any time a DELETE, INSERT, or UPDATE is performed on rows in a table. For this reason, care should be used when deciding when and what indexes to create.

To help determine when indexes are necessary, consider existing data as well as the way the base table(s) will be used. You also need to know what queries will be used and how they will access columns of data. If an index is used to specify some order within a table, such as manufacturer number or product number in the PRODUCTS table, you should fully assess what the impact of that index will be.

Sometimes the column(s) making up an index is obvious, and other times it is not. When determining whether an index provides any value, some very important rules should be kept in mind. An index should permit the greatest flexibility so every column in a table can be accessed and displayed. You can also improve query results by assigning indexes only to those columns that have many unique values or that you use regularly in joins.

When an index is specified for one or more tables, a join process may actually occur faster. The PROC SQL processor may use an index when certain conditions permit its use. Here are a few things to keep in mind before creating an index:

  • If the table is small, sequential processing may be just as fast, or faster, than processing with an index

  • If the page count as displayed in the CONTENTS procedure is less than 3 pages, avoid creating or using an index

  • Do not create more indexes than you absolutely need

  • If the data subset for the index is not small, sequential access may be more efficient than using the index

  • If the percentage of matches is approximately 15% or less then an index should be used

  • The costs associated with an index can outweigh its performance value – an index is updated each time when rows in a table are added, deleted, or modified.

Two types of indexes can be defined and used in PROC SQL: simple and composite. When a simple index is created, it references only a single column. In contrast, a composite index references two or more columns in a table.

6.3.2. Creating a Simple Index

A simple index is specifically defined for one column in a table and must be the same name as the column. Suppose you had to create an index consisting of product type (PRODTYPE) in the PRODUCTS table. Once created, the index becomes a separate object located in the SAS library.

SQL Code

PROC SQL;          1
						2
						3
  CREATE INDEX PRODTYPE ON PRODUCTS(PRODTYPE);
QUIT;

SAS Log Results

      PROC SQL;          1
												2
												3
        CREATE INDEX PRODTYPE ON PRODUCTS(PRODTYPE);
 NOTE: Simple index PRODTYPE has been defined.
      QUIT;
 NOTE: PROCEDURE SQL used:
       real time           0.37 seconds


The simple index is assigned a name of PRODTYPE, which must be the same as the column name.

The simple index is defined on the PRODUCTS table.

The PRODTYPE column in the PRODUCTS table is designated as the column to be used by the index.

6.3.3. Creating a Composite Index

A composite index is specifically defined for two or more columns in a table and must have a different name from the columns. Suppose you had to create an index consisting of manufacturer number (MANUNUM) and product type (PRODTYPE) located in the PRODUCTS table. You should be aware that only one composite index is allowed per set of columns, but more than one composite index is allowed. The composite index, as with the simple index, becomes a separate object located in the SAS library.

SQL Code

PROC SQL;
  CREATE INDEX
						1
						2
						3
         MANUNUM_PRODTYPE ON PRODUCTS(MANUNUM,PRODTYPE);
QUIT;

SAS Log Results

      PROC SQL;
        CREATE INDEX    1
												2
												3
               MANUNUM_PRODTYPE ON PRODUCTS
(MANUNUM,PRODTYPE);
 NOTE: Composite index MANUNUM_PRODTYPE has been
 defined.
      QUIT;
 NOTE: PROCEDURE SQL used:
       real time           0.00 seconds


The composite index is assigned a name of MANUNUM_PRODTYPE, which is used to represent the MANUNUM and PRODTYPE column names.

The composite index is defined on the PRODUCTS table.

The MANUNUM and PRODTYPE columns in the PRODUCTS table are designated as the columns to be used by the index.

6.3.4. Preventing Duplicate Values in an Index

The UNIQUE keyword prevents the entry of a duplicate value in an index. You should use this keyword with care because there may be times when more than one occurrence of a data value in a table is necessary. When multiple occurrences of the same value appear in a table, the UNIQUE keyword is rejected and the index is not created for that particular column.

6.3.5. Modifying Columns Containing Indexes

Altering the attributes of a column that contains an associated index (simple or composite) does NOT prohibit the values in the altered column from using the index. But, if a column that contains an index is dropped, then the index is also dropped. Accordingly, when a column is dropped, any data in that index is also lost.

6.3.6. Deleting (Dropping) Indexes

When one or more indexes are no longer needed, the DROP INDEX statement can be used to remove them. Suppose you determine that you no longer need the composite index MANUNUM_PRODTYPE (created earlier) because processing requirements have changed. The next example illustrates a single composite index being deleted from the SAS library.

SQL Code

PROC SQL;
  DROP INDEX MANUNUM_PRODTYPE
    FROM PRODUCTS;
QUIT;

SAS Log Results

       PROC SQL;
       DROP INDEX MANUNUM_PRODTYPE
          FROM PRODUCTS;
 NOTE: Index MANUNUM_PRODTYPE has been dropped.
       QUIT;
 NOTE: PROCEDURE SQL used:
       real time           0.00 seconds


According to the ANSI SQL standard, two or more indexes can also be deleted in a DROP INDEX statement. The next example illustrates the MANUNUM and PRODTYPE indexes being deleted from the SAS library in a single DROP INDEX statement.

SQL Code

PROC SQL;
  DROP INDEX MANUNUM, PRODTYPE
    FROM PRODUCTS;
QUIT;

SAS Log Results

  PROC SQL;
    DROP INDEX MANUNUM, PRODTYPE
      FROM PRODUCTS;
NOTE: Index MANUNUM has been dropped.
NOTE: Index PRODTYPE has been dropped.
  QUIT;
NOTE: PROCEDURE SQL used (Total process time):
      real time           0.00 seconds
      cpu time            0.01 seconds


..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
18.117.187.113