Set the cart before the horse.
This chapter covers basic material about generating and reading execution plans. It’s optional, in terms of both when and whether you need to read it for the rest of the book to make sense. The database vendors all provide specialized, often graphical tools to generate and view execution plans. There are also popular third-party tools, such as TOAD, for this purpose. If you have access to these well-documented tools and already know how to use them, you can probably skip or skim this chapter. Otherwise, this chapter is not intended to replace or compete with specialized tools or their documentation. Instead, I describe the most basic methods of generating and reading execution plans, methods that are guaranteed to be available to you regardless of the available tools in your environment. These basic methods are especially useful to know if you work in diverse environments, where you cannot count on having the specialized tools readily available. If you already have and use more elaborate tools, you won’t need (and might not even like) my methods. In my own work, across diverse environments, I never bother with the more elaborate tools. I have found that when you know which execution plan you want and how to get it, simple tools, native to the database, will suffice. Reading an execution plan is just a quick check for whether the database is using the desired plan.
If you choose to read this chapter, you can probably skip straight to the section on reading execution plans for your choice of vendor database, unless you want to tune on multiple vendor databases. Each of those sections stands alone, even repeating material from the other sections, when applicable. However, as you read this chapter, please keep in mind that the execution plans you see will not really be useful to you until you have learned the material of Chapters 5 through 7. These later chapters will teach you how to decide which execution plan you even want, and viewing execution plans is of little use unless you know which plan you want.
Oracle uses a SQL-centric approach to generating and displaying execution plans. You use SQL to place plan data into a table, after which you can view the data with a normal SQL query. The process can seem awkward at first, especially if you perform it manually. SQL Server sends execution-plan descriptions directly to your screen upon request, but Oracle’s SQL-centric approach, writing to a plan table, is much more flexible when you wish to automate the process or analyze whole sets of execution plans at once.
Oracle places execution-plan data into a table, which is normally called PLAN_TABLE. If you do not already have a PLAN_TABLE in the schema you are using to investigate execution plans, create one. You can create an up-to-date PLAN_TABLE with the utlxplan.sql script in the rdbms/admin directory under ORACLE_HOME. If you cannot reach ORACLE_HOME, you can create a serviceable PLAN_TABLE with this script:
CREATE TABLE PLAN_TABLE(
    STATEMENT_ID    VARCHAR2(30),
    TIMESTAMP       DATE,
    REMARKS         VARCHAR2(80),
    OPERATION       VARCHAR2(30),
    OPTIONS         VARCHAR2(30),
    OBJECT_NODE     VARCHAR2(128),
    OBJECT_OWNER    VARCHAR2(30),
    OBJECT_NAME     VARCHAR2(30),
    OBJECT_INSTANCE NUMBER(38),
    OBJECT_TYPE     VARCHAR2(30),
    OPTIMIZER       VARCHAR2(255),
    SEARCH_COLUMNS  NUMBER(38),
    ID              NUMBER(38),
    PARENT_ID       NUMBER(38),
    POSITION        NUMBER(38),
    COST            NUMBER(38),
    CARDINALITY     NUMBER(38),
    BYTES           NUMBER(38),
    OTHER_TAG       VARCHAR2(255),
    OTHER           LONG);
You use a four-step process from SQL*Plus to generate and display execution plans on Oracle with the least interference to other end users, who may also be using the plan table:
1. Delete all rows from Oracle's special execution-plan table PLAN_TABLE in the schema you are using to generate the execution plans. You can generate an execution plan for a SQL statement only from a database user that has the privilege to run that SQL statement. Therefore, you usually generate execution plans while connected to the same schema in which the SQL to be tuned runs.

   It is sometimes tempting to set up special analysis-only database users and schemas for purposes such as generating execution plans, and to grant the users enough privilege to execute the SQL to be tuned. This approach must be used with caution, because the special schemas will operate from their own namespace (potentially seeing different versions of a view, for example). When you connect to these special users, the database will potentially interpret a given query differently than it is interpreted within the business application, yielding a different execution plan.

2. Generate the execution-plan records in PLAN_TABLE with the SQL statement EXPLAIN PLAN FOR <Statement_To_Be_Tuned>;

3. Display the execution plan with a statement like this:

   SELECT LPAD(' ',2*(LEVEL-1))||OPERATION||' '||OPTIONS||' '||
          DECODE(OBJECT_INSTANCE, NULL, OBJECT_NAME,
                 TO_CHAR(OBJECT_INSTANCE)||'*'||OBJECT_NAME) PLAN
   FROM PLAN_TABLE
   START WITH ID=0
   CONNECT BY PRIOR ID = PARENT_ID
   ORDER BY ID;

4. Clean up your work with ROLLBACK;
Let’s follow this process to analyze the execution plan for a simple query:
SELECT Last_Name, First_Name, Salary
FROM Employees
WHERE Manager_ID=137
ORDER BY Last_Name, First_Name;
Following is the actual content of a SQL*Plus session to manually determine the execution plan of this query:
SQL> delete from plan_table;

0 rows deleted.

SQL> EXPLAIN PLAN FOR SELECT Last_Name, First_Name, Salary FROM Employees
  2  WHERE Manager_ID=137
  3  ORDER BY Last_Name, First_Name;

Explained.

SQL> SELECT LPAD(' ',2*(LEVEL-1))||OPERATION||' '||OPTIONS||' '||
  2  DECODE(OBJECT_INSTANCE, NULL, OBJECT_NAME,
  3  TO_CHAR(OBJECT_INSTANCE)||'*'|| OBJECT_NAME) PLAN
  4  FROM PLAN_TABLE
  5  START WITH ID=0
  6  CONNECT BY PRIOR ID = PARENT_ID
  7  ORDER BY ID;

PLAN
--------------------------------------------------------------------------------
SELECT STATEMENT
  SORT ORDER BY
    TABLE ACCESS BY INDEX ROWID 1*EMPLOYEES
      INDEX RANGE SCAN EMPLOYEES_MANAGER_ID

4 rows selected.

SQL> rollback;

Rollback complete.
This shows an execution plan that finds the index range (on the index Employees_Manager_ID) that covers employees who report to the manager with ID 137. That index range scan (as shown in the last row of output above the feedback 4 rows selected) delivers a list of rowids that point to specific rows in specific blocks of the Employees table. For each of those rowids, Oracle performs logical I/O and, if necessary, physical I/O to the necessary table block, where it finds the specific row indicated. Following the table reads, Oracle sorts the rows in ascending order, based on the indicated ORDER BY columns.
To a beginner, Oracle’s process for displaying execution plans looks clumsy, I know, but you can automate the underlying steps with a little simple scripting. If you are working from Unix, create the following files:
-- File called head.sql:
set pagesize 999
set feedback off
DELETE FROM PLAN_TABLE WHERE STATEMENT_ID = '<Your name>';
EXPLAIN PLAN SET STATEMENT_ID = '<Your name>' FOR

-- File called tail.sql:
SELECT LPAD(' ',2*(LEVEL-1))||OPERATION||' '||OPTIONS||' '||
       DECODE(OBJECT_INSTANCE, NULL, OBJECT_NAME,
              TO_CHAR(OBJECT_INSTANCE)||'*'||OBJECT_NAME) PLAN
FROM PLAN_TABLE
START WITH ID=0 AND STATEMENT_ID = '<Your name>'
CONNECT BY PRIOR ID = PARENT_ID AND STATEMENT_ID = '<Your name>'
ORDER BY ID;
ROLLBACK;

-- File called ex.sql:
!cat head.sql tmp.sql tail.sql > tmp2.sql
spool tmp.out
@tmp2
spool off
You then can iterate execution plans rapidly by editing a copy of the SQL in question (complete with terminating ;) in tmp.sql, using the editor of your choice, in one window. In another window, start a SQL*Plus session from the directory that holds head.sql, tail.sql, ex.sql, and tmp.sql. Generate new execution plans for the current version of tmp.sql (after you save it!) by issuing the command @ex from the SQL> prompt in the window that is running SQL*Plus. The process for analyzing and displaying execution plans then becomes:
1. Place the bare SQL to be analyzed into tmp.sql, in the same directory as ex.sql, head.sql, and tail.sql.

2. From a SQL*Plus session started in that same directory, run @ex from the SQL> prompt.

3. View the execution plan.

4. Tweak the database (for example, with index changes) and the SQL to be tuned in tmp.sql (following the methods of Chapter 4).

5. Save tmp.sql and loop back to Step 2. Repeat until you have the execution plan you want, and then save the corrected result someplace permanent.
With this process, it takes just seconds to make a change and see the results. If you need to print the execution plan or to view it with an editor (especially if it is large), it is already available, spooled to the file tmp.out.
In operating systems other than Unix, you can try similar tricks, or you can always just add the contents of head.sql to the top of tmp.sql, add the contents of tail.sql to the bottom, and run @tmp from the SQL> prompt, an approach that works in any operating system.
In practice, about half the changes you will make to force the execution plan you want will be to tmp.sql, and the other half will be to the database environment, through SQL*Plus, with operations such as creating and dropping indexes, generating table and index statistics, or modifying session optimization parameters.
When tuning SQL, you’ll usually want to verify that you are getting simple execution plans that drive through nested loops in the correct join order. I refer to these execution plans as robust, because they tend to scale well to high data volumes. Here’s an example that returns a robust plan, to make the process clear, with the following SQL statement to be tuned, placed in tmp.sql:
-- File called tmp.sql
SELECT /*+ RULE */ E.First_Name, E.Last_Name, E.Salary, LE.Description,
       M.First_Name, M.Last_Name, LM.Description
FROM Locations LE, Locations LM, Employees M, Employees E
WHERE E.Last_Name = :1
  AND E.Manager_ID=M.Employee_ID
  AND E.Location_ID=LE.Location_ID
  AND M.Location_ID=LM.Location_ID
  AND UPPER(LE.Description)=:2;
From SQL*Plus, in the directory with tmp.sql, head.sql, tail.sql, and ex.sql, the command @ex from the SQL> prompt produces the following output, with indexes only on the primary keys and on Employees(Last_Name):
SQL>@ex
PLAN
----------------------------------------------------------------------
SELECT STATEMENT
  NESTED LOOPS
    NESTED LOOPS
      NESTED LOOPS
        TABLE ACCESS BY INDEX ROWID 4*EMPLOYEES
          INDEX RANGE SCAN EMPLOYEE_LAST_NAME
        TABLE ACCESS BY INDEX ROWID 3*EMPLOYEES
          INDEX UNIQUE SCAN EMPLOYEE_PKEY
      TABLE ACCESS BY INDEX ROWID 2*LOCATIONS
        INDEX UNIQUE SCAN LOCATION_PKEY
    TABLE ACCESS BY INDEX ROWID 1*LOCATIONS
      INDEX UNIQUE SCAN LOCATION_PKEY
SQL>
The preceding example uses a RULE hint for convenience only, not to imply that you should prefer the rule-based optimizer. A RULE hint is just a convenient way to get a reproducible, nested-loops plan on empty tables, such as I wished to demonstrate.
Here is how you read the execution-plan output:
All joins are nested loops, based on the nested series of rows stating NESTED LOOPS. If you have a mix of join methods, the first join executed will be the innermost (most highly indented) one, the last one listed. You'll read the order of join methods executed from the inside out, or from the bottom up.
This standard way of displaying Oracle execution plans is confusing, if you think about it. If you were to implement comparable nested loops as your own procedural program, the first join, shown as the innermost loop, would actually be the outermost loop in the true nested-loops structure! When I first drafted Chapter 2, I even erroneously described the first-executed nested-loops join as the innermost loop, since I was so used to the way Oracle displays execution plans. An alternative method of display would be useful, if everyone could start from scratch. Unfortunately, by now, so many tools and so much practice and education have trained Oracle developers to expect this form of output that changing it would only add to the confusion. If you are new to this, take heart: it will feel natural soon enough.
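The inversion is easier to see in procedural form. Here is a minimal Python sketch of a single nested-loops join, using toy data with a dict standing in for a unique index; it is purely illustrative (the table and column names are hypothetical), not a model of Oracle's internals. The driving table's scan, which the plan display shows as the innermost line, is the outermost loop of the program:

```python
# Toy data: the driving table and a dict acting as a unique primary-key index.
employees = [{"name": "SMITH", "loc_id": 10}, {"name": "JONES", "loc_id": 20}]
loc_pkey = {10: "Dallas", 20: "Houston"}  # unique-index probe target

def one_join(rows, index):
    out = []
    for row in rows:                      # OUTER loop: the driving table,
        loc = index.get(row["loc_id"])    # which the plan display shows as
        if loc is not None:               # the INNERMOST (most indented) line
            out.append((row["name"], loc))
    return out

print(one_join(employees, loc_pkey))
```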
The order of table access is Employees, twice, followed by Locations, twice, the same order they appear in the execution-plan output. When SQL references the same tables multiple times, aliases for those tables are mandatory. As you can see in the example FROM clause, the Employees table is aliased to both E and M. You might guess from the index choices that alias E, rather than alias M, represents the driving table, even though both aliases map to the same Employees table. It is less obvious which alias mapping to Locations the database reaches first. This is where the numbers in front of the table names come in: they indicate the order of the alias reference in the FROM clause, so you know that the first Locations alias, LE, is actually the last one the execution plan reaches.

This addition of the number in front of the table name is the only real change I have made from the standard form that Oracle developers use to view execution plans. My addition of TO_CHAR(OBJECT_INSTANCE)||'*' in the plan-display SQL adds this ambiguity-resolving feature. The number helps in cases when the same table appears multiple times in a FROM clause but one join order to those aliases is superior to another.
All four table reads are through some index, as shown by the phrase TABLE ACCESS BY INDEX ROWID in front of each table name. The indexes used, and indication of whether the index use is guaranteed to be unique, come in the indented entries just below each table access. Thus, you know that the driving table E is reached through an index range scan (a read that at least potentially touches multiple rows at a time) on the index EMPLOYEE_LAST_NAME. The rest of the table accesses are unique reads through the tables' primary keys. Since all reads after the driving table are for unique joins, you know that the query will read at most the same number of rows for each of these other tables as it reads for the driving table.
For this example, I contrived index names that make clear which indexed column provides the table access, but indexes are often much more cryptically named than this. If it is not completely clear which column or columns are included in the index used, do not guess—check! One of the most common pitfalls in tuning on Oracle is to assume that the index range scan you wanted is the index range scan you got!
When you find unique scans on an index, you can safely assume they serve an equality condition on a unique key. There is usually only one column or combination of columns the index could cover to provide that unique scan, but even if there is a choice, it does not especially matter which unique condition the database uses, so you can safely guess. Index range scans are another matter. If you do not already know the indexes for a table and how they are named for each combination of columns, and if the index names do not resolve the question, always check in case the index range scan is not the one you expected. The simplest script to provide this check is as follows:
-- File called index.sql
column column_name format a40
set pagesize 999
SELECT INDEX_NAME, COLUMN_NAME
FROM USER_IND_COLUMNS
WHERE TABLE_NAME = UPPER('&&1')
ORDER BY INDEX_NAME, COLUMN_POSITION;
From SQL*Plus, logged into the schema that holds the table you need to check, run @index <NameOfTable> from the SQL> prompt. The script lists multicolumn indexes in order, first column first. Here is an example use of this script:
SQL>@index Locations
INDEX_NAME COLUMN_NAME
------------------------------ ----------------------------------------
LOCATION_PKEY LOCATION_ID
SQL>
To see functional indexes, where those would apply (usually where you are matching UPPER(<Some_Column>) or LOWER(<Some_Column>), or a type conversion on a column), use the findex.sql script:
-- File called findex.sql
set long 40
set pagesize 999
SELECT INDEX_NAME, COLUMN_EXPRESSION
FROM USER_IND_EXPRESSIONS
WHERE TABLE_NAME = UPPER('&&1')
ORDER BY INDEX_NAME, COLUMN_POSITION;
I just explained how to find the join order, the join methods, and the table-access methods for the robust execution plan I showed earlier. If you combine that with the basics covered in Chapter 2, you should understand how Oracle will reach the data, from end to end. To test your understanding, try constructing a narrative that explains the full execution plan in English, as a set of instructions to the database. Compare your result with what follows. If it does not match well, try again later, after you have read a few more execution plans, to see if your understanding has improved. Here is the execution plan expressed in narrative form, as instructions to the database:
1. Using the condition E.Last_Name = :1, go to the index EMPLOYEE_LAST_NAME and find the list of rowids that correspond to employees with the requested last name.

2. For each of these rowids, go to the table Employees (E) with a single-block read (logical read, physical when necessary) according to each rowid from the previous step, using the block-address part of the rowid. Using the row-address part of the rowid, find the specific row that the rowid points to and read all necessary data (requested data for alias E) from that row.

3. For each such row, using the join condition E.Manager_ID=M.Employee_ID, go to the primary-key index EMPLOYEE_PKEY to find a single matching rowid that corresponds to the employee record of the manager for the employee whose record you already read. If no matching row is found, discard the result row being built.

4. Otherwise, for the matching rowid, go to the table Employees (M) with a single-block read (logical read, physical when necessary) according to the rowid from the previous step, using the block-address part of the rowid. Using the row-address part of the rowid, find the specific row that the rowid points to and read all necessary data (requested data for alias M) from that row. Append the applicable data to the incoming row from the earlier table read to build a partial result row.

5. For each such row, using the join condition M.Location_ID=LM.Location_ID, go to the primary-key index LOCATION_PKEY to find a single matching rowid that corresponds to the location record that matches the manager for the employee whose record you already read. If no matching row is found, discard the result row being built.

6. Otherwise, for the matching rowid, go to the table Locations (LM) with a single-block read (logical read, physical when necessary) according to the rowid from the previous step, using the block-address part of the rowid. Using the row-address part of the rowid, find the specific row that the rowid points to and read all necessary data (requested data for alias LM) from that row. Append the applicable data to the incoming row from the earlier table reads to build a partial result row.

7. For each such row, using the join condition E.Location_ID=LE.Location_ID, go to the primary-key index LOCATION_PKEY to find a single matching rowid that corresponds to the location record that matches the employee whose record you already read. If no matching row is found, discard the result row being built.

8. Otherwise, for the matching rowid, go to the table Locations (LE) with a single-block read (logical read, physical when necessary) according to the rowid from the previous step, using the block-address part of the rowid. Using the row-address part of the rowid, find the specific row that the rowid points to and read all necessary data (requested data for alias LE) from that row. Append the applicable data to the incoming row from the earlier table reads to complete the result row. Discard the whole result row if it contains data that fails to meet the condition UPPER(LE.Description)=:2. Otherwise, immediately return the fully built result row.
You will find no explicit step in the execution plan for this last filter, which discards rows that fail to meet the condition on the location description. I call this filter a post-read filter, since it does not contribute to the method of reaching a table row but instead is used to discard some rows after they are read. Oracle does not make the discard actions on post-read filters explicit in the execution plan, but you can always count on Oracle to apply them at the first opportunity, as soon as it has reached the data necessary to evaluate the truth of their conditions. If the execution plan included further joins after this last join, Oracle would only perform those joins on rows that passed this post-read filter, discarding the rest.
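The narrative above can be sketched as a procedural program. The following Python sketch uses hypothetical toy rows, with dicts standing in for the named indexes; it illustrates only the control flow of the steps (including the post-read filter), not Oracle's actual block-and-rowid machinery:

```python
# Hypothetical in-memory stand-ins for the example tables; dicts play the
# role of the indexes named in the plan.
employees = [
    {"emp_id": 1, "first": "ANN", "last": "SMITH", "mgr_id": 2, "loc_id": 10},
    {"emp_id": 2, "first": "BOB", "last": "JONES", "mgr_id": None, "loc_id": 20},
]
locations = [
    {"loc_id": 10, "descr": "Dallas"},
    {"loc_id": 20, "descr": "Houston"},
]
emp_by_last = {}                                  # EMPLOYEE_LAST_NAME (range)
for e in employees:
    emp_by_last.setdefault(e["last"], []).append(e)
emp_pkey = {e["emp_id"]: e for e in employees}    # EMPLOYEE_PKEY (unique)
loc_pkey = {l["loc_id"]: l for l in locations}    # LOCATION_PKEY (unique)

def run_plan(last_name, upper_descr):
    rows = []
    for e in emp_by_last.get(last_name, []):      # steps 1-2: range scan on E
        m = emp_pkey.get(e["mgr_id"])             # steps 3-4: unique probe for M
        if m is None:
            continue                              # discard the row being built
        lm = loc_pkey.get(m["loc_id"])            # steps 5-6: unique probe for LM
        if lm is None:
            continue
        le = loc_pkey.get(e["loc_id"])            # steps 7-8: unique probe for LE
        if le is None:
            continue
        if le["descr"].upper() != upper_descr:    # post-read filter on :2
            continue
        rows.append((e["first"], e["last"], lm["descr"], le["descr"]))
    return rows

print(run_plan("SMITH", "DALLAS"))
```

Note how the post-read filter is just an extra discard test at the point where the LE data first becomes available, exactly as described above.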
Execution plans for the SQL you tune will often be nonrobust in the beginning, often as a part of the performance problem you must resolve. These nonrobust execution plans use join methods other than nested loops. You often do not need to understand the nonoptimal execution plans you start with in detail, as long as you can recognize that they are not the plans you want. However, it is useful to have at least a rough idea of why the starting execution plans are as slow as they are, to guess how much better your optimal plans will be. Now, I’ll show how alternative execution plans appear for the query you’ve been looking at for the past couple sections. If I drop all the indexes, the rule-based optimizer delivers a new execution plan:
PLAN
--------------------------------------------------------------------------------
SELECT STATEMENT
  MERGE JOIN
    SORT JOIN
      MERGE JOIN
        SORT JOIN
          MERGE JOIN
            SORT JOIN
              TABLE ACCESS FULL 4*EMPLOYEES
            SORT JOIN
              TABLE ACCESS FULL 3*EMPLOYEES
        SORT JOIN
          TABLE ACCESS FULL 2*LOCATIONS
    SORT JOIN
      TABLE ACCESS FULL 1*LOCATIONS
This shows the same join order, but now the database performs sort-merge joins and finds the rows for each table through full table scans.
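For a rough idea of what each SORT JOIN/MERGE JOIN pair does, here is a minimal Python sketch of a sort-merge join on one equality key, using hypothetical toy (key, payload) tuples rather than real table rows:

```python
# Minimal sketch of a sort-merge join on one equality key.
def merge_join(left, right):
    left = sorted(left)      # SORT JOIN on each input
    right = sorted(right)
    out, i, j = [], 0, 0
    while i < len(left) and j < len(right):   # MERGE JOIN: advance in step
        lk, rk = left[i][0], right[j][0]
        if lk < rk:
            i += 1
        elif lk > rk:
            j += 1
        else:
            # emit every pairing for this key value before moving on
            j2 = j
            while j2 < len(right) and right[j2][0] == lk:
                out.append((left[i][1], right[j2][1]))
                j2 += 1
            i += 1
    return out

print(merge_join([(1, "a"), (2, "b")], [(2, "x"), (3, "y")]))
```

The sort of each input up front is what makes this method expensive on large row sources, compared with a nested-loops plan that reads only the rows it needs.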
Hash joins are more common than merge joins in cost-based execution plans, and you will occasionally even prefer them over nested-loops joins, so I next show an example that produces this style of join. Note that the original SQL that produced the previous plan has a hint (/*+ RULE */) immediately following the SELECT keyword. If I replace the hint /*+ RULE */ with /*+ ORDERED USE_HASH(M LE LM) */ and reverse the order of the FROM clause, with empty tables, no indexes, and complete statistics, the cost-based optimizer delivers a new execution plan:
PLAN
----------------------------------------------------------------------
SELECT STATEMENT
  HASH JOIN
    HASH JOIN
      HASH JOIN
        TABLE ACCESS FULL 1*EMPLOYEES
        TABLE ACCESS FULL 2*EMPLOYEES
      TABLE ACCESS FULL 3*LOCATIONS
    TABLE ACCESS FULL 4*LOCATIONS
This is identical to the previous execution plan, except that it replaces the merge joins with hash joins.
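A hash join, by contrast, builds an in-memory hash table on one input and probes it while scanning the other, with no sorting. A minimal Python sketch, again on hypothetical toy (key, payload) tuples:

```python
# Minimal sketch of a hash join: build on one input, probe with the other.
def hash_join(build_rows, probe_rows):
    table = {}
    for key, payload in build_rows:       # build phase (one full scan)
        table.setdefault(key, []).append(payload)
    out = []
    for key, payload in probe_rows:       # probe phase (the other full scan)
        for match in table.get(key, []):
            out.append((match, payload))
    return out

print(hash_join([(10, "Dallas")], [(10, "SMITH"), (20, "JONES")]))
```

This avoids the per-input sorts of a merge join, which is one reason the cost-based optimizer usually prefers hash joins when it chooses against nested loops.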
There are other execution-plan features that you will see regularly, such as indicators of which joins are outer joins, and steps for sorts and for sort-unique operations that discard duplicates, but these are fairly self-explanatory and are not usually important to performance. The only remaining important subtleties that you will often see deal with subqueries and multipart execution plans. I'll cover both of these at once with one final example:
SELECT /*+ RULE */ E.First_Name, E.Nickname, E.Last_Name,
       E.Phone_Number, L.Description
FROM Employees E, Locations L
WHERE (E.First_Name='Kathy' OR E.Nickname='Kathy')
  AND E.Location_ID=L.Location_ID
  AND EXISTS (SELECT null
              FROM Wage_Payments P
              WHERE P.Employee_ID=E.Employee_ID
                AND P.Payment_Date > sysdate-31);
Place indexes on:
Employees(First_Name)
Employees(Nickname)
Locations(Location_ID)
Wage_Payments(Employee_ID)
You then find the following execution plan:
PLAN
----------------------------------------------------------------------
SELECT STATEMENT
  CONCATENATION
    FILTER
      NESTED LOOPS
        TABLE ACCESS BY INDEX ROWID 1*EMPLOYEES
          INDEX RANGE SCAN EMPLOYEE_NICKNAME
        TABLE ACCESS BY INDEX ROWID 2*LOCATIONS
          INDEX UNIQUE SCAN LOCATION_PKEY
      TABLE ACCESS BY INDEX ROWID 3*WAGE_PAYMENTS
        INDEX RANGE SCAN WAGE_PAYMENT_EMPLOYEE_ID
    FILTER
      NESTED LOOPS
        TABLE ACCESS BY INDEX ROWID 1*EMPLOYEES
          INDEX RANGE SCAN EMPLOYEE_FIRST_NAME
        TABLE ACCESS BY INDEX ROWID 2*LOCATIONS
          INDEX UNIQUE SCAN LOCATION_PKEY
The CONCATENATION step indicates that the optimizer has implemented this as the implicit UNION of essentially two distinct queries, one driving from the index on First_Name and the other driving from the index on Nickname. Following the completion of the outer query, the FILTER step implements the correlation join on P.Employee_ID=E.Employee_ID, following the index on the foreign key from Wage_Payments to Employees. This FILTER step is really no different than a nested-loops join, except that it halts after finding the first matching row, if there is one. Note that the second FILTER step refers back to the same correlation join to Wage_Payments as the first FILTER step. This is an artifact of the concatenated execution plan, which repeats the steps for the joins in the outer query, but not the steps for the correlated join.
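The FILTER step's halt-at-first-match behavior can be sketched in a few lines of Python, using hypothetical toy payment data (the employee IDs, dates, and cutoff below are invented for illustration). Python's any() short-circuits at the first qualifying element, just as the FILTER step stops after one matching Wage_Payments row:

```python
from datetime import date

# Hypothetical toy payments per employee; in the real query these rows
# live in Wage_Payments and are reached through the foreign-key index.
payments = {101: [date(2024, 1, 5), date(2024, 2, 5)], 102: [date(2020, 1, 5)]}
cutoff = date(2023, 12, 31)  # stands in for sysdate-31

def has_recent_payment(emp_id):
    # any() stops at the first qualifying element, mirroring how the
    # FILTER step halts after finding one matching row.
    return any(d > cutoff for d in payments.get(emp_id, []))

print([e for e in (101, 102, 103) if has_recent_payment(e)])
```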
DB2 uses multiple approaches to generate and display execution plans. You use SQL to place plan data into a table, after which you can view the data by several means. These are the primary methods that IBM itself describes in its documentation:
Visual Explain requires a client installation on your workstation and is not available on all supported platforms. For that reason, I’ve never used it; I prefer a tool that I can always count on being readily accessible.
A second tool runs from the command line in any environment, including nongraphical environments, so you can count on it being available. However, I find that it tells me far more than I want to know, making it hard to see the forest for the trees, so to speak. For example, it produced a 1,216-line report for an execution plan of a simple four-way join. Even the portion of the report that shows the big picture is hard to use. It displays the execution-plan tree in an ASCII text layout that mimics a graphical picture of the tree structure, but it requires far more line-width than you can easily view for all but the simplest execution plans.
The third approach, querying the plan tables directly with your own SQL, works best for me, so I describe it in detail in this section. If you already know how to answer the basic questions about an execution plan (e.g., the join order, the join methods, and the table-access methods) using the other tools, you probably don't need this section and can function well with the method you already know.
DB2 places execution-plan data into a set of seven tables, all with names beginning with EXPLAIN_.
To create these tables, run the EXPLAIN.DDL script located in the misc subdirectory under the sqllib directory, while connected as the user you will use when generating execution plans. From the misc directory, at the Unix prompt, you then execute the command:
db2 -tf EXPLAIN.DDL
DB2's plan tables contain a hierarchy of data about each execution plan stored, with EXPLAIN_INSTANCE at the top of the hierarchy with one row per execution plan. When you delete an EXPLAIN_INSTANCE row, the delete cascades to remove details for that execution plan from the other tables as well. Normally, your execution plans end up in these tables in the schema that belongs to the end user you logged on as. For example, you might have connected with this command:

CONNECT TO Server_Name USER User_Name USING SomePassword;
In this case, you likely set your schema to the schema that contains the application data, so you could run and explain queries against that data:
SET SCHEMA Appl_Schema;
However, this latter step has no effect on where execution plans you generate will end up; they still go to EXPLAIN_ tables in the User_Name schema.
You use a four-step process from the DB2 command-line interpreter to generate and display execution plans with the least interference to other end users who might also be using the plan tables:
1. Delete all rows from the top-level execution-plan table EXPLAIN_INSTANCE in the schema you are using to store the execution plans, usually the schema belonging to the user you logged in as. The DELETE from the EXPLAIN_INSTANCE table automatically cascades to clean up the execution-plan data in the other six tables as well.

2. Generate the execution-plan records with the SQL statement EXPLAIN PLAN FOR <Statement_To_Be_Tuned>;

3. Display the execution plan by any of several means that DB2 provides, as described earlier in this section.

4. Clean up your work with ROLLBACK;
I’ll demonstrate this process to show the execution plan for a simple query:
SELECT Last_Name, First_Name, Salary
FROM Employees
WHERE Manager_ID=137
ORDER BY Last_Name, First_Name;
Here is the actual content of a DB2 session to manually determine the execution plan of this query, with generic passwords and names:
$ db2 +c -t
(c) Copyright IBM Corporation 1993,1997
Command Line Processor for DB2 SDK 5.2.0

You can issue database manager commands and SQL statements from the command
prompt. For example:
    db2 => connect to sample
    db2 => bind sample.bnd

For general help, type: ?. For command help, type: ? command, where command
can be the first few keywords of a database manager command. For example:
    ? CATALOG DATABASE for help on the CATALOG DATABASE command
    ? CATALOG          for help on all of the CATALOG commands.

To exit db2 interactive mode, type QUIT at the command prompt. Outside
interactive mode, all commands must be prefixed with 'db2'.
To list the current command option settings, type LIST COMMAND OPTIONS.

For more detailed help, refer to the Online Reference Manual.

db2 => CONNECT TO Server_Name USER User_Name USING SomePassword;

   Database Connection Information

 Database server       = DB2/SUN 5.2.0
 SQL authorization ID  = USER_NAME
 Local database alias  = SERVER_NAME

db2 => SET SCHEMA Appl_Schema;
DB20000I  The SQL command completed successfully.
db2 => DELETE FROM USER_NAME.EXPLAIN_INSTANCE;
DB20000I  The SQL command completed successfully.
db2 => EXPLAIN PLAN FOR SELECT Last_Name, First_Name, Salary FROM Employees
db2 (cont.) => WHERE Manager_ID=137
db2 (cont.) => ORDER BY Last_Name, First_Name;
DB20000I  The SQL command completed successfully.
db2 => SELECT O.Operator_ID, S2.Target_ID, O.Operator_Type,
db2 (cont.) => S.Object_Name, CAST(O.Total_Cost AS INTEGER) Cost
db2 (cont.) => FROM USER_NAME.EXPLAIN_OPERATOR O
db2 (cont.) => LEFT OUTER JOIN USER_NAME.EXPLAIN_STREAM S2
db2 (cont.) => ON O.Operator_ID=S2.Source_ID
db2 (cont.) => LEFT OUTER JOIN USER_NAME.EXPLAIN_STREAM S
db2 (cont.) => ON O.Operator_ID = S.Target_ID
db2 (cont.) => AND O.Explain_Time = S.Explain_Time
db2 (cont.) => AND S.Object_Name IS NOT NULL
db2 (cont.) => ORDER BY O.Explain_Time ASC, Operator_ID ASC;

OPERATOR_ID TARGET_ID OPERATOR_TYPE OBJECT_NAME        COST
----------- --------- ------------- ------------------ -----------
          1         - RETURN        -                          186
          2         1 TBSCAN        -                          186
          3         2 SORT          -                          186
          4         3 FETCH         EMPLOYEES                  186
          5         4 IXSCAN        EMP_MGR_ID                  25

  5 record(s) selected.

db2 => ROLLBACK;
DB20000I  The SQL command completed successfully.
db2 =>
This shows an execution plan that finds the index range (on the index Emp_Mgr_ID) that covers employees who report to the manager with ID 137. That index range scan delivers a list of rowids that point to specific rows in specific blocks of the Employees table. For each of those rowids, DB2 performs logical I/O and, if necessary, physical I/O to the necessary table block, where it finds the specific row indicated. Following the table reads, DB2 sorts the rows in ascending order into a temporary table, based on the indicated ORDER BY columns. Finally, it scans the temporary table that contains the sorted result.
This form of the query shows steps labeled by OPERATOR_ID and allows tracing of a tree-like plan through the column TARGET_ID. TARGET_ID points to the step that is the parent of the step shown. In the example, each parent has a single child, but many potential steps, such as nested-loops steps, are parents to a pair of later steps. You can use TARGET_ID to lay the steps out in a tree structure that corresponds to the execution plan. DB2's other methods for showing execution plans show this same tree structure directly, though it is hard to see all at once on your screen.
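As a sketch of that layout process, the following Python fragment walks the TARGET_ID parent pointers to print the steps as an indented tree. The plan rows are hand-copied from the example output; the function and variable names are my own, not DB2's:

```python
# Lay out plan steps as an indented tree using the TARGET_ID parent
# pointers. Each tuple is (operator_id, target_id, operator_type,
# object_name), copied from the example plan; target_id None = root.
steps = [
    (1, None, "RETURN", None),
    (2, 1, "TBSCAN", None),
    (3, 2, "SORT", None),
    (4, 3, "FETCH", "EMPLOYEES"),
    (5, 4, "IXSCAN", "EMP_MGR_ID"),
]

def print_tree(steps, parent=None, depth=0):
    """Return the plan steps as indented lines, children under parents."""
    lines = []
    for op_id, target, op_type, obj in steps:
        if target == parent:
            label = f"{op_type} {obj}" if obj else op_type
            lines.append("  " * depth + label)
            lines.extend(print_tree(steps, op_id, depth + 1))
    return lines

for line in print_tree(steps):
    print(line)
```

For this single-child plan the tree is a straight line, but the same walk lays out branching plans, such as nested-loops joins with two children per join step.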
The same sort of tree structure is reflected in the indentation of the execution plans from the earlier query I showed to illustrate Oracle execution plans, but that query uses CONNECT BY, a feature DB2 lacks. SQL Server also uses indentation to show the tree structure of the underlying execution plan, in plans shown with SHOWPLAN_TEXT, described later.
To a beginner, the process for displaying DB2 execution plans looks clumsy, I know, but you can automate the underlying steps with a little simple scripting. If you are working from Unix, create the following files:
-- File called head.sql
DELETE FROM User_Name.EXPLAIN_INSTANCE;

EXPLAIN PLAN FOR

-- File called tail.sql
SELECT O.Operator_ID, S2.Target_ID, O.Operator_Type,
       S.Object_Name, CAST(O.Total_Cost AS INTEGER) Cost
FROM User_Name.EXPLAIN_OPERATOR O
     LEFT OUTER JOIN User_Name.EXPLAIN_STREAM S2
          ON O.Operator_ID=S2.Source_ID
     LEFT OUTER JOIN User_Name.EXPLAIN_STREAM S
          ON O.Operator_ID = S.Target_ID
          AND O.Explain_Time = S.Explain_Time
          AND S.Object_Name IS NOT NULL
ORDER BY O.Explain_Time ASC, Operator_ID ASC;
ROLLBACK;
With the aid of head.sql and tail.sql, the practical process of displaying execution plans, after you have chosen the execution plan you want (see Chapter 5-Chapter 7), becomes:
1. Place the bare SQL to be analyzed into tmp.sql, in the same directory as head.sql and tail.sql.

2. From a DB2 session started in that same directory, after running quit; to reach the shell prompt, run cat head.sql tmp.sql tail.sql | db2 +c +p -t from the shell prompt.

3. Tweak the database (for example, with index changes) and the SQL to be tuned in tmp.sql (following the methods of Chapter 4) and repeat the previous step from the shell prompt until you have the execution plan you want. Then, save the corrected result in a permanent location.
Begin by editing a copy of the SQL in question (complete with terminating semicolon) in tmp.sql, using the editor of your choice, in one window. In another window, start a DB2 session from the directory that holds head.sql, tail.sql, and tmp.sql. Next, exit the db2 command-line processor with quit, but stay at the shell prompt. Generate and view new execution plans for the current version of tmp.sql (after you save it!) with the following command:

cat head.sql tmp.sql tail.sql | db2 +c +p -t
Use your favorite shell shortcut to repeat this command as needed. With this process, it takes just seconds to make a change and see the results. If you need to print the execution plan or to view it with an editor, you can redirect the output:
cat head.sql tmp.sql tail.sql | db2 +c +p -t > tmp.out
In operating systems other than Unix, you can try similar tricks, or you can always just add the contents of head.sql to the top of tmp.sql, add the contents of tail.sql to the bottom, and run the whole script at one time, an approach that works in any operating system. Here is an example of the process in action, with the same query I explained earlier, beginning with the quit command to reach the shell prompt:
db2 => quit;
DB20000I  The QUIT command completed successfully.
$cat head.sql tmp.sql tail.sql | db2 +c +p -t
DB20000I  The SQL command completed successfully.
DB20000I  The SQL command completed successfully.

OPERATOR_ID TARGET_ID OPERATOR_TYPE OBJECT_NAME        COST
----------- --------- ------------- ------------------ -----------
          1         - RETURN        -                          186
          2         1 TBSCAN        -                          186
          3         2 SORT          -                          186
          4         3 FETCH         EMPLOYEES                  186
          5         4 IXSCAN        EMP_MGR_ID                  25

  5 record(s) selected.

DB20000I  The SQL command completed successfully.
$
In practice, about half the changes you will make to force the execution plan you want will be to tmp.sql, and the other half will be to the environment, through the db2 command-line interface, with operations such as creating and dropping indexes, generating table and index statistics, or modifying session optimization parameters.
When tuning SQL, you'll usually want to verify that you are getting simple execution plans that drive through nested loops in the correct join order. I refer to these execution plans as robust, because they tend to scale well to high data volumes. To better understand the process, here's an example that generates a robust plan, with the following SQL statement to be tuned placed in tmp.sql:
-- File called tmp.sql
SELECT E.First_Name, E.Last_Name, E.Salary, LE.Description,
       M.First_Name, M.Last_Name, LM.Description
FROM Employees E
     INNER JOIN Locations LE ON E.Location_ID = LE.Location_ID
     INNER JOIN Employees M ON E.Manager_ID = M.Employee_ID
     INNER JOIN Locations LM ON M.Location_ID = LM.Location_ID
WHERE E.Last_Name = ?
  AND UCASE(LE.Description) = ?;
To demonstrate this SQL on a realistic case, I populated the Employees table with 100,000 rows, having 10,000 different values for Last_Name. I populated the Locations table with 1,000 rows. I quit to the shell prompt after connecting to DB2 in the directory with tmp.sql, head.sql, and tail.sql. I executed cat head.sql tmp.sql tail.sql | db2 +c +p -t from the shell prompt and produced the following output, with indexes only on the primary keys and on Employees(Last_Name):
$cat head.sql tmp.sql tail.sql | db2 +c +p -t
DB20000I The SQL command completed successfully.
DB20000I The SQL command completed successfully.
OPERATOR_ID TARGET_ID OPERATOR_TYPE OBJECT_NAME COST
----------- --------- ------------- ------------------ -----------
1 - RETURN - 305
2 1 NLJOIN - 305
3 2 NLJOIN - 285
4 3 NLJOIN - 260
5 4 FETCH EMPLOYEES 80
6 5 IXSCAN EMP_LAST_NAME 50
7 4 FETCH LOCATIONS 50
8 7 IXSCAN LOCATION_PKEY 25
9 3 FETCH EMPLOYEES 75
10 9 IXSCAN EMPLOYEE_PKEY 50
11 2 FETCH LOCATIONS 50
12 11 IXSCAN LOCATION_PKEY 25
12 record(s) selected.
DB20000I The SQL command completed successfully.
$
Here is how you read the execution plan output:
All joins are nested loops, based on the series of rows that state NLJOIN. If you have a mix of join methods, the first join executed will be the last one listed. You read the order of join methods executed from the bottom up.

The order of table access is Employees, Locations, Employees, Locations, the same order in which they appear in the execution plan output. When SQL references the same tables multiple times, aliases for those tables are mandatory. As you can see in the example FROM clause, the Employees table is aliased to both E and M. You can tell from the index choices that alias E, rather than alias M, represents the driving table, even though both aliases map to the same Employees table. It is less obvious which alias that maps to Locations the database reaches first, but it must be LE, since only that alias is reachable second in the join order.

All four table reads are through some index, as shown by the OPERATOR_TYPE FETCH in front of each table name. The indexes used come in the OPERATOR_TYPE IXSCAN entries just below each table access. Thus, you know that the driving table E is reached through an index range scan (a read that potentially touches multiple rows at a time) on the index EMP_LAST_NAME. The rest of the table accesses are unique reads, since they use equality conditions on the tables' primary keys. Since all reads after the driving table are for unique joins, you know that the query will read at most the same number of rows for each of these other tables as it reads for the driving table.
For this example, I contrived index names that make clear which indexed column provides the table access, but indexes are often much more cryptically named than this. If it is not completely clear which column or columns are included in the index used, do not guess—check! One of the most common pitfalls in tuning is to assume that the index range scan you wanted is the index range scan you got!
If you do not already know the indexes on a table, you will not know how the indexes are named for each combination of columns, and the index names alone will not resolve the question. Always check, in case the index range scan is not the one you expected. The simplest script to provide this check is as follows:
-- File called inddb2.sql
SELECT IndName, ColNames
FROM SYSCAT.INDEXES
WHERE TabName = UCASE('EMPLOYEES');
From DB2, logged into the schema that holds the table you need to check, edit the script to reference the table you want to investigate and run db2 -tf inddb2.sql from the shell prompt. The script lists multicolumn indexes in order, first column first, on a single line, separated by + signs. Here is an example of the use of this script:
$db2 -tf inddb2.sql
INDNAME COLNAMES
------------------ ---------------------
EMP_MGR_ID +MANAGER_ID
EMPLOYEE_PKEY +EMPLOYEE_ID
EMP_LOCATION_ID +LOCATION_ID
EMP_DEPARTMENT_ID +DEPARTMENT_ID
EMP_HIRE_DATE +HIRE_DATE
EMP_LAST_NAME +LAST_NAME
EMP_NICKNAME +NICKNAME
EMP_FIRST_NAME +FIRST_NAME
8 record(s) selected.
I just explained how to find the join order, the join methods, and the table-access methods for the robust execution plan I showed earlier. If you combine that with the basics covered in Chapter 2, you should understand how DB2 will reach the data, from end to end. To test your understanding, try constructing a narrative that explains the full execution plan in English, as a set of instructions to the database. Compare your result with what follows. If it does not match well, try again later, after you have read a few more execution plans, to see if your understanding has improved. Here is the execution plan expressed in narrative form, as instructions to the database:
Using the condition E.Last_Name = ?, go to the index EMP_LAST_NAME and find the list of rowids that correspond to employees with the requested last name.

For each of these rowids, go to the table Employees (E) with a single-block read (logical read, physical when necessary) according to each rowid from the previous step, using the block-address part of the rowid. Using the row-address part of the rowid, find the specific row that the rowid points to and read all necessary data (requested data for alias E) from that row.

For each such row, using the join condition E.Location_ID=LE.Location_ID, go to the primary-key index LOCATION_PKEY to find a single matching rowid that corresponds to the location record that matches the employee whose record you already read. If no matching row is found, discard the result row being built.

Otherwise, for the matching rowid, go to the table Locations (LE) with a single-block read (logical read, physical when necessary) according to the rowid from the previous step, using the block-address part of the rowid. Using the row-address part of the rowid, find the specific row that the rowid points to and read all necessary data (requested data for alias LE) from that row. Append the applicable data to the incoming row from the earlier table read to build a partial result row. Discard the whole result row if it contains data that fails to meet the condition UCASE(LE.Description) = ?.
Note that you will find no explicit step in the execution plan for this last filter, which discards rows that fail to meet the condition on the location description. I call this filter a post-read filter, since it does not contribute to the method of reaching this table row but instead is used to discard some rows after they are read. DB2 does not make the discard actions on post-read filters explicit in the tables I queried, but you can always count on DB2 to apply them at the first opportunity, as soon as it has reached the data necessary to evaluate the truth of their conditions. Since the execution plan includes further joins after this join, DB2 performs those joins only on rows that passed this post-read filter, discarding the rest.
For each row returned that combines E and LE:

Using the join condition E.Manager_ID=M.Employee_ID, go to the primary-key index EMPLOYEE_PKEY to find a single matching rowid that corresponds to the employee record of the manager for the employee whose record you already read. If no matching row is found, discard the result row being built.

Otherwise, for the matching rowid, go to the table Employees (M) with a single-block read (logical read, physical when necessary) according to the rowid from the previous step, using the block-address part of the rowid. Using the row-address part of the rowid, find the specific row that the rowid points to and read all necessary data (requested data for alias M) from that row. Append the applicable data to the incoming row from the earlier table reads to build a partial result row.

For each such row, using the join condition M.Location_ID=LM.Location_ID, go to the primary-key index LOCATION_PKEY to find a single matching rowid that corresponds to the location record that matches the manager for the employee whose record you already read. If no matching row is found, discard the result row being built.

Otherwise, for the matching rowid, go to the table Locations (LM) with a single-block read (logical read, physical when necessary) according to the rowid from the previous step, using the block-address part of the rowid. Using the row-address part of the rowid, find the specific row that the rowid points to and read all necessary data (requested data for alias LM) from that row. Append the applicable data to the incoming row from the earlier table reads to complete each result row. Immediately return the fully built result row.
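The narrative above can be compressed into a nested-loops sketch in Python. The dictionaries stand in for indexes (key to rowids) and tables (rowid to row), and every name and row in them is invented for illustration; real DB2 block and rowid handling is far more involved:

```python
# Nested-loops sketch of the narrative above: drive from the index on
# Last_Name, then do unique primary-key joins to LE, M, and LM, with a
# post-read filter on the LE description. All data is invented.
employees = {
    "e1": {"Employee_ID": 1, "First_Name": "Al", "Last_Name": "SMITH",
           "Salary": 50, "Location_ID": 10, "Manager_ID": 2},
    "e2": {"Employee_ID": 2, "First_Name": "Bo", "Last_Name": "JONES",
           "Salary": 90, "Location_ID": 20, "Manager_ID": None},
}
locations = {"l1": {"Location_ID": 10, "Description": "Dallas"},
             "l2": {"Location_ID": 20, "Description": "Tokyo"}}
emp_last_name = {"SMITH": ["e1"], "JONES": ["e2"]}   # index on Last_Name
employee_pkey = {1: "e1", 2: "e2"}                   # primary-key index
location_pkey = {10: "l1", 20: "l2"}                 # primary-key index

def query(last_name, desc_upper):
    for rid in emp_last_name.get(last_name, []):     # driving index range scan
        e = employees[rid]                           # FETCH Employees (E)
        le_rid = location_pkey.get(e["Location_ID"])  # unique join to LE
        if le_rid is None:
            continue                                 # inner join: discard row
        le = locations[le_rid]
        if le["Description"].upper() != desc_upper:
            continue                                 # post-read filter on LE
        m_rid = employee_pkey.get(e["Manager_ID"])   # unique join to M
        if m_rid is None:
            continue
        m = employees[m_rid]
        lm_rid = location_pkey.get(m["Location_ID"])  # unique join to LM
        if lm_rid is None:
            continue
        lm = locations[lm_rid]
        yield (e["First_Name"], e["Last_Name"], e["Salary"],
               le["Description"], m["First_Name"], m["Last_Name"],
               lm["Description"])

print(list(query("SMITH", "DALLAS")))
```

Note where the post-read filter sits: rows that fail the LE description test are discarded before the joins to M and LM ever run, which is exactly the early-discard behavior described above.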
Execution plans often use join methods other than nested loops, especially in the starting plans you will need to tune, so I next show an example that performs one of the joins by the less robust sort-merge method. If I drop all the indexes, DB2 delivers a new execution plan:
$cat head.sql tmp.sql tail.sql | db2 +c +p -t
DB20000I The SQL command completed successfully.
DB20000I The SQL command completed successfully.
OPERATOR_ID TARGET_ID OPERATOR_TYPE OBJECT_NAME COST
----------- --------- ------------- ------------------ -----------
1 - RETURN - 21033
2 1 NLJOIN - 21033
3 2 NLJOIN - 20830
4 3 MSJOIN - 10517
5 4 TBSCAN - 204
6 5 SORT - 204
7 6 TBSCAN LOCATIONS 204
8 4 FILTER - 10313
9 8 TBSCAN - 10313
10 9 SORT - 10313
11 10 TBSCAN EMPLOYEES 10313
12 3 TBSCAN EMPLOYEES 10313
13 2 TBSCAN LOCATIONS 202
13 record(s) selected.
DB20000I The SQL command completed successfully.
$
In the steps shown with OPERATOR_ID 5 through 11, DB2 sorts full table scans of Locations and Employees (aliases LE and E) on the join key Location_ID, discarding rows that fail to meet the filter conditions on these tables. In the step shown with OPERATOR_ID=4, DB2 performs a sort-merge join between E and LE. Interestingly, since it sees such good filters on both these tables, it estimates that it will likely have at most a single row left at that step, and it chooses to do nested loops to full table scans to join to aliases M and LM, as the last two steps. Nested loops to full table scans such as these would scale badly if the data caused DB2 to loop many times. The cost of merge or hash joins would be slightly higher than nested loops to a single full table scan, but such joins would scale much better.
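For reference, here is a minimal Python sketch of the sort-merge idea behind MSJOIN: sort both inputs on the join key, then advance two cursors in step. The function and sample rows are my own illustration, not DB2 code:

```python
# Minimal sort-merge join sketch: sort both inputs on the join key,
# then march a cursor down each, emitting pairs on key matches.
# Handles duplicate keys on either input; data is invented.
def merge_join(left, right, key):
    left = sorted(left, key=key)
    right = sorted(right, key=key)
    out, j = [], 0
    for l in left:
        while j < len(right) and key(right[j]) < key(l):
            j += 1                      # advance past smaller right keys
        k = j
        while k < len(right) and key(right[k]) == key(l):
            out.append((l, right[k]))   # emit one pair per matching right row
            k += 1
    return out

locations = [{"Location_ID": 20, "City": "Tokyo"},
             {"Location_ID": 10, "City": "Dallas"}]
employees = [{"Name": "Al", "Location_ID": 10},
             {"Name": "Bo", "Location_ID": 10},
             {"Name": "Cy", "Location_ID": 30}]
pairs = merge_join(employees, locations, key=lambda r: r["Location_ID"])
print([(e["Name"], l["City"]) for e, l in pairs])
```

The sketch shows why a merge join's cost is dominated by the two sorts: once both inputs are ordered, the merge itself is a single coordinated pass, which is why it scales so much better than repeated full table scans under nested loops.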
There are other execution-plan features that you will see regularly, such as indicators of which joins are outer joins and steps for sorts and sort-unique operations that discard duplicates, but these are fairly self-explanatory and are not usually important to performance. The only remaining important subtleties that you will often see deal with subqueries and multipart execution plans. I'll cover both of these at once with one final example:
SELECT E.First_Name, E.Nickname, E.Last_Name,
       E.Phone_Number, L.Description
FROM Employees E
     INNER JOIN Locations L ON E.Location_ID = L.Location_ID
WHERE (E.First_Name = ? OR E.Nickname = ?)
  AND EXISTS (SELECT 1
              FROM Wage_Payments P
              WHERE P.Employee_ID = E.Employee_ID
                AND P.Payment_Date > CURRENT DATE - 31 DAYS);
Populate Wage_Payments with 500,000 rows. Place indexes on:
Employees(First_Name)
Employees(Nickname)
Locations(Location_ID)
Wage_Payments(Employee_ID)
You then find the following execution plan:
$cat head.sql tmp.sql tail.sql | db2 +c +p -t
DB20000I The SQL command completed successfully.
DB20000I The SQL command completed successfully.
OPERATOR_ID TARGET_ID OPERATOR_TYPE OBJECT_NAME COST
----------- --------- ------------- ------------------ -----------
1 - RETURN - 2014
2 1 MSJOIN - 2014
3 2 TBSCAN - 203
4 3 SORT - 203
5 4 TBSCAN LOCATIONS 202
6 2 FILTER - 1810
7 6 TBSCAN - 1810
8 7 SORT - 1810
9 8 NLJOIN - 1810
10 9 FETCH EMPLOYEES 422
11 10 RIDSCN - 100
12 11 SORT - 50
13 12 IXSCAN EMP_FIRST_NAME 50
14 11 SORT - 50
15 14 IXSCAN EMP_NICKNAME 50
16 9 FETCH WAGE_PAYMENTS 134
17 16 IXSCAN WAGE_PYMNT_EMP_ID 50
17 record(s) selected.
$
The steps shown with OPERATOR_ID 11 through 15 show the collection of a union of the sets of rowids from the name conditions joined by OR on E. The resulting new set of rowids feeds into the step labeled OPERATOR_ID=10 to get just the set of employees that have the chosen name or nickname. From that list, DB2 chooses nested loops (NLJOIN) to Wage_Payments. The loops halt as soon as the first match is found, since this is an EXISTS correlated join. This nested-loops join is labeled OPERATOR_ID=9. It discards any Employees records that fail to find a matching Wage_Payment in the subquery. Since DB2 calculates that it still has a fairly long list of Employees by that point, it chooses to read the Locations table once and perform a merge join (MSJOIN) with the Employees records, sorting both rowsets on the join keys.
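Two of the subtleties in this plan, the rowid union for the OR condition and the EXISTS semi-join that halts at the first match, can be sketched in Python as follows. All names and data here are invented stand-ins for the indexes and tables involved:

```python
# Sketch of two plan subtleties: the OR on First_Name and Nickname
# becomes a union of rowid sets from two index scans (RIDSCN), and the
# EXISTS subquery becomes a semi-join that keeps an employee as soon
# as the first matching Wage_Payments entry is found. Data is invented.
emp_first_name = {"Kathy": ["e1", "e2"]}     # index on First_Name
emp_nickname = {"Kathy": ["e2", "e3"]}       # index on Nickname
wage_pymnt_emp_id = {1: ["w1"], 3: []}       # index on Employee_ID
employees = {"e1": {"Employee_ID": 1, "Last_Name": "Smith"},
             "e2": {"Employee_ID": 2, "Last_Name": "Jones"},
             "e3": {"Employee_ID": 3, "Last_Name": "Brown"}}

def matches(name):
    # RIDSCN: union the two rowid lists, discarding duplicates,
    # so each employee is fetched and tested only once.
    rowids = sorted(set(emp_first_name.get(name, [])) |
                    set(emp_nickname.get(name, [])))
    for rid in rowids:
        e = employees[rid]                    # FETCH Employees
        # EXISTS semi-join: keep the row if at least one payment
        # exists; stop probing at the first match.
        if any(True for _ in wage_pymnt_emp_id.get(e["Employee_ID"], [])):
            yield e["Last_Name"]

print(list(matches("Kathy")))
```

The set union is what prevents an employee whose first name and nickname both match from appearing twice, and the early halt on the semi-join probe is what makes EXISTS cheaper than a full join when payments per employee are numerous.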
Microsoft SQL Server uses multiple approaches to generate and display execution plans. These approaches send execution plans to your screen, in either graphical or text form, rather than placing execution-plan data into tables, as DB2 and Oracle do.
If you bought this book hoping to tune Sybase Adaptive Server, take heart. The two databases share a common heritage, and almost all of what I will say about Microsoft SQL Server applies to Sybase as well, except for the SQL Server Query Analyzer graphical plan facility, which Microsoft added after the two versions split.
SQL Server has two approaches to displaying execution plans: a graphical approach, built into the SQL Server Query Analyzer, and a text-based approach, driven by the underlying database. The graphical display does not fit a whole execution plan of even a modest-sized multitable query onto a single screen. Therefore, I find it difficult to use the graphical display to answer the key questions about a long execution plan:
What is the join order?
What method is used for each join?
What method is used for each table access?
The text-based execution-plan display provides the answers to all three of these questions easily and fairly compactly.
To see execution plans graphically, you click on the Display Estimated Execution Plan button in SQL Server Query Analyzer. In the window where you usually see query results, you see a diagram of arrows connecting a series of icons that indicate the type of action (nested loops, indexed read, table access, etc.). Text goes with each icon, but the text is generally truncated so that it contains nothing useful until you point to it with your mouse, at which point you get a window that shows the missing details. Furthermore, for even a simple four-way join, the whole diagram doesn’t fit on the screen, even with Query Analyzer maximized to fill the screen. I find the graphical approach less useful than the text-based approach, which tells me everything I need to know at a glance.
In the alternative to graphical execution-plan display, you place the query in the Query window of the SQL Server Query Analyzer, preceded by SET SHOWPLAN_TEXT ON, then click on Query Analyzer's Execute Query button:
SET SHOWPLAN_TEXT ON
GO
SELECT E.First_Name, E.Last_Name, E.Salary, LE.Description,
       M.First_Name, M.Last_Name, LM.Description
FROM Locations LE, Locations LM, Employees M, Employees E
WHERE E.Last_Name = 'Stevenson'
  AND E.Manager_ID=M.Employee_ID
  AND E.Location_ID=LE.Location_ID
  AND M.Location_ID=LM.Location_ID
  AND UPPER(LE.Description) = 'SAN FRANCISCO'
If you're on Sybase, just replace SHOWPLAN_TEXT with SHOWPLAN. If you want an even more detailed execution plan, you can also use SHOWPLAN_ALL on Microsoft SQL Server.
When I run the preceding commands with empty tables having statistics, I find the following output in my results window (insignificant text is replaced with ..., and, to fit the output on the page, I've added (wrapped line) to show continued lines):
StmtText
-----------------------------------------------------------------------
SELECT E.First_Name, E.Last_Name, E.Salary, LE.Description,
M.First_Name, M.Last_Name, LM.Description
FROM Locations LE, Locations LM, Employees M, Employees E
WHERE E.Last_Name = 'Stevenson'
AND E.Manager_ID=M.Employee_ID
AND E.Location_ID=LE.Location_ID
AND M.Location_ID=LM.Location_ID
AND UPPER(LE.Description) = 'SAN FRANCISCO'

(1 row(s) affected)

StmtText
-----------------------------------------------------------------------
  |--Bookmark Lookup(BOOKMARK:([Bmk1001]), OBJECT:([my_acct].[dbo].[Locations] AS [LM]))
       |--Nested Loops(Inner Join)
            |--Bookmark Lookup(...(...[Employees] AS [M]))
            |    |--Nested Loops(Inner Join)
            |         |--Filter(WHERE:(upper([LE].[Description])='SAN FRANCISCO'))
            |         |    |--Bookmark Lookup(...(...[Locations] AS [LE]))
            |         |         |--Nested Loops(Inner Join)
            |         |              |--Bookmark Lookup(...(...[Employees] AS [E]))
            |         |              |    |--Index Seek(...(...[Employees].(wrapped line)
[Emp_Last_Name] AS [E]), SEEK:([E].[Last_Name]='Stevenson') ORDERED)
            |         |              |--Index Seek(...(...[Locations].[Location_PKey](wrapped line)
AS [LE]), SEEK:([LE].[Location_ID]=[E].[Location_ID]) ORDERED)
            |         |--Index Seek(...(...[Employees].[Employee_PKey](wrapped line)
AS [M]), SEEK:([M].[Employee_ID]=[E].[Manager_ID]) ORDERED)
            |--Index Seek(...(...[Locations].[Location_PKey](wrapped line)
AS [LM]), SEEK:([LM].[Location_ID]=[M].[Location_ID]) ORDERED)

(12 row(s) affected)
After you have executed the query analysis once, you no longer need the top two lines activating SHOWPLAN_TEXT. All future queries will show plans only, until you click on Query Analyzer's Execute Query button for:

SET SHOWPLAN_TEXT OFF
GO
Here is how you read the execution-plan output:
All joins are nested loops, based on the series of rows that state Nested Loops(Inner Join). If you have a mix of join methods, the first join executed will be the innermost one, the last one listed. You read the order of join methods executed from the bottom up.

The order of table access is Employees (E), Locations (LE), Employees (M), Locations (LM), the reverse of the order in which the tables appear in the execution plan output, reading from the innermost nesting out. When SQL references the same tables multiple times, aliases for those tables are mandatory. Since the execution plan explicitly references the aliases, there is no ambiguity regarding which alias each step refers to.

All four table reads are through some index, as shown by the phrase Bookmark Lookup in front of each table name. The indexes used come in the lower entries with matched indentation for each Bookmark Lookup. Thus, you know that the driving table E is reached through an index range scan (a read that at least potentially touches multiple rows at a time) on the index EMPLOYEE_LAST_NAME. The rest of the table accesses are unique reads through the tables' primary keys. Since all reads after the driving table are for unique joins, you know that the query will read at most the same number of rows for each of these other tables as it reads for the driving table.
When you find scans on an index, the condition following the index name makes clear how much of a range the scan will cover. If you need to see other indexes available on a table, the simplest way is to use the sp_helpindex stored procedure. Here are the command and its result:
sp_helpindex Employees
index_name index_description index_keys
--------------------------- ---------------------------------------- -----------
Employee_Manager_ID nonclustered located on PRIMARY Manager_ID
Employee_Last_Name nonclustered located on PRIMARY Last_Name
Employee_Location_ID nonclustered located on PRIMARY Location_ID
Employee_Department_ID nonclustered located on PRIMARY Department_ID
Employee_Hire_Date nonclustered located on PRIMARY Hire_Date
Employee_PKey nonclustered, unique located on PRIMARY Employee_ID
Employee_First_Name nonclustered located on PRIMARY First_Name
Employee_Nickname nonclustered located on PRIMARY Nickname
When an index covers multiple columns, they are listed in order in the index_keys column. You can also use sp_help to see a complete description of a table, which includes a list of the table's indexes.
I just explained how to find the join order, the join methods, and the table-access methods for the robust execution plan I showed earlier. If you combine that with the basics covered in Chapter 2, you should understand how SQL Server will reach the data, from end to end. To test your understanding, try constructing a narrative that explains the full execution plan in English, as a set of instructions to the database. Compare your result with what follows. If it does not match well, try again later, after you have read a few more execution plans, to see if your understanding has improved. Here is the execution plan expressed in narrative form, as instructions to the database:
Using the condition E.Last_Name = 'Stevenson', go to the index EMPLOYEE_LAST_NAME and find the list of rowids that correspond to employees with the requested last name.

For each of these rowids, go to the table Employees (E) with a single-block read (logical read, physical when necessary) according to each rowid from the previous step, using the block-address part of the rowid. Using the row-address part of the rowid, find the specific row that the rowid points to and read all necessary data (requested data for alias E) from that row.

For each such row, using the join condition E.Location_ID=LE.Location_ID, go to the primary-key index LOCATION_PKEY to find a single matching rowid that corresponds to the location record that matches the employee whose record you already read. If no matching row is found, discard the result row being built.

Otherwise, for the matching rowid, go to the table Locations (LE) with a single-block read (logical read, physical when necessary) according to the rowid from the previous step, using the block-address part of the rowid. Using the row-address part of the rowid, find the specific row that the rowid points to and read all necessary data (requested data for alias LE) from that row. Append the applicable data to the incoming row from the earlier table read to build a partial result row.

For each such row, discard the whole result row if it contains data that fails to meet the condition UPPER(LE.Description) = 'SAN FRANCISCO'.
For each remaining row, using the join condition E.Manager_ID=M.Employee_ID, go to the primary-key index EMPLOYEE_PKEY to find a single matching rowid that corresponds to the employee record of the manager for the employee whose record you already read. If no matching row is found, discard the result row being built.

Otherwise, for the matching rowid, go to the table Employees (M) with a single-block read (logical read, physical when necessary) according to the rowid from the previous step, using the block-address part of the rowid. Using the row-address part of the rowid, find the specific row that the rowid points to and read all necessary data (requested data for alias M) from that row. Append the applicable data to the incoming row from the earlier table reads to build a partial result row.

For each such row, using the join condition M.Location_ID=LM.Location_ID, go to the primary-key index LOCATION_PKEY to find a single matching rowid that corresponds to the location record that matches the manager for the employee whose record you already read. If no matching row is found, discard the result row being built.

Otherwise, for the matching rowid, go to the table Locations (LM) with a single-block read (logical read, physical when necessary) according to the rowid from the previous step, using the block-address part of the rowid. Using the row-address part of the rowid, find the specific row that the rowid points to and read all necessary data (requested data for alias LM) from that row. Append the applicable data to the incoming row from the earlier table reads to complete and immediately return the fully built result row.
Execution plans often use join methods other than nested loops, especially in the starting plans you will need to tune, so I next show an example that performs hash joins in place of robust nested-loops joins. If I drop all the indexes and add an OPTION(HASH JOIN) hint at the end of the query, SQL Server delivers a new execution plan:
StmtText
-------------------------------------------------------------------------
SELECT E.First_Name, E.Last_Name, E.Salary, LE.Description,
M.First_Name, M.Last_Name, LM.Description
FROM Locations LE, Locations LM, Employees M, Employees E
WHERE E.Last_Name = 'Stevenson'
AND E.Manager_ID=M.Employee_ID
AND E.Location_ID=LE.Location_ID
AND M.Location_ID=LM.Location_ID
AND UPPER(LE.Description) = 'SAN FRANCISCO'
OPTION(HASH JOIN)
(1 row(s) affected)
StmtText
------------------------------------------------------------------------------
|--Hash Match(Inner Join, ...([LM].[Location_ID])=([M].[Location_ID]),...)
|--Table Scan(...(...[Locations] AS [LM]))
|--Hash Match(Inner Join, ...([M].[Employee_ID])=([E].[Manager_ID]),...)
|--Table Scan(...(...[Employees] AS [M]))
|--Hash Match(Inner ...([E].[Location_ID])=([LE].[Location_ID]),...)
|--Table Scan(...(...[Employees] AS [E]),(wrapped line)
WHERE:([E].[Last_Name]='Stevenson'))
|--Filter(WHERE:(upper([LE].[Description])='SAN FRANCISCO'))
|--Table Scan(...(...[Locations] AS [LE]))
(8 row(s) affected)
This shows table scans for every table access. The query drives from LE and filters for locations with the correct description. The second table accessed is E, which is filtered for employees with the correct last name. The remaining rows from these two tables are hashed and joined. This result is next hash-joined to a full scan of M and, finally, to a full scan of LM.
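The Hash Match operation itself is easy to sketch: build a hash table on one input's join key, then probe it with each row of the other input. The following Python fragment is my own illustration of that idea, not SQL Server code, and its tables and rows are invented:

```python
# Minimal hash-join sketch: build a hash table on the build input's
# join key, then probe it once per row of the probe input.
from collections import defaultdict

def hash_join(build_rows, probe_rows, build_key, probe_key):
    table = defaultdict(list)
    for b in build_rows:                 # build phase: hash the build input
        table[build_key(b)].append(b)
    out = []
    for p in probe_rows:                 # probe phase: one lookup per row
        for b in table.get(probe_key(p), []):
            out.append((p, b))
    return out

locations = [{"Location_ID": 10, "City": "Dallas"},
             {"Location_ID": 20, "City": "Tokyo"}]
employees = [{"Name": "Al", "Location_ID": 10},
             {"Name": "Bo", "Location_ID": 20},
             {"Name": "Cy", "Location_ID": 30}]
pairs = hash_join(locations, employees,
                  build_key=lambda l: l["Location_ID"],
                  probe_key=lambda e: e["Location_ID"])
print([(e["Name"], l["City"]) for e, l in pairs])
```

Each input is read exactly once, which is why a hash join of two full table scans scales so much better than looping a full scan of one table once per row of the other.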
There are other execution-plan features that you will see regularly, such as indicators of which joins are outer joins and steps for sorts and sort-unique operations that discard duplicates, but these are fairly self-explanatory and are not usually important to performance. The main remaining important subtlety that you will often see deals with subqueries. I’ll cover this with one final example:
SELECT E.First_Name, E.Nickname, E.Last_Name,
       E.Phone_Number, L.Description
FROM Employees E
     INNER JOIN Locations L ON E.Location_ID = L.Location_ID
WHERE (E.First_Name = ? OR E.Nickname = ?)
  AND EXISTS (SELECT 1
              FROM Wage_Payments P
              WHERE P.Employee_ID = E.Employee_ID
                AND P.Payment_Date > CURRENT DATE - 31 DAYS);
Leave all tables empty. Place indexes on:
Employees(First_Name)
Employees(Nickname)
Locations(Location_ID)
Wage_Payments(Employee_ID)
You then find the following execution plan:
StmtText
-----------------------------------------------------------------------------
SELECT E.First_Name, E.Nickname, E.Last_Name, E.Phone_Number, L.Description
FROM Employees E, Locations L
WHERE (E.First_Name='Kathy' OR E.Nickname='Kathy')
AND E.Location_ID=L.Location_ID
AND EXISTS (SELECT null FROM Wage_Payments P
WHERE P.Employee_ID=E.Employee_ID
AND P.Payment_Date > DATEADD(DAY,-31,GETDATE( )));

(1 row(s) affected)

StmtText
-------------------------------------------------------------------------------
  |--Nested Loops(Left Semi Join)
       |--Filter(WHERE:([E].[First_Name]='Kathy' OR [E].[Nickname]='Kathy'))
       |    |--Bookmark Lookup(...(...[Employees] AS [E]))
       |         |--Nested Loops(Inner Join)
       |              |--Table Scan(...(...[Locations] AS [L]))
       |              |--Index Seek(...[Employees].[Employee_Location_ID](wrapped line)
AS [E]), SEEK:([E].[Location_ID]=[L].[Location_ID]) ORDERED)
       |--Filter(WHERE:([P].[Payment_Date]>dateadd(4, -31, getdate( ))))
            |--Bookmark Lookup(...(...[Wage_Payments] AS [P]))
                 |--Index Seek(...(...[Wage_Payments].[Wage_Payment_Employee_ID](wrapped line)
AS [P]), SEEK:([P].[Employee_ID]=[E].[Employee_ID]) ORDERED)

(9 row(s) affected)
The execution plan shows a full table scan on Locations as the driving table, since it is the first data-access step at the innermost level of nesting. SQL Server then follows nested loops into the foreign-key index Employee_Location_ID to join to Employees. Once SQL Server reaches Employees, it discards rows that fail to meet the conditions on First_Name and Nickname. SQL Server then performs a special join called a semi-join to reach the correlated subquery, joining on matching Employee_IDs with the index Wage_Payment_Employee_ID. That index provides rowids to reach Wage_Payments, following which the final filter condition on Payment_Date discards nonrecent rows that do not satisfy the EXISTS subquery. The join to the correlated EXISTS subquery is shown as the top step, which is described as a Left Semi Join. This is not the optimum execution plan for fully populated tables, but since the test was with empty tables, I would not usually expect an execution plan that scales well to high data volumes.