Chapter 15. SQL Best Practices

Writing maintainable and efficient SQL statements requires a good deal of experience. You can write a SQL query in many different ways, each giving the same result, but one can be a hundred times slower than another, or one can be easier to understand and maintain than the other.

Know When to Use Specific Constructs

Depending on the circumstances, certain SQL constructs are preferable to others. For example, use of the EXISTS predicate is often preferable to DISTINCT. The next sections discuss the usage of such constructs.

EXISTS Is Preferable to DISTINCT

The DISTINCT keyword used in a SELECT clause eliminates duplicate rows in the result set. To eliminate those duplicates, Oracle performs a sort, and that sort requires time and disk space. Therefore, avoid using DISTINCT if you can tolerate having duplicate rows returned by a query. If you can’t tolerate the duplicate rows, or your application can’t handle them, use EXISTS in place of DISTINCT.

For example, assume you are trying to find the names of customers who have orders. Your query has to be based on two tables: customer and cust_order. Using DISTINCT, your query would be written as follows:

SELECT DISTINCT c.cust_nbr, c.name
FROM customer c JOIN cust_order o
ON c.cust_nbr = o.cust_nbr;

The corresponding execution plan for this query is as follows. Note the SORT operation, which is a result of DISTINCT being used.

Query Plan
-----------------------------------------
SELECT STATEMENT   Cost = 3056
  SORT UNIQUE
    MERGE JOIN
      INDEX FULL SCAN IND_ORD_CUST_NBR
      SORT JOIN
        TABLE ACCESS FULL CUSTOMER

To use EXISTS, the query needs to be rewritten as follows:

SELECT c.cust_nbr, c.name
FROM customer c
WHERE EXISTS (SELECT 1 FROM cust_order o WHERE c.cust_nbr = o.cust_nbr);

Here is the execution plan for the EXISTS version of the queries:

Query Plan
---------------------------------------
SELECT STATEMENT   Cost = 320
  FILTER
    TABLE ACCESS FULL CUSTOMER
    INDEX RANGE SCAN IND_ORD_CUST_NBR

Notice that the second query eliminates the overhead of the sort operation, and therefore runs faster.

WHERE Versus HAVING

We discussed the GROUP BY and HAVING clauses in Chapter 4. Sometimes, when writing a GROUP BY query, you have a condition that you can specify in either the WHERE or HAVING clause. In situations where you have a choice, you’ll always get better performance if you specify the condition in the WHERE clause. The reason is that it’s less expensive to eliminate rows before they are summarized than it is to eliminate results after summarization.

Let’s look at an example illustrating the advantage of WHERE over HAVING. Here’s a query with the HAVING clause that reports the number of orders in the year 2000:

               SELECT year, COUNT(*)
               FROM orders
               GROUP BY year
               HAVING year = 2001;

      YEAR   COUNT(*)
---------- ----------
      2001       1440

The execution plan for this query is as follows:

Query Plan
-------------------------------------------
SELECT STATEMENT   Cost = 6
  FILTER
    SORT GROUP BY
      INDEX FAST FULL SCAN ORDERS_PK

Now, look at that same query, but with the year restriction in the WHERE clause:

               SELECT year, COUNT(*)
               FROM orders
               WHERE year = 2001
               GROUP BY year;

     YEAR   COUNT(*)
--------- ----------
     2001       1440

The execution plan for this version of the query is:

Query Plan
-------------------------------------
SELECT STATEMENT   Cost = 2
  SORT GROUP BY NOSORT
    INDEX FAST FULL SCAN ORDERS_PK

With the HAVING clause, the query performs the group operation first, and then filters the groups for the condition specified. The WHERE clause version of the query filters the rows before performing the group operation. The result of filtering with the WHERE clause is that there are fewer rows to summarize, and consequently, the query performs better.

However, you should note that not all types of filtering can be achieved using the WHERE clause. Sometimes, you may need to summarize the data first and then filter the summarized data based on the summarized values. In such situations, you have to filter using the HAVING clause, because only the HAVING clause can “see” summarized values. Moreover, there are situations when you may need to use the WHERE clause and the HAVING clause together in a query to filter the results the way you want. For details, see Chapter 4.

UNION Versus UNION ALL

We discussed UNION and UNION ALL in Chapter 7. UNION ALL combines the results of two SELECT statements. UNION combines the results of two SELECT statements, and then returns only distinct rows from the combination; duplicates are eliminated. It is, therefore, obvious that to remove the duplicates, UNION performs one extra step than UNION ALL. This extra step is a sort, which is costly in terms of performance. Therefore, whenever your application can handle duplicates or you are certain that no duplicates will result, consider using UNION ALL instead of UNION.

Let’s look an example to understand this issue better. The following query uses UNION to return a list of orders where the sale price exceeds $50.00 or where the customer is located in region 5:

               SELECT order_nbr, cust_nbr 
               FROM cust_order 
               WHERE sale_price > 50
               UNION
               SELECT order_nbr, cust_nbr 
               FROM cust_order
               WHERE cust_nbr IN 
               (SELECT cust_nbr FROM customer WHERE region_id = 5);

 ORDER_NBR   CUST_NBR
---------- ----------
      1000          1
      1001          1
      1002          5
      1003          4
      1004          4
      1005          8
      1006          1
      1007          5
      1008          5
      1009          1
      1011          1
      1012          1
      1015          5
      1017          4
      1019          4
      1021          8
      1023          1
      1025          5
      1027          5
      1029          1

20 rows selected.

The execution plan for this UNION query is:

Query Plan
---------------------------------------------------------------------------
SELECT STATEMENT   Cost = 8
  SORT UNIQUE
    UNION-ALL
      TABLE ACCESS FULL CUST_ORDER
      HASH JOIN
        TABLE ACCESS FULL CUSTOMER
        TABLE ACCESS FULL CUST_ORDER

The following query uses UNION ALL instead of UNION to get the same information:

               SELECT order_nbr, cust_nbr 
               FROM cust_order 
               WHERE sale_price > 50
               UNION ALL
               SELECT order_nbr, cust_nbr 
               FROM cust_order
               WHERE cust_nbr IN 
               (SELECT cust_nbr FROM customer WHERE region_id = 5);

 ORDER_NBR   CUST_NBR
---------- ----------
      1001          1
      1003          4
      1005          8
      1009          1
      1012          1
      1017          4
      1021          8
      1029          1
      1001          1
      1000          1
      1002          5
      1003          4
      1004          4
      1006          1
      1007          5
      1008          5
      1009          1
      1012          1
      1011          1
      1015          5
      1017          4
      1019          4
      1023          1
      1025          5
      1027          5
      1029          1

26 rows selected.

Note the duplicate rows in the output. However, note also that UNION ALL performs better than UNION, as you can see from the following execution plan:

Query Plan
---------------------------------------------------------------------------
SELECT STATEMENT   Cost = 4
  UNION-ALL
    TABLE ACCESS FULL CUST_ORDER
    HASH JOIN
      TABLE ACCESS FULL CUSTOMER
      TABLE ACCESS FULL CUST_ORDER

You can see that the extra operation (SORT UNIQUE) in the UNION makes it run slower than UNION ALL.

LEFT Versus RIGHT OUTER JOIN

As you have seen in Chapter 3, outer joins can be of type LEFT, RIGHT, or FULL. LEFT and RIGHT are really two ways of looking at the same operation. Mixing LEFT and RIGHT outer joins in the same application can cause confusion, as you and other programmers must constantly shift your point-of-view from one approach to the other. Use both LEFT and RIGHT outer joins in the same query, and you’ll find your confusion greatly magnified. For example:

               SELECT e.lname, j.function, d.name
               FROM job j LEFT OUTER JOIN employee e ON e.job_id = j.job_id
                          RIGHT OUTER JOIN department d ON e.dept_id = d.dept_id;

LNAME                FUNCTION                       NAME
-------------------- ------------------------------ ------------
MILLER               CLERK                          ACCOUNTING
CLARK                MANAGER                        ACCOUNTING
KING                 PRESIDENT                      ACCOUNTING
SMITH                CLERK                          RESEARCH
FORD                 ANALYST                        RESEARCH
JONES                MANAGER                        RESEARCH
SCOTT                ANALYST                        RESEARCH
JAMES                CLERK                          SALES
BLAKE                MANAGER                        SALES
MARTIN               SALESPERSON                    SALES
TURNER               SALESPERSON                    SALES
ALLEN                SALESPERSON                    SALES
                                                    OPERATIONS

Such confusion is unnecessary. Since both LEFT and RIGHT outer joins represent the same operation, but from differing points of view, you can simply pick one point of view and use it consistently. For example, many programmers write all outer joins as either FULL or LEFT, ignoring RIGHT.

The preceding query uses a LEFT and then a RIGHT outer join to do the following:

  1. Connect an outer join from employee to job, with employee as the required table

  2. Connect another outer join from department to the results from Step 1, with department as the required table

Using parentheses to explicitly state the above order of operations, you can rewrite the query using all LEFT outer joins, as follows:

               SELECT e.lname, j.function, d.name
               FROM department d LEFT OUTER JOIN
                       (job j LEFT OUTER JOIN employee e
                        ON e.job_id = j.job_id)
                    ON e.dept_id = d.dept_id;

LNAME                FUNCTION                       NAME
-------------------- ------------------------------ -------------
MILLER               CLERK                          ACCOUNTING
CLARK                MANAGER                        ACCOUNTING
KING                 PRESIDENT                      ACCOUNTING
SMITH                CLERK                          RESEARCH
FORD                 ANALYST                        RESEARCH
JONES                MANAGER                        RESEARCH
SCOTT                ANALYST                        RESEARCH
JAMES                CLERK                          SALES
BLAKE                MANAGER                        SALES
MARTIN               SALESPERSON                    SALES
TURNER               SALESPERSON                    SALES
ALLEN                SALESPERSON                    SALES
                                                    OPERATIONS

The tradeoff here is between using parentheses and mixing RIGHT and LEFT outer joins. This second version of the query still joins employee to job, and then joins department to that result. The operations are exactly the same as in the previous version. This time, the parentheses make the order of operations clearer, and we personally find the second version of the query a bit easier to understand.

Avoid Unnecessary Parsing

Before your SQL can be executed by Oracle, it needs to be parsed. The importance of parsing when it comes to tuning SQL lies in the fact that no matter how many times a given SQL statement is executed, it needs to be parsed only once. During parsing, the following steps are performed (not necessarily in the sequence shown):

  • The syntax of the SQL statement is verified.

  • The data dictionary is searched to verify table and column definitions.

  • The data dictionary is searched to verify security privileges on relevant objects.

  • Parse locks are acquired on the relevant objects.

  • The optimal execution plan is determined.

  • The statement is loaded into the shared SQL area (also known as the library cache) in the shared pool of the system global area (SGA). The execution plan and parse information are saved here in case the same statement is executed once again.

If a SQL statement involves any remote objects (e.g., database links), then these steps are repeated for the remote objects. As you can see, lots of work is performed during the parsing of a SQL statement. However, a statement is parsed only if Oracle doesn’t find an identical SQL statement already in the shared SQL area (library cache) of the SGA.

Before parsing a SQL statement, Oracle searches the library cache for an identical SQL statement. If Oracle finds an exact match, there is no need to parse the statement again. However, if an identical SQL statement is not found, Oracle goes through all the aforementioned steps to parse the statement.

The most important keyword in the previous paragraph is “identical.” To share the same SQL area, two statements need to be truly identical. Two statements that look similar, or that return the same result, need not be identical. To be truly identical, the statements must:

  • Have the same uppercase and lowercase characters

  • Have the same whitespace and newline characters

  • Reference the same objects using the same names, which must in turn have the same owners

If there is a possibility that your application executes the same (or similar) SQL statements multiple times, by all means try to avoid unnecessary parsing. This will improve the overall performance of your applications. The following techniques can help you reduce SQL parsing:

  • Use bind variables.

  • Use table aliases.

Using Bind Variables

When multiple users use an application, they actually execute the same set of SQL statements over and over, but with different data values. For example, one customer service representative may be executing the following statement:

SELECT * FROM customer WHERE cust_nbr = 121;

while another customer service representative will be executing:

SELECT * FROM customer WHERE cust_nbr = 328;

These two statements are similar, but not “identical”—the customer ID numbers are different; therefore, Oracle has to parse twice.

Because the only difference between these statements is the value used for the customer number, this application can be rewritten to use bind variables. In that case, the SQL statement in question can be as follows:

SELECT * FROM customer WHERE cust_nbr = :x;

Oracle needs to parse this statement only once. The actual customer numbers would be supplied after parsing for each execution of the statement. Multiple, concurrently executing programs could share the same copy of this SQL statement while at the same time supplying different customer number values.

In a multiuser application, situations such as the one described here are very common, and overall performance can be significantly improved by using bind variables, thereby reducing unnecessary parsing.

Using Table Aliases

The use of table aliases can help to improve the performance of your SQL statements. Before getting into the performance aspects of table aliases, let’s quickly review what table aliases are and how they are used.

When you select data from two or more tables, you should specify which table each column belongs to. Otherwise, if the two tables have columns with the same name, you will end up with an error:

               SELECT cust_nbr, name, order_nbr
               FROM customer, cust_order;
SELECT cust_nbr, name, order_nbr
       *
ERROR at line 1:
ORA-00918: column ambiguously defined

The error in this case occurs because both the customer and cust_order tables have columns named cust_nbr. Oracle can’t tell which cust_nbr column you are referring to. To fix this problem, you can rewrite this statement as follows:

               SELECT customer.cust_nbr, customer.name, cust_order.order_nbr
               FROM customer JOIN cust_order
               ON customer.cust_nbr = cust_order.cust_nbr;

  CUST_NBR NAME                            ORDER_NBR
---------- ------------------------------ ----------
         1 Cooper Industries                    1001
         1 Cooper Industries                    1000
         5 Gentech Industries                   1002
         4 Flowtech Inc.                        1003
         4 Flowtech Inc.                        1004
         8 Zantech Inc.                         1005
         1 Cooper Industries                    1006
         5 Gentech Industries                   1007
         5 Gentech Industries                   1008
         1 Cooper Industries                    1009
         1 Cooper Industries                    1012
         1 Cooper Industries                    1011
         5 Gentech Industries                   1015
         4 Flowtech Inc.                        1017
         4 Flowtech Inc.                        1019
         8 Zantech Inc.                         1021
         1 Cooper Industries                    1023
         5 Gentech Industries                   1025
         5 Gentech Industries                   1027
         1 Cooper Industries                    1029

20 rows selected.

Note the use of the table name to qualify each column name. This eliminates any ambiguity as to which cust_nbr column the query is referring to.

Instead of qualifying column names with full table names, you can use table aliases, as in the following example:

               SELECT c.cust_nbr, c.name, o.order_nbr
               FROM customer c JOIN cust_order 0
               ON c.cust_nbr = o.cust_nbr;

  CUST_NBR NAME                            ORDER_NBR
---------- ------------------------------ ----------
         1 Cooper Industries                    1001
         1 Cooper Industries                    1000
         5 Gentech Industries                   1002
         4 Flowtech Inc.                        1003
         4 Flowtech Inc.                        1004
         8 Zantech Inc.                         1005
         1 Cooper Industries                    1006
         5 Gentech Industries                   1007
         5 Gentech Industries                   1008
         1 Cooper Industries                    1009
         1 Cooper Industries                    1012
         1 Cooper Industries                    1011
         5 Gentech Industries                   1015
         4 Flowtech Inc.                        1017
         4 Flowtech Inc.                        1019
         8 Zantech Inc.                         1021
         1 Cooper Industries                    1023
         5 Gentech Industries                   1025
         5 Gentech Industries                   1027
         1 Cooper Industries                    1029

20 rows selected.

The letters “c” and “o” in this example are table aliases. You can specify these aliases following their respective table names in the FROM clause, and they can be used everywhere else in the query in place of the table name. Table aliases provide a convenient shorthand notation, allowing your queries to be more readable and concise.

Tip

Table aliases are not limited to one character in length; they can be up to 30 characters in length.

An important thing to remember while using table aliases is that if you define aliases in the FROM clause, you must use only those aliases, and not the actual table names, in the rest of the query. If you alias a table, and then use the actual table name elsewhere in the query, you will encounter errors. For example:

               SELECT c.cust_nbr, c.name, o.order_nbr
               FROM customer c JOIN cust_order o
               ON customer.cust_nbr = cust_order.cust_nbr;
WHERE customer.cust_nbr = cust_order.cust_nbr
                                     *
ERROR at line 3:
ORA-00904: invalid column name

Many developers make the mistake of forgetting to use their table aliases while writing hints. Once you define an alias, you must specify the alias instead of the actual table name in any hints; otherwise, those hints will be silently ignored. Here’s an example of this common mistake:

SELECT /*+ USE_HASH(customer cust_order) */ c.cust_nbr, c.name, o.order_nbr
FROM customer c JOIN cust_order o
ON c.cust_nbr = o.cust_nbr;

The USE_HASH hint specifies the customer and cust_order tables. However, the FROM clause provides aliases for both those tables. Because of the aliases, the table names have no meaning, and the hint is ignored, possibly to the detriment of performance. Following is the correct version of this query:

SELECT /*+ USE_HASH(c o) */ c.cust_nbr, c.name, o.order_nbr
FROM customer c JOIN cust_order o
ON c.cust_nbr = o.cust_nbr;

This time, the USE_HASH hint properly uses the table aliases that have been defined in the FROM clause. This hint will have the desired effect.

When selecting data from multiple tables, it makes sense to qualify each column name with its corresponding table alias. The column cust_nbr appears in both the customer and cust_order tables. Without proper qualification, this column is said to be “ambiguously defined” in the query. Therefore, you must qualify the cust_nbr column with a table alias (or a full table name, if you are not using aliases). However, the other two columns used in the query are not ambiguous. Therefore, the following statement, which only qualifies the cust_nbr column, is valid:

               SELECT c.cust_nbr, name, order_nbr
               FROM customer c, cust_order o
               WHERE c.cust_nbr = o.cust_nbr;

  CUST_NBR NAME                            ORDER_NBR
---------- ------------------------------ ----------
         1 Cooper Industries                    1001
         1 Cooper Industries                    1000
         5 Gentech Industries                   1002
         4 Flowtech Inc.                        1003
         4 Flowtech Inc.                        1004
         8 Zantech Inc.                         1005
         1 Cooper Industries                    1006
         5 Gentech Industries                   1007
         5 Gentech Industries                   1008
         1 Cooper Industries                    1009
         1 Cooper Industries                    1012
         1 Cooper Industries                    1011
         5 Gentech Industries                   1015
         4 Flowtech Inc.                        1017
         4 Flowtech Inc.                        1019
         8 Zantech Inc.                         1021
         1 Cooper Industries                    1023
         5 Gentech Industries                   1025
         5 Gentech Industries                   1027
         1 Cooper Industries                    1029

20 rows selected.

This is where the performance aspect of using table aliases comes into play. Since the query doesn’t qualify the columns NAME and ORDER_NBR, Oracle has to search both the CUSTOMER and CUST_ORDER tables while parsing this statement to find which table each of these columns belongs to. The time required for this search may be negligible for one query, but it does add up if you have a number of such queries to parse. It’s good programming practice to qualify all columns in a query with table aliases, even those that are not ambiguous, so that Oracle can avoid this extra search when parsing the statement.

Consider Literal SQL for Decision-Support Systems

We discussed the benefits of using bind variables previously. The use of bind variables is often beneficial in terms of performance. However, there is a downside to consider. Bind variables hide actual values from the optimizer. This hiding of actual values can have negative performance implications, especially in decision-support systems. For example, consider the following statement:

SELECT * FROM customer WHERE region_id = :x

The optimizer can parse this statement, but it won’t be able to take into account the specific region being selected. If 90% of your customers were in region 5, then a full table scan would likely be the most efficient approach when selecting those customers. An index scan would probably be more efficient when selecting customers in other regions. When you hardcode values into your SQL statements, the cost-based optimizer (CBO) can look at histograms (a type of statistic) and generate an execution plan that takes into account the specific values you are supplying. When you use bind variables, however, the optimizer generates an execution plan without having a complete picture of the SQL statement. Such an execution plan may or may not be the most efficient.

In Decision-Support Systems (DSS), it is very rare that multiple users use the same query over and over. More typically, a handful of users execute complex, different queries against a large database. Since it is very rare that the SQL statements will be repetitive, the parsing time saved by using bind variables will be negligible. At the same time, since DSS applications run complex queries against large databases, the time required to fetch the resulting data can be significant. Therefore, it is important that the optimizer generate the most efficient execution plan for the query. To help the optimizer generate the best possible plan, provide the optimizer as much information as you can, including the actual values of the columns or variables. Therefore, in DSS applications, use literal SQL statements with hardcoded values instead of bind variables.

Our earlier advice about using bind variables in Online Transaction Processing (OLTP) applications is still valid. In OLTP systems, multiple users all use the same programs, and thus issue the same queries. The amount of data returned per query is typically small. Thus, parse time is a more significant performance factor than in DSS systems. When developing OLTP applications, save parsing time and space in the shared SQL area by using bind variables.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
18.191.68.18