Chapter 29. Optimization of Statements

Introduction

In Chapter 20, “Using Indexes,” we showed that the presence of an index can improve the execution time of certain statements. The question remains, though, whether the optimizer can always develop the best processing strategy for all statements. Unfortunately, the answer is that it cannot. Some statements are written in such a way that the optimizer is in no position to develop the fastest processing strategy. This occurs principally when WHERE clause conditions are too complex or when the optimizer is taken along a “false trail.” In addition, even when indexes are available, the optimizer sometimes chooses a sequential processing strategy for those statements.

Practice has shown that a certain number of general forms of SQL statements are not easily optimized and give rise to long processing times. By reformulating such statements, you can give the optimizer a better chance of developing an optimal processing strategy. In this chapter, we provide a number of guidelines for formulating “faster” statements. In other words, we are giving the optimizer a “helping hand.”

In view of the size of the tables in the sample database, almost every SQL statement is fast. The result is that the guidelines in this chapter will not improve the execution time of the statements. However, you can fill the PLAYERS table with many thousands of rows, and that way you can test whether the guidelines apply. For this purpose, you should execute the same two statements that we have used in Section 20.9.3, in Chapter 20, to show the impact of indexes.

Portability

All optimizers are not the same. Big differences in quality exist between the optimizers of the various products. One optimizer can devise a better processing strategy for a larger number of statements than another. The guidelines we present do not apply to all SQL statements and to all situations. We advise you to examine them in the context of your product. We also advise you to look for additional guidelines applicable to your product.

Avoid the OR Operator

In most cases, SQL will not use an index if the condition in a WHERE clause contains the OR operator. These statements can be rewritten in two ways. In certain circumstances, we can replace the condition with one containing an IN operator, or we can replace the complete statement with two SELECT statements linked with UNION.

Example 29.1. Get the names and initials of players 6, 83, and 44.

SELECT   NAME, INITIALS
FROM     PLAYERS
WHERE    PLAYERNO = 6
OR       PLAYERNO = 83
OR       PLAYERNO = 44

SQL will not use the index on the PLAYERNO column, although we assume that such an index has been defined. However, we can replace the condition in the SELECT statement simply by an IN operator. Then, SQL will probably use the index.

SELECT    NAME, INITIALS
FROM      PLAYERS
WHERE     PLAYERNO IN (6, 83, 44)

For UPDATE and DELETE statements, the same applies.

Example 29.2. Get the players who joined the club in 1980, plus the players who live in Stratford.

SELECT   *
FROM     PLAYERS
WHERE    JOINED = 1980
OR       TOWN = 'Stratford'

In this situation, SQL will develop a sequential processing strategy regardless of the presence of indexes on the TOWN and JOINED columns. However, we cannot replace the condition with an IN operator as in the previous example. Instead, we can replace the entire statement with two SELECT statements combined with UNION:

SELECT   *
FROM     PLAYERS
WHERE    JOINED = 1980
UNION
SELECT   *
FROM     PLAYERS
WHERE    TOWN = 'Stratford'

In this situation, it is not possible to replace UPDATE and DELETE statements with a UNION. In such a case, two separate statements are required.

Example 29.3. Update the penalty amount to $150 for all penalties that are equal to $100 or that were incurred on December 1, 1980.

UPDATE   PENALTIES
SET      AMOUNT = 150
WHERE    AMOUNT = 100
OR       PAYMENT_DATE = '1980-12-01'

Another formulation is:

UPDATE   PENALTIES
SET      AMOUNT = 150
WHERE    AMOUNT = 100

and

UPDATE   PENALTIES
SET      AMOUNT = 150
WHERE    PAYMENT_DATE = '1980-12-01'

Let us return to the example with the SELECT statement. With UNION, SQL automatically executes a DISTINCT and all duplicate rows are removed. However, there are no duplicate rows in this example because the SELECT clause includes the primary key of the PLAYERS table.

If the original SELECT statement had looked like the following one (no primary key column in the SELECT clause), an alternative formulation with UNION would not have been possible. The reason is that the following statement could produce duplicate rows, whereas the version with the UNION operator removes duplicate rows from the result. The two formulations would give different results.

SELECT   NAME
FROM     PLAYERS
WHERE    JOINED = 1980
OR       TOWN = 'Stratford'

If the original statement had contained DISTINCT, the alternative would have been possible.

Avoid Unnecessary Use of the UNION Operator

In the previous section, we recommended using the UNION operator. We do not mean, however, that UNION should be used whether it is relevant or not. This operator must also be used with care.

Example 29.4. Get, for each match, the match number and the difference between the number of sets won and lost.

SELECT   MATCHNO, WON - LOST
FROM     MATCHES
WHERE    WON >= LOST
UNION
SELECT   MATCHNO, LOST - WON
FROM     MATCHES
WHERE    WON < LOST

Odds are, during the processing of this statement, SQL will browse the entire MATCHES table twice. This can be prevented by using an ABS function in the SELECT statement:

SELECT   MATCHNO, ABS(WON - LOST)
FROM     MATCHES

SQL will browse the MATCHES table only once and execute the calculation for each row. The expression in the SELECT statement is somewhat more complex than the one used in the previous statement, but the extra processing time caused by this is easily compensated for by the gain in processing time caused by browsing the table only once.

Avoid the NOT Operator

If the condition in a WHERE clause contains the NOT operator, SQL will generally not use an index. Replace a NOT operator, if possible, with a comparison operator.

Example 29.5. Get the players who did not join the club after 1980.

SELECT   *
FROM     PLAYERS
WHERE    NOT (JOINED > 1980)

The WHERE clause can be replaced by the following:

WHERE JOINED <= 1980

Another solution is possible if you know the permitted set of values for a column.

Example 29.6. Get the players who are not men.

SELECT   *
FROM     PLAYERS
WHERE    NOT (SEX = 'M')

We know that the SEX column can contain only the values M and F. Therefore, we could also formulate the statements as follows:

SELECT   *
FROM     PLAYERS
WHERE    SEX = 'F'

Isolate Columns in Conditions

When an index is defined on a column that occurs in a calculation or scalar function, that index will not be used.

Example 29.7. Find the players who joined the club ten years before 1990.

SELECT   *
FROM     PLAYERS
WHERE    JOINED + 10 = 1990

On the left of the comparison equal to operator, there is an expression that contains both a column name and a literal. To the right of the same operator is another literal. The index on the JOINED column will not be used. A faster execution could be expected with the following formulation:

SELECT   *
FROM     PLAYERS
WHERE    JOINED = 1980

Now, the expression to the left of the comparison operator contains only one column name. In other words, the column has been isolated.

Use the BETWEEN Operator

If you look in the condition of a WHERE clause for values in a particular range using the AND operator, SQL will generally not use an index. We can replace such a condition with a BETWEEN operator.

Example 29.8. Find the player numbers of the players born in the period from January 1, 1962, to December 31, 1965.

SELECT   PLAYERNO
FROM     PLAYERS
WHERE    BIRTH_DATE >= '1962-01-01'
AND      BIRTH_DATE <= '1965-12-31'

An index on the BIRTH_DATE column will not be used here. The index will be used if we adjust the condition as follows:

SELECT   PLAYERNO
FROM     PLAYERS
WHERE    BIRTH_DATE BETWEEN '1962-01-01' AND '1965-12-31'

Avoid Particular Forms of the LIKE Operator

In some cases, when an index is defined on a column used with the LIKE operator in a WHERE clause condition, the index will not be considered. If the mask in the LIKE operator begins with a percentage sign or an underscore character, the index cannot be used.

Example 29.9. Find the players whose names end with the letter n.

SELECT   *
FROM     PLAYERS
WHERE    NAME LIKE '%n'

The index will not be used, and, unfortunately, there is no alternative solution for this example.

Add Redundant Conditions to Joins

Sometimes, joins can be accelerated easily by adding an extra condition to the WHERE clause, which does not change the end result.

Example 29.10. Get the payment number and name of the player for all penalties incurred for player 44.

SELECT   PAYMENTNO, NAME
FROM     PENALTIES AS PEN, PLAYERS AS P
WHERE    PEN.PLAYERNO = P.PLAYERNO
AND      PEN.PLAYERNO = 44

Sometimes, SQL can develop a more efficient processing strategy if the condition is extended with a redundant condition, as shown earlier. Obviously, the result of the statement does not change.

SELECT   PAYMENTNO, NAME
FROM     PENALTIES AS PEN, PLAYERS AS P
WHERE    PEN.PLAYERNO = P.PLAYERNO
AND      PEN.PLAYERNO = 44
AND      P.PLAYERNO = 44

Avoid the HAVING Clause

In a SELECT statement, conditions can be specified in two places, in the WHERE and the HAVING clauses. Always try to place as many conditions as possible in the WHERE clause and as few as possible in the HAVING clause. The main reason is that indexes are not used for conditions specified in the HAVING clause.

Example 29.11. Find, for each player with a number higher than 40, the player number and the number of penalties incurred.

SELECT   PLAYERNO, COUNT(*)
FROM     PENALTIES
GROUP BY PLAYERNO
HAVING   PLAYERNO >= 40

The condition in the HAVING clause can also be specified in the WHERE clause. This makes the HAVING clause completely superfluous:

SELECT   PLAYERNO, COUNT(*)
FROM     PENALTIES
WHERE    PLAYERNO >= 40
GROUP BY PLAYERNO

Make the SELECT Clause as Small as Possible

The SELECT clause of a main query formulates which data is to be presented. Avoid the use of unnecessary columns because it can affect the processing speed in a negative way.

You are allowed to specify multiple expressions in the SELECT clause of a subquery if that subquery is linked to the main query with the EXISTS operator. However, the end result of the SELECT statement is not affected by the expressions specified. Therefore, the advice is to formulate only one expression consisting of one literal in the SELECT clause.

Example 29.12. Get the player numbers and names of the players for whom at least one penalty has been paid.

SELECT   PLAYERNO, NAME
FROM     PLAYERS
WHERE    EXISTS
        (SELECT    '1'
         FROM      PENALTIES
         WHERE     PENALTIES.PLAYERNO = PLAYERS.PLAYERNO)

Avoid DISTINCT

Specifying DISTINCT in the SELECT clause leads to the removal of duplicate rows from a result. This can have a negative effect on the processing time. Therefore, avoid the use of DISTINCT when it is not required or even redundant. In Section 9.4, in Chapter 9, “SELECT Statement: SELECT Clause and Aggregation Functions,” we described when DISTINCT is superfluous. DISTINCT is not necessary in subqueries.

Example 29.13. Find, for each match, the match number and the name of the player.

SELECT   DISTINCT MATCHNO, NAME
FROM     MATCHES, PLAYERS
WHERE    MATCHES.PLAYERNO = PLAYERS.PLAYERNO

DISTINCT is unnecessary here because the SELECT clause contains the primary key of the MATCHES table, as well as a condition on the primary key of the PLAYERS table.

Use the ALL Option with Set Operators

In Chapter 13, “Combining Table Expressions,” we discussed the ALL option for the set operators UNION, INTERSECT, and EXCEPT. Adding ALL to these operators has the effect that duplicate rows are not removed from the result. The ALL option has a function that is comparable to ALL in the SELECT clause; see Section 9.4, in Chapter 9. If ALL is not specified, all rows have to be sorted to be able to remove duplicate rows (sorting takes places behind the scenes). In other words, the guidelines given in the previous section also apply to the ALL option: If possible, use ALL in conjunction with the set operators.

Example 29.14. Find the names and initials of the players who live in Stratford and Douglas.

SELECT   NAME, INITIALS
FROM     PLAYERS
WHERE    TOWN = 'Stratford'
UNION ALL
SELECT   NAME, INITIALS
FROM     PLAYERS
WHERE    TOWN = 'Douglas'

ExplanationBecause of the presence of the keyword ALL, SQL will not perform a sort to remove possible duplicate rows. Luckily, this result will never return duplicate rows because each player lives in only one town. So, in this example, a sort would always be performed unnecessarily, thus wasting performance.

Prefer Outer Joins to UNION Operators

The outer join was a late addition to SQL. The result is that many statements still do not make use of it. The UNION operator is used many times to simulate an outer join. Here is an example.

Example 29.15. Find, for each player, the player number, name, and penalties incurred by him or her; order the result by player number.

This question used to be solved with the following construct:

SELECT   PLAYERS.PLAYERNO, NAME, AMOUNT
FROM     PLAYERS, PENALTIES
WHERE    PLAYERS.PLAYERNO = PENALTIES.PLAYERNO
UNION
SELECT   PLAYERNO, NAME, NULL
FROM     PLAYERS
WHERE    PLAYERNO NOT IN
        (SELECT   PLAYERNO
         FROM     PENALTIES)
ORDER BY 1

However, this is a complex statement for SQL. Such statements seldom have a fast processing time. For example, the PLAYERS table is accessed twice, once in each select block. Avoid this type of formulation and use the new formulation, in which the outer join is formulated explicitly:

SELECT   PLAYERNO, NAME, AMOUNT
FROM     PLAYERS LEFT OUTER JOIN PENALTIES
         USING (PLAYERNO)
ORDER BY 1

Avoid Data Type Conversions

SQL automatically performs data type conversions. The following condition, for example, is correct even if the numeric PLAYERNO column is compared with an alphanumeric literal:

WHERE PLAYERNO = '44'

Converting data types adversely affects the processing speed, obviously. If this type of conversion is not really required, try to avoid it.

The Largest Table Last

When you formulate joins, it is possible that the sequence of the tables in the FROM clause can affect the processing speed. The rule is: Specify the largest table last in the FROM clause. Thus, the following FROM clause:

FROM    PLAYERS, TEAMS

would be better if replaced by the following because the PLAYERS table is the larger table:

FROM   TEAMS, PLAYERS

Avoid the ANY and ALL Operators

Many optimizers will not use an index when processing conditions with the ALL operator. Replace an ALL operator, if possible, with one of the aggregation functions: MIN or MAX.

Example 29.16. Get the player numbers, names, and dates of birth of the oldest players. (We already used this example in Section 8.12.)

SELECT   PLAYERNO, NAME, BIRTH_DATE
FROM     PLAYERS
WHERE    BIRTH_DATE <= ALL
        (SELECT   BIRTH_DATE
         FROM     PLAYERS)

We can replace the ALL operator here with the MIN function.

SELECT   PLAYERNO, NAME, BIRTH_DATE
FROM     PLAYERS
WHERE    BIRTH_DATE =
        (SELECT   MIN(BIRTH_DATE)
         FROM     PLAYERS)

The same reasoning applies to the ANY operator.

Example 29.17. Find the player numbers, names, and dates of birth of the players who are not among the oldest players.

SELECT   PLAYERNO, NAME, BIRTH_DATE
FROM     PLAYERS
WHERE    BIRTH_DATE > ANY
        (SELECT   BIRTH_DATE
         FROM     PLAYERS)

We can also replace the ANY operator with the MIN function in this example.

SELECT   PLAYERNO, NAME, BIRTH_DATE
FROM     PLAYERS
WHERE    BIRTH_DATE >
        (SELECT   MIN(BIRTH_DATE)
         FROM     PLAYERS)

Exercise 29.1:

Get alternative formulations for the following statements:

  1. SELECT   *
    FROM     PLAYERS
    WHERE    (TOWN = 'Stratford'
    AND      STREET = 'Edgecombe Way')
    OR       (NOT (BIRTH_DATE >= '1960-01-01'))
    
  2. SELECT   DISTINCT *
    FROM     PLAYERS
    
  3. SELECT   *
    FROM     TEAMS
    WHERE    TEAMNO IN
            (SELECT   TEAMNO
             FROM     MATCHES
             WHERE    WON * LOST = WON * 4)
    
  4. SELECT   DISTINCT TEAMNO
    FROM     MATCHES
    WHERE    TEAMNO IN
            (SELECT   TEAMNO
             FROM     TEAMS
             WHERE    NOT (DIVISION <> 'second'))
    
  5. SELECT   DISTINCT P.PLAYERNO
    FROM     PLAYERS AS P, MATCHES AS M
    WHERE    P.PLAYERNO <> M.PLAYERNO
    
  6. SELECT   PLAYERNO, 'Male'
    FROM     PLAYERS
    WHERE    SEX = 'M'
    UNION
    SELECT   PLAYERNO, 'Female'
    FROM     PLAYERS
    WHERE    SEX = 'F'
    
  7. SELECT   BIRTH_DATE, COUNT(*)
    FROM     PLAYERS
    GROUP BY BIRTH_DATE
    HAVING   BIRTH_DATE >= '1970-01-01'
    

Exercise 29.2:

The difference between a “fast” and a “slow” statement depends on the number of rows in the tables: the more rows, the bigger the difference. The number of rows in the sample database is small. However, in Section 20.8, in Chapter 20, the PLAYERS_XXL table has been created and that table contains many rows. Now, we extend this table by filling it with 500,000 rows. Enter the following SELECT statements and determine the processing time. (Some of these statements have been described in the previous exercise.) A watch is not required because in the window at the bottom of the screen, WinSQL reports the processing time of each SQL statement. Next, determine a faster formulation, get the processing time once again, and see whether you have indeed sped up the statement. You will see that some statements have been sped up considerably.

  1. SELECT   PLAYERNO, NAME, BIRTH_DATE
    FROM     PLAYERS
    WHERE    STREET <= ALL
            (SELECT   STREET
             FROM     PLAYERS)
    
  2. SELECT   DISTINCT *
    FROM     PLAYERS
    
  3. SELECT   PLAYERNO, 'Male'
    FROM     PLAYERS
    WHERE    SEX = 'M'
    UNION
    SELECT   PLAYERNO, 'Female'
    FROM     PLAYERS
    WHERE    SEX = 'F'
    
  4. SELECT   POSTCODE, COUNT(*)
    FROM     PLAYERS
    GROUP BY POSTCODE
    HAVING   POSTCODE >= 'Y'
    
  5. SELECT   *
    FROM     PLAYERS
    WHERE    NOT (PLAYERNO > 10)
    

The Future of the Optimizer

This chapter clearly shows that the current optimizers are not optimal yet. In some cases, the optimizer cannot determine the most efficient processing strategy. This can lead to poor processing times. This applies not only to database servers with SQL as their database language, but also to any system that has to determine the processing strategy itself.

A lot of research is being carried out to improve optimizers. Experience shows that each new version of an SQL product is faster than its predecessor. This trend will continue in the years to come. One day, optimizers will always find better strategies than most human programmers. E. F. Codd, founder of the relational model, put it as follows [CODE82]:

If suitable fast access paths are supported, there is no reason why a high-level language such as SQL . . . should result in less efficient runtime code . . . than a lower level language . . . .

Answers

29.1

  1. SELECT   *
    FROM     PLAYERS
    WHERE    TOWN = 'Stratford'
    AND      STREET = 'Edgecombe Way'
    UNION
    SELECT   *
    FROM     PLAYERS
    WHERE    BIRTH_DATE < '1960-01-01'
    
  2. SELECT   *
    FROM     PLAYERS
    
  3. Condition WON * LOST = WON * 4 cannot be simplified to LOST = 4 because both sides of the equation are divided by WON to get a simplified condition. WON can be equal to 0, and that would mean that we divide by 0, which is not allowed:

    SELECT   DISTINCT TEAMS.*
    FROM     TEAMS, MATCHES
    WHERE    TEAMS.TEAMNO = MATCHES.TEAMNO
    AND      WON * LOST = WON * 4
    
  4. SELECT   DISTINCT T.TEAMNO
    FROM     TEAMS AS T, MATCHES AS M
    WHERE    T.TEAMNO = M.TEAMNO
    AND      DIVISION = 'second'
    
  5. SELECT   PLAYERNO
    FROM     PLAYERS
    
  6. SELECT   PLAYERNO,
             CASE SEX
                WHEN 'F' THEN 'Female'
                ELSE 'Male' END
    FROM     PLAYERS
    
  7. SELECT   BIRTH_DATE, COUNT(*)
    FROM     PLAYERS
    WHERE    BIRTH_DATE >= '1970-01-01'
    GROUP BY BIRTH_DATE
    

29.2

  1. SELECT   PLAYERNO, NAME, BIRTH_DATE
    FROM     PLAYERS_XXL
    WHERE    BIRTH_DATE =
            (SELECT   MIN(BIRTH_DATE)
             FROM     PLAYERS_XXL)
    
  2. SELECT   *
    FROM     PLAYERS_XXL
    
  3. SELECT   PLAYERNO,
             CASE SEX
                WHEN 'F' THEN 'Female'
                ELSE 'Male' END
    FROM     PLAYERS_XXL
    
  4. SELECT   POSTCODE, COUNT(*)
    FROM     PLAYERS_XXL
    WHERE    POSTCODE >= 'Y'
    GROUP BY POSTCODE
    
  5. SELECT   *
    FROM     PLAYERS_XXL
    WHERE    PLAYERNO <= 10
    
..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
3.16.130.201