Chapter 9. SELECT Statement: SELECT Clause and Aggregation Functions

Introduction

The WHERE clause described in the previous chapter selects rows. The intermediate result from this clause forms a horizontal subset of a table. In contrast, the SELECT clause selects only columns and not rows; the result forms a vertical subset of a table.

The features, limitations, and use of the SELECT clause depend on the presence or absence of a GROUP BY clause. This chapter discusses table expressions without a GROUP BY clause. In Chapter 10, “SELECT Statement: The GROUP BY Clause,” which concentrates on the GROUP BY clause, we discuss the features of the SELECT clause when the table expression does contain a GROUP BY clause.

A large part of this chapter is devoted to so-called aggregation functions. In Chapter 5, “SELECT Statement: Common Elements,” we referred to these functions but did not explore them in any depth.

<select clause> ::=
   SELECT [ DISTINCT | ALL ] <select element list>

<select element list> ::=
   <select element> [ { , <select element> }... ] |
   *


<select element> ::=
   <scalar expression> [[ AS ] <column name> ] |
   <table specification>.* |
   <pseudonym>.*

<column name> ::= <name>

Selecting All Columns (*)

The shortest SELECT clause is the one in which only an asterisk (*) is specified. This asterisk is a shorthand notation for all columns in each table mentioned in the FROM clause. Two equivalent SELECT statements are in the following example:

Example 9.1. Get the entire PENALTIES table.

SELECT    *
FROM      PENALTIES

and

SELECT   PAYMENTNO, PLAYERNO, PAYMENT_DATE, AMOUNT
FROM     PENALTIES

Explanation: The * symbol, then, does not mean multiplication in this context.

When a FROM clause contains two or more tables, it is sometimes necessary to use a table specification in front of the * symbol to clarify which columns should be presented.

Example 9.2. Get all the information on all the penalties incurred by players who are also captains.

The following three statements are equivalent:

SELECT   PENALTIES.*
FROM     PENALTIES INNER JOIN TEAMS
         ON PENALTIES.PLAYERNO = TEAMS.PLAYERNO

SELECT   PENALTIES.PAYMENTNO, PENALTIES.PLAYERNO,
         PENALTIES.PAYMENT_DATE, PENALTIES.AMOUNT
FROM     PENALTIES INNER JOIN TEAMS
         ON PENALTIES.PLAYERNO = TEAMS.PLAYERNO

SELECT   PEN.*
FROM     PENALTIES AS PEN INNER JOIN TEAMS
         ON PEN.PLAYERNO = TEAMS.PLAYERNO

The result is:

PAYMENTNO  PLAYERNO  PAYMENT_DATE  AMOUNT
---------  --------  ------------  ------
        1         6    1980-12-08  100.00
        3        27    1983-09-10  100.00
        8        27    1984-11-12  75.00

Expressions in the SELECT Clause

In processing the SELECT clause, the intermediate result is evaluated row by row. Each expression gives rise to a value in each result row. Most of the examples of the SELECT clause described so far contain only column names, but an expression can also take the form of a literal, a calculation, or a scalar function.

Example 9.3. For each match, get the match number, the word Tally, the difference between the columns WON and LOST and the value of the WON column multiplied by 10.

SELECT   MATCHNO, 'Tally', WON - LOST,
         WON * 10
FROM     MATCHES

The result is:

MATCHNO  TALLY  WON - LOST WON * 10
-------  -----  ---------- --------
      1  Tally           2       30
      2  Tally          -1       20
      3  Tally           3       30
      4  Tally           1       30
      5  Tally          -3        0
      6  Tally          -2       10
      7  Tally           3       30
      8  Tally          -3        0
      9  Tally           1       30
     10  Tally           1       30
     11  Tally          -1       20
     12  Tally          -2       10
     13  Tally          -3        0

Removing Duplicate Rows with DISTINCT

A SELECT clause can consist of a number of expressions preceded by the word DISTINCT. (See the definition at the beginning of this chapter.) When DISTINCT is specified, SQL removes duplicate rows from the intermediate result.

Example 9.4. Find all the different town names from the PLAYERS table.

SELECT   TOWN
FROM     PLAYERS

The result is:

TOWN
---------
Stratford
Stratford
Stratford
Inglewood
Eltham
Midhurst
Stratford
Inglewood
Stratford
Stratford
Douglas
Stratford
Eltham
Plymouth

In this result table, the towns Stratford, Inglewood, and Eltham appear seven, two, and two times, respectively. If the statement is expanded to include DISTINCT:

SELECT   DISTINCT TOWN
FROM     PLAYERS

it produces the following result, in which all duplicate rows are removed:

TOWN
---------
Stratford
Midhurst
Inglewood
Plymouth
Douglas
Eltham

Example 9.5. Get every existing combination of street and town names.

SELECT   STREET, TOWN
FROM     PLAYERS

The result is:

STREET          TOWN
-------------- ---------
Stoney Road     Stratford
Haseltine Lane  Stratford
Edgecombe Way   Stratford
Station Road    Inglewood
Long Drive      Eltham
Old Main Road   Midhurst
Eaton Square    Stratford
Lewis Street    Inglewood
Edgecombe Way   Stratford
Magdalene Road  Stratford
High Street     Douglas
Haseltine Lane  Stratford
Stout Street    Eltham
Vixen Road      Plymouth

This result also contains duplicate rows; for example, Edgecombe Way and Haseltine Lane in Stratford are each mentioned twice. When DISTINCT is added:

SELECT DISTINCT STREET, TOWN
FROM PLAYERS

the result is:

STREET          TOWN
--------------  ---------
Edgecombe Way   Stratford
Eaton Square    Stratford
Haseltine Lane  Stratford
High Street     Douglas
Lewis Street    Inglewood
Long Drive      Eltham
Magdalena Road  Stratford
Old Main Road   Midhurst
Station Road    Inglewood
Stoney Road     Stratford
Stout Street    Eltham
Vixen Road      Plymouth

DISTINCT, then, is concerned with the whole row, and not only with the expression that directly follows the word DISTINCT in the statement. In these two constructs, the use of DISTINCT is superfluous (but not forbidden):

  • When the SELECT clause includes at least one candidate key for each table specified in the FROM clause, DISTINCT is superfluous. The most important property of a candidate key is that the set of columns that forms the candidate key never allows duplicate values, so a table that has a candidate key never has duplicate rows. The inclusion of a candidate key in the SELECT clause offers a guarantee that no duplicate rows will appear in the end result.

  • When the table expression results in none or only one row with values, DISTINCT is superfluous. For equal rows, at least two rows are necessary. For example, if you are looking for players with a certain player number (WHERE PLAYERNO = 45), the statement results in one row if that player number exists, and otherwise no rows.

The user is allowed to specify the word ALL in the same position in the statement as where DISTINCT appears. Note that ALL actually has the opposite effect to DISTINCT and does not alter the result of a “normal” table expression. In other words, the results of the following two statements are equivalent:

SELECT   TOWN
FROM     PLAYERS

and

SELECT   ALL TOWN
FROM     PLAYERS

Exercise 9.1:

In which of the following statements is DISTINCT superfluous?

  1. SELECT   DISTINCT PLAYERNO
    FROM     TEAMS
    
  2. SELECT   DISTINCT PLAYERNO
    FROM     MATCHES
    WHERE    TEAMNO = 2
    
  3. SELECT   DISTINCT *
    FROM     PLAYERS
    WHERE    PLAYERNO = 100
    
  4. SELECT   DISTINCT M.PLAYERNO
    FROM     MATCHES AS M, PENALTIES AS PEN
    WHERE    M.PLAYERNO = PEN.PLAYERNO
    
  5. SELECT   DISTINCT PEN.PAYMENTNO
    FROM     MATCHES AS M, PENALTIES AS PEN
    WHERE    M.PLAYERNO = PEN.PLAYERNO
    
  6. SELECT   DISTINCT PEN.PAYMENTNO, M.TEAMNO,
             PEN.PLAYERNO
    FROM     MATCHES AS M, PENALTIES AS PEN
    WHERE    M.PLAYERNO = PEN.PLAYERNO
    

When Are Two Rows Equal?

When are two rows identical or equal? At first sight, this seems a trivial question, but are two rows still equal when one of the values is equal to the NULL value? We answer these two questions somewhat formally.

Imagine that two rows, R1 and R2, both consist of n values wi (1 <= i <= n). These two rows R1 and R2 are equal under the following conditions:

  • The number of values in the rows is equal.

  • For each i (1 <= i <= n), it holds that R1wi is equal to R2wi or that R1wi and R2wi are both equal to the NULL value.

This means that if, for example, the value R1w3 is equal to the NULL value and R2w3 is not, the rows R1 and R2 cannot be equal (regardless of the other values). However, if both R1w3 and R2w3 are equal to the NULL value, they could be equal.

Example 9.6. Get all the different league numbers.

SELECT   DISTINCT LEAGUENO
FROM     PLAYERS

The result is:

LEAGUENO
--------
1124
1319
1608
2411
2513
2983
6409
6524
7060
8467
?

ExplanationThe NULL value appears only once in the result because rows that consist of only a NULL value are equal to each other.

This rule does not seem to be in line with the rules described in Section 8.2, in Chapter 8, “SELECT Statement: The WHERE Clause.” There, we stated that two NULL values are not equal to each other. Also, when comparing row expressions, two NULL values are not considered to be equal or unequal. For example, the next two conditions both evaluate to unknown.

NULL = 4
(1, NULL) = (1, NULL)

Informally, we could say that SQL executes a horizontal comparison with conditions. The values that must be compared are, besides each other, or to the left or right of the comparison operator. And that is the difference with DISTINCT. We could state that DISTINCT rows “underneath” each other in the intermediate result are compared, instead of rows that are “next to” each other. In other words, with DISTINCT, a vertical comparison takes place. In that case, NULL values are equal to each other. Imagine the intermediate result of a certain table expression that looks as follows:

(1, NULL)
(1, NULL)

Two rows are compared vertically when processing DISTINCT. In the end result, only one of the two rows is left. This rule might look somewhat strange, but it is in accordance with the rules of the original relational model.

Example 9.7. Determine which rows will be deleted by DISTINCT.

SELECT   DISTINCT *
FROM    (SELECT   1 AS A, 'Hello' AS B, 4 AS C UNION
         SELECT   1, 'Hello', NULL UNION
         SELECT   1, 'Hello', NULL UNION
         SELECT   1, NULL, NULL) AS X

The result is:

A  B      C
-  -----  -
1  Hello   4
1  Hello   ?
1  ?       ?

Exercise 9.2:

Determine the results of these SELECT statements for the following T table:

T:  C1  C2  C3
--  --  --  --
    c1  c2  c3
    c2  c2  c3
    c3  c2  ?
    c4  c2  ?
    c5  ?   ?
    c6  ?   ?
  1. SELECT   DISTINCT C2
    FROM     T
    
  2. SELECT   DISTINCT C2, C3
    FROM     T
    

An Introduction to Aggregation Functions

Expressions in the SELECT clause can contain so-called aggregation functions (also called statistical, group, set, or column functions). If the table expression has no GROUP BY clause, an aggregation function in a SELECT clause operates on all rows. If a SELECT clause does contain an aggregation function, the entire table expression yields only one row as an end result. (Remember, we are still assuming here that the table expression has no GROUP BY clause.) In fact, the values of a group rows are aggregated to one value. For example, all penalty amounts in the PENALTIES table are added up to one value with the SUM function.

<aggregation function> ::=
   COUNT    ( [ DISTINCT | ALL ] { * | <expression>
An Introduction to Aggregation Functions } ) |
   MIN      ( [ DISTINCT | ALL ] <expression> )   
An Introduction to Aggregation Functions      |
   MAX      ( [ DISTINCT | ALL ] <expression> )   
An Introduction to Aggregation Functions      |
   SUM      ( [ DISTINCT | ALL ] <expression> )   
An Introduction to Aggregation Functions      |
   AVG      ( [ DISTINCT | ALL ] <expression> )   
An Introduction to Aggregation Functions      |
   STDDEV   ( [ DISTINCT | ALL ] <expression> )   
An Introduction to Aggregation Functions      |
   VARIANCE ( [ DISTINCT | ALL ] <expression> )

Example 9.8. How many players are registered in the PLAYERS table?

SELECT   COUNT(*)
FROM     PLAYERS

The result is:

COUNT(*)
--------
      14

ExplanationThe function COUNT(*) counts the number of rows that remain after processing the FROM clause. In this case, the number equals the number of rows in the PLAYERS table.

Example 9.9. How many players live in Stratford?

SELECT   COUNT(*)
FROM     PLAYERS
WHERE    TOWN = 'Stratford'

The result is:

COUNT(*)
--------
       7

Explanation: Because the SELECT clause is processed after the WHERE clause, the number of rows in which the TOWN column has the value Stratford is counted.

We look at various aggregation functions in more detail in the following sections. Several general rules apply to the use of aggregation functions. These rules apply when the concerning table expression contains no GROUP BY clause.

  • A table expression with an aggregation function yields only one row as a result. This can be a row consisting of only NULL values, but there is always one row. The result can never consist of zero rows or more than one row.

  • It is not allowed to nest aggregation functions. Several expression forms can be used as parameters for an aggregation function but not an aggregation function itself. Therefore, this expression is not allowed: COUNT(MAX(...)).

  • If the SELECT clause contains one or more aggregation functions, a column specification in the SELECT clause can occur only within an aggregation function.

The last rule requires some explanation. According to this rule, the following statement is not correct because the SELECT clause contains an aggregation function as an expression, while the column name PLAYERNO occurs outside an aggregation function.

SELECT   COUNT(*), PLAYERNO
FROM     PLAYERS

The reason for this limitation is that the result of an aggregation function always consists of one value, while the result of a column specification consists of a set of values. SQL considers this to be incompatible results.

Note, however, that this rule applies only to column specifications and not to, for example, literals and system variables. Therefore, the following statement is correct:

SELECT   'The number of players', COUNT(*)
FROM     PLAYERS

The result is:

'The number of players is'  COUNT(*)
--------------------------  --------
The number of players is          14

In Chapter 10, we extend these rules for the SELECT clause for table expressions that do contain a GROUP BY clause.

Exercise 9.3:

Is the following SELECT statement correct?

SELECT   TEAMNO, COUNT(*)
FROM     MATCHES

Exercise 9.4:

Find the number of penalties and the highest penalty amount.

The COUNT Function

With the COUNT function, an asterisk (*) or an expression can be specified between brackets. The first case in which an asterisk is used was discussed in the previous section. In this section, we discuss the other possibilities.

Example 9.10. How many league numbers are there?

SELECT   COUNT(LEAGUENO)
FROM     PLAYERS

The result is:

COUNT(LEAGUENO)
---------------
             10

ExplanationThe function COUNT(LEAGUENO) is used to count the number of non-NULL values in the LEAGUENO column instead of the number of rows in the intermediate result. So, the result is 10 and not 14 (the number of non-NULL and all values in the column, respectively).

Specifying does not change the result of the query. This applies to all the aggregation functions. Therefore, the previous statement could have been written as follows:

SELECT   COUNT(ALL LEAGUENO)
FROM     PLAYERS

The COUNT function can also be used to calculate the number of different values in a column.

Example 9.11. How many different town names are there in the TOWN column?

SELECT   COUNT(DISTINCT TOWN)
FROM     PLAYERS

The result is:

COUNT(DISTINCT TOWN)
--------------------
                   6

Explanation: When DISTINCT is specified in front of the column name, all the duplicate values are removed first and then the addition is carried out.

Example 9.12. Get the number of different characters that start the names of the players.

SELECT   COUNT(DISTINCT SUBSTR(NAME, 1, 1))
FROM     PLAYERS

The result is:

COUNT(DISTINCT SUBSTR(NAME, 1, 1))
----------------------------------
                                 8

Explanation: This example shows clearly that all kinds of expression forms can be used within aggregation functions, including scalar functions (see Appendix B, “Scalar Functions,” for a description of the SUBSTR function).

Example 9.13. Get the number of different years that appear in the PENALTIES table.

SELECT   COUNT(DISTINCT YEAR(PAYMENT_DATE))
FROM     PENALTIES

The result is:

COUNT(DISTINCT YEAR(PAYMENT_DATE))
----------------------------------
                                 5

Example 9.14. Get the number of different town names and the number of sexes represented.

SELECT   COUNT(DISTINCT TOWN), COUNT(DISTINCT SEX)
FROM     PLAYERS

The result is:

COUNT(DISTINCT TOWN)  COUNT(DISTINCT SEX)
--------------------  -------------------
                   6                    2

ExplanationMore than one aggregation function can be specified in a SELECT clause.

Example 9.15. Get the numbers and names of players who incurred more penalties than they played matches.

SELECT   PLAYERNO, NAME
FROM     PLAYERS AS P
WHERE   (SELECT   COUNT(*)
         FROM     PENALTIES AS PEN
         WHERE    P.PLAYERNO = PEN.PLAYERNO)
         >
        (SELECT   COUNT(*)
         FROM     MATCHES AS M
         WHERE    P.PLAYERNO = M.PLAYERNO)

The result is:

PLAYERNO  NAME
--------  -------
      27  Collins
      44  Baker

Explanation: Aggregation functions can appear in the SELECT clause of each table expression, including subqueries.

Example 9.16. For each player, find the player number, the name, and the number of penalties incurred by him or her, but only for players who have at least two penalties.

SELECT   PLAYERNO, NAME,
        (SELECT   COUNT(*)
         FROM     PENALTIES
         WHERE    PENALTIES.PLAYERNO = PLAYERS.PLAYERNO)
         AS NUMBER
FROM     PLAYERS
WHERE   (SELECT   COUNT(*)
         FROM     PENALTIES
         WHERE    PENALTIES.PLAYERNO = PLAYERS.PLAYERNO) >= 2

The result is:

PLAYERNO  NAME     NUMBER
--------  -------  ------
      27  Collins       2
      44  Baker         3

ExplanationThe correlated subquery in the SELECT clause calculates the number of penalties for each player. That same subquery checks whether that number is greater than 1.

This statement can also be formulated in a more compact way by placing the subquery within the FROM clause:

SELECT   PLAYERNO, NAME, NUMBER
FROM    (SELECT   PLAYERNO, NAME,
                 (SELECT   COUNT(*)
                  FROM     PENALTIES
                  WHERE    PENALTIES.PLAYERNO =
                           PLAYERS.PLAYERNO)
                  AS NUMBER
        FROM      PLAYERS) AS PN
WHERE   NUMBER >= 2

Explanation: The subquery in the FROM clause determines the number, the name, and the number of penalties for each player. Next, this number becomes a column in the intermediate result. After that, a condition can be specified (NUMBER>= 2); finally, the value of that column in the SELECT clause is retrieved.

Example 9.17. Get the total number of penalties followed by the total number of matches.

SELECT (SELECT   COUNT(*)
        FROM     PENALTIES),
       (SELECT   COUNT(*)
        FROM     MATCHES)

The result is:

SELECT ...  SELECT ...
----------  ----------
         8          13

Exercise 9.5:

Get the number of different committee positions.

Exercise 9.6:

Get the number of league numbers of players resident in Inglewood.

Exercise 9.7:

Find for each team the number, the division, and the number of matches played for that team.

Exercise 9.8:

For each player, get the number, the name, and the number of matches won.

Exercise 9.9:

Create a SELECT statement that results in the following table:

TABLES             NUMBERS
-----------------  -------
Number of players       14
Number of teams          2
Number of matches       13

The MAX and MIN Functions

With the MAX and MIN functions, you can determine the largest and smallest values, respectively, in a column.

Example 9.18. What is the highest penalty?

SELECT   MAX(AMOUNT)
FROM     PENALTIES

The result is:

MAX(AMOUNT)
-----------
     100.00

Example 9.19. What is the lowest penalty incurred by a player resident in Stratford?

SELECT   MIN(AMOUNT)
FROM     PENALTIES
WHERE    PLAYERNO IN
        (SELECT   PLAYERNO
         FROM     PLAYERS
         WHERE    TOWN = 'Stratford')

The result is:

MIN(AMOUNT)
-----------
     100.00

Example 9.20. How many penalties are equal to the lowest one?

SELECT   COUNT(*)
FROM     PENALTIES
WHERE    AMOUNT =
        (SELECT   MIN(AMOUNT)
         FROM     PENALTIES)

The result is:

COUNT(AMOUNT)
-------------
            2

ExplanationThe subquery calculates the lowest penalty, which is $25. The SELECT statement calculates the number of penalties equal to the amount of this lowest penalty.

Example 9.21. For each team, find the team number followed by the player number of the player who has won the most matches for that team.

SELECT   DISTINCT TEAMNO, PLAYERNO
FROM     MATCHES AS M1
WHERE    WON =
        (SELECT   MAX(WON)
         FROM     MATCHES AS M2
         WHERE    M1.TEAMNO = M2.TEAMNO)

The result is:

TEAMNO  PLAYERNO
------  --------
     1         6
     1        44
     1        57
     2        27
     2       104

Explanation: In the result, more than one player appears for each team because several players won a match in three sets.

Aggregation functions can occur in calculations. Here are two examples.

Example 9.22. What is the difference between the highest and lowest penalty in cents?

SELECT    (MAX(AMOUNT) - MIN(AMOUNT)) * 100
FROM      PENALTIES

The result is:

(MAX(AMOUNT) - MIN(AMOUNT)) * 100
---------------------------------
                          7500.00

Example 9.23. Get the first letter of the last name of all players, alphabetically.

SELECT   SUBSTR(MAX(NAME), 1, 1)
FROM     PLAYERS

The result is:

SUBSTR(MAX(NAME), 1, 1)
-----------------------
W

Explanation: First, the MAX function finds the last name in alphabetical order, and then the scalar function SUBSTR picks out the first letter from this name. See Appendix B for a description of this and other functions.

In principle, DISTINCT can be used with the MAX and MIN functions, but this, of course, does not change the end result (work out why for yourself).

When MAX and MIN functions are processed, two special situations must be taken into consideration:

  • If a column in a given row contains only NULL values, the values of the MIN and MAX functions are also NULL.

  • If the MIN and MAX functions are executed on an empty intermediate result, the value of these functions is also NULL.

    Here is an example of each.

Example 9.24. What is the highest league number of all players from Midhurst?

SELECT   MAX(LEAGUENO)
FROM     PLAYERS
WHERE    TOWN = 'Midhurst'

The result is:

MAX(LEAGUENO)
-------------
?

Explanation: The PLAYERS table contains only one player from Midhurst, and she has no league number. That is why the answer of this statement has only one row consisting of the NULL value.

Example 9.25. What is the lowest league number of all players from Amsterdam? If a player does not exist, print the text Unknown.

SELECT   CASE WHEN MIN(LEAGUENO) IS NULL
            THEN 'Unknown'
            ELSE MIN(LEAGUENO)
         END
FROM     PLAYERS
WHERE    TOWN = 'Amsterdam'

The result is:

CASE WHEN ...
-------------
Unknown

Example 9.26. For each player who incurred at least one penalty, find the player number, the highest penalty, and the date on which that penalty was paid.

SELECT   PLAYERNO, AMOUNT, PAYMENT_DATE
FROM     PENALTIES AS PEN1
WHERE    AMOUNT =
        (SELECT  MAX(AMOUNT)
         FROM    PENALTIES AS PEN2
         WHERE   PEN2.PLAYERNO = PEN1.PLAYERNO)

The result is:

PLAYERNO  AMOUNT  PAYMENT_DATE
--------  ------  ------------
       6  100.00    1980-12-08
       8   25.00    1980-12-08
      27  100.00    1983-09-10
      44   75.00    1981-05-05
     104   50.00    1984-12-08

Example 9.27. For each player, get the player number, the highest penalty amount that was paid for him or her, and the highest number of sets won in a match.

SELECT   PLAYERNO,
        (SELECT   MAX(AMOUNT)
         FROM     PENALTIES
         WHERE    PENALTIES.PLAYERNO = PLAYERS.PLAYERNO)
         AS HIGHESTPENALTY,
        (SELECT   MAX(WON)
         FROM     MATCHES
         WHERE    MATCHES.PLAYERNO = PLAYERS.PLAYERNO)
         AS NUMBEROFSETS
FROM     PLAYERS

The result is:

PLAYERNO  HIGHESTPENALTY  NUMBEROFSETS
--------  --------------  ------------
       2               ?             1
       6          100.00             3
       7               ?             ?
       8           25.00             0
      27          100.00             3
      28               ?             ?
      39               ?             ?
      44           75.00             3
      57               ?             3
      83               ?             0
      95               ?             ?
     100               ?             ?
     104           50.00             3
     112               ?             2

ExplanationThe two correlated subqueries are processed for each player separately. When no rows are found, the subquery returns a NULL value.

Example 9.28. Get the number of each player whose lowest penalty amount is equal to his or her highest penalty amount.

SELECT   PLAYERNO
FROM     PLAYERS
WHERE   (SELECT   MIN(AMOUNT)
         FROM     PENALTIES
         WHERE    PENALTIES.PLAYERNO = PLAYERS.PLAYERNO) =
        (SELECT   MAX(AMOUNT)
         FROM     PENALTIES
         WHERE    PENALTIES.PLAYERNO = PLAYERS.PLAYERNO)

The result is:

PLAYERNO
--------
       6
       8
     104

Exercise 9.10:

Get the lowest number of sets by which a match has been won.

Exercise 9.11:

For each player, get the number and the difference between his or her lowest and highest penalty amounts.

Exercise 9.12:

Get the number and the date of birth of each player born in the same year as the youngest player who played for the first team.

The SUM and AVG Functions

The SUM function calculates the sum of all values in a particular column. The AVG function calculates the arithmetic average of the values in a particular column. Both functions are, of course, applicable only to columns with a numeric data type.

Example 9.29. What is the total amount of penalties incurred by players from Inglewood?

SELECT   SUM(AMOUNT)
FROM     PENALTIES
WHERE    PLAYERNO IN
        (SELECT   PLAYERNO
         FROM     PLAYERS
         WHERE    TOWN = 'Inglewood')

The result is:

SUM(AMOUNT)
-----------
     155.00

You can specify the word ALL in front of the column name without affecting the result. By adding ALL, you explicitly demand that all values are considered. In contrast, the use of DISTINCT within the SUM function can alter the end result. If you extend the SUM function in the previous SELECT statement with DISTINCT, you get the following result:

SELECT   SUM(DISTINCT AMOUNT)
FROM     PENALTIES
WHERE    PLAYERNO IN
        (SELECT   PLAYERNO
         FROM     PLAYERS
         WHERE    TOWN = 'Inglewood')

The result is:

SUM(AMOUNT)
-----------
     130.00

Note that, unlike the COUNT, MIN, and MAX functions, the SUM function is applicable only to columns with a numeric data type. The former three functions can also be applied to columns with alphanumeric and temporal data types.

Example 9.30. Get the average amount of penalties incurred by player 44.

SELECT   AVG(AMOUNT)
FROM     PENALTIES
WHERE    PLAYERNO = 44

The result is:

AVG(AMOUNT)
-----------
      43.33

Explanation: The amount $43.33 is the average of the amounts $75, $25, and $30.

Example 9.31. Which players have ever incurred a penalty greater than the average penalty?

SELECT   DISTINCT PLAYERNO
FROM     PENALTIES
WHERE    AMOUNT >
        (SELECT   AVG(AMOUNT)
         FROM     PENALTIES)

The result is:

PLAYERNO
--------
       6
      27
      44

Explanation: The average penalty is $60.

Adding the word ALL does not affect the result because it simply reinforces the idea that all values are included in the calculation. On the other hand, adding DISTINCT within the AVG function does influence the result.

Example 9.32. What is the unweighted arithmetic mean of the penalty amounts? (By “unweighted,” we mean that each value is considered only once in the calculation, even when it occurs more than once.)

SELECT   AVG(DISTINCT AMOUNT)
FROM     PENALTIES

The result is:

AVG(DISTINCT AMOUNT)
--------------------
               56.00

ExplanationThe amount $56 is equal to $100 + $75 + $50 + $30 + $25 divided by 5.

Example 9.33. What is the average length (in number of characters) of the names of the players, and how long is the longest name?

SELECT   AVG(LENGTH(RTRIM(NAME))), MAX(LENGTH(RTRIM(NAME)))
FROM     PLAYERS

The result is:

AVG(LENGTH(RTRIM(NAME)))  MAX(LENGTH(RTRIM(NAME)))
------------------------  ------------------------
                  6.5000                         9

Example 9.34. For each penalty, get the payment number, the amount, and the difference between the amount and the average penalty amount.

SELECT   PAYMENTNO, AMOUNT,
         ABS(AMOUNT - (SELECT AVG(AMOUNT)
                       FROM   PENALTIES)) AS DIFFERENCE
FROM     PENALTIES AS P

The result is:

PAYMENTNO  AMOUNT  DIFFERENCE
---------  ------  ----------
        1  100.00       40.00
        2   75.00       15.00
        3  100.00       40.00
        4   50.00       10.00
        5   25.00       35.00
        6   25.00       35.00
        7   30.00       30.00
        8   75.00       15.00

ExplanationIn this example, the subquery is part of a compound expression. The result of the subquery is subtracted from the AMOUNT column, and next the absolute value of this result is calculated with the scalar function ABS.

For the SUM and AVG functions, the same rules apply as for MIN and MAX:

  • If a column in a given row contains only NULL values, the value of the function is equal to NULL.

  • If some of the values in a column are NULL, the value of the function is equal to the sum of the average of all non-NULL values divided by the number of non-NULL values (and, therefore, not divided by the total number of values).

  • If the intermediate result for which SUM or AVG must be calculated is empty, the result of the function is equal to the NULL value.

Exercise 9.13:

Determine the value of these functions for the following set of values in the NUMBER column: { 1, 2, 3, 4, 1, 4, 4, NULL, 5 }.

COUNT(*)
COUNT(NUMBER)
MIN(NUMBER)
MAX(NUMBER)
SUM(NUMBER)
AVG(NUMBER)
COUNT(DISTINCT NUMBER)
MIN(DISTINCT NUMBER)
MAX(DISTINCT NUMBER)
SUM(DISTINCT NUMBER)
AVG(DISTINCT NUMBER)

Exercise 9.14:

What is the average penalty for players who have ever competed for team 1?

Exercise 9.15:

Get the numbers and names of the players for whom the total amount of penalties is higher than 100.

Exercise 9.16:

Get the names and initials of the players who have won more sets in at least one of their matches than player 27 has won in total.

Exercise 9.17:

Get the numbers and names of the players for whom the sum of all sets won is equal to 8.

Exercise 9.18:

Get the numbers and names of the players for whom the length of their name is greater than the average length.

Exercise 9.19:

Get for each player (also those without penalties) the player number and the difference between his or her maximum and the average penalty.

Exercise 9.20:

Get for each player the average penalty amount in the form of a simple, horizontal histogram. Make use of the scalar function REPEAT.

The VARIANCE and STDDEV Functions

The VARIANCE and STDDEV functions calculate, respectively, the variance and the standard deviation of the values in a particular column. These functions are, of course, applicable only to columns with a numeric data type.

Portability

Not every SQL product supports the functions VARIANCE and STDDEV. That is why we also illustrate in this section how these values can be calculated with standard SQL.

The VARIANCE function, or the VAR function, for short, calculates the variance. Variance is a measurement that indicates how close all values are to the average. In other words, it refers to the distribution of all values. The closer each value is to the average, the lower the variance is.

Example 9.35. Get the variance of all penalties incurred by player 44.

SELECT   VARIANCE(AMOUNT)
FROM     PENALTIES
WHERE    PLAYERNO = 44

The result is:

VARIANCE(AMOUNT)
----------------
         505.555

Explanation: The variance is calculated on the basis of the following steps:

  1. Calculate the average of the column concerned.

  2. Determine for each value in the column how much the absolute value differs from the average.

  3. Calculate the sum of the squares of the differences.

  4. Divide the sum by the number of values (in the column).

If you execute these steps for the previous statement, the first step returns the answer: 43.33333, the average of the three values 75, 25, and 30. Next, for each of the three values, the difference with the average is calculated. You can determine this with the following SELECT statement:

SELECT   AMOUNT –
        (SELECT   AVG(AMOUNT)
         FROM     PENALTIES
         WHERE    PLAYERNO = 44)
FROM     PENALTIES
WHERE    PLAYERNO = 44

This gives the result: 31.666667, –18.33333, and –13.33333. You can use the following SELECT statement to calculate this intermediate result:

SELECT   SUM(P)
FROM    (SELECT   POWER(AMOUNT -
                 (SELECT   AVG(AMOUNT)
                  FROM     PENALTIES
                  WHERE    PLAYERNO = 44),2) AS P
         FROM     PENALTIES
         WHERE    PLAYERNO = 44) AS POWERS

The result is 1516.6666666667. In the final step, this amount is divided by the number of values, which gives an end result of 505.5555. To calculate all these steps without the VARIANCE function, the following statement can be used:

SELECT   SUM(P) /
        (SELECT COUNT(*) FROM PENALTIES WHERE PLAYERNO = 44)
FROM    (SELECT   POWER(AMOUNT -
                  (SELECT   AVG(AMOUNT)
                  FROM     PENALTIES
                  WHERE    PLAYERNO = 44),2) AS P
         FROM     PENALTIES
         WHERE    PLAYERNO = 44) AS POWERS

The STDDEV function calculates the standard deviation of a set of values. Standard deviation is another measure of distribution for determining how close the values are to the average. By definition, the standard deviation is equal to the square root of the variance. In other words, the following two expressions are equal: STDDEV(...) and SQRT(VARIANCE(...)).

Example 9.36. Get the standard deviation for all penalties incurred by player 44.

SELECT   STDDEV(AMOUNT)
FROM     PENALTIES
WHERE    PLAYERNO = 44

The result is:

STDDEV(AMOUNT)
--------------
     22.484563

Exercise 9.21:

Get the standard deviation of all penalties of player 44 without using the STDDEV function.

Answers

9.1

Not superfluous.

Not superfluous.

Superfluous because a condition appears on the primary key.

Not superfluous.

Not superfluous.

Not superfluous.

9.2

  1. C2
    —
    c2
    ?
    
  2. C2  C3
    — —
    c2  c3
    c2  ?
    ?   ?
    

9.3

This statement is not correct. An aggregation function is used in the SELECT clause; therefore, all other column names must appear within an aggregation function.

9.4

SELECT   COUNT(*), MAX(AMOUNT)
FROM     PENALTIES

9.5

SELECT   COUNT(DISTINCT POSITION)
FROM     COMMITTEE_MEMBERS

9.6

SELECT   COUNT(LEAGUENO)
FROM     PLAYERS
WHERE    TOWN = 'Inglewood'

9.7

SELECT   TEAMNO, DIVISION,
        (SELECT   COUNT(*)
         FROM     MATCHES
         WHERE    TEAMS.TEAMNO = MATCHES.TEAMNO)
FROM     TEAMS

9.8

SELECT   PLAYERNO, NAME,
        (SELECT   COUNT(*)
         FROM     MATCHES
         WHERE    MATCHES.PLAYERNO = PLAYERS.PLAYERNO
         AND      WON > LOST)
FROM     PLAYERS

9.9

SELECT 'Number of players' ,
        (SELECT COUNT(*) FROM PLAYERS) UNION
SELECT 'Number of teams',
        (SELECT COUNT(*) FROM TEAMS) UNION
SELECT 'Number of matches',
        (SELECT COUNT(*) FROM MATCHES)

9.10

SELECT   MIN(WON)
FROM     MATCHES
WHERE    WON > LOST

9.11

SELECT   PLAYERNO,
         ABS((SELECT   MIN(AMOUNT)
              FROM     PENALTIES
              WHERE    PENALTIES.PLAYERNO =
                       PLAYERS.PLAYERNO) -
             (SELECT   MAX(AMOUNT)
              FROM     PENALTIES
              WHERE    PENALTIES.PLAYERNO =
                       PLAYERS.PLAYERNO))
FROM     PLAYERS

9.12

SELECT   PLAYERNO, BIRTH_DATE
FROM     PLAYERS
WHERE    YEAR(BIRTH_DATE) =
        (SELECT   MAX(YEAR(BIRTH_DATE))
         FROM     PLAYERS
         WHERE    PLAYERNO IN
                 (SELECT   PLAYERNO
                  FROM     MATCHES
                  WHERE    TEAMNO = 1))

9.13

9

8

1

5

24

3

5

1

5

15

15/ 5 = 3

9.14

SELECT   AVG(AMOUNT)
FROM     PENALTIES
WHERE    PLAYERNO IN
        (SELECT   PLAYERNO
         FROM     MATCHES
         WHERE    TEAMNO = 1)

9.15

SELECT   PLAYERNO, NAME
FROM     PLAYERS
WHERE   (SELECT   SUM(AMOUNT)
         FROM     PENALTIES
         WHERE    PENALTIES.PLAYERNO = PLAYERS.PLAYERNO)
         > 100

9.16

SELECT   NAME, INITIALS
FROM     PLAYERS
WHERE    PLAYERNO IN
        (SELECT   PLAYERNO
         FROM     MATCHES
         WHERE    WON >
                 (SELECT   SUM(WON)
                  FROM     MATCHES
                  WHERE    PLAYERNO = 27))

9.17

SELECT   PLAYERNO, NAME
FROM     PLAYERS
WHERE   (SELECT   SUM(WON)
         FROM     MATCHES
         WHERE    MATCHES.PLAYERNO =
                  PLAYERS.PLAYERNO) = 8

9.18

SELECT   PLAYERNO, NAME
FROM     PLAYERS
WHERE    LENGTH(RTRIM(NAME)) >
        (SELECT   AVG(LENGTH(RTRIM(NAME)))
         FROM     PLAYERS)

9.19

SELECT   PLAYERNO,
         ABS((SELECT   AVG(AMOUNT)
              FROM     PENALTIES
              WHERE    PENALTIES.PLAYERNO =
                       PLAYERS.PLAYERNO) -
             (SELECT   MAX(AMOUNT)
              FROM     PENALTIES
              WHERE    PENALTIES.PLAYERNO =
                       PLAYERS.PLAYERNO))
FROM     PLAYERS

9.20

SELECT   PLAYERNO,
         REPEAT('*',
            CAST((SELECT   AVG(AMOUNT)
                  FROM     PENALTIES
                  WHERE    PENALTIES.PLAYERNO =
                           PLAYERS.PLAYERNO)/10
                  AS SIGNED INTEGER))
FROM     PLAYERS

9.21

SELECT   SQRT(SUM(P) /
        (SELECT COUNT(*) FROM PENALTIES WHERE
                         PLAYERNO = 44))
FROM    (SELECT   POWER(AMOUNT -
                  (SELECT   AVG(AMOUNT)
                  FROM      PENALTIES
                  WHERE     PLAYERNO = 44),2) AS P
         FROM     PENALTIES
         WHERE    PLAYERNO = 44) AS POWERS
..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
3.16.130.201