The WHERE
clause described in the previous chapter selects rows. The intermediate result from this clause forms a horizontal subset of a table. In contrast, the SELECT
clause selects only columns and not rows; the result forms a vertical subset of a table.
The features, limitations, and use of the SELECT
clause depend on the presence or absence of a GROUP BY
clause. This chapter discusses table expressions without a GROUP BY
clause. In Chapter 10, “SELECT
Statement: The GROUP BY
Clause,” which concentrates on the GROUP BY
clause, we discuss the features of the SELECT
clause when the table expression does contain a GROUP BY
clause.
A large part of this chapter is devoted to so-called aggregation functions. In Chapter 5, “SELECT
Statement: Common Elements,” we referred to these functions but did not explore them in any depth.
<select clause> ::= SELECT [ DISTINCT | ALL ] <select element list> <select element list> ::= <select element> [ { , <select element> }... ] | * <select element> ::= <scalar expression> [[ AS ] <column name> ] | <table specification>.* | <pseudonym>.* <column name> ::= <name> |
The shortest SELECT
clause is the one in which only an asterisk (*
) is specified. This asterisk is a shorthand notation for all columns in each table mentioned in the FROM
clause. Two equivalent SELECT
statements are in the following example:
Example 9.1. Get the entire PENALTIES
table.
SELECT * FROM PENALTIES
and
SELECT PAYMENTNO, PLAYERNO, PAYMENT_DATE, AMOUNT FROM PENALTIES
Explanation: The * symbol, then, does not mean multiplication in this context.
When a FROM
clause contains two or more tables, it is sometimes necessary to use a table specification in front of the *
symbol to clarify which columns should be presented.
Example 9.2. Get all the information on all the penalties incurred by players who are also captains.
The following three statements are equivalent:
SELECT PENALTIES.* FROM PENALTIES INNER JOIN TEAMS ON PENALTIES.PLAYERNO = TEAMS.PLAYERNO SELECT PENALTIES.PAYMENTNO, PENALTIES.PLAYERNO, PENALTIES.PAYMENT_DATE, PENALTIES.AMOUNT FROM PENALTIES INNER JOIN TEAMS ON PENALTIES.PLAYERNO = TEAMS.PLAYERNO SELECT PEN.* FROM PENALTIES AS PEN INNER JOIN TEAMS ON PEN.PLAYERNO = TEAMS.PLAYERNO
The result is:
PAYMENTNO PLAYERNO PAYMENT_DATE AMOUNT --------- -------- ------------ ------ 1 6 1980-12-08 100.00 3 27 1983-09-10 100.00 8 27 1984-11-12 75.00
In processing the SELECT
clause, the intermediate result is evaluated row by row. Each expression gives rise to a value in each result row. Most of the examples of the SELECT
clause described so far contain only column names, but an expression can also take the form of a literal, a calculation, or a scalar function.
Example 9.3. For each match, get the match number, the word Tally
, the difference between the columns WON
and LOST
and the value of the WON
column multiplied by 10.
SELECT MATCHNO, 'Tally', WON - LOST, WON * 10 FROM MATCHES
The result is:
MATCHNO TALLY WON - LOST WON * 10 ------- ----- ---------- -------- 1 Tally 2 30 2 Tally -1 20 3 Tally 3 30 4 Tally 1 30 5 Tally -3 0 6 Tally -2 10 7 Tally 3 30 8 Tally -3 0 9 Tally 1 30 10 Tally 1 30 11 Tally -1 20 12 Tally -2 10 13 Tally -3 0
A SELECT
clause can consist of a number of expressions preceded by the word DISTINCT
. (See the definition at the beginning of this chapter.) When DISTINCT
is specified, SQL removes duplicate rows from the intermediate result.
Example 9.4. Find all the different town names from the PLAYERS
table.
SELECT TOWN FROM PLAYERS
The result is:
TOWN --------- Stratford Stratford Stratford Inglewood Eltham Midhurst Stratford Inglewood Stratford Stratford Douglas Stratford Eltham Plymouth
In this result table, the towns Stratford, Inglewood, and Eltham appear seven, two, and two times, respectively. If the statement is expanded to include DISTINCT
:
SELECT DISTINCT TOWN FROM PLAYERS
it produces the following result, in which all duplicate rows are removed:
TOWN --------- Stratford Midhurst Inglewood Plymouth Douglas Eltham
Example 9.5. Get every existing combination of street and town names.
SELECT STREET, TOWN FROM PLAYERS
The result is:
STREET TOWN -------------- --------- Stoney Road Stratford Haseltine Lane Stratford Edgecombe Way Stratford Station Road Inglewood Long Drive Eltham Old Main Road Midhurst Eaton Square Stratford Lewis Street Inglewood Edgecombe Way Stratford Magdalene Road Stratford High Street Douglas Haseltine Lane Stratford Stout Street Eltham Vixen Road Plymouth
This result also contains duplicate rows; for example, Edgecombe Way and Haseltine Lane in Stratford are each mentioned twice. When DISTINCT
is added:
SELECT DISTINCT STREET, TOWN FROM PLAYERS
the result is:
STREET TOWN -------------- --------- Edgecombe Way Stratford Eaton Square Stratford Haseltine Lane Stratford High Street Douglas Lewis Street Inglewood Long Drive Eltham Magdalena Road Stratford Old Main Road Midhurst Station Road Inglewood Stoney Road Stratford Stout Street Eltham Vixen Road Plymouth
DISTINCT
, then, is concerned with the whole row, and not only with the expression that directly follows the word DISTINCT
in the statement. In these two constructs, the use of DISTINCT
is superfluous (but not forbidden):
When the SELECT
clause includes at least one candidate key for each table specified in the FROM
clause, DISTINCT
is superfluous. The most important property of a candidate key is that the set of columns that forms the candidate key never allows duplicate values, so a table that has a candidate key never has duplicate rows. The inclusion of a candidate key in the SELECT
clause offers a guarantee that no duplicate rows will appear in the end result.
When the table expression results in none or only one row with values, DISTINCT
is superfluous. For equal rows, at least two rows are necessary. For example, if you are looking for players with a certain player number (WHERE PLAYERNO = 45
), the statement results in one row if that player number exists, and otherwise no rows.
The user is allowed to specify the word ALL
in the same position in the statement as where DISTINCT
appears. Note that ALL
actually has the opposite effect to DISTINCT
and does not alter the result of a “normal” table expression. In other words, the results of the following two statements are equivalent:
SELECT TOWN FROM PLAYERS
and
SELECT ALL TOWN FROM PLAYERS
When are two rows identical or equal? At first sight, this seems a trivial question, but are two rows still equal when one of the values is equal to the NULL
value? We answer these two questions somewhat formally.
Imagine that two rows, R1 and R2, both consist of n values wi (1 <= i <= n). These two rows R1 and R2 are equal under the following conditions:
The number of values in the rows is equal.
For each i (1 <= i <= n), it holds that R1wi is equal to R2wi or that R1wi and R2wi are both equal to the NULL
value.
This means that if, for example, the value R1w3 is equal to the NULL
value and R2w3 is not, the rows R1 and R2 cannot be equal (regardless of the other values). However, if both R1w3 and R2w3 are equal to the NULL
value, they could be equal.
Example 9.6. Get all the different league numbers.
SELECT DISTINCT LEAGUENO FROM PLAYERS
The result is:
LEAGUENO -------- 1124 1319 1608 2411 2513 2983 6409 6524 7060 8467 ?
Explanation. The NULL
value appears only once in the result because rows that consist of only a NULL
value are equal to each other.
This rule does not seem to be in line with the rules described in Section 8.2, in Chapter 8, “SELECT
Statement: The WHERE
Clause.” There, we stated that two NULL
values are not equal to each other. Also, when comparing row expressions, two NULL
values are not considered to be equal or unequal. For example, the next two conditions both evaluate to unknown.
NULL = 4 (1, NULL) = (1, NULL)
Informally, we could say that SQL executes a horizontal comparison with conditions. The values that must be compared are, besides each other, or to the left or right of the comparison operator. And that is the difference with DISTINCT
. We could state that DISTINCT
rows “underneath” each other in the intermediate result are compared, instead of rows that are “next to” each other. In other words, with DISTINCT
, a vertical comparison takes place. In that case, NULL
values are equal to each other. Imagine the intermediate result of a certain table expression that looks as follows:
(1, NULL) (1, NULL)
Two rows are compared vertically when processing DISTINCT
. In the end result, only one of the two rows is left. This rule might look somewhat strange, but it is in accordance with the rules of the original relational model.
Example 9.7. Determine which rows will be deleted by DISTINCT
.
SELECT DISTINCT * FROM (SELECT 1 AS A, 'Hello' AS B, 4 AS C UNION SELECT 1, 'Hello', NULL UNION SELECT 1, 'Hello', NULL UNION SELECT 1, NULL, NULL) AS X
The result is:
A B C - ----- - 1 Hello 4 1 Hello ? 1 ? ?
Expressions in the SELECT
clause can contain so-called aggregation functions (also called statistical, group, set, or column functions). If the table expression has no GROUP BY
clause, an aggregation function in a SELECT
clause operates on all rows. If a SELECT
clause does contain an aggregation function, the entire table expression yields only one row as an end result. (Remember, we are still assuming here that the table expression has no GROUP BY
clause.) In fact, the values of a group rows are aggregated to one value. For example, all penalty amounts in the PENALTIES
table are added up to one value with the SUM
function.
<aggregation function> ::= COUNT ( [ DISTINCT | ALL ] { * | <expression> } ) | MIN ( [ DISTINCT | ALL ] <expression> ) | MAX ( [ DISTINCT | ALL ] <expression> ) | SUM ( [ DISTINCT | ALL ] <expression> ) | AVG ( [ DISTINCT | ALL ] <expression> ) | STDDEV ( [ DISTINCT | ALL ] <expression> ) | VARIANCE ( [ DISTINCT | ALL ] <expression> ) |
Example 9.8. How many players are registered in the PLAYERS
table?
SELECT COUNT(*) FROM PLAYERS
The result is:
COUNT(*) -------- 14
Explanation. The function COUNT
(*) counts the number of rows that remain after processing the FROM
clause. In this case, the number equals the number of rows in the PLAYERS
table.
Example 9.9. How many players live in Stratford?
SELECT COUNT(*) FROM PLAYERS WHERE TOWN = 'Stratford'
The result is:
COUNT(*) -------- 7
Explanation: Because the SELECT
clause is processed after the WHERE
clause, the number of rows in which the TOWN
column has the value Stratford is counted.
We look at various aggregation functions in more detail in the following sections. Several general rules apply to the use of aggregation functions. These rules apply when the concerning table expression contains no GROUP BY
clause.
A table expression with an aggregation function yields only one row as a result. This can be a row consisting of only NULL
values, but there is always one row. The result can never consist of zero rows or more than one row.
It is not allowed to nest aggregation functions. Several expression forms can be used as parameters for an aggregation function but not an aggregation function itself. Therefore, this expression is not allowed: COUNT(MAX(...))
.
If the SELECT
clause contains one or more aggregation functions, a column specification in the SELECT
clause can occur only within an aggregation function.
The last rule requires some explanation. According to this rule, the following statement is not correct because the SELECT
clause contains an aggregation function as an expression, while the column name PLAYERNO
occurs outside an aggregation function.
SELECT COUNT(*), PLAYERNO FROM PLAYERS
The reason for this limitation is that the result of an aggregation function always consists of one value, while the result of a column specification consists of a set of values. SQL considers this to be incompatible results.
Note, however, that this rule applies only to column specifications and not to, for example, literals and system variables. Therefore, the following statement is correct:
SELECT 'The number of players', COUNT(*) FROM PLAYERS
The result is:
'The number of players is' COUNT(*) -------------------------- -------- The number of players is 14
In Chapter 10, we extend these rules for the SELECT
clause for table expressions that do contain a GROUP BY
clause.
With the COUNT
function, an asterisk (*
) or an expression can be specified between brackets. The first case in which an asterisk is used was discussed in the previous section. In this section, we discuss the other possibilities.
Example 9.10. How many league numbers are there?
SELECT COUNT(LEAGUENO) FROM PLAYERS
The result is:
COUNT(LEAGUENO) --------------- 10
Explanation. The function COUNT(LEAGUENO)
is used to count the number of non-NULL
values in the LEAGUENO
column instead of the number of rows in the intermediate result. So, the result is 10
and not 14
(the number of non-NULL and all values in the column, respectively).
Specifying does not change the result of the query. This applies to all the aggregation functions. Therefore, the previous statement could have been written as follows:
SELECT COUNT(ALL LEAGUENO) FROM PLAYERS
The COUNT
function can also be used to calculate the number of different values in a column.
Example 9.11. How many different town names are there in the TOWN
column?
SELECT COUNT(DISTINCT TOWN) FROM PLAYERS
The result is:
COUNT(DISTINCT TOWN) -------------------- 6
Explanation: When DISTINCT
is specified in front of the column name, all the duplicate values are removed first and then the addition is carried out.
Example 9.12. Get the number of different characters that start the names of the players.
SELECT COUNT(DISTINCT SUBSTR(NAME, 1, 1)) FROM PLAYERS
The result is:
COUNT(DISTINCT SUBSTR(NAME, 1, 1)) ---------------------------------- 8
Explanation: This example shows clearly that all kinds of expression forms can be used within aggregation functions, including scalar functions (see Appendix B, “Scalar Functions,” for a description of the SUBSTR
function).
Example 9.13. Get the number of different years that appear in the PENALTIES
table.
SELECT COUNT(DISTINCT YEAR(PAYMENT_DATE)) FROM PENALTIES
The result is:
COUNT(DISTINCT YEAR(PAYMENT_DATE)) ---------------------------------- 5
Example 9.14. Get the number of different town names and the number of sexes represented.
SELECT COUNT(DISTINCT TOWN), COUNT(DISTINCT SEX) FROM PLAYERS
The result is:
COUNT(DISTINCT TOWN) COUNT(DISTINCT SEX) -------------------- ------------------- 6 2
Explanation. More than one aggregation function can be specified in a SELECT
clause.
Example 9.15. Get the numbers and names of players who incurred more penalties than they played matches.
SELECT PLAYERNO, NAME FROM PLAYERS AS P WHERE (SELECT COUNT(*) FROM PENALTIES AS PEN WHERE P.PLAYERNO = PEN.PLAYERNO) > (SELECT COUNT(*) FROM MATCHES AS M WHERE P.PLAYERNO = M.PLAYERNO)
The result is:
PLAYERNO NAME -------- ------- 27 Collins 44 Baker
Explanation: Aggregation functions can appear in the SELECT
clause of each table expression, including subqueries.
Example 9.16. For each player, find the player number, the name, and the number of penalties incurred by him or her, but only for players who have at least two penalties.
SELECT PLAYERNO, NAME, (SELECT COUNT(*) FROM PENALTIES WHERE PENALTIES.PLAYERNO = PLAYERS.PLAYERNO) AS NUMBER FROM PLAYERS WHERE (SELECT COUNT(*) FROM PENALTIES WHERE PENALTIES.PLAYERNO = PLAYERS.PLAYERNO) >= 2
The result is:
PLAYERNO NAME NUMBER -------- ------- ------ 27 Collins 2 44 Baker 3
Explanation. The correlated subquery in the SELECT
clause calculates the number of penalties for each player. That same subquery checks whether that number is greater than 1
.
This statement can also be formulated in a more compact way by placing the subquery within the FROM
clause:
SELECT PLAYERNO, NAME, NUMBER FROM (SELECT PLAYERNO, NAME, (SELECT COUNT(*) FROM PENALTIES WHERE PENALTIES.PLAYERNO = PLAYERS.PLAYERNO) AS NUMBER FROM PLAYERS) AS PN WHERE NUMBER >= 2
Explanation: The subquery in the FROM
clause determines the number, the name, and the number of penalties for each player. Next, this number becomes a column in the intermediate result. After that, a condition can be specified (NUMBER>= 2
); finally, the value of that column in the SELECT
clause is retrieved.
With the MAX
and MIN
functions, you can determine the largest and smallest values, respectively, in a column.
Example 9.18. What is the highest penalty?
SELECT MAX(AMOUNT) FROM PENALTIES
The result is:
MAX(AMOUNT) ----------- 100.00
Example 9.19. What is the lowest penalty incurred by a player resident in Stratford?
SELECT MIN(AMOUNT) FROM PENALTIES WHERE PLAYERNO IN (SELECT PLAYERNO FROM PLAYERS WHERE TOWN = 'Stratford')
The result is:
MIN(AMOUNT) ----------- 100.00
Example 9.20. How many penalties are equal to the lowest one?
SELECT COUNT(*) FROM PENALTIES WHERE AMOUNT = (SELECT MIN(AMOUNT) FROM PENALTIES)
The result is:
COUNT(AMOUNT) ------------- 2
Explanation. The subquery calculates the lowest penalty, which is $25. The SELECT
statement calculates the number of penalties equal to the amount of this lowest penalty.
Example 9.21. For each team, find the team number followed by the player number of the player who has won the most matches for that team.
SELECT DISTINCT TEAMNO, PLAYERNO FROM MATCHES AS M1 WHERE WON = (SELECT MAX(WON) FROM MATCHES AS M2 WHERE M1.TEAMNO = M2.TEAMNO)
The result is:
TEAMNO PLAYERNO ------ -------- 1 6 1 44 1 57 2 27 2 104
Explanation: In the result, more than one player appears for each team because several players won a match in three sets.
Aggregation functions can occur in calculations. Here are two examples.
Example 9.22. What is the difference between the highest and lowest penalty in cents?
SELECT (MAX(AMOUNT) - MIN(AMOUNT)) * 100 FROM PENALTIES
The result is:
(MAX(AMOUNT) - MIN(AMOUNT)) * 100 --------------------------------- 7500.00
Example 9.23. Get the first letter of the last name of all players, alphabetically.
SELECT SUBSTR(MAX(NAME), 1, 1) FROM PLAYERS
The result is:
SUBSTR(MAX(NAME), 1, 1) ----------------------- W
Explanation: First, the MAX
function finds the last name in alphabetical order, and then the scalar function SUBSTR
picks out the first letter from this name. See Appendix B for a description of this and other functions.
In principle, DISTINCT
can be used with the MAX
and MIN
functions, but this, of course, does not change the end result (work out why for yourself).
When MAX
and MIN
functions are processed, two special situations must be taken into consideration:
If a column in a given row contains only NULL
values, the values of the MIN
and MAX
functions are also NULL
.
If the MIN
and MAX
functions are executed on an empty intermediate result, the value of these functions is also NULL
.
Here is an example of each.
Example 9.24. What is the highest league number of all players from Midhurst?
SELECT MAX(LEAGUENO) FROM PLAYERS WHERE TOWN = 'Midhurst'
The result is:
MAX(LEAGUENO) ------------- ?
Explanation: The PLAYERS
table contains only one player from Midhurst, and she has no league number. That is why the answer of this statement has only one row consisting of the NULL
value.
Example 9.25. What is the lowest league number of all players from Amsterdam? If a player does not exist, print the text Unknown
.
SELECT CASE WHEN MIN(LEAGUENO) IS NULL THEN 'Unknown' ELSE MIN(LEAGUENO) END FROM PLAYERS WHERE TOWN = 'Amsterdam'
The result is:
CASE WHEN ... ------------- Unknown
Example 9.26. For each player who incurred at least one penalty, find the player number, the highest penalty, and the date on which that penalty was paid.
SELECT PLAYERNO, AMOUNT, PAYMENT_DATE FROM PENALTIES AS PEN1 WHERE AMOUNT = (SELECT MAX(AMOUNT) FROM PENALTIES AS PEN2 WHERE PEN2.PLAYERNO = PEN1.PLAYERNO)
The result is:
PLAYERNO AMOUNT PAYMENT_DATE -------- ------ ------------ 6 100.00 1980-12-08 8 25.00 1980-12-08 27 100.00 1983-09-10 44 75.00 1981-05-05 104 50.00 1984-12-08
Example 9.27. For each player, get the player number, the highest penalty amount that was paid for him or her, and the highest number of sets won in a match.
SELECT PLAYERNO, (SELECT MAX(AMOUNT) FROM PENALTIES WHERE PENALTIES.PLAYERNO = PLAYERS.PLAYERNO) AS HIGHESTPENALTY, (SELECT MAX(WON) FROM MATCHES WHERE MATCHES.PLAYERNO = PLAYERS.PLAYERNO) AS NUMBEROFSETS FROM PLAYERS
The result is:
PLAYERNO HIGHESTPENALTY NUMBEROFSETS -------- -------------- ------------ 2 ? 1 6 100.00 3 7 ? ? 8 25.00 0 27 100.00 3 28 ? ? 39 ? ? 44 75.00 3 57 ? 3 83 ? 0 95 ? ? 100 ? ? 104 50.00 3 112 ? 2
Explanation. The two correlated subqueries are processed for each player separately. When no rows are found, the subquery returns a NULL
value.
Example 9.28. Get the number of each player whose lowest penalty amount is equal to his or her highest penalty amount.
SELECT PLAYERNO FROM PLAYERS WHERE (SELECT MIN(AMOUNT) FROM PENALTIES WHERE PENALTIES.PLAYERNO = PLAYERS.PLAYERNO) = (SELECT MAX(AMOUNT) FROM PENALTIES WHERE PENALTIES.PLAYERNO = PLAYERS.PLAYERNO)
The result is:
PLAYERNO -------- 6 8 104
The SUM
function calculates the sum of all values in a particular column. The AVG
function calculates the arithmetic average of the values in a particular column. Both functions are, of course, applicable only to columns with a numeric data type.
Example 9.29. What is the total amount of penalties incurred by players from Inglewood?
SELECT SUM(AMOUNT) FROM PENALTIES WHERE PLAYERNO IN (SELECT PLAYERNO FROM PLAYERS WHERE TOWN = 'Inglewood')
The result is:
SUM(AMOUNT) ----------- 155.00
You can specify the word ALL
in front of the column name without affecting the result. By adding ALL
, you explicitly demand that all values are considered. In contrast, the use of DISTINCT
within the SUM
function can alter the end result. If you extend the SUM
function in the previous SELECT
statement with DISTINCT
, you get the following result:
SELECT SUM(DISTINCT AMOUNT) FROM PENALTIES WHERE PLAYERNO IN (SELECT PLAYERNO FROM PLAYERS WHERE TOWN = 'Inglewood')
The result is:
SUM(AMOUNT) ----------- 130.00
Note that, unlike the COUNT
, MIN
, and MAX
functions, the SUM
function is applicable only to columns with a numeric data type. The former three functions can also be applied to columns with alphanumeric and temporal data types.
Example 9.30. Get the average amount of penalties incurred by player 44.
SELECT AVG(AMOUNT) FROM PENALTIES WHERE PLAYERNO = 44
The result is:
AVG(AMOUNT) ----------- 43.33
Explanation: The amount $43.33 is the average of the amounts $75, $25, and $30.
Example 9.31. Which players have ever incurred a penalty greater than the average penalty?
SELECT DISTINCT PLAYERNO FROM PENALTIES WHERE AMOUNT > (SELECT AVG(AMOUNT) FROM PENALTIES)
The result is:
PLAYERNO -------- 6 27 44
Explanation: The average penalty is $60.
Adding the word ALL
does not affect the result because it simply reinforces the idea that all values are included in the calculation. On the other hand, adding DISTINCT
within the AVG
function does influence the result.
Example 9.32. What is the unweighted arithmetic mean of the penalty amounts? (By “unweighted,” we mean that each value is considered only once in the calculation, even when it occurs more than once.)
SELECT AVG(DISTINCT AMOUNT) FROM PENALTIES
The result is:
AVG(DISTINCT AMOUNT) -------------------- 56.00
Explanation. The amount $56 is equal to $100 + $75 + $50 + $30 + $25 divided by 5.
Example 9.33. What is the average length (in number of characters) of the names of the players, and how long is the longest name?
SELECT AVG(LENGTH(RTRIM(NAME))), MAX(LENGTH(RTRIM(NAME))) FROM PLAYERS
The result is:
AVG(LENGTH(RTRIM(NAME))) MAX(LENGTH(RTRIM(NAME))) ------------------------ ------------------------ 6.5000 9
Example 9.34. For each penalty, get the payment number, the amount, and the difference between the amount and the average penalty amount.
SELECT PAYMENTNO, AMOUNT, ABS(AMOUNT - (SELECT AVG(AMOUNT) FROM PENALTIES)) AS DIFFERENCE FROM PENALTIES AS P
The result is:
PAYMENTNO AMOUNT DIFFERENCE --------- ------ ---------- 1 100.00 40.00 2 75.00 15.00 3 100.00 40.00 4 50.00 10.00 5 25.00 35.00 6 25.00 35.00 7 30.00 30.00 8 75.00 15.00
Explanation. In this example, the subquery is part of a compound expression. The result of the subquery is subtracted from the AMOUNT
column, and next the absolute value of this result is calculated with the scalar function ABS
.
For the SUM
and AVG
functions, the same rules apply as for MIN
and MAX
:
If a column in a given row contains only NULL
values, the value of the function is equal to NULL
.
If some of the values in a column are NULL
, the value of the function is equal to the sum of the average of all non-NULL
values divided by the number of non-NULL
values (and, therefore, not divided by the total number of values).
If the intermediate result for which SUM
or AVG
must be calculated is empty, the result of the function is equal to the NULL
value.
The VARIANCE
and STDDEV
functions calculate, respectively, the variance and the standard deviation of the values in a particular column. These functions are, of course, applicable only to columns with a numeric data type.
Not every SQL product supports the functions VARIANCE
and STDDEV
. That is why we also illustrate in this section how these values can be calculated with standard SQL.
The VARIANCE
function, or the VAR
function, for short, calculates the variance. Variance is a measurement that indicates how close all values are to the average. In other words, it refers to the distribution of all values. The closer each value is to the average, the lower the variance is.
Example 9.35. Get the variance of all penalties incurred by player 44.
SELECT VARIANCE(AMOUNT) FROM PENALTIES WHERE PLAYERNO = 44
The result is:
VARIANCE(AMOUNT) ---------------- 505.555
Explanation: The variance is calculated on the basis of the following steps:
Calculate the average of the column concerned.
Determine for each value in the column how much the absolute value differs from the average.
Calculate the sum of the squares of the differences.
Divide the sum by the number of values (in the column).
If you execute these steps for the previous statement, the first step returns the answer: 43.33333, the average of the three values 75, 25, and 30. Next, for each of the three values, the difference with the average is calculated. You can determine this with the following SELECT
statement:
SELECT AMOUNT – (SELECT AVG(AMOUNT) FROM PENALTIES WHERE PLAYERNO = 44) FROM PENALTIES WHERE PLAYERNO = 44
This gives the result: 31.666667, –18.33333, and –13.33333. You can use the following SELECT
statement to calculate this intermediate result:
SELECT SUM(P) FROM (SELECT POWER(AMOUNT - (SELECT AVG(AMOUNT) FROM PENALTIES WHERE PLAYERNO = 44),2) AS P FROM PENALTIES WHERE PLAYERNO = 44) AS POWERS
The result is 1516.6666666667. In the final step, this amount is divided by the number of values, which gives an end result of 505.5555. To calculate all these steps without the VARIANCE
function, the following statement can be used:
SELECT SUM(P) / (SELECT COUNT(*) FROM PENALTIES WHERE PLAYERNO = 44) FROM (SELECT POWER(AMOUNT - (SELECT AVG(AMOUNT) FROM PENALTIES WHERE PLAYERNO = 44),2) AS P FROM PENALTIES WHERE PLAYERNO = 44) AS POWERS
The STDDEV
function calculates the standard deviation of a set of values. Standard deviation is another measure of distribution for determining how close the values are to the average. By definition, the standard deviation is equal to the square root of the variance. In other words, the following two expressions are equal: STDDEV(...)
and SQRT(VARIANCE(...))
.
Example 9.36. Get the standard deviation for all penalties incurred by player 44.
SELECT STDDEV(AMOUNT) FROM PENALTIES WHERE PLAYERNO = 44
The result is:
STDDEV(AMOUNT) -------------- 22.484563
Not superfluous. Not superfluous. Superfluous because a condition appears on the primary key. Not superfluous. Not superfluous. Not superfluous. |
3.16.130.201