• What functions are
• How functions are used
• When to use functions
• Using aggregate functions
• Summarizing data with aggregate functions
• Results from using functions
In this hour, you learn about SQL’s aggregate functions. Aggregate functions let you perform a variety of useful tasks, such as finding the highest total of a sale or counting the number of orders processed on a given day. The real power of aggregate functions is discussed in the next hour, when we tackle the GROUP BY clause.
Functions are keywords in SQL used to manipulate values within columns for output purposes. A function is a command normally used in conjunction with a column name or expression that processes the incoming data to produce a result. SQL contains several types of functions. This hour covers aggregate functions. An aggregate function provides summarization information for an SQL statement, such as counts, totals, and averages.
The basic aggregate functions discussed in this hour are
• COUNT
• SUM
• MAX
• MIN
• AVG
The following queries show the data used for most of this hour’s examples:
SELECT * FROM PRODUCTS_TBL;
PROD_ID    PROD_DESC                  COST
------------------------------------------------
11235      WITCH COSTUME              29.99
222        PLASTIC PUMPKIN 18 INCH    7.75
13         FALSE PARAFFIN TEETH       1.1
90         LIGHTED LANTERNS           14.5
15         ASSORTED COSTUMES          10
9          CANDY CORN                 1.35
6          PUMPKIN CANDY              1.45
87         PLASTIC SPIDERS            1.05
119        ASSORTED MASKS             4.95
1234       KEY CHAIN                  5.95
2345       OAK BOOKSHELF              59.99
11 rows selected.
The following query lists the employee information from the EMPLOYEE_TBL table. Note that some of the employees do not have pager numbers assigned.
SELECT EMP_ID, LAST_NAME, FIRST_NAME, PAGER
FROM EMPLOYEE_TBL;
EMP_ID     LAST_NAM  FIRST_NA  PAGER
--------------------------------------
311549902  STEPHENS  TINA
442346889  PLEW      LINDA
213764555  GLASS     BRANDON   3175709980
313782439  GLASS     JACOB     8887345678
220984332  WALLACE   MARIAH
443679012  SPURGEON  TIFFANY
6 rows selected.
You use the COUNT function to count rows or the values of a column that do not contain a NULL value. When used within a query, the COUNT function returns a numeric value. You can also use the COUNT function with the DISTINCT command to count only the distinct values of a column. ALL (the opposite of DISTINCT) is the default; it is not necessary to include ALL in the syntax. Duplicate values are counted if DISTINCT is not specified. One other option with the COUNT function is to use it with an asterisk. COUNT(*) counts all the rows of a table, including duplicates, whether or not a column contains a NULL value.
You cannot use the DISTINCT command with COUNT(*), only with COUNT(column_name).
The syntax for the COUNT function is as follows:
COUNT ( * | [ DISTINCT | ALL ] COLUMN NAME )
This example counts all employee IDs:
SELECT COUNT(EMPLOYEE_ID) FROM EMPLOYEE_PAY_TBL
This example counts only the distinct values of SALARY:
SELECT COUNT(DISTINCT SALARY) FROM EMPLOYEE_PAY_TBL
This example counts all non-NULL values of SALARY, including duplicates:
SELECT COUNT(ALL SALARY) FROM EMPLOYEE_PAY_TBL
This final example counts all rows of the EMPLOYEE_TBL table:
SELECT COUNT(*) FROM EMPLOYEE_TBL
COUNT(*) is used in the following example to get a count of all records in the EMPLOYEE_TBL table. There are six employees.
SELECT COUNT(*)
FROM EMPLOYEE_TBL;
COUNT(*)
---------
6
COUNT(*) produces slightly different results than the other COUNT variations. When the COUNT function is used with the asterisk, it counts every row in the returned result set, including duplicates and rows that contain NULL values. This is an important distinction. If you need your query to return a count of a particular field and include NULLs, you need to use a function such as ISNULL (or the standard COALESCE, where ISNULL is not available) to replace the NULL values.
COUNT(EMP_ID) is used in the next example to get a count of all the employee identification numbers that exist in the table. The returned count is the same as the last query because all employees have an identification number.
SELECT COUNT(EMP_ID)
FROM EMPLOYEE_TBL;
COUNT(EMP_ID)
-------------
6
COUNT(PAGER) is used in the following example to get a count of all the employee records that have a pager number. Only two employees have pager numbers.
SELECT COUNT(PAGER)
FROM EMPLOYEE_TBL;
COUNT(PAGER)
------------
2
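The three COUNT variations discussed above can be compared side by side. The following is a minimal sketch, not part of the original listings: it assumes an in-memory SQLite database driven from Python's standard sqlite3 module, assumes that employees without a pager are stored as NULL, and uses COALESCE with the placeholder text 'NONE' (an illustrative choice) to make NULL pagers countable.

```python
import sqlite3

# In-memory SQLite database; rows mirror the EMPLOYEE_TBL listing above.
# Assumption: employees without a pager store NULL (None in Python).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE EMPLOYEE_TBL ("
    "EMP_ID TEXT, LAST_NAME TEXT, FIRST_NAME TEXT, PAGER TEXT)"
)
conn.executemany(
    "INSERT INTO EMPLOYEE_TBL VALUES (?, ?, ?, ?)",
    [
        ("311549902", "STEPHENS", "TINA", None),
        ("442346889", "PLEW", "LINDA", None),
        ("213764555", "GLASS", "BRANDON", "3175709980"),
        ("313782439", "GLASS", "JACOB", "8887345678"),
        ("220984332", "WALLACE", "MARIAH", None),
        ("443679012", "SPURGEON", "TIFFANY", None),
    ],
)

count_rows, count_pagers, count_filled = conn.execute(
    """
    SELECT COUNT(*),                        -- every row
           COUNT(PAGER),                    -- only non-NULL pagers
           COUNT(COALESCE(PAGER, 'NONE'))   -- NULLs replaced, so every row counts
    FROM EMPLOYEE_TBL
    """
).fetchone()

print(count_rows, count_pagers, count_filled)
conn.close()
```

Replacing NULLs before counting gives the same figure as COUNT(*) here, which is why COUNT(*) is usually the simpler choice when you only need the number of rows.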
The ORDERS_TBL table is shown next:
SELECT *
FROM ORDERS_TBL;
ORD_NUM  CUST_ID  PROD_ID  QTY  ORD_DATE_
-----------------------------------------------------
56A901   232      11235    1    22-OCT-99
56A917   12       907      100  30-SEP-99
32A132   43       222      25   10-OCT-99
16C17    090      222      2    17-OCT-99
18D778   287      90       10   17-OCT-99
23E934   432      13       20   15-OCT-99
90C461   560      1234     2
7 rows selected.
This example obtains a count of all distinct product identifications in the ORDERS_TBL table.
SELECT COUNT(DISTINCT PROD_ID )
FROM ORDERS_TBL;
COUNT(DISTINCT PROD_ID )
------------------------
6
The PROD_ID 222 has two entries in the table, thus reducing the distinct values from 7 to 6.
Because the COUNT function counts the rows, data types do not play a part. The rows can contain columns with any data type.
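To see the effect of DISTINCT on COUNT with the ORDERS_TBL rows listed above, the following sketch loads those rows into an in-memory SQLite database through Python's sqlite3 module. The choice of SQLite and the reduced column list (ORD_NUM, PROD_ID, QTY) are assumptions made to keep the example short.

```python
import sqlite3

# In-memory SQLite database holding the ORDERS_TBL rows shown above
# (only the columns needed for this illustration).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ORDERS_TBL (ORD_NUM TEXT, PROD_ID TEXT, QTY INTEGER)")
conn.executemany(
    "INSERT INTO ORDERS_TBL VALUES (?, ?, ?)",
    [
        ("56A901", "11235", 1),
        ("56A917", "907", 100),
        ("32A132", "222", 25),
        ("16C17", "222", 2),
        ("18D778", "90", 10),
        ("23E934", "13", 20),
        ("90C461", "1234", 2),
    ],
)

# COUNT(PROD_ID) counts every non-NULL value; DISTINCT collapses duplicates.
all_ids, distinct_ids = conn.execute(
    "SELECT COUNT(PROD_ID), COUNT(DISTINCT PROD_ID) FROM ORDERS_TBL"
).fetchone()

print(all_ids, distinct_ids)
conn.close()
```

The difference between the two counts is exactly the number of repeated PROD_ID values, here the second order for product 222.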
The SUM function returns a total of the values of a column for a group of rows. You can also use the SUM function in conjunction with DISTINCT. When you use SUM with DISTINCT, only the distinct values are totaled, which is rarely what you want: the total is not accurate in that case because rows of data are omitted.
The syntax for the SUM function is as follows:
SUM ([ DISTINCT ] COLUMN NAME)
The value of an argument must be numeric to use the SUM function. As a rule, you cannot use the SUM function on columns having a data type other than numeric, such as character or date.
This example totals the salaries:
SELECT SUM(SALARY) FROM EMPLOYEE_PAY_TBL
This example totals the distinct salaries:
SELECT SUM(DISTINCT SALARY) FROM EMPLOYEE_PAY_TBL
In the following query, the sum, or total amount, of all cost values is being retrieved from the PRODUCTS_TBL table:
SELECT SUM(COST)
FROM PRODUCTS_TBL;
SUM(COST)
-----------
163.07
Observe the way the DISTINCT command in the following example skews the previous results. This is why it is rarely useful:
SELECT SUM(DISTINCT COST)
FROM PRODUCTS_TBL;
SUM(COST)
----------
72.14
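The skew that DISTINCT introduces is easiest to see on a very small table. The sketch below uses a hypothetical three-row table named SALE_DEMO (not part of this book's schema) in an in-memory SQLite database; two of its rows share the same cost.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SALE_DEMO is a hypothetical table used only for this illustration.
conn.execute("CREATE TABLE SALE_DEMO (ITEM TEXT, COST INTEGER)")
conn.executemany(
    "INSERT INTO SALE_DEMO VALUES (?, ?)",
    [("MASK", 10), ("CAPE", 10), ("CANDY", 5)],
)

# SUM adds every row; SUM(DISTINCT ...) adds each distinct value once,
# so the second 10 is silently dropped from the total.
total, distinct_total = conn.execute(
    "SELECT SUM(COST), SUM(DISTINCT COST) FROM SALE_DEMO"
).fetchone()

print(total, distinct_total)
conn.close()
```

The two items that happen to cost the same are both real sales, so the DISTINCT total understates the true amount.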
The following query demonstrates that, although some aggregate functions require numeric data, the requirement applies to the values themselves and not strictly to the declared data type of the column. Here the PAGER column of the EMPLOYEE_TBL table shows that, in implementations that support it, CHAR data is implicitly converted to a numeric type:
SELECT SUM(PAGER)
FROM EMPLOYEE_TBL;
SUM(PAGER)
-----------
12063055658
When you use data that cannot be implicitly converted to a numeric type, such as the LAST_NAME column, the query returns a result of 0. Be aware that some implementations raise an error here instead of returning 0.
SELECT SUM(LAST_NAME)
FROM EMPLOYEE_TBL;
SUM(LAST_NAME)
----------
0
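Because implicit conversion differs between implementations, one option is to make the conversion explicit with CAST so that the intent is visible in the query. The following is a sketch under stated assumptions: it runs against an in-memory SQLite database from Python's sqlite3 module, stores PAGER as text, and stores missing pagers as NULL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# PAGER is deliberately stored as text, as in the EMPLOYEE_TBL listing.
conn.execute("CREATE TABLE EMPLOYEE_TBL (EMP_ID TEXT, PAGER TEXT)")
conn.executemany(
    "INSERT INTO EMPLOYEE_TBL VALUES (?, ?)",
    [
        ("311549902", None),
        ("442346889", None),
        ("213764555", "3175709980"),
        ("313782439", "8887345678"),
        ("220984332", None),
        ("443679012", None),
    ],
)

# CAST converts the text to an integer before SUM sees it;
# NULL pagers stay NULL and are ignored by SUM.
pager_total = conn.execute(
    "SELECT SUM(CAST(PAGER AS INTEGER)) FROM EMPLOYEE_TBL"
).fetchone()[0]

print(pager_total)
conn.close()
```

Summing pager numbers has no business meaning, of course; the point is only that an explicit CAST states which conversion you expect instead of relying on the implementation's defaults.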
The AVG function finds the average value for a given group of rows. When used with the DISTINCT command, the AVG function returns the average of the distinct values. The syntax for the AVG function is as follows:
AVG ([ DISTINCT ] COLUMN NAME)
The value of the argument must be numeric for the AVG function to work.
This example returns the average salary:
SELECT AVG(SALARY) FROM EMPLOYEE_PAY_TBL
This example returns the average of the distinct salaries:
SELECT AVG(DISTINCT SALARY) FROM EMPLOYEE_PAY_TBL
The average value for all values in the PRODUCTS_TBL table’s COST column is being retrieved in the following example:
SELECT AVG(COST)
FROM PRODUCTS_TBL;
AVG(COST)
----------
13.5891667
In some implementations, the results of your query might be truncated to the precision of the data type.
The next example uses two aggregate functions in the same query. Because some employees are paid hourly and others are on salary, you want to retrieve the average value for both PAY_RATE and SALARY.
SELECT AVG(PAY_RATE), AVG(SALARY)
FROM EMPLOYEE_PAY_TBL;
AVG(PAY_RATE) AVG(SALARY)
------------- ---------
13.5833333 30000
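AVG ignores NULL values, which is what makes the two averages above meaningful when each employee has either a pay rate or a salary but not both. The sketch below illustrates this with four hypothetical rows (not the book's actual EMPLOYEE_PAY_TBL data) in an in-memory SQLite database, and contrasts AVG(SALARY) with SUM(SALARY) divided by the total row count.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE EMPLOYEE_PAY_TBL (EMP_ID TEXT, PAY_RATE REAL, SALARY REAL)"
)
# Hypothetical rows: two hourly employees (no salary), two salaried (no pay rate).
conn.executemany(
    "INSERT INTO EMPLOYEE_PAY_TBL VALUES (?, ?, ?)",
    [
        ("1", 10.0, None),
        ("2", 20.0, None),
        ("3", None, 30000.0),
        ("4", None, 50000.0),
    ],
)

avg_rate, avg_salary, salary_per_row = conn.execute(
    """
    SELECT AVG(PAY_RATE),             -- averages the 2 non-NULL pay rates
           AVG(SALARY),               -- averages the 2 non-NULL salaries
           SUM(SALARY) / COUNT(*)     -- divides by all 4 rows instead
    FROM EMPLOYEE_PAY_TBL
    """
).fetchone()

print(avg_rate, avg_salary, salary_per_row)
conn.close()
```

The last column shows what the average would look like if NULL salaries were treated as zero: it is half the value AVG reports, because the divisor includes the hourly employees.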
The MAX function returns the maximum value from the values of a column in a group of rows. NULL values are ignored when using the MAX function. The DISTINCT command is an option. However, because the maximum value for all the rows is the same as the distinct maximum value, DISTINCT serves no purpose.
The syntax for the MAX function is
MAX([ DISTINCT ] COLUMN NAME)
This example returns the highest salary:
SELECT MAX(SALARY) FROM EMPLOYEE_PAY_TBL
This example returns the highest distinct salary:
SELECT MAX(DISTINCT SALARY) FROM EMPLOYEE_PAY_TBL
The following example returns the maximum value for the COST column in the PRODUCTS_TBL table:
SELECT MAX(COST)
FROM PRODUCTS_TBL;
MAX(COST)
----------
29.99
SELECT MAX(DISTINCT COST)
FROM PRODUCTS_TBL;
MAX(COST)
----------
29.99
You can also use aggregate functions such as MAX and MIN on character data. With these values, the collation of your database comes into play again. Most commonly your database collation is set to a dictionary order, so the results are ranked according to that. For example, say we performed a MAX on the PROD_DESC column of the products table:
SELECT MAX(PROD_DESC)
FROM PRODUCTS_TBL;
MAX(PROD_DESC)
-------------------
WITCH COSTUME
In this instance, the function returned the largest value according to a dictionary ordering of the data in the column.
The MIN function returns the minimum value of a column for a group of rows. NULL values are ignored when using the MIN function. The DISTINCT command is an option. However, because the minimum value for all rows is the same as the minimum value for distinct rows, DISTINCT serves no purpose.
The syntax for the MIN function is
MIN([ DISTINCT ] COLUMN NAME)
This example returns the lowest salary:
SELECT MIN(SALARY) FROM EMPLOYEE_PAY_TBL
This example returns the lowest distinct salary:
SELECT MIN(DISTINCT SALARY) FROM EMPLOYEE_PAY_TBL
The following example returns the minimum value for the COST column in the PRODUCTS_TBL table:
SELECT MIN(COST)
FROM PRODUCTS_TBL;
MIN(COST)
----------
1.05
SELECT MIN(DISTINCT COST)
FROM PRODUCTS_TBL;
MIN(COST)
----------
1.05
One important thing to keep in mind when using aggregate functions with the DISTINCT command is that your query might not return the desired results. The purpose of aggregate functions is to return summarized data based on all rows of data in a table.
As with the MAX function, the MIN function can work against character data and returns the minimum value according to the dictionary ordering of the data.
SELECT MIN(PROD_DESC)
FROM PRODUCTS_TBL;
MIN(PROD_DESC)
-------------------
ASSORTED COSTUMES
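MIN and MAX on character data can be tried out with the product descriptions from the PRODUCTS_TBL listing. This sketch assumes an in-memory SQLite database, whose default collation compares text byte by byte; because every description here is uppercase, that ordering agrees with dictionary order. The column is named PROD_DESC, as in the table listing.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE PRODUCTS_TBL (PROD_ID TEXT, PROD_DESC TEXT)")
conn.executemany(
    "INSERT INTO PRODUCTS_TBL VALUES (?, ?)",
    [
        ("11235", "WITCH COSTUME"),
        ("222", "PLASTIC PUMPKIN 18 INCH"),
        ("13", "FALSE PARAFFIN TEETH"),
        ("90", "LIGHTED LANTERNS"),
        ("15", "ASSORTED COSTUMES"),
        ("9", "CANDY CORN"),
        ("6", "PUMPKIN CANDY"),
        ("87", "PLASTIC SPIDERS"),
        ("119", "ASSORTED MASKS"),
        ("1234", "KEY CHAIN"),
        ("2345", "OAK BOOKSHELF"),
    ],
)

# On text, MIN and MAX return the first and last values in collation order.
first_desc, last_desc = conn.execute(
    "SELECT MIN(PROD_DESC), MAX(PROD_DESC) FROM PRODUCTS_TBL"
).fetchone()

print(first_desc, last_desc)
conn.close()
```

With mixed-case data the result depends on the collation in use, so it is worth checking how your implementation orders uppercase and lowercase letters before relying on MIN or MAX for text.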
The final example combines aggregate functions with the use of arithmetic operators:
SELECT COUNT(ORD_NUM), SUM(QTY),
SUM(QTY) / COUNT(ORD_NUM) AVG_QTY
FROM ORDERS_TBL;
COUNT(ORD_NUM) SUM(QTY) AVG_QTY
-------------- -------- ---------
7 160 22.857143
You have performed a count on all order numbers, figured the sum of all quantities ordered, and, by dividing the two figures, derived the average quantity of an item per order. You also created a column alias for the computation: AVG_QTY.
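One detail to watch when dividing aggregates is how your implementation divides integers. The output above shows a fractional result, but some engines, SQLite among them, truncate the quotient when both operands are integers. The following sketch, which assumes an in-memory SQLite database and uses INT_AVG_QTY as an illustrative alias, shows the truncated result next to one forced to a decimal by multiplying by 1.0.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ORDERS_TBL (ORD_NUM TEXT, PROD_ID TEXT, QTY INTEGER)")
conn.executemany(
    "INSERT INTO ORDERS_TBL VALUES (?, ?, ?)",
    [
        ("56A901", "11235", 1),
        ("56A917", "907", 100),
        ("32A132", "222", 25),
        ("16C17", "222", 2),
        ("18D778", "90", 10),
        ("23E934", "13", 20),
        ("90C461", "1234", 2),
    ],
)

order_count, total_qty, int_avg_qty, avg_qty = conn.execute(
    """
    SELECT COUNT(ORD_NUM),
           SUM(QTY),
           SUM(QTY) / COUNT(ORD_NUM)       AS INT_AVG_QTY,  -- integer division in SQLite
           SUM(QTY) * 1.0 / COUNT(ORD_NUM) AS AVG_QTY       -- forced to a decimal result
    FROM ORDERS_TBL
    """
).fetchone()

print(order_count, total_qty, int_avg_qty, round(avg_qty, 6))
conn.close()
```

If the fractional part matters, multiplying by 1.0 (or using AVG(QTY) directly) avoids depending on the implementation's integer-division rules.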
Aggregate functions can be useful and are quite simple to use. You have learned how to count values in columns, count rows of data in a table, get the maximum and minimum values for a column, figure the sum of the values in a column, and figure the average value for values in a column. Remember that NULL values are not considered when using aggregate functions, except when using the COUNT function in the format COUNT(*).
Aggregate functions are the first functions in SQL that you have learned, but more follow. You can also use aggregate functions for group values, which are discussed during the next hour. As you learn about other functions, you see that the syntaxes of most functions are similar to one another and that their concepts of use are relatively easy to understand.
Q. Why are NULL values ignored when using the MAX or MIN function?
A. A NULL value means that nothing is there, so there is no value to compare when looking for a maximum or minimum.
Q. Why don’t data types matter when using the COUNT function?
A. The COUNT function only counts rows.
The following workshop is composed of a series of quiz questions and practical exercises. The quiz questions are designed to test your overall understanding of the current material. The practical exercises are intended to afford you the opportunity to apply the concepts discussed during the current hour, as well as build upon the knowledge acquired in previous hours of study. Please take time to complete the quiz questions and exercises before continuing. Refer to Appendix C, “Answers to Quizzes and Exercises,” for answers.
1. True or false: The AVG function returns an average of all rows from a SELECT column, including any NULL values.
2. True or false: The SUM function adds column totals.
3. True or false: The COUNT(*) function counts all rows in a table.
4. Will the following SELECT statements work? If not, what fixes the statements?
a. SELECT COUNT *
FROM EMPLOYEE_PAY_TBL;
b. SELECT COUNT(EMPLOYEE_ID), SALARY
FROM EMPLOYEE_PAY_TBL;
c. SELECT MIN(BONUS), MAX(SALARY)
FROM EMPLOYEE_PAY_TBL
WHERE SALARY > 20000;
d. SELECT COUNT(DISTINCT PROD_ID) FROM PRODUCTS_TBL;
e. SELECT AVG(LAST_NAME) FROM EMPLOYEE_TBL;
f. SELECT AVG(PAGER) FROM EMPLOYEE_TBL;
1. Use EMPLOYEE_PAY_TBL to construct SQL statements to solve the following exercises:
A. What is the average salary?
B. What is the maximum bonus?
C. What are the total salaries?
D. What is the minimum pay rate?
E. How many rows are in the table?
2. Write a query to determine how many employees are in the company whose last names begin with a G.
3. Write a query to determine the total dollar amount for all the orders in the system. Rewrite the query to determine the total dollar amount if we set the price of each item as $10.00.
4. Write two sets of queries to find the first employee name and last employee name when they are listed in alphabetical order.
5. Write a query to perform an AVG function on the employee names. Does the statement work? Determine why you got that result.