Sometimes, we are just looking for information about our data. For this, MariaDB has several built-in functions: AVG
, COUNT
, MIN
, MAX
, and SUM
.
The AVG
function is used for obtaining the average of the data in a column. For example, combined with the TIMESTAMPDIFF
and CURDATE
functions, we can use the AVG
function to calculate the average age of all of the people in the employees table.
The CURDATE
function doesn't take any arguments and when called, it simply returns the current date.
The TIMESTAMPDIFF
function takes three arguments—the unit to count by and two dates, and then outputs the difference between the two. The unit is one of several time units, including MINUTE
, HOUR
, DAY
, WEEK
, MONTH
, QUARTER
, and YEAR
.
Putting all three functions together we get the following:
SELECT AVG(TIMESTAMPDIFF(YEAR,birthday,CURDATE())) FROM employees;
Depending on when you run this statement and the dates in the birthday column, the average you get back will be different.
The COUNT
function is often used to count the number of rows returned by a query. For example, to count the number of rows in the employees
table, we would type the following command:
SELECT COUNT(*) FROM employees;
Doing this is a little silly on a table like ours, which only has a few rows; after all, the output of SELECT * FROM employees;
includes a line at the very end telling us the number of rows returned, and thus the number of rows in the table. But, for a table with tens of thousands or even millions of rows, this function is a much better way of finding out the number of rows.
Another use of COUNT
is to discover how many rows have a value in a specific column. For example, not every row in our table has a preferred name set; to count how many do have a set preferred name, we can run the following command:
SELECT COUNT(pref_name) FROM employees;
These two functions determine the minimum and maximum values. For example, the oldest employee can be determined with a double SELECT
statement, where we look up an employee using the output of the function as follows:
SELECT * FROM employees WHERE birthday = (SELECT MIN(birthday) from employees);
The output of the preceding command will look similar to the following:
MariaDB [test]> SELECT * FROM employees -> WHERE birthday = (SELECT MIN(birthday) FROM employees); +----+----------+-----------+-----------+------------+ | id | surname | givenname | pref_name | birthday | +----+----------+-----------+-----------+------------+ | 5 | Woodruff | Wilford | Will | 1957-03-01 | +----+----------+-----------+-----------+------------+ 1 row in set (0.00 sec)
And likewise, the youngest employee can be determined by using MAX
instead of MIN
in the example as follows:
MariaDB [test]> SELECT * FROM employees -> WHERE birthday = (SELECT MAX(birthday) FROM employees); +----+---------+------------+-----------+------------+ | id | surname | givenname | pref_name | birthday | +----+---------+------------+-----------+------------+ | 1 | Perry | Lowell Tom | Tom | 1988-08-05 | +----+---------+------------+-----------+------------+ 1 row in set (0.00 sec)
The SUM
function is used to compute the total of a set of values. For example, we could total up the combined ages of every employee with something like the following using the TIMESTAMPDIFF
and CURDATE
functions. This is similar to what we did with the AVG
function to convert a birth date into an employee's age.
SELECT SUM(TIMESTAMPDIFF(YEAR,birthday,CURDATE())) FROM employees;
Computing combined ages is not very useful. But for other tables, such as an orders table, the SUM
function would come in very handy for finding out, for example, how many widgets a specific customer has ordered in the last year.
Sometimes, the rows in our database contain natural groups of data. For example, the number of red or blue shirts ordered. The GROUP BY
clause can be used with summary functions to group like data together.
In our employees
database, we can use GROUP BY
with the COUNT
function to find out which surnames are the most popular as follows:
SELECT surname, COUNT(*) FROM employees GROUP BY surname;
The output of the preceding command will look similar to the following:
MariaDB [test]> SELECT surname, COUNT(*) -> FROM employees -> GROUP BY surname; +----------------+----------+ | surname | COUNT(*) | +----------------+----------+ | Anderson | 1 | | Christofferson | 1 | | McKay | 1 | | Perry | 1 | | Pratt | 1 | | Smith | 1 | | Snow | 2 | | Taylor | 1 | | Woodruff | 1 | +----------------+----------+ 9 rows in set (0.00 sec)
The previous GROUP BY
example outputs all the surnames with a count of how many times each surname is used. Most of them are only used once, so since we are trying to determine which surnames are used the most, it makes sense to filter out the ones that are only used once. To do this, we add a HAVING
clause after the GROUP BY
clause as follows:
SELECT surname, COUNT(*) FROM employees GROUP BY surname HAVING COUNT(*) > 1;
The HAVING
clause eliminates most of the results, leading to a more readable output:
MariaDB [test]> SELECT surname, COUNT(*) -> FROM employees -> GROUP BY surname -> HAVING COUNT(*) > 1; +---------+----------+ | surname | COUNT(*) | +---------+----------+ | Snow | 2 | +---------+----------+ 1 row in set (0.00 sec)
In a larger table, we may want to filter out the names used only two to three times, to keep the number of rows returned manageable.
The HAVING
clause is a filter similar to WHERE
, and so it can use any of the various comparison operators.
3.149.249.252