Summarizing data

Sometimes, we are just looking for information about our data. For this, MariaDB has several built-in functions: AVG, COUNT, MIN, MAX, and SUM.

The AVG function

The AVG function is used for obtaining the average of the data in a column. For example, combined with the TIMESTAMPDIFF and CURDATE functions, we can use the AVG function to calculate the average age of all of the people in the employees table.

The CURDATE function doesn't take any arguments and when called, it simply returns the current date.

The TIMESTAMPDIFF function takes three arguments—the unit to count by and two dates, and then outputs the difference between the two. The unit is one of several time units, including MINUTE, HOUR, DAY, WEEK, MONTH, QUARTER, and YEAR.

Tip

The TIMESTAMPDIFF and CURDATE functions are just two of several functions that make working with dates and times easier. Find out more about them at the following location:

https://mariadb.com/kb/en/date-and-time-functions/

Putting all three functions together we get the following:

SELECT AVG(TIMESTAMPDIFF(YEAR,birthday,CURDATE())) 
FROM employees; 

Depending on when you run this statement and the dates in the birthday column, the average you get back will be different.

The COUNT function

The COUNT function is often used to count the number of rows returned by a query. For example, to count the number of rows in the employees table, we would type the following command:

SELECT COUNT(*) FROM employees;

Doing this is a little silly on a table like ours, which only has a few rows; after all, the output of SELECT * FROM employees; includes a line at the very end telling us the number of rows returned, and thus the number of rows in the table. But, for a table with tens of thousands or even millions of rows, this function is a much better way of finding out the number of rows.

Another use of COUNT is to discover how many rows have a value in a specific column. For example, not every row in our table has a preferred name set; to count how many do have a set preferred name, we can run the following command:

SELECT COUNT(pref_name) FROM employees;

The MIN and MAX functions

These two functions determine the minimum and maximum values. For example, the oldest employee can be determined with a double SELECT statement, where we look up an employee using the output of the function as follows:

SELECT * FROM employees
   WHERE birthday = (SELECT MIN(birthday) from employees);

The output of the preceding command will look similar to the following:

MariaDB [test]> SELECT * FROM employees 
    -> WHERE birthday = (SELECT MIN(birthday) FROM employees); 
+----+----------+-----------+-----------+------------+ 
| id | surname  | givenname | pref_name | birthday   | 
+----+----------+-----------+-----------+------------+ 
|  5 | Woodruff | Wilford   | Will      | 1957-03-01 | 
+----+----------+-----------+-----------+------------+ 
1 row in set (0.00 sec) 

And likewise, the youngest employee can be determined by using MAX instead of MIN in the example as follows:

MariaDB [test]> SELECT * FROM employees
    -> WHERE birthday = (SELECT MAX(birthday) FROM employees);
+----+---------+------------+-----------+------------+ 
| id | surname | givenname  | pref_name | birthday   | 
+----+---------+------------+-----------+------------+ 
|  1 | Perry   | Lowell Tom | Tom       | 1988-08-05 | 
+----+---------+------------+-----------+------------+ 
1 row in set (0.00 sec) 

Tip

In the two previous examples where there are two SELECT queries, one inside the other, the inside query is called a subquery. We will not be covering them in this book, but if you'd like to learn more about them, check out the Subqueries section of the MariaDB Knowledge Base by going to:

https://mariadb.com/kb/en/subqueries/

The SUM function

The SUM function is used to compute the total of a set of values. For example, we could total up the combined ages of every employee with something like the following using the TIMESTAMPDIFF and CURDATE functions. This is similar to what we did with the AVG function to convert a birth date into an employee's age.

SELECT SUM(TIMESTAMPDIFF(YEAR,birthday,CURDATE())) 
FROM employees;

Computing combined ages is not very useful. But for other tables, such as an orders table, the SUM function would come in very handy for finding out, for example, how many widgets a specific customer has ordered in the last year.

Using GROUP BY with summarized data

Sometimes, the rows in our database contain natural groups of data. For example, the number of red or blue shirts ordered. The GROUP BY clause can be used with summary functions to group like data together.

In our employees database, we can use GROUP BY with the COUNT function to find out which surnames are the most popular as follows:

SELECT surname, COUNT(*) 
   FROM employees 
   GROUP BY surname; 

The output of the preceding command will look similar to the following:

MariaDB [test]> SELECT surname, COUNT(*) 
    -> FROM employees 
    -> GROUP BY surname; 
+----------------+----------+ 
| surname        | COUNT(*) | 
+----------------+----------+ 
| Anderson       |        1 | 
| Christofferson |        1 | 
| McKay          |        1 | 
| Perry          |        1 | 
| Pratt          |        1 | 
| Smith          |        1 | 
| Snow           |        2 | 
| Taylor         |        1 | 
| Woodruff       |        1 | 
+----------------+----------+ 
9 rows in set (0.00 sec) 

Using HAVING to filter GROUP BY

The previous GROUP BY example outputs all the surnames with a count of how many times each surname is used. Most of them are only used once, so since we are trying to determine which surnames are used the most, it makes sense to filter out the ones that are only used once. To do this, we add a HAVING clause after the GROUP BY clause as follows:

SELECT surname, COUNT(*)
   FROM employees
   GROUP BY surname
   HAVING COUNT(*) > 1; 

The HAVING clause eliminates most of the results, leading to a more readable output:

MariaDB [test]> SELECT surname, COUNT(*) 
    -> FROM employees 
    -> GROUP BY surname 
    -> HAVING COUNT(*) > 1; 
+---------+----------+ 
| surname | COUNT(*) | 
+---------+----------+ 
| Snow    |        2 | 
+---------+----------+ 
1 row in set (0.00 sec)

In a larger table, we may want to filter out the names used only two to three times, to keep the number of rows returned manageable.

The HAVING clause is a filter similar to WHERE, and so it can use any of the various comparison operators.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
3.145.179.35