Appendix B. Appedix B: Common Table Expressions

Many of the queries presented in this cookbook go beyond what is possible using tables as they are typically available in a database, especially in relation to aggregate functions and window functions. Therefore, for some queries, you need to make a derived table – either a subquery or a Common Table Expression (CTE).

Subqueries

Arguably the simplest way to create a virtual table that allows you to run queries on window functions or aggregate functions is a subquery. All that’s required here is to write out the query that you need to query within parentheses, and then to write a second query that uses it. The table below illustrates the use of subqueries with a simple double aggregate - you want to find not just the counts of employees in each job, but then identify the highest number, but you can’t nest aggregate functions directly in a standard query.

One pitfall is that some vendors require you to give the subquery table and alias, but others do not. The example below was written in MySQL which does require an alias. The alias here is head_count_tab after the closing parenthesis. Others that require an alias are PostgreSQL and SQL Server, while Oracle does not.

select max(HeadCount) as HighestJobHeadCount from
(select job,count(empno) as HeadCount
from emp
group by job) head_count_tab

Common Table Expressions

Common Table Expressions were intended to overcome some of the limits of subqueries, and may be most well known for allowing recursive queries to be used within SQL. In fact, enablng recursion within SQL was the main inspiration for Common Table Expressions. The example below achieves the same result as the subquery we saw earlier - it finds a double aggregate.

with head_count_tab (job,HeadCount) as

(select job,count(empno)
from emp
group by job)

select max(HeadCount) as HighestJobHeadCount
from head_count_tab

Although this query solves a simple problem it illustrates the essential features of a CTE. We introduce the derived table using the WITH clause, specifying the column headings in the parentheses, and use parentheses around the derived table’s query itself. If we want to add more derived tables, we can add more as long as we separate each one a comma, and provide its name before its query (the reverse of how aliasing usually works in SQL).

Because the inner queries are presented before the outer query, in many circumstances they may also be considered more readable – they make it easier to study each logical element of the query separately in order to understand the logical flow. Of course, as with all things in coding, this will very according to circumstances, and sometimes the subquery will be more readable.

Considering that recursion is the key reason for Common Table Expressions to exist, the best way to demonstrate their capability is through a recursive query.

The query that follows calculates the first 20 numbers in the Fibonacci sequence using a recursive CTE. Note that in the first part of the anchor query we can initialise the values in the first row of the virtual table.

with recursive workingTable ( fibNum, NextNumber, index1)
as
(select 0,1,1
union all
select fibNum+nextNumber,fibNUm,index1+1
from anchor
where index1<20)

select fibNum from workingTable as fib

The Fibonacci sequence finds the next number by adding the current and the previous number; you could also use LAG to achieve this result. However, in this case we’ve made a pseudo-LAG by using two columns to account for the current number and the previous. Note the keyword RECURSIVE, which is mandatory in MySQL, Oracle and PostgreSQL but not in SQL Server or DB2. In this query, the index1 column is largely redundant in the sense of not being used for the Fibonacci calculation. Instead, we have included it to make it simpler to set the number of rows returned via the WHERE clause. In a recursive CTE, the WHERE clause becomes crucial, as without it the query would not terminate (although in this specific case, if you try deleting it, you are likely to find that your database throws an overflow error when the numbers become too large for the data type).

At the simple end of the spectrum, there’s not a lot of difference between a subquery and CTE in terms of useability. Both allow for nesting or writing more complicated queries that refer to other derived tables. However, once you start nesting many subqueries, it readability becomes tricky as you need to pull back the layers of query like an onion within nested brackets, whereas the CTE writes the queries down the page, making understanding which new derived table refers to which earlier derived much simpler.

Summing Up

The use of derived tables dramatically extends the range of SQL. Both subqueries and CTES are used many times throughout the book, so it is important to understand how they work, especially as they each have a particular syntax that you need to master to ensure success. The recursive CTE, which is now available in the vendor offerings in this book is one of the biggest extensions to have occurred within SQL, allowing for many extra possibilities.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
3.145.60.29