Hierarchical data structures are found quite often in business. Some common examples are organizational structures with departments and subdepartments, and bill-of-material information involving assemblies and subassemblies. Recursion provides a powerful way to query such structures.
In SQL, recursive queries involve a three-phase process:
Create an initial result set (sometimes known as the anchor)
Recursively add results from further queries to the initial result set
Carry out a final query on the accumulated results
Recursion generally requires some termination condition. Here, the termination is implicit: the recursion ends when there are no more results to add in Phase 2. Particular SQL implementations may provide facilities for controlling the maximum depth of recursion to avoid infinite loops.
Creating the initial result set involves the concept of a common table expression (CTE), first introduced in SQL: 1999. A common table expression can be imagined as a view definition with a scope that is limited to the query that immediately follows. The next chapter discusses views, which are basically named select statements stored in the system. The result of a view can be visualized as a table, although such a table is not normally stored in any physical sense. The basic syntax for a CTE is:
with cte-name [ (col-list) ] as cte-query-definition query-statement
The col-list is not required as the cte-query-definition returns distinct columns. The query-statement immediately following the with clause uses the results of the cte-query-definition, referencing the cte-name. Attempts by any other query to refer to this CTE will meet with failure, unlike a view definition. This construct has other uses in SQL, but we’re interested here in the role it plays in forming a recursive query. The anchor part of the recursive query can involve multiple subqueries with results combined using the standard SQL set operations (union, intersect, and except), but for simplicity we’ll assume here that we have a single query.
Once the anchor results are formed, results from the recursion are added using SQL’s union ail operation. At each stage, the recursive part of the query takes the accumulated results as input, runs the query, and adds the results on to the result set. This continues until no more results are produced. Typically, this happens when there are no more unexplored nodes in the hierarchy.
A skeleton of the recursive CTE looks like the following:
with cte-name [ (col-list) ] as anchor-query-definition union all recursive-query-definition query-statement
To make this more concrete, we’ll look at a simple example. Figure 12.63 shows a schema for a table that holds hierarchical information about departments in an organization, together with a sample data population.
As shown in Figure 12.63(b), this organization has three departmental levels. We may want to write various queries that exploit the intrinsic parent—child relationships in the hierarchy, and recursive queries are a natural way to do this. Given the simple nature of the scenario in Figure 12.63, using recursive queries may seem overkill, but we can easily demonstrate the essential features.
We’ll begin with a simple level-by-level listing of the departments to illustrate how the result is produced. Here is the query:
with myCTE (deptld, deptName, parentDept, “level”) as ( select anchor.deptld, anchor.deptName, anchor.parentDept, 1 as “level” from Dept as anchor where parentDept is null union all select child.deptld, child.deptName, child.parentDept, “level” +1 from Dept as child join myCTE as parent on child.parentDept = parentdeptld ) select deptld, deptName, “level” from myCTE
When executed, the query produces the result:
deptld | deptName | level |
---|---|---|
101 | Management | 1 |
201 | Sales | 2 |
202 | Production | 2 |
301 | Sales - West | 3 |
302 | Sales - Central | 3 |
303 | Sales- East | 3 |
Here the “level” column indicates the relative depth in the hierarchy, taking 1 as the top level (“level” is a delimited identifier here because it’s a recognized keyword in SQL).
We’ve taken as our anchor the topmost level by specifying where parentDept is null, so the anchor tuple is <101, ‘Management’, null, 1>. At the next level, we’re now looking for departments that have a deptld of 101. The union all operation adds the tuples <201, ‘Sales’, 101, 2> and <202, ‘Production’, 101, 2>. The join in the recursive part of the CTE can now add the departments under Sales. Production has no subde-partments and so the join produces no results. After adding the three Sales subdepart-ments there are no other possibilities for the join (because no department has a parent deptld of 301, 302, or 303), so the recursion ends.
When the select statement after the CTE is executed, it creates the result set for myCTE and then applies its own query, which in this case just extracts three of the columns produced by the recursive CTE. Of course, the final result depends on the query that operates on the CTE results and it does not have to list all of the CTE rows. For instance, the following recursive query finds the total number of staff below a particular point in the hierarchy—again, we’ve taken the top node, but a with simple change to the query defining the anchor we could have taken any point in the hierarchy.
with myCTE (deptld, totalStaff, parentDept) as ( select anchor.deptld, anchor.totalStaff, anchor.parentDept from Dept as anchor where parentDept is null union all select child.deptld, child.totalStaff, child.parentDept from Dept as child join myCTE as parent on child.parentDept = parentdeptld ) select sum(totalStaff) as totalStaff from myCTE
which produces the expected result:
totalStaff |
---|
20 |
Clearly, we could have produced this particular result using a much simpler query, but, in general, recursion allows us to frame queries on hierarchical structures that would be difficult (or clumsy) to achieve in other ways. Most commercial dialects of SQL support common table expressions and support recursion, but the details may vary from the outline we’ve given here.
18.219.95.244