This chapter introduces recipes for expressing hierarchical relationships that you may have in your data. It is typical when working with hierarchical data to have more difficulty retrieving and displaying the data (as a hierarchy) than storing it. This is particularly true because of the inflexibility of SQL (SQL’s nonrecursive nature). When working with hierarchical queries, it is absolutely crucial that you take advantage of what your RDBMS supplies you to facilitate these operations; otherwise you will end up writing potentially less efficient queries and constructing convoluted data models to deal with the hierarchical data. For PostgreSQL users, the recursive WITH clause will most likely be added to later versions PostgreSQL, so it would behoove you to pay attention to the DB2 solutions to these queries.
This chapter will provide recipes to help you unravel the hierarchical structure of your data by taking advantage of the functions supplied by each of the RDBMSs. Before starting, examine table EMP and the hierarchical relationship between EMPNO and MGR:
select empno,mgr
from emp
order by 2
EMPNO MGR ---------- ---------- 7788 7566 7902 7566 7499 7698 7521 7698 7900 7698 7844 7698 7654 7698 7934 7782 7876 7788 7566 7839 7782 7839 7698 7839 7369 7902 7839
If you look carefully, you will see that each value for MGR is also an EMPNO, meaning the manager of each employee in table EMP is also an employee in table EMP and not stored somewhere else. The relationship between MGR and EMPNO is a parent–child relationship in that the value for MGR is the most immediate parent for a given EMPNO (it is also possible that the manager for a specific employee can have a manager herself, and those managers can in turn have managers, and so on, creating an n-tier hierarchy). If an employee has no manager, then MGR is NULL.
You want to include parent information along with data from child records. For example, you want to display each employee’s name along with the name of his manager. You want to return the following result set:
EMPS_AND_MGRS ------------------------------ FORD works for JONES SCOTT works for JONES JAMES works for BLAKE TURNER works for BLAKE MARTIN works for BLAKE WARD works for BLAKE ALLEN works for BLAKE MILLER works for CLARK ADAMS works for SCOTT CLARK works for KING BLAKE works for KING JONES works for KING SMITH works for FORD
Self join EMP on MGR and EMPNO to find the name of each employee’s manager. Then use your RDBMS’s supplied function(s) for string concatenation to generate the strings in the desired result set.
Self join on EMP. Then use the double vertical-bar (||) concatenation operator:
1 select a.ename || ' works for ' || b.ename as emps_and_mgrs 2 from emp a, emp b 3 where a.mgr = b.empno
The implementation is essentially the same for all the solutions. The difference lies only in the method of string concatenation, and thus one discussion will cover all of the solutions.
The key is the join between MGR and EMPNO. The fist step is to build a Cartesian product by joining EMP to itself (only a portion of the rows returned by the Cartesian product is shown below):
select a.empno, b.empno
from emp a, emp b
EMPNO MGR ----- ---------- 7369 7369 7369 7499 7369 7521 7369 7566 7369 7654 7369 7698 7369 7782 7369 7788 7369 7839 7369 7844 7369 7876 7369 7900 7369 7902 7369 7934 7499 7369 7499 7499 7499 7521 7499 7566 7499 7654 7499 7698 7499 7782 7499 7788 7499 7839 7499 7844 7499 7876 7499 7900 7499 7902 7499 7934
As you can see, by using a Cartesian product you are returning every possible EMPNO/EMPNO combination (such that it looks like the manager for EMPNO 7369 is all the other employees in the table, including EMPNO 7369).
The next step is to filter the results such that you return only each employee and his manager’s EMPNO. Accomplish this by joining on MGR and EMPNO:
1 select a.empno, b.empno mgr
2 from emp a, emp b
3 where a.mgr = b.empno
EMPNO MGR ---------- ---------- 7902 7566 7788 7566 7900 7698 7844 7698 7654 7698 7521 7698 7499 7698 7934 7782 7876 7788 7782 7839 7698 7839 7566 7839 7369 7902
Now that you have each employee and the EMPNO of his manager, you can return the name of each manager by simply selecting B.ENAME rather than B.EMPNO. If after some practice you have difficulty grasping how this works, you can use a scalar subquery rather than a self join to get the answer:
select a.ename,
(select b.ename
from emp b
where b.empno = a.mgr) as mgr
from emp a
ENAME MGR ---------- ---------- SMITH FORD ALLEN BLAKE WARD BLAKE JONES KING MARTIN BLAKE BLAKE KING CLARK KING SCOTT JONES KING TURNER BLAKE ADAMS SCOTT JAMES BLAKE FORD JONES MILLER CLARK
The scalar subquery version is equivalent to the self join, except for one row: employee KING is in the result set, but that is not the case with the self join. “Why not?” you might ask. Remember, NULL is never equal to anything, not even itself. In the self-join solution, you use an equi-join between EMPNO and MGR, thus filtering out any employees who have NULL for MGR. To see employee KING when using the self-join method, you must outer join as shown in the following two queries. The first solution uses the ANSI outer join while the second uses the Oracle outer-join syntax. The output is the same for both and is shown following the second query:
/* ANSI */
select a.ename, b.ename mgr
from emp a left join emp b
on (a.mgr = b.empno)
/* Oracle */
select a.ename, b.ename mgr
from emp a, emp b
where a.mgr = b.empno (+)
ENAME MGR ---------- ---------- FORD JONES SCOTT JONES JAMES BLAKE TURNER BLAKE MARTIN BLAKE WARD BLAKE ALLEN BLAKE MILLER CLARK ADAMS SCOTT CLARK KING BLAKE KING JONES KING SMITH FORD KING
Employee CLARK works for KING and to express that relationship you can use the first recipe in this chapter. What if employee CLARK was in turn a manager for another employee? Consider the following query:
select ename,empno,mgr
from emp
where ename in ('KING','CLARK','MILLER')
ENAME EMPNO MGR --------- -------- ------- CLARK 7782 7839 KING 7839 MILLER 7934 7782
As you can see, employee MILLER works for CLARK who in turn works for KING. You want to express the full hierarchy from MILLER to KING. You want to return the following result set:
LEAF___BRANCH___ROOT --------------------- MILLER-->CLARK-->KING
However, the single self-join approach from the previous recipe will not suffice to show the entire relationship from top to bottom. You could write a query that does two self joins, but what you really need is a general approach for traversing such hierarchies.
This recipe differs from the first recipe because there is now a three-tier relationship, as the title suggests. If your RDBMS does not supply functionality for traversing tree-structured data, then you can solve this problem using the techniques described for PostgreSQL and MySQL, but you must add an additional self join. DB2, SQL Server, and Oracle offer functions for expressing hierarchies. Thus self joins on those RDBMSs aren’t necessary, though they certainly work.
Use the recursive WITH clause to find MILLER’s manager, CLARK, then CLARK’s manager, KING. The SQL Server string concatenation operator + is used in this solution:
1 with x (tree,mgr,depth) 2 as ( 3 select cast(ename as varchar(100)), 4 mgr, 0 5 from emp 6 where ename = 'MILLER' 7 union all 8 select cast(x.tree+'-->'+e.ename as varchar(100)), 9 e.mgr, x.depth+1 10 from emp e, x 11 where x.mgr = e.empno 12 ) 13 select tree leaf___branch___root 14 from x 15 where depth = 2
The only modification necessary for this solution to work on DB2 is to use DB2’s concatenation operator, ||. Otherwise, the solution will work as is, on DB2 as well as SQL Server.
Use the function SYS_CONNECT_BY_PATH to return MILLER, MILLER’s manager, CLARK, then CLARK’s manager, KING. Use the CONNECT BY clause to walk the tree:
1 select ltrim( 2 sys_connect_by_path(ename,'-->'), 3 '-->') leaf___branch___root 4 from emp 5 where level = 3 6 start with ename = 'MILLER' 7 connect by prior mgr = empno
Self join on table EMP twice to return MILLER, MILLER’s manager, CLARK, then CLARK’s manager, KING. The following solution uses PostgreSQL’s concatenation operator, the double vertical-bar (||):
1 select a.ename||'-->'||b.ename 2 ||'-->'||c.ename as leaf___branch___root 3 from emp a, emp b, emp c 4 where a.ename = 'MILLER' 5 and a.mgr = b.empno 6 and b.mgr = c.empno
For MySQL users, simply use the CONCAT function; this solution will work for PostgreSQL as well.
The approach here is to start at the leaf node and walk your way up to the root (as useful practice, try walking in the other direction). The upper part of the UNION ALL simply finds the row for employee MILLER (the leaf node). The lower part of the UNION ALL finds the employee who is MILLER’s manager, then finds that person’s manager, and this process of finding the “manager’s manager” repeats until processing stops at the highest-level manager (the root node). The value for DEPTH starts at 0 and increments automatically by 1 each time a manager is found. DEPTH is a value that DB2 maintains for you when you execute a recursive query.
For an interesting and in-depth introduction to the WITH clause with focus on its use recursively, see Jonathan Gennick’s article "Understanding the WITH Clause” at http://gennick.com/with.htm.
Next, the second query of the UNION ALL joins the recursive view X to table EMP, to define the parent–child relationship. The query at this point, using SQL Server’s concatenation operator, is as follows:
with x (tree,mgr,depth)
as (
select cast(ename as varchar(100)),
mgr, 0
from emp
where ename = 'MILLER'
union all
select cast(x.tree+'-->'+e.ename as varchar(100)),
e.mgr, x.depth+1
from emp e, x
where x.mgr = e.empno
)
select tree leaf___branch___root
from x
TREE DEPTH ---------- ---------- MILLER 0 CLARK 1 KING 2
At this point, the heart of the problem has been solved; starting from MILLER, return the full hierarchical relationship from bottom to top. What’s left then is merely formatting. Since the tree traversal is recursive, simply concatenate the current ENAME from EMP to the one before it, which gives you the following result set:
with x (tree,mgr,depth)
as (
select cast(ename as varchar(100)),
mgr, 0
from emp
where ename = 'MILLER'
union all
select cast(x.tree+'-->'+e.ename as varchar(100)),
e.mgr, x.depth+1
from emp e, x
where x.mgr = e.empno
)
select depth, tree
from x
DEPTH TREE ----- --------------------------- 0 MILLER 1 MILLER-->CLARK 2 MILLER-->CLARK-->KING
The final step is to keep only the last row in the hierarchy. There are several ways to do this, but the solution uses DEPTH to determine when the root is reached (obviously, if CLARK has a manager other than KING, the filter on DEPTH would have to change; for a more generic solution that requires no such filter, see the next recipe).
The CONNECT BY clause does all the work in the Oracle solution. Starting with MILLER, you walk all the way to KING without the need for any joins. The expression in the CONNECT BY clause defines the relationship of the data and how the tree will be walked:
select ename
from emp
start with ename = 'MILLER'
connect by
prior mgr = empno
ENAME -------- MILLER CLARK KING
The keyword PRIOR lets you access values from the previous record in the hierarchy. Thus, for any given EMPNO you can use PRIOR MGR to access that employee’s manager number. When you see a clause such as CONNECT BY PRIOR MGR = EMPNO, think of that clause as expressing a join between, in this case, parent and child.
For more on CONNECT BY and related features, see the following Oracle Technology Network articles: “Querying Hierarchies: Top-of-the-Line Support” at http://www.oracle.com/technology/oramag/webcolumns/2003/techarticles/gennick_connectby.html, and “New CONNECT BY Features in Oracle Database 10g"at http://www.oracle.com/technology/oramag/webcolumns/2003/techarticles/gennick_connectby_10g.html.
At this point you have successfully displayed the full hierarchy starting from MILLER and ending at KING. The problem is for the most part solved. All that remains is the formatting. Use the function SYS_CONNECT_BY_PATH to append each ENAME to the one before it:
select sys_connect_by_path(ename,'-->') tree
from emp
start with ename = 'MILLER'
connect by prior mgr = empno
TREE --------------------------- -->MILLER -->MILLER-->CLARK -->MILLER-->CLARK-->KING
Because you are interested in only the complete hierarchy, you can filter on the pseudo-column LEVEL (a more generic approach is shown in the next recipe):
select sys_connect_by_path(ename,'-->') tree
from emp
where level = 3
start with ename = 'MILLER'
connect by prior mgr = empno
TREE --------------------------- -->MILLER-->CLARK-->KING
The final step is to use the LTRIM function to remove the leading “-->” from the result set.
Without built-in support for hierarchical queries, you must self join n times to return the whole tree (where n is the number of nodes between the leaf and the root, including the root itself; in this example, relative to MILLER, CLARK is a branch node and KING is the root node, so the distance is two nodes, and n = 2). This solution simply uses the technique from the previous recipe and adds one more self join:
select a.ename as leaf,
b.ename as branch,
c.ename as root
from emp a, emp b, emp c
where a.ename = 'MILLER'
and a.mgr = b.empno
and b.mgr = c.empno
LEAF BRANCH ROOT --------- ---------- ----- MILLER CLARK KING
The next and last step is to format the output using the || concatenation operator for PostgreSQL or the CONCAT function for MySQL. The drawback to this kind of query is that if the hierarchy changes—for example, if there is another node between CLARK and KING—the query would need to have yet another join to return the whole tree. This is why it is such an advantage to have and use built-in functions for hierarchies.
You want to return a result set that describes the hierarchy of an entire table. In the case of the EMP table, employee KING has no manager, so KING is the root node. You want to display, starting from KING, all employees under KING and all employees (if any) under KING’s subordinates. Ultimately, you want to return the following result set:
EMP_TREE ------------------------------ KING KING - BLAKE KING - BLAKE - ALLEN KING - BLAKE - JAMES KING - BLAKE - MARTIN KING - BLAKE - TURNER KING - BLAKE - WARD KING - CLARK KING - CLARK - MILLER KING - JONES KING - JONES - FORD KING - JONES - FORD - SMITH KING - JONES - SCOTT KING - JONES - SCOTT - ADAMS
Use the recursive WITH clause to start building the hierarchy at KING and then ultimately display all the employees. The solution following uses the DB2 concatenation operator “||”. SQL Server users use the concatenation operator +. Other than the concatenation operators, the solution will work as-is on both RDBMSs:
1 with x (ename,empno) 2 as ( 3 select cast(ename as varchar(100)),empno 4 from emp 5 where mgr is null 6 union all 7 select cast(x.ename||' - '||e.ename as varchar(100)), 8 e.empno 9 from emp e, x 10 where e.mgr = x.empno 11 ) 12 select ename as emp_tree 13 from x 14 order by 1
Use the CONNECT BY function to define the hierarchy. Use SYS_CONNECT_BY_PATH function to format the output accordingly:
1 select ltrim( 2 sys_connect_by_path(ename,' - '), 3 ' - ') emp_tree 4 from emp 5 start with mgr is null 6 connect by prior empno=mgr 7 order by 1
This solution differs from that in the previous recipe in that it includes no filter on the LEVEL pseudo-column. Without the filter, all possible trees (where PRIOR EMPNO=MGR) are displayed.
Use three UNIONs and multiple self joins:
1 select emp_tree 2 from ( 3 select ename as emp_tree 4 from emp 5 where mgr is null 6 union 7 select a.ename||' - '||b.ename 8 from emp a 9 join 10 emp b on (a.empno=b.mgr) 11 where a.mgr is null 12 union 13 select rtrim(a.ename||' - '||b.ename 14 ||' - '||c.ename,' - ') 15 from emp a 16 join 17 emp b on (a.empno=b.mgr) 18 left join 19 emp c on (b.empno=c.mgr) 20 where a.ename = 'KING' 21 union 22 select rtrim(a.ename||' - '||b.ename||' - '|| 23 c.ename||' - '||d.ename,' - ') 24 from emp a 25 join 26 emp b on (a.empno=b.mgr) 27 join 28 emp c on (b.empno=c.mgr) 29 left join 30 emp d on (c.empno=d.mgr) 31 where a.ename = 'KING' 32 ) x 33 where tree is not null 34 order by 1
Use three UNIONs and multiple self joins:
1 select emp_tree 2 from ( 3 select ename as emp_tree 4 from emp 5 where mgr is null 6 union 7 select concat(a.ename,' - ',b.ename) 8 from emp a 9 join 10 emp b on (a.empno=b.mgr) 11 where a.mgr is null 12 union 13 select concat(a.ename,' - ', 14 b.ename,' - ',c.ename) 15 from emp a 16 join 17 emp b on (a.empno=b.mgr) 18 left join 19 emp c on (b.empno=c.mgr) 20 where a.ename = 'KING' 21 union 22 select concat(a.ename,' - ',b.ename,' - ', 23 c.ename,' - ',d.ename) 24 from emp a 25 join 26 emp b on (a.empno=b.mgr) 27 join 28 emp c on (b.empno=c.mgr) 29 left join 30 emp d on (c.empno=d.mgr) 31 where a.ename = 'KING' 32 ) x 33 where tree is not null 34 order by 1
The first step is to identify the root row (employee KING) in the upper part of the UNION ALL in the recursive view X. The next step is to find KING’s subordinates, and their subordinates if there are any, by joining recursive view X to table EMP. Recursion will continue until you’ve returned all employees. Without the formatting you see in the final result set, the result set returned by the recursive view X is shown below:
with x (ename,empno)
as (
select cast(ename as varchar(100)),empno
from emp
where mgr is null
union all
select cast(e.ename as varchar(100)),e.empno
from emp e, x
where e.mgr = x.empno
)
select ename emp_tree
from x
EMP_TREE ---------------- KING JONES SCOTT ADAMS FORD SMITH BLAKE ALLEN WARD MARTIN TURNER JAMES CLARK MILLER
All the rows in the hierarchy are returned (which can be useful), but without the formatting you cannot tell who the managers are. By concatenating each employee to her manager, you return more meaningful output. Produce the desired output simply by using
cast(x.ename+','+e.ename as varchar(100))
in the SELECT clause of the lower portion of the UNION ALL in recursive view X.
The WITH clause is extremely useful in solving this type of problem, because the hierarchy can change (for example, leaf nodes become branch nodes) without any need to modify the query.
The CONNECT BY clause returns the rows in the hierarchy. The START WITH clause defines the root row. If you run the solution without SYS_CONNECT_BY_PATH, you can see that the correct rows are returned (which can be useful), but not formatted to express the relationship of the rows:
select ename emp_tree
from emp
start with mgr is null
connect by prior empno = mgr
EMP_TREE ----------------- KING JONES SCOTT ADAMS FORD SMITH BLAKE ALLEN WARD MARTIN TURNER JAMES CLARK MILLER
By using the pseudo-column LEVEL and the function LPAD, you can see the hierarchy more clearly, and you can ultimately see why SYS_CONNECT_BY_PATH returns the results that you see in the desired output shown earlier:
select lpad('.',2*level,'.')||ename emp_tree from emp start with mgr is null connect by prior empno = mgr EMP_TREE ----------------- ..KING ....JONES ......SCOTT ........ADAMS ......FORD ........SMITH ....BLAKE ......ALLEN ......WARD ......MARTIN ......TURNER ......JAMES ....CLARK ......MILLER
The indentation in this output indicates who the managers are by nesting subordinates under their superiors. For example, KING works for no one. JONES works for KING. SCOTT works for JONES. ADAMS works for SCOTT.
If you look at the corresponding rows from the solution when using SYS_CONNECT_BY_PATH, you will see that SYS_CONNECT_BY_PATH rolls up the hierarchy for you. When you get to a new node, you see all the prior nodes as well:
KING KING - JONES KING - JONES - SCOTT KING - JONES - SCOTT - ADAMS
If you are on Oracle8i Database or earlier, you can use the PostgreSQL solution to this problem. Alternatively, because CONNECT BY is available on older versions of Oracle, you can simply use LEVEL and RPAD/ LPAD for formatting (although to reproduce the output created by SYS_CONNECT_BY_PATH would require a bit more work).
With the exception of string concatenation in the SELECT clauses, the solutions are the same for both PostgreSQL and MySQL. The first step is to determine the maximum number of nodes for any one branch. You have to do this manually, before you write the query. If you examine the data in the EMP table, you will see that employees ADAM and SMITH are the leaf nodes at the greatest depth (you may wish to look at the Oracle discussion where you’ll find a nicely formatted tree of the EMP hierarchy). If you look at employee ADAMS, you see that ADAMS works for SCOTT who in turn works for JONES who in turn works for KING, so the depth is 4. To be able to express a hierarchy with a depth of four, you must self join four instances of table EMP, and you must write a four-part UNION query. The results of the four-way self join (which is the lower part of the last UNION, from top to bottom) is shown below (using PostgreSQL syntax; MySQL users, simply substitute “||” for the CONCAT function call):
select rtrim(a.ename||' - '||b.ename||' - '||
c.ename||' - '||d.ename,' - ') as max_depth_4
from emp a
join
emp b on (a.empno=b.mgr)
join
emp c on (b.empno=c.mgr)
left join
emp d on (c.empno=d.mgr)
where a.ename = 'KING'
MAX_DEPTH_4 ----------------------------- KING - JONES - FORD - SMITH KING - JONES - SCOTT - ADAMS KING - BLAKE - TURNER KING - BLAKE - ALLEN KING - BLAKE - WARD KING - CLARK - MILLER KING - BLAKE - MARTIN KING - BLAKE - JAMES
The filter on A.ENAME is necessary to ensure that the root row is KING and no other employee. If you look at the result set above and compare it with the final result set, you’ll see that there are some three-deep hierarchies not returned: KING - JONES - FORD and KING - JONES - SCOTT. To include those rows in the final result set, you need to write another query similar to the one above, but with one less join (self joining only three instances of table EMP rather than four). The result set of this query is shown below:
select rtrim(a.ename||' - '||b.ename
||' - '||c.ename,' - ') as max_depth_3
from emp a
join
emp b on (a.empno=b.mgr)
left join
emp c on (b.empno=c.mgr)
where a.ename = 'KING'
MAX_DEPTH_3 --------------------------- KING - BLAKE - ALLEN KING - BLAKE - WARD KING - BLAKE - MARTIN KING - JONES - SCOTT KING - BLAKE - TURNER KING - BLAKE - JAMES KING - JONES - FORD KING - CLARK - MILLER
Like the query before it, the filter on A.ENAME is necessary to ensure the root row node is KING. You’ll notice some overlapping rows between the query above and the four-way EMP join. To get rid of the redundant rows, simply UNION the two queries:
select rtrim(a.ename||' - '||b.ename
||' - '||c.ename,' - ') as partial_tree
from emp a
join
emp b on (a.empno=b.mgr)
left join
emp c on (b.empno=c.mgr)
where a.ename = 'KING'
union
select rtrim(a.ename||' - '||b.ename||' - '||
c.ename||' - '||d.ename,' - ')
from emp a
join
emp b on (a.empno=b.mgr)
join
emp c on (b.empno=c.mgr)
left join
emp d on (c.empno=d.mgr)
where a.ename = 'KING'
PARTIAL_TREE ------------------------------- KING - BLAKE - ALLEN KING - BLAKE - JAMES KING - BLAKE - MARTIN KING - BLAKE - TURNER KING - BLAKE - WARD KING - CLARK - MILLER KING - JONES - FORD KING - JONES - FORD - SMITH KING - JONES - SCOTT KING - JONES - SCOTT - ADAMS
At this point the tree is almost complete. The next step is to return rows that represent a two-deep hierarchy with KING as the root node (i.e., employees who work directly for KING). The query to return those rows is shown below:
select a.ename||' - '||b.ename as max_depth_2
from emp a
join
emp b on (a.empno=b.mgr)
where a.mgr is null
MAX_DEPTH_2 --------------- KING - JONES KING - BLAKE KING - CLARK
The next step is to UNION the above query, to the PARTIAL_TREE union:
select a.ename||' - '||b.ename as partial_tree
from emp a
join
emp b on (a.empno=b.mgr)
where a.mgr is null
union
select rtrim(a.ename||' - '||b.ename
||' - '||c.ename,' - ')
from emp a
join
emp b on (a.empno=b.mgr)
left join
emp c on (b.empno=c.mgr)
where a.ename = 'KING'
union
select rtrim(a.ename||' - '||b.ename||' - '||
c.ename||' - '||d.ename,' - ')
from emp a
join
emp b on (a.empno=b.mgr)
join
emp c on (b.empno=c.mgr)
left join
emp d on (c.empno=d.mgr)
where a.ename = 'KING'
PARTIAL_TREE ---------------------------------- KING - BLAKE KING - BLAKE - ALLEN KING - BLAKE - JAMES KING - BLAKE - MARTIN KING - BLAKE - TURNER KING - BLAKE - WARD KING - CLARK KING - CLARK - MILLER KING - JONES KING - JONES - FORD KING - JONES - FORD - SMITH KING - JONES - SCOTT KING - JONES - SCOTT - ADAMS
The final step is to UNION KING to the top of PARTIAL_TREE to return the desired result set.
You want to find all the employees who work for JONES, either directly or indirectly (i.e., they work for someone who works for JONES). The list of employees under JONES is shown below (JONES is included in the result set):
ENAME ---------- JONES SCOTT ADAMS FORD SMITH
Being able to move to the absolute top or bottom of a tree is extremely useful. For this solution there is no special formatting necessary. The goal is to simply return all employees who work under employee JONES, including JONES himself. This type of query really shows the usefulness of recursive SQL extensions like Oracle’s CONNECT BY and SQL Server’s/DB2’s WITH clause.
Use the recursive WITH clause to find all employees under JONES. Begin with JONES by specifying WHERE ENAME = ‘JONES’ in the first of the two union queries:
1 with x (ename,empno) 2 as ( 3 select ename,empno 4 from emp 5 where ename = 'JONES' 6 union all 7 select e.ename, e.empno 8 from emp e, x 9 where x.empno = e.mgr 10 ) 11 select ename 12 from x
Use the CONNECT BY clause and specify START WITH ENAME = ‘JONES’ to find all the employees under JONES:
1 select ename 2 from emp 3 start with ename = 'JONES' 4 connect by prior empno = mgr
You must know in advance how many nodes there are in the tree. The following queries show how to determine the depth of the hierarchy:
/* find JONES' EMPNO */
select ename,empno,mgr
from emp
where ename = 'JONES'
ENAME EMPNO MGR ---------- ----------- --------- JONES 7566 7839/* are there any employees who work directly under JONES? */
select count(*)
from emp
where mgr = 7566
COUNT(*) --------- 2/* there are two employees under JONES, find their EMPNOs */
select ename,empno,mgr
from emp
where mgr = 7566
ENAME EMPNO MGR ---------- ----------- ----------- SCOTT 7788 7566 FORD 7902 7566/* are there any employees under SCOTT or FORD? */
select count(*)
from emp
where mgr in (7788,7902)
COUNT(*) --------- 2/* there are two employees under SCOTT or FORD, find their EMPNOs */
select ename,empno,mgr
from emp
where mgr in (7788,7902)
ENAME EMPNO MGR --------- ----------- -------- SMITH 7369 7902 ADAMS 7876 7788/* are there any employees under SMITH or ADAMS? */
select count(*)
from emp
where mgr in (7369,7876)
COUNT(*) ---------- 0
The hierarchy starting from JONES ends with employees SMITH and ADAMS. That makes the hierarchy three levels deep. Now that you know the depth, you can begin to traverse the hierarchy from top to bottom.
First, self join table EMP twice. Then unpivot inline view X to transform three columns with two rows into one column with six rows (in PostgreSQL, you can use GENERATE_SERIES(1,6) as an alternative to querying the T100 pivot table):
1 select distinct 2 case t100.id 3 when 1 then root 4 when 2 then branch 5 else leaf 6 end as JONES_SUBORDINATES 7 from ( 8 select a.ename as root, 9 b.ename as branch, 10 c.ename as leaf 11 from emp a, emp b, emp c 12 where a.ename = 'JONES' 13 and a.empno = b.mgr 14 and b.empno = c.mgr 15 ) x, 16 t100 17 where t100.id <= 6
As an alternative, you can use views and UNION the results. If you create the following views:
create view v1 as select ename,mgr,empno from emp where ename = 'JONES' create view v2 as select ename,mgr,empno from emp where mgr = (select empno from v1) create view v3 as select ename,mgr,empno from emp where mgrin (select empno from v2)
the solution then becomes:
select ename from v1 union select ename from v2 union select ename from v3
The recursive WITH clause makes this a relatively easy problem to solve. The first part of the WITH clause, the upper part of the UNION ALL, returns the row for employee JONES. You need to return ENAME to see the name and EMPNO so you can use it to join on. The lower part of the UNION ALL recursively joins EMP.MGR to X.EMPNO. The join condition will be applied until the result set is exhausted.
The START WTH clause tells the query to make JONES the root node. The condition in the CONNECT BY clause drives the tree walk and will run until the condition is no longer true.
The technique used here is the same as that of the second recipe in this chapter, “Expressing a Child-Parent-Grandparent Relationship.” A major drawback is that you must know in advance the depth of the hierarchy.
You want to determine what type of node a given row is: a leaf, branch, or root. For this example, a leaf node is an employee who is not a manager. A branch node is an employee who is both a manager and also has a manager. A root node is an employee without a manager. You want to return 1 (TRUE) or 0 (FALSE) to reflect the status of each row in the hierarchy. You want to return the following result set:
ENAME IS_LEAF IS_BRANCH IS_ROOT ---------- ---------- ---------- ---------- KING 0 0 1 JONES 0 1 0 SCOTT 0 1 0 FORD 0 1 0 CLARK 0 1 0 BLAKE 0 1 0 ADAMS 1 0 0 MILLER 1 0 0 JAMES 1 0 0 TURNER 1 0 0 ALLEN 1 0 0 WARD 1 0 0 MARTIN 1 0 0 SMITH 1 0 0
It is important to realize that the EMP table is modeled in a tree hierarchy, not a recursive hierarchy, the value for MGR for root nodes is NULL. If EMP was modeled to use a recursive hierarchy, root nodes would be self-referencing (i.e., the value for MGR for employee KING would be KING’s EMPNO). I find self-referencing to be counterintuitive and thus am using NULL values for root nodes’ MGR. For Oracle users using CONNECT BY and DB2/SQL Server users using WITH, you’ll find tree hierarchies easier to work with and potentially more efficient than recursive hierarchies. If you are in a situation where you have a recursive hierarchy and are using CONNECT BY or WITH, watch out: you can end up with a loop in your SQL. You need to code around such loops if you are stuck with recursive hierarchies.
Use three scalar subqueries to determine the correct “Boolean” value (either a 1 or a 0) to return for each node type:
1 select e.ename, 2 (select sign(count(*)) from emp d 3 where 0 = 4 (select count(*) from emp f 5 where f.mgr = e.empno)) as is_leaf, 6 (select sign(count(*)) from emp d 7 where d.mgr = e.empno 8 and e.mgr is not null) as is_branch, 9 (select sign(count(*)) from emp d 10 where d.empno = e.empno 11 and d.mgr is null) as is_root 12 from emp e 13 order by 4 desc,3 desc
The scalar subquery solution will work for Oracle as well, and should be used if you are on a version of Oracle prior to Oracle Database 10g. The following solution highlights built-in functions provided by Oracle (that were introduced in Oracle Database 10g) to identify root and leaf rows. The functions are CONNECT_BY_ROOT and CONNECT_BY_ISLEAF, respectively:
1 select ename, 2 connect_by_isleaf is_leaf, 3 (select count(*) from emp e 4 where e.mgr = emp.empno 5 and emp.mgr is not null 6 and rownum = 1) is_branch, 7 decode(ename,connect_by_root(ename),1,0) is_root 8 from emp 9 start with mgr is null 10 connect by prior empno = mgr 11 order by 4 desc, 3 desc
This solution simply applies the rules defined in the “Problem” section to determine leaves, branches, and roots. The first step is to find determine whether an employee is a leaf node. If the employee is not a manager (no one works under her), then she is a leaf node. The first scalar subquery, IS_LEAF, is shown below:
select e.ename,
(select sign(count(*)) from emp d
where 0 =
(select count(*) from emp f
where f.mgr = e.empno)) as is_leaf
from emp e
order by 2 desc
ENAME IS_LEAF ----------- -------- SMITH 1 ALLEN 1 WARD 1 ADAMS 1 TURNER 1 MARTIN 1 JAMES 1 MILLER 1 JONES 0 BLAKE 0 CLARK 0 FORD 0 SCOTT 0 KING 0
Because the output for IS_LEAF should be a 0 or 1, it is necessary to take the SIGN of the COUNT(*) operation. Otherwise you would get 14 instead of 1 for leaf rows. As an alternative, you can use a table with only one row to count against, because you only want to return 0 or 1. For example:
select e.ename,
(select count(*) from t1 d
where not exists
(select null from emp f
where f.mgr = e.empno)) as is_leaf
from emp e
order by 2 desc
ENAME IS_LEAF ---------- ---------- SMITH 1 ALLEN 1 WARD 1 ADAMS 1 TURNER 1 MARTIN 1 JAMES 1 MILLER 1 JONES 0 BLAKE 0 CLARK 0 FORD 0 SCOTT 0 KING 0
The next step is to find branch nodes. If an employee is a manager (someone works for them), and they also happen to work for someone else, then the employee is a branch node. The results of the scalar subquery IS_BRANCH are shown below:
select e.ename,
(select sign(count(*)) from emp d
where d.mgr = e.empno
and e.mgr is not null) as is_branch
from emp e
order by 2 desc
ENAME IS_BRANCH ----------- --------- JONES 1 BLAKE 1 SCOTT 1 CLARK 1 FORD 1 SMITH 0 TURNER 0 MILLER 0 JAMES 0 ADAMS 0 KING 0 ALLEN 0 MARTIN 0 WARD 0
Again, it is necessary to take the SIGN of the COUNT(*) operation. Otherwise you will get (potentially) values greater than 1 when a node is a branch. Like scalar subquery IS_LEAF, you can use a table with one row to avoid using SIGN. The following solution uses a one-row table named dual:
select e.ename,
(select count(*) from t1 t
where exists (
select null from emp f
where f.mgr = e.empno
and e.mgr is not null)) as is_branch
from emp e
order by 2 desc
ENAME IS_BRANCH --------------- ---------- JONES 1 BLAKE 1 SCOTT 1 CLARK 1 FORD 1 SMITH 0 TURNER 0 MILLER 0 JAMES 0 ADAMS 0 KING 0 ALLEN 0 MARTIN 0 WARD 0
The last step is to find the root nodes. A root node is defined as an employee who is a manager but who does not work for anyone else. In table EMP, only KING is a root node. Scalar subquery IS_ROOT is shown below:
select e.ename,
(select sign(count(*)) from emp d
where d.empno = e.empno
and d.mgr is null) as is_root
from emp e
order by 2 desc
ENAME IS_ROOT ---------- --------- KING 1 SMITH 0 ALLEN 0 WARD 0 JONES 0 TURNER 0 JAMES 0 MILLER 0 FORD 0 ADAMS 0 MARTIN 0 BLAKE 0 CLARK 0 SCOTT 0
Because EMP is a small 14-row table, it is easy to see that employee KING is the only root node, so in this case taking the SIGN of the COUNT(*) operation is not strictly necessary. If there can be multiple root nodes, then you can use SIGN, or you can use a one-row table in the scalar subquery as is shown earlier for IS_BRANCH and IS_LEAF.
For those of you on versions of Oracle prior to Oracle Database 10g, you can follow the discussion for the other RDBMSs, as that solution will work (without modifications) in Oracle. If you are on Oracle Database 10g or later, you may want to take advantage of two functions to make identifying root and leaf nodes a simple task: they are CONNECT_BY_ROOT and CONNECT_BY_ISLEAF, respectively. As of the time of this writing, it is necessary to use CONNECT BY in your SQL statement in order for you to be able to use CONNECT_BY_ROOT and CONNECT_BY_ISLEAF. The first step is to find the leaf nodes by using CONNECT_BY_ISLEAF as follows:
select ename,
connect_by_isleaf is_leaf
from emp
start with mgr is null
connect by prior empno = mgr
order by 2 desc
ENAME IS_LEAF ---------- ---------- ADAMS 1 SMITH 1 ALLEN 1 TURNER 1 MARTIN 1 WARD 1 JAMES 1 MILLER 1 KING 0 JONES 0 BLAKE 0 CLARK 0 FORD 0 SCOTT 0
The next step is to use a scalar subquery to find the branch nodes. Branch nodes are employees who are managers but who also work for someone else:
select ename,
(select count(*) from emp e
where e.mgr = emp.empno
and emp.mgr is not null
and rownum = 1) is_branch
from emp
start with mgr is null
connect by prior empno = mgr
order by 2 desc
ENAME IS_BRANCH ---------- ---------- JONES 1 SCOTT 1 BLAKE 1 FORD 1 CLARK 1 KING 0 MARTIN 0 MILLER 0 JAMES 0 TURNER 0 WARD 0 ADAMS 0 ALLEN 0 SMITH 0
The filter on ROWNUM is necessary to ensure that you return a count of 1 or 0, and nothing else.
The last step is to identify the root nodes by using the function CONNECT_BY_ROOT. The solution finds the ENAME for the root node and compares it with all the rows returned by the query. If there is a match, that row is the root node:
select ename,
decode(ename,connect_by_root(ename),1,0) is_root
from emp
start with mgr is null
connect by prior empno = mgr
order by 2 desc
ENAME IS_ROOT ---------- ---------- KING 1 JONES 0 SCOTT 0 ADAMS 0 FORD 0 SMITH 0 BLAKE 0 ALLEN 0 WARD 0 MARTIN 0 TURNER 0 JAMES 0 CLARK 0 MILLER 0
If using Oracle9i Database or later, you can use the SYS_CONNECT_BY_PATH function as an alternative to CONNECT_BY_ROOT. The Oracle9i Database version of the preceding would be:
select ename,
decode(substr(root,1,instr(root,',')-1),NULL,1,0) root
from (
select ename,
ltrim(sys_connect_by_path(ename,','),',') root
from emp
start with mgr is null
connect by prior empno=mgr
)
ENAME ROOT ---------- ---- KING 1 JONES 0 SCOTT 0 ADAMS 0 FORD 0 SMITH 0 BLAKE 0 ALLEN 0 WARD 0 MARTIN 0 TURNER 0 JAMES 0 CLARK 0 MILLER 0
The SYS_CONNECT_BY_PATH function rolls up a hierarchy starting from the root value as is shown below:
select ename,
ltrim(sys_connect_by_path(ename,','),',') path
from emp
start with mgr is null
connect by prior empno=mgr
ENAME PATH ---------- ---------------------------- KING KING JONES KING,JONES SCOTT KING,JONES,SCOTT ADAMS KING,JONES,SCOTT,ADAMS FORD KING,JONES,FORD SMITH KING,JONES,FORD,SMITH BLAKE KING,BLAKE ALLEN KING,BLAKE,ALLEN WARD KING,BLAKE,WARD MARTIN KING,BLAKE,MARTIN TURNER KING,BLAKE,TURNER JAMES KING,BLAKE,JAMES CLARK KING,CLARK MILLER KING,CLARK,MILLER
To get the root row, simply substring out the first ENAME in PATH:
select ename,
substr(root,1,instr(root,',')-1) root
from (
select ename,
ltrim(sys_connect_by_path(ename,','),',') root
from emp
start with mgr is null
connect by prior empno=mgr
)
ENAME ROOT ---------- ---------- KING JONES KING SCOTT KING ADAMS KING FORD KING SMITH KING BLAKE KING ALLEN KING WARD KING MARTIN KING TURNER KING JAMES KING CLARK KING MILLER KING
The last step is to flag the result from the ROOT column if it is NULL; that is your root row.
3.21.246.123