Chapter 13. Hierarchical Queries

This chapter introduces recipes for expressing hierarchical relationships that you may have in your data. It is typical when working with hierarchical data to have more difficulty retrieving and displaying the data (as a hierarchy) than storing it. This is particularly true because of the inflexibility of SQL (SQL’s nonrecursive nature). When working with hierarchical queries, it is absolutely crucial that you take advantage of what your RDBMS supplies you to facilitate these operations; otherwise you will end up writing potentially less efficient queries and constructing convoluted data models to deal with the hierarchical data. For PostgreSQL users, the recursive WITH clause will most likely be added to later versions PostgreSQL, so it would behoove you to pay attention to the DB2 solutions to these queries.

This chapter will provide recipes to help you unravel the hierarchical structure of your data by taking advantage of the functions supplied by each of the RDBMSs. Before starting, examine table EMP and the hierarchical relationship between EMPNO and MGR:

	select empno,mgr
	  from emp
	order by 2

	     EMPNO        MGR
	---------- ----------
	      7788       7566
	      7902       7566
	      7499       7698
	      7521       7698
	      7900       7698
	      7844       7698
	      7654       7698
	      7934       7782
	      7876       7788
	      7566       7839
	      7782       7839
	      7698       7839
	      7369       7902
	      7839

If you look carefully, you will see that each value for MGR is also an EMPNO, meaning the manager of each employee in table EMP is also an employee in table EMP and not stored somewhere else. The relationship between MGR and EMPNO is a parent–child relationship in that the value for MGR is the most immediate parent for a given EMPNO (it is also possible that the manager for a specific employee can have a manager herself, and those managers can in turn have managers, and so on, creating an n-tier hierarchy). If an employee has no manager, then MGR is NULL.

13.1. Expressing a Parent-Child Relationship

Problem

You want to include parent information along with data from child records. For example, you want to display each employee’s name along with the name of his manager. You want to return the following result set:

	EMPS_AND_MGRS
	------------------------------
	FORD works for JONES
	SCOTT works for JONES
	JAMES works for BLAKE
	TURNER works for BLAKE
	MARTIN works for BLAKE
	WARD works for BLAKE
	ALLEN works for BLAKE
	MILLER works for CLARK
	ADAMS works for SCOTT
	CLARK works for KING
	BLAKE works for KING
	JONES works for KING
	SMITH works for FORD

Solution

Self join EMP on MGR and EMPNO to find the name of each employee’s manager. Then use your RDBMS’s supplied function(s) for string concatenation to generate the strings in the desired result set.

DB2, Oracle, and PostgreSQL

Self join on EMP. Then use the double vertical-bar (||) concatenation operator:

	1 select a.ename || ' works for ' || b.ename as emps_and_mgrs
	2   from emp a, emp b
	3  where a.mgr = b.empno

MySQL

Self join on EMP. Then use the concatenation function CONCAT:

	1 select concat(a.ename, ' works for ',b.ename) as emps_and_mgrs
	2   from emp a, emp b
	3  where a.mgr = b.empno

SQL Server

Self join on EMP. Then use the plus sign (+) as the concatenation operator:

	1 select a.ename + ' works for ' + b.ename as emps_and_mgrs
	2   from emp a, emp b
	3  where a.mgr = b.empno

Discussion

The implementation is essentially the same for all the solutions. The difference lies only in the method of string concatenation, and thus one discussion will cover all of the solutions.

The key is the join between MGR and EMPNO. The fist step is to build a Cartesian product by joining EMP to itself (only a portion of the rows returned by the Cartesian product is shown below):

	select a.empno, b.empno
	  from emp a, emp b

	EMPNO        MGR
	----- ----------
	 7369       7369
	 7369       7499
	 7369       7521
	 7369       7566
	 7369       7654
	 7369       7698
	 7369       7782
	 7369       7788
	 7369       7839
	 7369       7844
	 7369       7876
	 7369       7900
	 7369       7902
	 7369       7934
	 7499       7369
	 7499       7499
	 7499       7521
	 7499       7566
	 7499       7654
	 7499       7698
	 7499       7782
	 7499       7788
	 7499       7839
	 7499       7844
	 7499       7876
	 7499       7900
	 7499       7902
	 7499       7934

As you can see, by using a Cartesian product you are returning every possible EMPNO/EMPNO combination (such that it looks like the manager for EMPNO 7369 is all the other employees in the table, including EMPNO 7369).

The next step is to filter the results such that you return only each employee and his manager’s EMPNO. Accomplish this by joining on MGR and EMPNO:

	1 select a.empno, b.empno mgr
	2   from emp a, emp b
	3  where a.mgr = b.empno

	     EMPNO        MGR
	---------- ----------
	      7902       7566
	      7788       7566
	      7900       7698
	      7844       7698
	      7654       7698
	      7521       7698
	      7499       7698
	      7934       7782
	      7876       7788
	      7782       7839
	      7698       7839
	      7566       7839
	      7369       7902

Now that you have each employee and the EMPNO of his manager, you can return the name of each manager by simply selecting B.ENAME rather than B.EMPNO. If after some practice you have difficulty grasping how this works, you can use a scalar subquery rather than a self join to get the answer:

	select a.ename,
	       (select b.ename
	          from emp b
	         where b.empno = a.mgr) as mgr
	  from emp a

	ENAME      MGR
	---------- ----------
	SMITH      FORD
	ALLEN      BLAKE
	WARD       BLAKE
	JONES      KING
	MARTIN     BLAKE
	BLAKE      KING
	CLARK      KING
	SCOTT      JONES
	KING
	TURNER     BLAKE
	ADAMS      SCOTT
	JAMES      BLAKE
	FORD       JONES
	MILLER     CLARK

The scalar subquery version is equivalent to the self join, except for one row: employee KING is in the result set, but that is not the case with the self join. “Why not?” you might ask. Remember, NULL is never equal to anything, not even itself. In the self-join solution, you use an equi-join between EMPNO and MGR, thus filtering out any employees who have NULL for MGR. To see employee KING when using the self-join method, you must outer join as shown in the following two queries. The first solution uses the ANSI outer join while the second uses the Oracle outer-join syntax. The output is the same for both and is shown following the second query:

	/* ANSI */
	select a.ename, b.ename mgr
	  from emp a left join emp b
	    on (a.mgr = b.empno)

	/* Oracle */
	select a.ename, b.ename mgr
	  from emp a, emp b
	 where a.mgr = b.empno (+)

	ENAME      MGR
	---------- ----------
	FORD       JONES
	SCOTT      JONES
	JAMES      BLAKE
	TURNER     BLAKE
	MARTIN     BLAKE
	WARD       BLAKE
	ALLEN      BLAKE
	MILLER     CLARK
	ADAMS      SCOTT
	CLARK      KING
	BLAKE      KING
	JONES      KING
	SMITH      FORD
	KING

13.2. Expressing a Child-Parent-Grandparent Relationship

Problem

Employee CLARK works for KING and to express that relationship you can use the first recipe in this chapter. What if employee CLARK was in turn a manager for another employee? Consider the following query:

	select ename,empno,mgr
	  from emp
	 where ename in ('KING','CLARK','MILLER')

	ENAME        EMPNO     MGR
	--------- -------- -------
	CLARK         7782    7839
	KING          7839
	MILLER        7934    7782

As you can see, employee MILLER works for CLARK who in turn works for KING. You want to express the full hierarchy from MILLER to KING. You want to return the following result set:

	LEAF___BRANCH___ROOT
	---------------------
	MILLER-->CLARK-->KING

However, the single self-join approach from the previous recipe will not suffice to show the entire relationship from top to bottom. You could write a query that does two self joins, but what you really need is a general approach for traversing such hierarchies.

Solution

This recipe differs from the first recipe because there is now a three-tier relationship, as the title suggests. If your RDBMS does not supply functionality for traversing tree-structured data, then you can solve this problem using the techniques described for PostgreSQL and MySQL, but you must add an additional self join. DB2, SQL Server, and Oracle offer functions for expressing hierarchies. Thus self joins on those RDBMSs aren’t necessary, though they certainly work.

DB2 and SQL Server

Use the recursive WITH clause to find MILLER’s manager, CLARK, then CLARK’s manager, KING. The SQL Server string concatenation operator + is used in this solution:

	1    with  x (tree,mgr,depth)
	2      as  (
	3  select  cast(ename as varchar(100)),
	4          mgr, 0
	5    from  emp
	6   where  ename = 'MILLER'
	7   union  all
	8  select  cast(x.tree+'-->'+e.ename as varchar(100)),
	9          e.mgr, x.depth+1
	10   from  emp e, x
	11  where x.mgr = e.empno
	12 )
	13 select tree leaf___branch___root
	14   from x
	15  where depth = 2

The only modification necessary for this solution to work on DB2 is to use DB2’s concatenation operator, ||. Otherwise, the solution will work as is, on DB2 as well as SQL Server.

Oracle

Use the function SYS_CONNECT_BY_PATH to return MILLER, MILLER’s manager, CLARK, then CLARK’s manager, KING. Use the CONNECT BY clause to walk the tree:

	1  select ltrim(
	2           sys_connect_by_path(ename,'-->'),
	3         '-->') leaf___branch___root
	4    from emp
	5   where level = 3
	6   start with ename = 'MILLER'
	7 connect by prior mgr = empno

PostgreSQL and MySQL

Self join on table EMP twice to return MILLER, MILLER’s manager, CLARK, then CLARK’s manager, KING. The following solution uses PostgreSQL’s concatenation operator, the double vertical-bar (||):

	1  select a.ename||'-->'||b.ename
	2                ||'-->'||c.ename as leaf___branch___root
	3    from emp a, emp b, emp c
	4   where a.ename = 'MILLER'
	5     and a.mgr = b.empno
	6     and b.mgr = c.empno

For MySQL users, simply use the CONCAT function; this solution will work for PostgreSQL as well.

Discussion

DB2 and SQL Server

The approach here is to start at the leaf node and walk your way up to the root (as useful practice, try walking in the other direction). The upper part of the UNION ALL simply finds the row for employee MILLER (the leaf node). The lower part of the UNION ALL finds the employee who is MILLER’s manager, then finds that person’s manager, and this process of finding the “manager’s manager” repeats until processing stops at the highest-level manager (the root node). The value for DEPTH starts at 0 and increments automatically by 1 each time a manager is found. DEPTH is a value that DB2 maintains for you when you execute a recursive query.

Tip

For an interesting and in-depth introduction to the WITH clause with focus on its use recursively, see Jonathan Gennick’s article "Understanding the WITH Clause” at http://gennick.com/with.htm.

Next, the second query of the UNION ALL joins the recursive view X to table EMP, to define the parent–child relationship. The query at this point, using SQL Server’s concatenation operator, is as follows:

	  with x (tree,mgr,depth)
	    as (
	select cast(ename as varchar(100)),
	       mgr, 0
	  from emp
	 where ename = 'MILLER'
	 union all
	select cast(x.tree+'-->'+e.ename as varchar(100)),
	       e.mgr, x.depth+1
	  from emp e, x
	 where x.mgr = e.empno
	)
	select tree leaf___branch___root
	  from x

	TREE            DEPTH
	---------- ----------
	MILLER              0
	CLARK               1
	KING                2

At this point, the heart of the problem has been solved; starting from MILLER, return the full hierarchical relationship from bottom to top. What’s left then is merely formatting. Since the tree traversal is recursive, simply concatenate the current ENAME from EMP to the one before it, which gives you the following result set:

	  with x (tree,mgr,depth)
	    as (
	select  cast(ename as varchar(100)),
	        mgr, 0
	  from emp
	 where ename = 'MILLER'
	 union all
	select cast(x.tree+'-->'+e.ename as varchar(100)),
	       e.mgr, x.depth+1
	  from emp e, x
	 where x.mgr = e.empno
	)
	select depth, tree
	  from x

	DEPTH TREE
	----- ---------------------------
	    0 MILLER
	    1 MILLER-->CLARK
	    2 MILLER-->CLARK-->KING

The final step is to keep only the last row in the hierarchy. There are several ways to do this, but the solution uses DEPTH to determine when the root is reached (obviously, if CLARK has a manager other than KING, the filter on DEPTH would have to change; for a more generic solution that requires no such filter, see the next recipe).

Oracle

The CONNECT BY clause does all the work in the Oracle solution. Starting with MILLER, you walk all the way to KING without the need for any joins. The expression in the CONNECT BY clause defines the relationship of the data and how the tree will be walked:

	 select ename
	   from emp
	  start with ename = 'MILLER'
	connect by prior mgr = empno

	ENAME
	--------
	MILLER
	CLARK
	KING

The keyword PRIOR lets you access values from the previous record in the hierarchy. Thus, for any given EMPNO you can use PRIOR MGR to access that employee’s manager number. When you see a clause such as CONNECT BY PRIOR MGR = EMPNO, think of that clause as expressing a join between, in this case, parent and child.

Tip

For more on CONNECT BY and related features, see the following Oracle Technology Network articles: “Querying Hierarchies: Top-of-the-Line Support” at http://www.oracle.com/technology/oramag/webcolumns/2003/techarticles/gennick_connectby.html, and “New CONNECT BY Features in Oracle Database 10g"at http://www.oracle.com/technology/oramag/webcolumns/2003/techarticles/gennick_connectby_10g.html.

At this point you have successfully displayed the full hierarchy starting from MILLER and ending at KING. The problem is for the most part solved. All that remains is the formatting. Use the function SYS_CONNECT_BY_PATH to append each ENAME to the one before it:

	 select sys_connect_by_path(ename,'-->') tree
	   from emp
	  start with ename = 'MILLER'
	connect by prior mgr = empno

	TREE
	---------------------------
	-->MILLER
	-->MILLER-->CLARK
	-->MILLER-->CLARK-->KING

Because you are interested in only the complete hierarchy, you can filter on the pseudo-column LEVEL (a more generic approach is shown in the next recipe):

	 select sys_connect_by_path(ename,'-->') tree
	   from emp
	  where level = 3
	  start with ename = 'MILLER'
	connect by prior mgr = empno

	TREE
	---------------------------
	-->MILLER-->CLARK-->KING

The final step is to use the LTRIM function to remove the leading “-->” from the result set.

PostgreSQL and MySQL

Without built-in support for hierarchical queries, you must self join n times to return the whole tree (where n is the number of nodes between the leaf and the root, including the root itself; in this example, relative to MILLER, CLARK is a branch node and KING is the root node, so the distance is two nodes, and n = 2). This solution simply uses the technique from the previous recipe and adds one more self join:

	select a.ename as leaf,
	       b.ename as branch,
	       c.ename as root
	  from emp a, emp b, emp c
	 where a.ename = 'MILLER'
	   and a.mgr = b.empno
	   and b.mgr = c.empno

	LEAF      BRANCH      ROOT
	--------- ----------  -----
	MILLER    CLARK       KING

The next and last step is to format the output using the || concatenation operator for PostgreSQL or the CONCAT function for MySQL. The drawback to this kind of query is that if the hierarchy changes—for example, if there is another node between CLARK and KING—the query would need to have yet another join to return the whole tree. This is why it is such an advantage to have and use built-in functions for hierarchies.

13.3. Creating a Hierarchical View of a Table

Problem

You want to return a result set that describes the hierarchy of an entire table. In the case of the EMP table, employee KING has no manager, so KING is the root node. You want to display, starting from KING, all employees under KING and all employees (if any) under KING’s subordinates. Ultimately, you want to return the following result set:

	EMP_TREE
	------------------------------
	KING
	KING - BLAKE
	KING - BLAKE - ALLEN
	KING - BLAKE - JAMES
	KING - BLAKE - MARTIN
	KING - BLAKE - TURNER
	KING - BLAKE - WARD
	KING - CLARK
	KING - CLARK - MILLER
	KING - JONES
	KING - JONES - FORD
	KING - JONES - FORD - SMITH
	KING - JONES - SCOTT
	KING - JONES - SCOTT - ADAMS

Solution

DB2 and SQL Server

Use the recursive WITH clause to start building the hierarchy at KING and then ultimately display all the employees. The solution following uses the DB2 concatenation operator “||”. SQL Server users use the concatenation operator +. Other than the concatenation operators, the solution will work as-is on both RDBMSs:

	 1   with x (ename,empno)
	 2      as (
	 3  select cast(ename as varchar(100)),empno
	 4    from emp
	 5   where mgr is null
 	 6   union all
	 7  select cast(x.ename||' - '||e.ename as varchar(100)),
	 8         e.empno
	 9    from emp e, x
	10   where e.mgr = x.empno
	11  )
	12  select ename as emp_tree
	13    from x
	14   order by 1

Oracle

Use the CONNECT BY function to define the hierarchy. Use SYS_CONNECT_BY_PATH function to format the output accordingly:

	1  select ltrim(
	2           sys_connect_by_path(ename,' - '),
	3         ' - ') emp_tree
	4    from emp
	5    start with mgr is null
	6  connect by prior empno=mgr
	7    order by 1

This solution differs from that in the previous recipe in that it includes no filter on the LEVEL pseudo-column. Without the filter, all possible trees (where PRIOR EMPNO=MGR) are displayed.

PostgreSQL

Use three UNIONs and multiple self joins:

	 1  select emp_tree
	 2    from  (
	 3  select ename as emp_tree
	 4    from  emp
	 5   where mgr is null
	 6  union
	 7  select a.ename||' - '||b.ename
	 8    from emp a
	 9         join
	10         emp b on (a.empno=b.mgr)
	11   where a.mgr is null
	12  union
	13  select rtrim(a.ename||' - '||b.ename
	14                      ||' - '||c.ename,' - ')
	15    from emp a
	16         join
	17         emp b on (a.empno=b.mgr)
	18         left join
	19         emp c on (b.empno=c.mgr)
	20   where a.ename = 'KING'
	21  union
	22  select rtrim(a.ename||' - '||b.ename||' - '||
	23               c.ename||' - '||d.ename,' - ')
	24    from emp a
	25         join
	26         emp b on (a.empno=b.mgr)
	27         join
	28         emp c on (b.empno=c.mgr)
	29         left join
	30         emp d on (c.empno=d.mgr)
	31   where a.ename = 'KING'
	32         ) x
	33   where tree is not null
	34   order by 1

MySQL

Use three UNIONs and multiple self joins:

	 1  select emp_tree
	 2    from (
	 3  select ename as emp_tree
	 4    from emp
	 5   where mgr is null
	 6  union
	 7  select concat(a.ename,' - ',b.ename)
	 8    from emp a
	 9         join
	10         emp b on (a.empno=b.mgr)
	11   where a.mgr is null
	12  union
	13  select concat(a.ename,' - ',
	14                b.ename,' - ',c.ename)
	15    from emp a
	16         join
	17         emp b on (a.empno=b.mgr)
	18         left join
	19         emp c on (b.empno=c.mgr)
	20   where a.ename = 'KING'
	21  union
	22  select concat(a.ename,' - ',b.ename,' - ',
	23                c.ename,' - ',d.ename)
	24    from emp a
	25         join
	26         emp b on (a.empno=b.mgr)
	27         join
	28         emp c on (b.empno=c.mgr)
	29         left join
	30         emp d on (c.empno=d.mgr)
	31   where a.ename = 'KING'
	32         ) x
	33   where tree is not null
	34   order by 1

Discussion

DB2 and SQL Server

The first step is to identify the root row (employee KING) in the upper part of the UNION ALL in the recursive view X. The next step is to find KING’s subordinates, and their subordinates if there are any, by joining recursive view X to table EMP. Recursion will continue until you’ve returned all employees. Without the formatting you see in the final result set, the result set returned by the recursive view X is shown below:

	with x (ename,empno)
	    as (
	select cast(ename as varchar(100)),empno
	  from emp
	 where mgr is null
	 union all
	select cast(e.ename as varchar(100)),e.empno
	  from emp e, x
	 where e.mgr = x.empno
	 )
	 select ename emp_tree
	   from x

	 EMP_TREE
	 ----------------
	 KING
	 JONES
	 SCOTT
	 ADAMS
	 FORD
	 SMITH
	 BLAKE
	 ALLEN
	 WARD
	 MARTIN
	 TURNER
	 JAMES
	 CLARK
	 MILLER

All the rows in the hierarchy are returned (which can be useful), but without the formatting you cannot tell who the managers are. By concatenating each employee to her manager, you return more meaningful output. Produce the desired output simply by using

	cast(x.ename+','+e.ename as varchar(100))

in the SELECT clause of the lower portion of the UNION ALL in recursive view X.

The WITH clause is extremely useful in solving this type of problem, because the hierarchy can change (for example, leaf nodes become branch nodes) without any need to modify the query.

Oracle

The CONNECT BY clause returns the rows in the hierarchy. The START WITH clause defines the root row. If you run the solution without SYS_CONNECT_BY_PATH, you can see that the correct rows are returned (which can be useful), but not formatted to express the relationship of the rows:

	select ename emp_tree
	  from emp
	 start with mgr is null
	connect by prior empno = mgr

	EMP_TREE
	-----------------
	KING
	JONES
	SCOTT
	ADAMS
	FORD
	SMITH
	BLAKE
	ALLEN
	WARD
	MARTIN
	TURNER
	JAMES
	CLARK
	MILLER

By using the pseudo-column LEVEL and the function LPAD, you can see the hierarchy more clearly, and you can ultimately see why SYS_CONNECT_BY_PATH returns the results that you see in the desired output shown earlier:

	select lpad('.',2*level,'.')||ename emp_tree
	   from emp
	  start with mgr is null
	connect by prior empno = mgr

	EMP_TREE
	-----------------
	..KING
	....JONES
	......SCOTT
	........ADAMS
	......FORD
	........SMITH
	....BLAKE
	......ALLEN
	......WARD
	......MARTIN
	......TURNER
	......JAMES
	....CLARK
	......MILLER

The indentation in this output indicates who the managers are by nesting subordinates under their superiors. For example, KING works for no one. JONES works for KING. SCOTT works for JONES. ADAMS works for SCOTT.

If you look at the corresponding rows from the solution when using SYS_CONNECT_BY_PATH, you will see that SYS_CONNECT_BY_PATH rolls up the hierarchy for you. When you get to a new node, you see all the prior nodes as well:

	KING
	KING - JONES
	KING - JONES - SCOTT
	KING - JONES - SCOTT - ADAMS

Tip

If you are on Oracle8i Database or earlier, you can use the PostgreSQL solution to this problem. Alternatively, because CONNECT BY is available on older versions of Oracle, you can simply use LEVEL and RPAD/ LPAD for formatting (although to reproduce the output created by SYS_CONNECT_BY_PATH would require a bit more work).

PostgreSQL and MySQL

With the exception of string concatenation in the SELECT clauses, the solutions are the same for both PostgreSQL and MySQL. The first step is to determine the maximum number of nodes for any one branch. You have to do this manually, before you write the query. If you examine the data in the EMP table, you will see that employees ADAM and SMITH are the leaf nodes at the greatest depth (you may wish to look at the Oracle discussion where you’ll find a nicely formatted tree of the EMP hierarchy). If you look at employee ADAMS, you see that ADAMS works for SCOTT who in turn works for JONES who in turn works for KING, so the depth is 4. To be able to express a hierarchy with a depth of four, you must self join four instances of table EMP, and you must write a four-part UNION query. The results of the four-way self join (which is the lower part of the last UNION, from top to bottom) is shown below (using PostgreSQL syntax; MySQL users, simply substitute “||” for the CONCAT function call):

	select rtrim(a.ename||' - '||b.ename||' - '||
	             c.ename||' - '||d.ename,' - ') as max_depth_4
	  from emp a
	       join
	       emp b on (a.empno=b.mgr)
	       join
	       emp c on (b.empno=c.mgr)
	       left join
	       emp d on (c.empno=d.mgr)
	 where a.ename = 'KING'

	MAX_DEPTH_4
	-----------------------------
	KING - JONES - FORD - SMITH
	KING - JONES - SCOTT - ADAMS
	KING - BLAKE - TURNER
	KING - BLAKE - ALLEN
	KING - BLAKE - WARD
	KING - CLARK - MILLER
	KING - BLAKE - MARTIN
	KING - BLAKE - JAMES

The filter on A.ENAME is necessary to ensure that the root row is KING and no other employee. If you look at the result set above and compare it with the final result set, you’ll see that there are some three-deep hierarchies not returned: KING - JONES - FORD and KING - JONES - SCOTT. To include those rows in the final result set, you need to write another query similar to the one above, but with one less join (self joining only three instances of table EMP rather than four). The result set of this query is shown below:

	select rtrim(a.ename||' - '||b.ename
	                    ||' - '||c.ename,' - ') as max_depth_3
	  from emp a
	       join
	       emp b on (a.empno=b.mgr)
	       left join
	       emp c on (b.empno=c.mgr)
	 where a.ename = 'KING'

	MAX_DEPTH_3
	---------------------------
	KING - BLAKE - ALLEN
	KING - BLAKE - WARD
	KING - BLAKE - MARTIN
	KING - JONES - SCOTT
	KING - BLAKE - TURNER
	KING - BLAKE - JAMES
	KING - JONES - FORD
	KING - CLARK - MILLER

Like the query before it, the filter on A.ENAME is necessary to ensure the root row node is KING. You’ll notice some overlapping rows between the query above and the four-way EMP join. To get rid of the redundant rows, simply UNION the two queries:

	select rtrim(a.ename||' - '||b.ename
	                    ||' - '||c.ename,' - ') as partial_tree
	  from emp a
	       join
	       emp b on (a.empno=b.mgr)
	       left join
	       emp c on (b.empno=c.mgr)
	 where a.ename = 'KING'
	union
	select rtrim(a.ename||' - '||b.ename||' - '||
	             c.ename||' - '||d.ename,' - ')
	  from emp a
	       join
	       emp b on (a.empno=b.mgr)
	       join
	       emp c on (b.empno=c.mgr)
	       left join
	       emp d on (c.empno=d.mgr)
	 where a.ename = 'KING'

	PARTIAL_TREE
	-------------------------------
	KING - BLAKE - ALLEN
	KING - BLAKE - JAMES
	KING - BLAKE - MARTIN
	KING - BLAKE - TURNER
	KING - BLAKE - WARD
	KING - CLARK - MILLER
	KING - JONES - FORD
	KING - JONES - FORD - SMITH
	KING - JONES - SCOTT
	KING - JONES - SCOTT - ADAMS

At this point the tree is almost complete. The next step is to return rows that represent a two-deep hierarchy with KING as the root node (i.e., employees who work directly for KING). The query to return those rows is shown below:

	select a.ename||' - '||b.ename as max_depth_2
	  from emp a
	       join
	       emp b on (a.empno=b.mgr)
	 where a.mgr is null

	MAX_DEPTH_2
	---------------
	KING - JONES
	KING - BLAKE
	KING - CLARK

The next step is to UNION the above query, to the PARTIAL_TREE union:

	select a.ename||' - '||b.ename as partial_tree
	  from emp a
	       join
	       emp b on (a.empno=b.mgr)
	 where a.mgr is null
	union
	select rtrim(a.ename||' - '||b.ename
	                    ||' - '||c.ename,' - ')
	  from emp a
	       join
	       emp b on (a.empno=b.mgr)
	       left join
	       emp c on (b.empno=c.mgr)
	 where a.ename = 'KING'
	union
	select rtrim(a.ename||' - '||b.ename||' - '||
	             c.ename||' - '||d.ename,' - ')
	  from emp a
	       join
	       emp b on (a.empno=b.mgr)
	       join
	       emp c on (b.empno=c.mgr)
	       left join
	       emp d on (c.empno=d.mgr)
	 where a.ename = 'KING'

	PARTIAL_TREE
	----------------------------------
	KING - BLAKE
	KING - BLAKE - ALLEN
	KING - BLAKE - JAMES
	KING - BLAKE - MARTIN
	KING - BLAKE - TURNER
	KING - BLAKE - WARD
	KING - CLARK
	KING - CLARK - MILLER
	KING - JONES
	KING - JONES - FORD
	KING - JONES - FORD - SMITH
	KING - JONES - SCOTT
	KING - JONES - SCOTT - ADAMS

The final step is to UNION KING to the top of PARTIAL_TREE to return the desired result set.

13.4. Finding All Child Rows for a Given Parent Row

Problem

You want to find all the employees who work for JONES, either directly or indirectly (i.e., they work for someone who works for JONES). The list of employees under JONES is shown below (JONES is included in the result set):

	ENAME
	----------
	JONES
	SCOTT
	ADAMS
	FORD
	SMITH

Solution

Being able to move to the absolute top or bottom of a tree is extremely useful. For this solution there is no special formatting necessary. The goal is to simply return all employees who work under employee JONES, including JONES himself. This type of query really shows the usefulness of recursive SQL extensions like Oracle’s CONNECT BY and SQL Server’s/DB2’s WITH clause.

DB2 and SQL Server

Use the recursive WITH clause to find all employees under JONES. Begin with JONES by specifying WHERE ENAME = ‘JONES’ in the first of the two union queries:

	 1   with x (ename,empno)
	 2     as (
	 3 select ename,empno
	 4   from emp
	 5  where ename = 'JONES'
	 6  union all
	 7 select e.ename, e.empno
	 8   from emp e, x
	 9  where x.empno = e.mgr
	10 )
	11 select ename
	12   from x

Oracle

Use the CONNECT BY clause and specify START WITH ENAME = ‘JONES’ to find all the employees under JONES:

	1 select ename
	2   from emp
	3  start with ename = 'JONES'
	4 connect by prior empno = mgr

PostgreSQL and MySQL

You must know in advance how many nodes there are in the tree. The following queries show how to determine the depth of the hierarchy:

	/* find JONES' EMPNO */
	select ename,empno,mgr
	  from emp
	 where ename = 'JONES'

	ENAME            EMPNO       MGR
	---------- ----------- ---------
	JONES             7566      7839

	
	/* are there any employees who work directly under JONES? */
	select count(*)
	  from emp
	 where mgr = 7566

	  COUNT(*)
	 ---------
	         2

	
	/* there are two employees under JONES, find their EMPNOs */
	select ename,empno,mgr
	  from emp
	 where mgr = 7566

	ENAME            EMPNO         MGR
	---------- ----------- -----------
	SCOTT             7788        7566
	FORD              7902        7566

	
	/* are there any employees under SCOTT or FORD? */
	select count(*)
	  from emp
	 where mgr in (7788,7902)

	  COUNT(*)
	 ---------
	         2

	
	/* there are two employees under SCOTT or FORD, find their EMPNOs */
	select ename,empno,mgr
	  from emp
	 where mgr in (7788,7902)

	ENAME           EMPNO      MGR
	--------- ----------- --------
	SMITH            7369     7902
	ADAMS            7876     7788

	
	/* are there any employees under SMITH or ADAMS? */
	select count(*)
	  from emp
	 where mgr in (7369,7876)

	  COUNT(*)
	----------     
	         0

The hierarchy starting from JONES ends with employees SMITH and ADAMS. That makes the hierarchy three levels deep. Now that you know the depth, you can begin to traverse the hierarchy from top to bottom.

First, self join table EMP twice. Then unpivot inline view X to transform three columns with two rows into one column with six rows (in PostgreSQL, you can use GENERATE_SERIES(1,6) as an alternative to querying the T100 pivot table):

	 1  select distinct
	 2         case t100.id
	 3              when 1 then root
	 4              when 2 then branch
	 5              else        leaf
	 6         end as JONES_SUBORDINATES
	 7   from (
	 8 select a.ename as root,
	 9        b.ename as branch,
	10        c.ename as leaf
	11   from emp a, emp b, emp c
	12  where a.ename = 'JONES'
	13    and a.empno = b.mgr
	14    and b.empno = c.mgr
	15        ) x,
	16        t100
	17  where t100.id <= 6

As an alternative, you can use views and UNION the results. If you create the following views:

	create view v1
	as
	select ename,mgr,empno
	  from emp
	 where ename = 'JONES'
	 create view v2
	 as
	 select ename,mgr,empno
	   from emp
	  where mgr = (select empno from v1)

	 create view v3
	 as
	 select ename,mgr,empno
	   from emp
	  where mgrin (select empno from v2)

the solution then becomes:

	select ename from v1
	 union
	select ename from v2
	 union
	select ename from v3

Discussion

DB2 and SQL Server

The recursive WITH clause makes this a relatively easy problem to solve. The first part of the WITH clause, the upper part of the UNION ALL, returns the row for employee JONES. You need to return ENAME to see the name and EMPNO so you can use it to join on. The lower part of the UNION ALL recursively joins EMP.MGR to X.EMPNO. The join condition will be applied until the result set is exhausted.

Oracle

The START WTH clause tells the query to make JONES the root node. The condition in the CONNECT BY clause drives the tree walk and will run until the condition is no longer true.

PostgreSQL and MySQL

The technique used here is the same as that of the second recipe in this chapter, “Expressing a Child-Parent-Grandparent Relationship.” A major drawback is that you must know in advance the depth of the hierarchy.

13.5. Determining Which Rows Are Leaf, Branch, or Root Nodes

Problem

You want to determine what type of node a given row is: a leaf, branch, or root. For this example, a leaf node is an employee who is not a manager. A branch node is an employee who is both a manager and also has a manager. A root node is an employee without a manager. You want to return 1 (TRUE) or 0 (FALSE) to reflect the status of each row in the hierarchy. You want to return the following result set:

	ENAME          IS_LEAF   IS_BRANCH     IS_ROOT
	----------  ----------  ----------  ----------
	KING                 0           0           1
	JONES                0           1           0
	SCOTT                0           1           0
	FORD                 0           1           0
	CLARK                0           1           0
	BLAKE                0           1           0
	ADAMS                1           0           0
	MILLER               1           0           0
	JAMES                1           0           0
	TURNER               1           0           0
	ALLEN                1           0           0
	WARD                 1           0           0
	MARTIN               1           0           0
	SMITH                1           0           0

Solution

It is important to realize that the EMP table is modeled in a tree hierarchy, not a recursive hierarchy, the value for MGR for root nodes is NULL. If EMP was modeled to use a recursive hierarchy, root nodes would be self-referencing (i.e., the value for MGR for employee KING would be KING’s EMPNO). I find self-referencing to be counterintuitive and thus am using NULL values for root nodes’ MGR. For Oracle users using CONNECT BY and DB2/SQL Server users using WITH, you’ll find tree hierarchies easier to work with and potentially more efficient than recursive hierarchies. If you are in a situation where you have a recursive hierarchy and are using CONNECT BY or WITH, watch out: you can end up with a loop in your SQL. You need to code around such loops if you are stuck with recursive hierarchies.

DB2, PostgreSQL, MySQL, and SQL Server

Use three scalar subqueries to determine the correct “Boolean” value (either a 1 or a 0) to return for each node type:

	 1 select e.ename,
	 2        (select sign(count(*)) from emp d
	 3          where 0 =
	 4            (select count(*) from emp f
	 5              where f.mgr = e.empno)) as is_leaf,
	 6        (select sign(count(*)) from emp d
	 7          where d.mgr = e.empno
	 8           and e.mgr is not null) as is_branch,
	 9        (select sign(count(*)) from emp d
	10          where d.empno = e.empno
	11            and d.mgr is null) as is_root
	12   from emp e
	13 order by 4 desc,3 desc

Oracle

The scalar subquery solution will work for Oracle as well, and should be used if you are on a version of Oracle prior to Oracle Database 10g. The following solution highlights built-in functions provided by Oracle (that were introduced in Oracle Database 10g) to identify root and leaf rows. The functions are CONNECT_BY_ROOT and CONNECT_BY_ISLEAF, respectively:

	 1  select ename,
	 2         connect_by_isleaf is_leaf,
	 3         (select count(*) from emp e
	 4           where e.mgr = emp.empno
	 5             and emp.mgr is not null
	 6             and rownum = 1) is_branch,
	 7         decode(ename,connect_by_root(ename),1,0) is_root
	 8    from emp
	 9   start with mgr is null
	10 connect by prior empno = mgr
	11 order by 4 desc, 3 desc

Discussion

DB2, PostgreSQL, MySQL, and SQL Server

This solution simply applies the rules defined in the “Problem” section to determine leaves, branches, and roots. The first step is to find determine whether an employee is a leaf node. If the employee is not a manager (no one works under her), then she is a leaf node. The first scalar subquery, IS_LEAF, is shown below:

	select e.ename,
	       (select sign(count(*)) from emp d
	         where 0 =
	           (select count(*) from emp f
	             where f.mgr = e.empno)) as is_leaf
	  from emp e
	order by 2 desc

	ENAME        IS_LEAF
	----------- --------
	SMITH              1
	ALLEN              1
	WARD               1
	ADAMS              1
	TURNER             1
	MARTIN             1
	JAMES              1
	MILLER             1
	JONES              0
	BLAKE              0
	CLARK              0
	FORD               0
	SCOTT              0
	KING               0

Because the output for IS_LEAF should be a 0 or 1, it is necessary to take the SIGN of the COUNT(*) operation. Otherwise you would get 14 instead of 1 for leaf rows. As an alternative, you can use a table with only one row to count against, because you only want to return 0 or 1. For example:

	select e.ename,
	       (select count(*) from t1 d
	         where not exists
	           (select null from emp f
	             where f.mgr = e.empno)) as is_leaf
	  from emp e
	order by 2 desc

	ENAME         IS_LEAF
	---------- ----------
	SMITH               1
	ALLEN               1
	WARD                1
	ADAMS               1
	TURNER              1
	MARTIN              1
	JAMES               1
	MILLER              1
	JONES               0
	BLAKE               0
	CLARK               0
	FORD                0
	SCOTT               0
	KING                0

The next step is to find branch nodes. If an employee is a manager (someone works for them), and they also happen to work for someone else, then the employee is a branch node. The results of the scalar subquery IS_BRANCH are shown below:

	select e.ename,
	       (select sign(count(*)) from emp d
	         where d.mgr = e.empno
	          and e.mgr is not null) as is_branch
	  from emp e
	order by 2 desc

	ENAME       IS_BRANCH
	----------- ---------
	JONES               1
	BLAKE               1
	SCOTT               1
	CLARK               1
	FORD                1
	SMITH               0
	TURNER              0
	MILLER              0
	JAMES               0
	ADAMS               0
	KING                0
	ALLEN               0
	MARTIN              0
	WARD                0

Again, it is necessary to take the SIGN of the COUNT(*) operation. Otherwise you will get (potentially) values greater than 1 when a node is a branch. Like scalar subquery IS_LEAF, you can use a table with one row to avoid using SIGN. The following solution uses a one-row table named dual:

	select e.ename,
	      (select count(*) from t1 t
	        where exists (
	         select null from emp f
	          where f.mgr = e.empno
	            and e.mgr is not null)) as is_branch
	  from emp e
	order by 2 desc

	ENAME            IS_BRANCH
	--------------- ----------
	JONES                    1
	BLAKE                    1
	SCOTT                    1
	CLARK                    1
	FORD                     1
	SMITH                    0
	TURNER                   0
	MILLER                   0
	JAMES                    0
	ADAMS                    0
	KING                     0
	ALLEN                    0
	MARTIN                   0
	WARD                     0

The last step is to find the root nodes. A root node is defined as an employee who is a manager but who does not work for anyone else. In table EMP, only KING is a root node. Scalar subquery IS_ROOT is shown below:

	select e.ename,
	       (select sign(count(*)) from emp d
	         where d.empno = e.empno
	           and d.mgr is null) as is_root
	  from emp e
	order by 2 desc

	ENAME         IS_ROOT
	----------  ---------
	KING                1
	SMITH               0
	ALLEN               0
	WARD                0
	JONES               0
	TURNER              0
	JAMES               0
	MILLER              0
	FORD                0
	ADAMS               0
	MARTIN              0
	BLAKE               0
	CLARK               0
	SCOTT               0

Because EMP is a small 14-row table, it is easy to see that employee KING is the only root node, so in this case taking the SIGN of the COUNT(*) operation is not strictly necessary. If there can be multiple root nodes, then you can use SIGN, or you can use a one-row table in the scalar subquery as is shown earlier for IS_BRANCH and IS_LEAF.

Oracle

For those of you on versions of Oracle prior to Oracle Database 10g, you can follow the discussion for the other RDBMSs, as that solution will work (without modifications) in Oracle. If you are on Oracle Database 10g or later, you may want to take advantage of two functions to make identifying root and leaf nodes a simple task: they are CONNECT_BY_ROOT and CONNECT_BY_ISLEAF, respectively. As of the time of this writing, it is necessary to use CONNECT BY in your SQL statement in order for you to be able to use CONNECT_BY_ROOT and CONNECT_BY_ISLEAF. The first step is to find the leaf nodes by using CONNECT_BY_ISLEAF as follows:

	select ename,
	        connect_by_isleaf is_leaf
	  from emp
	 start with mgr is null
	connect by prior empno = mgr
	order by 2 desc

	ENAME          IS_LEAF
	----------  ----------
	ADAMS                1
	SMITH                1
	ALLEN                1
	TURNER               1
	MARTIN               1
	WARD                 1
	JAMES                1
	MILLER               1
	KING                 0
	JONES                0
	BLAKE                0
	CLARK                0
	FORD                 0
	SCOTT                0

The next step is to use a scalar subquery to find the branch nodes. Branch nodes are employees who are managers but who also work for someone else:

	select ename,
	        (select count(*) from emp e
	          where e.mgr = emp.empno
	            and emp.mgr is not null
	            and rownum = 1) is_branch
	  from emp
	 start with mgr is null
	connect by prior empno = mgr
	order by 2 desc

	ENAME       IS_BRANCH
	---------- ----------
	JONES               1
	SCOTT               1
	BLAKE               1
	FORD                1
	CLARK               1
	KING                0
	MARTIN              0
	MILLER              0
	JAMES               0
	TURNER              0
	WARD                0
	ADAMS               0
	ALLEN               0
	SMITH               0

The filter on ROWNUM is necessary to ensure that you return a count of 1 or 0, and nothing else.

The last step is to identify the root nodes by using the function CONNECT_BY_ROOT. The solution finds the ENAME for the root node and compares it with all the rows returned by the query. If there is a match, that row is the root node:

	select ename,
	        decode(ename,connect_by_root(ename),1,0) is_root
	  from emp
	 start with mgr is null
	connect by prior empno = mgr
	order by 2 desc

	ENAME          IS_ROOT
	----------  ----------
	KING                 1
	JONES                0
	SCOTT                0
	ADAMS                0
	FORD                 0
	SMITH                0
	BLAKE                0
	ALLEN                0
	WARD                 0
	MARTIN               0
	TURNER               0
	JAMES                0
	CLARK                0
	MILLER               0

If using Oracle9i Database or later, you can use the SYS_CONNECT_BY_PATH function as an alternative to CONNECT_BY_ROOT. The Oracle9i Database version of the preceding would be:

	select ename,
	       decode(substr(root,1,instr(root,',')-1),NULL,1,0) root
	  from (
	select ename,
	       ltrim(sys_connect_by_path(ename,','),',') root
	  from emp
	start with mgr is null
	connect by prior empno=mgr
	      )

	ENAME       ROOT
	----------  ----
	KING           1
	JONES          0
	SCOTT          0
	ADAMS          0
	FORD           0
	SMITH          0
	BLAKE          0
	ALLEN          0
	WARD           0
	MARTIN         0
	TURNER         0
	JAMES          0
	CLARK          0
	MILLER         0

The SYS_CONNECT_BY_PATH function rolls up a hierarchy starting from the root value as is shown below:

	select ename,
	       ltrim(sys_connect_by_path(ename,','),',') path
	  from emp
	start with mgr is null
	connect by prior empno=mgr

	ENAME      PATH
	---------- ----------------------------
	KING       KING
	JONES      KING,JONES
	SCOTT      KING,JONES,SCOTT
	ADAMS      KING,JONES,SCOTT,ADAMS
	FORD       KING,JONES,FORD
	SMITH      KING,JONES,FORD,SMITH
	BLAKE      KING,BLAKE
	ALLEN      KING,BLAKE,ALLEN
	WARD       KING,BLAKE,WARD
	MARTIN     KING,BLAKE,MARTIN
	TURNER     KING,BLAKE,TURNER
	JAMES      KING,BLAKE,JAMES
	CLARK      KING,CLARK
	MILLER     KING,CLARK,MILLER

To get the root row, simply substring out the first ENAME in PATH:

	select ename,
	       substr(root,1,instr(root,',')-1) root
	  from (
	select ename,
	       ltrim(sys_connect_by_path(ename,','),',') root
	  from emp
	start with mgr is null
	connect by prior empno=mgr
	       )

	ENAME      ROOT
	---------- ----------
	KING
	JONES      KING
	SCOTT      KING
	ADAMS      KING
	FORD       KING
	SMITH      KING
	BLAKE      KING
	ALLEN      KING
	WARD       KING
	MARTIN     KING
	TURNER     KING
	JAMES      KING
	CLARK      KING
	MILLER     KING

The last step is to flag the result from the ROOT column if it is NULL; that is your root row.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
3.21.246.123