EXAMPLE 12: GROUP BY AND HAVING

As promised earlier, there’s a little more I want to say about the GROUP BY and HAVING clauses. Consider this query: “For each part supplied by no more than two suppliers, get the part number and city and the total quantity supplied of that part.” Here’s a possible logical (relational calculus) formulation:

     { PX.PNO , PX.CITY ,
                TPQ := SUM ( SPX.QTY WHERE SPX.PNO = PX.PNO , QTY ) }
                WHERE COUNT ( SPY WHERE SPY.PNO = PX.PNO ) ≤ 2

SQL formulation:

     SELECT PX.PNO , PX.CITY ,
                   ( SELECT COALESCE ( SUM ( SPX.QTY ) , 0 )
                     FROM   SP AS SPX
                     WHERE  SPX.PNO = PX.PNO ) AS TPQ
     FROM   P AS PX
     WHERE  ( SELECT COUNT ( * )
              FROM   SP AS SPY
              WHERE  SPY.PNO = PX.PNO ) <= 2

Result:

PNO

CITY

TPQ

P1

London

600

P3

Oslo

400

P4

London

500

P5

Paris

500

P6

London

100

As the opening to this section suggests, however, the interesting thing about this example is that it’s one that might appear to be more easily—certainly more succinctly—expressed using GROUP BY and HAVING, thus:

     SELECT PX.PNO , PX.CITY , COALESCE ( SUM ( SPX.QTY ) , 0 ) AS TPQ
     FROM   P AS PX , SP AS SPX
     WHERE  PX.PNO = SPX.PNO
     GROUP  BY PX.PNO
     HAVING COUNT ( * ) <= 2

But:

  • In that GROUP BY / HAVING formulation, is the appearance of PX.CITY in the SELECT item commalist legal? Answer: Yes, it is—at least according to the standard—though it used not to be. (I did mention this point in Chapter 7, but I’ll repeat it here for convenience.) Let S be a SELECT expression with a GROUP BY clause, and let column C be referenced in the SELECT clause of S. In earlier versions of SQL, then, C had to be one of the grouping columns (or be referenced inside a “set function” invocation, but let’s agree to ignore that possibility for simplicity). In the current version, by contrast, it’s required only that C—or {C}, rather—be functionally dependent on the grouping columns.

  • Do you think the GROUP BY / HAVING formulation is easier to understand? (Debatable.)

  • Does the GROUP BY / HAVING formulation work correctly for parts that aren’t supplied by any suppliers at all? (No, it doesn’t.)

  • Are the formulations equivalent if the database permits nulls? Or duplicates?

As a further exercise, give SQL formulations (a) using GROUP BY and HAVING, (b) not using GROUP BY and HAVING, for the following queries:

  • Get supplier numbers for suppliers who supply N different parts for some N > 3.

  • Get supplier numbers for suppliers who supply N different parts for some N < 4.

What do you conclude from this exercise?

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
13.59.180.145