EQUALITY COMPARISONS

Despite what I said a few moments ago about ignoring user defined types, I’m going to assume in the present section, purely for the sake of the example, that the supplier number (SNO) attributes in relvars S and SP are of some user defined type—sorry, domain—which I’ll assume for simplicity is called SNO as well. Likewise, I’m going to assume that the part number (PNO) attributes in relvars P and SP are also of a user defined type (or domain) with the same name, PNO. Please note that these assumptions aren’t crucial to my argument; it’s just that I think they make the argument a little more convincing, and perhaps easier to follow.

I’ll start with the fact that, as everyone knows (?), two values can be compared for equality in the relational model only if they come from the same domain. For example, the following comparison (which might be part of the WHERE clause in some SQL query) is obviously valid:

     SP.SNO = S.SNO         /* OK     */

By contrast, this one obviously (?) isn’t:

SP.PNO = S.SNO         /* not OK */

Why not? Because part numbers and supplier numbers are different kinds of things—they’re defined on different domains. So the general idea is that the DBMS[18] should reject any attempt to perform any relational operation (join, union, whatever) that involves, either explicitly or implicitly, an equality comparison between values from different domains. For example, suppose some user wants to find suppliers (like supplier S5 in the sample values of Figure 1-3 in Chapter 1) who currently supply no parts at all. The following is an attempt to formulate this query in SQL:

     SELECT S.SNO , S.SNAME , S.STATUS , S.CITY
     FROM   S
     WHERE  NOT EXISTS
          ( SELECT *
            FROM   SP
            WHERE  SP.PNO = S.SNO )      /* not OK */

(There’s no terminating semicolon because this is an expression, not a statement. See Exercise 2.24 at the end of the chapter.)

As the comment says, this formulation is certainly not OK. The reason is that, in the last line, the user presumably meant to say WHERE SP.SNO = S.SNO, but by mistake—probably just a slip of the typing fingers—he or she said WHERE SP.PNO = S.SNO instead. And, given that we’re indeed talking about a simple typo (probably), it would be a friendly act on the part of the DBMS to interrupt the user at this point, highlight the error, and perhaps ask if the user would like to correct it before proceeding.

Now, I don’t know any SQL product that actually behaves in the way I’ve just suggested; in today’s products, depending on how you’ve set up the database, either the query will simply fail or it’ll give the wrong answer. Well ... not exactly the wrong answer, perhaps, but the right answer to the wrong question. (Does that make you feel any better?)

To repeat, therefore, the DBMS should reject a comparison like SP.PNO = S.SNO if it isn’t valid. However, Codd felt there should be a way in such a situation for the user to make the DBMS go ahead and do the comparison anyway, even though it’s apparently not valid, on the grounds that sometimes the user will know more than the DBMS does. Now, it’s hard for me to do justice to this idea, because I frankly don’t think it makes sense—but let me give it a try. Suppose it’s your job to design a database involving, let’s say, customers and suppliers; and you therefore decide to have a domain of customer numbers and a domain of supplier numbers; and you build your database that way, and load it, and everything works just fine for a year or two. Then, one day, one of your users comes along with a query you never heard before—namely: “Are any of our customers also suppliers to us?” Observe that this is a perfectly reasonable query; observe too that it might involve a comparison between a customer number and a supplier number (a cross domain comparison) to see if they’re equal. And if it does, well, certainly the system mustn’t prevent you from doing that comparison; certainly the system mustn’t prevent you from posing a reasonable query.

On the basis of such arguments, Codd proposed what he called “domain check override” (DCO) versions of certain of his relational operators. A DCO version of join, for example, would perform the join even if the joining attributes were defined on different domains. In SQL terms, we might imagine this proposal being realized by means of a new clause, IGNORE DOMAIN CHECKS, that could be included in an SQL query, as here:

     SELECT ...
     FROM   ...
     WHERE  CUSTNO = SNO
     IGNORE DOMAIN CHECKS

And this new clause would be separately authorizable—most users wouldn’t be allowed to use it (perhaps only the DBA[19] would be allowed to use it).

Before analyzing the DCO idea in detail, I want to look at a simpler example. Consider the following two queries on the suppliers-and-parts database:

image with no caption

Assuming, reasonably enough, that weights and quantities are defined on different domains, the query on the left is clearly invalid. But what about the one on the right? According to Codd, that one’s valid! In his book The Relational Model for Database Management Version 2 (Addison-Wesley, 1990), page 47, he says that in such a situation “the DBMS [merely] checks that the basic data types are the same”; in the case at hand, those “basic data types” are all just numbers (loosely speaking), and so that check succeeds.

To me, this conclusion is unacceptable. Clearly, the expressions P.WEIGHT = SP.QTY and P.WEIGHT SP.QTY = 0 both mean essentially the same thing. Surely, therefore, they must both be valid or both be invalid; the idea that one might be valid and the other not surely makes no sense. So it seems to me there’s something strange about Codd-style domain checks in the first place, before we even get to domain check override. (In essence, in fact, Codd-style domain checks apply only in the very special case where both comparands are specified as simple attribute references. Observe that the comparison P.WEIGHT = SP.QTY falls into this special category but the comparison P.WEIGHT - SP.QTY = 0 doesn’t.)

Let’s look at some even simpler examples. Consider the following comparisons (each of which might appear as part of an SQL WHERE clause, for example):

     S.SNO = 'X4'         P.PNO = 'X4'         S.SNO = P.PNO

I hope you agree it’s at least plausible that the first two of these could be valid (and evaluate successfully, and possibly even give TRUE) and the third not. But if so, then I hope you also agree there’s something strange going on; apparently, we can have three values a, b, and c such that a = c is true and b = c is true, but as for a = b—well, we can’t even do the comparison, let alone have it come out true! So what’s going on?

I return now to the fact that attributes S.SNO and P.PNO are defined on domains SNO and PNO, respectively, and my claim that domains are actually types; as previously noted, in fact, I’m assuming for the sake of the present discussion that domains SNO and PNO in particular are user defined types. Now, it’s possible (even likely) that those user defined types are both physically represented in terms of the system defined type CHAR; in fact, let’s assume such is indeed the case, for definiteness. However, those representations are part of the implementation, not the model—they’re irrelevant to the user, and as we saw in Chapter 1 they’re supposed to be hidden from the user. In particular, therefore, the operators that apply to supplier numbers and part numbers are the operators defined in connection with those types, not the operators that happen to be defined in connection with type CHAR (see the section WHAT’S A TYPE? later in this chapter). For example, we can concatenate two character strings, but we probably can’t concatenate two supplier numbers (we could do this latter only if concatenation were an operator defined in connection with type SNO).

Now, when we define a type, we also have to define the operators that can be used in connection with values and variables of the type in question (again, see the section WHAT’S A TYPE?). And one operator we must define is what’s called a selector operator, which allows us to select, or specify, an arbitrary value of the type in question.[20] In the case of type SNO, for example, the selector (which in practice would probably also be called SNO) allows us to select the particular SNO value that has some specified CHAR representation. Here’s an example:

     SNO('S1')

This expression is an invocation of the SNO selector, and it returns a certain supplier number: namely, the one represented by the character string ‘S1’. Likewise, the expression

     PNO('P1')

is an invocation of the PNO selector, and it returns a certain part number: namely, the one represented by the character string ‘P1’. In other words, the SNO and PNO selectors effectively work by taking a certain CHAR value and converting it to a certain SNO value and a certain PNO value, respectively.

Now let’s get back to the comparison S.SNO = ‘X4’. As you can see, the comparands here are of different types (types SNO and CHAR, to be specific; in fact, ‘X4’ is a character string literal). Since they’re of different types, they certainly can’t be equal (recall from the beginning of the present section that two values can be compared for equality “only if they come from the same domain”). But the system does at least know there’s an operator—namely, the SNO selector—that effectively performs CHAR to SNO conversions. So it can invoke that operator, implicitly, to convert the CHAR comparand to a supplier number, thereby effectively replacing the original comparison by this one:

     S.SNO = SNO('X4')

Now we’re comparing two supplier numbers, which is legitimate.

In the same kind of way, the system can effectively replace the comparison P.PNO = ‘X4’ by this one:

     P.PNO = PNO('X4')

But in the case of the comparison S.SNO = P.PNO, there’s no conversion operator known to the system (at least, let’s assume not) that will convert a supplier number to a part number or the other way around, and so the comparison fails on a type error: The comparands are of different types, and there’s no way to make them be of the same type.

Note: Implicit type conversion as illustrated in the foregoing examples is often called coercion in the literature. In the first example, therefore, we can say the character string ‘X4’ is coerced to type SNO; in the second it’s coerced to type PNO. I’ll have a little more to say about coercion in SQL in particular in the section TYPE CHECKING AND COERCION IN SQL, later.

To continue with the example: Another operator we must define when we define a type like SNO or PNO is what’s called, generically, a THE_ operator, which effectively converts a given SNO or PNO value to the character string (or whatever else it is) that’s used to represent it.[21] Assume for the sake of the example that the THE_ operators for types SNO and PNO are called THE_SC and THE_PC, respectively. Then, if we really did want to compare S.SNO and P.PNO for equality, the only sense I can make of that requirement is that we want to test whether the corresponding character string representations are the same, which we might do like this:

     THE_SC ( S.SNO ) = THE_PC ( P.PNO )

In other words: Convert the supplier number to a string, convert the part number to a string, and compare the two strings.

As I’m sure you can see, the mechanism I’ve been sketching, involving selectors and THE_ operators, effectively provides both (a) the domain checking we want in the first place and (b) a way of overriding that checking, when desired, in the second place. Moreover, it does all this in a clean, fully orthogonal, non ad hoc manner. By contrast, domain check override doesn’t really do the job; in fact, it doesn’t really make sense at all, because it confuses types and representations (as noted previously, types are a model concept, representations are an implementation concept). Note: If you’re not familiar with orthogonality as an important language design principle, you can read about it in “A Note on Orthogonality” in my book Relational Database Writings 1994-1997 (Addison-Wesley, 1998).

Now, you might have realized that what I’m talking about is here is what’s known in language circles as strong typing. Different writers have slightly different definitions for this term, but basically it means that (a) everything—in particular, every value and every variable—has a type, and (b) whenever we try to perform some operation, the system checks that the operands are of the right types for the operation in question (or, possibly, are coercible to those right types). Observe too that this mechanism works for all operations, not just for the equality comparisons I’ve been discussing; the emphasis on equality and other comparison operations in discussions of domain checking in the literature is sanctioned by historical usage but is in fact misplaced. For example, consider the following expressions:

     P.WEIGHT * SP.QTY

     P.WEIGHT + SP.QTY

The first of these is probably valid (it yields another weight: namely, the total weight of the pertinent shipment). The second, by contrast, is probably not valid (what could it possibly mean to add a weight and a quantity?).

I’d like to close this section by stressing the absolutely fundamental role played by the equality operator (“=”). It wasn’t just an accident that the discussions above happened to focus on the question of comparing two values for equality. The fact is, equality truly is central, and the relational model requires it to be supported for every type. Indeed, since a type is basically a set of values (see the section WHAT’S A TYPE?), without the “=” operator we couldn’t even say what values constitute the type in question! That is, given some type T and some value v, we couldn’t say, absent that operator, whether or not v was one of the values in the set of values constituting type T.

What’s more, the relational model also specifies the semantics of the “=” operator, as follows: If v1 and v2 are values of the same type, then v1 = v2 evaluates to TRUE if v1 and v2 are the very same value and FALSE otherwise. (As a matter of fact, I said exactly this in Chapter 1, as you might recall.) By contrast, if v1 and v2 are values of different types, then v1 = v2 has no meaning—it’s not even a legal comparison—unless v1 can be coerced to the type of v2 or the other way around, in which case we aren’t really talking about a comparison between v1 and v2 as such anyway.



[18] DBMS = database management system. Note that there’s a logical difference between a DBMS and a database! Unfortunately, the industry very commonly uses the term database when it means either some DBMS product, such as Oracle, or the particular copy of such a product that happens to be installed on a particular computer. I do not follow this usage in this book. The problem is, if you call the DBMS a database, what do you call the database?

[19] DBA = database administrator.

[20] This observation is valid regardless of whether we’re in an SQL context (as in the present discussion) or otherwise—but I should make it clear that selectors in SQL aren’t as straightforward as they might be, and selector as such isn’t an SQL term. I should also make it clear that selectors have nothing to do with the SQL SELECT operator.

[21] Again this observation is valid regardless of whether we’re in an SQL context or some other context—though (as with selectors) THE_ operators in SQL aren’t as straightforward as they might be, and “THE_ operator” as such isn’t an SQL term. I note too that some types might have more than one associated THE_ operator. See Chapter 8 for further discussion.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
3.145.180.81