THE RELIANCE ON ATTRIBUTE NAMES

There’s one question that might have been bothering you but hasn’t been addressed in this chapter so far. The operators of the relational algebra, at least as described in this book, all rely heavily on attribute naming. For example, the Tutorial D expression R1 JOIN R2—where I’ll suppose, just to be definite, that R1 and R2 are base relvars—is defined to do the join on the basis of those attributes of R1 and R2 that have the same names. But the question often arises: Isn’t this approach rather fragile? For example, what happens if we later add a new attribute to relvar R2, say, that has the same name as one already existing in relvar R1?

Well, first let me clarify one point. It’s true that the operators do rely, considerably, on proper attribute naming. However, they also require attributes of the same name to be of the same type (and hence in fact to be the very same attribute, formally speaking); equivalently, they require attributes of different types to have different names. Thus, for example, an error would occur—at compile time, too, I would hope—if, in the expression R1 JOIN R2, R1 and R2 both had an attribute called A but the two A’s were of different types.[90] Note that this requirement (that attributes of different types have different names) imposes no serious limitation on functionality, thanks to the availability of the attribute RENAME operator.

Now to the substance of the question. In fact, there’s a popular misconception here, and I’m very glad to have this opportunity to dispel it. In today’s SQL systems, application program access to the database is provided either through a call level interface or through an embedded, but conceptually distinct, data sublanguage (“embedded SQL”). But embedded SQL is really just a call level interface with a superficial dusting of syntactic sugar, so the two approaches come to the same thing from the DBMS’s point of view, and indeed from the host language’s point of view as well. In other words, SQL and the host language are typically only loosely coupled in most systems today. As a result, much of the advantage of using a well designed, well structured programming language is lost in today’s database environment. Here’s a quote:[91] “Most programming errors in database applications would show up as type errors [if the database definition were] part of the type structure of the program.”

Now, the fact that the database definition is not “part of the type structure of the program” in today’s systems can be traced back to a fundamental misunderstanding that was prevalent in the database community in the early 1960s or so. The perception at that time was that, in order to achieve data independence (more specifically, logical data independence—see Chapter 9), it was necessary to move the database definition out of the program so that, in principle, that definition could be changed later without changing the program. But that perception was at least partly incorrect. What was, and is, really needed is two separate definitions, one inside the program and one outside; the one inside would represent the programmer’s perception of the database (and would provide the necessary compile time checking on queries, etc.), the one outside would represent the database “as it really is.” Then, if it subsequently becomes necessary to change the definition of the database “as it really is,” logical data independence is preserved by changing the mapping between the two definitions.

Here’s how the mechanism I’ve just described might look in SQL. First let me introduce the notion of a public table, which represents the application’s perception of some portion of the database. For example:

     CREATE PUBLIC TABLE X            /* hypothetical syntax! */
        ( SNO   VARCHAR(5)  NOT NULL ,
          SNAME VARCHAR(25) NOT NULL ,
          CITY  VARCHAR(20) NOT NULL ,
          UNIQUE ( SNO ) ) ;

     CREATE PUBLIC TABLE Y            /* hypothetical syntax! */
        ( SNO   VARCHAR(5)  NOT NULL ,
          PNO   VARCHAR(6)  NOT NULL ,
          UNIQUE ( SNO , PNO ) ) ,
          FOREIGN KEY ( SNO ) REFERENCES X ( SNO ) ) ;

These definitions effectively assert that “the application believes” there are tables in the suppliers-and-parts database called X and Y, with columns and keys as specified. Such is not the case, of course—but there are database tables called S and SP (with columns and keys as specified for X and Y, respectively, but with one additional column in each case), and we can define mappings as follows

X ≝ SELECT SNO , SNAME , CITY FROM S ;  /* hypothetical syntax! */
Y ≝ SELECT SNO , PNO FROM SP ;          /* hypothetical syntax! */

These mappings are defined outside the application (the symbol “≝” means “is defined as”).

Now consider the SQL expression X NATURAL JOIN Y. Clearly, the join here is being done on the basis of the common column SNO. And if, say, a column SNAME is added to the database table SP, all we have to do is change the mapping—actually no change is required at all, in this particular example!—and everything will continue to work as before; in other words, logical data independence will be preserved.

Unfortunately, today’s SQL products don’t work this way. Thus, for example, the SQL expression S NATURAL JOIN SP is, sadly, subject to exactly the “fragility” problem mentioned in the original question (but then so too is the simpler expression SELECT * FROM S, come to that). However, you can reduce that problem to more manageable proportions by adopting the strategy suggested under the discussion of column naming in Chapter 3. For convenience, I repeat that strategy here:

  • For every base table, define a view identical to that base table except possibly for some column renaming.

  • Make sure the set of views so defined abides by the naming discipline described in that same discussion (i.e., of column naming) in Chapter 3.

  • Operate in terms of those views instead of the underlying base tables.

Now, if the base tables change subsequently, all you’ll have to do is change the view definitions accordingly.



[90] Actually such an error might not occur in SQL, because SQL permits coercions; but Tutorial D doesn’t, and the observation is certainly true of Tutorial D.

[91] From Atsushi Ohori, Peter Buneman, and Val Breazu-Tannen: “Database Programming in Machiavelli—A Polymorphic Language with Static Type Inference,” Proc. ACM SIGMOD International Conference on Management of Data, Portland, Ore. (June 1989).

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
3.145.40.36