Appendix C. A Relational Approach to Missing Information

The book Database Explorations: Essays on The Third Manifesto and Related Matters, by Hugh Darwen and myself (see Appendix G), describes a variety of approaches to the problem of missing information, all of which avoid the use of, or apparent need for, SQL-style nulls. The present appendix is based on a chapter from that book, and it describes one of those approaches in detail. The approach in question is known as the decomposition approach, because it involves decomposing, in a variety of ways, relvars that might appear to require nulls (or something like them) into ones that don’t. In other words, the emphasis is on designing the database in such a way as to avoid a perceived need for nulls. As a consequence, the approach:

  • Has no notion of null or any other construct that’s allowed to appear anywhere a value is expected and yet isn’t itself a value

  • Relies exclusively on classical two-valued logic (2VL), instead of three-valued logic (3VL) or, more generally, n-valued logic (nVL) for some n > 2

  • Abides by The Information Principle—see Appendix A of this book and elsewhere—in that, at all times, the database contains relations and nothing but relations

  • Is capable of dealing with missing information of any number of different kinds

Note: Before going any further, I should mention that the approach I’m going to be describing is similar but not identical to one proposed by David McGoveran in 1994 in a series of papers with the overall title “Nothing from Nothing” (again, see Appendix G).

Consider Figure C-1 overleaf, which shows a version of our usual suppliers table in which certain information is missing (indicated in the figure, as in Chapter 1 and Chapter 4, by shading the pertinent entries). Note that I can’t say the figure shows a relation, precisely because of those shaded entries; hence my use of the term table, and the related terms column and row, here and throughout this appendix. Now, I said in Chapter 5 that the predicate for suppliers was as follows:

Supplier SNO is under contract, is named SNAME, has status STATUS, and is located in city CITY.

For present purposes, however, I’ll simplify this predicate slightly by dropping the bit about the supplier being under contract. The predicate becomes:

Supplier SNO is named SNAME, has status STATUS, and is located in city CITY.

Table S—sample value

Figure C-1. Table S—sample value

This predicate is at best approximate, however. It would be appropriate if it weren’t for those shaded entries. After all, the following—obtained from the predicate by substituting values from the row in Figure C-1 for supplier S1—is a meaningful instantiation of it (i.e., it’s a meaningful proposition):

Supplier S1 is named Smith, has status 20, and is located in city London.

But if we substitute values from the row for, say, supplier S2, we obtain:

Supplier S2 is named Jones, has status 10, and is located in city ░.

And this certainly isn’t a meaningful instantiation or proposition; in fact, it doesn’t make sense at all.

Another interesting question is: What are the data types for columns STATUS and CITY? (I’m assuming here for the sake of the example, and I’ll continue to assume throughout the rest of this appendix, that shaded entries don’t appear, and won’t ever appear, in the other two columns, SNO and SNAME.) In SQL in particular, the shaded entries in columns STATUS and CITY can be interpreted as meaning the pertinent entries are null; elsewhere in this book, however, I’ve said the (SQL) data types of those columns are INTEGER and VARCHAR(20), respectively, and null certainly isn’t a value of either type INTEGER or type VARCHAR(20). In fact, of course, null isn’t a value at all, and so it can’t be said to be of any type at all.[190]

From this preliminary discussion, it should be clear that what we need to do is get rid of those shaded entries. Two kinds of decomposition, vertical and horizontal, can be used to achieve this goal.



[190] In SQL, by contrast, it’s considered to be of every type. To quote the standard: “Every data type includes a special value, called the null value ... [that] is neither equal to any other value nor not equal to any other value.”

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
18.188.229.246