
16. A Look Ahead

George Tillmann

Ship Bottom, New Jersey, USA

I really didn’t foresee the Internet. But then, neither did the computer industry. Not that that tells us very much of course—the computer industry didn’t even foresee that the century was going to end.

—Douglas Adams

The most likely way for the world to be destroyed, most experts agree, is by accident. That’s where we come in; we’re computer professionals. We cause accidents.

—Nathaniel Borenstein

IT’s track record for prognostication is miserable.

“There is a world market for maybe five computers,” is a quote attributed to Thomas J. Watson Sr., founder and president of IBM. Whether he said it or not (there is some controversy about the origin of the quote), it reflects a position held by many in the 1940s. IBM proved Watson (or whoever) wrong in the 1960s by dominating the lucrative mainframe market. By the 1970s, there were dozens of minicomputer companies and, although IBM was an early manufacturer of minis, it failed to see the market shifting to those machines. It believed all real data processing would take place on mainframes.

The minicomputer market was dominated by Digital Equipment Corporation (DEC), which grew to become the second largest computer company in the world. DEC’s founder and CEO, Ken Olsen, when confronted with the personal computer, is reported to have said, “There is no reason for any individual to have a computer in his home.”

DEC’s failure to see the rise of the microcomputer would lead to the company’s eventual sale to, of all things, Compaq Computer Corporation , a PC manufacturer. (Interestingly, Compaq was not able to weather the shift to servers and was sold to HP.)

Poor prognostication seems to extend to the Internet—it was more than 20 years old before anyone knew it was there, much less saw it coming. The same is true of the World Wide Web, and who predicted that phones would play music and spend a lot of their time being cameras?

I wrote an article in the early 1980s predicting that the DBMS would become a data-architecture-neutral back-end engine, fronted by a network (relational, hierarchical, or whatever) user interface. 1 Wrong…although, come to think of it, Oracle has its OO and NoSQL versions....

So no predictions. But a wish list? Maybe. A semi-educated wish list, based on a few observations from years of slugging it out in the DBMS trenches, might make sense.

The following are four DBMS issues I, and many of the database designers and DBAs I interviewed, would like to see addressed.

We Need to Ask the Awkward Questions

Some questions seem too awkward to ask. Maybe they are politically incorrect, or maybe they will hurt someone’s feelings. However, maybe they just need to be asked. The data management questions that have never been raised, much less answered, are about the value of a theoretical foundation.

Database management systems have traditionally been categorized by architecture, such as network, relational, and (although it is not properly an architecture but rather a grab bag of sundry architectures) NoSQL. There is, however, another way to categorize DBMSs—functional versus theoretical. Functional systems, such as IMS, IDMS, and NoSQL, grew out of need. Their utility expanded as problems arose and solutions became apparent. Consequently, their structure tends to be a bit ad hoc, with new features bolted on to existing ones. The result can look like a Picasso portrait where the eyes were put…er…wherever they fit.

Theoretical systems, such as the relational model, have a mathematical foundation. In fact, Codd’s aim was to get the DBMS out of the realm of the functional, which he considered “troublesome…confusing…and…mistaken,” and into the realm of the “cleaner…sound…and…superior.” 2, 3

How successful was the shift from functional to theoretical? Well, the academics took to it like politicians to a junket. As Peter Chen, the creator of the entity-relationship model, stated, academics were comfortable with the mathematical language of the relational model and published, published, and published about it. 4 A cursory look at Google Scholar turned up more academic references to the relational model than to all other architectures combined.

However, since the publication of the first Codd paper on the relational model almost 50 years ago, there have been no other theoretically based DBMS architectures, models, or products—at least none that have seen the light of day. The relational model is the one and only theoretically based system I could find in production today. All DBMS implementations since the relational model have been functionally oriented. Even many of the relational products have abandoned some of the theoretical roots for functional features.

What does this say about the importance of DBMS theoretical foundations? How important is a theoretical foundation if no other DBMS creator has followed in Codd’s footsteps? Why should data management practitioners care about the theoretical foundation of their DBMS? The commercial community seems to be saying that it is not important to them and that they are more concerned with performance, ease of use, and getting the job done than with mathematical purity.

This is not a philosophical question but rather one that asks where the attention of some of our best information technology-oriented minds should be focused. Do we really need another normal form, or would the world be better served with a more efficient way of updating distributed data? Can academics put aside those delusions of relevance and focus on something more useful, such as helping the guy trying to get his database to work?

There are also publishing implications. Visit any bookstore. While you will find a few subject-related books on object-oriented and NoSQL databases from practitioners, virtually every college textbook assumes that the database will be relational. They are so relationally driven that relational concepts, such as foreign keys and the ban on repeating groups, are baked into their interpretation of logical data modeling. If you look up the definition of Data Definition Language (DDL) in most textbooks, it will refer only to relational objects such as tables and rows.

The relational model is a great achievement and an important part of the information management landscape. However, it is only part of that landscape. There are other fascinating ideas out there that should be part of every database designer’s and DBA’s experience.

We need a Don’t Assume movement that works to ensure that no authors or teachers assume that the RDBMS is the one and only tool for managing data.

There is another consideration to take into account: the Multiple Experience Concept. You have probably never heard of the Multiple Experience Concept because I just made it up. It goes like this.

Imagine that you are the programming manager for a company that needs a system written in an obscure assembler language (call it OAL). You have no OAL-trained staff and cannot find any qualified OAL programmers to hire. You will have to train some existing staff in OAL. What criteria do you use to select OAL training candidates?

  1. Someone who does not know any assembler language

  2. Someone who knows one assembler language

  3. Someone who knows two or more assembler languages

The obvious answer is 3. Knowing one assembler language is good and is certainly better than not knowing any at all, but not as good as knowing several. Knowing one assembler language means knowing the syntax of one assembler language. However, knowing two or more languages means that the programmer not only knows the syntax of multiple languages but also understands at least some of the essence of what it means to be an assembler language. They understand concepts that are not just part of a particular language’s structure but also the fundamentals that all assemblers have in common. Understanding the foundation behind a language makes it easier to learn and use a new one.

The same holds true for high-level languages. Need to teach someone C? Rather than picking the COBOL programmer or the FORTRAN programmer , pick the programmer who knows both. And there is an interesting side effect. Teach the COBOL programmer C, and you wind up with not just a new C programmer but also a better COBOL programmer.

The same is true for data management. Knowing one data architecture is good, but knowing two or more is better because then the database designer, DBA, or application programmer understands more of what is going on underneath the hood, not just what appears on the surface.

Tools Need to Take Usage into Account

The late 1980s and early 1990s were the era of the software development or Computer Aided Software Engineering (CASE) tool. You could not swing a dead cat without hitting a CASE tool startup. These products came in various flavors. There were the code generator holdovers from the 1970s, the process modeling tools that turned flow charts and DFDs into programs, the data modeling tools that turned your E-R diagram into a DBMS schema, and the tool that did it all, soup to nuts, idea to code.

As with most everything else in IT, these tools were oversold. A productivity improvement of 5 percent would have cost-justified the tool; however, many vendors promised a 50 percent (or greater) improvement that was both unachievable and ultimately disappointing. (It’s amazing how IT jumps from panacea to panacea, seemingly learning nothing from continuous disappointment while blithely looking for the next catholicon.) The hype pushed these tools to near extinction.

The frenzy is now over, and the advantages of these tools are more realistically understood. A few CASE products have survived and are benefiting a number of IT shops. However, many of these tools exhibit one or more very disappointing characteristics.

First, they rely on a logical model that is impregnated with physical design (almost always RDBMS) concepts. They assume that the DBMS is a relational one. Based on this assumption, their logical data models establish relationships using foreign keys and do not allow group or multivalue data items, among other truly annoying disappointments. The value of these tools with nonrelational databases is minimal and, because of their rejection of a DBMS-independent logical data model, misleading even for relational designers.
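To make this first complaint concrete, here is a minimal sketch in plain Python rather than any particular tool’s notation. The entity and item names are invented for illustration: a logical CUSTOMER carrying a multivalued PHONE NUMBER item, and the foreign-key flattening an RDBMS-centric tool would force on it before logical modeling is even finished.

```python
from dataclasses import dataclass, field

# Logical view: CUSTOMER carries a multivalued PHONE NUMBER item.
# A DBMS-independent logical model should be allowed to say exactly that.
@dataclass
class Customer:
    name: str
    phone_numbers: list[str] = field(default_factory=list)  # multivalued item

# What an RDBMS-centric tool forces instead: the multivalued item is
# pushed into a second record type and tied back with a foreign key,
# a physical (relational) decision smuggled into the logical model.
@dataclass
class CustomerRow:
    customer_id: int  # primary key
    name: str

@dataclass
class PhoneRow:
    customer_id: int  # foreign key back to CustomerRow
    phone_number: str

# One logical fact...
logical = Customer("Acme", ["555-0100", "555-0199"])

# ...becomes three physical rows before a DBMS has even been chosen.
physical = [CustomerRow(1, "Acme"),
            PhoneRow(1, "555-0100"),
            PhoneRow(1, "555-0199")]
```

Nothing about the customer and its phone numbers requires the second form; it is purely a relational storage decision, which is exactly what a logical model should defer.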

Second, they tend to disregard how the data are used. The extent of this flaw ranges from not considering some usage aspects, such as volume or path traffic, to the most prevalent failure—not taking usage into account at all.

In an odd twist, the tools that erroneously inject physical database considerations into logical data modeling totally omit from physical database design how the database will be used.

Tool designers need to do a better job of separating the logical from the physical (Principles 1 and 2 of the database design principles from Chapter 1) and a better job of incorporating usage into their physical database design (Principle 3 from Chapter 1).

The One and Only DBMS

Before the database management system, data in files were considered characteristics or properties of a program. The program opened the file, interpreted the data (see the “Data or Information?” sidebar), and closed the file when the program was finished, placing the data in the cyber-equivalent of a coma until they were needed again. It was not uncommon for multiple files to contain the same data items used by different programs for different purposes. Synchronization problems were so common, but solutions so elusive, that they were essentially ignored. If the customer name in the customer file agreed with the spelling of the customer name in the account, billing, and credit files, it was often luck rather than craft.

The DBMS changed everything. A data item, such as CUSTOMER NAME, would be stored only once in the DBMS along with its interpretation. At last, the data existed all the time, whether a program had a file open or not. The single occurrence of a data item along with its single interpretation meant that synchronization problems were a thing of the past. The DBMS became the Swiss Army knife of information management, incorporating all the features an enterprise needed (Table 16-1). However, it didn’t last forever.

Table 16-1. General-Purpose DBMS Features

Full lifecycle: Oversees the full lifecycle management of the data (creation, access, update, deletion)

Multiuser: Provides multiuser support (simultaneous database access for multiple end users and programs, whether batch or online)

Full interpretation: Ensures a single, complete, and continuous interpretation of all database data

Data independence: Assures that data exist independent of any users or programs

Consistency and integrity: Enforces both system and user-defined consistency, integrity, and edit rules

Reliability: Guarantees reliability (ACID compliant)

Synchronized redundancy: Guards against uncontrolled data duplication (allows selective fully synchronized redundant, but not duplicate, data)

Security: Protects data from unauthorized access or use

Single source: Is capable of safeguarding the official and only reliable instance of enterprise data

The world changed when big was invented—Big Data, big computers, big companies, big ideas, and, unfortunately, big problems.

The DBMS that solved the “files everywhere problem”—guaranteeing that there existed one and only one instance of a data item, guaranteeing that synchronization problems were a thing of the past, guaranteeing that the single data instance had only a single interpretation, guaranteeing that there was one and only one official repository of enterprise data—exists no more. The dreaded multiple data files have been replaced with multiple DBMSs managing multiple databases. Need to guarantee the integrity of the data? Store them in a general-purpose ACID-compliant database. Need varied data types? There is a specialty DBMS for that. Need super-fast retrieval? There is a specialty DBMS for that. As a result, after some really fine work solving some real-world problems, we are, in many ways, back where we started.

The poster child for the specialty DBMS is NoSQL, although other systems share this category. Although NoSQL DBMSs are good products, providing necessary services, they represent a disturbing trend. Rather than adding new features to accomplish these new tasks, they shed expensive existing ones. For example, the DBMS that was ACID compliant is now user beware—the fully interpreted data item is now a bucket or folder storing unformatted data. Rather than keeping the interpretation in the DBMS, these systems leave the data to be “interpreted by the application.” The basic tenets of what it is to be a DBMS—maintaining data integrity or transforming data into information—can be lost. The average commercial organization now faces the uncomfortable emergence of a two-tiered information structure. One tier is the generic all-purpose DBMS, with all of its associated guarantees, and a second tier of specialty or niche data managers, each catering to a different need while offering fewer assurances.
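The difference is easy to demonstrate. The following minimal sketch uses SQLite as a stand-in for a general-purpose DBMS and a plain key-value table as a stand-in for a NoSQL-style bucket; the table, column, and key names are invented for illustration.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")

# General-purpose style: the interpretation lives in the DBMS.
# Column names, types, and the CHECK rule travel with the data.
conn.execute("""
    CREATE TABLE customer (
        customer_name TEXT NOT NULL,
        credit_limit  NUMERIC CHECK (credit_limit >= 0)
    )
""")

# Specialty (key-value) style: the DBMS stores an uninterpreted bucket.
conn.execute("CREATE TABLE kv_store (k TEXT PRIMARY KEY, v BLOB)")

# The DBMS rejects a bad customer row on its own...
try:
    conn.execute("INSERT INTO customer VALUES ('Acme', -500)")
except sqlite3.IntegrityError:
    print("DBMS enforced the rule; no application code involved")

# ...but it happily stores the same bad value as an opaque payload.
conn.execute("INSERT INTO kv_store VALUES (?, ?)",
             ("cust:acme", json.dumps({"credit_limit": -500})))

# Now every reader must re-create the interpretation for itself.
payload = json.loads(
    conn.execute("SELECT v FROM kv_store WHERE k = 'cust:acme'").fetchone()[0])
if payload["credit_limit"] < 0:
    print("the application had to supply the check the DBMS no longer does")
```

In the first table the rule is stated once and enforced everywhere; in the second, every application that touches the bucket must carry its own copy of the interpretation, which is precisely the pre-DBMS synchronization problem in a new suit.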

Table 16-2 lists the fundamental features that almost every general-purpose DBMS supports and grades a representative sample of NoSQL systems on their ability to support these features.

Table 16-2. Comparing General-Purpose and NoSQL DBMSs

(Each feature is graded A to F on NoSQL support.)

Full lifecycle: Oversees the full lifecycle management of the data (creation, access, update, deletion). NoSQL grade: B

Multiuser: Provides multiuser support (simultaneous database access for multiple end users and programs, whether batch or online). NoSQL grade: B

Full interpretation: Ensures a single, complete, and continuous interpretation of all database data. NoSQL grade: B

Data independence: Assures that data exist independent of any users or programs. NoSQL grade: D

Consistency and integrity: Enforces both system and user-defined consistency, integrity, and edit rules. NoSQL grade: B

Reliability: Guarantees reliability (ACID compliant). NoSQL grade: C

Synchronized redundancy: Guards against uncontrolled data duplication (allows selective, fully synchronized redundant, but not duplicate, data). NoSQL grade: F

Security: Protects data from unauthorized access or use. NoSQL grade: A

Single source: Is capable of safeguarding the official and only reliable instance of enterprise data. NoSQL grade: F

The majority of general-purpose DBMSs do a good job supporting all nine features. How do NoSQL systems do? Not so well. Arguably the most damaging of the missed features is single source. By admitting that it cannot function as the enterprise’s official information repository, the NoSQL database relegates itself to a secondary role.

Almost 75 years ago, the economist Joseph Schumpeter wrote that market leaders (the status quo) who provide the market with an “infrastructure” are disrupted by entrepreneurs who, through innovation and change, often put what they replace out of business. He called this “creative destruction” (and the phrase became the title of nearly a dozen books, even if not entirely on the same subject). However, if the cycle Schumpeter described is to continue, and the market is to thrive, then these new disrupters must provide the new infrastructure. 5

Creators of products such as NoSQL have been the entrepreneurs that attacked the entrenched status quo infrastructure, and although they have not yet made a significant monetary impact on the market, they might just be Schumpeter’s disrupting force. If so, then it is time for them to start assuming the role of infrastructure provider, meaning that their products must move out of the niche realm and into the data management mainstream to become the new Swiss Army knives.

IT needs choices, but they need to be choices that compete for the same place in the center of the organization, not ones that reside in its corners. The NoSQL vendors have done some amazing stuff; however, their accomplishments center on either jettisoning some current features (ACID, interpretation, etc.) or digging up some old ideas (hierarchical and network features). Their retrograde change was a needed temporary fix, but now it is time for a more permanent prograde solution. It’s time for the new crop of DBMS designers to shake up the data management world by coming up with some new ideas that build on the past rather than tearing it down.

We need the specialty capabilities NoSQL and similar products provide, but we also need the general-purpose DBMS features we had. The breakthrough product—the killer app of data management—just might be the general-purpose DBMS with web-browser-like plug-in features provided by the specialty DBMS vendors. The ability to plug key-value, document management, or multidimensional capabilities into the enterprise’s official general-purpose DBMS—whether the feature is database-wide or restricted to one or more partitions—could be a best-of-all-possible-worlds solution. It will be interesting to see who the winners will be.
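What might such a plug-in arrangement look like? The sketch below is purely speculative; every class and method name is invented, and no existing product works this way. It shows only the shape of the idea: a general-purpose host that owns the enterprise guarantees, with specialty engines mounted into partitions.

```python
# Speculative sketch of a plug-in data manager interface.
# All names here are hypothetical, invented for illustration only.
from abc import ABC, abstractmethod
from typing import Any

class StoragePlugin(ABC):
    """A specialty engine (key-value, document, multidimensional) that
    plugs into the general-purpose DBMS, which keeps ownership of
    security, recovery, and the catalog."""

    @abstractmethod
    def put(self, key: str, value: Any) -> None: ...

    @abstractmethod
    def get(self, key: str) -> Any: ...

class KeyValuePlugin(StoragePlugin):
    """A minimal key-value engine standing in for a specialty vendor."""

    def __init__(self) -> None:
        self._data: dict[str, Any] = {}

    def put(self, key: str, value: Any) -> None:
        self._data[key] = value

    def get(self, key: str) -> Any:
        return self._data[key]

class GeneralPurposeDBMS:
    """The enterprise's official DBMS. A partition may delegate its
    storage to a plug-in while the host still applies its guarantees."""

    def __init__(self) -> None:
        self._partitions: dict[str, StoragePlugin] = {}

    def mount(self, partition: str, plugin: StoragePlugin) -> None:
        self._partitions[partition] = plugin

    def write(self, partition: str, key: str, value: Any) -> None:
        # Host-level services (security checks, logging, recovery hooks)
        # would wrap the plug-in call here.
        self._partitions[partition].put(key, value)

    def read(self, partition: str, key: str) -> Any:
        return self._partitions[partition].get(key)

dbms = GeneralPurposeDBMS()
dbms.mount("session_cache", KeyValuePlugin())
dbms.write("session_cache", "user:42", {"cart": ["sku-1"]})
print(dbms.read("session_cache", "user:42"))
```

The design point is the division of labor: the specialty engine supplies the access method, while the single official DBMS continues to supply the guarantees of Table 16-1.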

Better Training

Looking back at Chapter 1, Database Design Principle 1 called for an impenetrable wall between logical design and physical design. Principle 2 of the database design principles called for distinguishing logical data modeling from logical process modeling. The word separate is considerably harsher than the more intellectual distinction. This is intentional. The very nature of good design calls for the separation of logical and physical (the what from the how), while the distinction between data and process is more of an observation and realization of the real world of system development. In short, data and process are approached separately because that is how people work.

Object technology started as object-oriented programming, a code development approach that dealt with objects. Objects consist of data and processes. Object-oriented programming led to object-oriented design, object-oriented databases, object-oriented analysis, and object-oriented modeling. The hallmark of object-oriented modeling (OOM) is that the model (graphics and documentation) deals with both data and process. OOM techniques, such as the Unified Modeling Language (UML), require the modeler to document the system’s data and how those data are used, all in one single model, all in one single step.

It would seem that ideally, OOM would make Principle 2 unnecessary. Unfortunately, the critical word is ideally—in practice, there have been some problems.

UML is criticized for being bureaucratic, cumbersome, and ineffective. It is certainly true that the commercial IT world has not embraced it. However, the problem with UML is not the clumsiness of the language, but the psychology of the people in IT.

As was mentioned in Chapter 1, until analysts and designers see the equal importance of understanding both data and process, not much will happen. As long as IT still has process people and data people, schools and vendors still train the two separately, and IT management is concerned with one more than the other, the problems will continue—and books like this one will still be necessary. Only when the data teams and the process teams are merged into the analysis team, not just organizationally but intellectually, can we even consider replacing Principle 2. And the one prediction I am willing to make is that this change will take a very long time.

Notes

  1. George Tillmann, “The Impact of Relational Systems on Future Database Management Systems.” TeleSystems Journal. Volume IX, No. 2, March–April 1982.

  2. Edgar F. Codd, “A Relational Model of Data for Large Shared Data Banks.” Communications of the ACM. 13.6 (1970) p. 377.

  3. Edgar F. Codd, Derivability, Redundancy, and Consistency of Relations Stored in Large Data Banks. San Jose, IBM Research Report RJ599 (1969) p. 1.

  4. Peter Chen, “Entity-Relationship Modeling: Historical Events, Future Trends, and Lessons Learned,” pp. 296-310 in Manfred Broy and Ernst Denert (Editors), Software Pioneers: Contributions to Software Engineering. Springer Science & Business Media, 2002.

  5. Joseph Schumpeter, Capitalism, Socialism, and Democracy. London: Allen & Unwin, 1943.
