
16. A Look Ahead

George Tillmann

Ship Bottom, New Jersey, USA

I really didn’t foresee the Internet. But then, neither did the computer industry. Not that that tells us very much of course—the computer industry didn’t even foresee that the century was going to end.

—Douglas Adams

The most likely way for the world to be destroyed, most experts agree, is by accident. That’s where we come in; we’re computer professionals. We cause accidents.

—Nathaniel Borenstein

IT’s track record for prognostication is miserable.

“There is a world market for maybe five computers,” is a quote attributed to Thomas J. Watson Sr., founder and president of IBM. Whether he said it or not (there is some controversy about the origin of the quote), it reflects a position held by many in the 1940s. IBM proved Watson (or whoever) wrong in the 1960s by dominating the lucrative mainframe market. By the 1970s, there were dozens of minicomputer companies and, although IBM was an early manufacturer of minis, it failed to see the market shifting to those machines. It believed all real data processing would take place on mainframes.

The minicomputer market was dominated by Digital Equipment Corporation (DEC), which grew to become the second largest computer company in the world. DEC’s founder and CEO, Ken Olsen, when confronted with the personal computer, is reported to have said, “There is no reason for any individual to have a computer in his home.”

DEC’s failure to see the rise of the microcomputer would lead to the company’s eventual sale to, of all things, Compaq Computer Corporation , a PC manufacturer. (Interestingly, Compaq was not able to weather the shift to servers and was sold to HP.)

Poor prognostication seems to extend to the Internet—it was more than 20 years old before anyone knew it was there, much less saw it coming. The same is true of the World Wide Web, and who predicted that phones would play music and spend a lot of their time being cameras?

I wrote an article in the early 1980s predicting that the DBMS would become a data-architecture-neutral back-end engine, fronted by a network (relational, hierarchical, or whatever) user interface. 1 Wrong…although, come to think of it, Oracle has its OO and NoSQL versions....

So no predictions. But a wish list? Maybe. A semi-educated wish list, based on a few observations from years of slugging it out in the DBMS trenches, might make sense.

The following are four DBMS issues I, and many of the database designers and DBAs I interviewed, would like to see addressed.

We Need to Ask the Awkward Questions

Some questions seem too awkward to ask. Maybe they are politically incorrect, or maybe they will hurt someone’s feelings. However, maybe they just need to be asked. The data management questions that have never been raised, much less answered, are about the value of a theoretical foundation.

Database management systems have traditionally been categorized by architecture, such as network, relational, and (although it is not properly an architecture but rather a grab bag of sundry architectures) NoSQL. There is, however, another way to categorize DBMSs—functional versus theoretical. Functional systems, such as IMS, IDMS, and NoSQL, grew out of need. Their utility expanded as problems arose and solutions became apparent. Consequently, their structure tends to be a bit ad hoc, with new features bolted on to existing ones. The result can look like a Picasso portrait where the eyes were put…er…wherever they fit.

Theoretical systems, such as the relational model, have a mathematical foundation. In fact, Codd’s aim was to get the DBMS out of the realm of the functional, which he considered “troublesome…confusing…and…mistaken,” and into the realm of the “cleaner…sound…and…superior.” 2, 3

How successful was the shift from functional to theoretical? Well, the academics took to it like politicians to a junket. As Peter Chen, the creator of the entity-relationship model, stated, academics were comfortable with the mathematical language of the relational model and published, published, and published about it. 4 A cursory look at Google Scholar turned up more academic references to the relational model than to all other architectures combined.

However, since the publication of the first Codd paper on the relational model almost 50 years ago, there have been no other theoretically based DBMS architectures, models, or products—at least none that have seen the light of day. The relational model is the one and only theoretically based system I could find in production today. All DBMS implementations since the relational model have been functionally oriented. Even many of the relational products have abandoned some of the theoretical roots for functional features.

What does this say about the importance of DBMS theoretical foundations? How important is a theoretical foundation if no other DBMS creator has followed in Codd’s footsteps? Why should data management practitioners care about the theoretical foundation of their DBMS? The commercial community seems to be saying that it is not important to them and that they are more concerned with performance, ease of use, and getting the job done than with mathematical purity.

This is not a philosophical question but rather one that asks where the attention of some of our best information technology-oriented minds should be focused. Do we really need another normal form, or would the world be better served with a more efficient way of updating distributed data? Can academics put aside those delusions of relevance and focus on something more useful, such as helping the guy trying to get his database to work?

There are also publishing implications. Visit any bookstore. While you will find a few subject-related books on object-oriented and NoSQL databases from practitioners, virtually every college textbook assumes that the database will be relational. They are so relationally driven that relational concepts, such as foreign keys and the ban on repeating groups, are baked into their interpretation of logical data modeling. If you look up the definition of Data Definition Language (DDL) in most textbooks, it will refer only to relational objects such as tables and rows.

The relational model is a great achievement and an important part of the information management landscape. However, it is only part of that landscape. There are other fascinating ideas out there that should be part of every database designer’s and DBA’s experience.

We need a Don’t Assume movement that works to ensure that no authors or teachers assume that the RDBMS is the one and only tool for managing data.

There is another consideration to take into account: the Multiple Experience Concept. You have probably never heard of the Multiple Experience Concept because I just made it up. It goes like this.

Imagine that you are the programming manager for a company that needs a system written in an obscure assembler language (call it OAL). You have no OAL-trained staff and cannot find any qualified OAL programmers to hire. You will have to train some existing staff in OAL. What criteria do you use to select OAL training candidates?

  1. Someone who does not know any assembler language

  2. Someone who knows one assembler language

  3. Someone who knows two or more assembler languages

The obvious answer is 3. Knowing one assembler language is good and is certainly better than not knowing any at all, but not as good as knowing several. Knowing one assembler language means knowing the syntax of one assembler language. However, knowing two or more languages means that the programmer not only knows the syntax of multiple languages but also understands at least some of the essence of what it means to be an assembler language. They understand concepts that are not just part of a particular language’s structure but also the fundamentals that all assemblers have in common. Understanding the foundation behind a language makes it easier to learn and use a new one.

The same holds true for high-level languages. Need to teach someone C? Rather than picking the COBOL programmer or the FORTRAN programmer , pick the programmer who knows both. And there is an interesting side effect. Teach the COBOL programmer C, and you wind up with not just a new C programmer but also a better COBOL programmer.

The same is true for data management. Knowing one data architecture is good, but knowing two or more is better because then the database designer, DBA, or application programmer understands more of what is going on underneath the hood, not just what appears on the surface.

Tools Need to Take Usage into Account

The late 1980s and early 1990s were the era of the software development or Computer Aided Software Engineering (CASE) tool. You could not swing a dead cat without hitting a CASE tool startup. These products came in various flavors. There were the code generator holdovers from the 1970s, the process modeling tools that turned flow charts and DFDs into programs, the data modeling tools that turned your E-R diagram into a DBMS schema, and the tool that did it all, soup to nuts, idea to code.

As with most everything else in IT, these tools were oversold. A productivity improvement of 5 percent would have cost-justified the tool; however, many vendors promised a 50 percent (or greater) improvement that was both unachievable and ultimately disappointing. (It’s amazing how IT jumps from panacea to panacea, seemingly learning nothing from continuous disappointment while blithely looking for the next catholicon.) The hype pushed these tools to near extinction.

The frenzy is now over, and the advantages of these tools are more realistically understood. A few CASE products have survived and are benefiting a number of IT shops. However, many of these tools exhibit one or more very disappointing characteristics.

First, they rely on a logical model that is impregnated with physical design (almost always RDBMS) concepts. They assume that the DBMS is a relational one. Based on this assumption, their logical data models establish relationships using foreign keys and do not allow group or multivalue data items, among other truly annoying disappointments. The value of these tools with nonrelational databases is minimal and, because of their rejection of a DBMS-independent logical data model, misleading even for relational designers.
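To make this first complaint concrete, here is a minimal sketch in plain Python rather than any particular tool’s notation. The entity and item names are invented for illustration: a logical CUSTOMER carrying a multivalued PHONE NUMBER item, and the foreign-key flattening an RDBMS-centric tool would force on it before logical modeling is even finished.

```python
from dataclasses import dataclass, field

# Logical view: CUSTOMER carries a multivalued PHONE NUMBER item.
# A DBMS-independent logical model should be allowed to say exactly that.
@dataclass
class Customer:
    name: str
    phone_numbers: list[str] = field(default_factory=list)  # multivalued item

# What an RDBMS-centric tool forces instead: the multivalued item is
# pushed into a second record type and tied back with a foreign key,
# a physical (relational) decision smuggled into the logical model.
@dataclass
class CustomerRow:
    customer_id: int  # primary key
    name: str

@dataclass
class PhoneRow:
    customer_id: int  # foreign key back to CustomerRow
    phone_number: str

# One logical fact...
logical = Customer("Acme", ["555-0100", "555-0199"])

# ...becomes three physical rows before a DBMS has even been chosen.
physical = [CustomerRow(1, "Acme"),
            PhoneRow(1, "555-0100"),
            PhoneRow(1, "555-0199")]
```

Nothing about the customer and its phone numbers requires the second form; it is purely a relational storage decision, which is exactly what a logical model should defer.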

Second, they tend to disregard how the data are used. The extent of this flaw ranges from not considering some usage aspects, such as volume or path traffic, to the most prevalent failure—not taking usage into account at all.

In an odd twist, the tools that erroneously inject physical database considerations into logical data modeling totally omit from physical database design how the database will be used.

Tool designers need to do a better job of separating the logical from the physical (Principles 1 and 2 of the database design principles from Chapter 1) and a better job of incorporating usage into their physical database design (Principle 3 from Chapter 1).

The One and Only DBMS

Before the database management system, data in files were considered characteristics or properties of a program. The program opened the file, interpreted the data (see the “Data or Information?” sidebar), and closed the file when the program was finished, placing the data in the cyber-equivalent of a coma until they were needed again. It was not uncommon for multiple files to contain the same data items used by different programs for different purposes. Synchronization problems were so common, but solutions so elusive, that they were essentially ignored. If the customer name in the customer file agreed with the spelling of the customer name in the account, billing, and credit files, it was often luck rather than craft.

The DBMS changed everything. A data item, such as CUSTOMER NAME, would be stored only once in the DBMS along with its interpretation. At last, the data existed all the time, whether a program had a file open or not. The single occurrence of a data item along with its single interpretation meant that synchronization problems were a thing of the past. The DBMS became the Swiss Army knife of information management, incorporating all the features an enterprise needed (Table 16-1). However, it didn’t last forever.

Table 16-1. General-Purpose DBMS Features

Full lifecycle: Oversees the full lifecycle management of the data (creation, access, update, deletion)

Multiuser: Provides multiuser support (simultaneous database access for multiple end users and programs, whether batch or online)

Full interpretation: Ensures a single, complete, and continuous interpretation of all database data

Data independence: Assures that data exist independent of any users or programs

Consistency and integrity: Enforces both system and user-defined consistency, integrity, and edit rules

Reliability: Guarantees reliability (ACID compliant)

Synchronized redundancy: Guards against uncontrolled data duplication (allows selective fully synchronized redundant, but not duplicate, data)

Security: Protects data from unauthorized access or use

Single source: Is capable of safeguarding the official and only reliable instance of enterprise data

The world changed when big was invented—Big Data, big computers, big companies, big ideas, and, unfortunately, big problems.

The DBMS that solved the “files everywhere problem”—guaranteeing that there existed one and only one instance of a data item, guaranteeing that synchronization problems were a thing of the past, guaranteeing that the single data instance had only a single interpretation, guaranteeing that there was one and only one official repository of enterprise data—exists no more. The dreaded multiple data files have been replaced with multiple DBMSs managing multiple databases. Need to guarantee the integrity of the data? Store them in a general-purpose ACID-compliant database. Need varied data types? There is a specialty DBMS for that. Need super-fast retrieval? There is a specialty DBMS for that. As a result, after some really fine work solving some real-world problems, we are, in many ways, back where we started.

The poster child for the specialty DBMS is NoSQL, although other systems share this category. Although NoSQL DBMSs are good products, providing necessary services, they represent a disturbing trend. Rather than adding new features to accomplish these new tasks, they shed expensive existing ones. For example, the DBMS that was ACID compliant is now user beware—the fully interpreted data item is now a bucket or folder storing unformatted data. Rather than keeping the interpretation in the DBMS, these systems leave the data to be “interpreted by the application.” The basic tenets of what it is to be a DBMS—maintaining data integrity or transforming data into information—can be lost. The average commercial organization now faces the uncomfortable emergence of a two-tiered information structure. One tier is the generic all-purpose DBMS, with all of its associated guarantees, and a second tier of specialty or niche data managers, each catering to a different need while offering fewer assurances.
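The difference is easy to demonstrate. The following minimal sketch uses SQLite as a stand-in for a general-purpose DBMS and a plain key-value table as a stand-in for a NoSQL-style bucket; the table, column, and key names are invented for illustration.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")

# General-purpose style: the interpretation lives in the DBMS.
# Column names, types, and the CHECK rule travel with the data.
conn.execute("""
    CREATE TABLE customer (
        customer_name TEXT NOT NULL,
        credit_limit  NUMERIC CHECK (credit_limit >= 0)
    )
""")

# Specialty (key-value) style: the DBMS stores an uninterpreted bucket.
conn.execute("CREATE TABLE kv_store (k TEXT PRIMARY KEY, v BLOB)")

# The DBMS rejects a bad customer row on its own...
try:
    conn.execute("INSERT INTO customer VALUES ('Acme', -500)")
except sqlite3.IntegrityError:
    print("DBMS enforced the rule; no application code involved")

# ...but it happily stores the same bad value as an opaque payload.
conn.execute("INSERT INTO kv_store VALUES (?, ?)",
             ("cust:acme", json.dumps({"credit_limit": -500})))

# Now every reader must re-create the interpretation for itself.
payload = json.loads(
    conn.execute("SELECT v FROM kv_store WHERE k = 'cust:acme'").fetchone()[0])
if payload["credit_limit"] < 0:
    print("the application had to supply the check the DBMS no longer does")
```

In the first table the rule is stated once and enforced everywhere; in the second, every application that touches the bucket must carry its own copy of the interpretation, which is precisely the pre-DBMS synchronization problem in a new suit.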

Table 16-2 lists the fundamental features that almost every general-purpose DBMS supports and grades a representative sample of NoSQL systems on their ability to support these features.

Table 16-2. Comparing General-Purpose and NoSQL DBMSs

(Each feature is graded A to F on NoSQL support.)

Full lifecycle: Oversees the full lifecycle management of the data (creation, access, update, deletion). NoSQL grade: B

Multiuser: Provides multiuser support (simultaneous database access for multiple end users and programs, whether batch or online). NoSQL grade: B

Full interpretation: Ensures a single, complete, and continuous interpretation of all database data. NoSQL grade: B

Data independence: Assures that data exist independent of any users or programs. NoSQL grade: D

Consistency and integrity: Enforces both system and user-defined consistency, integrity, and edit rules. NoSQL grade: B

Reliability: Guarantees reliability (ACID compliant). NoSQL grade: C

Synchronized redundancy: Guards against uncontrolled data duplication (allows selective, fully synchronized redundant, but not duplicate, data). NoSQL grade: F

Security: Protects data from unauthorized access or use. NoSQL grade: A

Single source: Is capable of safeguarding the official and only reliable instance of enterprise data. NoSQL grade: F

The majority of general-purpose DBMSs do a good job supporting all nine features. How do NoSQL systems do? Not so well. Arguably the most damaging of the missed features is single source. By admitting that it cannot function as the enterprise’s official information repository, the NoSQL database relegates itself to a secondary role.

Almost 75 years ago, the economist Joseph Schumpeter wrote that market leaders (the status quo) who provide the market with an “infrastructure” are disrupted by entrepreneurs who, through innovation and change, often put what they replace out of business. He called this “creative destruction” (and the phrase became the title of nearly a dozen books, even if not entirely on the same subject). However, if the cycle Schumpeter described is to continue, and the market is to thrive, then these new disrupters must provide the new infrastructure. 5

Creators of products such as NoSQL have been the entrepreneurs that attacked the entrenched status quo infrastructure, and although they have not yet made a significant monetary impact on the market, they might just be Schumpeter’s disrupting force. If so, then it is time for them to start assuming the role of infrastructure provider, meaning that their products must move out of the niche realm and into the data management mainstream to become the new Swiss Army knives.

IT needs choices, but they need to be choices that compete for the same place in the center of the organization, not ones that reside in its corners. The NoSQL vendors have done some amazing stuff; however, their accomplishments center on either jettisoning some current features (ACID, interpretation, etc.) or digging up some old ideas (hierarchical and network features). Their retrograde change was a needed temporary fix, but now it is time for a more permanent prograde solution. It’s time for the new crop of DBMS designers to shake up the data management world by coming up with some new ideas that build on the past rather than tearing it down.

We need the specialty capabilities NoSQL and similar products provide, but we also need the general-purpose DBMS features we had. The breakthrough product—the killer app of data management—just might be the general-purpose DBMS with web-browser-like plug-in features provided by the specialty DBMS vendors. The ability to plug key-value, document management, or multidimensional capabilities into the enterprise’s official general-purpose DBMS—whether the feature is database-wide or restricted to one or more partitions—could be a best-of-all-possible-worlds solution. It will be interesting to see who the winners will be.
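What might such a plug-in arrangement look like? The sketch below is purely speculative; every class and method name is invented, and no existing product works this way. It shows only the shape of the idea: a general-purpose host that owns the enterprise guarantees, with specialty engines mounted into partitions.

```python
# Speculative sketch of a plug-in data manager interface.
# All names here are hypothetical, invented for illustration only.
from abc import ABC, abstractmethod
from typing import Any

class StoragePlugin(ABC):
    """A specialty engine (key-value, document, multidimensional) that
    plugs into the general-purpose DBMS, which keeps ownership of
    security, recovery, and the catalog."""

    @abstractmethod
    def put(self, key: str, value: Any) -> None: ...

    @abstractmethod
    def get(self, key: str) -> Any: ...

class KeyValuePlugin(StoragePlugin):
    """A minimal key-value engine standing in for a specialty vendor."""

    def __init__(self) -> None:
        self._data: dict[str, Any] = {}

    def put(self, key: str, value: Any) -> None:
        self._data[key] = value

    def get(self, key: str) -> Any:
        return self._data[key]

class GeneralPurposeDBMS:
    """The enterprise's official DBMS. A partition may delegate its
    storage to a plug-in while the host still applies its guarantees."""

    def __init__(self) -> None:
        self._partitions: dict[str, StoragePlugin] = {}

    def mount(self, partition: str, plugin: StoragePlugin) -> None:
        self._partitions[partition] = plugin

    def write(self, partition: str, key: str, value: Any) -> None:
        # Host-level services (security checks, logging, recovery hooks)
        # would wrap the plug-in call here.
        self._partitions[partition].put(key, value)

    def read(self, partition: str, key: str) -> Any:
        return self._partitions[partition].get(key)

dbms = GeneralPurposeDBMS()
dbms.mount("session_cache", KeyValuePlugin())
dbms.write("session_cache", "user:42", {"cart": ["sku-1"]})
print(dbms.read("session_cache", "user:42"))
```

The design point is the division of labor: the specialty engine supplies the access method, while the single official DBMS continues to supply the guarantees of Table 16-1.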

Better Training

Looking back at Chapter 1, Database Design Principle 1 called for an impenetrable wall between logical design and physical design. Principle 2 of the database design principles called for distinguishing logical data modeling from logical process modeling. The word separate is considerably harsher than the more intellectual distinction. This is intentional. The very nature of good design calls for the separation of logical and physical (the what from the how), while the distinction between data and process is more of an observation and realization of the real world of system development. In short, data and process are approached separately because that is how people work.

Object technology started as object-oriented programming, a code development approach that dealt with objects. Objects consist of data and processes. Object-oriented programming led to object-oriented design, object-oriented databases, object-oriented analysis, and object-oriented modeling. The hallmark of object-oriented modeling (OOM) is that the model (graphics and documentation) deals with both data and process. OOM techniques, such as the Unified Modeling Language (UML), require the modeler to document the system’s data and how those data are used, all in one single model, all in one single step.

It would seem that ideally, OOM would make Principle 2 unnecessary. Unfortunately, the critical word is ideally—in practice, there have been some problems.

UML is criticized for being bureaucratic, cumbersome, and ineffective. It is certainly true that the commercial IT world has not embraced it. However, the problem with UML is not the clumsiness of the language, but the psychology of the people in IT.

As was mentioned in Chapter 1, until analysts and designers see the equal importance of understanding both data and process, not much will happen. As long as IT still has process people and data people, schools and vendors still train the two separately, and IT management is concerned with one more than the other, the problems will continue—and books like this one will still be necessary. Only when the data teams and the process teams are merged into the analysis team, not just organizationally but intellectually, can we even consider replacing Principle 2. And the one prediction I am willing to make is that this change will take a very long time.

Notes

  1. George Tillmann, “The Impact of Relational Systems on Future Database Management Systems.” TeleSystems Journal. Volume IX, No. 2, March–April 1982.

  2. Edgar F. Codd, “A Relational Model of Data for Large Shared Data Banks.” Communications of the ACM. 13.6 (1970) p. 377.

  3. Edgar F. Codd, Derivability, Redundancy, and Consistency of Relations Stored in Large Data Banks. San Jose, IBM Research Report RJ599 (1969) p. 1.

  4. Peter Chen, “Entity-Relationship Modeling: Historical Events, Future Trends, and Lessons Learned,” pp. 296-310 in Manfred Broy and Ernst Denert (Editors), Software Pioneers: Contributions to Software Engineering. Springer Science & Business Media, 2002.

  5. Joseph Schumpeter, Capitalism, Socialism, and Democracy. London: Allen & Unwin, 1943.
