On Normalization
I will always cherish the initial misconceptions I had about you.
—UNKNOWN
People often wonder why I didn’t cover normalization in the previous editions of this book, given that it has been part of traditional database design for such a long time. The fact is that it’s unnecessary for me to discuss it for two reasons:
A thorough discussion that would do justice to the subject is beyond the scope of this work, especially given the nontraditional design methodology I present here and the “for mere mortals” nature of the approach to the material.
Normalization is actually incorporated into my design process anyway. (I’ll explain how in a moment.)
I still get questions about this issue and see comments about it on the book’s Amazon.com page, so I decided to include a discussion on how the traditional normalization process is incorporated into my design process. Those of you currently studying the traditional design and normalization process will most likely understand my design methodology more clearly after you’ve read this material.
I want to make a few points perfectly clear before you continue reading:
Please read “The Design Method Presented in This Book” and the “Normalization” sections toward the end of Chapter 2, “Design Objectives,” before you read anything else in this appendix. These sections provide an overall explanation of why and how I came up with my design methodology. They also provide you with the context you need for the points I discuss in the sections that follow.
This is not a formal discussion or tutorial on the traditional normalization process. I’ve recommended books in Appendix H, “Recommended Reading,” that discuss this topic quite well and very thoroughly.
I assume that you already understand the traditional normalization process and its associated concepts and terminology. Throughout the lifespan of this book, I’ve found that the only people typically interested in this discussion are database application programmers, people who already know normalization, or students who are studying it. As such, I assume that you fall into one or more of these groups if you are reading this appendix.
There is no literal one-to-one mapping between the normalization process as a whole and my design methodology. Normalization is indeed integrated into my methodology, but not in any distinct sequential manner. Whereas normalization is a specific step of the traditional logical design process, it is transparently integrated throughout the entire design process in my methodology. This will become clearer as you read through the material.
My design methodology will yield fully normalized tables. This is true, however, only if you follow my methodology as faithfully as you would the traditional methodology or any other methodology. Taking shortcuts or failing to follow parts of the process will result in poor table structures and poor integrity. But this is also true if you do the same in any other methodology, traditional or otherwise.
Hopefully, you’ve just finished reading the two sections in Chapter 2 that I pointed you to in the first bullet. Now I’ll explain how I integrated normalization into my design methodology.
I’ll start with a short review on how I came up with my methodology, which you learned when you read those two sections in Chapter 2.
Let’s begin, however, by reviewing the overall steps in the traditional design method:
Identify major entities.
Identify relationship types.
Determine primary keys.
Determine foreign keys.
Associate attributes with entity or relationship types.
Determine attribute domains.
Validate the model using normalization.
Define integrity constraints.
The two things that bothered me the most about this methodology were the normalization process (as a whole) and the seemingly endless iterations it took to arrive at a proper design.
I already knew that the purpose of normalization is to transform a set of improperly or poorly designed tables into tables with sound structures. I also understood the process: Take a given table and test it against normal forms to determine whether it is properly designed. If it isn’t designed properly, make the appropriate modifications, retest it, and repeat the entire process until the table structure is sound. Figure G.1 shows how I visualized the process at this point.
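The test-modify-retest loop I just described is easy to see with a small example. Here is a minimal sketch in Python; the table names, field names, and data are all hypothetical, and I use an in-memory SQLite database only because it makes the illustration self-contained:

```python
import sqlite3

# A poorly designed structure: PHONE_NUMBERS holds several values at once,
# so the table fails the very first test and must be modified.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("""CREATE TABLE CUSTOMERS_BAD (
    CUST_ID INTEGER PRIMARY KEY,
    CUST_NAME TEXT,
    PHONE_NUMBERS TEXT)""")
cur.execute("INSERT INTO CUSTOMERS_BAD VALUES (1, 'Ann Lee', '555-0101, 555-0102')")

# After the "modify" step: each phone number becomes a single row in its
# own table, so every field now holds exactly one value.
cur.execute("CREATE TABLE CUSTOMERS (CUST_ID INTEGER PRIMARY KEY, CUST_NAME TEXT)")
cur.execute("CREATE TABLE CUSTOMER_PHONES (CUST_ID INTEGER, PHONE_NUMBER TEXT)")
cur.execute("INSERT INTO CUSTOMERS VALUES (1, 'Ann Lee')")
cur.executemany("INSERT INTO CUSTOMER_PHONES VALUES (?, ?)",
                [(1, "555-0101"), (1, "555-0102")])

# "Retest": the multivalued field is gone.
rows = cur.execute(
    "SELECT PHONE_NUMBER FROM CUSTOMER_PHONES WHERE CUST_ID = 1 ORDER BY PHONE_NUMBER"
).fetchall()
print(rows)
```

In the traditional method you would run this kind of test and fix after the structures were already built; in my method the guidelines steer you to the corrected structure from the start.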
There are a number of normal forms, and each one is used to test for a particular set of problems or characteristics, such as modification anomalies, functional dependencies, transitive dependencies, multivalued dependencies, join dependencies, domains, and keys. The problem with normal forms is that they can be quite confusing to anyone who has not taken the time to study formal relational database theory.
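If the jargon is unfamiliar, a functional dependency simply means that one field’s value determines another’s. The following Python sketch, using hypothetical field names and sample data, shows the mechanical idea behind the kind of check a normal-form test performs:

```python
def functionally_determines(rows, determinant, dependent):
    """Return True if each value of the determinant field maps to exactly
    one value of the dependent field across the sample rows."""
    seen = {}
    for row in rows:
        key, value = row[determinant], row[dependent]
        if seen.setdefault(key, value) != value:
            return False
    return True

# Hypothetical ORDERS rows; ORDER_ID is the primary key.
orders = [
    {"ORDER_ID": 101, "CUST_ID": 1, "CUST_CITY": "Seattle"},
    {"ORDER_ID": 102, "CUST_ID": 1, "CUST_CITY": "Seattle"},
    {"ORDER_ID": 103, "CUST_ID": 2, "CUST_CITY": "Bellevue"},
]

# CUST_CITY is determined by CUST_ID rather than by the whole primary key --
# exactly the kind of dependency a normal-form test flags for resolution.
city_depends_on_customer = functionally_determines(orders, "CUST_ID", "CUST_CITY")
city_determines_order = functionally_determines(orders, "CUST_CITY", "ORDER_ID")
print(city_depends_on_customer, city_determines_order)
```

Strictly speaking, a dependency is a statement about the meaning of the data, not just about whatever sample rows happen to exist, but the check above captures the underlying idea.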
At one point, I asked myself, “Why do we work almost three-quarters of the way through creating the database, come to a screeching halt, and only then determine whether we designed our structures correctly?” I thought this was a ridiculous way to do things.
Keeping the purpose of normalization in mind, I then posed the following questions:
If we assume that a thoroughly normalized table is properly and efficiently designed, shouldn’t we be able to identify the specific characteristics of such a table and state these to be the attributes of an ideal table structure?
Couldn’t we then use that ideal table as a model for all tables we create for the database throughout the design process?
The answer to both questions, of course, is “yes,” so I used this premise as the basis for my “new” design methodology. I first compiled distinct sets of guidelines for creating sound structures by identifying the final characteristics of a well-defined database that successfully passed the tests of each normal form. I then conducted a few tests, using the new guidelines to create table structures for a new database and to correct flaws in the table structures of an existing database. These tests went very well, so I decided to apply this technique to the entire traditional design methodology. I formulated guidelines to address other issues associated with the traditional design method, such as domains, subtypes, relationships, data integrity, and referential integrity. After I completed the new guidelines, I performed more tests and found that my methodology worked quite well.
My design methodology removes many aspects of the traditional design methodology that new database developers find intimidating. For example, normalization, in the traditional sense, is now transparent to the developer because it is incorporated (via the new guidelines) throughout the design process. The methodology is also clear and easy to implement, largely because the guidelines are written in plain English, making them easy for almost anyone to understand.
I believe that the process of designing a database is not and should not be hard to understand. As long as the process is presented in a straightforward manner and each concept or technique is clearly explained, anyone should be able to design a database properly.
As I mentioned earlier, there is no direct one-to-one mapping between normalization and my design methodology. Rather, various elements of my methodology transparently work together to resolve the issues usually addressed by the normalization process. My approach additionally and transparently addresses and resolves other traditional design issues such as scalar values, determinants, functional dependencies, domains, modification anomalies, referential integrity, cardinality, and optionality.
Let’s first look at Table G.1 and consider the fundamental issues each normal form addresses before I show you how my system specifically deals with them.
Table G.1 Fundamental Issues Addressed by Each Normal Form
| Normal Form | Fundamental Issues It Addresses |
| --- | --- |
| First Normal Form | Deals with functional and multivalued dependencies |
| Second Normal Form | Deals with functional dependencies, transitive dependencies, and calculated fields |
| Third Normal Form | Deals with functional dependencies and modification anomalies |
| Fourth Normal Form | Deals with multivalued dependencies |
| Fifth Normal Form | Deals with join dependencies |
| Sixth Normal Form | Primarily used on temporal data |
| Boyce/Codd Normal Form | Deals with determinants and candidate keys |
| Domain/Key Normal Form | Deals with domains and keys |
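To make the “modification anomalies” entries in Table G.1 concrete, here is a minimal Python sketch of an update anomaly, the problem that arises when the same fact is stored in more than one row. The table, field names, and data are hypothetical:

```python
# A supplier's city repeated on every product row: the same fact stored twice.
products = [
    {"PRODUCT": "Widget", "SUPPLIER": "Acme", "SUPPLIER_CITY": "Portland"},
    {"PRODUCT": "Gadget", "SUPPLIER": "Acme", "SUPPLIER_CITY": "Portland"},
]

# Updating the city in one row but not the other creates an inconsistency.
products[0]["SUPPLIER_CITY"] = "Eugene"
acme_cities = {row["SUPPLIER_CITY"] for row in products if row["SUPPLIER"] == "Acme"}
print(len(acme_cities))  # the same supplier now appears in two different cities

# Storing the city once, in its own SUPPLIERS structure keyed by supplier,
# leaves only one place to update, so the anomaly cannot occur.
suppliers = {"Acme": {"SUPPLIER_CITY": "Eugene"}}
```

Deletion and insertion anomalies follow the same pattern: a fact that is duplicated, or that can only exist piggybacked on another fact, is a fact waiting to become inconsistent.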
Keeping this in mind (along with the other issues I referenced earlier), I originally strove to develop a design methodology that incorporated all this and addressed it in a more efficient and far less repetitive manner. I also wanted my methodology to be clear and easily understood by anyone deciding to adopt it. That is why I specifically decided to move away from the traditional jargon and mathematical approach and instead use plain English to describe the processes I developed.
Table G.2 shows how various components of my design methodology address traditional normalization and design issues.
Table G.2 How My Methodology Addresses Traditional Normalization and Design Issues
| Component of My Design Methodology | Traditional Normalization or Design Issues It Addresses |
| --- | --- |
| Business Rule Specifications | Logical domains, validation tables |
| Elements of a Candidate/Primary Key | Functional dependencies, multivalued dependencies, transitive dependencies, determinants |
| Elements of a Foreign Key | Foreign keys, referential integrity, modification anomalies |
| Elements of the Ideal Field | Scalar values, multivalued and multipart fields, calculated values |
| Elements of the Ideal Table | Functional dependencies, multivalued dependencies, transitive dependencies, join dependencies, duplicate fields, duplicate and redundant data, modification anomalies, subtypes |
| Field-Level Integrity | Scalar values, physical domains, logical domains, domain integrity |
| Field Specifications | Scalar values, physical domains, logical domains, domain integrity |
| Relationship Characteristics | Cardinality, optionality, deletion rules |
| Relationship-Level Integrity | Referential integrity, foreign keys |
| Resolving Multipart Fields | Scalar values, logical domains |
| Resolving Multivalued Fields | Scalar values, logical domains |
| Table-Level Integrity | Primary key integrity, duplicate records, functional dependencies |
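As one concrete illustration of the “Relationship-Level Integrity” row in Table G.2, the sketch below enforces referential integrity with a foreign key. The table and field names are hypothetical, and SQLite appears only because it makes a self-contained demonstration; the concept itself is independent of any particular RDBMS:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when asked
conn.execute("CREATE TABLE CUSTOMERS (CUST_ID INTEGER PRIMARY KEY)")
conn.execute("""CREATE TABLE ORDERS (
    ORDER_ID INTEGER PRIMARY KEY,
    CUST_ID INTEGER NOT NULL REFERENCES CUSTOMERS (CUST_ID))""")
conn.execute("INSERT INTO CUSTOMERS VALUES (1)")
conn.execute("INSERT INTO ORDERS VALUES (101, 1)")  # valid: customer 1 exists

# An order pointing at a nonexistent customer violates referential integrity.
try:
    conn.execute("INSERT INTO ORDERS VALUES (102, 99)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
print(rejected)  # the orphaned order is rejected
```

The point of the methodology is that you specify this relationship-level integrity as part of the logical design; how a given RDBMS enforces it is a separate, later decision.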
As I’ve said all along, my approach does indeed deal with all the issues you would typically address with the normalization process, and it will yield fully normalized tables. It will only do so, however, if you follow my methodology as faithfully as you would the traditional methodology or any other methodology. Keep in mind that you’re actually dealing with all of these issues as you develop the database instead of waiting to deal with them until you’re about two-thirds of the way through the process in the traditional method. I’ve found this to be a much better approach to design: there’s less repetition, and it certainly takes less time overall to design the database.
I’m occasionally asked why I didn’t include more discussion on SQL and implementation issues such as indexing, partitioning, and distribution. The answer is quite simple: I’ve always believed that the logical design process and the physical design and implementation processes should be kept separate.
I still believe that many people unwittingly fall into the trap of designing a database based solely on the RDBMS software they will use for its implementation. In many cases, they do so because they are already somewhat familiar and skilled with a particular RDBMS or their company or organization is already using a particular RDBMS. This is an unwise approach that you should avoid (as much as possible) for several reasons:
You’re likely to make design decisions based on your perceptions of what your RDBMS can or can’t do. For example, you may decide not to impose a degree of participation for a given relationship because you believe the RDBMS does not provide you with the means to do so.
You’ll inadvertently let the RDBMS dictate the design of the database as opposed to driving the design strictly from the organization’s information requirements. This usually occurs when you discover that your RDBMS provides only limited support for certain aspects of the database, such as field specifications and relationship characteristics.
Your design will be constrained by your knowledge of the RDBMS. For example, you may decide not to implement relationship characteristics simply because you don’t know how to do so.
Your design will be constrained by how skilled you are with your RDBMS. Your skill level affects how efficiently and effectively you can implement various aspects of the database, such as field specifications and business rules.
Using this approach to design a database commonly results in improper structural design, insufficient data integrity, and problems with inconsistent data and inaccurate information. Defining a database within an RDBMS can be deceptively easy. You may create a database that works, but you’re very likely to have a poor design without knowing it.
In the end, the RDBMS that you know and love so well may not be suitable for your organization’s database requirements.
I believe you should always design the logical structure of your database without regard to any RDBMS. By doing so, you’re more likely to design a sound structure because you’ll be focused on the organization’s information requirements. After your design is complete, you can then clearly determine how you should implement the database (single-user application, client/server, web-based, and so on) and which RDBMS you should use to facilitate the implementation.
(The preceding content is in the “Database Design Based on the Database Software” section in Chapter 14, “Bad Design—What Not to Do,” but I thought it bore repeating here.)
I hope this finally clears up any confusion and answers any questions you may have regarding my design methodology and normalization. I’ve certainly accomplished my goal if you now understand my approach at least a little more clearly and see how it addresses the same issues as normalization.