Appendix G

On Normalization

I will always cherish the initial misconceptions I had about you.

—UNKNOWN

People often wonder why I didn’t cover normalization in the previous editions of this book, given that it has been part of traditional database design for such a long time. The fact is that it’s unnecessary for me to discuss it for two reasons:

  1. A thorough discussion that would do justice to the subject is beyond the scope of this work, especially given the nontraditional design methodology I present here and the “for mere mortals” nature of the approach to the material.

  2. Normalization is actually incorporated into my design process anyway. (I’ll explain how in a moment.)

I still get questions about this issue and see comments about it on the book’s Amazon.com page, so I decided to include a discussion on how the traditional normalization process is incorporated into my design process. Those of you currently studying the traditional design and normalization process will most likely understand my design methodology more clearly after you’ve read this material.

Please Note...

I want to make a few points perfectly clear before you continue reading:

  • Please read “The Design Method Presented in This Book” and the “Normalization” sections toward the end of Chapter 2, “Design Objectives,” before you read anything else in this appendix. These sections provide an overall explanation of why and how I came up with my design methodology. They also provide you with the context you need for the points I discuss in the sections that follow.

  • This is not a formal discussion or tutorial on the traditional normalization process. I’ve recommended books in Appendix H, “Recommended Reading,” that discuss this topic quite well and very thoroughly.

  • I assume that you already understand the traditional normalization process and its associated concepts and terminology. Throughout the lifespan of this book, I’ve found that the only people who are typically interested in this discussion are either database application programmers, people who already know normalization, or students who are studying normalization. As such, I assume that you are in one or more of these groups if you are reading this appendix.

  • There is no literal one-to-one mapping between the normalization process as a whole and my design methodology. Normalization is indeed integrated into my methodology, but not in any distinct sequential manner. Whereas normalization is a specific step of the traditional logical design process, it is transparently integrated throughout the entire design process in my methodology. This will become clearer as you read through the material.

  • My design methodology will yield fully normalized tables. This is true, however, only if you follow my methodology as faithfully as you would the traditional methodology or any other methodology. Taking shortcuts or failing to follow parts of the process will result in poor table structures and poor integrity. But this is also true if you do the same in any other methodology, traditional or otherwise.

Hopefully, you’ve just finished reading the two sections in Chapter 2 that I pointed you to in the first bullet. Now I’ll explain how I integrated normalization into my design methodology.

A Brief Recap

I’ll start with a short review on how I came up with my methodology, which you learned when you read those two sections in Chapter 2.

Let’s begin, however, by reviewing the overall steps in the traditional design method:

  • Identify major entities.

  • Identify relationship types.

  • Determine primary keys.

  • Determine foreign keys.

  • Associate attributes with entity or relationship types.

  • Determine attribute domains.

  • Validate the model using normalization.

  • Define integrity constraints.

The two things that bothered me the most about this methodology were the normalization process (as a whole) and the seemingly endless iterations it took to arrive at a proper design.

I already knew that the purpose of normalization is to transform a set of improperly or poorly designed tables into tables with sound structures. I also understood the process: Take a given table and test it against normal forms to determine whether it is properly designed. If it isn’t designed properly, make the appropriate modifications, retest it, and repeat the entire process until the table structure is sound. Figure G.1 shows how I visualized the process at this point.

Figure G.1 How I viewed the general normalization process.
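
The loop in Figure G.1 can be sketched in a few lines of code. This is purely an illustration, not anything from the book's methodology: the `normalize` function below repeatedly applies a set of check/fix pairs until every check passes, and the duplicate-row check paired with it is an invented placeholder standing in for a real normal-form test.

```python
# A minimal sketch of the loop in Figure G.1: test a structure against
# each check, apply the corresponding fix when it fails, and repeat
# until every check passes. The checks and fixes are placeholders,
# not actual normal-form rules.
def normalize(table, checks):
    changed = True
    while changed:
        changed = False
        for passes, fix in checks:
            if not passes(table):
                table = fix(table)
                changed = True
    return table

# Toy check: flag duplicate rows; toy fix: remove them, keeping order.
def no_duplicates(table):
    return len(table) == len({tuple(sorted(r.items())) for r in table})

def remove_duplicates(table):
    seen, result = set(), []
    for row in table:
        key = tuple(sorted(row.items()))
        if key not in seen:
            seen.add(key)
            result.append(row)
    return result

rows = [{"ID": 1}, {"ID": 1}, {"ID": 2}]
clean = normalize(rows, [(no_duplicates, remove_duplicates)])
print(clean)  # [{'ID': 1}, {'ID': 2}]
```

The point of the sketch is the shape of the process: a fix can itself introduce a new failure elsewhere, which is why the loop retests everything until the structure is stable.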

There are a number of normal forms, and each one is used to test for a particular set of problems or characteristics, such as modification anomalies, functional dependencies, transitive dependencies, multivalued dependencies, join dependencies, domains, and keys. The problem with normal forms is that they can be quite confusing to anyone who has not taken the time to study formal relational database theory.
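
To make one of those problems concrete, here is a small Python sketch (with invented table and field names) of a classic dependency flaw and its resolution. In the first structure, CustCity depends on CustID rather than on the order itself, so the same city is stored redundantly with every order; the standard remedy is to decompose the structure so that facts about customers live in their own table.

```python
# CustCity depends on CustID, not on OrderID, so the city repeats
# with every order a customer places -- the kind of dependency
# problem a normal-form test is designed to catch.
orders_unnormalized = [
    {"OrderID": 1001, "CustID": "C1", "CustCity": "Seattle",  "Total": 250.00},
    {"OrderID": 1002, "CustID": "C1", "CustCity": "Seattle",  "Total": 75.50},
    {"OrderID": 1003, "CustID": "C2", "CustCity": "Bellevue", "Total": 120.00},
]

# Decompose: customer facts move to a Customers table, and each
# order keeps only the CustID foreign key.
customers = {}
orders = []
for row in orders_unnormalized:
    customers[row["CustID"]] = {"CustCity": row["CustCity"]}
    orders.append({"OrderID": row["OrderID"], "CustID": row["CustID"],
                   "Total": row["Total"]})

# Each city is now recorded exactly once, so changing it cannot
# leave the database with inconsistent copies (a modification anomaly).
print(customers)  # {'C1': {'CustCity': 'Seattle'}, 'C2': {'CustCity': 'Bellevue'}}
```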

At one point, I asked myself, “Why do we work almost three-quarters of the way through designing the database, come to a screeching halt, and only then determine whether we designed our structures correctly?” I thought this was a ridiculous way to do things.

A cartoon shows a man with outstretched hands. The speech bubble reads, “Wait a minute. Do we even have sound structures?”

Keeping the purpose of normalization in mind, I then posed the following questions:

  1. If we assume that a thoroughly normalized table is properly and efficiently designed, shouldn’t we be able to identify the specific characteristics of such a table and state these to be the attributes of an ideal table structure?

  2. Couldn’t we then use that ideal table as a model for all tables we create for the database throughout the design process?

The answer to both questions, of course, is “yes,” so I used this premise as the basis for my “new” design methodology. I first compiled distinct sets of guidelines for creating sound structures by identifying the characteristics of a well-designed table that had successfully passed the tests of each normal form. I then conducted a few tests, using the new guidelines to create table structures for a new database and to correct flaws in the table structures of an existing database. These tests went very well, so I decided to apply this technique to the entire traditional design methodology. I formulated guidelines to address other issues associated with the traditional design method, such as domains, subtypes, relationships, data integrity, and referential integrity. After I completed the new guidelines, I performed more tests and found that my methodology worked quite well.

My design methodology removes many aspects of the traditional design methodology that new database developers find intimidating. For example, normalization, in the traditional sense, is now transparent to the developer because it is incorporated (via the new guidelines) throughout the design process. The methodology is also clear and easy to implement, largely because the guidelines are written in plain English, making them easy for almost anyone to understand.

I believe that the process of designing a database is not and should not be hard to understand. As long as the process is presented in a straightforward manner and each concept or technique is clearly explained, anyone should be able to design a database properly.

How Normalization Is Integrated into My Design Methodology

As I mentioned earlier, there is no direct one-to-one mapping between normalization and my design methodology. Rather, various elements of my methodology transparently work together to resolve the issues usually addressed by the normalization process. My approach also transparently addresses and resolves other traditional design issues such as scalar values, determinants, functional dependencies, domains, modification anomalies, referential integrity, cardinality, and optionality.

Let’s first look at Table G.1 and consider the fundamental issues each normal form addresses before I show you how my system specifically deals with them.

Table G.1 Fundamental Issues Addressed by Each Normal Form

First Normal Form        Deals with functional and multivalued dependencies

Second Normal Form       Deals with functional dependencies, transitive dependencies, and calculated fields

Third Normal Form        Deals with functional dependencies and modification anomalies

Fourth Normal Form       Deals with multivalued dependencies

Fifth Normal Form        Deals with join dependencies

Sixth Normal Form        Primarily used with temporal data

Boyce/Codd Normal Form   Deals with determinants and candidate keys

Domain/Key Normal Form   Deals with domains and keys

Keeping this in mind (along with the other issues I referenced earlier), I originally strove to develop a design methodology that incorporated all this and addressed it in a more efficient and far less repetitive manner. I also wanted my methodology to be clear and easily understood by anyone deciding to adopt it. That is why I specifically decided to move away from the traditional jargon and mathematical approach and instead use plain English to describe the processes I developed.

Table G.2 shows how various components of my design methodology address traditional normalization and design issues.

Table G.2 How My Methodology Addresses Traditional Normalization and Design Issues

Component of My Design Methodology     Traditional Normalization or Design Issues It Addresses

Business Rule Specifications           Logical domains, validation tables

Elements of a Candidate/Primary Key    Functional dependencies, multivalued dependencies, transitive dependencies, determinants

Elements of a Foreign Key              Foreign keys, referential integrity, modification anomalies

Elements of the Ideal Field            Scalar values, multivalued and multipart fields, calculated values

Elements of the Ideal Table            Functional dependencies, multivalued dependencies, transitive dependencies, join dependencies, duplicate fields, duplicate and redundant data, modification anomalies, subtypes

Field-Level Integrity                  Scalar values, physical domains, logical domains, domain integrity

Field Specifications                   Scalar values, physical domains, logical domains, domain integrity

Relationship Characteristics           Cardinality, optionality, deletion rules

Relationship-Level Integrity           Referential integrity, foreign keys

Resolving Multipart Fields             Scalar values, logical domains

Resolving Multivalued Fields           Scalar values, logical domains

Table-Level Integrity                  Primary key integrity, duplicate records, functional dependencies

As I’ve said all along, my approach does indeed deal with all the issues you would typically address with the normalization process, and it will yield fully normalized tables. It will only do so, however, if you follow my methodology as faithfully as you would the traditional methodology or any other methodology. Keep in mind that you’re actually dealing with all of these issues as you develop the database instead of waiting to deal with them until you’re about two-thirds of the way through the process in the traditional method. I’ve found this to be a much better approach to design—there’s less repetition, and it takes less time overall to design the database.
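
To show what one of the components in Table G.2 accomplishes, here is a small Python sketch of resolving a multivalued field. The table, field, and sample data are invented for the illustration: a Skills field holding several values in one field is split into a separate table with one row per employee/skill pair, leaving every remaining field with a single (scalar) value.

```python
# Hypothetical example: an Employees table with a multivalued
# "Skills" field, and its resolution into a separate table that
# holds one row per employee/skill pair.
employees = [
    {"EmpID": 1, "Name": "Kira", "Skills": "SQL, Python"},
    {"EmpID": 2, "Name": "Odo",  "Skills": "Security"},
]

employee_skills = [
    {"EmpID": emp["EmpID"], "Skill": skill.strip()}
    for emp in employees
    for skill in emp["Skills"].split(",")
]

for emp in employees:
    del emp["Skills"]   # every remaining field now holds a single value

print(employee_skills)
```

Because each skill now occupies its own row, you can add or remove a single skill without rewriting a packed field, and you can enforce integrity on the Skill values themselves.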

Logical Design versus Physical Design and Implementation

I’m occasionally asked why I didn’t include more discussion on SQL and implementation issues such as indexing, partitioning, and distribution. The answer is quite simple: I’ve always believed that the logical design process and the physical design and implementation processes should be kept separate.

I still believe that many people unwittingly fall into the trap of designing a database based solely on the RDBMS software they will use for its implementation. In many cases, they do so because they are already familiar with and somewhat skilled in a particular RDBMS, or because their company or organization is already using a particular RDBMS. This is an unwise approach that you should avoid (as much as possible) for several reasons:

  • You’re likely to make design decisions based on your perceptions of what your RDBMS can or can’t do. For example, you may decide not to impose a degree of participation for a given relationship because you believe the RDBMS does not provide you with the means to do so.

  • You’ll inadvertently let the RDBMS dictate the design of the database as opposed to driving the design strictly from the organization’s information requirements. This usually occurs when you discover that your RDBMS provides only limited support for certain aspects of the database, such as field specifications and relationship characteristics.

  • Your design will be constrained by your knowledge of the RDBMS. For example, you may decide not to implement relationship characteristics simply because you don’t know how to do so.

  • Your design will be constrained by how skilled you are with your RDBMS. Your skill level affects how efficiently and effectively you can implement various aspects of the database, such as field specifications and business rules.

  • Using this approach to design a database commonly results in improper structural design, insufficient data integrity, and problems with inconsistent data and inaccurate information. Defining a database within an RDBMS can be deceptively easy. You may create a database that works, but you’re very likely to have a poor design without knowing it.

  • In the end, the RDBMS that you know and love so well may not be suitable for your organization’s database requirements.

I believe you should always design the logical structure of your database without regard to any RDBMS. By doing so, you’re more likely to design a sound structure because you’ll be focused on the organization’s information requirements. After your design is complete, you can then clearly determine how you should implement the database (single-user application, client/server, web-based, and so on) and which RDBMS you should use to facilitate the implementation.

(The preceding content is in the “Database Design Based on the Database Software” section in Chapter 14, “Bad Design—What Not to Do,” but I thought it bore repeating here.)

I hope this finally clears up any confusion and answers any questions you may have regarding my design methodology and normalization. I’ve certainly accomplished my goal if you now understand my approach at least a little more clearly and see how it addresses the same issues as normalization.
