How to Approach Database Design

Discussions of database design for relational systems—this one included—often seem schizophrenic. On the one hand, you're told that the relational model makes database design intuitive and easy; on the other hand, you're told that unless your database is “simple” (whatever that is), you'll either have to refer to other sources (which sounds ominous), or let the gurus do it for you. So why bother to wade through endless explanations?

Recognizing this confusion, one relational theorist postulates that database design is often easier to do than to explain exactly what it was you did. In our attempt to explain exactly what to do when you design a database, we will encourage and educate your intuitive impulses, as well as briefly discuss two design aids: normalization and entity-relationship modeling.

Educating the intuition is mostly a matter of demonstrating a few mistakes that beginning designers are likely to make. Once the flaws have been pointed out, the correct structures become obvious by comparison.

As for the formal design methodologies, most experts stress that they are guidelines rather than rigid rules. The best way to describe the existence of real-world objects and the relationships among them is, to some degree, a subjective matter—almost as much in database design as in natural language. Often there's more than one correct solution to a design problem, and there is sometimes good reason to violate even the most basic design rule.

But this doesn't mean that the theories of database design are not useful. Even as a beginner, it's important that you understand the basics. As your experience grows, you will probably rely on the formal methodologies more and more, but don't let them tyrannize you.

Getting Started

Many discussions of relational database design focus almost entirely on how to apply the normalization rules. Basically, normalization means protecting your data integrity by avoiding duplicate data. This often results in splitting a table that initially seems to “make sense” into two or more related tables that can be “put back together” with the join operation. The technical term for this process is non-loss decomposition, which simply means splitting a table into several smaller tables without losing any information.

The normalization guidelines are most valuable as an after-the-fact check on your work. Once you have a pretty good idea which columns go into which tables, you can analyze them according to the normalization rules in order to make sure you haven't committed any database design faux pas. An understanding of normalization can also guide you as you build your design, but it's not a recipe for creating a database structure from scratch.

So how do you figure out what columns go where in the first place? What is the recipe? The answer is that there is no very precise method. However, you can get a good deal of help from entity-relationship modeling, analyzing the data in terms of

  • The entities (objects or things) it describes

  • The relationships (one-to-one, one-to-many, or many-to-many) between those entities

In practice, designing a database requires combining a thorough understanding of the world you're trying to model with the analysis techniques of entity-relationship modeling and normalization. Then you examine your results and do it again. Database design is usually an iterative process in which you keep getting closer and closer to what you want, but you often move back a step or two and redo earlier work as you refine your idea of what you need.

To give you a more concrete idea, here's an example of some steps you might follow when you design a database:

1.
Investigate and think about the information environment you're modeling. Where will the information come from and in what form? How will it be entered into the system and by whom? How frequently will it change? What is most critical in terms of response time and availability? How much of the universe of data do you actually need?

Examine all paper and online files and forms that are currently used to store and track the organization's data; consider also what kind of output is needed from the database—reports, purchase orders, statistical information—and for whom. In a shared database environment, you'll need to collect information by interviewing other people in the organization, either individually or in groups. Don't forget anyone who will be involved in any way with the data—generating it, handling it, changing it, querying it, making reports from it, and so on.

2.
Make a list of the entities (things that are the subjects of the database), along with their properties or attributes. The entities are likely to wind up being tables (each row describing one thing, such as a person or a company or a book); the properties are likely to be columns in those tables (the person's salary, the company's address, the book's price). Of course, you can list all possible attributes first and then group them into entities, rather than starting with entities. Whichever method you choose, keep reviewing your work. Do the attributes really belong where you put them, or would they make more sense attached to a different entity? Do you need additional entities? More or different attributes? Are these entities really needed, or are they in some way outside the scope of this particular database?

3.
As you work, find a systematic way to record the design decisions you're making, either on paper or with a text editor. Designers generally start with lists and then move to sketches of the tables and the relationships among them, called data structure diagrams or entity-relationship (E-R) diagrams.

4.
Once you have made preliminary decisions about the entities and their attributes, make sure that each entity has an attribute (or group of attributes) that you can use to identify uniquely any row in the future table. This unique identifier is often called the logical primary key. If the natural primary key (author name, book title, company name) is not unique, you may have to look for a different attribute that fills this purpose (Social Security number for an author, ISBN number for a book) or add a column to serve as a surrogate key (assign unique company identifiers or generate incremental order numbers, for example).

5.
Next, consider the relationships between the entities. Are they one-to-many (one publisher has many titles, but each title has only one publisher) or many-to-many (an author can write multiple books, and a book can have multiple authors)? Do you have ways to join the data in one proposed table to that in other related tables? Foreign keys (columns that match a primary key in a related table) serve this function. While pub_id in publishers is a primary key, pub_id in titles is a foreign key.

6.
After you have a draft of the database design, look at it as a whole and analyze it according to the normalization guidelines (discussed later in this chapter) to find logical errors. Correct any violations of the normal forms—or make a conscious decision to override the normalization guidelines in the interests of ease of comprehension or performance. Document the reasons for such decisions.

7.
Now you're ready to put the database online and add some prototype data, using SQL for both steps. Experiment with some of the queries and reports you think you'll need. You may want to make some benchmark tests (see Chapter 10) to try out a few variations on the design.

8.
Reevaluate what you've done in light of how satisfied you are with the results. You'll probably find you need to add some entities you overlooked. You may be able to do without some that “logically” belong but actually play no role in the applications using this data.

Most of the rest of this chapter is devoted to an explanation of how we approached the design of the bookbiz database. This detailed, step-by-step discussion should help you understand the database design process.

The Characteristics of a Good Design

What is a good database design—a “clean” design, as the jargon has it? Broadly speaking, a good design

  • Makes your interactions with the database easier to understand

  • Guarantees the consistency of the database

  • Paves the way for the highest performance your system can deliver

Some factors that make a database easy to understand are not technically part of database design. Practically speaking, wide tables (many columns) are difficult to read and understand because they don't fit on your computer screen or on a printed page. On the other hand, splitting data into many small tables makes it hard to see relationships. Settling on the right number of columns is a compromise between ease of comprehension and adherence to the normalization guidelines.

A well-designed database helps prevent the introduction of inconsistent information and the unintentional deletion of information. It accomplishes these ends by minimizing the unnecessary duplication of data within tables and making it possible to support referential integrity among tables. The perils of data inconsistency are explained in more detail later in this chapter.

Finally, a well-designed database is a prerequisite for satisfactory performance. Again, the number of columns in a table is important: The retrieval of data can be slower if results have to come from many tables rather than from one table. On the other hand, huge tables can require the system to handle more data than may be absolutely necessary to answer a particular query. In other words, the number and size of tables affect performance. (Also crucial for performance purposes are appropriate choices about which columns to index and what kind of indexes to put on them. Indexing is a matter of physical design rather than logical database design; it is discussed in Chapters 3 and 10.)

These are some of the benefits of good database design. A bad design, on the other hand, can

  • Return incorrect query results

  • Foster misunderstandings of query results

  • Increase the risk of introducing inconsistencies in the data

  • Force redundant data entry

  • Make life difficult if you need to change the structure of the tables that you've built and filled with data

No single solution can fully satisfy all the objectives of good design. Frequently, you juggle trade-offs, making choices based on the needs and uses of the application for which the database is being designed.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
18.119.14.235