8
Designing Databases to Support Software

The previous chapters showed you how to gather user requirements, build a database model, and normalize the database to improve its performance and robustness. You learned how to look at the database from your customers' perspective, from the end-users' perspective, and from a database normalization perspective. But there's one other viewpoint that you should consider before you open your database product and start slapping tables together: the programmer's.

You might not be responsible for writing a program to work with a database. The database might not ever directly interact with a program (although that's rare). In any case, the techniques for making a database easier for a program to use often apply to other situations. Learning how to help a database support software applications can make the database easier to use in general.

In this chapter, you learn:

  • Steps that you can take to make the database more efficient in practical use
  • Methods for making validation easier in the user interface
  • Ways to easily manage nonsearchable data

This chapter describes several things that you can do to make the database more program-friendly.

A few of these ideas (such as multi-tier architecture) have been covered in earlier chapters. They are repeated in brief here to tie them together with other programming-related topics, but you should refer to the original chapters for more detailed information.

PLAN AHEAD

Any complicated task benefits from prior planning, and software development is no exception. (For a lot more information about software development, see my book Beginning Software Engineering, Second Edition, Wiley, 2022.) It has been shown that the longer an error remains in a project the longer it takes to fix it. Database design typically occurs early in the development process, so mistakes made here can be extremely costly. A badly designed database provides the perfect proving ground for the expression “Act in haste, repent at leisure.” Do your work up front or be prepared to spend a lot of extra time fixing mistakes.

Practically all later development depends directly or indirectly on the database design. The database design acts like a building's foundation. If you build a weak foundation, the building on top of it will be wobbly and unsound. The Leaning Tower of Pisa is a beautiful result built on a weak foundation, but it's the result of luck more than planning and people have spent hundreds of years trying to keep it from falling down. If you try to build on a wobbly foundation, you're more likely to end up with a pile of broken rubble than an iconic building.

After you gain some experience with database design, it's tempting to just start cranking out table and field definitions without any prior planning, but that's almost always a mistake. Don't immediately start building a database or even creating a relational object model. At least sketch out an ER diagram to better understand the entities that the database must represent before you start building.

DOCUMENT EVERYTHING

Write everything down. This helps prevent disagreements about who promised what to whom. (“But you promised that the database could look up a customer's credit rating and Amazon password.”)

Good documentation also keeps everyone on the same wavelength. If you have done a good job of writing requirements, use cases, database models, design specifications, and all the other paperwork that describes the system, then the developers can scurry off to their own little burrows and start working on their parts of the system without fear of building components that won't work together.

You can make programmers' lives a lot easier if you specify table and field definitions in great detail. Record the fields that belong in each table. Also record each field's properties: name, data type, length, whether it can be null, string format (such as “mm/dd/yyyy” or “###-####”), allowed ranges (1–100), default values, and other more complex constraints.

Programmers will need this information to figure out how to build the user interface and the code that sits behind it (and the middle tiers if you use a multi-tier architecture). Make sure that the information is correct and complete at the start so that the programmers don't need to make a bunch of changes later.

For example, suppose Price must be greater than $1.00. The programmers get started and build a whole slew of screens that assume Price is greater than $1.00. Now, it turns out that you meant Price must be at least $1.00 not greater than $1.00. This is a trivial change to the design and to the database, but the programmers will need to go fix every screen that contains the incorrect assumption. (Actually, good programming practices will minimize the problem, but you can't assume everyone is a top-notch developer.)

After you have all of this information, don't just put it in the database and assume that everyone can get the information from there. Believe it or not, some developers don't know how to use every conceivable type of database product (MySQL, SQL Server, Access, Informix, Oracle, DB2, Paradox, Sybase, PostgreSQL, FoxPro, MongoDB—there are hundreds), so making them dig this information out of the database can be a time-consuming hassle. Besides, documenting it all gives you something to show management to prove that you're making progress.

CONSIDER MULTI-TIER ARCHITECTURE

A multi-tier architecture can help isolate the database and user interface development so that programmers and database developers can work independently. This approach can also make a database more flexible and amenable to change. Unless you're a project architect, you probably can't decide to use this kind of architecture by yourself, but you can make sure it is considered. See Chapter 6, “Extracting Business Rules,” for more details about multi-tier architectures.

CONVERT DOMAINS INTO TABLES

It's easy enough to validate a field against its domain by using check constraints. For example, suppose you know that the Addresses table's State field must always hold one of the values CA, OR, or WA. You can verify that a field contains one of those values with a field-level check constraint. In Microsoft Access, you could set the State field's Validation Rule property to:

 ='CA' Or ='OR' Or ='WA'

Other databases use different syntax.

Although this is simple and it's easy for you to change, it's not easily visible to programmers building the application. That means they need to write those values into their code to allow the user interface to validate the user's choice. Later if you change the list, the programmers need to change their code, too.

Even worse, someone needs to remember that the code needs to be changed! It's fairly common to change one part of an application and forget to make a corresponding change elsewhere. Those kinds of mistakes can lead to some bugs that are very hard to find.

A better approach is to move the domain information into a new table. Create a States table and put the values CA, OR, and WA in it. Then make a foreign key constraint that requires the Addresses table's States field to allow only values that are in the States table. Programmers can query the database at runtime to learn what values are allowed and can then do things such as making a combo box that allows only those choices. Now if you need to change the allowed values, you only need to update the States table's data and the user interface automatically picks up the change.

Wherever possible, convert database structure into data so everyone can read it easily.

KEEP TABLES FOCUSED

If you keep each table well focused, then there will be fewer chances for misunderstandings. Developers will have an easier time keeping each table's purpose straight, so different parts of the application will use the data in the same (intended) way.

Modern object-oriented programming languages also use objects and classes that are similar to the objects and classes used in semantic object modeling and that are similar to the entities and entity sets used by entity-relationship diagrams. If you do a good job of modeling and keep object and table definitions well focused, then those models practically write the code by themselves. There are still plenty of other things for the programmers to do, but at least they'll be able to make programming object models that closely resemble the database structure.

USE THREE KINDS OF TABLES

One tip that can help you keep tables focused is to note that there are three basic kinds of tables. The first kind stores information about objects of a particular type such as Students, Employees, or QuarterlyReports. These hold the bulk of the application's data.

The second kind of table represents a link between two or more objects. For example, the association objects described in Chapter 5, “Translating User Needs into Data Models,” represent a link between two types of objects.

Figure 8.1 shows a relational diagram to model employees and projects. Each employee can be assigned to multiple projects and each project can involve multiple employees. The EmployeeRoles table provides a link between the two object tables. It also stores additional information about the link. In this case, it stores the role that an employee played on each project.

A representation of a relational diagram to model employees and projects.

FIGURE 8.1

The third basic kind of table is a lookup table. These are tables created simply to use in foreign key constraints. For example, the States table described in the earlier section “Convert Domains into Tables” is a lookup table.

When you build a table, ask yourself whether it represents an object, a link, or a lookup. If you cannot decide or if the table represents more than one of those, then the table's purpose may not be clearly defined.

USE NAMING CONVENTIONS

Use consistent naming conventions when you name database objects. It doesn't matter too much what conventions you use as long as you use something consistent.

Some database developers prefix table names with tbl and field names with fld as in, “The tblEmployees table contains the fldFirstName and fldLastName fields.” For simple databases, I prefer to omit the prefixes because it's usually easy enough to tell which names represent objects (tables) and which represent attributes (fields).

Some developers also use a lnk prefix for tables that represent a link between objects as in, “The lnkAnimalsPlanets table links tblAnimals and tblPlanets.” These developers are also likely to use lu or lup as a prefix for lookup tables.

Some developers prefer to use UPPERCASE_LETTERS for table names and lowercase_letters for field names. Others use MixedCase for both.

Some prefer to make table names singular (the Employee table) and others prefer to make them plural (the Employees table).

As I said, it doesn't matter too much which of these conventions you use as long as you pick some conventions and stick to them.

However, there are three “mandatory” naming rules that you should follow or the other developers will snicker behind your back and openly mock you at parties.

First, don't use special characters in table names, field names, or anywhere else in database objects. For example, some databases (such as Access) allow you to include spaces in table and field names. For example, you can make a field named “First Name.” Those databases also provide some mechanism for making these weird names readable to the database. For example, in Access you need to use square brackets to surround a field with a name that contains spaces as in “[First Name].” This produces hard-to-read expressions in check constraints and anywhere else that you use the field. It also makes programmers using the field take similar annoying steps, and that makes their code less readable too. Just don't do it.

Second, if two fields in different tables contain the same data, give them the same name. For example, suppose you use an ID field to link the Employees table to the EmployeePhones table. Don't call this linking field Id in the Employees table and EmpId in the EmployeePhones table. That's just asking for trouble. Call the field EmployeeId in both tables. (A corollary to this rule is that you cannot name an ID field something vague such as Id. It might make sense in the main table such as Employees, but that name won't make sense in a related table such as EmployeePhones.)

The third “mandatory” naming rule is to use meaningful names. Don't abbreviate to the point of obscurity. It shouldn't take a team of National Security Agency cryptographers to decipher a table's field names. StudPrfCrs is much harder to read than StudentPreferredCourses. Don't be afraid to spell things out so that everyone can understand them. (The exception here seems to be the military where everyone would understand the phrase “SecInt visited NavSpecWarGru” but saying “the Secretary of the Interior visited the Naval Special War Group” would brand you as an outsider.)

The section “Poor Naming Standards” in Chapter 10, “Avoiding Common Design Pitfalls,” has more to say about naming conventions and includes a few links that you can follow to learn about some specific standards that you can adopt if you like.

ALLOW SOME REDUNDANT DATA

Chapter 7, “Normalizing Data,” explained that it is not always best to normalize a database as completely as possible. The higher forms of normalization usually spread data out into tables that are linked by their fields. When a program needs to display that data, it must reassemble all of that scattered data, and that can take some extra time.

For example, if you allow customers to have any number of phone numbers, email addresses, postal addresses, and contacts, then what seems to the user like a simple customer record is actually spread across the Customers, CustomerPhones, CustomerEmails, CustomerAddresses, and CustomerContacts tables.

In some cases, it might be better to restrict the database's flexibility somewhat to gain speed and simplicity. For example, if you allow the customers to have only two phone numbers, one email address, and one contact, then you cut the number of tables that make up the customer's information from five to two. The database won't be as infinitely flexible, and it won't be quite as completely normalized, but it will be easier to use.

Usually, it's also best not to store the same data in multiple ways because that can lead to modification anomalies. For example, you don't really need a Balance field in a customer's record if you can recalculate the balance by adding up all of the customer's credits and debits.

However, suppose you're running an Internet service that allows customers to download music so a typical customer makes dozens or even hundreds of purchases a month. After a year or two, adding up all of a customer's credits and debits could be time consuming. In this case, you might be better off adding a Balance field to the customer's record and exercising a little extra caution when updating the account.

NoSQL databases have no trouble holding duplicate and summary data because they don't need to obey relational database constraints. A JavaScript Object Notation (JSON) or Extensible Markup Language (XML) format customer document can define as many phone numbers, email addresses, social media accounts, favorite made-for-television movies, and shoe sizes as you want, and the number can be different for every customer.

DON'T SQUEEZE IN EVERYTHING

Just because you're using a database doesn't mean every piece of data that the system uses must be squeezed in there somewhere. Databases provide tools for storing and retrieving some strange pieces of data such as audio, video, images—just about anything that you can cram into a computer file. That doesn't mean you should go crazy and store every file on your computer within the database.

For example, suppose an application must locate and play thousands of audio files. You could store all of them in the database, or you could place the files in a directory tree somewhere and then store the locations of those files in the database. That makes the database simpler, smaller, and possibly more efficient because it doesn't need to store all those files. It also makes it a lot easier to update the files. Instead of loading a new file into the database, you can simply replace the file on the disk.

This technique can also be useful for managing large amounts of shared data such as web pages. You don't need to copy Wikipedia pages into your database. (In fact, it would probably be a copyright violation.) Instead, you can store the URLs pointing to the pages that you need. In this case you give up control of the data, but you also don't have to store and maintain it yourself. If the data is updated, you'll see the new data the next time you visit its URL.

There are only a couple of drawbacks to this technique. First, you lose the ability to search inside any data that is not stored inside the database. I don't know of any databases that let you search inside video, audio, or JPEG data, however, so you probably shouldn't lose much sleep over giving up an ability that you don't have anyway. I wouldn't move textual data outside the database in this way, however, unless you're sure that you'll never want to search inside it.

Second, you give up some of the security provided by the database. For example, you lose the database's record-locking features, so you might have trouble allowing multiple concurrent users to update the data safely.

SUMMARY

Although the book's focus and your database design effort is on databases, a database rarely lives in total isolation. Usually someone writes a program to interact with it. Often, the database is a backend for a complicated user interface.

To get the most out of your database, you need to consider it in its natural habitat. In particular, you should think about the applications and programmers who will interact with it. Often, a few relatively small changes to the design can make life easier for everyone involved.

In this chapter, you learned to:

  • Plan ahead and document everything.
  • Convert domains into tables to help user interface programmers and to make maintaining domain information easier.
  • Keep tables well focused and make each perform a single task.
  • Use some redundancy and denormalized tables to improve performance.

The last several chapters dealt with database design techniques and considerations. Those chapters explained general techniques for building a data model, and then modifying it to make it more efficient.

The next chapter switches from general discussion to more specific techniques. It summarizes some of the methods described in the previous chapters and explains some common relational database design patterns that you might find useful in providing specific data features.

Before you move on to Chapter 9, however, use the following exercises to test your understanding of the material covered in this chapter. You can find the solutions to these exercises in Appendix A.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
18.220.120.161