THE NEED FOR NORMALIZED DATA (STUDY OBJECTIVE 5)

Relational databases consist of several small tables, rather than one large table as in the flat file database. The small tables in a relational database can be joined together in ways that represent relationships among the data.

For example, examine the tables and relationships of a Microsoft Access® database in Exhibit 13-6. Each box is a table, and the field names are listed in each box. The bolded field is the primary key. The primary key, the unique identifier for each record in each table, is used to sort, index, and access records from that table. The lines between boxes indicate the relationships between the tables. The relationships indicated in these tables are one to many. For example, one customer (CustomerID), can have many orders listed in the Orders table. Note that the Customers table and the Orders table are linked by CustomerID. In the Orders table, the CustomerID field is a foreign key field. In other words, CustomerID is in a different (foreign) table, but is needed in the Orders table to establish the link between Customers and Orders.

images

Exhibit 13-6 A Relational Database in Microsoft Access®

These separate tables and the relationships between the tables are what establish the advantage of a relational database. The advantage is that a relational database has flexibility in retrieving data from queries. The developer of the relational database envisioned an English-like query language that could be used to directly access data from the relational database. The query language that has become the industry standard is structured query language, or SQL. By using SQL and joining these tables together in certain ways, nearly any query about customer orders can be answered. If a manager wished to know which customers had placed orders that have been shipped, the Customers and Orders tables can be joined to extract the columns (fields) for CustomerID, CompanyName, OrderID, and ShippedDate. The view of this query and the SQL to extract the data are shown in Exhibit 13-7.

Notice that the SQL query language is relatively plain English. It identifies which fields are selected from which tables and how the tables are to be joined. The tables are joined by OrderID. That relationship is a one-to-many relationship. Any customer may have more than one order.

A more complex example involves a manager who needs to know which employees have sold product 52371. By using SQL, the manager can join the Orders, Employees, Order Details, and Products tables to retrieve these data. Exhibit 13-8 shows the Access design view of this query.

Notice that this query has a “WHERE” condition. The extracted data are filtered in the sense that the ProductID must be equal to 52371. These queries show the flexibility of the relational database that is constructed with many tables. The relational database is flexible because any number of different queries can be answered by joining tables in various ways. Some examples of queries that could be answered from the tables in Exhibit 13-6 are as follows:

images

Exhibit 13-7 Design View and SQL of a Simple Query

images

Exhibit 13-8 Design View and SQL of a Complex Query

  1. Quantity and price of orders by customer
  2. Customers who purchased from a specific employee
  3. Orders shipped by a specific shipping method
  4. Number of products sold between certain dates

There are very few queries that could not be answered by joining these tables in various ways. The tables are flexible enough to answer an unlimited number of queries. To obtain this flexibility, the tables within a relational database must be designed according to specific rules. The process of converting data into tables that meet the definition of a relational database is called data normalization. There are seven rules of data normalization, and these rules are additive. The additive characteristic means that if a table meets the third rule, it has also met rules one and two. Most relational databases are in third normal form, which means they met the first three rules of data normalization. The first three rules of data normalization are as follows:

  1. Eliminate repeating groups. This rule requires that any related attributes (columns) that would be repeated in several rows must be put in a separate table. An example of the application of this rule can be seen in the tables in Exhibit 13-6. There is an order table and an order details table. If these were not separate tables, basic information about the order, such as customer ID and ship date, would have to be repeated for each item ordered.
  2. Eliminate redundant data.
  3. Eliminate columns not dependent on the primary key.

The flexible querying within relational databases is possible only when the tables are constructed to achieve third normal form.

TRADE-OFFS IN DATABASE STORAGE

As discussed in the beginning of this chapter, the method of storage affects the usage of the data. While the relational database is very flexible for queries, it is not the most efficient way to store data that will be used in other ways. The quickest way to access and process records from a database when their intended use is the processing of a large volume of transactions is the hierarchical model. But, the hierarchical model is not flexible for querying. Thus, there is a tradeoff of transaction processing efficiency for flexibility. If the major use of the data is for processing transactions and not for answering queries, the hierarchical model is a more efficient storage choice. If the major usage of the data is to answer queries, the relational model is superior.

Obviously, a large number of the data from transactions require both operations: processing transactions and querying. In today's IT environment, most organizations are willing to accept less transaction processing efficiency for better query opportunities. Therefore, most organizations use relational databases. The transaction processing efficiency loss is not a great loss because computing power has tremendously increased as the cost has also decreased.

The relational database has become very widely used in organizations because of this flexibility. Using relational databases and SQL, managers are able to query and extract data from the database on their own. They do not need to make requests to the IT department to design certain reports. This gives managers much more timely and flexible feedback information about operations. This improved access to information can help managers manage better.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
18.118.138.33