Examining the Data

The first task is to examine the data, identify how it is organized, and decide how it can best be structured within a database. The initial data will consist of a single list of people and their contact details, such as postal and email addresses and telephone numbers. There will also be other information about each contact, including the name of the company they work for.

The simplest solution is to convert the data list into a single database table, with each column in the list mapped to a table field. This is the approach Rory initially considers. However, as he looks through the list of contacts, he notices a lot of duplication. In particular, a number of contacts work for the same companies. If a company has three of its employees in the contacts list, information about the company (its address, for example) is held in three places: once with each contact's details. This means that if the company moves, all three contact records will need to be updated at the same time. It can also be difficult to keep the data consistent when the same entries have to match in three places. For example, a user may spot that company A's post code is wrong and correct it in the entry for the contact they are working on at the time, but fail to realize that the records of the company's other employees also need correcting. The result is inconsistent data.
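
The inconsistency described above is usually called an update anomaly, and it can be reproduced in a few lines. The sketch below uses Python's built-in sqlite3 module with an in-memory database; the table name contacts_flat, its columns, and the sample names and post codes are all hypothetical, chosen only for illustration.

```python
import sqlite3

# Hypothetical flat table: one row per contact, with the company's
# details repeated on every row (names are illustrative only).
conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE contacts_flat (
           contact_name TEXT,
           company_name TEXT,
           company_post_code TEXT
       )"""
)
conn.executemany(
    "INSERT INTO contacts_flat VALUES (?, ?, ?)",
    [
        ("Alice", "Company A", "AB1 2CD"),
        ("Bob", "Company A", "AB1 2CD"),
        ("Carol", "Company A", "AB1 2CD"),
    ],
)

# A user corrects Company A's post code, but only on the row they are
# working on (Alice's), and forgets the other employees.
conn.execute(
    "UPDATE contacts_flat SET company_post_code = ? WHERE contact_name = ?",
    ("AB1 2CE", "Alice"),
)

# The same company now has two different post codes.
post_codes = sorted(
    row[0]
    for row in conn.execute(
        "SELECT DISTINCT company_post_code FROM contacts_flat "
        "WHERE company_name = ?",
        ("Company A",),
    )
)
print(post_codes)  # ['AB1 2CD', 'AB1 2CE']
```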

The best way to overcome the issues associated with multiple contacts working for the same company is to split the company information out into a separate area; that is, to hold contact data and company data separately. All the contact data then requires is a pointer that tells the system which entry in the company data relates to this contact.
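
A minimal sketch of this split, assuming an SQLite database accessed through Python's standard sqlite3 module: the company details live in their own table, and the pointer is a foreign key column on each contact. The table and column names (companies, contacts, company_id) and the sample data are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")

# Company details are stored once, in their own table.
conn.execute(
    """CREATE TABLE companies (
           company_id INTEGER PRIMARY KEY,
           name TEXT NOT NULL,
           post_code TEXT
       )"""
)
# Each contact holds only a pointer (foreign key) to its company.
conn.execute(
    """CREATE TABLE contacts (
           contact_id INTEGER PRIMARY KEY,
           name TEXT NOT NULL,
           company_id INTEGER REFERENCES companies(company_id)
       )"""
)

conn.execute("INSERT INTO companies VALUES (1, 'Company A', 'AB1 2CD')")
conn.executemany(
    "INSERT INTO contacts (name, company_id) VALUES (?, ?)",
    [("Alice", 1), ("Bob", 1), ("Carol", 1)],
)

# Correcting the post code is now a single update to a single row...
conn.execute("UPDATE companies SET post_code = 'AB1 2CE' WHERE company_id = 1")

# ...and every contact sees the corrected value through the pointer.
rows = conn.execute(
    """SELECT contacts.name, companies.post_code
       FROM contacts JOIN companies USING (company_id)
       ORDER BY contacts.name"""
).fetchall()
print(rows)  # [('Alice', 'AB1 2CE'), ('Bob', 'AB1 2CE'), ('Carol', 'AB1 2CE')]

# With foreign keys enforced, a pointer to a missing company is rejected.
try:
    conn.execute("INSERT INTO contacts (name, company_id) VALUES ('Dave', 99)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
print(rejected)  # True
```

Because the post code is now stored once, a correction only has to be made in one place. Note that SQLite leaves foreign-key enforcement off unless `PRAGMA foreign_keys` is enabled on each connection.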

In the contact list data, there are three address types: Home, Business, and Other. When Rory examines the data, he realizes that the "Other" address type is not used, while Home and Business addresses both are: some contacts have only a Home address, some only a Business address, and some both. It is a fairly straightforward step to assume that the Home address applies to the contact and the Business address applies to the company. It seems, therefore, that both the company and contact tables require address information.

Is there an issue with duplicating address information in the contact and company data, and should addresses therefore have their own data area? Two people who live together could appear in the contact list (a husband and wife working for the same company, for example), or two companies could share the same premises. However, such instances are likely to be uncommon, and it seems excessive to base a major design decision, such as separating out addresses, on these occasional cases.

On the other hand, there is another reason why separating addresses may be an advantage. The relationship between the different parts of an address is special. For example, a post code usually refers to a small number of properties within a small geographical area. These special relationships can be used to carry out actions such as verification. For example, a simple lookup process could be created to check that the town in an address is valid for the given post code. Carrying out this processing will be easier if all the addresses are held together, so that their format is easy to control and all the data can be examined in one pass.
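
Assuming addresses are held together in one table, the post code check could be a single query against a reference table of valid post code and town pairs. In this sketch the tables post_code_towns and addresses, and the town names and post codes, are hypothetical; a real system would load the reference data from an authoritative post code source rather than typing it in.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical reference data: which town each post code belongs to.
conn.execute(
    "CREATE TABLE post_code_towns (post_code TEXT PRIMARY KEY, town TEXT NOT NULL)"
)
conn.executemany(
    "INSERT INTO post_code_towns VALUES (?, ?)",
    [("AB1 2CD", "Aberton"), ("XY9 8ZW", "Exbridge")],
)

# All addresses held together in one table, whoever they belong to.
conn.execute(
    """CREATE TABLE addresses (
           address_id INTEGER PRIMARY KEY,
           street TEXT,
           town TEXT,
           post_code TEXT
       )"""
)
conn.executemany(
    "INSERT INTO addresses (street, town, post_code) VALUES (?, ?, ?)",
    [
        ("1 High Street", "Aberton", "AB1 2CD"),    # consistent
        ("2 Mill Lane", "Exbridge", "AB1 2CD"),     # town does not match post code
        ("3 Station Road", "Exbridge", "XY9 8ZW"),  # consistent
    ],
)

# One pass over every address: flag rows whose town disagrees with the
# lookup table, or whose post code is missing from it altogether.
suspect = conn.execute(
    """SELECT a.address_id, a.town, a.post_code, p.town AS expected_town
       FROM addresses AS a
       LEFT JOIN post_code_towns AS p ON p.post_code = a.post_code
       WHERE p.town IS NULL OR p.town <> a.town
       ORDER BY a.address_id"""
).fetchall()
print(suspect)  # [(2, 'Exbridge', 'AB1 2CD', 'Aberton')]
```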

As Rory continues to examine the data, he notices more duplication and more specially formatted data. For example, some contacts have multiple email addresses, and email addresses have a special format of their own, with an at sign (@) in the middle and a top-level domain name at the end. Telephone numbers also conform to simple formatting rules; would they be easier to validate and check if they were held separately? There are also dates within the data, such as birthdays and anniversaries. Other dates that would be useful to track include meeting dates, contract start and end dates, and project milestones, although the data does not currently include this information.
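
One way to handle multiple email addresses is a separate table with one row per address, linked back to the contact. The sketch below pairs that with a deliberately simple format check (one @ sign, a domain, and a top-level domain of at least two letters); the pattern is an assumption and does not implement the full email address grammar. The table names, the function looks_like_email, and the sample addresses are hypothetical.

```python
import re
import sqlite3

# Deliberately simple pattern (an assumption, not a full RFC 5322 check):
# text without "@" or spaces, a single "@", then a domain ending in "."
# and a top-level domain of at least two letters.
EMAIL_PATTERN = re.compile(r"[^@\s]+@[^@\s]+\.[A-Za-z]{2,}")


def looks_like_email(address: str) -> bool:
    """Return True if the whole address matches the simple pattern above."""
    return EMAIL_PATTERN.fullmatch(address) is not None


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE contacts (contact_id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
# One row per email address, so a contact can have any number of them.
conn.execute(
    """CREATE TABLE contact_emails (
           contact_id INTEGER NOT NULL REFERENCES contacts(contact_id),
           email TEXT NOT NULL
       )"""
)
conn.execute("INSERT INTO contacts VALUES (1, 'Alice')")

# Only addresses that pass the format check are stored.
rejected = []
for email in ["alice@example.com", "alice.home@example.org", "alice.example.com"]:
    if looks_like_email(email):
        conn.execute("INSERT INTO contact_emails VALUES (1, ?)", (email,))
    else:
        rejected.append(email)

count = conn.execute(
    "SELECT COUNT(*) FROM contact_emails WHERE contact_id = 1"
).fetchone()[0]
print(count, rejected)  # 2 ['alice.example.com']
```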

Separating out all these data groups would scatter the data across many locations, and a complicated set of relationships between the groups would be needed to produce meaningful output from the system. There is a trade-off between the benefits of separating data into groups and the added complication of maintaining many relationships between them. Often there is no right answer to this dilemma, and the solution designer's task is to choose the best compromise. How can we tackle this dilemma?
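
The cost side of the trade-off shows up as soon as separated data has to be reassembled. In this hypothetical schema, where addresses, companies, contacts, and email addresses each have their own table, rebuilding a single line of the original contact list already takes several joins. The schema is one possible design, not a recommendation, and all table names, column names, and data are illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE addresses (address_id INTEGER PRIMARY KEY, town TEXT, post_code TEXT);
    CREATE TABLE companies (company_id INTEGER PRIMARY KEY, name TEXT,
                            address_id INTEGER REFERENCES addresses(address_id));
    CREATE TABLE contacts  (contact_id INTEGER PRIMARY KEY, name TEXT,
                            company_id INTEGER REFERENCES companies(company_id),
                            home_address_id INTEGER REFERENCES addresses(address_id));
    CREATE TABLE contact_emails (contact_id INTEGER REFERENCES contacts(contact_id),
                                 email TEXT);

    INSERT INTO addresses VALUES (1, 'Aberton', 'AB1 2CD'), (2, 'Exbridge', 'XY9 8ZW');
    INSERT INTO companies VALUES (1, 'Company A', 1);
    INSERT INTO contacts  VALUES (1, 'Alice', 1, 2);
    INSERT INTO contact_emails VALUES (1, 'alice@example.com');
    """
)

# Rebuilding one "flat" line of the original contact list now touches four
# tables through four joins (addresses is joined twice, once per role).
row = conn.execute(
    """SELECT c.name, co.name, biz.post_code, home.post_code, e.email
       FROM contacts AS c
       JOIN companies AS co ON co.company_id = c.company_id
       JOIN addresses AS biz ON biz.address_id = co.address_id
       JOIN addresses AS home ON home.address_id = c.home_address_id
       JOIN contact_emails AS e ON e.contact_id = c.contact_id
       WHERE c.contact_id = 1"""
).fetchone()
print(row)  # ('Alice', 'Company A', 'AB1 2CD', 'XY9 8ZW', 'alice@example.com')
```

The inner joins used here would also silently drop any contact without a company, a home address, or an email address; producing a complete list would need outer joins (LEFT JOIN), which adds to the complexity.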
