Implementing the Universal Data Models

Search in book...
Toggle Font Controls
Create new playlist

Name your new playlist

Playlist description (optional)
Sign In

Email address

Password

Forgot Password?

or

Continue with Facebook

Continue with Google
Sign Up

Full Name

Email address

Confirm Email Address

Password

or

Continue with Facebook

Continue with Google

CHAPTER
15
Implementing the Universal Data Models

In this book, numerous models have been provided that can help jump-start data modeling efforts. But what are the purposes for these models, and how can the enterprise use them to build higher-quality operational and data warehouse databases? The models in this book, as well as those in Volume 2, can facilitate the building of more integrated systems, help the enterprise better manage its information, and can lead to higher-quality database designs in shorter periods of time. This chapter will cover the life cycle of the model from the development of an enterprise data model, to the development of the logical data model for a project, to the development of a physical database design which is the basis for implementing a database. Using these models for data warehouse implementations will also be discussed.

Effective methods for incorporating the Universal Data Models can be summarized as follows:

Develop the enterprise data model by customizing and adding to the Universal Data Models using the business terms that are commonly known in the enterprise and adding appropriate information requirements.
Build the appropriate logical data models for each project according to the business requirements for that specific application.
Create the necessary physical database designs based on the logical data model and the technical requirements.
Customize the database design to the appropriate target DBMS (database management system).

Questions to be answered in approaching this area are the following:

What are valid purposes and applications for using Universal Data Models?
What are the appropriate models to create, and how can the Universal Data Models support them?
What customizations are needed for the Universal Data Models to adequately reflect the business of a particular enterprise?
How do the business functions and process impact the data models?
What options are available in creating the logical data models for a specific project?
How flexible are the Universal Data Models? Is there only one right way to implement them?
What considerations are needed in creating the physical database designs?
How does DBMS platform choice affect the design?

The chapter will address each of these subjects in the areas of business analysis, system design, and database design. The Universal Data Models will be used to demonstrate principles and to assist in understanding. The focus for each section will highlight the outlined areas and answer the appropriate questions.

The Enterprise Data Model—An Integrated Business View of the Enterprise's Information

One of the key information issues today is how to develop integrated systems that facilitate consistent information for use by the enterprise. When projects develop their database designs independent of an overall model, the same information items are often implemented in separate tables and sometimes with different meanings, leading to redundant, inconsistent data and non-integrated systems.

The Universal Data Models in this book can be used to jump-start an enterprise data model effort, providing the enterprise with a “road map” of their information and showing how information relates to other information. If projects had such road maps, then they could use this model to facilitate each project using the same format for their data structures and the same definitions behind data items, or possibly even sharing the same physical data structures. This approach can lead to much more data consistency, data quality, and ultimately to better information to be used to improve the operations of the enterprise. The enterprise data model documents the information requirements of the enterprise and helps the enterprise integrate their information.

PURPOSES OF THE UNIVERSAL DATA MODELS

The Universal Data Models in this book can be used to do the following:

Help build an enterprise data model that illustrates the interrelationships between information in various applications. This is a key aspect of helping enterprises integrate their information.
Provide a starting point in developing a logical data model.
Add a new section of a data model to an enterprise's existing data model.
Validate an enterprise's existing logical data models and provide ideas for additions or modifications.
Help systems developers to understand the nature of various pieces of data and offer possible options and solutions to providing better information to the enterprise.
Help understand the information requirements of the enterprise and be used as the basis for helping to select and implement application packages. Requirements for application packages generally consist of functional requirements, data requirements, and technical requirements. The data model can serve as the data requirements. Why let the package dictate the information requirements of the enterprise? The enterprise needs to know its own requirements in order to properly select and implement application packages.
Help to serve as an information road map to identify and synchronize data from multiple systems. Even if an enterprise has decided to use mostly application packages instead of custom building systems, it may use the enterprise data model to identify the information requirements and where application packages store redundant information.

In addition to an enterprise data model, an enterprise process model can help the enterprise identify the business functions across the enterprise and facilitate building systems that do not redundantly or inconsistently develop the same processes. For instance, a “quoting” process would be represented and defined once in the enterprise process model, and this function should be implemented the same way for the quoting system, order entry system, and invoicing system that is developed. Template or Universal Process Models, can help jump-start and validate an enterprise's process model. There is a great need in the industry to develop re-usable process models to assist in systems development projects.

Very large enterprises may want to consider developing an integrated view of a portion of their business. For instance, a large international firm may decide it needs to get a handle on the information and/or processes involved in its customer service and support division, which handles customer service, sales support, technical support, and training worldwide. Although having an enterprise model across the entire enterprise may also be valuable, perhaps developing a model for this portion of the business may be a more manageable effort.

In order to be successful in adapting and using the Universal Data Models (and/or universal process models) as a basis for creating this enterprise data model, it must begin with the business community. Any understanding of information begins with the business. It is important to keep the information at a level that business people can understand. In preparing the enterprise data model, the business concepts need to be complete. In order to represent the broadest viewpoint, the models need to be flexible and able to solve multiple business problems.

Customizing the Universal Data Models

The logical data models in this book were designed to give an enterprise a head start in designing a system or in developing a corporate data model. As noted in many sections of this book, enterprises will invariably have specific business needs that are not addressed by these models or changes in terminology that necessitate model changes The following sections address the various degrees of change that may be required as well as how to handle changes in terminology to suit the enterprise.

Degrees of Customization

Varying degrees of changes may be made to these data models. The changes range from very easy modifications to more difficult data model changes. An example of an easy change is adding additional attributes to one or more entities in the model. This is considered a very easy change because the structure of the data model is kept intact and because it is not necessary to evaluate the impact that this change has on other parts of the model. Care should be taken, though, not to introduce attributes that represent the denormalization of existing structures.

A slightly harder modification is adding a new entity or relationship to the data model. In this case, it is necessary to determine if the new entity or relationship already exists in some other portion of the model. For instance, if there is a proposal to add an entity named PRODUCT PACKAGE, does that information already exist within the MARKETING PACKAGE entity (see Figure 3.9b)? This depends on how the enterprise defines PRODUCT PACKAGING and therefore requires careful analysis.

Data modelers need to be a little more cautious when modifying or deleting entities and relationships in the data models. Because the data models are highly integrated, many entities and relationships in these data models are reused in many different diagrams and for many different purposes. There should be some consideration and analysis of how the change may affect other portions of the model. For example, if the enterprise decides to model ORGANIZATION and PERSON as two separate entities instead of as subtypes of PARTY, how this change affects other portions of the model must be considered. Some changes may be dramatic while other changes to entities and relationships may have lesser impacts on the whole model. If the enterprise is using only a small, specific portion of these data models, the impact of proposed changes may not be as significant.

While care should be taken when modifying these models, one purpose of these models is to provide a starting point for data modelers. If the models are used for this purpose, modifications to the models should be expected and encouraged in order to meet the information requirements of each enterprise.

Because these data models are integrated and many changes may require impact analysis, the systems development team should consider the use of appropriate change control and data administration procedures. These procedures include, but are not limited to, defining the following: Who is responsible for maintaining the models; how change requests are documented, prioritized, evaluated, and approved; how data model versions are maintained; and what mechanisms are provided to support the change control process (i.e., regular review meetings, systems, forms, etc.).

Customizing the Models for Unique Business Terminology

In using the Universal Data Models within this book, it is important to adapt the models to fit the terminology of a specific enterprise. Each enterprise is unique, and what works for some does not work for others. In working with the business, it is important to capture the business language. Important concepts are presented within the Universal Data Models, but they may not directly relate to the organization's business community. This is the first task in modifying the Universal Data Models for an enterprise.

There are various methods of understanding these terms. Documentation reviews, interviews, and facilitated sessions are some common methods used to discover this important aspect of the enterprise. In discussing this, it will most likely be discovered that different segments of the enterprise have different terms for similar things or the opposite—similar terms for different things. This inconsistency in language needs to be dealt with during the customization of the Universal Data Models. Depending on the scope and nature of the work, this may be very involved. Typically, this will require some consensus within the business community. While some items must have a common understanding, such as CUSTOMER, PARTY, and so on, others may not require it. Time must be spent with the business community to prioritize the items and gain the full support of all required business representatives.

The best path is to identify with the generic terms from the models all the business aliases for that term. This will allow for the business view to be included, but it will move the effort toward a unified standard. If consensus (or a majority of participants agreeing) is achieved, the terms can be modified to support the enterprise viewpoint. Documenting all aliases during this process in order to gain a complete understanding of any concept is usually recommended.

The key is to reuse as many of the constructs provided as possible in order to save time in reanalyzing generic constructs so that the modeler can focus on the specific needs of the enterprise. If the data structures apply however the terms used are different, then to gain buy-in, keep the data structure idea and change the names of the entity.

To demonstrate the concept of using the business terminology of an enterprise, this next section will review the party contact mechanism model from Chapter 2 (Figure 2.10).

Again, the model in Figure 15.1 shows the CONTACT MECHANISM used by each PARTY as the PARTY CONTACT MECHANISM. This consists of several subtypes: POSTAL ADDRESS, TELECOMMUNICATIONS NUMBER, and ELECTRONIC ADDRESS. Included is the PARTY CONTACT MECHANISM PURPOSE. This structure as described earlier represents all the potential methods of contact for any PARTY.

Example of Changing the Terms for the Specific Enterprise

In our example, John Doe, a data analyst working to customize the party contact mechanism for XYZ Company, reviews the business requests given to him regarding the PARTY CONTACT MECHANISM. He discovers that there are many aliases for different entities within the model.

These aliases are the candidate entity names in the customized Universal Data Model. As you can see from Figure 15.2, the customized names are applied in the appropriate entities.

Figure 15.1 Party contact mechanism.

The updates made to the Universal Data Model are the application of the XYZ Company business names: PARTY is replaced with BUSINESS ENTITY, CONTACT MECHANISM with CONTACT METHOD, and TELECOMMUNICATIONS NUMBER with PHONE NUMBER. Notice that ELECTRONIC ADDRESS was kept, but because the business used two prominent names, EMAIL ADDRESS and WEB ADDRESS, they were added as subtypes to the enterprise data model. Often adding alias entities as subtypes gains two benefits: One is that the specific attribution for each type may be captured as it is discovered, and the second is that the old business names can be linked to the new, generic business names.

Table 15.1 Aliases for the PARTY CONTACT MECHANISM

UNIVERSAL DATA NAME	XYZ CO. ALIAS NAMES
PARTY	BUSINESS ENTITY
CONTACT MECHANISM	CONTACT METHOD, CONTACT TYPE
TELECOMMUNICATIONS NUMBER	PHONE NUMBER (includes fax, cell, beeper, pager, and so on)
ELECTRONIC ADDRESS	EMAIL ADDRESS, WEB ADDRESS
CONTACT MECHANISM LINK	CONTACT METHOD RELATIONSHIP

It is good to mention at this point that an important understanding is the use of language. If a particular term has traditionally meant one thing, it may be best to choose a new, previously unused word or term for the concept. For example, if ORGANIZATION within XYZ Company always means a subsidiary of XYZ Company, it may be unwise to use it to represent any outside company. The analysts should seek an alternate term for that concept rather than forcing the word to mean something that it never has meant before.

Another important concept is to choose words that reflect a great majority of the business thought. In the previous example, two terms are very common in XYZ Company for ELECTRONIC ADDRESS. Because ELECTRONIC ADDRESS has not been used before, it was decided to add this to the enterprise's vocabulary, but to add the two known terms as types of the higher concept. This helps business users understand the new term and achieves a common feel by having the two known concepts included.

The other question to answer is how to choose which alias, if any, should be used within the business models. Usually the majority of business users will be able to substantiate the most common business terms. The best term to use is the one used the most. The benefit of this approach is that the majority of the business groups will understand the term. The downside is that all known terms have embedded subtleties that may not be apparent. These must be understood and documented in order to properly evaluate the usefulness of a term.

In our example, John Doe has learned that the commonly used term for PARTY is BUSINESS ENTITY. In his research, he discovers that many parts of the enterprise refer to the people and organization as “business entities.” In light of his findings and discussions with many business groups, BUSINESS ENTITY instead of PARTY is used for the model. The term PARTY is avoided, as this is not a sufficiently understood term; however, the same data constructs from the party model may be used.

Figure 15.2 Customized party contact mechanism.

Additional Information Requirements Needed for the Enterprise

The enterprise data model needs to encompass the information requirements of the specific enterprise. While many common constructs may be used as a jump-start to the model, the information requirements of the enterprise still need to be captured and added to the model. The modelers developing the enterprise model need to capture this information, typically through interviews, group sessions, and modeling sessions, to gain a more complete understanding of the information requirements.

The Universal Data Models can serve as the basis for the enterprise data model and then may require modifications to support the specific enterprise's needs. As an example, suppose that John Doe finds that the majority of the entities, relationships, and data objects within the Universal Data Model are applicable for XYZ Company; however, he finds that there are additional needs as well.

For instance, suppose that John Doe learned that XYZ Company needs to capture the preferred times to contact each person or organization and that they need to capture these preferred times for each contact method because the times may vary for different contact methods. A person or organization may specify that it is OK to contact them anytime using an e-mail address; however, they may specify that they should be contacted only at a certain phone during normal business hours.

Additionally, XYZ enterprise needs to add more subtypes to the Universal Data Models in their enterprise data model. They have a need to specifically identify the subtypes of WEB ADDRESS, EMAIL ADRESS, and IP ADDRESS as different types of ELECTRONIC ADDRESSes.

These requirements would represent additional information requirements for the enterprise, and thus data model structures should be added to the enterprise's data model to represent these and any other information needs that are discovered. Figure 15.3 shows the additional entities of PREFERRED CONTACT TIME as well as the IP ADDRESS subtype as additions to the Universal Data Models. (These could also be considered Universal Data Model constructs; however, for illustration purposes, these are shown as additional requirements for a specific enterprise.)

Figure 15.3 shows that each PARTY CONTACT MECHANISM may be preferred to be contacted at one or more PREFERRED CONTACT TIME. This entity indicates the from datetime and/or the thru datetime. Each contact mechanism may have different time preferences. For instance, the preferred contact time for Marc Martinez as a customer of ACME may be from Monday through Friday from 9 AM to 5 PM. Another customer may specify to contact him or her only from Thursday to Friday from 3 to 5 PM. That customer may also specify that it is OK to call him or her at the home number from 10 AM to 4 PM on Saturday or Sunday.

Figure 15.3 Additional information requirements added to Universal Data Models.

Once all the information is gathered for the Universal Data Models, they should be linked together into a single model to form a consistent enterprise view. The full-blown CD contains all the models that may be uploaded to a modeling tool. This allows for the models to be created and modified within the targeted tool set. Once created, the model should be segmented by business concept (or subject data area), such as Party, Product, Order, Shipment, Work Effort, Invoicing, Accounting, and Human Resources, in order to more easily maintain the model going forward.

How the Universal Data Models and Enterprise Data Model Solve Business Problems

Once completed, the enterprise data model will reflect the business understanding of critical information. This is an ongoing process that requires support, time, and resources. One may question the need for such a facility, as it is costly and requires time. The benefits far outweigh the challenges in many ways. One important benefit of this model is in facilitating the ability to solve business problems.

In order to properly solve any problem, the business needs to have a clear view of all information. A past understanding that is assumed to be common knowledge often clouds this view. These assumptions need to be brought out and understood. The information in the model allows for the clear understanding of what should be and what is currently requiring improvement. Once challenged, the information can be redesigned in order to support the true business need.

A clear, concise picture of the information can validate understanding. In a model, clear business rules, concepts, and ideas are captured in a graphical form. The model can support the walk-through of any business group to assist in understanding that information. Once a full understanding is gained, the group is able to discuss potential changes and modifications. Further assumptions can be challenged and modified as required.

This clear picture of the information can support out-of-the-box thinking. When the business is able to see its information, this leads to a clear understanding of potential alternatives, supporting various viewpoints. The enterprise data model is then enhanced to support the multiple viewpoints discovered.

There are many complex concepts in doing business. Often, the downfall of the business community is not enough foresight. The modeled information is able to assist in identifying potential traps that can cause serious efficiency, performance or support problems. One of the biggest benefits in supporting an enterprise data model of information is understanding all the potential solutions. This is accomplished through a detailed, thorough picture of the information in order to see all the possibilities.

In the example, John Doe outlines the key business information involved in contacting a business entity. By modeling this information, the business leaders of XYZ Company begin to understand that electronic mail, postal mail, and telephone calls are essentially alternate means of accomplishing the same thing (contacting a business entity) and are hence similar types of information. Hence, if the resulting systems are able to display this information together in order to show all the methods for getting in touch with various parties, then it may facilitate easier contacting of parties. Contacts with the business entities are understood according to their purpose in order to know when a contact is valid. It also shows that communication events are within the context of relationships (this information is available in Figure 2.12) and that it is important to maintain how well communication events were followed up (this information is available in Figure 2.13).

The enterprise data model can help identify data redundancies and systems that do not serve as well as they could. For instance, it may point out that the customer service personnel and sales personnel are both using the same type of communication events and that the data is not currently integrated. This could lead to scenarios of a sales account manager calling on a customer, without even realizing that the customer has just communicated a complaint in the customer service department.

The model could point out the importance of maintaining information on the complete profile of a person or organization so that, when dealing with people or organizations, a complete picture can be viewed. What would be the consequences of selecting a supplier in a Request for Proposal process, not even realizing that one of the suppliers that was not selected also happened to be your best customer?

In a decision support environment, it may be important to analyze what types of contact methods generally lead to the greatest amount and dollar volume of sales for customers in order to maximize revenues. For instance, perhaps e-mail contacts (as opposed to telephone contacts or mailings) are shown to produce more leads that result in sales.

With the insight into the required information of an enterprise and associated relationships as presented in a data model, the enterprise can validate, update, improve, and remove redundancies in systems.

Using a Data Model for a Particular Application

While the enterprise data model enables an overall view of the enterprise's information, it is important to be able to build individual applications based on a high-quality data design. The enterprise data model can be used to provide a jump-start to the development of a solid database design for an individual application while providing a good context so that the individual application is integrated into the whole system of the enterprise.

The enterprise data model and the individual application data design (sometimes referred to as the logical data model for the application) can support and contribute to each other. The enterprise data model can provide a starting point for the individual application, and then the application data design can feed its insight and learning back into the enterprise data model so that other applications can benefit from this knowledge of the business.

The main objective of a good system design for a specific application is to solve the business problem and produce a system that the targeted community can use. This requires much communication and leverage in order to complete a good design. The Universal Data Models within the chapter facilitate a quicker and higher-quality data design by already addressing the most common needs and providing a better business understanding of the data. Once that is established, the next steps of design and implementation become easier.

Understanding Business Processes

One of the biggest challenges in working with business people to gain a clear understanding of information is that they just do not think in terms of data. Most business people think in terms of a known business process. It is important to approach them from their vantage point to drive a data understanding. In order to accomplish this, a solid understanding of the business processes behind the data is required.

In any interview or facilitated session, an effective approach is to outline the business processes that can help to validate and further refine the data model. This can be as formal as a process modeling methodology or as simple as a procedural flow of the business process to aid in discussion. In the natural course of conversation, “how” something is done will inevitably require “what” is involved. “How” something is done may be modeled with process models. “What” information is needed may be modeled using data models. The data modeler may ask questions about the goals, process, data, systems used, reports needed, issues, and possible solutions, thereby driving out the details of the data.

As pointed out previously, template process models can help jump start the analysis of the processes. Template models can help save time by allowing the analyst to reuse process models for common functions that most organizations conduct, such as sales, marketing or product development, or customer service functions. Template process models can also help to provide a checkpoint to make sure that no required processes are missing.

In our example, John Doe meets with XYZ Company executives to discuss their business issues to be solved with regard to better managing their contact management systems. Apparently, XYZ is planning to expand its Internet capacity and have decided to increase its presence on the Web. Traditional contact methods of telephone and mail are not to be abandoned, but enhanced with the use of e-mail, Web sites, and direct access to many customers. XYZ will contact each customer and determine the best method of contact. The overall process is discussed with the business team, and they design how they will gather the information and make decisions. John then captures both the process and data requirements of XYZ Company.

Table 15.2 illustrates the processes needed for better managing contacts. Each process is documented during the design of the application to gain an understanding of how the system should work. The associated data is also defined and modeled (the next section will show this), maintaining the business requirements and rules.

Table 15.2 Sample Required Business Processes

BUSINESS PROCESS	RELATED INFORMATION
Retrieve current contact methods	CUSTOMER, CUSTOMER CONTACT METHOD, POSTAL ADDRESS, PHONE NUMBER, ELECTRONIC MAIL ADDRESS, WEB ADDRESS
Determine most effective method for making contact for the desired purpose	CUSTOMER CONTACT METHOD AND CUSTOMER CONTACT METHOD PURPOSE
Establish that permission was granted for contacting customer(s)	CUSTOMER, CUSTOMER CONTACT METHOD, non-solicitation ind, use permission ind, CONTACT METHOD,
Establish best time(s) to make contact	CUSTOMER CONTACT METHOD, PREFERRED CONTACT TIME
Make customer contact	CUSTOMER, CUSTOMER CONTACT METHOD, CONTACT METHOD
Update Customer Contacts and Contact Method Relationships	CUSTOMER, CUSTOMER CONTACT MECHANISM, CONTACT METHOD, CONTACT METHOD RELATIONSHIP

The completed processes are helpful in determining the information needs for the needed application. With the supporting business process, the business people can review the developed processes and then identify what information is needed to support that process. In this way, the data requirements may be further driven from the understanding of process. The process model can also serve to validate that the data model is correct or if changes need to be made.

Building the Logical Data Model

Once a clear understanding of the business processes is established, the logical data model for the application can be completed. The logical data model needs to address the specific information requirements and business issues outlined in the process. It is critical to have the business view; otherwise, there is a risk that the resulting system will not solve the particular business issues.

Enterprise data models are important to capture the overall needs of the enterprise and show how the information is integrated across the enterprise. In building a logical data model for an application, it is important to flesh out the model in more detail so that the business processes and data requirements of the specific application are completely met. The enterprise data model will help the designer keep the various views of the enterprise in perspective, so that the resulting data model fits into the enterprise's integrated structure. It is important to know when to use the constructs from the enterprise data model and when to provide specific customizations needed for that specific application.

Suppose the enterprise data model structures are used to jump-start an application for the sales force, which is interested in tracking their customer's phone numbers, fax numbers, e-mail addresses, and any other contact methods. While other parts of the enterprise have a need to track information about their people and organizations, the sales force may have specific needs that they want to ensure are met.

In developing the needs for this application it is important to understand the specific needs of the sales force, while recognizing the benefits of being able to track contact methods and communication events for the entire enterprise. The same customer information that the sales force tracks may also be tracked by the customer service department, the accounting department, and the quality assurance department.

In the detailed system model, John Doe finds that some additional needs are important to the sales force. They may be important to other parts of the enterprise as well and thus be integrated back into the enterprise data model. Because it is not known if the other parts of the enterprise are willing to maintain this information, these information requirements may be included only in the specific application's model.

As an example, suppose that John Doe finds that the majority of the entities, relationships, and attributes listed within the enterprise data model are applicable for the sales force. He finds, though, that there are additional needs as well, shown in Figure 15.4. For instance, John Doe finds that there is an additional attribute, use permission ind, to the CUSTOMER CONTACT METHOD PURPOSE entity for purposes of gaining permission from customers to make “sales solicitation” calls to them. This was a process requirement that was outlined in the detailed process breakdown, and it is to show that the customer has granted permission to contact them for a particular purpose. For example, a customer may have approved being contacted at a particular phone number for a CONTACT METHOD PURPOSE TYPE of “sales solicitation.” XYZ needs this requirement for customer contacts because the enterprise holds it important that customers who have not given permission are not solicited by phone. In many other parts of the enterprise, this may not be required; thus, the attribute may not be included in the enterprise data model.

Figure 15.4 Detailed model for the sales force.

Another difference between the enterprise data model and the specific application model is that the enterprise tracks the CONTACT METHODs for all BUSINESS ENTITYs regardless of the role. The specific application is interested only in these CONTACT METHODs for their customers (in this case, they consider prospects to be a subtype of customers). The application-specific model may therefore model the CONTACT METHODs for CUSTOMERs and relate CUSTOMERs back to a BUSINESS ENTITY, thereby staying within the data structures of the enterprise data model. (This relationship from CUSTOMER to BUSINESS ENTITY can be implemented by using a foreign key to a business entity ID.)

Typically the logical data model will cover the generic structures of the enterprise data in greater detail and in a more explicit way. John found that there was an additional need in this application to create a CONTACT MECHANISM TYPE FORMAT entity to provide additional validation for different types of contact mechanism. For example a CONTACT MECHANISM TYPE FORMAT for “electronic mail address” may have an “@” as the second string format to identify that this is a necessary part of the second part of the string. A CONTACT METHOD TYPE of “phone number within country” may have a first string format of “###” to indicate that this type of contact mechanism must start with three numbers (i.e., the area code) in order to be valid. This additional information requirement was to eliminate errors and to supply XYZ with extremely accurate information.

It is also important to keep the detail explicit enough to support a robust solution. The depth of detail required is set by the requirements of the system. This is an important area, as this will determine the usability of the system. Once the details are captured, the information needs are also detailed. In the example, the details for a CONTACT METHOD were enhanced to support additional requirements. In this discussion, it is important to underline that there is no one way to approach a given problem, but it is based on the needs identified. The Universal Data Models support a guideline approach and will give the group using them a starting point as well as additional ideas to consider in developing the application.

Once all the requirements are understood and modeled, the business and design teams can review them to ensure that everything that is required is covered. It is at this time that any adjustments need to be made to cover all information. The design team should now accept the design as complete to move to the development of a physical database design that can be implemented in a database to support an applications.

Physical Database Design

The Universal Data Models can be used as the basis for an effective physical database design. The flexibility built into them allows for applications to be more stable and more easily maintained. It is important to provide flexible databases because the business rules of an enterprise may change over time, and the database design should be capable of handling many of these changes without restructuring the database design (which is an extremely costly process). Following the basic database principles itemized here, a successful design can be accomplished for any database. These are given as an overview to assist in creating a solid design.

Basic Database Design Principles

Any database design needs to be based on a solid logical datamodel. This is a critical step to ensure that the system developed meets the expectations of the business community using it. The Universal Data Models can be used to facilitate creation of the needed logical data models. Those models can then be implemented in a physical database design. The physical database design implements the information requirements of the logical data model while considering database performance for the selected database management system. A logical data model may be physically implemented many different ways depending on how the data will be maintained and accessed. The processes behind the data are critical to understanding how the data will be used and what the physical requirements of the data are: currency, update frequency, volume of transactions, and retention. It is also important to understand if the data will be used for analytical capabilities, such as a data warehouse or mart. These are some of the main considerations that need to be discussed.

Another key consideration is the type of database engine that will be used for the database. Important information is needed in order to support a design, as each has its special considerations. The requirements of the enterprise's systems should drive the choice of database platforms. Each company should have architectural guidelines to assist in the database choice. Once determined, a skilled database administrator (DBA) should work with the data modeler to create the required physical database design.

Using the logical data model that is based on the Universal Data Models, the physical design can be derived using standard principles. The goal of the following section is to show how the logical data models, built using Universal Data Models as a starting point, can be used to create a physical database design that encompasses performance and implementation considerations. The following section will focus on several examples of how the Universal Data Models can be physically implemented as well as some standard practices used for converting a logical data model into a physical database design.

The database design should follow from the logical data model. The reasons are clear as the important efforts to maintain the business view and requirements are reflected in the logical model. Because the logical data model is linked to the process, the database administrator will be able to understand the usage, currency, security, and other factors required in the design. When using a modeled approach, the assumptions are easily tested prior to physical database creation and loading.

The logical data model should be normalized to third-normal form (3NF) to eliminate any redundancy. This means that each attribute is stored only once and is directly associated with the key, the whole key and nothing but the primary key. The models in this book are in third-normal form since each attribute is associated with an entity with a key that determines that attribute. For instance, the non-solicitation_ind is an attribute of the CUSTOMER CONTACT METHOD since it can be determined from the primary key of that entity, namely, the customer id, contact method id and from date. Stated differently, if one knows the primary key, then one can determine the attribute.

Once physical design is begun, the process of denormalizing begins. The physical database design may not be normalized and may include redundant attributes for speed in maintaining or accessing data. For instance, keeping the contact method fields, such as addressl, in the CUSTOMER CONTACT METHOD entity will result in redundant data, but performance may be improved. It is best to denormalize based on performance issues: number of table joins, indexes, number of queries, updates, inserts, etc. At this point, the needs to support the process are considered and factored into the performance requirements. As in the previous step, the data is modeled and the assumptions tested prior to the actual creation and loading of data.

An important task in creating the database from the physical database design is the mapping of existing data to the new or enhanced structure. Transformation of data will need to be accounted for as well as when the data will be loaded in the process.

Many tools exist today to create the needed code to generate the database from a modeled solution. Once created, the required data objects are loaded into the DBMS, and the database is ready for data loading. Application testing can begin once all data is loaded to the DBMS. Modifications should be made at this point, beginning with the enterprise and logical data models, as new information requirements are discovered. Once the system is tested for the suitability of the application, the database is ready for production.

Creating a Physical Database Design

The logical data models in this book do not represent physical database designs; they illustrate the information needs of an enterprise. The physical database design converts the information requirements represented by the logical data model into a design for the database that can be implemented.

The main difference between the logical data models and the physical database design is that the latter may be optimized for performance. The physical database designer uses the logical database design as a starting point for the database design and denormalizes the structures where appropriate for performance and ease-of-access reasons. For instance, derived data may be included, tables may be merged, and arrays of data may replace one-to-many relationships in certain circumstances.

Several different ways exist to convert the same logical data model into various physical database designs. Many of the design decisions depend on transaction frequencies, use of the data, data volume statistics, and the chosen relational database management system (RDBMS).

Additionally, supertypes and subtypes from the logical data model may be implemented in many different ways in a physical database design:

1. The entire supertype with its subtypes may be implemented as a single table with a relationship to a look-up table to indicate the subtype. For example, the PARTY, PERSON, and ORGANIZATION entities could be implemented as a PARTY table with a look-up to a PARTY TYPE to indicate if it is a “person” or an “organization.”

2. Each subtype may be established as a separate table with the supertype attributes included in each table. For example, the PARTY entity with its subtypes of PERSON and ORGANIZATION could be implemented as a PERSON table and an ORGANIZATION table. Any relationships to PARTY would now point to the PERSON table, ORGANIZATION table, or both, and any attributes of PARTY would be columns in both tables.

3. The supertype and one subtype may be merged into one table, and the other subtypes may be implemented as their own tables. For example, the PARTY, PERSON, and ORGANIZATION entities could be implemented as a PARTY table that includes all the attributes of PARTY and all the attributes of PERSON. In this scenario ORGANIZATION would be a separate table and related to the PARTY table. This design assumes that the PERSON table is accessed much more often than the ORGANIZATION table. And that is why the ORGANIZATION table is split out.

4. The supertype may be implemented as one table, and each subtype may be implemented in separate tables. In this physical database design, the PARTY, PERSON, and ORGANIZATION entities would each be implemented as their own tables leading to a PARTY table, PERSON table, and ORGANIZATION tables with relationships from PERSON to PARTY and ORGANIZATION to PARTY.

The physical database designer needs to make these and other decisions to arrive at a physical database design.

Physical Database Design Examples

Based on the previous examples, the different design options available will be explained. We will use the party model from Chapter 2 (V1:2.5) as the target for discussion of physical design. Although physical design options are discussed for this model, all the Universal Data Models may be physically implemented using a similar physical database design approach.

Review of the Party Role and Relationship Model

The next section will briefly review the party role and relationship model from Chapter 2 (V1:2.5); the following sections will provide examples of possible physical database designs for this model.

As explained in Chapter 2, each PARTY may participate in many different ROLE TYPEs that describe various roles of the PARTY. These ROLE TYPEs are influenced by the type of PARTY, which may be subtyped into PERSON or ORGANIZATION. Some ROLE TYPEs may be for either a PERSON or an ORGANIZATION, specifically SHAREHOLDER, CUSTOMER, and PROSPECT, while others are specific PERSON ROLEs or ORGANIZATION ROLEs. (See Figure 15.5.)

As also explained in Chapter 2, PARTY RELATIONSHIP stores the information about the relationship between two parties such as the relationship status, priority, or communication events within the relationship. The CUSTOMER RELATIONSHIP is an example of a party relationship and at XYZ Company, one of the most critical information requirements is tracking and maintaining this relationship. There are other potential PARTY RELATIONSHIP subtypes, such as EMPLOYMENT, ORGANIZATION ROLLUP, and many other party relationships and corresponding PARTY ROLEs; however, the following discussion will focus on the physical implementation of the PARTY, PERSON, ORGANIZATION, CUSTOMER, EMPLOYEE, INTERNAL ORGANIZATION, CUSTOMER RELATIONSHIP EMPLOYMET and ORGANIZATION ROLLUP entities.

The enterprise data model and logical data model development efforts are normally followed for this model as described by the previous discussions on the contact mechanism model. The upcoming sections will discuss physical database design options for implementing the model, assuming that the enterprise and/or logical data modeling efforts have already occurred.

Figure 15.5 Specific party relationships.

Party Roles and Relationships Physical Design, Option 1

Figure 15.6 illustrates a database design for the party roles and relationships model just described and mostly uses the second strategy of implementing sub-typing discussed earlier. Namely, each subtype may be established as a separate table with the supertype attributes included in each table.

Figure 15.6 Party roles and relationships physical design, Option 1.

Separate tables are set up for the PERSON and ORGANIZATION subtypes instead of implementing a PARTY table. This helps performance because the number of rows in each table is far less than one big PARTY table. The disadvantage is that when either a person or an organization is related to another entity, such as the relationship to CUSTOMER or relationships from both people and organization to ORDERs, then the design is more complex, and the resulting system tends to be complex. For example, the application would have to reference either a PERSON or ORGANIZATION that is related to the ORDER and applications would need additional logic instead of simply showing the PARTY that was related to the order.

Separate tables are also set up for some of the PARTY ROLEs such as EMPLOYEE and CUSTOMER. This is a very practical implementation because most of the PARTY ROLEs are viewed as needing their own tables by most application development efforts. Even though there is a separate CUSTOMER and EMPLOYEE table, the holistic, integrated perspective is preserved because each of these tables is linked to the PERSON and/or ORGANIZATION that plays the role. Therefore, a PERSON could still play multiple roles without storing that person's information redundantly. In this example, the PERSON could act as a CUSTOMER or could act as an EMPLOYEE, which would be valuable to know. Perhaps EMPLOYEEs that are CUSTOMERs are entitled to special considerations such as discounts, or perhaps they are, at least, given special courtesies.

Each CUSTOMER has its own customer id and certain information that is unique to the CUSTOMER role such as last contact date, which is the last date that the customer was contacted by anyone in the enterprise. The BILL TO CUSTOMER and SHIP TO CUSTOMER subtypes were converted to a customer type id foreign key to the CUSTOMER TYPE table, indicating if the CUSTOMER TYPE description is “bill-to customer” or “ship-to customer.” This is an example of using the first subtype implementation option because there is one table for the supertype and the subtypes. (The entire supertype with its subtypes may be implemented as a single table.)

Each EMPLOYEE has its own employee id as well as information that is appropriate to maintain for that specific role. For instance, the enterprise has deemed that the social security number and mother's maiden name will be maintained only for employees.

Attributes about people that are needed in more than one role are maintained in the PERSON table. This preserves the holistic principle that information about people should be stored once and not redundantly duplicated for each role they play. For instance, the current first name, current last name, current middle name, alias name, preferred name, and birthdate of a person may be needed for many different roles and in many different applications. Rather than repeat this information, for instance, by storing the employee name and the customer name in the CUSTOMER and EMPLOYEE tables (and repeating the name in any other role tables that are set up for people), the design recognizes that this is information about the PERSON so each EMPLOYEE and/or CUSTOMER is related to the PERSON, allowing access to this information.

A possible objection to this design for a specific database is that it is not performance oriented because a join needs to take place from CUSTOMER to PERSON to access the customer name, and a similar join from EMPLOYEE to PERSON needs to occur to access the employee name. The CUSTOMER table could include customer name attributes, and the EMPLOYEE table could include employee name attributes. There is, however, a drawback. What if the person is both an employee and a customer? This would mean storing the name twice and possibly storing it inconsistently in the case of a name change.

The database design could provide for a view of each role to its PERSON or ORGANIZATION table so that the required join from the role table to the person or organization table does not complicate the database query. Of course, there is still a performance issue involved, and the designer will need to weigh this trade-off against the redundant data possibility, which could have significant business impacts.

Even if the customer name attributes were put into the CUSTOMER table and EMPLOYEE name attributes were put into the EMPLOYEE table, at least the common person id foreign key would allow the enterprise to identify the presence of the same party that may play more than one role.

Some of the PARTY ROLE subtypes are designed with slight modifications to the logical data model to improve performance and simplify the physical table designs. INTERNAL ORGANIZATION and ORGANIZATION UNIT were PARTY ROLE subtypes in the logical model and provided information on whether the organization was an internal organization of the enterprise as well as whether the organization acted as a department, division, subsidiary, parent company, or other type of role. The physical database design handles the INTERNAL ORGANIZATION information requirements by maintaining an internal organization ind(icator) attribute that specifies if the organization is part of the enterprise (a “yes” value) or not (a “no” value).

The physical database design handles the ORGANIZATION UNIT requirement by relating each ORGANIZATION to an ORGANIZATION TYPE description of “department,” “division,” “subsidiary,” and “parent organization.” The relationships between departments, divisions, subsidiaries, parent organizations, and other organization units is handled through the ORGANIZATION ROLLUP entity, which was a subtype of PARTY RELATIONSHIP.

Each of the PARTY RELATIONSHIP entities is implemented as a separate table, and they inherit the attributes and any relationships of the supertype. (This is the second subtype implementation strategy—each subtype may be established as a separate table with the supertype attributes included in each table.) This is a practical implementation showing the EMPLOYMENT, CUSTOMER RELATIONSHIP and ORGANIZATION ROLLUPs as separate tables. Although they are implemented as separate tables, the enterprise model points out that they share common information, such as from date and thru date, as well as attributes not shown in the figure, such as status, priority, and communication events. The model therefore helps point out common information that should be considered for any PARTY RELATIONSHIP subtype table.

Example Data for Physical Database Design,

Option 1

The following tables provide examples of the data that may be stored in the PERSON, ORGANIZATION, CUSTOMER, EMPLOYEE, CUSTOMER RELATIONSHIP, EMPLOYMENT, and ORGANIZATION ROLLUP.

Table 15.3 shows the contents of the PERSON table. In this, the common information regarding the PERSON is defined, such as the PERSON ID, a generated unique identifier, used as the key to the table. Each instance has the appropriate current first name and current last name, “John Doe,” “Mary Smith,” “Joe Jones,” “John Jones,” “Jane Doe,” and “K Smith.” These columns in the table are required. The current middle name and alias name are optional and are filled in only when necessary. In the majority of cases, the current middle name contains the first initial for rows “1234,” “1345,” and “7890.” Rows “9900” and “7823” have left this blank, as this information was not applicable. For entry “6723,” the value in this case is “Frank,” as the PERSON's name is “K Frank Smith,” and the person uses his middle name instead of his first name. Only two rows contain an alias name: Row “9900” has a value of “Jack,” and row “6723” has a value of “Bud.”

Table 15.3 Person Table

Table 15.4 provides a few examples of organizations with which the enterprise is concerned. The organization ID of each organization is stored as well as the name and federal tax ID; thus, information about each organization may be stored just once and not for each role the organization may play.

Table 15.4 Organization Table

ORGANIZATION ID	ORGANIZATION NAME	FEDERAL TAX ID
8457	Goodcusto, Inc	84-1111-222
8890	ABC Inc.	84-3333-444
8789	Twin Systems	84-6666-777
8821	Consultants Inc.	84-2222-444
8845	DEF Supplies	84-5455-333
9923	XYZ Co	84-7777-444
9924	XYZ Subsidiary	84-7777-456

Table 15.5 provides information about people or organizations that are customers and hence are in the CUSTOMER table. Because either people or organizations may be customers, the customer ID is set up for each customer (perhaps by the sales force) and linked back to either the PERSON person id or the ORGANIZATION organization id, thus providing for the individual needs of the applications using CUSTOMER as well as the enterprise view allowing anyone to see the complete profile for the ORGANIZATION or PERSON, including information about any role they may play.

Table 15.5 Customer Table

The last contact date is a field that may apply only to people or organizations that are customers; therefore, it is a field of the CUSTOMER table. If it is deemed that other applications also need this information, then it should be a field in the PERSON or ORGANIZATION tables, or both. Notice that this is a derived field because it could be figured out from the last COMMUNICATION EVENT (V1:2.12); however, the physical designer has decided to include this as a field for performance reasons. Application code would need to synchronize this date with the last COMMUNICATION EVENT date so that this information is consistent.

Table 15.6 provides information that may be in the EMPLOYEE table. An employee ID may be provided for a particular application, such as a human resources application. Again, the application can store its own data, such as the social security number and mother's maiden name if these are deemed fields that are solely for use within human resources. The fact that each employee id relates to a person id allows the enterprise to link to any other information about that person and thus provide a complete view of each person.

Table 15.6 Employee Table

Notice that Joe Jones (person ID 7890) is an employee of XYZ Company, and he was also identified as a customer in Table 15.5. This type of information can be used to provide better service to Joe, and it can also be beneficial for XYZ company to know how many of their employees are actually customers.

Table 15.7 provides information that may be in the CUSTOMER RELATIONSHIP table. This table links PERSONs or ORGANIZATIONs with the ORGANIZATION for which they are a customer. This allows each customer to have several CUSTOMER RELATIONSHIPs with each of the organizations of the enterprise. For instance, the table shows that Mary Smith has a CUSTOMER RELATIONSHIP with XYZ Company, which has a priority of 1, which is the highest priority. She also has a CUSTOMER RELATIONSHIP with the XYZ Subsidiary organization, where her priority is 5 (perhaps she hasn't done much business with that organization). There may be other information, such as the relationship status, about each relationship that may vary by the relationship even though the customer is the same party.

Table 15.7 Customer Relationship Table

Table 15.8 records information about which employee is employed by which organization. The table shows three employees of XYZ Company. The table could be expanded to include who was an employee of what external organizations; however, the enterprise would need to establish that it had the will and means to maintain this information. A person may be employed many times by a single organization; however, the information in the EMPLOYEE table or in the PERSON table may not change as a result of being employed numerous times.

Table 15.8 Employment Table

Finally, Table 15.9 provides a couple of examples of ORGANIZATION ROLLUPs. The table shows that XYZ Subsidiary (an ORGANIZATION TYPE of “subsidiary”) may be rolled up into XYZ Company (an ORGANIZATION TYPE of “parent organization”). The structures of the enterprises' departments, divisions, subsidiaries, or parent companies may be rolled up in order to show the organization structure of the enterprise at any point in time (the from date and thru date on ORGANIZATION ROLLUP provide for capturing changes). Alternatively, the enterprise can capture the organization structures for external organizations to help find out about these companies. For example, the second row shows that Twin Systems (an external supplier) is a division of Goodcusto (an external customer).

Table 15.9 Organization Roll-up Table

One advantage of this physical database design implementation is that it provides a very practical strategy for different parts of the enterprise to more easily “own” and steward their own information (because it is in separate tables) while providing the infrastructure to enable integrating the various roles to capture a complete profile on either a PERSON or an ORGANIZATION. For instance, the sales force can more easily steward the CUSTOMER specific information while the human resource department can steward the EMPLOYEE information. Both departments may be able to see a more complete profile on a person that is both an employee and a customer.

A disadvantage of this design is that as new roles and relationships become known, it may require new tables. Also each role and relationship may need redundant attributes and relationships instead of being able to relate tables like STATUS TYPE or PRIORITY TYPE to the PARTY ROLEs and PARTY RELATIONSHIPs. Although there are advantages in separating the PERSON and ORGANIZATION subtypes, some redundant structures and relationships may be required in lieu of being able to relate entities to a PARTY entity.

To summarize, each of the PARTY ROLEs and each of the PARTY RELATIONSHIPs can be implemented as separate tables and linked together with common person ids and organization ids to provide an integrated view of people and organizations, while also meeting the needs of specific applications.

Party Roles and Relationships Physical Design, Option 2

Figure 15.7 provides a slightly different version of the previous design for implementing the party roles and relationships model. This design is similar in that most PARTY ROLEs and PARTY RELATIONSHIPs are set up as separate tables. This design shows that instead of separating PERSON and ORGANIZATION into separate tables, a PARTY table stores information on both people and organizations. The data examples would be similar to the first design option except that the party id would link each of the roles to a common profile of people and organizations, instead of linking them with a person id and an organization id.

This provides the benefit of sharing the same data structures between people and organizations. For instance, the same PARTY CONTACT MECHANISM structures can be used for PARTYs instead of having to relate them to both PERSONs and ORGANIZATIONs. Agreements and/or orders can be related to PARTYs instead of both PERSONs and ORGANIZATION. Responsibilities can be assigned to PARTYs, which could be either people or organizations. Many other circumstances exist for which this design is beneficial.

Figure 15.7 Party roles and relationships physical design, Option 2.

The trade-off is that this PARTY table could be quite large if it stores all the people and organizations in the enterprise. Various physical access strategies could be incorporated to handle this issue. For instance, the table could be heavily indexed, or more powerful processors could be used to access this information.

Another option is just to store the party ID as a foreign key in each of the role tables and redundantly store the common party attributes in each of the role tables. At least the enterprise will be able to identify the existence of the same party playing many roles. In this scenario, it is easier to reconcile data inconsistencies because the key to the party is known.

Nevertheless, the designer needs to weigh the clean, normalized, data-consistent design against performance considerations.

Party Roles and Relationships Generic Design, Option 3

The next example will review how to set up a generic overall set of tables to handle all roles and relationships within one model. This will show the flexibility of the Universal Data Models to provide a very flexible database design that will remain stable even when information requirements change.

Figure 15.8 provides a design that incorporates the first strategy for implementing subtypes (the entire supertype with its subtypes may be implemented as a single table with a relationship to a look-up table to indicate the subtype) throughout the design. In this design, it varies from option 1 and is similar to option 2 because it incorporates the PERSON and ORGANIZATION information back into the overall PARTY concept. A major change is that the specific roles played by each party are kept in one table, PARTY ROLE. In this case, the generic table of ROLE TYPE and the various types of roles that parties may play are maintained in the PARTY ROLE TYPE table. Another difference in this design from the previous two designs is that relationships are also placed together into the PARTY RELATIONSHIP table, without separating them into different tables.

This design option could be used for an application such as a staging data warehouse or operational data store. It could also be used in cases where the number of rows is relatively small or where very powerful processors are available to handle this flexible design. The benefit of this design is that it is very flexible and allows any information to be pulled together for overall viewing. Another potential application is where the maintenance and support of the information are centralized and managed by one group. In either case, this design would be appropriate.

As an example, XYZ Company maintains a decision support environment where the information from the various groups is pulled together on a daily basis into an operational data store and then is transferred to the data warehouse at the end of the month. This design is needed to capture all of the data given in the previous examples.

Figure 15.8 Party roles and relationships generic design, Option 3.

As seen before, all the information that XYZ Company has gathered either in the PERSON or ORGANIZATION tables has been moved into the PARTY table. The associated PARTY TYPE has been assigned based on the source of the original data. Included is the current first name, current last name, current middle name, taken from the PERSON table and the organization name from the ORGANIZATION table. Note that the first name and last name for each person have been concatenated to form PARTY NAME.

Table 15.10 Party Table

PARTY ID	PARTY NAME	PARTY TYPE
1234	John Doe	Person
1345	Mary Smith	Person
1567	John Smith	Person
1876	Jerry Wright	Person
7890	Joe Jones	Person
9900	John Jones	Person
7823	Jane Doe	Person
6723	Bud Smith	Person
1567	Ken Harris	Person
1990	Uma Key	Person
5134	Bill Jake	Person
6712	Betty King	Person
7876	Jeff Dane	Person
7890	Linda Kinney	Person
7721	Bob Mason	Person
7723	Larry Ink	Person
8457	Goodcusto, Inc	Organization
8890	ABC Inc.	Organization
8789	Twin Systems	Organization
8821	Consultants Inc.	Organization
8845	DEF Supplies	Organization
9923	XYZ Company	Organization
9924	XYZ Subsidiary	Organization
9925	Accounting department (of XYZ)	Organization

By adding the ROLE TYPE to the PARTY ROLE, as shown in Table 15.12, the information from the PARTY table is now linked to the role each party participates in with XYZ Company. Notice that there is the capability of tracking the many roles for each person; for example, “John Doe” is both a “Prospect” and an “Employee.” “Joe Jones” is a “Customer” and an “Employee,” as is “Jane Doe.” In this case, the ROLE TYPE clarifies the different potential roles that each PARTY has. In addition, the ORGANIZATION type of PARTY plays the role of a SUPPLIER, as it provides the required people or supplies to XYZ Company.

Table 15.11 Party Role with Role Type

Table 15.12 Party Relationship Table

Figure 15.12 provides an example of various relationships that could be stored in the PARTY RELATIONSHIP table.

Table 15.13 illustrates the flexibility offered with this physical implementation of the party roles and relationships model. As new roles or relationships are discovered, perhaps through new business rules, this database design can support these new roles without any database changes, unlike the previous two implementations, which would require additional tables. Again, the designer needs to evaluate performance considerations versus flexibility and maintainability.

Using the Data Warehouse Models

The data warehouse models presented in this book serve to illustrate examples of how to move from the corporate or logical data models to an enterprise-wide data warehouse data model, then to departmental data models. The most important point of these chapters is to show the transformation process so that enterprises can understand the importance of setting up an appropriate data warehouse architecture.

Chapter 10 presented the transformation steps for converting a corporate data model into a data warehouse data model. Chapter 11 then provided a sample data warehouse data model that was developed using these transformation concepts. These transformation steps should be used selectively; it is not necessary to use all of these steps when developing the data warehouse data model. They serve merely as guidelines.

For instance, denormalizations such as including derived data, merging tables, and creating arrays of data may not always be required, or they may be postponed until later. They should be included only when it is obvious that there are enterprise-wide requirements for the information provided by these transformations.

There has been much debate in the data warehouse community regarding the architecture involved in moving data to a data warehouse environment. Specifically, there seems to be many schools of thought. Some say that it is necessary to first extract data into an enterprise-wide data warehouse, then move it to departmental data warehouses. Others contend that they can simply move the information directly from the operational systems to departmental data warehouses. There are debates about the structure of an enterprise-wide data warehouse and whether it should be a series of interconnected star schemas, or a denormalized version of the enterprise data model.

Should there be an enterprise-wide data warehouse or just extractions into departmental data warehouses (i.e., data marts)? Certainly, it is initially easier to move the data directly into the departmental data warehouses. Only one transformation is needed instead of moving the data to an intermediate stage first, then to the departmental data warehouse. Business managers want infor-mation quickly and easily. There is an urgent need to provide the user community with the information it needs. This is exactly why the data warehouse concept has caught on. Why not take the direct route and provide the information directly to the departmental warehouse?

There is a major drawback in moving the information directly into the departmental warehouse. The drawback is not on the first implementation of the departmental data warehouse but on subsequent departmental implementations. If each department moves the data from the operational systems directly into its own departmental data warehouse, there is a great exposure in creating more inconsistent data sources than already exist in the operational systems.

Most enterprises that are creating data warehouses are faced with complex transformation routines that need to cleanse and consolidate data from several operational systems. The largest challenge in creating data warehouses is in deciphering the source of the most reliable data of the enterprise. This is complicated by the fact that the same data is often stored redundantly and inconsistently in most organizations (see Chapter 1 for a discussion on separate and redundant data in enterprises). Therefore, in the transformation process, the data inconsistencies need to be dealt with.

To illustrate the problems of inconsistent data, a project engagement comes to mind where data was being transformed into a new database. When the database design was reviewed, it was noticed that there were two fields for blood type in the person table. The obvious question created some concern: How can there be two blood types for a single person-a person, of course, has only one blood type! When the problem was tracked down, there actually were two blood types for each person; one from system A and the other from system B! It was extremely difficult to find out which system held the more accurate information for each person, so the enterprise decided to keep both values in the system. This is only one example showing the challenge of trying to consolidate information from several sources.

If each department uses its own transformation routines, think of the potential for inconsistent management information. Many of the departments may use the same information regarding people, organizations, products, sales, purchases, projects, and so forth in their decision support environment. The transformation routines are bound to be different for each department, if they are done separately, leading to inconsistent and confusing results. For instance, it is conceivable that a marketing organization may produce executive decision information that may be inconsistent with the data produced by the order processing department if it used different sources or transformations to gather its sales information. This can have devastating effects on an enterprise. The consequences of having inconsistent information between several systems, is that the enterprise may either start to question the credibility of the information, or worse, they act on incorrect information.

The entire enterprise can capitalize on sharing the results behind this difficult process of transforming data. The only way this can happen is by having one central transformation process for decision support. By building an enterprise-wide data warehouse data model, common transformation routines can be built that serve the needs of the entire enterprise. In addition, the enterprise can identify data inconsistencies that are discovered in the transformation process and make a consistent decision on how these discrepancies will be handled. This knowledge can help the enterprise move toward more integrated operational systems by pinpointing existing problems. Finally, the enterprise can save time and money in the long run by doing the transformation process once for the enterprise instead of transforming similar types of data for each departmental data warehouse.

The enterprise needs to be willing to make an investment in its data and information architecture in order to be successful with data warehousing in the long run. An analogy is that it often seems easier to jump in and develop a program without taking the time to formally analyze the requirements or design. Experience has shown, though, that the lack of up-front investment will usually lead to increased programming and maintenance costs over the life of the program.

If an enterprise wants to get a quick head start into the data warehouse arena, another strategy would be to use template data warehouse designs, either from this book or from another source, to create a prototype departmental data warehouse, then load it with a limited amount of data. This allows the enterprise to evaluate the benefits of a decision support environment and make strategic decisions regarding its data warehouse environment. After an initial prototype is built, the enterprise may want to consider building an enterprise-wide data warehouse to support future departmental information needs.

Summary

This chapter discussed how to implement the Universal Data Models to support various types of systems development efforts: models to provide an integrated view of the enterprise, models required to design a specific application, and physical database design models used to generate a database. The chapter also looked at strategies for implementing a data warehouse. The design points to emphasize are the following:

Establish an enterprise data model in order to facilitate integration of information across the enterprise and also to gain understanding of the information requirements of the enterprise.
Maintain contact with the business experts to adequately capture the business needs.
Review the models with the business to ensure that the business understanding is adequately captured.
Using the enterprise data model as the basis for building a logical data model for a specific application. This can facilitate better integration of that application into the overall system of the enterprise, thus enabling more effective sharing and communication of information. It can also provide a jump-start to the modeling effort for that application by allowing reuse of data structures that have already been modeled.
Use process models as another source of input into developing a logical data model. Automate the business processes based on a clear understanding of them.
Create the required physical database design models based upon the logical data models and with performance considerations in mind for the intended database management system (DBMS).
Convert subtypes from the logical data models to a physical database design, choosing from among the four approaches shown in this chapter.
Use the enterprise and logical data models as the basis for building an integrated decision support (data warehouse) environment.

As seen in the examples, there are a number of ways to implement each Universal Data Model. By following clearly defined design principles, the Universal Data Models can be used and adapted to build the physical design. After applying the needed database-specific requirements, the design can be used to generate a physical database schema that is ready for loading and testing. The Universal Data Models can then also serve as the basis for a data warehouse design and implementation.

The intent of this last chapter has been to illustrate how Universal Data Models can be used to build quality, integrated database implementations.

We hope that this is just the beginning of more widespread efforts toward building Universal Data Models and that the information systems industry will continue to develop more reusable models. The results of these efforts will allow developers to shorten systems development cycles and produce higher-quality, better integrated information systems at reduced cost for the user community at large.

For More Information

We encourage and would appreciate any suggestions, questions, or comments to help the further development of Universal Data Models, data integration, or holistic systems.

To find out more information, ask questions, contribute, or if we can be of further assistance to you, please feel free to e-mail us at [email protected], visit the book's companion Web site at silverston.wiley.com, or visit www.universaldatamodels.com.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.

Table of Contents for Implementing the Universal Data Models

Create new playlist

Sign In

Sign Up

Table of Contents for
Implementing the Universal Data Models