CHAPTER 17

Multiple Data Warehouses across a Large Enterprise

When people set out to build a data warehouse and a corporate information factory, they usually have a limited scope in mind. What happens when more than one data warehouse and more than one corporate information factory are built across the enterprise? Integrating data across an entire enterprise makes problems even more complex. Large, complex, and diverse enterprises that contain multiple data warehouses and corporate information factories are found in many places, including

•   State and national governments

•   Multinational organizations

•   Organizations with multiple vertical lines of business

Multiple data warehouses and data marts can grow up in large complex enterprise environments in which multiple corporate information factories coexist. Trying to apply the principles of data warehousing across such a grand vista is a challenging task. Although it appears to offer the promise of true enterprise integration, integrating data warehousing and the corporate information factories across the enterprise is difficult to implement given the complexity of these environments. This chapter will address these difficulties and show how data warehouses and the corporate information factory can be applied across the entire enterprise environment to achieve enterprise-wide integration.

Define the Need for Integration

The first step toward achieving an enterprise architecture is to determine whether business integration exists, or is needed, across different parts of the enterprise. In some cases business integration already exists. In other cases, a need for business integration must be met. It makes no sense to build (or attempt to build) integration into the systems of the enterprise framework across multiple data warehouses and corporate information factories when no business justification can be made for such integration.

Suppose that a bank has a data warehouse for retail customers, a data warehouse for trust customers, and another data warehouse for home loans. Based on business need, the bank decides to build an integrated data warehouse and a corporate information factory for all lines of business in order to integrate customers and processes across those lines. Does this type of enterprise data warehouse make sense? Absolutely!

Suppose that the bank has foreclosed on a steel mill. Does it make sense to try to integrate the customers of the steel mill with bank loan customers? Only if the bank is trying to get into the steel-making business does such business integration make sense. When business integration across the enterprise does not make sense, a data warehouse reflecting that integration correspondingly does not make sense.

Define the Enterprise Framework

In order to create the larger enterprise framework consisting of multiple data warehouses and corporate information factories, it is necessary to introduce new definitions and conventions for portraying the corporate enterprise framework. These definitions and conventions provide a common language by which to communicate the framework.

Business Domain

Every data warehouse has a business domain. A business domain for a data warehouse consists of the systems, procedures, transactions, and activities in support of the data warehouse in which data is captured, edited, adjusted, audited, and created. The business domain includes both systems and transactional activities conducted by the corporation. It includes the operational users, the analytical users, and their activities. Figure 17.1 depicts the business domain of a data warehouse.

The business domain of a data warehouse includes the places where data is created and managed. For example, in a manufacturing data warehouse, the business domain might start where an assembly line creates a product, or the manufacturing business domain could include the loading dock of the manufacturing corporation where raw and unfinished goods are received. Or the manufacturing business domain could include the systems that manage the shipment of products out of the manufacturing plant. In short, everywhere detailed data is created and then shipped to the data warehouse is part of the business domain of a data warehouse.

image

Figure 17.1  Each data warehouse serves a business domain.

The business domain of a data warehouse also includes systems that create and use secondary, nondetailed data. These secondary systems are typically analytical systems, which take detailed data and subtype or supertype it to aid analysis. The result is aggregated or summarized data; in other words, nondetailed data whose basis is detailed data belonging to the business domain of the data warehouse. This nondetailed, analysis-oriented data also belongs to the business domain of the data warehouse.

As an example of nondetailed data belonging to the business domain of a manufacturer, consider an accounting system that groups together the shipments of finished goods that go out the door. The resulting data, monthly total product, is a type of nondetailed data belonging to the business domain of the manufacturing data warehouse.
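The roll-up from detailed shipments to a monthly total can be sketched as follows. This is only an illustration: the record layout (ship date, product, units) and the function name are assumptions and do not come from any particular accounting system.

```python
from collections import defaultdict
from datetime import date

# Hypothetical detailed shipment records: (ship_date, product, units).
shipments = [
    (date(2001, 1, 5), "widget", 100),
    (date(2001, 1, 20), "widget", 50),
    (date(2001, 2, 3), "widget", 75),
    (date(2001, 1, 9), "gear", 10),
]


def monthly_total_product(rows):
    """Summarize detailed shipments into nondetailed monthly totals."""
    totals = defaultdict(int)
    for ship_date, product, units in rows:
        # The detailed rows remain the basis of each summarized figure.
        totals[(ship_date.year, ship_date.month, product)] += units
    return dict(totals)


totals = monthly_total_product(shipments)
```

Both the detailed rows and the summarized totals belong to the same business domain; only the level of detail differs.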

Occurrences and Types of Data

A “type of” data refers to a generic classification of business domain data and is the specification that the data warehouse designer produces. Collectively, the different “types of” data form a template describing the content and structure of data that is of interest within the business domain. For example, the data warehouse designer may specify that one type of data is an “employee.”

Occurrences of data are fundamentally different from “type of” data. The occurrences of data refer to the actual rows of data that populate the systems. Figure 17.2 shows the difference between an occurrence of data and a type of data. For example, if the type of data is employee, the occurrences of the type might include:

•   “Bill Inmon”

•   “Claudia Imhoff”

•   “Joyce Montanari”

•   “Dan Meers”

•   “Bob Terdeman”

•   “Jon Geiger”

image

Figure 17.2  The difference between occurrences of data and “type of” data.
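For readers who think in code, the distinction can be sketched as a class definition (the type) versus its instances (the occurrences). The single `name` element is an assumption made to keep the illustration small; a real employee type would specify many more elements.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class Employee:
    """The "type of" data: a template specified by the designer."""
    name: str


# Occurrences: the actual rows that populate the system.
occurrences = [
    Employee("Bill Inmon"),
    Employee("Claudia Imhoff"),
    Employee("Joyce Montanari"),
]

# The type describes structure only; it holds no rows of its own.
type_elements = [f.name for f in fields(Employee)]
```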

Owned Data

Every unit of data in a data warehouse has exactly one owner. Ownership can occur in two ways: either data occurrences are owned, or data elements within a data type are owned. In both cases, every data occurrence and every “type of” data is owned by one owner. Figure 17.3 illustrates this concept.

For example, a data warehouse may own the right to create or delete the following records in the data warehouse:

•   “Bill Inmon”

•   “Claudia Imhoff”

•   “Joyce Montanari”

•   “Dan Meers”

•   “Bob Terdeman”

•   “Jon Geiger”

The owner of the data warehouse would then have the ability to throw away the entire record for Joyce Montanari, or any other record.

image

Figure 17.3  Different end-user communities “own” the right to update and manage data.

The other type of ownership is by element within type. It is possible for data warehouse ABC to own the occurrences for these people and for data warehouse BCD to own a data element within the record for each occurrence. For example, the state-wide data warehouse owns the records for:

•   “Bill Inmon”

•   “Claudia Imhoff”

•   “Joyce Montanari”

•   “Dan Meers”

•   “Bob Terdeman”

•   “Jon Geiger”

The Department of Motor Vehicles, however, owns the driver’s license data inside each record:

•   “Bill Inmon,” license 2278665, no tickets

•   “Claudia Imhoff,” license 3376229, one moving violation

•   “Joyce Montanari,” license 3398117, no tickets

•   “Dan Meers,” license 1187227, three parking tickets

•   “Bob Terdeman,” license 2289917, expired Jan. 15

•   “Jon Geiger,” license 2287655, no tickets

Where the owner of the data resides outside of the business domain, coordination problems must be worked out:

•   What happens when there is a residency record but no driving record?

•   What happens when there is a driving record and no residency record?

•   What happens when the residency of an individual is dropped? Is the driving record dropped as well?
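The first two questions can at least be detected mechanically. The following is a minimal sketch, assuming that residency and driving records can be matched on a common key (here, simply the person’s name, which is a simplification); what to do about each gap remains a policy decision between the two owners.

```python
# Hypothetical key sets; a real system would match on a shared identifier.
residency = {"Bill Inmon", "Claudia Imhoff", "Joyce Montanari"}  # state-wide warehouse
driving = {"Claudia Imhoff", "Joyce Montanari", "Dan Meers"}     # Department of Motor Vehicles


def coordination_gaps(residency_keys, driving_keys):
    """Report records that exist under one owner but not the other."""
    return {
        "residency_without_driving": sorted(residency_keys - driving_keys),
        "driving_without_residency": sorted(driving_keys - residency_keys),
    }


gaps = coordination_gaps(residency, driving)
```

The third question, whether dropping a residency record should also drop the driving record, cannot be answered by a comparison like this one; it has to be agreed on by the owners.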

Ownership can also be classified as physical or content. A database administrator typically owns the physical data; when recovery needs to be done, the database administrator is called upon. The content of the data is owned as well.

For example, when accounting finds that an error has occurred in the data warehouse, it is accounting’s responsibility to repair the data content.

Shared Data

There is only one owner of data at any moment in time. As such, data that is owned can exist in only one place in the enterprise. Sharing has no such limit: owned data may not be shared at all, or it may be shared in one place, two places, or any number of places across the enterprise.

The data owner holds the right to create, delete, and modify the data in the warehouse. No one else has these rights. Once data is created and ownership is established, the data can be moved to another data warehouse. When owned data is moved to a separate warehouse, the data that is moved is called “shared data.” Shared data carries the identity of the actual owner with it. Because the actual owner is attached, data can be reshared with no loss of integrity. Additionally, data that is shared across data warehouses carries with it the date of sharing. This date is important because the data owner may decide to alter the data at a later point in time. When the data that has been shared is used, it is assumed that the data is accurate only as of the date of sharing.

In order to keep shared data up to date, the organization holding the shared data must periodically return to the owner to make sure that the shared data is in sync with the owned data. Figure 17.4 illustrates shared data and owned data.
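The conventions for shared data can be sketched as a small record that carries the owner and the date of sharing along with the value. All names here (`SharedUnit`, `share`, `reshare`, `in_sync`, and the sample keys) are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class SharedUnit:
    key: str
    value: str
    owner: str       # the actual owner travels with the data
    shared_on: date  # the value is assumed accurate only as of this date


def share(owned, key, owner, today):
    """Package one owned unit of data for another warehouse."""
    return SharedUnit(key, owned[key], owner, today)


def reshare(unit):
    """Resharing passes the unit on unchanged, so the true owner is kept."""
    return unit


def in_sync(unit, owned):
    """Return True if the shared copy still matches the owner's data."""
    return owned.get(unit.key) == unit.value


owned = {"cust-1": "Denver"}
copy = share(owned, "cust-1", "retail_dw", date(2001, 3, 1))
owned["cust-1"] = "Boulder"  # the owner alters the data at a later date
stale = not in_sync(copy, owned)
```

The shared copy is frozen on purpose: only the owner may modify the data, so a stale copy is replaced by returning to the owner rather than edited in place.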

image

Figure 17.4  Owned versus shared data.

Sharing Data across Multiple Data Warehouses

The essence of the multiple related data warehouse environment is the ability to share data across the environments while maintaining the integrity of data ownership. Many forms of data sharing occur across the multiple data warehouse environments. Some of the more common ways that data can be shared are presented here.

Simple Sharing

The simplest way that data can be shared across multiple data warehouse environments is when one data warehouse sends a simple, single unit of data to another warehouse. Packaged with the single data unit is the sharing date and owner specification. Figure 17.5 shows this simple form of sharing.

In Figure 17.5, one data warehouse—the owning data warehouse—has simply packaged up a unit of data and passed it to another warehouse. When the data arrives at the receiving warehouse, it is incorporated wherever it makes sense.

As an example of the simple data passage from one data warehouse to another, consider a single manufacturer of vehicles. One division manufactures cars, and the other division manufactures motorcycles. The manufacturer decides that letting each division maintain its own separate relationships with external vendors might not be a good idea. A single relationship to external vendors might make sense for several reasons:

•   There might be a discount given for consolidated orders.

•   There is a single point of contact if there are problems.

•   There is the possibility of consolidated shipping and storage.

image

Figure 17.5  Simple sharing.

Clearly, there is a real case for business integration across the car and motorcycle manufacturing divisions. To consolidate relations with outside suppliers, the motorcycle division sends some basic information to the car manufacturing division, where the corporate-wide consolidation will be done. This information includes:

•   The part that is received from an external manufacturer

•   The number of parts that have been received

•   Where the parts have been received and where they are stored

Related Occurrences of Data

Sharing multiple related occurrences of data increases the number of records that are passed. Figure 17.6 shows this type of sharing.

As an example of the sharing of multiple related occurrences of data, suppose that a car distributorship sends a monthly record of the automobiles that have been delivered to the automobile manufacturer. This is the same as the simple passage of data except that it occurs on a monthly basis, and over time, multiple shared records are passed.

Or consider the case of both a motorcycle manufacturer and an automobile manufacturer. In this case, the motorcycle manufacturer sends information every time a shipment is received from an external supplier. A historical record is made of the shipments.

image

Figure 17.6  Multiple related occurrences of data that are shared.

Other Relationships

In these two types of shared data, data occurrences are simply passed from one data warehouse environment to the next. On some occasions, however, the data that is passed references a tighter relationship with one environment or the other. This type of sharing, called a grounded relationship, is very similar to the OLTP notion of referential integrity. There is a substantial difference, however, between grounded data warehouse sharing and referential integrity: in the data warehouse, the relationships are all referenced in relation to some moment in time, whereas in OLTP referential integrity the relationship is understood to be active and ongoing.

Grounded Relationships: The Sender Is Grounded

As an example of a grounded relationship in which the sending data warehouse environment is grounded, look at Figure 17.7.

In a grounded relationship, a direct relationship exists between the record of shared data and some single record in the data warehouse being referenced. Grounding can reference the owning data warehouse (the normal case) or the sharing warehouse (the not-so-normal case).

In Figure 17.7, the automobile distributor sends information about shipments to the manufacturer. In this case, the information sent to the manufacturer includes the actual distributorship that received an allotment of cars. In other words, the distributor has added specific instruction information to the data before shipping it to the manufacturer.

image

Figure 17.7  A record that is shared can be grounded to the owner, forming a simple relationship.

image

Figure 17.8  Once the shared data is received, it can have further grounding added to it.

image

Figure 17.9  Verification programs ensure that specified relationships are valid.

Grounded Relationships: The Receiver Is Grounded

After the data is sent by the owning data warehouse, it can have further grounding attached to it. Figure 17.8 shows a case in which the information is further processed after being received from an external data warehouse.

In Figure 17.8, data passed to the manufacturer is further processed to attach the plant where the car was originally manufactured to the distributor’s data.

Verifying the Integrity of the Grounded Relationship

In order to make sure that the integrity of the grounded relationships remains intact, it is often wise to create and execute programs that verify the integrity of the grounded relationship. Figure 17.9 shows the programs that read the shared data and that make sure the data that is referenced in the data warehouse is indeed valid.

Figure 17.9 shows that a program reads the distributor’s shared data and makes sure that a distributorship that is referenced in the shared data actually exists.
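A verification program of this kind can be very small. The sketch below assumes that each shared record carries a `distributorship_id` element and that the referenced warehouse can supply the set of distributorships it knows about; both the element name and the sample identifiers are hypothetical.

```python
def verify_grounding(shared_records, known_distributorships):
    """Return the shared records whose referenced distributorship does not exist."""
    return [
        record
        for record in shared_records
        if record["distributorship_id"] not in known_distributorships
    ]


known = {"D-100", "D-200"}
shared = [
    {"shipment": 1, "distributorship_id": "D-100"},
    {"shipment": 2, "distributorship_id": "D-999"},  # references nothing
]
orphans = verify_grounding(shared, known)
```

Because a grounded relationship is valid only as of some moment in time, a check like this reports the state of the relationship when it runs; it is executed periodically rather than enforced on every update as OLTP referential integrity would be.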

Define the System of Record

Applying the foregoing data conventions creates what can be termed the system of record for enterprise warehouse data across the enterprise:

•   Ensuring that data is owned by only one data warehouse or business domain

•   Ensuring that data that is not owned is shared

•   Ensuring that shared data carries with it the date of sharing

•   Ensuring that ownership can be at the occurrence level or the element level within type of data

•   Ensuring the integrity of a grounded relationship

With the system of record for the data warehouse environment, only one owner exists for each occurrence of data, and only one owner exists for each data element within a data type.

Establishing and maintaining a system of record for the data warehouse environment across the enterprise, then, is the key to being able to create and manage data in an enterprise-wide integrated environment.
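The single-owner rule lends itself to a simple audit. The following is a sketch only, assuming that ownership claims can be collected as pairs of data unit and owner, where a data unit is either an occurrence or an element within an occurrence; the tuple layout and the warehouse names are illustrative assumptions.

```python
from collections import defaultdict


def ownership_violations(claims):
    """Return the data units that are claimed by more than one owner."""
    owners = defaultdict(set)
    for unit, owner in claims:
        owners[unit].add(owner)
    return {unit: sorted(names) for unit, names in owners.items() if len(names) > 1}


claims = [
    (("resident", "Bill Inmon"), "state_dw"),           # occurrence-level ownership
    (("resident", "Bill Inmon", "license"), "dmv_dw"),  # element-level ownership
    (("resident", "Dan Meers"), "state_dw"),
    (("resident", "Dan Meers"), "dmv_dw"),              # conflict: two owners
]
violations = ownership_violations(claims)
```

Occurrence-level and element-level ownership do not conflict with each other here, because they name different data units; only two owners claiming the same unit violates the system of record.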

The concept of the system of record is not new. In the beginning of the data warehouse experience there was the system of record for the data source going into the data warehouse. Figure 17.10 shows the system of record for source data.

In Figure 17.10, the data source flowing into the data warehouse is carefully identified and outlined. Some legacy data goes into the data warehouse, and other legacy data does not. Some legacy data goes into the data warehouse only under certain conditions. Other legacy data goes into the data warehouse and needs to be converted, and so forth. The system of record for source data is a carefully documented and conceived statement of how legacy detailed data will flow into and support the data needed in the data warehouse.

image

Figure 17.10  The system of record as it applies to the source of data feeding the warehouse.

The same sort of concept holds for the system of record for data in the data warehouse. The discipline required by the system of record for the data warehouse environment creates an environment in which different data warehouses can operate in a cohesive manner with other warehouses in order to create a true enterprise architecture of data.

Local Data Warehouses

Another important convention is that of the designation of a local data warehouse. A local data warehouse is one in which the business domain for the data warehouse is entirely self-contained. In other words, there is a single, well-defined business domain that is supported by a single data warehouse. In a pure local data warehouse, no data is in the data warehouse that is owned by another warehouse.

As an example of a local data warehouse, consider a retailer whose warehouse contains detailed information about all of the sales made for the past two years. The warehouse is self-contained, with no references to external systems or data warehouses.

A Variation of a “Pure” Local Data Warehouse

A variation of the local warehouse is the local warehouse with shared data. Figure 17.11 shows such a warehouse.

In Figure 17.11, most data in the local warehouse comes from the business domain, but some data is owned outside the business domain. This data that is placed in the local data warehouse is shared data. The date of acquisition, that is, the date of passage into the data warehouse, is stored with the shared data. In this manner, the less-than-pure local data warehouse is populated with shared data whose source and ownership lie outside the local warehouse.

As an example of an impure local data warehouse, suppose that the retailer with the pure data warehouse decides to allow data into the warehouse whose origin is outside the business domain. The retailer decides to allow shipment information into the retailing data warehouse. The shipment information includes:

•   Who the shipper was

•   The date of shipment

•   The date of receipt

•   Who signed off for the shipment

•   The status of the shipment upon receipt

The external data is collected by an agency other than one inside the business domain of the retailer. The information is then given to the retailer for further consideration. If the information given to the retailer about shipment is incorrect, it is not in the province of the retailer to check to see if the information is valid.

image

Figure 17.11  A local warehouse with shared data.

Global Data Warehouses

A global data warehouse occurs where two or more local business domains and/or local warehouses intersect. Unlike a local warehouse, which has a single business domain, a global warehouse reflects an intersection of multiple business domains. There may or may not be a local data warehouse for each local business domain that is intersected. The global warehouse has the complex task of representing more than one business domain. Figure 17.12 illustrates a global data warehouse.

As an example of a global data warehouse, suppose that a corporation sells to multinational organizations such as IBM, HP, Ford, DuPont, Dow, and others. There is a desire to set up a global data warehouse for sales across the world. Some information in the global warehouse will represent sales across the world, but other information pools contain local sales information that is unique to or peculiar to one locale. There will be a Latin American set of databases for unique Latin American business, a set of databases for China, and another set of databases for the business in France. At the center of these separate worldwide systems will be the global warehouse.

Note that these separate worldwide systems may or may not have a separate local data warehouse. The global data warehouse may function as the intersection of different business domains that have no data warehouse of their own, or it may function where the business domains each have a local data warehouse. Both forms of the global data warehouse are valid.

image

Figure 17.12  An example of a global data warehouse.

Types of Warehouses in the Enterprise

There are then (at least!) six distinct possibilities for the different kinds of warehouses found in the enterprise environment:

  1. A simple local data warehouse in which there is no sharing of business domain

  2. A simple global data warehouse in which there is a sharing of multiple local business domains but no local data warehouses

  3. Multiple local data warehouses in which there is no sharing of data

  4. Multiple local data warehouses in which there is sharing of data

  5. Multiple local data warehouses in which there is intersection of business domain and a global data warehouse

  6. Multiple local data warehouses in which there is intersection of business domain and a global data warehouse and a separate business domain for the global data warehouse

Figure 17.13 shows the six normal possibilities for the states of data warehouses as they exist inside the enterprise.

A Simple Local Data Warehouse

The most basic structuring of a data warehouse is that of the simple local data warehouse, as seen in Figure 17.14.

Figure 17.14 shows that there are basic systems that feed data to the single data warehouse. The legacy application systems move detailed data to the data warehouse through a layer of integration and transformation. The only data found in the data warehouse is that which belongs to the business domain.

As an example of the simple local data warehouse, consider a small, self-sufficient steel manufacturer. The manufacturer is not dispersed geographically nor does the steel manufacturer participate in a vertically rich line of products. Instead, the steel manufacturer has its own suppliers, its own customers, and its own manufacturing and distribution facilities. The data warehouse reflects the steel manufacturer and stands alone.

A Simple Global Data Warehouse

Suppose that a state government wanted to have a nonredundant warehouse environment. The state government recognized that:

•   Some data was common across nearly all state agencies. The data that was common across all state agencies was not very complicated. It consisted of such basic entities as:

    •   Resident

    •   Taxpayer

    •   Property

•   Much of the other data was peculiar to the different state agencies. The Department of Motor Vehicles had traffic data that was unique to and useful only to its department; the Department of Justice had data that was of interest only to its department; the Health and Human Services department had data that was peculiar to its department’s needs; and so forth.

image

Figure 17.13  The six possibilities for data warehouses inside the enterprise.

image

Figure 17.14  A simple local data warehouse.

image

Figure 17.15  A simple global data warehouse.

A global warehouse was constructed in order to reflect the common data across different state agencies, as seen in Figure 17.15.

The different components of the global data warehouse were owned by different agencies. Some agencies owned certain classes of data occurrences. Other agencies owned certain elements of data types. In the end, each occurrence of data was owned by only one agency, and each data element was owned by only one agency. But, all agencies could access and analyze data across the global data warehouse.

image

Figure 17.16  The global warehouse is supported by local operational systems.

In addition, each agency had its own supporting systems that were not directly tied into the global data warehouse. Figure 17.16 shows the departmental supporting systems.

The non-data warehouse supporting systems support the data that is peculiar to the business of the department, which is not part of the global data warehouse.

From an analytical perspective, analysis can be done on both global data warehouse data and non-global data at the same time. An analyst in the Department of Motor Vehicles can look at and report on data from both the global data warehouse and the department’s own non-data-warehouse environment. By the same token, an analyst from the Health and Human Services Department can look at and report on data from the global data warehouse and that department’s own non-global environment at the same time.

Figure 17.17 shows the ability of the analytical component of each department to integrate the different sources of data into the environment, all in the same analysis.

Multiple Unrelated Local Data Warehouses

Consider the Army and the Navy. For all practical purposes, they consider themselves to be entirely separate, at least when it comes to their information systems. From the larger perspective of the enterprise, these different arms of the services have business domains that do not intersect. Because there is no common business interest, there is no intersection.

image

Figure 17.17  DSS at the departmental level is supported by both local systems and the global data warehouse.

image

Figure 17.18  Sharing data where there are no strong intersections.

Relationship between Business Domains

Now suppose that the enterprise approach is taken in a corporation that recognizes that there is an interest in sharing data between different business domains even where there is no common intersection between those domains.

Imagine a large manufacturer who has a manufacturing group and a distribution group. Although the two organizations do not recognize any formal relationship or intersection between them, they do recognize the need to share data. Figure 17.18 depicts the sharing of data across the two groups.

As data is shared, the data ownership is kept intact, as well as the sharing date. If the analyst using the shared data desires to have very current data, then the organization containing the shared data must go back to the owner and make sure that the data contents are as fresh as the analyst desires.

In this manner, two very different groups can share data without losing the integrity of the data.

Intersecting Interests

Now suppose that a company has two groups whose interests intersect. For example, suppose that a company has a European and an American corporation. Some systems and services serve both the European and the American interests. Of course, there will be some systems and some services that serve only European or only American interests. In this case, the global interests are numerous enough to warrant their own warehouse. Figure 17.19 shows this occurrence.

A global warehouse is constructed based on data from the intersecting business domains. Data arrives in the global warehouse from:

•   The local data warehouse

•   Local systems

If data arrives in the global data warehouse from a local data warehouse, it is treated as shared data. The data ownership does not change even though the data exists in both the global and the local data warehouse. When the data source is a local system, however, the global data warehouse becomes the data owner.

Note that ownership can be by occurrence or type in the global data warehouse.
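The ownership rule for data arriving in the global data warehouse can be stated as a short function. This is a sketch under assumed names: the two source kinds and the `global_dw` label are illustrative and do not represent any product’s terminology.

```python
def owner_in_global(source_kind, source_name, global_name="global_dw"):
    """Decide who owns a unit of data once it arrives in the global warehouse."""
    if source_kind == "local_warehouse":
        # Shared data: ownership stays with the local data warehouse.
        return source_name
    if source_kind == "local_system":
        # No warehouse owns it yet, so the global warehouse becomes the owner.
        return global_name
    raise ValueError(f"unknown source kind: {source_kind!r}")


european_owner = owner_in_global("local_warehouse", "european_dw")
billing_owner = owner_in_global("local_system", "billing_system")
```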

image

Figure 17.19  A global warehouse that represents the intersection of different local warehouses using shared data.

image

Figure 17.20  A separate worldwide function.

An Extended Global Warehouse

The global data warehouse previously described assumes that all of the data in the global data warehouse will come from one or the other local warehouse. In many cases that will be correct. What about the case in which the global data warehouse has its own directly supporting source systems that are independent of a local data warehouse? Figure 17.20 shows such a case.

In Figure 17.20, there is a global warehouse in which data can come from a local data warehouse, a local system, or from a system supporting global data by itself. In this case, there is a local European data warehouse, a local American data warehouse, and a worldwide global data warehouse. Any given data unit—either occurrence or data type—has its own unique source and its own ownership. Stated differently, any data unit in the global data warehouse has a single data source and a single data owner. In such a manner, a global data warehouse is built and maintained.

Other Important Issues in Enterprise-Wide Architecture

There are other important issues to consider in an enterprise-wide architecture. One issue is data marts. Most data marts reside entirely inside a local business domain, but it is possible to have a global data mart. A global data mart is one in which the data is pulled from more than one enterprise data warehouse. A data warehouse feeding the global data mart could be a local or a global data warehouse, and any number of data warehouses could feed the data mart. A global data mart can create and analyze a truly global perspective of data.

Another issue is that of the programs that pass data from an owner to a sharer (or multiple sharers). These programs operate on a push basis, not a pull basis.
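A push arrangement can be sketched as follows: the owner keeps a list of registered sharers and sends each change to them, rather than waiting to be asked. The class and method names are assumptions made for the illustration.

```python
from datetime import date


class SharingWarehouse:
    """A receiving warehouse that stores shared data with owner and date."""

    def __init__(self):
        self.shared = {}

    def receive(self, key, value, owner, shared_on):
        self.shared[key] = (value, owner, shared_on)


class OwningWarehouse:
    """An owner that pushes every change to its registered sharers."""

    def __init__(self, name):
        self.name = name
        self.data = {}
        self.sharers = []

    def register(self, sharer):
        self.sharers.append(sharer)

    def update(self, key, value, today):
        self.data[key] = value
        # Push: the owner initiates the transfer; sharers never query it.
        for sharer in self.sharers:
            sharer.receive(key, value, owner=self.name, shared_on=today)


owner = OwningWarehouse("motorcycle_dw")
sharer = SharingWarehouse()
owner.register(sharer)
owner.update("part-7", 40, date(2001, 5, 1))
```

With a push, the owner decides when shared copies are refreshed, which fits the rule that only the owner may create, delete, or modify the data.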

Summary

This chapter has addressed the issues of an enterprise-wide approach to data warehousing. We began by defining the following conventions:

•   Business domain

•   Ownership of data

•   Sharing of data

•   Local data warehouse

•   Global data warehouse

These conventions and definitions all lead to a system of record for enterprise data warehousing. After the system of record for enterprise data warehousing is created, a meaningful framework and architecture for building and establishing an enterprise framework is possible. Figure 17.21 depicts what the enterprise multiple data warehouse environment would look like.

image

Figure 17.21  What the enterprise multiple data warehouse environment looks like. The enterprise system of record surrounds all of the data warehouses and other systems in the enterprise.

Well, it is time to set sail on the sea of information and the destination is clear. Today’s business is quickly redefining itself from one in which products are targeted to the masses to one in which products are tailored to the customer. Unfortunately, today’s information systems were designed for targeting products to the masses, not for tailoring them to the customer. What is needed is a comprehensive and adaptive information solution that can leverage these systems to quickly deliver on the evolving needs of the business. This solution must be able to quickly exploit best-of-breed technologies as they become available. Additionally, this solution must promote an iterative delivery strategy that evolves the information ecosystem while demonstrating incremental value to the business. The CIF presented in these pages is such a solution that has proven itself over time.

We hope that this book has helped you understand the potential promise of the CIF in supporting the evolving information needs of your business. Additionally, we hope this book will help you chart your course in its use and evolution.

Bon voyage!
