Chapter 2. Data Warehouse Concepts

In this chapter, we look briefly at how computing has changed its focus from operational to decisional concerns. We also define data warehousing concepts, and cite the typical reasons for building data warehouses.

Gradual Changes in Computing Focus

In retrospect, it is easy to see how computing has shifted its focus from operational to decisional concerns. The differences in operational and decisional information requirements presented new challenges that old computing practices could not meet. Below, we elaborate on how this change in computing focus became the impetus for the development of data warehousing technologies.

Early Computing Focused on Operational Requirements

The Business Cycle (depicted in Figure 2-1) shows us that any enterprise must operate at three levels: operational (i.e., the day-to-day running of the business); tactical (i.e., the definition of policy and the monitoring of operations); and strategic (i.e., the definition of organization's vision, goals and objectives).

In Chapter 1, we noted that much of the effort and money in computing has focused on meeting the operational business requirements of enterprises. After all, without the OLTP applications that record thousands, even millions, of discrete transactions each day, it would not be possible for any enterprise to meet customer needs while enforcing business policies consistently. Nor would it be possible for an enterprise to grow without significantly expanding its manpower base.

With operational systems deployed and day-to-day information needs being met by the OLTP systems, the focus of computing has over the recent years shifted naturally to meeting the decisional business requirements of an enterprise. Figure 2-1 illustrates the business cycle as we view it today.

Decisional Requirements Cannot Be Fully Anticipated

Unfortunately, it is not possible for IT professionals to anticipate the information requirements of an enterprise's decision-makers for the simple reason that their information needs and report requirements change as the business situation changes.

The Business Cycle

Figure 2-1. The Business Cycle

Decision-makers themselves cannot be expected to know their information requirements ahead of time; they review enterprise data from different perspectives and at different levels of detail to find and address business problems as the problems arise. Decision-makers also need to look through business data to identify opportunities that can be exploited. They examine performance trends to identify business situations that can provide competitive advantage, improve profits, or reduce costs. They analyze market data and make the tactical as well as strategic decisions that determine the course of the enterprise.

Operational Systems Fail to Provide Decisional Information

Since these information requirements cannot be anticipated, operational systems (which correctly focus on recording and completing different types of business transactions) are unable to provide decision-makers with the information they need. As a result, business managers fall back on the time-consuming, and often frustrating, process of going through operational inquiries or reports already supported by operational systems in an attempt to find or derive the information they really need. Alternatively, IT professionals are pressured to produce an ad hoc report from the operational systems as quickly as possible.

It will not be unusual for the IT professional to find that the data needed to produce the report are scattered throughout different operational systems and must first be carefully integrated. Worse, it's likely that the processing required to extract the data from each operational system will demand so much of the system resources that the IT professional must wait until nonoperational hours before running the queries required to produce the report.

These delays are not only time-consuming and frustrating both for the IT professionals and the decision-makers, they are dangerous for the enterprise. When the report is finally produced, the data may be inconsistent, inaccurate, or obsolete. There is also the very real possibility that this new report will trigger the request for another ad hoc report.

Decisional Systems Have Evolved to Meet Decisional Requirements

Over the years, decisional systems have been developed and implemented in the hope of meeting these information needs. Some enterprises have actually succeeded in developing and deploying data warehouses within their respective organizations, long before the term data warehouse even became fashionable.

Most decisional systems, however, have failed to deliver on their promises. This book introduces data warehousing technologies and shares lessons learned from the successes and failures of those who have been on the "bleeding edge."

The Data Warehouse Defined

What is a data warehouse? William H. Inmon in Building the Data Warehouse (QED Technical Publishing Group, 1992, ISBN: 0-89435-404-3) defines a data warehouse as "a collection of integrated, subject-oriented databases designed to supply the information required for decision-making."

A more thorough look at the above definition yields the following observations.

Integrated

A data warehouse contains data extracted from the many operational systems of the enterprise, possibly supplemented by external data. For example, a typical banking data warehouse will require the integration of data drawn from the deposit systems, loan systems, and the general ledger, just to name three.

Each of these operational systems records different types of business transactions and enforces the policies of the enterprise regarding these transactions. If each of the operational systems has been custom built or an integrated system was not implemented as a solution, then it is unlikely that these systems are integrated. Thus, Customer A in the deposit system and Customer B in the loan system may be one and the same person—but there is no automated way for anyone in the bank to know this. Customer relationships are managed informally through relationships with bank officers.

A data warehouse brings together data from the various operational systems to provide an integrated view of the customer and the full scope of his or her relationship with the bank.

Subject Oriented

Traditional operational systems focus on the data requirements of a department or division, producing the much-criticized "stovepipe" systems of most enterprises. With the advent of business process reengineering, enterprises began espousing process-centered teams and case workers. Modern operational systems, in turn, shifted their focus to the operational requirements of an entire business process and aim to support the execution of the business process from start to finish.

A data warehouse goes beyond traditional information views by focusing on enterprise-wide subjects such as customers, sales, and profits. These subjects span both organizational and process boundaries and require information from multiple sources to provide a complete picture.

Databases

Although the term data warehousing technologies is used to refer to the gamut of technology components that are required to plan, develop, manage, implement, and use a data warehouse, the term data warehouse itself refers to a large, read-only repository of data.

At the very heart of every data warehouse lie the large databases that store the integrated data of the enterprise, obtained from both internal and external data sources. The term internal data refers to all data that are extracted from the operational systems of the enterprise. External data are data provided by third-party organizations, including business partners, customers, government bodies, and organizations that choose to make a profit by selling their data (e.g., credit bureaus).

Also stored in the databases are the metadata that describe the contents of the data warehouse. A more thorough discussion of metadata their role in data warehousing is provided in Chapter 13.

Required for Decision-Making

Unlike the databases of operational systems, which are often normalized to preserve and maintain data integrity, a data warehouse is designed and structured in a denormalized manner to better support the usability of the data warehouse. Users are better able to examine, derive, summarize, and analyze data at various levels of detail, over different periods of time, when using a denormalized data structure.

The database is denormalized to mimic a business user's dimensional view of the business. For example, while a finance manager is interested in the profitability of the various products of a company, a product manager will be more interested in the sales of the product in the various sales regions. In data warehousing parlance, users need to "slice and dice" through different areas of the database at different levels of detail to obtain the information they need. In this manner, a decision-maker can start with a high-level view of the business, then drill down to get more detail on the areas that require his attention, or vice versa.

Each Unit of Data Is Relevant to a Point in Time

Every data warehouse will inevitably have a Time dimension; each data item (also called facts or measures) in the data warehouse is time-stamped to support queries or reports that require the comparison of figures from prior months or years.

The time-stamping of each fact also makes it possible for decision-makers to recognize trends and patterns in customer or market behavior over time.

A Data Warehouse Contains Both Atomic and Summarized Data

Data warehouses hold data at different levels of detail. Data at the most detailed level, i.e., the atomic level, are used to derive the summarized of aggregated values. Aggregates (presummarized data) are stored in the warehouse to speed up responses to queries at higher levels of granularity.

If the data warehouse stores data only at summarized levels, its users will not be able to drill down on data items to get more detailed information. However, the storage of very detailed data results in larger space requirements.

The Dynamic, Ad Hoc Report

The most ideal scenario for enterprise decision-makers (and for IT professionals) is to have a repository of data and a set of tools that will allow decision-makers to create their own set of dynamic reports. The term dynamic report refers to a report that can be quickly modified by its user to present either greater or less detail, without any additional programming required. Dynamic reports are the only kind of reports that provide true, ad hoc reporting capabilities. Figure 2-2 presents an example of a dynamic report.

The Dynamic Report—Summary View

Figure 2-2. The Dynamic Report—Summary View

The Dynamic Report—Detailed View

Figure 2-3. The Dynamic Report—Detailed View

A decision-maker should be able to start with a short report that summarizes the performance of the enterprise. When the summary calls attention to an area that bears closer inspection, the decision-maker should be able to point to that portion of the report, then obtain greater detail on it dynamically, on an as-needed basis, with no further programming. Figure 2-3 presents a detailed view of the summary shown in Figure 2-2.

By providing business users with the ability to dynamically view more or less of the data on an ad hoc, as-needed basis, the data warehouse eliminates delays in getting information and removes the IT professional from the report-creation loop.

The Purposes of a Data Warehouse

At this point, it is helpful to summarize the typical reasons enterprises undertake data warehousing initiatives.

To Provide Business Users with Access to Data

The data warehouse provides access to integrated enterprise data previously locked away in unfriendly, difficult-to-access environments. Business users can now establish, with minimal effort, a secure connection to the warehouse through their desktop PC. Security is enforced either by the warehouse front-end application, by the server database, or both.

Because of its integrated nature, a data warehouse spares business users from the need to learn, understand, or access operational data in their native environments and data structures.

To Provide One Version of the Truth

The data in the data warehouse are consistent and quality assured before being released to business users. Since a common source of information is now used, the data warehouse puts to rest all debates about the veracity of data used or cited in meetings. The data warehouse becomes the common information resource for decisional purposes throughout the organization.

Note that "one version of the truth" is often possible only after much discussion and debate about the terms used within the organization. For example, the term customer can have different meanings to different people—it is not unusual for some people to refer to prospective clients as "customers," while others in the same organization may use the term "customers" to mean only actual, current clients.

While these differences may seem trivial at first glance, the subtle nuances that exist depending on the context may result in misleading numbers and ill-informed decisions. For example, when the Western Region sales manager asks for the number of customers, he probably means the "number of customers from the Western Region," not the "number of customers served by the entire company."

To Record the Past Accurately

Many of the figures and numbers that managers receive have little meaning unless compared to historical figures. For example, reports that compare the company's performance now against its performance last year are quite common. Reports that show the company's performance for the same month over the past three years are likewise of interest to decision-makers.

The operational systems will not be able to meet this kind of information need, and for a good reason. A data warehouse should be used to record the past accurately, leaving the OLTP systems free to focus on correctly recording current transactions and balances. Actual historical values are not stored on the operational system nor derived by adding or subtracting transaction values against the latest balance. Instead, historical data are loaded and integrated with other data in the warehouse for quick access.

To Slice and Dice Through Data

As stated earlier in this chapter, dynamic reports allow users to view warehouse data from different angles, at different levels of detail. Business users with the means and the ability to slice and dice through warehouse data can actively meet their own information needs.

The ready availability of different data views also improves business analysis by reducing the time and effort required to collect, format, and distill information from data.

To Separate Analytical and Operational Processing

Decisional processing and operational information processing have totally divergent architectural requirements. Attempts to meet both decisional and operational information needs through the same system or through the same system architecture merely increase the brittleness of the IT architecture and will create system maintenance nightmares.

Data warehousing disentangles analytical from operational processing by providing a separate system architecture for decisional implementations. This makes the overall IT architecture of the enterprise more resilient to changing requirements.

To Support the Reengineering of Decisional Processes

At the end of each BPR initiative come the projects required to establish the technological and organizational systems to support the newly reengineered business process.

Although reengineering projects have traditionally focused on operational processes, data warehousing technologies make it possible toreengineer decisional business processes as well. Data warehouses,with their focus on meeting decisional business requirements, are the ideal systems for supporting reengineered decisional business processes.

A Word About Data Marts

A discussion of data warehouses is not complete without a note on data marts. Unlike data warehouses, which contain large quantities of data from key operational systems in an enterprise, a data mart typically contains only a subset of the data that would have been stored in an enterprise data warehouse. Data mart data are selected to meet the specific needs of a subset of the organization. It is not unusual to find a data mart developed and implemented for a department, a division, or a geographical location.

Data marts are often preferred by enterprises as a first step to building a data warehouse, since these can be used as a "proof of concept." Initial success with the data mart can be used to convince skeptics in the enterprise and loosen the enterprise's purse strings.

A number of misconceptions exist about data marts and their relationships to data warehouses. We discuss two of those misconceptions below.

Misconception: Data Warehouses and Data Marts Cannot Coexist

There are parties who strongly advocate the deployment of data marts as opposed to the deployment of data warehouses. They correctly point out the difficulties of building an enterprisewide data warehouse in one large project and lead unsuspecting organizations down the "data mart vs. data warehouse" path.

What many do not immediately realize is that data warehouses and data marts can coexist within the same organization; the correct approach is "data mart AND data warehouse." We discuss this subject more thoroughly in the Warehousing Architectures section of Chapter 5.

Misconception: Data Marts Can Be Built Independently of One Another

Some enterprises find it easier to deploy multiple data marts independently of one another. At first glance, such an approach is indeed easier since there are no integration issues. Different groups of users are involved with each data mart, which implies fewer conflicts about the use of terms and about business rules. Each data mart is free to exist within its own isolated world, and all the users are happy.

Unfortunately, what enterprises fail to realize until much later is that by deploying one isolated data mart after another, the enterprise has actually created new islands of automation. And while at the onset these data marts are certainly easier to develop, the task of maintaining many unrelated data marts is exceedingly complex and will create data management, synchronization, and consistency issues. Each data mart presents its own version of "the truth" and will quite naturally provide information that conflicts with the reports from other data marts.

Multiple data marts are definitely appropriate within an organization, but these should be implemented only under the integrating framework of an enterprise-wide data warehouse. Each data mart is developed as an extension of the data warehouse and is fed by the data warehouse. The data warehouse enforces a consistent set of business rules and ensures the consistent use of terms and definitions.

A Word About Operational Data Stores

Data warehouse discussions will also naturally lead to Operational Data Stores, which at first glance may appear no different from data warehouses.

Although both technologies support decisional information needs of enterprise decision-makers, the two are distinctly different and are deployed to meet different types of decisional information needs.

Definition of Operational Data Stores

In Building the Operational Data Store (John Wiley & Sons, 1996, ISBN: 0-471-12822-8), W. H. Inmon, C. Imhoff, and G. Battas define an Operational Data Store as "the architectural construct where collective integrated operational data is stored." ODS can also be defined as a collection of integrated databases designed to support operational monitoring. Unlike the databases of OLTP applications (that are operational or function oriented), the Operational Data Store contains subject-oriented, enterprise-wide data. However, unlike data warehouses, the data in Operational Data Stores are volatile, current, and detailed. The ODS provides an integrated view of the data in the operational systems.

Table 2-1 compares the data warehouse with the Operational Data Store.

Table 2-1. Data Warehouses vs. Operational Data Stores

  DW ODS
Purpose Strategic Decision SupportOperational Monitoring
Similarities Integrated DataIntegrated Data
 Subject-OrientedSubject-Oriented
Differences Static DataVolatile Data
 Historical DataCurrent Data
 Summarized DataMore Detailed

Data are transformed and integrated into a consistent, unified whole as they are obtained from legacy and other operational systems to provide business users with an integrated and current view of operations. Data in the Operational Data Store are constantly refreshed so that the resulting image reflects the latest state of operations.

Flash Monitoring and Reporting Tools

As mentioned in Chapter 1, flash monitoring and reporting tools are like a dashboard that provides meaningful online information on the operational status of the enterprise. This service is achieved by the use of ODS data as inputs to the flash monitoring and reporting tools, to provide business users with a constantly refreshed, enterprise-wide view of operations without creating unwanted interruptions or additional load on transaction-processing systems. Figure 2-4 diagrams how this scheme works.

Relationship of Operational Data Stores to Data Warehouse

Enterprises with Operational Data Stores find themselves in the enviable position of being able to deploy data warehouses with considerable ease. Since operational data stores are integrated, many of the issues related to extracting, transforming, and transporting data from legacy systems have been addressed by the ODS, as illustrated in Figure 2-5.

Opertional Monitoring

Figure 2-4. Opertional Monitoring

The Opertional Data Store Feeds the Data Warehouse

Figure 2-5. The Opertional Data Store Feeds the Data Warehouse

The data warehouse is populated by means of regular snapshots of the data in the Operational Data Store. However, unlike the ODS, the data warehouse maintains the historical snapshots of the data for comparisons across different time frames. The ODS is free to focus only on the current state of operations and is constantly updated in real time.

Data Warehouse Cost-Benefit Analysis / Return on Investment

Senior management typically requires a cost-benefit analysis (CBA) or a study of return on investment (ROI) prior to embarking on a data warehousing initiative. Although the task of calculating ROI for data warehousing initiatives is unique to each enterprise, it is possible to classify the type of benefits and costs that are associated with data warehousing.

Benefits

Data warehousing benefits can be expected from the following areas:

  • Redeployment of staff assigned to old decisional systems. . The cost of producing today's management reports is typically undocumented and unknown within an enterprise. The quantification of such costs in terms of staff hours and erroneous data may yield surprising results. Benefits of this nature, however, are typically minimal, since warehouse maintenance and enhancements require staff as well. At best, staff will be redeployed to more productive tasks.

  • Improved productivity of analytical staff due to availability of data. . Analysts go through several steps in their day-to-day work: locating data, retrieving data, analyzing data to yield information, presenting information, and recommending a course of action. Unfortunately, much of the time (sometimes up to 40 percent) spent by enterprise analysts on a typical day is devoted to locating and retrieving data. The availability of integrated, readily accessible data (in the data warehouse) should significantly reduce the time that analysts spend with data collection tasks and increase the time they have available to actually analyze the data they have collected. This leads either to shorter decision cycle times or improvements in the quality of the analysis.

  • Business improvements resulting from analysis of warehouse data. . The most significant business improvements in warehousing result from the analysis of warehouse data, especially if the easy availability of information yields insights heretofore unknown to the enterprise. The goal of the data warehouse is to meet decisional information needs; it therefore follows naturally that the greatest benefits of warehousing are obtained when decisional information needs are actually met and sound business decisions are made both at the tactical and strategic level. Understandably, such benefits are the most significant and, therefore, the most difficult to project and the most difficult to quantify.

Costs

Data warehousing costs typically fall into one of four categories. These are:

  • Hardware. . This item refers to the costs associated with setting up the hardware and operating environment required by the data warehouse. In many instances, this setup may require the acquisition of new equipment or the upgrade of existing equipment. Larger warehouse implementations naturally imply higher hardware costs.

  • Software. . This item refers to the costs of purchasing the licenses to use software products that automate the extraction, cleansing, loading, retrieval, and presentation of warehouse data.

  • Services. . This item refers to services provided by systems integrators, consultants, and trainers during the course of a data warehouse project. Enterprises typically rely more on the services of third parties in early warehousing implementations, when the technology is still quite new to the enterprise.

  • Internal staff costs. . This item refers to costs incurred by assigning internal staff to the data warehousing effort, as well as costs associated with training internal staff on new technologies and techniques.

ROI Considerations

The costs and benefits associated with data warehousing vary significantly from one enterprise to another. The differences are chiefly influenced by

  • the current state of technology within the enterprise;

  • the culture of the organization in terms of decision-making styles and attitudes towards technology; and

  • the company's position in its chosen market vs. its competitors.

The effect of data warehousing on the tactical and strategic management of an enterprise is often likened to cleaning the muddy windshield of a car. It is difficult to quantify the value of driving a car with a cleaner windshield. Similarly, it is difficult to quantify the value of managing your organization with better information and insight.

Lastly, it is important to note that data warehouse justification is often complicated by the fact that much of the benefit may take some time to realize and therefore is difficult to quantify in advance.

In Summary

Data warehousing technologies have evolved as a result of the unsatisfied decisional information needs of enterprises. With the increased stability of operational systems, information technology professionals have increasingly turned their attention to meeting the decisional requirements of the enterprise.

A data warehouse, according to Bill Inmon, is a collection of integrated, subject-oriented databases designed to supply the information required for decision-making. Each data item in the data warehouse is relevant to some moment in time.

A discussion of data warehouses is incomplete without a word on data marts. A data mart has traditionally been defined as a subset of the enterprise-wide data warehouse. Many enterprises, upon realizing the complexity involved in deploying a data warehouse, will opt to deploy data marts instead. Although data marts are able to meet the immediate needs of a targeted group of users, the enterprise should shy away from deploying multiple, unrelated data marts. The presence of such islands of information will only result in data management and synchronization problems.

A discussion of data warehouses is likewise incomplete without a discussion of Operational Data Stores. Like data warehouses, Operational Data Stores are integrated and subject-oriented. However, an ODS is always current and is constantly updated (ideally in real time). The Operational Data Store is the ideal data source for a data warehouse, since it already contains integrated operational data as of a given point in time.

Although data warehouses have proven to have significant returns on investment, particularly when they are meeting a specific, targeted business need, it is extremely difficult to quantify the expected benefits of a data warehouse. The costs are easier to calculate, as these break down simply into hardware, software, services, and in-house staffing costs.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
18.117.153.38