Preface

Data warehousing has been around for about two decades and has become an essential part of the information technology infrastructure. It originally grew in response to the corporate need for information, not data. A data warehouse is a construct that supplies integrated, granular, and historical data to the corporation.

But there is a problem with data warehousing. The problem is that there are many different renditions of what a data warehouse is today. There is the federated data warehouse. There is the active data warehouse. There is the star schema data warehouse. There is the data mart data warehouse. In fact there are about as many renditions of the data warehouse as there are software and hardware vendors.

Each of these renditions also prescribes a different structure for the data warehouse, and each is architecturally very different from the others. If you were to enter a room in which a proponent of the federated data warehouse was talking to a proponent of the active data warehouse, you would hear the same words, but the words would mean very different things. Even though the words were the same, you would not be hearing meaningful communication. When two people from very different contexts talk, there is no assurance that they understand each other, even when they use the same words.

And thus it is with first-generation data warehousing today.

Into this morass of confusion as to what a data warehouse is or is not comes DW 2.0. DW 2.0 is a definition of the next generation of data warehousing. Unlike the term “data warehouse,” DW 2.0 has a crisp, well-defined meaning. That meaning is identified and defined in this book.

There are many important architectural features of DW 2.0. These architectural features represent an advance in technology and architecture beyond first-generation data warehouses. The following are some of the important features of DW 2.0 discussed in this book:

The life cycle of data within the data warehouse is recognized. First-generation data warehouses merely placed data on disk storage and called it a warehouse. The truth of the matter is that data, once placed in a data warehouse, has its own life cycle. Once data enters the data warehouse it starts to age. As it ages, the probability of access diminishes. This diminishing probability of access has profound implications for the technology that is appropriate to the management of the data. Another phenomenon is that as data ages, the volume of data increases. In most cases this increase is dramatic. The task of handling large volumes of data with a decreasing probability of access requires special design considerations lest the cost of the data warehouse become prohibitive and the effective use of the data warehouse become impractical.
The data warehouse is most effective when containing both structured and unstructured data. Classical first-generation data warehouses consisted entirely of transaction-oriented structured data. These data warehouses provided a great deal of useful information. But a modern data warehouse should contain both structured and unstructured data. Unstructured data is textual data that appears in medical records, contracts, emails, spreadsheets, and many other documents. There is a wealth of information in unstructured data. But unlocking that information is a real challenge. A detailed description of what is required to create the data warehouse containing both structured and unstructured data is a significant part of DW 2.0.
For a variety of reasons, metadata was not considered to be a significant part of first-generation data warehouses. In the definition of second-generation data warehouses, the importance and role of metadata are recognized. In the world of DW 2.0, the issue is not the need for metadata. There is, after all, metadata that exists in DBMS directories, in Business Objects universes, in ETL tools, and so forth. What is needed is enterprise metadata, where there is a cohesive enterprise view of metadata. All of the many sources of metadata need to be coordinated and placed in an environment where they work together harmoniously. In addition, there is a need for the support of both technical metadata and business metadata in the DW 2.0 environment.
Data warehouses are ultimately built on a foundation of technology. The data warehouse is shaped around a set of business requirements, usually reflecting a data model. Over time the business requirements of the organization change. But the technical foundation underlying the data warehouse does not easily change. And therein lies a problem—the business requirements are constantly changing but the technological foundation is not changing. The stretch between the changing business environment and the static technology environment causes great tension in the organization. In this section of the book, the discussion focuses on two solutions to the dilemma of changing business requirements and static technical foundations of the data warehouse. One solution is software such as Kalido that provides a malleable technology foundation for the data warehouse. The other solution is the design practice of separating static data and temporal data at the point of data base definition. Either of these approaches has the very beneficial effect of allowing the technical foundation of the data warehouse to change while the business requirements are also changing.

Other important topics addressed in this book include the following:

Online update in the DW 2.0 data warehouse infrastructure.
The ODS. Where does it fit?
Research processing and statistical analysis against a DW 2.0 data warehouse.
Archival processing in the DW 2.0 data warehouse environment.
Near-line processing in the DW 2.0 data warehouse environment.
Data marts and DW 2.0.
Granular data and the volumes of data found in the data warehouse.
Methodology and development approaches.
Data modeling for DW 2.0.

An important feature of the book is the diagram that describes the DW 2.0 environment in its entirety. The diagram—developed through many consulting, seminar, and speaking engagements—represents the different components of the DW 2.0 environment as they are placed together. The diagram is the basic architectural representation of the DW 2.0 environment.

This book is for the business analyst, the information architect, the systems developer, the project manager, the data warehouse technician, the data base administrator, the data modeler, the data administrator, and so forth. It is an introduction to the structure, contents, and issues of the future path of data warehousing.

March 29, 2007
WHI
DS
EN
