Chapter 5 Fluidity of the DW 2.0 technology infrastructure
There are many important aspects to the DW 2.0 architecture for the next-generation data warehouse—the recognition of the life cycle of data, the inclusion of unstructured data, and the inclusion of metadata as essential components. But one of the most challenging aspects of the DW 2.0 architecture is the placement of DW 2.0 into an environment in which the technology can be changed as fast as the business changes.
From the end user’s perspective, fluidity is very important because the business climate is always changing. At one moment a corporation needs to focus on profits. This means raising prices and expanding sales. The next year the corporation needs to focus on expenses. This means lowering expenses and stopping expansion. The next year the company needs to focus on new products and new channels of revenue. As each change in the business climate occurs, there is a need for new types of information. Furthermore, as the competitive, technological, and economic climate changes, there are constant new demands for information.
If a data warehouse is built on top of technology that is difficult to change, then the corporation will not be able to adapt its technology to the business climate. And that means that as important as the data warehouse can be to a corporation, it will always be less than optimal in terms of its value.
In first-generation data warehouses, the technology that housed the data resided on traditional information-processing technologies. As such, the data warehouse was cast in stone. Making significant changes to the data warehouse became very difficult to do. DW 2.0 recognizes and responds to the problems associated with casting the data warehouse in stone.
FIGURE 5.1 Business requirements are constantly changing, whereas the technology infrastructure is rooted in concrete.
The story begins with Figure 5.1.
Figure 5.1 characterizes the ever-changing nature of business requirements. Constantly changing business requirements are simply inevitable—as inevitable as death and taxes. The only difference from one organization to the next is the rate and the extent of change.
Sitting beneath the business is an infrastructure of technology. The problem arises as business requirements change. Because of the effort required to make changes to the technological infrastructure, the business is always ahead of the technical foundation that is supposed to support it.
There are lots of good reasons for the intransigency of the technical infrastructure. At the heart of the matter is the popular notion among systems vendors that when a definition of data is given to their technology, that definition is permanent. This fundamental notion is revealed in many situations:
There are many more examples of the erroneous assumption that once requirements are defined, no more requirements will ever arise.
Figure 5.2 illustrates this assumption.
But requirements are constantly changing and mutating. Consider the simple diagram in Figure 5.3.
The red diamond in Figure 5.3 indicates a change in business requirements. The blue box indicates that IT has adjusted the technology infrastructure in response to the change in business requirements. The black dotted line represents the length of time from the moment when a business requirement has changed until the time when IT has completed the necessary changes. The black dotted line is often very, very long. The situation depicted in Figure 5.3 is ubiquitous.
Next consider what happens when business changes come in faster than IT can respond. Figure 5.4 shows this circumstance.
Figure 5.4 depicts how business requirements change faster than the rate at which IT can respond to change. When the first change is identified, IT starts to design, plan, and build. But before they can finish, another set of business requirements starts to emerge. This set of new requirements has its own life cycle. A different group of people start to work on the new requirements. Things become complicated when both groups of people need to work on and change the same data and processes. To make matters worse, another new set of business requirements comes in before the first or second IT infrastructure changes are done. And things get to be really complicated when the first, the second, and the third groups of people all need to be working on the same data and the same process at the same time.
A great mess ensues.
What often happens is that the organization finds itself trapped in a vicious cycle. An eternal treadmill is created by the fact that new and changed business requirements are generated by the business faster than IT can respond. Figure 5.5 depicts this treadmill.
The long-term effect of the treadmill in Figure 5.5 is that the IT department is perceived as not being responsive to the business of the organization. Business and IT are perceived as heading in different directions.
FIGURE 5.4 What happens when the rate of needed change is faster than the ability to make those changes.
Figure 5.6 shows this perceived divergence.
So what can be done about this dilemma? There are several possible solutions:
In fact, only the third option is viable over the long haul.
Figure 5.7 suggests that reducing the length of time required for IT to respond to change is the only real alternative.
It is one thing to say that IT’s response time for the accommodation of change must be shortened. Determining exactly how that should be done is another thing altogether.
One of the best and most effective ways to reduce the amount of time it takes IT to adapt the technology infrastructure to ongoing business changes lies in a most unlikely place—data that is semantically temporal and data that is semantically static. Figure 5.8 represents these two types of data.
The yellow box indicates data that is semantically temporal. The green box indicates data that is semantically static.
What is meant by semantically static and semantically temporal? Data can change in one of two ways. The content of data may change. For example, my bank account balance may go from $5000 to $7500. This is an example of the meaningful content of data changing. But there is another fundamental kind of change—semantic data change. Semantic change occurs when the definitions of data change, not when the content of data changes. As a simple example of semantic change, suppose the definition of an account holder’s data is created. The definition includes such things as:
This data is defined in the system when the example banking application is initially designed, built, and deployed.
Then modern times arrive and it is recognized that there are other types of data that should have been included with the account holder data. Perhaps the following types of data now need to be added to the definition of account holder:
The addition of the new data elements is a semantic change.
Data can change either in content or in semantics. The remainder of the chapter addresses semantic change, not content change.
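The difference between the two kinds of change can be sketched in a few lines of Python. This is only an illustration: the class names and the fields `name`, `balance`, and `email` are invented for the example and are not taken from the banking application described above.

```python
from dataclasses import dataclass, fields


# Hypothetical original definition of an account holder (field names are assumptions).
@dataclass
class AccountHolderV1:
    name: str
    balance: float


# Hypothetical later definition: the set of fields itself has grown.
@dataclass
class AccountHolderV2:
    name: str
    balance: float
    email: str = ""


def field_names(cls):
    """Return the names that make up a record definition."""
    return [f.name for f in fields(cls)]


holder = AccountHolderV1(name="A. Customer", balance=5000.0)

# Content change: the value moves from 5000 to 7500, the definition stays the same.
definition_before = field_names(type(holder))
holder.balance = 7500.0
definition_after = field_names(type(holder))
content_change_altered_definition = definition_before != definition_after

# Semantic change: the definition itself differs between the two versions.
added_by_semantic_change = sorted(
    set(field_names(AccountHolderV2)) - set(field_names(AccountHolderV1))
)
```

In this sketch the balance update leaves the definition untouched, while the move from the first definition to the second adds a field without any stored value having changed.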
Semantically temporal data is data that is likely to undergo semantic change. Some forms of data are particularly notorious for frequent semantic change. Some of these semantically unstable types of data are shown in Figure 5.9.
Organization charts change with stunning frequency. Every new manager thinks that it is his/her job to reorganize the company. Sales territories are constantly reshuffled. Sales managers are constantly debating whether Ohio belongs in the Eastern region or the Midwestern region. One manager wants Ohio in one place, and another manager wants Ohio in another place.
There are many other forms of data whose semantics are constantly in an uproar. Data is semantically temporal wherever there is a likelihood that the semantics of data will change.
The reverse of semantically unstable data is semantically stable data. Semantically stable data is static data—data whose semantics are likely to remain stable for a long time. Basic sales data is a good example of semantically stable data.
Figure 5.10 depicts some semantically stable data.
Basic sales data typically includes information such as:
This basic sales data is certainly applicable today, but it is probably fair to assume that merchants in the markets of ancient Rome and merchants in the Chinese markets of Peking 4000 years ago were interested in exactly the same data that interests Wal-Mart today.
The truth is that this basic data is fundamental and was of interest long before there ever were computers. And it is predictable that this basic data will be of interest in 2100, just as it is today.
All of this leads to the conclusion that semantically stable data exists. It is called static data here.
So how do systems designers and database designers treat semantically static data and semantically temporal data? They pay no attention to the distinction at all. The semantics of data are not a major consideration in database design. As a direct consequence, semantically static data and semantically temporal data are typically freely mixed at the point of database design.
Figure 5.11 shows the result of freely mixing semantically static data and semantically temporal data.
The top line of symbols in Figure 5.11 represents the constant change in business requirements over time. Every time business requirements change, the technical infrastructure that supports the business has to change. Semantically static and semantically temporal data are common components of the supporting technical infrastructure that must be adapted to constantly changing business requirements. Therefore, mixing semantically static and semantically temporal data together is a recipe for trouble.
Figure 5.12 shows that when semantically static and semantically temporal data are mixed, change is difficult to accommodate.
There are lots of good reasons there is such an upheaval whenever change occurs to data that has been mixed together. The most important reason is that a data conversion must be done. Consider what happens to semantically static data when change occurs. The semantically static data must be converted and reorganized even though nothing has happened that alters the actual content of the data. This situation is exacerbated by the fact that organizations typically have a lot of semantically stable data. And there are lots of other reasons change wreaks havoc on semantically static and semantically temporal data when they are mixed together.
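A toy illustration of that conversion cost, written in Python, is given here. The row layout, the field names, and the territory mapping are all assumptions made up for the example. Each row mixes stable sale facts (date, amount, state) with a region label that depends on the territory scheme in force.

```python
# Mixed design: stable sale facts and an unstable region label share one row.
# All field names and values here are hypothetical.
mixed_rows = [
    {"sale_date": "2008-01-05", "amount": 120.0, "state": "Ohio", "region": "Eastern"},
    {"sale_date": "2008-01-09", "amount": 80.0, "state": "Ohio", "region": "Eastern"},
    {"sale_date": "2008-02-11", "amount": 200.0, "state": "New York", "region": "Eastern"},
]


def convert_for_new_territories(rows, state_to_region):
    """Rewrite every row under a new territory scheme.

    Returns the converted rows and the number of rows that had to be rewritten.
    """
    converted = []
    for row in rows:
        new_row = dict(row)  # the whole row is copied, stable facts included
        new_row["region"] = state_to_region[row["state"]]
        converted.append(new_row)
    return converted, len(converted)


# A reshuffle moves Ohio to another region; New York is unaffected.
new_scheme = {"Ohio": "Midwestern", "New York": "Eastern"}
converted_rows, rows_rewritten = convert_for_new_territories(mixed_rows, new_scheme)

# The stable facts come out of the conversion exactly as they went in.
stable_facts_unchanged = all(
    old["sale_date"] == new["sale_date"] and old["amount"] == new["amount"]
    for old, new in zip(mixed_rows, converted_rows)
)
```

Every row passes through the conversion, including the New York row, even though no sale date or amount is different afterward. On three rows this is trivial. The point of the sketch is only that the work grows with the volume of stable data rather than with the size of the semantic change.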
So the question naturally arises, what would happen if semantically static data and semantically temporal data were separated? Figure 5.13 depicts this design practice.
When semantically static data and semantically temporal data are separated, the devastation usually caused by changing business requirements is mitigated, as depicted in Figure 5.14.
Although the effect shown in Figure 5.14 is real, it is not at all intuitive why it occurs. There are several good reasons the separation of semantically static data and semantically temporal data has the very beneficial effect of insulating the IT technological infrastructure from constantly changing business requirements. Consider changing business requirements and semantically static data side by side.
Figure 5.15 shows that semantically static data is affected little, if at all, by changes in business requirements. By definition, semantically stable data keeps the same meaning under any set of business requirements.
Now consider what happens to semantically temporal data when change occurs.
When semantically temporal data needs to be changed, no change is made to the existing data at all. Instead a new snapshot of the semantics is created. Creating a new snapshot of semantics is much easier to do than opening up a database to convert and/or change the data it contains. Therefore, when business change occurs, just a new snapshot is made of semantically temporal data (Figure 5.16).
Figure 5.17 shows what happens over time to semantically temporal data as business requirements change.
Over time, a series of snapshots is made. Each snapshot is delimited by time: each has a from date and a to date. To determine which semantic definition is appropriate, a query has to be qualified by time, which is a natural qualification for any query.
Figure 5.17 shows that by taking snapshots of the new semantics of temporal data rather than trying to convert older data, managing change over time becomes a very easy thing to do.
There is an interesting side benefit of managing change to semantically temporal data this way. That benefit is that a historical record of the semantically temporal data is created. This record is seen in Figure 5.18.
The value of historical records of the semantics of data is highlighted by the following example. Consider the information needed by an analyst interested in examining the changes that have been made to a company’s organization chart over time. Suppose the analyst particularly wishes to look at the company’s organization chart as it existed in 1990. With a historical record of the semantic data changes that have occurred over time, the analyst can easily locate and retrieve the firm’s 1990 organization chart using the from and to dates included on every snapshot taken of the company’s semantically temporal data.
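The snapshot approach and the 1990 lookup can be sketched in Python as follows. This is a minimal sketch under stated assumptions: the `Snapshot` and `SnapshotHistory` names are hypothetical, the organization charts are invented, and each date range is treated as half-open (the from date is included, the to date is excluded).

```python
from dataclasses import dataclass
from datetime import date
from typing import Any, List, Optional


@dataclass
class Snapshot:
    from_date: date
    to_date: Optional[date]  # None means the snapshot is still current
    semantics: Any


class SnapshotHistory:
    """Append-only history of semantic definitions (hypothetical helper)."""

    def __init__(self) -> None:
        self._snapshots: List[Snapshot] = []

    def add(self, effective: date, semantics: Any) -> None:
        # Close the previous snapshot's date range; its semantics stay untouched.
        if self._snapshots:
            previous = self._snapshots[-1]
            if effective <= previous.from_date:
                raise ValueError("snapshots must be added in chronological order")
            previous.to_date = effective
        self._snapshots.append(Snapshot(effective, None, semantics))

    def as_of(self, when: date) -> Any:
        # Assumption: each range is half-open, from_date <= when < to_date.
        for snap in self._snapshots:
            if snap.from_date <= when and (snap.to_date is None or when < snap.to_date):
                return snap.semantics
        raise LookupError(f"no snapshot covers {when}")


# Invented organization charts, each mapping a department to the unit it reports to.
org_charts = SnapshotHistory()
org_charts.add(date(1985, 1, 1), {"Sales": "Marketing"})
org_charts.add(date(1992, 7, 1), {"Sales": "Operations"})

chart_1990 = org_charts.as_of(date(1990, 6, 30))
chart_later = org_charts.as_of(date(2008, 1, 1))
```

The reorganization in this sketch never rewrites the earlier chart. It only stamps a to date on the earlier snapshot and appends a new one, so the 1990 chart remains retrievable by date.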
When semantically static data and semantically temporal data are separated, and those forms of data are used as a basis for technological infrastructure, organizations can gracefully withstand change over time. The upheaval of systems that is caused by business change is mitigated, as depicted in Figure 5.19.
The next logical question is how to create such a division of data. The answer is that semantically static and semantically temporal data should be physically separate in all future database designs. Failing that, there are technologies that manage the DW 2.0 infrastructure as described.
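One way to picture such a physical separation is sketched here in Python. It is a hedged illustration rather than a description of any particular product: the sale fields, the territory mappings, and the function names are all assumptions. Basic sale facts live in one write-once store, territory definitions live in a separate list of dated snapshots, and a report joins the two by sale date.

```python
from datetime import date

# Static store: basic sale facts as (sale_date, state, amount), written once.
# The fields are hypothetical.
SALES = (
    (date(2007, 3, 14), "Ohio", 120.0),
    (date(2008, 5, 2), "Ohio", 80.0),
    (date(2008, 6, 9), "New York", 200.0),
)

# Temporal store: territory snapshots as (from_date, to_date, mapping).
# A to_date of None marks the current snapshot; ranges are assumed half-open.
TERRITORY_SNAPSHOTS = [
    (date(2005, 1, 1), date(2008, 1, 1), {"Ohio": "Eastern", "New York": "Eastern"}),
    (date(2008, 1, 1), None, {"Ohio": "Midwestern", "New York": "Eastern"}),
]


def territories_as_of(when):
    """Return the territory mapping in force on the given date."""
    for start, end, mapping in TERRITORY_SNAPSHOTS:
        if start <= when and (end is None or when < end):
            return mapping
    raise LookupError(f"no territory snapshot covers {when}")


def revenue_by_region(sales):
    """Total sales per region, using the territories in force on each sale date."""
    totals = {}
    for sale_date, state, amount in sales:
        region = territories_as_of(sale_date)[state]
        totals[region] = totals.get(region, 0.0) + amount
    return totals


totals = revenue_by_region(SALES)
```

In this arrangement the 2008 reshuffle of Ohio is recorded as one additional snapshot. The sale records are a tuple and are never rewritten, and each sale is reported under the territory scheme that applied when it was made.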
Figure 5.20 represents how infrastructure management software manages the DW 2.0 data infrastructure holistically.
The business user lives not in the world of technology, but in the world of business. And whatever else you can say about business, it is a fact that business changes. In some cases the changes are slower and in other cases the changes are faster. But change is a way of life to the business person.
The economy changes, legislation changes, new products come and go, competition changes, and so forth.
The business person needs information that adapts to those changes. If the information infrastructure does not adapt to the changes, then it becomes a millstone around the neck of the business user. Information becomes a liability rather than an asset. It is only when information is truly agile that it becomes a business asset.
FIGURE 5.19 The effect of separating temporal data from static data in the face of changing business requirements.
The end user does not need to know what is going on “underneath the covers.” The end user looks at the information infrastructure in the same way as most drivers look at their automobile. Most drivers know that there is an engine. Most drivers know that there is a need for gas and oil. But most drivers do not have a clue as to the inner workings of the engine. The best that can be said for most drivers is that when there is a malfunction beneath the hood of the automobile, the driver heads for a garage or repair station.
The same is true for the business analyst and DW 2.0. The business analyst is aware of the existence of DW 2.0. But he/she does not know the detailed underpinnings of the infrastructure. All the business analyst knows is that when something goes amiss, it is time to find a data architect who does understand the underpinnings of DW 2.0.
The foundation of technology that DW 2.0 is built upon needs to be able to change. When the technological infrastructure is immutable, the organization soon has business requirements that are not reflected in the data warehouse environment. Furthermore, the longer it takes to add new requirements to a data warehouse, the bigger and more intractable the problem of adapting the data warehouse to business change becomes.
There are two ways to create a technological infrastructure for the data warehouse that can change over time. One approach is to use technology that is designed for that purpose. Another approach is to separate semantically static data from semantically temporal data. By separating the semantically different types of data, the impact of change is mitigated.