Chapter 5 Fluidity of the DW 2.0 technology infrastructure

There are many important aspects to the DW 2.0 architecture for the next-generation data warehouse—the recognition of the life cycle of data, the inclusion of unstructured data, and the inclusion of metadata as essential components. But one of the most challenging aspects of the DW 2.0 architecture is the placement of DW 2.0 into an environment in which the technology can be changed as fast as the business changes.

From the end user’s perspective fluidity is very important because the business climate is always changing. At one moment a corporation needs to focus on profits. This means raising prices and expanding sales. The next year the corporation needs to focus on expenses. This means lowering expenses and stopping expansion. The next year the company needs to focus on new products and new channels of revenue. As each change in the business climate occurs, there is a need for new types of information. Furthermore, as the competitive, technological, and economic climate changes, there are constant new demands for information.

If a data warehouse is built on top of technology that is difficult to change, then the corporation will not be able to adapt its technology to the business climate. And that means that as important as the data warehouse can be to a corporation, it will always be less than optimal in terms of its value.

In first-generation data warehouses, the technology that housed the data resided on traditional information-processing technologies. As such, the data warehouse was cast in stone. Making significant changes to the data warehouse became very difficult to do. DW 2.0 recognizes and responds to the problems associated with casting the data warehouse in stone.


FIGURE 5.1 Business requirements are constantly changing, whereas the technology infrastructure is rooted in concrete.

The story begins with Figure 5.1.

Figure 5.1 characterizes the ever-changing nature of business requirements. Constantly changing business requirements are simply inevitable—as inevitable as death and taxes. The only difference from one organization to the next is the rate and the extent of change.

THE TECHNOLOGY INFRASTRUCTURE

Sitting beneath the business is an infrastructure of technology. The problem arises as business requirements change. Because of the effort required to make changes to the technological infrastructure, the business is always ahead of the technical foundation that is supposed to support it.

There are lots of good reasons for the intransigency of the technical infrastructure. At the heart of the matter is the popular notion among systems vendors that once a definition has been given to their technology, that definition is permanent. This fundamental notion is revealed in many situations:

by DBMS vendors, when the structure of the data is defined at the start of a project;
by compiler vendors, who assume that once processing and algorithms have been specified, they will remain that way indefinitely;
by business intelligence vendors who think that once a query is written the same query will work way into the future;
by management that thinks that when they make a lease or a long-term commitment the problem is solved and will not mutate into something else.

There are many more examples of the erroneous assumption that once requirements are defined, no more requirements will ever arise.

Figure 5.2 illustrates this assumption.


FIGURE 5.2 Some of the many reasons the technology infrastructure is so difficult to change.


FIGURE 5.3 The length of time that is required to make changes to the IT infrastructure.

But requirements are constantly changing and mutating. Consider the simple diagram in Figure 5.3.

The red diamond in Figure 5.3 indicates a change in business requirements. The blue box indicates that IT has adjusted the technology infrastructure in response to the change in business requirements. The black dotted line represents the length of time from the moment when a business requirement has changed until the time when IT has completed the necessary changes. The black dotted line is often very, very long. The situation depicted in Figure 5.3 is ubiquitous.

RAPID BUSINESS CHANGES

Next consider what happens when business changes come in faster than IT can respond. Figure 5.4 shows this circumstance.

Figure 5.4 depicts how business requirements change faster than the rate at which IT can respond to change. When the first change is identified, IT starts to design, plan, and build. But before they can finish, another set of business requirements starts to emerge. This set of new requirements has its own life cycle. A different group of people start to work on the new requirements. Things become complicated when both groups of people need to work on and change the same data and processes. To make matters worse, another new set of business requirements comes in before the first or second IT infrastructure changes are done. And things get to be really complicated when the first, the second, and the third groups of people all need to be working on the same data and the same process at the same time.

A great mess ensues.

THE TREADMILL OF CHANGE

What often happens is that the organization finds itself trapped in a vicious cycle. An eternal treadmill is created by the fact that new and changed business requirements are generated by the business faster than IT can respond. Figure 5.5 depicts this treadmill.

The long-term effect of the treadmill in Figure 5.5 is that the IT department is perceived as not being responsive to the business of the organization. Business and IT are perceived as heading in different directions.


FIGURE 5.4 What happens when the rate of needed change is faster than the ability to make those changes.


FIGURE 5.5 IT is on a treadmill that it can never get off of.

Figure 5.6 shows this perceived divergence.

GETTING OFF THE TREADMILL

So what can be done about this dilemma? There are several possible solutions:

Freeze business requirements: Unfortunately, freezing business requirements is the equivalent of sticking one’s head in the sand at the first hint of problems. It simply is not an acknowledgment of reality.
Add IT resources: Throwing more IT people into the fray is expensive and often simply is not effective. (See The Mythical Man-Month by Fred Brooks.)
Shorten IT response time: Reducing the length of time it takes IT to respond to new and changing business requirements is often the only alternative.

FIGURE 5.6 IT and business are seen as going in divergent directions.

In fact, only the third option is viable over the long haul.

REDUCING THE LENGTH OF TIME FOR IT TO RESPOND

Figure 5.7 suggests that reducing the length of time required for IT to respond to change is the only real alternative.

It is one thing to say that IT’s response time for the accommodation of change must be shortened. Determining exactly how that should be done is another thing altogether.


FIGURE 5.7 The only realistic plan is to shorten the length of time required for IT to respond to business changes.

SEMANTICALLY TEMPORAL, SEMANTICALLY STATIC DATA

One of the best and most effective ways to reduce the amount of time it takes IT to adapt the technology infrastructure to ongoing business changes lies in a most unlikely place—data that is semantically temporal and data that is semantically static. Figure 5.8 represents these two types of data.


FIGURE 5.8 In classical database design, temporal data is freely mixed with static data.

The yellow box indicates data that is semantically temporal. The green box indicates data that is semantically static.

What is meant by semantically static and semantically temporal? Data can change in one of two ways. The content of data may change. For example, my bank account balance may go from $5000 to $7500. This is an example of the meaningful content of data changing. But there is another fundamental kind of change—semantic data change. Semantic change occurs when the definitions of data change, not when the content of data changes. As a simple example of semantic change, suppose the definition of an account holder’s data is created. The definition includes such things as:

Account ID
Account holder name
Account holder address
Account holder birth date

This data is defined in the system when the example banking application is initially designed, built, and deployed.

Then modern times arrive and it is recognized that there are other types of data that should have been included with the account holder data. Perhaps the following types of data now need to be added to the definition of account holder:

Cell phone number
Fax number
Email address

The addition of the new data elements is a semantic change.

Data can change either in content or in semantics. The remainder of the chapter addresses semantic change, not content change.
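The difference between the two kinds of change can be sketched in a few lines of Python using the standard sqlite3 module. This is only an illustration, not a design taken from any DW 2.0 product: the table name, the column names, the sample row, and the balance column are all assumptions made for the example.

```python
import sqlite3

# Hypothetical schema: table and column names are illustrative only.
# The balance column is added here so that a content change can be shown.
conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE account_holder (
           account_id TEXT PRIMARY KEY,
           name       TEXT,
           address    TEXT,
           birth_date TEXT,
           balance    REAL
       )"""
)
conn.execute(
    "INSERT INTO account_holder VALUES (?, ?, ?, ?, ?)",
    ("A-100", "Pat Example", "1 Main St", "1960-01-01", 5000.0),
)


def column_names(connection):
    """Return the current column names of account_holder (its definition)."""
    rows = connection.execute("PRAGMA table_info(account_holder)").fetchall()
    return [row[1] for row in rows]


columns_before = column_names(conn)

# Content change: the value moves from 5000 to 7500; the definition is untouched.
conn.execute("UPDATE account_holder SET balance = 7500.0 WHERE account_id = 'A-100'")
columns_after_content_change = column_names(conn)

# Semantic change: the definition itself grows by three data elements.
for new_column in ("cell_phone", "fax_number", "email_address"):
    conn.execute(f"ALTER TABLE account_holder ADD COLUMN {new_column} TEXT")
columns_after_semantic_change = column_names(conn)
```

The update leaves the list of columns exactly as it was, whereas the three ALTER TABLE statements change what an account holder record is defined to be.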


FIGURE 5.9 Temporal data.

SEMANTICALLY TEMPORAL DATA

Semantically temporal data is data that is likely to undergo semantic change. Some forms of data are particularly notorious for frequent semantic change. Some of these semantically unstable types of data are shown in Figure 5.9.

Organization charts change with stunning frequency. Every new manager thinks that it is his/her job to reorganize the company. Sales territories are constantly reshuffled. Sales managers are constantly debating where Ohio fits: in the Eastern region or the Midwestern region. One manager wants Ohio in one place, and another manager wants Ohio in another place.

There are many other forms of data whose semantics are constantly in an uproar. Data is semantically temporal wherever there is a likelihood that the semantics of data will change.

SEMANTICALLY STABLE DATA

The reverse of semantically unstable data is semantically stable data. Semantically stable data is static data—data whose semantics are likely to remain stable for a long time. Basic sales data is a good example of semantically stable data.

Figure 5.10 depicts some semantically stable data.

Basic sales data typically includes information such as:

Date of sale
Amount of sale
Item sold
Purchaser name

While this basic sales data is certainly applicable today, it is probably fair to assume that merchants in the markets of ancient Rome and merchants in the Chinese markets of Peking 4000 years ago were interested in exactly the same data as Wal-Mart is today.

The truth is that this basic data is fundamental and was of interest long before there ever were computers. And it is predictable that this basic data will be of interest in 2100, just as it is today.

All of this leads to the conclusion that semantically stable data exists. It is called static data here.

So how do systems designers and database designers treat semantically static data and semantically temporal data? They pay no attention to the distinction at all. The semantics of data are not a major consideration in database design. As a direct consequence, semantically static data and semantically temporal data are typically freely mixed at the point of database design.


FIGURE 5.10 Static data.

MIXING SEMANTICALLY STABLE AND UNSTABLE DATA

Figure 5.11 shows the result of freely mixing semantically static data and semantically temporal data.

The top line of symbols in Figure 5.11 represents the constant change in business requirements over time. Every time business requirements change, the technical infrastructure that supports the business has to change. Semantically static and semantically temporal data are common components of the supporting technical infrastructure that must be adapted to constantly changing business requirements. Therefore, mixing semantically static and semantically temporal data together is a recipe for trouble.

Figure 5.12 shows that when semantically static and semantically temporal data are mixed, change is difficult to accommodate.

There are good reasons why such upheaval follows whenever change reaches data that has been mixed together. The most important reason is that a data conversion must be done. Consider what happens to semantically static data when change occurs. The semantically static data must be converted and reorganized even though nothing has happened that alters the actual content of the data. This situation is exacerbated by the fact that organizations typically have a lot of semantically stable data. And there are other reasons why change wreaks havoc on semantically static and semantically temporal data when they are mixed together.
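The conversion burden can be made concrete with a small Python sketch. It assumes a hypothetical mixed design in which every sales record also carries its sales-territory assignment. The field names, the sample records, and the function convert_for_new_territories are invented for illustration.

```python
# Hypothetical mixed design: each sales record carries its territory definition.
mixed_sales = [
    {"sale_date": "2007-03-01", "amount": 120.0, "item": "lamp",
     "purchaser": "A. Buyer", "state": "OH", "region": "Eastern"},
    {"sale_date": "2007-03-02", "amount": 45.5, "item": "desk",
     "purchaser": "B. Buyer", "state": "NY", "region": "Eastern"},
    {"sale_date": "2007-03-05", "amount": 80.0, "item": "chair",
     "purchaser": "C. Buyer", "state": "OH", "region": "Eastern"},
]


def convert_for_new_territories(records, new_region_by_state):
    """Rewrite stored records so they carry the new territory definition.

    Returns the converted records and the number of records whose stored
    region had to be rewritten.
    """
    converted, rewritten = [], 0
    for record in records:
        new_region = new_region_by_state.get(record["state"], record["region"])
        if new_region != record["region"]:
            rewritten += 1
        converted.append({**record, "region": new_region})
    return converted, rewritten


# Business change: Ohio moves from the Eastern to the Midwestern region.
converted_sales, rows_rewritten = convert_for_new_territories(
    mixed_sales, {"OH": "Midwestern"}
)
```

Every stored Ohio sale has to be rewritten, although no date, amount, item, or purchaser has changed. With three records the cost is trivial. With years of accumulated sales history, the same pass is the data conversion described above.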

SEPARATING SEMANTICALLY STABLE AND UNSTABLE DATA

So the question naturally arises, what would happen if semantically static data and semantically temporal data were separated? Figure 5.13 depicts this design practice.


FIGURE 5.11 Every time there is a change in business requirements, the technology infrastructure goes haywire.

MITIGATING BUSINESS CHANGE

When semantically static data and semantically temporal data are separated, the devastation usually caused by changing business requirements is mitigated, as depicted in Figure 5.14.


FIGURE 5.12 Temporal and static data are hardwired together.

The phenomenon shown in Figure 5.14 is real, but it is not at all intuitive why it occurs. There are several good reasons the separation of semantically static data and semantically temporal data has the very beneficial effect of insulating the IT technological infrastructure from constantly changing business requirements. Consider changing business requirements and semantically static data side by side.

Figure 5.15 shows that semantically static data is affected little, if at all, by changes in business requirements. By definition, semantically stable data keeps the same semantics under any set of business requirements.


FIGURE 5.13 What would happen if temporal and static data were separated?

Now consider what happens to semantically temporal data when change occurs.

When semantically temporal data needs to be changed, no change is made to the existing data at all. Instead a new snapshot of the semantics is created. Creating a new snapshot of semantics is much easier to do than opening up a database to convert and/or change the data it contains. Therefore, when business change occurs, just a new snapshot is made of semantically temporal data (Figure 5.16).


FIGURE 5.14 When temporal and static data are separated, the friction and turmoil caused by change are greatly alleviated.


FIGURE 5.15 Static data is stable throughout change.


FIGURE 5.16 When change occurs, a new snapshot is created.

CREATING SNAPSHOTS OF DATA

Figure 5.17 shows what happens over time to semantically temporal data as business requirements change.

Over time, a series of snapshots is made. Each snapshot is delimited by time: it has a from date and a to date. To determine which semantic definition is appropriate, the query has to be qualified by time, which is natural to do with any query.
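A minimal Python sketch of time-delimited snapshots follows. The class TerritorySnapshot and the functions add_snapshot and snapshot_as_of are hypothetical names chosen for this illustration. The convention that the from date is inclusive and the to date is exclusive is likewise an assumption, not something prescribed by DW 2.0.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class TerritorySnapshot:
    """One time-delimited version of the sales-territory definition (hypothetical)."""
    from_date: date
    to_date: Optional[date]  # None marks the snapshot that is still current
    region_by_state: dict


def add_snapshot(history, effective, region_by_state):
    """Fill in the to_date of the current snapshot and append a new one.

    The definitions held in earlier snapshots are left exactly as they were.
    """
    updated = list(history)
    if updated and updated[-1].to_date is None:
        last = updated[-1]
        updated[-1] = TerritorySnapshot(last.from_date, effective, last.region_by_state)
    updated.append(TerritorySnapshot(effective, None, region_by_state))
    return updated


def snapshot_as_of(history, when):
    """Return the snapshot in force on `when` (from_date inclusive, to_date exclusive)."""
    for snap in history:
        if snap.from_date <= when and (snap.to_date is None or when < snap.to_date):
            return snap
    return None


history = add_snapshot([], date(2005, 1, 1), {"OH": "Eastern", "NY": "Eastern"})
history = add_snapshot(history, date(2007, 7, 1), {"OH": "Midwestern", "NY": "Eastern"})
```

Calling snapshot_as_of(history, date(2006, 3, 1)) selects the earlier definition, in which Ohio belongs to the Eastern region. A date after July 1, 2007 selects the later one. The older definition is never converted. Only its to date is filled in when a newer snapshot arrives.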

Figure 5.17 shows that by taking snapshots of the new semantics of temporal data rather than trying to convert older data, managing change over time becomes a very easy thing to do.

A HISTORICAL RECORD

There is an interesting side benefit of managing change to semantically temporal data this way. That benefit is that a historical record of the semantically temporal data is created. This record is seen in Figure 5.18.

The value of historical records of the semantics of data is highlighted by the following example. Consider the information needed by an analyst interested in examining the changes that have been made to a company’s organization chart over time. Suppose the analyst particularly wishes to look at the company’s organization chart as it existed in 1990. With a historical record of the semantic data changes that have occurred over time, the analyst can easily locate and retrieve the firm’s 1990 organization chart using the from and to dates included on every snapshot taken of the company’s semantically temporal data.
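The analyst's request can be sketched as a time-qualified query. The example below uses Python's standard sqlite3 module. The table org_chart_snapshot, its columns, and the sample reporting lines are invented for illustration, and the to date is treated as exclusive by assumption.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE org_chart_snapshot (
           from_date  TEXT NOT NULL,   -- ISO dates compare correctly as text
           to_date    TEXT,            -- NULL marks the current snapshot
           department TEXT NOT NULL,
           reports_to TEXT
       )"""
)
# Hypothetical history: a reorganization took effect on 1992-07-01.
conn.executemany(
    "INSERT INTO org_chart_snapshot VALUES (?, ?, ?, ?)",
    [
        ("1985-01-01", "1992-07-01", "Sales", "Marketing"),
        ("1985-01-01", "1992-07-01", "Marketing", "CEO"),
        ("1992-07-01", None, "Sales", "CEO"),
        ("1992-07-01", None, "Marketing", "CEO"),
    ],
)


def org_chart_as_of(connection, as_of):
    """Return {department: reports_to} for the snapshot in force on `as_of`."""
    rows = connection.execute(
        """SELECT department, reports_to
             FROM org_chart_snapshot
            WHERE from_date <= ?
              AND (to_date IS NULL OR ? < to_date)""",
        (as_of, as_of),
    ).fetchall()
    return dict(rows)


chart_1990 = org_chart_as_of(conn, "1990-06-15")
chart_2008 = org_chart_as_of(conn, "2008-01-01")
```

The only thing that distinguishes the 1990 request from a request for the current chart is the date supplied to the query.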

When semantically static data and semantically temporal data are separated, and those forms of data are used as a basis for technological infrastructure, organizations can gracefully withstand change over time. The upheaval of systems that is caused by business change is mitigated, as depicted in Figure 5.19.


FIGURE 5.17 Over time a collection of snapshots is made that reflects all the changes over time.

DIVIDING DATA

The next logical question is how to create such a division of data. The answer is that semantically static and semantically temporal data should be physically separate in all future database designs. Failing that, there are technologies that manage the DW 2.0 infrastructure as described.
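One way such a physical separation might look is sketched below with Python's standard sqlite3 module. It is an assumption-laden illustration rather than a description of how any infrastructure management product works. The tables sale and territory_snapshot, the state column used to link them, and the functions reassign_state and sales_by_region are all hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Semantically static: basic sales facts (state is an assumed linking key).
    CREATE TABLE sale (
        sale_date TEXT, amount REAL, item TEXT, purchaser TEXT, state TEXT
    );
    -- Semantically temporal: territory definitions kept as dated snapshots.
    CREATE TABLE territory_snapshot (
        from_date TEXT NOT NULL, to_date TEXT, state TEXT, region TEXT
    );
    INSERT INTO sale VALUES ('2006-05-01', 100.0, 'lamp', 'A. Buyer', 'OH');
    INSERT INTO sale VALUES ('2008-02-01', 250.0, 'desk', 'B. Buyer', 'OH');
    INSERT INTO territory_snapshot VALUES ('2005-01-01', NULL, 'OH', 'Eastern');
    """
)


def reassign_state(connection, state, new_region, effective):
    """Record a territory change as a new snapshot; the sale table is not touched."""
    connection.execute(
        "UPDATE territory_snapshot SET to_date = ? WHERE state = ? AND to_date IS NULL",
        (effective, state),
    )
    connection.execute(
        "INSERT INTO territory_snapshot VALUES (?, NULL, ?, ?)",
        (effective, state, new_region),
    )


def sales_by_region(connection):
    """Total sales per region, using the territory in force on each sale date."""
    rows = connection.execute(
        """SELECT t.region, SUM(s.amount)
             FROM sale AS s
             JOIN territory_snapshot AS t
               ON t.state = s.state
              AND t.from_date <= s.sale_date
              AND (t.to_date IS NULL OR s.sale_date < t.to_date)
            GROUP BY t.region"""
    ).fetchall()
    return dict(rows)


reassign_state(conn, "OH", "Midwestern", "2007-07-01")
totals = sales_by_region(conn)
```

In this sketch the business change touches only the small territory table. The sales rows are never converted, and each sale is still reported under the territory definition that was in force on its sale date.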

Figure 5.20 represents how infrastructure management software manages the DW 2.0 data infrastructure holistically.


FIGURE 5.18 One of the benefits of keeping snapshots of data over time is that there is a historical record.

FROM THE END-USER PERSPECTIVE

The business user lives not in the world of technology, but in the world of business. And whatever else you can say about business, it is a fact that business changes. In some cases the changes are slower and in other cases the changes are faster. But change is a way of life to the business person.

The economy changes, legislation changes, new products come and go, competition changes, and so forth.

The business person needs to be able to have information adapt to those changes. If the information infrastructure does not adapt to the changes then it becomes a millstone around the neck of the business user. Information becomes a liability rather than an asset. It is only when information is truly agile that it becomes a business asset.


FIGURE 5.19 The effect of separating temporal data from static data in the face of changing business requirements.


FIGURE 5.20 One way to manage static and temporal data is by technology such as Kalido.

The end user does not need to know what is going on “underneath the covers.” The end user looks at the information infrastructure in the same way as most drivers look at their automobile. Most drivers know that there is an engine. Most drivers know that there is a need for gas and oil. But most drivers do not have a clue as to the inner workings of the engine. The best that can be said for most drivers is that when there is a malfunction beneath the hood of the automobile, the driver heads for a garage or repair station.

The same is true for the business analyst and DW 2.0. The business analyst is aware of the existence of DW 2.0. But he/she does not know the detailed underpinnings of the infrastructure. All the business analyst knows is that when something goes amiss, it is time to find a data architect who does understand the underpinnings of DW 2.0.

SUMMARY

The foundation of technology that DW 2.0 is built upon needs to be able to change. When the technological infrastructure is immutable, the organization soon has business requirements that are not reflected in the data warehouse environment. Furthermore, the longer it takes to add new requirements to a data warehouse, the bigger and more intractable the problem of adapting the data warehouse to business change becomes.

There are two ways to create a technological infrastructure for the data warehouse that can change over time. One approach is to use technology that is designed for that purpose. Another approach is to separate semantically static data from semantically temporal data. By separating the semantically different types of data, the impact of change is mitigated.
