Chapter 16 Migration

The DW 2.0 environment is large and complex, and building it requires many resources over a long period of time. Figure 16.1 indicates that the DW 2.0 environment is more like a city than a house.

HOUSES AND CITIES

A house is built over a relatively short amount of time. There is a finite starting point and a finite ending point in the building of a house. A house reaches a single point of usefulness—at one point it is not useful and at another point it is useful.

A city has a very different path of construction. A city is built over a very long period of time. A city is useful as soon as the first building is constructed. The city may or may not be planned. And cities take on their own character. Even though they have common characteristics, there is a distinctiveness to each city. For example, Athens, Paris, New York, and Tokyo all have airports, a municipal building, residential areas, and high-rent districts, but no one mistakes these cities for one another. Athens has the Parthenon, Paris has the Eiffel Tower, New York has its financial district, and Tokyo has corkscrew traffic bridges over Yokohama Bay.

FIGURE 16.1 When you are building DW 2.0, you are building a city, not a house.

The same is true for DW 2.0 data warehouses. The implementation of DW 2.0 will look very different for Coca-Cola, Citicorp, CIBC, and Chrysler; yet all of their DW 2.0 data warehouses will still recognizably share the same architecture.

Corporations almost never set out to build a complete DW 2.0 environment in one effort. How, then, do they end up with one? The answer is that they evolve to one. Corporations migrate to the DW 2.0 architecture over time.

MIGRATION IN A PERFECT WORLD

In a perfect world, the order of the building of a DW 2.0 data warehouse mimics the flow of data into and through the DW 2.0 environment.

Figure 16.2 depicts a “perfect world” implementation of the DW 2.0 architecture.

This figure depicts the order in which the DW 2.0 data warehouse would be built if there were no other existing data warehouse infrastructure. As each level of processing is built, the foundation is set for the next level.

THE PERFECT WORLD ALMOST NEVER HAPPENS

But the sequence shown in Figure 16.2 is a theoretical sequence. A DW 2.0 data warehouse is almost never built in top-down sequential steps as depicted. The primary reason a DW 2.0 data warehouse is not built in this “natural order” is that almost everyone who builds a DW 2.0 data warehouse already has an existing data warehouse environment in place.

Figure 16.3 depicts the infrastructure that almost everyone starts with, including a legacy applications environment, ETL processing, and a database or data warehouse. These are the bare-bones infrastructure components found in most corporations.

ADDING COMPONENTS INCREMENTALLY

One of the best things about the DW 2.0 architecture is that most of its components can be added incrementally and independently, on an as-needed basis. This means that companies can migrate and evolve to the DW 2.0 environment in an orderly manner. Nowhere in the migration to the DW 2.0 architecture is there a call to uproot and discard existing systems. Instead, the DW 2.0 infrastructure components can be built on top of an already-existing data warehouse.

FIGURE 16.2 The “natural” order in which DW 2.0 is built.

FIGURE 16.3 Where most organizations begin.

Adding a near-line storage component to an existing data warehouse infrastructure is a good example of incremental migration to the DW 2.0 architecture. Near-line storage is optional and is not for every corporation, but when it is needed, nothing else takes its place. Adding near-line storage to a first-generation data warehouse environment is architecturally easy. No special work or preparation is needed to attach new near-line storage facilities to a first-generation data warehouse.

Figure 16.4 depicts the addition of near-line storage to an existing data warehouse environment.

ADDING THE ARCHIVAL SECTOR

Next consider the Archival Sector. The Archival Sector can also be built with no advance preparation. One day the archival facility is not there, the next day it is, and nothing special had to be done to the first-generation data warehouse in the process.

FIGURE 16.4 Adding near-line storage is incremental.

FIGURE 16.5 Adding archives is relatively easy to do.

Figure 16.5 depicts the addition of an Archival Sector to an existing first-generation data warehouse environment.

CREATING ENTERPRISE METADATA

The same considerations are true for the DW 2.0 metadata facility. As a rule, local metadata is already in place. Whether it is being used or not, the vendors that supply technology often provide a facility for the local storage and management of metadata, such as ETL metadata, business intelligence metadata, and DBMS metadata. So the local foundation of metadata is usually already in place. What needs to be added is enterprise metadata. Building enterprise metadata usually consists of three steps:

Building the enterprise metadata repository
Moving local metadata into the enterprise metadata repository
Reconciling the local metadata with an enterprise metadata format

FIGURE 16.6 Metadata has to be gathered from many sources to form the enterprise metadata repository.

The last of these steps is always the hardest. Revising local metadata to conform to a corporate, enterprise format and structure is a difficult task.
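The three steps can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `LocalEntry` record, the `NAME_MAP` and `TYPE_MAP` reconciliation rules, and the element names are all hypothetical, and a real enterprise repository would hold far more than a name and a data type.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LocalEntry:
    """One piece of local metadata as a tool stores it (hypothetical shape)."""
    source: str      # which tool holds it, e.g. "etl", "dbms", "bi"
    name: str        # element name as the local tool spells it
    data_type: str   # data type as the local tool spells it


# Hypothetical reconciliation rules: local spellings -> enterprise spellings.
NAME_MAP = {
    "cust_no": "customer_id",
    "CUSTID": "customer_id",
    "CustomerNumber": "customer_id",
}
TYPE_MAP = {"VARCHAR2(10)": "string", "char(10)": "string", "Text": "string"}


def reconcile(entry):
    """Step 3: restate a local entry in the enterprise format, or return None."""
    name = NAME_MAP.get(entry.name)
    data_type = TYPE_MAP.get(entry.data_type)
    if name is None or data_type is None:
        return None
    return name, data_type


def build_repository(local_entries):
    repository = {}      # step 1: the enterprise repository starts empty
    unreconciled = []    # entries that still need a human decision
    for entry in local_entries:   # step 2: move local metadata in
        result = reconcile(entry)
        if result is None:
            unreconciled.append(entry)
            continue
        name, data_type = result
        record = repository.setdefault(name, {"data_type": data_type, "sources": []})
        record["sources"].append(f"{entry.source}.{entry.name}")
    return repository, unreconciled


local_entries = [
    LocalEntry("etl", "cust_no", "VARCHAR2(10)"),
    LocalEntry("dbms", "CUSTID", "char(10)"),
    LocalEntry("bi", "CustomerNumber", "Text"),
    LocalEntry("bi", "RegionCode", "Text"),
]
repository, unreconciled = build_repository(local_entries)
```

In this sketch the three local spellings of the customer identifier collapse into a single enterprise entry, while `RegionCode` lands in `unreconciled` because no rule covers it. That leftover list is where the difficulty of the third step shows up: every such entry requires someone to decide what the enterprise definition should be.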

BUILDING THE METADATA INFRASTRUCTURE

At no point does building an enterprise-level metadata repository require tearing down or discarding the existing environment. Instead the DW 2.0 metadata infrastructure is built over any already existing data warehouse infrastructure.

Figure 16.6 depicts the building of the enterprise metadata infrastructure over an existing first-generation data warehouse.

“SWALLOWING” SOURCE SYSTEMS

If any existing systems are likely to be decommissioned during the migration, they are the legacy applications in the operational environment that are consumed by the Interactive Sector of the DW 2.0 environment. In many cases the Interactive Sector will “swallow up” the old source applications. In other cases, the source applications will be left “as is” and simply continue to contribute data to the Interactive Sector.

FIGURE 16.7 The applications morph into the Interactive Sector.

When a source application is swallowed up by the Interactive Sector, it is a good bet that the application was an old, out-of-date legacy system. Such an application was designed to satisfy business requirements from a long time ago, requirements that have long since changed. If the Interactive Sector had not come along, the legacy application would have needed to be refurbished in any case.

Figure 16.7 depicts the absorption of some legacy applications into the Interactive Sector.

ETL AS A SHOCK ABSORBER

ETL processing acts like a shock absorber for the entire data warehouse evolution and migration process. A drastic change can occur in the operational source application world, and through ETL transformation the effect on the Interactive Sector is minimized. Likewise, a major change may occur in the Interactive Sector, and through ETL transformation the effect on the Integrated Sector is nonexistent or at worst minimal.
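A small Python sketch may make the shock-absorber idea concrete. It assumes a hypothetical target layout (`TARGET_FIELDS`) and invented source field names; the point is only that when the source changes drastically, the change is confined to the mapping handed to the ETL step, and the records delivered downstream keep the same shape.

```python
# Hypothetical layout that the downstream sector expects; it never changes here.
TARGET_FIELDS = ("customer_id", "order_total")


def make_etl(field_map, converters=None):
    """Build a transform from a source record to the fixed target layout.

    field_map  : target field name -> source field name
    converters : optional target field name -> function applied to the value
    """
    converters = converters or {}

    def transform(source_record):
        out = {}
        for target in TARGET_FIELDS:
            value = source_record[field_map[target]]
            convert = converters.get(target)
            out[target] = convert(value) if convert else value
        return out

    return transform


# Before the change: the legacy application uses terse names and dollar amounts.
etl_v1 = make_etl({"customer_id": "CUSTNO", "order_total": "AMT"})

# After a drastic source change: new field names, totals now stored in cents.
etl_v2 = make_etl(
    {"customer_id": "customerNumber", "order_total": "totalCents"},
    converters={"order_total": lambda cents: cents / 100},
)

old_row = {"CUSTNO": "C-17", "AMT": 42.5}
new_row = {"customerNumber": "C-17", "totalCents": 4250}

before = etl_v1(old_row)
after = etl_v2(new_row)
```

Here `before` and `after` are identical even though the two source rows share no field names and use different units. Only the mapping passed to `make_etl` had to change; nothing downstream of the transform needs to know the source was replaced.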

Figure 16.8 shows ETL acting as a shock absorber between the different sectors.

MIGRATION TO THE UNSTRUCTURED ENVIRONMENT

The unstructured data domain is one of the newest and most important features of the DW 2.0 data warehouse environment. In many DW 2.0 environments, unstructured data is the added component that unlocks the door to many new kinds of analytical and decision-support processing.

FIGURE 16.8 ETL processing acts like a shock absorber.

The migration to the DW 2.0 environment for unstructured data is quite different from the migration for structured data. Whereas the structured environment almost always exists in the form of a first-generation data warehouse, the same cannot be said for the unstructured component. There almost never is preexisting unstructured data that can be added to a DW 2.0 data warehouse environment.

Figure 16.9 shows that unstructured data is almost always captured in its entirety from its textual sources and is passed through a new unstructured data ETL routine into the unstructured side of the DW 2.0 data warehouse.

FIGURE 16.9 Unstructured data is entered from text and other forms of unstructured data.

After unstructured data has been processed into the DW 2.0 data warehouse, linkages are formed between structured data and unstructured data. Figure 16.10 depicts the formation of linkages between the unstructured and the structured data domains within a DW 2.0 sector.
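The chapter does not say how these linkages are formed, so the following Python sketch is purely an assumption: it links a document to a structured record whenever the text mentions that record's key. The customer keys, the documents, and the `C-<digits>` key format are all hypothetical.

```python
import re

# Structured side (hypothetical): customer key -> customer name.
customers = {"C-17": "Acme Corp", "C-42": "Globex"}

# Unstructured side (hypothetical): document id -> text already loaded by textual ETL.
documents = {
    "email-001": "Customer C-17 reported a late shipment.",
    "note-002": "Follow-up call with C-42 and C-17 about pricing.",
    "memo-003": "Quarterly planning notes.",
}

# Assumed key format: the letter C, a hyphen, then digits.
ID_PATTERN = re.compile(r"\bC-\d+\b")


def build_links(customers, documents):
    """Return (customer_key, document_id) pairs for keys mentioned in the text."""
    links = []
    for doc_id in sorted(documents):
        mentioned = sorted(set(ID_PATTERN.findall(documents[doc_id])))
        for key in mentioned:
            if key in customers:   # link only to keys that exist on the structured side
                links.append((key, doc_id))
    return links


links = build_links(customers, documents)
```

With the sample data, `email-001` links to one customer, `note-002` links to two, and `memo-003` links to none. Documents without a recognizable key still belong in the unstructured domain; they simply carry no linkage to the structured side.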

Over time, as unstructured data ceases to be used, the unstructured data migrates to the DW 2.0 Archival Sector’s unstructured data domain. There is more information in Chapter 19 about unstructured data.

FROM THE PERSPECTIVE OF THE BUSINESS USER

Migration is something the business user is indirectly involved in. The business user determines what new subject areas are to be included in DW 2.0. He/she determines when data should be placed in archives and near-line storage. He/she also determines the transformations that occur as data passes from one sector to another.

FIGURE 16.10 The unstructured environment is linked to the structured environment.

But at the end of the day, the business user is only tangentially involved in the migration of data that occurs as the DW 2.0 environment is being built.

SUMMARY

There is a natural migration order to a DW 2.0 data warehouse architecture. The natural migration order follows the same path on which data flows—first to the Interactive Sector, next to the Integrated Sector, then to the Near Line Sector, and finally to the Archival Sector. Although the natural order is well defined, it is only theoretical.

In reality, the DW 2.0 environment evolves from the first-generation data warehouse one component at a time. The archival environment can be added independently. The near-line environment can be added independently. So can the enterprise metadata infrastructure and the unstructured data domain.

The different components of the DW 2.0 environment are added in response to business needs.

The only preexisting systems components that are sometimes destroyed and replaced during migration to the DW 2.0 architecture are legacy application systems. On occasion, a legacy systems environment is so out of date and so fragile that it is easier to rewrite the system than it is to integrate data from the old system.
