CHAPTER 6

The Operational Data Store Component

For all of the benefits of a data warehouse and its associated data marts, there is still a need for collective, integrated operational, DSS/informational processing. When this need arises, an operational data store (ODS) is in order. An ODS is a hybrid structure that has equally strong elements of operational processing and DSS processing. This dual nature of the ODS easily makes it the most complex architectural structure in the corporate information factory (Figure 6.1).

What Is an Operational Data Store?

An ODS is a collection of detailed data that satisfies the collective, integrated, operational needs of the corporation. Generally, these needs arise in the following situations: as strategic decisisons are made using the data warehouse and/or data mart and action is required; as integrated operational reporting is needed across multiple but related operational systems (e.g., distribution systems, customer service systems, etc.) The ODS is:

Images   Subject-oriented

Images   Integrated

Images   Volatile

image

Figure 6.1  The operational data store.

Images   Current-valued

Images   Detailed

The ODS looks very much like a data warehouse when it comes to its first two characteristics, subject orientation and integration. However, the remaining characteristics of an ODS are quite different from a data warehouse. Because of the very nature of these fundamental differences in types of data and processing, it is never acceptable to combine an ODS and a data warehouse into the same physical environment.

Volatility

An ODS is volatile. That means that an ODS can be updated as a normal part of processing. A data warehouse is nonvolatile and is not updated under normal circumstances. Instead, a data warehouse contains snapshots; a new snapshot is created whenever a change needs to be reflected in the data warehouse.

Current-Valued

The second major difference is the timeliness of the data found in the ODS. An ODS typically contains daily, weekly, or maybe even monthly data, but the data ages very quickly in the ODS. The data warehouse, on the other hand, contains robust amounts of historical data. In fact, it may contain 5 or even 10 years worth of data.

Detailed Data

The third difference between an ODS and a data warehouse is that the ODS contains detailed data only, but a data warehouse contains both detailed and summary data. This characteristic is perhaps the most defining difference between a data warehouse and an ODS.

Feeds into and out of the Operational Data Store

The ODS is simple; it has two primary feeds into and one primary feed out of it (Figure 6.2).

The Integration and Transformation Layer Feed

The first primary feed into the ODS is from the I & T layer. The feed out of the ODS is to the data warehouse. This ODS-to-data-warehouse feed is activated as data ages.

The second primary feed is from the data warehouse itself. This is one of the more important feeds because it is through this feed that online response time can be achieved for DSS data warehouse data. This feed is typically known as the Class IV feed.

In a Class IV ODS feed, data is analyzed and synthesized at the data warehouse; then it is aggregated or summarized and passed back to the ODS. When in the ODS, the data is immediately available for decision making. There is no need to go back to the data warehouse and gather and analyze a lot of data in order to make a decision.

Class IV ODS are found in many places:

image

Figure 6.2  The feeds of data into and out of the ODS.

Images   In the telecommunications environment where the operator has direct contact with the customer

Images   In the banking environment where the teller has immediate loan authorization for a customer

Images   In the retail environment where the salesperson has up-to-the-second information about the sales status of an item

Images   In the insurance environment where the agent has prerated the customer/prospect

Another rather small feed of data into the ODS is a feed directly from the external environment. Although it is true that specialized applications can be written that will allow data to go directly into the ODS, bypassing the I & T layer, these applications are fairly rare. Under normal circumstances, data enters the ODS by passing through the I & T layer.

Different Classes of the Operational Data Store

The I & T interface is of special interest because it governs the different types (or classes) of ODS. Consider the four different I & T interfaces shown in Figure 6.3:

Images   Class I: Asynchronous—one-to-two-second delay

Images   Class II: Store and forward—two-to-four-hour delay

Images   Class III: Batch processing—overnight

Images   Class IV: Data from the warehouse

A single ODS will contain one, two, three, or perhaps all four classes of data.

Class I Operational Data Store

There is a synchronous interface in which a very, very small amount of time lapses between an application’s transaction and the reflection of the transaction in the ODS. The amount of time is typically one second or less. This type of interface is called a Class I ODS. In this class, very little serious work can be done to the data as it passes through the I & T layer because there simply isn’t time to do very much. The transaction passes through to the ODS in a fairly complete manner. For all practical purposes, the end user never sees a time lag between an operational transaction execution and the reflection of that transaction in the ODS when the ODS is Class I.

image

Figure 6.3  The four different classes of ODS can be classified by the speed of the feed of data into the ODS.

Class II Operational Data Store

If an hour or two passes from the time a transaction is created and interacted in the application environment until that transaction is reflected in the ODS, the ODS type is a Class II ODS. The I & T layer holds the data in abeyance in a store-and-forward mode. Because of the time lag, a serious amount of work can be done to the transaction data as it passes through the I & T layer. The data shows up as truly integrated data in the ODS. The end user may notice a difference between the data in the operational applications and the ODS while the data is held in a store-and-forward mode.

Class III Operational Data Store

In a Class III ODS, there may be a time lag between 12 hours and a day as transaction data is collected in the I & T interface. This is an overnight batch process and as much integration of data as desired can occur during processing in the I & T layer. The data that arrives at the ODS can be very integrated in this mode. The end user can definitely notice a difference between the values of data in the operational application environment and the ODS as the data sits in the I & T layer awaiting processing.

Class IV Operational Data Store

In a Class IV ODS, the data is fed into the ODS directly from the data warehouse. In this case, the data is read and analyzed in the data warehouse and derivations are passed to the ODS. Typical derivations passed to the ODS include customer segments and customer scores.

A Class IV ODS is important because it is here that online response time can be achieved when trying to access data derived in the data warehouse.

Determining the Class

Many technological implications to the intranet technology must be considered here, such as:

Images   Speed of movement of data into the ODS

Images   Volume of data that must be moved

Images   Volume of data that must be stored in intermediate locations during I & T processing

Images   Update of data and integrity of transaction processing

Images   The time of day the movement needs to occur

image

Figure 6.4  The cost of development and operation for a Class I ODS is significantly higher than the corresponding costs for a Class II and a Class III ODS.

The choice of whether a Class I, II, III, or IV ODS will be created is the first and one of the most important decisions the system’s architect must make. Figure 6.4 outlines some of the considerations involved in the choice of a Class I, II, II, or IV ODS.

Furthermore, if integration is a primary consideration, then a Class I ODS is not a very good choice because not very much integration can be achieved with it. For these reasons then, there must be a very serious business justification for a Class I ODS. As a rule, a Class II or a Class III ODS serves most companies’ needs for most processing.

Dynamic Summary Data

One of the distinguishing characteristics of an ODS is the fact that it stores only detailed data. Certainly, summary data can be created from detailed data, but storing that summary data after it is created is an entirely different matter. The summary data that is created in the ODS can be called dynamic summary data. This is data whose accuracy of summarization depends upon the moment of calculation.

Figure 6.5 shows a calculation that is made for a bank’s current collective balance for IBM Corporation. At 10:37 A.M., IBM has a collective balance of $1,998,417.29. Later in the day, at 4:13 P.M., another calculation is made of the collective balance of IBM, showing $2,867,665.19. The accuracy of the summary data found in the ODS is a function of the moment in time when the calculation is made. It would be a mistake to store the 10:37 A.M. value in the ODS because it would be tempting to use that value for a business decision at 4:13 P.M. The business decision at 4:13 P.M. may well be a very different business decision than the one made at the earlier time. For this reason, dynamic summary data is not normally stored in an ODS.

image

Figure 6.5  Dynamic summary data: The accuracy of the calculation depends on the moment in time that the calculation is made.

Static Summary Data

Now consider the summary data found in a data warehouse. Suppose that on Monday, a manager asks an analyst to determine what the expenses were for the last quarter in a department. The analyst calculates that quarterly expenses were $101,768.19 for the department.

Now suppose that another manager asks an analyst to make the same calculation on Friday. The analyst should calculate exactly the same amount—$101,768.19. There should be no variation in the amount calculated, even though the calculation is made at a different moment in time.

This type of summary data is called static summary data and is perfectly safe to place in a data warehouse. In fact, it is wasteful not to place this type of data in a warehouse. In the data warehouse, the DSS analyst is less interested in what expenses look like this instant and more interested in how expenses trend over time, compared to forecast, and compared to last year. In addition, the DSS analyst expects these complex questions to be answered in a very short time (2—3 minutes). This can be accomplished only through liberal use of static summaries in the data warehousing environment. Because of the differences between dynamic summary data and static summary data, summary data should not be stored in an ODS but in a data warehouse and/or data mart.

The Operational Data Store Workload

The ODS environment is the most technologically challenging environment because elements of very different kinds of processing must constructively cohabitate in the same technological infrastructure (Figure 6.6).

Load Processing

One kind of processing that must be done in the ODS is that of loading data. In the case of a Class I ODS, the loading is done in an online manner. The challenges faced in this environment are generally associated with the sophistication of technology or the lack of key technologies in the application and/or legacy environments.

In the case of a Class II or III ODS, loading is done asynchronously. The challenges surface when loading large volumes of data. This generally results in load processes that are complex to develop, certify, and maintain.

image

Figure 6.6  The kinds of processing that occur in the ODS environment.

In the case of a Class IV ODS, loads are done in batches and can be scheduled to run when the workload on the ODS is low. Additionally, data being loaded is less time sensitive given the degree that historical and external data are used. These characteristics make this class of ODS the easiest to implement.

Update Processing

The second kind of processing that can be done in the ODS environment is that of update processing. Even though direct updating is not frequently done, it still must be accommodated. When direct updating must be accomplished, there is a technological implication of update integrity. This is achieved within the confines of the DBMS structure by means of facilities that COMMIT work to be applied to the database, ROLLBACK work that has not been COMMITed, and RECOVER work that has been COMMITed and lost. Even though an update many be done for only a small number of transactions, all active transactions in the system pay the price of overhead for the update.

Access Processing

The third type of processing that occurs in the ODS is that of access processing. Many people may be using the ODS who expect consistent two-to-three-second response time. In fact, this type of processing is the dominant one in the ODS environment and entails the access of a few rows of data, not the update of data.

DSS Analysis Processing

The fourth type of processing found in the ODS environment is the occasional DSS analysis of data, where sweeping analysis is made across many records.

The complicating aspect of the ODS is that all of the different styles of processing must be accommodated within the ODS infrastructure. The problem is that optimizing any one style of processing compromises all other styles of processing. Said differently, when the designer of the ODS optimizes any one style of processing, the designer does so at the expense of all other styles of processing. The best the designer can do is achieve a comfortable level of performance for all styles of processing in the ODS. It is because of this compromise that the ODS is the most complicated of all the architectural constructs within the corporate information factory.

Different Processing Windows

In order to achieve the compromise between the types of processing, the ODS designer must carefully divide the ODS day into different processing windows, as shown in Figure 6.7. One of the most important reasons why the ODS processing window must be divided into a series of mini processing windows is that in order to achieve consistent processing time, that workload must be homogeneous.

What Is a Homogeneous Workload?

A homogeneous workload is one in which the different types of processing that reside in the workload are not mixed. Figure 6.8 shows the difference between a homogeneous workload and a heterogeneous workload in terms of system throughput.

On the left side of Figure 6.8 is a homogeneous workload in which only transactions of a given type—small, fast-running transactions—are found. In the environment on the left, flow through the system is unimpeded, and an efficient and quick flow exists. Such a system exhibits good response time.

image

Figure 6.7  The cycle of daily processing for the ODS environment.

image

Figure 6.8  In order to achieve consistent and good response time, the workload flowing through the system needs to be constrained.

Now consider the workload represented on the right side of Figure 6.8. In this workload is a mixture of different kinds of transactions—very small, fast-running transactions and large slow-running transactions. The practice of keeping workloads homogeneous is called adherence to the standard work unit and has been known and practiced by online system designers for years. The concepts of the standard work unit apply to the ODS every bit as much as they have applied to OLTP.

External Data in the Operational Data Store

External data can be placed in the ODS just as it is placed in other parts of the corporate information factory. Metadata is likewise a part of the ODS environment. Although metadata is useful in the ODS environment, it is not nearly as important as it is in the data warehouse or the data mart environment.

Summary

The ODS is a hybrid architectural construct containing some elements of data warehousing and some application characteristics. It operates on a mixed workload and is easily the most difficult component of the CIF to construct and operate.

An ODS is not an essential part of the CIF. Some companies operate quite nicely without an ODS. Other companies find an ODS to be indispensable when their existing applications environments do not provide the necessary integration to support evolving business management activities resulting from business intelligence gained using the data warehouse. This is commonplace in many marketing, sales, and service organizations. These areas generally use the data warehouse to formulate strategies on how to best treat their customers and use the ODS to support contact activities related to execution of these strategies.

The feed into the ODS is the I & T layer. In addition, occasionally data is entered directly into the ODS.

Four types of ODS exist: Class I, Class II, Class III, and Class IV. The class of an ODS depends on the speed with which data is passed from the I & T layer. As a single ODS matures, it is possible that it may contain all classes of data.

The ODS contains dynamic summary data, data whose accuracy depends on the moment of calculation. Because of the ever-changing nature of dynamic summary data, it is normally not stored in the ODS.

The ODS operates on a severely mixed workload. Because of the dramatic differences in the nature of the workload, the ODS day is divided into slices. There is the OLTP time slice, the batch time slice, and the DSS time slice.

We have talked about the components of the corporate information factory responsible for providing, capturing, transforming, and integrating detail data. In addition, we have talked about how the ODS provides a platform for integrating this data for operational reporting. We also talked about how the ODS is used as a mechanism for inserting information derived in the data warehouse back into the applications—in a sense, making knowledge gained from analyzing data actionable. Now let’s take a look at the best-known component of the CIF, the data warehouse. It is here that detail historical data is stored and made available for strategic analysis.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
18.218.233.22