1

Data Warehouse ROI

A Data Warehouse Needs a Purpose

When I was a boy growing up in rural Central Florida, the worst thing I could do on Saturday was nothing. Saturday was the day of the week reserved for chores around the house, such as weeding the garden, mowing the yard, and cleaning the cars and the house. If I was found doing nothing on Saturday, obviously I did not have a task assigned to me or I did not fully understand the task(s) already assigned to me. Either way, my parents would quickly help me by providing and clarifying tasks for me to do...immediately. To make matters worse, my newly assigned tasks would be the most onerous tasks of that Saturday. So, clearly the best way to avoid weeding the garden or cleaning the rain gutters was to volunteer for those tasks that were most tolerable, such as cleaning the house, which included working on the air conditioning.

For a data warehouse, every day is Saturday. A data warehouse without a purpose assigned specifically to that data warehouse will find itself with a new purpose. When Bill Inmon and Ralph Kimball pioneered the information factory and decision support concepts into the modern data warehouse, the genius of their designs and methods rendered the data warehouse an obvious value addition to any enterprise. The retention and juxtaposition of data in a data warehouse from various business units and business processes throughout an enterprise renders a data warehouse a unique collection of data that is found nowhere else in the enterprise. The benefits are obvious and the possibilities are almost infinite. With the addition of a data warehouse, data juxtapositions that were previously not available become standard tools of decision support. When Sales data that exists in an OLTP system can be joined with Product data that exists in an ERP system, and both Sales and Product data can both be joined with Customer data that exists in a CRM system, what could possibly be a problem? The answer is simple, even for a data warehouse with Sales, Product, and Customer data: it is still Saturday.

A data warehouse with Sales, Product, and Customer information is a unique and valuable asset. But, even for unique and valuable assets in an enterprise, such assets must be able to answer the Purpose question: “Why are you here?” Implicit in the Purpose question are several other questions:

  • Why is the enterprise funding the existence of a data warehouse?

  • Why is the enterprise not directing that funding elsewhere?

  • How long should the enterprise expect to continue to fund the existence of a data warehouse?

  • What benefit does the enterprise derive from the existence of a data warehouse?

  • Is the benefit of a data warehouse equal to or greater than the benefit of something else?

Consider all these Purpose questions in the context of a Five-Year Plan, a Ten-Year Plan, and a Fifteen-Year Plan.

These are not philosophical or aesthetic questions with little or no impact on the life of an enterprise or a data warehouse. To the contrary, for an enterprise with a finite budget and more opportunities than resources, the answers to these questions will help the management of an enterprise choose whether to create or maintain a data warehouse, automate a manufacturing operation, or expand into a new market segment. Why should management choose to (continue to) fund a data warehouse rather than automate a manufacturing operation or expand into a new market segment? The answer (“The genius of the data warehouse designs and methods render the data warehouse an obvious value addition to any enterprise”) seems a rather weak answer when compared to the potential for a new market segment. Could the answer be that the retention and juxtaposition of data from various business units and business processes throughout the enterprise renders a data warehouse a unique collection of data that is found nowhere else in the enterprise? No, again although true, that answer seems rather weak. When the automation of a manufacturing operation promises an amazing ROI for the next ten years, and a new market segment is expected to achieve an ROI for the foreseeable future of the enterprise, a data warehouse must be able to identify and quantify the benefits returned by the continued existence of that data warehouse. Otherwise, such a data warehouse may never begin its existence, or may find itself repurposed to something other than decision support, such as weeding the garden.

A Data Warehouse Needs an ROI

A purpose is no guarantee of success or survival in an enterprise. A purpose provides a focal point for the goals and objectives of a data warehouse. If the purpose for a data warehouse is perceived as weak, the data warehouse is perceived as weak. If the purpose for a data warehouse is perceived as strong, the data warehouse is perceived as potentially strong. If the purpose indicates a low ROI, that low ROI reflects on the data warehouse. If the purpose indicates a lofty goal, the data warehouse will be measured by the goal, even if it is measured by nothing else.

The Purpose of a data warehouse is not simply a restatement of the properties and qualities of a data warehouse. Those properties and qualities may include features, functionality, and response time as well as other properties and qualities.

  • Features—For a data warehouse the features can be transaction and event data, business data, environment data, and join strategies, which join those sets of data within the data warehouse.

  • Functionality—For a data warehouse functionality can be metrics and formulas that present information built from the data in the data warehouse, and delivery mechanisms that deliver that data.

  • Response Time—For a data warehouse the response time is the wall clock time duration between the moments when a user requests and receives information from the data warehouse.

The purpose of a data warehouse is not a collection of these features, functionalities, and response times. Instead, these features, functionalities, and response times are the properties of a data warehouse by which it achieves its purpose. Similar to the tools of a carpenter, the hammer and saw are not the purpose; instead, the creation of furniture is the purpose. The hammer and saw are simply the tools by which the carpenter achieves the purpose of building furniture.

Why does this matter? Why pay such attention to the purpose of a data warehouse? Without a valuable yet feasible purpose that simultaneously expresses the value of a data warehouse, that data warehouse is vulnerable. The gravitational pull of data may cause a data warehouse to fail to achieve its purpose and goals. Admit it. Every data warehouse achieves value and ROI, which is neither published nor measured, and may be the “real” reason for the existence of a data warehouse. To continue to deliver its real, yet unpublished, purpose a data warehouse must be able to resist the influence of the gravitational pull of data.

Gravitational Pull of a Data Warehouse

A successful data warehouse that achieves its purpose and adds value and ROI can become its own problem. Well-organized, high-quality data with complete metadata in a data warehouse can attract a problem to itself—the gravitational pull of data. Oddly enough, if left unchecked, the gravitational pull of data will increase the probability that it will destroy the data warehouse as the data warehouse increases its size and success.

Gravity: Two objects, if they are close enough to each other, will be drawn toward each other. When one object is larger than the other, the small object (i.e., the apple) moves while the large object (i.e., the earth) remains stationary. The same is true for a data warehouse. Within an enterprise, a data warehouse can be the large object attracting other objects to it. In an enterprise this gravity occurs by the economies of scale presented by a data warehouse. For example, a data warehouse may have Sales, Product, and Customer data. That would mean that a data warehouse has spent and expensed the investment necessary to analyze the Sales, Product, and Customer data in operational source systems. That would mean that a data warehouse has spent and expensed the investment necessary to design and develop the ETL applications that capture and retain that Sales, Product, and Customer data. Finally, that also means that a data warehouse has spent and will continue to spend, expensed and will continue to expense, the overhead incurred by capturing and retaining Sales, Product, and Customer data. A new application needing Sales, Product, or Customer data can either replicate all those expenditures, expenses, and investments, which will expand the budget and reduce the ROI of the business unit sponsoring the new application, or that new application can get its Sales, Product, or Customer data from the data warehouse, which will allow the business unit sponsoring the new application to avoid the expenditures, expenses, and investments, reduce the budget, and increase the ROI of that new application and the business unit that sponsored it. Only one option is the obvious and preferred option: when data is available in a data warehouse, get that data from the data warehouse. That is the gravitational pull of data.

The data in a data warehouse will “pull” or “draw” applications needing data to itself. As a result, the ROI of the data warehouse will reflect the expenses necessary to provide data to applications without the benefit of that data; in addition, the ROI of the applications will reflect the benefit of that data without the expenses necessary to obtain that data. One application leveraging the data in a data warehouse is company. Fifty new applications per year is a crowd. Any data warehouse that draws that kind of a crowd will inevitably find that some portion of those applications, as they grow and change, will try to modify the data, structure, and architecture of a data warehouse to match their growth and change. Such modifications can convert an enterprise data warehouse into an application data warehouse. This is when a successful and growing data warehouse can become its own problem through the gravitational pull of its own data.

A data warehouse can also be the small object that is drawn to a larger object. When a ten-ton elephant wants to sit in your front yard, where does it sit? Your front yard. There’s no stopping it. When a ten-ton business unit wants to use the data in a data warehouse...well, you can guess what is about to happen. That business unit is going to use the data in the data warehouse. The ten-ton business unit is going to require the data warehouse supplement the data already in the data warehouse with additional data; furthermore, the ten-ton business unit may want to design the additional data necessary to meet the purposes of the business unit, which may or may not be the purposes of the data warehouse. Data already in the data warehouse may be redesigned to meet the needs of the ten-ton business unit, possibly to the detriment of the data warehouse.

Why is the ten-ton business unit consuming the data warehouse in this way? The answer is the same for the ten-ton business unit as for the small single application. Any expense or investment that can be migrated from the ten-ton business unit to the data warehouse will increase the ROI of the ten-ton business unit. Any economy that robs Peter to pay Paul will continue to be supported by Paul. Likewise, an enterprise that allows a business unit to migrate its expenses to a data warehouse, but experience the benefits of the data in the data warehouse, will continue to be supported by all such business units. The data warehouse, however, is left holding the expenses, but none of the benefits, associated with the data.

The gravitational pull of the data in a data warehouse moves smaller applications toward itself so they can use the data in the data warehouse. The gravitational pull of the data in the data warehouse moves the data warehouse toward larger business units when a larger business unit has the clout necessary to exert control over the data warehouse. In both scenarios the success and growth of a data warehouse attracts forces that may, if left unchecked, destroy the data warehouse.

Purpose and ROI

The solution to the gravitational pull of data is purpose and ROI. The purpose of a data warehouse is that set of features and functions it intends to deliver to the enterprise. The ROI of a data warehouse is the realized value caused by the data warehouse when the data warehouse delivers the features and functions of its purpose. The selection of a purpose should be decided as early as possible in the life of a data warehouse. For example, a data warehouse may produce Sales summary reports and Logistics exception reports every morning while an Operational Data Store sends low-inventory and late-shipment alerts in near real time to the PDA of Logistics staff managers. This is when a data warehouse makes itself part of the fabric of the enterprise. The data, reports, alerts, and analytics delivered by a data warehouse—delivered every hour, every day, sometimes in near real time—are collectively the purpose of a data warehouse. The ROI is benefit and value realized within the enterprise when the data warehouse delivers the features and functions included in its purpose.

The ROI of a data warehouse presents a cost to any effort that would change the data warehouse to fit the requirements of a ten-ton business application. In that way, a change to the data warehouse becomes a change to the enterprise. By delivering Sales reports every morning, Sales reports that are immediately consumed by all levels of management, a data warehouse inhibits any efforts that might abridge, interrupt, or delay the delivery of those Sales reports. By delivering near real time alerts from its operational data store to tactical managers throughout the enterprise, near real time alerts that help those tactical managers avoid expensive mistakes, a data warehouse inhibits any efforts that might interrupt or delay those near real time alerts. In that way, Sales reports and near real time alerts can be part of the purpose and ROI of a data warehouse.

Architected correctly, a data warehouse can do both. A data warehouse can maintain its own integrity and flexibility. The internal integrity of a data warehouse, including adherence to data warehousing principles, allows a data warehouse to continue being a data warehouse. The flexibility allows a data warehouse to continue to serve all parts of the enterprise. The purpose and ROI of a data warehouse presents just enough pause for thought to allow a data warehouse architect to achieve both goals—integrity and flexibility.

Bill Inmon and Ralph Kimball individually pioneered the modern data warehouse. Their genius was in their data warehousing principles. The data warehousing principles of Inmon and Kimball allow a data warehouse to assimilate new subject areas, new purposes, and new inertias. Rather than build a database and reporting application that can accommodate only one subject area and its data, the data warehousing principles of Inmon and Kimball allow a data warehouse to assimilate many subject areas with those subject areas already in the data warehouse. This is how a data warehouse is able to assimilate new subject areas throughout its lifetime. In the life of a data warehouse, adherence to data warehousing principles and achievement of data warehouse purposes form a symbiotic relationship. Adherence to data warehousing principles allows a data warehouse to maintain its integrity as a data warehouse while the data warehouse purposes justify adherence to the data warehousing principles. A data warehouse is able to continue its formula for success by achieving its purposes, and able to achieve its purposes because it follows the formula for success, which was pioneered by Inmon and Kimball.

Not Quite a Victim of Success

A data warehouse without a focus and direction, without a purpose, is like a ship without a rudder—rather than choose its own path a ship without a rudder goes where the water takes it. A data warehouse is able to direct its path when it achieves its purpose and delivers ROI. A data warehouse directs its path as it is built in sections, which are subject areas. For example, Sales, Promotions, Logistics, and Customers are frequent subject areas in data warehousing. A data warehouse is not merely a collection of subject areas. The subject areas must be able to work as a cohesive whole within the data warehouse. But what are the symptoms that occur when a data warehouse is not quite a victim of its own success, a data warehouse that lacks focus and direction, that lacks a purpose? Three key indicators of a data warehouse provide a good clue as to the success or failure of a data warehouse. Those three key indicators are performance, relational integrity, and data quality.

Performance

The first clue that a data warehouse is not quite a success is the inability to return a result set. This may seem obvious. It should be obvious. Maybe that is why it is the first clue...because it is so obvious. When users run a report or a query and...and...and...and...they eventually go home for the night, performance is definitely an issue. My first data warehouse was described as “a rock” for its ability to remain motionless and yield absolutely nothing, including data. Performance was an issue.

When a data warehouse is unable to yield data the root cause can be many things. But the first root cause is a lack of purpose. A data warehouse that is built to look like a data warehouse will do exactly that: it will look like a data warehouse. But a data warehouse that is built to render a Sales report will do exactly that: it will render a Sales report. Why is that? If from the first planning meeting everyone understood that, regardless of what may or may not be in the data warehouse, it must be able to generate a Prior Day Sales report, then everyone will begin thinking of the data in terms of the Prior Day Sales report. First, the Sales subject area will be included. Second, Sales will be grained by the day, and maybe by other time grains. Third, the dimensional properties and attributes necessary to describe, quantify, and qualify data in the Prior Day Sales report will be included. In short, everything necessary to make the Prior Day Sales report a success will be included and optimized so that when the managers who decided to fund the data warehouse because they want their Prior Day Sales report ask for this report, that’s exactly what they will get.

What was the purpose of the first subject area of that data warehouse? The purpose was the Prior Day Sales report. Now comes the fun part. When you hit the bull’s-eye right in the center, what is within three inches in every direction? The answer is...more bull’s-eye. By virtue of having a purpose (deliver the Prior Day Sales report) the data warehouse is able to deliver the Prior Day Sales report and a dozen other reports that are only slight variations of the Prior Day Sales report—for example, the Prior Day Department Sales report, the Prior Day Sub-Department Sales report, and the Prior Week Sales report. What about the Prior Day Logistics report? That’s another subject area and another purpose. Remember, one subject area at a time.

Relational Integrity

Another good clue that a data warehouse is not ready for prime time is the multiplication and disappearance of data. When the Prior Day Sales report indicates that sales yesterday were $500,000, but the Prior Day Department Sales report indicates sales yesterday were $750,000, something is amiss in the data warehouse. Or, when the Prior Day Department Sales report indicates the Furniture department disappeared, something is not quite right in the data warehouse.

Again, this would seem to be rather obvious. That is why it is a good first indicator of less than stellar success. The dimension tables look like dimension tables. The dimension and fact tables join like dimension and fact tables. As individual elements of a data warehouse they all seem to fit the descriptions of a data warehouse. But, when joined together in the form of a report or query, they distort the data returned by the data warehouse. Why is that? Again, from the first planning meeting to the final report review the fact tables and dimension tables will be designed to correctly join to achieve the result set for a Prior Day Sales report if the Prior Day Sales report is the goal of that subject area. By using the Prior Day Sales report to guide the relational integrity of the Sales subject area, other reports and queries will also experience good relational integrity. In short, the relational integrity necessary to make the Prior Day Sales report a success will still be there for other reports. By achieving its purpose, which is the Prior Day Sales report, a data warehouse experiences the secondary success of achieving other ancillary purposes as well.

Data Quality

The third key indicator of the success or failure of a data warehouse is not quite so obvious. However, what it lacks in obviousness, it compensates for in simplicity. Why do less successful data warehouses not include a Data Quality program? The answer is simple: they don’t know what bad-quality data looks like. A word looks exactly like a word; a non sequitur word looks exactly like every other word. But, when the only acceptable values for a data element are north, south, east, and west a data quality application can rather easily identify yes as a non sequitur word.

A Data Quality program is difficult. You can take that plain and simple statement to the bank. In a Data Quality program you programmatically lock your definitions of acceptable data values and bad data values in logic written into ETL code. To do so, you must know what acceptable and bad data look like, and you must be able to define them in terms of programmatic logic. That task, all by itself, is difficult. Typically, a Data Quality program is an iterative effort. Each iteration is a refinement of prior efforts.

So, the mere presence of an active and functioning Data Quality program is a key indicator of a successful or failing data warehouse. It’s that simple. The presence of an active and functioning Data Quality program means that someone is validating some of the data in the data warehouse to verify that it is consistent with the logic and design of the data warehouse, which is much better than no one validating any of the data in a data warehouse.

How is a Data Quality program enhanced by the presence of a purpose? The purpose of a data warehouse provides a focus to the Data Quality program. If the purpose of a data warehouse is to produce a Prior Day Sales report, then the Data Quality program will begin by validating the data that contributes to the Prior Day Sales report. Even if the effort is incomplete, yet iterative, any attention to data validation is better than no attention to data validation; furthermore, validating the data delivered to the members of management who choose to fund the data warehouse is always a good choice.

Purpose

If you choose to aim at nothing, you just may hit the target.

A data warehouse, like every other enterprise asset, must have a reason for existing. If a data warehouse is not busy adding value as a data warehouse, it may soon begin adding value as something else. While ROI is thought to mean Return on Investment, for those operating a data warehouse ROI means Reason for Existing. The value added by a data warehouse contributes to the ability of a data warehouse to maintain its integrity in the presence of the gravitational pull of its own data. In that way, a data warehouse purpose can become its own self-fulfilling prophecy.

Build it and they will come. Yes, that often works, but to what level of success? Each subject area of a data warehouse must have its own purpose. Without a purpose to act as cohesive glue, a subject area of a data warehouse will eventually begin to spin out of control. After only a few subject areas experience this form of orbital decay, a data warehouse is already at risk.

The purpose of this book is to suggest and explain two purposes that can increase the value and ROI of a data warehouse. Specifically, those two purposes are Market Basket Analysis and Time Variant Data. Many enterprises aspire to include Market Basket Analysis in their set of analytics. Some enterprises pay marketing and consulting firms to provide Market Basket Analysis. Unfortunately, the results delivered often fail to meet the expectations, and sometimes fail to deliver at all. The following chapters will explain the obstacles to Market Basket Analysis and a design solution that overcomes those obstacles.

Time Variance is one of the concepts included in the pioneering works of Inmon and Kimball. The idea is that a data warehouse will present historical data in its historical context. For example, transactions in 1995 will be presented in their 1995 context while transactions in 2011 will be presented in their 2011 context. While the concept of Time Variant Data is intuitively easy to understand, it is also difficult to deliver each and every transaction in its historical context. An individual query may span a period of weeks, months, or years, which would require it to include the Time Variant context for each of those transactions. The solution design for Time Variant Data overcomes the obstacles of Time Variance, allowing a data warehouse to deliver Time Variant Data with a nominal degradation in performance.

Finally, the last chapter will combine these two purposes into one delivery. As stated earlier, when you hit the bull’s-eye in the center, there is more bull’s-eye in every direction. The combination of Market Basket Analysis using Time Variant Data is an example of such an ancillary benefit. By delivering the purposes of Market Basket Analysis and Time Variant Data, a data warehouse can achieve another deliverable—Market Basket Analysis of Time Variant Data.

A data warehouse that includes those three deliverables (Market Basket Analysis, Time Variant Data, and Market Basket Analysis of Time Variant Data) is a data warehouse with purpose. A data warehouse that delivers those three purposes is a data warehouse with an ROI.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
3.16.1.195