Chapter 6. Warehousing Strategy

Define the data warehouse strategy as part of the information technology strategy of the enterprise. The traditional Information Strategy Plan (ISP) addresses operational computing needs thoroughly but may not give sufficient attention to decisional information requirements. A data warehouse strategy remedies this by focusing on the decisional needs of the enterprise.

We start this chapter by presenting the components of a Data Warehousing strategy. We follow with a discussion of the tasks required to define a strategy for your enterprise.

Strategy Components

At a minimum, the data warehouse strategy should include the following elements.

  • Preliminary data warehouse rollout plan. . Not all of the user requirements can be met in one data warehouse project—such a project would necessarily be large, and dangerously unmanageable. It is more realistic to prioritize the different user requirements and assign them to different warehouse rollouts. Doing so allows the enterprise to divide the warehouse development into phased, successive rollouts, where each rollout focuses on meeting an agreed set of requirements.

    The iterative nature of such an approach allows the warehousing team to extend the functionality of the warehouse in a manageable manner. The phased approach lowers the overall risk of the data warehouse project, while delivering increasing functionality to the users.

  • Preliminary data warehouse architecture. . Define the overall data warehouse architecture for the pilot and subsequent warehouse rollouts to ensure the scalability of the warehouse. Whenever possible, define the initial technical architecture of each rollout.

    By consciously thinking through the data warehouse architecture, warehouse planners can determine the various technology components (e.g., MDDB, RDBMS, tools) that are required for each rollout.

  • Short-listed data warehouse environment and tools. . There are a number of tools and warehousing environments from which to choose. Create a short-list for the tools and environments that appear to meet your warehousing needs. A standard set of tools will lessen tool integration problems and will minimize the learning required of both the warehousing team and the warehouse users.

Below are the tasks required to create the enterprise's warehousing strategy. Note that the tasks described below can typically be completed in three to five weeks, depending on the availability of resource persons and the size of the enterprise.

Determine Organizational Context

An understanding of the organization helps to establish the context of the project and may highlight aspects of the corporate culture that may ease or complicate the warehousing project. Answers to organizational background questions are typically obtained from the Project Sponsor, the CIO, or the Project Manager assigned to the warehousing effort.

Typical organizational background questions include:

  • Who is the Project Sponsor for this project? . The Project Sponsor sets the scope of the warehousing project. He or she also plays a crucial role in establishing the working relationship among warehousing team members, especially if third parties are involved. Note that easy access to warehousing data may also be limited to the organizational scope that is within the control or authority of the Project Sponsor.

  • What are the IS or IT groups in the organization? Which are involved in the data warehousing effort? .  Since data warehousing is very much a technology-based endeavor, the IS or IT groups within the organization will always be involved in any warehousing effort. It is often insightful to understand the bulk of the work currently performed within the IS or IT departments. If the IS or IT groups are often fighting fires or are very busy deploying operational systems, data warehousing is unlikely to be high on the list of IT priorities.

  • What are the roles and responsibilities of the individuals who have been assigned to this effort? . It is helpful to define the roles and responsibilities of the various individuals involved in the data warehousing project. This practice sets common, realistic expectations and improves understanding and communication within the team. In cases where the team is composed of external parties (especially where several vendors are involved), a clear definition of roles becomes critical.

Conduct Preliminary Survey of Requirements

Obtain an inventory of the requirements of business users through individual and group interviews with the end-user community. Whenever possible, obtain layouts of the current management reports (and their planned enhancements).

The requirements inventory represents the breadth of information that the warehouse is expected to eventually provide. Note that while it is important to get a clear picture of the extent of requirements, it is not necessary to detail all the requirements in depth at this point. The objective is to understand the user needs enough to prioritize the requirements. This is a critical input for identifying the scope of each data warehouse rollout.

Interview Categories and Sample Questions

The following questions, arranged by category, should be useful as a starting point for the interviews with intended end users of the warehouse.

  • Functions. . What is the mission of your group or unit? How do you go about fulfilling this mission? How do you know if you've been successful with your mission? What are the key performance indicators and critical success factors?

  • Customers. . How do you group or classify your customers? Do these groupings change over time? Does your grouping affect how you treat your customers? What kind of information do you track for each type of client? What demographic information do you use, if any? Do you need to track customer data for each customer?

  • Profit. . At what level do you measure profitability in your group? Per agent? Per customer? Per product? Per region? At what level of detail are costs and revenues tracked in your organization? How do you track costs and revenues now? What kind of profitability reports do you use or produce now?

  • Systems. . What systems do you use as part of your job? What systems are you aware of in other groups that contain information you require? What kind of manual data transformations do you have to perform when data are unavailable?

  • Time. . How many months or years of data do you need to track? Do you analyze performance across years? At what level of detail do you need to see figures? Daily? Weekly? Monthly? Quarterly? Yearly? Do you need month-to-date or year-to-date computations? For which figures? How soon do you need to see data (e.g., do you need yesterday's data today?) How soon after week-end, month-end, quarter-end, and year-end do you need to see the previous period's figures?

  • Queries and reports. . What reports do you use now? What information do you actually use in each of the reports you now receive? Can we obtain samples of these reports? How often are these reports produced? Do you get them soon enough, frequently enough? Who makes these reports for you? What reports do you produce for other people?

  • Product. . What products do you sell, and how do you classify them? Do you have a product hierarchy? Do you analyze data for all products at the same time, or do you analyze one product type at a time? How do you handle changes in product hierarchy and product description?

  • Geography. . Does your company operate in more than one location? Do you divide your market into geographical areas? Do you track sales per geographic region?

Interviewing Tips

Many of the interviewing tips enumerated below may seem like common sense. Nevertheless, interviewers are encouraged to keep the following points in mind:

  • Avoid making commitments about warehouse scope. . It will not be surprising to find that some of the queries and reports requested by interviewees cannot be supported by the data that currently reside in the operational databases. Interviewers should keep this in mind and communicate this potential limitation to their interviewees. The interviewers cannot afford to make commitments regarding the warehouse scope at this time.

  • Keep the interview objective in mind. . The objective of these interviews is to create an inventory of requirements. There is no need to get a detailed understanding of the requirements at this point.

  • Don't overwhelm the interviewees. . The interviewing team should be small; two people are the ideal number—one to ask questions, another to take notes. Interviewees may be intimidated if a large group of interviewers shows up.

  • Record the session if the interviewee lets you. . Most interviewees will not mind if interviewers bring along a tape recorder to record the session. Transcripts of the session may later prove helpful.

  • Change the interviewing style depending on the interviewee. . Middle-managers are more likely to deal with actual reports and detailed information requirements. Senior executives are more likely to dwell on strategic information needs. Change the interviewing style as needed by adapting the type of questions to the type of interviewee.

  • Listen carefully. . Listen to what the interviewee has to say. The sample interview questions are merely a starting point—follow-up questions have the potential of yielding interesting and critical information. Take note of the terms that the interviewee uses. Popular business terms such as "profit" may have different meanings or connotations within the enterprise.

  • Obtain copies of reports, whenever possible. . The reports will give the warehouse team valuable information about source systems (which system produced the report), as well as business rules and terms. If a person manually makes the reports, the team may benefit from talking to this person.

Conduct Preliminary Source System Audit

Obtain an inventory of potential warehouse data sources through individual and group interviews with key personnel in the IT organization. While the CIO no doubt has a broad, high-level view of the systems in the enterprise, the best resource persons for the source system audit are the DBAs and system administrators who maintain the operational systems.

Typical background interview questions, arranged by categories, for the IT department include:

  • Current architecture. . What is the current technology architecture of the organization? What kind of systems, hardware, DBMS, network, end-user tools, development tools, and data access tools are currently in use?

  • Source system relationships. . Are the source systems related in any way? Does one system provide information to another? Are the systems integrated in any manner? In cases where multiple systems each have customer and product records, which one serves as the "master" copy?

  • Network facilities. . Is it possible to use a single terminal or PC to access the different operational systems, from all locations?

  • Data quality. . How much cleaning, scrubbing, deduplication, and integration do you suppose will be required? What areas (tables or fields) in the source systems are currently known to have poor data quality?

  • Documentation. . How much documentation is available for the source systems? How accurate and up-to-date are these manuals and reference materials? Try to obtain the following information whenever possible: copies of manuals and reference documents, database size, batch window, planned enhancements, typical backup size, backup scope and backup medium, data scope of the system (e.g., important tables and fields), system codes and their meanings, and keys generation schemes.

  • Possible extraction mechanisms. . What extraction mechanisms are possible with this system? What extraction mechanisms have you used before with this system? What extraction mechanisms will not work?

Identify External Data Sources (If Applicable)

The enterprise may also make use of external data sources to augment the data from internal source systems. Examples of external data that can be used are:

  • Data from credit agencies

  • Zip code or mail code data

  • Statistical or census data

  • Data from industry organizations

  • Data from publications and news agencies

Although the use of external data presents opportunities for enriching the data warehouse, it may also present difficulties because of differences in granularity. For example, the external data may not be readily available at the level of detail required by the data warehouse and may require some transformation or summarization.

Verify assumptions about the external databases before planning to use these as data sources in warehousing projects.

Define Warehouse Roolouts (Phased Implementation)

Divide the data warehouse development into phased, successive rollouts. Note that the scope of each rollout will have to be finalized as part of the planning for that rollout. The availability and quality of source data will play a critical role in finalizing that scope.

As stated earlier, applying a phased approach for delivering the warehouse should lower the overall risk of the data warehouse project while delivering increasing functionality and data to more users. It also helps manage user expectations through the clear definition of scope for each rollout.

Figure 6-1 is a sample table listing all requirements identified during the initial round of interviews with end users. Each requirement is assigned a priority level. An initial complexity assessment is made, based on the estimated number of source systems, early data quality assessments, and the computing environments of the source systems. The intended user group is also identified.

Sample Rollout Definition

Figure 6-1. Sample Rollout Definition

More factors can be listed to help determine the appropriate rollout number for each requirement. The rollout definition is finalized only when it has been approved by the Project Sponsor.

Define Preliminary Data Warehouse Architecture

Define the preliminary architecture of each rollout based on the approved rollout scope. Explore the possibility of using a mix of relational and multidimensional databases and tools, as illustrated in Figure 6-2.

At a minimum, the preliminary architecture should indicate the following:

Sample Preliminary Architecture per Rollout

Figure 6-2. Sample Preliminary Architecture per Rollout

  • Data warehouses and data mart. . Define the intended deployment of data warehouses and data marts for each rollout. Indicate how the different databases are related (i.e., how the databases feed one another). The warehouse architecture must ensure that the different data marts are not deployed in isolation.

  • Number of users. . Specify the intended number of users for each data access and retrieval tool (or front-end) for each rollout.

  • Location. . Specify the location of the data warehouse, the data marts, and the intended users for each rollout. This has implications on the technical architecture requirements of the warehousing project.

Evaluate Development and Production Environment and Tools

Enterprises can choose from several environments and tools for the data warehouse initiative. Select the combination of tools that best meets the needs of the enterprise. At present, no single vendor provides an integrated suite of warehousing tools. There are, however, clear leaders for each tool category.

Sample Tool Short-List

Figure 6-3. Sample Tool Short-List

Eliminate all unsuitable tools, and produce a short-list from which each rollout or project will choose its tool set (see Figure 6-3). Alternatively, select and standardize on a set of tools for all warehouse rollouts.

In Summary

A data warehouse strategy at a minimum contains:

  • la preliminary data warehouse rollout plan, which indicates how the development of the warehouse is to be phased;

  • la preliminary data warehouse architecture, which indicates the likely physical implementation of the warehouse rollouts; and

  • lshort-listed options for the warehouse environment and tools.

The approach for arriving at these strategy components may vary from one enterprise to another; the approach presented in this chapter is one that has consistently proven to be effective.

Expect the data warehousing strategy to be updated annually each warehouse rollout provides new learning and as new tools and technologies become available.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
18.218.254.122