Chapter 7. Warehouse Management and Support Processes

Warehouse management and support processes address the aspects of planning and managing a data warehouse project that are critical to the successful implementation and subsequent extension of the data warehouse. Unfortunately, these aspects are all too often overlooked in initial warehousing deployments.

These processes are defined to assist the project manager and warehouse driver during warehouse development projects.

Define Issue Tracking and Resolution Process

During the course of a project, it is inevitable that a number of business and technical issues will surface. The project will quickly be delayed by unresolved issues if an issue tracking and resolution process is not in place. Of particular importance are business issues that involve more than one group of users. These issues typically include disputes over the definition of business terms and the financial formulas that govern the transformation of data.

An individual on the project team should be designated to track and follow up on the resolution of each issue as it arises. Extremely urgent issues (i.e., issues that may cause project delays if left unresolved) and issues with strong political overtones should be brought to the attention of the Project Sponsor, who can use his or her influence to expedite the resolution process.

Figure 7-1 shows a sample issue log that tracks all the issues that arise during the course of the project.

The following issue tracking guidelines will prove helpful:

  • Issue description. State the issue briefly in two to three sentences. Provide a more detailed description of the issue as a separate paragraph. If there are possible resolutions to the issue, include these in the issue description. Identify the consequences of leaving this issue open, particularly any impact on the project schedule.

  • Urgency. Indicate the priority level of the issue: high, medium, or low. Low-priority issues that are left unresolved may later become high priority. The team may agree on a target resolution time for each urgency level. For example, the team can agree to resolve high-priority issues within three days, medium-priority issues within a week, and low-priority issues within two weeks.

  • Raised by. Identify the person who raised the issue. If the team is large or does not meet on a regular basis, provide information on how to contact the person (e.g., telephone number, e-mail address). The people who are resolving the issue may require additional information or details that only the issue originator can provide.

    Figure 7-1. Sample Issue Log

  • Assigned to. Identify the person on the team who is responsible for resolving the issue. Note that this person does not necessarily have the answer. However, he or she is responsible for tracking down the person who can actually resolve the issue. He or she also follows up on issues that have been left unresolved.

  • Date opened. This is the date when the issue was first logged.

  • Date closed. This is the date when the issue was finally resolved.

  • Resolved by. Identify the person who resolved the issue. Note that this person must have the required authority within the organization to resolve issues. User representatives typically resolve business issues. The CIO or a designated representative typically resolves technical issues. The Project Sponsor typically resolves issues related to project scope.

  • Resolution description. State the resolution of this issue briefly in two or three sentences. Provide a more detailed description of the resolution in a separate paragraph. If subsequent actions are required to implement the resolution, state these clearly and assign resources to implement them. Identify target dates for implementation.

Issue logs formalize the issue resolution process. They also serve as a formal record of key decisions made throughout the project.

In some cases, the team may opt to supplement the log with a separate form for each issue. This typically happens when the issue descriptions and resolution descriptions are quite long. In this case, only the brief issue statement and the brief resolution description are recorded in the issue log.
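
The issue log fields described above map naturally onto a small record structure. The following Python sketch is illustrative only: the class, field, and function names are hypothetical, and the resolution targets simply reuse the example figures given under Urgency (three days, one week, two weeks).

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import List, Optional

# Example targets from the Urgency guideline; a real team agrees on its own figures.
RESOLUTION_TARGET_DAYS = {"high": 3, "medium": 7, "low": 14}


@dataclass
class Issue:
    """One row of the issue log (field names are illustrative)."""
    description: str
    urgency: str                      # "high", "medium", or "low"
    raised_by: str
    assigned_to: str
    date_opened: date
    date_closed: Optional[date] = None
    resolved_by: Optional[str] = None
    resolution: Optional[str] = None

    def due_date(self) -> date:
        """Date by which the issue should be resolved under the agreed targets."""
        return self.date_opened + timedelta(days=RESOLUTION_TARGET_DAYS[self.urgency])

    def is_overdue(self, today: date) -> bool:
        """An issue is overdue if it is still open after its due date."""
        return self.date_closed is None and today > self.due_date()


def overdue_issues(log: List[Issue], today: date) -> List[Issue]:
    """Return open issues past their target date (candidates for escalation)."""
    return [issue for issue in log if issue.is_overdue(today)]
```

A list of overdue issues produced this way could serve as the agenda for follow-up by the person assigned to track issues, or for escalation to the Project Sponsor.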

Perform Capacity Planning

Warehouse capacity requirements come in the following forms: space required, machine processing power, network bandwidth, and number of concurrent users. These requirements increase with each rollout of the data warehouse.

During the stage of defining the warehouse strategy, the team will not have the exact information for these requirements. However, as the warehouse rollout scopes are finalized, the capacity requirements will likewise become more defined.

Review the following capacity planning requirements against the scope of each rollout.

Space Requirements. Space requirements are determined by the following:

  • schema design, expected volume, and expected growth rate;

  • indexing strategy used;

  • backup and recovery strategy;

  • aggregation strategy;

  • staging and deduplication area required; and

  • metadata space requirements.
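
As a rough illustration of how these factors combine, the sketch below estimates the space needed for one rollout. Every parameter name and every overhead ratio is an assumption made for the example; actual ratios depend on the database, the schema, and the strategies chosen, and should be measured rather than guessed.

```python
def estimate_space_gb(
    rows_initial: int,
    avg_row_bytes: int,
    monthly_growth_rate: float,
    months: int,
    index_overhead: float,
    aggregate_overhead: float,
    staging_overhead: float,
    metadata_gb: float,
    backup_copies: int,
) -> float:
    """Rough space estimate in GB; overheads are fractions of base data size."""
    # Base data volume after compound growth over the planning horizon.
    rows = rows_initial * (1 + monthly_growth_rate) ** months
    base_gb = rows * avg_row_bytes / 1024 ** 3

    # Indexes, aggregates, and staging areas are modeled as multiples of base data.
    online_gb = base_gb * (1 + index_overhead + aggregate_overhead + staging_overhead)
    online_gb += metadata_gb

    # Each retained backup is assumed to be a full copy of the online data.
    return online_gb * (1 + backup_copies)
```

Recomputing the estimate as each rollout scope is finalized shows how the requirement grows from one rollout to the next.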

Machine Processing Power. MPP (massively parallel processing) and SMP (symmetric multiprocessing) machines are the ideal hardware platforms for data warehousing. Choose a configuration that is scalable and that meets the minimum processing requirements.

Network Bandwidth. Network bandwidth must not become a bottleneck that slows down warehouse extraction or warehouse performance. Verify all assumptions about the network bandwidth before proceeding with each rollout.

Define Warehouse Purging Rules

Purging rules specify when data are to be removed from the data warehouse. Keep in mind that most companies are interested only in tracking their performance over the last three to five years. In cases where a longer retention period is required, the end users will quite likely require only high-level summaries for comparison purposes. They will not be as interested in the detailed or atomic data.

Define the mechanisms for archiving or removing older data from the data warehouse. Check for any legal, regulatory, or auditing requirements that may warrant the storage of data in other media prior to actual purging from the warehouse. Acquire the software and devices that are required for archiving.
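
A purging rule can be expressed as a simple function of record age. The sketch below assumes, purely for illustration, a five-year retention period for detailed data and a longer period for high-level summaries; the names and thresholds are hypothetical, and any legal, regulatory, or auditing requirement takes precedence over them.

```python
from datetime import date

DETAIL_RETENTION_YEARS = 5    # assumed; the text cites three to five years
SUMMARY_RETENTION_YEARS = 10  # assumed longer period for high-level summaries


def purge_action(record_date: date, is_summary: bool, today: date) -> str:
    """Return "keep" or "archive_then_purge" for a record with the given date."""
    limit = SUMMARY_RETENTION_YEARS if is_summary else DETAIL_RETENTION_YEARS
    # The cutoff is the same calendar day `limit` years ago;
    # February 29 falls back to February 28 in non-leap years.
    try:
        cutoff = today.replace(year=today.year - limit)
    except ValueError:
        cutoff = today.replace(year=today.year - limit, day=28)
    return "archive_then_purge" if record_date < cutoff else "keep"
```

The result is named `archive_then_purge` rather than `purge` to reflect the guidance that older data may need to be stored on other media before it is removed.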

Define Security Measures

Keep the data warehouse secure to prevent the loss of competitive information either to unforeseen disasters or to unauthorized users. Define the security measures for the data warehouse, taking into consideration both physical security (i.e., where the data warehouse is physically located) and user-access security.

Additional precautions are required if either the warehouse data or warehouse reports are available to users through an intranet or over the public Internet infrastructure.

Define Backup and Recovery Strategy

Define the backup and recovery strategy for the warehouse, taking into consideration the following factors:

  • Data to be backed up. Identify the data that must be backed up on a regular basis. This gives an indication of the regular backup size. Aside from warehouse data and metadata, the team might also want to back up the contents of the staging or deduplication areas of the warehouse.

  • Batch window of the warehouse. Mechanisms that back up data while the system is online are now available, although they are expensive. If the warehouse does not need to be online 24 hours a day, 7 days a week, determine the maximum allowable downtime for the warehouse (i.e., determine its batch window). Part of that batch window is allocated to the regular warehouse load and, possibly, to report generation and other similar batch jobs. Determine the maximum time period available for regular backups and backup verification.

  • Maximum acceptable time for recovery. In case of disasters that result in the loss of warehouse data, the backups will have to be restored as quickly as possible. Different backup mechanisms imply different time frames for recovery. Determine the maximum acceptable length of time for the warehouse data and metadata to be restored, quality assured, and brought online.

  • Acceptable costs for backup and recovery. Different backup mechanisms imply different costs. The enterprise may have budgetary constraints that limit its backup and recovery options.
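
The batch window arithmetic described above can be sketched as follows. The function and parameter names are hypothetical, and the throughput figure in the usage note is invented for illustration; measure the actual rate of the chosen backup mechanism.

```python
def backup_fits_window(
    batch_window_hours: float,
    load_hours: float,
    other_batch_hours: float,
    backup_size_gb: float,
    backup_rate_gb_per_hour: float,
    verify_hours: float,
) -> bool:
    """Check whether backup plus verification fits in the rest of the batch window."""
    # Time left after the regular warehouse load and other batch jobs.
    available = batch_window_hours - load_hours - other_batch_hours
    # Time needed to write the backup and then verify it.
    required = backup_size_gb / backup_rate_gb_per_hour + verify_hours
    return required <= available
```

For example, with an 8-hour window, 3 hours of load, 1 hour of report generation, a 300 GB backup at an assumed 100 GB per hour, and 1 hour of verification, the backup just fits. At 400 GB it no longer does, which would point toward incremental backups, parallel streams, or a longer window.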

Also consider the following when selecting the backup mechanism:

  • Archive format. Use a standard archiving format to eliminate potential recovery problems.

  • Automatic backup devices. Without these, the backup media (e.g., tapes) will have to be changed by hand each time the warehouse is backed up.

  • Parallel data streams. Commercially available backup and recovery systems now support the backup and recovery of databases through parallel streams of data into and from multiple removable storage devices. This technology is especially helpful for the large databases typically found in data warehouse implementations.

  • Incremental backups. Some backup and recovery systems also support incremental backups to reduce the time required to back up daily. Incremental backups archive only new and updated data.

  • Offsite backups. Remember to maintain offsite backups to prevent the loss of data due to site disasters such as fires.

  • Backup and recovery procedures. Formally define and document the backup and recovery procedures. Perform recovery practice runs to ensure that the procedures are clearly understood.
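
The idea behind incremental backups, archiving only new and updated data, can be illustrated with a minimal sketch. It assumes a hypothetical mapping from object name to last-modified time; commercial backup systems detect changes in their own ways, so this shows the concept only, not any product's method.

```python
from datetime import datetime
from typing import Dict, List


def changed_since(last_modified: Dict[str, datetime], last_backup: datetime) -> List[str]:
    """Return the names of objects created or updated after the last backup."""
    return sorted(name for name, ts in last_modified.items() if ts > last_backup)
```

Only the objects returned need to be written to the incremental backup; everything else is already covered by an earlier backup.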

Set Up Collection of Warehouse Usage Statistics

Warehouse usage statistics are collected to provide the data warehouse designer with inputs for further refining the data warehouse design and to track general usage and acceptance of the warehouse.

Define the mechanism for collecting these statistics, and assign resources to monitor and review these regularly.
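
One possible collection mechanism is to summarize a query log. The sketch below assumes a hypothetical log of (user, table) pairs; the function names and the log format are illustrative and do not correspond to any particular database's audit facility.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def summarize_usage(query_log: Iterable[Tuple[str, str]]) -> Tuple[Counter, Counter]:
    """Count queries per user and per table from (user, table) log entries."""
    by_user: Counter = Counter()
    by_table: Counter = Counter()
    for user, table in query_log:
        by_user[user] += 1
        by_table[table] += 1
    return by_user, by_table


def unused_tables(all_tables: Iterable[str], by_table: Counter) -> List[str]:
    """Tables never queried in the period (candidates for design review)."""
    return sorted(t for t in all_tables if by_table[t] == 0)
```

Per-user counts give a rough measure of acceptance, while per-table counts and the list of unused tables give the warehouse designer input for refining the design.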

In Summary

The capacity planning process and the issue tracking and resolution process are critical to the successful development and deployment of data warehouses, especially during early implementations.

The other management and support processes become increasingly important as the warehousing initiative progresses further.
