Chapter 23 Administering the DW 2.0 environment
The DW 2.0 environment is complex and is built over a long period of time. It touches many parts of the corporation: day-to-day operations, management, tactical decisions, strategic decisions, and even the boardroom. It also has many facets: technical, business, legal, engineering, human resources, and so forth. As such, the DW 2.0 environment is a long-term concern that requires careful management and administration.
This chapter will touch on some of the many issues of management and administration of the DW 2.0 environment.
At the intellectual heart of the DW 2.0 environment is the data model. The data model is the description of how technology meets business. The data model is used to direct the efforts of many different developers over a long period of time. When the data model is used properly, one piece of development fits with another piece of development like a giant jigsaw puzzle that is built over a period of years. Stated differently, without a data model, coordinating the long-term development of many projects within the DW 2.0 environment with many different people is an almost impossible task.
The data model is built at different levels—high, middle, and low. The first step (and one of the most difficult) is the definition of the scope of integration for the data model. The reason the scope of integration is so difficult to define is that it never stays still. The scope is constantly changing. And each change affects the data model.
When the scope changes too often and too fast, the organization is afflicted with “scope creep.”
Over time, the high-level data model needs the least amount of maintenance. The long-term changes that the organization experiences have the most effect on the midlevel model and the low-level model. At the middle level of the data model, over time the keys change, relationships of data change, domains of data change, definitions of data change, attributes change, and occasionally even the grouping of attributes changes. And as each of these changes occurs, the underlying related physical data base design also changes.
Part of the job of the administration of the data model is to ensure that it is being followed with each new development and with each new modification of the data warehouse. The biggest challenges here are to ensure that
It is important to note that groupings of attributes and keys/foreign keys are very important for compliance, but other aspects of the data model may be less important.
In addition, it is not necessary for the derivatives of primitive data to be in compliance with the data model.
In addition to an administrative organization that checks new development for compliance with the data model, it is necessary to have a general architectural organization that administers the DW 2.0 architecture. The architectural administration tends to the long-term oversight of the architecture. Some of the concerns of the architectural administration are the following.
In most environments it is not necessary to build the archival environment immediately. Usually some period of time passes before it is necessary to build the archival environment. The architectural administration provides the guidance for when and how the archival environment is to be built. The architectural administration determines many aspects of the archival environment, such as
If the Near Line Sector is needed, the architectural administration determines such important parameters as when data will be moved into the Near Line Sector, when it will be moved back into the Integrated Sector, and when it will be moved to the Archival Sector; what metadata will be stored; what platform the Near Line Sector will reside on; and so forth. Over time the need for a Near Line Sector can change. At the point of initial design it may be apparent that there is no need for a Near Line Sector. But over time the factors that shape the need may change. Therefore, there may come a day when the Near Line Sector is needed. It is the job of the architectural administration to make that decision. Some of the decisions made by the architectural administrator include
Another sector of the DW 2.0 environment that is of concern to the architectural administrator is the Interactive Sector. Some organizations have an interactive environment; others do not. The architectural administrator addresses such issues as the following:
Another task of the architectural administrator is that of making sure there is never a flow of data from one data mart to another. Should the administrator find that such a flow occurs, he/she redirects the flow from a data mart to the DW 2.0 environment and then back to the data mart receiving the data.
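The check described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: that data flows are recorded somewhere as (source, target) pairs, and that the set of data marts is known. The names find_mart_to_mart_flows, reroute_through_warehouse, and the sample sector and mart names are hypothetical and do not come from any DW 2.0 tooling.

```python
def find_mart_to_mart_flows(flows, marts):
    """Return every (source, target) flow whose two ends are both data marts."""
    mart_set = set(marts)
    return [(src, dst) for src, dst in flows if src in mart_set and dst in mart_set]


def reroute_through_warehouse(flow, warehouse="integrated_sector"):
    """Replace one mart-to-mart flow with two flows that pass through the warehouse.

    The default warehouse name is a placeholder for this example.
    """
    src, dst = flow
    return [(src, warehouse), (warehouse, dst)]


# Hypothetical flow inventory: one flow runs directly between two marts.
flows = [
    ("integrated_sector", "sales_mart"),
    ("sales_mart", "finance_mart"),
    ("integrated_sector", "finance_mart"),
]
violations = find_mart_to_mart_flows(flows, ["sales_mart", "finance_mart"])
rerouted = [reroute_through_warehouse(flow) for flow in violations]
```

In practice the flow inventory would most likely come from metadata or ETL job definitions rather than a hand-written list.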
Yet another ongoing task of the architectural administrator is to ensure that monitoring is occurring properly and that the results of the monitoring are properly interpreted. Various kinds of monitoring activity need to occur in the DW 2.0 environment. There is the monitoring of transactions and response time in the interactive environment and there is the monitoring of data and its usage in the other parts of the DW 2.0 environment.
Some of the questions that should be considered in monitoring the DW 2.0 environment follow:
One of the most important determinations made as a result of monitoring the usage of the Integrated Sector is whether it is time to build a new data mart. The administrator looks for repeated patterns of usage of the Integrated Sector. When enough requests for data structured in the same way appear, it is a clue that a data mart is needed.
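One possible way to look for such repeated patterns is sketched below. It assumes that the usage monitor can produce a log in which each request lists the tables it read and the columns it grouped by; that log format, the function names, and the threshold of three requests are all assumptions made for the example, not part of DW 2.0.

```python
from collections import Counter


def query_signature(tables, group_by):
    """Reduce a request to its shape: which tables it reads and how it groups."""
    return (frozenset(tables), frozenset(group_by))


def data_mart_candidates(query_log, min_requests=3):
    """Return (signature, count) pairs for shapes requested at least min_requests times."""
    counts = Counter(
        query_signature(q["tables"], q["group_by"]) for q in query_log
    )
    return [(sig, n) for sig, n in counts.most_common() if n >= min_requests]


# Hypothetical usage log taken from monitoring the Integrated Sector.
query_log = [
    {"tables": ["sales", "customer"], "group_by": ["region"]},
    {"tables": ["customer", "sales"], "group_by": ["region"]},
    {"tables": ["sales", "customer"], "group_by": ["region"]},
    {"tables": ["shipment"], "group_by": ["carrier"]},
]
candidates = data_mart_candidates(query_log)
```

A recurring signature is only a clue. The administrator still has to judge whether the pattern is stable enough to justify a new data mart.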
These then are some of the activities of the architectural administration of the DW 2.0 environment. But there are other aspects of the DW 2.0 environment that need administration as well.
It should go without saying that one of the skills the architectural administrator needs to have is that of understanding architecture. It is futile for an individual to try to act as an architectural administrator without understanding what is meant by architecture and what the considerations of architecture are.
Another important part of architectural administration is that of managing the ETL processes found in DW 2.0. The first kind of ETL process found in DW 2.0 is the classical integration of data from application sources. The issues that need to be monitored here include the traffic of data that flows through the ETL process, the accuracy of the transformations, the availability of those transformations to the analytical community, the speed and ease with which the transformations are made, and so forth. The second kind of ETL process is textual transformation, by which unstructured data is transformed and placed into the data warehouses contained in DW 2.0. The issues of administration here include the volume of data that arrives in the DW 2.0 environment, the integration algorithms that are used, the type of data that is placed in the DW 2.0 environment, and so forth. Note that the two forms of ETL transformation are entirely different from each other.
One of the most important aspects of the DW 2.0 environment is that of metadata. For a variety of reasons the administration of metadata is its own separate subject. Some of those reasons are:
And there probably are more reasons metadata management is a sensitive issue.
The problem is that it is metadata that is needed for the many different parts of the DW 2.0 environment to be meaningfully held together. Stated differently, without a cohesive metadata infrastructure, the many different parts of DW 2.0 have no way of coordinating efforts and work.
There are many aspects to metadata administration. Some of these aspects include
In addition to these considerations, the metadata administrator makes such important decisions as
One of the issues of metadata is its ephemeral nature. Unlike structured data, metadata comes in many forms and structures. It simply is not as stable or as malleable as other forms of data.
Another major issue relating to metadata is that of the different forms of metadata. There are two basic forms of metadata. Those forms are
As a rule technical metadata is much easier to recognize and capture than business metadata. The reason technical metadata is easier to find and capture is that it has long been understood. The truth is that business metadata has long been a part of the information landscape, but business metadata has not been formally addressed—by vendors, by products, by technology. Therefore, it is much easier to find and address technical metadata than it is to find and address business metadata.
Another essential aspect of the DW 2.0 environment is that of data base administration. In data base administration the day-to-day care and tending of data bases is done. Data base administration is a technical job. The data base administrator needs to know such things as how to restore a data base, how to recover lost transactions, how to determine when a transaction has been lost, how to bring a data base back up when the data base goes down, and so forth.
In short, when something malfunctions with a data base, it is the data base administrator who is charged with getting the data base back up and running.
One of the challenges of data base administration is the sheer number of data base administration activities that are required in the DW 2.0 environment. There are so many data bases and tables, all of them important, that the data base administrator cannot devote large amounts of time to any one of them. Therefore, the administrator needs tools that can look over the many aspects of the many data bases and tables that make up the DW 2.0 environment.
Some of the considerations of data base administration in the DW 2.0 environment are
As a rule the data base administration job is a 24/7 job. Someone from data base administration is on call all the time to be able to advise computer operations as to what to do when a problem arises. Especially in the case of the interactive environment, when a data base problem occurs, the data base administrator needs to be as proactive as possible, because malfunctions and down time equal dissatisfaction with the environment. But being proactive is difficult because the vast majority of the tasks facing the data base administrator are reactive.
In recent years, with governance and compliance becoming large issues, the role of stewardship has become an important topic. In years past it was sufficient simply to get data into and through the system. In today’s world, the quality and accuracy of the data have become important.
It is in this framework that stewardship has been elevated into a position of recognized responsibility.
Stewardship entails the following:
To differentiate between the role of data base administrator and that of data steward, consider the following. When a data base has “gone down” and is unavailable to the system, the data base administrator is called. When performance suffers and there is a general system slowdown, the data base administrator is called. When there is an incorrect value in a record that the end user has encountered, the data steward is called. And when it comes time to create a new data base design, and the sources of data and their transformation are considered, the data steward is called.
There are, then, related yet different sets of activities that the data base administrator and the data steward have.
As a rule the data base administrator is a technician and the data steward is a business person. Trying to make the job of the data steward technical is usually a mistake.
Some of the aspects of the data steward’s working life include
As a rule a large corporation will have more than one data steward. There are usually many business people who act as data stewards. Each primitive data element should have exactly one data steward at any moment in time. If a data element has no data steward, or has more than one data steward, then there is an issue.
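The one-steward rule lends itself to a simple automated check. The sketch below assumes that steward assignments are available as (element, steward) pairs and that the full list of primitive data elements is known; the function name and the sample element and steward names are hypothetical.

```python
def stewardship_issues(assignments, elements):
    """Return (unowned, contested) lists of data elements.

    unowned   -- elements with no steward assigned
    contested -- elements with more than one steward assigned
    """
    stewards = {element: set() for element in elements}
    for element, steward in assignments:
        # Elements that appear only in the assignments are still checked.
        stewards.setdefault(element, set()).add(steward)
    unowned = sorted(e for e, s in stewards.items() if not s)
    contested = sorted(e for e, s in stewards.items() if len(s) > 1)
    return unowned, contested


# Hypothetical assignments: customer_id has two stewards, ship_date has none.
elements = ["customer_id", "order_total", "ship_date"]
assignments = [
    ("customer_id", "marketing"),
    ("customer_id", "sales"),
    ("order_total", "finance"),
]
unowned, contested = stewardship_issues(assignments, elements)
```

Either list being nonempty signals the issue described above.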
An integral part of the DW 2.0 environment is systems and technology. In its final format the world of DW 2.0 resides on one or more platforms. Because of the diverse nature of the data, processes, and requirements that are served by the different parts of the DW 2.0 environment, it is very unusual to have a single platform serve the entire DW 2.0 environment. Instead, many different technologies and many different platforms need to be blended together to satisfy the needs of DW 2.0 processing.
In one place DW 2.0 requires high performance. In another place DW 2.0 focuses on integration of data. In other places DW 2.0 mandates the storage of data for a long period of time. And in yet other places DW 2.0 caters to the needs of the analytical end user. In brief, there are many different kinds of measurement that determine the success of the DW 2.0 environment in different places.
These needs are very different, and it is not surprising that no single platform or technology meets all of them at once.
Therefore, the technical and systems administrator of the DW 2.0 environment wears a lot of hats. Some of the aspects of the job of the technical administrator are
An important aspect of the technical administrator’s job is capacity planning. In many ways the job of the technical administrator is like that of the data base administrator. The technician operates in many cases in a reactive mode. And no person likes to be constantly bombarded with the need to have everything done yesterday. And yet that is exactly the world in which the technician and the data base administrator often find themselves.
One of the most important ways that the technician can get out of being in a reactive mode is to do proper capacity planning. Not all errors and problems in the DW 2.0 environment are related to capacity, but many are. When there is adequate capacity, the system flows normally. But when a system starts to reach the end of its capacity, it starts to fall apart, in many different manifestations.
The sorts of capacity and related measurements that the technician pays attention to in the DW 2.0 environment include
By looking at these various measurements, the technician can preempt many problems before they occur.
Other important measurements include the growth of dormant data in the Integrated Sector, the growth of near-line storage, the growth of archival storage, the measurement of the probability of access of data throughout the environment, network bottlenecks, and so forth. In short, any place the technician can preempt a critical shortage, the better.
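As a rough illustration of preempting a shortage, the sketch below estimates how many measurement periods remain before a resource fills up. It assumes usage is sampled at regular intervals and grows roughly linearly, which is a simplification; real growth is often uneven. The function name and the sample figures are invented for the example.

```python
import math


def periods_until_full(usage_history, capacity):
    """Estimate how many more periods pass before usage reaches capacity.

    Uses the average growth per period over the whole history.
    Returns 0 if capacity is already reached and None if usage is not growing.
    """
    if len(usage_history) < 2:
        raise ValueError("need at least two measurements to estimate growth")
    current = usage_history[-1]
    if current >= capacity:
        return 0
    growth = (current - usage_history[0]) / (len(usage_history) - 1)
    if growth <= 0:
        return None
    return math.ceil((capacity - current) / growth)


# Hypothetical monthly disk usage in gigabytes against a 200 GB limit.
months_left = periods_until_full([100, 120, 140, 160], capacity=200)
```

The same calculation could be applied to any of the measurements mentioned above, such as near-line or archival storage growth.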
One of the most important jobs of management of the DW 2.0 environment is that of managing end-user relationships and expectations. If management ignores this aspect of the DW 2.0 environment, then management is at risk. Some of the ways in which the end-user expectations are managed include
Another important element of the management of end-user relationships is the establishment of an SLA, or service level agreement. The SLA is measured throughout the day-to-day processing that occurs in the DW 2.0 environment. The SLA provides a quantifiable and open record of system performance. The establishment of the SLA benefits both the end user and the technician. As a rule the SLA addresses both online performance and availability. In addition, the SLA for the analytical environment is very different from the SLA for the transactional environment.
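A minimal sketch of measuring the two things an SLA typically addresses, availability and online performance, follows. The targets shown (99 percent availability, 90 percent of responses within 3 seconds) are placeholders chosen for the example, not recommended values, and the function names are hypothetical.

```python
def availability_pct(uptime_minutes, scheduled_minutes):
    """Percentage of scheduled time during which the system was available."""
    if scheduled_minutes <= 0:
        raise ValueError("scheduled_minutes must be positive")
    return 100.0 * uptime_minutes / scheduled_minutes


def pct_within_target(response_times, target_seconds):
    """Percentage of requests answered within the target response time."""
    if not response_times:
        raise ValueError("no response times recorded")
    within = sum(1 for t in response_times if t <= target_seconds)
    return 100.0 * within / len(response_times)


def sla_met(uptime_minutes, scheduled_minutes, response_times,
            min_availability=99.0, target_seconds=3.0, min_within=90.0):
    """Check both SLA dimensions against placeholder thresholds."""
    return (
        availability_pct(uptime_minutes, scheduled_minutes) >= min_availability
        and pct_within_target(response_times, target_seconds) >= min_within
    )


# Hypothetical day: 1368 of 1440 scheduled minutes up, four sampled responses.
day_availability = availability_pct(1368, 1440)
day_within = pct_within_target([1.0, 2.0, 3.0, 10.0], 3.0)
day_ok = sla_met(1368, 1440, [1.0, 2.0, 3.0, 10.0])
```

Because the analytical and transactional environments have very different SLAs, each would be checked against its own thresholds.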
In the case in which there is statistical processing to be done in the DW 2.0 environment, the technician must carefully monitor the full impact of the statistical processing on resource utilization. There is a point at which a separate facility must be built for research statistical analysis.
Sitting over all of these administrative activities is management. It is management’s job to make sure that the goals and objectives of the DW 2.0 environment are being met. And there are many aspects to the management of the DW 2.0 environment. Some of the more important aspects are the following.
The buck stops at the manager’s office when it comes to prioritization. It is almost mandatory that certain parts of the organization want changes and additions to DW 2.0 at the same time that other parts also want changes and additions. It is the job of the manager to resolve (or at least ameliorate) the conflicts. Typical considerations include
The manager must juggle all of these considerations when determining the order of additions and adjustments to the DW 2.0 environment. But there are other considerations when managing the DW 2.0 environment.
The primary way in which management influences the organization is through budget. The projects that receive funding continue and flourish; the projects that do not receive funding do not. Some budgetary decisions are long term and some are short term. Nearly everything that is done in the DW 2.0 environment is done iteratively. This means that management has the opportunity to make mid-term and short-term corrections as a normal part of the budgeting process.
One of the most important parts of management is the setting of milestones and schedules. Usually management does not create the original schedules and milestones. Instead management has the projects that are being managed propose the schedules and milestones. Then management approves those milestones that are acceptable. Because nearly all aspects of DW 2.0 are constructed in an iterative manner, management has ample opportunity to influence the corporate schedule.
The manager chooses who gets to lead projects. There is an art to selecting leadership. One school of thought says that when a project is in trouble, more resources need to be added. Unfortunately this sends the wrong message to the organization: one sure way to get more resources is to get a project in trouble. Another approach is to remove the project leader of any project that gets in trouble. Unfortunately there often are legitimate circumstances that cause a project to become mired down. The art of management lies in determining which of these circumstances is at hand and making the proper decision. Another way of saying this is that management needs to be able to tell the difference between running over a speed bump and running off a cliff.
Because the development skills for DW 2.0 are in short supply, it is very normal for an organization to turn to outside consultants for help. Management needs to be able to select a consulting firm objectively and not necessarily select the consulting firm that has always been the preferred supplier. The reason is that the preferred supplier may not have any legitimate experience. In addition management needs to be wary of consulting firms that sell a corporation on their capabilities and then staff the project with newly hired people who are “learning the ropes” at the expense of the client. There are several ways to ensure that a consulting firm does not “sell the goods” to an unsuspecting firm.
In summary, there are many aspects to the management and administration of the DW 2.0 environment. Some of the aspects include the administration of