7

XML-Based Portal Technologies and Data-Management Strategies

Chapter Overview

Chapter 7 discusses our use of a specific type of portal—an XML-based portal—as a foundation upon which to build an organizational data-management practice. After dodging a fast-moving hype bullet, we describe how the legacy code maintenance burden moves inversely to the degree of information engineering that organizations are able to implement. The pain inflicted by that maintenance burden can be reduced by reapplying information-engineering principles in an XML-based portal environment. XML-based portals (XBPs) are a type of portal technology that greatly facilitates the implementation of specific data-management functions through the careful application of architectural enhancements, integrative solutions, extended data-management reach, data-preparation techniques, new technical reengineering projects, and maintenance burden reduction strategies.

A portal is essentially an application that allows for rapid integration of data from internal and external sources, and provides the user with a standard interface for accessing many different resources. When we use the word portal, we do not mean the type of web portal seen on the Internet, such as what Yahoo! provides. Still, there are some distinct similarities—Internet portals typically aggregate news, stock quotes, weather information, search capabilities, comics, and all forms of other information into one customized page that is specific to a user. In this way, the portals we will discuss in this chapter are indeed similar to web portals, only with different sources of data being provided to the user, and very different internal operations and capabilities.

Portal Hype

Unfortunately, when discussing portals, it is necessary to contend with the hype surrounding the concept. One of the authors of this book helped to start the portal wave by co-authoring with Clive Finkelstein the first book on corporate portals. Thanks to Clive, we had the sense to discuss the huge role that XML should play in this new technology. But most writers on this topic do not specifically address the XML capabilities of their portal examples, and thus readers have been unable to assess the technology’s features properly.

It may appear as if we ourselves are resorting to hype by titling this chapter “XML-Based Portal Technologies and Data-Management Strategies.” However, our goal with this chapter is to illustrate how the use of XBPs can effectively support the expansion of an organization’s existing data-management efforts. First, we need to explore and review the business problem that careful application of XML-based data-management principles can solve—exactly how this approach can reduce the burden of legacy code and help data managers. As the amount and variety of data that organizations need to store increases, the scope of data management should be extended to meet those demands. In this context, we will cover material that offers a reason to be excited about what portals have brought to the market. But before we define portals in further detail, we must explain our use of another buzzword.

We will be using a term in this chapter that has sometimes been considered a “bad” word—reengineering. The term was misused so frequently throughout the late 1980s and 1990s that it has become the source of much misunderstanding and, frankly, disgruntlement. A prime example occurred in the business community as both management and workforce came to understand the term reengineering to be synonymous with cost cutting, job loss, and turmoil. This could be thought of as the “Dilbert” definition of reengineering. Earlier in this book, we defined “metadata” in order to provide a common frame of reference and otherwise reduce confusion. For the same reason, it is necessary to offer a good definition of reengineering here.

In this chapter, we will use the definition of the term as formalized by Chikofsky and Cross (1990) and later adapted by Aiken (1996):

Reengineering is the examination and alteration of the target system to reconstitute it in a new form. The target system is the system to be reengineered. Reengineering can only result from the coordination of forward and reverse engineering efforts. An important point to understand is that reengineering generally involves first some reverse and then some subsequent forward engineering.

One of the interesting things to note about this definition is that a source system (that which is being reverse engineered) and a target system (what we are forward engineering toward) are assumed. In other words, there needs to be a “before” and an “after.” Too often in the past, reengineering has been used as a term not for changing existing systems, but as a metaphor for essentially throwing everything out the window and starting over, usually at tremendous expense. There is a very specific reason for using a formalized definition of the term. Our use of reengineering is intended to help readers understand that the methods described here are based in sound, repeatable, predictable, and standardized engineering principles. Engineering in the physical world cannot be accomplished without understanding the meaning and properties of the components being used. In this respect, data engineering is no different. The principles described here can be used in implementation work in virtually any organizational context.

Now that we have primed the pump by introducing a subject that some did not think they wanted to learn more about (portals) in the context of a much-abused business term (reengineering), we will next explain how organizations can save millions annually by focusing on XML-based portal development. This development is aimed at providing a synergistic combination of data management, portals, and XML for implementing generalized access to cross-functional data. (Note that many organizations provide excellent cross-functional data access without XML; this guidance is for organizations desiring to catch up.) Next, an exploration of how data-management roles will expand is presented, followed by a description of the growing importance of data preparation. The chapter closes with a few case studies, including a description of how portals work with an ERP.

The Need: Legacy Code Maintenance Burden

Whether it is called legacy code, heritage code, or something far less polite, organizations maintain huge amounts of it. Until recently, outsourcing and ERP implementation have been considered the primary means of reducing the legacy code burden. ERP implementation statistics are not good (see Figure 7.1), and outsourcing requires the organization to give up responsiveness and control over intellectual capital.


Figure 7.1 ERP implementation statistics—most implementations result in cost and schedule overruns. (Adapted from statistics on the Standish Group web site, www.standishgroup.com, accessed 10/2001).

With the advent of web-service standards and recommendations, organizations can plan to use legacy functionality by wrapping web services around existing systems as an alternative to maintaining a large code inventory.

Let us begin by examining the legacy system characteristics that often drive organizations to consider ERP implementation or outsourcing of this functionality. The example we will use is one published by Michael L. Brodie and Michael Stonebraker in their 1995 book Migrating Legacy Systems (see References for publishing information). In it, they describe the relevant characteristics of a rather typical legacy system. Given the authors’ combined experience with more than 50 major legacy systems, this does present an accurate if disconcerting picture of the state of systems.

This cash-management system supported check processing and other specialized services for large corporate customers, including such functions as:

• Zero account balancing

• Reconciliation of cleared checks

• Electronic funds transfer (e.g., SWIFT)

• Lock box operations

• Online query facility

The system cost the organization more than $16 million to put into place. Developed in 1981, the system ran on an IBM 3090 and was used to manage more than 100 gigabytes of VSAM files. Composed of 40 software modules and 8 million lines of COBOL/CICS code, the system maintained the Federal Reserve Bank connection and processed 300,000 transactions daily, as well as anywhere from 1 to 2 million checks nightly.

The first figure to examine for opportunities is the 8 million lines of code. Organizations might spend approximately $0.10 to maintain one line annually, given the costs of paying programmers to debug and document the code. Multiply that figure by 8 million, and it will be clear why reductions in the amount of code that organizations are required to maintain would be welcome indeed. It should come as no surprise to discover that Pareto principles apply to application code. Analysis of the 8 million lines reveals that three software modules comprising just 1.7 million lines of code perform the key application functionality. Figure 7.2 shows how the core functionality of the system requiring reengineering accounts for 22% of the application code; the remaining code can be sliced away, as it is not key to the application functionality and consists of interfaces to other organizational systems, user-interface code, interfaces to data-management routines, and interfaces to hardware, such as check sorters.


Figure 7.2 Core functionality accounts for 22% of the application code.
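Restating the figures above as quick arithmetic (our numbers; the core fraction comes out near 21 percent, which Figure 7.2 rounds to 22%):

\[
8{,}000{,}000 \ \text{lines} \times \$0.10 \ \text{per line-year} \approx \$800{,}000 \ \text{per year}
\]
\[
\frac{1{,}700{,}000 \ \text{core lines}}{8{,}000{,}000 \ \text{total lines}} \approx 0.21 \approx 22\%
\]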

This is another feature of the stovepipe manner in which systems have been traditionally developed that is not often reported. Each application was developed as a stand-alone system and contains as much as five times the code that would have been required if the application had been developed as part of an integrated system. That represents only a portion of the problem. Stovepipe systems also have other serious drawbacks; typically, they lack integration with other applications, or what integration does exist is insufficient or ad hoc, and their data structures are highly dependent on the code behind them that manipulates the data. This inherent complexity causes problems when it comes time to change and evolve these systems. Organizations can clearly benefit from a new approach to information delivery.

Aiding Implementation of Information-Engineering Principles with XML-Based Portal Architectures

XML-based portals can help organizations to implement information-engineering (IE) principles as originally postulated by Clive Finkelstein (1989, 1993) in the late 1980s. In order to expand on this, we will first explore the mathematics of portals. While most legacy applications can be reengineered to take advantage of portal capabilities, a conservative figure is that only one in five can be transformed into portal-based applications. In this situation, the math works out as follows: Take an organization with 5 million lines of legacy code and apply the $0.10 annual maintenance cost per line of code, which gives us an annual maintenance cost of $500,000. Reducing 80% of the code by just 20% would permit savings of $80,000 annually. The 80% figure represents the application code that covers the necessary infrastructure of the application and has nothing to do with the business logic, such as creating database connections, formatting queries and response sets, and a plethora of other technical details. If we take the more realistic view that many more than one out of five systems can be evolved into this format, annual savings increase while the legacy code maintenance burden decreases substantially.
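The savings model in the preceding paragraph can be captured in a few lines of code. The sketch below simply parameterizes the chapter's assumptions; the function and parameter names are ours.

```python
def annual_portal_savings(total_lines, cost_per_line=0.10,
                          infrastructure_share=0.80, reduction=0.20):
    """Back-of-the-envelope model: savings from trimming the portal-replaceable
    'infrastructure' share of a legacy code base."""
    baseline_cost = total_lines * cost_per_line
    infrastructure_lines = total_lines * infrastructure_share
    lines_eliminated = infrastructure_lines * reduction
    savings = lines_eliminated * cost_per_line
    return baseline_cost, savings

# Reproduces the figures in the text: $500,000 baseline, $80,000 saved annually.
baseline, savings = annual_portal_savings(5_000_000)
print(f"annual maintenance: ${baseline:,.0f}; annual savings: ${savings:,.0f}")
```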

As with most technologies, there are places where a portal could be implemented from which the organization will not experience a positive return on investment, as well as places where the organization will experience diminishing returns. The portal capabilities focus on trying to reduce as much of that 80% “infrastructure” code as possible. Reducing this code by 20% is certainly a conservative goal. More is likely achievable, but pushing to extend savings beyond 50% or 60% may yield only marginal additional gains. Code reduction is only one benefit of portals. Other tangible savings can accrue in the first year from:

• The reduction of computing time required to maintain and run the code that is no longer needed

• Savings from avoiding the storage of redundant datasets

• Measurable knowledge-worker productivity gains. These gains stem from the benefits of increased information integration.

Longer-term savings come as the portal gains critical mass with respect to other integration challenges. It will be clear when this happens because application users who have not yet reengineered their systems for the portal will be asking the data managers to help them move their applications into the portal. In order to accomplish the integration, they will offer some of their resources, such as subject matter experts (SMEs) and direct fiscal support to pay for the reengineering. This happens when the users realize that it is an investment in technology that will directly pay off. It is often difficult to get portal technology in the front door for the first time. Once projects have been successfully implemented, it is common for other users to want a piece of that success for their area of operation. Since the portal essentially removes the maintenance burden for a substantial portion of each system, the savings accumulate quickly as more systems are migrated to the shared portal.

Finally and most importantly, these savings can and should be claimed by the data management group, as the savings result from the work of data managers and good data management principles. The trend is toward helping data managers transform their business units into something that shows positive return on investment. That sounds great, but how is an organization supposed to go about picking a target on which to try this approach?

Figure 7.3 shows how careful examination of a legacy environment can identify a number of stovepipe applications, typically the most suitable for evolving toward a portal architecture. These applications must be stripped of features and code that do not add value and repackaged as web services invoked from the XML-based portal. What is useful in the applications is the business logic. The portal should handle as many other technically necessary aspects as possible. The applications identified here represent “low-hanging fruit”—those systems where savings can most easily be realized. Once the process has been completed one time, organizations become increasingly adept at this type of system reengineering and more applications will be seen as candidates for web service-based portal wrapping. Earlier, we talked about the conservative estimate that only one in five systems could benefit from this process. This figure represents the low-hanging fruit described in this section. Once the most obvious problem applications have been helped, data managers often come to realize that the approach can be applied to many more than just one in five systems.


Figure 7.3 Evolving applications from stovepipe to web service-based architectures.
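To make the "business logic repackaged as a web service" idea concrete, here is a minimal sketch of a retained legacy routine exposed as an XML-returning HTTP service that a portal could invoke. The function, element names, and port are hypothetical stand-ins, not taken from the systems described in the text; a production wrapper would sit behind the portal's security, logging, and metadata layers.

```python
# Minimal sketch: a retained piece of legacy business logic wrapped as an
# XML-returning web service. Names and values are illustrative only.
from urllib.parse import parse_qs
from wsgiref.simple_server import make_server
from xml.etree.ElementTree import Element, SubElement, tostring

def reconcile_cleared_checks(account_id):
    # Stand-in for the legacy routine (e.g., COBOL logic reached through an
    # adapter); here it simply returns canned values.
    return {"account": account_id, "cleared": "42", "outstanding": "3"}

def app(environ, start_response):
    params = parse_qs(environ.get("QUERY_STRING", ""))
    account_id = params.get("account", ["unknown"])[0]
    root = Element("reconciliation")
    for name, value in reconcile_cleared_checks(account_id).items():
        SubElement(root, name).text = value
    start_response("200 OK", [("Content-Type", "application/xml")])
    return [tostring(root, encoding="utf-8")]

if __name__ == "__main__":
    # e.g., GET http://localhost:8080/?account=12345 returns the XML document.
    make_server("localhost", 8080, app).serve_forever()
```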

As a result of careful reengineering, it will be easier to understand and evolve the portal-based services. Figure 7.4 shows how organizations with experience can apply variations on the initial techniques to create a more useful service. These are the “one-to-many” and “many-to-one” techniques:


Figure 7.4 Alternative and more sophisticated means of reengineering legacy applications into an XML-based portal context.

• Take a single legacy application that provides two or more services and break it into several applications, each providing one of those individual services

• Combine one system’s services with those of another existing system to produce one system where there were previously two

These approaches can be more easily implemented because the XML-based portal services are better understood. They provide a single type of service that can be used in many different places.

Portals are typically implemented by IT organizations to provide a uniform gateway or entryway to enterprise applications. The upper half of Figure 7.5 shows how the flow moves from portal to application and from application to data. The lower half illustrates that by adding XML capabilities to those of the portal, organizations can connect their legacy data directly to the portal without relying solely on access provided by the legacy applications.


Figure 7.5 XBPs can be used to access legacy data directly, as shown in the lower half of this figure.

Reengineering legacy applications into XML-based, portal-wrapped web services allows organizations to move toward the original promise of information engineering (IE). Clive Finkelstein’s premise has always been that well-structured data should be at the heart of all technology systems. The term “well-structured data” has always meant in part that the supporting data structures should be flexible and adaptable. Organizations gain three advantages by implementing well-understood data structures:

1. Before data assets can be managed, they must be understood. As Michael Brackett (a past president of DAMA International) has stated many times, “Data is the one organizational asset that cannot be used up.” It makes sense to develop a good working knowledge of your organizational data. There are many reasons that data has been implemented in a stovepipe manner—time and budget pressures, lack of skilled data managers, lack of understanding of data-management principles, industry-wide lack of understanding as to the costs of these approaches, and many others. The result is that data managers often spend time trying to understand their own data, rather than managing it. The advantage here is that by understanding data, it becomes possible to manage it.

2. Organizations that understand their data better understand their organizational capabilities. If the strengths and weaknesses of data assets are known, the picture of what capabilities an organization has is clearer. This aids in the recognition of opportunities that the organization might not have previously known were within their grasp. It also prevents attempts at implementing technology-based solutions that are dependent on or contain badly designed data. Frequently, implementation projects (such as ERPs) have failed due to incompatibility between data structures and poor data quality. Once capabilities are understood from a data perspective, data managers can consolidate redundant data. Reducing the amount of data managed by the organization is often a necessary prerequisite to marshaling it in support of organizational strategy. While reduction of the amount of data is significant in its own right, it also greatly helps the process of creating reusable assets—a necessary and too-often overlooked complement to implementing application code reuse.

3. Knowing and understanding data permits data managers to control this asset in support of larger strategies. It is only when data is well understood that organizations can effectively utilize it in support of strategy. All systems exist to support aspects of strategy. One revealing exercise is to formally identify the elements of strategy supported by each application. Data that is organized in stovepipe fashion is often brittle, meaning that changes often require unanticipated modifications to the application code and business processes that use the data. Flexible data stores make it significantly easier to repurpose data in other applications and processes. This makes it cheaper and easier to use data assets in support of strategies. Simultaneously, it simplifies planners’ understanding of how to use data assets in support of strategy.

The astute reader will have noticed here that XML may not seem to be a key component of this scheme. Indeed, the approach can be, and has been, successfully implemented by organizations that do not use XML. XML is most helpful in lowering metadata development costs for organizations that have low data-management maturity.

To summarize, implementation of IE requires understanding data assets before understanding how their properties support and impede strategies. With a collective 30 years of consulting practice and research into data-management practices (Aiken & Mattia, In press), we have seen only 1 in 10 organizations that are ready to implement IE principles. Organizations that want to improve their data-management practices can use empirically derived measures that already exist. Understanding where and in what order improvements can be effectively applied is key.

The good news is that those who are able to sustain IE quickly reap the benefits: applications require less time to understand, code maintenance costs are lower, applications are engineered from the start to require less code, and more strategic dexterity results from implementing IE principles. The fundamental challenge of information engineering has been how to move legacy infrastructure forward to take advantage of these principles. The burden of legacy systems has increased to the point that maintenance is consuming up to 80% of IT budgets. Before the introduction of XML-based portal technologies, it was cheaper to pay for maintenance than to implement a true IE program. Now things have reached a point where there is almost nothing more expensive than status quo maintenance.

Given the introduction of XML-based portal technologies, the future of data management is changing in three important ways:

• The definition of data management is being expanded to include management of unstructured data. Examples of unstructured data would be Word documents, emails, presentations, and notes.

• Applications development will change due to the use of portal technologies and the benefits they provide.

• Data will likely be prepared with e-business in mind from the start, rather than being prepared and cleansed only to have an e-business interface added later.

These three points are described in the chapter subsections that follow. First, though, a precise definition is needed of what we mean when we say “XML-based portals” (XBP).

Clarifying Excitement Surrounding XML-Based Portals (XBPs)

Just as the hype surrounding the terms “reengineering” and “metadata” requires us to use a precise definition, the same is true of the term “portal.” A firm definition is needed before we can present XBP solutions. First, we will define portal and its basic capabilities, and describe the initial excitement portals have engendered.

Portal technology alone can be considered a significant advance in organizational capabilities. Unfortunately, due to the economic slowdown around the year 2000, the case for portals became buried in hype. Forecasts from the late 1990s before the dot.com bust predicted a rosy future for portals, and everyone tried to jump on the bandwagon, which subsequently collapsed under the weight of its own hype. Here is an example:

We have conservatively estimated the 1998 total market opportunity of the EIP (Enterprise Information Portal) market at $4.4 billion. We anticipate that revenues could top $14.8 billion by 2002, approximately 36% CAGR [Compound Annual Growth Rate] for this sector. *

This rosy scenario was given during the high-rolling days before the dot.com bust. Keep in mind that Hollywood and the entertainment industry would think of $8 billion annually as a good year, and this will help to calibrate your thinking. Portals were seen as a source of growth, so a lot of attention was paid to the sector. Descriptions of portal visions include the following:

Envision the enterprise information portal as a browser-based system providing ubiquitous access to business-related information in the same way that Internet content portals are the gateway to the wealth of content on the web. (Info World Electric web site, accessed in 1996)

Portals are applications that enable organizations to more rapidly interchange internally and externally stored information, and provide users a single gateway to personalized information needed to make informed business decisions. Portals are an emerging market opportunity; an amalgamation of software applications that consolidate, manage, analyze and distribute information across and outside of an enterprise (including business intelligence, content management, data warehouse and mart, and data management applications). (Merrill Lynch: SageMaker web site, accessed in 1998)

For all its hype, the initial portal goals were limited to attempting to provide a consistent user interface to the knowledge worker. But even the implementation of a common, well-designed interface alone would help organizations for a number of reasons described below.

• Better user/product understanding—users properly trained in the portal interface can be more intelligent, creative, and powerful in their use of the system.

• Lower cost to develop—significant up-front savings on development costs and interface maintenance costs are possible. In addition, improvements in data quality due to implementation of a consistent user interface are also realistic (see Aiken & White, 2004, for example).

• Lower organizational cost to implement and maintain—implementation of a consistent user interface through portal technology can reduce interface costs, which currently account for roughly 30% of total development costs, to a much smaller share. The portal gets the user interface right in 1 place, rather than having application programmers get it almost right in 10 places.

• Faster time to implement—organizations are able to implement user interfaces and subsequent modifications more rapidly, significantly reducing the barriers to implementing changes to processes and technology.

• Lower training costs—savings in technology training can be achieved through use of consistent user interfaces. Why train workers on many different interfaces?

Figure 7.6 shows how knowledge workers may have to navigate and understand numerous interfaces to obtain information supporting their work. We recall fondly an action officer at the Defense Information Systems Agency (DISA) whose desk (in 1992) was crowded with the following items:


Figure 7.6 Adapted from Terry Lanham’s articulation (Lanham, 2001).

• A vintage DOS machine (386)

• A Windows 3.0 machine

• A Macintosh

• A 3270 Terminal

• 2 UNIX machines—one to run native UNIX, the other a UNIX environment permitting a DOS shell to be opened

These six machines were needed to accomplish the officer’s tasks, which consisted of acting as the systems integrator by reading information from one machine and re-keying it into one or more of the other machines. In the same manner that Windows was designed to provide a consistent user experience for applications within its environment, a portal is an attempt to provide a consistent user experience across all applications. How many web sites does the average worker use in a day, each of which has a different way of logging in, logging out, and getting information?

For the DISA action officer, a portal solution—had it been available in 1992—would have combined all of the applications together under a single portal, as shown in Figure 7.7. With the applications provided over the web, a single computer could have served as the front end, and the officer would have accessed the various applications from a single application, namely the portal. Perhaps Terry Lanham stated it best when he said, “Portals do for systems what Windows did for applications.” Terry, incidentally, presented one of the first portal ROI case studies where the savings amounted to $100 million annually (Lanham, 2001).


Figure 7.7 Adapted from Terry Lanham’s articulation (Lanham, 2001).

There is one very important quality-control caveat. With Windows, there are varying degrees of how well the applications conform to the typical user experience. Similarly, a wide variety of things can happen when a knowledge worker clicks on any of the links shown in Figure 7.7, depending on whether the system was properly reengineered. The range of possibilities is presented in Figures 7.8 through 7.10.


Figure 7.8 When a link is accessed—a 3270-based screen and interaction results.


Figure 7.9 When a link is accessed—a browser-based screen and interaction results.


Figure 7.10 When a link is accessed—a Windows-based screen and interaction experience results.

At the positive end of the quality spectrum, the link selected by the user might call up a screen designed according to standards containing the usual error-checking routines. The knowledge worker would use this screen to specify the information desired and submit the request. The system would return the information according to reasonable expectations and in a form that is most useful. A poor user experience with the portal might occur when the screen that is accessed from the portal link leads immediately to an old-fashioned “green screen” on which the knowledge worker is expected to enter cryptic codes and communicate with the system in a non-intuitive manner. In this case, very little has been done to reengineer the “old” application—as soon as the knowledge worker requests the service provided by the mainframe application, he or she is “in” the terminal emulator communicating with the application. Of course, practical integration features such as the ability to cut and paste across various applications typically correlate with the soundness of the reengineering task. Better reengineering results in better integration and user experiences.

Keeping in mind this variability in how thoroughly an application has been reengineered, we can see how a number of portal types were developed in order to address obvious information-dissemination needs. These portal types include:

• Enterprise Portal. This would be a single gateway via corporate Intranet or Internet to relevant workflows, application systems, and databases—integrated using XML and tailored to the specific job responsibilities of each individual.

• Employee Portal. All employees can access processes, systems, and databases via Intranet or Internet to carry out job responsibilities with full security and firewall protection.

• Customer Portal. A single gateway across the Internet, or via secure extranet, to details about products and services, catalogues, and order and invoice status for customers—integrated using XML and tailored to the unique requirements of each customer. Opportunities exist for one-to-one customer personalization and management.

• Supplier Portal. A single gateway to purchase orders and related status information for the suppliers of an enterprise.

• Partner/Shareholder Portal. A single gateway for business partners or shareholders. (Finkelstein & Aiken, 1998)

XML-Based Portal Technology

So the question arises, is it not good enough to provide a consistent user interface? What else is needed? More importantly, why do data managers need to learn about portals? First, if you let the applications people develop their portals without data-management participation, you will miss a splendid opportunity to implement some data-quality and data-integration principles. These are the principles that will establish data managers as contributors to the solution of specific problems in a cost-effective manner. Second, application developers do not necessarily know about IE principles, and may require guidance with the reengineering of the applications and the structuring of services. This reengineering and structuring is important so that the applications fit into an architectural pattern that best supports the organization’s strategy. After all, if the application is not supporting some aspect of the organization or its strategy, why does it exist? Figure 7.11 illustrates specific portal advantages discussed previously and compares these to the advantages accruing from use of XBP technologies, discussed next.


Figure 7.11 Advantages that accrue from use of XML-based portal technologies.

So what is an XBP? There is a continuum of how much an application is like a portal, just as there is a continuum of how well or poorly an application adheres to human-computer interaction standards. The degree to which various portals incorporate XML will range from simply paying XML lip service (almost no support) to support at every level of the portal. A typical XBP product supports a specific style of developing information-delivery systems. XBP products contain three key elements:

1. Engineered, XML-based and metadata-based data integration. The portal is designed to support IE principles of well-defined, flexible, and adaptable data structures that can inherently be wrapped in XML. These structures are used in a variety of ways as metadata within the portal.

2. Internet, Intranet, TCP/IP-based interfaces and delivery. Adopting all of the advances on the web, portals should not attempt to reinvent the wheel but instead build on the access, security, and scalability of existing web technologies. There is nothing mysterious about this technology, and its maturity enables vendors to build portals on top of existing web technologies. This is a true advantage of portals over non-portal applications—portals can benefit from all previously developed web technologies.

3. Extensive use of new technologies, including

– 4GLs (4th-Generation Languages). These languages are used to format and move information into and out of the portal.

– Data-analysis tools. These tools are used to help derive correct data structures.

– Business rule engines are used to extract, formalize, and apply business logic.

– Data logistic networks are used to format and deliver data to the right place at the right time.

– Finally, XML-based model and repository manipulation involves having access to multiple levels of metadata, and making them available for manipulation.

As with all well-designed systems, knowledge workers who use the product will not know or really care about any of the above! What the knowledge worker will care about is his or her gain in productivity as a result of navigating the portal in particular ways. There are seven specific types of portal navigation as described by Joe Zarb (personal communication, March 2001):

1. Any-to-any relationships. Any-to-any indicates that any object in the portal may be combinable with any other item on the same screen. As a simple example, a customer identification number would be combinable with an “order” data structure to display the orders placed by that customer. This encourages users to try other combinations they hadn’t previously thought of, like combining peanut butter with chocolate. Equipped with a browsing interface, this capability sets up a “teach a person to fish and you will feed that person for life” situation. If a person is taught how to relate different types of information via a reporting mechanism, he or she can later create custom reports unaided. Any-to-any relationships encourage the knowledge worker to attempt to combine items of interest in the same spirit as reporting centers and fourth-generation reporting tools such as Crystal Reports. In the purest sense, this is what the original promise of end-user computing was all about—giving knowledge workers tools they can use to obtain information from any source and easily combine it in ways that support their individual work styles. (A minimal sketch of the any-to-any idea appears after Figure 7.13, at the end of this list.)

2. Drag-and-relate interaction metaphor. This encourages users to take advantage of one of the biggest productivity improvements that the interface-development community has ever come up with: drag-and-relate. This interface concept permits users to create an “any-to-any” relationship by dragging one object to another to determine how the two objects are related. Like its predecessor, drag-and-drop editing, drag-and-relate encourages users to become even more immersed in individual tasks. When a user drags an object across the screen, there is a feeling of heightened tension in carrying it. The user is more immersed in the task since it requires attention, and less experienced workers learn their respective tasks more rapidly.

Any-to-any is supported by drag-and-relate and vice versa—one is needed for the other. Figure 7.12 shows how they combine to increase the access options for the “sample” dataset. On the left, primary/foreign key relationships are used to access data; on the right, additional links are created allowing access from almost any point to any other point. Using a variety of means described in the next section, even more access can be achieved.


Figure 7.12 One vendor’s version of drag-and-relate.

3. Point-of-view navigation. A technique enabled by storing settings that permits a portal to be viewed from various perspectives, including Enterprise, Employee, Customer, Supplier, and Partner/Shareholder Portals. In some cases, different parties viewing the same data should have different views of it. For example, the “partner” view of customer information would likely be far less detailed than the employee view.

4. Metalinks. A link-management system using new linking facilities. These linking facilities are described in detail in the section on the XLink protocol in the XML Component Architecture chapter.

5. Three-way scalability of objects, users, and records. The portal is built to scale along all three of these dimensions, encouraging vendors and developers to incorporate the scalability concepts that have been so successful in the web-based information-delivery world.

6. Integration from different data sources and different data stores. This offers opportunities for standards-based data integration that is more rapid and accurate than what was previously available. This particular point will be covered in more depth in an upcoming section.

7. Confederated Components Model. This navigation permits workers to integrate data across differing models from different units without losing their context. The ability to integrate information from different models on the fly supports a higher level of cognitive momentum than was previously available. One example of integration is shown in Figure 7.13.


Figure 7.13 Illustration of possible portal component confederation from Zarb (personal communication, March 2001).
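To make the navigation styles above a little less abstract, the following sketch shows the flavor of an any-to-any, drag-and-relate operation in ordinary code: whatever record is “dragged” is related to a target collection through the keys the two happen to share. The data values and key names are invented for illustration; a real XBP resolves such relationships through its metadata rather than hard-coded dictionaries.

```python
# Illustrative sketch of "any-to-any" navigation: relate a dragged record to a
# target collection through any shared key/value pairs. All data is invented.
customers = [{"customer_id": "C17", "name": "Acme Corp."}]
orders = [
    {"order_id": "O1", "customer_id": "C17", "total": 1200.00},
    {"order_id": "O2", "customer_id": "C17", "total": 310.50},
    {"order_id": "O3", "customer_id": "C99", "total": 75.00},
]

def relate(dragged, target_set):
    """Return the target records sharing any key/value pair with the dragged record."""
    return [
        record for record in target_set
        if any(record.get(key) == value for key, value in dragged.items())
    ]

# "Dragging" a customer onto the order collection yields that customer's orders.
print(relate(customers[0], orders))  # -> the two orders belonging to C17
```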

XML-Based Architectural Enhancements

The architectural benefits of using XBP components fall into the categories of flexibility, ability to evolve, and maintainability. Each is described below.

Better Architectural Flexibility

By adopting XBPs, the organization instantly acquires an excellent set of integrated data-management technologies. By using these same technologies to manage its metadata, the organization can immediately realize more tangible ROI from reuse than most object-oriented implementations realize in a lifetime. The portal capabilities can be used directly as metadata management technologies—this saves having to invest in, implement, train on, migrate to, and maintain a separate metadata-management technology. As you will soon see, XBPs can implement broader data-structure categories, and since they can easily handle organization-sized data-delivery solutions, XBPs make excellent metadata-management technologies.

Using the XBP as a metadata-management tool means that the same portal capabilities used to manage organizational data can be applied to its metadata. Many of the enhancements described from this point on follow from the fact that, by implementing an XBP, operations such as drag-and-relate can be implemented in support of your metadata as well as your data. Knowledge worker training in the use of an XBP applies in two ways: to metadata and to data. All of this combines to increase the flexibility of the architecture, which can now be implemented on a component-by-component basis within the framework provided by the XBP. As a result, the architecture can be maintained in formats most directly supporting strategic and tactical requirements. As long as they are designed according to good architecture-development principles, the various components can be implemented in different ways to support different organizational strategies, permitting the most architectural flexibility.

Finally, by using the XBP to maintain your enterprise data architecture and affiliated components, you are increasing the integration between your organizational data and its metadata—the subject of the next chapter sub-section. Consider how the knowledge workers of the organization will benefit as they simultaneously learn to access and manipulate data and its metadata—all actions applying to data also applying to metadata and vice versa.

Better Architectural Evolvability/Maintenance

All architectures evolve over time as the organization, its mission, its environment, its competitors, and many other things change. If architectures are maintained using XBP capabilities, changes can often be applied globally. For example, moving from one set of competing XML standard names to another could be accomplished using “batch parsing” capabilities and relational database management systems (RDBMS). Instead of manually locating all instances of specific tags, metadata maintained in the XBP environment would allow users to make direct changes to data and the associated metadata—changing, for example, all instances of “southwest” and “southeast” to “south” as a result of expanded region definitions. The metadata would be maintained using the XBP and would be simpler to evolve.

Metadata managers and knowledge workers also have access to XSLT capabilities that are embodied in the XBP. As shown in Figure 7.14, one example of a better way to evolve can be seen in a situation where a stovepiped system finds itself managing multiple sets of tags to describe the same business areas. This occurs frequently after mergers and acquisitions. In order to disrupt processing as little as possible, both sets of XML tag structures are “learned” by the XBP. Once made accessible to the XBP, any data will carry with it a tag indicating which of the two mappings for this data is the correct one so that the XBP will associate the proper metadata. In instances where the data is being mapped from one tag set to another, XSLT transformations can be developed, providing transformation from one set of tag structures to the other automatically. This is an example of XML’s malleability of data in action.


Figure 7.14 XBP and automated XSLT capabilities applied to metadata management.
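As a small illustration of the tag-set mapping described above, the sketch below applies an XSLT identity transformation that rewrites the old region values while copying everything else unchanged. It assumes the third-party lxml package is installed; the element names, region values, and sample document are invented for the example.

```python
# Sketch of mapping one tag vocabulary to another with XSLT (assumes lxml).
from lxml import etree

acquired_doc = etree.fromstring(
    "<sale><region>southwest</region><amount>100</amount></sale>")

# Identity transform plus one rule: old region values become "south".
transform = etree.XSLT(etree.fromstring("""\
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="@*|node()">
    <xsl:copy><xsl:apply-templates select="@*|node()"/></xsl:copy>
  </xsl:template>
  <xsl:template match="region[. = 'southwest' or . = 'southeast']">
    <region>south</region>
  </xsl:template>
</xsl:stylesheet>"""))

# Prints the document with <region>south</region> in place of the old value.
print(str(transform(acquired_doc)))
```

Because the stylesheet is itself an XML document, it can be stored, queried, and evolved in the portal alongside the metadata it transforms, which is the point made in the next paragraph.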

Since the XSLT transformations are implemented as XML documents, the transformations themselves can be operated upon. When the organization is ready to eliminate one of the two sets of competing XML tag structures, it can do so by implementing changes to the mapping transformations as well as to the existing tag structures. Now recall that the portal is managing the data and the metadata using an RDBMS, so the metadata can be queried based on criteria such as pattern matching, similarity, and text searches. Potential changes can be more easily identified. Fixes can be applied more comprehensively and reliably. Links can be more easily managed using XLink, permitting links to destinations previously not thought of. By maintaining architectural components using the XBP, organizations can easily incorporate other XML architectural components such as XSLT and XLink into their metadata-management capabilities.

While the above description makes it seem as if XSLT is a wonderful solution that can be easily implemented, the reality is that XSLT is not a silver bullet, but simply another tool for data managers to add to their existing toolkits. As with any new technology, XSLT must be carefully evaluated and applied where cost effective and appropriate. XSLT coding can be difficult and involved—what we lightly refer to as “non-trivial.” And nothing about the technology will permit data engineers to relax the rigor of their practice. Good practice can be extended by XML but not the other way around.

Enhanced Integration Opportunities

The next advantage that XBPs have over non-XBPs is increased standards-based integration, more integration depth, wider integration, and more rapid implementation. Each is discussed below.

Standards-Based Integration

One of the nicest things about XML is that it is often driven from outside of the data-management area. Because of the hype, business users are quite interested to find out if XML can help them in any number of ways. Many savvy business people are creating XML-based solutions. More often than not, though, they are starting by adding XML capabilities that help but do not fully realize its potential. Since the requests and resources for the XML implementations often come from the business users, data managers are in a powerful position to put together something that they have not previously been able to pull off: functional data standards. By the way, data managers might be wise to avoid telling their business partners about “data-standardization efforts,” since the words might scare them off. Instead, just tell them that before XML tags can be applied to data, each tag must have a name. In this manner, data managers can get partners to name their data, and XML structures with tags arranged according to their best representation can be created.

The tag and structure combinations may be imperfect and might be changed in the future, but that will not be a barrier to using the structures in the XBP. Changes to the tag names and tag structures can be accomplished with provided features discussed earlier, such as XSLT. Calling something by one name one week and another name in subsequent accesses is done using a combination of direct manipulation, link changes, and transformations. The implication is that the standard vocabulary can be changed with little disturbance to the system operation. The actual naming of the tags is of little importance. What is critical is that the tags receive names so that the data can be wrapped in XML and made into meaningful data structures.

More Integration Depth

XBPs permit greater integration depth to be achieved among data stores because of their inherently flexible and adaptable structuring options. By depth, we refer to the penetration of the standardization into different areas. XML-based metadata can support more flexible structures because XML accommodates a great variety of structure types. Consider that XBPs have virtually all major relational database-management systems available for connectivity, and thus XBP users can access any type of tabular data occurring in DB2, Oracle, Informix, etc. Given this wide support, there are few if any areas the XBP cannot penetrate due to lack of support for the data platform.

Many tend to forget, but data is also still frequently stored in hierarchical structures such as IMS. Because XML documents are themselves hierarchical, XBPs provide direct support for hierarchically organized data stores. Other variations in structure can be handled using features such as direct and indirect linking. The ability to use a wider array of data structures and access a range of databases permits integration to occur at a greater depth and strengthens the argument for using the XBP as a metadata-management technology.
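As a concrete, if simplified, illustration of pulling tabular data into the XML form a portal works with, the sketch below converts rows from a relational query into an XML fragment. An in-memory SQLite table stands in for DB2, Oracle, or Informix; the table, column, and element names are ours.

```python
# Sketch: wrap relational rows in XML so the portal can treat them like any
# other XBP data source. SQLite stands in for the production RDBMS.
import sqlite3
from xml.etree.ElementTree import Element, SubElement, tostring

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (customer_id TEXT, name TEXT)")
conn.execute("INSERT INTO customer VALUES ('C17', 'Acme Corp.')")

def rows_to_xml(cursor, row_tag):
    """Turn each fetched row into a <row_tag> element, one child per column."""
    columns = [description[0] for description in cursor.description]
    root = Element(row_tag + "_set")
    for row in cursor:
        element = SubElement(root, row_tag)
        for column, value in zip(columns, row):
            SubElement(element, column).text = str(value)
    return root

cursor = conn.execute("SELECT customer_id, name FROM customer")
print(tostring(rows_to_xml(cursor, "customer"), encoding="unicode"))
# -> <customer_set><customer><customer_id>C17</customer_id><name>Acme Corp.</name></customer></customer_set>
```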

Wider Integration Scope

Hub-and-spoke integration permits wider integration because transformation engines can be enhanced for whole classes of systems as well as for individual systems. For example, Figure 7.15 shows how a network of hubs can be used to maintain interconnections among various collections of XML tags and structures. Once the hubs can talk to each other, partners can not only communicate within a hub, but hubs can also communicate amongst themselves—this represents a new class of systems rather than another individual system.


Figure 7.15 Integration, hub by hub.

More Rapid Implementation

The final advantage is the speed at which new partners can be brought into the fold. Bringing on entire communities at a time as shown in Figure 7.15 makes for very rapid implementation of the actual data-integration possibilities. One central concept occurs repeatedly with portals and XML; just as portals are an attempt to get user interface and data integration challenges solved in one place so that others can reuse those solutions, XML does the same with data representation and manipulation. Much of the added speed of implementation is due directly to this approach—solve the problem in one place, and then exploit that solution in other places, avoiding reinventing the wheel wherever possible. The added speed is something of an illusion; it is not so much that XML and portals are revolutionary in this sense; rather, they are simply making sound architectural decisions, such as solving a problem correctly in one place, and letting everyone else benefit from those decisions. So it would be accurate to say that XML and XBPs are not faster, just that everything else is simply slower and more cumbersome.

Extending Data-Management Technologies/Data-Management Product Examples

XBPs allow the creation of exciting extensions to data management. Much of the work that has gone into data management has typically applied to tabular data stored in a relational database-management system. Although this seems to be where the most effort is spent, experts estimate that only 20% of data is in systems such as these. In Figure 7.16, we can see the dichotomy of data—structured and unstructured, also referred to as tabular and non-tabular data. Typically, data-management efforts are almost exclusively above the thick line in the “structured” category of data. The remaining 80% of data is stored in such forms as purchase orders, PowerPoint presentations, legal contracts, emails, and of course the ubiquitous spreadsheet.


Figure 7.16 The dichotomy of structured versus unstructured data. (Adapted from Finkelstein and Aiken, 1998).

Unstructured data is literally data that does not have an explicit structure. Take as an example an email written between colleagues. Such documents are in fact full of structure—there are breaks between sentences, logical paragraphs, a title, and many other markers that to the human eye impart structure. For computer systems, though, it is difficult to extract data from these documents because the structure is not formalized in the way that a computer expects to see it. As a result, unstructured documents tend to be difficult to search and query. Somewhat obtuse plain-text searches are of course possible, but contextual searches are more difficult, such as, “Display all emails where the second paragraph makes reference to Bob Jones.” The best that can be done is a wider search, perhaps for “Bob Jones,” the results of which would then be narrowed manually.
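Once such documents carry even modest XML markup, the contextual query above becomes straightforward. The sketch below searches XML-tagged emails for ones whose second paragraph mentions a name; the element names and sample message are invented for the example.

```python
# Sketch: a contextual query ("second paragraph mentions Bob Jones") that
# plain-text search cannot express, run over XML-tagged email documents.
from xml.etree.ElementTree import fromstring

emails = [fromstring("""\
<email>
  <subject>Project status</subject>
  <para>Quick update on the data warehouse migration.</para>
  <para>Bob Jones has agreed to review the load schedule.</para>
</email>""")]

def second_paragraph_mentions(documents, name):
    """Return the subjects of documents whose second <para> mentions the name."""
    matches = []
    for document in documents:
        paragraphs = document.findall("para")
        if len(paragraphs) >= 2 and name in (paragraphs[1].text or ""):
            matches.append(document.findtext("subject"))
    return matches

print(second_paragraph_mentions(emails, "Bob Jones"))  # -> ['Project status']
```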

XML provides a number of possibilities for including more structured contextual information along with these documents. For example, the Office 2000 and Office 2001 suites set a standard, pointing toward the necessity for future document management technologies to include XML as part of the data-management baseline. Saving an Office 2000+ document to a BizTalk (by Microsoft) server now makes the wealth of automatically created document content and metadata accessible to searchers. Figure 7.17 shows the Properties Tab of a PowerPoint MS-Office Document, viewable by accessing the “File → Properties” menu.


Figure 7.17 Contents tab of a PowerPoint MS-Office document.

This development alone greatly extends the reach of modern data-management practices. At least on a limited basis, it is now possible to include previously unstructured data in searchable intranets by using XBPs.

The reach of current data-management knowledge, skills, and abilities can be extended to include this vast new array of valuable data once it is in a form that allows computer systems to effectively process it. Consider just the wealth of information that could be uncovered by integrating office and email documents with the existing tabular data. Queries can be written to ask for the PowerPoint slide with the title of “Who is Joan Smith?” modified by “Peter Aiken” in “March 2003,” and the possibility exists to track its use throughout a defined pool of PowerPoint presentations. People who were referenced in internal documents could be linked to tabular data about their employment or activity on various projects. By tagging the metadata inside of unstructured documents, data managers could use those documents in the type of “any-to-any” relationships discussed earlier.

Extending data management to unstructured data also expands the role of information managers, not just the definition of data management itself. Data managers would be capable of delivering information by searching through the larger, formerly unstructured data. The domain potentially increases by a factor of four according to most experts. In addition, as the return on investment associated with XBPs becomes more widely known, users will increasingly demand that their existing environment be migrated into XBP environments. Beyond the metadata engineering is the application engineering that becomes possible in the XBP environment because of metadata tagging. For application developers, this will be an unfamiliar paradigm at first, and they will require leadership and training in order to succeed.

We have discussed some of the ways that XBPs will change the role of data management. One of the opportunities this leads to is the possibility of creating “data-management products”—that is, data products that are of use to the organization, and that were created through effective data management. This is an important topic to discuss in conjunction with portal material in part because it speaks to the return on investment possible with portals, and also because it addresses the payoff of mature data-management practices making use of portals.

Selected Product Examples

Data-management products are usually seen as more valuable than the original information or source data. This is of course the classic value-add: the output of a process is more valuable than its input. XML’s ability to rapidly repurpose existing data for new uses, combined with the knowledge to manage it well, puts data managers in the enviable position of developing new products specific to their organization (with support and input from the business community). Most of these examples are easiest to realize within the context of a portal, due to its unique functionality and reach. Let us look at some examples of what is meant by these new products.

• Profiting from data quality. Several forward-thinking organizations have invested in individuals to champion data-quality initiatives. XML permits the development of new and innovative ways to achieve higher-quality data using data-quality portals. The key is to rapidly provide data whose quality can be defined and quantified, enabling a distinction to be drawn between data that the organization knows is correct and data it simply hopes is correct. In addition, making data more available and accessible via a portal encourages the use of the data. With more eyes on the data, problems become rapidly apparent. Public and private institutions have saved millions by illustrating how investments in data quality can easily demonstrate ROI. Data is used to make decisions every day. It is critical to know that the data being used as the basis for those decisions is sound.

• Profiting from data analysis. More data-profiling technologies are incorporating XML, enabling them to achieve integration and complete mappings using GUI-based tools instead of more labor-intensive means. Take the example of a utility company that was ready to spend the money for an upgrade to the hardware base of their data warehousing operation. An investment in XML-based data profiling technology was able to produce results quickly, revealing that a small adjustment to the data structures maintained by the warehousing group permitted the hardware upgrade to be postponed for more than a year. The group divided among themselves the upgrade savings for the year.

• Small-scale information engineering. In another capacity-planning example, a data-management group developed a novel means of demonstrating positive ROI in what is called small-scale information engineering. Consider an application where users request information about something that is not well defined, such as travel planning: “I am leaving on this day or that—depending on the flight times…” Traditional interaction consists of the user refining the desired information until it matches the description of the product he or she wishes to purchase, carefully weeding out the irrelevant information. In this case, data management convinced the application-development group to co-develop a more efficient solution. The data group first used mining, statistical, and predictive modeling to develop a clear picture of what the initial request was trying to accomplish. When the exact request was specified, the server sent the client the requested data wrapped in XML, knowing that the knowledge worker would refine the query an average of 12 times. The applications group developed code that could perform the requested manipulations on the client side, which was possible only because the data was sent to the client in XML, carrying the extra information needed to manipulate it rather than just presentation formatting. This architecture is illustrated in Figure 7.18, and a brief sketch of the client-side refinement follows the figure caption below. Often, this powerful combination could resolve a query with the initial set of data delivered; in most instances, it required fewer server accesses, and the reduction in overhead was measurably positive. In this case, the use of XML saved the server only a few extra requests per user, but multiplied across the millions of hits the site typically receives, the benefit in hardware, network capacity, and support becomes clear. This is, in a nutshell, what small-scale information engineering is about—multiplying small savings into enormous savings.


Figure 7.18 Illustration of small-scale information engineering.
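
To make the client-side refinement pattern concrete, here is a minimal sketch in Python. The element and attribute names (flight, depart, fare) are hypothetical stand-ins, since the case study does not reproduce the actual payload; the point is only that, because the data arrives as XML rather than as formatted presentation, each refinement can be evaluated locally instead of generating another server request.

import xml.etree.ElementTree as ET

# XML result set delivered by the server on the initial request (hypothetical structure).
PAYLOAD = """
<flights origin="RIC" destination="SFO">
  <flight number="101" depart="06:55" fare="412.00"/>
  <flight number="205" depart="09:30" fare="389.00"/>
  <flight number="317" depart="17:45" fare="455.00"/>
</flights>
"""

def refine(xml_text, earliest, latest, max_fare):
    """Apply one refinement of the query against the already-delivered data.

    Each call replaces what would otherwise have been another server request.
    Departure times compare correctly as zero-padded HH:MM strings.
    """
    root = ET.fromstring(xml_text)
    return [
        f for f in root.findall("flight")
        if earliest <= f.get("depart") <= latest and float(f.get("fare")) <= max_fare
    ]

# The knowledge worker narrows the result repeatedly, entirely on the client.
for flight in refine(PAYLOAD, earliest="06:00", latest="12:00", max_fare=400.00):
    print(flight.get("number"), flight.get("depart"), flight.get("fare"))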

Other data-management groups have developed products based on services that they provide internally, to competitors, and to partners. Certainly, a wealth of other opportunities exists once these general concepts are applied to specific industry problems.

Newly Important and Novel Data-Preparation Opportunities

“At the speed of the Internet” is a new phrase indicating very rapid movement of data and the triggering of associated processes. Data movement can only be rapid when it is fully automated—human interaction and validation are almost always the bottleneck. Over several years, surveyed organizations have continued to report an increase in both the amount and the kinds of data that are exchanged automatically. Increasing the automation of data handling is not really a choice but a necessity: as the volume of data expands rapidly, hiring additional staff to keep pace is usually not an option. Data automation is enabled by XML, yet it is a mixed blessing for organizations that are not technologically up to the challenge. Other organizations struggle when industry competition forces them to embrace these processes rather than choosing them of their own volition.

Only 5% of organizations surveyed indicated that they did not see an increase in the number of automated decisions processed by their organization in 2002. Strangely, only 1 in 3 is approaching the process in a structured manner. Trust is also low: only 1 in 3 companies is very confident in the quality of its own data, and only 15% are very confident in the data received from other organizations. The dangerous trend is that organizations are increasingly automating decisions using data with which they are not even comfortable!

Data-quality issues are becoming more important as the demand for access to data increases, yet as of 2004 just 20% of organizations had ongoing data-quality initiatives. For e-transactions, where errors move at the speed of the Internet, data preparation will become an automated activity and the price of errors will multiply. The price multiplies as human interaction decreases, because an error generally takes longer to be recognized as it flows through multiple systems without the intervention of rational thought.

The XBP gives organizations an opportunity to begin branding data as meeting certain quality standards. It seems strange that several human signatures are usually required for small purchasing decisions, while multi-million-dollar electronic purchases with trading partners are executed with no required standards for data quality. With the proper semi-automated data-engineering analysis, this evolution can be supported so that data-quality errors do not eat away the savings produced by automating the processes in the first place.

The evolution we are describing can be understood in terms of the four specific activities described below:

1. Understanding Legacy Structures

2. Data-Quality Portals and Data Cleansing

3. Data Accuracy Assessment

4. Creating a Transitional Data Model

Understanding Legacy Structures

XML is about transferring and transforming data structures. This means that in order to wrap any part of a legacy system, understanding is required. Understanding here is shorthand for using a data-centric perspective to represent the core metadata of the system; this formal understanding is required for effective implementation to occur. Metadata models are represented using standardized notation and are detailed enough that business analysts and technical personnel can read the same model and come away with a common understanding. The model then forms the basis for developing new components, as well as the ongoing description of the overall enterprise architecture and of how this particular metadata model fits into it.

To achieve this level of understanding, business analysts and technical personnel must share an understanding of the environment in which they work. New components should also be developed using approaches that allow the creation of objective measures; the alternative, of course, is ad hoc development, which is difficult to measure and even more difficult to document. Tracking how many components have been developed, how many remain, and their various levels of testing provides an organization with concrete facts about its development process and quality. Cultivated with an understanding of the system metadata, the model becomes the basis, the language, and the currency for achieving understanding among team members, technologists, and knowledge workers.

As discussed in Chapter 5, a number of tools and technologies are available, ranging in cost from $5K to more than $1 million, to aid the process of understanding legacy structures. All of them help to move beyond the older, manual method of reverse engineering logical data structures.

The process of moving to logical models of enterprise architecture components is critical to achieving what has been called “practical” or “good enough” data engineering. The older approach drove straight at logical data models using Joint Application Development (JAD) sessions: a group of subject-matter experts (SMEs) would gather around the projection equipment and review a data model in real time. A scribe would take notes of improvements, corrections, and so on, and these changes would be applied at breaks or at the end of the day, with the model updates accomplished offline and after hours. The upshot was that the process was taxing on the SMEs, who were needed for as much as 30 hours a week during peak analysis periods.

The modern method uses tools that profile the data, making it easier to understand and thus to infer the proper architectural components—first physical, then logical. These profiling tools are built on inference engines and replace the human knowledge-extraction processes required by the manual methods. The SMEs’ role can be reduced to confirmation and explanation rather than acting as the primary source of knowledge. Profiling engines analyze the data and identify specific hypotheses, which are presented to the SMEs during model refinement/validation (MR/V) sessions, resulting in more efficient progress. Some of the high-end tools can even generate the XML required to manage the metadata, in much the same way as CASE tools create data definition language (DDL) files.
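
As a rough illustration of what such a profiling engine does, here is a minimal sketch in Python. It is not a depiction of any particular commercial tool, and its inference rules are deliberately simplistic; it merely infers a few physical characteristics of a flat file and emits them as XML metadata.

import csv
import io
import xml.etree.ElementTree as ET

SAMPLE_EXTRACT = """cust_id,last_name,zip
1001,Smith,23220
1002,Jones,23173
1003,Smith,23220
"""

def profile(flat_text):
    """Infer simple physical characteristics of each column from sample data.

    Real profiling engines apply far richer inference (candidate keys, domains,
    cross-column dependencies); this sketch records only an inferred type, the
    maximum observed length, and whether values were unique in the sample.
    """
    rows = list(csv.DictReader(io.StringIO(flat_text)))
    metadata = ET.Element("physical-structure")
    for column in rows[0]:
        values = [row[column] for row in rows]
        ET.SubElement(metadata, "column", {
            "name": column,
            "inferred-type": "integer" if all(v.isdigit() for v in values) else "string",
            "max-length": str(max(len(v) for v in values)),
            "unique-in-sample": str(len(set(values)) == len(values)).lower(),
        })
    return ET.tostring(metadata, encoding="unicode")

print(profile(SAMPLE_EXTRACT))

Each attribute in the output is a hypothesis (for example, that cust_id is a unique integer key) of exactly the kind an SME confirms or refutes during an MR/V session.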

When the use of software is combined with limited MR/V sessions, the process is always a more efficient use of the SMEs’ time. This is a perfect example of what we mean when we refer to a semi-automated solution; it contains elements of automation and manual work, allowing the humans to do the important thought work while the machines are left to the time-consuming drudgery.

XBPs and Data-Quality Engineering

XBPs offer an opportunity to demonstrate positive return on investments in data-quality engineering. To illustrate, we will describe the tale of a courageous data manager’s effort to make the organization aware of the costs of data quality problems.

Consider this example: a pharmaceutical maker maintained a master list of all of the physicians considered customers. This list consisted of 2.4 million physicians and was used as the primary list for developing phone, mail, email, and package-delivery lists. Twice-monthly mailings were made to each physician on the delivery list. The hitch was that SMEs within the organization privately estimated that the actual pool of physicians considered customers consisted of 800,000—about one-third of the number on the list! In other words, three times as many materials as were actually needed were being sent out—twice a month!

The more the data manager learned about the problem, the stronger the case for taking action became. A data-quality portal was created. The data manager placed a link to the “Master Customer List” (MCL); clicking the link retrieved an XML-wrapped flat file of 2.4 million organizational customers. The XML wrapper was standardized using an XML schema, and the data manager made arrangements with the known users of the data so that they could use the MCL. Each time the link was accessed, the user was asked to acknowledge a statement to the effect that this list was not the right list, but it was the best list available. It was a simple click, but it forced acknowledgment of the problem, and many workers began asking what could be done to improve the quality of the data in the dataset.
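
A minimal sketch of that wrapping step follows. The record layout, element names, and quality statement are hypothetical, since the case study does not reproduce the actual schema; the point is that the flat file gains self-describing structure and an explicit quality disclaimer on its way onto the portal.

import xml.etree.ElementTree as ET

# Hypothetical pipe-delimited extract of the flat file behind the MCL link.
FLAT_FILE = """0000417|WELBY|MARCUS|RICHMOND|VA
0000912|QUINN|MICHAELA|DENVER|CO
"""

def wrap_mcl(flat_text):
    """Wrap flat-file customer records in XML for publication on the portal.

    The quality-statement attribute mirrors the acknowledgment the portal
    required: the list is not known to be correct, only the best available.
    """
    mcl = ET.Element("master-customer-list",
                     {"quality-statement": "best available; accuracy not certified"})
    for line in flat_text.strip().splitlines():
        cust_id, last, first, city, state = line.split("|")
        ET.SubElement(mcl, "physician", {
            "id": cust_id, "last-name": last, "first-name": first,
            "city": city, "state": state,
        })
    return ET.tostring(mcl, encoding="unicode")

print(wrap_mcl(FLAT_FILE))

In practice, the wrapper would also be validated against the standardizing XML schema before publication.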

Using the cookie data generated by the web interface, the data manager collected access statistics. The combination of publicly stating that the data was of potentially poor quality and the evidence of widespread usage created a consensus to address the problem. Unfortunately, the only resource available to work on rectifying it was a clerk-level position—no software could be purchased, given the pricing of tools for that specific purpose and budget restrictions on new software.

The data manager hired a college intern with good data skills to fill the clerical position, and after one year was able to report that the intern had reduced the list by 200,000 bad records. The savings were calculated using the lowest presorted postal rate of $0.20 per letter. Avoiding postage on 200,000 letters, 24 times each year at $0.20 per letter, amounted to more than $200,000 in savings even after taking into account the clerical salary and benefits.

The well-articulated savings led to the creation of a second clerk position the following year. During that year, with two people working on the problem, another 600,000 physicians were removed from the MCL; this second reduction resulted in more than $800,000 in savings. It was quite obvious to the company that the investment in data quality was paying off. Going into the third year, the data manager understood that as the list approached its true size, it was necessary to manage expectations. The savings could not be made infinite by continuing to shrink the MCL.

And then a funny thing happened.

A business-process owner approached the data manager with a simple question: “Could the pharmacy data be put into the XBP just like you did with the MCL?” The pharmacy data had problems similar to those experienced with the MCL. “If we pay for the salaries, can you put some more interns on the problem and achieve similar results?” The answer was, of course, yes, and the demonstrated savings were substantial. Earlier we talked about how successful adoption of an XBP would create internal pushes for further adoption, and this is a great example of just that phenomenon. Then an even stranger thing happened.

Someone noticed that the MCL and the MPL (Master Pharmacy List) datasets could be combined in new ways now that duplicates and incorrect entries had been removed. The combination had been possible before, but the data was so questionable that the result was worthless. The marketing and sales-forecasting groups embraced the new combinational capabilities and made several important breakthroughs in understanding their customers without any data mining or other sophisticated data-analysis techniques! The request for the MPL was followed almost immediately by another business-generated request, this time for the Master Distribution List. The resulting new combinations of data appealed to a broader group within the company, including the logistics/distribution group. Imagine if, at this point, the company had proposed removing the XBP—the users would have howled in protest.

As the process began to snowball, other parts of the business became eager to get their data into the XBP, because they saw that it saved them money, time, and effort. Data was quickly becoming known as either good or bad. Data moved into the XBP was seen as good because its characteristic metadata, including estimates of quality, was published, much as travel web sites publish the on-time records of various flights. Data outside the XBP was seen as potentially questionable.

It is hard to imagine a better way to introduce organizations to data quality than through the use of XBPs. They permit organizations to quantify the problem and provide a means of helping users to understand the difference between good- and bad-quality data. XBPs represent an important tool in our approach to developing long-term products from data management.

Creating a Transitional Data Model

The last category of data-management products is something that business users must be trained to ask for and data managers trained to produce. Most projects involve transitioning from one data source, structure, or format to another.* The major classes of transformation include the following:

• Data exchange is the process of sharing data with other groups within and external to the organization.

• Data evolution is the process of migrating data and its associations from the system being reengineered to the target system.

• Data integration is required to achieve both data exchange and data evolution.

All of these tasks benefit from a very tangible data-management output called the Transitional Data Model (TDM). The TDM is a narrowly focused model that accounts for the four stages of data understanding that must be considered for all data evolution projects. The four stages are illustrated in Figure 7.19 and are described below, in implementation order:


Figure 7.19 Four stages of transformation in data models.


They correspond to the “As-Is Data Implementation” (lower left), “As-Is Data Design” (upper left), “To-Be Design” (upper right), and “To-Be Data Implementation” (lower right) quadrants of Figure 7.19, which together illustrate the evolution of the TDM. The transformations are performed by the data-management team and presented to the SMEs for validation. The data-management team begins by selectively extracting existing processes and technology components and representing them on a one-for-one basis in a physical model.

The scope of this effort is confined to the data being evolved and the required contextual data elements. It is an enterprise architecture component, but not the architecture itself. The next activity concentrates on taking the physical representations and stripping them of their technological aspects and characteristics: conceptually evolving the components to a logical, technology-independent representation of what exists today. For example, in Oracle a field might be implemented as a 35-character area to store a last name, but the important part is that a last name is being captured and that it is a string—this typifies the difference between physical and logical representations. The goal is to focus the model components on what functions they support, as opposed to how the technology provides that support below the technology line.
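
A minimal sketch of that stripping step, using a hypothetical physical column definition that echoes the Oracle example above, might look like this:

# Physical, technology-bound description of a column as it exists in the
# legacy system (hypothetical fragment of an Oracle table).
PHYSICAL_COLUMN = {
    "table": "CUST_MSTR",
    "column": "LST_NM",
    "datatype": "VARCHAR2(35)",
    "nullable": False,
}

def to_logical(physical):
    """Strip technology-specific characteristics, keeping only business meaning.

    The business entity and attribute names come from SME validation, not from
    code; what the code strips is the "how" (VARCHAR2, length 35) in favor of
    the "what" (a required last-name string).
    """
    return {
        "entity": "Customer",
        "attribute": "Last Name",
        "domain": "string",
        "required": not physical["nullable"],
        "source": f"{physical['table']}.{physical['column']}",  # traceability to the as-is model
    }

print(to_logical(PHYSICAL_COLUMN))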

As an example of the need to incorporate these stages into the TDM, consider how the first telephone voice response units (VRUs) were used. Typically, they determined which operator to route a call to based on your last name, state of residence, product type, and so on. This was done because the capacity of user workstations was measured in tens of megabytes rather than the gigabytes of today. Data strictly segmented by codes such as these is generally not flexible enough to re-implement in new systems without reorganization. Data must first be reorganized when, as in this example, the technical details of its implementation cause it to stray quite a bit from how it is used in the business context. Some companies have also moved VSAM files directly into database structures, with poor results. In one case, a daily job required 45 hours of runtime! In the past, data-management teams strove for perfection, but in today’s business environment it is more important to be good enough and fast enough to contribute than to be theoretically perfect. TDMs, while conceptually simple, are underused tools that business users need to learn about.

Greater Business and System-Reengineering Opportunities: Maintenance-Burden Reduction Strategies

In Chapter 4, we presented a detailed description of an ERP implementation example and illustrated the potential synergies between the two individually perceived opportunities. In this section, we present a second example of greater business and system-reengineering opportunities, illustrating how an XBP can be used to reduce legacy maintenance costs by up to four-fifths. These are solid, case-study-based examples of the return on investment discussed in the preceding sections.

Earlier in the chapter, Figure 7.7 showed how portals are used to reduce the cognitive burden placed on knowledge workers accessing different systems. While portal technology focuses on a common user interface, XBPs supplement the interface with a common data vocabulary. This capability is easily implemented in XBPs; it can also be built into non-XML-based portals, but the overhead is much higher. The message is simple: get rid of code that is expensive to maintain, and create increased demand for portal services. Both strategies are described below.

Get Rid of Expensive-to-Maintain Code

Projects often sound easier than they actually are to achieve. Before code can be eliminated, one must first find out how much is spent on maintenance annually and then discover where staff spend most of their time. Only then can one claim to have identified code that is expensive to maintain—that usually translates into specific program or system groupings. A goal statement is then formulated to focus the analysis. Once the code is identified, a trained application-reengineering team examines it using reengineering methods (see ongoing research at http://reengineering.org). A decision is made on the approach, and the “business rules” are extracted from the code. Business rules represent the portion of an application that directly supports the organizational goals. This code is repackaged as a web service and installed under the portal. That takes care of the common user interface, given the caveats we described previously in this chapter.
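
As a minimal sketch of that repackaging step, consider the following. The business rule (a discount-eligibility check) and the endpoint path are hypothetical stand-ins for whatever rules a reengineering team actually extracts; the point is that the extracted logic becomes a small service the portal calls instead of the legacy program.

from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import urlparse, parse_qs

def discount_eligible(order_total, years_as_customer):
    """Business rule extracted from the legacy program (hypothetical)."""
    return order_total >= 5000 or years_as_customer >= 10

class RuleService(BaseHTTPRequestHandler):
    def do_GET(self):
        # The portal calls, e.g., /rules/discount-eligibility?total=6200&years=3
        url = urlparse(self.path)
        if url.path != "/rules/discount-eligibility":
            self.send_error(404)
            return
        params = parse_qs(url.query)
        eligible = discount_eligible(
            order_total=float(params["total"][0]),
            years_as_customer=int(params["years"][0]),
        )
        body = f'<discount-eligibility eligible="{str(eligible).lower()}"/>'.encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/xml")
        self.end_headers()
        self.wfile.write(body)

if __name__ == "__main__":
    # Installed "under the portal": the XBP front end routes rule requests here.
    HTTPServer(("localhost", 8080), RuleService).serve_forever()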

The next task is to increase the scope for subsequent cycles of the data enterprise architecture. Using the bottom-up approach described in the chapter on Revised Data Management Goals, the components are developed with a narrow focus, applied to specific projects. Each application added to the XBP also has its data integrated via XML schemas. Automated management of the schema metadata means that when data-management challenges such as homonym and synonym questions arise, they can be readily identified and corrected through the XBP’s technical capabilities.
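
A rough sketch of how registered schema metadata surfaces those homonym and synonym questions follows. The element names and definitions are invented, and real tools compare structured definitions and data profiles rather than literal strings, but the principle is the same.

from collections import defaultdict

# Element metadata as the XBP might record it when each application's
# XML schema is registered (hypothetical entries).
REGISTERED_ELEMENTS = [
    {"schema": "sap_orders", "name": "customer", "definition": "party that places an order"},
    {"schema": "sales",      "name": "client",   "definition": "party that places an order"},
    {"schema": "warranty",   "name": "customer", "definition": "party registered as product owner"},
]

def vocabulary_conflicts(elements):
    """Flag homonyms (same name, different meanings) and synonyms (different names, same meaning)."""
    definitions_by_name = defaultdict(set)
    names_by_definition = defaultdict(set)
    for element in elements:
        definitions_by_name[element["name"]].add(element["definition"])
        names_by_definition[element["definition"]].add(element["name"])
    homonyms = {n: d for n, d in definitions_by_name.items() if len(d) > 1}
    synonyms = {d: n for d, n in names_by_definition.items() if len(n) > 1}
    return homonyms, synonyms

homonyms, synonyms = vocabulary_conflicts(REGISTERED_ELEMENTS)
print("Homonyms:", homonyms)  # "customer" carries two different meanings
print("Synonyms:", synonyms)  # "customer" and "client" name the same concept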

Increased Integration Creates Demand for Portal Services Instead of Coded Applications

Rapid access to information is key to winning the support of the knowledge workers who use the XBP. Workers quickly come to appreciate the advantages of the common interface. Somewhat more slowly, they come to appreciate the portal’s ability to connect or prohibit connections between data items.

Any data item within the portal can be integrated with any other data item, or the XBP can report them as “not able to be integrated.” Figure 7.20 shows how data from System A, an SAP system, can be integrated with data from the sales system, even though the developers of the systems did not plan to integrate the two. Any knowledge worker with proper access can attempt to connect any two data objects without requesting that some connection be specially developed.

image

Figure 7.20 Top-tier portal demo.

Each item added is integrated with the portal and thus with the existing collection of data. The XBP can answer any integration question: the user drags an item from one application and drops it onto another to see whether the two are connectable. If they are, the user is presented with a range of choices. The connection happens via XML transformations, and the outputs can be reports, datasets, HTML pages, or XML. The higher, published quality of the data means that knowledge workers will increasingly want to source their data from the XBP.
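
As a rough sketch of the connectability check behind that drag-and-drop gesture, consider the following. The item metadata is invented, and a real XBP resolves matches through its schema registry and XML transformations rather than simple element-name comparison.

# Metadata the portal might hold for three draggable items (hypothetical).
SAP_MATERIALS = {"name": "materials", "elements": {"material-id", "plant", "unit-cost"}}
SALES_ORDERS = {"name": "orders", "elements": {"order-id", "material-id", "quantity"}}
HR_TRAINING = {"name": "training", "elements": {"employee-id", "course", "completed"}}

def connectable(item_a, item_b):
    """Report the shared elements two items could be joined on, if any."""
    shared = item_a["elements"] & item_b["elements"]
    if not shared:
        return f'{item_a["name"]} and {item_b["name"]}: not able to be integrated'
    return f'{item_a["name"]} and {item_b["name"]}: connectable on {sorted(shared)}'

print(connectable(SAP_MATERIALS, SALES_ORDERS))  # joinable on material-id
print(connectable(SAP_MATERIALS, HR_TRAINING))   # no shared elements

When a connection does exist, the join itself is realized as an XML transformation, with the result delivered as a report, dataset, HTML page, or XML document, as described above.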

Conclusion

At this point, we hope it is clear that XBP efforts are focused on incorporating the appropriate XML up front. After all, who wants to do the work once, knowing it will only have to be redone? The XBP represents a very powerful technology, capable of benefiting many groups simultaneously.

Recent developments in portal technology discussed in this chapter are combining to make this one of the most opportune times to work in information management. The three vital pieces are falling into place—the tools (XML-based portals), the techniques (data engineering), and the technologies (web-based information delivery). This chapter is meant to act as a guide to how these three fit together into the picture of better data management. We have gone to some effort to point out that this need not mean reworking everything in the organization; it can be taken step by step, according to what can be cost- and feature-justified along the way. Regardless of the merits of any technology, it would be foolish to suggest anything other than a step-wise approach to refining data management, given the complexity of existing systems.

Hopefully, those data managers who started with a distaste for the term “portal” will now be able to see the XBP as a fantastic opportunity in the context of real-world organizational data-management problems. Like any other technology, it cannot be implemented willy-nilly without planning, or it will be rendered as ineffectual as the last over-hyped technology that never seemed to work out. This chapter stresses the need to understand data structures as a way of pointing out that, in many cases, work must be done before organizations can start saving untold millions of dollars by implementing portals—they do not represent a quick fix or an easy way out. What they do represent is opportunity: to the extent that an organization would like to reduce its data-management costs, improve its decision-making ability, or tailor its data resources to aid strategic moves, it should be interested in closely examining XBP technology.

References

Aiken, P., Mattia, A., et al. Measuring data management’s maturity: An industry’s self-assessment. IEEE IT Professional; in press.

Aiken, P., White, E. Organizational data quality approached from the user interface. IEEE Computer. 2004.

Aiken, P.H. Data reverse engineering: Slaying the legacy dragon. New York: McGraw-Hill; 1996.

Aiken, P.H., Ngwenyama, O., et al. Reverse engineering new systems for smooth implementation. IEEE Software. 1999;16(2):36–43.

Brodie, M.L., Stonebraker, M. Migrating legacy systems: Gateways, interfaces & the incremental approach. San Francisco: Morgan Kaufmann; 1995.

Chikofsky, E., Cross, J.C., II. Reverse engineering and design recovery: A taxonomy. IEEE Software. 1990;7(1):13–17.

Finkelstein, C. An introduction to information engineering. Boston: Addison-Wesley; 1989.

Finkelstein, C. Information engineering: Strategic systems development. Boston: Addison-Wesley; 1993.

Finkelstein, C., Aiken, P.H. Building corporate portals using XML. New York: McGraw-Hill; 1998.

Lanham, T. Designing innovative enterprise portals and implementing them into your content strategies—Lockheed Martin’s compelling case study. Paper presented at the Web Content II: Leveraging Best-of-Breed Content Strategies meeting, San Francisco; 2001.


*Info World Electric web site: http://www.infoworld.com/cgi-bini/displayStory.pl?/features/990125eip.htm

*For additional reading, see Data Reverse Engineering Chapter 13, Indirect Outputs.
