5

Making and Using XML: The Data Managers’ Perspective

Introduction

Two trends in XML and data management technologies make the current state of the field difficult to sum up accurately. First, because of the XML hype, practically all vendors are relabeling their offerings to make it seem as if they take advantage of XML. Some do; others have not gotten it quite right. The second complicating factor is that the use of XML technologies is still relatively immature. As our collective use of XML and data management technologies matures, the means by which we incorporate the technology will evolve as we move rapidly up a steep learning curve. In short, we need to know more before we can determine how best to develop XML-based support for data management processes.

In this chapter, we will attempt to describe major categories of XML-based technologies in terms of whether they are used to capture (input), process, or produce (output) XML. This categorization reflects not only the main things that computers do (input, process, output) but also what is done with any type of data: data can be created, processed, or consumed. This is certainly not the only way these tools could be categorized, and there are some fuzzy boundaries, as many tools perform more than one operation. The reason for this categorization is that data managers typically think of XML in terms of which of the three facilities they most need. Some data managers are interested in XML in order to understand data coming from outside vendors. Others are interested in how XML aids internal processing. Still others are interested in exporting XML for use elsewhere.

This chapter is important for data managers who want to get a glimpse of which classes of tools are available to work with today. There are plenty of software companies that will speak in glowing terms about what is coming around the corner, but the discussion of data management technologies in this chapter is intended to give data managers an idea of the kinds of functionality they can rely on now. This discussion covers not just software that is used strictly for XML, but related software, such as modeling tools, that is already in many organizations. For some, working with XML-capable data management technologies is more about learning new tricks with the software they already have than about learning a new toolset altogether.

Input

There are three basic ways to get data into XML other than creating a new application capable of producing it. We discuss each of them below.

• XML editors

• CASE (computer-aided software engineering) tools

• Extracting metadata from legacy systems

XML Editors

An XML document is simply text. Therefore, it can be created in any text editor, such as Notepad, Emacs, or TextEdit. Many people can and do craft their XML by hand using this type of editor. It is likely, though, that many data managers will graduate to editors that have been designed specifically for creating and editing XML documents. These editors, such as XML Spy, XML Pro, and many others, provide graphical representations of an XML document to make the design process easier on the developer. While these editors have some of the features of computer-aided software engineering (CASE) tools, they are not meant to facilitate engineering directly, just to create documents, and are therefore classified as editors rather than CASE tools.

Figure 5.1 illustrates one basic feature supported by XML editors—the ability to visualize and manage the document’s structure from a metadata perspective. The figure shows an XML document describing an XML structure called Bookstore. Bookstore is composed of a sequence of books and books are composed of title, author, year published, etc. This type of rudimentary metadata management is crucial to managing and evolving XML structures. By briefly looking through a document, an author can assess what the purpose of the document is, and what type of metadata is captured in it. If the document conforms to a particular vocabulary standard, much more information might be available, including the domain of various data items, and other extended information.
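To give a concrete sense of what such a structure looks like in practice, the following minimal sketch is consistent with the Bookstore example described above; the element names and values are illustrative rather than taken from any particular product or figure.

    <?xml version="1.0" encoding="UTF-8"?>
    <bookstore>
      <!-- Each book carries the metadata items visible in the editor's structure view -->
      <book>
        <title>Data Management Essentials</title>
        <author>Jane Doe</author>
        <year_published>2001</year_published>
      </book>
    </bookstore>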


Figure 5.1 Screen shot from the XMLSpy web site, accessed 10/2001.

Figure 5.2 shows a demonstration screen shot from the XMLSpy web site. The illustrated scenario is an organizational chart, showing the tool’s ability to integrate the management of the following:


Figure 5.2 Demonstration screen shot from the XMLSpy web site. (From http://www.xmlspy.com/, accessed 9/2003.)

• XML document’s structure as an XML schema (OrgChart.xsd)

• Associated stylesheet (OrgChart.xsl)

• Integrated WSDL specification

• Document after it is rendered, using the stylesheet, into XHTML (OrgChart.xml)

• Error handling and debugging

  – SOAP debugger tool

  – XSLT debugging tool

• Governing XML structure for a related XML document (ipo.xml)

It makes sense to try out some of these tools because of their low entry cost. The cost is typically measured in hundreds of dollars rather than thousands, and many data managers find them useful if for no other purpose than to inspect and debug XML documents. A tool with integrated XML concepts will aid the learning process and typically covers its cost (and then some) in valuable learning experiences.
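Before moving on, it may help to see how a single document ties together several of the pieces in the list above. The sketch below is not reproduced from the XMLSpy sample files; the element content is invented, but the two declarations at the top use the standard mechanisms by which an XML document points to its rendering stylesheet and its governing schema.

    <?xml version="1.0" encoding="UTF-8"?>
    <!-- Render with the associated stylesheet -->
    <?xml-stylesheet type="text/xsl" href="OrgChart.xsl"?>
    <!-- Validate against the governing schema -->
    <OrgChart xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
              xsi:noNamespaceSchemaLocation="OrgChart.xsd">
      <Office location="Headquarters">
        <Person>
          <Name>A. Example</Name>
          <Title>Director</Title>
        </Person>
      </Office>
    </OrgChart>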

CASE Technologies

CASE, which stands for computer-aided software engineering, is a class of tools that is already heavily used in every type of organization. This particular classification is somewhat broad, but for the purposes of this discussion we will try to restrict it to software packages that help in systems development.

Figure 5.3 shows the first instance of a CASE tool incorporating what we consider desirable XML support. Since Visible Advantage has its origins with Clive Finkelstein, who is known as the father of information engineering, it was not surprising that it was the first to provide quality XML support. The figure shows just how integrated the process of managing the CASE tool metadata could be. While the entity “Person” is described in standard information engineering format, it can be simultaneously displayed and output as XML. This degree of integration is one of the most important aspects of CASE tool usage, specifically maintaining the model (in this case, the metadata of “Person”) separately from the way it is represented. Figure 5.4 illustrates the same degree of integration, in this case managing schemas concurrently with design objects.
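The actual Visible Advantage export format is not reproduced here, but the sketch below suggests what an XML rendering of an entity such as “Person” might look like when expressed as a simple DTD fragment; the element names are assumptions for illustration only.

    <!-- Hypothetical DTD fragment derived from an entity model; -->
    <!-- not the actual Visible Advantage output format.         -->
    <!ELEMENT Person (PersonID, LastName, FirstName, BirthDate?)>
    <!ELEMENT PersonID  (#PCDATA)>
    <!ELEMENT LastName  (#PCDATA)>
    <!ELEMENT FirstName (#PCDATA)>
    <!ELEMENT BirthDate (#PCDATA)>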


Figure 5.3 CASE tool support for XML provided by Visible Advantage. (From http://www.visible.com, accessed 10/2001.)


Figure 5.4 Managing schemas concurrently with design objects.

In many ways, the capabilities of Visible Advantage’s products are not representative of the CASE tool industry. CASE tool usage peaked around 1992–1993, when more than two-thirds of organizations used the technologies to help manage their metadata. Since 1993, CASE tool usage has slipped to less than one in three organizations! A number of data modelers have professed to us that PowerPoint is now their favorite “CASE tool.” The decline may be due in part to a lack of products that deliver what the data management community wants.

The drop in CASE tool usage has also been due to the usage myth illustrated on the left half of Figure 5.5. The old belief has been that all organizational metadata must “fit” into a single CASE technology. The problem with this is that access to the metadata from outside of the CASE tool has traditionally been very limited, which discourages widespread use. So it comes as no surprise that the metadata in these tools delivers limited value when few actually use it. Perhaps one of the reasons for the CASE tool myth is that software vendors have been eager to represent their products as something that could handle any situation, and the price tags on the products have provided added impetus for organizations to try to fit everything into one tool.


Figure 5.5 Competing models of CASE tool usage.

The “new” model of CASE tool usage is shown on the right of Figure 5.5. Notice how XML-based integration of the metadata is both the input and the architectural basis. If integrating data using metadata works, then consider how managing metadata with additional metadata will also help. Integrated metadata sits in an open repository, ready for a variety of CASE-based tools and methods to operate on it as utilities. The resulting metadata is widely accessible via web, portal, XML, database management system, and so on. The accessibility in turn drives continued use of the data.

Extracting Metadata From Legacy Systems

Figure 5.6 illustrates the reverse engineering challenge often encountered as organizations begin the process of integrating XML into their environments. The right-hand side of the figure indicates a realistic goal of identifying the size, shape, and other relevant capabilities of the various architectural components that comprise the current architecture. On the left side of the figure is information about the legacy environment—in this case, just counts of files, tables, and attributes. (This often comprises what is known about the current architecture.) What must be accomplished in order to make this metadata useful for XML and data engineering tasks is to obtain meaningful descriptions of the existing operational data architecture components so that they can be “wrapped” in XML.


Figure 5.6 What we actually know and what we’d like to know about organizational data structures.

Wrapping groups of data items together requires two specific considerations. First, wrapping must not invent new XML tags unless the noun (or the data object being referenced) is new. Data managers must find and assign the correct tag to each data item. Creating new tags must be the exception rather than the rule. A normal part of evolving knowledge of these structures will be to improve understanding by renaming attributes and structures as collective understanding matures. Second, notice must be taken of the organizing format and the process usage of the data structure in question. Understanding data structure variations and how processes use them opens another door; transformations to automate the mapping among the structures can be developed. These two points together mean that, overall, data managers should avoid reinventing the wheel with new tags. If, however, they are in situations where there is more than one tag for the same concept, as long as those concepts are well understood, it should be possible to build a bridge between them with a transformation expressed in XSLT.
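A minimal sketch of such a bridge follows. It assumes one vocabulary uses the tag “person” while another expects “INDIVIDUAL” for the same concept; an identity template copies everything else through unchanged.

    <?xml version="1.0" encoding="UTF-8"?>
    <xsl:stylesheet version="1.0"
                    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
      <!-- Identity template: copy every node as-is by default -->
      <xsl:template match="@*|node()">
        <xsl:copy>
          <xsl:apply-templates select="@*|node()"/>
        </xsl:copy>
      </xsl:template>
      <!-- Bridge rule: rename the element, keep its attributes and content -->
      <xsl:template match="person">
        <INDIVIDUAL>
          <xsl:apply-templates select="@*|node()"/>
        </INDIVIDUAL>
      </xsl:template>
    </xsl:stylesheet>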

In the future, when the metadata needs improvement or correction, that too can be done, because XML can be used both to maintain the structures and to perform transformations between structures. Programmatic changes to metadata can be accomplished by applying the same transformations to documents containing the metadata! The goal is to create new metadata and capture it in XML as a part of ongoing system enhancement efforts. Limited, focused analyses with specific goals will be much more successful than efforts guided by a general belief that data managers are doing the “right” thing by recovering metadata.

Let us look at an example, continued from the previous chapter, where this metadata evolution would be necessary. Many organizations are facing the implementation of enterprise resource planning (ERP) systems. As part of the implementation effort, data must be moved from legacy systems into the ERP, which requires a solid understanding of both the source metadata from the legacy system and the target metadata from the ERP.

Figure 5.7 shows how an ERP can be reverse engineered in order to develop a means of regularly extracting the metadata in XML form to a repository. This XML-based metadata is then used in the effort to move data into the ERP. We developed a repeatable procedure so that whenever the metadata changed (in this case, we were dealing with a PeopleSoft system), we could simply regenerate our version with a little work, resulting in all of the data and process metadata being wrapped in XML. That XML is then available for reuse in a number of different ways. The lesson here is that there are effective semi-automated and automated means of extracting the metadata from systems, both new and legacy.
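The exact repository format will vary by tool, and the PeopleSoft structures themselves are not reproduced here; the hedged sketch below simply shows the kind of extracted structural metadata that can be wrapped in XML and reused (all table, column, and mapping names are invented).

    <!-- Hypothetical extract of structural metadata from a legacy system -->
    <table name="EMPLOYEE" source="legacy-HR">
      <column name="EMP_ID"    type="CHAR"    length="9"  role="primary-key"/>
      <column name="LAST_NAME" type="VARCHAR" length="40"/>
      <column name="HIRE_DATE" type="DATE"/>
      <mapsTo target="ERP" table="PERSONNEL" note="verified in analysis session"/>
    </table>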


Figure 5.7 Reverse engineering new systems. (Adapted from Aiken and Ngwenyama, 1999.)

Since the previous means of extracting this information was manual and resource intensive, this is good news. In some instances, reverse engineering the metadata from an existing system in order to express it in XML will be trivial. Many CASE tools have newly added functionality permitting them to extract the data structures automatically in entity-relationship diagram (ERD) notation and sometimes in XML as well. In other situations, you will be faced with the need to develop a reverse engineering analysis without such aid.

But how is that done? Figure 5.8 shows the most common means of achieving the understanding required of architectural components—group analysis sessions (sometimes referred to as JAD or “joint application development” sessions) using a large-screen projector. This information is included along with our discussion of CASE tools precisely because it is often needed even when there is software automating some of the process. Using the developing model components, business users, subject matter experts, technical personnel, and management gain a collective understanding of the component in question.


Figure 5.8 Invaluable data analysis technology used during XML development activities.

The mode of operation in these sessions has been completely transformed by the increasing maturity of data analysis software. Comparing old and new approaches to reverse engineering data structures is like night and day. Prices for this type of software can start around $5,000 but will more typically rise to the equivalent of a full-time employee (FTE) if one is to obtain “industrial” analysis strength.

Figures 5.9 and 5.10 illustrate the key difference in the process: the amount of time and the type of involvement required on the part of the subject matter experts (SMEs) and business users is much smaller. Their role is to describe the use of data in their business environment and to verify that the data models accurately capture this understanding. Now, however, we want to develop our understanding of the XML structures concurrently with the development of the data models.


Figure 5.9 Division of time and labor required using manual data analysis.


Figure 5.10 Division of time and labor required using semi-automated and automated data analysis is significantly less than using manual methods.

Figure 5.9 shows that the majority of the weekly sessions were devoted to sessions held with the business users/SMEs. These sessions were unfortunately subject to the “terror of the blank screen” when modelers/facilitators would welcome everyone to the meeting and say, “Now tell me about your business!” Normally the business users were given only three mornings during the week to tend to existing duties—time when the modelers would refine the component models and formulate questions for the next joint session.

Figure 5.10 shows a different operation, with just two afternoons reserved for joint business user/SME development activities. The data analysis software technologies (also sometimes called data profiling packages) are based on a powerful inference technology that allows data engineers to quickly form candidate hypotheses with respect to the existing data structures. During the two afternoon joint sessions, the modelers present the various hypotheses to the SMEs, both business and technical, who confirm, refine, or deny them. This allows existing data structures to be inferred at a rate that is an order of magnitude more rapid than previous manual approaches. From this point, the model component is just a few transformations away from a normalized logical model. It is at the logical level that data redundancies are identified and exploited. Data analysis technologies provide a means of following a formal path to logical understanding by profiling within columns, across rows, and among tables (Olson, 2003).

After examining CASE tool approaches and analysis sessions as methods of creating XML input, we will now turn our attention to various data management technologies that process and interpret XML data. Processing XML is the second of the three classes of data management technology discussed in relation to XML.

Processing XML

Once a data manager has handled the task of obtaining or creating XML data, there is a need for tools that aid the process of actually accomplishing meaningful work with that data. This section deals with various technologies available to process XML and what they mean to data managers—starting with XML databases and servers. These two types of technologies aid in the storage and use of XML data.

The following discussion of XML databases/servers has been adapted from Steve Hamby’s Understanding XML Servers (2003) because it is difficult to identify a better way of organizing and presenting the material. XML databases and servers can be categorized according to three functions: integration servers, mediation servers, and repositories. Individual product offerings may include one or more of these classes of functionality.

XML Integration Servers

Think of a switchboard that can translate requests and data, both in the form of XML documents, and you will understand XML integration server technologies. The best way to grasp the difference between integration servers and mediation servers is to look at the level of granularity they are optimized to serve. Integration servers are optimized to support document exchange via physical connectivity and semantic mapping. XML mediation servers are generally focused on addressing format and structure transformations at the data structure and individual data element levels.

XML integration servers have typically been used to interface legacy systems with each other and with other environments. Conceptually, integration servers act as message-oriented middleware. Using XML schemas or DTDs as structure identifiers/validation techniques, XML integration servers provide translation and connection services whose functionality and scope can be extended using various Enterprise Application Integration (EAI) adapters. Each legacy system could call the integration server to provide application program interfaces (APIs), document-level message exchange, and mapping of document names.

Integration servers can issue queries to access messages from the various interfaces they provide to legacy applications. In some instances they function as hubs, facilitating the exchange of documents such as purchase orders among diverse business partners—a topic that will be discussed at some length in the chapter on XML frameworks. For example, imagine an XML document called “Purchase Order” from company X that could trigger a rule transforming the metadata of the document into something that company Y knows as an “M242 purchase order.” A document can actually carry around information that makes it possible for it to translate itself to be understandable by others.
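To make the example concrete, the sketch below shows the kind of rule such a server might apply. The element names on both sides are assumptions rather than any partner’s actual format; the point is simply that an XSLT rule reshapes company X’s document into the structure company Y expects.

    <xsl:stylesheet version="1.0"
                    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
      <!-- Reshape company X's PurchaseOrder into company Y's M242 structure -->
      <xsl:template match="/PurchaseOrder">
        <M242PurchaseOrder>
          <OrderNumber><xsl:value-of select="PONumber"/></OrderNumber>
          <Buyer><xsl:value-of select="BillTo/CompanyName"/></Buyer>
          <xsl:for-each select="Items/Item">
            <LineItem sku="{PartNumber}" qty="{Quantity}"/>
          </xsl:for-each>
        </M242PurchaseOrder>
      </xsl:template>
    </xsl:stylesheet>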

Perhaps best of all, the integration multiplier goes from geometric growth in complexity to linear growth. As each application is added, it becomes accessible to the other connected applications. The exchange cannot be valuable to the organization until the semantic understanding has been mapped from application to application. This can be accomplished using a series of XSLT transformations that move data from one format and structure to another.

XML Mediation Servers

XML mediation servers are generally focused on document transformation—changing individual data items or document data structures. As organizations adopt XML technologies, they tend to implement them from a technical perspective instead of from a semantic understanding perspective. This harkens back to the earlier discussion of how organizations often start with XML by wrapping what they have in simple XML tags, rather than understanding the structure of what they have and expressing that in XML. We illustrate an important point in our lectures by asking audiences to consider the following real-life situation. According to our survey of the industry, only about one in five organizations formally specifies the technology used to build new system functionality. The scenario that we ask audiences to consider is this:

One year ago, an organization had literally hundreds of data items duplicated among dozens of applications. Because of books like the one you are reading, technical specialists within the operational support group become interested in XML capabilities and spontaneously begin to develop standard vocabularies. Now, one year later, the organization has literally dozens of XML “standard” vocabularies, where the year before it had none. Question: Is the organization better or worse off than it was one year ago?

Often the first thought is to react to the silliness of having multiple “standards” and to think of the organization as worse off than it was a year ago, before it spent the year doing duplicative, inefficient metadata engineering. However, please also consider the following positives, which help to indicate the power XML has to assist with duplicative data problems—these in turn can help organizations justify the cost of harmonizing duplicative data.

• The organization has gained multiple person-years’ worth of working XML experience. Even if it was not coordinated, the short learning curve will make it very easy to coordinate and focus the newly acquired XML expertise.

• The organization now has good working knowledge of which data and data structures are more “valuable” to the organization, by identifying which were valuable enough for people to bend their efforts toward. Results of some of the mini projects implemented over the last year yield solid, reusable metadata. The Pareto subset of the data (the small fraction that delivers most of the value) should be easy to complete.

• Most, if not all, of the XML name and structure changes can be made and maintained using XSLT, and often can make use of namespaces. For example, for one type of change, the key is to change all instances of the use of the XML tag “person” to the tag “INDIVIDUAL.” As discussed earlier, this is possible when there is an understanding that they are one and the same for the purposes of the data structure.

• The most value has come from the past year’s focus on developing an understanding of how names are used by the business users. If we were to hold a data naming meeting, we might get folks to show up if we offer free cookies. Face it, data standardization discussions do not sound exciting. It works much better if you tell the business users, when they ask you to use XML on their application, that you need to have them name everything correctly. This they do readily, and once the initial vocabularies are implemented, they are tested right away. Feedback is sure and swift, so it is easy to make changes to the existing rules.

We contend that using XML and learning from the exercises far outweighs the negatives of having developed multiple standards. Indeed, the synergies are so apparent that this capability to reuse and manage metadata creates quite a business case for spending the additional 5% required to capture the metadata. Architecturally similar to integration servers, mediation servers are easily extensible transformation engines. Frequently built around GUI-based drag-and-drop functionality, these systems often have as their real goal making the maintenance of the rules for transforming data items and data structures among various “services” part of the business user’s tasks, removing the support requirement from IT.

XML Repository Servers

XML repository servers are optimized to store XML metadata and documents. They are a relatively new technology designed to handle XML more efficiently than existing technologies that were not built to support XML. The current crop of big-iron databases (DB2, Oracle, Informix, Adabas, etc.) has had XML features added or has been modified to incorporate XML. These are XML-enabled databases—defined as conventional databases that have been fitted with some kind of front-end XML adaptor to manage the storage of data from XML documents, system transactions encoded in XML, and so on. XML repository servers are typically native XML databases, architected from the ground up to manage XML and allow XML documents to be stored as XML internally.

This leaves data engineers with an important and difficult-to-reverse design decision: will the fundamental data structure be data-centric or document-centric? XML documents have hierarchical structures, resulting in the need for complex mappings and software tricks to store or retrieve the documents from an XML-enabled relational database. Systems are typically either data-centric (think tabular data) or document-centric (non-tabular data). The general rule seems to be that data-centric XML documents are more appropriate for XML-enabled databases, while document-centric XML documents are more appropriate for native XML databases. This makes sense, since XML-enabled databases can store tabular data that happens to be represented in XML with ease, while non-tabular data lends itself to more efficient storage in its native XML format.
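The contrast is easiest to see side by side. Both fragments below are invented, but the first is regular and tabular (data-centric) while the second mixes prose and markup (document-centric).

    <!-- Data-centric: regular rows and columns; maps cleanly to an XML-enabled RDBMS -->
    <orders>
      <order id="1001" customer="C-17" total="249.00"/>
      <order id="1002" customer="C-42" total="87.50"/>
    </orders>

    <!-- Document-centric: irregular, mixed content; better suited to a native XML database -->
    <memo>
      <para>The <term>third-quarter forecast</term> was revised after the
      <ref order="1002">second order</ref> arrived late.</para>
    </memo>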

XML repository server use is growing, reaching roughly one in three organizations. Repositories do have the distinct advantage of being able to store metadata using the same technologies that are used to store the XML documents. This means that a standard vocabulary can be implemented at the same level as data access security, and that the bad habit of using expired data item tag names can be avoided by enforcing the use of the standard vocabulary. The repository servers also offer services such as compression, automated XML wrapping, and XSLT transformation capabilities.

As you might imagine, the features of the XML frameworks (to which we have devoted a later chapter) are closely supported by XML repository (and other) server types. Figure 5.11 shows two popular solutions and how the various services can provide single-source data in output formats suitable for mobile phone, PDA, printer, browser, and CD. These services are offered under a panoply of acronyms, including:


Figure 5.11 Ipedo (left) and Tamino (right) feature articulations from their web sites. (Courtesy of Ipedo, www.ipedo.com, and Software AG, www.softwareag.com/tamino.)

• Web Services

• SOAP (Simple Object Access Protocol)

• Java and EJB (Enterprise JavaBeans)

• DOM (Document Object Model)

• .NET (Microsoft’s initiative)

• HTTP (Hypertext Transfer Protocol)

These operate in concert with even more technologies and approaches, including query manager, e-commerce, content management, portals—okay, you get the picture.
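As a small point of reference for the first two items in the list above, a SOAP message is itself just an XML document. The sketch below uses the standard SOAP 1.1 envelope namespace; the body content and the service it implies are invented.

    <soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
      <soap:Body>
        <!-- Hypothetical request to a book-price lookup service -->
        <GetBookPrice xmlns="http://example.org/bookstore">
          <Title>Data Management Essentials</Title>
        </GetBookPrice>
      </soap:Body>
    </soap:Envelope>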

Outputting XML

Now that we have covered ways of inputting and creating XML, and also ways of processing it and consuming what was generated, it is time to take a look at data management technologies that address XML output. The first that we will examine are the XML converters.

XML Converters

XML converters focus on the development of XML documents that are specified by non-technical users via drag-and-drop GUIs. Figure 5.12 illustrates how one of these products facilitates transformation of documents to XML, and then further to a BizTalk framework or a Tamino server. Many organizations are faced with situations where they have to convert existing documents into XML, and the XML converter class of tools is an aid to the process. One typical approach involves a user highlighting a key portion of a document and training the program about the context information around that piece of the document that makes it interesting. The XML converter then takes that context information and looks for it both within the sample document and in other documents that are fed to it. Structured information can be extracted in this way, which is then encoded in an XML format specified by the user and output to any number of different destinations (in the case of Figure 5.12, Tamino and BizTalk).


Figure 5.12 XML converter. (Courtesy of Itemfield, Inc., www.itemfield.com.)

XML conversion tools at the time of this writing tend to be less sophisticated than some users expect. Rather than presenting a tool that “figures out” what is in the document, the software packages provide more of an interface for users to teach the software which parts of the documents are meaningful, where those parts are located, and how they should be expressed in XML. In other words, the trained user is really doing most of the work. Still, the tool provides a number of facilities to speed the process, and to generalize the lessons taught by the user and apply them to many subsequent documents. Using this approach, users can develop a set of rules for one semi-structured document, and then have the XML converter run the same set of heuristics against documents of similar structure.
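A hypothetical before-and-after illustrates the division of labor. Suppose the user highlights the values in one line of a semi-structured invoice and identifies the labels around them as the context; the converter can then emit XML of whatever shape the user specified. The sample line and the field names below are invented.

    Input line:
        Invoice No: 4417   Date: 03/15/2003   Amount Due: $1,250.00

    Extracted XML:
        <invoice>
          <invoice_number>4417</invoice_number>
          <invoice_date>2003-03-15</invoice_date>
          <amount_due currency="USD">1250.00</amount_due>
        </invoice>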

The advantages to these packages for data managers are that, as of this point, they represent one of the best options for extracting data from unstructured and semi-structured documents. Typically, if an organization were to develop a custom approach for a specific set of documents, the results might be slightly better than those from an XML converter. Given the time and expense of that approach, though, many find that the use of an XML converter is a reasonable compromise. The disadvantages to these packages are that they tend to be rather pedantic in terms of what data they are looking for, and there is quite a bit of variation among the different packages in terms of how accurately they can generalize the rules they were taught to new documents. Overall, this class of tools is definitely worth taking a look at for data managers who need to convert large numbers of documents into XML. However, they are not without their drawbacks.

Generating XML Automatically or Semiautomatically

Data analysis software technologies, CASE tools, and applications themselves are proving to be useful sources of XML and the associated semantic understanding required to make data accessible using XML. It is helpful to consider the metadata models used by the technologies as the basis for your mapping efforts.

When we refer to these XML generating tools, we are speaking of applications that already contain some data, which can then be represented or “exported” as XML. In some other cases, however, the application may not actually have data, but only a means to fetch it, such as in a report-writing application that connects to a database to pull data it reports on. The semi-automatic XML generators will typically ask the user a set of questions about how the XML should be structured, which tags should be used, and so on. The automatic XML generators create output usually according to a standard DTD or XML schema, with minimal user intervention. Automatic generation is not necessarily more desirable than semi-automatic generation. Data managers usually find that it depends on the situation, what they are generating, and where it needs to go afterward.

To make sense of the many types of software that generate XML automatically or semi-automatically, we can put them into two broad categories. The first and more desirable category is the tools that generate XML as a “view” of a competent data model that the application holds internally. A good example of this would be Visible Advantage, which was discussed at the beginning of the chapter. These tools have an excellent generic data model that is held in the application, and when it comes time to generate the XML, it is performed by outputting an XML description of that competent data model. In this way, the application knows how the data is structured, and the XML is just one way of looking at that structure.

The other category of tools that generate XML is those that “tack on” XML support to an existing product. In contrast to packages like Visible Advantage, these applications have internal data models that may not be compatible with a useful XML description of the data. Still, these applications can export XML by having the application do something of an internal translation between the way it understands the data and the way the user wants to see it in XML. In practice, this approach is much less desirable. Symptoms of this approach include:

• Data items that are visible in the application, but not in the output XML files. Alternatively, data items that are specified in the XML documents do not show up in the application after the XML is imported. This happens when there is no way for the application to translate the items into the appropriate places between the internal data model and the XML description.

• XML “inflexibility.” This happens when there are few or no options for generating different forms of XML, or when XML that is input into the tool must always have exactly the same structure. Tools with appropriate XML support have at least some input flexibility, for example, the ability to add extra (perhaps non-standard) metadata fields that can later be manipulated in the tool.

• XML “exclusivity.” Some tools generate XML output that is in a format that cannot be understood by anything but the tool that generated it, because of its odd structure (for example, a particular proprietary schema that nothing else interoperates with). This is clearly undesirable, since XML documents that cannot be understood by anything else defeat one of the central purposes of XML.

These symptoms typically indicate a mismatch between the way the application understands the data, and the way it is represented in XML. It is still possible to work with these applications; they are simply a bit less desirable. There is hope, however—some tools that offered initial XML support in this way have managed to mature over time to more useful methods of generating XML. Furthermore, the number of tools with good XML generation capabilities (such as Visible Advantage) is growing.

Data Layers/Data Services

The last class of XML technologies is a particular type of framework that supports data delivery. Called data layers or data service layers, these technologies permit programmatic delivery of XML data. Taking the capabilities of the servers described in the previous section, and then adding to them the data service layer, produces several distinct advantages:

• The layers can be directly model-driven, in that changing the model produces changes to the XML, causing subsequent changes in the mapping.

• To the extent that your non-tabular data can be packaged with its metadata, there exists integrated support for its classification and reuse.

• Internal query integration capacities permit data of virtually all types (images, email messages, spreadsheets, streaming data, HTML documents, XML documents, and web services) to be accessed using common queries.

• The data abstraction layer approach enables developers to focus on information rather than data connectivity issues.

The combination of all three server capabilities provides additional synergies as data managers discover the productivity of having these features integrated with the modeling and transformation features. To really understand the power of this integration, consider a scenario illustrating how data management could change for organizations that adapt to use the technology effectively. Organizations without the ability to document past successes and repeat the guidance gained from those successes will be unable to take advantage of these or other data management technologies—more on this at the end of this section. One organization has placed online demos of the integrated capabilities, accessible on the web, that describe them.*
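Servers in this space typically expose an XML query language across the layer. The sketch below uses standard XQuery syntax purely as an illustration; the collection name and element names are assumptions, and individual products may use their own dialects.

    (: Hypothetical query over a collection of purchase-order documents :)
    for $po in collection("orders")/PurchaseOrder
    where xs:decimal($po/Total) > 500
    return
      <flagged id="{$po/@id}">{$po/Buyer/text()}</flagged>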

Data Management Maturity Measurement (DM3)

To close this chapter, we would like to introduce an idea that is important to data management—that of differing levels of maturity. Data management is a complicated discipline, and there are clearly some people and practices that are ahead of others in terms of their data management ability. Throughout this book so far, we have discussed the need for well-structured and understood data. After going over the tools in this chapter, however, it is vital to point out that advanced metadata management and data transformation are difficult to take advantage of if some of the more basic aspects of data management have not been addressed.

How can we assess the data management maturity of a particular organization? The tool that was developed for this process is the data management maturity measurement, or DM3 survey. By assessing how advanced the efforts are within a particular organization, data managers can be in a position to know which types of efforts are likely to succeed, and which are likely to fail. In addition, such an assessment provides information about the types of things organizations need to do in order to get ahead with their data management. The discussion of the DM3 survey here is intended to get data managers thinking about what their organizations can and cannot do, based on how mature their practices are.

Five levels of developmental maturity are applied to software in the capability maturity model and to data in the DM3. The DM3 is based on the widely appreciated capability maturity model (CMM) developed by the Software Engineering Institute.* The DM3 survey database contains data management maturity measurements of more than 250 organizations, enabling researchers to assess the current state of practice both between and within industries. Each organization is assessed relative to basic data management areas. Application of the XML technologies described in this chapter must be performed by organizations able to demonstrate data management maturity at least equal to the “Repeatable” level of the data management maturity measurement (DM3). Organizations not yet at that level may not have as easy a time demonstrating savings derived from its application.

In Figure 5.13, the intersecting aspects of the DM3 survey are shown. The rows correspond to different tasks that are performed as part of an overall coordinated data management effort. The columns correspond to levels of maturity for each task. The “Initial” level is for organizations that have addressed a particular task, but in an informal or unstructured way. The “Repeatable” level means that at some point, the organization has taken the time to define a process for accomplishing a particular task, and that it is possible to repeat that process. The “Defined” level means that not only is there a process in place, but it is actually published throughout the organization and its use is encouraged. This is often quite a step up from simply having the process. The fourth level of maturity is “Managed,” which deals with whether efforts are made to determine if the process is working or not. What this means is that having a process is not enough—the question is, Does the organization know that the process is and continues to be the right process? Finally, maturity level five, referred to as “Optimizing,” addresses whether or not an organization uses a feedback loop to constantly improve the process that is used. Organizations that rank at maturity level five typically make a concerted effort to find out if the process is breaking down, and if so, what the alternatives are and how they can be fed back into the process to improve it. This is referred to as an “optimizing feedback loop.”


Figure 5.13 Five levels of developmental maturity and five data management areas of the DM3.

The DM3 survey has shown itself to be an extremely effective tool in assessing data management maturity. There is a strong correlation between overall DM3 score and success in data management projects, as well as strategic advantages gained from use of data. We typically recommend that data managers take a look at issues that the DM3 brings up, simply because it is difficult for an organization’s efforts to improve unless it knows where it stands.

Chapter Summary

The key to understanding how to effectively implement XML technologies is to see them as extensions of data management technologies. GUI-based converters such as ItemField’s are really data rationalization tools. Now you can hand them to the knowledge workers and ask them to help you understand their uses of data.

In this chapter, we have covered input, process, and output of XML data with discussions of various approaches to XML problems, and we have looked at the data management maturity that is necessary to apply these technologies effectively. When these technologies are tied together with coherent data structure and sound engineering principles, they can be tremendous assets. Going back to the architectural and engineering analogy, if the foundation of a building is well built and the architectural plans for the building are specified and understood, the technologies outlined in this chapter can act as the heavy lifting machinery that allows the construction of something truly astounding.

References

Aiken, P.H., Ngwenyama, O., et al. Reverse engineering new systems for smooth implementation. IEEE Software. 1999;16(2):36–43.

Hamby, S. Understanding XML servers. Paper presented at the DAMA/Metadata Conference, Orlando, FL, April 2003.

Olson, J.E. Data quality: The accuracy dimension. San Francisco: Morgan Kaufmann; 2003.

The data management program definitions are defined by and used with permission from Burton G. Parker: Parker, B.G. Enterprise-wide data management process maturity. Auerbach Data Base Management, 1999.
