Chapter 8

Storing: XML and Databases

8.1 Introduction

The act of querying XML obviously requires that there is XML to be queried. What most standards related to querying XML do not address is the question of where that XML is found.

In this chapter, we discuss several ways in which XML documents can be made available for querying. Among these are ordinary computer file systems, websites, relational database systems, XML database systems, and other persistent storage systems. Such persistence facilities may present a single XML document at a time, or they might provide the ability to query a collection of documents at once. Another source of XML, however, does not require persistent storage but involves XML that is presented to a client (such as a querying facility) as it is generated. The capability of generating XML (usually dynamically) and transmitting it to one or more clients in “real time” is often called streaming. Querying XML that is persistently stored offers several advantages and challenges, while querying streaming XML presents other advantages and challenges.

As you read this chapter, you’ll learn about the differences in ways that XML can be stored (persistent XML) along with the advantages and challenges involved in querying that persistent XML. The mechanisms for storing persistent XML data range up to enterprise-level database systems, with all of the robustness, scalability, transaction control, and security that such systems offer.

You’ll also learn about the advantages and challenges associated with queries evaluated against XML streams. Such data might be broadcast for consumption by many clients (stock ticker data, for example) or might be streamed to a single client (real-time communication systems, such as instant messaging). The common thread is that data, once transmitted, cannot be retrieved a second time.

There is also a middle ground in which XML is often used: message queuing systems. Such systems often require that data be stored in some temporary location until it can be transmitted to its consumer, but they rarely involve long-term persistence of the data. Such data is sometimes queried while it resides in its temporary storage locations and sometimes when it has been released from storage and is being transmitted to a receiving agent – and thus behaves more like streamed data.

8.2 The Need for Persistence

A great deal of the XML data most people encounter today is stored somewhere – that is, it is persistent. Storing XML data persistently makes a great deal of sense for data that may be used many times, especially when that data has a high value and may have been expensive, even difficult, to create.

Examples of such XML abound: Our movie collection is documented in an XML document; corporations are increasingly likely to store business data like purchase orders in an XML form; many technical books are being produced from XML sources; the W3C’s specifications themselves are all coded in XML; even computer applications’ initialization and scripting information is increasingly represented in XML. Of course, different types of information present different requirements for persistent storage. Some sorts – such as the books owned by a publisher – probably need to be retained for lengthy periods of time, while others – messaging data, for example – might have a lifetime measured in seconds or minutes. The various mechanisms discussed in the remainder of this section easily support the wide variety of requirements for storing XML.

8.2.1 Databases

A database, according to the Wikipedia,¹ is “an information set with a regular structure.” A database system, or database management system (DBMS), is thus (for our purposes, at least) a computer system that manages a computerized database. While it’s not unknown for some people to apply the term database management system to extremely primitive data management products, the term is most often used to describe systems that provide a number of important characteristics for data integrity. Among these characteristics are:

• Query tools, such as a query language like SQL or XQuery

• Transaction capabilities that include the so-called ACID properties: atomicity of operations, consistency of the database as a whole, isolation from other concurrent users’ operations, and durability of operations even across system crashes

• Scalability and robustness

• Management of security and performance, including registration and management of users and their privileges, creation of indices on the data, and provision of hints for the optimization of operations
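To make the atomicity property a little more concrete, here is a minimal sketch using Python's built-in sqlite3 module. The docs table and the store_all helper are hypothetical names of our own, chosen only for illustration; any DBMS with transaction support should behave analogously.

```python
import sqlite3


def store_all(conn, docs):
    """Insert every (id, xml_text) pair in one transaction; return True on success."""
    try:
        with conn:  # commits if the block succeeds, rolls back if it raises
            conn.executemany("INSERT INTO docs (id, body) VALUES (?, ?)", docs)
        return True
    except sqlite3.IntegrityError:
        return False


conn = sqlite3.connect(":memory:")
# Hypothetical table: one serialized XML document per row.
conn.execute("CREATE TABLE docs (id INTEGER PRIMARY KEY, body TEXT NOT NULL)")

ok = store_all(conn, [(1, "<movie/>"), (2, "<movie/>")])
# The second batch reuses id 1, so the whole batch (including id 3) is undone.
failed = store_all(conn, [(3, "<movie/>"), (1, "<duplicate/>")])
count = conn.execute("SELECT COUNT(*) FROM docs").fetchone()[0]
```

After the second call, the table still holds only the two rows from the first batch: the insert of id 3 was rolled back along with the failing one.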

Several types of database management systems are in wide use by enterprises of all sorts, but we believe that only three are commonly employed to store and manage XML data: relational, object-oriented, and “pure XML.” All of these types of database inherently provide the ability not only to store and retrieve XML documents but also to search that data through the use of query languages of some sort. Querying XML data in a DBMS is probably more effective than querying XML data stored in other media, if for no other reason than the existence of various performance-enhancing features of a DBMS, such as indices.

It is worth noting one important consideration when storing XML in a database system: XML, by definition, is based on the Unicode character set.² Not all database systems support Unicode, and some support Unicode only when that character set was chosen when the database system was installed or when the specific database was created. Increasingly, however, we see that all of the major relational database systems are being updated to employ Unicode internally – implying that this may no longer be a serious issue in a few years. We have not investigated the status of Unicode in object-oriented DBMSs, but the fact that many of them have Java interfaces suggests that they may use Unicode internally. Naturally, pure XML databases will always use Unicode internally.

Relational Databases

You won’t be surprised to hear that a very large fraction of persistent XML is found in relational databases, right along with other data vital to an enterprise’s business. Most large businesses today – and an increasing percentage of smaller businesses – depend on relational databases to store and protect their data.

Relational database management systems (RDBMSs) have been on the scene since the early 1980s and have arguably become the most widely used form of DBMS. The billions of dollars that have been invested into commercial relational database systems (such as Oracle’s Oracle database, IBM’s DB2, and Microsoft’s SQL Server) have given them formidable strengths in the data management environment. Such systems are tremendously scalable, often able to handle thousands of concurrent users accessing many terabytes – even petabytes – of data.

Some say that the relational database systems – because of the two decades and billions of dollars invested in their infrastructure and code, their proven ability to adapt to new types of data, and their entrenchment in so many organizations – might never be superseded in the marketplace by other, more specialized database products. Whether this is mere hubris or a realistic view of the world, we see that the vendors of RDBMS products are adapting very quickly to a world in which XML support is a major requirement.

Starting in roughly 2001, most commercial relational database vendors began adding support for XML data into their products. Initially, the focus was on merely storing XML documents and retrieving them in whole, without the ability to perform any significant operations on the content of those documents. Some systems merely stored serialized XML data in character string columns or CLOB (character large object) columns, while others explored ways of breaking the XML data down into component elements, attributes, and other nodes for storage into columns in various tables. (This latter mechanism, commonly called shredding the XML, is discussed further in Section 8.2.3.)
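The "store it whole, retrieve it whole" approach can be sketched in a few lines. This is only an illustration, assuming SQLite's TEXT type as a stand-in for a character string or CLOB column; the xml_docs table and the sample document are hypothetical.

```python
import sqlite3
import xml.etree.ElementTree as ET

conn = sqlite3.connect(":memory:")
# Hypothetical table: the serialized document lives in a single text column.
conn.execute("CREATE TABLE xml_docs (doc_id INTEGER PRIMARY KEY, body TEXT NOT NULL)")

source = "<movie><title>Example Title</title><runningTime>120</runningTime></movie>"
conn.execute("INSERT INTO xml_docs (doc_id, body) VALUES (?, ?)", (1, source))

# The database knows nothing about the document's structure; it can only
# hand back the whole string, which the client must parse for itself.
(body,) = conn.execute("SELECT body FROM xml_docs WHERE doc_id = 1").fetchone()
title = ET.fromstring(body).findtext("title")
```

Note that any question about the content (here, the title) is answered by the application after retrieval, not by the database.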

As the vendors’ experience with – and customers’ requirements for – XML grew, the products gained more direct support for XML as a true data type of its own. A native XML type (see Section 8.3) was defined for the use of database designers and application authors. New built-in functions (see Chapter 15, “SQL/XML”) were developed to transform ordinary relational data into XML structures of the users’ choice. And a variety of ways were invented to query within XML stored in that native XML type, including the ability to invoke XPath and XQuery (see Chapter 9, “XPath 1.0 and XPath 2.0,” Chapter 10, “Introduction to XQuery 1.0,” Chapter 11, “XQuery 1.0 Definition,” and Chapter 15, “SQL/XML”) on that XML. In addition, these products have been given the ability to support XML metadata, largely in the form of XML Schema (see Chapter 5, “Structural Metadata”).

Of course, we may be biased by our years of participation in the relational database world, but we believe that RDBMS products are rapidly becoming as fully capable of managing XML data as they are of managing ordinary business data.

Object-Oriented Databases

In the late 1980s and early 1990s, a new form of DBMS was introduced to the data management marketplace, the object-oriented database management system (OODBMS). Unlike the RDBMS products, OODBMS products suffered from not having a formal data model on which their design was based. As a result, the meaning of the term OODBMS varied widely between implementations. What they all had in common, of course, was that they managed objects instead of tuples of attributes or rows of columns.

Arguably, the real world is better represented as a collection of objects, each having a state (data about the individual object) and behaviors (functions that implement common semantics of classes of objects). Object-oriented programming languages (OOPLs) were coming into prominence (and have since tended to dominate some application domains), and it was natural to want to persistently store the objects being manipulated in OOPL programs. Some OODBMSs took the approach of allowing individual objects (or classes of objects) handled by a particular OOPL program to be “marked” with a flag that indicated whether or not the object (or members of the class) were to be automatically placed into persistent storage – without any specific action (e.g., a “store” command) taken by the program. Others made the OODBMS an integral part of the OOPL so that storing and retrieving objects was done completely seamlessly without any application code involved. Still others required that the OOPL programs explicitly store and retrieve objects when the program made the decision to do so.

What was generally missing from all of these OODBMS products was a common query language that allowed applications to locate objects based on their states and to retrieve information about specific objects. The RDBMS world had standardized on the database language SQL, so the OODBMS community³ decided to adapt SQL for use as a query language in their world; the result of that adaptation is a language called OQL, which is a search-and-retrieval-only language without built-in update capabilities.

A significant portion of the XML community views XML as naturally object-oriented (for example, every node in an XML document has unique identity, as do objects in all object-oriented systems). Consequently, when XML became a significant market force, we expected that the Object Data Management Group (ODMG) would quickly move to incorporate this new type of data, if only by adapting an XML data model like the DOM (Document Object Model)⁴ for use in the context of ODMG. While the owners of the ODMG standard have not yet published a new version with explicit XML support, a group of academics did just that in a system they called Ozone.⁵ Subsequently, an open-source effort providing an Ozone database system⁶ was established. The documentation of this effort states that “ozone [sic] includes a fully W3C-compliant DOM implementation that allows you to store XML data.”

We are unaware of any significant presence in the marketplace of OODBMS products that incorporate explicit support of XML as a data type (in the sense that the Ozone system does, at least). This may be due to the fact that OODBMSs in general have found secure niches in the data management community and that those niches have little need for XML except as a data interchange format. It may also be due to the fact that many (but not all) relational database systems have embraced object technology and are popularly known as object-relational database management systems (ORDBMSs). In any case, we do not perceive a near-term movement toward the use of OODBMS products for large-scale management of XML data.

Native XML Databases

We were not surprised that a number of start-up companies as well as some established data management companies determined that XML data would best be managed by a DBMS that was designed specifically to deal with semistructured data – that is, a native XML database.

But what, exactly, is a native XML database? One resource we found⁷ defines it in terms of three principal characteristics:

• Defines a (logical) model for an XML document

• Has an XML document as its fundamental unit of (logical) storage

• Is not required to have any particular underlying physical storage model

Undoubtedly, the most important of those three criteria is the first one, the definition of a model for XML documents. As you’ve seen elsewhere in this book (e.g., Chapter 5, “Structural Metadata,” and Chapter 6, “The XML Information Set (Infoset) and Beyond”), a number of data models for XML are in current use. The specific model chosen for a native XML database system is less important than the requirement that it support arbitrarily deep levels of nesting and complexity, document order, unique identity of nodes, mixed content, semistructured data, etc.

Unfortunately for companies that invested heavily in the development of what we call “pure XML” database systems, the widely accepted definition of “native XML” database systems doesn’t exclude other existing technologies. The definition cited earlier makes it clear that relational database systems can provide all of the required characteristics of a native XML database. This can be done either by building an XML-centric layer atop a relational system or by incorporating new XML-specific facilities directly into relational engines. Of course, that doesn’t mean that there is no marketplace for pure XML DBMSs. However, we suspect that, like OODBMSs before them, pure XML DBMSs will find small but secure niches for themselves where they satisfy very specific needs that are not targeted by RDBMS (or ORDBMS) products.

8.2.2 Other Persistent Media

While a great proportion of enterprise XML data is managed by explicit database management systems, we believe that a large majority of XML in the world today does not get stored in DBMSs at all. Instead, XML documents are found in ordinary operating system files and on web pages. A quick search of just one of our computers found several thousand XML documents – most of which we didn’t even realize were there, since they were created as part of the installation of several software products.

The advantage of storing XML documents in ordinary files on your own computer is, of course, that everybody with a computer has a file system – while most of us don’t (yet) have formal DBMSs installed on our computers or even unrestricted access to our organizations’ DBMSs. Better yet, those files are completely under your control and not governed by some database administrator somewhere in your organization. Of course, there are disadvantages as well: You’re usually responsible for backing up your own files, lack of transactional control makes data loss more likely, and the problems of keeping track of perhaps thousands of XML files are quite tedious. Perhaps more importantly, there is usually no way to enforce any consistent relationships among those thousands of XML files – those documents that specify configuration information for software products might define the same operating system environment variable in multiple, incompatible ways.

Some people argue that a single XML document can be a sort of “database-in-a-file.” If you take this sort of approach, you would just mark up your data on the fly, making up tag names as you go. Unfortunately, unless you write a good XML Schema to validate that document, it’s awfully difficult to keep that data internally consistent, because you might use different “spellings” of tags to represent the same conceptual entity (<SerialNumber> one time, <SerNum> another time, <Serial-num> a third, all to represent the serial numbers of products you own). We recommend strongly against such an approach to storing your data, although the concept might be very useful for transporting your data from one environment to another – that is, as a data exchange representation.
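The tag-spelling problem is easy to demonstrate. The sketch below is not XML Schema validation; it is a much cruder stand-in that merely compares the element names found in a document against a list of expected names. The inventory document, the allowed set, and the unexpected_tags helper are all hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def unexpected_tags(xml_text, allowed):
    """Return {tag: occurrences} for every element name not in `allowed`."""
    root = ET.fromstring(xml_text)
    seen = Counter(elem.tag for elem in root.iter())
    return {tag: n for tag, n in seen.items() if tag not in allowed}


# Hypothetical "database-in-a-file" with three spellings of one concept.
inventory = """
<inventory>
  <product><SerialNumber>A-100</SerialNumber></product>
  <product><SerNum>B-200</SerNum></product>
  <product><Serial-num>C-300</Serial-num></product>
</inventory>
"""

allowed = {"inventory", "product", "SerialNumber"}
problems = unexpected_tags(inventory, allowed)
```

A query that looks only for SerialNumber elements would silently miss two of the three products, which is exactly the kind of inconsistency a schema is meant to catch.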

XML documents that are found across the World Wide Web probably don’t outnumber those found in ordinary file systems, but you are personally likely to find more web-available XML documents than there are XML documents on your personal file system. The problem with those web documents is that a given website may or may not be “reachable” at any given time, making access to those documents somewhat less dependable at any moment than access to your own documents.

That, of course, has implications for querying those XML documents. A query facility that accesses files stored in your local file system always has access to those files (subject only to the availability of your file system), whereas a query facility that searches data on the web may sometimes find a given document and other times not find it because of websites going offline temporarily (or permanently).

Nonetheless, we believe there is a market for XML querying tools that don’t depend on the existence of a DBMS but that search XML documents in local file systems and across the web. Many of these tools will implement XQuery, while others may provide some other query language.

8.2.3 Shredding Your Data

In Section 8.2.1, under the subheading “Relational Databases,” we mentioned that some relational database vendors provided a way for XML documents to be broken down into their component elements, attributes, and other nodes for storage into columns in one or more tables. It can be argued that such shredding of XML documents does not preserve the integrity – the “XML-ness” – of those documents. While that argument is probably valid for some shredding implementations, other implementations manage to preserve the XML-ness of the documents. In fact, such implementations usually provide options that allow the user to control what level of XML-ness must be preserved. Vendors of those products typically provide a variety of ways of reconstructing the XML documents from the shredded fragments. What many of the shredding implementations do not do particularly well is to allow queries to be written that depend heavily on complex structures in some XML documents or that search for data located at arbitrarily deep levels of nesting.

The purpose of shredding is to improve (relative to character string or CLOB – character large object – representations, that is) the efficiency of access to the data found in XML documents. When XML serves the same purposes as its ancestor SGML – that is, representation of documents, such as books and technical reports – the data represented in the XML is semistructured by nature. However, XML is also used to represent much more regular, or structured, data, such as purchase orders and personnel records. Most people would not consider shredding an appropriate way of handling books or magazine articles marked up in XML. Instead, it is much more likely to be used for dealing with data-oriented XML.

Shredding can be done in a very naïve manner, such as defining a SQL table for each element type (at least those allowed to have mixed content) in a document, with columns for each attribute, the nonelement content of those elements, and the content of child elements that are not allowed to have element content themselves. For simple documents like most of the movie examples you’ve already seen in this book, that naïve approach might not be completely inappropriate, as illustrated in Example 8-1 and Table 8-1. (You may recall that a similar example appeared in Chapter 1, “XML,” in our introduction to the various ways in which XML data can be stored.)

Table 8-1 Result of Shredding Movies Document

Example 8-1 Shredding an XML Document into a Relational Database

First, the XML to be shredded:

Now, the definitions of (reasonable) SQL tables into which the shredded XML data will be placed:

The data shown in Table 8-1 contains something that the input document did not contain: an id code for each movie and each director. Since the input didn’t contain those id codes, from where did they come? Well, the application that performed the shredding simply had to make them up.
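A naïve shredder of this kind might be sketched as follows. Everything here is an assumption made for illustration: the element names in the sample document, the movies and directors tables, and the shred function are hypothetical and are not the ones used in Example 8-1 or Table 8-1. The point to notice is the pair of counters that "make up" the id codes.

```python
import sqlite3
import xml.etree.ElementTree as ET
from itertools import count

# Hypothetical input; the element names are assumptions.
MOVIES_XML = """
<movies>
  <movie>
    <title>First Example</title>
    <runningTime>95</runningTime>
    <director><familyName>Alpha</familyName><givenName>Ann</givenName></director>
  </movie>
  <movie>
    <title>Second Example</title>
    <runningTime>142</runningTime>
    <director><familyName>Beta</familyName><givenName>Bob</givenName></director>
  </movie>
</movies>
"""


def shred(xml_text, conn):
    """Split each movie element into one movies row and one directors row."""
    conn.execute("CREATE TABLE directors (id INTEGER PRIMARY KEY, "
                 "family_name TEXT, given_name TEXT)")
    conn.execute("CREATE TABLE movies (id INTEGER PRIMARY KEY, title TEXT, "
                 "running_time INTEGER, director_id INTEGER REFERENCES directors(id))")
    # The input carries no ids, so the shredder invents them.
    movie_ids, director_ids = count(1), count(1)
    for movie in ET.fromstring(xml_text).iter("movie"):
        director = movie.find("director")
        director_id = next(director_ids)
        conn.execute("INSERT INTO directors VALUES (?, ?, ?)",
                     (director_id, director.findtext("familyName"),
                      director.findtext("givenName")))
        conn.execute("INSERT INTO movies VALUES (?, ?, ?, ?)",
                     (next(movie_ids), movie.findtext("title"),
                      int(movie.findtext("runningTime")), director_id))
    conn.commit()


conn = sqlite3.connect(":memory:")
shred(MOVIES_XML, conn)
rows = conn.execute(
    "SELECT id, title, running_time, director_id FROM movies ORDER BY id").fetchall()
```

This sketch is deliberately simplistic: a director who appears in two movies would receive two different id codes, something a more careful shredder would want to avoid.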

Now that the data has been shredded, applications are dealing with purely relational data and can write ordinary SQL statements to query and otherwise manipulate that data. At this point, it’s trivially easy to write SQL queries to find out the longest movie in our collection:

Similarly, to know the name of the director of the longest movie, we could join data from two tables:
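Both queries might look something like the following minimal sketch, which assumes hypothetical movies and directors tables (with made-up column names and sample rows) and runs the SQL through Python's sqlite3 module.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical shredded tables and sample rows, for illustration only.
conn.executescript("""
CREATE TABLE directors (id INTEGER PRIMARY KEY, family_name TEXT, given_name TEXT);
CREATE TABLE movies (id INTEGER PRIMARY KEY, title TEXT, running_time INTEGER,
                     director_id INTEGER REFERENCES directors(id));
INSERT INTO directors VALUES (1, 'Alpha', 'Ann'), (2, 'Beta', 'Bob');
INSERT INTO movies VALUES (1, 'First Example', 95, 1), (2, 'Second Example', 142, 2);
""")

# The longest movie: compare each running time with the maximum.
longest = conn.execute("""
    SELECT title, running_time
    FROM movies
    WHERE running_time = (SELECT MAX(running_time) FROM movies)
""").fetchall()

# Its director: join the two tables on the invented id codes.
director = conn.execute("""
    SELECT d.given_name, d.family_name
    FROM movies AS m
    JOIN directors AS d ON d.id = m.director_id
    WHERE m.running_time = (SELECT MAX(running_time) FROM movies)
""").fetchall()
```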

What’s a bit harder to do is to reconstruct the original structure of the input. In order to restore the original XML document from that shredded data, a somewhat complicated SQL query would have to be written to discover the names of the tables and columns (using the standardized SQL schema views such as the TABLES view and the COLUMNS view, unless the table names are known a priori by the application), then join the various tables together on their respective PRIMARY KEY and FOREIGN KEY relationships, and finally construct the resulting XML document. We leave the writing of such a sequence of SQL statements as an exercise for the reader; after all, most vendors of shredding-capable relational systems provide tools that reproduce the original XML document automatically.⁸ We note, however, that such relational systems normally aim to preserve a data model representation of the XML documents and not the actual sequence of characters that may have been provided in the serialized XML input. The ordering of XML elements (remember that elements in an XML document have a defined and stable order) is preserved in those systems by a variety of techniques – “magic” – that may involve the assignment of some sort of sequence numbering scheme to sibling elements of a given parent.
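One of the simplest forms that such "magic" could take is a sequence-number column. The sketch below assumes a hypothetical movies table whose seq column records each movie element's position among its siblings; the table, the column names, and the rebuild function are ours, not those of any particular product.

```python
import sqlite3
import xml.etree.ElementTree as ET

conn = sqlite3.connect(":memory:")
# Hypothetical shredded table; rows are stored out of document order on purpose.
conn.executescript("""
CREATE TABLE movies (id INTEGER PRIMARY KEY, seq INTEGER, title TEXT, running_time INTEGER);
INSERT INTO movies VALUES (10, 2, 'Second Example', 142), (11, 1, 'First Example', 95);
""")


def rebuild(conn):
    """Reconstruct a movies document, using seq to restore sibling order."""
    root = ET.Element("movies")
    query = "SELECT title, running_time FROM movies ORDER BY seq"
    for title, running_time in conn.execute(query):
        movie = ET.SubElement(root, "movie")
        ET.SubElement(movie, "title").text = title
        ET.SubElement(movie, "runningTime").text = str(running_time)
    return ET.tostring(root, encoding="unicode")


document = rebuild(conn)
```

The result is equivalent to the input at the data model level, but whitespace, quoting style, and similar serialization details of the original are not recovered.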

More complex XML documents, like those you’ll undoubtedly find throughout your organization’s business documents, don’t lend themselves to naïve shredding techniques. The tools doing the shredding often permit users knowledgeable about the data to give clues about how the shredding should be performed (sometimes using a graphical interface) or to “tweak” the table and column definitions before the XML-to-relational mapping is finished.

There will always be a use for shredding, particularly in applications that merely receive structured data in an XML format and always need to store it as ordinary relational data.⁹ However, with the increased emphasis in all major relational database implementations on true native XML support, we believe that shredding is going to diminish in popularity for most applications. It’s only fair to note, however, that implementers continue to come up with more and more sophisticated shredding techniques targeted at a variety of usage scenarios.

8.3 SQL/XML’s XML Type

In Chapter 15, “SQL/XML,” you’ll read about a relatively new part of the SQL standard¹⁰ designed to allow applications to integrate their XML data and their ordinary business data in their SQL statements.

The centerpiece of SQL/XML is the creation of a new built-in SQL type: the XML type. Logically enough, the name of the type is “XML,” just as the type intended for storing integers is named “INTEGER.”

The design of SQL/XML’s XML type makes it a true native-XML database type. Therefore, if you were to create a SQL table with a column of type XML, the values stored in that type must be XML values, and those values retain all of their “XML-ness.” In SQL/XML:2003, the XML type was based on the XML Information Set, about which you read in Chapter 6, “The XML Information Set (Infoset) and Beyond.” The next edition of SQL/XML¹¹ replaces its use of the Infoset with the adoption of the XQuery 1.0 and XPath 2.0 Data Model (discussed in Chapter 10, “Introduction to XQuery 1.0”). Along with the adoption of the XQuery Data Model, the basic definition of the XML type will be updated accordingly.

Of course, that does not mean that SQL/XML implementations are required to store values of the XML type in a collection of data structures that are isomorphic to the XQuery Data Model descriptions. Implementations might choose to store serialized XML documents and dynamically parse them into data model instances whenever they are referenced, or they might store some other already-parsed representation that can be mapped onto the data model definitions when required. In fact, implementations could even choose to shred (fully or partially) those XML values, as long as the process is transparent to applications. The internal storage details of XML type values are left up to the implementation, in the same way as the corresponding details of DATE and FLOAT values are the concern of only the implementation.

With the advent of the XML type in SQL, concerns such as “CLOB vs. shredding” will, for the most part, become even less visible to the application developer. XML will be stored in XML columns, and native SQL facilities (augmented, when desired, by XQuery) will be used to manipulate those XML values.

8.4 Accessing Persistent XML Data

Neither XQuery nor SQL (nor, for that matter, any query language) exists in a vacuum – in spite of the fact that they are generally specified as though nothing else existed. Instead, applications are typically written in one or more other programming languages, such as C/C++, Java, and even COBOL. When those applications require access to a query language, they must use some sort of API to cause their queries to be executed and the results to be materialized in the host language environment.

Most of the more conventional programming languages (such as C and COBOL) access SQL database systems by invoking a call-level interface such as SQL/CLI¹² or one of the various proprietary APIs that correspond to SQL/CLI. SQL/XML:2003 did not provide SQL/CLI extensions to deal with the XML type, but that was a deliberate choice. Because languages like C and COBOL do not have built-in data types for XML, all results of SQL statements that return a value of the XML type are implicitly cast to character string (that is, serialized) before the result is given to the invoking program.

Java programs typically access SQL database systems through the JDBC API.¹³ The current version of JDBC, 3.0, contains no provisions for exchanging XML values between a Java program and a SQL DBMS. The spec does say that it “does not preclude interacting with other technologies, including XML, CORBA, or nonrelational data,” but it offers no additional information about how such interaction should be done (other Java-related specifications provide those capabilities). It’s not inconceivable that the next version of JDBC, 4.0, will offer more direct support for access to XML data handled by SQL database systems, but no details of any such capability are available at the date of publication.

There are, however, proprietary JDBC API extensions offered by a number of vendors of SQL database engines and by vendors of middle-tier (“middleware”) facilities. Nonetheless, the “most standard” way for Java programs to access the XML data stored in SQL databases is for them to retrieve XML data using JDBC’s getObject() method and then to cast the retrieved object to an XML class defined in another Java-related specification, such as JAXP.¹⁴ At that point, the interfaces defined in that other specification can be employed to handle the XML data.

On the horizon is another API that will assist Java programs in accessing persistent XML data, whether it’s stored in a relational database system, an object-oriented database system, a pure native-XML database system, or flat files. This API, called XQJ,¹⁵ “will define a set of interfaces and classes that enable an application to submit XQuery queries to an XML data source and process the results of these queries.” In other words, it will provide a direct interface from Java programs to XML data sources without those programs having to intermix multiple APIs, such as JDBC and JAXP.

At the time of writing, an Early Draft Review version of the XQJ specification is available at the URI referenced in footnote 15. While that document is decidedly incomplete, it allows interested parties to gain an idea of what the final API will provide. We encourage our readers to become familiar with XQJ, because we believe that it will be one of the dominant APIs for querying and updating XML data from Java applications.

8.5 XML on the Fly: Nonpersistent XML Data

Throughout this chapter so far, we have focused on XML data that is persistently stored on various media. In fact, the rest of this book tends to discuss querying XML from the viewpoint of persistent storage. There are significant advantages to be had when the XML data to be queried is persistently stored. For example, query processors might be able to access specialized data structures (such as indices) to improve a query’s performance.

But not all applications find it suitable to store XML data persistently before querying it. For example, XML data containing stock market quotations might be broadcast to WAP-enabled cell phones that are programmed to alert their owners whenever particular stocks achieve a particular price. Not only are the phones generally incapable of storing very large quantities of data, but the nature of the data stream is unsuitable for storage before querying.

In particular, such data streams are literally never-ending – they may continue uninterrupted for months on end, perhaps with each stock quotation represented as a separate XML document. In addition, the queries are supposed to detect the specified conditions immediately and not after periodic store-and-query episodes.

Consequently, XML querying systems must be able to process XML documents that never exist on any persistent medium but that are only temporarily stored (perhaps in RAM) while the query is evaluated against them. There are several reasons why querying streaming XML is problematic. Consider the XML document shown in Example 8-2, in which we’ve incorporated a large number of stock ticker elements into a single document for illustrative purposes.

Example 8-2 Streamed XML Document

Now imagine a query that must retrieve the current price of XMPL if and only if the preceding 10 trades all increased in price. Further, imagine that there are hundreds, even thousands, of stockTicker elements represented by the ellipses (…). A query that examines this XML document – as it streams past – is forced to evaluate information without having access to all of the information in the document. In this case, the query would retrieve information from “this stockTicker element’s tradePrice child element,” if and only if “this stockTicker element’s preceding sibling stockTicker element’s tradePrice child element” had a lesser value, and that stockTicker element’s preceding sibling stockTicker element’s tradePrice child element had a lesser value than that, and so on until the 10th preceding sibling stockTicker element’s tradePrice child element matched the required criterion.
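A query of this kind can be approximated over a stream by remembering only the last few prices. The following is a sketch under stated assumptions, not a general streaming query processor: the stockTickers wrapper element and the rising_prices function are hypothetical, only trades for the requested symbol are counted, and "increased" is taken to mean strictly greater than the previous trade.

```python
import io
import xml.etree.ElementTree as ET
from collections import deque


def rising_prices(stream, symbol, run_length=10):
    """Yield a trade price whenever it completes `run_length` consecutive increases."""
    recent = deque(maxlen=run_length + 1)  # the only history that is kept
    context = ET.iterparse(stream, events=("start", "end"))
    _, root = next(context)  # first event: start of the document element
    for event, elem in context:
        if event != "end" or elem.tag != "stockTicker":
            continue
        if elem.get("symbol") == symbol:
            recent.append(float(elem.findtext("tradePrice")))
            prices = list(recent)
            if len(prices) == run_length + 1 and all(
                    earlier < later for earlier, later in zip(prices, prices[1:])):
                yield prices[-1]
        root.clear()  # drop elements already seen so memory stays bounded


# Hypothetical sample stream; a run length of 3 keeps the example short.
trades = [("XMPL", 10), ("OTHR", 99), ("XMPL", 11), ("XMPL", 12),
          ("XMPL", 13), ("XMPL", 12.5), ("XMPL", 14)]
sample = "<stockTickers>" + "".join(
    f'<stockTicker symbol="{s}"><tradePrice>{p}</tradePrice></stockTicker>'
    for s, p in trades) + "</stockTickers>"

hits = list(rising_prices(io.BytesIO(sample.encode("utf-8")), "XMPL", run_length=3))
```

Only the trade at 13 is reported: it is the only one preceded by three successive increases. The fixed-size buffer is what makes this feasible; a query needing unbounded history could not be handled this way.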

In general, access to an element’s ancestors and preceding siblings (and other “reverse axis” nodes) requires the ability to traverse “backwards” in the document. But how can that be done when the document is too large for available storage? In general, it cannot. Because the stream relentlessly flows past, there’s no way to go back “upstream” to capture data that has already gone by. And there lies the principal difficulty in querying streaming XML. There are (again, in general) only two ways to resolve this problem:

1. Queries can be prohibited (syntactically or by means of execution-time checks) from accessing nodes reachable only through the use of one of those reverse axes.

2. Queries are permitted to access such nodes only in documents (or document fragments) sufficiently small to be handled using limited resources.

Most streaming XML query processors choose one of these two alternatives.
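The second alternative can be illustrated with a minimal Python sketch that keeps only a small, fixed-size window of recent prices rather than the whole document. It relies on several assumptions that are not taken from Example 8-2: the function name rising_prices and its run parameter are hypothetical, every stockTicker element is assumed to carry a symbol attribute and a numeric tradePrice child, and only trades for the requested symbol are counted as “preceding trades.”

```python
import io
import xml.etree.ElementTree as ET
from collections import deque


def rising_prices(stream, symbol="XMPL", run=10):
    """Yield a trade price only when it and the `run` preceding prices
    for `symbol` form a strictly increasing sequence."""
    # Bounded buffer: the current price plus `run` predecessors, nothing more.
    window = deque(maxlen=run + 1)
    context = ET.iterparse(stream, events=("start", "end"))
    _, root = next(context)  # the first event is the start of the root element
    for event, elem in context:
        if event == "end" and elem.tag == "stockTicker":
            if elem.get("symbol") == symbol:
                window.append(float(elem.findtext("tradePrice")))
                prices = list(window)
                if len(prices) == run + 1 and all(
                    a < b for a, b in zip(prices, prices[1:])
                ):
                    yield prices[-1]
            # Discard elements that have already streamed past.
            root.clear()


if __name__ == "__main__":
    # Hypothetical input; the root element name "tickers" is an assumption.
    body = "".join(
        '<stockTicker symbol="XMPL"><tradePrice>%d</tradePrice></stockTicker>' % p
        for p in range(1, 13)
    )
    doc = io.BytesIO(("<tickers>" + body + "</tickers>").encode())
    print(list(rising_prices(doc)))
```

The memory needed depends on the size of the window, not on the length of the stream, which is what makes the reverse-axis condition tractable here. A query that had to look arbitrarily far back could not be handled this way.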

Queries against streaming XML are best suited for small XML documents and relatively simple queries, perhaps involving a transformation of source XML into a more desirable form of XML or directly into HTML or even plain text. Another form of query eminently suitable for streaming applications is the sort that depends solely on “very local” data. For example, if we wanted to know the trade price of XMPL every time a trade was recorded, it’s quite easy to detect those elements as they stream past and to supply the value of the tradePrice child element whenever a stockTicker element whose symbol attribute has the value “XMPL” is seen.
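A “very local” query of this kind might be sketched in Python with the standard library’s incremental parser. The function name xmpl_prices and the root element name in the sample input are hypothetical; the stockTicker element, its symbol attribute, and its tradePrice child are taken from the description above.

```python
import io
import xml.etree.ElementTree as ET


def xmpl_prices(stream, symbol="XMPL"):
    """Yield the tradePrice text of each matching stockTicker element
    as soon as that element has been completely read."""
    context = ET.iterparse(stream, events=("start", "end"))
    _, root = next(context)  # the first event is the start of the root element
    for event, elem in context:
        if event == "end" and elem.tag == "stockTicker":
            if elem.get("symbol") == symbol:
                yield elem.findtext("tradePrice")
            # Nothing earlier in the stream is needed again, so drop it.
            root.clear()


if __name__ == "__main__":
    # Hypothetical sample input standing in for a live stream.
    sample = io.BytesIO(
        b"<tickers>"
        b'<stockTicker symbol="XMPL"><tradePrice>10.5</tradePrice></stockTicker>'
        b'<stockTicker symbol="OTHR"><tradePrice>3.0</tradePrice></stockTicker>'
        b'<stockTicker symbol="XMPL"><tradePrice>11.0</tradePrice></stockTicker>'
        b"</tickers>"
    )
    print(list(xmpl_prices(sample)))  # prints ['10.5', '11.0']
```

Because each result depends only on the element currently being read, the sketch never needs to look backwards and can discard every element immediately after inspecting it.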

8.6 Chapter Summary

In this chapter we have explored the various facilities through which XML data can be stored persistently and the implications for querying such persistent XML. We’ve explored the pros and cons of using database technology vs. ordinary file systems for storing and querying XML documents, and we’ve looked at shredding as a mechanism for storing XML documents in ordinary relational (or, indeed, other sorts of) databases. We’ve also examined the SQL standard’s new built-in XML type, its relationship to shredding, and the implications for the APIs that application programs use to access SQL database management systems. Finally, we reviewed the nature of streaming XML, its uses, and the difficulties raised when querying such nonpersistent XML data.

Our conclusion, which we hope is clear from the text, is that most applications are better served by storing XML in some persistent medium and then querying that persistent XML data. Only when the XML data is inherently unsuitable for storing, we believe, are queries against streaming XML desirable.

¹ Wikipedia, The Free Encyclopedia, http://en.wikipedia.org.

² The Unicode Standard, Version 4.1.0 (Mountain View, CA: The Unicode Consortium, 2005). Available at: http://www.unicode.org/versions/Unicode4.1.0/.

³ R. G. G. Cattell (ed.), et al., The Object Data Standard (ODMG 3.0) (San Francisco: Morgan Kaufmann Publishers, 2000).

⁴ Document Object Model (DOM) Level 3 Core Specification Version 1.0 (Cambridge, MA: World Wide Web Consortium, 2004). Available at: http://www.w3.org/TR/DOM-Level-3-Core.

⁵ Serge Abiteboul, Jennifer Widom, and Tirthankar Lahiri, A Unified Approach for Querying Structured Data and XML (1998). Available at: http://www.w3.org/TandS/QL/QL98/pp/serge.html.

⁶ The Ozone Database Project, http://www.ozone-db.org.

⁷ Kimbro Staken, Introduction to Native XML Databases (2001). Available at: http://www.xml.com/pub/a/2001/10/31/nativexmldb.html.

⁸ In fact, such tools often do not produce a new XML document that is identical in every respect to the initial document. Differences often include changes in nonsignificant white space and the exact representation of literals (canonical form for such literals may be used instead).

⁹ For those who need to do shredding (or, in a more generalized sense, mapping of XML to relational data), a number of XML mapping products make that task easier. Some with which we are familiar are Altova’s MapForce (http://www.altova.com), Oracle’s XDB schema processor and the Schema annotations it supports (http://www.oracle.com), and IBM’s DAD (Document Access Definition) component of DB2’s XML Extender (http://www.ibm.com).

¹⁰ ISO/IEC 9075-14:2003(E), Information Technology – Database Languages – SQL – Part 14: XML-Related Specifications (SQL/XML) (Geneva, Switzerland: International Organization for Standardization, 2003).

¹¹ FDIS (Final Draft International Standard) 9075-14:2005, Information technology – Database Languages – SQL – Part 14: XML-Related Specifications (SQL/XML) (Geneva, Switzerland: International Organization for Standardization, 2005).

¹² ISO/IEC 9075-3:2003(E), Information Technology – Database Languages – SQL – Part 3: Call-Level Interface (SQL/CLI) (Geneva, Switzerland: International Organization for Standardization, 2003).

¹³ JDBC 3.0 API (Santa Clara, CA: Sun Microsystems, Inc., 2002). Available at: http://java.sun.com/products/jdbc/download.html#corespec30.

¹⁴ Java API for XML Processing (JAXP) 1.3 (Santa Clara, CA: Sun Microsystems, Inc., 2002). Available at: http://jcp.org/aboutjava/communityprocess/pfd/jsr206/index2.html.

¹⁵ XQuery API for Java™. Available at: http://jcp.org/en/jsr/detail?id=225 (currently in development).
