Chapter 12. XQuery Serialization

Introduction

When working with XQuery, two different serialization issues arise, and in this chapter we contend with both of them.

The first is how to serialize the results of an XQuery as XML. This is difficult because the XQuery Data Model is larger than XML itself; for example, an XQuery can result in two documents, or a list of attributes and numbers, none of which are directly serializable as XML. This is known as XQuery Serialization.

The second issue is how to represent the query itself using XML. Representing query expressions as XML allows them to be manipulated using XQuery itself. This XML syntax is known as XQueryX.

At the time of this writing, neither of these serialization formats has solidified. XQuery Serialization is a relatively new addition to the draft specification and is still undergoing significant changes. In contrast, XQueryX has been a part of XQuery almost from the start, but hasn't been updated in more than two years.

It's not surprising, then, that most implementations lack support for one or both of these serialization formats. Check the documentation accompanying your XQuery implementation to be sure.

XQuery Serialization

Data Model serialization faces three main obstacles. First, some values simply may not be directly representable in the target format. For example, XML cannot contain arbitrary characters. Serialization formats usually deal with this complication by encoding the data into some other form that is representable.
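To make the first obstacle concrete, here is a minimal sketch in Python (not XQuery) of two common encoding strategies. It assumes that markup characters are handled by escaping and that data XML cannot hold at all, such as a NUL byte, is encoded as Base64 text. The function names encode_text and encode_binary are hypothetical and chosen only for illustration; this is not how any particular XQuery implementation works.

```python
import base64
from xml.sax.saxutils import escape


def encode_text(value: str) -> str:
    """Escape the markup characters &, < and > so the text can sit inside an element."""
    return escape(value)


def encode_binary(data: bytes) -> str:
    """Encode arbitrary bytes as Base64, which uses only characters XML allows."""
    return base64.b64encode(data).decode("ascii")


print(encode_text("a < b & c"))    # markup characters become entity references
print(encode_binary(b"\x00\x01"))  # bytes that XML cannot contain become plain text
```

Both strategies trade readability for representability: the consumer must know which encoding was applied in order to reverse it.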

Second, some serialized information may not round-trip. That is, deserializing the serialized form may result in a different data model instance than was serialized. In some cases, this information loss may be deemed acceptable.

And finally, the serialization format must reserve certain names for its own use, which could then collide with user names. A well-designed serialization format anticipates this problem and works around it.

In the realm of XML and text, additional complications arise, such as the character encoding to use, whether to perform whitespace or Unicode normalization, and various other text-formatting rules.

In the following sections, you will see how XQuery Serialization handles each of these obstacles.

Sequences of Values

XQuery serializes sequences of values using the same format as XML Schema: space-separated strings. For example, the sequence (1, 2.5, "x") is serialized by converting each atomic value to a string, adding a space between consecutive atomic values, and then concatenating the results to produce "1 2.5 x".
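The rule above can be sketched in a few lines of Python. This is only an approximation: Python's str() stands in for XQuery's conversion to string, and the two do not agree on every type (doubles, for example, have a different canonical form in XQuery). The function name serialize_sequence is hypothetical.

```python
def serialize_sequence(items):
    """Convert each atomic value to a string and join the results with single spaces."""
    # str() is an assumption standing in for XQuery's string conversion.
    return " ".join(str(item) for item in items)


print(serialize_sequence([1, 2.5, "x"]))
```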

There are two potential problems with this serialization choice. One is that some values become ambiguous. For example, a sequence containing a single string value "x y" will be deserialized as a sequence containing two string values ("x", "y") when the type is xs:string*. XQuery Serialization (like XML Schema) doesn't provide a way to work around this difficulty. The other is that the conversion to string uses the canonical representation of the value, which may differ slightly from how the user originally specified it. For example, you may have written 1e0 but the serialized result is 1.0E0. However, this difference is already considered insignificant by XQuery, so its effects on your applications should be minimal.
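The ambiguity is easy to reproduce. The following Python sketch assumes that a consumer reading a value typed as xs:string* splits the serialized text on whitespace; the helper names are hypothetical.

```python
def serialize_strings(items):
    """Join string values with single spaces, as the sequence rule describes."""
    return " ".join(items)


def deserialize_strings(text):
    """Split on whitespace, as a consumer expecting a list of strings would (an assumption)."""
    return text.split()


original = ["x y"]  # one string that happens to contain a space
round_tripped = deserialize_strings(serialize_strings(original))
print(original, round_tripped)
```

One item goes in and two come out, so the serialized form alone cannot tell you which sequence was intended.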

The Root

XQuery Serialization doesn't allow a data model instance that contains a single attribute, xs:QName value, or namespace node to be serialized (nor sequences of these). These items require some context, such as an enclosing element, to be serialized. Consequently, the results of the XQuery attribute a {"b"} cannot be serialized, but the results of <x>{attribute a {"b"}}</x> can be serialized.

If the root of the data model consists of a single value (which the previous section would have converted to xs:string), then the value is replaced by a text node containing that value. In this way, the serialized form is always a sequence of zero or more nodes. When serializing, each document node is omitted from the results; its children are used instead.

At this point, the result is a valid XML fragment, that is, a sequence of zero or more nodes, each of which is an element, comment, processing instruction, or text node.
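The root rules described in this section can be sketched as follows. This is a Python illustration over a deliberately tiny, hypothetical node model (the Node class and normalize_root function are not part of any XQuery API). It assumes atomic values have already been converted to a string as in the previous section, and it omits xs:QName values for brevity.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class Node:
    kind: str  # e.g. "document", "element", "attribute", "namespace", "text"
    value: str = ""
    children: List["Node"] = field(default_factory=list)


# Node kinds that need an enclosing element and so cannot appear at the root.
UNSERIALIZABLE_AT_ROOT = {"attribute", "namespace"}


def normalize_root(items: List[Union[Node, str]]) -> List[Node]:
    """Apply the root rules: reject bare attributes and namespace nodes,
    wrap string values in text nodes, and replace documents by their children."""
    result: List[Node] = []
    for item in items:
        if isinstance(item, str):
            result.append(Node("text", item))
        elif item.kind in UNSERIALIZABLE_AT_ROOT:
            raise ValueError(f"cannot serialize a top-level {item.kind} node")
        elif item.kind == "document":
            result.extend(item.children)
        else:
            result.append(item)
    return result


doc = Node("document", children=[Node("element", "x")])
print([n.kind for n in normalize_root([doc, "1 2.5 x"])])
```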

Serialization Parameters

In addition to the rules described above, implementations may allow you to control certain other aspects of the serialization process, such as whether to use CDATA sections or whether to indent the output. These serialization parameters vary from one implementation to the next.
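As an analogy only, here is how one such parameter, indentation, looks in Python's standard ElementTree library. It is not an XQuery implementation, and the parameter name indent on the hypothetical serialize function is an assumption made for this sketch. ET.indent requires Python 3.9 or later.

```python
import xml.etree.ElementTree as ET


def serialize(xml_text: str, indent: bool = False) -> str:
    """Parse the XML and write it back out, optionally indenting the output."""
    root = ET.fromstring(xml_text)
    if indent:
        ET.indent(root, space="  ")  # adds whitespace-only text between elements
    return ET.tostring(root, encoding="unicode")


compact = serialize("<a><b>1</b></a>")
pretty = serialize("<a><b>1</b></a>", indent=True)
print(compact)
print(pretty)
```

The two outputs describe the same elements; only the insignificant whitespace differs, which is why indentation is usually exposed as an option rather than fixed behavior.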

XQueryX

One of the requirements the W3C set for XQuery was that it must provide an XML syntax for queries. The commonly cited reason for this requirement is to allow queries to operate on queries (for example, this might enable you to write an XQuery interpreter using XQuery).

XQueryX obviously resembles XSLT in both form and function. In fact, people today use XSLT to generate or transform other XSLT transforms. However, XSLT is limited by the fact that embedded XPath queries are string values, and consequently difficult to manipulate. The idea behind XQueryX is to address this problem by defining a standard XML serialization of every XQuery expression, even paths.

At the time of this writing, the standard has not yet defined this serialization. However, it will likely end up being an XML serialization of the XQuery parse tree. Although it is too soon to provide concrete details about XQueryX, Example 12.1 provides an idea of what it may eventually look like. This example shows a possible representation of the query a/b[2].

Example 12.1. XQueryX may look something like this

<query xmlns="urn:hypothetical-xqueryx">
  <path>
    <step axis="child">
      <name>a</name>
      <step axis="child"> 
        <name>b</name>
        <predicate>
          <integer>2</integer>
        </predicate>
      </step>
    </step>
  </path>
</query>
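Because the query is now ordinary XML, any XML tool can inspect or transform it. The following Python sketch walks the hypothetical representation from Example 12.1 and rebuilds the textual path. The element names and the urn:hypothetical-xqueryx namespace come from that example and are not part of any standard; the function names are likewise made up for illustration.

```python
import xml.etree.ElementTree as ET

NS = "{urn:hypothetical-xqueryx}"  # namespace used by the hypothetical example

XQUERYX = """<query xmlns="urn:hypothetical-xqueryx">
  <path>
    <step axis="child">
      <name>a</name>
      <step axis="child">
        <name>b</name>
        <predicate>
          <integer>2</integer>
        </predicate>
      </step>
    </step>
  </path>
</query>"""


def step_to_text(step):
    """Turn one <step> (and any nested <step>) back into path syntax."""
    text = step.find(NS + "name").text
    predicate = step.find(NS + "predicate")
    if predicate is not None:
        text += "[" + predicate.find(NS + "integer").text + "]"
    nested = step.find(NS + "step")
    if nested is not None:
        text += "/" + step_to_text(nested)
    return text


def query_to_text(xml_text):
    """Parse the XML form of a query and return its textual path."""
    root = ET.fromstring(xml_text)
    return step_to_text(root.find(NS + "path/" + NS + "step"))


print(query_to_text(XQUERYX))
```

The sketch handles only child steps with an optional integer predicate, which is all the example contains.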

Conclusion

In this chapter we explored how XQuery serializes its data model in XML. This serialization is complicated by the fact that the XQuery Data Model is so much more expressive than XML itself. Not all implementations support this serialization, in part because it's still being defined.

We also examined how XQuery may define the serialization of a query in XML (XQueryX). XQueryX is expected to define what is essentially a standard parse tree for XQuery, allowing interoperability across implementations. Not all implementations support XQueryX either, for the same reason.
