Chapter 12

XQueryX

12.1 Introduction

XQueryX is an alternative syntax for the XQuery language, where a query is represented as a well-formed XML document (as opposed to just a string of characters). There is a mindset in the XML world that says, “XML is a good way of representing stuff, and therefore all stuff should be represented as XML.” For example, one of the advantages of XML Schema over DTDs is that an XML Schema is an XML document, while a DTD is not. This turns out to be a very practical way to go about things – it really is useful to be able to treat an XML Schema, or an XQuery, as an XML document. It means you can:

• Validate it against an XML Schema (an XML Schema can be validated against the Schema for Schemas,1 and an XQueryX can be validated against the XQueryX Schema).2

• Create it with an XML editing tool.

• Store it the same way you store other XML documents.

• Pass it around as an XML document, e.g., as a SOAP message.

• Query it, using XPath or XQuery – or XQueryX.

• Embed it in another XML document.

The XQueryX spec3 notes a couple of other benefits of an XML representation of an XQuery – parser reuse and automatic query generation. In fact, many people believe that XQueryX is the only XML query syntax we need – after all (the argument goes), nobody actually writes queries by hand; they write applications that write queries. So what if XQueryX is verbose and difficult to read and write? Only applications will read and write XQueries, so it is more important to make the language machine-readable/writable than human-readable/writable. That argument does have supporters, but the bulk of the XQuery Working Group’s efforts have gone into creating the human-readable/writable syntax for XQuery. XQueryX, though recognized as a requirement early on, has been defined as an adjunct to the non-XML syntax.

Given the XQuery language, there are a number of ways you could define an XML syntax (that is, a way to represent any possible XQuery in XML). In Section 12.2 we describe two possible extremes – a trivial embedding and a fully parsed XQuery – and we describe some of the design features of XQueryX. In Section 12.3 we describe how the XQueryX spec defines XQueryX. In Section 12.4 we look closely at some example XQueries and their XQueryX representations. And in Section 12.5 we discuss how and why you might query XQueryX documents.

12.2 How Far to Go?

There is a non-XML, human-readable/writable syntax for the XQuery language, and we want to define an XML syntax based on that language. The XML syntax must be able to express exactly what the non-XML syntax expresses, no more and no less. And it probably should be recognizable as an XML representation of the non-XML syntax, reusing the same keywords and clauses.4 But how far should XQueryX go in the direction of XML? Let’s look at the two possible extremes – a trivial embedding of XQuery into XML, and an XML representation of a parsed XQuery – before discussing what XQueryX actually does.

12.2.1 Trivial Embedding

The simplest way to represent the XQuery syntax as XML is just to wrap each query in a start tag and an end tag, as in Example 12-1.

Example 12-1   Trivial Embedding (1)

image

This trivial embedding works for some queries, but what if the query includes, e.g., a less-than sign? The resulting XML would not be well-formed, unless you escaped the less-than sign somehow. You could wrap the whole query in a CDATA section, effectively escaping any special characters that might occur in the query (Example 12-2).

Example 12-2   Trivial Embedding (2)

image

But you can’t apply that strategy blindly either – if there is already a CDATA section as part of the query, wrapping it in another CDATA section again creates something that is not well-formed XML (CDATA sections cannot be nested in well-formed XML). So the most trivial embedding that will work for all queries is one that involves either wrapping each special character in the query in a CDATA section, or replacing each special character with a character entity reference (Example 12-3).

Example 12-3   Trivial Embedding (3)

image
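The three embeddings can be checked mechanically. The following is a minimal Python sketch, not taken from the XQueryX spec: the wrapper element name "query" and both sample queries are hypothetical, and the standard-library parser is used only to test well-formedness.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Hypothetical query text containing a less-than sign.
QUERY = "for $m in //movie where $m/yearReleased < 1990 return $m/title"

# Hypothetical query that already contains a CDATA section.
QUERY_WITH_CDATA = "<a><![CDATA[x < y]]></a>"


def is_well_formed(xml_text):
    """Return True if xml_text parses as well-formed XML."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


def embed_naive(query):
    # Embedding (1): wrap the query in a start tag and an end tag.
    return "<query>" + query + "</query>"


def embed_cdata(query):
    # Embedding (2): wrap the whole query in a single CDATA section.
    return "<query><![CDATA[" + query + "]]></query>"


def embed_escaped(query):
    # Embedding (3): replace each special character with an entity reference.
    return "<query>" + escape(query) + "</query>"


def recover(xml_text):
    """Parse an embedding and return the query text it carries."""
    return ET.fromstring(xml_text).text
```

Only the third function produces well-formed XML for both sample queries, and in each successful case recover() returns the original query text unchanged, which is all a trivial embedding promises.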

So the “trivial embedding” approach is not entirely trivial. And it only achieves some of the goals for an XML syntax. A query like the one in Example 12-3 is certainly well-formed XML, but it has no real structure to it – it’s just a single element that contains the full text of the query. You could pass this in a SOAP message or embed it in an XML document, but you could not perform meaningful queries against it, nor could you store it as XML. And this syntax does not help with parser reuse or automatic query generation – the text of the query needs to be parsed (or generated) in exactly the same way as the non-XML query, but with two more tags and some CDATA sections and/or character entity references to consider. However, the “trivial embedding” approach is considered useful, and the XQueryX Schema and stylesheet both support it – i.e., Example 12-3 is a valid XQueryX instance.

12.2.2 Fully-Parsed XQuery

The opposite extreme to trivial embedding would be to represent the fully-parsed form of an XQuery as XML, where each language construct, down to individual characters, is a separate element or attribute. By adopting this approach, you achieve all the benefits of XQueryX – a query needs to be parsed only once (when you first create the XQueryX), and this form is easy to generate automatically (as a natural by-product of parsing the query). The downside to this approach is its verbosity – you’ll see in Example 12-5 just how long the simplest XQueryX would be if it mapped every XQuery grammar production. And, as you’ll see in Section 12.2.3, it is possible to define XQueryX so that XQueryX queries are even more amenable to being queried than this fully-parsed representation.

12.2.3 The XQueryX Approach

The approach taken in the XQueryX spec is fairly close to the “Fully-Parsed XQuery.” That is, an XQueryX looks quite like an XML representation of the parsed form of an XQuery. There are two broad areas where XQueryX deviates from a straightforward parsed query mapping.

First, XQueryX does not reflect every production, and it does not represent “empty” parts of a production (parts of a production that are optional, and that don’t exist in the XQuery being represented). If XQueryX did faithfully represent every part of every grammar production, then XQueryX queries would be even more verbose than they are under the current spec – see Example 12-5 for an example.

Second, XQueryX represents constructs such as expressions, operators, and literals so that their representation (in an XQueryX instance document) is concise, yet you can create broad or narrow queries (to search for nodes higher or lower in the parse tree). We look at each of these in turn, illustrating them with fragments of the XQueryX Schema and fragments of an XQueryX instance document. In the XQueryX instance fragments, we use the namespace prefix “xqx.”

Expressions

There are many different kinds of expressions in XQuery – the FLWOR expression, the path expression, etc. In XQueryX, each kind of expression is represented by an element with a name describing that kind of expression. For example, a path expression is represented by an element called “pathExpr.” An element representing a kind of expression has a Schema type with the same name as the element name, based on the “expr” type. For example, the type “pathExpr” is defined in the XQueryX Schema as an extension of the “expr” type, like this:

image

image

The element “pathExpr” has the type “pathExpr” (an extension of the type “expr”), and is a member of the substitution group “expr”:

image

The type “expr” is defined in the XQueryX Schema, along with an “expr” element, like this:

image

Notice that the “expr” element is marked as abstract. That means you can’t have an element of that name in a valid XQueryX instance document, but you can define a substitution group with the element “expr” as its head element.5 In general, we can say that “expr” represents a base class for all expressions in XQueryX, and each kind of expression is a subclass of “expr.”

A path expression in an XQuery is represented in an XQueryX instance as:

image

It is easy for the human reader to see that this is a path expression (an element with both name and type of “xqx:pathExpr”). Perhaps more importantly, it is easy to run an XQuery over one or more XQueryX instance documents to find all path expressions. You can also do a broader search, for all expressions. There are (at least) two ways to achieve such a search. First, schema-element(expr) matches any element in the substitution group headed by “expr” whose type matches, or is derived from, the type of the “expr” element (i.e., matches any expression). Second, element(*, expr) matches any element with any name (“*”) with a type that matches, or is derived from, the type “expr” (again, matches any expression). See Example 12-10 and Example 12-11.

Operators

Let’s look closely at the “less than” comparison operator (“<”) as an example of an operator. The relevant XQuery grammar production is:

image

XQueryX does not map this production exactly. Instead, XQueryX represents “less than” as an element called “lessThanOp”, of type “binaryOperatorExpr”, belonging to the substitution group headed by “generalComparisonOp”. The type “binaryOperatorExpr” is based on the type “operatorExpr”, which in turn is based on the type “expr”. The Schema definition of the type “binaryOperatorExpr” dictates that an element of this type must have two child elements, “firstOperand” and “secondOperand”, each of type “exprWrapper”. Here’s how all that looks in the XQueryX Schema document:

image

image

The “less than” comparison operator is represented in an XQueryX instance document like this:

image

See Example 12-11 for a complete example.

Given this structure – a type hierarchy plus a substitution group – you can write XQueries to find all “less than” comparisons in one or more XQueryX instances, and you can broaden the search in two ways. First, schema-element(generalComparisonOp) matches any element in the substitution group headed by “generalComparisonOp” whose type matches, or is derived from, the type of the “generalComparisonOp” element (i.e., matches any general comparison operator – “=”, “!=”, “<”, “<=”, “>”, “>=”). Second, element(*, binaryOperatorExpr) matches any element with any name (“*”) with a type that matches, or is derived from, the type “binaryOperatorExpr” (i.e., matches any binary operator expression – general comparisons, value comparisons, or node comparisons).
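To make the two kinds of broadening concrete, here is a rough Python sketch that imitates them without a schema-aware processor. It is an illustration only: the lookup tables are written by hand from the description above, the namespace URI and the substitution group of "integerConstantExpr" are assumptions, and the operands (the integers 1 and 2) are hypothetical.

```python
import xml.etree.ElementTree as ET

# Type derivation as described in the text: derived type -> base type.
BASE_TYPE = {
    "binaryOperatorExpr": "operatorExpr",
    "operatorExpr": "expr",
    "constantExpr": "expr",
    "integerConstantExpr": "constantExpr",
}

# Element name -> (type name, head of its substitution group).
# The entry for integerConstantExpr is assumed by analogy with pathExpr.
ELEMENTS = {
    "lessThanOp": ("binaryOperatorExpr", "generalComparisonOp"),
    "integerConstantExpr": ("integerConstantExpr", "expr"),
}

# Hypothetical instance for "1 < 2"; the namespace URI is an assumption.
LESS_THAN = """
<xqx:lessThanOp xmlns:xqx="http://www.w3.org/2005/XQueryX">
  <xqx:firstOperand>
    <xqx:integerConstantExpr><xqx:value>1</xqx:value></xqx:integerConstantExpr>
  </xqx:firstOperand>
  <xqx:secondOperand>
    <xqx:integerConstantExpr><xqx:value>2</xqx:value></xqx:integerConstantExpr>
  </xqx:secondOperand>
</xqx:lessThanOp>
"""


def local_name(elem):
    """Strip the {namespace} prefix that ElementTree puts on tags."""
    return elem.tag.split("}", 1)[-1]


def derives_from(type_name, ancestor):
    """True if type_name is ancestor or is derived from it."""
    while type_name is not None:
        if type_name == ancestor:
            return True
        type_name = BASE_TYPE.get(type_name)
    return False


def by_type(root, type_name):
    """Rough stand-in for element(*, type_name)."""
    return [e for e in root.iter()
            if local_name(e) in ELEMENTS
            and derives_from(ELEMENTS[local_name(e)][0], type_name)]


def by_group(root, head):
    """Rough stand-in for schema-element(head); direct members only."""
    return [e for e in root.iter()
            if ELEMENTS.get(local_name(e), (None, None))[1] == head]


root = ET.fromstring(LESS_THAN)
```

With this instance, by_group(root, "generalComparisonOp") and by_type(root, "binaryOperatorExpr") each find the single comparison, while by_type(root, "expr") also picks up the two integer constants. A real schema-aware XQuery processor would get the same information from the XQueryX Schema rather than from hand-written tables.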

Literals

The XQuery grammar defines two kinds of literals (constants) – string and numeric – and then breaks down numeric literals into integer, decimal, and double, like this:

image

The XQueryX Schema, on the other hand, defines a type “constantExpr” based on “expr”, and four subtypes of “constantExpr”, one each for integers, decimals, doubles, and strings. The XQueryX Schema for “constantExpr” and “integerConstantExpr” looks like this:

image

An XQueryX representation of the integer 42 looks like this:

image

Once again, you can write an XQuery to do broad or narrow searches across one or more XQueryX instances – find all constants (literals), or all integer constants, or all expressions.

Summary

In summary, XQueryX nearly represents the parsed form of an XQuery, representing tokens and atomic values, but not individual characters, as elements. XQueryX represents all the structure of the XQuery grammar, including, for example, each step in an XPath expression. This means you can query a collection of queries to find out, e.g., how many queries include some string literal, or which queries include a particular XPath axis (see Section 12.5). XQueryX does not include a one-to-one representation of every XQuery grammar production – instead, it uses subtyping and substitution groups to enable broad or narrow queries over (fairly) concise XQueryX instances.

12.3 The XQueryX Specification

Now that you have the general flavor of the XQueryX approach to representing XQueries in XML, let’s look at the XQueryX specification before stepping through some complete examples.

The XQueryX specification defines XQueryX by providing an XML Schema, which defines the syntax of XQueryX, and a stylesheet, which defines the semantics. The spec also includes some worked examples and a definition of a trivial embedding.

The XQueryX Schema defines what an XQueryX query can look like. The Schema follows the XQuery grammar quite closely. The size of an XQueryX query is kept manageable by skipping some productions, and by not forcing empty productions to be represented. In Section 12.4, we take some example XQueries and look at the XQuery grammar rules, the XQueryX Schema, and the XQueryX representation of the query together.

The semantics of XQueryX are defined by the XQueryX stylesheet – i.e., the meaning of any XQueryX instance is the meaning of the XQuery produced by applying the XQueryX stylesheet to it. The XQueryX spec does not explain how to get from XQuery to XQueryX, but the stylesheet ensures that we always know when we get there.
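The idea that a translation defines the meaning can be sketched in a few lines. The XQueryX spec uses an XSLT stylesheet for this; the Python function below is only a toy stand-in that handles a handful of element names, and the namespace URI and the sample instance are assumptions.

```python
import xml.etree.ElementTree as ET

# Assumed XQueryX namespace, in ElementTree's {uri} notation.
XQX = "{http://www.w3.org/2005/XQueryX}"

# Elements that only group other elements and add no text of their own.
WRAPPERS = ("module", "mainModule", "queryBody", "firstOperand", "secondOperand")

# Hypothetical instance representing the query "1 < 2".
DOC = """
<xqx:module xmlns:xqx="http://www.w3.org/2005/XQueryX">
  <xqx:mainModule>
    <xqx:queryBody>
      <xqx:lessThanOp>
        <xqx:firstOperand>
          <xqx:integerConstantExpr><xqx:value>1</xqx:value></xqx:integerConstantExpr>
        </xqx:firstOperand>
        <xqx:secondOperand>
          <xqx:integerConstantExpr><xqx:value>2</xqx:value></xqx:integerConstantExpr>
        </xqx:secondOperand>
      </xqx:lessThanOp>
    </xqx:queryBody>
  </xqx:mainModule>
</xqx:module>
"""


def to_xquery(elem):
    """Translate a tiny subset of XQueryX back into XQuery text."""
    name = elem.tag.replace(XQX, "")
    if name in WRAPPERS:
        return "".join(to_xquery(child) for child in elem)
    if name == "integerConstantExpr":
        return elem.find(XQX + "value").text
    if name == "lessThanOp":
        first = to_xquery(elem.find(XQX + "firstOperand"))
        second = to_xquery(elem.find(XQX + "secondOperand"))
        # Parenthesizing every operator avoids any need to reason about
        # precedence; the output is equivalent to, not identical with,
        # whatever query the author originally typed.
        return "(" + first + " < " + second + ")"
    raise ValueError("unsupported element: " + name)


xquery_text = to_xquery(ET.fromstring(DOC))
```

Here xquery_text is "(1 < 2)". The extra parentheses illustrate why a translation of this kind yields a query that is semantically equivalent to the original rather than a character-for-character copy.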

12.4 XQueryX By Example

The XQueryX specification does not give any guidelines on how to produce an XQueryX instance (query), given an XQuery. But if you study the XQuery grammar productions, the XQueryX Schema, and the examples in the XQueryX specification, it’s not too difficult to produce XQueryX queries. If you are not too sure how your XQuery should parse, you can get (some of) the parse tree for an XQuery from the XQuery grammar test applet.6 And of course you can check the resulting XQueryX query by running it past the XQueryX stylesheet and checking the result against your original XQuery.

12.4.1 The Simplest XQueryX Example – 42

Let’s start with a simple example – the number 42. 42 is a valid XQuery, so we can produce an XQueryX query that represents it. In the XQueryX query examples in this section, we show first the XQuery, then the XQueryX query, and then the result of applying the XQueryX stylesheet to the XQueryX query. The latter is semantically equivalent to the original XQuery.

Example 12-4   XQueryX (1)

image

image

To see how we got from the XQuery 42 to Example 12-4, take a look at the XQuery grammar EBNF. The first production is:

image

This says that an XQuery is a Module, which is an optional VersionDecl followed by either a MainModule or a LibraryModule. We don’t need a version declaration, and we don’t have any library modules, so our XQueryX query is just a module element with one child, mainModule. So, what constitutes a MainModule?

image

A MainModule is a Prolog followed by a QueryBody, and all the parts of the Prolog are optional. One could argue that the XQueryX should contain an empty prolog element – after all, the prolog is not optional, it’s mandatory, though it may be empty. The XQueryX spec misses this subtlety, so we can leave out the prolog altogether and look at what makes up a QueryBody.

image

A QueryBody is an Expr, and an Expr is one or more ExprSingles separated by commas. At this point we have to take a long walk through several grammar productions to find that an ExprSingle can be just a PathExpr. This may look a little convoluted, but it works for XQuery – the grammar is (mostly) LL(1) (meaning you can parse any statement by looking at each token from left to right, never having to look ahead more than one token), and the precedence of the operators such as “and” and “or” is implicitly defined by the grammar productions – operator precedence doesn’t have to be defined separately. Scott Boag, XQuery grammar guru, calls this cascading precedence. You’ll read more about the XQuery grammar in Appendix C: XQuery 1.0 Grammar. For now, it’s enough to read through the next few grammar productions, ignoring anything that is optional.

image
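Cascading precedence is easier to see in a working parser than in prose. The sketch below is a deliberately tiny recursive-descent parser in Python, covering only "or", "and", "<", and integer literals. It is not the XQuery grammar: everything between ComparisonExpr and the literal is collapsed into one step, and the tuple output format is our own invention.

```python
import re


def tokenize(text):
    """Split the input into integer, '<', 'and', and 'or' tokens."""
    return re.findall(r"\d+|<|and|or", text)


class Parser:
    def __init__(self, text):
        self.tokens = tokenize(text)
        self.pos = 0

    def peek(self):
        # One token of lookahead is all this grammar ever needs.
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def take(self):
        token = self.tokens[self.pos]
        self.pos += 1
        return token

    def or_expr(self):
        # OrExpr ::= AndExpr ("or" AndExpr)*
        node = self.and_expr()
        while self.peek() == "or":
            self.take()
            node = ("or", node, self.and_expr())
        return node

    def and_expr(self):
        # AndExpr ::= ComparisonExpr ("and" ComparisonExpr)*
        node = self.comparison_expr()
        while self.peek() == "and":
            self.take()
            node = ("and", node, self.comparison_expr())
        return node

    def comparison_expr(self):
        # Simplified: ComparisonExpr ::= Literal ("<" Literal)?
        node = self.literal()
        if self.peek() == "<":
            self.take()
            node = ("<", node, self.literal())
        return node

    def literal(self):
        return int(self.take())


tree = Parser("1 < 2 or 3 < 4 and 5 < 6").or_expr()
```

Because or_expr calls and_expr, which calls comparison_expr, "and" binds more tightly than "or" without any precedence table, and tree groups the input as 1 < 2 or (3 < 4 and 5 < 6). Parsing the bare input 42 passes through every level and comes back as the literal itself, which is the same walk the text takes from ExprSingle down to IntegerLiteral.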

So an ExprSingle can be a PathExpr. In the same way, a PathExpr can be simply an IntegerLiteral (PathExpr = RelativePathExpr = StepExpr = FilterExpr = PrimaryExpr = Literal = NumericLiteral = IntegerLiteral).

image

Finally, an IntegerLiteral is a Digits, which is a sequence of one or more characters in the range 0 through 9.

image

The simplest way to represent the XQuery 42 as XML, mapping each grammar rule in turn into a new element, would yield the XQueryX-like syntax in Example 12-5.

Example 12-5   Not an XQueryX

image

image

image
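The difference in size between a production-by-production mapping and the XQueryX form can be measured with a short Python sketch. The element names in the first list simply mirror the grammar productions named in the text (the steps between ExprSingle and PathExpr are left out, as they are in the prose), so it illustrates the idea rather than reproducing Example 12-5; namespaces are omitted for brevity.

```python
import xml.etree.ElementTree as ET

# Productions named in the text, outermost first.
PRODUCTIONS = [
    "Module", "MainModule", "QueryBody", "Expr", "ExprSingle",
    "PathExpr", "RelativePathExpr", "StepExpr", "FilterExpr",
    "PrimaryExpr", "Literal", "NumericLiteral", "IntegerLiteral", "Digits",
]

# Elements that the text says XQueryX keeps for the query 42.
COMPACT = ["module", "mainModule", "queryBody", "integerConstantExpr", "value"]


def nest(names, text):
    """Build one element per name, each nested inside the previous one."""
    root = ET.Element(names[0])
    current = root
    for name in names[1:]:
        current = ET.SubElement(current, name)
    current.text = text
    return root


def depth(elem):
    """Number of element levels from elem down to its deepest descendant."""
    children = list(elem)
    return 1 + (max(depth(child) for child in children) if children else 0)


fully_parsed = nest(PRODUCTIONS, "42")
compact = nest(COMPACT, "42")
compact_xml = ET.tostring(compact, encoding="unicode")
```

Even with the omitted productions, fully_parsed is 14 levels deep against 5 for compact, and every one of the extra levels would have to be skipped over by any query searching for the literal.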

Example 12-5 is quite a mouthful for a simple query, and it’s not terribly useful for searching. To improve this situation, XQueryX represents only the meaningful steps in parsing this query. The definition of “meaningful” here is somewhat subjective – in general, XQueryX includes elements that have some content and/or are useful for searching. In Example 12-4, the XQueryX query contains an element for each of the module, mainModule, and queryBody productions, and then it skips to an integerConstantExpr element. module, mainModule, and queryBody are defined in the XQueryX Schema in an obvious way, like this:

image

This gives us the pattern for our example XQueryX query, minus the contents of the queryBody:

image

We have already met the integerConstantExpr element, in Section 12.2.3 – it has a single child element, value, of type xs:integer, which yields the XQueryX query in Example 12-4 – a relatively compact, easy-to-search XML representation of the XQuery 42.

Before we look at a slightly less simple example, we should point out that embedded expressions – expressions that occur “inside” other expressions – are defined as type xqx:exprWrapper in the XQueryX Schema, not as xqx:expr. xqx:exprWrapper is, as its name implies, a wrapper around the expr type:

image

The purpose of exprWrapper is to provide an additional level of abstraction on expr, which may be used in a later version of the spec. At the time of writing, it serves no useful purpose.

12.4.2 Simple XQueryX Example

Now let’s look at an XQuery that is a bit less simple than Example 12-4. Example 12-6 is still not a terribly useful query, but it has a few more constructs for us to look at.

Example 12-6   Simple XQuery Example

image

We start, as before, with a Module. This time there is a VersionDecl as well as a MainModule. The VersionDecl is

image

But the only nonkeyword information in VersionDecl is the string containing the xquery version, so the XQueryX Schema defines the version declaration like this:

image

So we represent the module (this time with a version declaration), mainModule, and queryBody in our XQueryX query like this:

image

image

Inside the mainModule we have a queryBody, as before. This time the expression inside the queryBody is a FLWOR expression. The XQuery grammar defines a FLWOR expression like this:

image

The XQueryX Schema represents a FLWOR expression like this:

image

So the XQueryX looks like this:

image

image

Inside the letClause, the XQueryX maps less closely to the XQuery grammar productions, because XQueryX represents the structure of the query as an XML tree instead of with keywords. The XQuery grammar defines the LetClause as:

image

This becomes the XQueryX Schema definitions:

image

image

Adding the content of the LetClause to our XQueryX, we get:

image

image

Finally, we need to add the contents of the returnClause. We follow the same steps as for the LetClause – that is, look at the XQuery grammar definition:

image

Here, the XQuery grammar designers have decided not to split out return as a separate clause. This rule could have been (but was not) written as two rules:

image

The XQueryX Schema is written as though the returnClause were a separate grammar rule:

image

Here’s how the returnClause looks in XQueryX:

image

Putting it all together, the XQuery in Example 12-6 can be written in XQueryX as in Example 12-7.

Example 12-7   XQueryX (2)

image

image
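For readers who want something to experiment with, the sketch below shows the general shape of a FLWOR expression in XQueryX and walks it with Python. It is not a copy of Example 12-7: the query (let $x := 42 return $x) is hypothetical, and the namespace URI and the element names inside the let and return clauses are assumptions recalled from the XQueryX Schema that should be checked against the spec before use.

```python
import xml.etree.ElementTree as ET

NS = {"xqx": "http://www.w3.org/2005/XQueryX"}  # assumed namespace URI

# Hypothetical query:  xquery version "1.0"; let $x := 42 return $x
# Element names are assumptions, not verified against the Schema.
FLWOR = """
<xqx:module xmlns:xqx="http://www.w3.org/2005/XQueryX">
  <xqx:versionDecl>
    <xqx:version>1.0</xqx:version>
  </xqx:versionDecl>
  <xqx:mainModule>
    <xqx:queryBody>
      <xqx:flworExpr>
        <xqx:letClause>
          <xqx:letClauseItem>
            <xqx:typedVariableBinding>
              <xqx:varName>x</xqx:varName>
            </xqx:typedVariableBinding>
            <xqx:letExpr>
              <xqx:integerConstantExpr>
                <xqx:value>42</xqx:value>
              </xqx:integerConstantExpr>
            </xqx:letExpr>
          </xqx:letClauseItem>
        </xqx:letClause>
        <xqx:returnClause>
          <xqx:varRef>
            <xqx:name>x</xqx:name>
          </xqx:varRef>
        </xqx:returnClause>
      </xqx:flworExpr>
    </xqx:queryBody>
  </xqx:mainModule>
</xqx:module>
"""

module = ET.fromstring(FLWOR)

# The version string is the only non-keyword content of the declaration.
version = module.findtext("xqx:versionDecl/xqx:version", namespaces=NS)

flwor = module.find("xqx:mainModule/xqx:queryBody/xqx:flworExpr", NS)

# Clause names in document order, with the namespace stripped.
clauses = [child.tag.split("}", 1)[-1] for child in flwor]

# Variables bound by the let clause, and the variable that is returned.
bound = [v.text for v in flwor.iterfind(
    "xqx:letClause/xqx:letClauseItem/xqx:typedVariableBinding/xqx:varName", NS)]
returned = flwor.findtext("xqx:returnClause/xqx:varRef/xqx:name", namespaces=NS)
```

Notice that none of the keywords (let, :=, return) survive as text; the structure of the tree carries that information instead, which is the point made above about the let clause.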

12.4.3 Useful XQuery Example

These first two example queries are too simple to be useful. We’ll close this section with Example 12-8, a query taken from Chapter 10, “Introduction to XQuery 1.0.”

Example 12-8   A Simple but Useful XQuery Written in XQueryX (3)

image

image

image

image

12.5 Querying XQueryX

As we said in Section 12.1, one of the reasons for having an XML syntax for XQuery is so that you can do queries over queries. In this section, we look at two kinds of queries you might want to do over a collection of XQueries – queries that will help you tune your XQuery engine, and queries that will help you improve your application or service. In the examples in the rest of this section we use a new document, “xqueryxs.xml,” made up of the XQueryX queries in Example 12-4, Example 12-7, and Example 12-8, with a new root element <queries>. Of course, you could use stylesheets and XSLT transformations to produce reports on XQueryXs instead of using XQueries.

12.5.1 Querying XQueryX for XQuery Tuning

Let’s suppose you have built an XQuery engine, and that engine is running all the queries against your movies database. Unfortunately, the queries are not running as fast as you’d like them to. You could look into the XQuery engine code and try to speed up every subroutine, but it would be much more efficient if you knew what kinds of queries people were doing in your application, so that you could focus on that area of the code. (Most readers of this book will not build their own XQuery engine; they will buy one or download one for free. But the creator of that engine needs to know which parts of the engine are being exercised, so that he can improve the engine on your behalf.)

Example 12-9 is a simple XQuery to count how many queries we are dealing with.

Example 12-9   How Many Queries?

image

Example 12-10 is an XQuery that returns an XML document containing a count of all expressions and a count of path expressions. This query makes use of the fact that each expression element has a type based on the “expr” type, with the kind of expression denoted by its element name (in this case, “xqx:pathExpr”). You can count occurrences of expressions (without enumerating them) as well as counting a particular kind of expression.

Example 12-10   Count Expressions and Path Expressions

image

image
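The same counts can be produced outside XQuery. The following Python sketch is a stand-in, not a translation of Example 12-9 or Example 12-10: the sample document, the shape of the report, the namespace URI, and the hand-written list of expression element names are all assumptions. The list is needed because the standard-library parser knows nothing about Schema types.

```python
import xml.etree.ElementTree as ET

NS = {"xqx": "http://www.w3.org/2005/XQueryX"}  # assumed namespace URI

# Hand-written stand-in for "any element whose type derives from expr".
EXPRESSION_NAMES = {"pathExpr", "integerConstantExpr"}

# Hypothetical collection of two queries: 42 and the path "movie".
QUERIES = """
<queries xmlns:xqx="http://www.w3.org/2005/XQueryX">
  <xqx:module><xqx:mainModule><xqx:queryBody>
    <xqx:integerConstantExpr><xqx:value>42</xqx:value></xqx:integerConstantExpr>
  </xqx:queryBody></xqx:mainModule></xqx:module>
  <xqx:module><xqx:mainModule><xqx:queryBody>
    <xqx:pathExpr>
      <xqx:stepExpr>
        <xqx:xpathAxis>child</xqx:xpathAxis>
        <xqx:nameTest>movie</xqx:nameTest>
      </xqx:stepExpr>
    </xqx:pathExpr>
  </xqx:queryBody></xqx:mainModule></xqx:module>
</queries>
"""


def local_name(elem):
    """Strip the {namespace} prefix that ElementTree puts on tags."""
    return elem.tag.split("}", 1)[-1]


def count_report(root):
    """Return a small XML report of query and expression counts."""
    modules = root.findall("xqx:module", NS)
    expressions = [e for e in root.iter() if local_name(e) in EXPRESSION_NAMES]
    paths = [e for e in expressions if local_name(e) == "pathExpr"]
    report = ET.Element("counts")
    ET.SubElement(report, "queries").text = str(len(modules))
    ET.SubElement(report, "expressions").text = str(len(expressions))
    ET.SubElement(report, "pathExpressions").text = str(len(paths))
    return report


report = count_report(ET.fromstring(QUERIES))
report_xml = ET.tostring(report, encoding="unicode")
```

For this sample, the report shows two queries, two expressions, and one path expression. The schema-aware XQuery version has the advantage that it never needs to enumerate the expression names.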

Finally, Example 12-11 produces a report showing all general comparison operators and their parameters. Example 12-11 makes use of the fact that all the general comparison operators are part of the substitution group headed by “generalComparisonOp.”

Example 12-11   Show All General Comparison Operators and Their Parameters

image

image

12.5.2 Querying XQueryX for Application Improvement

Even if you are not building your own XQuery engine, you probably want to know what kinds of queries your users are doing. You may want to know what kinds of things they are searching for, so that you can make them more readily available.

Suppose you created a public web page so that anyone can search your movies archive. You know lots of people come to the site and do searches, but you want to improve the user experience by offering pull-down lists for some fields and by showing some movies on the home page without the need for a search. Example 12-12 shows a query that would tell you which fields were being used as filters. If you found that “yearReleased” was a popular filter, you might add a pull-down list to your search page to filter on the year that movies were released. Further queries would tell you which ranges were appropriate (5 years? 20 years?). If most of the queries restricted the search to a particular 5-year period, you might display those movies on the first page of your browsable movie archive.

Example 12-12   Show Which Filters Are Being Used

image
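To show the kind of analysis involved, here is a small Python sketch that tallies the element names used inside predicates. It does not reproduce Example 12-12: the structure assumed for a step with a predicate ("stepExpr", "xpathAxis", "nameTest", "predicates") is recalled from the XQueryX Schema rather than taken from the text, and the three sample filters, including the field "runningTime", are hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import Counter

XQX = "{http://www.w3.org/2005/XQueryX}"  # assumed namespace URI


def filtered_path(element_name, field, op, value):
    """Build a hypothetical pathExpr for  element_name[field op value]."""
    path = ET.Element(XQX + "pathExpr")
    step = ET.SubElement(path, XQX + "stepExpr")
    ET.SubElement(step, XQX + "xpathAxis").text = "child"
    ET.SubElement(step, XQX + "nameTest").text = element_name
    predicates = ET.SubElement(step, XQX + "predicates")
    comparison = ET.SubElement(predicates, XQX + op)
    first = ET.SubElement(comparison, XQX + "firstOperand")
    inner = ET.SubElement(ET.SubElement(first, XQX + "pathExpr"), XQX + "stepExpr")
    ET.SubElement(inner, XQX + "xpathAxis").text = "child"
    ET.SubElement(inner, XQX + "nameTest").text = field
    second = ET.SubElement(comparison, XQX + "secondOperand")
    constant = ET.SubElement(second, XQX + "integerConstantExpr")
    ET.SubElement(constant, XQX + "value").text = str(value)
    return path


def filter_fields(root):
    """Count the name tests that appear anywhere inside a predicate.

    Nested predicates would be counted more than once by this simple
    version; the samples below do not nest.
    """
    counts = Counter()
    for predicates in root.iter(XQX + "predicates"):
        for name_test in predicates.iter(XQX + "nameTest"):
            counts[name_test.text] += 1
    return counts


queries = ET.Element("queries")
queries.append(filtered_path("movie", "yearReleased", "lessThanOp", 1990))
queries.append(filtered_path("movie", "yearReleased", "greaterThanOp", 1980))
queries.append(filtered_path("movie", "runningTime", "lessThanOp", 120))

usage = filter_fields(queries)
```

Here usage.most_common(1) reports "yearReleased" as the most frequently used filter, which is the signal that would prompt you to add a pull-down list for the release year. The element being filtered ("movie") is not counted, because its name test sits outside the predicate.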

12.6 Chapter Summary

In this chapter we looked at XQueryX, an XML syntax for XQueries. There are many ways that an XQuery could be represented as XML – we described two extremes, trivial embedding and completely mapping the parsed query, and then we described the XQueryX approach. The XQueryX approach is to represent the parsed query, leaving out BNF steps that are not useful and treating expressions, operators, and literals in a special way. This leads to a relatively compact XML representation of an XQuery that is particularly useful for searching.


1 XML Schema Part 1: Structures, Second Edition, Appendix A: Schema for Schemas (normative) (Cambridge, MA: World Wide Web Consortium, 2004). Available at http://w3.org/TR/xmlschema-1/#normative-schemaSchema.

2 XML Syntax for XQuery 1.0 (XQueryX), Section 4: An XML Schema for the XQuery XML Syntax (Cambridge, MA: World Wide Web Consortium, 2005). Available at http://www.w3.org/TR/xqueryx/#Schema.

3 XML Syntax for XQuery 1.0 (XQueryX) (Cambridge, MA: World Wide Web Consortium, 2005). Available at http://www.w3.org/TR/xqueryx/.

4 This is not, of course, necessary. The XQuery Working Group could have defined an abstract notion of the XQuery language and then defined two (or more!) ways to serialize instances of that language independently.

5 For information about substitution groups and abstract types, see Sections 4.6 and 4.7 of: XML Schema Part 0: Primer, Second Edition (Cambridge, MA: World Wide Web Consortium, 2004). Available at http://www.w3.org/TR/xmlschema-0/#SubsGroups.

6 At the time of writing, the latest XQuery grammar test applet is at http://www.w3.org/2005/04/qt-applets/xqueryApplet.html – you can find a link to it on the main W3C XQuery page, http://www.w3.org/XML/Query.
