Chapter 7. Querying XML

<xsl:template name="child-query">
  <xsl:param name="parent" select=" 'Daddy' "/>
  <xsl:value-of select="concat('But why, ', $parent, '?')"/>
  <xsl:apply-templates select="reasonable_response"/>
  <xsl:call-template name="child-query">
    <xsl:with-param name="parent" select="$parent"/>
  </xsl:call-template>
</xsl:template>

Parents who do not recognize tail recursion risk blowing their stack.

This chapter covers recipes for using XSLT as an XML query language. Querying XML means extracting information from one or more XML documents to answer questions about facts and relationships occurring in and among these documents. By analogy, querying XML with XSLT means asking the same kinds of questions that one might ask of a relational database with SQL.

The “official” query language for XML promulgated by the W3C is not XSLT but XQuery (http://www.w3.org/TR/xquery/). XSLT and XQuery have many similarities, but also some striking differences. For example, both rely on XPath. However, an XSLT stylesheet is always written in XML syntax, whereas XQuery has both a human-friendly syntax and an XML syntax (http://www.w3.org/TR/xqueryx).

When the idea for an XML query language distinct from XSLT was proposed, it was controversial. Many members of the XML community thought the two would overlap too much. Indeed, any query formulated in XQuery could also be implemented in XSLT, and in many cases the XSLT solution is as concise as the XQuery solution. The advantage of XQuery is that it is generally easier to understand than the equivalent XSLT; in particular, it should present a much gentler learning curve to those already versed in SQL. Of course, comprehension also depends on what you are used to, so these comparisons are not absolute.

Explaining XQuery in detail, or comparing it with XSLT in depth, is beyond the scope of this chapter. Instead, this chapter provides query examples for those who have already invested time in XSLT and do not wish to learn yet another XML-related language.

It would be impossible to create examples covering every type of query you might want to run on XML data. Instead, this chapter takes a two-pronged approach. First, it presents primitive, generally applicable query examples. These examples are building blocks that can be adapted to solve more complex query problems. Second, it presents a recipe with solutions to most of the use cases in the W3C document XML Query Use Cases (http://www.w3.org/TR/xmlquery-use-cases). In many cases, you can find a use-case solution similar enough to your particular query problem that it becomes a simple matter of adapting it to the particulars of your XML data.

Performing Set Operations on Node Sets

Problem

You need to find the union, intersection, set difference, or symmetrical set difference between two node sets. You may also need to test equality and subset relationships between two node sets.

Solution

The union is trivial because XPath supports it directly:

<xsl:copy-of select="$node-set1 | $node-set2"/>

The intersection of two node sets requires a more convoluted expression:

<xsl:copy-of select="$node-set1[count(. | $node-set2) = count($node-set2)]"/>

This selects every node in $node-set1 that is also in $node-set2. The test works because XPath's union operator removes duplicate nodes (by identity, not by value): if adding a node from $node-set1 to $node-set2 leaves the count unchanged, that node must already be a member of $node-set2.
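Python's standard library has no XSLT processor, so the following is only a model, not XSLT. It assumes a node set can be represented as a list of xml.etree.ElementTree elements and models XPath's | operator with a hypothetical union helper that drops duplicates by object identity (document order is not modeled, because only the counts matter). Under those assumptions, the count-based intersection predicate can be checked directly:

```python
import xml.etree.ElementTree as ET


def union(a, b):
    """Model of XPath '|': merge two node lists, dropping duplicates by identity."""
    seen = {id(n) for n in a}
    return list(a) + [n for n in b if id(n) not in seen]


def intersection(ns1, ns2):
    """Model of $node-set1[count(. | $node-set2) = count($node-set2)]."""
    # A node already in ns2 does not enlarge the union, so the count is unchanged.
    return [n for n in ns1 if len(union([n], ns2)) == len(ns2)]


doc = ET.fromstring("<r><a id='1'/><a id='2'/><a id='3'/></r>")
a1, a2, a3 = doc.findall("a")
print([n.get("id") for n in intersection([a1, a2], [a2, a3])])  # ['2']
```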

Set difference (those elements that are in the first set but not the second) follows:

<xsl:copy-of select="$node-set1[count(. | $node-set2) != count($node-set2)]"/>

This selects every node in $node-set1 that is not also in $node-set2: forming the union of such a node with $node-set2 produces a set with one more node than $node-set2 alone.
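As a sanity check, here is a small Python model of the difference predicate. It is a sketch under stated assumptions, not XSLT: ElementTree elements stand in for nodes, and the hypothetical union helper mimics XPath's | by removing duplicates by object identity. The predicate keeps exactly the nodes whose union with the second set grows by one:

```python
import xml.etree.ElementTree as ET


def union(a, b):
    """Model of XPath '|': merge two node lists, dropping duplicates by identity."""
    seen = {id(n) for n in a}
    return list(a) + [n for n in b if id(n) not in seen]


def difference(ns1, ns2):
    """Model of $node-set1[count(. | $node-set2) != count($node-set2)]."""
    # A node outside ns2 adds one member to the union, so the counts differ.
    return [n for n in ns1 if len(union([n], ns2)) != len(ns2)]


doc = ET.fromstring("<r><a id='1'/><a id='2'/><a id='3'/></r>")
a1, a2, a3 = doc.findall("a")
print([n.get("id") for n in difference([a1, a2], [a2, a3])])  # ['1']
```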

The symmetrical set difference (nodes that are in one set or the other, but not in both) follows:

<xsl:copy-of select="$node-set1[count(. | $node-set2) != count($node-set2)] |
$node-set2[count(. | $node-set1) != count($node-set1)] "/>

The symmetrical set difference is simply the union of the differences taken both ways.
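The same construction can be modeled in Python. This sketch assumes ElementTree elements as nodes and a hypothetical identity-based union helper in place of XPath's |; symmetric_difference simply unions the two one-way differences, mirroring the XPath expression:

```python
import xml.etree.ElementTree as ET


def union(a, b):
    """Model of XPath '|': merge two node lists, dropping duplicates by identity."""
    seen = {id(n) for n in a}
    return list(a) + [n for n in b if id(n) not in seen]


def difference(ns1, ns2):
    """Model of $node-set1[count(. | $node-set2) != count($node-set2)]."""
    return [n for n in ns1 if len(union([n], ns2)) != len(ns2)]


def symmetric_difference(ns1, ns2):
    """Union of the differences taken both ways."""
    return union(difference(ns1, ns2), difference(ns2, ns1))


doc = ET.fromstring("<r><a id='1'/><a id='2'/><a id='3'/><a id='4'/></r>")
a1, a2, a3, a4 = doc.findall("a")
result = symmetric_difference([a1, a2, a3], [a2, a3, a4])
print(sorted(n.get("id") for n in result))  # ['1', '4']
```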

To test if node-set1 is equal to node-set2:

<xsl:if test="count($node-set1|$node-set2) = count($node-set1) and
               count($node-set1) = count($node-set2)">
  <!-- $node-set1 and $node-set2 contain the same nodes -->
</xsl:if>

Two node sets are equal if they contain the same number of nodes and their union contains no more nodes than either set individually.
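Note that equality here means the same nodes, not the same content. The Python model below illustrates this under the same assumptions as a sketch (ElementTree elements as nodes, a hypothetical union helper that mimics XPath's | by object identity): two distinct elements with identical markup do not form equal node sets.

```python
import xml.etree.ElementTree as ET


def union(a, b):
    """Model of XPath '|': merge two node lists, dropping duplicates by identity."""
    seen = {id(n) for n in a}
    return list(a) + [n for n in b if id(n) not in seen]


def node_sets_equal(ns1, ns2):
    """Model of the equality test: union no larger than ns1, and equal sizes."""
    return len(union(ns1, ns2)) == len(ns1) and len(ns1) == len(ns2)


doc = ET.fromstring("<r><a/><a/></r>")
x, y = doc.findall("a")
print(node_sets_equal([x, y], [y, x]))  # True: same nodes; order is irrelevant
print(node_sets_equal([x], [y]))        # False: same markup, different nodes
```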

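To test if node-set2 is a subset of node-set1 (adding $node-set2 to $node-set1 does not enlarge it):

<xsl:if test="count($node-set1|$node-set2) = count($node-set1)">
  <!-- $node-set2 is a subset of $node-set1 -->
</xsl:if>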

To test if node-set2 is a proper subset of node-set1:

<xsl:if test="count($node-set1|$node-set2) = count($node-set1) and
               count($node-set1) > count($node-set2)">
  <!-- $node-set2 is a proper subset of $node-set1 -->
</xsl:if>
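The subset tests can be modeled the same way. This is a Python sketch, not XSLT: it assumes ElementTree elements as nodes and a hypothetical union helper standing in for XPath's |. The extra size comparison is what rules out equal sets from being proper subsets:

```python
import xml.etree.ElementTree as ET


def union(a, b):
    """Model of XPath '|': merge two node lists, dropping duplicates by identity."""
    seen = {id(n) for n in a}
    return list(a) + [n for n in b if id(n) not in seen]


def is_subset(ns2, ns1):
    """Is ns2 a subset of ns1? True when adding ns2 does not enlarge ns1."""
    return len(union(ns1, ns2)) == len(ns1)


def is_proper_subset(ns2, ns1):
    """Is ns2 a proper subset of ns1? A subset that is also strictly smaller."""
    return is_subset(ns2, ns1) and len(ns1) > len(ns2)


doc = ET.fromstring("<r><a/><b/><c/></r>")
a, b, c = list(doc)
print(is_subset([a], [a, b]), is_proper_subset([a], [a, b]))        # True True
print(is_subset([a, b], [b, a]), is_proper_subset([a, b], [b, a]))  # True False
print(is_subset([c], [a, b]))                                       # False
```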

Discussion

You may wonder what set operations have to do with XML queries. Set operations are ways of finding commonalities and differences between sets of elements extracted from a document. Many basic questions one can ask of data have to do with common and distinguishing traits.

For example, imagine extracting person elements from people.xml as follows:

<xsl:variable name="males" select="//person[@sex='m']"/>
<xsl:variable name="females" select="//person[@sex='f']"/>
<xsl:variable name="smokers" select="//person[@smoker='yes']"/>
<xsl:variable name="non-smokers" select="//person[@smoker='no']"/>

Now if you were issuing life insurance, you might consider charging each of the following sets of people different rates:

<!-- Male smokers -->
<xsl:variable name="super-risk" 
     select="$males[count(. | $smokers) = count($smokers)]"/>
<!-- Female smokers -->
<xsl:variable name="high-risk" 
     select="$females[count(. | $smokers) = count($smokers)]"/>
<!-- Male non-smokers -->
<xsl:variable name="moderate-risk" 
     select="$males[count(. | $non-smokers) = count($non-smokers)]"/>
<!-- Female non-smokers -->
<xsl:variable name="low-risk" 
     select="$females[count(. | $non-smokers) = count($non-smokers)]"/>

You probably noticed that the same answers could have been obtained more directly by using logic rather than set theory:

<!-- Male smokers -->
<xsl:variable name="super-risk"
     select="//person[@sex='m' and @smoker='yes']"/>
<!-- Female smokers -->
<xsl:variable name="high-risk"
     select="//person[@sex='f' and @smoker='yes']"/>
<!-- Male non-smokers -->
<xsl:variable name="moderate-risk"
     select="//person[@sex='m' and @smoker='no']"/>
<!-- Female non-smokers -->
<xsl:variable name="low-risk"
     select="//person[@sex='f' and @smoker='no']"/>

Better still, if you had already extracted the sets of males and females, it would be more efficient to write:

<!-- Male smokers -->
<xsl:variable name="super-risk"
     select="$males[@smoker='yes']"/>
<!-- Female smokers -->
<xsl:variable name="high-risk"
     select="$females[@smoker='yes']"/>
<!-- Male non-smokers -->
<xsl:variable name="moderate-risk"
     select="$males[@smoker='no']"/>
<!-- Female non-smokers -->
<xsl:variable name="low-risk"
     select="$females[@smoker='no']"/>
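To confirm that the three formulations select the same people, the following Python model runs the male-smoker query all three ways. It is a sketch under stated assumptions: PEOPLE is a hypothetical stand-in for people.xml (whose contents are not shown here), ElementTree's findall handles the single-attribute selections, and because ElementTree paths do not support and inside a predicate, the combined predicate is written as a Python filter.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for people.xml; the real file is not shown in the text.
PEOPLE = """<people>
  <person name="Al" sex="m" smoker="yes"/>
  <person name="Bea" sex="f" smoker="yes"/>
  <person name="Cal" sex="m" smoker="no"/>
  <person name="Di" sex="f" smoker="no"/>
  <person name="Ed" sex="m" smoker="yes"/>
</people>"""


def union(a, b):
    """Model of XPath '|': merge two node lists, dropping duplicates by identity."""
    seen = {id(n) for n in a}
    return list(a) + [n for n in b if id(n) not in seen]


def intersection(ns1, ns2):
    """Model of $ns1[count(. | $ns2) = count($ns2)]."""
    return [n for n in ns1 if len(union([n], ns2)) == len(ns2)]


root = ET.fromstring(PEOPLE)
males = root.findall(".//person[@sex='m']")
smokers = root.findall(".//person[@smoker='yes']")

# 1. Set approach: $males[count(. | $smokers) = count($smokers)]
by_sets = intersection(males, smokers)
# 2. Logical approach: //person[@sex='m' and @smoker='yes']
by_logic = [p for p in root.iter("person")
            if p.get("sex") == "m" and p.get("smoker") == "yes"]
# 3. Filtering an existing set: $males[@smoker='yes']
by_filter = [p for p in males if p.get("smoker") == "yes"]

for label, result in [("sets", by_sets), ("logic", by_logic), ("filter", by_filter)]:
    print(label, [p.get("name") for p in result])  # each prints ['Al', 'Ed']
```

The third form touches only the elements already in $males, which is why it is the cheapest once that set exists.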

These observations do not invalidate the utility of the set approach. Notice that the set operations work without knowledge of what the sets themselves contain. Set operations work at a higher level of abstraction. Imagine that you have a complex XML document and are interested in the following four sets:

<!-- All elements that have elements c1 or c2 as children -->
<xsl:variable name="set1" select="//*[c1 or c2]"/>
<!-- All elements that have elements c3 and c4 as children -->
<xsl:variable name="set2" select="//*[c3 and c4]"/>
<!-- All elements whose parent has attribute a1 -->
<xsl:variable name="set3" select="//*[../@a1]"/>
<!-- All elements whose parent has attribute a2 -->
<xsl:variable name="set4" select="//*[../@a2]"/>

In the original example, it was obvious that the sets of males and females (and of smokers and nonsmokers) are disjoint. Here you have no such knowledge. The sets may be completely disjoint, overlap completely, or share only some elements. There are only two ways to find out what, say, set1 and set3 have in common. The first is to take their intersection; the second is to traverse the entire document again using the logical and of their predicates. In this case, the intersection is clearly the way to go, because it reuses the sets you have already computed instead of traversing the document a second time.
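To make the point concrete, here is a small Python model of set1 and set3. It is only a model under stated assumptions: Python's standard library has no XSLT, ElementTree elements stand in for nodes, a hypothetical union helper mimics XPath's |, and the sample document DOC is invented for illustration. Both routes find the same common element, but the intersection reuses the two sets already in hand, whereas the second route walks every element again.

```python
import xml.etree.ElementTree as ET

# Hypothetical document; element and attribute names follow the text.
DOC = """<root a1="x">
  <p><c1/><c3/><c4/></p>
  <q a2="y"><s><c2/></s><t/></q>
  <u><c4/></u>
</root>"""


def union(a, b):
    """Model of XPath '|': merge two node lists, dropping duplicates by identity."""
    seen = {id(n) for n in a}
    return list(a) + [n for n in b if id(n) not in seen]


def intersection(ns1, ns2):
    """Model of $ns1[count(. | $ns2) = count($ns2)]."""
    return [n for n in ns1 if len(union([n], ns2)) == len(ns2)]


root = ET.fromstring(DOC)
everything = list(root.iter())  # every element, like //*
# ElementTree has no parent axis, so build a child -> parent map.
parent = {child: par for par in everything for child in par}


def has_child(e, *tags):
    return any(c.tag in tags for c in e)


def parent_has(e, attr):
    return e in parent and attr in parent[e].attrib


set1 = [e for e in everything if has_child(e, "c1", "c2")]  # //*[c1 or c2]
set3 = [e for e in everything if parent_has(e, "a1")]       # //*[../@a1]

# Option 1: intersect the sets already computed (no new traversal).
common = intersection(set1, set3)
# Option 2: traverse again with the anded predicates: //*[(c1 or c2) and ../@a1]
again = [e for e in everything if has_child(e, "c1", "c2") and parent_has(e, "a1")]

print([e.tag for e in common], [e.tag for e in again])  # ['p'] ['p']
```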

EXSLT defines a set module that includes functions performing the set operations discussed here. EXSLT uses an interesting technique to return the result of its set operations. Instead of returning the result directly, it applies templates to the result in a mode particular to the type of set operation. For example, after EXSLT's set:intersection computes the intersection, it invokes <xsl:apply-templates mode="set:intersection"/> on the result. EXSLT provides a default template in this mode that returns a copy of the result as a result tree fragment. This indirect means of returning the result allows users who import the EXSLT set module to override the default and process the result further.

This technique is useful but limited. It is useful because it can eliminate the need to call the node-set extension function to convert the result back into a node set. It is limited because the user stylesheet can contain at most one such overriding template per matching pattern for each operation, yet you may want to post-process the results of intersections invoked from different places in the same stylesheet in very different ways.

Tip

Do not be alarmed if you do not grasp the subtleties of EXSLT’s technique discussed here. Chapter 14 discusses these and other techniques for making XSLT code reusable in more detail.

See Also

You can find an explanation of the EXSLT set operations at http://www.exslt.org/set/index.html.
