Chapter 9. Advanced Queries

Now that you are an expert on the syntax of XQuery expressions, let’s look at some more advanced queries. This chapter describes syntax and techniques for some commonly requested query capabilities. You may have these same requirements for your queries, but even if you don’t, this chapter will show you some creative ways to apply XQuery syntax.

Working with Positions and Sequence Numbers

Determining positions and generating sequence numbers are sometimes challenging to query authors who are accustomed to procedural programming languages. Because XQuery is a declarative rather than a procedural language, it is not possible to use familiar techniques like counters. In addition, the sorting and filtering of results can interfere with sequence numbers. This section describes some techniques for working with positions and sequence numbers.

Adding Sequence Numbers to Results

Suppose you want to return a list of product names preceded by a sequence number. Your first approach might be to use a variable as a counter, as shown in Example 9-1. However, the results are not what you might expect. This is because the return clause for each iteration is evaluated in parallel rather than sequentially. This means that you cannot make changes to the value of a variable in one iteration, and expect it to affect the next iteration of the for clause. At the beginning of every iteration, the $count variable is equal to 0.

Example 9-1. Attempting to use a counter variable

Query

let $count := 0
for $prod in doc("catalog.xml")//product[@dept = ("ACC", "WMN")]
let $count := $count + 1
return <p>{$count}. {data($prod/name)}</p>

Results

<p>1. Fleece Pullover</p>
<p>1. Floppy Sun Hat</p>
<p>1. Deluxe Travel Bag</p>

Another temptation might be to use the position function, as shown in Example 9-2. However, this will return the same results as the previous example. In XQuery, unlike in XSLT, the position function only has meaning inside a predicate in a path expression. In this case, the value bound to $prod is no longer in the context of three product items; therefore, it has no relative sequence number within that sequence.

Example 9-2. Attempting to use the position function
for $prod in doc("catalog.xml")//product[@dept = ("ACC", "WMN")]
return <p>{$prod/position()}. {data($prod/name)}</p>

Luckily, FLWORs have a special syntax that enables you to define a positional variable in the for clause. This variable, which is preceded by the keyword at, is bound to an integer representing the iteration number, as shown in Example 9-3.

Example 9-3. Using a positional variable in a for clause

Query

for $prod at $count in doc("catalog.xml")//product[@dept = ("ACC", "WMN")]
return <p>{$count}. {data($prod/name)}</p>

Results

<p>1. Fleece Pullover</p>
<p>2. Floppy Sun Hat</p>
<p>3. Deluxe Travel Bag</p>

However, the positional variable in the at clause does not always work. For example, suppose you wanted to use both where and order by clauses in your FLWOR. This interferes with the sequencing, as shown in Example 9-4. First, the numbers are not in ascending order. This is because the results are ordered after the positional number is evaluated. Another problem is that the sequence numbers are 2, 3, and 4 instead of 1, 2, and 3. This is because there are four product elements returned in the for clause, and the first one was eliminated by the where clause.

Example 9-4. Attempting to use a positional variable with a where clause

Query

for $prod at $count in doc("catalog.xml")//product
where $prod/@dept = ("ACC", "MEN")
order by $prod/name
return <p>{$count}. {data($prod/name)}</p>

Results

<p>4. Cotton Dress Shirt</p>
<p>3. Deluxe Travel Bag</p>
<p>2. Floppy Sun Hat</p>

One way to resolve this is by embedding a second FLWOR in the let clause, as shown in Example 9-5. This embedded FLWOR returns all the products sorted and filtered appropriately. Then, the for clause contained in the main FLWOR uses the positional variable on the sorted sequence.

Example 9-5. Embedding the where clause

Query

let $sortedProds := for $prod in doc("catalog.xml")//product
                    where $prod/@dept = "ACC" or $prod/@dept = "MEN"
                    order by $prod/name
                    return $prod
for $sortedProd at $count in $sortedProds
return <p>{$count}. {data($sortedProd/name)}</p>

Results

<p>1. Cotton Dress Shirt</p>
<p>2. Deluxe Travel Bag</p>
<p>3. Floppy Sun Hat</p>

Using the count Clause

Starting in version 3.0, the count clause offers a new way to address the challenges of keeping track of position. Its simple syntax is shown in Figure 9-1.

Figure 9-1. Syntax of a count clause

Like the at keyword in a for clause, the count clause creates a variable whose value is the iteration number. It is more useful, though, because it can appear anywhere between the first clause and the return clause of a FLWOR. Example 9-6 shows how you can use it after the where and order by clauses, eliminating the problem shown in Example 9-4.

Example 9-6. Using a count clause
xquery version "3.0";
for $prod in doc("catalog.xml")//product
where $prod/@dept = ("ACC", "MEN")
order by $prod/name
count $count
return <p>{$count}. {data($prod/name)}</p>

Being able to use the count clause after an order by clause but before a where clause also makes it easier to rank items and only return a certain number. Example 9-7 shows how you might return only the highest prices in the price document. It refers to the variable $count in the where to limit it to the two highest values.

Example 9-7. Using a count clause for ranking

Query

xquery version "3.0";
for $prod in doc("prices.xml")//prod
order by $prod/price/number(.) descending
count $count
where $count <= 2
return <p>{$count}. {data($prod/price)}</p>

Results

<p>1. 69.99</p>
<p>2. 39.99</p>

The count clause can also be used after a group by clause to count the position of the group within all the groups. This is shown in Example 9-8.

Example 9-8. Using the count clause with groups

Query

xquery version "3.0";
for $item in doc("order.xml")//item
group by $d := $item/@dept
count $count
return <p>{concat("Group ", $count, ": ", $d)}</p>

Results

<p>Group 1: ACC</p>
<p>Group 2: MEN</p>
<p>Group 3: WMN</p>

Testing for the Last Item

Sometimes it is also useful to test whether an item is last in a sequence. Earlier in this chapter, we saw that the position function is not useful unless it is in a predicate. The same is true of the last function, which limits its usefulness when testing for the last item, for example, in a FLWOR.

Suppose you want to concatenate the names of the products together, separated by commas. At the end of the last product name, you want to specify a period instead of a comma. (Actually, this particular example would be best accomplished using the string-join function. However, the example is useful for illustrative purposes.) One approach is to bind the number of items to a variable in a let clause, as shown in Example 9-9.

Example 9-9. Testing for the last item

Query

<p>{ let $prods := doc("catalog.xml")//product
     let $numProds := count($prods)
     for $prod at $count in $prods
     return if ($count = $numProds)
            then concat($prod/name, ".")
            else concat($prod/name, ",")
}</p>

Results

<p>Fleece Pullover, Floppy Sun Hat, Deluxe Travel Bag, Cotton Dress Shirt.</p>

The $numProds variable is bound to the number of products. A positional variable, $count, is used to keep track of the iteration number. When the $count variable equals the $numProds variable, you have arrived at the last item in the sequence.

Another approach is to use the is operator to determine whether the current $prod element is the last one in the sequence. This query is shown in Example 9-10. In this case, it is not necessary to count the number of items or to use a positional variable. The results are the same as in Example 9-9.

Example 9-10. Testing for the last item using the is operator
<p>{ let $prods := doc("catalog.xml")//product
     for $prod in $prods
     return if ($prod is $prods[last()])
            then concat($prod/name, ".")
            else concat($prod/name, ", ")
}</p>

Windowing

The window clause, new in version 3.0, provides a way to group items that are adjacent to each other in more sophisticated ways than are possible with just a group by clause. Like a for clause, a window clause creates an iteration over a sequence of items. However, it creates windows based on starting and ending conditions, and evaluates the return clause once per window.

Example 9-11 uses the window clause to iterate through a sequence of integers and create a new window when it encounters an even number (where that number mod 2 is 0). The variable $s is named so that it can be referenced in the when expression, which specifies the condition that creates a new window. In the return clause, the variable $w refers to all the items in the original input sequence that are in the window. Note that in this example, the value 1 does not appear in the results, because there was no window for it.

Example 9-11. Using the window clause

Query

xquery version "3.0";
for tumbling window $w in (1, 4, 3, 12, 5, 13, 8)
  start $s when $s mod 2 = 0
return <window>{$w}</window>

Results

<window>4 3</window>
<window>12 5 13</window>
<window>8</window>

Creating windows with the window clause is different from creating groups with the group by clause in several ways. One is that the order of the items is retained, whereas in grouping the items are rearranged into their groups. In Example 9-11, the integers were kept in their original order, not reorganized into two separate groups containing even and odd numbers. Because you retain the order of the items, you can put constraints on items that come before and after the start or end of the window. For example, you can specify a start condition to create a new window if a particular value changes compared to the previous item.

There are two kinds of window clauses: tumbling and sliding. The difference is that tumbling windows never overlap each other, while sliding windows might. This is another difference from grouping, where the groups cannot overlap. The syntax of a window clause is shown in Figure 9-2. The diagram shows the end condition as optional, but it is only optional when the tumbling option is chosen. With sliding windows, the end condition is required.

Figure 9-2. Syntax of a window clause

The start and end conditions use a series of window variables. Figure 9-3 shows their syntax.

Figure 9-3. Syntax of window variables

In all, up to nine variables can be declared in a window clause. They are listed in Table 9-1. All but the first are optional. It is possible to name these variables anything you want, as long as their names are distinct from each other. For clarity, in all of the examples in this chapter, the variables are named the same thing, as listed in the second column of the table. If all nine variables were used (which is very unusual), a window clause might look like:

for tumbling window $w in (2, 4, 6, 8, 10)
  start $s at $s-pos previous $s-prev next $s-next when true()
  end $e at $e-pos previous $e-prev next $e-next when true()
return $s

The window variables are declared so that they can be accessed in the when expression (either in the start or end conditions) or in the return clause. They are optional; if they are not referenced, they do not need to be specified. In Example 9-11, only the $s variable was used.

Table 9-1. Variables of the window clause
TermName used in examplesPurpose
Window-variable $w The sequence of items in the window (required), appears after the keyword window
Start-item $s The first item in the window, appears after the word start
Start-item-position $s-pos The position of $s in the binding sequence, appears after the keyword at in the start condition
Start-previous-item $s-prev The item in the binding sequence (if any) that precedes $s, appears after the keyword previous in the start condition
Start-next-item $s-next The item in the binding sequence (if any) that follows $s, appears after the keyword next in the start condition
End-item $e The last item in the window, appears after the word end
End-item-position $e-pos The position of $e in the binding sequence, appears after the keyword at in the end condition
End-previous-item $e-prev The item in the binding sequence (if any) that precedes $e, appears after the keyword previous in the end condition
End-next-item $e-next The item in the binding sequence (if any) that follows $e, appears after the keyword next in the end condition

Using start and end Conditions

Example 9-11 only had a start condition, saying when to start a new window. Every item starting with the start item, up until the next start item, was included in the window. It is also possible to use an end condition to end the window before the next window starts. With tumbling windows, this usually means that you do not want to include all the items in your results, just certain subsets.

Example 9-12 shows some input XML that has sequences of three elements (propname, value, and alt-value). If you want to create a new window when you reach a propname element, but want to ignore the alt-value elements, you could end the window when you encounter a value element, thus excluding alt-value elements from the results.

Example 9-12. Using the end condition

Query

xquery version "3.0";
let $props := <properties>
                <propname>x</propname>
                <value>xval</value>
                <alt-value>xval2</alt-value>
                <propname>y</propname>
                <value>yval</value>
                <alt-value>yval2</alt-value>
              </properties>
for tumbling window $w in $props/*
  start $s when name($s) = 'propname'
  end $e when name($e) = 'value'
return <property>{$w}</property>

Results

<property>
   <propname>x</propname>
   <value>xval</value>
</property>
<property>
   <propname>y</propname>
   <value>yval</value>
</property>

Windows Based on Position

The start-item-position and end-item-position variables are useful for creating windows based on the position of the item in the sequence. For example, to create windows of three items, one way would be to test the difference between the start and end positions, as shown in Example 9-13.

Example 9-13. Using the start and end positions

Query

xquery version "3.0";
for tumbling window $w in (1, 4, 3, 12, 5, 13, 8)
  start at $s-pos when true()
  end at $e-pos when $e-pos - $s-pos = 2
return <window>{$w}</window>

Results

<window>1 4 3</window>
<window>12 5 13</window>
<window>8</window>

In that example, the when expression in the start condition is set to true(), meaning that a new window should always start immediately after another one ends (or at the beginning). In the end condition, it tests the difference between the positions of the start item and the end item, and if it is equal to two, that means there are three items in the window and it should end the window. In the results, the value 8 appears in a window by itself because that window was started, even though it never satisfied the end condition.

Example 9-14 shows a slightly revised query that uses only end instead of just end. This has the effect of removing the last window that contained only 8, because the end condition was never true.

Example 9-14. Using the only keyword

Query

xquery version "3.0";
for tumbling window $w in (1, 4, 3, 12, 5, 13, 8)
  start at $s-pos when true()
  only end at $e-pos when $e-pos - $s-pos = 2
return <window>{$w}</window>

Results

<window>1 4 3</window>
<window>12 5 13</window>

Windows Based on Previous or Next Items

The previous and next keywords in a window clause allow you to create windows by testing the previous or next items. This is useful for testing if something has changed. For example, suppose you want to iterate through the order document and display some information from it, but you want to put a header in every time the department changes. You don’t want to group by department, because for whatever reason, it is important that the items retain their original order. An example of this is shown in Example 9-15.

Example 9-15. Using the previous keyword

Query

xquery version "3.0";
for tumbling window $w in doc("order.xml")//item
  start $s previous $s-prev when $s/@dept != $s-prev/@dept
return (<p>Department {data($s/@dept)}</p>, $w)

Results

<p>Department ACC</p>
<item dept="ACC" num="563" quantity="1"/>
<item dept="ACC" num="443" quantity="2"/>
<p>Department MEN</p>
<item dept="MEN" num="784" quantity="1" color="white"/>
<item dept="MEN" num="784" quantity="1" color="gray"/>
<p>Department WMN</p>
<item dept="WMN" num="557" quantity="1" color="black"/>

The example uses the keyword previous and declares a variable $s-prev that will contain the previous item. The when expression then checks if the current and previous dept attributes are equal to each other. If not, a new window is created, which allows the insertion of a p element at the beginning of the window.

However, you may have noticed that the first item in the order is missing from the results. This is because $s-prev is the empty sequence when it encounters the first item, and comparing anything to the empty sequence is always false. This could be remedied by adding an additional test for the existence of $s-prev, as in:

start $s previous $s-prev when $s/@dept != $s-prev/@dept or not($s-prev)

or alternatively, you could compare their string values, as in:

start $s previous $s-prev when string($s/@dept) != string($s-prev/@dept)

It is possible to specify a similar condition in the end condition by checking the next item, as shown in Example 9-16, where we use the technique of comparing string values to avoid dropping the last item.

Example 9-16. Using the next keyword

Query

xquery version "3.0";
for tumbling window $w in doc("order.xml")//item
  start $s when true()
  end $e next $e-next when string($e/@dept) != string($e-next/@dept)
return (<p>Department {data($s/@dept)}</p>, $w)

Results

<p>Department WMN</p>
<item dept="WMN" num="557" quantity="1" color="navy"/>
<p>Department ACC</p>
<item dept="ACC" num="563" quantity="1"/>
<item dept="ACC" num="443" quantity="2"/>
<p>Department MEN</p>
<item dept="MEN" num="784" quantity="1" color="white"/>
<item dept="MEN" num="784" quantity="1" color="gray"/>
<p>Department WMN</p>
<item dept="WMN" num="557" quantity="1" color="black"/>

Sliding Windows

Sliding windows differ from tumbling windows in that the windows can overlap. This can be useful for things like moving averages, or for data that naturally overlaps, like overlapping events over time. Example 9-14 showed how to create windows with three items each. Using tumbling windows, each item was only in one window. If we change that example slightly to make it a sliding window, as shown in Example 9-17, more windows are created, with overlapping values.

Example 9-17. Sliding windows

Query

xquery version "3.0";
for sliding window $w in (1, 4, 3, 12, 5, 13, 8)
  start at $s-pos when true()
  only end at $e-pos when $e-pos - $s-pos = 2
return <window>{$w}</window>

Results

<window>1 4 3</window>
<window>4 3 12</window>
<window>3 12 5</window>
<window>12 5 13</window>
<window>5 13 8</window>

You could calculate the integer average of each window by changing the return clause to:

return round(avg($w))

which would result in the sequence:

3 6 7 10 9

Copying Input Elements with Modifications

Often you will want to include elements from an input document, but with minor modifications. For example, you may wish to eliminate or add attributes, or change their names. However, the XQuery language does not have any special functions or operators that perform these minor modifications. For example, there is no direct syntax that means “select all the product elements, but leave out their dept attributes.”

The good news is that you can accomplish these modifications by “reconstructing” the elements. For example, you can write a query to construct a new product element and include all the children and attributes (except dept) of the original product element in the input document. Even better, you can write a user-defined function that uses computed constructors to handle these modifications in the general case. This section describes some common modifications and provides useful functions to handle these cases.

Note that these functions are intended to change the way elements and attributes appear in the results of a query, not to update them in an XML database. To update your XML database, you should check the documentation for your XQuery implementation to find out about its update capabilities, which may consist of support for the XQuery Update Facility, or implementation-specific update functions.

Adding Attributes to an Element

To add an attribute to an element, you could use a function like the one shown in Example 9-18. It takes as arguments a sequence of elements, along with a sequence of attribute names and values, and returns newly constructed elements with those attributes added.

Example 9-18. Useful function: functx:add-attributes
declare namespace functx = "http://www.functx.com";
declare function functx:add-attributes
  ( $elements as element()*,
    $attrNames as xs:QName*,
    $attrValues as xs:anyAtomicType* )  as element()* {
       
   for $element in $elements
   return element { node-name($element)}
                  { for $attrName at $seq in $attrNames
                    return if ($element/@*[node-name(.) = $attrName])
                           then ()
                           else attribute {$attrName}
                                          {$attrValues[$seq]},
                    $element/@*,
                    $element/node() }
};

The function makes use of computed constructors to create dynamically new elements with the same names as the original elements passed to the function. It also uses a computed constructor to create the attributes, specifying their names and values as expressions.

It then copies the attribute and child nodes from the original elements. The expression $element/node() is used rather than * because node() will return text, processing instruction, and comment nodes in addition to child elements. However, node() does not return attributes, so a separate expression $element/@* is used to copy those.

The expression:

functx:add-attributes(doc("catalog.xml")//product, xs:QName("xml:lang"), "en")

uses this function to return all the product elements from the catalog, with an additional xml:lang="en" attribute.

Removing Attributes from an Element

Removing attributes also requires the original element to be reconstructed. Example 9-19 shows a function that “removes” attributes from a sequence of elements. It does this by reconstructing the elements, copying the content and all the attributes, except those whose names are specified in the second argument.

Example 9-19. Useful function: functx:remove-attributes
declare namespace functx = "http://www.functx.com";
declare function functx:remove-attributes
  ($elements as element()*, $names as xs:string*) as element()* {

   for $element in $elements
   return element
     {node-name($element)}
     {$element/@*[not(name() = $names)],
      $element/node() }
};

The function will accept a sequence of strings that represent attribute names to remove. For example, the expression:

functx:remove-attributes(doc("order.xml")//item, ("quantity", "color"))

returns all the item elements from the order document, minus the quantity and color attributes. The extra parentheses are necessary around the "quantity" and "color" strings to combine them into a single argument that is a sequence of two items. Otherwise, they would be considered two separate arguments.

Notice that the predicate [not(name() = $names)] does not need to explicitly iterate through each of the strings in the $names sequence. This is because the = operator, unlike the eq operator, can be used on lists of values. The comparison will return true for every attribute name that is equal to any one of the strings in $names. Using the not function means that the predicate will allow through only the attributes whose names do not match any of the strings in $names.

It may seem simpler to use the != operator rather than the not function, but that does not have the same meaning. If the predicate were [name() != $names], it would return all the attributes whose names don’t match either quantity or color. This means that it will return all the attributes, because no single attribute name will match both strings.

Removing Attributes from All Descendants

You could go further and remove attributes from an element as well as remove all its descendants. The recursive function functx:remove-attributes-deep, shown in Example 9-20, accomplishes this.

Example 9-20. Useful function: functx:remove-attributes-deep
declare namespace functx = "http://www.functx.com";
declare function functx:remove-attributes-deep
  ($nodes as node()*, $names as xs:string* )  as node()* {

   for $node in $nodes
   return if ($node instance of element())
          then  element { node-name($node)}
                { $node/@*[not(name() = $names)],
                  functx:remove-attributes-deep($node/node(), $names)}
          else if ($node instance of document-node())
          then functx:remove-attributes-deep($node/node(), $names)
          else $node
};

This function uses an algorithm similar to the previous example, but it differs in the way it processes the children. Instead of simply copying all the child nodes, the function uses a FLWOR to iterate through them. If a child is an element, it recursively calls itself to process that child. If the node is a document node, it calls functx:remove-attributes-deep on its child (root) element. If the child is not an element or document node (for example, if it is a text or processing-instruction node), the function returns it as is. Iterating through all of the child nodes in a FLWOR preserves their original order in the results.

Removing Child Elements

The three previous functions relate to adding and removing attributes, and they could apply equally to child elements. For example, the function in Example 9-21 could be used to eliminate certain elements from a document by name. It is very similar to functx:remove-attributes-deep, in Example 9-20, except that it removes child elements instead of the attributes.

Example 9-21. Useful function: functx:remove-elements-deep
declare namespace functx = "http://www.functx.com";
declare function functx:remove-elements-deep
 ($nodes as node()*, $names as xs:string*)  as node()* {

   for $node in $nodes
   return
     if ($node instance of element())
     then if (name($node)=$names)
          then ()
          else element { node-name($node)}
                { $node/@*,
                  functx:remove-elements-deep($node/node(), $names)}
     else if ($node instance of document-node())
     then functx:remove-elements-deep($node/node(), $names)
     else $node
};

Another common use case is to remove certain elements but keep their contents. For example, if you want to remove any inline formatting from the desc element in the product catalog, you will want to remove any i or b tags, but keep the content of those elements. You can use the function shown in Example 9-22 for that.

Example 9-22. Useful function: functx:remove-elements-not-contents
declare namespace functx = "http://www.functx.com";
declare function functx:remove-elements-not-contents
  ($nodes as node()*, $names as xs:string*) as node()* {

   for $node in $nodes
   return
    if ($node instance of element())
    then if (name($node) = $names)
         then functx:remove-elements-not-contents($node/node(), $names)
         else element {node-name($node)}
              {$node/@*,
              functx:remove-elements-not-contents($node/node(), $names)}
    else if ($node instance of document-node())
    then functx:remove-elements-not-contents($node/node(), $names)
    else $node
};

Changing Names

Another minor modification is to change the names of certain elements. This could be useful for implementing changes to an XML vocabulary or for language translation.

For example, suppose you want to preserve the structure of the order document but want to change the name order to purchaseOrder, and the name item to purchasedItem. Since the document is very simple, this could be done in a hardcoded way by using direct constructors. However, there may be situations where the document is more complex and you do not know exactly where these elements appear, or they appear in a variety of places in different document types.

You can modify names more flexibly using the function shown in Example 9-23. The function provided here is slightly altered from the version in the FunctX function library in order to make it more clear for educational purposes, but the result is the same.

Example 9-23. Useful function: functx:change-element-names-deep
declare namespace functx = "http://www.functx.com";
declare function functx:change-element-names-deep
  ($nodes as node()*, $oldNames as xs:QName*,
   $newNames as xs:QName*) as node()* {

  if (count($oldNames) != count($newNames))
  then error(xs:QName("Different_Number_Of_Names"))
  else for $node in $nodes
       return if ($node instance of element())
              then let $newName :=
                     if (node-name($node) = $oldNames)
                     then $newNames[index-of($oldNames, node-name($node))]
                     else node-name($node)
                   return element {$newName}
                     {$node/@*,
                      functx:change-element-names-deep($node/node(),
                                              $oldNames, $newNames)}
              else $node
};

Like the functx:remove-attributes-deep function, functx:change-element-names-deep calls itself recursively to traverse the entire XML document. Every time it finds an element, it figures out what its new name should be and binds it to the variable $newName. If it finds its original name in the $oldNames sequence, it selects the name in $newNames that is in the same position. Otherwise, it uses the original name. It then reconstructs the element using the new name, copies all of its attributes, and recursively calls itself to process all of its children.

Example 9-24 calls the functx:change-element-names-deep function with sequences of old and new names. It returns the same data and basic structure as the order.xml document, except with the two names changed.

Example 9-24. Using the functx:change-element-names-deep function

Query

declare namespace functx = "http://www.functx.com";
let $order := doc("order.xml")/order
let $oldNames := (xs:QName("order"), xs:QName("item"))
let $newNames := (xs:QName("purchaseOrder"), xs:QName("purchasedItem"))
return functx:change-element-names-deep($order, $oldNames, $newNames)

Results

<purchaseOrder num="00299432" date="2015-09-15" cust="0221A">
  <purchasedItem dept="WMN" num="557" quantity="1" color="navy"/>
  <purchasedItem dept="ACC" num="563" quantity="1"/>
  <purchasedItem dept="ACC" num="443" quantity="2"/>
  <purchasedItem dept="MEN" num="784" quantity="1" color="white"/>
  <purchasedItem dept="MEN" num="784" quantity="1" color="gray"/>
  <purchasedItem dept="WMN" num="557" quantity="1" color="black"/>
</purchaseOrder>

Combining Results

Your query results may consist of several FLWORs or other expressions that each return a result sequence. In addition, the sequences you use in your for and let clauses may be composed from more than one sequence.

There are four ways to combine two or more sequences to form a third sequence. They differ in which items are selected, whether their order is affected, whether duplicates are eliminated, and whether non-nodes are allowed in the sequences.

Sequence Constructors

The first way to merge two sequences is simply to create a third sequence that is the concatenation of the first two. This is known as a sequence constructor, and it uses parentheses and commas to concatenate two sequences together. For example:

let $prods := doc("catalog.xml")//product
let $items := doc("order.xml")//item
return ($prods, $items)

returns a sequence that is the concatenation of two other sequences, $prods and $items. The items in $prods are first in the sequence, then the items in $items, in the order they appear in that sequence. No attempt is made to eliminate duplicates or sort the items in any way.

Note that concatenation is the only way to combine sequences that contain items that are not nodes; union, intersect, and except work on sequences that contain nodes only.

The union Expression

Another approach to combining sequences of nodes is via a union expression, which is indicated by the keyword union or the vertical bar character (|). The two operators have the exact same meaning. The resulting nodes are rearranged into document order. Path expressions such as:

doc("catalog.xml")//product/(number | name)

use the vertical bar operator to select the union of the number children and the name children of product. An equivalent alternative is to use the vertical bar operator to separate two entire multistep path expressions, as in:

doc("catalog.xml")//product/number | doc("catalog.xml")//product/name

Unlike simple concatenation, using a union expression eliminates duplicate nodes. Duplicates are determined based on node identity, not typed value or string value. That means an expression like:

doc("catalog.xml")//product/@dept | doc("order.xml")//item/@dept

results in all 10 dept attributes (four from catalog.xml and six from order.xml), because it does not eliminate the duplicate department values.

A union eliminates duplicate nodes not just between the sequences, but also within either of the original sequences.

The intersect Expression

An intersect expression results in a sequence that contains only those nodes that are in both of the original sequences. As with union expressions, duplicate nodes (based on node identity) are eliminated, and the resulting nodes are rearranged into document order. For example, the expression:

let $prods := doc("catalog.xml")//product
return $prods[@dept = "ACC"] intersect $prods[number = 443]

returns the third product element in the catalog.

The except Expression

An except expression results in a sequence that contains only nodes that are in the first sequence, but not in the second. As with union expressions, duplicate values (based on node identity) are eliminated, and the resulting nodes are rearranged into document order. For example, the expression:

doc("catalog.xml")//product/(* except number)

returns all the element children of product except for number elements. The parentheses are required because the slash (/) operator has precedence over the except operator. Without the parentheses, it would be interpreted as two separate path expressions: doc("catalog.xml")//product/* and number. This is equally true for the other operators in this section.

Using Intermediate XML Documents

When we think of XML structures, we tend to think of the input documents and the results. However, XQuery also allows you to create intermediate XML structures in your queries that are not included in the results. This can be useful for many reasons, among them creating lookup tables and narrowing down input documents to reduce complexity or improve performance.

Creating Lookup Tables

Suppose you want to create a summary of the product catalog that lists the departments. However, you would like to convert the department codes to more descriptive names. You could use the query shown in Example 9-25 to accomplish this.

Example 9-25. Converting values without a lookup table

Query

let $cat := doc("catalog.xml")/catalog
for $dept in distinct-values($cat/product/@dept)
return  <li>Department: {if ($dept = "ACC")
                        then "Accessories"
                        else if ($dept = "MEN")
                             then "Menswear"
                             else if ($dept = "WMN")
                                  then "Womens"
                                  else ()
               }  ({$dept})</li>

Results

<li>Department: Womens  (WMN)</li>
<li>Department: Accessories  (ACC)</li>
<li>Department: Menswear  (MEN)</li>

This gives the desired results, namely a descriptive name for the department, with the department code in parentheses. However, the query is somewhat cluttered, and anyone maintaining the query would have to be careful to insert any new department codes in the right place, using the right XQuery syntax. A more elegant solution is shown in Example 9-26, which uses an intermediate XML structure as a lookup table. It has the same results as the previous example.

Example 9-26. Converting values with a lookup table
let $deptNames := <deptNames>
                    <dept code="ACC" name="Accessories"/>
                    <dept code="MEN" name="Menswear"/>
                    <dept code="WMN" name="Womens"/>
                  </deptNames>
let $cat := doc("catalog.xml")/catalog
for $dept in distinct-values($cat/product/@dept)
return <li>Department: {data($deptNames/dept[@code = $dept]/@name)
                  }  ({$dept})</li>

Just as you can use path expressions on the input documents, you can use them to traverse the intermediate XML structure. So, the expression:

$deptNames/dept[@code = $dept]/@name

traverses the deptNames structure looking for the department name where the code matches the department code in question.

This solution is easier to maintain and it makes the mappings more obvious. Of course, if this is a general-purpose and unchanging lookup table that might be used in many queries, it can alternatively be stored as a separate XML document that is referenced using the doc function.

Reducing Complexity

In the previous example, the intermediate XML was hardcoded into the query. You can also build a temporary XML structure in the query, based on values from the input data. This can be handy to reduce the complexity of an input document before performing further querying or transformation on it.

Suppose you want to perform a join on the product catalog and the order. With the results of the join, you want to create an HTML table that formats the information. While this can probably be done in one FLWOR, it may be easier to write (and read!) a query that does it in two steps.

Such a query is shown in Example 9-27. It constructs a series of item elements that have attributes representing all the data items from the join. It binds the variable $tempResults to the six resulting item elements by using a let clause. Within the main return clause, it uses an embedded FLWOR to iterate through the item elements and turn them into table rows (tr elements).

Example 9-27. Reducing complexity

Query

let $tempResults:= for $item in doc("order.xml")//item,
                       $prod in doc("catalog.xml")//product
                   where $item/@num = $prod/number
                   return <item num="{$item/@num}" name="{$prod/name}"
                                color="{$item/@color}"
                                quant="{$item/@quantity}"/>
return <table>
        <tr>
          <th>#</th><th>Name</th><th>Color</th><th>Quan</th>
        </tr>
        {for $lineItem in $tempResults
         return <tr>
                  <td>{data($lineItem/@num)}</td>
                  <td>{data($lineItem/@name)}</td>
                  <td>{data($lineItem/@color)}</td>
                  <td>{data($lineItem/@quant)}</td>
                </tr>
        }
     </table>

Value of $tempResults

<item num="557" color="navy" name="Fleece Pullover" quant="1"/>
<item num="563" color="" name="Floppy Sun Hat" quant="1"/>
<item num="443" color="" name="Deluxe Travel Bag" quant="2"/>
<item num="784" color="white" name="Cotton Dress Shirt" quant="1"/>
<item num="784" color="gray" name="Cotton Dress Shirt" quant="1"/>
<item num="557" color="black" name="Fleece Pullover" quant="1"/>

Partial Results

<table>
   <tr>
      <th>#</th><th>Name</th><th>Color</th><th>Quan</th>
   </tr>

   <tr>
      <td>557</td>
      <td>Fleece Pullover</td>
      <td>navy</td>
      <td>1</td>
   </tr>
   <!-- ... -->
</table>

In this case, the example input documents are fairly simple, so this approach may be overkill. However, as documents and the joins between them become more complex, intermediate XML results can be very useful in simplifying queries.

This technique is sometimes called pipelining. Rather than putting the whole pipeline in one query, you could also consider chaining together a sequence of separate queries. This makes it easier to reuse each of the queries in different pipelines. A good use case for pipelining is to handle variants of the input vocabulary—for example, different flavors of RSS. Rather than have one query that handles all the variations, you can build a pipeline in which you first convert a particular variant to your chosen “canonical form,” and then operate on that. It’s also possible to combine XQuery with other technologies (such as XSLT) by using pipelines, where different technologies are used to implement different steps in the pipeline.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
13.59.217.167