XQuery is a new language under development by the W3C that’s designed to query collections of XML data. XQuery provides a mechanism to efficiently and easily extract data from XML documents or from any data source that can be viewed as XML, such as relational databases.
XQuery (http://www.w3.org/XML/Query) provides a powerful mechanism to pull XML content from multiple sources and dynamically generate new content using a programmer-friendly declarative language. The XQuery code in Example 2-12 (shakes.xqy) formats in XHTML a list of unique speakers in each act of Shakespeare’s play Hamlet. The hamlet.xml file can be found at http://www.oasis-open.org/cover/bosakShakespeare200.html.
Example 2-12. A simple XQuery to search Shakespeare (shakes.xqy)
<html><head/><body>
{
  for $act in doc("hamlet.xml")//ACT
  let $speakers := distinct-values($act//SPEAKER)
  return
    <span>
      <h1>{ $act/TITLE/text() }</h1>
      <ul>
      {
        for $speaker in $speakers
        return <li>{ $speaker }</li>
      }
      </ul>
    </span>
}
</body></html>
This example demonstrates an XQuery FLWOR (pronounced flower) expression. The name comes from the five possible clauses of the expression: for, let, where, order by, and return. Example 2-12 says that for every ACT element appearing at any level in the hamlet.xml file, let the $speakers variable equal the distinct values of all the SPEAKER elements found under that instance of ACT. Then for every $act and $speakers value, return the $act's TITLE text using an h1 element followed by a ul listing of every speaker in an li element.
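Example 2-12 uses only three of the five clauses. The following is a minimal sketch of a FLWOR expression that also uses where and order by. The inline $play element is hypothetical sample data standing in for a real play file, and the threshold of one speaker is an arbitrary choice for illustration.

```xquery
(: Hypothetical inline data; a real query would use doc("hamlet.xml") :)
let $play :=
  <PLAY>
    <ACT>
      <TITLE>ACT I</TITLE>
      <SPEAKER>BERNARDO</SPEAKER>
      <SPEAKER>FRANCISCO</SPEAKER>
      <SPEAKER>BERNARDO</SPEAKER>
    </ACT>
    <ACT>
      <TITLE>ACT II</TITLE>
      <SPEAKER>REYNALDO</SPEAKER>
    </ACT>
  </PLAY>
for $act in $play//ACT
let $speakers := distinct-values($act//SPEAKER)
(: where filters out tuples; order by sorts the tuples that remain :)
where count($speakers) > 1
order by $act/TITLE
return <li>{ string($act/TITLE) }: { count($speakers) } speakers</li>
```

With this sample data only ACT I passes the where filter, because ACT II has a single distinct speaker.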
XML is a native data type of XQuery and can be used in queries directly without quoted strings, objects, or other tricks. You separate XML elements from enclosed expressions using curly braces. Example 2-13 shows the query result (using ellipses to shorten the output).
Example 2-13. Shakespeare speakers
<html>
  <span>
    <h1>ACT I</h1>
    <ul>
      <li>BERNARDO</li><li>FRANCISCO</li><li>HORATIO</li> ...
    </ul>
  </span>
  <span>
    <h1>ACT II</h1>
    <ul>
      <li>LORD POLONIUS</li><li>REYNALDO</li><li>OPHELIA</li> ...
    </ul>
  </span>
  <span>
    <h1>ACT III</h1>
    <ul>
      <li>KING CLAUDIUS</li><li>ROSENCRANTZ</li><li>GUILDENSTERN</li> ...
    </ul>
  </span>
  <span>
    <h1>ACT IV</h1>
    <ul>
      <li>KING CLAUDIUS</li><li>QUEEN GERTRUDE</li><li>HAMLET</li> ...
    </ul>
  </span>
  <span>
    <h1>ACT V</h1>
    <ul>
      <li>First Clown</li><li>Second Clown</li><li>HAMLET</li> ...
    </ul>
  </span>
</html>
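The role of curly braces in element content can be seen in isolation with a small sketch. The $title variable and the div wrapper here are hypothetical and are not part of the Shakespeare examples.

```xquery
(: Hypothetical variable used only to show how braces are treated :)
let $title := "ACT I"
return
  <div>
    <p>Literal text: $title</p>
    <p>Evaluated: { $title }</p>
    <p>Doubled braces are literal: {{ $title }}</p>
  </div>
```

Only the second paragraph receives the value ACT I. The first keeps the characters $title as plain text, and the third outputs a single pair of literal braces around $title, since doubled braces are the escape for a brace character.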
Example 2-14 (speakers.xqy) demonstrates a more advanced form of the query. This longer query pulls content from multiple source documents and beautifies the speaker names so that they’re always printed in standard case (capitalized first letters only).
Example 2-14. An XQuery with multiple inputs and beautified output
(: Convert a single word to standard case: first letter upper, rest lower :)
declare function local:singleWordCase($name as xs:string) as xs:string {
  if ($name = "") then ""
  else
    let $first := substring($name, 1, 1)
    let $rest := substring($name, 2)
    let $firstUpper := upper-case($first)
    let $restLower := lower-case($rest)
    return concat($firstUpper, $restLower)
};

(: Split on whitespace, fix the case of each word, and rejoin with spaces :)
declare function local:multiWordCase($name as xs:string) as xs:string {
  string-join(
    let $words := tokenize($name, "\s+")
    for $word in $words
    return local:singleWordCase($word)
  , " ")
};

<html><head/><body>
{
  for $file in ("all_well.xml", "dream.xml", "hamlet.xml", "lear.xml",
                "macbeth.xml", "merchant.xml", "much_ado.xml", "r_and_j.xml")
  let $play := doc($file)
  let $speakers := distinct-values($play//SPEAKER)
  order by $play/PLAY/TITLE/text()
  return
    <span>
      <h1>{ $play/PLAY/TITLE/text() }</h1>
      <ul>
      {
        for $speaker in $speakers
        let $speakerPretty := local:multiWordCase($speaker)
        order by $speakerPretty
        return <li>{ $speakerPretty }</li>
      }
      </ul>
    </span>
}
</body></html>
The top portion of the query defines two functions to handle the conversion of names to standard case. The first function, singleWordCase(), placed in the special local namespace, takes an xs:string source name and returns an xs:string that is the input parameter converted to standard case. Typing is optional in XQuery. When used, typing is based on XML Schema types (http://www.w3.org/TR/xquery/#id-types). The first line of the function short-circuits so that if the $name is empty, then the expression evaluates to empty; otherwise, the second half of the expression gets evaluated. Assuming $name is non-empty, we assign $first to its first character and $rest to the remainder, uppercase the $first, lowercase the $rest, and return the concatenation. The return keyword is not used to return a value but rather as a clause of a FLWOR expression. A better name for it might have been do.
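The point about return can be illustrated with a function whose body contains no FLWOR expression and therefore no return keyword at all. This is a minimal sketch; local:initial() is a hypothetical helper that does not appear in Example 2-14.

```xquery
(: Hypothetical helper: the function body is a single if/then/else
   expression, and its value is the value of the function :)
declare function local:initial($name as xs:string) as xs:string {
  if ($name = "") then ""
  else upper-case(substring($name, 1, 1))
};

local:initial("hamlet")
```

The final line evaluates to the string H. The value of the function is simply the value of its body expression, which is why return appears only where a FLWOR expression needs it.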
The second function, multiWordCase(), tokenizes the input string based on whitespace characters (\s is the regular-expression pattern for a whitespace character and the + modifier means “one or more”). Then for every word returned by that tokenization, it executes singleWordCase(), with the results joined together by the string-join() function, which adds a space between each reformatted word.
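The tokenize-and-join pattern can be tried on its own. The sketch below uses a hypothetical input string with several spaces between the two words, and lower-case() stands in for the per-word conversion.

```xquery
(: "\s+" matches one or more whitespace characters, so a run of
   spaces between words yields no empty tokens in the middle :)
string-join(
  for $word in tokenize("LORD   POLONIUS", "\s+")
  return lower-case($word)
, " ")
```

This evaluates to lord polonius with a single space between the words. A backslash has no special meaning inside an XQuery string literal, so "\s+" reaches the regular-expression engine unchanged.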
The query body executes against eight plays that have been named explicitly. For every $file in the list we assign the $play variable to be the document node associated with that document name. Then we use distinct-values() to calculate the unique speakers in the $play. The order by clause of the FLWOR expression orders the tuples (ordered sequences of values) coming out of the for and let clauses so that the tuples are sorted alphabetically by the play’s title text. The return clause is evaluated once for each tuple and prints the play title followed by the list of unique speakers in the play, beautified and sorted alphabetically. The result appears in Example 2-15.
Example 2-15. More Shakespeare speakers
<html>
  <span>
    <h1>A Midsummer Night's Dream</h1>
    <ul>
      <li>All</li><li>Bottom</li><li>Cobweb</li><li>Demetrius</li> ...
    </ul>
  </span>
  <span>
    <h1>All's Well That Ends Well</h1>
    <ul>
      <li>All</li><li>Bertram</li><li>Both</li><li>Both</li>
      <li>Clown</li> ...
    </ul>
  </span>
  <span>
    <h1>Much Ado about Nothing</h1>
    <ul>
      <li>Antonio</li><li>Balthasar</li><li>Beatrice</li>
      <li>Benedick</li> ...
    </ul>
  </span>
  <span>
    <h1>The Merchant of Venice</h1>
    <ul>
      <li>All</li><li>Antonio</li><li>Arragon</li>
      <li>Balthasar</li> ...
    </ul>
  </span>
  <span>
    <h1>The Tragedy of Hamlet, Prince of Denmark</h1>
    <ul>
      <li>All</li><li>Bernardo</li><li>Captain</li>
      <li>Cornelius</li> ...
    </ul>
  </span>
  <span>
    <h1>The Tragedy of King Lear</h1>
    <ul>
      <li>Albany</li><li>Burgundy</li><li>Captain</li>
      <li>Cordelia</li> ...
    </ul>
  </span>
  <span>
    <h1>The Tragedy of Macbeth</h1>
    <ul>
      <li>All</li><li>Angus</li><li>Attendant</li>
      <li>Banquo</li> ...
    </ul>
  </span>
  <span>
    <h1>The Tragedy of Romeo and Juliet</h1>
    <ul>
      <li/><li>Abraham</li><li>Apothecary</li>
      <li>Balthasar</li> ...
    </ul>
  </span>
</html>
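The sorting step of the inner FLWOR expression, where order by uses a value computed in a let clause rather than the raw input, can be sketched in isolation. The three names are a hypothetical sample, and the case conversion is written inline so that the fragment does not depend on any declared function.

```xquery
(: Each tuple holds a $name and its computed $pretty form;
   order by sorts the tuples on the computed value :)
for $name in ("OPHELIA", "HAMLET", "HORATIO")
let $pretty := concat(substring($name, 1, 1),
                      lower-case(substring($name, 2)))
order by $pretty
return <li>{ $pretty }</li>
```

The items come out in the order Hamlet, Horatio, Ophelia, regardless of their order in the input sequence.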
Development of XQuery 1.0 is not yet complete. As of this writing, the W3C specification documents are in Last Call (http://www.w3.org/XML/Query#specs). It looks like there will be a second Last Call before the specifications proceed to candidate recommendation (with two more formal stages after that). The example code shown here was written against the Last Call draft from November 2003.
You can find pointers to the XQuery specifications, online articles, mailing lists, and a community Wiki at http://www.xquery.com.
—Jason Hunter