Chapter 3
IN THIS CHAPTER
Discovering XQuery
Finding out about FLWOR expressions
Comparing XQuery with SQL
Early in the development of XML, one of the primary concerns was the conversion of data stored in SQL databases to XML so that it could be transmitted to other, incompatible data stores or so that it could be displayed on the web. In Book 6, Chapter 1, I describe several SQL functions whose purposes are to perform such conversions. Converting XML to SQL is also an important endeavor because SQL has traditionally been the premier tool for extracting the information you want from a collection of data. This conversion usually takes the form of shredding, in which an XML document is torn apart and pieces of it flow into the columns of tables in an SQL database. Then queries can be made by using normal SQL
statements.SELECT
Querying XML documents directly, without shredding them into an SQL database, is much more complicated; thus, that capability took a while longer to implement in standard form. In addition, many common examples of XML documents aren’t readily shredded. The tree structure of an XML document can be difficult to translate into the row-and-column structure of a relational database. Consequently, several years of development of XML query facilities have been required to produce the XQuery standard, currently at version 3.1, which defines how to query an XML document directly and retrieve the information you want.
XQuery, like XML itself, is a vast topic, which I don’t cover in detail here. I describe it briefly, however, and give some examples of its use. These examples are in no way comprehensive but should give you an idea of what you can do with XQuery. For an in-depth treatment, I recommend Querying XML, by Jim Melton and Stephen Buxton (published by Morgan Kaufmann Publishers).
XQuery is a nonprocedural language specifically designed to retrieve desired information from XML documents, just as SQL is a nonprocedural language specifically designed to retrieve desired information from relational databases. Whereas relational databases are highly structured, XML documents can be characterized as semistructured. What an XML document looks like varies a lot more than what a relational database looks like.
XQuery is the result of combining the best parts of several predecessor languages. For a long time, it’s been clear that being able to query XML documents directly has great value. Several groups worked on the problem and came up with a variety of query languages. One of those languages is XQL, written in 1998 by Jonathan Robie, who worked for Software AG at the time. An unrelated language, also named XQL, was created at Fujitsu Labs at the same time, but never developed beyond the prototype stage.
At about the same time, a language named XML-QL emerged from a collaboration of several researchers. Stanford University joined the game with a project named Lore and a language named Lorel, which had object-oriented characteristics. At Institut National de Recherche en Informatique et en Automatique (INRIA), the French National Institute for Research in Computer Science and Control, a research language named YATL was developed. Rounding out the predecessors, a language named Quilt was developed by Don Chamberlin (one of the authors of SQL), Jonathan Robie, and Daniela Florescu, all of IBM.
Although XQuery probably owes more to Quilt than to any of the other predecessor languages, it takes ideas from all the others and benefits from all that has gone before.
The XQuery 3.1 language specification defines what must be true of a language for it to qualify as an XQuery implementation. Actually, it defines three levels. Some things must be true, others should be true, and some other things may be true. XQuery is evolving, and it’s not yet clear what will be mandatory in the future.
Among the things that the XQuery 3.1 language specification defines are the following requirements:
XQuery 1.0 doesn’t have any update capability, but an extension named XQuery Update 1.0 does. However, at the time of this writing, not all vendors support XQuery Update 1.0. Oracle and Microsoft, among others, offer proprietary update solutions (discussed in Book 6, Chapter 2).
The XQuery requirements document specifies several things that an XQuery implementation must do, as well as things that it should do and things that it may do. Here are some of those functionality requirements:
<element></element>
. Text may go beyond such a boundary. If it does, queries must be able to express simple conditions on that text.NULL
values.These requirements are a partial list. XQuery 3.1 meets all the requirements I have listed as must or should. Other requirements in the XQuery Requirements may or may not have been met by XQuery 3.1.
The World Wide Web Consortium (www.w3.org
) has developed a set of 77 use cases that cover 9 categories of queries. In each case, a query is applied to the supplied input data, and the expected results are given. You can use these use cases as a starting point in testing an XQuery implementation to see whether it’s more or less working. An exhaustive test suite, which tests every possibility, would take thousands of such cases, but these 77 are a good start. In this section, I show you just one such case.
Because this book just happens to be about SQL, one category of data that you may want to query using XQuery is data stored in a relational database. You can find such a case at paragraph 1.4.1 of the XML Query Use Cases document, which is available at www.w3.org/TR/xquery-use-cases
. I reproduce that case in this section.
The case takes data from a simplified version of an online auction. Three tables are involved: USERS, ITEMS, and BIDS. The USERS table contains information on buyers and sellers. The ITEMS table lists items currently for sale or that have recently been for sale. The BIDS table contains all the bids on record.
Here are the tables and the columns that they contain:
USERS (<b>USERID</b>, NAME, RATING)
ITEMS (<b>ITEMNO</b>, DESCRIPTION, OFFERED_BY, START_DATE, END_DATE,
RESERVE_PRICE)
BIDS (<b>USERID</b>, <b>ITEMNO</b>, BID, BID_DATE)
USERID is the primary key of the USERS table; ITEMNO is the primary key of the ITEMS table, and the combination of USERID and ITEMID is the composite primary key of the BIDS table.
The relational database tables correspond to input documents named users.xml
, items.xml
, and bids.xml
. The correspondence between the tables and the XML documents is specified by the following Document Type Definition (DTD):
<!DOCTYPE users [
<!ELEMENT users (user_tuple*)>
<!ELEMENT user_tuple (userid, name, rating?)>
<!ELEMENT userid (#PCDATA)>
<!ELEMENT name (#PCDATA)>
<!ELEMENT rating (#PCDATA)> ]>
<!DOCTYPE items [
<!ELEMENT items (item_tuple*)>
<!ELEMENT item_tuple (itemno, description, offered_by,
start_date?, end_date?, reserve_price?)>
<!ELEMENT itemno (#PCDATA)>
<!ELEMENT description (#PCDATA)>
<!ELEMENT offered_by (#PCDATA)>
<!ELEMENT start_date (#PCDATA)>
<!ELEMENT end_date (#PCDATA)>
<!ELEMENT reserve_price (#PCDATA)> ]>
<!DOCTYPE bids [
<!ELEMENT bids (bid_tuple*)>
<!ELEMENT bid_tuple (userid, itemno, bid, bid_date)>
<!ELEMENT userid (#PCDATA)>
<!ELEMENT itemno (#PCDATA)>
<!ELEMENT bid (#PCDATA)>
<!ELEMENT bid_date (#PCDATA)> ]>
The input data is contained in Table 3-1, Table 3-2, and Table 3-3.
TABLE 3-1 USERS
USERID |
Name |
Rating |
U01 |
Tom Jones |
B |
U02 |
Mary Doe |
A |
U03 |
Dee Linquent |
D |
U04 |
Roger Smith |
C |
U05 |
Jack Sprat |
B |
U06 |
Rip Van Winkle |
B |
TABLE 3-2 ITEMS
ITEMID |
Description |
Offered_By |
Start_Date |
End_Date |
Reserve_Price |
1001 |
Red Bicycle |
U01 |
1999-01-05 |
1999-01-20 |
40 |
1002 |
Motorcycle |
U02 |
1999-02-11 |
1999-03-15 |
500 |
1003 |
Old Bicycle |
U02 |
1999-01-10 |
1999-02-20 |
25 |
1004 |
Tricycle |
U01 |
1999-02-25 |
1999-03-08 |
15 |
1005 |
Tennis Racquet |
U03 |
1999-03-19 |
1999-04-30 |
20 |
1006 |
Helicopter |
U03 |
1999-05-05 |
1999-05-25 |
50000 |
1007 |
Racing Bicycle |
U04 |
1999-01-20 |
1999-02-20 |
200 |
1008 |
Broken Bicycle |
U01 |
1999-02-05 |
1999-03-06 |
25 |
TABLE 3-3 BIDS
USERID |
ITEMNO |
Bid |
Bid_Date |
U02 |
1001 |
35 |
1999-01-07 |
U04 |
1001 |
40 |
1999-01-08 |
U02 |
1001 |
45 |
1999-01-11 |
U04 |
1001 |
50 |
1999-01-13 |
U02 |
1001 |
55 |
1999-01-15 |
U01 |
1002 |
400 |
1999-02-14 |
U02 |
1002 |
600 |
1999-02-16 |
U03 |
1002 |
800 |
1999-02-17 |
U04 |
1002 |
1000 |
1999-02-25 |
U02 |
1002 |
1200 |
1999-03-02 |
U04 |
1003 |
15 |
1999-01-22 |
U05 |
1003 |
20 |
1999-02-03 |
U01 |
1004 |
40 |
1999-03-05 |
U03 |
1007 |
175 |
1999-01-25 |
U05 |
1007 |
200 |
1999-02-08 |
U04 |
1007 |
225 |
1999-02-12 |
The XML representation of this tabular data serves as the input to the query. Following is a truncated version of the XML (truncated because the full XML files are lengthy, with nothing new beyond the first element in each):
<items>
<item_tuple>
<itemno>1001</itemno>
<description>Red Bicycle</description>
<offered_by>U01</offered_by>
<start_date>1999-01-05</start_date>
<end_date>1999-01-20</end_date>
<reserve_price>40</reserve_price>
</item_tuple>
<!-- !!! Snip !!! -->
<users>
<user_tuple>
<userid>U01</userid>
BookTom Jones</name>
<rating>B</rating>
</user_tuple>
<!--!!! Snip !!! -->
<bids>
<bid_tuple>
<userid>U02</userid>
<itemno>1001</itemno>
<bid>35</bid>
<bid_date>1999-01-07</bid_date>
</bid_tuple>
<bid_tuple>
<!-- !!! Snip !!! -->
Here’s one of the queries run against this data: List the item numbers and descriptions of all bicycles that currently have an auction in progress, ordered by item number. This query is expressed in XQuery as follows:
<result>
{
for $i in doc("items.xml")//item_tuple
where $i/start_date <= current-date()
and $i/end_date >= current-date()
and contains($i/description, "Bicycle")
order by $i/itemno
return
<item_tuple>
{ $i/itemno }
{ $i/description }
</item_tuple>
}
</result>
Assuming that the current date is 1999-01-31, the expected result of the query is
<result>
<item_tuple>
<itemno>1003</itemno>
<description>Old Bicycle</description>
</item_tuple>
<item_tuple>
<itemno>1007</itemno>
<description>Racing Bicycle</description>
</item_tuple>
</result>
The auctions for the old bicycle and the racing bicycle were active on the last day of January 1999.
FLWOR means for, let, while, order by, and return. FLWOR
expressions are to XQuery what SELECT
expressions are to SQL. They’re the constructs you use to ask questions of an XML document. A FLWOR
expression must contain either a for
clause or a let
clause as well as a return
clause; optionally, it may also include while
and order by
clauses. Here’s the syntax of a FLWOR
expression:
FLWORExpr ::= (ForClause | LetClause)+ WhereClause? OrderByClause? "return" ExprSingle
ForClause ::= "for" "$" VarName TypeDeclaration? PositionalVar? "in" ExprSingle ("," "$" VarName TypeDeclaration? PositionalVar? "in" ExprSingle)*
PositionalVar ::= "at" "$" VarName
LetClause ::= "let" "$" VarName TypeDeclaration? ":=" ExprSingle ("," "$" VarName TypeDeclaration? ":=" ExprSingle)*
WhereClause ::= "where" ExprSingle
OrderByClause ::= ("order" "by" | "stable" "order" "by") OrderSpecList
OrderSpecList ::= OrderSpec ("," OrderSpec)*
OrderModifier ::= ("ascending" | "descending")? ("empty" "greatest" | "empty" "least")? ("collation" URILiteral)?
In the preceding code, the following conventions are used:
"A?"
means that A
is optional."A|B"
means either A
or B
but not both."A+"
means one or more occurrences of A
."A*"
means zero or more occurrences of A
.To see how to use a FLWOR
expression on an example XML document that corresponds to an SQL table, consider this sample document:
<?xml version="1.0" encoding="UTF-8"?>
<customer xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<row>
<FirstName>Abe</FirstName>
<LastName>Abelson</LastName>
<City>Springfield</City>
<AreaCode>714</AreaCode>
<Telephone>555-1111</Telephone>
</row>
<row>
<FirstName>Bill</FirstName>
<LastName>Bailey</LastName>
<City>Decatur</City>
<AreaCode>714</AreaCode>
<Telephone>555-2222</Telephone>
</row>
<row>
<FirstName>Chuck</FirstName>
<LastName>Wood</LastName>
<City>Philo</City>
<AreaCode>714</AreaCode>
<Telephone>555-3333</Telephone>
</row>
</customer>
Keep this example in mind as you make your way through the for
s, let
s, while
s, order by
s, and return
s in the following sections.
A FLWOR
expression must have at least a for
clause (or a let
clause) and a return
clause. For purposes of illustrating the effect of the for
clause, here are a for
fragment and the result of that much of a FLWOR
expression:
for $c in customer/row
This code selects the row
element in the customer XML document. The result is
$c: <row>
<FirstName>Abe</FirstName>
<LastName>Abelson</LastName>
<City>Springfield</City>
<AreaCode>714</AreaCode>
<Telephone>555-1111</Telephone>
</row>
$c: <row>
<FirstName>Bill</FirstName>
<LastName>Bailey</LastName>
<City>Decatur</City>
<AreaCode>714</AreaCode>
<Telephone>555-2222</Telephone>
</row>
$c: <row>
<FirstName>Chuck</FirstName>
<LastName>Wood</LastName>
<City>Philo</City>
<AreaCode>714</AreaCode>
<Telephone>555-3333</Telephone>
</row>
The result includes three instances of the variable $c
, each one being bound to a separate row element in the binding sequence from the original document. This result is called a tuple stream. The for
clause iterates over the tuples in the binding sequence, binding the variable to each of the tuples in the sequence in turn.
To show the effect of the let
clause, I need a second document example to go along with the customer
example. Here’s that document:
<?xml version="1.0" encoding="UTF-8"?>
<product xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<row rating="0">
<ProdNo>101</ProdNo>
BookFirecracker 1</Name>
<Size>Big</Size>
</row>
<row rating="1">
<ProdNo>102</ProdNo>
BookFirecracker 2</Name>
<Size>Huge</Size>
</row>
<row rating="3">
<ProdNo>103</ProdNo>
BookFirecracker 3</Name>
<Size>Tremendous</Size>
</row>
</product>
Whereas the for
clause iterates over the items in the binding sequence, the let
clause binds its variables with the entire sequence. A let
clause that binds multiple variables generates a single tuple containing all the variable bindings. Here’s an example:
let $c := /customer/row, $p := /product/row/ProdNo
This code produces the following result:
$c: <row> $p: 101
<FirstName>Abe</FirstName> 102
<LastName>Abelson</LastName> 103
<City>Springfield</City>
<AreaCode>714</AreaCode>
<Telephone>555-1111</Telephone>
</row>
<row>
<FirstName>Bill</FirstName>
<LastName>Bailey</LastName>
<City>Decatur</City>
<AreaCode>714</AreaCode>
<Telephone>555-2222</Telephone>
</row>
<row>
<FirstName>Chuck</FirstName>
<LastName>Wood</LastName>
<City>Philo</City>
<AreaCode>714</AreaCode>
<Telephone>555-3333</Telephone>
</row>
The result of a for
clause or a let
clause contains some data that you want, but it likely also contains some irrelevant data that you don’t want. You can filter out the data that you don’t want with a where
clause. Consider the following XQuery fragment:
for $p at $i in product/row
where $i = $p/@rating
This code produces the following result:
$p: <row rating="3"> $i 3
<ProdNo>103</ProdNo>
BookFirecracker 3</Name>
<Size>Tremendous</Size>
</row>
Only in the case of the last product does the element counter equal the value of the rating
attribute.
The order by
clause, as you would expect, specifies an order for the items in the result, according to the ordering specification (OrderSpec
). The ordering specification contains a single expression (ExprSingle
) and an optional ordering modifier (OrderModifier
). You can add an order by
clause to one of the query fragments in the preceding sections to see how it affects the result, as follows:
for $c in customer/row
order by $c/LastName descending
This code selects the row element in the customer
XML document, listing the tuples in descending order by last name. The result is
$c: <row>
<FirstName>Chuck</FirstName>
<LastName>Wood</LastName>
<City>Philo</City>
<AreaCode>714</AreaCode>
<Telephone>555-3333</Telephone>
</row>
$c: <row>
<FirstName>Bill</FirstName>
<LastName>Bailey</LastName>
<City>Decatur</City>
<AreaCode>714</AreaCode>
<Telephone>555-2222</Telephone>
</row>
$c: <row>
<FirstName>Abe</FirstName>
<LastName>Abelson</LastName>
<City>Springfield</City>
<AreaCode>714</AreaCode>
<Telephone>555-1111</Telephone>
</row>
Because you’re ordering by last name in descending order, the record for Chuck Wood precedes that of Bill Bailey, which comes before the record for Abe Abelson.
The return
clause specifies what’s returned by the FLWOR
expression after all the other clauses have had a chance to have an effect. The ExprSingle
in the return clause is evaluated once for each tuple produced by the combined activity of the other clauses. The result is a sequence of values. Adding a return
clause to the query fragment in the preceding section to make a complete query results in the following:
<result>
{
for $c in doc("customer.xml")
order by $c/row/LastName descending
return
<row>
{ $c/FirstName }
{ $c/LastName }
</row>
}
</result>
This code selects the row
element in the customer
XML document, listing the tuples in descending order by last name. Then it outputs the first and last names in the tuples. The result is
<result>
<row>
<FirstName>Chuck</FirstName>
<LastName>Wood</LastName>
</row>
<row>
<FirstName>Bill</FirstName>
<LastName>Bailey</LastName>
</row>
<row>
<FirstName>Abe</FirstName>
<LastName>Abelson</LastName>
</row>
</result>
The data in an XML document is in quite a different form from the data in an SQL table, but some similarities exist between the two worlds. XQuery’s FLWOR
expression corresponds to some extent with the SQL SELECT
expression. Both expressions are used to retrieve the data you want from a collection of data, most of which you don’t want at the moment.
Although XQuery’s let
clause has no analog in SQL, the XQuery for
clause is related to the SQL FROM
clause in that both specify the source of the data. XQuery’s order by
clause serves the same purpose that SQL’s ORDER
BY
clause serves. In both XQuery and SQL, the where
clause filters out data that you don’t want to include in the result. SQL’s GROUP BY
and HAVING
clauses have no analogs in XQuery.
Some of XQuery’s data types correspond to SQL data types; others don’t. XQuery has some data types that don’t correspond to any SQL data types, and vice versa.
Table 3-4 lists the XQuery data types and, where applicable, the corresponding SQL types. Where no corresponding type exists, a dash serves as a placeholder.
TABLE 3-4 XQuery 1.0 Data Types and Corresponding SQL Data Types
XQuery 1.0 Data Types |
SQL Data Types |
|
|
|
— |
|
— |
|
— |
|
— |
|
— |
|
— |
|
— |
|
— |
|
— |
|
— |
|
— |
|
— |
|
|
|
|
|
|
|
— |
|
— |
|
|
|
|
|
|
|
— |
|
— |
|
— |
|
— |
|
— |
|
— |
|
— |
|
|
|
|
|
— |
|
— |
|
|
|
|
|
|
|
— |
|
— |
|
— |
|
— |
|
— |
|
|
|
|
|
— |
|
— |
|
— |
|
|
|
|
|
|
|
— |
|
— |
Node types |
Structured user-defined types (UDTs) |
User-defined complex types |
Structured UDTs |
— |
|
— |
|
List types and sequences |
|
List types and sequences |
|
— |
|
Clearly, a lot more XQuery types are available than SQL types. In most cases, casting an SQL type to an XQuery type isn’t a problem, but going the other way may be a challenge.
18.118.137.67