Search in book...
Toggle Font Controls
Create new playlist

Name your new playlist

Playlist description (optional)
Sign In

Email address

Password

Forgot Password?

or

Continue with Facebook

Continue with Google
Sign Up

Full Name

Email address

Confirm Email Address

Password

or

Continue with Facebook

Continue with Google

Chapter 3

Retrieving Data from XML Documents

IN THIS CHAPTER

Discovering XQuery

Finding out about FLWOR expressions

Comparing XQuery with SQL

Early in the development of XML, one of the primary concerns was the conversion of data stored in SQL databases to XML so that it could be transmitted to other, incompatible data stores or so that it could be displayed on the web. In Book 6, Chapter 1, I describe several SQL functions whose purposes are to perform such conversions. Converting XML to SQL is also an important endeavor because SQL has traditionally been the premier tool for extracting the information you want from a collection of data. This conversion usually takes the form of shredding, in which an XML document is torn apart and pieces of it flow into the columns of tables in an SQL database. Then queries can be made by using normal SQL SELECT statements.

Querying XML documents directly, without shredding them into an SQL database, is much more complicated; thus, that capability took a while longer to implement in standard form. In addition, many common examples of XML documents aren’t readily shredded. The tree structure of an XML document can be difficult to translate into the row-and-column structure of a relational database. Consequently, several years of development of XML query facilities have been required to produce the XQuery standard, currently at version 3.1, which defines how to query an XML document directly and retrieve the information you want.

XQuery, like XML itself, is a vast topic, which I don’t cover in detail here. I describe it briefly, however, and give some examples of its use. These examples are in no way comprehensive but should give you an idea of what you can do with XQuery. For an in-depth treatment, I recommend Querying XML, by Jim Melton and Stephen Buxton (published by Morgan Kaufmann Publishers).

XQuery

XQuery is a nonprocedural language specifically designed to retrieve desired information from XML documents, just as SQL is a nonprocedural language specifically designed to retrieve desired information from relational databases. Whereas relational databases are highly structured, XML documents can be characterized as semistructured. What an XML document looks like varies a lot more than what a relational database looks like.

When I say that SQL or XQuery is a nonprocedural language, I mean that a query — whether expressed in SQL or XQuery — describes what to do but not how to do it. The how is left up to the engines that process the SQL or XQuery code.

Where XQuery came from

XQuery is the result of combining the best parts of several predecessor languages. For a long time, it’s been clear that being able to query XML documents directly has great value. Several groups worked on the problem and came up with a variety of query languages. One of those languages is XQL, written in 1998 by Jonathan Robie, who worked for Software AG at the time. An unrelated language, also named XQL, was created at Fujitsu Labs at the same time, but never developed beyond the prototype stage.

At about the same time, a language named XML-QL emerged from a collaboration of several researchers. Stanford University joined the game with a project named Lore and a language named Lorel, which had object-oriented characteristics. At Institut National de Recherche en Informatique et en Automatique (INRIA), the French National Institute for Research in Computer Science and Control, a research language named YATL was developed. Rounding out the predecessors, a language named Quilt was developed by Don Chamberlin (one of the authors of SQL), Jonathan Robie, and Daniela Florescu, all of IBM.

Although XQuery probably owes more to Quilt than to any of the other predecessor languages, it takes ideas from all the others and benefits from all that has gone before.

What XQuery requires

The XQuery 3.1 language specification defines what must be true of a language for it to qualify as an XQuery implementation. Actually, it defines three levels. Some things must be true, others should be true, and some other things may be true. XQuery is evolving, and it’s not yet clear what will be mandatory in the future.

Among the things that the XQuery 3.1 language specification defines are the following requirements:

XQuery is a declarative language, and as such, it must not mandate an evaluation strategy. It describes what the processor should do, not how it should do it.
XQuery may have more than one syntax binding, but it must have one syntax that’s convenient for humans to read and one syntax expressed in XML that reflects the underlying structure of the query.
XQuery must define standard error conditions that can occur during execution of a query.

XQuery 1.0 doesn’t have any update capability, but an extension named XQuery Update 1.0 does. However, at the time of this writing, not all vendors support XQuery Update 1.0. Oracle and Microsoft, among others, offer proprietary update solutions (discussed in Book 6, Chapter 2).

XQuery functionality

The XQuery requirements document specifies several things that an XQuery implementation must do, as well as things that it should do and things that it may do. Here are some of those functionality requirements:

XQuery must support operations on all data types in the XQuery data model.
Queries must be able to express simple conditions on text, including on text that spans element boundaries. Element boundaries are specified by tags such as <element></element>. Text may go beyond such a boundary. If it does, queries must be able to express simple conditions on that text.
Operations on collections must include support for universal and existential quantifiers. “All” is an example of a universal quantifier and “some” is an example of an existential quantifier. When Aristotle said, “All men are mortal,” he was making a statement about all men, a universal statement. When he said, “Some men are Greeks,” he was making an existential statement. A least one man is a Greek, but one cannot conclude from the statement that all men are Greeks.
XQuery must be able to combine related information from different parts of a given document or from multiple documents.
XQuery must be able to compute summary information from a group of related document elements (aggregation).
XQuery must be able to sort query results.
XQuery must support NULL values.
Queries should be able to operate on literal data. Literal data is the data itself, rather than some representation of the data.
Queries must be able to perform simple operations on names, such as testing for equality in element names and attribute names, and processing instruction targets. Queries may perform more powerful operations on names.
XQuery should support the use of externally defined functions on all data types of the XQuery data model.
XQuery must be able to provide access to environmental information, such as current date, time, and time zone.

These requirements are a partial list. XQuery 3.1 meets all the requirements I have listed as must or should. Other requirements in the XQuery Requirements may or may not have been met by XQuery 3.1.

Usage scenarios

The World Wide Web Consortium (www.w3.org) has developed a set of 77 use cases that cover 9 categories of queries. In each case, a query is applied to the supplied input data, and the expected results are given. You can use these use cases as a starting point in testing an XQuery implementation to see whether it’s more or less working. An exhaustive test suite, which tests every possibility, would take thousands of such cases, but these 77 are a good start. In this section, I show you just one such case.

Because this book just happens to be about SQL, one category of data that you may want to query using XQuery is data stored in a relational database. You can find such a case at paragraph 1.4.1 of the XML Query Use Cases document, which is available at www.w3.org/TR/xquery-use-cases. I reproduce that case in this section.

The case takes data from a simplified version of an online auction. Three tables are involved: USERS, ITEMS, and BIDS. The USERS table contains information on buyers and sellers. The ITEMS table lists items currently for sale or that have recently been for sale. The BIDS table contains all the bids on record.

Here are the tables and the columns that they contain:

USERS (USERID, NAME, RATING)

ITEMS (ITEMNO, DESCRIPTION, OFFERED_BY, START_DATE, END_DATE,

RESERVE_PRICE)

BIDS (USERID, ITEMNO, BID, BID_DATE)

USERID is the primary key of the USERS table; ITEMNO is the primary key of the ITEMS table, and the combination of USERID and ITEMID is the composite primary key of the BIDS table.

The relational database tables correspond to input documents named users.xml, items.xml, and bids.xml. The correspondence between the tables and the XML documents is specified by the following Document Type Definition (DTD):

<!DOCTYPE users [

<!ELEMENT users (user_tuple*)>

<!ELEMENT user_tuple (userid, name, rating?)>

<!ELEMENT userid (#PCDATA)>

<!ELEMENT name (#PCDATA)>

<!ELEMENT rating (#PCDATA)> ]>

<!DOCTYPE items [

<!ELEMENT items (item_tuple*)>

<!ELEMENT item_tuple (itemno, description, offered_by,

start_date?, end_date?, reserve_price?)>

<!ELEMENT itemno (#PCDATA)>

<!ELEMENT description (#PCDATA)>

<!ELEMENT offered_by (#PCDATA)>

<!ELEMENT start_date (#PCDATA)>

<!ELEMENT end_date (#PCDATA)>

<!ELEMENT reserve_price (#PCDATA)> ]>

<!DOCTYPE bids [

<!ELEMENT bids (bid_tuple*)>

<!ELEMENT bid_tuple (userid, itemno, bid, bid_date)>

<!ELEMENT userid (#PCDATA)>

<!ELEMENT itemno (#PCDATA)>

<!ELEMENT bid (#PCDATA)>

<!ELEMENT bid_date (#PCDATA)> ]>

The input data is contained in Table 3-1, Table 3-2, and Table 3-3.

TABLE 3-1 USERS

USERID	Name	Rating
U01	Tom Jones	B
U02	Mary Doe	A
U03	Dee Linquent	D
U04	Roger Smith	C
U05	Jack Sprat	B
U06	Rip Van Winkle	B

TABLE 3-2 ITEMS

ITEMID	Description	Offered_By	Start_Date	End_Date	Reserve_Price
1001	Red Bicycle	U01	1999-01-05	1999-01-20	40
1002	Motorcycle	U02	1999-02-11	1999-03-15	500
1003	Old Bicycle	U02	1999-01-10	1999-02-20	25
1004	Tricycle	U01	1999-02-25	1999-03-08	15
1005	Tennis Racquet	U03	1999-03-19	1999-04-30	20
1006	Helicopter	U03	1999-05-05	1999-05-25	50000
1007	Racing Bicycle	U04	1999-01-20	1999-02-20	200
1008	Broken Bicycle	U01	1999-02-05	1999-03-06	25

TABLE 3-3 BIDS

USERID	ITEMNO	Bid	Bid_Date
U02	1001	35	1999-01-07
U04	1001	40	1999-01-08
U02	1001	45	1999-01-11
U04	1001	50	1999-01-13
U02	1001	55	1999-01-15
U01	1002	400	1999-02-14
U02	1002	600	1999-02-16
U03	1002	800	1999-02-17
U04	1002	1000	1999-02-25
U02	1002	1200	1999-03-02
U04	1003	15	1999-01-22
U05	1003	20	1999-02-03
U01	1004	40	1999-03-05
U03	1007	175	1999-01-25
U05	1007	200	1999-02-08
U04	1007	225	1999-02-12

The XML representation of this tabular data serves as the input to the query. Following is a truncated version of the XML (truncated because the full XML files are lengthy, with nothing new beyond the first element in each):

<items>

<item_tuple>

<itemno>1001</itemno>

<description>Red Bicycle</description>

<offered_by>U01</offered_by>

<start_date>1999-01-05</start_date>

<end_date>1999-01-20</end_date>

<reserve_price>40</reserve_price>

</item_tuple>

<users>

<user_tuple>

<userid>U01</userid>

BookTom Jones</name>

<rating>B</rating>

</user_tuple>

<bids>

<bid_tuple>

<userid>U02</userid>

<itemno>1001</itemno>

<bid>35</bid>

<bid_date>1999-01-07</bid_date>

</bid_tuple>

<bid_tuple>

Here’s one of the queries run against this data: List the item numbers and descriptions of all bicycles that currently have an auction in progress, ordered by item number. This query is expressed in XQuery as follows:

<result>

{

for $i in doc("items.xml")//item_tuple

where $i/start_date <= current-date()

and $i/end_date >= current-date()

and contains($i/description, "Bicycle")

order by $i/itemno

return

<item_tuple>

{ $i/itemno }

{ $i/description }

</item_tuple>

}

</result>

Assuming that the current date is 1999-01-31, the expected result of the query is

<result>

<item_tuple>

<itemno>1003</itemno>

<description>Old Bicycle</description>

</item_tuple>

<item_tuple>

<itemno>1007</itemno>

<description>Racing Bicycle</description>

</item_tuple>

</result>

The auctions for the old bicycle and the racing bicycle were active on the last day of January 1999.

FLWOR Expressions

FLWOR means for, let, while, order by, and return. FLWOR expressions are to XQuery what SELECT expressions are to SQL. They’re the constructs you use to ask questions of an XML document. A FLWOR expression must contain either a for clause or a let clause as well as a return clause; optionally, it may also include while and order by clauses. Here’s the syntax of a FLWOR expression:

FLWORExpr ::= (ForClause | LetClause)+ WhereClause? OrderByClause? "return" ExprSingle

ForClause ::= "for" "$" VarName TypeDeclaration? PositionalVar? "in" ExprSingle ("," "$" VarName TypeDeclaration? PositionalVar? "in" ExprSingle)*

PositionalVar ::= "at" "$" VarName

LetClause ::= "let" "$" VarName TypeDeclaration? ":=" ExprSingle ("," "$" VarName TypeDeclaration? ":=" ExprSingle)*

WhereClause ::= "where" ExprSingle

OrderByClause ::= ("order" "by" | "stable" "order" "by") OrderSpecList

OrderSpecList ::= OrderSpec ("," OrderSpec)*

OrderModifier ::= ("ascending" | "descending")? ("empty" "greatest" | "empty" "least")? ("collation" URILiteral)?

In the preceding code, the following conventions are used:

"A?" means that A is optional.
"A|B" means either A or B but not both.
"A+" means one or more occurrences of A.
"A*" means zero or more occurrences of A.

To see how to use a FLWOR expression on an example XML document that corresponds to an SQL table, consider this sample document:

<?xml version="1.0" encoding="UTF-8"?>

<customer xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">

<row>

<FirstName>Abe</FirstName>

<LastName>Abelson</LastName>

<City>Springfield</City>

<AreaCode>714</AreaCode>

<Telephone>555-1111</Telephone>

</row>

<row>

<FirstName>Bill</FirstName>

<LastName>Bailey</LastName>

<City>Decatur</City>

<AreaCode>714</AreaCode>

<Telephone>555-2222</Telephone>

</row>

<row>

<FirstName>Chuck</FirstName>

<LastName>Wood</LastName>

<City>Philo</City>

<AreaCode>714</AreaCode>

<Telephone>555-3333</Telephone>

</row>

</customer>

Keep this example in mind as you make your way through the fors, lets, whiles, order bys, and returns in the following sections.

The for clause

A FLWOR expression must have at least a for clause (or a let clause) and a return clause. For purposes of illustrating the effect of the for clause, here are a for fragment and the result of that much of a FLWOR expression:

for $c in customer/row

This code selects the row element in the customer XML document. The result is

$c: <row>

<FirstName>Abe</FirstName>

<LastName>Abelson</LastName>

<City>Springfield</City>

<AreaCode>714</AreaCode>

<Telephone>555-1111</Telephone>

</row>

$c: <row>

<FirstName>Bill</FirstName>

<LastName>Bailey</LastName>

<City>Decatur</City>

<AreaCode>714</AreaCode>

<Telephone>555-2222</Telephone>

</row>

$c: <row>

<FirstName>Chuck</FirstName>

<LastName>Wood</LastName>

<City>Philo</City>

<AreaCode>714</AreaCode>

<Telephone>555-3333</Telephone>

</row>

The result includes three instances of the variable $c, each one being bound to a separate row element in the binding sequence from the original document. This result is called a tuple stream. The for clause iterates over the tuples in the binding sequence, binding the variable to each of the tuples in the sequence in turn.

The let clause

To show the effect of the let clause, I need a second document example to go along with the customer example. Here’s that document:

<?xml version="1.0" encoding="UTF-8"?>

<product xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">

<row rating="0">

<ProdNo>101</ProdNo>

BookFirecracker 1</Name>

<Size>Big</Size>

</row>

<row rating="1">

<ProdNo>102</ProdNo>

BookFirecracker 2</Name>

<Size>Huge</Size>

</row>

<row rating="3">

<ProdNo>103</ProdNo>

BookFirecracker 3</Name>

<Size>Tremendous</Size>

</row>

</product>

Whereas the for clause iterates over the items in the binding sequence, the let clause binds its variables with the entire sequence. A let clause that binds multiple variables generates a single tuple containing all the variable bindings. Here’s an example:

let $c := /customer/row, $p := /product/row/ProdNo

This code produces the following result:

$c: <row> $p: 101

<FirstName>Abe</FirstName> 102

<LastName>Abelson</LastName> 103

<City>Springfield</City>

<AreaCode>714</AreaCode>

<Telephone>555-1111</Telephone>

</row>

<row>

<FirstName>Bill</FirstName>

<LastName>Bailey</LastName>

<City>Decatur</City>

<AreaCode>714</AreaCode>

<Telephone>555-2222</Telephone>

</row>

<row>

<FirstName>Chuck</FirstName>

<LastName>Wood</LastName>

<City>Philo</City>

<AreaCode>714</AreaCode>

<Telephone>555-3333</Telephone>

</row>

The where clause

The result of a for clause or a let clause contains some data that you want, but it likely also contains some irrelevant data that you don’t want. You can filter out the data that you don’t want with a where clause. Consider the following XQuery fragment:

for $p at $i in product/row

where $i = $p/@rating

This code produces the following result:

$p: <row rating="3"> $i 3

<ProdNo>103</ProdNo>

BookFirecracker 3</Name>

<Size>Tremendous</Size>

</row>

Only in the case of the last product does the element counter equal the value of the rating attribute.

The order by clause

The order by clause, as you would expect, specifies an order for the items in the result, according to the ordering specification (OrderSpec). The ordering specification contains a single expression (ExprSingle) and an optional ordering modifier (OrderModifier). You can add an order by clause to one of the query fragments in the preceding sections to see how it affects the result, as follows:

for $c in customer/row

order by $c/LastName descending

This code selects the row element in the customer XML document, listing the tuples in descending order by last name. The result is

$c: <row>

<FirstName>Chuck</FirstName>

<LastName>Wood</LastName>

<City>Philo</City>

<AreaCode>714</AreaCode>

<Telephone>555-3333</Telephone>

</row>

$c: <row>

<FirstName>Bill</FirstName>

<LastName>Bailey</LastName>

<City>Decatur</City>

<AreaCode>714</AreaCode>

<Telephone>555-2222</Telephone>

</row>

$c: <row>

<FirstName>Abe</FirstName>

<LastName>Abelson</LastName>

<City>Springfield</City>

<AreaCode>714</AreaCode>

<Telephone>555-1111</Telephone>

</row>

Because you’re ordering by last name in descending order, the record for Chuck Wood precedes that of Bill Bailey, which comes before the record for Abe Abelson.

The return clause

The return clause specifies what’s returned by the FLWOR expression after all the other clauses have had a chance to have an effect. The ExprSingle in the return clause is evaluated once for each tuple produced by the combined activity of the other clauses. The result is a sequence of values. Adding a return clause to the query fragment in the preceding section to make a complete query results in the following:

<result>

{

for $c in doc("customer.xml")

order by $c/row/LastName descending

return

<row>

{ $c/FirstName }

{ $c/LastName }

</row>

}

</result>

This code selects the row element in the customer XML document, listing the tuples in descending order by last name. Then it outputs the first and last names in the tuples. The result is

<result>

<row>

<FirstName>Chuck</FirstName>

<LastName>Wood</LastName>

</row>

<row>

<FirstName>Bill</FirstName>

<LastName>Bailey</LastName>

</row>

<row>

<FirstName>Abe</FirstName>

<LastName>Abelson</LastName>

</row>

</result>

XQuery versus SQL

The data in an XML document is in quite a different form from the data in an SQL table, but some similarities exist between the two worlds. XQuery’s FLWOR expression corresponds to some extent with the SQL SELECT expression. Both expressions are used to retrieve the data you want from a collection of data, most of which you don’t want at the moment.

Comparing XQuery’s FLWOR expression with SQL’s SELECT expression

Although XQuery’s let clause has no analog in SQL, the XQuery for clause is related to the SQL FROM clause in that both specify the source of the data. XQuery’s order by clause serves the same purpose that SQL’s ORDER BY clause serves. In both XQuery and SQL, the where clause filters out data that you don’t want to include in the result. SQL’s GROUP BY and HAVING clauses have no analogs in XQuery.

Relating XQuery data types to SQL data types

Some of XQuery’s data types correspond to SQL data types; others don’t. XQuery has some data types that don’t correspond to any SQL data types, and vice versa.

Table 3-4 lists the XQuery data types and, where applicable, the corresponding SQL types. Where no corresponding type exists, a dash serves as a placeholder.

TABLE 3-4 XQuery 1.0 Data Types and Corresponding SQL Data Types

XQuery 1.0 Data Types	SQL Data Types
`xs:string`	`CHARACTER`, `CHARACTER VARYING`, `CHARACTER LARGE OBJECT`, `NATIONAL CHARACTER`, `NATIONAL CHARACTER VARYING`, `NATIONAL CHARACTER LARGE OBJECT`
`xs:normalizedString`	—
`xs:token`	—
`xs:language`	—
`xs:NMTOKEN`	—
`xs:NMTOKENS`	—
`xs:Name`	—
`xs:NCNAME`	—
`xs:ID`	—
`xs:IDREF`	—
`xs:IDREFS`	—
`xs:ENTITY`	—
`xs:ENTITIES`	—
`xs:BOOLEAN`	`BOOLEAN`
`xs:decimal`	`NUMERIC`, `DECIMAL`
`xs:integer`	`INTEGER`
`xs:nonPositiveInteger`	—
`xs:negativeInteger`	—
`xs:long`	`BIGINT`
`xs:int`	`INTEGER`
`xs:short`	`SMALLINT`
`xs:byte`	—
`xs:nonNegativeInteger`	—
`xs:unsignedLong`	—
`xs:unsignedInt`	—
`xs:unsignedShort`	—
`xs:unsignedByte`	—
`xs:positiveInteger`	—
`xs:float`	`FLOAT`, `REAL`
`xs:double`	`FLOAT`, `DOUBLE`
`xs:error`	—
`xs:duration`	—
`xs:dateTime`	`TIMESTAMP WITH TIME ZONE`, `TIMESTAMP WITHOUT TIME ZONE`
`xs:date`	`DATE WITH TIME ZONE`, `DATE WITHOUT TIME ZONE`
`xs:time`	`TIME WITH TIME ZONE`, `TIME WITHOUT TIME ZONE`
`xs:gYearMonth`	—
`xs:gYear`	—
`xs:gMonthDay`	—
`xs:gDay`	—
`xs:gMonth`	—
`xs:hexBinary`	`BINARY LARGE OBJECT`
`xs:base64Binary`	`BINARY LARGE OBJECT`
`xs:anyURI`	—
`xs:QName`	—
`xs:NOTATION`	—
`xdt:dayTimeDuration`	`INTERVAL` (day–time interval)
`xdt:yearMonthDuration`	`INTERVAL` (year–month interval)
`xs:anyType`	`XML`
`xs:anySimpleType`	—
`xdt:untyped`	—
Node types	Structured user-defined types (UDTs)
User-defined complex types	Structured UDTs
—	`ROW`
—	`REF`
List types and sequences	`ARRAY`
List types and sequences	`MULTISET`
—	`DATALINK`

Clearly, a lot more XQuery types are available than SQL types. In most cases, casting an SQL type to an XQuery type isn’t a problem, but going the other way may be a challenge.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.

Table of Contents for Chapter 3: Retrieving Data from XML Documents

Create new playlist

Sign In

Sign Up

Retrieving Data from XML Documents

XQuery

Where XQuery came from

What XQuery requires

XQuery functionality

Usage scenarios

FLWOR Expressions

The for clause

The let clause

The where clause

The order by clause

The return clause

XQuery versus SQL

Comparing XQuery’s FLWOR expression with SQL’s SELECT expression

Relating XQuery data types to SQL data types

Table of Contents for
Chapter 3: Retrieving Data from XML Documents