Chapter 26. XQuery for SQL Users

This chapter is designed to provide some background material for readers who are already using SQL and relational databases. It compares SQL and XQuery at both the data model and syntax levels. It also provides pointers for using SQL and XQuery together, and describes the role of SQL/XML.

Relational Versus XML Data Models

As you may know, relational databases represent data in terms of tables, rows, and columns. Some XML documents, such as our product catalog document, map fairly cleanly onto a relational model. Example 26-1 shows catalog2.xml, a slightly simplified version of the product catalog document used throughout this book.

Example 26-1. Product catalog document (catalog2.xml)
<catalog>
  <product dept="WMN">
    <number>557</number>
    <name>Fleece Pullover</name>
  </product>
  <product dept="ACC">
    <number>563</number>
    <name>Floppy Sun Hat</name>
  </product>
  <product dept="ACC">
    <number>443</number>
    <name>Deluxe Travel Bag</name>
  </product>
  <product dept="MEN">
    <number>784</number>
    <name>Cotton Dress Shirt</name>
    <desc>Our favorite shirt!</desc>
  </product>
</catalog>

Because the product catalog document is relatively uniform and does not contain any repeating relationships between objects, the product catalog can be represented as a single relational table, shown in Table 26-1. Each product is a row, and each possible property of the product is a column.

Table 26-1. The catalog table
numberdeptnamedesc
557WMNFleece Pullover
563ACCFloppy Sun Hat
443ACCDeluxe Travel Bag
784MENCotton Dress ShirtOur favorite shirt!

Some of the products do not have descriptions, which means that nulls (or zero-length strings) are stored in the desc column for these rows. XML does not have a direct equivalent of null values in the relational model. In XML, a “missing” value could be represented as an element or attribute that is simply omitted, as in our example, where the desc element does not appear when it does not apply. It could also be represented as an empty element (<desc></desc> or <desc/>). Yet another representation uses the XML Schema concept of “nilled” elements, as in <desc xsi:nil="true"/>.

Some XML documents encompass multiple “entities” with repeating relationships between them. The order document (order.xml) is such a document, since it describes a hierarchical relationship between an order and the items it contains. There are properties of the order itself, as well as properties of each item, so each needs to be represented by a table (see Tables 26-2 and 26-3).

Table 26-2. The orders table
numdatecust
002994322015-09-150221A
Table 26-3. The order_item table
ordnumdeptnumquantitycolor
00299432WMN5571navy
00299432ACC5631
00299432ACC4432
00299432MEN7841white
00299432MEN7841gray
00299432WMN5571black

Comparing SQL Syntax with XQuery Syntax

This section compares SQL syntax with XQuery syntax in order to give readers already familiar with SQL a jumpstart on learning XQuery. If you notice similarities between SQL and XQuery, it is not a coincidence; some of the key developers of the SQL standard also worked on the XQuery standard. Not all SQL syntax is covered in this chapter, only the most commonly used constructs.

A Simple Query

To compare SQL and XQuery queries, we will first start with our simple product catalog document. A basic SQL query might select all the values from the table that meet some specific criteria, for example, those in the ACC department. The SQL statement that accomplishes this is:

select * from catalog
where dept='ACC'

In XQuery, we can use a straight path expression for such a simple query, as in:

doc("catalog2.xml")//product[@dept='ACC']

If you don’t want to sort your results or construct new elements, it is often simpler (and possibly faster) to just use a path expression. However, we could also use a full FLWOR expression that uses a where clause similar to SQL, as in:

for $prod in doc("catalog2.xml")//product
where $prod/@dept='ACC'
return $prod

In the where clause, we need to start the reference to the dept attribute with $prod/ in order to give it some context. This is different from SQL, where if there is only one dept column in the table(s) in the query, it is assumed to be that one column. In XQuery, you must be explicit about where the dept attribute appears, because it could appear on any level of the document.

Now, suppose we want to sort the values by the product number. In SQL, we would simply add an order by clause, as in:

select * from catalog
where dept='ACC'
order by number

We would also add an order by clause to the FLWOR, again giving it the context, as in:

for $prod in doc("catalog2.xml")//product
where $prod/@dept='ACC'
order by $prod/number
return $prod

Conditions and Operators

Conditions and operators are used to filter query results. Many of the conditions and operators available in SQL can also be used in XQuery, although sometimes with a slightly modified syntax.

Comparisons

The example in the previous section used the equals sign (=) to compare the value of the department to ACC. XQuery has the same comparison operators as SQL, namely =, !=, <, <=, >, and >=. The BETWEEN condition in SQL is not directly supported, but you can always use two comparisons, as in:

for $prod in doc("catalog2.xml")//product
where $prod/number > 500 and $prod/number < 700
return $prod

Like SQL, quotes are used to surround string values, whereas they are not used to surround numeric values.

Strings in SQL can also be matched to wildcards by using a LIKE condition. For example, the query:

select * from catalog
where name LIKE 'F%'

will return products whose names start with the letter F. XQuery provides a starts-with function that would be useful in this particular case. For example:

for $prod in doc("catalog2.xml")//product
where starts-with($prod/name, 'F')
return $prod

For the more general case, the matches function allows you to match a string against any regular expression. In regular expressions, a single period (.) represents any one character, much like the underscore (_) in LIKE conditions. Adding an asterisk after the period (.*) allows any number of any characters, similar to the % character in LIKE conditions. Anchors (^ and $) can be used indicate the start and end of a string. Another way of expressing the previous query is:

for $prod in doc("catalog2.xml")//product
where matches($prod/name, '^F.*')
return $prod

Regular expressions are discussed in detail in Chapter 19.

The IN condition in SQL is useful for comparing a value with a list of specified values. For example, to find all the products that are in either the ACC or WMN departments, you could use the query:

select * from catalog
where dept in ('ACC', 'WMN')

In XQuery, no special in operator is needed because the equals operator (=) allows multiple values on either side of the comparison, as in:

for $prod in doc("catalog2.xml")//product
where $prod/@dept = ('ACC', 'WMN')
return $prod

The meaning of the = operator in this case is that the product department must be equal to at least one of the values ACC or WMN.

Arithmetic operators

For arithmetic operations, XQuery uses +, -, and * for addition, subtraction, and multiplication, just like SQL. For division, XQuery does not support the / operator (because that’s needed in path expressions), but instead uses div and idiv operators. These are covered in detail in “Arithmetic Operations”.

Boolean operators

In SQL, multiple conditions are often combined with and and or operators, and sometimes parentheses are used to group conditions together. For example, the following query selects the products that are in the ACC department whose names start with either F or G:

select * from catalog
where dept='ACC' and
      (name like 'F%' or name like 'G%')

Parentheses are used around the name-related conditions to prevent the and operator from taking precedence. If the parentheses were not there, the first two conditions would be “and-ed” together, with different results.

The and and or operators, and the parentheses, work identically in XQuery. An equivalent XQuery query is:

for $prod in doc("catalog2.xml")//product
where $prod/@dept='ACC' and 
      (matches($prod/name, '^F') or matches($prod/name, '^G'))
return $prod

To negate a condition in SQL, you can use a not operator, which is sometimes applied to a parenthesized group, and is sometimes part of the comparison syntax, as in name not like 'F%'. In XQuery, you use a not function, so it is always followed by an expression in parentheses. For example, the SQL query:

select * from catalog
where not(dept='ACC') and
      (name not like 'F%')

is equivalent to the following XQuery query:

for $prod in doc("catalog2.xml")//product
where not($prod/@dept='ACC') and
      not(matches($prod/name, '^F'))
return $prod

Functions

SQL has a number of built-in functions, many of which have equivalent XQuery functions, often with the same name. Some of the commonly used SQL functions and their XQuery equivalents are listed in Table 26-4. The syntax to call functions is the same in both languages: the arguments are enclosed in parentheses and separated by commas.

Table 26-4. Equivalent functions
SQL functionXQuery function
Numeric functions
sum sum
avg avg
count count
max max
min min
round round
ceil ceiling
floor floor
String functions
substr substring
concat concat
upper upper-case
lower lower-case
trim normalize-space
replace replace
length string-length
Date-related functions
current_date current-date
current_timestamp current-dateTime

Selecting Distinct Values

SQL has a DISTINCT keyword that allows only distinct values to be selected. For example, to get a list of the unique departments in the catalog, you would use the query:

select distinct dept from catalog

In XQuery, you would make a call to the distinct-values function, as in:

distinct-values(doc("catalog2.xml")//product/@dept)

or, if you prefer a FLWOR:

for $value in distinct-values(doc("catalog2.xml")//product/@dept)
return $value

Often you are interested in a combination of distinct values. In SQL, this is quite straightforward; you simply add more columns to the query. To get the distinct combinations of department and product number, you could use:

select distinct dept, number from catalog

However, the XQuery distinct-values function only accepts one set of values. This means that you must use a FLWOR expression with multiple for clauses to achieve the same result. This is shown in Example 26-2 and described further in “Selecting Distinct Values”.

Example 26-2. Distinctness on multiple values

Query

for $d in distinct-values(doc("catalog2.xml")//product/@dept),
    $n in distinct-values(doc("catalog2.xml")//product[@dept = $d]/number)
return <result dept="{$d}" number="{$n}"/>

Results

<result dept="WMN" number="557"/>
<result dept="ACC" number="563"/>
<result dept="ACC" number="443"/>
<result dept="MEN" number="784"/>

Working with Multiple Tables and Subqueries

Many SQL queries join multiple tables together. For example, suppose you want to join the order and product tables to retrieve results that contain values from both tables. In SQL, you might write a query such as:

select order_item.num, catalog.name, order_item.quantity
from order_item, catalog
where order_item.num = catalog.number

In XQuery, joins are not needed so frequently because the data will often be stored in a single hierarchy rather than being split across multiple tables. However, joins still arise and the mechanism is similar. In XQuery, the join in the where clause might look like Example 26-3.

Example 26-3. Two-way join in XQuery

Query

for $item in doc("order.xml")//item,
    $prod in doc("catalog2.xml")//product
where $item/@num = $prod/number
return <item num="{$item/@num}"
             name="{$prod/name}"
             quan="{$item/@quantity}"/>

Results

<item num="557" name="Fleece Pullover" quan="1"/>
<item num="563" name="Floppy Sun Hat" quan="1"/>
<item num="443" name="Deluxe Travel Bag" quan="2"/>
<item num="784" name="Cotton Dress Shirt" quan="1"/>
<item num="784" name="Cotton Dress Shirt" quan="1"/>
<item num="557" name="Fleece Pullover" quan="1"/>

The XQuery example constructs an item element to hold each set of three values. This is because XQuery by default does not return the result of each evaluation of the return clause in a “row” or any other container. If you simply returned the three values for each product, as in:

return ($item/@num, $prod/name, $item/@quantity)

the result would be 18 sequential values (557, Fleece Pullover, 1, 563, Floppy Sun Hat, etc.), with no relationship among them. The item element serves as a container to group the related three values together—the same purpose a row would serve in an SQL result.

As an alternative to a where clause, you can use one or more predicates in your for clause, as in:

for $item in doc("order.xml")//item,
    $prod in doc("catalog2.xml")//product[number = $item/@num]
return <item num="{$item/@num}"
             name="{$prod/name}"
             quan="{$item/@quantity}"/>

More information on joins in XQuery can be found in “Joins”, including examples of three-way joins and outer joins.

Subselects

Another way to use multiple tables in SQL (or, indeed, multiple queries on the same table) is by using subselects. Suppose you wanted to return all products from the catalog that are included in a particular order. You might use the following query:

select *
from catalog
where number in (select num from order_item
                 where ordnum = '00299432')

Like SQL select statements, XQuery FLWOR expressions can also be contained within each other. The following query uses a FLWOR embedded in the where clause:

for $prod in doc("catalog2.xml")//product
where $prod/number =
    (for $item in doc("order.xml")/order[@num='00299432']/item
     return $item/@num)
return $prod

In fact, XQuery allows expressions to be nested more freely than SQL does. For example, you can use a nested FLWOR expression in the in clause, or in the return clause.

Combining queries by using set operators

SQL supports the use of set operators such as UNION to combine the rows from multiple select statements. These set operators have equivalents in XQuery, as shown in Table 26-5.

Table 26-5. Set operators
SQL SyntaxXQuery Syntax
query1 union query2query1 union query2
query1 | query2
query1 union all query2(query1, query2)
query1 intersect query2query1 intersect query2
query1 minus query2query1 except query2

Grouping

SQL allows you to group data by certain values. For example, if you want to count the number of products in each department, you can use the query:

select dept, count(*)
from catalog
group by dept

XQuery has a similar group by clause, as shown in Example 26-4.

Example 26-4. Grouping in XQuery

Query

xquery version "3.0";
for $prod in doc("catalog2.xml")//product
group by $d := $prod/@dept
return <result dept="{$d}" count="{count($prod)}"/>

Results

<result code="ACC" count="2"/>
<result code="MEN" count="1"/>
<result code="WMN" count="1"/>

As you can see, in order to construct a “table” of return values, we construct result elements with two attributes representing the “columns.” Grouping is covered in more detail in “Grouping”.

Combining SQL and XQuery

Most major relational database vendors now allow you to store XML in your databases. The examples in this section use Microsoft SQL Server syntax, but there is similar functionality available in Oracle RDBMS and IBM DB2.

Combining Structured and Semi-Structured Data

One use case is to combine narrative text with more highly structured data. An example is when each of the products has a textual description that can span multiple paragraphs and can be marked up with HTML-like tags to indicate sections of text that need to be in bold or italic. This is shown in Table 26-6.

Table 26-6. The prod_desc table
numberdesc
557 <p>This pullover is made from recycled polyester.</p>
563 <p>Enjoy the sun in this <i>gorgeous</i> hat!</p>
443 <p>You'll <b>never</b> be disorganized with this bag.</p>
784 <p>Our <i>favorite</i> shirt!</p> <p>Can be monogrammed upon request.</p>

When you create the table, you declare the desc column to be of type XML, as in:

create table prod_desc (
  number integer not null,
  desc xml
  );

If desired, you can specify the location of a schema for the desc column, which will ensure that any values inserted into desc conform to that schema. It will also provide all the other benefits of using a schema, such as query debugging.

Flexible Data Structures

Another use case for storing XML in a relational table is to take advantage of the flexibility of XML. Suppose each product has a set of custom properties that needs to change flexibly over time. It is possible to create columns on the catalog table for each property, but that is inflexible because a new column needs to be added when a new property is added, also necessitating changes in the application that reads and writes to the table.

Another approach might be to create generic columns named property1, property2, etc., but this is problematic because you don’t know how many columns to create, and you need some sort of mapping scheme to figure out what each column means for each product. A more flexible approach might be to store the properties as XML, as shown in Table 26-7. This allows them to be queried and even indexed, but does not force a rigid database structure.

Table 26-7. The prod_properties table
numberproperties
557
<properties>
  <sleeveLength>24</sleeveLength>
</properties>
563
<properties/>
443
<properties>
  <capacity>80</capacity>
</properties>
784
<properties>
  <sleeveLength>25</sleeveLength>
  <extraButtons>2</extraButtons>
</properties>

To constrain the query based on the contents of the desc column, you can use an XQuery expression embedded in your SQL statement. In the case of SQL Server, one way to do this is through the use of an exist function. For example:

select number, properties
from prod_properties
where properties.exist('/properties/sleeveLength[. > 20]') = 1

The SQL Server exist function returns true (which equals 1) if there are any nodes that match the criteria. The expression in parentheses passed to the exist function is in XQuery syntax, and it is evaluated relative to the root of the document in the properties column, for each row. The results are shown in Table 26-8.

Table 26-8. Results containing the properties element
numberproperties
557
<properties>
  <sleeveLength>24</sleeveLength>
</properties>
784
<properties>
  <sleeveLength>25</sleeveLength>
  <extraButtons>2</extraButtons>
</properties>

To return only the sleeveLength element, you could use another SQL Server function called query in the select clause, as in:

select number,
       properties.query('//sleeveLength') slength
from prod_properties
where properties.exist('/properties/sleeveLength[. > 20]') = 1

Like the exist function, the SQL Server query function accepts an XQuery query as a parameter. The string slength is used to provide a name for the column in the results. This will return a sleeveLength element for each row, as shown in Table 26-9.

Table 26-9. Results containing the sleeveLength element
numberslength
557 <sleeveLength>24</sleeveLength>
784 <sleeveLength>25</sleeveLength>

If you want to further reduce the result set to the contents of the sleeveLength element, you can call the XQuery data function in the query, as in:

select number,
       properties.query('data(//sleeveLength)') slength
from prod_properties
where properties.exist('/properties/sleeveLength[. > 20]') = 1

The results are shown in Table 26-10.

Table 26-10. Results containing the sleeveLength value
numberslength
557 24
784 25

Alternatively, you can use the SQL Server value function, which requires you specify a type for the value, as in:

select properties.value('(/properties/sleeveLength)[1]', 'integer') slength
from prod_properties
where properties.exist('/properties/sleeveLength[. > 20]') = 1

The SQL Server value function forces the use of the [1] predicate to ensure that only a single node is returned. If a schema were in use, and it specified that there could only ever be one sleeveLength child of properties, this would not be necessary.

SQL/XML

We’ve seen how you can query XML data stored in a table, but how do you select relational data and return it as XML? This is where SQL/XML comes in. SQL/XML is an extension to SQL that is part of the ISO SQL standard. It adds an XML datatype, and allows XML elements to be constructed within the SQL query. Example 26-5 shows an SQL/XML query that might be used on our catalog table (Table 26-1).

Example 26-5. SQL/XML query
select c.number,
       xmlelement ( name "product",
                    xmlattributes (
                      c.dept as "dept",
                      c.name as "prodname",
                   ) ) as "product_as_xml"
from catalog c;

Table 26-11 shows the results, which consist of two columns. The product_as_xml column, whose type is XML, contains a newly constructed product element for each row.

Table 26-11. Results of SQL/XML query
numberproduct_as_xml
557 <product dept="WMN" prodname="Fleece Pullover"/>
563 <product dept="ACC" prodname="Floppy Sun Hat"/>
443 <product dept="ACC" prodname="Deluxe Travel Bag"/>
784 <product dept="MEN" prodname="Cotton Dress Shirt"/>

SQL/XML is not used to query XML documents, only relational data. As you can see, it can turn relational data into XML elements. For more information on SQL/XML, please see http://bit.ly/1JlcAwJ.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
18.188.252.23