Combining SQL and XQuery

Most major relational database vendors now allow you to store XML in your databases. The examples in this section use Microsoft SQL Server 2005 syntax, but there is similar functionality available in Oracle and IBM DB2.

Combining Structured and Semistructured Data

One use case is to combine narrative text with more highly structured data. An example is when each of the products has a textual description that can span multiple paragraphs and can be marked up with HTML-like tags to indicate sections of text that need to be in bold or italic. This is shown in Table 24-7.

Table 24-7. The prod_desc table

number

desc

557

<p>This pullover is made from recycled polyester.</p>

563

<p>Enjoy the sun in this <i>gorgeous</i> hat!</p>

443

<p>You'll <b>never</b> be disorganized with this bag.</p>

784

<p>Our <i>favorite</i> shirt!</p> <p>Can be monogrammed upon request.</p>

When you create the table, you declare the desc column to be of type XML, as in:

CREATE TABLE prod_desc (
  number INTEGER NOT NULL,
  desc XML
  );

If desired, you can specify the location of a schema for the desc column, which will ensure that any values inserted into desc conform to that schema. It will also provide all the other benefits of using a schema, such as query debugging.

Flexible Data Structures

Another use case for storing XML in a relational table is to take advantage of the flexibility of XML. Suppose each product has a set of custom properties that needs to change flexibly over time. It is possible to create columns on the catalog table for each property, but that is inflexible because a new column needs to be added when a new property is added, also necessitating changes in the application that reads and writes to the table.

Another approach might be to create generic columns named property1, property2, etc., but this is problematic because you don't know how many columns to create, and you need some sort of mapping scheme to figure out what each column means for each product. A more flexible approach might be to store the properties as XML, as shown in Table 24-8. This allows them to be queried and even indexed, but does not force a rigid database structure.

Table 24-8. The prod_properties table

number

properties

557

<properties>

  <sleeveLength>24</sleeveLength>

</properties>

563

<properties/>

443

<properties>

  <capacity>80</capacity>

</properties>

784

<properties>

  <sleeveLength>25</sleeveLength>

  <extraButtons>2</extraButtons>

</properties>

To constrain the query based on the contents of the desc column, you can use an XQuery expression embedded in your SQL statement. In the case of SQL Server, one way to do this is through the use of an exist function. For example:

select number, properties
from prod_properties
where properties.exist('/properties/sleeveLength[. > 20]') = 1

The SQL Server exist function returns true (which equals 1) if there are any nodes that match the criteria. The expression in parentheses passed to the exist function is in XQuery syntax, and it is evaluated relative to the root of the document in the properties column, for each row. The results are shown in Table 24-9.

Table 24-9. Results containing the properties element

number

properties

557

<properties>

<sleeveLength>24</sleeveLength>

</properties>

784

<properties>

<sleeveLength>25</sleeveLength>

<extraButtons>2</extraButtons>

</properties>

To return only the sleeveLength element, I could use another SQL Server function called query in my select clause, as in:

select number,
       properties.query('//sleeveLength') slength
from prod_properties
where properties.exist('/properties/sleeveLength[. > 20]') = 1

Like the exist function, the query function accepts an XQuery query as a parameter. The string slength is used to provide a name for the column in the results. This will return a sleeveLength element for each row, as shown in Table 24-10.

Table 24-10. Results containing the sleeveLength element

number

slength

557

<sleeveLength>24</sleeveLength>

784

<sleeveLength>25</sleeveLength>

If I want to further reduce my result set to the contents of the sleeveLength element, I can call the XQuery data function in my query, as in:

select number,
       properties.query('data(//sleeveLength)') slength
from prod_properties
where properties.exist('/properties/sleeveLength[. > 20]') = 1

The results are shown in Table 24-11.

Table 24-11. Results containing the sleeveLength value

number

slength

557

24

784

25

Alternatively, I can use the SQL Server value function, which requires me to specify a type for the value, as in:

select properties.value('(/properties/sleeveLength)[1]','integer') slength
from prod_properties
where properties.exist('/properties/sleeveLength[. > 20]') = 1

The value function forces the use of the [1] predicate to ensure that only a single node is returned. If a schema were in use, and it specified that there could only ever be one sleeveLength child of properties, this would not be necessary.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
3.14.251.57