Most major relational database vendors now allow you to store XML in your databases. The examples in this section use Microsoft SQL Server 2005 syntax, but there is similar functionality available in Oracle and IBM DB2.
One use case is to combine narrative text with more highly structured data. An example is when each of the products has a textual description that can span multiple paragraphs and can be marked up with HTML-like tags to indicate sections of text that need to be in bold or italic. This is shown in Table 24-7.
Table 24-7. The prod_desc table
number |
desc |
---|---|
557 |
<p>This pullover is made from recycled polyester.</p> |
563 |
<p>Enjoy the sun in this <i>gorgeous</i> hat!</p> |
443 |
<p>You'll <b>never</b> be disorganized with this bag.</p> |
784 |
<p>Our <i>favorite</i> shirt!</p> <p>Can be monogrammed upon request.</p> |
When you create the table, you declare the desc
column to be of type XML
, as in:
CREATE TABLE prod_desc (
number INTEGER NOT NULL,
desc XML
);
If desired, you can specify the location of a schema for the desc
column, which will ensure that any values inserted into desc
conform to that schema. It will also provide all the other benefits of using a schema, such as query debugging.
Another use case for storing XML in a relational table is to take advantage of the flexibility of XML. Suppose each product has a set of custom properties that needs to change flexibly over time. It is possible to create columns on the catalog
table for each property, but that is inflexible because a new column needs to be added when a new property is added, also necessitating changes in the application that reads and writes to the table.
Another approach might be to create generic columns named property1
, property2
, etc., but this is problematic because you don't know how many columns to create, and you need some sort of mapping scheme to figure out what each column means for each product. A more flexible approach might be to store the properties as XML, as shown in Table 24-8. This allows them to be queried and even indexed, but does not force a rigid database structure.
Table 24-8. The prod_properties table
number |
properties |
---|---|
|
|
|
|
|
|
|
|
To constrain the query based on the contents of the desc
column, you can use an XQuery expression embedded in your SQL statement. In the case of SQL Server, one way to do this is through the use of an exist
function. For example:
select number, properties
from prod_properties
where properties.exist('/properties/sleeveLength[. > 20]')
= 1
The SQL Server exist
function returns true
(which equals 1
) if there are any nodes that match the criteria. The expression in parentheses passed to the exist
function is in XQuery syntax, and it is evaluated relative to the root of the document in the properties
column, for each row. The results are shown in Table 24-9.
Table 24-9. Results containing the properties element
number |
properties |
---|---|
557 |
<properties> <sleeveLength>24</sleeveLength> </properties> |
784 |
<properties> <sleeveLength>25</sleeveLength> <extraButtons>2</extraButtons> </properties> |
To return only the sleeveLength
element, I could use another SQL Server function called query
in my select clause, as in:
select number,
properties.query('//sleeveLength')
slength
from prod_properties
where properties.exist('/properties/sleeveLength[. > 20]') = 1
Like the exist
function, the query
function accepts an XQuery query as a parameter. The string slength
is used to provide a name for the column in the results. This will return a sleeveLength
element for each row, as shown in Table 24-10.
Table 24-10. Results containing the sleeveLength element
number |
slength |
---|---|
557 |
<sleeveLength>24</sleeveLength> |
784 |
<sleeveLength>25</sleeveLength> |
If I want to further reduce my result set to the contents of the sleeveLength
element, I can call the XQuery data
function in my query, as in:
select number, properties.query('data(
//sleeveLength)
') slength from prod_properties where properties.exist('/properties/sleeveLength[. > 20]') = 1
The results are shown in Table 24-11.
Alternatively, I can use the SQL Server value
function, which requires me to specify a type for the value, as in:
select properties.value('(/properties/sleeveLength)[1]','integer')
slength
from prod_properties
where properties.exist('/properties/sleeveLength[. > 20]') = 1
The value
function forces the use of the [1]
predicate to ensure that only a single node is returned. If a schema were in use, and it specified that there could only ever be one sleeveLength
child of properties
, this would not be necessary.
3.14.251.57