Search in book...
Toggle Font Controls
Create new playlist

Name your new playlist

Playlist description (optional)
Sign In

Email address

Password

Forgot Password?

or

Continue with Facebook

Continue with Google
Sign Up

Full Name

Email address

Confirm Email Address

Password

or

Continue with Facebook

Continue with Google

Chapter 10 XML and Databases

WHAT YOU WILL LEARN IN THIS CHAPTER:

Why databases need to handle XML
The differences between relational and native XML databases
What basic XML features are needed from a database
How to use the XML features in MySQL
How to use the XML features in SQL Server
How to use features in eXist

Not very along ago, you had two main options when deciding where to store your data. You could go for the traditional solution, a relational database such as Oracle, Microsoft’s SQL Server, or the ever-popular open source MySQL. Alternatively, you could choose to use XML. A relational database has the advantage of efficiently storing data that can be expressed in a tabular form, although performance can be a problem if you need to join many tables together. XML has the advantage of coping with nested data or documents that can’t be easily broken down further. After a while, it became apparent that a hybrid of the two was needed: a system that could store tabular data alongside XML documents, giving the user the ability to query and modify the XML as well as perform standard operations against the relational data. This would create an all-purpose storage center giving the best of both worlds.

UNDERSTANDING WHY DATABASES NEED TO HANDLE XML

Relational databases grew from the work of Edgar Codd in the 1970s. He was the first to provide a solid mathematical foundation for the main concepts found in these systems, such as tables (which he called relations), primary keys, normalization (where efforts are made to reduce any duplication of data), and relationships, such as one-to-many and many-to-many. Nowadays, hundreds of relational database management systems are available. They range from top-end databases such as Oracle, SQL Server, and DB2, to ones designed for desktop use, such as Microsoft’s Access.

These systems have widely different feature sets, but they generally have two things in common. First, they use a special language called Structured Query Language (SQL) to query the data that resides in them. Second, they cope well with data that can be broken down and stored in a tabular form, where items are typically represented as rows in a table. The different properties of these items are represented by the different fields, or columns, of these rows.

If you are trying to represent data for your customers and their orders, a relational system is fine. You probably have one table for basic customer details, one for the order summary, and another table for the order items. You can combine these tables to extract and report on the state of your business; the orders table has a reference to the customer who made the order and the order details table has a reference to the order number.

However, in many situations you need to store data that doesn’t fit easily into this pattern. If you expand the customer and order scenario further, how would you store information regarding an actual order once it has been dispatched? It is not enough to keep a record of the actual items and their quantity because the price will almost certainly change in the future. If you wanted to go with this model, you’d need to keep track of historic prices, tax rates, and discounts. One solution is to store a copy of the order document as a binary file; this is a little inflexible because it can’t be queried if necessary. A better alternative is to store the order details as an XML document detailing the actual items and their prices, customer, and shipping information and any discounts that were applied. This XML document can then be transformed to provide a confirmation e-mail, a delivery note, or the order itself. Because it is XML, it can also be queried easily using the techniques shown in this book and precludes the need to keep historical data. This makes the database schema and any associated queries much simpler.

An alternative to storing these documents within a database is to just keep them as documents in a filesystem. This is quite easy to do, but leads to a number of problems:

You need to set up a folder structure manually and decide on a naming convention to aid easy retrieval.
You need to have two systems: the database (to manage the tabular style data) and a separate XML processor (to query the XML documents).
Retrieving whole documents is straightforward, but retrieving a fragment is more difficult.
It’s extremely difficult to index the data to allow queries that perform well. Maintaining the indexing system is yet one more system to manage.
You need two separate backups, one for the database and one for the XML files.

Having a database that can both store XML documents and query them solves these problems and is much easier to use.

You can take two approaches if you want to store your documents in a database. You can use a traditional relational database such as SQL Server, which comes with the capability to store, query, and modify XML documents. Alternatively, you can choose a native XML database, one designed from the ground up to store XML documents efficiently; these usually have limited relational capabilities. Your decision should therefore be based on what data it is you want to store. If your data consists of a large number of XML documents that need to be accessed or queried then a native XML database is probably your best option. This might be the case if your software is a content management system where different snippets of data can be merged to form complete texts. If, however, you have a situation where you store details of customers but need to attach some XML documents, a traditional relational database with XML capabilities is a better choice.

Now that you understand why XML documents and databases are a good mix, you will next learn which XML features you usually need in a database to accomplish common tasks.

ANALYZING WHICH XML FEATURES ARE NEEDED IN A DATABASE

Whether you’ve chosen a relational or native XML database, you’ll need a number of features to address the common tasks associated with XML documents. Not every application will necessarily need all features, and some will definitely be used less frequently than others. This section deals with each task separately, and gives an indication of how important each one is likely to be. This will help you choose between the many systems available.

Retrieving Documents

This is a must-have requirement; there’s little use in storing documents that can’t be accessed later. However, you have a couple of points to consider:

How will you specify which document you want? If it’s just a question of retrieving an order associated with a customer in a relational database, as detailed previously, there’s no problem. All systems allow that. On the other hand, if you need to specify a document by some data held within it (such as an invoice number or a specific date) you’ll need some sort of XML querying facility. The standard way to specify documents is by using XPath to target a particular value and return any documents that match what you need. (See Chapter 7 for a review of XPath) For example, you may want to view all invoices over $1000, so you’d specify an XPath expression similar to /order[total > 1000].
How efficient does the search need to be? This will probably depend on how many documents you have stored, but once you get past a few hundred your system will have to do some background indexing to prevent searches being slow.

Retrieving Data from Documents

It’s quite likely that you’ll want to retrieve sections of a document rather than the entire thing. For example, you may want to see the shipping address or all the line items with their respective discounts. You may also need the information in a slightly different format. This is where XQuery, covered in Chapter 9, becomes invaluable. Nearly all XML-enabled databases that expose this sort of functionality have settled on XQuery as the standard way of querying and transforming XML data (if necessary) before returning it to the user.

However, there is a huge variation in how much of the standard is supported. The winners here are the native XML databases. These databases often implement the full capabilities of XQuery, whereas some of the relational databases that have had XML features added on support only a limited subset. If this sort of operation is likely to be used heavily in your application, you’ll want to check carefully just how much of XQuery is implemented before you decide which application to use.

Updating XML Documents

Although it seems like updating XML documents should be standard, this feature is more likely to be needed in a native XML database rather than a relational one. This is because in a relational database, XML documents are often used more as snapshots of the data at a point in time and therefore it doesn’t make sense to change them. In a native application all your data is in the form of XML, so you’ll probably need to modify it at some time. Again, the facilities for such modifications vary widely between databases; one reason for this is that the standard syntax for updating an XML document was agreed on much later than the one for retrieval.

Displaying Relational Data as XML

Displaying relational data as XML applies only to relational databases. Many relational databases have the capability to present the results of a standard SQL query as XML instead of the usual tabular format. Some have just a basic capability that wraps each row and each column in an element. With others, you can specify the format more precisely, perhaps by introducing nesting (where appropriate). Some systems also allow you to insert an XSL transformation into the pipeline. Although this is something that can be achieved easily enough once the data has been returned, it’s often easier and more efficient to incorporate the transformation as part of the whole data retrieval process and let the client concentrate on the presentational aspect.

Presenting XML as Relational Data

Again, presenting XML as relational data really applies only to relational databases. Sometimes you need to combine a data from a relational source and an XML source—perhaps your XML refers to a customer’s ID and you want to include some data from the customer table in your query. One way to do this is to present the XML document as a regular table and join it to the customer table. This works if the XML data is fairly regular and doesn’t use a hierarchical structure.

A companion problem is called shredding, which means taking all or parts of an XML document and storing them in a regular database table. This is common when you receive the data from a third party and decide that the benefits of XML storage are outweighed by the need to integrate with an existing database structure. It’s quite common in this situation to place the data you require into a table and retain the XML document as an extra column on that table. This way you can still use the XML if needed, but have the advantage of being able to join on other tables easily with the data you’ve extracted.

Now that you’ve seen some of the features that are available, the following section looks at three applications that implement some or all of these features, beginning with the XML functionality in MySQL.

USING MYSQL WITH XML

The driving factors to add XML support within open source Relational Database Management Systems (RDBMS) are the same as those of their commercial competitors, except that open-source RDBMSs appear to lag well behind in this area of functionality. The reasons are diverse. Open-source projects are often less prone than commercial companies to “get big fast” and absorb all their surroundings. They are also more used to collaborating with other projects. In addition, it is possible that they are less influenced to incorporate what many regard as just another trendy feature, XML in this case. In addition, open-source projects usually have fewer financial resources for their development.

MySQL is one of the leading open source databases, although the management of its development was taken over by Oracle a short time ago. It is used heavily in conjunction with both PHP- and Java-based websites. It has a few XML features, although in comparison to the other two commercial products examined in this chapter, it’s sadly lacking. In fact, MySQL is the only database discussed in this chapter that hasn’t added anything to its XML features since the last edition of this book in 2007.

Installing MySQL

You can download MySQL from www.my.mysql.com/downloads/. Follow the links to Community Server and choose version 5.5 (or later if available). Stable versions of MySQL are available for both Windows and Unix/Linux platforms.

The download page includes a sources download and a number of binary downloads for the most common platforms, including Windows, many Linux distributions, and Mac OS X. Choose the option that is the best match for your platform and follow the instructions.

For this chapter, you need to install the server and the client programs. If you are a Debian or Ubuntu user, select “Linux x86 generic RPM (dynamically linked) downloads,” convert the .rpm packages into .deb packages using the Debian alien package and alien command, and install it like any other package using the Debian dpkg -i command.

You can also install the front-end tools, which give you a graphical user interface. However, for these examples you can just use the command shell, which is reminiscent of a Windows command prompt.

If you are installing a MySQL database for anything other than test purposes, it is recommended that you set up proper users and passwords to protect your database. For the tests covered in this chapter, you can just choose a password for the root/admin account. You’ll be prompted to do that during the install, and you will access the system using that account.

Adding Information in MySQL

You can use a GUI tool to interact with MySQL, but if you really want to understand what’s going on behind the scenes, the mysql command-line utility is your best friend. Open a Unix or Windows command window and navigate to the bin folder of the installation, for example: C:Program FilesMySQLMySQl Server 5.5in. Then type mysql -u root –p. If everything is working correctly, you should see the mysql prompt asking for the password you chose during installation:

C:Program FilesMySQLMySQl Server 5.5in>mysql –u root -p
Enter password: **********
Welcome to the MySQL monitor.  Commands end with ; or g.
Your MySQL connection id is 4
Server version: 5.5.15 MySQL Community Server (GPL)
 
Copyright (c) 2000, 2010, Oracle and/or its affiliates. All rights reserved.
 
Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.
 
Type 'help;' or 'h' for help. Type 'c' to clear the current input statement.
 
mysql>

In the following Try It Out, you will create a new database and add information.

TRY IT OUT: Creating and Populating a MySQL Database

Before you can add information in MySQL, you must create a database. A database acts as a container in which you group information related to a project.

1. To create a database named Blog with UTF-8 as a character set, type the following:

mysql> create database Blog DEFAULT CHARACTER SET 'utf8';
Query OK, 1 row affected (0.00 sec)
 
mysql>

2. Move into the newly created database by typing the following:

mysql> use Blog;
Database changed
mysql>

A big difference between a native XML database, such as eXist (which you look at later), and a relational database is that a relational database is highly structured. An XML database can learn the structure of your documents when you load them, without needing any prior definition. This isn’t possible with a relational database. In a relational database, information is stored in tables with rows and columns. These tables are similar to spreadsheet tables, except that the name and type of the columns need to be defined before you can use them.

3. Create one of these tables to hold the information needed for the exercises in this chapter. Name it BlogPost and, for the sake of simplicity, give it only two columns:

A column named PostId that will be used as a primary key when you want to retrieve a specific blog entry
A column named Post to hold the blog entry in XML

Of course, this is a very minimal schema. You might want to add more columns to this table to simplify and optimize specific queries.

To create this table, type the following:

mysql> create table BlogPost (
    -> PostId INT PRIMARY KEY,
    -> Post LONGTEXT
    -> );
 
Query OK, 0 rows affected (0.27 sec)
 
mysql>

Note that you don’t have to type the -> at the beginning of the second and subsequent lines of the create table SQL instruction; these are just prompts sent by the mysql command-line utility.

4. The database is now ready to accept your blog entries. In a real-world application, these entries would be added by a nice web application, but this chapter continues to use the mysql command-line utility to add them. In SQL, you add information through an insert statement. Enter a couple of your own blog entries following this pattern:

 mysql> INSERT BlogPost (PostId, Post) SELECT 1,
'<post xmlns:x=”http://www.w3.org/1999/xhtml”
 id=”1” author=”Joe Fawcett” dateCreated=”2011-09-18”>
  <title>A New Book</title>
  <body>
    <x:p>
      <x:b>I've been asked to co-author a new edition of <x:a
 href=”http://www.wrox.com/WileyCDA/WroxTitle/productCd-470114878.html”>
Beginning XML</x:a> by Wrox</x:b>
    </x:p>
    <x:p>It's incredible how much has changed since the book was
 published nearly five years ago. XML
 is now a bedrock of many systems, contrarily you see less of it
 than previously as it's buried under more layers.</x:p>
 <x:p>There are also many places where it has stopped being an automatic
 choice for data transfer,
 JSON has become a popular replacement where the data is to
 be consumed directly by a
      JavaScript engine such as in a browser.</x:p>
    <x:p>The new edition should be finished towards the end of the year
 and be published early in 2012.</x:p>
  </body>
</post>'; 
Query OK, 1 row affected (0.34 sec)
 
mysql>

CreateAndLoadDatabse.sql

NOTE The XML document that you’re including is embedded in a SQL string delimited by single quotes. Any single quotes within your XML document must be escaped to fit in that string, and the SQL way to escape is by preceding it with a backslash as follows: '.

If you don’t want to bother typing in the individual statements to create the database, the table, and to insert the data, you can run the following command from the mysql shell, which processes the file found in the code download. You will need to make sure the path to the file that the previous code snippet was from is correct for your machine:

source C:mySQLCreateAndLoadDatabase.sql

How It Works

In this Try It Out, you created a database that serves as the container where information is stored, and a table that defines the structure of your data. Then you entered data in this structure. So far none of this has been XML-specific; you’ve just used standard SQL statements as supported by all relational database systems.

Querying MySQL

Now that you have your first two blog entries, what can you do with them? Because MySQL is a SQL database, so you can use all the power of SQL to query the content of your database. To show all the entries, just type the following:

SELECT * FROM BlogPost;

The result is too verbose to print in a book, but if you want something more concise, you can select only the first characters of each entry:

mysql> SELECT PostId, substring(Post, 1, 60) FROM BlogPost;
+--------+----------------------------------------------------------------------+
| PostId | substring(Post, 1, 70)                                               |
+--------+----------------------------------------------------------------------+
|      1 | <post xmlns:x=”http://www.w3.org/1999/xhtml” id=”1” author=”Joe Fawc |
|      2 | <post xmlns:x=”http://www.w3.org/1999/xhtml” id=”2” author=”Joe Fawc |
|      3 | <post xmlns:x=”http://www.w3.org/1999/xhtml” id=”3” author=”Joe Fawc |
+--------+----------------------------------------------------------------------+
3 rows in set (0.02 sec)
 
mysql>

Or, if you just want the number of entries:

mysql> SELECT COUNT(*) FROM BlogPost;
+----------+
| COUNT(*) |
+----------+
|        3 |
+----------+
1 row in set (0.13 sec)
 
mysql>

This is pure SQL, however, and could be done with any SQL database without XML support. But what if you wanted, for instance, to display the content of the title element?

The XML support in MySQL 5.5 comes from two XML functions documented at http://dev.mysql.com/doc/refman/5.5/en/xml-functions.html. These are ExtractValue and UpdateXML. The following Try It Out shows you how to use ExtractValue to query data.

TRY IT OUT: Using ExtractValue to Extract Title Data

In this Try It Out you’ll use one of MySQL’s XML functions, ExtractValue, to burrow into the XML representing a blog post and extracting its title. You’ll be using the MySQL command shell to carry out the tasks, as in previous examples.

1. The ExtractValue function evaluates the result of an XPath expression over an XML fragment passed as a string. Only a fairly restricted subset of XPath is currently implemented, which severely limits your ability to query XML fragments, but this is still enough to extract the title from content columns:

mysql> SELECT PostId, ExtractValue(Post, '/post/title') Title FROM BlogPost;
+--------+--------------------------+
| PostId | Title                    |
+--------+--------------------------+
|      1 | A New Book               |
|      2 | Go, Neutrino, Go!        |
|      3 | Size of the Solar System |
+--------+--------------------------+
3 rows in set (0.01 sec)
 
mysql>

2. You are not limited to using the ExtractValue function in the SELECT statement; you can also use it in the WHERE clause. To retrieve the ID of the blog entry with a specific title, use the following:

mysql> SELECT PostId FROM BlogPost
    -> WHERE ExtractValue(Post, '/post/title') =
    -> 'A New Book';
+--------+
| PostId |
+--------+
|      1 |
+--------+
1 row in set (0.00 sec)
 
mysql>

How It Works

ExtractValue works by taking an XPath expression and evaluating it against the target document. It doesn’t, however, return an exact copy of any XML it finds. Instead, it returns any text that is a child of the element selected, or, in the case that an attribute is selected, its value. This means that you extract the title of a post easily because the actual title is directly contained in a <title> element. You can also filter results by using ExtractValue in a WHERE clause; providing the value you want to test is simple text and not a full element.

If you are familiar with XPath, the behavior of ExtractValue is often somewhat counterintuitive. For instance, if you try to apply the same technique to fetch the <body> of your blog entries, you’ll get the following:

mysql> SELECT PostId, ExtractValue(Post, 'post/body') Body
    -> FROM BlogPost;
+--------+-----------------------------------------------------------+
| PostId | Body                                                      |
+--------+-----------------------------------------------------------+
|      1 |
 
 
                                 |
|      2 |
 
 
                 |
|      3 |
 
 
 
     |
+--------+-----------------------------------------------------------+
3 rows in set (0.04 sec)
 
mysql>

If you are used to the XPath behavior that translates elements into strings by concatenating the text nodes from all their descendants, you might assume that ExtractValue would do the same, but that’s not the case: ExtractValue only concatenates the text nodes directly embedded in elements. In this case, the only text nodes that are direct children from description elements are whitespaces, which explains the preceding output.

To get the default XPath behavior, you need to explicitly state that you want the text nodes at any level like so:

mysql> SELECT PostId, ExtractValue(Post, 'post/body//text()') Body
    -> FROM BlogPost;
+| PostId | Body
+--------+----------------------------------------------------------------------
|      1 |
 
       I've been asked to co-author a new edition of  Beginning XML
        by Wrox
 
     It's incredible how much has changed since the book was published
 nearly five years ago. XML
      is now a bedrock of many systems, contrarily you see less of it 
than previously as it's buried....
 
3 rows in set (0.00 sec)
 
mysql>mysql>

Note that this listing has been edited for conciseness.

How would you select entries that contain images? In XPath, you use //img directly in a test, and this would be considered true if and only if there were at least one <img> element somewhere in the document. If you’re familiar with XPath, you might be tempted to write something like this:

mysql> SELECT PostId, ExtractValue(Post, '/post/title') Title
    -> FROM BlogPost
    -> WHERE ExtractValue(Post, '//x:img') != '';
Empty set (0.00 sec)
 
mysql>

NOTE MySQL’s XML functions don’t really understand namespaces. You don’t have a way of binding a namespace URI to a prefix, so you just have to use the same prefix that exists in the source document.

This doesn’t work, however, because <img> elements are empty: they don’t have any child text nodes, and ExtractValue converts them into empty strings. To make that query work, you need to select a node that will have a value (such as //x:img/@src) or count the number of <img> elements and test that the result is greater than zero. This method is shown in the following code snippet:

mysql> SELECT PostId, ExtractValue(Post, '/post/title') Title
    -> FROM BlogPost
    -> WHERE ExtractValue(Post, '//x:img/@src') != '';
+--------+--------------------------+
| PostId | Title                    |
+--------+--------------------------+
|      3 | Size of the Solar System |
+--------+--------------------------+
1 row in set (0.02 sec)
 
mysql> SELECT PostId, ExtractValue(Post, '/post/title') Title
    -> FROM BlogPost
    -> WHERE ExtractValue(Post, 'count(//x:img)') > 0;
+--------+--------------------------+
| PostId | Title                    |
+--------+--------------------------+
|      3 | Size of the Solar System |
+--------+--------------------------+
1 row in set (0.04 sec)
 
mysql>

You’ll hit another limitation pretty soon if you use this function. Most of the string functions of XPath are not implemented. For instance, if you want to find entries with links to URIs from www.wrox.com, you might be tempted to write something such as the following:

mysql> SELECT PostId, ExtractValue(Post, '/post/title') Title
    -> FROM BlogPost
    -> WHERE ExtractValue
    -> (Post, 'count(//x:a[starts-with(@href, “http://www.wrox.com”)])') > 0;

Unfortunately, the starts-with function is not implemented, so you’ll get an error message—and not an informative one at that. It will just state that there’s a syntax error; you need to use SQL to do what you can’t do with XPath:

mysql> SELECT PostId, ExtractValue(Post, '/post/title') Title
    -> FROM BlogPost
    -> WHERE ExtractValue(Post, '//x:a/@href')
    -> LIKE 'http://www.wrox.com/%';
+--------+------------+
| PostId | Title      |
+--------+------------+
|      1 | A New Book |
+--------+------------+
1 row in set (0.06 sec)
 
mysql>

This ensures that any href attribute begins with the Wrox domain.

Now that you’ve seen how to make the most of MySQL’s somewhat limited select functionality, it’s time to try updating an XML document.

Updating XML in MySQL

The second XML function introduced by MySQL 5.5 is called UpdateXML. Like any SQL function, UpdateXML doesn’t perform database updates, but it is handy when you use it in update statements.

UpdateXML takes three parameters:

A string containing an XML document
An XPath expression that points to an element
An XML fragment

UpdateXML takes the XML document, looks for the content pointed to by the XPath expression passed as the second parameter and replaces it with the XML fragment passed as the third parameter. It then returns the new XML formed by the function as a string.

To change the title of the second blog entry, for example, use the following:

mysql> UPDATE BlogPost
    -> SET Post = UpdateXml(Post, '/post/title',
    -> '<title>Faster Than Light?</title>')
    -> WHERE PostId = 2;
Query OK, 1 row affected (0.13 sec)
Rows matched: 1  Changed: 1  Warnings: 0
 
mysql> SELECT PostId, ExtractValue(Post, '/post/title') Title
    -> FROM BlogPost;
+--------+--------------------------+
| PostId | Title                    |
+--------+--------------------------+
|      1 | A New Book               |
|      2 | Faster Than Light?       |
|      3 | Size of the Solar System |
+--------+--------------------------+
3 rows in set (0.00 sec)
 
mysql>

This function is obviously handy in this situation, but note that the XPath expression must point to an element. This means that the granularity of updates is at element level, so if you want to update an attribute value, you are out of luck.

Usability of XML in MySQL

After this introduction to the XML features of MySQL 5.5, you may be wondering, how usable these features are in real-world applications? To answer this question, first note that support of XML in MySQL 5.5 is limited to the two string functions already shown. In other words, there’s no such thing as an XML column type. Your documents are stored as text and need to be parsed each time you use one of these functions.

Consider one of the queries that you have seen:

SELECT PostId FROM BlogPost WHERE
ExtractValue(Post, '/post/title') = 'A New Book';

To process this query, the database engine needs to read the full content of all the blog entries, parse this content, and apply the XPath expression that extracts the title. That’s fine with your couple of blog entries, but likely not something you want to do if you are designing a WordPress clone able to store millions of blog entries.

To optimize the design of the sample database that you created, you would extract the information that is most commonly used in user queries and move it into table columns. In the Blog example created earlier, obvious candidates would be the title, the author, and the publication date. Having this data available as columns enables direct access for the engine. If you need further optimization, you can use these columns to create indexes.

The other consideration to keep in mind is the mismatch between the current implementation and the XPath usages. You saw an example of that when you had to explicitly specify that you wanted to concatenate text nodes from all the descendants. If you use these functions, you will see more examples where behavior differs from the generally accepted XML standards. This mismatch may be reduced in future releases, and is something to watch carefully because it could lead to incompatible changes.

With these restrictions in mind, if you are both a MySQL and an XML user, you will find these first XML features most welcome, and there is no reason to ignore them. They don’t turn MySQL into a native XML database yet, but they are a step in the right direction!

Client-Side XML Support

The features that you have seen so far are all server-side features implemented by the database engine. You don’t need anything to support XML on the client side, and it is very easy to use any programming language to convert SQL query results into XML. However, you might find it disappointing to leave this chapter without at least a peek at an XML feature that can be handy when you use the mysql command-line utility. To see this feature in action, start a new session but add the --xml option:

C:Program FilesMySQLMySQL Server 5.5in>mysql -u root -p --xml
Enter password: **********
Welcome to the MySQL monitor.  Commands end with ; or g.
Your MySQL connection id is 15
Server version: 5.5.15 MySQL Community Server (GPL)
 
Copyright (c) 2000, 2010, Oracle and/or its affiliates. All rights reserved.
 
Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.
 
Type 'help;' or 'h' for help. Type 'c' to clear the current input statement.
 
mysql>

The --xml option has switched on the XML mode, and the query results will now be output as XML:

mysql> USE Blog
Database changed
mysql> SELECT PostId, ExtractValue(Post, '/post/title') Title
    -> FROM BlogPost;
<?xml version=”1.0”?>
 
<resultset statement=”SELECT PostId, ExtractValue(Post, '/post/title') Title
FROM BlogPost” xmlns:xsi=”http://www.w3.org/2001/XMLSchema-instance”>
  <row>
        <field name=”PostId”>1</field>
        <field name=”Title”>A New Book</field>
  </row>
 
  <row>
        <field name=”PostId”>2</field>
        <field name=”Title”>Faster Than Light?</field>
  </row>
 
  <row>
        <field name=”PostId”>3</field>
        <field name=”Title”>Size of the Solar System</field>
  </row>
</resultset>
3 rows in set (0.00 sec)
 
mysql>

Although that’s not very readable as is, it’s a useful feature when you use mysql in shell or DOS scripts. When you get your results as XML documents, you can run XML tools such as XSLT transformations. If you need a truly simple way to turn out a query result in XHTML, this is definitely something that you’ll find useful.

Now that you’ve seen an open source implementation, it’s time to move on to a commercial product, SQL Server.

USING SQL SERVER WITH XML

Microsoft’s SQL Server has had XML functionality since version 2000. Version 2005 added a lot more, but since then there haven’t been many changes. The version you use in this section is 2008 R2, but the examples work with any version from 2005 upwards unless otherwise specified.

Installing SQL Server

For these examples you’ll use the free Express edition of SQL Server. You can download it from www.microsoft.com/sqlserver/en/us/editions/express.aspx. You’ll need to choose the appropriate option depending on whether you need the 32 or 64 bit version. Make sure you select the install that comes with the developer tools so that you can use SQL Server Management Studio to run the examples. You will also need to download a sample database to work with. The AdventureWorks OLTP database is available at http://msftdbprodsamples.codeplex.com/releases/view/55926 and represents a fictitious bicycle manufacturing company. When downloaded into SQL Server Management Studio, the database is referred to as AdventureWorks.

DOWNLOADING DATABASES FOR SQL SERVER

If you are having trouble downloading and installing the sample databases you can use the files in the code download for this chapter and perform the following steps:

1. Copy AdventureWorks_Data.mdf and AdventureWorks_Data_log.ldf to a suitable folder and then open SQL Server Management Studio (SSMS).

2. Connect to the local instance and right-click the Databases node in the object explorer and choose Attach....

3. Use the Add button to browse for the AdventureWorks_Data.mdf file and Click OK and then OK again.

4. You can then refresh the Databases node by pressing F5 and the new database should appear. You can then right-click on it and choose Rename and call it AdventureWorks2008R2 and hit F5 to complete the task.

The first piece of functionality discussed is how to present standard relational data as XML.

Presenting Relational Data as XML

Transforming tabular data to an XML format is a rather common requirement. SQL Server offers a number of options to achieve this, and they all involve appending the phrase FOR XML <mode> to the end of a regular SELECT query. You can use four different modes; each one enables you to tailor the results to a lesser or greater degree. The most basic mode of FOR XML is RAW.

Using FOR XML RAW

The simplest mode you can use is RAW. Suppose you have the following query, which selects the basic details of orders that have a value greater than $300,000:

 SELECT [PurchaseOrderID]
      ,[RevisionNumber]
      ,[Status]
      ,[EmployeeID]
      ,[VendorID]
      ,[ShipMethodID]
      ,[OrderDate]
      ,[ShipDate]
      ,[SubTotal]
      ,[TaxAmt]
      ,[Freight]
      ,[TotalDue]
      ,[ModifiedDate]
  FROM [Purchasing].[PurchaseOrderHeader]
  WHERE [TotalDue] > 300000;

ForXmlQueries.sql

The example queries in this section are available in the code download for the chapter in a file named ForXmlQueries.sql. The results, three rows, are shown in Figure 10-1.

FIGURE 10-1

To return XML, add FOR XML RAW to the query:

 SELECT [PurchaseOrderID]
      ,[RevisionNumber]
      ,[Status]
      ,[EmployeeID]
      ,[VendorID]
      ,[ShipMethodID]
      ,[OrderDate]
      ,[ShipDate]
      ,[SubTotal]
      ,[TaxAmt]
      ,[Freight]
      ,[TotalDue]
      ,[ModifiedDate]
  FROM [Purchasing].[PurchaseOrderHeader]
  WHERE [TotalDue] > 300000
  FOR XML RAW;

ForXmlQueries.sql

You’ll now get an attribute-centric XML view of the data, with each row wrapped in a <row> element. However, there’s no document element added so it’s actually an XML fragment. One of the rows is shown in the following code:

<row PurchaseOrderID=”4007” RevisionNumber=”13” Status=”2”
     EmployeeID=”251” VendorID=”1594” ShipMethodID=”3”
     OrderDate=”2008-04-01T00:00:00” ShipDate=”2008-04-26T00:00:00”
     SubTotal=”554020.0000” TaxAmt=”44321.6000” Freight=”11080.4000”
     TotalDue=”609422.0000” ModifiedDate=”2009-09-12T12:25:46.407” />

If you want an element-centric view, add the ELEMENTS directive to the query:

 SELECT [PurchaseOrderID]
      ,[RevisionNumber]
      ,[Status]
      ,[EmployeeID]
      ,[VendorID]
      ,[ShipMethodID]
      ,[OrderDate]
      ,[ShipDate]
      ,[SubTotal]
      ,[TaxAmt]
      ,[Freight]
      ,[TotalDue]
      ,[ModifiedDate]
  FROM [Purchasing].[PurchaseOrderHeader]
  WHERE [TotalDue] > 300000
  FOR XML RAW, ELEMENTS;

ForXmlQueries.sql

You’ll then get rows like the following:

<row>
  <PurchaseOrderID>4007</PurchaseOrderID>
  <RevisionNumber>13</RevisionNumber>
  <Status>2</Status>
  <EmployeeID>251</EmployeeID>
  <VendorID>1594</VendorID>
  <ShipMethodID>3</ShipMethodID>
  <OrderDate>2008-04-01T00:00:00</OrderDate>
  <ShipDate>2008-04-26T00:00:00</ShipDate>
  <SubTotal>554020.0000</SubTotal>
  <TaxAmt>44321.6000</TaxAmt>
  <Freight>11080.4000</Freight>
  <TotalDue>609422.0000</TotalDue>
  <ModifiedDate>2009-09-12T12:25:46.407</ModifiedDate>
</row>

As mentioned before, the query returns an XML fragment rather than a full document. To add a surrounding root element, use the ROOT directive combined with the name of the root element you want:

 SELECT [PurchaseOrderID]
      ,[RevisionNumber]
      ,[Status]
      ,[EmployeeID]
      ,[VendorID]
      ,[ShipMethodID]
      ,[OrderDate]
      ,[ShipDate]
      ,[SubTotal]
      ,[TaxAmt]
      ,[Freight]
      ,[TotalDue]
      ,[ModifiedDate]
  FROM [Purchasing].[PurchaseOrderHeader]
  WHERE [TotalDue] > 300000
  FOR XML RAW, ELEMENTS, ROOT('orders');

ForXmlQueries.sql

You’ll now get an <orders> element around all the <row> elements.

You may also want to change the name of the default row container, which is <row>. Simply add the name in parentheses after the RAW keyword:

 SELECT [PurchaseOrderID]
      ,[RevisionNumber]
      ,[Status]
      ,[EmployeeID]
      ,[VendorID]
      ,[ShipMethodID]
      ,[OrderDate]
      ,[ShipDate]
      ,[SubTotal]
      ,[TaxAmt]
      ,[Freight]
      ,[TotalDue]
      ,[ModifiedDate]
  FROM [Purchasing].[PurchaseOrderHeader]
  WHERE [TotalDue] > 300000
  FOR XML RAW('order'), ELEMENTS, ROOT('orders'),

This will give you results similar to the following:

<orders>
  <order>
    <PurchaseOrderID>4007</PurchaseOrderID>
    <RevisionNumber>13</RevisionNumber>
    <Status>2</Status>
    <EmployeeID>251</EmployeeID>
    <VendorID>1594</VendorID>
    <ShipMethodID>3</ShipMethodID>
    <OrderDate>2008-04-01T00:00:00</OrderDate>
    <ShipDate>2008-04-26T00:00:00</ShipDate>
    <SubTotal>554020.0000</SubTotal>
    <TaxAmt>44321.6000</TaxAmt>
    <Freight>11080.4000</Freight>
    <TotalDue>609422.0000</TotalDue>
    <ModifiedDate>2009-09-12T12:25:46.407</ModifiedDate>
  </order>
 <!-- more order elements -->
</orders>

Another issue that commonly arises is how to treat nulls in the results. The default is to not output the element or attribute at all if its value is null. Sometimes it’s easier to process the results if the element is there, but empty; to differentiate between an element that contains a null value and one which has an empty string there needs to be a marker on the element to signify that its value is null rather than an empty string. The marker used is xsi:nil=”true “. This is a standard attribute from the schema instance namespace, so SQL Server also needs to add the correct namespace binding. If you want this treatment, use the XSINIL directive after the ELEMENTS keyword. The following code shows how to ensure that elements that have a null value are output rather than being omitted:

 SSELECT [PurchaseOrderID]
      ,[RevisionNumber]
      ,[Status]
      ,[EmployeeID]
      ,[VendorID]
      ,[ShipMethodID]
      ,[OrderDate]
      ,[ShipDate]
      ,[SubTotal]
      ,[TaxAmt]
      ,[Freight]
      ,[TotalDue]
      ,[ModifiedDate]
  FROM [Purchasing].[PurchaseOrderHeader]
  WHERE [TotalDue] > 300000
  FOR XML RAW('order'), ELEMENTS XSINIL, ROOT('orders'),

ForXmlQueries.sql

The result is as follows:

<orders xmlns:xsi=”http://www.w3.org/2001/XMLSchema-instance”>
  <order>
    <PurchaseOrderID>4007</PurchaseOrderID>
    <RevisionNumber>14</RevisionNumber>
    <Status>2</Status>
    <EmployeeID>251</EmployeeID>
    <VendorID>1594</VendorID>
    <ShipMethodID>3</ShipMethodID>
    <OrderDate>2008-04-01T00:00:00</OrderDate>
    <ShipDate xsi:nil=”true” />
    <SubTotal>554020.0000</SubTotal>
    <TaxAmt>44321.6000</TaxAmt>
    <Freight>11080.4000</Freight>
    <TotalDue>609422.0000</TotalDue>
    <ModifiedDate>2009-09-12T12:25:46.407</ModifiedDate>
  </order>
  <!-- more order elements -->
</orders>

If your copy of the database doesn’t have a null shipping date, you can always change one—the code in ForXmlQueries.sql makes the requisite change and then restores it at the end.

One final feature that might be useful, especially if your data is being passed to a third party, is the ability to add an XML schema. This is done by appending the previous code with the XMLSCHEMA directive like so:

 SELECT [PurchaseOrderID]
      ,[RevisionNumber]
      ,[Status]
      ,[EmployeeID]
      ,[VendorID]
      ,[ShipMethodID]
      ,[OrderDate]
      ,[ShipDate]
      ,[SubTotal]
      ,[TaxAmt]
      ,[Freight]
      ,[TotalDue]
      ,[ModifiedDate]
  FROM [Purchasing].[PurchaseOrderHeader]
  WHERE [TotalDue] > 300000
  FOR XML RAW('order'), ELEMENTS XSINIL, ROOT('orders'), XMLSCHEMA;

ForXmlQueries.sql

The schema (highlighted here) is included just after the document element and before the actual results:

<orders xmlns:xsi=”http://www.w3.org/2001/XMLSchema-instance”>
  <xsd:schema targetNamespace=”urn:schemas-microsoft-com:sql:SqlRowSet1”
        xmlns:xsd=”http://www.w3.org/2001/XMLSchema”
        xmlns:sqltypes=”http://schemas.microsoft.com/sqlserver/2004/sqltypes”
        elementFormDefault=”qualified”>
  <xsd:import namespace=”http://schemas.microsoft.com/sqlserver/2004/sqltypes”
       schemaLocation=
   “http://schemas.microsoft.com/sqlserver/2004/sqltypes/sqltypes.xsd” />
    <xsd:element name=”order”>
     <xsd:complexType>
      <xsd:sequence>
       <xsd:element name=”PurchaseOrderID” type=”sqltypes:int” nillable=”1” />
       <xsd:element name=”RevisionNumber” type=”sqltypes:tinyint” nillable=”1”/>
       <xsd:element name=”Status” type=”sqltypes:tinyint” nillable=”1” />
       <xsd:element name=”EmployeeID” type=”sqltypes:int” nillable=”1” />
       <xsd:element name=”VendorID” type=”sqltypes:int” nillable=”1” />
       <xsd:element name=”ShipMethodID” type=”sqltypes:int” nillable=”1” />
       <xsd:element name=”OrderDate” type=”sqltypes:datetime” nillable=”1” />
       <xsd:element name=”ShipDate” type=”sqltypes:datetime” nillable=”1” />
       <xsd:element name=”SubTotal” type=”sqltypes:money” nillable=”1” />
       <xsd:element name=”TaxAmt” type=”sqltypes:money” nillable=”1” />
       <xsd:element name=”Freight” type=”sqltypes:money” nillable=”1” />
       <xsd:element name=”TotalDue” type=”sqltypes:money” nillable=”1” />
       <xsd:element name=”ModifiedDate” type=”sqltypes:datetime” nillable=”1” />
      </xsd:sequence>
      </xsd:complexType>
    </xsd:element>
  </xsd:schema>
  <order xmlns=”urn:schemas-microsoft-com:sql:SqlRowSet1”>
    <PurchaseOrderID>4007</PurchaseOrderID>
    <RevisionNumber>14</RevisionNumber>
    <Status>2</Status>
    <EmployeeID>251</EmployeeID>
    <VendorID>1594</VendorID>
    <ShipMethodID>3</ShipMethodID>
    <OrderDate>2008-04-01T00:00:00</OrderDate>
    <ShipDate xsi:nil=”true” />
    <SubTotal>554020.0000</SubTotal>
    <TaxAmt>44321.6000</TaxAmt>
    <Freight>11080.4000</Freight>
    <TotalDue>609422.0000</TotalDue>
    <ModifiedDate>2009-09-12T12:25:46.407</ModifiedDate>
  </order>
  <!-- rest of order elements -->
</orders>

Although the RAW mode has a few options, it fails miserably when dealing with hierarchical data. To have more control and to be able to handle hierarchical data more effectively you can use FOR XML AUTO.

Using FOR XML AUTO

If you try to use the RAW mode with nested data, such as orders along with the line items, you’ll get a repetitive block of XML in which the order is repeated for every line item. One of the strengths of XML is the ability to show hierarchical data cleanly, so this sort of repetition is something to be avoided. You examine how the AUTO mode copes with this in the following activity.

TRY IT OUT: Using FOR XML AUTO

In this Try It Out you’ll be using the more sophisticated FOR XML AUTO directive. You’ll see how this gives greater control than the FOR XML RAW queries that you met earlier. Primarily FOR XML AUTO is much better at handling the XML returned when two or more tables are joined in a query, for example when joining PurchaseOrderHeader with PurchaseOrderDetail to give a full view of an order.

1. To try FOR XML AUTO, simply replace the RAW keyword with AUTO in the basic query introduced in the preceding section:

 SELECT [PurchaseOrderID]
      ,[RevisionNumber]
      ,[Status]
      ,[EmployeeID]
      ,[VendorID]
      ,[ShipMethodID]
      ,[OrderDate]
      ,[ShipDate]
      ,[SubTotal]
      ,[TaxAmt]
      ,[Freight]
      ,[TotalDue]
      ,[ModifiedDate]
  FROM [Purchasing].[PurchaseOrderHeader]
  WHERE [TotalDue] > 300000
  FOR XML AUTO;

ForXmlQueries.sql

You won’t see much difference in the results of this query compared to the RAW version, other than the fact that the name of the element holding the data is derived from the table name rather than being a generic row element:

<Purchasing.PurchaseOrderHeader PurchaseOrderID=”4007” RevisionNumber=”14”
 Status=”2” EmployeeID=”251” VendorID=”1594” ShipMethodID=”3”
 OrderDate=”2008-04-01T00:00:00” SubTotal=”554020.0000” TaxAmt=”44321.6000”
 Freight=”11080.4000” TotalDue=”609422.0000”
 ModifiedDate=”2009-09-12T12:25:46.407” />

Again, the result is a fragment with no all-enclosing document element.

2. The real difference becomes apparent when a query extracting data from two linked tables is executed. The following SQL shows all the previous orders along with their individual line items:

  SELECT POH.[PurchaseOrderID]
      ,POH.[RevisionNumber]
      ,POH.[Status]
      ,POH.[EmployeeID]
      ,POH.[VendorID]
      ,POH.[ShipMethodID]
      ,POH.[OrderDate]
      ,POH.[ShipDate]
      ,POH.[SubTotal]
      ,POH.[TaxAmt]
      ,POH.[Freight]
      ,POH.[TotalDue]
      ,POH.[ModifiedDate]
      ,POD.[OrderQty]
      ,POD.[ProductID]
      ,POD.[UnitPrice]
  FROM [Purchasing].[PurchaseOrderHeader] POH
  INNER JOIN Purchasing.PurchaseOrderDetail POD
  ON POH.PurchaseOrderID = POD.PurchaseOrderID
  WHERE [TotalDue] > 300000;

ForXmlQueries.sql

Here, the tables have been joined on the PurchaseOrderId field and the tables have been aliased to use the shorter names, POH and POD. The results of this query are shown in Figure 10-2.

FIGURE 10-2

3. Now modify the query by adding FOR XML AUTO:

  SELECT POH.[PurchaseOrderID]
      ,POH.[RevisionNumber]
      ,POH.[Status]
      ,POH.[EmployeeID]
      ,POH.[VendorID]
      ,POH.[ShipMethodID]
      ,POH.[OrderDate]
      ,POH.[ShipDate]
      ,POH.[SubTotal]
      ,POH.[TaxAmt]
      ,POH.[Freight]
      ,POH.[TotalDue]
      ,POH.[ModifiedDate]
      ,POD.[OrderQty]
      ,POD.[ProductID]
      ,POD.[UnitPrice]
  FROM [Purchasing].[PurchaseOrderHeader] POH
  INNER JOIN Purchasing.PurchaseOrderDetail POD
  ON POH.PurchaseOrderID = POD.PurchaseOrderID
  WHERE [TotalDue] > 300000
  FOR XML AUTO, ROOT('orders'),

ForXmlQueries.sql

Notice that a root element has been specified, as with the RAW option. The results appear as follows with the hierarchical nature much more apparent:

<orders>
  <POH PurchaseOrderID=”4007” RevisionNumber=”16” Status=”2” EmployeeID=”251”
   VendorID=”1594” ShipMethodID=”3” OrderDate=”2008-04-01T00:00:00”
   SubTotal=”554020.0000” TaxAmt=”44321.6000” Freight=”11080.4000”
   TotalDue=”609422.0000” ModifiedDate=”2009-09-12T12:25:46.407”>
    <POD OrderQty=”5000” ProductID=”849” UnitPrice=”24.7500” />
    <POD OrderQty=”5000” ProductID=”850” UnitPrice=”24.7500” />
    <POD OrderQty=”5000” ProductID=”851” UnitPrice=”24.7500” />
    <POD OrderQty=”750” ProductID=”852” UnitPrice=”30.9400” />
    <POD OrderQty=”750” ProductID=”853” UnitPrice=”30.9400” />
    <POD OrderQty=”750” ProductID=”854” UnitPrice=”30.9400” />
    <POD OrderQty=”1050” ProductID=”855” UnitPrice=”37.1000” />
    <POD OrderQty=”1000” ProductID=”856” UnitPrice=”37.1000” />
    <POD OrderQty=”1000” ProductID=”857” UnitPrice=”37.1000” />
  </POH>
  <!-- more POH elements -->
</orders>

Note that the elements have taken on the names of the table aliases used in the query, which gives you a way to name them anything you like.

How It Works

The original query, without the FOR XML AUTO directive, leads to a very repetitive result set with each order line also containing the full details from the header. Adding FOR XML AUTO, ROOT('orders') to the query produces a nested set of records, something XML excels at, making each order header an element with its details such as order date and ID displayed as attributes. Underneath each <POH> element is one <POD> element representing a line from the order. Again each of these elements uses attributes to show values such as order quantity and product ID.

The other options available to FOR XML RAW, such as ELEMENTS, XSINIL, and XMLSCHEMA, are also available to FOR XML AUTO.

NOTE Also available are several less commonly used features, such as those to return binary data and to use GROUP BY in XML queries. These are covered at length in the SQL Server Books Online (BOL), available from within the SQL Server Management Studio or online at msdn.microsoft.com/en-us/library/ms130214.aspx.

Despite the different options available to both the RAW and the AUTO versions of FOR XML, you will likely encounter cases where neither alternative produces the output needed. The most common scenario is when you need a combination of elements and attributes, rather than one or the other. Two options are available for this purpose: FOR XML EXPLICIT and FOR XML PATH; the latter is available only in post-2000 versions.

Using FOR XML EXPLICIT

The EXPLICIT option enables almost unlimited control over the resulting XML format, but this comes at a price. The syntax is difficult to grasp, and because the mechanism used to construct the resulting XML is based on a forward-only XML writer, the results must be grouped and ordered in a very specific way. Unless you are stuck with SQL Server 2000, the advice from Microsoft and other experts is to use the PATH option instead. If you do need to use EXPLICIT, the full details are available in the SQL Server BOL.

Using FOR XML PATH

The PATH option, based on using XPath to specify the format of the output, makes building nested XML with combinations of elements and attributes relatively simple. Take one of the earlier query result examples, in which orders over $300,000 were retrieved and returned as attribute-centric XML using the AUTO option:

<orders>
  <Purchasing.PurchaseOrderHeader PurchaseOrderID=”4007” RevisionNumber=”16”
   Status=”2” EmployeeID=”251” VendorID=”1594” ShipMethodID=”3”
   OrderDate=”2008-04-01T00:00:00” SubTotal=”554020.0000” TaxAmt=”44321.6000”
   Freight=”11080.4000” TotalDue=”609422.0000”
   ModifiedDate=”2009-09-12T12:25:46.407” />
</orders>

What if a different layout was needed, one where the PurchaseOrderID, EmployeedID, and status were attributes but the other data appeared as elements? The PATH option uses aliases of the columns to specify how the XML is structured. The syntax is similar to XPath (covered in Chapter 7), hence the PATH keyword.

The PATH query for the order data as a mix of attributes and elements would be as follows:

 SELECT [PurchaseOrderID] [@PurchaseOrderID]
      ,[Status] [@Status]
      ,[EmployeeID] [@EmployeeID]
      ,[VendorID]
      ,[ShipMethodID]
      ,[OrderDate]
      ,[ShipDate]
      ,[SubTotal]
      ,[TaxAmt]
      ,[Freight]
      ,[TotalDue]
  FROM [Purchasing].[PurchaseOrderHeader] POH
  WHERE [TotalDue] > 300000
  FOR XML PATH('order'), ROOT('orders'),

ForXmlQueries.sql

Notice how data that needs to be returned as attributes is aliased to a column name beginning with @. Unaliased columns are returned as elements. The results of this query would resemble the following XML:

<orders>
  <order PurchaseOrderID=”4007” Status=”2” EmployeeID=”251”>
    <VendorID>1594</VendorID>
    <ShipMethodID>3</ShipMethodID>
    <OrderDate>2008-04-01T00:00:00</OrderDate>
    <SubTotal>554020.0000</SubTotal>
    <TaxAmt>44321.6000</TaxAmt>
    <Freight>11080.4000</Freight>
    <TotalDue>609422.0000</TotalDue>
  </order>
  <!-- more order elements -->
</orders>

The PATH option also provides control over nesting. The usual way to do this, rather than use a SQL JOIN as shown previously, is to use a subquery. The following snippet shows the order header as attributes, with the order details as nested elements:

 SELECT [POH].[PurchaseOrderID] [@PurchaseOrderID]
  ,[POH].[Status] [@Status]
  ,[POH].[EmployeeID] [@EmployeeID]
  ,[POH].[VendorID] [@VendorID]
  ,[POH].[ShipMethodID] [@ShipMethodID]
  ,[POH].[OrderDate] [@OrderDate]
  ,[POH].[ShipDate] [@ShipDate]
  ,[POH].[SubTotal] [@SubTotal]
  ,[POH].[TaxAmt] [@TaxAmt]
  ,[POH].[Freight] [@Freight]
  ,[POH].[TotalDue] [@TotalDue]
  ,(
     SELECT [POD].[OrderQty]
           ,[POD].[ProductID]
           ,[POD].[UnitPrice]      
     FROM [Purchasing].[PurchaseOrderDetail] POD
     WHERE POH.[PurchaseOrderID] =
                               POD.[PurchaseOrderID]
     ORDER BY POD.[PurchaseOrderID]
     FOR XML PATH('orderDetail'), TYPE
   )
FROM [Purchasing].[PurchaseOrderHeader] POH
WHERE [POH].[TotalDue] > 300000
FOR XML PATH('order'), ROOT('orders'),

ForXmlQueries.sql

The main part of the query, without the inner SELECT, is much the same as before except all the output columns are specified as attributes, as shown by the alias name beginning with the @ symbol:

 SELECT [POH].[PurchaseOrderID] [@PurchaseOrderID]
  ,[POH].[Status] [@Status]
  ,[POH].[EmployeeID] [@EmployeeID]
  ,[POH].[VendorID] [@VendorID]
  ,[POH].[ShipMethodID] [@ShipMethodID]
  ,[POH].[OrderDate] [@OrderDate]
  ,[POH].[ShipDate] [@ShipDate]
  ,[POH].[SubTotal] [@SubTotal]
  ,[POH].[TaxAmt] [@TaxAmt]
  ,[POH].[Freight] [@Freight]
  ,[POH].[TotalDue] [@TotalDue]
  (
           -- Inner query here
       )
FROM [Purchasing].[PurchaseOrderHeader] POH
WHERE [POH].[TotalDue] > 300000
FOR XML PATH('order'), ROOT('orders'),

ForXmlQueries.sql

The inner query returns the order detail relating to the customer specified in the outer query. This is accomplished by equating the PurchaseOrderDetail.PurchaseOrderId field in the outer query to the PurchaseOrderDetail.PurchaseOrderID in the nested query as shown in the following code snippet. (In SQL terms, this is known as a correlated subquery.)

 SELECT [POD].[OrderQty]
           ,[POD].[ProductID]
           ,[POD].[UnitPrice]      
     FROM [Purchasing].[PurchaseOrderDetail] POD
     WHERE POH.[PurchaseOrderID] =
                               POD.[PurchaseOrderID]
     ORDER BY POD.[PurchaseOrderID]
     FOR XML PATH('orderDetail'), TYPE

ForXmlQueries.sql

Note the TYPE option at the end of the subquery. This specifies that the resulting data should be converted to the XML data type (this is covered in more detail later in the chapter). This option ensures that the data is inserted as XML, rather than a string. The actual output from the query appears as follows:

<orders>
  <order PurchaseOrderID=”4007” Status=”2” EmployeeID=”251” VendorID=”1594”
  ShipMethodID=”3” OrderDate=”2008-04-01T00:00:00” SubTotal=”554020.0000”
  TaxAmt=”44321.6000” Freight=”11080.4000” TotalDue=”609422.0000”>
    <orderDetail>
      <OrderQty>5000</OrderQty>
      <ProductID>849</ProductID>
      <UnitPrice>24.7500</UnitPrice>
    </orderDetail>
    <orderDetail>
      <OrderQty>5000</OrderQty>
      <ProductID>850</ProductID>
      <UnitPrice>24.7500</UnitPrice>
    </orderDetail>
    <orderDetail>
      <OrderQty>5000</OrderQty>
      <ProductID>851</ProductID>
      <UnitPrice>24.7500</UnitPrice>
    </orderDetail>
    <orderDetail>
      <OrderQty>750</OrderQty>
      <ProductID>852</ProductID>
      <UnitPrice>30.9400</UnitPrice>
    </orderDetail>
    <!-- more orderDetail elements -->
  </order>
  <!-- more order elements -->
</orders>

Because no aliasing was applied to the inner query, the columns are represented by XML elements.

WARNING If you remove the , TYPE from the inner query, the order details are inserted as escaped XML because they are treated as text data, not markup.

Plenty of other options are available to customize the results returned from a FOR XML PATH query. The final example shows how to group data within elements. The two dates associated with the order are grouped under a <Dates> element, and an <orderDetail> element is used to hold the individual line items:

 SELECT [POH].[PurchaseOrderID] [@PurchaseOrderID]
  ,[POH].[Status] [@Status]
  ,[POH].[EmployeeID] [@EmployeeID]
  ,[POH].[VendorID] [@VendorID]
  ,[POH].[ShipMethodID] [@ShipMethodID]
  ,[POH].[SubTotal] [@SubTotal]
  ,[POH].[TaxAmt] [@TaxAmt]
  ,[POH].[Freight] [@Freight]
  ,[POH].[TotalDue] [@TotalDue]
  ,[POH].[OrderDate] [Dates/Order]
  ,[POH].[ShipDate] [Dates/Ship]
 
  ,(
     SELECT [POD].[OrderQty]
           ,[POD].[ProductID]
           ,[POD].[UnitPrice]      
     FROM [Purchasing].[PurchaseOrderDetail] POD
     WHERE POH.[PurchaseOrderID] =
                               POD.[PurchaseOrderID]
     ORDER BY POD.[PurchaseOrderID]
     FOR XML PATH('orderDetail'), TYPE
   )
FROM [Purchasing].[PurchaseOrderHeader] POH
WHERE [POH].[TotalDue] > 300000
FOR XML PATH('order'), ROOT('orders'),

ForXmlQueries.sql

In the preceding code, the key change is to the OrderDate and ShipDate in the outer SELECT. The columns are aliased to Date/Order and Dates/Ship, so SQL Server creates a new element, Dates, to hold these two values. There is also an alias on the entire subquery, OrderDetails, that causes all of its results to be grouped under one element. The resulting XML looks like this:

<orders>
  <order PurchaseOrderID=”4008” Status=”2” EmployeeID=”258” VendorID=”1676”
 ShipMethodID=”3” SubTotal=”396729.0000” TaxAmt=”31738.3200” Freight=”7934.5800”
 TotalDue=”436401.9000”>
    <Dates>
      <Order>2008-05-23T00:00:00</Order>
      <Ship>2008-06-17T00:00:00</Ship>
    </Dates>
    <orderDetail>
      <OrderQty>700</OrderQty>
      <ProductID>858</ProductID>
      <UnitPrice>9.1500</UnitPrice>
    </orderDetail>
    <orderDetail>
      <OrderQty>700</OrderQty>
      <ProductID>859</ProductID>
      <UnitPrice>9.1500</UnitPrice>
    </orderDetail>
    <!-- more orderDetail elements -->
  </order>
  <order PurchaseOrderID=”4012” Status=”2” EmployeeID=”254” VendorID=”1636”
 ShipMethodID=”3” SubTotal=”997680.0000” TaxAmt=”79814.4000” Freight=”19953.6000”
 TotalDue=”1097448.0000”>
    <Dates>
      <Order>2008-07-25T00:00:00</Order>
      <Ship>2008-08-19T00:00:00</Ship>
    </Dates>
    <orderDetail>
      <OrderQty>6000</OrderQty>
      <ProductID>881</ProductID>
      <UnitPrice>41.5700</UnitPrice>
    </orderDetail>
    <!-- more orderDetail elements -->
  </order>
</orders>

NOTE Dozens of additional options for PATH queries are available, including how to produce comments, how to create text content, and how to add namespace declarations. For a full discussion, refer to Books Online, http://msdn.microsoft.com/en-us/library/ms130214.aspx.

That covers the basics of the FOR XML instruction. Next you take a look at storing XML within a table, starting with the xml data type.

Understanding the xml Data Type

SQL Server 2005 added an xml data type to those available, which means that XML documents can be stored in a SQL Server 2005 (or later) database. This is a vast improvement on earlier versions where there were two options for storing XML, neither of which was satisfactory. The first alternative was to shred the data into its constituent parts, which were then stored in multiple relational tables—defeating the purpose of using XML. The second choice was to convert the XML to a simple string of characters, which loses the logical content of the XML document. The additional ability to store XML data as an xml data type means that such data can be treated as if it were still an XML document. In reality, the xml data type is stored in a proprietary binary format, but, as far as the developer is concerned, it is accessible as XML, with its logical structure intact.

NOTE One or two differences exist between the data stored by SQL Server and the original document, and it is not possible to round-trip between the two and get an identical copy, although the XML Infoset is preserved (see Chapter 2 for details).

The existence of the xml data type means that XML documents stored as this type can be treated as if they were collections of XML documents sitting on your hard drive. Of course, the details of the interface to that XML is specific to SQL Server. Other advantages to having a specific data type devoted to XML are that you can store intermediate results in queries that return XML and you can use the methods of the xml data type to search and modify the XML stored in it.

There are several general advantages to storing your data in SQL Server. For one, XML storage benefits from the security, scalability, and other aspects of an enterprise-level database management system. You can also associate XML schemas with the column and, when querying the document, the appropriate type will be returned. This is a vast improvement over the previous version, where CASTing or CONVERTing was needed.

XML documents stored in SQL Server can be treated as XML in any other setting. One practical effect of that is that you can use XQuery (introduced in Chapter 9) to query these XML columns. Surprisingly, two XML document instances cannot be compared in this release, in part because of the flexibility of XML syntax. Consider, for example, the subtleties of trying to compare two lengthy XML documents that can have paired apostrophes or paired quotes to contain attribute values, differently-ordered attributes, different namespace prefixes although the namespace URI may be the same, and empty elements written with start tags and end tags or with the empty element tag.

Documents stored as the xml data type can be validated against a specified W3C XML Schema document. XML data that is not associated with a schema document is termed untyped, and XML associated with a schema document is termed typed.

In the following activity you create a simple table to contain XML documents in SQL Server. SQL Server Management Studio is the main graphical tool for manipulating database objects and writing SQL code, although Visual Studio and a number of third-party applications are also available. Refer to the “Installing SQL Server” section for instructions on how to download SQL Server Management Studio.

TRY IT OUT: Creating XML Documents in SQL Server

The following Try It Out shows how to create a table designed specifically to hold XML documents in their native state, rather than as text. Once the table has been created you’ll see how to insert a few sample XML documents and then retrieve them using SQL.

1. Open the SQL Server Management Studio (SSMS) and connect to the local instance of SQL Server (or whichever server you want to create the test database on).

2. In the Object Explorer, expand the nodes so that User Databases is shown. Right-click and select the New Database option. When dialog box opens, insert the name of the database—for this example, XMLDocTest. Before clicking OK, make sure that the Full Text Indexing option is checked.

3. Create a table called Docs using the following SQL

 CREATE TABLE dbo.Docs (
 DocID INTEGER IDENTITY PRIMARY KEY,
 XMLDoc XML
 )

XmlDataType.sql

The column XMLDoc is of type xml. Because this is a SQL statement, the data type is not case sensitive. Now you have an empty table.

4. For the purposes of this example, add simple XML documents with the following structure:

<Person>
 <FirstName></FirstName>
 <LastName></LastName>
</Person>

5. Insert XML documents using the SQL INSERT statement, as follows, which shows insertion of a single XML document:

INSERT Docs
VALUES ('<Person><FirstName>Joe</FirstName>
<LastName>Fawcett</LastName></Person>'
)

XmlDataType.sql

6. After modifying the values of the FirstName and LastName elements and adding a few documents to the XMLDoc column, confirm that retrieval works correctly using the following SQL statement:

SELECT * FROM Docs

The result of this SQL Query is shown in Figure 10-3.

FIGURE 10-3

The values contained in the XMLDoc column are displayed in the lower pane of the figure. A little later, you will create some simple XQuery queries.

How It Works

The first step created a table, Docs, which had one of the columns, XmlDoc, defined as the new XML type. The next stage used a traditional INSERT query to add some text to this column. Because the column was defined as XML, the data was converted from text to an XML document. The document can be retrieved by using a traditional SELECT query.

As an alternative to retrieving the whole XML document, you can also select only parts of it (see the upcoming sections starting with “Using the query() Method”).

Creating Indexes with the xml Data Type

XML documents in SQL Server can also be indexed for more efficient retrieval, and a full-text index can be created. To create a full-text index on a document, use a command like the following:

--If no catalog exists so far
CREATE FULLTEXT CATALOG ft ON DEFAULT
CREATE FULLTEXT INDEX ON dbo.Docs(XmlDoc) KEY INDEX <primary key name>

The xml data type enables you to use the following methods to manipulate the data and to extract it in various forms: modify(), query(), value(), exist(), and nodes(). The following sections look at each method in turn and describe how they are used.

Using the modify() Method

The xml data type can be queried using the XQuery language, which was introduced in Chapter 9. In SQL Server, XQuery expressions are embedded inside Transact-SQL. Transact-SQL is the flavor of the SQL language used in SQL Server.

Microsoft introduced the modify() method before XQuery had finalized a syntax. At the time there was talk of updating to the official standard when it appeared, but so far that hasn’t happened.

The W3C XQuery 1.0 specification is limited in that it can query only an XML (or XML-enabled) data source. There is no facility in XQuery 1.0 to carry out deletions, to insert new data, or (combining those actions) to modify data. In SQL Server, the XML Data Modification Language (DML) adds three keywords to the functionality available in XQuery 1.0. You can see these keywords in action in the following exercise:

delete
insert
replace value of

WARNING Note that although SQL itself is not case sensitive, the commands used to manipulate XML within the modify() method are. For example, if you use DELETE instead of delete, you will receive a cryptic error message.

TRY IT OUT: Deleting with XML DML

This Try It Out looks at how to delete part of an XML document using the modify() method in conjunction with the delete keyword. You’ll use a simple XML document stored in a local variable rather than a table and then target a specific part for deletion. The following code shows an example of how it can be used:

 DECLARE @myDoc xml;
SET @myDoc = '<Person><FirstName>Joe</FirstName>
 <LastName>Fawcett</LastName></Person>';
 
SELECT @myDoc;
SET @myDoc.modify(' delete /Person/*[2]'),
SELECT @myDoc;

XmlDataType.sql

To try this out in SSMS, follow these steps:

1. Open the SQL Server Management Studio.

2. Connect to the default instance.

3. From the toolbar, select New SQL Server Query, which appears on the far left.

4. Enter the preceding code.

5. Press F5 to run the SQL code. If you have typed in the code correctly, the original document should be displayed, with the modified document displayed below it. In the modified document, the LastName element has been removed.

6. Adjust the width of the columns to display the full XML.

How It Works

The first line of the code declares a variable, myDoc, and specifies the data type as xml. The SET statement specifies a value for the myDoc variable, shown in the following snippet. It’s a familiar Person element with FirstName and LastName child elements and corresponding text content.

SET @myDoc = '<Person><FirstName>Joe</FirstName>
 <LastName>Fawcett</LastName></Person>';

The SELECT statement following the SET statement causes the value of myDoc to be displayed. Next, the modify function is used to modify the value of the xml data type:

SET @myDoc.modify(' delete /Person/*[2]'),

The Data Modification Language statement inside the modify ( ) function is, like XQuery, case sensitive. The delete keyword is used to specify which part of the XML document is to be deleted. In this case, the XPath expression /Person/*[2] specifies that the second child element of the Person element is to be deleted, which is the LastName element.

The final SELECT statement shows the value of myDoc after the deletion has taken place. Figure 10-4 shows the results of both SELECT statements.

FIGURE 10-4

The following Try It Out again uses the modify() method but this time, instead of deleting unwanted XML, you insert a new element into the document.

TRY IT OUT: Inserting with XML DML

This Try It Out shows how to add data to existing XML. It uses the modify() method together with the insert keyword. Again you’ll see the operation performed on an xml data type represented by a local variable rather than that found in a table. The Transact-SQL code is shown here:

DECLARE @myDoc XML;
 SET @myDoc = '<Person><LastName>Fawcett</LastName></Person>';
SELECT @myDoc;
SET @myDoc.modify(' insert <FirstName>Joe</FirstName> as first into /Person[1]'),
SELECT @myDoc;

XmlDataType.sql

To run this code, follow these steps:

1. Open the SQL Server Management Studio.

2. Connect to the default instance.

3. From the toolbar, select New SQL Server Query which appears on the far left.

4. Enter the preceding code.

5. Press F5 to run the SQL code. If you have typed in the code correctly, the original document should be displayed, with the modified document displayed below it. The modified document has a new FirstName element.

6. Adjust the width of the columns to display the full XML.

How It Works

In the first line you declare a variable, myDoc, and specify that it has the data type xml. In the following code:

SET @myDoc = '<Person><LastName>Fawcett</LastName></Person>';

you set the value of the myDoc variable. You then specify a Person element that contains only a LastName element, which contains the text Fawcett.

The modify ( ) function is used to contain the XQuery extension that you want to use. The insert keyword specifies that the modification is an insert operation, that is, you are going to introduce new content into an existing document rather than create a complete document or replace some pre-existing XML. The XML to be inserted follows the insert keyword. Notice that it is not enclosed by apostrophes or quotes. The clause as first specifies that the inserted XML is to be inserted first. The into clause uses an XPath expression, /Person, to specify that the FirstName element and its content is to be added as a child element to the Person element. Given the as first clause, you know that the FirstName element is to be the first child of the Person element.

As alternatives to into, you could also use after or before. Whereas into adds children to a parent node, after and before add siblings. The preceding query could be rewritten as follows:

 DDECLARE @myDoc XML;
SET @myDoc = '<Person><LastName>Fawcett</LastName></Person>';
SELECT @myDoc;
SET @myDoc.modify(' insert <FirstName>Joe</FirstName> before
 (/Person/LastName)[1]'),
SELECT @myDoc;

XmlDataType.sql

When you run the Transact-SQL, the first SELECT statement causes the original XML to be displayed, and the second SELECT statement causes the XML to be displayed after the insert operation has completed.

The final example of the modify() function shows how you can update, or replace, a section of XML.

TRY IT OUT: Updating with XML DML

The final example using the Data Modification Language updates the content of an XML variable so that the value of the FirstName element is changed from Joe to Gillian. The code is shown here:

 DECLARE @myDoc XML;
SET @myDoc = 
'<Person><FirstName>Joe</FirstName><LastName>Fawcett</LastName></Person>'
SELECT @myDoc;
SET @myDoc.modify(' replace value of (/Person/FirstName/text())[1] with
 “Gillian” '),
SELECT @myDoc;

XmlDataType.sql

To run this code, follow these steps:

1. Open the SQL Server Management Studio.

2. Connect to the local instance or the server you want to run the query on.

3. From the toolbar, select New SQL Server Query which appears on the far left .

4. Enter the preceding code.

5. Press F5 to run the SQL code. If you have typed in the code correctly, the original document should be displayed, with the modified document displayed below it. The document now has Gillian instead of Joe for the FirstName element’s contents.

6. Adjust the width of the columns to display the full XML.

How It Works

Notice the modify function:

SET @myDoc.modify(' replace value of (/Person/FirstName/text())[1] with
 “Gillian” '),

The replace value of keyword indicates an update, and an XPath expression indicates which part of the XML the update is to be applied to. In this case it is the text node that is the child of the FirstName element—in other words, the value of the FirstName element—specified by the XPath expression /Person/FirstName/text().

The results of the two SELECT statements are shown in Figure 10-5.

FIGURE 10-5

One of the main problems with using the modify() method is that it expects a hard-coded string as its argument. It is therefore difficult to make dynamic queries that are needed in the real world—for example, queries in which the new XML is brought in from another table. You have two ways around this. First, you can construct the query as a string and execute it dynamically using EXEC. Alternatively, you can use the built-in functions sql:column and sql:function. An example of each of these techniques follows.

For these examples you can use the Docs table created earlier. First, here’s a reminder of what a static update looks like:

 UPDATE Docs
SET XmlDoc.modify
(' replace value of (/Person/FirstName/text())[1] with “Joseph”')
WHERE DocId = 1;

XmlDataType.sql

Now suppose you want to replace the hard-coded value Joseph with a variable. You might first try this:

 DECLARE @NewName NVARCHAR(100);
SET @NewName = N'Joseph';
UPDATE Docs
SET XmlDoc.modify(' replace value of (/Person/FirstName/text())[1] with “'
 + @NewName + '”')
WHERE DocId = 1;

XmlDataType.sql

Unfortunately, that won’t work. The modify() method complains that it needs a string literal. One way around this is to build the whole SQL statement dynamically:

 DECLARE @NewName NVARCHAR(100);
SET @NewName = N'Joseph';
DECLARE @SQL NVARCHAR(MAX);
SET @SQL = 'UPDATE Docs SET XmlDoc.modify('' replace value of (/Person/FirstName/text())
[1] with “' + @NewName + '”'')
 WHERE DocId = 1';
PRINT(@SQL);
EXEC(@SQL);

XmlDataType.sql

You can see the SQL before it is executed by running only as far as the PRINT statement, that is, not executing the last line, EXEC (@SQL); (the following is displayed on a single line):

UPDATE Docs SET XmlDoc.modify(' replace value of (/Person/FirstName/text())[1] with “Joseph”') 
 WHERE DocId = 1

This is exactly the same as the code you started with.

The recommended way to update based on data that will only be known at run-time, however, is to use the built-in functions sql:column or sql:variable. The sql:column function is used when the new data is being retrieved from a table, so here sql:variable is needed:

 DECLARE @NewName NVARCHAR(100);
SET @NewName = N'Joseph';
UPDATE Docs
SET XmlDoc.modify
(' replace value of (/Person/FirstName/text())[1] with sql:variable(“@NewName”)')
WHERE DocId = 1;

XmlDataType.sql

The basic syntax is the name of the variable enclosed in double quotes as an argument to sql:variable(). Next you will see how to use standard XQuery against the xml data type.

Using the query() Method

The query() method enables you to construct XQuery statements in SQL Server. The syntax follows the XQuery syntax discussed in Chapter 9, and all the queries in that chapter can be run against a suitable XML data column.

The following query uses the query() method to output the names of each person in a newly constructed Name element, with the value of the LastName element followed by a comma and then the value of the FirstName element. The code is shown here:

 SELECT XMLDoc.query
('for $p in /Person return
 <Name>{$p/LastName/text()}, {$p/FirstName/text()}</Name>')
FROM Docs;

XmlDataType.sql

The first line indicates that a selection is being made using the query() method applied to the XMLDoc column (which, of course, is of data type xml).

The for clause specifies that the variable $p is bound to the Person element node.

The return clause specifies that a Name element is to be constructed using an element constructor. The first part of the content of each Name element is created by evaluating the XQuery expression $p/LastName/text(), which, of course, is the text content of the LastName element. A literal comma is output, and then the XQuery expression $p/FirstName/text() is evaluated.

Figure 10-6 shows the output when the SELECT statement containing the XQuery query is run.

FIGURE 10-6

Using the value() Method

The value() method uses XPath to pinpoint specific data in an XML document and then converts it into a standard SQL Server data type. It’s often used in the WHERE part of a SQL query. Suppose you want to return all the people in your Docs table who have the first name of Joe. This is one way of doing it:

 SELECT * FROM Docs
WHERE XmlDoc.value('(/*/FirstName)[1]', 'nvarchar(100)') = 'Joe';

XmlDataType.sql

This returns just one row for the first document you added. Notice how the data type that you are converting to needs to be quoted; it’s quite a common mistake to forget this. Obviously, you can also use the value() method in the SELECT list as well. If you just wanted the last name of everyone, you’d use the following:

 SELECT DocId, XmlDoc.value('(/*/LastName)[1]', 'nvarchar(100)') LastName 
FROM Docs;

XmlDataType.sql

This returns a standard two-column result set.

Using the exist() Method

The exist() method does what its name suggests—it checks if a value exists. It returns a 0 if it doesn’t, a 1 if it does, and null if the XML column contains null.

So, you could rewrite the query to return people with a first name of Joe this way:

 SELECT * FROM Docs
WHERE XmlDoc.exist('/*/FirstName[. = “Joe”]') = 1;

XmlDataType.sql

This returns the same results as the query using the value() method to do the filter did previously.

Using the nodes() Method

The nodes() method is used to present an XML document as a regular SQL table. You often need this when your query needs one row of data from a table combined with a child element of your XML. For a simple example, look at the following code:

 DECLARE @People xml;
SET @People = 
'<people><person>Joe</person>
         <person>Danny</person>
         <person>Liam</person></people>'
SELECT FirstName.value('text()[1]', 'nvarchar(100)') FirstName FROM 
@People.nodes('/*/person') Person(FirstName);

XmlDataType.sql

The nodes() method takes an XPath that points to some repetitive child elements of the main document. You then provide a table name and a column name to use later in the form TableName(ColumnName). Here, the table is Person and the column is FirstName. The FirstName column is then queried using value() to get the text. The results are shown in Figure 10-7.

FIGURE 10-7

One thing that can affect how queries are written and processed is whether SQL Server knows the structure of the XML document held in a column. There needs to be some way to specify a schema alongside an XML data type. The following section explains how you tell SQL Server exactly what format the XML stored as an xml data type should take.

W3C XML Schema in SQL Server

It was mentioned earlier that the new xml data type is now a first-class data type in SQL Server. This data type can be used to store untyped and typed XML data, so it shouldn’t be surprising that, just as relational data is specified by a schema, the new xml data type can be associated with a W3C XML Schema document to specify its structure.

Take a look at how you can specify a schema for data of type xml. The first task is to create a schema collection together with its first XML Schema. You need to give the collection a name—in this example, EmployeesSchemaCollection—and the W3C XML Schema document itself needs to be delimited with single quote marks. For example, if you wanted to create a very simple schema for a document that could contain a Person element and child elements named FirstName and LastName, you could do so using the following syntax:

 CREATE XML SCHEMA COLLECTION EmployeesSchemaCollection AS 
'<xsd:schema xmlns:xsd=”http://www.w3.org/2001/XMLSchema” targetNamespace=
“http://wiley.com/namespaces/Person” xmlns=”http://wiley.com/namespaces/Person”>
 <xsd:element name=“Person“>
  <xsd:complexType>
   <xsd:sequence>
    <xsd:element name=“FirstName“ />
    <xsd:element name=“LastName“ />
   </xsd:sequence>
  </xsd:complexType>
 </xsd:element>
</xsd:schema>'

XmlDataType.sql

If you want to drop the XML Schema collection, you need to issue a DROP XMLSCHEMA statement:

DROP XML SCHEMA COLLECTION EmployeesSchemaCollection

Once you have a collection, you can add new schemas using the following syntax:

ALTER XML SCHEMA COLLECTION EmployeesSchemaCollection ADD
'<xsd:schema>
  <!--new schema inserted here -->
</xsd:schema>'

Untyped and typed XML data can be used in a SQL Server column, variable, or parameter. If you want to create a Docs table and associate it with a W3C XML Schema document, you can do so using code like the following:

CREATE TABLE [dbo].[Docs](
 [DocID] [int] IDENTITY(1,1) PRIMARY KEY,
 [XMLDoc] [xml] (EmployeesSchemaCollection))

The advantage of applying a schema collection is twofold. First, it acts as a validation check; XML not conforming to one of the schemas in the collection will be rejected in the same way that a column declared as an INT will not accept random textual data. Second, queries against the XML will return typed data as specified by the schema, rather than generic text.

For optimization, XML Schemas are shredded and stored internally in a proprietary format. Most of the schema can be reconstructed as an XML document from this proprietary format using the xml_schema_namespace intrinsic function. Therefore, if you had imported the schema into the EmployeesSchemaCollection shown in the XmlDataType.sql snippet, you could retrieve it using the following code:

SELECT xml_schema_namespace(N'dbo', N'EmployeesSchemaCollection')

Remember, too, that there can be multiple ways of writing a functionally equivalent W3C XML Schema document—for example, using references, named types, or anonymous types. SQL Server will not respect such differences when reconstituting a schema document.

In addition, the parts of the schema that are primarily documentation—for example, annotations and comments—are not stored in SQL Server’s proprietary format. Therefore, to ensure precise recovery of an original W3C XML Schema document, it is necessary to store the serialized XML Schema document separately. One option is to store it in a column of type xml or varchar(max) in a separate table.

Your final look at SQL Server concerns how to specify namespaces.

Dealing with Namespaced Documents

Often the documents you are working with will have namespaced elements or attributes, and you’ll need to specify a prefix to namespace URI binding in order to query them. You accomplish this by using the WITH XMLNAMESPACES statement.

The following example creates a document with a namespace and then queries it:

 DECLARE @NamespacedData xml;
SET @NamespacedData = '<x:data xmlns:x=”http://wrox.com/namespaces/examples”>
                       <x:item id=”1”>One</x:item>
                       <x:item id=”2”>Two</x:item>
                       <x:item id=”1”>Three</x:item>
                       </x:data>';
WITH XMLNAMESPACES ('http://wrox.com/namespaces/examples' as x)
SELECT @NamespacedData.value('(/x:data/x:item[@id = 2])[1]', 'nvarchar(10)') Item;

XmlDataType.sql

This returns the value of the element that has an id equal to 2; in this case the result is Two.

The key point here is that you specify the namespace URI and a suitable prefix. The prefix chosen doesn’t have to match the one in the document. One thing to note is that WITH XMLNAMESPACES must be preceded by a semicolon. If the previous statement doesn’t end in a semicolon, place it before the WITH:

;WITH XMLNAMESPACES ('http://wrox.com/namespaces/examples' as x)

You can also specify a default namespace if you need to:

 DECLARE @NamespacedData xml;
SET @NamespacedData = '<data xmlns=”http://wrox.com/namespaces/examples”>
                       <item id=”1”>One</item>
                       <item id=”2”>Two</item>
                       <item id=”1”>Three</item>
                       </data>';
WITH XMLNAMESPACES (DEFAULT 'http://wrox.com/namespaces/examples')
SELECT @NamespacedData.value('(/data/item[@id = 2])[1]', 'nvarchar(10)') Item;

XmlDataType.sql

This produces the same result as when you used an explicit prefix, x, bound to the namespace as shown in the XmlDataType.sql snippet.

So far you’ve seen two examples of how XML features and functionality have been added on to an existing relational database. In the next section you take the next step and examine an application designed from the ground up for the express purpose of storing and managing large numbers of XML documents.

USING EXIST WITH XML

The eXist XML database has been around since 2000, and has a solid reputation in its field. It is used as the basis for many systems, particularly those concerned with document and content management. Your first step is to download and install it.

Downloading and Installing eXist

Before doing anything with eXist, visit its website at http://exist-db.org/. From there, you’ll find links to download the latest version. The download is available for different platforms: a .jar file suitable for Unix/Linux and an .exe for Windows. The examples in this chapter use the Windows installation, version 1.4.1. You may need to make sure that the version of Java installed is recent enough for the version of eXist. For version 1.4.x, you’ll need Java 1.4 or higher.

NOTE If you are not sure which version of Java is installed on your computer, type java -version in a DOS or Unix terminal.

Once you have your download ready and have the right version of Java installed, you should be able to install eXist by clicking the .jar or .exe file on any properly configured workstation. If that’s not the case, open a command window and type the following:

java -jar eXist-<version>.jar

A fancy graphical installer will pop up and guide you through the installation, which is very straightforward.

WARNING When installing on Windows, you should install to somewhere other than the traditional Program Files. A good alternative is C:Exist if you are on Windows Vista or later because the install package was designed before the User Account Control (UAC) security measures were introduced. Alternatively, you can temporarily disable UAC before running the install.

WARNING At some stage during the install you’ll be prompted for a master password. Do not make the mistake of choosing one with an ampersand (&) in it. There’s a bug in the installer that causes the ampersand and anything after it to be truncated.

When that’s done, you have a ready-to-run native XML database that can be used in three different modes:

You can use eXist as a Java library to embed a database server in your own Java application.
You can run it as a standalone database server as you would run a SQL database server.
You can run it embedded in a web server and get the features of both a standalone database and a web interface to access the database.

After the installation, eXist can be used in the last two modes using a different set of scripts that you can find in its bin subdirectory:

server (.sh or .bat depending on your platform) is used to run eXist as a standalone database server.
startup (.sh or .bat) is used to start eXist embedded in a web server, and shutdown (.sh or .bat) is used to stop this web server. This is the mode that you will use for the exercises in this chapter because it is the one that includes most features.

To check that the installation is correct, launch startup.sh or startup.bat in a terminal. If you chose to install to the default directory on Windows (Program FilesExist), you’ll need to run the command prompt as administrator because standard users can’t write to this folder. You should see a series of warnings and information, concluding with (if everything is okay) the following lines with the date and time reflecting that of installation:

29 Sep 2011 14:03:57,021 [main] INFO  (JettyStart.java [run]:175)
 - eXist-db has started on port 8080. Configured contexts:
29 Sep 2011 14:03:57,022 [main] INFO  (JettyStart.java [run]:177)
 - http://localhost:8080/exist
29 Sep 2011 14:03:57,023 [main] INFO  (JettyStart.java [run]:179)
 - ----------------------------------------------------

These lines mean that jetty (the Java web server that comes with this eXist download) is ready to accept connections on port 8080.

NOTE By default, the web server listens to port 8080. This means that it will fail to start if another service is already bound to this port on your computer. If that’s the case, either stop this service before you start eXist or change eXist’s configuration to listen to another port. You can find instructions how to do so on eXist’s website at http://exist-db.org/exist/quickstart.xml#installation.

The last step to check that everything runs smoothly is to open your favorite web browser to http://localhost:8080/exist/ and confirm that eXist’s home page, shown in Figure 10-8, opens.

FIGURE 10-8

Interacting with eXist

Congratulations—you have your first native XML database up and running! Now it’s time to find out how you can interact with it. You will soon see that eXist is so open that you have many options.

Using the Web Interface

The first option is to use the web interface at http://localhost:8080/exist/. Scroll down this web page to the Administration section (on the left side). Click Admin to go to http://localhost:8080/exist/admin/admin.xql, where you need to log in as user admin with the password you chose during the installation process.

Once you’re logged in, you have access to the commands from the left-side menu. Feel free to explore by yourself how you can manage users and set up the example that eXist suggests you install.

When you are ready to continue this quick tour of eXist, click Browse Collection (see Figure 10-9).

FIGURE 10-9

NOTE XML documents are organized in collections; a collection is equivalent to a directory on a file system. They are really the same concept. You can think of an eXist database as a black box that packages the features you lack when you store XML documents on disk, while retaining the same paradigm of a hierarchical structure of collections, or directories.

A brand-new eXist installation has a number of existing collections, but you will create a new one named blog using the Create Collection button in Listing 10-1. Once this collection is created, follow the link to browse it. This new collection is empty. Using the Upload button, upload the documents blog-1.xml, blog-2.xml, and blog-3.xml, which you can download from the code samples for this chapter on the Wrox site. These documents are sample blog entries such as the one shown in Listing 10-1.

LISTING 10-1: Blog-1.xml

<?xml version=”1.0” encoding=”UTF-8”?>
<post xmlns:x=”http://www.w3.org/1999/xhtml” id=”1”
 author=”Joe Fawcett” dateCreated=”2011-09-18”>
  <title>A New Book</title>
  <body>
    <x:p>
      <x:b>I've been asked to co-author a new edition of <x:a
 href=”http://www.wrox.com/WileyCDA/WroxTitle/productCd-0470114878.html”>
Beginning XML</x:a>
        by Wrox</x:b>
    </x:p>
    <x:p>It's incredible how much has changed since the
 book was published nearly five years ago. XML
      is now a bedrock of many systems, contrarily you see less of it than
 previously as it's buried
      under more layers.</x:p>
    <x:p>There are also many places where it has stopped being an automatic
 choice for data transfer,
      JSON has become a popular replacement where the data
 is to be consumed directly by a
      JavaScript engine such as in a browser.</x:p>
    <x:p>The new edition should be finished towards the end of the year
 and be published early in
      2012.</x:p>
  </body>
</post>

After you have uploaded these documents, you can display them by clicking their links. Now that you have documents in the /db/blog collection, you can query these documents, still using the web interface. To do so, click the Home link to go back to the home page and follow the link to the XQuery sandbox, which you can reach at http://localhost:8080/exist/sandbox/sandbox.xql.

Recall what you learned of XPath in Chapter 7 and XQuery in Chapter 9: the large text area surrounded by a yellow border expects a query written in XPath or XQuery. If you start with something simple, such as /item[@id='1'], and click Send, you’ll get all the documents from all the collections that have an item root element with an id attribute equal to 1. If you’ve followed the instructions that led to this point, you should get only the content of the first blog entry.

Of course, you can write more complex queries. For example, if you want to determine the titles, IDs, and links of blog entries with a link on the Wrox site, you can write the following (Listing 10-2).

LISTING 10-2: PostsWithWroxLinks.xquery

xquery version “1.0”;
declare namespace x=”http://www.w3.org/1999/xhtml”;
 
for $item in /post
  where .//x:a[contains(@href, 'wrox.com')]
  return <match> 
     <id>{string($item/@id)}</id>
     {$item/title}
     {$item//x:a[contains(@href, 'wrox.com')]}
  </match>

Note that you need to bind the namespace URI to a prefix as XQuery, and eXist has full support for namespaces.

Feel free to try as many queries as you like, and then move on to discover the eXist client.

Using the eXist Client

The eXist client is a standalone graphical tool that can perform the same kind of operations as the web interface. To start the client, perform the following steps:

1. Click the client.sh or client.bat script, depending on your environment. You should see a login screen. Enter the password that you set up for the admin user. Before you click the OK button, note the URL entry field. By default, this field has the value xmldb: exist://localhost:8080/exist/xmlrpc. Details about the different components of this URL won’t be covered here, but note the localhost:8080 piece: it means that this client tool uses HTTP to connect to the eXist database and that you can administer eXist databases on other machines.

2. The next screen enables you to browse the collections or your database. Click the blog link in the Name column to find your three blog entries; and, if you click one of them, you get a window where you can edit the entry.

3. Back at the main window, click the button with binoculars to open the Query dialog. Here you can try your XPath and XQuery skills again. Paste the same query from Listing 10-2 into the upper panel and click the button that has a pair of binoculars and the word Submit. Note the Trace tab in the Results window at the bottom. There you find the execution path of your queries, which may contain useful information to debug or optimize them. Figure 10-10 shows the query in Listing 10-2 run in the eXist client.

FIGURE 10-10

There is much more to explore with this client. For example, you can also save and restore collections or full databases. Once you’re done exploring, read on to see how eXist can be used as a WebDAV server.

Using WebDAV

WebDAV stands for Web-based Distributed Authoring and Versioning. It designates a set of IETF RFCs that define how HTTP can be used to not only read resources, but also to write them. WebDAV is widely and natively implemented in most common operating systems and tools, and eXist’s capability to expose its collections as WebDAV repositories can greatly facilitate the way you import and export documents.

NOTE The IETF (Internet Engineering Task Force) is the standardization organization that publishes most of the protocol-oriented Internet specifications, including HTTP. Its specifications are called RFCs (Requests For Comments); and despite this name, they are de facto standards.

As a first contact with WebDAV, point your web browser to http://localhost:8080/exist/webdav/db/. You need to enter the login and password of your database admin again. Then you will see a page where you can browse the collections and content of your database. Without browser extensions, you have read-only access; you need to set up your WebDAV client to gain write access and see the eXist database as a repository.

The eXist documentation available on your local database at http://localhost:8080/exist/webdav.xml includes detailed instructions for setting up Microsoft Windows, KDE Konqueror, oXygen, and XML Spy to use WebDAV. WebDAV support is also built into the finder on Mac OS X. In Windows XP and later, this feature is known as web folders and is fairly easy to configure; just note that if you are using IE8 or later, you’ll need to open the File Explorer, choose Map Network Drive from the Tools menu, and use the Connect to a Website that You can Use to Store Your Documents and Pictures link. Because these setups are well described in the eXist documentation, they aren’t covered here.

The only feature that you lack using the WebDAV interface is the capability to execute queries, but you’ll see next that you can regain this feature if you use an XML IDE.

Using an XML IDE

Your favorite XML IDE can probably access your eXist database through WebDAV. If it is interfaced with eXist, you can also execute queries from the IDE itself. This is the case with oXygen 8.0 which is available as a 30-day evaluation license from its site at www.oxygenxml.com/.

To configure the connection to your eXist database, perform the following steps:

1. Select the database perspective using either its icon on the toolbar or Windows

Open Perspective

Database from the main menu. Then click the Configure Database Sources button situated at the upper-right corner of the Database Explorer window. This opens the database preferences window.

2. Create a new data source with type eXist and add the following files:

exist.jar
lib/core/xmlrpc-client-3.1.1.jar
lib/core/xmlrpc-common-3.1.1.jar
lib/core/xmlrpc-common-3.1.1.jar
lib/core/ws-commons-util-1.0.2.jar

(The version numbers may be slightly different, but there will be only one version of each.)

3. Save this data source and create a connection using it with the eXist connection parameters. Save this connection and the database preferences, and you’re all set.

The Database Explorer shows the newly created connection, and you can now browse and update the eXist database as you would browse and open documents on your local filesystem.

So far, all this could be done through WebDAV, but not the following, which interrogates the data. To execute a query, perform the following steps:

1. Create a new document through the File New icon or menu item.

2. Choose type XQuery for this document and type your query. When you’re done, click the Apply Transformation Scenario button on the toolbar or select this action though the Document

XML Document

Apply Transformation Scenario menu item. Because no scenario is attached to this document yet, eXist opens the Configure Transformation Scenario dialog. The default scenario uses the Saxon XQuery engine.

3. To use the eXist XQuery engine, create a new scenario and select your eXist database connection as the Transformer.

4. Save this scenario and click Transform Now to run the query. You should get the same results as previously.

5. Now that this scenario is attached to your query document, you can update the query and click the Apply Transformation Scenario button to run it without needing to go through this configuration again.

Thus far you’ve seen a number of different ways to interact with the XML data using a variety of graphical tools, but you still need to see how web applications can access your database. This is discussed in the next section.

Using the REST Interface

What better way to interface your database with a web application could there be than using HTTP as it was meant to be used? This is the purpose of the REST interface.

NOTE REST stands for Representational State Transfer and is seen, in many cases, as a simpler and more efficient alternative to using SOAP based web services. It uses the intrinsic HTTP commands to create, update and retrieve data from a remote web service. REST and SOAP are covered in depth in Chapters 14 and 15.

As a first step, you can point your browser to http://localhost:8080/exist/rest/. Doing so shows you the content of your database root exposed as an XML document. This XML format is less user-friendly than browsing the content of your collections through the admin web interface or even through browsing the WebDAV repository, but far more easy to process in an application!

The full content of the database is available through this interface. For instance, http://localhost:8080/exist/rest/db/blog/ shows the content of the blog collection, and http://localhost:080/exist/rest/db/blog/blogItem1.xml gets you the first blog item. This becomes more interesting when you start playing with query strings. The REST interface accepts a number of parameters, including a _query parameter that you can use to send XPath or XQuery simple queries straight away!

For instance, if you want to get all the titles from all the documents in the collection /db/blog, you can query http://localhost:8080/exist/rest/db/blog/?_query=//title. The results are shown in Figure 10-11.

FIGURE 10-11

This XML deserves an XSLT transformation to be presented as HTML; and if you remember what you learned in Chapter 8, a simple transformation such as the one shown in Listing 10-3 would display the results better than the raw XML shown in Figure 10-11.

LISTING 10-3: Links.xslt

<xsl:stylesheet xmlns:xsl=”http://www.w3.org/1999/XSL/Transform”
    xmlns:exist=”http://exist.sourceforge.net/NS/exist” version=”1.0”>
    <xsl:template match=»/exist:result»>
        <html>
            <head>
                <title>Query results</title>
            </head>
            <body>
                <h1>eXist query results</h1>
                <p>Showing results 
<xsl:value-of select="@exist:start"/> to 
<xsl:value-of
 select="@exist:end"/> out of <xsl:value-of select="@exist:hits"/>:</p>
                <xsl:apply-templates/>
            </body>
        </html>
    </xsl:template>
    <xsl:template match="*">
        <p>
            <xsl:copy-of select="."/>
        </p>
    </xsl:template>
</xsl:stylesheet>

The good news is that the eXist REST interface can execute this transformation for you if you like. But before you can do that, you need to store the transformation in the database. To do so, you can use any of the methods you have seen so far to upload documents in the database (the web interface, the eXist client, WebDAV, or your favorite XML IDE). Because this section is about the REST interface, you can use REST to upload the document.

Storing documents with the REST interface uses an HTTP PUT request; unfortunately, you can’t do that with your web browser. To send an HTTP PUT request, you need to either do a bit of programming (all the programming languages have libraries available to support this) or use a utility such as curl (http://curl.haxx.se/), which is available for most platforms. This program has a lot of different command-line options. If you have curl installed on your machine, to store the document Links.xslt at location http://localhost:8080/exist/rest/db/xslt/, just type the following command in a Unix or Windows command window:

curl -T links.xslt http://localhost:8080/exist/rest/db/xslt/

This command simply sends this document through an HTTP PUT. The eXist REST interface also supports HTTP DELETE requests, and you can also delete this document. To do so, use the -X option, which enables you to define the HTTP method that you want to use and write:

curl -X DELETE localhost:8080/exist/rest/db/xslt/links.xslt

Of course, if you have run the previous command, you need to upload the transformation again before you can use it! Now that your style sheet is stored in the database, to use it just add an _xsl parameter, specifying its location. Then paste or type this URL in your browser: http://localhost:8080/exist/rest/db/blog/?_query=//title&_xsl=/db/xslt/links.xslt. The result is shown in Figure 10-12.

FIGURE 10-12

You have now seen how to use HTTP GET, PUT, and DELETE methods. If you are familiar with HTTP, you may be wondering whether the REST interface supports the HTTP POST method. The answer is yes; this method is used to send requests that are too big to be easily pasted in the query string of an HTTP GET request. These queries have to be wrapped into an XML document, the structure of which is defined in the eXist documentation. For instance, the query encountered in Listing 10-3 would need to be in the format shown in Listing 10-4.

LISTING 10-4: PostsWithWroxLinks.xml

<?xml version=”1.0” encoding=”UTF-8”?>
<query xmlns=”http://exist.sourceforge.net/NS/exist”>
  <text>
    <![CDATA[
  declare namespace x=”http://www.w3.org/1999/xhtml”;
 
for $item in /post
  where .//x:a[contains(@href, 'wrox.com')]
  return <match> 
     <id>{string($item/@id)}</id>
     {$item/title}
     {$item//x:a[contains(@href, 'wrox.com')]}
  </match>
  ]]>
  </text>
</query>

Note how the query itself has been cautiously embedded within a CDATA section so that it qualifies as well-formed XML. To send this query using the REST interface, you can use curl and a -d option. The command looks like the following:

curl -d @linksToWrox.xml http://localhost:8080/exist/rest/db/

Other Interfaces

You’ve already seen four ways to interact with eXist, but there are many more out there. The following list briefly covers a few.

XML:DB API: The XML:DB API is a common API defined by a number of XML database editors. Its original purpose was to define a vendor-neutral API to play the same role with XML databases that JDBC plays with SQL databases. Unfortunately, the project failed to attract commercial vendors and seems to have lost all its momentum. The XML:DB is still the API of choice to access your eXist database if you are developing in Java.
XML-RPC: XML-RPC interface has the same functionality as the REST interface together with some added features—for example, you can update an XML fragment without uploading whole documents and administer your database entirely within this interface.
SOAP: A SOAP interface is also available with the same features of the XML-RPC interface for those of you who prefer SOAP over XML-RPC.
Atom Publishing Protocol (APP): An APP interface has been recently developed so that you can see your collections as Atom feeds.

Choosing an Interface

With so many options, how do you decide which one you should be using? Ask yourself whether it really matters. You can think of your eXist database as a black box that encapsulates your XML documents. These documents are located in collections that are similar to file directories. The black box acts like a filesystem with XQuery capabilities and provides a number of different interfaces to access the same set of documents in different ways. Whichever interface is used, the effect is the same. You can choose, case by case, the interface that is most convenient for the task you have to do. The following is a list of tips to help you decide:

If you need a filesystem-like type of access to your documents, WebDAV is a sure choice.
If all you have is a browser, the web interface is what you need.
If your XML IDE supports eXist, that makes your life easier. If you’re using a tool that is a good web citizen and can use the different HTTP, you can plug the REST interface directly.
If you’re developing in Java, have a look at the XML:DB API.
If you want to integrate your database with Atom tools, the APP interface is designed for you.
If you’re a web services fan, you will choose either the XML-RPC or the SOAP interface.

The richness of this set of interfaces means that your documents will never be locked in the database and can remain accessible in any environment.

SUMMARY

In this chapter you’ve learned about the following:

Much of today’s data comes in the form of tabular data and XML combined.
You need a dedicated storage type for XML rather than just use a text field. You also need methods to extract specific values and fragments of XML as well as methods to create new XML formats combining the relational and XML data. You will probably want the facility to update XML documents although this is not always a necessity.
A relational database handles both tabular data and XML documents, whereas a native XML database is designed to cope solely with XML documents.
High-end systems such as Oracle and SQL Server XML have their own data type and there are suitable methods available on these types for retrieval and manipulation of the XML.
The features available in a native XML database include the ability to store large document collections as well as the ability to efficiently query across these documents.

EXERCISES

You can find suggested answers to these questions in Appendix A.

1. List the main reasons to choose a relational database with XML features over a native XML database.

2. What five methods are available against an XML data type? (No peeking!)

3. MySQL has only two XML-related functions. If you could ask for one more feature or function, what would it be?

WHAT YOU LEARNED IN THIS CHAPTER

TOPIC	KEY POINTS
Storage needs	There is a big difference between relational data, data in a tabular format, and XML data. Therefore, special mechanisms are needed to store XML within relational systems.
Essential features in databases.	XML needs to be stored in a native format, rather than as text. There must also be ways to query it for specific values and a way to return fragments of XML. Ideally there should also be a way to treat XML as tabular data if possible.
Choosing an application	Most commercial relational databases have fairly advanced XML features, particularly Oracle and SQL Server. Native XML databases are designed to cope with the situation in which all data is held as XML.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.

Table of Contents for Chapter 10: XML and Databases

Create new playlist

Sign In

Sign Up

Chapter 10

XML and Databases

UNDERSTANDING WHY DATABASES NEED TO HANDLE XML

ANALYZING WHICH XML FEATURES ARE NEEDED IN A DATABASE

Retrieving Documents

Retrieving Data from Documents

Updating XML Documents

Displaying Relational Data as XML

Presenting XML as Relational Data

USING MYSQL WITH XML

Installing MySQL

Adding Information in MySQL

Querying MySQL

Updating XML in MySQL

Usability of XML in MySQL

Client-Side XML Support

USING SQL SERVER WITH XML

Installing SQL Server

Presenting Relational Data as XML

Using FOR XML RAW

Using FOR XML AUTO

Using FOR XML EXPLICIT

Using FOR XML PATH

Understanding the xml Data Type

Creating Indexes with the xml Data Type

Using the modify() Method

Using the query() Method

Using the value() Method

Using the exist() Method

Using the nodes() Method

W3C XML Schema in SQL Server

Dealing with Namespaced Documents

USING EXIST WITH XML

Downloading and Installing eXist

Interacting with eXist

Using the Web Interface

Using the eXist Client

Using WebDAV

Using an XML IDE

Using the REST Interface

Other Interfaces

Choosing an Interface

SUMMARY

Table of Contents for
Chapter 10: XML and Databases