WHAT YOU WILL LEARN IN THIS CHAPTER:
Not very along ago, you had two main options when deciding where to store your data. You could go for the traditional solution, a relational database such as Oracle, Microsoft’s SQL Server, or the ever-popular open source MySQL. Alternatively, you could choose to use XML. A relational database has the advantage of efficiently storing data that can be expressed in a tabular form, although performance can be a problem if you need to join many tables together. XML has the advantage of coping with nested data or documents that can’t be easily broken down further. After a while, it became apparent that a hybrid of the two was needed: a system that could store tabular data alongside XML documents, giving the user the ability to query and modify the XML as well as perform standard operations against the relational data. This would create an all-purpose storage center giving the best of both worlds.
Relational databases grew from the work of Edgar Codd in the 1970s. He was the first to provide a solid mathematical foundation for the main concepts found in these systems, such as tables (which he called relations), primary keys, normalization (where efforts are made to reduce any duplication of data), and relationships, such as one-to-many and many-to-many. Nowadays, hundreds of relational database management systems are available. They range from top-end databases such as Oracle, SQL Server, and DB2, to ones designed for desktop use, such as Microsoft’s Access.
These systems have widely different feature sets, but they generally have two things in common. First, they use a special language called Structured Query Language (SQL) to query the data that resides in them. Second, they cope well with data that can be broken down and stored in a tabular form, where items are typically represented as rows in a table. The different properties of these items are represented by the different fields, or columns, of these rows.
If you are trying to represent data for your customers and their orders, a relational system is fine. You probably have one table for basic customer details, one for the order summary, and another table for the order items. You can combine these tables to extract and report on the state of your business; the orders table has a reference to the customer who made the order and the order details table has a reference to the order number.
However, in many situations you need to store data that doesn’t fit easily into this pattern. If you expand the customer and order scenario further, how would you store information regarding an actual order once it has been dispatched? It is not enough to keep a record of the actual items and their quantity because the price will almost certainly change in the future. If you wanted to go with this model, you’d need to keep track of historic prices, tax rates, and discounts. One solution is to store a copy of the order document as a binary file; this is a little inflexible because it can’t be queried if necessary. A better alternative is to store the order details as an XML document detailing the actual items and their prices, customer, and shipping information and any discounts that were applied. This XML document can then be transformed to provide a confirmation e-mail, a delivery note, or the order itself. Because it is XML, it can also be queried easily using the techniques shown in this book and precludes the need to keep historical data. This makes the database schema and any associated queries much simpler.
An alternative to storing these documents within a database is to just keep them as documents in a filesystem. This is quite easy to do, but leads to a number of problems:
Having a database that can both store XML documents and query them solves these problems and is much easier to use.
You can take two approaches if you want to store your documents in a database. You can use a traditional relational database such as SQL Server, which comes with the capability to store, query, and modify XML documents. Alternatively, you can choose a native XML database, one designed from the ground up to store XML documents efficiently; these usually have limited relational capabilities. Your decision should therefore be based on what data it is you want to store. If your data consists of a large number of XML documents that need to be accessed or queried then a native XML database is probably your best option. This might be the case if your software is a content management system where different snippets of data can be merged to form complete texts. If, however, you have a situation where you store details of customers but need to attach some XML documents, a traditional relational database with XML capabilities is a better choice.
Now that you understand why XML documents and databases are a good mix, you will next learn which XML features you usually need in a database to accomplish common tasks.
Whether you’ve chosen a relational or native XML database, you’ll need a number of features to address the common tasks associated with XML documents. Not every application will necessarily need all features, and some will definitely be used less frequently than others. This section deals with each task separately, and gives an indication of how important each one is likely to be. This will help you choose between the many systems available.
This is a must-have requirement; there’s little use in storing documents that can’t be accessed later. However, you have a couple of points to consider:
It’s quite likely that you’ll want to retrieve sections of a document rather than the entire thing. For example, you may want to see the shipping address or all the line items with their respective discounts. You may also need the information in a slightly different format. This is where XQuery, covered in Chapter 9, becomes invaluable. Nearly all XML-enabled databases that expose this sort of functionality have settled on XQuery as the standard way of querying and transforming XML data (if necessary) before returning it to the user.
However, there is a huge variation in how much of the standard is supported. The winners here are the native XML databases. These databases often implement the full capabilities of XQuery, whereas some of the relational databases that have had XML features added on support only a limited subset. If this sort of operation is likely to be used heavily in your application, you’ll want to check carefully just how much of XQuery is implemented before you decide which application to use.
Although it seems like updating XML documents should be standard, this feature is more likely to be needed in a native XML database rather than a relational one. This is because in a relational database, XML documents are often used more as snapshots of the data at a point in time and therefore it doesn’t make sense to change them. In a native application all your data is in the form of XML, so you’ll probably need to modify it at some time. Again, the facilities for such modifications vary widely between databases; one reason for this is that the standard syntax for updating an XML document was agreed on much later than the one for retrieval.
Displaying relational data as XML applies only to relational databases. Many relational databases have the capability to present the results of a standard SQL query as XML instead of the usual tabular format. Some have just a basic capability that wraps each row and each column in an element. With others, you can specify the format more precisely, perhaps by introducing nesting (where appropriate). Some systems also allow you to insert an XSL transformation into the pipeline. Although this is something that can be achieved easily enough once the data has been returned, it’s often easier and more efficient to incorporate the transformation as part of the whole data retrieval process and let the client concentrate on the presentational aspect.
Again, presenting XML as relational data really applies only to relational databases. Sometimes you need to combine a data from a relational source and an XML source—perhaps your XML refers to a customer’s ID and you want to include some data from the customer table in your query. One way to do this is to present the XML document as a regular table and join it to the customer table. This works if the XML data is fairly regular and doesn’t use a hierarchical structure.
A companion problem is called shredding, which means taking all or parts of an XML document and storing them in a regular database table. This is common when you receive the data from a third party and decide that the benefits of XML storage are outweighed by the need to integrate with an existing database structure. It’s quite common in this situation to place the data you require into a table and retain the XML document as an extra column on that table. This way you can still use the XML if needed, but have the advantage of being able to join on other tables easily with the data you’ve extracted.
Now that you’ve seen some of the features that are available, the following section looks at three applications that implement some or all of these features, beginning with the XML functionality in MySQL.
The driving factors to add XML support within open source Relational Database Management Systems (RDBMS) are the same as those of their commercial competitors, except that open-source RDBMSs appear to lag well behind in this area of functionality. The reasons are diverse. Open-source projects are often less prone than commercial companies to “get big fast” and absorb all their surroundings. They are also more used to collaborating with other projects. In addition, it is possible that they are less influenced to incorporate what many regard as just another trendy feature, XML in this case. In addition, open-source projects usually have fewer financial resources for their development.
MySQL is one of the leading open source databases, although the management of its development was taken over by Oracle a short time ago. It is used heavily in conjunction with both PHP- and Java-based websites. It has a few XML features, although in comparison to the other two commercial products examined in this chapter, it’s sadly lacking. In fact, MySQL is the only database discussed in this chapter that hasn’t added anything to its XML features since the last edition of this book in 2007.
You can download MySQL from www.my.mysql.com/downloads/. Follow the links to Community Server and choose version 5.5 (or later if available). Stable versions of MySQL are available for both Windows and Unix/Linux platforms.
The download page includes a sources download and a number of binary downloads for the most common platforms, including Windows, many Linux distributions, and Mac OS X. Choose the option that is the best match for your platform and follow the instructions.
For this chapter, you need to install the server and the client programs. If you are a Debian or Ubuntu user, select “Linux x86 generic RPM (dynamically linked) downloads,” convert the .rpm packages into .deb packages using the Debian alien package and alien command, and install it like any other package using the Debian dpkg -i command.
You can also install the front-end tools, which give you a graphical user interface. However, for these examples you can just use the command shell, which is reminiscent of a Windows command prompt.
If you are installing a MySQL database for anything other than test purposes, it is recommended that you set up proper users and passwords to protect your database. For the tests covered in this chapter, you can just choose a password for the root/admin account. You’ll be prompted to do that during the install, and you will access the system using that account.
You can use a GUI tool to interact with MySQL, but if you really want to understand what’s going on behind the scenes, the mysql command-line utility is your best friend. Open a Unix or Windows command window and navigate to the bin folder of the installation, for example: C:Program FilesMySQLMySQl Server 5.5in. Then type mysql -u root –p. If everything is working correctly, you should see the mysql prompt asking for the password you chose during installation:
C:Program FilesMySQLMySQl Server 5.5in>mysql –u root -p Enter password: ********** Welcome to the MySQL monitor. Commands end with ; or g. Your MySQL connection id is 4 Server version: 5.5.15 MySQL Community Server (GPL) Copyright (c) 2000, 2010, Oracle and/or its affiliates. All rights reserved. Oracle is a registered trademark of Oracle Corporation and/or its affiliates. Other names may be trademarks of their respective owners. Type 'help;' or 'h' for help. Type 'c' to clear the current input statement. mysql>
In the following Try It Out, you will create a new database and add information.
mysql> create database Blog DEFAULT CHARACTER SET 'utf8'; Query OK, 1 row affected (0.00 sec) mysql>
mysql> use Blog; Database changed mysql>
mysql> create table BlogPost ( -> PostId INT PRIMARY KEY, -> Post LONGTEXT -> ); Query OK, 0 rows affected (0.27 sec) mysql>
mysql> INSERT BlogPost (PostId, Post) SELECT 1, '<post xmlns:x=”http://www.w3.org/1999/xhtml” id=”1” author=”Joe Fawcett” dateCreated=”2011-09-18”> <title>A New Book</title> <body> <x:p> <x:b>I've been asked to co-author a new edition of <x:a href=”http://www.wrox.com/WileyCDA/WroxTitle/productCd-470114878.html”> Beginning XML</x:a> by Wrox</x:b> </x:p> <x:p>It's incredible how much has changed since the book was published nearly five years ago. XML is now a bedrock of many systems, contrarily you see less of it than previously as it's buried under more layers.</x:p> <x:p>There are also many places where it has stopped being an automatic choice for data transfer, JSON has become a popular replacement where the data is to be consumed directly by a JavaScript engine such as in a browser.</x:p> <x:p>The new edition should be finished towards the end of the year and be published early in 2012.</x:p> </body> </post>'; Query OK, 1 row affected (0.34 sec) mysql>
CreateAndLoadDatabse.sql
source C:mySQLCreateAndLoadDatabase.sql
Now that you have your first two blog entries, what can you do with them? Because MySQL is a SQL database, so you can use all the power of SQL to query the content of your database. To show all the entries, just type the following:
SELECT * FROM BlogPost;
The result is too verbose to print in a book, but if you want something more concise, you can select only the first characters of each entry:
mysql> SELECT PostId, substring(Post, 1, 60) FROM BlogPost; +--------+----------------------------------------------------------------------+ | PostId | substring(Post, 1, 70) | +--------+----------------------------------------------------------------------+ | 1 | <post xmlns:x=”http://www.w3.org/1999/xhtml” id=”1” author=”Joe Fawc | | 2 | <post xmlns:x=”http://www.w3.org/1999/xhtml” id=”2” author=”Joe Fawc | | 3 | <post xmlns:x=”http://www.w3.org/1999/xhtml” id=”3” author=”Joe Fawc | +--------+----------------------------------------------------------------------+ 3 rows in set (0.02 sec) mysql>
Or, if you just want the number of entries:
mysql> SELECT COUNT(*) FROM BlogPost; +----------+ | COUNT(*) | +----------+ | 3 | +----------+ 1 row in set (0.13 sec) mysql>
This is pure SQL, however, and could be done with any SQL database without XML support. But what if you wanted, for instance, to display the content of the title element?
The XML support in MySQL 5.5 comes from two XML functions documented at http://dev.mysql.com/doc/refman/5.5/en/xml-functions.html. These are ExtractValue and UpdateXML. The following Try It Out shows you how to use ExtractValue to query data.
mysql> SELECT PostId, ExtractValue(Post, '/post/title') Title FROM BlogPost; +--------+--------------------------+ | PostId | Title | +--------+--------------------------+ | 1 | A New Book | | 2 | Go, Neutrino, Go! | | 3 | Size of the Solar System | +--------+--------------------------+ 3 rows in set (0.01 sec) mysql>
mysql> SELECT PostId FROM BlogPost -> WHERE ExtractValue(Post, '/post/title') = -> 'A New Book'; +--------+ | PostId | +--------+ | 1 | +--------+ 1 row in set (0.00 sec) mysql>
If you are familiar with XPath, the behavior of ExtractValue is often somewhat counterintuitive. For instance, if you try to apply the same technique to fetch the <body> of your blog entries, you’ll get the following:
mysql> SELECT PostId, ExtractValue(Post, 'post/body') Body -> FROM BlogPost; +--------+-----------------------------------------------------------+ | PostId | Body | +--------+-----------------------------------------------------------+ | 1 | | | 2 | | | 3 | | +--------+-----------------------------------------------------------+ 3 rows in set (0.04 sec) mysql>
If you are used to the XPath behavior that translates elements into strings by concatenating the text nodes from all their descendants, you might assume that ExtractValue would do the same, but that’s not the case: ExtractValue only concatenates the text nodes directly embedded in elements. In this case, the only text nodes that are direct children from description elements are whitespaces, which explains the preceding output.
To get the default XPath behavior, you need to explicitly state that you want the text nodes at any level like so:
mysql> SELECT PostId, ExtractValue(Post, 'post/body//text()') Body -> FROM BlogPost; +| PostId | Body +--------+---------------------------------------------------------------------- | 1 | I've been asked to co-author a new edition of Beginning XML by Wrox It's incredible how much has changed since the book was published nearly five years ago. XML is now a bedrock of many systems, contrarily you see less of it than previously as it's buried.... 3 rows in set (0.00 sec) mysql>mysql>
Note that this listing has been edited for conciseness.
How would you select entries that contain images? In XPath, you use //img directly in a test, and this would be considered true if and only if there were at least one <img> element somewhere in the document. If you’re familiar with XPath, you might be tempted to write something like this:
mysql> SELECT PostId, ExtractValue(Post, '/post/title') Title -> FROM BlogPost -> WHERE ExtractValue(Post, '//x:img') != ''; Empty set (0.00 sec) mysql>
This doesn’t work, however, because <img> elements are empty: they don’t have any child text nodes, and ExtractValue converts them into empty strings. To make that query work, you need to select a node that will have a value (such as //x:img/@src) or count the number of <img> elements and test that the result is greater than zero. This method is shown in the following code snippet:
mysql> SELECT PostId, ExtractValue(Post, '/post/title') Title -> FROM BlogPost -> WHERE ExtractValue(Post, '//x:img/@src') != ''; +--------+--------------------------+ | PostId | Title | +--------+--------------------------+ | 3 | Size of the Solar System | +--------+--------------------------+ 1 row in set (0.02 sec) mysql> SELECT PostId, ExtractValue(Post, '/post/title') Title -> FROM BlogPost -> WHERE ExtractValue(Post, 'count(//x:img)') > 0; +--------+--------------------------+ | PostId | Title | +--------+--------------------------+ | 3 | Size of the Solar System | +--------+--------------------------+ 1 row in set (0.04 sec) mysql>
You’ll hit another limitation pretty soon if you use this function. Most of the string functions of XPath are not implemented. For instance, if you want to find entries with links to URIs from www.wrox.com, you might be tempted to write something such as the following:
mysql> SELECT PostId, ExtractValue(Post, '/post/title') Title -> FROM BlogPost -> WHERE ExtractValue -> (Post, 'count(//x:a[starts-with(@href, “http://www.wrox.com”)])') > 0;
Unfortunately, the starts-with function is not implemented, so you’ll get an error message—and not an informative one at that. It will just state that there’s a syntax error; you need to use SQL to do what you can’t do with XPath:
mysql> SELECT PostId, ExtractValue(Post, '/post/title') Title -> FROM BlogPost -> WHERE ExtractValue(Post, '//x:a/@href') -> LIKE 'http://www.wrox.com/%'; +--------+------------+ | PostId | Title | +--------+------------+ | 1 | A New Book | +--------+------------+ 1 row in set (0.06 sec) mysql>
This ensures that any href attribute begins with the Wrox domain.
Now that you’ve seen how to make the most of MySQL’s somewhat limited select functionality, it’s time to try updating an XML document.
The second XML function introduced by MySQL 5.5 is called UpdateXML. Like any SQL function, UpdateXML doesn’t perform database updates, but it is handy when you use it in update statements.
UpdateXML takes three parameters:
UpdateXML takes the XML document, looks for the content pointed to by the XPath expression passed as the second parameter and replaces it with the XML fragment passed as the third parameter. It then returns the new XML formed by the function as a string.
To change the title of the second blog entry, for example, use the following:
mysql> UPDATE BlogPost -> SET Post = UpdateXml(Post, '/post/title', -> '<title>Faster Than Light?</title>') -> WHERE PostId = 2; Query OK, 1 row affected (0.13 sec) Rows matched: 1 Changed: 1 Warnings: 0 mysql> SELECT PostId, ExtractValue(Post, '/post/title') Title -> FROM BlogPost; +--------+--------------------------+ | PostId | Title | +--------+--------------------------+ | 1 | A New Book | | 2 | Faster Than Light? | | 3 | Size of the Solar System | +--------+--------------------------+ 3 rows in set (0.00 sec) mysql>
This function is obviously handy in this situation, but note that the XPath expression must point to an element. This means that the granularity of updates is at element level, so if you want to update an attribute value, you are out of luck.
After this introduction to the XML features of MySQL 5.5, you may be wondering, how usable these features are in real-world applications? To answer this question, first note that support of XML in MySQL 5.5 is limited to the two string functions already shown. In other words, there’s no such thing as an XML column type. Your documents are stored as text and need to be parsed each time you use one of these functions.
Consider one of the queries that you have seen:
SELECT PostId FROM BlogPost WHERE ExtractValue(Post, '/post/title') = 'A New Book';
To process this query, the database engine needs to read the full content of all the blog entries, parse this content, and apply the XPath expression that extracts the title. That’s fine with your couple of blog entries, but likely not something you want to do if you are designing a WordPress clone able to store millions of blog entries.
To optimize the design of the sample database that you created, you would extract the information that is most commonly used in user queries and move it into table columns. In the Blog example created earlier, obvious candidates would be the title, the author, and the publication date. Having this data available as columns enables direct access for the engine. If you need further optimization, you can use these columns to create indexes.
The other consideration to keep in mind is the mismatch between the current implementation and the XPath usages. You saw an example of that when you had to explicitly specify that you wanted to concatenate text nodes from all the descendants. If you use these functions, you will see more examples where behavior differs from the generally accepted XML standards. This mismatch may be reduced in future releases, and is something to watch carefully because it could lead to incompatible changes.
With these restrictions in mind, if you are both a MySQL and an XML user, you will find these first XML features most welcome, and there is no reason to ignore them. They don’t turn MySQL into a native XML database yet, but they are a step in the right direction!
The features that you have seen so far are all server-side features implemented by the database engine. You don’t need anything to support XML on the client side, and it is very easy to use any programming language to convert SQL query results into XML. However, you might find it disappointing to leave this chapter without at least a peek at an XML feature that can be handy when you use the mysql command-line utility. To see this feature in action, start a new session but add the --xml option:
C:Program FilesMySQLMySQL Server 5.5in>mysql -u root -p --xml Enter password: ********** Welcome to the MySQL monitor. Commands end with ; or g. Your MySQL connection id is 15 Server version: 5.5.15 MySQL Community Server (GPL) Copyright (c) 2000, 2010, Oracle and/or its affiliates. All rights reserved. Oracle is a registered trademark of Oracle Corporation and/or its affiliates. Other names may be trademarks of their respective owners. Type 'help;' or 'h' for help. Type 'c' to clear the current input statement. mysql>
The --xml option has switched on the XML mode, and the query results will now be output as XML:
mysql> USE Blog Database changed mysql> SELECT PostId, ExtractValue(Post, '/post/title') Title -> FROM BlogPost; <?xml version=”1.0”?> <resultset statement=”SELECT PostId, ExtractValue(Post, '/post/title') Title FROM BlogPost” xmlns:xsi=”http://www.w3.org/2001/XMLSchema-instance”> <row> <field name=”PostId”>1</field> <field name=”Title”>A New Book</field> </row> <row> <field name=”PostId”>2</field> <field name=”Title”>Faster Than Light?</field> </row> <row> <field name=”PostId”>3</field> <field name=”Title”>Size of the Solar System</field> </row> </resultset> 3 rows in set (0.00 sec) mysql>
Although that’s not very readable as is, it’s a useful feature when you use mysql in shell or DOS scripts. When you get your results as XML documents, you can run XML tools such as XSLT transformations. If you need a truly simple way to turn out a query result in XHTML, this is definitely something that you’ll find useful.
Now that you’ve seen an open source implementation, it’s time to move on to a commercial product, SQL Server.
Microsoft’s SQL Server has had XML functionality since version 2000. Version 2005 added a lot more, but since then there haven’t been many changes. The version you use in this section is 2008 R2, but the examples work with any version from 2005 upwards unless otherwise specified.
For these examples you’ll use the free Express edition of SQL Server. You can download it from www.microsoft.com/sqlserver/en/us/editions/express.aspx. You’ll need to choose the appropriate option depending on whether you need the 32 or 64 bit version. Make sure you select the install that comes with the developer tools so that you can use SQL Server Management Studio to run the examples. You will also need to download a sample database to work with. The AdventureWorks OLTP database is available at http://msftdbprodsamples.codeplex.com/releases/view/55926 and represents a fictitious bicycle manufacturing company. When downloaded into SQL Server Management Studio, the database is referred to as AdventureWorks.
The first piece of functionality discussed is how to present standard relational data as XML.
Transforming tabular data to an XML format is a rather common requirement. SQL Server offers a number of options to achieve this, and they all involve appending the phrase FOR XML <mode> to the end of a regular SELECT query. You can use four different modes; each one enables you to tailor the results to a lesser or greater degree. The most basic mode of FOR XML is RAW.
The simplest mode you can use is RAW. Suppose you have the following query, which selects the basic details of orders that have a value greater than $300,000:
SELECT [PurchaseOrderID] ,[RevisionNumber] ,[Status] ,[EmployeeID] ,[VendorID] ,[ShipMethodID] ,[OrderDate] ,[ShipDate] ,[SubTotal] ,[TaxAmt] ,[Freight] ,[TotalDue] ,[ModifiedDate] FROM [Purchasing].[PurchaseOrderHeader] WHERE [TotalDue] > 300000;
ForXmlQueries.sql
The example queries in this section are available in the code download for the chapter in a file named ForXmlQueries.sql. The results, three rows, are shown in Figure 10-1.
To return XML, add FOR XML RAW to the query:
SELECT [PurchaseOrderID]
,[RevisionNumber]
,[Status]
,[EmployeeID]
,[VendorID]
,[ShipMethodID]
,[OrderDate]
,[ShipDate]
,[SubTotal]
,[TaxAmt]
,[Freight]
,[TotalDue]
,[ModifiedDate]
FROM [Purchasing].[PurchaseOrderHeader]
WHERE [TotalDue] > 300000
FOR XML RAW;
ForXmlQueries.sql
You’ll now get an attribute-centric XML view of the data, with each row wrapped in a <row> element. However, there’s no document element added so it’s actually an XML fragment. One of the rows is shown in the following code:
<row PurchaseOrderID=”4007” RevisionNumber=”13” Status=”2” EmployeeID=”251” VendorID=”1594” ShipMethodID=”3” OrderDate=”2008-04-01T00:00:00” ShipDate=”2008-04-26T00:00:00” SubTotal=”554020.0000” TaxAmt=”44321.6000” Freight=”11080.4000” TotalDue=”609422.0000” ModifiedDate=”2009-09-12T12:25:46.407” />
If you want an element-centric view, add the ELEMENTS directive to the query:
SELECT [PurchaseOrderID] ,[RevisionNumber] ,[Status] ,[EmployeeID] ,[VendorID] ,[ShipMethodID] ,[OrderDate] ,[ShipDate] ,[SubTotal] ,[TaxAmt] ,[Freight] ,[TotalDue] ,[ModifiedDate] FROM [Purchasing].[PurchaseOrderHeader] WHERE [TotalDue] > 300000 FOR XML RAW, ELEMENTS;
ForXmlQueries.sql
You’ll then get rows like the following:
<row> <PurchaseOrderID>4007</PurchaseOrderID> <RevisionNumber>13</RevisionNumber> <Status>2</Status> <EmployeeID>251</EmployeeID> <VendorID>1594</VendorID> <ShipMethodID>3</ShipMethodID> <OrderDate>2008-04-01T00:00:00</OrderDate> <ShipDate>2008-04-26T00:00:00</ShipDate> <SubTotal>554020.0000</SubTotal> <TaxAmt>44321.6000</TaxAmt> <Freight>11080.4000</Freight> <TotalDue>609422.0000</TotalDue> <ModifiedDate>2009-09-12T12:25:46.407</ModifiedDate> </row>
As mentioned before, the query returns an XML fragment rather than a full document. To add a surrounding root element, use the ROOT directive combined with the name of the root element you want:
SELECT [PurchaseOrderID]
,[RevisionNumber]
,[Status]
,[EmployeeID]
,[VendorID]
,[ShipMethodID]
,[OrderDate]
,[ShipDate]
,[SubTotal]
,[TaxAmt]
,[Freight]
,[TotalDue]
,[ModifiedDate]
FROM [Purchasing].[PurchaseOrderHeader]
WHERE [TotalDue] > 300000
FOR XML RAW, ELEMENTS, ROOT('orders');
ForXmlQueries.sql
You’ll now get an <orders> element around all the <row> elements.
You may also want to change the name of the default row container, which is <row>. Simply add the name in parentheses after the RAW keyword:
SELECT [PurchaseOrderID]
,[RevisionNumber]
,[Status]
,[EmployeeID]
,[VendorID]
,[ShipMethodID]
,[OrderDate]
,[ShipDate]
,[SubTotal]
,[TaxAmt]
,[Freight]
,[TotalDue]
,[ModifiedDate]
FROM [Purchasing].[PurchaseOrderHeader]
WHERE [TotalDue] > 300000
FOR XML RAW('order'), ELEMENTS, ROOT('orders'),
This will give you results similar to the following:
<orders> <order> <PurchaseOrderID>4007</PurchaseOrderID> <RevisionNumber>13</RevisionNumber> <Status>2</Status> <EmployeeID>251</EmployeeID> <VendorID>1594</VendorID> <ShipMethodID>3</ShipMethodID> <OrderDate>2008-04-01T00:00:00</OrderDate> <ShipDate>2008-04-26T00:00:00</ShipDate> <SubTotal>554020.0000</SubTotal> <TaxAmt>44321.6000</TaxAmt> <Freight>11080.4000</Freight> <TotalDue>609422.0000</TotalDue> <ModifiedDate>2009-09-12T12:25:46.407</ModifiedDate> </order> <!-- more order elements --> </orders>
Another issue that commonly arises is how to treat nulls in the results. The default is to not output the element or attribute at all if its value is null. Sometimes it’s easier to process the results if the element is there, but empty; to differentiate between an element that contains a null value and one which has an empty string there needs to be a marker on the element to signify that its value is null rather than an empty string. The marker used is xsi:nil=”true “. This is a standard attribute from the schema instance namespace, so SQL Server also needs to add the correct namespace binding. If you want this treatment, use the XSINIL directive after the ELEMENTS keyword. The following code shows how to ensure that elements that have a null value are output rather than being omitted:
SSELECT [PurchaseOrderID]
,[RevisionNumber]
,[Status]
,[EmployeeID]
,[VendorID]
,[ShipMethodID]
,[OrderDate]
,[ShipDate]
,[SubTotal]
,[TaxAmt]
,[Freight]
,[TotalDue]
,[ModifiedDate]
FROM [Purchasing].[PurchaseOrderHeader]
WHERE [TotalDue] > 300000
FOR XML RAW('order'), ELEMENTS XSINIL, ROOT('orders'),
ForXmlQueries.sql
The result is as follows:
<orders xmlns:xsi=”http://www.w3.org/2001/XMLSchema-instance”> <order> <PurchaseOrderID>4007</PurchaseOrderID> <RevisionNumber>14</RevisionNumber> <Status>2</Status> <EmployeeID>251</EmployeeID> <VendorID>1594</VendorID> <ShipMethodID>3</ShipMethodID> <OrderDate>2008-04-01T00:00:00</OrderDate> <ShipDate xsi:nil=”true” /> <SubTotal>554020.0000</SubTotal> <TaxAmt>44321.6000</TaxAmt> <Freight>11080.4000</Freight> <TotalDue>609422.0000</TotalDue> <ModifiedDate>2009-09-12T12:25:46.407</ModifiedDate> </order> <!-- more order elements --> </orders>
If your copy of the database doesn’t have a null shipping date, you can always change one—the code in ForXmlQueries.sql makes the requisite change and then restores it at the end.
One final feature that might be useful, especially if your data is being passed to a third party, is the ability to add an XML schema. This is done by appending the previous code with the XMLSCHEMA directive like so:
SELECT [PurchaseOrderID] ,[RevisionNumber] ,[Status] ,[EmployeeID] ,[VendorID] ,[ShipMethodID] ,[OrderDate] ,[ShipDate] ,[SubTotal] ,[TaxAmt] ,[Freight] ,[TotalDue] ,[ModifiedDate] FROM [Purchasing].[PurchaseOrderHeader] WHERE [TotalDue] > 300000 FOR XML RAW('order'), ELEMENTS XSINIL, ROOT('orders'), XMLSCHEMA;
ForXmlQueries.sql
The schema (highlighted here) is included just after the document element and before the actual results:
<orders xmlns:xsi=”http://www.w3.org/2001/XMLSchema-instance”> <xsd:schema targetNamespace=”urn:schemas-microsoft-com:sql:SqlRowSet1” xmlns:xsd=”http://www.w3.org/2001/XMLSchema” xmlns:sqltypes=”http://schemas.microsoft.com/sqlserver/2004/sqltypes” elementFormDefault=”qualified”> <xsd:import namespace=”http://schemas.microsoft.com/sqlserver/2004/sqltypes” schemaLocation= “http://schemas.microsoft.com/sqlserver/2004/sqltypes/sqltypes.xsd” /> <xsd:element name=”order”> <xsd:complexType> <xsd:sequence> <xsd:element name=”PurchaseOrderID” type=”sqltypes:int” nillable=”1” /> <xsd:element name=”RevisionNumber” type=”sqltypes:tinyint” nillable=”1”/> <xsd:element name=”Status” type=”sqltypes:tinyint” nillable=”1” /> <xsd:element name=”EmployeeID” type=”sqltypes:int” nillable=”1” /> <xsd:element name=”VendorID” type=”sqltypes:int” nillable=”1” /> <xsd:element name=”ShipMethodID” type=”sqltypes:int” nillable=”1” /> <xsd:element name=”OrderDate” type=”sqltypes:datetime” nillable=”1” /> <xsd:element name=”ShipDate” type=”sqltypes:datetime” nillable=”1” /> <xsd:element name=”SubTotal” type=”sqltypes:money” nillable=”1” /> <xsd:element name=”TaxAmt” type=”sqltypes:money” nillable=”1” /> <xsd:element name=”Freight” type=”sqltypes:money” nillable=”1” /> <xsd:element name=”TotalDue” type=”sqltypes:money” nillable=”1” /> <xsd:element name=”ModifiedDate” type=”sqltypes:datetime” nillable=”1” /> </xsd:sequence> </xsd:complexType> </xsd:element> </xsd:schema> <order xmlns=”urn:schemas-microsoft-com:sql:SqlRowSet1”> <PurchaseOrderID>4007</PurchaseOrderID> <RevisionNumber>14</RevisionNumber> <Status>2</Status> <EmployeeID>251</EmployeeID> <VendorID>1594</VendorID> <ShipMethodID>3</ShipMethodID> <OrderDate>2008-04-01T00:00:00</OrderDate> <ShipDate xsi:nil=”true” /> <SubTotal>554020.0000</SubTotal> <TaxAmt>44321.6000</TaxAmt> <Freight>11080.4000</Freight> <TotalDue>609422.0000</TotalDue> <ModifiedDate>2009-09-12T12:25:46.407</ModifiedDate> </order> <!-- rest of order elements --> </orders>
Although the RAW mode has a few options, it fails miserably when dealing with hierarchical data. To have more control and to be able to handle hierarchical data more effectively you can use FOR XML AUTO.
If you try to use the RAW mode with nested data, such as orders along with the line items, you’ll get a repetitive block of XML in which the order is repeated for every line item. One of the strengths of XML is the ability to show hierarchical data cleanly, so this sort of repetition is something to be avoided. You examine how the AUTO mode copes with this in the following activity.
SELECT [PurchaseOrderID]
,[RevisionNumber]
,[Status]
,[EmployeeID]
,[VendorID]
,[ShipMethodID]
,[OrderDate]
,[ShipDate]
,[SubTotal]
,[TaxAmt]
,[Freight]
,[TotalDue]
,[ModifiedDate]
FROM [Purchasing].[PurchaseOrderHeader]
WHERE [TotalDue] > 300000
FOR XML AUTO;
ForXmlQueries.sql
<Purchasing.PurchaseOrderHeader PurchaseOrderID=”4007” RevisionNumber=”14” Status=”2” EmployeeID=”251” VendorID=”1594” ShipMethodID=”3” OrderDate=”2008-04-01T00:00:00” SubTotal=”554020.0000” TaxAmt=”44321.6000” Freight=”11080.4000” TotalDue=”609422.0000” ModifiedDate=”2009-09-12T12:25:46.407” />
SELECT POH.[PurchaseOrderID] ,POH.[RevisionNumber] ,POH.[Status] ,POH.[EmployeeID] ,POH.[VendorID] ,POH.[ShipMethodID] ,POH.[OrderDate] ,POH.[ShipDate] ,POH.[SubTotal] ,POH.[TaxAmt] ,POH.[Freight] ,POH.[TotalDue] ,POH.[ModifiedDate] ,POD.[OrderQty] ,POD.[ProductID] ,POD.[UnitPrice] FROM [Purchasing].[PurchaseOrderHeader] POH INNER JOIN Purchasing.PurchaseOrderDetail POD ON POH.PurchaseOrderID = POD.PurchaseOrderID WHERE [TotalDue] > 300000;
ForXmlQueries.sql
SELECT POH.[PurchaseOrderID]
,POH.[RevisionNumber]
,POH.[Status]
,POH.[EmployeeID]
,POH.[VendorID]
,POH.[ShipMethodID]
,POH.[OrderDate]
,POH.[ShipDate]
,POH.[SubTotal]
,POH.[TaxAmt]
,POH.[Freight]
,POH.[TotalDue]
,POH.[ModifiedDate]
,POD.[OrderQty]
,POD.[ProductID]
,POD.[UnitPrice]
FROM [Purchasing].[PurchaseOrderHeader] POH
INNER JOIN Purchasing.PurchaseOrderDetail POD
ON POH.PurchaseOrderID = POD.PurchaseOrderID
WHERE [TotalDue] > 300000
FOR XML AUTO, ROOT('orders'),
ForXmlQueries.sql
<orders> <POH PurchaseOrderID=”4007” RevisionNumber=”16” Status=”2” EmployeeID=”251” VendorID=”1594” ShipMethodID=”3” OrderDate=”2008-04-01T00:00:00” SubTotal=”554020.0000” TaxAmt=”44321.6000” Freight=”11080.4000” TotalDue=”609422.0000” ModifiedDate=”2009-09-12T12:25:46.407”> <POD OrderQty=”5000” ProductID=”849” UnitPrice=”24.7500” /> <POD OrderQty=”5000” ProductID=”850” UnitPrice=”24.7500” /> <POD OrderQty=”5000” ProductID=”851” UnitPrice=”24.7500” /> <POD OrderQty=”750” ProductID=”852” UnitPrice=”30.9400” /> <POD OrderQty=”750” ProductID=”853” UnitPrice=”30.9400” /> <POD OrderQty=”750” ProductID=”854” UnitPrice=”30.9400” /> <POD OrderQty=”1050” ProductID=”855” UnitPrice=”37.1000” /> <POD OrderQty=”1000” ProductID=”856” UnitPrice=”37.1000” /> <POD OrderQty=”1000” ProductID=”857” UnitPrice=”37.1000” /> </POH> <!-- more POH elements --> </orders>
Despite the different options available to both the RAW and the AUTO versions of FOR XML, you will likely encounter cases where neither alternative produces the output needed. The most common scenario is when you need a combination of elements and attributes, rather than one or the other. Two options are available for this purpose: FOR XML EXPLICIT and FOR XML PATH; the latter is available only in post-2000 versions.
The EXPLICIT option enables almost unlimited control over the resulting XML format, but this comes at a price. The syntax is difficult to grasp, and because the mechanism used to construct the resulting XML is based on a forward-only XML writer, the results must be grouped and ordered in a very specific way. Unless you are stuck with SQL Server 2000, the advice from Microsoft and other experts is to use the PATH option instead. If you do need to use EXPLICIT, the full details are available in the SQL Server BOL.
The PATH option, based on using XPath to specify the format of the output, makes building nested XML with combinations of elements and attributes relatively simple. Take one of the earlier query result examples, in which orders over $300,000 were retrieved and returned as attribute-centric XML using the AUTO option:
<orders> <Purchasing.PurchaseOrderHeader PurchaseOrderID=”4007” RevisionNumber=”16” Status=”2” EmployeeID=”251” VendorID=”1594” ShipMethodID=”3” OrderDate=”2008-04-01T00:00:00” SubTotal=”554020.0000” TaxAmt=”44321.6000” Freight=”11080.4000” TotalDue=”609422.0000” ModifiedDate=”2009-09-12T12:25:46.407” /> </orders>
What if a different layout was needed, one where the PurchaseOrderID, EmployeedID, and status were attributes but the other data appeared as elements? The PATH option uses aliases of the columns to specify how the XML is structured. The syntax is similar to XPath (covered in Chapter 7), hence the PATH keyword.
The PATH query for the order data as a mix of attributes and elements would be as follows:
SELECT [PurchaseOrderID] [@PurchaseOrderID] ,[Status] [@Status] ,[EmployeeID] [@EmployeeID] ,[VendorID] ,[ShipMethodID] ,[OrderDate] ,[ShipDate] ,[SubTotal] ,[TaxAmt] ,[Freight] ,[TotalDue] FROM [Purchasing].[PurchaseOrderHeader] POH WHERE [TotalDue] > 300000 FOR XML PATH('order'), ROOT('orders'),
ForXmlQueries.sql
Notice how data that needs to be returned as attributes is aliased to a column name beginning with @. Unaliased columns are returned as elements. The results of this query would resemble the following XML:
<orders> <order PurchaseOrderID=”4007” Status=”2” EmployeeID=”251”> <VendorID>1594</VendorID> <ShipMethodID>3</ShipMethodID> <OrderDate>2008-04-01T00:00:00</OrderDate> <SubTotal>554020.0000</SubTotal> <TaxAmt>44321.6000</TaxAmt> <Freight>11080.4000</Freight> <TotalDue>609422.0000</TotalDue> </order> <!-- more order elements --> </orders>
The PATH option also provides control over nesting. The usual way to do this, rather than use a SQL JOIN as shown previously, is to use a subquery. The following snippet shows the order header as attributes, with the order details as nested elements:
SELECT [POH].[PurchaseOrderID] [@PurchaseOrderID] ,[POH].[Status] [@Status] ,[POH].[EmployeeID] [@EmployeeID] ,[POH].[VendorID] [@VendorID] ,[POH].[ShipMethodID] [@ShipMethodID] ,[POH].[OrderDate] [@OrderDate] ,[POH].[ShipDate] [@ShipDate] ,[POH].[SubTotal] [@SubTotal] ,[POH].[TaxAmt] [@TaxAmt] ,[POH].[Freight] [@Freight] ,[POH].[TotalDue] [@TotalDue] ,( SELECT [POD].[OrderQty] ,[POD].[ProductID] ,[POD].[UnitPrice] FROM [Purchasing].[PurchaseOrderDetail] POD WHERE POH.[PurchaseOrderID] = POD.[PurchaseOrderID] ORDER BY POD.[PurchaseOrderID] FOR XML PATH('orderDetail'), TYPE ) FROM [Purchasing].[PurchaseOrderHeader] POH WHERE [POH].[TotalDue] > 300000 FOR XML PATH('order'), ROOT('orders'),
ForXmlQueries.sql
The main part of the query, without the inner SELECT, is much the same as before except all the output columns are specified as attributes, as shown by the alias name beginning with the @ symbol:
SELECT [POH].[PurchaseOrderID] [@PurchaseOrderID] ,[POH].[Status] [@Status] ,[POH].[EmployeeID] [@EmployeeID] ,[POH].[VendorID] [@VendorID] ,[POH].[ShipMethodID] [@ShipMethodID] ,[POH].[OrderDate] [@OrderDate] ,[POH].[ShipDate] [@ShipDate] ,[POH].[SubTotal] [@SubTotal] ,[POH].[TaxAmt] [@TaxAmt] ,[POH].[Freight] [@Freight] ,[POH].[TotalDue] [@TotalDue] ( -- Inner query here ) FROM [Purchasing].[PurchaseOrderHeader] POH WHERE [POH].[TotalDue] > 300000 FOR XML PATH('order'), ROOT('orders'),
ForXmlQueries.sql
The inner query returns the order detail relating to the customer specified in the outer query. This is accomplished by equating the PurchaseOrderDetail.PurchaseOrderId field in the outer query to the PurchaseOrderDetail.PurchaseOrderID in the nested query as shown in the following code snippet. (In SQL terms, this is known as a correlated subquery.)
SELECT [POD].[OrderQty] ,[POD].[ProductID] ,[POD].[UnitPrice] FROM [Purchasing].[PurchaseOrderDetail] POD WHERE POH.[PurchaseOrderID] = POD.[PurchaseOrderID] ORDER BY POD.[PurchaseOrderID] FOR XML PATH('orderDetail'), TYPE
ForXmlQueries.sql
Note the TYPE option at the end of the subquery. This specifies that the resulting data should be converted to the XML data type (this is covered in more detail later in the chapter). This option ensures that the data is inserted as XML, rather than a string. The actual output from the query appears as follows:
<orders> <order PurchaseOrderID=”4007” Status=”2” EmployeeID=”251” VendorID=”1594” ShipMethodID=”3” OrderDate=”2008-04-01T00:00:00” SubTotal=”554020.0000” TaxAmt=”44321.6000” Freight=”11080.4000” TotalDue=”609422.0000”> <orderDetail> <OrderQty>5000</OrderQty> <ProductID>849</ProductID> <UnitPrice>24.7500</UnitPrice> </orderDetail> <orderDetail> <OrderQty>5000</OrderQty> <ProductID>850</ProductID> <UnitPrice>24.7500</UnitPrice> </orderDetail> <orderDetail> <OrderQty>5000</OrderQty> <ProductID>851</ProductID> <UnitPrice>24.7500</UnitPrice> </orderDetail> <orderDetail> <OrderQty>750</OrderQty> <ProductID>852</ProductID> <UnitPrice>30.9400</UnitPrice> </orderDetail> <!-- more orderDetail elements --> </order> <!-- more order elements --> </orders>
Because no aliasing was applied to the inner query, the columns are represented by XML elements.
Plenty of other options are available to customize the results returned from a FOR XML PATH query. The final example shows how to group data within elements. The two dates associated with the order are grouped under a <Dates> element, and an <orderDetail> element is used to hold the individual line items:
SELECT [POH].[PurchaseOrderID] [@PurchaseOrderID] ,[POH].[Status] [@Status] ,[POH].[EmployeeID] [@EmployeeID] ,[POH].[VendorID] [@VendorID] ,[POH].[ShipMethodID] [@ShipMethodID] ,[POH].[SubTotal] [@SubTotal] ,[POH].[TaxAmt] [@TaxAmt] ,[POH].[Freight] [@Freight] ,[POH].[TotalDue] [@TotalDue] ,[POH].[OrderDate] [Dates/Order] ,[POH].[ShipDate] [Dates/Ship] ,( SELECT [POD].[OrderQty] ,[POD].[ProductID] ,[POD].[UnitPrice] FROM [Purchasing].[PurchaseOrderDetail] POD WHERE POH.[PurchaseOrderID] = POD.[PurchaseOrderID] ORDER BY POD.[PurchaseOrderID] FOR XML PATH('orderDetail'), TYPE ) FROM [Purchasing].[PurchaseOrderHeader] POH WHERE [POH].[TotalDue] > 300000 FOR XML PATH('order'), ROOT('orders'),
ForXmlQueries.sql
In the preceding code, the key change is to the OrderDate and ShipDate in the outer SELECT. The columns are aliased to Date/Order and Dates/Ship, so SQL Server creates a new element, Dates, to hold these two values. There is also an alias on the entire subquery, OrderDetails, that causes all of its results to be grouped under one element. The resulting XML looks like this:
<orders> <order PurchaseOrderID=”4008” Status=”2” EmployeeID=”258” VendorID=”1676” ShipMethodID=”3” SubTotal=”396729.0000” TaxAmt=”31738.3200” Freight=”7934.5800” TotalDue=”436401.9000”> <Dates> <Order>2008-05-23T00:00:00</Order> <Ship>2008-06-17T00:00:00</Ship> </Dates> <orderDetail> <OrderQty>700</OrderQty> <ProductID>858</ProductID> <UnitPrice>9.1500</UnitPrice> </orderDetail> <orderDetail> <OrderQty>700</OrderQty> <ProductID>859</ProductID> <UnitPrice>9.1500</UnitPrice> </orderDetail> <!-- more orderDetail elements --> </order> <order PurchaseOrderID=”4012” Status=”2” EmployeeID=”254” VendorID=”1636” ShipMethodID=”3” SubTotal=”997680.0000” TaxAmt=”79814.4000” Freight=”19953.6000” TotalDue=”1097448.0000”> <Dates> <Order>2008-07-25T00:00:00</Order> <Ship>2008-08-19T00:00:00</Ship> </Dates> <orderDetail> <OrderQty>6000</OrderQty> <ProductID>881</ProductID> <UnitPrice>41.5700</UnitPrice> </orderDetail> <!-- more orderDetail elements --> </order> </orders>
That covers the basics of the FOR XML instruction. Next you take a look at storing XML within a table, starting with the xml data type.
SQL Server 2005 added an xml data type to those available, which means that XML documents can be stored in a SQL Server 2005 (or later) database. This is a vast improvement on earlier versions where there were two options for storing XML, neither of which was satisfactory. The first alternative was to shred the data into its constituent parts, which were then stored in multiple relational tables—defeating the purpose of using XML. The second choice was to convert the XML to a simple string of characters, which loses the logical content of the XML document. The additional ability to store XML data as an xml data type means that such data can be treated as if it were still an XML document. In reality, the xml data type is stored in a proprietary binary format, but, as far as the developer is concerned, it is accessible as XML, with its logical structure intact.
The existence of the xml data type means that XML documents stored as this type can be treated as if they were collections of XML documents sitting on your hard drive. Of course, the details of the interface to that XML is specific to SQL Server. Other advantages to having a specific data type devoted to XML are that you can store intermediate results in queries that return XML and you can use the methods of the xml data type to search and modify the XML stored in it.
There are several general advantages to storing your data in SQL Server. For one, XML storage benefits from the security, scalability, and other aspects of an enterprise-level database management system. You can also associate XML schemas with the column and, when querying the document, the appropriate type will be returned. This is a vast improvement over the previous version, where CASTing or CONVERTing was needed.
XML documents stored in SQL Server can be treated as XML in any other setting. One practical effect of that is that you can use XQuery (introduced in Chapter 9) to query these XML columns. Surprisingly, two XML document instances cannot be compared in this release, in part because of the flexibility of XML syntax. Consider, for example, the subtleties of trying to compare two lengthy XML documents that can have paired apostrophes or paired quotes to contain attribute values, differently-ordered attributes, different namespace prefixes although the namespace URI may be the same, and empty elements written with start tags and end tags or with the empty element tag.
Documents stored as the xml data type can be validated against a specified W3C XML Schema document. XML data that is not associated with a schema document is termed untyped, and XML associated with a schema document is termed typed.
In the following activity you create a simple table to contain XML documents in SQL Server. SQL Server Management Studio is the main graphical tool for manipulating database objects and writing SQL code, although Visual Studio and a number of third-party applications are also available. Refer to the “Installing SQL Server” section for instructions on how to download SQL Server Management Studio.
CREATE TABLE dbo.Docs ( DocID INTEGER IDENTITY PRIMARY KEY, XMLDoc XML )
XmlDataType.sql
<Person> <FirstName></FirstName> <LastName></LastName> </Person>
INSERT Docs VALUES ('<Person><FirstName>Joe</FirstName> <LastName>Fawcett</LastName></Person>' )
XmlDataType.sql
SELECT * FROM Docs
XML documents in SQL Server can also be indexed for more efficient retrieval, and a full-text index can be created. To create a full-text index on a document, use a command like the following:
--If no catalog exists so far CREATE FULLTEXT CATALOG ft ON DEFAULT CREATE FULLTEXT INDEX ON dbo.Docs(XmlDoc) KEY INDEX <primary key name>
The xml data type enables you to use the following methods to manipulate the data and to extract it in various forms: modify(), query(), value(), exist(), and nodes(). The following sections look at each method in turn and describe how they are used.
The xml data type can be queried using the XQuery language, which was introduced in Chapter 9. In SQL Server, XQuery expressions are embedded inside Transact-SQL. Transact-SQL is the flavor of the SQL language used in SQL Server.
Microsoft introduced the modify() method before XQuery had finalized a syntax. At the time there was talk of updating to the official standard when it appeared, but so far that hasn’t happened.
The W3C XQuery 1.0 specification is limited in that it can query only an XML (or XML-enabled) data source. There is no facility in XQuery 1.0 to carry out deletions, to insert new data, or (combining those actions) to modify data. In SQL Server, the XML Data Modification Language (DML) adds three keywords to the functionality available in XQuery 1.0. You can see these keywords in action in the following exercise:
DECLARE @myDoc xml; SET @myDoc = '<Person><FirstName>Joe</FirstName> <LastName>Fawcett</LastName></Person>'; SELECT @myDoc; SET @myDoc.modify(' delete /Person/*[2]'), SELECT @myDoc;
XmlDataType.sql
SET @myDoc = '<Person><FirstName>Joe</FirstName> <LastName>Fawcett</LastName></Person>';
SET @myDoc.modify(' delete /Person/*[2]'),
The following Try It Out again uses the modify() method but this time, instead of deleting unwanted XML, you insert a new element into the document.
DECLARE @myDoc XML; SET @myDoc = '<Person><LastName>Fawcett</LastName></Person>'; SELECT @myDoc; SET @myDoc.modify(' insert <FirstName>Joe</FirstName> as first into /Person[1]'), SELECT @myDoc;
XmlDataType.sql
SET @myDoc = '<Person><LastName>Fawcett</LastName></Person>';
DDECLARE @myDoc XML; SET @myDoc = '<Person><LastName>Fawcett</LastName></Person>'; SELECT @myDoc; SET @myDoc.modify(' insert <FirstName>Joe</FirstName> before (/Person/LastName)[1]'), SELECT @myDoc;
XmlDataType.sql
The final example of the modify() function shows how you can update, or replace, a section of XML.
DECLARE @myDoc XML; SET @myDoc = '<Person><FirstName>Joe</FirstName><LastName>Fawcett</LastName></Person>' SELECT @myDoc; SET @myDoc.modify(' replace value of (/Person/FirstName/text())[1] with “Gillian” '), SELECT @myDoc;
XmlDataType.sql
SET @myDoc.modify(' replace value of (/Person/FirstName/text())[1] with “Gillian” '),
One of the main problems with using the modify() method is that it expects a hard-coded string as its argument. It is therefore difficult to make dynamic queries that are needed in the real world—for example, queries in which the new XML is brought in from another table. You have two ways around this. First, you can construct the query as a string and execute it dynamically using EXEC. Alternatively, you can use the built-in functions sql:column and sql:function. An example of each of these techniques follows.
For these examples you can use the Docs table created earlier. First, here’s a reminder of what a static update looks like:
UPDATE Docs SET XmlDoc.modify (' replace value of (/Person/FirstName/text())[1] with “Joseph”') WHERE DocId = 1;
XmlDataType.sql
Now suppose you want to replace the hard-coded value Joseph with a variable. You might first try this:
DECLARE @NewName NVARCHAR(100); SET @NewName = N'Joseph'; UPDATE Docs SET XmlDoc.modify(' replace value of (/Person/FirstName/text())[1] with “' + @NewName + '”') WHERE DocId = 1;
XmlDataType.sql
Unfortunately, that won’t work. The modify() method complains that it needs a string literal. One way around this is to build the whole SQL statement dynamically:
DECLARE @NewName NVARCHAR(100); SET @NewName = N'Joseph'; DECLARE @SQL NVARCHAR(MAX); SET @SQL = 'UPDATE Docs SET XmlDoc.modify('' replace value of (/Person/FirstName/text()) [1] with “' + @NewName + '”'') WHERE DocId = 1'; PRINT(@SQL); EXEC(@SQL);
XmlDataType.sql
You can see the SQL before it is executed by running only as far as the PRINT statement, that is, not executing the last line, EXEC (@SQL); (the following is displayed on a single line):
UPDATE Docs SET XmlDoc.modify(' replace value of (/Person/FirstName/text())[1] with “Joseph”') WHERE DocId = 1
This is exactly the same as the code you started with.
The recommended way to update based on data that will only be known at run-time, however, is to use the built-in functions sql:column or sql:variable. The sql:column function is used when the new data is being retrieved from a table, so here sql:variable is needed:
DECLARE @NewName NVARCHAR(100); SET @NewName = N'Joseph'; UPDATE Docs SET XmlDoc.modify (' replace value of (/Person/FirstName/text())[1] with sql:variable(“@NewName”)') WHERE DocId = 1;
XmlDataType.sql
The basic syntax is the name of the variable enclosed in double quotes as an argument to sql:variable(). Next you will see how to use standard XQuery against the xml data type.
The query() method enables you to construct XQuery statements in SQL Server. The syntax follows the XQuery syntax discussed in Chapter 9, and all the queries in that chapter can be run against a suitable XML data column.
The following query uses the query() method to output the names of each person in a newly constructed Name element, with the value of the LastName element followed by a comma and then the value of the FirstName element. The code is shown here:
SELECT XMLDoc.query ('for $p in /Person return <Name>{$p/LastName/text()}, {$p/FirstName/text()}</Name>') FROM Docs;
XmlDataType.sql
The first line indicates that a selection is being made using the query() method applied to the XMLDoc column (which, of course, is of data type xml).
The for clause specifies that the variable $p is bound to the Person element node.
The return clause specifies that a Name element is to be constructed using an element constructor. The first part of the content of each Name element is created by evaluating the XQuery expression $p/LastName/text(), which, of course, is the text content of the LastName element. A literal comma is output, and then the XQuery expression $p/FirstName/text() is evaluated.
Figure 10-6 shows the output when the SELECT statement containing the XQuery query is run.
The value() method uses XPath to pinpoint specific data in an XML document and then converts it into a standard SQL Server data type. It’s often used in the WHERE part of a SQL query. Suppose you want to return all the people in your Docs table who have the first name of Joe. This is one way of doing it:
SELECT * FROM Docs WHERE XmlDoc.value('(/*/FirstName)[1]', 'nvarchar(100)') = 'Joe';
XmlDataType.sql
This returns just one row for the first document you added. Notice how the data type that you are converting to needs to be quoted; it’s quite a common mistake to forget this. Obviously, you can also use the value() method in the SELECT list as well. If you just wanted the last name of everyone, you’d use the following:
SELECT DocId, XmlDoc.value('(/*/LastName)[1]', 'nvarchar(100)') LastName FROM Docs;
XmlDataType.sql
This returns a standard two-column result set.
The exist() method does what its name suggests—it checks if a value exists. It returns a 0 if it doesn’t, a 1 if it does, and null if the XML column contains null.
So, you could rewrite the query to return people with a first name of Joe this way:
SELECT * FROM Docs WHERE XmlDoc.exist('/*/FirstName[. = “Joe”]') = 1;
XmlDataType.sql
This returns the same results as the query using the value() method to do the filter did previously.
The nodes() method is used to present an XML document as a regular SQL table. You often need this when your query needs one row of data from a table combined with a child element of your XML. For a simple example, look at the following code:
DECLARE @People xml; SET @People = '<people><person>Joe</person> <person>Danny</person> <person>Liam</person></people>' SELECT FirstName.value('text()[1]', 'nvarchar(100)') FirstName FROM @People.nodes('/*/person') Person(FirstName);
XmlDataType.sql
The nodes() method takes an XPath that points to some repetitive child elements of the main document. You then provide a table name and a column name to use later in the form TableName(ColumnName). Here, the table is Person and the column is FirstName. The FirstName column is then queried using value() to get the text. The results are shown in Figure 10-7.
One thing that can affect how queries are written and processed is whether SQL Server knows the structure of the XML document held in a column. There needs to be some way to specify a schema alongside an XML data type. The following section explains how you tell SQL Server exactly what format the XML stored as an xml data type should take.
It was mentioned earlier that the new xml data type is now a first-class data type in SQL Server. This data type can be used to store untyped and typed XML data, so it shouldn’t be surprising that, just as relational data is specified by a schema, the new xml data type can be associated with a W3C XML Schema document to specify its structure.
Take a look at how you can specify a schema for data of type xml. The first task is to create a schema collection together with its first XML Schema. You need to give the collection a name—in this example, EmployeesSchemaCollection—and the W3C XML Schema document itself needs to be delimited with single quote marks. For example, if you wanted to create a very simple schema for a document that could contain a Person element and child elements named FirstName and LastName, you could do so using the following syntax:
CREATE XML SCHEMA COLLECTION EmployeesSchemaCollection AS '<xsd:schema xmlns:xsd=”http://www.w3.org/2001/XMLSchema” targetNamespace= “http://wiley.com/namespaces/Person” xmlns=”http://wiley.com/namespaces/Person”> <xsd:element name=“Person“> <xsd:complexType> <xsd:sequence> <xsd:element name=“FirstName“ /> <xsd:element name=“LastName“ /> </xsd:sequence> </xsd:complexType> </xsd:element> </xsd:schema>'
XmlDataType.sql
If you want to drop the XML Schema collection, you need to issue a DROP XMLSCHEMA statement:
DROP XML SCHEMA COLLECTION EmployeesSchemaCollection
Once you have a collection, you can add new schemas using the following syntax:
ALTER XML SCHEMA COLLECTION EmployeesSchemaCollection ADD '<xsd:schema> <!--new schema inserted here --> </xsd:schema>'
Untyped and typed XML data can be used in a SQL Server column, variable, or parameter. If you want to create a Docs table and associate it with a W3C XML Schema document, you can do so using code like the following:
CREATE TABLE [dbo].[Docs]( [DocID] [int] IDENTITY(1,1) PRIMARY KEY, [XMLDoc] [xml] (EmployeesSchemaCollection))
The advantage of applying a schema collection is twofold. First, it acts as a validation check; XML not conforming to one of the schemas in the collection will be rejected in the same way that a column declared as an INT will not accept random textual data. Second, queries against the XML will return typed data as specified by the schema, rather than generic text.
For optimization, XML Schemas are shredded and stored internally in a proprietary format. Most of the schema can be reconstructed as an XML document from this proprietary format using the xml_schema_namespace intrinsic function. Therefore, if you had imported the schema into the EmployeesSchemaCollection shown in the XmlDataType.sql snippet, you could retrieve it using the following code:
SELECT xml_schema_namespace(N'dbo', N'EmployeesSchemaCollection')
Remember, too, that there can be multiple ways of writing a functionally equivalent W3C XML Schema document—for example, using references, named types, or anonymous types. SQL Server will not respect such differences when reconstituting a schema document.
In addition, the parts of the schema that are primarily documentation—for example, annotations and comments—are not stored in SQL Server’s proprietary format. Therefore, to ensure precise recovery of an original W3C XML Schema document, it is necessary to store the serialized XML Schema document separately. One option is to store it in a column of type xml or varchar(max) in a separate table.
Your final look at SQL Server concerns how to specify namespaces.
Often the documents you are working with will have namespaced elements or attributes, and you’ll need to specify a prefix to namespace URI binding in order to query them. You accomplish this by using the WITH XMLNAMESPACES statement.
The following example creates a document with a namespace and then queries it:
DECLARE @NamespacedData xml; SET @NamespacedData = '<x:data xmlns:x=”http://wrox.com/namespaces/examples”> <x:item id=”1”>One</x:item> <x:item id=”2”>Two</x:item> <x:item id=”1”>Three</x:item> </x:data>'; WITH XMLNAMESPACES ('http://wrox.com/namespaces/examples' as x) SELECT @NamespacedData.value('(/x:data/x:item[@id = 2])[1]', 'nvarchar(10)') Item;
XmlDataType.sql
This returns the value of the element that has an id equal to 2; in this case the result is Two.
The key point here is that you specify the namespace URI and a suitable prefix. The prefix chosen doesn’t have to match the one in the document. One thing to note is that WITH XMLNAMESPACES must be preceded by a semicolon. If the previous statement doesn’t end in a semicolon, place it before the WITH:
;WITH XMLNAMESPACES ('http://wrox.com/namespaces/examples' as x)
You can also specify a default namespace if you need to:
DECLARE @NamespacedData xml; SET @NamespacedData = '<data xmlns=”http://wrox.com/namespaces/examples”> <item id=”1”>One</item> <item id=”2”>Two</item> <item id=”1”>Three</item> </data>'; WITH XMLNAMESPACES (DEFAULT 'http://wrox.com/namespaces/examples') SELECT @NamespacedData.value('(/data/item[@id = 2])[1]', 'nvarchar(10)') Item;
XmlDataType.sql
This produces the same result as when you used an explicit prefix, x, bound to the namespace as shown in the XmlDataType.sql snippet.
So far you’ve seen two examples of how XML features and functionality have been added on to an existing relational database. In the next section you take the next step and examine an application designed from the ground up for the express purpose of storing and managing large numbers of XML documents.
The eXist XML database has been around since 2000, and has a solid reputation in its field. It is used as the basis for many systems, particularly those concerned with document and content management. Your first step is to download and install it.
Before doing anything with eXist, visit its website at http://exist-db.org/. From there, you’ll find links to download the latest version. The download is available for different platforms: a .jar file suitable for Unix/Linux and an .exe for Windows. The examples in this chapter use the Windows installation, version 1.4.1. You may need to make sure that the version of Java installed is recent enough for the version of eXist. For version 1.4.x, you’ll need Java 1.4 or higher.
Once you have your download ready and have the right version of Java installed, you should be able to install eXist by clicking the .jar or .exe file on any properly configured workstation. If that’s not the case, open a command window and type the following:
java -jar eXist-<version>.jar
A fancy graphical installer will pop up and guide you through the installation, which is very straightforward.
When that’s done, you have a ready-to-run native XML database that can be used in three different modes:
After the installation, eXist can be used in the last two modes using a different set of scripts that you can find in its bin subdirectory:
To check that the installation is correct, launch startup.sh or startup.bat in a terminal. If you chose to install to the default directory on Windows (Program FilesExist), you’ll need to run the command prompt as administrator because standard users can’t write to this folder. You should see a series of warnings and information, concluding with (if everything is okay) the following lines with the date and time reflecting that of installation:
29 Sep 2011 14:03:57,021 [main] INFO (JettyStart.java [run]:175) - eXist-db has started on port 8080. Configured contexts: 29 Sep 2011 14:03:57,022 [main] INFO (JettyStart.java [run]:177) - http://localhost:8080/exist 29 Sep 2011 14:03:57,023 [main] INFO (JettyStart.java [run]:179) - ----------------------------------------------------
These lines mean that jetty (the Java web server that comes with this eXist download) is ready to accept connections on port 8080.
The last step to check that everything runs smoothly is to open your favorite web browser to http://localhost:8080/exist/ and confirm that eXist’s home page, shown in Figure 10-8, opens.
Congratulations—you have your first native XML database up and running! Now it’s time to find out how you can interact with it. You will soon see that eXist is so open that you have many options.
The first option is to use the web interface at http://localhost:8080/exist/. Scroll down this web page to the Administration section (on the left side). Click Admin to go to http://localhost:8080/exist/admin/admin.xql, where you need to log in as user admin with the password you chose during the installation process.
Once you’re logged in, you have access to the commands from the left-side menu. Feel free to explore by yourself how you can manage users and set up the example that eXist suggests you install.
When you are ready to continue this quick tour of eXist, click Browse Collection (see Figure 10-9).
A brand-new eXist installation has a number of existing collections, but you will create a new one named blog using the Create Collection button in Listing 10-1. Once this collection is created, follow the link to browse it. This new collection is empty. Using the Upload button, upload the documents blog-1.xml, blog-2.xml, and blog-3.xml, which you can download from the code samples for this chapter on the Wrox site. These documents are sample blog entries such as the one shown in Listing 10-1.
LISTING 10-1: Blog-1.xml
<?xml version=”1.0” encoding=”UTF-8”?> <post xmlns:x=”http://www.w3.org/1999/xhtml” id=”1” author=”Joe Fawcett” dateCreated=”2011-09-18”> <title>A New Book</title> <body> <x:p> <x:b>I've been asked to co-author a new edition of <x:a href=”http://www.wrox.com/WileyCDA/WroxTitle/productCd-0470114878.html”> Beginning XML</x:a> by Wrox</x:b> </x:p> <x:p>It's incredible how much has changed since the book was published nearly five years ago. XML is now a bedrock of many systems, contrarily you see less of it than previously as it's buried under more layers.</x:p> <x:p>There are also many places where it has stopped being an automatic choice for data transfer, JSON has become a popular replacement where the data is to be consumed directly by a JavaScript engine such as in a browser.</x:p> <x:p>The new edition should be finished towards the end of the year and be published early in 2012.</x:p> </body> </post>
After you have uploaded these documents, you can display them by clicking their links. Now that you have documents in the /db/blog collection, you can query these documents, still using the web interface. To do so, click the Home link to go back to the home page and follow the link to the XQuery sandbox, which you can reach at http://localhost:8080/exist/sandbox/sandbox.xql.
Recall what you learned of XPath in Chapter 7 and XQuery in Chapter 9: the large text area surrounded by a yellow border expects a query written in XPath or XQuery. If you start with something simple, such as /item[@id='1'], and click Send, you’ll get all the documents from all the collections that have an item root element with an id attribute equal to 1. If you’ve followed the instructions that led to this point, you should get only the content of the first blog entry.
Of course, you can write more complex queries. For example, if you want to determine the titles, IDs, and links of blog entries with a link on the Wrox site, you can write the following (Listing 10-2).
LISTING 10-2: PostsWithWroxLinks.xquery
xquery version “1.0”; declare namespace x=”http://www.w3.org/1999/xhtml”; for $item in /post where .//x:a[contains(@href, 'wrox.com')] return <match> <id>{string($item/@id)}</id> {$item/title} {$item//x:a[contains(@href, 'wrox.com')]} </match>
Note that you need to bind the namespace URI to a prefix as XQuery, and eXist has full support for namespaces.
Feel free to try as many queries as you like, and then move on to discover the eXist client.
The eXist client is a standalone graphical tool that can perform the same kind of operations as the web interface. To start the client, perform the following steps:
There is much more to explore with this client. For example, you can also save and restore collections or full databases. Once you’re done exploring, read on to see how eXist can be used as a WebDAV server.
WebDAV stands for Web-based Distributed Authoring and Versioning. It designates a set of IETF RFCs that define how HTTP can be used to not only read resources, but also to write them. WebDAV is widely and natively implemented in most common operating systems and tools, and eXist’s capability to expose its collections as WebDAV repositories can greatly facilitate the way you import and export documents.
As a first contact with WebDAV, point your web browser to http://localhost:8080/exist/webdav/db/. You need to enter the login and password of your database admin again. Then you will see a page where you can browse the collections and content of your database. Without browser extensions, you have read-only access; you need to set up your WebDAV client to gain write access and see the eXist database as a repository.
The eXist documentation available on your local database at http://localhost:8080/exist/webdav.xml includes detailed instructions for setting up Microsoft Windows, KDE Konqueror, oXygen, and XML Spy to use WebDAV. WebDAV support is also built into the finder on Mac OS X. In Windows XP and later, this feature is known as web folders and is fairly easy to configure; just note that if you are using IE8 or later, you’ll need to open the File Explorer, choose Map Network Drive from the Tools menu, and use the Connect to a Website that You can Use to Store Your Documents and Pictures link. Because these setups are well described in the eXist documentation, they aren’t covered here.
The only feature that you lack using the WebDAV interface is the capability to execute queries, but you’ll see next that you can regain this feature if you use an XML IDE.
Your favorite XML IDE can probably access your eXist database through WebDAV. If it is interfaced with eXist, you can also execute queries from the IDE itself. This is the case with oXygen 8.0 which is available as a 30-day evaluation license from its site at www.oxygenxml.com/.
To configure the connection to your eXist database, perform the following steps:
The Database Explorer shows the newly created connection, and you can now browse and update the eXist database as you would browse and open documents on your local filesystem.
So far, all this could be done through WebDAV, but not the following, which interrogates the data. To execute a query, perform the following steps:
Thus far you’ve seen a number of different ways to interact with the XML data using a variety of graphical tools, but you still need to see how web applications can access your database. This is discussed in the next section.
What better way to interface your database with a web application could there be than using HTTP as it was meant to be used? This is the purpose of the REST interface.
As a first step, you can point your browser to http://localhost:8080/exist/rest/. Doing so shows you the content of your database root exposed as an XML document. This XML format is less user-friendly than browsing the content of your collections through the admin web interface or even through browsing the WebDAV repository, but far more easy to process in an application!
The full content of the database is available through this interface. For instance, http://localhost:8080/exist/rest/db/blog/ shows the content of the blog collection, and http://localhost:080/exist/rest/db/blog/blogItem1.xml gets you the first blog item. This becomes more interesting when you start playing with query strings. The REST interface accepts a number of parameters, including a _query parameter that you can use to send XPath or XQuery simple queries straight away!
For instance, if you want to get all the titles from all the documents in the collection /db/blog, you can query http://localhost:8080/exist/rest/db/blog/?_query=//title. The results are shown in Figure 10-11.
This XML deserves an XSLT transformation to be presented as HTML; and if you remember what you learned in Chapter 8, a simple transformation such as the one shown in Listing 10-3 would display the results better than the raw XML shown in Figure 10-11.
LISTING 10-3: Links.xslt
<xsl:stylesheet xmlns:xsl=”http://www.w3.org/1999/XSL/Transform” xmlns:exist=”http://exist.sourceforge.net/NS/exist” version=”1.0”> <xsl:template match=»/exist:result»> <html> <head> <title>Query results</title> </head> <body> <h1>eXist query results</h1> <p>Showing results <xsl:value-of select="@exist:start"/> to <xsl:value-of select="@exist:end"/> out of <xsl:value-of select="@exist:hits"/>:</p> <xsl:apply-templates/> </body> </html> </xsl:template> <xsl:template match="*"> <p> <xsl:copy-of select="."/> </p> </xsl:template> </xsl:stylesheet>
The good news is that the eXist REST interface can execute this transformation for you if you like. But before you can do that, you need to store the transformation in the database. To do so, you can use any of the methods you have seen so far to upload documents in the database (the web interface, the eXist client, WebDAV, or your favorite XML IDE). Because this section is about the REST interface, you can use REST to upload the document.
Storing documents with the REST interface uses an HTTP PUT request; unfortunately, you can’t do that with your web browser. To send an HTTP PUT request, you need to either do a bit of programming (all the programming languages have libraries available to support this) or use a utility such as curl (http://curl.haxx.se/), which is available for most platforms. This program has a lot of different command-line options. If you have curl installed on your machine, to store the document Links.xslt at location http://localhost:8080/exist/rest/db/xslt/, just type the following command in a Unix or Windows command window:
curl -T links.xslt http://localhost:8080/exist/rest/db/xslt/
This command simply sends this document through an HTTP PUT. The eXist REST interface also supports HTTP DELETE requests, and you can also delete this document. To do so, use the -X option, which enables you to define the HTTP method that you want to use and write:
curl -X DELETE localhost:8080/exist/rest/db/xslt/links.xslt
Of course, if you have run the previous command, you need to upload the transformation again before you can use it! Now that your style sheet is stored in the database, to use it just add an _xsl parameter, specifying its location. Then paste or type this URL in your browser: http://localhost:8080/exist/rest/db/blog/?_query=//title&_xsl=/db/xslt/links.xslt. The result is shown in Figure 10-12.
You have now seen how to use HTTP GET, PUT, and DELETE methods. If you are familiar with HTTP, you may be wondering whether the REST interface supports the HTTP POST method. The answer is yes; this method is used to send requests that are too big to be easily pasted in the query string of an HTTP GET request. These queries have to be wrapped into an XML document, the structure of which is defined in the eXist documentation. For instance, the query encountered in Listing 10-3 would need to be in the format shown in Listing 10-4.
LISTING 10-4: PostsWithWroxLinks.xml
<?xml version=”1.0” encoding=”UTF-8”?> <query xmlns=”http://exist.sourceforge.net/NS/exist”> <text> <![CDATA[ declare namespace x=”http://www.w3.org/1999/xhtml”; for $item in /post where .//x:a[contains(@href, 'wrox.com')] return <match> <id>{string($item/@id)}</id> {$item/title} {$item//x:a[contains(@href, 'wrox.com')]} </match> ]]> </text> </query>
Note how the query itself has been cautiously embedded within a CDATA section so that it qualifies as well-formed XML. To send this query using the REST interface, you can use curl and a -d option. The command looks like the following:
curl -d @linksToWrox.xml http://localhost:8080/exist/rest/db/
You’ve already seen four ways to interact with eXist, but there are many more out there. The following list briefly covers a few.
With so many options, how do you decide which one you should be using? Ask yourself whether it really matters. You can think of your eXist database as a black box that encapsulates your XML documents. These documents are located in collections that are similar to file directories. The black box acts like a filesystem with XQuery capabilities and provides a number of different interfaces to access the same set of documents in different ways. Whichever interface is used, the effect is the same. You can choose, case by case, the interface that is most convenient for the task you have to do. The following is a list of tips to help you decide:
The richness of this set of interfaces means that your documents will never be locked in the database and can remain accessible in any environment.
In this chapter you’ve learned about the following:
EXERCISES
You can find suggested answers to these questions in Appendix A.
WHAT YOU LEARNED IN THIS CHAPTER
TOPIC | KEY POINTS |
Storage needs | There is a big difference between relational data, data in a tabular format, and XML data. Therefore, special mechanisms are needed to store XML within relational systems. |
Essential features in databases. | XML needs to be stored in a native format, rather than as text. There must also be ways to query it for specific values and a way to return fragments of XML. Ideally there should also be a way to treat XML as tabular data if possible. |
Choosing an application | Most commercial relational databases have fairly advanced XML features, particularly Oracle and SQL Server. Native XML databases are designed to cope with the situation in which all data is held as XML. |
18.226.185.96