Reading XML Data from Databases

With SQL Server 2000, you have two basic ways to retrieve data as XML: you can use the XML extensions to the SELECT command, or you can execute a query on a particular text or BLOB field that is known to contain XML data. SQL Server does not mark these fields with a special attribute or data type to indicate that they contain XML data, however.

With the first technique, you typically use the FOR XML clause in a traditional query command. In response, the DBMS executes the query in two steps. First it executes the SELECT statement, and next it applies the FOR XML transformation to a rowset. The resulting XML is then sent to the client as a one-column­ rowset.

Note

Although specific to the OLE DB specification, the term rowset is often used generically to indicate a set of rows that contain columns of data. Rowsets are key objects that enable all OLE DB data providers to expose query result set data in a tabular format.


The FOR XML extensions let you consider XML mostly as a data output format. With the alternative technique for retrieving data as XML, you can store raw XML data in a text or BLOB field and retrieve that data using an ordinary query—preferably a scalar, single-field query. In both cases, the Microsoft ADO.NET object model, along with the Microsoft .NET Framework XML core classes, provide a number of handy features to extract XML data quickly and effectively.

SQL Server 2000 XML Extensions

The XML support in SQL Server 2000 provides URL-driven access to the database resources, XML-driven data management, and the possibility of using XPath queries to select data from relational tables. SQL Server 2000 does not create ad hoc storage structures for XML data. It does provide an ad hoc infrastructure for reading, writing, and querying relational data through the XML logical filter.

The following list gives you a bird’s-eye view of the key XML features available in SQL Server 2000 and its latest extension, SQLXML 3.0:

  • Access SQL Server through a URL.

    An ISAPI filter running on top of the Internet Information Services (IIS) allows you to directly query commands to SQL Server using HTTP. You simply point to a properly formatted URL, and what you get back is the result set data formatted as XML data.

  • Create XML schema-driven views of relational data.

    Similar to CREATE VIEW, this feature lets you represent a result set as an XML document written according to a given XML Schema Definition (XSD) or XML-Data Reduced (XDR) schema. You specify the mapping rules between the native fields and XML attributes and elements. The resultant XML document can be treated as a regular XML Document Object Model (XML DOM) object and queried using XPath expressions.

  • Return fetched data as XML.

    This feature is at the foundation of the entire XML support in SQL Server 2000. A database internal engine is capable of formatting raw column data into XML fragments and exposing those fragments as strings to callers. This capability is incorporated in the SELECT statement and can be controlled through a number of clauses and attributes.

  • Insert data represented as an XML document.

    Just as you can read relational data into hierarchical XML documents, you can write XML data into tables. The source document is preprocessed by a system stored procedure named sp_xml_preparedocument. The parsed document is then passed on to a special module—named OPENXML—that provides a rowset view of the XML data. At this point, to ordinary Transact-SQL (T-SQL) commands, XML native data looks like ordinary result sets.

SQLXML 3.0 is an extension to SQL Server 2000 designed to keep current with evolving W3C standards for XML and other requested functions. Available as a free download at http://www.msdn.microsoft.com/downloads, SQLXML 3.0 also provides a bunch of managed classes for exposing some of the functionalities to .NET Framework applications. SQLXML 3.0 includes the ability to expose stored procedures as Web services via the Simple Object Access Protocol (SOAP) and adds support for ADO.NET DiffGrams and client-side XML transformations.


XML Extensions to the SELECT Statement

In SQL Server 2000, you can query existing relational tables and return results as XML documents rather than as standard rowsets. The query is written and runs normally. If the SELECT statement contains a trailing FOR XML clause, the result set is then transformed into a string of XML text. Within the FOR XML clause, you can specify one of the XML modes described in Table 8-1.

Table 8-1. Modes of the FOR XML Extension
Mode Description
AUTO Returns query results as a sequence of <table> XML elements, where table is the name of the table. Fields are rendered as node attributes. If the additional ELEMENTS clause is specified, rows are rendered as child nodes instead of attributes.
EXPLICIT The query defines the schema of the XML document being returned.
RAW Returns query results as a sequence of generic <row> nodes with as many attributes as the selected fields.

The mode is valid only in the SELECT command for which it has been set. In no way does the mode affect any subsequent queries. XML-driven queries can be executed directly or from within stored procedures.

Tip

The XML data contains an XDR schema if you append the XMLDATA attribute to the FOR XML mode of choice, as shown here:

SELECT * FROM Employees FOR XML, XMLDATA

Schema information is incorporated in a <schema> node prepended to the document.


The FOR XML AUTO Mode

The AUTO mode returns data packed as XML fragments—that is, without a root node. The alias of the table determines the name of each node. If the query joins two tables on the value of a column, the resulting XML schema provides nested elements.

Let’s consider the following simple query:

SELECT CustomerID, ContactName FROM Customers FOR XML AUTO

The XML result set has the form shown here:

<Customers CustomerID="ALFKI" ContactName="Maria Anders" />
<Customers CustomerID="ANATR" ContactName="Ana Trujillo" />
...

Try now with a command that contains an INNER JOIN, as follows:

SELECT Customers.CustomerID, Customers.ContactName,
  Orders.OrderID
FROM Customers 
INNER JOIN Orders ON Customers.CustomerID = Orders.CustomerID
FOR XML AUTO

Interestingly, in this case the XML output automatically groups child records below the parent:

<Customers CustomerID="ALFKI" ContactName="Maria Anders">
  <Orders OrderID="10643"/>
  <Orders OrderID="10692"/>
  <Orders OrderID="10783"/>
  ...
</Customers>
<Customers CustomerID="ALFKI" ContactName="Ana Trujillo">
  <Orders OrderID="11459"/>
  <Orders OrderID="10987"/>
...
</Customers>
...

If the ELEMENTS attribute is also specified, the data rows are rendered in XML through elements rather than as attributes. Let’s consider the following query:

SELECT CustomerID, ContactName FROM Customers FOR XML AUTO,
  ELEMENTS

The XML output is similar to this:

<Customers>
  <CustomerID>ALFKI</CustomerID>
  <ContactName>Maria Anders</ContactName>
</Customers>
<Customers>
  <CustomerID>ANATR</CustomerID>
  <ContactName>Ana Trujillo</ContactName>
</Customers>
...

In the case of INNER JOINs, the output becomes the following:

<Customers>
  <CustomerID>ALFKI</CustomerID>
  <ContactName>Maria Anders</ContactName>
  <Orders>
    <OrderID>10643</OrderID>
  </Orders>
  <Orders>
    <OrderID>10692</OrderID>
  </Orders>
  ...
</Customers>
...

The FOR XML AUTO mode always resolves table dependencies in terms of nested rows. The overall XML stream is not completely well-formed. Instead of an XML document, the output is an XML fragment, making it easier for clients to concatenate more result sets into a single structure.

Note

If you also add the BINARY BASE64 option to a FOR XML query, any binary data that is returned will automatically be encoded using a base64 algorithm.


The FOR XML RAW Mode

As its name suggests, the FOR XML RAW mode is the least rich mode in terms of features and options. When designed using this mode, the query returns an XML fragment that, at a first glance, might look a lot like the fragment produced by the FOR XML AUTO option. You obtain an XML fragment made of <row> nodes with as many attributes as the columns. For example, consider the following simple query:

SELECT CustomerID, ContactName FROM Customers FOR XML RAW

The output is shown here:

<row CustomerID="ALFKI" ContactName="Maria Anders" />
<row CustomerID="ANATR" ContactName="Ana Trujillo" />
...

You can’t change the name of the node, nor can you render attributes as nested nodes. So far, so good—the RAW mode is only a bit less flexible than the AUTO mode. However, the situation changes when you use joined tables.

The schema of XML data remains intact even when you process multiple tables. The INNER JOIN statement from the previous section run in FOR XML RAW mode originates the following output:

<row CustomerID="ALFKI" ContactName="Maria Anders"
  OrderID="10643"/>
<row CustomerID="ALFKI" ContactName="Maria Anders"
  OrderID="10692"/>
<row CustomerID="ALFKI" ContactName="Maria Anders"
  OrderID="10783"/>
...

Even with the naked eye, you can see that the RAW mode produces a less optimized and more redundant output than the AUTO mode. The ELEMENTS clause is not supported in RAW mode, whereas XMLDATA and BINARY BASE64 are perfectly legitimate.

Limitations of FOR XML

The FOR XML clause is not valid in all cases in which a SELECT statement is acceptable. In general, FOR XML can be used only when the selection produces direct output going to the SQL Server client, whatever that output is. Let’s review a couple of common scenarios in which you can’t make use of the FOR XML clause. For a more complete overview, please refer to SQL Server’s Books Online.

FOR XML Can’t Be Used in Subselections

SQL Server 2000 allows you to use the output of an inner SELECT statement as a virtual table to which an outer SELECT statement can refer. The inner query can’t return XML data if you plan to use its output to perform further operations. For example, the following query is not valid:

SELECT * FROM (SELECT * FROM Employees FOR XML AUTO) AS t

Likewise, the FOR XML clause is not valid in a SELECT statement that is used to create a view. For example, the following statement is not allowed:

CREATE VIEW MyOrders AS 
    SELECT OrderId, OrderDate FROM Orders FOR XML AUTO

In contrast, you can select data from a view and return it as XML. In addition, FOR XML can’t be used with cursors.

FOR XML Can’t Be Used with Computed Columns

The current version of SQL Server does not permit GROUP BY and aggregate functions to be used with FOR XML AUTO. Aggregate functions and GROUP BY clauses can be safely used if the XML query is expressed in RAW mode, however. The following code returns the expected results:

SELECT min(unitprice) AS price, max(quantity) AS quantity 
FROM [order details] FOR XML RAW

The only caveat is that you must explicity name the computed columns using the AS keyword. The output is shown here:

<row price="2.0000" quantity="130" />

Table 8-1 mentioned a third FOR XML mode—the EXPLICIT mode. The EXPLICIT mode goes beyond the rather basic goals of both AUTO and RAW. It is designed to enable users to build a personal schema to render relational data in XML. The EXPLICIT mode is one of the ways that programmers have to create custom XML views of stored data.

Client-Side XML Formatting

SQLXML 3.0 extends the base set of SQL Server 2000 XML extensions by including client-side formatting capabilities in addition to the default server-side XML formatting. From within a .NET Framework application, you use SQLXML 3.0 managed classes (more on this in the section “SQLXML Managed Classes,” on page 386) to set up a command that returns XML data.

When the command executes, the managed classes—at least in this version of the SQLXML library—call into a middle-tier OLE DB provider (SQLXMLOLEDB) object, which in turn calls into the OLE DB provider for SQL Server. The command that hits the database does not contain the FOR XML clause. When the rowset gets back to the SQLXMLOLEDB provider, it is transformed into XML according to the syntax of the FOR XML clause and returned to the client. Figure 8-1 compares server-side and client-side XML formatting.

Figure 8-1. The client-side XML formatting feature of SQLXML 3.0 makes use of intermediate OLE DB providers to execute the query and transform the results.


As you’d expect, the two intermediate OLE DB providers cause more performance problems than they ever attempt to resolve. On the other hand, SQLXML 3.0 is not specifically designed for the .NET Framework, although it contains a few managed classes that we’ll look at in the section “SQLXML Managed Classes,” on page 386. In a nutshell, keep in mind that SQLXML 3.0 provides client-side XML formatting but that this feature is rather inefficient. For .NET Framework applications, a much better approach for client-side XML rendering is represented by the XmlDataDocument class. (See the section “The XmlDataDocument Class,” on page 372.)

Creating XML Views

Just as a CREATE VIEW statement in SQL lets you create a virtual table by collecting columns from one or more tables, an XML view provides an alternative and highly customizable way to present relational data in XML.

Building an XML view consists of defining a custom XML schema and mapping to its elements the columns and the tables selected by the query. Once built, an XML view can be used like its close cousin, SQL view. In particular, an XML view can be queried using XPath expressions and transformed using XSL Transformation (XSLT) scripts. An XML view is simply a stream of XML data and can be used as allowed by .NET. In the .NET Framework, you can use XML views through readers, XML DOM, or even specialized classes, such as those in SQLXML 3.0.

There are two possible ways to create XML views: you can use the FOR XML EXPLICIT mode of the SELECT statement, or you can build an annotated XDR or XSD schema. To use an XSD schema, you must install SQLXML 3.0 first.

The FOR XML EXPLICIT Mode

The query defines the shape of the generated XML document. The ultimate goal of the query is making hierarchical data fit into a tabular rowset. An EXPLICIT mode query creates a virtual table in which all the information fetched from the tables is organized in such a way that it can then be easily rendered in XML. The definition of the schema is free, and of course, programmers must ensure that the final output is well-formed XML.

Any FOR XML EXPLICIT query requires two extra metacolumns, named Tag and Parent. The values in these columns are used to generate the XML hierarchy. The Tag column contains a unique numeric index for each XML root node that is expected to have children in the XML schema. The Parent column contains a tag value that links a given node to a particular, and previously defined, subtree.

To add columns, you must use a relatively complex syntax for column aliases. Each selected column must have an alias defined according to the following syntax:

SELECT column_name AS [ParentNode!ParentTag!TagName!Directive]

The ParentNode item represents the name of the node element that is expected to be the parent of the column data. The ParentTag is the tag number of the parent. The TagName item indicates the name of the XML element that contains the column data. Finally, the Directive element can take various values, the most common ones being no value or element. If no value is specified, the column data is rendered as an attribute named TagName; otherwise, it will be an element attribute.

It’s interesting to note that an EXPLICIT mode query consists of one or more tables that result from SELECT statements potentially involving multiple tables and joined data. Let’s see what’s needed to obtain the following XML representation of the rows in the Northwind database’s Employees table:

								<Employee id="employeeid"
  name="titleOfCourtesy lastname, firstname">
  <PersonalData>
    <Birth>birthdate</Birth>
    <City>city</City>
  </PersonalData>
  <JobData>
    <Hired>hiredate</Hired>
    <Title>title</Title>
  </JobData>
  <Notes>notes</Notes>
</Employee>

The boldface lines in this code represent the roots of the three subtrees of XML data being created. Each subtree corresponds to a different tag, and each must be filled by resorting to a different SELECT statement.

To begin filling the subtrees, consider the following query:

SELECT 1 AS Tag, 
       NULL AS Parent,
       employeeid AS [Employee!1!ID],
       lastname AS [Employee!1!Name]

This statement fills in the first tag—the fragment’s root—which has no parent and contains two attributes, ID and Name. The employeeid and the lastname columns will fill respectively the ID and the Name attributes of an <Employee> node with no parent.

The first table always defines the structure of the XML view. Successive tables can only fill in holes—nothing new will be added. Consequently, to obtain the previous schema, you must write the first tag as follows:

SELECT 1 AS Tag, 
  NULL AS Parent,
  employeeid AS [Employee!1!ID],
  titleofcourtesy + ‘ ‘ + lastname + ‘, ‘ + firstname 
       AS [Employee!1!Name],
  NULL AS [PersonalData!2!BirthDate!element],
  NULL AS [PersonalData!2!City!element],
  NULL AS [JobData!3!HireDate!element],
  NULL AS [JobData!3!Title!element],
  lastname AS [Employee!1!Notes!element]
FROM Employees

The columns with NULL values will be selected by successive queries. In particular, you’ll notice PersonalData and JobData trees with tag IDs of 2 and 3, respectively. The former contains a pair of BirthDate and City elements. The latter holds elements named Title and HireDate.

To unify all the subtables, you must use the UNION ALL statement. The complete statement is shown here:

SELECT 
  1 AS Tag, 
  NULL AS Parent,
  employeeid AS [Employee!1!ID],
  titleofcourtesy + ‘ ‘ + lastname + ‘, ‘ + firstname 
       AS [Employee!1!Name],
  NULL AS [PersonalData!2!BirthDate!element],
  NULL AS [PersonalData!2!City!element],
  NULL AS [JobData!3!HireDate!element],
  NULL AS [JobData!3!Title!element],
  lastname AS [Employee!1!Notes!element]
FROM Employees

UNION ALL
SELECT 
  2, 1,
  employeeid,
  titleofcourtesy + ‘ ‘ + lastname + ‘, ‘ + firstname,
  birthdate,
  city,
  hiredate,
  title,
  notes
FROM Employees

UNION ALL
SELECT 
  3, 1,
  employeeid,
  titleofcourtesy + ‘ ‘ + lastname + ‘, ‘ + firstname,
  birthdate,
  city,
  hiredate,
  title,
  notes
FROM Employees
ORDER BY [Employee!1!ID]
FOR XML EXPLICIT

The T-SQL UNION ALL operator combines the results of two or more SELECT statements into a single result set. All participating result sets must have the same number of columns, and corresponding columns must have compatible data types.

Using an Annotated Mapping Schema

A more lightweight alternative to FOR XML EXPLICIT views is the annotated schema. SQL Server 2000 lets you create XML views by defining an XDR schema with special annotations that work like placeholders for selected data. Basically, instead of defining the schema using a new syntax and combining multiple virtual tables, you use a standard XML data definition language and map elements to columns using ad hoc annotations.

The base version of SQL Server 2000 supports only XDR. If you want to use XSD, you must install SQLXML 3.0. (To review the differences between XDR and XSD, see Chapter 3)

The following listing shows a simple XSD annotated schema that defines an <Employee> node with a couple of child nodes—<FirstName> and <LastName>:

<xsd:schema 
   xmlns:xsd="http://www.w3.org/2001/XMLSchema"
   xmlns:sql="urn:schemas-microsoft-com:mapping-schema">
   <xsd:element name="Employee" sql:relation="Employees" >
     <xsd:complexType>
       <xsd:sequence>
          <xsd:element name="FName" 
             sql:field="FirstName" type="xsd:string" /> 
          <xsd:element name="LName"  
             sql:field="LastName" type="xsd:string" />
       </xsd:sequence>
       <xsd:attribute name="EmployeeID" type="xsd:integer" />
    </xsd:complexType>
  </xsd:element>
</xsd:schema>

The annotations sql:relation and sql:field facilitate the mapping between the source table and the resulting XML data. In particular, sql:relation indicates that the given node is related to the specified table. The sql:field annotation indicates the column that should be used to populate the given element. If no sql:field annotation is provided, SQL Server expects to find a perfect match between the element or attribute name and a column. In the preceding schema, the EmployeeID attribute is linked directly by name.

Note

Annotated schemas do not allow you to use expressions when selecting columns. The sql:field annotation can accept only the name of an existing column; it can’t accept an expression that evaluates to a column name.


Are XML Views Effective?

The FOR XML EXPLICIT clause and annotated schemas are two somewhat equivalent ways to query relational tables and return data formatted according to a particular XML schema. XSD mapping is more powerful than XDR, but all in all, in terms of raw functionalities, explicit and schema mapping are two nearly identical options for building XML views.

Certainly the FOR XML EXPLICIT clause can lead to hard-to-maintain code, whereas annotated schemas are probably easier to read and maintain and, in addition, keep the schema distinct from the query and the data.

The real XML mapping schema issue is this: What’s the added value that XML views bring to your code? Are you sure that the ability to execute XPath queries justifies the creation of an XML view? The XPath query engine is certainly inferior to the SQL Server’s query engine, at least for complex queries like the ones you might need to perform on real-world data. In addition, for read/write solutions, writing data back to the native relational tables can be less than effective if done through XML. We’ll return to this topic when we look at the OPENXML provider in the section “The OPENXML Rowset Provider,” on page 376.

One scenario in which reading relational data as XML turns out to be really and clearly effective is when you need to turn fetched data into more manageable or easily interoperable structures. If you need to exchange an invoice document with commercial partners, using an XML representation of the data is certainly useful, because you process data in an intermediate, platform-independent and application-independent format, while preserving the ability to create views and perform queries locally. In addition, having the database return and accept XML data with a custom layout can only help considerably.

In this scenario, another reasonable step you might need to take is transforming the XML data into high-level data structures such as classes. For .NET Framework applications, XML serialization is key technology that you must absolutely be familiar with. We’ll examine XML serialization in Chapter 11.

Let’s look now at how ADO.NET and XML classes can be used to read and process relational data expressed in a hierarchical shape.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
3.145.56.28