XML support in SQL Server

SQL Server in version 2005 also started to feature extended support for XML data inside the database engine, although some basic support was already included in version 2000. The support starts by generating XML data from tabular results. You can use the FOR XML clause of the SELECT statement for this task.

The first option for creating XML from a query result is the RAW option. The XML created is quite close to the relational (tabular) presentation of the data. In raw mode, every row from returned rowsets converts to a single element named row, and columns translate to the attributes of this element.

The FOR XML AUTO option gives you nice XML documents with nested elements, and it is not complicated to use. In auto and raw modes, you can use the keyword ELEMENTS to produce element-centric XML. The WITH NAMESPACES clause, preceding the SELECT part of the query, defines namespaces and aliases in the returned XML.

With the last two flavors of the FOR XML clause—the EXPLICIT and PATH option—you can manually define the XML returned. With these two options, you have total control of the XML document returned. The explicit mode is included for backward compatibility only; it uses proprietary T-SQL syntax for formatting XML. The path mode uses standard XML XPath expressions to define the elements and attributes of the XML you are creating.

In raw and auto modes, you can also return the XSD schema of the document you are creating. This schema is included inside XML returned, before the actual XML data; therefore, it is called inline schema. You return XSD with the XMLSCHEMA directive. This directive accepts a parameter that defines a target namespace. If you need schema only, without data, simply include a WHERE condition in your query with a predicate that no row can satisfy.

The following query generates an XML document from the regular tabular result set by using the FOR XML clause with AUTO option, to generate element-centric XML instance, with namespace and inline schema included:

SELECT c.[Customer Key] AS CustomerKey, 
  c.[WWI Customer ID] AS CustomerId, 
  c.[Customer],  
  c.[Buying Group] AS BuyingGroup, 
  f.Quantity, 
  f.[Total Excluding Tax] AS Amount, 
  f.Profit 
FROM Dimension.Customer AS c 
  INNER JOIN Fact.Sale AS f 
    ON c.[Customer Key] = f.[Customer Key] 
WHERE c.[Customer Key] IN (127, 128) 
FOR XML AUTO, ELEMENTS,  
  ROOT('CustomersOrders'), 
  XMLSCHEMA('CustomersOrdersSchema'); 
GO

Here is the partial result of this query. The first part of the result is the inline schema:

    <CustomersOrders>
      <xsd:schema targetNamespace="CustomersOrdersSchema" ...
        <xsd:import namespace="http://schemas.microsoft.com/sqlserver/2004/sqltypes" ...
        <xsd:element name="c">
          <xsd:complexType>
            <xsd:sequence>
              <xsd:element name="CustomerKey" type="sqltypes:int" />
              <xsd:element name="CustomerId" type="sqltypes:int" />
              <xsd:element name="Customer">
                <xsd:simpleType>
                  <xsd:restriction base="sqltypes:nvarchar" ...
                    <xsd:maxLength value="100" />
                  </xsd:restriction>
                </xsd:simpleType>
              </xsd:element>
              ...
            </xsd:sequence>
          </xsd:complexType>
        </xsd:element>
      </xsd:schema>
      <c xmlns="CustomersOrdersSchema">
        <CustomerKey>127</CustomerKey>
        <CustomerId>127</CustomerId>
        <Customer>Tailspin Toys (Point Roberts, WA)</Customer>
        <BuyingGroup>Tailspin Toys</BuyingGroup>
        <f>
          <Quantity>3</Quantity>
          <Amount>48.00</Amount>
          <Profit>31.50</Profit>
        </f>
        <f>
          <Quantity>9</Quantity>
          <Amount>2160.00</Amount>
          <Profit>1363.50</Profit>
        </f>
      </c>
      <c xmlns="CustomersOrdersSchema">
        <CustomerKey>128</CustomerKey>
        <CustomerId>128</CustomerId>
        <Customer>Tailspin Toys (East Portal, CO)</Customer>
        <BuyingGroup>Tailspin Toys</BuyingGroup>
        <f>
          <Quantity>84</Quantity>
          <Amount>420.00</Amount>
          <Profit>294.00</Profit>
        </f>
      </c>
      ...
    </CustomersOrders>

You can also do the opposite process: convert XML to tables. Converting XML to relational tables is known as shredding XML. You can do this by using the nodes() method of the XML data type, or with the OPENXML() rowset function.

Inside SQL Server, you can also query the XML data from Transact-SQL to find specific elements, attributes, or XML fragments. XQuery is a standard language for browsing XML instances and returning XML. It is much richer than XPath expressions, an older standard, which you can use for simple navigation only. With XQuery, you can navigate as with XPath; however, you can also loop over nodes, shape the returned XML instance, and much more.

For a query language, you need a query-processing engine. The SQL Server database engine processes XQuery inside T-SQL statements through XML data type methods. Not all XQuery features are supported in SQL Server. For example, XQuery user-defined functions are not supported in SQL Server because you already have T-SQL and CLR functions available. Additionally, T-SQL supports nonstandard extensions to XQuery, called XML DML, that you can use to modify elements and attributes in XML data. Because an XML data type is a large object, it could be a huge performance bottleneck if the only way to modify an XML value were to replace the entire value.

The real power of XQuery lies in so-called FLWOR expressions. FLWOR is the acronym for for, let, where, order by, and return. A FLWOR expression is actually a for each loop. You can use it to iterate through a sequence returned by an XPath expression. Although you typically iterate through a sequence of nodes, you can use FLWOR expressions to iterate through any sequence. You can limit the nodes to be processed with a predicate, sort the nodes, and format the returned XML. The parts of a FLWOR statement are:

For: With a for clause, you bind iterator variables to input sequences. Input sequences are either sequences of nodes or sequences of atomic values. You create atomic value sequences using literals or functions.
Let: With the optional let clause, you assign a value to a variable for a specific iteration. The expression used for an assignment can return a sequence of nodes or a sequence of atomic values.
Where: With the optional where clause, you filter the iteration.
Order by: Using the order by clause, you can control the order in which the elements of the input sequence are processed. You control the order based on atomic values.
Return: The return clause is evaluated once per iteration, and the results are returned to the client in the iteration order. With this clause, you format the resulting XML.

You can store XML instances inside SQL Server database in a column of the XML data type. An XML data type includes five methods that accept XQuery as a parameter. The methods support querying (the query() method), retrieving atomic values (the value() method), existence checks (the exist() method), modifying sections within the XML data (the modify() method) as opposed to overriding the whole thing, and shredding XML data into multiple rows in a result set (the nodes() method).

The following code creates a variable of the XML data type to store an XML instance in it. Then it uses the query() method to return XML fragments from the XML instance. This method accepts XQuery query as a parameter. The XQuery query uses the FLWOR expressions to define and shape the XML returned:

DECLARE @x AS XML; 
SET @x = N' 
<CustomersOrders> 
  <Customer custid="1"> 
    <!-- Comment 111 --> 
    <companyname>CustA</companyname> 
    <Order orderid="1"> 
      <orderdate>2016-07-01T00:00:00</orderdate> 
    </Order> 
    <Order orderid="9"> 
      <orderdate>2016-07-03T00:00:00</orderdate> 
    </Order> 
    <Order orderid="12"> 
      <orderdate>2016-07-12T00:00:00</orderdate> 
    </Order> 
  </Customer> 
  <Customer custid="2"> 
    <!-- Comment 222 -->   
    <companyname>CustB</companyname> 
    <Order orderid="3"> 
      <orderdate>2016-07-01T00:00:00</orderdate> 
    </Order> 
    <Order orderid="10"> 
      <orderdate>2016-07-05T00:00:00</orderdate> 
    </Order> 
  </Customer> 
</CustomersOrders>'; 
SELECT @x.query('for $i in CustomersOrders/Customer/Order 
                 let $j := $i/orderdate 
                 where $i/@orderid < 10900 
                 order by ($j)[1] 
                 return  
                 <Order-orderid-element> 
                  <orderid>{data($i/@orderid)}</orderid> 
                  {$j} 
                 </Order-orderid-element>') 
       AS [Filtered, sorted and reformatted orders with let clause];

Here is the result of the previous query:

    <Order-orderid-element>
      <orderid>1</orderid>
      <orderdate>2016-07-01T00:00:00</orderdate>
    </Order-orderid-element>
    <Order-orderid-element>
      <orderid>3</orderid>
      <orderdate>2016-07-01T00:00:00</orderdate>
    </Order-orderid-element>
    <Order-orderid-element>
      <orderid>9</orderid>
      <orderdate>2016-07-03T00:00:00</orderdate&gt;
    </Order-orderid-element>
    <Order-orderid-element>
      <orderid>10</orderid>
      <orderdate>2016-07-05T00:00:00</orderdate>
    </Order-orderid-element>
    <Order-orderid-element>
      <orderid>12</orderid>
      <orderdate>2016-07-12T00:00:00</orderdate>
    </Order-orderid-element>

XML data type is actually a large object type. Scanning through the XML data sequentially is not a very efficient way of retrieving a simple scalar value. With relational data, you can create an index on a filtering column, allowing an index seek operation instead of a table scan. Similarly, you can index XML columns with specialized XML indexes. The first index you create on an XML column is the primary XML index. This index contains a shredded persisted representation of the XML values. For each XML value in the column, the index creates several rows of data. The number of rows in the index is approximately the number of nodes in the XML value. Such an index alone can speed up searches for a specific element by using the exist() method. After creating the primary XML index, you can create up to three other types of secondary XML indexes:

PATH: This secondary XML index is especially useful if your queries specify path expressions. It speeds up the exist() method better than the primary XML index. Such an index also speeds up queries that use value() for a fully specified path.
VALUE: This secondary XML index is useful if queries are value-based and the path is not fully specified, or it includes a wildcard.
PROPERTY: This secondary XML index is very useful for queries that retrieve one or more values from individual XML instances using the value() method. The primary XML index has to be created first. It can be created only on tables with a clustered primary key.

Table of Contents for XML support in SQL Server

Create new playlist

Sign In

Sign Up

Table of Contents for
XML support in SQL Server