9. XML in the Database: The XML Data Type

SQL SERVER 2005 introduces a new scalar data type, the XML data type. XML is a first-class data type in SQL Server now, and this has a wide-ranging impact on the use of XML.

The XML Data Type

A new type of data has gained popularity in recent years: XML. XML has evolved from a simple data transfer format to a data model that includes its own schema-definition vocabulary, XSD, as well as query languages. In this chapter, we’ll look at the XML data type and see how it differs from conventional CLOB (character large object field, called TEXT field in SQL Server) storage of an XML document.

You can use the XML data type like any other data type in SQL Server. It can be used in the following ways:

• As a column in a table

• As a variable in Transact-SQL (T-SQL)

• As a stored procedure or user-defined function parameter

• As a user-defined function return value

The XML type is quite similar, but not identical, to the distinct type defined by SQL:1999 and discussed in Chapter 1. Although the serialized form of XML is similar in appearance to a CLOB (VARCHAR(MAX)), you convert it to and from a VARCHAR type, rather than assign it. Like distinct types, the XML data type cannot be compared with other data types without being cast or converted, but unlike distinct types, two instances of an XML data type cannot be compared at all.

Like a SQL:1999 distinct type, the XML type has its own methods; these methods enable the use of an alternative query language, XQuery. The data in an XML type does not follow the relational data model but is based on an extended XML Infoset model, which is used to model structured—that is, hierarchical—data.

A column that is defined as being of type XML stores its data in the database itself. The column is not a pointer to an XML document on the file system. This means that XML data is included in the backup and restore process; is subject to ordinary SQL Server security (and some extensions to security, as we’ll mention later); and participates in transactions, constraints, and logging. Having XML data inside a relational database may offend some relational purists, but it means that your data lives in a single repository for reasons of administration, reliability, and control.

Using XML Data Type in Tables

Let’s begin by using the XML data type to define a column in a table. The following DDL statement creates a table:

CREATE TABLE xml_tab (
   the_id INTEGER PRIMARY KEY,
   xml_col XML)

Note that you can also have a table that consists of only a single XML data type column; the XML data type cannot be used itself as a primary key, however. Later, we will see that you can create an XML-specific index on this column, but this index will not be used in SQL comparisons; it will be used to improve the performance of the functions associated with an XML data type. In addition to the previous example that created a single XML type column in a table, you can have tables that contain more than one XML data type column. You can create a table with an XML column in local or global temporary tables as well as ordinary tables. An XML data type column can be used in a VIEW as well.

XML data type columns have certain limitations when used in tables:

• They may not be declared as a PRIMARY KEY in a table.

• They may not be declared as a FOREIGN KEY in a table.

• They may not be declared with a UNIQUE constraint.

• They may not be declared with the COLLATE keyword.

These first three limitations exist because individual instances of the XML data type may not be compared with one another. Although it would not be difficult to perform a string comparison with XML data, it would be an order of magnitude more difficult to perform a comparison at the Infoset level. The two XML documents in the following example are Infoset equivalent but not lexically equivalent:

<!-- These two documents are equivalent -->
<doc1>
   <row au_id="111-11-1111"/>
</doc1>

<doc1>
   <row au_id='111-11-1111'></row>
</doc1>

The XML data type cannot use the COLLATE keyword, because XML provides its own encoding via the encoding attribute on the XML document declaration. The following code is an example of using encoding. If the document uses an encoding that is unknown or unsupported, SQL Server will return an error. An error will also result if the content does not match the specified encoding. The encoding specifies UTF-8 but contains characters encoded as Unicode, for example.

-- This works correctly
INSERT xml_tab VALUES(1,

'<?xml version="1.0" encoding="utf-8"?>
<doc1>
   <row au_id="111-11-1111"/>
</doc1>')

-- This fails, cannot switch encoding
-- encoding does not match character type (Unicode) being converted
INSERT xml_tab VALUES(1,

N'<?xml version="1.0" encoding="utf-8"?>
<doc1>
   <row au_id="111-11-1111"/>
</doc1>')


-- This fails with an unknown encoding error
INSERT xml_tab VALUES(1,

'<?xml version="1.0" encoding="i-bogus"?>
<doc1>
   <row au_id='111-11-1111'></row>
</doc1>')

Although the encoding specified in the XML document declaration is taken into consideration when storing the document, documents are always physically stored as UTF-16. The XML data type’s internal format is an opaque binary format. XML is tokenized before being stored, producing a certain amount of compression, although compression is not the primary purpose of the binary format. The XML data type column has the same size limit as the MAX data types discussed in Chapter 7; it can be up to 2GB. Because of the tokenization, you may be able to store more than 2GB of XML text in the column and, in rare instances, less than 2GB of text. You can obtain the actual length, in bytes, of an XML data type instance by using the DATALENGTH system function. Columns of the XML data type are stored in-row by default like the MAX data types, and this option is may be overridden by the "large value types out of row" option on the table. The XML data can also reside on a different physical filegroup from the rest of the data in the table.

XML data type columns can have NULL constraints (the default nullability is the current default of your SQL Server session) and DEFAULT values. Column- and table-level CHECK constraints based on the XML Infoset model are supported, although constraints using methods that are specific to the XML type must be wrapped in a UDF. Although we’ll talk about the methods in more detail in Chapter 10, a typical CHECK constraint on an XML type is shown in Listing 9-1.

Listing 9-1. XML CHECK constraints must be encapsulated in a UDF


-- pdoc must have a person element
-- as a child of the people root
-- create the wrapper UDF
CREATE FUNCTION check_for_person (@thedoc XML)
RETURNS BIT
AS
BEGIN
-- function that uses XQuery, see next chapter
RETURN @thedoc.exist('/people/person')
END
GO
CREATE TABLE xmltab(
  id INTEGER PRIMARY KEY,
  pdoc XML CHECK(dbo.check_for_person(pdoc)=1)
)


Because XML data types follow the XML Infoset data model, they are usually not constrained by relational constraints but by a collection of one or more XML schemas. SQL Server’s XML data type supports schema validation. We will explore this in detail in the next section.

The XML data type supports an implicit conversion from any character or national character data type, including CHAR, VARCHAR, NCHAR, and NVARCHAR, but not from other SQL Server data types. You can use CAST or CONVERT to convert from BINARY, VARBINARY, TEXT, NTEXT, and data types to the XML data type for storage as well. Casting from TEXT and NTEXT is permitted to enable forward compatibility for users who stored their XML data in these data types in previous versions of SQL Server. Casting from the BINARY data types is useful for features like SQL Server Service Broker that can communicate using binary or XML. In addition, you can store a SQL_VARIANT data type in an XML data type table after casting it to a character-based type. Listing 9-2 shows inserting rows into a table containing an XML data type.

Listing 9-2. Inserting rows into a table with an XML data type


CREATE TABLE xml_tab(
  the_id INTEGER PRIMARY_KEY IDENTITY,
  xml_col XML)
GO

-- these work fine

INSERT INTO xml_tab VALUES('<doc1></doc1>')
INSERT INTO xml_tab VALUES(N'<doc1></doc1>')

-- so does this (if first cast to varchar/nvarchar)
DECLARE @v SQL_VARIANT
SET @v = N'<someotherdoc></someotherdoc>'
INSERT INTO xml_tab VALUES(CAST(@v AS varchar(max)))

-- this fails at the insert statement
DECLARE @SOMENUM FLOAT
SET @SOMENUM = 3.1416
INSERT INTO xml_tab VALUES(CAST(@SOMENUM as XML))


Although we’ve stored only well-formed XML documents in the XML data type column so far, we can also use this column to store document fragments or top-level text nodes. This is useful because we can store the results of a SELECT...FOR XML query or XQuery results, which may not be complete documents, in an XML data type column. The documents or fragments must abide by XML well-formedness rules, however. Listing 9-3 shows some examples.

Listing 9-3. Inserting XML data into a table


CREATE TABLE xml_tab(
  the_id INTEGER PRIMARY_KEY IDENTITY,
  xml_col XML)
GO

-- ok, complete document
INSERT INTO xml_tab VALUES('<doc2></doc2>')

-- ok, document fragment
-- though it's not a single well-formed document
-- (two root elements)
INSERT INTO xml_tab VALUES('<doc1></doc1><doc2></doc2>')

-- ok, text node/atomic value
INSERT INTO xml_tab VALUES('The Window and Shade Store')

-- error, not well-formed
INSERT INTO xml_tab VALUES('The Window & Shade Store')

-- error, not well-formed
INSERT INTO xml_tab VALUES('<doc1><doc2></doc1></doc2>')


There is no implicit conversion from the XML data type to any other data types, but the CAST and CONVERT operators do convert between any of the character or National character data types, as well as BINARY and VARBINARY. When you cast from BINARY and VARBINARY, you can either specify the encoding in the XML itself or include the beginning byte-order mark (0xFFFE) if the format is Unicode. When you cast to BINARY or VARBINARY, the XML will be cast to UTF-16 with the byte-order mark present. You can cast a TEXT or NTEXT data type instance to an instance of the XML type, but you cannot cast an instance of the XML type to TEXT or NTEXT. Using one of the special methods of the XML data type (the value method, discussed later in this chapter) can produce different SQL data types. Listing 9-4 shows retrieving data from an XML data type column.

Listing 9-4. Returning data from an XML data type column


CREATE TABLE xml_tab(
  the_id INTEGER PRIMARY_KEY IDENTITY,
  xml_col XML)
GO

INSERT INTO xml_tab VALUES('<doc2></doc2>')
INSERT INTO xml_tab VALUES(N'<doc2></doc2>')
GO

-- both rows' values are cast to the same data type
SELECT CAST(xml_col as NCHAR(2000)) FROM xml_tab
GO

-- illegal, no cast to SQL_VARIANT
SELECT CAST(xml_col as SQL_VARIANT) FROM xml_tab
GO


Because the values of two XML data type instances cannot be compared (except by using IS NULL), you cannot use the XML data type in SQL predicates or SQL clauses that require comparison, such as GROUP BY and ORDER BY. The XML data type also cannot be used in any scalar function or aggregate where comparison is required. Because the XML data type is castable to any character-based type, however, this functionality will work if CAST (or CONVERT) is used. This behavior is exactly the behavior specified for a distinct data type by the SQL:1999 specification. Although comparison of the XML document type as a string is risky, notice that in the preceding example, both '<doc2/>' and '<doc2></doc2>' are “converted” to the same lexical form, '<doc2/>', when the CAST or CONVERT function is used. The XML data type does retain Infoset fidelity but does not guarantee lexical fidelity. Listing 9-5 illustrates what you can and cannot do with the XML data type in SQL statements.

Listing 9-5. Using the XML data type in SQL statements


-- assume the same xml_tab as in previous examples

-- comparison to NULL works
SELECT the_id FROM xml_tab
  WHERE xml_col IS NULL

-- illegal
SELECT xml_col FROM xml_tab
  GROUP BY xml_col

SELECT xml_col FROM xml_tab
  ORDER BY xml_col

SELECT xml_col FROM xml_tab
  WHERE xml_col = '<doc2/>'

-- fails, no implicit conversion to character type
SELECT SUBSTRING(xml_col,1,2) FROM xml_tab

-- casting to string allows this to work
SELECT xml_col from xml_tab
  WHERE CAST(xml_col AS VARCHAR) = '<doc2/>'


Using XML Data Variables and Parameters

SQL Server 2005 allows you to use the XML data type as a normal scalar variable in T-SQL. You can assign XML documents or fragments to the variable as you would any other variable, as shown in this example:

-- declare a variable of XML data type
DECLARE @x XML

-- implicit cast (must be a valid XML document or fragment)
SET @x = '<doc1><name>Bob</name></doc1>'

-- use it
INSERT xml_tab VALUES(@x)

Just as with an XML column, variables of the XML data type can be used as input to an assignment statement. Variables of the XML data type have the same processing limits as columns of the XML data type; they may not be used in place of a string in scalar functions such as SUBSTRING, in comparisons, in ORDER BY or GROUP BY clauses in dynamic SQL, or as parameters without first being cast or converted to a character data type. Stored procedures or user-defined function parameters and user-defined function return codes may be XML types. This allows you to return dynamic XML to the user based on logical operations, such as in the following example:

-- create the user-defined function
CREATE FUNCTION my_business_logic(
  @in_dept INTEGER
)
RETURNS XML
AS
BEGIN
  DECLARE @x XML
  -- do some business logic that produces an XML document
  RETURN @x
END
GO

-- now call it, using dept 10
SELECT dbo.my_business_logic(10)

Note that the XML return code is a scalar type rather than a TABLE type.

As with XML data type columns and variables, procedure parameters and return codes can be declared with a schema collection name and used to ensure schema validity. Although being able to schema-validate input parameters may obviate the requirement for a lot of domain-checking of input, we can still use the fact that we can do processing inside procedures to make the XML a lot more dynamic.

Later in this chapter, we’ll see how using and producing XML based on dynamic rules and being able to pass in XML as just another data type can be used in conjunction with the new extended features of the composition and decomposition functions, SELECT...FOR XML and OpenXML.

Typed and Untyped XML: Cataloguing and Using XML SCHEMA COLLECTIONs

In addition to storing untyped XML documents or fragments in an XML data type column, you can use SQL Server to validate your XML data type column, variable, or parameter by associating it with an XML SCHEMA COLLECTION. Therefore, you can think of XML data types as being either schema-validated (containing data types defined by a specific set of XML schemas) or untyped (containing any well-formed XML). Whether your XML type is typed or untyped, it can still contain documents or fragments, because fragments can also be schema valid. In addition, when you define an XML type to be schema validated, you can also specify that it can contain only XML documents, or XML documents or fragments (known as XML content).

XML schemas define a series of data types that exist in a particular namespace. Schemas are themselves well-formed schema-compliant XML documents, just as relational table definitions and other DDL are valid T-SQL. Although there can be more than one schema definition document for a particular namespace, a schema definition document defines types in only one namespace. The XML Schema Definition language (XSD) defines a standard set of base types that are supported as types in XML documents, just as the SQL:1999 standard defines a set of base types that relational databases must support. The XSD base data types are shown in Figure 9-1.

Figure 9-1. XSD base data types

image

Schema documents may also import types from other namespaces using the import element. There are also some special namespaces that are “imported” by default when using SQL Server’s XML type. Two of the most important ones are

• The http://www.w3.org/2001/XMLSchemanamespace. This namespace defines the constructs (elements and attributes) used in XML schema documents.

• The http://www.w3.org/2001/XMLSchema-instancenamespace. This namespace defines the constructs to be used in XML documents that are not schemas.

These usually are assigned the namespace prefixes xs and xsi, respectively, although a lot of other schema-manipulating products will use the xsd prefix rather than xs. A complete explanation of the XML Schema specification is beyond the scope of this book, but Listing 9-6 illustrates the main concept.

Listing 9-6. A simple XML schema


<!-- defines types for the namespace "http://example.org/People"
     This is known as the targetNamespace but does not indicate
     The location of the schema document -->
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            xmlns:tns="http://example.org/People"
            targetNamespace="http://example.org/People" >

   <xsd:simpleType name="personAge" >
     <xsd:restriction base="xsd:float" >
       <xsd:maxInclusive value="120" />
       <xsd:minExclusive value="0" />
     </xsd:restriction>
   </xsd:simpleType>

   <xsd:element name="age" type="tns:personAge" />

</xsd:schema>


Note that an XSD schema includes some schema elements, such as max-Inclusive and minExclusive, that serve to constrain or restrict the base data types. This constraint process is known as derivation by restriction, and the schema elements and attributes that act as constraints are known as facets.

Although there is an xsi:schemaLocation attribute that can be helpful in locating arbitrary schema documents in an instance document, the XML schema specification does not mandate an algorithm by which an XML document locates its schema. SQL Server 2005 stores schema documents inside the database and keeps track of them based on the schema collection; it doesn’t use xsi:schemaLocation.

SQL Server XML SCHEMA COLLECTIONs

Schema documents are catalogued in SQL Server as part of a named XML SCHEMA COLLECTION by means of the CREATE XML SCHEMA COLLECTION DDL statement:

USE pubs
GO

CREATE XML SCHEMA COLLECTION peoplecoll AS
' <xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
   <!-- other types omitted -->
   <xsd:simpleType name="personAge" >
     <xsd:restriction base="xsd:float" >
       <xsd:maxInclusive value="120" />
       <xsd:minExclusive value="0" />
     </xsd:restriction>
   </xsd:simpleType>
   <xsd:element name="age" type="personAge" />
</xsd:schema>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            xmlns:tns="http://example.org/LogansRun"
            targetNamespace="http://example.org/LogansRun" >
   <!-- other types omitted -->
   <xsd:simpleType name="personAge" >
     <xsd:restriction base="xsd:float" >
       <xsd:maxInclusive value="30" />
       <xsd:minExclusive value="0" />
     </xsd:restriction>
   </xsd:simpleType>
   <xsd:element name="age" type="tns:personAge" />
</xsd:schema>'

Note that an XML SCHEMA COLLECTION has a SQL Server object name (dbo.peoplecoll, in this case, because XML SCHEMA COLLECTIONs are scoped within a SQL schema) and consists of one or more XML schemas, defining the permitted types in one or more XML namespaces. It cannot be referenced outside the database it is defined in or by using a three-part object name, like pubs.dbo.peoplecoll. An XML schema that will be used to validate XML content also can have no associated namespace, like the first schema in the previous collection. This is known as the no-namespace schema and is distinguished by the fact that its schema element has no targetNamespace attribute. You are restricted to one no-namespace schema per XML schema collection.

Typed XML

XML data type columns, parameters, and variables may be typed or untyped—that is, they may conform to a schema or not. To specify that you are using a typed column, you would specify the schema collection name in parentheses as a qualifier on XML data type specification, like this:

CREATE TABLE xml_tab (
   the_id INTEGER,
   xml_col XML(peoplecoll)
)

By doing this, you’ve just defined a series of integrity constraints with respect to what can appear in the XML that makes up that column! Typing the XML data in a column by using an XML schema collection not only serves as an integrity constraint, but also is an optimization for SQL Server’s XQuery engine because you are using typed data in the query. It also allows the XQuery engine to know the data type of its intermediate and final results. If the XML data is not strongly typed, XQuery treats everything as a weakly typed “string” value, xdt:untypedAtomic.

The integrity checking for a typed XML column happens each time a new value is set into it. This would occur when you are inserting or updating the column in a table. As an example, creating the following table and adding some rows to it will cause each instance of the XML data type to be schema validated at insert time:

CREATE TABLE person_tab(
 id INT IDENTITY PRIMARY KEY,
 -- the person column can only contain
 -- Infoset items that are defined in the schema collection
 -- defined above
 person XML(peoplecoll))
GO

-- this works, person between 0 and 30 years old
INSERT INTO person_tab VALUES(
'<age xmlns="http://example.org/LogansRun">11</age>')

-- so does this, using the no-namespace schema
INSERT INTO person_tab VALUES(
'<age>75</age>')

-- this insert fails
INSERT INTO person_tab VALUES(
'<age xmlns="http://example.org/LogansRun">31</age>')

You can precede your schema collection identifier with the keyword DOCUMENT or CONTENT. If you do not use one of these keywords, the default is equivalent to specifying CONTENT. If DOCUMENT is specified, the column can contain only XML documents (a document is defined as having a single root element), but if you specify CONTENT, the column can contain documents or fragments, as long as all the elements match an element in one of the schemas. Here’s an example that illustrates the difference:

CREATE TABLE person_docs(
 id INT IDENTITY primary key,
 person XML(DOCUMENT peoplecoll))
GO

CREATE TABLE person_content(
 id INT IDENTITY PRIMARY KEY,
 person XML(CONTENT peoplecoll))
GO

-- this works with either table, a single root element
INSERT INTO person_docs VALUES(
'<age xmlns="http://example.org/LogansRun">11</age>')
INSERT INTO person_content VALUES(
'<age xmlns="http://example.org/LogansRun">11</age>')

-- this fails, more than one root element
INSERT INTO person_docs VALUES(
'<age xmlns="http://example.org/LogansRun">5</age>
 <age xmlns="http://example.org/LogansRun">5</age>
')
GO

-- this works because it's a valid fragment
INSERT INTO person_content VALUES(
'<age xmlns="http://example.org/LogansRun">5</age>
 <age xmlns="http://example.org/LogansRun">5</age>
')
GO

XML variables can also be schema validated by declaring them with a name of an already-defined schema collection:

-- this document or fragment must correspond to this schema collection
DECLARE @x XML(accountingcoll)

-- input is validated here
SET @x =
'<po xmlns="urn:com-develop:purchaseorder">
  <orderid>4321</orderid>
  <customerid>10753</customerid>
  <items>
     <itemno>987-65</itemno>
     <qty>5</qty>
  </items>
</po>'

There are three ways to specify the type of an XML element in an XML document. You can define an XML namespace prefix for a particular namespace and use the prefix on the element in question, use a default XML namespace definition, or use xsi:type to specify the data type.

Management of XML Schemas and Schema Collections

When an XML schema collection is stored in SQL Server, its schemas are not stored directly as XML documents. Instead, they are shredded into a proprietary format that is useful for optimizing schema validation. Although you can extract your XML schemas using the system function xml_schema_namespace(), as shown in the following example, comments and schema annotations are not recovered:

-- this returns a single XML schema document
-- for the 'http://example.org/LogansRun' namespace
-- if it occurs in the XML schema collection 'dbo.peoplecoll'
SELECT xml_schema_namespace(
   N'dbo', N'peoplecoll', N'http://example.org/LogansRun')
GO

-- this returns all of the XML schema documents
-- in the peoplecoll XML schema collection
-- multiple schema documents are separated by a space
SELECT xml_schema_namespace(N'dbo', N'peoplecoll')
GO

The exact signature of xml_schema_namespace is

DECLARE FUNCTION xml_schema_namespace (
  @relational_schema NVARCHAR(4000),
  @xml_schema_collection_name NVARCHAR(4000),
  [,@namespace NVARCHAR(4000)]
)
RETURNS XML

where @relational_schema is the relational schema that contains the collection. It returns a schema document (as an XML data type instance) representing the content of the XML schema namespaces associated with the SQL XML schema collection identifier.

If you need to keep track of an original schema in its exact text format, you should store it separately. A convenient way to keep track of your schemas is to insert them into a table with an XML column:

CREATE TABLE xml_schema_save_tab (
  the_id INTEGER PRIMARY KEY,
  xml_schema_col XML)
GO

INSERT INTO xml_schema_save_tab VALUES(1,
N'<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            xmlns:tns="http://example.org/LogansRun"
            targetNamespace="http://example.org/LogansRun" >
   <!-- this schema defines a single data type, personAge,
        and I want to save this comment as well as the schema -->
   <xsd:simpleType name="personAge" >
     <xsd:restriction base="xsd:float" >

       <xsd:maxInclusive value="30" />
       <xsd:minExclusive value="0" />
     </xsd:restriction>
   </xsd:simpleType>
   <xsd:element name="age" type="tns:personAge" />
</xsd:schema>')
GO

XML schema collections are tied to a specific SQL schema within a specific database; they are first-class SQL Server database objects that can be referenced with a one- or two-part name by users with the appropriate permission. Because many XML documents use types from multiple XML namespaces, an XML schema collection can contain multiple unrelated schemas. In addition, many XML schemas import types from other schemas; you can use the XML schema <import> statement to import another XML schema namespace that you use in a second schema definition for a different namespace. In addition to defining all the schemas in a collection by using the CREATE XML SCHEMA COLLECTION DDL statement, you can add more schemas to a collection after it’s been created using ALTER XML SCHEMA COLLECTION. If we first define the following XML SCHEMA COLLECTION,

CREATE XML SCHEMA COLLECTION mytrees AS
'<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            xmlns:tns="http://example.org/Trees"
            targetNamespace="http://example.org/Trees" >
   <xsd:simpleType name="treeAge" >
     <xsd:restriction base="xsd:float" >
       <xsd:maxInclusive value="1000" />
       <xsd:minExclusive value="0" />
     </xsd:restriction>
   </xsd:simpleType>
   <xsd:element name="treeage" type="tns:treeAge" />
</xsd:schema>'

it is permissible to import that schema definition into another schema definition for a different namespace, as the following code shows:

ALTER XML SCHEMA COLLECTION mytrees ADD
'<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            xmlns:tns="http://example.org/Trees2"
            targetNamespace="http://example.org/Trees2" >
 <xsd:import namespace="http://example.org/Trees"/>
   <xsd:simpleType name="treeAge2" >
     <xsd:restriction xmlns:t2="http://example.org/Trees"
                      base="t2:treeAge" >

       <xsd:maxInclusive value="500" />
     </xsd:restriction>
   </xsd:simpleType>
   <xsd:element name="treeage2" type="tns:treeAge2" />
</xsd:schema>'

Notice that the simpleType treeAge2 in the namespace http://example.org/Trees2 is derived by restriction from the base type t2:treeAge in a different namespace. We could also have defined both schemas in the collection with a single CREATE XML SCHEMA COLLECTION DDL statement.

An XML schema collection is dropped from the database like any other SQL Server object and is subject to the same constraints. You may not drop an XML schema collection, for example, if it is being used to type an XML data type column:

-- this would fail if the XML schema collection
-- was used to type an XML data type column
DROP XML SCHEMA COLLECTION peoplecoll

Security for XML SCHEMA COLLECTIONS and Strongly Typed Instances

Security for XML SCHEMA COLLECTIONS and strongly typed instances is applied on the database objects and is analogous to security on native SQL Server data. XML schema collections are scoped to the SQL schema within a database, so you can permit users or roles to define XML schemas:

GRANT CREATE XML SCHEMA COLLECTION TO public

When a specific XML schema collection is catalogued, permissions must be granted to reference the schema or use strongly typed columns or parameters:

GRANT REFERENCES ON XML SCHEMA COLLECTION::people
  TO FRED

The permissions that can be granted on a specific schema collection are as follows:

REFERENCES—Gives permission to define tables and views that reference a schema collection

EXECUTE—Gives permission to use strongly typed columns, parameters, or variables that refer to a given schema collection

Creating an Index on an XML Column

You can create indexes on an XML column, using approximately the same syntax that you use for a SQL index. Four kinds of XML indexes can be created. Before you can create any kind of XML index, the table must have an ordinary SQL data type primary key column. First, you must create the primary index; this creates a node table and an index of the node table. Note that a node table is a relational table that contains one row for every node in an XML document, this can get fairly large. We’ll discuss the exact structure of the node table in Chapter 10. This index associates each node with the SQL key column and is useful for ad hoc queries. You can create an XML index over only the document structure, using the FOR PATH keyword. This is similar to the concept of creating a “key” in XSLT; this type of index helps in XQuery path statements. You can also create an index over the values of the elements and attributes in the XML data type column with the FOR VALUE keyword. This type of index can help in XQuery content searches. The FOR PROPERTY keyword creates an index that is most usable when your XML consists of a shallow hierarchy with many elements or attributes that are really name–value pairs. Additional XML index types may be defined in the future. You create an XML index on the entire column, and you cannot index subsets of the document content. The syntax is shown in Listing 9-7.

Listing 9-7. Creating XML indexes


CREATE TABLE xml_tab(
  the_id INTEGER PRIMARY KEY IDENTITY,
  xml_col XML)
GO

CREATE PRIMARY XML INDEX xmlidx1 ON xml_tab(xml_col)
GO

-- structural index
CREATE XML INDEX xmls1 ON xml_tab(xml_col)
 USING XML INDEX xmlidx1 FOR PATH
GO

-- property index
CREATE XML INDEX xmlp1 ON xml_tab(xml_col)
 USING XML INDEX xmlidx1 FOR PROPERTY
GO

-- value index
CREATE XML INDEX xmlv1 ON xml_tab(xml_col)
 USING XML INDEX xmlidx1 FOR VALUE
GO


Although this is similar to an “ordinary” SQL primary key and index creation statement with an extra XML keyword, the actual effect of the statement is much different from creating a SQL Server index. What you are creating in the case of an XML column is an index over the internal representation or structure of the column whose purpose is to optimize XQuery queries rather than SQL queries. Remember that the SQL comparison operators cannot be used on an XML column. Because the index contains the (SQL) primary key of the table, however, it can assist in queries that use XQuery criteria and a primary key value in a SQL WHERE clause. Though the internal representation of the XML index is an internal implementation detail, suffice it to say that creating such an index will not help optimize queries that cast or convert the XML data to character types first.

Because the XML index is not a “normal” SQL index, some limitations apply to these indexes:

• You cannot create an XML composite index—that is, an index on more than one XML column or an XML column and a non-XML column.

• You cannot create an XML index as a clustered index or use it as a partitioning criterion.

In addition, because all XML indexes and SQL indexes share the same value space in a database, you cannot create an XML index (of any kind) and a SQL index with the same index name, or two different kinds of XML indexes with the same name. Although an XML type can also be used with full-text search, this is outside of the scope of this book.

XML Type Functions

In addition to being used as a table or view column, variable, or parameter in its entirety, the XML data type contains a variety of type-specific methods. These are invoked by using the SQL:1999 instance.method syntax, similar to the WRITE method of the VARCHAR(max) type.

The XML type methods encompass a few different groups of functionality:

• Determining whether a node or nodes exist that satisfy a given XQuery expression (exist method)

• Selecting a single value using XQuery and returning it as a SQL data type (value method)

• Querying the value of the XML type via XQuery (query method)

• Creating a table with one row per node that matches an XQuery (nodes method) for use with the other XQuery methods

• Modifying the value of the XML type via XQuery DML (modify method)

The modify method is a mutator method, like VARCHAR(MAX).WRITE; all the other methods are accessor methods. As with the WRITE and UDT mutators, the modify method may not be used with a NULL instead of an XML data type.

Because all the current methods on an XML data type are built around using XQuery to query an instance of the type, we’ll defer these until the next chapter. Other methods—for example, validating an instance of an XML type on demand—that have nothing to do with XQuery may be added to the XML data type in future releases. Currently, you can validate an instance of an XML data type against an XML SCHEMA COLLECTION by attempting to assign it to a typed XML variable or insert it into a typed XML column.

SELECT . . . FOR XML Enhancements

SQL Server 2000 provides an enhancement to T-SQL permitting composition of XML document fragments using SQL queries against relational tables. This is the SELECT...FOR XML syntax. This syntax can produce fragments in element or attribute normal form XML and can even produce a combination of elements and attributes. There are three “dialects” of FOR XML queries:

FOR XML RAW—Produces one XML element for each row in the result, no matter how many tables participate in the query. There is an attribute for each column, and the names of attributes reflect the column names or aliases. FOR XML RAW has been enhanced in SQL Server 2005 to allow element normal form XML.

FOR XML AUTO—Produces one XML element by row in the result but produces nested XML elements if there is more than one table in the query. The order of nesting is defined by the order of the columns in the SELECT statement.

FOR XML EXPLICIT—Produces XML by means of SQL UNION queries. Each arm of the UNION query produces a different level of XML. This is by far the most flexible dialect and can produce element or attribute normal form and nesting XML exactly as you like. This is also by far the most complex dialect to program.

Listings 9-8, 9-9, and 9-10 show the results of FOR XML SELECT statements against the pubs database.

Listing 9-8. FOR XML RAW syntax and results


-- this query:
SELECT Customers.CustomerID, Orders.OrderID
FROM Customers
  JOIN Orders ON Customers.CustomerID = Orders.CustomerID
ORDER BY Customers.CustomerID
FOR XML RAW

-- produces this XML output document fragment
  <row CustomerID="ALFKI" OrderID="10643" />
  <row CustomerID="ALFKI" OrderID="10692" />
  <row CustomerID="ALFKI" OrderID="10703" />
  <row CustomerID="ALFKI" OrderID="10835" />
  <row CustomerID="ANATR" OrderID="10308" />


Listing 9-9. FOR XML AUTO syntax and results


-- this query:
SELECT Customers.CustomerID, Orders.OrderID
FROM Customers
  JOIN Orders ON Customers.CustomerID = Orders.CustomerID
ORDER BY Customers.CustomerID
FOR XML AUTO

-- produces the following XML document fragment
  <Customers CustomerID="ALFKI">
    <Orders OrderID="10643" />
    <Orders OrderID="10692" />
    <Orders OrderID="10702" />
    <Orders OrderID="10835" />

  </Customers>
  <Customers CustomerID="ANATR">
    <Orders OrderID="10308" />
  </Customers>


Listing 9-10. FOR XML EXPLICIT syntax and results


-- this query:
SELECT      1 as Tag, NULL as Parent,
            Customers.CustomerID as [Customer!1!CustomerID],
            NULL as [Order!2!OrderID]
FROM        Customers
UNION ALL
SELECT      2, 1,
            Customers.CustomerID,
            Orders.OrderID
FROM Customers
  JOIN Orders ON Customers.CustomerID = Orders.CustomerID
ORDER BY [Customer!1!CustomerID]
FOR XML EXPLICIT

-- produces this output document fragment
<Customer CustomerID="ALFKI">
   <Order OrderID="10643"/>
   <Order OrderID="10692"/>
   <Order OrderID="10702"/>
</Customer>


In SQL Server 2005, there are quite a few refinements and enhancements to FOR XML queries:

• There is a new dialect of FOR XML query called FOR XML PATH.

FOR XML can produce an instance of an XML type.

FOR XML is able to prepend the XML result with an inline schema in XSD schema format. The previous version of FOR XML could prepend only an inline XDR (XML Data Reduced) schema.

• You can select the namespace for the inline XSD schema referred to earlier.

• You can nest FOR XML queries.

• You can produce element-centric XML using FOR XML RAW.

• You can choose to generate xsi:nil for NULL database values rather than leave that element out of the XML result entirely.

You can produce a root element for the XML fragment, making it an XML document.

• There is improved whitespace handling through entitization—that is, representing characters as XML entity references, using the numeric value for the character.

• There are subtle improvements to the algorithm for determining nesting in FOR XML AUTO.

Let’s explore some of these features to see how they would be useful.

FOR XML PATH Mode

SQL Server 2005’s choices of XML output are rich and varied. FOR XML RAW and FOR XML AUTO produce two well-known but “static” XML documents. There is a maximum of one level of nesting with FOR XML RAW; FOR XML AUTO requires that all columns selected from the same table occur at the same nesting level. With RAW and AUTO modes, you must choose either element normal form or attribute normal form. Mixing elements and attributes in the same document requires FOR XML EXPLICIT, which is quite a bit more complex to write. FOR XML PATH is a new mode that gives you more control over nesting levels and mixing attributes and elements. It combines the ease of coding of AUTO and RAW modes with the power of EXPLICIT mode. In fact, it should be possible to code almost all the document formats that require EXPLICIT mode (or postquery XML transformation) using FOR XML PATH mode.

With PATH mode, you shape the XML document by using column aliases that contain XPath expressions. When PATH mode sees an alias that contains a forward slash, it creates another level of hierarchy in the output document. Listing 9-11 shows an example using the authors table that combines the first and last name into a single name element and makes au_id an attribute by using PATH mode.

Listing 9-11. FOR XML PATH syntax and results


WITH XMLNAMESPACES('http://example.org/person/names' as nam)
SELECT au_id AS [@authorid],
       au_fname AS [nam:name/nam:firstname],
       au_lname AS [nam:name/nam:lastname]

  FROM authors
  WHERE au_id > '998'
  FOR XML PATH
GO

-- this produces the following document fragment:
<row xmlns:nam="http://example.org/person/names" authorid="998-72-3567">
  <nam:name>
    <nam:firstname>Albert</nam:firstname>
    <nam:lastname>Ringer</nam:lastname>
  </nam:name>
</row>


In addition to being able to mix elements and attributes and create new hierarchy levels, you can use the following XPath node test functions:

node()—The content is inserted as a text node. If the content is a complex UDT, the entire tree is inserted. You can also use * (asterisk) as a shortcut for node().

text()—The content is inserted as a text node, but this produces an error if the column’s data type is UDT or XML.

data()—The content is inserted as an atomic value followed by a single space. This allows you to produce lists of element and attribute values.

comment()—This produces an XML comment using the value.

processing-instruction()—This produces an XML processing instruction using the value.

You can also specify namespaces in your output document by using namespace-to-namespace prefix mappings and the SQL:2003 standard "WITH XMLNAMESPACES" syntax. Long pathnames are subject to the limitation that SQL Server aliases can be up to 128 Unicode characters long.

Producing an XML Data Type

It is now possible to use the XML fragments or the documents produced by FOR XML to populate an XML data type column in a table or an XML variable or procedure parameter. This can be done via a few different methods.

First, we can set the result of a SELECT...FOR XML query to a variable of XML data type, like this:

-- declare a variable of type XML
DECLARE @x XML
-- now, write to it,
-- NOTE: parentheses around the SELECT statement are required
SET @x = ( SELECT * FROM authors FOR XML AUTO, TYPE )

You can also use FOR XML queries to produce input to a table using INSERT INTO ... SELECT syntax, like this:

-- create a table
CREATE TABLE xml_tab(id INT IDENTITY PRIMARY KEY, xml_col XML)
GO

-- populate it with a FOR XML query
DECLARE @x XML
SET @x = ( SELECT * FROM authors FOR XML AUTO, TYPE )
INSERT INTO xml_tab VALUES (@x)
GO

Finally, because the XML type is a distinct type, you may want to return it to the caller as a VARCHAR or NVARCHAR data type. You can use the result of a FOR XML query for this. Here’s an example:

DECLARE @x NVARCHAR(max)
-- it is implement important NOT to use TYPE qualifier here
-- or else a conversion will be required
SET @x = (SELECT * FROM pubs.dbo.authors FOR XML RAW)

Using a VARCHAR or NVARCHAR data type differs from using the TYPE qualifier in FOR XML in that the TYPE qualifier will raise errors if invalid characters and non–well-formed fragments are created.

Producing an XML data type column drastically affects how the data is presented to clients. When you leave out the TYPE specifier, FOR XML queries produce a rowset of data that is made into a single stream in the client libraries rather than appearing as a one-column, one-row rowset. The .NET Framework library Microsoft.Data.SqlXml, introduced with the SQLXML 3.0 Web release, is a .NET Framework wrapper around the unmanaged OLE DB code to enable .NET Framework clients to process the XML stream. Although this stream of data appeared in Query Analyzer as though it were a one-column rowset with a single column named with a specific GUID, this was only a limitation of Query Analyzer. The special GUID was an indication to the client libraries that the data actually should appear as a stream.

When the TYPE specifier is used, a FOR XML query does produce a one-row, one-column rowset. This should reduce the confusion for client-side programmers who were never quite comfortable with a SQL statement that produced a stream of XML. SQL Server 2005 can now produce an XML data type column that can be consumed with “ordinary” rowset-handling code. See Chapter 13 for examples of handling XML data type columns from the client.

Producing an Inline XSD Format Schema

FOR XML queries now can include an inline schema that describes the resultset in XML Schema Definition format. This works only with RAW and AUTO modes. At the time that SQL Server 2000 was released, the XSD schema was not yet a W3C standard. Rather than support an intermediate version of XSD, SQL Server 2000 was able to prepend an XDR format schema. XDR is a Microsoft-specific precursor of the XSD schema specification designed around OLE DB; it was submitted to the W3C as a note prior to the standardization of XSD. XDR is still used by products like BizTalk and APIs like ADO classic. XDR is supported, though deprecated, in ADO.NET as well. Listing 9-12 shows how to prepend an XDR schema or XSD schema to a SELECT...FOR XML result.

Listing 9-12. Prepending schemas to FOR XML results


-- prepend an XDR schema
SELECT * FROM authors
  FOR XML AUTO, XMLDATA

-- prepend an XSD schema (new in SQL Server 2005)
SELECT * FROM authors
  FOR XML AUTO, XMLSCHEMA


ADO classic supports XDR schemas when used inline with ADODB.Recordsets; however, ADO is very picky about the exact dialect of XDR supported, requires specific XDR annotations, and is not compatible with FOR XML, XMLDATA. BizTalk did support using FOR XML, XMLDATA queries to describe its data, although BizTalk 2000 and later can support XSD schemas in addition to XDR. Schema generation is supported with the FOR XML RAW and FOR XML AUTO dialects.

ADO.NET 1 exposes a method, SqlCommand.GetXmlReader, that returns a document fragment from a SELECT...FOR XML query. In version 1, using the XMLDATA option to generate an inline schema was a requirement for using the returned XmlReader correctly to populate a System.Data.DataSet. Using the new FOR XML, XMLSCHEMA version should provide much better results, because the XSD support in .NET Framework (including ADO.NET) far outstrips XDR support. You should be able to use FOR XML, XMLSCHEMA queries in BizTalk and other Microsoft products as well. The instance schema produced includes a schema import for the "http://schemas.microsoft.com/sqlserver/2004/sqltypes" built-in namespace and indicates that it can be found at "http://schemas.microsoft.com/sqlserver/2004/sqltypes/sqltypes.xsd". This schema contains the mapping of SQL Server’s type system to XML schema types. These “instance schema plus rowset” elements can be wrapped in SOAP packets and used manually in a Web Service; we’ll discuss Web Services support in SQL Server 2005 in more detail in Chapter 12.

For producing interoperable XML in this manner, the picture is a little less rosy. The XML schema specification does not mandate the way in which an XML document locates its schema during validation. Although the XSD specification provides the attribute xsi:schemaLocation (in the XSI namespace described earlier), XML processors are not required to support even this mechanism, and the location of a schema is completely implementation defined. SQL Server uses a set of precatalogued system and user schemas when doing its own validation.

What we’re getting to is that very few non-Microsoft XML processors or tools recognize inline schemas and will use them to do schema validation. The XML editor XML Spy is a notable exception. So although inline schemas are fine in an environment where they will be consumed by Microsoft tools, they are not interoperable. Although it would be inconvenient to use, a generic XSLT or XQuery program could be used to split out the inline schema.

NULL Database Values

The XML model, especially XML schema, handles missing or unknown values differently from SQL. SQL specifies that both missing and unknown values are represented as NULL. In an XML schema, the definition for an attribute that could be missing is specified as use="optional"; therefore, in FOR XML queries, NULL attribute values are simply omitted. When FOR XML AUTO, ELEMENTS, or FOR XML RAW, ELEMENTS is specified, though, there are two choices for the XML representation. By default, when the database contains a NULL value, the element in the FOR XML result is simply omitted. In an XML schema, this representation format would be defined in XML as an element with the "maxOccurs=1" and "minOccurs=0" facets.

With elements, in the SQL Server 2005 version of FOR XML, we have a different choice. Rather than leave an element corresponding to a NULL database value out entirely, we can also specify that the FOR XML query use xsi:nil="1" to indicate an XML nil value. An example should clarify the choices. We have created and populated a simple table as follows:

CREATE TABLE students (
  id INTEGER, name VARCHAR(50), major VARCHAR(20) NULL)

INSERT students VALUES(1, 'Bob Smith', 'Chemistry')
INSERT students VALUES(2, 'Fred Jones', NULL)

Using the query SELECT * FROM students for XML AUTO, ELEMENTS in SQL Server Management Studio yields the following result:

<students>
  <id>1</id>
  <name>Bob Smith</name>
  <major>Chemistry</major>
 </students>
 <students>
  <id>2</id>
  <name>Fred Jones</name>
 </students>

Note that Fred Jones’s major element is simply missing. Using the query SELECT * FROM students FOR XML AUTO, ELEMENTS XSINIL yields the following results:

<students xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <id>1</id>
  <name>Bob Smith</name>
  <major>Chemistry</major>
 </students>
 <students xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <id>2</id>
  <name>Fred Jones</name>
  <major xsi:nil="1" />
 </students>

Using xsi:nil="1" indicates that the value of Fred Smith’s major is nil. Because some Web Service toolkits use xsi:nil (and expect it in SOAP messages that are sent to them), this is a nice option to have when generating XML.

Producing a Root Element

By default, FOR XML queries produce XML fragments—that is, otherwise-well-formed XML that lacks a root element. APIs that expect an XML document, such as XmlDocument.Load, will fail in attempting to load the fragment. The reason for this behavior is that output from multiple FOR XML queries can be composed into a single document; the client-side data access API (such as Microsoft.Data.SqlXml in SQLXML 3.0 and above) is expected to add the root element. For users and libraries that do not expose a method to add the root element, now you can add it using the ROOT directive of a FOR XML query. You are allowed to name the root element anything you want. The syntax is shown here:

-- this query:
SELECT Customers.CustomerID, Orders.OrderID
FROM Customers, Orders
WHERE Customers.CustomerID = Orders.CustomerID
ORDER BY Customers.CustomerID
FOR XML AUTO, ROOT('NorthwindCustomers')

-- produces the following XML document (not a fragment)
<NorthwindCustomers>
  <Customers CustomerID="ALFKI">

    <Orders OrderID="10643" />
    <Orders OrderID="10692" />
    <Orders OrderID="10702" />
    <Orders OrderID="10835" />
  </Customers>
  <Customers CustomerID="ANATR">
    <Orders OrderID="10308" />
  </Customers>
  <!-- some rows omitted -->
</NorthwindCustomers>

Other Features

Two features that may need more explanation are whitespace entitization and nested XML queries. Whitespace entitization is an improvement on the way in which the SQL Server 2000 FOR XML generation algorithm treats carriage returns and line feeds in the output XML. SQL Server 2000 renders carriage returns and line feeds as their native hexadecimal characters, causing problems with parsers that expect these characters to be XML entities. In SQL Server 2005, the carriage return, for example, is encoded as &#xD; this improves fidelity on the client side when processing the XML but is incompatible with your current FOR XML applications. To retain SQL Server 2000 compatibility for whitespace compatibility, use the version of CONVERT that acts like xml:space="preserve" has been specified. Here’s an example:

-- CONVERT a string to XML then CONVERT it back to NVARCHAR(10)

SELECT CONVERT(NVARCHAR(10), CONVERT(XML, '<b>  </b>', 1), 1)

→ <b>  </b>  -- retain whitespace, no entitization
SELECT CONVERT(NVARCHAR(10), CONVERT(XML, '<b>  </b>', 0), 0)
→ <b/>       -- don't retain whitespace
SELECT CONVERT(NVARCHAR(10), CONVERT(XML, '<b>  </b>', 0), 1)
→ <b/>       -- don't retain whitespace
SELECT CONVERT(NVARCHAR(10), CONVERT(XML, '<b>  </b>', 1), 0)
→ <b> &#x20;</b> -- retain whitespace, SQL Server 2005 entitization

In SQL Server 2005’s FOR XML processing, you can use nested queries to produce levels of nesting in your XML. These are similar to subqueries in SQL, except that the resultset is not flat but produces multiple nested levels of hierarchy. Using the stores and discounts tables in the pubs database, the following query

SELECT stor_id, stor_name, state,
 (SELECT discounttype, discount FROM discounts d
   WHERE d.stor_id = s.stor_id
   FOR XML AUTO, ELEMENTS, TYPE)
FROM stores s
ORDER BY s.stor_id
FOR XML AUTO, ELEMENTS

will yield the following nested XML:

<s>
  <stor_id>6380</stor_id>
  <stor_name>Eric the Read Books</stor_name>
  <state>WA</state>
 </s>
 <!-- some elements omitted here -->
 <s>
  <stor_id>8042</stor_id>
  <stor_name>Bookbeat</stor_name>
  <state>OR</state>
  <d>
   <discounttype>Customer Discount</discounttype>
   <discount>5.00</discount>
  </d>
</s>

Mapping SQL and XML Data Types

Throughout this chapter, we’ve been able, through the new XML data type, to mix XML and relational types at will, sometimes even in the same query. SQL types can be used in the production of XML data types, and XML data types and XML queries against these types can return output that is usable in T-SQL, perhaps even as input to complex .NET Framework types. At this point, before we discuss the composition and decomposition functionality, it behooves us to realize that we are actually dealing with two different type systems. These are the SQL type system, as defined by SQL:1999; and the XML type system, as defined by the XML 1 and Namespaces, the XPath 2 and XQuery 1 data model, and XML Schema specifications. We are mostly concerned with the XML Schema specification, because this is the heart of the XML type system definitions.

XML Schema is a rich type system, encompassing simple types (similar to SQL types), XML types (from XML’s SGML roots), complex types using object-oriented type principles like the .NET Framework type system, and SQL:1999 OBJECT data types. Some of the constructs go beyond the bounds of the current SQL type system, even when SQL:1999 complex types are included. An XML facet, for example, can indicate that an array of elements always consists of exactly five or six elements (minOccurs=5, maxOccurs=6); nothing in SQL:1999 is analogous. In this section, we go over the idiosyncracies and edge cases in mappings showing mappings from SQL types to XML types, and vice versa. We’ll also defer the subject of mapping XML data types to SQL types until Chapter 10, because this is used in XQuery and to a lesser extent in OpenXML.

Mapping SQL Types to XML Types

You can produce XML types from the underlying SQL types when using these features of SQL Server 2005:

FOR XML queries

sql:variable and sql:column in server-side XQuery functions on the XML data type (discussed in Chapter 10)

• Producing SOAP messages from SQL Server (discussed in Chapter 12)

SQL Server defines a specific XML Schema to codify the mapping, using http://schemas.microsoft.com/sqlserver/2004/sqltypes as a namespace, which is close to the SQL:2003 Part 14 ANSI standard (more on the standard later). This schema provides a direct mapping of SQL Server types to XML types; therefore, it refers to SQL Server data types that are not explicitly mentioned in the spec and leaves out types that SQL Server does not support. In general, this schema defines an XML SimpleType named after the corresponding SQL Server type but derived by restriction in XML. The SQL Server CHAR data type, for example, is represented as follows:

<xsd:simpleType name="char">
      <xsd:restriction base="xsd:string"/>
</xsd:simpleType>

String, Binary, and Decimal Types

Character-based data types in SQL Server are mapped to the xsd:string data type. Each column in a SQL Server table or a SELECT statement can have character types of a different length (CHAR(x)) or maximum length (VARCHAR(x)). In the SQL Server specific sql types schema, these types are mapped to an xsd:string with no xsd:maxLength or xsd:length facet:

<!-- other data types elided for clarity ->
<!-- from http://schemas.microsoft.com/sqlserver/2004/
sqltypes -->
<xsd:simpleType name="varchar">
            <xsd:restriction base="xsd:string"/>
</xsd:simpleType>
<xsd:simpleType name="nvarchar">
            <xsd:restriction base="xsd:string"/>
</xsd:simpleType>

In specific resultsets or columns, you would add the maxLength facet or, in the case of CHAR, the xsd:length and xsd:length facets, just as you would in SQL Server:

<!-- a column that is defined as NVARCHAR(40) -->
<xsd:element name="first_name" minOccurs="0">
            <xsd:simpleType>
                  <xsd:restriction base="sqltypes:nvarchar">
                             <xsd:maxLength="40"/>
                        </xsd:restriction>
            </xsd:simpleType>
</xsd:element>

It seems odd at first glance to group binary and decimal data types with the character data types. The reason for doing this is that binary data types can have a maximum length but can be variable. Decimal data types can have a variable precision and scale. In mapping to SQL Server, these are approached the same way that character data types are. In general, the binary data types (binary, varbinary, and image in SQL Server) are defined as xsd:base64Binary. The decimal data type maps to xsd:decimal. When specific instances are referred to in resultset schemas, the correct maxLength, precision, and scale facets are added.

Other General Data Types

The integral data types, bit data type, and float data types have almost exact mappings to XSD data types. In the case of integral types, they will have restrictions based on the value space. The SQL Server data type INT, for example, has a value space that is slightly different from xsd:integer. Another case is the SQL Server MONEY and SMALLMONEY data types, which map to decimal types with specific maxInclusive and minInclusive facets:

<xsd:simpleType name="smallmoney">
      <xsd:restriction base="xsd:decimal">
            <xsd:totalDigits value="10"/>
            <xsd:fractionDigits value="4"/>
            <xsd:maxInclusive value="214748.3647"/>
            <xsd:minInclusive value="-214748.3648"/>
      </xsd:restriction>
</xsd:simpleType>

Date Data Type

The SQL:2003 standard maps date data types directly to XSD-equivalent data types—that is, types like xs:date, xs:time, and xs:timestamp. Many implementations (including SQL Server) actually store date values as a number of seconds since a “magic date” and time values with a given precision. The standard for SQL data types does specify the range of the date data type and the precision of the time data type, although SQL Server’s SQL datetime data type does not implement either standard. The XSD-to-SQL type mappings accommodate specifying different ranges by using annotations; SQL Server’s implementation of mapping uses the xs:pattern, xs:minOccurs, and xs:maxOccurs facets.

Note that although SQL:2003 Part 14 is straightforward in its mapping of date, time, and datetime data types, XQuery 1 and XPath have different representations of these types that must include a time-zone specifier. The two data models are not aligned. This complicates matters when using SQL Server datetime with SQL Server XQuery.

Pattern-Based Data Types

Some SQL Server data types do not have an exact, or even an approximate, value-space–based relationship to any corresponding XSD data type. Examples of this include SQL Server’s GUID data type and SQL Server’s date-based data types, which are not based at all on the ISO8601 standard date used by XSD. These types are mapped to XSD by using the XSD pattern facet and considered a derivation by restriction of either xsd:string (in the case of GUID) or xsd:dateTime. As an example, here is the mapping of a GUID:

<xsd:simpleType name="uniqueidentifier">
  <xsd:restriction base="xsd:string">
     <xsd:pattern value="([0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]
{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12})|({[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-
[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}})"/>
            </xsd:restriction>
      </xsd:simpleType>
</xsd:simpleType>

Wildcard Data Types

The two SQL Server types that must be represented as wildcards are SQL_VARIANT and the SQL Server XML type. SQL_VARIANT must map to xsd:any because there is no simple combination of restrictions that would cover all the different possibilities. SQL Server’s XML type would map to xsd:any with a possible wildcard schema region—that is, any in a specific namespace. This is even more straightforward in the case of typed XML instances.

Nullability

SQL Server type instances, when defined as columns in tables, can be defined as NULL or NOT NULL. NOT NULL is the default in XSD schemas, as represented by the default facets maxOccurs=1 and minOccurs=1. Data types that are declared NULL in SQL Server tables must be represented as minOccurs=0. NULL values in XML can be represented simply by omitting the element or by specifying an empty element with the attribute xsi:type=nil. When using SQL Server’s FOR XML, you can choose either option using the new ELEMENTS XSINIL directive discussed earlier in this chapter. Note that there is also a new column directive, !xsinil, for the FOR XML EXPLICIT mode.

OpenXML Enhancements

SQL Server 2000 provides a system-defined function, OpenXML, that creates Rowsets from XML documents. This allows an XML document to be decomposed into possibly multiple Rowsets. These Rowsets can be exposed as resultsets or used with the INSERT INTO...SELECT statement to insert rows into one or more relational tables. This is also known as shredding an XML document. SQL Server 2005 provides an alternative, preferred way to produce a rowset from XML: the xml.nodes function. We’ll talk about the xml.nodes function in Chapter 10.

OpenXML requires an XML document handle as input; this handle is produced by using a system stored procedure, sp_xml_preparedocument. OpenXML uses a subset of XPath 1 to indicate which nodes to select as rows and also to indicate which elements and attributes correspond to columns in the Rowset. An example of using OpenXML is shown in Listing 9-13.

Listing 9-13. Using OpenXML to insert rows


DECLARE @h int
DECLARE @xmldoc VARCHAR(1000)

SET @xmldoc =
'<root>
<stores stor_id="8888" stor_name="Bob''s Books"
     stor_address="111 Somewhere" city="Portland"
     state="OR" zip="97225">
</stores>
<stores stor_id="8889"
     stor_name="Powell''s City Of Books"
     stor_address="1005 W Burnside" city="Portland"
     state="OR" zip="97209">
</stores>
</root>'

EXEC sp_xml_preparedocument @h OUTPUT, @xmldoc

INSERT INTO stores
SELECT * FROM OpenXML(@h,'/root/stores')
WITH stores

EXEC sp_xml_removedocument @h


There are two changes to OpenXML for SQL Server 2005:

• The XML data type is also supported as an output column or an overflow column in the OpenXML WITH clause.

• You can pass an XML data type variable directly into sp_xml_preparedocument as long as it contains an XML document (not a fragment).

Another improvement in SQL Server 2005 that should assist in OpenXML processing is the introduction of the VARCHAR(MAX) and XML data types. In SQL Server 2000, although you could pass in TEXT fields as procedure parameters, you could not operate on them or generate them inside the procedure. Because in SQL Server 2005, you can have parameters of VARCHAR(MAX) and XML data types, you can process these types before using them in OpenXML processing. In addition, because SELECT...FOR XML now produces variables of the XML data type with a root element, you can generate the input to OpenXML from a FOR XML query.

You must take care when using OpenXML, because it produces Rowsets from an XML DOM (document object model). The integer returned from sp_xml_preparedocument is actually a pointer to an instance of an XML DOM. This object (a COM object in the current implementation) is very memory intensive. SQL Server will produce errors if a serverwide memory limit for XML processing is exceeded. Using the XML data type as input to sp_xml_preparedocument produces a DOM even though the XML data type may already be parsed.

Loading XML into the Database from Files

For loading large XML documents into the database, parsing the documents into a DOM just to use OpenXML is very memory intensive. Users are encouraged to use the XML Bulk Load facility on the client side. This Bulk Load facility works by reading the XML and producing SQL statements on the client side, which are sent to SQL Server as a batch.

In SQL Server 2005, you can also insert XML data stored in flat files directly into XML data type columns by using SQL BULK INSERT–type functionality on the server. This saves on network traffic because you specify the exact SQL statement to be used rather than have client Bulk Load shred an XML document into relational tables by using a mapping schema.

Loading XML on the server is accomplished by using the system rowset provider function OPENROWSET and specifying the new BULK provider. The BULK provider can also load multiple rows of XML data into XML columns from a single file. Given a file that looks like this

1            <Root><Invoice InvoiceID="12345" /></Root>
2            <Root><Invoice InvoiceID="13579" /></Root>

and a two-column table defined in SQL Server as follows,

CREATE TABLE invoices (rowid INTEGER PRIMARY KEY, invoicedoc XML)

this SQL Server statement uses the BULK provider to populate the invoices table:

INSERT invoices
 SELECT *
 FROM OPENROWSET
  (Bulk 'c:myinvoices.txt', formatfile = 'c:cpformat.xml') AS X
GO

As with other BULK INSERT operations, the new syntax requires a format file. Here’s a format file for the simple example:

<BCPFORMAT
  xmlns="http://schemas.microsoft.com/sqlserver/2004/bulkload/format"
  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
 <RECORD>
  <FIELD ID="1" xsi:type="CharTerm" TERMINATOR=" " MAX_LENGTH="12" />
  <FIELD ID="2" xsi:type="CharTerm" TERMINATOR=" " />
 </RECORD>
 <ROW>
  <COLUMN SOURCE="1" NAME="IntCol" xsi:type="SQLINT" />
  <COLUMN SOURCE="2" NAME="XmlCol" xsi:type="SQLCHAR" />
 </ROW>
</BCPFORMAT>

If your table contains only a single XML column to be inserted (for example, if the rowid in the previous invoices table is an identity column), you can use the bulk copy SINGLE_CLOB or SINGLE_BLOB option, like this:

INSERT invoices2
 SELECT *
 FROM OPENROWSET
  (Bulk 'c:invoice1.xml', SINGLE_BLOB) AS X
go

Usage of SINGLE_BLOB is like SINGLE_CLOB or SINGLE_NCLOB, but it does not have conflicting codepage encoding issues. When your flat file, which might be obtained from a customer using BizTalk, is loaded into a SQL Server data type column, it can be parsed into relational tables by using OpenXML. A temporary table with an XML column might be used as a way to load large documents for later processing.

ANSI SQL Standard Compliance

In addition to the prolific specifications for everything XML guided by the W3C, the ANSI/ISO SQL committee has gotten into the act with an effort to standardize the integration of XML and relational databases. This series of specifications was started under the auspices of the SQLX committee of database vendors but has been subsumed as Part 14 of the SQL:2003 specification. A committee that includes representatives from Microsoft, Oracle, and IBM, among others, is working on this part of the SQL spec. This specification touches on a number of subjects that we’ve discussed in this chapter, and it’s interesting to look at how the implementation relates to the specification.

XML Data Type

The XML data type in SQL Server 2005 conforms to the SQL:2003 specification. In fact, one of the architects on the SQL Server XML team is a member of the SQL:2003 standardization committee. The XML type is defined as a new scalar data type that can be used in variables, procedure parameters, and columns in tables. The limitations of SQL Server’s XML data type (for example, two instances of the XML data type cannot be compared for equality) are specification compliant. Casting and converting to and from other data types differs from the spec, which mandates the XMLPARSE and XMLSERIALIZE methods.

The data model of the XML data type defined by Part 14 is based on a slightly extended XML information set model, where the document information item is replaced by a root information item that roughly corresponds to the document node in the XPath 2/XQuery 1 data model. This data model permits an XML data type to contain XML documents, document fragments, and top-level text elements. The SQL Server XML data type adheres to this standard, but this standard is not the same as the XPath 2.0/XQuery 1.0 data model. The XML Schema specification permits “lax” validation of its instances—that is, for an instance of the XML, if you can find the schema definition for an element, you do strict validation and throw an error if it doesn’t validate correctly, but if you can’t find the definition, you “skip” validate. When using schema validation, SQL Server does not permit “lax” validation and allows only “strict” or “skip” validation. The SQL:2003 spec distinguishes two variations of the XML data type: XML document with prolog and XML content. The difference between the two is used in defining how concatenating two XML types should behave. SQL Server allows you to make a similar distinction (document versus content) between schema-validated XML data types but does not allow concatenation of two XML data type instances directly, though you can cast each one to a character data type, concatenate the character types, and cast the result to the XML data type:

DECLARE @x1 XML, @x2 XML, @x3 XML

SELECT @x1 = '<doc1></doc1><doc2></doc2>'
SELECT @x2 = '<doc3/>'

-- this is permitted by the spec
-- but produces an error in SQL Server
SELECT @x3 = @x1 + @x2

-- this works
DECLARE @v1 VARCHAR(MAX)
SET @v1 = (CAST(@x1 AS VARCHAR(MAX))) + (CAST(@x2 AS VARCHAR(MAX)))
SET @x3 = @v1

The ANSI SQL spec indicates that XML data type values probably will be stored in either UTF-8 or UTF-16, although alternative encodings will be permitted. The spec also defines that XML data type columns cannot use the SQL COLLATION keyword or functions; this is consistent with SQL Server 2005 behavior.

Finally, the ANSI SQL specification defines a series of XML data type constructor functions, such as XmlForest and XmlElement. SQL Server 2005 currently does not support these functions. SQL Server’s FOR XML TYPE keyword syntax, however, can be used to construct an XML data type; this syntax in SQL Server predates the specification and is easier to use.

In addition, in SQL Server 2005, XQuery constructor functions can be used to create an instance of an XML data type from SQL content using the XQuery sql:column function, as we’ll see in Chapter 10.

Mapping SQL Catalogs and Tables to XML

The ANSI SQL spec defines mappings of SQL catalogs, schemas, and tables to a virtual XML document, which can then be used to produce instances of an XML data type. The ANSI standard provides for both a table-as-document and a table-as-forest mapping. The ANSI standard mapping to a document is roughly equivalent to SQL Server’s SELECT * FROM table FOR XML RAW, ELEMENTS, ROOT('tablename'), where tablename is the actual name of the table. Mapping as a forest is roughly equivalent to SQL Server’s SELECT * FROM table FOR XML AUTO, ELEMENTS. SQL Server’s FOR XML capability provides a rich superset of the ANSI standard for in-the-database XML generation. The following example shows the table-as-document and table-as-forest documents that result when the standard is applied with a simple SQL table:

<!-- Mapping of a table in the ANSI spec -->
<!-- Also, it's the result of
<!-- "SELECT * FROM EMPLOYEE
      FOR XML RAW,ELEMENTS ROOT('EMPLOYEE')" -->
<!-- in SQL Server -->

<EMPLOYEE>
 <ROW>
  <EMPNO>000010</EMPNO>
  <FIRSTNAME>CHRISTINE</FIRSTNAME>
  <LASTNAME>HAAS</LASTNAME>
  <BIRTHDATE>1933-08-24</BIRTHDATE>
  <SALARY>52750.00</SALARY>
 </ROW>
 <ROW>
  <EMPNO>000020</EMPNO>
  <FIRSTNAME>MICHAEL</FIRSTNAME>
  <LASTNAME>THOMPSON</LASTNAME>
  <BIRTHDATE>1948-02-02</BIRTHDATE>
  <SALARY>41250.00</SALARY>
 </ROW>
.
.
.

</EMPLOYEE>

<!-- ANSI mapping as an XmlForest -->

<!-- Also the result of ->
<!-- "SELECT * FROM EMPLOYEE FOR XML AUTO,ELEMENTS -->
<!-- in SQL Server -->

<EMPLOYEE>
  <EMPNO>000010</EMPNO>
  <FIRSTNAME>CHRISTINE</FIRSTNAME>
  <LASTNAME>HAAS</LASTNAME>
  <BIRTHDATE>1933-08-24T00:00:00</BIRTHDATE> <!-- ISO 8606 -->
  <SALARY>52750.00</SALARY>
</EMPLOYEE>
<EMPLOYEE>
  <EMPNO>000020</EMPNO>
  <FIRSTNAME>MICHAEL</FIRSTNAME>
  <LASTNAME>THOMPSON</LASTNAME>
  <BIRTHDATE>1948-02-02T00:00:00</BIRTHDATE>
  <SALARY>41250.00</SALARY>
</EMPLOYEE>

In addition, the ANSI spec provides a method to convert SQL names to XML names. Many names that are valid (for tables and columns) in SQL are not valid XML names. The XML specification forbids certain constructs in names (such as names that begin with XML or contain spaces) that are valid SQL names, for example. The specification mandates replacing characters that are illegal in XML with “_xHHHH_,” where “HHHH” is the hexadecimal characters that make up the Unicode code point; for example, “hire date” in SQL becomes “hire_x0020_date” in XML. In addition, any SQL name that begins with “XML” is prefixed by “_xFFFF_”; for example, “xmlcol” becomes “_xFFFF_xmlcol”. SQL Server 2005’s FOR XML extension completely complies with the specification.

Mapping SQL Data Types to XML Data Types

The ANSI SQL standard also establishes a mapping of ANSI SQL types, which do not have a one-to-one correspondence to SQL Server data types. This has a namespace http://standards.iso.org/iso/9075/2003/sqlxml. Other than the set of types covered, there are a few differences between the ANSI specification and SQL Server’s implementation and the "http://schemas.microsoft.com/sqlserver/2004/sqltypes" schema.

The ANSI SQL spec permits both xsd:base64Binary and xsd:hexBinary as choices for representing a VARBINARY type. SQL Server chose xsd:base64Binary. The spec also allows representing nullable data as xsd:nillible=true and using xsi:nil on the empty element. SQL Server 2005 supports either type.

Finally, the ANSI SQL spec suggests the use of xsd:annotations (XSD comments) to indicate SQL constructs that cannot be described (such as data-specific types) or that have no meaning in XSD. For example:

<!-- annotation for database-specific SMALLINT data type in ANSI SQL -->
<xsd:annotation>
  <xsd:appinfo>
    <sqlxml:sqltype kind="PREDEFINED" name="SMALLINT"/>
  </xsd:appinfo>
</xsd:annotation>

The specification also states that “it is implementation defined whether these annotations are generated.” The SQL Server sqltypes schema does not use these annotations and instead adds attributes to schema elements to capture other SQL info. It is permitted to add attributes to an xml schema element, as long as they are not in the xsd namespace. Here is an example:

<xsd:simpleType>
  <xsd:restriction base="sqltypes:nvarchar" sqltypes:localeId="1033"
      sqltypes:sqlCompareOptions="IgnoreCase IgnoreKanaType IgnoreWidth"
      sqltypes:sqlSortId="52">
    <xsd:maxLength value="10"></xsd:maxLength>
  </xsd:restriction>
</xsd:simpleType>

Where Are We?

In this chapter, we looked closely at the new XML data type, which provides integration of relational and XML functionality, in addition to permitting XML data to be stored in the database. This XML data can be security protected, can be XSD schema–validated, can take part in transactions, can be operated on by SQL utilities, and in general is a great step forward from storing XML in files on the file system.

We’ve also seen how SQL Server’s XML functionality adheres closely to XML standards and is an implementation of most of the new SQL:2003 Part 14 standard. This allows interoperability among not only Microsoft products, but also those from third-party and other large vendors.

In Chapter 10, we’ll investigate the newest XML query languages, XQuery 1.0 and XPath 2.0, and see how these work inside SQL Server.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
3.141.31.209