xsd can be used to generate a best-guess schema from any XML document. It will make certain assumptions about the structure of your document, based on the data found in the example you provide. For example, it will always set minOccurs to 1 and maxOccurs to unbounded for each element. It will also always use the xs:sequence compositor for lists of elements, even if your example XML document has elements in various orders. This can present the odd situation of the sample document used to generate the XSD failing validation with the XSD generated from it. Finally, the type attribute of each xs:element and xs:attribute element defaults to xs:string.
For these reasons, you should never take the generated XSD for granted. Always edit it to make sure it will fit your real requirements.
Using the purchase order document from Chapter 2, you can generate an XSD with the following command line:
xsd po1456.xml
You can go ahead and use xsd to generate the schema yourself. I’ve already done so, and tweaked the generated schema to ensure that this XSD validates the PO correctly. These edits are highlighted in Example 8-1. I intentionally introduced a couple of mistakes in my edits to point out how XmlSchema validates an XSD, and I’ll explain that more in a moment.
<?xml version="1.0" encoding="utf-8"?>
<xs:schema id="NewDataSet" xmlns="" xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns:msdata="urn:schemas-microsoft-com:xml-msdata">
  <xs:element name="po">
    <xs:complexType>
      <xs:attribute name="id" type="xs:ID" />
      <xs:sequence>
        <xs:element name="date">
          <xs:complexType>
            <xs:attribute name="year" type="xs:string" />
            <xs:attribute name="month" type="xs:string" />
            <xs:attribute name="day" type="xs:string" />
          </xs:complexType>
        </xs:element>
        <xs:element name="address" maxOccurs="unbounded">
          <xs:complexType>
            <xs:sequence>
              <xs:element name="name" type="xs:string" msdata:Ordinal="0" />
              <xs:element name="street" type="xs:string" maxOccurs="3" msdata:Ordinal="1" />
              <xs:element name="city" type="xs:string" msdata:Ordinal="2" />
              <xs:element name="state" type="xs:string" msdata:Ordinal="3" />
              <xs:element name="zip" type="xs:string" msdata:Ordinal="4" />
            </xs:sequence>
            <xs:attribute name="type" type="xs:string" />
          </xs:complexType>
        </xs:element>
        <xs:element name="items" minOccurs="2" maxOccurs="1">
          <xs:complexType>
            <xs:sequence>
              <xs:element name="item" minOccurs="0" maxOccurs="unbounded">
                <xs:complexType>
                  <xs:attribute name="quantity" type="xs:string" />
                  <xs:attribute name="productCode" type="xs:string" />
                  <xs:attribute name="description" type="xs:string" />
                  <xs:attribute name="unitCost" type="xs:string" />
                </xs:complexType>
              </xs:element>
            </xs:sequence>
          </xs:complexType>
        </xs:element>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
  <xs:element name="NewDataSet" msdata:IsDataSet="true">
    <xs:complexType>
      <xs:choice maxOccurs="unbounded">
        <xs:element ref="po" />
      </xs:choice>
    </xs:complexType>
  </xs:element>
</xs:schema>
There are a few pieces of this generated XSD that you should note. First is the inclusion of the namespace prefix msdata in the attributes msdata:Ordinal and msdata:IsDataSet. The urn:schemas-microsoft-com:xml-msdata namespace provides hints to the DataSet class when serializing an XML instance to a database.
Second is the NewDataSet element itself. This is used when generating source code for the XSD with the /dataset flag; the resulting source code will provide the definition of a subclass of System.Data.DataSet.
I’ll address both of these issues in depth in Chapter 9 and Chapter 11.
Given the generated XSD and the modifications to it, you can do two things. First, you can verify that it is a valid XML Schema after the changes. The program shown in Example 8-2 will do just that.
using System;
using System.IO;
using System.Xml.Schema;

public class ValidateSchema {
  public static void Main(string [ ] args) {
    ValidationEventHandler handler = new ValidationEventHandler(ValidateSchema.Handler);
    // First phase: read the XSD file named on the command line
    XmlSchema schema = XmlSchema.Read(File.OpenRead(args[0]),handler);
    // Second phase: check that the content is a valid XML Schema
    schema.Compile(handler);
  }

  // Called for every problem found in either phase
  public static void Handler(object sender, ValidationEventArgs e) {
    Console.WriteLine(e.Message);
  }
}
A ValidationEventHandler can be called in two places. The first, checking the XML Schema itself, happens on the following line:
XmlSchema schema = XmlSchema.Read(File.OpenRead(args[0]),handler);
XmlSchema.Read( ) reads the content of the XSD from a Stream, TextReader, or XmlReader, and takes a ValidationEventHandler delegate as its second parameter; the ValidationEventHandler is covered in Chapter 2. Any XML validation errors that arise while reading in the file will be reported to the ValidationEventHandler.
It’s important to note that the ValidationEventHandler handles two different aspects of checking a schema’s content: checking whether it contains valid XML, and verifying whether it constitutes an acceptable XSD. In Example 8-2, I’m using the same ValidationEventHandler for both checks, but they could be two separate delegates.
The second phase, validating the content of the XSD, happens here:
schema.Compile(handler);
In this phase, the content of the XSD is checked to make sure that it is really a valid instance of XML Schema. Its errors will also be reported to the ValidationEventHandler. With the XSD in Example 8-1, running this validator will produce the following output:
C:\Chapter 8>ValidateSchema po.xsd
The content model of a complex type must consist of 'annotation'(if present) followed by zero or one of 'simpleContent' or 'complexContent' or 'group' or 'choice' or 'sequence' or 'all' followed by zero or more attributes or attributeGroups followed by zero or one anyAttribute. An error occurred at (6, 8).
minOccurs value cannot be greater than maxOccurs value. An error occurred at (25, 10).
Looking back, I made two mistakes.
First, the id attribute of the po element is in the wrong place; the xs:attribute element must come after the xs:sequence element within a complex type definition. You can move the attribute into its proper place to avoid this error. This validation error was caught by the Read( ) method, because the XSD document itself is structurally invalid: its elements appear in an order that the XML Schema grammar does not allow.
Granted, this error is a little contrived. xsd generated the elements in the correct order, but I moved the xs:attribute element to make a point.
Second, the items element has minOccurs set to 2 and maxOccurs set to 1. In this case, the Compile( ) method caught my error, because the XSD was a well-formed XML document, although it did not constitute a sane XML Schema instance.
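To see which phase reports which problem, the single handler of Example 8-2 can be split into two delegates. The following is only a minimal sketch: the class name TwoPhaseCheck, the Check method, and the cut-down inline schema are all hypothetical, and the lambda syntax assumes a newer C# compiler than the other examples in this chapter. On later framework versions Compile( ) is flagged as obsolete in favor of XmlSchemaSet, so the sketch suppresses that warning.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml.Schema;

public static class TwoPhaseCheck
{
    // Hypothetical, cut-down schema with the same two deliberate mistakes as
    // Example 8-1: xs:attribute before xs:sequence, and minOccurs > maxOccurs.
    public const string FaultyXsd = @"<?xml version=""1.0"" encoding=""utf-8""?>
<xs:schema xmlns:xs=""http://www.w3.org/2001/XMLSchema"">
  <xs:element name=""po"">
    <xs:complexType>
      <xs:attribute name=""id"" type=""xs:ID"" />
      <xs:sequence>
        <xs:element name=""items"" type=""xs:string"" minOccurs=""2"" maxOccurs=""1"" />
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>";

    // Returns every message, prefixed with the phase whose delegate received it.
    public static List<string> Check(string xsdText)
    {
        List<string> messages = new List<string>();
        ValidationEventHandler readHandler =
            (sender, e) => messages.Add("Read: " + e.Message);
        ValidationEventHandler compileHandler =
            (sender, e) => messages.Add("Compile: " + e.Message);

        XmlSchema schema;
        using (StringReader text = new StringReader(xsdText))
        {
            schema = XmlSchema.Read(text, readHandler);
        }

        // Read can hand back null if parsing had to be abandoned.
        if (schema != null)
        {
#pragma warning disable CS0618 // Compile is flagged obsolete on later frameworks
            schema.Compile(compileHandler);
#pragma warning restore CS0618
        }
        return messages;
    }

    public static void Main()
    {
        foreach (string message in Check(FaultyXsd))
        {
            Console.WriteLine(message);
        }
    }
}
```

Each line of output starts with the name of the phase that raised it, which makes it easy to compare what your own framework version reports against the two errors discussed above.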
At the end of the program, you’ll notice that the entire XSD is loaded. Although it is not valid, it sits in memory, ready to be used. Rather than editing the schema on disk, you could have used the XmlSchema type’s methods to work with it and make it valid, as you’ll see later in this chapter.
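As a small preview of that idea, here is a hedged sketch that repairs the minOccurs mistake on the in-memory schema rather than on disk. The class name FixInMemory, its methods, and the cut-down inline schema are hypothetical; the sketch assumes the schema has the simple shape shown (a po element with an inline complex type and a sequence), and it is not the approach developed later in the chapter.

```csharp
using System;
using System.IO;
using System.Xml.Schema;

public static class FixInMemory
{
    // Hypothetical, cut-down schema containing only the minOccurs/maxOccurs mistake.
    public const string FaultyXsd = @"<xs:schema xmlns:xs=""http://www.w3.org/2001/XMLSchema"">
  <xs:element name=""po"">
    <xs:complexType>
      <xs:sequence>
        <xs:element name=""items"" type=""xs:string"" minOccurs=""2"" maxOccurs=""1"" />
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>";

    // Reads a fresh copy of the schema, optionally repairs it, and returns
    // the number of validation events raised while compiling it.
    public static int CountCompileErrors(bool repairFirst)
    {
        int errors = 0;
        XmlSchema schema = XmlSchema.Read(
            new StringReader(FaultyXsd),
            (sender, e) => Console.WriteLine("Read: " + e.Message));

        if (repairFirst)
        {
            Repair(schema);
        }

#pragma warning disable CS0618 // Compile is flagged obsolete on later frameworks
        schema.Compile((sender, e) => { errors++; Console.WriteLine(e.Message); });
#pragma warning restore CS0618
        return errors;
    }

    // Walks po -> complexType -> sequence and sets minOccurs on items to 1.
    private static void Repair(XmlSchema schema)
    {
        foreach (XmlSchemaObject item in schema.Items)
        {
            XmlSchemaElement po = item as XmlSchemaElement;
            if (po == null || po.Name != "po") continue;

            XmlSchemaComplexType type = po.SchemaType as XmlSchemaComplexType;
            XmlSchemaSequence sequence =
                type == null ? null : type.Particle as XmlSchemaSequence;
            if (sequence == null) continue;

            foreach (XmlSchemaObject child in sequence.Items)
            {
                XmlSchemaElement element = child as XmlSchemaElement;
                if (element != null && element.Name == "items")
                {
                    element.MinOccurs = 1;
                }
            }
        }
    }

    public static void Main()
    {
        Console.WriteLine("Before repair: " + CountCompileErrors(false));
        Console.WriteLine("After repair: " + CountCompileErrors(true));
    }
}
```

The repair is hardcoded to one element name to keep the sketch short; a real program would need a more general way to locate the particle it wants to change.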
You can now use the generated XSD, with the changes to correct my errors, to validate the document that was used to generate it. Example 8-3 shows a program that validates an XML document with an XSD, with a couple of interesting lines highlighted.
using System;
using System.IO;
using System.Xml;
using System.Xml.Schema;

public class Validate {
  // Cleared by the handler as soon as any validation problem is reported
  private static bool valid = true;

  public static void Main(string [ ] args) {
    // args[0] is the XML document, args[1] is the XSD
    using (Stream stream = File.OpenRead(args[0])) {
      XmlValidatingReader reader = new XmlValidatingReader(new XmlTextReader(stream));
      reader.ValidationType = ValidationType.Schema;
      reader.Schemas.Add("", args[1]);
      reader.ValidationEventHandler += new ValidationEventHandler(Handler);
      while (reader.Read( )) {
        // do nothing
      }
    }
    if (valid) {
      Console.WriteLine("Document is valid.");
    }
  }

  public static void Handler(object sender, ValidationEventArgs e) {
    valid = false;
    Console.WriteLine(e.Message);
  }
}
Take a look at the lines that are highlighted in the example:
reader.ValidationType = ValidationType.Schema;
This line sets the XmlValidatingReader’s ValidationType property to ValidationType.Schema. As I mentioned in the discussion of validation by DTD in Chapter 2, this alone is not enough to cause the document to be validated; the following line takes care of that:
reader.Schemas.Add("", args[1]);
This line adds the XSD whose name is passed in on the command line to the XmlSchemaCollection in XmlValidatingReader’s Schemas property. XmlSchemaCollection is just what it sounds like, a collection of schemas. Its Add( ) method has four overloads. The one used here takes two strings; the first is the namespace URI to which the schema applies, and the second is the name of the XSD file that will be read. Other overloads allow you to add an XmlSchema instance, an XmlReader, or an entire XmlSchemaCollection to the list. The document will be validated with each schema in the XmlSchemaCollection as the following loop reads it:
while (reader.Read( )) {
  // do nothing
}
These lines read and validate the XML document. Once XmlValidatingReader is told to validate the document, all you have to do is read it and it will be validated. The while loop need not do anything else.
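XmlValidatingReader was marked obsolete starting with version 2.0 of the .NET Framework, so on later versions the same read-to-validate pattern is usually set up through XmlReaderSettings. The following is a minimal sketch under that assumption and is not part of Example 8-3; the class name ValidateWithSettings, the Validate method, and the small inline schema and documents are hypothetical stand-ins for the purchase order files.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;
using System.Xml.Schema;

public static class ValidateWithSettings
{
    // Hypothetical stand-in for the corrected purchase order schema.
    public const string Xsd = @"<xs:schema xmlns:xs=""http://www.w3.org/2001/XMLSchema"">
  <xs:element name=""po"">
    <xs:complexType>
      <xs:sequence>
        <xs:element name=""items"" type=""xs:string"" />
      </xs:sequence>
      <xs:attribute name=""id"" type=""xs:ID"" />
    </xs:complexType>
  </xs:element>
</xs:schema>";

    // Validates the given XML text and returns the messages that were reported.
    public static List<string> Validate(string xml)
    {
        List<string> messages = new List<string>();

        XmlReaderSettings settings = new XmlReaderSettings();
        settings.ValidationType = ValidationType.Schema;
        using (XmlReader schemaReader = XmlReader.Create(new StringReader(Xsd)))
        {
            // A null namespace means "use the schema's own targetNamespace".
            settings.Schemas.Add(null, schemaReader);
        }
        settings.ValidationEventHandler += (sender, e) => messages.Add(e.Message);

        using (XmlReader reader = XmlReader.Create(new StringReader(xml), settings))
        {
            // As before, simply reading the document is what validates it.
            while (reader.Read())
            {
            }
        }
        return messages;
    }

    public static void Main()
    {
        string good = @"<po id=""po1456""><items>widgets</items></po>";
        string bad = @"<po id=""po1456""><widgets /></po>";
        Console.WriteLine("good: " + Validate(good).Count + " message(s)");
        Console.WriteLine("bad: " + Validate(bad).Count + " message(s)");
    }
}
```

The shape is the same as in Example 8-3: choose the validation type, register the schema, attach a handler, and read to the end of the document.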
It’s worth noting that, had you not validated my faulty XSD before attempting to validate an XML document with it, the same errors would have been found. There are two differences, however. First, only the first error would have been reported, and it would have arrived as an XmlSchemaException rather than being handled with the ValidationEventHandler. Since exceptions are not being caught in this program, the exception would have short-circuited the XmlReader’s processing.
Second, the XSD is not explicitly being loaded into memory, so you would not have been given the opportunity to attempt to correct it (assuming your program had a way to do that, of course).
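If you do want to survive a faulty schema at this point, the exception has to be caught. The following hedged sketch shows the idea with XmlSchemaSet, the later replacement for XmlSchemaCollection, which also throws an XmlSchemaException for the first error when no handler is attached to it. The class name CatchSchemaError, the FirstSchemaError method, and the inline schemas are hypothetical.

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.Schema;

public static class CatchSchemaError
{
    // Hypothetical, cut-down schema with the two deliberate mistakes.
    public const string FaultyXsd = @"<xs:schema xmlns:xs=""http://www.w3.org/2001/XMLSchema"">
  <xs:element name=""po"">
    <xs:complexType>
      <xs:attribute name=""id"" type=""xs:ID"" />
      <xs:sequence>
        <xs:element name=""items"" type=""xs:string"" minOccurs=""2"" maxOccurs=""1"" />
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>";

    // The same schema with both mistakes corrected.
    public const string GoodXsd = @"<xs:schema xmlns:xs=""http://www.w3.org/2001/XMLSchema"">
  <xs:element name=""po"">
    <xs:complexType>
      <xs:sequence>
        <xs:element name=""items"" type=""xs:string"" />
      </xs:sequence>
      <xs:attribute name=""id"" type=""xs:ID"" />
    </xs:complexType>
  </xs:element>
</xs:schema>";

    // Returns the message of the first schema error, or null if there is none.
    public static string FirstSchemaError(string xsdText)
    {
        // No ValidationEventHandler is attached, so errors surface as exceptions.
        XmlSchemaSet schemas = new XmlSchemaSet();
        try
        {
            using (XmlReader reader = XmlReader.Create(new StringReader(xsdText)))
            {
                schemas.Add(null, reader);
            }
            schemas.Compile();
            return null;
        }
        catch (XmlSchemaException e)
        {
            // Only this first problem is seen; processing stops here.
            return e.Message;
        }
    }

    public static void Main()
    {
        Console.WriteLine(FirstSchemaError(FaultyXsd) ?? "No schema errors.");
        Console.WriteLine(FirstSchemaError(GoodXsd) ?? "No schema errors.");
    }
}
```

Catching the exception keeps the program alive, but it still reveals only one problem per run, which is why checking the schema up front with a handler is the more informative approach.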