Generating a Schema from an XML Document

xsd can be used to generate a best-guess schema from any XML document. It will make certain assumptions about the structure of your document, based on the data found in the example you provide. For example, it will always set minOccurs to 1 and maxOccurs to unbounded for each element. It will also always use the xs:sequence compositor for lists of elements, even if your example XML document has elements in various orders. This can present the odd situation of the sample document used to generate the XSD failing validation with the XSD generated from it. Finally, the type attribute of each xs:element and xs:attribute element defaults to xs:string.

For these reasons, you should never take the generated XSD for granted. Always edit it to make sure it will fit your real requirements.

Using the purchase order document from Chapter 2, you can generate an XSD with the following command line:

xsd po1456.xml

You can go ahead and use xsd to generate the schema. I’ve already done so, and tweaked the generated XSD to ensure that it validates the PO correctly. These edits are highlighted in Example 8-1. I also intentionally introduced a couple of mistakes in my edits to point out how XmlSchema validates an XSD, and I’ll explain that more in a moment.

Example 8-1. Generated XSD for purchase orders
<?xml version="1.0" encoding="utf-8"?>
<xs:schema id="NewDataSet" xmlns="" xmlns:xs="http://www.w3.org/2001/XMLSchema" 
xmlns:msdata="urn:schemas-microsoft-com:xml-msdata">
  <xs:element name="po">
    <xs:complexType>
      <xs:attribute name="id" type="xs:ID" />
      <xs:sequence>
        <xs:element name="date">
          <xs:complexType>        
            <xs:attribute name="year" type="xs:string" />
            <xs:attribute name="month" type="xs:string" />
            <xs:attribute name="day" type="xs:string" />
          </xs:complexType>
        </xs:element>
        <xs:element name="address" maxOccurs="unbounded">
          <xs:complexType>
            <xs:sequence>
              <xs:element name="name" type="xs:string" msdata:Ordinal="0" />
              <xs:element name="street" type="xs:string" maxOccurs="3" msdata:Ordinal="1" />
              <xs:element name="city" type="xs:string" msdata:Ordinal="2" />
              <xs:element name="state" type="xs:string" msdata:Ordinal="3" />
              <xs:element name="zip" type="xs:string" msdata:Ordinal="4" />
            </xs:sequence>
            <xs:attribute name="type" type="xs:string" />
          </xs:complexType>
        </xs:element>
        <xs:element name="items" minOccurs="2" maxOccurs="1">
          <xs:complexType>
            <xs:sequence>
              <xs:element name="item" minOccurs="0" maxOccurs="unbounded">
                <xs:complexType>
                  <xs:attribute name="quantity" type="xs:string" />
                  <xs:attribute name="productCode" type="xs:string" />
                  <xs:attribute name="description" type="xs:string" />
                  <xs:attribute name="unitCost" type="xs:string" />
                </xs:complexType>
              </xs:element>
            </xs:sequence>
          </xs:complexType>
        </xs:element>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
  <xs:element name="NewDataSet" msdata:IsDataSet="true">
    <xs:complexType>
      <xs:choice maxOccurs="unbounded">
        <xs:element ref="po" />
      </xs:choice>
    </xs:complexType>
  </xs:element>
</xs:schema>

There are a few pieces of this generated XSD that you should note. First is the inclusion of the namespace prefix msdata in the attributes msdata:Ordinal and msdata:IsDataSet. The urn:schemas-microsoft-com:xml-msdata namespace provides hints to the DataSet class when serializing an XML instance to a database.

Second is the NewDataSet element itself. This is used when generating source code for the XSD with the /dataset flag; the resulting source code will provide the definition of a subclass of System.Data.DataSet.

I’ll address both of these issues in depth in Chapter 9 and Chapter 11.

Given the generated XSD and the modifications to it, you can do two things. First, you can verify that it is a valid XML Schema after the changes. The program shown in Example 8-2 will do just that.

Example 8-2. Validation of an XML Schema
using System;
using System.IO;
using System.Xml.Schema;

public class ValidateSchema {
  public static void Main(string [ ] args) {
    ValidationEventHandler handler = new ValidationEventHandler(ValidateSchema.Handler);
    XmlSchema schema = XmlSchema.Read(File.OpenRead(args[0]),handler);
    schema.Compile(handler);
  }

  public static void Handler(object sender, ValidationEventArgs e) {
    Console.WriteLine(e.Message);
  }
}

A ValidationEventHandler can be called in two places. The first, checking the XML of the schema as it is read, happens on the following line:

XmlSchema schema = XmlSchema.Read(File.OpenRead(args[0]),handler);

XmlSchema.Read( ) reads the content of the XSD from a Stream, TextReader, or XmlReader, and takes a ValidationEventHandler delegate as its second parameter; the ValidationEventHandler is covered in Chapter 2. Any XML validation errors that arise while reading in the file will be reported to the ValidationEventHandler.

Tip

It’s important to note that the ValidationEventHandler handles two different aspects of checking a schema’s content: checking whether it contains valid XML, and verifying whether it constitutes an acceptable XSD. In Example 8-2, I’m using the same ValidationEventHandler for both checks, but they could be two separate delegates.
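
As a minimal sketch of the two-delegate variation, the helper below passes one delegate to XmlSchema.Read( ) and another to Compile( ), so that the messages from each phase land in separate lists. The class name TwoPhaseSchemaCheck, the Check( ) method, and the two list parameters are assumptions made for illustration; they are not part of Example 8-2.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml.Schema;

public static class TwoPhaseSchemaCheck {
  // Hypothetical helper (not from Example 8-2): collects the messages
  // reported by each phase in its own list.
  public static void Check(TextReader xsd,
                           List<string> readErrors,
                           List<string> compileErrors) {
    // Called for problems found while the XSD is being read
    ValidationEventHandler onRead =
      (sender, e) => readErrors.Add(e.Message);

    // Called for problems found while the schema is being compiled
    ValidationEventHandler onCompile =
      (sender, e) => compileErrors.Add(e.Message);

    XmlSchema schema = XmlSchema.Read(xsd, onRead);
    schema.Compile(onCompile);
  }
}
```

Calling Check( ) with a StringReader or a StreamReader over the XSD, and then printing each list under its own heading, should make it clear which phase reported which message.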

The second phase, validating the content of the XSD, happens here:

schema.Compile(handler);

In this phase, the content of the XSD is checked to make sure that it is really a valid instance of XML Schema. Its errors will also be reported to the ValidationEventHandler. With the XSD in Example 8-1, running this validator will produce the following output:

C:\Chapter 8>ValidateSchema po.xsd
The content model of a complex type must consist of 'annotation'(if present) 
followed by zero or one of 'simpleContent' or 'complexContent' or 'group' or 'choice' 
or 'sequence' or 'all' followed by zero or more attributes or attributeGroups followed by 
zero or one anyAttribute. An error occurred at (6, 8).
minOccurs value cannot be greater than maxOccurs value. An error occurred at (25, 10).

Looking back, I made two mistakes. First, the id attribute of the po element is in the wrong place; within an xs:complexType, the xs:attribute element must come after the xs:sequence element. You can move the attribute into its proper place to avoid this error. This validation error was caught by the Read( ) method, because the markup itself does not follow the structure required of an XSD document.

Tip

Granted, this error is a little contrived. xsd generated the elements in the correct order, but I moved the xs:attribute element to make a point.

Second, the items element has minOccurs set to 2 and maxOccurs set to 1. In this case, the Compile( ) method caught my error, because the XSD was a well-formed XML document, although it did not constitute a sane XML Schema instance.

At the end of the program, you’ll notice that the entire XSD is loaded. Although it is not valid, it sits in memory, ready to be used. Rather than editing the schema on disk, you could have used the XmlSchema type’s methods to work with it and make it valid, as you’ll see later in this chapter.

You can now use the generated XSD, with the changes to correct my errors, to validate the document that was used to generate it. Example 8-3 shows a program that validates an XML document with an XSD, with a couple of interesting lines highlighted.

Example 8-3. Validation of an XML file with an XML Schema
using System;
using System.IO;
using System.Xml;
using System.Xml.Schema;

public class Validate {

  private static bool valid = true;

  public static void Main(string [ ] args) {

    using (Stream stream = File.OpenRead(args[0])) {
      XmlValidatingReader reader = new XmlValidatingReader(new XmlTextReader(stream));
      reader.ValidationType = ValidationType.Schema;
      reader.Schemas.Add("", args[1]);
      reader.ValidationEventHandler += new ValidationEventHandler(Handler);
      
      while (reader.Read( )) {
        // do nothing
      }
    }
    if (valid) {
      Console.WriteLine("Document is valid.");
    }
  }

  public static void Handler(object sender, ValidationEventArgs e) {
    valid = false;
    Console.WriteLine(e.Message);
  }
}

Take a look at the lines that are highlighted in the example:

reader.ValidationType = ValidationType.Schema;

This line sets the XmlValidatingReader’s ValidationType property to ValidationType.Schema. As I mentioned in the discussion of validation by DTD in Chapter 2, this alone is not enough to cause the document to be validated; the following line takes care of that:

reader.Schemas.Add("", args[1]);

This line adds the XSD whose name is passed in on the command line to the XmlSchemaCollection in XmlValidatingReader’s Schemas property. XmlSchemaCollection is just what it sounds like, a collection of schemas. Its Add( ) method has four overloads. The one used here takes two strings; the first is the namespace URI to which the schema applies, and the second is the name of the XSD file that will be read. Other overloads allow you to add an XmlSchema instance, an XmlReader, or an entire XmlSchemaCollection to the list. The document will be validated with each schema in the XmlSchemaCollection.

while (reader.Read( )) {
  // do nothing
}

These lines read and validate the XML document. Once XmlValidatingReader is told to validate the document, all you have to do is read it and it will be validated. The while loop need not do anything else.

It’s worth noting that, had you not validated my faulty XSD before attempting to validate an XML document with it, the same errors would have been found. There are two differences, however. First, only the first error would have been reported, and it would have been thrown as an XmlSchemaException rather than passed to the ValidationEventHandler. Since exceptions are not being caught in this program, that exception would have short-circuited the XmlReader’s processing.
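
If you would rather report that exception than let it end the program, one option is to wrap the schema loading and the read loop in a try block. The following is only a sketch under a few assumptions: the class name GuardedValidate and the IsValid( ) method are hypothetical, and the method takes TextReaders instead of file names so that it is easy to call with in-memory data.

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.Schema;

public static class GuardedValidate {
  // Hypothetical variant of Example 8-3: returns false if the document is
  // invalid or if the schema itself is rejected.
  public static bool IsValid(TextReader xml, TextReader xsd) {
    bool valid = true;

    XmlValidatingReader reader =
      new XmlValidatingReader(new XmlTextReader(xml));
    reader.ValidationType = ValidationType.Schema;
    reader.ValidationEventHandler += (sender, e) => {
      valid = false;
      Console.WriteLine(e.Message);
    };

    try {
      // A faulty XSD is expected to surface here as an exception
      reader.Schemas.Add("", new XmlTextReader(xsd));

      while (reader.Read( )) {
        // do nothing
      }
    } catch (XmlSchemaException e) {
      valid = false;
      Console.WriteLine("Schema error at ({0}, {1}): {2}",
        e.LineNumber, e.LinePosition, e.Message);
    }
    return valid;
  }
}
```

Catching the exception lets the program carry on, but it still reports only the first schema error.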

Second, the XSD is not explicitly being loaded into memory, so you would not have been given the opportunity to attempt to correct it (assuming your program had a way to do that, of course).
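
To keep that opportunity, you could load the XSD explicitly and hand the resulting XmlSchema instance to the Add( ) overload that accepts one. The sketch below combines the approach of Example 8-2 with that of Example 8-3; the class name PreloadedValidate and the IsValid( ) method are hypothetical, and it simply gives up when the schema has errors, where a real program might try to repair the schema in memory instead.

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.Schema;

public static class PreloadedValidate {
  // Hypothetical helper: checks the schema first, then validates the document
  public static bool IsValid(TextReader xml, TextReader xsd) {
    bool schemaOk = true;
    ValidationEventHandler schemaHandler = (sender, e) => {
      if (e.Severity == XmlSeverityType.Error) {
        schemaOk = false;
      }
      Console.WriteLine("Schema: " + e.Message);
    };

    // Both phases from Example 8-2, with every error reported
    XmlSchema schema = XmlSchema.Read(xsd, schemaHandler);
    schema.Compile(schemaHandler);
    if (!schemaOk) {
      // The schema is still in memory here and could be corrected
      return false;
    }

    bool valid = true;
    XmlValidatingReader reader =
      new XmlValidatingReader(new XmlTextReader(xml));
    reader.ValidationType = ValidationType.Schema;
    reader.Schemas.Add(schema);  // the overload that takes an XmlSchema
    reader.ValidationEventHandler += (sender, e) => {
      valid = false;
      Console.WriteLine(e.Message);
    };

    while (reader.Read( )) {
      // do nothing
    }
    return valid;
  }
}
```

Because the schema goes through Read( ) and Compile( ) with a handler attached, every schema error should be reported instead of only the first.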
