The XmlValidatingReader Class

The XmlValidatingReader class is an implementation of the XmlReader class that provides support for several types of XML validation: document type definitions (DTDs), XML-Data Reduced (XDR) schemas, and XML Schemas. The XML Schema language is also referred to as XML Schema Definition (XSD). DTD and XSD are official recommendations issued by the W3C, whereas XDR is simply the Microsoft implementation of an early working draft of XML Schemas that will be superseded by XSD as time goes by.

You can use the XmlValidatingReader class to validate entire XML documents as well as XML fragments. An XML fragment is a string of XML code that does not have a root node. For example, the following XML string turns out to be a valid XML fragment but not a valid XML document. XML documents must have a root node.

<firstname>Dino</firstname>
<lastname>Esposito</lastname>

The XmlValidatingReader class works on top of an XML reader—typically an instance of the XmlTextReader class. The text reader is used to walk through the nodes of the document, and then the validating reader gets into the game, validating each piece of XML based on the requested validation type.

Supported Validation Types

What are the key differences between the validation mechanisms (DTD, XDR, and XSD) supported by the XmlValidatingReader class? Let’s briefly review the main characteristics of each mechanism.

  • DTD

    A DTD is a text file whose syntax stems directly from the Standard Generalized Markup Language (SGML)—the ancestor of XML as we know it today. A DTD follows a custom, non-XML syntax to define the set of valid tags, the attributes each tag can support, and the dependencies between tags. A DTD allows you to specify the children for each tag, their cardinality, their attributes, and a few other properties for both tags and attributes. Cardinality specifies the number of occurrences of each child element.

  • XDR

    XDR is a schema language based on a proposal submitted by Microsoft to the W3C back in 1998. (For more information, see http://www.w3.org/TR/1998/NOTE-XML-data-0105.) XDRs are flexible and overcome some of the limitations of DTDs. Unlike DTDs, XDRs describe the structure of the document using the same syntax as the XML document. Additionally, in a DTD, all the data content is character data. XDR language schemas allow you to specify the data type of an element or an attribute.

  • XSD

    XSD defines the elements and attributes that form an XML document. Each element is strongly typed. Based on a W3C recommendation, XSD describes the structure of XML documents using another XML document. XSDs include an all-encompassing type system composed of primitive and derived types. The XSD type system is also at the foundation of the Simple Object Access Protocol (SOAP) and XML Web services.

DTD was considered the cross-platform standard until a couple of years ago. Then the W3C ratified a newer standard, XSD, which is technically far superior to DTD. Today, XSD is supported by almost all parsers on all platforms. Although support for DTD will not be deprecated anytime soon, you’ll be better positioned if you start migrating to XSD or building new XML-driven applications based on XSD instead of DTD or XDR.

As mentioned, XDR is an early hybrid specification that never reached the status of a W3C recommendation. It then evolved into XSD. The XmlValidatingReader class supports XDR mostly for backward compatibility, as XDR is fully supported by the Component Object Model (COM)–based Microsoft XML Core Services (MSXML).

Note

The .NET Framework provides a handy utility, named xsd.exe, that among other things can automatically convert an XDR schema to XSD. If you pass an XDR schema file (typically, a .xdr extension), xsd.exe converts the XDR schema to an XSD schema, as shown here:

xsd.exe myoldschema.xdr

The output file has the same name as the XDR schema, but with the .xsd extension.


The XmlValidatingReader Programming Interface

The XmlValidatingReader class inherits from the base class XmlReader but implements internally only a small set of all the functionalities that an XML reader exposes. The class always works on top of an existing XML reader, and many methods and properties are simply mirrored.

The dependency of validating readers on an existing text reader is particularly evident if you look at the class constructors. An XML validating reader, in fact, can’t be directly initialized from a file or a URL. The list of available constructors comprises the following overloads:

public XmlValidatingReader(XmlReader);
public XmlValidatingReader(Stream, XmlNodeType, XmlParserContext);
public XmlValidatingReader(string, XmlNodeType, XmlParserContext);
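
As a rough illustration of the string-based overload, the following sketch reads an XML fragment that has no single root node. The FragmentDemo class and the CountElements method are hypothetical names introduced here, and validation is switched off so that the example shows only how the fragment and its parser context are passed to the constructor.

```csharp
using System;
using System.Xml;

public static class FragmentDemo
{
    // Hypothetical helper: counts the element nodes in an XML fragment.
    public static int CountElements(string fragment)
    {
        // The parser context supplies the name table and namespace scope
        // that a complete document would otherwise establish on its own.
        NameTable nt = new NameTable();
        XmlNamespaceManager nsMgr = new XmlNamespaceManager(nt);
        XmlParserContext context =
            new XmlParserContext(nt, nsMgr, null, XmlSpace.None);

        // XmlNodeType.Element tells the reader to treat the string as
        // element content rather than as a complete document.
        XmlValidatingReader reader =
            new XmlValidatingReader(fragment, XmlNodeType.Element, context);
        reader.ValidationType = ValidationType.None;   // parse only

        int count = 0;
        try
        {
            while (reader.Read())
            {
                if (reader.NodeType == XmlNodeType.Element)
                    count++;
            }
        }
        finally
        {
            reader.Close();
        }
        return count;
    }
}
```

Passing the two-element fragment shown earlier in this chapter to CountElements should report both the firstname and lastname elements. Note that compilers targeting later versions of the .NET Framework flag XmlValidatingReader as obsolete, so expect a warning there.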

A validating reader can parse only XML documents for which a reader is provided, plus any XML fragments accessible through a string or an open stream. In the section “Under the Hood of the Validation Process,” we’ll look more closely at the internal architecture of an XML validating reader. In the meantime, let’s analyze more closely the programming interface of the class, starting with its properties.

XmlValidatingReader Properties

Table 3-1 lists the key public properties exposed by the XmlValidatingReader class. This table does not include those properties defined in the XmlReader base class for which the XmlValidatingReader class simply mirrors the behavior of the underlying reader. Refer to Chapter 2 for more information about the base properties of XmlReader.

Table 3-1. Key Properties of the XmlValidatingReader Class
Property | Description
CanResolveEntity | Always returns true because the XML validating reader can always resolve entities.
EntityHandling | Indicates how entities are handled. Allowable values for this property come from the EntityHandling enumeration. The default value is ExpandEntities, which means that all entities are expanded. If set to ExpandCharEntities, only character entities are expanded (for example, &apos;). General entities are returned as EntityReference node types.
Namespaces | Indicates whether namespace support is requested.
NameTable | Gets the name table object associated with the underlying reader.
Reader | Gets the XmlReader object used to construct this instance of the XmlValidatingReader class. The return value can be cast to a more specific reader type, such as XmlTextReader. Any change entered directly to the underlying reader object can lead to unpredictable results. Use the XmlValidatingReader interface to manipulate the properties of the underlying reader.
Schemas | Gets an XmlSchemaCollection object that holds a collection of preloaded XDRs and XSDs. Schema preloading is a trick used to speed up the validation process. Schemas, in fact, are cached, and there is no need to load them every time.
SchemaType | Gets the schema object that represents the current node in the underlying reader. This property is relevant only for XSD validation. The object describes whether the type of the node is one of the built-in XSD types or a user-defined simple or complex type.
ValidationType | Indicates the type of validation to perform. Feasible values come from the ValidationType enumeration: Auto, None, DTD, XDR, and Schema.
XmlResolver | Sets the XmlResolver object used for resolving external DTD and schema location references. The XmlResolver is also used to handle any import or include elements found in XSD schemas.

The validating reader uses the underlying reader to move around the document and implements most of its XmlReader-derived properties by simply mirroring the corresponding properties of the worker reader.

XmlValidatingReader Methods

Table 3-2 lists the methods exposed by the XmlValidatingReader class that are either new or whose behavior significantly differs from the corresponding methods of the XmlReader class.

Table 3-2. Public Methods of the XmlValidatingReader Class
Method | Description
Read | The underlying reader moves to the next node. At the same time, the validating reader gets the node information and validates it using the schema information and the previously cached information.
ReadTypedValue | Gets the value for the underlying node as a common language runtime (CLR) type. The mapping can take place only for XSDs. Whenever a direct mapping is not possible, the node value is returned as a string.
Skip | Skips the children of the current node in the underlying reader. You can’t skip over badly formed XML text, however. In the XmlValidatingReader class, the Skip method also validates the skipped content.

As you can see, the programming interface of the XmlValidatingReader class does not explicitly provide a single method that can validate the entire contents of a document. The validating reader works incrementally, node by node, as the underlying reader does. Each validation error found along the way results in a particular event notification being returned to the caller application. The application is then responsible for defining an ad hoc event handler and behaving as needed.

The ValidationEventHandler Event

The XmlValidatingReader class contains a public event named ValidationEventHandler, which is defined as follows:

public event ValidationEventHandler ValidationEventHandler;

This event is used to pass information about any DTD, XDR, or XSD schema validation errors that have been detected. The handler for the event (also named ValidationEventHandler) has the following signature:

public delegate void ValidationEventHandler(
   object sender,
   ValidationEventArgs e
);

The ValidationEventArgs class is described by the following pseudocode:

public class ValidationEventArgs : EventArgs
{
   public XmlSchemaException Exception { get; }
   public string Message { get; }
   public XmlSeverityType Severity { get; }
}

The Message property returns a description of the error. The Exception property, on the other hand, returns an ad hoc exception object (XmlSchemaException) with details about what happened. The schema exception class contains information about the line that originated the error, the source file, and, if available, the schema object that generated the error. The schema object (the SourceSchemaObject property) is available for XSD validation only.

The Severity property represents the severity of the validation event. The XmlSeverityType enumeration defines two levels of severity: Error and Warning. Error indicates that a serious validation error occurred when processing the document against a DTD, an XDR, or an XSD schema. If the current instance of the XmlValidatingReader class has no validation event handler set, an exception is thrown. Typically, a warning is raised when there is no DTD, XDR, or XSD schema to validate a particular element or attribute against. Unlike errors, warnings do not throw an exception if no validation event handler has been set.
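
To make the event arguments more concrete, here is a minimal sketch of a handler that records the severity, position, and message of each validation event. The ValidationLog class, its Entries list, and the Run method are hypothetical names introduced for illustration, and the sketch assumes an in-memory document that carries its own internal DTD.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;
using System.Xml.Schema;

public class ValidationLog
{
    // One formatted line per validation event (hypothetical container).
    public readonly List<string> Entries = new List<string>();

    // Matches the ValidationEventHandler delegate signature.
    public void OnValidation(object sender, ValidationEventArgs e)
    {
        XmlSchemaException ex = e.Exception;
        int line = (ex != null) ? ex.LineNumber : 0;
        int pos = (ex != null) ? ex.LinePosition : 0;
        Entries.Add(String.Format("{0} at line {1}, position {2}: {3}",
            e.Severity, line, pos, e.Message));
    }

    // Validates an in-memory document against its internal DTD
    // and returns the log of everything the reader reported.
    public static ValidationLog Run(string xml)
    {
        ValidationLog log = new ValidationLog();
        XmlValidatingReader reader = new XmlValidatingReader(
            new XmlTextReader(new StringReader(xml)));
        reader.ValidationType = ValidationType.DTD;
        reader.ValidationEventHandler +=
            new ValidationEventHandler(log.OnValidation);
        try
        {
            while (reader.Read()) {}
        }
        finally
        {
            reader.Close();
        }
        return log;
    }
}
```

Because a handler is registered, the reader reports each problem through the event instead of throwing, so a document that complies with its DTD should leave Entries empty.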

The XmlValidatingReader in Action

Let’s see how to validate an XML document. As mentioned, the XmlValidatingReader class is still a reader class, so it proceeds with an incremental validation as nodes are actually read. The reader notifies the caller of any schema violation found for a node by raising the ValidationEventHandler event. This section describes in detail how to validate an XML document, including initializing an XML reader, handling validation errors, and setting and detecting the validation types.

Initialization of the Reader

To validate the contents of an XML file, you must first create an XML text reader to work on the file and then use this reader to initialize an instance of a validating reader. A validating reader can be initialized using a living instance of an XmlReader class—typically, an XmlTextReader object—or using an XML fragment taken from a stream or a memory string, as shown here:

XmlTextReader _coreReader = new XmlTextReader(fileName);
XmlValidatingReader reader = new XmlValidatingReader(_coreReader);

You move around the input document using the Read method as usual. Actually, you use the validating reader as you would any other .NET XML reader. At each step, however, the structure of the currently visited node is validated against the specified schema, and a validation event is raised if an error is found (or an exception is thrown if no event handler is registered).

To validate an entire XML document, you simply loop through its contents, as shown here:

private bool ValidateDocument(string fileName)
{
   // Initialize the validating reader
   XmlTextReader _coreReader = new XmlTextReader(fileName);
   XmlValidatingReader reader = new XmlValidatingReader(_coreReader);

   // Prepare for validation
   reader.ValidationType = ValidationType.Auto;
   reader.ValidationEventHandler += new 
      ValidationEventHandler(MyHandler); 
                  
   // Parse and validate all the nodes in the document
   while(reader.Read()) {}
            
   // Close the reader
   reader.Close();
   return true;
}

The ValidationType property is set to the default value—ValidationType.Auto. In this case, the reader determines what type of validation (DTD, XDR, or XSD) is required by looking at the contents of the file. The caller application is notified of any error through a ValidationEventHandler event. In the preceding code, the MyHandler procedure runs whenever a validation error is detected, as shown here:

private void MyHandler(object sender, ValidationEventArgs e)
{
   // Logs the error that occurred
   PrintOut(e.Exception.GetType().Name, e.Message);
}

Figure 3-1 shows the output of the sample program ValidateDocument. The list box tracks down all the errors that have been detected. The complete code listing for the sample application showing how to set up a validating parser is available in this book’s sample files.

Figure 3-1. The sample application dumps the most significant events of its life cycle: when parsing begins, when parsing ends, and all the validation errors that have been detected in between.


When you’ve finished with the validation process, you close the reader using the Close method. This operation also resets the reader’s internal state to Closed. Closing the validating reader automatically closes the underlying text reader. However, no exception is raised if you also attempt to programmatically close the internal reader. The Close method simply returns when it is called on a reader that is already closed.

Handling Validation Errors

If you need to know the details of validation errors, you must define an event handler and pass it along to the validating reader. Whenever an error is found, the reader fires the event and then continues to parse. As a result, the event fires for all the errors detected, thus giving the caller application a chance to handle the errors separately.

In some situations, you might want to know simply whether a given XML document complies with a given schema. In this case, you don’t need to know anything about the error other than the fact that it occurred. The following code provides a class with a static method named ValidateXmlDocument. This method takes the name of an XML file, figures out the most appropriate validation schema, and returns a Boolean value.

using System;
using System.Xml;
using System.Xml.Schema;

public class XmlValidator
{
   private static bool m_isValid = false;

   // Handle any validation errors detected
   private static void ErrorHandler(object sender, 
                                    ValidationEventArgs e)
   {
      // Go on in case of warnings
      if (e.Severity == XmlSeverityType.Error) 
         m_isValid = false;
   }

   // Validate the specified XML document (using Auto mode)
   public static bool ValidateXmlDocument(string fileName) 
   {
      XmlTextReader _coreReader = new XmlTextReader(fileName);
      XmlValidatingReader reader = new XmlValidatingReader(_coreReader);
      reader.ValidationType = ValidationType.Auto;
      reader.ValidationEventHandler +=          
         new ValidationEventHandler(XmlValidator.ErrorHandler); 
            
      // Parse the document
      try 
      {
         m_isValid = true;
         while(reader.Read() && m_isValid) {}
      }
      catch
      {
         m_isValid = false;
      }
                     
      reader.Close();
      return m_isValid;
   }
}

The ValidateXmlDocument method loops through the nodes of the document until the internal member m_isValid is false or the end of the stream is reached. The m_isValid member is set to true at the beginning of the loop and changes to false the first time an error is found. At this point, the document is certainly invalid, so there is no reason to continue looping.

Because the ValidateXmlDocument method is declared static (or Shared in Microsoft Visual Basic .NET), you don’t need a particular instance of the base class to issue the call, as shown here:

if(!XmlValidator.ValidateXmlDocument("data.xml"))
   MessageBox.Show("Not a valid document!");

Note

The reader’s internal mechanisms responsible for checking a document’s well-formedness and schema compliance are distinct. So if a validating reader happens to work on a badly formed XML document, no event is fired, but an XmlException exception is raised.


Setting the Validation Type

The ValidationType property indicates what type of validation must be performed on the current document. To be effective, the property must be set before the first call to Read. Setting the property after the first call to Read throws an InvalidOperationException exception. If no value is explicitly assigned to the property, it defaults to the ValidationType.Auto value.

The ValidationType enumeration defines all the feasible values for the property, as listed in Table 3-3.

Table 3-3. Types of Validation
Type | Description
None | Creates a nonvalidating reader and ignores any validation errors
Auto | Determines the most appropriate type of validation by looking at the contents of the document
DTD | Validates according to the specified DTD
Schema | Validates according to the specified XSD schemas, including in-line schemas
XDR | Validates according to XDR schemas, including in-line schemas

When the validation type is set to Auto, the reader first attempts to locate a DTD declaration in the document. The DTD validation always takes precedence over other validation types. If a DTD is found, the document is validated accordingly. Otherwise, the reader looks for an XSD, either referenced or in-line. If no XSD is found, the reader makes a final attempt to find a referenced or an in-line XDR schema. If a schema is still not found, a nonvalidating reader is created. If more than one validation schema is specified in the document, only the first occurrence, in accordance with the order just discussed, is taken into account.
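
If you already know which schema applies, you can skip the Auto lookup entirely. The following sketch sets ValidationType.Schema explicitly and preloads an XSD through the Schemas collection. The SchemaCheck class, the urn:example-people namespace, and the tiny person schema are all assumptions made up for this example; a real application would more likely load the schema from a file or a URL.

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.Schema;

public class SchemaCheck
{
    // Hypothetical schema: a <person> element with one <firstname> child.
    private const string Xsd =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema' " +
        "targetNamespace='urn:example-people' " +
        "xmlns='urn:example-people' elementFormDefault='qualified'>" +
        "<xs:element name='person'><xs:complexType><xs:sequence>" +
        "<xs:element name='firstname' type='xs:string'/>" +
        "</xs:sequence></xs:complexType></xs:element></xs:schema>";

    private int m_errors;

    // Counts errors only; warnings are ignored.
    private void OnValidation(object sender, ValidationEventArgs e)
    {
        if (e.Severity == XmlSeverityType.Error)
            m_errors++;
    }

    // Returns the number of XSD validation errors found in the document.
    public static int CountErrors(string xml)
    {
        SchemaCheck check = new SchemaCheck();
        XmlValidatingReader reader = new XmlValidatingReader(
            new XmlTextReader(new StringReader(xml)));

        // Both settings must be in place before the first call to Read.
        reader.ValidationType = ValidationType.Schema;
        reader.Schemas.Add("urn:example-people",
            new XmlTextReader(new StringReader(Xsd)));
        reader.ValidationEventHandler +=
            new ValidationEventHandler(check.OnValidation);
        try
        {
            while (reader.Read()) {}
        }
        finally
        {
            reader.Close();
        }
        return check.m_errors;
    }
}
```

A document whose person element contains a firstname child in the urn:example-people namespace should pass with no errors, whereas any other child element should be reported through the handler.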

Detecting the Actual Validation Type

When the ValidationType property is set to Auto, you know at the end of the process whether the semantics of your XML document are valid. But valid against which schema? The Auto mode forces the parser to make various attempts until a validation schema type is found in the source code—whether it be DTD, XSD, or XDR. Is there a way to know what type of validation the parser is actually performing when working in Auto mode?

The validating reader class provides no help on this point, but with a bit of creativity you can easily identify the information you need. This information is not directly exposed, but it is right under your nose and can be inferred from the node type and the schema type without too much effort.

If the parser detects a node of type DocumentType, it can only be validating against a DTD. By definition, the DOCTYPE declaration must appear in the document prolog, before the root element, so the reader encounters it before any element node. If no DOCTYPE node is found, check whether the SchemaType property evaluates to an XmlSchemaType object. This can happen only if an XML Schema Object Model (SOM) has been created, and hence only if XSD validation is taking place. The XmlSchemaType object has even more in store. By checking the contents of the SourceUri property, you can also determine whether the schema is in-line or a reference. If the schema is in-line, the SourceUri property matches the URI of the XML document being processed. Finally, if the validation type is neither DTD nor XSD, it can only be XDR! The following source code illustrates a function that determines the actual validation type:

string GetActualValidationType(XmlValidatingReader reader, 
                               string filename)
{
   string realValidationType = "";
   if(reader.ValidationType == ValidationType.Auto)
   {
      if(reader.NodeType == XmlNodeType.DocumentType)
         realValidationType = "Auto.DTD";
      else
      {
         if(reader.SchemaType is XmlSchemaType)
         {
            XmlSchemaType xst = (XmlSchemaType) reader.SchemaType; 
            string xsd = Path.GetFileName(xst.SourceUri);
            string doc = Path.GetFileName(filename);
            if (xsd == doc)
               realValidationType = "Auto.Schema.Inline";
            else
               realValidationType = "Auto.Schema.Ref (" + xsd + ")";
         }
      }
   }
   return realValidationType;
}

This code alone is not sufficient to produce the desired effect. It must be used in combination with the main parsing loop, as shown in the following code. The function should be called from within the loop as you read nodes, and at the end of the loop, you should check the results. If neither DTD nor XSD has been detected, the document can be validated only through XDR.

string valtype = "";
while(reader.Read()) 
{
   if (valtype == "")
       valtype = GetActualValidationType(reader, filename);
}

// No DTD, no XSD, so it must be XDR...
if (valtype == "" && reader.ValidationType==ValidationType.Auto)
   valtype = "Auto.XDR";

Figure 3-2 shows how the ValidateDocument application implements this feature.

Figure 3-2. The ValidateDocument application determines the type of validation occurring under the umbrella of the Auto validation type.


Although it’s easy to use, the Auto option is the most expensive of all in terms of performance because it must first figure out what type of validation to apply. Whenever possible, you should indicate explicitly the type of validation required.

Note

When the ValidationType property is set to None, the DTD-specific DOCTYPE node, if present, is not used for validation purposes. However, default attributes in the DTD are correctly reported. General entities are not automatically expanded but can be resolved using the ResolveEntity method.


Events vs. Exceptions

The typical way to detect validation errors is by means of a validation event handler. If a validation event handler is specified, no validation exception is ever raised. In practice, once the reader has found an error, it looks for an event handler. If a handler is found, the reader raises the event; otherwise, it throws an XmlSchemaException exception.

For the reader class, handling an exception is much more expensive than firing an event, so use the ValidationEventHandler event whenever possible and do not abuse exceptions. Using exceptions automatically stops the validation process after the first error. As shown in the section “Handling Validation Errors,” you can obtain the same behavior from the event by using a slightly smarter Boolean guard for the loop. Instead of using the following statement:

while(reader.Read());

you resort to this:

while(reader.Read() && !m_errorFound);

where the m_errorFound private member is updated in the body of the event handler according to what you want to do.
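
The difference between the two styles can be sketched side by side. In the following example, ErrorPolicyDemo and its methods are hypothetical names, and the documents are assumed to be in-memory strings with an internal DTD. The first method registers no handler and relies on the exception; the second registers a handler and lets the reader run to the end.

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.Schema;

public class ErrorPolicyDemo
{
    private int m_errorCount;

    private void OnValidation(object sender, ValidationEventArgs e)
    {
        if (e.Severity == XmlSeverityType.Error)
            m_errorCount++;
    }

    // Opens a DTD-validating reader over an in-memory document.
    private static XmlValidatingReader OpenDtdReader(string xml)
    {
        XmlValidatingReader reader = new XmlValidatingReader(
            new XmlTextReader(new StringReader(xml)));
        reader.ValidationType = ValidationType.DTD;
        return reader;
    }

    // No handler registered: the first validation error is expected
    // to surface as an XmlSchemaException and stop the loop.
    public static bool ThrowsWithoutHandler(string xml)
    {
        XmlValidatingReader reader = OpenDtdReader(xml);
        try
        {
            while (reader.Read()) {}
            return false;
        }
        catch (XmlSchemaException)
        {
            return true;
        }
        finally
        {
            reader.Close();
        }
    }

    // Handler registered: the reader keeps parsing and reports
    // every error it finds through the event.
    public static int CountErrorsWithHandler(string xml)
    {
        ErrorPolicyDemo demo = new ErrorPolicyDemo();
        XmlValidatingReader reader = OpenDtdReader(xml);
        reader.ValidationEventHandler +=
            new ValidationEventHandler(demo.OnValidation);
        try
        {
            while (reader.Read()) {}
        }
        finally
        {
            reader.Close();
        }
        return demo.m_errorCount;
    }
}
```

The catch block deliberately names XmlSchemaException rather than using a bare catch, so that a badly formed document, which raises XmlException, is not mistaken for a schema violation.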

A Word on XML DOM

So far, we’ve looked exclusively at how the validation process works for XML readers. But what about the XmlDocument class for XML Document Object Model (XML DOM) parsing? How can you validate against a schema while building an XML DOM? We’ll examine XML DOM classes in detail in Chapter 5, but for now a quick preview, limited to validation, is in order.

The XmlDocument class—the key .NET Framework class for XML DOM parsing—uses the Load method to parse the entire contents of a document into memory. The Load method does not validate the XML source code against a DTD or a schema, however—Load can only check whether the XML is well-formed.

If you want to validate the in-memory tree while building it, use the following overload for the XmlDocument class’s Load method:

public virtual void Load(XmlReader);

You can create an XML DOM from a variety of sources, including a stream, a text reader, and a file name. If you load the document through an XML validating reader, you hit your target and obtain a fully validated in-memory DOM, as shown here:

XmlTextReader _coreReader = new XmlTextReader(fileName);
XmlValidatingReader reader = new XmlValidatingReader(_coreReader);
XmlDocument doc = new XmlDocument();      
doc.Load(reader);

As you’ll see in Chapter 5, in the .NET Framework, an XML DOM is built using an internal reader. The programming interface of the XmlDocument class, however, in some cases allows you to specify the reader to use. If this reader happens to be a validating reader, you are automatically provided with a fully validated in-memory DOM.
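
The snippet shown earlier loads the DOM but registers no validation handler, so the first validation error would surface as an exception. A slightly fuller sketch, under the assumption that you want the tree built and the errors counted rather than thrown, might look like the following. The ValidatedDom class and its Load method are hypothetical names, and the document is again an in-memory string with an internal DTD.

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.Schema;

public class ValidatedDom
{
    private int m_errors;

    private void OnValidation(object sender, ValidationEventArgs e)
    {
        if (e.Severity == XmlSeverityType.Error)
            m_errors++;
    }

    // Builds an XML DOM through a validating reader and reports
    // how many validation errors were raised along the way.
    public static XmlDocument Load(string xml, out int errorCount)
    {
        ValidatedDom state = new ValidatedDom();
        XmlValidatingReader reader = new XmlValidatingReader(
            new XmlTextReader(new StringReader(xml)));
        reader.ValidationType = ValidationType.DTD;
        reader.ValidationEventHandler +=
            new ValidationEventHandler(state.OnValidation);

        XmlDocument doc = new XmlDocument();
        try
        {
            // Load pulls every node through the validating reader,
            // so validation happens while the tree is being built.
            doc.Load(reader);
        }
        finally
        {
            reader.Close();
        }
        errorCount = state.m_errors;
        return doc;
    }
}
```

With this approach the caller gets the in-memory tree even when the document is invalid and can decide what to do based on the error count.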
