Under the Hood of the Validation Process

Before going any further with the details of DTD, XDR, and XSD validation, let’s review what happens under the hood of the validation process and how the XmlValidatingReader class really operates.

As mentioned, a validating reader works on top of a less-specialized reader, typically an XML text reader. You initialize the validating reader simply by passing a reference to this object. Upon initialization, the validating reader copies a few settings from the underlying reader. In particular, the properties BaseURI, Normalization, and WhiteSpaceHandling get the same values as the underlying reader. During the initialization step, an internal validator object is created to manage the schema information on a per-node basis.

Important

Although one of the XmlValidatingReader constructors takes an instance of the XmlReader class as its parameter, that reader must actually be an instance of the XmlTextReader class or of a class that derives from it. You can’t use just any class that happens to inherit from XmlReader (for example, a custom XML reader). Internally, the XmlValidatingReader class assumes that the underlying reader is an XmlTextReader object and specifically casts the input reader to XmlTextReader. If you use XmlNodeReader or a custom reader class, you will not get an error at compile time, but an exception will be thrown at run time.
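
The restriction can be observed with a small probe. The following is a minimal sketch, not part of the original text: the class and method names are hypothetical, and it assumes the run-time failure surfaces as an ArgumentException, which is the exception the class documentation lists for this constructor.

```csharp
#pragma warning disable 0618 // XmlValidatingReader is flagged obsolete in later .NET versions
using System;
using System.IO;
using System.Xml;

// Hypothetical helper class, introduced only for illustration.
public static class UnderlyingReaderSketch
{
    // Returns true if XmlValidatingReader accepts the given reader as its
    // underlying reader, false if the constructor rejects it.
    public static bool IsAccepted(XmlReader inner)
    {
        try
        {
            XmlValidatingReader validator = new XmlValidatingReader(inner);
            validator.Close();
            return true;
        }
        catch (ArgumentException)
        {
            // Assumption: the rejection is reported as ArgumentException.
            return false;
        }
    }

    // Builds an XmlTextReader over an in-memory XML string.
    public static XmlReader TextReaderOver(string xml)
    {
        return new XmlTextReader(new StringReader(xml));
    }

    // Builds an XmlNodeReader over a DOM loaded from the same string.
    public static XmlReader NodeReaderOver(string xml)
    {
        XmlDocument doc = new XmlDocument();
        doc.LoadXml(xml);
        return new XmlNodeReader(doc);
    }
}
```

Calling IsAccepted with the result of TextReaderOver should succeed, whereas passing the result of NodeReaderOver compiles without complaint and fails only when the constructor runs.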


Incremental Parsing

The validation takes place as the user moves the pointer forward using the Read method. After the node has been parsed and read, it is passed on to the internal validator object for further processing. The validator object operates based on the node type and the validation type requested. The validator object makes sure that the node has all the attributes and children it is expected to have.
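
To make the incremental behavior concrete, here is a minimal sketch that validates a document against an inline DTD while the caller loops on Read. The class name, the method name, and the sample DTD are assumptions made for illustration; only the reader classes and properties come from the text.

```csharp
#pragma warning disable 0618 // XmlValidatingReader is flagged obsolete in later .NET versions
using System;
using System.IO;
using System.Xml;
using System.Xml.Schema;

// Hypothetical helper class, introduced only for illustration.
public static class IncrementalValidationSketch
{
    // Sample internal DTD subset: <root> must contain exactly one <item>.
    public const string Dtd =
        "<!DOCTYPE root [<!ELEMENT root (item)><!ELEMENT item (#PCDATA)>]>";

    // Reads the whole document node by node and counts validation errors.
    public static int CountErrors(string xml)
    {
        int errors = 0;
        XmlTextReader coreReader = new XmlTextReader(new StringReader(xml));
        XmlValidatingReader reader = new XmlValidatingReader(coreReader);
        reader.ValidationType = ValidationType.DTD;

        // With a handler attached, errors are reported here instead of
        // being thrown as exceptions.
        reader.ValidationEventHandler += delegate(object sender, ValidationEventArgs e)
        {
            if (e.Severity == XmlSeverityType.Error) errors++;
        };

        // Each Read call parses one node and hands it to the validator.
        while (reader.Read()) { }
        reader.Close();
        return errors;
    }
}
```

Because validation is driven by Read, an error in the middle of a document is reported only when the loop reaches the offending node; nothing is checked ahead of the reader's current position.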

The validator object internally invokes two flavors of objects: the DTD parser and the schema builder. The DTD parser processes the contents of the current node and its subtree against the DTD. The schema builder builds a Schema Object Model (SOM) for the current node based on the XDR or XSD schema source code. The schema builder class is actually the base class for more specialized XDR and XSD schema builders. What matters, though, is that XDR and XSD schemas are treated in much the same way and with no difference in performance.

If a node has children, another temporary reader is used to read its XML subtree in such a way that the schema information for the node can be fully investigated. The overall diagram is shown in Figure 3-3.

In general, an XML reader might or might not resolve entities, but an XML validating reader always does so. The EntityHandling property defines how entities are handled. The EntityHandling property can take one of two values defined in the EntityHandling enumeration, as described in Table 3-4.

Figure 3-3. The validating reader coordinates the efforts of the internal reader, the validator, and the event handler.


Table 3-4. Ways to Handle Entities
Value               Description
ExpandCharEntities  Expands character entities and returns general entities as EntityReference nodes. You must then call the ResolveEntity method to expand a general entity.
ExpandEntities      Default setting; expands all entities and replaces them with their underlying text.

A character entity is an XML entity that evaluates to a character and is expressed through the character’s decimal or hexadecimal representation. For example, &#65; (or &#x41; in hexadecimal) expands to A. Character entities are mostly used to guarantee the well-formedness of the overall document when this is potentially broken by that character.

A general entity is a normal XML entity that can expand to a string of any size, including a single character. A general entity is always expressed through text, even when it refers to a single character.

By default, the reader makes no distinction between the types of entities and expands them all when needed. By setting the EntityHandling property to ExpandCharEntities, however, you can optimize entity handling by expanding the general entities only when required. In this case, a call to Read expands only character entities. To expand general entities, you must resort to the ResolveEntity method or to GetAttribute, if the entity is part of an attribute.

The EntityHandling property can be changed on the fly; the new value takes effect when the next call to Read is made.
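
The following sketch illustrates the ExpandCharEntities setting under a few assumptions: the class name, the method name, and the sample document are hypothetical, and validation is switched off (ValidationType.None) so that only entity handling is exercised.

```csharp
#pragma warning disable 0618 // XmlValidatingReader is flagged obsolete in later .NET versions
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;

// Hypothetical helper class, introduced only for illustration.
public static class EntityHandlingSketch
{
    // Sample document: one general entity (&company;) and one character entity (&#65;).
    public const string Sample =
        "<!DOCTYPE doc [<!ENTITY company \"Contoso\">]><doc>&company; &#65;</doc>";

    // Returns the names of the general entities that Read left unexpanded.
    public static List<string> GeneralEntityNames(string xml)
    {
        List<string> names = new List<string>();
        XmlValidatingReader reader =
            new XmlValidatingReader(new XmlTextReader(new StringReader(xml)));
        reader.ValidationType = ValidationType.None;
        reader.EntityHandling = EntityHandling.ExpandCharEntities;

        while (reader.Read())
        {
            if (reader.NodeType == XmlNodeType.EntityReference)
            {
                names.Add(reader.Name);
                // Expand the general entity on demand; the following Read
                // calls then walk through its replacement text.
                reader.ResolveEntity();
            }
        }
        reader.Close();
        return names;
    }
}
```

With the default ExpandEntities setting, the same loop would never see an EntityReference node, because every entity is replaced by its text before the node is returned.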

A Cache for Schemas

In the validating reader class, the Schemas property represents a collection—that is, an instance of the XmlSchemaCollection class—in which you can store one or more schemas that you plan to use later for validation. Using the schema collection improves overall performance because the various schemas are held in memory and don’t need to be loaded each and every time validation occurs. You can add as many XSD and XDR schemas as you want, but bear in mind that the collection must be completed before the first Read call is made.

To add a new schema to the cache, you use the Add method of the XmlSchemaCollection object. The method has a few overloads, as follows:

public void Add(XmlSchemaCollection);
public XmlSchema Add(XmlSchema);
public XmlSchema Add(string, string);
public XmlSchema Add(string, XmlReader);

The first overload populates the current collection with all the schemas defined in the given collection. The remaining three overloads build from different data and return an instance of the XmlSchema class—the .NET Framework class that contains the definition of an XSD schema.

Populating the Schema Collection

The schema collection actually consists of instances of the XmlSchema class—a kind of compiled version of the schema. The various overloads of the Add method allow you to create an XmlSchema object from a variety of input arguments. For example, consider the following method:

public XmlSchema Add(
   string ns,
   string url
);

This method creates and adds a new schema object to the collection.

The compiled schema object is created using the namespace URI associated with the schema and the URL of the source. For example, let’s assume that you have a clients.xsd file that begins as follows:

<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
   xmlns="urn:my-company"
   elementFormDefault="qualified"
   targetNamespace="urn:my-company">

The corresponding Add statement to insert the schema into the collection looks like this:

XmlTextReader _coreReader = new XmlTextReader(file);
XmlValidatingReader reader = new XmlValidatingReader(_coreReader);
reader.Schemas.Add("urn:my-company", "clients.xsd");

While validating, the XmlValidatingReader class identifies the schema to use for a given XML source document by matching the document’s namespace URI with the namespace URIs available in the collection. For an XDR schema, the item to match in the schema collection is the contents of the xmlns attribute. For an XSD schema, the targetNamespace attribute in the XSD source code is used.

When you add a new schema to the collection and the namespace URI argument (the first argument) is null or empty, the Add method automatically brings in the value of the xmlns attribute if the source file is an XDR schema and the value of the targetNamespace attribute if you are adding an XSD schema, as shown here:

XmlTextReader _coreReader = new XmlTextReader(file);
XmlValidatingReader reader = new XmlValidatingReader(_coreReader);
reader.Schemas.Add(null, "Clients.xsd");
reader.ValidationType = ValidationType.Schema;
reader.ValidationEventHandler += new ValidationEventHandler(MyHandler);

If the namespace URI you use already exists in the schema collection, the schema being added replaces the original one.

If necessary, you could also load the schema from an XML reader object by using the overload shown here:

public XmlSchema Add(
   string ns,
   XmlReader reader
);

Note

You can check whether a schema is already in the schema collection by using the Contains method. The Contains method can take either an XmlSchema object or a string representing the namespace URI associated with the schema. The former approach works only for XSD schemas. The latter covers both XSD and XDR schemas.


Different Treatments for XSD and XDR

Although you can store both XSD and XDR schemas in the schema collection, there are some differences in the way in which the XmlSchemaCollection object handles them internally. For example, the Add method returns an XmlSchema object if you add an XSD schema but returns null if the added schema is an XDR. In general, any method or property that manipulates the input or output of an XmlSchema object supports XSD schemas only.

Another difference concerns the behavior of the Item property in the XmlSchemaCollection class. The Item property takes a string representing the schema’s namespace URI and returns the corresponding XmlSchema object. This happens only for XSDs, however. If you call the Item property on a namespace URI that corresponds to an XDR schema, null is returned.

The reason behind the different treatments for XDR and XSD schemas is that XDR schemas have no object model available in the .NET Framework, so when you need to handle them through objects, the system gracefully ignores the requests.
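
For XSD schemas, the lookups described above can be sketched as follows. The class name, the method name, and the one-element schema are assumptions made for illustration; the sketch covers the XSD case only, since no XDR source appears in the text.

```csharp
#pragma warning disable 0618 // XmlSchemaCollection is flagged obsolete in later .NET versions
using System;
using System.IO;
using System.Xml;
using System.Xml.Schema;

// Hypothetical helper class, introduced only for illustration.
public static class SchemaLookupSketch
{
    public const string Ns = "urn:my-company";

    // Minimal XSD with the same namespace attributes as clients.xsd.
    public const string Xsd =
        "<xsd:schema xmlns:xsd='http://www.w3.org/2001/XMLSchema' " +
        "xmlns='urn:my-company' elementFormDefault='qualified' " +
        "targetNamespace='urn:my-company'>" +
        "<xsd:element name='clients' type='xsd:string'/>" +
        "</xsd:schema>";

    // Creates a collection that holds the sample XSD under its namespace URI.
    public static XmlSchemaCollection BuildCache()
    {
        XmlSchemaCollection cache = new XmlSchemaCollection();
        cache.Add(Ns, new XmlTextReader(new StringReader(Xsd)));
        return cache;
    }
}
```

Given a collection built this way, the indexer (the Item property) returns the XmlSchema object for "urn:my-company", Contains accepts either that object or the namespace string, and a namespace that was never added yields null from the indexer.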

XDR schemas are there only to preserve backward compatibility; you will not find them supported outside the Microsoft Win32 platform. It is important to pay attention to the methods and the properties you use to manage XDR in your code. The overall programming interface makes the effort to unify the methods and the properties to work on both XDRs and XSDs. But in some circumstances, those same methods and properties might lead to unpleasant surprises.

In a nutshell, you can cache an XDR schema for further and repeated use by the XmlValidatingReader class, but that’s about all that you can do. You can’t check for the existence of an XDR schema by passing a schema object, nor can a reference to an XDR schema be returned; only the check by namespace URI works for XDRs. But you can do all this, and more, for XSDs.

Important

The XmlSchemaCollection object is important to improving the overall performance of the validation process. If you are validating more than one document against the same schema (XDR or XSD), preload the schema in the reader’s internal cache, represented by the Schemas property. While doing so, bear in mind that any insertion in the schema collection must be done prior to starting the validation process. You can add to the schema collection only when the reader’s state is set to Initial.
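
As a rough illustration of this advice, the sketch below loads a schema once and reuses the collection for each document to validate. The class and method names and the sample schema are hypothetical; the collection is copied into the reader's Schemas property before the first Read call.

```csharp
#pragma warning disable 0618 // XmlValidatingReader and XmlSchemaCollection are flagged obsolete in later .NET versions
using System;
using System.IO;
using System.Xml;
using System.Xml.Schema;

// Hypothetical helper class, introduced only for illustration.
public static class SchemaCacheSketch
{
    // Sample XSD: <clients> holds one or more <client> string elements.
    public const string Xsd =
        "<xsd:schema xmlns:xsd='http://www.w3.org/2001/XMLSchema' " +
        "xmlns='urn:my-company' elementFormDefault='qualified' " +
        "targetNamespace='urn:my-company'>" +
        "<xsd:element name='clients'><xsd:complexType><xsd:sequence>" +
        "<xsd:element name='client' type='xsd:string' maxOccurs='unbounded'/>" +
        "</xsd:sequence></xsd:complexType></xsd:element>" +
        "</xsd:schema>";

    // Compiles the schema once so that it can be shared by many readers.
    public static XmlSchemaCollection BuildCache()
    {
        XmlSchemaCollection cache = new XmlSchemaCollection();
        cache.Add("urn:my-company", new XmlTextReader(new StringReader(Xsd)));
        return cache;
    }

    // Validates one document against the cached schemas and counts the errors.
    public static int CountErrors(XmlSchemaCollection cache, string xml)
    {
        int errors = 0;
        XmlValidatingReader reader =
            new XmlValidatingReader(new XmlTextReader(new StringReader(xml)));
        reader.Schemas.Add(cache); // must happen while the reader is still in its initial state
        reader.ValidationType = ValidationType.Schema;
        reader.ValidationEventHandler += delegate(object sender, ValidationEventArgs e)
        {
            if (e.Severity == XmlSeverityType.Error) errors++;
        };
        while (reader.Read()) { }
        reader.Close();
        return errors;
    }
}
```

A caller would invoke BuildCache once and then pass the same collection to CountErrors for every document, so the schema source is parsed only one time.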


Validating XML Fragments

As mentioned, the XmlValidatingReader class has the ability to parse and validate entire documents as well as XML fragments. To parse an XML fragment, you must resort to one of the other two constructors that the XmlValidatingReader class kindly provides, as shown here:

public XmlValidatingReader(Stream, XmlNodeType, XmlParserContext);
public XmlValidatingReader(string, XmlNodeType, XmlParserContext);

These constructors allow you to read XML fragments from a stream or a memory string and process them within the boundaries of a given parser context.

To bypass the root level rule for well-formed XML documents, you explicitly indicate what type of node the fragment happens to be. The node types for XML fragments are listed in Table 3-5.

Table 3-5. XML Fragment Node Types
Type       Fragment Contents
Attribute  The value of an attribute, including entities.
Document   An entire XML document in which all the rules of well-formedness apply, including the root level rules.
Element    Any valid element contents, including a combination of elements, comments, processing instructions, CDATA, and text. Root level rules are not enforced.

If you use any other member of the XmlNodeType enumeration, an exception is thrown. Entity references that are found in the element or the attribute body are expanded according to the value of the EntityHandling property.

When parsing a small XML fragment, you might need to take in extra information that can be used to resolve entities and add default attributes. For this purpose, you use the XmlParserContext class. (See Chapter 2 for more information about the XmlParserContext class.) The XmlParserContext argument of the XmlValidatingReader constructor is required if the requested validation mode is DTD or Auto. In this case, in fact, the parser context is expected to contain the reference to the DTD file against which the validation must be done. An exception is thrown if the ValidationType property is set to DTD and the XmlParserContext argument does not contain any DTD properties.

For all other validation types, the XmlParserContext argument can be specified without any DTD properties. Any schemas (XSDs or XDRs) used to validate the XML fragment must be referenced directly inside the XML fragment. When the validation is against schemas, the XmlParserContext argument is used primarily to provide information about namespace resolution.
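
The sketch below shows the fragment-based constructor with a parser context that carries only a name table and a namespace manager. The class and method names are hypothetical, and validation is switched off (ValidationType.None) so that the example isolates how a fragment with several top-level elements is read; it does not demonstrate schema validation of the fragment.

```csharp
#pragma warning disable 0618 // XmlValidatingReader is flagged obsolete in later .NET versions
using System;
using System.Xml;

// Hypothetical helper class, introduced only for illustration.
public static class FragmentSketch
{
    // Counts the element nodes in an XML fragment that may have several roots.
    public static int CountElements(string fragment)
    {
        // Parser context without DTD properties: enough for namespace resolution.
        NameTable nameTable = new NameTable();
        XmlNamespaceManager namespaces = new XmlNamespaceManager(nameTable);
        XmlParserContext context =
            new XmlParserContext(nameTable, namespaces, null, XmlSpace.None);

        // XmlNodeType.Element relaxes the single-root rule for the fragment.
        XmlValidatingReader reader =
            new XmlValidatingReader(fragment, XmlNodeType.Element, context);
        reader.ValidationType = ValidationType.None;

        int count = 0;
        while (reader.Read())
        {
            if (reader.NodeType == XmlNodeType.Element) count++;
        }
        reader.Close();
        return count;
    }
}
```

For example, the fragment `<a>1</a><b>2</b>` is not a well-formed document because it has two top-level elements, yet it can be read this way when the fragment type is Element.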

Important

As mentioned, the XmlValidatingReader always works on top of an XML text reader and uses it to move around the nodes to validate. When you validate an XML fragment, however, you are not required to indicate a reader. So does the validating reader support a dual internal architecture to handle both cases? The fact that you don’t have to pass an XML text reader to validate an XML fragment does not mean that a text reader can’t be playing around in your code. Internally, both fragment-based constructors create a temporary text reader as their first task. The following pseudocode shows what happens:

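// Pseudocode: first wrap the fragment in a temporary text reader...
XmlTextReader coreReader = new XmlTextReader(xml, type, context);
// ...then initialize the validating reader on top of it
this = new XmlValidatingReader(coreReader);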


At this point, the internal mechanisms of an XML validating reader and its programming interface should be clear. In the remainder of this chapter, we’ll examine in more detail the three key types of validation—DTD, XDR, and XSD.
