Parsing XML Fragments

The XmlTextReader class provides the basic functionality needed to process XML data coming from a disk file, a stream, or a URL. This kind of reader works sequentially, reading one node after the next, and deliberately provides no ad hoc search function for parsing only a particular subtree.

In the .NET Framework, to process only fragments of XML data, excerpted from a variety of sources, you can take one of two routes. You can initialize the text reader with the XML string that represents the fragment, or you can use another, more specific, reader class—the XmlNodeReader class.

The XmlNodeReader class works on the subtree rooted in the XmlNode object passed to the class constructor. A live instance of an XmlNode object is not something you can obtain through a text reader, however. Only the .NET XML DOM parser can create and return an XmlNode object. We’ll examine the details of the XmlNodeReader class in Chapter 5, along with the .NET XML DOM parser.

If you have ever used Microsoft XML Core Services (MSXML), the Microsoft COM XML parser, you have certainly noticed that it allows you to initialize the parser from a well-formed XML string. However, the long list of constructors that the XmlTextReader class boasts gives no clear indication that the same feature is also available in the .NET Framework. In this section, you’ll learn how to parse XML data stored in a memory string. First I’ll show you how to work with plain strings with no context information, and then I’ll show you how to process XML fragments using specific context information for the parser, such as namespaces and document type declarations.

Parsing Well-Formed XML Strings

The trick to initializing a text reader from a string is all in packing the string into a StringReader object. One of the XmlTextReader constructors looks like this:

public XmlTextReader(TextReader);

TextReader is an abstract class that represents a .NET reader object capable of reading a sequence of characters no matter where they are physically stored. The StringReader class inherits from TextReader and reads the characters of an in-memory string. Because StringReader derives from TextReader, you can safely use it to initialize XmlTextReader.

string xmlText = "…";
StringReader strReader = new StringReader(xmlText);
XmlTextReader reader = new XmlTextReader(strReader);

The net effect of this code snippet is that the XML code stored in the xmlText variable is parsed exactly as if it were read from a disk file or an open stream or downloaded from a URL.
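
As a minimal sketch of the technique just described, the following program wraps a string in a StringReader, hands it to XmlTextReader, and collects the names of the element nodes it encounters. The class name StringParsingSketch, the ReadElementNames helper, and the sample XML are assumptions made for illustration only.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;

// Hypothetical helper class; not part of the .NET Framework.
public static class StringParsingSketch
{
    // Returns the names of all element nodes found in the XML string.
    public static List<string> ReadElementNames(string xmlText)
    {
        List<string> names = new List<string>();
        using (StringReader strReader = new StringReader(xmlText))
        {
            XmlTextReader reader = new XmlTextReader(strReader);
            // Read sequentially, one node after the next.
            while (reader.Read())
            {
                if (reader.NodeType == XmlNodeType.Element)
                    names.Add(reader.Name);
            }
            reader.Close();
        }
        return names;
    }

    public static void Main()
    {
        // Sample data invented for this sketch.
        string xmlText = "<book><title>Sample</title><author>Someone</author></book>";
        foreach (string name in ReadElementNames(xmlText))
            Console.WriteLine(name);
    }
}
```

The loop is the same one you would write for a file-based or URL-based reader; only the way the reader is constructed changes.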

Important

Classes based on TextReader are not thread-safe. Strings in .NET are immutable, so another thread cannot alter the XML text itself while it is being parsed; the risk lies in the reader object. If you have a multithreaded application and the same StringReader instance happens to be visible to several threads, one thread could advance or close the reader while another thread is parsing from it. Of course, this happens only under special conditions, but it is definitely a plausible scenario. To avoid this situation, create a thread-safe wrapper for the StringReader class using the TextReader class’s static member Synchronized, and pass the wrapper, not the original reader, to the XmlTextReader constructor, as shown here:

string xmlText = "…";
StringReader sr = new StringReader(xmlText);
TextReader strReader = TextReader.Synchronized(sr);
XmlTextReader reader = new XmlTextReader(strReader);

For performance reasons, you should use the thread-safe wrapper class only when strictly necessary. Even better, wherever possible, you should design your code to avoid the need for thread-safe classes.


Fragments and Parser Context

The context for an XML parser consists of all the information that can be used to customize the way in which the parser works. Context information includes the encoding character set, the DTD information needed to set all the default attributes and to expand entities, the namespaces, the language, and the white space handling.

If you specify the XML fragment using a StringReader object, as shown in the previous section, all elements of the parser context are set with default values. The parser context is fully defined by the XmlParserContext class. To supply your own context when the XmlTextReader class operates on a string, you use the following constructor, which takes the fragment itself and a parser context:

public XmlTextReader(
    string xmlFragment, 
    XmlNodeType fragType,
    XmlParserContext context
);

The xmlFragment parameter contains the XML string to parse. The fragType argument, on the other hand, represents the type of fragment. It specifies the type of the node at the root of the fragment. Only Element, Attribute, and Document nodes are permitted.

The XmlParserContext constructor has a few overloads. The one with the shortest list of arguments, shown here, is probably the overload you will use most often:

public XmlParserContext(
    XmlNameTable nt,
    XmlNamespaceManager nsMgr,
    string xmlLang,
    XmlSpace xmlSpace
);

Creating a new parser context is as easy as running the following statements:

NameTable table = new NameTable();
table.Add("Author");
XmlNamespaceManager mgr = new XmlNamespaceManager(table);
mgr.AddNamespace("company", "urn:ThisIsMyBook");
XmlParserContext context;
context = new XmlParserContext(table, mgr, "en-US", XmlSpace.None);

The first parameter to this XmlParserContext constructor is a NameTable object. The name table is used to look up prefixes and namespaces as atomized strings. For performance reasons, you also need to pass a NameTable object—which inherits from the abstract XmlNameTable class—when creating a new instance of a namespace manager class.

Note

The namespace manager and the parser context must share the same NameTable object. If you pass the XmlParserContext constructor a name table other than the one used to create the namespace manager, the constructor throws an XmlException.
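
The following sketch illustrates the mismatch described in this note by deliberately building the namespace manager and the parser context on two different name tables and reporting whether an XmlException is raised. The class and method names are hypothetical.

```csharp
using System;
using System.Xml;

// Hypothetical helper class written for this note.
public static class NameTableMismatchSketch
{
    // Returns true if constructing the context with a foreign
    // name table raises an XmlException.
    public static bool ThrowsOnMismatch()
    {
        NameTable managerTable = new NameTable();
        XmlNamespaceManager mgr = new XmlNamespaceManager(managerTable);

        // A second, unrelated name table.
        NameTable otherTable = new NameTable();
        try
        {
            new XmlParserContext(otherTable, mgr, "en-US", XmlSpace.None);
            return false;
        }
        catch (XmlException)
        {
            return true;
        }
    }

    public static void Main()
    {
        Console.WriteLine(ThrowsOnMismatch());
    }
}
```

Reusing one NameTable variable for both constructor calls, as in the earlier snippet, avoids the problem.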


The second parameter to the XmlParserContext constructor is an XmlNamespaceManager object. The XmlNamespaceManager class is a type of collection class designed to contain and manage namespace information. It provides methods to add, remove, and search for namespaces. Each namespace is stored as a pair made of a prefix and a namespace URI, both of which you pass to the AddNamespace method. If the prefix is an empty string, the namespace is considered to be the default.
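
The add, search, and remove operations can be sketched as follows. This is only an illustration: the class name, the BuildManager helper, and the urn:SampleDefault URI are invented for the example, while AddNamespace, LookupNamespace, LookupPrefix, RemoveNamespace, and DefaultNamespace are members of XmlNamespaceManager.

```csharp
using System;
using System.Xml;

// Hypothetical helper class; the URIs are sample values.
public static class NamespaceManagerSketch
{
    public static XmlNamespaceManager BuildManager()
    {
        XmlNamespaceManager mgr = new XmlNamespaceManager(new NameTable());
        // A prefixed namespace.
        mgr.AddNamespace("company", "urn:ThisIsMyBook");
        // An empty prefix registers the default namespace.
        mgr.AddNamespace(string.Empty, "urn:SampleDefault");
        return mgr;
    }

    public static void Main()
    {
        XmlNamespaceManager mgr = BuildManager();

        // Search by prefix and by URI.
        Console.WriteLine(mgr.LookupNamespace("company"));
        Console.WriteLine(mgr.LookupPrefix("urn:ThisIsMyBook"));
        Console.WriteLine(mgr.DefaultNamespace);

        // Remove the pair; the prefix no longer resolves afterward.
        mgr.RemoveNamespace("company", "urn:ThisIsMyBook");
        Console.WriteLine(mgr.LookupNamespace("company") == null);
    }
}
```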

The XmlParserContext class makes use of a namespace manager to collect all the namespaces that the fragment might use. A fragment is simply a small piece of XML code and, as such, is not expected to contain all namespace definitions that its nodes and attributes might use.
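
To make this concrete, here is a minimal sketch that parses a fragment whose company prefix is never declared in the fragment itself; the declaration comes from the parser context instead. The class name, the ResolveNamespace helper, and the sample fragment are assumptions introduced for the example.

```csharp
using System;
using System.Xml;

// Hypothetical helper class written for this example.
public static class FragmentContextSketch
{
    // Parses the fragment with a context that declares the "company"
    // prefix and returns the namespace URI of the root element.
    public static string ResolveNamespace(string fragment)
    {
        NameTable table = new NameTable();
        XmlNamespaceManager mgr = new XmlNamespaceManager(table);
        mgr.AddNamespace("company", "urn:ThisIsMyBook");
        XmlParserContext context =
            new XmlParserContext(table, mgr, "en-US", XmlSpace.None);

        XmlTextReader reader =
            new XmlTextReader(fragment, XmlNodeType.Element, context);
        try
        {
            // Position the reader on the fragment's root element.
            reader.MoveToContent();
            return reader.NamespaceURI;
        }
        finally
        {
            reader.Close();
        }
    }

    public static void Main()
    {
        // The fragment uses the prefix but does not declare it.
        string fragment = "<company:Author>Dino</company:Author>";
        Console.WriteLine(ResolveNamespace(fragment));
    }
}
```

The reader resolves the prefix through the namespace manager held by the context, so the fragment does not need its own xmlns:company attribute.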

When a namespace manager is created, the class constructor automatically adds a couple of frequently used prefixes. These prefixes are listed in Table 2-5.

Table 2-5. Standard Namespace Prefixes Added to XmlNamespaceManager
Prefix    Corresponding Namespace
xmlns     http://www.w3.org/2000/xmlns/
xml       http://www.w3.org/XML/1998/namespace

A third prefix that the constructor registers is the empty string, which maps to the empty namespace rather than to a namespace URI. Thanks to this contrivance, you don’t need to create a namespace manager instance to parse XML fragments unless nodes and attributes really use custom namespaces. Note that the namespace manager does not verify that the prefixes and namespaces you add conform to the W3C Namespaces specification; it assumes they are valid and leaves that check to the reader.
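
You can inspect the predefined prefixes yourself with a sketch like the following, which creates an empty namespace manager and prints what it reports for the xml and xmlns prefixes. The class name and the Lookup helper are hypothetical.

```csharp
using System;
using System.Xml;

// Hypothetical helper class written for this example.
public static class BuiltInPrefixesSketch
{
    // Looks up a prefix in a freshly created namespace manager,
    // to which no namespace has been added explicitly.
    public static string Lookup(string prefix)
    {
        XmlNamespaceManager mgr = new XmlNamespaceManager(new NameTable());
        return mgr.LookupNamespace(prefix);
    }

    public static void Main()
    {
        Console.WriteLine("xml   -> " + Lookup("xml"));
        Console.WriteLine("xmlns -> " + Lookup("xmlns"));
    }
}
```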

As mentioned in the section “The NameTable Object,” on page 49, the namespace names are atomized and placed in the related NameTable object as soon as they are added to the collection. When you call the XML reader’s LookupNamespace method to search for the namespace that matches the specified prefix, the prefix string is atomized and added to the name table so that subsequent lookups are faster.

Any namespace declaration has a clear and well-defined scope. The namespace declaration can appear anywhere in the document, not just at the very beginning of it. The place in the source where the declaration appears determines the scope. A declaration covers the element in which it appears and all of that element’s descendants. In the following example, the dinoe prefix is in scope for the <author> node and all of its descendants:

<some_parent_node>
   ⋮
   <author xmlns:dinoe="http://www.dinoe.com">
      <firstname>Dino</firstname>
      <lastname>Esposito</lastname>
      <royalty>99</royalty>
   </author>
   ⋮
</some_parent_node>

The namespace defined for the <author> element does not apply to elements outside that element. The namespace is effective from its point of declaration until the end of the element. After that, any other node not qualified with a namespace prefix is assumed to belong to whichever default namespace has been declared in the document.
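
The scoping rule can be observed with a short sketch that asks the reader, at each element, whether a given prefix is currently declared. The class name, the TraceScope helper, and the <root> and <publisher> elements in the sample are assumptions added for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;

// Hypothetical helper class written for this example.
public static class NamespaceScopeSketch
{
    // For each element, records whether the prefix resolves to a namespace.
    public static List<string> TraceScope(string xml, string prefix)
    {
        List<string> result = new List<string>();
        XmlTextReader reader = new XmlTextReader(new StringReader(xml));
        while (reader.Read())
        {
            if (reader.NodeType == XmlNodeType.Element)
            {
                string ns = reader.LookupNamespace(prefix);
                result.Add(reader.Name + ": " + (ns ?? "(not in scope)"));
            }
        }
        reader.Close();
        return result;
    }

    public static void Main()
    {
        // Sample document invented for this sketch.
        string xml =
            "<root>" +
            "<author xmlns:dinoe='http://www.dinoe.com'>" +
            "<firstname>Dino</firstname>" +
            "</author>" +
            "<publisher/>" +
            "</root>";
        foreach (string line in TraceScope(xml, "dinoe"))
            Console.WriteLine(line);
    }
}
```

The prefix resolves on <author> and <firstname> but not on <root> or <publisher>, which lie outside the element that carries the declaration.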

You can specify other settings for the parser context using the properties of the XmlParserContext class, including Encoding, BaseURI, and DocTypeName. In particular, BaseURI is especially useful because it indicates the location from which the fragment was loaded.
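
As a rough sketch, the extra settings can be assigned after the context has been created and before it is handed to the reader. The class name, the BuildContext helper, the sample fragment, and the http://www.example.com/fragments/ location are hypothetical values chosen for the example; DocTypeName can be assigned in the same way when the fragment relies on a document type declaration.

```csharp
using System;
using System.Text;
using System.Xml;

// Hypothetical helper class; the base URI is a sample value.
public static class ContextSettingsSketch
{
    public static XmlParserContext BuildContext()
    {
        NameTable table = new NameTable();
        XmlNamespaceManager mgr = new XmlNamespaceManager(table);
        XmlParserContext context =
            new XmlParserContext(table, mgr, "en-US", XmlSpace.None);

        // Record where the fragment is assumed to come from.
        context.BaseURI = "http://www.example.com/fragments/";
        // Declare the encoding associated with the fragment.
        context.Encoding = Encoding.UTF8;
        return context;
    }

    public static void Main()
    {
        XmlParserContext context = BuildContext();
        XmlTextReader reader = new XmlTextReader(
            "<Author>Dino</Author>", XmlNodeType.Element, context);
        reader.MoveToContent();

        // Show the base URI that the reader reports for the fragment.
        Console.WriteLine(reader.BaseURI);
        reader.Close();
    }
}
```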
