The XML DOM Programming Interface

The central element in the .NET XML DOM implementation is the XmlDocument class. The XmlDocument class represents an XML document and makes it programmable by exposing its nodes and attributes through ad hoc collections. Let’s consider a simple XML document:

<MyDataSet>
    <NorthwindEmployees count="3">
        <Employee>
            <employeeid>1</employeeid>
            <firstname>Nancy</firstname>
            <lastname>Davolio</lastname>
        </Employee>
        <Employee>
            <employeeid>2</employeeid>
            <firstname>Andrew</firstname>
            <lastname>Fuller</lastname>
        </Employee>
        <Employee>
            <employeeid>3</employeeid>
            <firstname>Janet</firstname>
            <lastname>Leverling</lastname>
        </Employee>
    </NorthwindEmployees>
</MyDataSet>

When processed by an instance of the XmlDocument class, this file creates a tree like the one shown in Figure 5-1.

Figure 5-1. Graphical representation of an XML DOM tree.


The XmlDocument class represents the entry point in the binary structure and the central console that lets you move through nodes, reading and writing contents. Each element in the original XML document is mapped to a particular .NET Framework class with its own set of properties and methods. Each element can be reached from its parent and can access all of its children and siblings. Element-specific information, such as contents and attributes, is available via properties.

Any change you enter is applied immediately, but only in memory. The XmlDocument class does provide an I/O interface to load from, and save to, a variety of storage media, including disk files. Subsequently, all the changes to constituent elements of an XML DOM tree are normally persisted all at once.
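
The edit-in-memory, persist-on-demand behavior can be sketched as follows. This is a minimal illustration, not code from the original text: the SaveDemo class and its EditAndSave method are made-up names, and a StringWriter stands in for the storage medium so that the sample needs no file on disk.

```csharp
using System;
using System.IO;
using System.Xml;

public static class SaveDemo
{
    // Edits a node in memory, then persists the whole tree with one Save call.
    public static string EditAndSave()
    {
        XmlDocument doc = new XmlDocument();
        doc.LoadXml("<Employee><lastname>Davolio</lastname></Employee>");

        // The change is applied to the in-memory tree right away...
        doc.DocumentElement.FirstChild.InnerText = "Fuller";

        // ...but it reaches a storage medium only when Save is called.
        StringWriter storage = new StringWriter();
        doc.Save(storage);
        return storage.ToString();
    }

    public static void Main()
    {
        Console.WriteLine(EditAndSave());
    }
}
```

Passing a file name to Save instead of a writer would persist the same tree to disk.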

Note

The W3C DOM Level 1 Core and Level 2 Core do not yet mandate an official API for serializing documents to and from XML format. Such an API will come only with the DOM Level 3 specification, which at this time is only a working draft.


Before we look at the key tasks you might want to accomplish using the XML DOM programming interface, let’s review the tools that this interface provides. In particular, we’ll focus here on two major classes—the XmlDocument class and the XmlNode class. A third class, XmlDataDocument, that is tightly coupled with XML DOM in general, and XmlDocument in particular, will be covered in Chapter 8. XmlDataDocument represents the connecting link between the hierarchical world of XML and the relational world of ADO.NET DataSet objects.

The XmlDocument Class

When you need to load an XML document into memory for full-access processing, you start by creating a new instance of the XmlDocument class. The class features two public constructors, one of which is the default parameterless constructor, as shown here:

public XmlDocument();
public XmlDocument(XmlNameTable);

While initializing the XmlDocument class, you can also specify an existing XmlNameTable object to help the class work faster with attribute and node names and optimize memory management. Just as the XmlReader class does, XmlDocument builds its own name table incrementally while processing the document. However, passing a precompiled name table can only speed up the overall execution. The following code snippet demonstrates how to load an XML document into a live instance of the XmlDocument class:

XmlDocument doc = new XmlDocument();
doc.Load(fileName);

The Load method always works synchronously, so when it returns, the document has been completely (and successfully, we hope) mapped to memory and is ready for further processing through the properties and methods exposed by the class. As you’ll see in a bit more detail later in this section, the XmlDocument class uses an XML reader internally to perform any read operation and to build the final tree structure for the source document.
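
As a small sketch of what is available once loading returns, the following program parses a shortened version of the sample document and walks from the root element down to the employee list. It uses LoadXml on a string rather than Load on a file so that it is self-contained. The LoadDemo class, the SampleXml constant, and the CountEmployees method are illustrative names only.

```csharp
using System;
using System.Xml;

public static class LoadDemo
{
    // The sample document from the start of this section, shortened to two employees.
    public const string SampleXml =
        "<MyDataSet><NorthwindEmployees count=\"2\">" +
        "<Employee><employeeid>1</employeeid><lastname>Davolio</lastname></Employee>" +
        "<Employee><employeeid>2</employeeid><lastname>Fuller</lastname></Employee>" +
        "</NorthwindEmployees></MyDataSet>";

    // Loads the XML text and returns the number of Employee children.
    public static int CountEmployees(string xml)
    {
        XmlDocument doc = new XmlDocument();
        doc.LoadXml(xml);                       // synchronous: the tree is complete on return
        XmlElement root = doc.DocumentElement;  // <MyDataSet>
        XmlNode employees = root.FirstChild;    // <NorthwindEmployees>
        return employees.ChildNodes.Count;
    }

    public static void Main()
    {
        Console.WriteLine(CountEmployees(SampleXml));
    }
}
```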

Note

In spite of what the beginning of this chapter might suggest, the XmlDocument class is just the logical root class of the XML DOM class hierarchy. The XmlDocument class actually inherits from the XmlNode class and is placed at the same level as classes like XmlElement, XmlAttribute, and XmlEntity that you manipulate as child elements when processing an XML document. In other words, XmlDocument is not designed as a wrapper class for XML node classes. Its design follows the XML key guideline, according to which everything in a document is a node, including the document itself.


Properties of the XmlDocument Class

Table 5-1 lists the properties supported by the XmlDocument class. The table includes only the properties that the class introduces or overrides. These properties are specific to the XmlDocument class or have a class-specific implementation. More properties are available through the base class XmlNode, which we’ll examine in more detail in the section “The XmlNode Base Class,” on page 213.

Note

In Table 5-1, you’ll find the description of the property for a special type of XML node—the XmlNodeType.Document node. In some instances, this same property is shared with other nodes, in which case it behaves in a slightly different manner. So read this table with a grain of salt and replace the word document with the more generic word node when appropriate. For example, the OwnerDocument property returns null if the node is Document but returns the owner XmlDocument object in all other cases. Similarly, both Name and LocalName always return #document for XmlDocument, but they actually represent the qualified and simple (namespace-less) name of the particular node.


Table 5-1. Properties of the XmlDocument Class
Property Description
BaseURI Gets the base URI of the document (for example, the file path).
DocumentElement Gets the root of the document as an XmlElement object.
DocumentType Gets the node with the DOCTYPE declaration (if any).
Implementation Gets the XmlImplementation object for the document.
InnerXml Gets or sets the markup representing the body of the document.
IsReadOnly Indicates whether the document is read-only.
LocalName Returns the string #document.
Name Returns the string #document.
NameTable Gets the NameTable object associated with this implementation of the XmlDocument class.
NodeType Returns the value XmlNodeType.Document.
OwnerDocument Returns null. The XmlDocument object is not owned.
PreserveWhitespace Gets or sets a Boolean value indicating whether to preserve white space during the load and save process. Set to false by default.
XmlResolver Write-only property that specifies the XmlResolver object to use for resolving external resources. Set to null by default.

By default, the PreserveWhitespace property is set to false, which indicates that only significant white spaces will be preserved while the document is loaded. A significant white space is any white space found between markup in a mixed-contents node or any white space found within the subtree affected by the following declaration:

xml:space="preserve"

All spaces are preserved throughout the document if PreserveWhitespace is set to true before the Load method is called. As for writing, if PreserveWhitespace is set to true when the Save method is called, all spaces are preserved in the output. Otherwise, the serialized output is automatically indented. This behavior represents a proprietary extension over the standard DOM specification.
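
The effect of PreserveWhitespace at load time can be observed by counting the children of an element whose content is indented. The following is a minimal sketch; WhitespaceDemo and CountRootChildren are hypothetical names, and the tiny document is invented for the purpose.

```csharp
using System;
using System.Xml;

public static class WhitespaceDemo
{
    // Counts the child nodes of the root element after loading with the given setting.
    public static int CountRootChildren(bool preserve)
    {
        XmlDocument doc = new XmlDocument();
        doc.PreserveWhitespace = preserve;   // must be set before the document is loaded
        doc.LoadXml("<root>\n  <a/>\n  <b/>\n</root>");
        return doc.DocumentElement.ChildNodes.Count;
    }

    public static void Main()
    {
        // With the default setting only the two elements become child nodes.
        Console.WriteLine(CountRootChildren(false));
        // With the property set to true, the white space between tags yields extra nodes.
        Console.WriteLine(CountRootChildren(true));
    }
}
```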

The XmlDocument Implementation

The Implementation property of the XmlDocument class defines the operating context for the document object. Implementation returns an instance of the XmlImplementation class, which provides methods for performing operations that are independent of any particular instance of the DOM.

In the base implementation of the XmlImplementation class, the list of operations that various instances of XmlDocument classes can share is relatively short. These operations include creating new documents, testing for supported features, and more important, sharing the same name table.

The XmlImplementation class is not sealed, so you could try to define a custom implementation object and use that to create new XmlDocument objects with some nonstandard settings (for example, PreserveWhitespace set to true by default). The following code snippet shows how to create two documents from the same implementation:

XmlImplementation imp = new XmlImplementation();
XmlDocument doc1 = imp.CreateDocument();
XmlDocument doc2 = imp.CreateDocument();

The following code shows how a document could be created from a custom implementation object (here, a hypothetical MyImplementation class derived from XmlImplementation):

MyImplementation imp = new MyImplementation();
XmlDocument doc = imp.CreateDocument();

In the section “Custom Node Classes,” on page 234, when we examine XML DOM extensions, I’ll have more to say about custom implementations.

Note

Two instances of XmlDocument can share the same implementation when the implementation is custom. Actually, all instances of XmlDocument share the same standard XmlImplementation object. Sharing the same implementation does not mean that the two objects are each other’s clone, however. The XML implementation is a kind of common runtime that services both objects.
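
A quick way to check what two documents created from one implementation have in common is to compare the objects they expose. The sketch below assumes nothing beyond the public XmlImplementation and XmlDocument members already mentioned; ImplementationDemo and ShareNameTable are illustrative names.

```csharp
using System;
using System.Xml;

public static class ImplementationDemo
{
    // Returns true when two documents created from one XmlImplementation
    // expose the same implementation object and the same name table instance.
    public static bool ShareNameTable()
    {
        XmlImplementation imp = new XmlImplementation();
        XmlDocument doc1 = imp.CreateDocument();
        XmlDocument doc2 = imp.CreateDocument();

        return object.ReferenceEquals(doc1.Implementation, doc2.Implementation)
            && object.ReferenceEquals(doc1.NameTable, doc2.NameTable);
    }

    public static void Main()
    {
        Console.WriteLine(ShareNameTable());
    }
}
```

The two documents remain independent trees; only the services behind them are shared.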


Methods of the XmlDocument Class

Table 5-2 lists the methods supported by the XmlDocument class. The list includes only the methods that XmlDocument introduces or overrides; more methods are available through the base class XmlNode. (See the section “The XmlNode Base Class,” on page 213.)

Table 5-2. Methods of the XmlDocument Class
Method Description
CloneNode Creates a duplicate of the document.
CreateAttribute Creates an attribute with the specified name.
CreateCDataSection Creates a CDATA section with the specified data.
CreateComment Creates a comment with the specified text.
CreateDocumentFragment Creates an XML fragment. Note that a fragment node can’t be inserted into a document; however, you can insert any of its children into a document.
CreateDocumentType Creates a DOCTYPE element.
CreateElement Creates a node element.
CreateEntityReference Creates an entity reference with the specified name.
CreateNode Creates a node of the specified type.
CreateProcessingInstruction Creates a processing instruction.
CreateSignificantWhitespace Creates a significant white space node.
CreateTextNode Creates a text node. Note that text nodes are allowed only as children of elements, attributes, and entities.
CreateWhitespace Creates a white space node.
CreateXmlDeclaration Creates the standard XML declaration.
GetElementById Gets the element in the document with the given ID.
GetElementsByTagName Returns the list of descendant elements that match the specified tag name.
ImportNode Imports a node from another document.
Load Loads XML data from the specified source.
LoadXml Loads XML data from the specified string.
ReadNode Creates an XmlNode object based on the information read from the given XML reader.
Save Saves the current document to the specified location.
WriteContentTo Saves all the children of the current document to the specified XmlWriter object.
WriteTo Saves the current document to the specified writer.

As you can see, the XmlDocument class has a lot of methods that create and return instances of node objects. In the .NET Framework, all the objects that represent a node type (Comment, Element, Attribute, and so on) do not have any publicly usable constructors. For this reason, you must resort to the corresponding method.

How can the XmlDocument class create and return instances of other node objects if no public constructor for them is available? The trick is that node classes mark their constructors with the internal modifier (Friend in Microsoft Visual Basic). The internal keyword restricts the default visibility of a type, method, or property to the boundaries of the assembly. The internal keyword works on top of other modifiers like public and protected. XmlDocument and other node classes are all defined in the System.Xml assembly, which ensures the effective working of factory methods. The following pseudocode shows the internal architecture of a factory method:

public virtual XmlXXX CreateXXX( params ) 
{
    return new XmlXXX ( params );
}
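
From the caller's side, the factory methods are used as in the following sketch, which builds a small element tree without ever calling a node constructor. FactoryDemo and BuildEmployee are hypothetical names, and the employee data is invented for illustration.

```csharp
using System;
using System.Xml;

public static class FactoryDemo
{
    // Builds <Employee id="4"><lastname>Peacock</lastname></Employee>
    // using only the factory methods of XmlDocument.
    public static string BuildEmployee()
    {
        XmlDocument doc = new XmlDocument();

        XmlElement employee = doc.CreateElement("Employee");
        XmlAttribute id = doc.CreateAttribute("id");
        id.Value = "4";
        employee.Attributes.Append(id);

        XmlElement lastName = doc.CreateElement("lastname");
        lastName.AppendChild(doc.CreateTextNode("Peacock"));
        employee.AppendChild(lastName);

        // A node returned by a factory method is not part of the tree until it is inserted.
        doc.AppendChild(employee);
        return doc.OuterXml;
    }

    public static void Main()
    {
        Console.WriteLine(BuildEmployee());
    }
}
```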

Note

When the node class is XmlDocument, the methods WriteTo and WriteContentTo happen to produce the same output, although they definitely run different code. WriteTo is designed to persist the entire contents of the node, including the markup for the node, attributes, and children. WriteContentTo, on the other hand, walks its way through the collection of child nodes and persists the contents of each using WriteTo. Here’s the pseudocode:

void WriteContentTo(XmlWriter w) {
    foreach (XmlNode n in this)
        n.WriteTo(w);
}

A Document node is a kind of super root node, so the loop on all child nodes begins with the actual root node of the XML document. In this case, WriteTo simply writes out the entire contents of the document but the super root node has no markup. As a result, the two methods produce the same output for the XmlDocument class.
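
The difference between the two methods shows up as soon as the node is not the document itself. The following sketch serializes a node both ways into a string; WriteDemo and Serialize are illustrative names, and the sample markup is made up.

```csharp
using System;
using System.IO;
using System.Xml;

public static class WriteDemo
{
    // Serializes a node with either WriteContentTo or WriteTo and returns the text.
    public static string Serialize(XmlNode node, bool contentOnly)
    {
        StringWriter text = new StringWriter();
        XmlTextWriter writer = new XmlTextWriter(text);
        if (contentOnly)
            node.WriteContentTo(writer);
        else
            node.WriteTo(writer);
        writer.Flush();
        return text.ToString();
    }

    public static void Main()
    {
        XmlDocument doc = new XmlDocument();
        doc.LoadXml("<root><item>1</item></root>");

        // For the document node the two methods yield the same text.
        Console.WriteLine(Serialize(doc, false) == Serialize(doc, true));

        // For an element, only WriteTo emits the element's own markup.
        XmlNode root = doc.DocumentElement;
        Console.WriteLine(Serialize(root, false));
        Console.WriteLine(Serialize(root, true));
    }
}
```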


Events of the XmlDocument Class

Table 5-3 lists the events that the XmlDocument class fires under the following specific conditions: when the value of a node (any node) is being edited, and when a node is being inserted into or removed from the document.

Table 5-3. Events of the XmlDocument Class
Events Description
NodeChanging, NodeChanged The Value property of a node belonging to this document is about to be changed or has been changed already.
NodeInserting, NodeInserted A node is about to be inserted into another node in this document or has been inserted already. The event fires whether you are inserting a new node, duplicating an existing node, or importing a node from another document.
NodeRemoving, NodeRemoved A node belonging to this document is about to be removed from the document or has been removed from its parent already.

All these events require the same delegate for the event handler, as follows:

public delegate void XmlNodeChangedEventHandler(
    object sender,
    XmlNodeChangedEventArgs e
);

The XmlNodeChangedEventArgs class contains the event data. The class has four interesting properties:

  • Action

    Contains a value indicating what type of change is occurring on the node. Allowable values, listed in the XmlNodeChangedAction enumeration type, are Insert, Remove, and Change.

  • NewParent

    Returns an XmlNode object representing the new parent of the node once the operation is complete. The property will be set to null if the node is being removed. If the node is an attribute, the property returns the node to which the attribute refers.

  • Node

    Returns an XmlNode object that denotes the node that is being added, removed, or changed. Can’t be set to null.

  • OldParent

    Returns an XmlNode object representing the parent of the node before the operation began. Returns null if the node has no parent—for example, when you add a new node.

Some of the actions you can take on an XML DOM are compound actions consisting of several steps, each of which could raise its own event. For example, be prepared to handle several events when you set the InnerXml property. In this case, multiple nodes could be created and appended, resulting in as many NodeInserting/NodeInserted pairs. In some cases, the XmlNode class’s AppendChild method might fire a pair of NodeRemoving/NodeRemoved events prior to actually proceeding with the insertion. By design, to ensure XML well-formedness, AppendChild checks whether the node you are adding already exists in the document. If it does, the existing node is first removed to avoid identical nodes in the same subtree.
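
The remove-then-insert behavior of AppendChild can be observed by logging the events raised while an existing node is moved to a new parent. This is a minimal sketch; EventDemo and MoveNodeAndLog are hypothetical names, and the handlers are attached only after loading so that the log covers the move alone.

```csharp
using System;
using System.Collections.Generic;
using System.Xml;

public static class EventDemo
{
    // Moves an existing node with AppendChild and records the events raised.
    public static List<string> MoveNodeAndLog()
    {
        List<string> log = new List<string>();
        XmlDocument doc = new XmlDocument();
        doc.LoadXml("<root><a><item/></a><b/></root>");

        XmlNodeChangedEventHandler handler =
            delegate(object sender, XmlNodeChangedEventArgs e)
            {
                string from = e.OldParent == null ? "(none)" : e.OldParent.Name;
                string to = e.NewParent == null ? "(none)" : e.NewParent.Name;
                log.Add(e.Action + " " + e.Node.Name + ": " + from + " -> " + to);
            };
        doc.NodeInserted += handler;
        doc.NodeRemoved += handler;

        XmlNode item = doc.SelectSingleNode("/root/a/item");
        XmlNode target = doc.SelectSingleNode("/root/b");
        target.AppendChild(item);   // item already belongs to the document

        return log;
    }

    public static void Main()
    {
        foreach (string line in MoveNodeAndLog())
            Console.WriteLine(line);
    }
}
```

Handling NodeRemoving and NodeInserting in the same way would record the "about to happen" half of each pair.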

The XmlNode Base Class

When you work with XML DOM parsers, you mainly use the XmlDocument class. The XmlDocument class, however, derives from a base class, XmlNode, which provides all the core functions to navigate and create nodes.

XmlNode is the abstract parent class of a handful of node-related classes that are available in the .NET Framework. Figure 5-2 shows the hierarchy of node classes.

Figure 5-2. Graphical representation of the hierarchy of node classes and their relationships in the .NET Framework.


Both XmlLinkedNode and XmlCharacterData are abstract classes that provide basic functionality for more specialized types of nodes. Linked nodes are nodes that appear as constituent elements of an XML document and are linked to a preceding or a following node. Character data nodes, on the other hand, are nodes that contain and manipulate only text.

Properties of the XmlNode Class

Table 5-4 lists the properties of the XmlNode class that derived classes can override if necessary. For example, not all node types support attributes and not all have child nodes or siblings. For situations such as this, the overridden properties can simply return null or the empty string. By design, all node types must provide a concrete implementation for each property.

Table 5-4. Properties of the XmlNode Class 
Property Description
Attributes Returns a collection containing the attributes of the current node. The collection is of type XmlAttributeCollection.
BaseURI Gets the base URI of the current node.
ChildNodes Returns an enumerable list object that allows you to access all the children of the current node. The object returned derives from the base class XmlNodeList, which is a linked list connecting all the nodes with the same parent and the same depth level (siblings). No information is cached (not even the objects count), and any changes to the nodes are detected in real time.
FirstChild Returns the first child of the current node or null. The order of child nodes reflects the order in which they have been added. In turn, the insertion order reflects the visiting algorithm implemented by the reader. (See Chapter 2.)
HasChildNodes Indicates whether the current node has children.
InnerText Gets or sets the text of the current node and all its children. Setting this property replaces all the children with the contents of the given string. If the string contains markup, the text will be escaped first.
InnerXml Gets or sets the markup representing the body of the current node. When the property is set, the contents of the node are replaced with the contents of the given string. Any markup text will be parsed and the resulting nodes inserted.
IsReadOnly Indicates whether the current node is read-only.
Item Indexer property that gets the child element node with the specified (qualified) name.
LastChild Gets the last child of the current node. Again, which node is the last one depends ultimately on the visiting algorithm implemented by the reader. Normally, it is the last child node in the source document.
LocalName Returns the name of the node, minus the namespace.
Name Returns the fully qualified name of the node.
NamespaceURI Gets the namespace URI of the current node.
NextSibling Gets the node immediately following the current node. Siblings are nodes with the same parent and the same depth.
NodeType Returns the type of the current node as a value taken from the XmlNodeType enumeration.
OuterXml Gets the markup code representing the current node and all of its children. Unlike InnerXml, OuterXml also includes the node itself in the markup with all of its attributes. InnerXml, on the other hand, returns only the markup found below the node, including text.
OwnerDocument Gets the XmlDocument object to which the current node belongs.
ParentNode Gets the parent of the current node (if any).
Prefix Gets or sets the namespace prefix of the current node.
PreviousSibling Gets the node immediately preceding the current node.
Value Gets or sets the value of the current node.
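
The following sketch contrasts InnerText, InnerXml, and OuterXml on one element, and shows that markup assigned through InnerText is escaped rather than parsed. MarkupDemo and its methods are illustrative names; the markup is adapted from the sample document.

```csharp
using System;
using System.Xml;

public static class MarkupDemo
{
    // Returns InnerText, InnerXml, and OuterXml (in that order) for a sample element.
    public static string[] Describe()
    {
        XmlDocument doc = new XmlDocument();
        doc.LoadXml("<Employee id=\"1\"><firstname>Nancy</firstname>" +
                    "<lastname>Davolio</lastname></Employee>");
        XmlElement employee = doc.DocumentElement;
        return new string[] { employee.InnerText, employee.InnerXml, employee.OuterXml };
    }

    // Assigns a string containing markup to InnerText and returns the resulting InnerXml.
    public static string EscapedInnerXml()
    {
        XmlDocument doc = new XmlDocument();
        doc.LoadXml("<note/>");
        doc.DocumentElement.InnerText = "<b>bold</b>";   // stored as plain text
        return doc.DocumentElement.InnerXml;
    }

    public static void Main()
    {
        foreach (string s in Describe())
            Console.WriteLine(s);
        Console.WriteLine(EscapedInnerXml());
    }
}
```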

The collection of child nodes is implemented as a linked list. The ChildNodes property returns an internal object of type XmlChildNodes. (The object is not documented, but you can easily verify this claim by simply checking the type of the object that ChildNodes returns.) You don’t need to use this object directly, however. Suffice it to say that it merely represents a concrete implementation of the XmlNodeList class, whose methods are, for the most part, marked as abstract. In particular, XmlChildNodes implements the Item and Count properties and the GetEnumerator method.

XmlChildNodes is not a true collection and does not cache any information. When you access the Count property, for example, it scrolls the entire list, counting the number of nodes on the fly. When you ask for a particular node through the Item property, the list is scanned from the beginning until a matching node is found. To move through the list, the XmlChildNodes class relies on the node’s NextSibling property. But which class actually implements NextSibling? XmlNode declares both NextSibling and PreviousSibling, and the XmlLinkedNode base class overrides them with a working implementation.

XmlLinkedNode stores an internal pointer to the next node in the list. The object referenced is simply what NextSibling returns. Figure 5-3 shows how things work.

Figure 5-3. The XmlLinkedNode class’s NextSibling method lets applications navigate through the children of each node.


Scrolling forward through the list of child nodes is fast and effective. The same can’t be said for backward scrolling. The list of nodes is not doubly linked; a node stores no pointer to the one that precedes it. For this reason, PreviousSibling reaches the target node by walking through the list from the beginning to the node that precedes the current one.

Tip

To summarize, when you are processing XML subtrees, try to minimize calls to PreviousSibling, Item, and Count because they always walk through the entire collection of subnodes to get their expected output. Whenever possible, design your code to take advantage of forward-only movements and perform them using NextSibling.
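
A forward-only traversal of this kind can be written with FirstChild and NextSibling alone, as in the following sketch. SiblingDemo and ChildNames are hypothetical names chosen for the example.

```csharp
using System;
using System.Collections.Generic;
using System.Xml;

public static class SiblingDemo
{
    // Visits the children of a node front to back using only FirstChild and
    // NextSibling, so each step follows a single forward link.
    public static List<string> ChildNames(XmlNode parent)
    {
        List<string> names = new List<string>();
        for (XmlNode child = parent.FirstChild; child != null; child = child.NextSibling)
            names.Add(child.Name);
        return names;
    }

    public static void Main()
    {
        XmlDocument doc = new XmlDocument();
        doc.LoadXml("<Employee><employeeid>1</employeeid>" +
                    "<firstname>Nancy</firstname><lastname>Davolio</lastname></Employee>");
        foreach (string name in ChildNames(doc.DocumentElement))
            Console.WriteLine(name);
    }
}
```

An index-based loop over ChildNodes would produce the same names but, given the implementation described above, would rescan the list on every Count and Item access.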


Methods of the XmlNode Class

Table 5-5 lists the methods exposed by the XmlNode class.

Table 5-5. Methods of the XmlNode Class
Method Description
AppendChild Adds the specified node to the list of children of the current node. The node is inserted at the bottom of the list.
Clone Creates a duplicate of the current node. For element nodes, duplication includes child nodes and attributes.
CloneNode Creates a duplicate of the current node. Takes a Boolean argument indicating whether cloning should proceed recursively. If this argument is true, calling the CloneNode method is equivalent to calling Clone. Entity and notation nodes can’t be cloned.
GetEnumerator Returns an internal and node-specific object that implements the IEnumerator interface. The returned object provides the support needed to arrange foreach iterations.
GetNamespaceOfPrefix Looks up the closest xmlns declaration in scope for the given prefix and returns its namespace URI.
GetPrefixOfNamespace Looks up the closest xmlns declaration in scope for the given namespace URI and returns its prefix.
InsertAfter Inserts the specified node immediately after the specified reference node. If the node already exists, it is first removed. If the reference node is null, the insertion occurs at the beginning of the list.
InsertBefore Inserts the specified node immediately before the specified reference node. If the node already exists, it is first removed. If the reference node is null, the insertion occurs at the bottom of the list.
Normalize Ensures that there are no adjacent XmlText nodes by merging all adjacent text nodes into a single one according to a series of precedence rules.
PrependChild Adds the specified node to the beginning of the list of children of the current node.
RemoveAll Removes all the children of the current node, including attributes.
RemoveChild Removes the specified child node.
ReplaceChild Replaces the specified child node with a new one.
SelectNodes Returns a list (XmlNodeList) of all the nodes that match a given XPath expression.
SelectSingleNode Returns only the first node that matches the given XPath expression.
Supports Verifies whether the current XmlImplementation object supports a specific feature.
WriteContentTo Saves all the children of the current node to the specified XmlWriter object. Equivalent to InnerXml.
WriteTo Saves the entire current node to the specified writer. Equivalent to OuterXml.

To locate one or more nodes in an XML DOM object, you can use either the ChildNodes collection or the SelectNodes method. With the former technique, you are given access to the unfiltered collection of child nodes. Note that in this context, child nodes means all and only the sibling nodes located one level below the current node.

The SelectNodes (and the ancillary SelectSingleNode) method exploits the XPath query language to let you extract nodes based on logical conditions. In addition, XPath queries can go deeper than one level and even work on all descendants of a node. The .NET Framework XPath implementation is covered in Chapter 6. See the section “Further Reading,” on page 244, for resources providing detailed coverage of the XPath query language.
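
The two lookup techniques can be compared on the sample document from the beginning of this section. The sketch below is illustrative only: SelectDemo and its members are made-up names, and the XPath expressions are simple examples rather than recommended queries.

```csharp
using System;
using System.Xml;

public static class SelectDemo
{
    // The sample document with its three employees, written on as few lines as possible.
    public const string Xml =
        "<MyDataSet><NorthwindEmployees count=\"3\">" +
        "<Employee><employeeid>1</employeeid><lastname>Davolio</lastname></Employee>" +
        "<Employee><employeeid>2</employeeid><lastname>Fuller</lastname></Employee>" +
        "<Employee><employeeid>3</employeeid><lastname>Leverling</lastname></Employee>" +
        "</NorthwindEmployees></MyDataSet>";

    private static XmlDocument LoadSample()
    {
        XmlDocument doc = new XmlDocument();
        doc.LoadXml(Xml);
        return doc;
    }

    // ChildNodes sees only the nodes one level below the current node.
    public static int DirectChildrenOfRoot()
    {
        return LoadSample().DocumentElement.ChildNodes.Count;
    }

    // An XPath query can reach descendants at any depth.
    public static int EmployeesByXPath()
    {
        return LoadSample().SelectNodes("//Employee").Count;
    }

    // SelectSingleNode returns the first match of a conditional query.
    public static string LastNameOf(string employeeId)
    {
        XmlNode node = LoadSample().SelectSingleNode(
            "//Employee[employeeid='" + employeeId + "']/lastname");
        return node == null ? null : node.InnerText;
    }

    public static void Main()
    {
        Console.WriteLine(DirectChildrenOfRoot());
        Console.WriteLine(EmployeesByXPath());
        Console.WriteLine(LastNameOf("2"));
    }
}
```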
