Parsing with the .NET XmlReader Classes

Microsoft supplies three parsers that derive from the abstract XmlReader class (see Figure 17.1):

  • XmlTextReader: The simplest and most straightforward XML parser in .NET. XmlTextReader is a one-pass, forward-only parser. It doesn't support validation, cannot expand general entities, and doesn't make default attributes available. Despite these limitations, XmlTextReader is an extremely fast and efficient parser.

  • XmlValidatingReader: The XmlValidatingReader builds on a parser such as the XmlTextReader and adds several extended features. First, it validates the document against a Document Type Definition (DTD) or XML Schema while the document is being parsed. In addition, it expands general entities and supplies the default attributes specified in the DTD or schema.

  • XmlNodeReader: The XmlNodeReader is intended for a fundamentally different use than the XmlTextReader. It does not parse XML text at all. Instead, it reads from XML documents that have already been parsed into a W3C Document Object Model (DOM) tree.

Figure 17.1. The Microsoft .NET XmlReader parsers.


Parsing with XmlTextReader

The simplest method of parsing XML with .NET is using the XmlTextReader class. This section provides an example of the use of XmlTextReader to accomplish basic parsing tasks. XmlTextReader is similar in functionality, though not in interface, to SAX, which is discussed in Chapter 15, “Parsing XML Based on Events.” In this section, we will develop a simple application that reads all the nodes in a document, reporting each element it encounters and printing any text contained within it.

To parse XML documents with the .NET XmlTextReader class, we first need to create an instance of XmlTextReader. This is done by passing the name of the document to be parsed (in this case, "example.xml") to the XmlTextReader constructor:

XmlTextReader xmlTextReader = new XmlTextReader("example.xml"); 

Here, we are parsing a local file, but the XmlTextReader is quite flexible: its constructor accepts almost any type of Uniform Resource Identifier (URI). For example, if you wanted to parse a document on the Web located at "http://www.mylocalcompany.com/payroll.xml", you could instead use the following declaration:

XmlTextReader xmlTextReader = new XmlTextReader("http://www.mylocalcompany.com/payroll.xml"); 
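The constructor is also overloaded to accept a TextReader, which is convenient for experimenting without a file on disk. The following is a minimal sketch of that overload, not part of the chapter's example; the class name, the FirstElementName helper, and the inline XML are all made up for illustration.

```csharp
using System;
using System.IO;
using System.Xml;

class InMemoryReaderSketch
{
  // Hypothetical helper: returns the name of the first element in the
  // supplied XML text, or null if the text contains no element.
  public static string FirstElementName(string xml)
  {
    // XmlTextReader has a constructor overload that takes a TextReader,
    // so a StringReader lets us parse XML held in memory.
    XmlTextReader reader = new XmlTextReader(new StringReader(xml));
    try
    {
      while (reader.Read())
      {
        if (reader.NodeType == XmlNodeType.Element)
        {
          return reader.Name;
        }
      }
      return null;
    }
    finally
    {
      reader.Close();
    }
  }

  static void Main()
  {
    // Illustrative document only; it is not the payroll.xml mentioned above.
    Console.WriteLine(FirstElementName("<?xml version=\"1.0\"?><payroll><employee/></payroll>"));
  }
}
```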

The XmlTextReader is a one-pass parser that moves forward through the supplied XML document node by node. Unlike SAX, which is event based, the XmlTextReader waits until the program asks for the next node to provide it. This has the advantage of letting the programmer decide when to parse the next section of the document because the programmer controls the flow of processing rather than responding to it.
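To make the difference in control flow concrete, here is a small sketch in which the application stops asking for nodes as soon as it has what it needs. The class, the FirstTextOf helper, and the sample XML are assumptions made for illustration; only Read, NodeType, Name, and Value come from XmlTextReader itself.

```csharp
using System;
using System.IO;
using System.Xml;

class PullControlSketch
{
  // Hypothetical helper: returns the text of the first element named
  // elementName, or null if no such element is found.
  public static string FirstTextOf(string xml, string elementName)
  {
    XmlTextReader reader = new XmlTextReader(new StringReader(xml));
    try
    {
      while (reader.Read())
      {
        if (reader.NodeType == XmlNodeType.Element && reader.Name == elementName)
        {
          // The caller decides to advance exactly one more node
          // and checks whether it is the element's text.
          if (reader.Read() && reader.NodeType == XmlNodeType.Text)
          {
            return reader.Value;
          }
          return "";
        }
      }
      return null;
    }
    finally
    {
      // We simply stop calling Read; the rest of the document is never requested.
      reader.Close();
    }
  }

  static void Main()
  {
    string xml = "<parsers><parser>XmlTextReader</parser><parser>XmlNodeReader</parser></parsers>";
    Console.WriteLine(FirstTextOf(xml, "parser"));
  }
}
```

With an event-based parser, the equivalent early exit usually has to be signaled from inside a callback; here it is just a return statement in the application's own loop.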

The application asks the XmlTextReader to read the next node in the XML document by calling the Read method of the XmlTextReader object:

xmlTextReader.Read();

For our example program, we'll want to read every node in the document, so the Read method call is placed in a while loop:

while ( xmlTextReader.Read() ) 

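It's important to note that the Read method of the XmlTextReader object does not return the node itself. Its return value is only a Boolean: true if a node was read and false once the end of the document has been reached, which is what terminates the while loop. It's easiest to think of the XML parser in this case as an assembly line: The Read method merely moves the assembly line forward one step.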

However, the XmlTextReader does make it extremely easy to get access to the currently parsed XML data. Whenever the Read method is called, the currently parsed XML data is placed into the XmlTextReader object that is in use, and that data is made available via the various properties of the XmlTextReader object (see Table 17.1).

Table 17.1. XmlTextReader Properties
Property            Description
AttributeCount      The number of attributes the current node contains
BaseURI             The base URI of the current node
CanResolveEntity    A Boolean indicating whether this reader can parse and resolve entities
Depth               The depth of the current node in the XML document
Encoding            The character encoding of the document
EOF                 A Boolean indicating whether the reader is positioned at the end of the stream
HasAttributes       A Boolean indicating whether the current node has any attributes
HasValue            A Boolean indicating whether the current node can have a Value
IsDefault           A Boolean indicating whether the current node is an attribute generated from a default value defined in the DTD or schema
IsEmptyElement      A Boolean indicating whether the current element is an empty element (for example, <element/>)
Item                The value of the current element's attribute with the specified index or name (the indexer in C#)
LineNumber          The line number at which the reader is currently positioned
LinePosition        The character position within the current line at which the reader is positioned
LocalName           The current node's local name
Name                The current node's qualified name
Namespaces          A Boolean indicating whether namespace support is enabled
NamespaceURI        The namespace URI (as defined by the W3C Namespace Specification) of the current node
NameTable           The XmlNameTable associated with this XmlTextReader
NodeType            The node type of the current node
Normalization       A Boolean indicating whether whitespace and attribute values are normalized
Prefix              The current node's namespace prefix
QuoteChar           The quotation mark character used to enclose the value of an attribute node (if the current node is an attribute node)
ReadState           The current reader state
Value               The value of the current node as text
WhitespaceHandling  A property that specifies the manner in which whitespace is handled
XmlLang             The current xml:lang scope
XmlResolver         A property that allows the XmlResolver used for DTD references to be set
XmlSpace            The current xml:space scope
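As a brief illustration of a few of these properties, the sketch below reports Depth, Name, AttributeCount, and IsEmptyElement for every element and uses the Item indexer to look up an attribute by name. The class name, the Describe helper, and the sample document (including its name and fast attributes) are assumptions invented for this example.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;

class ReaderPropertiesSketch
{
  // Hypothetical helper: builds one line per element of the form
  // "depth name attributeCount isEmpty nameAttribute".
  public static string Describe(string xml)
  {
    StringBuilder lines = new StringBuilder();
    XmlTextReader reader = new XmlTextReader(new StringReader(xml));
    try
    {
      while (reader.Read())
      {
        if (reader.NodeType == XmlNodeType.Element)
        {
          // reader["name"] is the Item indexer; it returns null
          // when the element has no attribute with that name.
          string nameAttribute = reader["name"];
          if (nameAttribute == null)
          {
            nameAttribute = "(none)";
          }
          lines.Append(reader.Depth).Append(' ')
               .Append(reader.Name).Append(' ')
               .Append(reader.AttributeCount).Append(' ')
               .Append(reader.IsEmptyElement).Append(' ')
               .Append(nameAttribute).Append('\n');
        }
      }
    }
    finally
    {
      reader.Close();
    }
    return lines.ToString();
  }

  static void Main()
  {
    // Made-up sample input with two attributes on an empty element.
    Console.Write(Describe("<net_xml><parser name=\"XmlTextReader\" fast=\"true\"/></net_xml>"));
  }
}
```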

For this example, it will be necessary to keep track of when an element is started, when an element is ended, and when a text node is available. In each of these cases, the program needs a way to determine the type of the current node. The type of the node that has just been parsed is stored in the NodeType property of the XmlTextReader. The NodeType property holds one of the values of the XmlNodeType enumeration, which are listed in Table 17.2.

Table 17.2. Possible XmlNodeType Values
Value                  Description
Attribute              An XML attribute
CDATA                  A CDATA section
Comment                A comment
Document               A Document object, which serves as the root of the entire XML document
DocumentFragment       An association to a node or a subtree of another document
DocumentType           A Document Type Declaration
Element                An XML element. Specifically, the start of an XML element
EndElement             The end of an XML element
EndEntity              The end of an entity's replacement text, reported after a call to ResolveEntity
Entity                 An entity declaration
EntityReference        An entity reference
None                   The value of the NodeType property of XmlReader before the Read() method has been called to parse a document
Notation               A notation declared in the Document Type Declaration
ProcessingInstruction  An XML processing instruction
SignificantWhitespace  Whitespace that is placed between nodes if there is a mixed content model
Text                   The text contained within an element
Whitespace             Insignificant whitespace placed between nodes
XmlDeclaration         An XML declaration node
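A quick way to get a feel for these values is to record the NodeType reported after each call to Read. The following sketch does that for a tiny document; the class name, the NodeTypeSequence helper, and the sample XML are hypothetical and chosen only to exercise a few of the less common node types.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;

class NodeTypeSketch
{
  // Hypothetical helper: returns the node types encountered,
  // in document order, as a comma-separated string.
  public static string NodeTypeSequence(string xml)
  {
    StringBuilder types = new StringBuilder();
    XmlTextReader reader = new XmlTextReader(new StringReader(xml));
    try
    {
      while (reader.Read())
      {
        if (types.Length > 0)
        {
          types.Append(',');
        }
        // ToString on an enumeration value yields its name.
        types.Append(reader.NodeType.ToString());
      }
    }
    finally
    {
      reader.Close();
    }
    return types.ToString();
  }

  static void Main()
  {
    // Made-up input: a declaration, a comment, and an element holding a CDATA section.
    string xml = "<?xml version=\"1.0\"?><!-- note --><a><![CDATA[x < y]]></a>";
    // Prints: XmlDeclaration,Comment,Element,CDATA,EndElement
    Console.WriteLine(NodeTypeSequence(xml));
  }
}
```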

Now that we're able to parse the document into XML nodes and determine the type of each node, we're ready to print out the information desired for each node. The first type of node is the Element. Keep in mind that an Element node doesn't contain the entire element; it merely informs the program that the start of an element has been encountered (much like the SAX startElement event handler). Specifically, the children of the current element have not yet been parsed; they will be encountered as parsing continues. The name of the XML element that has been parsed is placed in the Name property of the XmlTextReader.

if (xmlTextReader.NodeType == XmlNodeType.Element)
{
  // Signal the start of the element
  Console.WriteLine("Start Element: " + xmlTextReader.Name);
}

The next type of node that the program needs to be watching for is the Text node. Whenever a text node is encountered, its contents are placed in the Value property of the XmlTextReader. As part of the example application, the text of a node needs to be printed, so the value of the Text node is printed out.

else if (xmlTextReader.NodeType == XmlNodeType.Text) 
{
  Console.WriteLine(xmlTextReader.Value);
}

Finally, whenever the end of an element is reached, an EndElement node is placed in the XmlTextReader. The name of the XML element that is ending is placed in the Name property of the XmlTextReader.

else if (xmlTextReader.NodeType == XmlNodeType.EndElement) 
{
  // Signal the end of the element
  Console.WriteLine("End Element: " + xmlTextReader.Name);
}

As it turns out, using the XmlTextReader interface is actually quite similar to using SAX. The main difference is that instead of the parser determining when the next node is to be parsed and then notifying the application, the application has direct control over the parsing of the next node. The full source code to this example is shown in Listing 17.1 (with the Visual Basic .NET version in Listing 17.2), and the output from the example is shown in Listing 17.3.

Listing 17.1. C# XmlTextReader Example
using System;
using System.Xml;
using System.Text;
class SimpleXmlTextReader
{
  static void Main(string[] args)
  {
    try
    {
      // Create an instance of the XmlTextReader.
      XmlTextReader xmlTextReader = new XmlTextReader("example.xml");
      // Read the XML file node by node and generate the output.
      Console.WriteLine("Start of Document");
      while ( xmlTextReader.Read() )
      {
        // Process a start of element node.
        if (xmlTextReader.NodeType == XmlNodeType.Element)
        {
          // Signal the start of the element
          Console.WriteLine("Start Element: " +  xmlTextReader.Name);
        }
        // Process a text node.
        else if (xmlTextReader.NodeType == XmlNodeType.Text)
        {
          //Add the text data to the output.
          Console.WriteLine(xmlTextReader.Value);
        }
        //Process an end of element node.
        else if (xmlTextReader.NodeType == XmlNodeType.EndElement)
        {
          // Signal the end of the element
          Console.WriteLine("End Element: " + xmlTextReader.Name);
        }
      } // End while loop
      xmlTextReader.Close();
    }
    catch (XmlException ex)
    {
      Console.WriteLine("An XML exception occurred: " + ex.ToString());
    }
    catch (Exception ex)
    {
      Console.WriteLine("A general exception occurred: " + ex.ToString());
    }

    Console.WriteLine("End of Document");
  }
} //End SimpleXmlTextReader
						

Listing 17.2. Visual Basic .NET XmlTextReader Example
Imports System
Imports System.Xml
Imports System.Text
Module SimpleXmlTextReaderVB
  Sub Main()
    Try
      ' Create an instance of the XmlTextReader.
      Dim xmlTextReader As XmlTextReader
      xmlTextReader = New XmlTextReader("example.xml")
      Console.WriteLine("Start of Document")
      ' Continually read the next element that is available in the parsed
      ' document until there are no more.
      Do While xmlTextReader.Read()
        If xmlTextReader.NodeType = XmlNodeType.Element Then
          ' Process a start of element node: signal the start of the element.
          Console.WriteLine("Start Element: " + xmlTextReader.Name)
        ElseIf xmlTextReader.NodeType = XmlNodeType.Text Then
          ' Process a text node: add the text data to the output.
          Console.WriteLine(xmlTextReader.Value)
        ElseIf xmlTextReader.NodeType = XmlNodeType.EndElement Then
          ' Process an end of element node: signal the end of the element.
          Console.WriteLine("End Element: " + xmlTextReader.Name)
        End If
      Loop
      xmlTextReader.Close()
    Catch ex As XmlException
      Console.WriteLine("An XML exception occurred: " + ex.ToString())
    Catch ex As Exception
      Console.WriteLine("A general exception occurred: " + ex.ToString())
    End Try
    Console.WriteLine("End of Document")
  End Sub
End Module
						

Listing 17.3. Output from the XmlTextReader Example
Start of Document
Start Element: net_xml
Start Element: standards
Start Element: standard
.NET Standards Documents
End Element: standard
Start Element: standard
Document Object Model(DOM)
End Element: standard
Start Element: standard
Simple API for XML(SAX)
End Element: standard
End Element: standards
Start Element: parsers
Start Element: parser
XmlTextReader
End Element: parser
Start Element: parser
XmlNodeReader
End Element: parser
Start Element: parser
XmlValidatingReader
End Element: parser
End Element: parsers
End Element: net_xml
End of Document
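The example.xml file itself is not reproduced in this section. Working backward from the output above, the input probably looks something like the document embedded in the sketch below; treat that document as an assumption rather than the actual file. The sketch runs the same three-way NodeType test over it from an in-memory string (the class name and the Trace helper are likewise hypothetical). Note that the indentation between elements is reported as Whitespace nodes, which the loop ignores, so it never shows up in the output.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;

class InlineExampleSketch
{
  // ASSUMPTION: reconstructed from the output listing; the real example.xml may differ.
  public const string AssumedExampleXml = @"<?xml version=""1.0""?>
<net_xml>
  <standards>
    <standard>.NET Standards Documents</standard>
    <standard>Document Object Model(DOM)</standard>
    <standard>Simple API for XML(SAX)</standard>
  </standards>
  <parsers>
    <parser>XmlTextReader</parser>
    <parser>XmlNodeReader</parser>
    <parser>XmlValidatingReader</parser>
  </parsers>
</net_xml>";

  // Hypothetical helper: applies the element/text/end-element test
  // from the listings and returns the resulting lines as one string.
  public static string Trace(string xml)
  {
    StringBuilder output = new StringBuilder();
    XmlTextReader reader = new XmlTextReader(new StringReader(xml));
    try
    {
      while (reader.Read())
      {
        if (reader.NodeType == XmlNodeType.Element)
        {
          output.Append("Start Element: ").Append(reader.Name).Append('\n');
        }
        else if (reader.NodeType == XmlNodeType.Text)
        {
          output.Append(reader.Value).Append('\n');
        }
        else if (reader.NodeType == XmlNodeType.EndElement)
        {
          output.Append("End Element: ").Append(reader.Name).Append('\n');
        }
        // XmlDeclaration and Whitespace nodes fall through and are ignored.
      }
    }
    finally
    {
      reader.Close();
    }
    return output.ToString();
  }

  static void Main()
  {
    Console.WriteLine("Start of Document");
    Console.Write(Trace(AssumedExampleXml));
    Console.WriteLine("End of Document");
  }
}
```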
						
