Parsing with the .NET XmlReader Classes


Microsoft supplies three parsers that derive from the abstract XmlReader class (see Figure 17.1):

XmlTextReader: The simplest and most straightforward XML parser in .NET. XmlTextReader is a one-pass, forward-only parser. It doesn't support validation, cannot expand general entities, and does not make default attributes available. Despite these limitations, XmlTextReader is an extremely fast and efficient parser.
XmlValidatingReader: XmlValidatingReader wraps a parser such as XmlTextReader and adds several extended features. First, it validates the document against a Document Type Definition (DTD) or XML Schema while the document is being parsed. In addition, it expands general entities and supplies the default attributes specified in the DTD or schema.
XmlNodeReader: XmlNodeReader serves a fundamentally different purpose from XmlTextReader. Rather than parsing XML text, it reads from an XML document that has already been parsed into a W3C Document Object Model (DOM) tree.

Figure 17.1. The Microsoft .NET `XmlReader` parsers.
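To make the XmlNodeReader description concrete, here is a minimal sketch. It assumes a modern .NET compiler (it uses `var` and a `using` block, which relies on XmlReader implementing IDisposable); the class name NodeReaderSketch and any XML passed to it are hypothetical. XmlDocument.LoadXml does the actual text parsing, and XmlNodeReader then walks the finished DOM tree with the same Read/NodeType/Name pattern used by XmlTextReader.

```csharp
using System.Collections.Generic;
using System.Xml;

// Hypothetical helper class; the name is illustrative only.
public static class NodeReaderSketch
{
    // Parses the XML text into a DOM tree once, then walks that tree with
    // XmlNodeReader and returns the element names in document order.
    public static List<string> ElementNames(string xml)
    {
        var document = new XmlDocument();
        document.LoadXml(xml); // the text is parsed here, not by the reader

        var names = new List<string>();
        using (var reader = new XmlNodeReader(document))
        {
            // XmlNodeReader exposes the DOM nodes through the XmlReader API.
            while (reader.Read())
            {
                if (reader.NodeType == XmlNodeType.Element)
                {
                    names.Add(reader.Name);
                }
            }
        }
        return names;
    }
}
```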

Parsing with `XmlTextReader`

The simplest way to parse XML with .NET is to use the XmlTextReader class. This section shows how to use XmlTextReader for basic parsing tasks. XmlTextReader is similar in functionality, though not in interface, to SAX, which is discussed in Chapter 15, “Parsing XML Based on Events.” In this section, we will develop a simple application that reads every node in a document, reports the start and end of each element, and prints any text the elements contain.

To parse XML documents with the .NET XmlTextReader class, we first need to create an instance of XmlTextReader. This is done by passing the name of the document to be parsed, in this case "example.xml", to the XmlTextReader constructor:

XmlTextReader xmlTextReader = new XmlTextReader("example.xml");

Here, we are parsing a local file, but XmlTextReader is quite flexible: its constructor accepts a Uniform Resource Identifier (URI), so the document can just as easily come from the Web. For example, to parse a document located at "http://www.mylocalcompany.com/payroll.xml", you could use the following declaration instead:

XmlTextReader xmlTextReader = new XmlTextReader("http://www.mylocalcompany.com/payroll.xml");
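Besides file names and URLs, XmlTextReader also has a constructor overload that takes a System.IO.TextReader, which is convenient for parsing XML that is already in memory. The sketch below assumes that overload and a modern C# compiler; the class name StringSourceSketch is hypothetical. It returns the name of the root element, which is simply the first Element node the reader encounters.

```csharp
using System.IO;
using System.Xml;

// Hypothetical helper; the class and method names are illustrative.
public static class StringSourceSketch
{
    // Returns the name of the root element, or null if none is found.
    public static string RootElementName(string xml)
    {
        // The XmlTextReader(TextReader) overload parses in-memory text,
        // so no file or URL is involved.
        using (var reader = new XmlTextReader(new StringReader(xml)))
        {
            while (reader.Read())
            {
                // The first Element node in a document is its root element.
                if (reader.NodeType == XmlNodeType.Element)
                {
                    return reader.Name;
                }
            }
        }
        return null;
    }
}
```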

XmlTextReader is a one-pass parser that moves forward through the supplied XML document node by node. Unlike SAX, which pushes events to the application as it parses, XmlTextReader is a pull parser: it provides the next node only when the program asks for it. Because the program controls the flow of processing rather than reacting to it, the programmer decides when the next section of the document is parsed.

To tell XmlTextReader to read the next node in the XML document, call its Read method:

xmlTextReader.Read();

For our example program, we'll want to read every node in the document, so the Read method call is placed in a while loop:

while ( xmlTextReader.Read() )

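It's important to note that the Read method of the XmlTextReader object does not return the data of the node it reads. It returns only a Boolean value: true if a node was read, and false when there are no more nodes, which is what ends the while loop. It's easiest to think of the XML parser in this case as an assembly line: the Read method merely moves the assembly line forward one step.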
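The Boolean return value of Read is what drives the loop. The minimal sketch below (hypothetical class name ReadLoopSketch; it assumes the TextReader constructor overload and a modern C# compiler) counts every node Read visits and then checks the EOF property from Table 17.1. For a document such as `<a><b>hi</b></a>`, Read stops at five nodes: two Element nodes, one Text node, and two EndElement nodes.

```csharp
using System.IO;
using System.Xml;

// Hypothetical helper; the name is illustrative.
public static class ReadLoopSketch
{
    // Counts every node that Read() visits and reports whether the
    // reader ended up positioned at the end of the input.
    public static int CountNodes(string xml, out bool reachedEof)
    {
        int count = 0;
        using (var reader = new XmlTextReader(new StringReader(xml)))
        {
            // Read() returns true each time it moves to a node and false
            // once there are no more nodes, which ends the loop.
            while (reader.Read())
            {
                count++;
            }
            reachedEof = reader.EOF;
        }
        return count;
    }
}
```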

However, XmlTextReader makes the data for the current node easy to reach: each call to Read positions the reader on the next node, and that node's data is exposed through the reader's properties (see Table 17.1).

Table 17.1. `XmlTextReader` Properties
Property	Description
`AttributeCount`	The number of attributes on the current node
`BaseURI`	The base URI of the current node
`CanResolveEntity`	A Boolean indicating whether this reader can parse and resolve entities
`Depth`	The depth of the current node in the XML document
`Encoding`	The encoding of the document
`EOF`	A Boolean indicating whether the reader is positioned at the end of the stream
`HasAttributes`	A Boolean indicating whether the current node has any attributes
`HasValue`	A Boolean indicating whether the current node can have a `Value`
`IsDefault`	A Boolean indicating whether the current node is an attribute whose value was generated from a default defined in the DTD or schema
`IsEmptyElement`	A Boolean indicating whether the current element is an empty element (for example, `<element/>`)
`Item`	The value of the current node's attribute with the specified index or name
`LineNumber`	The current line number of the reader
`LinePosition`	The current position of the reader within the line
`LocalName`	The local name of the current node
`Name`	The qualified name of the current node
`Namespaces`	A Boolean indicating whether namespace support is enabled
`NamespaceURI`	The namespace URI (as defined by the W3C Namespaces in XML Recommendation) of the current node
`NameTable`	The `XmlNameTable` associated with this `XmlTextReader`
`NodeType`	The type of the current node
`Normalization`	A Boolean indicating whether whitespace and attribute values are normalized
`Prefix`	The namespace prefix of the current node
`QuoteChar`	The quotation mark character used to enclose the value of an attribute node (if the current node is an attribute node)
`ReadState`	The current state of the reader
`Value`	The text value of the current node
`WhitespaceHandling`	Specifies how whitespace is handled
`XmlLang`	The current xml:lang scope
`XmlResolver`	Sets the `XmlResolver` used to resolve DTD references
`XmlSpace`	The current xml:space scope
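Several of these properties are typically used together. The following sketch (hypothetical class PropertySketch; it assumes the TextReader constructor overload and a modern C# compiler) builds one line per element from Depth, Name, and LineNumber, and uses HasAttributes with MoveToNextAttribute and MoveToElement to list each attribute as name=value before returning to the element.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Text;
using System.Xml;

// Hypothetical helper; the name is illustrative.
public static class PropertySketch
{
    // One line per element: indentation from Depth, then Name,
    // the line number, and each attribute as name=value.
    public static List<string> Describe(string xml)
    {
        var lines = new List<string>();
        using (var reader = new XmlTextReader(new StringReader(xml)))
        {
            while (reader.Read())
            {
                if (reader.NodeType != XmlNodeType.Element) continue;

                var text = new StringBuilder();
                text.Append(new string(' ', reader.Depth * 2));
                text.Append(reader.Name);
                text.Append(" (line ").Append(reader.LineNumber).Append(')');

                if (reader.HasAttributes)
                {
                    // Visit each attribute, then return to the owning element.
                    while (reader.MoveToNextAttribute())
                    {
                        text.Append(' ').Append(reader.Name).Append('=').Append(reader.Value);
                    }
                    reader.MoveToElement();
                }
                lines.Add(text.ToString());
            }
        }
        return lines;
    }
}
```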

For this example, the program must detect when an element starts, when an element ends, and when a text node is available, so it needs a way to determine the type of the current node. The type of the node that was just read is stored in the NodeType property of the XmlTextReader. The NodeType property holds a value of the XmlNodeType enumeration; the possible values are listed in Table 17.2.

Table 17.2. Possible `XmlNodeType` Values
Value	Description
`Attribute`	An XML attribute
`CDATA`	A CDATA section
`Comment`	A comment
`Document`	A document object, which serves as the root of the entire XML document tree
`DocumentFragment`	A document fragment, which associates a node or subtree with a document without being contained in it
`DocumentType`	A document type declaration
`Element`	The start of an XML element (a start tag or an empty-element tag)
`EndElement`	The end of an XML element
`EndEntity`	The end of an entity's replacement text, reached after a call to `ResolveEntity`
`Entity`	An entity declaration
`EntityReference`	An entity reference
`None`	The value of the `NodeType` property of `XmlReader` before the `Read` method has been called
`Notation`	A notation in the document type declaration
`ProcessingInstruction`	An XML processing instruction
`SignificantWhitespace`	Whitespace between markup in a mixed content model, or within an xml:space="preserve" scope
`Text`	The text content of an element
`Whitespace`	Insignificant whitespace between markup
`XmlDeclaration`	The XML declaration (for example, `<?xml version="1.0"?>`)
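One way to see these values in practice is to record NodeType before the first call to Read and after every call. The sketch below (hypothetical class NodeTypeSketch; it assumes the TextReader constructor overload and a modern C# compiler) does exactly that. For `<?xml version="1.0"?><!--note--><a>x</a>` it yields None, XmlDeclaration, Comment, Element, Text, EndElement. Because XmlTextReader's default WhitespaceHandling is All, a document such as `<a> <b/> </a>` also produces Whitespace nodes, which is why a loop that handles only a few node types must be prepared to skip the rest.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Xml;

// Hypothetical helper; the name is illustrative.
public static class NodeTypeSketch
{
    // Records the NodeType before the first Read() call and after each one.
    public static List<XmlNodeType> NodeTypes(string xml)
    {
        var types = new List<XmlNodeType>();
        using (var reader = new XmlTextReader(new StringReader(xml)))
        {
            // No node has been read yet, so NodeType is XmlNodeType.None.
            types.Add(reader.NodeType);
            while (reader.Read())
            {
                types.Add(reader.NodeType);
            }
        }
        return types;
    }
}
```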

Now that we can read the document node by node and determine the type of each node, we're ready to print the desired information for each one. The first node type to handle is Element. Keep in mind that an Element node doesn't contain the entire element; it only tells the program that the start of an element has been encountered (much like the SAX startElement event handler). The children of the current element have not yet been parsed; they will be encountered on later calls to Read. The name of the element whose start tag was just read is available in the Name property of the XmlTextReader.

  if (xmlTextReader.NodeType == XmlNodeType.Element) 
  {
    // Signal the start of the element
    Console.WriteLine("Start Element: " +  xmlTextReader.Name);
  }
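One consequence of an Element node marking only the start of an element: an empty-element tag such as `<b/>` is reported as a single Element node with IsEmptyElement set to true, and no EndElement node follows it, whereas `<c></c>` produces both. The sketch below (hypothetical class EmptyElementSketch; it assumes the TextReader constructor overload and a modern C# compiler) records start and end events and uses IsEmptyElement to supply the missing end event.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Xml;

// Hypothetical helper; the name is illustrative.
public static class EmptyElementSketch
{
    // Returns "Start x" / "End x" events, adding an "End" event for
    // empty elements such as <b/>, which produce no EndElement node.
    public static List<string> Events(string xml)
    {
        var events = new List<string>();
        using (var reader = new XmlTextReader(new StringReader(xml)))
        {
            while (reader.Read())
            {
                if (reader.NodeType == XmlNodeType.Element)
                {
                    events.Add("Start " + reader.Name);
                    // IsEmptyElement means no EndElement will follow.
                    if (reader.IsEmptyElement)
                    {
                        events.Add("End " + reader.Name);
                    }
                }
                else if (reader.NodeType == XmlNodeType.EndElement)
                {
                    events.Add("End " + reader.Name);
                }
            }
        }
        return events;
    }
}
```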

The next node type the program watches for is Text. When a text node is encountered, its contents are available in the Value property of the XmlTextReader. Because the example application prints the text of each element, it writes out the Value of every Text node:

else if (xmlTextReader.NodeType == XmlNodeType.Text) 
{
  Console.WriteLine(xmlTextReader.Value);
}
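Note that XmlTextReader reports a CDATA section as a separate CDATA node rather than as Text, so a loop that checks only for XmlNodeType.Text skips CDATA content. The sketch below (hypothetical class TextAndCDataSketch; it assumes the TextReader constructor overload and a modern C# compiler) collects both kinds of node. For `<code>a <![CDATA[<b>]]> c</code>` it returns `a <b> c`, while a Text-only loop would miss the `<b>`.

```csharp
using System.IO;
using System.Text;
using System.Xml;

// Hypothetical helper; the name is illustrative.
public static class TextAndCDataSketch
{
    // Concatenates the contents of all Text and CDATA nodes.
    public static string CollectText(string xml)
    {
        var text = new StringBuilder();
        using (var reader = new XmlTextReader(new StringReader(xml)))
        {
            while (reader.Read())
            {
                // A CDATA section is reported as XmlNodeType.CDATA, not Text.
                if (reader.NodeType == XmlNodeType.Text ||
                    reader.NodeType == XmlNodeType.CDATA)
                {
                    text.Append(reader.Value);
                }
            }
        }
        return text.ToString();
    }
}
```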

Finally, when the reader reaches the end of an element, NodeType is EndElement, and the name of the element that is ending is available in the Name property of the XmlTextReader:

else if (xmlTextReader.NodeType == XmlNodeType.EndElement) 
{
  // Signal the end of the element
  Console.WriteLine("End Element: " + xmlTextReader.Name);
}
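Because every Element node (except an empty element) is eventually matched by an EndElement node, the reader's start and end notifications can be paired with a stack to track where in the document the current node sits. The sketch below (hypothetical class ElementPathSketch; it assumes the TextReader constructor overload and a modern C# compiler) pushes on Element, pops on EndElement, skips empty elements, and labels each text node with its element path, for example `net_xml/parsers/parser: XmlTextReader`.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;

// Hypothetical helper; the name is illustrative.
public static class ElementPathSketch
{
    // Pairs each text node with the path of its enclosing elements.
    public static List<string> TextWithPaths(string xml)
    {
        var path = new Stack<string>();
        var results = new List<string>();
        using (var reader = new XmlTextReader(new StringReader(xml)))
        {
            while (reader.Read())
            {
                switch (reader.NodeType)
                {
                    case XmlNodeType.Element:
                        // Empty elements get no EndElement, so don't push them.
                        if (!reader.IsEmptyElement) path.Push(reader.Name);
                        break;
                    case XmlNodeType.EndElement:
                        path.Pop();
                        break;
                    case XmlNodeType.Text:
                    {
                        // Stack.ToArray is innermost-first; reverse to root-first.
                        string[] names = path.ToArray();
                        Array.Reverse(names);
                        results.Add(string.Join("/", names) + ": " + reader.Value);
                        break;
                    }
                }
            }
        }
        return results;
    }
}
```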

In practice, using XmlTextReader is quite similar to using SAX. The main difference is that, instead of the parser deciding when to parse the next node and then notifying the application, the application directly controls when the next node is parsed. The full source code to this example is shown in Listing 17.1 (with the Visual Basic .NET version in Listing 17.2), and the output from the example is shown in Listing 17.3.

Listing 17.1. C# `XmlTextReader` Example

using System;
using System.Xml;
class SimpleXmlTextReader
{
  static void Main(string[] args)
  {
    XmlTextReader xmlTextReader = null;
    try
    {
      // Create an instance of the XmlTextReader.
      xmlTextReader = new XmlTextReader("example.xml");
      Console.WriteLine("Start of Document");
      // Read the XML file node by node and generate the output.
      while ( xmlTextReader.Read() )
      {
        // Process a start of element node.
        if (xmlTextReader.NodeType == XmlNodeType.Element)
        {
          // Signal the start of the element
          Console.WriteLine("Start Element: " +  xmlTextReader.Name);
        }
        // Process a text node.
        else if (xmlTextReader.NodeType == XmlNodeType.Text)
        {
          // Add the text data to the output.
          Console.WriteLine(xmlTextReader.Value);
        }
        // Process an end of element node.
        else if (xmlTextReader.NodeType == XmlNodeType.EndElement)
        {
          // Signal the end of the element
          Console.WriteLine("End Element: " + xmlTextReader.Name);
        }
      } // End while loop
    }
    catch (XmlException ex)
    {
      Console.WriteLine("An XML exception occurred: " + ex.ToString());
    }
    catch (Exception ex)
    {
      Console.WriteLine("A general exception occurred: " + ex.ToString());
    }
    finally
    {
      // Close the reader (and the underlying file) even if parsing failed.
      if (xmlTextReader != null)
      {
        xmlTextReader.Close();
      }
    }

    Console.WriteLine("End of Document");
  }
} //End SimpleXmlTextReader

Listing 17.2. Visual Basic .NET `XmlTextReader` Example

Imports System
Imports System.Xml
Module SimpleXmlTextReaderVB
  Sub Main()
    Dim xmlTextReader As XmlTextReader = Nothing
    Try
      ' Create an instance of the XmlTextReader.
      xmlTextReader = New XmlTextReader("example.xml")
      Console.WriteLine("Start of Document")
      ' Continually read the next node that is available in the parsed
      ' document until there are no more.
      Do While xmlTextReader.Read()
        If xmlTextReader.NodeType = XmlNodeType.Element Then
          ' Process a start of element node: signal the start of the element.
          Console.WriteLine("Start Element: " & xmlTextReader.Name)
        ElseIf xmlTextReader.NodeType = XmlNodeType.Text Then
          ' Process a text node: add the text data to the output.
          Console.WriteLine(xmlTextReader.Value)
        ElseIf xmlTextReader.NodeType = XmlNodeType.EndElement Then
          ' Process an end of element node: signal the end of the element.
          Console.WriteLine("End Element: " & xmlTextReader.Name)
        End If
      Loop
    Catch ex As XmlException
      Console.WriteLine("An XML exception occurred: " & ex.ToString())
    Catch ex As Exception
      Console.WriteLine("A general exception occurred: " & ex.ToString())
    Finally
      ' Close the reader (and the underlying file) even if parsing failed.
      If Not xmlTextReader Is Nothing Then
        xmlTextReader.Close()
      End If
    End Try
    Console.WriteLine("End of Document")
  End Sub
End Module

Listing 17.3. Output of the `XmlTextReader` Example

Start of Document
Start Element: net_xml
Start Element: standards
Start Element: standard
.NET Standards Documents
End Element: standard
Start Element: standard
Document Object Model(DOM)
End Element: standard
Start Element: standard
Simple API for XML(SAX)
End Element: standard
End Element: standards
Start Element: parsers
Start Element: parser
XmlTextReader
End Element: parser
Start Element: parser
XmlNodeReader
End Element: parser
Start Element: parser
XmlValidatingReader
End Element: parser
End Element: parsers
End Element: net_xml
End of Document
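The contents of example.xml are not shown here, so the following sketch makes one assumption explicit: SampleXml is a hypothetical reconstruction of example.xml inferred from the output in Listing 17.3. ListingSketch.Run is a variant of Listing 17.1 (hypothetical name; error handling omitted for brevity) that reads from any TextReader and returns the output lines instead of printing them, which makes the listing's behavior easy to check without a file on disk. Whitespace nodes between the elements are ignored, just as in the original loop, so the reconstructed document produces the 26 lines shown above.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Xml;

// Hypothetical variant of Listing 17.1 that returns its output lines.
public static class ListingSketch
{
    // Hypothetical reconstruction of example.xml, inferred from Listing 17.3.
    public const string SampleXml = @"<net_xml>
  <standards>
    <standard>.NET Standards Documents</standard>
    <standard>Document Object Model(DOM)</standard>
    <standard>Simple API for XML(SAX)</standard>
  </standards>
  <parsers>
    <parser>XmlTextReader</parser>
    <parser>XmlNodeReader</parser>
    <parser>XmlValidatingReader</parser>
  </parsers>
</net_xml>";

    public static List<string> Run(TextReader input)
    {
        var output = new List<string>();
        output.Add("Start of Document");
        using (var reader = new XmlTextReader(input))
        {
            while (reader.Read())
            {
                switch (reader.NodeType)
                {
                    case XmlNodeType.Element:
                        output.Add("Start Element: " + reader.Name);
                        break;
                    case XmlNodeType.Text:
                        output.Add(reader.Value);
                        break;
                    case XmlNodeType.EndElement:
                        output.Add("End Element: " + reader.Name);
                        break;
                    // Whitespace and other node types are ignored, as in Listing 17.1.
                }
            }
        }
        output.Add("End of Document");
        return output;
    }
}
```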
