Microsoft supplies three parsers that derive from the abstract XmlReader class (see Figure 17.1):
XmlTextReader: The simplest and most straightforward XML parser in .NET. XmlTextReader is a one-pass, forward-only parser. It doesn't support validation, cannot expand general entities, and doesn't make default attributes available. Despite these limitations, XmlTextReader is an extremely fast and efficient parser.
XmlValidatingReader: The XmlValidatingReader builds on a parser like the XmlTextReader and adds several extended features. First, it validates the document against a Document Type Definition or XML Schema while the document is being parsed. In addition, it adds support for the expansion of general entities and attaches the default attributes specified in the Document Type Definition or Schema.
XmlNodeReader: The XmlNodeReader is intended for a fundamentally different use than the XmlTextReader. It does not parse text documents into their equivalent XML form. Instead, it reads from XML documents that have already been parsed into a W3C Document Object Model (DOM) tree.
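To make the difference concrete, the following is a minimal sketch of an XmlNodeReader walking a DOM tree that has already been built. The class name NodeReaderSketch, the helper ElementNames, and the sample document are assumptions made for illustration; they do not come from the chapter's listings.

```csharp
using System;
using System.Collections.Generic;
using System.Xml;

public static class NodeReaderSketch
{
    // Hypothetical helper: returns the names of all elements in document order.
    public static List<string> ElementNames(string xml)
    {
        // The text is parsed into a DOM tree first...
        XmlDocument document = new XmlDocument();
        document.LoadXml(xml);

        List<string> names = new List<string>();

        // ...and the XmlNodeReader then reads from that in-memory tree,
        // not from the original text.
        using (XmlNodeReader reader = new XmlNodeReader(document))
        {
            while (reader.Read())
            {
                if (reader.NodeType == XmlNodeType.Element)
                {
                    names.Add(reader.Name);
                }
            }
        }
        return names;
    }

    public static void Main()
    {
        // Sample document is an assumption for illustration.
        string xml = "<staff><employee>Ann</employee></staff>";
        foreach (string name in ElementNames(xml))
        {
            Console.WriteLine(name);
        }
    }
}
```

Because both readers share the same base class, the read loop looks the same as it would for an XmlTextReader; only the source of the nodes differs.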
The simplest method of parsing XML with .NET is using the XmlTextReader class. This section provides an example of the use of XmlTextReader to accomplish basic parsing tasks. XmlTextReader is similar in functionality, though not in interface, to SAX, which is discussed in Chapter 15, “Parsing XML Based on Events.” In this section, we will develop a simple application that reads all the nodes in a document, reporting each element it contains and printing any text found within those elements.
To parse XML documents with the .NET XmlTextReader class, we first need to create an instance of the XmlTextReader. This is done by passing the name of the document to be parsed, in this case "example.xml", to the XmlTextReader constructor:
XmlTextReader xmlTextReader = new XmlTextReader("example.xml");
Here, we are parsing a local file, but the XmlTextReader is extremely flexible: its constructor accepts almost any type of Uniform Resource Identifier (URI). For example, if you wanted to parse a document on the Web located at "http://www.mylocalcompany.com/payroll.xml", you could instead use the following declaration:
XmlTextReader xmlTextReader = new XmlTextReader("http://www.mylocalcompany.com/payroll.xml");
The XmlTextReader is a one-pass parser that moves forward through the supplied XML document node by node. Unlike SAX, which is event based, the XmlTextReader waits until the program asks for the next node to provide it. This has the advantage of letting the programmer decide when to parse the next section of the document because the programmer controls the flow of processing rather than responding to it.
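One consequence of this pull model is that the program can simply stop asking for nodes once it has what it needs. The sketch below illustrates that idea under some assumptions: the class PullSketch, the helper FirstText, and the payroll-style sample document are hypothetical, and the document is supplied through a StringReader so the example does not depend on a file.

```csharp
using System;
using System.IO;
using System.Xml;

public static class PullSketch
{
    // Hypothetical helper: returns the text of the first element with the
    // given name, or null if no such element is found.
    public static string FirstText(string xml, string elementName)
    {
        using (XmlTextReader reader = new XmlTextReader(new StringReader(xml)))
        {
            // The application decides when to pull the next node...
            while (reader.Read())
            {
                if (reader.NodeType == XmlNodeType.Element && reader.Name == elementName)
                {
                    // ...and can return early; the rest of the document
                    // is never read.
                    return reader.ReadString();
                }
            }
        }
        return null;
    }

    public static void Main()
    {
        // Sample document is an assumption for illustration.
        string xml = "<payroll><employee><name>Ann</name><salary>100</salary></employee>"
                   + "<employee><name>Bob</name></employee></payroll>";
        Console.WriteLine(FirstText(xml, "name"));
    }
}
```

With an event-based parser, the same early exit usually requires the handler to signal the parser to stop; here it is an ordinary return statement.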
To tell the XmlTextReader to read the next node in the XML document, call the Read method of the XmlTextReader object:
xmlTextReader.Read();
For our example program, we'll want to read every node in the document, so the Read method call is placed in a while loop:
while ( xmlTextReader.Read() )
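It's important to note that the Read method of the XmlTextReader object does not return the node's data. Its only return value is a Boolean: true if a node was read, and false once the end of the document has been reached, which is what ends the while loop. It's easiest to think of the XML parser in this case as an assembly line: The Read method merely moves the assembly line forward one step.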
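As a small illustration of the assembly-line idea, the sketch below ignores the node contents entirely and only counts how many times Read advances before it reports the end of the document. The class ReadSketch, the helper CountNodes, and the sample document are assumptions for illustration.

```csharp
using System;
using System.IO;
using System.Xml;

public static class ReadSketch
{
    // Hypothetical helper: counts how many nodes Read() steps through.
    public static int CountNodes(string xml)
    {
        int count = 0;
        using (XmlTextReader reader = new XmlTextReader(new StringReader(xml)))
        {
            // Read() only advances the reader and reports whether a node
            // was available; the node itself is never returned.
            while (reader.Read())
            {
                count++;
            }
        }
        return count;
    }

    public static void Main()
    {
        // Element, Text, and EndElement nodes: three steps in total.
        Console.WriteLine(CountNodes("<greeting>hello</greeting>"));
    }
}
```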
However, the XmlTextReader does make it extremely easy to get access to the currently parsed XML data. Whenever the Read method is called, the currently parsed XML data is placed into the XmlTextReader object that is in use, and that data is made available via the various properties of the XmlTextReader object (see Table 17.1).
For this example, it will be necessary to keep track of when an element is started, when an element is ended, and when a text node is available. In each of these cases, a process for determining the current type of the node is needed. The type of node that has just been parsed is stored in the NodeType property of the XmlTextReader. The NodeType property contains a value that matches the possible values of the XmlNodeType enumeration, which can have the values listed in Table 17.2.
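The following sketch shows one way to branch on the NodeType property. It is an illustration only: the class NodeTypeSketch, the helper Describe, and the sample document are assumptions, and node types the example does not handle specially are reported by their enumeration name alone.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;

public static class NodeTypeSketch
{
    // Hypothetical helper: describes each node the reader encounters.
    public static List<string> Describe(string xml)
    {
        List<string> lines = new List<string>();
        using (XmlTextReader reader = new XmlTextReader(new StringReader(xml)))
        {
            while (reader.Read())
            {
                switch (reader.NodeType)
                {
                    case XmlNodeType.Element:
                    case XmlNodeType.EndElement:
                        // Element boundaries carry a name.
                        lines.Add(reader.NodeType + ": " + reader.Name);
                        break;
                    case XmlNodeType.Text:
                        // Text nodes carry a value.
                        lines.Add(reader.NodeType + ": " + reader.Value);
                        break;
                    default:
                        // Declarations, comments, whitespace, and so on.
                        lines.Add(reader.NodeType.ToString());
                        break;
                }
            }
        }
        return lines;
    }

    public static void Main()
    {
        // Sample document is an assumption for illustration.
        string xml = "<?xml version=\"1.0\"?><!-- note --><greeting>hello</greeting>";
        foreach (string line in Describe(xml))
        {
            Console.WriteLine(line);
        }
    }
}
```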
Now that we're able to parse the document into XML nodes and determine the type of each node, we're ready to print out the information desired for each node. The first type of node is the Element. Keep in mind that an element node doesn't contain the entire element; it informs the program that the start of an element has been encountered (much like the SAX startElement event handler). Specifically, the children of the current element have not yet been parsed; they will be encountered as parsing continues. The name of the XML element that has been parsed is placed in the Name property of the XmlTextReader.
if (xmlTextReader.NodeType == XmlNodeType.Element)
{
    // Signal the start of the element
    Console.WriteLine("Start Element: " + xmlTextReader.Name);
}
The next type of node that the program needs to be watching for is the Text node. Whenever a text node is encountered, its contents are placed in the Value property of the XmlTextReader. As part of the example application, the text of a node needs to be printed, so the value of the Text node is printed out.
else if (xmlTextReader.NodeType == XmlNodeType.Text)
{
    // Print the contents of the text node
    Console.WriteLine(xmlTextReader.Value);
}
Finally, whenever the end of an element is reached, an EndElement node is placed in the XmlTextReader. The name of the XML element that is ending is placed in the Name property of the XmlTextReader.
else if (xmlTextReader.NodeType == XmlNodeType.EndElement)
{
    // Signal the end of the element
    Console.WriteLine("End Element: " + xmlTextReader.Name);
}
As it turns out, using the XmlTextReader interface is actually quite similar to using SAX. The main difference is that instead of the parser determining when the next node is to be parsed and then notifying the application, the application has direct control over the parsing of the next node. The full source code to this example is shown in Listing 17.1 (with the Visual Basic .NET version in Listing 17.2), and the output from the example is shown in Listing 17.3.
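As a rough sketch of how the pieces discussed in this section fit together (it is not a reproduction of Listing 17.1), the three node-type checks can be placed inside the read loop as follows. The class ParseSketch, the method Report, and the inline sample document are assumptions for illustration; a StringReader stands in for "example.xml" so the sketch runs without a file on disk.

```csharp
using System;
using System.IO;
using System.Xml;

public static class ParseSketch
{
    // Hypothetical method: reports elements and text from 'input' to 'output'.
    public static void Report(TextReader input, TextWriter output)
    {
        XmlTextReader xmlTextReader = new XmlTextReader(input);
        try
        {
            // Pull nodes one at a time until the document is exhausted.
            while (xmlTextReader.Read())
            {
                if (xmlTextReader.NodeType == XmlNodeType.Element)
                {
                    // Signal the start of the element
                    output.WriteLine("Start Element: " + xmlTextReader.Name);
                }
                else if (xmlTextReader.NodeType == XmlNodeType.Text)
                {
                    // Print the contents of the text node
                    output.WriteLine(xmlTextReader.Value);
                }
                else if (xmlTextReader.NodeType == XmlNodeType.EndElement)
                {
                    // Signal the end of the element
                    output.WriteLine("End Element: " + xmlTextReader.Name);
                }
                // Other node types (whitespace, comments, ...) are ignored.
            }
        }
        finally
        {
            xmlTextReader.Close();
        }
    }

    public static void Main()
    {
        // Sample document is an assumption for illustration.
        string sample = "<staff>\n  <employee>Ann</employee>\n</staff>";
        Report(new StringReader(sample), Console.Out);
    }
}
```

Passing the input and output as TextReader and TextWriter is a design choice made here so the loop can be exercised against in-memory strings; the file-name and URI constructors shown earlier would work with the same loop.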