Using the XmlReader

The .NET Framework provides three implementations of XmlReader: XmlTextReader, XmlValidatingReader, and XmlNodeReader. In this section, I’ll present each class one at a time and show you how to use them.

XmlTextReader

XmlTextReader is the most immediately useful specialization of XmlReader. It reads XML from a Stream, a URL, a string, or a TextReader, so you can use it to read XML from a text file on disk, from a web site, or from a string in memory that has been built or loaded elsewhere in your program. XmlTextReader does not validate the XML it reads. It does, however, check the XML for well-formedness, and it expands the predefined entities, such as &lt;, &gt;, and &amp;, into the characters they represent (<, >, and &, respectively).
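Both behaviors can be checked with a few lines of code. The following is a minimal sketch, assuming the XML is held in a string and handed to XmlTextReader through a StringReader; the class and method names (EntityDemo, ElementText, IsWellFormed) are hypothetical illustrations, not part of the Framework.

```csharp
using System.IO;
using System.Xml;

public static class EntityDemo {
  // Returns the text content of the document element. XmlTextReader has
  // already expanded predefined entities such as &lt; and &amp;.
  public static string ElementText(string xml) {
    XmlTextReader reader = new XmlTextReader(new StringReader(xml));
    reader.MoveToContent();            // skip ahead to the document element
    string text = reader.ReadString(); // concatenates the element's text nodes
    reader.Close();
    return text;
  }

  // Reads every node. XmlTextReader throws an XmlException at the first
  // well-formedness error, even though it never validates.
  public static bool IsWellFormed(string xml) {
    XmlTextReader reader = new XmlTextReader(new StringReader(xml));
    try {
      while (reader.Read()) { }
      return true;
    } catch (XmlException) {
      return false;
    } finally {
      reader.Close();
    }
  }
}
```

For instance, ElementText("<d>a &lt; b &amp; c</d>") returns the plain string a < b & c, while a document with a mismatched end tag makes IsWellFormed( ) return false.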

In addition to these general capabilities, XmlTextReader can resolve system- and user-defined entities, and can be optimized somewhat by providing it with an XmlNameTable. Although XmlNameTable is an abstract class, you can instantiate a new NameTable, or access an XmlReader’s XmlNameTable through its NameTable property.

Tip

An XmlNameTable contains a collection of string objects that are used to represent the elements and attributes of an XML document. XmlReader can use this table to more efficiently handle elements and attributes that recur in a document. An XmlNameTable object is created at runtime by the .NET parser every time it reads an XML document. If you are parsing many documents with the same format, using the same XmlNameTable in each of them can result in some efficiency gains—I’ll show you how to do this later in this chapter.
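The idea of sharing one name table across documents can be sketched as follows, assuming each document is available as a string; SharedNameTableDemo and CountItems are hypothetical names. Because every reader is constructed with the same NameTable, the name "item" is stored once, and the atomized string returned by NameTable.Add( ) can be compared by reference rather than character by character.

```csharp
using System.IO;
using System.Xml;

public static class SharedNameTableDemo {
  // Counts <item> elements across several documents that share one NameTable.
  public static int CountItems(string[] documents) {
    NameTable names = new NameTable();
    // Add returns the atomized string; readers built on this table return
    // the very same string instance for every <item> element they see.
    object itemName = names.Add("item");
    int count = 0;
    foreach (string xml in documents) {
      XmlTextReader reader = new XmlTextReader(new StringReader(xml), names);
      while (reader.Read()) {
        // Reference comparison is valid only because both strings come from one table.
        if (reader.NodeType == XmlNodeType.Element && (object)reader.LocalName == itemName) {
          count++;
        }
      }
      reader.Close();
    }
    return count;
  }
}
```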

Like many businesses, Angus Hardware—the hardware store I introduced in the preface—issues and processes purchase orders (POs) to help manage its finances and inventory. Being technically savvy, the company IT crew has created an XML format for Angus Hardware POs. Example 2-1 lists the XML for po1456.xml, a typical purchase order. I’ll use this document in the rest of the examples in this chapter, and some of the later examples in the book.

Example 2-1. A purchase order in XML format
<?xml version="1.0"?>

<po id="PO1456">

  <date year="2002" month="6" day="14" />

  <address type="shipping">
    <name>Frits Mendels</name>
    <street>152 Cherry St</street>
    <city>San Francisco</city>
    <state>CA</state>
    <zip>94045</zip>
  </address>

  <address type="billing">
    <name>Frits Mendels</name>
    <street>PO Box 6789</street>
    <city>San Francisco</city>
    <state>CA</state>
    <zip>94123-6798</zip>
  </address>

  <items>
    <item quantity="1" 
          productCode="R-273" 
          description="14.4 Volt Cordless Drill" 
          unitCost="189.95" />
    <item quantity="1" 
          productCode="1632S" 
          description="12 Piece Drill Bit Set" 
          unitCost="14.95" /> 
  </items>

</po>

Tip

Example 2-1 and all the other code examples in this book are available at the book’s web site, http://www.oreilly.com/catalog/netxml/.

Unfortunately, Angus Hardware’s fulfillment department, the group responsible for pulling products off the shelves in the warehouse, has not yet upgraded to the latest laser printers and hand-held bar-code scanners. The warehouse workers prefer to receive their pick lists as plain text on paper. Since the order entry department produces its POs in XML, the IT guys propose to transform the existing POs into the pick list format the order pickers prefer.

Here’s the pick list that the fulfillment department prefers:

Angus Hardware PickList
=======================

PO Number: PO1456

Date: Friday, June 14, 2002

Shipping Address:
Frits Mendels
152 Cherry St
San Francisco, CA 94045

Quantity Product Code Description
======== ============ ===========
      1        R-273  14.4 Volt Cordless Drill
      1        1632S  12 Piece Drill Bit Set

You’ll note that while the pick list layout is fairly simple, it does require some formatting; Quantity and Product Code numbers need to be right-aligned, for example. This is a good job for an XmlReader, because you really don’t need to manipulate the XML, but just read it in and transform it into the desired text layout. (You could do this with an XSLT transform, but that solution comes later in Chapter 7!)

Example 2-2 shows the Main( ) method of a program that reads the XML purchase order listed in Example 2-1 and transforms it into a pick list.

Example 2-2. A program to transform an XML purchase order into a printed pick list
using System;
using System.IO;
using System.Text;
using System.Xml;

public class PoToPickList {

  public static void Main(string[ ] args) {

    string url = args[0];

    XmlReader reader = new XmlTextReader(url);

    StringBuilder pickList = new StringBuilder( );
    pickList.Append("Angus Hardware PickList").Append(Environment.NewLine);
    pickList.Append("=======================").Append(Environment.NewLine)
      .Append(Environment.NewLine);

    while (reader.Read( )) {
      if (reader.NodeType == XmlNodeType.Element) {
        switch (reader.LocalName) {
          case "po":
            pickList.Append(POElementToString(reader));
            break;
          case "date":
            pickList.Append(DateElementToString(reader));
            break;
          case "address":
            reader.MoveToAttribute("type");
            if (reader.Value == "shipping") {
              pickList.Append(AddressElementToString(reader));
            } else {
              reader.Skip( );
            }
            break;
          case "items":
            pickList.Append(ItemsElementToString(reader));
            break;
        }
      }
    }

    Console.WriteLine(pickList);
  }
}

Let’s look at the Main( ) method in Example 2-2 in small chunks, and then we’ll dive into the rest of the program.

XmlReader reader = new XmlTextReader(url);

This line instantiates a new XmlTextReader object, passing in a URL, and assigns the object reference to an XmlReader variable. If the URL uses the http or https scheme, the XmlTextReader takes care of creating a network connection to the web site. If the URL uses the file scheme, or has no scheme at all, the XmlTextReader reads the file from disk. Because the XmlTextReader uses the System.IO classes we discussed earlier, it does not currently recognize any other URL schemes, such as ftp or gopher.
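The URL constructor is only one way in. As a sketch of the Stream and TextReader constructors mentioned earlier, the hypothetical ReaderSources class below feeds the same XML to XmlTextReader as a string (through a StringReader) and as bytes (through a MemoryStream standing in for a file or network stream). The read loop is identical in both cases because it depends only on XmlReader.

```csharp
using System.IO;
using System.Text;
using System.Xml;

public static class ReaderSources {
  // The loop only knows about XmlReader, not where the XML came from.
  public static int CountElements(XmlReader reader) {
    int count = 0;
    while (reader.Read()) {
      if (reader.NodeType == XmlNodeType.Element) count++;
    }
    reader.Close();
    return count;
  }

  // XML already in memory as a string: wrap it in a TextReader.
  public static int FromString(string xml) {
    return CountElements(new XmlTextReader(new StringReader(xml)));
  }

  // XML arriving as bytes, as it would from a file or network Stream.
  public static int FromStream(string xml) {
    Stream stream = new MemoryStream(Encoding.UTF8.GetBytes(xml));
    return CountElements(new XmlTextReader(stream));
  }
}
```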

StringBuilder pickList = new StringBuilder( );
pickList.Append("Angus Hardware PickList").Append(Environment.NewLine);
pickList.Append("=======================").Append(Environment.NewLine)
  .Append(Environment.NewLine);

These lines instantiate a StringBuilder object that will be used to build a string containing the text representation of the pick list. We initialize the StringBuilder with a simple page header.

Tip

The StringBuilder class provides an efficient way to build strings. You could concatenate several string instances using the + operator, but because strings are immutable, each concatenation creates a new string object, and that overhead adds up when a string is built piece by piece. StringBuilder appends into a single growable buffer, which avoids most of that overhead. To learn more about StringBuilder, see Learning C# by Jesse Liberty (O’Reilly).
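The pattern the pick-list program relies on, appending many small pieces to one StringBuilder, looks like this in isolation. It is a minimal sketch; LineBuilder and JoinLines are hypothetical names.

```csharp
using System;
using System.Text;

public static class LineBuilder {
  // Appends each line to one StringBuilder instead of creating a new
  // string with + on every pass through the loop.
  public static string JoinLines(string[] lines) {
    StringBuilder sb = new StringBuilder();
    foreach (string line in lines) {
      // Append returns the same StringBuilder, so calls can be chained.
      sb.Append(line).Append(Environment.NewLine);
    }
    return sb.ToString();
  }
}
```

The chained Append( ) calls are the same style the program uses to build its page header.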

while (reader.Read( )) {
  if (reader.NodeType == XmlNodeType.Element) {

This loop is the heart of the code. Each time Read( ) is called, the parser moves to the next node in the XML document. Read( ) returns true if the read was successful and false if there are no more nodes to read, as at the end of the file. The if statement ensures that you don’t evaluate an EndElement node as if it were an Element node; otherwise, each method would be called twice for every element that has separate start and end tags, once when the parser reads the Element and once when it reads the EndElement. The XmlReader.NodeType property returns an XmlNodeType value.
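To see why the NodeType check matters, the following sketch records the type and local name of every non-whitespace node the reader visits. NodeTypeTrace is a hypothetical helper, and the input is a trimmed-down stand-in for a purchase order.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Xml;

public static class NodeTypeTrace {
  // Returns "NodeType:LocalName" for each node, skipping whitespace.
  public static List<string> Trace(string xml) {
    List<string> trace = new List<string>();
    XmlTextReader reader = new XmlTextReader(new StringReader(xml));
    while (reader.Read()) {
      if (reader.NodeType != XmlNodeType.Whitespace) {
        trace.Add(reader.NodeType + ":" + reader.LocalName);
      }
    }
    reader.Close();
    return trace;
  }
}
```

For <po><date/><items></items></po> the trace is Element:po, Element:date, Element:items, EndElement:items, EndElement:po. The items element is reported twice under the same LocalName, which is exactly the double call the if statement prevents. An empty element such as <date/> produces only an Element node.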

Now that you have read a node, you need to determine its name:

switch (reader.LocalName) {

The LocalName property contains the name of the current node with its namespace prefix removed. A Name property, which contains the name together with its namespace prefix if it has one, is also available, and the prefix itself can be retrieved with the XmlReader’s Prefix property.
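The purchase order in Example 2-1 uses no namespaces, so all three properties look alike there. The sketch below uses a prefixed element to show the difference; the ah prefix and the urn:example:angus-hardware namespace URI are made up for illustration, as is the NameParts class.

```csharp
using System.IO;
using System.Xml;

public static class NameParts {
  // Returns LocalName, Name, Prefix, and NamespaceURI of the document element.
  public static string[] Describe(string xml) {
    XmlTextReader reader = new XmlTextReader(new StringReader(xml));
    reader.MoveToContent(); // position on the document element
    string[] parts = { reader.LocalName, reader.Name, reader.Prefix, reader.NamespaceURI };
    reader.Close();
    return parts;
  }
}
```

For <ah:po xmlns:ah="urn:example:angus-hardware"/>, this returns po, ah:po, ah, and urn:example:angus-hardware. Switching on LocalName therefore matches the element whatever prefix a document happens to use.

Back in Main( ), each case of the switch statement handles one element name: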

case "po":
  pickList.Append(POElementToString(reader));
  break;
case "date":
  pickList.Append(DateElementToString(reader));
  break;
case "address":
  reader.MoveToAttribute("type");
  if (reader.Value == "shipping") {
    pickList.Append(AddressElementToString(reader));
  } else {
    reader.Skip( );
  }
  break;
case "items":
  pickList.Append(ItemsElementToString(reader));
  break;

For each element name, the program calls a specific method to parse its subnodes; this demonstrates the concept of recursive descent parsing, which I discussed earlier.

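One element of the XML tree, address, is of particular interest. The order pickers don’t care who’s paying for the order, only where it is to be shipped, so the program checks the value of the type attribute before calling AddressElementToString( ). MoveToAttribute( ) positions the reader on the type attribute node, so reader.Value returns that attribute’s value. If the address is not a shipping address, the program calls Skip( ) to move the parser past the address element to the node that follows it, normally its next sibling. Note that the while loop’s next Read( ) then advances one more node. That is harmless in Example 2-1, where whitespace follows each address element, but in a document with no whitespace between elements, the element right after a skipped address would be missed.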
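One way to avoid losing the node after a skipped element is to call Read( ) only when Skip( ) hasn’t already advanced the reader. The sketch below applies that idea to the address filter; ShippingFilter and ShippingNames are hypothetical names, and collecting only the name elements is a simplification of the full pick list.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Xml;

public static class ShippingFilter {
  // Collects the <name> text of shipping addresses only.
  public static List<string> ShippingNames(string xml) {
    List<string> names = new List<string>();
    XmlTextReader reader = new XmlTextReader(new StringReader(xml));
    reader.Read();
    while (!reader.EOF) {
      if (reader.NodeType == XmlNodeType.Element && reader.LocalName == "address") {
        // MoveToAttribute puts the reader on the attribute node, so Value is the attribute's value.
        string type = reader.MoveToAttribute("type") ? reader.Value : null;
        reader.MoveToElement();   // back to the <address> start tag
        if (type != "shipping") {
          reader.Skip();          // now on the node after </address>
          continue;               // don't Read() again, or that node is lost
        }
      } else if (reader.NodeType == XmlNodeType.Element && reader.LocalName == "name") {
        names.Add(reader.ReadString()); // leaves the reader on </name>
      }
      reader.Read();
    }
    reader.Close();
    return names;
  }
}
```

With a billing address immediately followed by a shipping address and no whitespace between them, this version still finds the shipping name. The Read-after-Skip pattern in Main( ) would step over it.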

To read in the po element, the program calls the POElementToString( ) method. Here’s the definition of that method:

private static string POElementToString(XmlReader reader) {

  string id = reader.GetAttribute("id");

  StringBuilder poBlock = new StringBuilder( );
  poBlock.Append("PO Number: ").Append(id).Append(Environment.NewLine)
    .Append(Environment.NewLine);
  return poBlock.ToString( );
}

The first thing this method does is get the id attribute. The GetAttribute( ) method returns the value of the named attribute of the current element, or null if the element has no such attribute. Unlike MoveToAttribute( ), it does not change the reader’s current position.
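A quick sketch of those two properties of GetAttribute( ), assuming a string input; AttributeProbe is a hypothetical class, and status is a made-up attribute name chosen because the purchase order doesn’t have it.

```csharp
using System.IO;
using System.Xml;

public static class AttributeProbe {
  // Returns the id value, the value of a missing attribute, and the node type afterward.
  public static string[] Probe(string xml) {
    XmlTextReader reader = new XmlTextReader(new StringReader(xml));
    reader.MoveToContent();                         // on the <po> element
    string id = reader.GetAttribute("id");          // the attribute's value
    string missing = reader.GetAttribute("status"); // no such attribute: null
    string nodeType = reader.NodeType.ToString();   // reader has not moved
    reader.Close();
    return new string[] { id, missing, nodeType };
  }
}
```

Because a missing attribute yields null rather than an empty string, code that requires an attribute should check for null before using it.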

After it gets the id, POElementToString( ) can then return a properly formatted line for the pick list.

Next, the code looks for any date elements and calls DateElementToString( ):

private static string DateElementToString(XmlReader reader) {

  int year = Int32.Parse(reader.GetAttribute("year"));
  int month = Int32.Parse(reader.GetAttribute("month"));
  int day = Int32.Parse(reader.GetAttribute("day"));
  DateTime date = new DateTime(year,month,day);

  StringBuilder dateBlock = new StringBuilder( );
  dateBlock.Append("Date: ").Append(date.ToString("D")).Append(Environment.NewLine)
    .Append(Environment.NewLine);
  return dateBlock.ToString( );
}

This method uses Int32.Parse( ) to convert the strings read from the date element’s attributes into int values suitable for passing to the DateTime constructor. It then formats the date with the "D" (long date) format specifier, whose exact output depends on the current culture; under en-US it produces a line such as Friday, June 14, 2002. Finally, the method returns the formatted date line for the pick list.
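Because both Int32.Parse( ) and the "D" specifier follow the current culture, the same program can print different pick lists on differently configured machines. The sketch below pins both down. It uses XmlConvert.ToInt32( ), which parses numbers by XML rules regardless of culture, together with an explicit date pattern under the invariant culture. The pattern string is an assumption chosen to reproduce the pick list’s layout, and DateLine is a hypothetical name.

```csharp
using System;
using System.Globalization;
using System.IO;
using System.Xml;

public static class DateLine {
  // Builds the pick list's date line from a <date year=".." month=".." day=".."/> element.
  public static string FromDateElement(string xml) {
    XmlTextReader reader = new XmlTextReader(new StringReader(xml));
    reader.MoveToContent(); // position on the <date> element
    // XmlConvert parses using XML Schema rules, independent of culture settings.
    int year = XmlConvert.ToInt32(reader.GetAttribute("year"));
    int month = XmlConvert.ToInt32(reader.GetAttribute("month"));
    int day = XmlConvert.ToInt32(reader.GetAttribute("day"));
    reader.Close();
    DateTime date = new DateTime(year, month, day);
    // An explicit pattern plus the invariant culture gives the same text everywhere.
    return "Date: " + date.ToString("dddd, MMMM d, yyyy", CultureInfo.InvariantCulture);
  }
}
```

For the date element in Example 2-1, this returns Date: Friday, June 14, 2002.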

Next comes AddressElementToString( ), which formats the shipping address:

private static string AddressElementToString(XmlReader reader) {

  StringBuilder addressBlock = new StringBuilder( );
  addressBlock.Append("Shipping Address:").Append(Environment.NewLine);

  while (reader.Read( ) && (reader.NodeType == XmlNodeType.Element ||
      reader.NodeType == XmlNodeType.Whitespace)) {
    switch (reader.LocalName) {
      case "name":
      case "company":
      case "street":
      case "zip":
        addressBlock.Append(reader.ReadString( ));
        addressBlock.Append(Environment.NewLine);
        break;
      case "city":
        addressBlock.Append(reader.ReadString( ));
        addressBlock.Append(", ");
        break;
      case "state":
        addressBlock.Append(reader.ReadString( ));
        addressBlock.Append(" ");
        break;
    }
  }

  addressBlock.Append(Environment.NewLine);
  return addressBlock.ToString( );
}

Much like the Main( ) method, AddressElementToString( ) reads from the XML file using a while loop. However, because the method starts at the address element, the only nodes it needs to traverse are the subnodes of address. The loop continues only while the reader lands on Element or Whitespace nodes, so it stops when it reaches the EndElement node for address. ReadString( ) returns the text content of the current element and leaves the reader on that element’s end tag, ready for the next Read( ). For name, company, street, and zip, AddressElementToString( ) appends the element’s content followed by a newline. The city and state elements are handled differently so that they share a line with the zip code in the usual U.S. format: the city is followed by a comma and a space, and the state by a single space. Finally, the method returns the formatted address block.
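To check the resulting layout, the following sketch runs the same loop as AddressElementToString( ) over the shipping address from Example 2-1, supplied as a string. AddressFormatter is a hypothetical wrapper. MoveToContent( ) stands in for the Main( ) loop that would normally leave the reader on the address element.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;

public static class AddressFormatter {
  public static string Format(string addressXml) {
    XmlTextReader reader = new XmlTextReader(new StringReader(addressXml));
    reader.MoveToContent(); // on <address>, as Main( ) would leave it
    StringBuilder block = new StringBuilder();
    // Same loop condition as AddressElementToString( ): ends at </address>.
    while (reader.Read() && (reader.NodeType == XmlNodeType.Element ||
        reader.NodeType == XmlNodeType.Whitespace)) {
      switch (reader.LocalName) {
        case "name":
        case "company":
        case "street":
        case "zip":
          block.Append(reader.ReadString()).Append(Environment.NewLine);
          break;
        case "city":
          block.Append(reader.ReadString()).Append(", ");
          break;
        case "state":
          block.Append(reader.ReadString()).Append(" ");
          break;
      }
    }
    reader.Close();
    return block.ToString();
  }
}
```

The city, state, and zip end up on one line: San Francisco, CA 94045.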

Now we come to the most complex method, ItemsElementToString( ). Its complexity lies not in its reading of the XML, but in its formatting of the output:

private static string ItemsElementToString(XmlReader reader) {

  StringBuilder itemsBlock = new StringBuilder( );
  itemsBlock.Append("Quantity Product Code Description").Append(Environment.NewLine);
  itemsBlock.Append("======== ============ ===========").Append(Environment.NewLine);

  while (reader.Read( ) && (reader.NodeType == XmlNodeType.Element ||
      reader.NodeType == XmlNodeType.Whitespace)) {
    switch (reader.LocalName) {
      case "item":
        int quantity = Int32.Parse(reader.GetAttribute("quantity"));
        string productCode = reader.GetAttribute("productCode");
        string description = reader.GetAttribute("description");
        // Quantity right-aligned in 6 columns, product code in 11
        itemsBlock.AppendFormat(" {0,6}  {1,11}  {2}",
          quantity, productCode, description).Append(Environment.NewLine);
        break;
    }
  }

  return itemsBlock.ToString( );
}

The ItemsElementToString( ) method uses the StringBuilder’s AppendFormat( ) method. This is not the place for a full discussion of .NET’s string-formatting capabilities, but in short, each placeholder in the format string, such as {0,6}, is replaced by the argument with the matching index. The number after the comma sets the minimum field width: a positive width right-aligns the value by padding it with spaces on the left, and a negative width left-aligns it. For more on formatting strings in C#, see Appendix B of C# in a Nutshell, by Peter Drayton, Ben Albahari, and Ted Neward (O’Reilly).
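The alignment rules can be seen by formatting the two items from Example 2-1 with the same format string. PickListFormat is a hypothetical name; PadDemo( ) only illustrates positive versus negative widths.

```csharp
using System.Text;

public static class PickListFormat {
  // Same format string as ItemsElementToString( ).
  public static string ItemLine(int quantity, string productCode, string description) {
    StringBuilder sb = new StringBuilder();
    sb.AppendFormat(" {0,6}  {1,11}  {2}", quantity, productCode, description);
    return sb.ToString();
  }

  // A positive width pads on the left; a negative width pads on the right.
  public static string PadDemo() {
    return string.Format("[{0,6}][{0,-6}]", "ab");
  }
}
```

ItemLine(1, "R-273", "14.4 Volt Cordless Drill") yields a leading space, the quantity padded to six columns, two spaces, and the product code padded to eleven columns, which matches the pick list rows shown earlier. PadDemo( ) returns [    ab][ab    ].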

This program makes some assumptions about the incoming XML. For example, it assumes that the elements appear in a specific order, that certain elements always occur, and that others are optional. If a document breaks any of these assumptions, the output will be wrong. XmlTextReader has no way to detect such a document, but XmlValidatingReader can. To ensure that an unusable pick list is not produced, you should always validate the XML before doing any processing.

XmlValidatingReader

XmlValidatingReader is a specialized implementation of XmlReader that validates XML as it reads the incoming stream. Validation may be done against an explicitly provided Document Type Definition (DTD), XML Schema, or XML-Data Reduced (XDR) Schema, or the type of validation may be determined automatically from the document itself. XmlValidatingReader can read data from a Stream, a string, or another XmlReader. The last option lets you, for example, validate XML read by an XmlTextReader, which does not perform validation itself. Validation errors are reported through an event handler, if one is registered, or otherwise by throwing an exception.

The following examples will show you how to validate the Angus Hardware purchase order using a DTD. Validating XML with an XML Schema instead of a DTD will give you even more control over the data format, but I’ll talk about that topic in Chapter 8.

Example 2-3 shows the DTD for the sample purchase order.

Example 2-3. The DTD for Angus Hardware purchase orders
<?xml version="1.0" encoding="UTF-8"?>

<!ELEMENT po (date,address+,items)>
<!ATTLIST po id ID #REQUIRED>

<!ELEMENT date EMPTY>
<!ATTLIST date year CDATA #REQUIRED
               month (1|2|3|4|5|6|7|8|9|10|11|12) #REQUIRED
               day (1|2|3|4|5|6|7|8|9|10|11|
                    12|13|14|15|16|17|18|19|
                    20|21|22|23|24|25|26|27|
                    28|29|30|31) #REQUIRED>

<!ELEMENT address (name,company?,street+,city,state,zip)>
<!ATTLIST address type (billing|shipping) #REQUIRED>

<!ELEMENT name    (#PCDATA)>

<!ELEMENT company (#PCDATA)>

<!ELEMENT street  (#PCDATA)>

<!ELEMENT city    (#PCDATA)>

<!ELEMENT state   (#PCDATA)>

<!ELEMENT zip     (#PCDATA)>

<!ELEMENT items (item)+>

<!ELEMENT item EMPTY>
<!ATTLIST item quantity CDATA #REQUIRED
               productCode CDATA #REQUIRED
               description CDATA #REQUIRED
               unitCost CDATA #REQUIRED>

Tip

For more information on DTDs, see Erik Ray’s Learning XML, 2nd Edition (O’Reilly) or Elliotte Rusty Harold and W. Scott Means’s XML in a Nutshell, 2nd Edition (O’Reilly).

To validate the XML with this DTD, you must make one small change to the XML document and one to the code that reads it. In the XML document, add the following document type declaration after the XML declaration (<?xml version="1.0"?>) so that the validator knows which DTD to validate against:

<!DOCTYPE po SYSTEM "po.dtd">

Tip

Remember that even if you insert the <!DOCTYPE> declaration in your target XML file, you must still explicitly use XmlValidatingReader to validate the XML. XmlTextReader does not validate XML; only XmlValidatingReader can do that.

In the code that processes the XML, you must also create a new XmlValidatingReader to wrap the original XmlTextReader:

XmlReader textReader = new XmlTextReader(url);
XmlValidatingReader reader = new XmlValidatingReader(textReader);

By default, XmlValidatingReader automatically detects the document’s validation type, although you can also set the validation type manually using XmlValidatingReader’s ValidationType property:

reader.ValidationType = ValidationType.DTD;

Unfortunately, if you take this approach, you’ll find that errors are not handled gracefully. For example, if you add an address of type="mailing" to the XML document and attempt to validate it, the following exception is thrown:

Unhandled Exception: System.Xml.Schema.XmlSchemaException: The 'type' 
attribute has an invalid value according to its data type. An error occurred at 
file:///C:/Chapter 2/po1456.xml(16, 12).
   at System.Xml.XmlValidatingReader.InternalValidationCallback(Object sender, 
ValidationEventArgs e)
   at System.Xml.Schema.Validator.SendValidationEvent(XmlSchemaException e, 
XmlSeverityType severity)
   at System.Xml.Schema.Validator.ProcessElement( )
   at System.Xml.Schema.Validator.Validate( )
   at System.Xml.Schema.Validator.Validate(ValidationType valType)
   at System.Xml.XmlValidatingReader.ReadWithCollectTextToken( )
   at System.Xml.XmlValidatingReader.Read( )
   at PoToPickListValidated.Main(String[ ] args)

Obviously, you’d like to handle errors more cleanly than this. You have two options: you can wrap the entire parsing loop in a try...catch block, or you can register a handler with the XmlValidatingReader object’s ValidationEventHandler event. Since I assume that you already know how to write a try...catch block, let’s explore a solution that uses a ValidationEventHandler.

ValidationEventHandler is a type found in the System.Xml.Schema namespace, so you’ll need to first add this line to the top of your code:

using System.Xml.Schema;

Next, add the following line after you instantiate the XmlValidatingReader and set the ValidationType to ValidationType.DTD:

reader.ValidationEventHandler += new ValidationEventHandler(HandleValidationError);

This step registers the callback for validation errors.

Now you’re ready to write the validation event handler itself. The .NET Framework defines the ValidationEventHandler delegate as follows:

public delegate void ValidationEventHandler(
  object sender, ValidationEventArgs e
);

Your validation event handler must match that signature. For now, you can just write the error message to the console:

private static void HandleValidationError(
  object sender, ValidationEventArgs e) {
  Console.WriteLine(e.Message);
}

Now, if you run the purchase order conversion program using the invalid XML file I talked about earlier, the following slightly more informative message will print to the console:

'mailing' is not in the enumeration list. An error occurred at file:///C:/Chapter 2/po1456.xml(16, 12).

Tip

By default, the first validation error throws an exception and processing halts. Once you register a ValidationEventHandler, however, XmlValidatingReader reports each validation error to your handler and keeps reading, so if a file contains several validation errors, each one is reported individually.
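XmlValidatingReader has been marked obsolete since .NET Framework 2.0, where XmlReaderSettings took over its role. As a hedged sketch of the same collect-every-error behavior, the code below swaps in that newer API: XmlReader.Create( ) with ValidationType.DTD, plus DtdProcessing.Parse, which requires .NET Framework 4 or later. This is not the chapter’s XmlValidatingReader code. The cut-down internal DTD subset (keeping only address, name, and the type enumeration from Example 2-3) and the class name DtdValidationSketch are assumptions for illustration.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Xml;
using System.Xml.Schema;

public static class DtdValidationSketch {
  // Assumed, trimmed-down version of Example 2-3 as an internal subset.
  const string Doctype =
    "<!DOCTYPE po [" +
    "<!ELEMENT po (address+)>" +
    "<!ELEMENT address (name)>" +
    "<!ATTLIST address type (billing|shipping) #REQUIRED>" +
    "<!ELEMENT name (#PCDATA)>" +
    "]>";

  // Returns every validation message; because a handler is registered,
  // reading continues after each error instead of throwing.
  public static List<string> Validate(string poBody) {
    List<string> errors = new List<string>();
    XmlReaderSettings settings = new XmlReaderSettings();
    settings.DtdProcessing = DtdProcessing.Parse;   // DTDs are prohibited by default
    settings.ValidationType = ValidationType.DTD;
    settings.ValidationEventHandler += delegate (object sender, ValidationEventArgs e) {
      errors.Add(e.Message);
    };
    using (XmlReader reader = XmlReader.Create(new StringReader(Doctype + poBody), settings)) {
      while (reader.Read()) { }
    }
    return errors;
  }
}
```

A body with two addresses whose type values are mailing and home produces two messages, one per invalid attribute, and the reader still reaches the end of the document.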

I’m sure you can think of useful things to do in a validation event handler. Here are a couple of ideas:

  • If processing is being done interactively, present the user with the relevant lines of XML, so she can see the erroneous data.

  • If processing is being done by an automated process, alert a system administrator by email or pager.

The entire revised program is shown in Example 2-4.

Example 2-4. Complete program for converting an Angus Hardware XML purchase order to a pick list
using System;
using System.IO;
using System.Text;
using System.Xml;
using System.Xml.Schema;

public class PoToPickListValidated {

  public static void Main(string[ ] args) {

    string url = args[0];

    XmlReader textReader = new XmlTextReader(url);
    XmlValidatingReader reader = new XmlValidatingReader(textReader);
    reader.ValidationType = ValidationType.DTD;
    reader.ValidationEventHandler += new ValidationEventHandler(HandleValidationError);

    StringBuilder pickList = new StringBuilder( );
    pickList.Append("Angus Hardware PickList\n");
    pickList.Append("=======================\n\n");

    while (reader.Read( )) {
      if (reader.NodeType == XmlNodeType.Element) {
        switch (reader.LocalName) {
          case "po":
            pickList.Append(POElementToString(reader));
            break;
          case "date":
            pickList.Append(DateElementToString(reader));
            break;
          case "address":
            reader.MoveToAttribute("type");
            if (reader.Value == "shipping") {
              pickList.Append(AddressElementToString(reader));
            } else {
              reader.Skip( );
            }
            break;
          case "items":
            pickList.Append(ItemsElementToString(reader));
            break;
        }
      }
    }

    Console.WriteLine(pickList);
  }

  private static string POElementToString(XmlReader reader) {

    string id = reader.GetAttribute("id");

    StringBuilder poBlock = new StringBuilder( );
    poBlock.Append("PO Number: ").Append(id).Append("\n\n");
    return poBlock.ToString( );
  }

  private static string DateElementToString(XmlReader reader) {

    int year = XmlConvert.ToInt32(reader.GetAttribute("year"));
    int month = XmlConvert.ToInt32(reader.GetAttribute("month"));
    int day = XmlConvert.ToInt32(reader.GetAttribute("day"));
    DateTime date = new DateTime(year,month,day);

    StringBuilder dateBlock = new StringBuilder( );
    dateBlock.Append("Date: ").Append(date.ToString("D")).Append("\n\n");
    return dateBlock.ToString( );
  }

  private static string AddressElementToString(XmlReader reader) {

    StringBuilder addressBlock = new StringBuilder( );
    addressBlock.Append("Shipping Address:\n");

    while (reader.Read( ) && (reader.NodeType == XmlNodeType.Element ||
        reader.NodeType == XmlNodeType.Whitespace)) {
      switch (reader.LocalName) {
        case "name":
        case "company":
        case "street":
        case "zip":
          addressBlock.Append(reader.ReadString( ));
          addressBlock.Append("\n");
          break;
        case "city":
          addressBlock.Append(reader.ReadString( ));
          addressBlock.Append(", ");
          break;
        case "state":
          addressBlock.Append(reader.ReadString( ));
          addressBlock.Append(" ");
          break;
      }
    }

    addressBlock.Append("\n");
    return addressBlock.ToString( );
  }

  private static string ItemsElementToString(XmlReader reader) {

    StringBuilder itemsBlock = new StringBuilder( );
    itemsBlock.Append("Quantity Product Code Description\n");
    itemsBlock.Append("======== ============ ===========\n");

    while (reader.Read( ) && (reader.NodeType == XmlNodeType.Element ||
        reader.NodeType == XmlNodeType.Whitespace)) {
      switch (reader.LocalName) {
        case "item":
          object [ ] parms = new object [3];
          parms [0] = XmlConvert.ToInt32(reader.GetAttribute("quantity"));
          parms [1] = reader.GetAttribute("productCode");
          parms [2] = reader.GetAttribute("description");
          itemsBlock.AppendFormat(" {0,6}  {1,11}  {2}\n", parms);
          break;
      }
    }

    return itemsBlock.ToString( );
  }

  private static void HandleValidationError(object sender,ValidationEventArgs e) {
    Console.WriteLine(e.Message);
  }
}

XmlNodeReader

The XmlNodeReader type is used to read an existing XmlNode from memory. For example, suppose you have an entire XML document in memory, in an XmlDocument, and you wish to deal with one of its nodes in a specialized manner. The XmlNodeReader constructor takes as its argument an XmlNode from anywhere in an XML document or document fragment, and the reader then operates on that node and its descendants.

For example, you might wish to construct an Angus Hardware XML purchase order in memory rather than reading it from disk. One reason you might choose to construct a PO in memory is if order entry is being done by an outside party in a non-XML format, and some other section of your program is taking care of converting the data into XML. The actual construction of an XmlDocument is covered in Chapter 5, but for now let’s assume that you’ve been given a complete XmlDocument that constitutes a valid PO.

To print the pick list, you need only make one small change to Example 2-2: replace the XmlTextReader constructor with an XmlNodeReader constructor, passing in an XmlNode as its argument:

XmlReader reader = new XmlNodeReader(node);

The rest of the program continues as before, printing the pick list to the console. The only difference is the source of the input; in this case, it comes directly from the XmlNode. Validation is another matter: the XmlValidatingReader constructor that wraps another reader accepts only an XmlTextReader, so you can’t simply wrap the XmlNodeReader the way Example 2-4 wraps its XmlTextReader.
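To see that an XmlNodeReader reads only the node it is given and that node’s descendants, here is a minimal sketch. It assumes the document is loaded from a string with XmlDocument.LoadXml( ), since building an XmlDocument in code is left to Chapter 5. NodeReaderSketch is a hypothetical name.

```csharp
using System.Collections.Generic;
using System.Xml;

public static class NodeReaderSketch {
  // Lists the element names in the subtree rooted at the first <items> element.
  public static List<string> ElementNamesUnderItems(string poXml) {
    XmlDocument doc = new XmlDocument();
    doc.LoadXml(poXml);
    XmlNode items = doc.GetElementsByTagName("items")[0];
    List<string> names = new List<string>();
    XmlReader reader = new XmlNodeReader(items); // starts at <items>, ends after </items>
    while (reader.Read()) {
      if (reader.NodeType == XmlNodeType.Element) names.Add(reader.LocalName);
    }
    reader.Close();
    return names;
  }
}
```

Given a purchase order with a date element and two item elements, the reader reports items, item, and item. It never sees po or date.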

To recap the different XmlReader subclasses: XmlTextReader is used to read an XML document from some sort of file, whether it’s on a local disk or on a web server; XmlNodeReader is used to read an XML fragment from an XmlDocument that’s already been loaded some other way; XmlValidatingReader is used to validate an XML document that’s being read using an XmlTextReader. The subclasses of XmlReader are mostly interchangeable, with a few exceptions discussed later.
