The .NET Framework provides three implementations of XmlReader: XmlTextReader, XmlValidatingReader, and XmlNodeReader. In this section, I’ll present each class one at a time and show you how to use it.
XmlTextReader is the most immediately useful specialization of XmlReader. XmlTextReader is used to read XML from a Stream, URL, string, or TextReader. You can use it to read XML from a text file on disk, from a web site, or from a string in memory that has been built or loaded elsewhere in your program. XmlTextReader does not validate the XML it reads; however, it does expand the general entities &lt;, &gt;, and &amp; into their text representations (<, >, and &, respectively), and it does check the XML for well-formedness.
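As a minimal sketch of these two behaviors, the following program reads XML from in-memory strings through a StringReader. The class name EntityDemo, its two helper methods, and the sample strings are hypothetical, made up for illustration; only XmlTextReader, MoveToContent(), ReadString(), and XmlException come from the framework.

```csharp
using System;
using System.IO;
using System.Xml;

public class EntityDemo
{
    // Returns the text content of the root element, with entities expanded.
    public static string ReadRootText(string xml)
    {
        XmlTextReader reader = new XmlTextReader(new StringReader(xml));
        try
        {
            reader.MoveToContent();      // position on the root element
            return reader.ReadString();  // concatenated text content
        }
        finally
        {
            reader.Close();
        }
    }

    // Returns true if the reader can walk the whole document without error.
    public static bool IsWellFormed(string xml)
    {
        XmlTextReader reader = new XmlTextReader(new StringReader(xml));
        try
        {
            while (reader.Read()) { }
            return true;
        }
        catch (XmlException)
        {
            // Thrown when the markup is not well-formed
            return false;
        }
        finally
        {
            reader.Close();
        }
    }

    public static void Main()
    {
        // Prints: 1 < 2 & 3 > 2
        Console.WriteLine(ReadRootText("<note>1 &lt; 2 &amp; 3 &gt; 2</note>"));
        // Prints: False (the <b> element is never closed)
        Console.WriteLine(IsWellFormed("<a><b></a>"));
    }
}
```

Note that the well-formedness check happens as the document is read, so a malformed document is only reported when the reader reaches the offending markup.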
In addition to these general capabilities, XmlTextReader can resolve system- and user-defined entities, and can be optimized somewhat by providing it with an XmlNameTable. Although XmlNameTable is an abstract class, you can instantiate a new NameTable, or access an XmlReader’s XmlNameTable through its NameTable property.
An XmlNameTable contains a collection of string objects that are used to represent the elements and attributes of an XML document. XmlReader can use this table to more efficiently handle elements and attributes that recur in a document. An XmlNameTable object is created at runtime by the .NET parser every time it reads an XML document. If you are parsing many documents with the same format, using the same XmlNameTable in each of them can result in some efficiency gains; I’ll show you how to do this later in this chapter.
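In the meantime, here is a rough sketch of the idea, not the chapter’s own example. It assumes two small in-memory documents; the class name SharedNameTableDemo and the CountItems helper are hypothetical. One NameTable is passed to every XmlTextReader, and because the reader returns names as the atomized instances stored in that table, element names can be compared by reference.

```csharp
using System;
using System.IO;
using System.Xml;

public class SharedNameTableDemo
{
    // Counts <item> elements, comparing names by reference instead of by value.
    public static int CountItems(string xml, XmlNameTable names)
    {
        // Add() returns the single atomized instance stored in the table.
        object itemName = names.Add("item");
        XmlTextReader reader = new XmlTextReader(new StringReader(xml), names);
        int count = 0;
        try
        {
            while (reader.Read())
            {
                if (reader.NodeType == XmlNodeType.Element &&
                    Object.ReferenceEquals(reader.LocalName, itemName))
                {
                    count++;
                }
            }
        }
        finally
        {
            reader.Close();
        }
        return count;
    }

    public static void Main()
    {
        // One table, reused for every document of the same format
        NameTable names = new NameTable();
        string[] docs = {
            "<items><item/><item/></items>",
            "<items><item/></items>"
        };
        foreach (string doc in docs)
        {
            Console.WriteLine(CountItems(doc, names));
        }
    }
}
```

Whether the saving is noticeable depends on how many documents you parse and how often names repeat, so treat this as an optimization to measure rather than assume.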
Like many businesses, Angus Hardware (the hardware store I introduced in the preface) issues and processes purchase orders (POs) to help manage its finances and inventory. Being technically savvy, the company IT crew has created an XML format for Angus Hardware POs. Example 2-1 lists the XML for po1456.xml, a typical purchase order. I’ll use this document in the rest of the examples in this chapter, and some of the later examples in the book.
<?xml version="1.0"?>
<po id="PO1456">
  <date year="2002" month="6" day="14" />
  <address type="shipping">
    <name>Frits Mendels</name>
    <street>152 Cherry St</street>
    <city>San Francisco</city>
    <state>CA</state>
    <zip>94045</zip>
  </address>
  <address type="billing">
    <name>Frits Mendels</name>
    <street>PO Box 6789</street>
    <city>San Francisco</city>
    <state>CA</state>
    <zip>94123-6798</zip>
  </address>
  <items>
    <item quantity="1" productCode="R-273"
      description="14.4 Volt Cordless Drill" unitCost="189.95" />
    <item quantity="1" productCode="1632S"
      description="12 Piece Drill Bit Set" unitCost="14.95" />
  </items>
</po>
Example 2-1 and all the other code examples in this book are available at the book’s web site, http://www.oreilly.com/catalog/netxml/.
Angus Hardware’s fulfillment department, the group responsible for pulling products off of shelves in the warehouse, has not yet upgraded, unfortunately, to the latest laser printers and hand-held bar-code scanners. The warehouse workers prefer to receive their pick lists as plain text on paper. Since the order entry department produces its POs in XML, the IT guys propose to transform their existing POs into the pick list format preferred by the order pickers.
Here’s the pick list that the fulfillment department prefers:
Angus Hardware PickList
=======================

PO Number: PO1456

Date: Friday, June 14, 2002

Shipping Address:
Frits Mendels
152 Cherry St
San Francisco, CA 94045

Quantity Product Code Description
======== ============ ===========
       1        R-273 14.4 Volt Cordless Drill
       1        1632S 12 Piece Drill Bit Set
You’ll note that while the pick list layout is fairly simple, it does require some formatting; Quantity and Product Code values need to be right-aligned, for example. This is a good job for an XmlReader, because you really don’t need to manipulate the XML, but just read it in and transform it into the desired text layout. (You could do this with an XSLT transform, but that solution comes later, in Chapter 7.)
Example 2-2 shows the Main( ) method of a program that reads the XML purchase order listed in Example 2-1 and transforms it into a pick list.
using System;
using System.IO;
using System.Text;  // needed for StringBuilder
using System.Xml;

public class PoToPickList {

  public static void Main(string[ ] args) {

    string url = args[0];

    XmlReader reader = new XmlTextReader(url);

    StringBuilder pickList = new StringBuilder( );
    pickList.Append("Angus Hardware PickList").Append(Environment.NewLine);
    pickList.Append("=======================").Append(Environment.NewLine)
      .Append(Environment.NewLine);

    while (reader.Read( )) {
      if (reader.NodeType == XmlNodeType.Element) {
        switch (reader.LocalName) {
          case "po":
            pickList.Append(POElementToString(reader));
            break;
          case "date":
            pickList.Append(DateElementToString(reader));
            break;
          case "address":
            reader.MoveToAttribute("type");
            if (reader.Value == "shipping") {
              pickList.Append(AddressElementToString(reader));
            } else {
              reader.Skip( );
            }
            break;
          case "items":
            pickList.Append(ItemsElementToString(reader));
            break;
        }
      }
    }

    Console.WriteLine(pickList);
  }
}
Let’s look at the Main( ) method in Example 2-2 in small chunks, and then we’ll dive into the rest of the program.
XmlReader reader = new XmlTextReader(url);
This line instantiates a new XmlTextReader object, passing in a URL, and assigns the object reference to an XmlReader variable. If the URL uses the http or https scheme, the XmlTextReader will take care of creating a network connection to the web site. If the URL uses the file scheme, or has no scheme at all, the XmlTextReader will read the file from disk. Because the XmlTextReader uses the System.IO classes we discussed earlier, it does not currently recognize any other URL schemes, such as ftp or gopher.
StringBuilder pickList = new StringBuilder( );
pickList.Append("Angus Hardware PickList").Append(Environment.NewLine);
pickList.Append("=======================").Append(Environment.NewLine)
  .Append(Environment.NewLine);
These lines instantiate a StringBuilder object that will be used to build a string containing the text representation of the pick list. We initialize the StringBuilder with a simple page header.
The StringBuilder class provides an efficient way to build strings. You could just concatenate several string instances together using the + operator, but each concatenation creates a new string object, and that overhead adds up. Using the StringBuilder is a good way to avoid that overhead. To learn more about the StringBuilder, see Learning C# by Jesse Liberty (O’Reilly).
while (reader.Read( )) {
  if (reader.NodeType == XmlNodeType.Element) {
This event loop is the heart of the code. Each time Read( ) is called, the XML parser moves to the next node in the XML file. Read( ) returns true if the read was successful, and false if it was not, such as at the end of the file. The expression within the if statement ensures that you don’t try to evaluate an EndElement node as if it were an Element node; that would result in two calls to each method, one as the parser reads an Element and one as it reads an EndElement. XmlReader.NodeType returns an XmlNodeType.
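To see which node types the loop actually encounters, here is a small sketch that records every node the reader visits. The class name NodeTypeDemo, the ListNodes helper, and the sample document are hypothetical; the document is written without whitespace between elements so that no Whitespace nodes appear.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;

public class NodeTypeDemo
{
    // Returns one "NodeType:Name" entry for each node the reader visits.
    public static List<string> ListNodes(string xml)
    {
        List<string> nodes = new List<string>();
        XmlTextReader reader = new XmlTextReader(new StringReader(xml));
        try
        {
            while (reader.Read())
            {
                nodes.Add(reader.NodeType + ":" + reader.Name);
            }
        }
        finally
        {
            reader.Close();
        }
        return nodes;
    }

    public static void Main()
    {
        // Prints:
        //   Element:po
        //   Element:date
        //   Element:name
        //   Text:
        //   EndElement:name
        //   EndElement:po
        foreach (string node in ListNodes("<po><date/><name>Frits</name></po>"))
        {
            Console.WriteLine(node);
        }
    }
}
```

The name element is reported twice, once as an Element and once as an EndElement, which is why the NodeType test matters. The empty date element produces only an Element node.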
Now that you have read a node, you need to determine its name:
switch (reader.LocalName) {
The LocalName property contains the name of the current node with its namespace prefix removed. A Name property that contains the name as well as its namespace prefix, if it has one, is also available. The namespace prefix itself can be retrieved with the XmlReader type’s Prefix property:
  case "po":
    pickList.Append(POElementToString(reader));
    break;
  case "date":
    pickList.Append(DateElementToString(reader));
    break;
  case "address":
    reader.MoveToAttribute("type");
    if (reader.Value == "shipping") {
      pickList.Append(AddressElementToString(reader));
    } else {
      reader.Skip( );
    }
    break;
  case "items":
    pickList.Append(ItemsElementToString(reader));
    break;
For each element name, the program calls a specific method to parse its subnodes; this demonstrates the concept of recursive descent parsing, which I discussed earlier.
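The purchase order in this chapter does not use namespaces, so LocalName and Name always agree there. To make the difference between the naming properties concrete, the following sketch reads a namespace-qualified root element; the class name NameDemo, the ah prefix, and the urn:example:angus namespace URI are assumptions invented for this illustration.

```csharp
using System;
using System.IO;
using System.Xml;

public class NameDemo
{
    // Describes the naming properties of the document's root element.
    public static string Describe(string xml)
    {
        XmlTextReader reader = new XmlTextReader(new StringReader(xml));
        try
        {
            reader.MoveToContent();  // position on the root element
            return String.Format(
                "Name={0} LocalName={1} Prefix={2} NamespaceURI={3}",
                reader.Name, reader.LocalName, reader.Prefix,
                reader.NamespaceURI);
        }
        finally
        {
            reader.Close();
        }
    }

    public static void Main()
    {
        // Prints: Name=ah:po LocalName=po Prefix=ah NamespaceURI=urn:example:angus
        Console.WriteLine(Describe("<ah:po xmlns:ah='urn:example:angus'/>"));
        // Prints: Name=po LocalName=po Prefix= NamespaceURI=
        Console.WriteLine(Describe("<po/>"));
    }
}
```

Switching on LocalName, as the chapter’s program does, keeps the code working if a prefix is later added to the document, although it also means elements from different namespaces with the same local name are treated alike.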
One element of the XML tree, address, is of particular interest. The fulfillment department doesn’t care who’s paying for the order, only to whom the order is to be shipped. Since the Angus Hardware order pickers are only interested in shipping addresses, the program checks the value of the type attribute before calling AddressElementToString( ). If the address is not a shipping address, the program calls Skip( ) to move the parser to the next sibling of the current node.
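One detail of Skip( ) is worth a sketch: it leaves the reader positioned on the node that follows the skipped element, so calling Read( ) immediately afterward moves past that node. When whitespace separates the elements, the node that gets passed over is usually just a Whitespace node, but in a document with no whitespace it would be the next sibling element. The following variation, which is not the chapter’s program, re-examines the current node after Skip( ) instead. The class name SkipDemo and the Visit helper are hypothetical.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;

public class SkipDemo
{
    // Returns the names of the elements visited when every address other
    // than the shipping address is skipped.
    public static string Visit(string xml)
    {
        StringBuilder visited = new StringBuilder();
        XmlTextReader reader = new XmlTextReader(new StringReader(xml));
        try
        {
            bool more = reader.Read();
            while (more)
            {
                if (reader.NodeType == XmlNodeType.Element)
                {
                    if (reader.LocalName == "address" &&
                        reader.GetAttribute("type") != "shipping")
                    {
                        // Skip() lands on the node after </address>, so
                        // examine that node before calling Read() again.
                        reader.Skip();
                        continue;
                    }
                    visited.Append(reader.LocalName).Append(' ');
                }
                more = reader.Read();
            }
        }
        finally
        {
            reader.Close();
        }
        return visited.ToString().Trim();
    }

    public static void Main()
    {
        string xml =
            "<po><address type='billing'><name>B</name></address>" +
            "<address type='shipping'><name>S</name></address><items/></po>";
        // Prints: po address name items
        Console.WriteLine(Visit(xml));
    }
}
```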
To read in the po element, the program calls the POElementToString( ) method. Here’s the definition of that method:
private static string POElementToString(XmlReader reader) {

  string id = reader.GetAttribute("id");

  StringBuilder poBlock = new StringBuilder( );
  poBlock.Append("PO Number: ").Append(id).Append(Environment.NewLine)
    .Append(Environment.NewLine);
  return poBlock.ToString( );
}
The first thing this method does is to get the id attribute. The GetAttribute( ) method returns the value of the named attribute from the current node, if the current node is an element that has such an attribute; otherwise, it returns null. It does not move the current position of the parser to the next node.
After it gets the id, POElementToString( ) can then return a properly formatted line for the pick list.
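A short sketch can confirm both points about GetAttribute( ): the reader stays on the element, and a missing attribute yields null. The class name GetAttributeDemo and the attribute name nosuch are hypothetical.

```csharp
using System;
using System.IO;
using System.Xml;

public class GetAttributeDemo
{
    // Reads one present and one absent attribute from the root element,
    // then reports where the reader is positioned.
    public static string Describe(string xml)
    {
        XmlTextReader reader = new XmlTextReader(new StringReader(xml));
        try
        {
            reader.MoveToContent();
            string id = reader.GetAttribute("id");
            string missing = reader.GetAttribute("nosuch");
            return String.Format("id={0} missing={1} node={2}:{3}",
                id,
                missing == null ? "(null)" : missing,
                reader.NodeType,
                reader.Name);
        }
        finally
        {
            reader.Close();
        }
    }

    public static void Main()
    {
        // Prints: id=PO1456 missing=(null) node=Element:po
        Console.WriteLine(Describe("<po id='PO1456'/>"));
    }
}
```

Contrast this with MoveToAttribute( ), used in Main( ) for the address element, which does change the reader’s position to the attribute node.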
Next, the code looks for any date elements and calls DateElementToString( ):
private static string DateElementToString(XmlReader reader) {

  int year = Int32.Parse(reader.GetAttribute("year"));
  int month = Int32.Parse(reader.GetAttribute("month"));
  int day = Int32.Parse(reader.GetAttribute("day"));
  DateTime date = new DateTime(year,month,day);

  StringBuilder dateBlock = new StringBuilder( );
  dateBlock.Append("Date: ").Append(date.ToString("D"))
    .Append(Environment.NewLine).Append(Environment.NewLine);
  return dateBlock.ToString( );
}
This method uses Int32.Parse( ) to convert strings as read from the date element’s attributes into int variables suitable for passing to the DateTime constructor. Next, you can format the date as required. Finally, the method returns the properly formatted date line for the pick list. The next method, AddressElementToString( ), handles the shipping address:
private static string AddressElementToString(XmlReader reader) {

  StringBuilder addressBlock = new StringBuilder( );
  addressBlock.Append("Shipping Address:\n");

  while (reader.Read( ) && (reader.NodeType == XmlNodeType.Element ||
    reader.NodeType == XmlNodeType.Whitespace)) {
    switch (reader.LocalName) {
      case "name":
      case "company":
      case "street":
      case "zip":
        addressBlock.Append(reader.ReadString( ));
        addressBlock.Append(Environment.NewLine);
        break;
      case "city":
        addressBlock.Append(reader.ReadString( ));
        addressBlock.Append(", ");
        break;
      case "state":
        addressBlock.Append(reader.ReadString( ));
        addressBlock.Append(" ");
        break;
    }
  }

  addressBlock.Append("\n");
  return addressBlock.ToString( );
}
Much like the Main( ) method of the program, AddressElementToString( ) reads from the XML file using a while loop. However, because you know the method starts at the address element, the only nodes it needs to traverse are the subnodes of address. In the cases of name, company, street, and zip, AddressElementToString( ) reads the content of each element and appends a newline character. The program must deal with the city and state elements slightly differently, however. Ordinarily, a city is followed by a comma, a state name, a space, and a zip code, so the method appends a comma and a space after the city and a single space after the state. Finally, the method returns the properly formatted address block.
Now we come to the most complex method, ItemsElementToString( ). Its complexity lies not in its reading of the XML, but in its formatting of the output:
private static string ItemsElementToString(XmlReader reader) {

  StringBuilder itemsBlock = new StringBuilder( );
  itemsBlock.Append("Quantity Product Code Description\n");
  itemsBlock.Append("======== ============ ===========\n");

  while (reader.Read( ) && (reader.NodeType == XmlNodeType.Element ||
    reader.NodeType == XmlNodeType.Whitespace)) {
    switch (reader.LocalName) {
      case "item":
        int quantity = Int32.Parse(reader.GetAttribute("quantity"));
        string productCode = reader.GetAttribute("productCode");
        string description = reader.GetAttribute("description");
        itemsBlock.AppendFormat("  {0,6}  {1,11} {2}",
          quantity,productCode,description).Append(Environment.NewLine);
        break;
    }
  }

  return itemsBlock.ToString( );
}
The ItemsElementToString( ) method makes use of the AppendFormat( ) method of the StringBuilder object. This is not the proper place for a full discussion of .NET’s string-formatting capabilities, but suffice it to say that each placeholder in the format string is replaced with the corresponding element of the parameter array, and padded to the specified number of characters. For additional information on formatting strings in C#, see Appendix B of C# in a Nutshell, by Peter Drayton, Ben Albahari, and Ted Neward (O’Reilly).
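As a brief illustration of the alignment component, the sketch below wraps each field in square brackets so the padding is visible. The class name AlignmentDemo and its two helpers are hypothetical, and the brackets are not part of the pick list format. A positive width such as {0,6} right-aligns the value; a negative width such as {0,-6} left-aligns it.

```csharp
using System;
using System.Text;

public class AlignmentDemo
{
    // Right-aligns quantity in 6 characters and productCode in 11.
    public static string FormatItem(int quantity, string productCode,
                                    string description)
    {
        StringBuilder line = new StringBuilder();
        line.AppendFormat("[{0,6}] [{1,11}] [{2}]",
            quantity, productCode, description);
        return line.ToString();
    }

    // A negative width left-aligns the value instead.
    public static string LeftAligned(int quantity)
    {
        return String.Format("[{0,-6}]", quantity);
    }

    public static void Main()
    {
        // Prints: [     1] [      R-273] [14.4 Volt Cordless Drill]
        Console.WriteLine(FormatItem(1, "R-273", "14.4 Volt Cordless Drill"));
        // Prints: [1     ]
        Console.WriteLine(LeftAligned(1));
    }
}
```

A value longer than the requested width is not truncated; the width is only a minimum.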
This program makes some assumptions about the incoming XML. For example, it assumes that in order for the output to be produced correctly, the elements must appear in a very specific order. It also assumes that certain elements will always occur, and that others are optional. The XmlTextReader cannot always handle exceptions to these assumptions, but the XmlValidatingReader can. To ensure that an unusable pick list is not produced, you should always validate the XML before doing any processing.
XmlValidatingReader is a specialized implementation of XmlReader that performs validation on XML as it reads the incoming stream. The validation may be done by explicitly providing a Document Type Definition (DTD), an XML Schema, or an XML-Data Reduced (XDR) Schema; alternatively, the type of validation may be automatically determined from the document itself. XmlValidatingReader may read data from a Stream, a string, or another XmlReader. This allows you, for example, to validate XML that is being read by an XmlTextReader, which does not perform validation itself. Validation errors are raised either through an event handler, if one is registered, or by throwing an exception.
The following examples will show you how to validate the Angus Hardware purchase order using a DTD. Validating XML with an XML Schema instead of a DTD will give you even more control over the data format, but I’ll talk about that topic in Chapter 8.
Example 2-3 shows the DTD for the sample purchase order.
<?xml version="1.0" encoding="UTF-8"?>
<!ELEMENT po (date,address+,items)>
<!ATTLIST po id ID #REQUIRED>

<!ELEMENT date EMPTY>
<!ATTLIST date year CDATA #REQUIRED
               month (1|2|3|4|5|6|7|8|9|10|11|12) #REQUIRED
               day (1|2|3|4|5|6|7|8|9|10|11|
                    12|13|14|15|16|17|18|19|
                    20|21|22|23|24|25|26|27|
                    28|29|30|31) #REQUIRED>

<!ELEMENT address (name,company?,street+,city,state,zip)>
<!ATTLIST address type (billing|shipping) #REQUIRED>

<!ELEMENT name (#PCDATA)>
<!ELEMENT company (#PCDATA)>
<!ELEMENT street (#PCDATA)>
<!ELEMENT city (#PCDATA)>
<!ELEMENT state (#PCDATA)>
<!ELEMENT zip (#PCDATA)>

<!ELEMENT items (item)+>
<!ELEMENT item EMPTY>
<!ATTLIST item quantity CDATA #REQUIRED
               productCode CDATA #REQUIRED
               description CDATA #REQUIRED
               unitCost CDATA #REQUIRED>
For more information on DTDs, see Erik Ray’s Learning XML, 2nd Edition (O’Reilly) or Elliotte Rusty Harold and W. Scott Means’s XML in a Nutshell, 2nd Edition (O’Reilly).
To validate the XML with this DTD, you must make one small change to the XML document, and one to the code that reads it. To the XML you must add the following document type declaration after the XML declaration (<?xml version="1.0"?>) so that the validator knows what DTD to validate against:
<!DOCTYPE po SYSTEM "po.dtd">
Remember that even if you insert the <!DOCTYPE> declaration in your target XML file, you must still explicitly use XmlValidatingReader to validate the XML. XmlTextReader does not validate XML; only XmlValidatingReader can do that.
In the code that processes the XML, you must also create a new XmlValidatingReader to wrap the original XmlTextReader:
XmlReader textReader = new XmlTextReader(url);
XmlValidatingReader reader = new XmlValidatingReader(textReader);
By default, XmlValidatingReader automatically detects the document’s validation type, although you can also set the validation type manually using XmlValidatingReader’s ValidationType property:
reader.ValidationType = ValidationType.DTD;
Unfortunately, if you take this approach, you’ll find that errors are not handled gracefully. For example, if you add an address of type="mailing" to the XML document and attempt to validate it, the following exception is thrown:
Unhandled Exception: System.Xml.Schema.XmlSchemaException: The 'type' attribute has an invalid value according to its data type. An error occurred at file:///C:/Chapter 2/po1456.xml(16, 12).
   at System.Xml.XmlValidatingReader.InternalValidationCallback(Object sender, ValidationEventArgs e)
   at System.Xml.Schema.Validator.SendValidationEvent(XmlSchemaException e, XmlSeverityType severity)
   at System.Xml.Schema.Validator.ProcessElement( )
   at System.Xml.Schema.Validator.Validate( )
   at System.Xml.Schema.Validator.Validate(ValidationType valType)
   at System.Xml.XmlValidatingReader.ReadWithCollectTextToken( )
   at System.Xml.XmlValidatingReader.Read( )
   at PoToPickListValidated.Main(String[ ] args)
Obviously, you’d like to handle exceptions more cleanly than this. You have two options: you can wrap the entire parsing loop in a try...catch block, or you can set the XmlValidatingReader object’s ValidationEventHandler delegate. Since I assume that you already know how to write a try...catch block, let’s explore a solution that uses a ValidationEventHandler.
ValidationEventHandler is a type found in the System.Xml.Schema namespace, so you’ll need to first add this line to the top of your code:
using System.Xml.Schema;
Next, add the following line after you instantiate the XmlValidatingReader and set the ValidationType to ValidationType.DTD:
reader.ValidationEventHandler +=
  new ValidationEventHandler(HandleValidationError);
This step registers the callback for validation errors.
Now, you’re ready to actually create a ValidationEventHandler. The signature of the delegate as defined by the .NET Framework is:
public delegate void ValidationEventHandler(
  object sender,
  ValidationEventArgs e
);
Your validation event handler must match that signature. For now, you can just write the error message to the console:
private static void HandleValidationError(
  object sender, ValidationEventArgs e) {
  Console.WriteLine(e.Message);
}
Now, if you run the purchase order conversion program using the invalid XML file I talked about earlier, the following slightly more informative message will print to the console:
'mailing' is not in the enumeration list. An error occurred at file:///C:/Chapter 2/po1456.xml(16, 12).
By default, if a validation error is encountered, an exception is thrown and processing halts. However, with a ValidationEventHandler registered on the XmlValidatingReader, processing continues after each error, so if there were more validation errors in the file, each one of them would be reported individually.
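The following sketch shows that behavior on a small in-memory document. It is an illustration under stated assumptions rather than part of the chapter’s program: the class name ValidationDemo, the Validate helper, and the cut-down DTD (placed in the document’s internal subset so that no po.dtd file is needed) are all hypothetical. The handler collects each message in a list instead of printing it. Note that recent versions of .NET mark XmlValidatingReader as obsolete, so expect a compiler warning there.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;
using System.Xml.Schema;

public class ValidationDemo
{
    // A cut-down, hypothetical DTD carried in the internal subset.
    private const string Dtd =
        "<!DOCTYPE po [" +
        "<!ELEMENT po (address+)>" +
        "<!ELEMENT address EMPTY>" +
        "<!ATTLIST address type (billing|shipping) #REQUIRED>" +
        "]>";

    // Reads the whole document and returns every validation message raised.
    public static List<string> Validate(string body)
    {
        List<string> messages = new List<string>();
        XmlTextReader textReader =
            new XmlTextReader(new StringReader(Dtd + body));
        XmlValidatingReader reader = new XmlValidatingReader(textReader);
        reader.ValidationType = ValidationType.DTD;
        // With a handler registered, errors are reported here instead of
        // being thrown, and reading continues.
        reader.ValidationEventHandler +=
            delegate(object sender, ValidationEventArgs e)
            {
                messages.Add(e.Message);
            };
        try
        {
            while (reader.Read()) { }
        }
        finally
        {
            reader.Close();
        }
        return messages;
    }

    public static void Main()
    {
        // Both address elements carry a type the DTD does not allow.
        string invalid =
            "<po><address type='mailing'/><address type='pickup'/></po>";
        foreach (string message in Validate(invalid))
        {
            Console.WriteLine(message);
        }
    }
}
```

Collecting the messages rather than writing them straight to the console makes it easy to decide afterward whether to produce the pick list at all.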
I’m sure you can think of useful ways to use a validation event. Some examples of useful output that I’ve thought of include:
If processing is being done interactively, present the user with the relevant lines of XML, so she can see the erroneous data.
If processing is being done by an automated process, alert a system administrator by email or pager.
The entire revised program is shown in Example 2-4.
using System;
using System.IO;
using System.Text;
using System.Xml;
using System.Xml.Schema;

public class PoToPickListValidated {

  public static void Main(string[ ] args) {

    string url = args[0];

    XmlReader textReader = new XmlTextReader(url);
    XmlValidatingReader reader = new XmlValidatingReader(textReader);
    reader.ValidationType = ValidationType.DTD;
    reader.ValidationEventHandler +=
      new ValidationEventHandler(HandleValidationError);

    StringBuilder pickList = new StringBuilder( );
    pickList.Append("Angus Hardware PickList\n");
    pickList.Append("=======================\n\n");

    while (reader.Read( )) {
      if (reader.NodeType == XmlNodeType.Element) {
        switch (reader.LocalName) {
          case "po":
            pickList.Append(POElementToString(reader));
            break;
          case "date":
            pickList.Append(DateElementToString(reader));
            break;
          case "address":
            reader.MoveToAttribute("type");
            if (reader.Value == "shipping") {
              pickList.Append(AddressElementToString(reader));
            } else {
              reader.Skip( );
            }
            break;
          case "items":
            pickList.Append(ItemsElementToString(reader));
            break;
        }
      }
    }

    Console.WriteLine(pickList);
  }

  private static string POElementToString(XmlReader reader) {

    string id = reader.GetAttribute("id");

    StringBuilder poBlock = new StringBuilder( );
    poBlock.Append("PO Number: ").Append(id).Append("\n\n");
    return poBlock.ToString( );
  }

  private static string DateElementToString(XmlReader reader) {

    int year = XmlConvert.ToInt32(reader.GetAttribute("year"));
    int month = XmlConvert.ToInt32(reader.GetAttribute("month"));
    int day = XmlConvert.ToInt32(reader.GetAttribute("day"));
    DateTime date = new DateTime(year,month,day);

    StringBuilder dateBlock = new StringBuilder( );
    dateBlock.Append("Date: ").Append(date.ToString("D")).Append("\n\n");
    return dateBlock.ToString( );
  }

  private static string AddressElementToString(XmlReader reader) {

    StringBuilder addressBlock = new StringBuilder( );
    addressBlock.Append("Shipping Address:\n");

    while (reader.Read( ) && (reader.NodeType == XmlNodeType.Element ||
      reader.NodeType == XmlNodeType.Whitespace)) {
      switch (reader.LocalName) {
        case "name":
        case "company":
        case "street":
        case "zip":
          addressBlock.Append(reader.ReadString( ));
          addressBlock.Append("\n");
          break;
        case "city":
          addressBlock.Append(reader.ReadString( ));
          addressBlock.Append(", ");
          break;
        case "state":
          addressBlock.Append(reader.ReadString( ));
          addressBlock.Append(" ");
          break;
      }
    }

    addressBlock.Append("\n");
    return addressBlock.ToString( );
  }

  private static string ItemsElementToString(XmlReader reader) {

    StringBuilder itemsBlock = new StringBuilder( );
    itemsBlock.Append("Quantity Product Code Description\n");
    itemsBlock.Append("======== ============ ===========\n");

    while (reader.Read( ) && (reader.NodeType == XmlNodeType.Element ||
      reader.NodeType == XmlNodeType.Whitespace)) {
      switch (reader.LocalName) {
        case "item":
          object [ ] parms = new object [3];
          parms [0] = XmlConvert.ToInt32(reader.GetAttribute("quantity"));
          parms [1] = reader.GetAttribute("productCode");
          parms [2] = reader.GetAttribute("description");
          itemsBlock.AppendFormat("  {0,6}  {1,11} {2}\n",parms);
          break;
      }
    }

    return itemsBlock.ToString( );
  }

  private static void HandleValidationError(object sender,
    ValidationEventArgs e) {
    Console.WriteLine(e.Message);
  }
}
The XmlNodeReader type is used to read an existing XmlNode from memory. For example, suppose you have an entire XML document in memory, in an XmlDocument, and you wish to deal with one of its nodes in a specialized manner. The XmlNodeReader constructor can take an XmlNode object as its argument from anywhere in an XML document or document fragment, and perform its operations relative to that node.
For example, you might wish to construct an Angus Hardware XML purchase order in memory rather than reading it from disk. One reason you might choose to construct a PO in memory is if order entry is being done by an outside party in a non-XML format, and some other section of your program is taking care of converting the data into XML. The actual construction of an XmlDocument is covered in Chapter 5, but for now let’s assume that you’ve been given a complete XmlDocument that constitutes a valid PO.
To print the pick list, you need only make one small change to Example 2-4: replace the XmlTextReader constructor with XmlNodeReader, passing in an XmlNode as its argument.
XmlReader reader = new XmlNodeReader(node);
The rest of the program continues as before, validating the XmlNode passed in and printing the pick list to the console. The only difference is in the type of inputs the program takes; in this case, the input comes directly from the XmlNode.
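As a self-contained sketch of reading from a node, the program below loads a trimmed-down purchase order into an XmlDocument, selects its items element, and reads only that subtree with an XmlNodeReader. The class name NodeReaderDemo, the ProductCodes helper, and the shortened document are hypothetical; LoadXml( ) and SelectSingleNode( ) are used here only as a convenient way to obtain an XmlNode.

```csharp
using System;
using System.Text;
using System.Xml;

public class NodeReaderDemo
{
    // Returns a comma-separated list of the productCode attributes found
    // on item elements at or below the given node.
    public static string ProductCodes(XmlNode node)
    {
        StringBuilder codes = new StringBuilder();
        XmlReader reader = new XmlNodeReader(node);
        try
        {
            while (reader.Read())
            {
                if (reader.NodeType == XmlNodeType.Element &&
                    reader.LocalName == "item")
                {
                    if (codes.Length > 0)
                    {
                        codes.Append(',');
                    }
                    codes.Append(reader.GetAttribute("productCode"));
                }
            }
        }
        finally
        {
            reader.Close();
        }
        return codes.ToString();
    }

    public static void Main()
    {
        XmlDocument doc = new XmlDocument();
        doc.LoadXml(
            "<po id='PO1456'><items>" +
            "<item productCode='R-273'/><item productCode='1632S'/>" +
            "</items></po>");
        // Read relative to the items element rather than the whole document
        XmlNode items = doc.SelectSingleNode("/po/items");
        // Prints: R-273,1632S
        Console.WriteLine(ProductCodes(items));
    }
}
```

Because the reading loop is written against XmlReader, it is the same loop you would use with an XmlTextReader; only the construction of the reader changes.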
To recap the different XmlReader subclasses: XmlTextReader is used to read an XML document from some sort of file, whether it’s on a local disk or on a web server; XmlNodeReader is used to read an XML fragment from an XmlDocument that’s already been loaded some other way; XmlValidatingReader is used to validate an XML document that’s being read using an XmlTextReader. The subclasses of XmlReader are mostly interchangeable, with a few exceptions discussed later.