Reading a PYX Document

Before you can write an XmlPyxReader, you first need to understand PYX syntax. PYX is a line-oriented XML syntax, developed by Sean McGrath, which reflects XML’s SGML heritage. PYX is based on the Element Structure Information Set (ESIS), a popular alternative syntax for SGML.

Tip

Unlike many of the terms in this book, PYX is not an acronym for anything. A pyx is a container used in certain religious rites, and the PYX notation was developed mostly using the Python programming language.

In a line-oriented format, each XML node occurs on a new line. The XML nodes that PYX can represent include start element, end element, attribute, character data, and processing instruction. The first character of each line indicates what sort of node the line represents. Table 4-1 shows the prefix characters and what node type each represents.

Table 4-1. PYX prefix characters and their corresponding XmlNodeType values

PYX prefix character    XmlNodeType value
(                       Element
)                       EndElement
A                       Attribute
-                       Text
?                       ProcessingInstruction

As you can see by the limited number of node types it contains, PYX represents only the logical structure of an XML document, not the physical structure. There are no DocumentType, EntityReference, Comment, or CDATA XmlNodeTypes in a PYX document. This lack of certain nodes is consistent with PYX’s ESIS ancestry; in SGML, the separation between document structure and document content is enforced more rigidly than in XML.

None of this should stop you from using PYX to represent basic XML documents. In fact, PYX’s structure makes it very easy to parse using the XmlReader model.
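To make the prefix-character idea concrete, here is a minimal sketch in Python (the language PYX itself grew up in) rather than in the C# you would use for XmlPyxReader. The names PYX_NODE_TYPES and classify are hypothetical, chosen only for illustration; the mapping itself comes straight from Table 4-1.

```python
# Prefix characters from Table 4-1, mapped to XmlNodeType names.
# PYX_NODE_TYPES and classify are illustrative names, not part of any library.
PYX_NODE_TYPES = {
    "(": "Element",
    ")": "EndElement",
    "A": "Attribute",
    "-": "Text",
    "?": "ProcessingInstruction",
}


def classify(line):
    """Return (node_type, rest_of_line) for a single PYX line."""
    line = line.rstrip("\r\n")  # drop the line terminator only
    if not line:
        raise ValueError("empty PYX line")
    prefix, rest = line[0], line[1:]
    try:
        return PYX_NODE_TYPES[prefix], rest
    except KeyError:
        raise ValueError(f"unknown PYX prefix character: {prefix!r}") from None


if __name__ == "__main__":
    for sample in ["(po", "Aid PO1456", "-Frits Mendels", ")po"]:
        print(classify(sample))
```

Because the node type is decided by one character, a reader never has to look ahead or backtrack to know what kind of node it is on.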

To test your XmlPyxReader, you’ll need a file in PYX format. Example 4-1 shows the same purchase order we dealt with in Chapter 2, reformatted in PYX. A few lines are highlighted; I’ll discuss these after the example.

Example 4-1. A purchase order expressed in PYX
(po 
Aid PO1456
(date
Ayear 2002
Amonth 6
Aday 14
)date
(address 
Atype shipping
(name 
-Frits Mendels
)name
(street
-152 Cherry St
)street
(city
-San Francisco
)city
(state
-CA
)state
(zip
-94045
)zip
)address
(address 
Atype billing
(name
-Frits Mendels
)name
(street
-PO Box 6789
)street
(city
-San Francisco
)city
(state
-CA
)state
(zip
-94123-6798
)zip
)address
(items 
(item 
Aquantity 1
AproductCode R-273
Adescription 14.4 Volt Cordless Drill
AunitCost 189.95
)item
(item 
Aquantity 1
AproductCode 1632S
Adescription 12 Piece Drill Bit Set
AunitCost 14.95
)item
)items
)po

Notice that all the data matches the data from Example 2-1, although the format is clearly very different.

Each line that begins with ( is a start element, as in the first highlighted line:

(po

This is equivalent to the <po> element start tag. The next highlighted line is an attribute:

Ayear 2002

This is equivalent to year="2002" in standard XML syntax. After the A, the next whitespace-delimited word is the name of the attribute, and the rest of the line contains the attribute value. Multiple attributes on the same element are just listed in order, on separate lines.
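The attribute rule can be sketched in a few lines of Python. This is only an illustration of the splitting rule described above, not the XmlPyxReader implementation; parse_attribute is a hypothetical helper name.

```python
def parse_attribute(line):
    """Split a PYX attribute line into (name, value).

    The name is the first whitespace-delimited word after the A prefix;
    everything after the separating whitespace is the value.
    """
    if not line.startswith("A"):
        raise ValueError("not a PYX attribute line")
    parts = line[1:].rstrip("\r\n").split(None, 1)
    if not parts:
        raise ValueError("attribute line has no name")
    name = parts[0]
    value = parts[1] if len(parts) > 1 else ""  # treat a missing value as empty
    return name, value


if __name__ == "__main__":
    print(parse_attribute("Ayear 2002"))
    print(parse_attribute("Adescription 14.4 Volt Cordless Drill"))
```

Splitting only once matters: values such as the drill description contain spaces of their own, and they must stay in one piece.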

Although PYX doesn’t really support XML namespaces, there’s no reason you can’t recognize them yourself. The following PYX fragment shows a way to represent namespaces in PYX:

(myElement
Axmlns http://www.mynamespaceuri.com/
Axmlns:foo http://www.anothernamespaceuri.com/
)myElement

That PYX fragment is equivalent to the following XML fragment:

<myElement xmlns="http://www.mynamespaceuri.com/"
    xmlns:foo="http://www.anothernamespaceuri.com/" />
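One way to recognize namespaces yourself is to look for attributes named xmlns or beginning with xmlns: once an element's attributes have been read. The following Python sketch assumes the attributes have already been collected as (name, value) pairs; namespace_declarations is a hypothetical helper, and using the empty string as the key for the default namespace is just a convention chosen for this example.

```python
def namespace_declarations(attributes):
    """Collect namespace declarations from (name, value) attribute pairs.

    Returns a dict mapping prefix to URI. The default namespace
    (a plain xmlns attribute) is stored under the empty-string key.
    """
    declarations = {}
    for name, value in attributes:
        if name == "xmlns":
            declarations[""] = value
        elif name.startswith("xmlns:"):
            declarations[name[len("xmlns:"):]] = value
    return declarations


if __name__ == "__main__":
    attrs = [
        ("xmlns", "http://www.mynamespaceuri.com/"),
        ("xmlns:foo", "http://www.anothernamespaceuri.com/"),
    ]
    print(namespace_declarations(attrs))
```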

The next highlighted line in Example 4-1 is an EndElement node:

)date

The name of the element is given after the ) prefix character. This is equivalent to the </date> end tag. Note that there is no PYX shorthand for an empty element, like <item />.

The last highlighted line is text:

-Frits Mendels

After the -, the rest of the line contains the element’s text value. Because only the prefix character on any line is significant, the rest of the line can contain any characters, including the PYX prefix characters (, A, -, ), and ?, and XML reserved characters <, >, and &. CDATA sections are thus irrelevant in PYX.
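The reverse direction shows why those reserved characters need no special treatment in PYX but do in XML. The Python sketch below converts PYX lines back to XML text; pyx_to_xml is a hypothetical function written for this illustration, and it assumes well-formed input and ignores any escape sequences a full PYX implementation might define. It relies on escape and quoteattr from the standard xml.sax.saxutils module.

```python
from xml.sax.saxutils import escape, quoteattr


def pyx_to_xml(lines):
    """Convert an iterable of PYX lines to a string of XML text."""
    out = []
    open_tag = None  # start tag that may still receive attribute lines

    def flush():
        # Close a pending start tag once its attributes are complete.
        nonlocal open_tag
        if open_tag is not None:
            out.append(open_tag + ">")
            open_tag = None

    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line:
            continue
        prefix, rest = line[0], line[1:]
        if prefix == "(":
            flush()
            open_tag = "<" + rest.strip()
        elif prefix == "A":
            if open_tag is None:
                raise ValueError("attribute line outside a start element")
            name, _, value = rest.partition(" ")
            open_tag += " " + name + "=" + quoteattr(value)
        elif prefix == "-":
            flush()
            out.append(escape(rest))  # <, >, and & become entity references
        elif prefix == ")":
            flush()
            out.append("</" + rest.strip() + ">")
        elif prefix == "?":
            flush()
            out.append("<?" + rest + "?>")
        else:
            raise ValueError(f"unknown PYX prefix character: {prefix!r}")
    flush()
    return "".join(out)


if __name__ == "__main__":
    print(pyx_to_xml(["(date", "Ayear 2002", "Amonth 6", "Aday 14", ")date"]))
```

Note that the start tag cannot be written until the first non-attribute line arrives, and that an element with no content still comes out as a separate start tag and end tag, since PYX has no empty-element shorthand to carry over.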

Tip

PYX is a fairly simple format, and XmlPyxReader will be correspondingly simple. Writing a more complex XmlReader is certainly possible, but it would take several chapters’ worth of examples to show all the details. If, after reading this chapter, you’re interested in a considerably more complex model for writing XmlReader subclasses, I urge you to read Ralf Westphal’s article, “Implementing XmlReader Classes for Non-XML Data Structures and Formats.” You can view the article online at http://msdn.microsoft.com/library/en-us/dndotnet/html/Custxmlread.asp.
