Pyxie

The Pyxie package, developed by Sean McGrath, is available from http://pyxie.sourceforge.net/ and is built around a line-oriented notation known as PYX. PYX and Pyxie are an alternative to SAX and the DOM and are, according to their author, geared toward pipeline processing, in which one application’s output is fed as input to the next application. This idiom is common among Unix tools; it is also used on Windows, though it is less common there for end-user tools.

Pyxie can parse an XML document into PYX, which signals the content of the document one event per line. It’s similar to SAX in that it is event-driven; however, instead of being delivered to callback interfaces that you implement, the events are written to standard output as PYX notation. The PYX output can then be processed by other text manipulation tools such as grep, sed, and awk, or fed into other text-aware scripts you might write in Python or Perl.

PYX output appears as individual lines representing different types of markup. Consider the following XML:

<Book>
  <Name>Python and XML</Name>
  <Publisher>O'Reilly &amp; Associates</Publisher>
</Book>

Pyxie or another PYX-aware processor converts this XML to the following PYX:

(Book
-\n
(Name
-Python and XML
)Name
-\n
(Publisher
-O'Reilly & Associates
)Publisher
-\n
)Book

One thing to note about the PYX output is that each document construct is given its own line, which suits Unix-style command-line processing tools well. Additionally, each line of PYX begins with a symbol indicating the type of node encountered:

(

The left parenthesis is used to denote start elements.

)

The right parenthesis is used to denote the ends of elements.

A

A capital A is used to mark attributes.

-

A dash (or minus) is used to mark character data.

?

A question mark is used to denote a processing instruction.

These symbols don’t cover every type of construct in XML. For example, there is no support for CDATA sections, DTDs, or comments.
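To make the notation concrete, here is a minimal sketch of how the five line types could be produced with Python’s standard expat bindings. This is not Pyxie’s own code: the function names xml_to_pyx and escape are ours, and the attribute layout (name, a space, then the value) and the backslash escapes for newlines and tabs reflect our understanding of the PYX convention rather than anything confirmed in this section. Like PYX itself, the sketch ignores comments, CDATA boundaries, and DTDs.

```python
import xml.parsers.expat


def escape(text):
    # Assumption: PYX keeps every event on one physical line, so backslashes,
    # newlines, and tabs inside data are written as two-character escapes.
    return text.replace("\\", "\\\\").replace("\n", "\\n").replace("\t", "\\t")


def xml_to_pyx(xml_text):
    """Return a list of PYX-style lines for the given XML string."""
    lines = []
    parser = xml.parsers.expat.ParserCreate()
    # Ask expat to hand over adjacent character data in a single callback,
    # so text containing an entity such as &amp; stays on one line.
    parser.buffer_text = True

    def start_element(name, attrs):
        lines.append("(" + name)
        for key, value in attrs.items():
            lines.append("A" + key + " " + escape(value))

    def end_element(name):
        lines.append(")" + name)

    def character_data(data):
        lines.append("-" + escape(data))

    def processing_instruction(target, data):
        lines.append("?" + target + " " + escape(data))

    parser.StartElementHandler = start_element
    parser.EndElementHandler = end_element
    parser.CharacterDataHandler = character_data
    parser.ProcessingInstructionHandler = processing_instruction
    parser.Parse(xml_text, True)
    return lines


BOOK_XML = """<Book>
  <Name>Python and XML</Name>
  <Publisher>O'Reilly &amp; Associates</Publisher>
</Book>"""

book_pyx = xml_to_pyx(BOOK_XML)
```

Printing each entry of book_pyx on its own line gives a stream that can be piped to the tools discussed in this section.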

Having experience with Unix system administration, we can honestly say that the line-oriented PYX syntax is of great value to those who are familiar with sed, awk, and grep and who need to pull data out of an XML document without taking the time to write code against a parser.
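The same line-at-a-time style carries over to Python. The following sketch, in which the function name element_text is our own invention, collects the character data found between the start and end lines of a named element. It assumes that elements of that name are not nested inside one another, and it leaves any PYX escapes in the text untouched.

```python
def element_text(pyx_lines, name):
    """Yield the text found inside each element called `name`.

    Text belonging to child elements is included as well, because every
    character-data line seen before the closing line is collected.
    """
    inside = False
    parts = []
    for line in pyx_lines:
        line = line.rstrip("\n")
        if line == "(" + name:
            inside, parts = True, []
        elif inside and line == ")" + name:
            inside = False
            yield "".join(parts)
        elif inside and line.startswith("-"):
            parts.append(line[1:])


# Sample PYX lines for the Book document shown earlier in this section.
sample_pyx = [
    "(Book",
    "(Name",
    "-Python and XML",
    ")Name",
    "(Publisher",
    "-O'Reilly & Associates",
    ")Publisher",
    ")Book",
]

names = list(element_text(sample_pyx, "Name"))
publishers = list(element_text(sample_pyx, "Publisher"))
```

Because the function accepts any iterable of lines, passing sys.stdin in place of sample_pyx lets the script sit at the end of a pipeline.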

Another powerful feature of PYX is the ability to examine the contents of a document quickly. Because the contents are line-oriented, a utility such as grep can search them directly, which allows some fairly complex operations on the document. For instance, you can apply any of grep’s options to a stream of PYX data:

$> <PYX-generating-command> | grep -v "Celsius"

If your PYX output is full of temperature reports with text such as “38 degrees Celsius,” the previous grep command removes every line that contains Celsius, so those temperatures are not included in the output. Such filtering is far more complex with XPath and the DOM. On the other hand, we don’t think PYX will help very much if your task is to convert SQL record sets to XML while at the same time adding DTDs and namespaces. In a complex case like that, working with the DOM is necessary.
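If you would rather do the filtering in a script, the grep command can be approximated in a few lines of Python. This is only a sketch: the function name drop_matching_text and the Reading element in the sample data are hypothetical. Unlike grep -v, which drops any line containing the word, this version drops only character-data lines, so an element or attribute that happened to be named Celsius would pass through.

```python
def drop_matching_text(pyx_lines, word):
    """Yield every PYX line except character-data lines containing `word`."""
    for line in pyx_lines:
        if line.startswith("-") and word in line:
            continue  # skip the matching text, much as grep -v would
        yield line


# Hypothetical temperature report, already converted to PYX.
report = [
    "(Reading",
    "-38 degrees Celsius",
    ")Reading",
    "(Reading",
    "-100 degrees Fahrenheit",
    ")Reading",
]

filtered = list(drop_matching_text(report, "Celsius"))
```

Note that, as with grep, the surrounding start and end lines remain in the output; removing the whole element would require tracking state across lines.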
