Processing approach decisions

Many parsers supply at least two basic methods for accessing XML documents, and the application developer must choose between them. Some of the key decision factors are discussed below.

Pull method

Programming languages that do not support a call-back technique, by which a software library passes events back to the main application, cannot make use of the SAX standard. If the DOM approach is ruled out, for reasons given below, the only remaining practical option is to adopt a parser that provides a pull-based alternative. In this scenario, the application simply calls methods or functions in the XML processor to retrieve the next piece of the document.

Unfortunately, there are no standards for this approach as yet. It is necessary to learn how the individual parser supports this method.
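As an illustration only, the pull style can be sketched with the XMLPullParser class in the Python standard library. This is one parser's own interface, not a standard, and the sample document and the pull_titles function are invented for this example. The application asks for the events when it is ready for them, and no call-back functions are registered:

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document, invented for this sketch.
DOCUMENT = (
    "<book>"
    "<chapter><title>Intro</title></chapter>"
    "<chapter><title>Parsing</title></chapter>"
    "</book>"
)

def pull_titles(xml_text):
    """Collect the text of each <title> by pulling events from the parser."""
    parser = ET.XMLPullParser(events=("start", "end"))
    parser.feed(xml_text)   # hand data to the parser
    parser.close()          # signal that no more data will follow
    titles = []
    # The application drives the loop and asks for each event in turn.
    for event, element in parser.read_events():
        if event == "end" and element.tag == "title":
            titles.append(element.text)
    return titles

titles = pull_titles(DOCUMENT)
print(titles)
```

In a real application the data would normally be fed in smaller chunks, with read_events() called between chunks.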

Event or DOM

If a parser that supports both event-driven and tree-walking approaches is chosen, then the SAX and DOM standards will almost certainly be supported. The application developer may ponder which approach to choose in a given circumstance, and a number of factors can influence this decision.

Event benefits

With the event-driven approach, the parser does not have to hold much information about the document in memory. Each piece is extracted from the document and passed immediately to the application, after which it can be discarded. There is therefore little danger of the parser running out of memory while parsing large documents. In addition, the document structure does not have to be managed in memory, either by the parser or, depending on what it needs to do, by the application. This can make parsing very fast.
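A minimal sketch of the event-driven style, using the xml.sax module from the Python standard library, might look like the following. The TitleHandler class and the sample document are invented for this example. The handler keeps only the text of the title currently being read, so nothing else from the document needs to stay in memory:

```python
import xml.sax

# Hypothetical sample document, invented for this sketch.
DOCUMENT = (
    b"<book>"
    b"<chapter><title>Intro</title></chapter>"
    b"<chapter><title>Parsing</title></chapter>"
    b"</book>"
)

class TitleHandler(xml.sax.ContentHandler):
    """Receives call-backs from the parser and keeps only <title> text."""

    def __init__(self):
        super().__init__()
        self.in_title = False
        self.buffer = []
        self.titles = []

    def startElement(self, name, attrs):
        if name == "title":
            self.in_title = True
            self.buffer = []

    def characters(self, content):
        # Text may arrive in several pieces, so it is accumulated.
        if self.in_title:
            self.buffer.append(content)

    def endElement(self, name):
        if name == "title":
            self.titles.append("".join(self.buffer))
            self.in_title = False

handler = TitleHandler()
xml.sax.parseString(DOCUMENT, handler)
print(handler.titles)
```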

However, it should be noted that some parsers provide access to the document via a SAX API, but only after parsing the whole document and building the tree model in memory. While this can still be useful, the memory usage and speed advantages are lost.

The fact that the application receives pieces of the document in the order in which they were encountered means that it does not have to do anything special in order to process the document in a simple linear fashion, from start to end.

However, it should also be noted that the linear processing issue in the tree-walker approach can be largely overcome if the parser has a convenient sequential tree-walking class (as most now have).
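As a rough illustration of sequential tree-walking, the sketch below uses the iter() method of the ElementTree module in the Python standard library. This is not a DOM class, but it plays a similar role, and the sample document is invented for this example. The elements of the in-memory tree are visited in document order, much as they would arrive from an event-driven parser:

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document, invented for this sketch.
root = ET.fromstring(
    "<book><chapter><title>Intro</title><para>Hello</para></chapter></book>"
)

# iter() walks the tree depth-first, starting with the root element,
# so elements appear in the order their start-tags occur in the document.
visited = [element.tag for element in root.iter()]
print(visited)
```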

Tree-walking benefits

Some data preparation tasks require access to information that is further along the document. For example, to build a table of contents section at the beginning of a book, it is necessary to extract all the chapter titles that occur later. With the entire document held in memory, the document structure can be analysed several times over, quickly and easily.
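The table of contents example can be sketched with the xml.dom.minidom module from the Python standard library. The element names (book, title, contents, entry), the sample document and the add_contents function are all assumptions made for this illustration. Because the whole tree is in memory, the titles can be gathered first and the new section then placed ahead of the chapters they came from:

```python
from xml.dom import minidom

# Hypothetical sample document, invented for this sketch.
DOCUMENT = (
    "<book>"
    "<chapter><title>Intro</title></chapter>"
    "<chapter><title>Parsing</title></chapter>"
    "</book>"
)

def add_contents(xml_text):
    """Build a <contents> element from later <title> elements and insert it first."""
    document = minidom.parseString(xml_text)
    book = document.documentElement
    contents = document.createElement("contents")
    # Look ahead through the whole document for chapter titles.
    for title in book.getElementsByTagName("title"):
        entry = document.createElement("entry")
        entry.appendChild(document.createTextNode(title.firstChild.data))
        contents.appendChild(entry)
    # Place the new section before the first existing child of <book>.
    book.insertBefore(contents, book.firstChild)
    return document

result = add_contents(DOCUMENT).documentElement.toxml()
print(result)
```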

When an application needs to reorder document components, or needs to build a new document but in a non-linear fashion, a data structure management module may be profitably utilized by the application to manage the document components on its behalf.

With this approach, the entire document can be validated as well-formed, and possibly also conformant to a particular DTD, before passing any of the document to the application. A document that contains errors can be rejected before the application begins to process its contents, thereby eliminating the need for messy roll-back routines.
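The difference can be seen in a small sketch that feeds the same malformed document to an event-driven parser and to a tree-building parser, both from the Python standard library. The Recorder class and the broken sample document are invented for this example, and only well-formedness is checked here, as these modules do not validate against a DTD. The event handler has already received several elements by the time the error is reported, whereas the tree builder hands nothing to the application:

```python
import xml.sax
import xml.etree.ElementTree as ET

# Hypothetical malformed document: the second <title> is never closed.
BROKEN = b"<book><title>Intro</title><title>Parsing</book>"

class Recorder(xml.sax.ContentHandler):
    """Records the elements delivered before any error is reported."""

    def __init__(self):
        super().__init__()
        self.seen = []

    def startElement(self, name, attrs):
        self.seen.append(name)

recorder = Recorder()
try:
    xml.sax.parseString(BROKEN, recorder)
    sax_failed = False
except xml.sax.SAXParseException:
    sax_failed = True   # the error arrives after some events were handled

try:
    tree = ET.fromstring(BROKEN)
    tree_failed = False
except ET.ParseError:
    tree = None         # no part of the document reaches the application
    tree_failed = True

print(sax_failed, len(recorder.seen), tree_failed)
```

An event-driven application in this position would have to undo whatever it had done with the elements in recorder.seen.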
