Reading an XML File—SAX Parsers

DOM Level 1 became a World Wide Web Consortium (W3C) Recommendation in October 1998. In the years since then, a weakness in the DOM approach has become evident. It works fine for small and medium-sized amounts of data, up to, say, hundreds of megabytes. But DOM parsing doesn't work well for very large amounts of data, in the range of gigabytes, which may not fit in memory at once. In addition, processing an entire document wastes a lot of time when you know that all you need is one small element a little way into the file.

To resolve these problems, a second algorithm for XML parsing was invented. It became known as the “Simple API for XML” or “SAX,” and its distinguishing characteristic is that it passes back XML elements to the calling program as it finds them. In other words, a SAX parser starts reading an XML stream, and whenever it notices a tag that starts an element, it tells the calling program. It does the same thing for closing tags too. The way a SAX parser communicates with the invoking program is via callbacks, just like event handlers for GUI programs.

The application program registers itself with the SAX parser, saying in effect “when you see one of these tags start, call this routine of mine.” It is up to the application program what it does with the information. It may need to build a data structure, or add up values, or process all elements with one particular value, or whatever. For example, to search for all CDs by The Jam, you would look for all the artist elements where the PCDATA is “The Jam.”
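The callback style described above can be sketched as follows. This is a minimal illustration, not a definitive implementation: the class name ArtistCounter, the helper count(), and the layout of the sample XML (cd elements each holding an artist element) are assumptions made for the example. The handler extends org.xml.sax.helpers.DefaultHandler and overrides only the callbacks it cares about.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: counts <artist> elements whose text equals a given name.
class ArtistCounter extends DefaultHandler {
    private final String wanted;
    private final StringBuilder text = new StringBuilder();
    private boolean inArtist = false;
    private int matches = 0;

    ArtistCounter(String wanted) {
        this.wanted = wanted;
    }

    // Called by the parser each time it sees an opening tag.
    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        if (qName.equals("artist")) {
            inArtist = true;
            text.setLength(0);
        }
    }

    // May be called more than once for a single run of text, so accumulate.
    @Override
    public void characters(char[] ch, int start, int length) {
        if (inArtist) {
            text.append(ch, start, length);
        }
    }

    // Called by the parser each time it sees a closing tag.
    @Override
    public void endElement(String uri, String localName, String qName) {
        if (qName.equals("artist")) {
            if (text.toString().trim().equals(wanted)) {
                matches++;
            }
            inArtist = false;
        }
    }

    int getMatches() {
        return matches;
    }

    // Convenience wrapper: parse an XML string and return the match count.
    static int count(String xml, String artist) throws Exception {
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        ArtistCounter handler = new ArtistCounter(artist);
        parser.parse(new InputSource(new StringReader(xml)), handler);
        return handler.getMatches();
    }

    public static void main(String[] args) throws Exception {
        // Sample data invented for this sketch.
        String xml = "<cds>"
                + "<cd><title>All Mod Cons</title><artist>The Jam</artist></cd>"
                + "<cd><title>London Calling</title><artist>The Clash</artist></cd>"
                + "<cd><title>Setting Sons</title><artist>The Jam</artist></cd>"
                + "</cds>";
        System.out.println(count(xml, "The Jam"));
    }
}
```

Note that the handler never holds more than the text of one artist element at a time, which is where the memory savings come from.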

SAX parsing is very efficient with machine resources, but it also has several drawbacks. The programmer has to write more code to interface to a SAX parser than to a DOM parser. Also, the programmer has to manually keep track of the current position in the parse in case the application needs this information (and that's a pretty big disadvantage). Finally, you can't “back up” to an earlier part of the document, or rearrange it, any more than you can back up a serial data stream. You get the data as it flies by, and that's it.
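One common way to keep track of the position by hand is to maintain a stack of the currently open elements. The sketch below assumes a hypothetical class named PathTracker and invented sample data; it records the full path of every element it encounters, which lets an application tell a title inside a cd apart from a title inside a track.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: remembers which elements are open at any moment.
class PathTracker extends DefaultHandler {
    // Root element is first, innermost open element is last.
    private final Deque<String> open = new ArrayDeque<>();
    private final List<String> paths = new ArrayList<>();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        open.addLast(qName);
        // The parser does not supply this context; we rebuild it ourselves.
        paths.add("/" + String.join("/", open));
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        open.removeLast();
    }

    // Convenience wrapper: parse an XML string and return every element path seen.
    static List<String> pathsOf(String xml) throws Exception {
        PathTracker handler = new PathTracker();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return handler.paths;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<cds><cd><title>Setting Sons</title>"
                + "<track><title>Eton Rifles</title></track></cd></cds>";
        for (String path : pathsOf(xml)) {
            System.out.println(path);
        }
    }
}
```

A DOM program gets the same information for free by calling getParentNode(), which is part of why DOM needs less application code.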

The error handling for JAXP SAX and DOM applications is identical in that they share the same exceptions. The specifications require that validation errors be ignored by default. If you want to throw an exception in the event of a validation error, then you need to write a brief class that implements the org.xml.sax.ErrorHandler interface, and register it with your parser by calling the setErrorHandler() method of either javax.xml.parsers.DocumentBuilder or org.xml.sax.XMLReader. Error handling is the reason why DOM programs import classes from the org.xml.sax and org.xml.sax.helpers packages.
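A brief ErrorHandler of the kind described might look like the following sketch. The class name StrictErrorHandler, the helper isValid(), and the sample DTD are assumptions made for illustration. The handler is registered here on a DocumentBuilder; the same object could be passed to XMLReader.setErrorHandler() in a SAX program.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

// Hypothetical handler: turns validation errors into exceptions.
class StrictErrorHandler implements ErrorHandler {
    @Override
    public void warning(SAXParseException e) {
        // Warnings are only reported, not treated as failures.
        System.err.println("warning: " + e.getMessage());
    }

    @Override
    public void error(SAXParseException e) throws SAXException {
        // Validation errors arrive here; rethrow so the parse stops.
        throw e;
    }

    @Override
    public void fatalError(SAXParseException e) throws SAXException {
        throw e;
    }

    // Convenience wrapper: true if the document parses and validates against its DTD.
    static boolean isValid(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setValidating(true); // validation is off unless requested
        DocumentBuilder builder = factory.newDocumentBuilder();
        builder.setErrorHandler(new StrictErrorHandler());
        try {
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        // Sample DTD and documents invented for this sketch.
        String dtd = "<!DOCTYPE cd [ <!ELEMENT cd (artist)> <!ELEMENT artist (#PCDATA)> ]>";
        System.out.println(isValid(dtd + "<cd><artist>The Jam</artist></cd>"));
        System.out.println(isValid(dtd + "<cd><title>Setting Sons</title></cd>"));
    }
}
```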

JAXP includes both SAX and DOM parsers. So which should you use in a given program? You will want to choose the parser with an eye on the following characteristics:

  • SAX parsers are generally faster and use fewer resources, so they are a good choice for servlets and other transaction-oriented requirements.

  • SAX parsers require more programming effort to set them up and interact with them.

  • SAX parsers are well suited to XML that contains structured data (e.g., serialized objects).

  • DOM parsers are simpler to use.

  • DOM parsers require more memory and processor work.

  • DOM parsers are well suited to XML that contains actual documents (e.g., Microsoft Word or Excel documents in XML form, assuming Microsoft publishes the DTDs).

If it's still not clear, use a DOM parser, as it needs less coding on your part.
