Parsing XML

So far, you have used Internet Explorer or other third-party tools to parse your XML documents. Now you will look at three APIs that let you access and manipulate the information stored in an XML document, so you can build your own XML applications. The Simple API for XML (SAX) defines parsing methods, and the Document Object Model (DOM) defines a mechanism for accessing and manipulating well-formed XML. The third is the Java API for XML Processing (JAXP), which you will use to build a simple SAX parser and a simple DOM parser. Both parsers simply echo the structure of the input XML. Usually, you will want to parse XML to perform some useful function, but echoing the XML is a good way to learn the APIs.

The benefit of JAXP is that it provides a common interface for creating and using SAX and DOM parsers in Java.

SAX and DOM define different approaches to parsing and handling an XML document. SAX is an event-based API, whereas DOM is tree-based.

With event-based parsers, the parsing events (such as the start and end tags) are reported directly to the application through callback methods. The application implements these callback methods to handle the different components in the document, much like handling events in a graphical user interface (GUI).
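The callback style described above can be sketched in Java using JAXP. This is a minimal illustration, not the parser developed later in the chapter: the class name `SaxEventSketch`, the method `listEvents`, and the sample document are assumptions made for the example. The handler overrides three callbacks from `DefaultHandler` and records each event as the parser reports it.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class SaxEventSketch {

    // Hypothetical helper: parses the XML text and returns one line per reported event.
    static String listEvents(String xml) throws Exception {
        StringBuilder log = new StringBuilder();

        // The parser calls these methods as it encounters each part of the document.
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes attrs) {
                log.append("start ").append(qName).append('\n');
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                log.append("end ").append(qName).append('\n');
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                // Whitespace-only text is ignored; a parser may deliver
                // longer text in more than one call.
                String text = new String(ch, start, length).trim();
                if (!text.isEmpty()) {
                    log.append("text ").append(text).append('\n');
                }
            }
        };

        // JAXP supplies the factory that creates the underlying SAX parser.
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        parser.parse(new InputSource(new StringReader(xml)), handler);
        return log.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.print(listEvents("<book><title>XML</title></book>"));
    }
}
```

Note that the application never sees the document as a whole. It only reacts to events in the order the parser reports them, so any state it needs (such as the current element) must be tracked by the handler itself.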

Using the DOM API, the parser transforms the XML document into a tree structure in memory. The application then navigates the tree to access the document's content.
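The tree-based approach can be sketched in Java as follows. Again, this is only an illustration under stated assumptions: the class name `DomTreeSketch`, the methods `outline` and `walk`, and the sample document are invented for the example. JAXP's `DocumentBuilder` builds the whole tree first, and the code then walks it recursively, printing each element name indented by its depth.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class DomTreeSketch {

    // Hypothetical helper: builds the DOM tree and returns an indented list of element names.
    static String outline(String xml) throws Exception {
        // JAXP supplies the factory that creates the underlying DOM parser.
        DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();

        // The entire document is read into memory before any navigation happens.
        Document doc = builder.parse(new InputSource(new StringReader(xml)));

        StringBuilder out = new StringBuilder();
        walk(doc.getDocumentElement(), 0, out);
        return out.toString();
    }

    // Visits the node and all of its descendants, depth first.
    static void walk(Node node, int depth, StringBuilder out) {
        if (node.getNodeType() == Node.ELEMENT_NODE) {
            for (int i = 0; i < depth; i++) {
                out.append("  ");
            }
            out.append(node.getNodeName()).append('\n');
        }
        for (Node child = node.getFirstChild(); child != null;
                child = child.getNextSibling()) {
            walk(child, depth + 1, out);
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.print(outline(
                "<book><title>XML</title><author>A</author></book>"));
    }
}
```

Because the tree is complete before `walk` runs, the application can visit nodes in any order, revisit them, or modify them, at the cost of holding the whole document in memory.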

Each approach has its advantages and disadvantages. DOM is a good choice when

  • You want a straightforward mapping of the structure of the XML.

  • The document is not too large (less than 20MB). If the document is large, holding it in memory can place a strain on system resources.

  • Most or all of the document needs to be processed.

  • The document is to be altered or written out in a structure that is very different from the original.

SAX is a good choice when

  • You are searching through an XML document for a small number of tags.

  • The document is large.

  • Processing speed is important.

  • The document does not need to be written out in a structure that is different from the original.

SAX is a public domain API developed cooperatively by the members of the XML-DEV (XML DEVelopment) Internet discussion group.

The DOM is a set of interfaces defined by the W3C DOM Working Group. The latest DOM recommendation can be obtained from the W3C Web site.
