You want to make one quick pass over an XML file, extracting certain tags or other information as you go.
The SAX ContentHandler interface (called DocumentHandler in SAX 1.0) specifies a number of “callbacks” that your code must provide. In one sense this is similar to the Listener interfaces in AWT and Swing, as covered briefly in Section 13.5. The most commonly used methods are startElement(), endElement(), and characters(). The first two, obviously, are called at the start and end of an element, and characters() is called when there is character data. The characters are stored in a large array, and you are passed the array along with the offset and length of the characters that make up your text. Conveniently, there is a String constructor that takes exactly these arguments. Hmmm, I wonder if they thought of that . . . Do note that a parser is free to deliver one run of text in several characters() calls.
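As a minimal sketch of that constructor at work, the handler below overrides only the SAX 2.0 characters() callback and turns each (array, offset, length) triple into a String. The class name CharactersDemo and the helper textChunks() are made up for this illustration, and the sketch assumes the parser bundled with the JDK (via SAXParserFactory) rather than any particular third-party parser.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical demo class; not part of the book's example code. */
class CharactersDemo {

    /** Returns every piece of text that characters() delivers, in order. */
    static List<String> textChunks(String xml) {
        final List<String> chunks = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void characters(char[] ch, int start, int length) {
                // ch is the parser's own buffer; only the slice
                // [start, start + length) belongs to this callback.
                chunks.add(new String(ch, start, length));
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            // Wrapped so callers need not declare checked exceptions.
            throw new IllegalStateException(e);
        }
        return chunks;
    }

    public static void main(String[] args) {
        // Join the chunks, since one run of text may arrive in several calls.
        System.out.println(String.join("", textChunks("<name>Ian Darwin</name>")));
    }
}
```

Joining the chunks before using them is the safe habit here, because nothing guarantees that the text of an element arrives in a single callback.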
To demonstrate this, I wrote a simple program using SAX to extract names and email addresses from an XML file. It is shown in Example 21-4.
Example 21-4. SaxLister.java
import org.xml.sax.*;
import org.xml.sax.helpers.*;

/** Simple lister - extract name and email tags from a user file.
 * Updated for SAX 2.0
 */
public class SaxLister {

    class PeopleHandler extends DefaultHandler {

        boolean name = false;
        boolean mail = false;
        // Text collected so far for the current name or email element;
        // the parser may deliver one run of text in several characters() calls.
        StringBuilder text = new StringBuilder();

        public void startElement(String nsURI, String strippedName,
                String tagName, Attributes attributes) throws SAXException {
            if (tagName.equalsIgnoreCase("name"))
                name = true;
            if (tagName.equalsIgnoreCase("email"))
                mail = true;
            text.setLength(0);
        }

        public void characters(char[] ch, int start, int length) {
            if (name || mail)
                text.append(ch, start, length);
        }

        // Print only once the element is complete, so no text is lost.
        public void endElement(String nsURI, String strippedName,
                String tagName) throws SAXException {
            if (name && tagName.equalsIgnoreCase("name")) {
                System.out.println("Name: " + text);
                name = false;
            } else if (mail && tagName.equalsIgnoreCase("email")) {
                System.out.println("Email: " + text);
                mail = false;
            }
        }
    }

    public void list(String fileName) throws Exception {
        // Needs Xerces on the classpath at run time.
        XMLReader parser = XMLReaderFactory.createXMLReader(
            "org.apache.xerces.parsers.SAXParser"); // should load properties
        parser.setContentHandler(new PeopleHandler( ));
        parser.parse(fileName);
    }

    public static void main(String[] args) throws Exception {
        // The file to list is named on the command line.
        new SaxLister().list(args[0]);
    }
}
When run, it prints the listing:
$ java SaxLister users.xml
Name: Ian Darwin
Email: [email protected]
$
One problem with SAX is that it is, well, simple, and therefore doesn’t scale well, as you can see by thinking about this program. Imagine trying to handle 12 different tags and doing something different with each one. For more involved analysis of an XML file, the Document Object Model (DOM) may be better suited. (On the other hand, DOM requires keeping the entire tree in memory, so there are some scalability issues with extremely large XML documents.) And with SAX, you can’t really “navigate” a document, since you have only a stream of events, not a real structure. For that, you want DOM or JDOM.
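To make the "12 different tags" worry concrete, here is one hedged sketch of how a SAX handler might stay manageable as the tag count grows: keep a table of per-tag actions instead of a boolean field per tag. The TagDispatcher class, its on() method, and the sample address ian@example.com are all invented for this illustration, and the JDK's bundled parser is assumed. It is one possible arrangement, not the approach the example above takes.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical handler that looks up an action by tag name. */
class TagDispatcher extends DefaultHandler {

    private final Map<String, Consumer<String>> actions = new HashMap<>();
    private final StringBuilder text = new StringBuilder();

    /** Registers the action to run with the text of each given tag. */
    void on(String tag, Consumer<String> action) {
        actions.put(tag, action);
    }

    @Override
    public void startElement(String uri, String localName, String qName,
            Attributes attributes) {
        text.setLength(0);      // start collecting afresh for this element
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        text.append(ch, start, length);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        Consumer<String> action = actions.get(qName);
        if (action != null)
            action.accept(text.toString().trim());
        text.setLength(0);
    }

    /** Parses xml from a string and returns one line per name or email. */
    static List<String> list(String xml) {
        final List<String> out = new ArrayList<>();
        TagDispatcher dispatcher = new TagDispatcher();
        dispatcher.on("name", s -> out.add("Name: " + s));
        dispatcher.on("email", s -> out.add("Email: " + s));
        try {
            SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), dispatcher);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return out;
    }

    public static void main(String[] args) {
        // Made-up sample data, not the users.xml file from the text.
        String xml = "<people><person><name>Ian Darwin</name>"
            + "<email>ian@example.com</email></person></people>";
        list(xml).forEach(System.out::println);
    }
}
```

Adding a thirteenth tag then costs one more on() call, but the navigation problem remains: the handler still sees only a stream of events and has no tree to walk back through.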