To parse an XML document, you instantiate a javax.xml.parsers.SAXParserFactory object to obtain a SAX-based parser. This parser is then used to read the XML document a character at a time. (In the following code fragment, the document is obtained from a command-line argument.)
SAXParserFactory factory = SAXParserFactory.newInstance();
SAXParser saxParser = factory.newSAXParser();
DefaultHandler handler = new XMLParse();
saxParser.parse(new File(argv[0]), handler);
Your SAX parser class must extend the public class org.xml.sax.helpers.DefaultHandler. This class defines stub methods that receive notification (callbacks) when XML entities are parsed. By default, these methods do nothing, but they can be overridden to do anything you like. For example, a method called startElement() is invoked when the start tag for an element is recognized. This method receives the element's name and its attributes. The element's name is passed in one of the first three parameters to startElement() (see Table 16.6), depending on whether namespaces are being used.
In the following code example, handling for the qualified name is provided.
public void startElement(String uri, String localName,
                         String qualifiedName, Attributes attributes)
        throws SAXException {
    System.out.println("START ELEMENT " + qualifiedName);
    for (int i = 0; i < attributes.getLength(); i++) {
        System.out.println("ATTRIBUTE " + attributes.getQName(i)
                + " = " + attributes.getValue(i));
    }
}
This example prints out a statement indicating that a start tag has been parsed followed by a list of the attribute names and values.
A similar endElement() method is invoked when an end tag is encountered.
public void endElement(String uri, String localName, String qualifiedName)
        throws SAXException {
    System.out.println("END ELEMENT " + qualifiedName);
}
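If you want to try the two callbacks without a file on disk, the following is a minimal sketch that feeds an in-memory string to the parser. The class name ElementEchoDemo, the trace() helper, and the sample document are assumptions made for illustration; they are not the XMLParse class from Listing 16.9. The callbacks append to a buffer rather than printing directly, so the result can be inspected.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class; not the XMLParse class from Listing 16.9.
class ElementEchoDemo extends DefaultHandler {
    private final StringBuilder log = new StringBuilder();

    @Override
    public void startElement(String uri, String localName,
                             String qualifiedName, Attributes attributes) {
        log.append("START ELEMENT ").append(qualifiedName).append('\n');
        for (int i = 0; i < attributes.getLength(); i++) {
            log.append("ATTRIBUTE ").append(attributes.getQName(i))
               .append(" = ").append(attributes.getValue(i)).append('\n');
        }
    }

    @Override
    public void endElement(String uri, String localName, String qualifiedName) {
        log.append("END ELEMENT ").append(qualifiedName).append('\n');
    }

    // Parses an XML string and returns the lines recorded by the callbacks.
    static String trace(String xml) {
        try {
            SAXParser saxParser = SAXParserFactory.newInstance().newSAXParser();
            ElementEchoDemo handler = new ElementEchoDemo();
            saxParser.parse(new InputSource(new StringReader(xml)), handler);
            return handler.log.toString();
        } catch (Exception e) {
            // Wrapped so that callers of this sketch need no throws clause.
            throw new IllegalStateException("parse failed", e);
        }
    }

    public static void main(String[] args) {
        // Made-up sample document, not the book's jobSummary.xml.
        System.out.print(trace(
            "<job customer=\"winston\"><location>London</location></job>"));
    }
}
```

Because the factory is not namespace aware by default, the element names arrive in the qualifiedName parameter, which is why the sketch reads that parameter.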
The full parser is shown in Listing 16.9, but not all of the XML components will be handled. The default action for a parser is for all components to be ignored; only the methods that are overridden in the DefaultHandler subclass will process XML components. For a complete list of the other DefaultHandler methods, see Table 16.7 or refer to the J2SDK, v 1.4 API Specification.
The parser first checks for the XML document, the name of which is provided on the command line. After the SAXParserFactory is instantiated and the handler constructed, the XML file is parsed; that is all there is to it. This parser reports only the start and end of the document, the start and end of elements, and the characters that form the element bodies.
If an entity method is not declared in your parser, the entity is handled by the superclass DefaultHandler methods, the default action being to do nothing. Table 16.7 gives a full list of the callback DefaultHandler methods that can be implemented.
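The do-nothing default can be seen in a small sketch such as the following, which overrides only startDocument(), endDocument(), and characters(). The class name TextOnlyHandler, the trace() helper, and the sample document are hypothetical. Because startElement(), endElement(), and processingInstruction() are not overridden, the inherited DefaultHandler versions run and those components are silently ignored.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler that reacts to the document boundaries and text only.
class TextOnlyHandler extends DefaultHandler {
    private final StringBuilder log = new StringBuilder();
    private final StringBuilder text = new StringBuilder();

    @Override
    public void startDocument() {
        log.append("START DOCUMENT\n");
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // A parser may deliver one run of text in several calls,
        // so the pieces are collected and reported together later.
        text.append(ch, start, length);
    }

    @Override
    public void endDocument() {
        log.append("TEXT ").append(text.toString().trim()).append('\n');
        log.append("END DOCUMENT\n");
    }

    // Parses an XML string and returns what the overridden callbacks recorded.
    static String trace(String xml) {
        try {
            TextOnlyHandler handler = new TextOnlyHandler();
            SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
            return handler.log.toString();
        } catch (Exception e) {
            // Wrapped so that callers of this sketch need no throws clause.
            throw new IllegalStateException("parse failed", e);
        }
    }

    public static void main(String[] args) {
        // The processing instruction and the element tags produce no output.
        System.out.print(trace("<?display mode=\"plain\"?><note>hello</note>"));
    }
}
```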
As this code does not use any J2EE components, you can simply compile and run it from the command line. From the Day16/examples directory run the command:
> java -classpath classes XMLParse XML/jobSummary.xml
Or use the supplied asant build files and enter:
> asant XMLParse
Provide the filename XML/jobSummary.xml when prompted.
The output in Figure 16.1 is produced when this SAX parser is used on the jobSummary XML in Listing 16.4.
As you can see, the output is not very beautiful. You might like to improve it by adding indentation to the elements or even getting the output to look like the original XML.
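One possible way to add indentation is to keep a depth counter that grows in startElement() and shrinks in endElement(). The following is a sketch only: the class name IndentingHandler, the two-space indent, and the sample document are assumptions, and the code does not escape special characters in text or attribute values.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler that echoes the document with one level of
// indentation per nesting depth.
class IndentingHandler extends DefaultHandler {
    private final StringBuilder out = new StringBuilder();
    private final StringBuilder text = new StringBuilder();
    private int depth = 0;

    private void indent() {
        for (int i = 0; i < depth; i++) {
            out.append("  ");
        }
    }

    // Writes any buffered element text on its own indented line.
    private void flushText() {
        String t = text.toString().trim();
        text.setLength(0);
        if (t.length() > 0) {
            indent();
            out.append(t).append('\n');
        }
    }

    @Override
    public void startElement(String uri, String localName,
                             String qualifiedName, Attributes attributes) {
        flushText();
        indent();
        out.append('<').append(qualifiedName);
        for (int i = 0; i < attributes.getLength(); i++) {
            out.append(' ').append(attributes.getQName(i))
               .append("=\"").append(attributes.getValue(i)).append('"');
        }
        out.append(">\n");
        depth++;
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // Text may arrive in several pieces, so buffer it until the next tag.
        text.append(ch, start, length);
    }

    @Override
    public void endElement(String uri, String localName, String qualifiedName) {
        flushText();
        depth--;
        indent();
        out.append("</").append(qualifiedName).append(">\n");
    }

    // Parses an XML string and returns the indented rendering.
    static String format(String xml) {
        try {
            IndentingHandler handler = new IndentingHandler();
            SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
            return handler.out.toString();
        } catch (Exception e) {
            // Wrapped so that callers of this sketch need no throws clause.
            throw new IllegalStateException("parse failed", e);
        }
    }

    public static void main(String[] args) {
        // Made-up sample document, not the book's jobSummary.xml.
        System.out.print(format(
            "<job customer=\"winston\"><location>London</location></job>"));
    }
}
```

Buffering the text until the next tag is a design choice that keeps each run of element content on a single output line, even if the parser reports it in more than one characters() call.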
In addition to making this parser more robust, the following functionality could be added:
Scan element contents for special characters, such as those shown in a table, and replace them with the appropriate symbolic strings
Improve the handling of fatal parse errors (SAXParseException) with appropriate error messages giving error line numbers
Use the DefaultHandler error() and warning() methods to handle non-fatal parse errors
Configure the parser to be namespace aware with javax.xml.parsers.SAXParserFactory.setNamespaceAware(true), so that you can detect tags from multiple sources
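The suggestions in this list could be approached along the following lines. This is a sketch under stated assumptions, not a finished parser: the class name RobustParseDemo, the escape() and parse() helpers, the message formats, and the urn:example:jobs namespace are all hypothetical. The escape() helper covers the five predefined XML entity references, fatal errors are caught as SAXParseException so that the line number can be reported, and error() and warning() are overridden to record non-fatal problems.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class combining the suggested improvements.
class RobustParseDemo extends DefaultHandler {
    private final StringBuilder log = new StringBuilder();

    // Replaces the five XML special characters with their entity references.
    static String escape(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '&':  sb.append("&amp;");  break;
                case '<':  sb.append("&lt;");   break;
                case '>':  sb.append("&gt;");   break;
                case '"':  sb.append("&quot;"); break;
                case '\'': sb.append("&apos;"); break;
                default:   sb.append(c);
            }
        }
        return sb.toString();
    }

    @Override
    public void startElement(String uri, String localName,
                             String qualifiedName, Attributes attributes) {
        // With namespace awareness on, uri and localName are populated.
        log.append("START ELEMENT {").append(uri).append('}')
           .append(localName).append('\n');
    }

    // Non-fatal problems; errors are typically reported only when
    // validation is enabled.
    @Override
    public void warning(SAXParseException e) {
        log.append("WARNING line ").append(e.getLineNumber())
           .append(": ").append(e.getMessage()).append('\n');
    }

    @Override
    public void error(SAXParseException e) {
        log.append("ERROR line ").append(e.getLineNumber())
           .append(": ").append(e.getMessage()).append('\n');
    }

    // Parses an XML string and returns everything that was recorded.
    static String parse(String xml) {
        RobustParseDemo handler = new RobustParseDemo();
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true);
            factory.newSAXParser()
                   .parse(new InputSource(new StringReader(xml)), handler);
        } catch (SAXParseException e) {
            // Fatal parse error: report where it happened, then stop.
            handler.log.append("FATAL line ").append(e.getLineNumber())
                       .append(": ").append(e.getMessage()).append('\n');
        } catch (Exception e) {
            handler.log.append("FAILED: ").append(e).append('\n');
        }
        return handler.log.toString();
    }

    public static void main(String[] args) {
        System.out.println(escape("a < b & c"));
        System.out.print(parse("<j:job xmlns:j=\"urn:example:jobs\"/>"));
        // Mismatched tags trigger a fatal error.
        System.out.print(parse("<job>\n<location>\n</job>"));
    }
}
```

The wording of the fatal error message depends on the parser implementation, so the sketch passes it through unchanged.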
Having seen a simple SAX parser, you will now build a parser application that uses the DOM API.