Parsing XML using SAX

The code examples in this section are written using JAXP 1.1, which supports SAX 2.0.

To parse an XML document, you instantiate a javax.xml.parsers.SAXParserFactory object to obtain a SAX-based parser. This parser is then used to read the XML document a character at a time.

SAXParserFactory factory = SAXParserFactory.newInstance();
SAXParser saxParser = factory.newSAXParser();

DefaultHandler handler = new XMLParse();
saxParser.parse( new File(argv[0]), handler );

Your SAX parser class (the handler passed to parse()) must extend the public class org.xml.sax.helpers.DefaultHandler. This class defines stub methods that receive notifications (callbacks) as parts of the XML document are parsed. By default, these methods do nothing, but you can override them to do anything you like. For example, startElement() is invoked when the start tag for an element is recognized. This method receives the element's name and its attributes. The element's name can be passed in any of the first three parameters to startElement() (see Table 16.6), depending on whether namespaces are being used.

Table 16.6. Parameters to the startElement() Method
Parameter Contents
uri The namespace URI, or the empty string if the element has no namespace URI or if namespace processing is not being performed
localName The element name without the namespace prefix; a non-empty string when namespace processing is being performed
qualifiedName The element name with namespace prefix
attributes The element's attributes
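
To see how the three name parameters in Table 16.6 change with namespace processing, you could parse the same document twice, once with namespace awareness switched on and once without. The following is only a sketch: the class name NameParamsDemo, the sample document, and its namespace URI are invented for illustration, and the exact values reported when namespace processing is off can vary between parser implementations.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class NameParamsDemo {

    // Hypothetical sample document; the namespace URI is made up.
    static final String XML =
        "<j:job xmlns:j=\"http://example.com/jobs\"><j:skill/></j:job>";

    // Parses XML and records the three name parameters of each start tag.
    static List<String> collect(boolean namespaceAware) {
        final List<String> seen = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qualifiedName, Attributes attributes) {
                seen.add("uri=[" + uri + "] localName=[" + localName
                        + "] qualifiedName=[" + qualifiedName + "]");
            }
        };
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(namespaceAware);
            factory.newSAXParser()
                   .parse(new InputSource(new StringReader(XML)), handler);
        } catch (Exception ex) {
            // Wrap checked parser exceptions to keep the sketch short.
            throw new IllegalStateException(ex);
        }
        return seen;
    }

    public static void main(String[] args) {
        System.out.println("Namespace aware:");
        for (String s : collect(true)) {
            System.out.println("  " + s);
        }
        System.out.println("Not namespace aware:");
        for (String s : collect(false)) {
            System.out.println("  " + s);
        }
    }
}
```

With namespace awareness on, the first start tag should report the namespace URI together with the local name job. With it off, the qualified name j:job is the only parameter you can rely on, which is why the examples in this section use the qualified name.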

In the following code example, handling for the qualified name is provided.

public void startElement(String uri, String localName, String qualifiedName,
                         Attributes attributes) throws SAXException {
    System.out.println ("START ELEMENT " + qualifiedName);
    for (int i = 0; i < attributes.getLength(); i++) {
        System.out.println ("ATTRIBUTE " + attributes.getQName(i) + " = "
                            + attributes.getValue(i));
    }
}

This example prints out a statement indicating that a start tag has been parsed followed by a list of the attribute names and values.

A similar endElement() method is invoked when an end tag is encountered.

public void endElement(String uri, String localName, String qualifiedName)
                throws SAXException {
    System.out.println ("END ELEMENT " + qualifiedName);
}

In the parser in Listing 16.9, not all the XML components will be handled. The default action is for components to be ignored. For a complete list of the other DefaultHandler methods, see Table 16.7 or refer to the Java 2 Platform, Enterprise Edition, v 1.3 API Specification.

The parser first checks for the XML document, the name of which is provided on the command line (lines 9–12). After constructing the handler (line 13) and instantiating the SAXParserFactory (line 14), the XML file is parsed on line 17. That is all there is to it. Lines 32–57 are where the handler routines are defined. This parser reports only the start and end of the document, the start and end of each element, and the characters that form the element bodies.

The complete listing for the SAX Parser is shown in Listing 16.9.

Listing 16.9. Simple SAX Parser
 1: import java.io.*;
 2: import org.xml.sax.*;
 3: import org.xml.sax.helpers.DefaultHandler;
 4: import javax.xml.parsers.*;
 5:
 6: public class XMLParse extends DefaultHandler {
 7:
 8:     public static void main(String argv[]) {
 9:         if (argv.length != 1) {
10:             System.err.println("Usage: XMLParse filename");
11:             System.exit(1);
12:         }
13:         DefaultHandler handler = new XMLParse();
14:         SAXParserFactory factory = SAXParserFactory.newInstance();
15:         try {
16:             SAXParser saxParser = factory.newSAXParser();
17:             saxParser.parse( new File(argv[0]), handler );
18:         }
19:         catch (ParserConfigurationException ex) {
20:             System.err.println ("Failed to create SAX parser:" + ex);
21:         }
22:         catch (SAXException ex) {
23:             System.err.println ("SAX parser exception:" + ex);
24:         }
25:         catch (IOException ex) {
26:             System.err.println ("IO exception:" + ex);
27:         }
28:         catch (IllegalArgumentException ex) {
29:             System.err.println ("Invalid file argument" + ex);
30:         }
31:     }
32:     public void startDocument() throws SAXException {
33:         System.out.println ("START DOCUMENT");
34:     }
35:
36:     public void endDocument() throws SAXException {
37:         System.out.println ("END DOCUMENT");
38:     }
39:
40:     public void startElement(String uri, String localName, String qualifiedName, Attributes attributes)
41:                   throws SAXException {
42:         System.out.println ("START ELEMENT " + qualifiedName);
43:         for (int i = 0; i < attributes.getLength(); i++) {
44:             System.out.println ("ATTRIBUTE " + attributes.getQName(i) + " = " + attributes.getValue(i));
45:         }
46:     }
47:
48:     public void endElement(String uri, String localName, String qualifiedName) throws SAXException {
49:         System.out.println ("END ELEMENT " + qualifiedName);
50:     }
51:
52:     public void characters(char[] ch, int start, int length) throws SAXException {
53:         if (length > 0) {
54:             String buf = new String (ch, start, length);
55:             System.out.println ("CONTENT " + buf);
56:         }
57:     }
58: }
					

As already stated, lines 32–57 are the handler callback methods, each of which is called when the corresponding part of the XML document is parsed. If your parser class does not override a callback method, the event is handled by the inherited DefaultHandler method, whose default action is to do nothing (the exception is fatalError(), which by default throws the SAXParseException it receives). Table 16.7 lists the DefaultHandler callback methods that can be implemented.

Table 16.7. SAX DefaultHandler Methods
Method Receives Notification of
characters(char[] ch, int start, int length) Character data inside an element.
startDocument() Beginning of the document.
endDocument() End of the document.
startElement(String uri, String localName, String qName, Attributes attributes) Start of an element.
endElement(String uri, String localName, String qName) End of an element.
startPrefixMapping (String prefix, String uri) Start of a namespace mapping.
endPrefixMapping (String prefix) End of a namespace mapping.
error(SAXParseException e) A recoverable parser error.
fatalError(SAXParseException e) A fatal XML parsing error.
warning(SAXParseException e) Parser warning.
ignorableWhitespace(char[] ch, int start, int length) Whitespace in the element contents.
notationDecl(String name, String publicId, String systemId) Notation declaration.
processingInstruction (String target, String data) A processing instruction.
resolveEntity(String publicId, String systemId) An external entity.
skippedEntity(String name) A skipped entity. Processors may skip entities if they have not seen the declarations (for example, when the entity was declared in an external DTD).
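
One detail of the characters() callback in Table 16.7 is worth knowing: a SAX parser is allowed to deliver the text of a single element in several calls rather than one, and in practice this often happens around entity references such as &amp;. If you need the whole element body as one string, a common approach is to accumulate the chunks and use them in endElement(). The following is a minimal sketch of that idea; the class name TextCollector and the sample document are assumptions made for illustration, and the sketch deliberately ignores mixed content (text interleaved with child elements).

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class TextCollector extends DefaultHandler {
    // Text gathered for the element currently being read.
    private final StringBuilder text = new StringBuilder();
    // One "name = content" line per element that has text.
    private final StringBuilder report = new StringBuilder();

    @Override
    public void startElement(String uri, String localName,
                             String qualifiedName, Attributes attributes) {
        text.setLength(0);   // start a fresh buffer for this element
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // May be called more than once per element body, so append.
        text.append(ch, start, length);
    }

    @Override
    public void endElement(String uri, String localName, String qualifiedName) {
        String content = text.toString().trim();
        if (!content.isEmpty()) {
            report.append(qualifiedName).append(" = ")
                  .append(content).append('\n');
        }
        text.setLength(0);
    }

    // Parses the given XML text and returns the collected report.
    static String parse(String xml) {
        TextCollector handler = new TextCollector();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception ex) {
            // Wrap checked parser exceptions to keep the sketch short.
            throw new IllegalStateException(ex);
        }
        return handler.report.toString();
    }

    public static void main(String[] args) {
        // Hypothetical sample input, not the chapter's jobSummary document.
        System.out.print(parse(
            "<job><location>London &amp; Leeds</location></job>"));
    }
}
```

Because the buffer is only read in endElement(), the result is the same however the parser chooses to split the text.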

This parser can be invoked simply from the command line:

> java XMLParse jobSummary.xml

The output in Figure 16.1 is produced when this SAX parser is used on the jobSummary XML in Listing 16.4.

Figure 16.1. SAX parser output.


As you can see, the output is not very beautiful. You might like to improve it by adding indentation to the elements or even getting the output to look like the original XML.
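
As one way of adding the indentation suggested above, the handler can keep a depth counter that is incremented in startElement() and decremented in endElement(). This is a sketch only: the class name IndentingHandler, the two-space indent, and the decision to drop whitespace-only content are choices made for this example, not part of Listing 16.9.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class IndentingHandler extends DefaultHandler {
    private final StringBuilder out = new StringBuilder();
    private int depth = 0;   // current element nesting level

    // Appends one line, indented two spaces per nesting level.
    private void line(String s) {
        for (int i = 0; i < depth; i++) {
            out.append("  ");
        }
        out.append(s).append('\n');
    }

    @Override
    public void startElement(String uri, String localName,
                             String qualifiedName, Attributes attributes) {
        line("START ELEMENT " + qualifiedName);
        depth++;
    }

    @Override
    public void endElement(String uri, String localName, String qualifiedName) {
        depth--;
        line("END ELEMENT " + qualifiedName);
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        String s = new String(ch, start, length).trim();
        if (!s.isEmpty()) {          // skip whitespace-only chunks
            line("CONTENT " + s);
        }
    }

    // Parses the given XML text and returns the indented trace.
    static String format(String xml) {
        IndentingHandler handler = new IndentingHandler();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception ex) {
            // Wrap checked parser exceptions to keep the sketch short.
            throw new IllegalStateException(ex);
        }
        return handler.out.toString();
    }

    public static void main(String[] args) {
        // Hypothetical sample input.
        System.out.print(format("<job><skill>Cigar maker</skill></job>"));
    }
}
```

The same counter could be used to emit real start and end tags instead of the START ELEMENT and END ELEMENT labels if you want output that resembles the original XML.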

In addition to making this parser more robust, the following functionality could be added:

  • Scan element contents for the XML special characters (such as & and <) and replace them with the appropriate symbolic strings

  • Improve the handling of fatal parse errors (SAXParseException) with appropriate error messages giving error line numbers

  • Use the DefaultHandler error() and warning() methods to handle non-fatal parse errors

  • Configure the parser to be namespace aware with javax.xml.parsers.SAXParserFactory.setNamespaceAware(true), so that you can detect tags from multiple sources
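
The error-handling suggestions in this list could be approached along the following lines. SAXParseException provides getLineNumber(), getColumnNumber(), and getMessage(), so each of the three error callbacks can build a message that locates the problem. This is a sketch under stated assumptions: the class name ReportingHandler and the message format are invented for the example, and because validation is not switched on here, error() and warning() are unlikely to be called for most documents.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

class ReportingHandler extends DefaultHandler {
    // Messages collected while parsing.
    final List<String> problems = new ArrayList<>();

    private static String describe(String level, SAXParseException e) {
        return level + " at line " + e.getLineNumber()
                + ", column " + e.getColumnNumber() + ": " + e.getMessage();
    }

    @Override
    public void warning(SAXParseException e) {
        problems.add(describe("WARNING", e));
    }

    @Override
    public void error(SAXParseException e) {
        problems.add(describe("ERROR", e));   // recoverable, keep parsing
    }

    @Override
    public void fatalError(SAXParseException e) throws SAXException {
        problems.add(describe("FATAL", e));
        throw e;                              // parsing cannot continue
    }

    // Parses the given XML text and returns any problems reported.
    static List<String> check(String xml) {
        ReportingHandler handler = new ReportingHandler();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (SAXParseException e) {
            // Already recorded by fatalError(); nothing more to do.
        } catch (Exception e) {
            // Configuration or I/O failure: outside the scope of this sketch.
            throw new IllegalStateException(e);
        }
        return handler.problems;
    }

    public static void main(String[] args) {
        // Hypothetical malformed input: <b> is never closed.
        for (String p : check("<a>\n<b>\n</a>")) {
            System.out.println(p);
        }
    }
}
```

In the parser from Listing 16.9, the same three methods could be added directly to XMLParse, with the messages written to System.err instead of being collected in a list.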

You will now build a parser application that uses the DOM API.
