SAX is a simple, event-based application programming interface (API) for XML parsers that was developed by a group of developers who subscribe to the xml-dev mailing list. David Megginson (http://www.megginson.com) spearheaded the effort, gained consensus on design, and wrote the code. The API makes extensive use of Java interfaces as registered callbacks to an XML parser supplied by a third party. The SAX interface is event-based in that it transforms the parsing of an XML document into the invocation (or firing) of a specific method (the type of event) with its associated parameters (the specific state of the event). So an event has two components: a name (the method name) and an associated state (the method parameters).
Note
To subscribe to the xml-dev mailing list, send an email message to [email protected] with the following message: "subscribe xml-dev."
SAX is broken into four parts:
Interfaces implemented by an XML Parser (org.xml.sax package)— Parser, AttributeList, and Locator. Parser and AttributeList are required, while Locator is optional. SAX provides a default implementation for the Locator and AttributeList interfaces called LocatorImpl and AttributeListImpl, respectively.
Interfaces implemented by your application (org.xml.sax package)— DocumentHandler, ErrorHandler, DTDHandler, and EntityResolver. All these interfaces are optional.
Standard SAX classes (org.xml.sax package)— InputSource, SAXException, SAXParseException. These are fully implemented in SAX.
Helper classes (org.xml.sax.helpers package)— ParserFactory, AttributeListImpl, and LocatorImpl. These are fully implemented in SAX.
Before examining each interface, a simple example demonstrating the event-based interface of SAX will be useful. Listing 2.3 is a simple SAX tester that prints out what methods were called and the parameters passed in to those methods by the parser's SAX driver. The code in bold highlights the key actions your program needs to perform to receive SAX events from a SAX Driver. These steps are covered in detail following the listing.
/** SaxTester.java */
package sams.chp2;

import java.io.*;
import java.net.*;

import org.xml.sax.*;
import org.xml.sax.helpers.*;

public class SaxTester
{
    public static void main(String args[])
    {
        if (args.length < 1)
        {
            System.out.println("USAGE: java -Dorg.xml.sax.parser=<classname> "
                               + "sams.chp2.SaxTester <document>");
            System.exit(1);
        }

        try
        {
            File f = new File(args[0]);

            // create a SAX input source
            InputSource is = new InputSource(f.toURL().toString());

            // instantiate a SAX driver
            Parser sax = ParserFactory.makeParser();

            // create our test handler
            TestHandler handler = new TestHandler();

            // register a document handler
            sax.setDocumentHandler(handler);

            // register the DTD handler
            sax.setDTDHandler(handler);

            // register the entity resolver
            sax.setEntityResolver(handler);

            // register the error handler
            sax.setErrorHandler(handler);

            // start the parsing!
            sax.parse(is);
        } catch (Throwable t)
        {
            t.printStackTrace();
        }
    }
}

class TestHandler extends HandlerBase
{
    /** Locator reference. */
    Locator loc;

    /** method of the DocumentHandler Interface. */
    public void characters(char[] ch, int start, int length)
    {
        // Receive notification of character data inside an element.
        System.out.println("Called characters(ch:" + new String(ch, start, length)
                           + ",start:" + start + ",length: " + length + ")");
    }

    /** method of the DocumentHandler Interface. */
    public void endDocument()
    {
        // Receive notification of the end of the document.
        System.out.println("Called endDocument()");
    }

    /** method of the DocumentHandler Interface. */
    public void endElement(java.lang.String name)
    {
        // Receive notification of the end of an element.
        System.out.println("Called endElement(name: " + name + ")");
    }

    /** method of the DocumentHandler Interface. */
    public void ignorableWhitespace(char[] ch, int start, int length)
    {
        // Notification of ignorable whitespace in element content.
        System.out.println("Called ignorableWhitespace(ch:" + new String(ch, start, length)
                           + ",start: " + start + ",length: " + length + ")");
    }

    /** method of the DocumentHandler Interface. */
    public void processingInstruction(java.lang.String target,
                                      java.lang.String data)
    {
        // Receive notification of a processing instruction.
        System.out.println("Called processingInstruction(target:" + target +
                           ",data:" + data + ")");
    }

    /** method of the DocumentHandler Interface. */
    public void setDocumentLocator(Locator locator)
    {
        // Receive a Locator object for document events.
        System.out.println("Called setDocumentLocator()");
        loc = locator;
    }

    /** method of the DocumentHandler Interface. */
    public void startDocument()
    {
        // Receive notification of the beginning of the document.
        System.out.println("Called startDocument()");
    }

    /** method of the DocumentHandler Interface. */
    public void startElement(java.lang.String name, AttributeList attributes)
    {
        // Receive notification of the start of an element.
        System.out.println("Called startElement(name:" + name + ")");
        for (int i = 0; i < attributes.getLength(); i++)
        {
            String attName = attributes.getName(i);
            String type = attributes.getType(i);
            String value = attributes.getValue(i);
            System.out.println("att-name:" + attName + ",att-type:" + type
                               + ", att-value:" + value);
        }
        if (loc != null)
            System.out.println("In " + loc.getSystemId() + ",at line " +
                               loc.getLineNumber() + " and col " +
                               loc.getColumnNumber());
    }

    /** method of the DTDHandler Interface. */
    public void unparsedEntityDecl(java.lang.String name,
                                   java.lang.String publicId, java.lang.String systemId,
                                   java.lang.String notationName)
    {
        // Receive notification of an unparsed entity declaration.
        System.out.println("Called unparsedEntityDecl");
        System.out.println(name);
        System.out.println(publicId);
        System.out.println(systemId);
        System.out.println(notationName);
    }

    /** method of the DTDHandler Interface. */
    public void notationDecl(java.lang.String name,
                             java.lang.String publicId, java.lang.String systemId)
    {
        // Receive notification of a notation declaration.
        System.out.println("Called notationDecl(name: " + name +
                           ",publicId: " + publicId +
                           ",systemId: " + systemId + ")");
    }

    /** method of the EntityResolver Interface. */
    public InputSource resolveEntity(java.lang.String publicId,
                                     java.lang.String systemId)
    {
        // Resolve an external entity.
        System.out.println("Called resolveEntity(publicId:" + publicId +
                           ",systemId:" + systemId + ")");
        InputSource is = null;
        if (systemId != null)
        {
            // create a SAX input source
            File f = new File(systemId);
            try
            {
                is = new InputSource(f.toURL().toString());
            } catch (MalformedURLException mfue)
            {
                // ignored: is stays null and is returned as-is
            }
        }
        else
            is = new InputSource(new StringReader("Unknown Entity"));

        return is;
    }

    /** method of the ErrorHandler Interface. */
    public void error(SAXParseException e)
    {
        // Receive notification of a recoverable parser error.
        System.out.println("Called error(e:" + e + ")");
        if (loc != null)
            System.out.println("In " + loc.getSystemId() + ",at line " +
                               loc.getLineNumber() + " and col " +
                               loc.getColumnNumber());
        e.printStackTrace();
    }

    /** method of the ErrorHandler Interface. */
    public void fatalError(SAXParseException e)
    {
        // Report a fatal XML parsing error.
        System.out.println("Called fatalError(e:" + e + ")");
        if (loc != null)
            System.out.println("In " + loc.getSystemId() + ",at line " +
                               loc.getLineNumber() + " and col " +
                               loc.getColumnNumber());
        e.printStackTrace();
    }

    /** method of the ErrorHandler Interface. */
    public void warning(SAXParseException e)
    {
        // Receive notification of a parser warning.
        System.out.println("Called warning()");
        if (loc != null)
            System.out.println("In " + loc.getSystemId() + ",at line " +
                               loc.getLineNumber() + " and col " +
                               loc.getColumnNumber());
        e.printStackTrace();
    }
}
When running SaxTester, you pass in the class name of the SAX-compliant parser by setting the Java system property org.xml.sax.parser. The following is an example of the command line:
C:>java -Dorg.xml.sax.parser=com.ibm.xml.parser.SAXDriver sams.chp2.SaxTester myaddresses.xml
Running the program produces the following output (abridged for brevity):
Called setDocumentLocator()
Called startDocument()
Called resolveEntity(publicId:null, systemId:file:/C:/synergysolutions/Xml-in-Java/sams/chp2/abml.dtd)
Called startElement(name:ADDRESS_BOOK)
In file:/C:/synergysolutions/Xml-in-Java/sams/chp2/myaddresses.xml,at line 3 and col 15
Called ignorableWhitespace(ch:,start: 80,length: 0)
Called ignorableWhitespace(ch:
,start: 0,length: 1)
Called ignorableWhitespace(ch: ,start: 82,length: 1)
Called startElement(name:ADDRESS)
In file:/C:/synergysolutions/Xml-in-Java/sams/chp2/myaddresses.xml,at line 4 and col 11
Called ignorableWhitespace(ch:,start: 92,length: 0)
Called ignorableWhitespace(ch:
,start: 0,length: 1)
Called ignorableWhitespace(ch: ,start: 94,length: 2)
Called startElement(name:NAME)
In file:/C:/synergysolutions/Xml-in-Java/sams/chp2/myaddresses.xml,at line 5 and col 9
Called characters(ch:Michael Daconta ,start:102,length: 16)
Called endElement(name: NAME)
...
Called endElement(name: ADDRESS_BOOK)
Called endDocument()
This output should give you a good understanding of how the SAX API works. The SAX-compliant parser calls methods such as startDocument(), startElement(), endElement(), and endDocument() on your handler class. In the simple program in Listing 2.3, we created a handler class called TestHandler that extended org.xml.sax.HandlerBase. The HandlerBase class implements all the handler interfaces: DocumentHandler, ErrorHandler, DTDHandler, and EntityResolver. We then registered our interface handler with the parser using the setXXXHandler methods.
Before moving on to the specific SAX interfaces, let's examine the general steps used in the SaxTester program to receive the events from the SAX-compliant parser (also called a SAX driver). There are four steps:
Create an InputSource class— An XML parser requires an XML input source. The org.xml.sax.InputSource class is a single abstraction that represents multiple methods for identifying an input source to an XML parser. You can create an InputSource class by providing a character stream, a byte stream, or a URI as a string. In SaxTester.java, I created the input source using a file URI. Another possible way would be to use a FileReader, as shown in the following:
FileReader rdr = new FileReader(filename);
InputSource insrc = new InputSource(rdr);
Instantiate a SAX-compliant parser— There are two ways to do this. The most direct way is to instantiate a specific SAX-compliant parser with the new operator. However, this locks your application into using only that implementation. A more flexible method is to use the helper class called ParserFactory. The org.xml.sax.helpers.ParserFactory class has a makeParser() method that uses reflection to instantiate the class named in the org.xml.sax.parser system property. Remember that we set this on the command line with the -D option to the Java virtual machine (JVM). In SaxTester.java, I created a parser using the ParserFactory class.
Register the handler classes— A SAX-compliant parser is required to implement mutator methods (also called setters) for the following interfaces: DocumentHandler, ErrorHandler, DTDHandler, and EntityResolver. As a result, every SAX-compliant parser has a setDocumentHandler(), setErrorHandler(), setDTDHandler(), and setEntityResolver() method that accepts a reference to an object that implements that interface. In SaxTester.java, we have one class that implements all the interfaces, so we pass a reference to it into each method.
Parse the document— Call the parser's parse() method, passing in the InputSource. The parser invokes the registered handler methods as it works through the document. In SaxTester.java, this is the call to sax.parse(is).
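The four steps can be sketched in a few lines. To stay runnable without setting the org.xml.sax.parser property, this sketch makes an assumption that is not part of the text above: it obtains the SAX Parser through JAXP (javax.xml.parsers.SAXParserFactory and SAXParser.getParser()) instead of ParserFactory.makeParser(). The class names FourStepsDemo and CountingHandler and the sample document are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.AttributeList;
import org.xml.sax.HandlerBase;
import org.xml.sax.InputSource;
import org.xml.sax.Parser;

// Hypothetical demo class. The SAX 1.0 types used here are deprecated in
// current JDKs, so expect deprecation warnings at compile time.
class FourStepsDemo {

    /** Handler that only counts start tags; every other event is inherited. */
    static class CountingHandler extends HandlerBase {
        int elements;

        @Override
        public void startElement(String name, AttributeList attributes) {
            elements++;
        }
    }

    /** Runs the four steps against an in-memory document. */
    static int countElements(String xml) {
        try {
            // Step 1: create an InputSource (here from a character stream).
            InputSource source = new InputSource(new StringReader(xml));

            // Step 2: instantiate a parser. Assumption: JAXP stands in for
            // ParserFactory.makeParser() so no system property is needed.
            Parser parser = SAXParserFactory.newInstance().newSAXParser().getParser();

            // Step 3: register the handler for each interface it implements.
            CountingHandler handler = new CountingHandler();
            parser.setDocumentHandler(handler);
            parser.setDTDHandler(handler);
            parser.setEntityResolver(handler);
            parser.setErrorHandler(handler);

            // Step 4: start parsing; the events arrive during this call.
            parser.parse(source);
            return handler.elements;
        } catch (Exception e) {
            throw new IllegalStateException("parse failed", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<ADDRESS_BOOK><ADDRESS><NAME>Michael Daconta</NAME></ADDRESS></ADDRESS_BOOK>";
        System.out.println("elements: " + countElements(xml));
    }
}
```

With a parser class named on the command line, step 2 would instead be the ParserFactory.makeParser() call used in SaxTester.java; the other three steps stay the same.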
Now that we have covered the general process, I will discuss and demonstrate each handler interface.
DocumentHandler is the primary interface your application will implement to receive events from the SAX-compliant parser. If you do not want to implement all methods in this interface, you can have your class extend HandlerBase and override just the events in which you are interested. The order of events you receive will correspond to the order of markup in the XML document you are parsing. Listing 2.4 is the complete DocumentHandler interface.
package org.xml.sax;

public interface DocumentHandler
{
    public abstract void setDocumentLocator (Locator locator);

    public abstract void startDocument ()
        throws SAXException;

    public abstract void endDocument ()
        throws SAXException;

    public abstract void startElement (String name, AttributeList atts)
        throws SAXException;

    public abstract void endElement (String name)
        throws SAXException;

    public abstract void characters (char ch[], int start, int length)
        throws SAXException;

    public abstract void ignorableWhitespace (char ch[], int start, int length)
        throws SAXException;

    public abstract void processingInstruction (String target, String data)
        throws SAXException;
}
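As a small illustration of overriding only the events of interest, and of events arriving in document order, the following sketch records four of the DocumentHandler events and inherits the rest from HandlerBase. It assumes a JAXP-provided parser (SAXParser.parse(InputSource, HandlerBase)), which the text above does not use; EventOrderDemo and RecordingHandler are hypothetical names.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.AttributeList;
import org.xml.sax.HandlerBase;
import org.xml.sax.InputSource;

// Hypothetical demo class; SAX 1.0 types are deprecated in current JDKs.
class EventOrderDemo {

    /** Overrides four events; the remaining handler methods are inherited no-ops. */
    static class RecordingHandler extends HandlerBase {
        final List<String> events = new ArrayList<String>();

        @Override
        public void startDocument() { events.add("startDocument"); }

        @Override
        public void endDocument() { events.add("endDocument"); }

        @Override
        public void startElement(String name, AttributeList atts) { events.add("start:" + name); }

        @Override
        public void endElement(String name) { events.add("end:" + name); }
    }

    /** Parses the given XML text and returns the recorded events in arrival order. */
    static List<String> eventsFor(String xml) {
        RecordingHandler handler = new RecordingHandler();
        try {
            // Assumption: a JAXP parser drives the SAX 1.0 handler.
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException("parse failed", e);
        }
        return handler.events;
    }

    public static void main(String[] args) {
        for (String event : eventsFor("<a><b>hi</b></a>")) {
            System.out.println(event);
        }
    }
}
```

The start and end events nest exactly as the tags do in the document, which is what lets an application track where it is in the markup.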
I will first examine the purpose of each method (event) in the interface (see Table 2.1) and then demonstrate an example that uses the interface.
Now that we understand what methods a SAX parser calls and the parameters it passes into those methods, we need an example of how to translate those events into objects an application can use. Let's again parse our myaddresses.xml file, but this time convert address elements into address objects. Listing 2.5, AbmlParser.java, does just that.
/* AbmlParser.java */
package sams.chp2;

import java.util.*;
import java.io.*;

import org.xml.sax.*;
import org.xml.sax.helpers.*;

public class AbmlParser
{
    public final static boolean debug;

    static
    {
        String strDebug = System.getProperty("DEBUG");
        if (strDebug == null)
            strDebug = System.getProperty("debug");

        if (strDebug != null && strDebug.equalsIgnoreCase("true"))
            debug = true;
        else
            debug = false;
    }

    private Parser saxParser;

    private AbmlHandler docHandler;

    class AbmlHandler implements org.xml.sax.DocumentHandler
    {
        /** locator object from parser. */
        private Locator loc;

        /** Vector of addresses. */
        private Vector addresses;

        /** Current element parsed. */
        private int currentElement;

        /** current Address */
        private Address currentAddress;

        /** accessor method. */
        public final Vector getAddresses() { return addresses; }

        /** method of the DocumentHandler Interface. */
        public void characters(char[] ch, int start, int length)
        {
            // Receive notification of character data inside an element.
            if (debug) System.out.println("Called characters(ch:" +
                                          new String(ch, start, length) +
                                          ",start:" + start + ",length: " + length + ")");

            if (currentAddress == null)
                return; // parser will catch this well-formedness error.

            String s = new String(ch, start, length);
            switch (currentElement)
            {
                case Address.NAME:
                    currentAddress.setName(s);
                    break;
                case Address.STREET:
                    Vector v = currentAddress.getStreets();
                    if (v == null)
                        v = new Vector();
                    v.addElement(s);
                    currentAddress.setStreets(v);
                    break;
                case Address.CITY:
                    currentAddress.setCity(s);
                    break;
                case Address.STATE:
                    currentAddress.setState(s);
                    break;
                case Address.ZIP:
                    currentAddress.setZip(s);
                    break;
            }
        }

        /** method of the DocumentHandler Interface. */
        public void startDocument()
        {
            // Receive notification of the beginning of the document.
            if (debug) System.out.println("Called startDocument()");

            // initialize the vector
            addresses = new Vector();
        }

        /** method of the DocumentHandler Interface. */
        public void endDocument()
        {
            // Receive notification of the end of the document.
            if (debug) System.out.println("Called endDocument()");
        }

        /** method of the DocumentHandler Interface. */
        public void startElement(java.lang.String name, AttributeList attributes)
        {
            // Receive notification of the start of an element.
            if (debug) System.out.println("Called startElement(name:" + name + ")");
            if (name.equals("ADDRESS"))
            {
                // create an Address object
                currentElement = Address.ADDRESS;
                currentAddress = new Address();
            }
            else if (name.equals("NAME"))
                currentElement = Address.NAME;
            else if (name.equals("STREET"))
                currentElement = Address.STREET;
            else if (name.equals("CITY"))
                currentElement = Address.CITY;
            else if (name.equals("STATE"))
                currentElement = Address.STATE;
            else if (name.equals("ZIP"))
                currentElement = Address.ZIP;
            else
                currentElement = -1;
        }

        /** method of the DocumentHandler Interface. */
        public void endElement(java.lang.String name)
        {
            // Receive notification of the end of an element.
            if (debug) System.out.println("Called endElement(name: " + name + ")");

            if (name.equals("ADDRESS"))
                addresses.addElement(currentAddress);
        }

        /** method of the DocumentHandler Interface. */
        public void ignorableWhitespace(char[] ch, int start, int length)
        {
            // Receive notification of ignorable whitespace in element content.
            if (debug) System.out.println("Called ignorableWhitespace(ch:"
                                          + new String(ch, start, length) +
                                          ",start: " + start + ",length: " + length + ")");
        }

        /** method of the DocumentHandler Interface. */
        public void processingInstruction(java.lang.String target,
                                          java.lang.String data)
        {
            // Receive notification of a processing instruction.
            if (debug) System.out.println("Called processingInstruction(target:"
                                          + target + ",data:" + data + ")");
        }

        /** method of the DocumentHandler Interface. */
        public void setDocumentLocator(Locator locator)
        {
            // Receive a Locator object for document events.
            if (debug) System.out.println("Called setDocumentLocator()");
            loc = locator;
        }
    }

    public AbmlParser() throws InstantiationException
    {
        try
        {
            saxParser = ParserFactory.makeParser();
        } catch (Exception e)
        {
            if (e instanceof InstantiationException)
                throw (InstantiationException) e;
            else
                throw new InstantiationException("Reason:" + e.toString());
        }

        docHandler = new AbmlHandler();
        saxParser.setDocumentHandler(docHandler);
    }

    /**
     * method to parse an abml file and return a Vector of Address objects.
     * @param is org.xml.sax.InputSource.
     * @return A Vector of sams.chp2.Address objects.
     */
    public Vector parse(InputSource is) throws SAXException
    {
        try
        {
            saxParser.parse(is);
        } catch (IOException ioe) { throw new SAXException(ioe.getMessage()); }

        return docHandler.getAddresses();
    }

    /** main() method for unit testing. */
    public static void main(String args[])
    {
        if (args.length < 1)
        {
            System.out.println("USAGE: java -Dorg.xml.sax.parser=<classname> "
                               + "sams.chp2.AbmlParser <document>");
            System.exit(1);
        }

        try
        {
            AbmlParser addressParser = new AbmlParser();
            File f = new File(args[0]);
            InputSource is = new InputSource(f.toURL().toString());
            Vector docAddresses = addressParser.parse(is);

            // how many addresses?
            int count = docAddresses.size();
            System.out.println("# of addresses: " + count);

            // print out the address names
            for (int i = 0; i < count; i++)
            {
                System.out.println("Address of: " +
                                   ((Address) docAddresses.elementAt(i)).getName());
            }
        } catch (Throwable t)
        {
            t.printStackTrace();
        }
    }
}
Listing 2.6 is the Address object that is instantiated by the AbmlParser.java program.
package sams.chp2;

import java.util.Vector;

public class Address
{
    public static final int ADDRESS = 1;
    public static final int NAME = 2;
    public static final int STREET = 3;
    public static final int CITY = 4;
    public static final int STATE = 5;
    public static final int ZIP = 6;

    private String name;
    private Vector streets = new Vector();
    private String city;
    private String state;
    private String zip;

    public String getName() { return name; }
    public void setName(String s) { name = s; }
    public Vector getStreets() { return streets; }
    public void setStreets(Vector v) { streets = v; }
    public String getCity() { return city; }
    public void setCity(String s) { city = s; }
    public String getState() { return state; }
    public void setState(String s) { state = s; }
    public String getZip() { return zip; }
    public void setZip(String s) { zip = s; }
}
Executing Listing 2.5 produces the following output:
C:\synergysolutions\Xml-in-Java\sams\chp2>java -Dorg.xml.sax.parser=com.ibm.xml.parsers.ValidatingSAXParser sams.chp2.AbmlParser myaddresses.xml
# of addresses: 2
Address of: Michael Daconta
Address of: Sterling Software
There are several key points to note about Listing 2.5:
The purpose of this program is to instantiate a Vector of Address objects given an XML file of Address elements.
The AbmlParser uses an inner class (called AbmlHandler) to implement org.xml.sax.DocumentHandler. The purpose of this is to make the handler object a part of the larger class.
The AbmlHandler class declares a reference to a Locator object to allow the SAX-compliant parser to set this variable. Though not used in this program, the Locator is used in error reporting, which is discussed in the next section.
The key idea behind the program is the creation of a simple finite state machine whereby SAX events trigger the appropriate state changes. The end state we want to reach is a fully populated Vector of Address objects. The key state changes are when to create an Address object, when to populate the fields, and when to store the Address object in the Vector. An Address object is created when the ADDRESS element is parsed (see the startElement() method). The fields of an Address object are populated in the characters() method by switching on the current subelement (a state that is set in the startElement() method when that subelement is reached). Lastly, the Address object is stored in the Vector at the end of the ADDRESS element (see the endElement() method). For uniform XML document types, such as the Address Book Markup Language, the Java Data Binding standard extension will automatically generate Java class files that map to an XML element.
The main() method of Listing 2.5 demonstrates the use of the AbmlParser. It is important to notice that the use of this parser mirrors the two steps for using a SAX parser—instantiate the parser and then call the parse() method. One important difference is that the AbmlParser.parse() method returns the resultant Vector of Address objects.
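One caveat about this state machine: SAX allows a parser to deliver the text of a single element in several characters() calls, so assigning a field directly inside characters() may keep only the last chunk. The following is a minimal sketch of one common refinement, which is not the approach Listing 2.5 takes: buffer the text and assign it in endElement(). NameCollector is a hypothetical class, and the JAXP parser lookup is an assumption made so the sketch runs without the org.xml.sax.parser property.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.AttributeList;
import org.xml.sax.HandlerBase;
import org.xml.sax.InputSource;

// Hypothetical handler that collects the text of every NAME element.
// SAX 1.0 types are deprecated in current JDKs.
class NameCollector extends HandlerBase {
    private final List<String> names = new ArrayList<String>();

    /** Non-null only while the parser is inside a NAME element. */
    private StringBuilder text;

    List<String> getNames() { return names; }

    @Override
    public void startElement(String name, AttributeList attributes) {
        // State change: entering NAME starts a fresh buffer.
        if ("NAME".equals(name)) {
            text = new StringBuilder();
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // Append rather than assign, so split chunks are not lost.
        if (text != null) {
            text.append(ch, start, length);
        }
    }

    @Override
    public void endElement(String name) {
        // State change: leaving NAME stores the complete, trimmed text.
        if ("NAME".equals(name) && text != null) {
            names.add(text.toString().trim());
            text = null;
        }
    }

    /** Parses the given XML text and returns the collected names. */
    static List<String> namesIn(String xml) {
        NameCollector handler = new NameCollector();
        try {
            // Assumption: a JAXP parser drives the SAX 1.0 handler.
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException("parse failed", e);
        }
        return handler.getNames();
    }

    public static void main(String[] args) {
        System.out.println(namesIn(
            "<ADDRESS_BOOK><ADDRESS><NAME> Michael Daconta </NAME></ADDRESS></ADDRESS_BOOK>"));
    }
}
```

Clearing the buffer in endElement() also keeps whitespace between elements from being mistaken for field content when a non-validating parser reports it through characters().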
Although Listing 2.5 performed the core operations for parsing an XML document and translating it into a usable Java object, it assumed that the XML instance documents contained no errors. Because this is not a wise assumption, I will now examine how SAX reports errors and how to handle them.
ErrorHandler is the interface you implement to customize error handling for your SAX application. A SAX-compliant parser is expected to report warnings and errors through this interface rather than throwing an exception itself, and your application can then determine whether it wants to throw an exception. Listing 2.7 is the ErrorHandler interface.
package org.xml.sax;

public interface ErrorHandler
{
    public abstract void warning (SAXParseException exception)
        throws SAXException;

    public abstract void error (SAXParseException exception)
        throws SAXException;

    public abstract void fatalError (SAXParseException exception)
        throws SAXException;
}
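Because each method is declared to throw SAXException, the application decides which reports end the parse. As a hedged illustration, the sketch below counts warnings but rethrows anything passed to error(), which stops parsing at the first validation problem. StrictHandlerDemo and StrictHandler are hypothetical names, and the validating JAXP parser (SAXParserFactory.setValidating(true)) is an assumption not taken from the text.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.HandlerBase;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

// Hypothetical demo class; SAX 1.0 types are deprecated in current JDKs.
class StrictHandlerDemo {

    static class StrictHandler extends HandlerBase {
        int warnings;

        @Override
        public void warning(SAXParseException e) {
            warnings++; // note the warning and let parsing continue
        }

        @Override
        public void error(SAXParseException e) throws SAXException {
            throw e; // application policy: treat recoverable errors as fatal
        }
    }

    /** Returns null if the document parsed cleanly, else the first error message. */
    static String firstError(String xml) {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setValidating(true); // assumption: validate against the DTD
            factory.newSAXParser()
                   .parse(new InputSource(new StringReader(xml)), new StrictHandler());
            return null;
        } catch (SAXException e) {
            return e.getMessage();
        } catch (Exception e) {
            throw new IllegalStateException("parser setup or I/O failed", e);
        }
    }

    public static void main(String[] args) {
        String invalid = "<!DOCTYPE a [<!ELEMENT a (b)>]><a><c/></a>";
        System.out.println("first error: " + firstError(invalid));
    }
}
```

A more forgiving application would leave error() as the inherited no-op, or log the report, and let the parse run to completion.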
Table 2.2 shows a breakdown and explanation of each method.
The lowest level of error reporting is a warning. Examples of when the parser reports a warning are when the document encoding is incorrect (but still recognized) and when redefining an internal entity. Listing 2.8 demonstrates redefining an internal entity.
<?xml version="1.0"?>
<!DOCTYPE ADDRESS_BOOK
[
<!ENTITY md "Michael Daconta">
<!ENTITY md "Medical Doctor">
]>
<ADDRESS_BOOK>
  <ADDRESS>
    <NAME> &md; </NAME>
    <STREET>4296 Razor Hill Road </STREET>
    <CITY>Bealeton </CITY>
    <STATE>VA </STATE>
    <ZIP>22712 </ZIP>
  </ADDRESS>
</ADDRESS_BOOK>
To demonstrate the reporting of this warning, I will use the SaxTester program displayed in Listing 2.3. We run the program using a non-validating parser from Sun Microsystems, Inc., which produces the following output:
C:/synergysolutions/Xml-in-Java/sams/chp2>java -Dorg.xml.sax.parser=com.ibm.xml.parsers.ValidatingSAXParser sams.chp2.SaxTester addrerr1.xml
Called setDocumentLocator()
Called startDocument()
Called warning()
In file:/C:/synergysolutions/Xml-in-Java/sams/chp2/addrerr1.xml,at line 5 and col -1
org.xml.sax.SAXParseException: Using original entity definition for "&md;".
        at com.sun.xml.parser.Parser.warning(Parser.java:2721)
        at com.sun.xml.parser.Parser.maybeEntityDecl(Parser.java:2276)
        at com.sun.xml.parser.Parser.maybeMarkupDecl(Parser.java:1165)
        at com.sun.xml.parser.Parser.maybeDoctypeDecl(Compiled Code)
        at com.sun.xml.parser.Parser.parseInternal(Compiled Code)
        at com.sun.xml.parser.Parser.parse(Parser.java:286)
        at sams.chp2.SaxTester.main(SaxTester.java:49)
Called startElement(name:ADDRESS_BOOK)
In file:/C:/synergysolutions/Xml-in-Java/sams/chp2/addrerr1.xml,at line 7 and col -1
Called characters(ch:,start:132,length: 0)
Called characters(ch:
,start:0,length: 1)
Called characters(ch: ,start:134,length: 1)
Called startElement(name:ADDRESS)
In file:/C:/synergysolutions/Xml-in-Java/sams/chp2/addrerr1.xml,at line 8 and col -1
Called characters(ch:,start:144,length: 0)
Called characters(ch:
,start:0,length: 1)
Called characters(^C
Called characters(ch: ,start:146,length: 16)
Called startElement(name:NAME)
...
The key thing to notice at the beginning of the output is that warning() is called. The warning() method prints the information provided by the Locator interface and by the SAXParseException that was passed into the method. The following is a repeat of the warning() method inside the TestHandler class of SaxTester.java.
/** method of the ErrorHandler Interface. */
public void warning(SAXParseException e)
{
    // Receive notification of a parser warning.
    System.out.println("Called warning()");
    if (loc != null)
        System.out.println("In " + loc.getSystemId() + ",at line " +
                           loc.getLineNumber() + " and col " +
                           loc.getColumnNumber());
    e.printStackTrace();
}
The Locator object (referred to by the reference loc in the previous source) is set in the setDocumentLocator() method of the DocumentHandler interface. All of the error reporting methods should use this interface to inform the user of where in the input XML file the parser encountered the error. From the Locator interface, we report what URL the error occurred in, the line, and column number. However, because it is not required that a SAX-compliant parser provide a Locator object, you must check if the reference is null before trying to use it. Three of the four methods available in the Locator interface are used in the warning() method. The only method not used is getPublicId(), which returns a public identifier if one is available. If your application receives a warning() from the parser, you should merely report this occurrence to the user (or log it if in a server application) and continue processing. It is important to note that the Locator object gives you information about all SAX events. While it can be used with errors, all the methods that exist in Locator also exist in the SAXParseException, which is passed to all of the ErrorHandler methods. So, instead of loc.getLineNumber(), I could have used e.getLineNumber(). The final thing to note about the handling of the warning() error is that the SAX events do not stop but continue even after the warning() method is called.
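Since the SAXParseException carries the same position information, an error-reporting helper does not need the Locator at all. The sketch below builds a location string from the exception alone; ErrorLocation and describe() are hypothetical names, and the sketch assumes the SAX convention that a line or column of -1 means the value is unavailable.

```java
import org.xml.sax.SAXParseException;

// Hypothetical helper that formats where a reported problem occurred.
class ErrorLocation {

    /** Builds "systemId, line N, col M: message", omitting unknown parts. */
    static String describe(SAXParseException e) {
        StringBuilder sb = new StringBuilder();
        sb.append(e.getSystemId() != null ? e.getSystemId() : "(unknown source)");
        if (e.getLineNumber() >= 0) {
            sb.append(", line ").append(e.getLineNumber());
        }
        if (e.getColumnNumber() >= 0) {
            sb.append(", col ").append(e.getColumnNumber());
        }
        return sb.append(": ").append(e.getMessage()).toString();
    }

    public static void main(String[] args) {
        // Values chosen to mirror the kind of report shown in this chapter.
        SAXParseException e = new SAXParseException(
            "Using original entity definition for \"&md;\".",
            null, "file:/addrerr1.xml", 5, -1);
        System.out.println(describe(e));
    }
}
```

Inside warning(), error(), or fatalError(), a call such as System.out.println(ErrorLocation.describe(e)) would then replace the null check on loc.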
The second level of SAX error reporting is when the parser calls the error() method. Use of this method conforms to the definition in the XML Specification for error that defines an error as "a violation of the rules of this specification; results are undefined. Conforming software may detect and report an error and may recover from it." While that definition is not very specific and final implementation is up to the individual creators of XML parsers, it is mostly used for validation-type errors. Informally, this group of errors is considered continuable but major. Common validation errors are undeclared elements, undeclared attributes, incorrect root element type, incorrect content model, and incorrect values for a certain attribute type. Listing 2.9 is an example of an XML document that produces a validation error.
<?xml version="1.0"?>
<!DOCTYPE ADDRESS_BOOK
[
<!ELEMENT ADDRESS_BOOK (ADDRESS)+>
<!ELEMENT ADDRESS (NAME, STREET+, CITY, STATE, ZIP)>
<!ELEMENT NAME (#PCDATA)>
<!ELEMENT STREET (#PCDATA)>
<!ELEMENT STATE (#PCDATA)>
<!ELEMENT ZIP (#PCDATA)>
<!ATTLIST STREET TYPE (street|suiteno|aptno|other) #IMPLIED>
]>
<ADDRESS_BOOK>
  <ADDRESS>
    <NAME> Michael Daconta </NAME>
    <STREET>4296 Razor Hill Road </STREET>
    <CITY>Bealeton </CITY>
    <STATE>VA </STATE>
    <ZIP>22712 </ZIP>
  </ADDRESS>
</ADDRESS_BOOK>
The error in Listing 2.9 is an undeclared element. Although the CITY element is used in the instance of the ADDRESS_BOOK document, it is not declared in the internal document type definition. When we run SaxTester with Sun's validating parser, we get the following output:
C:/synergysolutions/Xml-in-Java/sams/chp2>java -Dorg.xml.sax.parser=com.ibm.xml.parser.ValidatingParser sams.chp2.SaxTester addrerr2.xml
Called setDocumentLocator()
Called startDocument()
Called startElement(name:ADDRESS_BOOK)
In file:/C:/synergysolutions/Xml-in-Java/sams/chp2/addrerr2.xml,at line 13 and col -1
Called ignorableWhitespace(ch:,start: 332,length: 0)
Called ignorableWhitespace(ch:
,start: 0,length: 1)
Called ignorableWhitespace(ch: ,start: 334,length: 1)
Called startElement(name:ADDRESS)
In file:/C:/synergysolutions/Xml-in-Java/sams/chp2/addrerr2.xml,at line 14 and col -1
...
Called startElement(name:STREET)
In file:/C:/synergysolutions/Xml-in-Java/sams/chp2/addrerr2.xml,at line 16 and col -1
Called characters(ch:4296 Razor Hill Road ,start:404,length: 21)
Called endElement(name: STREET)
Called ignorableWhitespace(ch:,start: 434,length: 0)
Called ignorableWhitespace(ch:
,start: 0,length: 1)
Called ignorableWhitespace(ch: ,start: 436,length: 2)
Called error(e:org.xml.sax.SAXParseException: Element type "CITY" is not declared.)
In file:/C:/synergysolutions/Xml-in-Java/sams/chp2/addrerr2.xml,at line 17 and col -1
org.xml.sax.SAXParseException: Element type "CITY" is not declared.
        at com.sun.xml.parser.Parser.error(Parser.java:2733)
        at com.sun.xml.parser.Parser.maybeElement(Compiled Code)
        at com.sun.xml.parser.Parser.content(Compiled Code)
        at com.sun.xml.parser.Parser.maybeElement(Compiled Code)
        at com.sun.xml.parser.Parser.content(Compiled Code)
        at com.sun.xml.parser.Parser.maybeElement(Compiled Code)
        at com.sun.xml.parser.Parser.parseInternal(Compiled Code)
        at com.sun.xml.parser.Parser.parse(Parser.java:286)
        at sams.chp2.SaxTester.main(SaxTester.java:49)
Called startElement(name:CITY)
In file:/C:/synergysolutions/Xml-in-Java/sams/chp2/addrerr2.xml,at line 17 and col -1
Called characters(ch:Bealeton ,start:444,length: 9)
Called endElement(name: CITY)
...
The previous lines show only the middle portion of the output; SAX events are delivered both before and after the error. The error() method is invoked when the parser processes the CITY element in the XML document. Because we are using a validating parser (notice the setting of the org.xml.sax.parser property), the parser reports that the element called CITY has not been declared in the internal DTD. Although this is a serious error, the parser still passes you the markup and content for the CITY element. One potential remedy for validation errors is to skip the element where the parser encountered the error. Another possibility is to fill in a subelement or attribute with default data. The most serious error the parser can report is a fatal error.
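As a minimal sketch of handling continuable errors, the handler below records each validation error and returns normally, so the parser keeps delivering events and the application can decide afterward which elements to skip or patch. The class name CollectingErrorHandler and its getProblems() method are hypothetical and are not part of SAX or of the SaxTester program.

```java
import java.util.ArrayList;
import java.util.List;

import org.xml.sax.ErrorHandler;
import org.xml.sax.SAXParseException;

// Hypothetical handler: collects warnings and errors instead of aborting.
class CollectingErrorHandler implements ErrorHandler {
    private final List<String> problems = new ArrayList<>();

    @Override
    public void warning(SAXParseException e) {
        problems.add(format("warning", e));
    }

    @Override
    public void error(SAXParseException e) {
        // Returning normally lets the parser continue with the next event.
        problems.add(format("error", e));
    }

    @Override
    public void fatalError(SAXParseException e) throws SAXParseException {
        // A fatal error is not continuable, so pass the exception back.
        throw e;
    }

    List<String> getProblems() {
        return problems;
    }

    private static String format(String level, SAXParseException e) {
        return level + " at line " + e.getLineNumber() + ": " + e.getMessage();
    }

    public static void main(String[] args) {
        CollectingErrorHandler handler = new CollectingErrorHandler();
        // Simulate the report a validating parser would make for Listing 2.9.
        handler.error(new SAXParseException(
                "Element type \"CITY\" is not declared.", null, "addrerr2.xml", 17, -1));
        for (String problem : handler.getProblems()) {
            System.out.println(problem);
        }
    }
}
```

An instance would be registered with the parser's setErrorHandler() method before calling parse().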
The fatalError() method in the ErrorHandler interface corresponds to the definition of a fatal error in Section 1.2 of the XML Specification. The specification states that a fatal error is "an error which a conforming XML processor must detect and report to the application. After encountering a fatal error, the processor may continue processing the data to search for further errors and may report such errors to the application.… Once a fatal error is detected, however, the processor must not continue normal processing." In relation to SAX, halting normal processing means ceasing all SAX events except the reporting of errors. In informal terms, a fatal error is an uncontinuable error. The most common types of fatal errors are documents that are not well-formed. Listing 2.10 demonstrates an ill-formed document that causes a fatal error.
<!-- not well formed -->
<?xml version="1.0
<<ADDRESS_BOOK>
  <ADDRESS>
    <NAME> Michael Daconta </NAME>
    <STREET>4296 Razor Hill Road </STREET>
    <CITY>Bealeton
    <STATE>VA </STATE>
    <ZIP>22712 </ZIP>
  </ADDRESS>>
</ADDRESS_BOOK>
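It should be immediately obvious that Listing 2.10 is not well-formed. A comment precedes the XML declaration, which must be the first thing in the document. The version attribute of the XML declaration does not have a closing quote, and the declaration itself is never closed with ?>. The ADDRESS_BOOK start tag has two beginning less-than symbols, the CITY element has no end tag, and the ADDRESS end tag has an extra greater-than symbol. The parser stops at the first of these problems. When the SaxTester application is run, it produces the following output: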
C:/synergysolutions/Xml-in-Java/sams/chp2>java -Dorg.xml.sax.parser=com.ibm.xml.
parsers.ValidatingSAXParser sams.chp2.SaxTester addrerr3.xml
Called setDocumentLocator()
Called startDocument()
Called fatalError(e:org.xml.sax.SAXParseException: XML declaration may
only begin entities.)
In file:/C:/synergysolutions/Xml-in-Java/sams/chp2/addrerr3.xml,
at line 2 and col -1
org.xml.sax.SAXParseException: XML declaration may only begin entities.
at com.sun.xml.parser.Parser.fatal(Parser.java:2755)
at com.sun.xml.parser.Parser.fatal(Parser.java:2743)
at com.sun.xml.parser.Parser.maybePI(Compiled Code)
at com.sun.xml.parser.Parser.maybeMisc(Compiled Code)
at com.sun.xml.parser.Parser.parseInternal(Compiled Code)
at com.sun.xml.parser.Parser.parse(Parser.java:286)
at sams.chp2.SaxTester.main(SaxTester.java:49)
When the parser encounters the ill-formed document, it reports the error. Your only recourse is to abort processing of that document and report the error to the user. If the document is one member of a set, move on to the next document.
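The "move on to the next document" strategy can be sketched as a loop that treats a SAXException as the end of one document rather than the end of the run. This sketch makes two assumptions that do not come from the text: it obtains its parser through the JAXP classes in javax.xml.parsers instead of the org.xml.sax.parser property, and it parses in-memory strings instead of files. The class and method names are hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;

import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;

import org.xml.sax.HandlerBase;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

@SuppressWarnings("deprecation") // the SAX 1.0 classes are deprecated in current JDKs
class BatchParser {
    // Parses each document in turn and returns how many were well-formed.
    static int countWellFormed(String... documents) {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        int wellFormed = 0;
        for (String xml : documents) {
            try {
                // A fresh parser per document keeps a failure from affecting the next one.
                SAXParser parser = factory.newSAXParser();
                // HandlerBase.fatalError() rethrows the exception it is given.
                parser.parse(new InputSource(new StringReader(xml)), new HandlerBase());
                wellFormed++;
            } catch (SAXException e) {
                // Fatal error: abandon this document, report it, and continue.
                System.err.println("Skipping document: " + e.getMessage());
            } catch (ParserConfigurationException | IOException e) {
                throw new IllegalStateException(e);
            }
        }
        return wellFormed;
    }

    public static void main(String[] args) {
        int ok = countWellFormed("<a/>", "<a><b></a>", "<c>text</c>");
        System.out.println(ok + " well-formed document(s)");
    }
}
```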
The next two interfaces are less common and are only useful under specific circumstances.
The DTDHandler interface provides the minimal amount of DTD processing required by the XML Specification for non-validating parsers. The reason is that SAX 1.0 was meant to be simple for both validating and non-validating parsers to implement. Many developers considered the lack of DTD processing a serious deficiency, and it has been rectified in SAX 2.0. Listing 2.11 is the DTDHandler interface, which reports two kinds of DTD declarations: notations and unparsed external entities.
Note
The SAX2 API is now complete. Information is available on this API at http://www.megginson.com/SAX/index.html. Unfortunately, at the time of writing, parser support for SAX2 is still spotty at best.
package org.xml.sax;
public interface DTDHandler
{
    public abstract void notationDecl (String name,
                                       String publicId,
                                       String systemId)
        throws SAXException;

    public abstract void unparsedEntityDecl (String name,
                                             String publicId,
                                             String systemId,
                                             String notationName)
        throws SAXException;
}
Table 2.3 shows a breakdown and explanation of each method.
To demonstrate the use of the DTDHandler, I will create a simple XML document that contains both a notation and unparsed external entity. We discuss these concepts in more detail in Chapter 4. The source in Listing 2.12 represents a portion of a BOOKMARKS markup language that uses an unparsed (binary) entity to refer to a folder icon. The icon is in GIF format (or in XML-speak, the GIF notation).
<?xml version="1.0" ?>
<!DOCTYPE BOOKMARKS [
<!NOTATION gif SYSTEM "apps/gifviewer.exe">
<!ENTITY folder SYSTEM "images/folder1.gif" NDATA gif>
<!ELEMENT BOOKMARKS (BOOKMARK|FOLDER)* >
<!ELEMENT FOLDER (BOOKMARK|FOLDER)* >
<!ELEMENT BOOKMARK (#PCDATA)>
<!ATTLIST FOLDER
          icon ENTITY #REQUIRED>
]>
<BOOKMARKS>
  <FOLDER icon="folder"></FOLDER>
</BOOKMARKS>
With Listing 2.12 as input, the SaxTester program produces the following output:
C:/synergysolutions/Xml-in-Java/sams/chp2>java -Dorg.xml.sax.parser=com.ibm.xml.
parsers.ValidatingSAXParser sams.chp2.SaxTester notat1.xml
Called setDocumentLocator()
Called startDocument()
Called notationDecl(name: gif,publicId: null,systemId: file:/C:/synergysolutions
/Xml-in-Java/sams/chp2/apps/gifviewer.exe)
Called unparsedEntityDecl
folder
null
file:/C:/synergysolutions/Xml-in-Java/sams/chp2/images/folder1.gif
gif
Called startElement(name:BOOKMARKS)
In file:/C:/synergysolutions/Xml-in-Java/sams/chp2/notat1.xml,
at line 11 and col -1
Called ignorableWhitespace(ch:,start: 328,length: 0)
Called ignorableWhitespace(ch:
,start: 0,length: 1)
Called ignorableWhitespace(ch: ,start: 330,length: 8)
Called startElement(name:FOLDER)
att-name:icon,att-type:ENTITY,att-value:folder
...
As you can see, the parser informs us of the notation and the binary data. This provides enough information for us to process this binary data. The last interface of interest to SAX application writers also handles a special case.
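One way an application might use these two callbacks is to remember every declaration so that an ENTITY attribute value, such as icon="folder", can later be mapped to its file and notation. The following is a sketch only; the class UnparsedEntityTable and its describe() method are hypothetical names, not part of SAX.

```java
import java.util.HashMap;
import java.util.Map;

import org.xml.sax.DTDHandler;

// Hypothetical DTDHandler that remembers notation and unparsed entity declarations.
class UnparsedEntityTable implements DTDHandler {
    // notation name -> system identifier of the notation
    private final Map<String, String> notations = new HashMap<>();
    // entity name -> { system identifier, notation name }
    private final Map<String, String[]> entities = new HashMap<>();

    @Override
    public void notationDecl(String name, String publicId, String systemId) {
        notations.put(name, systemId);
    }

    @Override
    public void unparsedEntityDecl(String name, String publicId,
                                   String systemId, String notationName) {
        entities.put(name, new String[] { systemId, notationName });
    }

    // Describes the entity named by an ENTITY attribute value, or returns null.
    String describe(String entityName) {
        String[] entity = entities.get(entityName);
        if (entity == null) {
            return null;
        }
        return entity[0] + " (" + entity[1] + ", handled by "
                + notations.get(entity[1]) + ")";
    }

    public static void main(String[] args) {
        UnparsedEntityTable table = new UnparsedEntityTable();
        // Simulate the two callbacks the parser makes for Listing 2.12.
        table.notationDecl("gif", null, "apps/gifviewer.exe");
        table.unparsedEntityDecl("folder", null, "images/folder1.gif", "gif");
        System.out.println(table.describe("folder"));
    }
}
```

The handler would be registered with the parser's setDTDHandler() method, and describe() would typically be called from startElement() when an attribute of type ENTITY is seen.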
The EntityResolver interface is used to customize the resolution of external parsed entities. Listing 2.13 is the EntityResolver interface.
package org.xml.sax;

import java.io.IOException;

public interface EntityResolver
{
    public abstract InputSource resolveEntity (String publicId,
                                               String systemId)
        throws SAXException, IOException;

}
There is only one method in the interface. The resolveEntity() method gives you the public identifier and system identifier of the entity to resolve. Your application should only implement this interface if the XML-compliant language you are processing requires custom resolution of external entities. Although entities will be discussed in more detail in Chapter 4, for now you need to know that they are used for text replacement. An internal parsed entity provides the replacement text in the same file. An external parsed entity has the replacement text in some external file. You can specify an external parsed entity with the following syntax:
<!ENTITY entityName PUBLIC "publicId" "URI of resource">
or
<!ENTITY entityName SYSTEM "URI of resource">
Listing 2.14 demonstrates the use of both types of external parsed entities. Note that a parsed entity is another way to say a text entity (in contrast to an unparsed entity, which is some binary data).
<?xml version="1.0" ?>
<!DOCTYPE JOURNAL [
<!ENTITY mcd "Michael Corey Daconta">
<!ENTITY templatexml PUBLIC "boilerplate" "stuff.xml">
<!ENTITY logo SYSTEM "http://www.gosynergy.com/logo">
]>
<JOURNAL>
  <ENTRY> &logo; It was a bright sunny day. - &mcd; </ENTRY>
  <ENTRY> Standard legalese here: &templatexml; </ENTRY>
</JOURNAL>
In Listing 2.14 we see external entities declared with the PUBLIC identifier and the SYSTEM identifier. The PUBLIC identifier is for widely used content that crosses many XML applications. A SYSTEM identifier provides a URI and means that the replacement text is located at that URI. When these entities are referenced in the content of the XML document, the parser calls resolveEntity() on the registered EntityResolver, which returns an InputSource from which the parser reads the replacement text (or null to request the parser's default resolution). When I use Listing 2.14 as the input to SaxTester, it produces the following output:
C:/synergysolutions/Xml-in-Java/sams/chp2>java -Dorg.xml.sax.parser=com.ibm.xml.
parsers.ValidatingSAXParser sams.chp2.SaxTester entity1.xml
Called setDocumentLocator()
Called startDocument()
Called startElement(name:JOURNAL)
In file:/C:/synergysolutions/Xml-in-Java/sams/chp2/entity1.xml,
at line 7 and col -1
Called characters(ch:,start:208,length: 0)
Called characters(ch:
,start:0,length: 1)
Called startElement(name:ENTRY)
In file:/C:/synergysolutions/Xml-in-Java/sams/chp2/entity1.xml,
at line 8 and col -1
Called characters(ch: ,start:217,length: 1)
Called resolveEntity(publicId:null,systemId:http://www.gosynergy.com/logo)
Called characters(ch:Resolved Entity,start:0,length: 15)
Called characters(ch: It was a bright sunny day. - ,start:224,length: 30)
...
Called startElement(name:ENTRY)
In file:/C:/synergysolutions/Xml-in-Java/sams/chp2/entity1.xml,
at line 9 and col -1
Called characters(ch: Standard legalese here: ,start:277,length: 25)
Called resolveEntity(publicId:boilerplate,systemId:
file:/C:/synergysolutions/Xml-in-Java/sams/chp2/stuff.xml)
Called characters(ch:Resolved Entity,start:0,length: 15)
...
Called endDocument()
You should notice in the output that resolveEntity is called and then the resolution of the entity is returned in the very next call to the characters() method. Entities and entity resolution are an advanced topic that is discussed in more detail in the next chapter. For now, it is sufficient to know that SAX has the ability to do custom entity resolution if your application will benefit from it. This concludes our examination of interfaces that you as an application writer can implement. Again, the most common interfaces you will implement are DocumentHandler and ErrorHandler. DTDHandler and EntityResolver are for special cases. SAX has several other interfaces and classes for the parser writers to deliver the events your application processes.
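To make custom entity resolution concrete, here is a minimal sketch of a resolver that supplies replacement text for known public identifiers from memory and returns null for everything else, which asks the parser to resolve the entity itself. The class name LocalEntityResolver and its register() method are hypothetical; this is not the resolver used by SaxTester.

```java
import java.io.StringReader;
import java.util.HashMap;
import java.util.Map;

import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;

// Hypothetical resolver: serves replacement text for registered public identifiers.
class LocalEntityResolver implements EntityResolver {
    private final Map<String, String> textByPublicId = new HashMap<>();

    void register(String publicId, String replacementText) {
        textByPublicId.put(publicId, replacementText);
    }

    @Override
    public InputSource resolveEntity(String publicId, String systemId) {
        String text = (publicId == null) ? null : textByPublicId.get(publicId);
        if (text == null) {
            // Unknown entity: null tells the parser to use its default resolution.
            return null;
        }
        // Wrap the text in a character stream for the parser to read.
        InputSource source = new InputSource(new StringReader(text));
        source.setPublicId(publicId);
        source.setSystemId(systemId);
        return source;
    }

    public static void main(String[] args) {
        LocalEntityResolver resolver = new LocalEntityResolver();
        resolver.register("boilerplate", "Standard legalese text");
        InputSource known = resolver.resolveEntity("boilerplate", "stuff.xml");
        InputSource unknown = resolver.resolveEntity(null, "http://www.gosynergy.com/logo");
        System.out.println("boilerplate resolved locally: " + (known != null));
        System.out.println("logo left to the parser: " + (unknown == null));
    }
}
```

The resolver would be registered with the parser's setEntityResolver() method before parsing begins.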
Even though I do not recommend you write yet another XML parser (there are already too many freely distributable ones available), it is worthwhile to briefly examine the interfaces a SAX-compliant parser must implement.
Parser This is the main interface a SAX-compliant parser must implement. It has all the setXXXHandler methods, such as setDocumentHandler(). It also has two parse() methods: one takes an InputSource and the other takes a system identifier (URI) as a String.
AttributeList This interface is implemented by the parser and passed into the startElement() method of the DocumentHandler interface. This interface represents a collection of attributes for the current element. It allows you to either iterate through the entire collection of attributes (using a getName(), getType(), and getValue() method) or access a value on a specific attribute. Listing 2.15 is the AttributeList interface.
package org.xml.sax;
public interface AttributeList
{
    public abstract int getLength ();
    public abstract String getName (int i);
    public abstract String getType (int i);
    public abstract String getValue (int i);
    public abstract String getType (String name);
    public abstract String getValue (String name);
}
Locator The Locator interface is implemented by the parser and passed to the application via the setDocumentLocator() method in the DocumentHandler interface. The Locator interface reports where in the XML document a SAX event occurs: the line number, the column number, and the public and system identifiers of the entity being parsed. The SaxTester program used the Locator in its implementation of the startElement() method.
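The two ways of reading an AttributeList, and the information a Locator carries, can be sketched as follows. Because no parser is involved here, the sketch fills in the SAX helper classes AttributeListImpl and LocatorImpl by hand with values taken from the FOLDER element of Listing 2.12; the class AttributeDump and its methods are hypothetical.

```java
import org.xml.sax.AttributeList;
import org.xml.sax.Locator;
import org.xml.sax.helpers.AttributeListImpl;
import org.xml.sax.helpers.LocatorImpl;

@SuppressWarnings("deprecation") // the SAX 1.0 classes are deprecated in current JDKs
class AttributeDump {
    // Iterates over the whole collection by index.
    static String describe(AttributeList atts) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < atts.getLength(); i++) {
            if (i > 0) {
                out.append("; ");
            }
            out.append(atts.getName(i)).append('=').append(atts.getValue(i))
               .append(" [").append(atts.getType(i)).append(']');
        }
        return out.toString();
    }

    // Formats the position information a Locator provides.
    static String where(Locator locator) {
        return locator.getSystemId() + ", line " + locator.getLineNumber()
                + ", col " + locator.getColumnNumber();
    }

    public static void main(String[] args) {
        AttributeListImpl atts = new AttributeListImpl();
        atts.addAttribute("icon", "ENTITY", "folder");

        LocatorImpl locator = new LocatorImpl();
        locator.setSystemId("notat1.xml");
        locator.setLineNumber(12);
        locator.setColumnNumber(-1); // -1 means the column is not available

        System.out.println(describe(atts));
        // Direct access to one attribute by name.
        System.out.println(atts.getValue("icon"));
        System.out.println(where(locator));
    }
}
```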
That completes our discussion of SAX interfaces. Now we will finish our discussion of SAX by examining all of the classes provided with SAX.
There are two categories of classes provided with the SAX distribution: standard classes and helper classes. The standard classes are part of the org.xml.sax package, and the helper classes are part of the org.xml.sax.helpers package. The following are the standard classes:
HandlerBase This is an adapter class that implements all the SAX handler interfaces as a convenience for developers. You can make your handler class extend HandlerBase and then just override the methods you are interested in.
InputSource This class is an encapsulation of all the information about an XML input source (to be handed to the parse() method of the Parser interface). The information encapsulated is the public identifier, the system identifier, a byte stream with a specified encoding, or a character stream.
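These two classes are often used together. The sketch below extends HandlerBase, overrides only startElement(), and wraps an in-memory string in an InputSource that uses a character stream. As an assumption not drawn from the text, it obtains the parser through the JAXP classes in javax.xml.parsers; the class name ElementCounter is hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;

import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;

import org.xml.sax.AttributeList;
import org.xml.sax.HandlerBase;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

@SuppressWarnings("deprecation") // the SAX 1.0 classes are deprecated in current JDKs
class ElementCounter extends HandlerBase {
    private int count;

    // The only callback this handler cares about; HandlerBase supplies the rest.
    @Override
    public void startElement(String name, AttributeList attributes) {
        count++;
    }

    static int countElements(String xml) {
        ElementCounter handler = new ElementCounter();
        try {
            // An InputSource built from a character stream.
            InputSource source = new InputSource(new StringReader(xml));
            SAXParserFactory.newInstance().newSAXParser().parse(source, handler);
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException(e);
        }
        return handler.count;
    }

    public static void main(String[] args) {
        String xml = "<ADDRESS_BOOK><ADDRESS><NAME>Michael Daconta</NAME></ADDRESS></ADDRESS_BOOK>";
        System.out.println(countElements(xml) + " elements");
    }
}
```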
The helper classes are for the convenience of both parser writers and application writers:
AttributeListImpl An implementation of the AttributeList interface for use by parser writers.
LocatorImpl An implementation of the Locator interface for use by parser writers.
ParserFactory A convenience class for SAX application writers that allows a SAX-compliant parser to be instantiated by a static method called makeParser(). There are two forms of makeParser(). The no-arg form instantiates the parser object by using the system property org.xml.sax.parser. The other form takes the name of the parser class as a String and instantiates it using reflection.
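Both forms of makeParser() can fail at run time, for example when the property is not set or the named class is not on the classpath, so a defensive wrapper is a reasonable design. The following is a sketch under that assumption; the class ParserLoader and its load() method are hypothetical, and the class name passed in main() is a deliberately nonexistent placeholder.

```java
import java.util.Optional;

import org.xml.sax.Parser;
import org.xml.sax.helpers.ParserFactory;

@SuppressWarnings("deprecation") // the SAX 1.0 classes are deprecated in current JDKs
class ParserLoader {
    // Pass null to use the org.xml.sax.parser system property,
    // or a class name to instantiate that parser by reflection.
    static Optional<Parser> load(String className) {
        try {
            Parser parser = (className == null)
                    ? ParserFactory.makeParser()
                    : ParserFactory.makeParser(className);
            return Optional.of(parser);
        } catch (ReflectiveOperationException | RuntimeException e) {
            // Covers a missing property, an unknown class, and a class
            // that does not implement org.xml.sax.Parser.
            System.err.println("Could not create parser: " + e);
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        // "no.such.ParserClass" is a placeholder, not a real parser.
        Optional<Parser> parser = load("no.such.ParserClass");
        System.out.println("parser available: " + parser.isPresent());
    }
}
```

In a real application the argument would be the class name of an installed SAX driver, such as the one named on the command lines shown earlier, and the returned Parser would then be given its handlers before parse() is called.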