Event-Based Parsing with SAX Under Java

The most straightforward method of parsing XML under Java is with the Simple API for XML (SAX). This section provides examples of the use of SAX with the Java API for XML Processing and focuses on Java-specific issues. Refer to Chapter 15, “Parsing XML Based on Events,” for a more detailed introduction to SAX.

Creating a SAX Parser Instance

In order to parse XML documents with SAX, we first need a way to obtain a parser that supports the SAX API. In JAXP, SAX parsing is made available through an instance of the javax.xml.parsers.SAXParser class. Because JAXP is deliberately general and independent of any particular parser implementation, obtaining that instance is a multistep process.

First, a SAXParserFactory needs to be created. A SAXParserFactory is not the parser itself. Instead, it is a factory class whose newSAXParser method produces SAXParser instances. The parser it produces can come from any SAX-capable implementation available on the system (Apache Crimson, Apache Xerces, and so on). If none is specified, a default implementation is chosen. A specific implementation can be selected by setting the "javax.xml.parsers.SAXParserFactory" Java system property to the name of its factory class (the "org.xml.sax.driver" property plays a similar role for the SAX2 XMLReaderFactory helper class, not for JAXP). In this case, however, we will use the default parser implementation:

SAXParserFactory parserFactory = SAXParserFactory.newInstance();
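The lookup that newInstance() performs can be observed directly. The minimal sketch below (the class name FactoryLookup is illustrative, not part of JAXP) reports whether the javax.xml.parsers.SAXParserFactory system property was set and which factory class JAXP actually returned. Running it with -Djavax.xml.parsers.SAXParserFactory=<factory class name> should report a different implementation, assuming that class is on the classpath.

```java
import javax.xml.parsers.SAXParserFactory;

class FactoryLookup {
    // JAXP consults this system property first when newInstance()
    // searches for a SAXParserFactory implementation class.
    static final String FACTORY_PROPERTY = "javax.xml.parsers.SAXParserFactory";

    // Returns the requested factory class (if any) and the one actually chosen.
    static String describeFactory() {
        String requested = System.getProperty(FACTORY_PROPERTY);
        SAXParserFactory factory = SAXParserFactory.newInstance();
        return "requested=" + (requested == null ? "(platform default)" : requested)
                + ", actual=" + factory.getClass().getName();
    }

    public static void main(String[] args) {
        System.out.println(describeFactory());
    }
}
```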

In addition, the parser factory lets users specify whether they want a namespace-aware parser. Because namespaces are becoming more common, we will enable this feature of the SAXParserFactory:

parserFactory.setNamespaceAware(true); 
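To see what namespace awareness changes, the following minimal sketch parses a one-element document twice and records the arguments passed to startElement. The class NamespaceDemo, the x prefix, and the namespace URI http://example.com/ns are hypothetical. With namespace awareness on, the uri and local-name arguments are filled in; with it off, common implementations report only the qualified name, so the second run's exact output may vary by parser.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class NamespaceDemo {
    // Hypothetical document: the prefix and namespace URI are illustrative only.
    static final String XML = "<x:root xmlns:x=\"http://example.com/ns\"/>";

    // Parses XML and records the names reported for each start tag.
    static List<String> startTags(boolean namespaceAware) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(namespaceAware);
        SAXParser parser = factory.newSAXParser();

        List<String> seen = new ArrayList<>();
        parser.parse(new InputSource(new StringReader(XML)), new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes atts) {
                seen.add("uri=" + uri + " local=" + localName + " qName=" + qName);
            }
        });
        return seen;
    }

    public static void main(String[] args) throws Exception {
        System.out.println("aware:     " + startTags(true));
        System.out.println("not aware: " + startTags(false));
    }
}
```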

It is also possible to tell the parser factory whether you want a validating XML parser. Because our example document (see the beginning of the chapter) doesn't use a DTD, this example requests a non-validating parser:

parserFactory.setValidating(false); 
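Validity problems are reported through the error callback of the registered error handler rather than by stopping the parse. This minimal sketch, using a hypothetical ValidationDemo class and a DTD-less document modeled on the chapter's example, counts those callbacks. With setValidating(true) and no DOCTYPE, a validating parser such as Xerces reports errors because there is no grammar to check against; with validation off, none are reported. Note that setValidating concerns DTD validation; XML Schema validation needs further configuration.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

class ValidationDemo {
    // Well-formed, but has no DOCTYPE, so there is no DTD to validate against.
    static final String XML = "<java_xml><standards/></java_xml>";

    // Counts the recoverable (validity) errors reported to the error handler.
    static int countErrors(boolean validating) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        factory.setValidating(validating);

        int[] errors = {0};
        factory.newSAXParser().parse(new InputSource(new StringReader(XML)),
            new DefaultHandler() {
                @Override
                public void error(SAXParseException e) {
                    // DefaultHandler silently ignores error(); count it instead.
                    errors[0]++;
                }
            });
        return errors[0];
    }

    public static void main(String[] args) throws Exception {
        System.out.println("non-validating errors: " + countErrors(false));
        System.out.println("validating errors:     " + countErrors(true));
    }
}
```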

Now that a parser factory has been configured, an actual parser can be generated. The methods available on the SAXParser are listed in Table 16.2.

SAXParser saxParser = parserFactory.newSAXParser();

Table 16.2. SAXParser Methods
Method              Description
getParser           Returns the SAX1 Parser that this class encapsulates (deprecated; use getXMLReader)
getProperty         Gets a property of the underlying SAX XMLReader
getXMLReader        Returns the SAX2 XMLReader that this parser encapsulates
isNamespaceAware    Returns true if this parser is configured to understand namespaces
isValidating        Returns true if this parser is configured to validate documents as they are parsed
parse               Parses the specified XML document, reporting events to the supplied handler
setProperty         Sets a property of the underlying SAX XMLReader
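The query methods in Table 16.2 reflect the factory settings. The minimal sketch below (the ParserSettings class is hypothetical) builds a parser exactly as above and prints what isNamespaceAware, isValidating, and getXMLReader report. It also reads the standard SAX2 feature http://xml.org/sax/features/namespaces from the underlying reader, which should agree with setNamespaceAware(true).

```java
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.XMLReader;

class ParserSettings {
    // Builds a parser configured the same way as in the text.
    static SAXParser newConfiguredParser() throws Exception {
        SAXParserFactory parserFactory = SAXParserFactory.newInstance();
        parserFactory.setNamespaceAware(true);
        parserFactory.setValidating(false);
        return parserFactory.newSAXParser();
    }

    public static void main(String[] args) throws Exception {
        SAXParser saxParser = newConfiguredParser();
        // These report the factory settings that were in effect.
        System.out.println("namespace aware: " + saxParser.isNamespaceAware());
        System.out.println("validating:      " + saxParser.isValidating());

        // getXMLReader() exposes the SAX2 reader the parser wraps.
        XMLReader reader = saxParser.getXMLReader();
        System.out.println("reader class:    " + reader.getClass().getName());
        System.out.println("namespaces feature: "
            + reader.getFeature("http://xml.org/sax/features/namespaces"));
    }
}
```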

Note

JAXP 1.1 and higher are based on SAX2 and therefore include methods to work with the SAX2 XMLReader interface. If you are using a tool based on the older JAXP 1.0 standard (which was based on SAX1), these methods will not be available.


Handling Events

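The Simple API for XML is an event-based parser interface. As the parser reads each part of the XML document (the start of the document, each start and end tag, each run of character data, and so on), it calls a matching callback method on a handler object supplied by the application. In Java, the simplest way to supply these callbacks is to extend the org.xml.sax.helpers.DefaultHandler class, which implements the SAX ContentHandler and ErrorHandler interfaces (among others) with mostly do-nothing default methods, so only the callbacks of interest need to be overridden. For our example, the various methods will print out information about the document that is available at each step of the parsing.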

For the Java SAX example, the first handlers to be provided are startDocument and endDocument. These notify the application when the parser has begun and when it has finished parsing the document, which can be especially helpful when processing long documents:

public void startDocument ()
{
  System.out.println("[Start Of Document]");
}

public void endDocument ()
{
  System.out.println("[End Of Document]");
}
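As one illustration of why the document-level callbacks help with long documents, this minimal sketch times a parse by recording System.nanoTime() in startDocument and again in endDocument. The TimingHandler class and its timeParse helper are hypothetical; a real application might instead log progress, or acquire and release resources, at these two points.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class TimingHandler extends DefaultHandler {
    private long startNanos;
    private long elapsedNanos = -1;   // stays -1 until endDocument() runs

    @Override
    public void startDocument() {
        startNanos = System.nanoTime();
    }

    @Override
    public void endDocument() {
        elapsedNanos = System.nanoTime() - startNanos;
    }

    long elapsedNanos() {
        return elapsedNanos;
    }

    // Parses an in-memory document and returns the time between the two callbacks.
    static long timeParse(String xml) throws Exception {
        TimingHandler handler = new TimingHandler();
        SAXParserFactory.newInstance().newSAXParser()
            .parse(new InputSource(new StringReader(xml)), handler);
        return handler.elapsedNanos();
    }

    public static void main(String[] args) throws Exception {
        System.out.println("parse took " + timeParse("<a><b/></a>") + " ns");
    }
}
```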

Similar methods are called when an element starts (startElement) and when it ends (endElement). Note that endElement might be called long after the corresponding startElement, because an element can contain many elements beneath it. SAX leaves it to the application to keep track of this nesting, which is one of the subtle complexities of using the SAX API and one reason some applications are better suited to a higher-level API such as the Document Object Model (DOM).

public void startElement (String uri, String name, String qName, Attributes atts) 
{
  System.out.println("[Start element: " + qName + "]");
}
public void endElement (String uri, String name, String qName)
{
  System.out.println("[End element: " + qName + "]");
}
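Because SAX does not remember which elements are open, an application that needs context must track it itself. A common approach, shown here as a minimal sketch with a hypothetical PathTracker class, is a stack: push the element name in startElement, pop it in endElement, and read the stack to build the current path.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.Iterator;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class PathTracker extends DefaultHandler {
    // Currently open elements, innermost on top; SAX keeps no such record.
    private final Deque<String> open = new ArrayDeque<>();
    final List<String> paths = new ArrayList<>();

    @Override
    public void startElement(String uri, String name, String qName, Attributes atts) {
        open.push(qName);
        paths.add(currentPath());
    }

    @Override
    public void endElement(String uri, String name, String qName) {
        // In well-formed XML the matching start tag is always on top.
        open.pop();
    }

    // Builds "/outer/.../inner" by walking the stack from bottom to top.
    private String currentPath() {
        StringBuilder sb = new StringBuilder();
        Iterator<String> it = open.descendingIterator();
        while (it.hasNext()) {
            sb.append('/').append(it.next());
        }
        return sb.toString();
    }

    static List<String> pathsOf(String xml) throws Exception {
        PathTracker tracker = new PathTracker();
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        factory.newSAXParser().parse(new InputSource(new StringReader(xml)), tracker);
        return tracker.paths;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(pathsOf("<java_xml><parsers><parser/></parsers></java_xml>"));
    }
}
```

For the chapter's example document, each parser element would be reported with the path /java_xml/parsers/parser.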

Finally, a handler for character data must be provided. XML parsers preserve whitespace, such as the newlines in a document. To make the printed character data more pleasant to read, this handler prints every character except newlines. Be aware that a parser may deliver a single run of text through several characters calls, so a handler should not assume it receives each piece of text all at once.

public void characters (char[] ch, int start, int length)
{
  for (int i = start; i < start + length; i++)
  {
    if (ch[i] != '\n')
    {
      System.out.print(ch[i]);
    }
  }
  System.out.println();
}
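Printing characters as they arrive works for this example, but because one run of text can be split across several characters calls (for instance around an entity reference such as &amp;), handlers that need the complete text usually buffer it. This minimal sketch, with a hypothetical TextCollector class, appends each chunk to a StringBuilder and records the buffered text whenever an element starts or ends. Trimming whitespace and skipping whitespace-only runs is a presentation choice made here, not something SAX requires.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class TextCollector extends DefaultHandler {
    private final StringBuilder buffer = new StringBuilder();
    final List<String> texts = new ArrayList<>();

    // Records the buffered text (unless it is only whitespace) and clears the buffer.
    private void flush() {
        String text = buffer.toString().trim();
        if (!text.isEmpty()) {
            texts.add(text);
        }
        buffer.setLength(0);
    }

    @Override
    public void startElement(String uri, String name, String qName, Attributes atts) {
        flush();
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // A text run may arrive in several calls, so accumulate instead of printing.
        buffer.append(ch, start, length);
    }

    @Override
    public void endElement(String uri, String name, String qName) {
        flush();
    }

    static List<String> textsOf(String xml) throws Exception {
        TextCollector collector = new TextCollector();
        SAXParserFactory.newInstance().newSAXParser()
            .parse(new InputSource(new StringReader(xml)), collector);
        return collector.texts;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(textsOf(
            "<standards>\n  <standard>Simple API for XML(SAX)</standard>\n</standards>"));
    }
}
```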

Parsing the Document

A SAX parser has now been generated, and it can be reused for other documents (though not by several threads at once). The SAXParser itself is a JAXP wrapper; the object that actually implements the SAX2 interfaces for the event-based parsing is the XMLReader it encapsulates, which is obtained with the getXMLReader method:

XMLReader xmlReader = saxParser.getXMLReader(); 

Similarly, the SAX event handlers have been written, but the XMLReader does not yet know that those handlers are the ones to use for this document. The handlers therefore live in a custom class, which in this case is called SimpleSAX. An instance of that class is created and registered with the XMLReader as both its content handler and its error handler:

SimpleSAX saxHandler = new SimpleSAX(); 
xmlReader.setContentHandler(saxHandler);
xmlReader.setErrorHandler(saxHandler);
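Registering the handler as the error handler means its warning, error, and fatalError methods are called for parse problems. DefaultHandler ignores warnings and recoverable errors and throws on fatal ones, so overriding these methods is how an application reports problems. The hypothetical ReportingHandler below is a minimal sketch that records each callback with its line number and rethrows fatal (well-formedness) errors, after which parsing stops.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

class ReportingHandler extends DefaultHandler {
    final List<String> problems = new ArrayList<>();

    private void record(String level, SAXParseException e) {
        problems.add(level + " at line " + e.getLineNumber() + ": " + e.getMessage());
    }

    @Override
    public void warning(SAXParseException e) {
        record("warning", e);
    }

    @Override
    public void error(SAXParseException e) {
        record("error", e);
    }

    @Override
    public void fatalError(SAXParseException e) throws SAXException {
        record("fatal", e);
        throw e;   // the document is not well-formed; parsing cannot continue
    }

    // Parses xml with the handler registered exactly as in the text.
    static List<String> check(String xml) throws Exception {
        ReportingHandler handler = new ReportingHandler();
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        XMLReader xmlReader = factory.newSAXParser().getXMLReader();
        xmlReader.setContentHandler(handler);
        xmlReader.setErrorHandler(handler);
        try {
            xmlReader.parse(new InputSource(new StringReader(xml)));
        } catch (SAXException e) {
            // Already recorded by fatalError(); just return what was collected.
        }
        return handler.problems;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(check("<a><b></a>"));
    }
}
```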

At this point, all that remains is to actually parse the document. This is accomplished through the parse method of the XMLReader object. That parse method takes a SAX InputSource (or a system ID string), so a FileReader is first created for the example file, "example.xml", and that FileReader is then used to create an InputSource:

FileReader fileReader = new FileReader("example.xml"); 
InputSource inputSource = new InputSource(fileReader);
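One caveat with a FileReader is that it decodes the file using the platform's default character encoding, not the encoding named in the document's XML declaration. Handing the parser a byte stream instead lets it work out the encoding itself. In this minimal sketch (the ByteStreamInput class and the temporary sample file are hypothetical), a file written and declared as ISO-8859-1 is parsed from an InputStream; setting the system ID is optional but lets the parser resolve relative URIs.

```java
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class ByteStreamInput {
    // Parses the file from raw bytes so the parser, not the JVM's default
    // charset, decides how to decode it, based on the XML declaration.
    static String textOf(Path file) throws Exception {
        StringBuilder text = new StringBuilder();
        try (InputStream in = Files.newInputStream(file)) {
            InputSource inputSource = new InputSource(in);
            inputSource.setSystemId(file.toUri().toString());
            SAXParserFactory.newInstance().newSAXParser().parse(inputSource,
                new DefaultHandler() {
                    @Override
                    public void characters(char[] ch, int start, int length) {
                        text.append(ch, start, length);
                    }
                });
        }
        return text.toString();
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical sample file, encoded and declared as ISO-8859-1.
        Path file = Files.createTempFile("latin1", ".xml");
        String xml = "<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?><parser>caf\u00e9</parser>";
        Files.write(file, xml.getBytes(StandardCharsets.ISO_8859_1));
        System.out.println(textOf(file));
        Files.delete(file);
    }
}
```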

Finally, the parse method of the XMLReader is called, and the document is parsed:

xmlReader.parse(inputSource); 
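When direct access to the XMLReader is not needed, JAXP also offers a shorter route: the SAXParser parse methods accept a DefaultHandler and register it with the underlying reader themselves (as content, error, DTD, and entity handler). This minimal sketch (ShortcutParse is a hypothetical name) counts the elements in an in-memory document; for a file, saxParser.parse(new File("example.xml"), handler) works the same way.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class ShortcutParse {
    // Counts start tags using SAXParser.parse(InputSource, DefaultHandler).
    static int countElements(String xml) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        SAXParser saxParser = factory.newSAXParser();

        int[] count = {0};
        saxParser.parse(new InputSource(new StringReader(xml)), new DefaultHandler() {
            @Override
            public void startElement(String uri, String name, String qName, Attributes atts) {
                count[0]++;
            }
        });
        return count[0];
    }

    public static void main(String[] args) throws Exception {
        System.out.println(countElements("<parsers><parser/><parser/></parsers>"));
    }
}
```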

The full source code to this example is shown in Listing 16.1, and the output from the example is shown in Listing 16.2.

Listing 16.1. SAX Example in Java
import java.io.FileReader;

import javax.xml.parsers.SAXParserFactory;
import javax.xml.parsers.SAXParser;
import org.xml.sax.XMLReader;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class SimpleSAX extends DefaultHandler
{

  // A different JAXP implementation could be selected with the
  // "javax.xml.parsers.SAXParserFactory" system property
  public static void main (String[] args) throws Exception
  {
    // Generate a parser factory that will provide a parser implementation
    SAXParserFactory parserFactory = SAXParserFactory.newInstance();

    // Tell the parser factory to generate a parser
    // that is aware of namespaces
    parserFactory.setNamespaceAware(true);

    // Tell the parser factory to generate a parser that does not validate
    parserFactory.setValidating(false);

    // Using the parser factory, generate a new SAX Parser
    SAXParser saxParser = parserFactory.newSAXParser();

    // Generate a specific XML Reader from the SAX Parser
    XMLReader xmlReader = saxParser.getXMLReader();

    // Create an instance of the SimpleSAX class, which extends the
    // SAX DefaultHandler class
    SimpleSAX saxHandler = new SimpleSAX();

    // Tell the XML Reader that we wish to use the generated SimpleSAX
    // Class for parsing event callbacks
    xmlReader.setContentHandler(saxHandler);

    // Tell the XML Reader to use the generated SimpleSAX class for errors as well
    xmlReader.setErrorHandler(saxHandler);

    // Open a Java FileReader for the example file; try-with-resources
    // closes it once parsing is finished
    try (FileReader fileReader = new FileReader("example.xml"))
    {
      // Create an InputSource based on the FileReader
      InputSource inputSource = new InputSource(fileReader);

      // Parse the document with the InputSource created from the example file
      xmlReader.parse(inputSource);
    }
  }

  // SAX Event Handlers

  public void startDocument ()
  {
    System.out.println("[Start Of Document]");
  }


  public void endDocument ()
  {
    System.out.println("[End Of Document]");
  }


  public void startElement (String uri, String name, String qName, Attributes atts)
  {
    System.out.println("[Start element: " + qName + "]");
  }


  public void endElement (String uri, String name, String qName)
  {
    System.out.println("[End element: " + qName + "]");
  }


  public void characters (char[] ch, int start, int length)
  {

    // Print out all of the characters provided by the parser, ignoring newlines.
    for (int i = start; i < start + length; i++)
    {
      if (ch[i] != '\n')
      {
        System.out.print(ch[i]);
      }
    }
    // Since newlines are suppressed, generate a newline at the end of the
    // characters
    System.out.println();
  }

}


Listing 16.2. Sample Java SAX Application Output
[Start Of Document]
[Start element: java_xml]
[Start element: standards]
[Start element: standard]
Java API for XML Processing(JAXP) 
[End element: standard]
[Start element: standard]
Document Object Model(DOM)
[End element: standard]
[Start element: standard]
Simple API for XML(SAX)
[End element: standard]
[End element: standards]
[Start element: parsers]
[Start element: parser]
Xerces
[End element: parser]
[Start element: parser]
Crimson
[End element: parser]
[Start element: parser]
GNUJAXP
[End element: parser]
[End element: parsers]
[End element: java_xml]
[End Of Document]
