Java Interfaces and Classes

Any SAX-based XML processing application consists of two parts: a SAX parser and a set of handlers that the developer implements. Some helper classes make the job easier.

The core of SAX2 API contains two packages:

  • org.xml.sax

  • org.xml.sax.helpers

A third package, org.xml.sax.ext, defines standard interfaces for some optional features. It is not part of the SAX core.

Core Interfaces

We won't repeat the SAX 2 documentation, which is available for download on the official Web site of the project (see references at the end of the chapter).

Instead, we will discuss the steps that have to be taken and the classes that have to be implemented by developers of SAX-driven applications and by developers of SAX-compliant parsers.

Instantiating a Parser

Before an application can use the SAX API to read XML documents, it must create an instance of a SAX parser. The choice of parser is up to the developer. All SAX-compliant parsers implement the same standard interfaces, so any available parser can be plugged into an application.

For the purpose of this discussion, we will not focus on any particular implementation of a SAX parser. The examples given will work with any SAX implementation that supports the SAX API in Java. For more details about using specific parsers, you can refer to Chapter 16.

The developer has two ways to create a parser:

  • Instantiating the class that implements the XMLReader interface in a particular parser.

  • Obtaining a parser from a factory provided by the SAX API.

The following lines demonstrate the first approach. The class name is specific to the parser you use; this example assumes Apache Xerces, whose org.apache.xerces.parsers.SAXParser class implements XMLReader.

import org.xml.sax.XMLReader;

...

// Xerces implementation of XMLReader; substitute the class supplied by your parser
XMLReader myReader = new org.apache.xerces.parsers.SAXParser();

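The second approach is demonstrated here:

import org.xml.sax.XMLReader;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.XMLReaderFactory;

...

XMLReader myReader = null;
try
{
    // The class name is an ordinary string, so it can come from a configuration file
    myReader = XMLReaderFactory.createXMLReader("org.apache.xerces.parsers.SAXParser");
}
catch(SAXException se)
{
    // Report a problem: the class could not be found or instantiated
}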

The second example passes the parser's class name to a factory method as a string, which can be read from a configuration file or supplied as a startup parameter. This approach is usually more flexible, because it lets you switch between compliant parsers without recompiling any source code.
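To make the configurable approach concrete, here is a minimal sketch that takes the class name from a system property and, when none is given, falls back to the no-argument XMLReaderFactory.createXMLReader(). That method checks the standard org.xml.sax.driver system property and otherwise returns the platform's default reader. The class ReaderLoader and the property name parser.class are hypothetical; note also that XMLReaderFactory is deprecated since Java 9, though it still works.

```java
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.XMLReaderFactory;

public class ReaderLoader {
    // Creates the reader named by className, or the platform default when no
    // name is configured. Throws SAXException if the class cannot be loaded.
    @SuppressWarnings("deprecation")
    public static XMLReader load(String className) throws SAXException {
        if (className != null && !className.isEmpty()) {
            return XMLReaderFactory.createXMLReader(className);
        }
        // No name given: consult org.xml.sax.driver, then use the default reader
        return XMLReaderFactory.createXMLReader();
    }

    public static void main(String[] args) throws SAXException {
        // Hypothetical property, e.g. java -Dparser.class=org.apache.xerces.parsers.SAXParser ...
        XMLReader reader = load(System.getProperty("parser.class"));
        System.out.println("Using " + reader.getClass().getName());
    }
}
```

Switching parsers then only requires changing the startup parameter, not the code.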

Using the XMLReader Interface

The XMLReader interface used in this example has many useful methods, which give developers control over the event handlers associated with the document being parsed, the parser features in effect, and so on. Table 15.1 lists the methods of the interface.

Table 15.1. Methods of the XMLReader Interface
Name Parameters Description, Returns, Throws
getContentHandler

None.

Retrieves the content handler registered with a call to the setContentHandler method.

Returns:

ContentHandler— an implementation of a ContentHandler interface or null—if ContentHandler has not been registered.

Throws:

No exceptions.
setContentHandler

handler, ContentHandler— an implementation of a content handler to register with the parser.

Registers a content handler with the parser. The parser starts sending events to this handler immediately.

Returns:

No return value.

Throws:

java.lang.NullPointerException
getDTDHandler

None.

Retrieves the DTD handler registered with a call to the setDTDHandler method.

Returns:

DTDHandler— an implementation of a DTDHandler interface or null—if DTDHandler has not been registered.

Throws:

No exceptions.
setDTDHandler

handler, DTDHandler— an implementation of a DTD handler to register with the parser.

Registers a DTD handler with the parser. The parser starts sending events to this handler immediately.

Returns:

No return value.

Throws:

java.lang.NullPointerException
getErrorHandler

None.

Retrieves the error handler registered with a call to the setErrorHandler method.

Returns:

ErrorHandler— an implementation of an ErrorHandler interface or null—if ErrorHandler has not been registered.

Throws:

No exceptions.
setErrorHandler

handler, ErrorHandler— an implementation of an error handler to register with the parser.

Registers an error handler with the parser. The parser starts sending events to this handler immediately.

Returns:

No return value.

Throws:

java.lang.NullPointerException
getEntityResolver

None.

Retrieves the entity resolver registered with a call to the setEntityResolver method.

Returns:

EntityResolver— an implementation of the EntityResolver interface or null—if EntityResolver has not been registered.

Throws:

No exceptions.
setEntityResolver

resolver, EntityResolver— an implementation of an entity resolver to register with the parser.

Registers an entity resolver with the parser. The parser starts consulting the new resolver immediately.

Returns:

No return value.

Throws:

java.lang.NullPointerException
parse

source, InputSource— the source to read the document from.

A call to this method starts parsing a document. Any events encountered will be sent to registered handlers or ignored if there is no appropriate handler registered. The method will not return unless the parsing is complete or an exception has occurred.

Returns:

No return value.

Throws:

java.io.IOException or org.xml.sax.SAXException
parse

systemId, String— the URI of the source to read the document from.

Same as the previous method, except that the parser creates the InputSource itself from the given system identifier (URI).
getFeature

name, String— the name of the feature represented as a fully qualified URI.

Returns the current value of the requested feature. Some feature values may be available only in specific contexts, such as before, during, or after a parse.

Returns a boolean:

true if the feature is currently on, or false if it is off.

Throws:

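SAXNotRecognizedException if the feature name is unknown, or SAXNotSupportedException if the feature is recognized but its value cannot be determined at this time.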
setFeature

name, String— the name of the feature represented as a fully qualified URI.

value, boolean— the new state of the feature.

Turns the requested feature on or off. Some feature values may be immutable or mutable only in specific contexts, such as before, during, or after a parse.

Returns:

No return value.

Throws:

SAXNotRecognizedException or SAXNotSupportedException
getProperty

name, String— the name of the property represented as a fully qualified URI.

Returns the current value of the requested property. Some property values may be available only in specific contexts, such as before, during, or after a parse.

Returns:

The property value as a java.lang.Object.

Throws:

SAXNotRecognizedException or SAXNotSupportedException
setProperty

name, String— the name of the property represented as a fully qualified URI.

value, java.lang.Object— the value of the property.

Sets the value of the requested property. Some property values may be immutable or mutable only in specific contexts, such as before, during, or after a parse.

Returns:

No return value.

Throws:

SAXNotRecognizedException or SAXNotSupportedException

The XMLReader interface is a replacement for the Parser interface from version 1.0 and is a required interface for any SAX 2 driver (parser).
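As a short sketch of getFeature and setFeature, the following code turns on namespace processing and then reads back two features, treating an unrecognized or unsupported feature as "unknown". The feature URIs are the standard SAX2 names; the class FeatureProbe and its probe helper are hypothetical, and JAXP's SAXParserFactory is used only because it is guaranteed to supply an XMLReader in any JDK.

```java
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.SAXNotRecognizedException;
import org.xml.sax.SAXNotSupportedException;
import org.xml.sax.XMLReader;

public class FeatureProbe {
    // Standard SAX2 feature URIs
    public static final String NAMESPACES = "http://xml.org/sax/features/namespaces";
    public static final String VALIDATION = "http://xml.org/sax/features/validation";

    // Returns the feature's current value, or null when the reader does not
    // recognize the feature or cannot report its value right now.
    public static Boolean probe(XMLReader reader, String feature) {
        try {
            return reader.getFeature(feature);
        } catch (SAXNotRecognizedException | SAXNotSupportedException e) {
            return null;
        }
    }

    public static void main(String[] args) throws Exception {
        XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        reader.setFeature(NAMESPACES, true);   // turn namespace processing on
        System.out.println(NAMESPACES + " = " + probe(reader, NAMESPACES));
        System.out.println(VALIDATION + " = " + probe(reader, VALIDATION));
    }
}
```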

As you can see from Table 15.1, the XMLReader interface provides us with methods that can be used to register event handlers. This is what interests us in the next step.

Registering Event Handlers

After we have an instance of an XMLReader, we can register event handlers for the events we are interested in.

To do so, we use the set...Handler methods of the XMLReader interface, but before registering a handler we have to implement it.

Implementing the ContentHandler Interface

To start with, we need to implement the ContentHandler interface, whose methods the parser invokes (calls back) every time it encounters something of interest in the document being parsed.

Note

Callback methods are often utilized to handle events programmatically. For example, callbacks are used in the Windows operating system to report mouse events (and many other kinds of events), such as a mouse movement, to applications. The use of callbacks usually involves two steps—registration and handling.

Registration is needed to advise the source of events about the parties that are interested in receiving event notifications. Methods or functions are bound to an event type through registration.

Handling is what callback methods are written for. When an event occurs, the source of events (a parser in our case) invokes (or calls back) the methods registered to handle this particular type of event.

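In many callback systems, several handlers can be registered for the same event, and they are usually invoked in the order of registration. SAX is simpler: an XMLReader holds one handler of each type, and calling setContentHandler again replaces the previously registered handler.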


Table 15.2 lists the methods of the ContentHandler interface, which have to be implemented by an application developer.

Table 15.2. Methods of the ContentHandler Interface
Name Parameters Description, Returns, Throws
characters

ch, char[]— an array of characters from the XML document.

start, int— the starting position of the chunk of data in the array.

length, int— the length of the chunk.

The parser calls this method to report character data in the parsed XML document. Applications should make no assumptions about how the parser splits the data: a parser may deliver a run of text in a single chunk or across several calls, so text should be accumulated rather than processed one call at a time. Also keep in mind that reading the array outside the range specified by the start and length parameters gives unpredictable results.

Returns:

No return value.

Throws:

SAXException
ignorableWhitespace

ch, char[]— an array of characters from the XML document.

start, int— the starting position in the array.

length, int— the length of the chunk.

The parser calls this method to report ignorable whitespace in element content. Validating parsers must use this method; non-validating parsers may also use it if they are able to read the content models.

Returns:

No return value.

Throws:

SAXException
startDocument

None.

The parser calls this method only once, when it starts parsing a document, before any other callback except setDocumentLocator.

Returns:

No return value.

Throws:

SAXException
endDocument

None.

The parser calls this method only once, when it reaches the end of a document. No other events are reported after this one.

Returns:

No return value.

Throws:

SAXException
startElement

namespaceURI, String— the namespace used with the element name.

localName, String— the name of the element.

qName, String— the qualified name of the element.

atts, Attributes— the collection of attributes found within the element tag.

This method is called by the parser when it encounters an opening tag of an element.

Returns:

No return value.

Throws:

SAXException
endElement

namespaceURI, String— the namespace used with the element name.

localName, String— the name of the element.

qName, String— the qualified name of the element.

This method is called by the parser when it encounters a closing tag of an element.

Returns:

No return value.

Throws:

SAXException
setDocumentLocator

locator, Locator— the locator object that reports where in the document each event originated.

An application can use the Locator object supplied in a call to this method to find the place in the document where an event came from. Parsers are strongly encouraged, but not required, to supply this object; if they do, they call this method before any other ContentHandler method.

Returns:

No return value.

Throws:

No exceptions.
startPrefixMapping

prefix, String— the namespace prefix.

uri, String— the namespace URI.

The parser initiates this event when a namespace prefix-to-URI mapping comes into scope, immediately before the startElement event of the element that declares it.

Returns:

No return value.

Throws:

SAXException
endPrefixMapping

prefix, String— the namespace prefix.

The parser initiates this event when a namespace prefix-to-URI mapping goes out of scope, after the endElement event of the element that declared it.

Returns:

No return value.

Throws:

SAXException
processingInstruction

target, String— the target of the processing instruction.

data, String— data supplied with the instruction.

The event is initiated by parser when it encounters a processing instruction in the document being parsed.

Returns:

No return value.

Throws:

SAXException
skippedEntity

name, String— the name of the skipped entity.

A parser may skip an entity, for example when a non-validating parser has not seen its declaration, or when loading of external entities is turned off. In that case, it must report the skipped entity with this event.

Returns:

No return value.

Throws:

SAXException

SAX provides a default implementation of this interface, org.xml.sax.helpers.DefaultHandler, that can be used as a base class for any custom implementation. DefaultHandler implements do-nothing versions of the callbacks defined in the ContentHandler interface (and in three other handler interfaces), so application developers can derive their classes from it and override only the methods they need.
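Here is a minimal sketch of a content handler derived from DefaultHandler and registered with setContentHandler. It counts start tags and collects text, appending every characters() chunk because text may arrive in several pieces. The class name ElementCounter is hypothetical, and JAXP's SAXParserFactory is used only to obtain an XMLReader from the running JDK.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

public class ElementCounter extends DefaultHandler {
    public int elements;
    // characters() may be called several times per text run, so accumulate
    public final StringBuilder text = new StringBuilder();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        elements++;
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // Read only the slice [start, start + length) of the shared array
        text.append(ch, start, length);
    }

    public static ElementCounter parse(String xml) throws Exception {
        XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        ElementCounter counter = new ElementCounter();
        reader.setContentHandler(counter);              // registration step
        reader.parse(new InputSource(new StringReader(xml)));
        return counter;
    }

    public static void main(String[] args) throws Exception {
        ElementCounter c = parse("<a><b>Hello, </b><b>SAX</b></a>");
        System.out.println(c.elements + " elements, text: " + c.text); // 3 elements, text: Hello, SAX
    }
}
```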

Implementing the ErrorHandler Interface

After the content handler has been implemented, we need to make sure our software is able to respond to any problems that may occur during the parsing.

Three kinds of errors can occur during the parsing:

  • Fatal Error— This error occurs when something, such as a well-formedness violation, prevents the parser from processing the document any further. SAX uses the definition of a fatal error given in Section 1.2 of the W3C XML 1.0 Recommendation.

  • Error— This error is reported when the parser encounters a problem with the document, such as a validity error, but is able to recover and continue processing. SAX uses the definition of an error given in Section 1.2 of the W3C XML 1.0 Recommendation.

  • Warning— This event is reported to an application when a SAX parser wants to report something that is neither an error nor a fatal error according to Section 1.2 of the W3C XML 1.0 Recommendation.

Corresponding to these categories, the ErrorHandler interface defines three callback methods for application developers to implement. These methods are described in Table 15.3:

Table 15.3. Methods of the ErrorHandler Interface
Name Parameters Description, Returns, Throws
fatalError

exception, SAXParseException

Receives notification of a non-recoverable error, such as a well-formedness violation. The application must treat the document as unusable after this call, and the parser may stop reporting events.

Returns:

No return value.

Throws:

SAXException
error

exception, SAXParseException

Receives notification of a recoverable error, such as a validity error. The parser continues to report normal parsing events after this method returns.

Returns:

No return value.

Throws:

SAXException
warning

exception, SAXParseException

Receives notification of a warning. The parser continues to report normal parsing events after this method returns.

Returns:

No return value.

Throws:

SAXException

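Again, SAX provides a default implementation: the DefaultHandler class discussed earlier also implements the ErrorHandler interface. Its warning and error methods do nothing, while its fatalError method throws the SAXParseException it receives. If no error handler is registered at all, warnings and errors go unreported, and only fatal errors surface, as a SAXParseException thrown from parse.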
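A minimal sketch of a custom error handler follows. It records every problem with its line number, and rethrows fatal errors so that parsing stops. The class name CollectingErrorHandler and the message format are hypothetical choices for illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.XMLReader;

public class CollectingErrorHandler implements ErrorHandler {
    public final List<String> messages = new ArrayList<>();

    private void record(String kind, SAXParseException e) {
        messages.add(kind + " at line " + e.getLineNumber() + ": " + e.getMessage());
    }

    @Override public void warning(SAXParseException e) { record("warning", e); }

    @Override public void error(SAXParseException e) { record("error", e); } // recoverable

    @Override public void fatalError(SAXParseException e) throws SAXException {
        record("fatal", e);
        throw e; // the document cannot be processed further
    }

    // Parses xml and returns the handler with whatever problems it collected
    public static CollectingErrorHandler check(String xml) throws Exception {
        XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        CollectingErrorHandler handler = new CollectingErrorHandler();
        reader.setErrorHandler(handler);
        try {
            reader.parse(new InputSource(new StringReader(xml)));
        } catch (SAXParseException e) {
            // already recorded by fatalError()
        }
        return handler;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(check("<a><b></a>").messages); // mismatched end tag
    }
}
```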

Implementing the Interface DTDHandler

The next interface to implement is DTDHandler. The parser calls its methods when it encounters notation and unparsed entity declarations, so application developers can use this information.

Note

Note that this interface has nothing to do with validating a document against its DTD. To turn validation on, an application calls the setFeature method of its XMLReader with the feature name http://xml.org/sax/features/validation and the value true.
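The following sketch turns on the validation feature and counts the validity errors reported through the error callback. The class names are hypothetical, and the two sample documents use internal DTD subsets so that no external files are needed.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

public class ValidationSwitch {
    // Counts validity errors, which DefaultHandler would otherwise ignore
    static class ErrorCounter extends DefaultHandler {
        int errors;
        @Override public void error(SAXParseException e) { errors++; }
    }

    public static int countValidityErrors(String xml) throws Exception {
        XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        reader.setFeature("http://xml.org/sax/features/validation", true);
        ErrorCounter counter = new ErrorCounter();
        reader.setErrorHandler(counter);
        reader.parse(new InputSource(new StringReader(xml)));
        return counter.errors;
    }

    public static void main(String[] args) throws Exception {
        String valid = "<!DOCTYPE a [<!ELEMENT a (#PCDATA)>]><a>ok</a>";
        String invalid = "<!DOCTYPE a [<!ELEMENT a EMPTY>]><a>not empty</a>";
        System.out.println(countValidityErrors(valid) + " / " + countValidityErrors(invalid));
    }
}
```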


Because the future of DTDs is rather unclear and it looks as though XML Schemas will preempt the concept of DTDs, it is unlikely that many of the readers will have to implement this handler. However, if needed, Table 15.4 shows the methods to implement.

Table 15.4. Methods of the DTDHandler Interface
Name Parameters Description, Returns, Throws
notationDecl

name, String— notation name.

publicId, String— public identifier of the notation.

systemId, String— system identifier of the notation.

Process the declaration of a notation.

Returns:

No return value.

Throws:

SAXException
unparsedEntityDecl

name, String— entity name.

publicId, String— public identifier of the entity.

systemId, String— system identifier of the entity.

notationName, String: name of the notation associated with the entity.

Process the declaration of an unparsed entity.

Returns:

No return value.

Throws:

SAXException

The DefaultHandler class implements empty methods for this handler.
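If you do need the DTDHandler events, a minimal sketch might look like the following. It logs notation and unparsed entity declarations from an internal DTD subset. The class name DeclarationLogger and the sample declarations are hypothetical, and the system identifiers are deliberately not recorded because parsers may report them in absolute form.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

public class DeclarationLogger extends DefaultHandler {
    public final List<String> declarations = new ArrayList<>();

    @Override
    public void notationDecl(String name, String publicId, String systemId) {
        declarations.add("notation " + name);
    }

    @Override
    public void unparsedEntityDecl(String name, String publicId,
                                   String systemId, String notationName) {
        declarations.add("entity " + name + " (notation " + notationName + ")");
    }

    public static List<String> collect(String xml) throws Exception {
        XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        DeclarationLogger logger = new DeclarationLogger();
        reader.setDTDHandler(logger);   // only DTD events are of interest here
        reader.parse(new InputSource(new StringReader(xml)));
        return logger.declarations;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<!DOCTYPE doc [\n"
                + "<!NOTATION gif SYSTEM \"image/gif\">\n"
                + "<!ENTITY logo SYSTEM \"logo.gif\" NDATA gif>\n"
                + "]>\n"
                + "<doc/>";
        System.out.println(collect(xml));
    }
}
```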

Implementing the EntityResolver Interface

Most applications do not need to implement the EntityResolver interface. Its only purpose is to let an application handle external entities, including an external DTD subset, before the parser tries to open them.

It can be useful when an application uses nonstandard system identifiers or nonstandard ways of resolving them, or when it needs to redirect an identifier to a local copy.

The only method of the interface is described in Table 15.5. The DefaultHandler class contains a default implementation of the interface, which returns null so that the parser resolves every entity itself.

Table 15.5. Methods of the EntityResolver Interface
Name Parameters Description, Returns, Throws
resolveEntity

publicId, String— public identifier of the entity.

systemId, String— system identifier of the entity.

Called when an external entity is encountered and before the parser opens the entity.

Returns:

InputSource— an object which can be used by the parser to read an entity resolved by the application or null when the application requests the parser to try resolving it itself.

Throws:

SAXException or java.io.IOException
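A minimal sketch of a resolver that serves one external DTD from memory instead of the network follows. The system identifier http://example.com/note.dtd, its DTD content, and the class name LocalDtdResolver are all hypothetical. Returning null for any other identifier lets the parser open those entities itself.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

public class LocalDtdResolver extends DefaultHandler {
    // Hypothetical system ID and DTD text used only for this example
    static final String DTD_ID = "http://example.com/note.dtd";
    static final String DTD_TEXT = "<!ENTITY greeting \"hello\">";

    final StringBuilder text = new StringBuilder();

    @Override
    public InputSource resolveEntity(String publicId, String systemId) {
        if (DTD_ID.equals(systemId)) {
            return new InputSource(new StringReader(DTD_TEXT)); // serve from memory
        }
        return null; // let the parser open other entities itself
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        text.append(ch, start, length);
    }

    public static String parse(String xml) throws Exception {
        XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        LocalDtdResolver handler = new LocalDtdResolver();
        reader.setEntityResolver(handler);
        reader.setContentHandler(handler);
        reader.parse(new InputSource(new StringReader(xml)));
        return handler.text.toString();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<!DOCTYPE note SYSTEM \"" + DTD_ID + "\"><note>&greeting;, world</note>";
        System.out.println(parse(xml)); // hello, world
    }
}
```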

Having instantiated a parser to read through XML documents and implemented the handlers to process events, your application is now ready to work with XML documents. However, there are other useful features of SAX, such as the ability to change or eliminate events that can make your life much easier when it comes to processing XML input.

Filtering Events

One of the advantages of event-based processing is that it allows developers to build data processing solutions from chains or sequences of filters and handlers, which process and possibly modify data while events “flow” through them.

For that purpose, SAX 2.0 provides a special interface, XMLFilter, and a default implementation class, XMLFilterImpl (in the org.xml.sax.helpers package). XMLFilter is a simple extension of XMLReader that adds methods for linking a filter to the reader or filter that supplies its events.

XMLFilter Interface

There are only two methods in the XMLFilter interface as described in Table 15.6.

Table 15.6. Methods of the XMLFilter Interface
Name Parameters Description, Returns, Throws
setParent

parent, XMLReader— the reader object to use with the filter.

Sets the source of events for the filter, that is, links the filter with the XMLReader that will supply it with events. You can build a chain of filters with this method by passing another XMLFilter object as the parent.

Returns:

No return value.

Throws:

No exceptions.
getParent

None.

Retrieves the parent reader or filter, which is the source of events for this filter.

Returns:

XMLReader— an object implementing the XMLReader or XMLFilter interface.

Throws:

No exceptions.

XMLFilterImpl Class

The XMLFilterImpl class provides application developers with a convenient default implementation of the XMLFilter interface together with all four handler interfaces previously discussed. By default, it passes every event through, unchanged, to the handlers registered with it.

To write your own filter, derive a class from XMLFilterImpl and override only the callbacks you want to change, calling the superclass method to pass the (possibly modified) event on.

Following is a simple Java example of a Y2K-aware filter, which replaces every Y with K and every y with k (that's why it's Y2K-aware) in element names:

Listing 15.2. A Simple Y2K Filter—Replaces "Y" with "K" and "y" with "k"
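import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.XMLFilterImpl;

public class y2kFilter extends XMLFilterImpl
{
    public y2kFilter(XMLReader rdr)
    {
        super(rdr);    // rdr becomes the parent, i.e. the source of events
    }

    // Replaces every "Y" with "K" and every "y" with "k"
    private static String y2k(String name)
    {
        return name.replace('Y', 'K').replace('y', 'k');
    }

    @Override
    public void startElement(
                String uri,
                String localName,
                String qName,
                Attributes atts) throws SAXException
    {
        // Rename both forms of the name so downstream handlers see consistent values
        super.startElement(uri, y2k(localName), y2k(qName), atts);
    }

    @Override
    public void endElement(
                String uri,
                String localName,
                String qName) throws SAXException
    {
        super.endElement(uri, y2k(localName), y2k(qName));
    }
}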

Consider the following portion of an XML file:

<january> 
    <new-year-day date = "1">
        <celebrate drink = "champagne" />
    </new-year-day>
</january>
<february>

</february>

The handler following our y2kFilter will receive startElement and endElement events with the following element names.

Table 15.7. Element Names As a Result of Y2K Filter startElement and endElement Events
Event Callback Element Name
startElement januark
startElement new-kear-dak
startElement celebrate
endElement celebrate
endElement new-kear-dak
endElement januark
startElement februark
endElement februark
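To see the whole chain at work, here is a self-contained sketch that links a parser to a renaming filter with setParent, registers a recording handler on the filter, and parses through the filter. The renaming logic repeats Listing 15.2 so that the example compiles on its own; FilterChainDemo and RenameFilter are hypothetical names, and the sample is the january element from the file above. Namespace awareness is turned on so that the parser fills in localName.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;
import org.xml.sax.helpers.XMLFilterImpl;

public class FilterChainDemo {
    // Same renaming as Listing 15.2
    static class RenameFilter extends XMLFilterImpl {
        private static String y2k(String s) { return s.replace('Y', 'K').replace('y', 'k'); }

        @Override
        public void startElement(String uri, String local, String qName, Attributes atts) throws SAXException {
            super.startElement(uri, y2k(local), y2k(qName), atts);
        }

        @Override
        public void endElement(String uri, String local, String qName) throws SAXException {
            super.endElement(uri, y2k(local), y2k(qName));
        }
    }

    public static List<String> run(String xml) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);              // so localName is reported
        XMLReader parser = factory.newSAXParser().getXMLReader();

        RenameFilter filter = new RenameFilter();
        filter.setParent(parser);                     // parser -> filter

        List<String> events = new ArrayList<>();
        filter.setContentHandler(new DefaultHandler() {   // filter -> handler
            @Override
            public void startElement(String uri, String local, String qName, Attributes atts) {
                events.add("start " + local);
            }

            @Override
            public void endElement(String uri, String local, String qName) {
                events.add("end " + local);
            }
        });
        // Parse through the filter, not the parser, so events pass through it
        filter.parse(new InputSource(new StringReader(xml)));
        return events;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<january><new-year-day date=\"1\"><celebrate drink=\"champagne\"/></new-year-day></january>";
        System.out.println(run(xml));
        // [start januark, start new-kear-dak, start celebrate, end celebrate, end new-kear-dak, end januark]
    }
}
```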

Implementing SAX Parsers

Now that you have been introduced to several of the interfaces in SAX, you may want the details on the implementation of SAX parsers. Although we cannot delve into the details of implementing SAX parsers in this book, we can give you a list of the interfaces that you must implement to write your own.

SAX provides implementations of most of the interfaces needed for both application and parser writers. The minimum set of parser interfaces that have to be implemented is as follows:

  • XMLReader (Parser for SAX 1–compliant parsers)— This interface is used to set features of the parser and start the parsing process. The implementation of this interface will be the source of events for applications.

  • Locator is used by an application whenever it needs to associate an event received from the parser with its location in the document being parsed.

  • Attributes (AttributeList for SAX 1–compliant parsers)— Applications use this interface for obtaining lists of elements' attributes.
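Although writing a parser is beyond our scope, it may help to see what a parser's Locator offers from the application side. The following minimal sketch keeps the Locator passed to setDocumentLocator and records the line on which each element starts. The class name LineTracker is hypothetical, and the code allows for a parser that supplies no locator.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.Locator;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

public class LineTracker extends DefaultHandler {
    private Locator locator;                  // may stay null if the parser supplies none
    public final List<String> positions = new ArrayList<>();

    @Override
    public void setDocumentLocator(Locator locator) {
        this.locator = locator;               // called before any other ContentHandler method
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        // The locator is meaningful only during a callback, so query it right away
        int line = (locator != null) ? locator.getLineNumber() : -1;
        positions.add(qName + "@" + line);
    }

    public static List<String> track(String xml) throws Exception {
        XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        LineTracker tracker = new LineTracker();
        reader.setContentHandler(tracker);
        reader.parse(new InputSource(new StringReader(xml)));
        return tracker.positions;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(track("<a>\n  <b/>\n</a>")); // [a@1, b@2]
    }
}
```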

SAX Extensions

In addition to the set of core interfaces discussed before, the SAX 2 specification defines some features that are optional for parsers but standardized, so that every SAX-compliant parser that supports them behaves the same way toward applications.

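The standard SAX2 Extensions, in the org.xml.sax.ext package, provide the following additional interfaces (version 1.0 of the extensions defined the first two; version 1.1 added the other three):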

  • DeclHandler gives application developers more detailed access to DTD declarations (element, attribute, and entity declarations) than the standard handlers do.

  • LexicalHandler is an additional handler interface for lexical events: CDATA sections, comments, entity boundaries, and the start and end of the DTD.

  • Attributes2 extends the Attributes interface with information such as whether an attribute was declared in the DTD and whether it was specified in the document or defaulted.

  • EntityResolver2 extends the EntityResolver interface with a way to supply an external DTD subset for documents that have none, and with a resolveEntity method that also receives the entity name and base URI.

  • Locator2 extends the Locator interface with methods that return the XML version and the character encoding of the entity being parsed.
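Because LexicalHandler is not one of the core handlers, it has no setter on XMLReader; it is registered with setProperty under the standard property name http://xml.org/sax/properties/lexical-handler. The following minimal sketch builds on org.xml.sax.ext.DefaultHandler2, which provides empty implementations of the extension handlers, and collects comments and counts CDATA sections. The class name CommentCollector is hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.ext.DefaultHandler2;

public class CommentCollector extends DefaultHandler2 {
    static final String LEXICAL_HANDLER = "http://xml.org/sax/properties/lexical-handler";

    public final List<String> comments = new ArrayList<>();
    public int cdataSections;

    @Override
    public void comment(char[] ch, int start, int length) {
        comments.add(new String(ch, start, length)); // text between <!-- and -->
    }

    @Override
    public void startCDATA() {
        cdataSections++;
    }

    public static CommentCollector collect(String xml) throws Exception {
        XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        CommentCollector collector = new CommentCollector();
        reader.setContentHandler(collector);
        reader.setProperty(LEXICAL_HANDLER, collector); // register the extension handler
        reader.parse(new InputSource(new StringReader(xml)));
        return collector;
    }

    public static void main(String[] args) throws Exception {
        CommentCollector c = collect("<a><!-- first --><![CDATA[<raw>]]><!--second--></a>");
        System.out.println(c.comments + ", CDATA sections: " + c.cdataSections);
    }
}
```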
