A Brief Note on Handling XML

This chapter and Chapter 11, Web Service Security, assume basic familiarity with XML. We also use a few classes from JAXP (Java API for XML Processing), a Java API for processing XML data, though prior familiarity with JAXP is not a prerequisite. Internally, JAXP relies on two other APIs: SAX (Simple API for XML), a public domain API for parsing XML data, developed by David Megginson and others through discussions on the XML-DEV mailing list; and DOM (Document Object Model), a W3C Recommendation for representing XML content in memory. JAXP also supports transformation of XML data through XSLT (XSL Transformations). Refer to the Further Reading section for references to these standards and APIs.

In this section, our aim is to refresh those aspects of XML that we use in this chapter and later in Chapter 11, Web Service Security. To this end, let us analyze the simple XML document shown in Listing 7-1, which also serves as the input document for a few subsequent examples. This document can be found in source file book.xml, within the data subdirectory of the JSTK installation directory and in each example directory where it is used.

Listing 7-1. XML file book.xml
<?xml version="1.0"?>

<bk:book id="j2ee_sec"
xmlns:bk="http://www.pankaj-k.net/schemas/book">
  <title id="book_title"
subject="bk:programming">J2EE Security</title>
  <author id="book_author">Pankaj Kumar</author>
  <publisher id="book_publisher">Prentice Hall</publisher>
  <bookinfo
      id="book_info"
      xmlns:bi="http://www.pankaj-k.net/schemas/bookinfo"
      xmlns:book="http://www.pankaj-k.net/schemas/book">
    <bi:categories book:area='technology' book:type="profession">
      <bi:category>Security<!-- Main Category --></bi:category>
      <bi:category>Enterprise Technology</bi:category>
    </bi:categories>
    <bk:keywords>J2EE, Security, Servlet, EJB, Web Service
</bk:keywords>
  </bookinfo>
</bk:book>

On close examination, you notice that:

  • The XML document starts with an XML declaration specifying XML version as “1.0”.

  • Root element book is in the namespace associated with URI "http://www.pankaj-k.net/schemas/book" and identified by prefix bk.

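  • The root element has four child elements (title, author, publisher, and bookinfo), none of which carries a prefix. Because the document declares no default namespace, these elements are in no namespace.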

  • Each child element has an id attribute. By convention, such attributes are treated as being of XML type ID and can be used to identify elements within a document, although formally a DTD or schema must declare them as such. Within the same XML document, two attributes of type ID cannot have the same value. Later on, we will use this attribute to address elements within the document.

  • Single quote character (') is used for specifying the value of attribute book:area in element bi:categories. Other attribute values are quoted with double quote character (").

  • There is a comment, enclosed between "<!--" and "-->" in the text content of a bi:category element.

  • Namespace prefixes bk and book are bound to the same URI, "http://www.pankaj-k.net/schemas/book", so names qualified with either prefix belong to the same namespace.
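
Several of these observations can be checked programmatically. The following is a minimal sketch, not part of the book's example sources: the class name NamespaceObservations and the abridged inline copy of Listing 7-1 are assumptions made for illustration. It parses the document with a namespace-aware parser and prints what the DOM reports for namespaces, the single-quoted attribute, and the comment.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical demo class; not part of the book's example sources.
class NamespaceObservations {
  static final String BOOK_NS = "http://www.pankaj-k.net/schemas/book";
  static final String INFO_NS = "http://www.pankaj-k.net/schemas/bookinfo";

  // Abridged copy of Listing 7-1, inlined to keep the sketch self-contained.
  static final String BOOK_XML =
      "<?xml version=\"1.0\"?>\n"
      + "<bk:book id=\"j2ee_sec\" xmlns:bk=\"" + BOOK_NS + "\">\n"
      + "  <title id=\"book_title\">J2EE Security</title>\n"
      + "  <bookinfo id=\"book_info\" xmlns:bi=\"" + INFO_NS + "\"\n"
      + "      xmlns:book=\"" + BOOK_NS + "\">\n"
      + "    <bi:categories book:area='technology' book:type=\"profession\">\n"
      + "      <bi:category>Security<!-- Main Category --></bi:category>\n"
      + "    </bi:categories>\n"
      + "  </bookinfo>\n"
      + "</bk:book>\n";

  // Parses an XML string; checked exceptions are wrapped for brevity.
  static Document parse(String xml) {
    try {
      DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
      dbf.setNamespaceAware(true); // needed for namespace URIs and local names
      return dbf.newDocumentBuilder()
          .parse(new InputSource(new StringReader(xml)));
    } catch (Exception e) {
      throw new IllegalStateException("could not parse inline XML", e);
    }
  }

  public static void main(String[] args) {
    Document doc = parse(BOOK_XML);

    // The root element is identified by prefix bk and lives in BOOK_NS.
    Element root = doc.getDocumentElement();
    System.out.println("root prefix: " + root.getPrefix());
    System.out.println("root namespace: " + root.getNamespaceURI());

    // The unprefixed child has no namespace at all.
    Element title = (Element) doc.getElementsByTagName("title").item(0);
    System.out.println("title namespace: " + title.getNamespaceURI());

    // The attribute was written with prefix book, yet it is found through
    // the same URI that prefix bk is bound to. The quote style is not kept.
    Element categories =
        (Element) doc.getElementsByTagNameNS(INFO_NS, "categories").item(0);
    System.out.println("area: " + categories.getAttributeNS(BOOK_NS, "area"));

    // The comment is a separate node following the text node.
    Element category =
        (Element) doc.getElementsByTagNameNS(INFO_NS, "category").item(0);
    Node last = category.getLastChild();
    System.out.println("last child is comment: "
        + (last.getNodeType() == Node.COMMENT_NODE));
  }
}
```

Note that the lookup of the area attribute uses the namespace URI rather than a prefix; the prefix is only a document-local shorthand for the URI.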

The significance of these observations becomes apparent as we use the document of Listing 7-1 as the test data for subsequent examples. If you compare it with the electronic copy, you will notice that the text shown in Listing 7-1 has been slightly modified by introducing additional newline characters. This is done only for better appearance on paper. The same is true for most of the XML documents and fragments shown in this and subsequent chapters.

It is quite common for a program dealing with XML to read an XML file, hold the XML data in memory as a DOM-based tree structure, manipulate it, and write the XML text corresponding to the modified structure to the same or another file. To read and write XML data, we use utility class XmlUtility, defined in source file XmlUtility.java and reproduced in Listing 7-2.

Listing 7-2. Utility class to read and write XML data
// File: src\jsbook\ch7\ex1\XmlUtility.java
import java.io.OutputStream;
import java.io.IOException;
import java.io.FileNotFoundException;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.Transformer;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.TransformerConfigurationException;
import javax.xml.transform.TransformerException;
import org.xml.sax.SAXException;
import org.w3c.dom.Document;

public class XmlUtility {
  // Parses the named XML file into an in-memory DOM tree.
  public static Document readXML(String filename) throws
      ParserConfigurationException, FileNotFoundException,
      SAXException, IOException {
    DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
    // Without this, the parser does not report namespace URIs and local names.
    dbf.setNamespaceAware(true);
    DocumentBuilder db = dbf.newDocumentBuilder();
    // Note: parse(String) interprets its argument as a URI.
    Document doc = db.parse(filename);
    return doc;
  }

  // Serializes the DOM tree as XML text to the given stream.
  public static void writeXML(Document doc, OutputStream os) throws
      TransformerConfigurationException, TransformerException {
    TransformerFactory tf = TransformerFactory.newInstance();
    // newTransformer() without a stylesheet yields an identity transformer.
    Transformer transformer = tf.newTransformer();
    transformer.transform(new DOMSource(doc), new StreamResult(os));
  }
}

As you can see, the static method readXML() uses the JAXP class DocumentBuilderFactory to create an instance of another JAXP class, DocumentBuilder, and then invokes its parse() method with the input filename as the parameter to parse the XML data and create a W3C DOM Document object. The Document object is at the root of a tree, with each node of the tree representing an information item of the XML document, such as an element, an attribute, character data, and so on.
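
To make the tree structure concrete, here is a minimal sketch that walks a DOM tree and records one line per node. The class name DomTreeWalk, its methods, and the small inline sample document are assumptions made for illustration; they are not part of the book's example sources. Attributes are not children of their element in the DOM, so the sketch reaches them through getAttributes().

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical demo class; not part of the book's example sources.
class DomTreeWalk {
  // Small stand-in document, inlined to keep the sketch self-contained.
  static final String SAMPLE_XML =
      "<book id=\"b1\"><title>J2EE Security<!-- draft --></title></book>";

  // Parses the XML text and returns an indented outline of its DOM tree.
  static String outline(String xml) {
    try {
      DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
      dbf.setNamespaceAware(true);
      Document doc = dbf.newDocumentBuilder()
          .parse(new InputSource(new StringReader(xml)));
      StringBuilder out = new StringBuilder();
      walk(doc, 0, out);
      return out.toString();
    } catch (Exception e) {
      throw new IllegalStateException("could not parse inline XML", e);
    }
  }

  static void indent(int depth, StringBuilder out) {
    for (int i = 0; i < depth; i++) out.append("  ");
  }

  // Appends one line for the node, then its attributes, then its children.
  static void walk(Node node, int depth, StringBuilder out) {
    indent(depth, out);
    switch (node.getNodeType()) {
      case Node.DOCUMENT_NODE:
        out.append("DOCUMENT");
        break;
      case Node.ELEMENT_NODE:
        out.append("ELEMENT ").append(node.getNodeName());
        break;
      case Node.TEXT_NODE:
        out.append("TEXT \"").append(node.getNodeValue()).append('"');
        break;
      case Node.COMMENT_NODE:
        out.append("COMMENT \"").append(node.getNodeValue()).append('"');
        break;
      default:
        out.append("OTHER ").append(node.getNodeName());
    }
    out.append('\n');

    // getAttributes() returns null for every node type except elements.
    NamedNodeMap attrs = node.getAttributes();
    if (attrs != null) {
      for (int i = 0; i < attrs.getLength(); i++) {
        Node attr = attrs.item(i);
        indent(depth + 1, out);
        out.append("ATTRIBUTE ").append(attr.getNodeName())
            .append("=\"").append(attr.getNodeValue()).append("\"\n");
      }
    }
    for (Node c = node.getFirstChild(); c != null; c = c.getNextSibling()) {
      walk(c, depth + 1, out);
    }
  }

  public static void main(String[] args) {
    System.out.print(outline(SAMPLE_XML));
  }
}
```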

There is no direct method in the Document interface to serialize its nodes and write the serialized content to a file. To be able to do so, the method writeXML() uses an identity transformer with DOM Document as input and OutputStream as output. The transform() operation transforms the tree structure to a serialized text representation and writes it to the OutputStream. The OutputStream could be tied to a file or memory-based byte array.
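
The following sketch illustrates this round trip with an in-memory byte array as the output target. The class name RoundTripSketch and the inline sample document are assumptions made for illustration, and the sketch repeats the parsing and serialization steps of XmlUtility rather than calling it, so that it can run on its own.

```java
import java.io.ByteArrayOutputStream;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical demo class; not part of the book's example sources.
class RoundTripSketch {
  // Sample input; note the single-quoted attribute value.
  static final String SAMPLE_XML =
      "<book id='j2ee_sec'><title>J2EE Security</title></book>";

  // Parses an XML string; checked exceptions are wrapped for brevity.
  static Document parse(String xml) {
    try {
      DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
      dbf.setNamespaceAware(true);
      return dbf.newDocumentBuilder()
          .parse(new InputSource(new StringReader(xml)));
    } catch (Exception e) {
      throw new IllegalStateException("could not parse XML", e);
    }
  }

  // Serializes the tree through an identity transformer into a byte array
  // and returns the bytes decoded as UTF-8, the transformer's default here.
  static String serialize(Document doc) {
    try {
      Transformer identity = TransformerFactory.newInstance().newTransformer();
      ByteArrayOutputStream bos = new ByteArrayOutputStream();
      identity.transform(new DOMSource(doc), new StreamResult(bos));
      return new String(bos.toByteArray(), StandardCharsets.UTF_8);
    } catch (Exception e) {
      throw new IllegalStateException("could not serialize document", e);
    }
  }

  public static void main(String[] args) {
    Document doc = parse(SAMPLE_XML);
    String text = serialize(doc);
    System.out.println(text);

    // Parsing the serialized text again should give an equivalent tree.
    Document again = parse(text);
    System.out.println("equivalent tree: "
        + doc.getDocumentElement().isEqualNode(again.getDocumentElement()));
  }
}
```

The serialized text is generally not identical, character for character, to the text that was parsed. For example, the serializer typically adds an XML declaration and chooses its own quote character for attribute values, even though the tree it represents is equivalent.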

Utility class XmlUtility is used in many of the subsequent examples. To keep the examples simple and self-contained, this file is placed in all the example directories where it is needed.
