Document Object Model (DOM) Parser


When you use the DOM API to parse an XML document, a tree structure representing the XML document is built in memory. You can then analyze the nodes of the tree to discover the XML contents.

The mechanism for instantiating a DOM parser is very similar to that for a SAX parser: you obtain a new instance of a DocumentBuilderFactory and use it to create a new DocumentBuilder.

The parse() method is called on this DocumentBuilder object to return an object that conforms to the public Document interface. This object represents the XML document tree. The following code fragment creates a DOM parser and reads the XML document from a file called text.xml:

DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
DocumentBuilder builder = factory.newDocumentBuilder();
document = builder.parse(new File("text.xml"));

With the DocumentBuilder.parse() method, you are not restricted to reading XML only from a file; you can also use a constructed InputStream or read from a source defined by a URL.
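As a rough illustration of the non-file case, the following sketch parses XML held in a string by wrapping it in a ByteArrayInputStream. The class name ParseFromStream, the helper parseString(), and the sample XML are invented for this example; they are not part of the chapter's application.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

class ParseFromStream {

    // Hypothetical helper: parses XML text by presenting it as an InputStream.
    static Document parseString(String xml) {
        try {
            DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
            InputStream in =
                new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8));
            return builder.parse(in);
        } catch (Exception ex) {
            // Wrap parser, SAX, and IO failures so callers need no checked handling
            throw new IllegalStateException("Could not parse XML", ex);
        }
    }

    public static void main(String[] args) {
        // Sample XML is an assumption made for this sketch
        Document doc = parseString("<jobs><job id=\"1\">Cleaner</job></jobs>");
        System.out.println(doc.getDocumentElement().getNodeName());
    }
}
```

To read from a URL instead, DocumentBuilder also has a parse(String uri) overload that accepts the URI as a string.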

There are a number of methods provided in the Document interface to access the nodes in the tree. These are listed in Table 16.8.

The normalize() method should always be used to put all text nodes into a form where there are no adjacent text nodes or empty text nodes. In this form, the DOM view better reflects the XML structure.
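The effect of normalize() is easiest to see on a tree built by hand. The sketch below, which assumes an invented NormalizeDemo class and element name, adds two adjacent text nodes and one empty text node to an element and counts the children before and after normalizing.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

class NormalizeDemo {

    // Returns { child count before normalize(), child count after normalize() }.
    static int[] childCounts() {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder().newDocument();
            Element root = doc.createElement("note");
            doc.appendChild(root);

            // Two adjacent text nodes followed by an empty one
            root.appendChild(doc.createTextNode("Hello, "));
            root.appendChild(doc.createTextNode("world"));
            root.appendChild(doc.createTextNode(""));

            int before = root.getChildNodes().getLength();
            root.normalize();   // merges adjacent text nodes, drops empty ones
            int after = root.getChildNodes().getLength();
            return new int[] { before, after };
        } catch (ParserConfigurationException ex) {
            throw new IllegalStateException("Could not create DOM builder", ex);
        }
    }

    public static void main(String[] args) {
        int[] counts = childCounts();
        System.out.println("before: " + counts[0] + ", after: " + counts[1]);
    }
}
```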

As already shown, a DOM parser is instantiated in a similar manner as a SAX parser; the code should be familiar:

DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
DocumentBuilder builder = factory.newDocumentBuilder();
document = builder.parse(new File(argv[0]));

This is where the similarity ends. At this point, the DOM parser has built an in-memory representation of the document that will look something like Figure 16.2.

Figure 16.2. Diagram of DOM tree.

The root of the DOM tree is obtained with the getDocumentElement() method.

Element root = document.getDocumentElement();

This method returns an Element, which is simply a node that may have attributes associated with it. An element can be the parent to other elements.

There are a number of methods provided in the Document interface, most of them inherited from the Node interface, to access the nodes in the tree; they are listed in Table 16.8. These methods return either a Node or a NodeList (an ordered collection of nodes).

Table 16.8. Document Interface Methods to Traverse a DOM Tree
Method Name	Description
`getDocumentElement()`	Allows direct access to the root element of the document
`getElementsByTagName(String)`	Returns a `NodeList` of all the elements with the given tag name in the order in which they are encountered in the tree
`getChildNodes()`	A `NodeList` that contains all children of this node
`getParentNode()`	The parent of this node
`getFirstChild()`	The first child of this node
`getLastChild()`	The last child of this node
`getPreviousSibling()`	The node immediately preceding this node
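The following sketch exercises several of the methods from Table 16.8 on a small in-memory document. The TraverseDemo class, its parse() helper, and the sample XML are assumptions made for illustration only.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class TraverseDemo {

    // Hypothetical helper: parses XML text supplied as a string.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        } catch (Exception ex) {
            throw new IllegalStateException("Could not parse XML", ex);
        }
    }

    public static void main(String[] args) {
        // No whitespace between tags, so every child of <jobs> is an element
        Document doc = parse("<jobs><job>Cleaner</job><job>Cook</job></jobs>");
        Element root = doc.getDocumentElement();

        // All <job> elements, in document order
        NodeList jobs = doc.getElementsByTagName("job");
        System.out.println("job elements: " + jobs.getLength());

        // Step between siblings, down to text children, and back up to the parent
        Node last = root.getLastChild();
        Node previous = last.getPreviousSibling();
        System.out.println("last job: " + last.getFirstChild().getNodeValue());
        System.out.println("previous job: " + previous.getFirstChild().getNodeValue());
        System.out.println("parent of last: " + last.getParentNode().getNodeName());
    }
}
```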

In the DOM application you are about to build, the getChildNodes() method is used to recursively traverse the DOM tree. The NodeList.getLength() method can then be used to find out the number of nodes in the NodeList.

NodeList children = node.getChildNodes();
int len = (children != null) ? children.getLength() : 0;
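To show the recursion on its own, the sketch below uses the same getChildNodes() and getLength() pattern to count the elements in a subtree. The CountElements class, its methods, and the sample XML are hypothetical; the chapter's own application prints the nodes rather than counting them.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class CountElements {

    // Hypothetical helper: parses XML text supplied as a string.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        } catch (Exception ex) {
            throw new IllegalStateException("Could not parse XML", ex);
        }
    }

    // Counts the element nodes in the subtree rooted at node, including node itself.
    static int countElements(Node node) {
        int count = (node.getNodeType() == Node.ELEMENT_NODE) ? 1 : 0;
        NodeList children = node.getChildNodes();
        int len = (children != null) ? children.getLength() : 0;
        for (int i = 0; i < len; i++) {
            count += countElements(children.item(i));   // recurse into each child
        }
        return count;
    }

    public static void main(String[] args) {
        Document doc = parse("<a><b/><c><d/></c>text</a>");
        System.out.println(countElements(doc.getDocumentElement()));
    }
}
```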

In addition to the tree traversal methods, the Node interface provides the methods listed in Table 16.9 to investigate the contents of a node.

Table 16.9. Node Interface Methods to Inspect DOM Nodes
Method Name	Description
`getAttributes()`	A `NamedNodeMap` containing the attributes of a node if it is an `Element` or `null` if it is not.
`getNodeName()`	A string representing the name of this node (the tag).
`getNodeType()`	A code representing the type of the underlying object. A node can be one of `ELEMENT_NODE`, `ATTRIBUTE_NODE`, `TEXT_NODE`, `CDATA_SECTION_NODE`, `ENTITY_REFERENCE_NODE`, `ENTITY_NODE`, `PROCESSING_INSTRUCTION_NODE`, `COMMENT_NODE`, `DOCUMENT_NODE`, `DOCUMENT_TYPE_NODE`, `DOCUMENT_FRAGMENT_NODE`, `NOTATION_NODE`.
`getNodeValue()`	A string representing the value of this node. If the node is a text node, the value will be the contents of the text node; for an attribute node, it will be the string assigned to the attribute. For most node types, there is no value and a call to this method will return `null`.
`getNamespaceURI()`	The namespace URI of this node.
`hasAttributes()`	Returns a `boolean` to indicate whether this node has any attributes.
`hasChildNodes()`	Returns a `boolean` to indicate whether this node has any children.
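As an illustration of the inspection methods in Table 16.9, the sketch below builds a one-line description of a node from its type, name, value, and attributes. The DescribeNode class, the describe() method, and the output format are invented for this example.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class DescribeNode {

    // Hypothetical helper: parses XML text supplied as a string.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        } catch (Exception ex) {
            throw new IllegalStateException("Could not parse XML", ex);
        }
    }

    // Summarizes a node using getNodeType(), getNodeName(), getNodeValue(),
    // hasAttributes(), and getAttributes().
    static String describe(Node node) {
        switch (node.getNodeType()) {
            case Node.ELEMENT_NODE: {
                int attrs = node.hasAttributes() ? node.getAttributes().getLength() : 0;
                return "element " + node.getNodeName() + " (" + attrs + " attribute(s))";
            }
            case Node.TEXT_NODE:
                return "text \"" + node.getNodeValue() + "\"";
            default:
                // Many other node types return null from getNodeValue()
                return "other node " + node.getNodeName();
        }
    }

    public static void main(String[] args) {
        Document doc = parse("<job id=\"7\">Cook</job>");
        Node root = doc.getDocumentElement();
        System.out.println(describe(root));
        System.out.println(describe(root.getFirstChild()));
    }
}
```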

Listing 16.10 is the full listing of a simple standalone parser that uses DOM. It reads in a file from the command line, builds the parse tree, and outputs elements (including attributes) and text nodes as XML.

Listing 16.10. Simple DOM Parser

  1: import javax.xml.parsers.*;
  2: import org.xml.sax.*;
  3: import java.io.*;
  4: import org.w3c.dom.*;
  5: import java.util.*;
  6:
  7: public class DOMParse {
  8:
  9:     static Document document;
 10:
 11:     public static void main(String argv[]) {
 12:         if (argv.length != 1) {
 13:             System.err.println("Usage: DOMParse filename");
 14:             System.exit(1);
 15:         }
 16:         DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
 17:         try {
 18:             DocumentBuilder builder = factory.newDocumentBuilder();
 19:             document = builder.parse(new File(argv[0]));
 20:             document.getDocumentElement().normalize ();
 21:             Element root = document.getDocumentElement();
 22:             writeElement(root, "");
 23:         }
 24:         catch (ParserConfigurationException ex) {
 25:             System.err.println ("Failed to create DOM parser:" + ex);
 26:         }
 27:         catch (SAXException ex) {
 28:             System.err.println ("General SAX exception:" + ex);
 29:         }
 30:         catch (IOException ex) {
 31:             System.err.println ("IO exception:" + ex);
 32:         }
 33:         catch (IllegalArgumentException ex) {
 34:             System.err.println ("Invalid file argument" + ex);
 35:         }
 36:     }
 37:
 38:     private static void writeElement(Node n, String indent) {
 39:         StringBuffer name = new StringBuffer(indent);
 40:         name.append('<');
       // note where to put / when printing out end tag
 41:         int tag_start = name.length();
 42:         name.append(n.getNodeName());
 43:
 44:         NamedNodeMap attrs = n.getAttributes();
 45:         int attrCount = (attrs != null) ? attrs.getLength() : 0;
 46:         StringBuffer attributes = new StringBuffer();
 47:         for (int i = 0; i < attrCount; i++) {
 48:             Node attr = attrs.item(i);
 49:             attributes.append(' ');
 50:             attributes.append(attr.getNodeName());
 51:             attributes.append("=\"");
 52:             attributes.append(attr.getNodeValue());
 53:             attributes.append('"');
 54:         }
 55:         System.out.print (name);
 56:         System.out.print (attributes);
 57:         System.out.println (">");
 58:         name.append('>');
 59:
 60:         NodeList children = n.getChildNodes();
 61:         int len = (children != null) ? children.getLength() : 0;
 62:         indent += "  ";
 63:         for (int i = 0; i < len; i++) {
 64:             Node node = children.item(i);
 65:             switch (node.getNodeType())
 66:             {
 67:               case Node.TEXT_NODE:
 68:                 writeText(node, indent);
 69:                 break;
 70:
 71:               case Node.ELEMENT_NODE:
 72:                 writeElement(node, indent);
 73:                 break;
 74:             }
 75:         }
 76:         name.insert(tag_start, '/');
 77:         System.out.println (name);
 78:     }
 79:
 80:     private static void writeText(Node n, String indent) {
 81:         String value = n.getNodeValue().trim();
 82:         if (value.length() > 0) {
 83:             System.out.print(indent);
 84:             StringTokenizer XMLTokens = new StringTokenizer(value, "&<>'\"", true);
 85:             while (XMLTokens.hasMoreTokens()) {
 86:                 String t = XMLTokens.nextToken();
 87:                 if (t.length() == 1)  // might be a special char
 88:                 {
 89:                     if (t.equals("&"))
 90:                         System.out.print ("&amp;");
 91:                     else if (t.equals("<"))
 92:                         System.out.print ("&lt;");
 93:                     else if (t.equals(">"))
 94:                         System.out.print ("&gt;");
 95:                     else if (t.equals("'"))
 96:                         System.out.print ("&apos;");
 97:                     else if (t.equals("\""))
 98:                         System.out.print ("&quot;");
 99:                     else
100:                         System.out.print(t);
101:                 }
102:                 else
103:                     System.out.print(t);
104:             }
105:             System.out.println();
106:         }
107:     }
108: }

Although at first sight this looks more complicated than the SAX parser, most of the additional code is concerned with producing output that conforms to the XML syntax.

Lines 38–57 print out the start tag with any associated attributes. Lines 59–75 check for any child nodes and call the appropriate method according to whether the child node is an XML element or a text node. Line 76 inserts a / character into the tag name before printing it out as the end tag.

The writeText() method starting on line 80 tokenizes the text contents and replaces the special characters (listed in Table 16.2) with the appropriate XML strings.
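The same replacement can also be written without a StringTokenizer. The sketch below is an alternative to the listing's approach, not the technique it uses: it walks the text one character at a time and returns the escaped string instead of printing it. The XmlEscape class and escape() method are hypothetical names.

```java
class XmlEscape {

    // Replaces the five XML special characters with their entity references.
    static String escape(String text) {
        StringBuilder out = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            switch (c) {
                case '&':  out.append("&amp;");  break;
                case '<':  out.append("&lt;");   break;
                case '>':  out.append("&gt;");   break;
                case '\'': out.append("&apos;"); break;
                case '"':  out.append("&quot;"); break;
                default:   out.append(c);        // ordinary character, copy as-is
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(escape("Tom & Jerry's <\"show\">"));
    }
}
```

Returning a string rather than printing directly means the same method could be reused for attribute values, which the listing currently writes out unescaped.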
