Parsing XML with DOM

Problem

You want to examine an XML file in detail.

Solution

Use DOM to parse the document, and process the resulting in-memory tree.

Discussion

The Document Object Model (DOM) is a tree-structured representation of the information in an XML document. It consists of several interfaces, the most important of which is Node. All are in the package org.w3c.dom, reflecting the role of the World Wide Web Consortium (http://www.w3.org) in creating and promulgating the DOM. The DOM interfaces are shown in Table 21-1.

Table 21-1. DOM interfaces

Interface    Function
Document     Top-level representation of an XML document
Node         Representation of any node in the XML tree
Element      An XML element
Text         A textual string

You don’t have to implement these interfaces; the parser supplies objects that implement them. When you get to creating or modifying XML documents in Section 21.6, you will create nodes yourself, but even then you rely on existing implementing classes rather than writing your own. Parsing an XML document with DOM is syntactically similar to processing a file with XSL: you get a reference to a parser and call its methods with objects representing the input files. The difference is that the parser returns an XML DOM, a tree of objects in memory. Example 21-5 is code that simply parses an XML document.

Example 21-5. XParse.java

import java.io.*;
import org.w3c.dom.*;
import com.sun.xml.tree.*;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

/** Parse an XML file using DOM.
 */
public class XParse {

    /** Parse the named file, reporting any errors */
    public static void parse(String fileName) {
        try {
            System.err.println("Parsing " + fileName + "...");

            // Make the document a URL so relative DTD works.
            String uri = "file:" + new File(fileName).getAbsolutePath();

            XmlDocument doc = XmlDocument.createXmlDocument(uri);
            System.out.println("Parsed OK");

        } catch (SAXParseException ex) {
            System.err.println("+================================+");
            System.err.println("|         *Parse Error*          |");
            System.err.println("+================================+");
            System.err.println("+ Line " + ex.getLineNumber()
                                + ", uri " + ex.getSystemId());
            System.err.println(ex.getClass());
            System.err.println(ex.getMessage());
            System.err.println("+================================+");
        } catch (SAXException ex) {
            System.err.println("+================================+");
            System.err.println("|         *SAX XML Error*        |");
            System.err.println("+================================+");
            System.err.println(ex.toString());
        } catch (IOException ex) {
            System.err.println("+================================+");
            System.err.println("|     *Input/Output Error*       |");
            System.err.println("+================================+");
            System.err.println(ex.toString());
        }
    }

    public static void main(String[] av) {
        if (av.length == 0) {
            System.err.println("Usage: XParse file");
            return;
        }
        for (int i=0; i<av.length; i++) {
            parse(av[i]);
        }
    }
}
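
Example 21-5 depends on the com.sun.xml.tree package, which is not part of the standard Java library. As a point of comparison, here is a minimal sketch of the same parse step using the standard JAXP classes in javax.xml.parsers. The class name JaxpParseSketch and its parse method are made up for this illustration, and the sketch leaves error reporting to the caller rather than reproducing the banners above.

```java
import java.io.File;
import java.io.IOException;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.SAXException;

/** Sketch only: parse an XML file into a DOM tree using standard JAXP. */
public class JaxpParseSketch {

    /** Parse the named file and return the in-memory DOM Document. */
    public static Document parse(String fileName)
            throws ParserConfigurationException, SAXException, IOException {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        DocumentBuilder builder = factory.newDocumentBuilder();
        // Passing a File lets the parser derive the system ID itself,
        // so a relative DTD reference resolves against the file's location.
        return builder.parse(new File(fileName));
    }

    public static void main(String[] av) throws Exception {
        if (av.length == 0) {
            System.err.println("Usage: JaxpParseSketch file");
            return;
        }
        for (String fileName : av) {
            Document doc = parse(fileName);
            System.out.println("Parsed OK; root element is "
                + doc.getDocumentElement().getTagName());
        }
    }
}
```

The returned Document is the root of the tree described in Table 21-1, so everything that follows about traversal applies to it unchanged.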

You then traverse the document. You can use the TreeWalker interface defined by the DOM, or you can just use the recursive algorithm shown in Example 21-6.

Example 21-6. XTW.java (partial listing)

/* Process all the nodes, recursively. */
protected void doRecursive(Node p) {
    if (p == null) {
        return;
    }
    NodeList nodes = p.getChildNodes();
    int numElem = nodes.getLength();
    Debug.println("xml-tree", "Element has " + numElem + " children");
    for (int i = 0; i < numElem; i++) {
        Node n = nodes.item(i);
        if (n == null) {
            continue;
        }

        doNode(n);

    }
}
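
The partial listing does not show doNode( ), so the following self-contained sketch fills it in under stated assumptions: the doNode body here is hypothetical (it records one line per node and recurses into elements), the depth and StringBuilder parameters are added for the illustration, and the document is parsed from a string with standard JAXP so the example can run on its own.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

/** Sketch only: recursive DOM traversal with a hypothetical doNode(). */
public class TreeWalkSketch {

    /** Hand each child of p to doNode(), in document order. */
    static void doRecursive(Node p, int depth, StringBuilder out) {
        if (p == null) {
            return;
        }
        NodeList nodes = p.getChildNodes();
        for (int i = 0; i < nodes.getLength(); i++) {
            doNode(nodes.item(i), depth, out);
        }
    }

    /** Hypothetical handler: describe the node, then descend into elements. */
    static void doNode(Node n, int depth, StringBuilder out) {
        for (int i = 0; i < depth; i++) {
            out.append("  ");   // two spaces of indent per tree level
        }
        switch (n.getNodeType()) {
            case Node.ELEMENT_NODE:
                out.append("ELEMENT ").append(n.getNodeName()).append('\n');
                doRecursive(n, depth + 1, out);
                break;
            case Node.TEXT_NODE:
                out.append("TEXT ").append(n.getNodeValue().trim()).append('\n');
                break;
            default:
                out.append("OTHER ").append(n.getNodeName()).append('\n');
                break;
        }
    }

    /** Parse the XML text and return the indented node listing. */
    public static String walk(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
            .newDocumentBuilder()
            .parse(new InputSource(new StringReader(xml)));
        StringBuilder out = new StringBuilder();
        doRecursive(doc, 0, out);
        return out.toString();
    }

    public static void main(String[] av) throws Exception {
        System.out.print(walk("<people><person>Ian</person></people>"));
    }
}
```

Run as written, main prints three lines: the people element, the person element indented one level, and the text Ian indented two levels. Note that whitespace between tags also appears as Text nodes when the input is pretty-printed.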

A full code example using this approach is given in Section 21.7.
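
For the TreeWalker alternative, here is a minimal sketch. It assumes the parser's Document object also implements org.w3c.dom.traversal.DocumentTraversal, which is optional in the DOM specification, so the code checks before casting. The class name TreeWalkerSketch and the elementNames method are invented for the example.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.traversal.DocumentTraversal;
import org.w3c.dom.traversal.NodeFilter;
import org.w3c.dom.traversal.TreeWalker;
import org.xml.sax.InputSource;

/** Sketch only: list element names in document order with a TreeWalker. */
public class TreeWalkerSketch {

    public static List<String> elementNames(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
            .newDocumentBuilder()
            .parse(new InputSource(new StringReader(xml)));

        // Traversal support is optional, so verify it before casting.
        if (!(doc instanceof DocumentTraversal)) {
            throw new UnsupportedOperationException(
                "This DOM implementation does not support traversal");
        }

        // SHOW_ELEMENT makes the walker skip text, comments, and so on.
        TreeWalker walker = ((DocumentTraversal) doc).createTreeWalker(
            doc.getDocumentElement(), NodeFilter.SHOW_ELEMENT, null, false);

        List<String> names = new ArrayList<>();
        // The walker starts at the root; nextNode() returns null at the end.
        for (Node n = walker.getCurrentNode(); n != null; n = walker.nextNode()) {
            names.add(n.getNodeName());
        }
        return names;
    }

    public static void main(String[] av) throws Exception {
        System.out.println(elementNames(
            "<people><person>Ian</person><person>Bob</person></people>"));
    }
}
```

The walker does the recursion for you, at the cost of depending on an optional DOM feature; the hand-written loop in Example 21-6 works with any conforming parser.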
