Parsing XML Documents in Code

Up to this point, I've gone after a specific element in a Web page, but there are other ways of handling documents, too. For example, you can parse (that is, read and interpret) the entire document at once. Here's an example; in this case, I'll work through an entire XML document, meetings.xml, and display all its nodes in an HTML Web page.

To handle this document, I'll create a function, iterateChildren, that will read and display all the children of a node. As with most parsers, this function is a recursive function, which means that it can call itself to get the children of the current node. To get the name of a node, I will use the nodeName property. To parse an entire document, then, you just have to pass the root node of the entire document to the iterateChildren function, and it will work through the entire document, displaying all the nodes in that document:

<HTML>
    <HEAD>
        <TITLE>
            Parsing an XML Document
        </TITLE>

        <XML ID="meetingsXML" SRC="meetings.xml"></XML>

        <SCRIPT LANGUAGE="JavaScript">
            function parseDocument()
            {
                documentXML = document.all("meetingsXML").XMLDocument
                resultsDIV.innerHTML = iterateChildren(documentXML, "")
            }
    .
    .
    .

Note that I've also passed an empty string ("") to the iterateChildren function. I'll use this string to indent the various levels of the display, to indicate what nodes are nested inside what other nodes. In the iterateChildren function, I start by creating a new text string with the current indentation string (which is either an empty string or a string of spaces), as well as the name of the current node and a <BR> element so that the browser will skip to the next line:

<HTML>
    <HEAD>
        <TITLE>
            Parsing an XML Document
        </TITLE>

        <XML ID="meetingsXML" SRC="meetings.xml"></XML>

        <SCRIPT LANGUAGE="JavaScript">
            function parseDocument()
            {
                documentXML = document.all("meetingsXML").XMLDocument
                resultsDIV.innerHTML = iterateChildren(documentXML, "")
            }

            function iterateChildren(theNode, indentSpacing)
            {
                var text = indentSpacing + theNode.nodeName + "<BR>"
                .
                .
                .
                return text
            }
        </SCRIPT>
    </HEAD>
    .
    .
    .

I can determine whether the current node has children by checking the childNodes property, which holds a node list of the children of the current node. If the length property of that list is greater than zero, the node has children, and I call iterateChildren on each of them. (Note also that I indent this next level of the display by adding four nonbreaking spaces, which you specify with the &nbsp; entity reference in HTML, to the current indentation string.)

<HTML>
    <HEAD>
        <TITLE>
            Parsing an XML Document
        </TITLE>

        <XML ID="meetingsXML" SRC="meetings.xml"></XML>

        <SCRIPT LANGUAGE="JavaScript">
            function parseDocument()
            {
                documentXML = document.all("meetingsXML").XMLDocument
                resultsDIV.innerHTML = iterateChildren(documentXML, "")
            }

            function iterateChildren(theNode, indentSpacing)
            {
                var text = indentSpacing + theNode.nodeName + "<BR>"

                if (theNode.childNodes.length > 0) {
                    for (var loopIndex = 0; loopIndex <
                        theNode.childNodes.length; loopIndex++) {
                        text += iterateChildren(theNode.childNodes(loopIndex),
                        indentSpacing + "&nbsp;&nbsp;&nbsp;&nbsp;")
                    }
                }
                return text
            }
        </SCRIPT>
    </HEAD>
    .
    .
    .

And that's all it takes; here's the whole Web page:

<HTML>
    <HEAD>
        <TITLE>
            Parsing an XML Document
        </TITLE>

        <XML ID="meetingsXML" SRC="meetings.xml"></XML>

        <SCRIPT LANGUAGE="JavaScript">
            function parseDocument()
            {
                documentXML = document.all("meetingsXML").XMLDocument
                resultsDIV.innerHTML = iterateChildren(documentXML, "")
            }

            function iterateChildren(theNode, indentSpacing)
            {
                var text = indentSpacing + theNode.nodeName + "<BR>"

                if (theNode.childNodes.length > 0) {
                    for (var loopIndex = 0; loopIndex <
                        theNode.childNodes.length; loopIndex++) {
                        text += iterateChildren(theNode.childNodes(loopIndex),
                        indentSpacing + "&nbsp;&nbsp;&nbsp;&nbsp;")
                    }
                }
                return text
            }
        </SCRIPT>
    </HEAD>

    <BODY>
        <CENTER>
            <H1>
                Parsing an XML Document
            </H1>
        </CENTER>

        <CENTER>
            <INPUT TYPE="BUTTON" VALUE="Parse and display the document"
                ONCLICK="parseDocument()">
        </CENTER>
        <DIV ID="resultsDIV"></DIV>
    </BODY>
</HTML>

When you click the button in this page, it will read meetings.xml and display its structure as shown in Figure 7.5. You can see all the nodes listed there, indented as they should be. Note also the "meta-names" that Internet Explorer gives to document and text nodes—#document and #text.

Figure 7.5. Parsing a document in Internet Explorer.
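
The recursion itself doesn't depend on Internet Explorer. As a minimal sketch, the following version runs the same traversal over plain JavaScript objects standing in for nodes. The makeNode helper and the element names in sampleTree are hypothetical (they are not taken from meetings.xml), and the children are held in an ordinary array, so they are indexed with square brackets rather than with the childNodes(loopIndex) call syntax used in the page above:

```javascript
// Hypothetical stand-in for a DOM node: a plain object with only the two
// properties the traversal reads (nodeName and childNodes).
function makeNode(nodeName, childNodes) {
    return { nodeName: nodeName, childNodes: childNodes || [] };
}

// Same recursion as the page above: emit this node's name, then recurse
// into each child with four more nonbreaking spaces of indentation.
function iterateChildren(theNode, indentSpacing) {
    var text = indentSpacing + theNode.nodeName + "<BR>";

    for (var loopIndex = 0; loopIndex < theNode.childNodes.length; loopIndex++) {
        text += iterateChildren(theNode.childNodes[loopIndex],
            indentSpacing + "&nbsp;&nbsp;&nbsp;&nbsp;");
    }
    return text;
}

// Made-up sample tree, used only to exercise the function.
var sampleTree = makeNode("#document", [
    makeNode("MEETINGS", [
        makeNode("MEETING", [
            makeNode("TITLE", [makeNode("#text")])
        ])
    ])
]);

var listing = iterateChildren(sampleTree, "");
```

The listing string holds one <BR>-terminated line per node, with four more nonbreaking spaces of indentation at each level of nesting. The explicit length check is dropped here because the for loop simply doesn't run when a node has no children.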


Parsing an XML Document to Display Node Type and Content

In the previous example, the code listed the name of each node in the meetings.xml document. However, you can do more than that: you can also use the nodeValue property to list the value of each node, and I'll do that in this section. In addition, you can indicate the type of each node that you come across by checking the nodeType property. Here are the possible values for this property:

Value   Description
1       Element
2       Attribute
3       Text
4       CDATA section
5       Entity reference
6       Entity
7       Processing instruction
8       Comment
9       Document
10      Document type
11      Document fragment
12      Notation

Here's how I determine the type of a particular node, using a JavaScript switch statement of the kind that we saw in the previous chapter:

<HTML>
    <HEAD>
        <TITLE>
            Parsing an XML document and displaying node type and content
        </TITLE>

        <XML ID="meetingsXML" SRC="meetings.xml"></XML>

        <SCRIPT LANGUAGE="JavaScript">
            function parseDocument()
            {
                documentXML = document.all("meetingsXML").XMLDocument
                resultsDIV.innerHTML = iterateChildren(documentXML, "")
            }

            function iterateChildren(theNode, indentSpacing)
            {
                var typeData
                switch (theNode.nodeType) {
                    case 1:
                        typeData = "element"
                        break
                    case 2:
                        typeData = "attribute"
                        break
                    case 3:
                        typeData = "text"
                        break
                    case 4:
                        typeData = "CDATA section"
                        break
                    case 5:
                        typeData = "entity reference"
                        break
                    case 6:
                        typeData = "entity"
                        break
                    case 7:
                        typeData = "processing instruction"
                        break
                    case 8:
                        typeData = "comment"
                        break
                    case 9:
                        typeData = "document"
                        break
                    case 10:
                        typeData = "document type"
                        break
                    case 11:
                        typeData = "document fragment"
                        break
                    case 12:
                        typeData = "notation"
                }
                .
                .
                .

A node that has no value holds null in its nodeValue property, so I compare nodeValue to null; if the node does have a value, I can display that value like this:

<HTML>
    <HEAD>
        <TITLE>
            Parsing an XML document and displaying node type and content
        </TITLE>

        <XML ID="meetingsXML" SRC="meetings.xml"></XML>

        <SCRIPT LANGUAGE="JavaScript">
            function parseDocument()
            {
                documentXML = document.all("meetingsXML").XMLDocument
                resultsDIV.innerHTML = iterateChildren(documentXML, "")
            }

            function iterateChildren(theNode, indentSpacing)
            {
                var typeData

                switch (theNode.nodeType) {
                    case 1:
                        typeData = "element"
                        break
                    case 2:
                        typeData = "attribute"
                        break
                    case 3:
                        typeData = "text"
                        break
                    case 4:
                        typeData = "CDATA section"
                        break
                    case 5:
                        typeData = "entity reference"
                        break
                    case 6:
                        typeData = "entity"
                        break
                    case 7:
                        typeData = "processing instruction"
                        break
                    case 8:
                        typeData = "comment"
                        break
                    case 9:
                        typeData = "document"
                        break
                    case 10:
                        typeData = "document type"
                        break
                    case 11:
                        typeData = "document fragment"
                        break
                    case 12:
                        typeData = "notation"
                }
                var text

                if (theNode.nodeValue != null) {
                    text = indentSpacing + theNode.nodeName
                        + "&nbsp; = " + theNode.nodeValue
                        + "&nbsp; (Node type: " + typeData
                        + ")<BR>"
                } else {
                    text = indentSpacing + theNode.nodeName
                        + "&nbsp; (Node type: " + typeData
                        + ")<BR>"
                }

                if (theNode.childNodes.length > 0) {
                    for (var loopIndex = 0; loopIndex <
                        theNode.childNodes.length; loopIndex++) {
                        text += iterateChildren(theNode.childNodes(loopIndex),
                        indentSpacing + "&nbsp;&nbsp;&nbsp;&nbsp;")
                    }
                }
                return text
            }
        </SCRIPT>
    </HEAD>

    <BODY>
        <CENTER>
            <H1>
                Parsing an XML document and displaying node type and content
            </H1>
        </CENTER>

        <CENTER>
            <INPUT TYPE="BUTTON" VALUE="Parse and display the document"
                ONCLICK="parseDocument()">
        </CENTER>

        <DIV ID="resultsDIV"></DIV>
    </BODY>
</HTML>

And that's all it takes; the results are shown in Figure 7.6. As you see there, the entire document is listed, as is the type of each node. In addition, if the node has a value, that value is displayed.

Figure 7.6. Using JavaScript to display element content and type.
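
Because the node type values run from 1 to 12 without gaps, the switch statement could also be written as an array lookup. The following is a sketch of that alternative, not the code used in the page above; the names NODE_TYPE_NAMES, describeNodeType, and describeNode are hypothetical, as is the "unknown" fallback for values outside the table:

```javascript
// Index = nodeType value. Index 0 is a placeholder because node types
// start at 1.
var NODE_TYPE_NAMES = [
    null, "element", "attribute", "text", "CDATA section",
    "entity reference", "entity", "processing instruction", "comment",
    "document", "document type", "document fragment", "notation"
];

// Hypothetical helper: returns the description for a nodeType value, or
// "unknown" (an assumption of this sketch) if the value is not in the table.
function describeNodeType(nodeType) {
    return NODE_TYPE_NAMES[nodeType] || "unknown";
}

// Builds the display text for one node: the name, the value if there is
// one (nodeValue is null when the node has no value), and the type.
function describeNode(theNode) {
    var text = theNode.nodeName;

    if (theNode.nodeValue != null) {
        text += "&nbsp; = " + theNode.nodeValue;
    }
    return text + "&nbsp; (Node type: "
        + describeNodeType(theNode.nodeType) + ")";
}
```

With this arrangement, the two branches of the if statement collapse into one, because the only difference between them is whether the value is appended.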


The meetings.xml parsing page above lists the name, type, and value of each node, but some of the elements in meetings.xml have attributes as well. So how do you handle attributes?

Parsing an XML Document to Display Attribute Values

You can get access to an element's attributes with the element's attributes property. You can get attribute names and values with the name and value properties of attribute objects—I used the value property earlier in this chapter. It's also worth noting that because attributes are themselves nodes, you can use the nodeName and nodeValue properties to do the same thing; I'll do that in this example to show how it works.

Here's how I augment the previous example, looping over all the attributes that an element has and listing them. (Note that you could use the name and value properties here instead of nodeName and nodeValue.)

<HTML>
    <HEAD>
        <TITLE>
            Parsing XML to read attributes
        </TITLE>

        <XML ID="meetingsXML" SRC="meetings.xml"></XML>

        <SCRIPT LANGUAGE="JavaScript">

            function parseDocument()
            {
                documentXML = document.all("meetingsXML").XMLDocument
                resultsDIV.innerHTML = iterateChildren(documentXML, "")
            }

            function iterateChildren(theNode, indentSpacing)
            {
                var typeData

                switch (theNode.nodeType) {
                    case 1:
                        typeData = "element"
                        break
                    case 2:
                        typeData = "attribute"
                        break
                    case 3:
                        typeData = "text"
                        break
                    case 4:
                        typeData = "CDATA section"
                        break
                    case 5:
                        typeData = "entity reference"
                        break
                    case 6:
                        typeData = "entity"
                        break
                    case 7:
                        typeData = "processing instruction"
                        break
                    case 8:
                        typeData = "comment"
                        break
                    case 9:
                        typeData = "document"
                        break
                    case 10:
                        typeData = "document type"
                        break
                    case 11:
                        typeData = "document fragment"
                        break
                    case 12:
                        typeData = "notation"
                }
                var text

                if (theNode.nodeValue != null) {
                    text = indentSpacing + theNode.nodeName
                        + "&nbsp; = " + theNode.nodeValue
                        + "&nbsp; (Node type: " + typeData
                        + ")"
                } else {
                    text = indentSpacing + theNode.nodeName
                        + "&nbsp; (Node type: " + typeData
                        + ")"
                }

                if (theNode.attributes != null) {
                    if (theNode.attributes.length > 0) {
                        for (var loopIndex = 0; loopIndex <
                            theNode.attributes.length; loopIndex++) {
                            // \" places a literal quotation mark around the attribute value
                            text += " (Attribute: " +
                                theNode.attributes(loopIndex).nodeName +
                                " = \"" +
                                theNode.attributes(loopIndex).nodeValue
                                + "\")"
                        }
                    }
                }

                text += "<BR>"

                if (theNode.childNodes.length > 0) {
                    for (var loopIndex = 0; loopIndex <
                        theNode.childNodes.length; loopIndex++) {
                        text += iterateChildren(theNode.childNodes(loopIndex),
                        indentSpacing + "&nbsp;&nbsp;&nbsp;&nbsp;")
                    }
                }
                return text
            }

        </SCRIPT>
    </HEAD>

    <BODY>
        <CENTER>
            <H1>
               Parsing XML to read attributes
            </H1>
        </CENTER>

        <CENTER>
            <INPUT TYPE="BUTTON" VALUE="Parse and display the document"
                ONCLICK="parseDocument()">
        </CENTER>
        <DIV ID="resultsDIV"></DIV>
    </BODY>
</HTML>

You can see the results of this page in Figure 7.7; both elements and attributes are listed in that figure.

Figure 7.7. Listing elements and attributes in Internet Explorer.
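
The attribute loop can also be pulled out into a function of its own, which makes it easier to try on its own. This is a minimal sketch under stated assumptions: listAttributes and sampleElement are hypothetical names, the TYPE attribute is made up rather than taken from meetings.xml, and the attribute list is an ordinary array indexed with square brackets instead of the attributes(loopIndex) call syntax used in the page above:

```javascript
// Hypothetical helper: returns the " (Attribute: name = "value")" text for
// every attribute of a node, or an empty string if the node has none.
function listAttributes(theNode) {
    var text = "";

    // Nodes that cannot carry attributes are assumed to hold null here,
    // matching the null check in the page above.
    if (theNode.attributes == null) {
        return text;
    }

    for (var loopIndex = 0; loopIndex < theNode.attributes.length; loopIndex++) {
        var attribute = theNode.attributes[loopIndex];
        text += " (Attribute: " + attribute.nodeName
            + " = \"" + attribute.nodeValue + "\")";
    }
    return text;
}

// Made-up element with a single attribute, used only to exercise the function.
var sampleElement = {
    nodeName: "MEETING",
    attributes: [{ nodeName: "TYPE", nodeValue: "informal" }]
};

var attributeText = listAttributes(sampleElement);
```

As in the page above, nodeName and nodeValue are read from each attribute because attributes are themselves nodes; the name and value properties could be used instead.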

