Processing XML with MSXML

MSXML supports two basic APIs for processing XML: DOM (the Document Object Model) and SAX (the Simple API for XML). Let's start with DOM.

MSXML and DOM

As I've mentioned, the DOM method involves parsing an XML document and loading it into a tree structure in memory. An XML document parsed via DOM is known as a DOM document (or just DOM, for short). Listing 8.9 presents a simple VB application that demonstrates parsing an XML document via DOM and querying it for a particular node set. (You can find the source code for this app in the CH08domltest subfolder on the CD accompanying this book.)

Listing 8.9. A VB App That Processes an XML Document via DOM
Private Sub Command1_Click()

  Dim bstrDoc As String

  ' Sample document, used when Text1 is empty
  bstrDoc = "<Songs>" & _
    "<Song title='One More Day' artist='Diamond Rio' />" & _
    "<Song title='Hard Habit to Break' artist='Chicago' />" & _
    "<Song title='Forever' artist='Kenny Loggins' />" & _
    "<Song title='Boys of Summer' artist='Don Henley' />" & _
    "<Song title='Cherish' artist='Kool and the Gang' />" & _
    "<Song title='Dance' artist='Lee Ann Womack' />" & _
    "<Song title='I Will Always Love You' artist='Whitney Houston' />" & _
    "</Songs>"

  ' Requires a project reference to Microsoft XML, v3.0
  Dim xmlDoc As New DOMDocument30

  If Len(Text1.Text) = 0 Then
    Text1.Text = bstrDoc
  End If

  ' loadXML parses the string and builds the DOM tree in memory
  If Not xmlDoc.loadXML(Text1.Text) Then
    MsgBox "Error loading document: " & xmlDoc.parseError.reason
  Else
    Dim oNodes As IXMLDOMNodeList
    Dim oNode As IXMLDOMNode
    Dim sName As String
    Dim sData As String

    ' MSXML 3.0 defaults to XSL Patterns; make selectNodes use XPath
    xmlDoc.setProperty "SelectionLanguage", "XPath"

    If Len(Text2.Text) = 0 Then
      Text2.Text = "//Song/@title"
    End If
    Set oNodes = xmlDoc.selectNodes(Text2.Text)

    ' Show the name and XML markup of each node in the result set
    For Each oNode In oNodes
      If Not (oNode Is Nothing) Then
        sName = oNode.nodeName
        sData = oNode.xml
        MsgBox "Node <" & sName & ">:" & _
          vbNewLine & vbTab & sData & vbNewLine
      End If
    Next
  End If

  Set xmlDoc = Nothing
End Sub

We begin by instantiating a DOMDocument object, which is the key to everything else we do with DOM in MSXML. We then call its loadXML method to parse the XML text and load it into a DOM tree. Once the document is in memory, we can query it with XPath or manipulate it further through DOMDocument's methods and properties. In this example, we call selectNodes, which evaluates the query against the document and returns a node list (an IXMLDOMNodeList object) that we can loop through with For Each. For each node in the list, we display its name followed by its XML markup. Parsing an XML document via DOM turns the document into an in-memory object that we can work with just as we would any other object. We can access and manipulate the document as though it were an object because that's exactly what it is.
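For readers who want to experiment outside Visual Basic, the same load-then-query pattern can be sketched with Python's standard-library DOM implementation, xml.dom.minidom. This is an analog, not MSXML: minidom has no selectNodes or XPath support, so the sketch walks the Song elements with getElementsByTagName and reads each title attribute node, and a malformed document surfaces as an ExpatError exception rather than a False return from loadXML. The function name song_titles is hypothetical.

```python
from xml.dom import minidom
from xml.parsers.expat import ExpatError

SONGS = (
    "<Songs>"
    "<Song title='One More Day' artist='Diamond Rio' />"
    "<Song title='Hard Habit to Break' artist='Chicago' />"
    "<Song title='Forever' artist='Kenny Loggins' />"
    "</Songs>"
)


def song_titles(xml_text):
    """Parse xml_text into an in-memory DOM tree and return each Song's
    title attribute node as a (nodeName, value) pair, or None if the
    document is not well-formed."""
    try:
        doc = minidom.parseString(xml_text)  # builds the whole tree in memory
    except ExpatError:
        return None  # plays the role of loadXML returning False
    results = []
    for song in doc.getElementsByTagName("Song"):
        attr = song.getAttributeNode("title")  # an Attr node, like //Song/@title
        if attr is not None:
            results.append((attr.nodeName, attr.value))
    doc.unlink()  # release the tree, much like Set xmlDoc = Nothing
    return results


for name, value in song_titles(SONGS):
    print(f"Node <{name}>: {value}")
```

As in the VB version, the whole document is parsed up front, and every later query runs against the in-memory tree.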

MSXML and SAX

Unlike DOM, which is a W3C Recommendation, SAX is a de facto standard that grew out of the XML-DEV mailing list. Rather than giving an application access to XML data by materializing the entire document in memory, SAX is an event-driven API: an application processes an XML document by responding to the events the SAX processor raises. As the processor reads through the document, it raises an event each time it encounters a significant item, such as the start or end of an element or a run of character data, and calls the application's handler for that event, passing along the relevant data. The application then decides what to do in response. It could store the event data in some type of tree structure, as DOM processing does; it could ignore the event; it could search the event data for a particular node or value; or it could take some other action. Once the application handles the event, the SAX processor continues through the document. At no point does it store the entire document in memory as DOM does. It's really just a parsing mechanism to which an application can attach its own functionality. In fact, that is how MSXML's DOM loader works: SAX is its underlying parsing mechanism, and the loader sets up SAX event handlers that store the data passed to them in a DOM tree.
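The event-driven model can be sketched with Python's xml.sax module, which follows the same SAX design. This is a Python analog rather than MSXML code, and the handler class EventLogger is hypothetical: it simply records each event it receives so you can see the order in which the parser fires them. No tree is ever built.

```python
import xml.sax


class EventLogger(xml.sax.ContentHandler):
    """Records each SAX event as a short string instead of building a tree."""

    def __init__(self):
        super().__init__()
        self.events = []

    def startDocument(self):
        self.events.append("startDocument")

    def startElement(self, name, attrs):
        # Copy the attributes into a plain dict for logging.
        self.events.append(f"startElement {name} {dict(attrs.items())}")

    def characters(self, content):
        if content.strip():  # skip whitespace-only runs for brevity
            self.events.append(f"characters {content!r}")

    def endElement(self, name):
        self.events.append(f"endElement {name}")

    def endDocument(self):
        self.events.append("endDocument")


handler = EventLogger()
xml.sax.parseString(
    b"<Songs><Song artist='Chicago'>Hard Habit to Break</Song></Songs>", handler
)
for event in handler.events:
    print(event)
```

Each line printed corresponds to one callback; once a callback returns, the parser moves on and keeps nothing from it.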

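Because SAX doesn't keep document data in memory, it needs far less memory than DOM, which matters most with large documents. The trade-off is that SAX is also considerably more work to use: the application itself must keep track of whatever context it needs as the events arrive. By keeping the whole document in memory, DOM makes working with XML documents as easy as working with any other kind of object.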
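The extra work SAX demands comes from the fact that the parser remembers nothing for you. The sketch below, again using Python's standard library rather than MSXML, pairs each song's title with its artist twice: once with a SAX handler that must track which element it is inside and buffer text itself, and once with minidom, where the tree is already in memory. The element-based Songs layout and the names TitleArtistCollector, pairs_with_sax, and pairs_with_dom are hypothetical illustrations.

```python
import xml.sax
from xml.dom import minidom

# Hypothetical variant of the Songs document that stores data as element
# text rather than attributes, so the SAX handler has to track state.
SONGS = (
    b"<Songs>"
    b"<Song><Title>Forever</Title><Artist>Kenny Loggins</Artist></Song>"
    b"<Song><Title>Dance</Title><Artist>Lee Ann Womack</Artist></Song>"
    b"<Song><Title>Cherish</Title><Artist>Kool and the Gang</Artist></Song>"
    b"</Songs>"
)


class TitleArtistCollector(xml.sax.ContentHandler):
    """Pairs each Song's Title with its Artist using explicit parser state."""

    def __init__(self):
        super().__init__()
        self.pairs = []
        self._current = None  # element whose text we are collecting
        self._buffer = []     # characters() may deliver text in pieces
        self._song = {}       # fields seen so far for the current Song

    def startElement(self, name, attrs):
        if name in ("Title", "Artist"):
            self._current = name
            self._buffer = []

    def characters(self, content):
        if self._current is not None:
            self._buffer.append(content)

    def endElement(self, name):
        if name == self._current:
            self._song[name] = "".join(self._buffer)
            self._current = None
        elif name == "Song":
            self.pairs.append((self._song.get("Title"), self._song.get("Artist")))
            self._song = {}


def pairs_with_sax(data):
    handler = TitleArtistCollector()
    xml.sax.parseString(data, handler)
    return handler.pairs


def pairs_with_dom(data):
    # The DOM version just navigates the tree that is already in memory.
    doc = minidom.parseString(data)
    return [
        (song.getElementsByTagName("Title")[0].firstChild.data,
         song.getElementsByTagName("Artist")[0].firstChild.data)
        for song in doc.getElementsByTagName("Song")
    ]


print(pairs_with_sax(SONGS) == pairs_with_dom(SONGS))
```

Both functions return the same pairs, but only the SAX version has to manage the bookkeeping; in exchange, it never holds more than one song's data at a time.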

Listing 8.10 shows some VB code that demonstrates how to use SAX. It consists of three main modules: the main form, a content handler class, and an error handler class. (You can find the full source code for this application in the SAX subfolder under the CH08 folder on the CD accompanying this book.)

Listing 8.10. A VB App That Processes an XML Document via SAX
' Main form (Form1)
Option Explicit

Private Sub Command1_Click()

    'Create the SAX reader object
    Dim reader As New SAXXMLReader

    'Set up the event handlers
    Dim CHandler As New ContentHandler
    Set reader.ContentHandler = CHandler

    Dim EHandler As New ErrorHandler
    Set reader.ErrorHandler = EHandler

    Text1.text = ""
    On Error GoTo ErrorTrap

    'Parse the file named in Text2, relative to the application folder
    reader.parseURL App.Path & "\" & Text2.text
    Exit Sub

ErrorTrap:
    Text1.text = Text1.text & "Error: " & Err.Number & " : " & _
        Err.Description

End Sub

' Content handler (class module ContentHandler)
Option Explicit

Implements IVBSAXContentHandler

' Element start: echo the tag name and its attributes
Private Sub IVBSAXContentHandler_startElement( _
    strNamespaceURI As String, strLocalName As String, _
    strQName As String, ByVal attributes As MSXML2.IVBSAXAttributes)

    Form1.Text1.text = Form1.Text1.text & "__ELEMENT START__" & _
        vbCrLf & "<" & strLocalName

    Dim i As Integer
    For i = 0 To (attributes.length - 1)
        Form1.Text1.text = Form1.Text1.text & " " & _
            attributes.getLocalName(i) & "=""" & _
            attributes.getValue(i) & """"
    Next

    Form1.Text1.text = Form1.Text1.text & ">" & vbCrLf

End Sub

' Element end: echo the closing tag
Private Sub IVBSAXContentHandler_endElement( _
    strNamespaceURI As String, strLocalName As String, _
    strQName As String)

    Form1.Text1.text = Form1.Text1.text & "__ELEMENT END__" & _
        vbCrLf & "</" & strLocalName & ">" & vbCrLf

End Sub

' Character data; normalize line breaks for the text box
Private Sub IVBSAXContentHandler_characters(text As String)
    text = Replace(text, vbLf, vbCrLf)
    Form1.Text1.text = Form1.Text1.text & "__CHARACTERS__" & _
        vbCrLf & text & vbCrLf
End Sub

Private Property Set IVBSAXContentHandler_documentLocator( _
    ByVal RHS As MSXML2.IVBSAXLocator)
    Form1.Text1.text = Form1.Text1.text & "__DOCUMENT_LOCATOR__" & _
        vbCrLf
End Property

Private Sub IVBSAXContentHandler_endDocument()
    Form1.Text1.text = Form1.Text1.text & "__DOCUMENT END__" & _
        vbCrLf
End Sub

Private Sub IVBSAXContentHandler_endPrefixMapping(strPrefix _
    As String)
    Form1.Text1.text = Form1.Text1.text & "__END PREFIX MAPPING__" & _
        vbCrLf & strPrefix & vbCrLf
End Sub

Private Sub IVBSAXContentHandler_ignorableWhitespace(strChars _
    As String)
    Form1.Text1.text = Form1.Text1.text & "__IGNORABLE WHITESPACE__" & _
        vbCrLf & strChars & vbCrLf
End Sub

' Processing instruction: echo it as <?target data?>
Private Sub IVBSAXContentHandler_processingInstruction(target _
    As String, data As String)
    Form1.Text1.text = Form1.Text1.text & "__PROCESSING INSTRUCTION__" & _
        vbCrLf & "<?" & target & " " & data & "?>" & vbCrLf
End Sub

Private Sub IVBSAXContentHandler_skippedEntity(strName As String)
    Form1.Text1.text = Form1.Text1.text & "__SKIPPED ENTITY__" & _
        vbCrLf & strName & vbCrLf
End Sub

Private Sub IVBSAXContentHandler_startDocument()
    Form1.Text1.text = Form1.Text1.text & "__DOCUMENT START__" & _
        vbCrLf
End Sub

Private Sub IVBSAXContentHandler_startPrefixMapping(strPrefix _
    As String, strURI As String)
    Form1.Text1.text = Form1.Text1.text & "__START PREFIX MAPPING__" & _
        vbCrLf & strPrefix & " " & strURI & vbCrLf
End Sub

' Error handler (class module ErrorHandler)
Option Explicit

Implements IVBSAXErrorHandler

' Fatal (well-formedness) errors: report the message, code, and line
Private Sub IVBSAXErrorHandler_fatalError( _
    ByVal lctr As IVBSAXLocator, msg As String, _
    ByVal errCode As Long)
    Form1.Text1.text = Form1.Text1.text & "Fatal error: " & _
        msg & " Code: " & errCode & " Line: " & lctr.lineNumber & vbCrLf
End Sub

Private Sub IVBSAXErrorHandler_error(ByVal lctr As IVBSAXLocator, _
    msg As String, ByVal errCode As Long)
    Form1.Text1.text = Form1.Text1.text & "Error: " & msg & _
        " Code: " & errCode & vbCrLf
End Sub

Private Sub IVBSAXErrorHandler_ignorableWarning( _
    ByVal oLocator As MSXML2.IVBSAXLocator, _
    strErrorMessage As String, ByVal nErrorCode As Long)
    ' Warnings are deliberately ignored
End Sub

As I said earlier, an application makes use of the SAX engine by invoking the SAX parser and responding to the events it raises. To use MSXML's SAX engine in a VB application, you implement SAX interfaces such as IVBSAXContentHandler, IVBSAXErrorHandler, IVBSAXDeclHandler, IVBSAXDTDHandler, and IVBSAXLexicalHandler. Implementing these interfaces amounts to setting up event handlers to respond to the events they define. In this example code, I've implemented IVBSAXContentHandler and IVBSAXErrorHandler via the ContentHandler and ErrorHandler classes.
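In Python's xml.sax, the counterpart of implementing IVBSAXErrorHandler is subclassing xml.sax.handler.ErrorHandler and overriding its warning, error, and fatalError methods. The sketch below is an analog, not MSXML code; RecordingErrorHandler and parse_with_log are hypothetical names. Re-raising from fatalError lets the caller catch the failure in an except block, in the same spirit as the ErrorTrap label that Listing 8.10 uses around parseURL.

```python
import xml.sax
from xml.sax.handler import ErrorHandler


class RecordingErrorHandler(ErrorHandler):
    """Python counterpart of an IVBSAXErrorHandler implementation: the
    parser calls these methods, and the handler decides what to do."""

    def __init__(self):
        self.log = []

    def warning(self, exception):
        # Note the warning and let parsing continue.
        self.log.append(f"Warning: {exception.getMessage()}")

    def error(self, exception):
        self.log.append(f"Error: {exception.getMessage()}")

    def fatalError(self, exception):
        line = exception.getLineNumber()
        self.log.append(f"Fatal error: {exception.getMessage()} (line {line})")
        raise exception  # stop parsing; the caller sees SAXParseException


def parse_with_log(data):
    errors = RecordingErrorHandler()
    try:
        xml.sax.parseString(data, xml.sax.ContentHandler(), errors)
    except xml.sax.SAXParseException:
        pass  # plays the role of the ErrorTrap label in Command1_Click
    return errors.log


print(parse_with_log(b"<Songs><Song title='Dance'></Songs>"))
```

As with the VB interfaces, the parser never calls a method you do not provide an implementation for in the handler object you register.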

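We begin by instantiating a SAXXMLReader object. This object processes the XML document we pass it, raising events as it reads through the document. After assigning instances of the ContentHandler and ErrorHandler classes to the reader's ContentHandler and ErrorHandler properties, we call parseURL to parse the file named in Text2. The code in the ContentHandler and ErrorHandler classes responds to the resulting events by writing descriptive text to the main form.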
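The reader setup in Command1_Click (create a reader, attach a content handler, then parse a file located relative to the application folder) has a direct analog in Python's xml.sax, sketched below as an illustration rather than MSXML code. Here make_parser plays the role of SAXXMLReader, setContentHandler the role of assigning reader.ContentHandler, and parse the role of parseURL. TextLogHandler and run are hypothetical names, and the songs.xml file is written to a temporary folder only so the example is self-contained.

```python
import os
import tempfile
import xml.sax


class TextLogHandler(xml.sax.ContentHandler):
    """Appends a description of each element event to a text buffer,
    much as Listing 8.10 appends to Form1.Text1."""

    def __init__(self):
        super().__init__()
        self.text = ""

    def startElement(self, name, attrs):
        attributes = "".join(f' {k}="{v}"' for k, v in attrs.items())
        self.text += f"__ELEMENT START__\n<{name}{attributes}>\n"

    def endElement(self, name):
        self.text += f"__ELEMENT END__\n</{name}>\n"


def run(folder, filename):
    # Create the reader and attach the handler, as Command1_Click does.
    reader = xml.sax.make_parser()
    handler = TextLogHandler()
    reader.setContentHandler(handler)
    # os.path.join supplies the separator that App.Path & "\" & ... adds by hand.
    reader.parse(os.path.join(folder, filename))
    return handler.text


with tempfile.TemporaryDirectory() as folder:
    with open(os.path.join(folder, "songs.xml"), "w", encoding="utf-8") as f:
        f.write("<Songs><Song title='Cherish' artist='Kool and the Gang' /></Songs>")
    output = run(folder, "songs.xml")

print(output)
```

Note that the empty Song element still produces both a start and an end event, just as it would in MSXML's SAX reader.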
