Basic DOM Operations

This section covers techniques for working with the DOM and highlights some of the features that MSXML supports but PyXML does not. Beyond these convenience functions added by Microsoft, working with MSXML also means working with COM, so the examples here show how to handle the types MSXML returns, which may differ from the standard Python lists and tuples.

The Microsoft DOM supports the same operations as the PyXML DOM, but there are differences in using them. For starters, MSXML is only accessible via COM, so your Python code needs to work as a COM client. Second, and related to the first, MSXML is not a native Python implementation and therefore doesn’t use Python types like the lists and tuples you’d find in PyXML. This section shows you the basics of working with this foreign parser from within Python.

To illustrate node and document manipulation, you need some source XML to manipulate. Structured data such as books.xml, shown in Example E-1, gives you something to try out your MSXML skills on.

Example E-1. books.xml
<book name="Python and XML">
  <section name="Appendix E" type="Appendix">
    <chapterTitle>Appendix E</chapterTitle>
    <bodytext>This appendix focuses on techniques for using...
    </bodytext>
  </section>
</book>

Using MSXML, it’s easy to take this document apart. But before you can work with MSXML, you have to import the correct library to access COM objects (win32com.client). Additionally, for the call to Dispatch, you need the ProgID of the Microsoft XML parser. If you’ve installed the latest Microsoft XML SDK, you have Version 3.0 of the MSXML parser. You may also have it if you’re running Visual Studio.NET or Internet Explorer 6. However, if you aren’t sure, you can download the XML SDK from Microsoft and install the newest version of the parser.

After importing the client package and calling Dispatch with the correct ProgID, use MSXML’s load method to actually load a document:

>>> import win32com.client
>>> msxml = win32com.client.Dispatch("MSXML2.DOMDocument.3.0")
>>> msxml.load("books.xml")
1

The returned 1 indicates success in Python terms, and allows for the syntax:

if msxml.load("books.xml"):
  pass  # success
else:
  pass  # failure
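
When load fails, MSXML records the reason on the document's parseError object. The following is a minimal sketch of reporting it, not code from the original example: the helper name describe_load_failure is hypothetical, and the function assumes only that the object passed in offers load and a parseError with reason and line properties, as an MSXML DOMDocument does. It is written to run under both older and newer Python versions.

```python
def describe_load_failure(doc, path):
    """Return None if doc.load(path) succeeds, else a one-line error report.

    doc is assumed to be an MSXML DOMDocument obtained through
    win32com.client.Dispatch; only load() and parseError are used.
    """
    if doc.load(path):
        return None
    err = doc.parseError  # MSXML's parse error object
    reason = (err.reason or "").strip()  # reason text usually ends in a newline
    return "Error loading %s: %s (line %s)" % (path, reason, err.line)

# Typical use on Windows (assumed setup, not run here):
#   import win32com.client
#   msxml = win32com.client.Dispatch("MSXML2.DOMDocument.3.0")
#   problem = describe_load_failure(msxml, "books.xml")
#   if problem:
#       print(problem)
```

Returning the message instead of printing it leaves the caller free to log it, raise an exception, or ignore it.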

Now that the msxml instance is ready to go, you can begin plucking out nodes and experimenting with them.

MSXML Nodes

The MSXML objects will feel familiar to you if you’ve been working with the PyXML objects throughout this book. Retrieving a documentElement or getting a node’s nodeName works as you might suspect:

>>> docelem = msxml.documentElement
>>> print docelem.nodeName
book
>>> print docelem.getAttribute("name")
Python and XML

MSXML throws in the occasional convenience like the text attribute of its Node class. This property returns all text content (or character data) beneath the current node:

>>> print docelem.text
Appendix E This appendix focuses on techniques for using...

This can come in handy when working with text-heavy documents. Related to the text attribute is the xml attribute. The xml attribute returns a string of XML representing the current node and its children:

>>> print docelem.xml
<book name="Python and XML">
        <section name="Appendix E" type="Appendix">
                <chapterTitle>Appendix E</chapterTitle>
                <bodytext>This appendix focuses on techniques for using...
    </bodytext>
        </section>
</book>

This is a definite shortcut (for your typing at least) over using the PrettyPrint method in the PyXML DOM extensions package. Of course, just like PyXML, some MSXML methods return collections of nodes rather than single nodes. In these cases, use the MSXML NodeList interface for dealing with the collections.
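
For comparison with the text and xml properties shown above, here is a rough sketch of the same two operations using the standard library's xml.dom.minidom in place of MSXML. The helper node_text and the BOOKS_XML constant are hypothetical names introduced for illustration, and the whitespace handling only approximates what MSXML does.

```python
from xml.dom import minidom

# The books.xml document from Example E-1, inlined so the sketch is self-contained.
BOOKS_XML = """<book name="Python and XML">
  <section name="Appendix E" type="Appendix">
    <chapterTitle>Appendix E</chapterTitle>
    <bodytext>This appendix focuses on techniques for using...
    </bodytext>
  </section>
</book>"""

def node_text(node):
    """Concatenate all character data beneath node, collapsing whitespace."""
    pieces = []
    for child in node.childNodes:
        if child.nodeType in (minidom.Node.TEXT_NODE,
                              minidom.Node.CDATA_SECTION_NODE):
            pieces.append(child.data)
        elif child.nodeType == minidom.Node.ELEMENT_NODE:
            pieces.append(node_text(child))  # recurse into child elements
    # Collapse runs of whitespace; an approximation of MSXML's text property.
    return " ".join(" ".join(pieces).split())

doc = minidom.parseString(BOOKS_XML)
docelem = doc.documentElement
text = node_text(docelem)   # rough counterpart of docelem.text
markup = docelem.toxml()    # rough counterpart of docelem.xml
```

The recursive walk is the work that MSXML's text property saves you, while toxml plays much the same role as the xml property.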

Using a NodeList

MSXML 3.0 has great support for node lists, and provides a NodeList object for use in their manipulation. This is slightly different than the native and robust list type provided by Python and PyXML. The NodeList object has a built-in iterator that you can take advantage of by calling the nextNode method. Note that this is different from the concept of iterators as they have been implemented in Python 2.2 and newer versions.

node = NodeList.nextNode(  )
while node:
  # do something here...
  node = NodeList.nextNode(  )

The while loop continues until the nextNode method stops returning a node. Example E-2, people.xml, shows some sample XML describing workers and their job titles.

Example E-2. people.xml
<employees>
  <person title="Project Manager">Cal Ender</person>
  <person title="Development Lead">A. Buddy Codit</person>
  <person title="Customer Service Rep">Will Icare</person> 
  <person title="Documentation Writer">E. Manual</person>
  <person title="Catering Specialist">Willy Eadit</person>
</employees>

In a structure such as this, a NodeList is a convenient way to process all nodes of a certain type. A call to getElementsByTagName returns a NodeList, as does passing a string expression to the selectNodes method of MSXML 3.0 (the related selectSingleNode method returns only the first matching node). Example E-3 shows the NodeList in use in nodelists.py:

Example E-3. nodelists.py
"""
 nodelists.py - using the NodeList object
  from MSXML3.0
"""
import win32com.client

# source XML
strSourceDoc = "people.xml"

# instantiate parser
objXML = win32com.client.Dispatch("MSXML2.DOMDocument.3.0")

# check for successful loading; stop if the document did not load
if (not objXML.load(strSourceDoc)):
  print "Error loading", strSourceDoc
  raise SystemExit(1)

# grab all person elements
peopleNodes = objXML.getElementsByTagName("person")

# begin iteration of NodeList with nextNode(  )
node = peopleNodes.nextNode(  )
while node:
  # print value of text descendants
  print "Name: ", node.text,

  # print value of title attribute
  print "\tPosition: ", node.getAttribute("title")

  # continue iteration
  node = peopleNodes.nextNode(  )

When you run nodelists.py from the command prompt, you’ll get a textual version of its contents:

C:\appD>c:\python21\python nodelists.py
Name:  Cal Ender        Position:  Project Manager
Name:  A. Buddy Codit   Position:  Development Lead
Name:  Will Icare       Position:  Customer Service Rep
Name:  E. Manual        Position:  Documentation Writer
Name:  Willy Eadit      Position:  Catering Specialist
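
The nextNode loop used in nodelists.py can also be wrapped so that a NodeList works in an ordinary Python for loop. This is a sketch rather than part of the original example: the name iter_nodes is hypothetical, and the function assumes only that the list offers MSXML's reset and nextNode methods. Generators need a newer Python than the 2.1 release used on the command line above (2.3 or later, or 2.2 with from __future__ import generators).

```python
def iter_nodes(node_list):
    """Yield each node from an MSXML-style NodeList using nextNode()."""
    node_list.reset()                # restart the list's built-in iterator
    node = node_list.nextNode()
    while node is not None:          # nextNode() returns None at the end
        yield node
        node = node_list.nextNode()

# Typical use with the objXML document from nodelists.py (not run here):
#   for node in iter_nodes(objXML.getElementsByTagName("person")):
#       print(node.text)
```

Calling reset first means the same NodeList can be walked more than once, because each pass starts from the first node.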