Python has several different ways for working with the
DOM. The one you choose should best fit your needs. minidom
is smaller and faster than a fully
compliant DOM, but suits the needs of most users. pulldom
provides a way to build only the
portion of the DOM needed for a particular application, allowing the DOM
to be more easily used when working with large documents or tight memory
constraints. 4DOM is a full-fledged DOM Level 2 implementation. While
these are the dominant implementations of the DOM for Python, and the
only implementations described here, realize that there are additional
implementations available that may be more tailored to your
requirements.
minidom
, part of the xml.dom
package included with both the Python standard library and PyXML, is a
lightweight DOM implementation. Its goal is to provide a simple
implementation and smaller memory footprint than a full DOM
implementation. The methods for creating the DOM are simple as well.
minidom
also supports functions for
working with string-length XML chunks and methods for extracting
them.
Overall, minidom
may be best
for loading simple (not necessarily small) configuration files for
your applications, dealing with form submissions from web pages,
handling user authorization, and using it anywhere a “little” bit of
XML is needed. You can reduce memory and time overhead by using
minidom
. These are two elements of
significant importance in web application development.
pulldom
, which also may be imported from the xml.dom
package, may be just the thing to
save your life when faced with the task of taking a portion of a large
XML document and creating a DOM instance of the subset for
manipulation. pulldom
essentially
allows for the construction of selected portions of a DOM based on SAX
events. The module uses minidom
for
the actual nodes it returns.
pulldom
seeks to be a middle
ground between the DOM and SAX. pulldom
wants to overcome the
state-management (the place-marking mentioned earlier) of SAX, but
also preserve its stream-based processing for speed and efficiency.
pulldom
also seeks to simplify the
self-similar, intricately complex nature of a complete DOM tree, its
many nodes and lists, and its memory-gobbling nature.
Both minidom
and
pulldom
have their specific fits,
but for the remainder of this book, we work with 4DOM. This is a DOM
implementation that implements most of the Level 2 features that
actually make sense outside a browser.
After your experience with SAX earlier in this chapter, interacting with the DOM may seem incredibly easy by comparison. However, dealing with a seemingly endless intricacy of stacked node classes may send you running back to SAX to do your string comparisons. However you fare, the next sections seek to introduce you to working with the DOM in Python, and to provide a reference to its interfaces.
Regardless of the implementation you use, there are two basic types of operations you can perform with the DOM. The most common operations involve retrieving information from the document, which we discuss first. Once we cover that, we move on to explain how to use the DOM to modify and create documents.
18.118.2.240