4 DOM: The Document Object Model

The Document Object Model specifies a tree-based representation for an XML document. A top-level Document instance is the root of the tree and contains exactly one top-level Element instance (alongside any comments or processing instructions that appear outside it); this Element has child nodes representing its content and any sub-elements, which may in turn have further children, and so forth. The DOM defines methods and attributes that let you traverse the resulting tree in any order, access element and attribute values, insert and delete nodes, and convert the tree back into XML.
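As a rough illustration of the tree structure described above, the following sketch parses a small document and walks it depth-first. It assumes the standard library's xml.dom.minidom module rather than any particular DOM package; the sample document and the describe() helper are made up for this example.

```python
from xml.dom import minidom

# Hypothetical sample document, used only for illustration.
SAMPLE = '<catalog><book id="b1">Dune</book><book id="b2">Emma</book></catalog>'


def describe(node, depth=0, out=None):
    """Walk the tree depth-first, recording one indented line per node."""
    if out is None:
        out = []
    indent = "  " * depth
    if node.nodeType == node.ELEMENT_NODE:
        # node.attributes is a NamedNodeMap; items() yields (name, value) pairs.
        attrs = dict(node.attributes.items())
        out.append(f"{indent}element {node.tagName} {attrs}")
    elif node.nodeType == node.TEXT_NODE:
        out.append(f"{indent}text {node.data!r}")
    for child in node.childNodes:
        describe(child, depth + 1, out)
    return out


doc = minidom.parseString(SAMPLE)   # the Document instance at the root
root = doc.documentElement          # the single top-level Element
lines = describe(root)
print("\n".join(lines))
```

Each node exposes its children through childNodes, so the same recursive pattern works for a subtree as well as for the whole document.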

The DOM is useful for modifying XML documents, because you can create a DOM tree, modify it by adding new nodes and moving subtrees around, and then produce a new XML document as output. You can also construct a DOM tree yourself, and convert it to XML; this is often a more flexible way of producing XML output than simply writing <tag1>...</tag1> to a file.
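A minimal sketch of that workflow, again assuming xml.dom.minidom: the tree is built from scratch, one subtree is moved, and the result is serialized. The element names, attribute, and values are invented for the example.

```python
from xml.dom import minidom

# Build a tree from scratch; all names below are illustrative only.
doc = minidom.Document()
root = doc.createElement("inventory")
doc.appendChild(root)

for name, qty in [("bolt", "40"), ("nut", "75")]:
    item = doc.createElement("item")
    item.setAttribute("qty", qty)
    item.appendChild(doc.createTextNode(name))
    root.appendChild(item)

# Move a subtree: inserting a node that is already in the tree
# first detaches it from its current position.
root.insertBefore(root.lastChild, root.firstChild)

# Serialize the element (and everything beneath it) back to XML text.
xml_text = root.toxml()
print(xml_text)
```

Because the serializer produces the tags and escapes attribute values and text, the output stays well-formed in cases where hand-written string concatenation can easily go wrong.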

While the DOM doesn't require that the entire tree be resident in memory at one time, the Python DOM implementation currently keeps the whole tree in RAM. It's possible to write an implementation that stores most of the tree on disk or in a database, and reads in new sections as they're accessed, but this hasn't been done yet. This means you may not have enough memory to process very large documents as a DOM tree. A SAX handler, on the other hand, sees the document as a stream of events and keeps only the state it chooses to, so it can potentially churn through amounts of data far larger than the available RAM.
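For comparison, here is a small sketch of the SAX approach using the standard library's xml.sax module. The ElementCounter class and the generated input are hypothetical; the point is only that the handler keeps a small dictionary of counts and never builds a tree.

```python
import xml.sax


class ElementCounter(xml.sax.ContentHandler):
    """Count element names as they stream past; no tree is built."""

    def __init__(self):
        super().__init__()
        self.counts = {}

    def startElement(self, name, attrs):
        self.counts[name] = self.counts.get(name, 0) + 1


# Generated sample input, held in memory here only to keep the example
# self-contained.
data = b"<log>" + b"<entry/>" * 1000 + b"</log>"

handler = ElementCounter()
xml.sax.parseString(data, handler)
print(handler.counts)
```

For a genuinely large document you would pass a filename or file object to xml.sax.parse() instead, so that the input is read incrementally rather than loaded as one string.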


Subsections