xmllib
This module defines a class XMLParser which serves as the basis for parsing text files formatted in XML (Extensible Markup Language).
The XMLParser class must be instantiated without arguments. It has the following interface methods:
close() is called.
XMLParser.close().
The handle_starttag() method is called to handle start tags for which a start_tag() method has been defined. The tag argument is the name of the tag, and the method argument is the bound method which should be used to support semantic interpretation of the start tag. The attributes argument is a dictionary of attributes, the key being the name and the value being the value of the attribute found inside the tag's <> brackets. Double quotes and backslashes in the value have been interpreted. For instance, for the tag <A HREF="http://www.cwi.nl/">, this method would be called as handle_starttag('A', self.start_A, {'HREF': 'http://www.cwi.nl/'}). The base implementation simply calls method with attributes as the only argument.
This method is called to handle end tags for which an end_tag() method has been defined. The tag argument is the name of the tag, and the method argument is the bound method which should be used to support semantic interpretation of the end tag. If no end_tag() method is defined for the closing element, this handler is not called. The base implementation simply calls method.
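The start and end tag handling described above can be sketched without the module itself. The classes below are hypothetical stand-ins, not the module's own code: TagDispatchSketch and LinkCollector are names invented for this illustration, and handle_endtag is assumed to be the name of the end tag counterpart of handle_starttag(), since this section does not show its signature.

```python
class TagDispatchSketch:
    """Hypothetical stand-in for the two tag handlers described above."""

    def handle_starttag(self, tag, method, attributes):
        # Base behaviour as described: call the bound method with the
        # attribute dictionary as its only argument.
        method(attributes)

    def handle_endtag(self, tag, method):
        # Base behaviour as described: call the bound method with no arguments.
        method()


class LinkCollector(TagDispatchSketch):
    """Example subclass that extends both handlers to track open tags."""

    def __init__(self):
        self.links = []
        self.open_tags = []

    def handle_starttag(self, tag, method, attributes):
        # Record the tag, then delegate to the base behaviour.
        self.open_tags.append(tag)
        TagDispatchSketch.handle_starttag(self, tag, method, attributes)

    def handle_endtag(self, tag, method):
        TagDispatchSketch.handle_endtag(self, tag, method)
        self.open_tags.pop()

    def start_A(self, attributes):
        self.links.append(attributes.get('HREF'))

    def end_A(self):
        pass


collector = LinkCollector()
# The call a parser would make for <A HREF="http://www.cwi.nl/">:
collector.handle_starttag('A', collector.start_A, {'HREF': 'http://www.cwi.nl/'})
depth_inside = len(collector.open_tags)
collector.handle_endtag('A', collector.end_A)
```

Extending the handlers rather than replacing them keeps the per-tag methods (start_A, end_A) free of bookkeeping that applies to every tag.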
This method is called to process a character reference of the form ``&#ref;''. ref can either be a decimal number, or a hexadecimal number when preceded by x. In the base implementation, ref must be a number in the range 0-255. It translates the character to ASCII and calls the method handle_data() with the character as argument. If ref is invalid or out of range, the method unknown_charref(ref) is called to handle the error. A subclass must override this method to provide support for character references outside of the ASCII range.
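The character reference rules above (decimal, or hexadecimal after x, limited to 0-255) can be sketched as follows. This is an illustration of the described behaviour, not the module's source: CharRefSketch and its data and unknown lists are invented for the example, and the handler is named handle_charref here as an assumption, since this section does not show the method's signature.

```python
class CharRefSketch:
    """Hypothetical stand-in illustrating character reference handling."""

    def __init__(self):
        self.data = []      # characters passed to handle_data()
        self.unknown = []   # references passed to unknown_charref()

    def handle_data(self, data):
        self.data.append(data)

    def unknown_charref(self, ref):
        self.unknown.append(ref)

    def handle_charref(self, ref):  # assumed method name
        try:
            if ref[:1] == 'x':
                n = int(ref[1:], 16)   # hexadecimal form, e.g. 'x41'
            else:
                n = int(ref, 10)       # decimal form, e.g. '65'
        except ValueError:
            self.unknown_charref(ref)  # not a number at all
            return
        if not 0 <= n <= 255:
            self.unknown_charref(ref)  # outside the supported range
            return
        self.handle_data(chr(n))


p = CharRefSketch()
p.handle_charref('65')     # decimal
p.handle_charref('x41')    # hexadecimal
p.handle_charref('x263A')  # a number, but out of range
p.handle_charref('abc')    # invalid
```

A subclass that needs wider character support would override the handler (or unknown_charref()) instead of relying on the 0-255 check.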
This method is called to process a general entity reference of the form ``&ref;'', where ref is the name of the entity. It looks for ref in the instance (or class) variable entitydefs, which should be a mapping from entity names to corresponding translations. If a translation is found, it calls the method handle_data() with the translation; otherwise, it calls the method unknown_entityref(ref). The default entitydefs defines translations for &amp;, &apos;, &gt;, &lt;, and &quot;.
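The entity lookup above amounts to a dictionary lookup with a fallback. The sketch below assumes the handler is named handle_entityref (the signature is not shown in this section) and assumes the default table maps the five listed names directly to their characters; EntityRefSketch and the copy entity are invented for the example.

```python
class EntityRefSketch:
    """Hypothetical stand-in illustrating general entity lookup."""

    # Class variable: assumed translations for the five default entities.
    entitydefs = {'lt': '<', 'gt': '>', 'amp': '&', 'quot': '"', 'apos': "'"}

    def __init__(self):
        self.data = []
        self.unknown = []

    def handle_data(self, data):
        self.data.append(data)

    def unknown_entityref(self, ref):
        self.unknown.append(ref)

    def handle_entityref(self, ref):  # assumed method name
        table = self.entitydefs       # instance variable wins over class variable
        if ref in table:
            self.handle_data(table[ref])
        else:
            self.unknown_entityref(ref)


p = EntityRefSketch()
# Per-instance extension: copy the class table and add one entity.
p.entitydefs = dict(EntityRefSketch.entitydefs, copy='\u00a9')
p.handle_entityref('amp')
p.handle_entityref('copy')
p.handle_entityref('nbsp')   # not defined, goes to unknown_entityref()
```

Copying the class table before adding entries keeps the extra entity local to one parser instance.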
This method is called when a comment is encountered. The comment argument is a string containing the text between the ``<!--'' and ``-->'' delimiters, but not the delimiters themselves. For example, the comment ``<!--text-->'' will cause this method to be called with the argument 'text'. The default method does nothing.
This method is called when a CDATA section is encountered. The data argument is a string containing the text between the ``<![CDATA['' and ``]]>'' delimiters, but not the delimiters themselves. For example, the section ``<![CDATA[text]]>'' will cause this method to be called with the argument 'text'. The default method does nothing.
This method is called when a processing instruction (PI) is encountered. The name argument is the PI target, and the data argument is a string containing the text between the PI target and the closing delimiter, but not the delimiter itself. For example, the instruction ``<?XML text?>'' will cause this method to be called with the arguments 'XML' and 'text'. The default method does nothing.
This method is called when a declaration is encountered. The data argument is a string containing the text between the ``<!'' and ``>'' delimiters, but not the delimiters themselves. For example, the declaration ``<!DOCTYPE text>'' will cause this method to be called with the argument 'DOCTYPE text'. The default method does nothing.
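To make the delimiter rules for comments, CDATA sections, processing instructions, and declarations concrete, the sketch below extracts the arguments each handler would receive for a single piece of markup. It is an illustration only: handler_arguments and the kind labels are hypothetical, and the regular expressions are a simplification that ignores nesting and other details a real parser has to deal with.

```python
import re

# Order matters: the generic declaration pattern would also match
# comments and CDATA sections, so it is tried last.
_PATTERNS = [
    ('comment', re.compile(r'<!--(.*?)-->', re.S)),
    ('cdata', re.compile(r'<!\[CDATA\[(.*?)\]\]>', re.S)),
    ('proc', re.compile(r'<\?(\S+)\s*(.*?)\?>', re.S)),
    ('declaration', re.compile(r'<!(.*?)>', re.S)),
]


def handler_arguments(markup):
    """Return (kind, arguments) for one markup item, or None if unrecognised."""
    for kind, pattern in _PATTERNS:
        match = pattern.fullmatch(markup)
        if match:
            return kind, match.groups()
    return None


comment_args = handler_arguments('<!--text-->')
cdata_args = handler_arguments('<![CDATA[text]]>')
proc_args = handler_arguments('<?XML text?>')
decl_args = handler_arguments('<!DOCTYPE text>')
```

In every case the delimiters are stripped and only the enclosed text is handed on, which matches the examples given in the descriptions.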
This method is called when a syntax error is encountered. The lineno argument is the line number of the error, and the message argument is a description of what was wrong. The default method raises a RuntimeError exception. If this method is overridden, it is permissible for it to return. This method is only called when the error can be recovered from.
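Because the error handler is allowed to return, a subclass can collect recoverable errors instead of stopping at the first one. The sketch below assumes the handler is named syntax_error and takes (lineno, message), which this section implies but does not show; StrictSketch and LenientSketch are hypothetical stand-ins, and the error text is made up for the example.

```python
class StrictSketch:
    """Hypothetical stand-in for the default behaviour: raise RuntimeError."""

    def syntax_error(self, lineno, message):  # assumed name and signature
        raise RuntimeError('Syntax error at line %d: %s' % (lineno, message))


class LenientSketch(StrictSketch):
    """Override that records the error and returns, so parsing could go on."""

    def __init__(self):
        self.errors = []

    def syntax_error(self, lineno, message):
        self.errors.append((lineno, message))


strict = StrictSketch()
try:
    strict.syntax_error(3, 'unexpected end tag')
    raised = False
except RuntimeError:
    raised = True

lenient = LenientSketch()
lenient.syntax_error(3, 'unexpected end tag')
```

Collecting (lineno, message) pairs makes it possible to report every recoverable problem in a document after a single pass.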
Apart from overriding or extending the methods listed above, derived classes may also define methods of the following form to process specific tags. Tag names in the input stream are case sensitive; the tag occurring in method names must be in the correct case:
start_tag(attributes)
The attributes argument has the same meaning as described for handle_starttag() above.
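One way to picture how tag-specific methods are found is a lookup by name, which also shows why case matters. The sketch below is an assumption about the mechanism, not the module's code: MethodLookupSketch, dispatch_start, and the 'unhandled' marker are invented for the illustration.

```python
class MethodLookupSketch:
    """Hypothetical stand-in showing case-sensitive start_tag lookup."""

    def __init__(self):
        self.events = []

    def dispatch_start(self, tag, attributes):
        # Look for a method named start_<tag>, using the tag exactly as
        # it appears in the input.
        method = getattr(self, 'start_' + tag, None)
        if method is None:
            self.events.append(('unhandled', tag))
            return
        self.handle_starttag(tag, method, attributes)

    def handle_starttag(self, tag, method, attributes):
        method(attributes)

    def start_A(self, attributes):
        self.events.append(('A', attributes))


p = MethodLookupSketch()
p.dispatch_start('A', {'HREF': 'http://www.cwi.nl/'})  # matches start_A
p.dispatch_start('a', {})                              # no start_a defined
```

Only the tag spelled exactly as in the method name reaches start_A; the lower-case tag is treated as a different, unhandled tag.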