sgmllib
This module defines a class SGMLParser which serves as the
basis for parsing text files formatted in SGML (Standard Generalized
Mark-up Language). It does not, in fact, provide a full SGML parser:
it parses SGML only insofar as HTML uses it, and the module exists
only as a basis for the htmllib module.
In particular, the parser is hardcoded to recognize the following
elements:
- Opening and closing tags of the form
``<tag attr="value" ...>'' and
``</tag>'', respectively.
- Character references of the form ``&#name;''.
- Entity references of the form ``&name;''.
- SGML comments of the form ``<!--text-->''.
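sgmllib was removed in Python 3, so the sketch below uses the standard library's html.parser.HTMLParser instead. It is a different class with different callback names, but it reports the same four kinds of construct listed above. The EventRecorder class is a hypothetical name chosen for illustration; it simply logs each construct as a tuple.

```python
from html.parser import HTMLParser


class EventRecorder(HTMLParser):
    """Hypothetical helper: record tags, character references,
    entity references, and comments as tuples."""

    def __init__(self):
        # convert_charrefs=False keeps &#...; and &...; as separate
        # callbacks instead of folding them into the surrounding text.
        super().__init__(convert_charrefs=False)
        self.events = []

    def handle_starttag(self, tag, attrs):
        # Tag and attribute names arrive lower-cased.
        self.events.append(("starttag", tag, attrs))

    def handle_endtag(self, tag):
        self.events.append(("endtag", tag))

    def handle_charref(self, name):
        self.events.append(("charref", name))

    def handle_entityref(self, name):
        self.events.append(("entityref", name))

    def handle_comment(self, data):
        self.events.append(("comment", data))


recorder = EventRecorder()
recorder.feed('<A HREF="x">&#65; &amp;</A><!--note-->')
recorder.close()
for event in recorder.events:
    print(event)
```

Plain text between the constructs goes to handle_data(), which this sketch leaves as the inherited no-op.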
The SGMLParser class is instantiated without arguments.
It has the following interface methods:
Apart from overriding or extending the methods listed above, derived
classes may also define methods of the following form to define
processing of specific tags. Tag names in the input stream are
case-insensitive; the tag name that appears in a method name must be
in lower case:
Note that the parser maintains a stack of opening tags for which no
matching closing tag has been found yet. Only tags processed by
start_tag() are pushed on this stack. Defining an
end_tag() method is optional for these tags. For tags
processed by do_tag() or by unknown_tag(), no
end_tag() method is needed; if one is defined, it is never called.
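Because Python 3 no longer ships sgmllib, the following is only a sketch of the dispatch rules described above, not the module's own code. The hypothetical MiniSGMLDispatcher class uses html.parser.HTMLParser purely as a tokenizer and re-creates the start_/do_/end_ method lookup and the stack of open tags; the unknown_starttag and unknown_endtag hook names are borrowed from sgmllib. It simplifies some details: an end tag with no matching open tag is just passed to unknown_endtag.

```python
from html.parser import HTMLParser


class MiniSGMLDispatcher(HTMLParser):
    """Hypothetical re-creation of sgmllib-style tag dispatch."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.stack = []  # open tags that were handled by start_<tag>()

    def handle_starttag(self, tag, attrs):
        # html.parser lower-cases tag names, matching the rule that
        # method names use the lower-case form of the tag.
        start = getattr(self, "start_" + tag, None)
        if start is not None:
            self.stack.append(tag)  # only start_ tags are pushed
            start(attrs)
            return
        do = getattr(self, "do_" + tag, None)
        if do is not None:
            do(attrs)  # not pushed, so an end_<tag>() would never run
        else:
            self.unknown_starttag(tag, attrs)  # also not pushed

    def handle_endtag(self, tag):
        if tag not in self.stack:
            # Simplification: no matching open tag on the stack.
            self.unknown_endtag(tag)
            return
        # Close every open element down to and including the match.
        while self.stack:
            open_tag = self.stack.pop()
            end = getattr(self, "end_" + open_tag, None)
            if end is not None:
                end()
            else:
                self.unknown_endtag(open_tag)
            if open_tag == tag:
                break

    def unknown_starttag(self, tag, attrs):
        pass

    def unknown_endtag(self, tag):
        pass


class Demo(MiniSGMLDispatcher):
    """Example subclass that logs which handler fired."""

    def __init__(self):
        super().__init__()
        self.log = []

    def start_ul(self, attrs):
        self.log.append("start ul")

    def end_ul(self):
        self.log.append("end ul")

    def do_br(self, attrs):
        self.log.append("do br")

    def unknown_starttag(self, tag, attrs):
        self.log.append("unknown start " + tag)

    def unknown_endtag(self, tag):
        self.log.append("unknown end " + tag)


demo = Demo()
demo.feed("<UL><BR><li>x</UL>")
demo.close()
print(demo.log)
```

Here <UL> is pushed because start_ul() exists, <BR> is handled by do_br() without touching the stack, and <li> falls through to unknown_starttag(); the closing </UL> pops the stack and calls end_ul().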