\section{Standard Module \code{sgmllib}}

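\stmodindex{sgmllib}

This module defines a class \code{SGMLParser} which serves as the basis
for parsing text files formatted in SGML (Standard Generalized Mark-up
Language).  In fact, it does not provide a full SGML parser; it only
parses SGML insofar as it is used by HTML, and the module exists only
as a basis for the \code{htmllib} module.
\stmodindex{htmllib}

In particular, the parser is hardcoded to recognize the following
constructs:

\begin{itemize}

\item
Opening and closing tags of the form
``\code{<\var{tag} \var{attr}="\var{value}" ...>}'' and
``\code{</\var{tag}>}'', respectively.

\item
Numeric character references of the form ``\code{\&\#\var{name};}''.

\item
Entity references of the form ``\code{\&\var{name};}''.

\item
SGML comments of the form ``\code{<!--\var{text}-->}''.

\end{itemize}

The \code{SGMLParser} class must be instantiated without arguments.
It has the following interface methods: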
\begin{funcdesc}{reset}{}
Reset the instance. Loses all unprocessed data. This is called
implicitly at instantiation time.
\end{funcdesc}

\begin{funcdesc}{setnomoretags}{}
Stop processing tags. Treat all following inpu...
... provided so the HTML tag \code{<PLAINTEXT>}
can be implemented.)
\end{funcdesc}

\begin{funcdesc}{setliteral}{}
Enter literal mode (CDATA mode).
\end{funcdesc}

\begin{funcdesc}{feed}{data}
Feed some text to the parser. It is processed insof...
...a is buffered until more data is
fed or \code{close()} is called.
\end{funcdesc}

\begin{funcdesc}{close}{}
Force processing of all buffered data as if it were fo...
...e
redefined version should always call \code{SGMLParser.close()}.
\end{funcdesc}
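The buffering contract of \code{feed()} and \code{close()} can be illustrated with a small stand-alone sketch.  The class \code{BufferingScanner} below is hypothetical and is not \code{SGMLParser}: it assumes, for simplicity, that the only incomplete input worth holding back is a tag whose closing \code{>} has not arrived yet, and it records events in a list instead of calling handler methods.  The hypothetical subclass \code{CountingScanner} shows a redefined \code{close()} that calls the base class version first.

```python
class BufferingScanner:
    """Hypothetical stand-in for feed()/close() buffering (not SGMLParser)."""

    def __init__(self):
        self.reset()

    def reset(self):
        # Discard all unprocessed data, like reset().
        self.rawdata = ''
        self.events = []

    def feed(self, data):
        # Append the new text and process as much of it as is complete.
        self.rawdata += data
        self.goahead(at_end=False)

    def close(self):
        # Process everything still buffered, as if end of input were reached.
        self.goahead(at_end=True)

    def goahead(self, at_end):
        text, pos = self.rawdata, 0
        while pos < len(text):
            lt = text.find('<', pos)
            if lt < 0:
                # No markup left: the rest is ordinary character data.
                self.events.append(('data', text[pos:]))
                pos = len(text)
                break
            if lt > pos:
                self.events.append(('data', text[pos:lt]))
                pos = lt
            gt = text.find('>', lt)
            if gt < 0:
                # Assumption: an unterminated tag is held back until more
                # data arrives; at close() it is flushed as plain data.
                if at_end:
                    self.events.append(('data', text[lt:]))
                    pos = len(text)
                break
            self.events.append(('tag', text[lt + 1:gt]))
            pos = gt + 1
        self.rawdata = text[pos:]


class CountingScanner(BufferingScanner):
    def close(self):
        BufferingScanner.close(self)  # a redefined close() calls the base one
        self.tag_count = sum(1 for kind, _ in self.events if kind == 'tag')


s = BufferingScanner()
s.feed('Hello <b')
print(s.events, repr(s.rawdata))
# [('data', 'Hello ')] '<b'
s.feed('>world</b>')
print(s.events)
# [('data', 'Hello '), ('tag', 'b'), ('data', 'world'), ('tag', '/b')]
```

After the first call only the text is reported; the partial \code{<b} waits in the buffer until the second \code{feed()} completes it.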

\begin{funcdesc}{handle_charref}{ref}
This method is called to process a charact...
..., the method
\code{unknown_charref(\var{ref})} is called instead.
\end{funcdesc}

\begin{funcdesc}{handle_entityref}{ref}
This method is called to process an enti...
...herwise, it calls the method
\code{unknown_entityref(\var{ref})}.
\end{funcdesc}

\begin{funcdesc}{handle_data}{data}
This method is called to process arbitrary d...
...n by a derived class; the base class implementation does
nothing.
\end{funcdesc}

\begin{funcdesc}{unknown_starttag}{tag\, attributes}
This method is called to pr...
...s \code{unknown_starttag('a', [('href', 'http://www.cwi.nl/')])}.
\end{funcdesc}
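As a sketch of how a derived class might use the attribute list passed to \code{unknown_starttag()}, the hypothetical class \code{LinkCollector} below records the \code{href} values of \code{a} tags.  Because \code{sgmllib} is not part of Python 3, the sketch does not inherit from \code{SGMLParser} and is driven by a direct call with the arguments from the example above; under Python 2 it would subclass \code{SGMLParser} and receive such calls from the parser.

```python
class LinkCollector:
    """Hypothetical handler; under Python 2 it would subclass SGMLParser."""

    def __init__(self):
        self.links = []

    def unknown_starttag(self, tag, attributes):
        # attributes is a list of (name, value) pairs.
        if tag == 'a':
            for name, value in attributes:
                if name == 'href':
                    self.links.append(value)

    def unknown_endtag(self, tag):
        pass  # closing tags carry no attributes, so nothing to record


collector = LinkCollector()
collector.unknown_starttag('a', [('href', 'http://www.cwi.nl/')])
print(collector.links)
# ['http://www.cwi.nl/']
```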

\begin{funcdesc}{unknown_endtag}{tag}
This method is called to process an unknow...
...n by a derived class; the base class implementation
does nothing.
\end{funcdesc}

\begin{funcdesc}{unknown_charref}{ref}
This method is called to process an unkno...
...n by a derived class; the base class
implementation does nothing.
\end{funcdesc}

\begin{funcdesc}{unknown_entityref}{ref}
This method is called to process an unk...
...n by a derived class; the base class
implementation does nothing.
\end{funcdesc}
Apart from overriding or extending the methods listed above, derived classes may also define methods of the following form to define processing of specific tags.  Tag names in the input stream are case-insensitive; the \var{tag} occurring in method names must be in lower case:
\begin{funcdesc}{start_\var{tag}}{attributes}
This method is called to process a...
...has the same meaning as described for \code{unknown_starttag()} above.
\end{funcdesc}

\begin{funcdesc}{do_\var{tag}}{attributes}
This method is called to process an o...
...has the same meaning as described for \code{unknown_starttag()} above.
\end{funcdesc}

\begin{funcdesc}{end_\var{tag}}{}
This method is called to process a closing tag \var{tag}.
\end{funcdesc}
Note that the parser maintains a stack of opening tags for which no matching closing tag has been found yet.  Only tags processed by \code{start_\var{tag}()} are pushed on this stack.  Defining an \code{end_\var{tag}()} method is optional for these tags.  For tags processed by \code{do_\var{tag}()} or by \code{unknown_starttag()}, an \code{end_\var{tag}()} method must not be defined.
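The lookup rules above can be modelled with \code{getattr()}.  The sketch below is a hypothetical re-implementation, not the code of \code{sgmllib}: \code{TagDispatcher}, \code{dispatch_start()} and \code{dispatch_end()} are invented names, the order in which \code{start_\var{tag}()} and \code{do_\var{tag}()} are tried is an assumption, and so is the choice to close any tags still open above the matching one when a closing tag arrives.

```python
class TagDispatcher:
    """Hypothetical model of the start_/do_/end_ dispatch rules (not sgmllib)."""

    def __init__(self):
        self.stack = []  # open tags that were handled by a start_<tag>() method
        self.log = []

    def dispatch_start(self, tag, attributes):
        tag = tag.lower()  # tag names in the input are case-insensitive
        # Assumption: start_<tag>() is looked up before do_<tag>().
        method = getattr(self, 'start_' + tag, None)
        if method is not None:
            self.stack.append(tag)  # only start_ tags are pushed
            method(attributes)
            return
        method = getattr(self, 'do_' + tag, None)
        if method is not None:
            method(attributes)  # do_ tags never reach the stack
            return
        self.unknown_starttag(tag, attributes)

    def dispatch_end(self, tag):
        tag = tag.lower()
        if tag not in self.stack:
            # Closing tag with no matching start_ tag on the stack.
            self.unknown_endtag(tag)
            return
        # Assumption: tags opened after the matching one are closed too.
        while self.stack:
            open_tag = self.stack.pop()
            method = getattr(self, 'end_' + open_tag, None)
            if method is not None:
                method()  # end_<tag>() is optional for start_ tags
            else:
                self.unknown_endtag(open_tag)
            if open_tag == tag:
                break

    def unknown_starttag(self, tag, attributes):
        pass  # like the base class: do nothing

    def unknown_endtag(self, tag):
        pass


class ListTracker(TagDispatcher):
    def start_ul(self, attributes):
        self.log.append('open ul')

    def end_ul(self):
        self.log.append('close ul')

    def do_br(self, attributes):
        self.log.append('br')

    def unknown_starttag(self, tag, attributes):
        self.log.append('unknown start ' + tag)

    def unknown_endtag(self, tag):
        self.log.append('unknown end ' + tag)


p = ListTracker()
p.dispatch_start('UL', [])
p.dispatch_start('br', [])
p.dispatch_start('li', [])
p.dispatch_end('li')
p.dispatch_end('ul')
print(p.log)
# ['open ul', 'br', 'unknown start li', 'unknown end li', 'close ul']
```

Because \code{li} has neither a \code{start_li()} nor a \code{do_li()} method here, its opening tag goes to \code{unknown_starttag()} and never reaches the stack, so its closing tag is reported through \code{unknown_endtag()}.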