Standard Module htmllib

This module defines a number of classes which can serve as a basis for parsing text files formatted in HTML (HyperText Mark-up Language). The classes are not directly concerned with I/O; they have to be fed their input in string form, and will make calls to methods of a ``formatter'' object in order to produce output. The classes are designed to be used as base classes for other classes in order to add functionality, and allow most of their methods to be extended or overridden. In turn, the classes are derived from and extend the class \code{SGMLParser} defined in module \code{sgmllib}.

The following is a summary of the interface defined by \code{sgmllib.SGMLParser}:

The module defines the following classes:
\begin{funcdesc}{HTMLParser}{}
This is the most basic HTML parser class. It defi...
...code{<PLAINTEXT>} (the latter is terminated only by end of file).
\end{funcdesc}

\begin{funcdesc}{CollectingParser}{}
This class, derived from \code{HTMLParser},...
...\code{<TITLE>...</TITLE>}, \code{<NEXTID>}, and \code{<ISINDEX>}.
\end{funcdesc}

\begin{funcdesc}{FormattingParser}{formatter\, stylesheet}
This class, derived f...
...nted later in this section.
\index{formatter}
\index{style sheet}
\end{funcdesc}

\begin{funcdesc}{AnchoringParser}{formatter\, stylesheet}
This class, derived fr...
...ormatter to display the anchor in a different font or color, etc.
\end{funcdesc}

Instances of \code{CollectingParser} (and thus also instances of \code{FormattingParser} and \code{AnchoringParser}) have the following instance variables:
\begin{datadesc}{anchornames}
A list of the values of the \code{NAME} attributes of the \code{<A>}
tags encountered.
\end{datadesc}

\begin{datadesc}{anchors}
A list of the values of \code{HREF} attributes of the \code{<A>} tags
encountered.
\end{datadesc}

\begin{datadesc}{anchortypes}
A list of the values of the \code{TYPE} attributes of the \code{<A>}
tags encountered.
\end{datadesc}

\begin{datadesc}{inanchor}
Outside an \code{<A>...</A>} tag pair, this is zero. ...
... \code{anchors},
\code{anchornames} and \code{anchortypes} lists.
\end{datadesc}

\begin{datadesc}{isindex}
True if the \code{<ISINDEX>} tag has been encountered.
\end{datadesc}

\begin{datadesc}{nextid}
The attribute list of the last \code{<NEXTID>} tag encountered, or
an empty list if none.
\end{datadesc}

\begin{datadesc}{title}
The text inside the last \code{<TITLE>...</TITLE>} tag pair, or
\code{''} if no title has been encountered yet.
\end{datadesc}
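
The three anchor lists described above are filled in step, so a single index addresses the same anchor in all of them. The following is only a sketch of that bookkeeping, not the module's own code: it uses \code{html.parser.HTMLParser} from later Python versions as a stand-in base class, and the class name \code{AnchorCollector} is hypothetical. It records a missing attribute as an empty string and skips anchors that have neither an \code{HREF} nor a \code{NAME} attribute.

```python
from html.parser import HTMLParser


class AnchorCollector(HTMLParser):
    """Hypothetical stand-in for the anchor bookkeeping of CollectingParser."""

    def __init__(self):
        super().__init__()
        # Parallel lists: index i in each list describes the same anchor.
        self.anchors = []
        self.anchornames = []
        self.anchortypes = []

    def handle_starttag(self, tag, attrs):
        if tag != 'a':
            return
        # Tag and attribute names arrive lower-cased from this base class.
        attrs = dict(attrs)
        href = attrs.get('href') or ''    # missing attribute becomes ''
        name = attrs.get('name') or ''
        type_ = attrs.get('type') or ''
        # Anchors with neither HREF nor NAME are not recorded at all.
        if href or name:
            self.anchors.append(href)
            self.anchornames.append(name)
            self.anchortypes.append(type_)


parser = AnchorCollector()
# Input is fed as strings, in as many pieces as is convenient.
parser.feed('<A HREF="a.html">one</A>')
parser.feed('<A NAME="top" TYPE="x">two</A><A>skipped</A>')
parser.close()

# Walk the three lists together, one anchor per iteration.
for href, name, type_ in zip(parser.anchors, parser.anchornames,
                             parser.anchortypes):
    print(repr(href), repr(name), repr(type_))
```

Keeping three flat lists rather than one list of records means a caller interested only in link targets can use \code{anchors} directly, at the cost of having to keep the indices aligned.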

The \code{anchors}, \code{anchornames} and \code{anchortypes} lists are ``parallel arrays'': items in these lists with the same index pertain to the same anchor. Missing attributes default to the empty string. Anchors with neither a \code{HREF} nor a \code{NAME} attribute are not entered in these lists at all.

The module also defines a number of style sheet classes. These should never be instantiated; their class variables are the only behavior required. Note that style sheets are specifically designed for a particular formatter implementation. The currently defined style sheets are:
\begin{datadesc}{NullStylesheet}
A style sheet for use on a dumb output device such as an \ASCII{}
terminal.
\end{datadesc}

\begin{datadesc}{X11Stylesheet}
A style sheet for use with an X11 server.
\end{datadesc}

\begin{datadesc}{MacStylesheet}
A style sheet for use on Apple Macintosh computers.
\end{datadesc}

\begin{datadesc}{StdwinStylesheet}
A style sheet for use with the \code{stdwin} ...
...\code{X11Stylesheet} or \code{MacStylesheet}.
\bimodindex{stdwin}
\end{datadesc}

\begin{datadesc}{GLStylesheet}
A style sheet for use with the SGI Graphics Libra...
...modules \code{gl} and \code{fm}).
\bimodindex{gl}
\bimodindex{fm}
\end{datadesc}
Style sheets have the following class variables:
\begin{datadesc}{stdfontset}
A list of up to four font definitions, respective...
...use is as a parameter to the
formatter's \code{setfont()} method.
\end{datadesc}

\begin{datadesc}{h1fontset}
\dataline{h2fontset}
\dataline{h3fontset}
The font s...
...various headers (text inside \code{<H1>...</H1>}
tag pairs etc.).
\end{datadesc}

\begin{datadesc}{stdindent}
The indentation of normal text. This is measured in ...
...ally support
variable-spacing fonts) in pixels or printer points.
\end{datadesc}

\begin{datadesc}{ddindent}
The indentation used for the first level of \code{<DD>} tags.
\end{datadesc}

\begin{datadesc}{ulindent}
The indentation used for the first level of \code{<UL>} tags.
\end{datadesc}

\begin{datadesc}{h1indent}
The indentation used for level 1 headers.
\end{datadesc}

\begin{datadesc}{h2indent}
The indentation used for level 2 headers.
\end{datadesc}

\begin{datadesc}{literalindent}
The indentation used for literal text (text inside
\code{<PRE>...</PRE>} and similar tag pairs).
\end{datadesc}
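
Since a style sheet is nothing more than a set of class variables, defining one amounts to writing a class body. The sketch below is hypothetical throughout: the class names \code{DemoStylesheet} and \code{WideStylesheet}, the helper \code{indents()}, the font strings and all the numbers are placeholders chosen for illustration. The real format of a font definition and the unit of the indentations depend on the formatter the style sheet is written for.

```python
class DemoStylesheet:
    """Hypothetical style sheet; used as a class, never instantiated."""

    # Font sets: lists of up to four font definitions.  The strings are
    # placeholders; a real formatter dictates what a definition looks like.
    stdfontset = ['std-0', 'std-1', 'std-2', 'std-3']
    h1fontset = ['h1-0', 'h1-1', 'h1-2', 'h1-3']
    h2fontset = ['h2-0', 'h2-1', 'h2-2', 'h2-3']
    h3fontset = ['h3-0', 'h3-1', 'h3-2', 'h3-3']

    # Indentations, in whatever unit the target formatter uses.
    stdindent = 2
    ddindent = 6
    ulindent = 6
    h1indent = 0
    h2indent = 0
    literalindent = 4


class WideStylesheet(DemoStylesheet):
    """A variant that overrides one setting and inherits the rest."""
    stdindent = 8


def indents(stylesheet):
    """Read the indentation settings straight from the class object."""
    names = ('stdindent', 'ddindent', 'ulindent',
             'h1indent', 'h2indent', 'literalindent')
    return {name: getattr(stylesheet, name) for name in names}


print(indents(DemoStylesheet))
print(indents(WideStylesheet))
```

Passing the class itself, rather than an instance, matches the rule that style sheets should never be instantiated; subclassing gives a cheap way to vary a single setting.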

Although no documented implementation of a formatter exists, the \code{FormattingParser} class assumes that formatters have a certain interface. This interface requires the following methods:
\begin{funcdesc}{setfont}{fontspec}
Set the font to be used subsequently. The \var{fontspec} argument is
an item in a style sheet's font set.
\end{funcdesc}

\begin{funcdesc}{flush}{}
Finish the current line, if not empty, and begin a new one.
\end{funcdesc}

\begin{funcdesc}{setleftindent}{n}
Set the left indentation of the following lines to \var{n} units.
\end{funcdesc}

\begin{funcdesc}{needvspace}{n}
Require at least \var{n} blank lines before the next line. Implies
\code{flush()}.
\end{funcdesc}

\begin{funcdesc}{addword}{word\, space}
Add a \var{word} to the current paragraph, followed by \var{space}
spaces.
\end{funcdesc}

\begin{datadesc}{nospace}
If this instance variable is true, empty words should ...
... It should be set to false after a non-empty word has
been added.
\end{datadesc}

\begin{funcdesc}{setjust}{justification}
Set the justification of the current pa...
...'r'} (right justified) or \code{'lr'} (left and
right justified).
\end{funcdesc}

\begin{funcdesc}{bgn_anchor}{id}
Begin an anchor. The \var{id} parameter is the value of the parser's
\code{inanchor} attribute.
\end{funcdesc}

\begin{funcdesc}{end_anchor}{id}
End an anchor. The \var{id} parameter is the value of the parser's
\code{inanchor} attribute.
\end{funcdesc}
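
The interface above can be sketched as a small class that collects plain text lines. This is an illustration under stated assumptions, not the behavior of any existing formatter: the class name \code{PlainTextFormatter} and its \code{lines} attribute are hypothetical, fonts and justification are merely remembered, anchors are marked with square brackets, and \code{nospace} is assumed to mean that empty words are dropped until a non-empty word arrives.

```python
class PlainTextFormatter:
    """Hypothetical formatter that collects plain text lines in a list."""

    def __init__(self):
        self.lines = []          # finished output lines
        self.words = []          # pieces of the line being built
        self.indent = 0          # indentation for lines started from now on
        self.line_indent = 0     # indentation of the line being built
        self.pending_blank = 0   # blank lines owed before the next line
        self.nospace = True      # assumption: true at the start of a line
        self.font = None
        self.just = 'l'

    def _append(self, text):
        if not self.words:
            # A new line starts here; fix its indentation now.
            self.line_indent = self.indent
        self.words.append(text)

    def setfont(self, fontspec):
        self.font = fontspec     # remembered only; plain text has no fonts

    def setjust(self, justification):
        self.just = justification  # remembered only in this sketch

    def setleftindent(self, n):
        self.indent = n          # affects the following lines

    def flush(self):
        # Finish the current line, if not empty, and begin a new one.
        if self.words:
            text = ''.join(self.words).rstrip()
            self.lines.extend([''] * self.pending_blank)
            self.pending_blank = 0
            self.lines.append(' ' * self.line_indent + text)
            self.words = []
        self.nospace = True      # assumption: reset at each new line

    def needvspace(self, n):
        # Implies flush(); the blank lines are emitted before the next line.
        self.flush()
        self.pending_blank = max(self.pending_blank, n)

    def addword(self, word, space):
        if self.nospace and not word:
            return               # assumption: empty words are dropped
        self._append(word + ' ' * space)
        if word:
            self.nospace = False

    def bgn_anchor(self, id):
        self._append('[')        # id is ignored in this sketch

    def end_anchor(self, id):
        self._append(']')


out = PlainTextFormatter()
out.addword('Title', 0)
out.needvspace(1)
out.setleftindent(2)
out.addword('see', 1)
out.bgn_anchor(1)
out.addword('here', 0)
out.end_anchor(1)
out.flush()
print('\n'.join(out.lines))
```

A parser would drive such an object in the same way as the calls at the end of the sketch, which produce a title line, one blank line and an indented line containing a bracketed anchor.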

A sample formatter implementation can be found in the module \code{fmt}, which in turn uses the module \code{Para}. These modules are not intended as standard library modules; they are available as an example of how to write a formatter.