This section covers some technical details on how HTML2IPF works, and how to make it work when it does not :-)

If HTML2IPF produces an empty IPF file (or empty sections), this usually means that the original HTML file is a bit non-standard. To verify this, you can enable debugging:

HTML2IPF -DEBUG+ index.html

This will normally produce an HTML2IPF log file like this:

--- [30 Mar 1997, 11:42pm] Conversion started: Index file index.html
--- Parsing file "index.html" ...
--- Parsing file "opengl_index_alpha.html" ...
--- Parsing file "glAccum.html" ...
--- Parsing file "glAlphaFunc.html" ...
--- Parsing file "glArrayElementEXT.html" ...
[...]

If, after a 'Parsing file "..."' line, you see a lot of 'unexpected tag' and 'unexpected text' messages, this usually means that HTML2IPF's state machine has fallen out of sync with the sections of the HTML file: for example, it encounters tags that belong in the <BODY> section inside the <HEADER> section, or plain text in the <HEADER> section. You should fix such files manually.

If you get many 'Unexpected tag' messages for a particular tag, and this tag is crucial for you, you can add your own handler for it to HTML2IPF. To do this, you should do two things:

A tag handler is called from the ParseContents subroutine (and can itself call ParseContents recursively, as the handlers for <HEADER> and <BODY> do), and can use the following variables to get information about its environment:

In addition, to parse tokens with multiple subtags (like <IMG SRC="a.gif" ALT="missing picture" ALIGN=center >), there is a procedure called ParseTag. It takes the initial token as its parameter (i.e. the contents of the Token variable described above). Every subtag handler should have a name of the form doTag[Prefix]_[Subtag], where Prefix is the main HTML tag; for example, the handler for the SRC subtag of <IMG SRC="..."> is named doTagIMG_SRC. The following variables can be used in the context of such a handler:

To write a string to the output file, you can use either the PutToken or the PutText subroutine. The difference is that the first is used for control tokens (like :p. or :ehp4.), while the second is meant for plain text and converts control characters (i.e. if you call PutText(':ehp3.'), the output file will contain the text "&colon.ehp3&per.", which is not a valid IPF tag but will appear as the original ':ehp3.' in the INF book). There is also a NewLine routine, which starts a new line in the output stream if the current line is non-empty.
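
The kind of conversion that PutText performs can be illustrated with a small stand-alone REXX sketch. This is not the code from HTML2IPF itself: EscapeIPF is a hypothetical name, and the sketch assumes that plain text is protected by replacing '&', ':' and '.' with the standard IPF symbols &amp., &colon. and &per.

```rexx
/* Sketch only: EscapeIPF is a hypothetical helper, not part of HTML2IPF. */
say EscapeIPF(':ehp3.')      /* prints &colon.ehp3&per. */
exit

/* Replace the characters that IPF treats specially with IPF symbols. */
EscapeIPF: procedure
  parse arg text
  out = ''
  do i = 1 to length(text)
    c = substr(text, i, 1)
    select
      when c == '&' then out = out || '&amp.'
      when c == ':' then out = out || '&colon.'
      when c == '.' then out = out || '&per.'
      otherwise out = out || c
    end
  end
  return out
```

A control token, by contrast, would be written to the output unchanged, which is why a tag passed through the plain-text path shows up as literal text in the INF book instead of being interpreted.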

One more detail: the Global. stem is used to keep all global variables, while any other variable is usually local to its procedure. Each procedure contains an

expose Global.;
clause, which makes every member of the Global. stem visible everywhere.
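
As an illustration of this convention, here is a stand-alone sketch that is not taken from HTML2IPF; the CountTag routine and the Global.TagCount member are hypothetical names. It shows a procedure that shares the Global. stem with its caller while its other variables stay local. In REXX, expose is written as part of the procedure instruction.

```rexx
/* Sketch only: the names below are made up for illustration. */
Global.TagCount = 0
name = 'main'
call CountTag
call CountTag
say Global.TagCount    /* prints 2: the stem is shared with the procedure */
say name               /* prints main: the caller's variable is untouched */
exit

CountTag: procedure expose Global.
  Global.TagCount = Global.TagCount + 1    /* updates the shared stem */
  name = 'local'                           /* local to CountTag only */
  return
```

Without the expose clause, Global.TagCount inside CountTag would be a fresh, uninitialized variable, and the addition would fail with a bad-arithmetic error.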

If you make any useful changes, please send them to me so that I can incorporate them into future versions of HTML2IPF.

