The latest version of this document and other information about RTFtoHTML is kept at http://www.sunpack.com/RTF. You may contact the author: Chris Hector at chris@sunpack.com
The string "ShortFileDigits" sets the number of digits to use in short file names. The default value of 2 supports 99 split files; a value of 3 supports 999 files.
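The digit limits above are simple arithmetic: with n decimal digits and numbering that starts at 1, the largest file number is 10^n - 1. A minimal Python sketch follows; the function names and the zero-padded formatting are illustrative assumptions, not the filter's actual naming code.

```python
def max_split_files(short_file_digits: int) -> int:
    """Largest file number expressible in the given number of decimal digits."""
    if short_file_digits < 1:
        raise ValueError("digit count must be at least 1")
    return 10 ** short_file_digits - 1


def numeric_suffix(index: int, short_file_digits: int) -> str:
    """Zero-padded suffix for split file `index` (hypothetical formatting)."""
    if not 1 <= index <= max_split_files(short_file_digits):
        raise ValueError("index does not fit in the configured digit count")
    return f"{index:0{short_file_digits}d}"
```

With the default of 2 digits, max_split_files(2) gives 99, matching the limit stated above.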
The string "DoAllPars", if set to 1, will allow empty paragraphs to be output. Normally these are suppressed, which is a nice feature within lists - otherwise you get empty bullet or numbered list entries.
Previous versions generated a trailing <BR> in all cells to work around a bug in Netscape that caused empty cells to lose their border. Now only empty cells get the BR so that no empty space is generated at the end.
The <TD> tag now has the valign=TOP attribute. This matches how MSWord aligns cell contents.
This fails when in mid-sentence - removed it.
This showed up as a list item without a <LI> in front following a nested list.
The 3.8 feature of allowing multiple character sets worked with Macintosh documents containing Windows (ANSI) characters, but not the reverse. The other way around now works.
The first footnote (and sometimes the first paragraph) of a document could have text that preceded the <P> starting tag.
The string in html-trn "FNSep" can be used to specify the text that is inserted between the text of the document and the footnotes.
The string in html-trn "ExtTarget" can be used to specify the text of a TARGET option. This allows external links (links pointing outside of the current document) to be loaded in a full window rather than the old default behavior of loading the reference into the current frame.
In an RTF document, you can link to an external graphics file. When these links exist, the RTF file may contain a copy of the graphic file internally. The default behavior of RTFtoHTML is to ignore the embedded graphic and generate a link to the graphic file. (This can save space if you re-use the graphic multiple times.) You can now set the string "PreferrEmbed" to 1 and the filter will choose the internal RTF copy of the graphic.
Some platforms generate a color table where `red' text has an RGB representation which is close to "FF0000", but not exact. The filter would not recognize this text as `red' and therefore not treat it as a cross-reference to a URL or heading. Now the filter finds the closest match to red in the color table and treats it as red.
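The nearest-match idea can be sketched as follows. The notes do not say how the filter measures "closest", so the squared Euclidean distance in RGB space used here is an assumption, as are the function name and the sample color table.

```python
def closest_to_red(color_table):
    """Return the index of the color-table entry nearest to pure red (FF0000)."""
    if not color_table:
        raise ValueError("color table is empty")
    target = (0xFF, 0x00, 0x00)

    def squared_distance(rgb):
        # Assumed metric: squared Euclidean distance between RGB triples.
        return sum((a - b) ** 2 for a, b in zip(rgb, target))

    return min(range(len(color_table)),
               key=lambda i: squared_distance(color_table[i]))


# Hypothetical color table: black, a slightly-off red, and blue.
table = [(0, 0, 0), (0xFB, 0x02, 0x07), (0, 0, 0xFF)]
red_index = closest_to_red(table)
```

A sketch like this always selects some entry, even in a table with no reddish color at all; whether the filter applies a cut-off in that case is not stated here.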
A bookmark in RTF now generates an anchor tag <A>bookmarkname</A> at the start of the bookmarked text.
Word version 7 used field codes to represent a symbol character. The implementation was incorrect leaving no `result' value in the RTF. Now the filter recognizes this error, and generates the proper field result.
Word allows a style name to be specified as "heading 6,COM 1H,C.head1" which means that the style name is "heading 6" and it can also be referenced by the aliases "COM 1H" and "C.head1". These `alias' style names are now supported by having the .Pmatch table specify any of the allowable aliases.
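As an illustration of the alias convention, the sketch below splits a Word style name on commas and checks a wanted name against the primary name and its aliases. The function names are hypothetical and this is not the filter's actual .Pmatch lookup code.

```python
def parse_style_name(raw: str):
    """Split a Word style name into its primary name and a list of aliases."""
    names = [part.strip() for part in raw.split(",") if part.strip()]
    if not names:
        raise ValueError("empty style name")
    return names[0], names[1:]


def style_matches(raw: str, wanted: str) -> bool:
    """True if `wanted` equals the primary name or any of the aliases."""
    primary, aliases = parse_style_name(raw)
    return wanted == primary or wanted in aliases
```

For the example above, parse_style_name("heading 6,COM 1H,C.head1") yields the primary name "heading 6" with the aliases "COM 1H" and "C.head1".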
Thanks to Alan Flavell, the output translation and conversion has been modified to produce output that conforms to RFC2070/HTML4.0 standards, using, where necessary, Unicode numeric character references (&#bignumber; representations). The html-map.v4 file is the Unicode version of the mappings. These mappings require support from browsers - the default version will work on older browsers (as well as the new ones). The filter ships with a "practical" version of this translation; a more accurate "pedantic" version is available at http://www.physics.gla.ac.uk/r2h-extras/ . The "practical" variants of these files contain various dumbing-down from the "pedantic" versions, in the interest of getting something to be actually displayed on some current browsers (e.g. a normal asterisk in place of the mathasterisk, etc.).
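To make the &#bignumber; notation concrete, here is a small Python sketch that replaces each non-ASCII character with a decimal numeric character reference, plus a "practical"-style fallback step. The fallback table contains a single hypothetical entry modelled on the asterisk example above; it is not taken from the real html-map files.

```python
def to_numeric_references(text: str) -> str:
    """Replace every non-ASCII character with a decimal reference such as &#8727;."""
    return "".join(ch if ord(ch) < 128 else f"&#{ord(ch)};" for ch in text)


# Hypothetical "practical" substitution: U+2217 ASTERISK OPERATOR -> plain "*".
PRACTICAL_FALLBACKS = {"\u2217": "*"}


def to_practical(text: str) -> str:
    """Apply the fallback substitutions first, then emit numeric references."""
    simplified = "".join(PRACTICAL_FALLBACKS.get(ch, ch) for ch in text)
    return to_numeric_references(simplified)
```

Note that this sketch does not escape the HTML special characters &, < and >; a real converter has to handle those separately.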
This allows different translation options to be used for different RTF input (for example, different authors/conventions or output styles). The filter first looks in the directory containing the input file, then in the current directory (UNIX), then in the directory containing the filter itself.
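The search order described above can be sketched in Python as follows. The function name and parameters are hypothetical, and the sketch always checks the current directory, so it does not model the UNIX-only qualification.

```python
from pathlib import Path


def find_translation_file(name, input_file, filter_dir, cwd=None):
    """Return the first existing `name` in the search order, or None."""
    candidates = [
        Path(input_file).resolve().parent,             # 1. directory of the input file
        Path(cwd) if cwd is not None else Path.cwd(),  # 2. current directory
        Path(filter_dir),                              # 3. directory of the filter itself
    ]
    for directory in candidates:
        candidate = directory / name
        if candidate.is_file():
            return candidate
    return None
```

A copy of the file placed next to the input document therefore takes precedence over the copy installed with the filter.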
If an RTF file was created on a Macintosh and then moved to and edited on a PC (or the reverse), the RTF file could contain both character sets. (Macintosh and PC character encodings are different for some characters above decimal 128.) RTFtoHTML now recognizes and correctly interprets both encodings and switches from one to the other correctly.
A bug in footnote processing caused some footnotes to be ignored when following styled text (bold/italic/font changes).
_Literal is a way to allow the user to specify literal HTML markup. The filter adding <P> markup could result in incorrect HTML.
The 3.7 release was never made public. All changes between 3.6 and 3.8 are listed above.
A regression in version 3.5 caused footnotes to be lost. The footnote references appeared, but not the underlying text.
Fields containing non-graphic results were being lost in version 3.5. In earlier versions (3.0.1) fields could cause the remainder of the document to be discarded. Now fields are properly parsed.
When an image was linked into a document AND the image itself was included, the filter would generate a link for each of these. Now the filter ignores the link to the image and uses the embedded version of the image.
When hot text, footnotes or Names were encountered prior to a heading, the heading was incorrectly embedded in the previous markup.
The accent character ` or right quote was being dropped from Headings and Title. There were several other `special' characters that were also dropped. These now appear correctly.
When RTFtoHTML was invoked to translate more than one document, the links/titles for table of contents and index appeared blank after the first document. This was caused by a flag not being re-initialized correctly.
An internal error was causing the first pass to process empty heading paragraphs differently than the second pass. Prior to 3.4 this could result in crashes. In 3.4 the problem was detected and the filter would issue an internal error and exit. The mismatch is now corrected.
The style "Body Text" was added to html-trn and mapped to "Normal".
An error occurred if two headings in a document contained the same leading text, but one had lower case characters and the other had upper case. The file name processing assumed that the filename `SomeHead.rtf' was different than `somehead.rtf'. On Macintosh, DOS and Windows systems this is not true, so the filter would overwrite one file with another. The filter now makes these names unique (even on Unix platforms).
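One way to make names unique regardless of case is to compare them in lower case and append a counter on a clash. The sketch below assumes that approach; the counter scheme, the function name and the default extension are illustrative and may differ from what the filter actually does.

```python
def unique_name(base: str, used: set, ext: str = ".html") -> str:
    """Return a file name that differs from all earlier ones even when case is ignored."""
    candidate = base + ext
    counter = 1
    # Compare lower-cased names so the result is also safe on
    # case-insensitive file systems (Macintosh, DOS, Windows).
    while candidate.lower() in used:
        counter += 1
        candidate = f"{base}{counter}{ext}"
    used.add(candidate.lower())
    return candidate
```

The caller keeps one `used` set per conversion run and passes it to every call, so that headings such as "SomeHead" and "somehead" end up in different files.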
In 3.4 the library function tmpfile() was used to create temporary files. However, in some installations this would attempt to create temp files on a locked or full disk. Now temp files are created in the same directory as the output files (which will be writeable).
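The filter's own code is not shown in these notes, but the same idea can be illustrated in Python with the standard tempfile module by passing the output directory explicitly. The helper name and the prefix/suffix values are made up for the example.

```python
import os
import tempfile


def open_temp_near_output(output_path: str):
    """Open a temporary file in the directory that will hold the output file."""
    out_dir = os.path.dirname(os.path.abspath(output_path))
    # dir= overrides the system default temp location, which may be
    # read-only or full; the file is deleted automatically when closed.
    return tempfile.NamedTemporaryFile(
        mode="w+", dir=out_dir, prefix="r2h", suffix=".tmp"
    )
```

The trade-off is that a failure to create the temp file now reports the same problem that writing the output would have hit anyway.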
Some tables (empty tables, or tables that were exactly the same as a prior table) could cause aborts of the filter.
The Windows 95 and Windows NT binaries are console applications, which means that they can still be run from a DOS prompt. They are 32-bit applications, which should improve performance. They are also distributed with batch files which support drag-n-drop conversions and allow you to specify default conversion options.
The Windows 95 and Windows NT versions of the filter now support long filenames. By default, the long filenames are used, but short filenames can still be forced by the -s command line option.
All of the error and warning messages are emitted to standard error as before, but now are also saved to a .err file. This is particularly useful for DOS/Win/Win95 installations where stderr cannot be re-directed to a file. It is also helpful when converting several documents, since each document gets its own .err file.
In version 3.0.1, converting a document produced by Office '97 would generate errors such as:
ReadStyleSheet: unknown token "\adjustright"
ReadStyleSheet: unknown token "\cgrid"
In some cases the conversion would still be successful, but in others, part of the RTF input would be ignored.
Office '97 documents are now completely processed, with no warning messages.
An often requested feature was to preserve the paragraph numbering on headings (heading 1, ... heading 6 styles). In previous versions, the filter stripped these heading numbers. Now the heading numbers appear in the body of the document as well as the table of contents.
Note that if you use the feature of linking to headings with red text, the linking text must now contain the heading numbers as well.
The HEIGHT and WIDTH tags generated for embedded pictures (<IMG> markup) have several changes:
By setting "SkipDimsOnIMGs" to 1 in the strings table, you can force the filter to not generate HEIGHT and WIDTH tags. This feature depends on the new ".Strings" section of the html-trn file. Sample use is as follows:
.Strings "SkipDimsOnIMGs",1
If an embedded image was enlarged or reduced within the RTF document, the HEIGHT and WIDTH tags are now generated using the scaled sizes. This information will allow the browser to scale the image when it is displayed.
HEIGHT and WIDTH tags for WMF files were incorrectly generated (although this scaling matches how they are treated by Word 6 on the Macintosh). There is now an additional scaling factor (133%) applied to the HEIGHT and WIDTH tags of WMF files. This scaling factor can also be changed by modifying the following line in html-trn:
"WMFAdjust",133
The 133 indicates that WMF HEIGHT and WIDTH tags should be 133% larger than would be generated normally by the filter. This value appears to be correct (from the testing that I have done) and should not have to be modified.
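As a worked example of the scaling, the sketch below treats the "WMFAdjust" value as a percentage multiplier, so that 133 means the dimensions are multiplied by 1.33. That reading, the rounding rule and the function name are assumptions; the filter's exact arithmetic is not documented here.

```python
def wmf_dimensions(width: int, height: int, wmf_adjust: int = 133):
    """Scale WMF image dimensions by a percentage (133 means x1.33)."""

    def scale(value: int) -> int:
        # Integer arithmetic, rounding half up (assumed rounding rule).
        return (value * wmf_adjust + 50) // 100

    return scale(width), scale(height)
```

Under these assumptions a 300 x 200 WMF image would be written with WIDTH=399 and HEIGHT=266, and a value of 100 would leave the dimensions unchanged.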
By default, all translated documents that are split into multiple documents by the -h option (Split Level) have a navigation panel to get from one document to another. This can be disabled by setting "SkipNavPanel" to 1 in the .Strings table of html-trn:
.Strings "SkipNavPanel",1
By default, if any headings are found in a document, there will be an internal table of contents generated at the start of the HTML document. This can be disabled by setting "SkipLeadingToc" to 1 in the .Strings table of html-trn:
.Strings "SkipLeadingToc",1
By default, if any headings are found in a document, there will be an internal table of contents generated at the end of the HTML document. This can be disabled by setting "SkipTrailingToc" to 1 in the .Strings table of html-trn:
.Strings "SkipTrailingToc",1
In 3.0.1, the filter generated "- Title" at the end of document titles for the title page. This was designed for split documents, but also appeared in non-split documents. Now the title for the Title page is simply the title from the RTF document (or the command line using the -T option.)
In 3.0.1 if no title was supplied in the RTF document or on the command line (with the -T option), the filter supplied "No Title" for a default. Now the default title is empty. This allows browsers to substitute the filename for a title. Alternatively, you can set a default title in the .Strings table of html-trn:
.Strings "DefaultTitle","My Default Title"
By default, tables generated by the filter will have borders turned on if the first row of the table contained a border. If you want to override this checking, you can force all tables to be generated with borders by setting "AllBorder" in the .Strings table of html-trn:
.Strings "AllBorder",1
In 3.0.1, if multiple files were dropped on the filter, a fatal error in a document would not correctly terminate all subsequent processing. This could result in invalid conversions, application aborts or freezes. Now a fatal error in a document will terminate processing. You will need to drag-n-drop the remaining files on to the filter.
In version 3.4 and above, automatically numbered footnotes that start at a number other than the default (1) are supported. Note that footnotes that are reset on page boundaries or section boundaries (in the RTF) will continue to increment throughout the document in the HTML version, since there are no page breaks in HTML.
In 3.0.1 a table that was immediately followed by a heading would not generate correct markup. The heading would be incorrectly included in the table.
In Windows and DOS versions, footnotes were accumulated in a file whose name could be too long if the original input file was 8 characters (+3 character extension.) This could result in a variety of conversion time errors. Now a correct temp file name is used.
In 3.0.1, if a heading appeared within a table, the table markup would be incorrectly generated. The table itself could be split (incorrectly) across multiple files. Now, headings found within tables will not be used for file splitting, and will not appear in the table of contents.
In 3.0.1 the font size in the Tmatch table needed to be twice the point size in order to match, thus a 10 point font would require a 20 point Tmatch entry. Now the point size matches correctly.