Package:
This package includes the following files:
HTML.DOT      WinWord 2.0 template file including some
              styles for writing and some macros to handle
              a document as HTML.
RTFTOHTM.DLL  The conversion DLL.
RTFTOHTM.LIB  The import library for the DLL.
RTFTOHTM.H    Public interface of the DLL.
HTMTOOLS.DOC  Some documentation about the converter (this file).
HTMTOOLS.HTM  Documentation converted to .HTM.
RTF2HTM.EXE   QuickWin application for converting files.
RTF2HTM.C     Source for above.
RTF2HTM.MAK   Makefile for above.
README.TXT    Quick document.
There are a few words about installation near
the end of this document.
Changes and corrections from 0.60 to 0.70
- RTFTOHTM.INI added. The .INI file holds some variables, including
the proposed line lengths for the preformatted style and for the rest of the styles.
- The <HTML> tags are now generated.
- Some extra (unneeded) <P> tags are no longer generated.
- The attribute values of <A> and <IMG> tags are now enclosed
in double quotes. Some browsers and editors require this.
Changes and corrections from 0.50 to 0.60
- Some entities have been removed.
- Handling of \par<CRLF>\par<CRLF> is now correct. Thanks
to Robert Thompson for noticing this bug.
- Some internal structures were changed to be more robust.
- The function RTFtoHTM now returns the length of the destination correctly.
Document template
HTML.DOT includes some HTML styles. This document template is not
meant to be a perfect HTML editing system, but I found it quite useful
in combination with the conversion DLL, RTFTOHTM.DLL. Here are some
of the features this software offers.
Document HEADer
You can use the page header to generate the <HEAD> section
of the HTML file. There is no error checking on it, so the feature must
be used correctly by convention. There should be only one header
in the document. You can create one with View_Header/Footer.
In the Header/Footer dialog, check 'Different First Page'
and select Header. This puts the 'headerf' style at the beginning of
the first page.
In the header you should put only TITLE and LINK styles. You can
make a paragraph a TITLE by selecting it and either clicking the T
button in the button bar or selecting the style Title from the style
list in the ribbon. You can put LINKs in the header by selecting text
and setting its style to Link.
If you define a header in the document, the converter creates both HEAD
and BODY sections in the resulting HTML file.
Heading 1 to Heading 6
Select the paragraph to be converted to a heading. It is
enough for the caret to be within the paragraph. Then either click
the E button in the button bar or select the appropriate style
from the style list.
Emphasizing
There are some inline emphasis styles. You can use Bold, Italics,
Underline, and Word Underline through the corresponding WinWord
character formatting. Bold is converted to <B>, Italics
to <I>, Underline to <U>, and Word Underline
to <EM>.
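The mapping above is small enough to express as a lookup table. The following is only a sketch of that idea in C; the names emphasis_map and emphasis_tag are invented for illustration and are not part of RTFTOHTM.DLL.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical table: WinWord emphasis name -> HTML tag name,
   as listed in the text above. */
struct emphasis_map {
    const char *word_format;
    const char *html_tag;
};

static const struct emphasis_map EMPHASIS[] = {
    { "Bold",           "B"  },
    { "Italics",        "I"  },
    { "Underline",      "U"  },
    { "Word Underline", "EM" },
};

/* Returns the HTML tag name for a WinWord emphasis,
   or NULL if the emphasis has no mapping. */
const char *emphasis_tag(const char *word_format)
{
    size_t i;
    for (i = 0; i < sizeof EMPHASIS / sizeof EMPHASIS[0]; i++) {
        if (strcmp(EMPHASIS[i].word_format, word_format) == 0)
            return EMPHASIS[i].html_tag;
    }
    return NULL;
}
```

A caller would wrap the emphasized text in the returned tag, for example <EM>...</EM> for Word Underline.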
Block emphasizing
There are three types of block emphasizing styles:
Preformatted
This is preformatted text, the HTML style <PRE>.
Apply this style by selecting the text and clicking the P button in the button bar.
This style can contain other HTML styles, such as anchors and inline emphasis styles.
The best way to break lines in the preformatted style is to use a LineBreak (Shift-Enter) in
WinWord. See the chapter Installation for more information.
Block Quote
This style is BlockQuote. It is intended for quoting a larger block
of text, which is then emphasized in some way. Apply this style by
selecting the text and clicking the B button in the button bar.
Address
This is the Address style. These styles are presented differently
by different browsers. You can adjust the presentation in WinWord simply
by formatting the style to your own taste. A paragraph written in this
style is aligned right.
Lists
Two types of lists are available: an ordered list, as style Order
List, and a bulleted list, as style Unnum List. You can apply these
to text by selecting the text and clicking the numbered list or bullet
list button in the button bar. Each paragraph within the selected text
becomes a list item. The converter doesn't allow nesting of these styles.
If you want to nest them, you must use the HTML style (below) and write
the markup yourself.
Numbered list
- This is a numbered list item one
- This is item two
- And item three
- Style is Order List
Bulleted list
- This is a bulleted list item one
- This is item two
- And item three
- Style is Unnum List
HTML style
Those <HR> patterns (visible in the original WinWord document)
are horizontal lines, and must be in the HTML style. The HTML style
lets the user embed 'plain' HTML in a Word document. The converter
doesn't touch text in this style in any way. It therefore lets the user
enter HTML tags or styles that this converter doesn't otherwise
support. Here is an example.
Hypertext links
Hypertext links have a specific syntax in this template, but nothing
prevents the user from making his own. A link is composed as follows:
a URL, separated by a space from the text that follows it. The URL part
of the text becomes hidden red, and the button part becomes double-underlined
blue.
http://www.ncsa.uiuc.edu/demoweb/demo.html This is a demo page
of NCSA.
Selecting the whole link, including the button text, and clicking the L
button in the button bar turns the text into the following:
This is a demo page of NCSA.
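As a rough illustration of the syntax described above, the sketch below splits a line at the first space into a URL and a button text and writes an anchor tag with the attribute value in double quotes. The function name make_anchor and the exact output layout are assumptions made for this example; the markup the converter really writes is not shown in this document.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: builds <A HREF="url">button text</A> from
   a string of the form "url button text".
   Returns 0 on success, -1 if the input has no URL or no button text,
   or if the output buffer is too small. */
int make_anchor(const char *src, char *out, size_t outsz)
{
    const char *space = strchr(src, ' ');
    int url_len, n;

    if (space == NULL || space == src || space[1] == '\0')
        return -1;                  /* need both a URL and a button text */

    url_len = (int)(space - src);   /* the URL ends at the first space */
    n = snprintf(out, outsz, "<A HREF=\"%.*s\">%s</A>",
                 url_len, src, space + 1);
    return (n < 0 || (size_t)n >= outsz) ? -1 : 0;
}
```

Called with the NCSA line from the example above, the helper writes the URL into the HREF attribute and uses the rest of the line as the button text.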
You can also define a hypertext link pointing to a specific part
of the current or an external document by defining a name in the link.
Here is an example of that kind of link. The only difference from the 'ordinary'
link is that a position within the destination document is present.
This is a test jump to the anchor marked 'testjump'.
An anchor is made by defining a name and a label. To do this,
select the anchor text and press the D button in the
button bar.
This is a destination for above 'testjump'.
Notice that although I use the SmallCaps style for the name of the anchor,
the case of the characters is preserved in the resulting HTML file.
Inline images
Inline graphics are quite problematic from a WYSIWYG point of view. I
have done it the easy way (for now): an inline image is presented simply
as strikethrough formatting in WinWord. This allows its use as a link button
in a hypertext link.
This is a stand-alone image:
Here is a button image:
An inline image can act as a button. The converter cannot generate the following:
<A HREF=anchor><IMG SRC=jchs.gif> The Author</A>
The button field of a hyperlink can contain either an image or plain text, but not both.
Undoing HTML
The N button in the button bar resets the style of the selected paragraph(s)
back to normal. The U button removes some character formatting
properties from the selected text. It can be used to reset hyperlinks
or image links back to normal.
Saving a file
You can save a file as *.HTM by pressing the S button in the button bar.
A file can be saved as *.RTF by pressing the R button. The saved file gets
the same name as the current WinWord document, with the extension .HTM or
.RTF depending on the save format.
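The naming rule above amounts to swapping the extension of the document name. A minimal sketch in C, assuming DOS-style paths with backslashes; output_name is a hypothetical helper and is not taken from the template macros.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: copies doc into out with its extension replaced
   by ext (for example "HTM" or "RTF"). A dot in a directory name is not
   mistaken for the extension. Returns 0 on success, -1 if out is too small. */
int output_name(const char *doc, const char *ext, char *out, size_t outsz)
{
    const char *slash = strrchr(doc, '\\');
    const char *base = slash ? slash + 1 : doc;   /* file name part only */
    const char *dot = strrchr(base, '.');
    size_t stem = dot ? (size_t)(dot - doc) : strlen(doc);
    int n = snprintf(out, outsz, "%.*s.%s", (int)stem, doc, ext);

    return (n < 0 || (size_t)n >= outsz) ? -1 : 0;
}
```

For a document named HTMTOOLS.DOC, the helper yields HTMTOOLS.HTM for the S button case and HTMTOOLS.RTF for the R button case.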
Fine tuning of the resulting HTML file
The resulting HTML file should be edited with a plain text editor to
clean it up. The WinWord 2.0 RTF filter may insert nasty breaks within anchors,
button text, or inline images if the text exceeds certain memory limits, so they
should be corrected before publishing the file.
Installation
Copy HTML.DOT into the WinWord directory, and RTFTOHTM.DLL somewhere
in the PATH (e.g. WINWORD or WINDOWS).
If you choose to change the .INI settings, the RTFTOHTM.INI file must
be in the same directory as the converter DLL. If this .INI file
doesn't exist, the converter uses internal defaults, which are equivalent
to the following .INI settings:
[System]
ZoneMin=64
ZoneMax=80
PrefMin=0
PrefMax=80
ZoneMin and ZoneMax define the proposed line
break zone for most of the styles in the resulting .HTM file. If either
of these is zero, the converter ignores both variables.
Anchor and Img tags may go over the limits. The converter doesn't
break a line within those tags.
PrefMin and PrefMax define the proposed line
break zone for the preformatted style in the resulting .HTM file. If either
of these is zero, the converter ignores both variables.
Note that the default zone for preformatted is 0...80. This means
that you must explicitly break lines with either <Enter> or <Shift-Enter>.
The line break should be <Shift-Enter> in preformatted.
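The zone rule can be sketched in C as follows. The rule that a zero in either variable disables the zone comes from the text above. Breaking at the last space inside the zone is only an assumption about how a break position might be chosen, and the names break_zone, zone_enabled and find_break are hypothetical.

```c
#include <string.h>

/* Hypothetical pair of .INI values (ZoneMin/ZoneMax or PrefMin/PrefMax). */
struct break_zone {
    int min;
    int max;
};

/* Defaults equivalent to the .INI settings listed above. */
const struct break_zone DEFAULT_ZONE = { 64, 80 };
const struct break_zone DEFAULT_PREF = { 0, 80 };

/* A zone is ignored if either of its variables is zero. */
int zone_enabled(struct break_zone z)
{
    return z.min != 0 && z.max != 0;
}

/* Assumed strategy: if the line is longer than z.max, return the index
   of the last space between columns z.min and z.max. Returns -1 when
   the zone is ignored, the line fits, or the zone holds no space. */
int find_break(const char *line, struct break_zone z)
{
    int len = (int)strlen(line);
    int i;

    if (!zone_enabled(z) || len <= z.max)
        return -1;
    for (i = z.max; i >= z.min; i--) {
        if (line[i] == ' ')
            return i;
    }
    return -1;
}
```

With the default preformatted zone of 0...80, zone_enabled returns 0, so find_break never proposes a break. This matches the note that lines in preformatted text must be broken explicitly.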
RTFTOHTM.DLL
There is one exported function in the converter DLL. The C prototype
is as follows:
long FAR PASCAL RTFtoHTM(LPSTR iname, int ilen, LPSTR oname, int olen);
LPSTR iname; //input buffer or filename.
int ilen; //length of input buffer. If zero, iname is treated as a filename.
LPSTR oname; //output buffer or filename.
int olen; //length of output buffer. If zero, oname is treated as a filename.
The return value is the length (in chars) of the resulting HTML
file or return buffer.
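The two calling modes can be sketched in C as below. This is an illustration only. The wrappers convert_file and convert_buffer and the macro HAVE_RTFTOHTM_DLL are hypothetical, and it is assumed that RTFTOHTM.H declares RTFtoHTM as in the prototype above. Without that macro the sketch compiles against a stand-in that merely copies bytes, so that the calls can be tried without the DLL. The stand-in does not convert anything.

```c
#include <string.h>

#ifdef HAVE_RTFTOHTM_DLL        /* hypothetical: define when linking RTFTOHTM.LIB */
#include <windows.h>
#include "rtftohtm.h"
#else
/* Stand-in so the sketch compiles without the DLL. It is NOT the converter:
   it copies the input buffer to the output buffer and returns the count. */
#define FAR
#define PASCAL
typedef char *LPSTR;
long FAR PASCAL RTFtoHTM(LPSTR iname, int ilen, LPSTR oname, int olen)
{
    int n;
    if (ilen == 0 || olen == 0)
        return 0;               /* file modes are not simulated here */
    n = ilen < olen ? ilen : olen;
    memcpy(oname, iname, (size_t)n);
    return n;
}
#endif

/* File to file: both lengths are zero, so both strings are filenames. */
long convert_file(const char *rtf_path, const char *htm_path)
{
    return RTFtoHTM((LPSTR)rtf_path, 0, (LPSTR)htm_path, 0);
}

/* Buffer to buffer: both lengths are nonzero, so both pointers are buffers. */
long convert_buffer(const char *rtf, int rtf_len, char *html, int html_size)
{
    return RTFtoHTM((LPSTR)rtf, rtf_len, html, html_size);
}
```

Because each length is checked on its own, a filename on one side and a buffer on the other should also be possible, for example a file as input with the HTML returned in a buffer.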
The Basic (or Word macro) declaration is as follows:
Declare Function RTFtoHTM Lib "rtftohtm.dll" (iname$, ilen As Integer, oname$, olen As Integer) As Integer
The meanings of the parameters are the same as above. If the function
is to be used with Visual Basic, all the parameters must be declared
ByVal.
Name: Jorma Hartikka, Senior Programmer Analyst
Org: VTKK Government Systems ltd/Information Systems
Email: Jorma.Hartikka@csc.fi