-= HTML2TEXT v1.50 =-
1997-1998 (c) Gavin Spearhead
─────────────────────────────────────────────────────────────────────────────
I. What is it?
HTML2TEXT is a utility that converts HTML files to plain text. Optionally
it also tries to figure out if the HTML file is well-constructed.
II. What's the legal stuff?
All Rights Reserved.
Permission to use, copy, and distribute this software and its
documentation for any purpose and without fee is hereby granted, provided
that the above copyright notice appear in all copies and that both that
copyright notice and this permission notice appear in supporting
documentation, and that the name Gavin Spearhead not be used in
advertising or publicity pertaining to distribution of the software
without specific, written prior permission.
*** DISCLAIMER ***
GAVIN SPEARHEAD DISCLAIMS ALL WARRANTIES WITH REGARD TO THIS SOFTWARE,
INCLUDING ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO
EVENT SHALL GAVIN SPEARHEAD BE LIABLE FOR ANY SPECIAL, INDIRECT OR
CONSEQUENTIAL DAMAGES OR ANY DAMAGES WHATSOEVER RESULTING FROM LOSS OF
USE, DATA OR PROFITS, WHETHER IN AN ACTION OF CONTRACT, NEGLIGENCE OR
OTHER TORTUOUS ACTION, ARISING OUT OF OR IN CONNECTION WITH THE USE OR
PERFORMANCE OF THIS SOFTWARE.
***
III. General information
This documentation is written in HTML in such a way that it remains as
readable as possible in a text viewer, as far as the HTML format allows.
A version converted to plain text is also included.
All my documentation has been written in HTML since about June 1997. Prior
to that date it was written in plain text. This means that a) I can put it
on the web, b) it is readable at all times, c) on all machines that have a
browser, d) it is easily converted to other formats, such as PostScript,
WordPerfect and of course plain text, and e) I can easily produce
formatted documentation.
Any bugs, errors, suggestions, thoughts, ideas, etc. should be sent to the
author; this includes errors in the documentation. Reports of unsupported
HTML tags or entity sequences can also be sent to the author, along with a
description, restrictions and options. No matter how small or important
your help is, I need it to improve this program.
If you want to become a beta tester of this program contact me and I'll
send you the details. Unfortunately I cannot give rewards other than
gratitude.
You are encouraged to register this piece of software. This means that you
will either receive the latest version when it is released or a note that
a new version is released. It also gives me an idea about how many people
use this program and how it's spread. The information you provide to
register will not be used for any purpose other than HTML2TEXT, nor by
anyone other than me.
There are three ways to register:
i. Start your web-browser and fill in the form.
ii. Convert register.htm to a text file, edit it to fill in the
entries and email it to me.
iii. Same as above but print it and send it to my postal address.
Be sure to enter the correct email address on the form! Otherwise I might
not be able to send you the registration key.
Note that registration is Free of charge!
When you're registered you will receive a private registration key, so that
your name is displayed instead of "Unregistered" when you run the program.
However, no additional functions become available in the registered
version. In other words, the unregistered version is not crippled. The key
file will be sent via email if possible. This is currently the only way to
receive the registration key. When you order (see below) you will
automatically be registered and the key can be found on the disk.
IV. Which files are contained in the package?
HTML2TXT.EXE   The executable.
HTML2TXT.CFG   Configuration file with options.
HTML2TXT.INI   Ini-file with entity references.
HTML2TXT.HTM   Documentation for HTML2TEXT in HTML format.
HTML2TXT.TXT   Documentation for HTML2TEXT in plain text format.
REGISTER.HTM   Registration form in HTML format.
LONGFILE.BTM   4DOS batch file to convert HTML files which does support
               Windows 95 long filenames.
If one of the files is missing, throw the package away and ask the author
for a new and complete copy. The address is at the end of the file.
V. What does it do?
HTML2TEXT converts HyperText Markup Language (HTML) files to plain-text
(ASCII) files. The following rules apply:
* The title is optionally displayed on the first line of the output
file. Optionally the complete filename plus path are written too.
* Entity references (&...;) are converted to character sequences
according to the input file or pre-programmed characters.
* Tags are handled according to the HTML specification as well as
possible; note that some tags cannot produce any output in plain ASCII
text files (e.g. blinking, fonts, colours).
* Newlines and tabs are converted to spaces, and redundant whitespace,
including spaces, is removed.
* Lines are written and justified according to the settings; lines that
are too long are wrapped at word boundaries. Word delimiters are
user-definable.
* Tables are reformatted and forms are output so that they can be
filled in after conversion. The output of forms is highly
user-specifiable.
* Warnings are generated on ill-constructed tags or entity references.
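Three of the rules above (entity references, whitespace clean-up and word
wrapping) can be illustrated with a short Python sketch. This is not how
HTML2TEXT itself is implemented: the function names and the default width
of 76 are made up for the example, Python's built-in entity table stands in
for the entity references in the ini-file, and tags, justification, tables
and forms are not handled at all.

```python
import html
import re
import textwrap


def collapse_whitespace(text: str) -> str:
    """Turn newlines and tabs into spaces and squeeze runs of whitespace."""
    return re.sub(r"[ \t\r\n]+", " ", text).strip()


def to_plain_lines(fragment: str, width: int = 76) -> list:
    """Decode entity references, collapse whitespace, then wrap by words.

    `width` is a hypothetical setting; the real program takes its line
    length from its own options.
    """
    decoded = html.unescape(fragment)  # e.g. "&amp;" becomes "&"
    return textwrap.wrap(collapse_whitespace(decoded), width=width)


if __name__ == "__main__":
    for line in to_plain_lines("Fish &amp;\n\tchips, served   daily", width=12):
        print(line)
```

Wrapping only ever breaks at whitespace here; the user-definable word
delimiters mentioned above have no counterpart in this sketch.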
VI. How to start it?
For a quick start, type on the command line:
HTML2TXT file.htm
This will convert the file "file.htm" to "file.txt".
The full syntax of HTML2TEXT is:
HTML2TXT @
is the name of the files to convert, it may include
wildcards (* and ?). More than one file specification may appear on the
command line. Note that long filenames (Windows 95/NT) are not supported.
This means that input filenames must be in the 8.3 format (every Windows
95/NT file has an 8.3 filename and optionally a long filename). The output
will always be an 8.3 filename. 4DOS users (v5.5+) can use the %@SFN[...]
function to get the short filename of a long filename (see the 4DOS
documentation for details). A 4DOS batch file that converts files with
long filenames is also included; see below. Windows 95 users can drag and
drop files onto the executable; Windows then uses the short filename
anyway.
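Whether a name fits the 8.3 format can be checked before it is passed to
the program. The Python sketch below is only an approximation: the helper
name is_8_3 is made up for the example, and the pattern merely checks the
lengths (1 to 8 characters, then optionally a dot and 1 to 3 characters)
while excluding dots, path separators, colons and whitespace. The complete
DOS rules on forbidden characters are not reproduced.

```python
import re

# Simplified 8.3 pattern (an assumption, not the full DOS rule set):
# 1-8 base characters, optionally a dot followed by 1-3 extension characters.
_SHORT_NAME = re.compile(r"[^.\\/:\s]{1,8}(\.[^.\\/:\s]{1,3})?")


def is_8_3(name: str) -> bool:
    """Return True if `name` (without a path) looks like an 8.3 filename."""
    return _SHORT_NAME.fullmatch(name) is not None


if __name__ == "__main__":
    for candidate in ("FILE.HTM", "*.htm", "index.html", "mydocument.htm"):
        print(candidate, is_8_3(candidate))
```

Wildcards are deliberately allowed by the pattern, since a file
specification such as *.htm is valid on the command line.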
@ is followed by the name of a file that contains the names of files to
convert. The listfile may not contain wildcards. Each of these filenames
must appear on a single line. Empty lines are permitted and filenames may
have leading or trailing whitespace characters. The names of these files
have the same restrictions as the files given on the command line. You
cannot use options in a listfile.
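The line rules for a listfile (one name per line, empty lines allowed,
surrounding whitespace ignored) can be sketched in Python as follows. The
function name parse_listfile is hypothetical and the sketch only shows how
such a file could be read or generated by a helper script; it makes no
claim about how HTML2TEXT parses the file internally, and it does not
check for wildcards or 8.3 names.

```python
def parse_listfile(text: str) -> list:
    """Return the filenames found in the contents of a listfile.

    Each non-empty line holds exactly one filename; leading and trailing
    whitespace is dropped and empty lines are skipped.
    """
    names = []
    for raw_line in text.splitlines():
        name = raw_line.strip()
        if name:  # skip empty or whitespace-only lines
            names.append(name)
    return names


if __name__ == "__main__":
    sample = "  INDEX.HTM  \n\n\tABOUT.HTM\n"
    print(parse_listfile(sample))
```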
can be the following:
-a- Do not display text for links.
-b- Do not mark bold text.
-b+ Mark bold text by enclosing it in asterisks (*).
-b: Specifies the two characters used to mark bold text.
Exactly two characters have to be given.
-B- Don't print borders for tables.
-B: Use predefined border , where is
1. 7 bit ASCII (using -|+)
2. Single lines
3. Double lines
4. Double horizontal lines and single vertical lines
5. Single horizontal lines and double vertical lines
6. Single frame only
7. Double frame only
8. Double frame, single cell separators
9. Single frame, double cell separators
For a graphical example of these borders, see the config
file.
-c Automatically create directory when the specified output
path does not exist.
-c- Ask to create when the specified output path does not
exist.
-C: Sets the charset to use. can be any number from 1 to
9. The default is 1, except when Windows is detected (in
enhanced mode, but who cares), in which case the default
is 3.
1 : ASCII (7 bit)
2 : Extended ASCII (8 bit)
3 : Windows ISO 8859/1
4-9: user definable
-f Get input from standard input, then read the files
specified. Input from standard input will be converted and
then output to standard output.
-f- Only read the files specified.
-F- Do not display input fields in forms.
-h Do not display HTML2TEXT messages. (hush)
-h- Do display HTML2TEXT messages.
-H Stop after reading