HTMLCon Version 1.8 (June, 1995)
An HTM(L) to ASCII Document Converter
Satore Township
P.O. Box 750836
Petaluma, CA 94975-0836
WWW to http://www.crl.com/~mikekell/index.html
FTP to ftp.crl.com/ftp/users/ro/mikekell/ftp
This program may be distributed freely as long as no
modifications are made to it or this documentation. We
ask that you register this program if you find it useful.
The registration fee of $7.00 (U.S., by check) should be
mailed to Satore Township at the address given above. If
you register this program and provide us with your e-mail
address, we will provide you with the command to eliminate
the registration request screen which appears when the
program is initiated.
E-mail to mikekell@crl.com for comments or suggestions.
About the Program
-----------------
HTMLCon converts HTML/HTM files to standard ASCII files, making them ready
for viewing, editing or printing with standard DOS, OS/2 or Windows tools.
HTMLCon runs under MSDOS or under any program that can provide an MSDOS
session using COMMAND.COM as its command interpreter. After processing the
input document, HTMLCon can display the output in a viewer or editor of
your choice, or print it.
HTMLCon recognizes HTML markup up to and including the HTML+ level as of
this writing. It automatically detects HTML files created in either an
MSDOS or a UNIX environment and processes them correctly. HTMLCon tries to
make the output from the raw HTML file as readable as possible, removing
unwanted formatting wherever practical.
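One common way to tell MSDOS files from UNIX files is by their line
terminators: MSDOS text ends each line with CR LF, UNIX text with LF alone.
The Python sketch below assumes detection works along these lines; it is an
illustration only, and the function names are hypothetical.

```python
from typing import List


def detect_origin(raw: bytes) -> str:
    """Guess the originating system from the line terminators in `raw`.

    Assumption: a file whose newlines are mostly CR LF pairs came from
    MSDOS; anything else (including a file with no newlines) is UNIX.
    """
    crlf = raw.count(b"\r\n")
    bare_lf = raw.count(b"\n") - crlf
    return "MSDOS" if crlf > 0 and crlf >= bare_lf else "UNIX"


def split_lines(raw: bytes) -> List[bytes]:
    """Split on CR LF, LF or CR so both conventions yield the same lines."""
    return raw.splitlines()


if __name__ == "__main__":
    dos = b"<HTML>\r\n<BODY>\r\n"
    unix = b"<HTML>\n<BODY>\n"
    print(detect_origin(dos), detect_origin(unix))  # MSDOS UNIX
    print(split_lines(dos) == split_lines(unix))    # True
```

Because bytes.splitlines() accepts either terminator, the rest of the
conversion can work on the same list of lines whichever system wrote the file.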
A variety of options are available as defined in the control file
(HTMLCON.INI). The control file is necessary for the proper operation
of HTMLCon. This file may be modified with any text editor and is
heavily commented to allow you to set various options.
Installation
------------
Copy HTMLCON.EXE and HTMLCON.INI to a new directory of your choice.
Now set the environment variable "HTMLCON" to point to the directory
where HTMLCON.INI resides. This will allow you to run the program
from any location on your system. For example, if you put HTMLCON.EXE
and HTMLCON.INI in the directory C:\UTILS, use the following command
in your AUTOEXEC.BAT file:
SET HTMLCON=C:\UTILS
Note that the value of HTMLCON must not end with a backslash. If HTMLCon
cannot locate the HTMLCON.INI file, it will still run, but none of the
important directives in HTMLCON.INI will be used: HTMLCon reports the
problem, waits thirty seconds, and then processes the files you have
selected using default values.
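The lookup and fallback described above can be sketched in Python. For
illustration only, the sketch assumes that the file name is appended to the
value of HTMLCON with a single backslash, which would explain the warning
about trailing backslashes (C:\UTILS\ would produce C:\UTILS\\HTMLCON.INI).
The function names and the caller-supplied read_file parser are
hypothetical; only the two documented defaults (linebreak=65, statistics=No)
are filled in.

```python
import os
import time

# Only these two defaults are documented (Default=65, Default=No).
DEFAULTS = {"linebreak": "65", "statistics": "no"}


def find_control_file(environ=None, exists=os.path.isfile):
    """Return the path to HTMLCON.INI, or None if it cannot be found.

    Assumption: the file name is appended to %HTMLCON% with a single
    backslash, so a trailing backslash in the variable would double it.
    """
    environ = os.environ if environ is None else environ
    directory = environ.get("HTMLCON")
    if not directory:
        return None
    path = directory + "\\HTMLCON.INI"
    return path if exists(path) else None


def load_settings(path, read_file, delay=30):
    """Merge the control file over the defaults, or warn, pause and use defaults."""
    settings = dict(DEFAULTS)
    if path is None:
        print("HTMLCON.INI not found; continuing with default values.")
        time.sleep(delay)  # HTMLCon waits thirty seconds before proceeding
        return settings
    settings.update(read_file(path))  # values from the file override defaults
    return settings
```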
The program is now ready to run. Source files may be located in any
directory. Output files will be created in the directory from which
HTMLCon was run. If you are using the optional filter file (HTMLCON.FIL),
it should be located in the same directory as HTMLCON.EXE and HTMLCON.INI.
Operation
---------
HTMLCon can be run in interactive mode by typing "HTMLCon" at the MSDOS
prompt. It can also be run without operator intervention by using one of
the following command line forms:
HTMLCon input_file[.html] line_length output_file[.ASC], or
HTMLCon input_file[.html] output_file[.ASC], or
HTMLCon input_file[.html]
where "line_length" (40 to 200 characters) is the point at which HTMLCon
should try to break each line of the output file. A default line length
can be set in HTMLCON.INI, as shown below. The default file extensions for
both input and output files can be overridden on the command line (as well
as in the HTMLCON.INI file).
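As an illustration, the three command line forms can be told apart by the
number of arguments. The Python sketch below assumes that ".html" and
".ASC" are the default extensions implied by the brackets, that a name given
without an extension receives the default one, and that when no output name
is given the input's base name is reused in the current directory.
parse_args is a hypothetical name, not part of HTMLCon.

```python
import ntpath  # DOS-style path handling on any host


def parse_args(args, in_ext=".html", out_ext=".ASC"):
    """Interpret the documented forms:

        HTMLCon input_file[.html] line_length output_file[.ASC]
        HTMLCon input_file[.html] output_file[.ASC]
        HTMLCon input_file[.html]

    Returns (input_file, line_length, output_file); line_length is None
    when it was not given, meaning the HTMLCON.INI value applies.
    """
    if not 1 <= len(args) <= 3:
        raise ValueError("expected one to three arguments")
    input_file, line_length = args[0], None
    if len(args) == 3:
        line_length = int(args[1])
        if not 40 <= line_length <= 200:
            raise ValueError("line_length must be between 40 and 200")
        output_file = args[2]
    elif len(args) == 2:
        output_file = args[1]
    else:
        # Assumption: reuse the input's base name, in the current directory.
        output_file = ntpath.splitext(ntpath.basename(input_file))[0]
    # Append the default extension to any name given without one.
    if not ntpath.splitext(input_file)[1]:
        input_file += in_ext
    if not ntpath.splitext(output_file)[1]:
        output_file += out_ext
    return input_file, line_length, output_file
```

For example, parse_args(["INDEX"]) yields ("INDEX.html", None, "INDEX.ASC").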
HTMLCon can also process multiple input files. In this mode HTMLCon
automatically assigns the file extension '.ASC' to all output files unless
the default file extension has been changed in the HTMLCON.INI file.
HTMLCon switches to multiple-file mode automatically whenever the input
file name contains a '*' or '?' wildcard.
For example, suppose that HTMLCon resides in the directory "C:\HTMLCON"
and that there are several HTM/HTML files in the directory "C:\HTMLWRIT"
that you wish to process. First, move to the "C:\HTMLCON" directory,
then issue the command "HTMLCON C:\HTMLWRIT\*.html". HTMLCon will
process the files, one-by-one, asking you each time if you wish to
proceed with processing the next file. When asked if you wish to
proceed, you will be given the following options: Y)es (the default), N)o
(no to this file only), Q)uit (quit processing all files), or A)ll
(process all of the remaining files without pausing).
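The multiple-file mode and the Y/N/Q/A prompt described above can be
sketched in Python as follows. The function names are hypothetical, `ask`
stands in for the keyboard prompt, and the sketch assumes the prompt
appears before each file; it only decides which files would be processed.

```python
import glob
import ntpath


def is_multi_file(name):
    """Multiple-file mode is triggered by a '*' or '?' in the input name."""
    return "*" in name or "?" in name


def expand_inputs(name):
    """Expand a wildcard pattern into a sorted list of matching files."""
    return sorted(glob.glob(name)) if is_multi_file(name) else [name]


def output_name(input_path, extension=".ASC"):
    """Output goes to the current directory: base name plus the extension."""
    return ntpath.splitext(ntpath.basename(input_path))[0] + extension


def run_batch(files, ask):
    """Return the files to process, honouring Y)es, N)o, Q)uit and A)ll.

    `ask(path)` returns the user's reply; any reply other than N, Q or A
    (including an empty one) counts as Yes, the default.
    """
    selected, process_all = [], False
    for path in files:
        if not process_all:
            reply = (ask(path) or "Y").strip().upper()[:1]
            if reply == "Q":  # quit processing all remaining files
                break
            if reply == "N":  # skip this file only
                continue
            if reply == "A":  # process the rest without pausing
                process_all = True
        selected.append(path)
    return selected
```

Interactively, the prompt could be supplied as
run_batch(expand_inputs("*.html"), lambda p: input("Process " + p + "? ")).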
HTMLCon can also print processed files. To enable printing, place the
following line in the HTMLCON.INI file:
useprinter=yes
With this directive set, HTMLCon asks after processing each file whether
it should be sent to LPT1. You may respond Y)es or N)o (the default is
YES). If this line does not appear in the HTMLCON.INI file, HTMLCon will
not ask about printing files after they are processed. Please note that
HTMLCon prints only to LPT1 and sends the output file as-is, without any
further processing. If you use this option, HTMLCon assumes that a printer
is connected to LPT1 and that it is working properly.
Images found in the HTM file are output as [IMAGE] and HREF references
as [*]. Forms are noted and marked, as are preformatted text and other
special HTML symbols. Derivatives are ignored except in preformatted text,
unless the special HTMLCON.FIL file is used.
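To illustrate the marker substitution, here is a minimal Python sketch built
on the standard library's html.parser module. It is not HTMLCon's own
method: it simply drops tags, emits [IMAGE] for each <IMG> tag and [*] at
the start of each <A HREF=...> link (the position of the marker is an
assumption), and keeps the text in between.

```python
from html.parser import HTMLParser


class MarkerConverter(HTMLParser):
    """Drop tags, replacing <IMG> with [IMAGE] and <A HREF=...> with [*]."""

    def __init__(self):
        super().__init__()  # character references such as &amp; are decoded
        self.parts = []

    def handle_starttag(self, tag, attrs):
        # Tag and attribute names arrive lower-cased from HTMLParser.
        if tag == "img":
            self.parts.append("[IMAGE]")
        elif tag == "a" and any(name == "href" for name, _ in attrs):
            self.parts.append("[*]")

    def handle_data(self, data):
        self.parts.append(data)


def convert(html):
    converter = MarkerConverter()
    converter.feed(html)
    converter.close()  # flush any buffered text
    return "".join(converter.parts)


if __name__ == "__main__":
    print(convert('<P>See <A HREF="next.html">the next page</A> <IMG SRC="logo.gif"></P>'))
    # See [*]the next page [IMAGE]
```

An anchor without HREF (for example <A NAME="top">) produces no marker in
this sketch, since it is a link target rather than a reference.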
HTMLCon can use a special filter file (HTMLCON.FIL in the default
directory) to translate HTML entities of the user's choice. The filter is
activated by the statement "usefilter=yes" in the HTMLCON.INI file (see
below). Up to 300 such filters may be defined in the HTMLCON.FIL file; see
the sample HTMLCON.FIL file for details. This is an advanced feature and
is not needed for routine HTMLCon use.
Since HTML is evolving continuously, HTMLCon may not recognize certain
symbols properly. Also, since HTML documents vary greatly in how they are
created, it may not be possible to format all output ideally. We intend to
correct such problems in future versions. Please report any problems by
e-mail and include the original HTML document that is not being processed
correctly.
HTMLCon Control File
--------------------
The control file should be named HTMLCON.INI and exist in the same
directory as HTMLCon. Here is a sample, with explanations, of the
control file:
# HTMLCon Initialization File (current through version 1.8)
# ---------------------------------------------------------
#
# ----- ABOUT THE HTMLCON.INI CONTROL FILE -----
#
# Lines beginning with a pound sign are considered comments.
# All other lines are considered instructions and must exactly follow
# the format described in this sample file. Arguments are separated
# by an equal sign (=), which must not be preceded or followed by
# a space or tab.
#
#
# ----- DEFINING THE OUTPUT LINE LENGTH -----
#
# Define the default point at which HTMLCon should attempt to break a
# line for the output file. The break is not guaranteed to occur exactly
# at this point, but as close to it as the syntax of the input line
# allows. Default=65.
#
linebreak=75
#
#
# ----- COLLECTING STATISTICS -----
#
# Statistics can be compiled and written to the output file. Default=No.
# Use of this function does not increase the processing time and it does
# provide some interesting information in the output file.
#
statistics=yes
#
#
# ----- VIEWING OR PROCESSING THE OUTPUT FILE AUTOMATICALLY -----
#
# You may launch another program after HTMLCon finishes its work. This
# may be an ASCII file viewer, editor, or whatever. The launched program
# must be able to take the output file name as an argument. In order to
# accomplish this you must provide the FULL PATH to your program. This
# is a handy function to allow you to automatically and immediately see
# the results of the HTMLCon conversion process.
#
#launchprog=c:\utils\list.com
#
#
# ----- FINDING AND REPLACING THINGS -----
#
# Find and replace: you may specify up to 50 strings to be located in
# the HTML file and replaced in the ASCII output file. These will be a
# direct replacement using the two commands "find=" and "replace=". Each
# "find" element will be replaced by a "replace" element, therefore you
# cannot have a "find=" statement without a following "replace=" statement.
# To specify leading or trailing spaces in a string, surround the string
# with quotation marks ("). The strings cannot exceed 40 characters each.
#
find=" -- "
replace=--
#
# Here is an example replacing all HTMLCon reference symbols [*] with just *.
#
#find=[*]
#replace=*
#
# Or just ignore all references altogether...
#
#find=[*]
#replace=
#
# And replace all HTMLCon image symbols [IMAGE] with a shorter one.
#
#find=[IMAGE]
#replace=[I]
#
# Or just ignore them altogether...
#
#find=[IMAGE]
#replace=
#
# And replace all HTMLCon list/tab markers with two spaces.
#
find=->
replace=" "
#
# Or replace the list/tab markers with something else...
#
#find=->
#replace=|
#
# Or just ignore them altogether...
#
#find=->
#replace=
#
#
# ----- KEEPING THE AUTHOR'S ORIGINAL FORMATTING -----
#
# You may elect to keep the formatting characteristics of the original
# HTML file intact. This will preserve white spaces, line breaks, etc. as
# originally constructed by the author of the HTML page. This option
# will also eliminate the HTMLCon tab markers (->) and replace them with
# four spaces to indicate tab lists. Uncomment the following line to
# preserve the original formatting:
#
#keepformatting=yes
#
#
# ----- IGNORING HTMLCON'S MARKERS IN THE OUTPUT FILE -----
#
# You may choose to have HTMLCon not replace certain HTML constructs
# with its own markers (for example, HTMLCon replaces URL references
# with the symbol [*]). To have HTMLCon simply ignore its own symbols and
# not reference certain items in the original HTML file, uncomment the
# next line:
#
#ignoresymbols=yes
#
#
# ----- PRESERVING HREF MARKERS IN THE OUTPUT FILE -----
#
# You may instruct HTMLCon to preserve all <A HREF...> constructs when
# converting the HTML file. These references will be preserved intact,
# without modification. To use this feature, uncomment the next line:
#
#keephref=yes
#
#
# ----- ELIMINATING ADVERTISEMENTS AND DELAYS -----
#
# Eliminate the advertisements and delays
# [available to registered users only]
#
#
# ----- PRINTING THE OUTPUT FILE ON LPT1 -----
#
# If you would like the option to send the processed file to LPT1
# then uncomment the next line:
#
#useprinter=yes
#
# Note that you may only send the processed file to a line printer
# attached to LPT1 and that HTMLCon assumes the printer is connected
# and operating properly.
#
#
# ----- SPEED PROCESSING MULTIPLE FILES -----
#
# Uncomment the following line to tell HTMLCon to NEVER pause for any
# prompt, including the call to your file viewer or other
# post-processor.
#
#nopause=yes
#
#
# ----- IGNORING CERTAIN FILE TYPES -----
#
# The following directive lists file extensions that should always be
# ignored by HTMLCon. If an input file name contains one of these
# extensions, it will never be processed. Note that each file
# extension must always include the "." in this directive:
#
ignore=.ZIP.EXE.COM.LZH.GIF.LPG.ARC.ASC.SYS.INI.TXT.DOC
#
#
# ----- USING USER-DEFINED FILTERS -----
#
# Uncomment the next directive to have HTMLCon apply a set of filter
# replacements contained in the file HTMLCON.FIL in HTMLCon's default
# directory. This filter file will find and replace HTML ENTITIES
# in your output file.
#
usefilter=yes
#
#
# ----- CHANGING THE DEFAULT OUTPUT FILE NAME EXTENSION -----
#
# HTMLCon normally uses the default file extension ".ASC" when multiple
# files are processed or the file extension is not specified. You may
# specify your own default file extension using the following command.
# This file extension MUST begin with a "." followed by no more than
# three characters.
#
#extension=.TXT
#
#
# End of file
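The rules stated in the sample control file can be checked mechanically.
The Python sketch below is an illustration under stated assumptions, not
HTMLCon's own parser: blank lines are skipped, directive names are compared
in lower case, find/replace pairs are applied in file order, and the
ignore= list is matched against the file extension without regard to case.
All function names are hypothetical.

```python
import ntpath

MAX_PAIRS, MAX_LEN = 50, 40  # documented limits for find/replace


def unquote(value):
    """Remove surrounding quotation marks, keeping spaces inside them."""
    if len(value) >= 2 and value[0] == value[-1] == '"':
        return value[1:-1]
    return value


def parse_control_file(text):
    """Return (settings, replacements) from the text of HTMLCON.INI."""
    settings, pairs, pending = {}, [], None
    for number, line in enumerate(text.splitlines(), start=1):
        line = line.rstrip()  # unquoted trailing spaces are not significant
        if not line or line.startswith("#"):
            continue  # comment (or blank line, by assumption)
        key, sep, value = line.partition("=")
        if not sep:
            raise ValueError(f"line {number}: missing '='")
        if key.endswith((" ", "\t")) or value.startswith((" ", "\t")):
            raise ValueError(f"line {number}: space or tab around '='")
        key = key.lower()
        if key == "find":
            if pending is not None:
                raise ValueError(f"line {number}: find= without replace=")
            pending = unquote(value)
        elif key == "replace":
            if pending is None:
                raise ValueError(f"line {number}: replace= without find=")
            pairs.append((pending, unquote(value)))
            pending = None
        else:
            settings[key] = value
    if pending is not None:
        raise ValueError("last find= has no replace=")
    if len(pairs) > MAX_PAIRS:
        raise ValueError("more than 50 find/replace pairs")
    if any(len(s) > MAX_LEN for pair in pairs for s in pair):
        raise ValueError("find/replace string longer than 40 characters")
    return settings, pairs


def apply_replacements(text, pairs):
    """Replace every occurrence of each find string, in file order."""
    for find, replace in pairs:
        if find:  # an empty find string would match everywhere
            text = text.replace(find, replace)
    return text


def is_ignored(filename, ignore):
    """True if the file's extension appears in an ignore= value."""
    extensions = {"." + ext.upper() for ext in ignore.split(".") if ext}
    return ntpath.splitext(filename)[1].upper() in extensions


def valid_extension(ext):
    """extension= must be a '.' followed by one to three characters."""
    return ext.startswith(".") and 1 <= len(ext) - 1 <= 3
```

Applied to the sample control file above, this sketch would yield the
linebreak, statistics, ignore and usefilter settings together with the two
active find/replace pairs.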