===============================================================================
DeHTML
for OS/2 ver.1.10
Let's convert HTML documents into normal documents!
===============================================================================
by HAMAGUCHI, Takashi
(c) 1997
EMAIL: htakashi@mse.biglobe.ne.jp
NBC03301@niftyserve.or.jp
Home Page: http://www2d.meshnet.or.jp/~htakashi/index.html
===============================================================================
[* Preface *]
I downloaded some transcripts of "Larry King Live" from the CNN Home Page via
the PC-VAN WWW Direct Gateway Service with World Talk ver.1.4. The HTML
documents are cached on the HDD. We can use these cached documents, but there
are some problems:
(1) The logical lines are too long! Some GREP utilities will hang!!
(2) There are <BR> tags in the body of the text.
Although I got a free program for deleting HTML tags from FINET(NIFTY-Serve),
it could not read many of the HTML documents because the files were too
long. This experience drove me to make my own tool to delete HTML tags.
[* How to process *]
* Delete CR/LF, because line breaks are meaningless in HTML documents. (^_^)
* Replace <BR> with CR/LF. (^_^)
* Delete frequently used tags.
* Convert some escape sequences:
&lt; ==> <
&gt; ==> >
&amp; ==> &
&quot; ==> "
&reg; ==> (R)
&copy; ==> (C)
* Output to files. Files are opened by DeHTML automatically.
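
The steps above can be sketched in Python as follows. This is only an
illustration, not DeHTML's actual source: the function name dehtml, the
regular expressions, and the single-pass entity replacement are all
assumptions. The entity names in the table (&lt;, &gt;, &amp;, &quot;,
&reg;, &copy;) are the standard HTML names assumed to match the conversion
list. DeHTML deletes a list of known tags; the sketch simply strips anything
between "<" and ">".

```python
import re

# Entity table assumed to correspond to the conversion list above.
ENTITIES = {
    "&lt;": "<",
    "&gt;": ">",
    "&amp;": "&",
    "&quot;": '"',
    "&reg;": "(R)",
    "&copy;": "(C)",
}
ENTITY_RE = re.compile("|".join(re.escape(name) for name in ENTITIES))


def dehtml(html: str) -> str:
    """Hypothetical re-creation of the processing steps listed above."""
    # 1. Delete CR/LF: line breaks in the HTML source carry no meaning.
    text = html.replace("\r", "").replace("\n", "")
    # 2. Replace <BR> with CR/LF.
    text = re.sub(r"<br\s*/?>", "\r\n", text, flags=re.IGNORECASE)
    # 3. Delete the remaining tags (a regex stands in for a tag table).
    text = re.sub(r"<[^>]*>", "", text)
    # 4. Convert entities in one pass, so "&amp;lt;" yields "&lt;", not "<".
    return ENTITY_RE.sub(lambda m: ENTITIES[m.group(0)], text)
```

Tags are removed before entities are converted, so a literal "<" produced
from "&lt;" is never mistaken for the start of a tag.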
[* Limitation *]
I have only a few HTML documents to test with, so I can't be sure this
program is perfect.
[* About Character Code Conversion *]
Character Code Conversion is not supported. I'm planning to .....
[* Type of Software *]
* Please help share the cost of reference books on HTML and compilers.
This is not SHAREWARE, but your financial assistance enables me to
develop more convenient versions (maybe...).
NIFTYSERVE SW Number 2632 (300 [Japanese yen] + TAX)
or
Postal Giro (Post Office of Japan)
01130-5-34430 HAMAGUCHI, Takashi
[* Command line Options *]
-e<NUM> : Format the output text with <NUM> chars (default 70) per line.
-o      : Overwrite existing output files.
-c<NUM> : Character code conversion. <NUM> is the sum of:
            +1 : JIS->SHIFT-JIS
            +2 : EUC(Japanese)->SHIFT-JIS
          e.g. dehtml -c3 *.htm (JIS/EUC->SHIFT-JIS)
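
As a rough illustration of what the -c and -e options describe, here is a
minimal Python sketch. It is not DeHTML's implementation: the function names
convert_to_shift_jis and wrap_output are hypothetical, the -c value is assumed
to be a bit mask (1 and 2 added together give 3), and Python's built-in codecs
and textwrap module stand in for whatever DeHTML does internally. Note that
textwrap counts characters, whereas a tool of this era may well count bytes.

```python
import textwrap

JIS_TO_SJIS = 1  # the "+1" value of -c
EUC_TO_SJIS = 2  # the "+2" value of -c


def convert_to_shift_jis(data: bytes, flags: int) -> bytes:
    """Try each enabled source encoding; return data unchanged if none fits."""
    candidates = []
    if flags & JIS_TO_SJIS:
        candidates.append("iso2022_jp")  # JIS (7-bit, escape sequences)
    if flags & EUC_TO_SJIS:
        candidates.append("euc_jp")      # EUC (Japanese)
    for codec in candidates:
        try:
            return data.decode(codec).encode("shift_jis")
        except UnicodeError:
            continue  # wrong guess: try the next enabled encoding
    return data


def wrap_output(text: str, width: int = 70) -> str:
    """Re-wrap each line to at most `width` characters (the -e default is 70)."""
    return "\n".join(textwrap.fill(line, width=width) for line in text.split("\n"))
```

With both flags set (as in -c3), the sketch tries JIS first; EUC input contains
8-bit bytes that are invalid in JIS, so it falls through to the EUC decoder.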
[* USAGE *]
Prompt>dehtml html_files
For example,
C:\>dehtml *.HTM[Enter]
[D000001.HTM]===>>[D:\WTALK\DATA\SV00002\D000001.000] .....done.
[D000002.HTM]===>>[D:\WTALK\DATA\SV00002\D000002.000] .....done.
[D000003.HTM]===>>[D:\WTALK\DATA\SV00002\D000003.000] .....done.
[D000004.HTM]===>>[D:\WTALK\DATA\SV00002\D000004.000] .....done.
[D000005.HTM]===>>[D:\WTALK\DATA\SV00002\D000005.000] .....done.
[D000006.HTM]===>>[D:\WTALK\DATA\SV00002\D000006.000] .....done.
[D000007.HTM]===>>[D:\WTALK\DATA\SV00002\D000007.000] .....done.
[D000008.HTM]===>>[D:\WTALK\DATA\SV00002\D000008.000] .....done.
[D000009.HTM]===>>[D:\WTALK\DATA\SV00002\D000009.000] processing
[* Future Schedule *]
* Support for deleting more tags will be added....
[* History of Updates *]
ver.0.00 1996-01-20 Prototype version
Posted to NIFTY-Serve FENG LIB 4
This version deletes CR/LF, replaces <BR> with CR/LF, and deletes tags
frequently used in "Larry King Live."
ver.0.01 1996-01-28
The size of the read buffer was increased to 5120 bytes.
The number of tags to be deleted was increased.
Bug fix.
ver.0.02 1996-01-30 Test version
Posted to NIFTY-Serve FENG LIB 4 and FINET LIB 3
This version outputs processed data to files.
The number of tags to be deleted was increased.
ver.1.00 1996-01-30 Donationware
Posted to NIFTY-Serve FENG LIB 4
A rudimentary text formatting function is available.
ver.1.01 1996-02-04 Donationware
Posted to NIFTY-Serve FENG LIB 4
Bug fixes plus minor improvements
ver.1.02 1996-02-16 Donationware
Posted to NIFTY-Serve FENG LIB 4
Bug fixes plus minor improvements
ver.1.03 1996-03-15
<pre></pre> supported.
..........
ver.1.06 1996-04-21
<FONT> supported.
<UL> tag -->CR/LF
<A HREF = ...> supported.
JIS->Shift-JIS character code conversion supported.
Bug Fix.
ver.1.08 1996-08-01
<DT><DL><TT><EM> tags are supported.
EUC->Shift-JIS character code conversion supported.
ver.1.08a 1997-01-05
<SMALL><BLINK><HR ...><TH><FRAMESET><FRAME><NOFRAMES><LINK><BASE>
<TEXTAREA> Supported
ver.1.08d 1997-02-01
<STRONG> supported.
ver.1.08d 1997-02-16 OS/2 Version
ver.1.09 1997-04-20
Bug Fix
ver.1.10 1997-06-08
Enhanced tag removal function.