2002-06-06
PMStripper
I. Overview:
This PM shareware utility strips HTML codes from Web pages, leaving
only the text and, optionally, the URLs. Some of the page's formatting
is retained, but since PMStripper is not an HTML interpreter most
formatting is lost. Although the layout of tables and lists is lost
during stripping, their contents are placed on separate lines for
legibility.
PMStripper is designed to provide a quick conversion of HTML coded
files into plain ASCII text. Although the converted files can be
edited while loaded in PMStripper, only simple edit commands are
available. Therefore, if extensive editing is needed, the text
should be loaded into a more capable word processor or text editor.
The registered version offers a menu item to easily move stripped
files to programs suited for advanced editing.
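To illustrate the basic idea of stripping (keep the text, drop the markup, and start block-level elements on a new line), here is a minimal Python sketch using the standard html.parser module. It is not PMStripper's code: the BREAK_TAGS set, the TextStripper class and the blank-line collapsing are assumptions chosen for the example.

```python
from html.parser import HTMLParser

# Assumed set of tags that start or end a line in the output; PMStripper's
# own list is not documented.
BREAK_TAGS = {"br", "p", "div", "li", "tr", "td", "th", "title",
              "h1", "h2", "h3", "h4", "h5", "h6"}

class TextStripper(HTMLParser):
    """Collect the text of an HTML page, dropping all markup."""

    def __init__(self):
        super().__init__(convert_charrefs=True)  # resolve &amp; etc.
        self.parts = []
        self.skip = 0  # depth inside <script>/<style>, whose text is dropped

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self.skip += 1
        elif tag in BREAK_TAGS:
            self.parts.append("\n")

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self.skip:
            self.skip -= 1
        elif tag in BREAK_TAGS:
            self.parts.append("\n")

    def handle_data(self, data):
        if not self.skip:
            self.parts.append(data)

def strip_html(source):
    parser = TextStripper()
    parser.feed(source)
    parser.close()
    # Trim each line and collapse runs of blank lines to a single one.
    out = []
    for line in "".join(parser.parts).splitlines():
        line = line.strip()
        if line or (out and out[-1]):
            out.append(line)
    return "\n".join(out).strip()
```

For example, a page with a title, a paragraph and a two-item list comes out as four short text lines separated by single blank lines.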
A convenient way to use PMStripper is to install it as the raw
HTML viewer in the IBM Web Explorer. This makes it easier to
save information from Web pages or cut and paste URLs from Web pages.
PMStripper is a shareware program and if you continue to use the
program you should register it. PMStripper does not have any
code to check on how long the program has been in use, so it is
up to the user to determine a reasonable trial period.
The shareware version of PMStripper is fully functional. Some of
the convenience features are disabled, but they do not affect the
function of the utility. Trying a disabled feature brings up an
unregistered message that requires a user response.
II. Installing PMStripper:
1) Unzip the archive.
2) If REXX is installed: Run the INSTALL.CMD script from an OS/2
command prompt, or by double clicking on the install file's icon.
The script will create a destination directory and transfer program
files to it. Optionally, you may use the unzip directory as the
working directory. In either case the script will create a
PMStripper program object on the desktop and set file associations
for .HTM and .HTML files. Setting associations this way allows
instant loading, and stripping, of saved Web pages by double clicking
their icons.
If the install program cannot create the desired directory, just move
all unzipped files to the working directory before running the
install program.
3) If REXX is not installed: Unzip the archive in the desired
working directory and manually a) create a desktop program object,
and b) set the .HTM and .HTML associations. (See the OS/2
documentation for instructions, if needed.)
III. Files
PMStripper is distributed as a compressed archive. The registered
version is PMSR_xxx.zip and the shareware version is PMS_xxx.zip,
where xxx is the version number. The contents of the archive are
listed in the file named FILES.
IV. Uninstalling PMStripper:
If you find it necessary to remove PMStripper, simply delete the
unzipped files, program object, associations and directory.
PMStripper makes no entries in configuration or initialization files.
V. Using PMStripper
PMStripper is a simple program with only five menu bar items:
1. 'File' offers nine pull-down menu items: 'Open File',
'Reload Source File', 'Reload Source File As Raw HTML', 'Save As',
'Save - No Prompt', 'Save Marked Text To File', 'Hard code word wrap',
'Print On Default Printer' and 'Exit'. All except the Reload,
Save - No Prompt, Hard code word wrap and Print selections behave
in the standard OS/2 manner.
The 'Reload Source File' menu item reloads the current HTML file and
is a handy way to change the strip options and then view the same
file, processed differently.
The 'Reload Source File As Raw HTML' menu item reloads the current
HTML file without stripping the HTML codes. This was added so that
installing PMStripper as the raw HTML viewer does not rob the user
of an easy way to view the raw HTML code.
Picking a file name for 'Save As' is easy: highlight some text
for the name and then click on 'Save As', or simply highlight it and
press Alt+S. If you have not highlighted text for the file
name, the original file's name (with the extension .htm or .html
replaced by .txt) is offered as the default. The option to use
highlighted text is only available in the registered version.
PMStripper warns the user before an existing file is overwritten.
If the file is write protected, an error message is displayed;
otherwise the user is prompted for an 'Ok' or 'Cancel' response.
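The default-name rule and the overwrite check can be sketched in a few lines of Python. This is an illustration, not PMStripper's code; the function names are hypothetical, and appending .txt to names without an .htm/.html extension is an assumption the text does not cover.

```python
import os

def default_save_name(source_path):
    """Offer the source name with .htm/.html replaced by .txt."""
    root, ext = os.path.splitext(source_path)
    if ext.lower() in (".htm", ".html"):
        return root + ".txt"
    return source_path + ".txt"  # assumption: other names just gain .txt

def overwrite_status(path):
    """Classify a save target: 'new', 'read-only' or 'confirm'."""
    if not os.path.exists(path):
        return "new"        # nothing to overwrite, save directly
    if not os.access(path, os.W_OK):
        return "read-only"  # show an error message
    return "confirm"        # ask the user for Ok or Cancel
```

'Save - No Prompt' would use the same default_save_name result without showing the dialog.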
The 'Save - No Prompt' menu item saves the stripped file without
opening a file dialog box. It uses the file name that would have
been offered in the file dialog box that is used in the normal
'Save As' menu selection.
The 'Save Marked Text To File' menu item opens a standard file
dialog box and, after the user has entered a destination file
name, saves the marked text.
The 'Hard code word wrap' menu item adds CR-LF pairs to each line
in the display window to make the current word wrap permanent.
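Making the wrap permanent amounts to breaking each long line at word boundaries and joining the pieces with CR-LF pairs. A minimal Python sketch follows; it assumes a fixed width of 72 columns, whereas PMStripper uses the current width of its display window, and hard_wrap is a hypothetical name.

```python
import textwrap

def hard_wrap(text, width=72):
    """Insert CR-LF pairs at the wrap points so the wrap becomes permanent."""
    out = []
    for line in text.splitlines():
        # textwrap.wrap returns [] for an empty line; keep it as a blank line
        # so paragraph breaks survive.
        out.extend(textwrap.wrap(line, width) or [""])
    return "\r\n".join(out)
```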
The 'Print On Default Printer' menu item sends the stripped file
to the default printer without any special formatting. This method
bypasses the WPS print manager and uses the printer's default font.
Additionally, since word wrap in the PMStripper display window does
not reformat the text, line lengths must be adjusted by the user to
fit the printer. The user can also send the file to a printer by
choosing the 'Save As' menu selection and entering 'lpt1' or 'lpt2'
as the file name.
If the INSTALL.CMD file is used to install PMStripper, the association
for .htm and .html is set so that a double click will load files with
those extensions into PMStripper.
The utility will also load HTML coded files for stripping via drag
and drop of the file's icon onto the PMStripper icon. However,
files cannot be loaded by dropping them onto an open edit window.
2. 'Edit' has six sub-menu items. The first five, 'Cut', 'Copy',
'Paste', 'Select All' and 'Undo Change', operate as expected. The
'Undo Change' selection will undo the last change made to the text in
the window and is only one level deep.
The sixth sub-menu item is 'Select to End', which marks text from the
current cursor position to the end of the file. I added this
function before I found out that it was already part of the MLE class
library that I used for the main PMStripper window.
The following key combinations work in PMStripper and many other OS/2
applications.
Shift+Home moves the cursor to the top of the text
Shift+End moves the cursor to the end of the text
Ctrl+Shift+Home selects the text from the current position to
the top of the text
Ctrl+Shift+End selects the text from the current position to
the end of the text
3. 'Options' has seven sub-menu items. They are 'Display Options',
'URL Settings', 'Strip Options', 'External Editor Settings',
'Filename Settings', 'Use idle time priority' and 'Save Settings'.
'Display Options' has three sub-menu items. They are 'Font',
'Reset to default colors' and 'Word Wrap'. 'Font' brings up
a standard OS/2 font dialog box and will allow the selection of
any of the installed fonts. This option only changes the font
in the main window. To change the font for the menubar or the
information area below the main window, use the OS/2 font palette
and drag and drop a new font on either area. For the main window,
font drag and drop does not always work correctly. The font
selections are only saved when 'Save Settings' is selected.
The 'Reset to default colors' option restores the system default
background and foreground colors. To change the colors on the
menubar, main window, or the information area, use the OS/2 color
palette and drag and drop a color selection. When 'Save Settings'
is selected, the color selections become the default colors.
The 'Word Wrap' option is a toggle that turns word wrap on
or off. Word wrap is turned on each time the application is loaded.
The wrap function does not actually reformat the text; it affects
only the way the text is displayed.
'URL Settings' has three sub-menu items. They are 'Add URLs',
'Leave URLs' and 'Only http type'. These options affect how the HTML
file is processed, and the file must be reloaded for the changes to
take effect on the current file. 'Add URLs' appends the URLs found in
the HTML file to the end of the stripped text. 'Leave URLs' leaves the
URLs found in the HTML file in the stripped text. 'Only http type'
limits the URLs to links containing an http reference. The
"normal" URL detection looks for HTML code containing href and will
find gopher, ftp, mailto, and relative links to other Web pages as
well as complete URLs.
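A rough Python sketch of href-based URL detection and the 'Add URLs' behavior is shown below. The regular expression only handles quoted href values, and treating "http type" as a link that starts with http:// or https:// is an assumption; HREF_RE, find_urls and add_urls are hypothetical names, not part of PMStripper.

```python
import re

# Quoted href values, case-insensitive; an approximation of "HTML code
# containing href", not PMStripper's actual parser.
HREF_RE = re.compile(r"""href\s*=\s*["']([^"']+)["']""", re.IGNORECASE)

def find_urls(html, only_http=False):
    """Return every href target, optionally keeping only http links."""
    urls = HREF_RE.findall(html)
    if only_http:
        # Assumption: an "http reference" is a link starting with http(s)://
        urls = [u for u in urls if u.lower().startswith(("http://", "https://"))]
    return urls

def add_urls(stripped, html, only_http=False):
    """'Add URLs': append the links found in the source to the stripped text."""
    urls = find_urls(html, only_http)
    return stripped + ("\n\n" + "\n".join(urls) if urls else "")
```

Without only_http, relative links such as page2.htm and ftp or mailto links are kept, matching the "normal" detection described above.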
'Strip Options' has seven sub-menu items: 'Ignore <BR>', 'Ignore
cr-lf', 'Translate quotes', 'Translate iso8859-1 character codes' and
the three line-discard options described below.
The first two selections are mutually exclusive. These options are
useful when the stripped output has excessive blank lines. This
often occurs in poetry published on the Web, since many poems are
formatted with both carriage return - line feed (cr-lf) pairs and the
HTML code <BR>, which prevents text reformatting by the browser.
PMStripper normally translates <BR> into a cr-lf pair, thereby
producing unnecessary blank lines. These two menu items strip either
the cr-lf pairs OR the <BR> codes from the text before any other
actions are performed. The results of using either option should be
similar, but one method may produce better results depending on how
the text was originally formatted. Try one, reload with Alt+R, and
switch to the other if the result is not satisfactory.
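The pre-processing step for these two options can be sketched in Python as follows. This is an illustration under assumptions: BR_RE and preprocess are hypothetical names, the pattern also accepts forms such as <br/> and <BR CLEAR=ALL>, and cr-lf pairs are replaced by a space (rather than removed outright) so that words on adjacent source lines do not run together.

```python
import re

# <BR>, <br/>, <br clear="all"> and similar, in any letter case.
BR_RE = re.compile(r"<br\b[^>]*>", re.IGNORECASE)

def preprocess(html, ignore_br=False, ignore_crlf=False):
    """Remove either <BR> codes or cr-lf pairs before any other processing."""
    if ignore_br and ignore_crlf:
        raise ValueError("'Ignore <BR>' and 'Ignore cr-lf' are mutually exclusive")
    if ignore_br:
        return BR_RE.sub("", html)
    if ignore_crlf:
        # Assumption: a space keeps words on adjacent lines apart.
        return html.replace("\r\n", " ")
    return html
```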
The 'Translate quotes' option translates the "smart quotes" used on
some Web pages into standard ASCII (0x93 and 0x94 are changed
to 0x22). The "smart apostrophes" are translated to standard ASCII (0x91
and 0x92 are changed to 0x22). The two "special hyphen" characters are
translated to standard ASCII (0x96 and 0x97 are changed to 0x2d). The
0x85 character is translated into 3 periods (0x2e) to approximate an
ellipsis character. In addition, the 0xA0 and 0x99 characters are
each translated to a space. The translation is done before any HTML
character entities are translated, so this option should not affect
languages that use those characters as part of their normal text.
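Because the translation works on raw byte values before entities are resolved, it is essentially a byte-for-bytes lookup table. The Python sketch below encodes the mapping exactly as listed in the paragraph above; QUOTE_MAP and translate_quotes are hypothetical names, and this is not PMStripper's implementation.

```python
# Raw byte -> replacement, following the values listed in the text.
QUOTE_MAP = {
    0x93: b'"', 0x94: b'"',   # smart double quotes -> 0x22
    0x91: b'"', 0x92: b'"',   # smart apostrophes -> 0x22, as listed above
    0x96: b"-", 0x97: b"-",   # "special hyphens" -> 0x2d
    0x85: b"...",             # ellipsis -> three periods
    0xA0: b" ", 0x99: b" ",   # each becomes a space
}

def translate_quotes(data):
    """Apply the table to raw bytes; entities such as &#147; are untouched."""
    return b"".join(QUOTE_MAP.get(b, bytes([b])) for b in data)
```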
The 'Translate iso8859-1 character codes' option translates the upper
characters (decimal 128 through 255) of the iso8859-1 character set into
the appropriate HTML character entities. The translation is done before
any HTML character entities are translated. This option should be used
if the stripped text contains the wrong international characters; it
is unlikely to be helpful on English text.
The 'Translate quotes' and 'Translate iso8859-1 character codes'
options are mutually exclusive.
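In ISO 8859-1 each byte value equals its Unicode code point, so the translation can look up a named entity for every byte from 128 to 255. The Python sketch below assumes named entities where HTML defines one and numeric references (&#nnn;) otherwise; bytes 128 to 159 have no named entity. latin1_to_entities is a hypothetical name.

```python
from html.entities import codepoint2name

def latin1_to_entities(data):
    """Turn ISO 8859-1 bytes 128-255 into HTML entities, so the later
    entity pass can map them to the display code page."""
    out = []
    for b in data:
        if b < 128:
            out.append(chr(b))
        elif b in codepoint2name:
            out.append("&%s;" % codepoint2name[b])  # e.g. 0xE9 -> &eacute;
        else:
            out.append("&#%d;" % b)  # 128-159: numeric reference
    return "".join(out)
```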
The last three options discard 100, 200, or 300 lines of the raw
HTML source, starting after line 10, before stripping. These options
are mutually exclusive. The keyboard accelerators for these options
work differently from the menu selections: pressing Alt+1, Alt+2,
Alt+3 or Alt+C sets the option and then reloads and strips the source
file. The state of these three options is not saved when the
'Save Settings' option is selected.
The first 10 lines are never discarded because the page title is
found within them.
When these options are set from the menu, the file must be reloaded
(via Alt+R) for the changes to take effect on the current file.
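The discard rule is simple list slicing: keep lines 1 to 10, skip the next N lines, keep the rest. A short Python sketch, with discard_lines and KEEP_HEAD as hypothetical names:

```python
KEEP_HEAD = 10  # the page title is expected within these lines

def discard_lines(source, count):
    """Drop `count` raw HTML lines that follow the first 10 lines."""
    if count not in (100, 200, 300):
        raise ValueError("count must be 100, 200 or 300")
    lines = source.splitlines(keepends=True)
    return "".join(lines[:KEEP_HEAD] + lines[KEEP_HEAD + count:])
```

For a 400-line source, discard_lines(src, 100) keeps lines 1 to 10 and 111 to 400.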
'External Editor Settings' has two sub-menu items. They are 'Use
__TMP2__ File' and 'Use Clipboard'. 'Use __TMP2__ File' causes the
temporary file __TMP2__ to be left in the current working directory
for use by an external editor. 'Use Clipboard' causes the stripped
file to be copied to the OS/2 clipboard when the user selects 'Exit
to Word Processor'. These option settings are only effective in the
registered version.
'Filename Settings' has seven sub-menu items. They are 'Replace Space
with Underscore Character', 'Leave Space in Filename',
'Enter Default Save Path', 'Enable Use of Default Save Path',
'Enter Default Load Path', 'Enable Use of Default Load Path',
and 'Enter Default Save Extent'. The first two items are mutually
exclusive toggles. They determine how the highlighted text is
converted to a destination file name for the stripped HTML file.
The following option settings are only effective in the registered
version. 'Enter Default Save Path' and 'Enter Default Load Path'
bring up dialog boxes that allow the user to enter paths for saving
and loading files.
'Enable Use of Default Save Path' and 'Enable Use of Default Load Path'
are toggles that enable the use of the default paths. These toggles
allow the user to disable the default paths without clearing out the
path information. 'Enter Default Save Extent' brings up a dialog box
that allows the user to specify a default file extension for the
stripped HTML file when it is saved to disk.
Note: Do not include the period in the extension.
'Use idle time priority' reduces the priority of the thread that
processes the source file. Using this option reduces the impact
PMStripper has on other tasks that are active. Users with faster
CPUs may not notice any difference when this option is selected.
'Save Settings' saves all of the option settings to an INI file named
PMSTRIP.INI. The file is only created when 'Save Settings' is
selected. Word wrap is always turned back on when the utility is loaded.
If you wish, you can add an environment variable to your config.sys
file; PMStripper will use it to determine where PMSTRIP.INI is located
if the file is not found in the working directory.
The environment variable is specified in your config.sys file as:
SET PMSTRIPPER=C:\YOURPATH
Change C:\YOURPATH to the location of PMStripper, or to the drive and
directory where you want to keep the PMSTRIP.INI file.
The install routine does not add this line to your config.sys.
NOTE: When PMStripper is activated by dropping the icon of an HTML
file onto that of PMStripper, the location of the HTML file becomes
the current working directory. PMStripper will look for its INI file
in that directory before checking the location specified in the
config.sys file. This is convenient for those who may want several
INI files, each with different attributes, according to the location
of the source HTML file.
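The search order for the INI file (working directory first, then the directory named by PMSTRIPPER) can be sketched in Python as below. locate_ini is a hypothetical helper, and its cwd and environ parameters exist only to make the sketch testable; by default it uses the real working directory and environment.

```python
import os

INI_NAME = "PMSTRIP.INI"

def locate_ini(cwd=None, environ=None):
    """Return the path of an existing PMSTRIP.INI, or None.
    The working directory is checked first, then %PMSTRIPPER%."""
    cwd = os.getcwd() if cwd is None else cwd
    environ = os.environ if environ is None else environ
    candidates = [os.path.join(cwd, INI_NAME)]
    env_dir = environ.get("PMSTRIPPER")
    if env_dir:
        candidates.append(os.path.join(env_dir, INI_NAME))
    for path in candidates:
        if os.path.isfile(path):
            return path
    return None
```

Because the working directory wins, dropping an HTML file from a folder that holds its own PMSTRIP.INI selects that folder's settings, as the note above describes.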
4. 'Exit' has two sub-menu items. They are 'Exit' and 'Exit to Word
Processor'. 'Exit' causes the stripped file to be discarded and
PMStripper to close. 'Exit to Word Processor' causes the OS/2 CMD
file PMS_CMD.CMD to be executed and PMStripper to close. The 'Exit
to Word Processor' option is only effective in the registered
version.
5. 'About' displays copyright and contact information.
VI. The active keyboard accelerators (shortcut keys) are:
   Exit                                    Alt+X
   Copy                                    Ctrl+Insert
   Cut                                     Shift+Delete
   Paste                                   Shift+Insert
   Select All                              Ctrl+/
   Open File                               Alt+F
   Print On Default Printer                Alt+P
   Reload File                             Alt+R
   Reload Source File As Raw HTML          Ctrl+R
   Save As                                 Alt+S
   Save - No Prompt                        Ctrl+S
   Save Marked Text To File                F9
   Undo Change                             Alt+U
   Word Processor                          Alt+W
   Discard first 100 lines of raw source   Alt+1
     file after line 10 and reload
   Discard first 200 lines of raw source   Alt+2
     file after line 10 and reload
   Discard first 300 lines of raw source   Alt+3 or Alt+C
     file after line 10 and reload
   Mark text from the current cursor       Ctrl+E
     position to the end of text
The keyboard accelerators are not case sensitive.
VII. Miscellaneous Notes:
When dragging a file from Web Explorer the file must be dropped on the
desktop (or in a folder) before it can be dropped on the PMStripper
program object.
This utility will only run on OS/2 Warp and later releases.
One useful feature is the ability to mark text in the stripped file
and use the highlighted text as the file's 'Save As' name. This is
especially useful on HPFS formatted drives, which support long file
names. NOTE: Spaces and some punctuation characters are converted to
"_" characters in the file name unless 'Leave Space in Filename' is
selected, in which case those characters are converted to spaces
instead. The "/" and "\" characters are deleted and not replaced.
This feature is only activated in the registered version of PMStripper.
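The conversion of highlighted text into a file name might look like the Python sketch below. The text only says "some punctuation characters", so the CONVERTED set is a hypothetical choice; name_from_selection is likewise a hypothetical name. The extension argument stands in for the 'Enter Default Save Extent' setting, which is entered without a period.

```python
# Hypothetical set: the documentation only mentions "some punctuation".
CONVERTED = set(' :*?"<>|')
DELETED = set("/\\")  # deleted and not replaced

def name_from_selection(selected, use_spaces=False, extension="txt"):
    """Turn highlighted text into a 'Save As' file name."""
    fill = " " if use_spaces else "_"
    chars = []
    for ch in selected.strip():
        if ch in DELETED:
            continue
        chars.append(fill if ch in CONVERTED else ch)
    # The default extension is stored without a period, so add one here.
    return "".join(chars) + "." + extension
```

For example, highlighting "OS/2 News: June" gives OS2_News__June.txt, or "OS2 News  June.txt" with spaces left in.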
The HTML specification defines "Character Entity Sets" or tags to
represent particular graphic characters which have special meanings
in the markup language, or may not be part of the character set
available to the writer. PMStripper does not scan for all possible
tags, but does try to resolve the most common.
This version of PMStripper supports code pages 437 and 850; if code
page 850 is in use, the 850 character set is used. The code pages
only make a difference when &xxxx; tags are present in the file.
If the correct character or an acceptable alternate is not available,
a space character is used. If the tag is unknown to PMStripper, the
&xxxx; tag is left in the file.
Registered users who frequently encounter particular &xxxx; and &#nnn;
tags should contact the author so that the tags may be included in the
next release.
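The rules above (known entity mapped through the active code page, space when the code page lacks the character, unknown entity left alone) can be sketched with Python's cp437 and cp850 codecs and the html.entities table. This is a sketch, not PMStripper's code: resolve_entities and ENTITY_RE are hypothetical names, numeric &#nnn; handling is an assumption, and no "acceptable alternate" substitution is attempted.

```python
import re
from html.entities import name2codepoint

ENTITY_RE = re.compile(r"&(#\d+|[A-Za-z][A-Za-z0-9]*);")

def resolve_entities(text, codepage="cp437"):
    """Replace &name; and &#nnn; with characters from the display code page."""
    def repl(m):
        ref = m.group(1)
        if ref.startswith("#"):
            cp = int(ref[1:])
        elif ref in name2codepoint:
            cp = name2codepoint[ref]
        else:
            return m.group(0)  # unknown entity: leave it in the text
        try:
            chr(cp).encode(codepage)  # is the character in this code page?
        except (UnicodeEncodeError, ValueError, OverflowError):
            return " "  # not available: use a space
        return chr(cp)
    return ENTITY_RE.sub(repl, text)
```

The copyright sign, for instance, exists in code page 850 but not in 437, so &copy; becomes a space only under 437.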
VIII. Why & How to Register:
Registered users feel good about supporting OS/2 developers and enjoy
these additional benefits:
Registered users of PMStripper will have access to two additional
executable files that were compiled with the 486 and 586 compiler
options. These versions will offer some performance improvement
for users with 486, Pentium, or equivalent CPUs.
The Word Processor option runs the PMS_CMD.CMD file located in the
working directory specified in the Program Object. This file is used
to start the word processor or editor of your choice to edit the
stripped text file named __TMP2__ or to allow you to paste the
stripped file into your editor. PMStripper will close after
executing the PMS_CMD.CMD file.
NOTE: The __TMP2__ file is discarded if PMStripper is closed via
the 'Exit' menu item. Double clicking PMStripper's upper left
corner, using Alt+F4 or selecting that menu's 'Close' may cause the
temporary stripped file (named __TMP2__ ) to remain in the working
directory.
This menu item is disabled in the unregistered version. Instead of
invoking the command script an unregistered message requiring a user
response will be shown.
Example PMS_CMD.CMD files:
To use the system editor E.EXE, the PMS_CMD.CMD file would contain:
E __TMP2__
To use a word processor or editor whose executable is not in the
path, the command script must copy the __TMP2__ file to the desired
program's data directory, change to that directory and then launch
the word processor/editor. An example PMS_CMD.CMD file to use
DeScribe is shown below.
copy __TMP2__ g:\describe\__TMP2__
g:
cd \describe
describe __TMP2__
In addition to the activation of the Word Processor option, the
opening unregistered message requiring a user response is eliminated
along with the unregistered line that is inserted at the top of the
stripped file.
Registered users are supported via e-mail. Send help requests and
program suggestions to me at dwhawk@intcon.net
There are two ways to register PMStripper: through BMT Micro or
directly with the author.
Registration through BMT Micro:
BMT Micro will accept credit cards and will be more convenient for
OS/2 users outside the United States. BMT Micro's price to register
PMStripper is $9.95 (US Dollars). BMT Micro also has an FTP area
where the registered version can be obtained after registration.
Direct registration:
Stuff small bills, gold coins, diamonds or even checks (US banks
only, please) valued at $9.95 (US dollars) into an envelope and mail
to:
Don Hawkinson
4555 N Hillcrest
Wichita, KS 67220-3832
USA
PayPal registration is also available on the author's web site
at http://www.cottagesoft.com/~dwhawk/share.html
Please don't send $100 bills (or larger) in the mail without
purchasing full postal insurance. Also, no change will be
returned because it is absolutely unsafe, and unwise, to send cash
through the mail.
The registered version of PMStripper will be distributed by
download from my web page, so make certain that your e-mail
address is included with your registration fee.
Registered users will be notified of updates via e-mail.
Registration covers all 1.xx versions of PMStripper.
Copyrights and trademarks remain the property of their owners.
Don Hawkinson
dwhawk@intcon.net