-
- PMStripper 1.17
-
-
- I. Overview:
-
-
- This PM shareware utility strips HTML codes from Web pages, leaving
- only the text and, optionally, the URLs. Some of the page's formatting
- is retained, but since PMStripper is not an HTML interpreter, most
- formatting is lost. Although the layout of tables and lists is lost
- during stripping, their data is placed on separate lines for legibility.
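-
- The following is a minimal Python sketch of this kind of stripping,
- built on the standard html.parser module. It is not PMStripper's code.
- The set of tags that start a new line (BREAK_TAGS) and the choice to
- drop script and style contents are assumptions. Unlike PMStripper, this
- sketch lets the parser decode character entities itself.
-
```python
from html.parser import HTMLParser

# Assumption: tags that start a new line in the output. PMStripper's
# actual rules are not documented.
BREAK_TAGS = {"br", "p", "div", "tr", "td", "th", "li",
              "h1", "h2", "h3", "h4", "h5", "h6"}
SKIP_TAGS = {"script", "style"}


class TextExtractor(HTMLParser):
    """Collects the text of a page, dropping all markup."""

    def __init__(self):
        # convert_charrefs=True makes the parser decode entities like &amp;
        super().__init__(convert_charrefs=True)
        self.parts = []
        self.skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self.skip_depth += 1
        elif tag in BREAK_TAGS:
            self.parts.append("\n")

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS and self.skip_depth:
            self.skip_depth -= 1

    def handle_data(self, data):
        if not self.skip_depth:
            self.parts.append(data)


def strip_html(html_text):
    """Return the page text with table cells and list items on separate lines."""
    parser = TextExtractor()
    parser.feed(html_text)
    parser.close()
    # Trim each line and drop the empty ones left behind by the markup.
    lines = (line.strip() for line in "".join(parser.parts).splitlines())
    return "\n".join(line for line in lines if line)
```
-
- For example, a two-cell table row comes out as two lines of text
- rather than one.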
-
- PMStripper is designed to provide a quick conversion of HTML coded
- files into plain ASCII text. Although the converted files can be
- edited while loaded in PMStripper, only simple edit commands are
- available. Therefore, if extensive editing is needed, the text
- should be loaded into a more capable word processor or text editor.
-
- The registered version offers a menu item to easily move stripped
- files to programs suited for advanced editing.
-
- A convenient way to use PMStripper is to install it as the raw
- HTML viewer in the IBM Web Explorer. This makes it easier to
- save information from Web pages or cut and paste URLs from Web pages.
-
- PMStripper is a shareware program; if you continue to use the
- program, you should register it. PMStripper does not have any
- code to check how long the program has been in use, so it is
- up to the user to determine a reasonable trial period.
-
- The shareware version of PMStripper is fully functional. Some of
- the convenience features are disabled, but this does not affect the
- function of the utility. Trying a disabled feature will bring
- up an unregistered message requiring a user response.
-
-
- II. Installing PMStripper:
-
-
- 1) Unzip the archive.
-
- 2) If REXX is installed: Run the INSTALL.CMD script from an OS/2
- command prompt, or double-click the install file's icon.
-
- The script will create a destination directory and transfer program
- files to it. Optionally, you may use the unzip directory as the
- working directory. In either case the script will create a
- PMStripper program object on the desktop and set file associations
- for .HTM and .HTML files. Setting associations this way allows
- instant loading, and stripping, of saved Web pages by double clicking
- their icons.
-
- If the install program cannot create the desired directory, just move
- all unzipped files to the working directory before running the
- install program.
-
- 3) If REXX is not installed: Unzip the archive in the desired
- working directory and manually a) create a desktop program object,
- and b) set the .HTM and .HTML associations. (See the OS/2
- documentation for instructions, if needed.)
-
-
- III. Files
-
-
- PMStripper is distributed as a compressed archive. The registered
- version is PMSR_xxx.zip and the shareware version is PMS_xxx.zip,
- where xxx is the version number. The contents of the archive are
- listed in the file named FILES.
-
-
- IV. Uninstalling PMStripper:
-
-
- If you find it necessary to remove PMStripper, simply delete the
- unzipped files, program object, associations and directory.
- PMStripper makes no entries in configuration or initialization files.
-
-
-
- V. Using PMStripper
-
-
- PMStripper is a simple program with only five menu bar items:
-
- 1. 'File' offers seven pull-down menu items: 'Open File', 'Reload
- Source File', 'Reload Source File As Raw HTML', 'Save As',
- 'Save - No Prompt', 'Print On Default Printer' and 'Exit'. All
- except the Reload, Save - No Prompt, and Print selections perform
- in a standard OS/2 manner.
-
- The 'Reload Source File' menu item reloads the current HTML file and
- is a handy way to change the strip options and then view the
- same HTML, processed differently.
-
- The 'Reload Source File As Raw HTML' menu item reloads the current
- HTML file without stripping the HTML codes. This was added so that
- installing PMStripper as the raw HTML viewer in Web Explorer does
- not rob the user of a way to view the raw HTML code.
-
- Picking a file name for 'Save As' is easy: highlight some text
- for the name and then click on 'Save As', or simply highlight the text
- and then press Alt+S. If you have not highlighted text for the file
- name, the original file's name (with the extension .htm or .html
- replaced by .txt) is offered as the default. The option to use
- highlighted text is only available in the registered version. If
- you are about to overwrite an existing file, PMStripper warns you:
- if the file is write protected, an error message is displayed;
- otherwise, you are prompted for an 'Ok' or 'Cancel' response.
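-
- The default-name rule and the overwrite check described above can be
- sketched in Python as follows. The function names are hypothetical.
- ntpath is used so that OS/2-style backslash paths split correctly on
- any platform. Appending .txt to names that have no .htm or .html
- extension is an assumption.
-
```python
import ntpath
import os


def default_save_name(source_path):
    """Swap a trailing .htm/.html extension (any case) for .txt."""
    root, ext = ntpath.splitext(source_path)
    if ext.lower() in (".htm", ".html"):
        return root + ".txt"
    # Assumption: other names simply get .txt appended.
    return source_path + ".txt"


def overwrite_status(path):
    """Classify a save target as 'new', 'write-protected' or 'confirm'."""
    if not os.path.exists(path):
        return "new"              # nothing to overwrite
    if not os.access(path, os.W_OK):
        return "write-protected"  # show an error message
    return "confirm"              # ask the user for Ok or Cancel
```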
-
- The 'Save - No Prompt' menu item saves the stripped file without
- opening a file dialog box. It uses the file name that would have
- been offered in the file dialog box that is used in the normal
- 'Save As' menu selection.
-
- The 'Print On Default Printer' menu item sends the stripped file
- to the default printer without any special formatting. This method
- bypasses the WPS print manager and uses the printer's default font.
- Additionally, since word wrap in the PMStripper display window does
- not reformat the text, line lengths must be adjusted by the user to
- fit the printer. You can also send the file to a printer by
- selecting the 'Save As' menu item and entering 'lpt1' or 'lpt2'
- as the file name.
-
- If the INSTALL.CMD file is used to install PMStripper, the association
- for .htm and .html is set so that a double click will load files with
- those extensions into PMStripper.
-
- The utility will also load HTML coded files for stripping via drag
- and drop of a file's icon onto the PMStripper icon. However, files
- cannot be loaded by dropping them onto an open edit window.
-
-
- 2. 'Edit' has five sub-menu items which also operate as expected.
- They are 'Cut', 'Copy', 'Paste', 'Select All' and 'Undo Change'. The
- 'Undo Change' selection will undo the last change made to the text in
- the window and is only one level deep.
-
-
- 3. 'Options' has six sub-menu items. They are 'Display Options',
- 'URL Settings', 'Strip Options', 'External Editor Settings',
- 'Filename Settings' and 'Save Settings'.
-
- 'Display Options' has two sub-menu items. They are 'Font' and 'Word
- Wrap'. 'Font' brings up a standard OS/2 font dialog box and will
- allow the selection of any of the installed fonts. The font that
- is active when 'Save Settings' is selected will be made the default
- font. 'Word Wrap' is a toggle setting that turns word wrap on or off.
- The wrap function does not actually reformat the text; instead, it
- affects only the way the text is displayed.
-
- 'URL Settings' has three sub-menu items. They are 'Add URLs',
- 'Leave URLs' and 'Only http type'. These options affect how the HTML
- file is processed, and the file must be reloaded for these changes to
- affect the current file. 'Add URLs' appends the URLs found in the HTML
- file to the end of the stripped text. 'Leave URLs' leaves the URLs
- found in the HTML file in the stripped text. 'Only http type'
- limits the URLs to those links containing an http reference. The
- "normal" URL detection looks for HTML code containing href and will
- find gopher, ftp, mailto, and relative links to other Web pages as
- well as complete URLs.
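-
- The URL settings can be sketched with a regular expression that looks
- for href attributes. This is a hypothetical Python illustration, not
- PMStripper's parser. In particular, treating an "http reference" as a
- link that begins with http is an assumption.
-
```python
import re

# Matches href="...", href='...' or an unquoted href=value.
HREF_RE = re.compile(r"""href\s*=\s*(?:"([^"]*)"|'([^']*)'|([^\s>]+))""",
                     re.IGNORECASE)


def find_urls(html_text, only_http=False):
    """Return every href target, optionally keeping only http links."""
    urls = []
    for match in HREF_RE.finditer(html_text):
        # Exactly one of the three groups participates in each match.
        url = next(g for g in match.groups() if g is not None)
        if only_http and not url.lower().startswith("http"):
            continue
        urls.append(url)
    return urls


def add_urls(stripped_text, html_text, only_http=False):
    """'Add URLs': append the links after the stripped text."""
    urls = find_urls(html_text, only_http)
    if not urls:
        return stripped_text
    return stripped_text + "\n\n" + "\n".join(urls)
```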
-
- 'Strip Options' has three sub-menu items: 'Ignore <BR>', 'Ignore
- cr-lf' and 'Translate quotes'. The first two selections are mutually
- exclusive. These options are useful when the stripped output has
- excessive blank lines. This often occurs in poetry published on the
- Web, since many poems are formatted with both carriage return - line
- feed (cr-lf) pairs and the HTML code <BR>, which prevents text
- reformatting by the browser. PMStripper normally translates <BR> into
- a cr-lf pair, thereby producing unnecessary blank lines. These two
- menu items strip either the cr-lf pairs OR the <BR> codes from the
- text before any other actions are performed. The results of using
- either option should be similar, but one method may produce better
- results depending on how the text was originally formatted.
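-
- The following Python sketch shows one reading of these options. Each
- option removes its target before anything else happens, and the
- default <BR>-to-cr-lf translation then runs on whatever remains. The
- function name and the regular expression for <BR> are assumptions.
-
```python
import re

# Matches <BR>, <br>, <br/> and <br /> (assumed spellings).
BR_RE = re.compile(r"<br\s*/?>", re.IGNORECASE)


def preprocess(html_text, ignore_br=False, ignore_crlf=False):
    """Apply 'Ignore <BR>' or 'Ignore cr-lf', then turn <BR> into cr-lf."""
    if ignore_br and ignore_crlf:
        raise ValueError("the two options are mutually exclusive")
    if ignore_br:
        html_text = BR_RE.sub("", html_text)
    elif ignore_crlf:
        html_text = html_text.replace("\r\n", "")
    # Normal behaviour: every remaining <BR> becomes a cr-lf pair.
    return BR_RE.sub("\r\n", html_text)
```
-
- Consider a poem line that ends in "<BR>" followed by a cr-lf pair. By
- default it yields two cr-lf pairs, which shows up as a blank line.
- With either option set, only one cr-lf pair remains.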
-
- The 'Translate quotes' option translates the "smart quotes" used on
- some Web pages into the standard ASCII value (0x93 and 0x94 are
- changed to 0x22). The "smart apostrophes" are translated to standard
- ASCII (0x91 and 0x92 are changed to 0x22). The two "special hyphen"
- characters are translated to standard ASCII (0x96 and 0x97 are
- changed to 0x2d). The 0x85 character is translated into 3 periods
- (0x2e) to approximate an ellipsis character. In addition, the 0xA0
- and 0x99 characters are each translated to a space. The translation
- is done before any HTML character entities are translated, so this
- option should not affect languages that use those characters as part
- of their normal text.
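-
- The byte translation described above can be sketched as a lookup
- table in Python. The table follows the text exactly, including the
- mapping of the smart apostrophes to 0x22. The function name is
- hypothetical, and the input is assumed to be the raw bytes of the
- page, before any character entities are decoded.
-
```python
# Byte values as listed in the description of 'Translate quotes'.
QUOTE_MAP = {
    0x93: b'"', 0x94: b'"',  # smart double quotes -> 0x22
    0x91: b'"', 0x92: b'"',  # smart apostrophes -> 0x22 (plain ' is 0x27)
    0x96: b"-", 0x97: b"-",  # special hyphens -> 0x2d
    0x85: b"...",            # ellipsis -> three 0x2e periods
    0xA0: b" ", 0x99: b" ",  # 0xA0 and 0x99 -> space
}


def translate_quotes(raw: bytes) -> bytes:
    """Translate the bytes above; run this before decoding entities."""
    out = bytearray()
    for byte in raw:
        out += QUOTE_MAP.get(byte, bytes([byte]))
    return bytes(out)
```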
-
- These options affect how the HTML file is processed, and the file must
- be reloaded for these changes to affect the current file.
-
- 'External Editor Settings' has two sub-menu items. They are 'Use
- __TMP2__ File' and 'Use Clipboard'. 'Use __TMP2__ File' causes the
- temporary file __TMP2__ to be left in the current working directory
- for use by an external editor. 'Use Clipboard' causes the stripped
- file to be copied to the OS/2 clipboard when the user selects 'Exit
- to Word Processor'. These option settings are only effective in the
- registered version.
-
- 'Filename Settings' has seven sub-menu items. They are 'Replace Space
- with Underscore Character', 'Leave Space in Filename',
- 'Enter Default Save Path', 'Enable Use of Default Save Path',
- 'Enter Default Load Path', 'Enable Use of Default Load Path',
- and 'Enter Default Save Extent'. The first two items are mutually
- exclusive toggles; only one can be active at a time. They determine
- how the highlighted text is converted to a destination file name for
- the stripped HTML file. The remaining settings are only effective in
- the registered version. 'Enter Default Save Path' and
- 'Enter Default Load Path' bring up dialog boxes that allow the user
- to enter paths for saving and loading files.
- 'Enable Use of Default Save Path' and 'Enable Use of Default Load Path'
- are toggles that enable the use of the default paths. These toggles
- allow the user to disable the default paths without clearing out the
- path information. 'Enter Default Save Extent' brings up a dialog box
- that allows the user to specify a default extent (file extension) for
- the stripped HTML file when it is saved to disk.
-
- Note: A period is not part of the extent; enter 'txt', not '.txt'.
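-
- The following Python sketch shows how a default save path and extent
- might combine with a base name. The function and parameter names, and
- the "txt" default, are hypothetical. The point is that the program,
- not the user, supplies the period between name and extent.
-
```python
import ntpath


def build_save_path(base_name, extent="txt", default_path=None,
                    use_default_path=False):
    """Join an optional default save path, a base name and an extent."""
    # The extent is stored without a period, so one is added here.
    file_name = f"{base_name}.{extent}" if extent else base_name
    # The enable toggle lets a stored path be ignored without clearing it.
    if use_default_path and default_path:
        return ntpath.join(default_path, file_name)
    return file_name
```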
-
- 'Save Settings' saves all of the option settings to an INI file named
- PMSTRIP.INI. The file will only be created when 'Save Settings' is
- selected. Word wrap is always turned on when the utility is loaded.
-
- If you add an environment variable to your config.sys file,
- PMStripper will use it to determine where PMSTRIP.INI is located
- when the file is not found in the working directory.
-
- The environment variable is specified in your config.sys file.
-
- SET PMSTRIPPER=C:\YOURPATH
-
- C:\YOURPATH should be changed to the location of PMStripper or to
- the drive and directory where you want to keep the PMSTRIP.INI file.
-
- The install routine does not add the line to your config.sys.
-
- NOTE: When PMStripper is activated by dropping the icon of an HTML
- file onto that of PMStripper, the location of the HTML file becomes
- the current working directory. PMStripper will look for its INI file
- in that directory before checking the location specified in the
- config.sys file. This is convenient for those who may want several
- INI files, each with different attributes, according to the location
- of the source HTML file.
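-
- The INI search order described above can be sketched in Python. It
- checks the current working directory first, then the directory named
- by PMSTRIPPER. The function is hypothetical. Passing the environment
- in as a dictionary simply makes it easy to test.
-
```python
import os

INI_NAME = "PMSTRIP.INI"


def find_ini(working_dir=None, environ=None):
    """Return the path of PMSTRIP.INI, or None if it is not found."""
    working_dir = working_dir or os.getcwd()
    environ = os.environ if environ is None else environ
    # 1. The current working directory (for a dropped file, its folder).
    candidate = os.path.join(working_dir, INI_NAME)
    if os.path.isfile(candidate):
        return candidate
    # 2. The directory named by SET PMSTRIPPER=... in config.sys.
    env_dir = environ.get("PMSTRIPPER")
    if env_dir:
        candidate = os.path.join(env_dir, INI_NAME)
        if os.path.isfile(candidate):
            return candidate
    return None
```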
-
- 4. 'Exit' has two sub-menu items. They are 'Exit' and 'Exit to Word
- Processor'. 'Exit' causes the stripped file to be discarded and
- PMStripper to close. 'Exit to Word Processor' causes the OS/2 CMD
- file PMS_CMD.CMD to be executed and PMStripper to close. The 'Exit
- to Word Processor' option is only effective in the registered
- version.
-
-
- 5. 'About' displays copyright and contact information.
-
-
-
- VI. The active keyboard accelerators (shortcut keys) are:
-
- Exit                            Alt+X
- Copy                            Ctrl+Insert
- Cut                             Shift+Delete
- Paste                           Shift+Insert
- Select All                      Ctrl+/
- Open File                       Alt+F
- Print On Default Printer        Alt+P
- Reload Source File              Alt+R
- Reload Source File As Raw HTML  Ctrl+R
- Save As                         Alt+S
- Save - No Prompt                Ctrl+S
- Undo Change                     Alt+U
- Exit to Word Processor          Alt+W
-
- The keyboard accelerators are not case sensitive.
-
-
- VII. Miscellaneous Notes:
-
-
- When dragging a file from Web Explorer, the file must be dropped on the
- desktop (or in a folder) before it can be dropped on the PMStripper
- program object.
-
- This utility will only run on OS/2 Warp and later releases.
-
- One useful feature is the ability to mark text in the stripped file
- and use the highlighted text as the file's 'Save As' name. This is
- very useful if you have HPFS formatted drives, since HPFS allows long
- file names. NOTE: Spaces and some punctuation characters are
- converted to "_" characters in the file name unless the option to
- leave spaces is selected, in which case those characters are
- converted to spaces instead. The "/" and "\" characters are deleted
- and not replaced. This feature is only available in the registered
- version of PMStripper.
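-
- A Python sketch of this conversion follows. The text does not list
- which punctuation characters are converted. The CONVERTED set below
- (the space plus the characters that HPFS does not allow in file names)
- is therefore an assumption, as is trimming leading and trailing blanks.
-
```python
# Assumption: the space plus the characters HPFS rejects in file names.
CONVERTED = set(' :*?"<>|')
DELETED = set("/\\")


def name_from_selection(selected, keep_spaces=False):
    """Turn highlighted text into a 'Save As' file name."""
    replacement = " " if keep_spaces else "_"
    out = []
    for ch in selected.strip():  # trimming is an assumption
        if ch in DELETED:
            continue  # "/" and "\" are deleted, not replaced
        out.append(replacement if ch in CONVERTED else ch)
    return "".join(out)
```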
-
- The HTML specification defines "Character Entity Sets" or tags to
- represent particular graphic characters which have special meanings
- in the markup language, or may not be part of the character set
- available to the writer. PMStripper does not scan for all possible
- tags, but does try to resolve the most common.
-
- This version of PMStripper supports code pages 437 and 850; if code
- page 850 is in use, the 850 character set is used. The code pages
- only make a difference when &xxxx; tags are present in the file. If
- the correct character or an acceptable alternate is not available, a
- space character is used. If the tag is unknown to PMStripper, the
- &xxxx; tag is left in the file.
-
- Registered users who frequently encounter particular &xxxx; and nnn
- tags should contact the author to have those tags considered for
- inclusion in the next release.
-
-
- VIII. Why & How to Register:
-
-
- Registered users feel good about supporting OS/2 developers and enjoy
- these additional benefits:
-
- Registered users of PMStripper will have access to two additional
- executable files that were compiled with the 486 and 586 compiler
- options. These versions will offer some performance improvement
- for users with 486, Pentium, or equivalent CPUs.
-
- The Word Processor option runs the PMS_CMD.CMD file located in the
- working directory specified in the Program Object. This file is used
- to start the word processor or editor of your choice to edit the
- stripped text file named __TMP2__ or to allow you to paste the
- stripped file into your editor. PMStripper will close after
- executing the PMS_CMD.CMD file.
-
- NOTE: The __TMP2__ file is discarded if PMStripper is closed via the
- 'Exit' menu item. Double clicking PMStripper's upper left corner
- (the system menu icon), using Alt+F4 or selecting that menu's 'Close'
- may cause the temporary stripped file (named __TMP2__) to remain in
- the working directory.
-
- This menu item is disabled in the unregistered version. Instead of
- invoking the command script an unregistered message requiring a user
- response will be shown.
-
-
- Example PMS_CMD.CMD files:
-
- To use the system editor E.EXE, the PMS_CMD.CMD file would contain:
-
- E __TMP2__
-
- To use a word processor or editor whose executable is not in the
- path, the command script must copy the __TMP2__ file to the desired
- program's data directory, change to that directory and then launch
- the word processor/editor. An example PMS_CMD.CMD file to use
- DeScribe is shown below.
-
- copy __TMP2__ g:\describe\__TMP2__
- g:
- cd \describe
- describe __TMP2__
-
- In addition to activating the Word Processor option, registration
- removes the unregistered message (requiring a user response) shown at
- startup, along with the unregistered line that is inserted at the top
- of the stripped file.
-
- Registered users are supported via e-mail. Send help requests and
- good ideas to me at dwhawk@southwind.net.
-
- There are two ways to register PMStripper: through BMT Micro or
- directly with the author.
-
- Registration through BMT Micro:
-
- BMT Micro will accept credit cards and will be more convenient for
- OS/2 users outside the United States. BMT Micro's price to register
- PMStripper is $9.95 (US Dollars). BMT Micro also has an FTP area
- where the registered version can be obtained after registration.
-
-
- Direct registration:
-
- Stuff small bills, gold coins, diamonds or even checks (US banks
- only, please) valued at $7.50 (US dollars) into an envelope and mail
- to:
-
- Don Hawkinson
- 4555 N Hillcrest
- Wichita KS, 67220-3832
- USA
-
-
- Please don't send $100 bills (or larger) in the mail without
- purchasing full postal insurance. Also, no change will be
- returned because it is absolutely unsafe, and unwise, to send cash
- through the mail.
-
- The registered version of PMStripper will be distributed by e-mail in
- the form of a uuencoded zip file, so make certain that your e-mail
- address is included with your registration fee.
-
- Registered users will be notified of updates via e-mail.
-
- Registration covers all 1.xx versions of PMStripper.
-
-
- IX. Acknowledgments:
-
-
- Thanks to the following Netizens for their help in testing and
- helpful comments during development.
-
- DenverD@IBM.net
- Emil_Kucera@Environment.gov.MB.CA
- vlaming@ibm.net
- jhiatt@ibm.net
- jlink@best.com
- p_daley@conknet.com
- tombeck@usemail.com
-
- Thanks to a Net WordSmith (WordSmith@IBM.Net) for editing help.
- (Actually, he converted my very rough draft to the initial release's
- document, and has provided continued editing services.)
-
-
- Copyrights and trademarks remain the property of their owners.
-
-
-
- Don Hawkinson dwhawk@southwind.net
-