-
- !HTMLScan's ReadMe file
- ~~~~~~~~~~~~~~~~~~~~~~~
- This program was originally designed to add "height=" and "width=" attributes
- to the <IMG> tags in HTML files by reading the information from the headers
- of the relevant image files.
-
- This is desirable as it enables intelligently designed browsers to render the
- entire file's text before trying to load the images. If the images cause
- difficulties for any reason, then it is still possible to navigate the page.
- If the browser multitasks properly, then it should be possible to follow
- links while the graphics are being redrawn. If a graphic fails to load
- because of local resource problems this should not interfere with the
- rendering of most of the page.
-
- An analysis of the web sites of many companies seems to reveal that they are
- in dire need of a tool very much like this one.
-
- While the file is being scanned, various syntactical checks are also
- performed. These are by no means exhaustive, but they are usually
- sufficient to catch many common mistakes. The following tags are checked
- to make sure that they have matching partners at appropriate points in
- the file:
-
- <H1>-<H6>, <HTML>, <HEAD>, <BODY>, <TITLE>, <ADDRESS>, <TEXTAREA>, <TABLE>,
- <FORM>, <QUOTE>, <KBD>, <UL>, <OL>, <A>, <B>, <I>, <EM>, <CODE>, <TT>,
- <SUB>, <SUP>, <STRIKE>, <RIGHT>, <LEFT>, <FONT>, <DFN>, <BLOCKQUOTE>,
- <PRE>, <CITE>, <CENTER> and <STRONG>.
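-
- The matching described above can be pictured as a stack of open tags. The
- Python sketch below is not HTMLScan's own code: the PAIRED set is a
- hypothetical subset of the list above, and tags that may legitimately be
- left unclosed (such as <P>) are simply ignored. Like HTMLScan, it treats
- tags as strictly hierarchical, so a closing tag that matches an earlier
- open tag causes everything opened after it to be reported as unclosed.
-
```python
import re

# Hypothetical subset of the paired tags listed above.
PAIRED = {"html", "head", "body", "title", "h1", "h2", "b", "i", "em",
          "strong", "ul", "ol", "a", "table", "form", "pre", "center"}

TAG_RE = re.compile(r"<(/?)([A-Za-z][A-Za-z0-9]*)[^>]*>")

def unmatched_tags(html):
    """Return ("unclosed" | "unopened", tag) pairs for paired tags."""
    stack, problems = [], []
    for m in TAG_RE.finditer(html):
        closing, name = m.group(1) == "/", m.group(2).lower()
        if name not in PAIRED:
            continue  # e.g. <P>, <IMG>, <BR> are not tracked here
        if not closing:
            stack.append(name)
        elif name in stack:
            # Everything opened after the matching tag was never closed.
            while stack[-1] != name:
                problems.append(("unclosed", stack.pop()))
            stack.pop()
        else:
            problems.append(("unopened", name))
    # Whatever remains open at the end of the file is dumped as unclosed.
    problems.extend(("unclosed", name) for name in reversed(stack))
    return problems
```
-
- For example, unmatched_tags("<html><body><b>x</body></html>") reports
- [("unclosed", "b")].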
-
- <P> tags are also counted and a report is made if they are used unusually.
- <META> tags are checked to see if they are located in the header.
-
- Quotes are matched within tags and tags without open or close markers are
- noted. Within <img> tags, the existence of a "src=" is checked, along with
- the presence of "alt=" and of course, "width=" and "height=".
-
- If "width=" or "height=" are missing, they are added (if the "Make Changes"
- option is ticked). If they are present but do not match the values found in
- the header of the relevant GIF or JPEG file, the old values can be
- overwritten with the new ones (if the "Overwrite" option is ticked).
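-
- The add/overwrite behaviour might be sketched like this. The function name
- and its make_changes and overwrite parameters are hypothetical stand-ins
- for the two options, and the tag is assumed to end in ">". As in version
- 1.19, no carriage returns are added when the dimensions are inserted.
-
```python
import re

def set_dimensions(tag, width, height, make_changes=True, overwrite=False):
    """Return tag with width=/height= added or corrected.

    make_changes and overwrite are hypothetical stand-ins for the
    "Make Changes" and "Overwrite" options; tag must end with '>'.
    """
    for name, value in (("width", width), ("height", height)):
        # Match the whole value, quoted or bare (so "50%" is seen intact).
        found = re.search(r'\b%s\s*=\s*("[^"]*"|[^\s>]+)' % name,
                          tag, re.IGNORECASE)
        if found is None:
            if make_changes:
                # Insert before the final '>' on the same line.
                tag = tag[:-1].rstrip() + " %s=%d>" % (name, value)
        elif overwrite and found.group(1).strip('"') != str(value):
            tag = (tag[:found.start()] + "%s=%d" % (name, value)
                   + tag[found.end():])
    return tag
```
-
- For example, set_dimensions('<img src="a.gif">', 10, 20) gives
- '<img src="a.gif" width=10 height=20>'.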
-
- HREF parameters in anchor tags may optionally be parsed, and checks made
- to ensure that the files being referred to actually exist. The existence
- of appropriate markers within the files (when these are referred to) is not
- currently checked.
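-
- A sketch of the local-file check, assuming host-style paths with "/"
- separators resolved relative to the HTML file's directory (RISC OS itself
- uses "." as its directory separator, so a real implementation would need to
- translate paths). The list of non-local schemes is an assumption, and the
- marker after "#" is ignored, matching the limitation described above.
-
```python
import os

# Assumed list of prefixes treated as non-local references.
NON_LOCAL = ("http:", "https:", "ftp:", "mailto:", "gopher:")

def check_href(html_path, href):
    """Return None if a local href target exists, else a warning string."""
    if href.startswith(NON_LOCAL):
        return None  # non-local reference; not checked here
    target = href.split("#", 1)[0]  # the marker part is not checked
    if not target:
        return None  # marker within the same document only
    base = os.path.dirname(os.path.abspath(html_path))
    path = os.path.normpath(os.path.join(base, target))
    return None if os.path.exists(path) else "missing file: " + target
```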
-
- If "Remove spaces" (Crunch SP) or "Remove carriage returns" (Crunch CR) is
- selected, then runs of the relevant characters in text outside preformatted
- (<PRE>) sections are reduced to a single character. This may squeeze a few
- bytes from the file so it can travel down the wires a little quicker.
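-
- A minimal sketch of the crunching, treating "carriage returns" as newline
- characters and leaving <PRE> sections untouched. Splitting on
- <pre>...</pre> with a regular expression is an illustrative simplification:
- it does not handle comments or a <PRE> that is never closed.
-
```python
import re

PRE_RE = re.compile(r"(<pre\b.*?</pre>)", re.IGNORECASE | re.DOTALL)

def crunch(html, spaces=True, returns=True):
    """Collapse runs of spaces and/or newlines outside <pre>...</pre>."""
    parts = PRE_RE.split(html)
    out = []
    for i, part in enumerate(parts):
        if i % 2 == 0:  # even indices lie outside preformatted blocks
            if spaces:
                part = re.sub(r" {2,}", " ", part)
            if returns:
                part = re.sub(r"\n{2,}", "\n", part)
        out.append(part)
    return "".join(out)
```
-
- For example, crunch("a   b\n\n\nc<pre>x   y</pre>") returns
- "a b\nc<pre>x   y</pre>".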
-
- If "Don't check legality" is not selected, then ampersands, quotes, less-than
- and greater-than symbols, and top-bit-set characters will be queried.
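-
- The legality check could be sketched as a scan over the text between tags.
- Suggesting the equivalent entity follows the version 1.13 behaviour;
- offering numeric references for top-bit-set characters, and skipping
- ampersands that already begin an entity, are assumptions made for this
- example.
-
```python
import re

ENTITY_FOR = {"&": "&amp;", "<": "&lt;", ">": "&gt;", '"': "&quot;"}
# An '&' that does not begin a named or numeric entity reference.
BARE_AMP = re.compile(r"&(?![A-Za-z][A-Za-z0-9]*;|#[0-9]+;)")

def query_characters(text):
    """Return (offset, char, suggestion) for characters to be queried."""
    queries = []
    for i, ch in enumerate(text):
        if ch == "&" and not BARE_AMP.match(text, i):
            continue  # already the start of an entity reference
        if ch in ENTITY_FOR:
            queries.append((i, ch, ENTITY_FOR[ch]))
        elif ord(ch) > 127:  # top-bit-set character
            queries.append((i, ch, "&#%d;" % ord(ch)))
    return queries
```
-
- For example, query_characters("a<b") returns [(1, "<", "&lt;")].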
-
- If "Don't check entities" is not selected then anything between a "&" symbol
- and a ";" which is non-numerical will be parsed and checked against an
- internal list of entities.
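-
- A sketch of the entity check. KNOWN_ENTITIES stands in for HTMLScan's much
- larger internal list (the names here are just a few standard HTML
- entities). The comparison is case sensitive, as in version 1.16, and
- numeric references beginning with "#" are skipped.
-
```python
import re

# A few standard entity names standing in for the full internal list.
KNOWN_ENTITIES = {"amp", "lt", "gt", "quot", "nbsp", "copy", "eacute"}

# Everything between '&' and the next ';' (no spaces or further '&').
ENTITY_RE = re.compile(r"&([^;&\s]*);")

def unknown_entities(text):
    """Return non-numeric entity names not found in KNOWN_ENTITIES."""
    return [name for name in ENTITY_RE.findall(text)
            if not name.startswith("#") and name not in KNOWN_ENTITIES]
```
-
- For example, unknown_entities("&amp; &Amp; &#169; &foo;") returns
- ["Amp", "foo"].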
-
- Groups of HTML files may be dragged to the icon bar icon and processed
- simultaneously.
-
- The front end of the program relies on the existence of Acorn's "FrontEnd"
- module which is part of the DDE. It was once on an "Archimedes World" cover
- disc, but it is recognised that many people will not have access to it from
- that source.
-
- The program may be used from the command line. It is in the "Library"
- directory and is called "HTMLScan". When run with no parameters it prints
- its command-line syntax.
-
- HTMLScan should work from within archives, or from non-writable media,
- though you will not be able to save any options you select.
-
- Problems and wishes
- ~~~~~~~~~~~~~~~~~~~
- Not all GIF files store their width and height in the same manner. The
- "Warning: Unusual GIF format" messages that you will sometimes encounter
- should be nothing to worry about. The program has not failed yet with a
- 'weird' GIF, but I am not absolutely positive I have understood the GIF
- specification correctly, so please advise me if there are problems. GIFs
- containing multiple images of differing sizes may cause confusion. If you
- do not like the "Unusual GIF format" warnings appearing, then you can try
- loading the relevant GIFs into a bitmap editor and then saving them.
- !WebGif2 always produces files which !HTMLScan does not give warnings about.
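-
- For reference, a GIF starts with a six-byte signature followed by a logical
- screen descriptor holding the width and height as little-endian 16-bit
- values; each image then has its own descriptor introduced by a ","
- (0x2C) byte, possibly preceded by extension blocks. The sketch below reads
- both. Flagging a file as unusual when the first image's size differs from
- the screen size is an assumption made for illustration, not necessarily
- the rule HTMLScan applies.
-
```python
import struct

def gif_size(data):
    """Return (width, height, unusual) for GIF bytes.

    width and height come from the logical screen descriptor; unusual is
    True when the first image descriptor is missing or has a different
    size (an illustrative rule, not necessarily HTMLScan's).
    """
    if data[:6] not in (b"GIF87a", b"GIF89a"):
        raise ValueError("not a GIF file")
    screen_w, screen_h, flags = struct.unpack_from("<HHB", data, 6)
    pos = 13  # 6-byte signature + 7-byte logical screen descriptor
    if flags & 0x80:  # a global colour table follows
        pos += 3 * (2 << (flags & 0x07))
    while pos < len(data):
        block = data[pos]
        if block == 0x2C:  # image separator ','
            _, _, w, h = struct.unpack_from("<HHHH", data, pos + 1)
            return screen_w, screen_h, (w, h) != (screen_w, screen_h)
        if block != 0x21:  # not an extension either: give up
            break
        pos += 2  # skip the 0x21 introducer and the extension label
        while pos < len(data) and data[pos] != 0:
            pos += data[pos] + 1  # skip one data sub-block
        pos += 1  # skip the zero-length block terminator
    return screen_w, screen_h, True
```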
-
- The program works with all the JFIF-style JPEG files it has been
- tried with. Colour and greyscale JPEGs are supported, though note that
- progressive JPEGs are not handled. Once again, the exact specification has
- been guessed at to some extent, so there may be files that do not work.
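-
- A JPEG file is a sequence of marker segments beginning with FF D8; the
- image size is stored, height first and big-endian, in a start-of-frame
- segment. The sketch below accepts the baseline and extended sequential
- frames (C0, C1) and, mirroring the limitation above, rejects progressive
- frames (C2), even though they use the same segment layout. It is a
- simplified illustration, not the scanner HTMLScan uses.
-
```python
import struct

SEQUENTIAL_SOF = {0xC0, 0xC1}  # baseline and extended sequential DCT
PROGRESSIVE_SOF = 0xC2         # same segment layout as the two above

def jpeg_size(data):
    """Return (width, height) from the first start-of-frame segment."""
    if data[:2] != b"\xff\xd8":
        raise ValueError("not a JPEG file (missing SOI marker)")
    pos = 2
    while pos + 4 <= len(data):
        if data[pos] != 0xFF:
            raise ValueError("expected a marker at offset %d" % pos)
        marker = data[pos + 1]
        if marker == 0xFF:  # padding fill byte before a marker
            pos += 1
            continue
        (length,) = struct.unpack_from(">H", data, pos + 2)
        if marker in SEQUENTIAL_SOF:
            # Segment: length(2) precision(1) height(2) width(2) ...
            height, width = struct.unpack_from(">HH", data, pos + 5)
            return width, height
        if marker == PROGRESSIVE_SOF:
            raise ValueError("progressive JPEG not handled")
        if marker == 0xDA:  # start of scan reached without a frame header
            break
        pos += 2 + length  # skip marker bytes plus the segment itself
    raise ValueError("no start-of-frame segment found")
```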
-
- Sprites and PNGs and other graphics formats apart from JPEGs and GIFs are
- not dealt with.
-
- More (and more useful) checking could be implemented. Checking that the
- marker points referred to by "href=" links actually exist within the target
- files would be a useful start. The messages given could sometimes be made
- more helpful; some give no indication of where in the file the problem is
- most likely to lie.
-
- Tags are treated as though they are hierarchical, but this is not
- necessarily the case. "Some <b>bold, <cite>bold cited,</b> cited and </cite>
- back to normal" is certainly unambiguous, though not all browsers can cope
- with it and I do not think it is supported by any standard document.
- HTMLScan should probably not query it in the way it currently does.
-
- Add an option to insert the required entity automatically. Now that the Zap
- HTML mode does this, adding it to HTMLScan seems quite unimportant.
-
- HTMLScan continues gaily parsing through HTML comments as though they are
- not present. This is a known bug, and will hopefully be addressed.
-
- History
- ~~~~~~~
- 1.20 - Released 08-Jun-98:
- * Added support for HTML-4 entities.
- * HTMLScan now works properly if no task is registered for throwback.
-
- 1.19 - Released 23-Jun-97:
- * Modification to hopefully allow a larger range of JPEG files to be
- processed by HTMLScan. It has not been tested with progressive JPEGs
- yet to see if processing of these is successful.
- * Stopped stupidly adding unnecessary carriage returns when inserting
- image dimensions.
-
- 1.18 - Released 03-Apr-97:
- * Removed bug which caused unnecessary warnings if an anchor tag
- contained both "href=" and "name=" attributes.
-
- 1.17 - Released 05-Feb-97:
- * Fixed bug which involved pages whose local references started with
- the "/" character.
-
- 1.16 - Released 23-Jan-97:
- * Added knowledge about mailto: and gopher: directives, so these are
- no longer flagged as warnings (as long as they are in lower case).
- * References to things in directories with "cgi-bin" in their paths
- are treated less severely.
- * References to directories are now treated more sensibly. However
- errors involving the directory not being present are more likely
- to cause HTMLScan internal problems.
- * Made entity checking case sensitive and removed an illegal entity or
- three.
-
- 1.15 - Released 14-Dec-96:
- * Throwback implemented.
-
- 1.14 - Released 25-Nov-96:
- * Fixed problems with the <a name="name"> construct which has no </a>
- ending tag. !HTMLScan now knows this.
-
- 1.13 - Released 21-Nov-96:
- * Characters such as ", &, >, and < now have their entity equivalent
- indicated by !HTMLScan when they are found.
-
- 1.12 - Released 20-Nov-96:
- * Added a huge list of entities and options for !HTMLScan to
- check all the entities in the document for ones that are not known to
- it.
- * Characters such as ", &, >, and < are now queried as they would
- be better expressed as entities.
- * Problems with the <FORM> tag resolved.
-
- 1.11 - Released 13-Nov-96:
- * Incorrect command-line options in "Desc" file changed.
- * Problems with the <QUOTE> tag resolved.
-
- 1.10 - Released 10-Nov-96:
- * !HTMLScan now copes with files whose paths are not in quotation
- marks provided the path name stays in the restricted case available
- when quotes are not used, i.e. 0-9, A-Z, a-z, '.' and '/'.
- * A dump at the end of the scan of any unmatched tags is now made.
- This should make the task of tracking down unclosed tags easier.
- * More checking is now performed on <CENTER> and <QUOTE> tags.
-
- 1.09 - Released 08-Nov-96:
- * Added the extended command line functionality provided by Acorn's
- "DDEUtils" module to the program.
- * Changed the internal format used to store tags to
- make it easier to add new tags. This should make tracing backwards
- through the tag-stack to find a tag matching a missing one easier to
- implement.
- * Added dozens of new tags found during my research for ZapHoTMeaL.
- * <META> and <TITLE> tags are now only allowed in the header.
- * <TT> tag added to list queried if strict checking is enabled.
-
- 1.08 - Released 20-Oct-96:
- * Added ability to follow "HRef="s in anchor tags.
- * Added a switch to turn the above feature off.
- * Added checking to "Background=" parameters of <BODY> tags.
- * Added switches to control the reporting of non-local "HRef="s,
- "Src="s and "Background="s.
- * Strict checking now includes warnings about missing "alt=" parameters
- in image tags, and missing "Text=", "BGColor=", "Link=", "VLink=" and
- "ALink=" parameters in <BODY> tags which use background images.
- * Added switch to make "Be very strict" mode optional.
- * Cured bug causing occasional failure to find 'Src=' files if dozens
- of them had already failed to be located.
- * Tidied up a number of the reported messages.
- * More options have been changed to their opposites. Sorry if this
- causes angst amongst users who are using batchfiles. Once more, this
- makes the command-line syntax more sensible.
- * <HTML>, <BODY> and <TITLE> now all need to be missing for a fatal
- error to be generated. This is now trying to be especially kind to
- errant files with poor headers.
-
- 1.07 - Released 13-Oct-96:
- * Queried <I> and <B> tags as some people have requested a strict mode
- where these tags are faulted as being too specific in their nature,
- with <EM> and <STRONG> tags being recommended in preference.
- * "-v" [verbose] option changed to its opposite "-q" [quiet] partly to
- benefit command-line users, and partly in order to reduce the length
- of the command-line call which can cause problems if both your !Scrap
- directory and !HTMLScan are buried deep in the directory structure.
- Those who are upgrading are advised to delete their !Choices file
- because of this change.
- * Added start up message to tell people that the program is alive and
- well.
- * Added "Processing" file line to output so when processing multiple
- files, accessing the command line is not needed when trying to find
- out which output window relates to which file.
- * <BODY> and <HTML> now both need to be missing for a fatal error to
- be generated. This is now in line with the specification for HTML.
- * "Mismatched tags at end of file" warning replaced by a more specific
- message with the number of tags involved listed.
-
- 1.06 - Released 06-Oct-96:
- * Corrected problems with some GIFs giving 'Unable to locate expected
- comma in GIF file' errors.
-
- 1.02 - Released 14-Sep-96:
- * Corrected messages to remain agnostic with respect to differing <P>
- conventions.
- * Corrected bug associated with images higher in the directory tree
- than the source HTML file (i.e. paths with "../" structures).
- * Template file made more conventional by filling its buttons.
-
- 1.01 - Released 04-Sep-96:
- * Added support for <CENTER>, <P> and <STRONG> tags.
- * Used "Squeeze" instead of proprietary compression because of
- the possibility of StrongARM related problems.
-
- 1.00 - Released 01-Aug-96:
- * First version.
-
- Enjoy
- __________
- |im |yler The Mandala Centre - tt@cryogen.com - http://www.mandala.co.uk
-