Text File | 1998-06-08 | 11.8 KB | 249 lines

!HTMLScan's ReadMe file
~~~~~~~~~~~~~~~~~~~~~~~

This program was originally designed to add "height=" and "width=" attributes to the <img> tags in HTML files by reading the information from the headers of the relevant image files.

This is desirable as it enables intelligently designed browsers to render the entire file's text before trying to load the images. If the images cause difficulties for any reason, then it is still possible to navigate the page. If the browser multitasks properly, then it should be possible to follow links while the graphics are being redrawn. If a graphic fails to load because of local resource problems, this should not interfere with the rendering of most of the page.

An analysis of the web sites of many companies seems to reveal that they are in dire need of a tool very much like this one.

While the file is being scanned, various syntactical checks are also performed. These are not exhaustive by any means, but are usually sufficient to catch many common mistakes.

The following tags are checked to make sure that they have matching partners at appropriate points in the file: <H1> to <H6>, <HTML>, <HEAD>, <BODY>, <TITLE>, <ADDRESS>, <TEXTAREA>, <TABLE>, <FORM>, <QUOTE>, <KBD>, <UL>, <OL>, <A>, <B>, <I>, <EM>, <CODE>, <TT>, <SUB>, <SUP>, <STRIKE>, <RIGHT>, <LEFT>, <FONT>, <DFN>, <BLOCKQUOTE>, <PRE>, <CITE>, <CENTER> and <STRONG>.

<P> tags are also counted and a report is made if they are used unusually. <META> tags are checked to see if they are located in the header. Quotes are matched within tags, and tags without opening or closing markers are noted.

Within <img> tags, the existence of "src=" is checked, along with the presence of "alt=" and, of course, "width=" and "height=".
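
As a rough illustration of the <img> attribute check described above, here is a minimal sketch in Python. It is not HTMLScan's own code (which is not reproduced in this file). The names ImgChecker, check_images and REQUIRED are invented for the example, and it leans on Python's standard html.parser module rather than on anything HTMLScan provides.

```python
from html.parser import HTMLParser

# Attributes this ReadMe says are looked for inside <img> tags.
REQUIRED = ("src", "alt", "width", "height")


class ImgChecker(HTMLParser):
    """Hypothetical checker: records which REQUIRED attributes each <img> lacks."""

    def __init__(self):
        super().__init__()
        self.problems = []  # list of (line_number, [missing attribute names])

    def handle_starttag(self, tag, attrs):
        # HTMLParser lower-cases tag and attribute names before calling us.
        if tag != "img":
            return
        present = {name for name, _ in attrs}
        missing = [name for name in REQUIRED if name not in present]
        if missing:
            line, _ = self.getpos()
            self.problems.append((line, missing))


def check_images(html_text):
    """Return a list of (line, missing_attributes) for every incomplete <img>."""
    checker = ImgChecker()
    checker.feed(html_text)
    checker.close()
    return checker.problems


if __name__ == "__main__":
    sample = (
        "<p>Hi</p>\n"
        '<img src="a.gif" alt="A">\n'
        '<IMG SRC="b.jpg" ALT="B" WIDTH="10" HEIGHT="20">\n'
    )
    for line, missing in check_images(sample):
        print("line %d: <img> is missing %s" % (line, ", ".join(missing)))
```

The sketch only reports problems. Filling in or overwriting the values, as the "Make Changes" and "Overwrite" options do, would also need the image dimensions read from the file headers.
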
If "width=" or "height=" are missing, or do not match the values found in the header of the relevant GIF or JPEG file, the missing attributes are added (if the "Make Changes" option is ticked) and old values can be overwritten with the new ones (if the "Overwrite" option is ticked).

HREF parameters in anchor tags may optionally be parsed, and checks made to ensure that the files being referred to actually exist. The existence of appropriate markers within the files (when these are referred to) is not currently checked.

If "Remove spaces" (Crunch SP) or "Remove carriage returns" (Crunch CR) is selected, then duplicated runs of the relevant characters are deleted from any text that is not preformatted. This may squeeze a few bytes from the file so it can travel down the wires a little quicker.

If "Don't check legality" is not selected then ampersands, quotes, less-than and greater-than symbols and top-bit-set characters will be queried.

If "Don't check entities" is not selected then anything between a "&" symbol and a ";" which is non-numerical will be parsed and checked against an internal list of entities.

Groups of HTML files may be dragged to the icon bar icon and processed simultaneously.

The front end of the program relies on the existence of Acorn's "FrontEnd" module, which is part of the DDE. It was once on an "Archimedes World" cover disc, but it is recognised that many people will not have access to it from that source.

The program may be used from the command line. It is in the "Library" directory and is called "HTMLScan". When run with no parameters it prints its command-line syntax.

HTMLScan should work from within archives, or from non-writable media, though you will not be able to save any options you select.

Problems and wishes
~~~~~~~~~~~~~~~~~~~

Not all GIF files store their width and height in the same manner. The "Warning: Unusual GIF format" messages that you will sometimes encounter should be nothing to worry about.
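
For readers curious about where these dimensions live, here is a minimal sketch in Python. It is not taken from HTMLScan. It reads the width and height from the "logical screen descriptor" that follows the six-byte GIF signature. Each image inside the file also has its own descriptor, introduced by a comma byte (0x2C), whose size may differ. This ReadMe does not say which of the two HTMLScan uses, so treat the choice below as an assumption. The function name gif_logical_screen_size is invented for the example.

```python
import struct


def gif_logical_screen_size(data):
    """Return (width, height) from the logical screen descriptor of a GIF.

    `data` is the start of the file as bytes; only the first 10 bytes are used.
    """
    if len(data) < 10 or data[:6] not in (b"GIF87a", b"GIF89a"):
        raise ValueError("not a GIF file")
    # Bytes 6-9 hold the width then the height, each an unsigned
    # 16-bit little-endian integer.
    width, height = struct.unpack("<HH", data[6:10])
    return width, height


if __name__ == "__main__":
    # A hand-built header for a 320 x 200 GIF, used here instead of a real file.
    header = b"GIF89a" + struct.pack("<HH", 320, 200) + b"\x00\x00\x00"
    print(gif_logical_screen_size(header))
```

A file whose individual images do not match the logical screen size is one plausible source of disagreement between tools, though that is a guess rather than something this ReadMe confirms.
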
The program has not failed yet with a 'weird' GIF, but I am not absolutely positive I have understood the GIF specification correctly, so please advise me if there are problems. GIFs containing multiple images of differing sizes may cause confusion.

If you do not like the "Unusual GIF format" warnings appearing, then you can try loading the relevant GIFs into a bitmap editor and then saving them. !WebGif2 always produces files which !HTMLScan does not give warnings about.

The program works with all the JFIF-style JPEG files that it has been tried with. Colour and greyscale JPEGs are supported, though note that progressive JPEGs are not handled. Once again, the exact specification has been guessed at to some extent, so there may be files that do not work.

Sprites, PNGs and other graphics formats apart from JPEGs and GIFs are not dealt with.

More (and more useful) checking could be implemented. Checking that the marker points referred to by "href=" links actually exist within the target files would be a useful start.

The messages given could sometimes be made more helpful. Some messages do not give any indication about where in the file the problem is most likely to lie.

Tags are treated as though they are hierarchical, but this is not necessarily the case: "Some <b>bold, <cite>bold cited,</b> cited and </cite> back to normal" is certainly unambiguous, though not all browsers can cope with it and I do not think it is supported by any standards document. HTMLScan should probably not query it in the way it currently does.

An option to automatically insert the entity which is needed could be added. Now that the Zap HTML mode does this, adding it to HTMLScan seems quite unimportant.

HTMLScan continues gaily parsing through HTML comments as though they are not present. This is a known bug, and will hopefully be addressed.

History
~~~~~~~

1.20 - Released 08-Jun-98:
  * Added support for HTML-4 entities.
  * HTMLScan now works properly if no task is registered for throwback.

1.19 - Released 23-Jun-97:
  * Modification to hopefully allow a larger range of JPEG files to be processed by HTMLScan. It has not been tested with progressive JPEGs yet to see if processing of these is successful.
  * Stopped stupidly adding unnecessary carriage returns when inserting image dimensions.

1.18 - Released 03-Apr-97:
  * Removed bug which caused unnecessary warnings if an anchor tag contained both "href=" and "name=" attributes.

1.17 - Released 05-Feb-97:
  * Fixed bug which involved pages whose local references started with the "/" character.

1.16 - Released 23-Jan-97:
  * Added knowledge about mailto: and gopher: directives, so these are no longer flagged as warnings (as long as they are in lower case).
  * References to things in directories with "cgi-bin" in their paths are treated less severely.
  * References to directories are now treated more sensibly. However errors involving the directory not being present are more likely to cause HTMLScan internal problems.
  * Made entity checking case sensitive and removed an illegal entity or three.

1.15 - Released 14-Dec-96:
  * Throwback implemented.

1.14 - Released 25-Nov-96:
  * Fixed problems with the <a name="name"> construct which has no </a> ending tag. !HTMLScan now knows this.

1.13 - Released 21-Nov-96:
  * Characters such as ", &, >, and < now have their entity equivalent indicated by !HTMLScan when they are found.

1.12 - Released 20-Nov-96:
  * Added a huge list of entities and options for !HTMLScan to check all the entities in the document for ones that are not known to it.
  * Characters such as ", &, >, and < are now queried as they would be better expressed as entities.
  * Problems with the <FORM> tag resolved.

1.11 - Released 13-Nov-96:
  * Incorrect command-line options in "Desc" file changed.
  * Problems with the <QUOTE> tag resolved.

1.10 - Released 10-Nov-96:
  * !HTMLScan now copes with files whose paths are not in quotation marks, provided the path name stays within the restricted character set available when quotes are not used, i.e. 0-9, A-Z, a-z, '.' and '/'.
  * A dump of any unmatched tags is now made at the end of the scan. This should make the task of tracking down unclosed tags easier.
  * More checking is now performed on <CENTER> and <QUOTE> tags.

1.09 - Released 08-Nov-96:
  * Added the extended command line functionality provided by Acorn's "DDEUtils" module to the program.
  * Changed the internal storage format of tags to make it easier to add new tags. This should make tracing backwards through the tag-stack to find a tag matching a missing one easier to implement.
  * Added dozens of new tags found during my research for ZapHoTMeaL.
  * <META> and <TITLE> tags are now only allowed in the header.
  * <TT> tag added to list queried if strict checking is enabled.

1.08 - Released 20-Oct-96:
  * Added ability to follow "HRef="s in anchor tags.
  * Added switch to turn the above feature off.
  * Added checking of "Background=" parameters of <BODY> tags.
  * Added switches to control the reporting of non-local "HRef="s, "Src="s and "Background="s.
  * Strict checking now includes warnings about missing "alt=" parameters in image tags, and missing "Text=", "BGColor=", "Link=", "VLink=" and "ALink=" parameters in <BODY> tags which use background images.
  * Added switch to make "Be very strict" mode optional.
  * Cured bug causing occasional failure to find 'Src=' files if dozens of them had already failed to be located.
  * Tidied up a number of the reported messages.
  * More options have been changed to their opposites. Sorry if this causes angst amongst users who are using batchfiles. Once more, this makes the command-line syntax more sensible.
  * <HTML>, <BODY> and <TITLE> now all need to be missing for a fatal error to be generated.
    This is now trying to be especially kind to errant files with poor headers.

1.07 - Released 13-Oct-96:
  * Queried <I> and <B> tags as some people have requested a strict mode where these tags are faulted as being too specific in their nature, with <EM> and <STRONG> tags being recommended in preference.
  * "-v" [verbose] option changed to its opposite "-q" [quiet] partly to benefit command-line users, and partly in order to reduce the length of the command-line call which can cause problems if both your !Scrap directory and !HTMLScan are buried deep in the directory structure. Those who are upgrading are advised to delete their !Choices file because of this change.
  * Added start up message to tell people that the program is alive and well.
  * Added "Processing" file line to output so when processing multiple files, accessing the command line is not needed when trying to find out which output window relates to which file.
  * <BODY> and <HTML> now both need to be missing for a fatal error to be generated. This is now in line with the specification for HTML.
  * "Mismatched tags at end of file" warning replaced by a more specific message with the number of tags involved listed.

1.06 - Released 06-Oct-96:
  * Corrected problems with some GIFs giving 'Unable to locate expected comma in GIF file' errors.

1.02 - Released 14-Sep-96:
  * Corrected messages to remain agnostic with respect to differing <P> conventions.
  * Corrected bug associated with images higher in the directory tree than the source HTML file (i.e. paths with "../" structures).
  * Template file made more conventional by filling its buttons.

1.01 - Released 04-Sep-96:
  * Added support for <CENTER>, <P> and <STRONG> tags.
  * Used "Squeeze" instead of proprietary compression because of the possibility of StrongARM related problems.

1.00 - Released 01-Aug-96:
  * First version.

Enjoy
__________
 |im |yler   The Mandala Centre - tt@cryogen.com - http://www.mandala.co.uk