- !LinkCheck 1.30ß 07Jan99
-            1.32  21Feb99
-
- Checks a local directory containing a web-site to verify the existence
- of all files referenced in hypertext links and image tags, eg:
- <A HREF=...>, <AREA HREF=...>, <BODY BACKGROUND=...>, <IMG SRC=...>,
- <IMG USEMAP=..>, <FORM ACTION=..>, <INPUT NAME="linkto" VALUE=..>,
- & <FRAME SRC=..>; tabling a “connectivity” map.
-
- Operating Instructions (please read! ;-)
- ======================
-
- To achieve full functionality, LinkCheck needs four options files:
- a BASIC file called !Choices, a text file called !SrcDiry, a text
- file called !BaseURLs, and a text file called !NoFollow.
- A default !Choices file is supplied, and LinkCheck WILL run with
- NONE of these present, so you may well be able to skip setting these up,
- and get straight on with using this application “as is”.
-
- But you are recommended to refer to the descriptions of these, which
- are at the end of these instructions... at least eventually!
-
- Running/installing
- ------------------
- To Run !LinkCheck, double-click on its icon, and it will install itself
- on the icon-bar.
-
- The icon-bar menu
- -----------------
- If you click Menu on the icon-bar icon, you will get a standard menu:
- the first and last items are the usual “Info” and “Quit” items.
-
- Clicking on the second item “Help...” brings up this instructions file.
-
- The third item “Options =>” leads to a window showing the current
- settings for maximum numbers and sizes of files that can be processed,
- and of any special requirements to check FORMs-submissions against.
- If these need amending, see the section “!Choices/Options” at the end
- of these instructions.
-
- If a valid !SrcDiry file is present, containing the full path name of
- a directory containing a site (or an HTML file in it): clicking on the
- fourth option (the leafname of the above path name) will analyse it.
- Otherwise it is the greyed-out word “!SrcDiry”: to set this up, see the
- “!SrcDiry” section at the end of these instructions.
-
- The icon-bar icon
- -----------------
- If you click (Select) on the icon-bar icon, a “drop target” window
- appears.
-
- Later on in a session, if you click Adjust on the icon-bar icon,
- it will offer to re-analyse the last thing it did;
- this is intended for when you’ve done one run, corrected some errors,
- and want to re-run to check that the errors have now been removed,
- but should NOT be used if you have added any NEW files to the site.
-
- Now to actually get it to do something:
- ======================================
- There are three ways of starting up:
-
- [1]
- Drag EITHER the directory icon of the directory containing your site,
- OR an HTML file within that directory; on to any of:
- (a) the icon-bar icon, or
- (b) the “drop target” window if it is showing, or
- (c) (later on in a session) any LinkCheck window that’s open.
-
- [2]
- To re-do a previously-analysed directory, Adjust on the icon-bar icon.
-
- [3]
- To analyse the directory specified inside !SrcDiry (if that has been
- set up), click Menu on the icon-bar icon, and select the fourth item.
-
- IF "!SrcDiry" contains the pathname of the "root" directory holding the
- full site,
- AND the directory [or file] you drag in is [or is in] a sub-directory
- of that root:
- THEN you are offered the option to process the sub-directory
- "as a sub-site" (meaning that although it will only process pages inside
- the sub-directory, it will follow links/references outside that directory
- to anywhere in the "root");
- OTHERWISE it will only check everywhere inside the sub-directory
- but nowhere outside it
- (and so flag errors for all links to outside the sub-directory).
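
The IF/THEN/OTHERWISE rule above can be sketched in Python. This is only
an illustration, not LinkCheck's own code: the function name choose_scope
and its arguments are hypothetical, the dragged item is assumed to be a
directory, and POSIX-style paths stand in for RISC OS pathnames.

```python
from pathlib import PurePosixPath


def choose_scope(root, dragged, as_sub_site):
    """Return (pages_dir, links_dir) as strings.

    pages_dir is where pages are read from; links_dir is how far a link
    may reach before it is flagged as an error.
    """
    root_p = PurePosixPath(root)
    dragged_p = PurePosixPath(dragged)
    # True only if the dragged directory lies somewhere below the root.
    inside_root = root_p in dragged_p.parents
    if inside_root and as_sub_site:
        # Sub-site: read pages from the sub-directory only, but allow
        # links to anywhere in the root.
        return str(dragged_p), str(root_p)
    # Otherwise anything outside the dragged directory is an error.
    return str(dragged_p), str(dragged_p)


print(choose_scope("/site", "/site/reviews", True))
```

With the sub-site option declined (or with no usable root), both values
are the dragged directory, which is why links leading outside it get
reported as errors.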
-
- A window will appear saying “Cataloguing...” (and an hourglass until
- it has finished).
- This happens regardless of whether it was a file or a directory that
- was dragged in.
- If the site directory contains sub-directories, these will be indexed
- recursively.
- Any corrupt files that get encountered will pop up a "warning box":
- this box will self-cancel if you do not acknowledge it within 5 seconds;
- but all the warnings are logged and appended to the "Catalog" file.
-
- That window then gets replaced by the
-
- LinkCheck Control window
- ------------------------
- The first three items show:
- the path-name of the file or directory,
- the number of HTML (and .map) files, and
- the total number of files in the directory
-
- [If you had chosen the "sub-site" option, the first number is of
- .html (and .map) files within the sub-site only; the second number
- is the total of all files everywhere in the main root directory].
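
As a rough picture of what the cataloguing step produces, here is a
minimal Python sketch. It is an assumption-laden illustration rather than
the program's own method: the function name catalogue is hypothetical,
pages are recognised by file extension (the real application may well go
by filetype), and pages are numbered before other files to match the
layout of the Matrix described later.

```python
import os


def catalogue(site_dir):
    """Walk site_dir recursively.

    Returns (numbered, n_pages): a list of (file number, relative path)
    pairs with HTML/.map files first, and the count of those pages.
    """
    page_exts = {".html", ".htm", ".map"}  # assumption: go by extension
    pages, others = [], []
    for folder, _subdirs, names in os.walk(site_dir):
        for name in sorted(names):
            rel = os.path.relpath(os.path.join(folder, name), site_dir)
            ext = os.path.splitext(name)[1].lower()
            (pages if ext in page_exts else others).append(rel)
    numbered = list(enumerate(pages + others, start=1))
    return numbered, len(pages)
```

The two numbers shown in the Control window would then correspond to
n_pages and len(numbered).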
-
- There are then seven tickable option buttons:
-
- [ ] Do this page only : only available if a FILE was dragged in;
- (if it is, it will be pre-ticked).
- [ ] Follow #fragments : eg as in HREF=page2.html#para3: this adds to
- the processing time, so should be left unticked
- the first time you check a potentially error-
- ridden directory, to reduce output ;-)
- [✓] Check TARGET dest : check that the TARGET given in <A HREF=...>
- has been declared in a <FRAME NAME=...>
- (this MAY give spurious error reports if the
- pages don't get processed in the right order).
- [ ] Check image dimns : both for their presence, and being correct:
- there is no significant increase in processing
- time, but this can generate a lot of output if
- your IMG WIDTH/HEIGHT attributes are missing!
- [ ] Verbose reporting : in the Report file generated: this will name
- every HTML/.map file that it processes even if
- error-free; and also notes any “off-site” URLs
- (unless they are in an enabled “NoFollow” list)
- so don’t turn it on for pages full of Links!
- [ ] Use NoFollow list : only available if !NoFollow exists: any URLs
- in that file will not be commented on during
- verbose reporting.
- [ ] Ignore fName case : allows for “name.ext” in the HTML, but
- “NAME/EXT” as the local filename (as perhaps
- a site directory in a DOS partition).
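
To illustrate the last option, a small Python sketch of the comparison it
implies. This is a guess at the behaviour, not the actual code: the name
same_file is hypothetical, and the sketch assumes the "/" in a local
leafname always stands for the "." in the HTML reference, with only the
case-folding controlled by the option.

```python
def same_file(href_name, local_name, ignore_case=False):
    """Compare a name from the HTML with a local leafname."""
    # Assumption: the local filer shows "/" where the HTML has ".".
    normalised = local_name.replace("/", ".")
    if ignore_case:
        return href_name.lower() == normalised.lower()
    return href_name == normalised


print(same_file("name.ext", "NAME/EXT", ignore_case=True))
```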
-
- The [Catalog...] button brings up a menu allowing you to View, Print,
- or Save the catalogue that has just been indexed.
- You will need this index to interpret the “Matrix” later; but there’s
- no need to get the index out immediately:
- If you JUST want this catalogue (do not intend to do a full analysis),
- now’s the time to do it; but if you DO intend to carry on with the full
- analysis, the opportunity to access the Catalog will be repeated later.
-
- The [Cancel] button does just that.
-
- Click on [Analyse] to start the process; this produces the
-
- LinkCheck Analysis window
- -------------------------
- The first three long display lines show:
- the name of the file currently being processed;
- the tag being commented upon (if any) and its line number;
- any comment (in blue) or error (in red) it has found.
-
- At bottom left are running counts of the number of files processed, and
- the cumulative number of errors found (“comments” are not counted).
-
- The hourglass is displayed during the scanning process.
-
- Although there is no “Stop” button, if you realise you’ve started
- something you didn’t intend to, you can interrupt the process by
- pressing the [Esc] key. Obviously, any Report file that was being
- generated will be incomplete, and no Matrix file is generated.
-
- When it’s finally finished, it will summarise any instances of
- pages leading nowhere, unreferenced files, and failed links.
- Note that a reference to "/cgi-bin/whatever" will usually generate
- a "failed link" error (but "http://www.domain/cgi-bin/..." wouldn't).
-
- Then four more buttons appear:
- [Close] [View...] [Print..] [Save...]
-
- Clicking on [Close] merely closes the window (I bet you guessed that ;-)
-
- Inspecting/saving the results
- -----------------------------
- Clicking on any of the other three brings up a three-item menu:
- Catalog
- Report
- Matrix
-
- “Catalog” refers to a list of all the files in the site directory;
- it also shows the “file numbers” which are VITAL for
- interpreting the “Matrix” output!
-
- “Report” refers to the report of all errors found (and possibly
- comments too, if Verbose reporting was selected).
-
- “Matrix” is the “connectivity map” showing all links between all files,
- highlighting any unconnected files, and counting failed links.
-
- When you select a file, what happens depends on the previous button:
-
- [View...] just throws the relevant file into your configured text editor;
-
- [Print..] simply bangs ASCII text out of the parallel printer port
- (none of this fancy PrinterDriver_InscrutableOp stuff ;-)
- however, it will turn condensed on if printing a wide matrix.
-
- [Save...] produces a “Save as” dialogue box, which is not quite standard
- in that
- (a) it’s not transient, so doesn’t disappear when you go to
- open a directory to drag the text icon to; and
- (b) it doesn’t have one of those [OK] buttons which merely
- generate an error telling you you’re an idiot to click on it!
-
- So you must drag the textfile to a destination directory
- (you may edit the suggested leafname first if you want to).
-
- However, saving is actually done by using *Copy, so it
- does NOT implement application-to-application transfer.
-
- Note that all three of the above files can always be found by
- SHIFT-double-clicking on the !LinkCheck icon to open its directory.
-
- The connectivity Matrix
- -----------------------
- [If only a single page was analysed, this report summarises all the
- links from that page, but there is no “matrix” as such; the following
- description only applies when a whole site directory has been analysed]
-
- If there are n HTML files, and N files in total:
-
- The top line says “\Fr” (meaning “from”), followed by the numbers
- 1 to n, followed by “To”.
-
- The left-hand column is headed “To\”, followed by the numbers 1 to N
- (with a gap after the first n, to separate HTML files from “others”).
-
- Note that nowhere is there any mention of actual fileNAMES; you must
- refer to the “Catalog” to decode the numbers (basically, there just
- isn’t room to squeeze full names into a potentially very wide table).
-
- The number in each cell of the table is the number of times there is
- a reference in the file with that column number to the file with that
- row number.
-
- The (n+1)th column contains the totals for each row, ie the number of
- times the file in that row has had a reference to it.
- If a total of zero occurs, it will be asterisked, because it means that
- that page or file is never accessed or referenced.
-
- The (N+1)th row contains the totals for each column, ie the number of
- references from the file in that column (there are also subtotals
- after the nth file, which should include all on-site hypertext links,
- but exclude references to IMaGe SouRCes)
-
- Again, any totals of zero are highlighted, because that means that the
- page doesn’t lead anywhere (not even back to the index page).
-
- Below that there is a row labelled “Bad” enumerating the number of
- failed links from that page: these should all be zero!
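
The row and column totals described above can be reproduced with a short
Python sketch. It only illustrates the arithmetic: the names counts and
n_html are hypothetical, and counts[row][col] is assumed to hold the
number of references from HTML file col+1 to file row+1.

```python
def matrix_totals(counts, n_html):
    """Compute the totals that border the connectivity matrix.

    counts has N rows (all files) and n columns (HTML files only).
    """
    n_cols = len(counts[0])
    # (n+1)th column: how often each file is referred to.
    row_totals = [sum(row) for row in counts]
    # Subtotals after the nth row: references to HTML files only.
    col_subtotals = [sum(counts[r][c] for r in range(n_html))
                     for c in range(n_cols)]
    # (N+1)th row: references made by each HTML file.
    col_totals = [sum(row[c] for row in counts) for c in range(n_cols)]
    # Zero totals are the ones that get highlighted.
    unreferenced = [r + 1 for r, t in enumerate(row_totals) if t == 0]
    dead_ends = [c + 1 for c, t in enumerate(col_totals) if t == 0]
    return row_totals, col_subtotals, col_totals, unreferenced, dead_ends


# Two HTML files (columns) and three files in total (rows).
example = [[0, 0],
           [2, 0],
           [1, 0]]
print(matrix_totals(example, n_html=2))
```

In this example file 1 is never referenced and page 2 leads nowhere, so
both would be flagged in the summary.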
-
- In the penultimate line, the “column heading” is repeated.
-
- Finally, a summary of frame-names (if any) and where declared,
- pages leading nowhere, unreferenced files, and failed links.
-
- If it is a framed site, “failed links” includes instances where an
- anchor has a TARGET= attribute, but the frame name could not be found
- in a <FRAME> tag; this can occur erroneously if the frame-defining page
- is not “index”, and does not get analysed until after the page containing
- the TARGET.
-
- Also in framed sites, the “index” page probably has no references to it:
- this would be normal, but reported by !LinkCheck as if it were an error.
-
-
- Setting up the four Options Files
- =================================
- Any or all of these can be missing, and the program will still work; but
- setting them up will enable you to get the most out of the application.
-
- File “!Choices”
- --------------
- If this is absent, internal default options will be used.
-
- If present, it is a short BASIC LIBRARY procedure which initialises
- some parameters; it is not necessary for all parameters to be set.
-
- Once LinkCheck is running, you can read the settings by clicking Menu
- on the icon-bar icon, and moving across the item “Options =>”.
-
- If you want to edit it: this is a BASIC file, so you need to
- SHIFT-double-click on it to put it into an editor.
-
- The first three are used to determine the size of arrays inside:
- “max1%=” sets a maximum to the number of HTML and “.map” files, and
- “max2%=” sets the maximum total number of files to be expected:
- the default values for these are 51 and 255 respectively, and I do
- not normally recommend that max1% be increased beyond about 85,
- although you may do if necessary;
- “maxF%=” sets a limit to the largest file-size (in bytes) of an HTML
- or “.map” file that can be loaded and analysed; the default is 72K.
-
- If your site needs these values increasing, then you should do so;
- but if the existing values are large enough, increasing them will
- merely waste memory!
-
- Please note that if you alter the values in “!Choices”, the new
- values will not take effect until after you have quit and re-run.
-
- If these values are increased greatly, it may become necessary to
- also increase the WimpSlots in the “!Run” file.
-
- The next three are used for parameters for server-side reply form
- processing; you can ignore them if they are not relevant to you.
-
- “formMethod$=” and “formAction$=” specify “trigger” values for the
- METHOD= and ACTION= attributes within a <FORM ...> tag: these are
- set to the values required to access the server-side form-decoder
- (which for Argonet are "GET" and "http://www.argonet.co.uk/cgi-bin/mail"
- respectively), and if present, subsequent <INPUT ...> tags are searched
- for a NAME= attribute whose value matches that specified by the last
- parameter “formName$=” (whose value for Argonet is "linkto").
- (If you haven’t understood the above technobabble, don’t worry :-)
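
For the curious, the trigger logic can be sketched in Python using the
standard html.parser module. This is a hedged illustration only: the
class name FormLinkFinder and its attributes are hypothetical, and the
real program, being written in BASIC, certainly does not work this way
internally.

```python
from html.parser import HTMLParser


class FormLinkFinder(HTMLParser):
    """Collect the VALUE of each <INPUT NAME=form_name> tag found
    inside a <FORM> whose METHOD and ACTION match the trigger values."""

    def __init__(self, form_method, form_action, form_name):
        super().__init__()
        self.trigger = (form_method.lower(), form_action)
        self.form_name = form_name
        self.in_trigger_form = False
        self.link_targets = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser passes tag and attribute names in lower case.
        a = dict(attrs)
        if tag == "form":
            found = ((a.get("method") or "").lower(), a.get("action"))
            self.in_trigger_form = (found == self.trigger)
        elif tag == "input" and self.in_trigger_form:
            if a.get("name") == self.form_name and a.get("value"):
                self.link_targets.append(a["value"])

    def handle_endtag(self, tag):
        if tag == "form":
            self.in_trigger_form = False


finder = FormLinkFinder("GET", "http://www.argonet.co.uk/cgi-bin/mail",
                        "linkto")
finder.feed('<FORM METHOD="GET" '
            'ACTION="http://www.argonet.co.uk/cgi-bin/mail">'
            '<INPUT TYPE="hidden" NAME="linkto" VALUE="thanks.html">'
            '</FORM>')
print(finder.link_targets)
```

Each collected value (here the made-up page "thanks.html") would then be
checked like any other link.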
-
-
- File “!SrcDiry”
- --------------
- If you have one particular web-site directory that you will want to
- check regularly (eg the local copy of your own site, as you update it),
- its pathname should be in this text-file.
-
- An empty !SrcDiry file is supplied; double-click on it to load it
- into your text editor. You could then type in the full pathname of
- the directory; but you can simplify this (in !Edit) by SHIFT-dragging
- the directory icon on to the page.
-
- This file may be empty or even missing, and the program will still
- operate; but if you want to use the “sub-site” option, this file
- must exist and must contain the pathname of the “root” of the site.
-
-
- File “!BaseURLs”
- ---------------
- You should also create a textfile inside !LinkCheck called “!BaseURLs”
- containing the base URL(s) of your “real” site (up to ten of them).
-
- (This is so that the program can look at full absolute URLs and be
- able to “know” whether they refer to your site, in which case it will
- expect to be able to find the leafname locally; or whether it refers
- to a different site altogether, so there’s no point in looking!)
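
A minimal Python sketch of that decision, assuming a simple prefix test
(the function name classify_url is hypothetical, and the real matching
rules, eg for letter case, are not documented here):

```python
def classify_url(url, base_urls):
    """Return the site-relative part of url if it starts with one of
    the base URLs, or None if it belongs to a different site."""
    for base in base_urls:
        if url.startswith(base):
            return url[len(base):].lstrip("/")
    return None


bases = ["http://www.argonet.co.uk/users/protovale/"]
print(classify_url("http://www.argonet.co.uk/users/protovale/john.html",
                   bases))
```

A result other than None is a leafname to look for locally; None means
the URL is “off-site” and there is no point in looking.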
-
- If not required, this file may be absent or empty.
-
-
- File “!NoFollow”
- ---------------
- You may now have a text-file called "!NoFollow" inside the application.
- This can contain a list of up to 100 URLs or references which you do
- NOT want to be commented upon when checked but not found.
-
- There are now two circumstances under which reporting of failed links
- or missing files can be suppressed:
-
- "External", ie absolute URLs to other sites; and
- "Internal", ie on-site files which should be present, but might be
- (knowingly/deliberately) missing from your local site.
-
- These two filters are invoked separately according to context.
-
- However, the "filter template" specifications for both types are all put
- inside the one !NoFollow file (as was just used for external before):
-
- (a) Lines with an asterisk at the END are "external", eg
-
- ftp://*
-
- would suppress "Not local" messages for any URL beginning "ftp://"
- (the same as the original LinkCheck did).
-
- (b) Lines STARTING WITH an asterisk are "internal", eg
-
- *.cgi
-
- would suppress "Not found" messages for anything with a ".cgi" extension.
-
- (c) Lines with an INTERMEDIATE asterisk are also "internal", eg
-
- reviews/*.jpeg
-
- would suppress "Not found" messages for a reference which:
- (i) contains the string "reviews/" anywhere in it, /AND/
- (ii) ends with a ".jpeg" extension.
-
- (d) Lines with no asterisk at all are put into BOTH the "external"
- AND "internal" lists;
- but for suppression to happen, the file-ref must match exactly
- (and I've a sneaky suspicion this either won't work, or else
- may not be very useful ;-)
-
- (e) Lines with 2 or more asterisks:
- Please don't do this; I confidently expect it to crash ;-)
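
Rules (a) to (e) can be summarised in a short Python sketch. It is an
interpretation of the rules as written above, not the program's own
code; the names load_filters and suppressed are hypothetical, and where
the real program is expected to crash on two asterisks this sketch
simply refuses the line.

```python
def load_filters(lines):
    """Sort !NoFollow lines into (external, internal) filter lists.

    Each filter is a (kind, head, tail) tuple.
    """
    external, internal = [], []
    for line in lines:
        pattern = line.strip()
        if not pattern:
            continue
        stars = pattern.count("*")
        if stars > 1:                       # rule (e)
            raise ValueError("more than one asterisk: " + pattern)
        if stars == 0:                      # rule (d): goes in both lists
            external.append(("exact", pattern, ""))
            internal.append(("exact", pattern, ""))
        elif pattern.endswith("*"):         # rule (a)
            external.append(("prefix", pattern[:-1], ""))
        else:                               # rules (b) and (c)
            head, tail = pattern.split("*")
            internal.append(("contains_and_ends", head, tail))
    return external, internal


def suppressed(ref, filters):
    """True if the error report for ref should be suppressed."""
    for kind, head, tail in filters:
        if kind == "exact" and ref == head:
            return True
        if kind == "prefix" and ref.startswith(head):
            return True
        if (kind == "contains_and_ends" and head in ref
                and ref.endswith(tail)):
            return True
    return False


external, internal = load_filters(["ftp://*", "*.cgi", "reviews/*.jpeg"])
print(suppressed("pics/reviews/cover.jpeg", internal))
```

A leading asterisk is treated as case (c) with an empty first part, so
rules (b) and (c) share one test.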
-
- Note that it still CHECKS all links etc; the NoFollow filter
- merely suppresses the error reports:
- for example, one of my test runs produced a summary of
- "4 files, 0 Errors, 98 failed links" ;-)
- because it still fills in the "connectivity matrix".
-
- If not required, this file may be absent or empty.
-
-
- John Alldred 18Jan99
- john@protovale.co.uk
- http://www.protovale.co.uk/john/
- http://www.argonet.co.uk/users/protovale/john.html
-