CHECKURL Manual
CHECKURL.REX - Program to Check URLs (Windows and OS/2)
This program's basic task is to check URLs in an attempt to eliminate
404 errors and the like from your web pages.
It can be run under Windows 95/98/NT/W2K as well as OS/2 (rename to ".CMD").
The ppwizard "VALRURL.H" header file macros are one way that
URL lists can be created. Another is the Netscape "Create URL object"
option (or dragging and dropping pages from Netscape). When used with the
ppwizard header, the location in your source is identified (not the
location in the target html).
regina.exe CHECKURL.REX Command [CommandsParameters] [Options]
Use under Windows requires the
free Regina interpreter as well as
the contents of the enclosed, free "W32RXSCK.ZIP" for sockets
support (unzip its contents into the current or Windows directory).
CHECKURL.CMD Command [CommandsParameters] [Options]
The native interpreter is used under OS/2; the only requirement is that
you rename "CHECKURL.REX" to "CHECKURL.CMD".
This program supports a large number of commands and options, most of
which you will not be interested in. You may wish to look at the example
below, which is the batch file I used when checking all my URLs.
Options may contain text of the form "{x??}", where "??" is the
hexadecimal value of the character you wish substituted.
You will not need to do this very often, but when you do you will
probably wish to use "{x20}" for the space character.
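The substitution can be sketched in Python (a hypothetical helper for illustration, not part of CHECKURL itself, which is written in REXX):

```python
import re

def decode_hex_escapes(text):
    # Replace each "{x??}" marker with the character whose hexadecimal
    # value is "??"; for example "{x20}" becomes a space.
    return re.sub(r'\{x([0-9A-Fa-f]{2})\}',
                  lambda m: chr(int(m.group(1), 16)), text)

print(decode_hex_escapes("two{x20}words"))  # two words
```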
The program takes a number of commands as follows:
- VERSION?
The program will return (not displayed) its version number.
- SOCKETVERSION?
The program will return (not displayed) the version of
"RxSock.DLL" in use (for "http" checks).
- FTPVERSION?
The program will return (not displayed) the version of
"RxFtp.DLL" in use (for "ftp" checks).
- SOCKETREADY?
The program will return (not displayed) "OK" if "http"
checking is possible or a reason why not.
- FTPREADY?
The program will return (not displayed) "OK" if "ftp"
checking is possible or a reason why not.
- CHECK1URL OneUrl
The program will return (not displayed) "OK" if the URL
(which begins with "http://" or "ftp://")
can be accessed, or a reason why it couldn't.
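For "http" URLs, a check of this sort amounts to requesting the page and inspecting the server's response code. A rough Python sketch of that decision (the function name and details are assumptions based on this description; the real program works through RxSock):

```python
def classify_response(status_line, ok_responses=()):
    # Given the first line of an HTTP reply, return "OK" if the URL is
    # considered reachable, otherwise a short failure reason.  Codes
    # listed in ok_responses are treated as successes (compare the
    # "/OkResponses" switch below).
    parts = status_line.split()
    if len(parts) < 2 or not parts[1].isdigit():
        return "Malformed response: " + status_line
    code = parts[1]
    if code.startswith("2") or code in ok_responses:
        return "OK"
    return "Server returned response code " + code

print(classify_response("HTTP/1.1 200 OK"))         # OK
print(classify_response("HTTP/1.1 404 Not Found"))  # Server returned response code 404
```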
- CHECKLISTEDURLS [+]UrlFileMask1 ...
The program will read the file or files specified and sort the
URLs listed.
If the file mask is preceded by "+" then all subdirectories
below the one specified are also processed.
A file that ends in '.url' and whose first line starts with '['
is considered to be a Windows URL shortcut and the URL on the line
that begins with 'URL=' is processed and all other lines are ignored.
This option also allows OS/2 WPS URL objects to be tested,
just open up the properties to determine the path.
Within each file, blank lines and lines that begin
with ';' are ignored, as are leading and trailing whitespace
on all lines and duplicate URLs.
The progress of URL checking is displayed on screen and the
return code is the total number of failing URLs.
It is highly recommended that you use the "/CheckDays"
switch even if you give it a value of 1 (recheck every day).
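The file-reading rules above can be sketched in Python (a hypothetical helper; the real program is REXX):

```python
def read_url_file(path):
    # Collect URLs from one list file: blank lines and ';' comment
    # lines are skipped and whitespace is trimmed.  A file ending in
    # '.url' whose first line starts with '[' is a Windows URL
    # shortcut, and only its "URL=" line is used.
    with open(path) as f:
        lines = [ln.strip() for ln in f]
    if path.lower().endswith('.url') and lines and lines[0].startswith('['):
        return [ln[4:] for ln in lines if ln.startswith('URL=')]
    return [ln for ln in lines if ln and not ln.startswith(';')]
```

Deduplication and sorting would then be applied across all files, as described above.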
- CHECKURLSINHTML [+]HtmlFileMask1 ...
The program will read the file or files specified. The files are
scanned for URLs, otherwise it functions pretty much like
the "CHECKLISTEDURLS" command.
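A rough sketch of the HTML scan, assuming URLs are picked out of href/src attributes (the exact extraction rules CHECKURL uses are not documented here):

```python
import re

def urls_in_html(html):
    # Find http/ftp URLs quoted inside href or src attributes;
    # duplicates are removed and the result sorted, as with
    # CHECKLISTEDURLS.
    found = re.findall(
        r'''(?:href|src)\s*=\s*["']((?:http|ftp)://[^"']+)["']''',
        html, re.IGNORECASE)
    return sorted(set(found))

print(urls_in_html('<a href="http://a.example/">a</a> '
                   '<a href="http://a.example/">again</a>'))
# ['http://a.example/']
```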
- /MemoryFile[:NameOfFile]
A memory file is used to hold details about checked URLs.
It becomes really useful if the "/CheckDays" option is also used
as this program will then not retest all URLs but can be more
selective (for example not retest URLs you only checked yesterday).
- /CheckDays[:OkUrlMinAge[-OkUrlMaxAge]]
This will determine how long ago is too long since the last
successful URL check.
For example setting this to 14-14 means that a URL which tested OK
today will not be retested for 2 weeks.
You must have used the "/MemoryFile" switch to specify a file.
If you had said "14-21" then for each URL a random day
between 14 and 21 would be chosen.
This allows the URL checking to spread gradually, so that you are
not doing the full check every two weeks.
Note that once a URL has failed this checking is bypassed; the
next time you run this program it will always retest URLs that
failed on the previous run.
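The recheck decision described above might look like this in Python (names and the exact randomization are assumptions based on the description):

```python
import random
from datetime import date, timedelta

def needs_recheck(last_ok, failed_last_time, min_age, max_age, today=None):
    # A URL that failed last run (or was never checked) is always
    # retested; otherwise a recheck day is drawn at random between
    # min_age and max_age days after the last successful check, which
    # spreads the load so the whole list is not rechecked at once.
    today = today or date.today()
    if failed_last_time or last_ok is None:
        return True
    recheck_after = random.randint(min_age, max_age)
    return today >= last_ok + timedelta(days=recheck_after)

# With "/CheckDays:14-14" a URL that passed on Jan 1 is due on Jan 15:
print(needs_recheck(date(2000, 1, 1), False, 14, 14,
                    today=date(2000, 1, 20)))  # True
```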
- /Exclude[:[+]UrlFileMask]
This switch is only useful when the CHECKLISTEDURLS
command is also used. You can use this switch as many times as
you wish, and order can be important (when input filemasks are mixed
with exclude masks).
- /ForgetDays[:NumberOfDays]
By default old URLs are dropped from memory if not referred to for a
significant period of time.
This option can be used to specify how old a URL can become, or to
turn off the dropping altogether (by not specifying a value).
- /ReadTimeOut[:NumberOfSeconds]
Allows you to specify the maximum number of seconds we will wait
for a server to respond to a request.
A blank value will restore default.
- /TimeOutRetry[:NumberOfSeconds]
If a URL check fails with a timeout, you may specify that these
URLs be retried after URL checking has completed.
A value of 0 means you don't wish a retry; otherwise
the value represents the timeout value in seconds that should be
used.
A blank value will restore the default.
It may be wise to increase the value past that used on the
"/ReadTimeOut" switch.
- /GetEnv:NameOfEnvVar
Allows you to pick up options from an environment variable.
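A minimal sketch of such an environment lookup, assuming options are whitespace-separated (which is why "{x20}" exists for embedding a space inside a value):

```python
import os

def options_from_env(var_name):
    # Return extra command-line options taken from an environment
    # variable; an unset variable simply yields no extra options.
    return os.environ.get(var_name, '').split()
```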
- /ErrorFile[:[+]NameOfFile]
Create new or append to old error file.
This file will hold the complete list of failing URLs along with
the reason for the failure in a format that this program can
accept as input (rename file first!).
This is probably less useful if "/CheckDays" is specified.
- /FtpEmail:YourEmailAddress
When checking FTP addresses this value is used as the password for
the "Anonymous" user; if not supplied, a default (obviously
incorrect) value is used.
- /TestUrl[:Url]
I don't know of any way to tell whether the network is
available. By default a known internet URL is checked; as this
is expected to exist, finding it means you must have access to the
internet. You can either change the URL (if testing intranet URLs etc.)
or turn this check off altogether.
- /MemoryBackupLevel[:Level]
This determines how many memory file backups are kept. A value
of 0 turns off backups otherwise a value of 1 to 9 is required.
- /OkResponses:OkResponseList
Some pages/servers return annoying response codes which you
know are "OK"; I know of a page that returns "500" and another
that actually returns "404".
The most common return codes that would be ignored are "301" and
"302".
The parameter on this switch is the name of a file.
Blank lines or lines starting with ';' are ignored and all
leading and trailing whitespace is removed.
Each remaining line has two or three parameters separated by
one or more spaces.
There are actually a number of different command types.
- /IgnoreFor:NumberOfDays
This command will output an "IGNORE" command into the error file
suitable for cut+paste into a "/OkResponses" file.
You specify the period of time (in days) that you would like to
ignore this problem.
Quite frequently problems will come and go; where you feel this is
a possibility you can simply cut+paste the text rather than being
forced to enter the command yourself...
- /PageMoved:PageMovedText
Allows you to choose some text which from experience you know
is included in pages that have moved.
Some sites will not return server 301/302 codes to indicate
that a page has been moved.
Note that you will need to use "{x20}" to represent spaces.
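A sketch of this body-text test (hypothetical helper; case-insensitive matching is an assumption):

```python
def page_moved(page_text, moved_markers):
    # True if the (possibly truncated) page body contains any of the
    # configured "page moved" phrases; intended for servers that
    # return 200 instead of a 301/302 redirect.
    body = page_text.lower()
    return any(marker.lower() in body for marker in moved_markers)

print(page_moved("<p>This page has MOVED to ...</p>", ["has moved"]))  # True
```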
- /MaxPageLng:NumberOfCharacters
This program will keep reading html up until the page becomes
longer than the value specified here. Note that making this value
too small can affect the detection of page movement where the server
does NOT return 301 or 302 response codes.
- /CheckPoint:HowOften
How often (after how many URLs) a checkpoint to the
"memory file" is made.
If you test 200 URLs you don't want to start from
scratch if something "happens".
- /HttpUserAgent[:SimulateWhichBrowser]
This allows you to make the checking program look like a
specific browser to the server.
The server will see this information as "HTTP_USER_AGENT".
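For illustration, a raw HTTP request carrying the chosen value might be composed like this (a sketch only; the real program sends its request via RxSock):

```python
def build_http_request(host, path, user_agent):
    # Compose a raw HTTP/1.0 request; the server-side CGI environment
    # exposes the User-Agent header value as HTTP_USER_AGENT.
    return ("GET %s HTTP/1.0\r\n"
            "Host: %s\r\n"
            "User-Agent: %s\r\n"
            "\r\n" % (path, host, user_agent))
```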
CHECKURL Environment Variables
- CHECKURL_DEBUG=[+]NameOfFile
Create new or append to old debug file.
This file will hold much more detail about the program's internal
workings.
This will slow down URL checking.
If reporting problems with this command please send me the debug
output.
- CHECKURL_OPTIONS=Options
This allows you to specify command line options in the environment.
When a filename is required it may contain the text
"{Date}", "{DateNumbers}" or "{Time}"; these are replaced with
"YYMonDD", "yyyymmdd" or "hhmmss" respectively.
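The filename substitution can be sketched as follows (assuming strftime-style formatting, where "%b" gives the abbreviated month name for "YYMonDD"):

```python
from datetime import datetime

def expand_filename(name, now=None):
    # Replace "{Date}", "{DateNumbers}" and "{Time}" with "YYMonDD",
    # "yyyymmdd" and "hhmmss" respectively.
    now = now or datetime.now()
    return (name.replace('{Date}', now.strftime('%y%b%d'))
                .replace('{DateNumbers}', now.strftime('%Y%m%d'))
                .replace('{Time}', now.strftime('%H%M%S')))

print(expand_filename('errors-{DateNumbers}.txt',
                      datetime(2001, 3, 5, 7, 8, 9)))
# errors-20010305.txt
```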