Notes
Execution may be terminated using CTRL-C. The program exits as quickly as
it safely can. This can take a little while if it is processing large HTML
files at the time. As each task notices the CTRL-C, its status will change
to "*BREAK*".
If a particular host times out on more than 5 occasions, no further
attempts are made to download any files from that host. This avoids wasting
time attempting to connect to a host that is down.
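The host cut-off described above can be pictured with a short sketch. It is written in Python for illustration and is not tcpdl's own code; the class and method names are hypothetical, and only the threshold of 5 is taken from the notes.

```python
from collections import Counter
from urllib.parse import urlsplit

# From the notes: a host is given up on after MORE than 5 timeouts.
MAX_HOST_TIMEOUTS = 5


class HostTimeoutTracker:
    """Counts timeouts per host (hypothetical helper, for illustration)."""

    def __init__(self, limit=MAX_HOST_TIMEOUTS):
        self.limit = limit
        self.timeouts = Counter()

    def record_timeout(self, url):
        # Timeouts are counted per host, not per file.
        self.timeouts[urlsplit(url).hostname] += 1

    def is_abandoned(self, url):
        # True once the host has timed out more than `limit` times.
        return self.timeouts[urlsplit(url).hostname] > self.limit
```

A downloader built along these lines would call is_abandoned() before each connection attempt and skip the URL when it returns True.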
The file "TCPDLDIR:urllist"
The "urllist" file should contain one or more URLs which are to be downloaded.
Each URL should start on a new line, and be followed by the appropriate
flags, separated by spaces.
"urllist" supported flags
Dn
    Download n levels of text/html.
    DEFAULT: 255 (!)

Hn
    If another host is referenced by an "HREF", the maximum number of
    levels is set to n.
    DEFAULT: H0 (i.e. Current host only)

Pn
    If the path is other than that given in your urllist, the maximum
    number of levels is set to n.
    DEFAULT: Pn (Where n is the same as for Dn [see above])

TEXT
    Download files referenced by an "HREF" attribute. These will commonly,
    but not exclusively, be HTML files.

IMG
    Download files referenced by an attribute other than "HREF". These are
    commonly, but not exclusively, images.

ALL
    Download all types of files. This is the default if none of "IMG",
    "TEXT" or "ALL" has been specified. (Note that this will not download
    files with types that appear in IGNORE lines within tcpdl.config)
"urllist" examples
http://www.amiga.com/index.html D2 H3 TEXT
    ...will download 2 levels of HTML files referenced by the specified file
    from www.amiga.com, and 3 levels of links to any other host.

http://www.ramjam.u-net.com/home.html D5 H0 ALL
    ...will download 5 levels of files referenced by the specified file from
    www.ramjam.u-net.com, but will not download any files that are referenced
    on any other host.

http://www.ramjam.u-net.com/ TEXT
    ...will download all text files referenced by the default home page from
    the host www.ramjam.u-net.com.
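To make the flag defaults concrete, here is a minimal Python sketch of how one urllist line could be interpreted. It is an illustration only, not tcpdl's parser: the names UrlEntry and parse_urllist_line are hypothetical, and it assumes that flags are written in upper case as shown above and that an unknown flag is an error. How tcpdl treats a line that combines several type flags is not documented here, so the sketch simply records each one given.

```python
import re
from dataclasses import dataclass, field

LEVEL_FLAG = re.compile(r"([DHP])(\d+)")   # Dn, Hn or Pn
TYPE_FLAGS = {"TEXT", "IMG", "ALL"}


@dataclass
class UrlEntry:
    url: str
    depth: int = 255          # Dn default
    host_depth: int = 0       # Hn default (current host only)
    path_depth: int = 255     # Pn default follows Dn, set in the parser
    types: set = field(default_factory=set)


def parse_urllist_line(line):
    """Parse 'URL flag flag ...'; returns None for a blank line."""
    fields = line.split()
    if not fields:
        return None
    entry = UrlEntry(url=fields[0])
    path_depth = None
    for flag in fields[1:]:
        level = LEVEL_FLAG.fullmatch(flag)
        if level:
            letter, n = level.group(1), int(level.group(2))
            if letter == "D":
                entry.depth = n
            elif letter == "H":
                entry.host_depth = n
            else:
                path_depth = n
        elif flag in TYPE_FLAGS:
            entry.types.add(flag)
        else:
            # Assumption: unknown flags are rejected.
            raise ValueError(f"unknown flag: {flag}")
    # Pn defaults to the same value as Dn.
    entry.path_depth = entry.depth if path_depth is None else path_depth
    # ALL is the default when no type flag is given.
    if not entry.types:
        entry.types.add("ALL")
    return entry
```

Under these assumptions, the first example line yields a depth of 2, a host depth of 3, a path depth of 2 and the single type TEXT.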
The file "TCPDLDIR:tcpdl.config"
The "tcpdl.config" file is optional. If present, it will be read in and
the contents will be used in determining what file types will be
downloaded.
Any line with a hash ("#") in column 1 will be ignored as a comment.
White space is ignored, and the commands are not case-sensitive.
Currently the following configuration commands are supported:
"tcpdl.config" supported commands
IGNORE <suffix>
    Where suffix is a file suffix which should not be downloaded. Note
    that such files mentioned explicitly in urllist will be downloaded,
    but any such files referenced within html will not.
    Note that suffix may contain any characters except white space,
    but will only be matched against the end of a file name.

PROXY <proxyserver:port>
    This specifies that all HTTP requests should be sent via the
    specified server. If the port number is omitted, then a default
    of 8080 is used.
    By specifying your ISP's proxy server you can improve download
    speeds significantly - especially for busy sites.
    A proxy will also be required for connections via a firewall.

CONTIMEOUT <seconds>
    This specifies the initial timeout for each connection in seconds.
    The default is 20. The timeout must be within the range 10 to 600.

HTTPTIMEOUT <seconds>
    This specifies the timeout for each http request in seconds.
    The default is 60. The timeout must be within the range 10 to 600.

RETRIES <number>
    This specifies the number of attempts that will be made to download each
    file.
    The default is 5. The value must be within the range 1 to 100.

USER <mail address>
    The mail address is sent to the HTTP server as the address that
    mail can be sent to if there are problems caused by tcpdl's requests.
    If the USER option is specified without a mail address, then
    HTTP requests will not include any mail address (this can help
    maintain your anonymity). If the USER option is not present in
    the config file, then your current user id and host name are
    used.

TASKS <number>
    The number of URLs that will be downloaded concurrently. If this
    option is not specified then the default of 12 will be used.
    The valid values are 1 to 15. This value may be overridden by
    the TASKS command line option.

USERAGENT <string>
    The string specifies the user agent name that will be sent to the
    HTTP server. The string is assumed to start at the first non-blank
    character after the USERAGENT keyword, and to run until the end
    of the line - so it may contain spaces. This option allows tcpdl
    to appear as if it is some other HTTP client, which is necessary
    to access some (broken) sites which only accept requests from
    certain browsers.

NOWAIT
    This specifies that tcpdl will not wait for return to be pressed before it
    exits. This option is useful if tcpdl is run from within a script.

PUBSCREEN <name>
    This specifies the name of the public screen that the tcpdl status window
    should use. This can be overridden by the PUBSCREEN command line option.

FONT <name>
    This specifies the name of the font that the tcpdl status window should
    use. Note that the ".font" suffix should not be given, and for best results
    only monospaced fonts should be used. This option can be overridden by the
    "FONT" command line option.

FONTSIZE <value>
    This specifies the size of the font that the tcpdl status window should
    use. The height of the window will be adjusted appropriately. By default a
    font size of 9 will be used. This option can be overridden by the FONTSIZE
    command line option.

PRIORITY <number>
    This specifies the maximum priority to be used by the tcpdl tasks.
    Priorities must be in the range 0 to 5 inclusive. By default a priority of
    2 will be used. This option can be overridden by use of the PRIORITY
    command line option. Using a value of 0 may be useful if you want tcpdl to
    operate in the background while you are using a browser or IRC client.
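The IGNORE rule can be summarised in a few lines of Python. This is a sketch under stated assumptions rather than tcpdl's implementation: the function name and its parameters are hypothetical, and the comparison is assumed to be case-sensitive, which the text above does not specify.

```python
def is_ignored(file_name, ignore_suffixes, listed_in_urllist=False):
    """Return True if a referenced file should be skipped.

    Files named explicitly in urllist are always downloaded; otherwise
    each suffix is matched only against the end of the file name.
    """
    if listed_in_urllist:
        return False
    return any(file_name.endswith(suffix) for suffix in ignore_suffixes)
```

With the suffix ".lha", a file called "archive.lha" is skipped when it is only referenced from an HTML page, while "lha.html" is not, because the suffix does not appear at the end of its name.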
"tcpdl.config" example
#
# Specify the Demon Internet proxy server
#
PROXY www-cache.demon.co.uk:8080
#
# Specify the timeouts - small since we're using a proxy
#
CONTIMEOUT 10
HTTPTIMEOUT 30
#
# Specify the number of attempts for each file
#
RETRIES 2
#
# Specify that no mail address is to be sent to the server
#
USER
#
# Specify the number of URLs to be downloaded concurrently.
# This can be overridden by the TASKS command line option.
#
TASKS 12
#
# Specify the maximum priority to be used by the tcpdl tasks
#
PRIORITY 2
#
# Specify the FONT and FONTSIZE to be used
#
FONT Xen
FONTSIZE 9
#
# Specify the file suffixes not to be downloaded
#
# ignore lha archives
IGNORE .lha
# ignore zip archives
IGNORE .zip
# ignore .wav sound files
IGNORE .wav
# ignore MS-DOS executables
IGNORE .exe
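
The rules for reading a file like the one above (comments in column 1, case-insensitive keywords, documented defaults and ranges) can be sketched in Python as follows. This is an illustration, not tcpdl's parser: parse_config and the dictionary layout are hypothetical, and raising an error for an out-of-range value is an assumption, since the documentation does not say how tcpdl reacts.

```python
# keyword: (default, minimum, maximum), as documented above
NUMERIC = {
    "CONTIMEOUT": (20, 10, 600),
    "HTTPTIMEOUT": (60, 10, 600),
    "RETRIES": (5, 1, 100),
    "TASKS": (12, 1, 15),
    "PRIORITY": (2, 0, 5),
}
DEFAULT_PROXY_PORT = 8080


def parse_config(text):
    """Parse tcpdl.config-style text into a dictionary (sketch only)."""
    config = {name: default for name, (default, _, _) in NUMERIC.items()}
    config["FONTSIZE"] = 9
    config["IGNORE"] = []
    for raw in text.splitlines():
        if raw.startswith("#"):          # comment: hash in column 1
            continue
        parts = raw.split(None, 1)       # surrounding white space is ignored
        if not parts:
            continue
        keyword = parts[0].upper()       # commands are not case-sensitive
        value = parts[1].strip() if len(parts) > 1 else ""
        if keyword in NUMERIC:
            _, low, high = NUMERIC[keyword]
            number = int(value)
            if not low <= number <= high:
                # Assumption: out-of-range values are rejected.
                raise ValueError(f"{keyword} must be {low} to {high}")
            config[keyword] = number
        elif keyword == "FONTSIZE":
            config[keyword] = int(value)
        elif keyword == "IGNORE":
            config["IGNORE"].append(value)
        elif keyword == "PROXY":
            host, _, port = value.partition(":")
            config["PROXY"] = (host, int(port) if port else DEFAULT_PROXY_PORT)
        elif keyword == "NOWAIT":
            config["NOWAIT"] = True
        else:
            # USER, USERAGENT, PUBSCREEN, FONT: keep the rest of the line.
            # A bare USER gives "", meaning no mail address is sent.
            config[keyword] = value
    return config
```

Note that a key such as USER is absent from the result when the option is not present at all, which keeps "not specified" distinct from "specified without a mail address".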
The file "TCPDLDIR:tcpdlpp.config"
The tcpdlpp.config file is optional. If present, the character translations
defined in it are applied to each URL processed. Note that the actual file
names are NOT changed, but only the URLs within each html file.
Each line of this file consists of a character literal to be converted
and a character literal or string that should replace it. Character
literals should be enclosed by single quotes ('), and strings should
be enclosed by double quotes (").
White spaces (spaces and tabs) are ignored unless inside a string.
Any line where the first non-whitespace character is a hash (#) is
treated as a comment and ignored.
Certain escape characters are allowed in character literals and strings:
"tcpdlpp" escape characters
\a      bell
\b      backspace
\f      formfeed
\n      newline
\r      carriage return
\t      horizontal tab
\v      vertical tab
\\      backslash
\'      single quote
\"      double quote
\nnn    character with octal value nnn
\xnnn   character with hexadecimal value nnn
For example, the following line will convert MS-DOS style backslashes into
the AmigaDOS and Unix style forward slashes:
'\\' '/'
The following line will convert tilde into the safe "%xx" equivalent:
'~' "%7E"
Each character is translated using a single rule - even if the end result
includes a character which would have been translated by some other rule.
This allows two characters to be swapped over.
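
The single-rule behaviour described above can be sketched in Python. This is not tcpdlpp's code: the names unescape, parse_rules and translate_url are hypothetical, the sketch assumes one rule per line, and it assumes that "\nnn" and "\xnnn" take between one and three digits. Each character of the URL is looked up exactly once, so the output of one rule is never fed into another.

```python
import re

# Single-character escapes from the table above.
SIMPLE_ESCAPES = {
    "a": "\a", "b": "\b", "f": "\f", "n": "\n", "r": "\r",
    "t": "\t", "v": "\v", "\\": "\\", "'": "'", '"': '"',
}
ESCAPE = re.compile(r"\\(x[0-9A-Fa-f]{1,3}|[0-7]{1,3}|.)")
CHAR = r"'((?:\\.|[^'\\])+)'"      # character literal in single quotes
STRING = r'"((?:\\.|[^"\\])*)"'    # string in double quotes
RULE = re.compile(rf"\s*{CHAR}\s*(?:{CHAR}|{STRING})\s*")


def unescape(text):
    """Expand backslash escapes inside a literal or string."""
    def decode(match):
        code = match.group(1)
        if code[0] == "x" and len(code) > 1:
            return chr(int(code[1:], 16))   # \xnnn, hexadecimal
        if code[0] in "01234567":
            return chr(int(code, 8))        # \nnn, octal
        return SIMPLE_ESCAPES[code]
    return ESCAPE.sub(decode, text)


def parse_rules(text):
    """Build a {character: replacement} table from rule lines."""
    rules = {}
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue                        # blank line or comment
        match = RULE.fullmatch(line)
        if match is None:
            raise ValueError(f"bad rule: {line!r}")
        source = unescape(match.group(1))
        if len(source) != 1:
            raise ValueError(f"not a single character: {line!r}")
        target = match.group(2) if match.group(2) is not None else match.group(3)
        rules[source] = unescape(target)
    return rules


def translate_url(url, rules):
    """Translate each character once; results are never re-translated."""
    return "".join(rules.get(ch, ch) for ch in url)


rules = parse_rules(r"""
# MS-DOS backslashes to forward slashes, and swap 'a' with 'b'
'\\' '/'
'a' 'b'
'b' 'a'
""")
print(translate_url("a\\b", rules))   # prints b/a
```

Because the lookup is done per input character, the two swap rules do not cancel each other out, which is the property the paragraph above relies on.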
"tcpdlpp" listing
The tcpdlpp program post-processes the files downloaded by tcpdl. It
expects the same "tcpdldir:" assign as tcpdl does.
A listing is sent to "stdout", which consists of 3 sections:
- A list of each file processed or skipped. Since only HTML files will
contain URLs to be updated, all other files are skipped. This acts
as a progress indicator.
- A list of all local files. The number of references to each is given.
If a file has no references to it, either it is a top level HTML file,
or it is simply not referenced. If you are building a browsable copy
of your favourite sites, you may want to delete any unreferenced files
to save disk space.
- A list of all non-local URLs. As for the local files, the number of
references made to each by the local files is given. If a particular
URL has a lot of references, you may want to download that URL too.
Each time you add or remove files from the "tcpdldir:http" directory, you
should re-run tcpdlpp to adjust any links that require amendment.
If the translations in the "tcpdlpp.config" file modify the URLs that
reference a file, that file will be listed as non-local even though it is
present, unless its file name has been modified to match. If such files are
listed, the file names should be changed and tcpdlpp re-run so that the
files are identified as local.