[Contents] [Copyright] [Disclaimer] [Introduction] [Requirements]
[Support] [Problems] [Thanks] [History] [Usage] [Notes]

Notes

Execution may be terminated using CTRL-C. The program exits as quickly as it safely can, which may take a little while if it is processing large HTML files at the time. As each task notices the CTRL-C, its status changes to "*BREAK*".

If a particular host times out on more than 5 occasions, no further attempts are made to download any files from that host. This avoids wasting time attempting to connect to a host that is down.
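
The host time-out rule above can be sketched as a small per-host counter. This is only an illustration of the rule as described, not tcpdl's actual code; the class and method names are hypothetical.

```python
from collections import Counter
from urllib.parse import urlsplit

MAX_TIMEOUTS = 5  # from the text: a host is abandoned after more than 5 time-outs


class HostTimeouts:
    """Counts time-outs per host and reports which hosts should be skipped."""

    def __init__(self, limit=MAX_TIMEOUTS):
        self.limit = limit
        self.counts = Counter()

    def record_timeout(self, url):
        # Count the time-out against the host part of the URL.
        self.counts[urlsplit(url).hostname] += 1

    def should_skip(self, url):
        # "More than 5 occasions" means the sixth time-out trips the limit.
        return self.counts[urlsplit(url).hostname] > self.limit
```

In this sketch, five time-outs on www.amiga.com still allow further attempts; the sixth causes every later URL on that host to be skipped, while other hosts are unaffected.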


The file "TCPDLDIR:urllist"

The "urllist" file should contain one or more URLs which are to be downloaded.

Each URL should start on a new line, and be followed by the appropriate flags, separated by spaces.

"urllist" supported flags

Dn

Download n levels of text/html

DEFAULT: 255 (!)

Hn

If an "HREF" references another host, the maximum number of levels is set to n.

DEFAULT: H0 (i.e. Current host only)

Pn

If the path differs from that given in your urllist, the maximum number of levels is set to n.

DEFAULT: Pn (Where n is the same as for Dn [see above])

TEXT

Download files referenced by an "HREF" attribute. These will commonly, but not exclusively, be HTML files.

IMG

Download files referenced by an attribute other than "HREF". These are commonly, but not exclusively, images.

ALL

Download all types of files. This is the default if none of "IMG", "TEXT" or "ALL" has been specified. (Note that this will not download files whose suffixes appear in IGNORE lines within tcpdl.config.)



"urllist" examples

http://www.amiga.com/index.html D2 H3 TEXT
...will download 2 levels of HTML files referenced by the specified file from www.amiga.com, and 3 levels of links to any other host.
http://www.ramjam.u-net.com/home.html D5 H0 ALL
...will download 5 levels of files referenced by the specified file from www.ramjam.u-net.com, but will not download any files that are referenced on any other host.
http://www.ramjam.u-net.com/ TEXT
...will download files referenced by "HREF" attributes, starting from the default home page of the host www.ramjam.u-net.com, to the default depth of 255 levels.
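
As a rough illustration of how the flags and their defaults combine, the following Python sketch parses one urllist line. It is not tcpdl's own parser: the function name and the returned dictionary are hypothetical, and it assumes flags are written in upper case exactly as shown above.

```python
def parse_urllist_line(line):
    """Split one urllist line into its URL and a dict of flag values."""
    url, *flags = line.split()
    depth, host_depth, path_depth = 255, 0, None  # documented defaults for D and H
    kinds = set()
    for flag in flags:
        if flag in ("TEXT", "IMG", "ALL"):
            kinds.add(flag)
        elif flag[0] in "DHP" and flag[1:].isdigit():
            value = int(flag[1:])
            if flag[0] == "D":
                depth = value
            elif flag[0] == "H":
                host_depth = value
            else:
                path_depth = value
        else:
            raise ValueError(f"unrecognised flag: {flag!r}")
    return {
        "url": url,
        "D": depth,
        "H": host_depth,
        # P defaults to whatever D ended up as.
        "P": depth if path_depth is None else path_depth,
        # ALL is the default when no type flag was given.
        "kinds": kinds or {"ALL"},
    }
```

For the first example line above, this yields D=2, H=3, P=2 and the type set {"TEXT"}. Blank lines are not handled and would need to be skipped by the caller.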


The file "TCPDLDIR:tcpdl.config"

The "tcpdl.config" file is optional. If present, it will be read in and the contents will be used in determining what file types will be downloaded.

Any line with a hash ("#") in column 1 will be ignored as a comment.

White space is ignored, and the commands are not case-sensitive.

Currently the following configuration commands are supported:

"tcpdl.config" supported commands

IGNORE <suffix>

Where suffix is a file suffix which should not be downloaded. Note that files with this suffix that are listed explicitly in urllist will still be downloaded; only those referenced from within HTML files are skipped.

Note that suffix may contain any characters except white space, but will only be matched against the end of a file name.
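
The IGNORE rule can be sketched as a plain end-of-name comparison. The function and parameter names are hypothetical, and the sketch assumes the comparison is case-sensitive, which the text does not state.

```python
def is_ignored(file_name, ignored_suffixes, listed_in_urllist=False):
    """Return True if file_name should be skipped because of an IGNORE line."""
    if listed_in_urllist:
        # Files named explicitly in urllist are always downloaded.
        return False
    # Suffixes are matched only against the end of the file name.
    return file_name.endswith(tuple(ignored_suffixes))
```

With IGNORE .lha in force, "archive.lha" is skipped when it is found via a link, but "lha.html" is not, because the suffix appears at the start of that name rather than at its end.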

PROXY <proxyserver:port>

This specifies that all HTTP requests should be sent via the specified server. If the port number is omitted, then a default of 8080 is used.

By specifying your ISP's proxy server you can improve download speeds significantly - especially for busy sites.

A proxy will also be required for connections via a firewall.
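
A minimal sketch of how a PROXY argument might be split, using the documented default port of 8080. The function name is hypothetical and this is not tcpdl's own code.

```python
DEFAULT_PROXY_PORT = 8080  # documented default when no port is given


def parse_proxy(argument):
    """Split "proxyserver:port" into (host, port), applying the default port."""
    host, separator, port = argument.strip().partition(":")
    if separator and port:
        return host, int(port)
    return host, DEFAULT_PROXY_PORT
```

Both "www-cache.demon.co.uk:8080" and "www-cache.demon.co.uk" give the same result in this sketch.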

CONTIMEOUT <seconds>

This specifies the initial timeout for each connection in seconds.

The default is 20. The timeout must be within the range 10 to 600.

HTTPTIMEOUT <seconds>

This specifies the timeout for each http request in seconds.

The default is 60. The timeout must be within the range 10 to 600.

RETRIES <number>

This specifies the number of attempts that will be made to download each file.

The default is 5. The value must be within the range 1 to 100.

USER <mail address>

The mail address is sent to the HTTP server as the address that mail can be sent to if there are problems caused by tcpdl's requests. If the USER option is specified without a mail address, then HTTP requests will not include any mail address (this can help maintain your anonymity). If the USER option is not present in the config file, then your current user id and host name are used.

TASKS <number>

The number of URLs that will be downloaded concurrently. If this option is not specified then the default of 12 will be used. The valid values are 1 to 15. This value may be overridden by the TASKS command line option.

USERAGENT <string>

The string specifies the user agent name that will be sent to the HTTP server. The string is assumed to start at the first non-blank character after the USERAGENT keyword, and to run until the end of the line - so it may contain spaces. This option allows tcpdl to appear as if it is some other HTTP client, which is necessary to access some (broken) sites which only accept requests from certain browsers.

NOWAIT

This specifies that tcpdl will not wait for return to be pressed before it exits. This option is useful if tcpdl is run from within a script.

PUBSCREEN <name>

This specifies the name of the public screen that the tcpdl status window should use. This can be overridden by the PUBSCREEN command line option.

FONT <name>

This specifies the name of the font that the tcpdl status window should use. Note that the ".font" suffix should not be given, and for best results only monospaced fonts should be used. This option can be overridden by the "FONT" command line option.

FONTSIZE <value>

This specifies the size of the font that the tcpdl status window should use. The height of the window will be adjusted appropriately. By default a font size of 9 will be used. This option can be overridden by the FONTSIZE command line option.

PRIORITY <number>

This specifies the maximum priority to be used by the tcpdl tasks. Priorities must be in the range 0 to 5 inclusive. By default a priority of 2 will be used. This option can be overridden by use of the PRIORITY command line option. Using a value of 0 may be useful if you want tcpdl to operate in the background while you are using a browser or IRC client.
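
The defaults and ranges of the numeric commands above can be collected in one table. In the sketch below the values come from this section, but the function name is hypothetical, and raising an error for an out-of-range value is an assumption: the text does not say how tcpdl reacts to such values.

```python
LIMITS = {
    # keyword: (default, minimum, maximum), as documented above
    "CONTIMEOUT": (20, 10, 600),
    "HTTPTIMEOUT": (60, 10, 600),
    "RETRIES": (5, 1, 100),
    "TASKS": (12, 1, 15),
    "PRIORITY": (2, 0, 5),
}


def numeric_option(keyword, argument=None):
    """Return the value for a numeric command, or its default if not given."""
    default, low, high = LIMITS[keyword]
    if argument is None:
        return default
    value = int(argument)
    if not low <= value <= high:
        # Assumed behaviour: reject the value rather than clamp it.
        raise ValueError(f"{keyword} must be in the range {low} to {high}")
    return value
```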



"tcpdl.config" example

#
# Specify the Demon Internet proxy server
#
PROXY www-cache.demon.co.uk:8080

#
# Specify the timeouts - small since we're using a proxy
#
CONTIMEOUT 10
HTTPTIMEOUT 30

#
# Specify the number of attempts for each file
#
RETRIES 2

#
# Specify that no mail address is to be sent to the server
#
USER

#
# Specify the number of URLs to be downloaded concurrently.
# This can be overridden by the TASKS command line option.
#
TASKS 12

#
# Specify the maximum priority to be used by the tcpdl tasks
#
PRIORITY 2

#
# Specify the FONT and FONTSIZE to be used
#
FONT Xen
FONTSIZE 9

#
# Specify the file suffixes not to be downloaded
#

# ignore lha archives
IGNORE .lha

# ignore zip archives
IGNORE .zip

# ignore .wav sound files
IGNORE .wav

# ignore MS-DOS executables
IGNORE .exe
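
To make the file format concrete, here is a hedged Python sketch that reads a config file like the one above. It follows the rules stated earlier (hash in column 1 is a comment, commands are not case-sensitive, USERAGENT runs to the end of the line), but the function names and the returned dictionary are hypothetical and do not reflect tcpdl's internals.

```python
def parse_config_line(line):
    """Return (KEYWORD, argument) for one config line, or None to skip it."""
    if line.startswith("#"):        # a hash in column 1 marks a comment
        return None
    parts = line.strip().split(None, 1)
    if not parts:                   # blank line
        return None
    keyword = parts[0].upper()      # commands are not case-sensitive
    # Everything after the keyword is the argument, so it may contain spaces.
    argument = parts[1].strip() if len(parts) > 1 else ""
    return keyword, argument


def parse_config(text):
    """Collect settings from config text; IGNORE lines accumulate in a list."""
    settings = {"IGNORE": []}
    for line in text.splitlines():
        parsed = parse_config_line(line)
        if parsed is None:
            continue
        keyword, argument = parsed
        if keyword == "IGNORE":
            settings["IGNORE"].append(argument)
        else:
            settings[keyword] = argument
    return settings
```

In this sketch a bare USER line is stored with an empty argument, which distinguishes "send no mail address" from the case where USER is absent altogether.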



The file "TCPDLDIR:tcpdlpp.config"

The tcpdlpp.config file is optional. If present, the character translations defined in it are applied to each URL processed. Note that the actual file names are NOT changed, but only the URLs within each html file.

Each line of this file consists of a character literal to be converted and a character literal or string that should replace it. Character literals should be enclosed by single quotes ('), and strings should be enclosed by double quotes (").

White spaces (spaces and tabs) are ignored unless inside a string.

Any line where the first non-whitespace character is a hash (#) is treated as a comment and ignored.

Certain escape characters are allowed in character literals and strings:

"tcpdlpp" escape characters

\a bell
\b backspace
\f formfeed
\n newline
\r carriage return
\t horizontal tab
\v vertical tab
\\ backslash
\' single quote
\" double quote
\nnn character with octal value nnn
\xnnn character with hexadecimal value nnn

For example, the following line converts MS-DOS style backslashes into the forward slashes used by AmigaDOS and Unix:

'\\' '/'

The following line will convert tilde into the safe "%xx" equivalent:

'~' "%7E"
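
The rule syntax and escape characters can be illustrated with a small Python sketch. It is not the tcpdlpp parser: the names are hypothetical, and it assumes that "nnn" means one to three digits for both octal and hexadecimal escapes.

```python
import re

# Single-character escapes from the table above.
SIMPLE_ESCAPES = {
    "a": "\a", "b": "\b", "f": "\f", "n": "\n", "r": "\r", "t": "\t",
    "v": "\v", "\\": "\\", "'": "'", '"': '"',
}

# Assumption: octal and hex escapes take one to three digits.
ESCAPE = re.compile(r"\\(x[0-9A-Fa-f]{1,3}|[0-7]{1,3}|.)")

# One rule: a character literal, then a character literal or a string.
RULE = re.compile(
    r"""\s*'((?:\\.|[^'\\])+)'\s*(?:'((?:\\.|[^'\\])+)'|"((?:\\.|[^"\\])*)")\s*$"""
)


def decode_escapes(text):
    """Replace each backslash escape in text with the character it names."""
    def replace(match):
        body = match.group(1)
        if body[0] == "x" and len(body) > 1:
            return chr(int(body[1:], 16))
        if body[0] in "01234567":
            return chr(int(body, 8))
        if body not in SIMPLE_ESCAPES:
            raise ValueError("unknown escape: \\" + body)
        return SIMPLE_ESCAPES[body]
    return ESCAPE.sub(replace, text)


def parse_rule(line):
    """Return (source_char, replacement), or None for comments and blank lines."""
    stripped = line.strip()
    if not stripped or stripped.startswith("#"):
        return None
    match = RULE.match(line)
    if match is None:
        raise ValueError("malformed rule: " + line)
    source = decode_escapes(match.group(1))
    raw_target = match.group(2) if match.group(2) is not None else match.group(3)
    if len(source) != 1:
        raise ValueError("left-hand side must be a single character: " + line)
    return source, decode_escapes(raw_target)
```

Applied to the two example lines above, parse_rule returns a backslash paired with "/" and "~" paired with "%7E".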

Each character is translated using a single rule - even if the end result includes a character which would have been translated by some other rule. This allows two characters to be swapped over.
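
The single-rule behaviour amounts to one pass over the URL in which each input character is looked up once and the output is never re-scanned. A minimal sketch, assuming the rules have already been read into a dictionary (the function name is hypothetical):

```python
def translate_url(url, rules):
    """Translate each character of url at most once, using a single rule."""
    # Replacement text is appended as-is and never translated again,
    # which is what makes swapping two characters possible.
    return "".join(rules.get(ch, ch) for ch in url)
```

For instance, with the rules {"\\": "/", "/": "\\"} the URL "a\b/c" becomes "a/b\c", and with both '~' to "%7E" and '%' to "%25" defined, a tilde still becomes "%7E" rather than "%257E".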



"tcpdlpp" listing

The tcpdlpp program post-processes the files downloaded by tcpdl. It expects the same "tcpdldir:" assign as tcpdl does.

A listing is sent to "stdout", which consists of 3 sections:

Each time you add or remove files from the "tcpdldir:http" directory, you should re-run tcpdlpp to adjust any links that require amendment.

If a file is present but the URLs that reference it have been modified by the translations in the "tcpdlpp.config" file, it will be listed as non-local unless its file name has been changed to match. If such files are listed, rename them accordingly and re-run tcpdlpp so that they are identified as local.
