[Contents] [Copyright] [Disclaimer] [Introduction] [Requirements]
[Support] [Problems] [Thanks] [History] [Usage] [Notes]

Using tcpdl & tcpdlpp

Overview

Both tcpdl and tcpdlpp expect the assign "tcpdldir:" to refer to a directory. This directory is the work area for both programs.

By default, the "urllist" file, containing the list of URLs to be downloaded, is expected to be in this directory. The optional configuration files, "tcpdl.config" and "tcpdlpp.config", are also expected to be in this directory.

When tcpdl downloads URLs it will create three directories below tcpdldir:, "TEMP", "DATA" and "HTTP". "TEMP" is - as its name suggests - used only for temporary work files. Beneath "DATA" and "HTTP" one directory will be created for each host, and beneath each of these will be the directories and files which are downloaded.

The "HTTP" directory contains the actual files that are downloaded, while the "DATA" directory contains files holding information about each file downloaded.

Example:

The following directory tree shows the structure that might result from use of the example URLs given here.

Example directory structure

         tcpdldir:
            |
            |
            +------- urllist
            |
            |
            +------- TEMP
            |         |
            |         |
            |        ...
            |
            |
            +------- HTTP
            |         |
            |         |
            |         +------- www.ramjam.u-net.com
            |         |                  |
            |         |                  |
            |         |                  +------- index.html
            |         |                  |
            |         |                  |
            |         |                  +------- amiga
            |         |                  |          |
            |         |                  |          |
            |         |                 ...        ...
            |         |
            |         |
            |         +------- www.amiga.com
            |         |                  |
            |         |                  |
            |         |                  +------- index.html
            |         |                  |
            |         |                  |
            |        ...                ...
            |
            |
            +------- DATA
                      |
                      |
                      +------- www.ramjam.u-net.com
                      |                  |
                     ...                ...

Thus, once a file has been downloaded, it appears within a directory that identifies the host from which it came.
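
The mapping from a URL to its place in this tree can be sketched in a few lines of Python. This is only an illustration of the layout described above, not tcpdl's own code: the function name local_paths is hypothetical, and two points are assumptions drawn from the example tree rather than stated facts, namely that a URL ending in "/" is stored as "index.html" and that the "DATA" file has the same relative path as the "HTTP" file.

```python
from urllib.parse import urlsplit


def local_paths(url, default_name="index.html"):
    """Return the (HTTP, DATA) locations below tcpdldir: for a URL.

    Hypothetical helper that only illustrates the directory layout.
    """
    parts = urlsplit(url)
    path = parts.path.lstrip("/")
    # Assumption: a URL that names a directory is stored under a default
    # file name, as the example tree suggests with "index.html".
    if path == "" or path.endswith("/"):
        path += default_name
    relative = f"{parts.hostname}/{path}"
    # Assumption: both trees use the same host/path below their root.
    return f"tcpdldir:HTTP/{relative}", f"tcpdldir:DATA/{relative}"


http_file, data_file = local_paths("http://www.ramjam.u-net.com/home.html")
print(http_file)  # tcpdldir:HTTP/www.ramjam.u-net.com/home.html
print(data_file)  # tcpdldir:DATA/www.ramjam.u-net.com/home.html
```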

The "DATA" directory is used by tcpdl during the download and update process. The final files appear in the "HTTP" directory.

The "DATA" directory tree mirrors that of the remote HTTP server, but its files contain the response from the HTTP server together with information that tcpdl uses: the date and time of download (used when performing an UPDATE) and, for HTML files, a list of all the URLs that are referenced.

The "TEMP" directory holds temporary files during processing by tcpdl. Unless the DEBUG option is specified, all temporary files should be deleted by tcpdl upon exit. Any files that are left behind in this directory may be safely deleted.

The HTML files downloaded by tcpdl will have references to URLs replaced by a reference to a local file. For example:

http://www.ramjam.u-net.com/home.html

...will become:

file://localhost/tcpdldir:http/www.ramjam.u-net.com/home.html
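
As a rough sketch of this substitution, the Python function below rebuilds the local form of a reference from the host and path of the original URL. The name to_local_file_url is hypothetical, and the sketch assumes plain http URLs without query strings or fragments; how tcpdl itself treats those cases is not described here.

```python
from urllib.parse import urlsplit


def to_local_file_url(url):
    """Rewrite an http URL to the local file form shown above.

    Hypothetical helper: anything that is not a plain http URL is
    returned unchanged, which is an assumption of this sketch.
    """
    parts = urlsplit(url)
    if parts.scheme != "http" or not parts.hostname:
        return url
    return f"file://localhost/tcpdldir:http/{parts.hostname}{parts.path}"


print(to_local_file_url("http://www.ramjam.u-net.com/home.html"))
# file://localhost/tcpdldir:http/www.ramjam.u-net.com/home.html
```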

The post processor, tcpdlpp, processes all files within the "tcpdldir:http" directory and converts references to other files that are present in this file hierarchy to relative URLs, whilst leaving references to non-local URLs as absolute.

This allows the files within the "tcpdldir:http" directory to be browsed offline, while allowing links to other URLs to be followed if the user happens to be online.
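
The idea behind this post-processing step can be illustrated with a short Python sketch. It is not tcpdlpp's implementation: the function name relativise is hypothetical, the set of downloaded files is represented as "host/path" strings relative to "tcpdldir:http", and the sketch starts from the original http form of each reference, all of which are assumptions made for the example.

```python
import posixpath
from urllib.parse import urlsplit


def relativise(reference, current_file, local_files):
    """Return a relative link if the referenced file exists locally.

    current_file and the entries of local_files are "host/path" strings
    relative to tcpdldir:http (an assumption of this sketch).
    """
    parts = urlsplit(reference)
    target = f"{parts.hostname}{parts.path}"
    if parts.scheme != "http" or target not in local_files:
        # Not downloaded: keep the absolute URL so that the link still
        # works when the user is online.
        return reference
    return posixpath.relpath(target, start=posixpath.dirname(current_file))


downloaded = {
    "www.ramjam.u-net.com/index.html",
    "www.ramjam.u-net.com/home.html",
    "www.amiga.com/index.html",
}
page = "www.ramjam.u-net.com/index.html"
print(relativise("http://www.ramjam.u-net.com/home.html", page, downloaded))
print(relativise("http://www.amiga.com/index.html", page, downloaded))
print(relativise("http://www.example.org/", page, downloaded))
```

The first two references become "home.html" and "../www.amiga.com/index.html", while the third is left as an absolute URL because no local copy exists.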

By downloading your favourite pages, you can browse the web much faster, while still being able to follow links to other sites.

The downloaded pages may be updated periodically using tcpdl with the UPDATE option, and then running tcpdlpp again to adjust any amended references.


To Start

  1. Assign "tcpdldir:" to an existing directory which is to contain the downloaded files. (e.g. assign tcpdldir: Work:tcpdldir)
  2. Edit the file "tcpdldir:urllist" so that it lists the URLs to be downloaded.
  3. Edit "tcpdldir:tcpdl.config" as required.
  4. Edit "tcpdldir:tcpdlpp.config" as required.
  5. Check that there is enough disk space for the pages you intend to download!
  6. Connect to the Internet
  7. Run tcpdl from a shell
  8. If required, run tcpdlpp from a shell (this can be done offline).

If, on checking the output of tcpdlpp, you find a lot of references to a non-local URL, you may want to use tcpdl to download that URL.

After downloading it, re-run tcpdlpp to change all links to that URL to refer to the local file.


Command-line Options

tcpdl accepts a number of command line options:

URL=<URL> The URL specification can either be just a URL, or a URL with download options (as in the urllist file). If options are specified, then the URL and options must all be enclosed within quotes.

e.g.

tcpdl url=http://www.ramjam.u-net.com/

tcpdl url="http://www.ramjam.u-net.com/ TEXT"

URLLIST=<file> The URLLIST option specifies a file containing a list of URLs to be downloaded. If this option is not specified then "tcpdldir:urllist" is used by default. The format of the "urllist" file is described in its own section.
CONFIG=<file> The CONFIG option specifies a file containing configuration options. If this option is not specified then "tcpdldir:tcpdl.config" is used by default.
TASKS=<number> The TASKS option specifies how many URLs will be downloaded at once. This overrides any TASKS value specified in "tcpdl.config". The valid range of values is 1 to 15.
UPDATE The UPDATE option specifies that any file that has already been downloaded will be checked to see whether it has changed on the server since then. If it has, it will be downloaded again.
NOSAVE The NOSAVE option specifies that the downloaded files should not be saved. This may be useful if tcpdl is used to prime a local proxy server, or in testing HTTP servers.
NOWAIT The NOWAIT option specifies that tcpdl won't wait for return to be pressed before exiting. This makes it easier to use tcpdl from within scripts.
DEBUG The DEBUG option specifies that the files within the "DATA" hierarchy should contain a copy of the HTTP request that was sent to the server, as well as the response and other usual information. It also disables the deletion of temporary files from the "TEMP" directory for transfers that failed. This can be useful when investigating the reason for failed transfers.
PUBSCREEN=<name> The PUBSCREEN option specifies the name of the public screen that the tcpdl window will be opened on. By default the Workbench screen will be used.
FONT=<name> The FONT option specifies the name of the font that the tcpdl window should use. Note that the ".font" suffix should not be specified. Note also that a monospaced font should be used (a proportional font will cause the columns not to align correctly).
FONTSIZE=<number> The FONTSIZE option specifies the size of the font the tcpdl window should use. The size of the tcpdl status window will be adjusted accordingly. By default a size of 9 is used - if you have a very high screen resolution you may wish to increase the font size.
PRIORITY=<number> The PRIORITY option specifies the maximum priority to be used by the tcpdl tasks. Priorities must be in the range 0 to 5 inclusive. By default a priority of 2 will be used. Using a value of 0 may be useful if you want tcpdl to operate in the background while you are using a browser or IRC client.
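
When tcpdl is driven from a script, it can help to check the documented ranges before the command is issued. The following Python sketch assembles a command line from a few of the options above. The function build_tcpdl_command and its parameter names are hypothetical, and the sketch only builds the string; it does not run tcpdl.

```python
def build_tcpdl_command(url=None, url_options=None, tasks=None,
                        priority=None, update=False, nowait=False):
    """Assemble a tcpdl command line (hypothetical helper)."""
    args = ["tcpdl"]
    if url is not None:
        if url_options:
            # A URL with download options must be enclosed in quotes.
            args.append(f'url="{url} {url_options}"')
        else:
            args.append(f"url={url}")
    if tasks is not None:
        if not 1 <= tasks <= 15:
            raise ValueError("TASKS must be in the range 1 to 15")
        args.append(f"TASKS={tasks}")
    if priority is not None:
        if not 0 <= priority <= 5:
            raise ValueError("PRIORITY must be in the range 0 to 5")
        args.append(f"PRIORITY={priority}")
    if update:
        args.append("UPDATE")
    if nowait:
        args.append("NOWAIT")
    return " ".join(args)


print(build_tcpdl_command(url="http://www.ramjam.u-net.com/",
                          url_options="TEXT", tasks=4,
                          update=True, nowait=True))
# tcpdl url="http://www.ramjam.u-net.com/ TEXT" TASKS=4 UPDATE NOWAIT
```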


tcpdl Task Status Window

The tcpdl status window is updated approximately once per second, so not every change in status will have a chance to appear in the window.

tcpdl can download a number of files at once. There is one line in the status window for each of these tasks. The fields on each line are described below:

Header   Value       Description

Status   Connecting  Trying to connect to a host.
         Sending     Sending request.
                     Receiving header.
         Updating    Requesting using "If-Modified-Since" from a host, or loading data from "tcpdldir:data/".
         OK          File downloaded successfully.
         Receiving   Receiving data.
         Wait. html  Waiting because the limit of 512k of HTML data awaiting processing has been reached. Processing will continue when the amount outstanding falls below the limit.
         Proc. html  Processing HTML.
         Copying     Copying HTML file to "tcpdldir:http/".
         Not Found   The server reported the URL not found.
         LIB ERR     Unable to open bsdsocket.library.
         HOST ERR    Unknown or unreachable host.
         SOCK ERR    Unable to open socket.
         CON ERR     Unable to connect to host.
         HDR ERR     Failed to download header.
         RECV ERR    Failed while receiving data.
         FILE ERR    Failed to open output file.
         SRVR ERR    Server reported an error.
         DISK ERR    Failed while writing to output file (most likely the disk is full).
         ERROR       Some other error occurred.
         *BREAK*     The task has recognised a user break, or an error is causing an abort (e.g. the disk is full).
Time                 Elapsed time since trying to connect.
CPS                  The current download rate achieved for this file.
CSize                The current size of the data received.
FSize                The final size of the data, if given by the server.
Request              The URL requested.

The top line of the status window also contains an overall progress indicator:

(DONE:<n> TOTAL:<m>)

<n>:
The number of files downloaded so far
<m>:
The number of files listed in memory

The bottom line of the status window gives some overall performance figures:

Total time:
Elapsed time since tcpdl started execution.
Total bytes:
The total number of bytes downloaded so far.
Average cps:
The average number of characters per second downloaded.
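
The relationship between these three figures can be sketched as follows. The exact calculation used by tcpdl is not documented here, so the formula (total bytes divided by total elapsed seconds) is an assumption, and the function name average_cps is hypothetical.

```python
def average_cps(total_bytes, total_seconds):
    """Average characters per second over the whole run (assumed formula)."""
    if total_seconds <= 0:
        # Nothing meaningful can be reported before any time has elapsed.
        return 0.0
    return total_bytes / total_seconds


# 614400 bytes downloaded in 2 minutes
print(average_cps(614400, 120))  # 5120.0
```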