
Using tcpdl & tcpdlpp

Overview

Both tcpdl and tcpdlpp expect the assign "tcpdldir:" to refer to a directory. This directory is the work area for both programs.

By default, the "urllist" file, containing the list of URLs to be downloaded, is expected to be in this directory. The optional configuration files, "tcpdl.config" and "tcpdlpp.config", are also expected to be in this directory.

When tcpdl downloads URLs it will create three directories below tcpdldir:, "TEMP", "DATA" and "HTTP". "TEMP" is - as its name suggests - used only for temporary work files. Beneath "DATA" and "HTTP" one directory will be created for each host, and beneath each of these will be the directories and files which are downloaded.

The "HTTP" directory contains the actual files that are downloaded, while the "DATA" directory contains files holding information about each file downloaded.

Example directory structure

The following directory tree shows the structure that might result from downloading pages from www.ramjam.u-net.com and www.amiga.com.

         tcpdldir:
            |
            |
            +------- urllist
            |
            |
            +------- TEMP
            |         |
            |         |
            |        ...
            |
            |
            +------- HTTP
            |         |
            |         |
            |         +------- www.ramjam.u-net.com
            |         |                  |
            |         |                  |
            |         |                  +------- index.html
            |         |                  |
            |         |                  |
            |         |                  +------- amiga
            |         |                  |          |
            |         |                  |          |
            |         |                 ...        ...
            |         |
            |         |
            |         +------- www.amiga.com
            |         |                  |
            |         |                  |
            |         |                  +------- index.html
            |         |                  |
            |         |                  |
            |        ...                ...
            |
            |
            +------- DATA
                      |
                      |
                      +------- www.ramjam.u-net.com
                      |                  |
                     ...                ...

Thus, once a file has been downloaded, it appears within a directory that identifies the host from which it came.
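
As an illustration of the layout above, the following Python sketch maps a URL to the pair of paths it would occupy under "HTTP" and "DATA". It is not tcpdl's own code. The function name `local_paths` is hypothetical, and the use of "index.html" for URLs ending in "/" is an assumption based only on the example tree.

```python
from urllib.parse import urlsplit


def local_paths(url, default_name="index.html"):
    """Return (http_path, data_path) below tcpdldir: for a URL.

    default_name is an assumption: this page does not state how a URL
    ending in "/" is named on disk.
    """
    parts = urlsplit(url)
    path = parts.path.lstrip("/")
    if path == "" or path.endswith("/"):
        path += default_name
    host = parts.hostname
    # One directory per host, with the remote directories and files beneath it.
    return (f"tcpdldir:HTTP/{host}/{path}", f"tcpdldir:DATA/{host}/{path}")


if __name__ == "__main__":
    for p in local_paths("http://www.ramjam.u-net.com/home.html"):
        print(p)
```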

The "DATA" directory is used by tcpdl during the download and update process. The final files appear in the "HTTP" directory.

The "DATA" directory tree mirrors that of the remote HTTP server, but each of its files contains the response from the HTTP server together with information that tcpdl uses, such as the date and time of download (used when performing an UPDATE) and, for HTML files, a list of all the URLs that are referenced.

The "TEMP" directory holds temporary files during processing by tcpdl. Unless the DEBUG option is specified, all temporary files should be deleted by tcpdl upon exit. Any files that are left behind in this directory may be safely deleted.

The HTML files downloaded by tcpdl will have references to URLs replaced by a reference to a local file. For example:

http://www.ramjam.u-net.com/home.html

...will become:

file://localhost/tcpdldir:http/www.ramjam.u-net.com/home.html
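
The rewrite shown above can be sketched in Python as follows. This is only a model of the documented behaviour, not tcpdl's implementation. The function name `to_local_url` is hypothetical, and the treatment of non-HTTP URLs and of query strings is an assumption.

```python
from urllib.parse import urlsplit


def to_local_url(url):
    """Rewrite an http URL to the file://localhost/tcpdldir:http/ form."""
    parts = urlsplit(url)
    if parts.scheme != "http" or not parts.hostname:
        # Assumption: anything that is not a plain http URL is left untouched.
        return url
    # Assumption: any query string or fragment is dropped.
    return f"file://localhost/tcpdldir:http/{parts.hostname}{parts.path}"


if __name__ == "__main__":
    print(to_local_url("http://www.ramjam.u-net.com/home.html"))
```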

The post processor, tcpdlpp, processes all files within the "tcpdldir:http" directory and converts references to other files that are present in this file hierarchy to relative URLs, whilst leaving references to non-local URLs as absolute.

This allows the files within the "tcpdldir:http" directory to be browsed offline, while allowing links to other URLs to be followed if the user happens to be online.
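
A minimal Python sketch of the idea behind this conversion is given below. It assumes that references are available as absolute http URLs and that the set of downloaded files is known as "host/path" strings. The names `rewrite_reference` and `local_files` are hypothetical, and tcpdlpp itself may work quite differently.

```python
import posixpath
from urllib.parse import urlsplit


def rewrite_reference(page, url, local_files):
    """Return a relative URL if the target is held locally, else the URL unchanged.

    page        -- path of the referring page below tcpdldir:http ("host/path")
    url         -- absolute http URL referenced by that page
    local_files -- set of "host/path" strings present below tcpdldir:http
    """
    parts = urlsplit(url)
    target = f"{parts.hostname}{parts.path}"
    if target in local_files:
        # Express the target relative to the directory holding the page.
        return posixpath.relpath(target, posixpath.dirname(page))
    return url  # non-local references stay absolute


if __name__ == "__main__":
    local = {
        "www.ramjam.u-net.com/index.html",
        "www.ramjam.u-net.com/home.html",
        "www.amiga.com/index.html",
    }
    page = "www.ramjam.u-net.com/index.html"
    print(rewrite_reference(page, "http://www.ramjam.u-net.com/home.html", local))
    print(rewrite_reference(page, "http://www.amiga.com/index.html", local))
```

Because each host has its own directory below "tcpdldir:http", a link to a page on another downloaded host becomes a path that climbs out of the current host directory first.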

By downloading your favourite pages, you can browse the web much faster, while still being able to follow links to other sites.

The downloaded pages may be updated periodically using tcpdl with the UPDATE option, and then running tcpdlpp again to adjust any amended references.
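
The status display described later on this page mentions that updating uses an "If-Modified-Since" request. The Python sketch below shows what such a conditional request could look like in general. The exact request that tcpdl sends is not documented here, so the request line, the "Host" header and the function name `conditional_get` are all assumptions.

```python
from email.utils import formatdate


def conditional_get(host, path, downloaded_at):
    """Build a conditional GET for a file last downloaded at the given time.

    downloaded_at is a Unix timestamp (seconds since 1970-01-01 UTC).
    """
    stamp = formatdate(downloaded_at, usegmt=True)  # HTTP date format, in GMT
    return (
        f"GET {path} HTTP/1.0\r\n"
        f"Host: {host}\r\n"
        f"If-Modified-Since: {stamp}\r\n"
        "\r\n"
    )


if __name__ == "__main__":
    print(conditional_get("www.ramjam.u-net.com", "/home.html", 0))
```

A server that honours the header replies with status 304 (Not Modified) when the file has not changed, so the file need not be transferred again.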

To Start

1. Assign "tcpdldir:" to an existing directory which is to contain the downloaded files (e.g. assign tcpdldir: Work:tcpdldir).
2. Edit the file "tcpdldir:urllist" so that it contains the URLs to be downloaded.
3. Edit "tcpdldir:tcpdl.config" as required.
4. Edit "tcpdldir:tcpdlpp.config" as required.
5. Check that there is enough disk space for the pages you intend to download!
6. Connect to the Internet.
7. Run tcpdl from a shell.
8. If required, run tcpdlpp from a shell (this can be done offline).

If, on checking the output of tcpdlpp, you find many references to a non-local URL, you may want to use tcpdl to download that URL.

After downloading it, re-run tcpdlpp to change all links to that URL to refer to the local file.

Command-line Options

tcpdl accepts a number of command line options:

`URL=<URL>`	The URL specification can either be just a URL, or a URL with download options (as in the urllist file). If options are specified, then the URL and options must all be enclosed within quotes, e.g. `tcpdl url=http://www.ramjam.u-net.com/` or `tcpdl url="http://www.ramjam.u-net.com/ TEXT"`
`URLLIST=<file>`	The `URLLIST` option specifies a file containing a list of URLs to be downloaded. If this option is not specified then "tcpdldir:urllist" is used by default. The format of the "urllist" file is described in its own section of this documentation.
`CONFIG=<file>`	The `CONFIG` option specifies a file containing configuration options. If this option is not specified then "tcpdldir:tcpdl.config" is used by default.
`TASKS=<number>`	The `TASKS` option specifies how many URLs will be downloaded at once. This overrides any `TASKS` value specified in "tcpdl.config". The valid range of values is 1 to 15.
`UPDATE`	The `UPDATE` option specifies that any file that has been downloaded will be checked to see whether it has been updated since then. If it has it will be downloaded again.
`NOSAVE`	The `NOSAVE` option specifies that the downloaded files should not be saved. This may be useful if tcpdl is used to prime a local proxy server, or in testing HTTP servers.
`NOWAIT`	The `NOWAIT` option specifies that tcpdl won't wait for return to be pressed before exiting. This makes it easier to use tcpdl from within scripts.
`DEBUG`	The `DEBUG` option specifies that the files within the "DATA" hierarchy should contain a copy of the "HTTP" request that was sent to the server, as well as the response and other usual information. It also disables the deletion of temporary files from the "TEMP" directory for transfers that failed. This can be useful when investigating the reason for failed transfers.
`PUBSCREEN=<name>`	The `PUBSCREEN` option specifies the name of the public screen that the tcpdl window will be opened on. By default the Workbench screen will be used.
`FONT=<name>`	The `FONT` option specifies the name of the font that the tcpdl window should use. Note that the ".font" suffix should not be specified. Note also that a monospaced font should be used (a proportional font will cause the columns not to align correctly).
`FONTSIZE=<number>`	The `FONTSIZE` option specifies the size of the font the tcpdl window should use. The size of the tcpdl status window will be adjusted accordingly. By default a size of 9 is used - if you have a very high screen resolution you may wish to increase the font size.
`PRIORITY=<number>`	The `PRIORITY` option specifies the maximum priority to be used by the tcpdl tasks. Priorities must be in the range 0 to 5 inclusive. By default a priority of 2 will be used. Using a value of 0 may be useful if you want tcpdl to operate in the background while you are using a browser or IRC client.
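
The value ranges and quoting rule in the table above can be summarised in a small Python sketch that assembles a tcpdl command line. The helper `build_tcpdl_command` is hypothetical and is not part of tcpdl. It covers only a few of the options and simply rejects values outside the documented ranges.

```python
def build_tcpdl_command(url=None, options="", tasks=None, priority=None,
                        update=False, nowait=False):
    """Return a tcpdl command line as a string, checking documented ranges."""
    args = ["tcpdl"]
    if url is not None:
        spec = f"{url} {options}".strip()
        # A URL with download options must be enclosed in quotes.
        args.append(f'url="{spec}"' if " " in spec else f"url={spec}")
    if tasks is not None:
        if not 1 <= tasks <= 15:
            raise ValueError("TASKS must be in the range 1 to 15")
        args.append(f"TASKS={tasks}")
    if priority is not None:
        if not 0 <= priority <= 5:
            raise ValueError("PRIORITY must be in the range 0 to 5")
        args.append(f"PRIORITY={priority}")
    if update:
        args.append("UPDATE")
    if nowait:
        args.append("NOWAIT")
    return " ".join(args)


if __name__ == "__main__":
    print(build_tcpdl_command(url="http://www.ramjam.u-net.com/",
                              options="TEXT", tasks=4, update=True))
```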

tcpdl Task Status Window

The tcpdl status window is updated approximately once per second (Note: Not every change in status will have a chance to appear in the window).

tcpdl can download a number of files at once. There is one line in the status window for each of these tasks. The fields on each line are described below:

Header

Status	`Connecting`	Trying to connect to a host.
	`Sending`	Sending request.
	Receiving header.
	`Updating`	Sending a request with "If-Modified-Since" to a host, or loading data from "tcpdldir:data/".
	`OK`	File downloaded successfully.
	`Receiving`	Receiving data.
	`Wait. html`	Waiting because the limit of 512k of HTML data awaiting processing has been reached. Processing will continue when the amount outstanding falls below the limit.
	`Proc. html`	Processing html.
	`Copying`	Copying html-file to "tcpdldir:http/".
	`Not Found`	The server reported the URL not found.
	`LIB ERR`	Unable to open bsdsocket.library.
	`HOST ERR`	Unknown or unreachable host.
	`SOCK ERR`	Unable to open socket.
	`CON ERR`	Unable to connect to host.
	`HDR ERR`	Failed to download header.
	`RECV ERR`	Failed while receiving data.
	`FILE ERR`	Failed to open output file.
	`SRVR ERR`	Server reported an error.
	`DISK ERR`	Failed while writing to output file (most likely the disk is full).
	`ERROR`	Some other error occurred.
	`BREAK`	The task has recognised a user break or an error is causing an abort (e.g. the disk is full).
Time	Elapsed time since trying to connect.
CPS	The current download rate achieved for this file.
CSize	The current size of the data received.
FSize	The final size of the data, if given by the server.
Request	The URL requested.

The top line of the status window also contains an overall progress indicator:

(DONE:<n> TOTAL:<m>)

<n>: The number of files downloaded so far.
<m>: The number of files listed in memory.

The bottom line of the status window gives some overall performance figures:

Total time: Elapsed time since tcpdl started execution.
Total bytes: The total number of bytes downloaded so far.
Average cps: The average number of characters per second downloaded.
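
As a rough illustration of how these figures relate, the Python sketch below derives an average rate from a byte count and an elapsed time. It assumes that the average is simply total bytes divided by total time, which this page does not state explicitly. The function name `average_cps` is hypothetical.

```python
def average_cps(total_bytes, total_seconds):
    """Average characters per second over the whole run (assumed formula)."""
    if total_seconds <= 0:
        return 0.0  # nothing meaningful to report before any time has elapsed
    return total_bytes / total_seconds


if __name__ == "__main__":
    # 1,500,000 bytes downloaded in 10 minutes
    print(average_cps(1_500_000, 600))
```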
