Configuration File Options

When Templeton starts, it loads any default or specified configuration files. These files use a simple option-value format: each option is set by a single keyword that names the option, followed by the desired value.

The options are classified into the following groups: Registration Options, Restriction Options, File System, Network, and Advanced Settings.

Although the keywords are not case sensitive (you may use upper or lower case letters), the parameters for some options are case sensitive, including options where a text string or URL is used.

Lines beginning with a "#" character are treated as comments.
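
The format rules above can be illustrated with a minimal Python sketch. It is not Templeton's own parser: the function name parse_config and the sample settings are made up for illustration, and it assumes that whitespace separates the keyword from its value and that blank lines are skipped.

```python
def parse_config(text):
    """Return (keyword, value) pairs from option-value text.

    A list is used instead of a dict because some options
    (Register, Deny, Allow, Add) may appear more than once.
    """
    options = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank line (assumed to be ignored) or comment
        parts = line.split(None, 1)
        keyword = parts[0].lower()  # keywords are not case sensitive
        value = parts[1].strip() if len(parts) > 1 else ""
        options.append((keyword, value))  # values keep their case
    return options


# Hypothetical sample configuration
sample = """\
# restrict the run to one host
RestrictHost TRUE
restrictdepth 3
Deny http://loco.com/archive/
Deny http://loco.com/tmp/
"""
options = parse_config(sample)
```

Lowercasing only the keyword keeps case-sensitive values such as URLs and text strings intact.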


Registration Options

Register

Register registration_code
example:
Register 12-34F67-891011
Software that is registered contains a unique registration code. This code should be entered exactly as it is provided. If your site contains multiple registrations, you may list each registration code on a line starting with the keyword "Register".

Please read the licensing agreement for registration information.


Restriction Options

RestrictHost

RestrictHost boolean
example:
RestrictHost TRUE
This parameter informs the program not to leave the designated host. Links to machines not on the current host are not traversed.

RestrictPath

RestrictPath absolute path
example:
RestrictPath /people
This parameter is only used when a host is restricted. When a host is restricted, a subpath on that host may also be restricted. Hypertext references to documents outside this subtree are not traversed. Either a slash "/" or a backslash "\" is valid for specifying a directory. The trailing slash or backslash is optional.
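
As a rough illustration of how RestrictHost and RestrictPath combine, the Python sketch below decides whether a link would be traversed. The function within_restriction and its arguments are hypothetical. It assumes that the restricted path matches whole directory names, so "/people" does not match "/peoplesoft".

```python
from urllib.parse import urlsplit


def within_restriction(url, start_url, restrict_host=True, restrict_path=None):
    """Return True if url may be traversed under the host/path restriction."""
    if not restrict_host:
        return True  # RestrictPath is only used when the host is restricted
    target, start = urlsplit(url), urlsplit(start_url)
    if target.hostname != start.hostname:
        return False  # link leaves the designated host
    if restrict_path is None:
        return True
    # Accept "/" or "\" as separator; the trailing separator is optional.
    prefix = "/" + restrict_path.replace("\\", "/").strip("/")
    path = target.path or "/"
    return path == prefix or path.startswith(prefix.rstrip("/") + "/")


start = "http://foo.com/"
inside = within_restriction("http://foo.com/people/bob.html", start, True, "/people")
other_host = within_restriction("http://bar.com/people/bob.html", start, True, "/people")
other_path = within_restriction("http://foo.com/staff/", start, True, "\\people\\")
```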

RestrictDepth

RestrictDepth numeric value
example:
RestrictDepth 3
Hyperlinks are traversed in a breadth-first search. An unrestricted search may download an entire WWW server's data. By restricting the depth, only immediate portions of the server will be received. Images and non-href links are considered to be at the same depth as the document.

A restricted depth of 0 means no restriction. The default value is 1.
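
The following Python sketch shows a depth-limited breadth-first traversal over a small in-memory link table. It is an illustration only: the name crawl_order and the sample site are invented, and it assumes that the starting document counts as depth 1, which this document does not state.

```python
from collections import deque


def crawl_order(links, start, restrict_depth=1):
    """Return pages in breadth-first order, limited to restrict_depth levels.

    links maps each page to the pages it references (a stand-in for
    real HTML documents). A restrict_depth of 0 means no restriction.
    """
    seen = {start}
    order = []
    queue = deque([(start, 1)])  # assumption: the start page is depth 1
    while queue:
        page, depth = queue.popleft()
        order.append(page)
        if restrict_depth and depth >= restrict_depth:
            continue  # depth limit reached: do not follow this page's links
        for target in links.get(page, []):
            if target not in seen:
                seen.add(target)
                queue.append((target, depth + 1))
    return order


site = {"/": ["/a", "/b"], "/a": ["/a/1"], "/b": ["/"], "/a/1": []}
shallow = crawl_order(site, "/", 2)    # ['/', '/a', '/b']
everything = crawl_order(site, "/", 0)  # ['/', '/a', '/b', '/a/1']
```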

RemoveRestricted

RemoveRestricted boolean
example:
RemoveRestricted FALSE
This parameter informs the program to remove untraversed links. Links to restricted machines or restricted depths are removed from the HTML file, but the visible text is still available (just not as a hyperlink). The default value is FALSE.

Exclusion

Exclusion boolean
example:
Exclusion TRUE
This parameter determines whether Templeton will support server-provided robot exclusion files (robots.txt). Many servers maintain exclusion files to prevent robots from wandering around virtual directory trees, or from retrieving very temporary or incomplete files or copyrighted materials. It is considered "polite" for web agents to obey the exclusion files when they exist. The default value, TRUE, means that robot exclusion files are obeyed. Setting Exclusion to FALSE will ignore robot exclusion files.

It should be noted that robot exclusion files that explicitly restrict Templeton will be honored regardless of the exclusion parameter.
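
To show what obeying an exclusion file involves, here is a small Python sketch using the standard library's urllib.robotparser. The exclusion file contents and URLs are made up. The sketch only checks rules by agent name and does not model Templeton's Exclusion setting itself.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical exclusion file as a server might provide it
robots_txt = """\
User-agent: *
Disallow: /tmp/

User-agent: Templeton
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

# The entry naming Templeton applies to Templeton
explicit = parser.can_fetch("Templeton", "http://foo.com/private/x.html")
# Any other robot falls back to the "*" entry
generic = parser.can_fetch("OtherBot", "http://foo.com/tmp/scratch.html")
# A page that no rule mentions is allowed
public = parser.can_fetch("Templeton", "http://foo.com/index.html")
```

Here explicit and generic are False and public is True.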

Deny

Deny URL
example:
Deny http://foo.com/archive/
The URL provided, as well as all subtrees of the URL, are not processed. Specific directory subtrees are often not desirable. You can deny retrieval of these URLs using this setting.

For example, to NOT retrieve the "archive" subtree of the host loco.com, you would specify:

Deny http://loco.com/archive/
If you do not include the trailing slash (http://loco.com/archive) then all subdirectories beginning with "archive" are not processed. This includes "archive.1", "archive.old", "archive_from_1994", etc.

Multiple Deny statements may be specified.

Allow

Allow URL
example:
Allow http://foo.com/archive/January/
Similar to "Deny", "Allow" explicitly specifies that a subtree is retrievable. When used in conjunction with Deny URL, branches of a subtree may be specified for access, while other subtrees are ignored.

Multiple Allow statements may be specified.
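
The interplay of Deny and Allow can be sketched in Python as prefix matching. How Templeton resolves a URL that matches both lists is not described here. The sketch assumes that the longest matching prefix wins, and the function name is_permitted is hypothetical.

```python
def is_permitted(url, deny, allow):
    """Return True if url may be retrieved.

    Assumption: the longest matching Deny or Allow prefix decides,
    and a URL matching neither list is permitted.
    """
    best_deny = max((len(p) for p in deny if url.startswith(p)), default=-1)
    best_allow = max((len(p) for p in allow if url.startswith(p)), default=-1)
    return best_allow >= best_deny


deny = ["http://foo.com/archive/"]
allow = ["http://foo.com/archive/January/"]

january = is_permitted("http://foo.com/archive/January/a.html", deny, allow)    # True
february = is_permitted("http://foo.com/archive/February/a.html", deny, allow)  # False

# Without the trailing slash, sibling names such as "archive.old" also match.
loose = is_permitted("http://loco.com/archive.old/x", ["http://loco.com/archive"], [])   # False
strict = is_permitted("http://loco.com/archive.old/x", ["http://loco.com/archive/"], [])  # True
```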

Sleep

Sleep seconds
example:
Sleep 10
Sleep determines the number of seconds to pause before sending a request to a WWW server. SLEEP IS IMPORTANT.

Warning: Templeton can generate thousands of requests per minute. Many WWW servers cannot handle a sudden onslaught of requests. Setting the Sleep parameter to 0 (zero) may generate too many requests for the server and kill the server. This is bad.

A sleep setting of 0 is known to kill the following types of servers:

Low sleep values may also generate large amounts of network traffic and hog network resources.

For safety, you should set the sleep interval to at least 5 seconds. The longer, the better. Remember, this program is automated and can easily run for hours. What's the rush?

Unregistered versions of Templeton cannot have a sleep period less than 5 seconds.
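
A request loop that honors the Sleep setting might look like the Python sketch below. The names fetch_all, fetch, and pause are invented for illustration. It assumes that an unregistered copy raises a too-small value to 5 seconds instead of rejecting it, which this document does not specify.

```python
import time


def fetch_all(urls, fetch, sleep_seconds=5, registered=False, pause=time.sleep):
    """Fetch each URL in turn, pausing before every request."""
    if not registered:
        # Assumption: a too-small value is raised to 5, not rejected.
        sleep_seconds = max(sleep_seconds, 5)
    results = []
    for url in urls:
        pause(sleep_seconds)  # give the server time between requests
        results.append(fetch(url))
    return results


# Demonstration with a recording stand-in for time.sleep,
# so that nothing actually waits.
pauses = []
pages = fetch_all(["a", "b"], str.upper, sleep_seconds=0, pause=pauses.append)
```

After this run, pauses holds [5, 5] because the unregistered minimum was applied.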


File System

LocalPath

LocalPath absolute path
example:
LocalPath /
LocalPath informs the program where to store the downloaded files. IF this path is:
LocalPath none
THEN no files are generated. Only a log file containing the remote server's WWW map is created in the current directory.

Currently, files should be stored in the root directory of the file system. For WWW servers, this is the server's root directory. (This limitation will be removed in future releases.) For DOS based machines, this path may include a drive letter:

LocalPath e:\server.www\
Either a slash "/" or a backslash "\" is valid for specifying a directory. The trailing slash or backslash is optional.

FileOverwrite

FileOverwrite boolean
example:
FileOverwrite TRUE
Files that already exist on the local system are not normally downloaded. Setting the FileOverwrite option to TRUE will overwrite files on the local file system. Default value is FALSE.

ISMAP

ISMAP absolute path to executable
example:
ISMAP /cgi-bin/imagemap
For WWW servers, many imagemaps use a program that takes coordinates from a selected image <IMG SRC=... ISMAP> and returns a new URL. Some of the more common methods use a data file containing known coordinates and a program to identify which URL is activated. Commonly, this program is called "imagemap" or "imagemap.exe".

The ISMAP parameter specifies the WWW server's path to the imagemap program.

MapType

MapType NCSA or CERN
example:
MapType NCSA
For the executable specified in the ISMAP parameter, this option determines the format of the file. If the image map file can be retrieved, then it is converted into this specified format. Valid options are either "CERN" or "NCSA". The default is NCSA.

FATFormat

FATFormat boolean
example:
FATFormat FALSE
Determines the file name format for the current operating system. DOS based machines using drives formatted with a File Allocation Table (FAT) can only handle file names containing 8 characters and a 3 character extension. Setting this option to TRUE will generate 8.3 character file names. The default is FALSE, and will generate unlimited length file names.

NOTE: Under DOS, this option is always TRUE (DOS only supports FAT file names). Under OS/2, this value becomes TRUE automatically if the destination path (LocalPath) is located on a FAT partition.
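
Templeton's actual 8.3 translation is not documented here. The Python sketch below shows one simple scheme, truncating the base name to 8 characters and the extension to 3, only to make the idea concrete. The function to_8_3 is hypothetical and ignores name collisions and characters that FAT forbids.

```python
def to_8_3(name):
    """Shorten a file name to at most 8 characters plus a 3 character extension."""
    base, dot, ext = name.rpartition(".")
    if not dot:
        base, ext = name, ""  # the name has no extension
    # Only one dot is allowed, so inner dots become underscores.
    base = base.replace(".", "_")[:8]
    ext = ext[:3]
    return f"{base}.{ext}" if ext else base


short_index = to_8_3("index.html")                  # 'index.htm'
short_archive = to_8_3("archive_from_1994.tar.gz")  # 'archive_.gz'
```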

Index

Index file name
example:
Index index.html
For hypertext references that only specify a directory, this is the default HTML file in the directory.

NOTE: if FATFormat is TRUE, the 8.3 name translation will be applied to this file name.

The default name is "index.html".
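
As an illustration of how the Index name is used, this Python sketch maps a URL to a local file name. The function local_name is hypothetical, and the layout it produces (the URL path appended to LocalPath) is an assumption.

```python
import posixpath
from urllib.parse import urlsplit


def local_name(url, local_path="/", index="index.html"):
    """Map a URL to a local file name, filling in the default document."""
    path = urlsplit(url).path or "/"
    if path.endswith("/"):
        path += index  # the reference names a directory only
    return posixpath.join(local_path, path.lstrip("/"))


directory = local_name("http://foo.com/people/")       # '/people/index.html'
image = local_name("http://foo.com/a/b.gif", "/mirror")  # '/mirror/a/b.gif'
```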

Server-File

Server-File filename
example:
Server-File serverfile
A data file is generated containing the host name, IP address, and WWW server type for each server visited. For servers listed as IP address only, the host name is also the IP address.

The default is no server logging.

Mailto-File

Mailto-File filename
example:
Mailto-File mailtofile
Similar to Server-File logging, the file name listed on the "Mailto-File" line contains a list of e-mail addresses found in the HTML documents. Only e-mail addresses that are active (hyperlinks) are used. E-mail addresses displayed as plain text in the document or contained in CGI scripts are not listed in the mailto logfile.

NOTE: This list MAY contain duplicate entries. Duplication removal may be added in later versions. This feature is very useful for generating mailing lists.

The default is no mailto logging.
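
The difference between active and plain-text addresses can be seen in a short Python sketch built on the standard html.parser module. The class name MailtoCollector and the sample page are invented. The sketch collects addresses in memory and does not write the log file.

```python
from html.parser import HTMLParser


class MailtoCollector(HTMLParser):
    """Collect addresses from <A HREF="mailto:..."> hyperlinks only."""

    def __init__(self):
        super().__init__()
        self.addresses = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value and value.lower().startswith("mailto:"):
                self.addresses.append(value[len("mailto:"):])


page = (
    '<p>Write to <a href="mailto:webmaster@host.machine.org">us</a> '
    "or to nobody@host.machine.org. "
    '<A HREF="mailto:webmaster@host.machine.org">Again</A></p>'
)
collector = MailtoCollector()
collector.feed(page)
found = collector.addresses
```

The plain-text address is skipped, and the repeated hyperlink appears twice in found, matching the note about duplicate entries.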


Network

User

User e-mail address
example:
User webmaster@host.machine.org
In case of emergency, this is the person who is running the program and who should be contacted to stop the program from running. This MUST be a valid e-mail address, and SHOULD also be available with a network "talk" command.

As a side note, it is never a good idea to let automatic software run unsupervised (especially this type of software). The "User" should be available to read their e-mail at all times during the execution of this program.

The default is the user running the program on the current machine, and the IP address of the current machine. For operating systems with no user accounts, such as DOS or OS/2, the username is taken from the USER environment variable, or "root" if the variable is undefined.

ProxyHost

ProxyHost hostname or IP address
example:
ProxyHost proxyhost.network.net
Proxy agents are machines that act as a gateway through a firewall. If your local network uses a proxy agent, specify the name of the proxy agent here. If you are uncertain about your network, consult your network manager or provider.

A proxy server is used only when one is specified with ProxyHost.

ProxyPort

ProxyPort integer
example:
ProxyPort 80
When using a proxy server, the port on the proxy server should be specified. The default port is 80. This value is not used if no proxy host is specified with ProxyHost.

Spoof

Spoof text-string
example:
Spoof Mozilla (Templeton)
Some WWW servers make incorrect assumptions about browsers and robots. (Most of these are Netscape servers.) These servers assume that, since the browser is not "Netscape", it cannot handle the HTML documents, and therefore the document is not transferred. By "spoofing" a different name, the WWW robot can use a qualified browser name to retrieve the HTML document.

NOTE: The first word of the spoof-name is used for restrictions when robot exclusion is honored (see Exclusion). This means, if Templeton tells the WWW server that it is "Netscape" and the server does not permit Netscape browsers, then the server will also not permit Templeton.
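
The effect described in this note can be reproduced with Python's urllib.robotparser. The exclusion file below is made up, and taking the first whitespace-separated word of the Spoof value is an assumption about how "first word" is determined.

```python
from urllib.robotparser import RobotFileParser

spoof = "Mozilla (Templeton)"
agent = spoof.split()[0]  # first word of the spoof name: 'Mozilla'

# Hypothetical exclusion file that shuts out one browser name
rules = RobotFileParser()
rules.parse(["User-agent: Mozilla", "Disallow: /"])

spoofed = rules.can_fetch(agent, "http://foo.com/index.html")           # False
unspoofed = rules.can_fetch("Templeton", "http://foo.com/index.html")   # True
```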

Common spoof names (and browsers) are:

Add

Add URL
example:
Add http://www.cs.tamu.edu/people/
This configuration option adds a URL to the list to be processed. Restrictions are applied. The "Add" feature makes automated operation of Templeton easy.

Multiple Add statements may be specified.


Advanced Settings

The advanced configuration commands should be used with caution. These commands allow other applications to perform tasks on the retrieved documents. Applications that are spawned (operate concurrently) with Templeton may overwhelm the user or operating system. Spawned applications include those begun with "start" under OS/2, or followed by "&" under Unix.

NOTE: Templeton has the capability to spawn thousands of applications in a few seconds. On Unix-type systems, Templeton introduces security risks when executed as root.

For applications that are not spawned, Templeton will pause until the application has ended. This allows for a guaranteed order of processing for the called applications.

Command_html

Command_html string
example:
Command_html /usr/local/bin/html2txt %s
Execute a system command on each HTML document stored on the file system. This may be useful for counting documents, storing statistics, printing, converting, etc. The string should contain the executable to run and a "%s" for the file name. The string "none" turns off this command. This is the default.

Command_image

Command_image string
example:
Command_image /usr/local/bin/viewpict %s
Execute a system command on each image-file stored on the file system. Similar to Command_html, Command_image is executed on all image files. This may be useful for counting documents, storing statistics, printing, converting, etc. NOTE: no distinction is made between different image formats. The string should contain the executable to run and a "%s" for the file name. The string "none" turns off this command. This is the default.

Command_map

Command_map string
example:
Command_map echo %s >> maplog
Execute a system command on each image-map stored on the file system. Similar to Command_html, Command_map is executed on all image-map files. This may be useful for counting documents, storing statistics, or converting. The string should contain the executable to run and a "%s" for the file name. The string "none" turns off this command. This is the default.

Command_default

Command_default string
example:
Command_default echo %s >> filelist
Execute a system command on each file stored on the file system. Similar to Command_html, Command_default is executed on all files that have no other executable specified. This may be useful for counting documents, storing statistics, printing, converting, etc. The string should contain the executable to run and a "%s" for the file name. The string "none" turns off this command. This is the default.
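
The "%s" substitution shared by the Command_ options can be sketched in Python as follows. The helper names are hypothetical. Quoting the file name with shlex.quote is a Unix-oriented precaution added here, not something this document says Templeton does, and treating "none" without regard to case is likewise an assumption.

```python
import shlex
import subprocess


def build_command(template, filename):
    """Return the shell command for one stored file, or None if disabled."""
    if template.strip().lower() == "none":
        return None  # "none" turns the command off
    # Precaution (not from the source): quote the name for the shell.
    return template.replace("%s", shlex.quote(filename))


def run_command(template, filename):
    """Run the command and wait for it to end, as for a non-spawned application."""
    command = build_command(template, filename)
    if command is not None:
        subprocess.run(command, shell=True, check=False)


html_cmd = build_command("/usr/local/bin/html2txt %s", "/tmp/a.html")
list_cmd = build_command("echo %s >> filelist", "my file.html")
```

Here html_cmd is "/usr/local/bin/html2txt /tmp/a.html" and list_cmd is "echo 'my file.html' >> filelist". Because subprocess.run waits for the command to finish, the commands run in a fixed order.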

Interactive

Interactive boolean
example:
Interactive TRUE
Interactive determines whether the user should be prompted for configuration information or if Templeton should start running automatically. The default setting is TRUE, causing Templeton to prompt for user interaction.
Document revision: 20 Oct. 1996 for Templeton 1.77 beta
Copyright 1996 N.A. Krawetz
Modification, republication, and redistribution of this document is strictly prohibited. All rights reserved.