Sample Configuration File

Here is a sample configuration file for Templeton. Templeton can accept multiple configuration files, so the system can have default settings, individual users can have their own settings, and special instances of Templeton can run with further changes.
# *************************************************************
#  Templeton, copyright 1995, 1996 N.A. Krawetz
#  All rights reserved.
# *************************************************************

# configuration for Templeton
#
# Lines beginning with a '#' are comments and are ignored.
# Lines should not be more than 80 characters.
# Operands in this file are in the form:
#    parameter value
# Parameter names are case insensitive.
# Boolean values ("true" or "false") are case insensitive.
# Numeric values should be numbers -- non-numbers are regarded as 0.
# All other types of values, such as text strings and URLs, ARE case
# sensitive.


# ************************************************
# Register: registration code
# Software that is registered contains a unique registration
# code.  This code should be entered exactly as it is provided.
# If your site contains multiple registrations, you may list
# each registration code on a line starting with the
# keyword "Register".
# Please read the licensing agreement for registration
# information.
#   Register 12-34567-891011
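# A site with two registrations would list one code per line.  Both codes
# below are made-up placeholders, not real registration codes:
#   Register 12-34567-891011
#   Register 98-76543-210000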

# ************************************************
# LocalPath: absolute path
# LocalPath informs the program where to store the downloaded files.
# IF this path is:
#   LocalPath none
# THEN no files are generated.  Only a log file containing the remote
# server's WWW map is created in the current directory.
#
# Currently, files should be stored in the root directory of the file system.
# For WWW servers, this is the server's root directory.
# (This limitation will be removed in future releases.)
# For DOS based machines, this path may include a drive letter:
#   LocalPath e:\server.www\
#
# Either slash "/" or backslash "\" is valid for specifying a directory.
# The trailing slash or backslash is optional.
#
# This option is only used when the "Interactive" option is FALSE.
LocalPath /

# User: e-mail address
# In case of emergency, this is the person who is running the program
# and who should be contacted to stop the program from running.
# This MUST be a valid e-mail address, and SHOULD also be available with
# a "talk" command.
# As a side note, it is never a good idea to let automatic software run
# unsupervised (especially this type of software).  The "User" should be
# available to read their e-mail at all times during the execution of this
# program.
# The default is the account running the program on the current machine.
#  User webmaster@host.machine.org


# ********************* Restrictions *****************************

# RestrictHost: boolean
# This parameter informs the program not to leave the designated host.  Links
# to machines not on the current host are not traversed.
RestrictHost TRUE

# RestrictPath: absolute path
# This parameter is only used when a host is restricted.
# When a host is restricted, a subpath on that host may also be restricted.
# Hypertext references to documents outside this subtree are not traversed.
# Either slash "/" or backslash "\" is valid for specifying a directory.
# The trailing slash or backslash is optional.
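# For example (the "docs" subtree is hypothetical), to stay on the starting
# host and below /docs/, the two settings would be combined as:
#   RestrictHost TRUE
#   RestrictPath /docs/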
RestrictPath /

# RestrictDepth: numeric value
# Hyperlinks are traversed in a breadth-first search.  An unrestricted search
# may download an entire WWW server's data.  By restricting the depth,
# only immediate portions of the server will be received.
# Images and non-href links are considered to be at the same depth as the
# document.
# A restricted depth of 0 means no restriction.
# The default is 1
RestrictDepth 1

# RemoveRestricted: boolean
# This parameter informs the program to remove untraversed links.  Links to
# restricted machines or restricted depths are removed from the HTML file,
# but the visible text is still available (just not as a hyperlink).
# The default value is FALSE.
RemoveRestricted FALSE

# Add: URL
# Place a specific URL on the list of URLs to process.
# Be aware that restrictions apply.
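# For example (a hypothetical URL), to queue one additional document:
#   Add http://loco.com/docs/index.html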

# Exclusion: boolean
# This parameter determines whether Templeton will support server provided
# robot exclusion files (robots.txt).  Many servers maintain exclusion files
# to prevent robots from wandering around virtual directory trees, from
# retrieving very temporary or incomplete files, or copyrighted materials.  It
# is considered "polite" for web agents to obey the exclusion files when they
# exist.  The default value, TRUE, means that robot exclusion files are obeyed.
# Setting Exclusion to FALSE will ignore robot exclusion files.
Exclusion TRUE

# Deny: URL
# The URL provided, as well as all subtrees of the URL, are not processed.
# Many times specific directory subtrees are not desirable.  You can deny
# retrieval of these URLs using this setting.
# For example, to NOT retrieve the "archive" subtree of the host loco.com,
# you would specify:
#   Deny http://loco.com/archive/
# If you do not include the trailing slash (http://loco.com/archive) then
# all subdirectories beginning with "archive" are not processed.  This
# includes "archive.1", "archive.old", "archive_from_1994", etc.
# Multiple Deny statements may be specified.

# Allow: URL
# Similar to "Deny", "Allow" explicitly specifies that a subtree is
# retrievable.  When used in conjunction with Deny URL, branches of a
# subtree may be specified for access, while other subtrees are ignored.
# Multiple Allow statements may be specified.
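# For example (a sketch; the "public" branch is hypothetical), to skip the
# "archive" subtree of loco.com while still retrieving one branch of it:
#   Deny http://loco.com/archive/
#   Allow http://loco.com/archive/public/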

# Sleep: numeric
# Sleep determines the number of seconds to pause before sending a request to
# a WWW server.  SLEEP IS IMPORTANT.
# Warning: Templeton can generate thousands of requests per minute.  Many 
# WWW servers cannot handle a sudden onslaught of requests.  Setting the
# Sleep parameter to 0 (zero) may generate too many requests for the server
# and kill the server.  This is bad.
# A sleep setting of 0 (zero) is known to kill the following types of servers:
#   All WWW servers that run under Microsoft Windows (TM)
#   Old generation (HTTP/1.0) CERN servers on all platforms
# Low sleep values may also generate large amounts of network traffic and
# hog network resources.
# For safety, you should set the sleep interval to at least 5 seconds.
# The longer, the better.  Remember, this program is automated and can
# easily run for hours.  What's the rush?
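# As a rough worked example (assuming one request per Sleep interval and
# ignoring transfer time): at Sleep 10, Templeton sends at most 6 requests
# per minute, so retrieving 1000 documents takes at least 10000 seconds,
# or nearly 3 hours.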
Sleep 10


# ********************* Network *****************************

# ProxyHost: hostname or IP address
# Proxy agents are machines that act as a gateway through a firewall.
# If your local network uses a proxy agent, specify the name of
# the proxy agent here.  If you are uncertain about your network, consult your
# network manager or provider.
# A proxy server is only used when a proxy host is specified.
#  ProxyHost	proxyhost.network.net

# ProxyPort: integer
# When using a proxy server (see ProxyHost), the port on the proxy server
# should be specified.  The default port is 80.  This value is not
# used if no proxy host is specified with ProxyHost.
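# For example (a sketch; port 8080 is an assumed value for your site), a
# proxy listening on a non-default port would be configured as:
#   ProxyHost proxyhost.network.net
#   ProxyPort 8080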
ProxyPort	80

# Spoof: text-string
# Some WWW servers make incorrect assumptions about browsers and robots.
# (Most of these are Netscape servers.)  These servers assume that, since
# the browser is not "Netscape", it cannot handle the HTML documents, and
# therefore the document is not transferred.  By "spoofing" a different
# name, the WWW robot can use a qualified browser name to retrieve the HTML
# document.
# NOTE: The first word of the spoof-name is used for restrictions when 
# robot exclusion is honored (see Exclusion).  This means, if Templeton tells
# the WWW server that it is "Netscape" and the server does not permit
# Netscape browsers, then the server will also not permit Templeton.
# Common spoof names (and browsers) are:
#   Mozilla      Netscape browser
#   WebCrawler   WebCrawler robot
#   InfoSeek     InfoSeek robot
#   WebExplorer  IBM WebExplorer for OS/2
#   Harvest      a web robot
#   Mosaic       NCSA Mosaic
#   Lynx         Lynx, text browser
#   Microsoft Internet Explorer
#   PRODIGY-WB   Prodigy browser
# Spoof Mozilla (Templeton)


# ********************* Preferences *****************************

# FATFormat: boolean
# Determines the file name format for the current operating system.
# DOS based machines using drives formatted with a File Allocation Table (FAT)
# can only handle file names containing 8 characters and a 3 character
# extension.  Setting this option to TRUE will generate 8.3 character file
# names.  The default is FALSE, and will generate unlimited length file names.
# NOTE: Under DOS, this option is always TRUE (DOS only supports FAT file
# names).  Under OS/2, this value becomes TRUE automatically if the destination
# path (LocalPath) is located on a FAT partition.
FATFormat FALSE

# FileOverwrite: boolean
# Files that already exist on the local system are not normally downloaded.
# Setting the FileOverwrite option to TRUE will overwrite files on the
# local file system.  Default value is FALSE.
FileOverwrite TRUE

# Index: file name
# For hypertext references that only specify a directory, this is the
# default HTML file in the directory.
# NOTE: if FATFormat is TRUE, the 8.3 name translation will be applied to
# this file name.
# The default name is "index.html"
Index index.html

# ISMAP: absolute path to executable
# For WWW servers, many imagemaps use a program that takes coordinates from
# a selected image <IMG SRC=... ISMAP> and returns a new URL.  Some of the
# more common methods use a data file containing known coordinates and a
# program to identify which URL is activated.  Commonly, this program is
# called "imagemap" or "imagemap.exe".
# The ISMAP parameter specifies the WWW server's path to the imagemap program.
ISMAP /cgi-bin/imagemap

# MapType: NCSA or CERN
# For the executable specified in the ISMAP parameter (see above), this
# option determines the format of the file.  If the image map file can be
# retrieved, then it is converted into this specified format.
# Valid options are either "CERN" or "NCSA".  The default is NCSA.
MapType NCSA


# ********************* Logging *****************************
# Mailto-File: file name
# Similar to "Server-File" logging, the file name listed on the "Mailto-File"
# line contains a list of e-mail addresses found in the HTML documents.  Only
# e-mail addresses that are active (hyperlinks) are used.  E-mail addresses
# displayed as plain text in the document or contained in CGI scripts are not
# listed in the mailto logfile.
# NOTE:  This list MAY contain duplicate entries.  Duplication removal may be
# added in later versions.
# (Some people have found this to be a very useful feature for generating
# mailing lists.)
# The default is no mailto logging.
# Mailto-File mailtolist

# RemoteMapping: boolean
# Determines whether remote mapping will be done.  The default is TRUE,
# which performs mapping.  The map file name is mapindex.html and is
# either located at the root of the LocalPath or in the current directory
# if the system is not mirroring files.
# Note: if you change the default index name, for example, to "welcome.html"
# then the default map file will be "mapwelcome.html".
RemoteMapping TRUE

# Server-File: file name
# A data file is generated containing the host name, IP address, and
# WWW server type for each server visited.  For servers listed as IP
# address only, the host name is also the IP address.
# The default is no server logging.
# Server-File serverlist


# ********************* Advanced *****************************
# The advanced configuration commands should be used with caution.
# These commands allow other applications to perform tasks on the
# retrieved documents.  Applications that are spawned (operate
# concurrently) with Templeton may overwhelm the user or operating system.
# Spawned applications include those begun with "start" under OS/2,
# or followed by "&" under Unix.
# NOTE: Templeton has the capability to spawn thousands of applications
# in a few seconds.
# On Unix-type systems, Templeton introduces security risks when executed
# as root.
# For applications that are not spawned, Templeton will pause until
# the application has ended.  This allows for a guaranteed order of processing
# for the called applications.
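# For example, a sketch reusing html2txt (not provided with the Templeton
# distribution).  On Unix, Templeton waits for each conversion to finish
# with:
#   Command_html html2txt %s
# while a trailing "&" spawns each conversion concurrently:
#   Command_html html2txt %s &
# On OS/2, the spawned form would begin with "start":
#   Command_html start html2txt %s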

# Command_html: string
# Execute a system command on each HTML document stored on the file system.
# This may be useful for counting documents, storing statistics, printing,
# converting, etc.
# The string should contain the executable to run and a %s for the file name.
# The string "none" turns off this command.  This is the default.
# For example: to convert all HTML documents to text using the program
# html2txt (not provided with the Templeton distribution), you would use:
#   Command_html html2txt %s

# Command_image: string
# Execute a system command on each image-file stored on the file system.
# Similar to Command_html, Command_image is executed on all image files.
# This may be useful for counting documents, storing statistics, printing,
# converting, etc.  NOTE: no distinction is made between different image
# formats.
# The string should contain the executable to run and a %s for the file name.
# The string "none" turns off this command.  This is the default.

# Command_map: string
# Execute a system command on each image-map stored on the file system.
# Similar to Command_html, Command_map is executed on all image-map files.
# This may be useful for counting documents, storing statistics, or converting.
# The string should contain the executable to run and a %s for the file name.
# The string "none" turns off this command.  This is the default.

# Command_default: string
# Execute a system command on each file stored on the file system.
# Similar to Command_html, Command_default is executed on all files that have
# no other executable specified.  This may be useful for counting documents,
# storing statistics, printing, converting, etc.
# The string should contain the executable to run and a %s for the file name.
# The string "none" turns off this command.  This is the default.

# Interactive: boolean
# Determines whether the user should be prompted for
# configuration information or if Templeton should
# start running automatically.
# The default setting is TRUE.
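# For example, a sketch of an unattended run: with Interactive FALSE,
# Templeton starts without prompting and the LocalPath setting above is used:
#   Interactive FALSE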


Document revision: 20 Oct. 1996 for Templeton 1.77 beta
Copyright 1996 N.A. Krawetz
Modification, republication, and redistribution of this document is strictly prohibited. All rights reserved.