The options may be classified into groups:
Keywords are not case sensitive (you may use upper- or lower-case letters), but the parameters of some options are, including options that take a text string or URL.
Lines beginning with a "#" character are treated as comments.
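The two rules above (case-insensitive keywords, "#" comments) can be illustrated with a small reader. This is a minimal Python sketch, not Templeton's own parser. It assumes one option per line, with the keyword separated from its parameter by whitespace; the name parse_config is hypothetical.

```python
def parse_config(text):
    """Split config text into (keyword, parameter) pairs.

    Sketch only: assumes one option per line, with the keyword
    separated from its parameter by whitespace.
    """
    options = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and "#" comments
        parts = line.split(None, 1)
        keyword = parts[0].lower()  # keywords are case insensitive
        param = parts[1] if len(parts) > 1 else ""  # parameters keep their case
        options.append((keyword, param))
    return options
```

Lower-casing only the keyword leaves URL and text parameters untouched, which matters for options such as Deny or Spoof.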
Register registration_code
example:
Register 12-34F67-891011
Registered software contains a unique registration code. Enter this code exactly as it is provided. If your site has multiple registrations, list each registration code on its own line starting with the keyword "Register".
Please read the licensing agreement for registration information.
RestrictHost boolean
example:
RestrictHost TRUE
This parameter informs the program not to leave the designated host. Links to machines not on the current host are not traversed.
RestrictPath absolute path
example:
RestrictPath /people
This parameter is used only when a host is restricted. When a host is restricted, a subpath on that host may also be restricted. Hypertext references to documents outside this subtree are not traversed. Either a slash "/" or a backslash "\" is valid for specifying a directory. The trailing slash or backslash is optional.
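The RestrictHost and RestrictPath checks can be sketched as a single predicate. This Python version is an illustration under stated assumptions, not Templeton's code: it treats the restricted path as a directory (so "/people" covers "/people/..." but not "/peoplex"), accepts either separator with an optional trailing one, and uses the hypothetical name within_restriction.

```python
from urllib.parse import urlsplit

def within_restriction(url, host, path=None):
    """Return True if url stays on host and, optionally, under path."""
    parts = urlsplit(url)
    if parts.hostname != host.lower():  # urlsplit lower-cases hostname
        return False  # RestrictHost: never leave the designated host
    if path is None:
        return True
    # RestrictPath: "/" or "\" both allowed, trailing separator optional
    base = path.replace("\\", "/").rstrip("/")
    target = parts.path or "/"
    return target == base or target.startswith(base + "/")
```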
RestrictDepth numeric value
example:
RestrictDepth 3
Hyperlinks are traversed in a breadth-first search. An unrestricted search may download an entire WWW server's data. By restricting the depth, only the portions of the server closest to the starting URL are retrieved. Images and non-href links are considered to be at the same depth as the document that contains them.
A restricted depth of 0 means no restriction. The default value is 1.
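The depth rule can be pictured as a breadth-first walk in which images inherit their page's depth. This Python sketch works on hypothetical in-memory link tables rather than a live server. It assumes the starting URL counts as depth 1; the text does not say whether the starting document is depth 0 or 1. crawl_order, links and images are illustrative names.

```python
from collections import deque

def crawl_order(start, links, images, max_depth):
    """Return (url, depth) pairs in breadth-first order.

    links:  hypothetical dict, page -> pages it hyperlinks (href)
    images: hypothetical dict, page -> inline images it uses
    max_depth: 0 means unlimited, as with RestrictDepth
    """
    seen = {start}
    order = []
    queue = deque([(start, 1)])  # assumption: the starting URL is depth 1
    while queue:
        url, depth = queue.popleft()
        order.append((url, depth))
        # images sit at the same depth as the page that refers to them
        for img in images.get(url, []):
            if img not in seen:
                seen.add(img)
                order.append((img, depth))
        if max_depth and depth >= max_depth:
            continue  # do not follow hyperlinks past the depth limit
        for nxt in links.get(url, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, depth + 1))
    return order
```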
RemoveRestricted boolean
example:
RemoveRestricted FALSE
This parameter tells the program to remove untraversed links. Links to restricted machines or beyond the restricted depth are removed from the HTML file, but the visible text remains (it is simply no longer a hyperlink). The default value is FALSE.
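Removing a link while keeping its visible text amounts to unwrapping the anchor. The Python sketch below shows the idea with a regular expression. This is a deliberate simplification that handles only double-quoted href attributes, not how Templeton necessarily rewrites HTML. remove_restricted and the is_allowed callback are hypothetical names.

```python
import re

# Simplified pattern for <a ... href="...">text</a>; not a full HTML parser
ANCHOR = re.compile(r'<a\s+[^>]*href="([^"]*)"[^>]*>(.*?)</a>',
                    re.IGNORECASE | re.DOTALL)

def remove_restricted(html, is_allowed):
    """Replace anchors whose target fails is_allowed with their visible text."""
    def unwrap(match):
        href, text = match.group(1), match.group(2)
        return match.group(0) if is_allowed(href) else text
    return ANCHOR.sub(unwrap, html)
```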
Exclusion boolean
example:
Exclusion TRUE
This parameter determines whether Templeton honors server-provided robot exclusion files (robots.txt). Many servers maintain exclusion files to keep robots from wandering through virtual directory trees and from retrieving temporary, incomplete, or copyrighted files. It is considered "polite" for web agents to obey exclusion files when they exist. The default value, TRUE, means that robot exclusion files are obeyed. Setting Exclusion to FALSE ignores robot exclusion files.
It should be noted that robot exclusion files that explicitly restrict Templeton will be honored regardless of the exclusion parameter.
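For readers who want to check what a given robots.txt permits, Python's standard urllib.robotparser module applies the same exclusion convention. The sketch below parses a hypothetical robots.txt for loco.com. It shows how rules that name Templeton differ from the generic "*" rules; it is not Templeton's own exclusion logic. allowed is an illustrative helper name.

```python
from urllib.robotparser import RobotFileParser

# A hypothetical robots.txt, as it might be served at http://loco.com/robots.txt
ROBOTS_TXT = """\
User-agent: Templeton
Disallow: /

User-agent: *
Disallow: /archive/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

def allowed(agent, url):
    """True if the exclusion rules let `agent` fetch `url`."""
    return parser.can_fetch(agent, url)
```

Here a generic robot may fetch everything except /archive/, while a robot identifying itself as Templeton is excluded from the whole site.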
Deny URL
example:
Deny http://foo.com/archive/
The URL provided, as well as all subtrees of the URL, is not processed. Specific directory subtrees are often undesirable; you can deny retrieval of these URLs with this setting.
For example, to NOT retrieve the "archive" subtree of the host loco.com, you would specify:
Deny http://loco.com/archive/
If you do not include the trailing slash (http://loco.com/archive), then all subdirectories beginning with "archive" are not processed. This includes "archive.1", "archive.old", "archive_from_1994", etc.
Deny statements may also include a '*' as a wildcard character, which matches 0 or more characters. If, for example, you do not wish to retrieve GIF files, you would use:
Deny *.gif
Only one '*' is permitted in each URL string, but it may be located anywhere within the string.
Multiple Deny statements may be specified.
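The Deny matching rules can be sketched as follows. This Python version assumes two things. First, a pattern without '*' matches as a plain prefix, which is what produces the trailing-slash behaviour described above. Second, with a '*', the text after the '*' must end the URL; the text does not state this. deny_matches is a hypothetical name.

```python
def deny_matches(pattern, url):
    """Return True if url falls under a Deny pattern (illustrative rules)."""
    if pattern.count("*") > 1:
        raise ValueError("only one '*' is permitted per pattern")
    if "*" not in pattern:
        # prefix match: "http://loco.com/archive" also covers "archive.old/..."
        return url.startswith(pattern)
    head, tail = pattern.split("*")
    # '*' stands for zero or more characters between head and tail
    return (len(url) >= len(head) + len(tail)
            and url.startswith(head) and url.endswith(tail))
```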
Allow URL
example:
Allow http://foo.com/archive/January/
Similar to "Deny", "Allow" explicitly specifies that a subtree is retrievable. Used together with Deny, it lets you permit access to selected branches of a subtree while the other branches are ignored.
Multiple Allow statements may be specified.
Authorize "realm" base64-code
example:
Authorize "Secret Password" ZHIubmVhbDpyZWdpc3RlciBtZQ==
This complex command allows you to specify a username and password for basic WWW authentication. The realm is a quoted string. The base64-code contains the encoded username and password. Use the pwd64.exe program to generate your base64-code.
The realm is a case-sensitive string provided by the WWW server. If you do not know the realm for the pages you wish to retrieve, use Templeton to interactively retrieve the page. Templeton will display the realm name and ask for your username and password.
Be aware that realms are not unique. If different documents use the same realm but require different passwords, Templeton will require you to enter the username and password.
To skip a realm, use the username "-" and password "-", or the base64-code: LTot
Authorize "Secret Password" LTot
See also Proxy-Authorize.
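The base64-code appears to be the standard HTTP Basic credential, that is, "username:password" encoded in base64. The skip value LTot is exactly the encoding of "-:-". The Python sketch below reproduces that encoding under this assumption. It is not pwd64 itself, and basic_auth_code is a hypothetical name. Base64 is an encoding, not encryption: the example code in the text decodes straight back to a readable username and password, so treat configuration files containing Authorize lines as sensitive.

```python
import base64

def basic_auth_code(username, password):
    """Base64-encode "username:password" for HTTP Basic authentication."""
    raw = f"{username}:{password}".encode("latin-1")
    return base64.b64encode(raw).decode("ascii")
```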
LocalPath absolute path
example:
LocalPath /
LocalPath tells the program where to store the downloaded files. If this path is:
LocalPath none
then no files are generated. Only a log file containing the remote server's WWW map is created in the current directory.
Currently, files should be stored in the root directory of the file system. For WWW servers, this is the server's root directory. (This limitation will be removed in future releases.) For DOS-based machines, this path may include a drive letter:
LocalPath e:\server.www\
Either a slash "/" or a backslash "\" is valid for specifying a directory. The trailing slash or backslash is optional.
FATFormat boolean
example:
FATFormat FALSE
Determines the file name format for the current operating system. DOS-based machines using drives formatted with a File Allocation Table (FAT) can only handle file names of at most 8 characters plus a 3-character extension. Setting this option to TRUE generates 8.3 file names. The default is FALSE, which generates unlimited-length file names.
NOTE: Under DOS, this option is always TRUE (DOS only supports FAT file names). Under OS/2, this value becomes TRUE automatically if the destination path (LocalPath) is located on a FAT partition.
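The text does not say how Templeton shortens long names, or how it avoids collisions when two names truncate to the same 8.3 form. The Python sketch below is a hypothetical, naive truncation, included only to show what the 8.3 limit means in practice. to_83 is an illustrative name, and collisions are not handled.

```python
import os

# Characters FAT short names do not accept (dots are removed from the stem)
INVALID = set(' ."/\\[]:;=,+*?<>|')

def to_83(name):
    """Naively truncate a long file name to DOS 8.3 form (illustrative only)."""
    stem, ext = os.path.splitext(name)
    ext = ext[1:]  # drop the leading dot
    stem = "".join(c for c in stem if c not in INVALID)[:8]
    ext = "".join(c for c in ext if c not in INVALID)[:3]
    return f"{stem}.{ext}" if ext else stem
```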
FileOverwrite boolean or "modified"
examples:
FileOverwrite TRUE
FileOverwrite MODIFIED
When FileOverwrite is TRUE, files that already exist on the local system are overwritten. Setting FileOverwrite to FALSE will not overwrite files on the local file system. Setting it to "Modified" (no quotes) retrieves only those documents (non-HTML) that have changed since the last retrieval. The modified option is useful when retrieving the same URL multiple times: it does not waste time retrieving GIF and JPG files that have already been retrieved. FileOverwrite does NOT affect HTML documents; HTML documents are always retrieved, because Templeton can only determine links by retrieving HTML documents, and skipping an HTML document would mean skipping possible links. The default value is MODIFIED, which retrieves only newer non-HTML files.
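The retrieval decision described above can be summarized in a small function. This Python sketch assumes two things the text does not spell out: files not yet present locally are always fetched, and a missing timestamp under MODIFIED counts as "changed". should_retrieve is a hypothetical name.

```python
def should_retrieve(is_html, exists_locally, overwrite,
                    remote_mtime=None, local_mtime=None):
    """Decide whether to download a document under FileOverwrite.

    overwrite is "TRUE", "FALSE" or "MODIFIED" (any case). Timestamps are
    comparable values such as seconds since the epoch.
    """
    if is_html or not exists_locally:
        return True  # HTML is always fetched; assumed: new files always fetched
    mode = overwrite.upper()
    if mode == "TRUE":
        return True
    if mode == "FALSE":
        return False
    if mode == "MODIFIED":
        # assumption: a missing timestamp is treated as "changed"
        if remote_mtime is None or local_mtime is None:
            return True
        return remote_mtime > local_mtime
    raise ValueError(f"unknown FileOverwrite value: {overwrite!r}")
```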
ISMAP absolute path to executable
example:
ISMAP /cgi-bin/imagemap
For WWW servers, many imagemaps use a program that takes coordinates from a selected image (<IMG SRC=... ISMAP>) and returns a new URL. Some of the more common methods use a data file containing known coordinates and a program that identifies which URL is activated. Commonly, this program is called "imagemap" or "imagemap.exe".
The ISMAP parameter specifies the WWW server's path to the imagemap program.
MapType NCSA or CERN
example:
MapType NCSA
For the executable specified in the ISMAP parameter, this option determines the format of the image map file. If the image map file can be retrieved, it is converted into the specified format. Valid options are "CERN" and "NCSA". The default is NCSA.
Index file name
example:
Index index.html
For hypertext references that specify only a directory, this is the default HTML file in that directory.
NOTE: if FATFormat is TRUE, the 8.3 name translation will be applied to this file name.
The default name is "index.html".
Server-File filename
example:
Server-File serverfile
A data file is generated containing the host name, IP address, and WWW server type for each server visited. For servers listed only by IP address, the host name is also the IP address.
The default is no server logging.
Mailto-File filename
example:
Mailto-File mailtofile
Similar to Server-File logging, the file named on the "Mailto-File" line receives a list of e-mail addresses found in the HTML documents. Only e-mail addresses that are active (hyperlinks) are used. E-mail addresses displayed as plain text in the document or contained in CGI scripts are not listed in the mailto log file.
NOTE: This list MAY contain duplicate entries. Duplicate removal may be added in later versions. The list is very useful for generating mailing lists.
The default is no mailto logging.
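Until duplicate removal is built in, a mailto list can be post-processed. The Python sketch below collects only hyperlinked (mailto:) addresses, matching the rule above, and keeps the first occurrence of each. It uses a simplified regular expression rather than an HTML parser, drops any "?subject=..." suffix, and uses the hypothetical name mailto_addresses.

```python
import re

# Simplified: only double-quoted href="mailto:..." attributes are recognized
MAILTO = re.compile(r'href\s*=\s*"mailto:([^"?]+)', re.IGNORECASE)

def mailto_addresses(html):
    """Return unique mailto: hyperlink targets in order of first appearance."""
    found = (addr.strip() for addr in MAILTO.findall(html))
    return list(dict.fromkeys(found))  # dict keeps insertion order, drops repeats
```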
User e-mail address
example:
User webmaster@host.machine.org
In case of emergency, this is the person who is running the program and who should be contacted to stop it. This MUST be a valid e-mail address, and SHOULD also be reachable with a network "talk" command.
As a side note, it is never a good idea to let automatic software run unsupervised (especially this type of software). The "User" should be available to read their e-mail at all times during the execution of this program.
The default is the user running the program on the current machine, and the IP address of the current machine. For operating systems with no user accounts, such as DOS or OS/2, the username is taken from the USER environment variable, or "root" if the variable is undefined.
Proxy-Authorize "realm" base64-code
example:
Proxy-Authorize "Secret Password" ZHIubmVhbDpyZWdpc3RlciBtZQ==
Similar to Authorize, this complex command allows you to specify a realm, username, and password for a secure HTTP proxy server. You should use the pwd64 program to help generate the base64-code from the username and password.
See also Authorize.
ProxyHost hostname or IP address
example:
ProxyHost proxyhost.network.net
Proxy agents are machines that act as a gateway through a firewall. If your local network uses a proxy agent, specify the name of the proxy agent here. If you are uncertain about your network, consult your network manager or provider.
A proxy server is used only when ProxyHost is specified.
ProxyPort integer
example:
ProxyPort 80
When using a proxy server, specify the port on the proxy server. The default port is 80. This value is not used if no proxy host is specified with ProxyHost.
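To see how ProxyHost and ProxyPort combine into a proxy address, here is a Python sketch using the standard urllib.request.ProxyHandler. It is an analogy for checking a proxy setting, not Templeton's implementation. The host and port are simply the example values above.

```python
import urllib.request

# Example values from the ProxyHost and ProxyPort entries above
proxy_host = "proxyhost.network.net"
proxy_port = 80

# Route plain http:// requests through host:port
handler = urllib.request.ProxyHandler({"http": f"http://{proxy_host}:{proxy_port}"})
opener = urllib.request.build_opener(handler)
# opener.open("http://www.cs.tamu.edu/people/") would now go through the proxy
```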
Spoof text-string
example:
Spoof Mozilla (Templeton)
Some WWW servers make incorrect assumptions about browsers and robots. (Most of these are Netscape servers.) These servers assume that since the browser is not "Netscape", it cannot handle the HTML documents, and therefore the document is not transferred. By "spoofing" a different name, the WWW robot can present a qualified browser name to retrieve the HTML document.
NOTE: The first word of the spoof name is used for restrictions when robot exclusion is honored (see Exclusion). This means that if Templeton tells the WWW server that it is "Netscape" and the server does not permit Netscape browsers, then the server will also not permit Templeton.
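In HTTP, the browser name a client reports travels in the User-Agent request header, so a spoof name presumably ends up there. The Python sketch below builds such a request with the standard urllib.request.Request. It also extracts the first word that, per the NOTE above, is matched against exclusion files. The request is constructed but never sent.

```python
import urllib.request

spoof = "Mozilla (Templeton)"  # value of the Spoof option

# The spoof name is reported to the server as the User-Agent header
request = urllib.request.Request("http://www.cs.tamu.edu/people/",
                                 headers={"User-Agent": spoof})

# Only the first word is used for robot exclusion matching
robot_name = spoof.split()[0]
```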
Common spoof names (and browsers) are:
Add URL
example:
Add http://www.cs.tamu.edu/people/
This configuration option adds a URL to the list to be processed. Restrictions are applied. The "Add" feature makes it easy to run Templeton automatically.
Multiple Add statements may be specified.
Sleep seconds
example:
Sleep 10
Sleep determines the number of seconds to pause before sending a request to a WWW server. SLEEP IS IMPORTANT.
Warning: Templeton can generate thousands of requests per minute. Many WWW servers cannot handle a sudden onslaught of requests. Setting the Sleep parameter to 0 (zero) may generate too many requests for the server and kill the server. This is bad.
A sleep setting of 0 is known to kill the following types of servers:
For safety, you should set the sleep interval to at least 5 seconds. The longer, the better. Remember, this program is automated and can easily run for hours. What's the rush?
Unregistered versions of Templeton cannot have a sleep period less than 5 seconds.
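The pacing rule is simple to express: wait before every request and never go below a floor. This Python sketch is illustrative. fetch stands in for whatever actually performs the request, and the minimum parameter mirrors the 5-second floor stated for unregistered versions. polite_fetch is a hypothetical name.

```python
import time

def polite_fetch(urls, fetch, sleep_seconds=10, minimum=5):
    """Call fetch(url) for each URL, pausing before every request.

    fetch is any callable taking a URL; minimum is the lowest allowed pause.
    """
    delay = max(sleep_seconds, minimum)  # never pause less than the floor
    results = []
    for url in urls:
        time.sleep(delay)  # pause before sending each request
        results.append(fetch(url))
    return results
```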
NOTE: Templeton has the capability to spawn thousands of applications in a few seconds. On Unix-type systems, Templeton introduces security risks when executed as root.
For applications that are not spawned, Templeton will pause until the application has ended. This guarantees the order in which the called applications are processed.
Command_default string
Command_html string
Command_image string
Command_map string
example:
Command_default echo %s >> filelist
Command_html /usr/local/bin/html2txt %s
Command_image /usr/local/bin/viewpict %s
Command_map echo %s >> maplog
Execute a system command on each document stored on the file system. The different command types are for HTML documents, images, and map files; the default command is used for any of these types whose specific command is not set. These commands are useful for counting documents, storing statistics, printing, converting, etc.
The string "none" turns off these commands. The default is "none".
The command string will replace special characters with desired information:
Characters: | Becomes: |
%d | depth |
%h | host (server) |
%l | local file |
%n | current time (now) in RFC 822 format. Time is shown in GMT. |
%N | current time (now) in RFC 822 format. Time is shown in the local timezone. |
%p | remote parent URL (where this link came from) |
%P | local parent file (file referring to this link) |
%r | remote file (URL without server information) |
%s | saved file (same as %l) |
%t | file timestamp in RFC 822 format. Time is shown in GMT. |
%T | file timestamp in RFC 822 format. Time is shown in the local timezone. |
%u | URL |
%% | % (percent sign) |
NOTE: Command_image and Command_default do not distinguish between different file formats.
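The substitution in the table above can be sketched as a single left-to-right scan, so that "%%" becomes a literal percent sign. This Python version is illustrative only. It takes the codes from a hypothetical values dictionary, leaves unknown codes untouched, and ignores the {modifier} time formats. expand is an illustrative name.

```python
def expand(template, values):
    """Replace %x codes in a Command_* string; %% yields a literal %."""
    out = []
    i = 0
    while i < len(template):
        ch = template[i]
        if ch == "%" and i + 1 < len(template):
            code = template[i + 1]
            if code == "%":
                out.append("%")
            else:
                out.append(values.get(code, "%" + code))  # unknown codes kept as-is
            i += 2
        else:
            out.append(ch)
            i += 1
    return "".join(out)
```

Values are inserted verbatim. File names containing spaces or shell metacharacters would need quoting (for example with Python's shlex.quote) before the result is handed to a shell.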
The time formats %t (in Greenwich Mean Time [GMT], also called Coordinated Universal Time) and %T (in local time) show the modification time of the file on the server. If the server does not provide the file modification time, the time Templeton found the file is used. Alternatively, %n and %N can be used in place of %t and %T to show the current time. The format can be specified using modifiers enclosed in '{' and '}'. These modifiers ARE case sensitive.
Characters: | Format: | Example (GMT; local): |
%t or %n, %T or %N | Default format: RFC 822 | Sat, 04 Nov 1995 01:29:45 GMT; Sat, 04 Nov 1995 07:29:45 |
%t{rfc822} or %n{rfc822}, %T{rfc822} or %N{rfc822} | RFC 822: the most common format used by web servers. | Sat, 04 Nov 1995 01:29:45 GMT; Sat, 04 Nov 1995 07:29:45 |
%t{rfc850} or %n{rfc850}, %T{rfc850} or %N{rfc850} | RFC 850: uncommon, but standard. Most applications using this format are not "Year 2000" compliant. | Saturday, 04-Nov-95 01:29:45 GMT; Saturday, 04-Nov-95 07:29:45 |
%t{ansi-c} or %n{ansi-c}, %T{ansi-c} or %N{ansi-c} | ANSI C: common computer (Unix) and programming format. | Sat Nov  4 01:29:45 1995; Sat Nov  4 07:29:45 1995 |
%t{iso8601} or %n{iso8601}, %T{iso8601} or %N{iso8601} | ISO 8601: international standard, commonly found in databases. | 1995-11-04 01:29:45Z; 1995-11-04 07:29:45 |
%t{iso8601c} or %n{iso8601c}, %T{iso8601c} or %N{iso8601c} | ISO 8601 compressed: commonly used when data size is at a premium. | 19951104T012945Z; 19951104T072945 |
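The GMT column of the modifier table can be reproduced with Python's standard datetime module, which is handy for checking what a log line should look like. This sketch covers only the %t/%n (GMT) variants. It assumes the default C locale, since %a, %A and %b are locale dependent. PATTERNS and format_gmt are hypothetical names, not part of Templeton.

```python
from datetime import datetime, timezone

# strftime patterns for the GMT column of the modifier table
PATTERNS = {
    "rfc822": "%a, %d %b %Y %H:%M:%S GMT",
    "rfc850": "%A, %d-%b-%y %H:%M:%S GMT",
    "iso8601": "%Y-%m-%d %H:%M:%SZ",
    "iso8601c": "%Y%m%dT%H%M%SZ",
}

def format_gmt(moment, style="rfc822"):
    """Render a timezone-aware datetime in GMT using one modifier style."""
    utc = moment.astimezone(timezone.utc)
    if style == "ansi-c":
        return utc.ctime()  # asctime layout: day of month padded with a space
    return utc.strftime(PATTERNS[style])
```

For the local-time variants, convert with astimezone() to the local zone and drop the "GMT" or "Z" suffix.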
Command_url string
example:
Command_url echo %u >> urllog
Similar to Command_html, the command string is executed for every URL found. This includes other protocols such as "ftp://", "gopher://", and "mailto:". No effort is made toward uniqueness; the same URL may be seen hundreds of times.
Because this command is processed each and every time a URL is found, it may significantly slow the runtime performance of Templeton.
The string "none" turns off this command. The default is "none".
The command string will replace special characters with desired information:
Characters: | Becomes: |
%d | depth |
%h | host (server) |
%n | current time (now) shown in GMT |
%N | current time (now) shown in local |
%p | remote parent URL (where this link came from) |
%P | local parent file (file referring to this link) |
%r | remote file (URL without server information) |
%t | time shown in GMT (same as %n) |
%T | time shown in local (same as %N) |
%u | URL |
%% | % (percent sign) |
NOTE: Command_url does not distinguish between different file formats. Also, the execution of Command_url does not affect the execution of Command_default, Command_html, Command_image, or Command_map.
The time formats %t (in Greenwich Mean Time [GMT], also called Coordinated Universal Time) and %T (in local time) show the time the URL was found, NOT the timestamp on the file. Alternatively, %n and %N can be used in place of %t and %T. The time format can be specified using modifiers enclosed in '{' and '}'. See Command_default for the modifier descriptions.
Interactive boolean
example:
Interactive TRUE
Interactive determines whether the user should be prompted for configuration information or whether Templeton should start running automatically. The default setting is TRUE, which causes Templeton to prompt for user interaction.