The options may be classified into groups:
Keywords are not case sensitive (you may use upper- or lower-case letters), but the parameters of some options are, including options that take a text string or URL.
Lines beginning with a "#" character are treated as comments.
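As a rough illustration of these rules, the sketch below splits configuration text into keyword/value pairs. It assumes one option per line, with the keyword separated from its parameter by whitespace; parse_config is a hypothetical helper, not part of Templeton. Keywords are folded to lower case, while parameters are kept exactly as written because some of them are case sensitive.

```python
def parse_config(text):
    """Split Templeton-style configuration text into (keyword, value) pairs.

    Hypothetical helper: keywords are lower-cased because they are not case
    sensitive; values are kept verbatim because URLs and strings may be.
    """
    entries = []
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        parts = line.split(None, 1)  # keyword, then the rest of the line
        keyword = parts[0].lower()
        value = parts[1] if len(parts) > 1 else ""  # e.g. "Reset" has no value
        entries.append((keyword, value))
    return entries
```

For example, "restricthost TRUE" and "RestrictHost TRUE" produce the same keyword, while a URL such as "http://loco.com/Archive/" keeps its capital letter.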
Register registration_code
example:
Register 12-34F67-891011
Software that is registered contains a unique registration code. Enter this code exactly as it is provided. If your site holds multiple registrations, list each registration code on its own line starting with the keyword "Register".
Please read the licensing agreement for registration information.
RestrictDepth numeric value
example:
RestrictDepth 3
Hyperlinks are traversed in a breadth-first search. An unrestricted search may download an entire WWW server's data. Restricting the depth limits retrieval to the portions of the server nearest the starting document. Images and non-href links are considered to be at the same depth as the document that contains them.
A restricted depth of 0 means no restriction. The default value is 1.
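The traversal described above can be sketched as a breadth-first search with a depth cut-off. This is an illustration under stated assumptions, not Templeton's code: the link graph is given as a dictionary instead of being downloaded, the start URL is taken to be at depth 1 with each followed hyperlink adding 1, images are left out, and crawl_order is a hypothetical name.

```python
from collections import deque


def crawl_order(graph, start, restrict_depth):
    """Breadth-first traversal of a link graph with an optional depth limit.

    graph maps each URL to the URLs it links to (a stand-in for downloading
    and parsing pages). ASSUMPTION: the start URL is at depth 1, each followed
    hyperlink adds 1, and restrict_depth == 0 means "no restriction".
    Returns (url, depth) pairs in the order they would be retrieved.
    """
    depth_of = {start: 1}
    order = []
    queue = deque([start])
    while queue:
        url = queue.popleft()
        depth = depth_of[url]
        order.append((url, depth))
        if restrict_depth and depth >= restrict_depth:
            continue  # retrieved, but its links are not followed
        for link in graph.get(url, []):
            if link not in depth_of:  # first (shallowest) discovery wins
                depth_of[link] = depth + 1
                queue.append(link)
    return order
```

Because a queue is used, every document at depth 2 is retrieved before any document at depth 3, which is what keeps a small RestrictDepth close to the starting page.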
RestrictDuration HH:MM
example:
RestrictDuration 2:30
Templeton can run for hours or days. You can limit the run time by entering the maximum number of hours (HH) and minutes (MM). The number of hours may exceed 24. Entering 0:0 disables this option. The default value is 0:0.
Note: when using DNS to resolve hostnames, Templeton may run beyond the stopping time by as much as 2 minutes. This is due to a known limitation with hostname resolution.
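Computing the stopping time from RestrictDuration is simple date arithmetic. The sketch below (duration_deadline is a hypothetical name) shows it, including a duration longer than 24 hours; it does not model the DNS overrun mentioned in the note.

```python
from datetime import datetime, timedelta


def duration_deadline(value, start: datetime):
    """Return when a run limited by "RestrictDuration HH:MM" should stop.

    Returns None when the option is disabled (0:0). Hours may exceed 24.
    Hypothetical helper for illustration only.
    """
    hours, minutes = (int(part) for part in value.split(":"))
    if hours == 0 and minutes == 0:
        return None
    return start + timedelta(hours=hours, minutes=minutes)
```

A run started at noon with "RestrictDuration 36:00" would stop at midnight two days later.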
RestrictHost boolean
example:
RestrictHost TRUE
This parameter tells the program not to leave the designated host. Links to machines other than the current host are not traversed.
RestrictImages boolean
example:
RestrictImages FALSE
Most HTML documents contain both text and graphics. Frequently, these graphics come from links to other computers. When retrieval is restricted to a specific host, these images would not be retrieved because they are not on that host. Setting RestrictImages to FALSE allows inline graphics and image maps to be retrieved from a remote host, but does not affect restrictions on hyperlinks. Setting the value to TRUE applies all restrictions to all files (images, text, etc.).
This option is for people who want to mirror complete web documents, including images stored on other hosts, not just the files on one site. The default value is FALSE, indicating that entire documents should be retrieved.
Note: images and image maps restricted by the Deny configuration option or by robot exclusion are not retrieved.
RestrictPath absolute path
example:
RestrictPath /people
This parameter is used only when a host is restricted. When a host is restricted, a subpath on that host may also be restricted. Hypertext references to documents outside this subtree are not traversed. Either slash "/" or backslash "\" is valid for specifying a directory. The trailing slash or backslash is optional.
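A sketch of the combined RestrictHost/RestrictPath test follows. It assumes RestrictPath names a directory subtree, so "/people" covers "/people/..." but not "/peoplex/"; within_restriction is a hypothetical name, and port numbers are ignored.

```python
from urllib.parse import urlsplit


def within_restriction(url, host, restrict_path=None):
    """Sketch of RestrictHost plus RestrictPath filtering (hypothetical helper).

    ASSUMPTION: RestrictPath names a directory subtree, so "/people" covers
    "/people/..." but not "/peoplex/".
    """
    parts = urlsplit(url)
    if (parts.hostname or "").lower() != host.lower():
        return False  # RestrictHost: never leave the designated host
    if not restrict_path:
        return True
    # Either slash or backslash may be used, and the trailing one is optional.
    prefix = restrict_path.replace("\\", "/").rstrip("/")
    path = parts.path or "/"
    return path == prefix or path.startswith(prefix + "/")
```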
RestrictStopTime HH:MM
example:
RestrictStopTime 14:35
Templeton can run for hours or days. You can specify a stop time by entering the hour (HH) and minute (MM). Times are given in 24-hour notation, where 1 PM is 13:00, and so on. This option only works within a 24-hour period. Midnight is 24:00, and one minute after midnight is 00:01. Invalid times, such as 28:00, are ignored. Specifying 0:0 (the default value) disables this option.
Note: when using DNS to resolve hostnames, Templeton may run beyond the stopping time by as much as 2 minutes. This is due to a known limitation with hostname resolution.
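RestrictStopTime, by contrast, names a clock time rather than a length of time. A sketch of turning HH:MM into the next matching moment follows. The rule that a time already passed today refers to tomorrow is an assumption based on the 24-hour limit described for this option, and stop_deadline is a hypothetical name.

```python
from datetime import datetime, timedelta


def stop_deadline(value, now: datetime):
    """Return the next moment matching "RestrictStopTime HH:MM", or None.

    None means the option is off (0:0) or the time is invalid and ignored.
    ASSUMPTION: a time that has already passed today refers to tomorrow,
    which keeps the stop time within the next 24 hours.
    """
    hours, minutes = (int(part) for part in value.split(":"))
    if hours == 0 and minutes == 0:
        return None  # 0:0 disables the option
    if (hours < 0 or minutes < 0 or minutes > 59 or hours > 24
            or (hours == 24 and minutes != 0)):
        return None  # invalid times such as 28:00 are ignored
    midnight = now.replace(hour=0, minute=0, second=0, microsecond=0)
    stop = midnight + timedelta(hours=hours, minutes=minutes)  # 24:00 = next midnight
    if stop <= now:
        stop += timedelta(days=1)
    return stop
```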
RemoveRestricted boolean
example:
RemoveRestricted FALSE
This parameter tells the program to remove untraversed links. Links to restricted machines or beyond the restricted depth are removed from the HTML file, but the visible text remains (it is just no longer a hyperlink). The default value is FALSE.
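One way to picture this is a substitution that keeps an anchor's text but drops the <A HREF=...> wrapper whenever its URL is restricted. The sketch below uses a regular expression and a caller-supplied is_restricted test; both are assumptions for illustration, and a regular expression is far less robust than a real HTML parser (only double-quoted HREF values are handled).

```python
import re

# Simplification: only double-quoted href values are recognized.
ANCHOR = re.compile(r'<a\s[^>]*?href\s*=\s*"([^"]*)"[^>]*>(.*?)</a>',
                    re.IGNORECASE | re.DOTALL)


def remove_restricted(html, is_restricted):
    """Replace links whose URL is restricted with their visible text."""
    def unlink(match):
        url, text = match.group(1), match.group(2)
        return text if is_restricted(url) else match.group(0)
    return ANCHOR.sub(unlink, html)
```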
Exclusion boolean
example:
Exclusion TRUE
This parameter determines whether Templeton supports server-provided robot exclusion files (robots.txt). Many servers maintain exclusion files to keep robots from wandering around virtual directory trees and from retrieving temporary or incomplete files or copyrighted material. It is considered "polite" for web agents to obey exclusion files when they exist. The default value, TRUE, means that robot exclusion files are obeyed. Setting Exclusion to FALSE ignores robot exclusion files.
It should be noted that robot exclusion files that explicitly restrict Templeton will be honored regardless of the exclusion parameter.
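A sketch of this policy, built on Python's standard urllib.robotparser module rather than on Templeton's own code, follows. Two assumptions are labeled in it: the agent name checked is the first word of the Spoof string (as described under Spoof), and any User-agent group mentioning "Templeton" is honored even when Exclusion is FALSE. may_retrieve is a hypothetical name.

```python
from urllib.robotparser import RobotFileParser


def may_retrieve(robots_txt, url, spoof_name, exclusion=True):
    """Sketch of the Exclusion rules using Python's standard robots.txt parser.

    ASSUMPTIONS: the agent name matched is the first word of the Spoof string,
    and a group whose User-agent line mentions "Templeton" is honored even
    when exclusion is False.
    """
    lines = robots_txt.splitlines()
    parser = RobotFileParser()
    parser.parse(lines)
    agent = spoof_name.split()[0]
    allowed = parser.can_fetch(agent, url) if exclusion else True
    names_templeton = any(
        key.strip().lower() == "user-agent" and "templeton" in value.lower()
        for key, sep, value in (line.partition(":") for line in lines)
        if sep
    )
    if names_templeton:
        # Rules aimed at Templeton by name apply regardless of Exclusion.
        allowed = allowed and parser.can_fetch("Templeton", url)
    return allowed
```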
LinkChecking boolean
example:
LinkChecking TRUE
The LinkChecking flag allows Templeton to verify unretrieved links. URLs that are not retrieved (restricted) are checked for validity.
When LinkChecking is TRUE, all unretrieved HTTP URLs are checked for validity. This setting makes sure that all URLs are valid (working). The "TRUE" setting cannot determine if the URL points to the correct document, only that it points to an existing document.
The "FALSE" setting (default) ignores restricted URLs.
Non-HTTP URLs, such as "FTP://", "NEWS://", and "MAILTO:", are not checked.
Deny URL
example:
Deny http://foo.com/archive/
The URL provided, as well as all subtrees of the URL, is not processed. Specific directory subtrees are often undesirable; you can deny retrieval of these URLs with this setting.
For example, to NOT retrieve the "archive" subtree of the host loco.com, you would specify:
Deny http://loco.com/archive/
If you do not include the trailing slash (http://loco.com/archive), then all subdirectories beginning with "archive" are not processed. This includes "archive.1", "archive.old", "archive_from_1994", etc.
Deny statements may also include '*' as a wildcard character. This symbol matches 0 or more characters. If, for example, you do not wish to retrieve GIF files, you would use:
Deny *.gif
Multiple Deny statements may be specified.
Allow URL
example:
Allow http://foo.com/archive/January/
Similar to "Deny", "Allow" explicitly specifies that a subtree is retrievable. When used together with Deny, specific branches of a subtree can be made accessible while the other branches are ignored.
Multiple Allow statements may be specified.
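The Deny and Allow rules can be sketched as pattern tests on the full URL. This sketch assumes that a pattern only needs to match the beginning of a URL (which yields the subtree and trailing-slash behavior of Deny), that '*' matches any run of characters, and that a matching Allow entry overrides Deny; the function names are hypothetical.

```python
import re


def wildcard_prefix(pattern):
    """Compile a Deny/Allow pattern: '*' matches zero or more characters and
    the pattern only has to match the beginning of the URL, so a subtree such
    as http://loco.com/archive/ covers everything beneath it."""
    return re.compile(".*".join(re.escape(part) for part in pattern.split("*")))


def is_retrievable(url, deny, allow):
    # ASSUMPTION: a matching Allow entry takes precedence over any Deny entry.
    if any(wildcard_prefix(p).match(url) for p in allow):
        return True
    return not any(wildcard_prefix(p).match(url) for p in deny)
```

Under this precedence rule an allowed branch is retrieved in full, including any GIF files it contains; the text above does not say whether Templeton resolves that conflict the same way.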
Authorize "realm" base64-code
example:
Authorize "Secret Password" ZHIubmVhbDpyZWdpc3RlciBtZQ==
This complex command allows you to specify a username and password for basic WWW authentication. The realm is a quoted string. The base64-code contains the encoded username and password. Use the pwd64.exe program to generate your base64-code.
The realm is a case-sensitive string provided by the WWW server. If you do not know the realm for the pages you wish to retrieve, use Templeton to interactively retrieve the page. Templeton will display the realm name and ask for your username and password.
Be aware that realms are not unique. If different documents use the same realm but require different passwords, Templeton will require you to enter the username and password.
To skip a realm, use the username "-" and password "-", or the base64-code: LTot
Authorize "Secret Password" LTot
See also Proxy-Authorize.
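Both sample codes in this section are consistent with HTTP Basic authentication, where the code is the base64 encoding of "username:password": LTot decodes to "-:-", and the longer example decodes to the username "dr.neal" and the password "register me". The sketch below reproduces that encoding in Python as a hypothetical stand-in for pwd64; it is not the pwd64 program itself. Because base64 is an encoding rather than encryption, anyone who can read the configuration file can recover the password.

```python
import base64


def auth_code(username, password):
    """Build the base64-code for Authorize / Proxy-Authorize lines, assuming it
    is the standard HTTP Basic credential: base64 of "username:password"."""
    return base64.b64encode(f"{username}:{password}".encode("latin-1")).decode("ascii")


def decode_auth_code(code):
    """Recover (username, password) from a base64-code."""
    username, _, password = base64.b64decode(code).decode("latin-1").partition(":")
    return username, password
```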
LocalPath absolute path
example:
LocalPath /
LocalPath tells the program where to store the downloaded files. If this path is:
LocalPath none
then no files are generated. Only a log file containing the remote server's WWW map is created in the current directory.
Currently, files should be stored in the root directory of the file system. For WWW servers, this is the server's root directory. (This limitation will be removed in future releases.) For DOS-based machines, this path may include a drive letter:
LocalPath e:\server.www\
Either slash "/" or backslash "\" is valid for specifying a directory. The trailing slash or backslash is optional.
FATFormat boolean
example:
FATFormat FALSE
Determines the filename format for the current operating system. DOS-based machines using drives formatted with a File Allocation Table (FAT) can only handle filenames of at most 8 characters plus a 3-character extension. Setting this option to TRUE generates 8.3 filenames. The default is FALSE, which generates filenames of unlimited length.
NOTE: Under DOS, this option is always TRUE (DOS only supports FAT file names). Under OS/2, this value becomes TRUE automatically if the destination path (LocalPath) is located on a FAT partition.
FileCaseFormat {upper|lower|none}
example:
FileCaseFormat lower
Templeton assumes the web server's file system is case-insensitive. This is due to the vast number of Windows NT, Win95, and OS/2 web servers and the inability to determine which operating system a server is running. Because queries to the server may be (and frequently are) case-sensitive, Templeton tries to keep filenames in the same case as the query. For example, the URL "http://www/Project/" generates a directory called "Project" with a capital "P".
In some situations, it may be preferable to have all files and directories in the same case. Specifying the argument "upper" will make all filenames upper-case (e.g. "PROJECT"). Similarly, the argument "lower" makes all filenames lower-case (e.g. "project").
Some places where this configuration option is desirable include:
For best results when burning a CD-ROM, note whether the files are converted to upper case or lower case and set the FileCaseFormat option accordingly. Set FATFormat to TRUE. The combination of filename case and FAT format makes the retrieved web site (burned to the CD-ROM) accessible to both DOS and unix operating systems.
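The sketch below combines the two naming options. The 8.3 step is a hypothetical truncation rule (name cut to 8 characters, extension to 3, extra dots replaced by "_"), not Templeton's documented translation, and it ignores collisions between names that truncate to the same result; local_name is a hypothetical name.

```python
def local_name(remote_name, fat_format=False, case_format="none"):
    """Hypothetical local-filename rules for FileCaseFormat and FATFormat.

    The 8.3 step here simply truncates; Templeton's own translation (and its
    handling of name collisions) is not documented in this section.
    """
    name = remote_name
    if case_format.lower() == "upper":
        name = name.upper()
    elif case_format.lower() == "lower":
        name = name.lower()
    if fat_format:
        stem, dot, ext = name.rpartition(".")
        if not dot:  # no extension at all
            stem, ext = name, ""
        stem = stem.replace(".", "_")[:8]  # 8.3 names allow only one dot
        name = f"{stem}.{ext[:3]}" if ext else stem
    return name
```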
FileOverwrite boolean or "modified"
examples:
FileOverwrite TRUE
FileOverwrite MODIFIED
Files that already exist on the local system are normally overwritten. Setting the FileOverwrite option to FALSE will not overwrite files on the local file system. Setting the FileOverwrite option to "Modified" (no quotes) will only retrieve non-HTML documents that have changed since the last retrieval. The modified option is useful when retrieving the same URL multiple times; it will not waste time retrieving GIF and JPG files that have already been retrieved.
FileOverwrite does NOT affect HTML documents: HTML documents are always retrieved. Templeton can only determine links by retrieving HTML documents, so skipping an HTML document would mean skipping possible links. For updating HTML documents, see Update-File. The default value is MODIFIED, which retrieves only newer non-HTML files.
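The FileOverwrite decision can be summarized as a small function. It is a sketch, not Templeton's code: timestamps are taken as seconds since 1970, a missing local copy is always fetched, a missing remote timestamp is treated as "changed" under MODIFIED, and should_retrieve is a hypothetical name.

```python
def should_retrieve(is_html, local_mtime, remote_mtime, mode="MODIFIED"):
    """Decide whether to fetch a document under the FileOverwrite rules.

    local_mtime is None when there is no local copy; times are seconds since
    1970. ASSUMPTION: a missing remote timestamp is treated as "changed".
    """
    if is_html or local_mtime is None:
        return True  # HTML is always retrieved; so is anything not yet saved
    mode = mode.upper()
    if mode == "TRUE":
        return True  # overwrite existing files
    if mode == "FALSE":
        return False  # keep existing files untouched
    if mode == "MODIFIED":
        return remote_mtime is None or remote_mtime > local_mtime
    raise ValueError(f"unknown FileOverwrite value: {mode}")
```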
ISMAP absolute path to executable
example:
ISMAP /cgi-bin/imagemap
For WWW servers, many imagemaps use a program that takes the coordinates selected on an image (<IMG SRC=... ISMAP>) and returns a new URL. Some of the more common methods use a data file containing known coordinates and a program to identify which URL is activated. Commonly, this program is called "imagemap" or "imagemap.exe".
The ISMAP parameter specifies the WWW server's path to the imagemap program.
MapType NCSA or CERN
example:
MapType NCSA
For the executable specified in the ISMAP parameter, this option determines the format of the map file. If the image map file can be retrieved, it is converted into the specified format. Valid options are either "CERN" or "NCSA". The default is NCSA.
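As an illustration of such a conversion, the sketch below rewrites single lines from the commonly documented NCSA map-file layout ("shape URL coordinates") into the CERN layout ("shape (x,y) ... URL"). It is not Templeton's converter: "point" entries and other details are omitted, the circle radius is rounded to a whole number, and ncsa_to_cern is a hypothetical name. Check your server's map-file documentation before relying on these layouts.

```python
import math


def ncsa_to_cern(line):
    """Convert one NCSA map-file line to CERN syntax (illustrative sketch).

    Blank lines and comments pass through; "point" entries are not handled.
    """
    fields = line.split()
    if not fields or fields[0].startswith("#"):
        return line
    shape = fields[0].lower()
    if shape == "default":
        return f"default {fields[1]}"
    url = fields[1]
    points = [tuple(int(v) for v in pair.split(",")) for pair in fields[2:]]
    coords = " ".join(f"({x},{y})" for x, y in points)
    if shape == "rect":
        return f"rectangle {coords} {url}"
    if shape == "poly":
        return f"polygon {coords} {url}"
    if shape == "circle":
        (cx, cy), (ex, ey) = points  # NCSA gives the center and a point on the edge
        radius = round(math.hypot(ex - cx, ey - cy))
        return f"circle ({cx},{cy}) {radius} {url}"
    raise ValueError(f"unsupported NCSA shape: {shape}")
```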
Index filename
example:
Index index.html
For hypertext references that only specify a directory, this is the default HTML file in the directory.
NOTE: if FATFormat is TRUE, the 8.3 name translation will be applied to this filename.
The default name is "index.html".
RemoteMapping boolean
example:
RemoteMapping TRUE
When RemoteMapping is TRUE, an HTML file is generated containing all the URLs and how they are linked (which links are contained in each document). When set to FALSE, this map of the remote server(s) is not generated. The default value is "TRUE", telling Templeton to generate a map of the remote web site(s).
Server-File filename
example:
Server-File serverfile
A data file is generated containing the host name, IP address, and WWW server type for each server visited. For servers listed by IP address only, the host name is also the IP address.
Setting the filename to "none" disables logging. The default is no server logging.
Mailto-File filename
example:
Mailto-File mailtofile
Similar to Server-File logging, the file named on the "Mailto-File" line receives a list of the e-mail addresses found in the HTML documents. Only active e-mail addresses (hyperlinks) are used. E-mail addresses displayed as plain text in the document or contained in CGI scripts are not listed in the mailto logfile.
NOTE: This list MAY contain duplicate entries. Duplicate removal may be added in later versions. The mailto list is very useful for generating mailing lists.
Setting the filename to "none" disables logging. The default is no mailto logging.
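The collection rule and the duplicate problem can both be illustrated with a short sketch. The regular expression (which only recognizes double-quoted HREF values) and the de-duplication step are assumptions for illustration, not Templeton's behavior; Templeton itself may leave duplicates in the file. Both function names are hypothetical.

```python
import re

# Only hyperlinked (mailto:) addresses are collected; the address ends at a
# closing quote or at a "?" that starts extra fields such as ?subject=.
MAILTO = re.compile(r'href\s*=\s*"mailto:([^"?]+)', re.IGNORECASE)


def mailto_addresses(html):
    return MAILTO.findall(html)


def unique_addresses(addresses):
    """Hypothetical post-processing: drop duplicates, keeping first-seen order.
    Addresses are compared case-insensitively (an assumption)."""
    first_seen = {}
    for address in addresses:
        first_seen.setdefault(address.lower(), address)
    return list(first_seen.values())
```

A plain-text address such as info@crumple.com in the page body is not picked up, matching the rule that only active addresses are listed.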
Update-File filename
example:
Update-File urllist
The update file list is useful for downloading only files that have been modified. Although the option "FileOverwrite modified" updates newer images, it does not work with HTML documents. The Update-File option is useful for refreshing HTML documents as well as images. To use the saved update file, include the file name on the command line.
Setting the filename to "none" disables the update file. The default is "none".
User e-mail address
example:
User webmaster@host.machine.org
In case of emergency, this is the person who is running the program and who should be contacted to stop the program from running. This MUST be a valid e-mail address, and SHOULD also be reachable with a network "talk" command.
As a side note, it is never a good idea to let automatic software run unsupervised (especially this type of software). The "User" should be available to read their e-mail at all times during the execution of this program.
The default is the user running the program on the current machine, and the IP address of the current machine. For operating systems with no user accounts, such as DOS or OS/2, the username is taken from the USER environment variable, or "root" if the variable is undefined.
DNSLookup boolean
example:
DNSLookup TRUE
A single machine may be referred to by many hostnames. For example, "www", "www.crumple.com", and "paper.crumple.com" are all the same web server. When DNSLookup is TRUE, Templeton correctly identifies these different hostnames as the same machine. When DNSLookup is FALSE, each of these hostnames is treated as a different host. DNS (the Domain Name System) may take time to resolve a hostname (up to 2 minutes), so setting DNSLookup to FALSE can dramatically increase Templeton's speed, especially when processing HTML documents with many links to other machines. The default setting is TRUE.
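The difference between the two settings is what gets compared. In the sketch below (host_key is a hypothetical name) a name is resolved to an address when DNSLookup is TRUE and compared by spelling when it is FALSE; the resolver is passed in so the example can run without a network.

```python
import socket


def host_key(hostname, dns_lookup=True, resolve=socket.gethostbyname):
    """Return the value used to decide whether two hostnames are one server.

    With DNS lookups each name is resolved to an address (the slow step);
    without them only the spelling of the name is compared.
    """
    if not dns_lookup:
        return hostname.lower()
    return resolve(hostname)
```

socket.gethostbyname returns a single IPv4 address, so a server published under several addresses would need a fuller comparison, for example with socket.gethostbyname_ex.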
Proxy-Authorize "realm" base64-code
example:
Proxy-Authorize "Secret Password" ZHIubmVhbDpyZWdpc3RlciBtZQ==
Similar to Authorize, this complex command allows you to specify a realm, username, and password for a secure HTTP proxy server. You should use the pwd64 program to help generate the base64-code from the username and password.
See also Authorize.
ProxyHost hostname or IP address
example:
ProxyHost proxyhost.network.net
Proxy agents are machines that act as a gateway through a firewall. If your local network uses a proxy agent, specify the name of the proxy agent here. If you are uncertain about your network, consult your network manager or provider.
A proxy server is used only when ProxyHost is specified.
ProxyPort integer
example:
ProxyPort 80
When using a proxy server, the port on the proxy server should be specified. The default port is 80. This value is not used if no proxy host is specified with ProxyHost.
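For readers unfamiliar with proxies, the sketch below shows the conventional HTTP/1.0 request shape: with a proxy, the connection goes to ProxyHost:ProxyPort and the request line carries the absolute URL, and proxy credentials travel in a Proxy-Authorization header. This follows the general HTTP proxy convention; the exact requests Templeton sends are not documented here, and build_request is a hypothetical name.

```python
from urllib.parse import urlsplit


def build_request(url, proxy_host=None, proxy_port=80, proxy_code=None,
                  agent="Templeton"):
    """Return ((host, port) to connect to, HTTP/1.0 request text).

    With a proxy, the connection goes to proxy_host:proxy_port and the
    request line carries the absolute URL; otherwise only the path is sent.
    """
    parts = urlsplit(url)
    if proxy_host:
        connect_to = (proxy_host, proxy_port)
        target = url
    else:
        connect_to = (parts.hostname, parts.port or 80)
        target = (parts.path or "/") + (f"?{parts.query}" if parts.query else "")
    lines = [f"GET {target} HTTP/1.0", f"User-Agent: {agent}"]
    if proxy_host and proxy_code:
        # ASSUMPTION: the Proxy-Authorize base64-code is sent as Basic credentials.
        lines.append(f"Proxy-Authorization: Basic {proxy_code}")
    return connect_to, "\r\n".join(lines) + "\r\n\r\n"
```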
Spoof text-string
example:
Spoof Mozilla (Templeton)
Some WWW servers make incorrect assumptions about browsers and robots. (Most of these are Netscape servers.) These servers assume that since the browser is not "Netscape", the browser cannot handle the HTML documents, and thus the document is not transferred. By "spoofing" a different name, the WWW robot can use a qualified browser name to retrieve the HTML document.
NOTE: The first word of the spoof name is used for restrictions when robot exclusion is honored (see Exclusion). This means that if Templeton tells the WWW server that it is "Netscape" and the server does not permit Netscape browsers, then the server will also not permit Templeton.
Common spoof names (and browsers) are:
Add URL
example:
Add http://www.cs.tamu.edu/people/
This configuration option adds a URL to the list to be processed. Restrictions are applied. The "Add" feature makes automated operation of Templeton easy.
Multiple Add statements may be specified.
Sleep seconds
example:
Sleep 10
Sleep determines the number of seconds to pause before sending a request to a WWW server. SLEEP IS IMPORTANT.
Warning: Templeton can generate thousands of requests per minute. Many WWW servers cannot handle a sudden onslaught of requests. Setting the Sleep parameter to 0 (zero) may generate too many requests for the server and kill the server. This is bad.
A sleep setting of 0 is known to kill the following types of servers:
For safety, you should set the sleep interval to at least 5 seconds. The longer, the better. Remember, this program is automated and can easily run for hours. What's the rush?
Unregistered versions of Templeton cannot have a sleep period less than 5 seconds.
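The pacing rule can be sketched as a loop that pauses before every request, with the 5-second floor for unregistered copies. fetch and sleep are hypothetical hooks passed in as parameters so the pacing can be demonstrated without a network or a real delay; this is not Templeton's code.

```python
import time


def effective_sleep(requested, registered):
    """Unregistered copies cannot pause for less than 5 seconds."""
    return requested if registered else max(requested, 5)


def fetch_all(urls, fetch, sleep_seconds, registered=False, sleep=time.sleep):
    """Fetch each URL in turn, pausing before every request."""
    delay = effective_sleep(sleep_seconds, registered)
    results = []
    for url in urls:
        sleep(delay)  # give the server time to recover between requests
        results.append(fetch(url))
    return results
```

The cost is easy to estimate: at Sleep 10, a site of 3,000 documents spends at least 30,000 seconds (8 hours 20 minutes) in pauses alone, which is the trade-off the warning above asks you to accept.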
NOTE: Templeton has the capability to spawn thousands of applications in a few seconds. On unix-type systems, Templeton introduces security risks when executed as root.
For applications that are not spawned, Templeton pauses until the application has ended. This guarantees the order in which the called applications are processed.
Reset
example:
Reset
Resets all configuration options to their initial settings. All configuration options set before the "Reset" option return to their defaults. The only exception is the Register configuration option; a valid registration is not disabled by the "Reset" command.
The Reset option is useful for overriding defaults set in a system configuration file.
CommentTag [Strict|Match]
examples:
CommentTag Strict
CommentTag Match
Determines how comment tags "<!-- ... -->" are interpreted. There are 2 settings:
Command_default string
Command_html string
Command_image string
Command_map string
example:
Command_default echo %s >> filelist
Command_html /usr/local/bin/html2txt %s
Command_image /usr/local/bin/viewpict %s
Command_map echo %s >> maplog
Executes a system command on each document stored on the file system. The different command types are for HTML documents, images, and map files; the default command is used for any type whose own command is not set. These commands are useful for counting documents, storing statistics, printing, converting, etc.
The string "none" turns off these commands. The default is "none".
The command string will replace special characters with desired information:
Characters: | Becomes: |
%c | content-type of the document |
%d | depth |
%h | host (server) |
%l | local file |
%n | current time (now) in RFC 822 format. Time is shown in GMT. |
%N | current time (now) in RFC 822 format. Time is shown in the local timezone. |
%p | remote parent URL (where this link came from) |
%P | local parent file (file referring to this link) |
%r | remote file (URL without server information) |
%s | saved file (same as %l) |
%t | file timestamp in RFC 822 format. Time is shown in GMT. |
%T | file timestamp in RFC 822 format. Time is shown in the local timezone. |
%u | URL |
%% | % (percent sign) |
The special characters ARE case sensitive.
NOTE: Command_image and Command_default do not distinguish between different file formats.
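The substitution rules in the table can be sketched as a single regular-expression replacement. The info dictionary and its values are hypothetical, expand_command is a hypothetical name, and the sketch does not handle the {...} time modifiers described below.

```python
import re

CODE = re.compile(r"%(.)")


def expand_command(template, info):
    """Replace %x codes in a Command_* string; %% becomes a single %.

    info maps single-letter codes (case sensitive) to their values for the
    current document. Values are inserted verbatim, without shell quoting.
    """
    def substitute(match):
        letter = match.group(1)
        if letter == "%":
            return "%"
        return str(info[letter])  # an unknown code raises KeyError
    return CODE.sub(substitute, template)
```

Because values are pasted in unquoted, a URL containing shell metacharacters such as ";" or "&" would reach the shell as written, which matters most when Templeton runs with elevated privileges.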
The time formats, determined by %t (in Greenwich Mean Time [GMT], also called Coordinated Universal Time) and %T (in local time), show the modification time of the file on the server. If the server does not provide the file modification time, the time Templeton found the file is used. Alternatively, %n and %N can be used in place of %t and %T for the current time. The format can be specified using modifiers enclosed in '{' and '}'. These modifiers ARE case sensitive.
Characters: | Format: | Example (GMT: %t, %n): | Example (local: %T, %N): |
%t or %n, %T or %N | Default format: RFC 822 | Sat, 04 Nov 1995 01:29:45 GMT | Sat, 04 Nov 1995 07:29:45 |
%t{rfc822} or %n{rfc822}, %T{rfc822} or %N{rfc822} | RFC 822. The most common format used by web servers. | Sat, 04 Nov 1995 01:29:45 GMT | Sat, 04 Nov 1995 07:29:45 |
%t{rfc850} or %n{rfc850}, %T{rfc850} or %N{rfc850} | RFC 850. Uncommon, but standard. Most applications using this format are not "Year 2000" compliant. | Saturday, 04-Nov-95 01:29:45 GMT | Saturday, 04-Nov-95 07:29:45 |
%t{ansi-c} or %n{ansi-c}, %T{ansi-c} or %N{ansi-c} | ANSI C. Common computer (unix) and programming format. | Sat Nov  4 01:29:45 1995 | Sat Nov  4 07:29:45 1995 |
%t{iso8601} or %n{iso8601}, %T{iso8601} or %N{iso8601} | ISO 8601. International standard, commonly found in databases. | 1995-11-04 01:29:45Z | 1995-11-04 07:29:45 |
%t{iso8601c} or %n{iso8601c}, %T{iso8601c} or %N{iso8601c} | ISO 8601 Compressed. Commonly used when data size is at a premium. | 19951104T012945Z | 19951104T072945 |
%t{seconds} or %n{seconds}, %T{seconds} or %N{seconds} | Number of seconds since 1/1/1970. Commonly used by computer (unix) software. | 815470185 | 815491785 |
The RFC 822, RFC 850, and ANSI C formats are the three date formats defined by the HTTP 1.0 standard (and kept in HTTP 1.1). The ISO 8601, ISO 8601 Compressed, and seconds formats are not HTTP date formats.
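The formats in the table can be reproduced with Python's datetime module, as in the sketch below. It covers only the GMT forms (%t, %n); the local forms drop the "GMT" or "Z" suffix and use the local clock. Day and month names are spelled out explicitly so the result does not depend on the system locale, and format_gmt is a hypothetical name.

```python
from datetime import datetime, timezone

DAYS = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]
MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]


def format_gmt(seconds, style="rfc822"):
    """Format a Unix timestamp in one of the GMT styles from the table."""
    t = datetime.fromtimestamp(seconds, timezone.utc)
    day, month = DAYS[t.weekday()], MONTHS[t.month - 1]
    clock = f"{t.hour:02d}:{t.minute:02d}:{t.second:02d}"
    if style == "rfc822":
        return f"{day[:3]}, {t.day:02d} {month} {t.year} {clock} GMT"
    if style == "rfc850":  # two-digit year: the "Year 2000" problem
        return f"{day}, {t.day:02d}-{month}-{t.year % 100:02d} {clock} GMT"
    if style == "ansi-c":  # asctime() style: day of month padded with a space
        return f"{day[:3]} {month} {t.day:2d} {clock} {t.year}"
    if style == "iso8601":
        return f"{t.year:04d}-{t.month:02d}-{t.day:02d} {clock}Z"
    if style == "iso8601c":
        return f"{t.year:04d}{t.month:02d}{t.day:02d}T{clock.replace(':', '')}Z"
    if style == "seconds":
        return str(int(seconds))
    raise ValueError(f"unknown time modifier: {style}")
```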
Command_url pattern string
examples:
Command_url *.gif echo %u >> urllog
Command_url ftp://* echo %u >> urllog
Similar to Command_html, the command string is executed for every URL found that matches the pattern. This includes other protocols such as "ftp://", "gopher://", and "mailto:". No effort is made toward uniqueness; the same URL may be seen hundreds of times.
Because this command is processed each and every time a URL is found, it may significantly slow the runtime performance of Templeton.
The string "none" disables the command associated with the pattern. The default is "none" for all patterns (i.e. no default command execution).
The command string will replace special characters with desired information:
Characters: | Becomes: |
%d | depth |
%h | host (server) |
%n | current time (now) shown in GMT |
%N | current time (now) shown in local |
%p | remote parent URL (where this link came from) |
%P | local parent file (file referring to this link) |
%r | remote file (URL without server information) |
%t | time shown in GMT (same as %n) |
%T | time shown in local (same as %N) |
%u | URL |
%% | % (percent sign) |
The special characters ARE case sensitive.
The time formats, determined by %t (in Greenwich Mean Time [GMT], also called Coordinated Universal Time) and %T (in local time), show the time the URL was found, NOT the timestamp on the file. Alternatively, %n and %N can be used in place of %t and %T. The time format can be specified using modifiers enclosed in '{' and '}'. See Command_default for the modifier descriptions.
Command_pattern pattern string
example:
Command_pattern *.gif echo %u >> giflist
While the Command_html family of commands is applied according to document type, and Command_url executes commands on every URL found, Command_pattern is applied to retrieved URLs. Like Command_url, Command_pattern is more specific than Command_html because you can specify which URLs trigger the command.
The command string is executed each time a retrieved URL matches the specified pattern. Only the first matching pattern is executed. Thus, if multiple patterns can match (e.g. "*.htm" and "*.htm*"), the more specific pattern should come first (e.g. "*.htm" before "*.htm*"). This is ideal for format converters (such as GIF-to-JPG), indexing software, and news/FTP retrieval software ("news://*" and "ftp://*").
The command string will replace special characters with desired information:
Characters: | Becomes: |
%d | depth |
%f | a temporary file containing the retrieved information from the server |
%F | like %f, but also includes the URL at the beginning of the temporary file |
%h | host (server) |
%n | current time (now) shown in GMT |
%N | current time (now) shown in local |
%p | remote parent URL (where this link came from) |
%P | local parent file (file referring to this link) |
%r | remote file (URL without server information) |
%t | time shown in GMT (same as %n) |
%T | time shown in local (same as %N) |
%u | URL |
%% | % (percent sign) |
The special characters ARE case sensitive.
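A first-match dispatch of this kind can be sketched as follows. The sketch assumes a pattern must match the whole URL, with '*' standing for any run of characters (otherwise "*.htm" and "*.htm*" would be indistinguishable), and that patterns are tried in the order they are listed; command_for is a hypothetical name.

```python
import re


def wildcard(pattern):
    """Compile a pattern in which '*' matches zero or more characters."""
    return re.compile(".*".join(re.escape(part) for part in pattern.split("*")))


def command_for(url, patterns):
    """Return the command of the first (pattern, command) pair matching url."""
    for pattern, command in patterns:
        if wildcard(pattern).fullmatch(url):
            return command  # later patterns are never consulted
    return None
```

With the patterns listed as "*.htm", then "*.htm*", then "ftp://*", a ".htm" page triggers only the first command and a ".html" page only the second.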
Interactive boolean
example:
Interactive TRUE
Interactive determines whether the user should be prompted for configuration information or if Templeton should start running automatically. The default setting is TRUE, causing Templeton to prompt for user interaction.
Update depth type URL
example:
Update 2 html http://www.farm.com/
Explicitly lists a URL to update if the remote file is newer. Unlike "FileOverwrite modified", this option affects both data files and HTML documents. Currently, valid types are html, image, and map. The depth is the minimum number of links needed to reach the URL, and is at least 1. This command is for use in files generated by Update-File and is not intended for manual editing. The format is currently strict (1 space between items) and may change in later releases.
Mapfile string mapped-string
example:
Mapfile index.html index.htm
Explicitly creates a file name mapping. This command is for use in files generated by Update-File and is not intended for manual editing. The format is currently strict (1 space between items) and may change in later releases.
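Because Update and Mapfile lines use a strict format, a reader for them can be very small. The sketch below (parse_update_line is a hypothetical name) rejects anything not separated by exactly one space; the real file format may change in later releases.

```python
VALID_TYPES = {"html", "image", "map"}


def parse_update_line(line):
    """Parse one Update or Mapfile entry from an update file (sketch).

    The format is strict: exactly one space between items, so splitting on a
    single space must produce exactly the expected number of fields.
    """
    fields = line.rstrip("\r\n").split(" ")
    keyword = fields[0].lower()
    if keyword == "update" and len(fields) == 4:
        depth, kind, url = int(fields[1]), fields[2], fields[3]
        if depth < 1 or kind not in VALID_TYPES:
            raise ValueError(f"bad Update entry: {line!r}")
        return ("update", depth, kind, url)
    if keyword == "mapfile" and len(fields) == 3:
        return ("mapfile", fields[1], fields[2])
    raise ValueError(f"not a strict Update/Mapfile entry: {line!r}")
```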
x_Object boolean
example:
x_Object TRUE
Enables (TRUE) or disables (FALSE) the processing of <Applet...>, <Object...>, and <Param...> tags. Currently being tested. The default value is TRUE.