- # *************************************************************
- # Templeton, copyright 1995, 1996, 1997 N.A. Krawetz
- # All rights reserved.
- # *************************************************************
-
- # configuration for Templeton
- #
- # Lines beginning with a '#' are comments and are ignored.
- # Lines should not be more than 80 characters.
- # Operands in this file are in the form:
- # parameter value
- # The parameter is case insensitive, except where a text string or URL
- # is required.
- # Boolean values ("true" or "false") are case insensitive.
- # Numeric values should be numbers -- non-numbers are regarded as 0.
- # All other types of values ARE case sensitive.
-
-
- # ******************** Registration ****************************
- # Register: registration code
- # Software that is registered contains a unique registration
- # code. This code should be entered exactly as it is provided.
- # If your site contains multiple registrations, you may list
- # each registration code on a line starting with the
- # key word "Register".
- # Please read the licensing agreement for registration
- # information.
- # Register 12-34567-891011
-
-
- # ******************* File System *****************************
- # LocalPath: absolute path
- # LocalPath informs the program where to store the downloaded files.
- # IF this path is:
- # LocalPath none
- # THEN no files are generated. Only a log file containing the remote
- # server's WWW map is created in the current directory.
- #
- # Currently, files should be stored in the root directory of the file system.
- # For WWW servers, this is the server's root directory.
- # (This limitation will be removed in future releases.)
- # For DOS based machines, this path may include a drive letter:
- # LocalPath e:\server.www\
- #
- # Either slash "/" or backslash "\" is valid for specifying a directory.
- # The trailing slash or backslash is optional.
- #
- # This option is only used when the "Interactive" option is FALSE.
- LocalPath /
-
- # FATFormat: boolean
- # Determines the filename format for the current operating system.
- # DOS based machines using drives formatted with a File Allocation Table (FAT)
- # can only handle filenames containing 8 characters and a 3 character
- # extension. Setting this option to TRUE will generate 8.3 character file
- # names. The default is FALSE, and will generate unlimited length filenames.
- # NOTE: Under DOS, this option is always TRUE (DOS only supports FAT file
- # names). Under OS/2, this value becomes TRUE automatically if the destination
- # path (LocalPath) is located on a FAT partition.
- FATFormat FALSE
-
- # User: e-mail address
- # In case of emergency, this is the person who is running the program
- # and who should be contacted to stop the program from running.
- # This MUST be a valid e-mail address, and SHOULD also be available with
- # a "talk" command.
- # As a side note, it is never a good idea to let automatic software run
- # unsupervised (especially this type of software). The "User" should be
- # available to read their e-mail at all times during the execution of this
- # program.
- # The default is the account running the program on the current machine.
- # User webmaster@host.machine.org
-
-
- # ********************* Network *****************************
-
- # DNSLookup: boolean
- # A single machine may be referred to by many hostnames. For example, "www",
- # "www.crumple.com", and "paper.crumple.com" are all the same webserver.
- # When DNSLookup is TRUE, Templeton will correctly identify these different
- # hostnames as the same machine. When DNSLookup is FALSE, each of these
- # hostnames is treated as a different host. The DNS (domain name service)
- # may take time to resolve a hostname (up to 2 minutes) so setting DNSLookup
- # to FALSE can dramatically increase Templeton's speed, especially when
- # processing HTML documents with many links to other machines.
- # The default setting is TRUE.
- # DNSLookup TRUE
-
- # ProxyHost: hostname or IP address
- # Proxy agents are machines that act as a gateway through a firewall.
- # If your local network uses a proxy agent, specify the name of
- # the proxy agent here. If you are uncertain about your network, consult your
- # network manager or provider.
- # A proxy server is used only when a ProxyHost value is specified.
- # ProxyHost proxyhost.network.net
-
- # ProxyPort: integer
- # When using a proxy server (see ProxyHost), the port on the proxy server
- # should be specified. The default port is 80. This value is not
- # used if no proxy host is specified with ProxyHost.
- ProxyPort 80
-
- # Spoof: text-string
- # Some WWW servers make incorrect assumptions about the browser/robots. (Most
- # of these are the Netscape servers.) These servers assume that, since the
- # browser is not "Netscape" the browser cannot handle the HTML documents and
- # therefore, the document is not transferred. By "spoofing" a different name,
- # the WWW robot can use a qualified browser name to retrieve the HTML
- # document.
- # NOTE: The first word of the spoof-name is used for restrictions when
- # robot exclusion is honored (see Exclusion). This means, if Templeton tells
- # the WWW server that it is "Netscape" and the server does not permit
- # Netscape browsers, then the server will also not permit Templeton.
- # Common spoof names (and browsers) are:
- # Mozilla (Netscape Browser)
- # WebCrawler (WebCrawler robot)
- # InfoSeek (InfoSeek robot)
- # WebExplorer (IBM WebExplorer for OS/2)
- # Harvest (a web robot)
- # Mosaic (NCSA Mosaic)
- # Lynx (Lynx, text browser)
- # Microsoft Internet Explorer
- # PRODIGY-WB (Prodigy browser)
- # Spoof Mozilla (Templeton)
-
-
- # ********************* Restrictions *****************************
-
- # RestrictHost: boolean
- # This parameter informs the program not to leave the designated host. Links
- # to machines not on the current host are not traversed.
- RestrictHost TRUE
-
- # RestrictPath: absolute path
- # This parameter is only used when a host is restricted.
- # When a host is restricted, a subpath on that host may also be restricted.
- # Hypertext references to documents outside this subtree are not traversed.
- # Either slash "/" or backslash "\" is valid for specifying a directory.
- # The trailing slash or backslash is optional.
- RestrictPath /
-
- # RestrictDepth: numeric value
- # Hyperlinks are traversed in a breadth-first search. An unrestricted search
- # may download an entire WWW server's data. By restricting the depth,
- # only immediate portions of the server will be received.
- # Images and non-href links are considered to be at the same depth as the
- # document.
- # A restricted depth of 0 means no restriction.
- # The default is 1.
- RestrictDepth 1
-
- # RestrictImages: boolean
- # Most HTML documents contain both text and graphics. Frequently, these
- # graphics come from links to other computers. When restricting to a specific
- # host, these images would not be retrieved (not on the host). Setting
- # RestrictImages to FALSE allows inline graphics and image maps to be located
- # on a remote host, but will not affect restrictions to hyperlinks.
- # Setting the value to TRUE will apply all restrictions to all files
- # (images, text, etc.).
- # This option is available for people who want to mirror entire web documents,
- # not just sites. The default value is FALSE, indicating that entire
- # documents *should* be retrieved.
- # Note: images and image maps restricted by the Deny configuration option
- # or by robot exclusion are not retrieved.
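- # Example (a sketch; FALSE is the documented default):
- # RestrictImages FALSE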
-
- # RemoveRestricted: boolean
- # This parameter informs the program to remove untraversed links. Links to
- # restricted machines or restricted depths are removed from the HTML file,
- # but the visible text is still available (just not a hyperlink).
- # The default value is FALSE.
- RemoveRestricted FALSE
-
- # RestrictDuration: HH:MM
- # Templeton can run for hours or days. You can specify a runtime duration
- # by entering the maximum number of hours (HH) and minutes (MM).
- # The number of hours does not need to be restricted to a 24-hour period.
- # Entering 0:0 disables this option. The default value is 0:0.
- # RestrictDuration 2:30
-
- # RestrictStopTime: HH:MM
- # Templeton can run for hours or days. You can specify a specific stop time
- # by entering the hour (HH) and minute (MM). Times are provided in military
- # notation, where 1PM is 13:00, etc. This option only works over a 24-hour
- # period. Midnight is 24:00, but 1 minute after midnight is 00:01. Invalid
- # times, such as 28:00, are ignored. Specifying 0:0 (the default value)
- # disables this option.
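- # Example (illustrative value only): stop at 5:30 PM.
- # RestrictStopTime 17:30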
-
- # Add: URL
- # Place a specific URL on the list of URLs to process.
- # Be aware that restrictions apply.
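- # Example (the URL is hypothetical, for illustration only):
- # Add http://loco.com/index.html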
-
- # Exclusion: boolean
- # This parameter determines whether Templeton will support server provided
- # robot exclusion files (robots.txt). Many servers maintain exclusion files
- # to prevent robots from wandering around virtual directory trees, from
- # retrieving very temporary or incomplete files, or copyrighted materials. It
- # is considered "polite" for web agents to obey the exclusion files when they
- # exist. The default value, TRUE, means that robot exclusion files are obeyed.
- # Setting Exclusion to FALSE will ignore robot exclusion files.
- Exclusion TRUE
-
- # Deny: URL
- # The URL provided, as well as all subtrees of the URL, are not processed.
- # Many times specific directory subtrees are not desirable. You can deny
- # retrieval of these URLs using this setting.
- # For example, to NOT retrieve the "archive" subtree of the host loco.com,
- # you would specify:
- # Deny http://loco.com/archive/
- # If you do not include the trailing slash (http://loco.com/archive) then
- # all subdirectories beginning with "archive" are not processed. This
- # includes "archive.1", "archive.old", "archive_from_1994", etc.
- # Deny statements may also include a '*' as a wildcard character. This
- # symbol represents 0 or more characters for matching. If, for example,
- # you do not wish to retrieve GIF files, you would use:
- # Deny *.gif
- # Only one '*' is permitted, but it may be located anywhere in the URL string.
- # Multiple Deny statements may be specified.
-
- # Allow: URL
- # Similar to "Deny", "Allow" explicitly specifies that a subtree is
- # retrievable. When used in conjunction with Deny URL, branches of a
- # subtree may be specified for access, while other subtrees are ignored.
- # Multiple Allow statements may be specified.
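- # Example (a sketch; the "current" branch is a hypothetical path): deny the
- # archive subtree of loco.com but still retrieve one branch of it.
- # Deny http://loco.com/archive/
- # Allow http://loco.com/archive/current/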
-
- # Authorize: "realm" base64-code
- # This complex command allows you to specify a username and password
- # for basic WWW-authentication. The realm is a quoted string.
- # The base64-code contains the encoded username and password. Use
- # the pwd64.exe program to generate your base64-code.
- # The realm is a case-sensitive string provided by the WWW server. If you
- # do not know the realm for the pages you wish to retrieve, use Templeton
- # to interactively retrieve the page. Templeton will display the realm
- # name and ask for your username and password.
- # Be aware that realms are not unique. If different documents use the
- # same realm but require different passwords, Templeton will require
- # you to enter the username and password.
- # To skip a realm, use the username "-" and password "-", or the
- # base64-code: LTot
- # Authorize "Secret Password" ZHIubmVhbDpyZWdpc3RlciBtZQ==
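- # Example (a sketch reusing the realm name above): skip the realm by
- # supplying the base64-code for username "-" and password "-".
- # Authorize "Secret Password" LTot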
-
- # Proxy-Authorize: "realm" base64-code
- # Similar to "Authorize", this complex command allows you to specify a
- # realm and password for a secure HTTP proxy server.
- # Proxy-Authorize "Secret Password" ZHIubmVhbDpyZWdpc3RlciBtZQ==
-
- # Sleep: numeric
- # Sleep determines the number of seconds to pause before sending a request to
- # a WWW server. SLEEP IS IMPORTANT.
- # Warning: Templeton can generate thousands of requests per minute. Many
- # WWW servers cannot handle a sudden onslaught of requests. Setting the
- # Sleep parameter to 0 (zero) may generate too many requests for the server
- # and kill the server. This is bad.
- # A sleep setting of 0 (zero) is known to kill the following types of servers:
- # All WWW servers that run under Microsoft Windows (TM)
- # Old generation (HTTP/1.0) CERN servers on all platforms
- # Low sleep values may also generate large amounts of network traffic and
- # hog network resources.
- # For safety, you should set the sleep interval to at least 5 seconds.
- # The longer, the better. Remember, this program is automated and can
- # easily run for hours. What's the rush?
- Sleep 10
-
-
- # ********************* Preferences *****************************
-
- # FileOverwrite: boolean or "modified"
- # Files that already exist on the local system are normally overwritten.
- # Setting the FileOverwrite option to FALSE will not overwrite files on the
- # local file system. Setting the FileOverwrite option to "Modified"
- # (no quotes) will only retrieve documents (non-HTML) that have been changed
- # since the last retrieval. The modified option is useful when retrieving
- # the same URL multiple times; modified will not waste time retrieving GIF
- # and JPG files that have already been retrieved.
- # FileOverwrite does NOT affect HTML documents -- HTML documents are always
- # retrieved. Templeton can only determine links by retrieving HTML documents.
- # Skipping an HTML document would mean skipping possible links.
- # Default value is MODIFIED, only retrieving newer non-HTML files.
- FileOverwrite modified
-
- # Index: filename
- # For hypertext references that only specify a directory, this is the
- # default HTML file in the directory.
- # NOTE: if FATFormat is TRUE, the 8.3 name translation will be applied to
- # this filename.
- # The default name is "index.html"
- Index index.html
-
- # ISMAP: absolute path to executable
- # For WWW servers, many imagemaps use a program that takes coordinates from
- # a selected image <IMG SRC=... ISMAP> and returns a new URL. Some of the
- # more common methods use a data file containing known coordinates and a
- # program to identify which URL is activated. Commonly, this program is
- # called "imagemap" or "imagemap.exe".
- # The ISMAP parameter specifies the WWW server's path to the imagemap program.
- ISMAP /cgi-bin/imagemap
-
- # MapType: NCSA or CERN
- # For the executable specified in the ISMAP parameter (see above), this
- # option determines the format of the file. If the image map file can be
- # retrieved, then it is converted into this specified format.
- # Valid options are either "CERN" or "NCSA". The default is NCSA.
- MapType NCSA
-
-
- # ********************* Logging *****************************
- # Mailto-File: filename
- # Similar to "Server-File" logging, the filename listed on the "Mailto-File"
- # line contains a list of e-mail addresses found in the HTML documents. Only
- # e-mail addresses that are active (hyperlinks) are used. E-mail addresses
- # displayed as plain text in the document or contained in CGI scripts are not
- # listed in the mailto logfile.
- # NOTE: This list MAY contain duplicate entries. Duplication removal may be
- # added in later versions.
- # (Some people have found this to be a very useful feature for generating
- # mailing lists.)
- # Setting the filename to "none" disables logging.
- # The default is no mailto logging.
- # Mailto-File mailtolist
-
- # RemoteMapping: boolean
- # Determines whether remote mapping will be done. The default is TRUE
- # which does perform mapping. The map filename is mapindex.html and is
- # either located at the root of the LocalPath or in the current directory
- # if the system is not mirroring files.
- # Note: if you change the default index name, for example, to "welcome.html"
- # then the default map file will be "mapwelcome.html".
- RemoteMapping TRUE
-
- # Server-File: filename
- # A data file is generated containing the host name, IP address, and
- # WWW server type for each server visited. For servers listed as IP
- # address only, the host name is also the IP address.
- # Setting the filename to "none" disables logging.
- # The default is no server logging.
- # Server-File serverlist
-
- # Update-File: filename
- # The update file list is useful for downloading only files which have been
- # modified. Although the option "FileOverwrite modified" will update
- # newer images, it does not work with HTML documents. The Update-File
- # option is useful for refreshing HTML documents as well as images.
- # To use the saved update-file, include the file name on the command line.
- # Setting the filename to "none" disables the update file.
- # The default is "none".
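- # Example (the filename "updatelist" is hypothetical):
- # Update-File updatelist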
-
- # ********************* Advanced *****************************
- # The advanced configuration commands should be used with caution.
- # These commands allow other applications to perform tasks on the
- # retrieved documents. Applications that are spawned (operate
- # concurrently) with Templeton may overwhelm the user or operating system.
- # Spawned applications include those begun with "start" under OS/2,
- # or followed by "&" under Unix.
- # NOTE: Templeton has the capability to spawn thousands of applications
- # in a few seconds.
- # On Unix-type systems, Templeton introduces security risks when executed
- # as root.
- # For applications that are not spawned, Templeton will pause until
- # the application has ended. This allows for a guaranteed order of processing
- # for the called applications.
-
- # Command_html: string
- # Command_image: string
- # Command_map: string
- # Command_default: string
- # Execute a system command on each document stored on the file system.
- # The different command types are for HTML documents, images, map files,
- # or the default command when any of the other commands are not set.
- # These are useful for counting documents, storing statistics, printing,
- # converting, etc.
- # The string "none" turns off these commands. The default is "none".
- # The command string will replace special characters with desired information:
- #   characters:   becomes:
- #   %d            depth
- #   %h            host (server)
- #   %p            remote parent URL (first URL containing a link to this URL)
- #   %P            local parent file (first file containing a link to this URL)
- #   %l            local file
- #   %n            current time in GMT (see %t)
- #   %N            current time in local time (see %T)
- #   %r            remote file (URL without server)
- #   %s            saved file (same as %l)
- #   %t            file timestamp (RFC 822 format) in GMT
- #   %t{rfc822}    file timestamp in RFC 822 format
- #   %t{rfc850}    file timestamp in RFC 850 format
- #   %t{ansi-c}    file timestamp in ANSI C format
- #   %t{iso8601}   file timestamp in ISO 8601 format
- #   %t{iso8601c}  file timestamp in ISO 8601 compressed format
- #   %T            similar to %t, but times provided in local time
- #   %u            url
- #   %%            %
- # The special characters ARE case sensitive.
- # NOTE: Command_image and Command_default do not distinguish between
- # different file formats.
- # Example: to convert all HTML documents to text using the program
- # html2txt (not provided with the Templeton distribution), you would use:
- # Command_html html2txt %s
-
- # Command_url: string
- # Similar to Command_html, this command line string is executed for *every*
- # URL found. This includes other protocols such as "ftp://", "gopher://"
- # and "mailto:". No effort is made toward uniqueness; the same URL may be
- # seen hundreds of times.
- # Because this command is processed each and every time a URL is found, it may
- # significantly slow the runtime performance of Templeton.
- # The string "none" turns off this command. The default is "none".
- # This command replaces the same characters as Command_html, except for
- # %l and %s; the local filename is unavailable.
- # The time formats, %t and %T, show the time the URL was found by Templeton,
- # *not* the timestamp of the file.
- # The execution of the Command_url string does not affect the execution of
- # the Command_html, Command_image, Command_map, or Command_default strings.
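- # Example (a sketch, assuming an "echo" command is available on the
- # system): print the depth and the URL each time a URL is found.
- # Command_url echo %d %u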
-
- # Interactive: boolean
- # Determines whether the user should be prompted for
- # configuration information or if Templeton should
- # start running automatically.
- # The default setting is TRUE.
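- # Example (TRUE is the documented default):
- # Interactive TRUE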
-
-