- # *************************************************************
- # Templeton, copyright 1995, 1996, 1997 N.A. Krawetz
- # All rights reserved.
- # *************************************************************
-
- # configuration for Templeton
- #
- # Lines beginning with a '#' are comments and are ignored.
- # Lines should not be more than 80 characters.
- # Operands in this file are in the form:
- # parameter value
- # The parameter is case insensitive, except where a text string or URL
- # is required.
- # Boolean values ("true" or "false") are case insensitive.
- # Numeric values should be numbers -- non-numbers are regarded as 0.
- # All other types of values ARE case sensitive.
-
-
- # ******************** Registration ****************************
- # Register: registration code
- # Software that is registered contains a unique registration
- # code. This code should be entered exactly as it is provided.
- # If your site contains multiple registrations, you may list
- # each registration code on a line starting with the
- # key word "Register".
- # Please read the licensing agreement for registration
- # information.
- # Register 12-34567-891011
-
-
- # ******************* File System *****************************
- # LocalPath: absolute path
- # LocalPath informs the program where to store the downloaded files.
- # IF this path is:
- # LocalPath none
- # THEN no files are generated. Only a log file containing the remote
- # server's WWW map is created in the current directory.
- #
- # Currently, files should be stored in the root directory of the file system.
- # For WWW servers, this is the server's root directory.
- # (This limitation will be removed in future releases.)
- # For DOS based machines, this path may include a drive letter:
- # LocalPath e:\server.www\
- #
- # Either a slash "/" or a backslash "\" is valid for specifying a directory.
- # The trailing slash or backslash is optional.
- #
- # This option is only used when the "Interactive" option is FALSE.
- LocalPath /
-
- # FATFormat: boolean
- # Determines the file name format for the current operating system.
- # DOS based machines using drives formatted with a File Allocation Table (FAT)
- # can only handle file names containing 8 characters and a 3 character
- # extension. Setting this option to TRUE will generate 8.3 character file
- # names. The default is FALSE, and will generate unlimited length file names.
- # NOTE: Under DOS, this option is always TRUE (DOS only supports FAT file
- # names). Under OS/2, this value becomes TRUE automatically if the destination
- # path (LocalPath) is located on a FAT partition.
- FATFormat FALSE
-
- # User: e-mail address
- # In case of emergency, this is the person who is running the program
- # and who should be contacted to stop the program from running.
- # This MUST be a valid e-mail address, and SHOULD also be available with
- # a "talk" command.
- # As a side note, it is never a good idea to let automatic software run
- # unsupervised (especially this type of software). The "User" should be
- # available to read their e-mail at all times during the execution of this
- # program.
- # The default is the account running the program on the current machine.
- # User webmaster@host.machine.org
-
-
- # ********************* Network *****************************
-
- # ProxyHost: hostname or IP address
- # Proxy agents are machines that act as a gateway through a firewall.
- # If your local network uses a proxy agent, specify the name of
- # the proxy agent here. If you are uncertain about your network, consult your
- # network manager or provider.
- # A proxy server is only used when a proxy host is specified here.
- # ProxyHost proxyhost.network.net
-
- # ProxyPort: integer
- # When using a proxy server (see ProxyHost), the port on the proxy server
- # should be specified. The default port is 80. This value is not
- # used if no proxy host is specified with ProxyHost.
- ProxyPort 80
-
- # Spoof: text-string
- # Some WWW servers make incorrect assumptions about the browser/robots. (Most
- # of these are the Netscape servers.) These servers assume that, since the
- # browser is not "Netscape" the browser cannot handle the HTML documents and
- # therefore, the document is not transferred. By "spoofing" a different name,
- # the WWW robot can use a qualified browser name to retrieve the HTML
- # document.
- # NOTE: The first word of the spoof-name is used for restrictions when
- # robot exclusion is honored (see Exclusion). This means, if Templeton tells
- # the WWW server that it is "Netscape" and the server does not permit
- # Netscape browsers, then the server will also not permit Templeton.
- # Common spoof names (and browsers) are:
- # Mozilla Netscape Browser
- # WebCrawler WebCrawler robot
- # InfoSeek InfoSeek robot
- # WebExplorer IBM WebExplorer for OS/2
- # Harvest a web robot
- # Mosaic NCSA Mosaic
- # Lynx Lynx, text browser
- # Microsoft Internet Explorer
- # PRODIGY-WB Prodigy browser
- # Spoof Mozilla (Templeton)
-
-
- # ********************* Restrictions *****************************
-
- # RestrictHost: boolean
- # This parameter informs the program not to leave the designated host. Links
- # to machines not on the current host are not traversed.
- RestrictHost TRUE
-
- # RestrictPath: absolute path
- # This parameter is only used when a host is restricted.
- # When a host is restricted, a subpath on that host may also be restricted.
- # Hypertext references to documents outside this subtree are not traversed.
- # Either a slash "/" or a backslash "\" is valid for specifying a directory.
- # The trailing slash or backslash is optional.
- RestrictPath /
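- # Example (a sketch; the "/docs/" subtree is hypothetical): to stay on the
- # starting host and traverse only documents below one directory, the two
- # settings might be combined as:
- # RestrictHost TRUE
- # RestrictPath /docs/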
-
- # RestrictDepth: numeric value
- # Hyperlinks are traversed in a breadth-first search. An unrestricted search
- # may download an entire WWW server's data. By restricting the depth,
- # only immediate portions of the server will be received.
- # Images and non-href links are considered to be at the same depth as the
- # document.
- # A restricted depth of 0 means no restriction.
- # The default is 1
- RestrictDepth 1
-
- # RemoveRestricted: boolean
- # This parameter informs the program to remove untraversed links. Links to
- # restricted machines or restricted depths are removed from the HTML file,
- # but the visible text is still available (just not a hyperlink).
- # The default value is FALSE.
- RemoveRestricted FALSE
-
- # Add: URL
- # Place a specific URL on the list of URLs to process.
- # Be aware that restrictions apply.
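- # Example (the URL below is hypothetical, shown only to illustrate the form):
- # Add http://host.machine.org/docs/index.html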
-
- # Exclusion: boolean
- # This parameter determines whether Templeton will support server provided
- # robot exclusion files (robots.txt). Many servers maintain exclusion files
- # to prevent robots from wandering around virtual directory trees, from
- # retrieving very temporary or incomplete files, or copyrighted materials. It
- # is considered "polite" for web agents to obey the exclusion files when they
- # exist. The default value, TRUE, means that robot exclusion files are obeyed.
- # Setting Exclusion to FALSE will ignore robot exclusion files.
- Exclusion TRUE
-
- # Deny: URL
- # The URL provided, as well as all subtrees of the URL, are not processed.
- # Many times specific directory subtrees are not desirable. You can deny
- # retrieval of these URLs using this setting.
- # For example, to NOT retrieve the "archive" subtree of the host loco.com,
- # you would specify:
- # Deny http://loco.com/archive/
- # If you do not include the trailing slash (http://loco.com/archive) then
- # all subdirectories beginning with "archive" are not processed. This
- # includes "archive.1", "archive.old", "archive_from_1994", etc.
- # Deny statements may also include a '*' as a wildcard character. This
- # symbol matches 0 or more characters. If, for example,
- # you wish to skip all GIF files, you would use:
- # Deny *.gif
- # Only one '*' is permitted, but it may be located anywhere in the URL string.
- # Multiple Deny statements may be specified.
-
- # Allow: URL
- # Similar to "Deny", "Allow" explicitly specifies that a subtree is
- # retrievable. When used in conjunction with Deny URL, branches of a
- # subtree may be specified for access, while other subtrees are ignored.
- # Multiple Allow statements may be specified.
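- # Example (a sketch; the "1996" branch is hypothetical): to skip the "archive"
- # subtree of loco.com while still retrieving one branch of it, the two
- # settings might be combined as:
- # Deny http://loco.com/archive/
- # Allow http://loco.com/archive/1996/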
-
- # Authorize: "realm" base64-code
- # This complex command allows you to specify a username and password
- # for basic WWW-authentication. The realm is a quoted string.
- # The base64-code contains the encoded username and password. Use
- # the pwd64.exe program to generate your base64-code.
- # The realm is a case-sensitive string provided by the WWW server. If you
- # do not know the realm for the pages you wish to retrieve, use Templeton
- # to interactively retrieve the page. Templeton will display the realm
- # name and ask for your username and password.
- # Be aware that realms are not unique. If different documents use the
- # same realm but require different passwords, Templeton will require
- # you to enter the username and password.
- # To skip a realm, use the username "-" and password "-", or the
- # base64-code: LTot
- # Authorize "Secret Password" ZHIubmVhbDpyZWdpc3RlciBtZQ==
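- # For reference, basic WWW-authentication encodes the string
- # "username:password" in base64. The skip code above follows this form:
- # LTot is the base64 encoding of "-:-".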
-
- # Sleep: numeric
- # Sleep determines the number of seconds to pause before sending a request to
- # a WWW server. SLEEP IS IMPORTANT.
- # Warning: Templeton can generate thousands of requests per minute. Many
- # WWW servers cannot handle a sudden onslaught of requests. Setting the
- # Sleep parameter to 0 (zero) may generate too many requests for the server
- # and kill the server. This is bad.
- # A sleep setting of 0 (zero) is known to kill the following types of servers:
- # All WWW servers that run under Microsoft Windows (TM)
- # Old generation (HTTP/1.0) CERN servers on all platforms
- # Low sleep values may also generate large amounts of network traffic and
- # hog network resources.
- # For safety, you should set the sleep interval to at least 5 seconds.
- # The longer, the better. Remember, this program is automated and can
- # easily run for hours. What's the rush?
- Sleep 10
-
-
- # ********************* Preferences *****************************
-
- # FileOverwrite: boolean or "modified"
- # Files that already exist on the local system are normally overwritten.
- # Setting the FileOverwrite option to FALSE will not overwrite files on the
- # local file system. Setting the FileOverwrite option to "Modified"
- # (no quotes) will only retrieve documents (non-HTML) that have been changed
- # since the last retrieval. The modified option is useful when retrieving
- # the same URL multiple times; modified will not waste time retrieving GIF
- # and JPG files that have already been retrieved.
- # FileOverwrite does NOT affect HTML documents -- HTML documents are always
- # retrieved. Templeton can only determine links by retrieving HTML documents.
- # Skipping an HTML document would mean skipping possible links.
- # Default value is MODIFIED, which retrieves only newer non-HTML files.
- FileOverwrite modified
-
- # Index: file name
- # For hypertext references that only specify a directory, this is the
- # default html file in the directory.
- # NOTE: if FATFormat is TRUE, the 8.3 name translation will be applied to
- # this file name.
- # The default name is "index.html"
- Index index.html
-
- # ISMAP: absolute path to executable
- # For WWW servers, many imagemaps use a program that takes coordinates from
- # a selected image <IMG SRC=... ISMAP> and returns a new URL. Some of the
- # more common methods use a data file containing known coordinates and a
- # program to identify which URL is activated. Commonly, this program is
- # called "imagemap" or "imagemap.exe".
- # The ISMAP parameter specifies the WWW server's path to the imagemap program.
- ISMAP /cgi-bin/imagemap
-
- # MapType: NCSA or CERN
- # For the executable specified in the ISMAP parameter (see above), this
- # option determines the format of the file. If the image map file can be
- # retrieved, then it is converted into this specified format.
- # Valid options are either "CERN" or "NCSA". The default is NCSA.
- MapType NCSA
-
-
- # ********************* Logging *****************************
- # Mailto-File: file name
- # Similar to "Server-File" logging, the file name listed on the "Mailto-File"
- # line contains a list of e-mail addresses found in the HTML documents. Only
- # e-mail addresses that are active (hyperlinks) are used. E-mail addresses
- # displayed as plain text in the document or contained in CGI scripts are not
- # listed in the mailto logfile.
- # NOTE: This list MAY contain duplicate entries. Duplication removal may be
- # added in later versions.
- # (Some people have found this to be a very useful feature for generating
- # mailing lists.)
- # The default is no mailto logging.
- # Mailto-File mailtolist
-
- # RemoteMapping: boolean
- # Determines whether remote mapping will be done. The default is TRUE,
- # which does perform mapping. The map file name is mapindex.html and is
- # either located at the root of the LocalPath or in the current directory
- # if the system is not mirroring files.
- # Note: if you change the default index name, for example, to "welcome.html"
- # then the default map file will be "mapwelcome.html".
- RemoteMapping TRUE
-
- # Server-File: file name
- # A data file is generated containing the host name, IP address, and
- # WWW server type for each server visited. For servers listed as IP
- # address only, the host name is also the IP address.
- # The default is no server logging.
- # Server-File serverlist
-
-
- # ********************* Advanced *****************************
- # The advanced configuration commands should be used with caution.
- # These commands allow other applications to perform tasks on the
- # retrieved documents. Applications that are spawned (operate
- # concurrently) with Templeton may overwhelm the user or operating system.
- # Spawned applications include those begun with "start" under OS/2,
- # or followed by "&" under Unix.
- # NOTE: Templeton has the capability to spawn thousands of applications
- # in a few seconds.
- # On Unix-type systems, Templeton introduces security risks when executed
- # as root.
- # For applications that are not spawned, Templeton will pause until
- # the application has ended. This allows for a guaranteed order of processing
- # for the called applications.
-
- # Command_html: string
- # Command_image: string
- # Command_map: string
- # Command_default: string
- # Execute a system command on each document stored on the file system.
- # The different command types are for HTML documents, images, map files,
- # or the default command when any of the other commands are not set.
- # These are useful for counting documents, storing statistics, printing,
- # converting, etc.
- # The string "none" turns off these commands. The default is "none".
- # The command string will replace special characters with desired information:
- # characters: becomes:
- # %d depth
- # %h host (server)
- # %u url
- # %l local file
- # %s saved file (same as %l)
- # %r remote file (URL without server)
- # %% %
- # The special characters ARE case sensitive.
- # NOTE: Command_image and Command_default do not distinguish between
- # different file formats.
- # Example: to convert all HTML documents to text using the program
- # html2txt (not provided with the Templeton distribution), you would use:
- # Command_html html2txt %s
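- # A second sketch, assuming the command string is passed to a shell that
- # supports output redirection (an assumption, not documented here), and
- # using a hypothetical log file name: record the depth, URL, and local file
- # of every retrieved image:
- # Command_image echo %d %u %l >> images.log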
-
- # Interactive: boolean
- # Determines whether the user should be prompted for
- # configuration information or if Templeton should
- # start running automatically.
- # The default setting is TRUE.
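- # Example: for an unattended run that uses the settings in this file
- # (including LocalPath), prompting would be turned off with:
- # Interactive FALSE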
-
-