File date: 1997-07-07 (39KB, 993 lines)
Welcome to WWWGrab/2 v1.4
-------------------------
<Czech>
Český návod je v souboru WWWGrab2.CZE.
</Czech>
<Spanish>
La versión en castellano es WWWGrab.SPA
</Spanish>
<French>
La documentation en français est dans le fichier WWWGRAB.FRA.
</French>
Table of Contents
-----------------
Introduction
Requirements
Copyright and Disclaimer
Starting WWWGrab/2
Regular Expressions
Using `@' Files
Configuration File Format
Command List - Detailed Reference
Command List - Quick Reference
Quick Reference Chart
Examples
Credits
Introduction
------------
WWWGrab/2 is a utility for making a copy of a remote web site (or part of
a site). WWWGrab/2 makes a local copy, on your hard disk (or a network
drive), of a remote WWW server's files, including HTML files, images, and
more.
You need WWWGrab/2 if:
* You are a web site administrator and need to mirror your site on
different machines. You can configure WWWGrab/2 to run periodically
(by using a "cron"-type utility) and automatically keep your site
mirrored.
* You are having trouble getting a reliable or speedy connection to a
web site. Let WWWGrab/2 spend the time downloading the pages - you
can look at them later.
* You are working on your own web page and want to see the HTML code to
someone else's page to see how they did it.
* You want to have a local copy of a web site for quick, easy reference.
* You want to make a copy of a web page or site because it may
disappear.
* You have a slow connection to the Internet, or simply don't like the
"World Wide Wait".
* You pay for Internet access by the minute.
* You frequently reference a Web site, but don't want to have to go
online every time you need to look at it.
WWWGrab/2 offers many features that make it a very powerful and flexible
tool for mirroring a web site, including:
* Nearly 50 commands and options which provide maximum control and
flexibility over the program's operation. (However, only a handful
are required in most cases.)
* Easy-to-use configuration files to let you control every option and
command in detail. Configuration files may be nested, allowing common
commands and options to be automatically included, and you can have
separate configuration files for a single web site.
* U*IX-like regular expression strings for maximum flexibility and
control over filenames.
* A web site may be checked for modifications if it has been previously
mirrored.
* External programs can be run for every successfully downloaded file,
allowing unlimited customized actions.
* Logging of files successfully downloaded.
Requirements
------------
WWWGrab/2's requirements are simple and few:
* OS/2 Version 2.11 or greater. Merlin or OS/2 Warp Connect suggested
for best performance.
* One of the following TCP/IP packages for OS/2 (listed in order of
preference):
* IBM TCP/IP included in OS/2 Warp Merlin.
* IBM TCP/IP 3.0 included in OS/2 Warp Connect.
* IBM TCP/IP 2.0 Base Kit with CSD64092 or greater applied.
* The Internet Access Kit from OS/2 Warp's Bonus pack.
* A disk with long filename support (HPFS, ext2fs, etc.) is not
required but is strongly recommended!
* Approximately 100K for program files and documentation.
* Sufficient disk space for the mirrored files. Depending on how you use
WWWGrab/2, this may be anywhere from a few kilobytes to many megabytes.
Copyright and Disclaimer
------------------------
This program is COPYRIGHTED by J. Rubes.
WWWGrab/2 is a shareware product. It is distributed through public
access channels so that prospective buyers have the opportunity to
evaluate the product before making a decision to buy.
WWWGrab/2 may be used only for legal purposes. CHECK if you are
allowed to mirror a site before doing so.
USE AT YOUR OWN RISK
This program is provided AS IS without any warranty, expressed or
implied, including but not limited to fitness for a particular use. The
user is responsible for the results of correct or incorrect usage of this
software. WWWGrab/2 may not be used to provide commercial services without
written permission of the author.
Starting WWWGrab/2
------------------
To start WWWGrab/2 simply type the following at an OS/2 command
prompt:
WWWGRAB <config_file> [-i] [-c0|-c1]
<config_file> is the configuration file to use. The configuration
file is a plain ASCII text file with commands and options that control
WWWGrab/2's behavior. Its format, and the commands and options available,
are described below. The easiest way to create your first
configuration file is to copy an existing demonstration file and change it
to suit your needs.
-i tells WWWGrab/2 to not load the default configuration file.
Normally, the default configuration file (named "DEFAULT.W3D"), is
processed when WWWGrab/2 is executed. This file should contain commands
and options that never change. However, you can prevent WWWGrab/2 from
processing the default configuration file by using the -i switch. (See
the Quick Reference Chart to see which commands and options may be used in
DEFAULT.W3D.)
-c0 or -c1 write a list of URLs modified since the site was last
mirrored to the file W3GRAB.CHG:
-c0 - check sites without the HEAD command. This method is slower, but
safer.
-c1 - check sites using the HEAD command. This is faster, but less
safe, because some simple WWW servers don't accept the HEAD
command from a client, and return an error code. (Apache, CERN, ICS,
and Netscape behave correctly.)
Note that in order to use -c0 or -c1 the site must have previously
been mirrored.
WWWGrab/2 may be called from command and REXX files and from
Program objects on the OS/2 desktop.
Regular expressions
-------------------
WWWGrab/2 uses U*IX-like regular expressions in some
commands. This allows complex specifications such as
http://www.foo.*/*/index.htm* or c??, giving considerably more
flexibility in specifying URLs, extensions, or anything else
where this type of pattern matching is wanted.
In the specified pattern string:
`*' matches any sequence of zero or more characters.
`?' matches any single character.
`\' suppresses the syntactic significance of a special character.
[SET] matches any character in the specified set.
[!SET] or [^SET] matches any character NOT in the specified set.
A set is composed of individual characters or character ranges.
A range is two characters separated by a hyphen (0-9 or A-Z, for example).
Numerals, letters (uppercase and lowercase), and the underscore (`_') are
the minimal set of characters supported in patterns. Nearly all
operating systems support additional (8-bit) characters.
The `escape character' (`\') is used to suppress the syntactic
significance of the characters `[]*?!^-\', so that such a character may be
matched. For example, the pattern string `file\*' matches the string
`file*', not the string beginning with `file\' and followed by zero or
more characters; the pattern string `file\[*' matches the string `file['
followed by zero or more additional characters.
See the Quick Reference Chart to see which commands support regular
expressions.
Examples:
file*
Match any string beginning with the letters `file', such as
`file', `filestar', `file100'.
??file
Match any six-character string ending in `file', such as
`00file', `dofile', etc.
file[abc]*
Match any string beginning with the letters `file', followed by
`a', `b', or `c', followed by zero or more characters, such as
`filea', `filea100', `fileabcd'.
file[0-9]\-?
Match any string beginning with the letters `file', followed by
a numeral 0-9, followed by a hyphen `-', followed by any
character, such as `file3-a', `file0-0', etc.
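The matching rules above can be approximated outside WWWGrab/2 by translating a pattern into a Python regular expression. This is only a sketch under stated assumptions: the names `pattern_to_regex` and `pattern_matches` are hypothetical, matching is assumed to be case-sensitive and to cover the whole string, and a `\` inside a [SET] is treated as a literal backslash. WWWGrab/2's own matcher may behave differently in such edge cases.

```python
import re


def pattern_to_regex(pattern):
    """Translate a WWWGrab/2-style pattern into a compiled Python regex."""
    out = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "\\" and i + 1 < len(pattern):
            # Escaped character: match the next character literally.
            out.append(re.escape(pattern[i + 1]))
            i += 2
            continue
        if ch == "*":
            out.append(".*")   # zero or more characters
        elif ch == "?":
            out.append(".")    # exactly one character
        elif ch == "[":
            end = pattern.find("]", i + 1)
            body = pattern[i + 1:end] if end != -1 else ""
            negate = body[:1] in ("!", "^")
            members = body[1:] if negate else body
            if members:
                # Keep '-' so ranges such as 0-9 still work; escape the rest.
                cls = "".join(c if c == "-" else re.escape(c) for c in members)
                out.append("[" + ("^" if negate else "") + cls + "]")
                i = end + 1
                continue
            out.append(re.escape(ch))  # unterminated or empty set: literal '['
        else:
            out.append(re.escape(ch))
        i += 1
    return re.compile("".join(out), re.DOTALL)


def pattern_matches(pattern, text):
    """Return True if the whole of text matches the pattern."""
    return pattern_to_regex(pattern).fullmatch(text) is not None
```

For instance, `pattern_matches(r"file[0-9]\-?", "file3-a")` is true, while `pattern_matches(r"file\*", "filestar")` is false because the escaped `*` only matches a literal asterisk.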
Using `@' Files
---------------
You are likely to use the same commands and options for multiple web
sites. These can be stored in the default configuration file if they
never change, but include files (`@' files) provide greater flexibility,
letting you store options and commands common to only some sites. Include
files are referenced by the main configuration file (specified on the
command line).
For example, if you often use the MASK command, you may store it in the
default configuration file and it will be applied to all configuration
files. But if you want to use two different MASKs for different sites,
you must use include files. To do this, create two include files, and then
reference the correct include file in each configuration file.
Make one file called (for example) MASKS1 with the following text. This
will be the first include file:
*.HTML
*.HTM
*.?.JPEG
*.0?.GIF
Then create the second include file named (for example) MASKS2 with this
text:
*.SHTML
*.SHTM
*.JPEG
*.GIF
*.WAV
Note that the include file must contain only one parameter per line.
Finally, reference the appropriate include file from the configuration
file. For example, to use MASKS1 add the following line to the
configuration file:
MASK @MASKS1 ; use contents of the MASKS1 file
When WWWGrab/2 reads the configuration file, it will read the parameters
for the MASK command from the MASKS1 file. NOTE: Don't forget the `@'
sign in front of the filename!
You may use multiple include files with the same command, as long as the
command may be used more than once. For example, to reference both MASKS1
and MASKS2 add the two lines below to the configuration file:
MASK @MASKS1 ; use contents of the MASKS1 file
MASK @MASKS2 ; and add contents of the MASKS2 file
If you had used just MASK @MASKS2, then only *.SHTML, *.SHTM,
*.JPEG, *.GIF, and *.WAV files would be mirrored.
See the Quick Reference Chart to see which commands support include files.
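The way an `@' parameter stands in for a list of parameters can be sketched in a few lines of Python. This is an illustration only: `expand_parameter` and its `base_dir` argument are hypothetical names, and it is an assumption that blank lines are skipped and that the include file is looked up relative to a single directory.

```python
from pathlib import Path


def expand_parameter(param, base_dir="."):
    """Return the parameters a single command argument stands for.

    A plain argument stands for itself; an argument starting with '@'
    is replaced by the lines of the named include file, one parameter
    per line.
    """
    if not param.startswith("@"):
        return [param]
    text = (Path(base_dir) / param[1:]).read_text()
    # Assumption: surrounding whitespace and blank lines are ignored.
    return [line.strip() for line in text.splitlines() if line.strip()]
```

With the MASKS1 file shown earlier, `expand_parameter("@MASKS1")` would yield the four masks it contains, which is why two `MASK @...` lines add up to the combined set of masks.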
Configuration File Format
-------------------------
All commands and options in the configuration file have the same format:
<command> [parameters]
There may be spaces before the command, and there must be at least one
space after the command if there are any parameters supplied.
Single line comments are preceded by a semi-colon (`;'). Text following
the semi-colon is ignored until the next line is reached. Examples:
URL http://www.foo.com/bar ; This is a comment
; This is also a comment.
All URLs must be in the full http format. Always use
`http://www.foo.com', not `foo', `foo.com', or `www.foo.com'. You may
use IP addresses and port numbers, e.g. `http://127.0.0.1/localhost/' or
`http://www.foo.com:8080/misc'.
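As a rough illustration of the line format described above, the following Python sketch splits one configuration line into a command and its parameter text. The function name `parse_config_line` is hypothetical, and the sketch assumes that a semi-colon always starts a comment (it does not try to handle a `;' that is meant to be part of a parameter).

```python
def parse_config_line(line):
    """Split a configuration line into (command, parameters).

    Returns None for blank lines and comment-only lines.
    """
    # Everything from the first ';' to the end of the line is a comment.
    text = line.split(";", 1)[0].strip()
    if not text:
        return None
    # The command is the first word; the rest is the parameter text.
    parts = text.split(None, 1)
    command = parts[0]
    params = parts[1].strip() if len(parts) > 1 else ""
    return command, params
```

Applied to the first example line above, this returns the pair `("URL", "http://www.foo.com/bar")`; the comment-only line returns `None`.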
Command List - Detailed Reference
---------------------------------
Following is a detailed reference to each of the commands and options
which control WWWGrab/2's behavior.
ADD <path>
Add the specified path to the list of requested URLs. This command
can be used more than once, and always applies to the first URL
command.
Example:
URL http://www.xxx.yyy/path1/index.html
URL http://foobar.com/
ADD /path2/pic/index.html
Mirrors: http://www.xxx.yyy/path1/index.html AND
http://www.xxx.yyy/path2/pic/index.html AND
http://foobar.com/
ALL
Normally, if WWWGrab/2 sees that a file already exists, it will send
a conditional GET to the remote server. The file is only downloaded
again if the version on the server is newer than the local file. If
you want to update all the files regardless of their date and local
existence, you should use the ALL option.
ALLOW <URL-in-http-form>
Explicitly specifies that a subtree is retrievable. This command
can be used more than once and may use regular expressions.
Example:
ALLOW http://www.xxx.yyy/allow/this/path/
CHAM <number>
Some servers (esp. Netscape) try to recognize the client name. If
they don't know the client name, they don't send any data. You may
use this option to "mask" the client name (like CHAMeleon). Numbers
are:
0 - WWWGrab (default)
1 - Mozilla Netscape Browser
2 - WebExplorer IBM WebExplorer/2
3 - WebCrawler WebCrawler robot
4 - InfoSeek InfoSeek robot
5 - Harvest a web robot
6 - Mosaic NCSA Mosaic
7 - Lynx Lynx, text browser
8 - PRODIGY-WB Prodigy browser
9 - Internet Microsoft's web browser
Example:
CHAM 2
Sends the server the WebExplorer client name.
CHANGESITE <num sites>
Normally, if WWWGrab/2 finds a link to another WWW server in an html
file, the link is ignored. If you want to allow WWWGrab/2 to follow
links to another server, use the CHANGESITE command. The default is
0, which means don't change sites. BE CAREFUL what you enter here!
You may start mirroring the entire WWW!
Example:
CHANGESITE 2
CLIENT
When the CLIENT option is used, WWWGrab/2 turns all links to relative
links. <a href="/www/files/foo.html"> becomes <a href="foo.html"> for
example. Use this option if you want to be able to browse a site
locally. (Note that server-side-includes, CGI programs, and Java
programs will not work when a site is browsed locally as these
features require an HTTP server.)
DEFAULTNAME <name>
Sometimes links point to a directory instead of a file. In this case,
if the filename is not known the DefaultName is used for that
directory. The default value for DefaultName is "index.html".
Example:
DEFAULTNAME Welcome.html
DENY <URL-in-http-form>
The URL provided, as well as all subtrees of the URL, are not
processed. Many times specific directory subtrees are not desirable.
You can deny retrieval of these URLs using this setting. It can be
used more than once, and regular expressions can be used too.
Example:
DENY http://www.xxx.yyy/deny/this/path/
Do not download any files from the /deny/this/path/ tree.
If you do not include the trailing slash
(http://www.xxx.yyy/deny/this/path) then every path beginning
with "path" in that directory is skipped. This includes "paths.html",
"path1/news", etc.
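The effect of the trailing slash is easiest to see when DENY is treated as a plain prefix test. The Python sketch below does only that; `is_denied` is a hypothetical name, and regular expressions (which DENY also accepts) are deliberately left out.

```python
def is_denied(url, deny_prefixes):
    """Return True if the URL starts with any of the denied prefixes."""
    return any(url.startswith(prefix) for prefix in deny_prefixes)


with_slash = ["http://www.xxx.yyy/deny/this/path/"]
without_slash = ["http://www.xxx.yyy/deny/this/path"]

# With the trailing slash only the /deny/this/path/ tree is skipped.
print(is_denied("http://www.xxx.yyy/deny/this/paths.html", with_slash))
# Without it, anything beginning with "path" is skipped as well.
print(is_denied("http://www.xxx.yyy/deny/this/paths.html", without_slash))
```

The first call prints False and the second prints True, which mirrors the difference described above.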
DO <DEF | HTML | IMG | SND> <NOTHING | command>
This command allows you to execute a command for every
successfully downloaded file. You may specify different commands for
different file types. If no command is associated with a particular
type, the default (DEF) command is executed. You may use the
following options in the command:
%d depth
%h host (www server)
%l local full filename
%p parent URL (where this link came from)
%r remote file (URL without host)
%t file timestamp in RFC 822 format
%u URL
%% % (percent sign)
If a DEF command is defined but you don't want any command executed
for a particular type, specify NOTHING as the command for that type.
Types are:
HTML - file defined with the text/html content
IMG - file defined with the image/* content
SND - file defined with the audio/* content
DEF - any other file
Programs that are spawned operate concurrently with WWWGrab/2 and
may OVERLOAD your system. Spawned applications include those begun
with "start".
Example:
DO HTML start /b html2txt %l
; spawn html2txt in the background for each html file
DO SND play file=%l
; plays grabbed sound files
DO IMG NOTHING
; does nothing for image files
DO DEF echo %u >>other.files
; logs other grabbed files
EXCL <www-server>
This command defines a WWW server to exclude from mirroring. This
command is usable together with the CHANGESITE command. It can be
used more than once.
Example:
EXCL www.yyy.zzz
EXCL microsoft.is.lame.org BTW: try this URL :-)
EXTENSIONS <list of extensions>
The EXTENSIONS command defines a list of file extension search
strings which are to be downloaded. Extensions are separated by a
space. If you don't specify any extension, then HTM, HTML, SHTM,
SHTML, JPG, GIF, WAV, AU, CLASS, and JAVA are automatically defined.
You may alternatively use the ':' char as a 'NOT' operator to list
extensions which you wish to ignore. Be careful what you put here!
Including EXE or ZIP extensions could use vast quantities of disk
space if you start mirroring a large site such as hobbes or sunsite!
You may use regular expressions in this command (see above), and this
command may be used more than once.
Example:
EXTENSIONS ZIP C
Use ZIP and C extensions
EXTENSIONS ZIP JAVA :C??
Use ZIP and JAVA, but not C++, C--, C00...
FAT
This option turns on FAT compatibility. In this mode WWWGrab/2 stores
all mirrored files in a single directory using the FAT 8.3 filename
format. It automatically fixes links. This option is automatically
turned on if the local path (LOCALPATH) is located on a FAT partition
or on a partition without long filename support.
FIXSL
Sometimes authors of web pages do not add a trailing slash to links.
You may use the FIXSL option to fix their "slash-forgetting".
I401
If WWWGrab/2 sends a conditional GET to a protected page, and the
page isn't modified, some servers return a 401 status code. You may
use I401 to override this response and download the file.
INCLUDE <file>
This command allows you to include another configuration file into
the configuration file currently being processed. Nesting is allowed,
to a maximum depth of 4 levels. This command is useful for including
commands which are used in multiple configuration files. See also '@'
files.
Example:
INCLUDE realms.inc
INCL urls.inc
LOCALPATH <path>
WWWGrab/2 must have a place to store the files it downloads. This
command tells WWWGrab/2 the path on your local machine under which
the URL will be mirrored.
Example:
LOCALPATH F:\GRAB\IBM\
Stores files mirrored under the F:\GRAB\IBM\ directory.
LOG <log-file> <log-string>
This command logs all successfully mirrored files to the file
<log-file> in the format described in <log-string>. In the log-string
you may use these special characters:
%d depth
%h host (www server)
%l local full filename
%p parent URL (where this link came from)
%r remote file (URL without host)
%t file timestamp in RFC 822 format
%u URL
%% % (percent sign)
\n new line
\t tab
\\ \ (backslash)
Note: The LOG command doesn't automatically append the CRLF at the end
of each string.
Example:
LOG foo.log URL %u is stored in %l\n
Will produce:
URL http://www/index.html is stored in \grab\www\index.html
URL http://www/foo/foo.gif is stored in \grab\www\foo\foo.gif
...
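To make the substitution rules for a log-string concrete, here is a small Python sketch that expands the `%' and `\' sequences listed above. The name `expand_log_string` is hypothetical, and leaving unknown sequences untouched is an assumption; the documentation does not say what WWWGrab/2 does with them.

```python
import re

ESCAPES = {"n": "\n", "t": "\t", "\\": "\\"}


def expand_log_string(template, values):
    """Expand %x placeholders and \\n, \\t, \\\\ escapes in a log-string.

    values maps placeholder letters (d, h, l, p, r, t, u) to their text.
    """
    def replace(match):
        marker, key = match.group(1), match.group(2)
        if marker == "%":
            if key == "%":
                return "%"
            # Assumption: unknown placeholders are left as written.
            return str(values.get(key, match.group(0)))
        return ESCAPES.get(key, match.group(0))

    return re.sub(r"([%\\])(.)", replace, template, flags=re.DOTALL)


line = expand_log_string(
    r"URL %u is stored in %l\n",
    {"u": "http://www/index.html", "l": r"\grab\www\index.html"},
)
```

Here `line` ends with a single newline character because the log-string itself ends in `\n`; without it, successive entries would run together on one line, as the note above warns.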
MAP
This option turns on creation of a map file. This file is named
w3gmap.htm. The map file contains a map of the mirrored site, which
you may later use to navigate it manually.
MASK <file mask>
Use this command if you want to mirror only specified files. This
command overrides EXTENSIONS. You MUST explicitly define every file
mask if using this command, including the defaults in EXTENSIONS such
as HTML, etc.! This command can be used more than once. The file mask
can have wildcard characters (special characters for character
substitution). See the part named "regular expressions".
Example:
MASK *.jpg
Will mirror all files with the .jpg extension
MASK ?a*.html
Will mirror all files beginning with any character,
followed by 'a', having any number of characters following,
and ending with .html, such as zaphod.html, 0a.html, etc.
MASK *.jpg s?n.htm* do*s.large.i*x *.*.html.c*
Will mirror one.jpg, two.jpg, sin.htm, son.htm, sun.html,
dogs.large.idx, doorways.large.index, index.short.html.cz852,
index.of.html.cz.html, try.decode.html.c, etc...
MASK *.jp*g chapter[0-4].htm*
Will mirror any jpg or jpeg file, and chapter0.htm, chapter1.htm,
chapter0.html, chapter1.html, chapter2.htm, chapter3.html, but
not chapter5.html.
MAXDEEP <levels>
MaxDeep defines how many subdirectory levels deep WWWGrab/2 will
mirror. Pages which are deeper than <levels> subdirectories in the
tree are ignored.
Example:
MAXDEEP 5
Will get http://www.foo.com/1/2/3/4/5/file.html but not
http://www.foo.com/1/2/3/4/5/6/file.html
NOTE: The shareware version of WWWGrab/2 is limited to 5 levels.
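The depth rule in the example can be pictured as counting directory components in the URL path. The Python sketch below does this with the standard `urllib.parse` module; `directory_depth` and `within_maxdeep` are hypothetical names, and counting path components is an interpretation of the example above, not a description of WWWGrab/2's internals.

```python
from urllib.parse import urlsplit


def directory_depth(url):
    """Count the subdirectory levels in a URL's path."""
    path = urlsplit(url).path
    segments = [s for s in path.split("/") if s]
    # If the path does not end in '/', the last segment is a filename.
    if segments and not path.endswith("/"):
        segments = segments[:-1]
    return len(segments)


def within_maxdeep(url, maxdeep):
    """Return True if the URL is no deeper than maxdeep levels."""
    return directory_depth(url) <= maxdeep
```

With `maxdeep` set to 5, `within_maxdeep` accepts http://www.foo.com/1/2/3/4/5/file.html and rejects http://www.foo.com/1/2/3/4/5/6/file.html, matching the example.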
MAXDL <limit>
This defines the maximum number of kilobytes WWWGrab/2 will download.
When WWWGrab/2 is about to download a file, it checks the filesize.
If downloading the file would exceed the limit specified in MAXDL,
WWWGrab/2 will ignore the file.
Example:
MAXDL 3
Download up to 3KB.
MAXFSIZE <file-size-in-kb>
You may use this command to set the largest allowable filesize to
mirror, in kilobytes. Files larger than the size set by MAXFSIZE will
be ignored. This command does not work with servers which don't
return the content length.
Example:
MAXFSIZE 100
Will not mirror files larger than 100KB.
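MAXDL and MAXFSIZE can be read as two independent checks made before a file is fetched: one against a running total, one against the single file. The Python sketch below shows that reading; `should_download` and its parameter names are hypothetical. Letting files of unknown size through is an assumption based on the remark that MAXFSIZE does not work when the server returns no content length.

```python
def should_download(size_kb, downloaded_kb, maxdl=None, maxfsize=None):
    """Decide whether a file fits within the MAXDL and MAXFSIZE limits.

    size_kb is the size reported by the server (None if unknown),
    downloaded_kb is the total already downloaded in this run.
    """
    if size_kb is None:
        # Assumption: without a reported size the limits cannot be checked.
        return True
    if maxfsize is not None and size_kb > maxfsize:
        return False  # the file alone is larger than MAXFSIZE
    if maxdl is not None and downloaded_kb + size_kb > maxdl:
        return False  # fetching it would exceed the MAXDL total
    return True
```

Note that a file skipped because of MAXDL does not stop the run in this sketch: a later, smaller file may still fit within the remaining budget.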
MAXTRIES <num>
MaxTries tells WWWGrab/2 how many times it should try to get a file.
WWWGrab/2 tries to grab all the files sequentially. If a file isn't
successfully retrieved on the first attempt, it is ignored until the
complete list has been processed. Then WWWGrab/2 retries files missed
on the first attempt. This process is repeated until all the files
are retrieved or MAXTRIES attempts have been made.
Example:
MAXTRIES 3
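The retry scheme described above (finish the whole list first, then go back to the failures) can be sketched as a loop over passes. In this Python sketch `fetch_in_passes` is a hypothetical name and `fetch` stands for any function that returns True when a file was retrieved; none of this is taken from WWWGrab/2's source.

```python
def fetch_in_passes(urls, fetch, max_tries):
    """Try every URL once per pass, retrying only the failures.

    Returns (retrieved, still_failed) after at most max_tries passes.
    """
    pending = list(urls)
    retrieved = []
    for _ in range(max_tries):
        if not pending:
            break
        failed = []
        for url in pending:
            if fetch(url):
                retrieved.append(url)
            else:
                failed.append(url)  # postponed until the next pass
        pending = failed
    return retrieved, pending
```

The point of working in passes is that a file which fails is not retried immediately; the rest of the list is processed first, which gives a struggling server some time before the next attempt.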
METAFILE <filename>
This command specifies the file WWWGrab/2 uses for saving information
about mirrored files. The default filename is META.DAT, which is
stored in the LOCALPATH\%host% directory.
Example:
META data.met
NICE [delay]
This command defines the adjustable delay in seconds between links so
you don't hog all the resources of the system you're mirroring from.
If you use this command without a value, WWWGrab/2 will delay 10
seconds before requesting the next file. Warning: WWWGrab/2 can
generate requests too fast for some servers. Setting the NICE
parameter too low may generate too many requests for the server and
crash the server. This is not nice :-). A low NICE setting is known
to kill the following types of servers:
All WWW servers that run under Microsoft Windows(TM)
Old generation (HTTP/1.0) CERN servers on all platforms
Low NICE values may also generate large amounts of network traffic
and hog network resources. For safety, you should set the NICE
value to at least five seconds. The longer, the better. Remember,
this program is automated and can easily run for hours with no user
interaction.
Example:
NICE 5
NOTE: If you try to set a NICE value of 0 (zero), the value
will be automatically changed to five seconds.
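The rules for the NICE value stated above (10 seconds when no value is given, 0 changed to 5) fit in a small Python function. `effective_nice_delay` is a hypothetical name used only for this illustration, and the sketch does not cover values the text does not mention, such as negative numbers.

```python
def effective_nice_delay(value=None):
    """Return the delay in seconds that a NICE setting results in."""
    if value is None:
        return 10  # NICE used without a value
    if value == 0:
        return 5   # a value of zero is changed to five seconds
    return value
```

So `NICE` alone gives a 10-second delay, `NICE 0` gives 5 seconds, and `NICE 5` is the smallest value recommended in the text above.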
NOAPPLET
Use this option if you don't want to grab applets.
NOIMG
Use this option if you don't want to grab image files.
NOSND
Use this option if you don't want to grab audio files.
OHTML
This option combines NOIMG, NOSND and NOAPPLET.
PPORT <proxy port>
This command specifies the proxy port. The default value is 80. This
command is ignored if no proxy is specified with the PROXY command.
Example:
PPORT 8080
PROXY <hostname>
Use this command if you access the Internet via a Proxy server/cache.
The <hostname> may be the full hostname (e.g. proxy.foo.com) or an IP
address. If you're uncertain about this, consult your system
administrator or internet service provider.
Examples:
PROXY www.proxy.server
PROXY 123.456.789.10
PROXYAUTH <base64>
Use the PROXYAUTH command if you access the Internet through a
secured proxy server.
Example:
PROXY secured.proxy.net
PROXYAUTH LTot
REALM <host> <"Realm Name"> <encoded username and password>
Defines a secured host, a realmname and a base64 encoded
username+password. REALM can be used more than once. The realmname is
CaSe SeNsItIvE! If you don't know the realmname you may insert an
empty string (i.e. ""), or examine WWWGRAB.LOG. The host is a
host secured with basic authentication. It may be in IP format (1.22.33.44) or in
the standard "domain" format (www.foo.com). Realms are generated by
the makeauth program. You may use the INCLUDE command to include its
output into the configuration file.
Example:
REALM www.secured.host "This is ReaLmName" LTot
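The encoded username and password is normally produced by the makeauth program. For readers curious about what such a string contains, the Python sketch below assumes the usual HTTP basic authentication convention of base64-encoding "username:password"; `encode_credentials` is a hypothetical helper, not part of WWWGrab/2, and makeauth's exact output format is not documented here.

```python
import base64


def encode_credentials(username, password):
    """Base64-encode "username:password" as used by HTTP basic auth."""
    raw = f"{username}:{password}".encode("latin-1")
    return base64.b64encode(raw).decode("ascii")


# The placeholder value used in the examples decodes to "-:-",
# which is consistent with the username:password convention.
placeholder = base64.b64decode("LTot").decode("ascii")
```

Keep in mind that base64 is an encoding, not encryption: anyone who can read the configuration file can recover the password.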
REMOVE
This option tells WWWGrab/2 to remove unused links from an HTML
file. Links are not deleted, but only commented out.
REPL <path>
Specifies a path which replaces the LOCALPATH in a link. For example,
if you specify "REPL /mirrors" and the LOCALPATH is
F:\OS2Httpd\HTML\GRAB\, for a link in the grabbed HTML document to
"<A HREF="/some/pages/index.html"> link </a>", the replaced filename
is "F:\OS2Httpd\HTML\GRAB\www.foo.com\some\pages\index.html". The
link in the document will be changed to:
"/mirrors/www.foo.com/some/pages/index.html"
Example:
REPL /mirrors
SITELIST <hostname>
Normally, if WWWGrab/2 finds a link to another web site in an html
file, the link is ignored. You can use the SITELIST command to
specify allowed hosts. You may use the ':' character as a NOT
operator. This command can be used more than once.
Example:
SITELIST www.xxx.yyy
Allow connections to site www.xxx.yyy.
SITELIST :www.xxx.yyy
All websites except www.xxx.yyy.
NOTE: This command overrides the CHANGESITE command!
SWSLASH
This option converts forward slashes to backslashes, i.e. from '/' to '\'.
It's useful for "older|dumber" browsers.
TIMC <sec>
The TIMC command tells WWWGrab/2 what the server timeout value is.
If WWWGrab/2 doesn't get a response from the server within the
timeout, it closes the connection to the server. This value should be
less than or equal to TIMP and greater than 10. The default value is
60 seconds. Do not use this command if you don't understand what it
does!
Example:
TIMC 100
TIMP <sec>
The TIMP command tells WWWGrab/2 what the timeout is for packets. The
connection is closed after timeout. The default value is 60 seconds.
The value should be greater than 10. Don't use this command if you
don't understand what it does!
Example:
TIMP 120
TOP <URL-in-http-form>
Defines the TOP of the path. WWWGrab/2 will ignore files in
directories higher than this path. In other words, the path of the
file must start with this string. You may use regular expressions.
This command can be used more than once.
Example:
TOP http://www.foo.com/path/xxxx/
Ignore files above /path/xxxx/, i.e. DON'T mirror /path/some.file
TOP http://www.*.net/java/
URL <url-in-the-http-form>
This command tells WWWGrab/2 a site you wish to mirror. The complete
URL of the site is required! The URL command can be used more than
once to mirror multiple sites or multiple directories on the same
site. This is a basic command :-)
Example:
URL http://www.geocities.com/SiliconValley/Heights/7262/index.html
Command List - Quick Reference
------------------------------
Following is a quick reference to the nearly 50 commands and options which
control WWWGrab/2's behavior.
ADD <path>            Add specified path to the list of requested URLs.
ALL                   Update all files regardless of date (get all files).
ALLOW <URL>           Explicitly specify a subtree to be retrievable.
CHAM <number>         Fake a client name (chameleon).
CHANGESITE <num>      Follow <num> links to other servers.
CLIENT                Change links to be relative, for local browsing.
DEFAULTNAME <name>    Set default HTML filename for directories.
DENY <URL>            Prevent processing of <URL> tree.
DO <option> <cmd>     Execute <cmd> on DEF|HTML|IMG|SND file.
EXCL <server>         Exclude WWW server from mirroring.
EXTENSIONS <list>     Allowable file extensions to download.
FAT                   Enable FAT filesystem compatibility.
FIXSL                 Add trailing slash to links which do not have one.
I401                  Override 401 status code and enable download of file.
INCLUDE <file>        Insert another configuration file at this point.
LOCALPATH <path>      Local path to store mirrored files.
LOG <file> <string>   Log to <file> using <string>.
MAP                   Create HTML map of mirrored site.
MASK <mask>           Explicitly set allowable file extensions.
MAXDEEP <levels>      How many subdirectory levels to mirror.
MAXDL <limit>         Maximum kilobytes to download.
MAXFL <size>          Maximum filesize to download.
MAXTRIES <num>        Maximum number of tries to get file.
METAFILE <file>       Specify metafile filename.
NICE <seconds>        Delay for <seconds> after each get.
NOAPPLET              Do not download applets.
NOIMG                 Do not download images.
NOSND                 Do not download audio files.
OHTML                 Combine NOIMG, NOSND, and NOAPPLET.
PPORT <port>          Specify proxy port.
PROXY <hostname>      Specify proxy host.
PROXYAUTH <base64>    Specify proxy authorization.
REALM <h> <rlm> <pw>  Define secure host, realm, and username/password.
REMOVE                Remove unused links from HTML files.
REPL <path>           Replace local path in a link.
SITELIST <host>       Allow connections to <host>.
SWSLASH               Convert forward slashes to backslashes.
TIMC <sec>            Server timeout value.
TIMP <sec>            Server timeout value for packets.
TOP <URL>             Defines top of path (don't dl files above this).
URL <URL>             URL of site to mirror.
Quick Reference Chart of Commands and Options
---------------------------------------------
COMMAND      SHORTCUT  '@'  DEFCFG  OVERRIDES   DEFVAL  REX  REG  MULT
----------------------------------------------------------------------
ADD                    YES  NO                          NO   NO   YES
ALL                    NO   NO                          NO   NO   NO
ALLOW                  YES  NO                          YES  YES  YES
CHAM                   NO   YES                 0       NO   NO   NO
CHANGESITE   CHSIT     NO   NO                  0       NO   YES  NO
CLIENT                 NO   YES                         NO   NO   NO
DEFAULTNAME  DEF       NO   YES                 [3]     NO   YES  NO
DENY                   YES  NO                          YES  YES  YES
DO                     NO   YES                         NO   YES  NO
EXCL                   YES  NO                          YES  NO   YES
EXTENSIONS   EXT       YES  YES                 [1]     YES  YES  YES
FAT                    NO   YES                         NO   NO   NO
FIXSL                  NO   YES                         NO   NO   NO
I401                   NO   YES                         NO   NO   NO
INCLUDE      INCL      NO   NO                          NO   NO   YES
LOCALPATH    LOP       NO   YES                 [0]     NO   NO   NO
LOG                    NO   YES                         NO   YES  NO
MAP                    NO   YES                         NO   NO   NO
MASK                   YES  YES     EXTENSIONS          YES  YES  YES
MAXDEEP      MDP       NO   YES                 1       NO   [2]  NO
MAXDL                  NO   YES                         NO   NO   NO
MAXFL                  NO   YES                         NO   NO   NO
MAXTRIES     MTR       NO   YES                         NO   NO   NO
METAFILE     META      NO   NO                          NO   NO   NO
NICE                   NO   YES                 10      NO   NO   NO
NOAPPLET     NOAP      NO   YES                         NO   NO   NO
NOIMG                  NO   YES                         NO   NO   NO
NOSND                  NO   YES                         NO   NO   NO
OHTML                  NO   YES     [4]                 NO   NO   NO
PPORT                  NO   YES                 80      NO   NO   NO
PROXY                  NO   YES                         NO   NO   NO
PROXYAUTH              NO   YES                         NO   NO   NO
REALM                  NO   NO                          NO   YES  YES
REMOVE                 NO   YES                         NO   YES  NO
REPL                   NO   YES                         NO   YES  NO
SITELIST     SLIST     YES  NO      CHANGESITE          NO   YES  YES
SWSLASH                NO   YES                         NO   NO   NO
TIMC                   NO   YES                 60      NO   NO   NO
TIMP                   NO   YES                 60      NO   NO   NO
TOP                    NO   NO                          YES  NO   YES
URL                    YES  NO                          NO   NO   YES
[0] - \WWWGrab\Grab
[1] - HTM, HTML, SHTM, SHTML, JPG, GIF, WAV, AU, CLASS, and JAVA.
[2] - The shareware version of WWWGrab/2 is limited to five levels.
[3] - The default value for the shareware version is "index.html".
[4] - Combines NOIMG, NOSND, and NOAPPLET.
Examples
--------
Basic authorization example:
URL http://www.sec1.host/secured/pages/index.html
LOCALPATH \MyGrab\Secured
MAXDEEP 5
MAXTRIES 3
REALM www.sec1.host "Realm 1" WAEFfgSDRGwer==
REALM www.sec1.host "Realm 2" WQREGFbsdgiwheg
The default configuration file example:
;; Definition of common extensions
;
EXTENSIONS HTML HTM SHTML SHTM
EXTENSIONS JPG JPEG GIF XBM
EXTENSIONS WAV VOC AU
EXTENSIONS JAVA CLASS
;
;; The default value for the MAXDEEP command
;
MAXDEEP 5
;
;; The default value for the NICE command
;
NICE 3
Credits
-------
I want to express my thanks to all who have tested WWWGrab/2 on a
voluntary basis, reported errors, and given constructive suggestions for
improvement. Without their help WWWGrab/2 would not have been this
successful.
Special thanks go out to:
* Tom Wheeler
* Andreas Krattenmacher
* Mike Nice
* Stanislav Koci (St/\n)
* Jochen Riemer
* Fernando Cassia
* Vincent Bernat
A special, BIG thanks goes to Tom Wheeler for checking the documentation.
Spanish language translation by Fernando Cassia (fcassia@theoffice.net).
http://ourworld.compuserve.com/homepages/fcassia/sos2.htm
French language translation by Vincent Bernat (bernat@mail.dotcom.fr).
http://w.home.ml.org or http://www.mygale.org/07/www/
Thanks also to HELLOWEEN, GAMMA RAY, Michael Kiske, MANOWAR, Alice
Cooper, GREEN DAY, and all the other great musicians who provide me with
music to listen to while I am programming.
---------------------------------------------------------------------------
If you like this program, please:
Send me $15.00, the normal user fee for WWWGrab/2. You may send
more :-)
This registration fee is for INDIVIDUALS. A negotiated site licence
is required for businesses, governments and other institutions if
WWWGrab/2 is to be used on more than one computer at that site. Contact
the author for details on site license discounts.
Upon registration you will receive (via email) a registered
personalized copy of the most recent version of WWWGrab/2. This
registration makes all subsequent versions available free of charge.
See the REGISTER.ENG file for registration information.
If you don't like this program:
Please tell me why not, then delete it.
---------------------------------------------------------------------------
Remember that software of this kind lives or dies by the response it gets.
You may get the most recent version of WWWGrab/2 at:
http://wwwgrab.home.ml.org or:
http://www.geocities.com/SiliconValley/Heights/7262/
You may send comments, suggestions, bugs, etc. to:
email:
jirkar@writeme.com
jirkar@hotmail.com
Jiri_Rubes@slad.fido.cz
FidoNet:
Jiri Rubes 2:421/37