-
- ═══ 1. Sslurp! ═══
-
- Sslurp! 1.5
-
- Sslurp! can retrieve Web pages from an HTTP (WWW) server. It can be configured
- to follow all hyperlinks on the page that lead to other pages on the same
- server. Images on the pages can be retrieved as well. All pages are stored on
- disk and can be viewed later using your web browser.
-
- Sslurp! can make use of a proxy HTTP server, which can speed up the whole procedure.
- Sslurp! requires at least one HPFS partition!
-
- Topics:
-
- The main window
- Common tasks
- Command line options
- For the techies
- Contacting the author
-
-
- ═══ 1.1. The main window ═══
-
- On the main window you find the following elements:
-
- A drop-down list where you enter the URL. The last 15 URLs are saved.
- You can quickly enter a URL here by dragging a URL object from a WPS
- folder to this entry field.
- "Start", "Stop" and "Skip" buttons.
- A list of processed and pending URLs. For processed URLs, a status
- message is displayed. The list is cleared when you start with a new
- URL.
- A status line. Its contents are:
- - the current URL
- - total number of data bytes retrieved
- - total number of data bytes of the current URL
- - number of bytes retrieved of the current URL
- - number of URLs retrieved
- - number of URLs tried
- - number of pending URLs.
-
-
- ═══ 1.2. Common tasks ═══
-
- Here's how to perform some common tasks with Sslurp!:
-
- I want to suck a complete web site.
-
- In the setup, enable "Follow links" and "Inline images". Disable "Don't climb up".
- Then enter the root URL of the site (e.g. "http://www.thesite.com/") and
- press "Start".
-
- I want to suck a part of a web site.
-
- In the setup, enable "Follow links", "Inline images" and "Don't climb up". Then
- enter the URL of the starting page (e.g.
- "http://www.thesite.com/some/path/start.html") and press "Start".
-
- I want to suck a single web page with images, but only if it has changed.
-
- In the setup, disable "Follow links". Enable "Inline images" and "Modified
- pages only". Then enter the URL of the page (e.g.
- "http://www.thesite.com/pageofinterest.html") and press "Start".
-
-
- ═══ 1.3. Command line options ═══
-
- Sslurp! can be run in automated mode, i.e. it takes one or more URLs as program
- parameters, downloads these pages according to the program options, and exits
- when finished.
-
- The command line syntax is:
-
- SSLURP.EXE [Options] [<url> | @<listfile>]*
-
- In other words, you can specify
-
- options,
- one or more URLs, and
- one or more list files. Each line in a list file is interpreted as a URL.
- Empty lines and lines starting with ';' are ignored.
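
The list file rules above can be sketched in a few lines of Python. This is only an illustration of the documented format, not Sslurp!'s own parser; the function name and the sample lines are made up, and stripping surrounding whitespace before testing a line is an assumption.

```python
def read_list_file_lines(lines):
    """Return the URLs found in the lines of a list file.

    Empty lines and lines starting with ';' are skipped, as documented.
    Stripping surrounding whitespace first is an assumption of this sketch.
    """
    urls = []
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith(";"):
            continue  # blank line or comment line
        urls.append(line)
    return urls


# Hypothetical list file contents.
sample = [
    "; sites to fetch",
    "http://www.thesite.com/",
    "",
    "http://www.thesite.com/some/path/start.html",
]
print(read_list_file_lines(sample))
```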
-
- The following command line options are supported:
-
- -T<dir> Retrieved items are stored in the given directory.
-
- -L- No links are followed
-
- -Ls Only links to the same server are followed
-
- -Ld Only links that are not pointing upward are followed
-
- -La All links are followed
-
- -E["extensions"] Only links with one of the given file extensions are followed
-
- -X["extensions"] Only links whose file extension is not one of the given
- extensions are followed
-
- -I+ Inline images are downloaded
-
- -I- Inline images are not downloaded
-
- -Ia Inline images are downloaded, even those on different servers
-
- -A+ Applets are downloaded
-
- -A- Applets are not downloaded
-
- -Aa Applets are downloaded, even those on different servers
-
- -U+ Only items newer than local copies are downloaded
-
- -U- All items are downloaded
-
- -S<size> Restricts downloaded items to <size> bytes
-
- -S- Downloads are not restricted by size
-
- -D<number> Restricts followed links to <number> steps
-
- -D- Downloads are not restricted by link depth
-
- -P+ Uses the proxy server
-
- -P- Does not use the proxy server
-
- -O<file> Uses the specified file for logging
-
- Note: Command line options override options given in the setup. For options
- not given in the command line, the setup options are used. So if an option is
- turned on in the setup, you must explicitly switch it off to deactivate it.
- It's not sufficient to just omit the command line option! Stored options are
- not modified by command line options.
-
- When finished, Sslurp! returns one of the following ERRORLEVEL values:
-
- 0 Everything OK
- 1 Invalid command line option
- 2 Problem(s) with one of the list files
- 10 Other error
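
For use from a script, the syntax and ERRORLEVEL values above might be wrapped as in the following Python sketch. The helper names and the list file name "sites.lst" are hypothetical; only the option letters and return codes come from this page.

```python
# Meanings copied from the ERRORLEVEL table above.
ERRORLEVELS = {
    0: "Everything OK",
    1: "Invalid command line option",
    2: "Problem(s) with one of the list files",
    10: "Other error",
}


def build_command(urls, list_files=(), options=()):
    """Assemble an argument list: options first, then URLs and @listfiles."""
    args = ["SSLURP.EXE", *options]
    args.extend(urls)
    args.extend("@" + name for name in list_files)
    return args


def describe_errorlevel(code):
    """Translate a return code into the documented message."""
    return ERRORLEVELS.get(code, f"Undocumented ERRORLEVEL {code}")


cmd = build_command(
    ["http://www.thesite.com/"],
    list_files=["sites.lst"],            # hypothetical file name
    options=["-Ls", "-I+", "-U+", "-D3"],
)
print(" ".join(cmd))
print(describe_errorlevel(2))
```

On a system where SSLURP.EXE is available, the argument list could be passed to `subprocess.run` and the resulting `returncode` handed to `describe_errorlevel`.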
-
-
- ═══ 1.4. For the techies ═══
-
- Here's some technical information if you're interested:
-
- Sslurp! uses HTTP 1.0. HTTP 0.9 is not supported. If some web site is
- still using an HTTP 0.9 server, its contents may be just as outdated, so
- you might not miss anything. HTTP 1.1 server replies are recognized.
-
- Sslurp! only follows HTTP links, not FTP or others.
-
- Sslurp! regards <IMG SRC=...> and <BODY BACKGROUND=...> as inline images.
-
- If the file name of a retrieved page isn't specified, it's stored as
- INDEX.HTML.
-
- The "Last-Modified" timestamp is stored in the file's EAs. The EA name is
- HTTP.LMODIFIED and is of type EAT_ASCII.
-
- Some characters in the URL are converted when building the path name of
- the file. However, no conversion to FAT (8.3) names is performed!
-
- If a page is redirected, the redirection is automatically followed, but
- only if the new location is on the same server!
-
- Sslurp! has been developed on and tested with OS/2 Warp 4.0. It should
- also work with the following configurations:
-
- - Warp 3.0 with IAK
- - Warp 3.0 with TCP/IP 2.0
- - Warp 3.0 Connect (TCP/IP 3.0)
- - Warp Server
-
-
- ═══ 1.5. Contacting the author ═══
-
- Sslurp! was developed by Michael Hohner. He can be reached electronically at:
-
- EMail: miho@n-online.de (new!)
- Fidonet: 2:2490/2520.17
-
-
- ═══ 2. File menu ═══
-
- Exit
- Ends the program.
-
-
- ═══ 3. Setup ═══
-
- Options
- Specify all program options.
-
- Servers
- Set up server-specific options, e.g. authentication.
-
-
- ═══ 3.1. General ═══
-
- Proxy
- Enter the host name of a proxy HTTP server. You may also specify a
- port number for the proxy server. Check Enable to actually use the
- proxy. Contact your service provider to get this data.
-
- Note: Only enter the host name, not the URL (e.g. "proxy.isp.com",
- not "http://proxy.isp.com:1234/")!
-
- User name
- Enter your user ID here if your proxy server requires
- authentication.
-
- Password
- Password for proxy authentication.
-
- Email address
- Enter your EMail address. It is included in every request. Don't
- enter anything here if you don't want your EMail address to be
- revealed.
-
-
- ═══ 3.2. Paths ═══
-
- Path for retrieved data
- Path where retrieved pages and images are stored. This path and
- subpaths are created automatically.
-
-
- ═══ 3.3. Logging ═══
-
- These options control logging.
-
- Log file
- Path and name of the log file
-
- Additional information
- Log additional (but somewhat optional) messages
-
- Server replies
- Log all lines in the server's reply
-
- Debug messages
- Log messages used for debugging purposes (turn on if requested).
-
-
- ═══ 3.4. Links ═══
-
- none
- No links are followed
-
- all
- All links (even those to other servers) are followed. Be very
- careful with this option!
-
- same server
- Only links to items on the same server are followed.
-
- don't climb up
- Hyperlinks to items that are hierarchically higher than the initial
- URL are not followed. Otherwise, all links to items on the same
- server are followed.
-
- Example:
-
- If you started with http://some.site/dir1/index.html, and the
- current page is http://some.site/dir1/more/levels/abc.html, a link
- that points to http://some.site/otherdir/index.html wouldn't be
- followed, but a link to http://some.site/dir1/x/index.html would.
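
The "don't climb up" rule can be illustrated with a short Python sketch. This is not Sslurp!'s actual code: the function names are made up, and treating a start URL that does not end in "/" as a file inside its directory is an assumption.

```python
import posixpath
from urllib.parse import urlsplit


def start_directory(start_url):
    """Directory part of the start URL's path, always ending in '/'."""
    path = urlsplit(start_url).path or "/"
    if path.endswith("/"):
        return path
    # Assumption: a last segment without a trailing slash names a file.
    return posixpath.dirname(path).rstrip("/") + "/"


def stays_below_start(start_url, link_url):
    """True if the link is on the same server and not above the start directory."""
    start, link = urlsplit(start_url), urlsplit(link_url)
    if start.netloc.lower() != link.netloc.lower():
        return False
    return (link.path or "/").startswith(start_directory(start_url))


start = "http://some.site/dir1/index.html"
print(stays_below_start(start, "http://some.site/otherdir/index.html"))
print(stays_below_start(start, "http://some.site/dir1/x/index.html"))
```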
-
- all types
- All types of links are followed, restricted only by the above
- settings.
-
- including
- You can enter a set of extensions (separated by spaces, commas or
- semicolons) of items to retrieve. Links to items with other
- extensions are ignored.
-
- Example: With "htm html", Sslurp! only follows links to other HTML
- pages, but does not download other hyperlinked files.
-
- excluding
- Reverse of the above option. Only links to items not having one of
- the given extensions are followed.
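
The "including" and "excluding" filters might work roughly like the following Python sketch. The function names are hypothetical, and two details are assumptions rather than documented behavior: extensions are compared case-insensitively, and a link without an extension counts as having the empty extension.

```python
import posixpath
import re
from urllib.parse import urlsplit


def parse_extensions(text):
    """Split a setting such as "htm html" on spaces, commas or semicolons."""
    return {ext.lower().lstrip(".") for ext in re.split(r"[ ,;]+", text) if ext}


def link_extension(url):
    """Extension of the URL's path without the dot, lower-cased ('' if none)."""
    return posixpath.splitext(urlsplit(url).path)[1].lstrip(".").lower()


def follows(url, mode, extensions):
    """Decide whether a link passes the filter; mode is 'all', 'including' or 'excluding'."""
    ext = link_extension(url)
    if mode == "including":
        return ext in extensions
    if mode == "excluding":
        return ext not in extensions
    return True  # "all types"


exts = parse_extensions("htm html")
print(follows("http://some.site/docs/page.html", "including", exts))
print(follows("http://some.site/files/archive.zip", "including", exts))
```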
-
- Max link depth
- Limits the depth of links to follow to the specified number. A level
- of "1" specifies the initial page.
-
- Example:
-
- If page A contains a link to B, and B contains a link to C, A would
- be level 1, B would be level 2 and C would be level 3. A maximum
- link depth of "2" would retrieve pages A and B, but not C.
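
The level counting in this example can be reproduced with a small Python sketch over an in-memory link table. The breadth-first order and all names are assumptions made for illustration; the page does not say in which order Sslurp! visits links.

```python
from collections import deque


def pages_within_depth(links, start, max_depth):
    """Return {page: level} for every page reachable within max_depth.

    The start page is level 1, matching the description above.
    """
    levels = {start: 1}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        if levels[page] >= max_depth:
            continue  # links found on this page would exceed the limit
        for target in links.get(page, []):
            if target not in levels:
                levels[target] = levels[page] + 1
                queue.append(target)
    return levels


# A links to B, B links to C, as in the example.
links = {"A": ["B"], "B": ["C"], "C": []}
print(pages_within_depth(links, "A", 2))
```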
-
- Max size
- Limits the size of items to download. If the server announces the
- size and it's larger than the number specified, the item is skipped.
- If the server doesn't announce the size, the item is truncated when
- the maximum size is reached.
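
The two cases of the size limit (skip when the size is announced, truncate when it is not) can be sketched in Python as follows. The function name, the status strings and the chunk size are hypothetical; a byte stream stands in for the network connection.

```python
import io


def fetch_limited(stream, max_size, announced_size=None, chunk_size=4096):
    """Return (status, data), where status is 'skipped', 'truncated' or 'complete'."""
    # Case 1: the server announced a size larger than the limit.
    if announced_size is not None and announced_size > max_size:
        return "skipped", b""
    # Case 2: read until the end of the data or until the limit is reached.
    data = bytearray()
    while len(data) < max_size:
        chunk = stream.read(min(chunk_size, max_size - len(data)))
        if not chunk:
            return "complete", bytes(data)
        data.extend(chunk)
    # The limit was reached; anything still unread is cut off.
    if stream.read(1):
        return "truncated", bytes(data)
    return "complete", bytes(data)


print(fetch_limited(io.BytesIO(b"x" * 100), 10, announced_size=100))
print(fetch_limited(io.BytesIO(b"x" * 100), 10))
```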
-
-
- ═══ 3.5. Options ═══
-
- These settings influence which items will be downloaded and how it'll be done.
-
- Inline images
- If checked, inline images are also retrieved.
-
- from other servers
- If checked, inline images located on other servers are also
- retrieved. Otherwise only images from the same server are
- downloaded.
-
- Java applets
- If checked, Java applets are also retrieved.
-
- from other servers
- If checked, applets located on other servers are also retrieved.
- Otherwise only applets from the same server are downloaded.
-
- Retrieve modified items only
- An item is only retrieved if it's newer than the local copy.
- Strongly recommended!
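
A "newer than the local copy" check could look like the Python sketch below, which compares two HTTP date strings such as the "Last-Modified" value of a server reply and a value saved from an earlier download. How Sslurp! actually performs the comparison is not described here, so the function name and the logic are assumptions.

```python
from email.utils import parsedate_to_datetime


def is_newer(server_last_modified, local_last_modified):
    """True if the server's copy is newer than the stored one.

    Both arguments are HTTP date strings; None for the local value means
    that no local copy exists yet, so the item is always retrieved.
    """
    if local_last_modified is None:
        return True
    server_time = parsedate_to_datetime(server_last_modified)
    local_time = parsedate_to_datetime(local_last_modified)
    return server_time > local_time


stored = "Tue, 15 Nov 1994 12:45:26 GMT"
print(is_newer("Wed, 16 Nov 1994 08:00:00 GMT", stored))
print(is_newer("Tue, 15 Nov 1994 12:45:26 GMT", stored))
```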
-
-
- ═══ 3.6. Server list ═══
-
- A list of base URLs is displayed.
-
- Press New to add a new URL with settings.
-
- Press Change to change the settings of the selected URL.
-
- Press Delete to delete the selected URL.
-
-
- ═══ 3.7. Server ═══
-
- Base URL
- Set of URLs (this item and all items hierarchically below it) to which
- these settings apply. This usually specifies a directory on a
- server.
-
- Example:
-
- If you enter "http://some.server/basedir", these settings apply to
- "http://some.server/basedir/page1.html", but not to
- "http://some.server/otherdir/b.html".
-
- User name
- User name or user ID used for basic authorization.
-
- Password
- Password used for basic authorization.
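
The way a base URL selects the user name and password for a request can be sketched in Python like this. The data layout, the names and the sample credentials are invented for illustration; preferring the longest matching base URL when several entries match is an assumption.

```python
def settings_apply(base_url, url):
    """True if url is the base URL itself or lies hierarchically below it."""
    base = base_url.rstrip("/")
    return url == base or url.startswith(base + "/")


def find_credentials(servers, url):
    """Return (user, password) of the most specific matching entry, or None."""
    matches = [entry for entry in servers if settings_apply(entry["base"], url)]
    if not matches:
        return None
    best = max(matches, key=lambda entry: len(entry["base"]))
    return best["user"], best["password"]


# Hypothetical server list entry.
servers = [
    {"base": "http://some.server/basedir", "user": "guest", "password": "secret"},
]
print(find_credentials(servers, "http://some.server/basedir/page1.html"))
print(find_credentials(servers, "http://some.server/otherdir/b.html"))
```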
-
-
- ═══ 4. Help menu ═══
-
- General help
- Provides general help
-
- Product information
- Displays name, version number, copyright information etc.
-
-
- ═══ 5. About ═══
-
- This page intentionally left blank.