WebWolf

    User Manual

    Overview

    WebWolf is a personal WebBot that scans Web sites on the net and compiles a list of files, links, FTP sites and other resources found on the World Wide Web. It works on the same concept as other bots on the net, such as AltaVista and WebCrawler. However, it only targets sites with specific content, and the results are returned in real time and are thus always current.

    The specialty of WebWolf is locating files within the Internet's Web space. These are usually missed by FTP search engines and conventional crawlers.

    WebWolf has a built-in ability to learn and map out the network each time you use it, so it becomes better at locating files with each use.
    Using a 28.8K dial-up line, WebWolf will eventually be able to locate and catalogue tens of thousands of resources per hour.


    Unregistered Version Restrictions

    If you have not yet registered, you will have cause to despair, for you will not be able to utilise the full potential of this product. There are three restrictions in the unregistered version. First, you will not be able to use more than ten concurrent connections, which means that WebWolf will run slower. Second, you lose the benefit of previous sessions, as you will not have restart capability. Third, the search will stop after locating 2000 files.

    If you have not done so already, please register your copy. You will find it most worthwhile.


    Configuring WebWolf

    Before running the program, you need to configure WebWolf. Select OPTIONS followed by PREFERENCES from the main menu. The WebWolf Preferences window should come up. If you are using a proxy server, make sure that both the proxy settings and the port are correct.


    Specifying a URL

    You may specify the URL of a site where each hunt for resources is to begin. The site should be related to the type of information you are seeking, and ideally have links to other sites with similar content. If you do not specify a URL, WebWolf may consult an Internet search engine to obtain a starting point.


    Selecting Keywords

    Selecting the right combination of keywords means everything!

    WebWolf will only extract information from a Web page if the page contains a specific keyword. You can specify as many keywords or phrases as you want, separated by commas (,). Keywords are case insensitive.
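
    The keyword rule above can be illustrated with a minimal Python sketch. It is not WebWolf's actual code: the function names are hypothetical, and plain substring matching on any one keyword is an assumption, since the manual does not say how a keyword is compared against the page text.

```python
def parse_keywords(spec: str) -> list[str]:
    """Split a comma-separated keyword list, trimming spaces and ignoring case."""
    return [k.strip().lower() for k in spec.split(",") if k.strip()]


def page_matches(page_text: str, keywords: list[str]) -> bool:
    """Return True if the page contains at least one keyword (assumed rule)."""
    text = page_text.lower()
    return any(k in text for k in keywords)


keywords = parse_keywords("Oracle, sybase,database,SQL")
print(page_matches("An introduction to SQL tuning", keywords))
print(page_matches("Gardening tips for spring", keywords))
```

    Lower-casing both the keywords and the page text is one simple way to get case-insensitive matching.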

    If your keywords are too specific, WebWolf may soon run out of links to crawl, but if they are too general, too much information on unrelated topics may be retrieved.

    For example, if you are seeking information, files and resources about databases, a good choice of keywords would be:

      oracle,sybase,database,sql

    The two product names 'oracle' and 'sybase' are very specific, whereas 'database' and 'sql' are more general, and will help WebWolf locate pages that may be linked indirectly via sites that cover this topic, but not individual products.

    After you use WebWolf for a while, you should begin to get a feel for which combinations of keywords yield the best results.

    Note: The most specific keyword should always be first, and should not be a phrase. You should also avoid using general keywords that may be common on unrelated pages. For example, by also adding the keyword 'computer':


      oracle,sybase,database,sql,computer


    ...you broaden the search, and any page containing this word will be crawled. Initially you will get the right results, but after a few levels, the link with the databases topic will get lost altogether in the bulk of material that will be located.

    You may also use advanced query and Boolean syntax. For more information and examples, select Query Syntax from the Help menu.

    Once you have entered your keywords, click on [Start] to begin the hunt.



    Selecting Search Parameters

    When you start a search, a Search Parameters window will be displayed with a number of search options. WebWolf's search strategy is defined by the settings of these options.

      Search All Library links

      If enabled, WebWolf will connect to all links in the current library and search and index their content.

      Update the Library with new Finds

      If enabled, whenever a page is encountered that contains appropriate file and/or FTP links, it will be added to the library for future reference.

      Use Search Engines to locate New Threads

      If enabled, WebWolf will consult Internet search engines to locate starting links, which will serve as entry points to new web rings for exploration.

      Recurse into and Search all new links

      If enabled, WebWolf will connect to any link present on a page that is related to the current search topic. If disabled, only the initial page from the specified URL or the library will be searched, and all other links found will not be crawled. For example, by disabling both this option and the use of search engines above, you can force WebWolf to scan library links only and then stop.

      Limit search to specified Site

      If enabled, and you have specified a starting URL on the main page, only pages and files on that site will be searched and indexed. All other links will be ignored.

      FTP Links

      If enabled, links to FTP sites will be indexed.

      File Links

      If enabled, links to files will be indexed.

      Unix Files

      If enabled, Unix files (e.g. .Z, .tar, .gz) will be indexed and included in the search results.

      MAC Files

      If enabled, Mac files (e.g. .hqx) will be indexed and included in the search results.

      Text Documents

      If enabled, text files and documents will be indexed and included in the search results.
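
      As a rough illustration of how the link-type and site-limit options above could be applied, here is a minimal Python sketch. It is not WebWolf's implementation. The function names and category labels are hypothetical, the Unix and Mac extensions are the ones listed above, and the text extensions (.txt, .doc) are an assumption because the manual does not list them. Treating every ftp:// URL as an FTP link is also an assumption.

```python
from urllib.parse import urlparse

UNIX_EXTENSIONS = (".z", ".tar", ".gz")  # listed under Unix Files
MAC_EXTENSIONS = (".hqx",)               # listed under MAC Files
TEXT_EXTENSIONS = (".txt", ".doc")       # assumption: not listed in the manual


def classify_link(url: str) -> str:
    """Return a hypothetical category label for a link."""
    parsed = urlparse(url)
    if parsed.scheme.lower() == "ftp":
        return "ftp"
    path = parsed.path.lower()
    if path.endswith(UNIX_EXTENSIONS):
        return "unix"
    if path.endswith(MAC_EXTENSIONS):
        return "mac"
    if path.endswith(TEXT_EXTENSIONS):
        return "text"
    return "page"


def same_site(url: str, start_url: str) -> bool:
    """True if both URLs share a host, as 'Limit search to specified Site' suggests."""
    return urlparse(url).hostname == urlparse(start_url).hostname


print(classify_link("http://www.example.com/tools/db.tar.gz"))
print(same_site("http://www.example.com/a.html", "http://www.example.com/"))
```

      Comparing lower-cased paths keeps the extension check case insensitive, so '.Z' and '.z' fall into the same category.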



      Restarting a Session

      All WebWolf sessions are automatically restarted. Any previous results are retained and included in the current search. To restart a session, click the topmost button on the right. This button may read either [Restart] or [Start]. If you are using an unregistered version, Restart is disabled and you will need to start a new session.

      Note: It is a good idea to start a new project, or clear out the library, if you are starting a search for a topic unrelated to the previous one.


      Starting a new session

      To start a new session, select CLEAR from the main menu. Any previous results will be deleted (you will be prompted for confirmation). You can then click the [START] button to begin with a clean slate.

      Note: The library links will not be deleted.


      Resetting a session

      Selecting RESET from the main menu will reset the status of all known links. Bad links will be deleted, and all others re-scanned. Unless you do a reset (or clear), links that have already been visited are never searched again.
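
      The effect of a reset can be pictured with a small Python sketch. The Link record and its status values ("new", "visited", "bad") are hypothetical and only model the behaviour described above. They are not WebWolf's internal data format.

```python
from dataclasses import dataclass


@dataclass
class Link:
    url: str
    status: str = "new"  # hypothetical states: "new", "visited", "bad"


def links_to_scan(links: list[Link]) -> list[Link]:
    """Only links that have not been visited yet are searched."""
    return [link for link in links if link.status == "new"]


def reset_links(links: list[Link]) -> list[Link]:
    """Delete bad links and mark every remaining link for re-scanning."""
    return [Link(link.url) for link in links if link.status != "bad"]


links = [
    Link("http://www.example.com/a", "visited"),
    Link("http://www.example.com/b", "bad"),
    Link("http://www.example.com/c", "new"),
]
print(len(links_to_scan(links)))               # before a reset
print(len(links_to_scan(reset_links(links))))  # after a reset
```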


      Displaying Results

      The buttons on the left-hand side display the number of files, links and other items of interest that have been found to date. Clicking on one of these buttons will launch a browser to display the results. The [View] button can also be used; it brings up a menu of result files. Please keep in mind that the current version of WebWolf does not attempt to validate file and FTP links, so these are only as up-to-date as the source where they were posted.


      The FIND button

      Once your file lists get very large, it will become difficult to locate specific files of interest. In this case you can use the FIND button to search for specific keywords in the search results. Click the [FIND] button, and when the Find window comes up, enter a keyword or a comma-delimited list of keywords to search for. A list of all files matching the search criteria will be displayed.


      Setting a Watch

      The watch screen can be used to display any new search results based on a given keyword. For example if you enter 'sql', WebWolf will tell you whenever a new entry with this keyword is added to the database. The [Browse] button can then be used to display all files that have 'sql' as part of the filename, title or URL.
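
      The watch test described above amounts to a keyword check against the filename, title and URL of each new entry. A minimal Python sketch follows. The Entry record and the function name are hypothetical, and case-insensitive substring matching is an assumption.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    filename: str
    title: str
    url: str


def matches_watch(entry: Entry, keyword: str) -> bool:
    """True if the keyword appears in the filename, title or URL."""
    needle = keyword.lower()
    fields = (entry.filename, entry.title, entry.url)
    return any(needle in field.lower() for field in fields)


entry = Entry("tools.zip", "SQL utilities", "http://www.example.com/tools.zip")
print(matches_watch(entry, "sql"))
```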


      Using a Library

      Unless disabled, WebWolf will use and maintain a session history in a library file. This file contains information from previous sessions and includes a list of all sites where file and/or FTP links were found. Neither the RESET nor the CLEAR menu option will delete library entries.

      Please note that each project has its own library.

      If specified, library links will always be scanned first (second if a URL is given). For this reason, if you wish to explore a new area of the web, or search for a different topic, you should clear out the existing library, or start a new search project.
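
      The scan order described above (starting URL first if one is given, then the library links) could be sketched in Python as follows. The function and parameter names are hypothetical, and skipping duplicate URLs is an assumption rather than documented behaviour.

```python
from typing import Optional


def scan_order(start_url: Optional[str], library_links: list[str],
               use_library: bool = True) -> list[str]:
    """Build the initial scan queue: starting URL first, then library links."""
    queue: list[str] = []
    if start_url:
        queue.append(start_url)
    if use_library:
        for url in library_links:
            if url not in queue:  # assumption: each URL is queued only once
                queue.append(url)
    return queue


library = ["http://www.example.com/files/", "http://www.example.org/ftp-list/"]
print(scan_order("http://www.example.net/start/", library))
print(scan_order(None, library))
```

      With library use disabled, only the starting URL (if any) ends up in the queue, which matches the advice on disabling the library when topics change often.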

      If you frequently search for different types of information, unless you maintain a separate project for each, it may be a good idea to disable library use altogether.

      Note: Any modifications made to the current library may not be saved while a search is in progress.


      WebWolf URL Library



      The main URL Library screen is accessed by clicking the Library button. If the Library Update checkbox is enabled, an entry will be added automatically whenever a new site that contains files and/or FTP sites is located. This screen is a summary of all library links in the current project and can also be used to delete, view and visit the displayed sites.

      Whenever you clear or reset the current project, the library links are retained and, when enabled, will always be scanned first if a starting URL is not specified.

      To edit an existing link, click on the [?] next to its entry. To edit or add links, you can also use the [Editor] button.


      Library Link Editor Screen



      The link editor can be used to add new links and to view, modify or delete existing listings in the project library.

      The priority or position of each link can be modified by using the left and right arrow buttons [<] [>].
      Note: The library entries may not be modified while a search is in progress.



      Important Considerations

      We recommend that you start a new session periodically, and clear out old results, as you may start experiencing memory problems once WebWolf's links database gets very large.