WebUtil 

User manual 

 

 

Copyright © 1998 by Harms Software Engineering
All rights reserved.  

 


Table of contents

 

1. Introduction

1.1. Requirements

2. WebUtil

3. Basic skills

4. Installation

5. Configuration

5.1. The [MAIN] block

5.2. The [HTTP] block

5.3. The [FTP] block

6. Error handling

6.1. Web sites and web pages [HTTP]

6.2. FTP sites [FTP]

7. Registration

8. Abbreviations

 

 

    1. Introduction

    The Internet has changed the way we use computers; in fact, it has changed the way we live. By making information of all kinds available to anyone, anywhere in the world, it has become something we depend on. The Internet has also changed us. We have become more demanding, for example: since we can get almost any information we want, we want to get it as fast as possible. Search engines such as Lycos and Alta Vista have become a commercial success as a result of our desire for information.

    The Internet also has a down side. It is slow. Millions of people are experimenting with the Internet, and for many of them it will become an important part of their lives. The large number of people using the Internet makes it a success but, at the same time, slows it down, making it sometimes frustrating to use. Just as search engines were an answer to finding information quickly, a new concept called the Web spider is an answer to retrieving information from the Internet automatically, so that you do not have to do it manually and the information is ready when you need it.

    A Web spider is a tool that collects the information that you are interested in from the Internet, at a time when it is most convenient for you. For example, a Web spider can pick up the news from an international news agency in the middle of the night, while you are sleeping, so that you can read the news when you wake up in the morning. Stock prices, weather forecasts, new software releases, you name it, a Web spider can get it for you, at a convenient time, and without manual intervention!

    Just as the Internet has grown large and slow, so have most Web spiders, and they are not cut out to be used without manual intervention. They are large, have thousands of features, and require a college degree to understand. If you are not familiar with the technical details of the Internet, you probably will not understand most of the options that they offer. The whole purpose of a Web spider is to make your life easier. It must be easy to understand, easy to install, easy to configure, and easy to use!

    WebUtil is an answer to the large and slow Web spiders currently on the market. WebUtil has been designed to be easy to install, easy to configure, and very easy to use. It does not have many options, but it does have a lot of features! The power of WebUtil lies in the fact that it offers the functionality that you would expect without bothering you with the technical details of the Internet that you probably do not care to know about.


    1.1 Requirements

    WebUtil runs under OS/2 and requires that you have TCP/IP support installed. We also assume that you have an internet browser available, such as WebExplorer or NetScape. WebUtil will only run when you have a connection with the Internet. It does not matter whether this connection is via a dial-up line or via a network.

    WebUtil only requires a few kilobytes of harddrive space and it also does not require very much system memory or CPU time. In short, it is a very small and efficient program.


    2. WebUtil

    WebUtil, like other Web spiders, collects files from the Internet. The concept behind WebUtil is that it collects files, which can be web pages or files from FTP sites, and you subsequently use your web browser to view those pages. WebUtil will automatically change all of the links in the pages that you instruct it to pick up, so that they point to the other pages that WebUtil has picked up.

    For example, if you instruct WebUtil to pick up an entire web site, then all of the links between the pages of the web site are translated in such a way that you can view the entire site on your local harddrive without having a connection to the Internet!

    If you instruct WebUtil to pick up only a limited number of pages, then only the links to the downloaded pages will be translated. Later, when you view those pages with your favorite internet browser and you select a link that was not translated, your browser will automatically try to retrieve that page from the Internet. This means that, as long as you are connected to the Internet when viewing your downloaded pages, all links will still point to valid pages, regardless of whether they were translated or not. If you do not have a connection to the Internet at that moment, then your browser will generate an error message saying that it cannot find the site.

    To sum up what we have just explained:

    • You instruct WebUtil which pages to collect.
    • WebUtil collects those pages and places them on your harddrive.
    • WebUtil then translates the links in the pages, if necessary.
    • When WebUtil is finished, you use your favorite internet browser to view the retrieved pages.
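The link translation step in the list above can be illustrated with a short Python sketch. It is not WebUtil's actual implementation: the function name translate_links, the simple pattern for double-quoted href attributes, and the page names in the usage example are all assumptions made for illustration. It only shows the idea that links to downloaded pages are rewritten, while all other links are left pointing at the Internet.

```python
import re

# Matches href="..." attributes written with double quotes (a simplification;
# real HTML also allows single quotes and unquoted values).
HREF = re.compile(r'(href\s*=\s*")([^"]*)(")', re.IGNORECASE)


def translate_links(html, local_files):
    """Rewrite links to downloaded pages; leave every other link untouched.

    local_files maps an original link target to the hypothetical name of
    the copy that was stored on the local drive.
    """
    def replace(match):
        target = match.group(2)
        local = local_files.get(target)
        if local is None:
            # Page was not downloaded: keep the original Internet link.
            return match.group(0)
        return match.group(1) + local + match.group(3)

    return HREF.sub(replace, html)
```

For example, with a hypothetical page that links to one downloaded page and one page that was not downloaded, translate_links(page, {"http://www.allfix.com/products.html": "products.html"}) rewrites only the first link.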

    For files, this process is only a little different, in that you would not use your browser to view them.

    Another very powerful feature of WebUtil is the ability to pick up web pages or files only if they have changed since the last download. WebUtil determines whether a file has changed; if it has, WebUtil picks it up, and if it has not, the file is skipped.

    For FTP sites, WebUtil is even more powerful! It can synchronize the contents of a directory on your local harddrive with a directory on an FTP site. You can specify which location is the "leading" location. For example, you may want the FTP site to contain exactly the same files as a directory on your local harddrive. In another situation, you may want exactly the opposite. WebUtil is extremely flexible. You can synchronize a specific set of files, entire directories, or a combination of the two!

    Below is a summary of the features we have just explained in the above paragraphs. WebUtil can:

    • pick up entire web sites;
    • pick up individual web pages, and any linked pages;
    • pick up individual files from the Web;
    • pick up files from an FTP site;
    • place files on an FTP site; and
    • synchronize files or directories, or a combination of the two, with an FTP site.

     


      3. Basic skills

      WebUtil consists of one executable and a configuration file. The configuration file contains the information that instructs WebUtil what to pick up from the Internet. WebUtil itself does not contain any buttons or any menu items. When started up, WebUtil will carry out the instructions in the configuration file and then shut down. Executing WebUtil, therefore, consists of simply entering the name WEBUTIL on the OS/2 commandline or double clicking the WebUtil icon on the desktop.

      WebUtil has only one optional commandline parameter, which can be used to tell WebUtil to use a different configuration file. Normally, WebUtil will look for a file named WEBUTIL.INI. If a different name is specified on the commandline, WebUtil will use that one instead. Example usage:

      WEBUTIL MYCONFIG.INI


      4. Installation

      As we stated in chapter 1, WebUtil is easy to install.

      To install WebUtil, simply un-zip the WebUtil archive into a new directory.

      Then type install.

      The installation procedure will create a new folder on your desktop with the name WebUtil. This folder will contain icons for the program WEBUTIL.EXE, a sample configuration file WEBUTIL.INI, the help file WEBUTIL.HTM, and the registration file WEBUTIL.REG.

       


      5. Configuration

      We stated in chapter 1 that WebUtil is easy to configure. The instructions that WebUtil uses to determine what to pick up from the Internet are stored in a text file. The name of this text file is WEBUTIL.INI. This text file contains three different types of information. The first is general information: the name of the log file to use, your name, and your registration code, in case you have registered WebUtil. The second and third types of information are HTTP and FTP instruction blocks. Below is a sample WEBUTIL.INI file. It may look a little complex at first, but in the following paragraphs we will explain how it works and you will see that it is extremely simple and intuitive.

      ;
      ; Sample WEBUTIL.INI file.
      ;
      ; Copyright (C) 1998, Harms Software Engineering, all rights reserved.
      ;
      [MAIN]
      LOG=c:\webutil\webutil.log
      KEY=unregistered
      NAME=Harald Harms
      [END]
      ;
      [HTTP]
      NAME=ALLFIX WebSite
      URI=http://www.allfix.com
      LEVEL=3
      LOCATION=d:\webutil\allfix
      IF_MOD=YES
      [END]
      ;
      [FTP]
      NAME=ALLFIX FtpSite
      URI=ftp.allfix.com
      USER=harald
      PASSWORD=test
      TRANSFER=c:\files\myfiles.zip,/harald/,UP,IFMOD
      [END]
      ;

       

      As can be seen in the example above, each block begins with a name enclosed in square brackets and ends with the word END, also enclosed in square brackets. The block MAIN contains the general information. A block of the type HTTP contains instructions for which web sites or web pages need to be picked up, and a block of the type FTP contains instructions for which files need to be collected from or placed on an FTP site.

      The instructions in the blocks consist of a verb followed by an equal sign which is in turn followed by a value. Some verbs are simple Yes/No items. In those situations, the value of the verb is either YES or NO, as can be seen in the HTTP block above (see verb IF_MOD).

      Lines that start with a semicolon (;) are regarded as comments. This means that WebUtil will ignore those lines. We suggest that you include comments in your configuration file because it makes it easier for you, and for others, to understand what you have done.
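To make the file layout described above concrete, here is a minimal Python sketch of a reader for this format. It is an illustration only, not how WebUtil itself reads the file: the function name parse_config and its error messages are assumptions, and so is the choice to ignore blank lines and to treat block names and verbs as case-insensitive. The manual itself only states that blocks start with a bracketed name, end with [END], hold verb=value lines, and that lines starting with a semicolon are comments.

```python
SAMPLE = r"""
;
; Sample configuration
;
[MAIN]
LOG=c:\webutil\webutil.log
KEY=unregistered
NAME=Harald Harms
[END]
;
[HTTP]
NAME=ALLFIX WebSite
URI=http://www.allfix.com
LEVEL=3
LOCATION=d:\webutil\allfix
IF_MOD=YES
[END]
"""


def parse_config(text):
    """Return a list of (block_type, [(verb, value), ...]) pairs."""
    blocks = []
    current_type = None
    current_verbs = None
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith(";"):
            continue  # blank lines (an assumption) and comments are skipped
        if line.startswith("[") and line.endswith("]"):
            name = line[1:-1].strip().upper()
            if name == "END":
                if current_type is None:
                    raise ValueError("[END] without an open block")
                blocks.append((current_type, current_verbs))
                current_type, current_verbs = None, None
            elif current_type is not None:
                raise ValueError(f"[{name}] opened before [{current_type}] was closed")
            else:
                current_type, current_verbs = name, []
            continue
        if current_type is None:
            raise ValueError(f"instruction outside a block: {line!r}")
        # Split on the first equal sign only, so values may contain '='.
        verb, sep, value = line.partition("=")
        if not sep:
            raise ValueError(f"expected verb=value, got: {line!r}")
        current_verbs.append((verb.strip().upper(), value.strip()))
    if current_type is not None:
        raise ValueError(f"[{current_type}] block is missing its [END]")
    return blocks
```

Calling parse_config(SAMPLE) yields one MAIN block and one HTTP block. The verbs are kept as a list of pairs rather than a dictionary because a verb such as TRANSFER may appear more than once in a single block.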


      5.1 The [MAIN] block

      This block contains general information. It can contain three different verbs. Below is a list of the verbs and their meaning:

      LOG

      This verb is used to give the name and location of the log file. Please be sure to include the entire path and filename. This will reduce the chance of making mistakes.

      KEY

      This verb identifies the registration key. If you have registered WebUtil (see chapter 7), then enter your registration key here. The registration key is not case sensitive.

      NAME

      This verb is used to identify your name. It is important that WebUtil know what your name is, so that it can verify that the registration key that you have entered belongs to you.


      5.2 The [HTTP] block

      This block contains information that instructs WebUtil which web sites and web pages need to be picked up from the Internet. The configuration file may contain up to 1000 of these blocks. Each block contains the name of one web site or web page along with some other information that tells WebUtil where to store the web pages on your local harddrive, how many levels to pick up, and more.

      Web pages contain links to other pages on the Internet. The main page, often called INDEX.HTML, for example, may contain links to 10 other pages, which in turn contain links to many more pages. Each time a link is followed from one page to another, we say we have gone a level deeper. This means that if you follow a link in INDEX.HTML to PRODUCTS.HTML, and then to HELP.HTML, we would say that you are currently at level 3 in the web site.

      Levels are very important because you can tell WebUtil how many levels to pick up. If you specify a level of 1, then only the page that you include in the HTTP block will be picked up. If you specify 5 levels, then WebUtil will follow each link, picking up the subsequent pages, until it has arrived at level 5.
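The level rule can be sketched in Python on a small in-memory example. This is only an illustration under stated assumptions: the function name pages_within_level, the hypothetical site map SITE, and the breadth-first order of visiting links are not taken from WebUtil. The sketch simply counts the start page as level 1 and stops following links once the requested level is reached.

```python
from collections import deque

# Hypothetical site: each page maps to the pages it links to.
SITE = {
    "INDEX.HTML": ["PRODUCTS.HTML", "NEWS.HTML"],
    "PRODUCTS.HTML": ["HELP.HTML", "INDEX.HTML"],
    "HELP.HTML": ["FAQ.HTML"],
}


def pages_within_level(links, start, max_level):
    """Return {page: level} for every page reachable within max_level."""
    if max_level < 1:
        raise ValueError("max_level must be at least 1")
    levels = {start: 1}  # the page named in the block is level 1
    queue = deque([start])
    while queue:
        page = queue.popleft()
        if levels[page] >= max_level:
            continue  # deepest allowed level reached: do not follow links
        for target in links.get(page, []):
            if target not in levels:  # each page is picked up only once
                levels[target] = levels[page] + 1
                queue.append(target)
    return levels
```

With this hypothetical site, a level of 1 returns only INDEX.HTML, while a level of 3 also returns PRODUCTS.HTML and NEWS.HTML (level 2) and HELP.HTML (level 3), but not FAQ.HTML, which would be level 4.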

      The HTTP block can contain a number of different verbs. Below is a list of the verbs and an explanation of what they mean:

      NAME

      This verb can be used to give an easy to understand name to the web site or pages that you want to pick up. WebUtil will display this name on the screen while it is busy picking up these pages. WebUtil does not use this information for any other purpose; therefore, you are free to fill in whatever you want.

      URI

      This verb is used to identify the web site or web page you want WebUtil to pick up. URI is the only internet technical term you are going to find in this manual. It is an abbreviation for Uniform Resource Identifier.

      LEVEL

      This verb can be used to instruct WebUtil on how many levels to pick up.

      LOCATION

      This verb is used to identify where on your local harddrive the web pages that are picked up should be stored.

      IF_MOD

      This verb can be used to tell WebUtil to download only those pages that have changed since the last time WebUtil was active. This feature makes WebUtil much faster since it does not have to download every page each time. The value for this verb is YES or NO.

      TRANSLATE

      This verb tells WebUtil to translate the links in the downloaded HTML files so that they all point to files on your local hard disk. You should use this feature if you want to view your pages off line. The value for this verb is YES or NO.

      SHORTNAMES

      This verb can be used to instruct WebUtil to convert long filenames to short DOS style filenames (8.3). Using this feature does have one unfortunate consequence, namely, that the IF_MOD feature does not work very well anymore. WebUtil will not be able to detect changes to files for which the names have been shortened. The value for this verb is YES or NO.

      When downloading pages, the directory where they are stored can become quite a mess. WebUtil will automatically clean up the directories each time it is started up, but only if the IF_MOD feature is turned off. If IF_MOD is turned on, the directories will not be cleaned up.


      5.3 The [FTP] block

      This block contains information that instructs WebUtil which files to pick up from or place on an FTP site. The configuration file may contain up to 1000 of these blocks. Each block contains a file specification (including wildcards!), the location where those files can be found, and the place to put the files.

      The FTP block can contain a number of different verbs. Below is a list of the verbs and an explanation of what they mean:

      NAME

      This verb can be used to give an easy to understand name to the FTP site. WebUtil will display this name on the screen while it is busy transferring files to or from this site. WebUtil does not use this information for any other purpose; therefore, you are free to fill in whatever you want.

      URI

      This verb identifies the name of the FTP site. It is important that you do not include paths in the FTP site name. For example, use ftp.allfix.com instead of ftp.allfix.com/pub.

      USER

      This verb identifies the user name that should be used to log into the FTP site. If the site allows anonymous logins, then you should enter the word "anonymous" here and your email address as the password (see next verb).

      PASSWORD

      This verb identifies the password that should be used to log into the FTP site.

      TRANSFER

      This verb identifies which files should be transferred, where they are located, where they should be placed, and whether or not they should be synchronized or only transferred if they have been modified. Up to 255 TRANSFER commands can be defined per FTP block. The following format should be used for the value entered in this verb:

      [location][filespec],[location][filespec],UP|DOWN|SYNCH,IFMOD|DELETE

      For example:

      c:\files\myfiles.zip,/harald/,UP,IFMOD

      or

      /harald/myfiles.zip,c:\files\,DOWN,IFMOD

      As you can see from the format shown above, the different parameters must be separated with a comma. The first two parameters are used to specify locations and files. A "filespec" is a filename, which may include wildcards. The keywords UP, DOWN, or SYNCH indicate the direction of the transfer from your point of view. In other words, UP sends something to the FTP site and DOWN picks something up from the FTP site. The last parameter is also a keyword, which indicates whether files should only be transferred if they have been modified (IFMOD) or whether the files should be deleted after they have been transferred (DELETE). The last parameter may be left out, or both the IFMOD and the DELETE keywords may be used together, in which case they need to be separated with a pipe symbol (|).

      The first two parameters are a little confusing at first. The first parameter always indicates the location and filespec that an operation must be performed on. In the case of an upload (UP), it indicates the location and filespec that needs to be sent to the FTP site. In the case of a download (DOWN), it indicates the location and filespec on the FTP site that needs to be downloaded. The second parameter always specifies the destination where the transferred file should be placed. This can be either a directory on the FTP site or a directory on your local harddrive.
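The TRANSFER format described above can be checked with a small Python sketch. The names Transfer and parse_transfer are hypothetical, and the sketch assumes that locations and filespecs never contain a comma and that keywords are not case sensitive; neither point is confirmed by WebUtil itself. It splits the value on commas, validates the direction keyword, and splits the optional last parameter on the pipe symbol.

```python
from dataclasses import dataclass

DIRECTIONS = {"UP", "DOWN", "SYNCH"}
OPTIONS = {"IFMOD", "DELETE"}


@dataclass
class Transfer:
    first: str          # what the operation is performed on
    second: str         # where the transferred files are placed
    direction: str      # UP, DOWN or SYNCH
    options: frozenset  # any of IFMOD and DELETE, possibly empty


def parse_transfer(value):
    """Split a TRANSFER value into its three or four parameters."""
    parts = [part.strip() for part in value.split(",")]
    if len(parts) not in (3, 4):
        raise ValueError(f"expected 3 or 4 comma-separated parameters: {value!r}")
    direction = parts[2].upper()
    if direction not in DIRECTIONS:
        raise ValueError(f"unknown direction keyword: {parts[2]!r}")
    options = frozenset()
    if len(parts) == 4 and parts[3]:
        # IFMOD and DELETE may be combined with a pipe symbol.
        options = frozenset(opt.strip().upper() for opt in parts[3].split("|"))
        unknown = options - OPTIONS
        if unknown:
            raise ValueError(f"unknown option keyword(s): {sorted(unknown)}")
    return Transfer(parts[0], parts[1], direction, options)
```

For the upload example shown earlier, parse_transfer(r"c:\files\myfiles.zip,/harald/,UP,IFMOD") gives the local file as the first parameter, /harald/ as the second, the direction UP, and the single option IFMOD.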

      It becomes a little trickier when you want to use the synchronize feature. In that case, the first and second parameters may both contain file specs and directory names. There are four different combinations of synchronizing files that can be identified. These four are explained below:

      1. Both directories (local and remote) must contain the same files. In this situation, both the first and the second parameters must indicate a directory, and should not contain any filespecs.

      Example: c:\files\,/harald/,SYNCH

      2. Certain files on the local harddrive need to be synchronized with the files on the FTP site. In this case, the first parameter should contain a location and filespec. The second parameter should contain only a location.

      Example: c:\files\*.ZIP,/harald/,SYNCH

      3. Certain files on the FTP site need to be synchronized with the files on the harddrive. In this case, the first parameter should only contain a location. The second parameter should contain location and filespec.

      Example: c:\files\,/harald/*.ZIP,SYNCH

      4. Certain files on the FTP site need to be synchronized with files on the local harddrive AND certain (other) files on the local harddrive need to be synchronized with the FTP site. In this case both the first and the second parameters should contain location and filespecs.

      Example: c:\files\*.ZIP,/harald/*.ARJ,SYNCH
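The four combinations can be told apart by checking which of the two parameters carries a filespec. The Python sketch below does exactly that. It is an illustration, not WebUtil's own logic: the names split_location and synch_case are hypothetical, and it assumes that a parameter naming only a directory ends with a backslash or a slash, as in the four examples above.

```python
def split_location(param):
    """Split a parameter into (location, filespec) at the last \\ or /."""
    cut = max(param.rfind("\\"), param.rfind("/")) + 1
    return param[:cut], param[cut:]


def synch_case(first, second):
    """Return which of the four SYNCH combinations (1 to 4) applies."""
    _, local_spec = split_location(first)
    _, remote_spec = split_location(second)
    if not local_spec and not remote_spec:
        return 1  # both parameters are directories
    if local_spec and not remote_spec:
        return 2  # filespec on the local side only
    if not local_spec and remote_spec:
        return 3  # filespec on the FTP side only
    return 4      # filespecs on both sides
```

Applied to the examples above, synch_case returns 1 for the pair of directories, 2 and 3 for the two one-sided filespecs, and 4 when both parameters contain a filespec.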

      Note:

      WebUtil changes the current working directory when processing a TRANSFER command. Before processing the next TRANSFER command, it will automatically restore the working directory to what it was when WebUtil logged into the FTP server. This means that you can assume the same working directory in each TRANSFER command.


      6. Error handling

      Every internet user knows that things sometimes go wrong. Some sites may be down, the Internet may be congested, resulting in time out errors, or the address of a web site may have changed. WebUtil is smart enough to handle many different kinds of small errors. Sometimes, however, it cannot carry out its task for one reason or another. In order to help you find out what is going wrong, WebUtil writes all of its actions to a log file. The name of this log file can be specified in the configuration file (see section 5.1). This chapter contains a list of the error messages that the log file can contain, with a short explanation of each problem and suggestions on how to solve it.


      6.1 Web sites and web pages [HTTP]

      This section contains the error messages that WebUtil can report when picking up web sites and web pages from the Internet.

      Unable to establish connection

      This error means that either the web site could not be found or that WebUtil was not able to connect to the web site. Possible causes could be that the web site has moved to a different location, or that the Internet is extremely congested at this moment which prevented WebUtil from establishing a connection with the web site.

      Unable to find <name>

      This error means that the specified web page could not be found. This error can occur if the page specified in the configuration file could not be found, but it can also be given when WebUtil tries to follow a link to another page in order to pick that up as well. If WebUtil cannot find a particular file referenced in a web page, then it will also give this error.

      Unable to create file on local drive <name>

      This error is given when WebUtil cannot create the file that it wants to pick up on the local drive. The most probable cause for this error is that the filename is not a valid filename.

      File not modified, skipping download

      This particular message is more of a notification than an error. It simply means that the file that was to be downloaded has not been changed.


      6.2 FTP sites [FTP]

      Unable to establish connection

      This error message is given when WebUtil is unable to establish a connection with the FTP site. Possible causes include an incorrect FTP address, a site that is (temporarily) down, or congestion on the Internet, making it difficult to establish a connection.

      User name incorrect

      This error indicates that the user name, as specified for this site (in WEBUTIL.INI) is incorrect. In other words, there is no user account on the FTP site with this name.

      Password error

      This error indicates that the password for the user account is incorrect. Passwords are almost always case sensitive. Therefore, a possible cause for this error could be that the case of some of the letters in the password is incorrect.

      Unable to establish data channel

      Transferring data to and from an FTP site is done via a so-called data channel. Before transferring data, WebUtil establishes a second connection, namely, the data connection. This error indicates that WebUtil was not able to establish such a connection, making it impossible to upload or download files.

      Unable to find file

      This error indicates that the specified file was not found on the FTP site in the specified directory.

      Login unsuccessful

      This error message is given when logging into the FTP site was not successful. This error message is always preceded by one of the first three error messages.

      File not modified, skipping download

      This particular message is more of a notification than an error. It simply means that the file that was to be downloaded has not been changed.


      7. Registration

    WebUtil has been released under the shareware concept. This means that you are allowed to use it for a maximum of 30 days. If you enjoy using WebUtil and you wish to continue using it, then you are required to register the program. By registering the program, you receive a registration key which will make all of the features available.

    In the un-registered version of WebUtil, the following restrictions apply:

    • A maximum of 5 web sites can be configured (the registered version supports up to 1000 web sites!).
    • A maximum of 5 ftp sites can be configured (the registered version supports up to 1000 ftp sites!).
    • The directory synchronization routines have been disabled.
    • The ability to only download modified files has been disabled.

    Upon registering WebUtil, the above features will become available to you.

    A registration key is valid for the current release version and for the next two release versions. This means that if you register WebUtil version 1.00, you will also be able to use 1.10 and 1.20 (assuming that those are the next two versions that are released).

    You can register WebUtil by filling out the electronic registration form on our Web site, www.allfix.com, or by completing the registration form (WEBUTIL.REG). Please consult the registration form for more information.


    8. Abbreviations

    Abbreviation    Meaning

    TCP/IP          Transmission Control Protocol/Internet Protocol
    URI             Uniform Resource Identifier
    FTP             File Transfer Protocol
    HTTP            Hypertext Transfer Protocol
    HTML            Hypertext Markup Language
