



Managing WEBsweeper


WEBsweeper is composed of:

The WEBsweeper management facilities allow a number of features to be configured, in both the proxy server and the Web server. Some of these features are listed below and explained in the rest of this section.

Configuring the HTTP and FTP proxy server:

Configuring logging:

Configuring the Web server:

Accessing the Web server:

The WEBsweeper management facilities are accessed by double-clicking on the WEBsweeper icon, found in the Control Panel.

This displays the main WEBsweeper dialog box, as shown below.

Most of the proxy server and Web server features listed on the previous page can be configured via this dialog box. An exception to this is event logging, which has its own configuration section in the main WEBsweeper configuration file, WEBSWP.CFG. It is recommended that you do not change the configuration details for event logging without assistance from technical support.

Proxy server

The WEBsweeper proxy server is responsible for content security. It accepts HTTP and FTP requests from Web browsers on your internal network and performs the following functions:

Enabling the proxy server

You can enable and then configure certain features of the WEBsweeper proxy server via the Proxy Configuration dialog box, accessed as follows.

1. Click on the Advanced features button of the main WEBsweeper dialog box, shown on page 6-28. This displays the Advanced Features dialog box, shown on page 6-62.
2. Click on the Proxy server... button of the Advanced Features dialog box to display the Proxy Configuration dialog box, shown below.

3. Enable the proxy server, by checking the Enable proxy server box.
The proxy server must be enabled before you can configure any of the features discussed in this section.

Restricting access to the proxy server

Access to the WEBsweeper proxy server can be restricted to certain Web browsers on your network.

Using the Proxy Configuration dialog box, shown on page 6-30:

1. Ensure that the WEBsweeper proxy server is enabled, by checking the Enable proxy server box.
2. In the HTTP services and FTP services fields, type one or more IP masks identifying the Web browsers that are allowed access to the proxy server (see below for an explanation of IP masks). Separate each mask from the next with a comma.
3. Click on the OK button.

IP Mask

An IP mask can be:

For example:

The * character on its own indicates that all Web browsers are permitted access.
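The check below is an illustration only. It sketches how a comma-separated list of IP masks might be tested against a client address. It assumes that * acts as a shell-style wildcard anywhere in a mask. The function name browser_allowed is hypothetical, and the exact mask syntax WEBsweeper accepts may be narrower than this.

```python
from fnmatch import fnmatchcase

def browser_allowed(client_ip, mask_field):
    """Return True if client_ip matches any mask in a comma-separated field.

    Assumption: '*' is treated as a shell-style wildcard; this is not
    necessarily the mask syntax WEBsweeper itself implements.
    """
    masks = [mask.strip() for mask in mask_field.split(",") if mask.strip()]
    return any(fnmatchcase(client_ip, mask) for mask in masks)
```

For example, a field containing 193.112.243.*, 10.0.0.1 admits 193.112.243.7 but not 10.0.0.2. A field containing only * admits every address.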

URL blocking

Access to known URLs can be denied to all Web browsers on your network. For example, you could use this facility to disable access to sites known to hold offensive material, or any other material that you consider to be of an inappropriate nature.

Blocking access to known URLs can be achieved by:

Using the Proxy Configuration dialog box

You can use the Proxy Configuration dialog box, shown on page 6-30, to block known URLs. This is achieved with the following steps:

1. Ensure that the WEBsweeper proxy server is enabled, by checking the Enable proxy server box.
2. In the Don't serve URLs area, type the URLs or URL masks of the sites that the proxy server is not to serve (see the next page for an explanation of URL masks).

Each URL or URL mask must be separated by a space or a new line. To start a new line, press the <Ctrl> and <Enter> keys simultaneously.

3. Click on the OK button.

WEBsweeper will subsequently not serve any of the URLs listed.

URL mask

A URL mask is similar to a normal URL but uses the wildcard character (*) to represent one or more characters within the URL. This character matches any value.

The URL mask comprises three parts:

For example:

All three parts of the URL mask must be present.
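The sketch below shows one way to interpret a URL mask. It assumes that * matches one or more characters and that every other character must match exactly. It does not check that all three parts of the mask are present. The helper names are hypothetical and are not part of WEBsweeper.

```python
import re

def url_mask_to_regex(mask):
    """Compile a URL mask in which '*' stands for one or more characters."""
    # Every other character is matched literally (and case-sensitively).
    pattern = ".+".join(re.escape(piece) for piece in mask.split("*"))
    return re.compile(pattern)

def url_matches(url, mask_list):
    """mask_list is the text of an area such as Don't serve URLs,
    with entries separated by spaces or new lines."""
    return any(url_mask_to_regex(mask).fullmatch(url) for mask in mask_list.split())
```

Suppose http://*.badsite.com/* is in the Don't serve URLs area. Then http://www.badsite.com/index.html matches, but http://www.badsite.com/ does not, because the final * must match at least one character.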

Using plug-in URL-blockers

You can also block URLs by using your own or a third-party URL-blocker that has been written specifically for WEBsweeper.

Configuration information is specified in the [URLBlockers] section of the main WEBsweeper configuration file, WEBSWP.CFG.

That is:

;[URLBlockers]

By default this section is commented out and contains no information.

To add your own or a third-party URL-blocker, first ensure that the [URLBlockers] configuration section is no longer commented out.

That is, change:

;[URLBlockers]

to:

[URLBlockers]

Next, add a directive to the configuration section for each of the URL blockers you wish to use.

For example:

[URLBlockers]
Blocker1=c:\MSW\Config\blocker.dll

The [URLBlockers] section can contain one or more directives, that is, one for each URL blocker used.

The name of each directive is the name of the blocker's configuration section; the value is the name of the blocker .DLL, including its path.

The blocker configuration section must be listed in the same file as the [URLBlockers] section, that is, WEBSWP.CFG.

For example:

[Blocker1]
...
<configuration information>
...
This section contains configuration information specific to the third-party blocker. Refer to the appropriate manual for more information.
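The following sketch illustrates how the two kinds of section relate. It reads WEBSWP.CFG-style text with Python's configparser and pairs each directive in [URLBlockers] with its configuration section. It is a hypothetical helper, not part of WEBsweeper, and it assumes the file is valid INI syntax. Lines starting with ; are treated as comments, so the default commented-out ;[URLBlockers] header is ignored.

```python
import configparser

def load_url_blockers(cfg_text):
    """Return {section_name: (dll_path, options)} for each configured blocker."""
    # ';' and '#' lines are comments by default; only '=' separates keys here
    # so that drive letters such as c: are not mistaken for delimiters.
    parser = configparser.ConfigParser(interpolation=None, delimiters=("=",))
    parser.optionxform = str  # keep directive names such as Blocker1 as written
    parser.read_string(cfg_text)
    if not parser.has_section("URLBlockers"):
        return {}  # section absent or still commented out
    blockers = {}
    for section, dll in parser.items("URLBlockers"):
        # Each directive names a section that must exist in the same file
        if not parser.has_section(section):
            raise ValueError(f"missing [{section}] section for {dll}")
        blockers[section] = (dll, dict(parser.items(section)))
    return blockers
```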

Subsequently, details of each URL will be passed to the blocker, which will return a status indicating whether the page is to be blocked. WEBsweeper will replace the blocked page with an information message specified by the blocker.

MIMEsweeper provides a development kit for writing your own URL blockers which includes a sample URL blocker. The development kit can be found in the UBK directory on the MIMEsweeper CD.

Using generic URL-blockers

Many URL blockers are available as Web proxies. The functionality provided by these blockers may be combined with WEBsweeper by chaining the URL blocker proxy to the WEBsweeper proxy.

See the proxy chaining section on page 6-36 for more details.

Chaining a URL blocker to WEBsweeper is less efficient than using a URL blocker written specifically for WEBsweeper, so avoid chaining if the required URL blocker is available as a plug-in.

Proxy chaining

You can specify a proxy server to which WEBsweeper is to chain. All requests will subsequently be channelled through to this proxy server. You may find this facility useful, for example, if you are running WEBsweeper as a local cache, on behalf of a department within your company, and your company itself makes use of a proxy server to talk to the Internet. See page 3-9 for more details on proxy chaining.

Using the Proxy Configuration dialog box, shown on page 6-30:

1. Ensure that the WEBsweeper proxy server is enabled, by checking the Enable proxy server box.
2. In the Chained proxy field, type in the IP address or the host name of the proxy server WEBsweeper is to channel through.

3. Click on the OK button.
The proxy server to which WEBsweeper is chaining may use a port other than 80, for example, port 8080. The entry in the Chained proxy field should reflect this, for example, 193.112.243.1:8080.

You can specify a list of URLs or URL masks for which the chained proxy won't be called. Type this list into the Don't chain URLs area of the Proxy Configuration dialog box. Each URL or URL mask must be separated by a space or a new line (to start a new line, press <Ctrl> and <Enter>). See page 6-33 for an explanation of URL masks.

If a user subsequently asks for any of the URLs listed then the proxy named in the Chained proxy area won't be called for that URL. WEBsweeper will retrieve the URL directly and validate it.
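The routing decision described above can be sketched as follows. The sketch assumes port 80 when the Chained proxy entry has no :port suffix. It reads URL masks as * matching one or more characters. All function names are hypothetical.

```python
import re

DEFAULT_PROXY_PORT = 80  # assumed when the Chained proxy entry has no ":port"

def parse_chained_proxy(entry):
    """Split an entry such as '193.112.243.1:8080' into (host, port)."""
    host, sep, port = entry.strip().rpartition(":")
    if not sep:
        return entry.strip(), DEFAULT_PROXY_PORT
    return host, int(port)

def _mask_matches(mask, url):
    # '*' matches one or more characters; everything else is literal
    pattern = ".+".join(re.escape(piece) for piece in mask.split("*"))
    return re.fullmatch(pattern, url) is not None

def route_request(url, chained_proxy, dont_chain_text):
    """Return the (host, port) of the chained proxy, or None to fetch directly."""
    if not chained_proxy:
        return None
    if any(_mask_matches(mask, url) for mask in dont_chain_text.split()):
        return None  # listed in Don't chain URLs: WEBsweeper retrieves it itself
    return parse_chained_proxy(chained_proxy)
```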

Enabling proxy caching

The WEBsweeper proxy server maintains a local cache. This cache is used to store a copy of data requested by Web browsers (or a WEBsweeper inform message if data failed validation) for a specified period of time. If a request is made for the same data within that time it, or the inform message, can be retrieved from the cache, thus improving performance.

Data is cached only if it passes validation; otherwise it is discarded.

To enable WEBsweeper to cache files:

1. Click on the Caching button of the Proxy Configuration dialog box, shown on page 6-30. This displays the Proxy Caching dialog box.

2. Enable caching, by checking the Enable proxy caching box.

The caching information you can configure via this dialog box is discussed on the next two pages.

Cache directory

The Cache directory field specifies the path to the directory in which the cached files are stored, for example, c:\cache.

The Cache directory specified must be accessible to the WEBsweeper proxy server. If it is located on a different fileserver, you will need to specify its UNC path, for example, \\integ486\scsi3\cache.

The Cache directory must also be on an NTFS drive, that is, a file system that supports long filenames. This is because cache filenames are longer than the 8.3 filename restriction imposed by FAT drives.

Max cache size

The Max cache size field specifies the maximum amount of disk space used for caching, in megabytes (MB). The default is 10 MB.

The cache may grow beyond the maximum size throughout the day, but it is purged down to the specified limit at a pre-determined time every day, as specified in the Purge cache at field. As a guide, set the maximum cache size to be about 25% less than the actual maximum you can tolerate.
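Here is a worked example of this guideline. If the most disk space you can tolerate for the cache is 40 MB, enter about 30 MB in the Max cache size field. The helper below has a hypothetical name and simply makes the arithmetic explicit, rounding down to whole megabytes.

```python
def recommended_max_cache_mb(tolerable_mb):
    """Suggest a Max cache size about 25% below the most you can tolerate."""
    # The cache can grow past this figure between daily purges,
    # so the 25% margin leaves headroom for that growth.
    return tolerable_mb * 3 // 4
```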

Purge cache at

The Purge cache at field specifies the time of day at which the cache directory is purged down to the size specified by the Max cache size field. The time is specified in minutes past midnight. For example, 0 purges the cache directory at midnight every day. The default is 180, that is, 3 am.
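The value is in minutes past midnight, so converting it to a clock time takes a single division. The hypothetical helper below assumes valid values lie between 0 and 1439 (23:59).

```python
def purge_time(minutes_past_midnight):
    """Convert a Purge cache at value to a 24-hour HH:MM string."""
    # Assumed valid range: one day's worth of minutes
    if not 0 <= minutes_past_midnight < 24 * 60:
        raise ValueError("value must be between 0 and 1439")
    hours, minutes = divmod(minutes_past_midnight, 60)
    return f"{hours:02d}:{minutes:02d}"
```

For example, the default of 180 gives 03:00.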

Lifetime of unread cache files

The Lifetime of unread cache files area specifies the length of time for which unused cache files are kept.

Type in a list of URL masks (see page 6-33 for a description) and their corresponding time periods. Enter the time period in hours, days, weeks, or some combination of these. For example, 3 weeks 5 hours 2 days.

Each line should contain one URL mask and one time period, separated by a space. To start a new line, press <Ctrl> and <Enter>.

If the URL of a cached file matches a URL mask, the cached file is deleted after it has been unused for the specified time period. If the cached file's URL does not match any URL mask then the Default time period is used.
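The sketch below illustrates how a Lifetime of unread cache files list might be interpreted. Each line is parsed into a URL mask and a time period, and the Default is used when no mask matches. Several details are assumptions rather than documented behaviour: the first matching line wins, units may be singular or plural, and the function names are hypothetical.

```python
import re
from datetime import timedelta

_UNITS = {"hour": "hours", "day": "days", "week": "weeks"}

def parse_period(text):
    """Parse a period such as '3 weeks 5 hours 2 days' into a timedelta."""
    tokens = text.split()
    if not tokens or len(tokens) % 2:
        raise ValueError(f"bad time period: {text!r}")
    total = timedelta()
    for count, unit in zip(tokens[0::2], tokens[1::2]):
        key = unit.lower().rstrip("s")  # accept 'day' and 'days' (assumption)
        if key not in _UNITS:
            raise ValueError(f"unknown unit: {unit!r}")
        total += timedelta(**{_UNITS[key]: int(count)})
    return total

def _mask_matches(mask, url):
    # '*' matches one or more characters; everything else is literal
    pattern = ".+".join(re.escape(piece) for piece in mask.split("*"))
    return re.fullmatch(pattern, url) is not None

def lifetime_for(url, rules, default):
    """rules holds one '<URL mask> <time period>' entry per line."""
    for line in rules.splitlines():
        if not line.strip():
            continue
        mask, period = line.split(None, 1)
        if _mask_matches(mask, url):
            return parse_period(period)  # first matching line wins (assumption)
    return default
```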

Maximum lifetime of cache files

The Maximum lifetime of cache files area specifies the maximum lifetime of a cached file, that is, the length of time the file is kept in the cache since it was originally retrieved. This is regardless of whether the file has been used within that time period.

Type in a list of URL masks (see page 6-33 for a description of URL masks) and their corresponding time periods. Enter the time period in hours, days, weeks, or some combination of these. For example, 3 weeks 5 hours 2 days, 3 weeks 2 days, 2 days, and so on.

Each line should contain one URL mask and one time period, separated by a space. To start a new line, press <Ctrl> and <Enter>.

Inform messages generated by WEBsweeper are only cached for five minutes.

If the URL of a cached file matches a URL mask, the file is deleted once the specified time period has elapsed since it was retrieved. If the cached file's URL does not match any URL mask then the Default time period is used.

For example, if you specify the maximum lifetime of a file as two days then the file will be deleted from the cache after two days, regardless of whether it has been used within that time period or not.

The lifetime of a cached file is determined at the time the file is cached. If you change the lifetime the change only applies to new additions to the cache, not existing files.
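The following sketch shows the two expiry rules side by side. It assumes both rules are checked against the current time and works as follows:

- The maximum lifetime is fixed when the entry is stored, so later configuration changes do not affect it.
- Inform messages use a fixed five-minute lifetime.
- An entry is also removed once it has gone unused for longer than its unread lifetime.

The names are hypothetical.

```python
from datetime import datetime, timedelta

INFORM_MESSAGE_LIFETIME = timedelta(minutes=5)

def cache_entry_expiry(retrieved_at, max_lifetime, is_inform_message=False):
    """Return the time after which a cache entry must be deleted.

    Computed once, when the entry is stored, so changing the configured
    lifetime later does not affect entries already in the cache.
    """
    lifetime = INFORM_MESSAGE_LIFETIME if is_inform_message else max_lifetime
    return retrieved_at + lifetime

def should_delete(expires_at, last_used, unread_lifetime, now):
    """True once the maximum lifetime has passed, or the entry has been
    unused for longer than its unread lifetime."""
    return now > expires_at or now - last_used > unread_lifetime

if __name__ == "__main__":
    stored = datetime(1996, 7, 4, 12, 0)
    print(cache_entry_expiry(stored, timedelta(days=2)))  # two days after retrieval
```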

Don't cache URLs

The Don't cache URLs area specifies a list of URLs and URL masks that are never to be cached.

Type in the URL or URL masks (see page 6-33 for a description of URL masks).

Each URL or URL mask must be separated by a space or a new line. To start a new line, press <Ctrl> and <Enter>.

Logging

HTTP transaction logging

WEBsweeper can be configured to log all HTTP requests that its Web server and proxy server receive from Web browsers.

Logging HTTP requests can give you useful information on the activities of the Web browsers on your network, for example:

All this information can be determined from the log files and can be used to help formulate strategies on proactive caching, maximising network bandwidth, charging for usage and so on.

To log all HTTP requests that WEBsweeper receives from Web browsers, check the Log HTTP Transactions box of the main WEBsweeper dialog box, shown on page 6-28.

The log file is stored in the directory specified by the Log directory field. In the above example the log directory is C:\MSW\Log. A single formatted line is written to the log file for each request that the Web server or proxy server receives.

A new log file is created every day. For performance reasons, the current log file is kept open until the first HTTP transaction of the following day. When this transaction occurs, the preceding day's log file is closed, a new log file is opened, and the transaction is logged to it.

For companies that use the Internet heavily the log directory can become quite large and take up increasing amounts of disk space each day. It is recommended that you regularly archive and delete the old log files.

The format of the log file name is HSYYMMDD.LOG. For example, the file created on 4th July 1996 would be HS960704.LOG.
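The naming scheme and the daily rollover can be sketched as follows. The class is a hypothetical illustration of the behaviour, not WEBsweeper's own code. It assumes the log directory already exists, and it does not write the log template line.

```python
import os
from datetime import date

def log_file_name(day):
    """Return the daily log file name, e.g. HS960704.LOG for 4 July 1996."""
    return day.strftime("HS%y%m%d.LOG")

class DailyLog:
    """Keeps the current file open until the first transaction of a new day."""

    def __init__(self, log_dir):
        self.log_dir = log_dir
        self.day = None
        self.handle = None

    def write(self, line, today):
        if today != self.day:
            # First transaction of a new day: close the old file, open today's
            if self.handle is not None:
                self.handle.close()
            path = os.path.join(self.log_dir, log_file_name(today))
            self.handle = open(path, "a")
            self.day = today
        self.handle.write(line + "\n")
        self.handle.flush()

    def close(self):
        if self.handle is not None:
            self.handle.close()
            self.handle = None

if __name__ == "__main__":
    print(log_file_name(date(1996, 7, 4)))
```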

The format of the log file contents is controlled by a log template, specified in the Log template field of the main WEBsweeper dialog box.

This log template may contain a number of formatting tokens. Each token starts with a % character, and is followed by a single upper-case or lower-case letter.

The case of the formatting token is significant as upper-case and lower-case letters will usually have different replacement parameters.

For example:

%R %i %u [%d/%b/%Y:%H:%M:%S %O] "%q" %s %n

The above is the default template and corresponds to the Common Logfile Format used by many other HTTP servers.

When a log file entry is written, each formatting token in the template is replaced by its corresponding parameter. The replacement parameters are shown in the tables on the next three pages.

If a particular parameter is unavailable, it will be represented as a dash (-) in the log file entry. Any other information in the template is copied to the log file entry without change.
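The substitution rule can be illustrated with a small expander. The sketch makes several assumptions:

- The time tokens behave like the identically named C strftime directives, which Python's datetime.strftime also uses.
- %O is produced as a +hhmm offset.
- %t and %T are ignored.
- Any token without a supplied value is replaced with a dash.

The function and field names are hypothetical.

```python
import re
from datetime import datetime, timedelta, timezone

# Tokens assumed to behave like the identically named strftime directives
_TIME_TOKENS = set("aAbBcdHIjmMpSUwWxXyY")

def expand_template(template, fields, when):
    """Expand %X tokens; fields maps token letters to values, when is a datetime."""
    def replace(match):
        token = match.group(1)
        if token in _TIME_TOKENS:
            return when.strftime("%" + token)
        if token == "O":
            # Offset from GMT as +hhmm (needs a timezone-aware datetime)
            return when.strftime("%z") or "-"
        if token in "tT":
            return ""  # local/GMT switch not modelled in this sketch
        value = fields.get(token)
        return "-" if value is None else str(value)  # unavailable -> dash
    return re.sub(r"%([A-Za-z])", replace, template)

if __name__ == "__main__":
    received = datetime(1996, 7, 4, 14, 5, 9, tzinfo=timezone(timedelta(hours=1)))
    fields = {"R": "pc12.example.com", "q": "GET /index.htm HTTP/1.0", "s": 200, "n": 5120}
    print(expand_template('%R %i %u [%d/%b/%Y:%H:%M:%S %O] "%q" %s %n', fields, received))
```

Consider a request received at 14:05:09 on 4 July 1996, one hour ahead of GMT, with only %R, %q, %s and %n supplied. Expanding the default template for it gives:

pc12.example.com - - [04/Jul/1996:14:05:09 +0100] "GET /index.htm HTTP/1.0" 200 5120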

The log template string itself is written to the log file every time the log file is opened, for example, when a new log file is created, when the server is started, or when it is reconfigured. If you do not want the log template string to be written to the log file, ensure that the Strict Common Log Format box is checked on the main WEBsweeper dialog box.

The default log template can be changed by ensuring that the Strict Common Log Format box is unchecked and then editing the tokens listed in the Log Template field.

Log file tokens/replacement parameters

The formatting tokens that can be used in the log template, and their replacement values, are listed in the following three tables.

Transaction information

Token Replacement
%q The complete HTTP request received from the client, including the HTTP method, the URL requested, and the HTTP version number (except for HTTP 0.9 in which this is absent).
%Q The full name of the file or directory to which the URL maps.
%u The authenticated user name.
%e The realm to which the authenticated user name belongs.
%i The user identity as obtained using the RFC 931 protocol. Most servers, including this one, do not implement this protocol, so this token is always replaced by -.
%s The HTTP status code returned to the client.
%n Number of bytes sent to the client (excluding HTTP headers).
%f User information, for example, from a CGI DLL.

Names and addresses

Token Replacement
%l The IP address belonging to the server on which this request was received.
%L The host name corresponding to %l.
%r The IP address of the client from whom this request was received.
%R The host name corresponding to %r, or the IP address itself if the host name could not be determined.

Time-related parameters

The time recorded is the time at which the request was received, not the time at which it was completed. As transactions are logged in the log file after they have been completed, the entries may appear in an unexpected order.

Token Replacement
%t Indicates that subsequent time tokens in the template refer to local time.
%T Indicates that subsequent time tokens in the template refer to GMT.
%a Weekday name (abbreviated)
%A Weekday name (full)
%b Month name (abbreviated)
%B Month name (full)
%c Short date and time representation appropriate to the server's locale
%d Day of the month as a two-digit number (01-31)
%H Hour as a two-digit number (24-hour format: 00-23)
%I Hour as a two-digit number (12-hour format: 01-12)
%j Day of the year as a three-digit number (001-366)
%m Month as a two-digit number (01-12)
%M Minute as a two-digit number (00-59)
%p AM/PM indicator (appropriate to server's locale) for a 12-hour clock
%S Second as a two-digit number (00-59)
%U Week of the year as a two-digit number (00-53), counting Sunday as the first day of the week
%w Weekday as a single-digit number (0-6), counting Sunday as the first day of the week
%W Week of the year as a two-digit number (00-53), counting Monday as the first day of the week
%x Short date representation appropriate to server's locale
%X Time representation appropriate to server's locale
%y Year as a two-digit number (00-99)
%Y Year as a four-digit number
%z, %Z Server's time zone name or abbreviation; empty if the time zone is unknown
%O Offset of local time from GMT in the format +hhmm.

Event log

WEBsweeper is configured to write all startup errors and operation errors to the Windows NT application event log.

The other method of logging WEBsweeper uses is HTTP transaction logging. See page 6-40 for details.

Configuration details for the event log are found in the main WEBsweeper configuration file, WEBSWP.CFG.

[Logging]
EventLog=3

[EventLog]
EventSource=WEBsweeper
EventId=3221225496
StreamType=AppEvent
MaxLevel=Brief
It is recommended that you do not change the configuration details for event logging without assistance from technical support.
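The following is for reference only. It assumes EventId follows the standard Windows message-identifier layout:

- severity in bits 30-31
- customer flag in bit 29
- facility in bits 16-27
- code in bits 0-15

On that assumption, 3221225496 (hex C0000018) is an error-severity identifier with code 24. The decoder below is a hypothetical helper for reading the value. It does not change the advice above about leaving this section alone.

```python
SEVERITY_NAMES = {0: "success", 1: "informational", 2: "warning", 3: "error"}

def decode_event_id(event_id):
    """Split a 32-bit Windows message identifier into its standard fields
    (assumes the usual severity/customer/facility/code bit layout)."""
    return {
        "severity": SEVERITY_NAMES[(event_id >> 30) & 0x3],
        "customer_bit": (event_id >> 29) & 0x1,
        "facility": (event_id >> 16) & 0xFFF,
        "code": event_id & 0xFFFF,
    }
```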

Viewing the event log

To view the Windows NT application event log:

1. Click on the Windows NT 4.0 Start button.
2. Point to the Programs menu option.
3. Point to the Administrative Tools menu option.
4. Click on the Event Viewer program name.
5. Select Application from the Log menu to ensure you are viewing the application event log.

To view more details on any of the events listed in the application event log, double-click the entry. Alternatively, select the entry and then choose Detail from the View menu.

An Event Dialog box is displayed showing more information about the selected event.

From this dialog box you can view details on the other events listed. Click on the Previous and Next buttons to move through the events.

For more information on the Windows NT application event log refer to your Windows NT documentation.

SNMP traps

WEBsweeper can be configured to issue SNMP traps to an SNMP manager at startup and shutdown.

Configuration details for SNMP traps are found in the main WEBsweeper configuration file, WEBSWP.CFG. These details reflect the information entered during installation.

For example:

[SNMPTrapConfig]
Community=public
TargetAddress=195.121.24.11
If you are upgrading from a version of MIMEsweeper earlier than 3.2, or you did not enter any SNMP configuration information during installation, the SNMP trap configuration section is disabled.

For more details on the [SNMPTrapConfig] section, see page 7-19.

Web server

The Web server serves data to Web browsers on the Internet.

You can configure certain features of the Web server via the main WEBsweeper dialog box, shown below. See page 6-28 for details on how to access this dialog box.

The rest of this section provides details on the following features:

Data directories

The Data directory field of the main WEBsweeper dialog box, shown on page 6-48, is used to specify the root data directory for all requests served by the WEBsweeper host. In this example the data directory is C:\MSW\Data. The default data directory is C:\HTTP.

Any Web browser connecting to www.example.com, where www is the name of the WEBsweeper host, will only have access to files in the data directory specified and any of its subdirectories.

Files with 'hidden' or 'system' attributes are ignored.

You must locate all files you wish to make available to Web browsers within the root data directory tree. Locations above the root data directory, or on other disks, are not accessible to Web browsers. The only exception is if you are also using virtual path mappings. See page 6-60 for more details on virtual path mappings.
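The restriction to the root data directory tree can be sketched as a path check. The URL path is joined to the data directory and normalised, and the result is rejected if it lies outside that directory. This is a hypothetical illustration that uses Python's ntpath module for Windows-style paths. It ignores virtual paths and the hidden and system attribute rule.

```python
import ntpath
from urllib.parse import unquote

def map_url_to_file(data_dir, url_path):
    """Map a URL path to a file under data_dir, or return None if it escapes."""
    # URL components use forward slashes; convert to a Windows relative path
    relative = unquote(url_path).lstrip("/").replace("/", "\\")
    candidate = ntpath.normpath(ntpath.join(data_dir, relative))
    root = ntpath.normcase(ntpath.normpath(data_dir))
    inside = ntpath.normcase(candidate)
    # Reject anything that normalises to a location outside the root tree
    if inside != root and not inside.startswith(root + "\\"):
        return None
    return candidate
```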

If the Data directory is on a mapped network drive, you must specify it in the UNC form:

\\server\volume\directory

For example:

\\integ486\MSW\Data

In this example, integ486 is the name of the server and MSW is the name of the volume on which the directory is located. Data is the name of the root directory which contains the information.

The Web server must be configured to run as a user with access permission to the directory specified in the Data directory field. This may mean a domain-wide user account rather than an account specific to the computer on which WEBsweeper is running.

Enabling directory browsing

Directory browsing enables a user to see all the files contained in the root data directory (see page 6-49) and any of its subdirectories. By default, directory browsing is disabled because it can be insecure, but you can enable it if required.

To enable directory browsing, ensure that the Permit directory browsing box is checked on the main WEBsweeper dialog box, shown on page 6-48.

Once directory browsing is configured, the contents of the data directory and its subdirectories, together with the directory browsing setting, determine what information is sent to requesting Web browsers, as explained below.

For example, suppose that a Web browser asks for a URL of the form:

http://www.example.com/MSW/Data/invoices/

The Web server will do one of three things, depending on the contents of the invoices directory and on how the server is configured:

1. If there is a file called default.htm in the invoices directory, it will send that file to the Web browser.

You can change the default filename used. This filename is specified by the entry in the Default file name field of the Other Configuration Options dialog box. See page 6-56 for details.

2. If Permit directory browsing is enabled the Web server will send a list of files and subdirectories within the invoices directory to the Web browser. This list displays an icon for each subdirectory and file found within the directory.

The icon used to represent a file is determined by the MIME type of the file. The MIME type of a file is, in turn, determined by its file extension. See page 6-51 for details on how to associate file extensions to MIME types. See page 6-55 for details on how to associate MIME types to icons.

The icon used to represent a directory is also configurable; see page 6-55 for details. If you don't want a particular directory to be browsable, create a file called NOBROWSE in that directory. Only the presence of the file matters; its contents are ignored.

3. If Permit directory browsing is not enabled, and default.htm is not present, the Web server will send an error message to the browser.
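The three cases can be summarised in code. The sketch below is a hypothetical illustration with these assumptions:

- A directory containing NOBROWSE is treated like one with browsing disabled.
- A missing directory produces an error.
- File names are compared case-sensitively, unlike on Windows.

```python
import os

def respond_to_directory(directory, default_file="default.htm", permit_browsing=False):
    """Return what the Web server sends for a URL that names a directory.

    Result is ("file", path), ("listing", names) or ("error", None).
    """
    if not os.path.isdir(directory):
        return ("error", None)  # assumption: missing directory is an error
    default_path = os.path.join(directory, default_file)
    if os.path.isfile(default_path):
        return ("file", default_path)  # case 1: send the default file
    no_browse = os.path.exists(os.path.join(directory, "NOBROWSE"))
    if permit_browsing and not no_browse:
        # case 2: list subdirectories and files (icons are chosen elsewhere)
        return ("listing", sorted(os.listdir(directory)))
    return ("error", None)  # case 3: browsing not permitted
```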

Associating file extensions to MIME types

Web server responses to requests include a MIME header. This MIME header informs the Web browser of the type of content to expect. The Web browser can then display the information, launch another application to display it, or save it, as appropriate.

The Web server infers the MIME type of a file from the filename extension, using a mapping table.

The default MIME type used is application/octet-stream. The default contents of the mapping table are shown on page 6-54.

You can configure the contents of the mapping table, as required. This is achieved using the File extension to MIME Type mapping area of the main WEBsweeper dialog box, shown on page 6-48.

Using the File extension to MIME type mapping area you can:

The MIME types associated with file extensions can subsequently be mapped to icons. These icons are displayed to represent the files found when a directory listing is requested by a Web browser. See page 6-55 for details on how to associate MIME types to icons.

To add a new file extension to MIME type mapping:

1. Click on the New mapping button, found in the File extension to MIME type mapping area of the main WEBsweeper dialog box. This dialog box is shown on page 6-48.

2. A New mapping dialog box is displayed into which you can enter the mapping details. Type the file extension into the File extension field and select the associated MIME type from the MIME type area.

3. Click on the OK button.

To change an existing file extension to MIME type mapping:

1. Select the mapping from the File extension to MIME Type mapping area of the main WEBsweeper dialog box. This dialog box is shown on page 6-48.

2. Click on the Change mapping button. A Change mapping dialog box is displayed where you can make the required changes.
The Change mapping dialog box is similar to the New mapping dialog box shown on the previous page.

3. Click on the OK button.

To delete an existing file extension to MIME type mapping:

1. Select the mapping from the File extension to MIME Type mapping area of the main WEBsweeper dialog box. This dialog box is shown on page 6-48.
2. Click on the Delete mapping button.

The mapping is deleted with no warning.

The default contents of the file extension to MIME type mapping table are shown in the following table:

File Extension MIME type
HTM text/html
HTML text/html
TXT text/plain
UTF text/plain; charset=unicode-1-1
PS application/postscript
RTF application/rtf
PDF application/pdf
ZIP application/zip
DOC application/msword
JPG image/jpeg
JPEG image/jpeg
GIF image/gif
TIF image/tiff
TIFF image/tiff
XBM image/x-xbitmap
WAV audio/wav
AU audio/wav
MPG video/mpeg
MPEG video/mpeg
Default application/octet-stream
 

Refer to RFC 1590, Media Type Registration Procedure, for more information on MIME types.
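Applied to the default table, the lookup amounts to a dictionary access with a fallback. The sketch below assumes extensions are compared case-insensitively. The name mime_type_for is hypothetical.

```python
import os

# Default file extension to MIME type table, as listed above
DEFAULT_MIME_TYPES = {
    "HTM": "text/html", "HTML": "text/html",
    "TXT": "text/plain", "UTF": "text/plain; charset=unicode-1-1",
    "PS": "application/postscript", "RTF": "application/rtf",
    "PDF": "application/pdf", "ZIP": "application/zip",
    "DOC": "application/msword",
    "JPG": "image/jpeg", "JPEG": "image/jpeg", "GIF": "image/gif",
    "TIF": "image/tiff", "TIFF": "image/tiff", "XBM": "image/x-xbitmap",
    "WAV": "audio/wav", "AU": "audio/wav",
    "MPG": "video/mpeg", "MPEG": "video/mpeg",
}
FALLBACK_MIME_TYPE = "application/octet-stream"  # the Default entry

def mime_type_for(filename, table=DEFAULT_MIME_TYPES):
    # Extensions are compared case-insensitively (assumption)
    ext = os.path.splitext(filename)[1].lstrip(".").upper()
    return table.get(ext, FALLBACK_MIME_TYPE)
```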

Associating MIME types to icons

Whenever a Web browser examines a directory, and assuming directory browsing is enabled (see page 6-50 for details), WEBsweeper returns a directory listing which displays icons for each type of file found.

Some useful icons that can be used for this purpose are automatically installed in the icons directory, as .GIF files. These are:

The icon chosen to represent a file depends on the MIME type of the file. In turn, the MIME type of the file is determined by the file extension. See page 6-51 for details on how to associate file extensions to MIME types.

The default icon that is used, when a MIME type mapping is not specified, is default.gif. The default icon used to represent a directory is folder.gif.

You can configure the MIME type to icon mappings via the Other Configuration Options dialog box. Using this dialog box you can:

See the next page for details on how to access the Other Configuration Options dialog box.

To access the Other Configuration Options dialog box:

1. Click on the Advanced features button of the main WEBsweeper dialog box, shown on page 6-48. This displays the Advanced Features dialog box, shown on page 6-62.
2. Click on the Other... button of the Advanced Features dialog box to display the Other Configuration Options dialog box, as shown below.

The MIME type to icon mapping area of this dialog box displays a list of MIME type/subtype entries and their corresponding icon URLs.

Some entries in the list have MIME type/subtype entries of the form:

application/*

That is, they contain a wildcard character (*), which matches any subtype. In the above example, it indicates that all MIME types starting with application/ are mapped to the icon default.gif. If a specific MIME type is also listed, it overrides the wildcard mapping. For example, an application/octet-stream entry overrides the application/* mapping.
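The precedence described above can be sketched as follows: first the specific type/subtype, then type/*, and finally the Default icon URL. The sketch assumes MIME types are compared in lower case and that parameters such as ; charset=... are ignored. The function name is hypothetical.

```python
def icon_for(mime_type, icon_map, default_icon="default.gif"):
    """Exact type/subtype wins; then type/*; then the Default icon URL."""
    # Drop parameters and compare in lower case (assumptions)
    base = mime_type.split(";", 1)[0].strip().lower()
    if base in icon_map:
        return icon_map[base]
    wildcard = base.split("/", 1)[0] + "/*"
    return icon_map.get(wildcard, default_icon)
```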

To add a new MIME type to icon mapping:

1. Access the Other Configuration Options dialog box, as explained on the previous page.
2. Type the new mapping values into the MIME type and the Icon URL fields of the MIME type to icon mapping area.

3. Click on the Set button.

To change an existing MIME type to icon mapping:

1. Select the mapping from the list displayed in the MIME type to icon mapping area of the Other Configuration Options dialog box.
2. Change the mapping values in the MIME type and Icon URL fields.
3. Click on the Set button.

To delete an existing MIME type to icon mapping:

1. Select the mapping from the list displayed in the MIME type to icon mapping area of the Other Configuration Options dialog box.
2. Click on the Delete button.

The mapping is deleted with no warning.
The mapping contains the URL of the icon, not the pathname, so always use forward slashes, not backslashes, to separate components. If the icons are located on another Web server, you can specify a full URL that points to it.

Changing the default icons

You can change the icon used to represent a directory, or the default icon used to represent a file when a MIME type is not specified.

Type the appropriate icon URLs into the appropriate fields of the Other Configuration Options dialog box, shown on page 6-56.

The Default icon URL field specifies the icon used when a MIME type mapping is not specified. The Folder icon URL field specifies the icon used to represent directories.

The Other Configuration Options dialog box also allows you to

Configuring the default file name

The Default file name field allows you to specify the file that is sent to the Web browser when a URL does not specify a filename. It is initially set to Default.htm but you can change it if required. See page 6-50 for more details on directory browsing.

Configuring the number of connect requests

You can configure the number of connect requests the Web server will queue for processing. This figure is entered into the Maximum listen backlog field. The default is 50.

TCP/IP port

By default, WEBsweeper listens on TCP/IP port 80. The Web server and the proxy server both use the same TCP/IP port.

You can, if required, set WEBsweeper to listen on an alternative port, for example, port 8080.

Setting WEBsweeper to use an alternative port allows you to run WEBsweeper alongside other Web servers on the same host.

To change the port that WEBsweeper listens on, type the new port number into the TCP/IP port field of the main WEBsweeper dialog box, shown on page 6-48.

If you change WEBsweeper to listen on a TCP/IP port other than 80, and you have directory browsing enabled, then for a Web browser to access the Web server, the URL must include the new port number.

For example:

http://www.example.com:8080/MSW/Data/

If the Web browsers on your network have been configured to send all requests to the WEBsweeper machine then you must reconfigure them all to point to the new TCP/IP port. See page 3-26 for details.

Accessing the server

Virtual paths

Virtual paths are a feature that allows your Web server to serve data to the Web from more than one data directory tree. The main data directory is specified using the Data directory field of the main WEBsweeper dialog box. See page 6-48 for details. The remaining data directories are specified using virtual paths.

Virtual paths can be useful in many situations, for example:

A virtual path is an association between a virtual path name and a directory path. For example, a virtual path may have the virtual path name ~invoices, with a corresponding directory path of c:\invoices.

Using the above example, to access a file called c:\invoices\inv.htm you would use the URL http://www.example.com/~invoices/inv.htm.

A virtual path name must always start with a tilde (~).

When the Web server receives a request with a URL whose first path component starts with a tilde (~) it looks to see if a virtual path of that name has been defined.
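The lookup described above can be sketched in Python as follows. This is a minimal illustration, not WEBsweeper's implementation: the resolve function and the VIRTUAL_PATHS table are hypothetical. What the real server does when no virtual path of the requested name exists is not covered here, so the sketch simply returns None in that case.

```python
from pathlib import PureWindowsPath
from urllib.parse import unquote, urlsplit

# Hypothetical virtual path table: virtual path name -> directory path.
VIRTUAL_PATHS = {"~invoices": r"c:\invoices"}


def resolve(url, data_dir, virtual_paths=VIRTUAL_PATHS):
    """Map a request URL to a file path on the server.

    Returns None if the first path component starts with '~' but no virtual
    path of that name is defined. A real server must also reject '..'
    segments; this sketch does not.
    """
    parts = [unquote(p) for p in urlsplit(url).path.split("/") if p]
    if parts and parts[0].startswith("~"):
        base = virtual_paths.get(parts[0])
        if base is None:
            return None
        # Replace the virtual path name with its directory path.
        return PureWindowsPath(base, *parts[1:])
    # Any other URL is served from the main data directory.
    return PureWindowsPath(data_dir, *parts)
```

For example, resolving http://www.example.com/~invoices/inv.htm yields a path equal to c:\invoices\inv.htm, whatever the main data directory is.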

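For example, given the URL http://www.example.com/~invoices/inv.htm, the Web server sees that the first path component, ~invoices, starts with a tilde, finds the virtual path of that name with its directory path c:\invoices, and returns the file c:\invoices\inv.htm.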

Configuring virtual paths

You can configure virtual paths via the main WEBsweeper dialog box, shown below.

See page 6-28 for details on how to access this dialog box.

To configure virtual paths:

1. Click on the Advanced features button of the main WEBsweeper dialog box, shown on the previous page. This displays the Advanced Features dialog box, shown below.

2. Click on the Virtual paths... button of the Advanced Features dialog box, to display the Virtual Paths Configuration dialog box.

This dialog box shows a list of all existing virtual path names and their corresponding directories.

Using the Virtual Paths Configuration dialog box shown on the previous page, you can:

To create a new virtual path mapping:

1. Type the virtual path name into the Name field of the Virtual Paths Configuration dialog box, shown on page 6-62.

Virtual path names can be up to 127 characters long. They may not contain the characters '\', '$' or space, control characters, or characters with codes greater than 127. They must always start with a tilde (~).
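The naming rules above can be expressed as a small check. This is a minimal Python sketch that assumes the 127-character limit includes the leading tilde and that "control characters" means ASCII codes 0 to 31 and 127. The function name is hypothetical.

```python
def is_valid_virtual_path_name(name: str) -> bool:
    """Check a virtual path name against the rules above (hypothetical helper)."""
    # Must start with a tilde and be at most 127 characters long
    # (assumed to include the tilde itself).
    if not name.startswith("~") or len(name) > 127:
        return False
    for ch in name:
        code = ord(ch)
        # Reject backslash, dollar and space, control characters (codes 0-31
        # and 127) and any character with a code above 127.
        if ch in "\\$ " or code < 32 or code >= 127:
            return False
    return True
```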

2. Click on the New button. This displays another dialog box into which you can enter the virtual path details. The name of the dialog box reflects the name of the virtual path, for example, ~invoices.

The information you can supply via this dialog box is explained on the following page.

The Directory field allows you to specify which directory path the virtual path name corresponds to, for example, c:\invoices.

The Default file name field allows you to specify the file that the Web server looks for if a URL does not specify a filename. This is the file that is sent to the Web browser in such instances. It is initially set to default.htm but you can change it if required.

See page 6-50 for more details on directory browsing.

The Local host name field specifies the name of the Web server, for example, www.example.com.

If you leave the Local host name field blank, the virtual path may be accessed through any local IP address.

Using the dialog box shown on the previous page you can also:

To change an existing virtual path mapping:

1. Select the mapping from the Virtual Paths Configuration dialog box, accessed as explained on page 6-62.

2. Click on the Change button. This displays a dialog box, similar to that shown on page 6-64, where you can make the required changes. The name of the dialog box reflects the name of the virtual path that you selected to change.
3. Click on the OK button.

To delete an existing virtual path mapping:

1. Select the mapping from the Virtual Paths Configuration dialog box, accessed as explained on page 6-62.
2. Click on the Delete button.
The virtual path mapping is deleted with no warning.

Users, groups and realms

The Web server utilises a user database as part of its security mechanism.

Using the information held in this database, in conjunction with access control lists[3], you can control which users are permitted to access certain data on the Web server. For example, you may want to limit directory access or file access to a few known and trusted users.

The Web server also supports groups and realms:

Configuring realms

Realms are configured via the Realms dialog box.

To access the Realms dialog box:

1. Click on the Advanced features button of the main WEBsweeper dialog box, shown on page 6-61. This displays the Advanced Features dialog box, shown on page 6-62.
2. Click on the Realms... button of the Advanced Features dialog box, to display the Realms dialog box, as shown below.

Using the Realms dialog box you can:

To create a new realm:

1. Type the realm name into the name field of the Realms dialog box, shown on page 6-67.
2. Click the New button.

To delete an existing realm:

1. Select the realm from the Realms dialog box, shown on page 6-67.
2. Click on the Delete button. An empty realm is deleted with no warning.

If the realm contains any users or groups, you are first prompted to confirm that you wish to delete it.

You cannot delete the Default realm that is associated with the main data directory (see page 6-49).

To specify the location of the Global Access Control (GAC) file for a realm:

1. Select the realm from the Realms dialog box, shown on page 6-67.
2. Click on the Access button. A dialog box is displayed into which you can type the directory where the GAC file is located. This file controls access to the realm and is always called $HTTPS$.GAC.

If you have only one realm, that is, the Default realm, you will only have one GAC file. This file controls access to the entire Web server and is always located in the main data directory. The main data directory is specified using the Data directory field of the main WEBsweeper dialog box. See page 6-49 for more details.

Configuring users

To manage the users in a realm:

1. Select the realm from the Realms dialog box, shown on page 6-67. See page 6-73 for a shortcut if you are managing users in the Default realm only.
2. Click on the Users button to display the following dialog box.

The dialog box contains a list of all users in the selected realm. Initially it will be blank. You can add to, change or delete the names on this list:

Configuring groups

To manage the groups in a realm:

1. Select the realm from the Realms dialog box, shown on page 6-67.
2. Click on the Groups button to display the following dialog box.

The dialog box displays a list of groups currently belonging to the realm. The realm name is reflected in the title of the dialog box.

Using the groups dialog box you can:

See page 6-73 for a shortcut if you are managing groups in the Default realm only.

To create a new group for the realm:

1. Type the group name into the name field of the dialog box shown on the previous page.

2. Click the New button. At this point a members dialog box is automatically displayed which allows you to add users to the group.

See page 6-72 for details on how to manage group membership using this dialog box.

Group names may be up to 64 characters long. They may not contain the characters '~', '\', '$' or space, control characters, or characters with codes greater than 127.
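The group naming rules can be checked in the same way. This hypothetical Python sketch assumes that a group name must contain at least one character, which the text does not state.

```python
FORBIDDEN = set("~\\$ ")  # tilde, backslash, dollar, space


def is_valid_group_name(name: str) -> bool:
    """Check a group name against the rules above (hypothetical helper).

    Assumes a group name must contain at least one character.
    """
    if not 1 <= len(name) <= 64:
        return False
    # Every character must be printable ASCII (codes 32-126) and not forbidden.
    return all(ch not in FORBIDDEN and 32 <= ord(ch) <= 126 for ch in name)
```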

To delete a group from the realm:

1. Select the group name.
2. Click on the Delete button. The group is deleted with no warning.

Group membership

To manage group membership:

1. Select the group from the Groups dialog box, shown on page 6-70.
2. Click on the Members button. A dialog box is displayed which allows you to control group membership. The title of the dialog box is the name of the group. This dialog box is displayed automatically when you are creating a new group.

The dialog box displays two lists. The left-hand list contains users who are Non-members of the group. The right-hand list contains the users who are Members of the group. For a new group, initially all users will be displayed in the Non-members list. (See page 6-69 for details on how to associate users with a realm.)

Using these lists you can add and delete users from the group:

You can also use this dialog box to set a descriptive name for the group. To do this, type the descriptive name into the Group full name field.

Managing users and groups for the Default realm only

Shortcuts in the Advanced Features dialog box let you manage users and groups for the Default realm only.

To access the Advanced Features dialog box:

1. Click on the Advanced features button of the main WEBsweeper dialog box, shown on page 6-61.

Click on the Users... button as a shortcut to the User management dialog box (see page 6-69) for the Default realm only.

Click on the Groups... button as a shortcut to the Groups management dialog box (see page 6-70) for the Default realm only.

The Default realm is the realm that is always associated with the main data directory. The main data directory is specified using the Data directory field of the main WEBsweeper dialog box. See page 6-49 for more details.

Access control

Access to the Web server can be controlled at four levels:

1. Global access
2. Virtual path access
3. Directory access
4. File access

An HTTP request from a Web browser is checked against each of the access levels shown above, in sequence. That is, starting at Global access, then Virtual path access, then Directory access, then File access. For a file to be accessible, access must be permitted at each of these levels.

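The Web server authorises access based on the following parameters: the HTTP method of the request (for example, GET), the user name and the groups that user belongs to, and the IP address or domain name of the Web browser's host.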

Access at each level is authorised by comparing the above parameters against access control lists. These lists are held in what are known as Access Control List (ACL) files.

Each access level is checked in turn, assuming an ACL file is present at that level. If the HTTP request is permitted it is processed as normal. If it is rejected the user is informed accordingly and may be prompted for further authorisation information, for example, a user name and a password.
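The order in which the levels are checked can be sketched as follows. In this hypothetical Python example, the contents of each level's ACL file are passed in, with None standing for a file that is absent or unreadable. The level_allows callback stands in for evaluating one ACL file. Stopping at the first level that rejects the request is an assumption about how the server short-circuits.

```python
from typing import Callable, Optional, Sequence

# The four access levels, in the order the Web server checks them.
LEVELS = ("global", "virtual path", "directory", "file")


def first_rejecting_level(
    acl_texts: Sequence[Optional[str]],
    level_allows: Callable[[str], bool],
) -> Optional[str]:
    """Return the name of the first level that rejects the request, or None.

    acl_texts holds the GAC, VAC, DAC and file ACL contents in that order;
    None means the file is absent or unreadable, which counts as "allowed".
    level_allows is a hypothetical callback that evaluates one ACL file.
    """
    if len(acl_texts) != len(LEVELS):
        raise ValueError("expected one entry per access level")
    for level, text in zip(LEVELS, acl_texts):
        if text is None:
            continue  # no ACL file at this level: access is allowed here
        if not level_allows(text):
            return level  # rejected; the sketch does not check later levels
    return None  # permitted at every level
```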

All ACL files use a common syntax. Their names start with $HTTPS$, as do the names of the directories that hold file-level ACLs. For this reason, the Web server treats any URL containing the character sequence $HTTPS$ as invalid, which ensures that Web browsers cannot retrieve the ACL files.

If you have directory browsing enabled you may find it useful to hide the ACL files. This can be achieved using the View/Options menu accessed via Windows NT Explorer. See page 6-50 for more details on directory browsing.

Global access control

Access to a realm[4] is controlled by a Global Access Control (GAC) file.

This file must always be called $HTTPS$.GAC.

If you only have one realm, you will only have one GAC file, which controls access to the entire Web server. This realm is always called Default and always has its GAC file located in the main data directory. The data directory is specified using the Data directory field of the main WEBsweeper dialog box. See page 6-48 for more details.

If you use more than one realm, each realm can have a separate GAC file. GAC files are associated with realms using the Realms dialog box. See page 6-67 for more details.

If the $HTTPS$.GAC file is not present, or cannot be accessed by the Web server, the server assumes that global access is allowed.

Virtual path access control

Access to a virtual path and its directories is controlled by a Virtual Path Access Control (VAC) file.

This file must always be called $HTTPS$.VAC, and it must be located in the virtual path directory.

For example, a virtual path might be ~invoices, with a directory path of c:\invoices. The VAC file would therefore be located in the c:\invoices directory.

See page 6-60 for more details on how to associate virtual path names with directory paths.

If the $HTTPS$.VAC file is not present or cannot be accessed by the Web server, the server assumes that virtual path access is allowed.

Directory access control

Directory access is controlled by a Directory Access Control (DAC) file.

This file must always be called $HTTPS$.DAC and it must be located in the directory it is protecting.

For example, if directory browsing is enabled you may wish to control who can browse a particular directory. This is achieved by creating a DAC file for the directory. See page 6-50 for more details on directory browsing.

If the $HTTPS$.DAC file is not present or cannot be accessed by the Web server, the server assumes that directory access is allowed.

File access control

You can control access to individual files within a directory, by providing an access control list (ACL) for each file.

The access control lists for the files you wish to protect are stored in a subdirectory which must always be called $HTTPS$.ACL. Each ACL is stored in a file within this directory. It has the same name as the file it is controlling access to, but with the extension .ACL.

For example, assume you have a directory named sales and it contains two files, products.htm and contacts.htm.

To protect the file contacts.htm, you would create a subdirectory of sales, called $HTTPS$.ACL. Within that directory, you would then create a file called contacts.acl which contains the ACL directives responsible for controlling access to contacts.htm.

As a result, everyone with access to the sales directory will have access to the products.htm file, but only those allowed access through the ACL file sales/$HTTPS$.ACL/contacts.acl will have access to contacts.htm.

If the $HTTPS$.ACL directory is not present or cannot be accessed by the Web server, the server assumes that access to all files is allowed.
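The file-naming conventions in this section determine where the server looks for each ACL file. The Python sketch below lists those paths for one request. It assumes that only the DAC file in the requested file's own directory is consulted (not those of parent directories), that no VAC file applies to files served from the main data directory, and that the $HTTPS$ check on URLs is case-insensitive. The names acl_paths and url_is_allowed are hypothetical.

```python
from pathlib import PureWindowsPath
from typing import List, Optional


def acl_paths(gac_dir: str, virtual_dir: Optional[str], requested_file: str) -> List[PureWindowsPath]:
    """List the ACL files consulted for one request, in checking order.

    gac_dir:        directory holding the realm's $HTTPS$.GAC file
    virtual_dir:    directory path of the virtual path, or None when the
                    file is served from the main data directory
    requested_file: full path of the requested file
    """
    target = PureWindowsPath(requested_file)
    paths = [PureWindowsPath(gac_dir, "$HTTPS$.GAC")]
    if virtual_dir is not None:
        paths.append(PureWindowsPath(virtual_dir, "$HTTPS$.VAC"))
    # Only the DAC file in the file's own directory (assumption).
    paths.append(target.parent / "$HTTPS$.DAC")
    # Per-file ACL: same base name, extension replaced by .acl.
    paths.append(target.parent / "$HTTPS$.ACL" / target.with_suffix(".acl").name)
    return paths


def url_is_allowed(url: str) -> bool:
    """URLs containing $HTTPS$ are refused (case-insensitive: an assumption)."""
    return "$HTTPS$" not in url.upper()
```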

Syntax of the Access Control List (ACL) files

An ACL file is an ASCII text file which can be read and written with any normal text editor. It consists of lines of text, each line up to 128 characters long. All ACL files use the same syntax, regardless of the access level they control. ACL files are not case sensitive.

Each line in an ACL file is of the form:

<header-name>: <method>, <name>, <host-mask>

<header-name> can be one of:

Allow-user
Reject-user
Allow-group
Reject-group
Redirect-to[5]

<method> can be one of the HTTP methods supported by the server, for example, GET, HEAD or POST. It can also be the wildcard character (*) which represents any method.

<name> should be a user name or a group name within the realm associated with the virtual path under which the ACL file is located. See page 6-66 for details on configuring users, groups and realms.

The wildcard character (*) can be used instead of a user name, that is, in lines where the <header-name> is Allow-user or Reject-user. It cannot be used instead of a group name, that is, in lines where the <header-name> is Allow-group or Reject-group.
 

<host-mask> should be a dot-separated numeric IP address, possibly including wildcards, or a fully-qualified domain name, possibly with an initial wildcard character (*). See page 6-31 for a description of valid IP masks you can use.
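How a <host-mask> might be matched against a connecting Web browser is sketched below. This is a hypothetical Python illustration, not the server's code. It assumes that numeric masks are compared octet by octet, that a leading * in a domain mask matches any prefix, and that the caller supplies the browser's host name (for example, from a reverse DNS lookup).

```python
def host_mask_matches(mask: str, ip: str, hostname: str = "") -> bool:
    """Match a <host-mask> against a Web browser's address (hypothetical helper).

    ip is the dotted IPv4 address of the Web browser; hostname is its
    fully-qualified domain name if known (e.g. from a reverse DNS lookup).
    """
    mask = mask.strip().lower()
    if mask == "*":
        return True  # any host
    mask_parts = mask.split(".")
    if all(p == "*" or p.isdigit() for p in mask_parts):
        # Numeric mask such as 129.215.*.*: compare octet by octet.
        ip_parts = ip.split(".")
        return len(mask_parts) == len(ip_parts) == 4 and all(
            m == "*" or (q.isdigit() and int(m) == int(q))
            for m, q in zip(mask_parts, ip_parts)
        )
    # Domain mask: a leading * matches any prefix, e.g. *.example.com.
    host = hostname.strip().lower()
    if mask.startswith("*"):
        return host.endswith(mask[1:])
    return host == mask
```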

An example ACL file is as follows:

Reject-user: GET,fred,129.215.112.3
Allow-group: *,salesgroup,*.example.com
Allow-user: *,joe,*
Allow-user: GET,,129.215.*.*

The first line of the ACL file (Reject-user) disallows the GET operation, at the level protected by the ACL file, when it is performed by the user fred from IP address 129.215.112.3.

The second line of the ACL file (Allow-group) allows users in the salesgroup to perform any of the HTTP methods, that is, GET, HEAD or POST, from any Web browser in the example.com domain.

The third line of the ACL file (Allow-user) allows the user joe to execute any of the HTTP methods from any IP address.

The fourth line of the ACL file (Allow-user) allows the GET operation by anonymous users from any IP address starting with 129.215.

The ordering of lines in the ACL file is very important. The first line that matches the request determines whether the request is allowed or rejected; no subsequent lines in the file are checked once a match is found.

If none of the lines in the ACL file matches, the request is always rejected.

For example, assume user fred, a member of the salesgroup, tries to perform a GET from address 129.215.112.3. In the example above, this request matches the first line so the request will be rejected.

The same request, performed by any other member of the salesgroup, will match the second line of the file, so the request will be allowed.
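The first-match rule can be sketched in Python as follows, using the example ACL file above. The evaluate_acl function is hypothetical. It assumes that an empty <name> matches only anonymous requests, and that * in a user line matches any named user but not anonymous ones. Host masks are matched octet by octet (numeric masks) or by suffix after a leading * (domain masks). Redirect-to lines, and any line this sketch cannot parse, are skipped.

```python
USER_OR_GROUP = ("allow-user", "reject-user", "allow-group", "reject-group")


def _host_ok(mask: str, ip: str, hostname: str) -> bool:
    """Numeric masks match octet by octet; domain masks by suffix after '*'."""
    if mask == "*":
        return True
    parts = mask.split(".")
    if all(p == "*" or p.isdigit() for p in parts):
        ip_parts = ip.split(".")
        return len(parts) == len(ip_parts) == 4 and all(
            m == "*" or (q.isdigit() and int(m) == int(q))
            for m, q in zip(parts, ip_parts))
    host = hostname.lower()
    return host.endswith(mask[1:]) if mask.startswith("*") else host == mask


def evaluate_acl(acl_text, method, user, groups, ip, hostname=""):
    """Return True to allow or False to reject, using first-match semantics.

    user is '' for an anonymous request; groups holds the names of the
    groups the user belongs to in the realm.
    """
    method, user = method.lower(), user.lower()  # ACL files ignore case
    groups = {g.lower() for g in groups}
    for raw in acl_text.splitlines():
        header, sep, rest = raw.partition(":")
        header = header.strip().lower()
        if not sep or header not in USER_OR_GROUP:
            continue  # blank line, Redirect-to, or unrecognised line: skipped
        fields = [f.strip().lower() for f in rest.split(",")]
        if len(fields) != 3:
            continue  # malformed line: skipped by this sketch
        line_method, name, mask = fields
        if line_method not in ("*", method):
            continue
        if header.endswith("-user"):
            # '' matches anonymous requests; '*' any named user (assumption).
            name_ok = name == user or (name == "*" and user != "")
        else:
            name_ok = name in groups
        if name_ok and _host_ok(mask, ip, hostname):
            return header.startswith("allow")  # first match decides
    return False  # no line matched: always rejected
```

With the example file, a GET by fred from 129.215.112.3 stops at the Reject-user line, while the same GET by another salesgroup member from a host in example.com is allowed by the Allow-group line.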

For performance reasons, the Web server maintains an internal cache of information read from ACL files. This means that if you change any ACL file, you must empty the Web server cache. This is achieved by removing all the files in the directory specified by the Cache directory field of the Proxy Caching dialog box. See page 6-37 for details.
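A minimal Python sketch of emptying the cache, assuming that only the files directly inside the cache directory need to be removed; any subdirectories are left untouched. Point it only at the directory shown in the Cache directory field. The function name empty_cache is hypothetical.

```python
from pathlib import Path


def empty_cache(cache_dir) -> int:
    """Delete every file directly inside cache_dir and return how many were removed.

    Subdirectories are left alone: whether the cache uses them is not stated,
    so this sketch does not recurse.
    """
    removed = 0
    for entry in Path(cache_dir).iterdir():
        if entry.is_file():
            entry.unlink()
            removed += 1
    return removed
```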

URL redirection

The Redirect-to directive can also be used in ACL files at file access level. It is used when a file has been moved to another location but you still expect requests for the original location.

This directive can only be placed in access control lists that control file access. See page 6-76 for more details.

For example, if the file:

www.example.com/accounts/invoice.htm

is moved to

www.example.com/sales/inv.htm

The access control list for the original file, invoice.htm, could have the Redirect-to directive added to ensure that all authorised requests can still access the file.

The Redirect-to directive must contain the full URL of the new file location.

For example:

Reject-user: GET,fred,129.215.112.3
Allow-group: *,salesgroup,*.example.com
Allow-user: *,cja,*
Allow-user: GET,,129.215.*.*
Redirect-to: HTTP://www.example.com/sales/inv.htm

The Allow and Reject directives are always processed before the Redirect-to directive. This ensures that the user is authorised to access the original file before being redirected.
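The processing order above can be sketched as follows. In this hypothetical Python example, the Allow/Reject decision is assumed to have been made already and is passed in as allowed. The sketch only shows that a Redirect-to line is acted on after authorisation succeeds. This section does not describe how the server sends the new location to the browser, so the sketch just returns the URL.

```python
def redirect_target(acl_text):
    """Return the URL from a Redirect-to line in a file-level ACL, or None."""
    for raw in acl_text.splitlines():
        header, sep, value = raw.partition(":")
        if sep and header.strip().lower() == "redirect-to":
            return value.strip()  # keep the URL exactly as written
    return None


def handle_file_request(allowed, acl_text):
    """Decide the outcome once the Allow/Reject lines have been evaluated.

    allowed is the result of the Allow/Reject check, computed elsewhere.
    """
    if not allowed:
        return ("reject", None)  # never redirect an unauthorised request
    target = redirect_target(acl_text)
    if target is not None:
        return ("redirect", target)
    return ("serve", None)
```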






1 For details on the rest of the information that can be supplied on this screen, see page 6-58.

2 Directory browsing is enabled when the Permit directory browsing box of the main WEBsweeper dialog box is checked, see page 6-50 for details.

3 For more details on access control lists, see page 6-74.

4 A realm is a distinct set of user names and group names. See page 6-66 for more details on groups and realms.

5 Redirect-to is used in ACL files at file access level only, for URL redirection. See page 6-79 for details.

msw.support@mimesweeper.com

Copyright © 1998, Content Technologies Limited. All rights reserved.