The WEBsweeper management facilities allow a number of features to be configured, in both the proxy server and the Web server. Some of these features are listed below and explained in the rest of this section.
Configuring the HTTP and FTP proxy server:
The WEBsweeper management facilities are accessed by double-clicking on the WEBsweeper icon, found in the Control Panel.
This displays the main WEBsweeper dialog box, as shown below.
Most of the proxy server and Web server features listed on the previous page can be configured via this dialog box. An exception to this is event logging, which has its own configuration section in the main WEBsweeper configuration file, WEBSWP.CFG. It is recommended that you do not change the configuration details for event logging without assistance from technical support.
See page 6-31 for details on how to restrict access to the proxy server.
See page 6-36 for details on how to chain the WEBsweeper proxy server to another proxy server.
See page 7-43 for details on how to configure inform messages.
The content security section on page 5-1 shows some examples of how to configure WEBsweeper to best meet the needs of your company.
The proxy server must be enabled before you can configure any of the features discussed in this section.
Access to the WEBsweeper proxy server can be restricted to certain Web browsers on your network.
Using the Proxy Configuration dialog box, shown on page 6-30:
The * character on its own indicates that all Web browsers are permitted access.
Blocking access to known URLs can be achieved by:
You can use the Proxy Configuration dialog box, shown on page 6-30, to block known URLs. This is achieved with the following steps:
WEBsweeper will subsequently not serve any of the URLs listed.
The URL mask comprises three parts:
://
). /
).
Configuration information is specified in the [URLBlockers] section of the main WEBsweeper configuration file, WEBSWP.CFG.
;[URLBlockers]
By default this section is commented out and contains no information.
To add your own third-party URL blocker, first ensure that the [URLBlockers] configuration section is no longer commented out.
;[URLBlockers]
[URLBlockers]
Next, add a directive to the configuration section for each of the URL blockers you wish to use.
[URLBlockers]
Blocker1=c:\MSW\Config\blocker.dll
The name of the directive is the name of the configuration section for the blocker; the value is the path of the blocker .DLL.
The blocker configuration section must be listed in the same file as the [URLBlockers] section, that is, WEBSWP.CFG.
[Blocker1] ... <configuration information> ...
This section contains configuration information specific to the third-party blocker. Refer to the appropriate manual for more information.
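The relationship between the [URLBlockers] section and the per-blocker sections can be sketched in Python. This is an illustration only: it assumes WEBSWP.CFG can be read as an INI-style file by Python's configparser, and the ExampleSetting entry is a hypothetical placeholder for whatever a real blocker requires.

```python
import configparser

# Sample text modelled on the fragments above; ExampleSetting is hypothetical.
SAMPLE_CFG = r"""
[URLBlockers]
Blocker1=c:\MSW\Config\blocker.dll

[Blocker1]
ExampleSetting=1
"""

def list_url_blockers(cfg_text):
    """Return {section name: (DLL path, blocker settings)} for each directive."""
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    parser.optionxform = str  # keep directive names exactly as written
    parser.read_string(cfg_text)
    if not parser.has_section("URLBlockers"):
        return {}  # section is absent or still commented out with ';'
    blockers = {}
    for name, dll_path in parser.items("URLBlockers"):
        # The directive name doubles as the name of the blocker's own section.
        settings = dict(parser.items(name)) if parser.has_section(name) else {}
        blockers[name] = (dll_path, settings)
    return blockers
```

Calling list_url_blockers(SAMPLE_CFG) pairs Blocker1 with its DLL path and its own settings, while a file in which the section header is still prefixed with a semicolon yields no blockers.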
Subsequently, details of each URL will be passed to the blocker, which will return a status indicating whether the page is to be blocked. WEBsweeper will replace the blocked page with an information message specified by the blocker.
MIMEsweeper provides a development kit for writing your own URL blockers which includes a sample URL blocker. The development kit can be found in the UBK directory on the MIMEsweeper CD.
See the proxy chaining section on page 6-36 for more details.
Chaining a URL blocker to WEBsweeper is not as efficient as using a URL blocker written specifically for WEBsweeper. This should be avoided if the required URL blocker is available as a plug-in.
You can specify a proxy server to which WEBsweeper is to chain. All requests will subsequently be channelled through to this proxy server. You may find this facility useful, for example, if you are running WEBsweeper as a local cache, on behalf of a department within your company, and your company itself makes use of a proxy server to talk to the Internet. See page 3-9 for more details on proxy chaining.
Using the Proxy Configuration dialog box, shown on page 6-30:
The proxy server to which WEBsweeper is chaining may use a port other than 80, for example, port 8080. The entry in the Chained proxy field should reflect this, for example, 193.112.243.1:8080.
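As a rough illustration of the address:port form, the following Python sketch splits a Chained proxy entry into its two parts. The function name is hypothetical, and treating port 80 as the default when no port is given is an assumption drawn from the note above, not a documented rule.

```python
def parse_chained_proxy(entry, default_port=80):
    """Split 'address[:port]' into (address, port)."""
    entry = entry.strip()
    host, sep, port = entry.rpartition(":")
    if not sep:
        # No colon present: assume the default port (an assumption).
        return entry, default_port
    return host, int(port)
```

For example, the entry 193.112.243.1:8080 yields the address 193.112.243.1 and the port 8080.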
You can specify a list of URLs or URL masks for which the chained proxy is not called. Type this list into the Don't chain URLs area of the Proxy Configuration dialog box. Each URL or URL mask must be separated by a space or a new line (to start a new line press <Ctrl> and <Enter>). See page 6-33 for an explanation of URL masks.
If a user subsequently asks for any of the URLs listed then the proxy named in the Chained proxy area won't be called for that URL. WEBsweeper will retrieve the URL directly and validate it.
Data is cached only if it passes validation; otherwise it is discarded.
To enable WEBsweeper to cache files:
The caching information you can configure via this dialog box is discussed on the next two pages.
The Cache directory specified must be accessible to the WEBsweeper proxy server. If it is located on a different fileserver you will need to specify the UNC name of the fileserver, for example, \\integ486\scsi3\cache.
The cache may grow beyond the maximum size throughout the day, but it is purged down to the specified limit at a pre-determined time every day, as specified in the Purge cache at field. As a guide, set the maximum cache size to be about 25% less than the actual maximum you can tolerate.
Lifetime of unread cache files
Type in a list of URL masks (see page 6-33 for a description) and their corresponding time periods. Enter the time period in hours, days, weeks, or some combination of these. For example, 3 weeks 5 hours 2 days.
Each line should contain only one URL mask and a time, separated by a space. To start a new line press <Ctrl> and <Enter>.
If the URL of a cached file matches a URL mask, the cached file is deleted after it has been unused for the specified time period. If the cached file's URL does not match any URL mask then the Default time period is used.
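The time period format can be illustrated with a short Python sketch that converts an entry such as 3 weeks 5 hours 2 days into a single duration. The function names are hypothetical, accepting singular unit names is an assumption, and the URL mask in the usage note is only a placeholder (see page 6-33 for the real mask syntax).

```python
import re
from datetime import timedelta

_UNIT_HOURS = {"hour": 1, "day": 24, "week": 24 * 7}
_PERIOD_PART = re.compile(r"(\d+)\s*(hour|day|week)s?", re.IGNORECASE)

def parse_period(text):
    """Convert e.g. '3 weeks 5 hours 2 days' into a timedelta."""
    parts = _PERIOD_PART.findall(text)
    if not parts:
        raise ValueError(f"no time period found in {text!r}")
    # Units may appear in any order and are simply added together.
    hours = sum(int(count) * _UNIT_HOURS[unit.lower()] for count, unit in parts)
    return timedelta(hours=hours)

def parse_lifetime_line(line):
    """Split one 'mask period' line into (mask, timedelta)."""
    mask, _, period = line.strip().partition(" ")
    return mask, parse_period(period)
```

A line such as `http://www.example.com/* 2 days` would be split into the mask and a two-day period; the combined period 3 weeks 5 hours 2 days comes to 23 days and 5 hours.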
Maximum lifetime of cache files
Type in a list of URL masks (see page 6-33 for a description of URL masks) and their corresponding time periods. Enter the time period in hours, days, weeks, or some combination of these. For example, 3 weeks 5 hours 2 days, 3 weeks 2 days, 2 days, and so on.
Each line should contain only one URL mask and a time, separated by a space. To start a new line press <Ctrl> and <Enter>.
Inform messages generated by WEBsweeper are only cached for five minutes.
If the URL of a cached file matches a URL mask, the cached file is deleted once the specified time period has elapsed since it was retrieved. If the cached file's URL does not match any URL mask then the Default time period is used.
For example, if you specify the maximum lifetime of a file as two days then the file will be deleted from the cache after two days, regardless of whether it has been used within that time period or not.
The lifetime of a cached file is determined at the time the file is cached. If you change the lifetime, the change only applies to new additions to the cache, not existing files.
The Don't cache URLs area specifies a list of URLs and URL masks which are never to be cached.
Type in the URLs or URL masks (see page 6-33 for a description of URL masks).
Each URL or URL mask must be separated by a space or a new line. To start a new line press <Ctrl> and <Enter>.
Logging HTTP requests can give you useful information on the activities of the Web browsers on your network, for example:
All this information can be determined from the log files and can be used to help formulate strategies on proactive caching, maximising network bandwidth, charging for usage and so on.
To log all HTTP requests that WEBsweeper receives from Web browsers, check the Log HTTP Transactions box of the main WEBsweeper dialog box, shown on page 6-28.
The log file is stored in the directory specified by the Log directory field. In the above example the log directory is C:\MSW\Log. A single formatted line is written to the log file for each request that the Web server or proxy server receives.
A new log file is created every day. For performance reasons, the current log file is kept open until the first HTTP transaction of the following day. When this transaction occurs, the preceding day's log file is closed, a new log file is opened, and the transaction is logged to it.
The format of the log file contents is controlled by a log template, specified in the Log template field of the main WEBsweeper dialog box.
This log template may contain a number of formatting tokens. Each token starts with a % character, and is followed by a single upper-case or lower-case letter.
The case of the formatting token is significant as upper-case and lower-case letters will usually have different replacement parameters.
%R %i %u [%d/%b/%Y:%H:%M:%S %O] "%q" %s %n
The above is the default template and corresponds to the Common Logfile Format used by many other HTTP servers.
When a log file entry is written, each formatting token in the template is replaced by its corresponding parameter. The replacement parameters are shown in the tables on the next three pages.
If a particular parameter is unavailable, it will be represented as a dash (-) in the log file entry. Any other information in the template is copied to the log file entry without change.
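The substitution rules described above can be sketched in Python as follows. This is not WEBsweeper's implementation: the function name is hypothetical, and the parameter values in the usage note are invented, because the meaning of each token is defined only in the replacement parameter tables.

```python
import re

def expand_template(template, params):
    """Replace each %<letter> token with its parameter, or '-' if unavailable.

    params maps a single case-sensitive token letter to its value.
    """
    def replace(match):
        value = params.get(match.group(1))
        return "-" if value is None else str(value)

    # Only '%' followed by one letter is a token; all other text is copied as is.
    return re.sub(r"%([A-Za-z])", replace, template)
```

For instance, expanding the template `%R %i %u "%q" %s %n` with values supplied only for R, q and s leaves dashes in place of the i, u and n tokens and keeps the quotation marks and spaces unchanged.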
The log template string itself is written to the log file every time the log file is opened, for example, when a new log file is created, when the server is started, or when it is reconfigured. If you do not want the log template string to be written to the log file, ensure that the Strict Common Log Format box is checked on the main WEBsweeper dialog box.
The default log template can be changed by ensuring that the Strict Common Log Format box is unchecked and then editing the tokens listed in the Log Template field.
The time recorded is the time at which the request was received, not the time at which it was completed.
The other method of logging WEBsweeper uses is HTTP transaction logging. See page 6-40 for details.
Configuration details for the event log are found in the main WEBsweeper configuration file, WEBSWP.CFG.
[Logging]
EventLog=3
[EventLog]
EventSource=WEBsweeper
EventId=3221225496
StreamType=AppEvent
MaxLevel=Brief
It is recommended that you do not change the configuration details for event logging without assistance from technical support.
To view the Windows NT application event log:
To view more details on any of the events listed in the application event log, double-click on the entry. Alternatively, select the entry and then Detail from the View menu.
An Event Dialog box is displayed showing more information about the selected event.
WEBsweeper can be configured to issue SNMP traps to an SNMP Manager at startup and shutdown.
Configuration details for SNMP traps are found in the main WEBsweeper configuration file, WEBSWP.CFG. These details reflect the information entered during installation.
[SNMPTrapConfig]
Community=public
TargetAddress=195.121.24.11
The Web server serves data to Web browsers on the Internet.
You can configure certain features of the Web server via the main WEBsweeper dialog box, shown below. See page 6-28 for details on how to access this dialog box.
The rest of this section provides details on the following features:
The Data directory field of the main WEBsweeper dialog box, shown on page 6-48, is used to specify the root data directory for all requests served by the WEBsweeper host. In this example the data directory is C:\MSW\Data. The default data directory is C:\HTTP.
Any Web browser connecting to www.example.com, where www is the name of the WEBsweeper host, will only have access to files in the data directory specified and any of its subdirectories.
You must locate all files you wish to make available to Web browsers within the root data directory tree. Points above the root data directory, or on other disks, are not accessible to Web browsers. The only exception to this is if you are also using virtual path mappings. See page 6-60 for more details on virtual path mappings.
If you require the Data directory to be a mapped drive you will have to use the UNC form of:
\\server\volume\directory
For example, in \\integ486\MSW\Data, integ486 is the name of the server and MSW is the name of the volume on which the directory is located. Data is the name of the root directory which contains the information.
Directory browsing enables a user to see all the files contained in the root data directory (see page 6-49) and any of its subdirectories. By default, directory browsing is disabled as it can be very insecure, but you can enable it if desired.
To enable directory browsing, ensure that the Permit directory browsing box is checked on the main WEBsweeper dialog box, shown on page 6-48.
Subsequently, the manner in which you have configured the contents of the data directory, its subdirectories, and directory browsing, will determine what information is sent to requesting Web browsers, as explained below.
For example, suppose that a Web browser asks for a URL of the form:
http://www.example.com/MSW/Data/invoices/
The Web Server will do one of three things, depending on the contents of the invoices directory and on how it is configured:
You can change the default filename used. This filename is specified by the entry in the Default file name field of the Other Configuration Options dialog box. See page 6-56 for details.
The icon used to represent a file is determined by the MIME type of the file. The MIME type of a file is, in turn, determined by its file extension. See page 6-51 for details on how to associate file extensions to MIME types. See page 6-55 for details on how to associate MIME types to icons.
The icon used to represent a directory is also configurable; see page 6-55 for details. If you don't want a particular directory to be browsable, create a file called NOBROWSE in that directory. The contents of the file are not important, just its presence.
The Web server infers the MIME type of a file from the filename extension, using a mapping table.
The default MIME type used is application/octet-stream. The default contents of the mapping table are shown on page 6-54.
You can configure the contents of the mapping table, as required. This is achieved using the File extension to MIME Type mapping area of the main WEBsweeper dialog box, shown on page 6-48.
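The lookup performed with the mapping table can be sketched in Python. The two table entries below are illustrative assumptions only (the real default table is on page 6-54), and matching extensions without regard to case is also an assumption.

```python
import os

DEFAULT_MIME_TYPE = "application/octet-stream"

# Hypothetical sample entries; not the product's default table.
EXAMPLE_MAPPINGS = {".htm": "text/html", ".gif": "image/gif"}

def mime_type_for(filename, mappings=EXAMPLE_MAPPINGS):
    """Infer the MIME type from the filename extension, else use the default."""
    extension = os.path.splitext(filename)[1].lower()
    return mappings.get(extension, DEFAULT_MIME_TYPE)
```

A file with no extension, or with an extension that has no mapping, falls back to application/octet-stream.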
Using the File extension to MIME type mapping area you can:
The MIME types associated with file extensions can subsequently be mapped to icons. These icons are displayed to represent the files found when a directory listing is requested by a Web browser. See page 6-55 for details on how to associate MIME types to icons.
To add a new file extension to MIME type mapping:
To change an existing file extension to MIME type mapping:
The Change mapping dialog box is similar to the New mapping dialog box shown on the previous page.
To delete an existing file extension to MIME type mapping:
Refer to RFC 1590, Media Type Registration Procedure, and the IANA registry it describes for a full list of MIME types.
Whenever a Web browser examines a directory, and assuming directory browsing is enabled (see page 6-50 for details), WEBsweeper returns a directory listing which displays icons for each type of file found.
Some useful icons that can be used for this purpose are automatically installed in the icons directory, as .GIF files. These are:
The icon chosen to represent a file depends on the MIME type of the file. In turn, the MIME type of the file is determined by the file extension. See page 6-51 for details on how to associate file extensions to MIME types.
The default icon that is used, when a MIME type mapping is not specified, is default.gif. The default icon used to represent a directory is folder.gif.
You can configure the MIME type to icon mappings via the Other Configuration Options dialog box. Using this dialog box you can:
See the next page for details on how to access the Other Configuration Options dialog box.
To access the Other Configuration Options dialog box:
The MIME type to icon mapping area of this dialog box displays a list of MIME type/subtype entries and their corresponding icon URLs. [1]
Some entries in the list have MIME type/subtype entries of the form:
To add a new MIME type to icon mapping:
To change an existing MIME type to icon mapping:
To delete an existing MIME type to icon mapping:
Type the appropriate icon URLs into the appropriate fields of the Other Configuration Options dialog box, shown on page 6-56.
The Default icon URL field specifies the icon used when a MIME type mapping is not specified. The Folder icon URL field specifies the icon used to represent directories.
The Other Configuration Options dialog box also allows you to
Configuring the default file name
The Default file name field allows you to specify the file that is sent to the Web browser when a URL does not specify a filename. It is initially set to Default.htm but you can change it if required. See page 6-50 for more details on directory browsing.
Configuring the number of connect requests
You can, if required, set WEBsweeper to listen on an alternative port, for example, port 8080.
Setting WEBsweeper to use an alternative port allows you to use WEBsweeper with multiple Web servers on the same host.
To change the port that WEBsweeper listens on, type the new port number into the TCP/IP port field of the main WEBsweeper dialog box, shown on page 6-48.
If you change WEBsweeper to listen on a TCP/IP port other than 80, and you have directory browsing enabled [2], then the URL must include the new port number for a Web browser to access the Web server.
http://www.example.com:8080/MSW/Data/
If the Web browsers on your network have been configured to send all requests to the WEBsweeper machine then you must reconfigure them all to point to the new TCP/IP port. See page 3-26 for details.
Virtual paths are a feature that allows your Web server to serve data to the Web from more than one data directory tree. The main data directory is specified using the Data directory field of the main WEBsweeper dialog box. See page 6-48 for details. The remaining data directories are specified using virtual paths.
Virtual paths can be useful in many situations, for example:
A virtual path is an association between a virtual path name and a directory path. For example, a virtual path may have the virtual path name ~invoices, with a corresponding directory path of c:\invoices.
Using the above example, to access a file called c:\invoices\inv.htm you would quote a URL of http://www.example.com/~invoices/inv.htm.
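The mapping from a URL to a file location can be sketched in Python as follows. The function name is hypothetical, and returning None for an undefined virtual path or for a path containing `..` is an assumption made for the sketch; it is not a description of how the Web server responds in those cases.

```python
import ntpath
from urllib.parse import urlsplit

def resolve_path(url, data_dir, virtual_paths):
    """Map a URL to a Windows file path via the data directory or a virtual path."""
    parts = [part for part in urlsplit(url).path.split("/") if part]
    if ".." in parts:
        return None  # never step above the root of a data directory tree
    if parts and parts[0].startswith("~"):
        # First path component starts with a tilde: look for a virtual path.
        base = virtual_paths.get(parts[0])
        if base is None:
            return None  # no virtual path of that name has been defined
        parts = parts[1:]
    else:
        base = data_dir
    return ntpath.join(base, *parts)
```

With a virtual path ~invoices mapped to c:\invoices, the URL http://www.example.com/~invoices/inv.htm resolves to c:\invoices\inv.htm, while URLs without a leading tilde resolve beneath the main data directory.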
When the Web server receives a request with a URL whose first path component starts with a tilde (~) it looks to see if a virtual path of that name has been defined.
For example, using the URL http://www.example.com/~invoices/inv.htm:
You can configure virtual paths via the main WEBsweeper dialog box, shown below.
See page 6-28 for details on how to access this dialog box.
This dialog box shows a list of all existing virtual path names and their corresponding directories.
Using the Virtual Paths Configuration dialog box shown on the previous page, you can:
To create a new virtual path mapping:
The information you can supply via this dialog box is explained on the following page.
The Local host name field specifies the name of the Web server, for example, www.example.com.
If you leave the Local host name field blank then the virtual path may be accessed through any local IP address.
Using the dialog box shown on the previous page you can also:
See page 6-66 for more details on how to create a realm and then assign users and groups to it.
To change an existing virtual path mapping:
To delete an existing virtual path mapping:
The virtual path mapping is deleted with no warning.
The Web server utilises a user database as part of its security mechanism.
Using the information held in this database, in conjunction with access control lists [3], you can control which users are permitted to access certain data on the Web server. For example, you may want to limit directory access or file access to only a few known and trusted users.
The Web server also supports groups and realms:
See page 6-60 for more details on how to configure virtual paths. See page 6-74 for more details on setting up and using access control lists.
If you have only one realm, this controls access to the entire Web server. It is always called the Default realm and is associated with the main data directory. The main data directory is specified using the Data directory field of the main WEBsweeper dialog box. See page 6-49 for more details.
There is always a Default realm present and it is always associated with the main data directory.
Realms are configured via the Realms dialog box.
To access the Realms dialog box:
Using the Realms dialog box you can:
Access to a realm is controlled by these files. See page 6-74 for more details on the GAC files and the other access control files that WEBsweeper uses.
Users and groups associated with the Default realm only can be configured directly from the Advanced Features dialog box. See page 6-73 for details.
You cannot delete the Default realm that is associated with the main data directory (see page 6-49).
To specify the location of the Global Access Control (GAC) file for a realm:
If you have only one realm, that is, the Default realm, you will only have one GAC file. This file controls access to the entire Web server and is always located in the main data directory. The main data directory is specified using the Data directory field of the main WEBsweeper dialog box. See page 6-49 for more details.
To manage the users in a realm:
The dialog box contains a list of all users in the selected realm. Initially it will be blank. You can add to, change or delete the names on this list:
User names may be up to 64 characters long. They may not contain the characters '~', '\', '$', space, control characters or ASCII characters greater than 127. Passwords may be up to 32 characters long.
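The naming rules above can be expressed as a small Python check. The function names are hypothetical, and rejecting an empty user name is an assumption, since no minimum length is stated.

```python
FORBIDDEN_CHARACTERS = set("~\\$ ")

def is_valid_user_name(name):
    """Check a user name against the length and character rules."""
    if not 1 <= len(name) <= 64:
        return False
    # Allowed: printable ASCII (codes 33 to 126) other than '~', '\' and '$'.
    return all(32 < ord(ch) < 127 and ch not in FORBIDDEN_CHARACTERS for ch in name)

def is_valid_password_length(password):
    """Passwords may be up to 32 characters long."""
    return len(password) <= 32
```

The same character rules are stated for group names, so the first check could be reused for them.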
To manage the groups in a realm:
The dialog box displays a list of groups currently belonging to the realm. The realm name is reflected in the title of the dialog box.
Using the groups dialog box you can:
See page 6-73 for a shortcut if you are managing groups in the Default realm only.
To create a new group for the realm:
See page 6-72 for details on how to manage group membership using this dialog box.
Group names may be up to 64 characters long. They may not contain the characters '~', '\', '$', space, control characters or ASCII characters greater than 127.
To delete a group from the realm:
The dialog box displays two lists. The left-hand list contains users who are Non-members of the group. The right-hand list contains the users who are Members of the group. For a new group, initially all users will be displayed in the Non-members list. (See page 6-69 for details on how to associate users to a realm.)
Using these lists you can add and delete users from the group:
You can also use this dialog box to set a descriptive name for the group. To do this, type the descriptive name into the Group full name field.
Managing users and groups for the Default realm only
You can manage users and groups for the Default realm only, via the Advanced Features dialog box.
To access the Advanced Features dialog box:
Click on the Users... button as a shortcut to the User management dialog box (see page 6-69) for the Default realm only.
Click on the Groups... button as a shortcut to the Groups management dialog box (see page 6-70) for the Default realm only.
Access to the Web server can be controlled at four levels:
An HTTP request from a Web browser is checked against each of the access levels shown above, in sequence. That is, starting at Global access, then Virtual path access, then Directory access, then File access. For a file to be accessible, access must be permitted at each of these levels.
The Web server authorises access based on the following parameters:
Access at each level is authorised by comparing the above parameters against access control lists. These lists are held in what are known as Access Control List (ACL) files.
Each access level is checked in turn, assuming an ACL file is present at that level. If the HTTP request is permitted it is processed as normal. If it is rejected the user is informed accordingly and may be prompted for further authorisation information, for example, a user name and a password.
All ACL files use a common syntax. Their names, or for file access the name of the directory in which they are stored, start with $HTTPS$. For this reason, the Web server treats any URL containing the character sequence $HTTPS$ as invalid, which ensures that Web browsers cannot retrieve the ACL files.
Access to a realm [4] is controlled by a Global Access Control (GAC) file.
This file must always be called $HTTPS$.GAC.
If you only have one realm, you will only have one GAC file, which controls access to the entire Web server. This realm is always called Default and always has its GAC file located in the main data directory. The data directory is specified using the Data directory field of the main WEBsweeper dialog box. See page 6-48 for more details.
If you use more than one realm, each realm can have a separate GAC file. The GAC files are associated to realms using the Realms dialog box. See page 6-67 for more details.
If the $HTTPS$.GAC file is not present, or cannot be accessed by the Web server, the server assumes that global access is allowed.
Virtual path access is controlled by a VAC file. This file must always be called $HTTPS$.VAC, and it must be located in the virtual path directory.
For example, a virtual path might be ~invoices, with a directory path of c:\invoices. The VAC file would therefore be located in the c:\invoices directory.
See page 6-60 for more details on how to associate virtual path names to directory paths.
If the $HTTPS$.VAC file is not present or cannot be accessed by the Web server, the server assumes that virtual path access is allowed.
Directory access is controlled by a Directory Access Control (DAC) file.
This file must always be called $HTTPS$.DAC and it must be located in the directory it is protecting.
For example, if directory browsing is enabled you may wish to control who can browse a particular directory. This is achieved by creating a DAC file for the directory. See page 6-50 for more details on directory browsing.
If the $HTTPS$.DAC file is not present or cannot be accessed by the Web server, the server assumes that directory access is allowed.
The access control lists for the files you wish to protect are stored in a subdirectory which must always be called $HTTPS$.ACL. Each ACL is stored in a file within this directory. It has the same name as the file it is controlling access to, but with the extension .ACL.
For example, assume you have a directory named sales and it contains two files, products.htm and contacts.htm.
To protect the file contacts.htm, you would create a subdirectory of sales, called $HTTPS$.ACL. Within that directory, you would then create a file called contacts.acl which contains the ACL directives responsible for controlling access to contacts.htm.
Subsequently, everyone with access to the sales directory will have access to the products.htm file, but only those allowed access through the ACL file sales/$HTTPS$.ACL/contacts.acl will have access to contacts.htm.
If the $HTTPS$.ACL directory is not present or cannot be accessed by the Web server, the server assumes that access to all files is allowed.
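The naming conventions for the four kinds of access control file can be summarised in a Python sketch that lists, in checking order, where each file would be expected for a given target file. The function names and parameters are hypothetical; in particular, the directory holding a realm's GAC file is passed in explicitly because it is configured per realm.

```python
import ntpath

def is_url_permitted_syntax(url):
    """URLs containing the sequence $HTTPS$ are always treated as invalid."""
    return "$HTTPS$" not in url

def acl_locations(gac_dir, virtual_path_dir, file_path):
    """List (level, expected ACL file path) in the order the levels are checked.

    virtual_path_dir is None when the file is not served through a virtual path.
    """
    directory, filename = ntpath.split(file_path)
    stem = ntpath.splitext(filename)[0]
    locations = [("global", ntpath.join(gac_dir, "$HTTPS$.GAC"))]
    if virtual_path_dir is not None:
        locations.append(("virtual path", ntpath.join(virtual_path_dir, "$HTTPS$.VAC")))
    locations.append(("directory", ntpath.join(directory, "$HTTPS$.DAC")))
    # File ACLs live in a $HTTPS$.ACL subdirectory, named after the file.
    locations.append(("file", ntpath.join(directory, "$HTTPS$.ACL", stem + ".acl")))
    return locations
```

For a file contacts.htm in a sales directory, the last entry returned is the path sales\$HTTPS$.ACL\contacts.acl, matching the example above. A level whose file is missing is treated as allowing access, so only the files that actually exist need to be evaluated.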
Each line in an ACL file is of the form:
<header-name>: <method>, <name>, <host-mask>
<method> can be one of the HTTP methods supported by the server, for example, GET, HEAD or POST. It can also be the wildcard character (*) which represents any method.
<name> should be a user name or a group name within the realm associated with the virtual path under which the ACL file is located. See page 6-66 for details on configuring users, groups and realms.
<host-mask> should be a dot-separated numeric IP address, possibly including wildcards, or a fully-qualified domain name, possibly with an initial wildcard character (*). See page 6-31 for a description of valid IP masks you can use.
An example ACL file is as follows:
Reject-user: GET,fred,129.215.112.3
Allow-group: *,salesgroup,*.example.com
Allow-user: *,joe,*
Allow-user: GET,,129.215.*.*
The second line of the ACL file (Allow-group) allows users in the salesgroup to perform any of the HTTP methods, that is, GET, HEAD or POST, from any Web browser in the example.com domain.
The third line of the ACL file (Allow-user) allows the user joe to execute any of the HTTP methods from any IP address.
The fourth line of the ACL file (Allow-user) allows the GET operation by anonymous users from any IP address starting 129.215.
The ordering of lines in the ACL file is very important. The first line that matches the request results in the specified operation being performed. No subsequent lines in the file are checked once a match is found.
If none of the lines in the ACL file match then the request is always rejected.
For example, assume user fred, a member of the salesgroup, tries to perform a GET from address 129.215.112.3. In the example above, this request matches the first line so the request will be rejected.
The same request, performed by any other member of the salesgroup, will match the second line of the file, so the request will be allowed.
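The first-match rule can be sketched in Python using the example ACL file. This is a simplified model rather than the server's implementation: the function names are hypothetical, only Allow and Reject lines are handled, an empty name field is assumed to match any request, and each request is assumed to carry both the browser's IP address and its resolved domain name (pc3.example.com in the usage note is an invented host).

```python
from fnmatch import fnmatchcase

EXAMPLE_ACL = """\
Reject-user: GET,fred,129.215.112.3
Allow-group: *,salesgroup,*.example.com
Allow-user: *,joe,*
Allow-user: GET,,129.215.*.*
"""

def parse_acl(text):
    """Parse Allow/Reject lines into (action, kind, method, name, host_mask)."""
    rules = []
    for line in text.splitlines():
        header, _, rest = line.partition(":")
        action, _, kind = header.strip().lower().partition("-")
        if action not in ("allow", "reject"):
            continue  # blank lines and other directives are ignored in this sketch
        method, name, host_mask = (field.strip() for field in rest.split(","))
        rules.append((action, kind, method.upper(), name, host_mask.lower()))
    return rules

def is_allowed(rules, method, user, groups, hosts):
    """Apply the first matching rule; reject if no rule matches."""
    for action, kind, rule_method, name, host_mask in rules:
        if rule_method not in ("*", method.upper()):
            continue
        if kind == "user":
            name_matches = name == "" or name == user  # empty name: assumption
        else:
            name_matches = name in groups
        if name_matches and any(fnmatchcase(h.lower(), host_mask) for h in hosts):
            return action == "allow"
    return False
```

With hosts given as ("129.215.112.3", "pc3.example.com"), a GET by fred is rejected by the first line, whereas the same request from another member of salesgroup falls through to the second line and is allowed.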
This directive can only be placed in access control lists that control file access. See page 6-76 for more details.
www.example.com/accounts/invoice.htm
The access control list for the original file, invoice.htm, could have the Redirect-to directive added to ensure that all authorised requests can still access the file.
The Redirect-to directive must contain the full URL of the new file location.
Reject-user: GET,fred,129.215.112.3
Allow-group: *,salesgroup,*.example.com
Allow-user: *,cja,*
Allow-user: GET,,129.215.*.*
Redirect-to: HTTP://www.example.com/sales/inv.htm
The Allow directives and Reject directives are always processed before the Redirect-to directive. This ensures that the user is authorised to access the original file before being redirected.
[1] For details on the rest of the information that can be supplied on this screen, see page 6-58.
[2] Directory browsing is enabled when the Permit directory browsing box of the main WEBsweeper dialog box is checked; see page 6-50 for details.
[3] For more details on access control lists, see page 6-74.
[4] A realm is a distinct set of user names and group names. See page 6-66 for more details on groups and realms.
[5] Redirect-to is used in ACL files at file access level only, for URL redirection. See page 6-79 for details.
msw.support@mimesweeper.com
Copyright © 1998, Content Technologies Limited. All rights reserved.