
Usage Import Options

The Options window, available from the Tools menu, allows you to configure several settings for Usage Import. This window comprises eight tabbed groups of choices, as shown in the following figure. You can save these options as default settings for all future imports; otherwise the options are stored in the Windows registry. Any changes you make to the options are in effect while you are working; when you quit and restart Usage Import, the default options are restored.


Configuring Usage Import Options

Undisplayed Graphic

The Usage Import Option configuration settings are explained in the following table.

Configuration setting

Explanation

Drop database indexes

Analysis requires database indexes; however, import is much slower when the database has indexes. Therefore, by default, import drops all database indexes, and analysis re-creates them before it begins. However, once you have accumulated a large amount of usage data in the database, so that each import represents only a small percentage of the data, you’ll want to turn this option off, because re-creating indexes on the large database takes longer than the extra time that import needs when the indexes are left in place.

Adjust request timestamps

If this option is turned on, all time stamps in your log files are adjusted to the selected time zone, from the time zone specified for that site in the Internet site manager. This is useful if you have sites in multiple time zones.

Exclude spiders

Checking this option avoids counting hits by Internet search engines, robots, and any other user agents specified on the Spider list tab. Note: This is the only way to exclude user agents.

Lookup HTML titles

When you enable this option, the Import module performs HTML title lookups on new HTML files added to the database during the log file import. You can perform the same operation manually from the Tools menu.

Resolve IP addresses

When you enable this option, the Import module tries to resolve every unresolved IP address it encounters in the log file.

Whois query for unknown domains

This option instructs Usage Import to perform a Whois query when the organization name is not known. You can perform the same operation manually from the Tools menu.
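
To illustrate what the Adjust request timestamps option does, the following is a minimal sketch that moves a log time stamp from a site's time zone to a selected time zone. The function name, the time stamp format, and the use of fixed UTC offsets in hours are assumptions made for this example; they are not taken from Usage Import itself.

```python
from datetime import datetime, timedelta, timezone

STAMP_FORMAT = "%Y-%m-%d %H:%M:%S"  # assumed log time stamp format


def adjust_timestamp(stamp, site_utc_offset_hours, target_utc_offset_hours):
    """Convert a time stamp recorded in the site's zone to the target zone."""
    site_tz = timezone(timedelta(hours=site_utc_offset_hours))
    target_tz = timezone(timedelta(hours=target_utc_offset_hours))
    # Attach the site's zone to the naive time stamp, then convert it.
    recorded = datetime.strptime(stamp, STAMP_FORMAT).replace(tzinfo=site_tz)
    return recorded.astimezone(target_tz).strftime(STAMP_FORMAT)
```

For example, a request logged at 23:30 on a site eight hours behind UTC falls on the following day, at 02:30, when it is adjusted to a zone five hours behind UTC.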




Configuring IP Resolution

Undisplayed Graphic

IP resolution settings are listed in the following table. For a more complete explanation of IP resolution and how it is handled by the Import module, see "Resolve IP Addresses."

Configuration setting

Explanation

IP resolution cache period

This setting allows you to specify how long the result of an IP lookup is kept before the lookup is repeated. The Import module remembers the IP/host-name combination for the number of days specified. During the cache period, Usage Import automatically converts all IP references to the resolved host name. After the cache period, Import retries resolution. Longer cache settings will speed import and resolution but may miss intervening changes to IP/host-name combinations.

Timeout

Timeout establishes how long Usage Import will search before it enters an IP address as unresolved. Setting a higher timeout gives more complete resolution but slows completion of import. You can run IP resolution manually from the Tools menu.

Batch resolution size

Specifying the batch size for IP resolution allows you to optimize operation of Usage Import to your DNS server. If your server supports a larger number of simultaneous requests, you can increase the setting over the default of 300 for improved performance. Too large a number may crash your DNS server and cause Report Writer to report an artificially large number of unresolved addresses.
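
The cache period and batch size settings can be pictured with a short sketch. This is an illustration only: the class, the function, and the injected resolver are hypothetical names, and the real Import module may work differently. The resolver is passed in so that the example needs no network access.

```python
from datetime import datetime, timedelta


class ResolutionCache:
    """Remember IP/host-name pairs for a fixed number of days."""

    def __init__(self, cache_days, resolver):
        self.period = timedelta(days=cache_days)
        self.resolver = resolver      # callable: ip -> host name
        self.entries = {}             # ip -> (host name, time resolved)

    def lookup(self, ip, now):
        cached = self.entries.get(ip)
        if cached is not None and now - cached[1] < self.period:
            return cached[0]          # still inside the cache period
        host = self.resolver(ip)      # cache period expired: retry resolution
        self.entries[ip] = (host, now)
        return host


def batches(addresses, batch_size=300):
    """Split unresolved addresses into groups of at most batch_size."""
    for start in range(0, len(addresses), batch_size):
        yield addresses[start:start + batch_size]
```

A longer cache period means fewer calls to the resolver, which is the speed and freshness trade-off described in the table above.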




Configuring Log File Overlaps

Undisplayed Graphic

Having time periods overlap in your log files introduces inaccuracies in your database. A number of scenarios can produce time overlaps in log file entries: running logs on separate servers, interrupting and resuming logging on a single server, accidentally re-importing an individual log file, or concatenating distinct log files. The Log File Overlaps window allows you to specify how to treat such redundancies. See the following table for panel settings.

Note
Concatenation of log files makes tracking of overlap extremely difficult. It should not be done.

Configuration setting

Explanation

Overlap period

This setting allows you to specify the period to be considered an overlap by the import module. Shorter periods will reduce apparent overlap but may affect accuracy of later analysis for the period in question.

Action on overlap detection

Import all records: ignores overlap entirely and includes all redundancies in the database (default).

Stop the import: halts the import for the log file in question only and continues any other imports underway.

Stop all imports: halts the current import of all log files.

Discard records and proceed: discards the overlapping records and continues the import.



The adjustable "grace period" in Usage Import makes allowance for variations in how web servers log requests. Some servers log a request at the time it’s received. Others record the time stamp at the time of the request but don’t log it until the transaction is complete. Depending on your individual system, you may want more or less tolerance for such situations, which can produce apparent overlap. For example, if you have an FTP site where users routinely make very large file downloads that take hours to complete, increase the overlap period or use the default Import all records option, which ignores overlaps.
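
One way to picture overlap detection with a grace period is sketched below. It assumes that each log file covers a (start, end) time range and that an overlap counts only when it is longer than the grace period. Both the function names and this exact rule are assumptions for illustration, not a description of how the Import module is implemented.

```python
from datetime import datetime, timedelta


def overlap_duration(imported, new):
    """Return the time shared by two (start, end) ranges, or zero if disjoint."""
    start = max(imported[0], new[0])
    end = min(imported[1], new[1])
    return max(end - start, timedelta(0))


def exceeds_grace(imported_ranges, new_range, grace):
    """True if the new log overlaps any imported log by more than the grace period."""
    return any(overlap_duration(r, new_range) > grace for r in imported_ranges)
```

With a 30-minute grace period, a new log that shares ten minutes with an earlier import is accepted; with a 5-minute grace period, the same log triggers the selected overlap action.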

If an import is stopped because of overlaps, the Import Statistics window reports the result, as shown in the following figure.

Undisplayed Graphic

The Log File Manager in the Import module now makes it possible to correct mistakes by deleting imports selectively. (For more information, see "Deleting Log Files" in Chapter 5.)


Directory Options

The Directory Options tab allows you to specify a default directory for import files and log files.

Undisplayed Graphic

The settings here affect:



Configuring IP Servers

Undisplayed Graphic

On the IP Server window, Usage Import asks you to specify the servers and domain required for the Internet-connected functions of the program (lookups, mail, and IP resolution). These settings are explained in the following table.

Configuration setting

Explanation

HTTP proxy server

If you specify a proxy server host name and port, the Import module uses this address for all HTML title lookups. If you are unsure of this information, check with your system administrator.

Note: If your proxy requires a user name and password, specify the proxy host name as:

username:password@hostname

SMTP server

Use this setting if you plan to distribute analysis reports via email using the MAIL.EXE utility.

Local domain of DNS server

This setting is used to clarify hosts returned from IP resolution. It defaults to the local domain of the computer; if your DNS server is maintained by an ISP, enter the setting explicitly.
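
The username:password@hostname form noted for the HTTP proxy server can be taken apart as in the following sketch. The function name and the returned dictionary are hypothetical and serve only to show how the three parts relate; the sketch does not handle a port or characters that would need escaping.

```python
def parse_proxy(spec):
    """Split 'username:password@hostname' into its parts.

    A plain 'hostname' with no credentials is also accepted.
    """
    credentials, at_sign, host = spec.rpartition("@")
    username = password = None
    if at_sign:
        # Everything before the first colon is the user name.
        username, _, password = credentials.partition(":")
    return {"username": username, "password": password, "host": host}
```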




Identifying Spiders

If the Exclude Spiders box is checked on the Import Options window, the Spider List shown below allows you to identify engines for which log entries will be removed.

Undisplayed Graphic

The entries in the box are common user-agent strings for spiders. If for any reason you want to exclude any other user agents, you can specify them here.
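
Excluding spiders amounts to dropping log records whose user agent matches an entry in the list. The sketch below assumes a case-insensitive substring match and a record layout with a user_agent field; both are assumptions for illustration, and the matching rule that Usage Import actually applies is not stated here.

```python
def is_spider(user_agent, spider_list):
    """True if any spider list entry occurs in the user-agent string."""
    agent = user_agent.lower()
    return any(entry.lower() in agent for entry in spider_list)


def exclude_spiders(records, spider_list):
    """Keep only the records whose user agent is not on the spider list."""
    return [r for r in records if not is_spider(r["user_agent"], spider_list)]
```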


Intranet Organization Definition

For some large or complex site and log structures (for example, ISPs or large intranets with many subdomains), it may be useful to define organizations further down a domain tree than the default assignment of the organization to the Internet domain. The Intranet organization panel allows you to set the number of levels beyond the domain for Usage Import and Report Writer to recognize as distinct organizations.

Undisplayed Graphic

If you make the Intranet setting one domain part beyond the organization, you will have three-part, two-dotted organization names. Two levels beyond, and Usage Import will define organizations with four-part names in the database.

Intranet setting

Organizations in database

Zero domain parts beyond Internet organization

company.com

One domain part beyond Internet organization

ca.company.com
ny.company.com

Two domain parts beyond Internet organization

marketing.ca.company.com
engineering.ca.company.com
marketing.ny.company.com
engineering.ny.company.com
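
The effect of the Intranet setting in the table above can be reproduced with a few lines of code. In this sketch, the function name is hypothetical, and the Internet organization is assumed to be the last two parts of the host name (as in company.com), which does not hold for every top-level domain.

```python
def organization(host, parts_beyond=0, base_parts=2):
    """Return the organization name for a host.

    parts_beyond is the Intranet setting: the number of domain parts
    to keep beyond the Internet organization.
    """
    labels = host.lower().split(".")
    keep = base_parts + parts_beyond
    return ".".join(labels[-keep:])
```

For a host named www.marketing.ca.company.com, settings of zero, one, and two yield company.com, ca.company.com, and marketing.ca.company.com respectively.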




Log File Rotation

Because log file rotation requires an arbitrary cutoff of data produced at your sites, there will inevitably be visits that are interrupted. For these visits, information will be divided between the end of one log file and the beginning of the next. Usage Import gives you a number of options for handling this situation.

Undisplayed Graphic

Options in the At The End Of An Import list box are explained in the following table.

Configuration setting

Explanation

Commit open visits to database

If you routinely commit open visits to the database, there will be a small exaggeration of statistics at the opening and closing of the log file period, because those visits will be counted twice.

Discard open visits

Discarding open visits will under-report visits at the ends of the log files, because those visits will be dropped.

Store open visits for next import

Storing open visits for the next import is the most accurate alternative, because it reconstructs the actual visit as if there were one seamless log. There is a small cost in speed as the open visits must be called up from the cache at each new import.

Clear open visits cache

Clearing the cache of open visits produces a clean slate for the new import. This option is particularly useful if you ordinarily store open visits but occasionally want to discard them.

Note: The cache is maintained per site per database. Clearing the cache clears all of them.
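
The difference between the end-of-import options can be summarized in a short sketch. The function, the option names, and the use of a plain list as the open visits cache are all hypothetical; the sketch only shows which visits reach the database under each option.

```python
def end_of_import(closed_visits, open_visits, option, cache):
    """Return the visits to commit at the end of an import.

    cache is a list that carries stored open visits to the next import.
    """
    if option == "commit":
        # Open visits are written now; their remainder in the next log
        # is counted as a second visit.
        return closed_visits + open_visits
    if option == "discard":
        # Open visits are dropped and never reach the database.
        return list(closed_visits)
    if option == "store":
        # Open visits wait in the cache to be completed by the next import.
        cache.extend(open_visits)
        return list(closed_visits)
    raise ValueError(f"unknown option: {option}")
```

In this model, clearing the open visits cache corresponds to emptying the list before the next import starts.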





© 1996-1997 Microsoft Corporation. All rights reserved.