
The Server Manager

Use the Server Manager to configure each of your servers and sites, and the log data they produce, within the database. Before any data can be imported into the database, the servers and sites that produced that data must be configured in the Server Manager.

The Server Manager is designed to let you import even the most complex server environment into the database and then aggregate and filter the data in the Analysis module.

Configurations in the Server Manager are hierarchical, with three distinct levels:

· log data source

· server

· site

Note
In the future, data might not be recorded in log files, so "log data source" is used to refer generically to the producer of information to be analyzed by Site Server Express Analysis.

When a log file is imported into the database, that log file is associated with the log data source that produced it.

Within a Microsoft, NCSA, or O’Reilly environment, multiple servers can write to one log file. In these cases, there can be multiple servers per log data source.

Within an Apache or Netscape server environment, each server produces its own log file, so there is one log data source per server.

Note
The term server, as used here, does not refer to a physical hardware server but to a program that responds to a request from a user. A site has one software server for each type of content—HTTP, FTP, Gopher, or Real Audio—that it publishes.


The Server Manager provides a list of all information currently configured within the database. (See the following figure.) Each log data source, server, or site in the Server Manager is given a default name when it is configured. This name can be changed by selecting the icon in the Server Manager tree and renaming the object. This name is used throughout Analysis to refer to the data associated with the log data source, server, or site. Use the Server Manager to add sources, servers, and sites and to edit and remove existing ones.


Server information is organized in a graphical hierarchy, with log data sources at the root (represented by a scroll), specific servers down one branch (represented by a spider for Web servers, a file cabinet for FTP servers, a rodent face for Gopher servers, and an ear for Real Audio), and individual sites (represented by a sphere) at the tip.

The Server Manager can be opened from Usage Import.



The Log Data Source

Internet server software records client connections in a log file. For Usage Import to parse your log files correctly, you must specify the correct log file format. If you specify an incorrect log file format, you cannot import data into the database or, worse, you import incorrect data.

Usage Import also supports importing directly from an ODBC log database created by Microsoft and Spry Internet servers. These database imports are treated as a log file format.

The type of information recorded in your log file greatly affects the accuracy and flexibility of your analysis. Specifically, you should try to record referrer, user agent, and cookie information in your log files.

Note
If you use Microsoft, Apache, Netscape, or O’Reilly web server software, server extensions are available at http://www.Interse.com/serverext/ to assist your analysis.


Log File Format

The log file format for an individual log data source is set when the source is added. Right-clicking a log data source or Server icon in the Server Manager opens the Log Data Source Properties window, shown in the following figure.


The Log Data Source Properties window can also be opened by selecting the log data source and clicking the Properties button on the Usage Import window toolbar, as shown below.


The Server Manager allows you to add only servers and sites of a type supported by the log data source. (For example, a Real Audio site cannot be added to a log type that supports only HTTP.) The same applies to multihomed sites: the Server Manager does not allow you to add multiple servers for formats that do not support them.

Finally, keep in mind the following restrictions.



The Server

For Analysis to work accurately, each server must be set up properly, and the appropriate server type must be identified.

Configuring a Server

Configure a server in the Server Properties window. To open it, right-click a log data source and choose New server from the pop-up menu, or right-click a Server icon and choose New server or Edit, as appropriate. You can also open this window by selecting the log data source or Server icon and clicking the Properties button on the toolbar.


The Server properties are described in the following table:

Server property

Description

Server type (required)

The type of Internet server (World Wide Web, FTP, Gopher, or Real Audio). This property defines the list of available log file formats and also provides an essential grouping for aggregation during analysis.

Directory index (required)

This setting specifies the file that the server returns when a directory is requested. (Both requests for directories and requests for directory index files are treated as requests for the directory in the database.) The default depends on the server software: typically INDEX.HTML, or INDEX.HTML and HOME.HTML for servers with multiple directory index files.

IP address and IP port (optional)

Use this setting to:

· distinguish servers that are multihomed

· set the default for the Exclude hosts site property

· distinguish internal and external referrers used in the visit algorithm

Local time zone (required)

This value sets the default GMT offset of a server. Set this option to the time zone where your content is hosted. Selecting Adjust Time Zone in Import Options adjusts time calculations relative to this setting; the import option can be set to the time zone you want to use for your analysis.

Local domain (required)

Insert the domain name used on the local network of the hardware hosting your content. The local domain setting is used to resolve any incompletely resolved host names in your log files.


Server Types

Site Server Express Analysis can analyze four types of server:

· World Wide Web

· FTP

· Gopher

· Real Audio


Most log file formats support only one server type, and only that entry will be available on the Server Properties window. However, if you are using a log file format that supports multiple server types, such as Microsoft, then you will be able to make the appropriate selection for your sites.

Note
The dimension ServerType can be used to filter and aggregate this property in Analysis. (See Chapter 6, "The Report Writer.")


Directory Index Files

This property should contain the file name that the server returns when a directory is requested (that is, when the request ends with a /). During import, Usage Import aggregates all requests for directory index files as requests for the directory. For example, if you specify a directory index of INDEX.HTML, requests for both /INDEX.HTML and / are recorded as / in the database. If your server has multiple directory index files configured, separate them with and, as in INDEX.HTML and HOME.HTML (the default for Netscape servers).
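The directory index rule can be sketched in Python. This is an illustration only, not Usage Import's implementation: the function name normalize_request is hypothetical, and the case-insensitive file name comparison is an assumption.

```python
def normalize_request(path: str, directory_index: str) -> str:
    """Record a request for a directory index file as a request for its directory.

    directory_index uses the property's syntax, e.g. "INDEX.HTML and HOME.HTML".
    Matching is case-insensitive here (an assumption, not documented behavior).
    """
    index_names = {name.strip().upper() for name in directory_index.split(" and ")}
    directory, _, file_name = path.rpartition("/")
    if file_name.upper() in index_names:
        return directory + "/"  # e.g. /docs/INDEX.HTML -> /docs/
    return path


print(normalize_request("/INDEX.HTML", "INDEX.HTML"))                    # /
print(normalize_request("/", "INDEX.HTML"))                              # /
print(normalize_request("/docs/home.html", "INDEX.HTML and HOME.HTML"))  # /docs/
print(normalize_request("/docs/news.html", "INDEX.HTML and HOME.HTML"))  # /docs/news.html
```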


IP Address and IP Port

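This property has three uses:

· to distinguish servers that are multihomed

· to set the default for the Exclude hosts site property

· to distinguish internal and external referrers used in the visit algorithm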



Local Time Zone

Set this property to the time zone of the hardware where your content is hosted. The Adjust Requests Timestamp During Import option on the Import tab determines the offset that is applied to every request from this log data source. For example, if your server is hosted in New York and you want to analyze your data in California time, set this property to GMT -05 Eastern and set the Import option to GMT -08 Pacific. When log files are imported into this log data source, Analysis subtracts three hours from all time stamps.
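The offset arithmetic in the example above can be checked with a short Python sketch. The function adjust_timestamp is hypothetical; it assumes whole-hour GMT offsets and ignores daylight saving time, which the text does not address.

```python
from datetime import datetime, timedelta, timezone


def adjust_timestamp(logged: datetime, server_gmt_offset: int, analysis_gmt_offset: int) -> datetime:
    """Shift a log time stamp from the server's time zone to the analysis time zone.

    Offsets are whole hours relative to GMT (e.g. -5 for Eastern, -8 for Pacific).
    Fixed offsets are used; daylight saving time is not modeled in this sketch.
    """
    server_tz = timezone(timedelta(hours=server_gmt_offset))
    analysis_tz = timezone(timedelta(hours=analysis_gmt_offset))
    # Attach the server's offset, convert, then drop tzinfo to return a naive local time.
    return logged.replace(tzinfo=server_tz).astimezone(analysis_tz).replace(tzinfo=None)


# New York server (GMT -05), analysis in California time (GMT -08): three hours earlier.
print(adjust_timestamp(datetime(1997, 3, 14, 12, 0), -5, -8))  # 1997-03-14 09:00:00
```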


Local Domain

This property should contain the domain name used on the local network of the hardware where your content is hosted.


The local domain name is used to fully resolve any partially resolved host names in your log files. For example, if a Microsoft employee at computer bubba.microsoft.com accessed www.microsoft.com, the log file might record only bubba. The local domain entry allows Usage Import to resolve this host name to bubba.microsoft.com. This property defaults to the local domain of the current computer.
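A minimal sketch of completing partial host names, assuming (this is not stated above) that any name without a dot is partially resolved and that IP addresses are left alone. The function resolve_host is hypothetical.

```python
import ipaddress


def resolve_host(host: str, local_domain: str) -> str:
    """Append the local domain to a host name recorded without one.

    Assumption for this sketch: a name with no dot is treated as partially resolved;
    IP addresses and names that already contain a dot are left unchanged.
    """
    try:
        ipaddress.ip_address(host)
        return host  # an unresolved IP address; nothing to complete
    except ValueError:
        pass
    if "." in host:
        return host
    return f"{host}.{local_domain}"


print(resolve_host("bubba", "microsoft.com"))        # bubba.microsoft.com
print(resolve_host("206.86.22.5", "microsoft.com"))  # 206.86.22.5
```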


The Site

The site is the lowest level of configuration needed to carry out your analyses. You must make the structure of your sites explicit before their statistics can be calculated. In the Site Properties window, you provide the information that defines the sites whose activity you wish to analyze.


Configuring a Site

When you add or edit an Internet site, the Server Manager displays the Site Properties window. (See the following figure.) To add or delete a site, right-click the server icon from which it branches in the graphical tree. To edit or delete an existing site, or to add a new site to the same server, right-click a site icon. To edit a site, you can also select its icon and click the Properties button.


Home Page URLs (Required)

This property specifies the URLs used to access this site. If your site has multiple URLs, list all of them, separated by and; for example, http://www.yourcompany.com and http://yourcompany.com. The visit algorithm used during import compares the referring URLs of the hits in the log file to the host names of the home page URLs to determine whether the referrer was external to the site.
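The external-referrer test described above reduces to a host name comparison. The following Python sketch assumes the referrer is a full URL and that host names are compared case-insensitively; the function name is_external_referrer is hypothetical.

```python
from urllib.parse import urlsplit


def is_external_referrer(referrer: str, home_page_urls: str) -> bool:
    """Return True if the referring URL's host is not one of the site's home page hosts.

    home_page_urls uses the property's syntax, e.g.
    "http://www.yourcompany.com and http://yourcompany.com".
    urlsplit().hostname is lowercased, so the comparison ignores case.
    """
    site_hosts = {urlsplit(url.strip()).hostname for url in home_page_urls.split(" and ")}
    return urlsplit(referrer).hostname not in site_hosts


urls = "http://www.yourcompany.com and http://yourcompany.com"
print(is_external_referrer("http://yourcompany.com/products/", urls))      # False
print(is_external_referrer("http://search.example.com/?q=widgets", urls))  # True
```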

Note
The first URL listed is used to hyperlink to the site in your analysis reports.

Server File System Paths for this Site (Optional)

If the current site’s content resides within a specific path on this server (for example, /thissite/* ) or within a collection of paths on this server (for example, /filepath1/* and /filepath2/*), type those paths here.

A blank entry in this option designates this site as the default site on the server. All files not assigned to another site on this server are assigned to the default site. If there are no sites on the server with a blank entry for this property, then any request that does not match one of the set file paths will be discarded during import as not belonging to a configured site.
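The rules for assigning requests to sites, including the default site and discarded requests, can be sketched as follows. The function assign_site and the dictionary layout are hypothetical, and the sketch assumes that when several sites match, the first one listed wins. Python's fnmatchcase stands in for the product's wildcard matching; in it, * also matches across / characters.

```python
from fnmatch import fnmatchcase
from typing import Dict, Optional


def assign_site(path: str, site_paths: Dict[str, str]) -> Optional[str]:
    """Return the site a request belongs to, or None if it would be discarded.

    site_paths maps a site name to its "Server File System Paths" entry,
    e.g. "/filepath1/* and /filepath2/*"; an empty entry marks the default site.
    """
    default_site = None
    for site, paths in site_paths.items():
        if not paths.strip():
            default_site = site  # blank entry: catches everything not assigned elsewhere
            continue
        patterns = [p.strip() for p in paths.split(" and ")]
        if any(fnmatchcase(path, pattern) for pattern in patterns):
            return site
    return default_site


sites = {"Main site": "", "Products": "/products/* and /catalog/*"}
print(assign_site("/catalog/item.html", sites))                 # Products
print(assign_site("/about.html", sites))                        # Main site
print(assign_site("/about.html", {"Products": "/products/*"}))  # None (discarded)
```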

Internal Hosts to Exclude from Import (Optional)

You can specify the host names or IP addresses whose requests you want to exclude from the database. For Internet sites, this entry is typically used to exclude requests from employees and testing software, but you can specify any host here. Specify both the domain name and the complete Class C address range, such as *.yourcompany.com and 206.86.22.*, in case some IP addresses in the log file were not resolved to host names.

Note
Requests from these hosts are not available for detailed usage analysis. They are, however, counted in the aggregate hits and bandwidth statistics.
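The following sketch shows the behavior described in the note: hits from excluded hosts are dropped from detailed analysis but still counted in aggregate hits and bandwidth. The tuple layout and the function exclude_hosts are hypothetical, and fnmatch-style wildcards are assumed to approximate the product's host patterns.

```python
from fnmatch import fnmatchcase
from typing import List, Tuple

Hit = Tuple[str, str, int]  # (client host or IP, requested file, bytes sent)


def exclude_hosts(hits: List[Hit], exclude: str) -> Tuple[List[Hit], int, int]:
    """Drop hits from excluded hosts while still counting them in aggregate totals.

    exclude uses the property's syntax, e.g. "*.yourcompany.com and 206.86.22.*".
    Returns (hits kept for detailed analysis, total hits, total bytes).
    """
    patterns = [p.strip().lower() for p in exclude.split(" and ")]
    kept: List[Hit] = []
    total_hits = total_bytes = 0
    for host, path, nbytes in hits:
        total_hits += 1  # every hit counts toward aggregate statistics
        total_bytes += nbytes
        if any(fnmatchcase(host.lower(), p) for p in patterns):
            continue  # excluded from detailed usage analysis
        kept.append((host, path, nbytes))
    return kept, total_hits, total_bytes


log = [("qa1.yourcompany.com", "/", 1200),
       ("206.86.22.17", "/a.html", 800),
       ("client.example.net", "/", 1500)]
print(exclude_hosts(log, "*.yourcompany.com and 206.86.22.*"))
# ([('client.example.net', '/', 1500)], 3, 3500)
```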


Inline Images to Exclude from Import (Optional)

Specify the file names you want to exclude from the database. This is typically used to prevent requests for inline images (that is, the decorative images on a page) from being imported into the database.

Note
Bandwidth calculations are just as accurate and useful if inline images are excluded. The more you exclude, the faster the import process, the smaller the database, and the faster the analysis process. Excluding images will not exclude advertising views.


Site Properties: The Inferences Tab


Usage Import applies inferences to calculate requests, visits, users, and organizations from the hits in the log file. These inferences rely on assumptions that you can adjust on this tab.

Insert Missing Referrers into Clickstream (Optional)

This feature helps compensate for caching on the networks connecting your users and your servers.

Consider a user who follows this path:

| Step | Page |
|---|---|
| 1 | Page A |
| 2 | Page B |
| 3 | Page A (cached) |
| 4 | Page C |



In this situation, the server log file will record these hits:

| Request | Referrer |
|---|---|
| Page A | |
| Page B | Page A |
| Page C | Page A |



Without referrer inferences, Usage Import will import one request each for Page A, Page B, and Page C, as above. This is the default.

If Insert Missing Referrers is selected, Usage Import reconstructs the following clickstream:

| Request | Referrer |
|---|---|
| Page A | |
| Page B | Page A |
| Page A | |
| Page C | Page A |
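One way to model Insert Missing Referrers, assuming the rule is "when a request's referrer is a page on this site but not the page requested immediately before it, insert a request for that referrer", is the Python sketch below. The function name and the site_pages argument are hypothetical; as in the table above, the inserted request carries no referrer.

```python
from typing import List, Optional, Set, Tuple

Hit = Tuple[str, Optional[str]]  # (requested page, referrer or None)


def insert_missing_referrers(hits: List[Hit], site_pages: Set[str]) -> List[Hit]:
    """Rebuild one visit's clickstream by inserting requests served from a cache.

    If a request's referrer is a page on this site but not the page requested just
    before it, the user presumably returned to the referrer without contacting the
    server, so a request for the referrer (with no referrer of its own) is inserted.
    """
    clickstream: List[Hit] = []
    previous: Optional[str] = None
    for page, referrer in hits:
        if referrer in site_pages and referrer != previous:
            clickstream.append((referrer, None))
        clickstream.append((page, referrer))
        previous = page
    return clickstream


log = [("Page A", None), ("Page B", "Page A"), ("Page C", "Page A")]
for request in insert_missing_referrers(log, {"Page A", "Page B", "Page C"}):
    print(request)
# ('Page A', None)
# ('Page B', 'Page A')
# ('Page A', None)
# ('Page C', 'Page A')
```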



Caching typically flattens out the request profile of a site, as shown in the following figure. This feature helps determine the true request profile of your site.


Impact of inserting missing referrers on request statistics

Note
The Insert Missing Referrers feature works only if your log files include referrer data.

Visit Timeout (Required)

Usage Import infers visits based on, among other things, a timeout: the length of time after which any visit is considered closed. Some time limit, however arbitrary, is required to define a visit; otherwise every visit would be infinite. You can choose an appropriate setting here.

The Internet advertising industry has agreed on a timeout of 30 minutes for standardized reporting, so 30 minutes is the default value. However, this value has a large effect on your analysis results, so any refinement of this number that you can make based on empirical experience will improve accuracy. For example, you might assume that visitors spend much less than 30 minutes at a navigation site, so you might set this value to 10 minutes. At the other end of the spectrum, visitors might spend much more than 30 minutes at a customer support site, so this value might be set as high as 120 minutes.
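The effect of the timeout setting can be illustrated with a sketch that splits one user's request times into visits. It assumes the timeout is measured from the previous request (an inactivity gap), which the text does not specify, and split_into_visits is a hypothetical name.

```python
from datetime import datetime, timedelta
from typing import List


def split_into_visits(times: List[datetime], timeout_minutes: int = 30) -> List[List[datetime]]:
    """Group one user's request times into visits.

    A new visit starts when the gap since the previous request exceeds the timeout
    (an inactivity timeout, assumed here for illustration).
    """
    timeout = timedelta(minutes=timeout_minutes)
    visits: List[List[datetime]] = []
    for t in sorted(times):
        if visits and t - visits[-1][-1] <= timeout:
            visits[-1].append(t)  # still inside the current visit
        else:
            visits.append([t])    # gap too long: open a new visit
    return visits


times = [datetime(1997, 3, 14, h, m) for h, m in [(10, 0), (10, 20), (10, 45), (11, 30)]]
for minutes in (10, 30, 120):
    print(minutes, len(split_into_visits(times, minutes)))
# 10 4
# 30 2
# 120 1
```

The same four requests yield four, two, or one visit depending on the setting, which is why the timeout has such a large effect on the results.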

Multiple Users Use the Same User Name (Optional)

Usage Import infers users based upon, among other things, the user name recorded in the log file.

If your site has a section where many people log in under a single user name (for example, guest or evaluator), the inference algorithms normally identify all of them as a single user. Selecting this option tells Usage Import not to identify users by user name, but to try to assign unique user IDs based on other information.

Note
Distributing persistent cookies circumvents the problem entirely, because each user is permanently and uniquely identified without resorting to user name.
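The order of preference implied above (persistent cookies first, then user names unless they are shared) can be sketched as follows. The text does not say which "other information" Usage Import uses; the host-plus-user-agent fallback here is purely an assumption, and user_key is a hypothetical function.

```python
from typing import Optional


def user_key(cookie: Optional[str], user_name: Optional[str], host: str,
             user_agent: str, shared_user_names: bool) -> str:
    """Pick the identifier used to group requests into users (illustrative only).

    Assumed priority: persistent cookie, then user name (unless the
    "Multiple users use the same user name" option is set), then host plus user agent.
    """
    if cookie:
        return "cookie:" + cookie
    if user_name and not shared_user_names:
        return "name:" + user_name
    return "host:" + host + "|" + user_agent


# Two visitors share the guest account but connect from different hosts.
print(user_key(None, "guest", "pc1.example.net", "Mozilla/2.0", False) ==
      user_key(None, "guest", "pc2.example.net", "Mozilla/2.0", False))  # True: one user
print(user_key(None, "guest", "pc1.example.net", "Mozilla/2.0", True) ==
      user_key(None, "guest", "pc2.example.net", "Mozilla/2.0", True))   # False: two users
```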


Site Properties: The Query Strings Tab


A file name requested from a web server has several components. Consider a request for:

/cgi-bin/getquote?symbols=interse+microsoft&display=table&alpha=beta#top:1000

Here "/cgi-bin/getquote" is the URL, "symbols=interse+microsoft&display=table&alpha=beta" is the query string, "#top" is a fragment, and ":1000" is a parameter.

During import, Usage Import removes all fragments and parameters from file names. It also separates query strings from file names and can optionally store them in the database with the individual requests to your sites, so that you can analyze requests, visits, and users according to a particular query string. To take advantage of this feature, your query strings must be formatted as name=value pairs.
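Python's urllib.parse can split the example request into the components named above. Splitting ":1000" out of the fragment follows this document's terminology and is done by hand, since standard URL parsing treats "top:1000" as a single fragment; split_request is a hypothetical name.

```python
from urllib.parse import parse_qsl, urlsplit


def split_request(request: str):
    """Split a requested file name into (URL path, name=value pairs, fragment, parameter).

    Treating the text after ':' in the fragment as the parameter follows the
    example above; it is an assumption, not a general URL rule.
    """
    parts = urlsplit(request)
    fragment, _, parameter = parts.fragment.partition(":")
    pairs = parse_qsl(parts.query)  # note: '+' is decoded to a space
    return parts.path, pairs, fragment, parameter


path, pairs, fragment, parameter = split_request(
    "/cgi-bin/getquote?symbols=interse+microsoft&display=table&alpha=beta#top:1000")
print(path)                 # /cgi-bin/getquote
print(pairs)                # [('symbols', 'interse microsoft'), ('display', 'table'), ('alpha', 'beta')]
print(fragment, parameter)  # top 1000
```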

File System Paths Whose Query Strings Should be Stored (Optional)

You need to specify which query strings to retain by indicating their file path. Typically, you are interested in the information from only a subset of all your file names with associated query strings. For example, if all of the CGI scripts you are interested in parsing are stored in /cgi/I_care_about_these/, then you would type /cgi/I_care_about_these/* in this box. Type multiple paths separated by and. The standard wildcard operators also apply. (For example, type /* to store all query strings.)

Site Properties: The File Names Tab


Remove Top-Level Directory From File System Paths (Optional)

Some hosting services add a customer name as a directory name before file names. If you select this option, then the first level directory is removed from each file name string before the string is recorded in the database.
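A sketch of Remove Top-Level Directory, assuming the first path segment is the customer name ("acme" below is a hypothetical example) and that a file at the root, with no directory to strip, is left unchanged. The function name is hypothetical.

```python
def remove_top_level_directory(path: str) -> str:
    """Drop the first directory (e.g. a hosting customer's name) from a file path.

    Paths with no directory component, such as /index.html, are left unchanged.
    """
    _, sep, rest = path.lstrip("/").partition("/")
    if not sep:
        return path
    return "/" + rest


print(remove_top_level_directory("/acme/products/index.html"))  # /products/index.html
print(remove_top_level_directory("/index.html"))                # /index.html
```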

File System Paths to Apply Regular Expressions To (Optional)

Regular expressions are a powerful search-and-replace notation used in UNIX and Perl. In certain situations, the file names recorded in your logs are not exactly as you want them recorded in the database. To change them, you can use the regular expression search-and-replace function.

In the path property, provide the file system paths that need a search-and-replace correction applied to them. For example, to change the path "/analyst/*" to "/ourproducts/*", type /analyst/* for this property. For multiple paths, use the syntax /analyst/* and /otherpath/*.

Regular Expressions to Apply to File Names (Optional)

Type the actual regular expression that will be applied, with the following syntax:

s/string1/string2/

where:

· s indicates search and replace

· string1 is the string to search for

· string2 is the replacement string

In the example given above, to replace "/analyst/*" with "/ourproducts/*", type:

Undisplayed Graphic

To apply two regular expressions, type:

Undisplayed Graphic

The backslash ( \ ) is an escape character for this command: it causes the forward slash ( / ) that follows it to be interpreted as the directory hierarchy divider rather than as the regular expression’s own separator character. A backslash is also required to include any of the regular expression’s special characters literally in a search or replacement string. Without the backslash, the special characters have the following wildcard meanings in regular expressions:

| Character | Meaning |
|---|---|
| ^ | Matches the beginning of a string |
| $ | Matches the end of a string |
| . | Matches any character |
| [ ] | Character class, or the complement of a character class if the first character inside the brackets is a caret ( ^ ) |
| * | Repeat previous, zero or more times |
| + | Repeat previous, one or more times |
| ? | Repeat previous, zero or one time |
| \ | Escape next character (treat the next special character as a literal to be included in the string rather than as a wildcard) |
| { } | Tagged match (Note: Usage for tagged matches is extremely complex, and the function is recommended only for expert users of UNIX or Perl. It is not otherwise supported by Usage Analyst 2.0.) |



The following examples illustrate how to use the wildcard characters:

| Pattern | Matches |
|---|---|
| ^stuff | strings that start with "stuff" |
| stuff$ | strings that end with "stuff" |
| ^...$ | any 3-character string |
| [AEIOU] | any uppercase vowel |
| [0-9] | any digit |
| [A-Za-z][0-9] | any letter followed by any digit |
| [^0-9] | any character except a digit |
| [A-Z][0-9]* | any upper-case letter optionally followed by any number of digits |
| [A-Z][0-9]+ | any upper-case letter followed by at least one digit |
| [A-Z][0-9]? | any upper-case letter optionally followed by one digit |
| [+-]?[0-9]+ | any integer optionally preceded by a sign |
| [+-]?[0-9]+\.?[0-9]* | any real number |
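Python's re module assigns the same meanings to these characters, so the examples can be checked with a short script. The ^ and $ anchors added to the integer and real-number patterns are an addition here so that the whole sample string must match; this exercises Python's regular expression engine, not Usage Import's.

```python
import re

# (pattern, sample string, should it match?) using patterns from the table above.
EXAMPLES = [
    (r"^stuff", "stuffing", True),
    (r"^stuff", "hot stuff", False),
    (r"stuff$", "hot stuff", True),
    (r"^...$", "abc", True),
    (r"^...$", "abcd", False),
    (r"[AEIOU]", "xyzE", True),
    (r"[AEIOU]", "xyze", False),
    (r"[A-Za-z][0-9]", "b7", True),
    (r"[^0-9]", "1997", False),
    (r"[A-Z][0-9]+", "X", False),
    (r"[A-Z][0-9]+", "X12", True),
    (r"^[+-]?[0-9]+$", "-42", True),
    (r"^[+-]?[0-9]+\.?[0-9]*$", "+3.14", True),
    (r"^[+-]?[0-9]+\.?[0-9]*$", "3.1.4", False),
]

for pattern, text, expected in EXAMPLES:
    matched = re.search(pattern, text) is not None
    assert matched == expected, (pattern, text)
print("all examples behave as described")
```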



Note
All regular expressions listed in this property are applied to all paths listed in the previous property.
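The two file name properties together can be sketched in Python. The rule string below is an assumed rendering of the /analyst/ example using the escaped slashes described above; re.sub stands in for Usage Import's own engine (so escapes other than \/ may behave differently), and each expression replaces only the first match, as s/// does without a g flag. The function names are hypothetical.

```python
import re
from fnmatch import fnmatchcase
from typing import List, Tuple


def parse_substitution(expr: str) -> Tuple[str, str]:
    """Split an s/string1/string2/ expression on its unescaped '/' delimiters."""
    parts = re.split(r"(?<!\\)/", expr)
    if len(parts) != 4 or parts[0] != "s" or parts[3] != "":
        raise ValueError(f"not an s/string1/string2/ expression: {expr!r}")
    # "\/" stands for a literal directory divider in both halves.
    return parts[1].replace("\\/", "/"), parts[2].replace("\\/", "/")


def apply_regexes(path: str, path_patterns: str, expressions: List[str]) -> str:
    """Apply every expression, in order, to a file name matching any listed path."""
    patterns = [p.strip() for p in path_patterns.split(" and ")]
    if not any(fnmatchcase(path, p) for p in patterns):
        return path  # path not listed: left unchanged
    for expr in expressions:
        search, replace = parse_substitution(expr)
        path = re.sub(search, replace, path, count=1)  # first occurrence only
    return path


rule = r"s/\/analyst\//\/ourproducts\//"
print(apply_regexes("/analyst/intro.html", "/analyst/* and /otherpath/*", [rule]))
# /ourproducts/intro.html
print(apply_regexes("/news/today.html", "/analyst/* and /otherpath/*", [rule]))
# /news/today.html
```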


© 1996-1997 Microsoft Corporation. All rights reserved.