monitor.log
Contains the log of web site events and performance measurements. It
can be broken down into:
- events: Failure/Restart/Warning events which occurred on the web site;
- cycperf: average performance per [15] urls or since inception: duration (millisec) of name lookup, connection, download;
- urlperf: individual download performance per url (duration in millisec, debit, i.e. throughput, in kb per sec).
FORMAT
{ #GMTTIME,
#AGENT,
#URL,
#TYPE,
#LOCALTIME,
#MESSAGE
}
#TYPE is one of: "Failure", "Restart", "Warning", "Info".
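The six-field layout above can be read with a standard CSV parser. The following is a minimal sketch, assuming each record sits on one line in the quoted, comma-separated form used by the events examples; the names LogRecord and parse_records are hypothetical and not part of the monitor.

```python
import csv
from typing import NamedTuple


class LogRecord(NamedTuple):
    """One monitor.log record; field order follows the FORMAT block."""
    gmt_time: str
    agent: str
    url: str
    event_type: str
    local_time: str
    message: str


def parse_records(lines):
    """Yield a LogRecord for every row with exactly six fields; skip other rows."""
    for row in csv.reader(lines, skipinitialspace=True):
        if len(row) == 6:
            yield LogRecord(*row)


sample = [
    '"09/07/99 09:44","WebMonitor1.1_cont","http://www.mycompany.com/",'
    '"Restart","09/07/99 09:44","Works ok !"'
]
records = list(parse_records(sample))
print(records[0].event_type, "-", records[0].message)
```

Rows that do not have six fields are skipped silently here; a stricter reader might prefer to report them.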
monitor.log: events
- Failure/Restart/Warning events which occurred on the web site.
EXAMPLE
This message records that the web site is working the first time it
is tested.
#GMTTIME #AGENT #URL #TYPE #LOCALTIME #MESSAGE
"09/07/99 09:44","WebMonitor1.1_cont","http://www.mycompany.com/","Restart","09/07/99 09:44","Works ok !"
This message records a web site failure.
#GMTTIME #AGENT #URL #TYPE #LOCALTIME #MESSAGE
"06/07/99 11:51","WebMonitor1.1_cont","http://www.mycompany.com/","Failure","06/07/99 11:51","NoAlarm:Invalid HTTP response code 0"
This message records a web site restart.
#GMTTIME #AGENT #URL #TYPE #LOCALTIME #MESSAGE
"06/07/99 12:15","WebMonitor1.1_cont","http://www.mycompany.com/","Restart","06/07/99 12:15"," Works again!"
The message may start with "Alarm:" or "NoAlarm:", indicating whether
an alarm was raised and, accordingly, whether there is an entry in alarms.log.
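The prefix can be separated from the rest of the message with a few string operations. This is a minimal sketch; the function name split_alarm_prefix is hypothetical, and returning None for messages without a prefix is an assumption made for illustration.

```python
def split_alarm_prefix(message):
    """Return (alarm_raised, text).

    alarm_raised is True for "Alarm:", False for "NoAlarm:" and
    None when the message carries neither prefix (an assumption).
    """
    for prefix, raised in (("NoAlarm:", False), ("Alarm:", True)):
        if message.startswith(prefix):
            return raised, message[len(prefix):].strip()
    return None, message.strip()


print(split_alarm_prefix("NoAlarm:Invalid HTTP response code 0"))
print(split_alarm_prefix("Alarm:cannot connect."))
```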
MESSAGES (see HTTPD ERROR CODES for more)
type | message | note
"Failure" | "NoAlarm:Invalid HTTP response code 0" | Web site failure, see HTTP error code table.
"Failure" | "Content has changed: verify manually" | The content has changed or the expected keyword is not found in the content.
"Failure" | "HTTP was public, is now protected" | The web site was public and is now password protected (this may or may not be allowed).
"Failure" | "HTTP was protected, is now public" | The web site was password protected and is now public (this may or may not be allowed).
"Failure" | "Sensor timed out too many times." | There have been [3] successive timeouts when testing the URL. This means the web site may be down or hanging.
"Warning" | "DNS lookup failure for [url]" | The name lookup failed but the server is up (may or may not be allowed).
"Warning" | "Performance is downgraded" | The average performance over the last [15] connections is significantly lower than the average performance since inception (see performance curves).
"Warning" | "Sensor timed out.1", "Sensor timed out.2", "Sensor timed out.3" | The url connection could not be made within [1] minute; if this happens [3] times, there is a failure "Sensor timed out too many times".
"Info" | "IP address for site http://www.mycompany.com/ is http://192.123.59.134/" | Initial record of the IP address on the first test (after restart).
"Info" | "IP address for site http://www.mycompany.com/ was http://192.123.59.134/ is now http://192.123.59.137/" | The IP address has changed.
"Info" | "Shutdown" | The application was shut down.
"Info" | "Agent restarted" | The application was restarted.
This list is not exhaustive.
HTTPD ERROR CODES
Most Failure messages contain the HTTP error code and an explanation,
as defined by the HTTP standard:
6.1.1 Status Code and Reason Phrase
The Status-Code element is a 3-digit integer result code of the
attempt to understand and satisfy the request. These codes are fully
defined in section 10. The Reason-Phrase is intended to give a short
textual description of the Status-Code. The Status-Code is intended
for use by automata and the Reason-Phrase is intended for the human
user. The client is not required to examine or display the
Reason-Phrase.
The first digit of the Status-Code defines the class of response. The
last two digits do not have any categorization role. There are 5
values for the first digit:
- 1xx: Informational - Request received, continuing process
- 2xx: Success - The action was successfully received, understood, and accepted
- 3xx: Redirection - Further action must be taken in order to complete the request
- 4xx: Client Error - The request contains bad syntax or cannot be fulfilled
- 5xx: Server Error - The server failed to fulfill an apparently valid request
The individual values of the numeric status codes defined for
HTTP/1.1, and an example set of corresponding Reason-Phrase's, are
presented below. The reason phrases listed here are only
recommendations -- they MAY be replaced by local equivalents without
affecting the protocol.
Status-Code =
  "100" ; Section 10.1.1: Continue
| "101" ; Section 10.1.2: Switching Protocols
| "200" ; Section 10.2.1: OK
| "201" ; Section 10.2.2: Created
| "202" ; Section 10.2.3: Accepted
| "203" ; Section 10.2.4: Non-Authoritative Information
| "204" ; Section 10.2.5: No Content
| "205" ; Section 10.2.6: Reset Content
| "206" ; Section 10.2.7: Partial Content
| "300" ; Section 10.3.1: Multiple Choices
| "301" ; Section 10.3.2: Moved Permanently
| "302" ; Section 10.3.3: Found
| "303" ; Section 10.3.4: See Other
| "304" ; Section 10.3.5: Not Modified
| "305" ; Section 10.3.6: Use Proxy
| "307" ; Section 10.3.8: Temporary Redirect
| "400" ; Section 10.4.1: Bad Request
| "401" ; Section 10.4.2: Unauthorized
| "402" ; Section 10.4.3: Payment Required
| "403" ; Section 10.4.4: Forbidden
| "404" ; Section 10.4.5: Not Found
| "405" ; Section 10.4.6: Method Not Allowed
| "406" ; Section 10.4.7: Not Acceptable
| "407" ; Section 10.4.8: Proxy Authentication Required
| "408" ; Section 10.4.9: Request Time-out
| "409" ; Section 10.4.10: Conflict
| "410" ; Section 10.4.11: Gone
| "411" ; Section 10.4.12: Length Required
| "412" ; Section 10.4.13: Precondition Failed
| "413" ; Section 10.4.14: Request Entity Too Large
| "414" ; Section 10.4.15: Request-URI Too Large
| "415" ; Section 10.4.16: Unsupported Media Type
| "416" ; Section 10.4.17: Requested range not satisfiable
| "417" ; Section 10.4.18: Expectation Failed
| "500" ; Section 10.5.1: Internal Server Error
| "501" ; Section 10.5.2: Not Implemented
| "502" ; Section 10.5.3: Bad Gateway
| "503" ; Section 10.5.4: Service Unavailable
| "504" ; Section 10.5.5: Gateway Time-out
| "505" ; Section 10.5.6: HTTP Version not supported
| extension-code
extension-code = 3DIGIT
Reason-Phrase = *<TEXT, excluding CR, LF>
HTTP status codes are extensible. HTTP applications are not required
to understand the meaning of all registered status codes, though such
understanding is obviously desirable. However, applications MUST
understand the class of any status code, as indicated by the first
digit, and treat any unrecognized response as being equivalent to the
x00 status code of that class, with the exception that an
unrecognized response MUST NOT be cached. For example, if an
unrecognized status code of 431 is received by the client, it can
Extract from draft-ietf-http-v11-spec-rev-06.txt
Source: Internet Engineering Task Force
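The class rule quoted in the extract (the first digit gives the class, and an unrecognized code is treated as the x00 code of its class) can be sketched as follows. The function names are hypothetical; the set of known codes is copied from the list in the extract. Reporting values outside 1xx to 5xx, such as the 0 in "Invalid HTTP response code 0", as "Unknown" is an assumption of this sketch.

```python
STATUS_CLASSES = {
    1: "Informational",
    2: "Success",
    3: "Redirection",
    4: "Client Error",
    5: "Server Error",
}

# Codes listed in the extract above.
KNOWN_CODES = {
    100, 101,
    *range(200, 207),
    *range(300, 306), 307,
    *range(400, 418),
    *range(500, 506),
}


def status_class(code):
    """Name the class given by the first digit, or "Unknown" outside 1xx-5xx."""
    return STATUS_CLASSES.get(code // 100, "Unknown")


def effective_code(code):
    """Map an unrecognized 1xx-5xx code to the x00 code of its class."""
    if code in KNOWN_CODES or not 100 <= code <= 599:
        return code
    return (code // 100) * 100


print(status_class(404), effective_code(404))
print(status_class(431), effective_code(431))
print(status_class(0))
```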
How to read monitor.log (events)
The usual sequence of events for a web site is:
Time | Type | Message | Note
10:00 | Restart | Works ok! | first test: web site is ok
…
15:00 | Failure | Alarm:cannot connect. | failure is detected, alarm is sent
…
15:35 | Restart | Works again! | the web site was restarted
When a "Failure", "Restart" sequence occurs, it is possible
to determine the downtime. In this example the web site was down for about
35 minutes.
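The downtime computation can be sketched by pairing each "Failure" with the next "Restart". This is an illustration only: the function name downtimes is hypothetical, the day/month/year reading of the timestamps is an assumption (the log does not state the date order), and the date 06/07/99 is added to the example times so they can be parsed.

```python
from datetime import datetime

TIME_FORMAT = "%d/%m/%y %H:%M"  # assumption: day/month/year order


def downtimes(events):
    """Pair each Failure with the following Restart.

    events: (timestamp, type) tuples for one URL, in chronological order.
    Returns a list of (failure_time, restart_time, duration) tuples.
    """
    result = []
    down_since = None
    for stamp, event_type in events:
        when = datetime.strptime(stamp, TIME_FORMAT)
        if event_type == "Failure" and down_since is None:
            down_since = when  # start of an outage
        elif event_type == "Restart" and down_since is not None:
            result.append((down_since, when, when - down_since))
            down_since = None  # outage closed
    return result


events = [
    ("06/07/99 10:00", "Restart"),
    ("06/07/99 15:00", "Failure"),
    ("06/07/99 15:35", "Restart"),
]
for start, end, length in downtimes(events):
    print(int(length.total_seconds() // 60), "minutes down")
```

A "Restart" that is not preceded by a "Failure", such as the first-test record, is ignored, and repeated "Failure" records within one outage do not reset its start time.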
monitor.log: cycperf
- average performance per [15] urls or since inception, duration (millisec) of name lookup, connection, download.
For example,
#GMTTIME #AGENT #URL #TYPE #LOCALTIME #MESSAGE
11/07/99 00:02 WebMonitor1.1_cont http://www.mycompany.com/ Info 11/07/99 00:02 "cycPerfDuration = 928, avgPerfDuration = 157, avgPerfCount = 765"
The performance information is contained in the message:
- cycPerfDuration: it took 928 milliseconds on average over the past [15] tests to do the name lookup, connection and content download.
- avgPerfDuration: it took 157 milliseconds on average since inception to do the name lookup, connection and content download.
- avgPerfCount: the url was tested 765 times since inception (indicates the accuracy of the avgPerfDuration).
This example will have generated a performance warning, because
928 is significantly longer than 157, the usual duration.
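The comparison behind the warning can be sketched by extracting the "name = number" pairs from the message and comparing the two averages. The function names are hypothetical, and the factor of 2 is a made-up threshold: the document only says "significantly" and does not give the rule the monitor applies.

```python
import re

SLOWDOWN_FACTOR = 2.0  # hypothetical threshold, not taken from the monitor


def parse_fields(message):
    """Extract 'name = number' pairs from a performance message as ints."""
    pairs = re.findall(r"(\w+)\s*=\s*(\d+)", message)
    return {name: int(value) for name, value in pairs}


def is_downgraded(fields, factor=SLOWDOWN_FACTOR):
    """True when the recent average exceeds the long-run average by `factor`."""
    return fields["cycPerfDuration"] > factor * fields["avgPerfDuration"]


fields = parse_fields(
    "cycPerfDuration = 928, avgPerfDuration = 157, avgPerfCount = 765"
)
print(fields)
print(is_downgraded(fields))
```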
Why not indicate the debit?
Because the name lookup and the opening of the connection do not depend
on the length of the content, it makes no sense to calculate a debit from
the cycPerfDuration.
monitor.log: urlperf
- individual performance per url, of download (duration in millisec, debit in kb per sec).
For example,
#GMTTIME #AGENT #URL #TYPE #LOCALTIME #MESSAGE
11/07/99 00:02 WebMonitor1.1_cont http://www.mycompany.com/ Info 11/07/99 00:02 "urlPerfDuration = 105, length = 2096, debit = 19 kb per sec"
The performance information is contained in the message:
- urlPerfDuration: it took 105 milliseconds to download the content during this one test.
- length: the content contains 2096 bytes.
- debit: the resulting debit is 19 kb per sec.
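The debit in this example can be checked by dividing the length by the duration. The sketch below assumes that 1 kb means 1000 bytes and that the logged value is rounded down; neither is stated in the document, though with 1024 bytes per kb the truncated result for this example is also 19. The function name is hypothetical.

```python
def debit_kb_per_sec(length_bytes, duration_ms):
    """Download throughput in kb per second (assuming 1 kb = 1000 bytes)."""
    if duration_ms <= 0:
        raise ValueError("duration must be positive")
    bytes_per_sec = length_bytes / (duration_ms / 1000)
    return bytes_per_sec / 1000


debit = debit_kb_per_sec(2096, 105)
print(int(debit), "kb per sec")  # truncated, as the logged value appears to be
```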
How to read cycperf and urlperf
The logs of several monitors are needed for an accurate
performance analysis. cycperf and urlperf must
be read together for diagnosis.
Tracing curves
The curves can be generated with MS Excel using the CSV
files and extracting the performance information. For example:
[Figure: Cycle avg duration variation during the day (ms), using cycperf]
[Figure: Debit variation during the day (kb/s), using urlperf]
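As an alternative to extracting the values by hand in a spreadsheet, the performance information could be pulled out of the messages first and written as a two-column CSV ready for charting. This is a sketch under assumptions: the function name performance_series is hypothetical, and the records are taken to be already split into (local time, message) pairs.

```python
import csv
import io
import re


def performance_series(records, field):
    """Yield (local_time, value) for every message that carries `field`."""
    pattern = re.compile(rf"\b{re.escape(field)}\s*=\s*(\d+)")
    for local_time, message in records:
        match = pattern.search(message)
        if match:
            yield local_time, int(match.group(1))


records = [
    ("11/07/99 00:02",
     "cycPerfDuration = 928, avgPerfDuration = 157, avgPerfCount = 765"),
    ("11/07/99 00:02",
     "urlPerfDuration = 105, length = 2096, debit = 19 kb per sec"),
]

out = io.StringIO()  # stands in for a real output file
writer = csv.writer(out)
writer.writerow(["local_time", "debit"])
writer.writerows(performance_series(records, "debit"))
print(out.getvalue())
```

Passing "cycPerfDuration" instead of "debit" gives the series for the cycle duration curve.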