-
- NAME
- page-stats.pl - Check WWW page accesses (v1.3)
-
- SYNOPSIS
- page-stats.pl -h
- page-stats.pl [ -b ] [ -i identfile ] [ -l logfile ]
-
- DESCRIPTION
- page-stats.pl examines the access_log of an HTTP daemon and searches
- it for occurrences of certain references. These references are then
- counted and written to an HTML file that is ready to be displayed to
- the outside world as a "Page Statistics" page. Each counted page is
- linked from the statistics page.
-
- The identfile contains the references that should be counted. A
- line in this file should be in the following format:
-
- URL@title@reference[@reference...]
-
- which could look like this:
-
- ~gnu/index.html@Gnu's pages@/gnu.html@~gnu*
-
- Comments are allowed, and should be preceded by a "#". Everything
- following that character will be ignored. Each line should at least
- contain the following:
-
- URL The URL of the page, as it should be referenced from the
- "Page Statistics" page.
-
- title The title of the page, as you want visitors to see it. Note
- that leading spaces are significant, so it is possible to
- make use of indentation for different levels of documents.
-
- reference
- A reference of how the page might be accessed. For instance,
- if a directory contains a file index.html, it can be
- accessed by leaving out the "index.html" part, or even the
- "/" before it. If this is the case, list all references
- one after another, separated by "@". You may use a wildcard
- "*" at the end of a string to match only the beginning of a
- URL.
-
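- The line format and the wildcard rule described above can be sketched
- as follows. This is an illustration in Python, not the Perl code of
- page-stats.pl. The function names parse_ident_line and
- reference_matches are hypothetical, and trimming whitespace around
- the URL and the references is an assumption.

```python
def parse_ident_line(line):
    """Split one identfile line into (url, title, references).

    Returns None for lines that are empty or contain only a comment.
    """
    # Everything from the first "#" onward is a comment.
    content = line.split("#", 1)[0].rstrip("\n")
    if not content.strip():
        return None
    fields = content.split("@")
    if len(fields) < 3:
        raise ValueError("expected URL@title@reference[@reference...]: %r" % line)
    url = fields[0].strip()
    # Leading spaces in the title are significant, so only trailing ones go.
    title = fields[1].rstrip()
    references = [ref.strip() for ref in fields[2:]]
    return url, title, references


def reference_matches(reference, requested):
    """A trailing "*" matches any URL that begins with the text before it."""
    if reference.endswith("*"):
        return requested.startswith(reference[:-1])
    return requested == reference
```

- Here "requested" is assumed to be written in the same form as the
- references in the identfile.
-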
- The order of the lines in the identfile matters. Only the first
- match will be taken into account. Be careful when using wildcards,
- as they might filter out hits for lines below. Take a look at the
- (faulty) example below:
-
- # Wrong; second line will never be reached!
- ~gnu/index.html@Gnu's pages@~gnu*
- ~gnu/info/index.html@Gnu's info files@~gnu/info*
-
- The first line will catch all URLs beginning with "~gnu", which
- automatically means that URLs that would match "~gnu/info*" are
- caught as well. Place the second line above the first to solve
- the problem:
-
- # Right!
- ~gnu/info/index.html@Gnu's info files@~gnu/info*
- ~gnu/index.html@Gnu's pages@~gnu*
-
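- The effect of line order can be reproduced with a small sketch. It is
- written in Python for illustration and is not taken from
- page-stats.pl; the function first_match and the (title, references)
- tuples are hypothetical stand-ins for the parsed identfile.

```python
def first_match(entries, requested):
    """Return the title of the first entry with a matching reference."""
    for title, references in entries:
        for ref in references:
            if ref.endswith("*"):
                # Trailing wildcard: compare only the beginning of the URL.
                hit = requested.startswith(ref[:-1])
            else:
                hit = requested == ref
            if hit:
                return title  # stop at the first match
    return None


# The two orderings from the examples above.
wrong = [("Gnu's pages", ["~gnu*"]), ("Gnu's info files", ["~gnu/info*"])]
right = [("Gnu's info files", ["~gnu/info*"]), ("Gnu's pages", ["~gnu*"])]

print(first_match(wrong, "~gnu/info/index.html"))  # Gnu's pages
print(first_match(right, "~gnu/info/index.html"))  # Gnu's info files
```

- With the wrong ordering the info page is credited to "Gnu's pages";
- with the right ordering each request reaches the intended entry.
-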
- Currently page-stats.pl will skip lines in the access_log that
- contain references to ".gif", ".jpg" or ".jpeg" files, even if you
- specify matching URLs. If you need the program to be able to
- handle references to those pictures, you should comment out the
- lines as indicated in the code.
-
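- A rough sketch of such an image filter is given here, in Python
- rather than the script's own Perl. It assumes the access_log uses the
- Common Log Format, where the request appears between double quotes;
- the name is_image_request is hypothetical, and the actual test in
- page-stats.pl may differ.

```python
IMAGE_SUFFIXES = (".gif", ".jpg", ".jpeg")


def is_image_request(log_line):
    """Return True if the requested path ends in a known image suffix."""
    # Assumed layout: host ident user [date] "METHOD path PROTOCOL" status bytes
    try:
        request = log_line.split('"')[1]
        path = request.split()[1]
    except IndexError:
        # Not a line in the assumed format; do not treat it as an image.
        return False
    return path.endswith(IMAGE_SUFFIXES)
```

- Lines for which this returns True would be skipped before any
- reference is compared.
-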
- Note that once the first matching reference is found, the search
- for matches ends. Only that first entry is counted as a match, and
- its counter is increased.
-
- The HTML "Page Statistics" file is created from two files. These
- are the ident file with references to check, and a source file that
- contains the basic HTML page as desired. The name of the source
- file is determined by replacing the mandatory ".ident" ending of
- the ident file with ".source". The HTML file that is created is
- named in the same way, ending in ".html".
-
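- The naming rule can be illustrated with a short Python sketch. The
- function name derived_names is hypothetical, and raising an error for
- a missing ".ident" ending is an assumption about how the mandatory
- ending might be enforced.

```python
def derived_names(ident_path):
    """Return the (source, html) file names that belong to an ident file."""
    suffix = ".ident"
    if not ident_path.endswith(suffix):
        raise ValueError("ident file name must end in .ident: %r" % ident_path)
    stem = ident_path[:-len(suffix)]
    return stem + ".source", stem + ".html"


print(derived_names("page-stats.ident"))
# ('page-stats.source', 'page-stats.html')
```
-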
- It is possible to use certain variables in the source file. These
- variables will be replaced by page-stats.pl as it rummages through
- the file.
-
- $date The current date and time will be inserted for this
- variable.
-
- $firstrequest
- The date and time of the first request logged in the
- access_log will be inserted for this variable.
-
- $lastrequest
- The date and time of the last request logged in the
- access_log will be inserted for this variable.
-
- $list This will be replaced by the complete list of references
- and their number of hits.
-
- $topN This will insert a sorted list of the N most visited pages,
- where N can be any number. Setting a number greater than the
- number of references is pointless. There must be no space
- between "$top" and the number.
-
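- The variable replacement described above might look roughly like the
- following Python sketch. It is not the script's Perl code: the
- function fill_template, its parameters, and the plain "title: hits"
- output are assumptions made for illustration, since the real program
- produces HTML whose layout is not described here.

```python
import re


def fill_template(source_text, counts, date, first_request, last_request):
    """Replace the $-variables in source_text.

    counts is a list of (title, hits) pairs in identfile order.
    """
    def render(rows):
        return "\n".join("%s: %d" % (title, hits) for title, hits in rows)

    def substitute(match):
        name = match.group(1)
        if name == "date":
            return date
        if name == "firstrequest":
            return first_request
        if name == "lastrequest":
            return last_request
        if name == "list":
            return render(counts)
        # Otherwise the variable is $topN; group 2 holds the digits of N.
        n = int(match.group(2))
        ranked = sorted(counts, key=lambda row: row[1], reverse=True)
        return render(ranked[:n])

    # "$top" must be followed directly by the number, with no space.
    pattern = r"\$(date|firstrequest|lastrequest|list|top(\d+))"
    return re.sub(pattern, substitute, source_text)
```

- For example, with counts of [("A", 3), ("B", 10), ("C", 5)], the
- text "$top2" becomes the two lines "B: 10" and "C: 5".
-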
- OPTIONS
- -b Benchmark; print the user and system times used when finished.
-
- -h Displays this manual page.
-
- -i identfile
- Specify the file that determines which references to look
- for in the logfile. This defaults to 'page-stats.ident'.
-
- -l logfile
- Specify the access_log of the http daemon. The default
- location is '/usr/local/httpd/logs/access_log'.
-
- FILES
- access_log (generated by httpd)
- <identname>.ident
- <identname>.source (optional)
- <identname>.html (generated by page-stats.pl)
-
- SEE ALSO
- httpd(1).
- http://www.sci.kun.nl/thalia/guide/#page-stats
- For the latest version.
- http://www.sci.kun.nl/thalia/page-stats/
- For a working example.
-
- CHANGES
- 03-01-1995: (v1.0) First draft of the program.
- 03-17-1995: (v1.1) Added 'total number of requests' at the bottom
- of the page.
- 05-26-1995: (v1.2) Added '$topN' and '$list'; juggled with the
- code. Improved performance by skipping images in
- access_log. Allowed comments in the ident file. Also
- moved the external README into the code.
- 07-17-1995: (v1.3) You can now use wildcards to define URLs to
- recognize. Now uses arrays instead of strings to
- manage URLs.
-
- BUGS
- If the access_log is big, and there are many references to check,
- this program can take very long to complete. It is recommended
- that both the size of the access_log and the number of references
- be kept to acceptable levels.
-
- The program might not work because the path to Perl in the first
- line of page-stats.pl is wrong. See if the path is correct by
- doing 'which perl' at your Unix prompt. If it is not correct, you
- will have to edit the first line.
-
- AUTHOR
- Mark Koenen <markko@sci.kun.nl>,
- changes by Patrick Atoon <patricka@cs.kun.nl>
-
-