Kermit sitemap script

C-Kermit 9.0 Sitemap Script

Frank da Cruz
The Kermit Project
Columbia University
Last update: Tue Dec 7 09:57:13 2010

Builds a sitemap.xml file for a website, with Google image extensions. Requires: C-Kermit 9.0 Alpha.03 or later.

Entirely data-driven: the script reads a "filelist" file containing the names and attributes of the pages and images to be included in the sitemap.

Optional command-line argument: the path of the filelist file. If the argument is given, the web directory is assumed to be the directory that contains the filelist. If it is not given, a file named "filelist" in the current directory is used if it exists.
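The lookup rule above can be sketched in Python. This is only an illustration of the described behavior, not the Kermit script's own code; the function name locate_filelist and its return shape are assumptions made for the example.

```python
import os


def locate_filelist(args, cwd="."):
    """Return (filelist_path, web_directory), or None if no filelist is found.

    args is the list of command-line arguments after the program name.
    (Hypothetical helper; the name and return value are assumptions.)
    """
    if args:
        # Argument given: the web directory is the directory holding the filelist.
        path = os.path.abspath(args[0])
        return path, os.path.dirname(path)
    # No argument: fall back to "filelist" in the current directory, if present.
    path = os.path.join(os.path.abspath(cwd), "filelist")
    if os.path.isfile(path):
        return path, os.path.dirname(path)
    return None
```

Calling locate_filelist(sys.argv[1:]) would then give both the file to read and the directory that the relative filenames refer to.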

The filelist file contains the names of html and image files relative to the web directory. It can contain comment lines that begin with #. Blank lines are ignored. Nonblank, non-comment lines are in this format:

tag:value

A colon (:) separates the tag from the value (USE =). If you need to include a colon in the value, surround the value with doublequotes ("").

The first data lines apply to the whole site. The first line must have a tag of "home" and a value giving the URL of the website root directory, ending with a slash, for example:

home:http://kermit.columbia.edu/

This is used to form the full URLs of the files and images in the website. All filenames in the filelist must be relative to this URL.

The rest of the file contains lines with the following tags. An optional "geo" line can contain the site's geographic location:

geo:New York City

An optional "lic" line can contain the filename of the page that contains the site's copyright and license notices:

lic:copyright.html
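As a rough illustration of the format described so far, the following Python sketch reads the tag:value lines and picks out the site-wide "home", "geo", and "lic" settings. It is not the Kermit script's implementation. The function names are hypothetical, and two behaviors are assumptions: lines are split at the first colon only (so the "home" URL needs no quoting), and doublequotes around a value are removed.

```python
from urllib.parse import urljoin


def parse_filelist(text):
    """Yield (tag, value) pairs from filelist text, skipping blanks and # comments."""
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        # Assumption: split at the first colon only, so "home:http://..." works.
        tag, sep, value = line.partition(":")
        if not sep:
            raise ValueError(f"missing colon in line: {raw!r}")
        value = value.strip()
        # Assumption: a value wrapped in doublequotes has the quotes removed.
        if len(value) >= 2 and value[0] == '"' and value[-1] == '"':
            value = value[1:-1]
        yield tag.strip(), value


def read_site_header(pairs):
    """Split parsed pairs into site-wide settings and the remaining entries."""
    pairs = list(pairs)
    if not pairs or pairs[0][0] != "home":
        raise ValueError('first data line must have the tag "home"')
    home = pairs[0][1]
    if not home.endswith("/"):
        raise ValueError("home URL must end with a slash")
    site = {"home": home, "geo": None, "lic": None}
    rest = pairs[1:]
    # The optional geo and lic lines come before the per-page entries.
    while rest and rest[0][0] in ("geo", "lic"):
        tag, value = rest.pop(0)
        site[tag] = value
    return site, rest


sample = """\
# comment
home:http://kermit.columbia.edu/

geo:New York City
lic:copyright.html
url:index.html
"""
site, entries = read_site_header(parse_filelist(sample))
# Full URLs are formed by joining a relative filename onto the home URL.
full_url = urljoin(site["home"], entries[0][1])
```

Here the "url:index.html" line stands in for the per-page entries that make up the rest of the file.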

The remainder of the file contains lines for each file and image you want to include in your sitemap. For each page, the lines should appear in the following order:

url: name of an html file in the current directory or in a subdirectory.
pri: priority of the page, 0.0 to 1.0 (OPTIONAL)

If there are images on the page that you want to include in the sitemap:

img: name of an image file
cap: caption for image file (OPTIONAL)
tit: title for image file (OPTIONAL)
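The ordering of the per-page lines can be illustrated with a small Python sketch that groups already-parsed (tag, value) pairs into one record per page. This is not the Kermit script's own logic. The function name group_pages, the record layout, the sample filenames, and the choice to reject tags other than the five listed here are all assumptions made for the example.

```python
def group_pages(entries):
    """Group (tag, value) pairs into one dict per page, in file order."""
    pages = []
    for tag, value in entries:
        if tag == "url":
            # A url line starts a new page record.
            pages.append({"url": value, "pri": None, "images": []})
        elif not pages:
            raise ValueError(f"{tag!r} line appears before any url line")
        elif tag == "pri":
            priority = float(value)
            if not 0.0 <= priority <= 1.0:
                raise ValueError(f"priority out of range: {value}")
            pages[-1]["pri"] = priority
        elif tag == "img":
            # An img line starts a new image record on the current page.
            pages[-1]["images"].append({"img": value, "cap": None, "tit": None})
        elif tag in ("cap", "tit"):
            if not pages[-1]["images"]:
                raise ValueError(f"{tag!r} line appears before any img line")
            # cap and tit attach to the most recent image.
            pages[-1]["images"][-1][tag] = value
        else:
            # Assumption: only the tags covered in this sketch are accepted.
            raise ValueError(f"unknown tag: {tag!r}")
    return pages


# Hypothetical sample entries, in the order the lines would appear.
entries = [
    ("url", "index.html"),
    ("pri", "1.0"),
    ("img", "logo.png"),
    ("cap", "Project logo"),
    ("url", "docs/manual.html"),
]
pages = group_pages(entries)
```

Each record then holds everything needed to write one page entry of the sitemap, with its images nested under it.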