The Kermit Project
Columbia University
612 West 115th Street, New York NY 10025 USA • kermit@columbia.edu
Frank da Cruz
The Kermit Project
Columbia University
Last update: Tue Dec 7 16:54:21 2010
Download: http://kermit.columbia.edu/ftp/scripts/ckermit/ksitemap
Requires: C-Kermit 9.0 Alpha.03 or later.
The ksitemap script builds a sitemap.xml file for a website from a data file that you provide. The data file lists the pages and images you want to include in your sitemap, along with their properties. Images are described using the Google Sitemap Image Extensions. A sitemap helps search engines such as Google, Yahoo, Bing, and Ask index your site better. Read about sitemaps here.
ksitemap is entirely data-driven: it reads a file-list file (or “filelist” for short) containing the names and attributes of the pages and images to be included in the sitemap. The filelist is kept in the web directory itself, but it need not be world-readable.
Give the pathname of the filelist as the command-line argument. The pathname can be absolute, symbolic, or relative:

$ ksitemap /www/filelist        (absolute)
$ ksitemap ~/web/filelist       (symbolic)
$ ksitemap web/filelist         (relative)
$ ksitemap ../web/filelist      (relative)

If you give a directory name without a filename, 'filelist' is used as the filename.
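If you invoke ksitemap without a command-line argument, it looks for the filelist in the directory named by the KSITEMAPDIR environment variable. So if you define KSITEMAPDIR, for example:

export KSITEMAPDIR=/net/w/0/htdocs/username/web/

and the name of the file-list file is 'filelist', then you can run ksitemap from any directory at any time without a command-line argument.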
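The lookup rules above can be sketched in Python. This is an illustration only, not ksitemap's own code (which is a C-Kermit script). The function name resolve_filelist is hypothetical, and the sketch assumes that a leading ~ is expanded the way the shell would expand it, that an argument naming an existing directory gets 'filelist' appended, and that having neither an argument nor KSITEMAPDIR is an error; the page does not say what ksitemap does in that last case.

```python
import os

def resolve_filelist(arg=None, env=None):
    """Return the filelist pathname (hypothetical helper, not part of ksitemap).

    arg: the command-line argument, or None if none was given.
    env: a mapping of environment variables (defaults to os.environ).
    """
    if env is None:
        env = os.environ
    if arg is None:
        # No argument: look in the directory named by KSITEMAPDIR.
        base = env.get("KSITEMAPDIR")
        if not base:
            # Assumption: the page does not say what happens in this case.
            raise ValueError("no filelist given and KSITEMAPDIR is not set")
        return os.path.join(os.path.expanduser(base), "filelist")
    path = os.path.expanduser(arg)  # assumption: '~' is expanded as in the shell
    if os.path.isdir(path):
        # A directory name without a filename: use 'filelist' in that directory.
        return os.path.join(path, "filelist")
    return path

print(resolve_filelist(env={"KSITEMAPDIR": "/net/w/0/htdocs/username/web/"}))
# /net/w/0/htdocs/username/web/filelist
```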
For debugging and testing, invoke it like this:
$ DEBUG=1 ksitemap args
This prints progress messages and writes the sitemap.xml file into a "tmp" directory.
The filelist can contain comment lines, such as:

# This is a comment line

It can also contain blank lines, which are ignored. Nonblank, non-comment lines are in this format:
tag=value
An equal sign (=) separates the tag from the value. If you need to include an equal sign in the value itself, enclose the value in ASCII double quotes. Examples:

cap=View from the Empire State Building looking East
cap="A+B=C"
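A parser for this line format might look like the following Python sketch. It is not the ksitemap implementation; the function name parse_filelist_line is hypothetical, and the sketch assumes that surrounding whitespace (such as indentation) is ignored, that a comment line is one whose first nonblank character is #, and that the tag ends at the first equal sign. Because it splits only at the first equal sign, a quoted value simply has its quotes removed.

```python
def parse_filelist_line(line):
    """Parse one filelist line into a (tag, value) pair, or None if it is ignored.

    Hypothetical helper: whitespace around the line is stripped, and the tag
    is everything before the first '='.
    """
    text = line.strip()
    if not text or text.startswith("#"):
        return None  # blank lines and comment lines are ignored
    tag, sep, value = text.partition("=")
    if not sep:
        raise ValueError(f"missing '=' in line: {line!r}")
    # A value wrapped in ASCII double quotes may contain '='; drop the quotes.
    if len(value) >= 2 and value.startswith('"') and value.endswith('"'):
        value = value[1:-1]
    return tag, value

print(parse_filelist_line('cap="A+B=C"'))  # ('cap', 'A+B=C')
```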
The first few lines define parameters for the whole website:
Tag   Status     Value
home  Required   The URL of the website's home directory (with no filename part)
geo   Optional   The default geographical location for images, if any
lic   Optional   The default filename, if any, for a page containing copyright or license information for the site's original images
These items should come before any of the page-specific items described below. If you include a geo or lic tag before the first url tag (see below), it is used for any image for which you do not specify its own geo or lic tag. In other words, the tags in the top section are global, and the ones in an img section are local to that image.
The value of the home line is the URL of the website's root directory, ending with a slash, for example:

home=http://kermit.columbia.edu/

This is used to form the full URLs of the files and images in the website. Example:

home=http://kermit.columbia.edu/
geo=New York City USA
lic=copyright.html
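How the home URL might combine with the relative names in the filelist can be illustrated with Python's urllib.parse.urljoin. This is an assumption about the technique, not a description of how ksitemap does it, and full_url is a hypothetical helper. The example also shows why the home URL should end with a slash: without one, urljoin treats the last path component as a filename and replaces it.

```python
from urllib.parse import urljoin

def full_url(home, name):
    """Combine the home URL with a filename or subdirectory path (hypothetical helper).

    If 'name' is already a complete URL, urljoin returns it unchanged.
    """
    if not home.endswith("/"):
        raise ValueError("home URL must end with a slash")
    return urljoin(home, name)

home = "http://kermit.columbia.edu/"
print(full_url(home, "cudocs/ilosetup.html"))  # http://kermit.columbia.edu/cudocs/ilosetup.html
print(full_url(home, "copyright.html"))        # http://kermit.columbia.edu/copyright.html

# Why the slash matters: urljoin replaces the last component of a base without one.
print(urljoin("http://kermit.columbia.edu/ftp", "x.html"))  # http://kermit.columbia.edu/x.html
```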
The remainder of the file contains lines for each file and image you want to include in your sitemap. For each page, the lines should appear in the following order:
Tag  Status    Value
url  Required  Name of an HTML file in the root directory or in a subdirectory
pri  Optional  Priority of the page, 0.0 to 1.0 (default 0.5)
For each URL, the page date is supplied automatically from the modification date of the file, and the change frequency (daily, weekly, monthly, or yearly) is chosen according to how recently the file was last modified.
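The date logic can be sketched in Python as follows. The page does not say where ksitemap draws the lines between daily, weekly, monthly, and yearly, so the cutoffs in change_frequency (1, 7, and 31 days) are hypothetical, as are both function names. The lastmod value uses the YYYY-MM-DD form seen in the sample output.

```python
import os
from datetime import date

def change_frequency(lastmod, today):
    """Map a file's age in days to a sitemap changefreq value (hypothetical cutoffs)."""
    age = (today - lastmod).days
    if age <= 1:
        return "daily"
    if age <= 7:
        return "weekly"
    if age <= 31:
        return "monthly"
    return "yearly"

def page_dates(path, today=None):
    """Return (lastmod, changefreq) for the file at 'path', from its modification time."""
    lastmod = date.fromtimestamp(os.path.getmtime(path))  # local date of last change
    today = today or date.today()
    return lastmod.isoformat(), change_frequency(lastmod, today)

print(change_frequency(date(2010, 11, 20), date(2010, 12, 7)))  # monthly
```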
If there are images on the page that you want to include in the sitemap:
Tag    Status    Value
img    Required  Name of an image file in the root directory or in a subdirectory
cap    Optional  A text caption for the image
title  Optional  A text title for the image
geo    Optional  The geographical location of this image only
lic    Optional  The filename or URL of the license page for this image only
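The rule that an img section's own geo and lic override the site-wide ones can be sketched in Python. resolve_image and its dictionary layout are hypothetical; the output keys are named after the image: elements in the sample sitemap, and relative names are turned into full URLs with urllib.parse.urljoin, which leaves an absolute lic URL unchanged.

```python
from urllib.parse import urljoin

def resolve_image(site, img):
    """Merge one img section with the site-wide defaults (hypothetical helper).

    site: global tags, e.g. {"home": ..., "geo": ..., "lic": ...}
    img:  tags from one img section, e.g. {"img": ..., "cap": ..., "geo": ...}
    """
    home = site["home"]
    result = {"loc": urljoin(home, img["img"])}
    if "cap" in img:
        result["caption"] = img["cap"]
    if "title" in img:
        result["title"] = img["title"]
    geo = img.get("geo", site.get("geo"))  # the local value wins, else the global one
    if geo:
        result["geo_location"] = geo
    lic = img.get("lic", site.get("lic"))
    if lic:
        result["license"] = urljoin(home, lic)
    return result

site = {"home": "http://kermit.columbia.edu/", "geo": "New York City USA",
        "lic": "copyright.html"}
print(resolve_image(site, {"img": "modemcable.jpg", "cap": "Modem Cable Schematic",
                           "geo": "Bedford MA"}))
```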
Here's a brief example with three files. For the first file (index.html), a priority is specified; for the others, the default priority is used. The second file is in a subdirectory. The third file has images. Comments, blank lines, and indentation are used for clarity, but they do not affect the result. There should be no spaces before or after the equal signs.
# ksitemap filelist for building sitemap.xml

home=http://kermit.columbia.edu/
geo=New York City USA
lic=copyright.html

url=index.html
  pri=1.0

url=cudocs/ilosetup.html

url=cable.html
  img=connectors-340.jpg
    cap=Male and Female RS-232 Connectors
    title=Serial Data Connectors
  img=modemcable.jpg
    cap=Modem Cable Schematic
    geo=Bedford MA
  img=nullmodem-480.jpg
    cap=Null Modem Cable Schematic
    lic=special.html
The resulting sitemap.xml looks like this:
<?xml version="1.0" encoding="ISO-8859-1"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>http://kermit.columbia.edu/</loc>
    <lastmod>2010-12-07</lastmod>
    <changefreq>daily</changefreq>
    <priority>1.0</priority>
  </url>
  <url>
    <loc>http://kermit.columbia.edu/cudocs/ilosetup.html</loc>
    <lastmod>2010-12-07</lastmod>
    <changefreq>daily</changefreq>
    <priority>0.5</priority>
  </url>
  <url>
    <loc>http://kermit.columbia.edu/cable.html</loc>
    <lastmod>2010-12-07</lastmod>
    <changefreq>daily</changefreq>
    <image:image>
      <image:loc>http://kermit.columbia.edu/connectors-340.jpg</image:loc>
      <image:caption>Male and Female RS-232 Connectors</image:caption>
      <image:title>Serial Data Connectors</image:title>
      <image:geo_location>New York City USA</image:geo_location>
      <image:license>http://kermit.columbia.edu/copyright.html</image:license>
    </image:image>
    <image:image>
      <image:loc>http://kermit.columbia.edu/modemcable.jpg</image:loc>
      <image:caption>Modem Cable Schematic</image:caption>
      <image:geo_location>Bedford MA</image:geo_location>
      <image:license>http://kermit.columbia.edu/copyright.html</image:license>
    </image:image>
    <image:image>
      <image:loc>http://kermit.columbia.edu/nullmodem-480.jpg</image:loc>
      <image:caption>Null Modem Cable Schematic</image:caption>
      <image:geo_location>New York City USA</image:geo_location>
      <image:license>http://kermit.columbia.edu/special.html</image:license>
    </image:image>
    <priority>0.5</priority>
  </url>
</urlset>
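Putting the pieces together, the following Python sketch reads a filelist in the format described above and produces a sitemap with the same general shape as the sample. It is an illustration, not ksitemap itself, and all function names are hypothetical. To keep it short, lastmod and changefreq are passed in as fixed values rather than computed from each file's modification time, and pages without a pri line get 0.5, as in the sample. Two details differ from the sample output on purpose: the sketch declares the image: prefix with xmlns:image="http://www.google.com/schemas/sitemap-image/1.1", the namespace given in Google's image sitemap documentation, and it writes every page's priority right after changefreq. Like the sample, it lists the root index.html as the bare home URL.

```python
from urllib.parse import urljoin
from xml.sax.saxutils import escape

def parse_filelist(text):
    """Split filelist text into site-wide tags and a list of page dicts."""
    site, pages, current = {}, [], None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank lines and comments are ignored
        tag, _, value = line.partition("=")
        if len(value) >= 2 and value[0] == value[-1] == '"':
            value = value[1:-1]  # quoted value, which may contain '='
        if tag == "url":
            current = {"url": value, "images": []}
            pages.append(current)
        elif tag == "img":
            if not pages:
                raise ValueError("img line before any url line")
            current = {"img": value}
            pages[-1]["images"].append(current)
        elif current is None:
            site[tag] = value  # before the first url: a site-wide setting
        else:
            current[tag] = value  # belongs to the latest url or img line
    return site, pages

def page_loc(home, name):
    # Like the sample output, list the root index.html as the bare home URL.
    return home if name == "index.html" else urljoin(home, name)

def build_sitemap(site, pages, lastmod, changefreq):
    """Return sitemap XML text; lastmod and changefreq are fixed in this sketch."""
    home = site["home"]
    out = ['<?xml version="1.0" encoding="ISO-8859-1"?>',
           '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"',
           '        xmlns:image="http://www.google.com/schemas/sitemap-image/1.1">']
    for page in pages:
        out.append("  <url>")
        out.append(f"    <loc>{escape(page_loc(home, page['url']))}</loc>")
        out.append(f"    <lastmod>{lastmod}</lastmod>")
        out.append(f"    <changefreq>{changefreq}</changefreq>")
        out.append(f"    <priority>{escape(page.get('pri', '0.5'))}</priority>")
        for img in page["images"]:
            geo = img.get("geo", site.get("geo"))  # local tag wins over global
            lic = img.get("lic", site.get("lic"))
            out.append("    <image:image>")
            out.append(f"      <image:loc>{escape(urljoin(home, img['img']))}</image:loc>")
            for tag, elem in (("cap", "caption"), ("title", "title")):
                if tag in img:
                    out.append(f"      <image:{elem}>{escape(img[tag])}</image:{elem}>")
            if geo:
                out.append(f"      <image:geo_location>{escape(geo)}</image:geo_location>")
            if lic:
                out.append(f"      <image:license>{escape(urljoin(home, lic))}</image:license>")
            out.append("    </image:image>")
        out.append("  </url>")
    out.append("</urlset>")
    return "\n".join(out)
```

For example, build_sitemap(*parse_filelist(text), "2010-12-07", "daily") returns the XML for a filelist held in the string text. A fuller version would compute lastmod and changefreq for each file from its modification time instead of using one fixed value.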