XML: Services Lists
This documentation outlines the principles and format behind the
service lists that we keep for myChannels.opml, myStrips.opml and
myWebCams.opml. As per the myServices
document, the format closely resembles Dave Winer's Userland
format, but OCS has been
taken into consideration as well.
There are three kinds of services lists available for each type of service:
- The complete list is every service
still operational - we don't check for validity, we don't check
for freshness, we don't check for content - we only check if it's
there. For the channels service list, this file would be called
services-channels-complete.xml.
- The recent list is a pared-down version of
the complete list, and contains services updated in the past
month. For the channels service list, this file would be called
services-channels-recent.xml.
- Finally, the failure list contains services
that returned a 404, had an unreachable host, had internal server errors,
or otherwise did not return valid XML. For the channels service list, this file
would be called services-channels-failure.xml. The
failure list is randomly checked for revived listings, which are
added back to the complete list.
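The naming scheme above can be sketched as a small helper. This is only an illustration: the function name is hypothetical, and only the three channels file names come from this document. The names for other service types are assumed to follow the same pattern.

```python
# Hypothetical helper: build a service list file name from its type and kind.
# Only the "channels" names are given in this document; other service types
# are assumed to follow the same "services-<type>-<kind>.xml" pattern.
LIST_KINDS = ("complete", "recent", "failure")


def service_list_filename(service_type: str, kind: str) -> str:
    if kind not in LIST_KINDS:
        raise ValueError(f"unknown list kind: {kind!r}")
    return f"services-{service_type}-{kind}.xml"


for kind in LIST_KINDS:
    print(service_list_filename("channels", kind))
```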
A complete (yet rather minimal) service list is shown below.
<?xml version="1.0"?>
<servicelist>
<header>
<docs>http://www.disobey.com/amphetadesk/xml_services_lists.htm</docs>
<entries>1255</entries>
<updated>Thu, 15 Mar 2001 00:10:39 GMT</updated>
<version>1</version>
</header>
<services>
<service>
<added>Thu, 15 Mar 2001 14:28:19 GMT</added>
<description>The best little example description around!</description>
<error>No headers downloaded.</error>
<htmlurl>http://www.superlugnuts.com/example.html</htmlurl>
<id>f3cc9706db585cd9b776ae143897106b</id>
<imageurl>http://www.superlugnuts.com/image.jpg</imageurl>
<language>en</language>
<lastchecked>Mon, 05 Mar 2001 14:32:37 GMT</lastchecked>
<lastmodified>Thu, 08 Jun 2000 13:40:05 GMT</lastmodified>
<timeschecked>2</timeschecked>
<title>Super LugNuts and Happy Examples</title>
<xmlurl>http://www.superlugnuts.com/example.xml</xmlurl>
</service>
</services>
</servicelist>
|
You can see how this would show up if a user subscribed to it by checking out the myServices documentation.
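For readers who want to consume a service list programmatically, here is a minimal sketch using Python's standard xml.etree.ElementTree module. The function name read_services and the shortened SAMPLE document are made up for illustration; the sketch assumes every child of <header> and <service> is a simple text element, as in the example above.

```python
import xml.etree.ElementTree as ET

# Shortened, illustrative service list (not a real published file).
SAMPLE = """<?xml version="1.0"?>
<servicelist>
<header>
<entries>1</entries>
<version>1</version>
</header>
<services>
<service>
<id>f3cc9706db585cd9b776ae143897106b</id>
<title>Super LugNuts and Happy Examples</title>
<xmlurl>http://www.superlugnuts.com/example.xml</xmlurl>
</service>
</services>
</servicelist>"""


def read_services(xml_text):
    """Return (header, services): a dict plus a list of dicts keyed by tag name."""
    root = ET.fromstring(xml_text)
    header = {child.tag: child.text for child in root.find("header")}
    services = [
        {child.tag: child.text for child in service}
        for service in root.iter("service")
    ]
    return header, services


header, services = read_services(SAMPLE)
print(header["version"], services[0]["title"])
```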
<added>
Allowed Within: <service>
Frequency: Once
Required: Yes
<added> contains the date the service was added, in GMT.
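The dates in the example list look like the RFC 822 style used in email and HTTP headers. Assuming that holds for every date element, Python's standard email.utils module can turn them into timezone-aware values. The helper name parse_list_date is hypothetical.

```python
from datetime import timedelta
from email.utils import parsedate_to_datetime


def parse_list_date(text):
    """Parse a date such as 'Thu, 08 Jun 2000 13:40:05 GMT' into a datetime."""
    return parsedate_to_datetime(text.strip())


stamp = parse_list_date("Thu, 08 Jun 2000 13:40:05 GMT")
# A "GMT" suffix yields a timezone-aware value with a zero UTC offset.
print(stamp.isoformat(), stamp.utcoffset() == timedelta(0))
```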
|
<description>
Allowed Within: <service>
Frequency: Once
Required: No
The <description> element is determined by the actual service
and can be a string of any length. Most of the time, the service
author will put a little blurb about the service - other authors
will put the last modified date of the service.
|
<docs>
Allowed Within: <header>
Frequency: Once
Required: Yes
<docs> points to a valid URI explaining what all this means.
|
<entries>
Allowed Within: <header>
Frequency: Once
Required: Yes
<entries> gives the total number of services within the list.
|
<error>
Allowed Within: <service>
Frequency: Once
Required: No
<error> represents a short, simple, generic message (or
number, see below) for why the service is considered a failure.
Current applicable values are "No headers downloaded" (common when
the service host has timed out on responding, or when the service
host is unavailable) and "Error parsing XML" (common due to custom
404 pages, redirects to home pages, or when the data isn't valid
XML).
If <error> shows up within our complete or recent service
lists, then it represents the number of consecutive times
the service has errored. After three consecutive errors, the service
is automatically added to the failure list.
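The three-strikes rule can be sketched as follows. This is an illustration, not the actual checker: the function name and the dict representation of a service are hypothetical, and resetting the counter after a successful check is an assumption suggested by the word "consecutive" rather than something this document states.

```python
FAILURE_THRESHOLD = 3  # from the text: three consecutive errors


def record_check(service: dict, ok: bool) -> bool:
    """Update the consecutive-error count stored under 'error'.

    Returns True when the service should be moved to the failure list.
    """
    if ok:
        # Assumption: a successful check clears the consecutive-error count.
        service.pop("error", None)
        return False
    count = int(service.get("error", 0)) + 1
    service["error"] = str(count)
    return count >= FAILURE_THRESHOLD


service = {"title": "Super LugNuts and Happy Examples"}
for outcome in (False, False, False):
    should_move = record_check(service, outcome)
print(service["error"], should_move)
```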
|
<header>
Allowed Within: <servicelist>
Frequency: Once
Required: Yes
The <header> block holds some housekeeping and administrative
information, such as <docs> (where to go
for more information), <entries> (how
many services are in the list), <updated>
(when the service list was updated), and <version>
(what evolution this service list is at).
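One use of the header is a quick sanity check that the declared <entries> count matches the number of <service> elements actually present. The sketch below assumes the nesting shown in the example list (<header> and <services> directly under <servicelist>); the function name entries_match is hypothetical.

```python
import xml.etree.ElementTree as ET


def entries_match(xml_text: str) -> bool:
    """Return True if <entries> equals the number of <service> elements."""
    root = ET.fromstring(xml_text)
    declared = root.findtext("header/entries")
    if declared is None or not declared.strip().isdigit():
        return False  # missing or non-numeric count
    actual = len(root.findall("services/service"))
    return int(declared) == actual
```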
|
<htmlurl>
Allowed Within: <service>
Frequency: Once
Required: No
See Also: <imageurl>, <xmlurl>
As the name suggests, the <htmlurl> element contains the full
URI to the webpage of the service in question.
|
<id>
Allowed Within: <service>
Frequency: Once
Required: Yes
<id> is a unique identifier for the service: the 32-character
hexadecimal MD5 hash of the xmlurl. The xmlurl is just a seed and
shouldn't be given any importance; in myStrips.opml and myWebCams.opml,
you'd use the imageurl instead.
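A minimal sketch of deriving such an identifier with Python's standard hashlib module. The function name is hypothetical, and the choice of UTF-8 encoding with no URL normalization is an assumption, so this is not guaranteed to reproduce the ids found in the published lists.

```python
import hashlib


def service_id(seed_url: str) -> str:
    """Return the 32-character hexadecimal MD5 digest of the seed URL."""
    # Assumption: the URL is hashed as-is, encoded as UTF-8.
    return hashlib.md5(seed_url.encode("utf-8")).hexdigest()


# xmlurl for channels; imageurl would be the seed for strips and webcams.
print(service_id("http://www.superlugnuts.com/example.xml"))
```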
|
<imageurl>
Allowed Within: <service>
Frequency: Once
Required: Yes (myWebCams.opml, myStrips.opml); No (myChannels.opml)
See Also: <htmlurl>, <xmlurl>
Although <imageurl> sounds easy enough, there is a bit of
confusion concerning what exactly it means in different contexts.
In myChannels.opml, it means nothing, and you'll never see it.
If, on the other hand, you're messing with myStrips.opml
and myWebCams.opml, the <imageurl> points to the URI of the comic
strip or webcam image in question.
|
<language>
Allowed Within: <service>
Frequency: Once
Required: No
<language> holds the language that the service is published in.
This is currently ignored, but that may change in the future
(perhaps a filter by language? auto creation of urls for net
translators? any other suggestions?). Examples of valid languages:
"en", "en-us", "fr", etc.
|
<lastchecked>
Allowed Within: <service>
Frequency: Once
Required: No
See Also: <timeschecked>
<lastchecked> tells us when the service entry was last
checked for information, whether for existence, title and
description, or just the last modification date.
|
<lastmodified>
Allowed Within: <service>
Frequency: Once
Required: No
If the webserver reports a "Last-Modified:" field in the
response headers for the service, we insert that value in this
element. Typically, the results are in GMT. Not all servers
send a "Last-Modified:" header, so we can't rely
on it as being an adequate measure of service age.
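Because the header is optional, any code that fills in this element has to cope with its absence. A sketch, assuming the response headers are available as a mapping with a get method; the function name is hypothetical, and with a plain dict the key lookup is case-sensitive.

```python
from xml.sax.saxutils import escape


def lastmodified_element(headers):
    """Return a <lastmodified> element string, or None if the header is absent."""
    value = headers.get("Last-Modified")
    if not value:
        return None  # the server did not report a modification date
    return f"<lastmodified>{escape(value.strip())}</lastmodified>"


print(lastmodified_element({"Last-Modified": "Thu, 08 Jun 2000 13:40:05 GMT"}))
print(lastmodified_element({}))
```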
|
<service>
Allowed Within: <services>
Frequency: Multiple
Required: Yes
The <service> tag contains one service definition. There can
be an unlimited number of services in a single file.
|
<timeschecked>
Allowed Within: <service>
Frequency: Once
Required: No
See Also: <lastchecked>
Much like <lastchecked> tells
us when we last looked at the service for data, <timeschecked>
counts how many times we have looked at the service. This can be used
to make determinations on failed services, or just as a measure of
longevity in the service list.
|
<title>
Allowed Within: <service>
Frequency: Once
Required: Yes
<title> holds the title or name of the service in question and can be any length.
|
<updated>
Allowed Within: <header>
Frequency: Once
Required: Yes
The last time the list was updated, in GMT.
|
<xmlurl>
Allowed Within: <service>
Frequency: Once
Required: Yes (myChannels.opml); No (myWebCams.opml, myStrips.opml)
See Also: <htmlurl>, <imageurl>
<xmlurl> is the full URI to the document that contains the
channel xml. It's ignored for myStrips.opml and myWebCams.opml.
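The "Required" rows in this document can be collected into a small check. The sketch below treats a service as a dict keyed by element name; the function and constant names are hypothetical, and the rule table simply restates the requirements listed for <added>, <id>, <title>, <imageurl> and <xmlurl>.

```python
# Which URL element each target file requires, per the "Required" rows.
REQUIRED_URL = {
    "myChannels.opml": "xmlurl",
    "myStrips.opml": "imageurl",
    "myWebCams.opml": "imageurl",
}
# Elements required in every <service>, regardless of target file.
ALWAYS_REQUIRED = ("added", "id", "title")


def missing_fields(service: dict, target: str) -> list:
    """Return the names of required elements that are absent or empty."""
    required = ALWAYS_REQUIRED + (REQUIRED_URL[target],)
    return [name for name in required if not service.get(name)]


channel = {
    "added": "Thu, 08 Jun 2000 13:40:05 GMT",
    "id": "f3cc9706db585cd9b776ae143897106b",
    "title": "Super LugNuts and Happy Examples",
    "xmlurl": "http://www.superlugnuts.com/example.xml",
}
print(missing_fields(channel, "myChannels.opml"))
print(missing_fields(channel, "myStrips.opml"))
```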
|
<version>
Allowed Within: <header>
Frequency: Once
Required: Yes
The <version> tag allows us to track different
variations of the service list in question - the current version is "1".
|
Any questions about the above? Email morbus@disobey.com.
This footer was last updated 05/25/01; odds are the whole document was too.
|