INTERNET DRAFT Clive Best, Dirk-Willem van Gulik,
draft-vangulik-http-search-01.txt Michael Kleih, ISIS/CEO, JRC Ispra
Expires: 23/01/1998 Zavisa Bjelogrlic, Web Bridges
HTTP based Geo-temporal Searching (HGS).
Status of this Memo
===================
This document is an Internet-Draft. Internet-Drafts are working
documents of the Internet Engineering Task Force (IETF), its
areas, and its working groups. Note that other groups may also
distribute working documents as Internet-Drafts.
Internet-Drafts are draft documents valid for a maximum of six
months and may be updated, replaced, or obsoleted by other
documents at any time. It is inappropriate to use Internet-
Drafts as reference material or to cite them other than as
``work in progress.''
To learn the current status of any Internet-Draft, please check
the ``1id-abstracts.txt'' listing contained in the Internet-
Drafts Shadow Directories on ftp.is.co.za (Africa),
nic.nordu.net (Europe), munnari.oz.au (Pacific Rim),
ds.internic.net (US East Coast), or ftp.isi.edu (US West Coast).
This draft expires in six months.
Abstract:
=========
This draft specifies the first three levels of operation of an http
based distributed search protocol. It is designed for parallel
client-side searching of geospatial catalogues. An important design
objective is to minimize the impact and extra resources for catalogue
sites which already have existing WWW gateways and search interfaces.
Recent Changes: 00 to 01
========================
Since the first draft, several interoperable prototypes have been made
available at the URL http://www.ceo.org/hgs/. They span a wide array of
EO data services such as ewse, enrm, isis, dali and fgdc.
Keywords were removed at the Directory level, because of the difficulty
of defining a standard thesaurus. The Advertise directory structure was
generalised to allow distributed collections of resources. 'Explain' is
now referenced by URI and is made explicitly for local attribute
searching. 'SetURI' and 'SiteURI' were merged into 'URI'. The 'free' vs
'text' tag names were corrected.
Introduction
============
WWW interfaces to databases referenced by geospatial and temporal
attributes are a growing public resource for Environment, GIS and
Earth Observation. Each provider has developed their own specialised
Web interface that best represents their particular database schema.
These databases are then searched by a Web Browser typically via a map
and time interface. The results are returned by generated html.
Experience with these interfaces suggests that the most sought-after
requirements of end-users for searching these databases are few in
number and relatively standard in form. End-Users require to search
for resources which relate to a spatial area, a certain temporal
coverage, and perhaps one or more keywords. This search-request is
typically repeated on different data collections at several,
operationally disjoint sites. Only if sufficient information is
found, is the user willing to explore the collection-specific
retrieval facility.
The HTTP based Geo-temporal Searching (HGS) protocol then
defines a mechanism whereby disparate remote databases can be searched
through a single standard HTTP interface. Making use of either HTTP
based GET or POST operations, several standard queries are defined. A
database with an HGS interface will become remote searchable
via any HTTP client, robot, or special user-agent. Easy deployment will
have minimal impact on existing web-based query-and-retrieval
infrastructure. This has been one of the main design considerations.
The core of the protocol consists of three types of stateless
client-server interactions over HTTP: 'Advertise', 'Search' and
'Explain'. The 'Advertise' request yields information on the dataset
and services provided by an information provider; the 'Search' request
allows for actual searching; whereas 'Explain' provides a mechanism to
publish specific database attributes in use by a specific provider.
This information can then be used to enhance the 'Search' beyond
the simple, and mandatory, spatial, temporal and free text
criteria.
There is no 'Retrieve' request as such since existing (perhaps
standard) web interfaces will be pointed to by means of a URI.
The URI itself can contain services like Browse and Ordering,
but this is outside the domain of HGS.
One aim of HGS is to allow the construction of general purpose
(Java) clients and user-agents. Ultimately a user could specify the
search criteria, and then expect the user-agent to contact the
various collections, and present a single collated result set.
Once one or more promising resources are identified via URIs,
conventional technology is used to further interface to the site.
HGS does not aim to intervene in this part of the process, since
it will be site dependent.
Service discovery will be supported by maintaining a register of
compliant servers, possibly distributed at each site. These registries
allow clients to configure themselves dynamically. This service layer
is defined as the 'Advertise' layer.
All servers must support the 'Advertise' layer even if they contain
just one reference to the self same service. However it is expected
that user and research communities will maintain structured lists of
related services. In addition one can envisage that a specialist client will
support a local user defined register of search sites customised
according to given interests.
The above notion of service discovery borrows from the popularity of
'hotlists' and lists-of-lists. The idea is that this level 0
information might be presented by the user-agent in a partially nested
or linked way. The user is then able to select certain services or
groups of services, and move these to a list of preferred servers.
The advertise layer is both distributed and structured. Entries can
be grouped into "collections". Entries can point directly to a "Search"
URI or link to another collection. A provider can therefore publish
their data holdings according to different classifications (collections)
e.g. 'by Platform', 'by Sensor', 'by Dataset'.
The 'Search' function will be satisfied by all servers and allow an
interoperable geographic/temporal search across all systems. An
additional, optional free text field will be supported. It is
envisioned that the returned information will allow the construction
of graphic coverage maps and footprints.
The optional 'Explain' function allows a server to publish its particular
database schema for more targeted searching. Such customised searching
can only apply to servers with the same attributes. The Explain data is
accessed on demand by the client.
Architecture
============
Although the protocol definition allows for a very flexible
infrastructure, the dogma that 'a client can do what it
wants' holds true. The design assumes that:
1. The client will do most of the work. It will contact the
advertise and search services directly, and will be responsible
for any caching. The client is responsible for result
collation and possible state handling.
2. The Advertise services do not necessarily share the same
machine as the Search services. Furthermore it is expected that
user-communities might operate shared directory services or
interconnected directories for specific datasets and resources.
3. The search service does not necessarily share the same machine
as the normal web-based retrieval interface; a third party may
also reverse engineer a retrieval URL on an arbitrary web
catalogue.
4. Although the search request can be 'fuzzy', the returned records
do have accurate spatial and temporal coverage information, thus
reducing, where possible, the need for repeated, refining queries.
'Advertise'
===========
A directory of servers is accessed via a url. Servers should
publish and maintain the directory on a standard url construction,
such as http://fully.qualified.host.name/hgs.txt. Agencies can
maintain master copies of directories and each server can
maintain a local version for updating and access. Service discovery
can therefore be distributed. A given physical http server can
support more than one service. Each service should be specified
by a short text descriptor.
The information below is provided over the normal HTTP service
layer, including the MIME encapsulation of specific languages,
character sets and encoding. A server must be able to serve a
request for a version of this document in the latin-1 character
set, 8 bit encoding and the English language.
The cache/mirror support of HTTP can be used by providers to
give an indication of the time-to-live, date of last modification
and/or expiry date, thus enabling a sufficiently clever UserAgent
to refresh the directory data when needed.
The content of the reply to the 'Advertise' request uri consists of an
entry describing the discovery service, followed by a series of
directory entries, separated by a blank line. The entry for the
discovery service is identical to the directory entry, except for
the omission of the URL entry. In addition a pointer to an optional
'Explain' resource may be provided.
Server Specific Information
---------------------------
Firstly the general collection/directory information itself is
presented from the server.
The content of the collection node for directory discovery is as
follows; each item is encoded using colon separated RFC822 field
value pairs.
Version The HGS protocol version understood by this
directory service; major and minor version number.
Changes in minor version number are expected to be
backward and forward compatible (at a possible loss
of functionality).
Example: 1.00
Mandatory, non-repeatable
Name The name of this directory service. Clients must be
able to cope with names of at least 70 symbols in
length. However a client should be able to handle
strings of arbitrary length.
Example: The Wetland HGS Directory service
Mandatory, non-repeatable
Description A short description of this directory service and
the community it addresses. Clients must be able to
cope with a description of at least 4 kbyte in
length.
Example: A collection of services, with datasets
relating to wetlands, marshes, estuaries and
coastal zones, mostly in an ecology and
biodiversity context.
Optional, non-repeatable
URI A full URL to a more elaborate description of this
directory service, the organization which operates it
and possible contact addresses. Repeatable entries
are each to refer to pages which convey the same
information.
Example: http://www.ceo.org/wetland-hgs-service.html
Optional, repeatable
Contact ???
Currently under consideration is the addition of Expiry and Time
to live information; and its position relative to similar information
already passed in the HTTP request.
Directory Information
---------------------------
A server then presents the entries in the directory held at that
site, separated by a blank line.
The content of each entry in the directory for the given
collection node is as follows; each item is encoded using colon
separated RFC822 field value pairs.
Version The HGS protocol version, major and minor
version number. Changes in minor version number are
expected to be backward and forward compatible (at a
possible loss of functionality).
Example: 1.00
Mandatory, non-repeatable
Name The name of the service or the collection. Clients
must be able to cope with names of at least 70
symbols in length. However a client should be able
to handle strings of arbitrary length.
Example: BRSC Bird sightings, by location, time and
stock migration
Mandatory, non-repeatable
Description A short description of the service or data
collection. Clients must be able to cope with a
description of at least 4 kbyte in length.
Example: Processed Field data collected by the BRSC
service for the mainland of Germany, the Ems
Estuary, the Waddensea and the Skylle. With
confirmed sightings and ring identifier
numbers.
Optional, non-repeatable
URI A full URL to a more elaborate description of the
resource, collection and organization responsible for
the service pointed at by the 'URL'. Repeatable
entries are each to refer to pages which convey the
same information.
Example: http://www.rbrc.org/field/desc.html
Optional, repeatable
DirectoryURI The full URL pointing to another Advertise directory.
The presence of this variable indicates that this
resource is effectively a collection of resources
which can be broken into one or more resource sets.
Example: http://www.rbgs.org/hgs.txt
Optional, repeatable
SearchURI The full URL of the search interface(s). Repeatable
entries are each to refer to pages which operate on
the same information space.
The presence of this entry indicates that this resource
is a searchable set.
Example: http://fully.qualified.domain.name/cgi-bin/hgs-search.pl
Mandatory, repeatable
ExplainURI The full URL of an optional "Explain" resource.
Example: http://www.food.bar/explain.txt
Optional, repeatable
It is important that the 'DirectoryURI', 'SearchURI' and 'ExplainURI'
are not mutually exclusive, but can be used in parallel. Most databases
consist of logical datasets, and often the publisher wishes to make
searches possible on both the entire dataset and on individual sets.
Currently under consideration is the addition of administrative
information, such as a technical contact email address and date
of last modification, expiry and time to live.
Example: Service Information
-------------------------------
Version: 1.00
Name: Marine Environment Resources
Description: The Marine Environment Unit aims to develop,
demonstrate and validate methodologies for the use of data from
space and airborne Platforms in both operational applications and
scientific investigations related to the marine environment.
URI: http://me-www.jrc.it
Example: Service Directory
---------------------------
Version: 1.00
Name: CORSA / Ocean Colour European Archive Network
Type: Searchable
Description: The Ocean Colour European Archive Network (OCEAN)
Project, was established in 1990 as a co-operation between the
Joint Research Centre (JRC) of the European Commission (EC), with
the support of the EC Directorate General XI, and the European
URI: http://me-www.jrc.it/OCEAN/ocean.html
SearchURI: http://www.ceo.org/hgs/dbi.pl/corsa
Version: 1.00
Name: CORSA / Cloud and Ocean Remote Sensing around Africa
Description: The Cloud and Ocean Remote Sensing around Africa
(CORSA) project aims to provide a quality controlled data set of
surface, atmospheric and cloud parameters over a time period, and
at a resolution, not available from any other source. The proj
URI: http://me-www.jrc.it/CORSA/index.html
SearchURI: http://www.ceo.org/hgs/dbi.pl/ocean
Version: 1.00
Name: AVHRR Searchable HGS Resources
Description: A locally held list of searchable AVHRR resources
relevant to the CORSA Project of the Marine Environment Unit
URI: http://me-www.jrc.it/CORSA/AVHRR.html
DirectoryURI: http://me-www.jrc.it/CORSA/AVHRR/hgs.txt
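The Advertise reply described above (a service entry followed by
directory entries, separated by blank lines, each made of colon
separated field/value pairs) could be read along the following lines.
This is only a minimal sketch: the function name parse_hgs_entries is
hypothetical, and it assumes that a line which does not start with a
field name followed by a colon continues the previous value, as the
multi-line Description fields in the examples suggest.

```python
import re

# Assumption: a field name is a run of letters, digits or hyphens
# directly followed by a colon at the start of the line.
FIELD = re.compile(r"^([A-Za-z][A-Za-z0-9-]*):\s*(.*)$")

def parse_hgs_entries(text):
    """Split an Advertise reply into entries of (field, value) pairs."""
    entries, current = [], []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            # A blank line closes the current entry.
            if current:
                entries.append(current)
                current = []
            continue
        match = FIELD.match(line)
        if match:
            current.append((match.group(1), match.group(2)))
        elif current:
            # Assumed continuation of the previous field value.
            name, value = current[-1]
            current[-1] = (name, value + " " + line)
    if current:
        entries.append(current)
    return entries

SAMPLE = """Version: 1.00
Name: Marine Environment Resources
URI: http://me-www.jrc.it

Version: 1.00
Name: CORSA / Ocean Colour European Archive Network
Description: The Ocean Colour European Archive Network (OCEAN)
Project, was established in 1990
SearchURI: http://www.ceo.org/hgs/dbi.pl/corsa
"""

# The first entry describes the service, the rest are directory entries.
service, *directory = parse_hgs_entries(SAMPLE)
```

Entries are kept as lists of pairs rather than dictionaries because
several fields (URI, SearchURI, DirectoryURI) are repeatable.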
Higher Levels
=============
'Search' queries are to use standard HTTP GET and
POST requests; conveying values using the CGI/1.0 standard.
A number of field/value pairs in the request is defined for
all requests. The standard HTTP Accept-type, encoding and
language specifications are to be followed.
UserAgent The requesting user agent; string followed by a version
number.
Example: GeoSpava 1.00 (Sol 2.4/X11)
Mandatory, non-repeatable
Version Version of the request protocol used.
Example: 1.00
Mandatory, non-repeatable
Upgrade Client side request to upgrade to a different
protocol version.
Example: 1.02
Optional, non-repeatable
The reply of the server is governed by the normal HTTP protocol
and status codes. It is stressed that HTTP/1.1 already allows
for caching, proxying and access authorization. If the Content-type
of the reply is set to text/x-hgs; the reply is to be in rfc822
colon separated field/value pair format.
The following field/value pairs have a meaning across all levels:
Version Version of the protocol used by the server when
sending out the reply. Changes in minor version
number are to be both backward and forward
compatible; whereas major version are used to denote
an incompatible change. A server should upgrade, if
the client has sent a supported 'Upgrade' version.
Example: 1.00
Mandatory, Non repeatable
Engine Software used to carry out the request; name,
followed by a version number. (see http spec for
this, copy that bit) A server should support this.
Example: HGSGeoTem 0.01beta
Optional, Non-repeatable
Upgrade Version number of a higher level protocol, for which
the server is capable of handling requests. A client
can, if desired upgrade to this protocol level.
Example: 1.09
Optional, Non-repeatable
Comment A message, optionally displayed to the user
Example: Your search on the blabla returned 5 hits.
Optional, repeatable
Error An error occurred. The field value is intended for
the end user.
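The 'Version' and 'Upgrade' rules above might be handled by a client
roughly as follows. This is a sketch under stated assumptions, not part
of the protocol: the helper names are hypothetical, the minor version
is compared as an integer (so 1.09 is newer than 1.00), and an optional
'hgs/' prefix is tolerated because one of the header examples in this
draft carries it.

```python
def parse_version(value):
    """Return (major, minor) from strings such as '1.00' or 'hgs/1.00'."""
    number = value.strip().rpartition("/")[2]
    major, _, minor = number.partition(".")
    return int(major), int(minor or 0)

def is_compatible(ours, theirs):
    # Minor changes are expected to be compatible; a different major
    # version denotes an incompatible change.
    return parse_version(ours)[0] == parse_version(theirs)[0]

def pick_version(current, offered, supported):
    """Move to the offered 'Upgrade' version only if we support it."""
    if offered is None or offered not in supported:
        return current
    if parse_version(offered) > parse_version(current):
        return offered
    return current
```

A client that only understands 1.00 simply ignores an 'Upgrade: 1.09'
field, since upgrading is optional on both sides.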
Standard Search Request
=======================
Up to three search criteria can be imposed upon a search during
the request. In the reply the server indicates which of these
conditions was applied. A server, or the properties of the dataset
searched, might not support any, one or more of the limitations.
In this case the search is to continue as if that limitation was
not applied. The server MUST be able to cope with a client
breaking the connection when the number of records returned exceeds
the client's resources.
In the most extreme case, when the user agent does not specify any
criteria, or when the server cannot apply any of the criteria,
all records are to be returned.
As the search request is a one-off stateless interaction;
discrepancies and inaccurate matching, conversions and comparisons
are to be expected. For this reason the three search criteria are
intentionally in-exact. This allows the server to return possibly
false positives, and it puts some of the burden for detecting this
upon the user-agent and the final user. Unlike the more machine-
oriented exchange of the 'Explain' request, Human pattern recognition
and iterative refining is relied on. The user-agent application is
to be designed with such interaction in mind.
Type definition of criteria
---------------------------
Three datatypes are in use at the level-1 queries; for geospatial
coordinates, for time specifications and for partial substrings.
Servers and Clients must be able to handle floating point numbers
which have the fractional and integral part separated by a period
as well as a comma, regardless of the locale and/or language/charset
and encoding triple specified by HTTP.
Servers and Clients must use the following format for all floating
point numbers, regardless of the Locale.
digit = <0|1|2|3|4|5|6|7|8|9>
digits = < digit [digits] >
E = 'E'
sep = < . >
sign = <+ | ->
float = <[sign] digits [ sep [ digits ]] [ E [sign] <digits> ] >
In particular, no separation on the powers of thousand
is allowed; such as 10,000.00 .
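The float grammar above translates directly into a regular expression.
The sketch below (HGS_FLOAT and parse_hgs_float are hypothetical names)
checks only the format that must be used when writing numbers; it does
not attempt the more lenient reading with a comma as separator that is
also mentioned above.

```python
import re

# float = [sign] digits [ sep [digits] ] [ E [sign] digits ]
HGS_FLOAT = re.compile(r"[+-]?[0-9]+(?:\.[0-9]*)?(?:E[+-]?[0-9]+)?")

def parse_hgs_float(text):
    """Return the value of text, or raise ValueError if it is malformed."""
    if HGS_FLOAT.fullmatch(text) is None:
        raise ValueError("not a valid HGS float: %r" % (text,))
    return float(text)
```

With this pattern '27585.1203', '-5.62E+8' and '10.' are accepted,
whereas '10,000.00' and '.5' are rejected because the grammar requires
leading digits and allows no thousands separator.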
Geospatial Coordinates (GC)
---------------------------
The format for the Geospatial coordinate is as defined in the FGDC 1994
standard Content Standards for Digital Geospatial Metadata, with the
exception of the length of the integral part of the latitude or longitude
(two or three digits).
Values for latitude and longitude shall be expressed as decimal
fractions of degrees. Whole degrees of latitude shall be represented
by a two-digit decimal number ranging from 0 through 90. Whole degrees
of longitude shall be represented by a decimal number
ranging from 0 through 180. When a decimal fraction of a degree is
specified, it shall be separated from the whole number of degrees by a
decimal point. Decimal fractions of a degree may be expressed to the
precision desired.
Latitudes north of the equator shall be specified by a plus sign (+),
or by the absence of a minus sign (-), preceding the two digits
designating degrees. Latitudes south of the Equator shall be
designated by a minus sign (-) preceding the two digits designating
degrees. A point on the Equator shall be assigned to the Northern
Hemisphere.
Longitudes east of the prime meridian shall be specified by a plus sign
(+), or by the absence of a minus sign (-), preceding the digits
designating degrees. Longitudes west of the meridian shall be designated
by a minus sign (-) preceding the digits designating degrees. A point
on the prime meridian shall be assigned to the Eastern Hemisphere. A
point on the 180th meridian shall be assigned to the Western
Hemisphere. One exception to this last convention is permitted. For
the special condition of describing a band of latitude around the
earth, the East Bounding Coordinate data element shall be assigned the
value +180 (180) degrees.
Any spatial address with a latitude of +90 (90) or -90 degrees will
specify the position at the North or South Pole, respectively. The
component for longitude may have any legal value.
With the exception of the special condition described above, this form
is specified in Department of Commerce, 1986, Representation of
geographic point locations for information interchange (Federal
Information Processing Standard 70-1): Washington, Department of
Commerce, National Institute of Standards and Technology.
Servers and Clients must be able to handle floating point numbers
which have the fractional and integral part separated by a period
as well as comma, regardless of the locale and/or language/charset
and encoding triple specified by HTTP.
Servers and Clients must use the specified float format for
all latitude and longitude formats.
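The range and hemisphere conventions of this section can be summarised
in a few lines. The sketch below is illustrative only; the function
names are hypothetical, and for brevity it relies on Python's float()
rather than on a strict check of the float format.

```python
def parse_latitude(text):
    """Latitude in decimal degrees, -90 (South Pole) to +90 (North Pole)."""
    value = float(text)
    if not -90.0 <= value <= 90.0:
        raise ValueError("latitude out of range: %r" % (text,))
    return value

def parse_longitude(text):
    """Longitude in decimal degrees, -180 to +180."""
    value = float(text)
    if not -180.0 <= value <= 180.0:
        raise ValueError("longitude out of range: %r" % (text,))
    return value

def hemispheres(lat, lon):
    """Return e.g. 'NE', following the assignment rules quoted above."""
    # A point on the Equator is assigned to the Northern Hemisphere.
    north_south = "N" if lat >= 0 else "S"
    # The prime meridian is Eastern, the 180th meridian is Western.
    east_west = "E" if 0 <= lon < 180 else "W"
    return north_south + east_west
```

The one permitted exception, an East Bounding Coordinate of +180 for a
band of latitude around the earth, concerns bounding boxes and is not
modelled by hemispheres().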
Temporal Dimension (JF)
---------------------------
The temporal dimension is either as defined per rfc1123, as a Julian
date or relative in days. A relative day and a Julian date are expressed
as a floating point number of arbitrary accuracy denoting the number of
days before (negative) or after (positive) the 14th of September 1752
(for Julian days), or the number of days before or after the current day,
i.e. the day the query was dispatched. Relative and Julian days have
an 'R' and a 'J' prefix respectively. This prefix is not case sensitive.
Examples:
J 0
R -5.62E+8 (1.5 Million years ago)
R -1 ( Yesterday )
Wed, 02 Apr 1997 17:06:40 GMT
When the rfc1123 format is used, the zone should be UT or GMT, and
the date-name is optional. Please note that rfc1123 specifies a four
digit year (unlike rfc822).
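A client or server has to tell the three notations apart before it can
compare dates. The following sketch shows one way to do so; the name
parse_temporal and the tags 'J', 'R' and 'D' in its result are
assumptions of this example. The rfc1123 branch relies on
email.utils.parsedate_to_datetime from the Python standard library.

```python
from email.utils import parsedate_to_datetime

def parse_temporal(text):
    """Return ('J', days), ('R', days) or ('D', datetime)."""
    text = text.strip()
    prefix = text[:1].upper()
    # No rfc1123 day name or day number starts with J or R, so the
    # first character is enough to recognise the prefixed forms.
    if prefix in ("J", "R"):
        return prefix, float(text[1:].strip())
    return "D", parsedate_to_datetime(text)

print(parse_temporal("R -1"))
print(parse_temporal("Wed, 02 Apr 1997 17:06:40 GMT"))
```

Converting a Julian or relative day into a calendar date is left out on
purpose, since it depends on the epoch and on the dispatch time of the
query.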
Search Sub String (SS)
---------------------------
A partial search string, in the appropriate language and charset
as specified on the HTTP transport level.
The criteria names are not case sensitive.
Spatial Limit
-------------
A spatial limit can be imposed on the records returned. In this
case each of the returned records must be partially within the
specified bounding box. The server may only apply this limitation
to records with which a spatial domain can be associated. For each
of the records to which spatial limitation was imposed, the
spatial coverage associated with the record should be returned;
thus allowing the user-agent to do subsequent processing. It is
proposed that rectangles are defined in simple lat/lon coordinates,
with up to a tenth of a degree accuracy.
latmin GC the resource(s) returned are to cover a
latitude equal or larger than
the latmin specified.
latmax GC the resource(s) returned are to cover a
latitude equal or smaller than
the latmax specified
lonmin GC the resource(s) returned are to cover a
longitude equal or larger than
the lonmin specified
lonmax GC the resource(s) returned are to cover a
longitude equal or smaller than
the lonmax specified
Each of the above is optional and non-repeatable.
Absent values, for any of the above fields are to be treated as
not-limiting in any way. Consequently if all values are absent, no
spatial limit is to be applied at all.
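On the server side the spatial limit amounts to a rectangle overlap
test in which every absent criterion is skipped. A minimal sketch,
assuming that a record's coverage is available as a (south, west,
north, east) tuple (this ordering and the function name are choices of
the example, not of the protocol) and ignoring boxes that cross the
180th meridian:

```python
def within_spatial_limit(box, latmin=None, latmax=None,
                         lonmin=None, lonmax=None):
    """True if the record box lies at least partially within the limit."""
    south, west, north, east = box
    # Each absent criterion (None) is treated as not limiting.
    if latmin is not None and north < latmin:
        return False
    if latmax is not None and south > latmax:
        return False
    if lonmin is not None and east < lonmin:
        return False
    if lonmax is not None and west > lonmax:
        return False
    return True
```

Called without any criteria the function returns True for every record,
which matches the rule that no spatial limit is applied when all values
are absent.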
Temporal Limit
--------------
Time intervals are a pair of dates or Julian day numbers which define
a temporal search interval. The server must be able to handle these
numbers up to a tenth of a day accuracy. The client must be able to
cope with a search applied with less accuracy than specified in
the request. The implementation on the server must be designed
with this inaccuracy in mind, possibly at the expense of
returning false positives.
There are three criteria specifying date searching; please note
that, insofar as the service is concerned, the timespan associated
with a resource can effectively be a single point in time.
date_after JF (part of) the timespan of the returned
records is after the date specified
date_before JF (part of) the timespan of the returned
records is before the date specified
date_on JF The date_on is within the timespan of the
returned records
This allows searches of the types on-a-date, before-a-date,
after-a-date and any combination, thus making ranges and partial
ranges possible. In particular the server must make no assumptions
on which, or what combination, of these three specifiers is
requested.
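The three date criteria can be evaluated independently of each other,
which is what allows any combination to be requested. The sketch below
assumes that all values have already been converted to day numbers on a
common scale; the function name is hypothetical, and the comparisons
are deliberately inclusive so that borderline records are returned
rather than dropped.

```python
def within_temporal_limit(start, end, date_after=None,
                          date_before=None, date_on=None):
    """True if the record timespan [start, end] satisfies all criteria."""
    # Part of the timespan must lie after date_after.
    if date_after is not None and end < date_after:
        return False
    # Part of the timespan must lie before date_before.
    if date_before is not None and start > date_before:
        return False
    # date_on must fall within the timespan.
    if date_on is not None and not (start <= date_on <= end):
        return False
    return True
```

A record whose timespan is a single point in time is handled by passing
the same value for start and end.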
Free text limit
---------------------------
Additionally a search string can specify one or more partial
substrings to be matched upon. This option is repeatable and
non-mandatory. Repeatable entries are to be used in parallel; i.e. a
record has to relate to, or contain, one or more of the substrings
specified by the user agent.
text SS Partial string.
optional, repeatable
Standard http get and post requests will be supported.
Request procedure
---------------------------
A) HTTP get. Example
http://hgs.ceo.org/cgi/search.pl?latmin=-40&lonmin=30&
latmax=-30&lonmax=40&date_on=27585.1203&text=Geology&
Version=1.00&UserAgent=DraftEx+1.00
B) HTTP Post. Example
<form method='post' action='http://hgs.ceo.org/cgi/search.pl'>
<input name='UserAgent' value='DraftEx 1.00'>
<input name='Version' value='1.00'>
<input name='text' value='Geology'>
<input name='latmin' value='-39'>
<input name='latmax' value='30'>
<input name='lonmin' value='-40'>
<input name='lonmax' value='40'>
<input name='date_on' value='27585.1203'>
<input type='submit'>
</form>
C) HTTP Reply
Status of the reply is either 200, for results follow, or 404,
for nothing found, depending on the success of the search. All
other headers, as described in rfc2068, have their normal
meaning. In particular a 401 reply might cause the UserAgent
to prompt for a username and password.
Replies such as 500 indicate a failure. The Normal MIME rules
for labeling the reply apply. The returned content type is
either text/html, text/plain, or text/x-hgs. Only the latter is
intended for machine parsing. Replies in html or plain text should
be forwarded to the user directly.
The content of the reply consist of a header and a set of entries;
each separated by a blank line. Each line contains a field value
pair, in a RFC822 colon separated encoding. Field names are not
case sensitive.
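The GET form of the request shown under A) can be assembled with any
standard URL encoder. The sketch below uses urllib.parse.urlencode from
the Python standard library; build_search_url is a hypothetical helper,
and the values are those of the POST example under B).

```python
from urllib.parse import urlencode

def build_search_url(search_uri, user_agent, version="1.00",
                     text=(), **criteria):
    """Return the GET URL for a 'Search' request."""
    params = [("UserAgent", user_agent), ("Version", version)]
    # Absent criteria (None) are left out and so do not limit the search.
    params += [(name, str(value)) for name, value in criteria.items()
               if value is not None]
    # 'text' is repeatable: one pair per substring.
    params += [("text", term) for term in text]
    return search_uri + "?" + urlencode(params)

url = build_search_url("http://hgs.ceo.org/cgi/search.pl", "DraftEx 1.00",
                       text=["Geology"], latmin=-39, latmax=30,
                       lonmin=-40, lonmax=40, date_on=27585.1203)
print(url)
```

The encoder turns the space in the UserAgent value into a '+', as in
the example URL under A).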
Header Fields
---------------------------
A number of header fields are mandatory; a few are optional;
primarily for user interface purposes. In addition to the normal
reply headers; the following field/value pairs are 'Search'
request type specific.
Applied List of search criteria which were applied
successfully. Space separated, case insensitive.
Example: latmin latmax date_on
Mandatory, non repeatable
Name Short, Symbolic name for the dataset(s) or systems
searched; often the same as the value of the 'Name'
field in the directory advert. Allows the user-agent
to distinctly mark the results from a query when
results from more than one HGS interface are collated.
This field is of particular value for searches across
multiple interfaces, without the context of the directory
advert.
Example: CORSA
Optional, non-repeatable
[--- back pointers to directory advert ----]
[--- kind of impossible due to many2many relation ----]
[--- perhaps just the URI/Description/Name info ? ----]
EntriesExpected
A possibly inaccurate number of entries
likely to be returned by the server. A server should
try to ensure that this number is accurate, but the
client must not depend on this number being correct.
It must not be used as an upper limit.
Example: 5
Optional, non-repeatable
Example of a full header:
Version: hgs/1.00
Engine: GeoLite 0.01a
Applied: text latmin latmax lonmin lonmax date_on
EntriesExpected: 5
Comment: Your search on the RCS database yielded 5 entries
Record Entries
---------------------------
Record entries again follow the rfc822 colon separated field value
format; and are separated by a blank line. A server which is able
to apply a spatial or temporal limit should (or must?) confirm the
coverage of the records returned for at least those criteria
specified in the original search; with as much accuracy as
possible.
URI Universal resource identifier; such as a URN or a URL.
Example: http://server.company.org/cgibin/show.pl?1e441aef
Mandatory, Non-repeatable
Name Short descriptive name
Example: Wetlands Survey 1996, Alabama
Optional, non-repeatable
Description Short description of the record, clients must be
able to cope with up to 4k and ..
Example: Someblurp on etc, from the gcmd
Optional, non-repeatable
Coverage Spatial area related to the resource, 2 or 4 space
separated GC, with as much accuracy as possible. A server
should supply this; especially when it was able to
effectuate one or more spatial criteria. If the entry is
repeated, each of the sets should fit the criteria
applied. An absent criterion is denoted by a '*',
asterisk.
Example: 12 33 33 44
Optional, repeatable
OtherCoverage Any spatial coverage related to the resource, not
relayed in the Coverage field
Example: 12,33 33,44
Optional, repeatable
Period Temporal range or point related to the resource, 1 or 2
space separated JF, with as much accuracy as possible. A
server should supply this; especially when it was able to
effectuate one or more temporal criteria. If the field is
repeated, each of the repeated entries should fulfill the
criteria applied.
Example: 1112.33 1198.11
Optional, repeatable
OtherPeriod ?? Any temporal ranges or points not relayed in the
Period field.
Example: 12,33 33,44
Optional, repeatable
Example:
Name: Wetlands in eastern Alabama
URI: http://ala.www.edu/sand.html
Coverage: 12.33 13.44 44.12 34.23
Period: 123.34 125.12
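A text/x-hgs reply, consisting of a header followed by record entries,
might be taken apart as in the sketch below. The function names are
hypothetical. Field names are lower-cased because they are not case
sensitive, and values are collected in lists because several fields
are repeatable. The meaning of the individual Coverage values is not
interpreted here.

```python
import re

def parse_reply(body):
    """Return (header, applied criteria, records) of a text/x-hgs reply."""
    blocks = []
    for chunk in re.split(r"\n\s*\n", body.strip()):
        fields = {}
        for line in chunk.splitlines():
            name, sep, value = line.partition(":")
            if sep:
                fields.setdefault(name.strip().lower(), []).append(value.strip())
        if fields:
            blocks.append(fields)
    if not blocks:
        return {}, [], []
    header, records = blocks[0], blocks[1:]
    applied = header.get("applied", [""])[0].lower().split()
    return header, applied, records

def parse_coverage(value):
    """Split a Coverage value; '*' marks an absent criterion."""
    return [None if part == "*" else float(part) for part in value.split()]

REPLY = """Version: hgs/1.00
Engine: GeoLite 0.01a
Applied: latmin latmax
EntriesExpected: 1

Name: Wetlands in eastern Alabama
URI: http://ala.www.edu/sand.html
Coverage: 12.33 13.44 44.12 34.23
Period: 123.34 125.12
"""

header, applied, records = parse_reply(REPLY)
```

The 'applied' list tells the user-agent which of its criteria the
server honoured, so that it can filter the records itself for the
remaining ones.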
'Explain' Resource
=================
The 'Explain' is an optional object description level for a given
server. It allows a server to define an ordered list of locally
defined searchable object types and their associated attributes.
It is advertised via a uri resource "ExplainURI" at the directory
level. The Explain service is optional, and the search server may
ignore it. It then must send back an http error such as "not supported".
Thus Explain allows customisable local attributes to be defined.
Using this configuration information the client software can
therefore configure the search interface accordingly.
The actual attribute ID numbers used should be standardised
to a certain extent, especially in user communities with similar
database schemas. More work will be done in this area.
The format is best illustrated in the following example:
Version: 1.00
Name: Advanced Interface to the SatCom data site
Version: 1.00
SearchURI: http://www.ceo.org/hgs/serverside_examples/ISIS.pl
Name: OCEAN Data/Product selection(s)
Id: 100
Name: DALI Dataset name
URI: http://www.dali.fr/guide/datasets.html
Id: 200
Name: COLOUR (Multispectral)
Mother: 100
URI: http://www.dali.fr/guide/datasets.html#MSPT
Id: 201
Name: BLACK and WHITE (Panchromatic)
Mother: 100
URI: http://www.dali.fr/guide/datasets.html#PAN
In the above example a search for 100=200 would mean a search for
the dataset type set to 'Multispectral', whereas a search for
100=200&100=201 would imply a search across both datasets.
Another example is depicted below; it concerns a dataset with several
classes of cloud coverage as well as a second dimension concerning the
type of pass the satellite made.
Version: 1.00
Name: Cloud Coverage DBMA
Version: 1.00
SearchURI: http://www.ceo.org/hgs/serverside_examples/PPY.pl
Name: OCEAN Data/Product selection(s)
Id: 100
Name: Cloud Cover
URI: http://www.dali.fr/guide/coverdef.html
Id: 200
Name: below 10%
Mother: 100
URI: http://www.dali.fr/guide/clouds.html#scale
Id: 210
Name: below 10%
Mother: 100
URI: http://www.dali.fr/guide/clouds.html#scale
Id: 220
Name: 10% .. 30%
Mother: 100
URI: http://www.dali.fr/guide/clouds.html#scale
Id: 230
Name: 30% .. 60%
Mother: 100
URI: http://www.dali.fr/guide/clouds.html#scale
Id: 240
Name: 60% and more
Mother: 100
URI: http://www.dali.fr/guide/clouds.html#scale
Id: 300
Name: Pass Type
URI: http://www.dali.fr/guide/path.html
Id: 301
Name: Ascending Pass
URI: http://www.dali.fr/guide/path.html#ASC
Mother: 300
Id: 302
Name: Descending Pass
URI: http://www.dali.fr/guide/path.html#DEC
Mother: 300
Customised searching
==============================
Upon receipt of 'Explain' configuration data, the client interface
should be configured dynamically, for example by use of pull-down
menus. The interface should allow users to select one or more
object types. These will then be used for subsequent level 1
searches to that server.
The selected object types will be appended to a level 1 search:
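One possible way for a client to derive pull-down menus from Explain
data is sketched below in Python. It assumes the object records have
already been read into dictionaries with the keys "Id", "Name" and,
for child objects, "Mother"; the function name build_menus and the
data layout are hypothetical and not part of HGS.

```python
def build_menus(objects):
    """Group child objects under their parent object.

    Returns a list of (parent_name, [(child_id, child_name), ...]),
    one entry per object that has no "Mother" field.
    """
    children = {}
    for obj in objects:
        mother = obj.get("Mother")
        if mother is not None:
            children.setdefault(mother, []).append(
                (obj["Id"], obj["Name"]))
    menus = []
    for obj in objects:
        if obj.get("Mother") is None:
            menus.append((obj["Name"], children.get(obj["Id"], [])))
    return menus


# Records taken from the cloud coverage example (abbreviated).
OBJECTS = [
    {"Id": 100, "Name": "Cloud Cover"},
    {"Id": 200, "Name": "below 10%", "Mother": 100},
    {"Id": 220, "Name": "10% .. 30%", "Mother": 100},
    {"Id": 300, "Name": "Pass Type"},
    {"Id": 301, "Name": "Ascending Pass", "Mother": 300},
    {"Id": 302, "Name": "Descending Pass", "Mother": 300},
]

for title, entries in build_menus(OBJECTS):
    print(title)
    for object_id, name in entries:
        print("   ", object_id, name)
```

Each top-level object becomes one menu, and the Ids of the entries the
user picks are the values sent with the subsequent search.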
GET
http://hgs.ceo.org/cgi/search.pl?latmin=-30&lonmin=30&latmax=-
40&lonmax=40&Ton=27585.1203&text=Geology&Object=101,104,106&
Version=1.00&UserAgent=JavaGot+1.00
POST
<Object=101>
<Object=104>
<Object=106>
Selected objects include all child objects.
Servers not supporting level 2 ignore all Object definitions.
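The GET form above can be assembled as in the following Python sketch.
The parameter names are those of the example request; the helper names
build_search_url and expand_with_children are hypothetical. The second
helper shows one way a server might interpret "selected objects
include all child objects", given a table mapping each child Id to
its Mother Id.

```python
from urllib.parse import urlencode


def build_search_url(search_uri, params, object_ids):
    """Append the selected object Ids to a level 1 search request."""
    query = list(params.items())
    if object_ids:
        query.append(("Object", ",".join(str(i) for i in object_ids)))
    # safe="," keeps the comma-separated Id list readable in the URL.
    return search_uri + "?" + urlencode(query, safe=",")


def expand_with_children(selected, mothers):
    """Return the selected Ids plus all of their descendants.

    `mothers` maps a child Id to its parent ("Mother") Id.
    """
    result = set(selected)
    changed = True
    while changed:
        changed = False
        for child, mother in mothers.items():
            if mother in result and child not in result:
                result.add(child)
                changed = True
    return sorted(result)


url = build_search_url(
    "http://hgs.ceo.org/cgi/search.pl",
    {"latmin": -30, "lonmin": 30, "latmax": -40, "lonmax": 40,
     "Ton": "27585.1203", "text": "Geology",
     "Version": "1.00", "UserAgent": "JavaGot 1.00"},
    [101, 104, 106],
)
print(url)
print(expand_with_children([300], {301: 300, 302: 300, 200: 100}))
```

In this sketch the Object parameter is simply placed last; the order
of parameters in the query string is not assumed to matter.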
Implementations
===============
An HGS interface to a database will be implemented using CGI
scripts. These can be expected to be similar and developed using a
standard stub. Existing Web gateways to databases are unaffected.
All that is required is an add on CGI gateway which supports HGS.
Multiple server searching at level 1 will be possible. Thus a
distributed search can be made by the client across several
servers contained within a given server directory. In this case
result collation at the client side is necessary.
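A minimal Python sketch of such a client-side distributed search
follows. The actual HTTP request and reply parsing are left to a
caller-supplied function fetch(uri), which is a placeholder and not
defined by HGS. Treating two hits with the same URI as duplicates is
an assumption of this sketch; this draft does not prescribe a
collation rule.

```python
from concurrent.futures import ThreadPoolExecutor


def search_all(search_uris, fetch, max_workers=8):
    """Query several servers in parallel and collate the hits.

    `fetch(uri)` must return a list of hit dicts, each holding at
    least a "URI" key. Returns (hits, failed_uris).
    """
    collated = {}   # hit URI -> first hit seen with that URI
    failed = []     # search URIs whose request raised an error
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [(uri, pool.submit(fetch, uri)) for uri in search_uris]
        # Collect in submission order so the result is deterministic.
        for uri, future in futures:
            try:
                hits = future.result()
            except Exception:
                failed.append(uri)
                continue
            for hit in hits:
                collated.setdefault(hit["URI"], hit)
    return list(collated.values()), failed
```

A server that cannot be reached is recorded in the second return
value and does not stop the remaining servers from being searched.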
The level 0 reply is typically served from a simple text file in
a directory 'hgs' under the server root, and/or through AliasMapping
directives in the server setup.
Scalability
===========
Issues of scale are not addressed. In particular, broad searches
yielding thousands of hits are possible and will pose a serious
challenge for User Agent implementors. More work will be done in
this area.
However, it is stressed that, because of the mandatory use of complete
URLs on all levels, query interfaces can be distributed, even within
one collection. Furthermore, support for mirroring, caching and
duplication of services is potentially available; but as 'a client can
do what it wants', it is as yet unclear how to put this into effect.
Security Implications
=====================
Security implications are not addressed, nor are they well understood.
More work is to be done in this area.
Acknowledgements
================
The development of HGS has benefitted from the work done by CEONet, Canada
and presentations at the 1996 workshop organised by the CEOS (Committee on
Earth Observation Satellites) WGISS (Working Group on Information Systems
and Services) WWW task team. Michael Kleih implemented and tested some
early client applications written in Java. Ladson Hayes provided remote
sensing specific information and proof-read this document.
The work has been carried out in part for the Centre for Earth Observation
of the Space Applications Institute by the Software Technologies and
Automation unit of the Institute for Systems, Informatics and Safety, both
at the Joint Research Centre Ispra of the European Communities.
Contacts
========
URL: http://www.ceo.org/hgs/index.html
Mailinglist: hgs@harp.gsfc.nasa.gov (Majordomo)
Clive Best Clive.Best@jrc.it
Dirk-Willem van Gulik Dirk.vanGulik@jrc.it
ISIS/STA/CEO - TP 270
Joint Research Centre Ispra
21020 Ispra (Va)
Italy.
Phone: +39 332 78 9549 or 5044
Fax: +39 332 78 9185
draft-vangulik-http-search-01.txt Expires: 23/01/1998